TeX - LaTeX Asked by qwesix on April 11, 2021
I’m trying to read some text from a Word document into my LaTeX files. I just want the plain text, without math or formatting.
I tried \input{}, but it doesn’t recognize all UTF-8 characters:
Package inputenc: Unicode character (U+0003)
(inputenc) not set up for use with LaTeX.
Text line contains an invalid character.
PK
\documentclass[ngerman, fontsize=12pt]{scrbook}
\usepackage[ngerman]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{lmodern}
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage[hidelinks]{hyperref}
\usepackage[baselinestretch,linenumbers,lines=30,chars=60,noindent]{stdpage}
\begin{document}
\input{test.docx}
\end{document}
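A .docx file is actually a binary file: more precisely, a ZIP archive containing several files, which are compressed and decompressed on the fly. This also explains the error message: a ZIP archive starts with the signature PK followed by the bytes 0x03 and 0x04, so the first thing TeX sees when reading it as text is the line "PK" and the control character U+0003.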
For instance, if I run, from the command line,
file /usr/local/texlive/2020/texmf-dist/doc/fonts/tex-gyre-math/test-word-texgyre_termes_math.docx
unzip -l /usr/local/texlive/2020/texmf-dist/doc/fonts/tex-gyre-math/test-word-texgyre_termes_math.docx
just to examine a file included in TeX Live, I get
/usr/local/texlive/2020/texmf-dist/doc/fonts/tex-gyre-math/test-word-texgyre_termes_math.docx: Microsoft Word 2007+
Archive: /usr/local/texlive/2020/texmf-dist/doc/fonts/tex-gyre-math/test-word-texgyre_termes_math.docx
Length Date Time Name
--------- ---------- ----- ----
1554 01-01-1980 00:00 [Content_Types].xml
590 01-01-1980 00:00 _rels/.rels
1290 01-01-1980 00:00 word/_rels/document.xml.rels
63800 01-01-1980 00:00 word/document.xml
7105 01-01-1980 00:00 word/theme/theme1.xml
3222 01-01-1980 00:00 word/settings.xml
17027 01-01-1980 00:00 word/stylesWithEffects.xml
296 01-01-1980 00:00 customXml/_rels/item1.xml.rels
16274 01-01-1980 00:00 word/styles.xml
341 01-01-1980 00:00 customXml/itemProps1.xml
631 01-01-1980 00:00 docProps/core.xml
218 01-01-1980 00:00 customXml/item1.xml
2218 01-01-1980 00:00 word/fontTable.xml
428 01-01-1980 00:00 word/webSettings.xml
998 01-01-1980 00:00 docProps/app.xml
--------- -------
115992 15 files
The document text is in those .xml files, precisely in word/document.xml, but it cannot be input in TeX in a straightforward way. I tried with a file containing just abcdef; a small extract from its document.xml file is
<w:body>
<w:p w14:paraId="47EF316A" w14:textId="128C1C44" w:rsidR="004807B4" w:rsidRPr="004807B4" w:rsidRDefault="004807B4">
<w:pPr>
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
</w:pPr>
<w:r>
<w:rPr>
<w:lang w:val="en-US"/>
</w:rPr>
<w:t>
abcdef
</w:t>
</w:r>
</w:p>
<w:sectPr w:rsidR="004807B4" w:rsidRPr="004807B4">
<w:pgSz w:w="11906" w:h="16838"/>
<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="708" w:footer="708" w:gutter="0"/>
<w:cols w:space="708"/>
<w:docGrid w:linePitch="360"/>
</w:sectPr>
</w:body>
Save your document as “text only” (plain text) from Word. The result is a plain text file, which you can load with \input.
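A minimal sketch of that workflow, assuming the Word document has been saved as plain text under the hypothetical name test.txt in the same folder as the .tex file, with UTF-8 chosen as the text encoding so that it matches \usepackage[utf8]{inputenc}. The preamble is a trimmed version of the one in the question.

```latex
\documentclass[ngerman, fontsize=12pt]{scrbook}
\usepackage[ngerman]{babel}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc} % assumes test.txt was saved with UTF-8 encoding
\usepackage{lmodern}
\begin{document}
% test.txt is a hypothetical plain-text export of test.docx;
% \input reads it as ordinary LaTeX source, so blank lines become paragraph breaks
\input{test.txt}
\end{document}
```

Because \input treats the file as LaTeX source, characters such as %, &, #, $, _, {, }, ~, ^ and \ in the exported text keep their TeX meaning and would have to be escaped in the .txt file if they occur.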
Correct answer by egreg on April 11, 2021