TeX - LaTeX Asked on December 10, 2020
Can anyone suggest the most efficient method to strip all of the LaTeX code from a document?
The best method that comes to my mind, though I have no clue how to do it, is to use some sort of latexmk_flat_file command that generates a flat text file (without code) instead of a *.pdf.
Running an optical character recognition on the *.pdf will also result in lots of errors and require a substantial amount of manual clean up.
Selecting and copying text from the resulting *.pdf file gives unwanted line breaks and doesn't normally permit selecting all text spanning multiple pages.
I used a trial version of Tex2Word by Chikrii, but it was unable to properly handle the type of LaTeX business letter that I am currently using.
catdvi appears to have been last updated in the year 2002, and the kpathsea library presently used by TexLive for Mac/OSX does not have what is required to install the universal distribution of catdvi-0.14; i.e., lkpathsea is missing (and perhaps others).
I would like to keep the tabs, spaces, and original line endings.
This is a task that will need to be completed by me several times each month.
With respect to the working draft perl script written by cmhughes, these are the most common codes (modified for the perl script) that are contained within my LaTeX documents:
s/\\begin\{.*?\}(\[.*?\])?(\{.*?\})?//g;
s/\\end\{.*?\}//g;
s/\\hspace\*\{.*?\}//g;
s/\\vspace\*\{.*?\}//g;
s/\\tab //g;
s/~\\\\//g;
s/\\>//g;
s/\\=//g;
s/\\textit\{//g;
s/\\newpage//g;
s/\{\\bf \\underline\{//g;
s/\{\\bsi\{//g;
s/\\uuline\{//g;
s/\\underline\{//g;
s/\}//g;
s/\\//g;
s/~//g;
Here's a little perl script that might get you started. You can use it as
perl removelatexcode.pl myfile.tex myfile1.tex
and can call it with as many files as you like (or you could pipe into it too).
It does the following:
- when run with -o, copies myfile.tex to myfile.tex.bak just in case something goes wrong
- ignores everything before \begin{document}
- removes \begin{<myenvironmentname>}, \end{environmentname}, \<name of command>
- you can add to it as you see fit.
The way the code stands, it won't overwrite the original file. Once you're happy with it, and have tested it to your liking, feel free to go ahead and use the file as
perl removelatexcode.pl -o myfile.tex
which will overwrite myfile.tex.
Always be careful when using scripts like this. There was no malicious intent here, but you should test it thoroughly before using it on live files.
If there are some commands for which you wish to keep the argument, for example \underline{keep this argument}, then simply populate
my %keeparguments=("textit"=>1,
"underline"=>1,
);
with the appropriate commands.
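As a minimal sketch of how such a hash can drive the keep-or-drop decision, the snippet below uses a single substitution with the /e modifier instead of a loop over matches. The helper name strip_commands is hypothetical, and the pattern assumes command names made of letters only and arguments without nested braces on the same pass; it is an illustration, not a replacement for the full script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Commands whose argument text should survive (same idea as %keeparguments).
my %keeparguments = (
    "textit"    => 1,
    "underline" => 1,
);

# Hypothetical helper: strip \command{argument} pairs from one line,
# keeping the argument only for commands listed in the hash.
sub strip_commands {
    my ($line, $keep) = @_;
    # [A-Za-z]+ is the command name, [^{}]* an argument without braces.
    # Repeating the substitution peels nested commands from the inside out.
    1 while $line =~ s/\\([A-Za-z]+)\{([^{}]*)\}/$keep->{$1} ? $2 : ""/e;
    return $line;
}

print strip_commands('\underline{keep this argument} and \mycommand{drop this}',
                     \%keeparguments), "\n";
```

Adding another command to the hash, for example "textbf"=>1, is all that is needed to keep its argument as well.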
removelatexcode.pl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy;
use Getopt::Std;

# get the options (-o: overwrite the input file after backing it up)
my %options=();
getopts("o", \%options);

my $inpreamble=1;   # switch for in the preamble or not
my $filename;
my @lines=();       # @lines: stores the new lines without commands

# commands for which we want to keep the arguments- populate
# as necessary
my %keeparguments=("textit"=>1,
                   "underline"=>1,
                  );

while (@ARGV)
{
    # get filename from arguments
    $filename = shift @ARGV;

    # open the file for reading
    open(my $inputfile, '<', $filename) or die "Can't open $filename: $!";

    # reset the preamble switch
    $inpreamble=1;

    # reset the lines array
    @lines=();

    # loop through the lines in the input file
    while(<$inputfile>)
    {
        # check that the document has begun
        if($_ =~ m/\\begin\{document.*/)
        {
            $inpreamble=0;
        }

        # ignore the preamble, and make string substitutions in
        # the main document
        if(!$inpreamble)
        {
            # remove \begin{<stuff>}[<optional arguments>]{<mandatory arguments>}
            s/\\begin\{.*?\}(\[.*?\])?(\{.*?\})?//g;

            # remove \end{<stuff>}
            s/\\end\{.*?\}//g;

            # remove \<commandname>{with argument}
            while ($_ =~ m/\\(.*?)\{.*?\}/)
            {
                if($keeparguments{$1})
                {
                    # keep the argument, drop the command and braces
                    s/\\.*?\{(.*?)\}/$1/;
                }
                else
                {
                    # drop the command together with its argument
                    s/\\.*?\{.*?\}//;
                }
            }

            # print the current line (if we're not overwriting the current file)
            print $_ if(!$options{o});
            push(@lines,$_);
        }
    }

    # close the file
    close($inputfile);

    # if we want to overwrite the current file
    if ($options{o})
    {
        # make a backup of each file
        my $backupfile= "$filename.bak";
        copy($filename,$backupfile) or die "Can't back up $filename: $!";

        # reopen the input file to overwrite it
        open(my $outputfile,">",$filename) or die "Can't open $filename: $!";
        print $outputfile @lines;
        close($outputfile);

        # output to terminal
        print "Backed up original file to $filename.bak\n";
        print "Overwritten original file without commands\n";
    }
}
exit;
Here's a little test case:
myfile.tex
\documentclass{article}
% in the preamble
% in the preamble
% in the preamble
\begin{document}
\begin{myenvironment}
text text text text text text text text text text
text text text text text text text text text text
text text text text text text text text text text
text text text text text text text text text text
\end{myenvironment}
\mycommand{argument} more text after it \anothercommand{another argument}
\textit{keep this argument} more text after it \anothercommand{another argument} yet more text
\anothercommand{another argument} yet more text \textit{keep this argument} more text after it
\begin{anotherenvironment}[optional arguments] could have text here
other other other other other other other other other other
other other other other other other other other other other
other other other other other other other other other other
other other other other other other other other other other
\end{anotherenvironment}
\begin{anotherenvironment}[optional arguments]{mandatory args} could have text here
another another another another another another
another another another another another another
another another another another another another
another another another another another another
\end{anotherenvironment} can have text here
\end{document}
and the output of
perl removelatexcode.pl myfile.tex
Output
text text text text text text text text text text
text text text text text text text text text text
text text text text text text text text text text
text text text text text text text text text text
more text after it
keep this argument more text after it yet more text
yet more text keep this argument more text after it
could have text here
other other other other other other other other other other
other other other other other other other other other other
other other other other other other other other other other
other other other other other other other other other other
could have text here
another another another another another another
another another another another another another
another another another another another another
another another another another another another
can have text here
A few words about regexp
You'll notice the script uses lines such as
s/\\begin\{.*?\}(\[.*?\])?(\{.*?\})?//g;
This matches
\begin{<environmentname>}
\begin{<environmentname>}[<optional arguments>]
\begin{<environmentname>}[<optional arguments>]{<mandatory arguments>}
but it does so in a non-greedy way. The ? in .*? makes the match non-greedy, and the ? after each group () makes that group optional. If these matches were greedy (which they would be without the ?), then you would get a lot of potentially unwanted results.
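To make the difference concrete, here is a small sketch that applies a greedy and a non-greedy version of the same pattern to one line. The two helper names are made up for this illustration, and the patterns are simplified to the environment name only (no optional or mandatory arguments).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Non-greedy: .*? stops at the first closing brace after \begin{
sub strip_begin_nongreedy {
    my ($s) = @_;
    $s =~ s/\\begin\{.*?\}//g;
    return $s;
}

# Greedy: .* runs on to the last closing brace on the line
sub strip_begin_greedy {
    my ($s) = @_;
    $s =~ s/\\begin\{.*\}//g;
    return $s;
}

my $line = '\begin{center} some text \end{center}';

# only \begin{center} is removed; " some text \end{center}" remains
print "non-greedy: '", strip_begin_nongreedy($line), "'\n";

# the match extends to the brace closing \end{center}, so nothing remains
print "greedy:     '", strip_begin_greedy($line), "'\n";
```

On this line the greedy version swallows the text between the two commands, which is exactly the kind of unwanted result mentioned above.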
Answered by cmhughes on December 10, 2020
Pandoc accepts many different input formats including LaTeX and can produce a variety of outputs including plain text. To try Pandoc online, visit the Try pandoc! site.
As stated on the Pandoc website:
If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, or MediaWiki markup to
- HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, Slideous, S5, or DZSlides.
- Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML
- Ebooks: EPUB version 2 or 3, FictionBook2
- Documentation formats: DocBook, GNU TexInfo, Groff man pages
- TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides
- PDF via LaTeX
- Lightweight markup formats: Markdown, reStructuredText, AsciiDoc, MediaWiki markup, Emacs Org-Mode, Textile
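For a task that recurs several times a month, the conversion could be wrapped in a few lines of Perl. This is only a sketch: the helper name pandoc_plain_command and the file names are placeholders, and --wrap=preserve is my assumption about how best to keep the source line breaks. Pandoc's plain writer may still change spacing or drop layout that depends on custom macros, so the result should be checked against a real letter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the pandoc argument list for LaTeX -> plain text.
sub pandoc_plain_command {
    my ($input, $output) = @_;
    return ("pandoc", "--from=latex", "--to=plain", "--wrap=preserve",
            "--output=$output", $input);
}

# Placeholder file names; substitute your own.
my @cmd = pandoc_plain_command("myfile.tex", "myfile.txt");
print "@cmd\n";

# Run it only when the input exists; the list form of system() bypasses the shell.
if (-e "myfile.tex") {
    system(@cmd) == 0 or warn "pandoc failed or is not installed\n";
}
```

The same command can of course be typed directly in a terminal; the wrapper only saves retyping the options.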
Answered by Pandoc User on December 10, 2020
In the spirit of the Pandoc answer, I'd like to suggest the excellent Org-mode for the Emacs editor. Once you are comfortable with Emacs (which might take a few days, but if you want to edit lots of text files efficiently, this is a wise investment), Org-mode is very easy to start with. It not only offers powerful export options (including LaTeX, ODT, HTML, and more) but is also wholly based on plain text files, and it comes with task and time management systems and much more.
Disclaimer: Org-mode is a free tool and I'm not affiliated with it;).
Answered by mbork on December 10, 2020