Is there any way to do a correct word count of a LaTeX document?

TeX - LaTeX Asked by Vivi on May 28, 2021

Often assignments (or even papers) have a word count limit. That is not a big deal when using Word, but I don’t know how to do that using LaTeX. My solution so far has been to compile the document and then do a rough word count of the PDF, sometimes even copying the contents of the PDF and pasting them into Word to get a mostly correct word count.

Is there any tool (maybe even an online tool), package, script or software to do that directly from my .tex document and still get the right word count (i.e., ignore commands, equations, etc)?

28 Answers

This is in the UK TeX FAQ. The solutions suggested are:

  • detex filename (which tries to strip LaTeX commands), then use any word count tool (e.g. wc)

  • latexcount.pl, a Perl script for word count

  • texcount, another script, which even has an online interface

  • wordcount, which has a script that runs LaTeX with some settings, then counts word indications in the log file.

Correct answer by ShreevatsaR on May 28, 2021

The first one to come to mind is detex which strips a tex file of commands. You will then have to pass it through wc or some other word counting software. A search on the internet also brought up two items on Sourceforge: word counter 1 and word counter 2.

Disclaimer: out of the three, I've only used detex before. It worked reasonably well, but I was working with an English essay and it had no equations, so I don't know how it plays with math mode stuff. (Currently I don't have it installed so I can't check.)

Answered by Willie Wong on May 28, 2021

The last time I had to worry about this, I compiled my LaTeX document to PDF and ran it through pdftotext.

Answered by Blake Stacey on May 28, 2021

Way back in the depths of time, I scribbled my own perl script to do this. My reason for doing this myself was that sometimes I wanted to count words in command arguments and sometimes not, so I built in a selection routine. Plus I figured that a bit of maths was worth a word so added that in. As the script is really simple, I'm copying it here (which automatically makes it some sort of free-to-use, I guess!).

I don't think that I've used it for years, though - it's been a long time since "number of words" mattered to me at all.

#!/usr/bin/perl -w

@ARGV and $ARGV[0] =~ /^-+h(elp)?$/ && die "Usage:\t$0 files\n\t$0 < files\n\t$0\n";

my $count = 0;
my $first = "";
my $tex = 0;

while ($first =~ /^\s*$/) {
    $first = <>;
}

if ($first =~ /^\\(input|section|setlength|documentstyle|chapter|documentclass|relax|contentsline|indexentry|begin|glossaryentry)/) {
    $tex = sub { $r = $_[0];
                 $m = $_[1];
                 $r =~ s/\\(emph|textbf|textit|texttt|em)\{//g;
                 $r =~ s/\\(sub)*section\*?\{[^}]*\}//;
                 $r =~ s/\\title\{[^}]*\}//;
                 $r =~ s/\\\(.*?\\\)/maths/g;
                 $r =~ s/\\\(.*?$/maths/;
                 $r =~ s/^.*?\\\)/maths/;
                 $r =~ s/\\\[.*?\\\]/maths/g;
                 $r =~ s/.*?\\\]// and $m = 0;
                 $m and $r = "";
                 $r =~ s/\\\[.*?$// and $m = 1;
                 $r =~ s/\\\S*//g;
                 $r =~ s/%.*//;
                 return ($r,$m) };
} else {
    $tex = sub { return ($_[0],0) };
    @split = split(" ", $first);
    $count += $#split + 1;
}

while ($s = <>) {
    ($t,$n) = &$tex($s,$n);
    @split = split(" ", $t);
    $count += $#split + 1;
}

print "Number of words: $count\n";

Answered by Andrew Stacey on May 28, 2021

Here’s an excerpt from my .vimrc that gives me a comfortable word count in Vim:

function! WC()
    " Quote the file name so that paths containing spaces still work
    let filename = shellescape(expand("%"))
    let cmd = "detex " . filename . " | wc -w | tr -d '[:space:]'"
    let result = system(cmd)
    echo result . " words"
endfunction

command! WC call WC()

Now I can invoke :WC in command mode to have the word count echoed in the status line.

Answered by Konrad Rudolph on May 28, 2021

You can try Microspell. It's a very robust piece of software that knows whether you have a main tex document and other subsidiary ones.

Answered by yCalleecharan on May 28, 2021

For Windows users, the LaTeX Word Counter is pretty neat.

Answered by MSpeed on May 28, 2021

If you are on Windows and do not mind purchasing software, use WinEdt. It has a built-in word count feature (Document->word count).

Answered by user11232 on May 28, 2021

Compile the TeX file to DVI and then execute

 catdvi document.dvi | wc -w

This converts your DVI file to a text-only file and counts the words using 'wc'.

Answered by Bob on May 28, 2021

You can use the word count code from ConTeXt (lang-wrd.lua). I took the liberty of adapting it for Plain (it should work with the LaTeX format as well). The code is stripped of the more ConTeXt-specific features and relies on the character property definitions from char-def.lua. This way there’s no need for external tools and, as a bonus, you can insert the current word count wherever you like inside the document itself.

The usage example has some explanations.

\setwordthreshold{3} %%% min chars in a row to count as word
\startwordcount      %%% start callback
\input knuth\par     %%% counted
\currentwordcount    %%% => 94 with threshold == 3
\input knuth         %%% counted
\stopwordcount       %%% deregister callback
\input knuth         %%% not counted
\dumpwordcount       %%% => 188

Everything between \startwordcount and \stopwordcount is picked up; the rest is ignored, so you can manually exempt passages from being counted. The word threshold would have to be set to 1 for English.

Due to the nature of the pre_linebreak_filter you will get word counts only by paragraph, though.
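
To make the effect of the word threshold concrete, here is a minimal Python sketch. It is not the logic of lang-wrd.lua (which works on TeX's node lists); the function name `count_words` and the rule "a word is a run of letters at least `threshold` characters long" are assumptions made for illustration only.

```python
import re

def count_words(text: str, threshold: int = 1) -> int:
    """Count runs of letters that are at least `threshold` characters long.

    Hypothetical helper: a plain-text imitation of a minimum-length rule,
    not the actual ConTeXt implementation.
    """
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    runs = re.findall(r"[^\W\d_]+", text)
    return sum(1 for run in runs if len(run) >= threshold)

sample = "It is a truth universally acknowledged"
print(count_words(sample, threshold=1))  # every run of letters counts
print(count_words(sample, threshold=3))  # "It", "is" and "a" are dropped
```

With a threshold of 3, short words such as "a" or "is" are silently skipped, which is why a threshold of 1 is the sensible choice for English.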

Answered by Philipp Gesang on May 28, 2021

In general the answer is NO.

Nearly all requesters of word counts are not interested in the number of words but rather in the amount of space (pages) that the document will need when printed. If there are figures should the words in captions be counted without the space required by the illustration being taken into account? Are equations words, and if so is it one 'word' per variable/symbol or one 'word' per equation? If a paper consists of nothing more than title, author, a sentence and 100 math expressions is that about 50 or 500 'words'? Is a hyphenated word one or two? Does a document that mainly consists of 3 or 4 letter words compare equally with one that has a preponderance of 8 to 10 letter words?

I think that the traditional method is best: print the document, count the average number of 'words' per line in a typical page and multiply by the average number of lines per page and by the number of pages.
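
The arithmetic of that estimate can be sketched in a few lines of Python. The function names and the sample figures below are hypothetical; only the formula (words per line times lines per page times pages) comes from the method described above.

```python
def average_words_per_line(sample_lines: list[str]) -> float:
    """Average number of whitespace-separated words over some sample lines."""
    if not sample_lines:
        raise ValueError("need at least one sample line")
    return sum(len(line.split()) for line in sample_lines) / len(sample_lines)

def estimate_words(words_per_line: float, lines_per_page: float, pages: float) -> int:
    """Traditional estimate: words/line * lines/page * pages, rounded."""
    return round(words_per_line * lines_per_page * pages)

# Hypothetical figures: 12 words per line, 40 lines per page, 10 pages.
print(estimate_words(12, 40, 10))  # 4800
```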

It is highly unlikely that the recipient of your work will actually count the number of words.

Answered by Peter Wilson on May 28, 2021

The Texmaker integrated pdf viewer offers a word count feature since version 3.4.
Just right-click in the pdf document, then click Number of words in the document.

Answered by matth on May 28, 2021

Kile, the LaTeX editor for the KDE (Ubuntu) desktop, has a word count. It is under the statistics menu.

Answered by Magpie on May 28, 2021

I use texcount with the following parameters:

texcount file.tex -inc -incbib -sum -1

Output is simple like this:

9079

If you remove the -1, then you can get more information:

word count (#headers/#floats/#inlines/#displayed)
3996+48+99 (22/9/0/0) Included file: parts/blup.tex
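
If you want to post-process a line like the one above, a small Python sketch such as the following could split it into its parts. The function name `parse_brief_line` and the field names are made up for illustration; reading the three summands as words in text, headers and captions is an assumption about texcount's brief format, so check it against the documentation of your texcount version.

```python
import re

# Matches lines such as: 3996+48+99 (22/9/0/0) Included file: parts/blup.tex
BRIEF_LINE = re.compile(
    r"^(\d+)\+(\d+)\+(\d+)\s+\((\d+)/(\d+)/(\d+)/(\d+)\)\s*(.*)$"
)

def parse_brief_line(line: str) -> dict:
    """Split one brief texcount line into named fields (hypothetical helper)."""
    m = BRIEF_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a brief texcount line: {line!r}")
    *numbers, label = m.groups()
    # Assumed meaning of the three summands: text, header and caption words.
    text, headers, captions, n_headers, n_floats, n_inlines, n_displayed = map(int, numbers)
    return {
        "text_words": text,
        "header_words": headers,
        "caption_words": captions,
        "total_words": text + headers + captions,
        "headers": n_headers,
        "floats": n_floats,
        "inline_math": n_inlines,
        "displayed_math": n_displayed,
        "label": label,
    }

parsed = parse_brief_line("3996+48+99 (22/9/0/0) Included file: parts/blup.tex")
print(parsed["total_words"])  # 4143
```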

Answered by aphex on May 28, 2021

You can include the texcount results in the LaTeX document itself:

MWE

Note that this MWE requires the file to be named borra.tex (or modify the code accordingly).

% CAUTION !!!
% 1) Need --enable-write18 or --shell-escape 
% 2) This file MUST be saved 
%    as "borra.tex" before the compilation
%    in your working directory
% 3) This code will write wordcount.tex
%    and charcount.tex in /tmp of your disk.
%    (Windows users must change this path)
% 4) Do not compile if you are unsure
%    of what you are doing.

\documentclass{article}
\usepackage{moreverb} % for verbatim output

% Count of words

\immediate\write18{texcount -inc -incbib 
-sum borra.tex > /tmp/wordcount.tex}
\newcommand\wordcount{
\verbatiminput{/tmp/wordcount.tex}}

% Count of characters

\immediate\write18{texcount -char -freq
 borra.tex > /tmp/charcount.tex}
\newcommand\charcount{
\verbatiminput{/tmp/charcount.tex}}


\begin{document}


\section{Section: text example with a float}

Words and characters of this example file are 
automatically counted from the source file 
when compiled (therefore generated text as 
\textbackslash{}lipsum[1-10] is {\bfseries not} 
counted). The results are shown at the end 
of the compiled version.
Counts are made in headers, caption floats 
and normal text for the whole file. Subcounts 
for structured parts (sections, subsections, 
etc.) are also made. Number of headers, 
floats and math chunks are also counted. 

\begin{figure}[h]
\centering
\framebox{This is only an example float} 
\caption{This is an example caption}
\end{figure}

\subsection{Subsection: Little text with math chunks}

In line math: $\pi +2 = 2+\pi$    
Display math: \[\pi +2 = 2+\pi\] 

%TC:ignore  
\dotfill End of the example \dotfill 

\subsubsection*{Counts of words} 
\wordcount

%TC:endignore   

\end{document}

Answered by Fran on May 28, 2021

For Mac users, TeXShop (at least version 3.26) has a line, word and character count under Edit>Statistics. I never tested how well it works, but since TeXShop recognises syntax for colour-coding, I assume it is able to ignore most commands for the text.

Answered by Matthijs on May 28, 2021

Combining texcount + knitr + R allows for dynamic in-text word count estimation. The code chunk below works on a Mac by calling the texcount Perl script, grabbing the name of the current file (or running it on myfile.tex) and then returning a limited set of stats (the -total option), including the sum of all words (the -sum option). As noted elsewhere in this thread, you may want to adjust the texcount options to include things like the bibliography. Once the word count is extracted, a comma is added (if appropriate) and the result can then be referenced inline with the Sweave command \Sexpr{}.

The word count will always be for the second-to-last compile but compiling twice will solve that (much as with bibtex or table/figure references). I believe the code to call Perl from within R varies by platform so you may need to adjust the system() command below for non-Macs.

<<wordcount, echo = FALSE, cache = FALSE>>=
# adds comma for printing numbers, from scales package by Hadley Wickham
comma <- function (x, ...) {
  format(x, ..., big.mark = ",", scientific = FALSE, trim = TRUE)
} 

# To dynamically extract name of the current file, use code below 
file_name     <- current_input() # get name of file
file_name     <- strsplit(file_name,"\\.")[[1]][1] # extract name, drop extension
file_name_tex <- paste0(file_name, ".tex") # add .tex extension

system_call   <- paste0("system('texcount -inc -incbib -total -sum ", file_name_tex, "', intern=TRUE)") # paste together texcount system command  
texcount_out  <- eval(parse(text=system_call)) # run texcount on current last compiled .tex file

# Or, to manually write name of `myfile.tex`, uncomment and modify line below
# texcount_out <- system("texcount -total -sum myfile.tex", intern=TRUE) 

sum_row <- grep("Sum count", texcount_out, value=TRUE) # extract row
pattern <- "(\\d)+" # regex pattern for digits

count   <- regmatches(sum_row, regexpr(pattern, sum_row) ) # extract digits
count   <- comma(as.numeric(count)) # add comma
@

Word count: \Sexpr{count} % reference R variable in LaTeX prose

Answered by Omar Wasow on May 28, 2021

If you use the online tool ShareLatex then this now has a built in word count:

https://www.sharelatex.com/blog/2015/09/15/word-count.html

Answered by Peter C on May 28, 2021

In the specific case where Sublime Text is used for writing latex documents, one can use the package LaTeX Word Count.

Answered by sodiumnitrate on May 28, 2021

In addition to Philipp Gesang's answer, I'd like to mention how to use the spellchecker module in ConTeXt to count words. The example below is adapted from the Spellchecker wiki page.

The word count extracted in the wiki includes inline math and content set with \type, though. To get the word count per language without math and \type, you have to query categories.document.languages.en.total in the table stored in the words file.

\setupspellchecking[state=start,method=2]
\ctxlua{languages.words.threshold=1}

\starttext
\input knuth
\startformula
  x_{1,2} = \frac{-b\pm\sqrt{b^2-4ac}}{2a}
\stopformula
\input ward
\m{E = m c^2}

\startluacode
local wordfile = "\jobname.words"
if file.is_readable(wordfile) then
    local data = dofile(wordfile)
    context.startitemize({"packed"})
    context.item("Total words (including inline math): " .. data.total)
    context.item("Total words (in language \\type{en}): "
                 .. data.categories.document.languages.en.total)
    context.item("Total unique words (in language \\type{en}): "
                 .. data.categories.document.languages.en.unique)
    context.stopitemize()
end
\stopluacode

\stoptext

Answered by Henri Menke on May 28, 2021

If you are using Overleaf, you can click the word count button, which shows word count statistics for the document.

You can also easily import an existing document.

Answered by Fre_d on May 28, 2021

In bash, try:

detex file.tex | wc -w

The first command detex strips latex commands/comments from the file. The output of that is piped to wc -w, which counts the number of words.
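
If detex is not available, the same two-step idea (strip the markup, then count whitespace-separated tokens) can be roughly imitated in Python. This is only a sketch: the function name `rough_word_count` is made up, and the regular expressions are a crude approximation, not detex's actual rules, so the numbers will differ from `detex | wc -w` on real documents.

```python
import re

def rough_word_count(tex_source: str) -> int:
    """Crude stand-in for `detex file.tex | wc -w` (not detex's real algorithm)."""
    # Drop comments: an unescaped % up to the end of the line.
    text = re.sub(r"(?<!\\)%.*", "", tex_source)
    # Drop simple inline math $...$ (display math and nesting are not handled).
    text = re.sub(r"\$[^$]*\$", " ", text)
    # Drop control words such as \section or \emph but keep their arguments.
    text = re.sub(r"\\[A-Za-z]+\*?", " ", text)
    # Drop the braces and brackets left over from grouping and arguments.
    text = re.sub(r"[{}\[\]]", " ", text)
    # Count whitespace-separated tokens, as `wc -w` does.
    return len(text.split())

sample = r"""
\section{Intro} % a comment
Some \emph{important} text with $x^2$ maths.
"""
print(rough_word_count(sample))  # 6
```

Note that this sketch counts the words inside every argument, including ones like `\label{...}` or `\begin{figure}` that are not prose, which is exactly the kind of case where a dedicated tool does better.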

Answered by innisfree on May 28, 2021

The prior answers are (I believe) more than adequate for the original question. But for the benefit of others who find this via search, I would like to provide more information.

"Word count" can mean many things. It is not necessarily determined by looking for word boundaries (space and return).

One widely-used measure, at least for U.S. English, is to visualize an old-fashioned typewriter, where each keystroke generates a character (including quote, period, comma, and space). Carriage return is also a character. Then, take the number of characters, and divide by six. This assumes an average word length (in U.S. English) of five letters, plus a space.

The above definition is useful for estimating how many pages will be used in a lengthy, printed book or manuscript. Of course, if you are preparing a PDF with TeX, you know exactly how many pages it uses.
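
As a small illustration of the characters/6 measure, here is a Python sketch that compares it with a plain whitespace-based count. The function name `typewriter_word_count` is hypothetical, and the sketch assumes plain text in which every character, including spaces and line breaks, is one keystroke.

```python
def typewriter_word_count(text: str) -> float:
    """Characters divided by six: five letters plus one space per 'word'."""
    return len(text) / 6

text = "The quick brown fox jumps over the lazy dog.\n"
print(typewriter_word_count(text))  # 45 characters / 6 = 7.5
print(len(text.split()))            # 9 words by whitespace boundaries
```

For a single short sentence the two measures disagree noticeably; the characters/6 figure is meant for long stretches of flowing text, where word lengths average out.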

Note that this criterion is not useful for academic papers containing illustrations, tables, and images.

I do not know whether MS Word counts word boundaries, or characters/6. In theory, the result should be almost the same, for lengthy flowing text (U.S. English).

I recently wrote a book, for which the page count measured by characters/6 was 220. The actual page count, using TeX with 5.5"x8.5" layout, was 240 pages including blanks. Not a bad estimate.

You may ask: In the case of a term paper, why not specify number of pages instead of word count? The obvious answer is that the number of pages can be gamed using different fonts, font sizes, or leading.

Answered by user139954 on May 28, 2021

TeXstudio offers an advanced word count. It is located in the menus under Tools --> Word Analysis.

It refers to words as 'phrases' and offers different options and filters. It can also do a word count on a specific selection.

I have compared the output to MS Word and LibreOffice Writer, and they are mostly the same. The advantage of TeXstudio is that by default it will not count the table of contents and bibliography in the total word ('phrase') count. That makes it really convenient to get a reliable estimate on the go as one is editing the document.

Answered by Niko Z. on May 28, 2021

Here is a quick and simple way to include a word count in a LaTeX document with TeXcount:

  1. Download and extract TeXcount into the directory of the document
  2. Make sure Perl is installed and accessible via the perl command (should already be the case on Linux, you can get Perl from here for all OSes)
  3. Copy and paste this where you want the word count to appear: Word count: \input{|"perl DOCUMENT_FULL_PATH/texcount.pl -brief -sum -total DOCUMENT_FULL_PATH/mytexfile.tex"}
  4. Make sure to run your LaTeX engine with the option --shell-escape (or --tex-option=--shell-escape if you use TeXworks/MiKTeX/texify)

Done!

Answered by GuiTeK on May 28, 2021

If you happen to have the wonderful document conversion package pandoc installed:

pandoc -f latex -t plain main.tex | wc -w

Note: This works with any other format pandoc supports, not just LaTeX.

Answered by Mahomet on May 28, 2021

My solution to this problem was a bit of a workaround, but I wanted to share it, because somebody might have an interest in selectively defining what should be counted in the word count.

I've put all content that should not be counted, such as tables and footnotes, into a command. Then I can comment it out to get a document of only the content that should be counted. To do this, define this before the document starts:

\newif\iftables\newcommand{\tables}[1]{\iftables#1\fi}\tablestrue

You can change the word 'tables' to anything else. Then I would put all my tables into a \tables{} command. When I wanted to count the words, I could switch to \tablesfalse to compile a document without them. Then I could just copy-paste the entire pdf content into a word processor to word-count it. It's a bit cumbersome to add, but I can manually decide what I want counted or not.

Answered by JohnBig on May 28, 2021

As many answers pointed out, an accurate word count is nearly impossible. Atom has a LaTeX word count package that is aware of that difficulty, and I believe it works fine.

Answered by Luis Turcio on May 28, 2021
