TransWikia.com

How to count all characters per page, excluding footnotes?

TeX - LaTeX Asked by LORENZO on August 16, 2021

I'm looking for a method to count all characters on a per-page basis. I would like all characters in the body of the page to be counted, including white space and punctuation, but excluding footnotes.

2 Answers

I'm afraid I don't have a solution to the problem, but here are at least some thoughts and hints that might be of some help.

There are a number of options for counting characters outlined in an older question, "How to count all characters including spaces?". Unfortunately, these do not count per page, and you would have to check how well they handle spaces and punctuation.

My best guess would be to convert the PDF to text and then count the characters in the text, i.e. the accepted answer to the question I linked to above. You would then need some way of splitting the PDF into pages first, and then process each page separately. Which commands you have available depends on your operating system: maybe someone else can help with that.
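As a rough sketch of the PDF-to-text route, the script below assumes the `pdftotext` command-line tool (from Poppler or Xpdf) is installed. That tool separates pages in its output with a form feed character, so the text can be split per page without splitting the PDF itself. The function names are made up for illustration, and collapsing every run of whitespace (including line breaks) into a single space is my own assumption about what should count as one character. Footnotes are not excluded here.

```python
import subprocess
import sys


def extract_text(pdf_path: str) -> str:
    """Return the text of the whole PDF, using pdftotext (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],  # "-" sends the text to stdout
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def count_chars_per_page(text: str) -> list[int]:
    """Count characters per page; pages are separated by form feeds."""
    pages = text.split("\f")
    # The last page is also followed by a form feed, leaving an empty tail.
    if pages and pages[-1].strip() == "":
        pages.pop()
    # Assumption: each run of whitespace counts as a single space.
    return [len(" ".join(page.split())) for page in pages]


if __name__ == "__main__" and len(sys.argv) == 2:
    counts = count_chars_per_page(extract_text(sys.argv[1]))
    for number, count in enumerate(counts, start=1):
        print(f"page {number}: {count} characters")
```

Saved under a name of your choice, it would be run with the PDF path as its only argument. The counts depend on how well the text extraction reproduces spaces, so treat them as approximate.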

Unfortunately, I'm not sure how to get it to exclude the footnotes using this approach. You might typeset the document without the footnotes, e.g. by redefining the footnote macros to exclude them from the output, and run the count on that.

A more manual approach could be to open the PDF, and then copy-paste the text (not including the footnotes) into a text editor and do the character count there. Not elegant, and tedious if the document is large, but it should work without any special tools.

Tools that operate directly on the .tex files, like TeXcount, do not run the actual typesetting and so have no idea where the page breaks are, so my guess is you have to do the processing on the PDF file. However, you might still try some of them on the whole document to get a total count for comparison. By default, TeXcount counts headers, main text, footnotes, etc. separately.

Beware that TeXcount, when counting characters (option -char), does not include spaces, and in order to include punctuation you must use the option -all-nonspace-char. You can still get a rough estimate of the total character count: first count the number of characters (letters in words) including punctuation, then count the number of words, and add the two together. The number of words gives a fair estimate of the number of spaces.
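To illustrate the estimate, here is a small sketch with a made-up sample sentence and a hypothetical helper name. It compares "non-space characters plus words" with the true length of the string; in practice the two inputs would come from the TeXcount runs instead.

```python
def estimate_total_chars(nonspace_chars: int, words: int) -> int:
    """Estimate the total character count, using the word count as a stand-in for spaces."""
    return nonspace_chars + words


sample = "Hello, world! This is TeX."  # made-up example text
nonspace = sum(1 for ch in sample if not ch.isspace())  # letters and punctuation
words = len(sample.split())

print("estimate:", estimate_total_chars(nonspace, words))
print("actual:  ", len(sample))
```

For a single stretch of text the estimate overshoots by one, since n words are separated by n - 1 spaces.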

Answered by Einar Rødland on August 16, 2021

Not an answer, but many years ago I wrote a short Lua program that would count the number of lines, the number of words, and list all the occurrences of characters (e.g. the number of a's, the number of A's, etc.). This was described in TUGboat and can be seen at https://tug.org/TUGboat/tb31-1/tb97glister.pdf

You, or someone else, might be able to modify it to suit your needs.

Answered by Peter Wilson on August 16, 2021
