TransWikia.com

How does \verb detect spaces that shouldn't exist?

TeX - LaTeX Asked on June 5, 2021

Consider the following MWE:

\documentclass{article}

\usepackage{listings}
\lstset{basicstyle=\ttfamily}

\begin{document}

\lstinline |asdf|asdf asdfasdf

\verb |asdf|asdf asdfasdf

\end{document}

My understanding of what to expect here has always been the following (let \cmd stand for either \verb or \lstinline):

  • When TeX first tokenizes \cmd |, it gobbles the space following it, leaving only the token \cmd in its "mouth" (and | behind it in the input stream).
  • It then expands \cmd, which leads to a series of category code changes, basically making every otherwise special character an "other" character, followed by some macro that looks at the next token (in this case, |).
  • This macro then grabs everything up to the next occurrence of that token (which is only tokenized at that point), applies some formatting and changes the category codes back.

Notably, the space following \cmd is gobbled during that control sequence’s tokenization, i.e. before any category codes are changed.

With this understanding, I would expect both of the lines above to typeset

asdfasdf asdfasdf

But I get the following output:

MWE output

\lstinline behaves as expected, but \verb somehow knows about the space following it.

How? To my knowledge, there shouldn’t ever have been a space token behind the \verb token.

2 Answers

At the very beginning you said:

When TeX first tokenizes \cmd |

but that's wrong. TeX is a well-behaved gentleman and doesn't get ahead of itself by scanning a ␣ and a | before knowing what \cmd is supposed to do. As far as TeX is concerned, the space, the |, and any other character could all mean the same thing, and their meaning could change, so pre-scanning would only cause confusion.

When TeX sees \cmd, the only “special” thing it does to blank spaces is to set state:=skip_blanks, so that, for example, \TeX code typesets the TeX logo immediately followed by “code”, ignoring the spaces after the control sequence as usual. You can check for yourself with:

\def\test{\catcode`\ =12 \testx}
\def\testx{\futurelet\token\testy}
\def\testy{\show\token\afterassignment\testx\let\token = }
\test     x

and you'll see that \show reports “the character” (a space) five times before it reports “the letter x”.
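For contrast, here is a minimal plain TeX sketch of the same idea that compares the two cases side by side. The names \peek, \report, \peekother and \reportother are made up for this illustration. The only difference between the two calls is whether the category code of the space is changed after the control word has been tokenized but before the following characters are read.

```latex
% Plain TeX sketch; all macro names here are hypothetical.
\def\peek{\futurelet\next\report}
\def\report{\show\next}

% Spaces still have catcode 10 when the rest of the line is read,
% so they are skipped after the control word: \show should report
% the letter x.
\peek     x

% Here the catcode of the space is changed (locally, inside a group)
% while \peekother is being executed, that is, before the characters
% after it are tokenized. The first space is then no longer skipped:
% \show should report a character instead of the letter x.
\def\peekother{\begingroup\catcode`\ =12 \futurelet\next\reportother}
\def\reportother{\show\next\endgroup}
\peekother     x

\bye
```

Run it with plain `tex` and press return at each \show prompt. The catcode-12 space from the second call is left in the input and gets typeset as whatever glyph sits in slot 32 of the current font, which is harmless for this test.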


Now back to the problem at hand: update your LaTeX :-)

The old behaviour of \verb was to look at the next token, whichever it happened to be, and use that as a delimiter (with the exception of {). This has now been fixed in the 2020-10-01 LaTeX release (from LaTeX News Issue 32):

Avoid problematic spaces after \verb
If a user typed \verb␣!~!␣foo instead of \verb!~!␣foo by mistake, then surprisingly the result was “!~!foo” without any warning or error. What happened was that the ␣ became the argument delimiter due to the rather complex processing done by \verb to render verbatim. This has been fixed and spaces directly following the command \verb or \verb* are now ignored as elsewhere. (github issue 327)
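To see which behaviour a given installation has, a small test document along these lines should do. It is only a sketch that reuses the line from the question. \fmtversion is the kernel macro that holds the release date of the format in use.

```latex
\documentclass{article}
\begin{document}

% \fmtversion expands to the release date of the LaTeX format in use.
LaTeX format: \texttt{\fmtversion}

% 2020-10-01 or later: the space is ignored and | is the delimiter.
% Earlier releases: the space itself becomes the delimiter.
\verb |asdf|asdf asdfasdf

\end{document}
```

With a format dated 2020-10-01 or later, only the first asdf should come out in typewriter type, followed by "asdf asdfasdf" in the normal font. With an older format, |asdf|asdf should be the verbatim part, immediately followed by asdfasdf.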

Answered by Phelype Oleinik on June 5, 2021

I believe what happens is as follows:

  • \verb is first tokenized (the space character, which has catcode 10 just before \verb is tokenized, marks the end of this control word but is not discarded).

  • TeX will go into state S, since \verb is a control word (a control sequence whose name is made of “letters” only), but it doesn't skip blanks yet.

  • \verb is expanded and the code from its expansion is executed. This code first gives spaces catcode 12 (via \let\do\@makeother \dospecials); this is important.

  • At the end of \verb's replacement text, there is \@ifstar\@sverb\@verb. This \@ifstar looks ahead in the input, thus state S kicks in. Since spaces have catcode 12 at this point, the space character following \verb is not skipped. It gets tokenized with catcode 12.

  • Since we used the non-starred form of \verb and \@verb is defined as \def\@verb{\@vobeyspaces \frenchspacing \@sverb}, spaces are now made active, and \@sverb is expanded (so the end delimiter will be a catcode-13 space, while the start delimiter was a catcode-12 space).

  • \@sverb grabs the catcode-12 space token as its only argument and defines active spaces to be \let-equal to \verb@egroup (if \verb* had been used, \@sverb would have done \@setupverbvisiblespace \@vobeyspaces too; thus, spaces end up active in all cases). This is how the verbatim text will end in non-erroneous conditions: \verb@egroup will yield \egroup, which will terminate the group started by \verb (there is a \bgroup in \verb's replacement text). Since the special catcode setup has been done locally inside this group, this terminates the special catcode setup.

Thus, the sentence from the question “This macro then grabs everything up to the next occurrence of that token” is not really correct: there is no grabbing of the verbatim contents as an argument. Tokens between the start and the end delimiters are simply processed as catcode-12 tokens, except space tokens, which are always active at the end of \@sverb, as we've seen.

Note: as Phelype Oleinik pointed out, the behavior of \verb was changed in the LaTeX format from 2020-10-01. My comments here are based on LaTeX2e <2020-02-02> patch level 5.
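The mechanism can be imitated with a stripped-down macro. The following is only a sketch, not the kernel code: \toyverb and \toy@grab are hypothetical names, and everything the real \verb does about \@ifstar, \@vobeyspaces, end-of-line errors, ligatures and hyphenation is left out. It keeps just the three ingredients described above: a group, \dospecials with \@makeother, and an active delimiter that is \let-equal to \egroup.

```latex
\documentclass{article}
\makeatletter
% \toyverb: hypothetical, much simplified imitation of the old \verb.
\newcommand\toyverb{%
  \leavevmode
  \bgroup                          % keeps the catcode changes local
    \let\do\@makeother \dospecials % specials, including the space, get catcode 12
    \ttfamily
    \toy@grab}
% \toy@grab takes the very next token as the delimiter. A catcode-12
% space is not skipped when TeX looks for an undelimited argument.
\def\toy@grab#1{%
  \catcode`#1\active               % make the delimiter character active
  \lccode`\~`#1%
  \lowercase{\let~\egroup}}        % the active delimiter now closes the group
\makeatother

\begin{document}

% No space after the command: | is the delimiter.
\toyverb|a_b|c d

% Space after the command: the space is the delimiter.
\toyverb |a_b|c d

\end{document}
```

In the first line only a_b should be set verbatim. In the second line the verbatim part should be |a_b|c, ended by the next space. In both cases nothing is grabbed as an argument except the delimiter itself: the contents are simply typeset as catcode-12 tokens until the active delimiter closes the group.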

Answered by frougon on June 5, 2021
