
Expl3: It seems the TeXhackers note about \tl_set_rescan:Nnn in interface3.pdf is incorrect - Do I misunderstand that note?

TeX - LaTeX Asked on January 15, 2021

interface3.pdf says:

\tl_set_rescan:Nnn ⟨tl var⟩ {⟨setup⟩} {⟨tokens⟩}

Sets ⟨tl var⟩ to contain ⟨tokens⟩, applying the category code régime specified in the ⟨setup⟩ before carrying out the assignment. (Category codes applied to tokens not explicitly covered by the ⟨setup⟩ are those in force at the point of use of \tl_set_rescan:Nnn.) This allows the ⟨tl var⟩ to contain material with category codes other than those that apply when ⟨tokens⟩ are absorbed. The ⟨setup⟩ is run within a group and may contain any valid input, although only changes in category codes, such as uses of \cctab_select:N, are relevant. See also \tl_rescan:nn.

TeXhackers note: The ⟨tokens⟩ are first turned into a string (using \tl_to_str:n). If the string contains one or more characters with character code \newlinechar (set equal to \endlinechar unless that is equal to 32, before the user ⟨setup⟩), then it is split into lines at these characters, then read as if reading multiple lines from a file, ignoring spaces (catcode 10) at the beginning and spaces and tabs (character code 32 or 9) at the end of every line. Otherwise, spaces (and tabs) are retained at both ends of the single-line string, as if it appeared in the middle of a line read from a file.

I think the TeXhackers note does not correctly describe the behavior of \tl_set_rescan:Nnn:

\ExplSyntaxOn
\endlinechar=13\relax
\catcode\endlinechar=12\relax%
\tl_new:N {\__my_tl}%
\tl_set_rescan:Nnn \__my_tl {\catcode`\ =12}{~A~
~B~
~C~
}%
\edef\test{\tl_use:N \__my_tl}%
\show\test%
\stop

(Be aware that under \ExplSyntaxOn, where ~ has category code 10 (space), a ~ yields an explicit space token (category code 10, character code 32) if the reading apparatus is in state M, and is ignored if the reading apparatus is in state S or N. Unlike space characters at the ends of lines, a ~ at the end of a line is not removed when TeX’s eyes preprocess the line.)
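
A small check of the tokenizer behavior described in the remark above. This is only a sketch and not part of the original test file: \l_tmpa_tl and \l_tmpb_tl are expl3's scratch variables, and the two messages are made up for illustration. The same material is stored once spread over two lines and once on a single line without the line-initial ~. If a ~ met in state N is indeed skipped, both variables should hold the same tokens.

```latex
\ExplSyntaxOn
% First ~ follows the opening brace (state M): becomes a space token.
% ~ after A (state M): becomes a space token.
% ~ at the start of the second line (state N): expected to be skipped.
\tl_set:Nn \l_tmpa_tl {~A~
~B}
% Reference version typed on one line, without the line-initial ~.
\tl_set:Nn \l_tmpb_tl {~A~B}
\tl_if_eq:NNTF \l_tmpa_tl \l_tmpb_tl
  { \iow_term:n { same~tokens } }
  { \iow_term:n { different~tokens } }
\stop
```

The \endlinechar and the category code of ^^M are deliberately left at their \ExplSyntaxOn defaults here, so that only the effect of ~ in the different reader states is visible.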

In this example the ⟨tokens⟩ are:

⟨explicit space token of catcode 10⟩
A (catcode 11)
⟨explicit space token of catcode 10⟩
⟨explicit return-character of catcode 12⟩
B (catcode 11)
⟨explicit space token of catcode 10⟩
⟨explicit return-character of catcode 12⟩
C (catcode 11)
⟨explicit space token of catcode 10⟩
⟨explicit return-character of catcode 12⟩
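
To check this list against what TeX actually hands over as ⟨tokens⟩, the argument can be passed to \tl_analysis_show:n, which prints every token together with its meaning. This is a sketch under the same \endlinechar and category code settings as the test file. \tl_analysis_show:n is an expl3 function that the original example does not use.

```latex
\ExplSyntaxOn
\endlinechar=13\relax
\catcode\endlinechar=12\relax%
% Same braced argument as in the test file, but only inspected, not rescanned.
\tl_analysis_show:n {~A~
~B~
~C~
}%
\stop
```

Any ~ that was dropped while the argument was being gathered cannot appear in this listing, so the listing shows directly which spaces reach \tl_set_rescan:Nnn at all.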

Applying \detokenize/\tl_to_str:n yields the following "string":

⟨explicit space token of catcode 10⟩
A (catcode 12)
⟨explicit space token of catcode 10⟩
⟨explicit return-character of catcode 12⟩
B (catcode 12)
⟨explicit space token of catcode 10⟩
⟨explicit return-character of catcode 12⟩
C (catcode 12)
⟨explicit space token of catcode 10⟩
⟨explicit return-character of catcode 12⟩
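
The effect of the stringification step alone can be inspected in the same way. The following sketch first expands \tl_to_str:n and then analyses the result. The short argument ~A~ is a stand-in for the full ⟨tokens⟩, and \exp_args:Nx together with \tl_analysis_show:n are additions for illustration only.

```latex
\ExplSyntaxOn
% x-expand \tl_to_str:n first, then show the category codes of the result.
\exp_args:Nx \tl_analysis_show:n { \tl_to_str:n {~A~} }
\stop
```

The listing should report A as a character of category code 12 rather than as a letter, while the two spaces remain space tokens of category code 10.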

The space at the beginning of the first line, before A, is not ignored. I get:

$ pdflatex --enable-write18 test.tex
This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) (preloaded format=pdflatex)
 \write18 enabled.
entering extended mode
(./test.tex
LaTeX2e <2020-10-01> patch level 2
L3 programming layer <2020-12-07> xparse <2020-03-03>
> \test=macro:
-> A^^MB^^MC^^M.
l.12 \show\test

If the space were ignored, the line -> A^^MB^^MC^^M. (which contains a space after ->) would read ->A^^MB^^MC^^M. (without a space after ->).

So, contrary to the TeXhackers note, I suppose that spaces (catcode 10) at the beginnings of the lines of a multi-line string are not ignored when the string that results from applying \tl_to_str:n is read.

Instead, I suppose that spaces at the beginnings of lines are ignored when the arguments of \tl_set_rescan:Nnn are gathered, and only if they are gathered by reading and tokenizing .tex input. In that case they are ignored because TeX’s reading apparatus is in state N (new line) when it encounters them. The very first space is different: when it is read, the reading apparatus is in state M (middle of line), because that space is not at the beginning of a line of .tex input but follows the opening curly brace that starts the ⟨tokens⟩ argument.

This supposition is supported by the fact that spaces at the beginnings of the lines of a multi-line string are not ignored at all if the multi-line string comes into being in a way where no switch to state N occurs while the arguments of \tl_set_rescan:Nnn are gathered.

The following example yields a multi-line string in which none of the spaces at the beginning of a line is ignored, while all spaces at the ends of lines are ignored (because spaces at the ends of lines are removed when TeX’s eyes preprocess the lines while \scantokens is carried out):

\ExplSyntaxOn
\endlinechar=13\relax
\catcode\endlinechar=12\relax%
\tl_new:N {\__my_tl}%
\tl_set_rescan:Nnn \__my_tl {\catcode`\ =12}{~A~^^M~B~^^M~C~^^M}%
\edef\test{\tl_use:N \__my_tl}%
\show\test%
\stop

Console-output:

This is pdfTeX, Version 3.14159265-2.6-1.40.21 (TeX Live 2020) (preloaded format=pdflatex)
 \write18 enabled.
entering extended mode
(./test.tex
LaTeX2e <2020-10-01> patch level 2
L3 programming layer <2020-12-07> xparse <2020-03-03>
> \test=macro:
-> A^^M B^^M C^^M.
l.7 \show\test

The point is:

In interface3.pdf it is suggested that \tl_set_rescan:Nnn is applied to ⟨tokens⟩.

The TeXhackers note suggests that, during the processing of ⟨tokens⟩, spaces at the beginnings of the lines of multi-line strings are ignored.

But in my first example these "spaces" are not tokens. They are characters in the .tex input file. Because TeX’s reading apparatus is in state N (new line), they are ignored when the arguments of \tl_set_rescan:Nnn are gathered. They never make it into the ⟨tokens⟩ argument.

As these spaces never make it into ⟨tokens⟩, the idea of ignoring them during the processing of ⟨tokens⟩ is void.

Questions:

Did I misunderstand the TeXhackers note?

Is the TeXhackers note really incorrect/misleading?
