TeX - LaTeX Asked by Michaël Le Barbier on December 24, 2020
Assume that a token register contains a sequence of letter tokens, each having a catcode
of 12. I want to store in a second token register the same sequence of letter token, altering some catcodes
on the way (like changing all #
so that they have catcode 6).
I see three ways to do this:
Iterating over the sequence of tokens with futurelet
and do the replacement. This method has the inconvenient that adding tokens at the end of a token register involves a lot of expansions and parsing but the advantage that it can be generalised.
Using a macro with an argument template (like what we do to see if a token is in a sequence) to quickly parse the token register. It has the drawback that it is maybe not easy to generalise.
Write the tokens in some auxiliary file, reconfigure the lexer and reread these tokens. It is ugly.
Is there an easier way to go? (In plain TeX, no Lua, no Perl, no OCaml…)
I took this question as a learning opportunity for me, more than an actual attempt to satisfy a question for you, and so my answer is not yet a generalized approach (though I think it could get there with a little work). I would say this method falls in your category #2, "Using a macro with an argument template (like what we do to see if a token is in a sequence) to quickly parse the token register."
I didn't know how to test whether I was successful with manipulating a #
symbol's catcode, so I took a less challenging problem, the underscore _
, knowing that I could throw it between some dollar signs and immediately know whether I was successful in changing the catcode.
The solution was EDITED so that the iterationengine
is now recursive, and will continue until all catcode12 underscores have been replaced with catcode8 underscores. Originally, I had to invoke the iteration engine successively, by hand.
In the MWE below, after setting up my macro, I test it on four separate strings, containing 0, 1, 2, and 3 underscores.
I haven't been able to confirm it, but the detector routine (detectus
) may fail if it encounters a catcode12 underscore followed by a normal catcoded testchar
(in this case, relax
).
documentclass{article}
usepackage[T1]{fontenc}
deftestchar{relax}
defusEight{_}
catcode`_=12
defdetectus#1_#2relax{%
defDetected{#2testchar}iftestcharDetectedelsedefDetected{T}fi}
defchangeus#1_#2relax{#1usEight#2}
makeatletterdefiterationengine{%
if TDetected%
expandafterdetectuscatEightTok_relax%
if TDetectedprotected@edefcatEightTok{expandafterchangeuscatEightTokrelax}fi%
iterationengine%
fi%
}makeatother
catcode`_=8
defiteration{letcatEightTokcatTwelveTokdefDetected{T}iterationengine}
longdefruntest{%
makebox[3cm][l]{$rawtok$} raw token in mathpar
edefcatTwelveTok{detokenizeexpandafter{rawtok}}
makebox[3cm][l]{catTwelveTok} detokenized raw token (cat12)par
makebox[3cm][l]{$catTwelveTok$} cat12 token in mathpar
iteration
makebox[3cm][l]{detokenizeexpandafter{catEightTok}}
detokenized converted token (cat 8)par
makebox[3cm][l]{$catEightTok$} converted token, in math
}
begin{document}
vspace{1ex}defrawtok{ab}runtestpar
vspace{1ex}defrawtok{a_b}runtestpar
vspace{1ex}defrawtok{a_b c_d}runtestpar
vspace{1ex}defrawtok{a_b c_d e_f}runtestpar
end{document}
Answered by Steven B. Segletes on December 24, 2020
This is a Plain TeX (or any other format) general purpose macro scanthechars
which accepts on input a string of character tokens of catcode 12
(1) and puts them in the same order but with transformed catcodes in a token list register called replacetoks
.
(1) this update adds two string
's to the @@scanthechars
macro, and as a result the input is not anymore restricted to be with catcode 12
tokens only. It may even contain control sequences (but not braced material) which will go through unarmed, and one may convert back and forth between various catcode regimes -- see updated image.
The interface to specify the catcodes is via a macro replacesetup
which is fetched with a comma separated list of things like !{3}
which means to transform the character token !
into a catcode 3
character token !
.
The illustrative tests use detokenize
for simplicity sake, to produce easily catcode 12
tokens, or verbatim output, hence the code needs etex
or pdftex
for compilation. But the macros themselves are strictly Knuthian.
Only catcodes of 3
, 4
, 6
, 7
, 8
, 11
, 12
, 13
are dealt with.
Each use of scanthechars
resets replacesetup
: next call to scanthechars
must follow a renewed replacesetup
. Since now the input is not restricted to be only with catcode 12
tokens, one may use scanthechars
many times on the same string. See the code and image for how this is done.
The code:
catcode`@ 11
longdef@gobble #1{}
longdef@firstoftwo #1#2{#1}
longdef@secondoftwo #1#2{#2}
newtoksreplacetoks
defreplacesetup #1{%
% this assumes non nil escapechar
defreplace@list {}%
defreplace@do ##1##2%
{expandafterdefcsname replace@setup@expandafter
@gobblestring##1endcsname {##2}}%
replace@setup #1,relaxrelax,%
}
defreplace@setup #1#2,{%
ifx#1relaxreplace@list
else
expandafterdefexpandafterreplace@listexpandafter
{replace@list replace@do #1{#2}}%
expandafterreplace@setup
fi
}
defscanthechars #1{replacetoks{}%
edefreplace@restore{lccode`$=thelccode`$
lccode`^=thelccode`^
lccode`_=thelccode`_
lccode`&=thelccode`&
lccode`noexpand#=thelccode`#
lccode`a=`a lccode`(=thelccode`(
lccode`noexpand~=thelccode`~relax
}%
@scanthechars #1relax}
def@scanthechars #1{ifx #1relaxexpandafter@firstoftwo
elseexpandafter@secondoftwo
fi
{defreplace@do ##1##2{expandafter
letcsname replace@setup@expandafter
@gobblestring##1endcsnamerelax}%
replace@list % resets to relax everything
replace@restore}%
{@@scanthechars #1}%
}
def@@scanthechars #1{expandafter
ifxcsname replace@setup@string#1endcsnamerelax
replacetoksexpandafter{thereplacetoks #1}%
else
ifcasecsname replace@setup@string#1endcsnamerelax
ororor % catcode=3
lccode`$=`#1 %$
lowercase
{replacetoksexpandafter{thereplacetoks $}}%$
or % catcode=4
lccode`&=`#1
lowercase
{replacetoksexpandafter{thereplacetoks &}}%
oror % 6
lccode`#=`#1
lowercase
{replacetoksexpandafter{thereplacetoks ##}}%
or % 7
lccode`^=`#1
lowercase
{replacetoksexpandafter{thereplacetoks ^}}%
or % 8
lccode`_=`#1
lowercase
{replacetoksexpandafter{thereplacetoks _}}%
ororor % 11
lccode`a=`#1
lowercase
{replacetoksexpandafter{thereplacetoks a}}%
or % 12
lccode`(=`#1
lowercase
{replacetoksexpandafter{thereplacetoks (}}%
or % 13
lccode`~=`#1
lowercase
{replacetoksexpandafter{thereplacetoks ~}}%
fi
fi
@scanthechars
}
catcode`@ 12
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% utility macro to get the catcode of a character token
% manages only catcodes 3,4,6,7,8,11,12,13
defTheCatcode #1{ifcatnoexpand#1$3else% $
ifcatnoexpand#1&4else
ifcatnoexpand#1##6else
ifcatnoexpand#1^7else
ifcatnoexpand#1_8else
ifcatnoexpand#1a11else
ifcatnoexpand#1(12else
ifcatnoexpand#1noexpand~13else not handled
fifififififififi }
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
tt
Test I.
replacesetup{#{3},!{7},?{8}}
detokenize{replacesetup{#{3},!{7},?{8}}}
edefx{string#a!b?cstring#}
x
expandafterscanthecharsexpandafter{x}
thereplacetoks
begingroup
catcode`#=3
catcode`!=7
catcode`?=8
gdefz{#a!b?c#}
endgroup
thereplacetoks
=z
edefy{thereplacetoks}
ifxzy OKelse WRONGfi
bigskip
Test II.
replacesetup{a{11},b{11},c{11},{3}}
detokenize{replacesetup{a{11},b{11},c{11},{3}}}
catcode`@ 11
edefbackslashchar{expandafter@gobblestring}
catcode`@ 12
% creates an x with catcode 12 a, b, c
edefx{stringabcbackslashchar}
x
expandafterscanthecharsexpandafter{x}
edefy{thereplacetoks}
begingroup
catcode`|=0
catcode`=3
|gdef|z{abc}
|endgroup
y=z
ifxzy OKelse WRONGfi
bigskip
Test III.
replacesetup{a{3},b{7},c{8},d{12}}
detokenize{replacesetup{a{3},b{7},c{8},d{12}}}
expandafterscanthecharsexpandafter {detokenize{abcd}}% convert to catcode 12
edefy {thereplacetoks}
defPrintTheCatcode #1{string#1 with catcode TheCatcode #1par }
% previously, we used xinttools.sty:
% xintApplyInline{PrintTheCatcode}{y}
defPrintTheCatcodes #1{ifx#1relaxelsePrintTheCatcode#1expandafter
PrintTheCatcodesfi }
expandafterPrintTheCatcodesyrelax
% replacesetup must be redone each time
replacesetup{a{3},b{7},c{8},d{12}}
expandafterscanthecharsexpandafter {detokenize{adbdcda}}
detokenize{adbdcda}->thereplacetoks
bigskip
Test IV.
overfullrule 0pt
replacesetup {!{6}}
scanthechars {!}
expandaftermeaningthereplacetoks
edefy{thereplacetoks}
begingroup
catcode`!6
gdefz{!!}
endgroup
ifxzy OKelse WRONGfi
bigskip
TEST V.
From now on we don't require the input to be with catcode 12 tokens.
replacesetup{#{3}, _{11}, ^{11}, !{4}, ?{7}, &{8}, @{6}}
detokenize{replacesetup{#{3}, _{11}, ^{11}, !{4}, ?{7}, &{8}, @{6}}}
scanthechars {tabskip1ex #mathstrut @#hfil!hfil #@#hfil!hfil#@#cr
noalignbgrouphruleegroup
1!23!456cr 7890!a!bccr
noalignbgrouphruleegroup }
detokenize{scanthechars {tabskip1ex #mathstrut @#hfil!hfil #@#hfil!hfil#@#cr
noalignbgrouphruleegroup
1!23!456cr 7890!a!bccr
noalignbgrouphruleegroup }
}
detokenize{halign{spanthereplacetoks}}
output:
medskip
halign{spanthereplacetoks}
% expandafterPrintTheCatcodesthereplacetoksrelax
% attention _ is mathematically active!
% in plain.tex mathcode`_="8000 % _
bigskip
replacesetup{#{3}, _{11}, ^{11}, !{4}, ?{7}, &{8}, @{6}}
detokenize {replacesetup{#{3}, _{11}, ^{11}, !{4}, ?{7}, &{8}, @{6}}}
scanthechars {##___^?bgroup abcegroup &bgrouphboxbgroup___egroupegroup##}
detokenize {scanthechars {##___^?bgroup abcegroup &bgrouphboxbgroup___egroupegroup##}}
detokenize{hrulethereplacetokshrule}
output:
hrulethereplacetokshrule
medskip
string_ is mathematically active, the subscript is in an stringhboxspace
hence the catcode 11 version is used, but the first three use string_ as
defined in plain.tex for the math active string_, which now has catcode letter
bigskip
Let's now go back to catcode 12:
edefy{thereplacetoks}
replacesetup {#{12}, _{12}, ^{12}, !{12}, ?{12}, &{12}, @{12}}
expandafterscanthecharsexpandafter{y}
detokenize{edefy{thereplacetoks}}
detokenize{replacesetup {#{12}, _{12}, ^{12}, !{12}, ?{12}, &{12}, @{12}}}
detokenize{expandafterscanthecharsexpandafter{y}}
detokenize{thereplacetoks}
thereplacetoks
edefz{thereplacetoks}
detokenize{edefz{thereplacetoks}meaningz}
meaningz
nopagenumbers
bye
Answered by user4686 on December 24, 2020
In the intervening years since my other answer, I have written the tokcycle
package for taking an argument or input stream one token at a time, and "doing something with it". It can easily be set up to do something with cat-12 #
tokens and reconvert them to cat-6, since this seems to be what the OP desires.
The MWE works through a progression:
the standard cat-6 approach to def
What happens if the argument of the def
contains cat-12 #
.
What happens if both the argument and the def
parameter itself are cat-12 #
.
Now I set up tokcycle
to convert cat-12 #
tokens back to cat-6. In this step, I first pass only cat-6 #
tokens to the cycle, to show that it can process them without issue (but no conversion is needed).
Here, I convert a cat-12 #
in the argument to a def
back to cat-6, with the expected result.
Here, I convert the cat-12 #
tokens in both the argument and in the def
parameter specification back to cat-6, with the expected result.
Here, I operate in an underlying cat-12 #
environment, and still successfully convert occurrences of #
back to cat-6, with the expected result.
The MWE:
documentclass {article}
usepackage[T1]{fontenc}
usepackage{tokcycle}
letSIXhash#
catcode`#=12
letTWELVEhash#
catcode`#=6
begin{document}
1.defQ#1{My argument is [#1]}
Q{X}
2.defQQ#1{My argument is [TWELVEhash1]}
QQ{X}
3.defQQQTWELVEhash1{My argument is [TWELVEhash1]}
QQQTWELVEhash1
Characterdirective{%
whennotprocessingparameter#1{tctestifx{TWELVEhash#1}%
{addcytoks{##}}{addcytoks{#1}}}}
4.tokencyclexpress
gdefW#1{My argument is [#1]}
endtokencyclexpress
W{X}
5.tokencyclexpress
gdefWW#1{My argument is [TWELVEhash1]}
endtokencyclexpress
WW{X}
6.tokencyclexpress
gdefWWWTWELVEhash1{My argument is [TWELVEhash1]}
endtokencyclexpress
WWW{X}
7.catcode`#=12
tokencyclexpress
gdefWWWW#1{My argument is [#1]}
endtokencyclexpress
WWWW{X} #####
end{document}
SUPPLEMENT
The case, above, of cat-6 changes is more challenging because of the way cat-6 tokens are processed (the need for ##
, ####
, etc). For other catcode conversions, the tokcycle process is easier, since I don't have to ascertain whether the token is part of a parameter. Here, I do something analogous with the underscore _
.
documentclass {article}
usepackage[T1]{fontenc}
usepackage{tokcycle}
catcode`_=12
letTWELVEus_
catcode`_=8
begin{document}
1.defQ{$aTWELVEus1$}
Q
Characterdirective{%
tctestifx{TWELVEus#1}{addcytoks{_}}{addcytoks{#1}}}
2. expandaftertokencyclexpressQendtokencyclexpress
3.catcode`_=12
defQ{$a_1$}
expandaftertokencyclexpressQendtokencyclexpress
4. $tokencyclexpress a_1endtokencyclexpress$
____
end{document}
Answered by Steven B. Segletes on December 24, 2020
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP