What is the easiest way to modify catcodes in a string?

Question

Assume that a token register contains a sequence of letter tokens, each having a catcode of 12.  I want to store in a second token register the same sequence of letter token, altering some catcodes on the way (like changing all # so that they have catcode 6).

I see three ways to do this:

Iterating over the sequence of tokens with futurelet and do the replacement. This method has the inconvenient that adding tokens at the end of a token register involves a lot of expansions and parsing but the advantage that it can be generalised.
Using a macro with an argument template (like what we do to see if a token is in a sequence) to quickly parse the token register.  It has the drawback that it is maybe not easy to generalise.
Write the tokens in some auxiliary file, reconfigure the lexer and reread these tokens.  It is ugly.

Is there an easier way to go? (In plain TeX, no Lua, no Perl, no OCaml…)

Steven B. Segletes · Answer

I took this question as a learning opportunity for me, more than an actual attempt to satisfy a question for you, and so my answer is not yet a generalized approach (though I think it could get there with a little work).  I would say this method falls in your category #2, "Using a macro with an argument template (like what we do to see if a token is in a sequence) to quickly parse the token register."

I didn't know how to test whether I was successful with manipulating a # symbol's catcode, so I took a less challenging problem, the underscore _, knowing that I could throw it between some dollar signs and immediately know whether I was successful in changing the catcode.

The solution was EDITED so that the iterationengine is now recursive, and will continue until all catcode12 underscores have been replaced with catcode8 underscores.  Originally, I had to invoke the iteration engine successively, by hand.

In the MWE below, after setting up my macro, I test it on four separate strings, containing 0, 1, 2, and 3 underscores.

I haven't been able to confirm it, but the detector routine (detectus) may fail if it encounters a catcode12 underscore followed by a normal catcoded testchar (in this case, relax).

documentclass{article}
usepackage[T1]{fontenc}
deftestchar{relax}
defusEight{_}
catcode`_=12
defdetectus#1_#2relax{%
  defDetected{#2testchar}iftestcharDetectedelsedefDetected{T}fi}
defchangeus#1_#2relax{#1usEight#2}
makeatletterdefiterationengine{%
  if TDetected%
    expandafterdetectuscatEightTok_relax%
    if TDetectedprotected@edefcatEightTok{expandafterchangeuscatEightTokrelax}fi%
    iterationengine%
  fi%
}makeatother
catcode`_=8
defiteration{letcatEightTokcatTwelveTokdefDetected{T}iterationengine}
longdefruntest{%
  makebox[3cm][l]{$rawtok$} raw token in mathpar
  edefcatTwelveTok{detokenizeexpandafter{rawtok}}
  makebox[3cm][l]{catTwelveTok} detokenized raw token (cat12)par
  makebox[3cm][l]{$catTwelveTok$} cat12 token in mathpar
  iteration
  makebox[3cm][l]{detokenizeexpandafter{catEightTok}} 
    detokenized converted token (cat 8)par
  makebox[3cm][l]{$catEightTok$} converted token, in math
}
begin{document}
vspace{1ex}defrawtok{ab}runtestpar
vspace{1ex}defrawtok{a_b}runtestpar
vspace{1ex}defrawtok{a_b c_d}runtestpar
vspace{1ex}defrawtok{a_b c_d e_f}runtestpar
end{document}

user4686 · Answer

This is a Plain TeX (or any other format) general purpose macro scanthechars which accepts on input a string of character tokens of catcode 12 (1)  and puts them in the same order but with transformed catcodes in a token list register called replacetoks.

(1) this update adds  two string's to the @@scanthechars macro, and as a result the input is not anymore restricted to be with catcode 12 tokens only. It may even contain control sequences (but not braced material) which will go through unarmed, and one may convert back and forth between various catcode regimes -- see updated image.

The interface to specify the catcodes is via a macro replacesetup which is fetched with  a comma separated list of things like !{3} which means to transform the character token ! into a catcode 3 character token !.

The illustrative tests use detokenize for simplicity sake, to produce easily catcode 12 tokens, or verbatim output, hence the code needs etex or pdftex for compilation. But the macros themselves are strictly Knuthian.

Only catcodes of 3, 4, 6, 7, 8, 11, 12, 13 are dealt with.

Each use of scanthechars resets replacesetup: next call to scanthechars must follow a renewed replacesetup. Since now the input is not restricted to be only with catcode  12 tokens, one may use scanthechars many times on the same string. See the code and image for how this is done.

The code:

catcode`@ 11
longdef@gobble      #1{}
longdef@firstoftwo  #1#2{#1}
longdef@secondoftwo #1#2{#2}

newtoksreplacetoks

defreplacesetup #1{%
% this assumes non nil escapechar
    defreplace@list {}%
    defreplace@do ##1##2%
       {expandafterdefcsname replace@setup@expandafter
                                      @gobblestring##1endcsname {##2}}%
    replace@setup #1,relaxrelax,%
}

defreplace@setup #1#2,{%
    ifx#1relaxreplace@list
    else
        expandafterdefexpandafterreplace@listexpandafter
          {replace@list replace@do #1{#2}}%
        expandafterreplace@setup
    fi
}

defscanthechars #1{replacetoks{}%
           edefreplace@restore{lccode`$=thelccode`$
                                 lccode`^=thelccode`^
                                 lccode`_=thelccode`_
                                 lccode`&=thelccode`&
                                 lccode`noexpand#=thelccode`#
                                 lccode`a=`a lccode`(=thelccode`(
                                 lccode`noexpand~=thelccode`~relax
                                 }%
           @scanthechars #1relax}

def@scanthechars #1{ifx #1relaxexpandafter@firstoftwo
                              elseexpandafter@secondoftwo
                      fi
                      {defreplace@do ##1##2{expandafter
                         letcsname replace@setup@expandafter
                                     @gobblestring##1endcsnamerelax}%
                       replace@list % resets to relax everything
                       replace@restore}%
                      {@@scanthechars #1}%
}

def@@scanthechars #1{expandafter
                 ifxcsname replace@setup@string#1endcsnamerelax
                    replacetoksexpandafter{thereplacetoks #1}%
                 else
                    ifcasecsname replace@setup@string#1endcsnamerelax
                    ororor % catcode=3
                      lccode`$=`#1 %$
                      lowercase
                      {replacetoksexpandafter{thereplacetoks $}}%$
                    or % catcode=4
                      lccode`&=`#1 
                      lowercase
                      {replacetoksexpandafter{thereplacetoks &}}%
                    oror % 6
                      lccode`#=`#1 
                      lowercase
                      {replacetoksexpandafter{thereplacetoks ##}}%
                    or % 7
                      lccode`^=`#1 
                      lowercase
                      {replacetoksexpandafter{thereplacetoks ^}}%
                    or % 8
                      lccode`_=`#1 
                      lowercase
                      {replacetoksexpandafter{thereplacetoks _}}%
                    ororor % 11
                       lccode`a=`#1 
                      lowercase
                      {replacetoksexpandafter{thereplacetoks a}}%
                    or % 12
                       lccode`(=`#1 
                      lowercase
                      {replacetoksexpandafter{thereplacetoks (}}%
                    or % 13
                       lccode`~=`#1
                      lowercase
                      {replacetoksexpandafter{thereplacetoks ~}}%
                   fi
                fi
    @scanthechars
}

catcode`@ 12
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% utility macro to get the catcode of a character token
% manages only catcodes 3,4,6,7,8,11,12,13
defTheCatcode #1{ifcatnoexpand#1$3else% $
                    ifcatnoexpand#1&4else
                     ifcatnoexpand#1##6else
                      ifcatnoexpand#1^7else
                       ifcatnoexpand#1_8else
                        ifcatnoexpand#1a11else
                         ifcatnoexpand#1(12else
                          ifcatnoexpand#1noexpand~13else not handled
                   fifififififififi }

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

tt

Test I.

replacesetup{#{3},!{7},?{8}}

detokenize{replacesetup{#{3},!{7},?{8}}}

edefx{string#a!b?cstring#}

x

expandafterscanthecharsexpandafter{x}

thereplacetoks

begingroup
  catcode`#=3
  catcode`!=7
  catcode`?=8
  gdefz{#a!b?c#}
endgroup

thereplacetoks
=z

edefy{thereplacetoks}

ifxzy OKelse WRONGfi

bigskip

Test II.

replacesetup{a{11},b{11},c{11},{3}}

detokenize{replacesetup{a{11},b{11},c{11},{3}}}

catcode`@ 11
edefbackslashchar{expandafter@gobblestring}
catcode`@ 12

% creates an x with catcode 12 a, b, c
edefx{stringabcbackslashchar}

x

expandafterscanthecharsexpandafter{x}

edefy{thereplacetoks}

begingroup
catcode`|=0
catcode`=3
|gdef|z{abc}
|endgroup

y=z

ifxzy OKelse WRONGfi

bigskip

Test III.

replacesetup{a{3},b{7},c{8},d{12}}

detokenize{replacesetup{a{3},b{7},c{8},d{12}}}

expandafterscanthecharsexpandafter {detokenize{abcd}}% convert to catcode 12

edefy {thereplacetoks}

defPrintTheCatcode #1{string#1 with catcode TheCatcode #1par }
% previously, we used xinttools.sty:
% xintApplyInline{PrintTheCatcode}{y}

defPrintTheCatcodes #1{ifx#1relaxelsePrintTheCatcode#1expandafter
                                          PrintTheCatcodesfi }
expandafterPrintTheCatcodesyrelax

% replacesetup must be redone each time
replacesetup{a{3},b{7},c{8},d{12}}

expandafterscanthecharsexpandafter {detokenize{adbdcda}}

detokenize{adbdcda}->thereplacetoks

bigskip

Test IV.

overfullrule 0pt

replacesetup {!{6}}

scanthechars {!}

expandaftermeaningthereplacetoks

edefy{thereplacetoks}

begingroup
catcode`!6
gdefz{!!}
endgroup

ifxzy OKelse WRONGfi

bigskip
TEST V.

From now on we don't require the input to be with catcode 12 tokens.

replacesetup{#{3}, _{11}, ^{11}, !{4}, ?{7}, &{8}, @{6}}

detokenize{replacesetup{#{3}, _{11}, ^{11}, !{4}, ?{7}, &{8}, @{6}}}

scanthechars {tabskip1ex #mathstrut @#hfil!hfil #@#hfil!hfil#@#cr
               noalignbgrouphruleegroup
                        1!23!456cr 7890!a!bccr
                noalignbgrouphruleegroup }

detokenize{scanthechars {tabskip1ex #mathstrut @#hfil!hfil #@#hfil!hfil#@#cr
               noalignbgrouphruleegroup
                        1!23!456cr 7890!a!bccr
                noalignbgrouphruleegroup }

}

detokenize{halign{spanthereplacetoks}}

output:
medskip
halign{spanthereplacetoks}
% expandafterPrintTheCatcodesthereplacetoksrelax

% attention _ is mathematically active!
% in plain.tex mathcode`_="8000 % _

bigskip

replacesetup{#{3}, _{11}, ^{11}, !{4}, ?{7}, &{8}, @{6}}
detokenize {replacesetup{#{3}, _{11}, ^{11}, !{4}, ?{7}, &{8}, @{6}}}
scanthechars {##___^?bgroup abcegroup &bgrouphboxbgroup___egroupegroup##}

detokenize {scanthechars {##___^?bgroup abcegroup &bgrouphboxbgroup___egroupegroup##}}

detokenize{hrulethereplacetokshrule}

output:

hrulethereplacetokshrule

medskip
string_ is mathematically active, the subscript is in an stringhboxspace
hence the catcode 11 version is used, but the first three use string_ as
defined in plain.tex for the math active string_, which now has catcode letter
bigskip

Let's now go back to catcode 12:

edefy{thereplacetoks}

replacesetup {#{12}, _{12}, ^{12}, !{12}, ?{12}, &{12}, @{12}}

expandafterscanthecharsexpandafter{y}

detokenize{edefy{thereplacetoks}}

detokenize{replacesetup {#{12}, _{12}, ^{12}, !{12}, ?{12}, &{12}, @{12}}}

detokenize{expandafterscanthecharsexpandafter{y}}

detokenize{thereplacetoks}

thereplacetoks

edefz{thereplacetoks}

detokenize{edefz{thereplacetoks}meaningz}

meaningz
nopagenumbers

bye

Steven B. Segletes · Answer

In the intervening years since my other answer, I have written the tokcycle package for taking an argument or input stream one token at a time, and "doing something with it".  It can easily be set up to do something with cat-12 # tokens and reconvert them to cat-6, since this seems to be what the OP desires.
The MWE works through a progression:

the standard cat-6 approach to def

What happens if the argument of the def contains cat-12 #.

What happens if both the argument and the def parameter itself are cat-12 #.

Now I set up tokcycle to convert cat-12 # tokens back to cat-6. In this step, I first pass only cat-6 # tokens to the cycle, to show that it can process them without issue (but no conversion is needed).

Here, I convert a cat-12 # in the argument to a def back to cat-6, with the expected result.

Here, I convert the cat-12 # tokens in both the argument and in the def parameter specification back to cat-6, with the expected result.

Here, I operate in an underlying cat-12 # environment, and still successfully convert occurrences of # back to cat-6, with the expected result.

The MWE:
documentclass {article}
usepackage[T1]{fontenc}
usepackage{tokcycle}
letSIXhash#
catcode`#=12
letTWELVEhash#
catcode`#=6
begin{document}
1.defQ#1{My argument is [#1]}
Q{X}

2.defQQ#1{My argument is [TWELVEhash1]}
QQ{X}

3.defQQQTWELVEhash1{My argument is [TWELVEhash1]}
QQQTWELVEhash1

Characterdirective{%
  whennotprocessingparameter#1{tctestifx{TWELVEhash#1}%
    {addcytoks{##}}{addcytoks{#1}}}}

4.tokencyclexpress
gdefW#1{My argument is [#1]}
endtokencyclexpress
W{X}

5.tokencyclexpress
gdefWW#1{My argument is [TWELVEhash1]}
endtokencyclexpress
WW{X}

6.tokencyclexpress
gdefWWWTWELVEhash1{My argument is [TWELVEhash1]}
endtokencyclexpress
WWW{X}

7.catcode`#=12
tokencyclexpress
gdefWWWW#1{My argument is [#1]}
endtokencyclexpress
WWWW{X} #####
end{document}

SUPPLEMENT
The case, above, of cat-6 changes is more challenging because of the way cat-6 tokens are processed (the need for ##, ####, etc).  For other catcode conversions, the tokcycle process is easier, since I don't have to ascertain whether the token is part of a parameter.  Here, I do something analogous with the underscore _.
documentclass {article}
usepackage[T1]{fontenc}
usepackage{tokcycle}
catcode`_=12
letTWELVEus_
catcode`_=8
begin{document}
1.defQ{$aTWELVEus1$}
Q

Characterdirective{%
  tctestifx{TWELVEus#1}{addcytoks{_}}{addcytoks{#1}}}

2. expandaftertokencyclexpressQendtokencyclexpress

3.catcode`_=12
defQ{$a_1$}
expandaftertokencyclexpressQendtokencyclexpress

4. $tokencyclexpress a_1endtokencyclexpress$
____
end{document}

What is the easiest way to modify catcodes in a string?

3 Answers

Add your own answers!

Ask a Question