How to get what is before the next '[' or ']'

TeX - LaTeX Asked by Jérôme LAURENS on January 29, 2021

Given the following LaTeX source code:

\mymacro blablabla[ lorem ipsum
\mymacro blablabla[ lorem ipsum ] etc
\mymacro blablabla] lorem ipsum
\mymacro blablabla] lorem ipsum [ etc

How would I define \mymacro with two arguments such that #1 is blablabla and #2 is '[' in the first two cases and ']' in the last two?

Note that splitting this into two different macros, one for '[' and another for ']', is not an option. That would be too easy.

Nota bene: the examples were updated to reflect the problem stated in the title more closely.

This question may seem strange because of the unusual LaTeX usage. The original problem is to import into a LaTeX document some text material created by a third-party tool.

4 Answers

Maybe I interpret the question correctly, maybe not, but at least this may help to clarify it.

\documentclass{article}
\makeatletter
\def\mymacro#1]{\edef\mymacro@tmp{\noexpand\in@{[}{#1}}%
\mymacro@tmp
\ifin@
\expandafter\mymacro@i#1%
\else
\#1=#1,\#2=]%
\fi}
\def\mymacro@i#1[#2]{\#1=#1,\#2=[}
\makeatother
\begin{document}
\mymacro blablabla[ lorem ipsum ] etc

\mymacro blablabla] lorem ipsum [ etc
\end{document}

Answered by user229669 on January 29, 2021

The OP poses a very strange syntax, with ungrouped arguments. But here, I use a token cycle to achieve the desired output. The macro \mymacroaux is where you need to specify what to do with the arguments; here, I just echo them so that one can see they were digested properly.

\documentclass{article}
\usepackage{tokcycle,txfonts}
\makeatletter\let\gobble\@gobble\makeatother
\Characterdirective{%
  \aftertokcycle{\expandafter\mymacroaux\expandafter{\the\cytoks}{#1}}%
  \tctestifx{]#1}{\expandafter\endtokcycraw\gobble}{%
    \tctestifx{[#1}{\expandafter\endtokcycraw\gobble}{\addcytoks{#1}}}%
}
\def\mymacroaux#1#2{(\#1 is ``#1'' and \#2 is ``#2'')}
\let\mymacro\tokencyclexpress
\begin{document}
\mymacro blablabla[ lorem ipsum

\mymacro blablabla[ lorem ipsum ] etc

\mymacro blablabla] lorem ipsum

\mymacro blablabla] lorem ipsum [ etc
\end{document}

SUPPLEMENT

Here, I tried to make it even more user-friendly to write your own token intercept routines using tokcycle.

I have introduced a macro

\abortiftokenis{<test token>}{<command if not test token>}

which can be nested to screen for more than one token, such as [ and ]. One must have some familiarity with the tokcycle approach, in which tokens in the input stream are shunted off to one of four directives: Character, Group, Macro, or Space. Thus, trapping macros (commands) must occur in the \Macrodirective, spaces in the \Spacedirective, and normal characters in the \Characterdirective. The \Groupdirective is set up to pass through its content, as one cannot properly break out of the token cycle if submerged in a group.

The macro \aborttokcycle is defined separately, in case one wishes to bail out of the token cycle for reasons other than a matching token. As a result, the directives in this new version are much more streamlined.

In this MWE, I will branch to the \mymacroaux handler upon finding a [, ], \today or a space in the top-level content (but not within a group). Recall that this handler routine takes two arguments: the tokens leading up to the trapped token and the trapped token that caused the token-cycle exit.

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{tokcycle,txfonts}
\makeatletter
\def\aborttokcycle{\expandafter\endtokcycraw\@gobble}
\def\abortiftokenis#1{%
  \aftertokcycle{\expandafter\mymacroaux\expandafter{\the\cytoks}{#1}}%
  \tctestifx{\tc@next#1}{\aborttokcycle}%
}
\makeatother
\Characterdirective{\abortiftokenis{]}{\abortiftokenis{[}{\addcytoks{#1}}}}
\Groupdirective{\addcytoks{#1}}
\Macrodirective{\abortiftokenis{\today}{\addcytoks{#1}}}
\Spacedirective{\abortiftokenis{ }{\addcytoks{#1}}}
\def\mymacroaux#1#2{(\#1 is ``\detokenize{#1}'' and \#2 is ``\detokenize{#2}'')}
\let\mymacro\tokencyclexpress
\begin{document}
\mymacro b{la b}labla[ lorem ipsum

\mymacro blab\relax labla] lorem ipsum ] etc

\mymacro blab\today labla] lorem ipsum [ etc

\mymacro blabla bla] lorem ipsum
\end{document}

Answered by Steven B. Segletes on January 29, 2021

I can offer a macro

\SplitAtSquareBracketAndPassToMacro{⟨macro A⟩}{⟨macro B⟩}{⟨tokens⟩}
which works as follows:

If ⟨tokens⟩ doesn't contain any square bracket of category code 12 (other) which is not nested in curly braces, then

⟨macro B⟩{⟨tokens⟩}

is delivered.

If ⟨tokens⟩ does contain at least one square bracket of category code 12 (other) which is not nested in curly braces, then

⟨macro A⟩{⟨tokens before first square bracket⟩}{⟨square bracket⟩}{⟨tokens behind first square bracket⟩}

is delivered.

First, a \romannumeral-expansion-driven tail-recursive loop tests whether the ⟨remaining tokens⟩ argument

  • is empty
  • or has a leading [ or ] of category code 12
  • or has a leading explicit space token.

If it is empty, then ⟨macro B⟩{⟨tokens⟩} is delivered.

If it has a leading [ or ] of category code 12, then a macro processing a [-delimited or ]-delimited argument, respectively, is called to split ⟨tokens⟩ accordingly.

If it has a leading explicit space token, then that is removed and another iteration of the loop is done.

If none of these is the case, then a non-delimited argument is removed and another iteration of the loop is done.
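
The splitting step on its own can be illustrated with a minimal sketch. It is not taken from this answer's code: \splitatleft is a hypothetical name, and the sketch assumes the input is already known to contain a [ outside of braces, which is what the loop described above has to establish first.

```latex
\documentclass{article}
% \splitatleft is a hypothetical name used only for this illustration.
% Its parameter text "#1[" makes #1 a delimited argument: TeX collects
% every token up to the first [ that is not hidden inside braces.
\def\splitatleft#1[{(before: ``#1''; bracket: ``['')}
\begin{document}
\splitatleft blablabla[ lorem ipsum

% A bracket nested in braces is skipped; the split happens at the later one.
\splitatleft {bla[bla}bla[ lorem ipsum
\end{document}
```

If no [ follows before the paragraph ends, a macro defined this way stops with a runaway-argument error, which is one reason for checking the leading tokens before calling a delimited-argument macro.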

As the ⟨macro A⟩ argument you can pass \mymacro to \SplitAtSquareBracketAndPassToMacro.

\documentclass{article}
\makeatletter
%%=============================================================================
%% Paraphernalia:
%%    \UD@firstoftwo, \UD@secondoftwo,
%%    \UD@PassFirstToSecond, \UD@Exchange, \UD@removespace
%%    \UD@CheckWhetherNull, \UD@CheckWhetherLeadingTokens,
%%    \UD@ExtractFirstArgLoop
%%=============================================================================
\newcommand\UD@firstoftwo[2]{#1}%
\newcommand\UD@secondoftwo[2]{#2}%
\newcommand\UD@PassFirstToSecond[2]{#2{#1}}%
\newcommand\UD@Exchange[2]{#2#1}%
\newcommand\UD@removespace{}\UD@firstoftwo{\def\UD@removespace}{} {}%
%%-----------------------------------------------------------------------------
%% Check whether argument is empty:
%%.............................................................................
%% \UD@CheckWhetherNull{<Argument which is to be checked>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is empty>}%
%%                     {<Tokens to be delivered in case that argument
%%                       which is to be checked is not empty>}%
%%
%% The gist of this macro comes from Robert R. Schneck's \ifempty-macro:
%% <https://groups.google.com/forum/#!original/comp.text.tex/kuOEIQIrElc/lUg37FmhA74J>
\newcommand\UD@CheckWhetherNull[1]{%
  \romannumeral\expandafter\UD@secondoftwo\string{\expandafter
  \UD@secondoftwo\expandafter{\expandafter{\string#1}\expandafter
  \UD@secondoftwo\string}\expandafter\UD@firstoftwo\expandafter{\expandafter
  \UD@secondoftwo\string}\expandafter\z@\UD@secondoftwo}%
  {\expandafter\z@\UD@firstoftwo}%
}%
%%-----------------------------------------------------------------------------
%% Check whether argument's leading tokens form a specific
%% token-sequence that does neither contain explicit character tokens of
%% category code 1 or 2 nor contain tokens of category code 6:
%%.............................................................................
%% \UD@CheckWhetherLeadingTokens{<argument which is to be checked>}%
%%                              {<a <token sequence> without explicit
%%                                character tokens of category code
%%                                1 or 2 and without tokens of
%%                                category code 6>}%
%%                              {<internal token-check-macro>}%
%%                              {<tokens to be delivered in case
%%                                <argument which is to be checked> has
%%                                <token sequence> as leading tokens>}%
%%                              {<tokens to be delivered in case
%%                                <argument which is to be checked>
%%                                does not have <token sequence> as
%%                                leading tokens>}%
\newcommand\UD@CheckWhetherLeadingTokens[3]{%
  \romannumeral\UD@CheckWhetherNull{#1}{\expandafter\z@\UD@secondoftwo}{%
    \expandafter\UD@secondoftwo\string{\expandafter
    \UD@@CheckWhetherLeadingTokens#3{\relax}#1#2}{}}%
}%
\newcommand\UD@@CheckWhetherLeadingTokens[1]{%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
  {\UD@Exchange{\UD@firstoftwo}}{\UD@Exchange{\UD@secondoftwo}}%
  {\expandafter\expandafter\expandafter\expandafter
   \expandafter\expandafter\expandafter\z@\expandafter\expandafter
   \expandafter}\expandafter\UD@secondoftwo\expandafter{\string}%
}%
%%-----------------------------------------------------------------------------
%% Extract first inner undelimited argument:
%%
%%   \romannumeral\UD@ExtractFirstArgLoop{ABCDE\UD@SelDOm} yields  {A}
%%
%%   \romannumeral\UD@ExtractFirstArgLoop{{AB}CDE\UD@SelDOm} yields  {AB}
%%.............................................................................
\@ifdefinable\UD@RemoveTillUD@SelDOm{%
  \long\def\UD@RemoveTillUD@SelDOm#1#2\UD@SelDOm{{#1}}%
}%
\newcommand\UD@ExtractFirstArgLoop[1]{%
  \expandafter\UD@CheckWhetherNull\expandafter{\UD@firstoftwo{}#1}%
  {\z@#1}%
  {\expandafter\UD@ExtractFirstArgLoop\expandafter{\UD@RemoveTillUD@SelDOm#1}}%
}%
%%-----------------------------------------------------------------------------
%% \UD@internaltokencheckdefiner{<internal token-check-macro>}%
%%                              {<token sequence>}%
%% Defines <internal token-check-macro> to snap everything
%% until reaching <token sequence>-sequence and spit that out
%% nested in braces.
%%-----------------------------------------------------------------------------
\newcommand\UD@internaltokencheckdefiner[2]{%
  \@ifdefinable#1{\long\def#1##1#2{{##1}}}%
}%
%%=============================================================================
%% Supplementary macros for \SplitAtSquareBracketAndPassToMacro
%%=============================================================================
\UD@internaltokencheckdefiner{\UD@InternalExplicitSpaceCheckMacro}{ }%
\UD@internaltokencheckdefiner{\UD@InternalLeftSquaeBracketCheckMacro}{[}%
\UD@internaltokencheckdefiner{\UD@InternalRightSquaeBracketCheckMacro}{]}%
\@ifdefinable\UD@SplitAtLeftSquareBracket{%
  \long\def\UD@SplitAtLeftSquareBracket#1[{\expandafter\z@\expandafter{\UD@firstoftwo{}#1}{[}}%
}%
\@ifdefinable\UD@SplitAtRightSquareBracket{%
  \long\def\UD@SplitAtRightSquareBracket#1]{\expandafter\z@\expandafter{\UD@firstoftwo{}#1}{]}}%
}%
\newcommand\UD@SplitAtSquareBracket[3]{%
  \expandafter\UD@PassFirstToSecond\expandafter{%
     \romannumeral
     \expandafter\expandafter\expandafter\z@\expandafter\UD@firstoftwo\expandafter{\expandafter}%
     \romannumeral
     \expandafter\expandafter\expandafter\z@\expandafter\UD@firstoftwo\expandafter{\expandafter}%
     \romannumeral#3#1%
  }{%
    \expandafter\UD@PassFirstToSecond
    \romannumeral\expandafter\expandafter\expandafter\UD@ExtractFirstArgLoop
                 \expandafter\expandafter\expandafter{%
                 \expandafter\UD@firstoftwo\expandafter{\expandafter}%
                 \romannumeral#3#1\UD@SelDOm}{%
      \expandafter\UD@PassFirstToSecond
      \romannumeral\expandafter\UD@ExtractFirstArgLoop\expandafter{%
                   \romannumeral#3#1\UD@SelDOm}{%
        \z@#2%
      }%
    }%
  }%
}%
\newcommand\SplitAtSquareBracketAndPassToMacro[3]{%
  \romannumeral\UD@SplitAtSquareBracketAndPassToMacroLoop{#3}{#3}{#1}{#2}%
}%
\newcommand\UD@SplitAtSquareBracketAndPassToMacroLoop[4]{%
  % #1 = <remaining tokens>
  % #2 = <tokens>
  % #3 = <macro A>
  % #4 = <macro B>
  \UD@CheckWhetherNull{#1}{\z@#4{#2}}{%
    \UD@CheckWhetherLeadingTokens{#1}{ }{\UD@InternalExplicitSpaceCheckMacro}{%
      \expandafter\UD@SplitAtSquareBracketAndPassToMacroLoop\expandafter{\UD@removespace#1}{#2}{#3}{#4}%
    }{%
      \UD@CheckWhetherLeadingTokens{#1}{[}{\UD@InternalLeftSquaeBracketCheckMacro}{%
         \UD@SplitAtSquareBracket{.#2}{#3}{\UD@SplitAtLeftSquareBracket}%
      }{%
        \UD@CheckWhetherLeadingTokens{#1}{]}{\UD@InternalRightSquaeBracketCheckMacro}{%
          \UD@SplitAtSquareBracket{.#2}{#3}{\UD@SplitAtRightSquareBracket}%
        }{%
          \expandafter\UD@SplitAtSquareBracketAndPassToMacroLoop\expandafter{\UD@firstoftwo{}#1}{#2}{#3}{#4}%
        }%
      }%
    }%
  }%
}%
\makeatother

%%=============================================================================
%% \mymacro{<tokens 1>}{<tokens 2>}{<tokens 3>}
%% When arguments are passed to \mymacro from
%% \SplitAtSquareBracketAndPassToMacro, then
%% - <tokens 1> is the tokens before the first [ or ], respectively.
%% - <tokens 2> is either [ or ] .
%% - <tokens 3> is the tokens behind the first [ or ], respectively.
%%=============================================================================
\newcommand\mymacro[3]{%
  \noindent
  \scantokens\expandafter\expandafter\expandafter{%
             \expandafter\string\expandafter\verb\expandafter|\string\mymacro|:%
  }%
  Argument 1 is: \scantokens\expandafter{\string\verb|#1|.}%
  Argument 2 is: \scantokens\expandafter{\string\verb|#2|.}%
  Argument 3 is: \scantokens\expandafter{\string\verb|#3|.}%
 }%
%%=============================================================================
%% macro in case there was no square bracket
%%=============================================================================
\newcommand\nosquarebracketsmacro[1]{%
  \noindent
  \scantokens\expandafter\expandafter\expandafter{%
             \expandafter\string\expandafter\verb\expandafter|%
             \string\nosquarebracketsmacro|:%
  }%
  Argument is: \scantokens\expandafter{\string\verb|#1|.}%
}%

\parindent=0ex

\begin{document}

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB]CDE}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB]CDE}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB[CDE}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB[CDE}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{ABCDE}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{ABCDE}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}]CDE}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}]CDE}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}[CDE}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}[CDE}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}CDE}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}CDE}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB]{CDE}}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB]{CDE}}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB[{CDE}}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB[{CDE}}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB{CDE}}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{AB{CDE}}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}]{CDE}}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}]{CDE}}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}[{CDE}}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}[{CDE}}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}{CDE}}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB}{CDE}}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB]CDE}}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB]CDE}}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB[CDE}}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{AB[CDE}}

\vfill

\verb|\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{ABCDE}}|:
\SplitAtSquareBracketAndPassToMacro{\mymacro}{\nosquarebracketsmacro}{{ABCDE}}

\end{document}

As far as the task/macro-writing-challenge in the question is concerned, I admittedly cheated a bit:

By defining the macro \SplitAtSquareBracketAndPassToMacro in such a way that the tokens to be split up are passed to it as a macro argument, I have dodged the need to "fish" these tokens out of the token stream directly.

I did this because tokens can be "fished out" of the token stream in principle only as macro arguments.
For example, undelimited macro arguments consisting of several tokens are nested in a token pair consisting of an explicit character token of category code 1 (begin group) and an explicit character token of category code 2 (end group). This pair of tokens forms the so-called argument-group.
Usually, argument-groups are formed by the character tokens { (category code 1) and } (category code 2), but it is not excluded that for some obscure reason the category code régime is different and other characters of these category codes are in use.
I don't know of any reliable method to fish undelimited macro arguments out of the token stream in a way where the braces or category-code-1- and -2-character-tokens forming the argument-group in which the undelimited arguments are nested are preserved exactly instead of being replaced by some "hard-coded" pair of explicit character tokens of category code 1/2.
You can have TeX "look" via \futurelet and \ifcat for the category code of the next token in the token stream. But firstly a category-code-1-character-token denoting the beginning of the argument-group is not fished out from just "looking", and secondly the category-code-2-character-token denoting the end of the argument-group is not the next token in the token stream...
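
The kind of "looking" meant here can be sketched as follows. This is only an illustration under stated assumptions: the names \peekbrace, \peek@tok and \peek@check are hypothetical and do not occur in the code above.

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper names, used only for this illustration.
% \futurelet makes \peek@tok an alias of the token that follows
% \peek@check in the input stream, without removing that token.
\newcommand\peekbrace{\futurelet\peek@tok\peek@check}
\newcommand\peek@check{%
  % \ifcat compares category codes; \bgroup is an implicit
  % begin-group character (category code 1).
  \ifcat\noexpand\peek@tok\bgroup
    (next token has category code 1)%
  \else
    (next token is something else)%
  \fi
}
\makeatother
\begin{document}
\peekbrace{abc}

\peekbrace xyz
\end{document}
```

The test only reports what comes next. The brace itself stays in the input stream and is processed afterwards as an ordinary group, which is the limitation described above.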

Answered by Ulrich Diez on January 29, 2021

With an up-to-date LaTeX system, you can use \peek_regex_replace_once:nn:

\documentclass{article}

\ExplSyntaxOn
\NewDocumentCommand{\mymacro}{}
 {
  \peek_regex_replace_once:nn { ([^\[\]]*) (\[|\]) } { \c{innermymacro}\cB\{\1\cE\}\2 }
 }
\ExplSyntaxOff

\newcommand{\innermymacro}[2]{%
  First argument: ``#1''; second argument: ``#2''%
}


\begin{document}

\mymacro blablabla[ lorem ipsum

\mymacro bla bla bla[ lorem ipsum ] etc

\mymacro blablabla] lorem ipsum

\mymacro bla bla bla] lorem ipsum [ etc

\end{document}

Explanation:

  1. the regular expression ([^\[\]]*) (\[|\]) matches all tokens up to the first [ or ], saving the match as \1 (the tokens up to the bracket) and \2 (the bracket);

  2. the replacement text is \c{innermymacro}\cB\{\1\cE\}\2, which means: put \innermymacro, a brace {, the tokens represented by \1, a brace } and the tokens represented by \2 (in this case a single token, either [ or ]);

  3. next, processing restarts from \innermymacro, which can use its two given arguments.
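
To make the effect of the replacement concrete, here is a small sketch of what TeX is expected to process once the replacement has been applied to one of the test lines. It assumes the same two-argument \innermymacro as in the example and writes the rewritten input out by hand, so it does not exercise the regex itself.

```latex
\documentclass{article}
\newcommand{\innermymacro}[2]{%
  First argument: ``#1''; second argument: ``#2''%
}
\begin{document}
% Hand-written equivalent of
%   \mymacro bla bla bla] lorem ipsum [ etc
% after the one-time replacement: the tokens before the bracket are
% wrapped in braces, and the bracket follows as the second argument.
\innermymacro{bla bla bla}] lorem ipsum [ etc
\end{document}
```

Everything after the first bracket is left untouched in the input stream and is typeset as ordinary text.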

Answered by egreg on January 29, 2021
