Mathematica Asked on October 22, 2021
sample2 = "this is a test to find whether pi works for other words or not as well pi pi pi potatpi pineapple pi-neapple";
So sample2 is a string, and I want to find the word/substring "pi" only, not the "pi" inside "pi-neapple" or "potatpi".
The code I tried:
StringCases[sample2, RegularExpression["\\b(pi)\\b"]]
Output:
{"pi", "pi", "pi", "pi", "pi"}
Could you please help me?
The extra "pi" in the output is there simply because - is not a word character, and therefore the pi in pi-neapple matches \b(pi)\b.
StringMatchQ["-", RegularExpression["\\w"]]
(*False*)
You can use the following pattern to treat - as a word character as well:
(?<![\w-])(pi)(?![\w-])
which leads to one fewer pi in the result:
StringCases[
sample2,
RegularExpression["(?<![\\w\\-])(pi)(?![\\w\\-])"]
]
(*{"pi", "pi", "pi", "pi"}*)
To ensure that these are the right pis, we can use the following test case:
StringCases[
"pi1 foo-pi2 pi3-foo foo-pi4-bar api5 pi6peline pi7 pi8",
RegularExpression["(?<![\\w\\-])(pi\\d)(?![\\w\\-])"]
]
(*{"pi1", "pi7", "pi8"}*)
The pattern \b(pi)\b means pi which is not preceded by a word character (\w) and is not followed by a word character.
All we need to do here is to replace "by a word character" with "by a word character or a dash".
For this we can use lookarounds, which are explained, e.g., here. In a nutshell, (?<!foo)bar means bar not preceded by something matching foo, and foo(?!bar) means foo not followed by something matching bar.
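As a small illustration of the two lookaround forms described above, here is a minimal sketch. The test strings are made up for this example and are not taken from the question:

```wolfram
(* Negative lookbehind: match "bar" only when it is not preceded by "foo" *)
StringCases["foobar bar", RegularExpression["(?<!foo)bar"]]
(*{"bar"}*)

(* Negative lookahead: match "foo" only when it is not followed by "bar" *)
StringCases["foobar foobaz", RegularExpression["foo(?!bar)"]]
(*{"foo"}*)
```

In both cases the lookaround only inspects the neighbouring text; it is not part of the returned match.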
Answered by Anton.Sakovich on October 22, 2021
As a start, try using \s, which stands for any whitespace character.
StringCases[
sample2,
RegularExpression["\\s+(pi)\\s+"] -> "$1",
Overlaps -> True
]
{"pi", "pi", "pi", "pi"}
Read towards the end of this answer for more information on how to make this more robust.
The corresponding Wolfram Language string pattern is this:
StringCases[
sample2,
Whitespace ~~ s:"pi" ~~ Whitespace -> s,
Overlaps -> True
]
{"pi", "pi", "pi", "pi"}
It is at least functionally equivalent in this case, but it does not use the exact same regular expression. We can see what regular expression it translates the string pattern into like this:
StringPattern`PatternConvert["[\s\n]+(pi)[\s\n]+"] // First
"(?ms)\[\\s\\n\]\+\(pi\)\[\\s\\n\]\+"
(Mathematica threw in a couple of extra backslashes for good measure upon copying the pattern.)
user1066 has identified issues with the regex solution. First, it doesn't work if the string starts or ends with a pi. Second, it doesn't work if there are more than two spaces.
One possible way to patch the solution to work for these cases is:
StringCases[
StringReplace[sample2, " " .. -> " "], {
RegularExpression["\\s+(pi)\\s+"] -> "$1",
RegularExpression["^(pi)\\s+"] -> "$1",
RegularExpression["\\s+(pi)$"] -> "$1"
},
Overlaps -> True
]
user1066 found the following solution which neatly packs these patterns into one regex:
StringCases[
sample2,
RegularExpression["(?i)(^|\\s)(pi)($|\\s)"] -> "$2",
Overlaps -> True
]
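One way to check the two edge cases, and to see why Overlaps -> True is still needed, is a string that both starts and ends with pi. This is only a sketch: the test string is an assumption chosen for illustration, and the outputs shown are what the pattern should give, since each match also consumes the whitespace that follows it.

```wolfram
(* With overlaps allowed, the space consumed by one match can start the next *)
StringCases["pi pi pi", RegularExpression["(?i)(^|\\s)(pi)($|\\s)"] -> "$2", Overlaps -> True]
(*{"pi", "pi", "pi"}*)

(* With the default Overlaps -> False, the middle pi is skipped *)
StringCases["pi pi pi", RegularExpression["(?i)(^|\\s)(pi)($|\\s)"] -> "$2"]
(*{"pi", "pi"}*)
```

Note that (?i) makes the match case-insensitive, so "Pi" and "PI" would be returned as well.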
Answered by C. E. on October 22, 2021