TransWikia.com

Use StringSplit with a Whitelist

Mathematica Asked by M.A. on May 12, 2021

I have strings like these:

animals1 = {"Cow and Milk, extracreamy and Chicken and Egg and Pig and Sheep"};
animals2 = {"Cow and Milk and Chicken and Egg and Pig and Sheep"};

Now I want to use StringSplit with " and " as the delimiter, but I was wondering whether it is possible to define some kind of universal go-list (whitelist) of exceptions that StringSplit should leave intact, such as this one:

whitelist = {"Cow and Milk", "Chicken and Egg", "Cow and Milk, extracreamy"}

So the outcome should be

{"Cow and Milk, extracreamy", "Chicken and Egg", "Pig", "Sheep"} 

and

{"Cow and Milk", "Chicken and Egg", "Pig", "Sheep"} 

respectively. More specifically, the problem is that "Cow and Milk" and "Cow and Milk, extracreamy" are both defined in the same go-list…

2 Answers

There are many ways to do this, among others:

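whitelist = {"Cow and Milk", "Chicken and Egg"};

(* Read the string as a stream; whitelist entries are kept as tokens *)
ReadList[
  StringToStream["Cow and Milk and Chicken and Egg and Pig and Sheep"],
  Word,
  RecordLists -> True,
  RecordSeparators -> whitelist,
  WordSeparators -> "and",
  TokenWords -> whitelist]

Flatten[%]          (* merge the records into one list *)
StringTrim /@ %     (* strip surrounding whitespace *)
DeleteCases[%, ""]  (* drop the empty strings left behind *)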

{{"Cow and Milk", " ", " "}, {"Chicken and Egg", " ", " Pig ", " Sheep"}}
{"Cow and Milk", " ", " ", "Chicken and Egg", " ", " Pig ", " Sheep"}
{"Cow and Milk", "", "", "Chicken and Egg", "", "Pig", "Sheep"}
{"Cow and Milk", "Chicken and Egg", "Pig", "Sheep"}

Another approach:

whitelist = {"Cow and Milk", "Chicken and Egg"};  

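(* Split at whitelist entries, inserting each one wrapped in white[...] *)
StringSplit["Cow and Milk and Chicken and Egg and Pig and Sheep",
  (# -> white[#] &) /@ whitelist]

(* Unwrap the protected entries; split everything else at "and" *)
If[Head[#] === white, #[[1]], StringSplit[#, "and"]] & /@ %

Flatten[%]
StringTrim /@ %
DeleteCases[%, ""]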

{white["Cow and Milk"], " and ", white["Chicken and Egg"], " and Pig and Sheep"}
{"Cow and Milk", {" ", " "}, "Chicken and Egg", {" ", " Pig ", " Sheep"}}
{"Cow and Milk", " ", " ", "Chicken and Egg", " ", " Pig ", " Sheep"}
{"Cow and Milk", "", "", "Chicken and Egg", "", "Pig", "Sheep"}
{"Cow and Milk", "Chicken and Egg", "Pig", "Sheep"}

Answered by andre314 on May 12, 2021

A variation on ciao's suggestion in comments:

StringSplit["Cow and Milk and Chicken and Egg and Pig and Sheep",
  {a : Alternatives @@ whitelist :> a, "and"}] // DeleteCases[" "] // StringTrim

{"Cow and Milk", "Chicken and Egg", "Pig", "Sheep"}

Answered by kglr on May 12, 2021
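
Neither answer uses the full whitelist from the question, where "Cow and Milk" is a prefix of "Cow and Milk, extracreamy". The following is a minimal sketch of one way to handle that overlap, not something taken from the answers above. It assumes that string-pattern alternatives are tried from left to right, so the whitelist is sorted with the longest entries first. The names sortedWhitelist and splitKeeping are hypothetical and introduced only for illustration.

```mathematica
whitelist = {"Cow and Milk", "Chicken and Egg", "Cow and Milk, extracreamy"};

(* Assumption: alternatives are tried in order, so longer entries must come first *)
sortedWhitelist = SortBy[whitelist, -StringLength[#] &];

(* Hypothetical helper: whitelist entries are kept as delimiters that are
   re-inserted into the result; " and " is a delimiter that is dropped *)
splitKeeping[s_String] := DeleteCases[
  StringTrim[
    StringSplit[s, {a : Alternatives @@ sortedWhitelist :> a, " and "}]],
  ""]

splitKeeping /@ {
  "Cow and Milk, extracreamy and Chicken and Egg and Pig and Sheep",
  "Cow and Milk and Chicken and Egg and Pig and Sheep"}
```

The intended result is the pair of lists given in the question. Using " and " with surrounding spaces as the plain delimiter, rather than "and", also avoids splitting inside words that merely contain "and".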
