Stripping tabs, newlines, and spaces from string output, but leaving one space so that words are not joined

Question

I have a list, list_3, in which each element is a list of strings:
[['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2016\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n'], ['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tMacKenzie T Stout,\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2020\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tYes\t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tPre-Seed\t\t\t\t\t\t\t\n\n']]

I want to use a regex to strip \n, \t and \r from the output and return the text in an easy-to-read format.
This is what I have tried (on the first record):
import re

list_33 = []
for field in list_3[0]:
    # \s+ matches every run of spaces, tabs, newlines and carriage returns
    list_33.append(re.sub(r'\s+', '', field))
print(list_33)

output:
['HeadquartersorRegionalOfficeMainHeadquarters', 'FoundersThomasLonVan', 'FounderDiversityN/A', 'YearFounded2016', '#ofEmployees1-10', 'SeekingFunding?No', 'FundingPhaseN/A']
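Replacing each whitespace run with '' is what glues the words together. A minimal sketch of the "leave one space" half on its own (collapse_whitespace is a hypothetical helper name): replace each run with a single space, then trim the ends. By itself this still adds no colon, because once collapsed the label/value gap looks like any other space.

```python
import re

def collapse_whitespace(text: str) -> str:
    # \s matches spaces, tabs (\t), newlines (\n) and carriage returns (\r);
    # each run becomes a single space, and strip() trims the leading/trailing one
    return re.sub(r'\s+', ' ', text).strip()

raw = '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n'
print(collapse_whitespace(raw))  # Founders Thomas Lon Van
```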

This is almost what I need, but I would like to keep one space between words and put a colon after the first text block (the label) of each string, i.e.:
['Headquarters or Regional Office: Main Headquarters', 'Founders: Thomas Lon Van', 'Founder Diversity: N/A', 'Year Founded: 2016', '# of Employees: 1-10', 'Seeking Funding?: No', 'Funding Phase: N/A']

Any ideas on how I can combine both steps (removing the whitespace and inserting the colon) into one regex?
Thanks
ps. I know that I don't need a for loop for a list with just one element, but in the future the list will have more elements; for now I am trying to generalize the code structure using just one input.
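One way to do both steps with a single pattern, sketched under the assumption (true for the sample data) that every field is: optional padding, a label with no run of two or more whitespace characters, a gap of two or more whitespace characters, the value, optional padding, and that neither label nor value spans a newline. FIELD_RE and format_field are hypothetical names. The two capture groups pick out the label and value, and the surrounding \s* absorbs the padding, so no separate strip is needed.

```python
import re

# Hypothetical pattern: padding, label (ending in a non-space), a gap of
# 2+ whitespace chars, value, padding. '.' does not match '\n', so a label
# or value spanning lines makes fullmatch fail.
FIELD_RE = re.compile(r'\s*(.*?\S)\s{2,}(.*?)\s*')

def format_field(raw: str) -> str:
    m = FIELD_RE.fullmatch(raw)
    if m is None:
        # No label/value gap found: fall back to collapsing whitespace.
        return ' '.join(raw.split())
    return f'{m.group(1)}: {m.group(2)}'

fields = [
    '\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n',
    '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n',
]
print([format_field(f) for f in fields])
```

Because the label group is lazy, it stops at the first gap of two or more whitespace characters, so a single space inside "Headquarters or Regional Office" is kept.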

Prem Anand · Answer

You can iterate through each string in the list and use re.sub to replace every run of two or more whitespace characters with ': ', then strip the ': ' left over at both ends:
>>> import re
>>> lst = ['\n\n\n Headquarters or Regional Office\n\n\n\n\n\t\t\t\t\t\t\t\t\tMain Headquarters\t\t\t\t\t\t\t\n\n', '\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tThomas Lon Van\t\t\t\t\t\t\t\n\n', '\n\n\n Founder Diversity\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n', '\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2016\t\t\t\t\t\t\t\n\n', '\n\n\n # of Employees\n\n\n\n\n\t\t\t\t\t\t\t\t\t1-10\t\t\t\t\t\t\t\n\n', '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n', '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tN/A\t\t\t\t\t\t\t\n\n']
>>> [re.sub(r'\s\s+', ': ', word).strip(': ') for word in lst]
['Headquarters or Regional Office: Main Headquarters', 'Founders: Thomas Lon Van', 'Founder Diversity: N/A', 'Year Founded: 2016', '# of Employees: 1-10', 'Seeking Funding?: No', 'Funding Phase: N/A']
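The same expression extends to the whole nested list_3 from the question (a list of records, each a list of strings) with a nested comprehension. This is a sketch: clean_records is a hypothetical helper name, and the list_3 below is a trimmed-down sample of the question's data.

```python
import re

def clean_records(records):
    # For every field of every record: turn each run of 2+ whitespace chars
    # into ': ', then trim the stray ':' and ' ' characters left at both ends.
    return [
        [re.sub(r'\s\s+', ': ', field).strip(': ') for field in record]
        for record in records
    ]

list_3 = [
    ['\n\n\n Year Founded\n\n\n\n\n\t\t\t\t\t\t\t\t\t2016\t\t\t\t\t\t\t\n\n',
     '\n\n\n Seeking Funding?\n\n\n\n\n\t\t\t\t\t\t\t\t\tNo \t\t\t\t\t\t\t\n\n'],
    ['\n\n\n Founders\n\n\n\n\n\t\t\t\t\t\t\t\t\tMacKenzie T Stout,\t\t\t\t\t\t\t\n\n',
     '\n\n\n Funding Phase\n\n\n\n\n\t\t\t\t\t\t\t\t\tPre-Seed\t\t\t\t\t\t\t\n\n'],
]
for record in clean_records(list_3):
    print(record)
```

One caveat: str.strip(': ') removes any mix of ':' and ' ' characters from both ends, not the literal string ': ', so a value that genuinely ended in a colon would lose it.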
