Data Science Asked by Bahi8482 on April 29, 2021
I have a set of customer reports, each in an MS Word file. They all follow a similar pattern; for example, they start with Name: –, Age: –, Date: –, etc.
Is there a way to extract particular strings from each file to form a data set?
In Orange, I was able to compile the Word documents into a corpus, which I can display as one column (each report is in one cell). Does Orange have a way to extract strings into columns (for example, the text between “age:” and “gender”)?
One option might be the Orange3-Text add-on: in the Preprocess Text widget, choose Tokenization > Regexp. The source code indicates that it takes a Python regex, so you might be able to use a regular expression pattern such as:
(?ix)          # ignore case, ignore comments and whitespace in this RE
(?<=age:\s)    # preceded by 'age: '
.+             # characters you wish to match
(?=gender:)    # followed by 'gender:'
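If the widget does not give you separate columns, the same idea can be tried in plain Python outside Orange. The following is only a minimal sketch, assuming the text of each report has already been read into a string; the sample report, the label list FIELDS, and the helper extract_fields are hypothetical and not part of Orange. It first applies a non-greedy variant of the pattern above, then generalizes it so that each label's value runs up to the next label.

```python
import re

# Non-greedy variant of the pattern above, with flags passed to re.compile.
AGE_RE = re.compile(
    r"""
    (?<=age:\s)      # preceded by 'age: '
    .+?              # the value, as few characters as possible
    (?=\s*gender:)   # followed by optional whitespace and 'gender:'
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical labels; replace with the labels used in the real reports.
FIELDS = ["Name", "Age", "Gender", "Date"]


def extract_fields(text, fields=FIELDS):
    """Return {label: value}; each value runs from its label to the next label or the end."""
    labels = "|".join(re.escape(f) for f in fields)
    pattern = re.compile(
        rf"(?is)\b({labels}):\s*(.*?)\s*(?=\b(?:{labels}):|\Z)"
    )
    # .title() normalizes labels such as 'age' or 'AGE' to 'Age'.
    return {m.group(1).title(): m.group(2) for m in pattern.finditer(text)}


# Made-up sample report, for illustration only.
report = "Name: Jane Doe\nAge: 42\nGender: F\nDate: 2021-04-29\n"

age = AGE_RE.search(report).group(0)
row = extract_fields(report)
rows = [extract_fields(r) for r in [report]]  # one dict per report
```

Each dict in rows is one report, so the list can be passed to pandas.DataFrame(rows) to get one column per label. This assumes that a label followed by a colon never occurs inside a value; if it does, the value would be cut short at that point.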
Answered by K3---rnc on April 29, 2021