TransWikia.com

Extracting text into columns

Data Science Asked by Bahi8482 on April 29, 2021

I have a set of customer reports, each in an MS Word file. They all follow a similar pattern; for example, they start with Name: –, Age: –, Date: –, etc.

Is there a way to extract particular strings from each file to form a data set?

In Orange, I was able to compile the Word documents into a corpus, which I can display as one column (each report is in one cell). Does Orange have a way to extract strings into columns (for example, the text between “age:” and “gender”)?

One Answer

Maybe you could use the Preprocess Text widget from the Orange3-Text add-on, under Tokenization > Regexp. The source code indicates that it takes a Python regex, so you might be able to use a regular expression pattern such as:

(?ix)        # ignore case, ignore comments and whitespace in this RE
(?<=age:\s)  # preceded by 'age: '
.+           # characters you wish to match
(?=gender:)  # followed by 'gender:'
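
To check the pattern outside Orange, here is a minimal sketch using Python's standard `re` module. It makes a few assumptions that are not part of the original answer: the report texts are made-up samples, the helper name `extract_age` is hypothetical, the flags are passed to `re.compile` instead of being written inline, and the match is made non-greedy with optional whitespace allowed before 'gender:' so that a trailing line break is not captured.

```python
import re

# Same idea as the pattern above, compiled with explicit flags.
AGE_PATTERN = re.compile(
    r"""
    (?<=age:\s)     # preceded by 'age: '
    .+?             # the characters to capture (non-greedy)
    (?=\s*gender:)  # followed by 'gender:' (optional whitespace first)
    """,
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def extract_age(report_text):
    """Return the text between 'Age:' and 'Gender:', or None if absent."""
    match = AGE_PATTERN.search(report_text)
    return match.group(0).strip() if match else None


# Hypothetical report texts, one per Word document.
reports = [
    "Name: Jane Doe\nAge: 42\nGender: F\nDate: 2021-04-29",
    "Name: John Roe\nAge: 37\nGender: M\nDate: 2021-04-30",
]

# One extracted value per report, i.e. one column of the data set.
ages = [extract_age(text) for text in reports]
print(ages)
```

Running the same function with a different pair of labels for each field would give one list per column; whether the Orange widget handles the pattern identically is something to verify in the widget itself.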

Answered by K3---rnc on April 29, 2021
