Data Science Asked on June 25, 2021
I am working with a relatively small (~1000 samples) dataset that has some very messy text data (i.e., lots of missing values and no real structure). I am trying to preprocess it, and going through each sample by hand is a viable option. I think that writing an algorithm to automate the process would actually end up being more tedious and taking longer than just getting my hands dirty with the data.
So, my question is: is it common for data scientists/academics/professionals to preprocess a relatively small dataset by hand if it is the best and quickest option, or is anything done without automation looked down upon?
If you have a really small dataset, say a few hundred samples, it's okay to do the preprocessing by hand. But since you have a thousand samples, it's better to automate the process. You can use the na_values parameter of pandas' read_csv to load placeholder strings such as "???" or "??" as NaN, then replace the NaN values in each column using some statistical measure (in your case, for example, the most frequently occurring word). But if your data wrangling can't reasonably be automated, or automation would take too much time, then you should proceed manually. If it's about meeting a deadline, the manual way works best for really messy data.
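A minimal sketch of that approach, assuming the data lives in a CSV file (the file name messy_data.csv and the exact placeholder tokens are illustrative, not from the question):

```python
import pandas as pd

# Treat placeholder strings as missing values while reading the CSV.
# "messy_data.csv" is a hypothetical file name.
df = pd.read_csv("messy_data.csv", na_values=["???", "??"])

# For each text column, fill missing values with the most frequent value (the mode).
for col in df.select_dtypes(include="object").columns:
    mode = df[col].mode()
    if not mode.empty:  # skip columns that are entirely missing
        df[col] = df[col].fillna(mode.iloc[0])
```

Whether the mode is a sensible fill value depends on the column; for free-form text fields, a dedicated "missing" token is often a safer choice than the most common string.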
Correct answer by Anoop A Nair on June 25, 2021