TransWikia.com

Is it good practice to remove the numeric values from the text data during preprocessing?

Data Science Asked on January 20, 2021

I'm preprocessing a text dataset that contains several kinds of numeric values, such as:

  • dates (1st July)
  • years (2019)
  • tentative values (3-5 years / 10+ advantages)
  • unique values (room no. 31 / user rank 45)
  • percentages (100%)

Is it recommended to discard these numeric values before building a vectorizer (BoW/TF-IDF) for model development (classification/regression)?

Any quick help on this is much appreciated. Thank you.

One Answer

Is it recommended to discard these numeric values before building a vectorizer (BoW/TF-IDF) for model development (classification/regression)?

It depends on the problem statement. For example, the year can be significant if you want to find a trend and it takes many unique values across the dataset. If it is constant, you can remove it, since a value that never changes cannot help the model tell examples apart.

In addition, if you are doing sentiment analysis, numeric values usually don't carry much signal, so they can typically be dropped.
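As a rough illustration of the two choices, here is a minimal sketch using only Python's `re` module. The function names and the placeholder tokens (`numtoken`, `yeartoken`, `pcttoken`) are hypothetical, chosen for this example only, and the year pattern assumes years fall between 1900 and 2099. The first function discards every token that contains a digit; the second keeps the fact that a number was present by replacing it with a generic token, which can be a middle ground when you are unsure whether the numbers matter.

```python
import re

def drop_numbers(text):
    """Remove every whitespace-delimited token that contains a digit."""
    text = re.sub(r"\S*\d\S*", " ", text)
    return " ".join(text.split())  # collapse leftover whitespace

def mask_numbers(text):
    """Replace numbers with placeholder tokens (names are illustrative)."""
    # Percentages first, so "100%" is not split into a number plus "%".
    text = re.sub(r"\d+(?:\.\d+)?\s*%", " pcttoken ", text)
    # Four-digit years; assumes the range 1900-2099.
    text = re.sub(r"\b(?:19|20)\d{2}\b", " yeartoken ", text)
    # Any remaining number, including ordinals such as "1st".
    text = re.sub(r"\d+(?:\.\d+)?(?:st|nd|rd|th)?", " numtoken ", text)
    return " ".join(text.split())

sample = "Joined on 1st July 2019 with 100% attendance in room 31"
dropped = drop_numbers(sample)
masked = mask_numbers(sample)
```

Either output can then be passed to a BoW or TF-IDF vectorizer. Comparing validation scores with and without the numbers is a simple way to check which option suits a given task.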

Answered by prashant0598 on January 20, 2021
