Data Science Asked by Meysam on August 22, 2020
I compute TF-IDF features for a training corpus in Python with something like TfidfVectorizer. The test data contains some features that do not appear in the training data (every word of the test corpus is a feature). Because of this, the train and test matrices have different shapes (the number of columns is not the same), and the program raises an error.
How should I solve this problem? How should I handle unseen features in the test set?
It depends on the problem. There is no single answer to it.
Things you can do:
Correct answer by prashant0598 on August 22, 2020
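For the shape mismatch described in the question, one common approach (not necessarily what the answer above had in mind) is to fit the vectorizer on the training corpus only and then reuse that same fitted object to transform the test corpus. The sketch below assumes scikit-learn's TfidfVectorizer; the toy documents and variable names are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpora (hypothetical data, for illustration only).
train_docs = ["the cat sat on the mat", "the dog chased the cat"]
test_docs = ["the bird sat on the fence"]  # "bird" and "fence" never occur in training

vectorizer = TfidfVectorizer()

# Learn the vocabulary and IDF weights from the training corpus only.
X_train = vectorizer.fit_transform(train_docs)

# Reuse the fitted vectorizer: call transform(), not fit_transform(), on test data.
# Words outside the learned vocabulary are dropped, so the column count matches.
X_test = vectorizer.transform(test_docs)

print(X_train.shape, X_test.shape)
print("bird" in vectorizer.vocabulary_)
```

With this pattern the test matrix always has the same columns as the training matrix, at the cost of discarding words the model never saw. Calling fit_transform a second time on the test data, or fitting a separate vectorizer for it, is what typically produces the differing column counts.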