Data Science Asked by SMI9 on March 28, 2021
I used a Decision Tree Classifier, which I trained with 50,000 samples. I also have a set of 10,000 unlabeled samples, so I decided to use a self-training algorithm. Is it normal that, after retraining the model with these 10,000 unlabeled samples, the accuracy didn't change and the confusion matrix has the same values? I expected some change (better or worse predictions). Thank you in advance.
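For reference, a minimal sketch of this kind of setup, assuming scikit-learn's DecisionTreeClassifier and SelfTrainingClassifier; the synthetic data is only a stand-in for the real 50,000 labeled and 10,000 unlabeled samples:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Synthetic stand-in data; replace with the real labeled/unlabeled sets.
X, y = make_classification(n_samples=60_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Mark the last 10,000 training samples as unlabeled (-1), as SelfTrainingClassifier expects.
y_semi = y_train.copy()
y_semi[-10_000:] = -1

# Baseline: tree trained only on the labeled part.
baseline = DecisionTreeClassifier(random_state=0)
baseline.fit(X_train[y_semi != -1], y_train[y_semi != -1])

# Self-training: the tree pseudo-labels confident unlabeled samples and retrains.
self_trained = SelfTrainingClassifier(DecisionTreeClassifier(random_state=0))
self_trained.fit(X_train, y_semi)

for name, model in [("baseline", baseline), ("self-training", self_trained)]:
    pred = model.predict(X_test)
    print(name, accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))

# If no unlabeled sample ever crossed the confidence threshold, nothing was
# added to the training set and identical metrics are exactly what to expect.
print("pseudo-labeled samples:", int(np.sum(self_trained.labeled_iter_ > 0)))
```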
Well, that is a bit of a letdown, but: your model has limitations.
If the 50,000 labeled samples form a complete set for your problem, then more data won't be needed or helpful.
What I mean by a complete set is: there are enough samples to form a full-rank correlation matrix in your feature space, i.e. from your samples you can pick a subset that generates every other sample in the feature space by linear combination.
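A rough way to check whether the labeled data already spans the feature space (the matrices below are random placeholders for the real ones):

```python
import numpy as np

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(50_000, 20))    # placeholder for the labeled features
X_unlabeled = rng.normal(size=(10_000, 20))  # placeholder for the unlabeled features

rank_labeled = np.linalg.matrix_rank(X_labeled)
rank_combined = np.linalg.matrix_rank(np.vstack([X_labeled, X_unlabeled]))

print("rank of labeled data:     ", rank_labeled)
print("rank with unlabeled added:", rank_combined)
print("feature-space dimension:  ", X_labeled.shape[1])
# If rank_labeled already equals the number of features, the labeled rows span
# the feature space and every unlabeled sample is a linear combination of them.
```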
Also, while your data might represent everything a decision tree needs to know to classify it in the current feature space, other feature spaces, deeper trees, or other models might still benefit from the extra data.
You might try helping your decision tree by normalizing the data and doing some feature engineering, as in the sketch below.
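A hedged sketch of what that could look like, wrapping scaling and simple interaction features around the tree in a scikit-learn Pipeline (whether it helps depends entirely on the data):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.tree import DecisionTreeClassifier

model = make_pipeline(
    StandardScaler(),                                     # normalize the features
    PolynomialFeatures(degree=2, interaction_only=True),  # add engineered interaction features
    DecisionTreeClassifier(random_state=0),
)
# model.fit(X_labeled, y_labeled) replaces the plain tree; the same pipeline can
# also be passed to SelfTrainingClassifier as its base estimator.
```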
Answered by Pedro Henrique Monforte on March 28, 2021