Data Science Asked by Born New on December 29, 2020
I have several samples of different sizes. The instances of each sample have different features from the instances of the other samples. For each sample I train my model and test it on 30% of the data held out as unseen data. After obtaining the performance results, I would like to choose the best sample, i.e. the one that predicts properly (in other words, the one whose features are more predictive than the others'). The problem with comparing their performance is that the test sets are not the same size. I don't know whether calculating 95% confidence intervals can lead to a conclusion, for example by choosing the model whose confidence interval has the highest lower bound.
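A confidence interval on accuracy does account for test-set size: smaller test sets give wider intervals. Below is a minimal sketch using the Wilson score interval for a proportion; the correct/total counts for the two hypothetical models are made up for illustration, and 1.96 is the usual z value for a 95% interval.

```python
import math

def wilson_ci(correct, n, z=1.96):
    """95% Wilson score interval for a proportion such as accuracy."""
    if n == 0:
        raise ValueError("empty test set")
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical results: model A scores 85% on 60 instances,
# model B scores 82% on 300 instances.
lo_a, hi_a = wilson_ci(correct=51, n=60)
lo_b, hi_b = wilson_ci(correct=246, n=300)

# B's interval is much narrower because its test set is larger,
# so the two intervals can overlap even though A's point accuracy
# is higher -- the comparison is then inconclusive.
```

Picking the model with the highest lower bound is a reasonable conservative rule, but overlapping intervals mean the data do not clearly separate the models.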
Thanks in advance.
Your question is unclear. If you're saying that you have datasets with different attributes then this answer is not relevant.
If you're saying that you've grouped your dataset by some similarity measure into different groups and then split every group into test and train subgroups then yes, you may have overfitting.
However, if you build your test set from the COMPLETE dataset, so that it mixes elements from the original groups in proportion to their sizes, and then test your model against that test set, your results should represent your complete dataset properly.
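The proportional test set described above is what a stratified split produces (scikit-learn's `train_test_split(..., stratify=y)` does this directly). A minimal standard-library sketch, with made-up class labels and a 30% test fraction:

```python
import random
from collections import Counter

def stratified_split(labels, test_frac=0.30, seed=0):
    """Split indices into train/test, keeping each class's proportion
    in the test set the same as in the complete dataset."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        k = round(len(idxs) * test_frac)
        test.extend(idxs[:k])
        train.extend(idxs[k:])
    return train, test

# Hypothetical dataset: 70 instances of class "a", 30 of class "b".
labels = ["a"] * 70 + ["b"] * 30
train, test = stratified_split(labels)
```

With this split the test set holds 30% of each class (21 "a" and 9 "b"), so it mirrors the full dataset's class balance.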
Answered by DGoiko on December 29, 2020