Data Science Asked by Born New on December 29, 2020
I have several samples of different sizes. The instances of each sample have different features from the instances of the other samples. For each sample I train my model and test it on 30% of the data held out as unseen data. After obtaining the performance results, I would like to choose the best sample, i.e. the one that predicts properly (in other words, the one whose features are more predictive than the others'). The problem with comparing their performance is that the test sets are not the same size. I don't know whether calculating 95% confidence intervals can lead to a conclusion, for example by choosing the model whose confidence interval has the highest lower bound.
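A confidence interval on accuracy does account for test-set size: smaller test sets give wider intervals. Below is a minimal sketch using the Wilson score interval for a proportion; the correct/total counts for the two hypothetical models are made up for illustration, and 1.96 is the usual z value for a 95% interval.

```python
import math

def wilson_ci(correct, n, z=1.96):
    """95% Wilson score interval for a proportion such as accuracy."""
    if n == 0:
        raise ValueError("empty test set")
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical results: model A scores 85% on 60 instances,
# model B scores 82% on 300 instances.
lo_a, hi_a = wilson_ci(correct=51, n=60)
lo_b, hi_b = wilson_ci(correct=246, n=300)

# B's interval is much narrower because its test set is larger,
# so the two intervals can overlap even though A's point accuracy
# is higher -- the comparison is then inconclusive.
```

Picking the model with the highest lower bound is a reasonable conservative rule, but overlapping intervals mean the data do not clearly separate the models.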
Thanks in advance.
Your question is unclear. If you're saying that you have datasets with different attributes then this answer is not relevant.
If you're saying that you've grouped your dataset by some similarity measure into different groups and then split every group into test and train subgroups then yes, you may have overfitting.
However, if you build your test set from the COMPLETE dataset, so that it mixes elements from the original groups in proportion to their sizes, and then test your model against that test set, your results should represent your complete dataset properly.
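The proportional test set described above is what a stratified split produces (scikit-learn's `train_test_split(..., stratify=y)` does this directly). A minimal standard-library sketch, with made-up class labels and a 30% test fraction:

```python
import random
from collections import Counter

def stratified_split(labels, test_frac=0.30, seed=0):
    """Split indices into train/test, keeping each class's proportion
    in the test set the same as in the complete dataset."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        k = round(len(idxs) * test_frac)
        test.extend(idxs[:k])
        train.extend(idxs[k:])
    return train, test

# Hypothetical dataset: 70 instances of class "a", 30 of class "b".
labels = ["a"] * 70 + ["b"] * 30
train, test = stratified_split(labels)
```

With this split the test set holds 30% of each class (21 "a" and 9 "b"), so it mirrors the full dataset's class balance.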
Answered by DGoiko on December 29, 2020