Comparing Datasets - Should I use the same test dataset?

Question

I am training a CNN model and want to compare different image datasets. The datasets all have different characteristics (translated or not, rotated or not, etc.).
I do not modify the model between the trainings on the different datasets.
Should I use the same test dataset to compare them? This dataset would not change during testing and would contain data that cannot be found elsewhere. It would not be better suited to any specific training dataset.
Or should I use a test dataset that has the same characteristics as the training dataset, so that I can compare them at their best?
For example, if I want to compare datasets A and B, should I use a combination of test dataset A and test dataset B? Or, when testing dataset A, use test dataset A, and when testing dataset B, use test dataset B?

etiennedm · Answer

Testing a model on a test dataset that shares the characteristics of its training dataset tells you how well suited your hyperparameters are to that kind of data.
Then you can test on another dataset that has other characteristics. This tells you how well the model generalizes.
I would not create a mixed test dataset when the goal is interpretation, as it could hide interesting information. For instance, in case 1 you correctly predict 100% of the samples from test dataset A and only 50% from test dataset B; in case 2 you correctly predict 75% from both. The interpretation is not the same, yet a mixed test dataset with equally sized parts would report the same overall score in both cases. But if you want to compare the two models created from two different training datasets, then yes, compare them on the same test dataset; otherwise the comparison would be biased.
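
To make the 100%/50% versus 75%/75% example concrete, here is a minimal sketch in plain Python. The per-sample outcomes, the test set size of 100, and the names `accuracy`, `report`, `case_1` and `case_2` are all made up for illustration; they are not from a real experiment or library.

```python
def accuracy(correct_flags):
    """Fraction of samples predicted correctly."""
    return sum(correct_flags) / len(correct_flags)


def report(results):
    """Return per-test-set accuracy and accuracy on the pooled (mixed) test set."""
    per_set = {name: accuracy(flags) for name, flags in results.items()}
    pooled = accuracy([flag for flags in results.values() for flag in flags])
    return per_set, pooled


# Hypothetical outcomes (True = correct prediction), 100 samples per test set.
# Case 1: 100% on test A, 50% on test B.
case_1 = {
    "test_A": [True] * 100,
    "test_B": [True] * 50 + [False] * 50,
}
# Case 2: 75% on both test sets.
case_2 = {
    "test_A": [True] * 75 + [False] * 25,
    "test_B": [True] * 75 + [False] * 25,
}

per_set_1, pooled_1 = report(case_1)
per_set_2, pooled_2 = report(case_2)

print("case 1:", per_set_1, "pooled:", pooled_1)
print("case 2:", per_set_2, "pooled:", pooled_2)
```

With these assumed numbers the pooled accuracy comes out identical in both cases, while the per-test-set accuracies differ. That is why it seems safer to keep the test sets fixed and shared across all models, but to report the score on each one separately.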
