Classification report question

Question

I need some help to interpret the 2 classification reports of the same logistic regression. The only difference between them is the size of test_size. Even though my second classification report has a much higher f1-score, because of its low test_size (0.1), it is an inferior algorithm than my first classification report?
I am not wholly sure how to interpret the relationship between the f1-score and support values. But from what I understand, the support values should all be close to each other and f1-score be as close to 1 as possible. In that case does that make the algorithm I developed a bad one then?

Sharare Zehtabian · Answer

Support is the number of occurrences of each label in the ground truth. For example in the results with test size=0.1, class P has only 3 samples. Based on this, if the support values are not close to each other, it only means that your data is unbalanced. This documentation might be helpful.

Answered by Sharare Zehtabian on February 12, 2021

Brian Spiering · Answer

You are misusing the idea of "test" dataset in machine learning. A test dataset should only be used once. You are re-using the test dataset and changing the modeling choices based on that re-use. This is an example of data leakage which will lower the generalization performance.
Additionally, the number of samples for the P group (3 and 21 respectively) are too small draw any meaningful conclusions. A small change with that few samples will greatly change the results, as evidence in those two reports. There needs to be much larger sample size to meaningfully apply machine learning techniques.

Classification report question

2 Answers

Add your own answers!

Ask a Question