Relating ROC curves with class statistics

Question

I have three neural network models that I run on the same dataset (7 classes), and for each of them I compute per-class performance statistics and ROC curves. The first model is a 4-layer network with 8 neurons in each layer, the second is a 3-layer network with 32 nodes in each layer, and the last is a 2-layer network with 64 nodes in each layer. The class statistics and ROC curves for each network are shown below:

4x8 Network:

              precision    recall  f1-score   support
         0.0      0.999     1.000     1.000     86582
         1.0      0.688     0.494     0.575      1732
         2.0      0.490     0.266     0.345       267
         3.0      0.929     0.955     0.942      8878
         4.0      0.000     0.000     0.000        70
         5.0      0.155     0.726     0.256       117
         6.0      0.740     0.520     0.611       148

    accuracy                          0.983     97794
   macro avg      0.572     0.566     0.533     97794
weighted avg      0.984     0.983     0.983     97794

3x32 Network:

              precision    recall  f1-score   support
         0.0      0.999     1.000     0.999     86582
         1.0      0.690     0.622     0.654      1732
         2.0      0.547     0.367     0.439       267
         3.0      0.929     0.960     0.944      8878
         4.0      0.000     0.000     0.000        70
         5.0      0.330     0.325     0.328       117
         6.0      0.667     0.338     0.448       148

    accuracy                          0.985     97794
   macro avg      0.595     0.516     0.545     97794
weighted avg      0.984     0.985     0.984     97794

2x64 Network:

              precision    recall  f1-score   support
         0.0      0.999     1.000     0.999     86582
         1.0      0.689     0.641     0.664      1732
         2.0      0.411     0.139     0.207       267
         3.0      0.932     0.957     0.944      8878
         4.0      0.000     0.000     0.000        70
         5.0      0.241     0.453     0.315       117
         6.0      0.800     0.378     0.514       148

    accuracy                          0.985     97794
   macro avg      0.582     0.510     0.520     97794
weighted avg      0.984     0.985     0.984     97794

Looking at the ROC graphs, I conclude that the 2x64 network is superior to the other two in all classes, but from the tables, considering the F1 statistics, I prefer the 3x32 network because it performs better in most classes. The AUC statistic is almost always 1 for all classes except class 4 in the 3x32 network, which doesn't make sense to me considering the wide range of precision and recall values that I get (class 4 also has zero precision and recall in all models). In short, I find the F1 statistics much clearer than ROC, and I cannot relate the two concepts to each other, but I think there should be a unified explanation.

Rishabh Sharma · Answer

I don't want to offend you in any way, but what you describe above is not really a problem; it is an opinion on ROC vs F1 score, and I guess you want other people's opinions on the same.
I would strongly suggest asking to-the-point questions next time, since you can get banned if your questions get downvoted (I am not downvoting, so don't worry; I am just trying to be informative).
Now, from what I was able to grasp from your post, I think you are having difficulty choosing the right metric for evaluating the three neural net models you have shown results for.
AUC vs F1:
In general, the ROC curve covers many different threshold levels, so it corresponds to many F score values. An F1 score applies to one particular point on the ROC curve.
You may think of F1 as a measure of precision and recall at a particular threshold value, whereas AUC is the area under the whole ROC curve. For the F score to be high, both precision and recall must be high.
Consequently, when you have a data imbalance between positive and negative samples, you should use the F1 score: ROC AUC averages over all possible thresholds, and its false positive rate is measured against the large pool of negatives, so it can stay close to 1 even when precision on a rare class is poor.
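To make the threshold point concrete, here is a minimal sketch in plain Python. The scores are synthetic and made up for illustration (they are not taken from the models in the question), and `roc_auc` and `f1_at_threshold` are hypothetical helper names, not library functions. It builds a rare positive class (10 positives against 1000 negatives) and compares the single AUC number with F1 at two different thresholds.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold):
    """F1 of the positive class when predicting positive for score >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Synthetic, imbalanced one-vs-rest problem (assumption: invented scores).
neg_scores = [0.6 * i / 1000 for i in range(1000)]  # 1000 negatives, scores below 0.6
pos_scores = [0.55 + 0.04 * j for j in range(10)]   # 10 positives, scores from 0.55 up
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores

auc = roc_auc(labels, scores)
f1_low = f1_at_threshold(labels, scores, 0.5)   # many negatives pass this threshold
f1_high = f1_at_threshold(labels, scores, 0.6)  # no negatives pass this threshold

print(f"AUC          = {auc:.3f}")
print(f"F1 at t=0.5  = {f1_low:.3f}")
print(f"F1 at t=0.6  = {f1_high:.3f}")
```

With these made-up scores the AUC stays near 1 because almost every positive outranks almost every negative, yet at the 0.5 threshold the few positives are swamped by false positives and F1 collapses. Moving the threshold to 0.6 changes F1 drastically while the AUC is unchanged, which is one way a near-perfect AUC and a poor per-class F1 can coexist.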
Conclusion:
So, in your case particularly, since I can clearly see from the precision and recall values that your classes are highly imbalanced, use the F1 score rather than the ROC AUC score to choose the model.
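As a sketch of that selection step, the per-class F1 values from the three reports in the question can be averaged without weighting (macro F1), so that the rare classes count as much as class 0. The dictionary layout and variable names are my own; the numbers are copied from the tables in the question.

```python
# Per-class F1 scores (classes 0 to 6), copied from the reports in the question.
f1_per_class = {
    "4x8":  [1.000, 0.575, 0.345, 0.942, 0.000, 0.256, 0.611],
    "3x32": [0.999, 0.654, 0.439, 0.944, 0.000, 0.328, 0.448],
    "2x64": [0.999, 0.664, 0.207, 0.944, 0.000, 0.315, 0.514],
}

# Macro F1: unweighted mean over classes, so every class counts equally
# regardless of its support.
macro_f1 = {
    name: sum(values) / len(values) for name, values in f1_per_class.items()
}

# Model with the highest macro F1.
best_model = max(macro_f1, key=macro_f1.get)

for name, value in macro_f1.items():
    print(f"{name}: macro F1 = {value:.3f}")
print("highest macro F1:", best_model)
```

This reproduces the macro avg F1 rows of the reports (0.533, 0.545, 0.520) and picks the 3x32 network, which agrees with the preference stated in the question. Whether macro F1 is the right criterion depends on how much the rare classes matter for your application.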
