Data Science Asked by Shahriar49 on December 19, 2020
I have three neural net models that I am running on the same dataset (of 7 classes), and I calculate their per-class performance and ROC curves. The first model is a 4-layer network with 8 neurons in each layer, the second is a 3-layer network with 32 nodes each, and the last is a 2-layer network with 64 nodes each. The class statistics and ROC curves for each network are shown below:
4×8 Network:
              precision    recall  f1-score   support
         0.0      0.999     1.000     1.000     86582
         1.0      0.688     0.494     0.575      1732
         2.0      0.490     0.266     0.345       267
         3.0      0.929     0.955     0.942      8878
         4.0      0.000     0.000     0.000        70
         5.0      0.155     0.726     0.256       117
         6.0      0.740     0.520     0.611       148
    accuracy                          0.983     97794
   macro avg      0.572     0.566     0.533     97794
weighted avg      0.984     0.983     0.983     97794
3×32 Network:
              precision    recall  f1-score   support
         0.0      0.999     1.000     0.999     86582
         1.0      0.690     0.622     0.654      1732
         2.0      0.547     0.367     0.439       267
         3.0      0.929     0.960     0.944      8878
         4.0      0.000     0.000     0.000        70
         5.0      0.330     0.325     0.328       117
         6.0      0.667     0.338     0.448       148
    accuracy                          0.985     97794
   macro avg      0.595     0.516     0.545     97794
weighted avg      0.984     0.985     0.984     97794
2×64 Network:
              precision    recall  f1-score   support
         0.0      0.999     1.000     0.999     86582
         1.0      0.689     0.641     0.664      1732
         2.0      0.411     0.139     0.207       267
         3.0      0.932     0.957     0.944      8878
         4.0      0.000     0.000     0.000        70
         5.0      0.241     0.453     0.315       117
         6.0      0.800     0.378     0.514       148
    accuracy                          0.985     97794
   macro avg      0.582     0.510     0.520     97794
weighted avg      0.984     0.985     0.984     97794
Looking at the ROC graphs, I conclude that the 2×64 network is superior in all classes compared to the other two, but from the tables and the F1 statistics I prefer the 3×32 network, as it performs better in most classes. The AUC statistic is almost always 1 for all classes except class 4 in the 3×32 network, which doesn't make sense to me given the wide range of precision and recall values that I get (class 4 also has zero precision and recall in all models). In short, I find the F1 statistics much clearer than the ROC curves, and I cannot relate the two concepts to each other, but I think there should be a unified explanation.
I don't mean to offend you in any way, but what you describe above is not really a problem; it is an opinion on ROC vs. F1 score, and I guess you want other people's opinions on the same.
I would strongly suggest asking to-the-point questions next time, since questions that get downvoted can get you banned (I am not downvoting, so don't worry); I am just trying to be informative.
Now, from what I was able to grasp from your post, I think you are having difficulty choosing the right metric for evaluating the three neural net models you have shown results for.
In general, a ROC curve is traced out by sweeping the decision threshold over many levels, so a single curve corresponds to many different F-score values. An F1 score belongs to one particular point on the ROC curve, i.e. to one particular threshold.
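To make the point above concrete, here is a minimal sketch in plain Python. The scores and labels are made up for illustration (they are not from the question's models), and `roc_and_f1_at` is a hypothetical helper name. It shows that each threshold gives exactly one ROC point (FPR, TPR) and one F1 value.

```python
def roc_and_f1_at(threshold, scores, labels):
    """Return (fpr, tpr, f1) when predicting positive for score >= threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0   # recall, the ROC y-axis
    fpr = fp / (fp + tn) if fp + tn else 0.0   # the ROC x-axis
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return fpr, tpr, f1


# Made-up classifier scores and true binary labels (illustration only).
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 0, 1, 0]

# Three thresholds -> three ROC points, each with its own F1.
points = {t: roc_and_f1_at(t, scores, labels) for t in (0.9, 0.5, 0.15)}
for t, (fpr, tpr, f1) in points.items():
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}  F1={f1:.2f}")
```

On this toy data the three thresholds give F1 values of 0.40, 0.75 and about 0.73, while the ROC curve is just the set of (FPR, TPR) pairs collected over all thresholds.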
You may think of F1 as a summary of precision and recall at one particular threshold value, whereas AUC is the area under the whole ROC curve. For the F score to be high, both precision and recall must be high.
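As a quick check of that last statement, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it for a few rows of the 4×8 table in the question; `f1_from` is a hypothetical helper name, and because the inputs are the rounded table values, the result can differ from the reported F1 in the third decimal.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the 4x8 network table.
rows = {
    "class 1": (0.688, 0.494, 0.575),
    "class 4": (0.000, 0.000, 0.000),
    "class 5": (0.155, 0.726, 0.256),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1={f1_from(p, r):.3f}  reported F1={reported:.3f}")
```

Class 5 is the instructive row: recall is fairly high (0.726), but precision is low (0.155), so the harmonic mean stays close to the smaller of the two.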
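Consequently, when you have a data imbalance between positive and negative samples, F1-score is usually the more informative choice: AUC-ROC summarizes all possible thresholds, and its false positive rate is computed over the many negative samples, so it can stay close to 1 even when precision at the threshold you actually use is poor.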
So in your case in particular, since I can clearly see from the precision and recall values that your classes are highly imbalanced, use the F1 score rather than the AUC-ROC score to choose the model.
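The effect of imbalance can be reproduced with a small synthetic example. This is only a sketch under stated assumptions: the data are invented (1000 negatives, 10 positives), `auc` and `best_f1` are hypothetical helper names, and AUC is computed directly as the probability that a random positive outscores a random negative.

```python
def auc(scores, labels):
    """AUC-ROC as P(random positive scores above random negative); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_f1(scores, labels):
    """Largest F1 over all thresholds (predict positive when score >= threshold)."""
    pairs = list(zip(scores, labels))
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in pairs if s >= t and y == 1)
        fp = sum(1 for s, y in pairs if s >= t and y == 0)
        fn = sum(1 for s, y in pairs if s < t and y == 1)
        if tp:
            best = max(best, 2 * tp / (2 * tp + fp + fn))
    return best


# Synthetic, heavily imbalanced data (illustration only):
# negatives spread evenly over [0, 1), positives clustered near the top.
neg_scores = [i / 1000 for i in range(1000)]
pos_scores = [0.9005 + 0.01 * k for k in range(10)]
scores = neg_scores + pos_scores
labels = [0] * len(neg_scores) + [1] * len(pos_scores)

print(f"AUC-ROC = {auc(scores, labels):.3f}")
print(f"best F1 = {best_f1(scores, labels):.3f}")
```

Here the AUC comes out at about 0.95, yet even the best threshold gives an F1 of only about 0.17, because the few dozen negatives that outscore the positives are a tiny fraction of all negatives but still outnumber the positives.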
Answered by Rishabh Sharma on December 19, 2020