What does the classification report interpret? Class 1 indicates abnormal data

Question

How should this report be interpreted, and how are the precision and recall values calculated for the individual class labels? What is the significance of macro avg? Does this report indicate good predictions by the model?

Tags: anomaly-detection, class-imbalance, classification, machine-learning, scikit-learn

maya-ami · Answer

The classification report summarizes your model's performance for each class. The 1st row shows the scores for class 0. The 'support' column shows how many objects of class 0 were in the test set. The 2nd row gives the same information for class 1.

Recall for each class is calculated as follows:
True Positives/(True Positives + False Negatives)
E.g., recall for class 1: 15/(15+12) = 0.56 (15 true positives and 12 false negatives, i.e. 27 actual class 1 samples)

Precision for each class is calculated as follows:
True Positives/(True Positives + False Positives)
E.g., precision for class 1: 15/(15+22) = 0.41
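
To check the two formulas by hand, here is a minimal sketch in plain Python. The counts are an assumption on my part: 15 true positives and 22 false positives come from the examples above, and 12 false negatives is the value that reproduces the quoted recall of 0.56 (27 actual class 1 samples in total). The helper name `precision_recall` is hypothetical.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) for one class from its confusion counts."""
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Assumed counts for class 1: 15 TP, 22 FP, 12 FN (27 actual positives).
precision, recall = precision_recall(tp=15, fp=22, fn=12)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=0.41 recall=0.56
```

The same function applies to class 0 if you treat class 0 as the "positive" label and recount TP, FP and FN from that point of view.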

This particular classification report shows that the model performs poorly. Accuracy can be a misleading metric here: although the model shows 99% accuracy, it misses many of the objects of class 1 (I suppose these are the objects of interest, the abnormal data you want to detect).
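
To see why accuracy alone can mislead, here is a small sketch with made-up labels (not the asker's data): 990 normal samples, 10 abnormal ones, and a degenerate "model" that always predicts class 0. It also illustrates the macro avg row, which in scikit-learn's report is the unweighted mean of the per-class scores, so the minority class counts as much as the majority class.

```python
# Hypothetical test set: 990 normal samples (class 0), 10 abnormal (class 1).
y_true = [0] * 990 + [1] * 10
# A degenerate "model" that always predicts the majority class.
y_pred = [0] * 1000


def recall_for(label, y_true, y_pred):
    """Fraction of samples with this true label that were predicted as it."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_0 = recall_for(0, y_true, y_pred)
recall_1 = recall_for(1, y_true, y_pred)
# Macro average: plain mean over classes, ignoring how many samples each has.
macro_recall = (recall_0 + recall_1) / 2

print(f"accuracy={accuracy:.2f}")
print(f"recall class 0={recall_0:.2f}, class 1={recall_1:.2f}")
print(f"macro avg recall={macro_recall:.2f}")
# prints:
# accuracy=0.99
# recall class 0=1.00, class 1=0.00
# macro avg recall=0.50
```

So a model that never detects a single abnormal sample still scores 99% accuracy, while the macro average drops to 0.50 and exposes the problem.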

One reason may be the imbalance of the data, i.e. there are many objects of class 0 and few of class 1. As a result, the classifier becomes biased toward class 0, since predicting the majority class minimizes the error during training. Consider reading about techniques for overcoming class imbalance.

Donald S · Answer

It looks like you are using an imbalanced dataset. The results look poor, with many false negatives and therefore low sensitivity (recall) for class 1.
One additional metric you may want to use is ROC AUC (the area under the ROC curve). It works well for comparing results on imbalanced data.
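
A rough sketch of computing ROC AUC with scikit-learn follows. The data is synthetic (generated with `make_classification`, about 5% class 1) and `LogisticRegression` is only a stand-in, since the question does not say which model or dataset was used.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (about 5% class 1); stands in for the real dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# ROC AUC is computed from scores (here the predicted probability of class 1),
# not from hard 0/1 predictions.
scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, scores)
print(f"ROC AUC = {auc:.3f}")
```

Because the metric uses scores rather than thresholded labels, it lets you compare models without first committing to a decision threshold.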
There are many options for improving your model's sensitivity, such as:

- Using different weights for the majority and minority labels.
- Up-sampling or down-sampling the training data to balance predictions across the minority and majority classes, or using SMOTE for datasets with few features.
- Choosing a modeling algorithm that handles imbalanced data better, such as neural networks, or classical approaches such as bagging, boosting, and/or decision trees.
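
As an illustration of the first option (class weights), here is a sketch using scikit-learn's `class_weight="balanced"` setting, which weights each class inversely to its frequency. The synthetic data and the choice of `LogisticRegression` are assumptions for the example, not the asker's setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (about 5% class 1); stands in for the real dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

recalls = {}
for name, class_weight in [("unweighted", None), ("balanced", "balanced")]:
    # class_weight="balanced" makes errors on the rare class cost more.
    model = LogisticRegression(class_weight=class_weight, max_iter=1000)
    model.fit(X_train, y_train)
    # Recall (sensitivity) for the minority class, label 1.
    recalls[name] = recall_score(y_test, model.predict(X_test), pos_label=1)
    print(f"{name}: recall for class 1 = {recalls[name]:.2f}")
```

Weighting usually raises recall on the minority class at the cost of more false positives, so check precision as well before settling on it.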

Resources:
https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/
https://www.analyticsvidhya.com/blog/2017/03/imbalanced-data-classification/
