TransWikia.com

Confusion Matrix

Data Science Asked by Mugdha Bhatnagar on July 7, 2021

Here is my question in my assignment:

You have built a classification model with 90% accuracy but your client is not happy
because False Positive rate was very high then what will you do?

This is the question..nothing is given in the background

3 Answers

I think the only general solution would be to: increase the threshold of the model confidence.

For example you are doing binary classification of dog in images: Dog = 1, No Dog = 0

Generally a model (like Neural Network) would output the probability of the image being 1: if it's > 0.5 then predicts 1 else 0. Increasing the confidence to 0.7 would decrease the False Positive.

Answered by Francesco Pegoraro on July 7, 2021

This is likely to be caused by an imbalanced dataset. It means that some observations are less numerous, therefore your model is not learning enough about them. A possible solution might be: use Mini-Batch Gradient Descent optimization, and build the mini-batches in a way that the number of observations is balanced across all the classes. This would attribute greater weight to the observations that are less frequent.

Answered by Leevo on July 7, 2021

In my opinion, we should not consider the only accuracy as a performance measure as it evaluates only true positive, true Negative, and the sum total of a model. We have many performance measures like recall, precision, and f1-score. Now, coming to this question statement the classification model with 90% accuracy having a high false-positive rate. First of all False positive rate is a parameter of error metric derived from the confusion matrix. The confusion matrix depends on distinct respective model. Thus, each classification model will have different confusion matrix which turns out to have different False positive rate may be low or high as compared to the previous model. Thus, here we can go for various classification models available like logistics regression, Decision Tree, Neural networks, Random Forest, etc, and check false positive rates using confusion matrix for each of the models. In comparison we can conclude which machine learning model or statistical model is the best fit having high accuracy and lowest possible false-positive rate. A Machine learning paradigm known as ensemble learning can also be used in this condition. Ensemble learning is nothing but the group of different types of machine learning models developed using the same training dataset (some feature may or may not differ in the dataset). Ensemble learning is implemented in a technique known as bagging or Bootstrap Aggregating in which several models are trained on a dataset and the mean of the output is taken for the test dataset output by each model. Random forest is one such ensemble learning technique that aggregates the output of several decision trees to get the most appropriate result.  

Answered by user102799 on July 7, 2021

Add your own answers!

Ask a Question

Get help from others!

© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP