Data Science: asked by rw2 on April 27, 2021
I’m fairly new to predictive modelling, so apologies if this is a stupid question.
I am working on a classification problem (predicting if customers commit fraud or not), and have been comparing a few different algorithms, as well as tuning some of those algorithms.
As part of this I want to be able to compare how well each model (or version of each model) predicts on unseen data with known response values.
My response is quite imbalanced, with a ratio of about 99:1 between the two binary outcomes. I haven't used up/down sampling, as I have a very large training dataset, so there are still plenty of examples of the less common outcome.
My question is about which metrics of model performance to use. I've read a fair bit about this and have looked at various measures, including the confusion matrix, sensitivity, specificity, kappa, etc.
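For concreteness, here is a minimal sketch of how those metrics can be computed on held-out data with known labels. This assumes Python and scikit-learn, which the question does not specify, and the label arrays are purely illustrative.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score, cohen_kappa_score

# Hypothetical held-out labels and predictions: 1 = fraud, 0 = not fraud.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
y_pred = np.array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0])

# Confusion matrix unpacked into its four cells.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = recall_score(y_true, y_pred)   # TP / (TP + FN)
specificity = tn / (tn + fp)                 # TN / (TN + FP)
kappa = cohen_kappa_score(y_true, y_pred)    # agreement beyond chance

print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} kappa={kappa:.2f}")
```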
My main aim is a model that predicts the correct number of fraud cases. After that, it would obviously be better if the cases predicted as fraud are true positives. The latter I think is measured by precision (the proportion of predicted fraud cases that are actually fraud). For the former, I've so far just compared the number of cases the model predicts as fraud with the actual number of fraud cases.
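A sketch of those two checks (again assuming Python / scikit-learn; the arrays are hypothetical): how close the predicted fraud count is to the actual count, and what fraction of the predicted fraud cases are true positives.

```python
import numpy as np
from sklearn.metrics import precision_score

y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])   # known outcomes on held-out data
y_pred = np.array([0, 0, 0, 1, 0, 0, 0, 1, 0, 0])   # model predictions

predicted_fraud = y_pred.sum()    # number of cases the model flags as fraud
actual_fraud = y_true.sum()       # number of cases that really are fraud
count_ratio = predicted_fraud / actual_fraud

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)

print(f"predicted={predicted_fraud} actual={actual_fraud} "
      f"ratio={count_ratio:.2f} precision={precision:.2f}")
```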
My question is whether there is a metric that combines these two measures. In my reading I haven't found anything that quite does this, but I wondered if others knew of an existing measure.