Data Science Asked by fricadelle on December 23, 2020
I want to compute the area under the ROC curve for a logistic regression model in a binary classification setting. To do so, I computed the TPR and FPR for a list of thresholds, say 0.1, 0.2, …, 0.9, so I now have
(TPR_threshold1, ..., TPR_thresholdN)
(FPR_threshold1, ..., FPR_thresholdN)
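A minimal sketch of how per-threshold TPR and FPR values like these might be computed. It assumes NumPy, hypothetical toy labels `y_true` and predicted probabilities `y_score`, and the convention that a sample is predicted positive when its score is >= the threshold. The helper name `tpr_fpr_at_thresholds` is hypothetical.

```python
import numpy as np


def tpr_fpr_at_thresholds(y_true, y_score, thresholds):
    """Hypothetical helper: return (tprs, fprs), predicting positive when score >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = y_true.sum()     # number of actual positives
    n_neg = (~y_true).sum()  # number of actual negatives
    tprs, fprs = [], []
    for t in thresholds:
        y_pred = y_score >= t
        tp = np.sum(y_pred & y_true)   # positives correctly flagged
        fp = np.sum(y_pred & ~y_true)  # negatives wrongly flagged
        tprs.append(tp / n_pos)
        fprs.append(fp / n_neg)
    return np.array(tprs), np.array(fprs)


# Hypothetical toy data (labels and predicted probabilities)
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.25, 0.9]
thresholds = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
tprs, fprs = tpr_fpr_at_thresholds(y_true, y_score, thresholds)
print(tprs)
print(fprs)
```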
Can I just compute
TPR_threshold1*FPR_threshold1 + ... + TPR_thresholdN*FPR_thresholdN
to get the area under the ROC curve, or do I need something more elaborate?
Thanks a lot!
The thresholds themselves don't matter; what matters are the (FPR, TPR) values at those thresholds, because those are the points on the curve. Sort the points by FPR in ascending order. For this to work, you'll also want to include the points (0,0) and (1,1) in your list, which correspond to thresholds 1 and 0.
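A minimal sketch of assembling the curve points as described: pair each FPR with its TPR, add the two end points, and sort by FPR. It assumes the per-threshold values are plain Python lists of floats; the function name `roc_points` and the sample values are hypothetical.

```python
def roc_points(tprs, fprs):
    """Hypothetical helper: (FPR, TPR) tuples plus (0, 0) and (1, 1), sorted by FPR."""
    points = set(zip(fprs, tprs))  # x = FPR, y = TPR; the set drops exact duplicates
    points.add((0.0, 0.0))  # a threshold above every score: nothing predicted positive
    points.add((1.0, 1.0))  # threshold 0: everything predicted positive
    # Tuples sort by FPR first, then by TPR, so ties in FPR are ordered bottom to top
    return sorted(points)


# Hypothetical per-threshold values
tprs = [1.0, 0.75, 0.5, 0.5, 0.25]
fprs = [0.5, 0.25, 0.25, 0.0, 0.0]
pts = roc_points(tprs, fprs)
print(pts)
# → [(0.0, 0.0), (0.0, 0.25), (0.0, 0.5), (0.25, 0.5), (0.25, 0.75), (0.5, 1.0), (1.0, 1.0)]
```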
You can use a trapezoidal approximation: each successive pair of points defines a trapezoid under the curve, and you just add up their areas.
Take two successive points (FPR1, TPR1) and (FPR2, TPR2). The area of the trapezoid between them is (FPR2 - FPR1) * (TPR1 + TPR2) / 2. Sum that over all successive pairs of points.
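The trapezoid sum above can be sketched in a few lines of plain Python. This assumes the points are already sorted by FPR and include (0,0) and (1,1); the function name `trapezoid_auc` and the sample points are hypothetical.

```python
def trapezoid_auc(points):
    """Sum (FPR2 - FPR1) * (TPR1 + TPR2) / 2 over successive (FPR, TPR) points sorted by FPR."""
    area = 0.0
    for (fpr1, tpr1), (fpr2, tpr2) in zip(points, points[1:]):
        area += (fpr2 - fpr1) * (tpr1 + tpr2) / 2
    return area


# Hypothetical ROC points, already including (0, 0) and (1, 1) and sorted by FPR
points = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.75), (0.5, 1.0), (1.0, 1.0)]
print(trapezoid_auc(points))  # → 0.875
```

For comparison, summing the TPR*FPR products over the same points gives 0 + 0 + 0.1875 + 0.5 + 1.0 = 1.6875, which exceeds 1 and so cannot be an area inside the unit square.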
Of course, most libraries, such as scikit-learn, can compute this for you from this input.
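With scikit-learn, a sketch might look like the following, using hypothetical toy labels and scores. `roc_auc_score` computes the AUC directly from labels and scores, while `roc_curve` followed by `auc` returns the ROC points and applies the trapezoidal rule to them. Note that `roc_curve` uses the distinct score values as thresholds rather than a fixed grid like 0.1, …, 0.9, so its result can differ from a curve built by hand on a coarser grid.

```python
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical toy labels and predicted probabilities
# (e.g. what LogisticRegression.predict_proba(X)[:, 1] would return)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One call: AUC computed directly from labels and scores
auc_direct = roc_auc_score(y_true, y_score)

# Or: get the ROC points first, then apply the trapezoidal rule to them
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc_trapezoid = auc(fpr, tpr)

print(auc_direct, auc_trapezoid)  # → 0.75 0.75
```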
Correct answer by Sean Owen on December 23, 2020