TransWikia.com

Area under the ROC curve approximation

Data Science Asked by fricadelle on December 23, 2020

I want to compute the area under the ROC curve (AUC) for a logistic regression model used for binary classification. To do so, I computed the TPR and FPR for a list of thresholds, say 0.1, 0.2, …, 0.9. I thus have

 (TPR_threshold1, ... TPR_thresholdN)
 (FPR_threshold1, ... FPR_thresholdN)

Can I just do

TPR_threshold1*FPR_threshold1 + ... + TPR_thresholdN*FPR_thresholdN 

to compute the area under the ROC curve, or do I need some more elaborate mathematical modeling?

Thanks a lot!

One Answer

The thresholds themselves don't matter; what matters are the (FPR, TPR) values at those thresholds, since they are points on the curve. Sort them by ascending FPR. For this to work, you'll also want to include the points (0,0) and (1,1) in your list, corresponding to thresholds 1 and 0.

You can use a trapezoidal approximation, as each successive pair of points defines a trapezoid of area under the curve. You'll just add up the areas.

Let's take two successive points (FPR1, TPR1) and (FPR2, TPR2). The area is (FPR2 - FPR1) * (TPR1 + TPR2) / 2. Just sum that over all successive pairs of points.
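As a minimal sketch of the summation just described, here is a plain-Python version. The function name `trapezoidal_auc` and the sample FPR/TPR values are made up for illustration; they are not from the question. The sketch assumes the two lists are paired by threshold, and it adds the end points (0,0) and (1,1) itself before sorting.

```python
def trapezoidal_auc(fpr, tpr):
    """Approximate the area under the ROC curve with the trapezoidal rule.

    fpr and tpr are equal-length sequences; fpr[i] and tpr[i] must come
    from the same threshold. The order of the thresholds does not matter.
    """
    if len(fpr) != len(tpr):
        raise ValueError("fpr and tpr must have the same length")

    # Add the end points (0, 0) and (1, 1), drop duplicate points,
    # and sort by FPR ascending (ties are broken by TPR).
    points = sorted(set(zip(fpr, tpr)) | {(0.0, 0.0), (1.0, 1.0)})

    area = 0.0
    # Each successive pair of points defines one trapezoid.
    for (fpr1, tpr1), (fpr2, tpr2) in zip(points, points[1:]):
        area += (fpr2 - fpr1) * (tpr1 + tpr2) / 2
    return area


# Hypothetical values, deliberately not sorted by FPR.
fpr = [0.6, 0.1, 0.3]
tpr = [0.9, 0.4, 0.7]
auc = trapezoidal_auc(fpr, tpr)
print(round(auc, 4))  # 0.75
```

With only a handful of thresholds this is a coarse approximation of the true AUC; using more thresholds (ideally every distinct predicted score) tightens it.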

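Of course, most libraries can compute this for you from this input. In scikit-learn, for example, `sklearn.metrics.auc` applies the trapezoidal rule to a set of (FPR, TPR) points, and `sklearn.metrics.roc_auc_score` computes the AUC directly from the true labels and predicted scores.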

Correct answer by Sean Owen on December 23, 2020
