Area under the ROC curve approximation

Question

I want to compute the area under the ROC curve for a logistic regression model used for binary classification. To do that, I computed the TPR and FPR for a list of thresholds, say 0.1, 0.2, ..., 0.9. I thus have
 (TPR_threshold1, ... TPR_thresholdN)
 (FPR_threshold1, ... FPR_thresholdN)

Can I just do
TPR_threshold1*FPR_threshold1 + ... + TPR_thresholdN*FPR_thresholdN

in order to compute the area under the ROC curve, or do I need some more elaborate mathematical modeling?
Thanks a lot!

Sean Owen · Accepted Answer

The thresholds themselves don't matter; what matters are the (FPR, TPR) values at those thresholds, since they are points on the curve. Sort them by FPR ascending. For this to work out, you'll want to include the points (0,0) and (1,1) in your list, corresponding to thresholds 1 and 0 respectively.
Summing the products TPR*FPR won't give you the area. Instead, you can use a trapezoidal approximation, as each successive pair of points defines a trapezoid of area under the curve. You'll just add up the areas.
Let's take two successive points (FPR1, TPR1) and (FPR2, TPR2). The area is (FPR2 - FPR1) * (TPR1 + TPR2) / 2. Just sum that over all successive pairs of points.
Of course, most libraries, such as scikit-learn, can compute this for you from these points.
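
To make the trapezoidal sum described above concrete, here is a minimal sketch in plain Python. The function name trapezoidal_auc and the sample (FPR, TPR) values are made up for illustration; they are not taken from the question.

```python
def trapezoidal_auc(points):
    """Approximate the area under a ROC curve from (FPR, TPR) points."""
    # Add the endpoints (0, 0) and (1, 1), drop duplicate points,
    # and sort by FPR ascending (ties are ordered by TPR).
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    # Each successive pair of points defines one trapezoid.
    for (fpr1, tpr1), (fpr2, tpr2) in zip(pts, pts[1:]):
        area += (fpr2 - fpr1) * (tpr1 + tpr2) / 2
    return area


# Hypothetical (FPR, TPR) pairs, one per threshold.
roc_points = [(0.1, 0.4), (0.3, 0.7), (0.6, 0.9)]
auc = trapezoidal_auc(roc_points)
print(round(auc, 4))  # 0.75
```

With only a handful of thresholds this is a coarse approximation; using more thresholds gives more points on the curve and a closer estimate. In scikit-learn, sklearn.metrics.auc(x, y) applies the same trapezoidal rule to the FPR and TPR arrays you pass it.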
