TransWikia.com

At what stage are ROC curves used when building a machine learning model?

Data Science Asked by erotavlas on April 20, 2021

When developing a machine learning model, at what stage are ROC curves and AUC used?

Typically I have three data sets: train, validation, and final test.

I do K-Fold cross-validation using the combined train + validation set.
During that phase we can calculate metrics such as true positives and false positives for each fold, and average them to create a plot like the ROC curve. This is similar to an example from scikit-learn.

However, we can also get the metrics at the end by training the final model on all the data from train + validation and testing it on the test set. This also gives us all the metrics: the classification report, the ROC curve, etc.

My question is: do people generally compute ROC curves twice, once during cross-validation and then a second time for the final testing? Or are they used only during the validation / hyperparameter tuning phase, when selecting the algorithm?

One Answer

The ROC curve (summarized by its AUC) is used to find the threshold that gives the best trade-off between True Positive Rate and False Positive Rate. Using it in K-Fold cross-validation is good practice for determining which threshold to use.
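As a rough illustration of threshold selection during cross-validation, here is a minimal sketch. It makes several assumptions that are not in the question: synthetic data from `make_classification` stands in for the combined train + validation set, the model is a logistic regression, out-of-fold predictions are pooled into one ROC curve instead of averaging per-fold curves, and Youden's J (TPR minus FPR) is used as the selection rule. Other criteria are just as valid depending on the cost of each error type.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the combined train + validation data (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
model = LogisticRegression(max_iter=1000)

# Out-of-fold probabilities: each sample is scored by a model
# that was fitted on the other folds only.
oof_proba = cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]

# One pooled ROC curve over all out-of-fold predictions.
fpr, tpr, thresholds = roc_curve(y, oof_proba)
cv_auc = roc_auc_score(y, oof_proba)

# Youden's J = TPR - FPR is one possible selection rule (assumption).
best_idx = int(np.argmax(tpr - fpr))
cv_threshold = thresholds[best_idx]

print(f"CV AUC: {cv_auc:.3f}, chosen threshold: {cv_threshold:.3f}")
```

The value of `cv_threshold` is then treated like any other tuned hyperparameter.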

Then, your final test is there to validate that you did not overfit some hyperparameters, including this threshold. So the ROC curve must not be used again to re-select a threshold on the final test. Instead, evaluate the final test with the same threshold chosen in cross-validation.
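A minimal sketch of that final step follows, under the same kind of assumptions: synthetic data, a logistic regression, and a hypothetical threshold value of `0.4` standing in for whatever cross-validation produced. The point is that the threshold is fixed before the test set is touched.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): "dev" plays the role of train + validation.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical threshold carried over from cross-validation; not re-tuned here.
cv_threshold = 0.4

# Train the final model on all of train + validation.
final_model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)

# Apply the fixed threshold to the test-set probabilities.
test_proba = final_model.predict_proba(X_test)[:, 1]
test_pred = (test_proba >= cv_threshold).astype(int)

tn, fp, fn, tp = confusion_matrix(y_test, test_pred, labels=[0, 1]).ravel()
test_tpr = tp / (tp + fn)
test_fpr = fp / (fp + tn)

print(f"Test TPR: {test_tpr:.3f}, test FPR: {test_fpr:.3f}")
```

If the test TPR and FPR are close to what cross-validation gave at the same threshold, that suggests the threshold was not overfitted.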

Hope it helps.

Note on threshold (EDIT):
The threshold to optimize is the decision threshold of a binary classifier that outputs probabilities (for instance, the output of a sigmoid or a logistic regression). In that case, each threshold setting turns the model's probabilities into different predictions, and therefore gives a different (FPR, TPR) pair; the ROC curve is built from these pairs.
You can read further in the scikit-learn user guide.
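To make this concrete, here is a small hand-rolled sketch using only NumPy. The labels, probabilities, thresholds, and the helper `roc_point` are all made up for illustration. It shows how each threshold yields one (FPR, TPR) point.

```python
import numpy as np

# Toy labels and predicted probabilities (made up for illustration).
y_true = np.array([0, 0, 1, 1])
proba = np.array([0.1, 0.4, 0.35, 0.8])


def roc_point(y_true, proba, threshold):
    """Return (FPR, TPR) when predicting positive for proba >= threshold."""
    pred = proba >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)
    fpr = fp / np.sum(y_true == 0)
    return float(fpr), float(tpr)


# Sweep the threshold from strict to lenient; each value gives one ROC point.
points = [roc_point(y_true, proba, t) for t in (0.9, 0.5, 0.38, 0.2, 0.0)]
print(points)
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Lowering the threshold moves the point from (0, 0) toward (1, 1). scikit-learn's `roc_curve` performs this kind of sweep over the distinct predicted scores.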

Answered by etiennedm on April 20, 2021
