Is Cross Validation needed for regression if you already know the predictors in your model?

Data Science Asked by confused on December 27, 2020

Let’s say you want to model Y = X1 + X2, and you have already decided that this is the model you want to fit. Whether or not it approximates the true relationship well is unknown. But since you want coefficients that explain how each Xi affects Y, you build a regression model. You don’t plan on adding or removing predictors (since you don’t have any additional data) and you don’t plan on comparing this model with another (no other model allows for interpretation).

Does it still make sense to use sample splitting or cross validation? If you do cross validation, do you average the coefficients? Or could you just use your entire data set to train the model?

Thanks!

2 Answers

Ask yourself why you perform cross validation. Contrary to what Dave's answer says, the point of cross-validation is to estimate your generalization error, that is, how your model will perform on future data. Model selection definitely comes out of this; however, it is not true that model selection is the point of CV.

That said, if all you are interested in is the relationships between the predictors and the dependent variable, and you aren't trying to do some sort of stepwise selection, then you do not need to perform cross-validation. When was the last time a statistics-based regression textbook or class mentioned cross-validation? Never, at least not in any of the regression classes I took.

One point: if you do use CV, absolutely DO NOT average the coefficients across folds. The correct process is to use CV to estimate your error rate, then pool all of your data again and fit the model on the full data set. That final fit is what gives you your coefficients.
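For what it's worth, here is a minimal sketch of that process in Python using only NumPy. The data are synthetic and made up for illustration, and the helper names (`fit_ols`, `predict`, `cv_mse`) are hypothetical, not from any library. The fold-level coefficients are used only to score the held-out fold and are then thrown away; the reported coefficients come from one fit on all rows.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [b0, b1, b2, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predictions from an intercept-first coefficient vector."""
    return coef[0] + X @ coef[1:]

def cv_mse(X, y, k=5, seed=0):
    """K-fold estimate of the out-of-sample mean squared error."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fold coefficients are only used to score the held-out fold.
        coef = fit_ols(X[train], y[train])
        errors.append(np.mean((y[test] - predict(coef, X[test])) ** 2))
    return float(np.mean(errors))

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Step 1: CV estimates the generalization error.
estimated_mse = cv_mse(X, y, k=5)

# Step 2: refit on ALL the data; these are the coefficients to interpret.
final_coef = fit_ols(X, y)

print("CV estimate of MSE:", estimated_mse)
print("Coefficients from the full-data fit:", final_coef)
```

If you only care about the coefficients and not about predictive error, step 1 can be skipped entirely and step 2 is the whole analysis.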

Correct answer by astel on December 27, 2020

You might.

(But probably not.)

The point of cross validation is to help you conduct model selection. You've already selected your model. One place where you might want to use cross validation is if you want to use some kind of regularization, but that might interfere with your ability to infer how each $X$ influences $Y$. (The regularized estimates are biased.)
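To illustrate the remark about bias, here is a small sketch that compares ordinary least squares with ridge regression on synthetic data. Ridge is just one example of regularization, chosen here as an assumption; the penalty value 50.0 is arbitrary, and in practice it is the quantity you would pick by cross validation. The helper `ridge` is hypothetical and uses the closed-form solution on centered data.

```python
import numpy as np

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Center X and y so that no intercept term is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ridge(Xc, yc, lam):
    """Closed-form ridge solution; lam = 0 reduces to ordinary least squares."""
    p = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)

ols_coef = ridge(Xc, yc, 0.0)     # unpenalized estimates
ridge_coef = ridge(Xc, yc, 50.0)  # shrunk toward zero by the penalty

print("OLS coefficients:  ", ols_coef)
print("Ridge coefficients:", ridge_coef)
```

The ridge coefficients are pulled toward zero relative to the OLS ones, which is why they are harder to read as plain estimates of how each $X$ influences $Y$.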

Answered by Dave on December 27, 2020
