Data Science Asked by Shahnawaz Khan on January 31, 2021
I am working on a linear regression problem. The features for my analysis were selected using p-values and domain knowledge. After selecting these features, the model's $R^2$ improved from 0.25 to 0.85 and the $RMSE$ improved as well. But here is the issue: the features selected using domain knowledge have very high p-values (0.7, 0.9) and very low individual $R^2$ (0.002, 0.0004). Does it make sense to add such features even if the model shows improved performance? As far as I know, in linear regression it is preferable to keep only features with low p-values.
Can anyone share their experience? If so, how can I justify proposing new features that have high p-values?
In general, adding more features will improve (or at least not worsen) the in-sample model fit.
If your goal is purely the best-fitting model, add as many features as possible, regardless of p-value.
Sometimes people care about parsimonious models: they are willing to accept a lower overall model fit because they also value a simpler model. In that case, they apply a p-value threshold to decide which features to keep.
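As a minimal illustration of the first point (a sketch with synthetic NumPy data, not the asker's dataset): appending even pure-noise features to an OLS regression never decreases the training $R^2$, which is why in-sample fit alone cannot justify a feature.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)              # one genuinely informative feature
y = 2.0 * x1 + rng.normal(size=n)    # true model: y = 2*x1 + noise

def r_squared(X, y):
    """Fit OLS with an intercept and return the in-sample R^2."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

r2_small = r_squared(x1[:, None], y)
noise = rng.normal(size=(n, 5))      # five irrelevant features
r2_big = r_squared(np.column_stack([x1, noise]), y)

# Training fit can only stay the same or improve as columns are added.
print(r2_big >= r2_small)            # True
```

This is exactly why a parsimony criterion (p-value thresholds, adjusted $R^2$, or out-of-sample validation) is needed: the extra noise columns raise the training fit slightly even though they carry no signal.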
Answered by Brian Spiering on January 31, 2021