Data Science Asked by Oussama Jabri on February 1, 2021
I have a multiple linear regression problem.
$Y$ is my target and $X_1, X_2, X_3$ are my features.
In my regression, I consider the pairwise interactions between $X_1, X_2, X_3$ and I add a bias (intercept) term.
So my problem is given by:
$Y \sim X_1 + X_2 + X_3 + X_1X_2 + X_1X_3 + X_2X_3 + \text{bias}$
Now, I fit my model with statsmodels (statsmodels.api, imported as sm)
and I want to recursively eliminate the feature with the highest p-value.
Thank you for your help
My first question is: for example, if the highest p-value belongs to the $X_1X_2$ feature, is it okay to eliminate this feature even when $X_1$ and $X_2$ may themselves be statistically significant?
Of course, the interaction may carry no information about the target. For example, if the problem is perfectly described by $X_1$ and $X_2$, the interaction $X_1 \cdot X_2$ won't add anything to the model.
My second question: if, in the first iteration, all the interactions involving some feature have a p-value greater than 0.05, could I eliminate this feature and all of its interactions?
I would try a more experimental approach: remove features only if they don't improve the model's accuracy, rather than deciding on the basis of p-values alone.
As a further recommendation, I would recommend sklearn.
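One possible way to try this with sklearn is to compare cross-validated scores with and without a candidate term. The sketch below is an assumption-laden illustration, not a definitive recipe: the helper `cv_r2`, the synthetic data, the choice of $R^2$ as the score and 5-fold cross-validation are all picked just for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption): the X1*X2 interaction really matters here
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] + 1.5 * X[:, 0] * X[:, 1] + rng.normal(size=500)


def cv_r2(features, target, folds=5):
    """Hypothetical helper: mean cross-validated R^2 of a linear regression."""
    scores = cross_val_score(LinearRegression(), features, target,
                             cv=folds, scoring="r2")
    return scores.mean()


# Score the model without and with the candidate interaction column
score_without = cv_r2(X, y)
score_with = cv_r2(np.column_stack([X, X[:, 0] * X[:, 1]]), y)

# Keep the interaction only if it improves the out-of-sample score
keep_interaction = score_with > score_without
print(score_without, score_with, keep_interaction)
```

The same comparison can be repeated for each candidate feature or interaction, dropping a term only when the score does not get worse without it.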
Answered by Carlos Mougan on February 1, 2021