How to do backward feature elimination when considering interactions between features

Question

I have a multiple linear regression problem.

$Y$ is my target and $X_1, X_2, X_3$ are my features.

In my regression, I consider the interaction between $X_1, X_2, X_3$ and I add a bias.

So my problem is given by:
$Y \sim X_1 + X_2 + X_3 + X_1X_2 + X_1X_3 + X_2X_3 + \text{bias}$

Now, I fit my model with statsmodels (statsmodels.api) and I want to recursively eliminate the feature with the highest p-value.

My first question: if, for example, the highest p-value belongs to the $X_1X_2$ term, is it okay to eliminate that term even when $X_1$ and $X_2$ are themselves statistically significant?
My second question: if, in the first iteration, a feature and all of its interactions have p-values greater than 0.05, can I eliminate that feature together with all of its interactions?

Thank you for your help

Carlos Mougan · Answer

My first question: if, for example, the highest p-value belongs to the $X_1X_2$ term, is it okay to eliminate that term even when $X_1$ and $X_2$ are themselves statistically significant?

Of course. The interaction may carry no information about the target. For example, if the target is fully determined by $X_1$ and $X_2$, the interaction $X_1 \cdot X_2$ adds nothing to the model.

My second question: if, in the first iteration, a feature and all of its interactions have p-values greater than 0.05, can I eliminate that feature together with all of its interactions?

I would take a more experimental approach: remove them only if they don't improve the model's accuracy, rather than only because they have a high p-value.
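One possible way to run such an experiment is sketched below with sklearn. It assumes synthetic data, uses 5-fold cross-validated $R^2$ as the stand-in for "model accuracy", and introduces a hypothetical helper `cv_r2`. Each term is dropped in turn and the change in score relative to the full model is reported; a term whose removal does not lower the score is a candidate for elimination.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption): Y depends on X1, X2 and X1*X2.
rng = np.random.default_rng(2)
n = 500
X1, X2, X3 = rng.normal(size=(3, n))
y = 2.0 * X1 - 3.0 * X2 + 1.5 * X1 * X2 + rng.normal(scale=0.5, size=n)

# All candidate terms: three features and three pairwise interactions.
terms = {
    "X1": X1, "X2": X2, "X3": X3,
    "X1X2": X1 * X2, "X1X3": X1 * X3, "X2X3": X2 * X3,
}


def cv_r2(names):
    """Mean 5-fold cross-validated R^2 of a linear model using the given terms.
    LinearRegression fits the bias (intercept) by default."""
    X = np.column_stack([terms[name] for name in names])
    return cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2").mean()


full = list(terms)
baseline = cv_r2(full)

# Change in score when each term is removed on its own (negative = model got worse).
delta = {name: cv_r2([t for t in full if t != name]) - baseline for name in full}
for name, change in delta.items():
    print(f"drop {name}: change in CV R^2 = {change:+.4f}")
```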

As a further recommendation, I would suggest using sklearn.
