Data Science Asked on January 2, 2021
So I’m going through a Machine Learning course, and this course explains that to avoid the dummy trap, a common practice is to drop one column. It also explains that since the info on the dropped column can be inferred from the other columns, we don’t really lose anything by doing that.
This course does not explain what the dummy trap actually is, however, nor does it give any examples of how the trap manifests itself. At first I assumed that the dummy trap simply makes the model's performance less accurate due to multicollinearity. But then I read this article. It does not mention the dummy trap explicitly, but it does discuss how an attempt to use one-hot encoding (OHE) with OLS results in an error (since the model attempts to invert a singular matrix). It then shows how the practice of dropping one dummy feature fixes this. But it goes on to demonstrate that this measure is unnecessary in practical cases: apparently regularization fixes the issue just as well, and iterative algorithms (as opposed to closed-form solutions) don't have this issue in the first place.
So I'm confused right now about what exactly stands behind the term "dummy trap". Does it refer specifically to this matrix-inversion error? Or is it just an effect that lets the model get trained but makes its performance worse, with the issue described in that article being totally unrelated? I tried training an sklearn LinearRegression model on a OHE-encoded dataset (I used pd.get_dummies() with the drop_first=False parameter) to try to reproduce the dummy trap, and the latter seems to be the case: the model got trained successfully, but its performance was noticeably worse compared to the identical model trained on the set encoded with drop_first=True. But I'm still confused about why my model got trained successfully at all, since, if the article is to be believed, the inversion error should have prevented it.
There are two main problems:
You have one feature that is a linear combination of all the others (perfect multicollinearity).
If you try to solve using the closed-form solution, the following happens.
$y = w_0 + w_1X_1 + w_2X_2 + w_3X_3$
$w_0$ is the $y$-intercept; to complete the matrix form, we add a constant column $X_0 = 1$. Hence,
$y = w_0X_0 + w_1X_1 + w_2X_2 + w_3X_3$
Solution for $w$ is $(X^{T}X)^{-1}X^{T}y$
So $X^{T}X$ must be an invertible matrix. But:
If the model contains dummy variables for all category values, then the encoded columns add up (row-wise) to the intercept column ($X_0$ here; see the table below), and this linear combination prevents the inverse from being computed, since $X^{T}X$ is singular.
$$\begin{array}{|c|c|c|c|} \hline X_0 & X_1 & X_2 & X_3 \\ \hline 1 & 1 & 0 & 0 \\ \hline 1 & 0 & 1 & 0 \\ \hline 1 & 0 & 0 & 1 \\ \hline \end{array}$$
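As a quick sketch of this point (my own addition, using NumPy, not part of the original answer), the design matrix from the table above is rank-deficient, so $X^{T}X$ is singular and the closed-form solution cannot be computed:

```python
import numpy as np

# Design matrix from the table above: an intercept column X0 plus one
# dummy column per category value (X1, X2, X3), with a few repeated rows.
X = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
], dtype=float)

# The dummy columns sum row-wise to the intercept column, so the
# matrix has rank 3 despite having 4 columns.
print(np.linalg.matrix_rank(X))              # 3

# Consequently X'X is singular (determinant numerically zero), and
# (X'X)^-1 X'y cannot be computed.
print(abs(np.linalg.det(X.T @ X)) < 1e-9)    # True
```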
"why my model got successfully trained at all, since if the article is to be believed, the inversion error should have prevented it from being successfully trained"
Valid question! Aurélien Géron (author of "Hands-On Machine Learning") has answered this here:
- The LinearRegression class (Scikit-Learn) actually performs an SVD decomposition; it does not directly try to compute the inverse of X.T.dot(X). The singular values of X are available in the singular_ instance variable, and the rank of X is available as rank_.
Dummy-variable Trap
I have never heard of this term outside of the Udemy course "Machine Learning A-Z", so I don't think the word "trap" carries any special meaning once you understand the underlying points (singularity, multicollinearity, and interpretability) separately.
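To tie this back to the question's experiment, here is a minimal sketch (my own addition, using NumPy) of why dropping one dummy column, as pandas' drop_first=True does, removes the redundancy:

```python
import numpy as np

# Intercept plus a dummy for every one of the three categories: the
# four columns are linearly dependent (dummies sum to the intercept).
X_full = np.array([[1, 1, 0, 0],
                   [1, 0, 1, 0],
                   [1, 0, 0, 1]], dtype=float)

# Drop the first dummy column (the effect of drop_first=True): the
# remaining three columns are linearly independent.
X_drop = X_full[:, [0, 2, 3]]

print(np.linalg.matrix_rank(X_full))   # 3, with 4 columns: deficient
print(np.linalg.matrix_rank(X_drop))   # 3, with 3 columns: full rank
```

With full column rank, $X^{T}X$ is invertible and the closed-form solution exists.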
References:
www.feat.engineering, Section 5.1
Sebastian Raschka
stats.stackexchange
Correct answer by 10xAI on January 2, 2021