How does multicollinearity affect neural networks?

Asked on July 11, 2021

Multicollinearity is a problem for linear regression because the coefficient estimates become unstable and depend too heavily on small changes in the data (source).

(Also, under perfect multicollinearity the inverse of $X^TX$ doesn’t exist, so the standard OLS estimator does not exist … I have no idea how, but sklearn deals with it just fine.)

Is (perfect) multicollinearity also a problem for neural networks?

2 Answers

Multicollinearity affects the learning of artificial neural networks. Since a linearly dependent variable carries very little information beyond that of the other variables, the neural network will take more time to converge.

Some packages identify the linearly dependent variables and omit them from the calculation. For example, R's lm function marks the coefficient of a linearly dependent variable as NA; one can remove that variable from the model and the remaining coefficients stay the same. In these cases, the rank of the $X$ matrix is less than the number of columns.

Even though no inverse of $X^TX$ exists, most packages do not compute that inverse directly; instead they compute the pseudoinverse, which always exists.
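
A minimal sketch of this behavior (assuming NumPy and scikit-learn are available): with a perfectly collinear design matrix, $X^TX$ is singular, yet the pseudoinverse still yields a minimum-norm least-squares solution, and it matches what sklearn's LinearRegression returns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
# Third column is an exact linear combination of the first two,
# so X has rank 2 and X^T X is singular.
X = np.column_stack([x, x[:, 0] + x[:, 1]])
y = 3 * x[:, 0] - x[:, 1] + 0.1 * rng.normal(size=100)

# The textbook OLS formula inv(X^T X) X^T y breaks down here,
# but the pseudoinverse picks the minimum-norm least-squares solution.
beta_pinv = np.linalg.pinv(X) @ y

# sklearn solves the same least-squares problem with an SVD-based
# routine, so it handles the rank-deficient case without complaint.
beta_skl = LinearRegression(fit_intercept=False).fit(X, y).coef_

print(beta_pinv)
print(beta_skl)  # agrees with the pseudoinverse solution
```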

Answered by narasimman on July 11, 2021

I just came across a research paper that answers this question. In case this helps anyone in the future, the paper Multicollinearity: A tale of two nonparametric regressions mentions that neural networks generally do not suffer from multicollinearity because they tend to be overparameterized. The extra learned weights create redundancies, so anything that affects only a small subset of features (such as multicollinearity) has little impact.

Due to its overparameterization, the coefficients or weights of a neural network are inherently difficult to interpret. However, it is this very redundancy that makes the individual weights unimportant. That is, at each level of the network, the inputs are linear combinations of the outputs of the previous level. The final output is a function of a great many combinations of sigmoidal functions involving high-order interactions of the original predictors. Thus neural networks guard against the problems of multicollinearity at the expense of interpretability.
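
As a quick illustration (a sketch assuming scikit-learn; the paper does not prescribe this particular experiment), a small MLP trained on data containing a perfectly collinear feature still fits without trouble:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
# Add a perfectly collinear third feature.
X = np.column_stack([x, x[:, 0] + x[:, 1]])
y = np.sin(x[:, 0]) - 2 * x[:, 1] + 0.1 * rng.normal(size=500)

mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000, random_state=0)
mlp.fit(X, y)
print(f"Training R^2: {mlp.score(X, y):.3f}")  # high despite the redundant feature
```

The redundant column simply gets absorbed into the learned weights; unlike OLS, nothing in gradient-based training requires inverting $X^TX$.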

Answered by user3667125 on July 11, 2021
