TransWikia.com

How does skewed data affect deep neural networks?

Data Science Asked by shaye059 on January 2, 2021

I’m playing around with deep neural networks for a regression problem. The dataset I have is skewed right and for a linear regression model, I would typically perform a log transform. Should I be applying the same practice to a DNN?

Specifically, I’m curious how skewed data affects regression with a DNN and, if the effect is negative, whether the same methods that would be applied to a linear regression model are the right way to fix it. I couldn’t find any research articles about it, but if you know of any, feel free to link them in your answer!
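For reference, the log transform mentioned above can be sketched as follows. This is only an illustration: the log-normal target `y` is made-up data, and `log1p`/`expm1` are one common choice of transform and inverse, not the only one.

```python
import math
import random
import statistics

def skewness(values):
    """Skewness as the mean of cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)

rng = random.Random(0)
# Hypothetical right-skewed regression target: log-normal samples.
y = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Train on log1p(y); log1p is also safe when a target equals zero.
y_log = [math.log1p(v) for v in y]
# Map predictions back to the original scale with the inverse, expm1.
y_back = [math.expm1(v) for v in y_log]

skew_raw = skewness(y)
skew_log = skewness(y_log)
print(f"skewness before: {skew_raw:.2f}, after log1p: {skew_log:.2f}")
```

Whichever model is used, predictions made on the log scale have to be mapped back with the inverse transform before computing errors in the original units.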

One Answer

Strictly theoretically, it makes no difference for a DNN. I answered this in another post today, where I said:

Here is why: we already know mathematically that a NN can approximate any function. So let's say that we have an input X. If X is highly correlated, then we can apply one of the decorrelation techniques out there. The main thing is that you get X' with a different numerical representation, most likely one that is more difficult for the NN to learn to map to the outputs y. But still, in theory you can change the architecture, train for longer, and get the same approximation, i.e. accuracy.

Now, theory and practice are the same in theory but different in practice, and I suspect that these adjustments to the architecture etc. will be much more costly in reality, depending on the dataset.

BUT I want to add another point of view: convergence speed. Strictly theoretically, you don't even need [batch normalization] for performance (you can just adjust the weights and biases and you should get the same results), but we know that applying this transformation has big benefits for NNs.

To conclude: yes, I have had experiences where it made a difference and experiences where it didn't. You can't expect theoretical results that say skewed data is bad.
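As a rough illustration of the point about decorrelation, here is a minimal sketch with two made-up features. It uses residualization (subtracting from one feature the part linearly explained by the other), which is just one simple decorrelation technique and is an assumption here, not necessarily the one meant above. The transform is invertible, so X' carries the same information as X in a different numerical representation.

```python
import random
import statistics

def covariance(a, b):
    """Population covariance of two equally long sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
# Hypothetical features: x2 is strongly correlated with x1 by construction.
x1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
x2 = [0.9 * v + 0.1 * rng.gauss(0.0, 1.0) for v in x1]

# Remove from x2 the component that is linearly predictable from x1.
beta = covariance(x1, x2) / covariance(x1, x1)
x2_prime = [b - beta * a for a, b in zip(x1, x2)]

cov_before = covariance(x1, x2)
cov_after = covariance(x1, x2_prime)

# The original feature can be recovered exactly, so no information is lost.
x2_recovered = [p + beta * a for a, p in zip(x1, x2_prime)]
print(f"covariance before: {cov_before:.3f}, after: {cov_after:.2e}")
```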
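The "just adjust weights and bias" argument can be checked in the simplest case: a single linear unit fed inputs normalized with fixed statistics. This is a sketch under stated assumptions. The data, `w`, and `b` are made up, and it does not model batch normalization during training, where batch statistics and learnable scale/shift parameters are involved.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical data: 3 features on very different scales.
scales = [1.0, 50.0, 0.01]
X = [[rng.gauss(5.0, 1.0) * s for s in scales] for _ in range(200)]

# Fixed per-feature normalization statistics.
cols = list(zip(*X))
mu = [statistics.fmean(c) for c in cols]
sigma = [statistics.pstdev(c) for c in cols]

# An arbitrary linear unit that expects normalized inputs.
w = [0.3, -1.2, 2.0]
b = 0.5

def unit_on_normalized(x):
    z = [(xi - m) / s for xi, m, s in zip(x, mu, sigma)]
    return sum(wi * zi for wi, zi in zip(w, z)) + b

# Fold the normalization into the parameters:
# w . ((x - mu) / sigma) + b  ==  (w / sigma) . x + (b - sum(w * mu / sigma))
w_folded = [wi / s for wi, s in zip(w, sigma)]
b_folded = b - sum(wi * m / s for wi, m, s in zip(w, mu, sigma))

def unit_on_raw(x):
    return sum(wi * xi for wi, xi in zip(w_folded, x)) + b_folded

max_gap = max(abs(unit_on_normalized(x) - unit_on_raw(x)) for x in X)
print(f"largest output difference: {max_gap:.2e}")
```

Both units compute the same function, so normalization adds no representational power here. The folded weights differ by orders of magnitude across features, though, and that kind of badly scaled parameterization is what tends to slow down gradient-based training.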

Correct answer by Noah Weber on January 2, 2021
