
How does a zero-centered activation function like tanh help in gradient descent?

Data Science Asked on April 11, 2021

I know that if the inputs X are all positive (or all negative), then the sign of every downstream gradient will be the same as that of the upstream gradient. What I don't understand is how a zero-centered activation function overcomes this problem.

Even in the case of the tanh function, if all X are positive, the sign still remains the same.

Forgive my English; I am not a native speaker.
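As a minimal worked sketch of the sign argument described above (the single-neuron notation here is mine, not from the original posts): for a neuron with pre-activation $z = \sum_i w_i x_i + b$ and loss $L$, backpropagation gives

$$\frac{\partial L}{\partial w_i} = \frac{\partial L}{\partial z}\, x_i.$$

If every input $x_i$ is positive (for example because the previous layer used a sigmoid, whose outputs lie in $(0,1)$), then every component $\partial L / \partial w_i$ has the same sign as the single scalar $\partial L / \partial z$, so one update can only increase all the weights together or decrease them all together, which forces a zig-zag path towards the optimum. A zero-centered activation such as $\tanh$ does not change the sign of the raw data $X$; it changes the sign of the layer's outputs, which become the inputs $x_i$ of the next layer. Because those outputs can be positive or negative, the gradient components of the next layer's weights are no longer forced to share a sign.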

2 Answers

$\tanh$ is a scaled and shifted version of the standard sigmoid $y=\frac{1}{1+e^{-ax}}$: with $a=2$, $\tanh(x) = 2y - 1$, i.e. the sigmoid stretched to the range $(-1,1)$. Because of that scaling it has a steeper gradient than the standard sigmoid. A steep gradient is important because it makes backprop training faster and less likely to get stuck in a near-zero-gradient region.

Furthermore, for the sigmoid $y(0) = \frac{1}{2} \neq 0$, whereas $\tanh(0) = 0$. Sometimes this is very important, especially when dealing with normalized $[0,1]$ signals as inputs.

Answered by maksylon on April 11, 2021
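A quick numerical check of both points in the answer above, as a sketch in NumPy (the helper names sigmoid, d_sigmoid and d_tanh are illustrative, not from the answer):

import numpy as np

def sigmoid(x, a=1.0):
    # standard logistic sigmoid y = 1 / (1 + exp(-a * x))
    return 1.0 / (1.0 + np.exp(-a * x))

def d_sigmoid(x, a=1.0):
    # derivative of the sigmoid: a * y * (1 - y)
    s = sigmoid(x, a)
    return a * s * (1.0 - s)

def d_tanh(x):
    # derivative of tanh: 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

x = np.linspace(-3.0, 3.0, 7)
print(np.tanh(0.0), sigmoid(0.0))   # 0.0 vs 0.5: tanh output is zero-centered
print(d_tanh(0.0), d_sigmoid(0.0))  # 1.0 vs 0.25: tanh has the steeper slope at 0
print(np.allclose(np.tanh(x), 2.0 * sigmoid(x, a=2.0) - 1.0))  # True: tanh(x) = 2*sigmoid(2x) - 1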

You can have a look at this survey (https://arxiv.org/pdf/2004.06632.pdf), which discusses different aspects of activation functions and also explains why zero-centered activation functions are considered more suitable in practice.

Note that if you consider the universal approximation theorem, the activation function does not need to be zero-centered.

Answered by Graph4Me Consultant on April 11, 2021
