Data Science Asked on April 23, 2021
ReLU is used as an activation function that serves two purposes: it introduces non-linearity into the network, and it sets a lower limit of 0 on a unit's output. For the exploding gradient problem, we use the gradient clipping approach, where we set a maximum threshold on the gradient, similar to how ReLU sets a minimum limit of 0.
From what I have read so far, ReLU is considered an activation function. In a similar fashion, can we use gradient clipping as an activation function as well? If yes, what are the pros and cons of doing so?
ReLU is an activation function. Gradient clipping is a technique to keep the problem of exploding gradients at bay. They operate on different things: an activation function transforms a layer's outputs during the forward pass, while gradient clipping rescales gradients during the backward pass, so one cannot stand in for the other.
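To make the distinction concrete, here is a minimal PyTorch sketch (the model, data, and hyperparameters are placeholders, not taken from the question): ReLU lives inside the model and acts on activations in the forward pass, while gradient clipping is applied to the gradients between backward() and the optimizer step.

```python
import torch
import torch.nn as nn

# ReLU is part of the model: it transforms activations in the forward pass.
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.ReLU(),          # activation: clamps negative outputs to 0
    nn.Linear(32, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x, y = torch.randn(64, 10), torch.randn(64, 1)  # dummy batch

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()

# Gradient clipping is part of the training loop: it rescales the gradients
# after backward() and before the optimizer step, capping their overall norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```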
I also want to stress that, at the moment, the best technique for controlling vanishing/exploding gradients is batch normalization. Dropout (a technique born to fight overfitting) also has a similar regularizing effect, since it forces the model to distribute weights more evenly across a layer. That's why you don't see gradient clipping used as often as you used to.
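For illustration, a minimal sketch of how batch normalization and dropout are typically inserted as layers (the layer sizes and dropout rate here are arbitrary):

```python
import torch.nn as nn

# A block using batch normalization (stabilizes the scale of activations,
# which also helps against vanishing/exploding gradients) and dropout
# (randomly zeroes activations during training, acting as a regularizer).
block = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
)
```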
EDIT:
I forgot to mention that proper scaling of your variables and appropriate weight initialization make the problem of vanishing/exploding gradients much less frequent. This is, of course, purely based on personal experience. It's still very important to take it into account.
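As a rough sketch of what that can look like in practice (assuming scikit-learn for the scaling step and He initialization for a ReLU layer; the data and layer sizes are placeholders):

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import StandardScaler

# Scale the input features to zero mean and unit variance.
X = np.random.rand(100, 10)                      # placeholder data
X_scaled = StandardScaler().fit_transform(X)

# He (Kaiming) initialization keeps the variance of activations roughly
# constant across layers when ReLU is used.
layer = nn.Linear(10, 32)
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
nn.init.zeros_(layer.bias)

out = layer(torch.from_numpy(X_scaled).float())
```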
Answered by Leevo on April 23, 2021