Data Science
Asked by Malvrok on July 7, 2021
The question is simple: is there any advantage to using the sigmoid function in a convolutional neural network? Every website that talks about CNNs uses the ReLU function.
The reason that sigmoid functions are being replaced by rectified linear units lies in the properties of their derivatives.
Let's take a quick look at the sigmoid function $\sigma$, defined as $\sigma(x) = \frac{1}{1+e^{-x}}$. Its derivative is $$\sigma'(x) = \sigma(x)\,(1-\sigma(x)).$$ The range of $\sigma$ is between 0 and 1, and the maximum of the derivative $\sigma'$ is $\frac{1}{4}$, attained at $x = 0$. When we stack multiple sigmoid layers, the chain rule used in backpropagation multiplies several $\sigma'$ factors together, so as we add more and more layers the maximum possible gradient decreases exponentially. This is commonly known as the vanishing gradient problem. The opposite problem occurs when the factors are greater than 1, in which case the gradients explode toward infinity (the exploding gradient problem).
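To see the scale of the effect numerically, here is a minimal NumPy sketch (my own illustration, not part of the original answer; the helper names `sigmoid` and `sigmoid_prime` are chosen just for this example). It checks the derivative's maximum of $\frac{1}{4}$ and shows how a best-case product of $n$ such factors shrinks as layers are stacked.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The derivative peaks at x = 0 with value 0.25.
print(sigmoid_prime(0.0))  # 0.25

# Backpropagation through n stacked sigmoid layers multiplies n such
# factors together (chain rule), so even in the best case the gradient
# scale shrinks like (1/4)^n.
for n in (2, 5, 10, 20):
    print(n, 0.25 ** n)
```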
Now let's look at the ReLU activation function, defined as $$R(x) = \max(0, x).$$ Its graph is zero for all negative inputs and the identity line for positive inputs.
If you look at the derivative of this function (the slopes on the graph), the gradient is either 1 or 0, so we have neither the vanishing gradient problem nor the exploding gradient problem. And since the general trend in neural networks has been toward deeper and deeper architectures, ReLU became the activation of choice.
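For comparison, a short sketch of ReLU and its slope (again my own illustration, with `relu` and `relu_prime` as hypothetical helper names): the per-layer gradient factor is exactly 0 or 1, so products of these factors through many layers neither shrink gradually toward zero nor blow up.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def relu_prime(x):
    """Slope of ReLU: 1 for positive inputs, 0 otherwise
    (the kink at x = 0 is conventionally given slope 0 here)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [0.  0.  0.  0.5 2. ]
print(relu_prime(x))  # [0. 0. 0. 1. 1.]

# During backprop the gradient is multiplied by these 0/1 factors, so it
# is either passed through unchanged or zeroed out; it neither vanishes
# gradually across layers nor explodes.
```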
Hope this helps.
Correct answer by Armen Aghajanyan on July 7, 2021