Data Science – Asked by Vadim Smolyakov on November 30, 2020
I’m wondering about the benefits of advanced activation layers such as LeakyReLU, Parametric ReLU, and Exponential Linear Unit (ELU). What are the differences between them and how do they benefit training?
ReLU (Rectified Linear Unit) simply rectifies the input: positive inputs pass through unchanged, while negative inputs produce an output of zero. (Hahnloser et al. 2010)
$$ f(x) = \max(0,x) $$
Pros:

- Computationally very cheap, and in practice networks with ReLU tend to converge faster than those using saturating activations such as sigmoid or tanh.
- Produces sparse activations, since roughly half the units output exactly zero.

Cons:

- "Dying ReLU": a unit whose input stays negative receives zero gradient and may never recover during training.
- Outputs are not zero-centered.
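The original answer has no code, but the formula is easy to make concrete. The NumPy snippet below is a minimal sketch; the function name and sample inputs are illustrative, not from the answer.

```python
import numpy as np

def relu(x):
    # Pass positive inputs through, clamp negatives to zero: f(x) = max(0, x)
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))  # -> [0.  0.  0.  1.5]
```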
Leaky ReLU scales negative inputs by a small coefficient ($<1$) instead of zeroing them out. (Maas, Hannun, & Ng 2013)
$$ f(x) = \begin{cases} x & \text{if } x \geq 0 \\ 0.1 x & \text{otherwise} \end{cases} $$
Pros:

- The small negative slope keeps the gradient non-zero everywhere, so units cannot "die" the way plain ReLU units can.

Cons:

- The slope is a fixed hyperparameter, and the benefit over plain ReLU is not consistent across tasks.
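A corresponding NumPy sketch, assuming the fixed 0.1 slope used in the formula above (purely illustrative):

```python
import numpy as np

def leaky_relu(x, negative_slope=0.1):
    # Positives pass through; negatives are scaled by a small fixed slope
    return np.where(x >= 0, x, negative_slope * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))  # -> [-0.2  -0.05  0.    1.5 ]
```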
Parametric ReLU (PReLU) works just like Leaky ReLU, but the coefficient is learned during training. (Note that in the equation below a different $a$ can be learned for each channel.) (He et al. 2015)
$$ f(x) = \begin{cases} x & \text{if } x \geq 0 \\ a x & \text{otherwise} \end{cases} $$
Pros:

- The negative slope is learned from data (optionally per channel), removing a hand-tuned hyperparameter; He et al. (2015) report accuracy gains on large-scale image classification.

Cons:

- Adds extra parameters and, on small datasets, a slightly higher risk of overfitting.
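A sketch of the PReLU forward pass with one learnable slope per channel; the slope values here are made up, and in a real network they would be updated by backpropagation:

```python
import numpy as np

def prelu(x, a):
    # a is the learnable negative slope; broadcasting gives each channel its own value
    return np.where(x >= 0, x, a * x)

# Toy batch of 2 samples x 3 channels, with one slope per channel.
x = np.array([[-1.0,  2.0, -3.0],
              [ 4.0, -5.0,  6.0]])
a = np.array([0.05, 0.10, 0.25])  # in training these would be updated by backprop
print(prelu(x, a))
# -> [[-0.05  2.   -0.75]
#     [ 4.   -0.5   6.  ]]
```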
ELU (Exponential Linear Unit) replaces the small linear slope of Leaky ReLU and PReLU on the negative side with an exponential that saturates at $-\alpha$, so the gradient smoothly vanishes for large negative inputs. (Clevert, Unterthiner, & Hochreiter 2016)
$$ f(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha(\exp(x)-1) & \text{otherwise} \end{cases} $$
Pros:

- Negative outputs push mean activations closer to zero, which speeds up learning, and the saturation makes the unit more robust to noise (Clevert et al. 2016).
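A sketch of the ELU formula, assuming the common default $\alpha = 1$ (illustrative values only):

```python
import numpy as np

def elu(x, alpha=1.0):
    # Positives pass through; negatives saturate smoothly toward -alpha
    return np.where(x >= 0, x, alpha * (np.exp(x) - 1.0))

x = np.array([-3.0, -1.0, 0.0, 2.0])
print(elu(x))  # -> [-0.9502 -0.6321  0.      2.    ]
```

In practice, deep learning frameworks already ship these as layers (e.g. Keras provides LeakyReLU, PReLU, and ELU), so the snippets above are only meant to make the formulas concrete.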
Correct answer by Sophie Searcy - Metis on November 30, 2020