What is the reason behind Keras's choice of default (recurrent) activation functions in LSTM networks?

Data Science: asked by Lauramvp on February 4, 2021

Activation function between LSTM layers

In the link above, the question of whether an activation function is required between LSTM layers was answered as follows: since an LSTM unit already contains multiple non-linear activation functions internally, it is not necessary to add another (recurrent) activation function.
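As a reference point (not part of the original question), the standard LSTM cell equations below show where the two Keras arguments act: σ is the recurrent_activation (sigmoid by default), applied to the three gates, and tanh is the activation (tanh by default), applied to the candidate cell state and the output.

```latex
% Standard LSTM cell. In Keras's parameterization:
%   sigma = recurrent_activation (default: sigmoid)
%   tanh  = activation            (default: tanh)
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)          && \text{input gate} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)          && \text{forget gate} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)          && \text{output gate} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)   && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t    && \text{new cell state} \\
h_t &= o_t \odot \tanh(c_t)                         && \text{hidden state / output}
\end{aligned}
```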

My question:
Is there a specific reason why Keras uses "tanh" as the default activation and "sigmoid" as the default recurrent_activation if those activations are not necessary? For a Dense layer the default activation is None, so Keras could just as well have used None as the default for LSTM units, right? Or does Keras use these activations for a reason? Also, a lot of tutorials and blogs use ReLU (without clarifying why), and I have not come across one that specifies None as the (recurrent) activation. Why is ReLU used so much, when the outputs of the LSTM unit are already activated?
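To make the defaults in question concrete, here is a minimal sketch (assuming tf.keras; the layer sizes, input shape, and variable names are illustrative and not from the post) that spells out the LSTM defaults and the Dense comparison:

```python
# Minimal illustration of the defaults discussed above (tf.keras).
import tensorflow as tf

# Writing out the LSTM defaults explicitly: `activation` is applied to the
# candidate cell state and the output, `recurrent_activation` to the gates.
lstm_default = tf.keras.layers.LSTM(
    64,
    activation="tanh",              # default
    recurrent_activation="sigmoid"  # default
)

# For comparison, a Dense layer defaults to no (i.e. linear) activation.
dense_default = tf.keras.layers.Dense(10, activation=None)

# Overriding the default, e.g. with a linear activation as the question
# suggests, is possible but changes the cell's behaviour:
lstm_linear = tf.keras.layers.LSTM(64, activation=None)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 8)),  # (timesteps, features), arbitrary example
    lstm_default,
    dense_default,
])
model.summary()
```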
