
Should output data scaling correspond to the activation function's output?

Data Science Asked by user134132523 on February 6, 2021

I am building an LSTM with Keras, which has an activation parameter in each layer. I have read that the scaling of the output data should match the activation function's output range.

Ex: tanh activation outputs values between -1 and 1, therefore the output training (and testing) data should be scaled to values between -1 and 1. Likewise, if the activation function is a sigmoid, the output data should be scaled to values between 0 and 1.

Does this hold for all activation functions? If I use ReLU as the activation in my layers, what should the output data be rescaled to?

One Answer

What you read holds true for the neurons of the output layer, not for the hidden layers!

Hence, it is true that if you are using tanh in the output layer, the data labels need to be within [-1, 1], whereas for sigmoid they need to be within [0, 1].
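
As a minimal sketch (assuming scikit-learn is available; the target values here are made up for illustration), scaling the labels to match a tanh output layer could look like this:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    # Hypothetical 1-D regression targets.
    y_train = np.array([[10.0], [25.0], [3.0], [47.0]])

    # feature_range=(-1, 1) matches a tanh output layer;
    # use feature_range=(0, 1) for a sigmoid output layer instead.
    y_scaler = MinMaxScaler(feature_range=(-1, 1))
    y_train_scaled = y_scaler.fit_transform(y_train)

    # After predicting, map the network's outputs back to the original scale:
    # y_pred = model.predict(X_test)
    # y_pred_original = y_scaler.inverse_transform(y_pred)

Remember to apply the same fitted scaler (and its inverse_transform) to the test targets and predictions.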

As for your concern with ReLU: use it in the output layer only if you know the labels are non-negative. If you use ReLU in the hidden layers, the scaling does not depend on ReLU at all, but rather on the activation function used in the output layer.
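
To illustrate, here is a rough sketch in Keras (assuming TensorFlow; layer sizes and input shape are placeholders): ReLU is used in the hidden layers without any scaling requirement, while the output layer's activation (tanh here) is what the target scaling must match.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Placeholder shapes: sequences of 20 timesteps with 4 features each.
    model = keras.Sequential([
        layers.LSTM(32, activation="relu", input_shape=(20, 4)),
        layers.Dense(16, activation="relu"),   # hidden ReLU: imposes no scaling requirement
        layers.Dense(1, activation="tanh"),    # output tanh: targets must be scaled to [-1, 1]
    ])
    model.compile(optimizer="adam", loss="mse")

If the targets can take any real value, a common alternative is a linear output activation with no rescaling of the labels at all.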

Answered by user1825567 on February 6, 2021
