Cross Validated Asked by user26067 on February 9, 2021
I need to predict something using a neural network. The output values are bound to be non-negative, but there's not really an upper bound. I do know that the output is never going to be higher than a certain level in practice. Also, my expected output should span all numbers between $0$ and that maximum.
So, which output activation function should I use? Sigmoid seems wrong, as the gradient would give too much importance to high values near the maximum. Unless I scaled my data so that the maximum value I ever encounter is around 0.6, so that the output behaves like a sigmoid near 0 and roughly linearly over the rest of its range. Linear doesn't seem right as it allows negative outputs. ReLU by definition gives me an output in the correct range… but it's not really well behaved.
Any suggestion?
You could try one of the smooth relaxations of ReLU, like softplus (although more often than not it is outperformed by ReLU; it is usually used for variance terms in VAEs, where zeros are not allowed):
$$\operatorname{softplus}(x) = \log(1+\exp x)$$
See, however, "What are the benefits of using ReLU over softplus as activation functions?" and "ReLU outperforming Softplus".
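As an illustration (a minimal PyTorch sketch, not from the original answer; the layer sizes are arbitrary placeholders), a regression head can simply end in a softplus so its outputs are strictly positive while growing roughly linearly for large pre-activations:

```python
import torch
import torch.nn as nn

# Hypothetical layer sizes; only the final Softplus matters here.
model = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
    nn.Softplus(),  # softplus(x) = log(1 + exp(x)) >= 0, ~linear for large x
)

x = torch.randn(8, 16)  # a batch of 8 dummy inputs
y_pred = model(x)       # shape (8, 1), every entry strictly positive
```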
Answered by Firebug on February 9, 2021
Linear is actually quite reasonable for the reasons you mentioned (e.g. the gradient doesn't get cut off at 0). It's not a big deal that you might get negatives, because at validation/test time you can simply clip them to 0.
This also depends on roughly how you expect the outputs to be distributed. For example, when predicting depth from an input image, common schemes include predicting 1/depth or log depth (which you can alternatively think of as using the activation $1/x$ or $\exp x$, but directly transforming the outputs is probably a better idea).
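A minimal sketch of this answer's suggestion, assuming a PyTorch regressor with placeholder layer sizes (not from the original post): train with a plain linear output and only clip negatives at evaluation time, or train against log targets and exponentiate at inference:

```python
import torch
import torch.nn as nn

# Hypothetical architecture; the point is the plain linear output layer.
model = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Linear(64, 1),  # no output activation: gradients are never cut off at 0
)

x = torch.randn(8, 16)

# Training: keep the raw (possibly negative) outputs so the loss sees the full gradient.
y_train = model(x)

# Validation/test: clip negatives to zero after the fact.
y_eval = model(x).clamp(min=0.0)

# Alternative from the answer: train against log(target) and exponentiate
# at inference, which also guarantees positive predictions.
y_positive = torch.exp(model(x))
```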
Answered by shimao on February 9, 2021