Asked by Dimka Kopitkov on March 26, 2021
Typically the input to a neural network (NN) is transformed to have zero mean and a standard deviation of 1.
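For concreteness, by this transformation I mean something like the following (a minimal numpy sketch; the feature matrix `X` is just made up for illustration):

```python
import numpy as np

# Hypothetical data matrix: rows are samples, columns are features.
X = np.random.rand(1000, 20) * 50 + 10

# Standardize each feature to zero mean and unit standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# A different target scale would simply multiply the result, e.g.:
X_scale_10 = X_std * 10
```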
I wonder why the std scale should be 1. What about other scales, say 10 or 100?
Doesn't it make sense to give the NN input with a wider range, so that it can separate different clusters more easily and handle the loss function for each cluster in a simpler and more robust way?
Has anyone here tried different scales and can share their experience?
If the answer depends on the activation function: in my case I use ReLU.
Thanks a lot!
First, it is important to keep in mind that neural networks (like many other machine learning algorithms) work over the domain of the real numbers, i.e. they take the properties of this space for granted. Properties that are relevant in this context include its natural ordering, its notion of distance, and multiplication by scalars.
A side note on the implications of this: if one plans to use neural networks for categorical or ordinal data, one needs to consider which mapping into the space of real numbers is reasonable, and whether a standard mapping (e.g. to the natural numbers or via string similarity) makes sense in the context of the problem.
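To make this concrete, here is a small sketch (assuming scikit-learn and a made-up `color` column) contrasting an ordinal mapping, which silently imposes an order on the categories, with a one-hot mapping, which does not:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

# Made-up categorical column with no natural order.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# Ordinal mapping: blue=0, green=1, red=2 -- this implies blue < green < red,
# a relation that usually does not exist in the problem domain.
ordinal = OrdinalEncoder().fit_transform(colors)

# One-hot mapping: each category gets its own axis, so no spurious order or distance.
one_hot = OneHotEncoder().fit_transform(colors).toarray()
```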
From a theoretical viewpoint, scaling is a homogeneous transformation, i.e. it preserves the properties of the space up to a scalar factor, so it does not matter which scale you choose. Since 1 is the multiplicative identity (nothing changes when you multiply by one), it is the natural choice of scale because it simplifies calculations.
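As an illustration of this homogeneity argument, here is a minimal numpy sketch (the one-layer ReLU model and all names are made up): multiplying the inputs by a constant can be absorbed exactly into the first-layer weights, so the choice of scale does not change which functions the network can represent.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # inputs standardized to roughly unit scale
W = rng.normal(size=(3, 4))   # hypothetical first-layer weights
b = rng.normal(size=4)        # bias

# Layer output on the original inputs.
out_unit = relu(X @ W + b)

# Scale the inputs by 100 and absorb the factor into the weights.
out_scaled = relu((100 * X) @ (W / 100) + b)

print(np.allclose(out_unit, out_scaled))  # True: the same function is represented
```

What does change in practice is where optimization starts, since common initialization schemes and learning rates are usually tuned assuming roughly unit-scale inputs, which is one practical reason the unit scale is the conventional choice.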
Another side note: this actually holds in a more general context. For example, floating point numbers are stored in a form of scientific notation, i.e. as a significand and an exponent. Viewed in decimal scientific notation, 0.01, 0.1, 1, 10, 100, ... all share the same significand and differ only in the exponent, so the difference between the examples you mention is actually quite small.
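This is easy to see by printing the numbers in decimal scientific notation with plain Python formatting:

```python
# Decimal scientific notation: same significand, only the exponent differs.
for x in (0.01, 0.1, 1, 10, 100):
    print(f"{x:.6e}")
# 1.000000e-02
# 1.000000e-01
# 1.000000e+00
# 1.000000e+01
# 1.000000e+02
```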
Answered by mapto on March 26, 2021