Data Science Asked on August 2, 2021
I just started learning TensorFlow and I have a question about the activation functions used in neural networks. I watched a 3b1b video a while ago, and it seemed that an activation function squishes the value into an interval, like sigmoid does by squashing it between 0 and 1, so we can make more concrete comparisons. However, while watching a tutorial today, the instructor said that it projects the data points into higher-dimensional spaces. I don't really get how that's the case, since the value seems to be converted to a scalar. Is there an interpretation/example for the latter claim?
This is the timestamped URL; he talks about it during the couple of minutes following this point.
We explain AI with intuition rather than maths most of the time, so everyone has their own explanation and representation of things. Here is how I would explain activation functions (I'll try to make things clearer rather than just adding a third version to the two you already know):
Just in case you don't know what a basis is, you should have a look at Wikipedia since it is the core concept of your problem (to my eyes at least).
You need to consider the inputs as a whole, not each value individually. Each input value is a real number, so on its own it is 1-dimensional, yet if we consider them all together they form an $N$-dimensional space ($N$ being the number of inputs), and a basis of this space is $(Input_1, Input_2, \ldots, Input_N)$.
Neurons are made of 2 parts: a linear combination of the inputs (weights plus a bias), and an activation function applied to that result.
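To make those two parts concrete, here is a minimal sketch of a single neuron in plain NumPy (the specific input and weight values are my own illustration, not from the video or tutorial):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

# Part 1: a linear combination of the inputs, W.x + b.
x = np.array([0.5, -1.2, 3.0])   # N = 3 inputs
W = np.array([0.1, 0.4, -0.2])   # one weight per input
b = 0.3                          # bias

pre_activation = np.dot(W, x) + b

# Part 2: a non-linear activation applied to that scalar.
output = sigmoid(pre_activation)
print(pre_activation, output)
```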
Let's imagine we use neurons that are only a linear combination of inputs, $Output = WA + B$. Then we project our initial points into a space spanned by the same vectors as the input space (since, biases aside, all outputs are linear combinations of the inputs). This action is just a remapping of the inputs within the same space, expressed in a different basis. That may help if your problem is linearly separable, but it does not suffice if your problem is not. That's why we use activation functions. (The sketch below shows why stacking such layers doesn't help.)
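A quick way to see the limitation: stacking several purely linear layers collapses into a single linear layer. A minimal NumPy sketch (the random weight values are arbitrary, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "linear-only" layers: y = W2 (W1 x + b1) + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2

# ...is exactly one linear layer with W = W2 W1 and b = W2 b1 + b2.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```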
Now consider neurons with an activation function. The particularity of activation functions is that they are non-linear, so they map your inputs into a different space: the basis has to be different, since the outputs are no longer linear combinations of the inputs. This time the output space is different from the initial one.
The way I see it, activation functions do not generate information from nothing, but they allow the available information to be remapped into a higher-dimensional space where the problem is easier to solve.
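As an illustration (my own example, not from the video or tutorial): XOR is not linearly separable in its original 2-D input space, but one hidden layer with a ReLU activation and hand-picked weights remaps the points into a 3-D space where a single plane separates the classes.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# XOR: no straight line in the original 2-D (x1, x2) plane separates the classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Hand-picked hidden layer: 2-D inputs remapped into a 3-D space.
W1 = np.array([[ 1.0, -1.0],
               [-1.0,  1.0],
               [ 1.0,  1.0]])
b1 = np.array([0.0, 0.0, -1.5])

H = relu(X @ W1.T + b1)   # the non-linearity does the remapping

# In the 3-D space, a single linear readout (a plane) separates the classes.
w2, b2 = np.array([1.0, 1.0, 0.0]), -0.5
predictions = (H @ w2 + b2 > 0).astype(int)

print(H)            # the remapped points
print(predictions)  # [0, 1, 1, 0] -> matches y
```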
Hope this helps; feel free to ask if you have any remaining questions.
Correct answer by Ubikuity on August 2, 2021