Data Science Asked on July 6, 2021
Let's consider the space of feedforward neural networks with a given structure: $L$ layers, $m$ neurons per layer, ReLU activation, input dimension $d$, output dimension $k$.
Which means I'm considering the map $F: \mathcal{W}_1 \times \mathcal{W}_2 \times \dots \times \mathcal{W}_L \times \mathbb{R}^d \to \mathbb{R}^k$, where $\mathcal{W}_i$ is the space of possible weights for layer $i$. We also assume, for simplicity, that every weight matrix has a norm upper bounded by a constant $M$.
Let's now assume that I have fixed the parameters so that we obtain $v = F(W_1, \dots, W_L, x^*) \in \mathbb{R}^k$ (note that $x^*$ is fixed as well).
Now imagine that I inject some random noise $\eta \in \mathbb{R}^{m \times m}$ into a weight matrix $W_i$, where the norm of the noise is 10% of the norm of the matrix, i.e. $||\eta|| = ||W_i||/10$. How does it affect my final output?
Which means: what is the expected value of $||v - v_*||$, where $v_*$ is the output of the network obtained after the small change in the weights described above?
Note that this has nothing to do with the learning process; it's just about the sensitivity/resistance of a neural network with respect to random noise injected into a weight matrix.
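Since no closed form is given here, one way to get a feel for the quantity is to measure it empirically. Below is a minimal Monte Carlo sketch of my own (the layer sizes, the perturbed layer index `i`, the Gaussian noise, and the Frobenius norm are all illustrative assumptions, not part of the question): it builds a random ReLU network, fixes $x^*$, and averages $||v - v_*||$ over many draws of $\eta$ with $||\eta|| = ||W_i||/10$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k, L = 10, 32, 5, 4        # input dim, width, output dim, number of layers

# Random fixed weights: d -> m -> ... -> m -> k (no biases, for simplicity)
Ws = ([rng.standard_normal((m, d))]
      + [rng.standard_normal((m, m)) for _ in range(L - 2)]
      + [rng.standard_normal((k, m))])

def forward(weights, x):
    """ReLU network without biases; the last layer is linear."""
    h = x
    for W in weights[:-1]:
        h = np.maximum(W @ h, 0.0)
    return weights[-1] @ h

x_star = rng.standard_normal(d)
v = forward(Ws, x_star)

i = 1                            # index of the hidden weight matrix to perturb
n_trials, diffs = 2000, []
for _ in range(n_trials):
    eta = rng.standard_normal(Ws[i].shape)
    eta *= 0.1 * np.linalg.norm(Ws[i]) / np.linalg.norm(eta)   # ||eta|| = ||W_i|| / 10
    Ws_noisy = list(Ws)
    Ws_noisy[i] = Ws[i] + eta
    diffs.append(np.linalg.norm(forward(Ws_noisy, x_star) - v))

print("estimated E[||v - v_*||]:", np.mean(diffs))
print("as a fraction of ||v||  :", np.mean(diffs) / np.linalg.norm(v))
```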
It's very hard to determine exactly how it would affect the learning of the network, but from my experience there are several possible scenarios for how it can affect the output.
I am not quite sure whether the effect on the output can really be modeled by a mathematical equation when a certain type of noise is added, if that is what you are looking for.
Answered by Nischal Hp on July 6, 2021
A lot depends on the nature of the noise you inject (sparse and spiky, white noise, or a uniformly spread DC shift) and the type of regularizer used (TV regularizer, lasso, ridge, elastic net). But I would guess that the neural network will start to learn the noise along with the data, to some extent, if the norm of the noise is 10% of the norm of the weight matrix.
Answered by Aastha Dua on July 6, 2021
You are asking for the condition number, I would say. If you skim through the formulae, the German explanation is more detailed.
In particular, the absolute condition at $x$ is defined as $\kappa_{\text{abs}} := \limsup_{\tilde{x} \rightarrow x} \frac{||f(x) - f(\tilde{x})||}{||x - \tilde{x}||}$, which means $\kappa_{\text{abs}} \geq 0$ is the smallest number such that there is a $\delta > 0$ so that for all $\tilde{x}$ with $||\tilde{x} - x|| < \delta$ it holds that $||f(\tilde{x}) - f(x)|| \leq \kappa_{\text{abs}} \, ||\tilde{x} - x||$.
The relative condition $\kappa_{\text{rel}} \geq 0$ at $x$ is the smallest number such that there is a $\delta > 0$ so that all $\tilde{x}$ with $||\tilde{x} - x|| < \delta$ satisfy: $\frac{||f(\tilde{x}) - f(x)||}{||f(x)||} \leq \kappa_{\text{rel}} \, \frac{||\tilde{x} - x||}{||x||}$.
The difference is thus that $\kappa_{\text{rel}}$ compares the relative change of the output with the relative change of the input.
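As a small worked example of that difference (my addition, not part of the original answer): for the scalar map $f(x) = cx$ with $c \neq 0$ and $x \neq 0$,
$$\kappa_{\text{abs}} = \lim_{\tilde{x} \to x} \frac{|c\tilde{x} - cx|}{|\tilde{x} - x|} = |c|, \qquad \kappa_{\text{rel}} = \lim_{\tilde{x} \to x} \frac{|c\tilde{x} - cx| \, / \, |cx|}{|\tilde{x} - x| \, / \, |x|} = 1,$$
so a map with a very steep slope can still be perfectly conditioned in the relative sense.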
In general, let $\tilde{f}$ denote an approximation of the real function $f$ and $\tilde{x}$ the input with some noise; then there are 4 categories in numerics to consider.
Your question is also related to adversarial attacks. You can have a look at the literature.
Note also that $\kappa_{\text{rel}} = \frac{||Df(x)|| \, ||x||}{||f(x)||}$, so you could compute the condition number for any given neural network.
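To make that formula concrete, here is a small NumPy sketch of my own (the architecture and the random input are placeholder assumptions). For a bias-free ReLU network the Jacobian at $x$ is, almost everywhere, the product $W_L D_{L-1} W_{L-1} \cdots D_1 W_1$, where $D_j$ is the diagonal 0/1 mask of the ReLU at layer $j$, so $\kappa_{\text{rel}}$ can be evaluated directly:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 10, 32, 5
Ws = [rng.standard_normal((m, d)),     # W_1
      rng.standard_normal((m, m)),     # W_2
      rng.standard_normal((k, m))]     # W_3 (linear output layer)

def forward_with_jacobian(weights, x):
    """Return f(x) and Df(x) for a bias-free ReLU network (valid almost everywhere)."""
    h, J = x, np.eye(len(x))
    for W in weights[:-1]:
        z = W @ h
        mask = (z > 0).astype(float)   # ReLU derivative, i.e. the diagonal of D_j
        h = mask * z                   # ReLU(z)
        J = (mask[:, None] * W) @ J    # chain rule: J <- D_j W_j J
    return weights[-1] @ h, weights[-1] @ J

x = rng.standard_normal(d)
fx, J = forward_with_jacobian(Ws, x)

# kappa_rel = ||Df(x)|| * ||x|| / ||f(x)||, with the spectral norm for the Jacobian
kappa_rel = np.linalg.norm(J, 2) * np.linalg.norm(x) / np.linalg.norm(fx)
print("relative condition number at x:", kappa_rel)
```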
Answered by Graph4Me Consultant on July 6, 2021