# How can I randomly sample the space of consistent neural networks for given data?

Data Science Asked by Jack M on December 2, 2020

Suppose I have a dataset $$X$$ and target labels $$Y$$. For a fixed neural network architecture, how can I randomly and uniformly sample from the space of all possible assignments of weights such that the neural network maps $$X$$ to $$Y$$?

It's probably hard to get exactly a uniform distribution on the weights. One heuristic approximation is to repeat the following many times:

1. Randomly choose initial weights.
2. Train the neural network until you get 100% accuracy on the training set.
3. Save the resulting neural network.

Each neural network is a sample of weights that are consistent with the training set. Are they uniformly distributed among the set of all such weights? That seems unlikely. But they might give an approximation to such a sample.
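The loop above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the tiny tanh MLP, the squared loss, the XOR training set, and all hyperparameters are illustrative choices. Only weight settings that classify the training set perfectly are kept as samples.

```python
import numpy as np

def train_once(X, y, hidden=8, lr=0.5, epochs=3000, rng=None):
    """Train a one-hidden-layer tanh MLP with squared loss; return its weights."""
    if rng is None:
        rng = np.random.default_rng()
    W1 = rng.normal(0, 1, (X.shape[1], hidden))  # random initial weights (step 1)
    b1 = np.zeros(hidden)
    W2 = rng.normal(0, 1, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(epochs):                      # full-batch gradient descent (step 2)
        h = np.tanh(X @ W1 + b1)                 # hidden activations
        out = h @ W2 + b2                        # linear output
        err = out - y[:, None]                   # gradient of squared loss at the output
        dW2 = h.T @ err / len(X)
        db2 = err.mean(0)
        dh = (err @ W2.T) * (1 - h ** 2)         # backprop through tanh
        dW1 = X.T @ dh / len(X)
        db1 = dh.mean(0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2)

def predict(params, X):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).ravel() > 0.5

def sample_consistent_nets(X, y, n_samples=3, max_tries=50, seed=0):
    """Repeat random init + training; keep only nets that fit the data exactly (step 3)."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(max_tries):
        params = train_once(X, y, rng=rng)
        if np.array_equal(predict(params, X), y.astype(bool)):
            samples.append(params)               # a weight sample consistent with (X, Y)
        if len(samples) == n_samples:
            break
    return samples

# XOR: a tiny training set the network must memorize exactly.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 0.])
nets = sample_consistent_nets(X, y)
```

Note that training may occasionally fail to reach 100% accuracy from a given initialization, which is why the sketch retries up to `max_tries` times and discards inconsistent runs.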

This might fail if training never reaches 100% accuracy. However, research has demonstrated empirically that if you choose a deep neural network architecture with sufficient capacity and train for long enough, neural nets can memorize the training set and achieve 100% training accuracy [1]. So, if it fails, I'd recommend increasing the size of the network and trying again. Of course, there are no guarantees -- it can still fail. It's a heuristic.

[1] Understanding deep learning requires rethinking generalization. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. arXiv:1611.03530

Answered by D.W. on December 2, 2020

Let me formalize your question before discussing it.

If I understand correctly, you ask for the following:

For $$X \subset \mathbb{R}^{n}$$ and $$Y \subset \mathbb{R}^m$$, let $$f: X \rightarrow Y$$ be a map.

Let $$w \in \mathbb{R}^q$$ be weights. We consider a neural network $$g: \mathbb{R}^{n} \times \mathbb{R}^q \rightarrow \mathbb{R}^m$$, and let $$g^{(w)}: \mathbb{R}^{n} \rightarrow \mathbb{R}^{m}, x \mapsto g(x,w)$$ be the neural network parametrized by $$w$$.

Now you want to sample from the set $$\underline{W}(f,g,X) := \{ w \in \mathbb{R}^{q} \mid f = (g^{(w)})_{\mid X} \}$$.

However, I think constructing $$\underline{W}(f,g,X)$$ is very difficult in general.

The following question arises:

Do you already have some $$w \in \underline{W}(f,g,X)$$?

If not, note that $$\underline{W}(f,g,X) = \emptyset$$ is possible! It's easy to construct an example: a network consisting of a single linear unit cannot realize XOR, so for that $$f$$ the set is empty.

Note also that all known universal approximation theorems have some requirements on $$f$$, and they only state that $$f$$ can be approximated by *some* neural network. For a fixed architecture, it might be that there is no $$w \in \mathbb{R}^q$$ with $$f = (g^{(w)})_{\mid X}$$, and that $$f$$ cannot even be approximated by any $$(g^{(w)})_{\mid X}$$ (e.g. in terms of the uniform norm).

If you have some $$w \in \underline{W}(f,g,X)$$, there are certain trivial transformations that produce further elements (e.g. permuting the nodes of a fully-connected layer, or permuting channels). Apart from that, I am not aware of a full description of $$\underline{W}(f,g,X)$$. And without further details or constraints, I think there is no general answer at the moment.
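The permutation symmetry mentioned above can be checked numerically. In the sketch below (shapes and names are illustrative), permuting the hidden units of a one-hidden-layer network, i.e. the columns of the first weight matrix, the hidden biases, and the rows of the second weight matrix, leaves the computed function unchanged, so every permutation of a consistent $$w$$ is again consistent.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)   # input -> hidden
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)   # hidden -> output

def net(x, W1, b1, W2, b2):
    """One-hidden-layer tanh network."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

# Reorder the 5 hidden units: permute columns of W1, entries of b1, rows of W2.
perm = rng.permutation(5)
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]

x = rng.normal(size=(4, 3))
# The permuted weights define exactly the same function.
assert np.allclose(net(x, W1, b1, W2, b2), net(x, W1p, b1p, W2p, b2))
```

The assertion holds because the elementwise tanh commutes with reordering the hidden coordinates, and the second layer undoes the reordering.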

I hope this helps!

Answered by Graph4Me Consultant on December 2, 2020