Data Science · Asked by robbmorganf on April 22, 2021
I'm thinking about ways to initialize my neural networks for faster convergence, and I was wondering about initializing the weights with the singular vectors of the data, so that training starts with "useful" features already in place. (I couldn't find any paper proposing this.) Obviously that's pretty vague, so here is a concrete version:
Example: for a CNN whose first layer has $k$ kernels of size $n \times n$, and $p$ training images of size $m \times m$, assemble the $p(m-n+1)^2 \times n^2$ matrix $A$ whose rows are the flattened $n \times n$ windows of the images (or a random subset thereof), and initialize the convolutional filters to the first $k$ right singular vectors of $A$ (the vectors of length $n^2$, so they have the same shape as a flattened kernel).
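For concreteness, here is a minimal NumPy sketch of what I have in mind, assuming single-channel images and stride-1 windows; `svd_init_filters` and its parameters are just illustrative names, not from any library:

```python
import numpy as np

def svd_init_filters(images, n, k, max_windows=100_000, seed=0):
    """Initialize k conv filters of size n x n from the top right singular
    vectors of the patch matrix A described above.

    images: array of shape (p, m, m), single-channel for simplicity.
    """
    rng = np.random.default_rng(seed)
    # Assemble A: every n x n window of every image, flattened to a row.
    # A has p * (m - n + 1)**2 rows and n**2 columns.
    windows = np.lib.stride_tricks.sliding_window_view(
        images, (n, n), axis=(1, 2)
    )
    A = windows.reshape(-1, n * n)
    # Optionally subsample rows so the SVD stays cheap on large datasets.
    if A.shape[0] > max_windows:
        A = A[rng.choice(A.shape[0], max_windows, replace=False)]
    # Rows of Vt are the right singular vectors; they live in patch space
    # R^(n^2), so they reshape directly into n x n convolution kernels.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].reshape(k, n, n)

# Usage: 8 filters of size 5x5 from 100 random 28x28 "images".
filters = svd_init_filters(np.random.rand(100, 28, 28), n=5, k=8)
print(filters.shape)  # (8, 5, 5)
```

(This requires $k \le n^2$, since $A$ has only $n^2$ right singular vectors.)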
My question is: would this lead to stable training, or is it likely to cause exploding or vanishing gradients? And would it actually help the network identify features more quickly, or just lead to overfitting?