Data Science Asked by Arham on May 19, 2021
I’ve been attempting to implement the WaveNet paper: https://arxiv.org/pdf/1609.03499v2.pdf
In the paper, the main diagram they use to describe the architecture is this one:
The paper mentions the use of residual and skip connections to enable the training of deeper networks, which I understand. What I do not understand is why they extract the skip values from every layer and sum them before passing the result to the last portion of the network.
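To make the data flow concrete, here is a rough sketch of how I read the diagram, not the paper's actual implementation. It assumes a single channel, a kernel size of 2, scalar weights standing in for the 1x1 convolutions, and separate weights for the residual and skip paths. The names `residual_block`, `wavenet_stack` and the parameter keys are made up for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dilated_causal_conv(x, w_past, w_now, dilation):
    """Kernel-size-2 causal convolution on a 1-channel sequence.

    The output at time t depends only on x[t] and x[t - dilation];
    positions before the start of the sequence are treated as zero.
    """
    out = []
    for t in range(len(x)):
        past = x[t - dilation] if t - dilation >= 0 else 0.0
        out.append(w_past * past + w_now * x[t])
    return out


def residual_block(x, params, dilation):
    """Return (residual_output, skip_output) for one block."""
    filt = dilated_causal_conv(x, params["f_past"], params["f_now"], dilation)
    gate = dilated_causal_conv(x, params["g_past"], params["g_now"], dilation)
    # Gated activation unit: tanh(filter) * sigmoid(gate)
    z = [math.tanh(f) * sigmoid(g) for f, g in zip(filt, gate)]
    # Scalar stand-ins for the 1x1 convolutions (assumption: separate weights)
    skip = [params["skip"] * v for v in z]
    residual = [xi + params["res"] * v for xi, v in zip(x, z)]
    return residual, skip


def wavenet_stack(x, all_params, dilations):
    """Run all blocks; return the last residual output and the summed skips."""
    skip_sum = [0.0] * len(x)
    h = x
    for params, d in zip(all_params, dilations):
        h, skip = residual_block(h, params, d)       # h feeds the next block
        skip_sum = [s + k for s, k in zip(skip_sum, skip)]  # skips accumulate
    return h, skip_sum


# Hypothetical usage with random weights and a short random input
random.seed(0)
dilations = [1, 2, 4]
keys = ("f_past", "f_now", "g_past", "g_now", "skip", "res")
all_params = [{k: random.uniform(-1, 1) for k in keys} for _ in dilations]
x = [random.uniform(-1, 1) for _ in range(16)]
last_residual, skip_sum = wavenet_stack(x, all_params, dilations)
```

In this reading, `skip_sum` (not `last_residual`) is what goes into the final ReLU, 1x1, ReLU, 1x1, softmax portion of the diagram, so every layer contributes directly to the output.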
In the paper, they state that they are attempting to predict the value of the sequence at time T, given the values x_0 through x_(T-1). They quantize the values into the range [0, 255] and output a probability distribution describing the likelihood that the next element belongs to each of the 256 quantized classes. Therefore, this last portion of the network should output a probability vector of length 256 for each time step.
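For reference, this is roughly how I understand the quantization and the output distribution. The paper describes a mu-law companding transform with mu = 255. The rounding scheme and the function names `mu_law_encode` and `softmax` below are my own assumptions, and the logits are random placeholders rather than real network outputs.

```python
import math
import random

MU = 255  # gives 256 quantization levels


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid range
    # Companding: sign(x) * ln(1 + mu * |x|) / ln(1 + mu)
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest class index
    return int(round((compressed + 1.0) / 2.0 * mu))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Training target for one time step: the class index of the next sample
target_class = mu_law_encode(0.3)

# Placeholder logits standing in for the network's final 1x1 conv output
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(MU + 1)]
probs = softmax(logits)  # one probability per quantized class
```

The network would then be trained with a cross-entropy loss between `probs` and `target_class` at each time step.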
My questions are: