Cross Validated Asked by Gergő Horváth on February 19, 2021
I’ve been learning about variational autoencoders for a month now. The more I read and dig into the topic, the more confused I am.
Thinking about it from a neural network perspective probably doesn’t help, and I can’t find the kind of questions I’m asking, so I’m definitely missing something.
In the tutorials/explanations, they say the initial goal is to calculate $p(z|x)$. The reason why we have to set up the whole network is to approximate this. And the reason why we don’t have it directly is that in $\frac{p(x|z)p(z)}{p(x)}$, $p(x)$ is intractable. This gives the impression that we know $p(x|z)$. We defined $p(z)$ as a standard normal distribution, that’s clear. And even though we know that $p(x|z)$ is the output of decoder, if we set that aside, or forget that we know this, it is not obvious at all. At least for me.
Are we looking at $x$, the input fed into the network as $p(x|z)$, and basically telling it "this is the probability of $x$ given $z$, figure out what’s the probability of $z$ given $x$, if this is the probability of $z$"? But because it’s missing $p(x)$, that’s why we have to just approximate it? And this is the reason why we think about $p(x|z)$ as we know it?
How would it be possible to reproduce a "perfect" autoencoder, where we know $p(x)$, even if it’s a very simple example? How should $p(x)$ be imagined in the context of variational autoencoders?
I think you might be mixing up between
Some examples:
$p(x)$ is the distribution modeled by the VAE.
$p(x|z)$
When you write that
And even though we know that $p(x|z)$ is the output of decoder
you're confusing $f(z;\theta)$, the output of the decoder, with the distribution.
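To make the distinction concrete, here is a minimal sketch under one common modelling choice: $p(x|z)$ is a Gaussian whose mean is the decoder output. That Gaussian form, the fixed noise level `SIGMA`, and the toy one-dimensional decoder `f` are assumptions for illustration, not something stated in the question. The decoder returns a single value, while $p(x|z)$ is a density you can evaluate at any $x$ or sample from.

```python
import math
import random

SIGMA = 0.1  # assumed fixed output noise; not specified in the post


def f(z, theta):
    """Hypothetical decoder: a deterministic function of z with parameters theta."""
    w, b = theta
    return math.tanh(w * z + b)


def log_p_x_given_z(x, z, theta):
    """Log density of p(x|z) = N(x; f(z; theta), SIGMA^2), evaluated at x."""
    mean = f(z, theta)
    return -0.5 * math.log(2 * math.pi * SIGMA ** 2) - (x - mean) ** 2 / (2 * SIGMA ** 2)


def sample_x_given_z(z, theta, rng):
    """One draw from p(x|z): the decoder output plus Gaussian noise."""
    return rng.gauss(f(z, theta), SIGMA)


theta = (2.0, -0.5)  # arbitrary illustrative parameters
rng = random.Random(0)
z = 0.3

print("decoder output f(z; theta):", f(z, theta))
print("log p(x = 0.1 | z):", log_p_x_given_z(0.1, z, theta))
print("one sample from p(x|z):", sample_x_given_z(z, theta, rng))
```

The decoder output only parameterizes the distribution; the density is highest at $x = f(z;\theta)$ and falls off as $x$ moves away from it.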
In the tutorials/explanations, they say the initial goal is to calculate $p(z|x)$
A better way to phrase this might be: remember when we said computing $p(x)$ was difficult? It turns out that it's really important to be able to compute $p(x)$ or $\log p(x)$ efficiently. And also, it turns out that $\log p(x) = E_{z \sim p(z|x)}[\log p(x|z)] - \mathcal{D}_{KL}( p(z|x) \| p(z) )$, so if we knew what $p(z|x)$ was, all our problems would be solved. Unfortunately, it's not practical to compute $p(z|x)$ either.
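One way to see where that identity comes from, sketched using only Bayes' rule and the definition of KL divergence:

$$
\begin{aligned}
\log p(x) &= \log p(x|z) + \log p(z) - \log p(z|x) && \text{(Bayes' rule, valid for any } z\text{)} \\
&= E_{z \sim p(z|x)}\left[\log p(x|z)\right] - E_{z \sim p(z|x)}\left[\log \frac{p(z|x)}{p(z)}\right] && \text{(average both sides over } z \sim p(z|x)\text{)} \\
&= E_{z \sim p(z|x)}\left[\log p(x|z)\right] - \mathcal{D}_{KL}\left( p(z|x) \,\|\, p(z) \right)
\end{aligned}
$$

The left-hand side does not depend on $z$, so taking the expectation leaves it unchanged.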
Using a normal distribution $q(z|x)$ to approximate $p(z|x)$ makes things much easier: the first expectation can be approximated by Monte Carlo sampling, and the KL divergence term against the standard normal $p(z)$ has a closed form. The key to why a VAE works at all is that you can prove replacing $p(z|x)$ with any approximation $q(z|x)$ results in a lower bound on $\log p(x)$: you will never over-estimate $p(x)$, only under-estimate it, which is crucial because we're trying to maximize $p(x)$. Without this property, there'd be no practical way to train a VAE.
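The lower-bound property can be checked numerically in a toy model where everything is tractable. The sketch below assumes a one-dimensional linear-Gaussian model, $z \sim N(0, 1)$ and $x|z \sim N(z, \sigma^2)$, which is not a real VAE and is chosen only because $p(x)$ and $p(z|x)$ are then known exactly. The names `SIGMA2`, `log_evidence`, `true_posterior`, and `elbo` are hypothetical helpers for this illustration.

```python
import math

SIGMA2 = 0.5  # assumed decoder noise variance in the toy model


def log_normal_pdf(value, mean, var):
    """Log density of a univariate normal N(mean, var) at `value`."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)


def log_evidence(x):
    """Exact log p(x): marginalising z gives x ~ N(0, 1 + SIGMA2)."""
    return log_normal_pdf(x, 0.0, 1.0 + SIGMA2)


def true_posterior(x):
    """Mean and variance of the exact posterior p(z|x), which is also normal."""
    return x / (1.0 + SIGMA2), SIGMA2 / (1.0 + SIGMA2)


def elbo(x, m, s2):
    """E_q[log p(x|z)] - KL(q || p(z)) for q(z|x) = N(m, s2), in closed form."""
    expected_log_lik = -0.5 * (
        math.log(2 * math.pi * SIGMA2) + ((x - m) ** 2 + s2) / SIGMA2
    )
    kl_to_prior = 0.5 * (s2 + m ** 2 - 1.0 - math.log(s2))
    return expected_log_lik - kl_to_prior


x = 1.3
post_mean, post_var = true_posterior(x)
print("log p(x)            :", log_evidence(x))
print("bound, q = p(z|x)   :", elbo(x, post_mean, post_var))
print("bound, q = N(0, 1)  :", elbo(x, 0.0, 1.0))
print("bound, q = N(2, 0.1):", elbo(x, 2.0, 0.1))
```

With $q$ set to the exact posterior the bound matches $\log p(x)$; any other choice of mean and variance gives a smaller value, and the gap is $\mathcal{D}_{KL}( q(z|x) \| p(z|x) )$.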
To compare:
$p(z|x)$
$q(z|x)$
Correct answer by shimao on February 19, 2021