Cross Validated Asked by Gergő Horváth on February 19, 2021

I’ve been learning about variational autoencoders for a month now. The more I read and dig into them, the more confused I get.

Thinking about it from a neural network perspective probably doesn’t help, and I can’t find anyone asking the kind of questions I have, so I’m definitely missing something.

In the tutorials/explanations, they say the initial goal is to calculate $p(z|x)$, and the reason we have to set up the whole network is to approximate it. We can’t compute it directly because, in $\frac{p(x|z)p(z)}{p(x)}$, the term $p(x)$ is intractable. This gives the impression that we know $p(x|z)$. We defined $p(z)$ as a standard normal distribution, that’s clear. But even though we know that $p(x|z)$ is the output of the decoder, if we set that knowledge aside, it is not obvious at all. At least not for me.

Are we treating $x$, the input fed into the network, as $p(x|z)$, and basically telling the network "this is the probability of $x$ given $z$, and this is the probability of $z$; figure out the probability of $z$ given $x$"? And because $p(x)$ is missing, we can only approximate it? Is this why we treat $p(x|z)$ as something we know?

How would it be possible to construct a "perfect" autoencoder, where we know $p(x)$, even for a very simple example? How should $p(x)$ be imagined in the context of variational autoencoders?

I think you might be mixing up three things:

- the definition of various distributions
- how the densities can be computed
- how they can be sampled

Some examples:

$p(x)$ is the distribution modeled by the VAE.

- It's defined as $p(x) = \int p(x|z)p(z) \, dz$.
- In practice, if I gave you an $x$, you would have a hard time computing this integral, even approximately.
- You can sample from $p(x)$ without too much trouble: just sample $z \sim p(z)$, then sample $x \sim p(x|z)$.
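To make the contrast between sampling and density evaluation concrete, here is a minimal sketch in NumPy. The one-dimensional setup, the decoder mean `f`, and the fixed noise scale `SIGMA` are illustrative assumptions, not part of any particular VAE; a real decoder would be a neural network.

```python
import numpy as np

SIGMA = 0.5  # assumed fixed standard deviation of p(x|z)

def f(z):
    """Hypothetical decoder mean f(z; theta); stands in for a neural network."""
    return np.tanh(2.0 * z) + 0.5 * z

def sample_x(n, rng):
    """Ancestral sampling: z ~ p(z) = N(0, 1), then x ~ p(x|z) = N(f(z), SIGMA^2)."""
    z = rng.standard_normal(n)
    return f(z) + SIGMA * rng.standard_normal(n)

def log_p_x_naive(x, n, rng):
    """Naive Monte Carlo estimate of log p(x) = log E_{z ~ p(z)}[p(x|z)]."""
    z = rng.standard_normal(n)
    log_lik = -0.5 * ((x - f(z)) / SIGMA) ** 2 - np.log(SIGMA * np.sqrt(2 * np.pi))
    m = log_lik.max()  # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_lik - m)))

rng = np.random.default_rng(0)
xs = sample_x(5, rng)                    # sampling is cheap
est = log_p_x_naive(xs[0], 10_000, rng)  # density needs an integral over z
```

With a single latent dimension this naive estimate is still workable. With many latent dimensions, almost every $z$ drawn from the prior gives a negligible $p(x|z)$, so the estimate needs an impractical number of samples, which is the difficulty described above.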

$p(x|z)$

- is defined as a normal distribution with mean $f(z; \theta)$, where $f$ is some arbitrary function (maybe a neural network).
- it's just a normal distribution, so it's trivial to compute the density
- and also trivial to sample from, again it's just drawing from a normal distribution.

When you write that

> And even though we know that $p(x|z)$ is the output of decoder

you're confusing $f(z; \theta)$, the output of the decoder, with the distribution.

> In the tutorials/explanations, they say the initial goal is to calculate $p(z|x)$

A better way to phrase this might be: remember when we said computing $p(x)$ was difficult? It turns out that it's really important to be able to compute $p(x)$ or $\log p(x)$ efficiently. It also turns out that $\log p(x) = E_{z \sim p(z|x)}[\log p(x|z)] - \mathcal{D}_{KL}(p(z|x) \,\|\, p(z))$, so if we knew what $p(z|x)$ was, all our problems would be solved. Unfortunately, it's not practical to compute $p(z|x)$ either.
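For readers who want to see where this identity comes from, one short derivation uses only Bayes' rule, $p(z|x) = \frac{p(x|z)p(z)}{p(x)}$, and the fact that $\log p(x)$ does not depend on $z$:

$$
\begin{aligned}
\log p(x) &= E_{z \sim p(z|x)}[\log p(x)] \\
&= E_{z \sim p(z|x)}\left[\log \frac{p(x|z)\,p(z)}{p(z|x)}\right] \\
&= E_{z \sim p(z|x)}[\log p(x|z)] - E_{z \sim p(z|x)}\left[\log \frac{p(z|x)}{p(z)}\right] \\
&= E_{z \sim p(z|x)}[\log p(x|z)] - \mathcal{D}_{KL}(p(z|x) \,\|\, p(z))
\end{aligned}
$$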

Using a normal distribution $q(z|x)$ to approximate $p(z|x)$ makes things much easier, since the expectation can be approximated by Monte Carlo sampling and the KL divergence term has a closed form. The key to why a VAE works at all is that you can prove that replacing $p(z|x)$ with any approximation $q(z|x)$ results in a lower bound on $\log p(x)$: you will never over-estimate $p(x)$, only under-estimate it, which is crucial because we're trying to maximize $p(x)$. Without this property, there'd be no practical way to train a VAE.
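The lower-bound property can be checked numerically in a toy model where $p(x)$ happens to be known exactly. The sketch below assumes a one-dimensional linear decoder, $p(x|z) = N(Az + B, S^2)$, with made-up constants `A`, `B`, `S`; in that special case both $p(x)$ and $p(z|x)$ are Gaussian with closed forms. This is an illustration under those assumptions, not how a real VAE is trained. The gap between $\log p(x)$ and the bound is $\mathcal{D}_{KL}(q(z|x) \,\|\, p(z|x)) \ge 0$, so the bound is tight only when $q$ equals the true posterior.

```python
import numpy as np

A, B, S = 2.0, 1.0, 0.5  # assumed toy decoder: p(x|z) = N(A*z + B, S^2)

def log_normal(x, mean, var):
    """Log density of a univariate normal N(mean, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def exact_log_p_x(x):
    """In this linear-Gaussian toy the marginal is known: p(x) = N(B, A^2 + S^2)."""
    return log_normal(x, B, A**2 + S**2)

def exact_posterior(x):
    """Mean and variance of the true posterior p(z|x), which is Gaussian here."""
    var = S**2 / (A**2 + S**2)
    mean = A * (x - B) / (A**2 + S**2)
    return mean, var

def elbo(x, q_mean, q_var, n, rng):
    """E_q[log p(x|z)] by Monte Carlo, minus the closed-form KL(q || N(0, 1))."""
    z = q_mean + np.sqrt(q_var) * rng.standard_normal(n)
    expected_log_lik = np.mean(log_normal(x, A * z + B, S**2))
    kl = 0.5 * (q_var + q_mean**2 - 1.0 - np.log(q_var))
    return expected_log_lik - kl

rng = np.random.default_rng(0)
x = 2.0
log_px = exact_log_p_x(x)
elbo_true_posterior = elbo(x, *exact_posterior(x), n=200_000, rng=rng)
elbo_prior_as_q = elbo(x, 0.0, 1.0, n=200_000, rng=rng)
print(log_px, elbo_true_posterior, elbo_prior_as_q)
```

Using the true posterior as $q$ recovers $\log p(x)$ up to Monte Carlo noise, while a poor choice of $q$ (here, the prior) gives a much lower value.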

To compare:

$p(z|x)$

- defined as $\frac{p(x,z)}{p(x)}$
- difficult to compute
- difficult to sample from

$q(z|x)$

- It's defined as a normal distribution with mean and diagonal covariance computed by a neural network as a function of $x$.
- Its density is easy to compute, and more importantly, so is the KL divergence from $q$ to another normal distribution.
- It's easy to sample from, since it's normal.
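The three properties of $q(z|x)$ can be sketched as follows. The values of `mu` and `log_var` are hypothetical encoder outputs for a single input, and parameterizing the variance by its logarithm is a common convention assumed here rather than something stated above. The sample is written as `mu + sigma * eps`, the usual way to draw from a normal distribution, and the closed-form KL divergence to the standard normal prior is compared against a Monte Carlo estimate.

```python
import numpy as np

def sample_q(mu, log_var, n, rng):
    """Draw n samples z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal((n, mu.shape[-1]))
    return mu + np.exp(0.5 * log_var) * eps

def log_q(z, mu, log_var):
    """Log density of a diagonal Gaussian, summed over latent dimensions."""
    per_dim = np.log(2 * np.pi) + log_var + (z - mu) ** 2 / np.exp(log_var)
    return -0.5 * np.sum(per_dim, axis=-1)

def log_prior(z):
    """Log density of the standard normal prior p(z) = N(0, I)."""
    return -0.5 * np.sum(np.log(2 * np.pi) + z**2, axis=-1)

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

# Hypothetical encoder outputs for one input x (a real encoder would compute these).
mu = np.array([0.5, -1.0])
log_var = np.array([-0.5, 0.2])

rng = np.random.default_rng(0)
z = sample_q(mu, log_var, 100_000, rng)
kl_exact = kl_to_standard_normal(mu, log_var)
kl_mc = np.mean(log_q(z, mu, log_var) - log_prior(z))  # E_q[log q - log p]
```

The two KL values should agree up to sampling noise; the closed form is what makes this term cheap to evaluate.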

Correct answer by shimao on February 19, 2021
