Data Science Asked on August 23, 2020
I am a student and I am studying machine learning. I am focusing on deep generative models, in particular on autoencoders and variational autoencoders (VAE).
I am trying to understand the concept, but I am having some problems.
So far, I have understood that an autoencoder takes an input, for example an image, and reduces it to a latent space that should contain the underlying features of the dataset (the encoding step); then, with a decoding step, it reconstructs the image, which has lost some information because of the encoding.
After this, a loss function compares the reconstruction with the input, and training on this loss shapes the latent space so that it captures the latent features.
About the VAE: it uses a probabilistic approach, so we have to learn the mean and covariance of a Gaussian.
So far this is what I have understood.
What I have really unclear is what are we trying to learn with autoencoders and VAE?
I have seen examples where an image goes from a non-smiling to a smiling face, or from a black-and-white image to a colored image.
But I don’t understand the main concept, which is: what does an autoencoder do?
I add here some of the sources from which I studied, so that anyone who needs them can see them:
what does an auto-encoder do?
The simplest auto-encoder takes a high dimensional image (say, 100K pixels) down to a low-dimensional representation (say, a vector of length 10) and then uses only those 10 features to try to reconstruct the original image. You can imagine an analogy with humans: I look at someone, describe them ("tall, dark-haired, ...") then after I've forgotten what they look like, I try to sketch them using only my notes.
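A minimal sketch of that idea in PyTorch (the layer sizes and the 10-dimensional latent code below are illustrative choices, not part of any particular recipe):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=10):
        super().__init__()
        # Encoder: compress the input down to a short latent vector ("the notes").
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )
        # Decoder: try to redraw the original input from only those latent features.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)        # low-dimensional representation
        return self.decoder(z)     # reconstruction built only from z

model = AutoEncoder()
x = torch.rand(32, 784)                               # a batch of flattened "images"
reconstruction = model(x)
loss = nn.functional.mse_loss(reconstruction, x)      # how far the sketch is from the original
```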
what are we trying to learn?
In other words, why bother? A few reasons: a small bottleneck forces the network to keep only the most informative features, which is useful for compression and dimensionality reduction; the learned features can be visualized or fed to other models; and, as described below, the same encoder/decoder machinery supports tasks such as denoising.
I have seen examples where an image goes from a non-smiling to a smiling face, or from a black-and-white image to a colored image.
There are many different types of auto-encoders. What I described above is the simplest kind. Another common type is a "denoising" auto-encoder -- instead of reconstructing the original image, the goal is to construct an image that is related to the original image, but different.
The classic example of this is denoising (hence the name): you can take a clean image, add a bunch of noise, run it through an auto-encoder, and then reward the auto-encoder for producing the clean image. So, the input (noisy image) is actually different from the desired output (clean image). The examples you give are similar.
The challenge in designing these types of auto-encoders is normally the loss -- you need some mechanism to tell the auto-encoder whether it did the right thing or not.
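For concreteness, one denoising training step might look like the sketch below (PyTorch; it reuses the AutoEncoder class from the earlier sketch, and the noise level of 0.2 is an arbitrary illustration):

```python
import torch
import torch.nn as nn

model = AutoEncoder()                                  # same architecture as before
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

clean = torch.rand(32, 784)                            # clean images: the desired output
noisy = clean + 0.2 * torch.randn_like(clean)          # corrupted version: the actual input

reconstruction = model(noisy)                          # reconstruct from the noisy input
loss = nn.functional.mse_loss(reconstruction, clean)   # reward matching the *clean* image
optimizer.zero_grad()
loss.backward()
optimizer.step()
```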
about the VAE, it uses a probabilistic approch, so we have to learn the mean and covariance of a gaussian.
A VAE is a third type of auto-encoder. It's a bit special because it is well-grounded mathematically; no ad-hoc metrics needed. The math is too complicated to go through here, but the key ideas are these: the encoder does not output a single latent point but the parameters (mean and covariance) of a Gaussian over the latent space; a latent vector is sampled from that Gaussian; and the loss combines the reconstruction error with a KL-divergence term that keeps the learned Gaussian close to a prior.
So, the workflow is this: encode the input into a mean and covariance, sample a latent vector from the corresponding Gaussian, decode that sample into a reconstruction, and train on the reconstruction error plus the KL term.
Of course, there's nothing special about a Gaussian, you could just as easily use some other parametric distribution. In practice, people usually use Gaussians.
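A minimal sketch of these ideas in PyTorch (layer sizes are again illustrative; the encoder outputs a mean and a log-variance, a latent vector is sampled with the reparameterization trick, and the loss adds a KL term to the reconstruction error):

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.mu = nn.Linear(128, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(128, latent_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z from N(mu, sigma^2) in a differentiable way.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction term plus KL divergence between q(z|x) and the N(0, I) prior.
    recon_term = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl

model = VAE()
x = torch.rand(32, 784)
recon, mu, logvar = model(x)
loss = vae_loss(x, recon, mu, logvar)
```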
This sometimes gives better results than other auto-encoders. Further, you sometimes get interesting results when you look between the classes in your latent space. An image's distance in the latent space from the cluster center is sometimes related to uncertainty.
Moreover, there is the nice property that these high-dimensional Gaussians are probability distributions in a rigorous mathematical sense. They approximate the probability that a given image belongs to a given class. So, there is some thought that VAEs will be able to overcome the "hand waving" of deep learning and put everything back on a firm Bayesian probabilistic grounding. But of course, it is only an approximation, and the approximation involves a lot of deep neural networks, so there is still plenty of hand waving at the moment.
By the way, I like to use this question during interviews -- an astonishing number of people claim to have experience with VAEs but in fact do not realize that VAEs are different from "regular" AEs.
Correct answer by cag51 on August 23, 2020
An easy way to think about autoencoders is: how well a prticlar pice of infrmaton can be reconstrcted frm its reducd or otherwse comprssed reprsentaton. If you made it this far it means that you successfully reconstructed the previous sentence by using only 92 of its original 103 characters.
More specifically, autoencoders are neural networks that are trained to learn efficient data codings in an unsupervised manner. The aim is to learn a representation of a given dataset by training the network to ignore "unimportant" signals such as noise. Typically, AEs are used for dimensionality reduction.
Practically, an AE encodes the input into a lower-dimensional code, decodes that code back into the input space, and compares the reconstruction with the original input via a reconstruction loss. In this training loop, the AE minimises this reconstruction loss so that the output is as similar to the input as possible.
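A compact sketch of such a training loop (PyTorch; the architecture, dataset, and hyperparameters below are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(                        # encoder and decoder in one stack
    nn.Linear(784, 32), nn.ReLU(),            # encode to a 32-dimensional code
    nn.Linear(32, 784), nn.Sigmoid(),         # decode back to the input space
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
data = torch.rand(256, 784)                   # placeholder dataset

for epoch in range(10):
    reconstruction = model(data)
    loss = nn.functional.mse_loss(reconstruction, data)   # output vs. input
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```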
Answered by hH1sG0n3 on August 23, 2020
One approach that I've found helpful when considering autoencoders is the following result: whereas methods such as PCA identify axes of maximal variation in the input space, the introduction of non-linear activation functions in the autoencoder allows for the identification of axes of maximal variation embedded in a (potentially) non-linear transform of the space.
As an example, consider two-dimensional data distributed along a non-linear curve, say $x_2 = f(x_1)$ for some non-linear $f$. Here, the goal is to store inputs as one-dimensional compressions. A PCA approach could possibly introduce significant loss (as long as the support is sufficiently large), but an autoencoder with non-linearities will be able to identify the principal embedded axis in the transformed space as the one whose pre-image is roughly the curve $x_2 = f(x_1)$ in the input space, and therefore will introduce much less loss. You can think of the autoencoder training regime as working to approximate a transform of the input space in which the image of that curve is a straight axis. The autoencoder then works by storing inputs in terms of where they lie along that axis.
Observe that absent the non-linear activation functions, an autoencoder essentially becomes equivalent to PCA — up to a change in basis. A useful exercise might be to consider why this is.
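A small experiment along these lines (a sketch using scikit-learn and PyTorch; the quadratic curve and the network sizes are arbitrary illustrative choices):

```python
import torch
import torch.nn as nn
from sklearn.decomposition import PCA

# 2-D points lying on a non-linear curve: x2 = x1^2 (illustrative choice).
x1 = torch.randn(1000, 1)
data = torch.cat([x1, x1 ** 2], dim=1)

# PCA: project onto the single axis of maximal variation, then reconstruct.
pca = PCA(n_components=1)
pca_recon = torch.from_numpy(pca.inverse_transform(pca.fit_transform(data.numpy()))).float()

# Autoencoder with a 1-D bottleneck and non-linear activations.
ae = nn.Sequential(
    nn.Linear(2, 16), nn.Tanh(), nn.Linear(16, 1),   # encoder
    nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 2),   # decoder
)
opt = torch.optim.Adam(ae.parameters(), lr=1e-2)
for _ in range(2000):
    loss = nn.functional.mse_loss(ae(data), data)
    opt.zero_grad()
    loss.backward()
    opt.step()

print("PCA reconstruction error:", nn.functional.mse_loss(pca_recon, data).item())
print("AE reconstruction error: ", nn.functional.mse_loss(ae(data), data).item())
```

If training converges, the non-linear network can follow the curve with its single latent coordinate, whereas PCA is restricted to a straight line through the data.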
Answered by Josh Purtell on August 23, 2020