What is the main difference between a GAN and an autoencoder?

Asked by Muhammad Hanif Sarwari on Data Science, May 9, 2021

What is the main difference between GANs and older generative models? What characteristics of GANs made them more successful than other generative models?

One Answer

The main difference is the philosophy that drives the loss metric, and consequently the architecture (the latter follows from the former, naturally).

Autoencoders

The job of an autoencoder is to learn an encoding network and a decoding network simultaneously. An input (e.g. an image) is given to the encoder, which compresses it into a strongly reduced encoded form; that encoding is then fed to the decoder.

The network learns this encoding/decoding because the loss metric increases with the difference between the input image and the reconstructed output. With every iteration, the encoder gets a little better at finding an efficient compressed representation of the input, and the decoder gets a little better at reconstructing the input from that representation.

Summarised: An autoencoder learns to represent some input information very efficiently, and subsequently how to reconstruct the input from its compressed form.
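To make this concrete, here is a minimal sketch of an autoencoder training step. It assumes PyTorch and flattened 28x28 images (e.g. MNIST); the layer sizes and the mean-squared-error reconstruction loss are illustrative choices, not something specified in the original answer.

```python
# Minimal autoencoder sketch (assumes PyTorch; sizes are illustrative).
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784), nn.Sigmoid())
autoencoder = nn.Sequential(encoder, decoder)

optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # reconstruction loss: grows with the input/output difference

def train_step(batch):
    """One training iteration; batch is an (N, 784) tensor of images."""
    optimizer.zero_grad()
    reconstruction = autoencoder(batch)
    loss = loss_fn(reconstruction, batch)  # compare the output to the input itself
    loss.backward()
    optimizer.step()
    return loss.item()
```

The key point is that the target of the loss is the input itself, so the network is pushed towards an encoding that preserves exactly the information needed to rebuild that input.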

Generative Adversarial Networks

Here, we have a "generator" whose job is to take a noise signal and transform it into a sample from some target space (again, images are a popular example). The other component, the adversary, is the "discriminator", whose job is to distinguish real images drawn from the target space from the fake images created by the generator. The network is trained in two alternating phases, each with its own loss:

First, the discriminator is given labelled samples: real images from the training set and fake images produced by the generator (at the start, these are essentially noise). Loss is calculated with a binary classification metric (e.g. cross-entropy), so the discriminator begins to learn the difference between a fake image and a real one.

Before it can learn too much, though, we switch over to the generator. To train the generator, we once again feed in some noise and check the output of the discriminator. This time, the loss rewards the generator when the discriminator classifies the generated image as real: fooling the discriminator means the generator has begun to learn how to make images that resemble the "real" images from the training set.
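Here is a minimal sketch of those two alternating phases, again assuming PyTorch and the same flattened 784-dimensional image space; the architectures, latent dimension and learning rates are illustrative assumptions rather than part of the original answer.

```python
# Minimal GAN training-step sketch (assumes PyTorch; sizes are illustrative).
import torch
import torch.nn as nn

latent_dim = 64
generator = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                          nn.Linear(128, 784), nn.Sigmoid())
discriminator = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2),
                              nn.Linear(128, 1))  # outputs a real-vs-fake logit

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()  # binary classification loss (cross-entropy)

def train_step(real_batch):
    """One iteration; real_batch is an (N, 784) tensor of real images."""
    n = real_batch.size(0)
    real_labels = torch.ones(n, 1)
    fake_labels = torch.zeros(n, 1)

    # Phase 1: train the discriminator on real images and generated fakes.
    noise = torch.randn(n, latent_dim)
    fakes = generator(noise).detach()  # detach: don't update the generator here
    d_loss = bce(discriminator(real_batch), real_labels) + \
             bce(discriminator(fakes), fake_labels)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Phase 2: train the generator so the discriminator labels its output "real".
    noise = torch.randn(n, latent_dim)
    g_loss = bce(discriminator(generator(noise)), real_labels)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Note how the generator's output is detached in the first phase so that only the discriminator is updated there, while in the second phase only the generator's optimizer takes a step.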

Summarised: A GAN uses an adversarial feedback loop to learn how to generate information that "seems real" (i.e. looks the same, sounds the same, or is otherwise indistinguishable from some real data).

The difference

Hopefully that shows that there is a fundamental structural difference between the two networks and the goals each of them is trying to achieve during learning. I don't really know whether GANs are necessarily "more successful", but for certain generative problems they have been observed to give more "realistic" outputs. Why? My gut says that a GAN learns more about "how to make an image look real in general" than about "how to memorise this particular set of images with the greatest accuracy/efficiency". But there are certainly similarities, in particular between the generator (of the GAN) and the decoder (of the autoencoder).

Correct answer by timchap on May 9, 2021
