Data Science Asked by Elena on September 5, 2021
Say I have a dataset $D = \{a_1, a_2, a_3, \dots, a_n\}$ on which I train a basic variational autoencoder (VAE) consisting of a couple of fully connected layers separated by nonlinearities. Does the latent space of the VAE have a feature coordinate for $a_1 + a_2$ and $a_1 - a_2$, which are not part of dataset $D$?
It depends. Let $X$ be the domain of the data, i.e. $a_i \in X$. $D$ is then a sample from $X$ drawn from some distribution $P$. The point of the VAE is to model this data distribution so that we can sample from it. As a result, parts of the data space $X$ will be modelled well (where there are many training examples) and others poorly (where there are few or none).
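For reference, in the usual notation (encoder $q_\phi(z \mid x)$, decoder $p_\theta(x \mid z)$ and prior $p(z)$, symbols not used elsewhere in this answer), a VAE is trained by maximizing the evidence lower bound (ELBO) averaged over $D$, which lower-bounds the average log-likelihood of the training points:

$$
\frac{1}{n}\sum_{i=1}^{n}\Big(\mathbb{E}_{q_\phi(z \mid a_i)}\big[\log p_\theta(a_i \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid a_i)\,\|\,p(z)\big)\Big) \;\le\; \frac{1}{n}\sum_{i=1}^{n}\log p_\theta(a_i)
$$

Only the training points $a_i$ enter this objective, so nothing in it directly rewards good encodings or reconstructions in regions of $X$ where $P$ places little mass.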
So the question is whether $a_i + a_j$ or $a_k - a_\ell$ actually form "reasonable" values, i.e. values that do not have too low a density under $P$. Since it seems $X = \mathbb{R}^d$ in your case, the VAE will have no problem encoding or decoding these values; the issue is whether the latent encodings will be useful or sensible. The combinations need not be in the dataset, but they can't be too far away from it. (This problem area is called domain adaptation in general.) In other words, because $X$ is an unbounded vector space, every such combination does get a "feature coordinate", but whether that coordinate is useful or sensible depends on the situation.
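One crude way to put a number on "not too low density" is sketched below. It is only an illustration under strong assumptions: it approximates $P$ with a diagonal Gaussian fitted to $D$ (a trained VAE would give you its own likelihood bound instead), the toy dataset is made up, and the helper names `fit_diagonal_gaussian`, `log_density` and `is_reasonable` are hypothetical. A candidate such as $a_i + a_j$ counts as "reasonable" if its log-density is at least that of the least likely training point.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def fit_diagonal_gaussian(data: List[Vector]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and variance: a crude stand-in for the data distribution P."""
    n, d = len(data), len(data[0])
    means = [sum(x[k] for x in data) / n for k in range(d)]
    # Small epsilon keeps every variance positive even if a dimension is constant.
    variances = [sum((x[k] - means[k]) ** 2 for x in data) / n + 1e-9 for k in range(d)]
    return means, variances


def log_density(x: Vector, means: List[float], variances: List[float]) -> float:
    """Log-density of x under independent Gaussians, one per dimension."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        for xi, m, v in zip(x, means, variances)
    )


def is_reasonable(candidate: Vector, data: List[Vector], quantile: float = 0.0) -> bool:
    """True if the candidate is at least as likely as the given quantile of training points.

    quantile=0.0 compares against the least likely training point.
    """
    means, variances = fit_diagonal_gaussian(data)
    scores = sorted(log_density(x, means, variances) for x in data)
    threshold = scores[int(quantile * (len(scores) - 1))]
    return log_density(candidate, means, variances) >= threshold


# Hypothetical toy dataset D, centred at the origin.
D = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0), (0.5, 0.5), (-0.5, -0.5)]

a, b = D[4], D[5]
print(is_reasonable(tuple(p + q for p, q in zip(a, b)), D))  # a + b = (0, 0): prints True

a, b = D[0], D[1]
print(is_reasonable(tuple(p - q for p, q in zip(a, b)), D))  # a - b = (2, 0): prints False
```

Here the sum lands in the middle of the data and passes, while the difference lands outside the data and fails.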
For example, suppose $X$ is the space of natural images and all your $a_i$ are images of sunflowers. You can compute $c = a_\alpha + a_\beta$ pixel-wise, but $c$ is unlikely to have a reasonable latent representation. However, if each $a_i$ is something like an embedding from a word model, then sums and differences may be just fine.
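A toy illustration of the sunflower case, assuming images are stored as grids of intensities in $[0, 1]$ (the 2x2 "images" and the helper names `pixelwise_sum` and `in_valid_range` are hypothetical): the pixel-wise sum of two bright images already leaves the valid intensity range, which is the most visible sign that $c$ is not a natural image. Range violations are only one symptom; even a rescaled sum would look like a double exposure rather than a single sunflower.

```python
from typing import List

Image = List[List[float]]  # rows of pixel intensities, assumed to lie in [0, 1]


def pixelwise_sum(img_a: Image, img_b: Image) -> Image:
    """Element-wise sum of two equally sized images."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(img_a, img_b)]


def in_valid_range(img: Image, lo: float = 0.0, hi: float = 1.0) -> bool:
    """True if every pixel lies in [lo, hi]."""
    return all(lo <= p <= hi for row in img for p in row)


# Two hypothetical 2x2 grayscale "sunflower" crops with bright petals in the top row.
sunflower_alpha = [[0.9, 0.8], [0.2, 0.1]]
sunflower_beta = [[0.7, 0.9], [0.1, 0.3]]

c = pixelwise_sum(sunflower_alpha, sunflower_beta)
print(in_valid_range(sunflower_alpha))  # prints True
print(in_valid_range(c))  # prints False: the petal pixels now exceed 1.0
```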
Note that it's often a good idea to evaluate the reconstruction performance of a VAE on a held-out test set (data outside the training set, but still drawn from the same or a similar $P$). So I'd ask myself this: could $a_i + a_j$ reasonably be considered part of such a test set? If so, then yes, its latent embedding will probably be OK too.
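The test-set question can be made concrete roughly as follows. This is a sketch under assumptions: `reconstruct` stands for whatever encode-then-decode function your trained VAE provides (for instance, decoding the mean latent code), and here a hypothetical linear projection onto the line $y = x$ plays that role so the example runs without a trained model. The 95th-percentile cutoff, the toy points and the helper names are likewise illustrative choices, not a standard recipe. A candidate passes if its reconstruction error is no worse than the chosen percentile of errors on the held-out test set.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def squared_error(x: Vector, x_hat: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def looks_like_test_point(
    candidate: Vector,
    test_set: List[Vector],
    reconstruct: Callable[[Vector], Vector],
    quantile: float = 0.95,
) -> bool:
    """True if the candidate reconstructs about as well as typical held-out data."""
    errors = sorted(squared_error(x, reconstruct(x)) for x in test_set)
    # Index of the requested quantile among the sorted held-out errors.
    idx = min(len(errors) - 1, max(0, math.ceil(quantile * len(errors)) - 1))
    return squared_error(candidate, reconstruct(candidate)) <= errors[idx]


def project_onto_diagonal(x: Vector) -> List[float]:
    """Hypothetical stand-in for encode + decode: keep only the component along y = x."""
    m = (x[0] + x[1]) / 2
    return [m, m]


# Hypothetical held-out points lying close to the line y = x.
test_set = [(0.9, 1.1), (2.05, 1.95), (-1.0, -0.9), (3.1, 2.9)]

# Two hypothetical training points a_1 and a_2, also near y = x.
a1, a2 = (0.95, 1.05), (2.1, 1.9)
a_sum = tuple(p + q for p, q in zip(a1, a2))   # approx. (3.05, 2.95)
a_diff = tuple(p - q for p, q in zip(a1, a2))  # approx. (-1.15, -0.85)

print(looks_like_test_point(a_sum, test_set, project_onto_diagonal))   # prints True
print(looks_like_test_point(a_diff, test_set, project_onto_diagonal))  # prints False
```

With these numbers the sum passes but the difference fails: the small off-line deviations of $a_1$ and $a_2$ partly cancel in the sum and add up in the difference, so whether a combination "looks like test data" has to be checked case by case.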
Answered by user3658307 on September 5, 2021