Data Science Asked by David Černý on May 17, 2021
My question was motivated by reading World Models by Ha and Schmidhuber. In short, they introduce an RL framework where the current state (an image) is encoded via a VAE into a latent vector $z$, which is then used as the state representation for the agent. Specifically, they first train the VAE on images from their simulator, then freeze the model and use only the encoder to abstract the image input while training the rest of the pipeline.
During training they of course sample the latent vector $z$ from the $\mu$ and $\sigma$ that the encoder predicts, but I am wondering: should this still be the case once the whole pipeline has finished training and is used for inference? Wouldn't the sampling process introduce stochasticity into the pipeline, which at that point seems (at least in my eyes) undesirable in use-cases like the ones in the paper (e.g. the self-driving car simulation)? I feel like just taking the predicted mean as our $z$, thus removing the stochasticity, would maximize the exploitative behaviour of the agent, or am I missing something?
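To make the two options concrete, here is a minimal sketch of the difference between sampled and deterministic encoding. The `encode` function is a hypothetical stand-in (in the paper the encoder is a trained convolutional net); only the `latent` function illustrates the actual point: reparameterized sampling $z = \mu + \sigma \epsilon$ during training versus returning $\mu$ directly at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image):
    # Hypothetical stand-in for the frozen VAE encoder; it just
    # fabricates a (mu, sigma) pair per image row for illustration.
    mu = np.tanh(image.mean(axis=-1))
    sigma = np.exp(-np.abs(image.std(axis=-1)))
    return mu, sigma

def latent(image, deterministic=False):
    # Training-time behaviour: sample z via the reparameterization
    # trick, z = mu + sigma * eps with eps ~ N(0, I).
    # Inference-time alternative discussed above: return mu directly,
    # removing the stochasticity from the state representation.
    mu, sigma = encode(image)
    if deterministic:
        return mu
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps

image = rng.random((8, 16))  # toy "image": 8 rows of 16 pixels

# Deterministic encoding is repeatable; sampled z differs per call.
z_det_a = latent(image, deterministic=True)
z_det_b = latent(image, deterministic=True)
assert np.allclose(z_det_a, z_det_b)
```

With `deterministic=True`, the same image always maps to the same $z$, which is exactly the mean-as-latent behaviour the question proposes for inference.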
Thanks in advance for your thoughts.