Artificial Intelligence Asked by M.S. on August 24, 2021
I have a question about implementing policy gradient methods for problems with continuous action spaces.
Assume that actions are sampled from a diagonal Gaussian distribution with mean vector $\mu$ and standard deviation vector $\sigma$. As far as I understand, we can define a neural network that takes the current state as input and returns $\mu$ as its output. According to OpenAI Spinning Up, the standard deviation $\sigma$ can be represented in two different ways:
I don’t completely understand the first method. Does it mean that we must set the log standard deviations to fixed numbers? If so, how do we choose these numbers?
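To make the question concrete, here is a minimal sketch of one reading of the first method: the log standard deviations are state-independent but still trainable, so they are updated by the policy gradient rather than set by hand. This is an assumption about what the method means, not a confirmed description of any library's implementation. The function names, the starting value of -0.5, the learning rate, and the advantage value are all hypothetical and chosen only for illustration. For a single dimension, $\partial \log \pi / \partial \log \sigma = ((a - \mu)/\sigma)^2 - 1$, which is what the gradient function below computes.

```python
import math
import random


def gaussian_log_prob(action, mean, log_std):
    """Log-density of a diagonal Gaussian, summed over action dimensions."""
    total = 0.0
    for a, m, ls in zip(action, mean, log_std):
        z = (a - m) / math.exp(ls)
        total += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
    return total


def grad_log_prob_wrt_log_std(action, mean, log_std):
    """Analytic gradient of the log-density w.r.t. each log_std: z^2 - 1."""
    return [((a - m) / math.exp(ls)) ** 2 - 1.0
            for a, m, ls in zip(action, mean, log_std)]


def sample_action(mean, log_std, rng):
    """Draw a = mu + sigma * noise, with noise ~ N(0, 1) per dimension."""
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0)
            for m, ls in zip(mean, log_std)]


# Hypothetical setup: a 2-dimensional action space.
log_std = [-0.5, -0.5]   # assumed initial value; a free parameter, not an output of the network
mean = [0.1, -0.3]       # stand-in for the network output mu(s)
rng = random.Random(0)

action = sample_action(mean, log_std, rng)
advantage = 1.0          # stand-in for an estimated advantage
learning_rate = 0.01     # assumed step size

# One gradient-ascent step on advantage * log pi(a | s) w.r.t. log_std.
grad = grad_log_prob_wrt_log_std(action, mean, log_std)
new_log_std = [ls + learning_rate * advantage * g
               for ls, g in zip(log_std, grad)]
```

Under this reading, the numbers are only chosen as an initial guess, and training then adjusts them: a sampled action far from the mean with a positive advantage pushes the corresponding log standard deviation up, while one close to the mean pushes it down.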