Data Science Asked on June 22, 2021
I just saw this video on Youtube.
On what principle did Google’s DeepMind learn to walk?
Was it Q-Learning or a Genetic Algorithm or Policy Gradient?
The full method is explained in the paper Emergence of Locomotion Behaviours in Rich Environments by the DeepMind team.
Quoting from that paper:
Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required by the environment . . .
So to answer your question, the researchers used a policy gradient method.
Correct answer by Neil Slater on June 22, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP