On what principle did Google's DeepMind learn to walk?

Question

I just saw this video on Youtube.
On what principle did Google's DeepMind learn to walk?
Was it Q-Learning or a Genetic Algorithm or Policy Gradient?

Neil Slater · Accepted Answer

The full method is explained in the paper Emergence of Locomotion Behaviours in Rich Environments by the DeepMind team.
Quoting from that paper:

Using a novel scalable variant of policy gradient reinforcement learning, our agents learn to run, jump, crouch and turn as required
by the environment . . .

So to answer your question, the researchers used a policy gradient method.

On what principle did Google's DeepMind learn to walk?

One Answer

Add your own answers!

Ask a Question