What kind of policy evaluation and policy improvement do AlphaGo, AlphaGo Zero, and AlphaZero use?

Artificial Intelligence Asked by Daniel Wiczew on August 24, 2021

I’m trying to find out what kind of policy improvement and policy evaluation AlphaGo, AlphaGo Zero, and AlphaZero use. From their respective papers and supplementary information (SI), I conclude that they use a kind of policy gradient actor-critic approach, where the policy is evaluated by a critic and improved by an actor. Yet I still can’t fit it to any of the known policy gradient algorithms.

alphazero policy gradients reinforcement learning
