Artificial Intelligence Asked by Daniel Wiczew on August 24, 2021
I’m trying to find out what kind of policy improvement and policy evaluation AlphaGo, AlphaGo Zero, and AlphaZero use. From their respective papers and supplementary information, I conclude that it is some kind of policy gradient actor-critic approach, where the policy is evaluated by a critic and improved by an actor. Yet I still can’t fit it to any of the known policy gradient algorithms.
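To make concrete what I mean by "evaluated by a critic and improved by an actor", here is a minimal sketch of a generic one-step actor-critic update with a tabular softmax policy. This is only the textbook scheme I am comparing against, not a claim about what the AlphaGo family actually does. The function names (`softmax`, `actor_critic_step`), the tabular representation, and the step sizes are all assumptions made for illustration.

```python
import math


def softmax(prefs):
    """Turn a list of action preferences into probabilities."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, values, s, a, r, s_next, done,
                      alpha_actor=0.1, alpha_critic=0.1, gamma=0.99):
    """Apply one one-step actor-critic update in place.

    theta[s][b] is the actor's preference for action b in state s,
    values[s] is the critic's estimate of the state value.
    Returns the TD error used by both updates.
    """
    # Policy evaluation (critic): one-step TD error and value update.
    target = r if done else r + gamma * values[s_next]
    delta = target - values[s]
    values[s] += alpha_critic * delta

    # Policy improvement (actor): policy gradient step, where the gradient
    # of log softmax w.r.t. the preferences is onehot(a) - pi.
    pi = softmax(theta[s])
    for b in range(len(pi)):
        grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
        theta[s][b] += alpha_actor * delta * grad_log_pi

    return delta
```

In this scheme the critic's TD error plays the role of policy evaluation, and the gradient step on the preferences is the policy improvement. My difficulty is mapping the updates described in the papers onto this pattern.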