Cross Validated Asked on December 8, 2021
I am looking for a modern proof of why Q-learning converges in the tabular setting.
I’ve skimmed the original proof by Watkins and Dayan, and I have to say that the terminology and approach are a bit verbose and quickly lost me. I’ve also found some random lecture notes online, but I don’t really trust them. Plus, a lot of these use a martingale and filtration approach, which is beyond my knowledge.
Is there a modern treatment of the convergence proof for this very important algorithm?