Why, and under what conditions, does Q-learning converge?

Cross Validated Asked on December 8, 2021

I am looking for a modern proof of why Q-learning converges in the tabular setting.

I’ve skimmed the original proof by Dayan and Watkins, and I have to say that the terminology and approach are a bit verbose and quickly lost me. I’ve also found some lecture notes online, but I don’t really trust them. Plus, a lot of them use a martingale and filtration approach, which is beyond my knowledge.

Is there a modern treatment of the convergence proof for this very important algorithm?

Tags: machine-learning, neural-networks, q-learning, references, reinforcement-learning
