TransWikia.com

Can we use imitation learning for on-policy algorithms?

Artificial Intelligence Asked by Khush Agrawal on February 27, 2021

As I understand it, imitation learning uses the experiences of an (expert) agent to train another agent. Suppose I want to use an on-policy algorithm such as Proximal Policy Optimization (PPO). Because of its on-policy nature, experiences generated by another policy cannot be used directly. Importance sampling can overcome this limitation, but it is known to be highly unstable. How can imitation learning be used with such on-policy algorithms while avoiding these stability issues?
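To make the instability concern concrete, here is a minimal sketch of how the variance of a trajectory-level importance weight grows with the horizon. It is only an illustration under strong assumptions: both policies are state-independent distributions over two actions, so the per-step ratios are independent, and the probabilities in `expert` and `learner` are made-up numbers. All function names are hypothetical and not taken from any library.

```python
import random


def per_step_second_moment(target, behavior):
    """E_b[(pi(a) / b(a))^2] for one step, with actions drawn from `behavior`."""
    return sum(p * p / b for p, b in zip(target, behavior))


def trajectory_weight_variance(target, behavior, horizon):
    """Variance of the product of `horizon` independent per-step ratios.

    Each ratio has mean 1 under `behavior`, so the product also has mean 1
    and its variance is (E[ratio^2]) ** horizon - 1.
    """
    return per_step_second_moment(target, behavior) ** horizon - 1.0


def sample_trajectory_weight(target, behavior, horizon, rng):
    """Draw one trajectory from `behavior` and return its importance weight."""
    weight = 1.0
    for _ in range(horizon):
        action = rng.choices(range(len(behavior)), weights=behavior)[0]
        weight *= target[action] / behavior[action]
    return weight


# Assumed, illustrative policies: the expert strongly prefers action 0,
# while the learner is still uniform.
expert = [0.9, 0.1]
learner = [0.5, 0.5]

for horizon in (1, 5, 20):
    variance = trajectory_weight_variance(learner, expert, horizon)
    print(f"horizon={horizon:2d}  variance of importance weight={variance:.3e}")
```

Under these assumptions the variance grows exponentially with the horizon whenever the two policies differ, which is one way to see why reweighting whole expert trajectories directly tends to be unstable.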
