
Why MADDPG rather than taking all cooperating agents as a single meta-agent?

Data Science Asked by Forrest Wei on December 17, 2020

Since MADDPG uses a centralized critic for training, why not simply treat all cooperating agents as a single meta-agent with a concatenated observation space and a concatenated action space? In my opinion, MADDPG is centralized enough, so it won’t hurt to go one step further.

One Answer

MADDPG can be used to model agents that have limited observation and communication capabilities after training, which is an interesting and useful real-world scenario.

"why not simply treat all cooperating agents as a single meta-agent with a concatenated observation space and a concatenated action space?"

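A single meta-agent needs the observations of every agent at every step, during deployment as well as training. Any real-world implementation will then require resources to collect and manage that global overview. This may not be practical or desirable in all cases.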

There is no single fix for this; it is an open area of research. Whether to invest in better communication and central processing, or in better autonomy for multiple agents, is likely to have different answers depending on the problem and on the current technology limits of either approach.
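To make the cost of the meta-agent approach concrete, here is a minimal sketch of one control step for a single meta-agent. The names `meta_agent_step` and `toy_policy`, and the agent labels, are hypothetical and chosen only for illustration. The point is that the joint policy cannot produce any action unless every agent's observation has reached the central controller.

```python
def toy_policy(joint_obs):
    """Placeholder joint policy (an assumption): one action value per input value."""
    return [-v for v in joint_obs]


def meta_agent_step(policy, observations):
    """One control step for a single meta-agent.

    `observations` maps agent name -> local observation (list of floats),
    or None when that agent's link to the central controller is down.
    """
    missing = [name for name, obs in observations.items() if obs is None]
    if missing:
        # The concatenated input is incomplete, so no agent can be given an action.
        raise RuntimeError(f"cannot act, no observation from: {missing}")
    # Concatenate all local observations in a fixed order.
    joint_obs = [v for name in sorted(observations) for v in observations[name]]
    return policy(joint_obs)


# All links up: the meta-agent acts for everyone.
joint_action = meta_agent_step(toy_policy, {"a": [0.1, 0.2], "b": [0.3, 0.4]})

# One link down: the whole team is blocked, not only agent "b".
try:
    meta_agent_step(toy_policy, {"a": [0.1, 0.2], "b": None})
    blocked = False
except RuntimeError:
    blocked = True
```

With decentralized policies, agent "a" could still act on its own observation in the second case.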

MADDPG confines central processing to the training phase, where a centralized critic assesses the agents' joint behaviour against the reward signal. That means:

  • Each agent's policy works with local signals only, and has simpler observation and action spaces as a result. Only the critic's value estimation, which is needed just for training, is handled externally.

  • Trained agents can theoretically be used in environments where a central processor is not available.
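The split between the two points above can be sketched as follows. This is not a full MADDPG implementation: there is no learning step, the "networks" are random linear maps, and the sizes `N_AGENTS`, `OBS_DIM` and `ACT_DIM` are assumptions made up for the example. It only shows which inputs each component needs: actors take one agent's local observation, while each critic takes the observations and actions of all agents.

```python
import random

random.seed(0)

N_AGENTS = 3   # hypothetical sizes, chosen only for illustration
OBS_DIM = 4
ACT_DIM = 2


def make_linear(in_dim, out_dim):
    """Return a random weight matrix (out_dim rows of in_dim weights)."""
    return [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply_linear(weights, x):
    """Multiply a weight matrix by an input vector."""
    assert len(x) == len(weights[0]), "input size does not match layer"
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Decentralized actors: each maps its OWN observation to its OWN action.
actors = [make_linear(OBS_DIM, ACT_DIM) for _ in range(N_AGENTS)]

# Centralized critics, one per agent: each sees every observation and action.
JOINT_DIM = N_AGENTS * (OBS_DIM + ACT_DIM)
critics = [make_linear(JOINT_DIM, 1) for _ in range(N_AGENTS)]


def act(agent_id, local_obs):
    """Execution-time step: needs only the agent's local observation."""
    return apply_linear(actors[agent_id], local_obs)


def critic_value(agent_id, all_obs, all_actions):
    """Training-time step: needs the joint observations and actions."""
    joint = [v for obs in all_obs for v in obs] + [v for a in all_actions for v in a]
    return apply_linear(critics[agent_id], joint)[0]


all_obs = [[random.random() for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]
all_actions = [act(i, all_obs[i]) for i in range(N_AGENTS)]
q_values = [critic_value(i, all_obs, all_actions) for i in range(N_AGENTS)]

print("actor input size:", OBS_DIM)      # 4
print("critic input size:", JOINT_DIM)   # 18
```

After training, only `actors` and `act` would need to be deployed; `critics` and `critic_value` can be discarded along with the central infrastructure that fed them.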

So, for example, agents can be trained in simulation with all the oversight that allows, or in a carefully instrumented environment with high-bandwidth connections between agents and central processing. They can then be deployed into matching environments where the central oversight is not available or is too costly.

Correct answer by Neil Slater on December 17, 2020
