TransWikia.com

Why does Deep Reinforcement Learning fail to learn how to play Asteroids?

Data Science Asked by hipoglucido on November 16, 2020

Deep Q-learning, A3C, and policies evolved with genetic algorithms all fail to learn Asteroids, or at least perform far worse than humans. Among the Atari games that are hardest for RL, most of the attention goes to Montezuma’s Revenge, which clearly suffers from sparse rewards. However, I don’t think that is the case for Asteroids, since a reward is provided for every asteroid shot. So why does DRL perform so badly on it?

Here are some papers that report poor results on Asteroids (some of them cite each other):

One Answer

I suspect a bug, or some subtle implementation detail.

In many ways, Asteroids is as near an ideal environment as one can get without custom design:

  • Unambiguous, simple reward system (get points, don't die)
  • Limited options (move, or shoot a single weapon type)
  • All enemies are very similar
  • The player has perfect information

Compare this to DOTA 2, which has not been mastered; yet with a moderate reduction in complexity (1v1 instead of 5v5), OpenAI achieved some impressive results, even though the game is orders of magnitude more complex than Asteroids.

The 2015 DQN paper makes certain compromises, for example:

"Following previous approaches to playing Atari 2600 games, we also use a simple frame-skipping technique (15). More precisely, the agent sees and selects actions on every kth frame instead of every frame, and its last action is repeated on skipped frames. Because running the emulator forward for one step requires much less computation than having the agent select an action, this technique allows the agent to play roughly k times more games without significantly increasing the runtime.

We use k = 4 for all games."
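The frame-skipping technique described in the quote can be sketched as a thin wrapper around an environment. This is a minimal, hypothetical illustration, not the DQN authors' code or any particular library's API: `CountingEnv`, `FrameSkip`, and the `step(action) -> (observation, reward, done)` interface are all assumptions made for the example.

```python
from typing import Any, List, Tuple


class CountingEnv:
    """Hypothetical toy environment: every frame yields a reward of 1.0,
    and the episode ends after `episode_length` frames."""

    def __init__(self, episode_length: int = 10) -> None:
        self.episode_length = episode_length
        self.frame = 0
        self.actions_seen: List[int] = []  # action applied on each frame

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.frame += 1
        self.actions_seen.append(action)
        done = self.frame >= self.episode_length
        return self.frame, 1.0, done  # observation is just the frame index


class FrameSkip:
    """Hypothetical wrapper: the agent chooses an action only on every k-th
    frame; that action is repeated on the skipped frames and their rewards
    are summed."""

    def __init__(self, env: Any, k: int = 4) -> None:
        self.env = env
        self.k = k

    def step(self, action: int) -> Tuple[Any, float, bool]:
        total_reward = 0.0
        obs, done = None, False
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:  # stop repeating once the episode has ended
                break
        return obs, total_reward, done


if __name__ == "__main__":
    env = FrameSkip(CountingEnv(episode_length=10), k=4)
    done, decisions = False, 0
    while not done:
        obs, reward, done = env.step(decisions % 2)
        decisions += 1
        print(f"decision {decisions}: frame={obs}, reward={reward}, done={done}")
    # decision 1: frame=4, reward=4.0, done=False
    # decision 2: frame=8, reward=4.0, done=False
    # decision 3: frame=10, reward=2.0, done=True
```

Here ten frames of play cost only three agent decisions, which is the computational saving the quote describes; the price is that the agent cannot change its action on the skipped frames.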

As a counterexample, top-level human players often make single-frame decisions in modern fighting games (which run at 60 FPS both visually and in game logic), so we know this approach will not work well for every video game.
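To make the counterexample concrete: with frame skip k (the DQN paper uses k = 4), an agent can change its action only on every k-th frame. The sketch below assumes a 60 FPS game and that decisions fall on frames 0, k, 2k, ...; both are illustrative assumptions, and the function names are hypothetical. It computes the decision rate, the spacing between decisions, and how many frames the agent must wait to respond to an event appearing on an arbitrary frame.

```python
FPS = 60  # assumed frame rate, as in the fighting-game example above


def decisions_per_second(k: int, fps: int = FPS) -> float:
    """How often an agent acting on every k-th frame can choose an action."""
    return fps / k


def decision_interval_ms(k: int, fps: int = FPS) -> float:
    """Time between consecutive agent decisions when acting on every k-th frame."""
    return 1000.0 * k / fps


def frames_until_next_decision(event_frame: int, k: int) -> int:
    """Frames an agent that acts on frames 0, k, 2k, ... must wait before it
    can respond to something that first appears on `event_frame`."""
    return (-event_frame) % k


if __name__ == "__main__":
    for k in (1, 4):
        print(f"k={k}: {decisions_per_second(k):.0f} decisions/s, "
              f"{decision_interval_ms(k):.1f} ms apart")
    # k=1: 60 decisions/s, 16.7 ms apart
    # k=4: 15 decisions/s, 66.7 ms apart
    print([frames_until_next_decision(f, 4) for f in range(8)])
    # [0, 3, 2, 1, 0, 3, 2, 1]
```

With k = 4 the agent can be up to three frames (about 50 ms at 60 FPS) late in responding, and each decision commits it to the same action for four frames, so frame-perfect inputs are out of reach by construction.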

I suspect that, although these compromises were effective as general approaches, one of them failed badly on Asteroids.

This is a good question: something is clearly going wrong, and diagnosing why Asteroids causes difficulty could give strong insight into testing procedures or algorithm design. I think getting an accurate answer would require actually fixing the problem.


Correct answer by Wicked on November 16, 2020
