Why does Deep Reinforcement Learning fail to learn how to play Asteroids?

Question

Deep Q-learning, A3C, and policies evolved with genetic algorithms all fail to learn Asteroids, or at least perform far worse than humans. Among the Atari games that are hardest for RL, most of the attention goes to Montezuma's Revenge, which clearly suffers from sparse rewards. However, I don't think sparse rewards are the problem in Asteroids, since a reward is given for every asteroid shot. So why does DRL perform so badly?
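
One way to make the "rewards are not sparse" argument concrete is to measure what fraction of steps in an episode carry a non-zero reward. The sketch below is a minimal illustration, assuming a per-step reward list has already been recorded; the function name reward_density and both toy episodes are hypothetical and are not real game data.

```python
from typing import Sequence


def reward_density(rewards: Sequence[float]) -> float:
    """Return the fraction of steps that carry a non-zero reward.

    Values near 0 indicate a sparse-reward episode (the Montezuma's Revenge
    situation); larger values indicate denser feedback.
    """
    if not rewards:
        return 0.0
    nonzero = sum(1 for r in rewards if r != 0)
    return nonzero / len(rewards)


# Hypothetical toy episodes, NOT recorded from any real game:
dense_episode = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1]  # rewards arrive often
sparse_episode = [0] * 99 + [1]                 # a single reward at the end

print(reward_density(dense_episode))   # 0.4
print(reward_density(sparse_episode))  # 0.01
```

A high density on its own does not guarantee that an environment is easy to learn; it only rules out reward sparsity as the explanation.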

Here are some papers that report poor results on Asteroids (some of them cite each other):

Human-level control through deep reinforcement learning
Massively Parallel Methods for Deep Reinforcement Learning
Deep Reinforcement Learning with Double Q-learning
Dueling Network Architectures for Deep Reinforcement Learning
Prioritized Experience Replay
Rainbow: Combining Improvements in Deep Reinforcement Learning
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
A Neuroevolution Approach to General Atari Game Playing

Wicked · Accepted Answer

I suspect a bug, or some subtle implementation detail.
In many ways, Asteroids is as near an ideal environment as one can get without custom design:

Unambiguous, simple reward system (get points, don't die)
Limited options (move or shoot a single weapon type)
All enemies are very similar
The player has perfect information

Compare this to DOTA 2, which has not been mastered. Yet with a moderate reduction in complexity (1v1 instead of 5v5), OpenAI was able to achieve some impressive results, even though the game is orders of magnitude more complex than Asteroids.
There are certain compromises made in the 2015 DQN paper, for example:

"Following previous approaches to playing Atari 2600 games, we also use a simple frame-skipping technique (15). More precisely, the agent sees and selects actions on every kth frame instead of every frame, and its last action is repeated on skipped frames. Because running the emulator forward for one step requires much less computation than having the agent select an action, this technique allows the agent to play roughly k times more games without significantly increasing the runtime. We use k = 4 for all games."
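
The frame-skipping technique described in the quote can be sketched as a thin wrapper around an emulator. This is a minimal sketch, not the DQN authors' code: CountingEnv is a hypothetical toy emulator whose step() advances exactly one frame, and summing the rewards of the skipped frames is an assumption (common in practice, but not stated in the quoted passage).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class CountingEnv:
    """Hypothetical toy emulator: each step() call advances exactly one frame.

    It pays a reward of 1.0 on every frame whose index is a multiple of 3,
    ends the episode at frame 100, and records the action applied per frame.
    """
    frame: int = 0
    applied_actions: List[int] = field(default_factory=list)

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.applied_actions.append(action)
        self.frame += 1
        reward = 1.0 if self.frame % 3 == 0 else 0.0
        done = self.frame >= 100
        return self.frame, reward, done


class FrameSkip:
    """Let the agent choose an action only on every k-th frame.

    The chosen action is repeated on the skipped frames, and the rewards
    from those frames are summed (an assumption, see the lead-in).
    """

    def __init__(self, env: CountingEnv, k: int = 4):
        self.env = env
        self.k = k

    def step(self, action: int) -> Tuple[Optional[int], float, bool]:
        total_reward = 0.0
        obs, done = None, False
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:  # stop repeating the action once the episode ends
                break
        return obs, total_reward, done


env = CountingEnv()
agent_env = FrameSkip(env, k=4)
obs, r, done = agent_env.step(2)  # one agent decision = 4 emulator frames
print(obs, r, done)               # 4 1.0 False
print(env.applied_actions)        # [2, 2, 2, 2]
```

The saving the paper describes comes from the fact that the expensive part (the network choosing an action) runs once per k frames, while the cheap emulator step runs on every frame.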

As a counter-example, top-level human players of modern fighting games (which run at 60 FPS, both in rendering and in game logic) often make decisions and take actions at the level of single frames, so we know this approach will not work well for every video game.
I suspect that even though these compromises were effective as general approaches, one of them failed badly with Asteroids.
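
To put numbers on the frame-skip compromise, the arithmetic below uses the 60 FPS figure from the fighting-game example and k = 4 from the quoted passage. It is a sketch under a simplifying assumption: decisions happen on frames 0, k, 2k, ..., and an event first appearing between two decision frames is not seen until the next one. The function names are hypothetical.

```python
import math


def decisions_per_second(fps: int, k: int) -> float:
    """How many times per second a frame-skipping agent may choose an action."""
    return fps / k


def first_decision_frame(event_frame: int, k: int) -> int:
    """First decision frame (0-based) at or after the frame where an event appears.

    Assumes decisions are made only on frames 0, k, 2k, ...
    """
    return math.ceil(event_frame / k) * k


fps, k = 60, 4
print(decisions_per_second(fps, k))         # 15.0
print(1000 / decisions_per_second(fps, k))  # roughly 66.7 ms between decisions
for f in range(k):
    # Extra frames of delay before the agent can react to an event at frame f
    print(f, first_decision_frame(f, k) - f)
```

Under these assumptions the agent acts 15 times per second and can react up to k-1 = 3 frames late, which is exactly the kind of granularity that single-frame play in fighting games relies on.
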
This is a good question: diagnosing why Asteroids causes so much difficulty could give strong insight into testing procedures or algorithm design, since clearly something is going wrong. I think getting an accurate answer would require actually fixing the problem.