Data Science Asked by hipoglucido on November 16, 2020
Deep Q-learning, A3C, and policies evolved with genetic algorithms all fail to learn Asteroids, or at least perform far worse than humans. Among the Atari games that are hardest for RL, most of the attention goes to Montezuma's Revenge, which clearly suffers from sparse rewards. However, I don't think that is the case for Asteroids (video), since a reward is given for every asteroid shot. Why, then, does DRL perform so badly?
Here are some papers that report poor results on Asteroids (some of them cite each other):
I suspect a bug, or some subtle implementation detail.
In many ways, Asteroids is as near an ideal environment as one can get without custom design:
Compare this to DOTA 2, which has not been mastered. Yet with a moderate reduction in complexity (1v1 instead of 5v5), OpenAI was able to achieve some impressive results, even though the game is orders of magnitude more complex than Asteroids.
There are certain compromises made in the 2015 DQN paper, for example:
"Following previous approaches to playing Atari 2600 games, we also use a simple frame-skipping technique (15). More precisely, the agent sees and selects actions on every kth frame instead of every frame, and its last action is repeated on skipped frames. Because running the emulator forward for one step requires much less computation than having the agent select an action, this technique allows the agent to play roughly k times more games without significantly increasing the runtime.
We use k = 4 for all games."
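To make the quoted technique concrete, here is a minimal sketch of a frame-skip wrapper. It assumes a simplified, hypothetical environment interface (`step(action) -> (obs, reward, done)`), not the real ALE or Gym API. Summing the rewards collected on skipped frames is a common choice in open-source wrappers; the quoted paper text does not specify it, so treat that detail as an assumption. `FrameSkip` and `CountingEnv` are illustrative names only.

```python
from typing import Any, List, Protocol, Tuple


class Env(Protocol):
    """Hypothetical minimal environment interface (not the ALE/Gym API)."""

    def step(self, action: int) -> Tuple[Any, float, bool]:
        ...


class FrameSkip:
    """Repeat each chosen action for k emulator frames.

    The agent only sees the observation from the last repeated frame.
    Assumption: rewards from the repeated frames are summed.
    """

    def __init__(self, env: Env, k: int = 4) -> None:
        if k < 1:
            raise ValueError("k must be >= 1")
        self.env = env
        self.k = k

    def step(self, action: int) -> Tuple[Any, float, bool]:
        total_reward = 0.0
        obs, done = None, False
        for _ in range(self.k):
            obs, reward, done = self.env.step(action)
            total_reward += reward
            if done:  # stop repeating once the episode ends
                break
        return obs, total_reward, done


class CountingEnv:
    """Hypothetical toy env: reward 1.0 on every 3rd frame, episode ends at frame 10."""

    def __init__(self) -> None:
        self.frame = 0
        self.actions_seen: List[int] = []

    def step(self, action: int) -> Tuple[int, float, bool]:
        self.frame += 1
        self.actions_seen.append(action)
        reward = 1.0 if self.frame % 3 == 0 else 0.0
        return self.frame, reward, self.frame >= 10


if __name__ == "__main__":
    env = CountingEnv()
    wrapped = FrameSkip(env, k=4)
    print(wrapped.step(1))  # frames 1-4, reward on frame 3 -> (4, 1.0, False)
    print(wrapped.step(2))  # frames 5-8, reward on frame 6 -> (8, 1.0, False)
    print(wrapped.step(0))  # frames 9-10, episode ends -> (10, 1.0, True)
```

Note that the agent only picks 3 actions for 10 emulator frames here, which is exactly the compute saving the paper describes, and also the reason the agent cannot react between decision points.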
As a counter-example, top-level human players in modern fighting games (rendered, both visually and in game logic, at 60 FPS) often make decisions and act at single-frame granularity, so we know this approach will not work well for every video game.
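The timing cost of frame skipping can be sketched with a little arithmetic. Assuming a 60 FPS game, decision points at frames 0, k, 2k, and so on, and an event that lasts exactly one frame, the agent can only act on that exact frame if it coincides with a decision point. The function names below are hypothetical, chosen for illustration.

```python
FPS = 60  # assumed frame rate, matching the fighting-game example


def decision_frames(total_frames: int, k: int) -> range:
    """Frames on which an agent using frame skip k chooses a new action."""
    return range(0, total_frames, k)


def fraction_actionable(total_frames: int, k: int) -> float:
    """Share of one-frame windows that coincide with a decision frame."""
    points = set(decision_frames(total_frames, k))
    hits = sum(1 for frame in range(total_frames) if frame in points)
    return hits / total_frames


if __name__ == "__main__":
    for k in (1, 4):
        per_second = len(decision_frames(FPS, k))
        frac = fraction_actionable(FPS, k)
        print(f"k={k}: {per_second} decisions/s, {frac:.0%} of 1-frame windows actionable")
```

With k = 4 this gives 15 decisions per second, and only 25% of one-frame windows line up with a decision, versus 60 decisions per second and 100% with k = 1.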
I suspect that even though these are effective general-purpose techniques, one of them fails badly on Asteroids.
This is a good question: diagnosing why Asteroids causes so much difficulty could yield strong insight into testing procedures or algorithm design, since something is clearly going wrong. I think getting an accurate answer would require actually fixing the problem.
Correct answer by Wicked on November 16, 2020