Would a convolution network make sense for policy based tic-tac-toe approach?

Data Science Asked by Achilles on August 18, 2021

This is inspired by my previous question, comments on which made me realize that a CNN was unsuitable for that problem: the CNN required over 700k training samples while an MLP did it in less than 50k.

Now, I’m trying to solve the next problem and need to figure out if a CNN makes sense.

CNN details –

Input – The board as an array of 9 elements (0=empty, 1=’X’, 2=’O’)

Output – The recommended move as a one-hot encoded array of 9 elements, where the index of the 1 is the recommended move (for example, [0,0,1,0,0,0,0,0,0] recommends move 2)

So, basically, the CNN will be trained on a dataset consisting of boards and, for each board, the move that the winner of that game made. During evaluation, it will try to predict the best move for a given board.
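For concreteness, here is a minimal sketch of how one such training pair might be encoded (the reshape to a 3x3 grid with one channel is an assumption about how the board would be fed to a convolutional layer; the question itself only specifies the flat 9-element arrays):

```python
import numpy as np

# Board state: 0 = empty, 1 = 'X', 2 = 'O'
board = np.array([0, 1, 0,
                  2, 1, 0,
                  0, 0, 2])

# Assumed input layout for a CNN: the flat 9-element board reshaped
# into a 3x3 grid with a single channel (batch of 1).
cnn_input = board.reshape(1, 3, 3, 1).astype(np.float32)

# Target: one-hot encoding of the recommended move (here, move 2)
move = 2
target = np.zeros(9, dtype=np.float32)
target[move] = 1.0
print(target)  # [0. 0. 1. 0. 0. 0. 0. 0. 0.]
```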

Does a convolution neural network make sense for this problem?

Note: The convnet that I was going to use for this problem is the same as in my previous question.

2 Answers

This problem is not complex enough to justify a large convolutional network. However, if you are determined to use a CNN, you can; just keep the architecture very simple: one convolutional layer (probably SAME/padded), no pooling or dropout, only a few feature maps (e.g. no more than 4; maybe just 2 will do), and a softmax fully-connected layer for the output.
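As a rough illustration, such a network might look like this in Keras (the 3x3 kernel size and the 3x3x1 input layout are assumptions consistent with the question above, not a prescribed design):

```python
from tensorflow import keras
from tensorflow.keras import layers

# A minimal CNN in the spirit of the above: one SAME-padded
# convolutional layer with very few feature maps, no pooling or
# dropout, and a softmax fully-connected output over the 9 cells.
model = keras.Sequential([
    layers.Conv2D(4, kernel_size=3, padding="same", activation="relu",
                  input_shape=(3, 3, 1)),   # board as a 3x3 grid, 1 channel
    layers.Flatten(),
    layers.Dense(9, activation="softmax"),  # distribution over the 9 moves
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```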

Bear in mind that it can take more epochs, and perhaps more hyperparameter tuning, to get a more complex model to fit a simple problem. If you follow the plan from your earlier question and train against the whole population of valid states and moves, then you don't need to worry about over-fitting.

You should bear in mind that tic-tac-toe is simple enough that you can use tabular methods (i.e. approaches that simply enumerate and score all possible game states) to find an optimal policy. The network in your case is being used as a policy function approximator, which is a bit like using a sledgehammer to crack a nut. This makes sense if you are using a toy problem to learn a technique intended for more sophisticated grid-based games. However, if your goal were more directly to build a tic-tac-toe playing bot, you would be better off not using any supervised learning model for the policy.
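To illustrate the tabular alternative, here is a minimal exhaustive-search sketch (a standard memoized negamax over all reachable tic-tac-toe states; not code from either question):

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 1 or 2 if that player has three in a row, else 0."""
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return 0

@lru_cache(maxsize=None)
def value(board, player):
    """Value of the position for `player` to move: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w:
        return 1 if w == player else -1
    if 0 not in board:
        return 0  # draw
    return max(-value(board[:i] + (player,) + board[i + 1:], 3 - player)
               for i in range(9) if board[i] == 0)

def best_move(board, player):
    """Optimal move index for `player` on a board given as a 9-tuple."""
    moves = [i for i in range(9) if board[i] == 0]
    return max(moves, key=lambda i: -value(
        board[:i] + (player,) + board[i + 1:], 3 - player))

print(best_move((0,) * 9, 1))  # prints 0: every opening move leads to a draw
```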

Correct answer by Neil Slater on August 18, 2021

For anyone interested, I'm posting my results of trying out both CNNs and MLPs -

A CNN with a 3x3 kernel and 4 feature maps consistently performed better than a 9-neuron MLP. I trained both on completely random tic-tac-toe training data of size 10k×20; the CNN learned to achieve ~80% win/draw against a random opponent, while the MLP achieved around ~60% on the same data. A CNN with a 2x2 kernel performed much worse than the 3x3.

Training data - Inputs and outputs are the same as described in the question, with the exception that the AI is trained only if it loses or draws the game, and for drawn games the dataset is taken from a randomly chosen player.
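For anyone who wants to reproduce something similar, here is a rough sketch of generating such a dataset from random self-play (the draw-handling rule is my paraphrase of the description above, and the details are assumptions rather than the original code):

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def play_random_game():
    """Play one uniformly random game; record each player's (board, move) pairs."""
    board = [0] * 9
    history = {1: [], 2: []}
    player = 1
    for _ in range(9):
        move = random.choice([i for i in range(9) if board[i] == 0])
        history[player].append((board[:], move))
        board[move] = player
        if winner(board):
            return history, player
        player = 3 - player
    return history, 0  # draw

# Keep the winning player's (board, move) pairs; for drawn games,
# keep the moves of one randomly chosen player (assumed reading of
# the draw rule described above).
dataset = []
for _ in range(10_000):
    history, w = play_random_game()
    dataset.extend(history[w if w else random.choice((1, 2))])
```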

Note: It is possible to achieve much better accuracy with less data, but in this case my goal was just to use this game to learn machine learning programming, so the code was not particularly optimized for anything. The comparison of MLPs and CNNs is also incidental and would need to be done with more rigor for any serious purpose.

Answered by Achilles on August 18, 2021
