Data Science Asked by efel on May 26, 2021
I'm using the Python sklearn library to attempt a linear regression TicTacToe AI.
I create my training set by having the computer play random 'blind' games against itself. For example: player one plays a random segment of the board, then player two plays a random valid segment of the board, and so on until the board is full or someone has won. Each time player one wins, I store the board states leading up to the win. For every loss, I mark that board state (and the earlier board states of the same game) as a loss for player one. Tie games (full board) are not counted as anything. I play about 20k of these games. At the end I get my training data set, which includes the board state (the feature set) and the outcome, which is the percentage of games won from that state (a floating point value, e.g. .8 is 80%).
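The random self-play loop described above could be sketched roughly like this (the function names and structure here are illustrative assumptions, not code from the original post):

```python
# Sketch of the described data-generation loop: two players make random
# valid moves until the board is full or someone has three in a row.
import random

def winner(b):
    """Return 1 or 2 if that player has three in a row, else 0."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]
    for i, j, k in lines:
        if b[i] != 0 and b[i] == b[j] == b[k]:
            return b[i]
    return 0

def random_game():
    """Play one fully random game.

    Returns the board states seen after each of player one's moves,
    plus the result: 1 = X wins, 2 = O wins, 0 = tie.
    """
    board = [0] * 9
    states = []
    player = 1
    while 0 in board and winner(board) == 0:
        move = random.choice([i for i, v in enumerate(board) if v == 0])
        board[move] = player
        if player == 1:
            states.append(tuple(board))
        player = 3 - player   # alternate between players 1 and 2
    return states, winner(board)
```

Repeating `random_game()` ~20k times and aggregating per-state results gives a table like the one described.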
So for example going from board top left to bottom right: [1, 1, 1, 2, 0, 2, 0, 0, 0] would be:
X X X
O - O
- - -
would have a value of '1' (100 percent) after playing the 20k random games.
I’m trying to predict the success rate of player one’s next move. Basically the success rate of any free segment based on the board state.
However, after training sklearn linear regression with my training data, I get a very low R^2 score of .14, and any test is highly inaccurate. I'm beginning to think there is a flaw in my data. Is this how data scientists would go about creating the training set for tic tac toe?
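For reference, the fit-and-score step being described would look something like this (the arrays here are random placeholders standing in for the real board states and win rates):

```python
# Minimal reproduction of the described evaluation: fit a linear model
# on (board state, win rate) pairs and read off the training R^2.
# X and y below are random placeholders, not the real 20k-game dataset.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(1000, 9))   # placeholder 0/1/2 board states
y = rng.random(1000)                     # placeholder win rates in [0, 1]

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)                   # R^2 on the training data itself
```

With essentially no linear relationship in the data, this score stays close to zero, which matches the symptom in the question.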
Linear regression will not work for this problem because the relationship between the board features and target variable that you are using is not linear.
Is this how data scientists would go about creating the training set for tic tac toe?
It is not 100% clear what your goal is. For simplicity I will assume your goal is "Predict the probability of X winning eventually given the current board state and completely random play in future by both sides." That appears to be what you are doing.
As an aside, this is not a direct path to training a neural network to predict the best moves to make in a game. For this simple game, it might work acceptably if that is your eventual goal, but if you want machine learning for game playing you should probably look into reinforcement learning, and specifically self-play with reinforcement learning, as a framework to manage the training data.
Back to your question, what you are doing is acceptable for creating a data set, although I would want to check:
For every tie game(full board) I do not count it as anything
If that means you are still storing the states that lead to a tie, but with a different label, then that is ok. If you are discarding data about ties, then that will skew the dataset and might impact your predictions - unless you are also discarding ties when testing.
This is also slightly unusual:
At the end I get my training data set which includes the board state (the feature set) and the outcome which is the percentage (a floating point value. eg .8 is 80%) of games won for that state.
This is unusual in that you have pre-processed the data into a single summary row wherever the features are identical. That skews the dataset when used with an approximation function (linear regression - like most ML statistical learners - is an approximation function), because you lose the count of how many times each state occurred. Any balancing the prediction function does to make itself more accurate for common states is lost. It is more normal to keep all records separate and let the ML method resolve the best way to take averages. If you measure the accuracy of your completed model on random samples of newly played games, it could have lower accuracy than possible because of this.
For data collection of records, it is more usual to keep all observations separate and not to summarise them before training a classifier. The classifier can then fit the data allowing for the frequency of each observation.
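The contrast between the two data layouts can be sketched as follows (the `games` list here is a tiny hypothetical sample, just to show the shapes):

```python
# Summarised vs raw records for identical board states.
# `games` is a hypothetical sample: (board_state, player_one_won) pairs.
from collections import defaultdict

games = [
    ((1, 1, 1, 2, 0, 2, 0, 0, 0), 1),   # same state observed three times:
    ((1, 1, 1, 2, 0, 2, 0, 0, 0), 1),   # two wins, one loss
    ((1, 1, 1, 2, 0, 2, 0, 0, 0), 0),
    ((1, 0, 0, 2, 0, 0, 0, 0, 0), 1),   # a rarer state seen once
]

# Summarised: one averaged row per state -- frequency information is lost,
# so a state seen 3 times weighs the same as one seen once.
totals = defaultdict(lambda: [0, 0])
for state, won in games:
    totals[state][0] += won
    totals[state][1] += 1
summarised = {s: wins / n for s, (wins, n) in totals.items()}

# Raw: one row per observation -- the learner sees common states more
# often and weights its fit toward them automatically.
X_raw = [list(s) for s, _ in games]
y_raw = [won for _, won in games]
```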
Other than the caveats about ties (which you may well have right), and premature taking of averages, plus the limitation that your dataset will only help predict outcomes in fully random games, the dataset collection looks ok to me. Neither of the above problems is major enough to cause the problem that you noticed. The reason your predictions are not working with linear regression is mainly due to needing non-linearity in the prediction function.
A simple fix for this would be to use a non-linear predictor such as a neural network or maybe a decision-tree algorithm like xgboost.
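As a concrete sketch, swapping in scikit-learn's gradient boosting regressor (a stand-in for xgboost here, so no extra dependency is needed; the data is again a random placeholder) requires only a one-line model change:

```python
# Drop-in non-linear replacement for LinearRegression.
# GradientBoostingRegressor is used as a stand-in for xgboost;
# X and y are random placeholders for the real dataset.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 9))    # placeholder 0/1/2 board states
y = rng.random(500)                      # placeholder win rates

model = GradientBoostingRegressor(n_estimators=100).fit(X, y)
preds = model.predict(X[:5])             # predicted win rates for 5 boards
```

The tree ensemble can represent interactions between board squares (e.g. "two X's in a row *and* the third square empty"), which a linear model cannot.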
If you use a neural network, the following may help:
Use sigmoid activation in the output layer and binary cross-entropy loss. This should help when your output is a probability.
Use the value $-1$ instead of $2$ for marking positions in the board played by O. This is not strictly required, but neural networks tend to learn faster and more accurately when the input data is centered around zero with close to 1 standard deviation.
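Both suggestions can be illustrated with a small hand-rolled network (a plain numpy sketch with placeholder data; a real project would use a framework, but the sigmoid output, binary cross-entropy gradient, and the $0/+1/-1$ encoding are the same):

```python
# Toy one-hidden-layer network: sigmoid output trained on binary
# cross-entropy, with O's marks encoded as -1 instead of 2.
# Hand-rolled numpy sketch on placeholder data, for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def encode(board):
    """Map the 0/1/2 board encoding to 0 / +1 / -1."""
    b = np.asarray(board, dtype=float)
    return np.where(b == 2, -1.0, b)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Placeholder data: random boards with random win-rate targets in [0, 1].
X = encode(rng.integers(0, 3, size=(200, 9)))
y = rng.random(200)

# One hidden layer, trained by plain gradient descent on mean BCE loss.
W1 = rng.normal(0, 0.1, (9, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.1, (16, 1)); b2 = np.zeros(1)
lr = 0.1
for _ in range(200):
    h = np.tanh(X @ W1 + b1)               # hidden activations
    p = sigmoid(h @ W2 + b2).ravel()       # predicted win probability
    # For sigmoid + BCE, the gradient w.r.t. the pre-sigmoid output
    # simplifies to (p - y) / n.
    d_out = ((p - y) / len(y))[:, None]
    d_h = (d_out @ W2.T) * (1 - h ** 2)    # backprop through tanh
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)
```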
It is worth noting that your averaged win rate table is already quite a reasonable predictive model for game playing. For TicTacToe it should work quite well because there are a limited number of states. After 20k games with random play, you will have a record of nearly every possible state, and some will have reasonably accurate average values (for instance each initial play by X will have ~2000 sampled continuations which should give you the win rate within a few percent). The weakness of this approach is that it cannot generalise to new unseen states, but actually that is quite hard to do in board games where fine detail matters.
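Used as a predictor, that table is just a dictionary lookup with a fallback for unseen states (sketch; `summaries` is a hypothetical mapping from board tuples to observed win rates):

```python
# The averaged win-rate table used directly as a predictor.
# `summaries` is a hypothetical dict built from the 20k random games.
def predict_win_rate(board, summaries, default=0.5):
    """Look up the observed win rate; fall back for unseen states."""
    return summaries.get(tuple(board), default)

summaries = {(1, 1, 1, 2, 0, 2, 0, 0, 0): 1.0}
predict_win_rate([1, 1, 1, 2, 0, 2, 0, 0, 0], summaries)   # -> 1.0
predict_win_rate([0] * 9, summaries)                        # -> 0.5 (unseen)
```

The `default` fallback is exactly the weakness described above: the table has no way to generalise to a state it never sampled.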
Correct answer by Neil Slater on May 26, 2021