An attempt to solve Tic-Tac-Toe using Keras and LSTM
Deep Learning is not the way to solve Tic-Tac-Toe but it certainly is a nice educational experience.

After implementing my first Deep Learning LSTM model for a project, I wondered whether Deep Learning could also solve a game. The first game that comes to mind is Tic-Tac-Toe. Then you search the internet, and it appears that many, many people have had the same idea. Of course.
Below I present my solution for playing Tic-Tac-Toe using Keras and LSTM (Long Short Term Memory). This is a Deep Learning solution, not a Reinforcement Learning solution; that is a totally different subject.
The solution does not use a single model, but a model for every move. Why? Because I wanted to avoid the training data for an early move getting 'polluted' with training data for future moves.
About the Tic-Tac-Toe game
There are 255,168 ways to play this game. Of these games, 131,184 are won by the first player, 77,904 by the second player, and 46,080 end in a draw.
A board consists of fields whose content is empty, X, or O. The total number of possible boards is 3^9 = 19,683.
What we know about the Tic-Tac-Toe game
Our Deep Learning black box knows the following about the game:
- There are two players
- It is played on 9 fields, let's call it a board
- The players take turns making a move
- Making a move means assigning an unused field to a player
- This means we know which fields are available when making a move
- After some moves someone tells us the game is over and who won, who lost, or whether it was a draw
That's all we know, not really much. Can we use Deep Learning to play winning games?
Some questions and answers
- How do we represent the board?
- How do we represent a move?
- Should we use regression or classification, and how do we use it?
- Should we use MLP (Multilayer Perceptron) or LSTM (Long Short Term Memory) or something else?
- Should every move have its own model?
- Should both players have their own model?
How do we represent the board?
The board is a vector of nine numbers, one for each field. The number is 0 when the field can still be used by player1 or player2, 1 when taken by player1, and 2 when taken by player2.
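As a minimal sketch (the helper name is mine, it is not part of the code below), flattening a 3x3 board into this vector of nine numbers could look like this:

def board_to_vector(board):
    # 0 = free, 1 = taken by player1, 2 = taken by player2
    return [board[r][c] for r in range(3) for c in range(3)]

# player1 took the top-left field, player2 took the center field
board = [
    [1, 0, 0],
    [0, 2, 0],
    [0, 0, 0],
]
print(board_to_vector(board))  # [1, 0, 0, 0, 2, 0, 0, 0, 0]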
How do we represent a move?
This is not really important; let's use a list with a row and a column position.
Regression or classification and how to use it?
With regression we can try to predict a value for the next move.
With classification we can try to predict a vector of best moves.
I choose regression here. We have two players. The values for player1:
- 2: won
- 1: draw
- 0: lost
The prediction will be a value between 0 and 2. If it is our turn, we predict the value for all available moves. Then we select the best available value, which for player1 is the maximum. This gives our move. For player2, the best available value is the minimum.
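A minimal sketch of this selection rule, where predict_value is a stand-in for the trained regression model and not a function from the code below:

def select_move(player, board, available_moves, predict_value):
    # predict_value(board, move) returns a value between 0 (player1 lost) and 2 (player1 won)
    scored_moves = [(predict_value(board, move), move) for move in available_moves]
    if player == 1:
        # player1 wants the maximum predicted value
        return max(scored_moves, key=lambda s: s[0])[1]
    # player2 wants the minimum predicted value
    return min(scored_moves, key=lambda s: s[0])[1]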
MLP (Multilayer Perceptron) or LSTM (Long Short Term Memory) or something else?
I don't know, let's try LSTM.
Every move has its own model?
I think this is important. If we have just one model, then the data after step N is also included. That seems totally wrong.
Should both players have their own model?
I think so. Player1 is always one move ahead, player2 is always one move behind.
Training data
We are using regression, see above, to generate a value for every possible move at a certain moment. Player1 selects the move with the maximum value; the opponent, player2, selects the move with the minimum value.
The board is a vector of nine fields. A field can be empty (0), X for player1 (1), or O for player2 (2). A game consists of multiple board vectors, and we assign the outcome of the game to all vectors of that game.
For example, the data for a game looks like:
[1, 0, 0, 0, 0, 0, 0, 0, 0] -> 2 # player1 makes a move
[1, 0, 0, 0, 0, 0, 0, 2, 0] -> 2 # player2
[1, 1, 0, 0, 0, 0, 0, 2, 0] -> 2 # player1
[1, 1, 0, 0, 0, 0, 0, 2, 2] -> 2 # player2
[1, 1, 1, 0, 0, 0, 0, 2, 2] -> 2 # player1 won
For LSTM background, please see 'How to Develop LSTM Models for Time Series Forecasting' in the links below. The model is a multivariate LSTM using n_steps_in=2 and n_steps_out=1.
To generate the training data we play the game many times.
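With n_steps_in=2 and n_steps_out=1, every training sample is a pair of consecutive board vectors with the final outcome of the game as the target. A minimal sketch of that idea (a simplification of the split_sequences code below):

import numpy as np

def game_to_samples(boards, result, n_steps_in=2):
    # boards: the consecutive board vectors of one game, result: the final outcome (0, 1 or 2)
    X, y = [], []
    for i in range(len(boards) - n_steps_in + 1):
        X.append(boards[i:i + n_steps_in])  # two consecutive boards
        y.append(result)                    # target: the outcome of the whole game
    return np.array(X), np.array(y)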
Data from unique games only
To train our model we generate data by playing a number of games with random moves. This means we can have many duplicate games in our dataset; in other words, the dataset is polluted. At the moment I make sure that there are no duplicate games, by using a signature of the boards and the final result.
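A minimal sketch of this deduplication, using only the board vectors and the result as the signature (the actual code below also includes the moves):

def game_signature(boards, result):
    # concatenate all field values of all boards plus the final result into one string
    return ''.join(str(v) for board in boards for v in board) + str(result)

def filter_unique_games(games):
    # games: list of (boards, result) tuples produced by random play
    seen, unique = set(), []
    for boards, result in games:
        sig = game_signature(boards, result)
        if sig not in seen:
            seen.add(sig)
            unique.append((boards, result))
    return unique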
A model for every move
If we use a single model then the training data becomes 'polluted' with future data. When it is our move, we want to know the next best move; everything after that step pollutes our existing data. That is why I use multiple models, one for every step.
move[n] model[n] with data up to step[n+1]
move[n+1] model[n+1] with data up to step[n+2]
etc.
For player 1 this means that at move[n] we use model[n] that has been trained with data up to move n+1. This way we can select the best available move to make.
Because the parameter n_steps_in=2, we have no model for the first move. Instead, the first move is a random move, for both players.
We also do not use a model when there is only one move left.
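A minimal sketch of this per-move bucketing, assuming each game is a list of board vectors plus the final result (the actual get_model_boards_and_results code below handles a few extra details):

def bucket_per_move(games, n_models=9):
    # one training set per move: the model for move n only sees boards up to move n+1
    buckets = {m: [] for m in range(n_models)}
    for boards, result in games:
        for m in range(n_models):
            for board in boards[:m + 2]:  # boards up to and including move m+1
                buckets[m].append((board, result))
    return buckets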
Performance
Okay, I implemented this. How does it perform? The training results show that for all models the mean_absolute_error is reduced from around 0.8 to 0.6. Not very good.
In the results below there are two players. A player can be:
- RD: makes random moves
- NN: makes Neural Network moves
The first player is player1, the second is player2. Training data is generated as described above, until we have for example 1000 x won-draw-lost: 1000 games won, 1000 games lost, and 1000 draws.
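A minimal sketch of this balanced data generation, where play_random_game is a stand-in for a function that plays one game with random moves and returns an object with a result attribute:

GAME_RESULT_PLAYER1_WON, GAME_RESULT_DRAW, GAME_RESULT_PLAYER2_WON = 2., 1., 0.

def generate_balanced_games(play_random_game, n):
    counts = {GAME_RESULT_PLAYER1_WON: 0, GAME_RESULT_DRAW: 0, GAME_RESULT_PLAYER2_WON: 0}
    games = []
    while min(counts.values()) < n:
        game = play_random_game()
        if counts[game.result] < n:
            # only keep the game if we still need this outcome
            counts[game.result] += 1
            games.append(game)
    return games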
1. RD vs RD
Both players make random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 578 | 293 | 129 | 1.97 | 7.7 | 12.9% |
| 593 | 274 | 133 | 2.16 | 7.7 | 13.3% |
| 590 | 276 | 134 | 2.14 | 7.6 | 13.4% |
| 589 | 296 | 115 | 1.99 | 7.6 | 11.5% |
This does not look unreasonable. Player1 starts the game and thus has an advantage over player2.
2. NN vs RD, train with 1000 x won-draw-lost
Player1 makes Neural Network moves, player2 makes random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 706 | 192 | 102 | 3.68 | 7.4 | 10.2% |
| 690 | 203 | 107 | 3.40 | 7.3 | 10.7% |
| 721 | 172 | 107 | 4.19 | 7.4 | 10.7% |
| 726 | 168 | 106 | 4.32 | 7.4 | 10.6% |
Better than random moves but far from perfect!
3. NN vs RD, train with 2000 x won-draw-lost
Player1 makes Neural Network moves, player2 makes random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 801 | 124 | 75 | 6.46 | 7.3 | 7.5% |
| 789 | 128 | 83 | 6.16 | 7.2 | 8.3% |
| 773 | 145 | 82 | 5.33 | 7.2 | 8.2% |
| 772 | 145 | 83 | 5.32 | 7.2 | 8.3% |
Again better, but again far from perfect!
4. NN vs RD, train with 5000 x won-draw-lost
Player1 makes Neural Network moves, player2 makes random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 901 | 49 | 50 | 18.39 | 7.0 | 5.0% |
| 888 | 58 | 54 | 15.31 | 7.0 | 5.4% |
| 889 | 56 | 55 | 15.88 | 7.0 | 5.5% |
| 898 | 48 | 54 | 18.71 | 7.0 | 5.4% |
This looks more like it! Can we do better?
5. NN vs RD, train with 20000 x won-draw-lost
Player1 makes Neural Network moves, player2 makes random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 955 | 21 | 24 | 45.48 | 6.7 | 2.4% |
| 959 | 15 | 26 | 63.93 | 6.7 | 2.6% |
| 962 | 13 | 25 | 74.00 | 6.7 | 2.5% |
| 969 | 12 | 19 | 80.75 | 6.7 | 1.9% |
Huge improvement, more (training) data matters! Are the wins of player2 just luck?
6. RD vs NN, train with 20000 x won-draw-lost
Now we change the order. Player1 makes random moves and player2 makes Neural Network moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 145 | 739 | 116 | 0.20 | 6.8 | 11.6% |
| 127 | 761 | 112 | 0.17 | 6.8 | 11.2% |
| 142 | 735 | 123 | 0.19 | 6.8 | 12.3% |
| 142 | 750 | 108 | 0.19 | 6.8 | 10.8% |
This seems reasonable. The results for player2 are not as good as those for player1 above. Also, the number of draws increased. Does this have to do with the fact that player1 moves first?
7. NN vs NN, train with 20000 x won-draw-lost
Both players make Neural Network moves!
Results of 3 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 684 | 104 | 212 | 6.58 | 7.2 | 21.2% |
| 672 | 102 | 226 | 6.59 | 7.2 | 22.6% |
| 675 | 112 | 213 | 6.03 | 7.2 | 21.3% |
We expect won and lost to be close, but that is not the case. We also expect to see a lot more draws, and that part looks fine.
We can only conclude: Our black box is not smart enough ... :-(
The code
Below is the code so you can try it yourself. Keras Tuner is used to optimize the hyperparameters.
Important variables:
- tuner_dir
This is the directory created by Keras Tuner.
- must_train
Set True if you want to train the models. Set False if you want to use the trained models.
- won_draw_lost_count
This controls the amount of training data, see above.
Typically you want to train once and then do multiple runs. If you change won_draw_lost_count, make sure you delete the tuner_dir directory first! Increasing won_draw_lost_count will increase training time.
# tictactoe.py
# std
import copy
import os
import random
import sys
import time
import operator
# optimizing hyperparameters with keras tuner
from keras.callbacks import EarlyStopping
from keras.models import load_model, Sequential
from keras.layers import Dense, Dropout, LSTM
import keras_tuner as kt
import numpy as np
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam
# get reproducible results
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(2)
# constants: game results
# note(s):
# - the order for player1, higher is better
# - result is float
GAME_RESULT_PLAYER1_WON = 2.
GAME_RESULT_DRAW = 1.
GAME_RESULT_PLAYER2_WON = 0.
# tuner parameters
class TunerParameter:
def __init__(self, name, val):
self.name = name
self.val = val
self.args = (name, val)
class TunerParameters:
def __init__(self, pars=None):
self.t_pars = []
for p in pars:
tpar = TunerParameter(p[0], p[1])
self.t_pars.append(tpar)
setattr(self, p[0], tpar)
def get_pars(self):
return self.t_pars
# subclassed hyperband tuner to add hyperparameters (here, batch_size)
class HyperbandTuner(kt.tuners.Hyperband):
def __init__(self, *args, **kwargs):
self.t_pars = None
if 't_pars' in kwargs:
self.t_pars = kwargs.pop('t_pars')
super(HyperbandTuner, self).__init__(*args, **kwargs)
def run_trial(self, trial, *args, **kwargs):
if self.t_pars is not None:
for tpar in self.t_pars:
kwargs[tpar.name] = trial.hyperparameters.Choice(tpar.name, tpar.val)
return super(HyperbandTuner, self).run_trial(trial, *args, **kwargs)
# tuner class
class Tuner:
def __init__(
self,
input_shape=None,
t_pars=None,
t_dir='tuner_dir',
):
self.input_shape = input_shape
self.t_dir = t_dir
self.t_pars = t_pars
def get_model(self, hp):
# model params: layer_input
model_params_layer_input = dict(
units=hp.Choice(*self.t_pars.layer_input_units.args),
activation='relu',
input_shape=self.input_shape,
)
'''
# model params: layer_hidden_1
model_params_layer_hidden_1 = dict(
units=hp.Choice(*self.t_pars.layer_hidden_1_units.args),
activation='relu',
input_shape=self.input_shape,
)
'''
# model params: compile
model_params_compile = dict(
#optimizer=Adam(learning_rate=hp.Choice(*self.t_pars.learning_rate.args)),
optimizer='adam',
loss='mean_absolute_error',
metrics=['mean_absolute_error'],
#metrics=['mean_absolute_error', 'val_mean_absolute_error'],
#metrics=['mean_absolute_error', 'val_mae'],
)
# model
model = Sequential()
model.add(LSTM(**model_params_layer_input))
model.add(Dense(
units=1,
))
# compile
model.compile(**model_params_compile)
return model
def get_model_path(
self,
model_num,
):
return os.path.join(self.t_dir, 'best_model_' + str(model_num))
def tune(
self,
X,
y,
model=None,
model_num=0,
use_early_stopping=False,
search_split_size=0.33,
fit_verbose=0,
plot_loss=False,
):
if model is None:
model = self.get_model
# use subclassed HyperBand tuner
tuner = HyperbandTuner(
model,
objective=kt.Objective('val_mean_absolute_error', direction='min'),
#allow_new_entries=True,
tune_new_entries=True,
hyperband_iterations=2,
max_epochs=260,
directory=self.t_dir,
project_name='model_' + str(model_num),
t_pars=[self.t_pars.batch_size],
)
print('\n{}\ntuner.search_space_summary()\n{}\n'.format('-'*60, '-'*60))
tuner.search_space_summary()
callbacks = []
if use_early_stopping:
callbacks = [EarlyStopping('val_loss', patience=3)]
print('\n{}\ntuner.search()\n{}\n'.format('-'*60, '-'*60))
tuner.search(
X,
y,
validation_split=search_split_size,
shuffle=False,
callbacks=callbacks
)
print('\n{}\ntuner.results_summary()\n{}\n'.format('-'*60, '-'*60))
tuner.results_summary()
print('\n{}\nbest_hps\n{}\n'.format('-'*60, '-'*60))
best_hps = tuner.get_best_hyperparameters()[0]
best_hps_dict = {}
for tpar in self.t_pars.get_pars():
try:
best_val = best_hps[tpar.name]
print('- {} = {}'.format(tpar.name, best_val))
best_hps_dict[tpar.name] = best_val
except:
pass
# add tuner epochs
best_hps_dict['epochs'] = best_hps['tuner/epochs']
# get best model
h_model = tuner.hypermodel.build(best_hps)
print('\n{}\nh_model.summary()\n{}\n'.format('-'*60, '-'*60))
h_model.summary()
# train best model
epochs = 50
print('training ...')
history = h_model.fit(
X,
y,
epochs=epochs,
workers=4,
use_multiprocessing=True,
verbose=fit_verbose,
)
if plot_loss:
fig_title = 'Loss - model[{}]'.format(model_num)
fig = go.Figure()
fig.add_trace(go.Scattergl(y=history.history['mean_absolute_error'], name='Train'))
##fig.add_trace(go.Scattergl(y=history.history['val_mean_absolute_error'], name='Valid'))
#fig.add_trace(go.Scattergl(y=history.history['loss'], name='Loss'))
fig.update_layout(height=500, width=1200, title=fig_title, xaxis_title='Epoch', yaxis_title='Mean Absolute Error')
fig.show()
# save best model
h_model.save(self.get_model_path(model_num))
print('\n{}\nh_model.evaluate()\n{}\n'.format('-'*60, '-'*60))
h_eval_dict = h_model.evaluate(X, y, return_dict=True)
print('h_eval_dict = {}'.format(h_eval_dict))
return dict(
h_model=h_model,
h_eval_dict=h_eval_dict,
best_hps_dict=best_hps_dict,
)
def load_model(
self,
model_num,
):
try:
return load_model(self.get_model_path(model_num))
except:
return None
class DataUtils:
@classmethod
def get_model_boards_and_results(cls, game_results):
# generate datasets
model_boards_and_results = {}
for m in range(9):
model_boards_and_results[m] = []
for game_result in game_results:
result = game_result.result
for i, board in enumerate(game_result.boards):
if i == 0:
# skip board0
continue
# model_boards_data
if i < 4:
# store for all models
for m in range(2, 9):
model_boards_and_results[m].append((board, result))
else:
# store for models[i - 1]
for m in range(i - 1, 9):
model_boards_and_results[m].append((board, result))
return model_boards_and_results
@classmethod
def get_model_data(cls, model_boards_and_results):
model_data = {}
for model_num in model_boards_and_results:
boards_and_results = model_boards_and_results[model_num]
boards = []
results = []
for board, result in boards_and_results:
boards.append(board)
results.append(result)
boards_len = len(boards)
results_len = len(results)
print('model[{}]: boards_len = {}, results_len = {}'.format(model_num, boards_len, results_len))
print('model[{}]:'.format(model_num))
#print('- boards = {}'.format(boards))
#print('- results = {}'.format(results))
if boards_len == 0:
continue
# create data for model
# player1 models: 2, 4, 6, 8
# player2 models: 3, 5, 7
# but we don't need to care about that here
# 2 = player1 won
# 1 = draw
# 0 = player2 won
# this means:
# - player1 looks for the maximum
# - player2 looks for the minimum
dataset = []
for i, board in enumerate(boards):
result = results[i]
flattened_board_and_result = []
for row in board:
print('row = {}'.format(row))
flattened_board_and_result.extend(row)
flattened_board_and_result.append(result)
dataset.append(flattened_board_and_result)
dataset = np.array(dataset, dtype=object)
print('dataset = {}'.format(dataset))
X_in, y_in = cls.split_sequences(dataset, n_steps_in, n_steps_out)
print('X_in = {}'.format(X_in))
print('y_in = {}'.format(y_in))
X_in_shape = X_in.shape
print('X_in_shape = {}'.format(X_in_shape))
model_data[model_num] = dict(
X_in=X_in,
y_in=y_in,
)
return model_data
# split a multivariate sequence into samples
@classmethod
def split_sequences(cls, sequences, n_steps_in, n_steps_out):
X, y = list(), list()
for i in range(len(sequences)):
# find the end of this pattern
end_ix = i + n_steps_in
out_end_ix = end_ix + n_steps_out-1
# check if we are beyond the dataset
if out_end_ix > len(sequences):
break
# gather input and output parts of the pattern
seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
X.append(seq_x)
y.append(seq_y)
if n_steps_out == 1:
y1 = []
for v in y:
y1.append(v[0])
y = y1
return np.array(X), np.array(y)
@classmethod
def split_datasets(cls, X, y):
return train_test_split(X, y, test_size=0.33, shuffle=False, random_state=42)
class GameResult:
def __init__(
self,
boards=None,
moves=None,
result=None,
):
self.boards = boards
self.moves = moves
self.result = result
class GamePlay:
def __init__(
self,
n_steps_in=None,
n_steps_out=None,
n_features=None,
):
self.n_steps_in = n_steps_in
self.n_steps_out = n_steps_out
self.n_features = n_features
self.player_value2displays = {0: ' ', 1: 'X', 2: 'O'}
def get_empty_board(self):
return [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
def show_boards(self, boards, show_board_num=True, horizontal=True):
boards_strs = []
for i, board in enumerate(boards):
strs = self.show_board(board, board_num=i, return_strs=horizontal)
boards_strs.append(strs)
if horizontal:
for i in range(len(strs)):
line_parts = []
for b in boards_strs:
line_parts.append(b[i])
print('{}'.format(' '.join(line_parts)))
def show_board(self, board, board_num=None, return_strs=False):
strs = []
if board_num is not None:
strs.append('board: {:<6}'.format(board_num))
ds = '+---+---+---+'
strs.append('{}'.format(ds))
for r in range(3):
row_items = []
for c in range(3):
row_items.append(self.player_value2displays[board[r][c]])
dr = '| {} |'.format(' | '.join(row_items))
strs.append('{}'.format(dr))
strs.append('{}'.format(ds))
if return_strs:
return strs
for s in strs:
print(s)
def get_available_moves(self, board):
moves = []
for r in range(3):
for c in range(3):
if board[r][c] == 0:
moves.append([r, c])
return moves
def player_won(self, player, board):
# row, col win
for r in range(3):
if board[r][0] == player and board[r][1] == player and board[r][2] == player:
return True
for c in range(3):
if board[0][c] == player and board[1][c] == player and board[2][c] == player:
return True
# diag win
if board[0][0] == player and board[1][1] == player and board[2][2] == player:
return True
if board[2][0] == player and board[1][1] == player and board[0][2] == player:
return True
return False
def get_other_player(self, player):
return 2 if player == 1 else 1
def get_inverted_board(self, board):
board = copy.deepcopy(board)
inverted_board = self.get_empty_board()
for r in range(3):
for c in range(3):
if board[r][c] == 1:
inverted_board[r][c] = 2
elif board[r][c] == 2:
inverted_board[r][c] = 1
return inverted_board
def get_deeplearning_move(self, h_model, player, board, dbg=False):
# get moves
available_moves = self.get_available_moves(board)
if len(available_moves) == 1:
return available_moves[0]
board = copy.deepcopy(board)
move_vs = []
X_pred = []
for i, move in enumerate(available_moves):
next_board = copy.deepcopy(board)
self.make_move(player, next_board, move)
# create X_pred
x_pred = [
self.board2vector(board),
self.board2vector(next_board),
]
if dbg: print('x_pred = {}'.format(x_pred))
X_pred.append(x_pred)
y = [0, move[0], move[1]]
move_vs.append(y)
X_pred_len = len(X_pred)
if X_pred_len > 0:
if dbg: print('X_pred = {}'.format(X_pred))
X_pred = np.array(X_pred).reshape(X_pred_len, self.n_steps_in, self.n_features)
predictions = h_model.predict(X_pred)
if dbg: print('for X_Pred = {}, predictions = {}'.format(X_pred, predictions))
for i, y in enumerate(predictions):
move_vs[i][0] = y[0]
if dbg: print('move_vs = {}'.format(move_vs))
# get maximum or minimum
if player == 1:
reverse = True
else:
reverse = False
# sort in place: best predicted value first
move_vs.sort(key=operator.itemgetter(0), reverse=reverse)
sorted_move_vs = move_vs
if dbg: print('sorted_move_vs = {}'.format(sorted_move_vs))
#sys.exit()
move = [sorted_move_vs[0][1], sorted_move_vs[0][2]]
if dbg: print('player = {}, y = {} -> selected move = {}'.format(player, y, move))
#sys.exit()
return move
# fall back to a random move
move = random.choice(available_moves)
if dbg: print('player = {}, no prediction available, random move = {}'.format(player, move))
return move
def get_random_move(self, board):
moves = self.get_available_moves(board)
return random.choice(moves)
def make_move(self, player, board, move):
board[move[0]][move[1]] = player
return board
def play_game(self, player1_type=0, player2_type=0, h_models=[], show_game=False):
# player_type: 0 = random, 1 = dl
# first board is empty
board = self.get_empty_board()
boards = [board]
moves = []
player = 1
i = 0
while True:
if player == 1:
if player1_type == 0 or len(moves) < 2:
print('rd move: i = {}, player = {}'.format(i, player))
move = self.get_random_move(board)
else:
print('nn move: i = {}, player = {}'.format(i, player))
move = self.get_deeplearning_move(h_models[i], player, board)
else:
if player2_type == 0 or len(boards) <= 2:
print('rd move: i = {}, player = {}'.format(i, player))
move = self.get_random_move(board)
else:
print('nn move: i = {}, player = {}'.format(i, player))
move = self.get_deeplearning_move(h_models[i], player, board)
# make the move
self.make_move(player, board, move)
if show_game:
print('player = {}, made move = {}'.format(player, move))
self.show_board(board)
# store for players
boards.append(copy.deepcopy(board))
moves.append(copy.deepcopy(move))
# check won, lost, draw
if self.player_won(player, board):
# player won
if player == 1:
result = GAME_RESULT_PLAYER1_WON
else:
result = GAME_RESULT_PLAYER2_WON
return GameResult(
boards=boards,
moves=moves,
result=result,
)
other_player = self.get_other_player(player)
if self.player_won(other_player, board):
# other_player won
if other_player == 1:
result = GAME_RESULT_PLAYER1_WON
else:
result = GAME_RESULT_PLAYER2_WON
return GameResult(
boards=boards,
moves=moves,
result=result,
)
if len(self.get_available_moves(board)) == 0:
# draw
return GameResult(
boards=boards,
moves=moves,
result=GAME_RESULT_DRAW,
)
# change player
player = other_player
i += 1
def play_games(self, n_games, player1_type=0, player2_type=0, h_models=None, show_games=False):
player_wins = [0, 0]
draws = 0
total_moves = 0
result_to_text = {
0: 'lost',
1: 'draw',
2: 'won',
}
for i in range(n_games):
print('playing game {} of {} ...'.format(i, n_games))
game_result = self.play_game(
player1_type=player1_type,
player2_type=player2_type,
h_models=h_models,
show_game=False,
)
self.show_boards(game_result.boards)
total_moves += len(game_result.moves)
last_move = game_result.moves[-1]
if game_result.result == GAME_RESULT_DRAW:
draws += 1
elif game_result.result == GAME_RESULT_PLAYER1_WON:
player_wins[0] += 1
elif game_result.result == GAME_RESULT_PLAYER2_WON:
player_wins[1] += 1
if player_wins[1] == 0:
won_div_lost_fmt = '{:<8}'.format('')
else:
won_div_lost = player_wins[0]/player_wins[1]
won_div_lost_fmt = '{:8.2f}'.format(won_div_lost)
moves_avg = total_moves/n_games
print('summary for player1 after {} games\n--------'.format(n_games))
sep_line = '+-{}-+-{}-+-{}-+-{}-+-{}-+-{}-+'.format('-'*5, '-'*5, '-'*5, '-'*8, '-'*9, '-'*9)
print(sep_line)
print('| {:<5} | {:<5} | {:<5} | {:<8} | {:<9} | {:<9} |'.format('won', 'lost', 'draw', 'won/lost', 'avg_moves', 'perc draw'))
print(sep_line)
print('| {:5d} | {:5d} | {:5d} | {} | {:9.1f} | {:8.1f}% |'.format(player_wins[0], player_wins[1], draws, won_div_lost_fmt, moves_avg, (100*draws)/n_games))
print('')
print('')
def board2vector(self, board):
b = []
for r in range(3):
for c in range(3):
b.append(board[r][c])
return b
def get_game_result_signature(self, game):
sig = ''
for board in game.boards:
for r in range(3):
for c in range(3):
sig += str(board[r][c])
for move in game.moves:
sig += str(move[0]) + str(move[1])
sig += str(game.result)
return sig
def generate_datasets(self, n_games):
# game_result signatures to prevent duplicate data
sigs = []
won_count = 0
lost_count = 0
draw_count = 0
# generate game data
game_results = []
game_count = 0
while won_count < n_games or lost_count < n_games or draw_count < n_games:
if game_count > 255000:
break
print('playing game {}'.format(game_count))
game_count += 1
game_result = self.play_game()
boards_len = len(game_result.boards)
moves_len = len(game_result.moves)
result = game_result.result
print('boards_len = {}, moves_len = {}, result = {}'.format(boards_len, moves_len, result))
if boards_len != (moves_len + 1):
print('boards_len != moves_len:')
print('game_result.boards = {}'.format(game_result.boards))
print('game_result.moves = {}'.format(game_result.moves))
sys.exit()
# skip identical games
sig = self.get_game_result_signature(game_result)
if sig in sigs:
continue
# distribute won, draw, lost
if result == GAME_RESULT_DRAW:
if draw_count >= n_games:
continue
draw_count += 1
elif result == GAME_RESULT_PLAYER1_WON:
if won_count >= n_games:
continue
won_count += 1
elif result == GAME_RESULT_PLAYER2_WON:
if lost_count >= n_games:
continue
lost_count += 1
sigs.append(sig)
game_results.append(game_result)
print('game_count = {}'.format(game_count))
print('len(sigs) = {}'.format(len(sigs)))
print('won_count = {}, lost_count = {}, draw_count = {}'.format(won_count, lost_count, draw_count))
input('Press ENTER to continue ...')
return game_results
# start
# hyperparameters name and value, are the inputs for hp.Choice()
t_pars = TunerParameters(
pars=[
('layer_input_units', [64, 128]),
#('layer_hidden_1_units', [64, 128]),
('batch_size', [8, 32]),
],
)
# lstm
n_steps_in = 2
n_steps_out = 1
n_features = 9
# other
tuner_dir = 'tuner_dir_tictactoe'
won_draw_lost_count = 5000
#won_draw_lost_count = 20000
# train or use
must_train = True
#must_train = False
game_play = GamePlay(
n_steps_in=n_steps_in,
n_steps_out=n_steps_out,
n_features=n_features,
)
h_models = [None]*9
if must_train:
game_results = game_play.generate_datasets(won_draw_lost_count)
model_boards_and_results = DataUtils.get_model_boards_and_results(game_results)
model_data = DataUtils.get_model_data(model_boards_and_results)
model_data_lens = [None]*9
print('model_data:')
for model_num in model_data:
data = model_data[model_num]
print('model_num = {}'.format(model_num))
X_in = data['X_in']
y_in = data['y_in']
print('- X_in = {}'.format(X_in))
print('- y_in = {}'.format(y_in))
model_data_lens[model_num] = len(X_in)
#continue
# split into train and test
X_train, X_test, y_train, y_test = DataUtils.split_datasets(X_in, y_in)
# convert
X_in = np.asarray(X_in).astype(int)
y_in = np.asarray(y_in).astype(int)
X_train = np.asarray(X_train).astype(int)
y_train = np.asarray(y_train).astype(int)
print('type(X_train) = {}, len(X_train) = {}, X_train = {}'.format(type(X_train), len(X_train), X_train))
print('type(y_train) = {}, len(y_train) = {}, y_train = {}'.format(type(y_train), len(y_train), y_train))
n_features = X_train.shape[2]
print('n_features = {}'.format(n_features))
tuner = Tuner(
input_shape=(n_steps_in, n_features),
t_pars=t_pars,
t_dir=tuner_dir,
)
tune_result = tuner.tune(
X_train,
y_train,
model_num=model_num,
#plot_loss=False,
plot_loss=True,
)
h_model = tune_result['h_model']
# add to our list of models
h_models[model_num] = h_model
h_eval_dict = tune_result['h_eval_dict']
print('h_model = {}'.format(h_model))
print('h_eval_dict = {}'.format(h_eval_dict))
# debug: show a prediction for the h_model
X = [
[
[ [0, 0, 0], [1, 0, 0], [0, 0, 0] ],
[ [0, 0, 0], [1, 2, 0], [0, 0, 0] ],
],
[
[ [0, 0, 0], [0, 1, 0], [0, 0, 0] ],
[ [0, 0, 0], [0, 1, 2], [0, 0, 0] ],
]
]
X_pred = np.array(X).reshape(2, n_steps_in, n_features)
predictions = h_model.predict(X_pred)
print('h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))
# debug: show a prediction for the previously saved h_model
loaded_h_model = tuner.load_model(model_num)
predictions = loaded_h_model.predict(X_pred)
print('loaded_h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))
# here we have trained and saved the models
print('model_data_lens = {}'.format(model_data_lens))
input('Press ENTER to continue ...')
# load saved models
if h_models[2] is None:
# load saved
print('loading saved models ...')
h_models = [None]*9
for model_num in range(9):
print('loading saved model {} ...'.format(model_num))
tuner = Tuner(
input_shape=(n_steps_in, n_features),
t_pars=t_pars,
t_dir=tuner_dir,
)
loaded_h_model = tuner.load_model(model_num)
if loaded_h_model is None:
continue
'''
# debug: show a prediction for the previously saved h_model
X = [
[
[ [0, 0, 0], [1, 0, 0], [0, 0, 0] ],
[ [0, 0, 0], [1, 2, 0], [0, 0, 0] ],
],
[
[ [0, 0, 0], [0, 1, 0], [0, 0, 0] ],
[ [0, 0, 0], [0, 1, 2], [0, 0, 0] ],
]
]
X_pred = np.array(X).reshape(2, n_steps_in, n_features)
predictions = loaded_h_model.predict(X_pred)
print('loaded_h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))
sys.exit()
'''
h_models[model_num] = loaded_h_model
# debug: show models
for model_num in range(9):
print('h_models[{}] = {}'.format(model_num, h_models[model_num]))
print('play some games')
# play games and see results, player[1|2]_type: 0 = RD (random move), 1 = NN (neural network move)
game_play.play_games(
1000,
player1_type=1,
player2_type=1,
h_models=h_models,
show_games=True,
)
Improving performance
We can try to improve performance in the following ways:
- add more training data
- add more hyperparameter choices
- add more hyperparameters to the models like learning rate
- add extra layers to the models
- skip early moves, thereby reducing the amount of training data
It would be nice to implement statistics that also show how many games completed in 5, 6, 7, 8, or 9 moves.
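That statistic is not in the code above, but a possible sketch, assuming the GameResult objects from the code are available, could be:

from collections import Counter

def game_length_stats(game_results):
    # count how many games finished in 5, 6, 7, 8 or 9 moves
    counts = Counter(len(gr.moves) for gr in game_results)
    for n_moves in range(5, 10):
        print('{} moves: {} games'.format(n_moves, counts.get(n_moves, 0)))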
Summary
Using Deep Learning this way is not the way to solve this game. I knew this when I started, but I still wanted to try it before looking into Reinforcement Learning. There are a lot of possible improvements and I will certainly try some in the future. Using Keras Tuner to tune the hyperparameters again made my life easier. This was a nice project but also a lot of work. I learned a lot though.
Links / credits
How many tic tac toe games are there?
https://tictactoebeast.com/post/how-many-tic-tac-toe-games-are-there
How many Tic-Tac-Toe (noughts and crosses) games are possible?
http://www.se16.info/hgb/tictactoe.htm
How to Develop LSTM Models for Time Series Forecasting
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting
Part 4 — Neural Network Q Learning, a Tic-Tac-Toe player that learns — kind of
https://medium.com/@carsten.friedrich/part-4-neural-network-q-learning-a-tic-tac-toe-player-that-learns-kind-of-2090ca4798d
Tic-Tac-Toe and Reinforcement Learning
https://medium.com/swlh/tic-tac-toe-and-deep-neural-networks-ea600bc53f51