# An attempt to solve Tic-Tac-Toe using Keras and LSTM

Deep Learning is not the way to solve Tic-Tac-Toe but it certainly is a nice educational experience.

2 March 2022 Updated 3 March 2022 After implementing my first Deep Learning LSTM model for a project I was thinking if Deep Learning could also solve a game. The first game that comes to mind is Tic-Tac-Toe. Then you search the internet and there appear to be many many people who had the same idea. Of course.

Below I present my solution to solve Tic-Tac-Toe using Keras and LSTM (Long Short Term Memory). This is Deep Learning and not a Reinforcement Learning solution, that is a totally different subject.

The solution does not use a single model, but a model for every move. Why? Because I wanted to avoid that the training data for an early move gets 'poluted' with training data for future moves.

There are 255168 ways to play this game. Of these games 131184 are won by the first player, 77904 go to the opponent, and 46080 end in a draw.

A board consists of fields with content empty, X, and O. The total number of possible boards is 3^9 = 19,683.

## What we know about the Tic-Tac-Toe game

Our Deep Learning black box knows the following about the game:

• There are two players
• It is played on 9 fields, let's call it a board
• The players take turns making a move
• Making a move means assigning an unused field to a player
• This means we know which fields are available when making a move
• After some moves someone tells us the game is over and who won, lost or if there was a draw

That's all we know, not really much. Can we use Deep Learning to play winning games?

• How do we represent the board?
• How do we represent a move?
• Should we use regression or classification and how to use it?
• Should we use MLP (Multilayer Precepion) or LSTM (Long Short Term Memory) or something else?
• Should every move have its own model?
• Should both players have their own model?

### How do represent the board?

The board is a vector of nine numbers, one for for each field. The number is 0 when it can be used by player1 or player2. It is 1 when taken by player1 and 2 when taken by player1.

### How do represent the move?

This is not really important, let's use a list of a row and column position.

### Regression or classification and how to use it?

With regression we can try to predict a value for the next move.

With classification we can try to predict a vector of best moves.

I choose regression here. We have two players. The values for player1:

• 2: won
• 1: draw
• 0: lost

The prediction will be a value between 2 and 0. If it is our turn, we predict the value for all available moves. Then we select the best available value, that is the maximum. This gives our move. For player2, the best available value is the minimum.

### MLP (Multilayer Precepion) or LSTM (Long Short Term Memory) or something else?

I don't know, let's try LSTM.

### Every move has its own model?

I think this is important. If we have just one model, then the data after step N is also included. That seems totally wrong.

### Should both players have their own model?

I think so. Player1 is always one move ahead, player2 is always 1 move behind.

## Training data

We are using regression, see above, to generate a value for every possible move at a certain moment. Player1 selects the move with maximum value, the oppenent, player2 selects the move with the minimum value.

The board is a vector of nine fields. Fields can be empty, X for player1 and O for player2. A game consists of multiple vectors. We assign the outcome of the game to all vectors of a game.

For example, the data for a game lookes like:

``````[1, 0, 0, 0, 0, 0, 0, 0, 0, 0] -> 2 # player1 makes a move
[1, 0, 0, 0, 0, 0, 0, 2, 0, 0] -> 2 # player2
[1, 1, 0, 0, 0, 0, 0, 2, 0, 0] -> 2 # player1
[1, 1, 0, 0, 0, 0, 0, 2, 0, 2] -> 2 # player2
[1, 1, 1, 0, 0, 0, 0, 2, 0, 2] -> 2 # player1 won``````

For LSTM information please see 'How to Develop LSTM Models for Time Series Forecasting', see links below. The model is a multivariate LSTM using n_steps_in=2 and n_steps_out=1.

To generate the training data we play the game many times.

## Data from unique games only

To train our model we generate data by playing a number of games. We do this using random moves. This means we can have many duplicate games in our dataset, in other words, the dataset is poluted. At the moment I make sure that there are no duplicate games, by using a signature of the board and the final result.

## A model for every move

If we use a single model then the training data becomes 'poluted' with future data. When it is our move, we want to know what is the next best move. Everything after that step polutes our existing data. That is why I use multiple models, one for every step.

``````move[n]    model[n] with data upto step[n+1]
move[n+1]  model[n+1] with data upto step[n+2]
etc.``````

For player 1 this means that at move[n] we use model[n] that has been trained with data up to move n+1. This way we can select the best available move to make.

The parameter n_steps_in=2 meaning we have no model for the first move. What we do is make the first a random move, for both players.

Also we do not use a model if there is only one move left.

## Performance

Ok, I implemented this. How does it perform? The training result shows for all models that the mean_absolute_error is reduced from around 0.8 to 0.6. Not very good.

In the results below I have the two players. A player can be:

• RD: makes random moves
• NN: makes Neural Network moves

The first player is player1, the second is player2. Training data is generated as described above. It is generated until we have for example 1000 x won-draw-lost = 1000xwon, 1000xlost, and 1000xdraw.

### 1. RD vs RD

Both players make random moves.

Results of 4 runs of 1000 games:

``````+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   578 |   293 |   129 |     1.97 |       7.7 |     12.9% |
|   593 |   274 |   133 |     2.16 |       7.7 |     13.3% |
|   590 |   276 |   134 |     2.14 |       7.6 |     13.4% |
|   589 |   296 |   115 |     1.99 |       7.6 |     11.5% |``````

This does not look unreasonable. Player1 starts the game and thus has an advantage over player2.

### 2. NN vs RD, train with 1000 x won-draw-lost

Player1 makes Neural Network moves, player2 makes random moves.

Results of 4 runs of 1000 games:

``````+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   706 |   192 |   102 |     3.68 |       7.4 |     10.2% |
|   690 |   203 |   107 |     3.40 |       7.3 |     10.7% |
|   721 |   172 |   107 |     4.19 |       7.4 |     10.7% |
|   726 |   168 |   106 |     4.32 |       7.4 |     10.6% |``````

Better than random moves but far from perfect!

### 3. NN vs RD, train with 2000 x won-draw-lost

Player1 makes Neural Network moves, player2 makes random moves.

Results of 4 runs of 1000 games:

``````+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   801 |   124 |    75 |     6.46 |       7.3 |      7.5% |
|   789 |   128 |    83 |     6.16 |       7.2 |      8.3% |
|   773 |   145 |    82 |     5.33 |       7.2 |      8.2% |
|   772 |   145 |    83 |     5.32 |       7.2 |      8.3% |``````

Again better, but again far from perfect!

### 4. NN vs RD, train with 5000 x won-draw-lost

Player1 makes Neural Network moves, player2 makes random moves.

Results of 4 runs of 1000 games:

``````+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   901 |    49 |    50 |    18.39 |       7.0 |      5.0% |
|   888 |    58 |    54 |    15.31 |       7.0 |      5.4% |
|   889 |    56 |    55 |    15.88 |       7.0 |      5.5% |
|   898 |    48 |    54 |    18.71 |       7.0 |      5.4% |``````

This looks more like it! Can we do better?

### 5. NN vs RD, train with 20000 x won-draw-lost

Player1 make Neural Network moves, player2 makes random moves.

Results of 4 runs of 1000 games:

``````+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   955 |    21 |    24 |    45.48 |       6.7 |      2.4% |
|   959 |    15 |    26 |    63.93 |       6.7 |      2.6% |
|   962 |    13 |    25 |    74.00 |       6.7 |      2.5% |
|   969 |    12 |    19 |    80.75 |       6.7 |      1.9% |``````

Huge improvement, more (training) data matters! Are the wins of player2 just luck?

### 6. RD vs NN, train with 20000 x won-draw-lost

Now we change the order. Player1 makes random moves and player2 makes Neural Network moves.

Results of 4 runs of 1000 games:

``````+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   145 |   739 |   116 |     0.20 |       6.8 |     11.6% |
|   127 |   761 |   112 |     0.17 |       6.8 |     11.2% |
|   142 |   735 |   123 |     0.19 |       6.8 |     12.3% |
|   142 |   750 |   108 |     0.19 |       6.8 |     10.8% |``````

This seems reasonable. The results for player2 are not as good as player1 above. Also the number of draws increased. Does this have to do with the fact that player1 starts first?

### 7. NN vs NN, train with 20000 x won-draw-lost

Both players make Neural Network moves!

Results of runs of 1000 games:

``````+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   684 |   104 |   212 |     6.58 |       7.2 |     21.2% |
|   672 |   102 |   226 |     6.59 |       7.2 |     22.6% |
|   675 |   112 |   213 |     6.03 |       7.2 |     21.3% |``````

We expect won and lost to be close, but that is not the case. We also expect to see a lot more draws. That looks fine.

We can only conclude: Our black box is not smart enough ... :-(

## The code

Below is the code so you can try yourself. Keras Tuner is used to optimize the hyperparameters.

Important variables:

• tuner_dir
This is the directory created by Keras Tuner.
• must_train
Set True if you want to train the models.
Set False if you want use the trained models.
• won_draw_lost_count
This is about the training data, see above.

Typically you want to train once and then do multiple runs. If you change won_draw_lost_count, make sure you delete the directory tuner_dir first! Increasing won_draw_lost_count will increase training time.

``````# tictactoe.py

# std
import copy
import os
import random
import sys
import time
import operator

# optimizing hyperparameters with keras tuner
from keras.callbacks import EarlyStopping
from keras.layers import Dense, Dropout, LSTM
import keras_tuner as kt
import numpy as np
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split

# get reproducible results
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(2)

# constants: game results
# note(s):
# - the order for player1, higher is better
# - result is float
GAME_RESULT_PLAYER1_WON = 2.
GAME_RESULT_DRAW = 1.
GAME_RESULT_PLAYER2_WON = 0.

# tuner parameters
class TunerParameter:
def __init__(self, name, val):
self.name = name
self.val = val
self.args = (name, val)

class TunerParameters:
def __init__(self, pars=None):
self.t_pars = []
for p in pars:
tpar = TunerParameter(p, p)
self.t_pars.append(tpar)
setattr(self, p, tpar)
def get_pars(self):
return self.t_pars

# subclassed hyperband tuner to add hyperparameters (here, batch_size)
class HyperbandTuner(kt.tuners.Hyperband):
def __init__(self, *args, **kwargs):
self.t_pars = None
if 't_pars' in kwargs:
self.t_pars = kwargs.pop('t_pars')
super(HyperbandTuner, self).__init__(*args, **kwargs)

def run_trial(self, trial, *args, **kwargs):
if self.t_pars is not None:
for tpar in self.t_pars:
kwargs[tpar.name] = trial.hyperparameters.Choice(tpar.name, tpar.val)
return super(HyperbandTuner, self).run_trial(trial, *args, **kwargs)

# tuner class
class Tuner:

def __init__(
self,
input_shape=None,
t_pars=None,
t_dir='tuner_dir',
):
self.input_shape = input_shape
self.t_dir = t_dir
self.t_pars = t_pars

def get_model(self, hp):
# model params: layer_input
model_params_layer_input = dict(
units=hp.Choice(*self.t_pars.layer_input_units.args),
activation='relu',
input_shape=self.input_shape,
)

'''
# model params: layer_hidden_1
model_params_layer_hidden_1 = dict(
units=hp.Choice(*self.t_pars.layer_hidden_1_units.args),
activation='relu',
input_shape=self.input_shape,
)
'''

# model params: compile
model_params_compile = dict(
loss='mean_absolute_error',
metrics=['mean_absolute_error'],
#metrics=['mean_absolute_error', 'val_mean_absolute_error'],
#metrics=['mean_absolute_error', 'val_mae'],
)

# model
model = Sequential()
units=1,
))
# compile
model.compile(**model_params_compile)

return model

def get_model_path(
self,
model_num,
):
return os.path.join(self.t_dir, 'best_model_' + str(model_num))

def tune(
self,
X,
y,
model=None,
model_num=0,
use_early_stopping=False,
search_split_size=0.33,
fit_verbose=0,
plot_loss=False,
):
if model is None:
model = self.get_model

# use subclassed HyperBand tuner
tuner = HyperbandTuner(
model,
objective=kt.Objective('val_mean_absolute_error', direction='min'),
#allow_new_entries=True,
tune_new_entries=True,
hyperband_iterations=2,
max_epochs=260,
directory=self.t_dir,
project_name='model_' + str(model_num),
t_pars=[self.t_pars.batch_size],
)

print('\n{}\ntuner.search_space_summary()\n{}\n'.format('-'*60, '-'*60))
tuner.search_space_summary()

callbacks = []
if use_early_stopping:
callbacks = [EarlyStopping('val_loss', patience=3)]
print('\n{}\ntuner.search()\n{}\n'.format('-'*60, '-'*60))
tuner.search(
X,
y,
validation_split=search_split_size,
shuffle=False,
callbacks=callbacks
)

print('\n{}\ntuner.results_summary()\n{}\n'.format('-'*60, '-'*60))
tuner.results_summary()

print('\n{}\nbest_hps\n{}\n'.format('-'*60, '-'*60))
best_hps = tuner.get_best_hyperparameters()
best_hps_dict = {}
for tpar in self.t_pars.get_pars():
try:
best_val = best_hps[tpar.name]
print('- {} = {}'.format(tpar.name, best_val))
best_hps_dict[tpar.name] = best_val
except:
pass
best_hps_dict['epochs'] = best_hps['tuner/epochs']

# get best model
h_model = tuner.hypermodel.build(best_hps)

print('\n{}\nh_model.summary()\n{}\n'.format('-'*60, '-'*60))
h_model.summary()

# train best model
epochs = 50
print('training ...')
history = h_model.fit(
X,
y,
epochs=epochs,
workers=4,
use_multiprocessing=True,
verbose=fit_verbose,
)

if plot_loss:
fig_title = 'Loss - model[{}]'.format(model_num)
fig = go.Figure()
fig.update_layout(height=500, width=1200, title=fig_title, xaxis_title='Epoch', yaxis_title='Mean Absolute Error')
fig.show()

# save best model
h_model.save(self.get_model_path(model_num))

print('\n{}\nh_model.evaluate()\n{}\n'.format('-'*60, '-'*60))
h_eval_dict = h_model.evaluate(X, y, return_dict=True)
print('h_eval_dict = {}'.format(h_eval_dict))

return dict(
h_model=h_model,
h_eval_dict=h_eval_dict,
best_hps_dict=best_hps_dict,
)

self,
model_num,
):
try:
except:
return None

class DataUtils:

@classmethod
def get_model_boards_and_results(cls, game_results):
# generate datasets
model_boards_and_results = {}
for m in range(9):
model_boards_and_results[m] = []
for game_result in game_results:
result = game_result.result
for i, board in enumerate(game_result.boards):
if i == 0:
# skip board0
continue
# model_boards_data
if i < 4:
# store for all models
for m in range(2, 9):
model_boards_and_results[m].append((board, result))
else:
# store for models[i - 1]
for m in range(i - 1, 9):
model_boards_and_results[m].append((board, result))
return model_boards_and_results

@classmethod
def get_model_data(cls, model_boards_and_results):
model_data = {}
for model_num in model_boards_and_results:
boards_and_results = model_boards_and_results[model_num]
boards = []
results = []
for board, result in boards_and_results:
boards.append(board)
results.append(result)
boards_len = len(boards)
results_len = len(results)
print('model[{}]: boards_len = {}, results_len = {}'.format(model_num, boards_len, results_len))
print('model[{}]:'.format(model_num))
#print('- boards = {}'.format(boards))
#print('- results = {}'.format(results))

if boards_len == 0:
continue

# create data for model
# player1 models: 2, 4, 6, 8
# player2 models: 3, 5, 7
# 2 = player1 won
# 1 = draw
# 0 = player2 won
# this means:
# - player1 looks for the maximum
# - player2 looks for the minimum
dataset = []
for i, board in enumerate(boards):
result = results[i]
flattened_board_and_result = []
for row in board:
print('row = {}'.format(row))
flattened_board_and_result.extend(row)
flattened_board_and_result.append(result)
dataset.append(flattened_board_and_result)
dataset = np.array(dataset, dtype=object)
print('dataset = {}'.format(dataset))
X_in, y_in = cls.split_sequences(dataset, n_steps_in, n_steps_out)
print('X_in = {}'.format(X_in))
print('y_in = {}'.format(y_in))

X_in_shape = X_in.shape
print('X_in_shape = {}'.format(X_in_shape))

model_data[model_num] = dict(
X_in=X_in,
y_in=y_in,
)

return model_data

# split a multivariate sequence into samples
@classmethod
def split_sequences(cls, sequences, n_steps_in, n_steps_out):
X, y = list(), list()
for i in range(len(sequences)):
# find the end of this pattern
end_ix = i + n_steps_in
out_end_ix = end_ix + n_steps_out-1
# check if we are beyond the dataset
if out_end_ix > len(sequences):
break
# gather input and output parts of the pattern
seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
X.append(seq_x)
y.append(seq_y)
if n_steps_out == 1:
y1 = []
for v in y:
y1.append(v)
y = y1
return np.array(X), np.array(y)

@classmethod
def split_datasets(cls, X, y):
return train_test_split(X, y, test_size=0.33, shuffle=False, random_state=42)

class GameResult:
def __init__(
self,
boards=None,
moves=None,
result=None,
):
self.boards = boards
self.moves = moves
self.result = result

class GamePlay:

def __init__(
self,
n_steps_in=None,
n_steps_out=None,
n_features=None,
):
self.n_steps_in = n_steps_in
self.n_steps_out = n_steps_out
self.n_features = n_features
self.player_value2displays = {0: ' ', 1: 'X', 2: 'O'}

def get_empty_board(self):
return [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

def show_boards(self, boards, show_board_num=True, horizontal=True):
boards_strs = []
for i, board in enumerate(boards):
strs = self.show_board(board, board_num=i, return_strs=horizontal)
boards_strs.append(strs)
if horizontal:
for i in range(len(strs)):
line_parts = []
for b in boards_strs:
line_parts.append(b[i])
print('{}'.format('  '.join(line_parts)))

def show_board(self, board, board_num=None, return_strs=False):
strs = []
if board_num is not None:
strs.append('board: {:<6}'.format(board_num))
ds = '+---+---+---+'
strs.append('{}'.format(ds))
for r in range(3):
row_items = []
for c in range(3):
row_items.append(self.player_value2displays[board[r][c]])
dr = '| {} |'.format(' | '.join(row_items))
strs.append('{}'.format(dr))
strs.append('{}'.format(ds))
if return_strs:
return strs
for s in strs:
print(s)

def get_available_moves(self, board):
moves = []
for r in range(3):
for c in range(3):
if board[r][c] == 0:
moves.append([r, c])
return moves

def player_won(self, player, board):
# row, col win
for r in range(3):
if board[r] == player and board[r] == player and board[r] == player:
return True
for c in range(3):
if board[c] == player and board[c] == player and board[c] == player:
return True
# diag win
if board == player and board == player and board == player:
return True
if board == player and board == player and board == player:
return True
return False

def get_other_player(self, player):
return 2 if player == 1 else 1

def get_inverted_board(self, board):
board = copy.deepcopy(board)
inverted_board = self.get_empty_board()
for r in range(3):
for c in range(3):
if board[r][c] == 1:
inverted_board[r][c] = 2
elif board[r][c] == 2:
inverted_board[r][c] = 1
return inverted_board

def get_deeplearning_move(self, h_model, player, board, dbg=False):
# get moves
available_moves = self.get_available_moves(board)
if len(available_moves) == 1:
return available_moves
board = copy.deepcopy(board)

move_vs = []
X_pred = []
for i, move in enumerate(available_moves):
next_board = copy.deepcopy(board)
self.make_move(player, next_board, move)
# create X_pred
x_pred = [
self.board2vector(board),
self.board2vector(next_board),
]
if dbg: print('x_pred = {}'.format(x_pred))
X_pred.append(x_pred)
y = [0, move, move]
move_vs.append(y)

X_pred_len = len(X_pred)
if X_pred_len > 0:
if dbg: print('X_pred = {}'.format(X_pred))
X_pred = np.array(X_pred).reshape(X_pred_len, self.n_steps_in, self.n_features)
predictions = h_model.predict(X_pred)
if dbg: print('for X_Pred = {}, predictions = {}'.format(X_pred, predictions))
for i, y in enumerate(predictions):
move_vs[i] = y
if dbg: print('move_vs = {}'.format(move_vs))
# get maximum or minimum
if player == 1:
reverse = True
else:
reverse = False
sorted_move_vs = move_vs.sort(key=operator.itemgetter(0), reverse=reverse)
sorted_move_vs = move_vs
if dbg: print('sorted_move_vs = {}'.format(sorted_move_vs))
#sys.exit()
move = [sorted_move_vs, sorted_move_vs]
if dbg: print('player = {}, y = {} -> selected move = {}'.format(player, y, move))
#sys.exit()
return move

# random move
if dbg: print('player = {}, dont know, move = {}'.format(player, move))
move = random.choice(available_moves)
return move

def get_random_move(self, board):
moves = self.get_available_moves(board)
return random.choice(moves)

def make_move(self, player, board, move):
board[move][move] = player
return board

def play_game(self, player1_type=0, player2_type=0, h_models=[], show_game=False):
# player_type: 0 = random, 1 = dl
# first board is empty
board = self.get_empty_board()
boards = [board]
moves = []
player = 1
i = 0
while True:
if player == 1:
if player1_type == 0 or len(moves) < 2:
print('rd move: i = {}, player = {}'.format(i, player))
move = self.get_random_move(board)
else:
print('nn move: i = {}, player = {}'.format(i, player))
move = self.get_deeplearning_move(h_models[i], player, board)
else:
if player2_type == 0 or len(boards) <= 2:
print('rd move: i = {}, player = {}'.format(i, player))
move = self.get_random_move(board)
else:
print('nn move: i = {}, player = {}'.format(i, player))
move = self.get_deeplearning_move(h_models[i], player, board)

# make the move
self.make_move(player, board, move)

if show_game:
print('player = {}, made move = {}'.format(player, move))
self.show_board(board)

# store for players
boards.append(copy.deepcopy(board))
moves.append(copy.deepcopy(move))

# check won, lost, draw
if self.player_won(player, board):
# player won
if player == 1:
result = GAME_RESULT_PLAYER1_WON
else:
result = GAME_RESULT_PLAYER2_WON
return GameResult(
boards=boards,
moves=moves,
result=result,
)

other_player = self.get_other_player(player)
if self.player_won(other_player, board):
# other_player won
if other_player == 1:
result = GAME_RESULT_PLAYER1_WON
else:
result = GAME_RESULT_PLAYER2_WON
return GameResult(
boards=boards,
moves=moves,
result=result,
)

if len(self.get_available_moves(board)) == 0:
# draw
return GameResult(
boards=boards,
moves=moves,
result=GAME_RESULT_DRAW,
)

# change player
player = other_player
i += 1

def play_games(self, n_games, player1_type=0, player2_type=0, h_models=None, show_games=False):
player_wins = [0, 0]
draws = 0
total_moves = 0
result_to_text = {
0: 'lost',
1: 'draw',
2: 'won',
}
for i in range(n_games):
print('playing game {} of {} ...'.format(i, n_games))
game_result = self.play_game(
player1_type=player1_type,
player2_type=player2_type,
h_models=h_models,
show_game=False,
)
self.show_boards(game_result.boards)
total_moves += len(game_result.moves)
last_move = game_result.moves[-1]
if game_result.result == GAME_RESULT_DRAW:
draws += 1
elif game_result.result == GAME_RESULT_PLAYER1_WON:
player_wins += 1
elif game_result.result == GAME_RESULT_PLAYER2_WON:
player_wins += 1

if player_wins == 0:
won_div_lost_fmt = '{:<8}'.format('')
else:
won_div_lost = player_wins/player_wins
won_div_lost_fmt = '{:8.2f}'.format(won_div_lost)
moves_avg = total_moves/n_games
print('summary for player1 after {} games\n--------'.format(n_games))
sep_line = '+-{}-+-{}-+-{}-+-{}-+-{}-+-{}-+'.format('-'*5, '-'*5, '-'*5, '-'*8, '-'*9, '-'*9)
print(sep_line)
print('| {:<5} | {:<5} | {:<5} | {:<8} | {:<9} | {:<9} |'.format('won', 'lost', 'draw', 'won/lost', 'avg_moves', 'perc draw'))
print(sep_line)
print('| {:5d} | {:5d} | {:5d} | {} | {:9.1f} | {:8.1f}% |'.format(player_wins, player_wins, draws, won_div_lost_fmt, moves_avg, (100*draws)/n_games))
print('')
print('')

def board2vector(self, board):
b = []
for r in range(3):
for c in range(3):
b.append(board[r][c])
return b

def get_game_result_signature(self, game):
sig = ''
for board in game.boards:
for r in range(3):
for c in range(3):
sig += str(board[r][c])
for move in game.moves:
sig += str(move) + str(move)
sig += str(game.result)
return sig

def generate_datasets(self, n_games):
# game_result signatures to prevent duplicate data
sigs = []
won_count = 0
lost_count = 0
draw_count = 0
# generate game data
game_results = []
game_count = 0
while won_count < n_games or lost_count < n_games or draw_count < n_games:
if game_count > 255000:
break

print('playing game {}'.format(game_count))
game_count += 1
game_result = self.play_game()
boards_len = len(game_result.boards)
moves_len = len(game_result.moves)
result = game_result.result

print('boards_len = {}, moves_len = {}, result = {}'.format(boards_len, moves_len, result))
if boards_len != (moves_len + 1):
print('boards_len != moves_len:')
print('game_result.boards = {}'.format(game_result.boards))
print('game_result.moves = {}'.format(game_result.moves))
sys.exit()

# skip identical games
sig = self.get_game_result_signature(game_result)
if sig in sigs:
continue
# distribute won, draw, lost
if result == GAME_RESULT_DRAW:
if draw_count >= n_games:
continue
draw_count += 1
elif result == GAME_RESULT_PLAYER1_WON:
if won_count >= n_games:
continue
won_count += 1
elif result == GAME_RESULT_PLAYER2_WON:
if lost_count >= n_games:
continue
lost_count += 1

sigs.append(sig)
game_results.append(game_result)

print('game_count = {}'.format(game_count))
print('len(sigs) = {}'.format(len(sigs)))
print('won_count = {}, lost_count = {}, draw_count = {}'.format(won_count, lost_count, draw_count))
input('Press ENTER to continue ...')
return game_results

# start

# hyperparameters name and value, are the inputs for hp.Choice()
t_pars = TunerParameters(
pars=[
('layer_input_units', [64, 128]),
#('layer_hidden_1_units', [64, 128]),
('batch_size', [8, 32]),
],
)

# lstm
n_steps_in = 2
n_steps_out = 1
n_features = 9

# other
tuner_dir = 'tuner_dir_tictactoe'
won_draw_lost_count = 5000
#won_draw_lost_count = 20000

# train or use
must_train = True
#must_train = False

game_play = GamePlay(
n_steps_in=n_steps_in,
n_steps_out=n_steps_out,
n_features=n_features,
)

h_models = [None]*9
if must_train:
game_results = game_play.generate_datasets(won_draw_lost_count)
model_boards_and_results = DataUtils.get_model_boards_and_results(game_results)
model_data = DataUtils.get_model_data(model_boards_and_results)

model_data_lens = [None]*9
print('model_data:')
for model_num in model_data:
data = model_data[model_num]
print('model_num = {}'.format(model_num))
X_in = data['X_in']
y_in = data['y_in']
print('- X_in = {}'.format(X_in))
print('- y_in = {}'.format(y_in))
model_data_lens[model_num] = len(X_in)
#continue

# split into train and test
X_train, X_test, y_train, y_test = DataUtils.split_datasets(X_in, y_in)

# convert
X_in = np.asarray(X_in).astype(int)
y_in = np.asarray(y_in).astype(int)
X_train = np.asarray(X_train).astype(int)
y_train = np.asarray(y_train).astype(int)
print('type(X_train) = {}, len(X_train) = {}, X_train = {}'.format(type(X_train), len(X_train), X_train))
print('type(y_train) = {}, len(y_train) = {}, y_train = {}'.format(type(y_train), len(y_train), y_train))

n_features = X_train.shape
print('n_features = {}'.format(n_features))

tuner = Tuner(
input_shape=(n_steps_in, n_features),
t_pars=t_pars,
t_dir=tuner_dir,
)

tune_result = tuner.tune(
X_train,
y_train,
model_num=model_num,
#plot_loss=False,
plot_loss=True,
)

h_model = tune_result['h_model']

# add to our list of models
h_models[model_num] = h_model

h_eval_dict = tune_result['h_eval_dict']
print('h_model = {}'.format(h_model))
print('h_eval_dict = {}'.format(h_eval_dict))

# debug: show a prediction for the h_model
X = [
[
[ [0, 0, 0], [1, 0, 0], [0, 0, 0] ],
[ [0, 0, 0], [1, 2, 0], [0, 0, 0] ],
],
[
[ [0, 0, 0], [0, 1, 0], [0, 0, 0] ],
[ [0, 0, 0], [0, 1, 2], [0, 0, 0] ],
]
]
X_pred = np.array(X).reshape(2, n_steps_in, n_features)
predictions = h_model.predict(X_pred)
print('h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))

# debug: show a prediction for the previously saved h_model
print('loaded_h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))

# here we have trained and saved the models
print('model_data_lens = {}'.format(model_data_lens))
input('Press ENTER to continue ...')

if h_models is None:
h_models = [None]*9
for model_num in range(9):
tuner = Tuner(
input_shape=(n_steps_in, n_features),
t_pars=t_pars,
t_dir=tuner_dir,
)
continue

'''
# debug: show a prediction for the previously saved h_model
X = [
[
[ [0, 0, 0], [1, 0, 0], [0, 0, 0] ],
[ [0, 0, 0], [1, 2, 0], [0, 0, 0] ],
],
[
[ [0, 0, 0], [0, 1, 0], [0, 0, 0] ],
[ [0, 0, 0], [0, 1, 2], [0, 0, 0] ],
]
]
X_pred = np.array(X).reshape(2, n_steps_in, n_features)
print('loaded_h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))
sys.exit()
'''

# debug: show models
for model_num in range(9):
print('h_models[{}] = {}'.format(model_num, h_models[model_num]))

print('play some games')

# play games and see results, player[1|2]_type: 0 = RD (random move), 1 = NN (neural network move)
game_play.play_games(
1000,
player1_type=1,
player2_type=1,
h_models=h_models,
show_games=True,
)``````

## Improving performance

We can try to improve performance in the following ways:

• add more hyperparameters to the models like learning rate
• add extra layers to the models
• skip early moves reducing training data by

It would be nice to implement statistics that also show how many games completed in 5, 6, 7, 8, or 9 moves.

## Summary

Using Deep Learning this way is not the solution to solve this game. I knew this when I started, but I still wanted to try this first before looking into Reinforcement Learning. There are a lot of possible improvements possible and I will certainly try some in future. Using Keras Tuner to tune the hyperparameters again made my life more easy. This was a nice project but also a lot of work. Learned a lot though.

How many tic tac toe games are there?
https://tictactoebeast.com/post/how-many-tic-tac-toe-games-are-there

How many Tic-Tac-Toe (noughts and crosses) games are possible?
http://www.se16.info/hgb/tictactoe.htm

How to Develop LSTM Models for Time Series Forecasting
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting

Part 4 — Neural Network Q Learning, a Tic-Tac-Toe player that learns — kind of
https://medium.com/@carsten.friedrich/part-4-neural-network-q-learning-a-tic-tac-toe-player-that-learns-kind-of-2090ca4798d

Tic-Tac-Toe and Reinforcement Learning
https://medium.com/swlh/tic-tac-toe-and-deep-neural-networks-ea600bc53f51

Tic-Tac-Toe and Reinforcement Learning
https://medium.com/swlh/tic-tac-toe-and-deep-neural-networks-ea600bc53f51