An attempt to solve Tic-Tac-Toe using Keras and LSTM
Deep Learning is not the way to solve Tic-Tac-Toe but it certainly is a nice educational experience.

After implementing my first Deep Learning LSTM model for a project, I wondered whether Deep Learning could also solve a game. The first game that comes to mind is Tic-Tac-Toe. Then you search the internet, and it appears that many, many people have had the same idea. Of course.
Below I present my solution for playing Tic-Tac-Toe using Keras and LSTM (Long Short Term Memory). This is a Deep Learning solution, not a Reinforcement Learning solution; that is a totally different subject.
The solution does not use a single model, but a model for every move. Why? Because I wanted to avoid the training data for an early move getting 'polluted' with training data for future moves.
About the Tic-Tac-Toe game
There are 255,168 ways to play this game. Of these games, 131,184 are won by the first player, 77,904 by the second player, and 46,080 end in a draw.
A board consists of fields whose content is empty, X, or O. The total number of possible boards is 3^9 = 19,683.
What we know about the Tic-Tac-Toe game
Our Deep Learning black box knows the following about the game:
- There are two players
- It is played on 9 fields, let's call it a board
- The players take turns making a move
- Making a move means assigning an unused field to a player
- This means we know which fields are available when making a move
- After some moves someone tells us the game is over and who won, who lost, or whether it was a draw
That's all we know, not really much. Can we use Deep Learning to play winning games?
Some questions and answers
- How do we represent the board?
- How do we represent a move?
- Should we use regression or classification, and how do we use it?
- Should we use MLP (Multilayer Perceptron) or LSTM (Long Short Term Memory) or something else?
- Should every move have its own model?
- Should both players have their own model?
How do we represent the board?
The board is a vector of nine numbers, one for each field. The number is 0 when the field can still be used by player1 or player2, 1 when taken by player1, and 2 when taken by player2.
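As a minimal sketch (the helper name is mine, it is not part of the code below), flattening a 3x3 board into this vector of nine numbers could look like this:

def board_to_vector(board):
    # 0 = free, 1 = taken by player1, 2 = taken by player2
    return [board[r][c] for r in range(3) for c in range(3)]

# player1 took the top-left field, player2 took the center field
board = [
    [1, 0, 0],
    [0, 2, 0],
    [0, 0, 0],
]
print(board_to_vector(board))  # [1, 0, 0, 0, 2, 0, 0, 0, 0]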
How do we represent a move?
This is not really important; let's use a list with a row and a column position.
Regression or classification and how to use it?
With regression we can try to predict a value for the next move.
With classification we can try to predict a vector of best moves.
I choose regression here. We have two players. The values for player1:
- 2: won
- 1: draw
- 0: lost
The prediction will be a value between 0 and 2. If it is our turn, we predict the value for all available moves. Then we select the best available value, which for player1 is the maximum. This gives our move. For player2, the best available value is the minimum.
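A minimal sketch of this selection rule, where predict_value is a stand-in for the trained regression model and not a function from the code below:

def select_move(player, board, available_moves, predict_value):
    # predict_value(board, move) returns a value between 0 (player1 lost) and 2 (player1 won)
    scored_moves = [(predict_value(board, move), move) for move in available_moves]
    if player == 1:
        # player1 wants the maximum predicted value
        return max(scored_moves, key=lambda s: s[0])[1]
    # player2 wants the minimum predicted value
    return min(scored_moves, key=lambda s: s[0])[1]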
MLP (Multilayer Perceptron) or LSTM (Long Short Term Memory) or something else?
I don't know, let's try LSTM.
Every move has its own model?
I think this is important. If we have just one model, then the data after step N is also included. That seems totally wrong.
Should both players have their own model?
I think so. Player1 is always one move ahead, player2 is always one move behind.
Training data
We are using regression, see above, to generate a value for every possible move at a certain moment. Player1 selects the move with the maximum value; the opponent, player2, selects the move with the minimum value.
The board is a vector of nine fields. A field can be empty (0), X for player1 (1), or O for player2 (2). A game consists of multiple board vectors, and we assign the outcome of the game to all vectors of that game.
For example, the data for a game looks like:
[1, 0, 0, 0, 0, 0, 0, 0, 0] -> 2 # player1 makes a move
[1, 0, 0, 0, 0, 0, 0, 2, 0] -> 2 # player2
[1, 1, 0, 0, 0, 0, 0, 2, 0] -> 2 # player1
[1, 1, 0, 0, 0, 0, 0, 2, 2] -> 2 # player2
[1, 1, 1, 0, 0, 0, 0, 2, 2] -> 2 # player1 won
For LSTM background, please see 'How to Develop LSTM Models for Time Series Forecasting' in the links below. The model is a multivariate LSTM using n_steps_in=2 and n_steps_out=1.
To generate the training data we play the game many times.
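With n_steps_in=2 and n_steps_out=1, every training sample is a pair of consecutive board vectors with the final outcome of the game as the target. A minimal sketch of that idea (a simplification of the split_sequences code below):

import numpy as np

def game_to_samples(boards, result, n_steps_in=2):
    # boards: the consecutive board vectors of one game, result: the final outcome (0, 1 or 2)
    X, y = [], []
    for i in range(len(boards) - n_steps_in + 1):
        X.append(boards[i:i + n_steps_in])  # two consecutive boards
        y.append(result)                    # target: the outcome of the whole game
    return np.array(X), np.array(y)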
Data from unique games only
To train our model we generate data by playing a number of games with random moves. This means we can have many duplicate games in our dataset; in other words, the dataset is polluted. At the moment I make sure that there are no duplicate games, by using a signature of the boards and the final result.
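A minimal sketch of this deduplication, using only the board vectors and the result as the signature (the actual code below also includes the moves):

def game_signature(boards, result):
    # concatenate all field values of all boards plus the final result into one string
    return ''.join(str(v) for board in boards for v in board) + str(result)

def filter_unique_games(games):
    # games: list of (boards, result) tuples produced by random play
    seen, unique = set(), []
    for boards, result in games:
        sig = game_signature(boards, result)
        if sig not in seen:
            seen.add(sig)
            unique.append((boards, result))
    return unique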
A model for every move
If we use a single model then the training data becomes 'polluted' with future data. When it is our move, we want to know the next best move; everything after that step pollutes our existing data. That is why I use multiple models, one for every step.
move[n] model[n] with data up to step[n+1]
move[n+1] model[n+1] with data up to step[n+2]
etc.
For player 1 this means that at move[n] we use model[n] that has been trained with data up to move n+1. This way we can select the best available move to make.
Because the parameter n_steps_in=2, we have no model for the first move. Instead, the first move is a random move, for both players.
We also do not use a model when there is only one move left.
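A minimal sketch of this per-move bucketing, assuming each game is a list of board vectors plus the final result (the actual get_model_boards_and_results code below handles a few extra details):

def bucket_per_move(games, n_models=9):
    # one training set per move: the model for move n only sees boards up to move n+1
    buckets = {m: [] for m in range(n_models)}
    for boards, result in games:
        for m in range(n_models):
            for board in boards[:m + 2]:  # boards up to and including move m+1
                buckets[m].append((board, result))
    return buckets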
Performance
Okay, I implemented this. How does it perform? The training results show that for all models the mean_absolute_error is reduced from around 0.8 to 0.6. Not very good.
In the results below there are two players. A player can be:
- RD: makes random moves
- NN: makes Neural Network moves
The first player is player1, the second is player2. Training data is generated as described above, until we have for example 1000 x won-draw-lost: 1000 games won, 1000 games lost, and 1000 draws.
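A minimal sketch of this balanced data generation, where play_random_game is a stand-in for a function that plays one game with random moves and returns an object with a result attribute:

GAME_RESULT_PLAYER1_WON, GAME_RESULT_DRAW, GAME_RESULT_PLAYER2_WON = 2., 1., 0.

def generate_balanced_games(play_random_game, n):
    counts = {GAME_RESULT_PLAYER1_WON: 0, GAME_RESULT_DRAW: 0, GAME_RESULT_PLAYER2_WON: 0}
    games = []
    while min(counts.values()) < n:
        game = play_random_game()
        if counts[game.result] < n:
            # only keep the game if we still need this outcome
            counts[game.result] += 1
            games.append(game)
    return games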
1. RD vs RD
Both players make random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 578 | 293 | 129 | 1.97 | 7.7 | 12.9% |
| 593 | 274 | 133 | 2.16 | 7.7 | 13.3% |
| 590 | 276 | 134 | 2.14 | 7.6 | 13.4% |
| 589 | 296 | 115 | 1.99 | 7.6 | 11.5% |
This does not look unreasonable. Player1 starts the game and thus has an advantage over player2.
2. NN vs RD, train with 1000 x won-draw-lost
Player1 makes Neural Network moves, player2 makes random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 706 | 192 | 102 | 3.68 | 7.4 | 10.2% |
| 690 | 203 | 107 | 3.40 | 7.3 | 10.7% |
| 721 | 172 | 107 | 4.19 | 7.4 | 10.7% |
| 726 | 168 | 106 | 4.32 | 7.4 | 10.6% |
Better than random moves but far from perfect!
3. NN vs RD, train with 2000 x won-draw-lost
Player1 makes Neural Network moves, player2 makes random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 801 | 124 | 75 | 6.46 | 7.3 | 7.5% |
| 789 | 128 | 83 | 6.16 | 7.2 | 8.3% |
| 773 | 145 | 82 | 5.33 | 7.2 | 8.2% |
| 772 | 145 | 83 | 5.32 | 7.2 | 8.3% |
Again better, but again far from perfect!
4. NN vs RD, train with 5000 x won-draw-lost
Player1 makes Neural Network moves, player2 makes random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 901 | 49 | 50 | 18.39 | 7.0 | 5.0% |
| 888 | 58 | 54 | 15.31 | 7.0 | 5.4% |
| 889 | 56 | 55 | 15.88 | 7.0 | 5.5% |
| 898 | 48 | 54 | 18.71 | 7.0 | 5.4% |
This looks more like it! Can we do better?
5. NN vs RD, train with 20000 x won-draw-lost
Player1 makes Neural Network moves, player2 makes random moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 955 | 21 | 24 | 45.48 | 6.7 | 2.4% |
| 959 | 15 | 26 | 63.93 | 6.7 | 2.6% |
| 962 | 13 | 25 | 74.00 | 6.7 | 2.5% |
| 969 | 12 | 19 | 80.75 | 6.7 | 1.9% |
Huge improvement, more (training) data matters! Are the wins of player2 just luck?
6. RD vs NN, train with 20000 x won-draw-lost
Now we change the order. Player1 makes random moves and player2 makes Neural Network moves.
Results of 4 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 145 | 739 | 116 | 0.20 | 6.8 | 11.6% |
| 127 | 761 | 112 | 0.17 | 6.8 | 11.2% |
| 142 | 735 | 123 | 0.19 | 6.8 | 12.3% |
| 142 | 750 | 108 | 0.19 | 6.8 | 10.8% |
This seems reasonable. The results for player2 are not as good as those for player1 above. Also, the number of draws increased. Does this have to do with the fact that player1 moves first?
7. NN vs NN, train with 20000 x won-draw-lost
Both players make Neural Network moves!
Results of 3 runs of 1000 games:
+-------+-------+-------+----------+-----------+-----------+
| won | lost | draw | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
| 684 | 104 | 212 | 6.58 | 7.2 | 21.2% |
| 672 | 102 | 226 | 6.59 | 7.2 | 22.6% |
| 675 | 112 | 213 | 6.03 | 7.2 | 21.3% |
We expect won and lost to be close, but that is not the case. We also expect to see a lot more draws, and that part looks fine.
We can only conclude: Our black box is not smart enough ... :-(
The code
Below is the code so you can try it yourself. Keras Tuner is used to optimize the hyperparameters.
Important variables:
- tuner_dir
This is the directory created by Keras Tuner.
- must_train
Set True if you want to train the models. Set False if you want to use the trained models.
- won_draw_lost_count
This controls the amount of training data, see above.
Typically you want to train once and then do multiple runs. If you change won_draw_lost_count, make sure you delete the tuner_dir directory first! Increasing won_draw_lost_count will increase training time.
# tictactoe.py
# std
import copy
import os
import random
import sys
import time
import operator
# optimizing hyperparameters with keras tuner
from keras.callbacks import EarlyStopping
from keras.models import load_model, Sequential
from keras.layers import Dense, Dropout, LSTM
import keras_tuner as kt
import numpy as np
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam
# get reproducible results
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(2)
# constants: game results
# note(s):
# - the order for player1, higher is better
# - result is float
GAME_RESULT_PLAYER1_WON = 2.
GAME_RESULT_DRAW = 1.
GAME_RESULT_PLAYER2_WON = 0.
# tuner parameters
class TunerParameter:
def __init__(self, name, val):
self.name = name
self.val = val
self.args = (name, val)
class TunerParameters:
def __init__(self, pars=None):
self.t_pars = []
for p in pars:
tpar = TunerParameter(p[0], p[1])
self.t_pars.append(tpar)
setattr(self, p[0], tpar)
def get_pars(self):
return self.t_pars
# subclassed hyperband tuner to add hyperparameters (here, batch_size)
class HyperbandTuner(kt.tuners.Hyperband):
def __init__(self, *args, **kwargs):
self.t_pars = None
if 't_pars' in kwargs:
self.t_pars = kwargs.pop('t_pars')
super(HyperbandTuner, self).__init__(*args, **kwargs)
def run_trial(self, trial, *args, **kwargs):
if self.t_pars is not None:
for tpar in self.t_pars:
kwargs[tpar.name] = trial.hyperparameters.Choice(tpar.name, tpar.val)
return super(HyperbandTuner, self).run_trial(trial, *args, **kwargs)
# tuner class
class Tuner:
def __init__(
self,
input_shape=None,
t_pars=None,
t_dir='tuner_dir',
):
self.input_shape = input_shape
self.t_dir = t_dir
self.t_pars = t_pars
def get_model(self, hp):
# model params: layer_input
model_params_layer_input = dict(
units=hp.Choice(*self.t_pars.layer_input_units.args),
activation='relu',
input_shape=self.input_shape,
)
'''
# model params: layer_hidden_1
model_params_layer_hidden_1 = dict(
units=hp.Choice(*self.t_pars.layer_hidden_1_units.args),
activation='relu',
input_shape=self.input_shape,
)
'''
# model params: compile
model_params_compile = dict(
#optimizer=Adam(learning_rate=hp.Choice(*self.t_pars.learning_rate.args)),
optimizer='adam',
loss='mean_absolute_error',
metrics=['mean_absolute_error'],
#metrics=['mean_absolute_error', 'val_mean_absolute_error'],
#metrics=['mean_absolute_error', 'val_mae'],
)
# model
model = Sequential()
model.add(LSTM(**model_params_layer_input))
model.add(Dense(
units=1,
))
# compile
model.compile(**model_params_compile)
return model
def get_model_path(
self,
model_num,
):
return os.path.join(self.t_dir, 'best_model_' + str(model_num))
def tune(
self,
X,
y,
model=None,
model_num=0,
use_early_stopping=False,
search_split_size=0.33,
fit_verbose=0,
plot_loss=False,
):
if model is None:
model = self.get_model
# use subclassed HyperBand tuner
tuner = HyperbandTuner(
model,
objective=kt.Objective('val_mean_absolute_error', direction='min'),
#allow_new_entries=True,
tune_new_entries=True,
hyperband_iterations=2,
max_epochs=260,
directory=self.t_dir,
project_name='model_' + str(model_num),
t_pars=[self.t_pars.batch_size],
)
print('\n{}\ntuner.search_space_summary()\n{}\n'.format('-'*60, '-'*60))
tuner.search_space_summary()
callbacks = []
if use_early_stopping:
callbacks = [EarlyStopping('val_loss', patience=3)]
print('\n{}\ntuner.search()\n{}\n'.format('-'*60, '-'*60))
tuner.search(
X,
y,
validation_split=search_split_size,
shuffle=False,
callbacks=callbacks
)
print('\n{}\ntuner.results_summary()\n{}\n'.format('-'*60, '-'*60))
tuner.results_summary()
print('\n{}\nbest_hps\n{}\n'.format('-'*60, '-'*60))
best_hps = tuner.get_best_hyperparameters()[0]
best_hps_dict = {}
for tpar in self.t_pars.get_pars():
try:
best_val = best_hps[tpar.name]
print('- {} = {}'.format(tpar.name, best_val))
best_hps_dict[tpar.name] = best_val
except:
pass
# add tuner epochs
best_hps_dict['epochs'] = best_hps['tuner/epochs']
# get best model
h_model = tuner.hypermodel.build(best_hps)
print('\n{}\nh_model.summary()\n{}\n'.format('-'*60, '-'*60))
h_model.summary()
# train best model
epochs = 50
print('training ...')
history = h_model.fit(
X,
y,
epochs=epochs,
workers=4,
use_multiprocessing=True,
verbose=fit_verbose,
)
if plot_loss:
fig_title = 'Loss - model[{}]'.format(model_num)
fig = go.Figure()
fig.add_trace(go.Scattergl(y=history.history['mean_absolute_error'], name='Train'))
##fig.add_trace(go.Scattergl(y=history.history['val_mean_absolute_error'], name='Valid'))
#fig.add_trace(go.Scattergl(y=history.history['loss'], name='Loss'))
fig.update_layout(height=500, width=1200, title=fig_title, xaxis_title='Epoch', yaxis_title='Mean Absolute Error')
fig.show()
# save best model
h_model.save(self.get_model_path(model_num))
print('\n{}\nh_model.evaluate()\n{}\n'.format('-'*60, '-'*60))
h_eval_dict = h_model.evaluate(X, y, return_dict=True)
print('h_eval_dict = {}'.format(h_eval_dict))
return dict(
h_model=h_model,
h_eval_dict=h_eval_dict,
best_hps_dict=best_hps_dict,
)
def load_model(
self,
model_num,
):
try:
return load_model(self.get_model_path(model_num))
except:
return None
class DataUtils:
@classmethod
def get_model_boards_and_results(cls, game_results):
# generate datasets
model_boards_and_results = {}
for m in range(9):
model_boards_and_results[m] = []
for game_result in game_results:
result = game_result.result
for i, board in enumerate(game_result.boards):
if i == 0:
# skip board0
continue
# model_boards_data
if i < 4:
# store for all models
for m in range(2, 9):
model_boards_and_results[m].append((board, result))
else:
# store for models[i - 1]
for m in range(i - 1, 9):
model_boards_and_results[m].append((board, result))
return model_boards_and_results
@classmethod
def get_model_data(cls, model_boards_and_results):
model_data = {}
for model_num in model_boards_and_results:
boards_and_results = model_boards_and_results[model_num]
boards = []
results = []
for board, result in boards_and_results:
boards.append(board)
results.append(result)
boards_len = len(boards)
results_len = len(results)
print('model[{}]: boards_len = {}, results_len = {}'.format(model_num, boards_len, results_len))
print('model[{}]:'.format(model_num))
#print('- boards = {}'.format(boards))
#print('- results = {}'.format(results))
if boards_len == 0:
continue
# create data for model
# player1 models: 2, 4, 6, 8
# player2 models: 3, 5, 7
# but we don't need to care about that here
# 2 = player1 won
# 1 = draw
# 0 = player2 won
# this means:
# - player1 looks for the maximum
# - player2 looks for the minimum
dataset = []
for i, board in enumerate(boards):
result = results[i]
flattened_board_and_result = []
for row in board:
print('row = {}'.format(row))
flattened_board_and_result.extend(row)
flattened_board_and_result.append(result)
dataset.append(flattened_board_and_result)
dataset = np.array(dataset, dtype=object)
print('dataset = {}'.format(dataset))
X_in, y_in = cls.split_sequences(dataset, n_steps_in, n_steps_out)
print('X_in = {}'.format(X_in))
print('y_in = {}'.format(y_in))
X_in_shape = X_in.shape
print('X_in_shape = {}'.format(X_in_shape))
model_data[model_num] = dict(
X_in=X_in,
y_in=y_in,
)
return model_data
# split a multivariate sequence into samples
@classmethod
def split_sequences(cls, sequences, n_steps_in, n_steps_out):
X, y = list(), list()
for i in range(len(sequences)):
# find the end of this pattern
end_ix = i + n_steps_in
out_end_ix = end_ix + n_steps_out-1
# check if we are beyond the dataset
if out_end_ix > len(sequences):
break
# gather input and output parts of the pattern
seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
X.append(seq_x)
y.append(seq_y)
if n_steps_out == 1:
y1 = []
for v in y:
y1.append(v[0])
y = y1
return np.array(X), np.array(y)
@classmethod
def split_datasets(cls, X, y):
return train_test_split(X, y, test_size=0.33, shuffle=False, random_state=42)
class GameResult:
def __init__(
self,
boards=None,
moves=None,
result=None,
):
self.boards = boards
self.moves = moves
self.result = result
class GamePlay:
def __init__(
self,
n_steps_in=None,
n_steps_out=None,
n_features=None,
):
self.n_steps_in = n_steps_in
self.n_steps_out = n_steps_out
self.n_features = n_features
self.player_value2displays = {0: ' ', 1: 'X', 2: 'O'}
def get_empty_board(self):
return [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
def show_boards(self, boards, show_board_num=True, horizontal=True):
boards_strs = []
for i, board in enumerate(boards):
strs = self.show_board(board, board_num=i, return_strs=horizontal)
boards_strs.append(strs)
if horizontal:
for i in range(len(strs)):
line_parts = []
for b in boards_strs:
line_parts.append(b[i])
print('{}'.format(' '.join(line_parts)))
def show_board(self, board, board_num=None, return_strs=False):
strs = []
if board_num is not None:
strs.append('board: {:<6}'.format(board_num))
ds = '+---+---+---+'
strs.append('{}'.format(ds))
for r in range(3):
row_items = []
for c in range(3):
row_items.append(self.player_value2displays[board[r][c]])
dr = '| {} |'.format(' | '.join(row_items))
strs.append('{}'.format(dr))
strs.append('{}'.format(ds))
if return_strs:
return strs
for s in strs:
print(s)
def get_available_moves(self, board):
moves = []
for r in range(3):
for c in range(3):
if board[r][c] == 0:
moves.append([r, c])
return moves
def player_won(self, player, board):
# row, col win
for r in range(3):
if board[r][0] == player and board[r][1] == player and board[r][2] == player:
return True
for c in range(3):
if board[0][c] == player and board[1][c] == player and board[2][c] == player:
return True
# diag win
if board[0][0] == player and board[1][1] == player and board[2][2] == player:
return True
if board[2][0] == player and board[1][1] == player and board[0][2] == player:
return True
return False
def get_other_player(self, player):
return 2 if player == 1 else 1
def get_inverted_board(self, board):
board = copy.deepcopy(board)
inverted_board = self.get_empty_board()
for r in range(3):
for c in range(3):
if board[r][c] == 1:
inverted_board[r][c] = 2
elif board[r][c] == 2:
inverted_board[r][c] = 1
return inverted_board
def get_deeplearning_move(self, h_model, player, board, dbg=False):
# get moves
available_moves = self.get_available_moves(board)
if len(available_moves) == 1:
return available_moves[0]
board = copy.deepcopy(board)
move_vs = []
X_pred = []
for i, move in enumerate(available_moves):
next_board = copy.deepcopy(board)
self.make_move(player, next_board, move)
# create X_pred
x_pred = [
self.board2vector(board),
self.board2vector(next_board),
]
if dbg: print('x_pred = {}'.format(x_pred))
X_pred.append(x_pred)
y = [0, move[0], move[1]]
move_vs.append(y)
X_pred_len = len(X_pred)
if X_pred_len > 0:
if dbg: print('X_pred = {}'.format(X_pred))
X_pred = np.array(X_pred).reshape(X_pred_len, self.n_steps_in, self.n_features)
predictions = h_model.predict(X_pred)
if dbg: print('for X_Pred = {}, predictions = {}'.format(X_pred, predictions))
for i, y in enumerate(predictions):
move_vs[i][0] = y[0]
if dbg: print('move_vs = {}'.format(move_vs))
# get maximum or minimum
if player == 1:
reverse = True
else:
reverse = False
# sort in place: best predicted value first
move_vs.sort(key=operator.itemgetter(0), reverse=reverse)
sorted_move_vs = move_vs
if dbg: print('sorted_move_vs = {}'.format(sorted_move_vs))
#sys.exit()
move = [sorted_move_vs[0][1], sorted_move_vs[0][2]]
if dbg: print('player = {}, y = {} -> selected move = {}'.format(player, y, move))
#sys.exit()
return move
# fall back to a random move
move = random.choice(available_moves)
if dbg: print('player = {}, no prediction available, random move = {}'.format(player, move))
return move
def get_random_move(self, board):
moves = self.get_available_moves(board)
return random.choice(moves)
def make_move(self, player, board, move):
board[move[0]][move[1]] = player
return board
def play_game(self, player1_type=0, player2_type=0, h_models=[], show_game=False):
# player_type: 0 = random, 1 = dl
# first board is empty
board = self.get_empty_board()
boards = [board]
moves = []
player = 1
i = 0
while True:
if player == 1:
if player1_type == 0 or len(moves) < 2:
print('rd move: i = {}, player = {}'.format(i, player))
move = self.get_random_move(board)
else:
print('nn move: i = {}, player = {}'.format(i, player))
move = self.get_deeplearning_move(h_models[i], player, board)
else:
if player2_type == 0 or len(boards) <= 2:
print('rd move: i = {}, player = {}'.format(i, player))
move = self.get_random_move(board)
else:
print('nn move: i = {}, player = {}'.format(i, player))
move = self.get_deeplearning_move(h_models[i], player, board)
# make the move
self.make_move(player, board, move)
if show_game:
print('player = {}, made move = {}'.format(player, move))
self.show_board(board)
# store for players
boards.append(copy.deepcopy(board))
moves.append(copy.deepcopy(move))
# check won, lost, draw
if self.player_won(player, board):
# player won
if player == 1:
result = GAME_RESULT_PLAYER1_WON
else:
result = GAME_RESULT_PLAYER2_WON
return GameResult(
boards=boards,
moves=moves,
result=result,
)
other_player = self.get_other_player(player)
if self.player_won(other_player, board):
# other_player won
if other_player == 1:
result = GAME_RESULT_PLAYER1_WON
else:
result = GAME_RESULT_PLAYER2_WON
return GameResult(
boards=boards,
moves=moves,
result=result,
)
if len(self.get_available_moves(board)) == 0:
# draw
return GameResult(
boards=boards,
moves=moves,
result=GAME_RESULT_DRAW,
)
# change player
player = other_player
i += 1
def play_games(self, n_games, player1_type=0, player2_type=0, h_models=None, show_games=False):
player_wins = [0, 0]
draws = 0
total_moves = 0
result_to_text = {
0: 'lost',
1: 'draw',
2: 'won',
}
for i in range(n_games):
print('playing game {} of {} ...'.format(i, n_games))
game_result = self.play_game(
player1_type=player1_type,
player2_type=player2_type,
h_models=h_models,
show_game=False,
)
self.show_boards(game_result.boards)
total_moves += len(game_result.moves)
last_move = game_result.moves[-1]
if game_result.result == GAME_RESULT_DRAW:
draws += 1
elif game_result.result == GAME_RESULT_PLAYER1_WON:
player_wins[0] += 1
elif game_result.result == GAME_RESULT_PLAYER2_WON:
player_wins[1] += 1
if player_wins[1] == 0:
won_div_lost_fmt = '{:<8}'.format('')
else:
won_div_lost = player_wins[0]/player_wins[1]
won_div_lost_fmt = '{:8.2f}'.format(won_div_lost)
moves_avg = total_moves/n_games
print('summary for player1 after {} games\n--------'.format(n_games))
sep_line = '+-{}-+-{}-+-{}-+-{}-+-{}-+-{}-+'.format('-'*5, '-'*5, '-'*5, '-'*8, '-'*9, '-'*9)
print(sep_line)
print('| {:<5} | {:<5} | {:<5} | {:<8} | {:<9} | {:<9} |'.format('won', 'lost', 'draw', 'won/lost', 'avg_moves', 'perc draw'))
print(sep_line)
print('| {:5d} | {:5d} | {:5d} | {} | {:9.1f} | {:8.1f}% |'.format(player_wins[0], player_wins[1], draws, won_div_lost_fmt, moves_avg, (100*draws)/n_games))
print('')
print('')
def board2vector(self, board):
b = []
for r in range(3):
for c in range(3):
b.append(board[r][c])
return b
def get_game_result_signature(self, game):
sig = ''
for board in game.boards:
for r in range(3):
for c in range(3):
sig += str(board[r][c])
for move in game.moves:
sig += str(move[0]) + str(move[1])
sig += str(game.result)
return sig
def generate_datasets(self, n_games):
# game_result signatures to prevent duplicate data
sigs = []
won_count = 0
lost_count = 0
draw_count = 0
# generate game data
game_results = []
game_count = 0
while won_count < n_games or lost_count < n_games or draw_count < n_games:
if game_count > 255000:
break
print('playing game {}'.format(game_count))
game_count += 1
game_result = self.play_game()
boards_len = len(game_result.boards)
moves_len = len(game_result.moves)
result = game_result.result
print('boards_len = {}, moves_len = {}, result = {}'.format(boards_len, moves_len, result))
if boards_len != (moves_len + 1):
print('boards_len != moves_len:')
print('game_result.boards = {}'.format(game_result.boards))
print('game_result.moves = {}'.format(game_result.moves))
sys.exit()
# skip identical games
sig = self.get_game_result_signature(game_result)
if sig in sigs:
continue
# distribute won, draw, lost
if result == GAME_RESULT_DRAW:
if draw_count >= n_games:
continue
draw_count += 1
elif result == GAME_RESULT_PLAYER1_WON:
if won_count >= n_games:
continue
won_count += 1
elif result == GAME_RESULT_PLAYER2_WON:
if lost_count >= n_games:
continue
lost_count += 1
sigs.append(sig)
game_results.append(game_result)
print('game_count = {}'.format(game_count))
print('len(sigs) = {}'.format(len(sigs)))
print('won_count = {}, lost_count = {}, draw_count = {}'.format(won_count, lost_count, draw_count))
input('Press ENTER to continue ...')
return game_results
# start
# hyperparameters name and value, are the inputs for hp.Choice()
t_pars = TunerParameters(
pars=[
('layer_input_units', [64, 128]),
#('layer_hidden_1_units', [64, 128]),
('batch_size', [8, 32]),
],
)
# lstm
n_steps_in = 2
n_steps_out = 1
n_features = 9
# other
tuner_dir = 'tuner_dir_tictactoe'
won_draw_lost_count = 5000
#won_draw_lost_count = 20000
# train or use
must_train = True
#must_train = False
game_play = GamePlay(
n_steps_in=n_steps_in,
n_steps_out=n_steps_out,
n_features=n_features,
)
h_models = [None]*9
if must_train:
game_results = game_play.generate_datasets(won_draw_lost_count)
model_boards_and_results = DataUtils.get_model_boards_and_results(game_results)
model_data = DataUtils.get_model_data(model_boards_and_results)
model_data_lens = [None]*9
print('model_data:')
for model_num in model_data:
data = model_data[model_num]
print('model_num = {}'.format(model_num))
X_in = data['X_in']
y_in = data['y_in']
print('- X_in = {}'.format(X_in))
print('- y_in = {}'.format(y_in))
model_data_lens[model_num] = len(X_in)
#continue
# split into train and test
X_train, X_test, y_train, y_test = DataUtils.split_datasets(X_in, y_in)
# convert
X_in = np.asarray(X_in).astype(int)
y_in = np.asarray(y_in).astype(int)
X_train = np.asarray(X_train).astype(int)
y_train = np.asarray(y_train).astype(int)
print('type(X_train) = {}, len(X_train) = {}, X_train = {}'.format(type(X_train), len(X_train), X_train))
print('type(y_train) = {}, len(y_train) = {}, y_train = {}'.format(type(y_train), len(y_train), y_train))
n_features = X_train.shape[2]
print('n_features = {}'.format(n_features))
tuner = Tuner(
input_shape=(n_steps_in, n_features),
t_pars=t_pars,
t_dir=tuner_dir,
)
tune_result = tuner.tune(
X_train,
y_train,
model_num=model_num,
#plot_loss=False,
plot_loss=True,
)
h_model = tune_result['h_model']
# add to our list of models
h_models[model_num] = h_model
h_eval_dict = tune_result['h_eval_dict']
print('h_model = {}'.format(h_model))
print('h_eval_dict = {}'.format(h_eval_dict))
# debug: show a prediction for the h_model
X = [
[
[ [0, 0, 0], [1, 0, 0], [0, 0, 0] ],
[ [0, 0, 0], [1, 2, 0], [0, 0, 0] ],
],
[
[ [0, 0, 0], [0, 1, 0], [0, 0, 0] ],
[ [0, 0, 0], [0, 1, 2], [0, 0, 0] ],
]
]
X_pred = np.array(X).reshape(2, n_steps_in, n_features)
predictions = h_model.predict(X_pred)
print('h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))
# debug: show a prediction for the previously saved h_model
loaded_h_model = tuner.load_model(model_num)
predictions = loaded_h_model.predict(X_pred)
print('loaded_h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))
# here we have trained and saved the models
print('model_data_lens = {}'.format(model_data_lens))
input('Press ENTER to continue ...')
# load saved models
if h_models[2] is None:
# load saved
print('loading saved models ...')
h_models = [None]*9
for model_num in range(9):
print('loading saved model {} ...'.format(model_num))
tuner = Tuner(
input_shape=(n_steps_in, n_features),
t_pars=t_pars,
t_dir=tuner_dir,
)
loaded_h_model = tuner.load_model(model_num)
if loaded_h_model is None:
continue
'''
# debug: show a prediction for the previously saved h_model
X = [
[
[ [0, 0, 0], [1, 0, 0], [0, 0, 0] ],
[ [0, 0, 0], [1, 2, 0], [0, 0, 0] ],
],
[
[ [0, 0, 0], [0, 1, 0], [0, 0, 0] ],
[ [0, 0, 0], [0, 1, 2], [0, 0, 0] ],
]
]
X_pred = np.array(X).reshape(2, n_steps_in, n_features)
predictions = loaded_h_model.predict(X_pred)
print('loaded_h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))
sys.exit()
'''
h_models[model_num] = loaded_h_model
# debug: show models
for model_num in range(9):
print('h_models[{}] = {}'.format(model_num, h_models[model_num]))
print('play some games')
# play games and see results, player[1|2]_type: 0 = RD (random move), 1 = NN (neural network move)
game_play.play_games(
1000,
player1_type=1,
player2_type=1,
h_models=h_models,
show_games=True,
)
Improving performance
We can try to improve performance in the following ways:
- add more training data
- add more hyperparameter choices
- add more hyperparameters to the models like learning rate
- add extra layers to the models
- skip early moves, thereby reducing the amount of training data
It would be nice to implement statistics that also show how many games completed in 5, 6, 7, 8, or 9 moves.
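That statistic is not in the code above, but a possible sketch, assuming the GameResult objects from the code are available, could be:

from collections import Counter

def game_length_stats(game_results):
    # count how many games finished in 5, 6, 7, 8 or 9 moves
    counts = Counter(len(gr.moves) for gr in game_results)
    for n_moves in range(5, 10):
        print('{} moves: {} games'.format(n_moves, counts.get(n_moves, 0)))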
Summary
Using Deep Learning this way is not the way to solve this game. I knew this when I started, but I still wanted to try it before looking into Reinforcement Learning. There are a lot of possible improvements and I will certainly try some in the future. Using Keras Tuner to tune the hyperparameters again made my life easier. This was a nice project but also a lot of work. I learned a lot though.
Links / credits
How many tic tac toe games are there?
https://tictactoebeast.com/post/how-many-tic-tac-toe-games-are-there
How many Tic-Tac-Toe (noughts and crosses) games are possible?
http://www.se16.info/hgb/tictactoe.htm
How to Develop LSTM Models for Time Series Forecasting
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting
Part 4 — Neural Network Q Learning, a Tic-Tac-Toe player that learns — kind of
https://medium.com/@carsten.friedrich/part-4-neural-network-q-learning-a-tic-tac-toe-player-that-learns-kind-of-2090ca4798d
Tic-Tac-Toe and Reinforcement Learning
https://medium.com/swlh/tic-tac-toe-and-deep-neural-networks-ea600bc53f51