
An attempt to solve Tic-Tac-Toe using Keras and LSTM

Deep Learning is not the way to solve Tic-Tac-Toe but it certainly is a nice educational experience.

2 March 2022 Updated 3 March 2022

After implementing my first Deep Learning LSTM model for a project, I wondered whether Deep Learning could also solve a game. The first game that comes to mind is Tic-Tac-Toe. Then you search the internet and it turns out that many, many people have had the same idea. Of course.

Below I present my attempt to solve Tic-Tac-Toe using Keras and LSTM (Long Short Term Memory). This is a Deep Learning solution, not a Reinforcement Learning one; that is a totally different subject.

The solution does not use a single model but a model for every move. Why? Because I wanted to avoid the training data for an early move getting 'polluted' with training data for future moves.

About the Tic-Tac-Toe game

There are 255,168 ways to play this game. Of these, 131,184 are won by the first player, 77,904 by the second player, and 46,080 end in a draw.

A board consists of fields with content empty, X, and O. The total number of possible boards is 3^9 = 19,683.

What we know about the Tic-Tac-Toe game

Our Deep Learning black box knows the following about the game:

  • There are two players
  • It is played on 9 fields, let's call it a board
  • The players take turns making a move
  • Making a move means assigning an unused field to a player
  • This means we know which fields are available when making a move
  • After some moves, someone tells us the game is over and who won, or whether it was a draw

That's all we know, not really much. Can we use Deep Learning to play winning games?

Some questions and answers

  • How do we represent the board?
  • How do we represent a move?
  • Should we use regression or classification and how to use it?
  • Should we use MLP (Multilayer Perceptron) or LSTM (Long Short Term Memory) or something else?
  • Should every move have its own model?
  • Should both players have their own model?

How do we represent the board?

The board is a vector of nine numbers, one for each field. The number is 0 when the field is empty and still available to player1 or player2. It is 1 when taken by player1 and 2 when taken by player2.
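
As a minimal sketch of this encoding (the helper name flatten_board is mine; the full code below uses board2vector for the same thing):

# Minimal sketch of the board encoding: 0 = empty, 1 = player1 (X), 2 = player2 (O).
def flatten_board(board):
    # board is a 3x3 list of lists
    return [field for row in board for field in row]

board = [[1, 0, 0], [0, 2, 0], [0, 0, 0]]
print(flatten_board(board))  # [1, 0, 0, 0, 2, 0, 0, 0, 0]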

How do we represent a move?

This is not really important; let's use a list with a row and a column position.

Regression or classification and how to use it?

With regression we can try to predict a value for the next move.

With classification we can try to predict a vector of best moves.

I chose regression here. We have two players. The target values, seen from player1's perspective:

  • 2: won
  • 1: draw
  • 0: lost

The prediction will be a value between 0 and 2. If it is our turn, we predict the value for all available moves and select the best one: for player1 that is the maximum. This gives our move. For player2, the best available value is the minimum.
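
A minimal sketch of this selection rule, assuming we already have a predicted value for every available move (the names below are illustrative only):

# Sketch of the selection rule; each entry is (predicted_value, move).
predicted = [(1.7, [0, 2]), (0.9, [1, 1]), (1.2, [2, 0])]
best_move_player1 = max(predicted, key=lambda p: p[0])[1]  # maximum value -> [0, 2]
best_move_player2 = min(predicted, key=lambda p: p[0])[1]  # minimum value -> [1, 1]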

MLP (Multilayer Perceptron) or LSTM (Long Short Term Memory) or something else?

I don't know, let's try LSTM.

Should every move have its own model?

I think this is important. If we have just one model, then data from moves after step N is also included when predicting move N. That seems totally wrong.

Should both players have their own model?

I think so. Player1 is always one move ahead, player2 always one move behind.

Training data

We use regression, see above, to predict a value for every possible move at a given moment. Player1 selects the move with the maximum value; the opponent, player2, selects the move with the minimum value.

The board is a vector of nine fields. A field can be empty, X for player1, or O for player2. A game consists of a sequence of board vectors, and we assign the outcome of the game to every vector of that game.

For example, the data for a game looks like:

[1, 0, 0, 0, 0, 0, 0, 0, 0] -> 2 # player1 makes a move
[1, 0, 0, 0, 0, 0, 0, 2, 0] -> 2 # player2
[1, 1, 0, 0, 0, 0, 0, 2, 0] -> 2 # player1
[1, 1, 0, 0, 0, 0, 0, 2, 2] -> 2 # player2
[1, 1, 1, 0, 0, 0, 0, 2, 2] -> 2 # player1 won

For background on LSTM models, see 'How to Develop LSTM Models for Time Series Forecasting' in the links below. The model is a multivariate LSTM with n_steps_in=2 and n_steps_out=1.
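
To make the windowing concrete: with n_steps_in=2 and n_steps_out=1 every training sample consists of two consecutive boards as input and the game result as output. A rough sketch of what the split_sequences method in the code below produces for the game above:

# Rough sketch of the 2-step windowing; each row is 9 board fields plus the game result.
rows = [
    [1, 0, 0, 0, 0, 0, 0, 0, 0, 2.0],
    [1, 0, 0, 0, 0, 0, 0, 2, 0, 2.0],
    [1, 1, 0, 0, 0, 0, 0, 2, 0, 2.0],
]
n_steps_in = 2
for i in range(len(rows) - n_steps_in + 1):
    window = rows[i:i + n_steps_in]
    X_sample = [r[:-1] for r in window]  # two consecutive boards
    y_sample = window[-1][-1]            # result attached to the last board
    print(X_sample, '->', y_sample)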

To generate the training data we play the game many times.

Data from unique games only

To train our models we generate data by playing a number of games using random moves. This means the dataset can contain many duplicate games; in other words, the dataset is polluted. At the moment I make sure there are no duplicate games by using a signature built from the game's boards, moves, and final result.
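
A minimal sketch of this deduplication idea (the real implementation, get_game_result_signature in the code below, also includes the moves in the signature):

# Minimal sketch of the deduplication idea.
seen_sigs = set()

def is_duplicate_game(boards, result):
    # build a string signature from all boards of the game plus the final result
    sig = ''.join(str(field) for board in boards for row in board for field in row)
    sig += str(result)
    if sig in seen_sigs:
        return True
    seen_sigs.add(sig)
    return False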

A model for every move

If we use a single model, the training data becomes 'polluted' with future data. When it is our move, we want to know the best next move; everything after that step pollutes the data we need for it. That is why I use multiple models, one for every step.

move[n]    model[n] with data up to step[n+1]
move[n+1]  model[n+1] with data up to step[n+2]
etc.

For player 1 this means that at move[n] we use model[n] that has been trained with data up to move n+1. This way we can select the best available move to make.

Because of the parameter n_steps_in=2 we have no model for the first move. Instead, both players make their first move at random.

Also we do not use a model if there is only one move left.
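
Putting this together, the move-to-model mapping looks roughly like this (a sketch; the indices follow play_game in the code below):

# Sketch of which model handles which move; moves are numbered 0..8 over both players.
# Moves 0 and 1 are random (n_steps_in=2 means there is no history yet);
# when only one field is left, the move is forced and no prediction is made.
for move_index in range(9):
    if move_index < 2:
        print('move {} -> random move'.format(move_index))
    else:
        print('move {} -> h_models[{}]'.format(move_index, move_index))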

Performance

OK, I implemented this. How does it perform? During training, the mean_absolute_error of all models drops from around 0.8 to 0.6. Not very good.

In the results below I have the two players. A player can be:

  • RD: makes random moves
  • NN: makes Neural Network moves

The first player is player1, the second is player2. Training data is generated as described above, until we have, for example, 1000 x won-draw-lost, i.e. 1000 won, 1000 lost, and 1000 draw games.
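
A rough sketch of that balancing loop (simplified from generate_datasets in the code below; play_random_game is a hypothetical stand-in for playing one full random game):

# Rough sketch of collecting games until each outcome reaches the target count.
import random

def play_random_game():
    # hypothetical stand-in: a real implementation plays a full game with random moves
    return random.choice(['won', 'lost', 'draw']), None

target = 1000
counts = {'won': 0, 'lost': 0, 'draw': 0}
games = []
while min(counts.values()) < target:
    outcome, game = play_random_game()
    if counts[outcome] >= target:
        continue  # already have enough games with this outcome
    counts[outcome] += 1
    games.append(game)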

1. RD vs RD

Both players make random moves.

Results of 4 runs of 1000 games:

+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   578 |   293 |   129 |     1.97 |       7.7 |     12.9% |
|   593 |   274 |   133 |     2.16 |       7.7 |     13.3% |
|   590 |   276 |   134 |     2.14 |       7.6 |     13.4% |
|   589 |   296 |   115 |     1.99 |       7.6 |     11.5% |

This does not look unreasonable. Player1 starts the game and thus has an advantage over player2.

2. NN vs RD, train with 1000 x won-draw-lost

Player1 makes Neural Network moves, player2 makes random moves.

Results of 4 runs of 1000 games:

+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   706 |   192 |   102 |     3.68 |       7.4 |     10.2% |
|   690 |   203 |   107 |     3.40 |       7.3 |     10.7% |
|   721 |   172 |   107 |     4.19 |       7.4 |     10.7% |
|   726 |   168 |   106 |     4.32 |       7.4 |     10.6% |

Better than random moves but far from perfect!

3. NN vs RD, train with 2000 x won-draw-lost

Player1 makes Neural Network moves, player2 makes random moves.

Results of 4 runs of 1000 games:

+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   801 |   124 |    75 |     6.46 |       7.3 |      7.5% |
|   789 |   128 |    83 |     6.16 |       7.2 |      8.3% |
|   773 |   145 |    82 |     5.33 |       7.2 |      8.2% |
|   772 |   145 |    83 |     5.32 |       7.2 |      8.3% |

Again better, but again far from perfect!

4. NN vs RD, train with 5000 x won-draw-lost

Player1 makes Neural Network moves, player2 makes random moves.

Results of 4 runs of 1000 games:

+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   901 |    49 |    50 |    18.39 |       7.0 |      5.0% |
|   888 |    58 |    54 |    15.31 |       7.0 |      5.4% |
|   889 |    56 |    55 |    15.88 |       7.0 |      5.5% |
|   898 |    48 |    54 |    18.71 |       7.0 |      5.4% |

This looks more like it! Can we do better?

5. NN vs RD, train with 20000 x won-draw-lost

Player1 makes Neural Network moves, player2 makes random moves.

Results of 4 runs of 1000 games:

+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   955 |    21 |    24 |    45.48 |       6.7 |      2.4% |
|   959 |    15 |    26 |    63.93 |       6.7 |      2.6% |
|   962 |    13 |    25 |    74.00 |       6.7 |      2.5% |
|   969 |    12 |    19 |    80.75 |       6.7 |      1.9% |

Huge improvement, more (training) data matters! Are the wins of player2 just luck?

6. RD vs NN, train with 20000 x won-draw-lost

Now we change the order. Player1 makes random moves and player2 makes Neural Network moves.

Results of 4 runs of 1000 games:

+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   145 |   739 |   116 |     0.20 |       6.8 |     11.6% |
|   127 |   761 |   112 |     0.17 |       6.8 |     11.2% |
|   142 |   735 |   123 |     0.19 |       6.8 |     12.3% |
|   142 |   750 |   108 |     0.19 |       6.8 |     10.8% |

This seems reasonable. The results for player2 are not as good as those of player1 above, and the number of draws increased. Does this have to do with the fact that player1 moves first?

7. NN vs NN, train with 20000 x won-draw-lost

Both players make Neural Network moves!

Results of 3 runs of 1000 games:

+-------+-------+-------+----------+-----------+-----------+
| won   | lost  | draw  | won/lost | avg_moves | perc draw |
+-------+-------+-------+----------+-----------+-----------+
|   684 |   104 |   212 |     6.58 |       7.2 |     21.2% |
|   672 |   102 |   226 |     6.59 |       7.2 |     22.6% |
|   675 |   112 |   213 |     6.03 |       7.2 |     21.3% |

We expect won and lost to be close, but that is not the case. We also expect to see a lot more draws; that part does look fine.

We can only conclude: Our black box is not smart enough ... :-(

The code

Below is the code so you can try it yourself. Keras Tuner is used to optimize the hyperparameters.

Important variables:

  • tuner_dir
    This is the directory created by Keras Tuner.
  • must_train
    Set True if you want to train the models.
    Set False if you want to use the trained models.
  • won_draw_lost_count
    This is about the training data, see above.

Typically you want to train once and then do multiple runs. If you change won_draw_lost_count, make sure you delete the directory tuner_dir first! Increasing won_draw_lost_count will increase training time.

# tictactoe.py

# std
import copy
import os
import random
import sys
import time
import operator

# optimizing hyperparameters with keras tuner
from keras.callbacks import EarlyStopping
from keras.models import load_model, Sequential
from keras.layers import Dense, Dropout, LSTM
import keras_tuner as kt
import numpy as np
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from tensorflow.keras.optimizers import Adam

# get reproducible results
from numpy.random import seed
seed(1)
import tensorflow
tensorflow.random.set_seed(2)


# constants: game results
# note(s):
# - the order for player1, higher is better
# - result is float
GAME_RESULT_PLAYER1_WON = 2.
GAME_RESULT_DRAW = 1.
GAME_RESULT_PLAYER2_WON = 0.

# tuner parameters
class TunerParameter:
    def __init__(self, name, val):
        self.name = name
        self.val = val
        self.args = (name, val)

class TunerParameters:
    def __init__(self, pars=None):
        self.t_pars = []
        for p in pars:
            tpar = TunerParameter(p[0], p[1])
            self.t_pars.append(tpar)
            setattr(self, p[0], tpar)
    def get_pars(self):
        return self.t_pars

# subclassed hyperband tuner to add hyperparameters (here, batch_size)
class HyperbandTuner(kt.tuners.Hyperband):
    def __init__(self, *args, **kwargs):
        self.t_pars = None
        if 't_pars' in kwargs:
            self.t_pars = kwargs.pop('t_pars')
        super(HyperbandTuner, self).__init__(*args, **kwargs)

    def run_trial(self, trial, *args, **kwargs):
        if self.t_pars is not None:
            for tpar in self.t_pars:
                kwargs[tpar.name] = trial.hyperparameters.Choice(tpar.name, tpar.val)
        return super(HyperbandTuner, self).run_trial(trial, *args, **kwargs)

# tuner class
class Tuner:

    def __init__(
        self,
        input_shape=None,
        t_pars=None,
        t_dir='tuner_dir',
    ):
        self.input_shape = input_shape
        self.t_dir = t_dir
        self.t_pars = t_pars

    def get_model(self, hp):
        # model params: layer_input 
        model_params_layer_input = dict(
            units=hp.Choice(*self.t_pars.layer_input_units.args),
            activation='relu',
            input_shape=self.input_shape,
        )

        '''
        # model params: layer_hidden_1
        model_params_layer_hidden_1 = dict(
            units=hp.Choice(*self.t_pars.layer_hidden_1_units.args),
            activation='relu',
            input_shape=self.input_shape,
        )
        '''

        # model params: compile
        model_params_compile = dict(
            #optimizer=Adam(learning_rate=hp.Choice(*self.t_pars.learning_rate.args)),
            optimizer='adam', 
            loss='mean_absolute_error',
            metrics=['mean_absolute_error'],
            #metrics=['mean_absolute_error', 'val_mean_absolute_error'],
            #metrics=['mean_absolute_error', 'val_mae'],
        )

        # model
        model = Sequential()
        model.add(LSTM(**model_params_layer_input))
        model.add(Dense(
            units=1,
        ))
        # compile
        model.compile(**model_params_compile)

        return model

    def get_model_path(
        self,
        model_num,
    ):
        return os.path.join(self.t_dir, 'best_model_' + str(model_num))

    def tune(
        self,
        X,
        y,
        model=None,
        model_num=0,
        use_early_stopping=False,
        search_split_size=0.33,
        fit_verbose=0,
        plot_loss=False,
    ):
        if model is None:
            model = self.get_model

        # use subclassed HyperBand tuner
        tuner = HyperbandTuner(
            model,
            objective=kt.Objective('val_mean_absolute_error', direction='min'),
            #allow_new_entries=True,
            tune_new_entries=True,
            hyperband_iterations=2,
            max_epochs=260,
            directory=self.t_dir,
            project_name='model_' + str(model_num),
            t_pars=[self.t_pars.batch_size],
        )

        print('\n{}\ntuner.search_space_summary()\n{}\n'.format('-'*60, '-'*60))
        tuner.search_space_summary()

        callbacks = []
        if use_early_stopping:
            callbacks = [EarlyStopping('val_loss', patience=3)]
        print('\n{}\ntuner.search()\n{}\n'.format('-'*60, '-'*60))
        tuner.search(
            X,
            y,
            validation_split=search_split_size,
            shuffle=False,
            callbacks=callbacks
        )

        print('\n{}\ntuner.results_summary()\n{}\n'.format('-'*60, '-'*60))
        tuner.results_summary()

        print('\n{}\nbest_hps\n{}\n'.format('-'*60, '-'*60))
        best_hps = tuner.get_best_hyperparameters()[0]
        best_hps_dict = {}
        for tpar in self.t_pars.get_pars():
            try:
                best_val = best_hps[tpar.name]
                print('- {} = {}'.format(tpar.name, best_val))
                best_hps_dict[tpar.name] = best_val
            except:
                pass
        # add tuner epochs
        best_hps_dict['epochs'] = best_hps['tuner/epochs']

        # get best model
        h_model = tuner.hypermodel.build(best_hps)

        print('\n{}\nh_model.summary()\n{}\n'.format('-'*60, '-'*60))
        h_model.summary()

        # train best model
        epochs = 50
        print('training ...')    
        history = h_model.fit(
            X,
            y,
            epochs=epochs,
            workers=4,
            use_multiprocessing=True,
            verbose=fit_verbose,
        )

        if plot_loss:
            fig_title = 'Loss - model[{}]'.format(model_num)
            fig = go.Figure()
            fig.add_trace(go.Scattergl(y=history.history['mean_absolute_error'], name='Train'))
            ##fig.add_trace(go.Scattergl(y=history.history['val_mean_absolute_error'], name='Valid'))
            #fig.add_trace(go.Scattergl(y=history.history['loss'], name='Loss'))
            fig.update_layout(height=500, width=1200, title=fig_title, xaxis_title='Epoch', yaxis_title='Mean Absolute Error')
            fig.show()

        # save best model
        h_model.save(self.get_model_path(model_num))

        print('\n{}\nh_model.evaluate()\n{}\n'.format('-'*60, '-'*60))
        h_eval_dict = h_model.evaluate(X, y, return_dict=True)
        print('h_eval_dict = {}'.format(h_eval_dict))

        return dict(
            h_model=h_model,
            h_eval_dict=h_eval_dict,
            best_hps_dict=best_hps_dict,
        )

    def load_model(
        self,
        model_num,
    ):
        try:
            return load_model(self.get_model_path(model_num))
        except:
            return None

class DataUtils:

    @classmethod
    def get_model_boards_and_results(cls, game_results):
        # generate datasets
        model_boards_and_results = {}
        for m in range(9):
            model_boards_and_results[m] = []
        for game_result in game_results:
            result = game_result.result
            for i, board in enumerate(game_result.boards):
                if i == 0:
                    # skip board0
                    continue
                # model_boards_data
                if i < 4:
                    # store for all models
                    for m in range(2, 9):
                        model_boards_and_results[m].append((board, result))
                else:
                    # store for models[i - 1]
                    for m in range(i - 1, 9):
                        model_boards_and_results[m].append((board, result))
        return model_boards_and_results

    @classmethod
    def get_model_data(cls, model_boards_and_results):
        model_data = {}
        for model_num in model_boards_and_results:
            boards_and_results = model_boards_and_results[model_num]
            boards = []
            results = []
            for board, result in boards_and_results:
                boards.append(board)
                results.append(result)
            boards_len = len(boards)
            results_len = len(results)
            print('model[{}]: boards_len = {}, results_len = {}'.format(model_num, boards_len, results_len))
            print('model[{}]:'.format(model_num))
            #print('- boards = {}'.format(boards))
            #print('- results = {}'.format(results))

            if boards_len == 0:
                continue

            # create data for model
            # player1 models: 2, 4, 6, 8
            # player2 models: 3, 5, 7
            # but we don't care about this here
            # 2 = player1 won
            # 1 = draw
            # 0 = player2 won
            # this means:
            # - player1 looks for the maximum
            # - player2 looks for the minimum
            dataset = []
            for i, board in enumerate(boards):
                result = results[i]
                flattened_board_and_result = []
                for row in board:
                    print('row = {}'.format(row))
                    flattened_board_and_result.extend(row)
                flattened_board_and_result.append(result)
                dataset.append(flattened_board_and_result)
            dataset = np.array(dataset, dtype=object)
            print('dataset = {}'.format(dataset))
            X_in, y_in = cls.split_sequences(dataset, n_steps_in, n_steps_out)
            print('X_in = {}'.format(X_in))
            print('y_in = {}'.format(y_in))

            X_in_shape = X_in.shape
            print('X_in_shape = {}'.format(X_in_shape))

            model_data[model_num] = dict(
                X_in=X_in,
                y_in=y_in,
            )

        return model_data

    # split a multivariate sequence into samples
    @classmethod
    def split_sequences(cls, sequences, n_steps_in, n_steps_out):
        X, y = list(), list()
        for i in range(len(sequences)):
            # find the end of this pattern
            end_ix = i + n_steps_in
            out_end_ix = end_ix + n_steps_out-1
            # check if we are beyond the dataset
            if out_end_ix > len(sequences):
                break
            # gather input and output parts of the pattern
            seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1:out_end_ix, -1]
            X.append(seq_x)
            y.append(seq_y)
        if n_steps_out == 1:
            y1 = []
            for v in y:
                y1.append(v[0])
            y = y1
        return np.array(X), np.array(y)

    @classmethod
    def split_datasets(cls, X, y):
        return train_test_split(X, y, test_size=0.33, shuffle=False, random_state=42)


class GameResult:
    def __init__(
        self,
        boards=None,
        moves=None,
        result=None,
    ):
        self.boards = boards
        self.moves = moves
        self.result = result


class GamePlay:
    
    def __init__(
        self,
        n_steps_in=None,
        n_steps_out=None,
        n_features=None,
    ):
        self.n_steps_in = n_steps_in
        self.n_steps_out = n_steps_out
        self.n_features = n_features
        self.player_value2displays = {0: ' ', 1: 'X', 2: 'O'}

    def get_empty_board(self):
        return [[0, 0, 0], [0, 0, 0], [0, 0, 0]]

    def show_boards(self, boards, show_board_num=True, horizontal=True):
        boards_strs = []
        for i, board in enumerate(boards):
            strs = self.show_board(board, board_num=i, return_strs=horizontal)
            boards_strs.append(strs)
        if horizontal:
            for i in range(len(strs)):
                line_parts = []
                for b in boards_strs:
                    line_parts.append(b[i])
                print('{}'.format('  '.join(line_parts)))

    def show_board(self, board, board_num=None, return_strs=False):
        strs = []    
        if board_num is not None:
            strs.append('board: {:<6}'.format(board_num))
        ds = '+---+---+---+'
        strs.append('{}'.format(ds))
        for r in range(3):
            row_items = []
            for c in range(3):
                row_items.append(self.player_value2displays[board[r][c]])
            dr = '| {} |'.format(' | '.join(row_items))
            strs.append('{}'.format(dr))
            strs.append('{}'.format(ds))
        if return_strs:
            return strs
        for s in strs:
            print(s)

    def get_available_moves(self, board):
        moves = []
        for r in range(3):
            for c in range(3):
                if board[r][c] == 0:
                    moves.append([r, c])
        return moves

    def player_won(self, player, board):
        # row, col win
        for r in range(3):
            if board[r][0] == player and board[r][1] == player and board[r][2] == player:
                return True
        for c in range(3):
            if board[0][c] == player and board[1][c] == player and board[2][c] == player:
                return True
        # diag win
        if board[0][0] == player and board[1][1] == player and board[2][2] == player:
                return True
        if board[2][0] == player and board[1][1] == player and board[0][2] == player:
                return True
        return False

    def get_other_player(self, player):
        return 2 if player == 1 else 1

    def get_inverted_board(self, board):
        board = copy.deepcopy(board)
        inverted_board = self.get_empty_board()
        for r in range(3):
            for c in range(3):
                if board[r][c] == 1:
                    inverted_board[r][c] = 2
                elif board[r][c] == 2:
                    inverted_board[r][c] = 1
        return inverted_board
        
    def get_deeplearning_move(self, h_model, player, board, dbg=False):
        # get moves
        available_moves = self.get_available_moves(board)
        if len(available_moves) == 1:
            return available_moves[0]
        board = copy.deepcopy(board)

        move_vs = []
        X_pred = []
        for i, move in enumerate(available_moves):
            next_board = copy.deepcopy(board)
            self.make_move(player, next_board, move)
            # create X_pred
            x_pred = [
                self.board2vector(board),
                self.board2vector(next_board),
            ]
            if dbg: print('x_pred = {}'.format(x_pred))
            X_pred.append(x_pred)
            y = [0, move[0], move[1]]
            move_vs.append(y)

        X_pred_len = len(X_pred)
        if X_pred_len > 0:
            if dbg: print('X_pred = {}'.format(X_pred))
            X_pred = np.array(X_pred).reshape(X_pred_len, self.n_steps_in, self.n_features)
            predictions = h_model.predict(X_pred)
            if dbg: print('for X_Pred = {}, predictions = {}'.format(X_pred, predictions))
            for i, y in enumerate(predictions):
                move_vs[i][0] = y[0]
            if dbg: print('move_vs = {}'.format(move_vs))
            # get maximum or minimum
            if player == 1:
                reverse = True
            else:
                reverse = False    
            # sort in place: best value first (maximum for player1, minimum for player2)
            move_vs.sort(key=operator.itemgetter(0), reverse=reverse)
            sorted_move_vs = move_vs
            if dbg: print('sorted_move_vs = {}'.format(sorted_move_vs))
            #sys.exit()
            move = [sorted_move_vs[0][1], sorted_move_vs[0][2]]
            if dbg: print('player = {}, y = {} -> selected move = {}'.format(player, y, move))
            #sys.exit()
            return move    

        # random move
        move = random.choice(available_moves)
        if dbg: print('player = {}, dont know, move = {}'.format(player, move))
        return move

    def get_random_move(self, board):
        moves = self.get_available_moves(board)
        return random.choice(moves)

    def make_move(self, player, board, move):
        board[move[0]][move[1]] = player
        return board

    def play_game(self, player1_type=0, player2_type=0, h_models=[], show_game=False):
        # player_type: 0 = random, 1 = dl
        # first board is empty
        board = self.get_empty_board()
        boards = [board]
        moves = []
        player = 1
        i = 0
        while True:
            if player == 1:
                if player1_type == 0 or len(moves) < 2:
                    print('rd move: i = {}, player = {}'.format(i, player))
                    move = self.get_random_move(board)
                else:
                    print('nn move: i = {}, player = {}'.format(i, player))
                    move = self.get_deeplearning_move(h_models[i], player, board)
            else:
                if player2_type == 0 or len(boards) <= 2:
                    print('rd move: i = {}, player = {}'.format(i, player))
                    move = self.get_random_move(board)
                else:
                    print('nn move: i = {}, player = {}'.format(i, player))
                    move = self.get_deeplearning_move(h_models[i], player, board)

            # make the move
            self.make_move(player, board, move)

            if show_game:
                print('player = {}, made move = {}'.format(player, move))
                self.show_board(board)

            # store for players
            boards.append(copy.deepcopy(board))
            moves.append(copy.deepcopy(move))

            # check won, lost, draw
            if self.player_won(player, board):
                # player won
                if player == 1:
                    result = GAME_RESULT_PLAYER1_WON
                else:
                    result = GAME_RESULT_PLAYER2_WON
                return GameResult(
                    boards=boards,
                    moves=moves,
                    result=result,
                )

            other_player = self.get_other_player(player)
            if self.player_won(other_player, board):
                # other_player won 
                if other_player == 1:
                    result = GAME_RESULT_PLAYER1_WON
                else:
                    result = GAME_RESULT_PLAYER2_WON
                return GameResult(
                    boards=boards,
                    moves=moves,
                    result=result,
                )

            if len(self.get_available_moves(board)) == 0:
                # draw
                return GameResult(
                    boards=boards,
                    moves=moves,
                    result=GAME_RESULT_DRAW,
                )

            # change player
            player = other_player
            i += 1

    def play_games(self, n_games, player1_type=0, player2_type=0, h_models=None, show_games=False):
        player_wins = [0, 0]
        draws = 0
        total_moves = 0
        result_to_text = {
            0: 'lost',
            1: 'draw',
            2: 'won',
        }
        for i in range(n_games):
            print('playing game {} of {} ...'.format(i, n_games))
            game_result = self.play_game(
                player1_type=player1_type,
                player2_type=player2_type,
                h_models=h_models,
                show_game=False,
            )
            self.show_boards(game_result.boards)
            total_moves += len(game_result.moves)
            last_move = game_result.moves[-1]
            if game_result.result == GAME_RESULT_DRAW:
                draws += 1
            elif game_result.result == GAME_RESULT_PLAYER1_WON:
                player_wins[0] += 1
            elif game_result.result == GAME_RESULT_PLAYER2_WON:
                player_wins[1] += 1

        if player_wins[1] == 0:
            won_div_lost_fmt = '{:<8}'.format('')
        else:
            won_div_lost = player_wins[0]/player_wins[1]
            won_div_lost_fmt = '{:8.2f}'.format(won_div_lost)
        moves_avg = total_moves/n_games
        print('summary for player1 after {} games\n--------'.format(n_games))
        sep_line = '+-{}-+-{}-+-{}-+-{}-+-{}-+-{}-+'.format('-'*5, '-'*5, '-'*5, '-'*8, '-'*9, '-'*9)
        print(sep_line)
        print('| {:<5} | {:<5} | {:<5} | {:<8} | {:<9} | {:<9} |'.format('won', 'lost', 'draw', 'won/lost', 'avg_moves', 'perc draw'))
        print(sep_line)
        print('| {:5d} | {:5d} | {:5d} | {} | {:9.1f} | {:8.1f}% |'.format(player_wins[0], player_wins[1], draws, won_div_lost_fmt, moves_avg, (100*draws)/n_games))
        print('')
        print('')

    def board2vector(self, board):
        b = []
        for r in range(3):
            for c in range(3):
                b.append(board[r][c])
        return b

    def get_game_result_signature(self, game):
        sig = ''
        for board in game.boards:
            for r in range(3):
                for c in range(3):
                    sig += str(board[r][c])
        for move in game.moves:
            sig += str(move[0]) + str(move[1])
        sig += str(game.result)
        return sig

    def generate_datasets(self, n_games):
        # game_result signatures to prevent duplicate data
        sigs = []
        won_count = 0
        lost_count = 0
        draw_count = 0
        # generate game data
        game_results = []
        game_count = 0
        while won_count < n_games or lost_count < n_games or draw_count < n_games:
            if game_count > 255000:
                break

            print('playing game {}'.format(game_count))
            game_count += 1
            game_result = self.play_game()
            boards_len = len(game_result.boards)
            moves_len = len(game_result.moves)
            result = game_result.result
            
            print('boards_len = {}, moves_len = {}, result = {}'.format(boards_len, moves_len, result))
            if boards_len != (moves_len + 1):
                print('boards_len != moves_len + 1:')
                print('game_result.boards = {}'.format(game_result.boards))
                print('game_result.moves = {}'.format(game_result.moves))
                sys.exit()

            # skip identical games
            sig = self.get_game_result_signature(game_result)
            if sig in sigs:
                continue
            # distribute won, draw, lost
            if result == GAME_RESULT_DRAW:
                if draw_count >= n_games:
                    continue
                draw_count += 1
            elif result == GAME_RESULT_PLAYER1_WON:
                if won_count >= n_games:
                    continue
                won_count += 1
            elif result == GAME_RESULT_PLAYER2_WON:
                if lost_count >= n_games:
                    continue
                lost_count += 1

            sigs.append(sig)
            game_results.append(game_result)

        print('game_count = {}'.format(game_count))
        print('len(sigs) = {}'.format(len(sigs)))
        print('won_count = {}, lost_count = {}, draw_count = {}'.format(won_count, lost_count, draw_count))
        input('Press ENTER to continue ...')
        return game_results

# start

# hyperparameters name and value, are the inputs for hp.Choice()
t_pars = TunerParameters(
    pars=[
        ('layer_input_units', [64, 128]),
        #('layer_hidden_1_units', [64, 128]),
        ('batch_size', [8, 32]),
    ],
)

# lstm
n_steps_in = 2
n_steps_out = 1
n_features = 9

# other
tuner_dir = 'tuner_dir_tictactoe'
won_draw_lost_count = 5000
#won_draw_lost_count = 20000

# train or use
must_train = True
#must_train = False

game_play = GamePlay(
    n_steps_in=n_steps_in,
    n_steps_out=n_steps_out,
    n_features=n_features,
)


h_models = [None]*9
if must_train:
    game_results = game_play.generate_datasets(won_draw_lost_count)
    model_boards_and_results = DataUtils.get_model_boards_and_results(game_results)
    model_data = DataUtils.get_model_data(model_boards_and_results)

    model_data_lens = [None]*9
    print('model_data:')
    for model_num in model_data:
        data = model_data[model_num]
        print('model_num = {}'.format(model_num))
        X_in = data['X_in']
        y_in = data['y_in']
        print('- X_in = {}'.format(X_in))
        print('- y_in = {}'.format(y_in))
        model_data_lens[model_num] = len(X_in)
        #continue

        # split into train and test
        X_train, X_test, y_train, y_test = DataUtils.split_datasets(X_in, y_in)

        # convert
        X_in = np.asarray(X_in).astype(int)
        y_in = np.asarray(y_in).astype(int)
        X_train = np.asarray(X_train).astype(int)
        y_train = np.asarray(y_train).astype(int)
        print('type(X_train) = {}, len(X_train) = {}, X_train = {}'.format(type(X_train), len(X_train), X_train))
        print('type(y_train) = {}, len(y_train) = {}, y_train = {}'.format(type(y_train), len(y_train), y_train))

        n_features = X_train.shape[2]
        print('n_features = {}'.format(n_features))

        tuner = Tuner(
            input_shape=(n_steps_in, n_features),
            t_pars=t_pars,
            t_dir=tuner_dir,
        )

        tune_result = tuner.tune(
            X_train,
            y_train,
            model_num=model_num,
            #plot_loss=False,
            plot_loss=True,
        )

        h_model = tune_result['h_model']

        # add to our list of models
        h_models[model_num] = h_model

        h_eval_dict = tune_result['h_eval_dict']
        print('h_model = {}'.format(h_model))
        print('h_eval_dict = {}'.format(h_eval_dict))

        # debug: show a prediction for the h_model
        X = [
            [
                [ [0, 0, 0], [1, 0, 0], [0, 0, 0] ],
                [ [0, 0, 0], [1, 2, 0], [0, 0, 0] ],
            ],
            [
                [ [0, 0, 0], [0, 1, 0], [0, 0, 0] ],
                [ [0, 0, 0], [0, 1, 2], [0, 0, 0] ],
            ]
        ]
        X_pred = np.array(X).reshape(2, n_steps_in, n_features)
        predictions = h_model.predict(X_pred)
        print('h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))

        # debug: show a prediction for the previously saved h_model
        loaded_h_model = tuner.load_model(model_num)
        predictions = loaded_h_model.predict(X_pred)
        print('loaded_h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))

    # here we have trained and saved the models 
    print('model_data_lens = {}'.format(model_data_lens))
    input('Press ENTER to continue ...')

# load saved models
if h_models[2] is None:
    # load saved
    print('loading saved models ...')
    h_models = [None]*9
    for model_num in range(9):
        print('loading saved model {}  ...'.format(model_num))
        tuner = Tuner(
            input_shape=(n_steps_in, n_features),
            t_pars=t_pars,
            t_dir=tuner_dir,
        )
        loaded_h_model = tuner.load_model(model_num)
        if loaded_h_model is None:
            continue

        '''
        # debug: show a prediction for the previously saved h_model
        X = [
            [
                [ [0, 0, 0], [1, 0, 0], [0, 0, 0] ],
                [ [0, 0, 0], [1, 2, 0], [0, 0, 0] ],
            ],
            [
                [ [0, 0, 0], [0, 1, 0], [0, 0, 0] ],
                [ [0, 0, 0], [0, 1, 2], [0, 0, 0] ],
            ]
        ]
        X_pred = np.array(X).reshape(2, n_steps_in, n_features)
        predictions = loaded_h_model.predict(X_pred)
        print('loaded_h_model: for X_pred = {}, predictions = {}'.format(X_pred, predictions))
        sys.exit()
        '''
        h_models[model_num] = loaded_h_model

# debug: show models
for model_num in range(9):
    print('h_models[{}] = {}'.format(model_num, h_models[model_num]))

print('play some games')

# play games and see results, player[1|2]_type: 0 = RD (random move), 1 = NN (neural network move)
game_play.play_games(
    1000,
    player1_type=1,
    player2_type=1,
    h_models=h_models, 
    show_games=True,
)

Improving performance

We can try to improve performance in the following ways:

  • add more training data
  • add more hyperparameter choices
  • add more hyperparameters to the models, like the learning rate
  • add extra layers to the models
  • reduce the training data by skipping the early moves

It would be nice to implement statistics that also show how many games completed in 5, 6, 7, 8, or 9 moves.

Summary

Using Deep Learning this way is not the way to solve this game. I knew this when I started, but I still wanted to try it before looking into Reinforcement Learning. There are a lot of possible improvements and I will certainly try some in the future. Using Keras Tuner to tune the hyperparameters again made my life easier. This was a nice project but also a lot of work. I learned a lot, though.

Links / credits

How many tic tac toe games are there?
https://tictactoebeast.com/post/how-many-tic-tac-toe-games-are-there

How many Tic-Tac-Toe (noughts and crosses) games are possible?
http://www.se16.info/hgb/tictactoe.htm

How to Develop LSTM Models for Time Series Forecasting
https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting

Part 4 — Neural Network Q Learning, a Tic-Tac-Toe player that learns — kind of
https://medium.com/@carsten.friedrich/part-4-neural-network-q-learning-a-tic-tac-toe-player-that-learns-kind-of-2090ca4798d

Tic-Tac-Toe and Reinforcement Learning
https://medium.com/swlh/tic-tac-toe-and-deep-neural-networks-ea600bc53f51
