TicTacJoe

Jedrzej_Swiezewski · May 13, 2021, 10:23pm

Authors:
Jędrzej Świeżewski

Abstract:
TicTacJoe is a Reinforcement Learning agent that plays Tic Tac Toe. In the app you can play against TicTacJoe, or let him train and watch as he masters the game.

Full Description:

Play a game

You can play against TicTacJoe by pressing the Play a game button. When it is your turn, click a tile to make your move. When it is TicTacJoe's turn, the tiles display the likelihood of him choosing each of them. Note that some tiles are equivalent (symmetric) and hence do not each get a separate likelihood.
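
To illustrate which tiles count as equivalent, here is a minimal sketch that groups the empty tiles of a position into symmetry classes. It is written in Python purely for illustration (the app itself is in R), and the board encoding (a 9-character string, tiles indexed 0 to 8 row by row) and the function names are assumptions, not taken from the TicTacJoe code.

```python
def symmetries():
    """Return the 8 symmetries of the square as index permutations.

    perm[i] is the tile that tile i is mapped to. Tiles are indexed
        0 1 2
        3 4 5
        6 7 8
    """
    def rot(i):   # rotate 90 degrees clockwise
        r, c = divmod(i, 3)
        return c * 3 + (2 - r)

    def flip(i):  # mirror left-right
        r, c = divmod(i, 3)
        return r * 3 + (2 - c)

    perms = []
    perm = list(range(9))
    for _ in range(4):
        perms.append(perm)
        perms.append([flip(p) for p in perm])
        perm = [rot(p) for p in perm]
    return perms


def equivalent_tiles(board):
    """Group the empty tiles of `board` into classes of symmetric tiles."""
    # Keep only the symmetries that leave the current position unchanged.
    stabilizer = [p for p in symmetries()
                  if all(board[p[i]] == board[i] for i in range(9))]
    classes, seen = [], set()
    for i in range(9):
        if board[i] != " " or i in seen:
            continue
        orbit = sorted({p[i] for p in stabilizer})
        seen.update(orbit)
        classes.append(orbit)
    return classes


print(equivalent_tiles(" " * 9))  # [[0, 2, 6, 8], [1, 3, 5, 7], [4]]
```

On an empty board this yields three classes (corners, sides, center), which matches the three possible first moves tracked by the graphs described further down.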

Let TicTacJoe train

Initially, TicTacJoe makes each move at random (all likelihoods in a given turn are equal), so he is a Noob at the game. However, you can let TicTacJoe get better by having him play against himself many times (click the Let TicTacJoe train button). Observe how he gradually becomes a Tic Tac Toe Guru :).
Once he acquires some skills, you can revert him to the initial state of mind by pressing the Flush TicTacJoe's skills button.

The 3 graphs

The three graphs at the bottom (visible after you start training) visualise the evolution of the likelihoods (in %) of the three possible first moves TicTacJoe can make when he starts a game (a move in the center, in a corner, or at a side). They start out equal (about 33% each) and, as the training progresses, freeze out so that he always chooses the optimal move.

How does it work?

Interestingly, TicTacJoe's brain is implemented explicitly in pure R (with no extra libraries). It is an example of Reinforcement Learning: as he plays against himself, he is rewarded for moves that lead to a win and is therefore encouraged to make them in the future. Similarly, he is discouraged from moves that lead to a loss, and so makes them less often.
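
The reward idea can be sketched in a few lines. The post does not spell out the exact update rule, so the multiplicative update, the `step` parameter, and the names below are assumptions chosen for illustration. The sketch is in Python rather than the app's R.

```python
def reinforce(likelihoods, history, outcome, step=0.5):
    """Nudge the likelihoods of the moves a player made in one game.

    likelihoods: dict mapping state -> dict mapping move -> likelihood
    history:     list of (state, move) pairs the player went through
    outcome:     +1 for a win, -1 for a loss, 0 for a draw
    step:        hypothetical learning-rate-like parameter, 0 < step < 1
    """
    for state, move in history:
        moves = likelihoods[state]
        # Scale the chosen move up after a win, down after a loss.
        moves[move] *= 1.0 + step * outcome
        # Renormalise so the likelihoods in this state sum to 1 again.
        total = sum(moves.values())
        for m in moves:
            moves[m] /= total
    return likelihoods


table = {"start": {"center": 1 / 3, "corner": 1 / 3, "side": 1 / 3}}
reinforce(table, [("start", "center")], outcome=+1)
print(table["start"])  # "center" rises to 3/7, the others drop to 2/7
```

In self-play the same routine would be called twice per game, once with the winner's history and `outcome=+1` and once with the loser's history and `outcome=-1`.
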
To encourage exploration early on and exploitation in the later stages of training, a temperature-like mechanism is introduced. Initially the temperature pushes the likelihoods toward being equal, but as the temperature drops the largest likelihood becomes more and more dominant.
Those mechanisms turn out to be enough for TicTacJoe to almost always make one of the optimal moves once he has been trained to be a Guru.
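
One common way to get this behaviour is to raise each likelihood to the power 1/T and renormalise. Whether TicTacJoe uses exactly this formula is not stated in the post, so treat the following Python sketch (the app is in R) and its function name as an assumption that merely reproduces the described effect.

```python
import math


def apply_temperature(likelihoods, temperature):
    """Rescale likelihoods as p_i ** (1 / T), normalised to sum to 1.

    High T flattens the distribution, T = 1 leaves it unchanged, and
    T close to 0 concentrates it on the largest likelihood.
    Assumes temperature > 0 and at least one positive likelihood.
    """
    # Work with logarithms so that very small T does not underflow.
    logs = [math.log(p) / temperature if p > 0 else float("-inf")
            for p in likelihoods]
    top = max(logs)
    scaled = [math.exp(value - top) for value in logs]
    total = sum(scaled)
    return [s / total for s in scaled]


raw = [0.5, 0.3, 0.2]
for t in (100.0, 1.0, 0.05):
    print(t, [round(p, 3) for p in apply_temperature(raw, t)])
```

With a large temperature the three values come out nearly equal, and with a small one almost all of the probability sits on the first move, so lowering the temperature over the course of training shifts play from exploration to exploitation.
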
A feature of the implementation of TicTacJoe is that (unlike in most modern Reinforcement Learning applications) the entire probability graph is stored. This limits the direct generalizability of the approach, but makes it especially easy to investigate and interpret the state of training, which makes TicTacJoe a good toy example of the key Reinforcement Learning ideas.
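
To give a feel for what storing the entire probability graph can mean, the sketch below enumerates every reachable non-final position and attaches a uniform likelihood to each legal move. It is an illustration in Python, not the app's R data structure: the dictionary layout and names are assumptions, and for simplicity it does not merge symmetric tiles the way the app does.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def build_table(board=" " * 9, player="X", table=None):
    """Map every reachable non-final board to uniform move likelihoods."""
    if table is None:
        table = {}
    moves = [i for i, tile in enumerate(board) if tile == " "]
    if board in table or winner(board) or not moves:
        return table  # already visited, or the game is over
    table[board] = {m: 1.0 / len(moves) for m in moves}
    other = "O" if player == "X" else "X"
    for m in moves:
        build_table(board[:m] + player + board[m + 1:], other, table)
    return table


table = build_table()
print(len(table))        # number of stored positions
print(table[" " * 9])    # likelihoods of the opening moves
```

Because every position is an explicit entry, the state of training can be read off directly, for example by printing the entry for the empty board after each batch of games.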

Keywords: reinforcement learning, game, shiny.semantic
Shiny app: https://swiezew.shinyapps.io/tictacjoe/
Repo: GitHub - Appsilon/TicTacJoe: This repo holds an app with TicTacJoe - a reinforcement learning example, in the game of Tic Tac Toe
RStudio Cloud: Posit Cloud
