Building a board game with TensorFlow Agents and Flutter

1. Before you begin

The breakthroughs of AlphaGo and AlphaStar demonstrated the potential of using machine learning to build superhuman-level game agents. Building a small ML-powered game is a fun way to pick up the skills needed to create powerful game agents.

In this codelab, you learn how to build a board game using:

TensorFlow Agents to train a game agent with reinforcement learning
TensorFlow Serving to serve the model
Flutter to create a cross-platform board game app

Prerequisites

Basic knowledge of Flutter development with Dart
Basic knowledge of machine learning with TensorFlow, such as training versus deployment
Basic knowledge of Python, terminals and Docker

What you'll learn

How to train a Non-Player Character (NPC) agent using TensorFlow Agents
How to serve the trained model using TensorFlow Serving
How to build a cross-platform Flutter board game

What you'll need

Flutter SDK
Android and iOS setup for Flutter
Desktop setup for Flutter
Web setup for Flutter
Visual Studio Code (VS Code) setup for Flutter and Dart
Docker
Bash
Python 3.7+

2. The Plane Strike Game

The game you build in this codelab is called 'Plane Strike', a small two-player board game that resembles the board game 'Battleship'. The rules are simple:

The human player plays against an NPC agent trained by machine learning. The human player starts the game by tapping any cell in the agent's board.
At the beginning of the game, the human player and the agent each have a 'plane' object on their own board (8 green cells that form a 'plane', as you can see in the human player's board in the animation below). The planes are placed randomly and are visible only to the owner of the board, hidden from the opponent.
The human player and the agent take turns striking one cell of each other's board. The human player can tap any cell in the agent's board, while the agent automatically makes its choice based on the prediction of a machine learning model. The attempted cell turns red if it is a 'plane' cell ('hit'); otherwise it turns yellow ('miss').
Whoever gets 8 red cells first wins the game; the game then restarts with fresh boards.
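
The hidden board and its random plane placement can be sketched in a few lines of plain Python. This is an illustration only: the 8x8 board size matches the 64-cell observation used later in this codelab, but the exact plane shape below (a head, a three-cell wing row, a body cell and a three-cell tail row) and the helper names place_plane, rotate and normalize are assumptions, not the codelab's implementation.

```python
import random

BOARD_SIZE = 8  # assumed 8x8 board (64 cells)

# Hypothetical 8-cell plane shape as (row, col) offsets:
# head, 3-cell wing row, body cell, 3-cell tail row.
PLANE_OFFSETS = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (3, 0), (3, 1), (3, 2)]


def rotate(offsets):
    """Rotate a shape 90 degrees clockwise in (row-down, col-right) coordinates."""
    return [(c, -r) for r, c in offsets]


def normalize(offsets):
    """Shift a shape so that its smallest row and column are both 0."""
    min_r = min(r for r, _ in offsets)
    min_c = min(c for _, c in offsets)
    return [(r - min_r, c - min_c) for r, c in offsets]


def place_plane(board_size=BOARD_SIZE, rng=random):
    """Return a hidden board (list of lists) with one randomly placed plane (1 = plane cell)."""
    shape = PLANE_OFFSETS
    for _ in range(rng.randrange(4)):  # pick one of four orientations
        shape = rotate(shape)
    shape = normalize(shape)
    height = max(r for r, _ in shape) + 1
    width = max(c for _, c in shape) + 1
    top = rng.randrange(board_size - height + 1)   # keep the plane fully on the board
    left = rng.randrange(board_size - width + 1)
    board = [[0] * board_size for _ in range(board_size)]
    for r, c in shape:
        board[top + r][left + c] = 1
    return board


if __name__ == "__main__":
    for row in place_plane(rng=random.Random(0)):
        print("".join("P" if cell else "." for cell in row))
```

Because rotation only permutes the offsets, every placement has exactly 8 plane cells, which is why 8 red cells is the winning condition.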

Here is a sample of the gameplay:

3. Set up your Flutter development environment

For Flutter development, you need two pieces of software to complete this codelab: the Flutter SDK and an editor.

You can run the codelab using any of these devices:

The iOS simulator (requires installing Xcode tools).
The Android Emulator (requires setup in Android Studio).
A browser (Chrome is required for debugging).
As a Windows, Linux, or macOS desktop application. You must develop on the platform where you plan to deploy. So, if you want to develop a Windows desktop app, you must develop on Windows to access the appropriate build chain. There are operating system-specific requirements that are covered in detail on docs.flutter.dev/desktop.

4. Get set up

To download the code for this codelab:

Navigate to the GitHub repository for this codelab.
Click Code > Download zip to download all the code for this codelab.

Unzip the downloaded zip file to unpack a codelabs-main root folder with all the resources that you need.

For this codelab, you only need the files in the tfagents-flutter/ subdirectory in the repository, which contains multiple folders:

The step0 to step6 folders contain the starter code that you build upon for each step in this codelab.
The finished folder contains the completed code for the finished sample app.
Each folder contains a backend subfolder, which includes the backend code, and a frontend subfolder, which includes the Flutter frontend code.

5. Download the dependencies for the project

Backend

Open your terminal and go into the tfagents-flutter subfolder. Run the following:

pip install -r requirements.txt

Frontend

In VS Code, click File > Open folder and then select the step0 folder from the source code that you downloaded earlier.
Open the step0/frontend/lib/main.dart file. If a VS Code dialog prompts you to download the required packages for the starter app, click Get packages.
If you don't see this dialog, open your terminal and run the flutter pub get command in the step0/frontend folder.

6. Step 0: Run the starter app

Open the step0/frontend/lib/main.dart file in VS Code and ensure that the Android Emulator or iOS Simulator is properly set up and appears in the status bar.

For example, here's what you see when you use Pixel 5 with the Android Emulator:

Here's what you see when you use iPhone 13 with the iOS Simulator:

Click Start debugging.

Run and explore the app

The app should launch on your Android Emulator or iOS Simulator. The UI is pretty straightforward. There are 2 game boards; a human player can tap any cell in the agent's board at the top as a strike position. You will train a smart agent to automatically predict where to strike based on the human player's board.

Under the hood, the Flutter app will send the human player's current board to the backend, which runs a reinforcement learning model and returns a predicted cell position to strike next. The frontend will display the result in the UI after receiving the response.

If you click any cell in the agent's board now, nothing happens because the app can't communicate with the backend yet.

7. Step 1: Create a TensorFlow Agents Python environment

The primary goal of this codelab is to design an agent that learns by interacting with an environment. While the Plane Strike game is relatively simple and it is possible to handcraft rules for the NPC agent, you use reinforcement learning to train an agent so that you learn the skills and can easily build agents for other games in the future.

In the standard Reinforcement Learning (RL) setting, the agent receives an observation at every time step and chooses an action. The action is applied to the environment, and the environment returns a reward and a new observation. The agent trains a policy to choose actions that maximize the sum of rewards, also known as the return. By playing the game many times, the agent learns the patterns and hones its skills to master the game. To formulate the Plane Strike game as an RL problem, think of the board state as the observation, a strike position as the action and the hit/miss signal as the reward.
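
In code, the observation and action of this formulation can be as simple as the sketch below. It assumes an 8x8 board flattened in row-major order, so an action is an integer in [0, 63]; this matches the action // board_size and action % board_size arithmetic used in the environment and Flutter code of this codelab, but the helper names and the cell encodings UNTRIED and MISS are hypothetical.

```python
BOARD_SIZE = 8  # assumed board size: 8 x 8 = 64 possible actions

# Hypothetical cell encodings for the visible board (the observation).
UNTRIED = 0.0
MISS = -1.0


def action_to_cell(action, board_size=BOARD_SIZE):
    """Map a flat action index to a (row, col) cell, row-major."""
    return divmod(action, board_size)


def cell_to_action(row, col, board_size=BOARD_SIZE):
    """Map a (row, col) cell back to a flat action index."""
    return row * board_size + col


def untried_actions(visible_board):
    """List the actions whose cell has not been struck yet."""
    size = len(visible_board)
    return [
        cell_to_action(r, c, size)
        for r in range(size)
        for c in range(size)
        if visible_board[r][c] == UNTRIED
    ]


# Observation at the start of a game: every cell is untried.
observation = [[UNTRIED] * BOARD_SIZE for _ in range(BOARD_SIZE)]
observation[5][5] = MISS  # pretend the agent already struck (5, 5) and missed

print(action_to_cell(45))                 # (5, 5)
print(len(untried_actions(observation)))  # 63
```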

To train the NPC agent, you leverage TensorFlow Agents, which is a reliable, scalable and easy-to-use reinforcement learning library for TensorFlow.

TF Agents is well suited for reinforcement learning because it comes with an extensive set of codelabs, examples and documentation to get you started. You can use TF Agents to solve realistic, complex RL problems at scale and to develop new RL algorithms quickly. You can easily swap between different agents and algorithms for experimentation. It is also well tested and easy to configure.

There are many prebuilt game environments implemented in OpenAI Gym (e.g., Atari games), MuJoCo and others, which TF Agents can easily leverage. But since Plane Strike is a completely custom game, you first need to implement a new environment from scratch.

To implement a TF Agents Python environment, you need to implement the following methods:

from tf_agents.environments import py_environment


class YourGameEnv(py_environment.PyEnvironment):

  def __init__(self):
    """Initialize environment."""

  def action_spec(self):
    """Return action_spec."""

  def observation_spec(self):
    """Return observation_spec."""

  def _reset(self):
    """Return initial_time_step."""

  def _step(self, action):
    """Apply action and return new time_step."""

The most important one is the _step() function, which takes in an action and returns a new time_step object. In the case of the Plane Strike game, you have a game board; when a new strike position comes in, based on the game board condition, the environment figures out:

What the game board should look like next (should the cell change its color to red or yellow, given the hidden plane location?)
What reward should the player receive for that position (hit reward or miss penalty?)
Should the game terminate (did anyone win?)

Add the following code to the _step() function in the planestrike_py_environment.py file:

if self._hit_count == self._plane_size:
    self._episode_ended = True
    return self.reset()

if self._strike_count + 1 == self._max_steps:
    self.reset()
    return ts.termination(
        np.array(self._visible_board, dtype=np.float32), UNFINISHED_GAME_REWARD
    )

self._strike_count += 1
action_x = action // self._board_size
action_y = action % self._board_size
# Hit
if self._hidden_board[action_x][action_y] == HIDDEN_BOARD_CELL_OCCUPIED:
    # Non-repeat move
    if self._visible_board[action_x][action_y] == VISIBLE_BOARD_CELL_UNTRIED:
        self._hit_count += 1
        self._visible_board[action_x][action_y] = VISIBLE_BOARD_CELL_HIT
        # Successful strike
        if self._hit_count == self._plane_size:
            # Game finished
            self._episode_ended = True
            return ts.termination(
                np.array(self._visible_board, dtype=np.float32),
                FINISHED_GAME_REWARD,
            )
        else:
            self._episode_ended = False
            return ts.transition(
                np.array(self._visible_board, dtype=np.float32),
                HIT_REWARD,
                self._discount,
            )
    # Repeat strike
    else:
        self._episode_ended = False
        return ts.transition(
            np.array(self._visible_board, dtype=np.float32),
            REPEAT_STRIKE_REWARD,
            self._discount,
        )
# Miss
else:
    # Unsuccessful strike
    self._episode_ended = False
    self._visible_board[action_x][action_y] = VISIBLE_BOARD_CELL_MISS
    return ts.transition(
        np.array(self._visible_board, dtype=np.float32),
        MISS_REWARD,
        self._discount,
    )
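
To see the reward logic in isolation, here is a pure-Python sketch of the same branches without TF Agents: a hit on an untried plane cell, a repeat strike on a plane cell, and a miss. The reward values and cell encodings are placeholders (the real constants such as HIT_REWARD and MISS_REWARD are defined elsewhere in the environment file), the sketch returns plain (reward, done) tuples instead of ts.transition/ts.termination time steps, and it leaves out the max-steps cutoff.

```python
# Placeholder reward values and cell encodings; the codelab's own constants may differ.
HIT_REWARD = 1.0
MISS_REWARD = 0.0
REPEAT_STRIKE_REWARD = -1.0
FINISHED_GAME_REWARD = 10.0
UNTRIED, HIT, MISS = 0, 1, -1


class ToyPlaneStrike:
    """Minimal stand-in for the environment's _step() branches (no TF Agents specs)."""

    def __init__(self, hidden_board, plane_size=8):
        self.hidden = hidden_board  # 1 marks a plane cell
        self.size = len(hidden_board)
        self.plane_size = plane_size
        self.visible = [[UNTRIED] * self.size for _ in range(self.size)]
        self.hit_count = 0

    def step(self, action):
        """Return (reward, done) for striking the flat cell index `action`."""
        x, y = divmod(action, self.size)
        if self.hidden[x][y] == 1:
            if self.visible[x][y] != UNTRIED:
                return REPEAT_STRIKE_REWARD, False  # repeat strike on a plane cell
            self.hit_count += 1
            self.visible[x][y] = HIT
            if self.hit_count == self.plane_size:
                return FINISHED_GAME_REWARD, True  # every plane cell found
            return HIT_REWARD, False
        self.visible[x][y] = MISS  # misses, including repeated misses, land here
        return MISS_REWARD, False


# A tiny 2x2 board with a 2-cell "plane" in the top row.
env = ToyPlaneStrike([[1, 1], [0, 0]], plane_size=2)
print(env.step(0))  # (1.0, False)   hit
print(env.step(0))  # (-1.0, False)  repeat strike
print(env.step(2))  # (0.0, False)   miss
print(env.step(1))  # (10.0, True)   game finished
```

Penalizing repeat strikes on plane cells (rather than just ignoring them) gives the agent a direct signal not to waste moves, which is the same intent as REPEAT_STRIKE_REWARD in the environment above.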

8. Step 2: Train the game agent with TensorFlow Agents

With the TF Agents environment in place, you can train the game agent. For this codelab, you use a REINFORCE agent. REINFORCE is a policy gradient algorithm in RL. Its basic idea is to adjust the policy neural network parameters based on the reward signals collected during the gameplay, so that the policy network can maximize the return in future plays.
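
The core REINFORCE update is easiest to see on a toy problem. The sketch below applies it to a hypothetical one-step game with four actions where only action 2 pays a reward: a softmax policy over per-action logits is sampled, and each logit moves in the direction (return - baseline) * grad log pi(action). It is plain Python, not TF Agents' implementation; the running-average baseline, learning rate and episode count are choices made for this illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(rewards, episodes=2000, lr=0.1, seed=0):
    """Run REINFORCE on a one-step game and return the learned action probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        ret = rewards[action]  # one-step episode: the return is the single reward
        advantage = ret - baseline
        # For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - probs[i].
        for i in range(len(logits)):
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad_log_prob
        baseline += 0.05 * (ret - baseline)  # running-average baseline reduces variance
    return softmax(logits)


probs = reinforce_bandit([0.0, 0.0, 1.0, 0.0])
print([round(p, 3) for p in probs])  # probability mass concentrates on action 2
```

In Plane Strike the same idea applies per step: the action is one of 64 cells, the logits come from the actor network instead of a plain list, and the return is accumulated over a whole game.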

First, you need to instantiate the training and evaluation environments. Add this code to the train_agent() function in the step2/backend/training.py file:

train_py_env = planestrike_py_environment.PlaneStrikePyEnvironment(
    board_size=BOARD_SIZE, discount=DISCOUNT, max_steps=BOARD_SIZE**2
)
eval_py_env = planestrike_py_environment.PlaneStrikePyEnvironment(
    board_size=BOARD_SIZE, discount=DISCOUNT, max_steps=BOARD_SIZE**2
)

train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)

Next, create the REINFORCE agent that you are going to train. Add this code right below the code above:

actor_net = tfa.networks.Sequential(
    [
        tfa.keras_layers.InnerReshape([BOARD_SIZE, BOARD_SIZE], [BOARD_SIZE**2]),
        tf.keras.layers.Dense(FC_LAYER_PARAMS, activation="relu"),
        tf.keras.layers.Dense(BOARD_SIZE**2),
        tf.keras.layers.Lambda(lambda t: tfp.distributions.Categorical(logits=t)),
    ],
    input_spec=train_py_env.observation_spec(),
)

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

train_step_counter = tf.Variable(0)

tf_agent = reinforce_agent.ReinforceAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    actor_network=actor_net,
    optimizer=optimizer,
    normalize_returns=True,
    train_step_counter=train_step_counter,
)
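
Setting normalize_returns=True tells the agent to rescale the returns it trains on, which typically stabilizes training by keeping the size of each policy-gradient step less dependent on the raw reward scale. A rough pure-Python illustration of the idea (subtract the batch mean, divide by the batch standard deviation) is below; TF Agents' exact normalization, such as how it guards against a zero standard deviation, may differ.

```python
import math


def normalize_returns(returns, eps=1e-8):
    """Standardize a batch of returns to zero mean and unit variance."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in returns]


print(normalize_returns([2.0, 4.0, 6.0]))  # roughly [-1.22, 0.0, 1.22]
```

After normalization, episodes that did better than the batch average get a positive weight and worse ones a negative weight, so the update pushes the policy toward above-average play.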

Lastly, train the agent in the training loop. In each iteration, you first collect a few episodes of gameplay into a buffer and then train the agent with the buffered data. Add this code to the training loop in the train_agent() function in the step2/backend/training.py file:

# Collect a few episodes using collect_policy and save to the replay buffer.
collect_episode(
    train_py_env,
    collect_policy,
    COLLECT_EPISODES_PER_ITERATION,
    replay_buffer_observer,
)

# Use data from the buffer and update the agent's network.
iterator = iter(replay_buffer.as_dataset(sample_batch_size=1))
trajectories, _ = next(iterator)
tf_agent.train(experience=trajectories)
replay_buffer.clear()

Now you can kick off the training. In your terminal, go to the step2/backend folder on your computer and run:

python training.py

Training takes 8-12 hours to finish, depending on your hardware configuration (you don't have to finish the whole training yourself, since a pretrained model is provided in step3). In the meantime, you can monitor the progress with TensorBoard. Open a new terminal, go to the step2/backend folder on your computer and run:

tensorboard --logdir tf_agents_log/

tf_agents_log is the folder that contains the training log. A sample training run looks like this:

You can see that the average episode length decreases and the average return increases as training progresses. Intuitively, if the agent is smarter and makes better predictions, the game becomes shorter and the agent gathers more rewards. This makes sense because the agent wants to finish the game in fewer steps to avoid heavy reward discounting in the later steps.
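
A quick calculation shows why shorter games pay off. With a per-step discount gamma, the return of an episode is G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..., so the same total reward counts for less when it arrives later. The discount of 0.9 and the reward sequences below are made-up values for illustration, not the codelab's DISCOUNT or reward constants.

```python
def discounted_return(rewards, gamma):
    """G = sum over t of gamma**t * r_t for one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


GAMMA = 0.9  # assumed discount for illustration only
fast_game = [1.0] * 8       # 8 strikes, 8 hits
slow_game = [1.0, 0.0] * 8  # 16 strikes: the same 8 hits with a miss after each

print(round(discounted_return(fast_game, GAMMA), 3))  # 5.695
print(round(discounted_return(slow_game, GAMMA), 3))  # 4.288
```

Both games collect the same undiscounted reward, but the faster game earns a noticeably higher discounted return, which is the pressure that shortens the episode length in the training curves.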

After the training is complete, the trained model is exported to the policy_model folder.

9. Step 3: Deploy the trained model with TensorFlow Serving

Now that you have trained the game agent, you can deploy it with TensorFlow Serving.

In your terminal, go to the step3 folder on your computer and start TensorFlow Serving with Docker:

docker run -t --rm -p 8501:8501 -p 8500:8500 -v "$(pwd)/backend/policy_model:/models/policy_model" -e MODEL_NAME=policy_model tensorflow/serving

Docker automatically downloads the TensorFlow Serving image first, which may take a minute. Afterward, TensorFlow Serving should start. The log should look like this:

2022-05-30 02:38:54.147771: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config:  model_name: policy_model model_base_path: /models/policy_model
2022-05-30 02:38:54.148222: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-05-30 02:38:54.148273: I tensorflow_serving/model_servers/server_core.cc:591]  (Re-)adding model: policy_model
2022-05-30 02:38:54.262684: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: policy_model version: 123}
2022-05-30 02:38:54.262768: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: policy_model version: 123}
2022-05-30 02:38:54.262787: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: policy_model version: 123}
2022-05-30 02:38:54.265010: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /models/policy_model/123
2022-05-30 02:38:54.277811: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-05-30 02:38:54.278116: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /models/policy_model/123
2022-05-30 02:38:54.280229: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-30 02:38:54.332352: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2022-05-30 02:38:54.337000: I external/org_tensorflow/tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2193480000 Hz
2022-05-30 02:38:54.402803: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /models/policy_model/123
2022-05-30 02:38:54.410707: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 145695 microseconds.
2022-05-30 02:38:54.412726: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /models/policy_model/123/assets.extra/tf_serving_warmup_requests
2022-05-30 02:38:54.417277: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: policy_model version: 123}
2022-05-30 02:38:54.419846: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2022-05-30 02:38:54.420066: I tensorflow_serving/model_servers/server.cc:367] Profiler service is enabled
2022-05-30 02:38:54.428339: I tensorflow_serving/model_servers/server.cc:393] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2022-05-30 02:38:54.431620: I tensorflow_serving/model_servers/server.cc:414] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...

You can send a sample request to the endpoint to make sure it is working as expected:

curl -d '{"signature_name":"action","instances":[{"0/discount":0.0,"0/observation":[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,     0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],"0/reward":0.0,"0/step_type":0}]}'     -X POST http://localhost:8501/v1/models/policy_model:predict

The endpoint returns a predicted position of 45, which is (5, 5), near the center of the board (for the curious, you can try to work out why the center of the board is a good guess for the first strike position).

{
    "predictions": [45]
}
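
If you want to query the endpoint from Python rather than curl, a minimal sketch using only the standard library follows. It rebuilds the same request body as the curl example (signature action, a flattened 64-cell observation, discount and reward 0.0, step_type 0) and decodes the first prediction into a (row, col) cell. The helper names build_request_body and predict_cell are hypothetical, and the URL assumes the Docker container above is running locally.

```python
import json
import urllib.request

BOARD_SIZE = 8  # assumed 8x8 board
PREDICT_URL = "http://localhost:8501/v1/models/policy_model:predict"


def build_request_body(board):
    """Build the TF Serving REST body used in the curl example from an 8x8 board."""
    flat = [float(cell) for row in board for cell in row]  # row-major flattening
    return {
        "signature_name": "action",
        "instances": [
            {
                "0/discount": 0.0,
                "0/observation": [flat],
                "0/reward": 0.0,
                "0/step_type": 0,
            }
        ],
    }


def predict_cell(board, url=PREDICT_URL, timeout=5.0):
    """POST the board to TensorFlow Serving and return the predicted (row, col)."""
    data = json.dumps(build_request_body(board)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        action = json.load(response)["predictions"][0]
    return divmod(action, BOARD_SIZE)
```

With the server running, predict_cell([[0.0] * 8 for _ in range(8)]) sends the same empty board as the curl command, so a prediction of 45 comes back as (5, 5).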

That's it! You have successfully built a backend to predict the next strike position for the NPC agent.

10. Step 4: Create the Flutter app for Android and iOS

The backend is ready. You can start sending requests to it to retrieve strike position predictions from the Flutter app.

First, you need to define a class that wraps the inputs to send. Add this code to the step4/frontend/lib/game_agent.dart file:

class Inputs {
  final List<double> _boardState;
  Inputs(this._boardState);

  Map<String, dynamic> toJson() {
    final Map<String, dynamic> data = <String, dynamic>{};
    data['0/discount'] = [0.0];
    data['0/observation'] = [_boardState];
    data['0/reward'] = [0.0];
    data['0/step_type'] = [0];
    return data;
  }
}

Now you can send the request to TensorFlow Serving to make predictions.

Add this code to the predict() function in the step4/frontend/lib/game_agent.dart file:

var flattenedBoardState = boardState.expand((i) => i).toList();
final response = await http.post(
  Uri.parse('http://$server:8501/v1/models/policy_model:predict'),
  body: jsonEncode(<String, dynamic>{
    'signature_name': 'action',
    'instances': [Inputs(flattenedBoardState)]
  }),
);

if (response.statusCode == 200) {
  var output = List<int>.from(
      jsonDecode(response.body)['predictions'] as List<dynamic>);
  return output[0];
} else {
  throw Exception('Error response');
}

Once the app receives the response from the backend, you update the game UI to reflect the game progress.

Add this code to the _gridItemTapped() function in the step4/frontend/lib/main.dart file:

int agentAction =
    await _policyGradientAgent.predict(_playerVisibleBoardState);
_agentActionX = agentAction ~/ _boardSize;
_agentActionY = agentAction % _boardSize;
if (_playerHiddenBoardState[_agentActionX][_agentActionY] ==
    hiddenBoardCellOccupied) {
  // Non-repeat move
  if (_playerVisibleBoardState[_agentActionX][_agentActionY] ==
      visibleBoardCellUntried) {
    _agentHitCount++;
  }
  _playerVisibleBoardState[_agentActionX][_agentActionY] =
      visibleBoardCellHit;
} else {
  _playerVisibleBoardState[_agentActionX][_agentActionY] =
      visibleBoardCellMiss;
}
setState(() {});

Run it

Click Start debugging and then wait for the app to load.
Tap any cell in the agent's board to start the game.

11. Step 5: Enable the Flutter app for the desktop platforms

In addition to Android and iOS, Flutter also supports desktop platforms including Linux, Mac and Windows.

Linux

Make sure Linux is selected as the target device in the status bar of VS Code.
Click Start debugging and then wait for the app to load.
Click any cell in the agent's board to start the game.

Mac

For Mac, you need to set up appropriate entitlements since the app will send HTTP requests to the backend. Please refer to Entitlements and the App Sandbox for more details.

Add this code inside the <dict> element of both step4/frontend/macos/Runner/DebugProfile.entitlements and step4/frontend/macos/Runner/Release.entitlements:

<key>com.apple.security.network.client</key>
<true/>

Make sure macOS is selected as the target device in the status bar of VS Code.
Click Start debugging and then wait for the app to load.
Click any cell in the agent's board to start the game.

Windows

Make sure Windows is selected as the target device in the status bar of VS Code.
Click Start debugging and then wait for the app to load.
Click any cell in the agent's board to start the game.

12. Step 6: Enable the Flutter app for the web platform

One more thing you can do is to add web support to the Flutter app. By default the web platform is automatically enabled for Flutter apps, so all you need to do is to launch it.

Make sure Chrome is selected as the target device in the status bar of VS Code.
Click Start debugging and then wait for the app to load in the Chrome browser.
Click any cell in the agent's board to start the game.

13. Congratulations

You built a board game app with an ML-powered agent to play against the human player!