1. Before you begin
The breakthroughs of AlphaGo and AlphaStar demonstrated the potential of using machine learning to build superhuman-level game agents. Building a small ML-powered game is a fun exercise to pick up the skills you need to create powerful game agents.
In this codelab, you learn how to build a board game using:
- TensorFlow Agents to train a game agent with reinforcement learning
- TensorFlow Serving to serve the model
- Flutter to create a cross-platform board game app
Prerequisites
- Basic knowledge of Flutter development with Dart
- Basic knowledge of machine learning with TensorFlow, such as training versus deployment
- Basic knowledge of Python, terminals and Docker
What you'll learn
- How to train a Non-Player Character (NPC) agent using TensorFlow Agents
- How to serve the trained model using TensorFlow Serving
- How to build a cross-platform Flutter board game
What you'll need
- Flutter SDK
- Android and iOS setup for Flutter
- Desktop setup for Flutter
- Web setup for Flutter
- Visual Studio Code (VS Code) setup for Flutter and Dart
- Docker
- Bash
- Python 3.7+
2. The Plane Strike Game
The game you build in this codelab is called 'Plane Strike', a small 2-player board game that resembles the board game 'Battleship'. The rules are very simple:
- The human player plays against an NPC agent trained with machine learning. The human player can start the game by tapping any cell in the agent's board.
- At the beginning of the game, the human player and the agent each have a 'plane' object (8 green cells that form a 'plane', as you can see in the human player's board in the animation below) on their own boards; these 'planes' are randomly placed, visible only to the owner of the board, and hidden from the opponent.
- The human player and the agent take turns striking one cell of each other's board. The human player can tap any cell in the agent's board, while the agent automatically makes its choice based on the prediction of a machine learning model. The attempted cell turns red if it is a 'plane' cell (a 'hit'); otherwise it turns yellow (a 'miss').
- Whoever achieves 8 red cells first wins the game; then the game is restarted with fresh boards.
Here is a sample gameplay of the game:
3. Set up your Flutter development environment
For Flutter development, you need two pieces of software to complete this codelab—the Flutter SDK and an editor.
You can run the codelab using any of these devices:
- The iOS simulator (requires installing Xcode tools).
- The Android Emulator (requires setup in Android Studio).
- A browser (Chrome is required for debugging).
- As a Windows, Linux, or macOS desktop application. You must develop on the platform where you plan to deploy. So, if you want to develop a Windows desktop app, you must develop on Windows to access the appropriate build chain. There are operating system-specific requirements that are covered in detail on docs.flutter.dev/desktop.
4. Get set up
To download the code for this codelab:
- Navigate to the GitHub repository for this codelab.
- Click Code > Download zip to download all the code for this codelab.
- Unzip the downloaded zip file to unpack a codelabs-main root folder with all the resources that you need.
For this codelab, you only need the files in the tfagents-flutter/ subdirectory in the repository, which contains multiple folders:
- The step0 to step6 folders contain the starter code that you build upon for each step in this codelab.
- The finished folder contains the completed code for the finished sample app.
- Each folder contains a backend subfolder, which includes the backend code, and a frontend subfolder, which includes the Flutter frontend code.
5. Download the dependencies for the project
Backend
Open your terminal and go to the tfagents-flutter subfolder. Run the following:
pip install -r requirements.txt
Frontend
- In VS Code, click File > Open folder and then select the step0 folder from the source code that you downloaded earlier.
- Open the step0/frontend/lib/main.dart file. If a VS Code dialog appears that prompts you to download the required packages for the starter app, click Get packages.
- If you don't see this dialog, open your terminal and then run the flutter pub get command in the step0/frontend folder.
6. Step 0: Run the starter app
- Open the step0/frontend/lib/main.dart file in VS Code and ensure that the Android Emulator or iOS Simulator is properly set up and appears in the status bar.
For example, here's what you see when you use a Pixel 5 with the Android Emulator:
Here's what you see when you use an iPhone 13 with the iOS Simulator:
- Click Start debugging.
Run and explore the app
The app should launch on your Android Emulator or iOS Simulator. The UI is pretty straightforward. There are 2 game boards; a human player can tap any cell in the agent's board at the top as a strike position. You will train a smart agent to automatically predict where to strike based on the human player's board.
Under the hood, the Flutter app will send the human player's current board to the backend, which runs a reinforcement learning model and returns a predicted cell position to strike next. The frontend will display the result in the UI after receiving the response.
If you click any cell in the agent's board now, nothing happens because the app can't communicate with the backend yet.
7. Step 1: Create a TensorFlow Agents Python environment
The primary goal of this codelab is to design an agent that learns by interacting with an environment. While the Plane Strike game is relatively simple and it is possible to handcraft rules for the NPC agent, you use reinforcement learning to train an agent so that you learn the skills and can easily build agents for other games in the future.
In the standard Reinforcement Learning (RL) setting, the agent receives an observation at every time step and chooses an action. The action is applied to the environment and the environment returns a reward and a new observation. The agent trains a policy to choose actions that maximize the sum of rewards, also known as return. By playing the game many times, the agent learns the patterns and hones its skills to master the game. To formulate the Plane Strike game as an RL problem, think of the board state as the observation, a strike position as the action, and the hit/miss signal as the reward.
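To make that mapping concrete, here is a minimal sketch of how the action and observation could be expressed as TF Agents specs. The board size matches the 8x8 board used in this codelab, but the spec names and value ranges are illustrative assumptions; the real specs are defined in the environment file you edit below:

# Illustrative sketch only -- the real specs live in the environment you implement below.
import numpy as np
from tf_agents.specs import array_spec

BOARD_SIZE = 8  # Plane Strike uses an 8x8 board

# Action: the index of the cell to strike, flattened row-major into 0..63.
action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=BOARD_SIZE**2 - 1, name="strike_position"
)

# Observation: the striker's view of the opponent's board (hit / miss / untried per cell).
# The value range here is an assumption for illustration.
observation_spec = array_spec.BoundedArraySpec(
    shape=(BOARD_SIZE, BOARD_SIZE), dtype=np.float32, minimum=-1, maximum=1, name="board"
)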
To train the NPC agent, you leverage TensorFlow Agents, which is a reliable, scalable and easy-to-use reinforcement learning library for TensorFlow.
TF Agents is great for reinforcement learning because it comes with an extensive set of codelabs, examples, and documentation to get you started. You can use TF Agents to solve realistic and complex RL problems at scale and develop new RL algorithms quickly. You can easily swap between different agents and algorithms for experimentation. It is also well tested and easy to configure.
Many prebuilt game environments are implemented in OpenAI Gym (for example, Atari games), MuJoCo, and others, and TF Agents can easily leverage them. But because the Plane Strike game is completely custom, you need to implement a new environment from scratch first.
To implement a TF Agents Python environment, you need to implement the following methods:
class YourGameEnv(py_environment.PyEnvironment):

  def __init__(self):
    """Initialize environment."""

  def action_spec(self):
    """Return action_spec."""

  def observation_spec(self):
    """Return observation_spec."""

  def _reset(self):
    """Return initial_time_step."""

  def _step(self, action):
    """Apply action and return new time_step."""
The most important one is the _step() function, which takes in an action and returns a new time_step object. In the case of the Plane Strike game, you have a game board; when a new strike position comes in, based on the game board condition, the environment figures out:
- What the game board should look like next (should the cell change its color to red or yellow, given the hidden plane location?)
- What reward should the player receive for that position (hit reward or miss penalty?)
- Should the game terminate (did anyone win?)
- Add the following code to the _step() function in the planestrike_py_environment.py file:
if self._hit_count == self._plane_size:
    self._episode_ended = True
    return self.reset()

if self._strike_count + 1 == self._max_steps:
    self.reset()
    return ts.termination(
        np.array(self._visible_board, dtype=np.float32), UNFINISHED_GAME_REWARD
    )

self._strike_count += 1
action_x = action // self._board_size
action_y = action % self._board_size
# Hit
if self._hidden_board[action_x][action_y] == HIDDEN_BOARD_CELL_OCCUPIED:
    # Non-repeat move
    if self._visible_board[action_x][action_y] == VISIBLE_BOARD_CELL_UNTRIED:
        self._hit_count += 1
        self._visible_board[action_x][action_y] = VISIBLE_BOARD_CELL_HIT
        # Successful strike
        if self._hit_count == self._plane_size:
            # Game finished
            self._episode_ended = True
            return ts.termination(
                np.array(self._visible_board, dtype=np.float32),
                FINISHED_GAME_REWARD,
            )
        else:
            self._episode_ended = False
            return ts.transition(
                np.array(self._visible_board, dtype=np.float32),
                HIT_REWARD,
                self._discount,
            )
    # Repeat strike
    else:
        self._episode_ended = False
        return ts.transition(
            np.array(self._visible_board, dtype=np.float32),
            REPEAT_STRIKE_REWARD,
            self._discount,
        )
# Miss
else:
    # Unsuccessful strike
    self._episode_ended = False
    self._visible_board[action_x][action_y] = VISIBLE_BOARD_CELL_MISS
    return ts.transition(
        np.array(self._visible_board, dtype=np.float32),
        MISS_REWARD,
        self._discount,
    )
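With _step() in place, it can help to sanity-check the environment before training. TF Agents provides a small validation utility that runs a few random episodes against the declared specs; a minimal sketch, assuming constructor arguments similar to the ones used in the next step, looks like this:

# Optional sanity check: run a few random episodes against the declared specs.
from tf_agents.environments import utils

import planestrike_py_environment

env = planestrike_py_environment.PlaneStrikePyEnvironment(
    board_size=8, discount=0.9, max_steps=64  # values assumed for illustration
)
utils.validate_py_environment(env, episodes=5)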
8. Step 2: Train the game agent with TensorFlow Agents
With the TF Agents environment in place, you can train the game agent. For this codelab, you use a REINFORCE agent. REINFORCE is a policy gradient algorithm in RL. Its basic idea is to adjust the policy neural network parameters based on the reward signals collected during the gameplay, so that the policy network can maximize the return in future plays.
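As a rough sketch of that idea (not the TF Agents internals, which the reinforce_agent module handles for you), the per-episode loss weights the log-probability of each chosen action by the return that followed it:

# Conceptual sketch of the REINFORCE objective for a single episode.
# logits:  [T, num_actions] policy network outputs
# actions: [T]              actions actually taken
# returns: [T]              discounted sum of future rewards from each step
import tensorflow as tf

def reinforce_loss(logits, actions, returns):
    log_probs = tf.nn.log_softmax(logits)
    taken = tf.gather(log_probs, actions, batch_dims=1)  # log pi(a_t | s_t)
    return -tf.reduce_sum(taken * returns)  # minimizing this pushes up rewarded actions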
- First, you need to instantiate the training and evaluation environments. Add this code to the train_agent() function in the step2/backend/training.py file:
train_py_env = planestrike_py_environment.PlaneStrikePyEnvironment(
    board_size=BOARD_SIZE, discount=DISCOUNT, max_steps=BOARD_SIZE**2
)
eval_py_env = planestrike_py_environment.PlaneStrikePyEnvironment(
    board_size=BOARD_SIZE, discount=DISCOUNT, max_steps=BOARD_SIZE**2
)

train_env = tf_py_environment.TFPyEnvironment(train_py_env)
eval_env = tf_py_environment.TFPyEnvironment(eval_py_env)
- Next, you need to create a reinforcement learning agent that is going to be trained. In this codelab, you use the REINFORCE agent, which is a policy-based agent. Add this code right below the code above:
actor_net = tfa.networks.Sequential(
    [
        tfa.keras_layers.InnerReshape([BOARD_SIZE, BOARD_SIZE], [BOARD_SIZE**2]),
        tf.keras.layers.Dense(FC_LAYER_PARAMS, activation="relu"),
        tf.keras.layers.Dense(BOARD_SIZE**2),
        tf.keras.layers.Lambda(lambda t: tfp.distributions.Categorical(logits=t)),
    ],
    input_spec=train_py_env.observation_spec(),
)

optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

train_step_counter = tf.Variable(0)

tf_agent = reinforce_agent.ReinforceAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    actor_network=actor_net,
    optimizer=optimizer,
    normalize_returns=True,
    train_step_counter=train_step_counter,
)
- Lastly, train the agent in the training loop. In the loop, you first collect a few episodes of gameplay into a buffer and then train the agent with the buffered data. (A sketch of how that replay buffer is typically set up appears after these steps.) Add this code to the train_agent() function in the step2/backend/training.py file:
# Collect a few episodes using collect_policy and save to the replay buffer.
collect_episode(
    train_py_env,
    collect_policy,
    COLLECT_EPISODES_PER_ITERATION,
    replay_buffer_observer,
)

# Use data from the buffer and update the agent's network.
iterator = iter(replay_buffer.as_dataset(sample_batch_size=1))
trajectories, _ = next(iterator)
tf_agent.train(experience=trajectories)

replay_buffer.clear()
- Now you can kick off the training. In your terminal, go to the step2/backend folder on your computer and run:
python training.py
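The training loop in the previous step relies on a replay buffer, a collect policy, and an observer that the full training.py script sets up elsewhere. Here is a rough sketch of how those pieces are typically wired together in TF Agents; the buffer capacity and the observer wiring are assumptions for illustration, not the codelab's exact code:

# Illustrative setup for the pieces used by the training loop above.
from tf_agents.replay_buffers import tf_uniform_replay_buffer

tf_agent.initialize()

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=tf_agent.collect_data_spec,
    batch_size=1,        # a single training environment
    max_length=10000,    # capacity is an assumption
)

# The observer appends every collected trajectory to the buffer;
# the collect policy is the agent's exploration policy.
replay_buffer_observer = replay_buffer.add_batch
collect_policy = tf_agent.collect_policy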
It takes 8-12 hours to finish training, depending on your hardware configuration (you don't have to finish the whole training yourself, because a pretrained model is provided in step3). In the meantime, you can monitor the progress with TensorBoard. Open a new terminal, go to the step2/backend folder on your computer and run:
tensorboard --logdir tf_agents_log/
tf_agents_log is the folder that contains the training log. A sample training run looks like this:
You can see that the average episode length decreases and the average return increases as the training progresses. Intuitively, if the agent is smarter and makes better predictions, the game ends sooner and the agent gathers more reward. This makes sense because the agent wants to finish the game in as few steps as possible to minimize heavy reward discounting in later steps.
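The average return curve is simply the policy evaluated over a handful of games at regular intervals. A typical evaluation helper, along the lines of the standard TF Agents tutorials (the codelab's own evaluation code may differ in details), looks like this:

# Illustrative helper: average undiscounted return over a few evaluation episodes.
def compute_avg_return(environment, policy, num_episodes=10):
    total_return = 0.0
    for _ in range(num_episodes):
        time_step = environment.reset()
        episode_return = 0.0
        while not time_step.is_last():
            action_step = policy.action(time_step)
            time_step = environment.step(action_step.action)
            episode_return += time_step.reward
        total_return += episode_return
    return total_return / num_episodes

# Example usage during training (eval_env and tf_agent come from the earlier steps):
# avg_return = compute_avg_return(eval_env, tf_agent.policy, num_episodes=10)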
After the training is complete, the trained model is exported to the policy_model folder.
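Exporting a TF Agents policy as a SavedModel is typically done with PolicySaver; a minimal sketch of that export step (the version subfolder 123 matches the TensorFlow Serving log you will see in the next step) looks like this:

# Sketch: save the trained policy as a SavedModel that TensorFlow Serving can load.
from tf_agents.policies import policy_saver

saver = policy_saver.PolicySaver(tf_agent.policy)
saver.save("policy_model/123")  # version subfolder expected by TensorFlow Serving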
9. Step 3: Deploy the trained model with TensorFlow Serving
Now that you have trained the game agent, you can deploy it with TensorFlow Serving.
- In your terminal, go to the step3/backend folder on your computer and start TensorFlow Serving with Docker:
docker run -t --rm -p 8501:8501 -p 8500:8500 -v "$(pwd)/backend/policy_model:/models/policy_model" -e MODEL_NAME=policy_model tensorflow/serving
Docker automatically downloads the TensorFlow Serving image first, which takes a minute. Afterward, TensorFlow Serving should start. The log should look like this code snippet:
2022-05-30 02:38:54.147771: I tensorflow_serving/model_servers/server.cc:89] Building single TensorFlow model file config: model_name: policy_model model_base_path: /models/policy_model
2022-05-30 02:38:54.148222: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2022-05-30 02:38:54.148273: I tensorflow_serving/model_servers/server_core.cc:591] (Re-)adding model: policy_model
2022-05-30 02:38:54.262684: I tensorflow_serving/core/basic_manager.cc:740] Successfully reserved resources to load servable {name: policy_model version: 123}
2022-05-30 02:38:54.262768: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: policy_model version: 123}
2022-05-30 02:38:54.262787: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: policy_model version: 123}
2022-05-30 02:38:54.265010: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:38] Reading SavedModel from: /models/policy_model/123
2022-05-30 02:38:54.277811: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:90] Reading meta graph with tags { serve }
2022-05-30 02:38:54.278116: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /models/policy_model/123
2022-05-30 02:38:54.280229: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-30 02:38:54.332352: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:206] Restoring SavedModel bundle.
2022-05-30 02:38:54.337000: I external/org_tensorflow/tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2193480000 Hz
2022-05-30 02:38:54.402803: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:190] Running initialization op on SavedModel bundle at path: /models/policy_model/123
2022-05-30 02:38:54.410707: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:277] SavedModel load for tags { serve }; Status: success: OK. Took 145695 microseconds.
2022-05-30 02:38:54.412726: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:59] No warmup data file found at /models/policy_model/123/assets.extra/tf_serving_warmup_requests
2022-05-30 02:38:54.417277: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: policy_model version: 123}
2022-05-30 02:38:54.419846: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2022-05-30 02:38:54.420066: I tensorflow_serving/model_servers/server.cc:367] Profiler service is enabled
2022-05-30 02:38:54.428339: I tensorflow_serving/model_servers/server.cc:393] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2022-05-30 02:38:54.431620: I tensorflow_serving/model_servers/server.cc:414] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...
You can send a sample request to the endpoint to make sure it is working as expected:
curl -d '{"signature_name":"action","instances":[{"0/discount":0.0,"0/observation":[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]],"0/reward":0.0,"0/step_type":0}]}' -X POST http://localhost:8501/v1/models/policy_model:predict
The endpoint will return a predicted position of 45, which is (5, 5), near the center of the board (for the curious, you can try to work out why the center of the board is a good guess for the first strike position):
{ "predictions": [45] }
That's it! You have successfully built a backend that predicts the next strike position for the NPC agent.
10. Step 4: Create the Flutter app for Android and iOS
The backend is ready, so you can start sending requests to it from the Flutter app to retrieve strike position predictions.
- First, you need to define a class that wraps the inputs to send. Add this code to the step4/frontend/lib/game_agent.dart file:
class Inputs {
  final List<double> _boardState;
  Inputs(this._boardState);

  Map<String, dynamic> toJson() {
    final Map<String, dynamic> data = <String, dynamic>{};
    data['0/discount'] = [0.0];
    data['0/observation'] = [_boardState];
    data['0/reward'] = [0.0];
    data['0/step_type'] = [0];
    return data;
  }
}
Now you can send the request to TensorFlow Serving to make predictions.
- Add this code to the predict() function in the step4/frontend/lib/game_agent.dart file:
var flattenedBoardState = boardState.expand((i) => i).toList();
final response = await http.post(
  Uri.parse('http://$server:8501/v1/models/policy_model:predict'),
  body: jsonEncode(<String, dynamic>{
    'signature_name': 'action',
    'instances': [Inputs(flattenedBoardState)]
  }),
);

if (response.statusCode == 200) {
  var output = List<int>.from(
      jsonDecode(response.body)['predictions'] as List<dynamic>);
  return output[0];
} else {
  throw Exception('Error response');
}
Once the app receives the response from the backend, you update the game UI to reflect the game progress.
- Add this code to the _gridItemTapped() function in the step4/frontend/lib/main.dart file:
int agentAction =
    await _policyGradientAgent.predict(_playerVisibleBoardState);
_agentActionX = agentAction ~/ _boardSize;
_agentActionY = agentAction % _boardSize;
if (_playerHiddenBoardState[_agentActionX][_agentActionY] ==
    hiddenBoardCellOccupied) {
  // Non-repeat move
  if (_playerVisibleBoardState[_agentActionX][_agentActionY] ==
      visibleBoardCellUntried) {
    _agentHitCount++;
  }
  _playerVisibleBoardState[_agentActionX][_agentActionY] =
      visibleBoardCellHit;
} else {
  _playerVisibleBoardState[_agentActionX][_agentActionY] =
      visibleBoardCellMiss;
}
setState(() {});
Run it
- Click Start debugging and then wait for the app to load.
- Tap any cell in the agent's board to start the game.
11. Step 5: Enable the Flutter app for the desktop platforms
In addition to Android and iOS, Flutter also supports desktop platforms including Linux, Mac and Windows.
Linux
- Make sure the target device in the VS Code status bar is set to the Linux desktop target.
- Click Start debugging and then wait for the app to load.
- Click any cell in the agent's board to start the game.
Mac
- For Mac, you need to set up appropriate entitlements since the app will send HTTP requests to the backend. Please refer to Entitlements and the App Sandbox for more details.
Add this code to both step4/frontend/macOS/Runner/DebugProfile.entitlements and step4/frontend/macOS/Runner/Release.entitlements:
<key>com.apple.security.network.client</key>
<true/>
- Make sure the target device in the VS Code status bar is set to the macOS desktop target.
- Click Start debugging and then wait for the app to load.
- Click any cell in the agent's board to start the game.
Windows
- Make sure the target device in the VS Code status bar is set to the Windows desktop target.
- Click Start debugging and then wait for the app to load.
- Click any cell in the agent's board to start the game.
12. Step 6: Enable the Flutter app for the web platform
One more thing you can do is add web support to the Flutter app. The web platform is enabled by default for Flutter apps, so all you need to do is launch it.
- Make sure the target device in the VS Code status bar is set to Chrome.
- Click Start debugging and then wait for the app to load in the Chrome browser.
- Click any cell in the agent's board to start the game.
13. Congratulations
You built a board game app with an ML-powered agent that plays against a human player!