TensorFlow is a multipurpose machine learning framework. TensorFlow can be used anywhere from training huge models across clusters in the cloud, to running models locally on an embedded system like your phone.

What you will learn

What you will build

A simple app that runs a TensorFlow image recognition program on your photos, to identify flowers.

CC-BY by Felipe Venâncio

Most of this codelab will be using the terminal. Open it now.

Install TensorFlow

Before you can begin the tutorial, you need to install several pieces of software.

If you have a working Python installation, this can be as simple as:

pip install --upgrade "tensorflow==1.7.*"
pip install Pillow
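
To confirm that the install worked, you can print the TensorFlow version from Python (an optional check; this is just a quick sketch, not part of the codelab scripts):

import tensorflow as tf
print(tf.__version__)  # with the pin above, expect something like 1.7.x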

If you have the git repository from the first codelab

This codelab uses files generated during the TensorFlow for Poets 1 codelab. If you have not completed that codelab, we recommend you take a look. If you prefer not to, instructions for downloading the missing files are given in the next subsection.


In TensorFlow for Poets 1, you also cloned the relevant files for this codelab. Ensure that it is your current working directory, check out the branch, and check the contents, as follows:

cd tensorflow-for-poets-2
ls

This directory should contain four other subdirectories. We will be using three of them.

ls tf_files/
flower_photos/ retrained_graph.pb  retrained_labels.txt

Otherwise (if you don't have the files from the first Codelab)

Clone the Git repository

The following command will clone the Git repository containing the files required for this codelab:

git clone https://github.com/googlecodelabs/tensorflow-for-poets-2

Now cd into the directory of the clone you just created. That's where you will be working for the rest of this codelab:

cd tensorflow-for-poets-2

The repo contains four directories: android/, ios/, scripts/, and tf_files/

Check out the files from the end_of_first_codelab branch

git checkout end_of_first_codelab

ls tf_files
flower_photos/ retrained_graph.pb  retrained_labels.txt

Test the model

Next, verify that the model is producing reasonable results before you start modifying it.

The scripts/ directory contains a simple command line script, label_image.py, to test the network. Now we'll test label_image.py on this picture of some daisies:

flower_photos/daisy/3475870145_685a19116d.jpg

Image CC-BY, by Fabrizio Sciami

Now test the model. If you are using a different architecture you will need to set the "--input_size" flag.

python -m scripts.label_image \
  --graph=tf_files/retrained_graph.pb  \
  --image=tf_files/flower_photos/daisy/3475870145_685a19116d.jpg

The script will print the probability the model has assigned to each flower type. Something like this:

Evaluation time (1-image): 0.140s

daisy 0.7361
dandelion 0.242222
tulips 0.0185161
roses 0.0031544
sunflowers 8.00981e-06

This should hopefully produce a sensible top label for your example. You'll be using this command to make sure you're still getting sensible results as you do further processing on the model file to prepare it for use in a mobile app.

Using the TFLite converter

Mobile devices have significant limitations, so any pre-processing that can be done to reduce an app's footprint is worth considering. With TFLite, a new graph converter is included with the TensorFlow installation. This program is called the "TensorFlow Lite Optimizing Converter", or tflite_convert.

It is installed as a command line script along with TensorFlow, so you can easily access it. To check that tflite_convert is correctly installed on your machine, try printing its help with the following command:

tflite_convert --help

We will use tflite_convert to optimize our model and convert it to the TFLite format. tflite_convert can do this in a single step, but we will do it in two so that we can try out the optimized model in between.

Convert the model to TFLite format

While tflite_convert has advanced capabilities for dealing with quantized graphs, it also applies several optimizations that are useful for our graph (which does not use quantization). These include pruning unused graph nodes and improving performance by fusing operations into more efficient composite operations.

The pruning is especially helpful because TFLite does not yet support training operations, so these should not be included in the graph.

While tflite_convert can be used to optimize regular graph.pb files, TFLite uses a different serialization format from regular TensorFlow. TensorFlow uses Protocol Buffers, while TFLite uses FlatBuffers.

The primary benefit of FlatBuffers is that they can be memory-mapped and used directly from disk without being loaded and parsed. This gives much faster startup times, and gives the operating system the option of loading and unloading the required pages of the model file, instead of killing the app when it is low on memory.

We can create the TFLite FlatBuffer with the following command:

IMAGE_SIZE=224
tflite_convert \
  --graph_def_file=tf_files/retrained_graph.pb \
  --output_file=tf_files/optimized_graph.lite \
  --input_format=TENSORFLOW_GRAPHDEF \
  --output_format=TFLITE \
  --input_shape=1,${IMAGE_SIZE},${IMAGE_SIZE},3 \
  --input_array=input \
  --output_array=final_result \
  --inference_type=FLOAT \
  --input_data_type=FLOAT

This should output an optimized_graph.lite file in your tf_files directory.
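
If your TensorFlow build includes the TFLite Python interpreter (exposed as tf.lite.Interpreter in recent releases; some 1.x builds expose it under tf.contrib.lite instead), you can give the converted model a quick sanity check before moving to the app. The following is an optional, illustrative sketch, not one of the codelab scripts:

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the converted FlatBuffer model and allocate its tensors.
interpreter = tf.lite.Interpreter(model_path="tf_files/optimized_graph.lite")
interpreter.allocate_tensors()
input_index = interpreter.get_input_details()[0]["index"]
output_index = interpreter.get_output_details()[0]["index"]

# Preprocess a test image the same way the iOS app does:
# resize to 224x224 and scale each pixel from [0, 255] to [-1, 1].
img = Image.open("tf_files/flower_photos/daisy/3475870145_685a19116d.jpg").resize((224, 224))
pixels = (np.asarray(img, dtype=np.float32) - 127.5) / 127.5
pixels = np.expand_dims(pixels, 0)  # shape (1, 224, 224, 3)

interpreter.set_tensor(input_index, pixels)
interpreter.invoke()
probs = interpreter.get_tensor(output_index)[0]

labels = [line.strip() for line in open("tf_files/retrained_labels.txt")]
print(labels[int(np.argmax(probs))], float(np.max(probs)))

The top label should roughly match what label_image.py reported earlier.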

The demo app requires several additional tools:

  1. Xcode

You can download Xcode from the Mac App Store.

  2. Xcode command line tools

Install the Xcode command line tools with the following command:

xcode-select --install

  3. CocoaPods

CocoaPods uses Ruby, which is installed by default on macOS.

To install CocoaPods, run this command:

sudo gem install cocoapods

Install the TFLite CocoaPod

The rest of this codelab needs to run directly on macOS, so close Docker now (Ctrl-D will exit Docker).

On macOS, use the following command to install TensorFlow Lite and create the .xcworkspace file using CocoaPods:

pod install --project-directory=ios/tflite/

Open the project with Xcode

You can open the project in Xcode by running the following command:

open ios/tflite/tflite_camera_example.xcworkspace

Or launch Xcode and click the "Open another Project" button:

Then navigate to the .xcworkspace file (not the .xcodeproj file):

The app is a simple example that runs an image recognition model in the iOS Simulator. The Simulator doesn't support camera input, so the app reads from the photos library.

Before inserting our customized model, let's test the baseline version of the app, which uses the standard MobileNet trained on the 1000 ImageNet categories.

Hit the play button in the upper right corner of the Xcode window to launch the app in the Simulator.

The "Next Photo" button loops through the photos on the device.

The result should look something like this:

The default app setup classifies images into one of the 1000 ImageNet classes, using the standard MobileNet. This is without the retraining we did in part 1.

Now let's modify the app so that it will use our retrained model with our custom image categories.

Add your model files to the project

The demo project is configured to search for a graph.lite and a labels.txt file in the ios/tflite/data/ directory. Replace those two files with your versions. The following commands accomplish this task:

cp tf_files/optimized_graph.lite ios/tflite/data/graph.lite
cp tf_files/retrained_labels.txt ios/tflite/data/labels.txt

Run your app

Hit the play button in the upper right corner of the Xcode window to re-launch the app in the Simulator. Drop in some files from the flower_photos/ directory and see how your model does.

It should look something like this:

CC-BY by Felipe Venâncio

The default images aren't of flowers. To really try out the model, either drag and drop in some of the training data images you downloaded earlier, or download some images from a Google search.

Now that you have the app running, let's look at the TensorFlow Lite specific code.

TensorFlowLite Pod

This app uses a pre-compiled TFLite CocoaPod. The Podfile includes the pod in the project:

Podfile

platform :ios, '8.0'
inhibit_all_warnings!

target 'tflite_photos_example'
       pod 'TensorFlowLite'

Using the TFLite iOS API

The code that interfaces with TFLite is all contained in CameraExampleViewController.mm.

Setup

The first block of interest, after the necessary imports, is the viewDidLoad method:

CameraExampleViewController.mm

#include "tensorflow/contrib/lite/kernels/register.h"
#include "tensorflow/contrib/lite/model.h"
#include "tensorflow/contrib/lite/string_util.h"
#include "tensorflow/contrib/lite/tools/mutable_op_resolver.h"

...

- (void)viewDidLoad {
  [super viewDidLoad];
  labelLayers = [[NSMutableArray alloc] init];

  NSString* graph_path = FilePathForResourceName(model_file_name, model_file_type);
  model = tflite::FlatBufferModel::BuildFromFile([graph_path UTF8String]);
  if (!model) {
    LOG(FATAL) << "Failed to mmap model " << graph_path;
  }
  LOG(INFO) << "Loaded model " << graph_path;
  model->error_reporter();
  LOG(INFO) << "resolved reporter";

  ... 

The key line in the first half of the method is:

model = tflite::FlatBufferModel::BuildFromFile([graph_path UTF8String]);

This creates a FlatBufferModel from the graph file.

A FlatBuffer is a memory-mappable data structure. FlatBuffers are a key feature of TFLite because they allow the system to better manage the memory used by the model: it can transparently swap parts of the model in and out of memory as needed.

The second part of the method builds an interpreter for the model, attaching op implementations to the graph data structure we loaded earlier:

CameraExampleViewController.mm

- (void)viewDidLoad {
  ...

  tflite::ops::builtin::BuiltinOpResolver resolver;
  LoadLabels(labels_file_name, labels_file_type, &labels);

  tflite::InterpreterBuilder(*model, resolver)(&interpreter);
  if (!interpreter) {
    LOG(FATAL) << "Failed to construct interpreter";
  }
  if (interpreter->AllocateTensors() != kTfLiteOk) {
    LOG(FATAL) << "Failed to allocate tensors!";
  }

  [self attachPreviewLayer];
}

If you're familiar with TensorFlow in Python, this is roughly equivalent to building a tf.Session().
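
For comparison, here is roughly what that same setup looks like in Python with regular TensorFlow 1.x: parsing the GraphDef from disk plays the role of building the FlatBufferModel, and creating the session plays the role of building the interpreter. This is an illustrative sketch only, not code from the codelab:

import tensorflow as tf

# Load the serialized graph (roughly: building the FlatBufferModel).
graph_def = tf.GraphDef()
with open("tf_files/retrained_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

graph = tf.Graph()
with graph.as_default():
    tf.import_graph_def(graph_def, name="")

# Creating the session (roughly: building the interpreter) attaches the
# graph to something that can actually execute it.
sess = tf.Session(graph=graph)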

Run the model

The UpdatePhoto method handles all the details of fetching the next photo, updating the preview window, and running the model on the photo.

CameraExampleViewController.mm

- (void)UpdatePhoto{  
  PHAsset* asset;
  if (photos==nil || photos_index >= photos.count){
    [self updatePhotosLibrary];
    photos_index=0;
  }
  if (photos.count){
    asset = photos[photos_index];
    photos_index += 1;
    input_image = [self convertImageFromAsset:asset
                                   targetSize:CGSizeMake(wanted_input_width, wanted_input_height)
                                         mode:PHImageContentModeAspectFill];
    display_image = [self convertImageFromAsset:asset
                                     targetSize:CGSizeMake(asset.pixelWidth,asset.pixelHeight)
                                           mode:PHImageContentModeAspectFit];
    [self DrawImage];
  }
  
  if (input_image != nil){
    image_data image = [self CGImageToPixels:input_image.CGImage];
    [self inputImageToModel:image];
    [self runModel];
  }
}

It's the last three lines that we are interested in.

The CGImageToPixels method converts the CGImage returned by the iOS Photos library to a simple structure containing the width, height, channels, and pixel data.

CameraExampleViewController.h

typedef struct {
  int width;
  int height;
  int channels;
  std::vector<uint8_t> data;
} image_data;

The inputImageToModel method handles inserting the image into the interpreter memory. This includes resizing the image and adjusting the pixel values to match what's expected by the model.

CameraExampleViewController.mm

- (void)inputImageToModel:(image_data)image{
  float* out = interpreter->typed_input_tensor<float>(0);
  
  const float input_mean = 127.5f;
  const float input_std = 127.5f;
  assert(image.channels >= wanted_input_channels);
  uint8_t* in = image.data.data();

  for (int y = 0; y < wanted_input_height; ++y) {
    const int in_y = (y * image.height) / wanted_input_height;
    uint8_t* in_row = in + (in_y * image.width * image.channels);
    float* out_row = out + (y * wanted_input_width * wanted_input_channels);
    for (int x = 0; x < wanted_input_width; ++x) {
      const int in_x = (x * image.width) / wanted_input_width;
      uint8_t* in_pixel = in_row + (in_x * image.channels);
      float* out_pixel = out_row + (x * wanted_input_channels);
      for (int c = 0; c < wanted_input_channels; ++c) {
        out_pixel[c] = (in_pixel[c] - input_mean) / input_std;
      }
    }
  }
}

We know the model only has one input, so the float* out = interpreter->typed_input_tensor<float>(0); line asks the interpreter for a pointer to the memory for input 0. The rest of the method handles the pointer arithmetic and pixel scaling needed to copy the data into that input array. The scaling maps each 8-bit pixel value from [0, 255] to [-1.0, 1.0]: for example, 0 becomes -1.0, 127.5 becomes 0.0, and 255 becomes 1.0.

Finally the runModel method executes the model:

CameraExampleViewController.mm

- (void)runModel {
  double startTimestamp = [[NSDate new] timeIntervalSince1970];
  if (interpreter->Invoke() != kTfLiteOk) {
    LOG(FATAL) << "Failed to invoke!";
  }
  double endTimestamp = [[NSDate new] timeIntervalSince1970];
  total_latency += (endTimestamp - startTimestamp);
  total_count += 1;
  NSLog(@"Time: %.4lf, avg: %.4lf, count: %d", endTimestamp - startTimestamp,
        total_latency / total_count,  total_count);

  ...

}

Next, runModel reads back the results. To do this, it asks the interpreter for a pointer to the output array's data. The output is a simple array of floats. The GetTopN method handles the extraction of the top 5 results (using a priority queue).

CameraExampleViewController.mm

- (void)runModel {
  ...

  const int output_size = (int)labels.size();
  const int kNumResults = 5;
  const float kThreshold = 0.1f;

  std::vector<std::pair<float, int>> top_results;

  float* output = interpreter->typed_output_tensor<float>(0);
  GetTopN(output, output_size, kNumResults, kThreshold, &top_results);
  
  ...
}
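
For intuition, the top-N selection that GetTopN performs can be sketched in a few lines of Python (a conceptual illustration of the same priority-queue idea, not the project's C++ implementation):

import heapq

def get_top_n(probs, labels, n=5, threshold=0.1):
    # Pair each probability with its label, drop anything below the
    # threshold, and keep the n highest-probability pairs.
    scored = [(p, label) for p, label in zip(probs, labels) if p >= threshold]
    return heapq.nlargest(n, scored, key=lambda pair: pair[0])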

The next few lines convert those top 5 (probability, class_id) pairs into (probability, label) pairs, and then pass that result off asynchronously to the setPredictionValues method, which updates the on-screen report:

CameraExampleViewController.mm

- (void)runModel {
  ...
  
  std::vector<std::pair<float, std::string>> newValues;
  for (const auto& result : top_results) {
    std::pair<float, std::string> item;
    item.first = result.first;
    item.second = labels[result.second];
    
    newValues.push_back(item);
  }
  dispatch_async(dispatch_get_main_queue(), ^(void) {
    [self setPredictionValues:newValues];
  });
}

Here are some links for more information: