ML Kit is a mobile SDK that brings Google's machine learning expertise to Android and iOS apps in a powerful yet easy-to-use package. Whether you're new or experienced in machine learning, you can easily implement the functionality you need in just a few lines of code. There's no need to have deep knowledge of neural networks or model optimization to get started.

How does it work?

ML Kit makes it easy to apply ML techniques in your apps by bringing Google's ML technologies, such as the Google Cloud Vision API, Mobile Vision, and TensorFlow Lite, together in a single SDK. Whether you need the power of cloud-based processing, the real-time capabilities of Mobile Vision's on-device models, or the flexibility of custom TensorFlow Lite models, ML Kit makes it possible with just a few lines of code.

This codelab will walk you through creating your own iOS app that can automatically detect text, facial features and objects in an image.

What you will build

In this codelab, you're going to build an iOS app with Firebase ML Kit. Your app will:

  • Use the ML Kit Text Recognition API to detect text in images
  • Use the ML Kit Face Contour API to identify facial features in images
  • (Optional) Use the ML Kit Cloud Text Recognition API to expand text recognition capabilities (such as non-Latin alphabets) when the device has internet connectivity
  • Learn how to host a custom pre-trained TensorFlow Lite model using Firebase
  • Use the ML Kit Custom Model API to download the pre-trained TensorFlow Lite model to your app
  • Use the downloaded model to run inference and label images

What you'll learn

What you'll need

This codelab is focused on ML Kit. Non-relevant concepts and code blocks are glossed over and are provided for you to simply copy and paste.

Download the Code

Click the following link to download all the code for this codelab:

Download source code

Unpack the downloaded zip file. This will unpack a root folder (mlkit-ios) with all of the resources you will need.

The mlkit-ios folder contains two directories:

Download the TensorFlow Lite model

Click the following link to download the pre-trained TensorFlow Lite model we will be using in this codelab:

Download model

Unpack the downloaded zip file. This will unpack a root folder (mobilenet_v1_1.0_224_quant), inside which you will find the TensorFlow Lite custom model we will use in this codelab (mobilenet_v1_1.0_224_quant.tflite).

  1. Go to the Firebase console.
  2. Select Create New Project, and name your project "ML Kit Codelab."

Connect your iOS app

  1. From the overview screen of your new project, click Add Firebase to your iOS app.
  2. Enter the codelab's bundle ID: com.google.firebase.codelab.mlkit.

Add GoogleService-Info.plist file to your app

After adding the package name and selecting Continue, your browser automatically downloads a configuration file that contains all the necessary Firebase metadata for your app. Copy the GoogleService-Info.plist file into your project.
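
The starter project is assumed to initialize Firebase at launch already. If you are wiring up a project of your own instead, Firebase is typically configured in the app delegate once GoogleService-Info.plist is in place; a minimal sketch:

AppDelegate.swift

import UIKit
import Firebase

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
  var window: UIWindow?

  func application(_ application: UIApplication,
                   didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
    // Reads GoogleService-Info.plist and initializes the Firebase SDKs.
    FirebaseApp.configure()
    return true
  }
}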

Add the dependencies for ML Kit with CocoaPods

CocoaPods is used to add the ML Kit dependencies to your app. Installation instructions for CocoaPods are available here. Once CocoaPods is installed, set up a Podfile from the command line.

Command Line

# Make sure you are in the root directory of your app
pod init
# open the created Podfile

Podfile

platform :ios, '9.0'
use_frameworks!

pod 'Firebase/Core'
pod 'Firebase/MLVision'
pod 'Firebase/MLVisionFaceModel'
pod 'Firebase/MLVisionLabelModel'
pod 'Firebase/MLModelInterpreter'
pod 'Firebase/MLVisionTextModel'

target 'MLKit-codelab' do
end

Install the ML Kit CocoaPods

To make sure that all dependencies are available to your app, install the ML Kit CocoaPods from the command line.

Command Line

# Make sure you are in the root directory of your app
pod install
open MLKit-codelab.xcworkspace

Now that you have imported the project into Xcode, configured the GoogleService-Info.plist, and added the dependencies for ML Kit, you are ready to run the app for the first time. Start the iOS Simulator and click Run in Xcode.

The app should launch in the Simulator. At this point, you should see a basic layout with a picker that lets you choose among six images. In the next section, you add text recognition to your app to identify text in the images.

In this step, we will add functionality to your app to recognize text in images.

Import the MLVision module

Confirm that the following import exists in your ViewController class.

ViewController.swift

import FirebaseMLVision

Create a VisionTextRecognizer

Add the following lazy properties to your ViewController class.

ViewController.swift

private lazy var vision = Vision.vision()
private lazy var textRecognizer = vision.onDeviceTextRecognizer()

Set up and run on-device text recognition on an image

Add the following to the runTextRecognition method of the ViewController class:

ViewController.swift

func runTextRecognition(with image: UIImage) {
  let visionImage = VisionImage(image: image)
  textRecognizer.process(visionImage) { features, error in
    self.processResult(from: features, error: error)
  }
}

The code above configures the text recognition detector and calls the function processResult(from:error:) with the response.

Process the text recognition response

Add the following code to processResult in the ViewController class to parse the results and display them in your app.

ViewController.swift

 func processResult(from text: VisionText?, error: Error?) {
    removeDetectionAnnotations()
    guard error == nil, let text = text else {
      let errorString = error?.localizedDescription ?? Constants.detectionNoResultsMessage
      print("Text recognizer failed with error: \(errorString)")
      return
    }

    let transform = self.transformMatrix()

    // Blocks.
    for block in text.blocks {
      drawFrame(block.frame, in: .purple, transform: transform)

      // Lines.
      for line in block.lines {
        drawFrame(line.frame, in: .orange, transform: transform)

        // Elements.
        for element in line.elements {
          drawFrame(element.frame, in: .green, transform: transform)

          let transformedRect = element.frame.applying(transform)
          let label = UILabel(frame: transformedRect)
          label.text = element.text
          label.adjustsFontSizeToFitWidth = true
          self.annotationOverlayView.addSubview(label)
        }
      }
    }
  }
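
The snippet above uses helper methods from the starter project, such as removeDetectionAnnotations(), transformMatrix(), and drawFrame(_:in:transform:). As a rough idea of what the frame-drawing helper does, here is an illustrative sketch (the starter project ships its own implementation):

ViewController.swift

// Illustrative sketch only -- the starter project provides its own drawFrame helper.
// Maps a detected frame into the overlay's coordinate space and draws a colored rectangle.
private func drawFrame(_ frame: CGRect, in color: UIColor, transform: CGAffineTransform) {
  let transformedRect = frame.applying(transform)
  UIUtilities.addRectangle(
    transformedRect,
    to: annotationOverlayView,
    color: color
  )
}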

Run the app on the simulator

Now click Run in Xcode. Once the app loads, make sure that Image 1 is selected in the picker and click the Find Text button.

Your app should now look like the image below, showing the text recognition results and bounding boxes overlaid on top of the original image.

Photo: Kai Schreiber / Wikimedia Commons / CC BY-SA 2.0

Congratulations, you have just added on-device text recognition to your app using ML Kit for Firebase! On-device text recognition is great for many use cases as it works even when your app doesn't have internet connectivity and is fast enough to use on still images as well as live video frames. However, it does have some limitations. For example, try selecting Image 2 and Image 3 in the simulator and clicking Find Text. Notice that on-device text recognition doesn't return meaningful results for text in non-Latin alphabets.

In a later step, we will use the cloud text recognition functionality in ML Kit to fix this issue.

In this step, we will add functionality to your app to detect the contours of faces in images.

Create a VisionFaceDetector

Add the following lazy properties to your ViewController class.

ViewController.swift

private lazy var faceDetectorOption: VisionFaceDetectorOptions = {
  let option = VisionFaceDetectorOptions()
  option.contourMode = .all
  option.performanceMode = .fast
  return option
}()
private lazy var faceDetector = vision.faceDetector(options: faceDetectorOption)

Set up and run on-device face contour detection on an image

Add the following to the runFaceContourDetection method of the ViewController class:

ViewController.swift

  func runFaceContourDetection(with image: UIImage) {
    let visionImage = VisionImage(image: image)
    faceDetector.process(visionImage) { features, error in
      self.processResult(from: features, error: error)
    }
  }

The code above configures the face detector and calls the function processResult(from:error:) with the response.

Process the face detector response

Add the following code to processResult in the ViewController class to parse the results and display them in your app.

ViewController.swift

  func processResult(from faces: [VisionFace]?, error: Error?) {
    removeDetectionAnnotations()
    guard let faces = faces else {
      return
    }

    for feature in faces {
      let transform = self.transformMatrix()
      let transformedRect = feature.frame.applying(transform)
      UIUtilities.addRectangle(
        transformedRect,
        to: self.annotationOverlayView,
        color: UIColor.green
      )
      self.addContours(forFace: feature, transform: transform)
    }
  }

Finally, add the helper method addContours to the ViewController class to draw the contour points.

ViewController.swift

 private func addContours(forFace face: VisionFace, transform: CGAffineTransform) {
    // Face
    if let faceContour = face.contour(ofType: .face) {
      for point in faceContour.points {
        drawPoint(point, in: .blue, transform: transform)
      }
    }

    // Eyebrows
    if let topLeftEyebrowContour = face.contour(ofType: .leftEyebrowTop) {
      for point in topLeftEyebrowContour.points {
        drawPoint(point, in: .orange, transform: transform)
      }
    }
    if let bottomLeftEyebrowContour = face.contour(ofType: .leftEyebrowBottom) {
      for point in bottomLeftEyebrowContour.points {
        drawPoint(point, in: .orange, transform: transform)
      }
    }
    if let topRightEyebrowContour = face.contour(ofType: .rightEyebrowTop) {
      for point in topRightEyebrowContour.points {
        drawPoint(point, in: .orange, transform: transform)
      }
    }
    if let bottomRightEyebrowContour = face.contour(ofType: .rightEyebrowBottom) {
      for point in bottomRightEyebrowContour.points {
        drawPoint(point, in: .orange, transform: transform)
      }
    }

    // Eyes
    if let leftEyeContour = face.contour(ofType: .leftEye) {
      for point in leftEyeContour.points {
        drawPoint(point, in: .cyan, transform: transform)
      }
    }
    if let rightEyeContour = face.contour(ofType: .rightEye) {
      for point in rightEyeContour.points {
        drawPoint(point, in: .cyan, transform: transform)
      }
    }

    // Lips
    if let topUpperLipContour = face.contour(ofType: .upperLipTop) {
      for point in topUpperLipContour.points {
        drawPoint(point, in: .red, transform: transform)
      }
    }
    if let bottomUpperLipContour = face.contour(ofType: .upperLipBottom) {
      for point in bottomUpperLipContour.points {
        drawPoint(point, in: .red, transform: transform)
      }
    }
    if let topLowerLipContour = face.contour(ofType: .lowerLipTop) {
      for point in topLowerLipContour.points {
        drawPoint(point, in: .red, transform: transform)
      }
    }
    if let bottomLowerLipContour = face.contour(ofType: .lowerLipBottom) {
      for point in bottomLowerLipContour.points {
        drawPoint(point, in: .red, transform: transform)
      }
    }

    // Nose
    if let noseBridgeContour = face.contour(ofType: .noseBridge) {
      for point in noseBridgeContour.points {
        drawPoint(point, in: .yellow, transform: transform)
      }
    }
    if let noseBottomContour = face.contour(ofType: .noseBottom) {
      for point in noseBottomContour.points {
        drawPoint(point, in: .yellow, transform: transform)
      }
    }
  }
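
The addContours method relies on a drawPoint(_:in:transform:) helper from the starter project. A minimal sketch of what such a helper might look like (illustrative only, not the project's actual code):

ViewController.swift

// Illustrative sketch only -- the starter project provides its own drawPoint helper.
// Maps a contour point into view coordinates and draws a small colored dot.
private func drawPoint(_ point: VisionPoint, in color: UIColor, transform: CGAffineTransform) {
  let radius: CGFloat = 2.0
  let center = CGPoint(x: CGFloat(point.x.doubleValue),
                       y: CGFloat(point.y.doubleValue)).applying(transform)
  let dot = UIView(frame: CGRect(x: center.x - radius, y: center.y - radius,
                                 width: radius * 2, height: radius * 2))
  dot.layer.cornerRadius = radius
  dot.backgroundColor = color
  annotationOverlayView.addSubview(dot)
}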

Run the app on the simulator

Now click Run in Xcode. Once the app loads, make sure that Image 4 is selected in the picker and click the Find Face Contour button. Your app should now look like the image below, showing the contours of the face as points overlaid on top of the original image.

Congratulations, you have just added on-device face contour detection to your app using ML Kit for Firebase! On-device face contour detection is great for many use cases as it works even when your app doesn't have internet connectivity and is fast enough to use on still images as well as live video frames.

The pre-trained TensorFlow Lite model we will be using in our app is the MobileNet_v1 model, which has been designed for low-latency, low-power environments and offers a good compromise between model size and accuracy. In this step, we will host this model with Firebase by uploading it to our Firebase project. This enables apps using the ML Kit SDK to automatically download the model to the device, and allows us to manage model versions easily in the Firebase console.

Host the custom model with Firebase

  1. Go to the Firebase console.
  2. Select your project.
  3. Select ML Kit under the DEVELOP section in the left hand navigation.
  4. Click on the CUSTOM tab.
  5. Click on Add another model and use "mobilenet_v1_224_quant" as the name. This is the name we will later use to download our custom model in our Swift code.
  6. In the TensorFlow Lite model section, click BROWSE and upload the mobilenet_v1_1.0_224_quant.tflite file you downloaded earlier.
  7. Click PUBLISH.

We are now ready to modify our Swift code to use this hosted model.

Import the MLModelInterpreter module

Confirm that the following import exists in your ViewController class.

ViewController.swift

import FirebaseMLModelInterpreter

Download the custom model from Firebase

Now that we have hosted a pre-trained custom model by uploading it to our Firebase Project, we will modify our app code to automatically download and use this model.

Add the following properties to the top of the ViewController class to define our ModelManager and ModelInterpreter.

ViewController.swift

  private let modelInputOutputOptions = ModelInputOutputOptions()
  private lazy var modelManager = ModelManager.modelManager()
  private lazy var modelInterpreter: ModelInterpreter? = {
    do {
      try modelInputOutputOptions.setInputFormat(
        index: Constants.modelInputIndex,
        type: Constants.modelElementType,
        dimensions: Constants.inputDimensions
      )
      try modelInputOutputOptions.setOutputFormat(
        index: Constants.modelInputIndex,
        type: Constants.modelElementType,
        dimensions: outputDimensions
      )
      let conditions = ModelDownloadConditions(isWiFiRequired: true, canDownloadInBackground: true)
      guard let localModelFilePath = Bundle.main.path(
        forResource: Constants.localModelFilename,
        ofType: Constants.modelExtension)
        else {
          print("Failed to get the local model file path.")
          return nil
      }
      let localModelSource = LocalModelSource(
        modelName: Constants.localModelFilename,
        path: localModelFilePath
      )
      let cloudModelSource = CloudModelSource(
        modelName: Constants.hostedModelFilename,
        enableModelUpdates: true,
        initialConditions: conditions,
        updateConditions: conditions
      )
      modelManager.register(localModelSource)
      modelManager.register(cloudModelSource)
      let modelOptions = ModelOptions(cloudModelName: Constants.hostedModelFilename, localModelName: Constants.localModelFilename)
      return ModelInterpreter.modelInterpreter(options: modelOptions)
    } catch let error as NSError {
      print("Failed to load the model with error: \(error.localizedDescription)")
      return nil
    }
  }()

Note how we use ModelInputOutputOptions in the code to specify the inputs expected by our custom model and the outputs it generates. For the MobileNet_v1 model, the input is a 224x224 pixel image and the output is a one-dimensional list of confidence values. We then set up the conditions under which our custom model should be downloaded to the device and register it with the ModelManager.
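
For reference, the quantized MobileNet_v1 224 model takes a single 224x224 RGB image supplied as 8-bit data and returns one confidence value per label. The starter project defines the exact values (Constants.inputDimensions, the outputDimensions used above, and so on); the snippet below is only an illustrative sketch of what those values represent, not the project's actual definitions.

import FirebaseMLModelInterpreter

// Illustrative values only -- the starter project defines its own constants.
enum ModelSpec {
  static let modelInputIndex: UInt = 0
  // [batch, height, width, channels] for a single 224x224 RGB input image.
  static let inputDimensions: [NSNumber] = [1, 224, 224, 3]
  // One confidence value per label (1001 classes for this MobileNet variant).
  static let outputDimensions: [NSNumber] = [1, 1001]
  // The quantized model reads and writes 8-bit unsigned integers.
  static let modelElementType: ModelElementType = .uInt8
}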

Bundle a local version of the model for offline scenarios

Hosting a model with Firebase allows you to make updates to the model and have those automatically be downloaded to your users. However, in situations where there is poor internet connectivity, you may also want to bundle a local version of your model. By both hosting the model on Firebase and supporting locally, you can ensure that the most recent version of the model is used when network connectivity is available, but your app's ML features still work when the Firebase-hosted model isn't available.

We have already added the code to do this in the previous code snippet: we created a LocalModelSource and called modelManager.register(localModelSource) to register it with our ModelManager. All you have to do is add the mobilenet_v1_1.0_224_quant.tflite file you downloaded earlier to the Resources folder in your project and add it to the MLKit-codelab build target.

The .tflite file will now be included in the app package and available to ML Kit as a raw asset.
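
If model loading fails at runtime, a common cause is the file not being added to the build target. You could temporarily add a quick sanity check like the following (a hypothetical snippet, for example in viewDidLoad; it assumes the file keeps its downloaded name):

ViewController.swift

// Optional sanity check (hypothetical): confirm the bundled model can be located at runtime.
let modelPath = Bundle.main.path(forResource: "mobilenet_v1_1.0_224_quant", ofType: "tflite")
assert(modelPath != nil, "TFLite model is missing from the app bundle -- check the build target.")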

In this step, we will define a function that uses the ModelInterpreter we configured in the previous step to run inference using the downloaded or local custom model.

Add code to use the downloaded/local model in your app

Copy the following code into the runModelInference method in the ViewController class.

ViewController.swift

   DispatchQueue.global(qos: .userInitiated).async {
      guard let imageData =
        self.scaledImageData(from: image,
                             componentsCount: Constants.dimensionComponents.intValue) else {
                              return
      }
      let inputs = ModelInputs()
      do {
        // Add the image data to the model input.
        try inputs.addInput(imageData)
      } catch let error as NSError {
        print("Failed to add the image data input with error: \(error.localizedDescription)")
        return
      }

      // Run the interpreter for the model with the given inputs.
      self.modelInterpreter?.run(inputs: inputs, options: self.modelInputOutputOptions) { (outputs, error) in
        self.removeDetectionAnnotations()
        guard error == nil, let outputs = outputs else {
          print("Failed to run the model with error: \(error?.localizedDescription ?? "")")
          return
        }
        self.process(outputs)
      }
    }

ML Kit handles downloading and running the model automatically (or uses the locally bundled version if the hosted model can't be downloaded), and provides the results as ModelOutputs. We then sort and display these results in the app UI in the process(_ outputs: ModelOutputs) method shown below.

ViewController.swift

private func process(_ outputs: ModelOutputs) {
    let outputArrayOfArrays: Any
    do {
      // Get the output for the first batch, since `dimensionBatchSize` is 1.
      outputArrayOfArrays = try outputs.output(index: 0)
    } catch let error as NSError {
      print("Failed to process detection outputs with error: \(error.localizedDescription)")
      return
    }

    // Get the first output from the array of output arrays.
    guard let outputNSArray = outputArrayOfArrays as? NSArray,
      let firstOutputNSArray = outputNSArray.firstObject as? NSArray,
      var outputArray = firstOutputNSArray as? [NSNumber]
      else {
        print("Failed to get the results array from output.")
        return
    }

    // Convert the output from quantized 8-bit fixed point format to 32-bit floating point format.
    outputArray = outputArray.map {
      NSNumber(value: $0.floatValue / Constants.maxRGBValue)
    }

    // Create an array of indices that map to each label in the labels text file.
    var indexesArray = [Int](repeating: 0, count: labels.count)
    for index in 0..<labels.count {
      indexesArray[index] = index
    }

    // Create a zipped array of tuples ("confidence" as NSNumber, "labelIndex" as Int).
    let zippedArray = zip(outputArray, indexesArray)

    // Sort the zipped array of tuples ("confidence" as NSNumber, "labelIndex" as Int) by confidence
    // value in descending order.
    var sortedResults = zippedArray.filter {$0.0.floatValue > 0}.sorted {
      let confidenceValue1 = ($0 as (NSNumber, Int)).0
      let confidenceValue2 = ($1 as (NSNumber, Int)).0
      return confidenceValue1.floatValue > confidenceValue2.floatValue
    }

    // Resize the sorted results array to match the `topResultsCount`.
    sortedResults = Array(sortedResults.prefix(Constants.topResultsCount))

    // Create an array of tuples with the results as [("label" as String, "confidence" as Float)].
    let results = sortedResults.map { (confidence, labelIndex) -> (String, Float) in
      return (labels[labelIndex], confidence.floatValue)
    }
    showResults(results)
  }
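
The code above also refers to a labels array and a showResults(_:) helper that the starter project provides. If you were wiring this up yourself, the labels could be loaded from a bundled labels file along these lines (an illustrative sketch; the "labels.txt" file name is an assumption):

ViewController.swift

// Illustrative sketch only -- the starter project ships its own labels list and loading code.
private lazy var labels: [String] = {
  guard let path = Bundle.main.path(forResource: "labels", ofType: "txt"),
        let contents = try? String(contentsOfFile: path, encoding: .utf8) else {
    print("Failed to load the labels file.")
    return []
  }
  // One label per line, in the same order as the model's output values.
  return contents.components(separatedBy: .newlines).filter { !$0.isEmpty }
}()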

Run the app on the simulator

Now click Run in Xcode. Once the app loads, make sure that Image 5 is selected in the picker and click the Find Objects button. Your app should now look like the image below, showing the detected image labels with their confidence levels.

In a previous step, you added on-device text recognition to the app, which runs quickly, works without an internet connection, and is free. However, it does have some limitations. For example, try selecting Image 2 and Image 3 in the simulator and clicking Find Text. Notice that on-device text recognition doesn't return meaningful results for text in non-Latin alphabets. In this step, you will add cloud text recognition to your app using ML Kit for Firebase. This will allow you to detect more types of text in images, such as non-Latin alphabets.

Switch your Firebase project to the Blaze plan

Only Blaze-level projects can use the Cloud Vision APIs. Follow these steps to switch your project to the Blaze plan and enable pay-as-you-go billing.

  1. Open your project in the Firebase console.
  2. Click on the MODIFY link in the lower left corner next to the currently selected Spark plan.
  3. Select the Blaze plan and follow the instructions in the Firebase Console to add a billing account.

Enable the Cloud Vision API

You need to enable the Cloud Vision API in order to use cloud text recognition in ML Kit.

  1. Open the Cloud Vision API in the Cloud Console API library.
  2. Ensure that your Firebase project is selected in the menu at the top of the page.
  3. If the API is not already enabled, click Enable.

Set up and run cloud text recognition on an image

First add a cloud document text recognizer property to the ViewController class, then add the following to the runCloudTextRecognition method:

ViewController.swift

// Add a property for the Cloud Document Text Recognizer
private lazy var cloudDocumentTextRecognizer = vision.cloudDocumentTextRecognizer()

ViewController.swift

  func runCloudTextRecognition(with image: UIImage) {
    let visionImage = VisionImage(image: image)
    cloudDocumentTextRecognizer.process(visionImage, completion: { (features, error) in
      self.processResult(from: features, error: error)
    })
  }

The code above configures the cloud document text recognizer and calls the function processResult(from:error:) with the response.

Process the text recognition response

Add the following code to processResult in the ViewController class to parse the results and display them in your app.

ViewController.swift

 func processResult(from text: VisionDocumentText?, error: Error?) {
    removeDetectionAnnotations()
    guard error == nil, let text = text else {
      let errorString = error?.localizedDescription ?? Constants.detectionNoResultsMessage
      print("Document text recognizer failed with error: \(errorString)")
      return
    }
    let transform = self.transformMatrix()
    // Blocks.
    for block in text.blocks {
      drawFrame(block.frame, in: .purple, transform: transform)

      // Paragraphs.
      for paragraph in block.paragraphs {
        drawFrame(paragraph.frame, in: .orange, transform: transform)

        // Words.
        for word in paragraph.words {
          drawFrame(word.frame, in: .green, transform: transform)

          // Symbols.
          for symbol in word.symbols {
            drawFrame(symbol.frame, in: .cyan, transform: transform)

            let transformedRect = symbol.frame.applying(transform)
            let label = UILabel(frame: transformedRect)
            label.text = symbol.text
            label.adjustsFontSizeToFitWidth = true
            self.annotationOverlayView.addSubview(label)
          }
        }
      }
    }
  }

Run the app on the simulator

Now click Run in Xcode. Once the app loads, select Image 3 in the picker and click the Find Text (Cloud) button. Notice that we are now able to successfully recognize the non-Latin characters in the image!

Cloud text recognition in ML Kit is ideally suited if:

You have used ML Kit for Firebase to easily add advanced machine learning capabilities to your app.

What we've covered

Next Steps

Learn More