ML Kit is a mobile SDK that brings Google's machine learning expertise to Android and iOS apps in a powerful yet easy-to-use package. Whether you're new or experienced in machine learning, you can easily implement the functionality you need in just a few lines of code. There's no need to have deep knowledge of neural networks or model optimization to get started. On the other hand, if you are an experienced ML developer, see the Custom Machine Learning Models with ML Kit codelab to learn how ML Kit makes it easy to use your custom TensorFlow Lite models in your mobile apps.

How does it work?

ML Kit makes it easy to apply ML techniques in your apps by bringing Google's ML technologies, such as the Google Cloud Vision API, Mobile Vision, and TensorFlow Lite, together in a single SDK. Whether you need the power of cloud-based processing, the real-time capabilities of Mobile Vision's on-device models, or the flexibility of custom TensorFlow Lite models, ML Kit makes it possible with just a few lines of code.

This codelab will walk you through creating your own iOS app that can automatically detect text in an image.

What you will build

In this codelab, you're going to build an iOS app with Firebase ML Kit. Your app will:

  • Utilize the ML Kit Text Recognition API to detect text in images
  • Use the ML Kit Cloud Text Recognition API to expand text recognition capabilities (such as non-Latin alphabets) when the device has internet connectivity

What you'll learn

What you'll need

This codelab is focused on ML Kit. Non-relevant concepts and code blocks are glossed over and are provided for you to simply copy and paste.

Download the Code

Click the following link to download all the code for this codelab:

Download source code

Unpack the downloaded zip file. This will unpack a root folder (mlkit-ios) with all of the resources you will need. For this codelab, you will only need the resources in the text-recognition subdirectory.

The text-recognition subdirectory contains two directories:

  • starter — the starting code that you build on in this codelab
  • final — the completed code for the finished sample app

Create a Firebase console project

  1. Go to the Firebase console.
  2. Select Create New Project, and name your project "ML Kit Codelab."

Connect your iOS app

  1. From the overview screen of your new project, click Add Firebase to your iOS app.
  2. Enter the codelab's bundle ID: com.google.firebase.codelab.mlkit.

Add GoogleService-Info.plist file to your app

After adding the bundle ID and selecting Continue, your browser automatically downloads a configuration file that contains all the necessary Firebase metadata for your app. Copy the GoogleService-Info.plist file into your project.
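
The GoogleService-Info.plist file is read when Firebase is configured at app launch. Once the Firebase pods from the next step are installed, the starter project typically makes this call for you in the AppDelegate; if yours doesn't, a minimal sketch using the standard FirebaseApp.configure() call looks like this:

AppDelegate.swift

  import UIKit
  import Firebase

  @UIApplicationMain
  class AppDelegate: UIResponder, UIApplicationDelegate {

    var window: UIWindow?

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
      // Reads GoogleService-Info.plist and initializes the default Firebase app
      FirebaseApp.configure()
      return true
    }
  }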

Add the dependencies for ML Kit with CocoaPods

CocoaPods is used to add the ML Kit dependencies to your app. Installation instructions for CocoaPods are available here. Once CocoaPods is installed, set up a Podfile from the command line.

Command Line

# Make sure that you are in the root directory of your app
pod init
# Open the created Podfile

Podfile

target 'text-recognition' do
  use_frameworks!

  # Pods for text-recognition
  pod 'Firebase/Core'
  pod 'Firebase/MLVision'
  pod 'Firebase/MLVisionTextModel'
end

Install the ML Kit pods

To make sure that all dependencies are available to your app, install the ML Kit pods from the command line.

Command Line

# Make sure that you are in the root directory of your app
pod install
open text-recognition.xcworkspace

Now that you have imported the project into Xcode, configured the GoogleService-Info.plist file, and added the dependencies for ML Kit, you are ready to run the app for the first time. Start the iOS Simulator, and click Run in the Xcode toolbar.

The app should launch on your simulator. At this point, you should see a basic layout that has a picker which allows you to select between three images. In the next section, you add text recognition to your app to identify text in the images.
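
The picker is driven by the standard UIPickerViewDataSource and UIPickerViewDelegate methods, which the starter project already implements. As a rough sketch of the idea only (the imageNames array and asset names below are hypothetical, not the starter's exact code):

ViewController.swift

  // Hypothetical sketch: data backing the picker and the selection callback.
  let imageNames = ["image_1", "image_2", "image_3"]

  func numberOfComponents(in pickerView: UIPickerView) -> Int {
    return 1
  }

  func pickerView(_ pickerView: UIPickerView, numberOfRowsInComponent component: Int) -> Int {
    return imageNames.count
  }

  func pickerView(_ pickerView: UIPickerView, titleForRow row: Int, forComponent component: Int) -> String? {
    return "Image \(row + 1)"
  }

  func pickerView(_ pickerView: UIPickerView, didSelectRow row: Int, inComponent component: Int) {
    // Show the selected sample image in the image view
    imageView.image = UIImage(named: imageNames[row])
  }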

In this step, you will add on-device text recognition to your app so it can recognize text in images.

Create a VisionTextDetector

Add the following property to your ViewController class and initialize it in viewDidLoad:

ViewController.swift

  var textDetector: VisionTextDetector!

  override func viewDidLoad() {
    super.viewDidLoad()
    // Initialize the on-device text detector
    let vision = Vision.vision()
    textDetector = vision.textDetector()
    imageView.layer.addSublayer(frameSublayer)
    pickerView.dataSource = self
    pickerView.delegate = self
  }

Set up and run on-device text recognition on an image

Add the following to the runTextRecognition method of the ViewController class:

ViewController.swift

  func runTextRecognition(with image: UIImage) {
    let visionImage = VisionImage(image: image)
    textDetector.detect(in: visionImage) { features, error in
      self.processResult(from: features, error: error)
    }
  }

The code above configures the on-device text recognition detector and calls the function processResult(from:error:) with the response.
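
runTextRecognition(with:) is triggered from the UI. In the starter project, the Find Text button is connected to an action that passes the currently displayed image; a minimal sketch of such an action (the name findTextDidTouch is an assumption, match it to whatever your storyboard uses):

ViewController.swift

  // Hypothetical action name; connect it to the Find Text button in the storyboard.
  @IBAction func findTextDidTouch(_ sender: UIButton) {
    guard let image = imageView.image else { return }
    runTextRecognition(with: image)
  }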

Process the text recognition response

Add the following code to processResult in the ViewController class to parse the results and display them in your app.

ViewController.swift

  func processResult(from text: [VisionText]?, error: Error?) {
    removeFrames()
    guard let features = text, let image = imageView.image else {
      return
    }
    for text in features {
      if let block = text as? VisionTextBlock {
        for line in block.lines {
          for element in line.elements {
            self.addFrameView(
              featureFrame: element.frame,
              imageSize: image.size,
              viewFrame: self.imageView.frame,
              text: element.text
            )
          }
        }
      }
    }
  }
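
The addFrameView(featureFrame:imageSize:viewFrame:text:) helper is provided in the starter project. Conceptually, it scales a frame reported in the image's coordinate space into the image view's coordinate space and draws a labeled box on the frameSublayer. A simplified sketch of that conversion (not the starter's exact implementation, and it assumes the image fills the image view) might look like:

ViewController.swift

  // Simplified sketch: convert a frame from image coordinates to view coordinates
  // and draw it as a box on top of the image view. Label rendering is omitted here.
  private func addFrameView(featureFrame: CGRect, imageSize: CGSize, viewFrame: CGRect, text: String) {
    let scaleX = viewFrame.width / imageSize.width
    let scaleY = viewFrame.height / imageSize.height

    let scaledFrame = CGRect(
      x: featureFrame.origin.x * scaleX,
      y: featureFrame.origin.y * scaleY,
      width: featureFrame.width * scaleX,
      height: featureFrame.height * scaleY
    )

    let boxLayer = CALayer()
    boxLayer.frame = scaledFrame
    boxLayer.borderWidth = 2.0
    boxLayer.borderColor = UIColor.green.cgColor
    frameSublayer.addSublayer(boxLayer)
  }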

Run the app on the simulator

Now click Run in Xcode. Once the app loads, make sure that Image 1 is selected in the drop-down field and click the Find Text button.

Your app should now look like image below, showing the text recognition results and bounding boxes overlaid on top of the original image.

Photo: Kai Schreiber / Wikimedia Commons / CC BY-SA 2.0

Congratulations, you have just added on-device text recognition to your app using ML Kit for Firebase! On-device text recognition is great for many use cases, as it works even when your app doesn't have internet connectivity and is fast enough to use on still images as well as live video frames. However, it does have some limitations. For example, try selecting Image 2 in the simulator and clicking Find Text. Notice that on-device text recognition doesn't return meaningful results for text in non-Latin alphabets.

In the next step, you will use the cloud text recognition functionality in ML Kit to address this limitation.

In this step, you will add cloud text recognition to your app using ML Kit for Firebase. This will allow you to detect more types of text in images, such as non-Latin alphabets.

Switch your Firebase project to the Blaze plan

Only Blaze-level projects can use the Cloud Vision APIs. Follow these steps to switch your project to the Blaze plan and enable pay-as-you-go billing.

  1. Open your project in the Firebase console.
  2. Click on the MODIFY link in the lower left corner next to the currently selected Spark plan.
  3. Select the Blaze plan and follow the instructions in the Firebase Console to add a billing account.

Enable the Cloud Vision API

You need to enable the Cloud Vision API in order to use cloud text recognition in ML Kit.

  1. Open the Cloud Vision API in the Cloud Console API library.
  2. Ensure that your Firebase project is selected in the menu at the top of the page.
  3. If the API is not already enabled, click Enable.

Create a VisionCloudDocumentTextDetector

Add the following property to your ViewController class and initialize it in viewDidLoad:

ViewController.swift

  var textDetector: VisionTextDetector!
  // Add a property for the Cloud Document Text Detector
  var cloudTextDetector: VisionCloudDocumentTextDetector!

  override func viewDidLoad() {
    super.viewDidLoad()
    let vision = Vision.vision()
    textDetector = vision.textDetector()
    // Initialize the Cloud Document Text Detector
    cloudTextDetector = vision.cloudDocumentTextDetector()
    imageView.layer.addSublayer(frameSublayer)
    pickerView.dataSource = self
    pickerView.delegate = self
  }

Set up and run cloud text recognition on an image

Add the following to the runCloudTextRecognition method of the ViewController class:

ViewController.swift

  func runCloudTextRecognition(with image: UIImage) {
    let visionImage = VisionImage(image: image)
    cloudTextDetector.detect(in: visionImage) { features, error in
      if let error = error {
        print("Received error: \(error)")
        return
      }

      self.processCloudResult(from: features, error: error)
    }
  }

The code above configures the cloud text recognition detector and calls the function processCloudResult(from:error:) with the response.
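
As with the on-device path, the cloud call is triggered from the UI. A minimal sketch of a button action that kicks it off (the name findTextCloudDidTouch is an assumption, match it to the action wired to the Find Text (Cloud) button in your storyboard):

ViewController.swift

  // Hypothetical action name; connect it to the Find Text (Cloud) button in the storyboard.
  @IBAction func findTextCloudDidTouch(_ sender: UIButton) {
    guard let image = imageView.image else { return }
    runCloudTextRecognition(with: image)
  }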

Process the text recognition response

Add the following code to processCloudResult(from text: VisionCloudText?, error: Error?) in the ViewController class to parse the results and display them in your app.

ViewController.swift

  func processCloudResult(from text: VisionCloudText?, error: Error?) {
    removeFrames()
    guard let features = text, let image = imageView.image, let pages = features.pages else {
      return
    }
    for page in pages {
      for block in page.blocks ?? []  {
        for paragraph in block.paragraphs ?? [] {
          for word in paragraph.words ?? [] {
            var wordText = ""
            for symbol in word.symbols ?? [] {
              if let text = symbol.text {
                wordText = wordText + text
              }
            }
            self.addFrameView(
              featureFrame: word.frame,
              imageSize: image.size,
              viewFrame: self.imageView.frame,
              text: wordText
            )
          }
        }
      }
    }
  }

Run the app on the simulator

Now click Run in Xcode. Once the app loads, select Image 2 in the drop-down field and click the Find Text (Cloud) button. Notice that the app now successfully recognizes the non-Latin characters in the image!

Cloud text recognition in ML Kit is ideally suited for apps whose users have reliable internet connectivity and that need to recognize text in a wider range of languages and scripts, such as non-Latin alphabets.

You have used ML Kit for Firebase to easily add advanced machine learning capabilities to your app.

What we've covered

Next Steps

Learn More