Detect objects in images with ML Kit: Android

ML Kit is a mobile SDK that brings Google's on-device machine learning expertise to Android and iOS apps. Use our powerful yet easy-to-use Vision and Natural Language APIs to solve common challenges in your apps or create brand-new user experiences. All are powered by Google's best-in-class ML models and offered to you at no cost.

ML Kit's APIs all run on-device, which enables real-time use cases such as processing a live camera stream. It also means that the functionality is available offline.

This codelab will walk you through simple steps to add Object Detection and Tracking (ODT) for a given image into your existing Android app. Please note that this codelab takes some shortcuts to highlight ML Kit ODT usage.

What you will build

In this codelab, you're going to build an Android app with ML Kit. Your app will use the ML Kit Object Detection and Tracking API to detect objects in a given image. In the end, you should see something similar to the image on the right.

What you'll learn

  • How to integrate ML Kit SDK into your Android application
  • How to use the ML Kit Object Detection and Tracking API

What you'll need

  • A recent version of Android Studio (v4.1.2+)
  • Android Studio Emulator or a physical Android device
  • The sample code
  • Basic knowledge of Android development in Kotlin

This codelab is focused on ML Kit. Non-relevant concepts and code blocks are glossed over and are provided for you to simply copy and paste.

Download the Code

Click the following link to download all the code for this codelab:

Download source code

Unpack the downloaded zip file. This will unpack a root folder (mlkit-android) with all of the resources you will need. For this codelab, you will only need the sources in the object-detection subdirectory.

The object-detection subdirectory in the mlkit-android repository contains two directories:

  • starter: Starting code that you build upon for this codelab.
  • final: Completed code for the finished sample app.

Import the app into Android Studio

Let's start by importing the starter app into Android Studio. Open Android Studio, select Import Project (Gradle, Eclipse ADT, etc.) and choose the starter folder from the source code that you downloaded earlier.

Add the dependencies for ML Kit Object Detection and Tracking

The ML Kit dependencies allow you to integrate the ML Kit ODT SDK in your app. Add the following lines to the end of the app/build.gradle file of your project:

build.gradle

dependencies {
  // ...
  implementation 'com.google.mlkit:object-detection:16.2.3'
}
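
If your own project uses the Gradle Kotlin DSL instead, the same dependency would look roughly like the fragment below. This is only a sketch: the app/build.gradle.kts file name is an assumption, since the starter project in this codelab ships a Groovy build.gradle.

```kotlin
// app/build.gradle.kts (assumed file name; the codelab's starter project uses Groovy)
dependencies {
    // ...
    implementation("com.google.mlkit:object-detection:16.2.3")
}
```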

Sync your project with Gradle files

To be sure that all dependencies are available to your app, sync your project with Gradle files at this point. Select Sync Project with Gradle Files from the Android Studio toolbar.

(If this button is disabled, make sure you import only starter/app/build.gradle, not the entire repository.)

Now that you have imported the project into Android Studio and added the dependencies for ML Kit Object Detection and Tracking, you are ready to run the app for the first time. Connect your Android device to your host via USB, or start the Android Studio emulator, and click Run in the Android Studio toolbar.

Run and explore the app

The app should launch on your Android device. It has some boilerplate code to allow you to capture a photo, or select a preset image, and feed it to an object detection and tracking pipeline that you'll build in this codelab. Let's explore the app a little bit before writing code.

Firstly, there is a button at the bottom that lets you:

  1. bring up the camera app integrated in your device/emulator
  2. take a photo inside your camera app
  3. receive the captured image in the starter app
  4. display the image

Try out the "Take photo" button: follow the prompts to take a photo, accept the photo, and observe it displayed inside the starter app. Repeat a few times to see how it works.

Secondly, there are 3 preset images that you can choose from. You can use these images later to test the object detection code if you are running on an Android emulator.

  1. Select an image from the 3 preset images.
  2. See that the image shows up in the larger view.

In this step, we will add the functionality to the starter app to detect objects in images. As you saw in the previous step, the starter app contains boilerplate code to take photos with the camera app on the device. There are also 3 preset images in the app that you can try object detection on if you are running the codelab on an Android emulator.

When you have selected an image, either from the preset images or by taking a photo with the camera app, the boilerplate code decodes that image into a Bitmap instance, shows it on the screen, and calls the runObjectDetection method with the image.

In this step, you will add code to the runObjectDetection method to do object detection!

Set up and run on-device object detection on an image

There are only 3 simple steps, using 3 APIs, to set up ML Kit ODT:

  • prepare an image: InputImage
  • create a detector object: ObjectDetection.getClient(options)
  • connect the 2 objects above: process(image)

You achieve these inside the function runObjectDetection(bitmap: Bitmap) in file MainActivity.kt.

/**
 * ML Kit Object Detection Function
 */
private fun runObjectDetection(bitmap: Bitmap) {
}

Right now the function is empty. Move on to the following steps to implement ML Kit ODT! Along the way, Android Studio will prompt you to add the necessary imports:

  • com.google.mlkit.vision.common.InputImage
  • com.google.mlkit.vision.objects.ObjectDetection
  • com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions

Step 1: Create an InputImage

ML Kit provides a simple API to create an InputImage from a Bitmap. Then you can feed an InputImage into the ML Kit APIs.

// Step 1: create ML Kit's InputImage object
val image = InputImage.fromBitmap(bitmap, 0)

Add the above code to the top of runObjectDetection(bitmap:Bitmap).

Step 2: Create a detector instance

ML Kit follows the Builder design pattern: you pass the configuration to the builder, then acquire a detector from it. There are 3 options to configure (the choice used in this codelab is noted after each):

  • detector mode (single image or stream): single image
  • detection mode (single or multiple object detection): multiple
  • classification mode (on or off): on

This codelab uses single image mode with multiple object detection and classification, so configure the detector accordingly:

// Step 2: acquire detector object
val options = ObjectDetectorOptions.Builder()
   .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
   .enableMultipleObjects()
   .enableClassification()
   .build()
val objectDetector = ObjectDetection.getClient(options)

Step 3: Feed image(s) to the detector

Object detection and classification run asynchronously:

  • you send an image to the detector (via process())
  • the detector works pretty hard on it
  • the detector reports the result back to you via a callback

The following code does just that (copy and append it to the existing code inside fun runObjectDetection(bitmap:Bitmap)):

// Step 3: feed given image to detector and setup callback
objectDetector.process(image)
   .addOnSuccessListener {
       // Task completed successfully
       debugPrint(it)
   }
   .addOnFailureListener {
       // Task failed with an exception
       Log.e(TAG, it.message.toString())
   }

Upon completion, the detector notifies you with:

  1. The total number of objects detected
  2. A description of each detected object:
      • trackingId: an integer you use to track the object across frames (NOT used in this codelab)
      • boundingBox: the object's bounding box
      • labels: a list of label(s) for the detected object (only when classification is enabled), each with:
          • index: the index of this label
          • text: the text of this label, one of "Fashion good", "Food", "Home good", "Place", "Plant"
          • confidence: a float between 0.0 and 1.0, where 1.0 means 100%
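
To make the shape of a result easier to picture, here is a minimal plain-Kotlin sketch that mirrors the fields listed above. The Box, Label and Detection classes and the summarize helper are hypothetical stand-ins invented for this illustration; they are not the ML Kit classes, and the sample values (including the label index) are placeholders.

```kotlin
// Hypothetical stand-ins that mirror the fields listed above.
// They are NOT the ML Kit classes (DetectedObject, DetectedObject.Label, Rect).
data class Box(val left: Int, val top: Int, val right: Int, val bottom: Int) {
    val width: Int get() = right - left
    val height: Int get() = bottom - top
}

data class Label(val index: Int, val text: String, val confidence: Float)

data class Detection(
    val trackingId: Int?,     // not used in this codelab, so it may be null
    val boundingBox: Box,
    val labels: List<Label>   // may be empty when no category is recognized
)

// Build a one-line summary of a detection, e.g. for logging.
fun summarize(d: Detection): String {
    val label = d.labels.firstOrNull()?.text ?: "no label"
    return "$label at ${d.boundingBox.width}x${d.boundingBox.height}, trackingId=${d.trackingId}"
}

fun main() {
    // Placeholder values chosen for illustration only.
    val food = Detection(null, Box(481, 2021, 2426, 3376), listOf(Label(0, "Food", 0.90234375f)))
    val unlabeled = Detection(null, Box(0, 0, 10, 20), emptyList())
    println(summarize(food))      // Food at 1945x1355, trackingId=null
    println(summarize(unlabeled)) // no label at 10x20, trackingId=null
}
```

In the real app you would read the same information from ML Kit's DetectedObject instead of these stand-in classes.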

You have probably noticed that the Step 3 code prints the detected result, printf-style, with debugPrint(). Add it to the MainActivity class (Android Studio will prompt you to import com.google.mlkit.vision.objects.DetectedObject):

private fun debugPrint(detectedObjects: List<DetectedObject>) {
   detectedObjects.forEachIndexed { index, detectedObject ->
       val box = detectedObject.boundingBox

       Log.d(TAG, "Detected object: $index")
       Log.d(TAG, " trackingId: ${detectedObject.trackingId}")
       Log.d(TAG, " boundingBox: (${box.left}, ${box.top}) - (${box.right},${box.bottom})")
       detectedObject.labels.forEach {
           Log.d(TAG, " categories: ${it.text}")
           Log.d(TAG, " confidence: ${it.confidence}")
       }
   }
}

Now you are ready to accept images for detection! Run the app by clicking Run in the Android Studio toolbar. Try selecting a preset image or taking a photo, then look at the Logcat window inside the IDE. You should see something similar to this:

D/MLKit Object Detection: Detected object: 0
D/MLKit Object Detection:  trackingId: null
D/MLKit Object Detection:  boundingBox: (481, 2021) - (2426,3376)
D/MLKit Object Detection:  categories: Food
D/MLKit Object Detection:  confidence: 0.90234375
D/MLKit Object Detection: Detected object: 1
D/MLKit Object Detection:  trackingId: null
D/MLKit Object Detection:  boundingBox: (2639, 2633) - (3058,3577)
D/MLKit Object Detection: Detected object: 2
D/MLKit Object Detection:  trackingId: null
D/MLKit Object Detection:  boundingBox: (3, 1816) - (615,2597)
D/MLKit Object Detection:  categories: Home good
D/MLKit Object Detection:  confidence: 0.75390625

This means that the detector saw 3 objects:

  • The categories are Food and Home good.
  • No category is returned for the 2nd object because it is an unknown class.
  • There is no trackingId, because this is single image detection mode.
  • Each object's position is given by its boundingBox rectangle (e.g. (481, 2021) - (2426, 3376)).
  • The detector is pretty confident that the 1st object is Food (90%). (It was a salad.)
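
When consuming results like these, an app may want to keep only the objects whose best label clears some confidence threshold. The plain-Kotlin sketch below shows one way to do that with the values from the sample output above. LoggedObject and confidentObjects are hypothetical names invented for this illustration, and the 0.8 default threshold is an arbitrary choice, not an ML Kit recommendation.

```kotlin
// Hypothetical stand-in for one logged detection; the label list may be empty.
data class LoggedObject(val id: Int, val labels: List<Pair<String, Float>>)

// Keep only objects whose most confident label reaches the threshold.
// Objects without any label are dropped.
fun confidentObjects(objects: List<LoggedObject>, threshold: Float = 0.8f): List<LoggedObject> =
    objects.filter { obj ->
        val best = obj.labels.maxOfOrNull { it.second }
        best != null && best >= threshold
    }

fun main() {
    // Values copied from the sample logcat output above.
    val logged = listOf(
        LoggedObject(0, listOf("Food" to 0.90234375f)),
        LoggedObject(1, emptyList()),
        LoggedObject(2, listOf("Home good" to 0.75390625f))
    )
    println(confidentObjects(logged).map { it.id })       // [0]
    println(confidentObjects(logged, 0.5f).map { it.id }) // [0, 2]
}
```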

Technically, that is all you need to get ML Kit Object Detection to work. Congratulations!

On the UI side, the app still looks the same as when you started, but you can use the detected results in the UI, for example by drawing the bounding boxes, to create a better experience. Let's go to the next step: post-processing the detected results!

In the previous steps, you printed the detected result to logcat: simple and fast. In this section, you will draw the result onto the image:

  • draw the bounding box on the image
  • draw the category name and confidence inside the bounding box

Understand the visualization utilities

There is some boilerplate code inside the codelab to help you visualize the detection result. We will leverage these utilities to make our visualization code simple.

  • data class BoxWithText(val box: Rect, val text: String)
    This is a data class that stores an object detection result for visualization. box is the bounding box where the object is located, and text is the detection result string to display together with the object's bounding box.
  • fun drawDetectionResult(bitmap: Bitmap, detectionResults: List<BoxWithText>): Bitmap
    This method draws the object detection results in detectionResults on the input bitmap and returns the modified copy of it.

Here is an example of an output of the drawDetectionResult utility method.

(Image: example output of drawDetectionResult.)

Visualize the ML Kit detection result

Let's use the visualization utilities to draw the ML Kit object detection result on top of the input image. Go to where you call debugPrint() and add the following code snippet below it.

// Parse ML Kit's DetectedObject and create corresponding visualization data
val detectedObjects = it.map { obj ->
    var text = "Unknown"

    // Show the most confident detection result, if one exists
    if (obj.labels.isNotEmpty()) {
        val firstLabel = obj.labels.first()
        text = "${firstLabel.text}, ${firstLabel.confidence.times(100).toInt()}%"
    }
    BoxWithText(obj.boundingBox, text)
}

// Draw the detection result on the input bitmap
val visualizedResult = drawDetectionResult(bitmap, detectedObjects)

// Show the detection result on the app screen
runOnUiThread {
    inputImageView.setImageBitmap(visualizedResult)
}

  • We start by parsing ML Kit's DetectedObject results and creating a list of BoxWithText objects for visualization.
  • Then we draw the detection result on top of the input image using the drawDetectionResult utility method, and show it on the screen.
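
The label text built in the snippet above can be tried out in isolation. The following plain-Kotlin sketch reproduces that formatting logic without any Android or ML Kit types. displayText is a hypothetical helper name, and the Pair of text and confidence is an assumed stand-in for the first label of a detected object.

```kotlin
// Hypothetical helper mirroring the text-building logic in the snippet above.
// `label` holds the first label's text and confidence, or null when the
// detected object has no labels.
fun displayText(label: Pair<String, Float>?): String =
    if (label == null) "Unknown"
    else "${label.first}, ${(label.second * 100).toInt()}%"

fun main() {
    println(displayText("Food" to 0.90234375f))      // Food, 90%
    println(displayText("Home good" to 0.75390625f)) // Home good, 75%
    println(displayText(null))                       // Unknown
}
```

Note that toInt() truncates rather than rounds, so the displayed percentage never overstates the confidence.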

Run it

Now click Run in the Android Studio toolbar. Once the app loads, press the button with the camera icon, point your camera at an object, take a photo, and accept the photo (in the camera app). Alternatively, tap any of the preset images. You should see the detection result. Press the button again or select another image to repeat a couple of times and experience ML Kit ODT!

You have used ML Kit to add Object Detection capabilities to your app, in 3 steps with 3 APIs:

  • Create Input Image
  • Create Detector
  • Send Image to Detector

That is all you need to get it up and running!

As you proceed, you might want to enhance the model. As you can see, the default model can only recognize 5 categories; it does not even know about knives, forks, or bottles. Check out the other codelab in our On-device Machine Learning - Object Detection learning pathway to learn how you can train a custom model.

What we've covered

  • How to add ML Kit Object Detection and Tracking to your Android app
  • How to use on-device object detection and tracking in ML Kit to detect objects in images

Next Steps

  • Explore ML Kit ODT further with more images and live video to get a feel for detection and classification accuracy and performance
  • Check out the On-device Machine Learning - Object Detection learning pathway to learn how to train a custom model
  • Apply ML Kit ODT in your own Android app

Learn More