Detect objects in images to build a visual product search with ML Kit: Android

1. Before you begin


Have you seen the Google Lens demo, where you can point your phone camera at an object and find where you can buy it online? If you want to learn how you can add the same feature to your app, then this codelab is for you. It is part of a learning pathway that teaches you how to build a product image search feature into a mobile app.

In this codelab, you will learn the first step to build a product image search feature: how to detect objects in images and let the user choose the objects they want to search for. You will use ML Kit Object Detection and Tracking to build this feature.

You can learn about the remaining steps, including how to build a product search backend with Vision API Product Search, in the learning pathway.

What you'll build

  • In this codelab, you're going to build an Android app with ML Kit. Your app will use the ML Kit Object Detection and Tracking API to detect objects in a given image. Then the user will pick an object that they want to search for in our product database.
  • In the end, you should see something similar to the image on the right.

What you'll learn

  • How to integrate ML Kit SDK into your Android application
  • ML Kit Object Detection and Tracking API

What you'll need

  • A recent version of Android Studio (v4.1.2+)
  • Android Studio Emulator or a physical Android device
  • The sample code
  • Basic knowledge of Android development in Kotlin

This codelab focuses on ML Kit. Concepts and code blocks that are not relevant are glossed over and provided for you to simply copy and paste.

2. Get set up

Download the Code

Click the following link to download all the code for this codelab:

Unzip the downloaded file. This unpacks a root folder (odml-pathways-main) with all of the resources you will need. For this codelab, you only need the sources in the product-search/codelab1/android subdirectory.

The product-search/codelab1/android subdirectory contains two directories:

  • starter: Starting code that you build upon for this codelab.
  • final: Completed code for the finished sample app.

3. Add ML Kit Object Detection and Tracking API to the project

Import the app into Android Studio

Start by importing the starter app into Android Studio.

In Android Studio, select Import Project (Gradle, Eclipse ADT, etc.) and choose the starter folder from the source code that you downloaded earlier.


Add the dependencies for ML Kit Object Detection and Tracking

The ML Kit dependencies allow you to integrate the ML Kit Object Detection and Tracking (ODT) SDK in your app.

Go to the app/build.gradle file of your project and confirm that the dependency is already there:

build.gradle

dependencies {
  // ...
  implementation 'com.google.mlkit:object-detection:16.2.4'
}

Sync your project with Gradle files

To make sure that all dependencies are available to your app, sync your project with Gradle files at this point.

Select Sync Project with Gradle Files from the Android Studio toolbar.

(If this button is disabled, make sure you import only starter/app/build.gradle, not the entire repository.)

4. Run the starter app

Now that you have imported the project into Android Studio and added the dependencies for ML Kit Object Detection and Tracking, you are ready to run the app for the first time.

Connect your Android device via USB to your host, or start the Android Studio emulator, and click Run in the Android Studio toolbar.

Run and explore the app

The app should launch on your Android device. It has some boilerplate code that allows you to capture a photo or select a preset image, and feed it to the object detection and tracking pipeline that you'll build in this codelab. Explore the app a little bit before writing code:

First, there is a Button at the bottom that lets you:

  • launch the camera app integrated in your device/emulator
  • take a photo inside the camera app
  • receive the captured image in the starter app
  • display the image

Try out the "Take photo" button. Follow the prompts to take a photo, accept the photo and observe it displayed inside the starter app.

Second, there are 3 preset images that you can choose from. You can use these images later to test the object detection code if you are running on an Android emulator.

  1. Select an image from the 3 preset images.
  2. See that the image shows up in the larger view.


5. Add on-device object detection

In this step, you'll add the functionality to the starter app to detect objects in images. As you saw in the previous step, the starter app contains boilerplate code to take photos with the camera app on the device. There are also 3 preset images in the app that you can try object detection on, if you are running the codelab on an Android emulator.

When you select an image, either from the preset images or by taking a photo with the camera app, the boilerplate code decodes that image into a Bitmap instance, shows it on the screen and calls the runObjectDetection method with the image.
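To make that hand-off concrete, here is a rough sketch of the boilerplate flow (the names are illustrative, not the starter app's actual code):

// Hypothetical sketch of the starter app's hand-off (illustrative names only;
// the starter app's actual code differs).
private fun onImageSelected(imagePath: String) {
    // Decode the image file into a Bitmap
    val bitmap = BitmapFactory.decodeFile(imagePath)

    // Show the image on the screen
    viewBinding.ivPreview.setImageBitmap(bitmap)

    // Pass the Bitmap to the detection code you'll write next
    runObjectDetection(bitmap)
}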

In this step, you will add code to the runObjectDetection method to do object detection!

Set up and run on-device object detection on an image

There are only 3 simple steps with 3 APIs to set up ML Kit ODT:

  • prepare an image: InputImage
  • create a detector object: ObjectDetection.getClient(options)
  • connect the 2 objects above: process(image)

You implement these steps inside the function runObjectDetection(bitmap: Bitmap) in the file MainActivity.kt.

/**
 * ML Kit Object Detection Function
 */
private fun runObjectDetection(bitmap: Bitmap) {
}

Right now, the function is empty. Move on to the following steps to integrate ML Kit ODT! Along the way, Android Studio will prompt you to add the necessary imports:

  • com.google.mlkit.vision.common.InputImage
  • com.google.mlkit.vision.objects.ObjectDetection
  • com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions
  • com.google.mlkit.vision.objects.DetectedObject (needed later by debugPrint())

Step 1: Create an InputImage

ML Kit provides a simple API to create an InputImage from a Bitmap. Then you can feed an InputImage into the ML Kit APIs.

// Step 1: create ML Kit's InputImage object
val image = InputImage.fromBitmap(bitmap, 0)

Add the above code to the top of runObjectDetection(bitmap: Bitmap). The second argument is the image's rotation in degrees; 0 works here because the Bitmap is already upright.
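As a side note, InputImage can also be created from other sources, for example directly from a content Uri. A sketch, assuming a Context and an image Uri are in scope (not needed for this codelab):

// Alternative (not used in this codelab): create an InputImage from a Uri,
// e.g. an image the user picked from the gallery.
// Note: fromFilePath can throw an IOException.
val imageFromUri = InputImage.fromFilePath(context, uri)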

Step 2: Create a detector instance

ML Kit follows the builder design pattern: you pass the configuration to a builder, then acquire a detector from it. There are 3 options to configure:

  • detector mode (single image or stream)
  • detection mode (single or multiple object detection)
  • classification mode (on or off)

This codelab uses single-image mode with multiple-object detection and classification enabled, so let's configure that:

// Step 2: acquire detector object
val options = ObjectDetectorOptions.Builder()
   .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
   .enableMultipleObjects()
   .enableClassification()
   .build()
val objectDetector = ObjectDetection.getClient(options)
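For comparison, if you were detecting objects in a live camera feed rather than in a single photo, you would pick stream mode instead. A sketch of that configuration (not used in this codelab):

// STREAM_MODE sketch for live camera frames (not used in this codelab).
// Stream mode is optimized for latency and provides trackingIds across frames.
val streamOptions = ObjectDetectorOptions.Builder()
   .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
   .enableClassification()
   .build()
val streamDetector = ObjectDetection.getClient(streamOptions)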

Step 3: Feed image(s) to the detector

Object detection and classification is asynchronous:

  • you send an image to the detector (via process())
  • the detector works pretty hard on it
  • the detector reports the result back to you via a callback

The following code does just that (copy and append it to the existing code inside fun runObjectDetection(bitmap:Bitmap)):

// Step 3: feed given image to detector and setup callback
objectDetector.process(image)
   .addOnSuccessListener {
       // Task completed successfully
        debugPrint(it)
   }
   .addOnFailureListener {
       // Task failed with an exception
       Log.e(TAG, it.message.toString())
   }

Upon completion, the detector notifies you with:

  1. The total number of objects detected
  2. Each detected object, described with:
  • trackingId: an integer you use to track the object across frames (NOT used in this codelab)
  • boundingBox: the object's bounding box
  • labels: a list of label(s) for the detected object (only when classification is enabled), each with:
    • index: the index of this label
    • text: the text of this label, including "Fashion Goods", "Food", "Home Goods", "Place", "Plant"
    • confidence: a float between 0.0 and 1.0, where 1.0 means 100% (you can filter on this; see the sketch after this list)
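Because each label carries a confidence score, a common post-processing step is to drop labels the detector is unsure about. A minimal sketch, assuming a DetectedObject in scope (the 0.7 threshold is an arbitrary example, not from this codelab):

// Keep only labels with a reasonably high confidence score.
// 0.7 is an arbitrary example threshold.
val confidentLabels = detectedObject.labels.filter { it.confidence >= 0.7f }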

You have probably noticed that the code prints the detection results to Logcat with debugPrint(). Add it to the MainActivity class:

private fun debugPrint(detectedObjects: List<DetectedObject>) {
   detectedObjects.forEachIndexed { index, detectedObject ->
       val box = detectedObject.boundingBox

       Log.d(TAG, "Detected object: $index")
       Log.d(TAG, " trackingId: ${detectedObject.trackingId}")
       Log.d(TAG, " boundingBox: (${box.left}, ${box.top}) - (${box.right},${box.bottom})")
       detectedObject.labels.forEach {
           Log.d(TAG, " categories: ${it.text}")
           Log.d(TAG, " confidence: ${it.confidence}")
       }
   }
}

Now you are ready to accept images for detection!

Run the codelab by clicking Run in the Android Studio toolbar. Try selecting a preset image or taking a photo, then look at the Logcat window inside the IDE. You should see something similar to this:

D/MLKit Object Detection: Detected object: 0
D/MLKit Object Detection:  trackingId: null
D/MLKit Object Detection:  boundingBox: (481, 2021) - (2426,3376)
D/MLKit Object Detection:  categories: Fashion good
D/MLKit Object Detection:  confidence: 0.90234375
D/MLKit Object Detection: Detected object: 1
D/MLKit Object Detection:  trackingId: null
D/MLKit Object Detection:  boundingBox: (2639, 2633) - (3058,3577)
D/MLKit Object Detection: Detected object: 2
D/MLKit Object Detection:  trackingId: null
D/MLKit Object Detection:  boundingBox: (3, 1816) - (615,2597)
D/MLKit Object Detection:  categories: Home good
D/MLKit Object Detection:  confidence: 0.75390625

This means that the detector saw 3 objects:

  • The categories are Fashion good and Home good.
  • There is no category returned for the 2nd object because it belongs to an unknown class.
  • There is no trackingId (because this is single-image detection mode).
  • Each object's position is given by the boundingBox rectangle (e.g. (481, 2021) – (2426, 3376)).
  • The detector is pretty confident that the 1st object is a Fashion good (90%); it is a dress.

Technically, that is all you need to get ML Kit Object Detection working. Congratulations!

On the UI side, the app still looks the same as when you started, but you can make use of the detection results, such as drawing the bounding boxes, to create a better experience. The next step is to visualize the detection results!

6. Post-processing the detection results

In the previous step, you printed the detection result to Logcat: simple and fast.

In this section, you will make use of the result on the image:

  • draw the bounding box on the image
  • draw the category name and confidence inside the bounding box

Understand the visualization utilities

There is some boilerplate code inside the codelab to help you visualize the detection result. Leverage these utilities to keep your visualization code simple:

  • class ImageClickableView: an image view class that provides some convenient utilities for visualizing and interacting with the detection result.
  • fun drawDetectionResults(results: List<DetectedObject>): draws white circles at the center of each detected object.
  • fun setOnObjectClickListener(listener: ((objectImage: Bitmap) -> Unit)): a callback that receives the cropped image containing only the object the user tapped on. You will send this cropped image to the image search backend in a later codelab to get a visually similar result. You won't use this method in this codelab yet, but a preview sketch follows this list.
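Although this codelab doesn't use setOnObjectClickListener yet, here is a preview sketch of how wiring it up could look (hypothetical usage; the real wiring happens in a later codelab):

// Preview sketch (not used in this codelab): receive the cropped image of
// the object the user tapped on. A later codelab sends this to the backend.
viewBinding.ivPreview.setOnObjectClickListener { objectImage ->
    Log.d(TAG, "Tapped object size: ${objectImage.width} x ${objectImage.height}")
}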

Show the ML Kit detection result

Use the visualization utilities to show the ML Kit object detection result on top of the input image.

Go to where you call debugPrint(), inside the addOnSuccessListener callback, and add the following code snippet below that call (it is the list of detected objects passed to the callback):

runOnUiThread {
    viewBinding.ivPreview.drawDetectionResults(it)
}
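For context, after this change the whole Step 3 call chain should look roughly like this (combining the snippets from this codelab):

// Step 3 with logging and visualization combined:
objectDetector.process(image)
   .addOnSuccessListener { detectedObjects ->
       // Task completed successfully: log the results...
       debugPrint(detectedObjects)
       // ...and draw them on top of the preview image
       runOnUiThread {
           viewBinding.ivPreview.drawDetectionResults(detectedObjects)
       }
   }
   .addOnFailureListener {
       // Task failed with an exception
       Log.e(TAG, it.message.toString())
   }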

Run it

Now click Run in the Android Studio toolbar.

Once the app loads, press the Button with the camera icon, point your camera at an object, take a photo, and accept the photo in the camera app; or simply tap any of the preset images. You should see the detection result. Press the Button again or select another image to repeat a few times, and experience the latest ML Kit ODT!


7. Congratulations!

You have used ML Kit to add object detection capabilities to your app:

  • 3 steps with 3 APIs:
    • Create an InputImage
    • Create a detector
    • Send the image to the detector

That is all you need to get it up and running!

What we've covered

  • How to add ML Kit Object Detection and Tracking to your Android app
  • How to use on-device object detection and tracking in ML Kit to detect objects in images

Next Steps

  • Try this codelab about how to send the detected object to a product search backend and show the search results
  • Explore ML Kit ODT further with more images and live video to experience its detection and classification accuracy and performance
  • Check out the Go further with object detection learning pathway to learn how to train a custom model
  • Read about the Material Design recommendations for object detection live-camera and static-image
  • Apply ML Kit ODT in your own Android app

Learn More