1. Before you begin
ML Kit is a mobile SDK that brings Google's on-device machine learning expertise to Android and iOS apps. You can use the powerful yet simple-to-use Vision and Natural Language APIs to solve common challenges in your apps or create brand-new user experiences. All are powered by Google's best-in-class ML models and offered to you at no cost.
ML Kit's APIs all run on-device, allowing for real-time use cases where you want to process a live camera stream, for example. This also means that the functionality is available offline.
This codelab will walk you through simple steps to add Object Detection and Tracking (ODT) for a given image into your existing Android app. Please note that this codelab takes some shortcuts to highlight ML Kit ODT usage.
What you'll build
In this codelab, you're going to build an Android app with ML Kit. Your app will use the ML Kit Object Detection and Tracking API to detect objects in a given image. In the end, you should see something similar to the image on the right.
What you'll learn
- How to integrate ML Kit SDK into your Android application
- ML Kit Object Detection and Tracking API
What you'll need
- A recent version of Android Studio (v4.1.2+)
- Android Studio Emulator or a physical Android device
- The sample code
- Basic knowledge of Android development in Kotlin
This codelab is focused on ML Kit. Non-relevant concepts and code blocks are glossed over and are provided for you to simply copy and paste.
2. Get set up
Download the Code
Click the following link to download all the code for this codelab:
Unpack the downloaded zip file. This will unpack a root folder (mlkit-android-main) with all of the resources you will need. For this codelab, you will only need the sources in the object-detection subdirectory.
The object-detection subdirectory in the mlkit-android repository contains two directories:
- starter—Starting code that you build upon for this codelab.
- final—Completed code for the finished sample app.
3. Add ML Kit Object Detection and Tracking API to the project
Import the app into Android Studio
Let's start by importing the starter app into Android Studio.
Open Android Studio, select Import Project (Gradle, Eclipse ADT, etc.) and choose the starter folder from the source code that you downloaded earlier.
Add the dependencies for ML Kit Object Detection and Tracking
The ML Kit dependencies allow you to integrate the ML Kit ODT SDK into your app. Add the following lines to the end of the app/build.gradle file of your project:
build.gradle
dependencies {
    // ...
    implementation 'com.google.mlkit:object-detection:16.2.4'
}
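If your project happens to use the Gradle Kotlin DSL (a build.gradle.kts file) instead of the Groovy script shown above, the equivalent dependency declaration is:

dependencies {
    // ...
    implementation("com.google.mlkit:object-detection:16.2.4")
}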
Sync your project with Gradle files
To be sure that all dependencies are available to your app, you should sync your project with Gradle files at this point.
Select Sync Project with Gradle Files from the Android Studio toolbar.
(If this button is disabled, make sure you import only starter/app/build.gradle, not the entire repository.)
4. Run the starter app
Now that you have imported the project into Android Studio and added the dependencies for ML Kit Object Detection and Tracking, you are ready to run the app for the first time.
Connect your Android device via USB to your host, or start the Android Studio emulator, and click Run in the Android Studio toolbar.
Run and explore the app
The app should launch on your Android device. It has some boilerplate code to allow you to capture a photo, or select a preset image, and feed it to an object detection and tracking pipeline that you'll build in this codelab. Let's explore the app a little bit before writing code.
First, there is a button at the bottom to:
- bring up the camera app integrated in your device/emulator
- take a photo inside your camera app
- receive the captured image in the starter app
- display the image
Try out the Take photo button, follow the prompts to take a photo, accept the photo and observe it displayed inside the starter app.
Repeat a few times to see how it works:
Second, there are 3 preset images that you can choose from. You can use these images later to test the object detection code if you are running on an Android emulator.
Select an image from the 3 preset images. See that the image shows up in the larger view:
5. Add on-device object detection
In this step, you will add the functionality to the starter app to detect objects in images. As you saw in the previous step, the starter app contains boilerplate code to take photos with the camera app on the device. There are also 3 preset images in the app that you can try object detection on if you are running the codelab on an Android emulator.
When you have selected an image, either from the preset images or taking a photo with the camera app, the boilerplate code decodes that image into a Bitmap instance, shows it on the screen and calls the runObjectDetection method with the image.
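To orient yourself, here is a rough sketch of that hand-off; the helper name onImageSelected is hypothetical, and the starter app's actual boilerplate differs slightly:

// Illustrative sketch only, not the starter app's exact code.
private fun onImageSelected(bitmap: Bitmap) {
    // Show the decoded image on screen
    inputImageView.setImageBitmap(bitmap)
    // Hand the image to the detection entry point you will implement next
    runObjectDetection(bitmap)
}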
In this step, you will add code to the runObjectDetection method to do object detection!
Set up and run on-device object detection on an image
There are only 3 simple steps with 3 APIs to set up ML Kit ODT:
- prepare an image: InputImage
- create a detector object: ObjectDetection.getClient(options)
- connect the 2 objects above: process(image)
You achieve these inside the function runObjectDetection(bitmap: Bitmap) in the file MainActivity.kt.
/**
* ML Kit Object Detection Function
*/
private fun runObjectDetection(bitmap: Bitmap) {
}
Right now the function is empty. Move on to the following steps to implement ML Kit ODT! Along the way, Android Studio will prompt you to add the necessary imports:
com.google.mlkit.vision.common.InputImage
com.google.mlkit.vision.objects.ObjectDetection
com.google.mlkit.vision.objects.defaults.ObjectDetectorOptions
Step 1: Create an InputImage
ML Kit provides a simple API to create an InputImage from a Bitmap. Then you can feed an InputImage into the ML Kit APIs.
// Step 1: create ML Kit's InputImage object
val image = InputImage.fromBitmap(bitmap, 0)
Add the above code to the top of runObjectDetection(bitmap: Bitmap). The second argument of fromBitmap() is the image's rotation in degrees; the images used in this codelab are already upright, so 0 is passed.
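As an aside, and not required for this codelab, InputImage can also be created from other sources. For example, it can be built directly from a content Uri; the names context and imageUri below are placeholders, and this overload can throw an IOException:

// Alternative, not used in this codelab: build an InputImage from a Uri
val imageFromUri = InputImage.fromFilePath(context, imageUri)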
Step 2: Create a detector instance
ML Kit follows the Builder design pattern. You pass the configuration to the builder, then acquire a detector from it. There are 3 options to configure:
- detector mode (single image or stream)
- detection mode (single or multiple object detection)
- classification mode (on or off)
This codelab uses single image mode with multiple object detection and classification enabled. Add that now:
// Step 2: acquire detector object
val options = ObjectDetectorOptions.Builder()
    .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
    .enableMultipleObjects()
    .enableClassification()
    .build()
val objectDetector = ObjectDetection.getClient(options)
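For comparison only (not used in this codelab), a detector intended for a live camera feed would typically be configured with STREAM_MODE instead, which favors low latency and lets the detector track objects across frames:

// Stream mode example for live video, shown here only for contrast
val streamOptions = ObjectDetectorOptions.Builder()
    .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
    .enableClassification()
    .build()
val streamingDetector = ObjectDetection.getClient(streamOptions)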
Step 3: Feed image(s) to the detector
Object detection and classification is asynchronous processing:
- You send an image to the detector (via process()).
- The detector works pretty hard on it.
- The detector reports the result back to you via a callback.
The following code does just that (copy and append it to the existing code inside fun runObjectDetection(bitmap: Bitmap)):
// Step 3: feed given image to detector and setup callback
objectDetector.process(image)
    .addOnSuccessListener {
        // Task completed successfully
        debugPrint(it)
    }
    .addOnFailureListener {
        // Task failed with an exception
        Log.e(TAG, it.message.toString())
    }
Upon completion, the detector notifies you with:
- The total number of objects detected.
- Each detected object is described with:
  - trackingId: an integer you use to track it across frames (NOT used in this codelab).
  - boundingBox: the object's bounding box.
  - labels: a list of label(s) for the detected object (only when classification is enabled):
    - index (the index of this label)
    - text (the text of this label, one of "Fashion good", "Food", "Home good", "Place", "Plant")
    - confidence (a float between 0.0 and 1.0, where 1.0 means 100%)
You have probably noticed that the code does printf-style logging of the detected result with debugPrint().
Add it to the MainActivity class:
private fun debugPrint(detectedObjects: List<DetectedObject>) {
    detectedObjects.forEachIndexed { index, detectedObject ->
        val box = detectedObject.boundingBox
        Log.d(TAG, "Detected object: $index")
        Log.d(TAG, " trackingId: ${detectedObject.trackingId}")
        Log.d(TAG, " boundingBox: (${box.left}, ${box.top}) - (${box.right},${box.bottom})")
        detectedObject.labels.forEach {
            Log.d(TAG, " categories: ${it.text}")
            Log.d(TAG, " confidence: ${it.confidence}")
        }
    }
}
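One small housekeeping note: ObjectDetector implements Closeable. This codelab creates the detector locally inside runObjectDetection, but if you ever keep one as a long-lived class property, you can release its resources when it is no longer needed, for example:

// Optional clean-up sketch; assumes objectDetector is a class property,
// which is not how this codelab structures the code.
override fun onDestroy() {
    super.onDestroy()
    objectDetector.close()
}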
Now you are ready to accept images for detection!
Let's run the codelab by clicking Run in the Android Studio toolbar. Try selecting a preset image, or take a photo, then look at the Logcat window inside the IDE.
You should see something similar to this:
D/MLKit Object Detection: Detected object: 0
D/MLKit Object Detection: trackingId: null
D/MLKit Object Detection: boundingBox: (481, 2021) - (2426,3376)
D/MLKit Object Detection: categories: Food
D/MLKit Object Detection: confidence: 0.90234375
D/MLKit Object Detection: Detected object: 1
D/MLKit Object Detection: trackingId: null
D/MLKit Object Detection: boundingBox: (2639, 2633) - (3058,3577)
D/MLKit Object Detection: Detected object: 2
D/MLKit Object Detection: trackingId: null
D/MLKit Object Detection: boundingBox: (3, 1816) - (615,2597)
D/MLKit Object Detection: categories: Home good
D/MLKit Object Detection: confidence: 0.75390625
...which means that the detector saw 3 objects:
- The categories are Food and Home good.
- There is no category returned for the 2nd object because it is an unknown class.
- There is no trackingId (because this is single image detection mode).
- The position is given by the boundingBox rectangle (e.g. (481, 2021) – (2426, 3376)).
- The detector is pretty confident that the 1st is a Food (90% confidence; it was a salad).
Technically, that is all you need to get ML Kit Object Detection working. Congratulations!
On the UI side, the app still looks the way it did when you started, but you could make use of the detected results, such as drawing the bounding boxes, to create a better experience. Let's go to the next step: post-process the detection results!
6. Post-processing the detection results
In the previous steps, you printed the detection result to logcat: simple and fast.
In this section, you'll make use of the result by rendering it onto the image:
- draw the bounding box on the image
- draw the category name and confidence inside the bounding box
Understand the visualization utilities
There is some boilerplate code inside the codelab to help you visualize the detection result. Leverage these utilities to keep your visualization code simple:
- data class BoxWithText(val box: Rect, val text: String)
  This is a data class that stores an object detection result for visualization. box is the bounding box where the object is located, and text is the detection result string to display together with the object's bounding box.
- fun drawDetectionResult(bitmap: Bitmap, detectionResults: List<BoxWithText>): Bitmap
  This method draws the object detection results in detectionResults on the input bitmap and returns the modified copy of it.
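For reference only, here is a minimal sketch of how such a drawing utility could be implemented with Canvas and Paint. The starter app already ships its own version; the colors and sizes below are illustrative choices, not the starter's exact values:

import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint

// Sketch of a drawing utility; BoxWithText is the data class described above.
fun drawDetectionResult(bitmap: Bitmap, detectionResults: List<BoxWithText>): Bitmap {
    // Work on a mutable copy so the original bitmap stays untouched
    val output = bitmap.copy(Bitmap.Config.ARGB_8888, true)
    val canvas = Canvas(output)
    val boxPaint = Paint().apply {
        style = Paint.Style.STROKE
        color = Color.RED
        strokeWidth = 8f
    }
    val textPaint = Paint().apply {
        color = Color.RED
        textSize = 48f
    }
    detectionResults.forEach { result ->
        // Draw the bounding box
        canvas.drawRect(result.box, boxPaint)
        // Draw the label text just inside the top edge of the box
        canvas.drawText(
            result.text,
            result.box.left.toFloat(),
            result.box.top.toFloat() + textPaint.textSize,
            textPaint
        )
    }
    return output
}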
Here is an example of an output of the drawDetectionResult utility method:
Visualize the ML Kit detection result
Use the visualization utilities to draw the ML Kit object detection result on top of the input image.
Go to where you call debugPrint() and add the following code snippet below it:
// Parse ML Kit's DetectedObject and create corresponding visualization data
val detectedObjects = it.map { obj ->
    var text = "Unknown"
    // We will show the top confident detection result if it exists
    if (obj.labels.isNotEmpty()) {
        val firstLabel = obj.labels.first()
        text = "${firstLabel.text}, ${firstLabel.confidence.times(100).toInt()}%"
    }
    BoxWithText(obj.boundingBox, text)
}

// Draw the detection result on the input bitmap
val visualizedResult = drawDetectionResult(bitmap, detectedObjects)

// Show the detection result on the app screen
runOnUiThread {
    inputImageView.setImageBitmap(visualizedResult)
}
- You start by parsing ML Kit's DetectedObject and creating a list of BoxWithText objects to display the visualization result.
- Then you draw the detection result on top of the input image, using the drawDetectionResult utility method, and show it on the screen.
Run it
Now click Run in the Android Studio toolbar.
Once the app loads, press the button with the camera icon, point your camera at an object, take a photo, and accept the photo (in the camera app), or simply tap any of the preset images. You should see the detection results. Press the button again or select another image a couple of times to experience the latest ML Kit ODT!
7. Congratulations!
You have used ML Kit to add Object Detection capabilities to your app:
- 3 steps with 3 APIs:
  - Create InputImage
  - Create Detector
  - Send Image to Detector
That is all you need to get it up and running!
As you proceed, you might like to enhance the model: as you can see, the default model can only recognize 5 categories; it does not even know about knives, forks, or bottles. Check out the other codelab in our On-device Machine Learning - Object Detection learning pathway to learn how you can train a custom model.
What we've covered
- How to add ML Kit Object Detection and Tracking to your Android app
- How to use on-device object detection and tracking in ML Kit to detect objects in images
Next Steps
- Explore ML Kit ODT further with more images and live video to experience detection and classification accuracy and performance
- Check out the On-device Machine Learning - Object Detection learning pathway to learn how to train a custom model
- Apply ML Kit ODT in your own Android app