Create a custom object detection web app with MediaPipe

1. Before you begin

MediaPipe Solutions lets you apply machine-learning (ML) solutions to your apps. It provides a framework that lets you configure prebuilt processing pipelines that deliver immediate, engaging, and useful output to users. You can even customize these solutions with Model Maker to update the default models.

Object detection is one of several ML vision tasks that MediaPipe Solutions offers. MediaPipe Tasks is available for Android, Python, and the web.

In this codelab, you add object detection to a web app to detect dogs in images and a live webcam video.

What you'll learn

  • How to incorporate an object detection task in a web app with MediaPipe Tasks.

What you'll build

  • A web app that detects the presence of dogs. You can also customize a model to detect a class of objects of your choice with MediaPipe Model Maker.

What you'll need

  • A CodePen account
  • A device with a web browser
  • Basic knowledge of JavaScript, CSS, and HTML

2. Get set up

This codelab runs your code in CodePen, a social development environment that lets you write code in the browser and check the results as you build.

To get set up, follow these steps:

  1. In your CodePen account, navigate to this CodePen. You use this code as a starting base to create your own object detector.
  2. At the bottom of CodePen in the navigation menu, click Fork to make a copy of the starter code.

The navigation menu in CodePen where the Fork button is located

  3. In the JS tab, click the expander arrow and then select Maximize JavaScript editor. You only edit code in the JS tab for this codelab, so you don't need to see the HTML or CSS tabs.

Review the starter app

  1. In the preview pane, notice that there are two images of dogs and an option to run your webcam. The model that you use in this tutorial was trained on the three dogs displayed in the two images.

A preview of the web app from the starter code

  2. In the JS tab, notice that there are several comments throughout the code. For example, you can find the following comment on line 15:
// Import the required package.

These comments indicate where you need to insert code snippets.

3. Import the MediaPipe tasks-vision package and add the required variables

  1. In the JS tab, import the MediaPipe tasks-vision package:
// Import the required package.
import { ObjectDetector, FilesetResolver, Detection } from "https://cdn.skypack.dev/@mediapipe/tasks-vision@latest";

This code uses the Skypack content-delivery network (CDN) to import the package. For more information about how to use Skypack with CodePen, see Skypack + CodePen.

In your own projects, you can install the package with npm, or use the package manager or CDN of your choice. For more information about the required package, see JavaScript packages.
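
For example, in a bundled project the setup might look like the following sketch. The package name matches the one in the CDN URL above, but your build tooling may differ:

// Install the package once in your project (terminal, not CodePen):
//   npm install @mediapipe/tasks-vision

// Then import the same classes from the installed package.
import { ObjectDetector, FilesetResolver } from "@mediapipe/tasks-vision";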

  1. Declare variables for the object detector and running mode:
// Create required variables.
let objectDetector = null;
let runningMode = "IMAGE";

The runningMode variable is a string that's set to an "IMAGE" value when you detect objects in images or a "VIDEO" value when you detect objects in video.

4. Initialize the object detector

  • To initialize the object detector, add the following code after the relevant comment in the JS tab:
// Initialize the object detector.
async function initializeObjectDetector() {
  const visionFilesetResolver = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
  );
  objectDetector = await ObjectDetector.createFromOptions(visionFilesetResolver, {
    baseOptions: {
      modelAssetPath: "https://storage.googleapis.com/mediapipe-assets/dogs.tflite"
    },
    scoreThreshold: 0.3,
    runningMode: runningMode
  });
}
initializeObjectDetector();

The FilesetResolver.forVisionTasks() method specifies the location of the WebAssembly (Wasm) binary for the task.

The ObjectDetector.createFromOptions() method instantiates the object detector. You must provide a path to the model used for detection. In this case, the dog-detection model is hosted on Cloud Storage.

The scoreThreshold property is set to a 0.3 value. This means that the model returns results for any object detected with a confidence level of 30% or greater. You can adjust this threshold to suit the needs of your app.

The runningMode property is set upon initialization of the ObjectDetector object. You can change this and other options as needed later.
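
For example, if the 30% threshold returns too many low-confidence results, a later call to setOptions() could raise it. This is a minimal sketch that reuses the objectDetector variable declared earlier; the 0.5 value is an arbitrary example:

// Sketch (inside an async function): raise the confidence threshold later
// so that results below 50% confidence are filtered out.
await objectDetector.setOptions({ scoreThreshold: 0.5 });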

5. Run predictions on images

  • To run predictions on images, navigate to the handleClick() function and then add the following code to the function's body:
// Verify object detector is initialized and choose the correct running mode.
  if (!objectDetector) {
    alert("Object Detector still loading. Please try again");
    return;
  }

  if (runningMode === "VIDEO") {
    runningMode = "IMAGE";
    await objectDetector.setOptions({ runningMode: runningMode });
  }

This code determines whether the object detector is initialized and ensures that the running mode is set for images.

Detect objects

  • To detect objects in images, add the following code to the handleClick() function's body:
// Run object detection.
  const detections = objectDetector.detect(event.target);

The following code snippet includes an example of the output data from this task:

ObjectDetectionResult:
 Detection #0:
  Box: (x: 355, y: 133, w: 190, h: 206)
  Categories:
   index       : 17
   score       : 0.73828
   class name  : aci
 Detection #1:
  Box: (x: 103, y: 15, w: 138, h: 369)
  Categories:
   index       : 17
   score       : 0.73047
   class name  : tikka
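
If you want to inspect these fields in code, each detection exposes its bounding box and a categories array. The following sketch logs the same information to the console, using the detections variable from the previous snippet; the field names match those used in the display functions later in this codelab:

// Sketch: log each detection's top category and bounding box.
for (const detection of detections.detections) {
  const category = detection.categories[0];
  const box = detection.boundingBox;
  console.log(
    `${category.categoryName} (${Math.round(category.score * 100)}%)`,
    `box: x=${box.originX}, y=${box.originY}, w=${box.width}, h=${box.height}`
  );
}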

Process and display predictions

  1. At the end of the handleClick() function's body, call the displayImageDetections() function:
// Call the displayImageDetections() function.
displayImageDetections(detections, event.target);
  2. In the displayImageDetections() function's body, add the following code to display object-detection results:
// Display object detection results.
  
  const ratio = resultElement.height / resultElement.naturalHeight;

  for (const detection of result.detections) {
    // Description text
    const p = document.createElement("p");
    p.setAttribute("class", "info");
    p.innerText =
      detection.categories[0].categoryName +
      " - with " +
      Math.round(parseFloat(detection.categories[0].score) * 100) +
      "% confidence.";
    // Positioned at the top-left of the bounding box.
    // Height is that of the text.
    // Width subtracts text padding in CSS so that it fits perfectly.
    p.style =
      "left: " +
      detection.boundingBox.originX * ratio +
      "px;" +
      "top: " +
      detection.boundingBox.originY * ratio +
      "px; " +
      "width: " +
      (detection.boundingBox.width * ratio - 10) +
      "px;";
    const highlighter = document.createElement("div");
    highlighter.setAttribute("class", "highlighter");
    highlighter.style =
      "left: " +
      detection.boundingBox.originX * ratio +
      "px;" +
      "top: " +
      detection.boundingBox.originY * ratio +
      "px;" +
      "width: " +
      detection.boundingBox.width * ratio +
      "px;" +
      "height: " +
      detection.boundingBox.height * ratio +
      "px;";

    resultElement.parentNode.appendChild(highlighter);
    resultElement.parentNode.appendChild(p);
  }

This function displays bounding boxes over the objects detected in the images. It removes any previous highlighting, and then creates and displays <p> tags to highlight each object that's detected.

Test the app

When you change your code in CodePen, the preview pane automatically refreshes when you save. If autosave is enabled, your app has likely refreshed already, but it's a good idea to refresh it again.

To test the app, follow these steps:

  1. In the preview pane, click each image to view the predictions. A bounding box shows the dog's name with the model's confidence level.
  2. If there isn't a bounding box, open Chrome DevTools and then check the Console panel for errors or review the previous steps to ensure that you didn't miss anything.

A preview of the web app with bounding boxes over the dogs detected in the images

6. Run predictions on a live webcam video

Detect objects

  • To detect objects in a live webcam video, navigate to the predictWebcam() function and then add the following code to the function's body:
// Run video object detection.
  // If the detector is initialized in image mode, switch it to video runningMode.
  if (runningMode === "IMAGE") {
    runningMode = "VIDEO";
    await objectDetector.setOptions({ runningMode: runningMode });
  }
  let nowInMs = performance.now();

  // Detect objects with the detectForVideo() method.
  const result = await objectDetector.detectForVideo(video, nowInMs);

  displayVideoDetections(result.detections);

Object detection for video uses the same methods regardless of whether you run inference on streaming data or a complete video. The detectForVideo() method is similar to the detect() method used for photos, but it includes an additional parameter for the timestamp associated with the current frame. The function performs detection live, so you pass the current time as the timestamp.
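
Because every frame needs a fresh timestamp and a new detectForVideo() call, predictWebcam() is typically run in a loop. A common pattern, sketched below, is to schedule the next call at the end of the function with requestAnimationFrame(); the starter code may already include something similar:

// Sketch: at the end of predictWebcam(), schedule the next frame so that
// detection keeps running while the webcam is active.
window.requestAnimationFrame(predictWebcam);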

Process and display predictions

  • To process and display the detection results, navigate to the displayVideoDetections() function and then add the following code to the function's body:
// Display video object detection results.
  // Remove any highlighting from the previous frame.
  for (let child of children) {
    liveView.removeChild(child);
  }
  children.splice(0);

  // Iterate through predictions and draw them to the live view.
  for (const detection of result.detections) {
    const p = document.createElement("p");
    p.innerText =
      detection.categories[0].categoryName +
      " - with " +
      Math.round(parseFloat(detection.categories[0].score) * 100) +
      "% confidence.";
    p.style =
      "left: " +
      (video.offsetWidth -
        detection.boundingBox.width -
        detection.boundingBox.originX) +
      "px;" +
      "top: " +
      detection.boundingBox.originY +
      "px; " +
      "width: " +
      (detection.boundingBox.width - 10) +
      "px;";

    const highlighter = document.createElement("div");
    highlighter.setAttribute("class", "highlighter");
    highlighter.style =
      "left: " +
      (video.offsetWidth -
        detection.boundingBox.width -
        detection.boundingBox.originX) +
      "px;" +
      "top: " +
      detection.boundingBox.originY +
      "px;" +
      "width: " +
      (detection.boundingBox.width - 10) +
      "px;" +
      "height: " +
      detection.boundingBox.height +
      "px;";

    liveView.appendChild(highlighter);
    liveView.appendChild(p);

    // Store drawn objects in memory so that they're queued to delete at next call.
    children.push(highlighter);
    children.push(p);
  }

This code removes any previous highlighting, and then creates and displays <p> tags to highlight each object that's detected.

Test the app

To test live object detection, it helps to have an image of one of the dogs on which the model was trained.

To test the app, follow these steps:

  1. Download one of the dog photos onto your phone.
  2. In the preview pane, click Enable webcam.
  3. If your browser presents a dialog that asks you to grant access to the webcam, grant permission.
  4. Hold the picture of the dog on your phone in front of your webcam. A bounding box shows the dog's name and the model's confidence level.
  5. If there isn't a bounding box, open Chrome DevTools and then check the Console panel for errors or review the previous steps to ensure that you didn't miss anything.

A bounding box over an image of a dog that's held up to a live webcam

7. Congratulations

Congratulations! You built a web app that detects objects in images and live webcam video. To learn more, see a completed version of the app on CodePen.

Learn more