The Cloud Vision API lets you understand the content of an image by encapsulating powerful machine learning models in a simple REST API.

In this lab, we will send images to the Vision API and see it detect objects, faces, and landmarks.

What you'll learn

Creating a Cloud Vision API request and calling the API with curl
Using the Vision API's label, face, and landmark detection methods

What you'll need

A Google Cloud Platform project
A browser, such as Chrome or Firefox


Self-paced environment setup

If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform console (console.cloud.google.com) and create a new project:

Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!).
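If you prefer working from the command line and have the gcloud CLI available (it also comes with Cloud Shell, introduced later in this lab), a project can be created and selected like this - a minimal sketch, where my-vision-codelab is a hypothetical, globally unique project ID:

# hypothetical project ID - replace with your own globally unique ID
$ gcloud projects create my-vision-codelab
$ gcloud config set project my-vision-codelab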

Running through this codelab shouldn't cost you more than a few cents, but it could be more if you decide to use more storage or if you do not delete your objects (see "Delete a bucket" section at the end of this document). Google Cloud Storage pricing is documented here.

New users of Google Cloud Platform are eligible for a $300 free trial.

Codelab-at-a-conference setup

The instructor will provide you with temporary accounts tied to existing projects that are already set up, so you do not need to worry about enabling billing or any costs associated with running this codelab. Note that these accounts will be disabled soon after the codelab is over.

Once you have received a temporary username and password from the instructor, log in to the Google Cloud Console: https://console.cloud.google.com/.

Here's what you should see once logged in:

Click on the menu icon in the top left of the screen.

Select API Manager from the drop-down menu.

Click on Enable API.

Then, search for "vision" in the search box. Click on Google Cloud Vision API:

Click Enable to enable the Cloud Vision API:

Wait for a few seconds for it to enable. You will see this once it's enabled:
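If you prefer the command line, the same API can also be enabled with gcloud - a sketch, assuming your project is already selected:

$ gcloud services enable vision.googleapis.com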

Google Cloud Shell is a command line environment running in the Cloud. This Debian-based virtual machine is loaded with all the development tools you'll need (gcloud, bq, git and others) and offers a persistent 5GB home directory. We'll use Cloud Shell to create our request to the Vision API.

To get started with Cloud Shell, click the "Activate Google Cloud Shell" icon in the top right-hand corner of the header bar.

A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt. Wait until the user@project:~$ prompt appears.
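Once the prompt appears, you can optionally confirm that you are authenticated and that the expected project is selected:

$ gcloud auth list
$ gcloud config list project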

Since we'll be using curl to send a request to the Vision API, we'll need to generate an API key to pass in our request URL. To create an API key, navigate to the API Manager section of your project dashboard:

Then, navigate to the Credentials tab and click Create credentials:

In the drop-down menu, select API key:

Next, copy the key you just generated.

Now that you have an API key, save it to an environment variable to avoid having to insert the value of your API key in each request. You can do this in Cloud Shell. Be sure to replace <YOUR_API_KEY> with the key you just copied.

$ export API_KEY=<YOUR_API_KEY>
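You can confirm the variable is set by echoing it. Keep in mind that the variable only lives for the current Cloud Shell session, so re-run the export command if you open a new session:

$ echo $API_KEY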

Creating a Cloud Storage bucket

There are two ways to send an image to the Vision API for image detection: by sending the API a base64 encoded image string, or by passing it the URL of a file stored in Google Cloud Storage. We'll be using a Cloud Storage URL, so the first step is to create a Google Cloud Storage bucket to store our images.

Navigate to Storage in the Cloud console for your project:

Then click Create bucket. Give your bucket a unique name (such as your Project ID) and click Create.
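If you'd rather stay in Cloud Shell, the bucket can also be created with gsutil - a sketch, where my-bucket-name stands in for your own unique bucket name:

$ gsutil mb gs://my-bucket-name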

Upload an image to your bucket

Right click on the following image of donuts, then click Save image as and save it to your Downloads folder as donuts.jpeg.

Navigate to the bucket you just created in the storage browser and click Upload files. Then select donuts.jpeg.

You should see the file in your bucket:
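As an alternative to the storage browser, if the image is present in your Cloud Shell home directory (for example, uploaded through the Cloud Shell file menu), it could be copied into the bucket with gsutil instead:

$ gsutil cp donuts.jpeg gs://my-bucket-name/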

Now that you have the file in your bucket, you're ready to create a Vision API request, passing it the URL of this donuts picture.

In your Cloud Shell environment, create a request.json file with the following, making sure to replace my-bucket-name with the name of the Cloud Storage bucket you created:

request.json

{
  "requests": [
      {
        "image": {
          "source": {
              "gcsImageUri": "gs://my-bucket-name/donuts.jpeg"
          } 
        },
        "features": [
          {
            "type": "LABEL_DETECTION",
            "maxResults": 10
          }
        ]
      }
  ]
}

The first Cloud Vision API feature we'll explore is label detection. This method will return a list of labels (words) of what's in your image.

Before calling the Vision API, we need to update the access on the image in our bucket to allow the Vision API read access. Run the following command to allow global read access on this image, making sure to replace my-storage-bucket with the name of your bucket. This command uses gsutil, which lets you access Cloud Storage from the command line and comes installed with your Cloud Shell environment by default.

$ gsutil acl ch -g AllUsers:R gs://my-storage-bucket/donuts.jpeg
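If you'd like to double-check the permissions, gsutil can print the object's ACL; AllUsers should now be listed with READ access:

$ gsutil acl get gs://my-storage-bucket/donuts.jpeg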

Now we're ready to call the Vision API with curl:

$ curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json  https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}

Your response should look something like the following:

{
  "labelAnnotations": [
    {
      "mid": "/m/02wbm",
      "description": "Food",
      "score": 94
    },
    {
      "mid": "/m/0ggjl84",
      "description": "Baked Goods",
      "score": 90
    },
    {
      "mid": "/m/02q08p0",
      "description": "Dish",
      "score": 85
    },
    {
      "mid": "/m/0270h",
      "description": "Dessert",
      "score": 83
    },
    {
      "mid": "/m/0bp3f6m",
      "description": "Fried Food",
      "score": 75
    },
    {
      "mid": "/m/01wydv",
      "description": "Beignet",
      "score": 67
    },
    {
      "mid": "/m/0pqdc",
      "description": "Hors D Oeuvre",
      "score": 54
    }
  ]
}

The API was able to identify the specific type of donuts these are (beignets), cool! For each label the Vision API found, it returns a description with the name of the item. It also returns a score, a number from 0 to 1 indicating how confident it is that the description matches what's in the image. The mid value maps to the item's mid in Google's Knowledge Graph. You can use the mid when calling the Knowledge Graph API to get more information on the item.
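For a more compact view, you can pipe the curl output through jq, which is available in Cloud Shell. Note that the full API response nests each result under a top-level responses array (one entry per request in your requests list), so the filter below drills into responses[0]:

$ curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json \
    "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}" \
  | jq '.responses[0].labelAnnotations[] | {description, score}'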

Next we'll explore the face and landmark detection methods of the Vision API. The face detection method returns data on faces found in an image, including the emotions of the faces and their location in the image. Landmark detection can identify common (and obscure) landmarks - it returns the name of the landmark, its latitude and longitude coordinates, and the region of the image where the landmark was identified.

Upload a new image

To use these two new methods, let's upload a new image with faces and landmarks to our Cloud Storage bucket. Right click on the following image, then click Save image as and save it to your Downloads folder as selfie.jpeg.

Then upload it to your Cloud Storage bucket the same way you did in the previous step. After uploading the photo, update the access control on the new image with the following gsutil command:

$ gsutil acl ch -g AllUsers:R gs://my-storage-bucket/selfie.jpeg

Updating our request

Next, we'll update our request.json file to include the URL of the new image, and to use face and landmark detection instead of label detection. Be sure to replace my-bucket-name with the name of your Cloud Storage bucket:

request.json

{
  "requests": [
      {
        "image": {
          "source": {
              "gcsImageUri": "gs://my-bucket-name/selfie.jpeg"
          } 
        },
        "features": [
          {
            "type": "FACE_DETECTION"
          },
          {
            "type": "LANDMARK_DETECTION"
          }
        ]
      }
  ]
}

Calling the Vision API and parsing the response

Now you're ready to call the Vision API using the same curl command you used above:

$ curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json  https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}

Let's take a look at the faceAnnotations object in our response first. You'll notice the API returns an object for each face found in the image - in this case, three. Here's a clipped version of our response:

{
      "faceAnnotations": [
        {
          "boundingPoly": {
            "vertices": [
              {
                "x": 669,
                "y": 324
              },
              ...
            ]
          },
          "fdBoundingPoly": {
            ...
          },
          "landmarks": [
            {
              "type": "LEFT_EYE",
              "position": {
                "x": 692.05646,
                "y": 372.95868,
                "z": -0.00025268539
              }
            },
            ...
          ],
          "rollAngle": 0.21619819,
          "panAngle": -23.027969,
          "tiltAngle": -1.5531756,
          "detectionConfidence": 0.72354823,
          "landmarkingConfidence": 0.20047489,
          "joyLikelihood": "POSSIBLE",
          "sorrowLikelihood": "VERY_UNLIKELY",
          "angerLikelihood": "VERY_UNLIKELY",
          "surpriseLikelihood": "VERY_UNLIKELY",
          "underExposedLikelihood": "VERY_UNLIKELY",
          "blurredLikelihood": "VERY_UNLIKELY",
          "headwearLikelihood": "VERY_LIKELY"
        },
        ...
      ]
}

The boundingPoly gives us the x,y coordinates around the face in the image. fdBoundingPoly is a smaller box than boundingPoly, focusing on the skin part of the face. landmarks is an array of objects for each facial feature (some you may not have even known about!). This tells us the type of landmark, along with the 3D position of that feature (x, y, and z coordinates), where the z coordinate is the depth. The remaining values give us more details on the face, including the likelihood of joy, sorrow, anger, and surprise. The object above is for the person furthest back in the image - you can see he's making kind of a silly face, which explains the joyLikelihood of POSSIBLE.
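To isolate just a few of these fields per face, one option is to save the response to a file (response.json here is just a convenient name) and filter it with jq, again drilling into the top-level responses array:

$ curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json \
    "https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}" > response.json
$ jq '.responses[0].faceAnnotations[] | {joyLikelihood, headwearLikelihood, detectionConfidence}' response.json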

Next let's look at the landmarkAnnotations part of our response:

"landmarkAnnotations": [
        {
          "mid": "/m/0c7zy",
          "description": "Petra",
          "score": 0.5403372,
          "boundingPoly": {
            "vertices": [
              {
                "x": 153,
                "y": 64
              },
              ...
            ]
          },
          "locations": [
            {
              "latLng": {
                "latitude": 30.323975,
                "longitude": 35.449361
              }
            }
          ]

Here, the Vision API was able to tell that this picture was taken in Petra - this is pretty impressive given the visual clues in this image are minimal. The values in this response should look similar to the labelAnnotations response above.

We get the mid of the landmark, its name (description), along with a confidence score. boundingPoly shows the region in the image where the landmark was identified. The locations key tells us the latitude and longitude coordinates of this landmark.
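Using the same response.json file saved in the face step, jq can pull out just the landmark's name, score, and coordinates:

$ jq '.responses[0].landmarkAnnotations[] | {description, score, locations}' response.json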

We've looked at the Vision API's label, face, and landmark detection methods, but there are three others we haven't explored. Dive into the docs to learn about them:

You've learned how to analyze images with the Vision API. In this example you passed the API the Google Cloud Storage URL of your image. Alternatively, you can pass a base64 encoded string of your image.
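For example, here is a sketch of building the request with an inline base64-encoded image instead of a Cloud Storage URI, assuming donuts.jpeg has been copied into your Cloud Shell home directory. The image's content field takes the base64-encoded bytes, so no bucket or ACL changes are needed:

# encode the image as a single line of base64 (GNU base64 supports -w 0 to disable wrapping)
$ base64 -w 0 donuts.jpeg > donuts.b64

# build a request that inlines the encoded bytes in the "content" field
$ cat > request.json <<EOF
{
  "requests": [
    {
      "image": { "content": "$(cat donuts.b64)" },
      "features": [ { "type": "LABEL_DETECTION", "maxResults": 10 } ]
    }
  ]
}
EOF

The same curl command used earlier will then work unchanged with this request.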

What we've covered

Creating a Cloud Vision API request and calling the API with curl
Using the Vision API's label, face, and landmark detection methods

Next Steps