The Google Cloud Vision API allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.

In this codelab you will focus on using the Vision API with Python. You will learn how to use several of the API's features, namely label annotations, OCR/text extraction, landmark detection, and detecting facial features!

What you'll learn

What you'll need


Self-paced environment setup

  1. Sign in to Cloud Console and create a new project or reuse an existing one. (If you don't already have a Gmail or G Suite account, you must create one.)

Remember the project ID, a unique name across all Google Cloud projects (so a name that has already been taken will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID.

  2. Next, you'll need to enable billing in Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running.

New users of Google Cloud are eligible for a $300 free trial.

Start Cloud Shell

Activate Cloud Shell

  1. From the Cloud Console, click Activate Cloud Shell.

If you've never started Cloud Shell before, you'll be presented with an intermediate screen (below the fold) describing what it is. If that's the case, click Continue, and you won't ever see that one-time screen again.

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with simply a browser or your Chromebook.

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.

  2. Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
  3. Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

This codelab requires you to use the Python language (although many languages are supported by the Google APIs client libraries, so feel free to build something equivalent in your favorite development tool and simply use the Python code as pseudocode). In particular, this codelab supports Python 2 and 3, but we recommend moving to 3.x as soon as possible.

The Cloud Shell is a convenience available for users directly from the Cloud Console and doesn't require a local development environment, so this tutorial can be done completely in the cloud with a web browser. The Cloud Shell is especially useful if you're developing or plan to continue developing with GCP products & APIs. More specifically for this codelab, the Cloud Shell has already pre-installed both versions of Python.

The Cloud Shell also has IPython installed: it is a higher-level interactive Python interpreter which we recommend, especially if you are part of the data science or machine learning community. If you are, you'll recognize IPython as the default interpreter for Jupyter Notebooks as well as Colab, Google Research's hosted Jupyter Notebooks.

IPython favors a Python 3 interpreter first but falls back to Python 2 if 3.x isn't available. IPython can be accessed from the Cloud Shell but can also be installed in a local development environment. To leave it, exit with ^D (Ctrl-d) and accept the offer to exit. Starting ipython produces output that looks like this:

$ ipython
Python 3.7.3 (default, Mar  4 2020, 23:11:43)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

If IPython isn't your preference, use of a standard Python interactive interpreter (either the Cloud Shell or your local development environment) is perfectly acceptable (also exit with ^D):

$ python
Python 2.7.13 (default, Sep 26 2018, 18:42:22)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
$ python3
Python 3.7.3 (default, Mar 10 2020, 02:33:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

The codelab also assumes you have the pip installation tool (Python package manager and dependency resolver). It comes bundled with Python 2.7.9+ and 3.4+. If you have an older Python version, see this guide for installation instructions. Depending on your permissions, you may need sudo or superuser access, but generally this isn't the case. You can also explicitly use pip2 or pip3 to execute pip for specific Python versions.
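To confirm which pip you have (and which Python interpreter it is bound to), you can run:

pip3 --version

The output reports the pip version along with the Python installation it installs packages into.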

The remainder of the codelab assumes you're using Python 3—specific instructions will be provided for Python 2 if they differ significantly from 3.x.

Create and use virtual environments

This section is optional and only really required for those who must use a virtual environment for this codelab (per the warning sidebar above). If you only have Python 3 on your computer, you can simply issue this command to create a virtualenv called my_env (you can choose another name if desired):

virtualenv my_env

However, if you have both Python 2 & 3 on your computer, we recommend you install a Python 3 virtualenv which you can do with the -p flag like this:

virtualenv -p python3 my_env

Enter your newly created virtualenv by "activating" it like this:

source my_env/bin/activate

Confirm you're in the environment by observing that your shell prompt is now preceded by your environment name, e.g.:

(my_env) $ 

Now you should be able to pip install any required packages, execute code within this environment, and so on. Another benefit is that if you completely mess it up, or your Python installation becomes corrupted, you can blow away the entire environment without affecting the rest of your system.
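For example, from inside the activated virtualenv you could install the client library used later in this codelab:

(my_env) $ pip install google-cloud-vision

Packages installed this way live inside my_env and disappear when you delete that directory.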

Before you can begin using Google APIs, you must enable them. The example below shows what you would do to enable the Cloud Vision API. In this codelab, you may be using one or more APIs, and should follow similar steps to enable them before usage.

From Cloud Shell

Using Cloud Shell, you can enable the API by using the following command:

gcloud services enable vision.googleapis.com

From the Cloud Console

You may also enable the Vision API in the API Manager. From the Cloud Console, go to the API Manager and select "Library."

In the search bar, start typing "vision," then select the Vision API when it appears in the suggestions.

Select the Cloud Vision API to open its details page, then click the "Enable" button.

Cost

While many Google APIs can be used without fees, use of GCP products & APIs is not free. When enabling the Vision API (as described above), you may be asked for an active billing account. Review the Vision API's pricing information before enabling it. Keep in mind that certain Google Cloud Platform (GCP) products feature an "Always Free" tier which you must exceed in order to incur billing. For the purposes of this codelab, each call to the Vision API counts against that free tier, and so long as you stay within its limits in aggregate (within each month), you should not incur any charges.

Some Google APIs, e.g., those for G Suite, have usage covered by a monthly subscription, so there's no direct billing for use of the Gmail, Google Drive, Calendar, Docs, Sheets, and Slides APIs, for example. Different Google products are billed differently, so be sure to reference your API's documentation for that information.

Summary

In this codelab, you only need to turn on the Cloud Vision API, so proceed forward with this tutorial once you've successfully followed the instructions above and enabled the API.

In order to make requests to the APIs, your application needs the proper authorization. Authentication, a similar word, describes login credentials: you authenticate yourself when logging into your Google account with a login & password. Once authenticated, the next question is whether you (or rather, your code) are authorized to access data, such as blob files on Cloud Storage or a user's personal files on Google Drive.

Google APIs support several types of authorization, but the one most common for GCP API users is service account authorization, since applications like the one in this codelab run in the cloud as a "robot user." While the Vision API supports API key authorization as well, it's strongly recommended that users employ a more secure form of authorization.

A service account is an account that belongs to your project or application (rather than to a user) and is used by the client library to make Vision API requests. Like a user account, a service account is represented by an email address. You can create service account credentials from either the command line (via gcloud) or in the Cloud Console. Let's take a look at both below.

Using gcloud (in Cloud Shell or your dev environment)

In this section, you will use the gcloud tool to create a service account then create the credentials needed to access the API. First you will set an environment variable with your PROJECT_ID which you will use throughout this codelab:

export PROJECT_ID=$(gcloud config get-value core/project)

Next, you will create a new service account to access the Vision API by using:

gcloud iam service-accounts create my-vision-sa \
  --display-name "my vision service account"

Next, you will create the private key credentials that your Python code will use to log in as your new service account. Create these credentials and save them as the JSON file ~/key.json by using the following command:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account my-vision-sa@${PROJECT_ID}.iam.gserviceaccount.com
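If you'd like to verify the key was created, you can list the keys attached to the service account; the output should include the ID of the key you just generated:

gcloud iam service-accounts keys list \
  --iam-account my-vision-sa@${PROJECT_ID}.iam.gserviceaccount.com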

From the Cloud Console

To create the service account and its credentials from the console instead, go back to the API manager (shortcut link: console.developers.google.com) and select the "Credentials" tab in the left-nav:

From the Credentials page, click on the "+ Create Credentials" button at the top, which then gives you a pulldown dialog where you'd choose "Service account:"

On the "Create service account" screen (similar to the below), you must enter a Service account name (choose something short but explanatory like "svc acct vision" or the one we used with gcloud above, "my vision sa". A Service account ID is also required, and the form will create a valid ID string similar to the name you chose. The Service account description field is optional, but you can specify something like, "Service account for Vision API demo". Click the "Create" button when complete.

The next step is to grant service account access to this project. Having a service account is great, but if it doesn't have permissions to access project resources, it's kind-of useless... it's like creating a new user who doesn't have any access.

Here, click on the "Select a role" pulldown menu. You'll see a variety of options (see below), some more granular than others. For this codelab, choose Project → Viewer. Then click Continue.

On the third screen, we will skip granting specific users access to this service account, but we do need to create a private key that our application script can use to access the Vision API. To that end, click the "+ Create Key" button.

Creating a key is straightforward on the next screen. Keep the default JSON key type. (P12 is only provided for backwards-compatibility, so it is not recommended for new projects.) Click the "Create" button and save the private key file when prompted. The default filename will be long and possibly confusing, e.g., PROJECT_ID-HASH.json, so we recommend renaming it to something more digestible such as key.json or svc_acct.json.

Once the file is saved, you'll get the following confirmation message:

Click the "Close" button to complete this task from the console.

Summary

One last step, whether you created your service account from the command line or in the Cloud Console: make this private key the default credential for your application by assigning the file to the GOOGLE_APPLICATION_CREDENTIALS environment variable:

export GOOGLE_APPLICATION_CREDENTIALS=~/key.json

The environment variable should be set to the full path of the credentials JSON file you saved. Using the full path isn't strictly necessary, but if you use a relative path, the key file will only be found when your code runs from that directory.
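If you'd rather not rely on the environment variable, the client library can also be handed the key explicitly. Here's a minimal sketch, assuming the key is saved as key.json in the current directory (adjust the path to wherever you saved yours):

from google.cloud import vision
from google.oauth2 import service_account

# Load the service account key explicitly instead of relying on
# the GOOGLE_APPLICATION_CREDENTIALS environment variable.
creds = service_account.Credentials.from_service_account_file('key.json')
client = vision.ImageAnnotatorClient(credentials=creds)

Either approach works; the environment variable is simply the most portable, since the code itself doesn't need to know where the key lives.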

You can read more about authenticating to the Google Cloud Vision API, including the other forms of authorization, e.g., API keys and OAuth2 client IDs for user authorization.

We're going to use the Vision API client library for Python, which should already be installed in your Cloud Shell environment. Verify it's installed with pip or pip3:

$ pip3 freeze | grep google-cloud-vision
google-cloud-vision==1.0.0

If you're using a local development environment or using a new virtual environment you just created, install/update the client library (including pip itself if necessary) with this command:

$ pip3 install -U pip google-cloud-vision
...
Successfully installed google-cloud-vision-1.0.0

Confirm the client library can be imported without issue, as shown below, and then you're ready to use the Vision API from real code!

$ python3 -c "import google.cloud.vision"
$

One of the Vision API's basic features is to identify objects or entities in an image, known as label annotation. Label detection identifies general objects, locations, activities, animal species, products, and more. The Vision API takes an input image and returns the top-matching labels for that image, each with a confidence score indicating how well the label matches.

In this example, you will perform label detection on an image of a street scene in Shanghai. To do this, copy the following Python code into your IPython session (or drop it into a local file such as label_detect.py and run it normally):

from __future__ import print_function
from google.cloud import vision

image_uri = 'gs://cloud-samples-data/vision/using_curl/shanghai.jpeg'

client = vision.ImageAnnotatorClient()
image = vision.types.Image()
image.source.image_uri = image_uri

response = client.label_detection(image=image)

print('Labels (and confidence score):')
print('=' * 30)
for label in response.label_annotations:
    print(label.description, '(%.2f%%)' % (label.score*100.))

You should see the following output:

Labels (and confidence score):
==============================
People (95.05%)
Street (89.12%)
Mode of transport (89.09%)
Transport (85.13%)
Vehicle (84.69%)
Snapshot (84.11%)
Urban area (80.29%)
Infrastructure (73.14%)
Road (72.74%)
Pedestrian (68.90%)

Summary

In this step, you were able to perform label detection on an image of a street scene in China and display the most likely labels associated with that image. Read more about Label Detection.
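One thing the snippet above doesn't do is check for per-image failures: the Vision API reports some errors inside the response rather than raising an exception. A small, optional guard you might add before reading the labels looks like this:

# The API reports some per-image failures in the response itself,
# so check the error field before trusting the annotations.
if response.error.message:
    raise Exception(response.error.message)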

Text detection performs Optical Character Recognition (OCR). It detects and extracts text within an image with support for a broad range of languages. It also features automatic language identification.

In this example, you will perform text detection on an image of an Otter Crossing. Copy the following snippet into your IPython session (or save locally as text_detect.py):

from __future__ import print_function
from google.cloud import vision

image_uri = 'gs://cloud-vision-codelab/otter_crossing.jpg'

client = vision.ImageAnnotatorClient()
image = vision.types.Image()
image.source.image_uri = image_uri

response = client.text_detection(image=image)

for text in response.text_annotations:
    print('=' * 30)
    print(text.description)
    vertices = ['(%s,%s)' % (v.x, v.y) for v in text.bounding_poly.vertices]
    print('bounds:', ",".join(vertices))

You should see the following output:

==============================
CAUTION
Otters crossing
for next 6 miles

bounds: (61,243),(251,243),(251,340),(61,340)
==============================
CAUTION
bounds: (75,245),(235,243),(235,269),(75,271)
==============================
Otters
bounds: (65,296),(140,297),(140,315),(65,314)
==============================
crossing
bounds: (151,295),(247,297),(247,318),(151,316)
==============================
for
bounds: (61,322),(94,322),(94,340),(61,340)
==============================
next
bounds: (106,322),(156,322),(156,340),(106,340)
==============================
6
bounds: (167,321),(180,321),(180,339),(167,339)
==============================
miles
bounds: (191,321),(251,321),(251,339),(191,339)

Summary

In this step, you were able to perform text detection on an image of an Otter Crossing and display the recognized text from the image. Read more about Text Detection.
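If you're working with dense blocks of text (documents rather than signs), the same client also exposes document_text_detection, which returns a structured full_text_annotation. A minimal variation of the example above, reusing the same image object:

# Document OCR returns a structured annotation; .text holds the full string.
response = client.document_text_detection(image=image)
print(response.full_text_annotation.text)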

Landmark detection detects popular natural and man-made structures within an image.

In this example, you will perform landmark detection on an image of the Eiffel Tower.

To perform landmark detection, copy the following Python code into your IPython session (or save locally as landmark_detect.py):

from __future__ import print_function
from google.cloud import vision

image_uri = 'gs://cloud-vision-codelab/eiffel_tower.jpg'

client = vision.ImageAnnotatorClient()
image = vision.types.Image()
image.source.image_uri = image_uri

response = client.landmark_detection(image=image)

for landmark in response.landmark_annotations:
    print('=' * 30)
    print(landmark)

You should see the following output:

==============================
mid: "/g/120xtw6z"
description: "Trocad\303\251ro Gardens"
score: 0.930072069168
bounding_poly {
  vertices {
    x: 303
    y: 54
  }
  vertices {
    x: 513
    y: 54
  }
  vertices {
    x: 513
    y: 353
  }
  vertices {
    x: 303
    y: 353
  }
}
locations {
  lat_lng {
    latitude: 48.8615963
    longitude: 2.2892823
  }
}

==============================
mid: "/m/02j81"
description: "Eiffel Tower"
score: 0.665995359421
bounding_poly {
  vertices {
    x: 440
    y: 72
  }
    .
    .
    .

Summary

In this step, you were able to perform landmark detection on an image of the Eiffel Tower. Read more about Landmark Detection.
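If you only want the landmark's name, confidence, and coordinates rather than the full annotation dump, a compact variation of the loop above might look like this:

# Print just the description, score, and first detected location per landmark.
for landmark in response.landmark_annotations:
    latlng = landmark.locations[0].lat_lng
    print('%s (%.2f%%): %f, %f' % (landmark.description,
          landmark.score * 100., latlng.latitude, latlng.longitude))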

Facial features detection detects multiple faces within an image, along with the associated key facial attributes such as emotional state or whether the person is wearing headwear.

In this example, you will check one of the four emotional likelihoods the API reports (joy, anger, sorrow, and surprise), namely how likely it is that each face is surprised.

To perform emotional face detection, copy the following Python code into your IPython session (or save locally as face_detect.py):

from __future__ import print_function
from google.cloud import vision

uri_base = 'gs://cloud-vision-codelab'
pics = ('face_surprise.jpg', 'face_no_surprise.png')

client = vision.ImageAnnotatorClient()
image = vision.types.Image()

for pic in pics:
    image.source.image_uri = '%s/%s' % (uri_base, pic)
    response = client.face_detection(image=image)

    print('=' * 30)
    print('File:', pic)
    for face in response.face_annotations:
        likelihood = vision.enums.Likelihood(face.surprise_likelihood)
        vertices = ['(%s,%s)' % (v.x, v.y) for v in face.bounding_poly.vertices]
        print('Face surprised:', likelihood.name)
        print('Face bounds:', ",".join(vertices))

You should see the following output for our face_surprise and face_no_surprise examples:

==============================
File: face_surprise.jpg
Face surprised: LIKELY
Face bounds: (93,425),(520,425),(520,922),(93,922)
==============================
File: face_no_surprise.png
Face surprised: VERY_UNLIKELY
Face bounds: (120,0),(334,0),(334,198),(120,198)

Summary

In this step, you were able to perform emotional face detection. Read more about Facial Features Detection.
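The same face annotation also carries the other emotion likelihoods mentioned earlier (joy, anger, and sorrow). A small extension of the inner loop, assuming the same response object as above, could print all four:

# Report every emotion likelihood the API returns for each face.
for face in response.face_annotations:
    for emotion in ('joy', 'anger', 'sorrow', 'surprise'):
        likelihood = vision.enums.Likelihood(
            getattr(face, emotion + '_likelihood'))
        print('Face %s: %s' % (emotion, likelihood.name))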

Congratulations... you learned how to use the Vision API with Python to perform several image detection features! Also check out the code samples in this codelab's open source repo—while the code in this tutorial works for both 2.x (2.6+) and 3.x, the code in the repo requires 3.6+.

Clean up

You're allowed to perform a fixed number of (label, text/OCR, landmark, etc.) detection calls per month for free. Since you only incur charges each time you call the Vision API, there's no need to shut anything down, nor must you disable or delete your project. More information on billing for the Vision API can be found on its pricing page.

In addition to the source code for the four examples you completed in this codelab, below are additional reading material as well as recommended exercises to augment your knowledge and use of the Vision API with Python.

Learn More

Additional Study

Now that you have some experience with the Vision API under your belt, below are some recommended exercises to further develop your skills:

  1. You've built separate scripts demoing individual features of the Vision API. Combine at least 2 of them into another script. For example, add OCR/text recognition to the first script that performs label detection (label_detect.py). You may be surprised to find there is text on one of the hats in that image!
  2. Instead of our sample images on Google Cloud Storage, write a script that uses one or more of your own images available online (accessible via http://).
  3. Same as #2, but with local images on your filesystem; a minimal sketch for reading a local file appears just after this list. Note that #2 may be an easier first step before doing this one with local files.
  4. Try non-photographs to see how the API works with those.
  5. Migrate some of the script functionality into a microservice hosted on Google Cloud Functions, or in a web app or mobile backend running on Google App Engine.
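For the local-image exercise above, note that the Vision API accepts raw image bytes in addition to URIs. A minimal sketch, assuming a hypothetical file my_photo.jpg in the current directory:

import io
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Read the local file's bytes and pass them directly to the API
# instead of pointing at a gs:// URI.
with io.open('my_photo.jpg', 'rb') as image_file:
    content = image_file.read()

image = vision.types.Image(content=content)
response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, '(%.2f%%)' % (label.score * 100.))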

If you're ready to tackle that last suggestion but can't think of any ideas, here are a pair to get your gears going:

  1. Analyze multiple images in a Cloud Storage bucket, a Google Drive folder (use the Drive API), or a directory on your local computer. Call the Vision API on each image, writing out data about each into a Google Sheet (use the Sheets API) or Excel spreadsheet. (NOTE: you may have to do some extra auth work as G Suite assets like Drive folders and Sheets spreadsheets generally belong to users, not service accounts.)
  2. Some people tweet images (phone screenshots) of other tweets where the text of the original can't be cut-and-pasted or otherwise analyzed. Use the Twitter API to retrieve the referring tweet, extract the tweeted image and pass it to the Vision API to OCR the text out of it, then call the Cloud Natural Language API to perform sentiment analysis (to determine whether it's positive or negative) and entity extraction (to find entities/proper nouns) on the result. (Doing the same for the text of the referring tweet is optional.)

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.