Using the Vision API with Python

1. Overview

The Google Cloud Vision API allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.

In this codelab you will focus on using the Vision API with Python. You will learn how to use several of the API's features, namely label annotations, OCR/text extraction, landmark detection, and detecting facial features!

What you'll learn

  • How to use Cloud Shell
  • How to Enable the Google Cloud Vision API
  • How to Authenticate API requests
  • How to install the Vision API client library for Python
  • How to perform Label detection
  • How to perform Text detection
  • How to perform Landmark detection
  • How to perform Face detection

What you'll need

  • A Google account (G Suite accounts may require administrator approval)
  • A Google Cloud Platform project with an active GCP billing account
  • Basic Python skills would be helpful but not required; this tutorial requires Python 2.6+ or 3.5+. This tutorial is also available in C#/.NET. If you know neither, you can just follow the tutorial in Python or C# but implement your code in any supported language.

2. Setup and Requirements

Self-paced environment setup

  1. Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

  • The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can update it at any time.
  • The Project ID is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference the Project ID (it is typically identified as PROJECT_ID). If you don't like the generated ID, you may generate another random one. Alternatively, you can try your own and see if it's available. It cannot be changed after this step and will remain for the duration of the project.
  • For your information, there is a third value, a Project Number which some APIs use. Learn more about all three of these values in the documentation.
  2. Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab shouldn't cost much, if anything at all. To shut down resources so you don't incur billing beyond this tutorial, you can delete the resources you created or delete the whole project. New users of Google Cloud are eligible for the $300 USD Free Trial program.

Start Cloud Shell

Activate Cloud Shell

  1. From the Cloud Console, click Activate Cloud Shell.

55efc1aaa7a4d3ad.png

If you've never started Cloud Shell before, you're presented with an intermediate screen (below the fold) describing what it is. If that's the case, click Continue (and you won't ever see it again). Here's what that one-time screen looks like:

9c92662c6a846a5c.png

It should only take a few moments to provision and connect to Cloud Shell.

9f0e51b578fecce5.png

This virtual machine is loaded with all the development tools you need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with simply a browser or your Chromebook.

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.

  2. Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
  3. Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].
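If you later want to read the project ID from a script rather than by eye, the output of gcloud config list project shown above is INI-formatted, so Python's standard configparser can read it. The following is only a minimal sketch: the helper name parse_project_id and the sample project ID are made up for illustration, and it assumes you have already captured the command's output as a string.

```python
import configparser


def parse_project_id(gcloud_output):
    """Return the project ID found in `gcloud config list project` output, or None."""
    # Interpolation is disabled so a stray '%' in the text is never treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gcloud_output)
    # fallback=None covers both a missing [core] section and a missing project key.
    return parser.get("core", "project", fallback=None)


# Hypothetical sample text in the same shape as the command output above.
sample_output = "[core]\nproject = my-sample-project\n"
print(parse_project_id(sample_output))
```

A return value of None would suggest that no project is set, in which case the gcloud config set project command above applies.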

3. Confirm Python environment

This codelab requires you to use the Python language (although many languages are supported by the Google APIs client libraries, so feel free to build something equivalent in your favorite development tool and simply use the Python as pseudocode). In particular, this codelab supports Python 2 and 3, but we recommend moving to 3.x as soon as possible.

The Cloud Shell is a convenient tool available for users directly from the Cloud Console and doesn't require a local development environment, so this tutorial can be done completely in the cloud with a web browser. More specifically for this codelab, the Cloud Shell has already pre-installed both versions of Python.

The Cloud Shell also has IPython installed: it is a higher-level interactive Python interpreter which we recommend, especially if you are part of the data science or machine learning community. IPython is also the default interpreter for Jupyter Notebooks as well as Colab, the Jupyter Notebooks service hosted by Google Research.

IPython favors a Python 3 interpreter first but falls back to Python 2 if 3.x isn't available. IPython can be accessed from the Cloud Shell but can also be installed in a local development environment. To leave IPython, press ^D (Ctrl-d) and accept the offer to exit. Example output of starting ipython will look like this:

$ ipython
Python 3.7.3 (default, Mar  4 2020, 23:11:43)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]:

If IPython isn't your preference, use of a standard Python interactive interpreter (either the Cloud Shell or your local development environment) is perfectly acceptable (also exit with ^D):

$ python
Python 2.7.13 (default, Sep 26 2018, 18:42:22)
[GCC 6.3.0 20170516] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 
$ python3
Python 3.7.3 (default, Mar 10 2020, 02:33:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

The codelab also assumes you have the pip installation tool (Python package manager and dependency resolver). It comes bundled with versions 2.7.9+ or 3.4+. If you have an older Python version, see this guide for installation instructions. Depending on your permissions, you may need to have sudo or superuser access, but generally this isn't the case. You can also explicitly use pip2 or pip3 to execute pip for specific Python versions.

The remainder of the codelab assumes you're using Python 3—specific instructions will be provided for Python 2 if they differ significantly from 3.x.

[optional] Create and use virtual environments

This section is optional and only really required for those who must use a virtual environment for this codelab. If you only have Python 3 on your computer, you can simply issue this command to create a virtualenv called my_env (you can choose another name if desired):

virtualenv my_env

However, if you have both Python 2 & 3 on your computer, we recommend you install a Python 3 virtualenv which you can do with the -p flag like this:

virtualenv -p python3 my_env

Enter your newly created virtualenv by "activating" it like this:

source my_env/bin/activate

Confirm you're in the environment by observing your shell prompt is now preceded with your environment name, i.e.,

(my_env) $ 

Now you should be able to pip install any required packages, execute code within this environment, etc. Another benefit is that if you completely mess it up, get into a situation where your Python installation is corrupted, etc., you can blow away this entire environment without affecting the rest of your system.
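Besides looking at the shell prompt, you can also ask the interpreter itself whether it is running inside a virtual environment. This is only a sketch; the function name in_virtualenv is made up here. It relies on the fact that inside a virtual environment sys.prefix differs from sys.base_prefix, and that older virtualenv releases set sys.real_prefix instead.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter appears to be running in a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


print("Inside a virtual environment:", in_virtualenv())
```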

4. Enable Vision API

Whichever Google API you want to use in your application, it must first be enabled. APIs can be enabled from the command line or from the Cloud Console. The process is the same for every API, so once you enable one, you can enable others in a similar way.

Option 1: gcloud command-line interface (Cloud Shell or local environment)

While enabling APIs from the Cloud Console is more common, some developers prefer doing everything from the command line. To do so, you need to look up an API's "service name." It looks like a URL: SERVICE_NAME.googleapis.com. You can find these in the Supported products chart, or you can programmatically query for them with the Google Discovery API.

Armed with this information, using Cloud Shell (or your local development environment with the gcloud command-line tool installed), you can enable an API or service, as follows:

gcloud services enable SERVICE_NAME.googleapis.com

Example 1: Enable the Cloud Vision API

gcloud services enable vision.googleapis.com

Example 2: Enable the Google App Engine serverless compute platform

gcloud services enable appengine.googleapis.com

Example 3: Enable multiple APIs with one request. For example, if this codelab has viewers deploying an app using the Cloud Translation API to App Engine, Cloud Functions, and Cloud Run, the command line would be:

gcloud services enable appengine.googleapis.com cloudfunctions.googleapis.com artifactregistry.googleapis.com run.googleapis.com translate.googleapis.com

This command enables App Engine, Cloud Functions, Cloud Run, and the Cloud Translation API. Furthermore, it enables the Cloud Artifact Registry because that's where container images must be registered by the Cloud Build system in order to deploy to Cloud Run.

There are also a few commands to either query for APIs to enable or which APIs have already been enabled for your project.

Example 4: Query for Google APIs available to enable for your project

gcloud services list --available --filter="name:googleapis.com"

Example 5: Query for Google APIs enabled for your project

gcloud services list

For more information on the above commands, see the enabling and disabling services and listing services documentation.
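If you script your project setup, the SERVICE_NAME.googleapis.com pattern described above is easy to build programmatically. The sketch below only assembles the argument list for gcloud services enable and does not run it. The helper name build_enable_command is hypothetical, and handing the result to something like subprocess.run assumes gcloud is installed and authenticated.

```python
def build_enable_command(short_names):
    """Build the argument list for `gcloud services enable` from short service names."""
    # Each short name such as "vision" becomes "vision.googleapis.com".
    services = [f"{name}.googleapis.com" for name in short_names]
    return ["gcloud", "services", "enable", *services]


# Only prints the command; nothing is enabled by running this snippet.
print(" ".join(build_enable_command(["vision"])))
print(" ".join(build_enable_command(["appengine", "translate"])))
```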

Option 2: Cloud Console

You can also enable the Google APIs in the API Manager. From the Cloud Console, go to API Manager. On this dashboard page, you'll see some traffic information for your app, graphs showing application requests, errors generated by your app, and your app's response times:

6945c8680452932.png

Below these charts is a list of Google APIs enabled for your project:

41520b4a7af4d00d.png

To enable (or disable) APIs, click Enable APIs and Services at the top:

d2404744eb11d367.png

Alternatively, go to the left-navigation bar and select APIs & Services → Library:

6eda5ba145b30b97.png

Either way, you'll arrive at the API Library page:

5d4f1c8e7cf8df28.png

Enter an API name to search for and see matching results:

35bc4b9cf72ce9a4.png

Select the API you're seeking to enable and click the Enable button:

9574a69ef8d9e8d2.png

The process of enabling all APIs is similar, regardless of which Google API you wish to use.

Cost

Many Google APIs can be used without fees; however, there are costs when using most Google Cloud products and APIs. When enabling Cloud APIs, you may be asked for an active billing account. However, some Google Cloud products feature an "Always Free" tier, which you have to exceed in order to incur billing charges.

New Google Cloud users qualify for the Free Trial, currently $300 USD good for the first 90 days. Codelabs generally don't incur much or any billing, so we suggest you hold off on the Free Trial until you're really ready to give it a test drive, especially since it's a one-time offer. The Free Tier quotas don't expire and apply regardless of whether you use the Free Trial or not.

Users should reference the pricing information for any API before enabling it (example: Cloud Vision API pricing page), especially noting whether it has a free tier, and if so, what it is. So long as you stay within specified daily or monthly limits in aggregate, you should not incur any charges. Pricing and free tiers vary between Google product group APIs.

Different Google products are billed differently, so be sure to reference the appropriate documentation for that information.

Summary

In this codelab, you only need to turn on the Cloud Vision API, so proceed forward with this tutorial once you've successfully followed the instructions above and enabled the API.

5. Authorize API requests

Google Cloud recommends service account authorization when calling APIs; however, for ease of prototyping and for local development and testing, developers can use user authorization.

Using user authorization (local development and testing)

If using Cloud Shell, you're good to go. Local development requires installing the Cloud SDK, including the gcloud command-line tool. With the SDK installed, run the following gcloud command:

gcloud auth application-default login

This opens up a browser window for you to provide your user credentials, and when completed, will give you access to calling the Vision API in this tutorial without explicitly downloading or managing any authorization files like with service account authorization (see below). If you were able to successfully accomplish this, you can skip the next section.

Using service account authorization (local development, staging, and production)

API authorization

In order to make requests to the APIs, your application needs to have the proper authorization. Authentication, a related term, describes login credentials: you authenticate yourself when logging into your Google account with a login and password. Once you are authenticated, the next question is whether you (or rather, your code) are authorized to access data, such as blob files on Cloud Storage or a user's personal files on Google Drive.

Google APIs support several types of authorization, but the one most common for GCP API users is service account authorization since applications like the one in this codelab run in the cloud as a "robot user." While the Vision API supports API key authorization as well, it's strongly recommended that users employ a more secure form of authorization.

A service account is an account that belongs to your project or application (rather than a user) that is used by the client library to make Vision API requests. Like a user account, a service account is represented by an email address. You can create service account credentials from either the command line (via gcloud) or in the Cloud Console. Let's take a look at both below.

Option 1: Using gcloud (in Cloud Shell or your dev environment)

In this section, you will use the gcloud tool to create a service account then create the credentials needed to access the API. First you will set an environment variable with your PROJECT_ID which you will use throughout this codelab:

export PROJECT_ID=$(gcloud config get-value core/project)

For example, if this tutorial requires you to create a new service account to access the Cloud Vision API, you would do so with a command like this:

gcloud iam service-accounts create my-vision-sa \
  --display-name "my vision service account"

Next, you will create the private key credentials that your Python code will use to log in as your new service account. Create these credentials and save them as a JSON file, ~/key.json, by using the following command:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account my-vision-sa@${PROJECT_ID}.iam.gserviceaccount.com

Option 2: From the Cloud Console

To create service account credentials from the Cloud Console, go back to the API manager (shortcut link: console.developers.google.com) and select the Credentials tab on the left-nav:

635af008256d323.png

From the Credentials page, click on the "+ Create Credentials" button at the top, which then gives you a pull-down dialog where you'd choose "Service account:"

c27297821f235cc6.png

On the "Create service account" screen (similar to the below), you must enter a Service account name. Choose something short but explanatory like "svc acct vision" or the one we used with gcloud above, "my vision sa". A Service account ID is also required, and the form will create a valid ID string similar to the name you chose. The Service account description field is optional, but you can specify something related to what you're trying to do, for example: "Service account for Vision API demo". Click the "Create" button when complete.

6bf1c33aa51ef608.png

The next step is to grant service account access to this project. Having a service account is great, but if it doesn't have permissions to access project resources, it's kind-of useless... it's like creating a new user who doesn't have any access.

8178a876453f1984.png

Here, click on the "Select a role" pull-down menu. You'll see a variety of options (see below), some more granular than others. For some codelabs, choosing Project → Viewer will suffice, but not all. Check what this tutorial requires, select the right option, then click Continue.

63f43ce6fa6aff0e.png

On this 3rd screen (see below), we will skip granting specific users access to this service account, but we do need to make a private key that our application script can use to access the Vision API. To that end, click the "+ Create Key" button.

6d597d87de8ea419.png

Creating a key is straightforward on the next screen. Take the default of a JSON key structure. Note that P12 is only used for backwards-compatibility, so it is not recommended for new projects. Click the Create button and save the private key file when prompted. The default filename will be long and possibly confusing, i.e., PROJECT_ID-HASH.json, so we recommend renaming it to something more digestible such as key.json or svc_acct.json.

e918f78b96c54258.png

Once the file is saved, you'll get the following confirmation message:

bec9af599571518d.png

Click the Close button to complete this task from the console.

Summary

One last step, whether you created your service account from the command line or in the Cloud Console: make this key file the default credentials for your application by assigning its path to the GOOGLE_APPLICATION_CREDENTIALS environment variable:

export GOOGLE_APPLICATION_CREDENTIALS=~/key.json

The environment variable should be set to the full path of the credentials JSON file you saved. A relative path also works, but only while your code runs from the directory that the path is relative to.
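Before calling the API, it can be worth confirming that the variable really points at a usable key file. The following is a rough sketch, not an official validation routine. The function name check_credentials_file is made up, and the three fields it looks for (type, client_email, private_key) are an assumption about which entries of a service account key file matter most.

```python
import json
import os

# Assumed minimal set of fields to look for in a service account key file.
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_credentials_file(path):
    """Return a list of problems found with a service account key file (empty if none)."""
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return ["no such file: " + path]
    try:
        with open(path) as key_file:
            data = json.load(key_file)
    except ValueError:
        return ["not valid JSON: " + path]
    if not isinstance(data, dict):
        return ["unexpected JSON structure: " + path]
    problems = ["missing field: " + name for name in REQUIRED_FIELDS if name not in data]
    if "type" in data and data["type"] != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


# An empty list means no problems were found with the configured key file.
print(check_credentials_file(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")))
```

This check only applies to service account keys. If you used gcloud auth application-default login instead, the variable is normally left unset, so the "not set" message can be ignored.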

You can read more about authenticating the Google Cloud Vision API, including the other forms of authorization, i.e., API key, user authorization OAuth2 client ID, etc.

6. Install the Cloud Vision client library for Python

We're going to use the Vision API client library for Python which should already be installed in your Cloud Shell environment. Verify it's installed with the following pip or pip3 command listing the current version installed:

$ pip3 freeze | grep google-cloud-vision
google-cloud-vision==3.1.4

If you're developing locally and/or using a virtual environment, install/update the client library (including pip itself if necessary) with this command:

$ pip3 install -U pip google-cloud-vision
. . .
Successfully installed google-cloud-vision-3.1.4

Regardless of environment, confirm the client library can be imported without issue by running the command below. If it returns without error, you're ready to use the Vision API from real code!

$ python3 -c "import google.cloud.vision"
$
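The same check can be done from inside Python, which is handy in a setup script. This sketch uses importlib.metadata from the standard library (Python 3.8 or later, which is an assumption about your interpreter) to look up the installed version of the google-cloud-vision distribution. The helper name installed_version is made up for illustration.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the client library is installed, otherwise None.
print(installed_version("google-cloud-vision"))
```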

7. Perform Label Detection

One of the Vision API's basic features is to identify objects or entities in an image, known as label annotation. Label detection identifies general objects, locations, activities, animal species, products, and more. The Vision API takes an input image and returns the most likely labels which apply to that image, each with a confidence score for the match.

In this example, you will perform label detection on an image of a street scene in Shanghai. To do this, copy the following Python code into your IPython session (or drop it into a local file such as label_detect.py and run it normally):

# Py2+3 from __future__ import print_function
from google.cloud import vision

image_uri = 'gs://cloud-samples-data/vision/using_curl/shanghai.jpeg'

client = vision.ImageAnnotatorClient()
image = vision.Image() # Py2+3 if hasattr(vision, 'Image') else vision.types.Image()
image.source.image_uri = image_uri

response = client.label_detection(image=image)

print('Labels (and confidence score):')
print('=' * 30)
for label in response.label_annotations:
    print(label.description, '(%.2f%%)' % (label.score*100.))

You should see the following output:

Labels (and confidence score):
==============================
Wheel (97.83%)
Tire (97.37%)
Photograph (94.24%)
Bicycle (93.96%)
Infrastructure (89.71%)
Motor vehicle (89.58%)
Vehicle (86.27%)
Mode of transport (84.77%)
Bicycle wheel (83.11%)
Public space (80.82%)

Summary

In this step, you were able to perform label detection on an image of a street scene in China and display the most likely labels associated with that image. Read more about Label Detection.
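Building on this step, you may want to keep only the labels the API is most confident about, and to notice when a request failed. The sketch below assumes a response object shaped like the one returned by client.label_detection() (an error.message string plus label_annotations with description and score). The function name labels_above and the 0.9 threshold are arbitrary choices for illustration, and the stand-in response is built by hand so the snippet runs without calling the API.

```python
from types import SimpleNamespace


def labels_above(response, min_score=0.9):
    """Return (description, score) pairs for labels scoring at least min_score."""
    # A non-empty error message is treated here as a failed request.
    if response.error.message:
        raise RuntimeError(response.error.message)
    return [
        (label.description, label.score)
        for label in response.label_annotations
        if label.score >= min_score
    ]


# Hand-built stand-in for a real response, using two of the labels shown above.
fake_response = SimpleNamespace(
    error=SimpleNamespace(message=""),
    label_annotations=[
        SimpleNamespace(description="Wheel", score=0.9783),
        SimpleNamespace(description="Public space", score=0.8082),
    ],
)
print(labels_above(fake_response))
```

With a real response you would pass the object returned by client.label_detection(image=image) in place of fake_response.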

8. Perform Text Detection

Text detection performs Optical Character Recognition (OCR). It detects and extracts text within an image with support for a broad range of languages. It also features automatic language identification.

In this example, you will perform text detection on an image of a system software update screen. Copy the following snippet into your IPython session (or save locally as text_detect.py):

# Py2+3 from __future__ import print_function
from google.cloud import vision

image_uri = 'gs://cloud-samples-data/vision/text/screen.jpg'

client = vision.ImageAnnotatorClient()
image = vision.Image() # Py2+3 if hasattr(vision, 'Image') else vision.types.Image()
image.source.image_uri = image_uri

response = client.text_detection(image=image)

for text in response.text_annotations:
    print('=' * 30)
    print(text.description)
    vertices = ['(%s,%s)' % (v.x, v.y) for v in text.bounding_poly.vertices]
    print('bounds:', ",".join(vertices))

You should see the following output:

==============================
System Software Update
Back
Preparing to install...
After preparation is complete, the PS4 will automatically restart and the update file will be
installed.
37%
gus class
bounds: (-4,186),(1354,186),(1354,980),(-4,980)
==============================
System
bounds: (-2,186),(158,191),(156,241),(-4,236)
==============================
Software
bounds: (175,192),(380,198),(378,247),(173,241)
==============================
Update
bounds: (394,199),(573,205),(571,254),(392,248)
==============================
Back
bounds: (152,856),(195,860),(193,873),(151,869)
==============================
Preparing
bounds: (199,501),(311,507),(310,530),(198,525)
...

Summary

In this step, you were able to perform text detection on an image of a system software update screen and display the recognized text from the image. Read more about Text Detection.
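As the output in this step suggests, the first text annotation holds the entire block of detected text, and the annotations after it hold the individual words. A small helper can separate the two. This is a sketch only: split_text_annotations is a made-up name, and the stand-in annotations are built by hand so the snippet runs without calling the API.

```python
from types import SimpleNamespace


def split_text_annotations(annotations):
    """Split text annotations into (full_text, list_of_words)."""
    annotations = list(annotations)
    if not annotations:
        return "", []
    # The first annotation carries the whole text; the rest are single words.
    full_text = annotations[0].description
    words = [annotation.description for annotation in annotations[1:]]
    return full_text, words


# Hand-built stand-ins shaped like response.text_annotations.
fake_annotations = [
    SimpleNamespace(description="System Software Update\nBack"),
    SimpleNamespace(description="System"),
    SimpleNamespace(description="Software"),
    SimpleNamespace(description="Update"),
    SimpleNamespace(description="Back"),
]
full_text, words = split_text_annotations(fake_annotations)
print(full_text)
print(words)
```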

9. Perform Landmark Detection

Landmark detection detects popular natural and man-made structures within an image.

In this example, you will perform landmark detection on an image of the Eiffel Tower.

To perform landmark detection, copy the following Python code into your IPython session (or save locally as landmark_detect.py).

# Py2+3 from __future__ import print_function
from google.cloud import vision

image_uri = 'gs://cloud-samples-data/vision/eiffel_tower.jpg'

client = vision.ImageAnnotatorClient()
image = vision.Image() # Py2+3 if hasattr(vision, 'Image') else vision.types.Image()
image.source.image_uri = image_uri

response = client.landmark_detection(image=image)

for landmark in response.landmark_annotations:
    print('=' * 30)
    print(landmark)

You should see the following output:

==============================
mid: "/g/120xtw6z"
description: "Trocad\303\251ro Gardens"
score: 0.925706148147583
bounding_poly {
  vertices {
    x: 339
    y: 54
  }
  vertices {
    x: 531
    y: 54
  }
  vertices {
    x: 531
    y: 371
  }
  vertices {
    x: 339
    y: 371
  }
}
locations {
  lat_lng {
    latitude: 48.861596299999995
    longitude: 2.2892823
  }
}

==============================
mid: "/m/02j81"
description: "Eiffel Tower"
score: 0.6325246095657349
bounding_poly {
  vertices {
    x: 435
    y: 180
  }
...

Summary

In this step, you were able to perform landmark detection on an image of the Eiffel Tower. Read more about Landmark Detection.
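Printing the whole annotation is useful for exploring, but most applications only need a few of its fields. The sketch below pulls out the description, score, and coordinates, using the field names visible in the output above (description, score, locations, lat_lng). The function name landmark_rows is hypothetical, and the stand-in annotation is built by hand from the values shown in this step.

```python
from types import SimpleNamespace


def landmark_rows(landmark_annotations):
    """Return (description, score, latitude, longitude) tuples, one per location."""
    rows = []
    for landmark in landmark_annotations:
        for location in landmark.locations:
            rows.append((
                landmark.description,
                landmark.score,
                location.lat_lng.latitude,
                location.lat_lng.longitude,
            ))
    return rows


# Hand-built stand-in shaped like response.landmark_annotations.
fake_landmarks = [
    SimpleNamespace(
        description="Trocadéro Gardens",
        score=0.925706148147583,
        locations=[SimpleNamespace(
            lat_lng=SimpleNamespace(latitude=48.861596299999995, longitude=2.2892823),
        )],
    ),
]
for description, score, latitude, longitude in landmark_rows(fake_landmarks):
    print('%s (%.2f%%) at %.4f, %.4f' % (description, score * 100., latitude, longitude))
```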

10. Perform Emotional Face Detection

Facial features detection detects multiple faces within an image along with the associated key facial attributes such as emotional state or wearing headwear.

The API reports the likelihood of four emotional states: joy, anger, sorrow, and surprise. In this example, you will display the likelihood of surprise.

To perform emotional face detection, copy the following Python code into your IPython session (or save locally as face_detect.py):

# Py2+3 from __future__ import print_function
from google.cloud import vision

image_uri = 'gs://cloud-samples-data/vision/face/face_no_surprise.jpg'
likelihood = ('UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE',
              'LIKELY', 'VERY_LIKELY')

client = vision.ImageAnnotatorClient()
image = vision.Image() # Py2+3 if hasattr(vision, 'Image') else vision.types.Image()
image.source.image_uri = image_uri

response = client.face_detection(image=image)

print('=' * 30)
for face in response.face_annotations:
    vertices = ['(%s,%s)' % (v.x, v.y) for v in face.bounding_poly.vertices]
    print('Face surprised:', likelihood[face.surprise_likelihood])
    print('Face bounds:', ','.join(vertices))

You should see output similar to the following:

==============================
Face surprised: LIKELY
Face bounds: (93,425),(520,425),(520,922),(93,922)

Summary

In this step, you were able to perform emotional face detection. Read more about Facial Features Detection.
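The example in this step prints only the surprise likelihood, but each face annotation also carries joy_likelihood, anger_likelihood, and sorrow_likelihood. The sketch below reports all four, reusing the same likelihood names as the code above. The function name emotion_report is made up, and the stand-in face uses plain integers where the real library returns enum values, which is why the value is converted with int() before indexing.

```python
from types import SimpleNamespace

LIKELIHOOD = ('UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE',
              'LIKELY', 'VERY_LIKELY')
EMOTIONS = ('joy', 'anger', 'sorrow', 'surprise')


def emotion_report(face):
    """Map each emotion name to its likelihood label for one face annotation."""
    return {
        emotion: LIKELIHOOD[int(getattr(face, emotion + '_likelihood'))]
        for emotion in EMOTIONS
    }


# Hand-built stand-in shaped like one entry of response.face_annotations.
fake_face = SimpleNamespace(
    joy_likelihood=1,
    anger_likelihood=1,
    sorrow_likelihood=1,
    surprise_likelihood=4,
)
for emotion, label in emotion_report(fake_face).items():
    print('Face %s: %s' % (emotion, label))
```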

11. Conclusion

Congratulations... you learned how to use the Vision API with Python to perform several image detection features! Also check out the code samples in this codelab's open source repo—while the code in this tutorial works for both 2.x (2.6+) and 3.x, the code in the repo requires 3.6+.

Clean up

You're allowed to perform a fixed amount of (label, text/OCR, landmark, etc.) detection calls per month for free. Since you only incur charges each time you call the Vision API, there's no need to shut anything down nor must you disable/delete your project. More information on billing for the Vision API can be found on its pricing page.

12. Additional Resources

In addition to the source code for the four examples you completed in this codelab, below is additional reading material as well as recommended exercises to augment your knowledge and use of the Vision API with Python.

Learn More

Additional Study

Now that you have some experience with the Vision API under your belt, below are some recommended exercises to further develop your skills:

  1. You've built separate scripts demoing individual features of the Vision API. Combine at least 2 of them into another script. For example, add OCR/text recognition to the first script that performs label detection (label_detect.py). You may be surprised to find there is text on one of the hats in that image!
  2. Instead of our random images available on Google Cloud Storage, write a script that uses one or more of your own images uploaded to Cloud Storage. Another similar exercise is to find images online (accessible via http://).
  3. Same as #2, but with local images on your filesystem. Note that #2 may be an easier first step before doing this one with local files.
  4. Try non-photographs to see how the API works with those.
  5. Migrate some of the script functionality into a microservice hosted on Google Cloud Functions, or in a web app or mobile backend running on Google App Engine.
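As a possible starting point for the exercises on remote and local images, the sketch below decides how to load an image based on where it lives. Remote locations (gs://, http://, https://) are set as image_uri, just like in the examples in this codelab, while local files are read as bytes and passed as content. The names is_remote and build_image are made up here, and the Vision import is kept inside build_image so the helper for classifying locations works even where the client library is not installed.

```python
REMOTE_PREFIXES = ('gs://', 'http://', 'https://')


def is_remote(location):
    """Return True if the image location is a Cloud Storage or web URI."""
    return location.startswith(REMOTE_PREFIXES)


def build_image(location):
    """Create a vision.Image from a remote URI or a local file path."""
    # Imported here so is_remote() can be used without the client library.
    from google.cloud import vision

    if is_remote(location):
        image = vision.Image()
        image.source.image_uri = location
        return image
    # Local file: send the raw bytes along with the request.
    with open(location, 'rb') as image_file:
        return vision.Image(content=image_file.read())


# Classify a few sample locations; the local path is a hypothetical example.
for sample in ('gs://cloud-samples-data/vision/eiffel_tower.jpg', 'photos/cat.jpg'):
    print(sample, '->', 'remote' if is_remote(sample) else 'local')
```

The object returned by build_image() can then be passed to any of the detection calls used earlier, for example client.label_detection(image=build_image(path)).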

If you're ready to tackle that last suggestion but can't think of any ideas, here are a pair to get your gears going:

  1. Analyze multiple images in a Cloud Storage bucket, a Google Drive folder (use the Drive API), or a directory on your local computer. Call the Vision API on each image, writing out data about each into a Google Sheet (use the Sheets API) or Excel spreadsheet. (NOTE: you may have to do some extra auth work as G Suite assets like Drive folders and Sheets spreadsheets generally belong to users, not service accounts.)
  2. Some people Tweet images (phone screenshots) of other tweets where the text of the original can't be cut-n-pasted or otherwise analyzed. Use the Twitter API to retrieve the referring tweet, extract and pass the tweeted image to the Vision API to OCR the text out of those images, then call the Cloud Natural Language API to perform sentiment analysis (to determine whether it's positive or negative) and entity extraction (search for entities/proper nouns) on them. (This is optional for the text in the referring tweet.)

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.