The Google Cloud Vision API allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.

In this codelab you will focus on using the Vision API with Python. You will learn how to perform text detection, landmark detection, and face detection!

What you'll learn

What you'll need


Self-paced environment setup

If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform console (console.cloud.google.com) and create a new project:


Remember the project ID, a unique name across all Google Cloud projects (IDs shown in examples have already been taken and will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID.

Next, you'll need to enable billing in the Developers Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document). Cloud Vision API pricing is documented here.

New users of Google Cloud Platform are eligible for a $300 free trial.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command-line environment running in the cloud. This Debian-based virtual machine is loaded with all the development tools you'll need (gcloud, python, and more). It offers a persistent 5GB home directory and runs on Google Cloud, greatly enhancing network performance and authentication. This means that all you will need for this codelab is a browser (yes, it works on a Chromebook).

Activate Google Cloud Shell

From the GCP Console click the Cloud Shell icon on the top right toolbar:

Then click "Start Cloud Shell":

It should only take a few moments to provision and connect to the environment:

Much, if not all, of your work in this lab can be done with simply a browser or your Google Chromebook.

Once connected to the cloud shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID:

gcloud auth list

Command output

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)
gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

Before you can begin using the Vision API, you must enable it. From Cloud Shell, run the following command:

gcloud services enable vision.googleapis.com

In order to make requests to the Vision API, you need a service account: an account belonging to your project that the Google Cloud client library for Python uses to make Vision API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the gcloud tool to create a service account and then create the credentials you will need to authenticate as that service account.

First you will set an environment variable with your PROJECT_ID which you will use throughout this codelab:

export GOOGLE_CLOUD_PROJECT="<PROJECT_ID>"

Next, you will create a new service account to access the Vision API by using:

gcloud iam service-accounts create my-vision-sa \
  --display-name "my vision service account"

Next, you will create credentials that your Python code will use to log in as your new service account. Create these credentials and save them as a JSON file, ~/key.json, with the following command:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account my-vision-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which the Vision API Python client (covered in the next step) uses to find your credentials. Set it to the full path of the credentials JSON file you created:

export GOOGLE_APPLICATION_CREDENTIALS="/home/${USER}/key.json"
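If authentication fails later, a quick sanity check is to confirm that the key file exists and parses as JSON. The helper below is a hypothetical sketch (check_credentials is not part of any Google library); it simply reads the key file and returns its client_email field, which identifies the service account:

```python
import json
import os


def check_credentials(path):
    """Return the service-account email from a key file, or None if unusable."""
    if not path or not os.path.isfile(path):
        return None
    with open(path) as f:
        key = json.load(f)
    # A service-account key file carries the account's email address.
    return key.get('client_email')


# In Cloud Shell you would pass os.environ.get('GOOGLE_APPLICATION_CREDENTIALS').
```

If this returns None, re-check the export command above and the path of the key file.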

You can read more about authenticating the Google Cloud Vision API.

We're going to use the Google Cloud client library for Python, which should already be installed in your Cloud Shell environment. You can read more about Google Cloud Python services here.

In this codelab, we'll use an interactive Python interpreter called IPython. Start a session by running ipython in Cloud Shell. This command runs the Python interpreter in an interactive Read-Eval-Print Loop (REPL) session.

user@project:~$ ipython
Python 2.7.13 (default, Nov 24 2017, 17:33:09)
Type "copyright", "credits" or "license" for more information.

IPython 5.6.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
Object?   -> Details about 'object', use 'object??' for extra details.

In [1]: 

Text Detection performs Optical Character Recognition. It detects and extracts text within an image with support for a broad range of languages. It also features automatic language identification.

In this example, you will perform text detection on an image of an Otter Crossing.

Copy the following Python code into your IPython session:

from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Annotate an image hosted in a public Cloud Storage bucket.
image = vision.types.Image()
image.source.image_uri = 'gs://cloud-vision-codelab/otter_crossing.jpg'
resp = client.text_detection(image=image)

# The first annotation is the full text; the rest are individual words.
print('\n'.join([d.description for d in resp.text_annotations]))

You should see the following output:

CAUTION
Otters crossing
for next 6 miles

CAUTION
Otters
crossing
for
next
6
miles

Summary

In this step, you were able to perform text detection on an image of an otter crossing sign and print the recognized text. Read more about Text Detection.
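A useful property of the response worth noting: the first element of text_annotations contains the entire detected text block, while the subsequent elements are the individual words. The sketch below illustrates that shape with plain Python stand-ins rather than real API objects (Annotation here is a mock, not the actual proto type):

```python
from collections import namedtuple

# Mimic the description field of a TextAnnotation, for illustration only.
Annotation = namedtuple('Annotation', ['description'])

# The first annotation holds the full text block; the rest are single words.
annotations = [
    Annotation('CAUTION\nOtters crossing\nfor next 6 miles'),
    Annotation('CAUTION'), Annotation('Otters'), Annotation('crossing'),
    Annotation('for'), Annotation('next'), Annotation('6'), Annotation('miles'),
]

full_text = annotations[0].description
words = [a.description for a in annotations[1:]]

# The words of the full block match the per-word annotations.
assert full_text.split() == words
```

This is why the sample output above shows the caution message twice: once whole, then word by word.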

Landmark Detection detects popular natural and man-made structures within an image.

In this example, you will perform landmark detection on an image of the Eiffel Tower.

To perform landmark detection, copy the following Python code into your IPython session.

from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Annotate an image of the Eiffel Tower hosted in a public bucket.
image = vision.types.Image()
image.source.image_uri = 'gs://cloud-vision-codelab/eiffel_tower.jpg'
resp = client.landmark_detection(image=image)
print(resp.landmark_annotations)

You should see the following output:

[mid: "/m/02j81"
description: "Eiffel Tower"
score: 0.420224517584
bounding_poly {  
  vertices {    
    x: 460    
    y: 49  
  }  
  vertices {    
    x: 530    
    y: 49  
  }  
  vertices {    
    x: 530    
    y: 278  
  }  
  vertices {    
    x: 460    
    y: 278  
  }
}
locations {  
  lat_lng {    
    latitude: 48.858461    
    longitude: 2.294351  
  }
}]

Summary

In this step, you were able to perform landmark detection on an image of the Eiffel Tower. Read more about Landmark Detection.
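The bounding_poly vertices in the response are pixel coordinates, so you can derive the landmark's bounding-box size with plain Python. A minimal sketch, using the vertex values from the sample output above:

```python
# Vertices from the sample landmark response, as (x, y) pixel coordinates.
vertices = [(460, 49), (530, 49), (530, 278), (460, 278)]

xs = [x for x, _ in vertices]
ys = [y for _, y in vertices]

# Width and height of the axis-aligned bounding box around the landmark.
width = max(xs) - min(xs)
height = max(ys) - min(ys)
print('bounding box: {}x{} px'.format(width, height))
```

With a real response you would read the same values from annotation.bounding_poly.vertices instead of hard-coding them.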

Face Detection detects multiple faces within an image, along with associated key facial attributes such as emotional state or whether the person is wearing headwear.

In this example, you will detect the likelihood of four emotional states: joy, anger, sorrow, and surprise.

To perform emotional face detection, copy the following Python code into your IPython session:

from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.types.Image()

# Map the Likelihood enum values (0-5) to readable names.
likelihood_name = ('UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE',
                   'LIKELY', 'VERY_LIKELY')

for pic in ('face_surprise.jpg', 'face_no_surprise.png'):
    image.source.image_uri = 'gs://cloud-vision-codelab/' + pic
    resp = client.face_detection(image=image)
    for face in resp.face_annotations:
        print('{}: surprise: {}'.format(
            pic, likelihood_name[face.surprise_likelihood]))

You should see the following output for our face_surprise and face_no_surprise examples:

face_surprise.jpg: surprise: LIKELY
face_no_surprise.png: surprise: VERY_UNLIKELY

Summary

In this step, you were able to perform emotional face detection. Read more about Face Detection.
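The same likelihood table covers the other emotions the API reports: each FaceAnnotation carries joy_likelihood, sorrow_likelihood, anger_likelihood, and surprise_likelihood enum fields. The sketch below shows how you might print all four; note that face here is a stand-in object with hard-coded values, not a real API response:

```python
from types import SimpleNamespace

# Map the Likelihood enum values (0-5) to readable names.
likelihood_name = ('UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY', 'POSSIBLE',
                   'LIKELY', 'VERY_LIKELY')

# Stand-in for a FaceAnnotation: 1 = VERY_UNLIKELY, 4 = LIKELY.
face = SimpleNamespace(joy_likelihood=1, sorrow_likelihood=1,
                       anger_likelihood=1, surprise_likelihood=4)

for emotion in ('joy', 'sorrow', 'anger', 'surprise'):
    value = getattr(face, emotion + '_likelihood')
    print('{}: {}'.format(emotion, likelihood_name[value]))
```

With a real response, you would iterate over resp.face_annotations exactly as in the code above and read the same fields from each face.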

You learned how to use the Vision API using Python to perform several image detection features!

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this codelab:

Learn More

License

This work is licensed under a Creative Commons Attribution 2.0 Generic License.