The Google Cloud Vision API allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.

In this codelab you will focus on using the Vision API with Ruby. You will learn how to perform text detection, landmark detection, and face detection!

What you'll learn

What you'll need


Self-paced environment setup

If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform console and create a new project.

Remember the project ID, a name that is unique across all Google Cloud projects, so a name that is already taken will not work for you. It is referred to later in this codelab as PROJECT_ID.

Next, you'll need to enable billing in the Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document).

New users of Google Cloud Platform are eligible for a $300 free trial.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

Activate Google Cloud Shell

From the GCP Console, click the Cloud Shell icon on the top right toolbar.

Then click "Start Cloud Shell".

It should only take a few moments to provision and connect to the environment.

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs on Google Cloud, which greatly enhances network performance and authentication. Much, if not all, of your work in this lab can be done with just a browser or your Google Chromebook.

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID.

Run the following command in Cloud Shell to confirm that you are authenticated:

gcloud auth list

Command output

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)

gcloud config list project

Command output

project = <PROJECT_ID>

If the project is not set to your PROJECT_ID, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

Before you can use the Vision API, you must enable it for your project. In Cloud Shell, run the following command:

gcloud services enable vision.googleapis.com

In order to make requests to the Vision API, you need to use a Service Account. A Service Account is an account, belonging to your project, that is used by the Google Client Ruby library to make Vision API requests. Like any other user account, a service account is represented by an email address. In this section, you will use the gcloud tool to create a service account and then create credentials you will need to authenticate as the service account.
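The email address that represents a service account can be sketched as a small helper. This is a minimal illustration, assuming the usual NAME@PROJECT_ID.iam.gserviceaccount.com pattern for user-created service accounts; the helper name and the project ID are placeholders, not part of the codelab.

```ruby
# Build the email address that identifies a service account.
# Assumption: user-created service accounts follow the
# NAME@PROJECT_ID.iam.gserviceaccount.com pattern.
def service_account_email(name, project_id)
  "#{name}@#{project_id}.iam.gserviceaccount.com"
end

# "my-project-id" is a placeholder, not a real project.
puts service_account_email("my-vision-sa", "my-project-id")
```

This is the value that commands such as `gcloud iam service-accounts keys create` expect for `--iam-account`.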

First, set an environment variable with your PROJECT_ID, which you will use throughout this codelab:

export GOOGLE_CLOUD_PROJECT="<PROJECT_ID>"

Next, you will create a new service account to access the Vision API by using:

gcloud iam service-accounts create my-vision-sa \
  --display-name "my vision service account"

Next, you will create credentials that your Ruby code will use to log in as your new service account. Create these credentials and save them as a JSON file, "~/key.json", by using the following command:

gcloud iam service-accounts keys create ~/key.json \
  --iam-account my-vision-sa@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com

Finally, set the GOOGLE_APPLICATION_CREDENTIALS environment variable, which the Vision API Ruby gem (covered in the next step) uses to find your credentials. Set it to the full path of the credentials JSON file you created:

export GOOGLE_APPLICATION_CREDENTIALS="${HOME}/key.json"

You can read more about authenticating the Google Cloud Vision API.
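Before moving on, it can help to confirm that both environment variables are in place. The following is a minimal sketch, not part of the codelab: `credential_problems` is a hypothetical helper, and it only assumes that a service account key file is JSON with a "type" field set to "service_account".

```ruby
require "json"

# Hypothetical helper: returns a list of problems found in the environment,
# or an empty list when everything this codelab expects is in place.
def credential_problems(env = ENV)
  problems = []

  project = env["GOOGLE_CLOUD_PROJECT"].to_s
  problems << "GOOGLE_CLOUD_PROJECT is not set" if project.empty?

  path = env["GOOGLE_APPLICATION_CREDENTIALS"].to_s
  if path.empty?
    problems << "GOOGLE_APPLICATION_CREDENTIALS is not set"
  elsif !File.file?(path)
    problems << "key file not found: #{path}"
  else
    begin
      key = JSON.parse(File.read(path))
      # Assumption: service account keys carry "type": "service_account".
      unless key.is_a?(Hash) && key["type"] == "service_account"
        problems << "key file is not a service account key"
      end
    rescue JSON::ParserError
      problems << "key file is not valid JSON"
    end
  end

  problems
end

problems = credential_problems
if problems.empty?
  puts "Environment looks ready."
else
  puts problems
end
```

Passing a plain Hash instead of `ENV` makes the check easy to try without touching the real environment.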

You can use the command line to install the Google Cloud Vision API Ruby gem.

gem install google-cloud-vision -v 0.27.0

You can read more about the set of Google Cloud service Ruby gems available for different APIs.
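The install command pins the gem to one exact release, and the code in this codelab is written against that release. A minimal sketch of the same constraint using RubyGems' own version classes follows; `pinned_version?` is a hypothetical helper introduced only for illustration.

```ruby
require "rubygems"

# The exact-version requirement that `-v 0.27.0` expresses.
PINNED_VISION_VERSION = Gem::Requirement.new("= 0.27.0")

# Hypothetical helper: true when a version string satisfies the pin.
def pinned_version?(version_string)
  PINNED_VISION_VERSION.satisfied_by?(Gem::Version.new(version_string))
end

puts pinned_version?("0.27.0") # true
puts pinned_version?("0.28.0") # false
```

In a real session, the loaded version could be looked up through `Gem.loaded_specs` after requiring the gem and passed to the helper as a string.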

Next, clone the Ruby sample repository, which contains example images you can use to follow along:

git clone https://github.com/GoogleCloudPlatform/ruby-docs-samples.git
cd ruby-docs-samples
git checkout "a902f30dd449ce82469cc315610c8a3d4888ff5a"

Change directory into `ruby-docs-samples/vision`:

cd vision

Now that you have installed the required gem, start the Interactive Ruby tool by using irb.

irb --noecho

IRB runs the Ruby interpreter in a read-eval-print loop (REPL) session: it reads each line you enter, evaluates it, and continues until you exit.
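To make the read, eval, print cycle concrete, here is a toy sketch of such a loop. It is not how IRB is implemented; `toy_repl` is a hypothetical helper that takes its input as an array of strings instead of reading from the terminal, and returns what would be printed.

```ruby
# A toy read-eval-print loop: each input line is evaluated in one shared
# binding, so variables assigned on one line are visible on later lines.
def toy_repl(lines)
  session = binding
  lines.map do |line|
    result = session.eval(line) # the "eval" step
    result.inspect              # the "print" step, returned instead of printed
  end
end

puts toy_repl(["x = 5", "x * 2", "\"otter\".upcase"])
```

The `--noecho` flag used above asks IRB to skip the automatic print step, so only explicit `puts` output appears in the session.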

Text Detection performs Optical Character Recognition. It detects and extracts text within an image with support for a broad range of languages. It also features automatic language identification.

In this example, you will perform text detection on an image of an otter crossing sign.

Copy the following Ruby code into your IRB session:

require "google/cloud/vision"

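# Create a Vision client; credentials are read from GOOGLE_APPLICATION_CREDENTIALS.
vision = Google::Cloud::Vision.new
image  = vision.image "images/otter_crossing.jpg"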

puts image.text

You should see the following output:

Otters crossing
for next 6 miles


In this step, you performed text detection on an image of an otter crossing sign and printed the recognized text. Read more about Text Detection.
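Recognized text is usually a starting point for further processing. The sketch below works on a stand-in string that matches the output shown above, so it makes no API call; `ocr_lines` and `ocr_numbers` are hypothetical helpers, not part of the Vision gem.

```ruby
# Stand-in for the recognized text printed above; no API call is made.
detected_text = "Otters crossing\nfor next 6 miles\n"

# Hypothetical helper: split OCR output into clean, non-empty lines.
def ocr_lines(text)
  text.to_s.each_line.map(&:strip).reject(&:empty?)
end

# Hypothetical helper: pull whole numbers out of the recognized text.
def ocr_numbers(text)
  text.to_s.scan(/\d+/).map(&:to_i)
end

p ocr_lines(detected_text)   # ["Otters crossing", "for next 6 miles"]
p ocr_numbers(detected_text) # [6]
```

Calling `to_s` first means the helpers accept either a plain string or any object that renders the recognized text as a string.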

Landmark Detection detects popular natural and man-made structures within an image.

In this example, you will perform landmark detection on an image of the Eiffel Tower.

To perform landmark detection, copy the following Ruby code into your IRB session.

require "google/cloud/vision"

vision = Google::Cloud::Vision.new
image  = vision.image "images/eiffel_tower.jpg"

# Print each detected landmark, followed by its coordinates.
image.landmarks.each do |landmark|
  puts landmark.description

  landmark.locations.each do |location|
    puts "#{location.latitude}, #{location.longitude}"
  end
end

You should see the following output:

Eiffel Tower
48.858461, 2.294351


In this step, you performed landmark detection on an image of the Eiffel Tower. Read more about Landmark Detection.
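The nested loop above implies a simple shape for the result: each landmark has a description and a list of locations, and each location has a latitude and a longitude. The sketch below reproduces that shape with stand-in Struct types so it runs without the API. `Location`, `Landmark`, and `landmark_lines` are hypothetical names and are not classes from the Vision gem.

```ruby
# Stand-in types that mimic only the fields the landmark loop reads.
Location = Struct.new(:latitude, :longitude)
Landmark = Struct.new(:description, :locations)

# Hypothetical helper: flatten landmarks into the lines the loop would print.
def landmark_lines(landmarks)
  landmarks.flat_map do |landmark|
    [landmark.description] +
      landmark.locations.map { |loc| "#{loc.latitude}, #{loc.longitude}" }
  end
end

# Sample values copied from the output shown above.
sample = [Landmark.new("Eiffel Tower", [Location.new(48.858461, 2.294351)])]
puts landmark_lines(sample)
```

Returning the lines instead of printing them inside the loop keeps the formatting logic easy to check on its own.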

Face Detection detects multiple faces within an image along with the associated key facial attributes such as emotional state or wearing headwear.

In this example, you will detect the likelihood of four emotional states: joy, anger, sorrow, and surprise.

To perform emotional face detection, copy the following Ruby code into your IRB session:

require "google/cloud/vision"

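vision = Google::Cloud::Vision.new
image  = vision.image "images/face_no_surprise.jpg"

# Print one true/false flag per emotion for every detected face.
image.faces.each do |face|
  puts "Joy:      #{face.likelihood.joy?}"
  puts "Anger:    #{face.likelihood.anger?}"
  puts "Sorrow:   #{face.likelihood.sorrow?}"
  puts "Surprise: #{face.likelihood.surprise?}"
end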

You should see the following output for the example image:

Joy: true
Anger: true
Sorrow: false
Surprise: true


In this step, you performed emotional face detection. Read more about Face Detection.
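The true/false values above summarize a graded rating: the Vision API reports each emotion as a likelihood ranging from VERY_UNLIKELY to VERY_LIKELY. The sketch below shows one way such a rating could be reduced to a boolean. The POSSIBLE threshold is an assumption made for illustration, and `likely?` is a hypothetical helper, not the gem's implementation.

```ruby
# Likelihood values defined by the Vision API, ordered from least to most likely.
LIKELIHOODS = %i[UNKNOWN VERY_UNLIKELY UNLIKELY POSSIBLE LIKELY VERY_LIKELY].freeze

# Assumption: a rating counts as "true" at POSSIBLE or above.
# The threshold is a parameter so a stricter cutoff can be tried.
def likely?(likelihood, threshold: :POSSIBLE)
  rank   = LIKELIHOODS.index(likelihood) or raise ArgumentError, "unknown likelihood: #{likelihood}"
  cutoff = LIKELIHOODS.index(threshold)  or raise ArgumentError, "unknown threshold: #{threshold}"
  rank >= cutoff
end

puts likely?(:VERY_LIKELY)                 # true
puts likely?(:UNLIKELY)                    # false
puts likely?(:POSSIBLE, threshold: :LIKELY) # false
```

A stricter threshold such as `:LIKELY` trades fewer false positives for more missed detections, which may suit applications that act on the result automatically.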

You learned how to use the Vision API with Ruby to perform text, landmark, and face detection on images!

Clean up

To avoid incurring charges to your Google Cloud Platform account for the resources used in this quickstart:

Learn More


This work is licensed under a Creative Commons Attribution 2.0 Generic License.