This lab will show you how to deploy a set of Cloud Functions in order to process images and videos with the Cloud Vision API and Cloud Video Intelligence API.

The Cloud Video Intelligence and Cloud Vision APIs offer you a scalable and serverless way to implement intelligent image and video filtering, accelerating submission processing. By using the safe search detection feature of the Vision API and the explicit content detection feature of the Video Intelligence API, you can eliminate images and videos that are identified as unsafe or undesirable before further processing.
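
As an illustration of that filtering decision, here is a minimal Node.js sketch (not the code you deploy in this lab, which you download later from GitHub) showing how the Vision API's safe search detection can flag an image stored in Cloud Storage. The bucket, file name, and likelihood threshold are placeholder assumptions:

// Minimal sketch: flag an image using Vision API safe search detection.
// Assumes the @google-cloud/vision client library is installed and
// application default credentials are configured.
const vision = require('@google-cloud/vision');

async function isImageFlagged(gcsUri) {
  const client = new vision.ImageAnnotatorClient();
  const [result] = await client.safeSearchDetection(gcsUri);
  const annotation = result.safeSearchAnnotation;

  // Example policy: treat LIKELY or VERY_LIKELY adult or violent content as flagged.
  const risky = ['LIKELY', 'VERY_LIKELY'];
  return risky.includes(annotation.adult) || risky.includes(annotation.violence);
}

// Placeholder URI for illustration only.
isImageFlagged('gs://my-upload-bucket/example.jpg').then(console.log);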

In this codelab, you will:

  1. Create Cloud Storage buckets for uploading, filtering, flagging, and staging your image and video files
  2. Create Cloud Pub/Sub topics to pass notification and processing messages between the Cloud Functions
  3. Create a BigQuery dataset and table to store the filtering results
  4. Deploy Cloud Functions that call the Cloud Vision and Cloud Video Intelligence APIs
  5. Test the flow by uploading files and querying the results in BigQuery

The following diagram outlines the high-level architecture:

Self-paced environment setup

If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform Console (console.cloud.google.com) and create a new project:

Remember the project ID, a unique name across all Google Cloud projects (the name above has already been taken and will not work for you, sorry!). It will be referred to later in this codelab as PROJECT_ID.

Next, you'll need to enable billing in the Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document).

New users of Google Cloud Platform are eligible for a $300 free trial.

Google Cloud Shell

While Google Cloud, Cloud Functions and Machine Learning APIs can be operated remotely from your laptop, in this codelab we will be using Google Cloud Shell, a command line environment running in the Cloud.

This Debian-based virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on the Google Cloud, greatly enhancing network performance and authentication. This means that all you will need for this codelab is a browser (yes, it works on a Chromebook).

To activate Google Cloud Shell, simply click the button at the top right-hand side of the console (it should only take a few moments to provision and connect to the environment):

Then accept the terms of service and click the "Start Cloud Shell" link:

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID:

gcloud auth list

Command output

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)
gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If for some reason the project is not set, simply issue the following command:

gcloud config set project <PROJECT_ID>

Looking for your PROJECT_ID? Check out what ID you used in the setup steps or look it up in the console dashboard:

IMPORTANT: Finally, set the default compute zone:

gcloud config set compute/zone us-central1-f

You can choose a variety of different zones. Learn more in the Regions & Zones documentation.

Prepare for the lab by setting up some environment variables that you'll need later.

Enter the following commands in Cloud Shell to create variables that store key values used later in the lab.

export PROJECT_ID=$(gcloud info --format='value(config.project)')
export IV_BUCKET_NAME=${PROJECT_ID}-upload
export FILTERED_BUCKET_NAME=${PROJECT_ID}-filtered
export FLAGGED_BUCKET_NAME=${PROJECT_ID}-flagged
export STAGING_BUCKET_NAME=${PROJECT_ID}-staging

Cloud Storage buckets provide a storage location for uploading your images and videos. Now you will create four different Cloud Storage buckets.

Create a bucket for storing your uploaded images and video files using the IV_BUCKET_NAME environment variable:

gsutil mb gs://${IV_BUCKET_NAME}

Create a bucket for storing your filtered image and video files using the FILTERED_BUCKET_NAME environment variable:

gsutil mb gs://${FILTERED_BUCKET_NAME}

Create a bucket for storing your flagged image and video files using the FLAGGED_BUCKET_NAME environment variable:

gsutil mb gs://${FLAGGED_BUCKET_NAME}

Create a bucket for your Cloud Functions to use as a staging location using the STAGING_BUCKET_NAME environment variable:

gsutil mb gs://${STAGING_BUCKET_NAME}

Check that the four storage buckets have been created:

gsutil ls

You should see the names of the four storage buckets listed in the output: [PROJECT_ID]-upload, [PROJECT_ID]-filtered, [PROJECT_ID]-flagged, and [PROJECT_ID]-staging.

Cloud Pub/Sub topics are used for Cloud Storage notification messages and for messages between your Cloud Functions. This lab presets some of the topic names to specific defaults, which are used for the topic names in this section.

Create a topic to receive Cloud Storage notifications whenever one of your files is uploaded to Cloud Storage. Set the default value to upload_notification and save it in an environment variable, since it will be used later:

export UPLOAD_NOTIFICATION_TOPIC=upload_notification
gcloud pubsub topics create ${UPLOAD_NOTIFICATION_TOPIC}

Create a topic to receive your messages from the Vision API. The default value in the config.json file is visionapiservice:

gcloud pubsub topics create visionapiservice

Create a topic to receive your messages from the Video Intelligence API. The default value in the config.json file is videointelligenceservice:

gcloud pubsub topics create videointelligenceservice

Create a topic to receive your messages to store in BigQuery. The default value in the config.json file is bqinsert:

gcloud pubsub topics create bqinsert

Check that the four Pub/Sub topics have been created:

gcloud pubsub topics list

You should see the names of the four topics listed in the output: upload_notification, visionapiservice, videointelligenceservice and bqinsert.

Create a notification that is triggered only when a new object is placed in the Cloud Storage file upload bucket:

gsutil notification create -t upload_notification -f json -e OBJECT_FINALIZE gs://${IV_BUCKET_NAME}

Confirm that your notification has been created for the bucket:

gsutil notification list gs://${IV_BUCKET_NAME}

If the command succeeds, you'll see output like this:

Filters: Event Types: OBJECT_FINALIZE

The code for the Cloud Functions used in this lab is written in JavaScript, available on GitHub, and defined in the index.js file. You can examine the source to see how each of the functions is implemented.

Download the code from GitHub using the following command:

git clone https://github.com/GoogleCloudPlatform/cloud-functions-intelligentcontent-nodejs

Change to the application directory:

cd cloud-functions-intelligentcontent-nodejs

The results of the Vision and Video Intelligence APIs are stored in BigQuery. The demo solution used in this codelab has default dataset and table names set to intelligentcontentfilter and filtered_content. You can change these values, but if you do you must also make those changes in the config.json file that you downloaded as part of the solution.

Create your BigQuery dataset. The dataset name is set to intelligentcontentfilter to match the default value in the config.json file:

export DATASET_ID=intelligentcontentfilter
export TABLE_NAME=filtered_content

bq --project_id ${PROJECT_ID} mk ${DATASET_ID}

The command may ask you to select your default project; press Enter to leave the default unconfigured.

Now you'll create your BigQuery table from the schema file that is included with the lab. The table name is set to filtered_content to match the default value in the config.json file, and the schema is defined in the file intelligent_content_bq_schema.json.

Run the following to create the BigQuery table:

bq --project_id ${PROJECT_ID} mk --schema intelligent_content_bq_schema.json -t ${DATASET_ID}.${TABLE_NAME}

Verify that your BigQuery table has been created:

bq --project_id ${PROJECT_ID} show ${DATASET_ID}.${TABLE_NAME}

The resulting output should contain the following:

  Last modified                     Schema
 ----------------- ---------------------------------------
  08 Nov 19:22:43   |- gcsUrl: string (required)
                    |- contentUrl: string (required)
                    |- contentType: string (required)
                    |- insertTimestamp: timestamp (required)
                    +- labels: record (repeated)
                    |  |- name: string
                    +- safeSearch: record (repeated)
                    |  |- flaggedType: string
                    |  |- likelihood: string

Before you can deploy the Cloud Functions defined in the source code, you must modify the config.json file to use your specific Cloud Storage buckets, Cloud Pub/Sub topic names, and BigQuery dataset ID and table name.

Enter these sed commands in Cloud Shell to make the changes for you:

sed -i "s/\[PROJECT-ID\]/$PROJECT_ID/g" config.json
sed -i "s/\[FLAGGED_BUCKET_NAME\]/$FLAGGED_BUCKET_NAME/g" config.json
sed -i "s/\[FILTERED_BUCKET_NAME\]/$FILTERED_BUCKET_NAME/g" config.json
sed -i "s/\[DATASET_ID\]/$DATASET_ID/g" config.json
sed -i "s/\[TABLE_NAME\]/$TABLE_NAME/g" config.json

Now you will deploy the four Cloud Functions defined in the index.js file you downloaded earlier. The deployments can each take a few minutes to complete.

Deploy the GCStoPubsub function

Next you will deploy the GCStoPubsub Cloud Function, which contains the logic to receive a Cloud Storage notification message from Cloud Pub/Sub and forward the message to the appropriate function with another Cloud Pub/Sub message.
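
Conceptually, the routing logic looks something like the following simplified sketch (not the actual index.js code; the topic names are the lab defaults, and a recent version of the @google-cloud/pubsub client library is assumed):

// Simplified sketch of a GCStoPubsub-style router, not the actual lab code.
const {PubSub} = require('@google-cloud/pubsub');
const pubsub = new PubSub();

exports.GCStoPubsub = async (pubSubMessage) => {
  // The Cloud Storage notification arrives as base64-encoded JSON.
  const object = JSON.parse(Buffer.from(pubSubMessage.data, 'base64').toString());

  // Route videos to the Video Intelligence topic and everything else to the Vision topic.
  const topicName = object.contentType.startsWith('video/')
      ? 'videointelligenceservice'
      : 'visionapiservice';

  await pubsub.topic(topicName).publishMessage({json: object});
  console.log(`Forwarded ${object.name} to ${topicName}`);
};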

Run the following:

gcloud functions deploy GCStoPubsub --stage-bucket gs://${STAGING_BUCKET_NAME} --trigger-topic ${UPLOAD_NOTIFICATION_TOPIC} --entry-point GCStoPubsub

The command-line output is similar to the following for each of the four Cloud Functions:

Deploying function (may take a while - up to 2 minutes)...done.
availableMemoryMb: 256
entryPoint: GCStoPubsub
eventTrigger:
  eventType: providers/cloud.pubsub/eventTypes/topic.publish
  failurePolicy: {}
  resource: projects/my-project/topics/my-project-upload
  service: pubsub.googleapis.com
labels:
  deployment-tool: cli-gcloud
name: projects/my-project/locations/us-central1/functions/GCStoPubsub
serviceAccountEmail: my-project@appspot.gserviceaccount.com
sourceArchiveUrl: gs://my-project-staging/us-central1-projects/my-project/locations/us-central1/functions/GCStoPubsub-xeejkketibhf.zip
status: ACTIVE
timeout: 60s
updateTime: '2018-11-08T21:39:42Z'
versionId: '1'

Deploy the visionAPI function

Deploy your visionAPI Cloud Function, which contains the logic to receive a message with Cloud Pub/Sub, call the Vision API, and forward the message to the insertIntoBigQuery Cloud Function with another Cloud Pub/Sub message. If you chose a different Vision API topic name, change it here as well.
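
The forwarding step works roughly along these lines (a simplified sketch, not the actual index.js code; the row fields mirror the BigQuery schema you created earlier, and the bqinsert topic is the lab default):

// Simplified sketch: package Vision safe search results and forward them
// to the insertIntoBigQuery function via the bqinsert Pub/Sub topic.
const {PubSub} = require('@google-cloud/pubsub');
const pubsub = new PubSub();

async function forwardVisionResult(gcsUrl, contentType, safeSearchAnnotation) {
  const flaggedTypes = ['adult', 'spoof', 'medical', 'violence', 'racy'];
  const row = {
    gcsUrl: gcsUrl,
    contentUrl: gcsUrl,  // simplified; a browsable URL could be built here instead
    contentType: contentType,
    insertTimestamp: new Date().toISOString(),
    safeSearch: flaggedTypes.map((flaggedType) => ({
      flaggedType: flaggedType,
      likelihood: safeSearchAnnotation[flaggedType],
    })),
  };
  await pubsub.topic('bqinsert').publishMessage({json: row});
}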

Run the following:

gcloud functions deploy visionAPI --stage-bucket gs://${STAGING_BUCKET_NAME} --trigger-topic visionapiservice --entry-point visionAPI

Deploy the videoIntelligenceAPI function

Deploy your videoIntelligenceAPI Cloud Function, which contains the logic to receive a message with Cloud Pub/Sub, call the Video Intelligence API, and forward the message to the insertIntoBigQuery Cloud Function with another Cloud Pub/Sub message. If you chose a different Video Intelligence API topic name, change it here as well.
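
Video annotation is a long-running operation, which is why the deployment command below uses a 540-second timeout. A simplified sketch of the underlying API call (not the actual index.js code; the GCS URI is a placeholder):

// Simplified sketch: run explicit content detection on a video in Cloud Storage.
const video = require('@google-cloud/video-intelligence');

async function detectExplicitContent(gcsUri) {
  const client = new video.VideoIntelligenceServiceClient();
  const [operation] = await client.annotateVideo({
    inputUri: gcsUri,
    features: ['EXPLICIT_CONTENT_DETECTION'],
  });
  // Wait for the long-running operation to finish; this can take several minutes.
  const [response] = await operation.promise();
  // Each frame carries a pornographyLikelihood value such as LIKELY or VERY_UNLIKELY.
  return response.annotationResults[0].explicitAnnotation.frames;
}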

Run the following:

gcloud functions deploy videoIntelligenceAPI --stage-bucket gs://${STAGING_BUCKET_NAME} --trigger-topic videointelligenceservice --entry-point videoIntelligenceAPI --timeout 540

Deploy the insertIntoBigQuery function

Deploy your insertIntoBigQuery Cloud Function, which contains the logic to receive a message with Cloud Pub/Sub and call the BigQuery API to insert the data into your BigQuery table. If you chose a different BigQuery topic name, change it here as well.
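
The insert logic amounts to something like this simplified sketch (not the actual index.js code; the dataset and table names are the lab defaults):

// Simplified sketch: insert a row received from Pub/Sub into BigQuery.
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

exports.insertIntoBigQuery = async (pubSubMessage) => {
  const row = JSON.parse(Buffer.from(pubSubMessage.data, 'base64').toString());
  await bigquery
      .dataset('intelligentcontentfilter')
      .table('filtered_content')
      .insert([row]);
  console.log(`Inserted row for ${row.gcsUrl}`);
};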

Run the following:

gcloud functions deploy insertIntoBigQuery --stage-bucket gs://${STAGING_BUCKET_NAME} --trigger-topic bqinsert --entry-point insertIntoBigQuery

Confirm that the Cloud Functions have been deployed:

gcloud functions list

You should see the names of the four Cloud Functions listed in the output: GCStoPubsub, visionAPI, videoIntelligenceAPI, and insertIntoBigQuery.

The following diagram outlines the processing flow:

You can test the process by uploading your files to Cloud Storage, checking your logs, and viewing your results in BigQuery.

Upload an image and a video file to the upload storage bucket

  1. Go back to the Google Cloud Platform Console tab in your browser
  2. Click Storage and then click Browser to open the Storage Browser
  3. Click the name of the bucket with the -upload suffix and then click Upload Files
  4. Upload some image files and/or video files from your local machine to this bucket (use your favorite search engine to download Creative Commons content)

Monitor Log Activity

Switch back to Cloud Shell and verify that your Cloud Functions were triggered and ran successfully by viewing the Cloud Functions logs captured in Cloud Logging.

Run the following to test GCStoPubsub:

gcloud functions logs read --filter "finished with status" "GCStoPubsub" --limit 100

Run the following to test insertIntoBigQuery:

gcloud functions logs read --filter "finished with status" "insertIntoBigQuery" --limit 100

View Results in BigQuery

To see your results in BigQuery, you'll write a SQL query and run it with the bq command-line tool.

Run the following command to create the query file. It uses the environment variables you created earlier; if any of them no longer contains the correct value, replace $PROJECT_ID, $DATASET_ID, and $TABLE_NAME with your project ID, dataset ID, and BigQuery table name.

echo "
#standardSQL

SELECT insertTimestamp,
  contentUrl,
  flattenedSafeSearch.flaggedType,
  flattenedSafeSearch.likelihood
FROM \`$PROJECT_ID.$DATASET_ID.$TABLE_NAME\`
CROSS JOIN UNNEST(safeSearch) AS flattenedSafeSearch
ORDER BY insertTimestamp DESC,
  contentUrl,
  flattenedSafeSearch.flaggedType
LIMIT 1000
" > sql.txt

View your BigQuery results with the following command:

bq --project_id ${PROJECT_ID} query < sql.txt

You don't really need to clean up all of your resources, since scaling the Cloud Functions to zero also means scaling cost to zero. If your functions get no traffic, no cost is incurred. Also, the first 2 million Cloud Functions invocations each month are free; check out the pricing page for more details.

If you'd like to delete the functions, simply head over to the Cloud Functions overview page in the console, select the functions, and click DELETE.

Your first 1000 minutes of video processing using the Video Intelligence API as well as your first 1000 Cloud Vision API requests are part of the always free tier.

Cloud Pub/Sub pricing is based on message ingestion and delivery, so again: no messages, no cost. You can, however, delete the Cloud Pub/Sub topics, replacing the variables with your values:

gcloud pubsub topics delete [UPLOAD_NOTIFICATION_TOPIC]
gcloud pubsub topics delete [VISION_TOPIC_NAME]
gcloud pubsub topics delete [VIDEOIQ_TOPIC_NAME]
gcloud pubsub topics delete [BIGQUERY_TOPIC_NAME]

Delete the Cloud Storage buckets, replacing the variables with your values:

gsutil -m rm -r gs://[IV_BUCKET_NAME]
gsutil -m rm -r gs://[FLAGGED_BUCKET_NAME]
gsutil -m rm -r gs://[FILTERED_BUCKET_NAME]
gsutil -m rm -r gs://[STAGING_BUCKET_NAME]

Finally, you can delete the BigQuery table and dataset, replacing the variables with your values:

bq --project_id [PROJECT_ID] rm -r -f [DATASET_ID]

Cloud Functions has a lot more in store for you! Check out other codelabs, the product page, and the documentation.

You should also check out the following:

Machine learning APIs are documented here: