Cloud Datalab is an interactive data analysis, visualization, and machine learning tool. It lets you author and run Python code in the form of notebooks. A notebook brings together code, the results of running that code (including visualizations), and documentation in a single file. Notebooks also capture a history of executions, so you can iteratively refine your analysis by building on previous results.
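
Under the hood, a notebook file such as the Hello World.ipynb used later in this lab is a JSON document in the Jupyter .ipynb format. The sketch below assumes the nbformat version 4 schema and uses only the standard json module. It builds a minimal notebook by hand to show how documentation, code, and saved output sit side by side in one file. The helper name minimal_notebook is hypothetical, and real notebooks carry more metadata than shown here.

```python
import json


def minimal_notebook(markdown: str, code: str, stdout: str) -> dict:
    """Build a minimal nbformat-4 notebook: one markdown cell, one code cell.

    Hypothetical helper for illustration; real notebooks carry more metadata.
    """
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {},
        "cells": [
            # Documentation is stored as markdown source.
            {"cell_type": "markdown", "metadata": {}, "source": markdown},
            # A code cell stores its source together with its saved outputs,
            # which is how results persist in the same file as the code.
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": 1,
                "source": code,
                "outputs": [
                    {"output_type": "stream", "name": "stdout", "text": stdout},
                ],
            },
        ],
    }


nb = minimal_notebook("# Hello", 'print("hello world")', "hello world\n")
print(json.dumps(nb, indent=1))
```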

Cloud Machine Learning (ML) Engine is a managed service that lets you run TensorFlow-based models in a distributed fashion for training and prediction. It also provides a way to run training locally (e.g. on the VM running Datalab) so you can validate your model against a small sample of data before submitting a long-running training job.

What you'll learn

What you'll need


Self-paced environment setup

If you don't already have a Google Account (Gmail or Google Apps), you must create one.

Sign in to the Google Cloud Platform Console (console.cloud.google.com) and create a new project.

Remember the project ID, a name that must be unique across all Google Cloud projects. It will be referred to later in this codelab as PROJECT_ID.

Next, you'll need to enable billing in the Cloud Platform Console in order to use Google Cloud resources such as Cloud Datastore and Cloud Storage.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document).

New users of Google Cloud Platform are eligible for a $300 free trial.

Launch Cloud Shell

We're going to use Google Cloud Shell, an interactive shell that you can use to manage your Cloud resources and do development work directly from the Google Cloud Platform Console.

Google Cloud Shell provides command-line access to computing resources hosted on Google Cloud Platform and is available in the Google Cloud Platform Console. Cloud Shell makes it easy to manage your Cloud Platform Console projects and resources without installing the Google Cloud SDK and other tools on your system. With Cloud Shell, the Cloud SDK gcloud command and other utilities are always available when you need them. It also comes preinstalled with tools you'd often use, e.g. git, Maven, the Java virtual machine (JVM), Node.js, Python, and npm.

To get started:

  1. Visit the Google Cloud Platform Console
  2. Click the "Activate Google Cloud Shell" icon in the top right-hand corner of the header bar

A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt.

  3. Wait until the codelabuser-xxxx@devshell:~$ prompt appears

Datalab runs on a Compute Engine (GCE) VM, so you need to specify the project and the zone where the VM is created. Datalab is typically set up from a client machine (desktop or laptop) with the Cloud SDK installed. Here you will use Cloud Shell as the client to run the installation commands.

Copy the project ID from the left pane in Qwiklabs (using the copy icon next to it) and paste it in place of PROJECT_ID below.

$ gcloud config set core/project PROJECT_ID

This lab uses the zone us-central1-f. If you don't set a zone, the datalab create command below will list the available zones and prompt you to pick one.

$ gcloud config set compute/zone us-central1-f

Now you can create a Datalab instance on a VM in the project and zone specified above. The code in the following sections runs on that VM in Google Cloud. In the create command below, image-class is used as both the VM name and the Datalab instance name. This lab does not need a source repository to commit files to, and the temporary lab account does not have permission to create one, so the --no-create-repository flag turns that off.

This command may take up to 8 minutes to complete, and it may prompt you to enter a passphrase for an SSH key. You can leave the passphrase blank.

You'll know it has finished when you see the output, "The connection to Datalab is now open and will remain until this command is killed. Click on the *Web Preview* (up-arrow button at top-left), select *port 8081*, and start using Datalab."

$ datalab create --no-create-repository --machine-type n1-highmem-2 image-class 

The datalab create command keeps a connection to your instance open. Use it to open the Cloud Datalab notebook listing page in your browser: in Cloud Shell, click Web preview and select Change port→Port 8081.

You will need the following command only if you lose connection to Datalab for some reason.

$ datalab connect image-class
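
If you prefer to script this setup, the following sketch assembles the commands from this step as argument lists for Python's subprocess module. The helper names setup_commands and run_setup are hypothetical. The gcloud and datalab flags are exactly the ones used above, and the zone, instance name, and machine type default to this lab's values.

```python
import shlex
import subprocess
from typing import List


def setup_commands(project_id: str, zone: str = "us-central1-f",
                   instance: str = "image-class",
                   machine_type: str = "n1-highmem-2") -> List[List[str]]:
    """Return this step's gcloud/datalab commands as argv lists (hypothetical helper).

    It only builds the commands; it does not run them.
    """
    return [
        ["gcloud", "config", "set", "core/project", project_id],
        ["gcloud", "config", "set", "compute/zone", zone],
        # No source repository is needed, and the lab account cannot create one.
        ["datalab", "create", "--no-create-repository",
         "--machine-type", machine_type, instance],
    ]


def run_setup(project_id: str) -> None:
    """Run each command in order, stopping at the first failure (hypothetical helper)."""
    for argv in setup_commands(project_id):
        print("$", shlex.join(argv))
        subprocess.run(argv, check=True)
```

Because datalab create keeps its connection open until the command is killed, run_setup("PROJECT_ID") blocks on the last command just as the interactive version does.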

Summary

In this step, you launched Cloud Shell and ran a few gcloud and datalab commands to set up a Datalab instance.

Is this your first time using Datalab? (If you're an experienced Datalab user or have just tried Datalab in another codelab, you can skip to the next section, titled "Image Classification".)

Here are a few tips to help you get started:

Working with Datalab

When you start Datalab, you are instructed to create a new notebook within your lab environment. You then copy and paste code into the notebook and run it. To write code in your new IPython notebook:

  1. Type or paste the code into your notebook (typically you will copy and paste from the lab materials).
  2. Run the cell. You can either click "Run" with your cursor in the cell, or simply press Shift+Enter.
  3. The output (if any) will appear below your code. You can change the code and rerun the cell until you are happy with the results.
  4. You can write commentary in markdown format.
  5. You can share and collaborate on notebooks (but keep in mind that notebooks started within a lab are not available once you end the lab).

Step 3

On the notebook listing page, navigate to the docs folder. Click on the notebook file named Hello World.ipynb. It will open up in a new tab.

In the notebook, click on the cell with Python code for printing hello world. Run the cell by pressing Shift+Enter or by clicking the Run button in the menu bar at the top. You will see the text printed as a result of execution and a new, empty code cell will be created just below the printed text.

Click the empty cell and type the following code to see how visualization works. The code plots a line chart of the hard-coded numbers 1 to 5 against their squares.

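import matplotlib.pyplot as plt

# Plot x = 1..5 against their hard-coded squares as a line chart.
plt.plot([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
# Set the axis ranges: x from 0 to 6, y from 0 to 30.
plt.axis([0, 6, 0, 30])
plt.show()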

Press Shift+Enter to run the code cell. Observe the output chart.

Now double-click the text "Untitled Notebook" at the top of the notebook editing area. You will see the markdown source behind the formatted text. Replace the text "Untitled Notebook" with "Hello".

Run the markdown cell by pressing Shift+Enter. You will see the formatted text again with the updated title.

Image Classification

Step 1

On the Cloud Datalab notebook listing page, click the Home icon and navigate to datalab/docs/samples/ML Toolbox/Image Classification/Flower. View the list of notebooks, then open Local End to End.ipynb.

You will see a notebook with markdown (documentation) and code cells. Code cells are followed by execution results.

Step 2

In Datalab, click Clear | All Cells. Then read the documentation and code in the notebook and execute each cell in turn, checking the output. Cells that process a sizeable amount of data may take some time to execute; a progress bar is shown while execution is in progress. Preprocessing can take up to six minutes, and you may see a few warnings from Apache Beam about type hints and suboptimal implementations. Ignore them and proceed. Batch prediction may take up to two minutes.

All the cells in this notebook used a local version of the Cloud ML Engine service. This lets you iterate on preprocessing and model development using a sample of the data, and then submit jobs to the service with the full, unsampled data set. Essentially the same code runs against the service instead of the local version: you only pass a different parameter to the training and prediction calls in the toolbox API.
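
The local-then-service workflow comes down to switching between a data sample and the full data set with a single parameter. The sketch below illustrates that pattern in plain Python. It does not use the Datalab ML toolbox API; select_inputs, in_sample, the cloud flag, and the 5% default sample fraction are all hypothetical. Hashing each input path, instead of sampling at random, keeps the same items in the local sample on every run, so results stay comparable while you iterate.

```python
import hashlib
from typing import List, Sequence


def in_sample(key: str, fraction: float) -> bool:
    """Deterministically decide whether `key` falls in a `fraction` sample.

    Hashing the key (rather than calling random) puts the same items in the
    sample on every run.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 10000
    return bucket < fraction * 10000


def select_inputs(paths: Sequence[str], cloud: bool,
                  local_fraction: float = 0.05) -> List[str]:
    """Hypothetical switch: full data for the service, a sample for local runs."""
    if cloud:
        return list(paths)
    return [p for p in paths if in_sample(p, local_fraction)]
```

For example, select_inputs(paths, cloud=False) returns roughly 5% of paths for a quick local run, while select_inputs(paths, cloud=True) returns all of them for a job submitted to the service.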

Congratulations! You built an image classification model by preprocessing flower data and training a neural net using the Datalab toolbox APIs, which in turn use TensorFlow. You also tested the model and evaluated the results using online and batch prediction.

Go back to the notebook listing tab and open the notebooks with "Service" in their names. These notebooks perform the same end-to-end steps in stages using the Cloud ML Engine service, so you can scale to large amounts of data. Read them and, in a different browser tab, compare their contents with the notebook you just ran. Running the code cells in these notebooks is not recommended for a short lab: it is likely to take a long time and will require more computing resources than your project is allowed, especially if you are using a free trial account.