In this lab you launch Cloud Datalab on a Google Compute Engine VM.

What you need

To complete this lab, you need:

What you learn

In this lab, you:

Choose one of the following three options to launch Cloud Datalab:

  1. Run Datalab from gcloud SDK in a Google Compute Engine virtual machine [recommended]. Choose this option if you have the ability to install software on your laptop.
  2. Run Datalab from Cloud Shell in a Google Compute Engine virtual machine. Choose this option if you do not have the ability to install software on your laptop. Note that Cloud Shell is an ephemeral VM, and so you will have to periodically reconnect to Datalab. This is not recommended for long-term use of Datalab, but is acceptable for classroom situations.
  3. Run Datalab in a Docker container on your local machine. Choose this option if your local machine is powerful and capable of running Docker, and your network bandwidth supports downloading a Docker image. You must also have the ability to install software on your laptop (see the current Docker requirements).

In this lab, you will launch Cloud Datalab by running the Docker container in a Compute Engine VM and connecting to it through an SSH tunnel:

Launch Datalab VM

To launch Datalab VM:

Step 1

If necessary, install the gcloud SDK from https://cloud.google.com/sdk/

Step 2

In a terminal window, install the datalab component and set your default zone (replace <ZONE> with an appropriate zone, for example: us-central1-a):

gcloud components install datalab
gcloud config set compute/zone <ZONE>
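
To confirm that the zone setting took effect, you can read it back. The following is a minimal sketch, assuming the gcloud SDK is on your PATH; the function name show_zone is hypothetical and is not part of the lab's scripts.

```shell
# show_zone is a hypothetical helper name, not part of the lab's scripts.
# It reads back the default Compute Engine zone configured in Step 2.
show_zone() {
  local zone
  zone="$(gcloud config get-value compute/zone 2>/dev/null)"
  if [ -z "$zone" ] || [ "$zone" = "(unset)" ]; then
    echo "No default zone set; run: gcloud config set compute/zone <ZONE>"
    return 1
  fi
  echo "Default zone: $zone"
}
```

After defining the function, run show_zone in the same terminal.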

Step 3

Create a Compute Engine instance that will run Datalab (replace <USER> with your username). For your convenience, this command also exists as the script create_vm.sh at http://github.com/GoogleCloudPlatform/training-data-analyst/datalab/gcloud. If you have cloned the repo, you can launch the VM using the script instead.

datalab create datalabvm-<USER>

(OR)

cd training-data-analyst/datalab/gcloud
./create_vm.sh

This will take several minutes. Wait for the message "You can now connect to Datalab at http://localhost:8081/"

Step 4

Your web browser should open a new tab at http://localhost:8081/ ; if not, navigate to it directly.

If the datalab command terminates (for example, if the laptop goes to sleep), reconnect to the VM using:

datalab connect datalabvm-<USER>

(OR)

cd training-data-analyst/datalab/gcloud
./start_tunnel.sh
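
If the connection drops often, the reconnect command can be wrapped in a small retry loop. This is only a sketch: the function name reconnect_datalab and its retry parameters are hypothetical, and it retries after every exit of datalab connect, including a deliberate one.

```shell
# Hypothetical helper: retry "datalab connect" when the tunnel drops.
# Usage: reconnect_datalab <instance-name> [max-attempts] [delay-seconds]
reconnect_datalab() {
  local name="$1" max="${2:-5}" delay="${3:-5}" attempt=1
  while [ "$attempt" -le "$max" ]; do
    echo "Connecting to $name (attempt $attempt of $max)..." >&2
    # This call blocks while the tunnel is up and returns when it closes.
    datalab connect "$name"
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "Gave up after $max attempts." >&2
  return 1
}
```

For example, reconnect_datalab datalabvm-<USER> 10 tries up to ten times, pausing five seconds between attempts.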

When you are done using the Datalab VM, delete the instance using:

datalab delete datalabvm-<USER>

(OR)

cd training-data-analyst/datalab/gcloud
./delete_vm.sh
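
To check that no Datalab instance is left running (and billing) in your project, you can list instances by name prefix. This is a sketch that assumes your instances follow the datalabvm- naming used in this lab; the function name list_datalab_vms is hypothetical.

```shell
# list_datalab_vms is a hypothetical helper name.
# It lists instances whose names start with the lab's "datalabvm-" prefix.
# Empty output suggests no such instance remains in the current project.
list_datalab_vms() {
  gcloud compute instances list \
    --filter="name~^datalabvm-" \
    --format="value(name,status)"
}
```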

In this lab, you will launch Cloud Datalab by running the Docker container in a Compute Engine VM and connecting to it through Cloud Shell:

Launch Datalab VM

To launch Datalab VM:

Step 1

Open up Cloud Shell and, if necessary, clone the following git repository:

git clone http://github.com/GoogleCloudPlatform/training-data-analyst

This downloads the necessary scripts from GitHub.

Step 2

Navigate to the folder containing the launch script and run it:

cd training-data-analyst/datalab/cloudshell
./create_vm.sh

Note: Modify the instance_details.sh script in training-data-analyst/datalab/ if necessary to change the Compute Engine machine type or zone. For example, change the machine type to n1-highcpu-16 to use a more powerful, high-CPU instance.

Step 3

On the GCP Console, view the Compute Engine instances and notice that you now have an instance called datalabvm-username. Be frugal by stopping this VM when you are not using Datalab. You can restart a stopped VM in a minute or so.
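
Stopping and restarting the VM can also be done from Cloud Shell instead of the console. The following sketch wraps the standard gcloud compute instances stop and start commands; the function names are hypothetical, and you pass the instance name and zone yourself.

```shell
# Hypothetical helpers; pass the instance name and its zone as arguments.
# Usage: stop_datalab_vm <instance-name> <zone>
stop_datalab_vm() {
  gcloud compute instances stop "$1" --zone "$2"
}

# Usage: start_datalab_vm <instance-name> <zone>
start_datalab_vm() {
  gcloud compute instances start "$1" --zone "$2"
}
```

For example, stop_datalab_vm datalabvm-<USER> <ZONE> stops the instance. After restarting it, start the tunnel again as described in the next step.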

Step 4

In Cloud Shell, start the tunnel (this process will not exit):

cd training-data-analyst/datalab/cloudshell
./start_tunnel.sh

If the Cloud Shell VM is recycled or if the connection is lost because your laptop went to sleep, just restart the tunnel.

Connect to Datalab

Step 1

In Cloud Shell, click on the Web Preview button (the up arrow icon at the top-left), select port 8081 and view it. This will open up a webpage that contains the Datalab application.

Step 2

In Datalab, click on the right-most icon of the ribbon to make sure that you are signed in to GCP.

In this lab, you will launch Cloud Datalab by running the Docker container on your laptop:

Install Docker

To install Docker:

Step 1

Install Docker starting from https://www.docker.com/products/docker

Step 2

Verify your Docker install by opening up a command prompt (terminal) and typing in:

docker run -d -p 80:80 --name webserver nginx
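
The command above starts an nginx test container in the background and maps it to port 80. One way to confirm it is serving, and then clean it up, is sketched below; the function name check_and_remove_webserver is hypothetical, and the sketch assumes curl is installed.

```shell
# Hypothetical helper: check the test container started by
# "docker run -d -p 80:80 --name webserver nginx", then remove it.
check_and_remove_webserver() {
  # Print the HTTP status code returned on port 80 (nginx normally returns 200).
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost:80/
  # Stop and delete the container so port 80 and the name "webserver" are freed.
  docker stop webserver
  docker rm webserver
}
```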

Install git

Step 1

Install git starting from https://git-scm.com/book/en/v2/Getting-Started-Installing-Git

Step 2

In a terminal window, git clone the training-data-analyst repository:

git clone https://github.com/GoogleCloudPlatform/training-data-analyst

This downloads the code from GitHub.

Start Datalab Docker container

To start Datalab Docker container, open up a Terminal window and type in:

cd training-data-analyst/datalab/local
./start_datalab.sh

This may take a few minutes if the Docker image needs to be downloaded.
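
Because startup can take a while, you may want to poll the port from a second terminal until Datalab answers. This is a sketch under stated assumptions: the function name wait_for_datalab and its defaults are hypothetical, curl must be installed, and the URL is the one used in this lab.

```shell
# Hypothetical helper: poll a URL until it answers or attempts run out.
# Usage: wait_for_datalab [url] [max-attempts] [delay-seconds]
wait_for_datalab() {
  local url="${1:-http://localhost:8081/}" max="${2:-30}" delay="${3:-5}" i=1
  while [ "$i" -le "$max" ]; do
    # -f makes curl fail on HTTP error codes; -s silences progress output.
    if curl -fs -o /dev/null "$url"; then
      echo "Datalab is responding at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "No response from $url after $max attempts" >&2
  return 1
}
```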

Connect to Datalab

To connect to Datalab and sign into GCP:

Step 1

In your browser's address bar, type http://localhost:8081/.

Step 2

In Datalab, click on the right-most icon of the ribbon to make sure that you are signed in to GCP.

To verify that Datalab works:

Step 1

Navigate to datalab/docs/tutorials/BigQuery and click on the notebook for "Hello BigQuery".

Step 2

Select Run | Run all Cells. Ensure that you get a valid result from the query.

Note: If you get an error saying that %projects needs to be set, create a code cell, type in the following, and click Run. You can find your project ID on the Home page of the GCP Console.

%projects set <project-id>
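
As an alternative to the console, the project ID can be read from a terminal where the gcloud SDK is already configured (for example, Cloud Shell). The sketch below prints the line to paste into the notebook cell; the function name projects_set_line is hypothetical.

```shell
# Hypothetical helper: print the %projects line for the active gcloud project.
projects_set_line() {
  local project
  project="$(gcloud config get-value project 2>/dev/null)"
  if [ -z "$project" ]; then
    echo "No project configured in gcloud" >&2
    return 1
  fi
  echo "%projects set $project"
}
```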

© Google, Inc. or its affiliates. All rights reserved. Do not distribute.