In this lab, you learn how to work with Cloud Bigtable in a performant way.

What you learn

In this lab, you learn how to:

You will need a GCE instance with Java 8 and Maven installed. If you do not have the datasme instance appropriately set up, please follow the steps in the pubsub-exercises.

Step 1

From the GCP Console, or with the gcloud command below, create a Cloud Bigtable instance with the following specifications:

gcloud beta bigtable instances create datasme-cbt \
    --instance-type=DEVELOPMENT \
    --cluster=datasme-cbt-c1 \
    --cluster-zone=us-central1-b

Step 2

Make sure the client works:

cd training-data-analyst/courses/data_analysis/deepdive/bigtable-exercises
bash ./

Step 3

Look at the script you ran in the previous step. What class was being executed?



Step 4

Why does the program print out "It worked!"?


Hint: What column and what value is being written into the table?


(Answer: We write to the column cf:col the string "It worked!". This value is read back and printed.)
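The answer describes a write-then-read round trip against one cell. A plain-Python sketch of that round trip (a dict stands in for the table, and the row key "r1" is made up for illustration):

```python
# Sketch of the round trip described above: a dict stands in for the
# Bigtable table, and (row, "family:qualifier") addresses a cell.
table = {}

def put(row, family, qualifier, value):
    table[(row, family + ":" + qualifier)] = value

def get(row, family, qualifier):
    return table[(row, family + ":" + qualifier)]

put("r1", "cf", "col", "It worked!")
print(get("r1", "cf", "col"))  # -> It worked!
```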

We are going to import a 1-million-row subset of the actions dataset from the retail example.

Step 0

Download the retail data subset file from GCS and put it where the Java code can read it.


Step 1 [If familiar with Java]

Implement the TODOs (scroll down to the method implementations).


For 1a, reorder the last 2 parameters to String.join so that writes are distributed more evenly across the row-key space.
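Why the order of the row-key components matters can be illustrated in plain Python (a sketch of the hotspotting idea, not the lab's Java code; the key layout, timestamps, and user IDs are made up). Bigtable sorts rows by key, so timestamp-first keys are written in strictly increasing order and all land on the tablet at the end of the keyspace:

```python
# Hypothetical key layout for illustration only.
def make_key(*parts):
    return "#".join(parts)

# Ten synthetic events: an increasing timestamp plus a user id.
events = [("2018-01-01T00:00:%02d" % s, "user%d" % (s % 5)) for s in range(10)]

ts_first = [make_key(ts, user) for ts, user in events]
user_first = [make_key(user, ts) for ts, user in events]

# Timestamp-first keys arrive already sorted: every write hits the same
# "end of the table" tablet (a hotspot).
print(ts_first == sorted(ts_first))      # True
# User-first keys jump around the keyspace, spreading the write load.
print(user_first == sorted(user_first))  # False
```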

For 1b, See

For 1c:

writer.execute(() -> {
}, point);



Use the BufferedMutator in WriteWithBufferedMutator.
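The reason a BufferedMutator is faster than row-at-a-time Put calls can be sketched with a toy simulation (plain Python, not the HBase API; FakeTable and BufferedWriter are invented stand-ins that just count round trips):

```python
# A single Put pays one round trip per row; a buffered mutator batches
# rows in memory and flushes them in bulk, cutting the number of RPCs.
class FakeTable:
    def __init__(self):
        self.rpcs = 0   # round trips made
        self.rows = []  # rows "written"

    def send(self, batch):
        self.rpcs += 1
        self.rows.extend(batch)

class BufferedWriter:
    def __init__(self, table, buffer_size):
        self.table, self.buffer_size, self.buffer = table, buffer_size, []

    def put(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.table.send(self.buffer)
            self.buffer = []

single, buffered = FakeTable(), FakeTable()

for i in range(1000):
    single.send([i])          # one RPC per row

w = BufferedWriter(buffered, buffer_size=100)
for i in range(1000):
    w.put(i)                  # RPC only every 100 rows
w.flush()

print(single.rpcs, buffered.rpcs)  # 1000 vs 10
```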


Step 2

Check the speed of the implementation by running either Ex1 or Ex1Solution.

Pass "false" to run the SinglePut code, or "true" to run the BufferedMutator code.

bash ./ true|false
bash ./ true|false

Step 3

Fill out this table based on Step 2:

Classes used                                  Run           Parameter (if any)   Writing rate

SinglePut                                     Ex1           false                _____________ rows/sec
SinglePut                                     Ex1Solution   false                _____________ rows/sec
BufferedMutator and BufferedMutatorParams     Ex1           true                 _____________ rows/sec
BufferedMutator and BufferedMutatorParams     Ex1Solution   true                 _____________ rows/sec

Step 1 [If familiar with Java]

Complete the single TODO.

Hint: Which filter will let us scan only rows that start with "action"?
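Whatever filter you choose, conceptually a prefix scan is just a range scan over lexicographically sorted keys: all rows starting with "action" form the contiguous range from the prefix up to its successor. A plain-Python sketch (the row keys here are invented):

```python
# Bigtable stores rows in lexicographic key order, so "all rows starting
# with 'action'" is the contiguous range [prefix, successor-of-prefix).
def prefix_range(prefix):
    # Successor: increment the last byte (sufficient for ASCII prefixes
    # that do not end in 0xff).
    end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, end

rows = sorted(["action#001", "action#002", "actor#001", "aggregate#001", "zzz"])

start, end = prefix_range("action")
matched = [r for r in rows if start <= r < end]
print(matched)  # -> ['action#001', 'action#002']
```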

Step 2

Run the job to read data out of Bigtable and write aggregated data back in.




We are going to check our by-minute aggregations and look for big drops in retail activity.

Step 1:

Install pip and virtualenv if you do not already have them. You may want to refer to the Python Development Environment Setup Guide for Google Cloud Platform for instructions.

Step 2:

Create a virtualenv:

cd python
virtualenv env
source env/bin/activate

Step 3: Install Requirements

pip install -r requirements.txt

Step 4: Complete the TODO [if familiar with Python]

Python library docs are here:

Hint: Iterate over all the cells in the rollups family for column name "". Add them all to a list and check that list for any value that drops more than 50% from the previous one. Print something out with the two values and two timestamps (both accessible from the cell).
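A minimal plain-Python sketch of that drop check, using made-up (timestamp, value) pairs instead of real Bigtable cells:

```python
def find_drops(cells, threshold=0.5):
    """Return (prev_ts, prev_val, ts, val) for every value that drops
    more than `threshold` (50% by default) from the previous one."""
    drops = []
    for (prev_ts, prev_val), (ts, val) in zip(cells, cells[1:]):
        if prev_val > 0 and val < prev_val * (1 - threshold):
            drops.append((prev_ts, prev_val, ts, val))
    return drops

# Synthetic by-minute counts: 90 -> 40 is a drop of more than 50%.
minutes = [("12:00", 100), ("12:01", 90), ("12:02", 40), ("12:03", 45)]

for prev_ts, prev_val, ts, val in find_drops(minutes):
    print("drop: %s=%s -> %s=%s" % (prev_ts, prev_val, ts, val))
```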

Step 5:

Run it!

python <your project> datasme-cbt TrainingTable


python <your project> datasme-cbt TrainingTable

Delete the following resources: