1. Overview
This codelab walks you through analyzing a stream of credit card transaction data written to Bigtable. You'll learn how to use the Bigtable change streams to BigQuery template to export data in real time. Then, you'll see techniques for querying the change stream log and for reformatting the data to create a dashboard using Looker.
This codelab is intended for technical users who are familiar with Bigtable, using command line tools, and event streaming services.
This codelab shows you how to do the following:
- Create a Bigtable table with a change stream enabled.
- Create a BigQuery dataset.
- Deploy the Bigtable change streams to BigQuery Dataflow template.
- Query the event stream in BigQuery.
- Visualize the event stream with Looker.
This is the architecture of the system you'll be deploying: transaction data is written to Bigtable, a Dataflow pipeline streams the change log into BigQuery, and Looker visualizes the results.
2. Project setup
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Visit this pre-populated API enablement page to enable the necessary APIs for Dataflow, Bigtable, BigQuery, Looker, and Cloud Storage.
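Alternatively, you can enable the services from Cloud Shell with gcloud. A minimal sketch, assuming gcloud is already configured for your project; the service names below are the standard identifiers for these products, but confirm them in the API Library if enablement fails:
# Enable the APIs used in this codelab.
gcloud services enable dataflow.googleapis.com bigtable.googleapis.com \
  bigtableadmin.googleapis.com bigquery.googleapis.com storage.googleapis.com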
3. Create a BigQuery dataset
You'll be analyzing data using BigQuery later in the codelab. Follow these instructions to create the dataset for your data pipeline's output.
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, find your project name, and click the overflow menu (three vertical dots next to the project ID).
- Click Create dataset.
- In the Create dataset panel, do the following:
- For Dataset ID, enter bigtable_bigquery_tutorial.
- Leave the remaining default settings as they are.
- Click Create dataset.
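If you prefer the command line, the bq tool in Cloud Shell can create the same dataset in one step (a sketch, assuming your default project is already set):
# Create the dataset for the pipeline's output.
bq mk --dataset bigtable_bigquery_tutorial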
4. Create a Bigtable table with a change stream enabled
Bigtable is a low-latency, horizontally scalable NoSQL database service, and one common use case for it is serving financial data. Here you are going to create a table that can store credit card transactions. Bigtable can handle the high throughput writes of transactions around the world and could even use that data for real-time fraud detection.
- In the Google Cloud console, go to the Bigtable Instances page.
- Click the ID of the instance that you are using for this tutorial. If you don't have an instance available, create an instance with a name of your choosing in a region near you. You can use the default configurations for everything else.
- In the left navigation pane, click Tables.
- Click Create table.
- Name the table retail-database.
- Add a column family named transactions.
- Select Enable change stream.
- Leave the garbage collection policy and retention period with their default values.
- Click Create.
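You can also create the table from the command line. A sketch, assuming your instance ID is in the BIGTABLE_INSTANCE_ID variable (set in step 6) and that a one-day change stream retention is acceptable; check the gcloud reference for the current flag names:
# Create the table with a transactions column family and a change stream enabled.
gcloud bigtable instances tables create retail-database \
  --instance=$BIGTABLE_INSTANCE_ID \
  --column-families=transactions \
  --change-stream-retention-period=1d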
5. Initialize a data pipeline to capture the change stream
Bigtable's performance is optimal for point reads and row-range scans, but table-wide analytics can strain serving capacity and CPU resources. BigQuery is great for table-wide analytics, so the solution you'll use here is dual writes: your data lands in both Bigtable and BigQuery, a common technique. This won't require additional coding because you'll use the Bigtable change streams to BigQuery Dataflow template: whenever data is written to Bigtable, the pipeline writes a change record to BigQuery. Once the data is in BigQuery, you can run optimized queries over your entire dataset without impacting the performance of your serving data in Bigtable.
- On the Bigtable Tables page, find your table retail-database.
- In the Change stream column, click Connect.
- In the Connect with Dataflow dialog, select BigQuery.
- Click Create Dataflow job.
- In the provided parameter fields, enter your parameter values. You don't need to provide any optional parameters.
- Set the Cloud Bigtable application profile ID to default.
- Set the BigQuery dataset to bigtable_bigquery_tutorial.
- Click Run job.
- Wait until the job status is Starting or Running before proceeding. It can take around five minutes once the job is queued. The page updates automatically. This template creates a streaming job, which continuously processes new data written to Bigtable until the job is manually stopped.
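If you prefer to launch the pipeline from the command line, the same template can be run with gcloud. This is a sketch: the template location and parameter names match the public Bigtable change streams to BigQuery Flex Template at the time of writing, and us-central1 is an assumed region; adjust both to your setup. It also assumes the PROJECT_ID and BIGTABLE_INSTANCE_ID variables from the next step.
# Launch the Bigtable change streams to BigQuery Flex Template.
gcloud dataflow flex-template run bigtable-changestream-to-bq \
  --project=$PROJECT_ID \
  --region=us-central1 \
  --template-file-gcs-location=gs://dataflow-templates-us-central1/latest/flex/Bigtable_Change_Streams_to_BigQuery \
  --parameters=bigtableReadInstanceId=$BIGTABLE_INSTANCE_ID,bigtableReadTableId=retail-database,bigtableChangeStreamAppProfile=default,bigQueryDataset=bigtable_bigquery_tutorial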
6. Write some data to Bigtable
Now, you'll write a year of credit card transactions to your Bigtable table. This example dataset includes information such as the credit card number, merchant name and ID, and amount. In an actual credit card processing application, this data would stream into your database in real time as each transaction occurs.
- Open Cloud Shell by clicking the button in the top right-hand corner of the Google Cloud console.
- If prompted, accept the prompt to authorize the Cloud Shell.
- Download the dataset.
gsutil cp gs://cloud-bigtable-public-datasets/change-streams-tutorial/cc-transactions-2023.csv .
- Set your environment variables in the command line, replacing the values with your project ID and Bigtable instance ID:
PROJECT_ID=your-project-id
BIGTABLE_INSTANCE_ID=your-bigtable-instance-id
- Use the cbt CLI to write a number of credit card transactions to the retail-database table.
cbt -instance=$BIGTABLE_INSTANCE_ID -project=$PROJECT_ID import \
retail-database cc-transactions-2023.csv column-family=transactions
The output will say:
Done importing 10000 rows.
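To spot-check the import, you can read a single row back with cbt (count=1 limits the output to one row):
# Read one row back to verify the import.
cbt -instance=$BIGTABLE_INSTANCE_ID -project=$PROJECT_ID read retail-database count=1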
7. View the change logs in BigQuery
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer pane, expand your project and the dataset bigtable_bigquery_tutorial.
- Click the table retail-database_changelog. You may need to refresh the dataset if the table isn't present.
- To see the change log, click Preview.
Try out some queries
Now, you can run a few queries over this dataset to get some insights. The SQL queries are provided here, but the Query a Bigtable change log in BigQuery documentation has more information about how to write queries for the changelog data.
Look up one transaction
Use the following query to look up the data for one specific transaction.
SELECT *
FROM `bigtable_bigquery_tutorial.retail-database_changelog`
WHERE row_key="3034-442694-3052#2023-03-03T14:50:46.824Z"
You'll see that each column written has been turned into an individual row in BigQuery.
Find the number of purchases in each category
Use the following query to count the number of purchases per category.
SELECT value as category, count(*) as `number of sales`
FROM `bigtable_bigquery_tutorial.retail-database_changelog`
WHERE column="category"
GROUP BY category
Reformat the data
To reconstruct each transaction into a single row, you will pivot the data and save the result as a view. This is a more manageable format to query.
CREATE VIEW bigtable_bigquery_tutorial.retail_transactions_view AS (
SELECT *, parse_numeric(amount) as sales_dollars FROM (
SELECT row_key, timestamp, column, value
FROM `bigtable_bigquery_tutorial.retail-database_changelog`
)
PIVOT (
MAX(value)
FOR column in ("merchant", "amount", "category", "transaction_date")
)
)
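With the view in place, you can query the reshaped data directly, for example from Cloud Shell with the bq tool. A sketch; the merchant and sales_dollars columns come from the pivot above:
# Top five merchants by total sales, using the pivoted view.
bq query --use_legacy_sql=false \
'SELECT merchant, ROUND(SUM(sales_dollars), 2) AS total_sales
FROM bigtable_bigquery_tutorial.retail_transactions_view
GROUP BY merchant
ORDER BY total_sales DESC
LIMIT 5'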
Create a Looker dashboard
- Click Go to view to go to the new view's detail page.
- Click Export.
- Select Explore with Looker Studio.
8. Add charts to the dashboard
Now you can graph the information to make it easier to digest and share as a report. You will add three charts to your dashboard:
- Transaction amounts over time
- Total transactions for each merchant
- Percentage of transactions per category
Set up the page
- Click each existing chart and press Delete to remove them.
- Select Properties on the right-hand side of the page so you can modify the chart data.
Add the charts
Transaction amounts over time
- Click Add a chart, and create a time series chart.
- Set the Dimension to transaction_date.
- Set the Metric to sales_dollars.
Total transactions for each merchant
- Click Add a chart, and create a table.
- Set Dimension to merchant.
- Set Metric to sales_dollars.
Percentage of transactions per category
- Click Add a chart, and create a pie chart.
- Set Dimension to category.
- Set Metric to sales_dollars.
See real-time changes
Spend some time exploring the values in the graphs. You can click a specific merchant in the table or a specific category in the pie chart, which filters all of the charts on that value and can surface more insights. Next, you'll write more data and see the charts update in real time.
- Go back to your Cloud Shell.
- Download and write the second dataset.
gsutil cp gs://cloud-bigtable-public-datasets/change-streams-tutorial/cc-transactions-2024-jan.csv .
cbt -instance=$BIGTABLE_INSTANCE_ID -project=$PROJECT_ID import \
retail-database cc-transactions-2024-jan.csv column-family=transactions
- Return to your Looker dashboard and refresh the data with Ctrl+Shift+E, or click Refresh data in the View menu. Now you should see the data for January 2024 in the charts.
There are many variations of graphs and metrics you can create beyond this. Read more in the Looker documentation.
9. Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Stop the change stream pipeline
- In the Google Cloud console, go to the Dataflow Jobs page.
- Select your streaming job from the job list.
- In the navigation, click Stop.
- In the Stop job dialog, select Cancel, and then click Stop job.
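You can also stop the job from Cloud Shell. A sketch, assuming the job runs in us-central1; replace JOB_ID with the ID shown in the list output:
# Find the streaming job's ID, then cancel it.
gcloud dataflow jobs list --region=us-central1 --status=active
gcloud dataflow jobs cancel JOB_ID --region=us-central1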
Delete the Bigtable resources
If you created a Bigtable instance for this tutorial, you can delete it, or you can clean up the table you created.
- In the Google Cloud console, go to the Bigtable Instances page.
- Click the ID of the instance that you are using for this tutorial.
- In the left navigation pane, click Tables.
- Find the retail-database table.
- Click Edit.
- Clear Enable change stream.
- Click Save.
- Open the overflow menu for the table.
- Click Delete and input the table name to confirm.
- Optional: Delete the instance if you created a new one for this tutorial.
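The same cleanup can be done from Cloud Shell, reusing the variables from step 6. A sketch; check the gcloud reference for the current flag names, and note that the instance delete prompts for confirmation:
# Disable the change stream, then delete the table.
gcloud bigtable instances tables update retail-database \
  --instance=$BIGTABLE_INSTANCE_ID --clear-change-stream-retention-period
cbt -instance=$BIGTABLE_INSTANCE_ID -project=$PROJECT_ID deletetable retail-database
# Optional: delete the whole instance if it was created just for this codelab.
gcloud bigtable instances delete $BIGTABLE_INSTANCE_ID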
Delete the BigQuery dataset
- In the Google Cloud console, go to the BigQuery page.
- In the Explorer panel, find the dataset bigtable_bigquery_tutorial and click it.
- Click Delete, type delete, and then click Delete to confirm.
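Or delete the dataset from Cloud Shell with the bq tool (-r removes the dataset's contents as well, and -f skips the confirmation prompt):
# Delete the dataset and everything in it.
bq rm -r -f -d bigtable_bigquery_tutorial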