1. Introduction
In this codelab, you will learn how to use BigQuery Graph to build a Customer 360 view and a recommendation engine for Cymbal Pets, a fictional retail company. You will leverage the power of SQL to create, query, and analyze graph data directly within BigQuery, combining it with vector search for advanced product recommendations.
BigQuery Graph allows you to model relationships between your data entities (like customers, products, and orders) as a graph, making it easy to answer complex questions about customer behavior and product affinities.

What you'll do
- Create a BigQuery dataset and schema for the Cymbal Pets graph
- Load sample data (Customers, Products, Orders, Stores) from Cloud Storage
- Create a Property Graph in BigQuery connecting these entities
- Visualize customer purchase history using graph queries
- Build a product recommendation system using vector search
- Enhance recommendations using "Bought Together" graph relationships
What you'll need
- A web browser such as Chrome
- A Google Cloud project with billing enabled
This codelab is for developers of all levels, including beginners.
2. Before you begin
Create a Google Cloud Project
- In the Google Cloud Console, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project.
Start Cloud Shell
- Click Activate Cloud Shell at the top of the Google Cloud console.
- Verify authentication:
gcloud auth list
- Confirm your project:
gcloud config get project
- Set it if needed:
export PROJECT_ID=<YOUR_PROJECT_ID>
gcloud config set project $PROJECT_ID
Enable APIs
Run this command to enable the required BigQuery API:
gcloud services enable bigquery.googleapis.com
3. Define the Schema
First, you need to create a dataset to store your graph-related tables and define the schema for your nodes and edges.
- For this codelab, we will execute SQL commands. You can run these commands in the BigQuery Studio > SQL Editor, or use the
bq querycommand in Cloud Shell.
We will assume you are using the BigQuery SQL Editor for a better experience with multi-line create statements. - Create the
cymbal_pets_demodataset:
CREATE SCHEMA IF NOT EXISTS cymbal_pets_demo;
- Create the tables for
order_items,products,orders,stores,customers, andco_related_products_for_angelica. These tables will serve as the source data for our graph.
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.order_items
(
order_id INT64,
product_id INT64,
order_item_id INT64,
quantity INT64,
price FLOAT64,
PRIMARY KEY (order_id, product_id, order_item_id) NOT ENFORCED
)
CLUSTER BY order_item_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.products
(
product_id INT64,
product_name STRING,
brand STRING,
category STRING,
subcategory INT64,
animal_type INT64,
search_keywords INT64,
price FLOAT64,
description STRING,
inventory_level INT64,
supplier_id INT64,
average_rating FLOAT64,
uri STRING,
embedding ARRAY<FLOAT64>,
PRIMARY KEY (product_id) NOT ENFORCED
)
CLUSTER BY product_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.orders
(
customer_id INT64,
order_id INT64,
shipping_address_city STRING,
store_id INT64,
order_date DATE,
order_type STRING,
payment_method STRING,
PRIMARY KEY (order_id) NOT ENFORCED
)
PARTITION BY order_date
CLUSTER BY order_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.stores
(
store_id INT64,
store_name STRING,
address_state STRING,
address_city STRING,
latitude FLOAT64,
longitude FLOAT64,
opening_hours STRUCT<Monday STRING, Tuesday STRING, Wednesday STRING, Thursday STRING, Friday STRING, Saturday STRING, Sunday STRING>,
manager_id INT64,
PRIMARY KEY (store_id) NOT ENFORCED
)
CLUSTER BY store_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.customers
(
customer_id INT64,
first_name STRING,
last_name STRING,
email STRING,
gender STRING,
address_city STRING,
address_state STRING,
loyalty_member BOOL,
PRIMARY KEY (customer_id) NOT ENFORCED
)
CLUSTER BY customer_id;
CREATE TABLE IF NOT EXISTS cymbal_pets_demo.co_related_products_for_angelica
(
angelica_product_id INT64,
other_product_id INT64,
co_purchase_count INT64
);
You have now defined the structure for your graph data.
4. Load the Data
Now, populate the tables with sample data from Cloud Storage.
Run the following LOAD DATA statements in the BigQuery SQL dditor:
LOAD DATA INTO `cymbal_pets_demo.customers`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/customers/*.avro']
);
LOAD DATA INTO `cymbal_pets_demo.order_items`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/order_items/*.avro']
);
LOAD DATA INTO `cymbal_pets_demo.orders`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/orders/*.avro']
);
LOAD DATA INTO `cymbal_pets_demo.products`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/products/*.avro']
);
LOAD DATA INTO `cymbal_pets_demo.stores`
FROM FILES (
format = 'AVRO',
uris = ['gs://sample-data-and-media/cymbal-pets/tables/stores/*.avro']
);
You should see a confirmation that rows have been loaded into each table.
5. Create the Property Graph
With the data loaded, you can now define the property graph. This tells BigQuery which tables represent nodes (entities like Customers, Products) and which tables represent edges (relationships like "Visited", "Placed", "Has").

Run the following DDL statement:
CREATE OR REPLACE PROPERTY GRAPH cymbal_pets_demo.PetsOrderGraph
NODE TABLES (
cymbal_pets_demo.customers KEY(customer_id) LABEL Customer,
cymbal_pets_demo.products KEY(product_id) LABEL Products,
cymbal_pets_demo.stores KEY(store_id) LABEL Stores,
cymbal_pets_demo.orders KEY(order_id) LABEL Orders
)
EDGE TABLES (
cymbal_pets_demo.orders as customer_to_store_edge
KEY (order_id)
SOURCE KEY (customer_id) references customers(customer_id)
DESTINATION KEY (store_id) references stores(store_id)
LABEL Visited
PROPERTIES ALL COLUMNS,
cymbal_pets_demo.order_items
KEY (order_item_id)
SOURCE KEY (order_id) references orders(order_id)
DESTINATION KEY (product_id) references products(product_id)
LABEL Has
PROPERTIES ALL COLUMNS,
cymbal_pets_demo.orders as customer_to_orders_edge
KEY (order_id)
SOURCE KEY (customer_id) references customers(customer_id)
DESTINATION KEY (order_id) references orders(order_id)
LABEL Placed
PROPERTIES ALL COLUMNS,
cymbal_pets_demo.co_related_products_for_angelica
KEY (angelica_product_id)
SOURCE KEY (angelica_product_id) references products(product_id)
DESTINATION KEY (other_product_id) references products(product_id)
LABEL BoughtTogether
PROPERTIES ALL COLUMNS
);
This creates the graph PetsOrderGraph which allows us to perform graph traversals using the GRAPH_TABLE operator.
6. Visualize Purchase History of all customers
Open New Notebook in BigQuery Studio.

For the visualization and recommendation parts of this codelab, we will use a Google Colab notebook in BigQuery Studio. This allows us to easily visualize the graph results.
BigQuery Graph Notebook is implemented as an IPython Magics. By adding the %%bigquery magic command with the TO_JSON function, you can visualize the results as shown in the following sections.
Say that Cymbal Pets want to get a 360 degree visualization of all the customers and the purchases they did in a specific time window.
Run the following in a new cell:
%%bigquery --graph
GRAPH cymbal_pets_demo.PetsOrderGraph
# finds the customer node and then finds all
# the Orders nodes that are connected to that customer through the
# Placed relationship
MATCH (customer:Customer)-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
# filters the Orders nodes to only include those where the
# order_date is within the last 3 months.
WHERE ordr.order_date >= date('2024-11-27')
# # This line finds all the Products nodes that are connected to the
# # filtered Orders nodes through the Has relationship.
MATCH p=(customer:Customer)-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
LIMIT 40
RETURN
TO_JSON(p) as paths
You should see a visual representation of the graph result.

7. Visualize Purchase History of Angelica
Say that Cymbal Pets want to dive deep into a customer named Angelica Russell. They want to analyze the products that Angelica bought within the last 3 months, and the stores that the customer visited.
%%bigquery --graph
GRAPH cymbal_pets_demo.PetsOrderGraph
# finds the customer node with the name "Angelica Russell" and then finds all
# the Orders nodes that are connected to that customer through the
# Placed relationship and all the Products nodes that are connected to the
# filtered Orders nodes through the Has relationship.
MATCH p=(customer:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(ordr:Orders)-[has:Has]->(product:Products)
# filters the Orders nodes to only include those where the
# order_date is within the last 3 months.
WHERE ordr.order_date >= date('2024-11-27')
# finds the Stores nodes where Angelica placed order from
MATCH p2=(customer)-[visited:Visited]->(store:Stores)
RETURN
TO_JSON(p) as path, TO_JSON(p2) as path2

8. Product Recommendation using vector search
Cymbal Pets wants to recommend products to Angelica based on what she has bought recently. We can use vector search to find products with similar embeddings to her past purchases.
Run the following SQL script in a new Colab cell. This script:
- Identifies products Angelica bought recently.
- Uses
VECTOR_SEARCHto find the top 4 similar products from theproductstable.
Note: This step assumes you have already run AI.GENERATE_EMBEDDINGS to create an embeddings column in the products table.
%%bigquery
DECLARE products_bought_by_angelica ARRAY<INT64>;
-- 1. Get IDs of products bought by Angelica
SET products_bought_by_angelica = (
SELECT ARRAY_AGG(product_id) FROM
GRAPH_TABLE(
cymbal_pets_demo.PetsOrderGraph
MATCH (c:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(o:Orders)
WHERE o.order_date >= date('2024-11-27')
MATCH (o)-[has_edge:Has]->(p:Products)
RETURN DISTINCT p.product_id as product_id
));
-- 2. Find similar products using vector search
SELECT
query.product_name as AngelicaBought,
base.product_name as RecommendedProducts,
base.category
FROM
VECTOR_SEARCH(
TABLE cymbal_pets_demo.products,
'embedding',
(SELECT * FROM cymbal_pets_demo.products
WHERE product_id IN UNNEST(products_bought_by_angelica)),
'embedding',
top_k => 4)
WHERE query.product_name <> base.product_name;
You should see a list of recommended products that are semantically similar to what Angelica bought.

9. Recommendation using "Bought Together" Relationships
Another powerful recommendation technique is "Collaborative Filtering" — recommending products that are frequently bought together by other users. We have modeled this as a BoughtTogether edge in our graph.
For recommending bought together products, Cymbal pets executed an analytical offline graph query to find top products to recommend for each product that Angelica purchased.
%%bigquery
CREATE OR REPLACE TABLE cymbal_pets_demo.co_related_products_for_angelica AS
SELECT
angelica_product_id,
other_product_id,
co_purchase_count
FROM (
SELECT
angelicaProduct.product_id AS angelica_product_id,
otherProduct.product_id AS other_product_id,
count(otherProduct) AS co_purchase_count,
# ensures that the row numbering is done separately for each angelica_product_id
ROW_NUMBER() OVER (PARTITION BY angelicaProduct.product_id ORDER BY count(otherProduct) DESC) AS rn
FROM
GRAPH_TABLE (cymbal_pets_demo.PetsOrderGraph
MATCH (angelica:Customer {first_name: 'Angelica', last_name: 'Russell'})-[:Placed]->(o:Orders)-[:Has]->(angelicaProduct:Products)
WHERE o.order_date >= date('2024-11-27')
WITH angelica, angelicaProduct
MATCH (otherCustomer:Customer)-[:Placed]->(otherOrder:Orders)-[:Has]->(angelicaProduct) # Find orders where Angelica's products were bought
WHERE otherCustomer <> angelica # Exclude Angelica's own orders
WITH angelicaProduct, otherOrder
MATCH (otherOrder)-[:HAS]->(otherProduct:Products) # Find other products in those orders
WHERE angelicaProduct <> otherProduct # Exclude the original product.
RETURN angelicaProduct, otherProduct, otherOrder
)
GROUP BY
angelicaProduct.product_id, otherProduct.product_id
)
WHERE rn <= 3; # only keep top 3 co-related products

Run this query to recommend products that are directly connected to Angelica's purchases via the BoughtTogether edge:
%%bigquery
SELECT * FROM GRAPH_TABLE(
cymbal_pets_demo.PetsOrderGraph
MATCH (customer:Customer {first_name: 'Angelica', last_name: 'Russell'})-[placed:Placed]->(ordr:Orders)
WHERE ordr.order_date >= date('2024-11-27')
MATCH (ordr)-[has:Has]->(product:Products)
MATCH (product)-[bought_together:BoughtTogether]->(recommended_product:Products)
RETURN
product.product_name AS OriginalProduct,
recommended_product.product_name AS Recommended,
bought_together.co_purchase_count AS Strength
);
This query traverses from Customer -> Order -> Product -> (BoughtTogether) -> Recommended Product, showing you recommendations based on collective purchase behavior.

10. Clean up
To avoid ongoing charges to your Google Cloud account, delete the resources created during this codelab.
Delete the dataset and all tables:
DROP SCHEMA IF EXISTS cymbal_pets_demo CASCADE;
If you created a new project for this codelab, you can also delete the project:
gcloud projects delete $PROJECT_ID
11. Congratulations
Congratulations! You have successfully built a Customer 360 view and recommendation engine using BigQuery Graph.
What you've learned
- How to create a property graph in BigQuery.
- How to load data into graph nodes and edges.
- How to query graph patterns using
GRAPH_TABLEandMATCH. - How to combine graph queries with vector search for hybrid recommendations.
Next steps
- Explore BigQuery Graph documentation.
- Learn more about Vector Search in BigQuery.