1. Introduction
As developers and data engineers, we often inherit large collections of data that look more like data swamps. We face the same friction points repeatedly: "What is the actual definition of this 'amt' column?", "Who is on the hook if this dataset breaks?", or "Are we allowed to use this table in the personalized recommendation engine?"
Traditionally, data catalogs have been passive inventories filled with free-text tags that quickly become inconsistent and outdated. They don't enforce structure, making programmatic governance nearly impossible.
Dataplex Universal Catalog changes this by providing an active, structured metadata management framework. It allows you to attach structured, schema-driven metadata (Aspects) and accepted business definitions (Glossaries) directly to your data assets (Entries).
To make this practical, we will work through a scenario in this lab: establishing robust governance over raw retail sales data so it can be trusted by a finance department for official reporting. You will move this data from an ambiguous "swamp" state to a governed product.
Before you can write Python scripts or Terraform modules to automate this at scale, you need to understand the underlying object model.
In this codelab, we will perform the governance steps manually in the Google Cloud Console. We will explicitly connect the dots between Entries, Aspect Types, Aspects, and Glossaries to give you a solid mental model of how to make your data discoverable, understandable, and trustworthy.
Prerequisites
- A Google Cloud Project with Owner or Editor access.
- Familiarity with the Google Cloud Console.
- Basic gcloud and bq CLI skills in Cloud Shell.
What you'll learn
- The crucial distinction between a Dataplex Entry, Aspect Type, and Aspect.
- How to create a Business Glossary to resolve ambiguity in terminology.
- How to design an Aspect Type to enforce a strict schema for technical metadata (moving beyond "tags").
- How to link a Business Glossary Term to a specific BigQuery column.
- How to attach a structured Aspect to a data asset and validate inputs.
- How to execute precise search queries against this new structured metadata.
What you'll need
- A Google Cloud Account and Google Cloud Project
- A web browser such as Chrome
Key concepts
- Entry: The canonical, abstract representation of a data asset in the catalog. Think of this as the "pointer" or the "noun." When you create a BigQuery table, Dataplex automatically creates an Entry for it. We don't govern the table directly; we govern its Entry.
- Business Glossary: A centralized, versioned dictionary of your organization's business terms. It is the single source of truth. It prevents the "Sales defines GMV differently than Finance" problem.
- Aspect Type: The schema or template for a specific category of metadata. An Aspect Type defines fields, data types (string, enum, datetime, etc.), and constraints (required/optional). It is the contract that ensures metadata consistency.
- Aspect: A specific piece of metadata attached to an Entry that follows the structure defined by an Aspect Type. It contains the actual values that fulfill the Aspect Type's schema.
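The relationship between these concepts can be sketched in a few lines of Python. This is only a mental-model aid, not the Dataplex API: the class names, the `attach` method, and the choice to key aspects by type name are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AspectType:
    """The template: a name plus the field names it declares."""
    name: str
    fields: tuple


@dataclass
class Aspect:
    """Actual values that follow one AspectType."""
    aspect_type: AspectType
    data: dict


@dataclass
class Entry:
    """Catalog record that points at a data asset, such as a BigQuery table."""
    asset: str
    aspects: dict = field(default_factory=dict)

    def attach(self, aspect: Aspect) -> None:
        # Keyed by type name, so this sketch holds one aspect per type per entry.
        self.aspects[aspect.aspect_type.name] = aspect


governance = AspectType(
    "Data Asset Governance",
    ("Data Steward", "Data Sensitivity", "Last Review Date"),
)
entry = Entry("retail_data.transactions")
entry.attach(Aspect(governance, {"Data Steward": "finance-team@example.com"}))
```

The table itself never appears in this model. Governance metadata hangs off the Entry, which is why the rest of the lab edits the Entry rather than the BigQuery table.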
2. Setup and requirements
Start Cloud Shell
While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.
From the Google Cloud Console, click the Cloud Shell icon on the top right toolbar:

It should only take a few moments to provision and connect to the environment. When it is finished, you should see something like this:

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this codelab can be done within a browser. You do not need to install anything.
Enable required APIs and configure environment
Run the following commands to set your project ID, define the region, and enable the necessary service APIs.
export PROJECT_ID=$(gcloud config get-value project)
gcloud config set project $PROJECT_ID
export LOCATION="us-central1"
gcloud services enable dataplex.googleapis.com \
bigquery.googleapis.com \
datacatalog.googleapis.com
Create a BigQuery dataset and prepare sample data
We need a concrete data asset to govern. We will create a BigQuery dataset and load a small sample CSV representing transactions. Dataplex will automatically discover this table and create an Entry for it.
# Create the BigQuery Dataset in the us-central1 region
bq --location=$LOCATION mk --dataset \
--description "Retail data for governance codelab" \
$PROJECT_ID:retail_data
# Create a temporary CSV file with the sample data
echo "transaction_id,user_email,gmv,transaction_date
1001,test@example.com,150.50,2025-08-28
1002,user@example.com,75.00,2025-08-28" > /tmp/transactions.csv
# Load the data from the temporary CSV file into BigQuery
bq load \
--source_format=CSV \
--autodetect \
retail_data.transactions \
/tmp/transactions.csv
# (Optional) Clean up the temporary file
rm /tmp/transactions.csv
Verify the setup by running a quick query:
bq query --nouse_legacy_sql "SELECT * FROM retail_data.transactions"
3. Establish a common language with a Business Glossary
Effective governance starts with unambiguous definitions. If a developer sees a column named gmv, they shouldn't have to guess if it includes taxes or returns. A Business Glossary solves this by decoupling the business definition from the technical implementation.
- In the Google Cloud Console, navigate to Dataplex Universal Catalog.
- In the left navigation menu, select Glossaries (under Manage metadata).

- Click Create business glossary.
- Enter the following details:
  - Name: Retail Business Glossary
  - Location: us-central1 (or the location you defined in setup)
- Click Create.

- Click on the newly created Retail Business Glossary to enter it.

- Click Create category and name it Sales Metrics, then click Create. Categories help group related terms.
- Select the Sales Metrics category and click Add term, name it Gross Merchandise Value, then click Create.
- Click the + Add button on Overview, then fill in the following details:
  - Overview: The total value of merchandise sold over a given period of time before the deduction of any fees or expenses. This is a key indicator of e-commerce business growth.
- Click Save.

You have now established a clear definition that can be linked to technical assets across your organization.
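As a rough illustration of that decoupling, the sketch below keeps the definition and the column link in two separate structures. The dictionaries and the `describe_column` helper are hypothetical and do not reflect how Dataplex stores glossaries.

```python
# Hypothetical in-memory glossary: category -> term -> definition.
GLOSSARY = {
    "Sales Metrics": {
        "Gross Merchandise Value": (
            "The total value of merchandise sold over a given period of time "
            "before the deduction of any fees or expenses."
        ),
    },
}

# Links from (table, column) to a term live apart from the definition itself,
# so many columns can share one term and the wording is edited in one place.
COLUMN_TERMS = {
    ("retail_data.transactions", "gmv"): ("Sales Metrics", "Gross Merchandise Value"),
}


def describe_column(table: str, column: str) -> str:
    """Return the business definition linked to a column, if any."""
    link = COLUMN_TERMS.get((table, column))
    if link is None:
        return "no business term linked"
    category, term = link
    return f"{term}: {GLOSSARY[category][term]}"
```

Calling `describe_column("retail_data.transactions", "gmv")` returns the shared definition, while a column with no link reports that no term is attached.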
4. Define structured technical metadata with an Aspect Type
Simple "key:value" tags are insufficient for engineering rigor. If you need to track "Data Owners," you don't want one table tagged owner:bob and another contact:alice@example.com. You need a schema that makes the owner field required and gives it the same name and type on every asset.
We will use an Aspect Type to define this contract.
- In the Dataplex left navigation, under Catalog, select Aspect types & tag templates.
- Select the Custom tab and click Create aspect type.

- Enter the following details:
  - Display name: Data Asset Governance
  - Location: us-central1
- In the Template section, we will define the schema for our Aspect. Click Add a field to create the following three fields:
  - Field 1:
    - Display name: Data Steward
    - Type: Text
    - Text type: Plain text
    - Cardinality: Is required (check the box)
  - Field 2 (click Add a field again):
    - Display name: Data Sensitivity
    - Type: Enum
    - Values: Add Public, Internal, and Confidential
    - Cardinality: Optional
  - Field 3 (click Add a field again):
    - Display name: Last Review Date
    - Type: Date and time
    - Cardinality: Optional
- Click Save.

You have just created a reusable metadata contract. Nothing uses it yet, but the structure exists.
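To make the idea of a contract concrete, here is a minimal Python sketch of the same three fields together with the kind of checks such a template implies. The `SCHEMA` dictionary and the `validate` function are illustrative assumptions, not the format Dataplex uses internally.

```python
from datetime import datetime

# Hypothetical mirror of the "Data Asset Governance" template built above.
SCHEMA = {
    "Data Steward": {"type": "text", "required": True},
    "Data Sensitivity": {
        "type": "enum",
        "required": False,
        "values": ["Public", "Internal", "Confidential"],
    },
    "Last Review Date": {"type": "datetime", "required": False},
}


def validate(values: dict, schema: dict = SCHEMA) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    # Fields the template does not declare are rejected outright.
    for name in values:
        if name not in schema:
            problems.append(f"unknown field: {name}")
    for name, spec in schema.items():
        if name not in values:
            if spec["required"]:
                problems.append(f"missing required field: {name}")
            continue
        value = values[name]
        if spec["type"] == "text" and not isinstance(value, str):
            problems.append(f"{name} must be text")
        elif spec["type"] == "enum" and value not in spec["values"]:
            problems.append(f"{name} must be one of {spec['values']}")
        elif spec["type"] == "datetime" and not isinstance(value, datetime):
            problems.append(f"{name} must be a datetime")
    return problems
```

With this sketch, a free-form tag such as `owner: bob` is reported as an unknown field, which is exactly the drift that ad hoc tags allow.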
5. Connecting governance to the Asset
Now we will bring it all together. We have a BigQuery table (retail_data.transactions), a business definition (Gross Merchandise Value), and a governance schema (Data Asset Governance).
We will enrich the Dataplex Entry for the BigQuery table.
Enrich the schema with business context (column level)
Let's tell users what the gmv column actually means by linking it to the glossary.
- In the Dataplex left navigation, click Search.
- In the top-right corner, click the Dataplex Universal Catalog tab if it is not already active.

- Search for retail_data.transactions. Click the result for the BigQuery table.

- Click the Schema tab within the Entry details.
- Select the checkbox in the gmv column row and click Add business term.
- Select the Gross Merchandise Value term.

The column gmv is no longer just a "FLOAT"; it is now linked to the corporate definition of Gross Merchandise Value.
Enrich the entry with structured technical metadata (table level)
Next, we will attach the Data Asset Governance Aspect to the table to define ownership and sensitivity.
- Stay on the retail_data.transactions Entry page.
- Click the Add tag or aspect tab, then select the Data Asset Governance type from the dropdown.

- The form will now display the fields defined in your Aspect Type schema. Fill them out as follows:
  - Data Steward: finance-team@example.com
  - Data Sensitivity: Select Internal.
  - Last Review Date: Select today's date.
- Click Save.

You have successfully attached a structured Aspect to the Entry. Unlike a simple tag, this data is validated against the schema you created.
6. Unified discovery and verification
We didn't do this work just to fill out forms. We did it to make data discoverable and trustworthy. Let's see how this metadata changes the developer experience for search and discovery.
Return to the main Search page in Dataplex Universal Catalog.
Imagine you are a platform engineer enforcing governance. You need to find all assets marked "Internal" that are governed by your specific Aspect Type. You need to use precise predicates based on your schema.
You can verify this in two ways: using a precise query syntax (essential for automation) or using interactive UI filters.
Method 1: Verify via Structured Query
- In the search bar (in Keyword search mode), enter the following structured query.
aspect:data-asset-governance.data-sensitivity=Internal
- You should see your retail_data.transactions table.
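For automation, a query like the one in Method 1 can be assembled from display names. The sketch below assumes that IDs are the lowercased, hyphenated form of the display name, which is inferred only from the query shown in this lab. Check the actual aspect type and field IDs in the console before relying on it. Both helper names are hypothetical.

```python
import re


def to_id(display_name: str) -> str:
    """Lowercase a display name and join its words with hyphens (assumed convention)."""
    return re.sub(r"[^a-z0-9]+", "-", display_name.lower()).strip("-")


def aspect_query(aspect_type: str, field_name: str, value: str) -> str:
    """Build a search predicate of the form aspect:<type-id>.<field-id>=<value>."""
    return f"aspect:{to_id(aspect_type)}.{to_id(field_name)}={value}"


query = aspect_query("Data Asset Governance", "Data Sensitivity", "Internal")
print(query)
```

Under that assumption, the printed string matches the query typed into the search bar above.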

Method 2: Verify via UI Filter Facets
- Clear the search bar to reset the view.
- Look at the Filter by properties panel on the left side of the screen.
- Scroll down and expand the Data Asset Governance section (this represents the Aspect Type you created).
- Under Data Sensitivity, check the box for Internal.
- The search results will update to show the retail_data.transactions table.

Whether you use the typed query or the UI filters, the underlying mechanism is the same.
This demonstrates the fundamental difference between Dataplex and a simple wiki: your metadata is a queryable structure. You can now build automated audits (e.g., "Find all tables whose Last Review Date is more than one year old") that rely on this predictable structure.
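The review-date audit mentioned above could look roughly like this once the Last Review Date values have been exported from the catalog. The export step is out of scope here. The `stale_entries` function, the one-year default, and the two extra table names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_entries(reviews: dict, now: datetime, max_age_days: int = 365) -> list:
    """Return entry names whose last review is missing or older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        name
        for name, reviewed in reviews.items()
        if reviewed is None or reviewed < cutoff
    )


now = datetime(2025, 8, 28, tzinfo=timezone.utc)
reviews = {
    "retail_data.transactions": datetime(2025, 8, 28, tzinfo=timezone.utc),
    "retail_data.returns": datetime(2024, 1, 15, tzinfo=timezone.utc),  # hypothetical table
    "retail_data.refunds": None,  # hypothetical table that was never reviewed
}
print(stale_entries(reviews, now))
```

Treating a missing date as stale is a design choice: because the field is optional in the Aspect Type, an audit that skipped empty values would overlook assets that were never reviewed.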
7. Cleaning up your environment
To avoid incurring ongoing charges, delete the resources created in this codelab.
Delete the BigQuery Dataset
This command is irreversible and uses the -f (force) flag to remove the dataset and all its tables without confirmation.
# Re-run this export if your Cloud Shell session timed out
export PROJECT_ID=$(gcloud config get-value project)
# Manually type this command to confirm you are deleting the correct dataset
bq rm -r -f --dataset $PROJECT_ID:retail_data
Delete Dataplex artifacts
- Navigate to Dataplex Universal Catalog > Manage metadata > Catalog.
- In Aspect types & tag templates, select the Data Asset Governance aspect type and delete it.
- Navigate to Manage metadata > Glossaries and select the Retail Business Glossary. Delete the Gross Merchandise Value term first, then delete the glossary.
8. Congratulations!
You have moved beyond simple data tagging and established a foundational, structured governance model in Dataplex.
You learned that:
- Glossaries resolve business ambiguity.
- Aspect Types provide the schema contract for technical metadata.
- Aspects apply that schema to actual Data Entries.
- Dataplex Search utilizes this structured metadata for precise discovery.
What's Next?
- Governance as Code: Use the Google Cloud Terraform provider to define your Aspect Types and Glossaries in version control, ensuring consistent schemas across dev/test/prod environments.
- Automated Tagging: Write a Cloud Function or Cloud Build step triggered by new dataset creation that automatically attaches your "Data Asset Governance" Aspect with default values (e.g., sensitivity=Internal, steward=TBD), flagging it for review.
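The default values in that last idea might be produced by a helper such as the one below. It only builds a plain dictionary and calls no Dataplex API. The function name, the `needs_review` flag, and the output shape are assumptions for illustration.

```python
def default_governance_aspect(entry_name: str, **overrides) -> dict:
    """Build placeholder governance values for a newly discovered entry."""
    data = {"Data Steward": "TBD", "Data Sensitivity": "Internal"}
    data.update(overrides.pop("data", {}))
    record = {
        "entry": entry_name,
        "aspect_type": "Data Asset Governance",
        "data": data,
        # A placeholder steward means a human still has to confirm ownership.
        "needs_review": data["Data Steward"] == "TBD",
    }
    record.update(overrides)
    return record


print(default_governance_aspect("retail_data.transactions"))
```

Passing `data={"Data Steward": "finance-team@example.com"}` replaces the placeholder and clears the review flag.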