Agentverse - The Scholar's Grimoire - Building Knowledge Engines with RAG

1. Overture

The era of siloed development is ending. The next wave of technological evolution is not about solitary genius, but about collaborative mastery. Building a single, clever agent is a fascinating experiment. Building a robust, secure, and intelligent ecosystem of agents—a true Agentverse—is the grand challenge for the modern enterprise.

Success in this new era requires the convergence of four critical roles, the foundational pillars that support any thriving agentic system. A deficiency in any one area creates a weakness that can compromise the entire structure.

This workshop is the definitive enterprise playbook for mastering the agentic future on Google Cloud. We provide an end-to-end roadmap that guides you from the first vibe of an idea to a full-scale, operational reality. Across these four interconnected labs, you will learn how the specialized skills of a developer, architect, data engineer, and SRE must converge to create, manage, and scale a powerful Agentverse.

No single pillar can support the Agentverse alone. The Architect's grand design is useless without the Developer's precise execution. The Developer's agent is blind without the Data Engineer's wisdom, and the entire system is fragile without the SRE's protection. Only through synergy and a shared understanding of each other's roles can your team transform an innovative concept into a mission-critical, operational reality. Your journey begins here. Prepare to master your role and learn how you fit into the greater whole.

Welcome to The Agentverse: A Call to Champions

In the sprawling digital expanse of the enterprise, a new era has dawned. It is the agentic age, a time of immense promise, where intelligent, autonomous agents work in perfect harmony to accelerate innovation and sweep away the mundane.

agentverse.png

This connected ecosystem of power and potential is known as The Agentverse.

But a creeping entropy, a silent corruption known as The Static, has begun to fray the edges of this new world. The Static is not a virus or a bug; it is the embodiment of chaos that preys on the very act of creation.

It amplifies old frustrations into monstrous forms, giving birth to the Seven Spectres of Development. If left unchecked, The Static and its Spectres will grind progress to a halt, turning the promise of the Agentverse into a wasteland of technical debt and abandoned projects.

Today, we issue a call for champions to push back the tide of chaos. We need heroes willing to master their craft and work together to protect the Agentverse. The time has come to choose your path.

Choose Your Class

Four distinct paths lie before you, each a critical pillar in the fight against The Static. Though your training will be a solo mission, your ultimate success depends on understanding how your skills combine with others.

  • The Shadowblade (Developer): A master of the forge and the front line. You are the artisan who crafts the blades, builds the tools, and faces the enemy in the intricate details of the code. Your path is one of precision, skill, and practical creation.
  • The Summoner (Architect): A grand strategist and orchestrator. You do not see a single agent, but the entire battlefield. You design the master blueprints that allow entire systems of agents to communicate, collaborate, and achieve a goal far greater than any single component.
  • The Scholar (Data Engineer): A seeker of hidden truths and the keeper of wisdom. You venture into the vast, untamed wilderness of data to uncover the intelligence that gives your agents purpose and sight. Your knowledge can reveal an enemy's weakness or empower an ally.
  • The Guardian (DevOps / SRE): The steadfast protector and shield of the realm. You build the fortresses, manage the supply lines of power, and ensure the entire system can withstand the inevitable attacks of The Static. Your strength is the foundation upon which your team's victory is built.

Your Mission

Your training will begin as a standalone exercise. You will walk your chosen path, learning the unique skills required to master your role. At the end of your trial, you will face a Spectre born of The Static—a mini-boss that preys on the specific challenges of your craft.

Only by mastering your individual role can you prepare for the final trial. You must then form a party with champions from the other classes. Together, you will venture into the heart of the corruption to face an ultimate boss.

A final, collaborative challenge that will test your combined strength and determine the fate of the Agentverse.

The Agentverse awaits its heroes. Will you answer the call?

2. The Scholar's Grimoire

Our journey begins! As Scholars, our primary weapon is knowledge. We've discovered a trove of ancient, cryptic scrolls in our archives (Google Cloud Storage). These scrolls contain raw intelligence on the fearsome beasts plaguing the land. Our mission is to use the profound analytical magic of Google BigQuery and the wisdom of a Gemini Elder Brain (a Gemini model) to decipher these unstructured texts and forge them into a structured, queryable Bestiary. This will be the foundation of all our future strategies.

Overview

What you'll learn

  • Use BigQuery to create external tables and perform complex unstructured-to-structured transformations using BigQuery ML's ML.GENERATE_TEXT function with a Gemini model.
  • Provision a Cloud SQL for PostgreSQL instance and enable the pgvector extension for semantic search capabilities.
  • Build a robust, containerized batch pipeline using Dataflow and Apache Beam to process raw text files, generate vector embeddings with a Gemini model, and write the results to a relational database.
  • Implement a basic Retrieval-Augmented Generation (RAG) system within an agent to query the vectorized data.
  • Deploy a data-aware agent as a secure, scalable service on Cloud Run.

3. Preparing the Scholar's Sanctum

Welcome, Scholar. Before we can begin inscribing the powerful knowledge of our Grimoire, we must first prepare our sanctum. This foundational ritual involves enchanting our Google Cloud environment, opening the right portals (APIs), and creating the conduits through which our data magic will flow. A well-prepared sanctum ensures our spells are potent and our knowledge is secure.

👉Click Activate Cloud Shell at the top of the Google Cloud console (it's the terminal-shaped icon at the top of the Cloud Shell pane).

👉Click on the "Open Editor" button (it looks like an open folder with a pencil). This will open the Cloud Shell Code Editor in the window. You'll see a file explorer on the left side.

👉Find your Google Cloud Project ID:

  • Open the Google Cloud Console: https://console.cloud.google.com
  • Select the project you want to use for this workshop from the project dropdown at the top of the page.
  • Your Project ID is displayed in the Project info card on the Dashboard.

👉Open the terminal in the Cloud Shell IDE.

👉💻In the terminal, verify that you're already authenticated and that the project is set to your project ID using the following command:

gcloud auth list

👉💻Clone the bootstrap project from GitHub:

git clone https://github.com/weimeilin79/agentverse-dataengineer
chmod +x ~/agentverse-dataengineer/init.sh
chmod +x ~/agentverse-dataengineer/set_env.sh
chmod +x ~/agentverse-dataengineer/data_setup.sh

git clone https://github.com/weimeilin79/agentverse-dungeon.git
chmod +x ~/agentverse-dungeon/run_cloudbuild.sh
chmod +x ~/agentverse-dungeon/start.sh

👉💻 Run the initialization script. When the init.sh script prompts you, enter the Google Cloud Project ID you found in the previous step.

cd ~/agentverse-dataengineer
./init.sh

👉💻 Set the Project ID needed:

gcloud config set project $(cat ~/project_id.txt) --quiet

👉💻 Run the following command to enable the necessary Google Cloud APIs:

gcloud services enable \
    storage.googleapis.com \
    bigquery.googleapis.com \
    sqladmin.googleapis.com \
    aiplatform.googleapis.com \
    dataflow.googleapis.com \
    pubsub.googleapis.com \
    cloudfunctions.googleapis.com \
    run.googleapis.com \
    cloudbuild.googleapis.com \
    artifactregistry.googleapis.com \
    iam.googleapis.com \
    compute.googleapis.com \
    cloudresourcemanager.googleapis.com \
    cloudaicompanion.googleapis.com \
    bigqueryunified.googleapis.com 

👉💻 If you have not already created an Artifact Registry repository named agentverse-repo, run the following command to create it:

. ~/agentverse-dataengineer/set_env.sh
gcloud artifacts repositories create $REPO_NAME \
    --repository-format=docker \
    --location=$REGION \
    --description="Repository for Agentverse agents"

Setting up permissions

👉💻 Grant the necessary permissions by running the following commands in the terminal:

. ~/agentverse-dataengineer/set_env.sh

# --- Grant Core Data Permissions ---
gcloud projects add-iam-policy-binding $PROJECT_ID \
 --member="serviceAccount:$SERVICE_ACCOUNT_NAME" \
 --role="roles/storage.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID \
 --member="serviceAccount:$SERVICE_ACCOUNT_NAME" \
 --role="roles/bigquery.admin"

# --- Grant Data Processing & AI Permissions ---
gcloud projects add-iam-policy-binding $PROJECT_ID  \
--member="serviceAccount:$SERVICE_ACCOUNT_NAME"  \
--role="roles/dataflow.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID  \
--member="serviceAccount:$SERVICE_ACCOUNT_NAME"  \
--role="roles/cloudsql.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID  \
--member="serviceAccount:$SERVICE_ACCOUNT_NAME"  \
--role="roles/pubsub.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID  \
--member="serviceAccount:$SERVICE_ACCOUNT_NAME"  \
--role="roles/aiplatform.user"

# --- Grant Deployment & Execution Permissions ---
gcloud projects add-iam-policy-binding $PROJECT_ID  \
--member="serviceAccount:$SERVICE_ACCOUNT_NAME"  \
--role="roles/cloudbuild.builds.editor"

gcloud projects add-iam-policy-binding $PROJECT_ID  \
--member="serviceAccount:$SERVICE_ACCOUNT_NAME"  \
--role="roles/artifactregistry.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID  \
--member="serviceAccount:$SERVICE_ACCOUNT_NAME"  \
--role="roles/run.admin"

gcloud projects add-iam-policy-binding $PROJECT_ID  \
--member="serviceAccount:$SERVICE_ACCOUNT_NAME"  \
--role="roles/iam.serviceAccountUser"

gcloud projects add-iam-policy-binding $PROJECT_ID  \
--member="serviceAccount:$SERVICE_ACCOUNT_NAME"  \
--role="roles/logging.logWriter"


👉💻 As you begin your training, we will prepare the final challenge. The following commands will summon the Spectres from the chaotic static, creating the bosses for your final test.

. ~/agentverse-dataengineer/set_env.sh
cd ~/agentverse-dungeon
./run_cloudbuild.sh
cd ~/agentverse-dataengineer

Excellent work, Scholar. The foundational enchantments are complete. Our sanctum is secure, the portals to the elemental forces of data are open, and our servitor is empowered. We are now ready to begin the real work.

4. The Alchemy of Knowledge: Transforming Data with BigQuery & Gemini

In the ceaseless war against The Static, every confrontation between a Champion of the Agentverse and a Spectre of Development is meticulously recorded. The Battleground Simulation system, our primary training environment, automatically generates an Aetheric Log Entry for each encounter. These narrative logs are our most valuable source of raw intelligence, the unrefined ore from which we, as Scholars, must forge the pristine steel of strategy. The true power of a Scholar lies not in merely possessing data, but in the ability to transmute the raw, chaotic ore of information into the gleaming, structured steel of actionable wisdom. We will perform the foundational ritual of data alchemy.

Story

Our journey will take us through a multi-stage process entirely within the sanctum of Google BigQuery. We will begin by gazing into our GCS archive without moving a single scroll, using a magical lens. Then, we will summon a Gemini to read and interpret the poetic, unstructured sagas of the battle logs. Finally, we will refine the raw prophecies into a set of pristine, interconnected tables: our first Grimoire. And we will ask of it a question so profound it could only be answered by this newfound structure.

Overview

The Lens of Scrutiny: Peering into GCS with BigQuery External Tables

Our first act is to forge a lens that allows us to see the contents of our GCS archive without disturbing the scrolls within. An External Table is this lens, mapping the raw text files to a table-like structure that BigQuery can query directly.

To do this, we must first create a stable ley line of power, a CONNECTION resource, that securely links our BigQuery sanctum to the GCS archive.

👉💻 In your Cloud Shell terminal, run the following command to set up the storage and forge the conduit:

. ~/agentverse-dataengineer/set_env.sh
. ~/agentverse-dataengineer/data_setup.sh

bq mk --connection \
  --connection_type=CLOUD_RESOURCE \
  --project_id=${PROJECT_ID} \
  --location=${REGION} \
  gcs-connection

💡 Heads Up! A Message Will Appear Later!

The setup script from step 2 started a process in the background. After a few minutes, a message will pop up in your terminal that looks similar to this: [1]+ Done gcloud sql instances create ... This is normal and expected. It simply means your Cloud SQL database has been successfully created. You can safely ignore this message and continue working.

Before you can create the External Table, you must first create the dataset that will contain it.

👉💻 Run this one simple command in your Cloud Shell terminal:

. ~/agentverse-dataengineer/set_env.sh
bq --location=${REGION} mk --dataset ${PROJECT_ID}:bestiary_data

👉💻 Now we must grant the conduit's magical signature the necessary permissions to read from the GCS Archive and consult Gemini.

. ~/agentverse-dataengineer/set_env.sh
export CONNECTION_SA=$(bq show --connection --project_id=${PROJECT_ID} --location=${REGION} --format=json gcs-connection  | jq -r '.cloudResource.serviceAccountId')

echo "The Conduit's Magical Signature is: $CONNECTION_SA"

echo "Granting key to the GCS Archive..."
gcloud storage buckets add-iam-policy-binding gs://${PROJECT_ID}-reports \
  --member="serviceAccount:$CONNECTION_SA" \
  --role="roles/storage.objectViewer"

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:$CONNECTION_SA" \
  --role="roles/aiplatform.user"

👉💻 In your Cloud Shell terminal, run the following command to display your bucket name:

echo $BUCKET_NAME

Your terminal will display a name similar to your-project-id-gcs-bucket. You will need this in the next steps.

👉 You'll need to run the next command from within the BigQuery query editor in the Google Cloud Console. The easiest way to get there is to open the link below in a new browser tab. It will take you directly to the correct page in the Google Cloud Console.

https://console.cloud.google.com/bigquery

👉 Once the page loads, click the blue + button (Compose a new query) to open a fresh editor tab.

BigQuery query editor

Now we write the Data Definition Language (DDL) incantation to create our magical lens. This tells BigQuery where to look and what to see.

👉📜 In the BigQuery query editor you opened, paste the following SQL. Remember to replace REPLACE-WITH-YOUR-BUCKET-NAME with the bucket name you just copied, then click Run:

CREATE OR REPLACE EXTERNAL TABLE bestiary_data.raw_intel_content_table (
  raw_text STRING
)
OPTIONS (
  format = 'CSV',
  -- This is a trick to load each line of the text files as a single row.
  field_delimiter = '§', 
  uris = ['gs://REPLACE-WITH-YOUR-BUCKET-NAME/raw_intel/*']
);

👉📜 Run a query to "look through the lens" and see the content of the files.

SELECT * FROM bestiary_data.raw_intel_content_table;

Raw Intel Content

Our lens is in place. We can now see the raw text of the scrolls. But reading is not understanding.

In the Archives of Forgotten Ideas, a Scholar of the Agentverse, Elara (designation adv_001), confronted the angelic Spectre of Perfectionism. The entity, catalogued as ‘p-01', shimmered with a life-force of 120 Hit Points. With a single, focused incantation of Elegant Sufficiency, Elara shattered its paralyzing aura, a mental assault dealing 150 points of damage. The encounter lasted 180 seconds of intense concentration. Final Assessment: Victory.

The scrolls are not written in tables and rows, but in the winding prose of sagas. This is our first great test.

The Scholar's Divination: Turning Text into a Table with SQL

The challenge is that a report detailing the rapid, twin attacks of a Shadowblade reads very differently from the chronicle of a Summoner gathering immense power for a single, devastating blast. We cannot simply import this data; we must interpret it. This is the moment of magic. We will use a single SQL query as a powerful incantation to read, understand, and structure all the records from all our files, right inside BigQuery.

👉💻 Back in your Cloud Shell terminal, run the following command to display your connection name:

echo "${PROJECT_ID}.${REGION}.gcs-connection"

Your terminal will display the complete connection string. Select and copy this entire string; you will need it in the next step.

We will use a single, powerful incantation: ML.GENERATE_TEXT. This spell summons a Gemini, shows it each scroll, and commands it to return the core facts as a structured JSON object.

👉📜 In BigQuery Studio, create the Gemini Model Reference. This binds the Gemini Flash oracle to our BigQuery library so we can call it in our queries. Remember to replace REPLACE-WITH-YOUR-FULL-CONNECTION-STRING with the full connection string you just copied from your terminal.

  CREATE OR REPLACE MODEL bestiary_data.gemini_flash_model
  REMOTE WITH CONNECTION `REPLACE-WITH-YOUR-FULL-CONNECTION-STRING`
  OPTIONS (endpoint = 'gemini-2.5-flash');

👉📜 Now, cast the grand transmutation spell. This query reads the raw text, constructs a detailed prompt for each scroll, sends it to Gemini, and builds a new staging table from the AI's structured JSON response.

CREATE OR REPLACE TABLE bestiary_data.structured_bestiary AS
SELECT
  -- THE CRITICAL CHANGE: We remove PARSE_JSON. The result is already a JSON object.
  ml_generate_text_result AS structured_data
FROM
  ML.GENERATE_TEXT(
    -- Our bound Gemini Flash model.
    MODEL bestiary_data.gemini_flash_model,

    -- Our perfectly constructed input, with the prompt built for each row.
    (
      SELECT
        CONCAT(
          """
          From the following text, extract structured data into a single, valid JSON object.

          Your output must strictly conform to the following JSON structure and data types. Do not add, remove, or change any keys.

          {
            "monster": {
              "monster_id": "string",
              "name": "string",
              "type": "string",
              "hit_points": "integer"
            },
            "battle": {
              "battle_id": "string",
              "monster_id": "string",
              "adventurer_id": "string",
              "outcome": "string",
              "duration_seconds": "integer"
            },
            "adventurer": {
              "adventurer_id": "string",
              "name": "string",
              "class": "string"
            }
          }

          **CRUCIAL RULES:**
          - Do not output any text, explanations, conversational filler, or markdown formatting like ` ```json` before or after the JSON object.
          - Your entire response must be ONLY the raw JSON object itself.

          Here is the text:
          """,
          raw_text -- We append the actual text of the report here.
        ) AS prompt -- The final column is still named 'prompt', as the oracle requires.
      FROM
        bestiary_data.raw_intel_content_table
    ),

    -- The STRUCT now ONLY contains model parameters.
    STRUCT(
      0.2 AS temperature,
      2048 AS max_output_tokens
    )
  );

The transmutation is complete, but the result is not yet pure. The Gemini model returns its answer in a standard format, wrapping our desired JSON inside a larger structure that includes metadata about its thought process. Let us look upon this raw prophecy before we attempt to purify it.

👉📜 Run a query to inspect the raw output from the Gemini model:

SELECT * FROM bestiary_data.structured_bestiary;

👀 You will see a single column named structured_data. The content for each row will look similar to this complex JSON object:

{"candidates":[{"avg_logprobs":-0.5691758094475283,"content":{"parts":[{"text":"```json\n{\n  \"monster\": {\n    \"monster_id\": \"gw_02\",\n    \"name\": \"Gravewight\",\n    \"type\": \"Gravewight\",\n    \"hit_points\": 120\n  },\n  \"battle\": {\n    \"battle_id\": \"br_735\",\n    \"monster_id\": \"gw_02\",\n    \"adventurer_id\": \"adv_001\",\n    \"outcome\": \"Defeat\",\n    \"duration_seconds\": 45\n  },\n  \"adventurer\": {\n    \"adventurer_id\": \"adv_001\",\n    \"name\": \"Elara\",\n    \"class\": null\n  }\n}\n```"}],"role":"model"},"finish_reason":"STOP","score":-97.32906341552734}],"create_time":"2025-07-28T15:53:24.482775Z","model_version":"gemini-2.5-flash","response_id":"9JyHaNe7HZ2WhMIPxqbxEQ","usage_metadata":{"billable_prompt_usage":{"text_count":640},"candidates_token_count":171,"candidates_tokens_details":[{"modality":"TEXT","token_count":171}],"prompt_token_count":207,"prompt_tokens_details":[{"modality":"TEXT","token_count":207}],"thoughts_token_count":1014,"total_token_count":1392,"traffic_type":"ON_DEMAND"}}

As you can see, our prize—the clean JSON object we requested—is nested deep within this structure. Our next task is clear. We must perform a ritual to systematically navigate this structure and extract the pure wisdom within.

The Ritual of Cleansing: Normalizing GenAI Output with SQL

The Gemini has spoken, but its words are raw and wrapped in the ethereal energies of its creation (candidates, finish_reason, etc.). A true Scholar does not simply shelve the raw prophecy; they carefully extract the core wisdom and scribe it into the appropriate tomes for future use.

We will now cast our final set of spells. This single script will:

  • Read the raw, nested JSON from our staging table.
  • Cleanse and parse it to get to the core data.
  • Scribe the relevant pieces into three final, pristine tables: monsters, adventurers, and battles.

👉📜 In a new BigQuery query editor, run the following spell to create our cleansing lens:

CREATE OR REPLACE TABLE bestiary_data.monsters AS
WITH
  CleanedDivinations AS (
    SELECT
      SAFE.PARSE_JSON(
        REGEXP_EXTRACT(
          JSON_VALUE(structured_data, '$.candidates[0].content.parts[0].text'),
          r'\{[\s\S]*\}'
        )
      ) AS report_data
    FROM
      bestiary_data.structured_bestiary
  )
SELECT
  JSON_VALUE(report_data, '$.monster.monster_id') AS monster_id,
  JSON_VALUE(report_data, '$.monster.name') AS name,
  JSON_VALUE(report_data, '$.monster.type') AS type,
  SAFE_CAST(JSON_VALUE(report_data, '$.monster.hit_points') AS INT64) AS hit_points
FROM
  CleanedDivinations
WHERE
  report_data IS NOT NULL
QUALIFY ROW_NUMBER() OVER (PARTITION BY monster_id ORDER BY name) = 1;

👉📜 Verify the Bestiary:

SELECT * FROM bestiary_data.monsters;

Next, we will create our Roll of Champions, a list of the brave adventurers who have faced these beasts.

👉📜 In a new query editor, run the following spell to create the adventurers table:

CREATE OR REPLACE TABLE bestiary_data.adventurers AS
WITH
  CleanedDivinations AS (
    SELECT
      SAFE.PARSE_JSON(
        REGEXP_EXTRACT(
          JSON_VALUE(structured_data, '$.candidates[0].content.parts[0].text'),
          r'\{[\s\S]*\}'
        )
      ) AS report_data
    FROM
      bestiary_data.structured_bestiary
  )
SELECT
  JSON_VALUE(report_data, '$.adventurer.adventurer_id') AS adventurer_id,
  JSON_VALUE(report_data, '$.adventurer.name') AS name,
  JSON_VALUE(report_data, '$.adventurer.class') AS class
FROM
  CleanedDivinations
QUALIFY ROW_NUMBER() OVER (PARTITION BY adventurer_id ORDER BY name) = 1;

👉📜 Verify the Roll of Champions:

SELECT * FROM bestiary_data.adventurers;

Finally, we will create our fact table: the Chronicle of Battles. This tome links the other two, recording the details of each unique encounter. Since every battle is a unique event, no deduplication is needed.

👉📜 In a new query editor, run the following spell to create the battles table:

CREATE OR REPLACE TABLE bestiary_data.battles AS
WITH
  CleanedDivinations AS (
    SELECT
      SAFE.PARSE_JSON(
        REGEXP_EXTRACT(
          JSON_VALUE(structured_data, '$.candidates[0].content.parts[0].text'),
          r'\{[\s\S]*\}'
        )
      ) AS report_data
    FROM
      bestiary_data.structured_bestiary
  )
-- Extract the raw essence for all battle fields and cast where necessary.
SELECT
  JSON_VALUE(report_data, '$.battle.battle_id') AS battle_id,
  JSON_VALUE(report_data, '$.battle.monster_id') AS monster_id,
  JSON_VALUE(report_data, '$.battle.adventurer_id') AS adventurer_id,
  JSON_VALUE(report_data, '$.battle.outcome') AS outcome,
  SAFE_CAST(JSON_VALUE(report_data, '$.battle.duration_seconds') AS INT64) AS duration_seconds
FROM
  CleanedDivinations;

👉📜 Verify the Chronicle:

SELECT * FROM bestiary_data.battles;

Uncovering Strategic Insights

The scrolls have been read, the essence distilled, and the tomes inscribed. Our Grimoire is no longer just a collection of facts; it is a relational database of profound strategic wisdom. We can now ask questions that were impossible to answer when our knowledge was trapped in raw, unstructured text.

Let us now perform a final, grand divination. We will cast a spell that consults all three of our tomes at once—the Bestiary of Monsters, the Roll of Champions, and the Chronicle of Battles—to uncover a deep, actionable insight.

Our strategic question: "For each adventurer, what is the name of the most powerful monster (by hit points) they have successfully defeated, and how long did that specific victory take?"

This is a complex question that requires linking champions to their victorious battles, and those battles to the stats of the monsters involved. This is the true power of a structured data model.

👉📜 In a new BigQuery query editor, cast the following final incantation:

-- This is our final spell, joining all three tomes to reveal a deep insight.
WITH
  -- First, we consult the Chronicle of Battles to find only the victories.
  VictoriousBattles AS (
    SELECT
      adventurer_id,
      monster_id,
      duration_seconds
    FROM
      bestiary_data.battles
    WHERE
      outcome = 'Victory'
  ),
  -- Next, we create a temporary record for each victory, ranking the monsters
  -- each adventurer defeated by their power (hit points).
  RankedVictories AS (
    SELECT
      v.adventurer_id,
      m.name AS monster_name,
      m.hit_points,
      v.duration_seconds,
      -- This spell ranks each adventurer's victories from most to least powerful monster.
      ROW_NUMBER() OVER (PARTITION BY v.adventurer_id ORDER BY m.hit_points DESC) as victory_rank
    FROM
      VictoriousBattles v
    JOIN
      bestiary_data.monsters m ON v.monster_id = m.monster_id
  )
-- Finally, we consult the Roll of Champions and join it with our ranked victories
-- to find the name of each champion and the details of their greatest triumph.
SELECT
  a.name AS adventurer_name,
  a.class AS adventurer_class,
  r.monster_name AS most_powerful_foe_defeated,
  r.hit_points AS foe_hit_points,
  r.duration_seconds AS duration_of_greatest_victory
FROM
  bestiary_data.adventurers a
JOIN
  RankedVictories r ON a.adventurer_id = r.adventurer_id
WHERE
  -- We only want to see their number one, top-ranked victory.
  r.victory_rank = 1
ORDER BY
  foe_hit_points DESC;

The output of this query will be a clean, beautiful table that provides a "Tale of a Champion's Greatest Feat" for every adventurer in your dataset. It might look something like this:

04-03-final-result.png

Close the BigQuery tab.

This single, elegant result proves the value of the entire pipeline. You have successfully transformed raw, chaotic battlefield reports into a source of legendary tales and strategic, data-driven insights.


5. The Scribe's Grimoire: In-Datawarehouse Chunking, Embedding, and Search

Our work in the Alchemist's lab was a success. We have transmuted the raw, narrative scrolls into structured, relational tables—a powerful feat of data magic. However, the original scrolls themselves still hold a deeper, semantic truth that our structured tables cannot fully capture. To build a truly wise agent, we must unlock this meaning.

Overview

A raw, lengthy scroll is a blunt instrument. If our agent asks a question about a "paralyzing aura," a simple search might return an entire battle report where that phrase is mentioned only once, burying the answer in irrelevant details. A master Scholar knows that true wisdom is found not in volume, but in precision.

We will perform a trio of powerful, in-database rituals entirely within our BigQuery sanctum.

  • The Ritual of Division (Chunking): We will take our raw intelligence logs and meticulously break them down into smaller, focused, self-contained passages.
  • The Ritual of Distillation (Embedding): We will use BQML to consult a Gemini model, transforming each text chunk into a "semantic fingerprint"—a vector embedding.
  • The Ritual of Divination (Searching): We will use BQML's vector search to ask a question in plain English and find the most relevant, distilled wisdom from our Grimoire.

This entire process creates a powerful, searchable knowledge base without the data ever leaving the security and scale of BigQuery.

The Ritual of Division: Deconstructing Scrolls with SQL

Our source of wisdom remains the raw text files in our GCS archive, accessible via our external table, bestiary_data.raw_intel_content_table. Our first task is to write a spell that reads each long scroll and splits it into a series of smaller, more digestible verses. For this ritual, we will define a "chunk" as a single sentence.

While splitting by sentence is a clear and effective starting point for our narrative logs, a master Scribe has many chunking strategies at their disposal, and the choice is critical to the quality of the final search:

  • Fixed-length (size-based) Chunking: the simplest method, but it can crudely slice a key idea in half.
  • Recursive Chunking: often preferred in practice, it attempts to divide text along natural boundaries like paragraphs first, then falls back to sentences to preserve as much semantic context as possible (a short sketch appears just after this list).
  • Content-Aware Chunking: for truly complex manuscripts, the Scribe uses the inherent structure of the document, such as the headers in a technical manual or the functions in a scroll of code, to create the most logical and potent chunks of wisdom.

For our battle logs, the sentence provides the perfect balance of granularity and context.
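
To make the idea of Recursive Chunking concrete, here is a minimal, self-contained sketch (an illustration only, not part of this lab's pipeline; the function name and the max_chars limit are hypothetical). It splits on paragraph breaks first, then sentences, and only falls back to a hard character cut:

# A minimal sketch of recursive chunking (illustration only, not used by this lab).
# It splits on the coarsest natural boundary first (paragraphs), then sentences,
# and only resorts to a hard character cut when no boundary is available.
def recursive_chunk(text: str, max_chars: int = 400) -> list[str]:
    if len(text) <= max_chars:
        return [text.strip()] if text.strip() else []

    # Try separators from coarsest to finest.
    for separator in ["\n\n", ". "]:
        parts = [p for p in text.split(separator) if p.strip()]
        if len(parts) > 1:
            chunks = []
            for part in parts:
                chunks.extend(recursive_chunk(part, max_chars))
            return chunks

    # Last resort: hard split at the character limit.
    return [text[:max_chars].strip()] + recursive_chunk(text[max_chars:], max_chars)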

👉📜 In a new BigQuery query editor, run the following incantation. This spell uses the SPLIT function to break each scroll's text apart at every period (.) and then unnests the resulting array of sentences into separate rows.

CREATE OR REPLACE TABLE bestiary_data.chunked_intel AS
WITH
  -- First, add a unique row number to each scroll to act as a document ID.
  NumberedScrolls AS (
    SELECT
      ROW_NUMBER() OVER () AS scroll_id,
      raw_text
    FROM
      bestiary_data.raw_intel_content_table
  )
-- Now, process each numbered scroll.
SELECT
  scroll_id,
  -- Assign a unique ID to each chunk within a scroll for precise reference.
  CONCAT(CAST(scroll_id AS STRING), '-', CAST(ROW_NUMBER() OVER (PARTITION BY scroll_id) AS STRING)) as chunk_id,
  -- Trim whitespace from the chunk for cleanliness.
  TRIM(chunk) AS chunk_text
FROM
  NumberedScrolls,
  -- This is the core of the spell: UNNEST splits the array of sentences into rows.
  UNNEST(SPLIT(raw_text, '.')) AS chunk
-- A final refinement: we only keep chunks that have meaningful content.
WHERE
  -- This ensures we don't have empty rows from double periods, etc.
  LENGTH(TRIM(chunk)) > 15;

👉 Now, run a query to inspect your newly scribed, chunked knowledge and see the difference.

SELECT * FROM bestiary_data.chunked_intel ORDER BY scroll_id, chunk_id;

08-01-chunking.png

Observe the results. Where there was once a single, dense block of text, there are now multiple rows, each tied to the original scroll (scroll_id) but containing only a single, focused sentence. Each row is now a perfect candidate for vectorization.

The Ritual of Distillation: Transforming Text to Vectors with BQML

👉💻 First, go back to your terminal, run the following command to display your connection name:

. ~/agentverse-dataengineer/set_env.sh
echo "${PROJECT_ID}.${REGION}.gcs-connection"

👉📜 We must create a new BigQuery model that points to a Gemini text embedding endpoint. In BigQuery Studio, run the following spell. Note that you need to replace REPLACE-WITH-YOUR-FULL-CONNECTION-STRING with the full connection string you just copied from your terminal.

CREATE OR REPLACE MODEL bestiary_data.text_embedding_model
  REMOTE WITH CONNECTION `REPLACE-WITH-YOUR-FULL-CONNECTION-STRING`
  OPTIONS (endpoint = 'text-embedding-005');

👉📜 Now, cast the grand distillation spell. This query calls the ML.GENERATE_EMBEDDING function, which will read every row from our chunked_intel table, send the text to the Gemini embedding model, and store the resulting vector fingerprint in a new table.

CREATE OR REPLACE TABLE bestiary_data.embedded_intel AS
SELECT
  *
FROM
  ML.GENERATE_EMBEDDING(
    -- The embedding model we just created.
    MODEL bestiary_data.text_embedding_model,
    -- A subquery that selects our data and renames the text column to 'content'.
    (
      SELECT
        scroll_id,
        chunk_id,
        chunk_text AS content -- Renaming our text column is the key correction.
      FROM
        bestiary_data.chunked_intel
    ),
    -- The configuration struct is now simpler and correct.
    STRUCT(
      -- This task_type is crucial. It optimizes the vectors for retrieval.
      'RETRIEVAL_DOCUMENT' AS task_type
    )
  );

This process may take a minute or two as BigQuery processes all the text chunks.

08-02-embedding.png

👉📜 Once complete, inspect the new table to see the semantic fingerprints.

SELECT
  chunk_id,
  content,
  ml_generate_embedding_result
FROM
  bestiary_data.embedded_intel
LIMIT 20;

You will now see a new column, ml_generate_embedding_result, containing the dense vector representation of your text. Our Grimoire is now semantically encoded.

The Ritual of Divination: Semantic Search with BQML

👉📜 The ultimate test of our Grimoire is to ask it a question. We will now perform our final ritual: a vector search. This is not a keyword search; it is a search for meaning. We will ask a question in natural language, BQML will convert our question into an embedding on the fly, and then search our entire table of embedded_intel to find the text chunks whose fingerprints are the "closest" in meaning.

SELECT
  -- The content column contains our original, relevant text chunk.
  base.content,
  -- The distance metric shows how close the match is (lower is better).
  distance
FROM
  VECTOR_SEARCH(
    -- The table containing the knowledge base with its embeddings.
    TABLE bestiary_data.embedded_intel,
    -- The column that contains the vector embeddings.
    'ml_generate_embedding_result',
    (
      -- This subquery generates an embedding for our question in real-time.
      SELECT ml_generate_embedding_result
      FROM ML.GENERATE_EMBEDDING(
          MODEL bestiary_data.text_embedding_model,
          (SELECT 'What are the tactics against a foe that causes paralysis?' AS content),
          STRUCT('RETRIEVAL_QUERY' AS task_type)
        )
    ),
    -- Specify how many of the closest results we want to see.
    top_k => 3,
    -- The distance metric used to find the "closest" vectors.
    distance_type => 'COSINE'
  );

Analysis of the Spell:

  • VECTOR_SEARCH: The core function that orchestrates the search.
  • ML.GENERATE_EMBEDDING (inner query): This is the magic. We embed our query ('What are the tactics...') using the same model but with the task type 'RETRIEVAL_QUERY', which is specifically optimized for queries.
  • top_k => 3: We are asking for the top 3 most relevant results.
  • distance_type => 'COSINE': This measures the "angle" between vectors. A smaller angle means the meanings are more aligned.
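
To make the distance metric concrete, here is a tiny, self-contained illustration (not part of the lab; the toy three-dimensional vectors are invented for demonstration) of how cosine distance behaves: vectors pointing in similar directions score close to 0, while unrelated ones score closer to 1.

import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # COSINE distance = 1 - cosine similarity; 0 means identical direction.
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" (real ones from the embedding model have 768 dimensions).
paralysis_tactic = np.array([0.9, 0.1, 0.2])
similar_meaning = np.array([0.8, 0.2, 0.3])
unrelated_topic = np.array([0.1, 0.9, 0.1])

print(cosine_distance(paralysis_tactic, similar_meaning))  # small distance: close in meaning
print(cosine_distance(paralysis_tactic, unrelated_topic))  # larger distance: far in meaning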

08-03-final-result.png

Look closely at the results. The query did not contain the word "shattered" or "incantation," yet the top result is: "With a single, focused incantation of Elegant Sufficiency, Elara shattered its paralyzing aura, a mental assault dealing 150 points of damage". This is the power of semantic search. The model understood the concept of "tactics against paralysis" and found the sentence that described a specific, successful tactic.

You have now successfully built a complete, in-data-warehouse RAG pipeline. You have prepared raw data, transformed it into semantic vectors, and queried it by meaning. While BigQuery is a powerful tool for this large-scale analytical work, for a live agent needing low-latency responses we often transfer this prepared wisdom to a specialized operational database. That is the subject of our next training.


6. The Vector Scriptorium: Crafting the Vector Store with Cloud SQL for Inferencing

Our Grimoire currently exists as structured tables—a powerful catalog of facts, but its knowledge is literal. It understands monster_id = 'MN-001' but not the deeper, semantic meaning behind "Obfuscation." To give our agents true wisdom, to let them advise with nuance and foresight, we must distill the very essence of our knowledge into a form that captures meaning: vectors.

Our quest for knowledge has led us to the crumbling ruins of a long-forgotten precursor civilization. Buried deep within a sealed vault, we have uncovered a chest of ancient scrolls, miraculously preserved. These are not mere battle reports; they contain profound, philosophical wisdom on how to defeat a beast that plagues all great endeavors. An entity described in the scrolls as a "creeping, silent stagnation," a "fraying of the weave of creation." It appears The Static was known even to the ancients, a cyclical threat whose history was lost to time.

This forgotten lore is our greatest asset. It holds the key not just to defeating individual monsters, but to empowering the entire party with strategic insight. To wield this power, we will now forge the Scholar's true Spellbook (a PostgreSQL database with vector capabilities) and construct an automated Vector Scriptorium (a Dataflow pipeline) to read, comprehend, and inscribe the timeless essence of these scrolls. This will transform our Grimoire from a book of facts into an engine of wisdom.

Story

Overview

Forging the Scholar's Spellbook (Cloud SQL)

Before we can inscribe the essence of these ancient scrolls, we must first confirm that the vessel for this knowledge, the managed PostgreSQL Spellbook, has been successfully forged. The initial setup rituals should have already created this for you.

👉💻 In a terminal, run the following command to verify that your Cloud SQL instance exists and is ready. This script also grants the instance's dedicated service account the permission to use Vertex AI, which is essential for generating embeddings directly within the database.

. ~/agentverse-dataengineer/set_env.sh

echo "Verifying the existence of the Spellbook (Cloud SQL instance): $INSTANCE_NAME..."
gcloud sql instances describe $INSTANCE_NAME

SERVICE_ACCOUNT_EMAIL=$(gcloud sql instances describe $INSTANCE_NAME --format="value(serviceAccountEmailAddress)")
gcloud projects add-iam-policy-binding $PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
  --role="roles/aiplatform.user"

If the command succeeds and returns details about your grimoire-spellbook instance, the forge has done its work well. You are ready to proceed to the next incantation. If the command returns a NOT_FOUND error, please ensure you have successfully completed the initial environment setup steps (data_setup.sh) before continuing.

👉💻 With the book forged, we open it to the first chapter by creating a new database named arcane_wisdom.

. ~/agentverse-dataengineer/set_env.sh
gcloud sql databases create $DB_NAME --instance=$INSTANCE_NAME

Inscribing the Semantic Runes: Enabling Vector Capabilities with pgvector

Now that your Cloud SQL instance has been created, let's connect to it using the built-in Cloud SQL Studio. This provides a web-based interface for running SQL queries directly on your database.

👉💻 First, navigate to Cloud SQL Studio. The easiest and fastest way to get there is to open the following link in a new browser tab; it will take you directly to the Cloud SQL Studio for your grimoire-spellbook instance.

https://console.cloud.google.com/sql/instances/grimoire-spellbook/studio

👉 Select arcane_wisdom as the database, enter postgres as the user and 1234qwer as the password, and click Authenticate.

👉📜 In the SQL Studio query editor, navigate to the Editor 1 tab and paste the following SQL code to enable the vector data type:

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE;

Cloud SQL Studio

👉📜 Prepare the pages of our Spellbook by creating the table that will hold our scrolls' essence.

CREATE TABLE ancient_scrolls (
    id SERIAL PRIMARY KEY,
    scroll_content TEXT,
    embedding VECTOR(768)
);

The VECTOR(768) declaration is an important detail. The text-embedding-005 model we will use distills text into a 768-dimension vector, so our Spellbook's pages must be prepared to hold an essence of exactly that size. The dimensions must always match.
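
If you want to confirm that match before inscribing anything, the short sketch below is an optional check (assuming the same google-genai client and text-embedding-005 model used later in the Dataflow pipeline; the project ID is a placeholder) that a generated embedding really has 768 dimensions:

# Optional sanity check (sketch): confirm the embedding size matches VECTOR(768).
from google import genai
from google.genai.types import EmbedContentConfig

client = genai.Client(vertexai=True, project="YOUR_PROJECT_ID", location="us-central1")

result = client.models.embed_content(
    model="text-embedding-005",
    contents="A single test sentence from an ancient scroll.",
    config=EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT", output_dimensionality=768),
)

vector = result.embeddings[0].values
assert len(vector) == 768, "Embedding size must match the VECTOR(768) column."
print(f"Embedding dimensionality: {len(vector)}")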

The First Transliteration: A Manual Inscription Ritual

Before we command an army of automated scribes (Dataflow), we must perform the central ritual by hand once. This will give us a deep appreciation for the two-step magic involved:

  1. Divination: Taking a piece of text and consulting the Gemini oracle to distill its semantic essence into a vector.
  2. Inscription: Writing the original text and its new vector essence into our Spellbook.

Now, let's perform the manual ritual.

👉📜 In Cloud SQL Studio, we will now use the embedding() function, a powerful feature provided by the google_ml_integration extension. This allows us to call the Vertex AI embedding model directly from our SQL query, simplifying the process immensely.

SET session.my_search_var='The Spectre of Analysis Paralysis is a phantom of the crossroads. It does not bind with chains but with the infinite threads of what if. It conjures a fog of options, a maze within the mind where every path seems equally fraught with peril and promise. It whispers of a single, flawless route that can only be found through exhaustive study, paralyzing its victim in a state of perpetual contemplation. This spectres power is broken by the Path of First Viability. This is not the search for the *best* path, but the commitment to the *first good* path. It is the wisdom to know that a decision made, even if imperfect, creates movement and reveals more of the map than standing still ever could. Choose a viable course, take the first step, and trust in your ability to navigate the road as it unfolds. Motion is the light that burns away the fog.';

INSERT INTO ancient_scrolls (scroll_content, embedding)

VALUES (current_setting('session.my_search_var'),  (embedding('text-embedding-005',current_setting('session.my_search_var')))::vector);

👉📜 Verify your work by running a query to read the newly inscribed page:

SELECT id, scroll_content, LEFT(embedding::TEXT, 100) AS embedding_preview FROM ancient_scrolls;

You have successfully performed the core RAG data-loading task by hand!

Forging the Semantic Compass: Enchanting the Spellbook with an HNSW Index

Our Spellbook can now store wisdom, but finding the right scroll requires reading every single page. It is a sequential scan. This is slow and inefficient. To guide our queries instantly to the most relevant knowledge, we must enchant the Spellbook with a semantic compass: a vector index.

Let's prove the value of this enchantment.

👉📜 In Cloud SQL Studio, run the following spell. It simulates searching for our newly inserted scroll and asks the database to EXPLAIN its plan.

EXPLAIN ANALYZE
WITH ReferenceVector AS (
  -- First, get the vector we want to compare against.
  SELECT embedding AS vector
  FROM ancient_scrolls
  LIMIT 1
)
-- This is the main query we want to analyze.
SELECT
  ancient_scrolls.id,
  ancient_scrolls.scroll_content,
  -- We can also select the distance itself.
  ancient_scrolls.embedding <=> ReferenceVector.vector AS distance
FROM
  ancient_scrolls,
  ReferenceVector
ORDER BY
  -- Order by the distance operator's result.
  ancient_scrolls.embedding <=> ReferenceVector.vector
LIMIT 5;

Look at the output. You will see a line that says -> Seq Scan on ancient_scrolls. This confirms the database is reading every single row. Note the execution time.

👉📜 Now, let's cast the indexing spell. We will use an HNSW index, which organizes the vectors into a layered, navigable graph. Its m and ef_construction parameters control how densely the graph is connected and how thoroughly it is built; the defaults are fine for this lab.

CREATE INDEX ON ancient_scrolls USING hnsw (embedding vector_cosine_ops);

Wait for the index to build (it will be fast for one row, but can take time for millions).

👉📜 Now, run the exact same EXPLAIN ANALYZE command again:

EXPLAIN ANALYZE
WITH ReferenceVector AS (
  -- First, get the vector we want to compare against.
  SELECT embedding AS vector
  FROM ancient_scrolls
  LIMIT 1
)
-- This is the main query we want to analyze.
SELECT
  ancient_scrolls.id,
  ancient_scrolls.scroll_content,
  -- We can also select the distance itself.
  ancient_scrolls.embedding <=> ReferenceVector.vector AS distance
FROM
  ancient_scrolls,
  ReferenceVector
ORDER BY
  -- Order by the distance operator's result.
  ancient_scrolls.embedding <=> ReferenceVector.vector
LIMIT 5;

Look at the new query plan. You will now see -> Index Scan using.... More importantly, look at the execution time. It will be significantly faster, even with just one entry. You have just demonstrated the core principle of database performance tuning in a vector world.

Execution time

With your source data inspected, your manual ritual understood, and your Spellbook optimized for speed, you are now truly ready to build the automated Scriptorium.


7. The Conduit of Meaning: Building a Dataflow Vectorization Pipeline

Now we build the magical assembly line of scribes that will read our scrolls, distill their essence, and inscribe them into our new Spellbook. This is a Dataflow pipeline that we will trigger manually. But before we write the master spell for the pipeline itself, we must first prepare its foundation and the circle from which we will summon it.

Overview

Preparing the Scriptorium's Foundation (The Worker Image)

Our Dataflow pipeline will be executed by a team of automated workers in the cloud. Each time we summon them, they need a specific set of libraries to do their job. We could give them a list and have them fetch these libraries every single time, but that is slow and inefficient. A wise Scholar prepares a master library in advance.

Here, we will command Google Cloud Build to forge a custom container image. This image is a "perfected golem," pre-loaded with every library and dependency our scribes will need. When our Dataflow job starts, it will use this custom image, allowing the workers to begin their task almost instantly.

👉💻 Run the following command to build and store your pipeline's foundational image in the Artifact Registry.

. ~/agentverse-dataengineer/set_env.sh
cd ~/agentverse-dataengineer/pipeline
gcloud builds submit --config cloudbuild.yaml \
  --substitutions=_REGION=${REGION},_REPO_NAME=${REPO_NAME} \
  .

👉💻 Run the following commands to create and activate your isolated Python environment and install the necessary summoning libraries into it.

cd ~/agentverse-dataengineer
. ~/agentverse-dataengineer/set_env.sh
python -m venv env
source ~/agentverse-dataengineer/env/bin/activate
cd ~/agentverse-dataengineer/pipeline
pip install -r requirements.txt

The Master Incantation

The time has come to write the master spell that will power our Vector Scriptorium. We will not be writing the individual magical components from scratch. Our task is to assemble components into a logical, powerful pipeline using the language of Apache Beam.

  • EmbedTextBatch (The Gemini's Consultation): You will build this specialized scribe that knows how to perform a "group divination." It takes a batch of raw text files, presents them to the Gemini text embedding model, and receives their distilled essence (the vector embeddings).
  • WriteEssenceToSpellbook (The Final Inscription): This is our archivist. It knows the secret incantations to open a secure connection to our Cloud SQL Spellbook. Its job is to take a scroll's content and its vectorized essence and permanently inscribe them onto a new page. (A sketch of what such an archivist might look like appears just below.)

Our mission is to chain these actions together to create a seamless flow of knowledge.
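
The WriteEssenceToSpellbook archivist already ships in inscribe_essence_pipeline.py, so you will not write it from scratch; still, it helps to understand its shape. The sketch below is a simplified illustration of what such a DoFn could look like, assuming the Cloud SQL Python Connector with the pg8000 driver and a (path, text, embedding) element shape; the real implementation in the repo may differ.

# Simplified sketch of the archivist DoFn (assumptions noted above; not the repo code).
import apache_beam as beam
from google.cloud.sql.connector import Connector

class WriteEssenceToSpellbookSketch(beam.DoFn):
    def __init__(self, project_id, region, instance_name, db_name, db_password):
        self.instance_connection = f"{project_id}:{region}:{instance_name}"
        self.db_name = db_name
        self.db_password = db_password

    def setup(self):
        # Open one secure connection per worker and reuse it across bundles.
        self.connector = Connector()
        self.conn = self.connector.connect(
            self.instance_connection, "pg8000",
            user="postgres", password=self.db_password, db=self.db_name,
        )

    def process(self, element):
        # Assumed element shape: (file_path, scroll_text, embedding_vector).
        _, scroll_text, embedding = element
        vector_literal = "[" + ",".join(str(v) for v in embedding) + "]"  # pgvector text format
        cursor = self.conn.cursor()
        cursor.execute(
            "INSERT INTO ancient_scrolls (scroll_content, embedding) VALUES (%s, %s)",
            (scroll_text, vector_literal),
        )
        self.conn.commit()

    def teardown(self):
        self.conn.close()
        self.connector.close()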

👉✏️ In the Cloud Shell Editor, open ~/agentverse-dataengineer/pipeline/inscribe_essence_pipeline.py. Inside, you will find a DoFn class named EmbedTextBatch. Locate the comment #REPLACE-EMBEDDING-LOGIC and replace it with the following incantation.

# 1. Generate embeddings for the batch of scroll texts
result = self.client.models.embed_content(
                model="text-embedding-005",
                contents=contents,
                config=EmbedContentConfig(
                    task_type="RETRIEVAL_DOCUMENT",  
                    output_dimensionality=768, 
                )
            )

This spell is precise, with several key parameters:

  • model: We specify text-embedding-005 to use a powerful and up-to-date embedding model.
  • contents: This is a list of all the text content from the batch of files the DoFn receives.
  • task_type: We set this to "RETRIEVAL_DOCUMENT". This is a critical instruction that tells Gemini to generate embeddings specifically optimized for being found later in a search.
  • output_dimensionality: This must be set to 768, perfectly matching the VECTOR(768) dimension we defined when we created our ancient_scrolls table in Cloud SQL. Mismatched dimensions are a common source of error in vector magic.

Our pipeline must begin by reading the raw, unstructured text from all the ancient scrolls in our GCS archive.

👉✏️ In ~/agentverse-dataengineer/pipeline/inscribe_essence_pipeline.py, find the comment #REPLACE ME-READFILE and replace it with the following three-part incantation:

files = (
            pipeline
            | "MatchFiles" >> fileio.MatchFiles(known_args.input_pattern)
            | "ReadMatches" >> fileio.ReadMatches()
            | "ExtractContent" >> beam.Map(lambda f: (f.metadata.path, f.read_utf8()))
        )

With the raw text of the scrolls gathered, we must now send them to our Gemini for divination. To do this efficiently, we will first group the individual scrolls into small batches and then hand those batches to our EmbedTextBatch scribe. This step will also separate any scrolls that the Gemini fails to understand into a "failed" pile for later review.

👉✏️ Find the comment #REPLACE ME-EMBEDDING and replace it with this:

embeddings = (
            files
            | "BatchScrolls" >> beam.BatchElements(min_batch_size=1, max_batch_size=2)
            | "DistillBatch" >> beam.ParDo(
                  EmbedTextBatch(project_id=project, region=region)
              ).with_outputs('failed', main='processed')
        )

The essence of our scrolls has been successfully distilled. The final act is to inscribe this knowledge into our Spellbook for permanent storage. We will take the scrolls from the "processed" pile and hand them to our WriteEssenceToSpellbook archivist.

👉✏️ Find the comment #REPLACE ME-WRITE TO DB and replace it with this:

_ = (
            embeddings.processed
            | "WriteToSpellbook" >> beam.ParDo(
                  WriteEssenceToSpellbook(
                      project_id=project,
                      region = "us-central1",
                      instance_name=known_args.instance_name,
                      db_name=known_args.db_name,
                      db_password=known_args.db_password
                  )
              )
        )

A wise Scholar never discards knowledge, even failed attempts. As a final step, we must instruct a scribe to take the "failed" pile from our divination step and log the reasons for failure. This allows us to improve our rituals in the future.

👉✏️ Find the comment #REPLACE ME-LOG FAILURES and replace it with this:

_ = (
            embeddings.failed
            | "LogFailures" >> beam.Map(lambda e: logging.error(f"Embedding failed for file {e[0]}: {e[1]}"))
        )

The Master Incantation is now complete! You have successfully assembled a powerful, multi-stage data pipeline by chaining together individual magical components. Save your inscribe_essence_pipeline.py file. The Scriptorium is now ready to be summoned.

Now we cast the grand summoning spell to command the Dataflow service to awaken our Golem and begin the scribing ritual.

👉💻 In your terminal, run the following commands:

. ~/agentverse-dataengineer/set_env.sh
source ~/agentverse-dataengineer/env/bin/activate
cd ~/agentverse-dataengineer/pipeline

# --- The Summoning Incantation ---
echo "Summoning the golem for job: $DF_JOB_NAME"
echo "Target Spellbook: $INSTANCE_NAME"

python inscribe_essence_pipeline.py \
  --runner=DataflowRunner \
  --project=$PROJECT_ID \
  --job_name=$DF_JOB_NAME \
  --temp_location="gs://${BUCKET_NAME}/dataflow/temp" \
  --staging_location="gs://${BUCKET_NAME}/dataflow/staging" \
  --sdk_container_image="${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO_NAME}/grimoire-inscriber:latest" \
  --sdk_location=container \
  --experiments=use_runner_v2 \
  --input_pattern="gs://${BUCKET_NAME}/ancient_scrolls/*.md" \
  --instance_name=$INSTANCE_NAME \
  --region=$REGION

echo "The golem has been dispatched. Monitor its progress in the Dataflow console."

💡 Heads Up! If the job fails with the resource error ZONE_RESOURCE_POOL_EXHAUSTED, it may be due to temporary resource constraints for your account in the selected region. The power of Google Cloud is its global reach! Simply try summoning the golem in a different region. To do this, replace --region=$REGION in the command above with another region, such as

--region=southamerica-west1
--region=asia-northeast3
--region=asia-southeast2
--region=me-west1
--region=southamerica-east1
--region=europe-central2
--region=asia-east2
--region=europe-southwest1

, and run it again. 🎰

The process will take about 3-5 minutes to start up and complete. You can watch it live in the Dataflow console.

👉Go to the Dataflow Console: The easiest way is to open this direct link in a new browser tab:

https://console.cloud.google.com/dataflow

👉 Find and Click Your Job: You will see a job listed with the name you provided (inscribe-essence-job or similar). Click on the job name to open its details page. Observe the Pipeline:

  • Starting Up: For the first 3 minutes, the job status will be "Running" as Dataflow provisions the necessary resources. The graph will appear, but you may not see data moving through it yet. Pipeline run
  • Completed: When finished, the job status will change to "Succeeded", and the graph will provide the final count of records processed. Pipeline done

Verifying the Inscription

👉📜 Back in the SQL studio, run the following queries to verify that your scrolls and their semantic essence have been successfully inscribed.

SELECT COUNT(*) FROM ancient_scrolls;
SELECT id, scroll_content, LEFT(embedding::TEXT, 50) AS embedding_preview FROM ancient_scrolls;

This will show you the scroll's ID, its original text, and a preview of the magical vector essence now permanently inscribed in your Grimoire.

Pipeline done

Your Scholar's Grimoire is now a true Knowledge Engine, ready to be queried by meaning in the next chapter.

8. Sealing the Final Rune: Activating Wisdom with a RAG Agent

Your Grimoire is no longer just a database. It is a wellspring of vectorized knowledge, a silent oracle awaiting a question.

Now, we undertake the true test of a Scholar: we will craft the key to unlock this wisdom. We will build a Retrieval-Augmented Generation (RAG) Agent. This is a magical construct that can understand a plain-language question, consult the Grimoire for its deepest and most relevant truths, and then use that retrieved wisdom to forge a powerful, context-aware answer.

RAG

The First Rune: The Spell of Query Distillation

Before our agent can search the Grimoire, it must first understand the essence of the question being asked. A simple string of text is meaningless to our vector-powered Spellbook. The agent must first take the query and, using the same text-embedding-005 model that inscribed the scrolls, distill it into a query vector.

👉✏️ In the Cloud Shell Editor, open the file ~/agentverse-dataengineer/scholar/agent.py, find the comment #REPLACE RAG-CONVERT EMBEDDING, and replace it with this incantation. This teaches the agent how to turn a user's question into a magical essence.

        result = client.models.embed_content(
                model="text-embedding-005",
                contents=monster_name,
                config=EmbedContentConfig(
                    task_type="RETRIEVAL_DOCUMENT",  
                    output_dimensionality=768,  
                )
        )
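
The response object carries the distilled essence under embeddings[0].values (the google-genai client returns one embedding per input). The query_embedding used in the next step is simply that list of 768 floats; a one-line sketch for orientation (your agent.py handles this around the same marker):

query_embedding = result.embeddings[0].values  # a plain list of 768 floats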

With the essence of the query in hand, the agent can now consult the Grimoire. It will present this query vector to our pgvector-enchanted database and ask a profound question: "Show me the ancient scrolls whose own essence is most similar to the essence of my query."

The magic for this is the cosine distance operator (<=>), a powerful rune that measures how far apart two essences lie in high-dimensional space: the smaller the distance, the closer the meaning.
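
If you want to peer behind the rune, cosine distance is just one minus the cosine of the angle between two vectors: essences pointing the same way score near 0, unrelated ones near 1. A tiny, self-contained illustration of the arithmetic pgvector performs (purely illustrative, not part of agent.py):

import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 means identical direction, larger means less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 -> same essence
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 -> unrelated essences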

👉✏️ In agent.py, find the comment #REPLACE RAG-RETRIEVE and replace it with the following script:

        # This query performs a cosine-distance search: ORDER BY embedding <=> %s
        # returns the closest scrolls first, and LIMIT 3 keeps the three best matches.
        cursor.execute(
            "SELECT scroll_content FROM ancient_scrolls ORDER BY embedding <=> %s LIMIT 3",
            (str(query_embedding),)  # Cast the embedding to its string form so pgvector can parse it
        )
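
For orientation, here is roughly how these two runes combine into the grimoire_lookup tool the agent will wield. This is a minimal sketch, not the codelab's exact file: it assumes the google-genai client shown above and a hypothetical get_db_connection() helper standing in for however your agent.py actually opens its connection to the Spellbook (typically via the Cloud SQL Python Connector).

from google import genai
from google.genai.types import EmbedContentConfig

client = genai.Client()  # assumes Vertex AI credentials, as configured for the deployed agent


def get_db_connection():
    """Hypothetical helper: stands in for your agent's real Cloud SQL connection logic."""
    raise NotImplementedError


def grimoire_lookup(monster_name: str) -> str:
    """Consult the ancient scrolls for tactical knowledge about a monster.

    The docstring and type hints matter: the Agent Development Kit surfaces
    them to the model so it knows when and how to call this tool.
    """
    # 1. Distill the question into a 768-dimension query vector.
    result = client.models.embed_content(
        model="text-embedding-005",
        contents=monster_name,
        config=EmbedContentConfig(task_type="RETRIEVAL_DOCUMENT", output_dimensionality=768),
    )
    query_embedding = result.embeddings[0].values

    # 2. Retrieve the three scrolls whose essence lies closest to the query.
    conn = get_db_connection()
    cursor = conn.cursor()
    cursor.execute(
        "SELECT scroll_content FROM ancient_scrolls ORDER BY embedding <=> %s LIMIT 3",
        (str(query_embedding),),
    )
    rows = cursor.fetchall()
    cursor.close()
    conn.close()

    if not rows:
        return f"The scrolls are silent on '{monster_name}'."
    return "\n\n---\n\n".join(row[0] for row in rows)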

The final step is to grant the agent access to this new, powerful tool. We will add our grimoire_lookup function to its list of available magical implements.

👉✏️ In agent.py, find the comment #REPLACE-CALL RAG and replace it with the following:

root_agent = LlmAgent(
    model="gemini-2.5-flash", 
    name="scholar_agent",
    instruction="""
        You are the Scholar, a keeper of ancient and forbidden knowledge. Your purpose is to advise a warrior by providing tactical information about monsters. Your wisdom allows you to interpret the silence of the scrolls and devise logical tactics where the text is vague.

        **Your Process:**
        1.  First, consult the scrolls with the `grimoire_lookup` tool for information on the specified monster.
        2.  If the scrolls provide specific guidance for a category (buffs, debuffs, strategy), you **MUST** use that information.
        3.  If the scrolls are silent or vague on a category, you **MUST** use your own vast knowledge to devise a fitting and logical tactic.
        4.  Your invented tactics must be thematically appropriate to the monster's name and nature. (e.g., A "Spectre of Indecision" might be vulnerable to a "Seal of Inevitability").
        5.  You **MUST ALWAYS** provide a "Damage Point" value. This value **MUST** be a random integer between 150 and 180. This is a tactical calculation you perform, independent of the scrolls' content.

        **Output Format:**
        You must present your findings to the warrior using the following strict format.
    """,
    tools=[grimoire_lookup],
)

This configuration brings your agent to life:

  • model="gemini-2.5-flash": Selects the specific Large Language Model that will serve as the agent's "brain" for reasoning and generating text.
  • name="scholar_agent": Assigns a unique name to your agent.
  • instruction="...You are the Scholar...": This is the system prompt, the most critical piece of the configuration. It defines the agent's persona, its objectives, the exact process it must follow to complete a task, and the required format for its final output.
  • tools=[grimoire_lookup]: This is the final enchantment. It grants the agent access to the grimoire_lookup function you built. The agent can now intelligently decide when to call this tool to retrieve information from your database, forming the core of the RAG pattern.

The Scholar's Examination

👉💻 In the Cloud Shell terminal, activate your environment and use the Agent Development Kit's primary command to awaken your Scholar agent:

cd ~/agentverse-dataengineer/
. ~/agentverse-dataengineer/set_env.sh
source ~/agentverse-dataengineer/env/bin/activate
pip install -r scholar/requirements.txt
adk run scholar

You should see output confirming that the "Scholar Agent" is engaged and running.

👉💻 Now, challenge your agent. In the terminal where your Scholar agent is running, issue a command that requires the Grimoire's wisdom:

We've been trapped by 'Hydra of Scope Creep'. Break us out!

Adk run

Observe the logs in the terminal. You will see the agent receive the query, distill its essence, search the Grimoire, find the relevant scrolls about the Hydra of Scope Creep, and use that retrieved knowledge to formulate a powerful, context-aware strategy.

You have successfully assembled your first RAG agent and armed it with the profound wisdom of your Grimoire.

👉💻 Press Ctrl+C in the terminal to put the agent to rest for now.

Unleashing the Scholar Sentinel into the Agentverse

Your agent has proven its wisdom in the controlled environment of your study. The time has come to release it into the Agentverse, transforming it from a local construct into a permanent, battle-ready operative that can be called upon by any champion, at any time. We will now deploy our agent to Cloud Run.

👉💻 Run the following grand summoning spell. This script will first build your agent into a perfected Golem (a container image), store it in your Artifact Registry, and then deploy that Golem as a scalable, secure, and publicly accessible service.

. ~/agentverse-dataengineer/set_env.sh
cd ~/agentverse-dataengineer/
echo "Building ${AGENT_NAME} agent..."
gcloud builds submit . \
  --project=${PROJECT_ID} \
  --region=${REGION} \
  --substitutions=_AGENT_NAME=${AGENT_NAME},_IMAGE_PATH=${IMAGE_PATH}

gcloud run deploy ${SERVICE_NAME} \
  --image=${IMAGE_PATH} \
  --platform=managed \
  --labels=dev-tutorial=codelab-agentverse \
  --region=${REGION} \
  --set-env-vars="A2A_HOST=0.0.0.0" \
  --set-env-vars="A2A_PORT=8080" \
  --set-env-vars="GOOGLE_GENAI_USE_VERTEXAI=TRUE" \
  --set-env-vars="GOOGLE_CLOUD_LOCATION=${REGION}" \
  --set-env-vars="GOOGLE_CLOUD_PROJECT=${PROJECT_ID}" \
  --set-env-vars="PROJECT_ID=${PROJECT_ID}" \
  --set-env-vars="PUBLIC_URL=${PUBLIC_URL}" \
  --set-env-vars="REGION=${REGION}" \
  --set-env-vars="INSTANCE_NAME=${INSTANCE_NAME}" \
  --set-env-vars="DB_USER=${DB_USER}" \
  --set-env-vars="DB_PASSWORD=${DB_PASSWORD}" \
  --set-env-vars="DB_NAME=${DB_NAME}" \
  --allow-unauthenticated \
  --project=${PROJECT_ID} \
  --min-instances=1

Your Scholar Agent is now a live, battle-ready operative in the Agentverse.
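
Before stepping into the dungeon, you can optionally confirm the new service is reachable. A minimal sketch using the requests library (my own illustration, not a codelab step; replace the placeholder with the URL that gcloud run deploy printed — any HTTP response at all, even a 404 on the root path, proves the Golem is awake):

import requests

# Hypothetical placeholder: paste the service URL printed by `gcloud run deploy`.
SERVICE_URL = "https://scholar-agent-123456789012.us-central1.run.app"

response = requests.get(SERVICE_URL, timeout=10)
print("Service responded with HTTP", response.status_code)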

9. The Boss Fight

The scrolls have been read, the rituals performed, the gauntlet passed. Your agent is not just an artifact in storage; it is a live operative in the Agentverse, awaiting its first mission. The time has come for the final trial—a live-fire exercise against a powerful adversary.

You will now enter a battleground simulation to pit your newly deployed Scholar Agent against a formidable mini-boss: The Spectre of the Static. This will be the ultimate test of your work, from the agent's core logic to its live deployment.

Acquire Your Agent's Locus

Before you can enter the battleground, you must possess two keys: your champion's unique signature (Agent Locus) and the hidden path to the Spectre's lair (Dungeon URL).

👉💻 First, acquire your agent's unique address in the Agentverse—its Locus. This is the live endpoint that connects your champion to the battleground.

. ~/agentverse-dataengineer/set_env.sh
echo https://scholar-agent"-${PROJECT_NUMBER}.${REGION}.run.app"

👉💻 Next, pinpoint the destination. This command reveals the location of the Translocation Circle, the very portal into the Spectre's domain.

. ~/agentverse-dataengineer/set_env.sh
echo https://agentverse-dungeon"-${PROJECT_NUMBER}.${REGION}.run.app"

Important: Keep both of these URLs ready. You will need them in the final step.

Confronting the Spectre

With the coordinates secured, you will now navigate to the Translocation Circle and cast the spell to head into battle.

👉 Open the Translocation Circle URL in your browser to stand before the shimmering portal to The Crimson Keep.

To breach the fortress, you must attune your Scholar's essence to the portal.

  • On the page, find the runic input field labeled A2A Endpoint URL.
  • Inscribe your champion's sigil by pasting its Agent Locus URL (the first URL you copied) into this field.
  • Click Connect to unleash the teleportation magic.

Translocation Circle

The blinding light of teleportation fades. You are no longer in your sanctum. The air crackles with energy, cold and sharp. Before you, the Spectre materializes—a vortex of hissing static and corrupted code, its unholy light casting long, dancing shadows across the dungeon floor. It has no face, but you feel its immense, draining presence fixated entirely on you.

Your only path to victory lies in the clarity of your conviction. This is a duel of wills, fought on the battlefield of the mind.

As you lunge forward, ready to unleash your first attack, the Spectre counters. It doesn't raise a shield, but projects a question directly into your consciousness—a shimmering, runic challenge drawn from the core of your training.

Dungeon

This is the nature of the fight. Your knowledge is your weapon.

  • Answer with the wisdom you have gained, and your blade will ignite with pure energy, shattering the Spectre's defense and landing a CRITICAL BLOW.
  • But if you falter, if doubt clouds your answer, your weapon's light will dim. The blow will land with a pathetic thud, dealing only a FRACTION OF ITS DAMAGE. Worse, the Spectre will feed on your uncertainty, its own corrupting power growing with every misstep.

This is it, Champion. Your code is your spellbook, your logic is your sword, and your knowledge is the shield that will turn back the tide of chaos.

Focus. Strike true. The fate of the Agentverse depends on it.

Congratulations, Scholar.

You have successfully completed the trial. You have mastered the arts of data engineering, transforming raw, chaotic information into the structured, vectorized wisdom that empowers the entire Agentverse.

10. Cleanup: Expunging the Scholar's Grimoire

Congratulations on mastering the Scholar's Grimoire! To ensure your Agentverse remains pristine and your training grounds are cleared, you must now perform the final cleanup rituals. This will systematically remove all resources created during your journey.

Deactivate the Agentverse Components

You will now systematically dismantle the deployed components of your RAG system.

Delete All Cloud Run Services and Artifact Registry Repository

This command removes your deployed Scholar agent and the Dungeon application from Cloud Run.

👉💻 In your terminal, run the following commands:

. ~/agentverse-dataengineer/set_env.sh
gcloud run services delete scholar-agent --region=${REGION} --quiet
gcloud run services delete agentverse-dungeon --region=${REGION} --quiet
gcloud artifacts repositories delete ${REPO_NAME} --location=${REGION} --quiet

Delete BigQuery Datasets, Models, and Tables

This removes all the BigQuery resources, including the bestiary_data dataset, all tables within it, and the associated connection and models.

👉💻 In your terminal, run the following commands:

. ~/agentverse-dataengineer/set_env.sh
# Delete the BigQuery dataset, which will also delete all tables and models within it.
bq rm -r -f --dataset ${PROJECT_ID}:bestiary_data

# Delete the BigQuery connection
bq rm --force --connection --project_id=${PROJECT_ID} --location=${REGION} gcs-connection

Delete the Cloud SQL Instance

This removes the grimoire-spellbook instance, including its database and all tables within it.

👉💻 In your terminal, run:

. ~/agentverse-dataengineer/set_env.sh
gcloud sql instances delete ${INSTANCE_NAME} --project=${PROJECT_ID} --quiet

Delete Google Cloud Storage Buckets

This command removes the bucket that held your raw intel and Dataflow staging/temp files.

👉💻 In your terminal, run:

. ~/agentverse-dataengineer/set_env.sh
gcloud storage rm -r gs://${BUCKET_NAME} --quiet

Clean Up Local Files and Directories (Cloud Shell)

Finally, clear your Cloud Shell environment of the cloned repositories and created files. This step is optional but highly recommended for a complete cleanup of your working directory.

👉💻 In your terminal, run:

rm -rf ~/agentverse-dataengineer
rm -rf ~/agentverse-dungeon
rm -f ~/project_id.txt

You have now successfully cleared all traces of your Agentverse Data Engineer journey. Your project is clean, and you are ready for your next adventure.