Database as a Tool: Agentic RAG with ADK, MCP Toolbox, and Cloud SQL

1. Introduction

AI agents are only as useful as the data they can access. Most real-world data lives in databases — and connecting agents to databases typically means writing connection management, query logic, and embedding pipelines inside your agent code. Every agent that needs database access repeats this work, and every query change requires redeploying the agent.

This codelab shows a different approach. You declare your database tools in a YAML file — standard SQL queries, vector similarity search, even automatic embedding generation — and MCP Toolbox for Databases handles all database operations as an MCP server. Your agent code stays minimal: load the tools, let Gemini decide which one to call.

What you'll build

A Smart Job Board Assistant for "TechJobs" — an ADK agent powered by Gemini that helps developers browse tech job listings using standard filters (role, tech stack) and discover jobs through natural language descriptions like "I want a remote job working on AI chatbots." The agent reads from and writes to a Cloud SQL PostgreSQL database entirely through MCP Toolbox for Databases, which handles all database access — including automatic embedding generation for vector search. By the end, both the Toolbox and the agent run on Cloud Run.

eb6de681c40990c1.jpeg

What you'll learn

  • How MCP (Model Context Protocol) standardizes tool access for AI agents, and how MCP Toolbox for Databases applies this to database operations
  • Set up MCP Toolbox for Databases as middleware between an ADK agent and Cloud SQL PostgreSQL
  • Define database tools declaratively in tools.yaml — no database code in your agent
  • Build an ADK agent that loads tools from a running Toolbox server using ToolboxToolset
  • Generate vector embeddings using Cloud SQL's built-in embedding() function and enable semantic search with pgvector
  • Use the valueFromParam feature for automatic vector ingestion on write operations
  • Deploy both the Toolbox server and the ADK agent to Cloud Run

Prerequisites

  • A Google Cloud account with a trial billing account
  • Basic familiarity with Python and SQL
  • Prior experience with Cloud Database and ADK will be helpful

2. Set Up Your Environment

This step prepares your Cloud Shell environment, configures your Google Cloud project, and clones the reference repository.

Open Cloud Shell

Open Cloud Shell in your browser. Cloud Shell provides a pre-configured environment with all the tools you need for this codelab. Click Authorize when prompted.

Then click "View" -> "Terminal" to open the terminal. Your interface should look similar to this:

86307fac5da2f077.png

This will be our main interface: the IDE on top, the terminal on the bottom.

Set up your working directory

Create your working directory. All code you write in this codelab lives here:

mkdir -p ~/build-agent-adk-toolbox-cloudsql
cloudshell workspace ~/build-agent-adk-toolbox-cloudsql && cd ~/build-agent-adk-toolbox-cloudsql

After that, let's prepare a few directories for the seeding scripts and logs:

mkdir -p ~/build-agent-adk-toolbox-cloudsql/scripts
mkdir -p ~/build-agent-adk-toolbox-cloudsql/logs

Set up your Google Cloud project

Create the .env file with the location variables:

# For Vertex AI / Gemini API calls
echo "GOOGLE_CLOUD_LOCATION=global" > .env
# For Cloud SQL, Cloud Run, Artifact Registry
echo "REGION=us-central1" >> .env

To simplify project setup in your terminal, download this project setup script into your working directory:

curl -sL https://raw.githubusercontent.com/alphinside/cloud-trial-project-setup/main/setup_verify_trial_project.sh -o setup_verify_trial_project.sh

Run the script. It verifies your trial billing account, creates a new project (or validates an existing one), saves your project ID to a .env file in the current directory, and sets the active project in gcloud.

bash setup_verify_trial_project.sh && source .env

The script will:

  1. Verify you have an active trial billing account
  2. Check for an existing project in .env (if any)
  3. Create a new project or reuse the existing one
  4. Link the trial billing account to your project
  5. Save the project ID to .env
  6. Set the project as the active gcloud project

Verify the project is set correctly by checking the yellow text next to your working directory in the Cloud Shell terminal prompt. It should display your project ID.

dcba35ce1389f313.png

Enable the required APIs

Next, we need to enable several APIs for the products we will be interacting with:

gcloud services enable \
  aiplatform.googleapis.com \
  sqladmin.googleapis.com \
  compute.googleapis.com \
  run.googleapis.com \
  cloudbuild.googleapis.com \
  artifactregistry.googleapis.com

  • Vertex AI API (aiplatform.googleapis.com) — your agent uses Gemini models, and Toolbox uses the embedding API for vector search.
  • Cloud SQL Admin API (sqladmin.googleapis.com) — you provision and manage a PostgreSQL instance.
  • Compute Engine API (compute.googleapis.com) — required for creating Cloud SQL instances.
  • Cloud Run, Cloud Build, Artifact Registry — used in the deployment step later in this codelab.

3. Preparing Scripts for Database Initialization

This step prepares the database initialization scripts: a Bash script that creates the Cloud SQL instance, verifies it is ready, and creates the database, plus a Python script it calls to seed the job listings and generate embeddings. You will run everything in the next step.

First, let's add the database connection variables to your .env file and reload it:

echo "DB_PASSWORD=techjobs-pwd" >> .env
echo "DB_INSTANCE=jobs-instance" >> .env
echo "DB_NAME=jobs_db" >> .env
source .env

Creating the Bash script for instance and database creation

Then, create the scripts/setup_database.sh script with the following command:

mkdir -p ~/build-agent-adk-toolbox-cloudsql/scripts
cloudshell edit scripts/setup_database.sh

Then, copy the following code into the scripts/setup_database.sh file

#!/bin/bash
set -e
source .env

echo "================================================"
echo "Database Setup"
echo "================================================"
echo ""

# Step 1: Create Cloud SQL instance
echo "[1/5] Creating Cloud SQL instance..."

# Check if instance already exists
if gcloud sql instances describe "$DB_INSTANCE" --quiet >/dev/null 2>&1; then
    echo "      Instance already exists"
else
    echo "      Creating instance (takes 5-10 minutes)..."
    gcloud sql instances create "$DB_INSTANCE" \
        --database-version=POSTGRES_17 \
        --tier=db-custom-1-3840 \
        --edition=ENTERPRISE \
        --region="$REGION" \
        --root-password="$DB_PASSWORD" \
        --enable-google-ml-integration \
        --database-flags cloudsql.enable_google_ml_integration=on \
        --quiet
fi
echo "      ✓ Instance ready"
echo ""

# Step 2: Verify instance is ready
echo "[2/5] Verifying instance state..."

STATE=$(gcloud sql instances describe "$DB_INSTANCE" --format='value(state)')

if [ "$STATE" != "RUNNABLE" ]; then
    echo "ERROR: Instance not ready (state: $STATE)"
    exit 1
fi
echo "      ✓ Instance is RUNNABLE"
echo ""

# Step 3: Grant IAM permissions
echo "[3/5] Granting Vertex AI permissions..."

SERVICE_ACCOUNT=$(gcloud sql instances describe "$DB_INSTANCE" \
    --format='value(serviceAccountEmailAddress)')

if [ -z "$SERVICE_ACCOUNT" ]; then
    echo "ERROR: Could not retrieve service account"
    exit 1
fi

gcloud projects add-iam-policy-binding "$GOOGLE_CLOUD_PROJECT" \
    --member="serviceAccount:$SERVICE_ACCOUNT" \
    --role="roles/aiplatform.user" \
    --quiet

echo "      ✓ Permissions granted"
echo ""

# Step 4: Create database
echo "[4/5] Creating database..."

# Check if database already exists
if gcloud sql databases describe "$DB_NAME" \
    --instance="$DB_INSTANCE" --quiet >/dev/null 2>&1; then
    echo "      Database already exists"
else
    gcloud sql databases create "$DB_NAME" \
        --instance="$DB_INSTANCE" \
        --quiet
fi

echo "      ✓ Database '$DB_NAME' ready"
echo ""

# Step 5: Seed database and generate embeddings
echo "[5/5] Seeding database and generating embeddings..."

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SETUP_SCRIPT="${SCRIPT_DIR}/setup_jobs_db.py"

if [ ! -f "$SETUP_SCRIPT" ]; then
    echo "ERROR: Setup script not found: $SETUP_SCRIPT"
    exit 1
fi

uv run "$SETUP_SCRIPT"

echo ""
echo "================================================"
echo "Setup complete!"
echo "================================================"
echo ""

Creating the Python script for data seeding

After that, create the Python seeding script scripts/setup_jobs_db.py using the command below:

cloudshell edit scripts/setup_jobs_db.py

Then, copy the following code into the scripts/setup_jobs_db.py file:

import os
import sys
from pathlib import Path
from dotenv import load_dotenv
from google.cloud.sql.connector import Connector
import pg8000
import time

# Load environment variables from .env file
env_path = Path(__file__).parent.parent / '.env'
load_dotenv(env_path)
EMBEDDING_MODEL='gemini-embedding-001'

# Verify required environment variables
required_vars = ['GOOGLE_CLOUD_PROJECT', 'REGION', 'DB_PASSWORD', 'DB_INSTANCE', 'DB_NAME']
missing_vars = [var for var in required_vars if not os.environ.get(var)]

if missing_vars:
    print(f"ERROR: Missing required environment variables: {', '.join(missing_vars)}", file=sys.stderr)
    print(f"", file=sys.stderr)
    print(f"Expected .env file location: {env_path}", file=sys.stderr)
    if not env_path.exists():
        print(f"✗ File not found at that location", file=sys.stderr)
    else:
        print(f"✓ File exists but is missing the variables above", file=sys.stderr)
    print(f"", file=sys.stderr)
    print(f"Make sure your .env file contains:", file=sys.stderr)
    for var in missing_vars:
        print(f"  {var}=<value>", file=sys.stderr)
    sys.exit(1)

# Job listings data (fictional, for tutorial purposes only)
JOBS = [
    ("Senior Backend Engineer", "Stripe", "Backend", "Go, PostgreSQL, gRPC, Kubernetes", "$180-250K/year", "San Francisco, Hybrid", 3,
     "Design and build high-throughput microservices powering payment infrastructure for millions of businesses. Optimize Go services for sub-100ms latency at scale, work with PostgreSQL and Redis for data persistence, and deploy on Kubernetes clusters handling billions of API calls."),
    ("Machine Learning Engineer", "Spotify", "Data/AI", "Python, TensorFlow, BigQuery, Vertex AI", "$170-230K/year", "Stockholm, Remote", 2,
     "Build and deploy ML models for music recommendation and personalization systems serving hundreds of millions of listeners. Design feature pipelines in BigQuery, train models using distributed computing, and serve predictions through real-time APIs processing thousands of requests per second."),
    ("Frontend Engineer", "Vercel", "Frontend", "React, TypeScript, Next.js", "$140-190K/year", "Remote", 4,
     "Build developer-facing dashboard interfaces and deployment tools used by millions of developers worldwide. Create responsive, accessible React components for project management, analytics, and real-time deployment monitoring with a focus on developer experience."),
    ("DevOps Engineer", "Datadog", "DevOps", "Terraform, GCP, Docker, Kubernetes, ArgoCD", "$160-220K/year", "New York, Hybrid", 2,
     "Manage cloud infrastructure powering an observability platform used by thousands of engineering teams. Automate deployment pipelines with ArgoCD, manage multi-cloud Kubernetes clusters, and implement infrastructure-as-code with Terraform across production environments."),
    ("Mobile Engineer (Android)", "Grab", "Mobile", "Kotlin, Jetpack Compose, GraphQL", "$120-170K/year", "Singapore, Hybrid", 3,
     "Develop features for a super-app serving millions of users across Southeast Asia. Build modern Android UIs with Jetpack Compose, integrate GraphQL APIs, and optimize app performance for diverse device capabilities and network conditions."),
    ("Data Engineer", "Airbnb", "Data", "Python, Apache Spark, Airflow, BigQuery", "$160-210K/year", "San Francisco, Hybrid", 2,
     "Build data pipelines that process booking, search, and pricing data for a global travel marketplace. Design ETL workflows with Apache Spark and Airflow, maintain data warehouses in BigQuery, and ensure data quality for analytics and machine learning teams."),
    ("Full Stack Engineer", "Revolut", "Full Stack", "TypeScript, Node.js, React, PostgreSQL", "$130-180K/year", "London, Remote", 5,
     "Build the next generation of financial products making banking accessible to millions of users across 35 countries. Develop real-time trading interfaces with React and WebSockets, build Node.js APIs handling market data streams, and design PostgreSQL schemas for financial transactions."),
    ("Site Reliability Engineer", "Cloudflare", "SRE", "Go, Prometheus, Grafana, GCP, Terraform", "$170-230K/year", "Austin, Hybrid", 2,
     "Ensure 99.99% uptime for a global network handling millions of requests per second. Define SLOs, build monitoring dashboards with Prometheus and Grafana, manage incident response, and automate infrastructure scaling across 300+ data centers worldwide."),
    ("Cloud Architect", "Google Cloud", "Cloud", "GCP, Terraform, Kubernetes, Python", "$200-280K/year", "Seattle, Hybrid", 1,
     "Help enterprises modernize their infrastructure on Google Cloud. Design multi-region architectures, lead migration projects from on-premises to GKE, and build reference implementations using Terraform and Cloud Foundation Toolkit."),
    ("Backend Engineer (Payments)", "Square", "Backend", "Java, Spring Boot, PostgreSQL, Kafka", "$160-220K/year", "San Francisco, Hybrid", 3,
     "Build payment processing systems handling millions of transactions for businesses of all sizes. Design event-driven architectures using Kafka, implement idempotent payment flows with Spring Boot, and ensure PCI-DSS compliance across all services."),
    ("AI Engineer", "Hugging Face", "Data/AI", "Python, LangChain, Vertex AI, FastAPI, PostgreSQL", "$150-210K/year", "Paris, Remote", 2,
     "Build AI-powered tools for the largest open-source ML community. Develop RAG pipelines that index and search model documentation, create conversational agents using LangChain, and deploy AI services with FastAPI on cloud infrastructure."),
    ("Platform Engineer", "Coinbase", "Platform", "Rust, Kubernetes, AWS, Terraform", "$180-250K/year", "Remote", 0,
     "Build the infrastructure platform for a leading cryptocurrency exchange. Develop high-performance matching engines in Rust, manage Kubernetes clusters for microservices, and design CI/CD pipelines that enable rapid feature deployment with zero downtime."),
    ("QA Automation Engineer", "Shopify", "QA", "Python, Selenium, Cypress, Jenkins", "$110-160K/year", "Toronto, Hybrid", 3,
     "Design and maintain automated test suites for a commerce platform powering millions of merchants. Build end-to-end test frameworks with Cypress and Selenium, integrate tests into Jenkins CI pipelines, and establish quality gates that prevent regressions in checkout and payment flows."),
    ("Security Engineer", "CrowdStrike", "Security", "Python, SIEM, Kubernetes, Penetration Testing", "$170-240K/year", "Austin, On-site", 1,
     "Protect enterprise customers from cyber threats on a leading endpoint security platform. Conduct penetration testing, design security monitoring with SIEM tools, implement zero-trust networking in Kubernetes environments, and lead incident response for security events."),
    ("Product Engineer", "GitLab", "Full Stack", "Go, React, PostgreSQL, Redis, GCP", "$140-200K/year", "Remote", 4,
     "Own features end-to-end for an all-in-one DevSecOps platform used by millions of developers. Build Go microservices for CI/CD pipelines, create React frontends for code review and project management, and collaborate with product managers to iterate on user-facing features using data-driven development."),
]


def get_connection():
    """Create a connection to Cloud SQL using the connector."""
    project = os.environ['GOOGLE_CLOUD_PROJECT']
    region = os.environ['REGION']
    password = os.environ['DB_PASSWORD']
    instance = os.environ['DB_INSTANCE']
    database = os.environ['DB_NAME']

    connector = Connector()
    conn = connector.connect(
        f"{project}:{region}:{instance}",
        "pg8000",
        user="postgres",
        password=password,
        db=database
    )
    return conn, connector


def create_schema(cursor):
    """Create extensions and jobs table."""
    cursor.execute("CREATE EXTENSION IF NOT EXISTS google_ml_integration")
    cursor.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            id SERIAL PRIMARY KEY,
            title VARCHAR NOT NULL,
            company VARCHAR NOT NULL,
            role VARCHAR NOT NULL,
            tech_stack VARCHAR NOT NULL,
            salary_range VARCHAR NOT NULL,
            location VARCHAR NOT NULL,
            openings INTEGER NOT NULL,
            description TEXT NOT NULL,
            description_embedding vector(3072)
        )
    """)


def seed_jobs(cursor, conn):
    """Insert job listings."""
    cursor.execute("SELECT COUNT(*) FROM jobs")
    existing_count = cursor.fetchone()[0]

    if existing_count > 0:
        print(f"      {existing_count} jobs already exist, skipping seed")
        return 0

    cursor.executemany("""
        INSERT INTO jobs (title, company, role, tech_stack, salary_range, location, openings, description)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
    """, JOBS)
    conn.commit()
    return len(JOBS)


def generate_embeddings(cursor, conn):
    """Generate embeddings using Cloud SQL's embedding() function."""
    cursor.execute("SELECT COUNT(*) FROM jobs WHERE description_embedding IS NULL")
    null_count = cursor.fetchone()[0]

    if null_count == 0:
        print("      All jobs already have embeddings")
        return 0

    cursor.execute(f"""
        UPDATE jobs
        SET description_embedding = embedding('{EMBEDDING_MODEL}', description)::vector
        WHERE description_embedding IS NULL
    """)
    rows_updated = cursor.rowcount
    conn.commit()
    return rows_updated


def main():
    conn, connector = get_connection()
    cursor = conn.cursor()

    try:
        create_schema(cursor)
        conn.commit()

        seeded = seed_jobs(cursor, conn)
        if seeded > 0:
            print(f"      ✓ Inserted {seeded} jobs")

        # Wait for the Vertex AI IAM role grant to propagate before generating embeddings
        time.sleep(60)
        embedded = generate_embeddings(cursor, conn)
        if embedded > 0:
            print(f"      ✓ Generated {embedded} embeddings")

    except Exception as e:
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(1)
    finally:
        cursor.close()
        conn.close()
        connector.close()


if __name__ == "__main__":
    main()

Now, let's move on to the next step.

4. Create and Initialize the Database

Now our scripts are ready to be executed. We need a Python environment to run the seeding script, so let's set that up first.

Set up the Python project

uv is a fast Python package and project manager written in Rust (see the uv documentation). This codelab uses it for speed and simplicity in managing the Python project.

Initialize a Python project and add the required dependencies:

uv init
uv add cloud-sql-python-connector --extra pg8000
uv add python-dotenv

Note that we are using the cloud-sql-python-connector SDK here to establish a secure connection to the database instance, authenticated with Application Default Credentials.

Execute the setup script

Now, run the setup script in the background; its console output is written to the logs/database_setup.log file. You can continue to the next section while waiting for it to finish:

mkdir -p ~/build-agent-adk-toolbox-cloudsql/logs
bash scripts/setup_database.sh > logs/database_setup.log 2>&1 &

Download the Toolbox binary

We will use MCP Toolbox in this tutorial. Fortunately, it ships as a pre-built binary that is ready to run on Linux. The download takes a while, so start it in the background as well. Run the following command to download the binary; its output is logged to logs/toolbox_dl.log. You can continue to the next section while waiting for it to finish:

cd ~/build-agent-adk-toolbox-cloudsql
curl -O https://storage.googleapis.com/mcp-toolbox-for-databases/v1.0.0/linux/amd64/toolbox > logs/toolbox_dl.log 2>&1 &

Understanding the setup script scripts/setup_database.sh

Now let's walk through the setup script we just created. It does the following:

  1. Runs the gcloud sql instances create command with the following flags:
  • db-custom-1-3840 is the smallest dedicated-core Cloud SQL tier (1 vCPU, 3.75 GB RAM) in the ENTERPRISE edition. You can read more details here. A dedicated core is required for the Vertex AI ML integration — shared-core tiers (db-f1-micro, db-g1-small) do not support it.
  • --root-password sets the password for the default postgres user.
  • --enable-google-ml-integration enables Cloud SQL's built-in integration with Vertex AI, which lets you call embedding models directly from SQL using the embedding() function.
  2. Verifies that the instance is in the RUNNABLE state
  3. Grants the Cloud SQL instance's service account permission to call Vertex AI using the gcloud projects add-iam-policy-binding command. This is required for the built-in embedding() function that we will use when seeding the database
  4. Creates the database
  5. Executes the seeding script scripts/setup_jobs_db.py

Understanding the seed script scripts/setup_jobs_db.py

Now, moving to the seeding script. It does the following:

  1. Initializes the connection to the database instance
  2. Installs two PostgreSQL extensions:
  • google_ml_integration — provides the embedding() SQL function, which calls Vertex AI embedding models directly from SQL. This is a database-level extension that makes ML functions available inside jobs_db. The instance-level flag (--enable-google-ml-integration) you set during instance creation allows the Cloud SQL VM to reach Vertex AI — the extension makes the SQL functions available within this specific database.
  • vector (pgvector) — adds the vector data type and distance operators for storing and querying embeddings.
  3. Creates the jobs table — note that the description_embedding column is vector(3072), a pgvector column that stores 3072-dimensional vectors.
  4. Seeds the initial jobs data
  5. Generates embeddings from the description field and fills description_embedding using the built-in Vertex AI integration via the embedding() function (see the sketch below):
  • embedding('gemini-embedding-001', description) — calls Vertex AI's Gemini embedding model directly from SQL, passing each job's description text. This function is provided by the google_ml_integration extension installed earlier in this script.
  • ::vector — casts the returned float array to pgvector's vector type so it can be stored and queried with distance operators.
  • The UPDATE runs across all 15 rows, generating one 3072-dimensional embedding per job description.

This prepares the initial data that our agent will access.
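
Once the background setup finishes, you can sanity-check the semantic ordering yourself. The following is an illustrative sketch (the filename and query text are just examples, not a required codelab step) that reuses the seed script's connection pattern and .env variables to run a cosine-distance query directly:

# scripts/similarity_check.py: illustrative sketch, not a required codelab step
import os
from pathlib import Path

from dotenv import load_dotenv
from google.cloud.sql.connector import Connector

# Same .env as the seed script (assumes this file lives under scripts/)
load_dotenv(Path(__file__).parent.parent / ".env")

connector = Connector()
conn = connector.connect(
    f"{os.environ['GOOGLE_CLOUD_PROJECT']}:{os.environ['REGION']}:{os.environ['DB_INSTANCE']}",
    "pg8000",
    user="postgres",
    password=os.environ["DB_PASSWORD"],
    db=os.environ["DB_NAME"],
)
cursor = conn.cursor()

# Embed the query text with the same embedding() SQL function used for seeding,
# then order by cosine distance (<=>); the closest descriptions come first.
cursor.execute(
    """
    SELECT title, company
    FROM jobs
    ORDER BY description_embedding <=> embedding('gemini-embedding-001', %s)::vector
    LIMIT 3
    """,
    ("building AI chatbots and conversational systems",),
)
for title, company in cursor.fetchall():
    print(f"{title} - {company}")

cursor.close()
conn.close()
connector.close()

If you save it under scripts/, you would run it with uv run scripts/similarity_check.py after the setup script completes.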

5. Configure MCP Toolbox for Databases

This step introduces MCP Toolbox for Databases, configures it to connect to your Cloud SQL instance, and defines two standard SQL query tools.

What is MCP and why use Toolbox?

e7b9be2e1c98b4db.png

MCP (Model Context Protocol) is an open protocol that standardizes how AI agents discover and interact with external tools. It defines a client-server model: the agent hosts an MCP client, and tools are exposed by MCP servers. Any MCP-compatible client can use any MCP-compatible server — the agent doesn't need custom integration code for each tool.

5bf26eeecad2277d.png

MCP Toolbox for Databases is an open-source MCP server built specifically for database access. Without it, you would write Python functions that open database connections, manage connection pools, construct parameterized queries to prevent SQL injection, handle errors, and embed all of that code inside your agent. Every agent that needs database access repeats this work. Changing a query means redeploying the agent.

With Toolbox, you write a YAML file. Each tool maps to a parameterized SQL statement. Toolbox handles connection pooling, parameterized queries, authentication, and observability. Tools are decoupled from the agent — update a query by editing tools.yaml and restarting Toolbox, without touching agent code. The same tools work across ADK, LangGraph, LlamaIndex, or any MCP-compatible framework.
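
To make the contrast concrete, here is a rough sketch (not code from this codelab's repo) of what a single hand-rolled database tool tends to look like inside agent code, compared with the two lines the agent needs once Toolbox serves the tools (the full agent appears in step 7):

# Without Toolbox (sketch): each query becomes a Python function the agent must own.
import os

from google.cloud.sql.connector import Connector


def search_jobs_by_role(role: str) -> list[tuple]:
    """Hand-rolled tool: connection handling, SQL, and cleanup all live in the agent."""
    connector = Connector()
    conn = connector.connect(
        f"{os.environ['GOOGLE_CLOUD_PROJECT']}:{os.environ['REGION']}:{os.environ['DB_INSTANCE']}",
        "pg8000",
        user="postgres",
        password=os.environ["DB_PASSWORD"],
        db=os.environ["DB_NAME"],
    )
    cursor = conn.cursor()
    cursor.execute(
        "SELECT title, company FROM jobs WHERE LOWER(role) = LOWER(%s) LIMIT 10",
        (role,),
    )
    rows = cursor.fetchall()
    cursor.close()
    conn.close()
    connector.close()
    return rows


# With Toolbox: the agent only points at the server and loads whatever tools.yaml defines.
# from toolbox_adk import ToolboxToolset
# toolbox = ToolboxToolset("http://127.0.0.1:5000")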

Write the tools configuration

Now, create a file called tools.yaml in the Cloud Shell Editor to hold our tools configuration:

cloudshell edit tools.yaml

The file uses multi-document YAML — each block separated by --- is a standalone resource. Every resource has a kind that declares what it is (sources for database connections, tools for agent-callable actions) and a type that specifies the backend (cloud-sql-postgres for the source, postgres-sql for SQL-based tools). A tool references its source by name, which is how Toolbox knows which connection pool to execute against. Environment variables use ${VAR_NAME} syntax and are resolved at startup.

Now, copy the following block into the tools.yaml file first:

# tools.yaml

# --- Data Source ---
kind: source
name: jobs-db
type: cloud-sql-postgres
project: ${GOOGLE_CLOUD_PROJECT}
region: ${REGION}
instance: ${DB_INSTANCE}
database: ${DB_NAME}
user: postgres
password: ${DB_PASSWORD}

---

This block defines the following resource:

  • Source (jobs-db) — tells Toolbox how to connect to your Cloud SQL PostgreSQL instance. The cloud-sql-postgres type uses the Cloud SQL connector internally, handling authentication and secure connections automatically. The ${GOOGLE_CLOUD_PROJECT}, ${REGION}, ${DB_INSTANCE}, ${DB_NAME}, and ${DB_PASSWORD} placeholders are resolved from environment variables at startup.

Next, append the following block under the --- separator in tools.yaml:

# --- Tool 1: Search jobs by role and/or tech stack ---
kind: tool
name: search-jobs
type: postgres-sql
source: jobs-db
description: >-
  Search for job listings by role category and/or tech stack.
  Use this tool when the developer wants to browse listings
  by role (e.g., Backend, Frontend, Data) or find jobs
  using a specific technology. Both parameters accept an
  empty string to match all values.
statement: |
  SELECT title, company, role, tech_stack, salary_range, location, openings
  FROM jobs
  WHERE ($1 = '' OR LOWER(role) = LOWER($1))
  AND ($2 = '' OR LOWER(tech_stack) LIKE '%' || LOWER($2) || '%')
  ORDER BY title
  LIMIT 10
parameters:
  - name: role
    type: string
    description: "The role category to filter by (e.g., 'Backend', 'Frontend', 'Data/AI', 'DevOps'). Use empty string for all roles."
  - name: tech_stack
    type: string
    description: "A technology to search for in the tech stack (partial match, e.g., 'Python', 'Kubernetes'). Use empty string for all tech stacks."

---

# --- Tool 2: Get full details for a specific job ---
kind: tool
name: get-job-details
type: postgres-sql
source: jobs-db
description: >-
  Get full details for a specific job listing including its description,
  salary range, location, and number of openings. Use this tool when the
  developer asks about a particular job by title or company.
statement: |
  SELECT title, company, role, tech_stack, salary_range, location, openings, description
  FROM jobs
  WHERE LOWER(title) LIKE '%' || LOWER($1) || '%'
  OR LOWER(company) LIKE '%' || LOWER($1) || '%'
parameters:
  - name: search_term
    type: string
    description: "The job title or company name to look up (partial match supported)."

---

This block defines the following resources:

  • Tools 1 and 2 (search-jobs, get-job-details) — standard SQL query tools. Each maps a tool name (what the agent sees) to a parameterized SQL statement (what the database executes). Parameters use $1, $2 positional placeholders. Toolbox executes these as prepared statements, which prevents SQL injection.

Let's continue. Append the following block under the --- separator in tools.yaml:

# --- Embedding Model ---
kind: embeddingModel
name: gemini-embedding
type: gemini
model: gemini-embedding-001
project: ${GOOGLE_CLOUD_PROJECT}
location: ${GOOGLE_CLOUD_LOCATION}
dimension: 3072

---

This block defines the following resource:

  • Embedding model (gemini-embedding) — configures Toolbox to call Gemini's gemini-embedding-001 model for generating 3072-dimensional text embeddings. Toolbox uses Application Default Credentials (ADC) to authenticate — no API key needed in Cloud Shell or Cloud Run. Note that the dimension configured here must match the vector(3072) column you created when seeding the database.

Let's continue. Append the following block under the --- separator in tools.yaml:

# --- Tool 3: Semantic search by description ---
kind: tool
name: search-jobs-by-description
type: postgres-sql
source: jobs-db
description: >-
  Find jobs that match a natural language description of what the developer
  is looking for. Use this tool when the developer describes their ideal job
  using interests, work style, career goals, or project type rather than a
  specific role or tech stack. Examples: "I want to work on AI chatbots,"
  "a remote job at a fintech startup," "something involving infrastructure
  and reliability."
statement: |
  SELECT title, company, role, tech_stack, salary_range, location, description
  FROM jobs
  WHERE description_embedding IS NOT NULL
  ORDER BY description_embedding <=> $1
  LIMIT 5
parameters:
  - name: search_query
    type: string
    description: "A natural language description of the kind of job the developer is looking for."
    embeddedBy: gemini-embedding

---

This block defines the following resource:

  • Tool 3 (search-jobs-by-description) — a vector search tool. The search_query parameter has embeddedBy: gemini-embedding, which tells Toolbox to intercept the raw text, send it to the embedding model, and use the resulting vector in the SQL statement. The <=> operator is pgvector's cosine distance — smaller values mean more similar descriptions.
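
To see the agent-facing contract, here is an illustrative Python sketch (the filename is just an example) that invokes this tool over Toolbox's HTTP API once the server from step 6 is running. Note that the client sends only plain text; Toolbox applies embeddedBy before the SQL executes:

# invoke_semantic_search.py: illustrative sketch; assumes the local Toolbox server
# from step 6 is running on port 5000.
import json
import urllib.request

payload = json.dumps(
    {"search_query": "remote work on AI chatbots and conversational agents"}
).encode()

req = urllib.request.Request(
    "http://127.0.0.1:5000/api/tool/search-jobs-by-description/invoke",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# Toolbox returns the rows as a JSON string under "result" (the same shape you
# will see with curl in step 6).
for job in json.loads(body["result"]):
    print(job["title"], "-", job["company"])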

Finally, append the last tool under the --- separator in tools.yaml:

# --- Tool 4: Add a new job listing with automatic embedding ---
kind: tool
name: add-job
type: postgres-sql
source: jobs-db
description: >-
  Add a new job listing to the platform. Use this tool when a user asks
  to post a job that is not currently listed.
statement: |
  INSERT INTO jobs (title, company, role, tech_stack, salary_range, location, openings, description, description_embedding)
  VALUES ($1, $2, $3, $4, $5, $6, CAST($7 AS INTEGER), $8, $9)
  RETURNING title, company
parameters:
  - name: title
    type: string
    description: "The job title (e.g., 'Senior Backend Engineer')."
  - name: company
    type: string
    description: "The company name (e.g., 'Stripe', 'Spotify')."
  - name: role
    type: string
    description: "The role category (e.g., 'Backend', 'Frontend', 'Data/AI', 'DevOps')."
  - name: tech_stack
    type: string
    description: "Comma-separated list of technologies (e.g., 'Python, FastAPI, GCP')."
  - name: salary_range
    type: string
    description: "The salary range (e.g., '$150-200K/year')."
  - name: location
    type: string
    description: "Work location and arrangement (e.g., 'Remote')."
  - name: openings
    type: string
    description: "The number of open positions."
  - name: description
    type: string
    description: "A short description of the job (2-3 sentences)."
  - name: description_vector
    type: string
    description: "Auto-generated embedding vector for the job description."
    valueFromParam: description
    embeddedBy: gemini-embedding

This block defines the following resource:

  • Tool 4 (add-job) — demonstrates vector ingestion. The description_vector parameter has two special fields:
  • valueFromParam: description — Toolbox copies the value from the description parameter into this one. The LLM never sees this parameter.
  • embeddedBy: gemini-embedding — Toolbox embeds the copied text into a vector before passing it to the SQL.

The result: one tool call stores both the raw description text and its vector embedding, without the agent knowing anything about embeddings.
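
As a rough illustration of that flow (assuming the HTTP invoke endpoint, like the agent, only requires the visible parameters), a direct call to add-job would supply just the text fields, with no vector anywhere in the payload:

# invoke_add_job.py: illustrative sketch only; in practice the agent makes this call for you.
import json
import urllib.request

payload = json.dumps({
    "title": "Robotics Software Engineer",
    "company": "Boston Dynamics",
    "role": "Robotics",
    "tech_stack": "Python, C++, ROS, Computer Vision",
    "salary_range": "$160-230K/year",
    "location": "Waltham MA, Hybrid",
    "openings": "2",
    "description": "Design autonomous navigation and manipulation algorithms for next-generation robots.",
    # No description_vector here: Toolbox copies description, embeds it, and stores both.
}).encode()

req = urllib.request.Request(
    "http://127.0.0.1:5000/api/tool/add-job/invoke",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["result"])  # RETURNING title, company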

The multi-document YAML format separates each resource with ---. Each document has kind, name, and type fields that define what it is. In summary, we have now configured all of the following:

  • The source database
  • Two tools (tools 1 and 2) that query the database with standard filters
  • The embedding model
  • A vector search tool (tool 3)
  • A vector ingestion tool (tool 4)

6. Running the MCP Toolbox Server

In the previous step, we wrote the configuration for our MCP Toolbox. Now we are ready to run the server.

Verify the seeded data

Before starting Toolbox, let's confirm the database setup has completed. Create a Python script scripts/verify_seed.py using the following command:

cloudshell edit scripts/verify_seed.py

Then, copy the following code into the scripts/verify_seed.py file:

#!/usr/bin/env python3
"""Verify the database has 15 jobs with embeddings."""

import os
import sys
from pathlib import Path
from dotenv import load_dotenv
from google.cloud.sql.connector import Connector
import pg8000

# Load environment variables
env_path = Path(__file__).parent.parent / '.env'
load_dotenv(env_path)

# Verify required environment variables
required_vars = ['GOOGLE_CLOUD_PROJECT', 'REGION', 'DB_PASSWORD', 'DB_INSTANCE', 'DB_NAME']
missing_vars = [var for var in required_vars if not os.environ.get(var)]

if missing_vars:
    print(f"ERROR: Missing environment variables: {', '.join(missing_vars)}", file=sys.stderr)
    sys.exit(1)


def verify_database():
    """Check that 15 jobs exist with embeddings."""
    connector = Connector()

    try:
        project = os.environ['GOOGLE_CLOUD_PROJECT']
        region = os.environ['REGION']
        password = os.environ['DB_PASSWORD']
        instance = os.environ['DB_INSTANCE']
        database = os.environ['DB_NAME']

        conn = connector.connect(
            f"{project}:{region}:{instance}",
            "pg8000",
            user="postgres",
            password=password,
            db=database
        )
        cursor = conn.cursor()

        # Count jobs and embeddings
        cursor.execute("SELECT COUNT(*) FROM jobs")
        job_count = cursor.fetchone()[0]

        cursor.execute("SELECT COUNT(*) FROM jobs WHERE description_embedding IS NOT NULL")
        embedding_count = cursor.fetchone()[0]

        print(f"Jobs: {job_count}/15")
        print(f"Embeddings: {embedding_count}/15")

        cursor.close()
        conn.close()

        if job_count == 15 and embedding_count == 15:
            print("\n✓ Database ready!")
            return True
        else:
            print("\n✗ Database not ready")
            return False

    except Exception as e:
        print(f"\nERROR: {e}", file=sys.stderr)
        return False
    finally:
        connector.close()


if __name__ == "__main__":
    success = verify_database()
    sys.exit(0 if success else 1)

This script checks the number of seeded job listings and their embeddings. Run it using the following command:

uv run scripts/verify_seed.py

If you see the following terminal output, the data is ready:

Jobs: 15/15
Embeddings: 15/15

✓ Database ready!

Start the Toolbox server

In the setup step earlier, we started downloading the toolbox executable. Make sure the binary file exists and the download completed; if not, download it again and wait for it to finish:

cd ~/build-agent-adk-toolbox-cloudsql
if [ ! -f toolbox ]; then
  curl -O https://storage.googleapis.com/mcp-toolbox-for-databases/v1.0.0/linux/amd64/toolbox
fi
chmod +x toolbox

We need to export the .env variables so they are visible to the Toolbox process. Run the following command to start the Toolbox server and write its console output to the logs/mcp_toolbox.log file:

set -a; source .env; set +a
./toolbox --config tools.yaml --enable-api > logs/mcp_toolbox.log 2>&1 &

You should see output in the logs/mcp_toolbox.log file confirming the server is ready, as shown below:

... INFO "Initialized 1 sources: jobs-db"
... INFO "Initialized 0 authServices: "
... INFO "Using Vertex AI backend for Gemini embedding" 
... INFO "Initialized 1 embeddingModels: gemini-embedding" 
... INFO "Initialized 4 tools: add-job, search-jobs, get-job-details, search-jobs-by-description" 
...
... INFO "Server ready to serve!"

Verify the tools

Query the Toolbox API to list all registered tools:

curl -s http://localhost:5000/api/toolset | uv run -m json.tool

You should see the tools with their descriptions and parameters, as shown below:

...
       "search-jobs-by-description": {
            "description": "Find jobs that match a natural language description of what the developer is looking for. Use this tool when the developer describes their ideal job using interests, work style, career goals, or project type rather than a specific role or tech stack. Examples: \"I want to work on AI chatbots,\" \"a remote job at a fintech startup,\" \"something involving infrastructure and reliability.\"",
            "parameters": [
                {
                    "name": "search_query",
                    "type": "string",
                    "required": true,
                    "description": "A natural language description of the kind of job the developer is looking for.",
                    "authSources": []
                }
            ],
            "authRequired": []
        }
...

Test the search-jobs tool directly:

curl -s -X POST http://localhost:5000/api/tool/search-jobs/invoke \
  -H "Content-Type: application/json" \
  -d '{"role": "Backend", "tech_stack": ""}' | jq '.result | fromjson'

The response should contain the two backend engineering jobs from your seed data.

[
  {
    "title": "Backend Engineer (Payments)",
    "company": "Square",
    "role": "Backend",
    "tech_stack": "Java, Spring Boot, PostgreSQL, Kafka",
    "salary_range": "$160-220K/year",
    "location": "San Francisco, Hybrid",
    "openings": 3
  },
  {
    "title": "Senior Backend Engineer",
    "company": "Stripe",
    "role": "Backend",
    "tech_stack": "Go, PostgreSQL, gRPC, Kubernetes",
    "salary_range": "$180-250K/year",
    "location": "San Francisco, Hybrid",
    "openings": 3
  }
]

7. Build the ADK Agent

We will use the Python ADK for this project. Add the required dependencies:

uv add google-adk==1.29.0 toolbox-adk==1.0.0

  • google-adk — Google's Agent Development Kit, including the Gemini SDK.
  • toolbox-adk — ADK integration for MCP Toolbox for Databases.

Create the agent directory structure

ADK expects a specific folder layout: a directory named after your agent containing __init__.py, agent.py, and .env. ADK provides a built-in command to quickly scaffold this structure:

uv run adk create jobs_agent \
    --model gemini-2.5-flash \
    --project ${GOOGLE_CLOUD_PROJECT} \
    --region ${GOOGLE_CLOUD_LOCATION}

Your directory should now look like this:

build-agent-adk-toolbox-cloudsql/
├── jobs_agent/
│   ├── __init__.py
│   ├── agent.py
│   └── .env
├── logs
├── scripts
└── ...

Next, we will integrate the ADK agent with the running Toolbox server and test all four tools — standard queries, semantic search, and vector ingestion. The agent code is minimal: all database logic lives in tools.yaml.

Configure the agent's environment

ADK reads GOOGLE_GENAI_USE_VERTEXAI, GOOGLE_CLOUD_PROJECT, and GOOGLE_CLOUD_LOCATION from the agent's .env file, which adk create generated in the previous step. The only agent-specific variable left is TOOLBOX_URL — append it to the agent's .env file:

echo -e "\nTOOLBOX_URL=http://127.0.0.1:5000" >> jobs_agent/.env

Update the agent module

Open jobs_agent/agent.py in the Cloud Shell Editor

cloudshell edit jobs_agent/agent.py

and overwrite the content with the following code:

# jobs_agent/agent.py
import os

from google.adk.agents import LlmAgent
from toolbox_adk import ToolboxToolset

TOOLBOX_URL = os.environ.get("TOOLBOX_URL", "http://127.0.0.1:5000")

toolbox = ToolboxToolset(TOOLBOX_URL)

root_agent = LlmAgent(
    name="jobs_agent",
    model="gemini-2.5-flash",
    instruction="""You are a helpful assistant at "TechJobs," a tech job listing platform.

Your job:
- Help developers browse job listings by role or tech stack.
- Provide full details about specific positions, including salary range and number of openings.
- Recommend jobs based on natural language descriptions of what the developer is looking for.
- Add new job listings to the platform when asked.

When a developer asks about a specific job by title or company, use the get-job-details tool.
When a developer asks for a specific role category or tech stack, use the search-jobs tool.
When a developer describes what kind of job they want — by interest area, work style,
career goals, or project type — use the search-jobs-by-description tool for semantic search.
When in doubt between search-jobs and search-jobs-by-description, prefer
search-jobs-by-description — it searches job descriptions and finds more relevant matches.

If a position has no openings (openings is 0), let the developer know
and suggest similar alternatives from the search results.

Be conversational, knowledgeable, and concise.""",
    tools=[toolbox],
)

Notice that there is no database code in here — ToolboxToolset connects to the Toolbox server at startup and loads all available tools. The agent calls tools by name; Toolbox translates those calls into SQL queries against Cloud SQL.

The TOOLBOX_URL environment variable defaults to http://127.0.0.1:5000 for local development. When you deploy to Cloud Run later, you override this with the Toolbox service's Cloud Run URL — no code changes needed.

The instruction names each query tool explicitly (search-jobs, get-job-details, and search-jobs-by-description) and describes when to add new listings, so Gemini can pick the right tool for each request.

Test the agent

Start the ADK dev UI:

cd ~/build-agent-adk-toolbox-cloudsql
uv run adk web --allow_origins "regex:https://.*\.cloudshell\.dev"

Open the URL shown in the terminal (typically http://localhost:8000) using Cloud Shell's Web Preview feature, or Ctrl+click the URL in the terminal. Select jobs_agent from the agent dropdown in the top-left corner.

Test standard queries

Try these prompts to verify the standard SQL tools:

What backend engineering jobs do you have?
Any jobs using Kubernetes?
Tell me about the Cloud Architect position

93ac33e7f73aa0b9.png 240c53376042a916.png

Test semantic search

Try natural language descriptions that don't map to a specific role or tech stack:

I want a remote job where I can work on AI and machine learning
Find me something in fintech with good work-life balance
I'm interested in infrastructure and reliability engineering

The agent will try to pick the right tool based on the query type: structured filters go through search-jobs, natural language descriptions go through search-jobs-by-description.

b0ea629f5c9b4c26.png

Test vector ingestion

Ask the agent to add a new job:

Add a new job: 'Robotics Software Engineer' at Boston Dynamics, role Robotics, tech stack: Python, C++, ROS, Computer Vision, salary $160-230K/year, location Waltham MA, Hybrid, 2 openings. Description: Design and implement autonomous navigation and manipulation algorithms for next-generation robots. Work on perception pipelines using computer vision and lidar, develop motion planning software in C++ and Python, and test systems on real hardware in warehouse and logistics environments.

c601a7a9bc0a705b.png

Now try to search for it:

Find me jobs involving autonomous systems and working with physical hardware

The embedding was generated automatically during the INSERT — no separate step needed.

5a3d8e6f523dc18b.png
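
If you want to confirm this at the database level, a short check like the illustrative sketch below (the filename is just an example; it reuses the verification script's connection pattern) shows the new row landed with a non-null embedding:

# scripts/verify_new_job.py: illustrative sketch, not a required codelab step
import os
from pathlib import Path

from dotenv import load_dotenv
from google.cloud.sql.connector import Connector

load_dotenv(Path(__file__).parent.parent / ".env")

connector = Connector()
conn = connector.connect(
    f"{os.environ['GOOGLE_CLOUD_PROJECT']}:{os.environ['REGION']}:{os.environ['DB_INSTANCE']}",
    "pg8000",
    user="postgres",
    password=os.environ["DB_PASSWORD"],
    db=os.environ["DB_NAME"],
)
cursor = conn.cursor()

# The row was inserted by the add-job tool; its embedding should already be populated.
cursor.execute(
    "SELECT title, description_embedding IS NOT NULL FROM jobs WHERE company = %s",
    ("Boston Dynamics",),
)
for title, has_embedding in cursor.fetchall():
    print(f"{title}: embedding stored = {has_embedding}")

cursor.close()
conn.close()
connector.close()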

You now have a fully working agentic RAG application using ADK, MCP Toolbox, and Cloud SQL. Congratulations! Let's take it a step further and deploy both apps to Cloud Run.

Now, stop the dev UI by pressing Ctrl+C in the terminal (twice if needed) before proceeding.

8. Deploy to Cloud Run

The agent and Toolbox work locally. This step deploys both as Cloud Run services so they're accessible over the internet. The Toolbox service runs as an MCP server on Cloud Run, and the agent service connects to it.

Prepare the Toolbox for deployment

Create a deployment directory for the Toolbox service:

cd ~/build-agent-adk-toolbox-cloudsql
mkdir -p deploy-toolbox
cp toolbox tools.yaml deploy-toolbox/

Create the Dockerfile for the Toolbox. Open deploy-toolbox/Dockerfile in the Cloud Shell Editor:

cloudshell edit deploy-toolbox/Dockerfile

Then copy the following content into it:

# deploy-toolbox/Dockerfile
FROM debian:bookworm-slim
RUN apt-get update && apt-get install -y ca-certificates && rm -rf /var/lib/apt/lists/*
WORKDIR /app
COPY toolbox tools.yaml ./
RUN chmod +x toolbox
EXPOSE 8080
CMD ["./toolbox", "--config", "tools.yaml", "--enable-api", "--address", "0.0.0.0", "--port", "8080"]

The Toolbox binary and tools.yaml are packaged into a minimal Debian image. Cloud Run routes traffic to port 8080.

Deploy the Toolbox service

cd ~/build-agent-adk-toolbox-cloudsql
gcloud run deploy toolbox-service \
  --source deploy-toolbox/ \
  --region $REGION \
  --set-env-vars "DB_PASSWORD=$DB_PASSWORD,DB_INSTANCE=$DB_INSTANCE,DB_NAME=$DB_NAME,GOOGLE_CLOUD_PROJECT=$GOOGLE_CLOUD_PROJECT,REGION=$REGION,GOOGLE_CLOUD_LOCATION=$GOOGLE_CLOUD_LOCATION" \
  --allow-unauthenticated \
  --quiet > logs/deploy_toolbox.log 2>&1 &

This command submits the source to Cloud Build, builds a container image, pushes it to Artifact Registry, and deploys it to Cloud Run. It will take a few minutes — you can inspect the deployment log in the logs/deploy_toolbox.log file.

Prepare the agent for deployment

While the Toolbox builds, set up the agent's deployment files.

Create a Dockerfile in the project root. Open Dockerfile in the Cloud Shell Editor:

cloudshell edit Dockerfile

Then, copy the following content

# Dockerfile
FROM ghcr.io/astral-sh/uv:python3.12-trixie-slim
WORKDIR /app
COPY pyproject.toml ./
COPY uv.lock ./
RUN uv sync --no-dev
COPY jobs_agent/ jobs_agent/
EXPOSE 8080
CMD ["uv", "run", "adk", "web", "--host", "0.0.0.0", "--port", "8080"]

This Dockerfile uses ghcr.io/astral-sh/uv as the base image, which includes both Python and uv pre-installed — no need to install uv separately via pip.

Create a .dockerignore file to exclude unnecessary files from the container image:

cloudshell edit .dockerignore

Then copy the following content into it:

# .dockerignore
.venv/
__pycache__/
*.pyc
.env
jobs_agent/.env
toolbox
tools.yaml
scripts/
logs/
deploy-toolbox/

Deploy the agent service

Wait for the Toolbox deployment to complete. Check logs/deploy_toolbox.log to verify the process finished, then retrieve the service's Cloud Run URL using the following command:

TOOLBOX_URL=$(gcloud run services describe toolbox-service \
  --region=$REGION \
  --format='value(status.url)')
echo "Toolbox URL: $TOOLBOX_URL"

You should see output similar to this:

Toolbox URL: https://toolbox-service-xxxxxx-xx.a.run.app

Then, verify the deployed Toolbox is working:

curl -s "$TOOLBOX_URL/api/toolset" | python3 -m json.tool | head -5

If the output looks like this example, the deployment succeeded:

{
    "serverVersion": "1.0.0+binary.linux.amd64.c5524d3",
    "tools": {
        "add-job": {
            "description": "Add a new job listing to the platform. Use this tool when a user asks to post a job that is not currently listed.",

Next, let's deploy the agent, passing the Toolbox URL as an environment variable:

cd ~/build-agent-adk-toolbox-cloudsql
gcloud run deploy jobs-agent \
  --source . \
  --region $REGION \
  --set-env-vars "TOOLBOX_URL=$TOOLBOX_URL,GOOGLE_CLOUD_PROJECT=$GOOGLE_CLOUD_PROJECT,GOOGLE_CLOUD_LOCATION=$GOOGLE_CLOUD_LOCATION,GOOGLE_GENAI_USE_VERTEXAI=TRUE" \
  --allow-unauthenticated \
  --quiet

The agent code reads TOOLBOX_URL from the environment (you set this up previously). Locally it points to http://127.0.0.1:5000; on Cloud Run it points to the Toolbox service URL. No code changes needed.

Test the deployed agent

Retrieve the agent's Cloud Run URL:

AGENT_URL=$(gcloud run services describe jobs-agent \
  --region=$REGION \
  --format='value(status.url)')
echo "Agent URL: $AGENT_URL"

Open the URL in your browser. The ADK dev UI loads — the same interface you've been using locally, now running on Cloud Run.

Select jobs_agent from the dropdown and test:

What backend engineering jobs do you have?
I want a remote job working on AI and machine learning

Both queries work through the deployed services: the agent on Cloud Run calls the Toolbox on Cloud Run, which queries Cloud SQL.

9. Congratulations / Clean Up

You've built and deployed a smart job board assistant that uses MCP Toolbox for Databases to bridge an ADK agent and Cloud SQL PostgreSQL — with both standard SQL queries and semantic vector search.

What you've learned

  • How MCP standardizes tool access for AI agents, and how MCP Toolbox for Databases applies this specifically to database operations — replacing custom database code with declarative YAML configuration
  • How to configure Cloud SQL PostgreSQL as a Toolbox data source using the cloud-sql-postgres source type
  • How to define standard SQL query tools with parameterized statements that prevent SQL injection
  • How to enable vector search using pgvector and gemini-embedding-001, with the embeddedBy parameter for automatic query embedding
  • How valueFromParam enables automatic vector ingestion — the LLM provides a text description, and Toolbox silently copies, embeds, and stores the vector alongside the text
  • How ADK's ToolboxToolset loads tools from a running Toolbox server, keeping agent code minimal and database logic fully decoupled
  • How to deploy both the Toolbox MCP server and the ADK agent to Cloud Run as separate services

Clean up

To avoid incurring charges to your Google Cloud account for the resources created in this codelab, you can either delete the individual resources or delete the entire project.

Option 1: Delete the project

The easiest way to clean up is to delete the project. This removes all resources associated with the project.

gcloud projects delete $GOOGLE_CLOUD_PROJECT

Option 2: Delete individual resources

If you want to keep the project but remove only the resources created in this codelab:

gcloud run services delete jobs-agent --region=$REGION --quiet
gcloud run services delete toolbox-service --region=$REGION --quiet
gcloud sql instances delete jobs-instance --quiet
gcloud artifacts repositories delete cloud-run-source-deploy --location=$REGION --quiet 2>/dev/null