Databases as Tools: Agentic RAG with ADK, MCP Toolbox, and Cloud SQL

1. Introduction

An AI agent is only as useful as the data it can access. Most real-world data lives in databases, and connecting an agent to a database usually means writing connection management, query logic, and embedding pipelines inside the agent code. Every agent that needs database access repeats this work, and every query change requires redeploying the agent.

This codelab shows a different approach. You declare database tools in a YAML file (standard SQL queries, vector similarity search, even automatic embedding generation), and MCP Toolbox for Databases handles all database operations as an MCP server. The agent code stays small: load the tools and let Gemini decide which one to call.

What you'll build

A smart job board assistant for "TechJobs": a Gemini-powered ADK agent that helps developers browse tech job listings with standard filters (role, tech stack) and discover jobs through natural-language descriptions such as "I want a remote job working on AI chatbots." The agent reads from and writes to a Cloud SQL for PostgreSQL database entirely through MCP Toolbox for Databases, which handles all database access, including automatic embedding generation for vector search. By the end, both Toolbox and the agent run on Cloud Run.

What you'll learn

How MCP (Model Context Protocol) standardizes tool access for AI agents, and how MCP Toolbox for Databases applies that standard to database operations
How to set up MCP Toolbox for Databases as middleware between an ADK agent and Cloud SQL for PostgreSQL
How to define database tools declaratively in tools.yaml, with no database code in the agent
How to build an ADK agent that loads tools from a running Toolbox server using ToolboxToolset
How to generate vector embeddings with Cloud SQL's built-in embedding() function and enable semantic search with pgvector
How to use the valueFromParam feature for automatic vector ingestion in write operations
How to deploy both the Toolbox server and the ADK agent to Cloud Run

Prerequisites

A Google Cloud account with a trial billing account
Basic knowledge of Python and SQL
Prior experience with cloud databases and ADK is helpful

2. Set up the environment

This step prepares the Cloud Shell environment, configures your Google Cloud project, and sets up the working directory.

Open Cloud Shell

Open Cloud Shell in your browser. Cloud Shell provides a preconfigured environment with all the tools you need for this codelab. Click Authorize when prompted.

Then click "View" -> "Terminal" to open the terminal.

This is the main interface for the codelab, with the IDE on top and the terminal below.

Set up the working directory

Create the working directory. All the code you write in this codelab lives here.

mkdir -p ~/build-agent-adk-toolbox-cloudsql
cloudshell workspace ~/build-agent-adk-toolbox-cloudsql && cd ~/build-agent-adk-toolbox-cloudsql

Next, create the directories that will hold the scripts and the logs.

mkdir -p ~/build-agent-adk-toolbox-cloudsql/scripts
mkdir -p ~/build-agent-adk-toolbox-cloudsql/logs

Set up the Google Cloud project

Create a .env file with the location variables.

# For Vertex AI / Gemini API calls
echo "GOOGLE_CLOUD_LOCATION=global" > .env
# For Cloud SQL, Cloud Run, Artifact Registry
echo "REGION=us-central1" >> .env

Important: The steps below help you quickly set up a Google Cloud project linked to a trial billing account (this requirement is strict). If you prefer to use an existing project of your own, skip the script-based steps and do the following instead:

Add your project ID to the .env file as the GOOGLE_CLOUD_PROJECT variable
Set the active project in the terminal with gcloud config set project your-project-id

After that, you can jump straight to the "Enable the required APIs" section.

If you're not sure, continue with the section below.

To simplify project setup in the terminal, download this project setup script into your working directory.

curl -sL https://raw.githubusercontent.com/alphinside/cloud-trial-project-setup/main/setup_verify_trial_project.sh -o setup_verify_trial_project.sh

Run the script. It verifies your trial billing account, creates a new project (or validates an existing one), saves the project ID to the .env file in the current directory, and sets the active project in gcloud.

bash setup_verify_trial_project.sh && source .env

The script does the following:

Verifies that you have an active trial billing account
Checks for an existing project in .env (if any)
Creates a new project or reuses the existing one
Links the trial billing account to the project
Saves the project ID to .env
Sets the project as the active gcloud project

Confirm that the project is set correctly by checking the yellow text next to the working directory in the Cloud Shell terminal prompt. It should show your project ID.

Enable the required APIs

Next, enable the APIs for the products this codelab uses.

gcloud services enable \
  aiplatform.googleapis.com \
  sqladmin.googleapis.com \
  compute.googleapis.com \
  run.googleapis.com \
  cloudbuild.googleapis.com \
  artifactregistry.googleapis.com

Vertex AI API (aiplatform.googleapis.com) - your agent uses Gemini models, and Toolbox uses the embedding API for vector search
Cloud SQL Admin API (sqladmin.googleapis.com) - you provision and manage the PostgreSQL instance
Compute Engine API (compute.googleapis.com) - required for creating the Cloud SQL instance
Cloud Run, Cloud Build, Artifact Registry - used in the deployment steps later in this codelab

3. Prepare the database initialization scripts

This step prepares an automated setup: a Bash script that creates the Cloud SQL instance, verifies that it is ready, and creates the database, and a Python script that seeds the job listings and generates embeddings. Running the Bash script later does all of this in one pass.

First, add the database password, instance name, and database name to the .env file, then reload it:

echo "DB_PASSWORD=techjobs-pwd" >> .env
echo "DB_INSTANCE=jobs-instance" >> .env
echo "DB_NAME=jobs_db" >> .env
source .env
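Later steps fail in confusing ways if a variable is missing from .env, so a quick check can help. The following is a minimal sketch using only the standard library; the helper names `parse_env` and `missing_vars` are hypothetical, and the parser assumes simple KEY=VALUE lines like the ones written above.

```python
from pathlib import Path

# Variables that the commands and scripts in this codelab read from .env.
REQUIRED = [
    "GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION", "REGION",
    "DB_PASSWORD", "DB_INSTANCE", "DB_NAME",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(text: str, required=REQUIRED) -> list[str]:
    """Return the required names that are absent or empty, in REQUIRED order."""
    env = parse_env(text)
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    env_file = Path(".env")
    text = env_file.read_text() if env_file.exists() else ""
    print("missing:", missing_vars(text) or "none")
```

Run it from the working directory; an empty result means every variable used so far is present.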

Create the Bash script for the instance and database

Next, create the scripts/setup_database.sh script with the following commands.

mkdir -p ~/build-agent-adk-toolbox-cloudsql/scripts
cloudshell edit scripts/setup_database.sh

Then copy the following code into scripts/setup_database.sh.

#!/bin/bash
set -e
source .env

echo "================================================"
echo "Database Setup"
echo "================================================"
echo ""

# Step 1: Create Cloud SQL instance
echo "[1/5] Creating Cloud SQL instance..."

# Check if instance already exists
if gcloud sql instances describe "$DB_INSTANCE" --quiet >/dev/null 2>&1; then
    echo "      Instance already exists"
else
    echo "      Creating instance (takes 5-10 minutes)..."
    gcloud sql instances create "$DB_INSTANCE" \
        --database-version=POSTGRES_17 \
        --tier=db-custom-1-3840 \
        --edition=ENTERPRISE \
        --region="$REGION" \
        --root-password="$DB_PASSWORD" \
        --enable-google-ml-integration \
        --database-flags cloudsql.enable_google_ml_integration=on \
        --quiet
fi
echo "      ✓ Instance ready"
echo ""

# Step 2: Verify instance is ready
echo "[2/5] Verifying instance state..."

STATE=$(gcloud sql instances describe "$DB_INSTANCE" --format='value(state)')

if [ "$STATE" != "RUNNABLE" ]; then
    echo "ERROR: Instance not ready (state: $STATE)"
    exit 1
fi
echo "      ✓ Instance is RUNNABLE"
echo ""

# Step 3: Grant IAM permissions
echo "[3/5] Granting Vertex AI permissions..."

SERVICE_ACCOUNT=$(gcloud sql instances describe "$DB_INSTANCE" \
    --format='value(serviceAccountEmailAddress)')

if [ -z "$SERVICE_ACCOUNT" ]; then
    echo "ERROR: Could not retrieve service account"
    exit 1
fi

gcloud projects add-iam-policy-binding "$GOOGLE_CLOUD_PROJECT" \
    --member="serviceAccount:$SERVICE_ACCOUNT" \
    --role="roles/aiplatform.user" \
    --quiet

echo "      ✓ Permissions granted"
echo ""

# Step 4: Create database
echo "[4/5] Creating database..."

# Check if database already exists
if gcloud sql databases describe "$DB_NAME" \
    --instance="$DB_INSTANCE" --quiet >/dev/null 2>&1; then
    echo "      Database already exists"
else
    gcloud sql databases create "$DB_NAME" \
        --instance="$DB_INSTANCE" \
        --quiet
fi

echo "      ✓ Database '$DB_NAME' ready"
echo ""

# Step 5: Seed database and generate embeddings
echo "[5/5] Seeding database and generating embeddings..."

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
SETUP_SCRIPT="${SCRIPT_DIR}/setup_jobs_db.py"

if [ ! -f "$SETUP_SCRIPT" ]; then
    echo "ERROR: Setup script not found: $SETUP_SCRIPT"
    exit 1
fi

uv run "$SETUP_SCRIPT"

echo ""
echo "================================================"
echo "Setup complete!"
echo "================================================"
echo ""

Create the Python script for data seeding

Next, create the Python seeding script scripts/setup_jobs_db.py with the command below.

cloudshell edit scripts/setup_jobs_db.py

Then copy the following code into scripts/setup_jobs_db.py.

import os
import sys
from pathlib import Path
from dotenv import load_dotenv
from google.cloud.sql.connector import Connector
import pg8000
import time

# Load environment variables from .env file
env_path = Path(__file__).parent.parent / '.env'
load_dotenv(env_path)
EMBEDDING_MODEL='gemini-embedding-001'

# Verify required environment variables
required_vars = ['GOOGLE_CLOUD_PROJECT', 'REGION', 'DB_PASSWORD', 'DB_INSTANCE', 'DB_NAME']
missing_vars = [var for var in required_vars if not os.environ.get(var)]

if missing_vars:
    print(f"ERROR: Missing required environment variables: {', '.join(missing_vars)}", file=sys.stderr)
    print(f"", file=sys.stderr)
    print(f"Expected .env file location: {env_path}", file=sys.stderr)
    if not env_path.exists():
        print(f"✗ File not found at that location", file=sys.stderr)
    else:
        print(f"✓ File exists but is missing the variables above", file=sys.stderr)
    print(f"", file=sys.stderr)
    print(f"Make sure your .env file contains:", file=sys.stderr)
    for var in missing_vars:
        print(f"  {var}=<value>", file=sys.stderr)
    sys.exit(1)

# Job listings data (fictional, for tutorial purposes only)
JOBS = [
    ("Senior Backend Engineer", "Stripe", "Backend", "Go, PostgreSQL, gRPC, Kubernetes", "$180-250K/year", "San Francisco, Hybrid", 3,
     "Design and build high-throughput microservices powering payment infrastructure for millions of businesses. Optimize Go services for sub-100ms latency at scale, work with PostgreSQL and Redis for data persistence, and deploy on Kubernetes clusters handling billions of API calls."),
    ("Machine Learning Engineer", "Spotify", "Data/AI", "Python, TensorFlow, BigQuery, Vertex AI", "$170-230K/year", "Stockholm, Remote", 2,
     "Build and deploy ML models for music recommendation and personalization systems serving hundreds of millions of listeners. Design feature pipelines in BigQuery, train models using distributed computing, and serve predictions through real-time APIs processing thousands of requests per second."),
    ("Frontend Engineer", "Vercel", "Frontend", "React, TypeScript, Next.js", "$140-190K/year", "Remote", 4,
     "Build developer-facing dashboard interfaces and deployment tools used by millions of developers worldwide. Create responsive, accessible React components for project management, analytics, and real-time deployment monitoring with a focus on developer experience."),
    ("DevOps Engineer", "Datadog", "DevOps", "Terraform, GCP, Docker, Kubernetes, ArgoCD", "$160-220K/year", "New York, Hybrid", 2,
     "Manage cloud infrastructure powering an observability platform used by thousands of engineering teams. Automate deployment pipelines with ArgoCD, manage multi-cloud Kubernetes clusters, and implement infrastructure-as-code with Terraform across production environments."),
    ("Mobile Engineer (Android)", "Grab", "Mobile", "Kotlin, Jetpack Compose, GraphQL", "$120-170K/year", "Singapore, Hybrid", 3,
     "Develop features for a super-app serving millions of users across Southeast Asia. Build modern Android UIs with Jetpack Compose, integrate GraphQL APIs, and optimize app performance for diverse device capabilities and network conditions."),
    ("Data Engineer", "Airbnb", "Data", "Python, Apache Spark, Airflow, BigQuery", "$160-210K/year", "San Francisco, Hybrid", 2,
     "Build data pipelines that process booking, search, and pricing data for a global travel marketplace. Design ETL workflows with Apache Spark and Airflow, maintain data warehouses in BigQuery, and ensure data quality for analytics and machine learning teams."),
    ("Full Stack Engineer", "Revolut", "Full Stack", "TypeScript, Node.js, React, PostgreSQL", "$130-180K/year", "London, Remote", 5,
     "Build the next generation of financial products making banking accessible to millions of users across 35 countries. Develop real-time trading interfaces with React and WebSockets, build Node.js APIs handling market data streams, and design PostgreSQL schemas for financial transactions."),
    ("Site Reliability Engineer", "Cloudflare", "SRE", "Go, Prometheus, Grafana, GCP, Terraform", "$170-230K/year", "Austin, Hybrid", 2,
     "Ensure 99.99% uptime for a global network handling millions of requests per second. Define SLOs, build monitoring dashboards with Prometheus and Grafana, manage incident response, and automate infrastructure scaling across 300+ data centers worldwide."),
    ("Cloud Architect", "Google Cloud", "Cloud", "GCP, Terraform, Kubernetes, Python", "$200-280K/year", "Seattle, Hybrid", 1,
     "Help enterprises modernize their infrastructure on Google Cloud. Design multi-region architectures, lead migration projects from on-premises to GKE, and build reference implementations using Terraform and Cloud Foundation Toolkit."),
    ("Backend Engineer (Payments)", "Square", "Backend", "Java, Spring Boot, PostgreSQL, Kafka", "$160-220K/year", "San Francisco, Hybrid", 3,
     "Build payment processing systems handling millions of transactions for businesses of all sizes. Design event-driven architectures using Kafka, implement idempotent payment flows with Spring Boot, and ensure PCI-DSS compliance across all services."),
    ("AI Engineer", "Hugging Face", "Data/AI", "Python, LangChain, Vertex AI, FastAPI, PostgreSQL", "$150-210K/year", "Paris, Remote", 2,
     "Build AI-powered tools for the largest open-source ML community. Develop RAG pipelines that index and search model documentation, create conversational agents using LangChain, and deploy AI services with FastAPI on cloud infrastructure."),
    ("Platform Engineer", "Coinbase", "Platform", "Rust, Kubernetes, AWS, Terraform", "$180-250K/year", "Remote", 0,
     "Build the infrastructure platform for a leading cryptocurrency exchange. Develop high-performance matching engines in Rust, manage Kubernetes clusters for microservices, and design CI/CD pipelines that enable rapid feature deployment with zero downtime."),
    ("QA Automation Engineer", "Shopify", "QA", "Python, Selenium, Cypress, Jenkins", "$110-160K/year", "Toronto, Hybrid", 3,
     "Design and maintain automated test suites for a commerce platform powering millions of merchants. Build end-to-end test frameworks with Cypress and Selenium, integrate tests into Jenkins CI pipelines, and establish quality gates that prevent regressions in checkout and payment flows."),
    ("Security Engineer", "CrowdStrike", "Security", "Python, SIEM, Kubernetes, Penetration Testing", "$170-240K/year", "Austin, On-site", 1,
     "Protect enterprise customers from cyber threats on a leading endpoint security platform. Conduct penetration testing, design security monitoring with SIEM tools, implement zero-trust networking in Kubernetes environments, and lead incident response for security events."),
    ("Product Engineer", "GitLab", "Full Stack", "Go, React, PostgreSQL, Redis, GCP", "$140-200K/year", "Remote", 4,
     "Own features end-to-end for an all-in-one DevSecOps platform used by millions of developers. Build Go microservices for CI/CD pipelines, create React frontends for code review and project management, and collaborate with product managers to iterate on user-facing features using data-driven development."),
]


def get_connection():
    """Create a connection to Cloud SQL using the connector."""
    project = os.environ['GOOGLE_CLOUD_PROJECT']
    region = os.environ['REGION']
    password = os.environ['DB_PASSWORD']
    instance = os.environ['DB_INSTANCE']
    database = os.environ['DB_NAME']

    connector = Connector()
    conn = connector.connect(
        f"{project}:{region}:{instance}",
        "pg8000",
        user="postgres",
        password=password,
        db=database
    )
    return conn, connector


def create_schema(cursor):
    """Create extensions and jobs table."""
    cursor.execute("CREATE EXTENSION IF NOT EXISTS google_ml_integration")
    cursor.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS jobs (
            id SERIAL PRIMARY KEY,
            title VARCHAR NOT NULL,
            company VARCHAR NOT NULL,
            role VARCHAR NOT NULL,
            tech_stack VARCHAR NOT NULL,
            salary_range VARCHAR NOT NULL,
            location VARCHAR NOT NULL,
            openings INTEGER NOT NULL,
            description TEXT NOT NULL,
            description_embedding vector(3072)
        )
    """)


def seed_jobs(cursor, conn):
    """Insert job listings."""
    cursor.execute("SELECT COUNT(*) FROM jobs")
    existing_count = cursor.fetchone()[0]

    if existing_count > 0:
        print(f"      {existing_count} jobs already exist, skipping seed")
        return 0

    cursor.executemany("""
        INSERT INTO jobs (title, company, role, tech_stack, salary_range, location, openings, description)
        VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
    """, JOBS)
    conn.commit()
    return len(JOBS)


def generate_embeddings(cursor, conn):
    """Generate embeddings using Cloud SQL's embedding() function."""
    cursor.execute("SELECT COUNT(*) FROM jobs WHERE description_embedding IS NULL")
    null_count = cursor.fetchone()[0]

    if null_count == 0:
        print("      All jobs already have embeddings")
        return 0

    cursor.execute(f"""
        UPDATE jobs
        SET description_embedding = embedding('{EMBEDDING_MODEL}', description)::vector
        WHERE description_embedding IS NULL
    """)
    rows_updated = cursor.rowcount
    conn.commit()
    return rows_updated


def main():
    conn, connector = get_connection()
    cursor = conn.cursor()

    try:
        create_schema(cursor)
        conn.commit()

        seeded = seed_jobs(cursor, conn)
        if seeded > 0:
            print(f"      ✓ Inserted {seeded} jobs")

        # Waiting for vertex role propagation
        time.sleep(60)
        embedded = generate_embeddings(cursor, conn)
        if embedded > 0:
            print(f"      ✓ Generated {embedded} embeddings")

    except Exception as e:
        print(f"ERROR: {e}", file=sys.stderr)
        sys.exit(1)
    finally:
        cursor.close()
        conn.close()
        connector.close()


if __name__ == "__main__":
    main()
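The script above waits a fixed 60 seconds for the Vertex AI role to propagate before generating embeddings. One alternative, shown here only as a sketch and not as part of the codelab's script, is to retry with exponential backoff. The helper name `retry` and its parameters are hypothetical.

```python
import time


def retry(operation, attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between tries
    and re-raises the last error if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

In the seeding script this could wrap the embedding step, for example `retry(lambda: generate_embeddings(cursor, conn))`. A failed statement would probably also need `conn.rollback()` before the next attempt, since PostgreSQL aborts the current transaction on error.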

With both scripts in place, move on to the next step.

4. Create and seed the database

The scripts are ready to run. They need Python, so set up the Python project first.

Set up the Python project

uv is a fast Python package and project manager written in Rust (see the uv documentation). This codelab uses it for speed and simplicity in managing the Python project.

Initialize the Python project and add the required dependencies.

uv init
uv add cloud-sql-python-connector --extra pg8000
uv add python-dotenv

Note that we use the cloud-sql-python-connector Python SDK here to open a secure connection to the database instance, authenticated with Application Default Credentials.
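The connector identifies the instance by a connection name of the form project:region:instance, which the seeding script builds from the .env values. As a small sketch, the hypothetical helper below (`instance_connection_name` is not part of the connector library) assembles that name and fails early if a piece is missing.

```python
import os


def instance_connection_name(env=os.environ) -> str:
    """Build the 'project:region:instance' name from environment values."""
    keys = ("GOOGLE_CLOUD_PROJECT", "REGION", "DB_INSTANCE")
    parts = [env.get(key, "") for key in keys]
    if not all(parts):
        missing = [key for key, value in zip(keys, parts) if not value]
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    return ":".join(parts)
```

With the values used in this codelab, the result would look like `your-project-id:us-central1:jobs-instance`.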

Run the setup script

Now run the setup script in the background. Its console output is written to logs/database_setup.log, which you can inspect to follow progress. You can move on to the next section while it runs.

mkdir -p ~/build-agent-adk-toolbox-cloudsql/logs
bash scripts/setup_database.sh > logs/database_setup.log 2>&1 &

Download the Toolbox binary

This tutorial uses MCP Toolbox, which ships as a prebuilt binary for Linux. Download it in the background now, since it can take a while. Run the following commands to download the binary; the download output is logged to logs/toolbox_dl.log. You can move on to the next section while the download runs.

cd ~/build-agent-adk-toolbox-cloudsql
curl -O https://storage.googleapis.com/mcp-toolbox-for-databases/v1.0.0/linux/amd64/toolbox > logs/toolbox_dl.log 2>&1 &

Understanding the setup script `scripts/setup_database.sh`

Let's walk through the setup script we prepared earlier. It performs the following steps:

Creates the instance with gcloud sql instances create, using these flags:

--tier=db-custom-1-3840 selects the smallest dedicated-core Cloud SQL tier (1 vCPU, 3.75 GB RAM) in the ENTERPRISE edition. The Vertex AI ML integration requires a dedicated core; shared-core tiers (db-f1-micro, db-g1-small) do not support it. See the Cloud SQL documentation for details.
--root-password sets the password for the default postgres user
--enable-google-ml-integration turns on Cloud SQL's built-in integration with Vertex AI, which lets you call embedding models directly from SQL with the embedding() function

Verifies that the instance is in the RUNNABLE state
Grants the Cloud SQL instance's service account permission to call Vertex AI with gcloud projects add-iam-policy-binding. The built-in embedding() function used during seeding requires this.
Creates the database
Runs the seeding script setup_jobs_db.py
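The tier name db-custom-1-3840 encodes the machine shape as db-custom-<vCPUs>-<memory in MB>, so 3840 MB divided by 1024 gives the 3.75 GB mentioned above. The sketch below only illustrates that arithmetic; `parse_custom_tier` is a hypothetical helper, not a gcloud or Cloud SQL API.

```python
import re


def parse_custom_tier(tier: str) -> tuple[int, float]:
    """Split 'db-custom-<vcpus>-<memory_mb>' into (vCPUs, memory in GB)."""
    match = re.fullmatch(r"db-custom-(\d+)-(\d+)", tier)
    if match is None:
        raise ValueError(f"not a custom tier name: {tier}")
    vcpus = int(match.group(1))
    memory_gb = int(match.group(2)) / 1024  # tier names give memory in MB
    return vcpus, memory_gb


print(parse_custom_tier("db-custom-1-3840"))  # (1, 3.75)
```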

Understanding the seeding script `scripts/setup_jobs_db.py`

Now let's look at the seeding script. It does the following:

Opens a connection to the database instance
Installs two PostgreSQL extensions:

google_ml_integration - provides the embedding() SQL function, which calls Vertex AI embedding models directly from SQL. This is a database-level extension that makes the ML functions available inside jobs_db. The instance-level flag (--enable-google-ml-integration) you set at instance creation lets the Cloud SQL VM reach Vertex AI; the extension exposes the SQL functions inside this specific database.
vector (pgvector) - adds the vector data type and distance operators for storing and querying embeddings

Creates the table. Note that the description_embedding column is vector(3072), a pgvector column that stores 3072-dimensional vectors.
Seeds the initial job data
Generates embeddings from the description field and fills description_embedding through the built-in Vertex AI integration, using the embedding() function:

embedding('gemini-embedding-001', description) - calls the Vertex AI Gemini embedding model directly from SQL, passing each job's description text. This function comes from the google_ml_integration extension installed earlier in the same script.
::vector - casts the returned float array to pgvector's vector type so it can be stored and queried with distance operators
The UPDATE runs over all 15 rows, producing one 3072-dimensional embedding per job description

This prepares the initial data that our agent will access.
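To make the idea of a distance operator concrete, here is a pure-Python sketch of cosine distance, which is what pgvector's `<=>` operator computes. It assumes cosine distance is the metric of interest and uses made-up 3-dimensional vectors in place of the real 3072-dimensional embeddings; in the codelab the database does this work, not Python.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query, rows):
    """Sort (title, embedding) pairs from nearest to farthest from the query."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))


# Toy embeddings, invented for illustration only.
rows = [
    ("Frontend Engineer", [0.0, 1.0, 0.0]),
    ("AI Engineer", [0.9, 0.1, 0.0]),
    ("DevOps Engineer", [0.0, 0.0, 1.0]),
]
query = [1.0, 0.0, 0.0]
print(rank_by_distance(query, rows)[0][0])  # AI Engineer
```

A semantic search query does the same thing at scale: embed the user's text, then order rows by their distance to that embedding.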

5. Configure MCP Toolbox for Databases

This step introduces MCP Toolbox for Databases, configures it to connect to your Cloud SQL instance, and defines two standard SQL query tools.

What is MCP, and why use Toolbox?

MCP (Model Context Protocol) is an open protocol that standardizes how AI agents discover and interact with external tools. It defines a client-server model: the agent hosts an MCP client, and an MCP server exposes tools. Any MCP-compatible client can use any MCP-compatible server, so the agent needs no custom integration code for each tool.
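Under the hood, MCP messages are JSON-RPC 2.0 requests: a client discovers tools with the `tools/list` method and invokes one with `tools/call`. The sketch below only builds those request payloads to show their shape; the tool name `search-jobs` and its argument are hypothetical, and in practice an MCP client library (or ADK) constructs and sends these for you.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def list_tools_request() -> dict:
    """Request asking an MCP server which tools it exposes."""
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": "tools/list"}


def call_tool_request(name: str, arguments: dict) -> dict:
    """Request asking an MCP server to run one tool with the given arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "search-jobs" is a made-up tool name used only for illustration.
print(json.dumps(call_tool_request("search-jobs", {"role": "Backend"}), indent=2))
```

Because every server speaks this same shape, the agent does not need tool-specific integration code.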

MCP Toolbox for Databases is an open-source MCP server built specifically for database access. Without it, you would write Python functions that open database connections, manage connection pools, build parameterized queries to prevent SQL injection, handle errors, and embed all of that code in the agent. Every agent that needs database access would repeat this work, and changing a query would mean redeploying the agent.

With Toolbox, you write a YAML file instead. Each tool maps to a parameterized SQL statement. Toolbox handles connection pooling, parameterized queries, authentication, and observability. Tools are decoupled from the agent: you update a query by editing tools.yaml and restarting Toolbox, without touching the agent code. The same tools work with ADK, LangGraph, LlamaIndex, or any MCP-compatible framework.
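To see why parameterized statements matter, the sketch below runs one against an in-memory SQLite database from the Python standard library. SQLite stands in for PostgreSQL here purely so the example runs anywhere (placeholder syntax differs between the two), and the `search_by_role` function is hypothetical. The point is that the parameter is passed as data, so a malicious value cannot change the query.

```python
import sqlite3

# In-memory stand-in for the jobs table; not the codelab's real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO jobs (title, role) VALUES (?, ?)",
    [("Senior Backend Engineer", "Backend"), ("Frontend Engineer", "Frontend")],
)


def search_by_role(role: str) -> list[str]:
    """Run a fixed SQL statement; the role value is bound, never concatenated."""
    rows = conn.execute(
        "SELECT title FROM jobs WHERE role = ? ORDER BY id", (role,)
    )
    return [title for (title,) in rows]


print(search_by_role("Backend"))             # ['Senior Backend Engineer']
print(search_by_role("Backend' OR '1'='1"))  # [] (treated as a literal string)
```

A Toolbox tool plays the role of `search_by_role`: the SQL text is fixed in configuration, and the model only supplies parameter values.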

Write the tools configuration

Now create a file named tools.yaml in the Cloud Shell Editor to configure the tools.

cloudshell edit tools.yaml

The file uses multi-document YAML: each block separated by --- is a standalone resource. Every resource has a kind that declares what it is (sources for database connections, tools for agent-callable operations) and a type that identifies the backend (cloud-sql-postgres for the source, postgres-sql for SQL-based tools). Tools reference a source by name, which is how Toolbox knows which connection pool to use. Environment variables use the ${VAR_NAME} syntax and are resolved at startup.
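The two mechanics described above, ${VAR_NAME} substitution and splitting on ---, can be illustrated with the Python standard library. This is only a sketch of the idea and not how Toolbox itself is implemented; the helper names and the second resource (`hypothetical-tool`) are made up for the example.

```python
import re
from string import Template

CONFIG = """\
name: jobs-db
database: ${DB_NAME}
password: ${DB_PASSWORD}
---
name: hypothetical-tool
source: jobs-db
"""


def resolve(text: str, env: dict) -> str:
    """Replace ${VAR_NAME} placeholders; raises KeyError if one is undefined."""
    return Template(text).substitute(env)


def split_documents(text: str) -> list[str]:
    """Split multi-document YAML text on lines that contain only ---."""
    parts = re.split(r"^---\s*$", text, flags=re.MULTILINE)
    return [part.strip() for part in parts if part.strip()]


resolved = resolve(CONFIG, {"DB_NAME": "jobs_db", "DB_PASSWORD": "techjobs-pwd"})
for document in split_documents(resolved):
    print(document, end="\n\n")
```

Failing loudly on an undefined variable is a deliberate choice in this sketch: a silently empty password or database name would only surface later as a connection error.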

Start by copying the following configuration into tools.yaml.

# tools.yaml

# --- Data Source ---
kind: source
name: jobs-db
type: cloud-sql-postgres
project: ${GOOGLE_CLOUD_PROJECT}
region: ${REGION}
instance: ${DB_INSTANCE}
database: ${DB_NAME}
user: postgres
password: ${DB_PASSWORD}

---