Build an Autonomous Supply Chain with Gemini 3 Flash & AlloyDB AI

1. Overview

The era of "Chatbots that Read" is ending. We are entering the era of Agentic Vision.

In this Codelab, we will implement Deterministic AI Engineering—a practice of building AI systems that don't guess. Standard AI models often "hallucinate" (guess) when asked to count items in a complex image. In a supply chain, a guess is dangerous. If an AI guesses you have 12 items when you actually have 15, it triggers costly errors.

We will build an Autonomous Supply Chain Agent utilizing the new Think, Act, Observe loop in Gemini 3 Flash. It doesn't just look; it investigates.

The Deterministic Architecture

We will start with a "blind" and "amnesiac" system. You will manually "awaken" its senses one by one:


  1. The Eyes (Vision Agent): We enable Gemini 3 Flash with Code Execution. Instead of predicting tokens to guess a number, the model writes Python code (OpenCV) to count pixels deterministically.
  2. The Memory (Supplier Agent): We enable AlloyDB AI with ScaNN (Scalable Nearest Neighbors). This allows the agent to recall the exact supplier for a part from millions of options in milliseconds.
  3. The Handshake (A2A Protocol): We enable Agent-to-Agent communication using a standardized agent_card.json, allowing the Vision Agent to autonomously order stock from the Supplier Agent.

What you'll build

  • A Vision Agent that performs "visual math" on camera feeds.
  • A Supplier Agent backed by AlloyDB ScaNN for high-speed vector search.
  • A Control Tower frontend with real-time WebSocket updates to visualize the autonomous loop.

What you'll learn

  • How to enable Agentic Vision with gemini-3-flash-preview using the Gemini API.
  • How to implement vector search using the <=> (cosine distance) operator in AlloyDB.
  • How to bridge Cloud Shell to AlloyDB using the Auth Proxy.

Requirements

  • A browser, such as Chrome or Firefox
  • A Google Cloud project with billing enabled.
  • A Gemini API key (free tier available at Google AI Studio) for the Vision Agent.

2. Before you begin

Create a project

  1. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
  2. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
  3. You'll use Cloud Shell, a command-line environment running in Google Cloud. Click Activate Cloud Shell at the top of the Google Cloud console.

Activate Cloud Shell button image

  4. Once connected to Cloud Shell, check that you're authenticated and that the project is set to your project ID using the following command:
gcloud auth list

That's it!

You're now ready for the one-click setup. The next section will:

  • Open Cloud Shell automatically
  • Clone the repository
  • Guide you through the entire setup in an interactive tutorial

3. One-Click Setup in Cloud Shell

We've streamlined the setup into a guided Cloud Shell tutorial. Everything is automated: infrastructure provisioning, AlloyDB setup, Auth Proxy configuration, and database seeding.

Launch Cloud Shell Tutorial

⚠️ IMPORTANT - Before clicking: When you click the button below, you'll see a security dialog asking "Open in Cloud Shell". This appears BEFORE the repository clones.

You must:

  1. ✅ Check the box: "Trust repo"
  2. ✅ Click "Confirm"

Without this, the repository will not be cloned.

Ready? Click to open the project with a step-by-step tutorial:

What happens next:

  1. Cloud Shell opens with the repository pre-cloned
  2. A tutorial panel appears on the right with step-by-step instructions
  3. You'll be guided through:
  • Getting your Gemini API key (free tier available)
  • Setting your GCP project in the terminal
  • Running setup (checks APIs, enables if needed, provisions AlloyDB: ~15 minutes)
  • Making 2 key code changes (enable vision + memory)
  • Creating the agent card (A2A protocol)
  • Starting all services

The tutorial is interactive—each step is numbered and tracks your progress.

Alternative: Manual Setup

If you prefer manual control:

  1. Open Cloud Shell and verify your project is set
gcloud config get-value project
  2. If needed, set your project
gcloud config set project YOUR_PROJECT_ID
  3. Clone the repository
git clone https://github.com/MohitBhimrajka/visual-commerce-gemini-3-alloydb.git
cd visual-commerce-gemini-3-alloydb
  4. Run setup
sh setup.sh

Follow the on-screen instructions from the setup script.

What's Next: The tutorial guides you through the remaining steps. Once complete, continue to Section 4 to understand what happened under the hood.

4. Behind the Scenes: Auth Proxy & Database Seeding

The Problem: AlloyDB lives inside a Private VPC. Cloud Shell is outside it. Direct connection is impossible.

The Fix: The AlloyDB Auth Proxy creates a secure, IAM-authenticated tunnel from 127.0.0.1:5432 on Cloud Shell to your AlloyDB instance. If your instance has Public IP enabled, the proxy uses it; otherwise it connects via the VPC's private IP.
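Because the proxy exposes the database on a fixed local endpoint, application code only ever builds a plain local Postgres connection string. A minimal sketch, assuming the credential variable names (`DB_USER`, `DB_PASS`, `DB_NAME`) mirror the generated .env file:

```python
import os

def local_alloydb_dsn() -> str:
    """Build a libpq-style DSN targeting the Auth Proxy's local tunnel.

    Host and port are fixed by the proxy (127.0.0.1:5432). The credential
    variable names below are assumptions mirroring the generated .env.
    """
    user = os.environ.get("DB_USER", "postgres")
    password = os.environ.get("DB_PASS", "")
    dbname = os.environ.get("DB_NAME", "postgres")
    return f"host=127.0.0.1 port=5432 dbname={dbname} user={user} password={password}"

print(local_alloydb_dsn())
```

Any Postgres client (psycopg2, psql, SQLAlchemy) can use this DSN unchanged; the proxy handles IAM auth and mTLS behind the scenes.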

What setup.sh Did

  1. Auto-detected your AlloyDB instance (cluster, region, project)
  2. Created .env with all credentials (GEMINI_API_KEY, DB_PASS, AlloyDB details)
  3. Downloaded and started the Auth Proxy (with --public-ip if applicable)
  4. Seeded the database with 8 sample inventory parts + ScaNN index

Your .env file is ready. All future runs load credentials automatically.

Verify It Worked

Ensure you're still in the repo root

pwd  # Should end with: visual-commerce-gemini-3-alloydb

Check Auth Proxy is running

ps aux | grep alloydb-auth-proxy

What Got Created

  • inventory table with 8 parts and 768-dimensional embeddings
  • ScaNN index (idx_inventory_scann) for fast vector search

5. Step 1: The Memory (Supplier Agent)

The Supplier Agent remembers millions of parts using AlloyDB ScaNN. We start it as an A2A server, then fix the vector query.

The Audit: The Amnesiac

If you query the Supplier Agent now (with the placeholder SQL), it returns the first row it finds—not the nearest match. It has no concept of similarity. It is an amnesiac.

Start the Supplier Agent

The A2A server (main.py) delegates to agent_executor.py, which bridges the protocol to the business logic in inventory.py.

pkill -f uvicorn  # Kill all uvicorn processes

Step 1: Navigate to the agent directory

cd agents/supplier-agent

Step 2: Install dependencies

pip install -r requirements.txt

Step 3: Start the agent server

uvicorn main:app --host 0.0.0.0 --port 8082 > /dev/null 2>&1 &

The > /dev/null 2>&1 & runs the server in the background and suppresses output so it doesn't interrupt your terminal.

Step 4: Verify the agent is running (wait 2-3 seconds after starting)

curl http://localhost:8082/.well-known/agent-card.json

Expected Output: JSON with agent configuration (should return without errors)

Real Semantic Embeddings

During setup, the database was seeded with real semantic embeddings generated via the Google Gen AI SDK's text-embedding-005 model. This ensures accurate similarity matching, not random vectors. The seed process takes ~10 seconds for the 8 sample items, using parallel embedding generation to create 768-dimensional vectors that capture the semantic meaning of each part.
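The parallel seeding pattern can be sketched as follows. `embed_text` here is a stand-in for a real embeddings API call; the actual seed script and model invocation may differ:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_text(text: str) -> list[float]:
    """Stand-in for a real embeddings API call.

    A production version would call the Google Gen AI SDK here; this stub
    just produces a deterministic 768-dimensional vector for illustration.
    """
    seed = sum(ord(c) for c in text)
    return [((seed + i) % 100) / 100.0 for i in range(768)]

def embed_parts(part_names: list[str]) -> list[list[float]]:
    # Embed all parts concurrently; pool.map preserves input order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(embed_text, part_names))

vectors = embed_parts(["Industrial Widget X-9", "Ball Bearing 608ZZ"])
print(len(vectors), len(vectors[0]))  # 2 768
```

Fanning the API calls out across a thread pool is what keeps the seed step at seconds rather than minutes.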

AlloyDB Detour: Why ScaNN?

ScaNN (Scalable Nearest Neighbors) is the vector index that backs this search. Per the performance claims listed in the References section, Google reports up to 4x faster standard queries than HNSW (10x with filtered queries), a 3-4x smaller memory footprint, and 8x faster index builds.

The Fix: Implementing the <=> Operator

The agent ships with a placeholder query. We need to enable ScaNN vector search.

Step 1: Open the inventory file

cd agents/supplier-agent

Step 2: Find the TODO in inventory.py

Look for the find_supplier() function around line 47-60. You'll see:

# ============================================================
# CODELAB STEP 1: Implement ScaNN Vector Search
# ============================================================
# TODO: Replace this placeholder query with ScaNN vector search

sql = "SELECT part_name, supplier_name FROM inventory LIMIT 1;"
cursor.execute(sql)

Step 3: Replace the placeholder SQL with ScaNN vector search

Delete these two lines:

sql = "SELECT part_name, supplier_name FROM inventory LIMIT 1;"
cursor.execute(sql)

And replace them with:

sql = """
SELECT part_name, supplier_name
FROM inventory
ORDER BY part_embedding <=> %s::vector
LIMIT 1;
"""
cursor.execute(sql, (embedding_vector,))

What this does:

  • <=> is the cosine distance operator provided by the pgvector extension (which AlloyDB includes)
  • ORDER BY part_embedding <=> %s::vector finds the nearest match (lowest distance = closest semantic meaning)
  • %s::vector casts your embedding array to PostgreSQL's vector type
  • LIMIT 1 returns only the closest match
  • The ScaNN index automatically accelerates this query!
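To build intuition for what the operator computes, here is cosine distance in plain Python: 1 minus cosine similarity, which is what pgvector's <=> returns for each row:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity; 0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (identical direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```

Ordering ascending by this distance, as the ORDER BY clause does, surfaces the semantically closest row first.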

Step 4: Save the file (Ctrl+S or Cmd+S)

The agent will now use semantic search instead of returning an arbitrary first row!

Verification

Test A2A discovery and the inventory:

curl http://localhost:8082/.well-known/agent-card.json


python3 -c "
from inventory import find_supplier
import json
vec = [0.1]*768
r = find_supplier(vec)
if r:
    result = {'part': r[0], 'supplier': r[1]}
    if len(r) > 2:
        result['distance'] = float(r[2]) if r[2] is not None else None
    print(json.dumps(result))
else:
    print('No result found')
"

Expected: agent-card.json returns the agent card. The Python snippet returns a part and supplier from the seeded data.

6. Step 2: The Eyes (Vision Agent)

With the memory online, let's awaken the eyes using Gemini 3 Flash. The Vision Agent performs "visual math" via Code Execution. The A2A server (main.py) delegates to agent_executor.py, which calls agent.py for Gemini analysis.

The Audit: The Hallucination

If you ask a standard multimodal model "How many boxes are in this messy image?", it processes the image as a static snapshot and guesses.

  • Model says: "I see about 12 boxes."
  • Reality: There are 15 boxes.
  • Result: Supply chain failure.

The Fix: Awakening the Think-Act-Observe Loop

We enable Code Execution and ThinkingConfig so the model writes Python (OpenCV) to count deterministically.

  1. Open agents/vision-agent/agent.py.
  2. Find the GenerateContentConfig section.
  3. Uncomment both the thinking_config=types.ThinkingConfig(...) block and tools=[types.Tool(code_execution=...)].
  4. The client is already configured to use your GEMINI_API_KEY from the environment.

File: agents/vision-agent/agent.py

config = types.GenerateContentConfig(
    temperature=0,
    # CODELAB STEP 1: Uncomment to enable reasoning
    thinking_config=types.ThinkingConfig(
        thinking_level="LOW",  # Valid: "MINIMAL", "LOW", "MEDIUM", "HIGH"
        include_thoughts=False    # Set to True for debugging
    ),
    # CODELAB STEP 2: Uncomment to enable code execution
    tools=[types.Tool(code_execution=types.ToolCodeExecution())]
)

Why thinking_level="LOW"?

For this specific task (counting items via code execution), "LOW" provides sufficient reasoning budget to:

  • Plan the Python script structure
  • Decide which image processing approach to use
  • Verify the count matches the number of bounding boxes

Using "HIGH" would add 2-3x latency and cost without improving accuracy for deterministic tasks. Reserve "HIGH" for complex multi-step reasoning (e.g., "Analyze this supply chain disruption and recommend 3 alternative suppliers with justification").

Cost-Performance Optimization is a key skill for production AI engineering: match reasoning depth to task complexity.
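That matching can be made explicit in code. A hypothetical helper sketch; the task categories and mapping below are illustrative choices, not part of the SDK:

```python
def pick_thinking_level(task: str) -> str:
    """Map a task category to a Gemini thinking level.

    The categories are illustrative; the valid levels, per the config
    above, are "MINIMAL", "LOW", "MEDIUM", and "HIGH".
    """
    mapping = {
        "extraction": "MINIMAL",        # e.g., read a label off an image
        "deterministic_count": "LOW",   # counting via code execution
        "comparison": "MEDIUM",         # weigh a handful of options
        "open_ended_analysis": "HIGH",  # multi-step supply chain reasoning
    }
    return mapping.get(task, "LOW")

print(pick_thinking_level("deterministic_count"))  # LOW
```

Centralizing the choice in one function makes the cost/latency tradeoff auditable instead of scattered across call sites.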

Start the Vision Agent

🔄 Path Check: If you're still in agents/supplier-agent/, first go back to repo root with cd ../..

Step 1: Navigate to the vision agent directory

cd agents/vision-agent

Step 2: Install dependencies

pip install -r requirements.txt

Step 3: Start the vision agent server

uvicorn main:app --host 0.0.0.0 --port 8081 > /dev/null 2>&1 &

The > /dev/null 2>&1 & runs the server in the background and suppresses output so it doesn't interrupt your terminal.

Verification

Test A2A discovery:

curl http://localhost:8081/.well-known/agent-card.json

Expected: JSON with agent name and skills. You'll test the actual vision counting with the Control Tower UI in Step 8.


7. Step 3: The Handshake (A2A Agent Card)

Our agent sees the problem (Vision) and knows the supplier (Memory). The A2A protocol enables dynamic discovery—the frontend learns how to talk to each agent by reading its card.

A2A vs Traditional REST APIs

Aspect | Traditional REST | A2A Protocol
--- | --- | ---
Endpoint Discovery | Hardcoded URLs in config | Dynamic via /.well-known/agent-card.json
Capability Description | API docs (for humans) | Skills (machine-readable)
Integration | Manual code per service | Semantic matching: "I need inventory search" → discovers skill
New Agent Added | Update all clients' configs | Zero config; auto-discovered

Real-World Benefit: In a traditional microservice, if you add a third "Logistics Agent," you'd need to update the Control Tower's code with its URL and API contract. With A2A, the Control Tower discovers it automatically and understands its capabilities through natural language skill descriptions.

This is why A2A enables Plug-and-Play Agent Composition—the architectural pattern for autonomous systems.
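Discovery-by-capability can be sketched in a few lines. Given a parsed agent card like the one you'll create below, a client can match a needed capability against skill ids, names, and tags. The substring heuristic here is an illustration, not part of the A2A spec:

```python
from typing import Optional

def find_skill(card: dict, needed: str) -> Optional[dict]:
    """Return the first skill whose id, name, or tags mention the need."""
    needle = needed.lower()
    for skill in card.get("skills", []):
        haystack = [skill.get("id", ""), skill.get("name", "")] + skill.get("tags", [])
        if any(needle in item.lower() for item in haystack):
            return skill
    return None

card = {
    "name": "Acme Supplier Agent",
    "skills": [{"id": "search_inventory", "name": "Search Inventory",
                "tags": ["inventory", "search", "alloydb"]}],
}
match = find_skill(card, "inventory")
print(match["id"] if match else "no match")  # search_inventory
```

A production client would typically hand the skill descriptions to an LLM for semantic matching rather than substring search, but the discovery flow is the same.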

The Fix: Create the Agent Card

We need to define what the Supplier Agent can do.

  1. Copy agents/supplier-agent/agent_card_skeleton.json to agents/supplier-agent/agent_card.json.
  2. Edit the file to replace placeholders:

Before (skeleton):

{
  "name": "___FILL: agent-name ___",
  "description": "___FILL: what-this-agent-does ___"
}

After (your edits):

{
  "name": "Acme Supplier Agent",
  "description": "Autonomous fulfillment for industrial parts via AlloyDB ScaNN.",
  "version": "1.0.0",
  "skills": [{
    "id": "search_inventory",
    "name": "Search Inventory",
    "description": "Searches the warehouse database for semantic matches using AlloyDB ScaNN vector search.",
    "tags": ["inventory", "search", "alloydb"],
    "examples": ["Find stock for Industrial Widget X-9", "Who supplies ball bearings?"]
  }]
}
  3. Restart the Supplier Agent so it loads the new card:

Step 1: Stop the running agent

pkill -f "uvicorn main:app.*8082"

Step 2: Navigate to the agent directory

cd agents/supplier-agent

Step 3: Start the agent again

uvicorn main:app --host 0.0.0.0 --port 8082 > /dev/null 2>&1 &

The > /dev/null 2>&1 & runs the server in the background and suppresses output so it doesn't interrupt your terminal.

Step 4: Verify the new agent card (wait 2-3 seconds after starting)

curl http://localhost:8082/.well-known/agent-card.json

Expected Output: JSON with your filled-in name, description, and skills.


8. Step 4: The Control Tower

Run the Control Tower frontend with FastAPI + WebSockets. It discovers agents via A2A and orchestrates the full loop with real-time updates.

Start All Services

The easiest way to start all services:

Verify you're in repo root

pwd  # Should end with: visual-commerce-gemini-3-alloydb

Then,

sh run.sh

This single command starts:

  • AlloyDB Auth Proxy (if not running)
  • Vision Agent on port 8081
  • Supplier Agent on port 8082
  • Control Tower on port 8080

Wait ~10 seconds for all services to initialize.

Test the System

Access the Control Tower:

  1. Click Web Preview button (eye icon 👁️) in the Cloud Shell toolbar
  2. Select "Preview on port 8080"
  3. The Control Tower dashboard will open in a new tab

Explore the Interface:

  1. Top right: Connection status (green "Live" dot), DEMO/AUTO mode toggle, and audio controls
  2. Center: Main workflow canvas with image upload and analysis visualization
  3. Side panels (appear during analysis): Workflow timeline (left), progress tracking and code viewer (right)

Option 1: Quick Start (Recommended)

  1. On the homepage, you'll see a "Quick start" section with sample images
  2. Click any sample image to auto-start analysis
  3. Watch the autonomous workflow (~30-45 seconds)

Option 2: Upload Your Own

  1. Drag and drop a warehouse/shelf image (PNG, JPG, up to 10MB) or click to browse
  2. Click "Initiate Autonomous Workflow"
  3. Observe the 4-stage pipeline

What Happens:

  1. Agent Discovery: A2A protocol modals show Vision Agent and Supplier Agent cards with their skills and endpoints
  2. Vision Analysis: Gemini 3 Flash generates and executes Python code (OpenCV) to count items. Progress bar shows substeps. Bounding boxes overlay on detected items. Result badge shows "✓ Code-Verified" or "~ Estimated"
  3. Supplier Match: AlloyDB ScaNN vector search animation. Search query displays (e.g., "industrial metal boxes"). Result card shows matched part, supplier, and confidence score
  4. Order Placed: Receipt card with order ID, quantity, and details

Tip: Keep DEMO mode on (top right) to pause at each stage for presentations. In AUTO mode, the workflow runs continuously.


What Just Happened

The Control Tower used A2A Protocol to discover both agents via /.well-known/agent-card.json, orchestrated the vision analysis (Gemini 3 Flash with code execution), performed vector search (AlloyDB ScaNN), and placed an autonomous order—all with real-time WebSocket updates. Each agent exposes its capabilities via the A2A standard, enabling plug-and-play composition without custom SDKs. Learn more: A2A Protocol

Troubleshooting

Path-Related Errors:

  • "No such file or directory" when running commands: You're not in the repo root.
# Check where you are
pwd

# If you're lost, navigate to home and back to repo
cd
cd visual-commerce-gemini-3-alloydb

Service Errors:

  • "Address already in use": Processes from previous runs are still active.
# Kill all services and restart
pkill -f uvicorn
sh run.sh  # Or manually restart individual agents
  • Services not starting: Check if ports are occupied:
# Check which processes are using the ports
lsof -i :8080  # Control Tower
lsof -i :8081  # Vision Agent
lsof -i :8082  # Supplier Agent
  • "Connection refused" to AlloyDB: Verify Auth Proxy is running:
ps aux | grep alloydb-auth-proxy
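If lsof isn't available, a quick way to probe all three service ports at once is a small sketch using only the Python standard library:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Ports as assigned by run.sh in this codelab.
services = {"Control Tower": 8080, "Vision Agent": 8081, "Supplier Agent": 8082}
for name, port in services.items():
    status = "UP" if port_open("127.0.0.1", port) else "DOWN"
    print(f"{name} (:{port}): {status}")
```

Any service reporting DOWN either hasn't finished starting or crashed on boot; re-run it in the foreground (without the output redirection) to see its error.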

AlloyDB Connection Issues:

If you see connection to server at 127.0.0.1, port 5432 failed:

  1. Check the Auth Proxy: ps aux | grep alloydb-auth-proxy
  2. Verify Public IP is enabled: gcloud alloydb instances describe INSTANCE_NAME --cluster=CLUSTER_NAME --region=us-central1 --format="value(ipAddress)"
  3. For local development (not Cloud Shell):
  • Problem: Cloud Shell works automatically, but local machines need authorized networks
  • Solution: Re-run sh setup.sh and choose option 1 (authorize 0.0.0.0/0) when prompted
  • Security Note: Even with 0.0.0.0/0, the connection still requires:
  • Valid GCP credentials (Application Default Credentials)
  • Database password
  • mTLS encryption (Auth Proxy handles this)

9. Cleanup

To avoid incurring charges, destroy all resources with the automated cleanup script:

# From repo root
sh cleanup.sh

This safely removes:

  • AlloyDB cluster (the primary cost driver)
  • Cloud Run services (if deployed)
  • Associated service accounts

The script will prompt for confirmation before deleting anything.

10. References & Further Reading

All technical claims in this codelab are verified from official Google Cloud and Google AI documentation.

Official Documentation

Gemini 3 Flash:

AlloyDB AI & ScaNN:

Pricing Information:

Verified Performance Claims

Feature | Claim | Source
--- | --- | ---
ScaNN vs HNSW (filtered) | 10x faster | Google Cloud Blog (verified)
ScaNN vs HNSW (standard) | 4x faster | Google Cloud Blog (verified)
ScaNN memory footprint | 3-4x smaller | Google Cloud Blog (verified)
ScaNN index build time | 8x faster | Google Cloud Blog (verified)
Code execution timeout | 30 seconds max | Google Cloud Docs (verified)
Code execution file I/O | Not supported | Google Cloud Docs (verified)
Temperature=0 behavior | Deterministic output | Community verified

Additional Resources

Agent-to-Agent (A2A) Protocol:

  • A2A standardizes agent discovery and communication
  • Agent cards served at /.well-known/agent-card.json
  • Emerging standard for autonomous agent collaboration

ScaNN Research:

  • Based on 12 years of Google Research
  • Powers Google Search, YouTube at billion-scale
  • Released for general availability: October 2024
  • First PostgreSQL vector index suitable for million-to-billion vectors

11. Challenge Mode: Level Up Your Agentic Skills

You've built a working autonomous supply chain. Ready to push further? These challenges apply the patterns you've learned to new problems.

Challenge 1: Image-Based Search (Multimodal Embeddings)

Current Flow: Vision Agent counts items → generates text query → Supplier Agent embeds text → searches AlloyDB

Challenge: Bypass text entirely—send the cropped image directly to the Supplier Agent.

Hints:

  1. The Vision Agent's code execution can crop individual items from the shelf image
  2. Vertex AI's multimodalembedding@001 model can embed images directly
  3. Modify inventory.py to accept image bytes instead of text
  4. Update the A2A skill description to indicate "Accepts: image/jpeg or text"

Why This Matters: Visual search is more accurate for parts with complex appearances (color variations, damage, packaging differences).

Challenge 2: Observability—Trust Through Transparency

Current State: The system works, but you can't see "under the hood"

Challenge: Inspect AlloyDB's query logs to prove the vector search is executing.

Steps:

  • Query insights are enabled by default on AlloyDB. To verify, run:
gcloud alloydb instances describe INSTANCE_NAME \
  --cluster=CLUSTER_NAME \
  --region=us-central1 \
  --format="value(queryInsightsConfig.queryPlansPerMinute)"
  • Run a supplier search through the UI
  • View the actual SQL executed:
gcloud logging read \
  'resource.type="alloydb.googleapis.com/Instance" AND textPayload:"ORDER BY part_embedding"' \
  --limit 5 \
  --format=json

Expected Output: You'll see the exact ORDER BY part_embedding <=> $1::vector LIMIT 1 query with execution time.

Why This Matters: Observability builds trust. When stakeholders ask "How does this agent make decisions?", you can show them the query plan, not just the output.

Challenge 3: Multi-Agent Composition

Challenge: Add a third agent (Logistics Agent) that calculates shipping costs based on warehouse location and item weight.

Architecture:

  • Vision Agent outputs: item count
  • Supplier Agent outputs: supplier location
  • Logistics Agent (NEW) inputs: destination, weight → outputs: shipping cost + ETA

Hint: The A2A protocol makes this trivial—create a new agent card with a calculate_shipping skill. The Control Tower will discover it automatically.
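A hypothetical card for that third agent might look like the following. The field values are illustrative; only the card location and skills shape follow the pattern from Step 3:

```python
import json

# Hypothetical agent card for the new Logistics Agent; field values are
# illustrative, mirroring the shape of the Supplier Agent's card.
logistics_card = {
    "name": "Logistics Agent",
    "description": "Calculates shipping cost and ETA from warehouse to destination.",
    "version": "1.0.0",
    "skills": [{
        "id": "calculate_shipping",
        "name": "Calculate Shipping",
        "description": "Estimates cost and ETA given destination and item weight.",
        "tags": ["logistics", "shipping", "eta"],
        "examples": ["Ship 15 widgets (12kg) to Austin, TX"],
    }],
}

# Serve this JSON at /.well-known/agent-card.json and the Control Tower
# can discover the new skill with zero client-side changes.
card_json = json.dumps(logistics_card, indent=2)
print(card_json.splitlines()[1])
```

Once the new server responds at its well-known path, no existing client code needs to change; that is the plug-and-play property in practice.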

Pattern You're Learning: This is the core of Agent-Oriented Architecture—complex systems built from small, composable specialists.

12. Conclusion

You have successfully moved from Generative AI to Agentic AI.

What we built:

  • Vision: We replaced "guessing" with Code Execution (Gemini 3 Flash via API key).
  • Memory: We replaced "slow search" with AlloyDB ScaNN (via GCP).
  • Action: We replaced "API integration" with the A2A Protocol.

Hybrid Architecture Benefits:

This codelab demonstrated a hybrid approach:

  • Vision Agent: Uses Gemini API (API key) - simple, free tier available, no GCP billing required
  • Supplier Agent: Uses GCP (Vertex AI + AlloyDB) - enterprise-grade, compliance-ready

This is the architecture of the autonomous economy. The code is yours to keep.

Next Steps