Building a Multi-Agent System

1. Introduction

In this lab, you will go beyond simple chatbots and build a distributed multi-agent system.

While a single LLM can answer questions, real-world complexity often requires specialized roles. You don't ask your backend engineer to design the UI, and you don't ask your designer to optimize database queries. Similarly, we can create specialized AI agents that focus on one task and coordinate with each other to solve complex problems.

You will build a Course Creation System consisting of:

  • Researcher Agent: Using google_search to find up-to-date information.
  • Judge Agent: Critiquing the research for quality and completeness.
  • Content Builder Agent: Turning the research into a structured course.
  • Orchestrator Agent: Managing the workflow and communication between these specialists.

What you'll learn

  • Define a tool-using agent (researcher) that can search the web.
  • Implement structured output with Pydantic for the judge.
  • Connect to remote agents using the Agent-to-Agent (A2A) protocol.
  • Construct a LoopAgent to create a feedback loop between the researcher and judge.
  • Run the distributed system locally using the ADK.
  • Deploy the multi-agent system to Google Cloud Run.
  • Use a Gemma model on a Cloud Run GPU for the content builder agent.

What you'll need

  • A web browser such as Chrome
  • A Google Cloud project with billing enabled

2. Architecture and Orchestration Principles

First, let's understand how these agents work together. We are building a Course Creation Pipeline.

The System Design

Architecture Diagram

Orchestrating with Agents

Standard agents (like the Researcher) do work. Orchestrator Agents (like LoopAgent or SequentialAgent) manage other agents. They don't have their own tools; their "tool" is delegation.

  1. LoopAgent: This acts like a while loop in code. It runs a sequence of agents repeatedly until a condition is met (or max iterations reached). We use this for the Research Loop:
    • Researcher finds info.
    • Judge critiques it.
    • If Judge says "Fail", the EscalationChecker lets the loop continue.
    • If Judge says "Pass", the EscalationChecker breaks the loop.
  2. SequentialAgent: This acts like a standard script execution. It runs agents one after another. We use this for the High-Level Pipeline:
    • First, run the Research Loop (until it finishes with good data).
    • Then, run the Content Builder (to write the course).

By combining these two orchestrators, we create a robust system that can self-correct before generating the final output.
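The control flow above can be sketched in plain Python. This is purely illustrative: the function names and the stubbed researcher/judge are stand-ins, not ADK APIs, but the shape of the logic (a bounded retry loop followed by a final step) is exactly what `LoopAgent` and `SequentialAgent` give you.

```python
# Plain-Python sketch of the pipeline's control flow (illustrative names,
# not real ADK classes): a loop that retries until the judge passes,
# followed by a final build step.

def research_loop(topic, max_iterations=3):
    findings = None
    for attempt in range(1, max_iterations + 1):
        findings = f"findings for {topic} (attempt {attempt})"   # researcher
        verdict = "pass" if attempt >= 2 else "fail"             # judge (stubbed)
        if verdict == "pass":                                    # escalation checker
            break
    return findings

def pipeline(topic):
    findings = research_loop(topic)          # step 1: loop until quality is met
    return f"course built from: {findings}"  # step 2: content builder

print(pipeline("linear algebra"))
```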

3. Setup

Project setup

Create a Google Cloud Project

  1. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
  2. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

Start Cloud Shell

Cloud Shell is a command-line environment running in Google Cloud that comes preloaded with necessary tools.

  1. Click Activate Cloud Shell at the top of the Google Cloud console.
  2. Once connected to Cloud Shell, verify your authentication:
    gcloud auth list
    
  3. Confirm your project is configured:
    gcloud config get project
    
  4. If your project is not set as expected, set it:
    export PROJECT_ID=<YOUR_PROJECT_ID>
    gcloud config set project $PROJECT_ID
    

Get the Starter Code

  1. Clone the starter repository into your home directory. First, move into your home directory:
      cd ~
    
    Clone just the code needed for this codelab from the Google Cloud DevRel Demos folder.
    git clone --depth 1 --filter=blob:none --sparse https://github.com/GoogleCloudPlatform/devrel-demos.git temp-repo && cd temp-repo && git sparse-checkout set agents/multi-agent-system && cd .. && mv temp-repo/agents/multi-agent-system . && rm -rf temp-repo
    
    Move into the folder containing the code for this Codelab
    cd multi-agent-system
    
  2. Enable APIs: Run the following command to enable the necessary Google Cloud services:
    gcloud services enable \
        run.googleapis.com \
        artifactregistry.googleapis.com \
        cloudbuild.googleapis.com \
        aiplatform.googleapis.com \
        compute.googleapis.com
    
  3. Open this folder in your editor.
    cloudshell edit .
    

Setup Environment

  1. Set up environment variables. We'll create a .env file to store these variables so you can easily reload them if your session disconnects.
    cat <<EOF > .env
    export GOOGLE_CLOUD_PROJECT=$(gcloud config get-value project)
    export GOOGLE_CLOUD_LOCATION=europe-west4
    export GOOGLE_GENAI_USE_VERTEXAI=true
    EOF
    
  2. Source the environment variables:
    source .env
    

4. 🕵️ The Researcher Agent

Researcher Agent

The Researcher is a specialist. Its only job is to find information. To do this, it needs access to a tool: Google Search.

Why separate the Researcher?

Deep Dive: Why not just have one agent do everything?

Small, focused agents are easier to evaluate and debug. If the research is bad, you iterate on the Researcher's prompt. If the course formatting is bad, you iterate on the Content Builder. In a monolithic "do-it-all" prompt, fixing one thing often breaks another.

  1. If you are working in Cloud Shell, run the following command to open the Cloud Shell editor:
    cloudshell workspace .
    
  2. Open agents/researcher/agent.py.
  3. Review the following code that defines the researcher agent:
    # ... existing imports ...
    
    # Define the Researcher Agent
    researcher = Agent(
        name="researcher",
        model=MODEL,
        description="Gathers information on a topic using Google Search.",
        instruction="""
        You are an expert researcher. Your goal is to find comprehensive and accurate information on the user's topic.
        Use the `google_search` tool to find relevant information.
        Summarize your findings clearly.
        If you receive feedback that your research is insufficient, use the feedback to refine your next search.
        """,
        tools=[google_search],
    )
    
    root_agent = researcher
    

Key Concept: Tool Use

Notice we pass tools=[google_search]. The ADK handles the complexity of describing this tool to the LLM. When the model decides it needs information, it generates a structured tool call, the ADK executes the Python function google_search, and feeds the result back to the model.
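That request/execute/feed-back cycle can be sketched in a few lines. The dict-shaped "model response" and the tool registry below are illustrative stand-ins, not ADK internals; the point is that the framework, not your code, bridges the model's structured tool call to the Python function.

```python
# Sketch of the tool-call loop the ADK runs for you (the dict-shaped
# "model response" and the TOOLS registry are illustrative, not ADK APIs).

def google_search_stub(query: str) -> str:
    """Stand-in for the real google_search tool."""
    return f"top results for: {query}"

TOOLS = {"google_search": google_search_stub}

def run_turn(model_response: dict) -> str:
    # If the model emitted a structured tool call, execute it and
    # feed the result back; otherwise the text answer is final.
    if "tool_call" in model_response:
        call = model_response["tool_call"]
        result = TOOLS[call["name"]](**call["args"])
        return f"model continues with tool result: {result}"
    return model_response["text"]

print(run_turn({"tool_call": {"name": "google_search",
                              "args": {"query": "ADK multi-agent"}}}))
```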

5. ⚖️ The Judge Agent

Judge Agent

The Researcher works hard, but LLMs can be lazy. We need a Judge to review the work. The Judge accepts the research and returns a structured Pass/Fail assessment.

Structured Output

Deep Dive: To automate workflows, we need predictable outputs. A rambling text review is hard to parse programmatically. By enforcing a JSON schema (using Pydantic), we ensure the Judge returns a boolean pass or fail that our code can reliably act upon.
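The contract Pydantic enforces can be sketched with a stdlib dataclass. This is only an illustration of the idea (the lab itself uses Pydantic, which the ADK wires into the model's output): invalid values are rejected at parse time, so downstream code can branch on a guaranteed-valid field.

```python
# Stdlib-only sketch of what the JudgeFeedback schema enforces
# (the real lab uses Pydantic; this just illustrates the contract).
import json
from dataclasses import dataclass

@dataclass
class JudgeFeedback:
    status: str    # must be "pass" or "fail"
    feedback: str

    def __post_init__(self):
        if self.status not in ("pass", "fail"):
            raise ValueError(f"invalid status: {self.status!r}")

raw = '{"status": "fail", "feedback": "Missing recent sources."}'
fb = JudgeFeedback(**json.loads(raw))

# Downstream code can now branch on a guaranteed-valid condition.
print(fb.status == "pass")
```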

  1. Open agents/judge/agent.py.
  2. Review the following code that defines the JudgeFeedback schema and the judge agent.
    # 1. Define the Schema
    class JudgeFeedback(BaseModel):
        """Structured feedback from the Judge agent."""
        status: Literal["pass", "fail"] = Field(
            description="Whether the research is sufficient ('pass') or needs more work ('fail')."
        )
        feedback: str = Field(
            description="Detailed feedback on what is missing. If 'pass', a brief confirmation."
        )
    
    # 2. Define the Agent
    judge = Agent(
        name="judge",
        model=MODEL,
        description="Evaluates research findings for completeness and accuracy.",
        instruction="""
        You are a strict editor.
        Evaluate the 'research_findings' against the user's original request.
        If the findings are missing key info, return status='fail'.
        If they are comprehensive, return status='pass'.
        """,
        output_schema=JudgeFeedback,
        # Disallow delegation because it should only output the schema
        disallow_transfer_to_parent=True,
        disallow_transfer_to_peers=True,
    )
    
    root_agent = judge
    

Key Concept: Restricting Agent Behavior

We set disallow_transfer_to_parent=True and disallow_transfer_to_peers=True. This forces the Judge to only return the structured JudgeFeedback. It cannot decide to "chat" with the user or delegate to another agent. This makes it a deterministic component in our logic flow.

6. ✍️ The Content Builder Agent

Content Builder

The Content Builder is the creative writer. It takes the approved research and turns it into a course. It uses a Gemma model served by Cloud Run.

Let's first look at the Cloud Run service that hosts the model.

  1. Open ollama_backend/Dockerfile.
  2. Here you can see how the Dockerfile uses an Ollama image, listens for requests on port 8080, and stores model weight files in a /models folder.
FROM ollama/ollama:latest

# Listen on all interfaces, port 8080 (Cloud Run default)
ENV OLLAMA_HOST 0.0.0.0:8080

# Store model weight files in /models
ENV OLLAMA_MODELS /models

⚙️ When you deploy, you will set the following configurations:

  • GPU: NVIDIA L4, chosen for its excellent price-performance on inference workloads. The L4 provides 24GB of GPU memory and optimized tensor operations, which is more than enough for small models like Gemma 3 270M
  • Memory: 16GB system memory to handle model loading, CUDA operations, and Ollama's memory management
  • CPU: 8 cores for optimal I/O handling and preprocessing tasks
  • Concurrency: 4 requests per instance balances throughput with GPU memory usage
  • Timeout: 600 seconds accommodates initial model loading and container startup

Now let's look at the Content Builder agent that uses the Gemma model.

  1. Open agents/content_builder/agent.py.
  2. Review the following code that defines the content_builder agent.
# the `ollama-gemma-gpu` Cloud Run service URL which hosts the Gemma model
target_url = os.environ.get("OLLAMA_API_BASE")

# ... existing code ...

# (Note: We prefix the model name with 'ollama_chat/' so LiteLLM routes the request to the Ollama chat API)
gemma_model_name = os.environ.get("GEMMA_MODEL_NAME", "gemma3:270m")
model = LiteLlm(
    model=f"ollama_chat/{gemma_model_name}",
    api_base=target_url
)

# 5. Define the Agent
content_builder = Agent(
    name="content_builder",
    model=model,
    description="Transforms research findings into a structured course.",
    instruction="""
    You are an expert course creator.
    Take the approved 'research_findings' and transform them into a well-structured, engaging course module.

    **Formatting Rules:**
    1. Start with a main title using a single `#` (H1).
    2. Use `##` (H2) for main section headings. These will be used for the Table of Contents.
    3. Use `###` (H3) for sub-sections within main sections.
    4. Use bullet points and clear paragraphs.
    5. Maintain a professional but engaging tone.

    **Structure Requirements:**
    - Begin with a brief Introduction section explaining what the learner will gain.
    - Organize content into 3-5 main sections with clear headings.
    - Include Key Takeaways at the end as a bulleted summary.
    - Keep each section focused and concise.

    Ensure the content directly addresses the user's original request.
    Do not include any preamble or explanation outside the course content itself.
    """,
)

root_agent = content_builder

Key Concept: Context Propagation

You might wonder: "How does the Content Builder know what the Researcher found?" In the ADK, agents in a pipeline share a session.state. Later, in the Orchestrator, we will configure the Researcher and Judge to save their outputs to this shared state. The Content Builder's prompt effectively has access to this history.
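Conceptually, `session.state` behaves like a dictionary shared across the pipeline: one agent's callback writes a value, a later agent reads it. The sketch below mirrors that idea with plain Python; the callback-factory name echoes the `create_save_output_callback` helper you'll see in the Orchestrator, but this is a simplified stand-in, not the ADK implementation.

```python
# Sketch of context propagation through a shared state dict
# (conceptually mirrors session.state; names are illustrative).

state = {}

def save_output_callback(key):
    """Returns a callback that stores an agent's output under `key`."""
    def callback(output):
        state[key] = output
    return callback

# The Researcher's output is saved under "research_findings"...
save_research = save_output_callback("research_findings")
save_research("Linear algebra covers vectors, matrices, ...")

# ...and a later agent (the Content Builder) reads the same state.
print(state["research_findings"])
```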

7. 🎻 The Orchestrator

Orchestrator Agent

The Orchestrator is the manager of our multi-agent team. Unlike the specialist agents (Researcher, Judge, Content Builder) who perform specific tasks, the Orchestrator's job is to coordinate the workflow and ensure information flows correctly between them.

🌐 The Architecture: Agent-to-Agent (A2A)

A2A Architecture

In this lab, we are building a distributed system. Instead of running all agents in a single Python process, we deploy them as independent microservices. This allows each agent to scale independently and fail without crashing the entire system.

To make this possible, we use the Agent-to-Agent (A2A) protocol.

The A2A Protocol

Deep Dive: In a production system, agents run on different servers (or even different clouds). The A2A protocol creates a standard way for them to discover and talk to each other over HTTP. RemoteA2aAgent is the ADK client for this protocol.
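Discovery works through an "agent card": a small JSON document served at a well-known path, which a client fetches to learn about a remote agent. The sketch below shows the URL convention used in this lab; the card fields are illustrative, not the full A2A specification.

```python
# Sketch of A2A discovery: a client builds the well-known agent-card URL
# and reads a small JSON document describing the remote agent.
# (Card fields here are illustrative, not the full A2A spec.)
import json

def card_url(base: str) -> str:
    # The well-known path used throughout this lab.
    return f"{base}/a2a/agent/.well-known/agent-card.json"

# Roughly what a served card might contain:
raw_card = json.dumps({
    "name": "researcher",
    "description": "Gathers information using Google Search.",
    "url": "http://localhost:8001",
})

card = json.loads(raw_card)
print(card_url(card["url"]))
```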

  1. Open agents/orchestrator/agent.py.
  2. Review the following code that defines the connections to the remote agents.
    # ... existing code ...
    
    # Connect to the Researcher (Localhost port 8001)
    researcher_url = os.environ.get("RESEARCHER_AGENT_CARD_URL", "http://localhost:8001/a2a/agent/.well-known/agent-card.json")
    researcher = RemoteA2aAgent(
        name="researcher",
        agent_card=researcher_url,
        description="Gathers information using Google Search.",
        # IMPORTANT: Save the output to state for the Judge to see
        after_agent_callback=create_save_output_callback("research_findings"),
        # IMPORTANT: Use authenticated client for communication
        httpx_client=create_authenticated_client(researcher_url)
    )
    
    # Connect to the Judge (Localhost port 8002)
    judge_url = os.environ.get("JUDGE_AGENT_CARD_URL", "http://localhost:8002/a2a/agent/.well-known/agent-card.json")
    judge = RemoteA2aAgent(
        name="judge",
        agent_card=judge_url,
        description="Evaluates research.",
        after_agent_callback=create_save_output_callback("judge_feedback"),
        httpx_client=create_authenticated_client(judge_url)
    )
    
    # Content Builder (Localhost port 8003)
    content_builder_url = os.environ.get("CONTENT_BUILDER_AGENT_CARD_URL", "http://localhost:8003/a2a/agent/.well-known/agent-card.json")
    content_builder = RemoteA2aAgent(
        name="content_builder",
        agent_card=content_builder_url,
        description="Builds the course.",
        httpx_client=create_authenticated_client(content_builder_url)
    )
    

8. 🛑 The Escalation Checker

A loop needs a way to stop. If the Judge says "Pass", we want to exit the loop immediately and move to the Content Builder.

Custom Logic with BaseAgent

Deep Dive: Not all agents use LLMs. Sometimes you need simple Python logic. BaseAgent lets you define an agent that just runs code. In this case, we check the session state and use EventActions(escalate=True) to signal the LoopAgent to stop.

  1. Still in agents/orchestrator/agent.py.
  2. Review the following code, which checks the judge's feedback and lets the pipeline proceed to the next step when the research is ready.
    class EscalationChecker(BaseAgent):
        """Checks the judge's feedback and escalates (breaks the loop) if it passed."""
    
        async def _run_async_impl(
            self, ctx: InvocationContext
        ) -> AsyncGenerator[Event, None]:
            # Retrieve the feedback saved by the Judge
            feedback = ctx.session.state.get("judge_feedback")
            print(f"[EscalationChecker] Feedback: {feedback}")
    
            # Check for 'pass' status
            is_pass = False
            if isinstance(feedback, dict) and feedback.get("status") == "pass":
                is_pass = True
            # Handle string fallback if JSON parsing failed
            elif isinstance(feedback, str) and '"status": "pass"' in feedback:
                is_pass = True
    
            if is_pass:
                # 'escalate=True' tells the parent LoopAgent to stop looping
                yield Event(author=self.name, actions=EventActions(escalate=True))
            else:
                # Continue the loop
                yield Event(author=self.name)
    
    escalation_checker = EscalationChecker(name="escalation_checker")
    

Key Concept: Control Flow via Events

Agents communicate not just with text, but with Events. By yielding an event with escalate=True, this agent sends a signal up to its parent (the LoopAgent). The LoopAgent is programmed to catch this signal and terminate the loop.
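The parent's side of this handshake can be simulated with minimal stand-ins for `Event` and `EventActions` (these are simplified sketches, not the real ADK classes): the loop inspects each yielded event and stops as soon as one carries `escalate=True`.

```python
# Sketch of how a parent loop reacts to escalate=True
# (minimal stand-ins for Event/EventActions, not the real ADK classes).
from dataclasses import dataclass

@dataclass
class EventActions:
    escalate: bool = False

@dataclass
class Event:
    author: str
    actions: EventActions = None

def run_loop(events_per_iteration, max_iterations=3):
    """Runs up to max_iterations, stopping early on an escalate event."""
    iterations = 0
    for batch in events_per_iteration[:max_iterations]:
        iterations += 1
        for event in batch:
            if event.actions and event.actions.escalate:
                return iterations  # parent catches the signal and stops
    return iterations

# Judge fails on iteration 1, passes on iteration 2.
schedule = [
    [Event("escalation_checker")],                               # no escalate
    [Event("escalation_checker", EventActions(escalate=True))],  # escalate
    [Event("escalation_checker")],
]
print(run_loop(schedule))
```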

9. 🔁 The Research Loop

Research Loop

We need a feedback loop: Research -> Judge -> (Fail) -> Research -> ...

  1. In agents/orchestrator/agent.py.
  2. Review the following code that defines the research_loop.
    research_loop = LoopAgent(
        name="research_loop",
        description="Iteratively researches and judges until quality standards are met.",
        sub_agents=[researcher, judge, escalation_checker],
        max_iterations=3,
    )
    

Key Concept: LoopAgent

The LoopAgent cycles through its sub_agents in order.

  1. researcher: Finds data.
  2. judge: Evaluates data.
  3. escalation_checker: Decides whether to yield an Event with escalate=True. If it does, the loop breaks early; otherwise the loop restarts at the researcher (up to max_iterations).

10. 🔗 The Final Pipeline

Final Pipeline

Bringing it all together.

  1. In agents/orchestrator/agent.py.
  2. Review how the root_agent is defined at the bottom of the file.
    root_agent = SequentialAgent(
        name="course_creation_pipeline",
        description="A pipeline that researches a topic and then builds a course from it.",
        sub_agents=[research_loop, content_builder],
    )
    

Key Concept: Hierarchical Composition

Notice that research_loop is itself an agent (a LoopAgent). We treat it just like any other sub-agent in the SequentialAgent. This composability allows you to build complex logic by nesting simple patterns (loops inside sequences, sequences inside routers, etc.).

11. 🚀 Deploy to Cloud Run

We will deploy each agent as a separate service on Cloud Run, including a Cloud Run service for the course creator UI and a Cloud Run service using GPUs for the Gemma model.

Understanding Deployment Configuration

When deploying agents to Cloud Run, we pass several environment variables to configure their behavior and connectivity:

  • GOOGLE_CLOUD_PROJECT: Ensures the agent uses the correct Google Cloud project for logging and Vertex AI calls.
  • GOOGLE_GENAI_USE_VERTEXAI: Tells the agent framework (ADK) to use Vertex AI for model inference instead of calling Gemini APIs directly.
  • [AGENT]_AGENT_CARD_URL: This is crucial for the Orchestrator. It tells the Orchestrator where to find the remote agents. By setting this to the deployed Cloud Run URL (specifically the agent card path), we enable the Orchestrator to discover and communicate with the Researcher, Judge, and Content Builder over the internet.

To deploy all the agents to Cloud Run services, run the following script.

First, make sure that the script is executable.

chmod u+x ~/multi-agent-system/deploy.sh

Note: This will take several minutes to run, as each service is deployed sequentially.

~/multi-agent-system/deploy.sh

12. Create a course!

Open the Course Creator website. The Course Creator is the last service deployed by the script, so its URL (of the form https://course-creator-..run.app) appears as the final output line of the deployment script.

Type in a course idea, e.g. "linear algebra".

Your agents will begin working on your course.

Final Pipeline

13. Clean Up

To avoid incurring charges to your Google Cloud account for the resources used in this codelab, follow these steps to delete your services and container images.

1. Delete Cloud Run Services

The most efficient way to clean up is to delete the services you deployed to Cloud Run. Before running these commands, set $REGION and $OLLAMA_REGION to the regions used by your deployment (see deploy.sh).

# Delete the main agent and app services
gcloud run services delete researcher content-builder judge orchestrator course-creator \
    --region $REGION --quiet

# Delete the GPU backend (Ollama)
gcloud run services delete ollama-gemma-gpu \
    --region $OLLAMA_REGION --quiet

2. Delete Artifact Registry Images

When you used the --source flag to deploy, Google Cloud created a repository in Artifact Registry to store your container images. To remove these and save on storage costs, delete the repository (adjust --location if your repository is in a different region):

gcloud artifacts repositories delete cloud-run-source-deploy --location us-east4 --quiet

3. Remove Local Files and Environment

To keep your Cloud Shell environment clean, remove the project folder and any local configuration:

cd ~
rm -rf multi-agent-system

4. (Optional) Delete the Project

If you created a project just for this codelab, you can ensure no further billing occurs by shutting down the project itself via the Manage Resources Page.

14. Congratulations!

You have successfully built and deployed a production-ready, distributed multi-agent system.

What you've accomplished

  • Decomposed a Complex Task: Instead of one giant prompt, we split the work into specialized roles (Researcher, Judge, Content Builder).
  • Implemented Quality Control: We used a LoopAgent and a structured Judge to ensure only high-quality information reaches the final step.
  • Built for Production: By using the Agent-to-Agent (A2A) protocol and Cloud Run, we created a system where each agent is an independent, scalable microservice. This is far more robust than running everything in a single Python script.
  • Orchestration: We used SequentialAgent and LoopAgent to define clear control flow patterns.
  • Cloud Run GPUs: We deployed a Gemma model to a GPU-backed Cloud Run service.