1. Introduction
This tutorial will guide you through deploying, managing, and monitoring a powerful agent built with the Agent Development Kit (ADK) on Google Cloud Run. The ADK empowers you to create agents capable of complex, multi-agent workflows. By leveraging Cloud Run, a fully managed serverless platform, you can deploy your agent as a scalable, containerized application without worrying about the underlying infrastructure. This powerful combination allows you to focus on your agent's core logic while benefiting from Google Cloud's robust and scalable environment.
Throughout this tutorial, we will explore the seamless integration of the ADK with Cloud Run. You'll learn how to deploy your agent and then dive into the practical aspects of managing your application in a production-like setting. We will cover how to safely roll out new versions of your agent by managing traffic, enabling you to test new features with a subset of users before a full release.
Furthermore, you will gain hands-on experience with monitoring the performance of your agent. We will simulate a real-world scenario by conducting a load test to observe Cloud Run's automatic scaling capabilities in action. To gain deeper insights into your agent's behavior and performance, we will enable tracing with Cloud Trace. This will provide a detailed, end-to-end view of requests as they travel through your agent, allowing you to identify and address any performance bottlenecks. By the end of this tutorial, you will have a comprehensive understanding of how to effectively deploy, manage, and monitor your ADK-powered agents on Cloud Run.
Throughout this codelab, you will follow a step-by-step approach:
- Create a PostgreSQL database on Cloud SQL to be used by the ADK agent's database session service
- Set up a basic ADK agent
- Set up the database session service to be used by the ADK runner
- Deploy the agent to Cloud Run for the first time
- Load test the service and inspect Cloud Run autoscaling
- Deploy a new agent revision and gradually shift traffic to it
- Set up Cloud Trace and inspect agent run traces
Architecture Overview
Prerequisites
- Comfortable working with Python
- An understanding of basic full-stack architecture using HTTP service
What you'll learn
- ADK structure and local utilities
- Set up an ADK agent with the database session service
- Set up PostgreSQL on Cloud SQL for use by the database session service
- Deploy the application to Cloud Run using a Dockerfile and set initial environment variables
- Configure and test Cloud Run autoscaling with load testing
- Strategies for gradual releases with Cloud Run
- Set up ADK agent tracing to Cloud Trace
What you'll need
- Chrome web browser
- A Gmail account
- A Cloud Project with billing enabled
This codelab, designed for developers of all levels (including beginners), uses Python in its sample application. However, Python knowledge isn't required for understanding the concepts presented.
2. Before you begin
Select Active Project in the Cloud Console
This codelab assumes that you already have a Google Cloud project with billing enabled. If you do not have one yet, you can follow the instructions below to get started.
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.
Prepare Cloud SQL Database
We will need a database for the ADK agent later on, so let's create a PostgreSQL database on Cloud SQL. First, navigate to the search bar at the top of the Cloud Console and type "cloud sql", then click the Cloud SQL product.
After that, we need to create a new database instance. Click Create Instance and choose PostgreSQL.
You may also need to enable the Compute Engine API if you are starting with a new project; just click Enable API if this prompt shows up.
Next, choose the specifications of the database: select the Enterprise edition with the Sandbox preset.
After that, set the instance name and the default password for the postgres user. You can use whatever credentials you want, but for the sake of this tutorial we will use "adk-deployment" for both the instance name and the password.
Let's use us-central1 with a single zone for this tutorial. We can then finalize the database creation and let it complete the required setup by clicking the Create Instance button.
While waiting for this to finish, we can continue to the next section.
Setup Cloud Project in Cloud Shell Terminal
- You'll use Cloud Shell, a command-line environment running in Google Cloud. Click Activate Cloud Shell at the top of the Google Cloud console.
- Once connected to Cloud Shell, check that you're already authenticated and that the project is set to your project ID using the following command:
gcloud auth list
- Run the following command in Cloud Shell to confirm that the gcloud command knows about your project.
gcloud config list project
- If your project is not set, use the following command to set it:
gcloud config set project <YOUR_PROJECT_ID>
Alternatively, you can also find the PROJECT_ID in the console. Click it and you will see all of your projects and their project IDs on the right side.
- Enable the required APIs via the command shown below. This could take a few minutes, so please be patient.
gcloud services enable aiplatform.googleapis.com \
run.googleapis.com \
cloudbuild.googleapis.com \
cloudresourcemanager.googleapis.com \
sqladmin.googleapis.com
On successful execution of the command, you should see a message similar to the one shown below:
Operation "operations/..." finished successfully.
As an alternative to the gcloud command, you can enable the APIs through the console by searching for each product or using this link.
If any API is missing, you can always enable it during the course of the implementation.
Refer to the documentation for gcloud commands and usage.
Go to Cloud Shell Editor and Setup Application Working Directory
Now we can set up our code editor. We will use the Cloud Shell Editor for this.
- Click the Open Editor button. This will open the Cloud Shell Editor, where we can write our code.
- Make sure the Cloud Code project shown in the bottom left corner (status bar) of the Cloud Shell Editor, as highlighted in the image below, is set to the active Google Cloud project where you have billing enabled. Authorize if prompted. If you already followed the previous commands, the button may point directly to your activated project instead of showing a sign-in button.
- Next, let's clone the template working directory for this codelab from GitHub by running the following command. It will create the working directory in the deploy_and_manage_adk directory.
git clone https://github.com/alphinside/deploy-and-manage-adk-service.git deploy_and_manage_adk
- After that, go to the top section of the Cloud Shell Editor, click File -> Open Folder, find your username directory, select the deploy_and_manage_adk directory, and click the OK button. This makes the chosen directory the main working directory. In this example, the username is alvinprayuda, hence the directory path shown below.
Now, your Cloud Shell Editor should look like this
Next, we can configure our Python environment.
Environment Setup
Prepare Python Virtual Environment
The next step is to prepare the development environment. Your active terminal's working directory should be the deploy_and_manage_adk directory. We will use Python 3.12 in this codelab, along with the uv Python project manager, to simplify creating and managing the Python version and the virtual environment.
- If you haven't opened the terminal yet, open it by clicking Terminal -> New Terminal, or use Ctrl + Shift + C. A terminal window will open at the bottom of the browser.
- Download uv and install Python 3.12 with the following command
curl -LsSf https://astral.sh/uv/0.6.16/install.sh | sh && \
source $HOME/.local/bin/env && \
uv python install 3.12
- Now let's initialize the virtual environment using uv by running this command
uv sync --frozen
This will create the .venv directory and install the dependencies. A quick peek at pyproject.toml shows the dependencies:
dependencies = [
    "google-adk==1.3.0",
    "locust==2.37.10",
    "pg8000==1.31.2",
    "python-dotenv==1.1.0",
]
- To test the virtual environment, create a new file main.py and copy the following code
def main():
print("Hello from deploy_and_manage_adk!")
if __name__ == "__main__":
main()
- Then, run the following command
uv run main.py
You will get output like the one shown below:
Using CPython 3.12
Creating virtual environment at: .venv
Hello from deploy_and_manage_adk!
This shows that the Python project is set up properly.
Setup Configuration Files
Now we will need to set up configuration files for this project.
Rename the .env.example file to .env; it will contain the values shown below. Update the GOOGLE_CLOUD_PROJECT value to your project ID.
# Google Cloud and Vertex AI configuration
GOOGLE_CLOUD_PROJECT=your-project-id
GOOGLE_CLOUD_LOCATION=global
GOOGLE_GENAI_USE_VERTEXAI=True

# Database connection for session service
# SESSION_SERVICE_URI=postgresql+pg8000://<username>:<password>@/<database>?unix_sock=/cloudsql/<instance_connection_name>/.s.PGSQL.5432
For this codelab, we are going with the pre-configured values for GOOGLE_CLOUD_LOCATION and GOOGLE_GENAI_USE_VERTEXAI. For now, we will keep SESSION_SERVICE_URI commented out.
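To make the commented-out SESSION_SERVICE_URI template easier to read, here is a small Python sketch that assembles the same connection string from its parts. The values below are illustrative placeholders matching the instance created earlier in this codelab; substitute your own project ID and credentials.

# Sketch: how the SESSION_SERVICE_URI template in .env is assembled.
# All values below are placeholders based on the Cloud SQL instance created earlier.
db_user = "postgres"
db_password = "adk-deployment"  # default password set during instance creation
db_name = "postgres"            # default database of the instance
instance_connection_name = "your-project-id:us-central1:adk-deployment"  # PROJECT:REGION:INSTANCE

session_service_uri = (
    f"postgresql+pg8000://{db_user}:{db_password}@/{db_name}"
    f"?unix_sock=/cloudsql/{instance_connection_name}/.s.PGSQL.5432"
)
print(session_service_uri)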
Now we can move to the next step: inspect the agent logic and deploy it.
3. Build the Weather Agent with ADK and Gemini 2.5
Introduction to ADK Directory Structure
Let's start by exploring what ADK has to offer and how to build the agent. The complete ADK documentation can be accessed at this URL. ADK offers many utilities through its CLI. Some of them are the following:
- Set up the agent directory structure
- Quickly try interactions via CLI input/output
- Quickly set up a local development web UI
Now, let's check the agent structure in the weather_agent directory
weather_agent/
├── __init__.py
└── agent.py
If you inspect __init__.py and agent.py, you will see this code
# __init__.py
from weather_agent.agent import root_agent
__all__ = ["root_agent"]
# agent.py
import os
from pathlib import Path
import google.auth
from dotenv import load_dotenv
from google.adk.agents import Agent
from google.cloud import logging as google_cloud_logging
# Load environment variables from .env file in root directory
root_dir = Path(__file__).parent.parent
dotenv_path = root_dir / ".env"
load_dotenv(dotenv_path=dotenv_path)
# Use default project from credentials if not in .env
_, project_id = google.auth.default()
os.environ.setdefault("GOOGLE_CLOUD_PROJECT", project_id)
os.environ.setdefault("GOOGLE_CLOUD_LOCATION", "global")
os.environ.setdefault("GOOGLE_GENAI_USE_VERTEXAI", "True")
logging_client = google_cloud_logging.Client()
logger = logging_client.logger("weather-agent")
def get_weather(city: str) -> dict:
"""Retrieves the current weather report for a specified city.
Args:
city (str): The name of the city (e.g., "New York", "London", "Tokyo").
Returns:
dict: A dictionary containing the weather information.
Includes a 'status' key ('success' or 'error').
If 'success', includes a 'report' key with weather details.
If 'error', includes an 'error_message' key.
"""
logger.log_text(
f"--- Tool: get_weather called for city: {city} ---", severity="INFO"
) # Log tool execution
city_normalized = city.lower().replace(" ", "") # Basic normalization
# Mock weather data
mock_weather_db = {
"newyork": {
"status": "success",
"report": "The weather in New York is sunny with a temperature of 25°C.",
},
"london": {
"status": "success",
"report": "It's cloudy in London with a temperature of 15°C.",
},
"tokyo": {
"status": "success",
"report": "Tokyo is experiencing light rain and a temperature of 18°C.",
},
}
if city_normalized in mock_weather_db:
return mock_weather_db[city_normalized]
else:
return {
"status": "error",
"error_message": f"Sorry, I don't have weather information for '{city}'.",
}
root_agent = Agent(
name="weather_agent",
model="gemini-2.5-flash",
instruction="You are a helpful AI assistant designed to provide accurate and useful information.",
tools=[get_weather],
)
ADK Code Explanation
This script contains our agent initialization, where we:
- Set the model to gemini-2.5-flash
- Provide the get_weather tool to support the agent's functionality as a weather agent (a quick check of the tool follows below)
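Because get_weather only reads from the mock dictionary above, you can sanity-check its return shape directly from the project root (for example with uv run). This is just an illustrative snippet; note that importing the agent module requires application default credentials for the Cloud Logging client, which are available in Cloud Shell.

# Quick check of the tool's return shape, using the mock data defined in agent.py
from weather_agent.agent import get_weather

print(get_weather("New York"))
# {'status': 'success', 'report': 'The weather in New York is sunny with a temperature of 25°C.'}
print(get_weather("Paris"))
# {'status': 'error', 'error_message': "Sorry, I don't have weather information for 'Paris'."}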
Run the Web UI
Now we can interact with the agent and inspect its behavior locally. ADK provides a development web UI for interacting with the agent and inspecting what is going on during the interaction. Run the following command to start the local development UI server
uv run adk web --port 8080
It will produce output like the following example, which means we can already access the web interface
INFO:     Started server process [xxxx]
INFO:     Waiting for application startup.
+-----------------------------------------------------------------------------+
| ADK Web Server started                                                       |
|                                                                              |
| For local testing, access at http://localhost:8080.                          |
+-----------------------------------------------------------------------------+
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
Now, to check it, click the Web Preview button at the top of your Cloud Shell Editor and select Preview on port 8080
You will see the following web page, where you can select available agents from the drop-down at the top left (in our case it should be weather_agent) and interact with the bot. The left window shows detailed logs of what happens during the agent runtime.
Now, try to interact with it. In the left bar, we can inspect the trace for each input, so we can understand how long each action taken by the agent takes before forming the final answer.
This is one of the observability features built into ADK; for now we inspect it locally. Later on, we will see how this integrates with Cloud Trace so that we have centralized traces of all requests.
4. The Backend Server Script
To make the agent accessible as a service, we will wrap the agent inside a FastAPI app. Here we can configure the necessary services to support the agent, such as preparing Session, Memory, or Artifact services for production purposes. Below is the code of the server.py that will be used
import os
from dotenv import load_dotenv
from fastapi import FastAPI
from google.adk.cli.fast_api import get_fast_api_app
from pydantic import BaseModel
from typing import Literal
from google.cloud import logging as google_cloud_logging
from tracing import CloudTraceLoggingSpanExporter
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider, export
# Load environment variables from .env file
load_dotenv()
logging_client = google_cloud_logging.Client()
logger = logging_client.logger(__name__)
AGENT_DIR = os.path.dirname(os.path.abspath(__file__))
# Get session service URI from environment variables
session_uri = os.getenv("SESSION_SERVICE_URI", None)
# Prepare arguments for get_fast_api_app
app_args = {"agents_dir": AGENT_DIR, "web": True}
# Only include session_service_uri if it's provided
if session_uri:
app_args["session_service_uri"] = session_uri
else:
logger.log_text(
"SESSION_SERVICE_URI not provided. Using in-memory session service instead. "
"All sessions will be lost when the server restarts.",
severity="WARNING",
)
provider = TracerProvider()
processor = export.BatchSpanProcessor(CloudTraceLoggingSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
# Create FastAPI app with appropriate arguments
app: FastAPI = get_fast_api_app(**app_args)
app.title = "weather-agent"
app.description = "API for interacting with the Agent weather-agent"
class Feedback(BaseModel):
"""Represents feedback for a conversation."""
score: int | float
text: str | None = ""
invocation_id: str
log_type: Literal["feedback"] = "feedback"
service_name: Literal["weather-agent"] = "weather-agent"
user_id: str = ""
@app.post("/feedback")
def collect_feedback(feedback: Feedback) -> dict[str, str]:
"""Collect and log feedback.
Args:
feedback: The feedback data to log
Returns:
Success message
"""
logger.log_struct(feedback.model_dump(), severity="INFO")
return {"status": "success"}
# Main execution
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8080)
Server Code Explanation
These are the things defined in the server.py script:
- Convert our agent into a FastAPI app using the get_fast_api_app method. This way we inherit the same route definitions used by the web development UI.
- Configure the necessary Session, Memory, or Artifact services by adding keyword arguments to the get_fast_api_app method. In this tutorial, if the SESSION_SERVICE_URI env var is configured, the session service uses it; otherwise it falls back to an in-memory session service.
- Add custom routes to support other backend business logic; in the script we add a /feedback route as an example (see the usage sketch after this list).
- Enable cloud tracing to send traces to Google Cloud Trace.
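As a quick illustration of the custom /feedback route above, here is a sketch of how a client could call it while the server runs locally on port 8080. The field values are arbitrary examples; requests is not listed directly in pyproject.toml but is installed transitively as a locust dependency.

# Example client call to the /feedback route defined in server.py
# (assumes the server is running locally and listening on port 8080).
import requests

payload = {
    "score": 5,
    "text": "Accurate weather report",
    "invocation_id": "example-invocation-id",  # placeholder value
    "user_id": "user-123",
}
resp = requests.post("http://localhost:8080/feedback", json=payload)
print(resp.status_code, resp.json())  # expected: 200 {'status': 'success'}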
5. Deploying to Cloud Run
Now, let's deploy this agent service to Cloud Run. For the sake of this demo, the service will be exposed as a public service that can be accessed by others. Keep in mind that this is not best practice, as it is not secure.
In this codelab, we will use a Dockerfile to deploy our agent to Cloud Run. Below is the Dockerfile content that will be used
FROM python:3.12-slim
RUN pip install --no-cache-dir uv==0.7.13
WORKDIR /app
COPY . .
RUN uv sync --frozen
EXPOSE 8080
CMD ["uv", "run", "uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]
At this point, we have all the files needed to deploy our application to Cloud Run, so let's deploy it. Navigate to the Cloud Shell Terminal and make sure the current project is configured to your active project; if not, use the gcloud config command to set the project ID:
gcloud config set project [PROJECT_ID]
Then, run the following command to deploy it to Cloud Run.
gcloud run deploy weather-agent \
--source . \
--port 8080 \
--project {YOUR_PROJECT_ID} \
--allow-unauthenticated \
--add-cloudsql-instances {YOUR_DB_CONNECTION_NAME} \
--update-env-vars SESSION_SERVICE_URI="postgresql+pg8000://postgres:{YOUR_DEFAULT_USER_PASS}@postgres/?unix_sock=/cloudsql/{YOUR_DB_CONNECTION_NAME}/.s.PGSQL.5432",GOOGLE_CLOUD_PROJECT={YOUR_PROJECT_ID} \
--region us-central1
To get the {YOUR_DB_CONNECTION_NAME} value, go to Cloud SQL again and click the instance that you created. On the instance page, scroll down to the "Connect to this instance" section, where you can copy the Connection name to substitute for the {YOUR_DB_CONNECTION_NAME} value. For example, see the image shown below.
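For reference, the connection name follows the pattern PROJECT_ID:REGION:INSTANCE_NAME, so with the values used earlier in this codelab it would look roughly like the sketch below; always copy the exact value from the console.

# Sketch of the Cloud SQL connection name format; copy the exact value from the console.
project_id = "your-project-id"    # replace with your project ID
region = "us-central1"            # region chosen earlier
instance_name = "adk-deployment"  # instance name chosen earlier
print(f"{project_id}:{region}:{instance_name}")  # e.g. your-project-id:us-central1:adk-deployment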
If you're prompted to acknowledge the creation of an Artifact Registry repository for Docker, just answer Y. Note that we are allowing unauthenticated access here because this is a demo application; the recommendation is to use appropriate authentication for your enterprise and production applications.
Once the deployment is complete, you should get a link similar to the one below:
https://weather-agent-*******.us-central1.run.app
Go ahead and use your application from an Incognito window or your mobile device. It should be live already.
6. Inspecting Cloud Run Auto Scaling with Load Testing
Now we will inspect the autoscaling capabilities of Cloud Run. For this scenario, let's deploy a new revision while setting the maximum concurrency per instance. Run the following command
gcloud run deploy weather-agent \
--source . \
--port 8080 \
--project {YOUR_PROJECT_ID} \
--allow-unauthenticated \
--region us-central1 \
--concurrency 10
After that, let's inspect the load_test.py file. This is the script we will use for load testing with the locust framework. It does the following things (see the sketch after this list):
- Randomize a user_id and session_id
- Create a session for the user_id
- Hit the /run_sse endpoint with the created user_id and session_id
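For orientation, here is a minimal locust sketch of that flow. It is illustrative only: the session-creation route and the /run_sse payload shape shown here are assumptions, and the load_test.py file in the cloned repository is the source of truth.

# Minimal, illustrative locust sketch of the flow above; see load_test.py for the real script.
# The session-creation route and the /run_sse payload shape are assumptions.
import uuid

from locust import HttpUser, between, task


class WeatherAgentUser(HttpUser):
    wait_time = between(1, 3)

    def on_start(self):
        # Randomize user_id and session_id, then create the session (assumed route)
        self.user_id = f"user-{uuid.uuid4()}"
        self.session_id = f"session-{uuid.uuid4()}"
        self.client.post(
            f"/apps/weather_agent/users/{self.user_id}/sessions/{self.session_id}",
            json={},
        )

    @task
    def ask_weather(self):
        # Hit /run_sse with the created user_id and session_id (payload shape assumed)
        self.client.post(
            "/run_sse",
            json={
                "app_name": "weather_agent",
                "user_id": self.user_id,
                "session_id": self.session_id,
                "new_message": {
                    "role": "user",
                    "parts": [{"text": "What is the weather in New York?"}],
                },
            },
        )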
We will need our deployed service URL. If you missed it, go to the Cloud Run console, then find your weather-agent service and click it. The service URL is displayed right beside the Region information, e.g.
Then run the following command to do the load test
uv run locust -f load_test.py \
-H {YOUR_SERVICE_URL} \
-u 60 \
-r 5 \
-t 120 \
--headless
Running this, you will see metrics like those displayed below (in this example, all requests succeeded):
Type Name # reqs # fails | Avg Min Max Med | req/s failures/s
--------|------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
POST /run_sse end 813 0(0.00%) | 5817 2217 26421 5000 | 6.79 0.00
POST /run_sse message 813 0(0.00%) | 2678 1107 17195 2200 | 6.79 0.00
--------|------------------------------------|-------|-------------|-------|-------|-------|-------|--------|-----------
Aggregated 1626 0(0.00%) | 4247 1107 26421 3500 | 13.59 0.00
Then let's see what happened in Cloud Run. Go to your deployed service again and look at the dashboard. This shows how Cloud Run automatically scales instances to handle incoming requests. Because we are limiting the maximum concurrency to 10 per instance, Cloud Run automatically adjusts the number of containers to satisfy this condition.
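As a rough sanity check, you can relate the load-test parameters to the expected instance count. This is a simplification that ignores ramp-up and uneven request distribution, but it explains the order of magnitude you should see on the dashboard.

# Rough estimate of the instances needed during the load test above (simplified model).
import math

concurrent_users = 60  # -u 60 in the locust command
max_concurrency = 10   # --concurrency 10 on the Cloud Run revision

print(math.ceil(concurrent_users / max_concurrency))  # roughly 6 instances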
7. Gradual Release of New Revisions
Now, consider the following scenario: we want to update the agent's prompt to the following
# agent.py
...
root_agent = Agent(
name="weather_agent",
model="gemini-2.5-flash-preview-05-20",
instruction="You are a helpful AI assistant designed to provide accurate and useful information. You only answer inquiries about the weather. Refuse all other user query",
tools=[get_weather],
)
Then, you want to release the new revision but don't want all request traffic to go directly to the new version. We can do a gradual release with Cloud Run. First, we need to deploy a new revision with the --no-traffic flag. Save the updated agent script and run the following command
gcloud run deploy weather-agent \
--source . \
--port 8080 \
--project {YOUR_PROJECT_ID} \
--allow-unauthenticated \
--region us-central1 \
--no-traffic
After it finishes, you will receive a log similar to the previous deployment process, the difference being the amount of traffic served. It will show 0 percent of traffic served.
Next, let's go to the Cloud Run product page and find your deployed service. Type "cloud run" in the search bar and click the Cloud Run product
Then, find your weather-agent service and click it
Go to the Revisions tab and you will see the list of deployed revisions there
You will see that the newly deployed revision is serving 0% of traffic. From here, you can click the kebab button (⋮) and choose Manage Traffic
In the pop-up window, you can edit the percentage of traffic going to each revision.
After waiting a while, the traffic will be distributed proportionally based on the percentage configuration. This way, we can easily roll back to a previous revision if something goes wrong with the new release.
8. ADK Tracing
Agents built with ADK already support tracing through the OpenTelemetry instrumentation embedded in them. We use Cloud Trace to capture and visualize those traces. Let's inspect server.py to see how we enabled it in our previously deployed service
# server.py
from tracing import CloudTraceLoggingSpanExporter
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider, export
...
provider = TracerProvider()
processor = export.BatchSpanProcessor(CloudTraceLoggingSpanExporter())
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)
...
Here, we initialize the tracer and the exporter. The details of the exporter can be inspected in tracing.py. We create a custom exporter because there is a limit on the trace data that can be exported to Cloud Trace. We are using the implementation from https://googlecloudplatform.github.io/agent-starter-pack/guide/observability.html for this tracing capability.
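Because the tracer provider is registered globally, any additional spans you create yourself are exported to Cloud Trace as well. This is entirely optional (ADK instruments agent runs on its own); the snippet below is just a minimal sketch to verify the wiring.

# Optional manual span, only to verify the tracer/exporter wiring;
# ADK already creates its own spans for agent runs.
from opentelemetry import trace

tracer = trace.get_tracer("weather-agent.manual")
with tracer.start_as_current_span("manual-health-check"):
    pass  # work executed here is recorded as a span and exported to Cloud Trace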
Try accessing your service's web dev UI and have a chat with the agent. After that, go to the Cloud Console search bar, type "trace explorer", and choose the Trace Explorer product.
On the Trace Explorer page, you will see that the traces of our conversation with the agent have been submitted. Look at the Span name section and filter for the span specific to our agent (it's named agent_run [weather_agent]).
Once the spans are filtered, you can inspect each trace directly. It shows the detailed duration of each action taken by the agent. For example, see the images below
In each section, you can inspect the details in the attributes, as shown below
There you go; now we have good observability into each interaction between our agent and the user, which helps with debugging issues. Feel free to try various tools or workflows!
9. Challenge
Try multi-agent or agentic workflows to see how they perform under load and what the traces look like.
10. Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this codelab, follow these steps:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
- Alternatively, you can go to Cloud Run in the console, select the service you just deployed, and delete it.