1. Introduction
Overview
In this codelab, you'll learn how to build an event-driven, asynchronous AI processing pipeline. You'll deploy an open-source model using Ollama on a Cloud Run Worker Pool. The worker pool pulls messages from a Pub/Sub topic and processes them using a gemma3:4b model.
What you'll learn
- How to use worker pools with a Pub/Sub Pull subscription
- How to use Ollama to do inference as a worker pool
2. Before you begin
Enable APIs
Before you start this codelab, enable the following APIs by running:
gcloud services enable run.googleapis.com \
cloudbuild.googleapis.com \
artifactregistry.googleapis.com \
pubsub.googleapis.com \
storage.googleapis.com
3. Setup and Requirements
To set up the required resources, follow these steps:
- Set environment variables for this codelab:
export PROJECT_ID=<YOUR_PROJECT_ID>
export REGION=<YOUR_REGION>
export BUCKET_NAME=$PROJECT_ID-gemma3-4b
export SERVICE_ACCOUNT_NAME=ollama-worker-sa
export SERVICE_ACCOUNT_EMAIL=${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
export TOPIC_NAME=ollama-prompts
export SUBSCRIPTION_NAME=ollama-prompts-sub
export AR_REPO_NAME=ollama-worker-repo
export PULL_MSG_IMAGE_NAME=pubsub-pull-msg
export OLLAMA_IMAGE_NAME=ollama-coordinator
- Create a service account for the worker pool
gcloud iam service-accounts create ${SERVICE_ACCOUNT_NAME} \
--display-name="Ollama Worker Service Account"
- Grant the SA access to Pub/Sub
gcloud projects add-iam-policy-binding ${PROJECT_ID} \
--member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
--role="roles/pubsub.subscriber"
- Create an AR repository for the worker pool image
gcloud artifacts repositories create ${AR_REPO_NAME} \
--repository-format=docker \
--location=${REGION}
- Create the PubSub topic and subscription
gcloud pubsub topics create $TOPIC_NAME
gcloud pubsub subscriptions create $SUBSCRIPTION_NAME --topic $TOPIC_NAME
4. Download and Host the Model on GCS
Instead of pulling the model directly inside the container during the build process, which can be slow and inefficient, we'll pull the model to a local machine using the Ollama CLI and then upload the model files to a GCS bucket. The worker pool will then mount this bucket to access the model.
- Install Ollama on your local machine:
Run the following command to install Ollama on Linux. For other operating systems, please refer to the Ollama website.
curl -fsSL https://ollama.com/install.sh | sh
- Start the Ollama service and pull the model:
First, start the Ollama service in the background.
ollama serve &
ollama pull gemma3:4b
- Create a GCS bucket:
Create the GCS bucket using the BUCKET_NAME environment variable you set earlier.
gsutil mb gs://${BUCKET_NAME}
- Upload the model files to your GCS bucket:
Ollama stores model files in the ~/.ollama/models directory. Upload the contents of this directory to your GCS bucket. This will copy all models you have downloaded.
gsutil -m cp -r ~/.ollama/models/* gs://${BUCKET_NAME}/
- Grant the SA access to the Cloud Storage bucket
gcloud storage buckets add-iam-policy-binding gs://${BUCKET_NAME} \
--member=serviceAccount:${SERVICE_ACCOUNT_EMAIL} \
--role=roles/storage.objectViewer
5. Create the worker pool containers
The Cloud Run worker pool uses two containers:
- ollama-coordinator - hosts Ollama and serves the gemma3:4b model
- pubsub-pull-msg - pulls messages from the Pub/Sub subscription and passes them to the ollama-coordinator container
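The two containers communicate over localhost: for each message, the pull container posts the prompt to Ollama's /api/generate endpoint. As a minimal sketch, this is the shape of that request body (the prompt value is illustrative):

```python
import json

# Request body that pubsub-pull-msg sends to the ollama-coordinator
# container over localhost (mirrors the payload built in main.py).
payload = {
    "model": "gemma3:4b",        # must match the model uploaded to GCS
    "prompt": "What is 1 + 1?",  # illustrative prompt
    "stream": False,             # ask Ollama for one complete JSON reply
}
body = json.dumps(payload)
print(body)
```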
First, you'll create the ollama-coordinator container.
- Create a parent directory for the codelab:
mkdir codelab-ollama-wp
cd codelab-ollama-wp
- Create a directory for the ollama-coordinator container
mkdir ollama-coordinator
cd ollama-coordinator
- Create a Dockerfile with the following contents:
# Use the official Ollama image as a base image
FROM ollama/ollama
# Expose the port that Ollama listens on
EXPOSE 11434
# Set the entrypoint to start the Ollama server
ENTRYPOINT ["ollama", "serve"]
- Build the ollama container
gcloud builds submit --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${OLLAMA_IMAGE_NAME} --timeout=20m
Next, you'll create the pubsub-pull-msg container.
- Create a directory for the pubsub-pull-msg container
cd ..
mkdir pubsub-pull-msg
cd pubsub-pull-msg
- Create a Dockerfile with the following contents:
# Use the official Python image as a base image
FROM python:3.9-slim
# Set the working directory in the container
WORKDIR /app
# Copy the requirements file into the container
COPY requirements.txt .
# Install the required Python packages
RUN pip install --no-cache-dir -r requirements.txt
# Copy the Python script into the container
COPY main.py .
# Set the entrypoint to run the Python script
CMD ["python", "main.py"]
- Create a requirements.txt file with the following contents:
google-cloud-pubsub
requests
- Create a main.py file with the following contents:
import os
import sys
import requests
import json
from google.cloud import pubsub_v1

# --- Main Application Logic ---
print("--- Sidecar container script started ---")

# --- Environment and Configuration ---
project_id = os.environ.get("PROJECT_ID")
subscription_name = os.environ.get("SUBSCRIPTION_NAME")
ollama_api_url = "http://localhost:11434/api/generate"

if not project_id or not subscription_name:
    print("FATAL: PROJECT_ID and SUBSCRIPTION_NAME must be set.")
    sys.exit(1)

print(f"PROJECT_ID: {project_id}")
print(f"SUBSCRIPTION_NAME: {subscription_name}")

def callback(message):
    """Processes a single Pub/Sub message."""
    print(f"Received message ID: {message.message_id}")
    try:
        prompt = message.data.decode("utf-8")
        print(f"Decoded prompt: '{prompt}'")
        data = {"model": "gemma3:4b", "prompt": prompt, "stream": False}
        print("Sending request to Ollama...")
        response = requests.post(ollama_api_url, json=data, timeout=300)
        response.raise_for_status()
        print("Successfully received response from Ollama.")
        ollama_response = response.json()
        print(f"Ollama response: {json.dumps(ollama_response)[:200]}...")
        message.ack()
        print(f"Message {message.message_id} acknowledged.")
    except requests.exceptions.RequestException as e:
        print(f"Error calling Ollama API: {e}")
        message.nack()
        print(f"Message {message.message_id} not acknowledged.")
    except Exception as e:
        print(f"An unexpected error occurred in callback: {e}")
        message.nack()
        print(f"Message {message.message_id} not acknowledged.")

def main():
    """Starts the Pub/Sub subscriber."""
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_name)
    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
    print(f"Subscribed to {subscription_path}. Listening for messages...")
    try:
        # .result() will block indefinitely.
        streaming_pull_future.result()
    except Exception as e:
        print(f"A fatal error occurred in the subscriber: {e}")
        streaming_pull_future.cancel()
        streaming_pull_future.result()

if __name__ == "__main__":
    main()
- Now build the pubsub-pull-msg container
gcloud builds submit --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${PULL_MSG_IMAGE_NAME}
6. Deploy and run the worker pool
In this step, you'll create the Cloud Run worker pool by deploying a YAML file.
Move back to the parent folder to create the YAML file.
cd ..
- Create a file named worker-pool.template.yaml with the following content:
apiVersion: run.googleapis.com/v1
kind: WorkerPool
metadata:
  name: codelab-ollama-wp
  labels:
    cloud.googleapis.com/location: ${REGION}
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/scalingMode: manual
    run.googleapis.com/manualInstanceCount: '1'
    run.googleapis.com/gcs-fuse-mounter-enabled: "true"
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/gpu: "1"
        run.googleapis.com/gpu-zonal-redundancy-disabled: 'true'
    spec:
      serviceAccountName: ${SERVICE_ACCOUNT_EMAIL}
      nodeSelector:
        run.googleapis.com/accelerator: nvidia-l4
      volumes:
        - name: gcs-bucket
          csi:
            driver: gcsfuse.run.googleapis.com
            readOnly: true
            volumeAttributes:
              bucketName: ${BUCKET_NAME}
      containers:
        - image: ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${PULL_MSG_IMAGE_NAME}
          name: pubsub-pull-msg
          env:
            - name: PROJECT_ID
              value: ${PROJECT_ID}
            - name: SUBSCRIPTION_NAME
              value: "ollama-prompts-sub"
            - name: PYTHONUNBUFFERED
              value: "1"
          resources:
            limits:
              cpu: '1'
              memory: 1Gi
        - image: ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${OLLAMA_IMAGE_NAME}
          name: ollama-coordinator
          env:
            - name: OLLAMA_MODELS
              value: /mnt/models
          volumeMounts:
            - name: gcs-bucket
              mountPath: /mnt/models
          resources:
            limits:
              cpu: '6'
              nvidia.com/gpu: '1'
              memory: 16Gi
Then use sed to substitute the environment variables in the template file, creating the final worker-pool.yaml.
sed -e "s|\${SERVICE_ACCOUNT_EMAIL}|${SERVICE_ACCOUNT_EMAIL}|g" \
-e "s|\${BUCKET_NAME}|${BUCKET_NAME}|g" \
-e "s|\${PULL_MSG_IMAGE_NAME}|${PULL_MSG_IMAGE_NAME}|g" \
-e "s|\${OLLAMA_IMAGE_NAME}|${OLLAMA_IMAGE_NAME}|g" \
-e "s|\${PROJECT_ID}|${PROJECT_ID}|g" \
-e "s|\${REGION}|${REGION}|g" \
-e "s|\${AR_REPO_NAME}|${AR_REPO_NAME}|g" \
worker-pool.template.yaml > worker-pool.yaml
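If you prefer, the same ${VAR} substitution can be done with Python's string.Template. A minimal sketch, using illustrative placeholder values instead of your real environment variables:

```python
from string import Template

# Two lines from the template, stated inline for illustration.
template_text = (
    "serviceAccountName: ${SERVICE_ACCOUNT_EMAIL}\n"
    "bucketName: ${BUCKET_NAME}\n"
)

# Illustrative values; in practice you would read these from os.environ.
values = {
    "SERVICE_ACCOUNT_EMAIL": "ollama-worker-sa@my-project.iam.gserviceaccount.com",
    "BUCKET_NAME": "my-project-gemma3-4b",
}

# safe_substitute leaves any ${VAR} without a value untouched,
# matching sed's behavior for patterns it doesn't know about.
rendered = Template(template_text).safe_substitute(values)
print(rendered)
```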
Now deploy the worker pool:
gcloud beta run worker-pools replace worker-pool.yaml
Then test it by publishing a message to the topic:
gcloud pubsub topics publish ${TOPIC_NAME} --message="What is 1 + 1?"
Then view the logs. You may need to wait a minute, or you can go to the worker pool page in the Cloud Console and watch the logs in real time.
gcloud alpha run worker-pools logs read "codelab-ollama-wp" --limit 10
You should see output like:
Ollama response: {"model": "gemma3:4b", "created_at": "2025-11-06T23:48:39.572079369Z", "response": "1 + 1 = 2\n", ...
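The generated text lives in the response field of that JSON. As a minimal parsing sketch (the JSON string below is abridged from the log line above):

```python
import json

# Abridged Ollama /api/generate response, as logged by the sidecar.
raw = (
    '{"model": "gemma3:4b", '
    '"created_at": "2025-11-06T23:48:39.572079369Z", '
    '"response": "1 + 1 = 2\\n", "done": true}'
)

# The generated text is under the "response" key.
data = json.loads(raw)
answer = data["response"].strip()
print(answer)  # 1 + 1 = 2
```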
7. Congratulations!
Congratulations for completing the codelab!
We recommend reviewing the Cloud Run documentation.
What we've covered
- How to use Cloud Run worker pools with a Pub/Sub Pull subscription
- How to use Ollama to do inference as a Cloud Run worker pool
8. Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Deleting the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Deleting individual resources
To delete the individual resources, run the following commands:
- Delete the Cloud Run worker pool:
gcloud beta run worker-pools delete codelab-ollama-wp --region ${REGION}
- Delete the GCS bucket:
gsutil -m rm -r gs://${BUCKET_NAME}
- Delete the Pub/Sub subscription and topic:
gcloud pubsub subscriptions delete ${SUBSCRIPTION_NAME}
gcloud pubsub topics delete ${TOPIC_NAME}
- Delete the Artifact Registry repository:
gcloud artifacts repositories delete ${AR_REPO_NAME} --location=${REGION} --quiet
- Delete the service account:
gcloud iam service-accounts delete ${SERVICE_ACCOUNT_EMAIL} --quiet
Cleaning up local files
To clean up local files, do the following:
- Stop the local Ollama service:
If you started Ollama with ollama serve &, you can stop it by finding its process ID (PID) and then using the kill command.
# Find the process ID of the Ollama server
pgrep ollama
# Replace <PID> with the actual process ID obtained from the previous command
kill <PID>
- Delete the downloaded models:
rm -rf ~/.ollama/models
- Uninstall Ollama:
Follow the instructions on the Ollama website to uninstall Ollama from your local machine.