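How to host Ollama as a worker pool for inference

1. Introduction

Overview

In this codelab, you will learn how to build an event-driven, asynchronous AI processing pipeline. You will deploy an open model with Ollama on a Cloud Run worker pool. The worker pool pulls messages from a Pub/Sub pull subscription and processes them with the gemma3:4b model.

What you'll learn

How to use a worker pool with a Pub/Sub pull subscription
How to use Ollama in a worker pool for inference

2. Before you begin

Enable APIs

Before you start this codelab, run the following command to enable the required APIs: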

gcloud services enable run.googleapis.com \
    cloudbuild.googleapis.com \
    artifactregistry.googleapis.com \
    pubsub.googleapis.com \
    storage.googleapis.com

3. Setup and requirements

To set up the required resources, follow these steps:

Set the environment variables for this codelab:

export PROJECT_ID=<YOUR_PROJECT_ID>
export REGION=<YOUR_REGION>

export BUCKET_NAME=$PROJECT_ID-gemma3-4b
export SERVICE_ACCOUNT_NAME=ollama-worker-sa
export SERVICE_ACCOUNT_EMAIL=${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
export TOPIC_NAME=ollama-prompts
export SUBSCRIPTION_NAME=ollama-prompts-sub
export AR_REPO_NAME=ollama-worker-repo
export PULL_MSG_IMAGE_NAME=pubsub-pull-msg
export OLLAMA_IMAGE_NAME=ollama-coordinator

Create a service account for the worker pool

gcloud iam service-accounts create ${SERVICE_ACCOUNT_NAME} \
  --display-name="Ollama Worker Service Account"

Grant the service account access to Pub/Sub

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/pubsub.subscriber"

Create an Artifact Registry repository for the worker pool images

gcloud artifacts repositories create ${AR_REPO_NAME} \
  --repository-format=docker \
  --location=${REGION}

Create the Pub/Sub topic and subscription

gcloud pubsub topics create $TOPIC_NAME
gcloud pubsub subscriptions create $SUBSCRIPTION_NAME --topic $TOPIC_NAME
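
For orientation, Pub/Sub refers to topics and subscriptions by fully qualified resource names rather than by the short names used in the commands above. The sketch below shows how the short names expand. The helper names are illustrative and not part of the codelab; the subscriber client's subscription_path method used in main.py returns the same subscription string.

```python
def topic_path(project_id: str, topic_name: str) -> str:
    """Return the fully qualified Pub/Sub topic name."""
    return f"projects/{project_id}/topics/{topic_name}"


def subscription_path(project_id: str, subscription_name: str) -> str:
    """Return the fully qualified Pub/Sub subscription name."""
    return f"projects/{project_id}/subscriptions/{subscription_name}"


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    print(topic_path("my-project", "ollama-prompts"))
    print(subscription_path("my-project", "ollama-prompts-sub"))
```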

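4. Download the model and host it on GCS

Instead of pulling the model inside the container during the build, which can be slow and inefficient, you will use the Ollama CLI to pull the model to your local machine and then upload the model files to a GCS bucket. The worker pool then mounts this bucket to access the model.

Install Ollama on your local machine:

The following command installs Ollama on Linux. For other operating systems, see the Ollama website.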

curl -fsSL https://ollama.com/install.sh | sh

Start the Ollama service and pull the model:

First, start the Ollama service in the background. Then pull the gemma3:4b model.

ollama serve &
ollama pull gemma3:4b

Create a GCS bucket:

Create the GCS bucket using the BUCKET_NAME environment variable you set earlier.

gsutil mb gs://${BUCKET_NAME}

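Upload the model files to your GCS bucket:

Ollama stores model files in the ~/.ollama/models directory. Upload the contents of this directory to your GCS bucket. This copies every model you have downloaded, not only gemma3:4b.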

gsutil -m cp -r ~/.ollama/models/* gs://${BUCKET_NAME}/
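
Because the upload copies everything in the models directory, it can help to check how much data is there first. The following is a minimal sketch using only the Python standard library. The function name summarize_models_dir is hypothetical, and the script assumes the default ~/.ollama/models location described above.

```python
import os
from pathlib import Path


def summarize_models_dir(models_dir: Path) -> tuple[int, int]:
    """Return (file_count, total_bytes) for every file under models_dir."""
    file_count = 0
    total_bytes = 0
    for root, _dirs, files in os.walk(models_dir):
        for name in files:
            file_count += 1
            total_bytes += (Path(root) / name).stat().st_size
    return file_count, total_bytes


if __name__ == "__main__":
    # Assumes the default Ollama models directory on the local machine.
    models_dir = Path.home() / ".ollama" / "models"
    count, size = summarize_models_dir(models_dir)
    print(f"{count} files, {size / 1e9:.2f} GB under {models_dir}")
```

If the directory does not exist, the function reports zero files, which is a hint that the pull step has not completed.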

Grant the service account access to the Cloud Storage bucket

gcloud storage buckets add-iam-policy-binding gs://${BUCKET_NAME} \
     --member=serviceAccount:${SERVICE_ACCOUNT_EMAIL} \
     --role=roles/storage.objectViewer

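5. Create the Cloud Run worker pool

The Cloud Run worker pool uses 2 containers:

ollama-coordinator - hosts Ollama and serves the Gemma 3 4B model
pubsub-pull-msg - pulls messages from the Pub/Sub subscription and passes them to the ollama-coordinator container

First, you will create the ollama-coordinator container.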

mkdir codelab-ollama-wp
cd codelab-ollama-wp

Create a directory for the ollama-coordinator container

mkdir ollama-coordinator
cd ollama-coordinator

Create a Dockerfile with the following contents

# Use the official Ollama image as a base image
FROM ollama/ollama

# Expose the port that Ollama listens on
EXPOSE 11434

# Set the entrypoint to start the Ollama server
ENTRYPOINT ["ollama", "serve"]

Build the ollama-coordinator container

gcloud builds submit --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${OLLAMA_IMAGE_NAME} --timeout=20m
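
The --tag value in the build command follows the Artifact Registry Docker path layout: region, then -docker.pkg.dev, then project, repository, and image name. As an illustration only, the sketch below assembles the same string from its parts. The function name image_url is an assumption, and the values in the example are placeholders.

```python
def image_url(region: str, project_id: str, repo: str, image: str) -> str:
    """Build an Artifact Registry Docker image path from its components."""
    return f"{region}-docker.pkg.dev/{project_id}/{repo}/{image}"


if __name__ == "__main__":
    # Placeholder region and project; the repo and image names match the
    # environment variables set earlier in this codelab.
    print(image_url("europe-west1", "my-project",
                    "ollama-worker-repo", "ollama-coordinator"))
```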

Next, you will create the pubsub-pull-msg container.

Create a directory for the pubsub-pull-msg container

cd ..
mkdir pubsub-pull-msg
cd pubsub-pull-msg

Create a Dockerfile with the following contents

# Use the official Python image as a base image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container
COPY requirements.txt .

# Install the required Python packages
RUN pip install --no-cache-dir -r requirements.txt

# Copy the Python script into the container
COPY main.py .

# Set the entrypoint to run the Python script
CMD ["python", "main.py"]

Create a requirements.txt file with the following contents

google-cloud-pubsub
requests

Create a main.py file with the following contents

import os
import sys
import requests
import json
from google.cloud import pubsub_v1

# --- Main Application Logic ---
print("--- Sidecar container script started ---")

# --- Environment and Configuration ---
project_id = os.environ.get("PROJECT_ID")
subscription_name = os.environ.get("SUBSCRIPTION_NAME")
ollama_api_url = "http://localhost:11434/api/generate"

if not project_id or not subscription_name:
    print("FATAL: PROJECT_ID and SUBSCRIPTION_NAME must be set.")
    sys.exit(1)

print(f"PROJECT_ID: {project_id}")
print(f"SUBSCRIPTION_NAME: {subscription_name}")

def callback(message):
    """Processes a single Pub/Sub message."""
    print(f"Received message ID: {message.message_id}")
    try:
        prompt = message.data.decode("utf-8")
        print(f"Decoded prompt: '{prompt}'")
        
        data = {"model": "gemma3:4b", "prompt": prompt, "stream": False}
        
        print("Sending request to Ollama...")
        response = requests.post(ollama_api_url, json=data, timeout=300)
        response.raise_for_status()
        
        print("Successfully received response from Ollama.")
        ollama_response = response.json()
        print(f"Ollama response: {json.dumps(ollama_response)[:200]}...")

        message.ack()
        print(f"Message {message.message_id} acknowledged.")

    except requests.exceptions.RequestException as e:
        print(f"Error calling Ollama API: {e}")
        message.nack()
        print(f"Message {message.message_id} not acknowledged.")
    except Exception as e:
        print(f"An unexpected error occurred in callback: {e}")
        message.nack()
        print(f"Message {message.message_id} not acknowledged.")

def main():
    """Starts the Pub/Sub subscriber."""
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_name)
    
    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
    print(f"Subscribed to {subscription_path}. Listening for messages...")

    try:
        # .result() will block indefinitely.
        streaming_pull_future.result()
    except Exception as e:
        print(f"A fatal error occurred in the subscriber: {e}")
        streaming_pull_future.cancel()
        streaming_pull_future.result()

if __name__ == "__main__":
    main()
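
The subscriber may receive a message before the Ollama server in the other container is accepting requests. In that case the callback above nacks the message and Pub/Sub redelivers it. One optional refinement, which is not part of the original codelab, is to poll Ollama before subscribing. The sketch below is a hedged illustration: ollama_is_up and wait_for_ollama are hypothetical names, and it assumes Ollama's /api/tags endpoint (which lists local models) answers once the server is up.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def ollama_is_up(url: str = "http://localhost:11434/api/tags") -> bool:
    """Return True if the Ollama server answers the request with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_for_ollama(
    probe: Callable[[], bool] = ollama_is_up,
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> bool:
    """Call probe until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False
```

In main(), one could call wait_for_ollama() before subscriber.subscribe(...) and exit with an error if it returns False. The probe is passed in as a parameter so the retry loop can be exercised without a running server.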

Now build the pubsub-pull-msg container

gcloud builds submit --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${PULL_MSG_IMAGE_NAME}

6. Deploy and run the worker pool

In this step, you will create the Cloud Run worker pool by deploying a YAML file.

Move to the root folder to create the YAML file.

cd ..

Create a worker-pool.template.yaml file with the following contents

apiVersion: run.googleapis.com/v1
kind: WorkerPool
metadata:
  name: codelab-ollama-wp
  labels:
    cloud.googleapis.com/location: ${REGION}
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/scalingMode: manual
    run.googleapis.com/manualInstanceCount: '1'
    run.googleapis.com/gcs-fuse-mounter-enabled: "true"
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/gpu: "1"
        run.googleapis.com/gpu-zonal-redundancy-disabled: 'true'        
    spec:
      serviceAccountName: ${SERVICE_ACCOUNT_EMAIL}
      nodeSelector:
        run.googleapis.com/accelerator: nvidia-l4
      volumes:
      - name: gcs-bucket
        csi:
          driver: gcsfuse.run.googleapis.com
          readOnly: true
          volumeAttributes: 
            bucketName: ${BUCKET_NAME}
      containers:
      - image: ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${PULL_MSG_IMAGE_NAME}
        name: pubsub-pull-msg
        env:
        - name: PROJECT_ID
          value: ${PROJECT_ID}
        - name: SUBSCRIPTION_NAME
          value: "ollama-prompts-sub"
        - name: PYTHONUNBUFFERED
          value: "1"
        resources:
          limits:
            cpu: '1'
            memory: 1Gi
      - image: ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${OLLAMA_IMAGE_NAME}
        name: ollama-coordinator
        env:
        - name: OLLAMA_MODELS
          value: /mnt/models
        volumeMounts:
        - name: gcs-bucket
          mountPath: /mnt/models
        resources:
          limits:
            cpu: '6'
            nvidia.com/gpu: '1'
            memory: 16Gi

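Then use sed to substitute the variables in the template file, including those that make up the full container image URLs, to create the final worker-pool.yaml.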

sed -e "s|\${SERVICE_ACCOUNT_EMAIL}|${SERVICE_ACCOUNT_EMAIL}|g" \
     -e "s|\${BUCKET_NAME}|${BUCKET_NAME}|g" \
     -e "s|\${PULL_MSG_IMAGE_NAME}|${PULL_MSG_IMAGE_NAME}|g" \
     -e "s|\${OLLAMA_IMAGE_NAME}|${OLLAMA_IMAGE_NAME}|g" \
     -e "s|\${PROJECT_ID}|${PROJECT_ID}|g" \
     -e "s|\${REGION}|${REGION}|g" \
     -e "s|\${AR_REPO_NAME}|${AR_REPO_NAME}|g" \
     worker-pool.template.yaml > worker-pool.yaml
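
The sed command replaces each ${NAME} placeholder with the value of the matching shell variable. For readers who prefer Python, the same substitution can be sketched with the standard library's string.Template, which uses the same ${NAME} syntax. This is an alternative illustration, not the codelab's method; render_template and render_file are hypothetical names.

```python
import os
from string import Template

# The same variables that the sed command substitutes.
VARIABLES = [
    "SERVICE_ACCOUNT_EMAIL", "BUCKET_NAME", "PULL_MSG_IMAGE_NAME",
    "OLLAMA_IMAGE_NAME", "PROJECT_ID", "REGION", "AR_REPO_NAME",
]


def render_template(text: str, values: dict[str, str]) -> str:
    """Replace ${NAME} placeholders; unknown placeholders are left as-is."""
    return Template(text).safe_substitute(values)


def render_file(src: str, dst: str) -> None:
    """Render the template at src into dst using the current environment."""
    values = {name: os.environ[name] for name in VARIABLES if name in os.environ}
    with open(src) as f:
        rendered = render_template(f.read(), values)
    with open(dst, "w") as f:
        f.write(rendered)
```

Calling render_file("worker-pool.template.yaml", "worker-pool.yaml") would produce the output file. safe_substitute is used so that a variable missing from the environment leaves its placeholder visible in the output instead of raising an error.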

Now you can deploy the worker pool

gcloud beta run worker-pools replace worker-pool.yaml

and test it by publishing a message

gcloud pubsub topics publish ${TOPIC_NAME} --message="What is 1 + 1?"
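
The same test message can be published from Python with the google-cloud-pubsub client library, which is already listed in requirements.txt for the worker. The sketch below assumes that library is installed locally and that application default credentials are configured. encode_prompt and publish_prompt are hypothetical helper names.

```python
def encode_prompt(prompt: str) -> bytes:
    """Pub/Sub message data must be bytes; the worker decodes it as UTF-8."""
    return prompt.encode("utf-8")


def publish_prompt(project_id: str, topic_name: str, prompt: str) -> str:
    """Publish one prompt and return the message ID assigned by Pub/Sub."""
    # Imported here so this module loads even without the library installed.
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(project_id, topic_name)
    future = publisher.publish(topic_path, data=encode_prompt(prompt))
    # result() blocks until the publish completes or fails.
    return future.result()
```

For example, publish_prompt("<YOUR_PROJECT_ID>", "ollama-prompts", "What is 1 + 1?") sends the same prompt as the gcloud command.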

Then view the logs. You may need to wait a minute; you can also view the logs in real time on the worker pools page in the Cloud console.

gcloud alpha run worker-pools logs read "codelab-ollama-wp" --limit 10

You should see output similar to the following:

Ollama response: {"model": "gemma3:4b", "created_at": "2025-11-06T23:48:39.572079369Z", "response": "1 + 1 = 2\n", ...
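
The log line shows only the first 200 characters of the JSON reply, so it is a preview rather than the full result. If the callback were extended to use the generated text, the response field of the non-streaming reply holds it. The following is a minimal sketch, with extract_answer as a hypothetical helper and a sample reply abbreviated from the log line above.

```python
import json


def extract_answer(ollama_response: dict) -> str:
    """Return the generated text from a non-streaming /api/generate reply."""
    return ollama_response.get("response", "").strip()


if __name__ == "__main__":
    # Abbreviated sample; a real reply carries more fields.
    sample = json.loads('{"model": "gemma3:4b", "response": "1 + 1 = 2\\n"}')
    print(extract_answer(sample))
```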

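7. Congratulations!

Congratulations on completing this codelab!

We recommend reviewing the Cloud Run documentation.

What we've covered

How to use a Cloud Run worker pool with a Pub/Sub pull subscription
How to use Ollama in a Cloud Run worker pool for inference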

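8. Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

The easiest way to avoid charges is to delete the project that you created for this tutorial.

To delete the project:

In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

Delete individual resources

To delete the individual resources, run the following commands:

Delete the Cloud Run worker pool: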

gcloud beta run worker-pools delete codelab-ollama-wp --region ${REGION}

Delete the GCS bucket:

gsutil -m rm -r gs://${BUCKET_NAME}

Delete the Pub/Sub subscription and topic:

gcloud pubsub subscriptions delete ${SUBSCRIPTION_NAME}
gcloud pubsub topics delete ${TOPIC_NAME}

Delete the Artifact Registry repository:

gcloud artifacts repositories delete ${AR_REPO_NAME} --location=${REGION} --quiet

Delete the service account:

gcloud iam service-accounts delete ${SERVICE_ACCOUNT_EMAIL} --quiet

Clean up local files

To clean up local files:

Stop the local Ollama service: if you started Ollama with ollama serve &, you can stop it by finding its process ID (PID) and then using the kill command.
```
# Find the process ID of the Ollama server
pgrep ollama

# Replace <PID> with the actual process ID obtained from the previous command
kill <PID>
```
Delete the downloaded models:

rm -rf ~/.ollama/models

Uninstall Ollama:

Follow the instructions on the Ollama website to uninstall Ollama from your local machine.