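How to Host Ollama as a Worker Pool for Inference

1. Introduction

Overview

In this codelab, you build an event-driven, asynchronous AI processing pipeline. You deploy an open-source model with Ollama on a Cloud Run worker pool. The worker pool pulls messages published to a Pub/Sub topic (through a pull subscription) and processes them with the gemma3:4b model.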
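
The pull, process, and acknowledge cycle described above can be sketched with a toy in-memory queue. This is only an illustration of the idea, not Pub/Sub itself: the `InMemorySubscription` class, the `run_worker` function, and the `max_attempts` limit are all hypothetical names introduced here.

```python
from collections import deque


class InMemorySubscription:
    """Toy stand-in for a pull subscription: ack removes, nack redelivers."""

    def __init__(self):
        self._pending = deque()

    def publish(self, data: str) -> None:
        self._pending.append(data)

    def pull(self):
        """Return the next pending message, or None when nothing is waiting."""
        return self._pending.popleft() if self._pending else None

    def nack(self, data: str) -> None:
        # A nacked message goes back on the queue so it is delivered again.
        self._pending.append(data)


def run_worker(subscription, process, max_attempts=3):
    """Pull until the queue is empty; retry failures up to max_attempts."""
    results, attempts = [], {}
    while (message := subscription.pull()) is not None:
        attempts[message] = attempts.get(message, 0) + 1
        try:
            results.append(process(message))  # success plays the role of ack
        except Exception:
            if attempts[message] < max_attempts:
                subscription.nack(message)
    return results
```

The producer never waits for the worker: it only publishes, and the worker drains the queue at its own pace. That decoupling is what makes the pipeline asynchronous.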

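What you'll learn

How to use a worker pool with a Pub/Sub pull subscription
How to run inference with Ollama on a worker pool

2. Before you begin

Enable APIs

Before you start this codelab, run the following command to enable the required APIs: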

gcloud services enable run.googleapis.com \
    cloudbuild.googleapis.com \
    artifactregistry.googleapis.com \
    pubsub.googleapis.com \
    storage.googleapis.com

3. Setup and requirements

Follow these steps to set up the required resources.

Set the environment variables for this codelab:

export PROJECT_ID=<YOUR_PROJECT_ID>
export REGION=<YOUR_REGION>

export BUCKET_NAME=$PROJECT_ID-gemma3-4b
export SERVICE_ACCOUNT_NAME=ollama-worker-sa
export SERVICE_ACCOUNT_EMAIL=${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
export TOPIC_NAME=ollama-prompts
export SUBSCRIPTION_NAME=ollama-prompts-sub
export AR_REPO_NAME=ollama-worker-repo
export PULL_MSG_IMAGE_NAME=pubsub-pull-msg
export OLLAMA_IMAGE_NAME=ollama-coordinator
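
Several of the values above are derived from PROJECT_ID and REGION. As a rough sketch of how those derived names fit together, the hypothetical helper below (`derive_names` is not part of the codelab) builds the same strings in Python, including the image URLs that the build and deploy steps later assemble from these variables.

```python
def derive_names(project_id: str, region: str) -> dict:
    """Rebuild the derived values from the export block for a given project."""
    sa_name = "ollama-worker-sa"
    repo = "ollama-worker-repo"
    # Artifact Registry Docker image URLs follow REGION-docker.pkg.dev/PROJECT/REPO/IMAGE.
    registry = f"{region}-docker.pkg.dev/{project_id}/{repo}"
    return {
        "BUCKET_NAME": f"{project_id}-gemma3-4b",
        "SERVICE_ACCOUNT_EMAIL": f"{sa_name}@{project_id}.iam.gserviceaccount.com",
        "PULL_MSG_IMAGE": f"{registry}/pubsub-pull-msg",
        "OLLAMA_IMAGE": f"{registry}/ollama-coordinator",
    }


if __name__ == "__main__":
    for key, value in derive_names("my-project", "europe-west1").items():
        print(f"{key}={value}")
```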

Create a service account for the worker pool:

gcloud iam service-accounts create ${SERVICE_ACCOUNT_NAME} \
  --display-name="Ollama Worker Service Account"

Grant the service account access to Pub/Sub:

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/pubsub.subscriber"

Create an Artifact Registry repository for the worker pool images:

gcloud artifacts repositories create ${AR_REPO_NAME} \
  --repository-format=docker \
  --location=${REGION}

Create the Pub/Sub topic and subscription:

gcloud pubsub topics create $TOPIC_NAME
gcloud pubsub subscriptions create $SUBSCRIPTION_NAME --topic $TOPIC_NAME
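
The short names passed to gcloud here expand to fully qualified resource names, which is what the subscriber code in main.py works with when it calls `subscriber.subscription_path(...)`. The sketch below writes the two name formats out by hand. The helper functions are illustrative stand-ins, not the client library's own implementation.

```python
def topic_path(project_id: str, topic_name: str) -> str:
    """Fully qualified Pub/Sub topic name."""
    return f"projects/{project_id}/topics/{topic_name}"


def subscription_path(project_id: str, subscription_name: str) -> str:
    """Fully qualified Pub/Sub subscription name."""
    return f"projects/{project_id}/subscriptions/{subscription_name}"


if __name__ == "__main__":
    print(topic_path("my-project", "ollama-prompts"))
    print(subscription_path("my-project", "ollama-prompts-sub"))
```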

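4. Download the model and host it in GCS

Pulling the model inside the container during the build can be slow and inefficient. Instead, use the Ollama CLI to pull the model to your local machine, then upload the model files to a GCS bucket. The worker pool mounts this bucket to access the model.

Install Ollama on your local machine.

Run the following command to install Ollama on Linux. For other operating systems, see the Ollama website.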

curl -fsSL https://ollama.com/install.sh | sh

Start the Ollama service and pull the model.

Start the Ollama service in the background, then pull the model:

ollama serve &
ollama pull gemma3:4b

Create a GCS bucket.

Use the BUCKET_NAME environment variable you set earlier to create the GCS bucket:

gsutil mb gs://${BUCKET_NAME}

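Upload the model files to the GCS bucket.

Ollama stores model files in the ~/.ollama/models directory. Upload the contents of that directory to the GCS bucket. Note that this copies every model you have downloaded, not only gemma3:4b.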

gsutil -m cp -r ~/.ollama/models/* gs://${BUCKET_NAME}/
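
To preview what the recursive copy will place in the bucket, the following sketch maps each local file to the object URI it is expected to end up at. `plan_uploads` is a hypothetical helper that only computes names and uploads nothing. The `blobs/` and `manifests/` subdirectories mentioned in the comment are an assumption about Ollama's on-disk layout, so check your own ~/.ollama/models directory.

```python
from pathlib import Path


def plan_uploads(models_dir: Path, bucket: str) -> list[tuple[Path, str]]:
    """Pair every file under models_dir with its expected gs:// destination.

    Paths are kept relative to models_dir, so subdirectories (assumed here
    to be blobs/ and manifests/) land directly under the bucket root.
    """
    plan = []
    for path in sorted(models_dir.rglob("*")):
        if path.is_file():
            relative = path.relative_to(models_dir).as_posix()
            plan.append((path, f"gs://{bucket}/{relative}"))
    return plan


if __name__ == "__main__":
    models = Path.home() / ".ollama" / "models"
    for local, remote in plan_uploads(models, "my-project-gemma3-4b"):
        print(f"{local} -> {remote}")
```

Keeping the files at the bucket root matters later: the worker pool mounts the bucket at /mnt/models and points OLLAMA_MODELS at that path, so Ollama looks for its model files directly under the mount point.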

Grant the service account access to the Cloud Storage bucket:

gcloud storage buckets add-iam-policy-binding gs://${BUCKET_NAME} \
     --member=serviceAccount:${SERVICE_ACCOUNT_EMAIL} \
     --role=roles/storage.objectViewer

5. Create the Cloud Run worker pool

The Cloud Run worker pool uses two containers:

ollama-coordinator - hosts Ollama and serves the Gemma 3 4B model
pubsub-pull-msg - pulls messages from the Pub/Sub subscription and passes them to the ollama-coordinator container

First, create the ollama-coordinator container.

Create a parent directory for the codelab:

mkdir codelab-ollama-wp
cd codelab-ollama-wp

Create a directory for the ollama-coordinator container:

mkdir ollama-coordinator
cd ollama-coordinator

Create a Dockerfile with the following contents:

# Use the official Ollama image as a base image
FROM ollama/ollama

# Expose the port that Ollama listens on
EXPOSE 11434

# Set the entrypoint to start the Ollama server
ENTRYPOINT ["ollama", "serve"]

Build the ollama-coordinator container image:

gcloud builds submit --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${OLLAMA_IMAGE_NAME} --timeout=20m

Next, create the pubsub-pull-msg container.

Create a directory for the pubsub-pull-msg container:

cd ..
mkdir pubsub-pull-msg
cd pubsub-pull-msg

Create a Dockerfile with the following contents:

# Use the official Python image as a base image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container
COPY requirements.txt .

# Install the required Python packages
RUN pip install --no-cache-dir -r requirements.txt

# Copy the Python script into the container
COPY main.py .

# Set the entrypoint to run the Python script
CMD ["python", "main.py"]

Create a requirements.txt file with the following contents:

google-cloud-pubsub
requests

Create a main.py file with the following contents:

import os
import sys
import requests
import json
from google.cloud import pubsub_v1

# --- Main Application Logic ---
print("--- Sidecar container script started ---")

# --- Environment and Configuration ---
project_id = os.environ.get("PROJECT_ID")
subscription_name = os.environ.get("SUBSCRIPTION_NAME")
ollama_api_url = "http://localhost:11434/api/generate"

if not project_id or not subscription_name:
    print("FATAL: PROJECT_ID and SUBSCRIPTION_NAME must be set.")
    sys.exit(1)

print(f"PROJECT_ID: {project_id}")
print(f"SUBSCRIPTION_NAME: {subscription_name}")

def callback(message):
    """Processes a single Pub/Sub message."""
    print(f"Received message ID: {message.message_id}")
    try:
        prompt = message.data.decode("utf-8")
        print(f"Decoded prompt: '{prompt}'")
        
        data = {"model": "gemma3:4b", "prompt": prompt, "stream": False}
        
        print("Sending request to Ollama...")
        response = requests.post(ollama_api_url, json=data, timeout=300)
        response.raise_for_status()
        
        print("Successfully received response from Ollama.")
        ollama_response = response.json()
        print(f"Ollama response: {json.dumps(ollama_response)[:200]}...")

        message.ack()
        print(f"Message {message.message_id} acknowledged.")

    except requests.exceptions.RequestException as e:
        print(f"Error calling Ollama API: {e}")
        message.nack()
        print(f"Message {message.message_id} not acknowledged.")
    except Exception as e:
        print(f"An unexpected error occurred in callback: {e}")
        message.nack()
        print(f"Message {message.message_id} not acknowledged.")

def main():
    """Starts the Pub/Sub subscriber."""
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_name)
    
    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
    print(f"Subscribed to {subscription_path}. Listening for messages...")

    try:
        # .result() will block indefinitely.
        streaming_pull_future.result()
    except Exception as e:
        print(f"A fatal error occurred in the subscriber: {e}")
        streaming_pull_future.cancel()
        streaming_pull_future.result()

if __name__ == "__main__":
    main()
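
The callback above mixes decoding, the HTTP call, and the ack/nack decision. One possible way to exercise that logic without Pub/Sub or a running Ollama server is to pull the decision into a pure function and inject the HTTP call. The sketch below is a suggestion, not part of the codelab: `build_generate_request`, `handle_message`, and the `send` parameter are hypothetical names.

```python
MODEL = "gemma3:4b"


def build_generate_request(prompt: str) -> dict:
    """Build the same JSON payload that main.py posts to /api/generate."""
    return {"model": MODEL, "prompt": prompt, "stream": False}


def handle_message(data: bytes, send) -> tuple[bool, str]:
    """Decide whether a message should be acked.

    `send` takes the request payload and returns the parsed JSON reply.
    Returns (True, generated_text) on success, (False, error_text) otherwise.
    """
    try:
        prompt = data.decode("utf-8")
        reply = send(build_generate_request(prompt))
        return True, reply["response"]
    except Exception as exc:
        return False, str(exc)


if __name__ == "__main__":
    # A stub in place of the real HTTP call, for a quick local check.
    fake_send = lambda payload: {"response": f"echo: {payload['prompt']}"}
    print(handle_message(b"What is 1 + 1?", fake_send))
```

In the real callback, `send` would wrap the existing `requests.post(...)` call, including `raise_for_status()`, and the boolean result would select between `message.ack()` and `message.nack()`.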

Build the pubsub-pull-msg container image:

gcloud builds submit --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${PULL_MSG_IMAGE_NAME}

6. Deploy and run the worker pool

In this step, you create the Cloud Run worker pool by deploying a YAML file.

Go back to the root folder to create the YAML file:

cd ..

Create a worker-pool.template.yaml file with the following contents:

apiVersion: run.googleapis.com/v1
kind: WorkerPool
metadata:
  name: codelab-ollama-wp
  labels:
    cloud.googleapis.com/location: ${REGION}
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/scalingMode: manual
    run.googleapis.com/manualInstanceCount: '1'
    run.googleapis.com/gcs-fuse-mounter-enabled: "true"
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/gpu: "1"
        run.googleapis.com/gpu-zonal-redundancy-disabled: 'true'        
    spec:
      serviceAccountName: ${SERVICE_ACCOUNT_EMAIL}
      nodeSelector:
        run.googleapis.com/accelerator: nvidia-l4
      volumes:
      - name: gcs-bucket
        csi:
          driver: gcsfuse.run.googleapis.com
          readOnly: true
          volumeAttributes: 
            bucketName: ${BUCKET_NAME}
      containers:
      - image: ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${PULL_MSG_IMAGE_NAME}
        name: pubsub-pull-msg
        env:
        - name: PROJECT_ID
          value: ${PROJECT_ID}
        - name: SUBSCRIPTION_NAME
          value: "ollama-prompts-sub"
        - name: PYTHONUNBUFFERED
          value: "1"
        resources:
          limits:
            cpu: '1'
            memory: 1Gi
      - image: ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${OLLAMA_IMAGE_NAME}
        name: ollama-coordinator
        env:
        - name: OLLAMA_MODELS
          value: /mnt/models
        volumeMounts:
        - name: gcs-bucket
          mountPath: /mnt/models
        resources:
          limits:
            cpu: '6'
            nvidia.com/gpu: '1'
            memory: 16Gi

Next, use sed to replace the variables in the template file, including the ones that make up the full container image URLs, and write the final worker-pool.yaml:

sed -e "s|\${SERVICE_ACCOUNT_EMAIL}|${SERVICE_ACCOUNT_EMAIL}|g" \
     -e "s|\${BUCKET_NAME}|${BUCKET_NAME}|g" \
     -e "s|\${PULL_MSG_IMAGE_NAME}|${PULL_MSG_IMAGE_NAME}|g" \
     -e "s|\${OLLAMA_IMAGE_NAME}|${OLLAMA_IMAGE_NAME}|g" \
     -e "s|\${PROJECT_ID}|${PROJECT_ID}|g" \
     -e "s|\${REGION}|${REGION}|g" \
     -e "s|\${AR_REPO_NAME}|${AR_REPO_NAME}|g" \
     worker-pool.template.yaml > worker-pool.yaml
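
Because the template uses `${NAME}` placeholders, the same substitution can be sketched with Python's `string.Template`, which understands that syntax. This is an alternative illustration rather than the codelab's method, and `render_template` and `unresolved` are hypothetical helpers. One practical difference is that leftover placeholders can be detected, whereas sed silently substitutes an empty string when a shell variable is unset.

```python
import re
from string import Template


def render_template(text: str, values: dict) -> str:
    """Replace ${NAME} placeholders, leaving unknown names untouched."""
    return Template(text).safe_substitute(values)


def unresolved(text: str) -> list[str]:
    """List placeholder names that are still present after rendering."""
    return re.findall(r"\$\{(\w+)\}", text)


if __name__ == "__main__":
    import os

    with open("worker-pool.template.yaml") as src:
        rendered = render_template(src.read(), dict(os.environ))
    missing = unresolved(rendered)
    if missing:
        raise SystemExit(f"Unset variables: {missing}")
    with open("worker-pool.yaml", "w") as dst:
        dst.write(rendered)
```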

You are now ready to deploy:

gcloud beta run worker-pools replace worker-pool.yaml

Then test the pipeline by publishing a message:

gcloud pubsub topics publish ${TOPIC_NAME} --message="What is 1 + 1?"
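
The gcloud command takes care of encoding the message for you. If you publish through the Pub/Sub REST API instead, the message data travels as a base64-encoded string inside a JSON body, and the worker's `message.data.decode("utf-8")` call recovers the original prompt. The sketch below only builds and decodes such a body. `build_publish_body` and `decode_first_message` are hypothetical helpers, and no request is sent.

```python
import base64
import json


def build_publish_body(prompt: str) -> str:
    """Build a JSON publish body with the prompt as base64-encoded data."""
    data = base64.b64encode(prompt.encode("utf-8")).decode("ascii")
    return json.dumps({"messages": [{"data": data}]})


def decode_first_message(body: str) -> str:
    """Recover the prompt text from the first message in a publish body."""
    encoded = json.loads(body)["messages"][0]["data"]
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    body = build_publish_body("What is 1 + 1?")
    print(body)
    print(decode_first_message(body))
```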

View the logs. Wait about a minute, or go to the worker pool page in the Cloud Console to follow the logs in real time.

gcloud alpha run worker-pools logs read "codelab-ollama-wp" --limit 10

You should see a message similar to the following:

Ollama response: {"model": "gemma3:4b", "created_at": "2025-11-06T23:48:39.572079369Z", "response": "1 + 1 = 2\n", ...

7. Congratulations

You have completed this codelab.

We recommend reviewing the Cloud Run documentation.

What you learned

How to use a Cloud Run worker pool with a Pub/Sub pull subscription
How to run inference with Ollama on a Cloud Run worker pool

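8. Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

In the Google Cloud console, go to the [Manage resources] page.
In the project list, select the project that you want to delete, and then click [Delete].
In the dialog, type the project ID, and then click [Shut down] to delete the project.

Delete individual resources

To delete the individual resources, run the following commands.

Delete the Cloud Run worker pool: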

gcloud beta run worker-pools delete codelab-ollama-wp --region ${REGION}

Delete the GCS bucket:

gsutil -m rm -r gs://${BUCKET_NAME}

Delete the Pub/Sub subscription and topic:

gcloud pubsub subscriptions delete ${SUBSCRIPTION_NAME}
gcloud pubsub topics delete ${TOPIC_NAME}

Delete the Artifact Registry repository:

gcloud artifacts repositories delete ${AR_REPO_NAME} --location=${REGION} --quiet

Delete the service account:

gcloud iam service-accounts delete ${SERVICE_ACCOUNT_EMAIL} --quiet

Clean up local files

Follow these steps to clean up local files.

Stop the local Ollama service: if you started Ollama with ollama serve &, find its process ID (PID) and stop it with the kill command.
```
# Find the process ID of the Ollama server
pgrep ollama

# Replace <PID> with the actual process ID obtained from the previous command
kill <PID>
```
Delete the downloaded models. This removes every model stored by Ollama, not only gemma3:4b:

rm -rf ~/.ollama/models

Uninstall Ollama.

Follow the instructions on the Ollama website to uninstall Ollama from your local machine.