How to host Ollama as a worker pool for inference

1. Introduction

Overview

In this codelab, you'll learn how to build an asynchronous, event-driven AI processing pipeline. You'll deploy an open-source model with Ollama on a Cloud Run worker pool. The worker pool pulls messages from a Pub/Sub subscription attached to a topic and processes each one with the gemma3:4b model.

What you'll learn

How to use a worker pool with a Pub/Sub pull subscription
How to use Ollama to run inference in a worker pool

2. Before you begin

Enable APIs

Before you start this codelab, enable the following APIs by running this command:

gcloud services enable run.googleapis.com \
    cloudbuild.googleapis.com \
    artifactregistry.googleapis.com \
    pubsub.googleapis.com \
    storage.googleapis.com

3. Setup and requirements

To set up the required resources, follow these steps.

Set the environment variables for this codelab. Replace the placeholders with your project ID and a region where Cloud Run GPUs are available.

export PROJECT_ID=<YOUR_PROJECT_ID>
export REGION=<YOUR_REGION>

export BUCKET_NAME=$PROJECT_ID-gemma3-4b
export SERVICE_ACCOUNT_NAME=ollama-worker-sa
export SERVICE_ACCOUNT_EMAIL=${SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com
export TOPIC_NAME=ollama-prompts
export SUBSCRIPTION_NAME=ollama-prompts-sub
export AR_REPO_NAME=ollama-worker-repo
export PULL_MSG_IMAGE_NAME=pubsub-pull-msg
export OLLAMA_IMAGE_NAME=ollama-coordinator
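Apart from PROJECT_ID and REGION, every value above is either fixed by this codelab or derived from those two. As a quick sanity check, the following Python sketch rebuilds the derived values (bucket name, service account email, and the full Artifact Registry image URIs used later by gcloud builds submit and the worker pool YAML). The function name and the example project ID are hypothetical; the naming patterns are the ones used by the commands in this codelab.

```python
def derived_names(project_id: str, region: str) -> dict:
    """Rebuild the derived values used throughout this codelab (hypothetical helper)."""
    ar_repo = "ollama-worker-repo"
    # Artifact Registry Docker repositories are addressed as <region>-docker.pkg.dev/<project>/<repo>.
    registry = f"{region}-docker.pkg.dev/{project_id}/{ar_repo}"
    return {
        "BUCKET_NAME": f"{project_id}-gemma3-4b",
        "SERVICE_ACCOUNT_EMAIL": f"ollama-worker-sa@{project_id}.iam.gserviceaccount.com",
        # Full image URIs, as passed to `gcloud builds submit --tag`.
        "PULL_MSG_IMAGE": f"{registry}/pubsub-pull-msg",
        "OLLAMA_IMAGE": f"{registry}/ollama-coordinator",
    }


if __name__ == "__main__":
    for key, value in derived_names("my-project", "europe-west1").items():
        print(f"{key}={value}")
```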

Create a service account for the worker pool

gcloud iam service-accounts create ${SERVICE_ACCOUNT_NAME} \
  --display-name="Ollama Worker Service Account"

Grant the service account access to Pub/Sub

gcloud projects add-iam-policy-binding ${PROJECT_ID} \
  --member="serviceAccount:${SERVICE_ACCOUNT_EMAIL}" \
  --role="roles/pubsub.subscriber"

Create an Artifact Registry repository for the worker pool images

gcloud artifacts repositories create ${AR_REPO_NAME} \
  --repository-format=docker \
  --location=${REGION}

Create the Pub/Sub topic and subscription

gcloud pubsub topics create $TOPIC_NAME
gcloud pubsub subscriptions create $SUBSCRIPTION_NAME --topic $TOPIC_NAME
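The subscriber in main.py does not use the bare subscription name directly; it calls subscriber.subscription_path(), which expands the name into a fully qualified resource name. The sketch below (the helper names are hypothetical) shows the standard projects/.../topics/... and projects/.../subscriptions/... formats, which is also the form you'll see in the subscriber's "Subscribed to ..." log line.

```python
def topic_path(project_id: str, topic: str) -> str:
    """Fully qualified Pub/Sub topic name (hypothetical helper)."""
    return f"projects/{project_id}/topics/{topic}"


def subscription_path(project_id: str, subscription: str) -> str:
    """Fully qualified Pub/Sub subscription name, as printed by main.py on startup."""
    return f"projects/{project_id}/subscriptions/{subscription}"


if __name__ == "__main__":
    print(topic_path("my-project", "ollama-prompts"))
    print(subscription_path("my-project", "ollama-prompts-sub"))
```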

4. Download the model and host it in GCS

Pulling the model inside the container during the build can be slow and inefficient. Instead, you'll pull the model to your local machine with the Ollama CLI and upload the model files to a GCS bucket. The worker pool then mounts this bucket to access the model.

Install Ollama on your machine

Run the following command to install Ollama on Linux. For other operating systems, see the Ollama website.

curl -fsSL https://ollama.com/install.sh | sh

Start the Ollama service and pull the model

First, start the Ollama service in the background, then pull the model:

ollama serve &
ollama pull gemma3:4b

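Create a GCS bucket

Create a GCS bucket using the BUCKET_NAME environment variable you set earlier. Creating it in the same region as the worker pool avoids cross-region reads when the model is loaded.

gsutil mb -l ${REGION} gs://${BUCKET_NAME}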
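Because BUCKET_NAME is derived from your project ID, it normally satisfies the Cloud Storage naming rules, but bucket creation fails if it doesn't. The sketch below is a simplified check of the most common rules for names without dots: 3 to 63 characters; lowercase letters, digits, hyphens, and underscores; starting and ending with a letter or digit; no "goog" prefix and no "google". Names containing dots have additional rules that this sketch does not cover, and the function name is hypothetical.

```python
import re

# Simplified pattern for bucket names without dots: 3-63 characters of
# lowercase letters, digits, '-' and '_', starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes a simplified subset of the GCS bucket naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    # Bucket names may not start with "goog" or contain "google".
    return not name.startswith("goog") and "google" not in name


if __name__ == "__main__":
    print(looks_like_valid_bucket_name("my-project-gemma3-4b"))
    print(looks_like_valid_bucket_name("My_Project"))
```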

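Upload the model files to the GCS bucket

Ollama stores model files in the ~/.ollama/models directory. Upload the contents of this directory to the GCS bucket; this copies every model you have downloaded, not just gemma3:4b. If Ollama runs as the system service that the Linux installer sets up, the models are stored under /usr/share/ollama/.ollama/models instead, so adjust the source path accordingly.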

gsutil -m cp -r ~/.ollama/models/* gs://${BUCKET_NAME}/
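Copying the whole models directory matters because a pulled model is split across two subdirectories: a small JSON manifest under manifests/ and the large content-addressed layers under blobs/. The sketch below assumes Ollama's current on-disk layout (a manifest at manifests/registry.ollama.ai/library/<name>/<tag>, with each digest sha256:<hex> stored as blobs/sha256-<hex>). It lists the blob files a model references and reports any that are missing, which you could use to verify a local copy before uploading. This layout is an Ollama implementation detail that may change, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import List, Tuple


def manifest_path(models_dir: Path, model: str) -> Path:
    """Map a tag like 'gemma3:4b' to its manifest file (assumed Ollama layout)."""
    name, _, tag = model.partition(":")
    return models_dir / "manifests" / "registry.ollama.ai" / "library" / name / (tag or "latest")


def required_blobs(models_dir: Path, model: str) -> Tuple[List[Path], List[Path]]:
    """Return (present, missing) blob files referenced by the model's manifest."""
    manifest = json.loads(manifest_path(models_dir, model).read_text())
    # The manifest references one config blob plus one blob per layer.
    digests = [manifest["config"]["digest"]] + [layer["digest"] for layer in manifest["layers"]]
    present: List[Path] = []
    missing: List[Path] = []
    for digest in digests:
        # On disk, the ':' in 'sha256:<hex>' is replaced with '-'.
        blob = models_dir / "blobs" / digest.replace(":", "-")
        (present if blob.is_file() else missing).append(blob)
    return present, missing
```

For example, after a successful ollama pull, required_blobs(Path.home() / ".ollama" / "models", "gemma3:4b") should report no missing blobs.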

Grant the service account access to the Cloud Storage bucket

gcloud storage buckets add-iam-policy-binding gs://${BUCKET_NAME} \
     --member=serviceAccount:${SERVICE_ACCOUNT_EMAIL} \
     --role=roles/storage.objectViewer

5. Build the worker pool containers

The worker pool uses two containers:

ollama-coordinator - hosts Ollama and serves the Gemma 3 4B model
pubsub-pull-msg - pulls messages from the Pub/Sub subscription and forwards them to the ollama-coordinator container
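The division of labor between the two containers can be illustrated with a small in-memory simulation. This is a hypothetical sketch, not Pub/Sub client code: a queue.Queue stands in for the subscription, infer stands in for the HTTP call to Ollama, a successful call plays the role of ack(), and an exception plays the role of nack() by putting the message back for redelivery.

```python
import queue
from typing import Callable, List, Tuple


def run_pull_loop(prompts: List[str],
                  infer: Callable[[str], str],
                  max_attempts: int = 3) -> List[str]:
    """Simulate the sidecar loop: pull, call the model, ack on success, nack on failure."""
    subscription: "queue.Queue[Tuple[str, int]]" = queue.Queue()
    for prompt in prompts:
        subscription.put((prompt, 1))  # (message data, delivery attempt)

    results: List[str] = []
    while not subscription.empty():
        prompt, attempt = subscription.get()
        try:
            results.append(infer(prompt))  # Success: the message would be acked.
        except Exception:
            if attempt < max_attempts:
                # Failure: a nacked message is redelivered later.
                subscription.put((prompt, attempt + 1))
            # This sketch gives up after max_attempts. Real Pub/Sub keeps
            # redelivering, subject to retention and dead-letter settings.
    return results


if __name__ == "__main__":
    seen = set()

    def flaky_model(prompt: str) -> str:
        # Fails the first time it sees each prompt, like an Ollama server still starting up.
        if prompt not in seen:
            seen.add(prompt)
            raise ConnectionError("Ollama not ready")
        return f"answer to {prompt!r}"

    print(run_pull_loop(["What is 1 + 1?"], flaky_model))
```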

First, you'll create the ollama-coordinator container.

Create a root directory for the codelab

mkdir codelab-ollama-wp
cd codelab-ollama-wp

Create a directory for the ollama-coordinator container

mkdir ollama-coordinator
cd ollama-coordinator

Create a Dockerfile with the following contents

# Use the official Ollama image as a base image
FROM ollama/ollama

# Expose the port that Ollama listens on
EXPOSE 11434

# Set the entrypoint to start the Ollama server
ENTRYPOINT ["ollama", "serve"]

Build the Ollama container

gcloud builds submit --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${OLLAMA_IMAGE_NAME} --timeout=20m

Next, you'll create the pubsub-pull-msg container.

Create a directory for the pubsub-pull-msg container

cd ..
mkdir pubsub-pull-msg
cd pubsub-pull-msg

Create a Dockerfile with the following contents

# Use the official Python image as a base image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy the requirements file into the container
COPY requirements.txt .

# Install the required Python packages
RUN pip install --no-cache-dir -r requirements.txt

# Copy the Python script into the container
COPY main.py .

# Set the entrypoint to run the Python script
CMD ["python", "main.py"]

Create a requirements.txt file with the following contents

google-cloud-pubsub
requests

Create a main.py file with the following contents

import os
import sys
import requests
import json
from google.cloud import pubsub_v1

# --- Main Application Logic ---
print("--- Sidecar container script started ---")

# --- Environment and Configuration ---
project_id = os.environ.get("PROJECT_ID")
subscription_name = os.environ.get("SUBSCRIPTION_NAME")
ollama_api_url = "http://localhost:11434/api/generate"

if not project_id or not subscription_name:
    print("FATAL: PROJECT_ID and SUBSCRIPTION_NAME must be set.")
    sys.exit(1)

print(f"PROJECT_ID: {project_id}")
print(f"SUBSCRIPTION_NAME: {subscription_name}")

def callback(message):
    """Processes a single Pub/Sub message."""
    print(f"Received message ID: {message.message_id}")
    try:
        prompt = message.data.decode("utf-8")
        print(f"Decoded prompt: '{prompt}'")
        
        data = {"model": "gemma3:4b", "prompt": prompt, "stream": False}
        
        print("Sending request to Ollama...")
        response = requests.post(ollama_api_url, json=data, timeout=300)
        response.raise_for_status()
        
        print("Successfully received response from Ollama.")
        ollama_response = response.json()
        print(f"Ollama response: {json.dumps(ollama_response)[:200]}...")

        message.ack()
        print(f"Message {message.message_id} acknowledged.")

    except requests.exceptions.RequestException as e:
        print(f"Error calling Ollama API: {e}")
        message.nack()
        print(f"Message {message.message_id} not acknowledged.")
    except Exception as e:
        print(f"An unexpected error occurred in callback: {e}")
        message.nack()
        print(f"Message {message.message_id} not acknowledged.")

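def main():
    """Starts the Pub/Sub subscriber."""
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_name)

    # Lease one message at a time so the single GPU instance is not flooded
    # with concurrent Ollama requests (the client's default allows many more).
    flow_control = pubsub_v1.types.FlowControl(max_messages=1)

    streaming_pull_future = subscriber.subscribe(
        subscription_path, callback=callback, flow_control=flow_control
    )
    print(f"Subscribed to {subscription_path}. Listening for messages...")

    # Using the client as a context manager closes it when main() exits.
    with subscriber:
        try:
            # .result() blocks indefinitely while messages are processed.
            streaming_pull_future.result()
        except Exception as e:
            print(f"A fatal error occurred in the subscriber: {e}")
            streaming_pull_future.cancel()
            # Exit with a non-zero status so the failure is visible to Cloud Run.
            sys.exit(1)

if __name__ == "__main__":
    main()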
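Both containers start at the same time, so the subscriber may receive a message before Ollama is listening. In that case the request fails, the callback calls nack(), and Pub/Sub redelivers the message later, so nothing is lost, but the first deliveries are wasted. If you prefer to wait for Ollama before subscribing, a readiness check like the sketch below could be called at the start of main(). It assumes that Ollama answers a plain GET on its root URL with HTTP 200 once it is serving; the function name and the timeout values are illustrative, not part of the codelab.

```python
import time
import urllib.error
import urllib.request


def wait_for_ollama(base_url: str = "http://localhost:11434",
                    timeout_s: float = 120.0,
                    interval_s: float = 2.0) -> bool:
    """Poll the Ollama server until it answers HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/", timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # Server not up yet (connection refused, reset, or HTTP error).
        time.sleep(interval_s)
    return False
```

For example, main() could begin with a call such as `if not wait_for_ollama(): sys.exit(1)` so that the subscriber only starts pulling once the model server is reachable.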

Now build the pubsub-pull-msg container

gcloud builds submit --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${PULL_MSG_IMAGE_NAME}

6. Deploy and run

In this step, you'll create the Cloud Run worker pool by deploying a YAML file.

Move to the codelab root directory to create the YAML file

cd ..

Create a worker-pool.template.yaml file with the following contents

apiVersion: run.googleapis.com/v1
kind: WorkerPool
metadata:
  name: codelab-ollama-wp
  labels:
    cloud.googleapis.com/location: ${REGION}
  annotations:
    run.googleapis.com/launch-stage: BETA
    run.googleapis.com/scalingMode: manual
    run.googleapis.com/manualInstanceCount: '1'
    run.googleapis.com/gcs-fuse-mounter-enabled: "true"
spec:
  template:
    metadata:
      annotations:
        run.googleapis.com/gpu: "1"
        run.googleapis.com/gpu-zonal-redundancy-disabled: 'true'        
    spec:
      serviceAccountName: ${SERVICE_ACCOUNT_EMAIL}
      nodeSelector:
        run.googleapis.com/accelerator: nvidia-l4
      volumes:
      - name: gcs-bucket
        csi:
          driver: gcsfuse.run.googleapis.com
          readOnly: true
          volumeAttributes: 
            bucketName: ${BUCKET_NAME}
      containers:
      - image: ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${PULL_MSG_IMAGE_NAME}
        name: pubsub-pull-msg
        env:
        - name: PROJECT_ID
          value: ${PROJECT_ID}
        - name: SUBSCRIPTION_NAME
          value: "ollama-prompts-sub"
        - name: PYTHONUNBUFFERED
          value: "1"
        resources:
          limits:
            cpu: '1'
            memory: 1Gi
      - image: ${REGION}-docker.pkg.dev/${PROJECT_ID}/${AR_REPO_NAME}/${OLLAMA_IMAGE_NAME}
        name: ollama-coordinator
        env:
        - name: OLLAMA_MODELS
          value: /mnt/models
        volumeMounts:
        - name: gcs-bucket
          mountPath: /mnt/models
        resources:
          limits:
            cpu: '6'
            nvidia.com/gpu: '1'
            memory: 16Gi

Then use sed to substitute the variables in the template file. This produces the final worker-pool.yaml, with the full image URLs filled in:

sed -e "s|\${SERVICE_ACCOUNT_EMAIL}|${SERVICE_ACCOUNT_EMAIL}|g" \
     -e "s|\${BUCKET_NAME}|${BUCKET_NAME}|g" \
     -e "s|\${PULL_MSG_IMAGE_NAME}|${PULL_MSG_IMAGE_NAME}|g" \
     -e "s|\${OLLAMA_IMAGE_NAME}|${OLLAMA_IMAGE_NAME}|g" \
     -e "s|\${PROJECT_ID}|${PROJECT_ID}|g" \
     -e "s|\${REGION}|${REGION}|g" \
     -e "s|\${AR_REPO_NAME}|${AR_REPO_NAME}|g" \
     worker-pool.template.yaml > worker-pool.yaml
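If you'd rather not deal with escaping `$` in sed expressions, Python's string.Template understands the same ${NAME} placeholders. The sketch below renders worker-pool.template.yaml from the environment variables you exported earlier. Unlike the sed command, which inserts an empty string for an unset variable, safe_substitute leaves an unresolved placeholder untouched, and the script reports any that remain. Note that string.Template also treats a bare $NAME as a placeholder and $$ as an escaped $; the template in this codelab contains neither. The helper name is hypothetical.

```python
import os
import re
import string
from typing import List, Mapping, Tuple

# The same variables that the sed command substitutes.
VARIABLES = ["SERVICE_ACCOUNT_EMAIL", "BUCKET_NAME", "PULL_MSG_IMAGE_NAME",
             "OLLAMA_IMAGE_NAME", "PROJECT_ID", "REGION", "AR_REPO_NAME"]


def render(template_text: str, values: Mapping[str, str]) -> Tuple[str, List[str]]:
    """Replace ${NAME} placeholders; return the result and any names left unresolved."""
    rendered = string.Template(template_text).safe_substitute(values)
    leftover = sorted(set(re.findall(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", rendered)))
    return rendered, leftover


if __name__ == "__main__":
    if os.path.exists("worker-pool.template.yaml"):
        values = {name: os.environ[name] for name in VARIABLES if name in os.environ}
        with open("worker-pool.template.yaml") as f:
            rendered, leftover = render(f.read(), values)
        with open("worker-pool.yaml", "w") as f:
            f.write(rendered)
        if leftover:
            print("Unresolved placeholders:", ", ".join(leftover))
    else:
        print("worker-pool.template.yaml not found; run this from the codelab root directory.")
```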

You can now deploy the worker pool:

gcloud beta run worker-pools replace worker-pool.yaml

Then test it:

gcloud pubsub topics publish ${TOPIC_NAME} --message="What is 1 + 1?"
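gcloud pubsub topics publish sends the --message text as the message data, and the subscriber decodes it with message.data.decode("utf-8"). If you want to publish from your own code without the client library, the Pub/Sub REST method projects.topics.publish expects that data base64-encoded inside a messages array. The sketch below only builds the request URL and JSON body; actually sending it requires an OAuth access token (for example from gcloud auth print-access-token), which this sketch does not handle. The helper name is hypothetical.

```python
import base64
import json
from typing import List, Tuple


def build_publish_request(project_id: str, topic: str, prompts: List[str]) -> Tuple[str, str]:
    """Return (url, json_body) for a Pub/Sub REST publish call carrying the given prompts."""
    url = f"https://pubsub.googleapis.com/v1/projects/{project_id}/topics/{topic}:publish"
    messages = [
        # The REST API requires base64 data; subscribers receive the original UTF-8 bytes.
        {"data": base64.b64encode(p.encode("utf-8")).decode("ascii")}
        for p in prompts
    ]
    return url, json.dumps({"messages": messages})


if __name__ == "__main__":
    url, body = build_publish_request("my-project", "ollama-prompts", ["What is 1 + 1?"])
    print(url)
    print(body)
```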

Then check the logs. You may need to wait a moment; alternatively, open the worker pool page in the Cloud console to view the logs in real time.

gcloud alpha run worker-pools logs read "codelab-ollama-wp" --limit 10

You should see a message like this:

Ollama response: {"model": "gemma3:4b", "created_at": "2025-11-06T23:48:39.572079369Z", "response": "1 + 1 = 2\n", ...
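Besides the generated text in the response field, a non-streaming /api/generate reply from Ollama includes timing counters that can help when sizing the worker pool. The sketch below assumes the eval_count (tokens generated) and eval_duration (nanoseconds spent generating) fields described in Ollama's API documentation and turns them into a tokens-per-second figure. The function name is hypothetical, and the sample numbers in the usage example are made up for illustration, not measured.

```python
from typing import Any, Dict


def summarize(ollama_response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the answer text and generation speed from an Ollama /api/generate reply."""
    eval_count = ollama_response.get("eval_count")
    eval_ns = ollama_response.get("eval_duration")  # Reported in nanoseconds.
    tokens_per_s = None
    if eval_count and eval_ns:
        tokens_per_s = eval_count / (eval_ns / 1e9)
    return {"response": ollama_response.get("response"), "tokens_per_s": tokens_per_s}


if __name__ == "__main__":
    # Illustrative values only.
    print(summarize({"response": "1 + 1 = 2\n", "eval_count": 10, "eval_duration": 500_000_000}))
```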

7. Congratulations

Congratulations on completing this codelab!

We recommend reading the Cloud Run documentation.

What we've covered

How to use Cloud Run worker pools with a Pub/Sub pull subscription
How to use Ollama to run inference in a Cloud Run worker pool

8. Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete the project

The easiest way to eliminate billing is to delete the project that you created for the tutorial.

To delete the project:

In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.

Delete individual resources

To delete the individual resources, run the following commands.

Delete the Cloud Run worker pool:

gcloud beta run worker-pools delete codelab-ollama-wp --region ${REGION}

Delete the GCS bucket:

gsutil -m rm -r gs://${BUCKET_NAME}

Delete the Pub/Sub subscription and topic:

gcloud pubsub subscriptions delete ${SUBSCRIPTION_NAME}
gcloud pubsub topics delete ${TOPIC_NAME}

Delete the Artifact Registry repository:

gcloud artifacts repositories delete ${AR_REPO_NAME} --location=${REGION} --quiet

Delete the service account:

gcloud iam service-accounts delete ${SERVICE_ACCOUNT_EMAIL} --quiet

Clean up local files

To clean up local files, do the following:

Stop the local Ollama service: if you started Ollama with ollama serve &, you can stop it by finding its process ID (PID) and then using the kill command.
```
# Find the process ID of the Ollama server
pgrep ollama

# Replace <PID> with the actual process ID obtained from the previous command
kill <PID>
```
Delete the downloaded models:

rm -rf ~/.ollama/models

Uninstall Ollama

Follow the instructions on the Ollama website to uninstall Ollama from your machine.