이 페이지는 Cloud Translation API를 통해 번역되었습니다.

Gemini (Python)를 사용하여 클라우드에서 멀티모달 어시스턴트 빌드 및 배포

1. 소개

이 Codelab에서는 채팅 웹 인터페이스 형식의 애플리케이션을 빌드합니다. 이 애플리케이션을 통해 통신하고, 문서나 이미지를 업로드하고, 이를 토론할 수 있습니다. 애플리케이션 자체는 프런트엔드와 백엔드라는 두 가지 서비스로 분리되어 있으므로 빠른 프로토타입을 빌드하고 사용해 볼 수 있으며 API 계약이 두 가지를 모두 통합하는 방식을 이해할 수 있습니다.

이 Codelab에서는 다음과 같이 단계별로 접근합니다.

Google Cloud 프로젝트를 준비하고 필요한 모든 API를 사용 설정합니다.
Gradio 라이브러리를 사용하여 프런트엔드 서비스 - 채팅 인터페이스 빌드
수신 데이터를 Gemini SDK 표준으로 재형식화하고 Gemini API와의 통신을 사용 설정하는 FastAPI를 사용하여 백엔드 서비스(HTTP 서버) 빌드
Cloud Run에 애플리케이션을 배포하는 데 필요한 환경 변수를 관리하고 필요한 파일을 설정합니다.
Cloud Run에 애플리케이션 배포

아키텍처 개요

기본 요건

Gemini API 및 Google Gen AI SDK를 능숙하게 사용합니다.
HTTP 서비스를 사용하는 기본 전체 스택 아키텍처 이해

학습할 내용

Gemini SDK를 사용하여 텍스트 및 기타 데이터 유형 (멀티모달)을 제출하고 텍스트 응답을 생성하는 방법
대화 컨텍스트를 유지하기 위해 Gemini SDK에 채팅 기록을 구성하는 방법
Gradio를 사용한 프런트엔드 웹 프로토타이핑
FastAPI 및 Pydantic을 사용한 백엔드 서비스 개발
Pydantic-settings를 사용하여 YAML 파일에서 환경 변수 관리
Dockerfile을 사용하여 Cloud Run에 애플리케이션을 배포하고 YAML 파일로 환경 변수를 제공합니다.

필요한 항목

Chrome 웹브라우저
Gmail 계정
결제가 사용 설정된 Cloud 프로젝트

이 Codelab은 초보자를 포함한 모든 수준의 개발자를 위해 설계되었으며 샘플 애플리케이션에서 Python을 사용합니다. 하지만 Python 지식이 없어도 제시된 개념을 이해하는 데는 문제가 없습니다.

2. 시작하기 전에

Cloud Shell 편집기에서 Cloud 프로젝트 설정하기

이 Codelab에서는 결제가 사용 설정된 Google Cloud 프로젝트가 이미 있다고 가정합니다. 아직 계정이 없다면 아래 안내에 따라 시작할 수 있습니다.

2Google Cloud 콘솔의 프로젝트 선택기 페이지에서 Google Cloud 프로젝트를 선택하거나 만듭니다.
Cloud 프로젝트에 결제가 사용 설정되어 있어야 하므로 프로젝트에 결제가 사용 설정되어 있는지 확인하는 방법을 알아보세요 .
bq가 미리 로드되어 제공되는 Google Cloud에서 실행되는 명령줄 환경인 Cloud Shell을 사용합니다. Google Cloud 콘솔 상단에서 Cloud Shell 활성화를 클릭합니다.

Cloud Shell에 연결되면 다음 명령어를 사용하여 이미 인증되었는지, 프로젝트가 프로젝트 ID로 설정되어 있는지 확인합니다.

gcloud auth list

Cloud Shell에서 다음 명령어를 실행하여 gcloud 명령어가 프로젝트를 알고 있는지 확인합니다.

gcloud config list project

프로젝트가 설정되지 않은 경우 다음 명령어를 사용하여 설정합니다.

gcloud config set project <YOUR_PROJECT_ID>

또는 콘솔에서 PROJECT_ID ID를 확인할 수도 있습니다.

이 아이콘을 클릭하면 오른쪽에 모든 프로젝트와 프로젝트 ID가 표시됩니다.

아래 명령어를 통해 필수 API를 사용 설정합니다. 이 작업은 몇 분 정도 걸릴 수 있으니 기다려 주시기 바랍니다.

gcloud services enable aiplatform.googleapis.com \
                           run.googleapis.com \
                           cloudbuild.googleapis.com \
                           cloudresourcemanager.googleapis.com

명령어 실행이 성공하면 아래와 유사한 메시지가 표시됩니다.

Operation "operations/..." finished successfully.

gcloud 명령어 대신 각 제품을 검색하거나 이 링크를 사용하여 콘솔을 통해 수행할 수도 있습니다.

누락된 API가 있으면 구현 과정에서 언제든지 사용 설정할 수 있습니다.

gcloud 명령어 및 사용법은 문서를 참조하세요.

애플리케이션 작업 디렉터리 설정

'편집기 열기' 버튼을 클릭하면 Cloud Shell 편집기가 열리고 여기에 코드를 작성할 수 있습니다.
Cloud Code 프로젝트가 아래 이미지에서 강조 표시된 것처럼 Cloud Shell 편집기의 왼쪽 하단 (상태 표시줄)에 설정되어 있고 결제가 사용 설정된 활성 Google Cloud 프로젝트로 설정되어 있는지 확인합니다. 메시지가 표시되면 승인을 클릭합니다. Cloud Shell 편집기를 초기화한 후 Cloud Code - Sign In 버튼이 표시될 때까지 잠시 기다려 주세요. 이미 이전 명령어를 따랐다면 버튼이 로그인 버튼 대신 활성화된 프로젝트로 직접 연결될 수도 있습니다.

상태 표시줄에서 활성 프로젝트를 클릭하고 Cloud Code 팝업이 열릴 때까지 기다립니다. 팝업에서 '새 애플리케이션'을 선택합니다.

애플리케이션 목록에서 Gemini 생성형 AI를 선택한 다음 Gemini API Python을 선택합니다.

원하는 이름으로 새 애플리케이션을 저장합니다. 이 예에서는 gemini-multimodal-chat-assistant를 사용합니다. 그런 다음 확인을 클릭합니다.

이 시점에서는 이미 새 애플리케이션 작업 디렉터리에 있으며 다음 파일이 표시됩니다.

다음으로 Python 환경을 준비합니다.

환경 설정

Python 가상 환경 준비

다음 단계는 개발 환경을 준비하는 것입니다. 이 Codelab에서는 Python 3.12를 활용하고 uv python project manager를 사용하여 Python 버전과 가상 환경을 만들고 관리하는 작업을 간소화합니다.

아직 터미널을 열지 않았다면 터미널 -> 새 터미널을 클릭하거나 Ctrl + Shift + C를 사용하여 터미널을 엽니다.

uv를 다운로드하고 다음 명령어로 python 3.12를 설치합니다.

curl -LsSf https://astral.sh/uv/0.6.6/install.sh | sh && \
source $HOME/.local/bin/env && \
uv python install 3.12

이제 uv를 사용하여 Python 프로젝트를 초기화해 보겠습니다.

uv init

디렉터리에 main.py, .python-version, pyproject.toml이 생성됩니다. 이러한 파일은 디렉터리에서 프로젝트를 유지하는 데 필요합니다. Python 종속 항목 및 구성은 pyproject.toml에서 지정할 수 있으며 .python-version은 이 프로젝트에 사용된 Python 버전을 표준화했습니다. 자세한 내용은 이 문서를 참고하세요.

main.py
.python-version
pyproject.toml

테스트하려면 다음 코드로 main.py를 덮어쓰세요.

def main():
   print("Hello from gemini-multimodal-chat-assistant!")

if __name__ == "__main__":
   main()

그런 다음 다음 명령어를 실행합니다.

uv run main.py

아래와 같은 출력이 표시됩니다.

Using CPython 3.12
Creating virtual environment at: .venv
Hello from gemini-multimodal-chat-assistant!

이는 python 프로젝트가 올바르게 설정되고 있음을 나타냅니다. uv가 이미 가상 환경을 처리하므로 가상 환경을 수동으로 만들 필요가 없습니다. 따라서 이제부터 표준 Python 명령어 (예: python main.py)가 uv run (예: uv run main.py)으로 대체됩니다.

필수 종속 항목 설치

uv 명령어를 사용하여 이 Codelab 패키지 종속 항목도 추가합니다. 다음 명령어를 실행합니다.

uv add google-genai==1.5.0 \
       gradio==5.20.1 \
       pydantic==2.10.6 \
       pydantic-settings==2.8.1 \
       pyyaml==6.0.2

이전 명령어를 반영하도록 pyproject.toml 'dependencies' 섹션이 업데이트됩니다.

설정 구성 파일

이제 이 프로젝트의 구성 파일을 설정해야 합니다. 구성 파일은 재배포 시 쉽게 변경할 수 있는 동적 변수를 저장하는 데 사용됩니다. 이 프로젝트에서는 나중에 Cloud Run 배포와 쉽게 통합할 수 있도록 pydantic-settings 패키지와 함께 YAML 기반 구성 파일을 사용합니다. pydantic-settings는 구성 파일의 유형 검사를 적용할 수 있는 Python 패키지입니다.

다음 구성으로 settings.yaml이라는 파일을 만듭니다. File->New Text File을 클릭하고 다음 코드로 채웁니다. 그런 다음 settings.yaml로 저장합니다.

VERTEXAI_LOCATION: "us-central1"
VERTEXAI_PROJECT_ID: "{YOUR-PROJECT-ID}"
BACKEND_URL: "http://localhost:8081/chat"

Google Cloud 프로젝트를 만들 때 선택한 대로 VERTEXAI_PROJECT_ID 값을 업데이트하세요. 이 Codelab에서는 VERTEXAI_LOCATION 및 BACKEND_URL의 사전 구성된 값을 사용합니다 .

그런 다음 python 파일 settings.py를 만듭니다. 이 모듈은 구성 파일의 구성 값에 대한 프로그래매틱 항목으로 작동합니다. File->New Text File을 클릭하고 다음 코드로 채웁니다. 그런 다음 settings.py로 저장합니다. 코드에서 settings.yaml이라는 파일이 읽힐 파일임을 명시적으로 설정했습니다.

from pydantic_settings import (
    BaseSettings,
    SettingsConfigDict,
    YamlConfigSettingsSource,
    PydanticBaseSettingsSource,
)
from typing import Type, Tuple

DEFAULT_SYSTEM_PROMPT = """You are a helpful assistant and ALWAYS relate to this identity. 
You are expert at analyzing given documents or images.
"""

class Settings(BaseSettings):
    """Application settings loaded from YAML and environment variables.

    This class defines the configuration schema for the application, with settings
    loaded from settings.yaml file and overridable via environment variables.

    Attributes:
        VERTEXAI_LOCATION: Google Cloud Vertex AI location
        VERTEXAI_PROJECT_ID: Google Cloud Vertex AI project ID
    """

    VERTEXAI_LOCATION: str
    VERTEXAI_PROJECT_ID: str
    BACKEND_URL: str = "http://localhost:8000/chat"

    model_config = SettingsConfigDict(
        yaml_file="settings.yaml", yaml_file_encoding="utf-8"
    )

    @classmethod
    def settings_customise_sources(
        cls,
        settings_cls: Type[BaseSettings],
        init_settings: PydanticBaseSettingsSource,
        env_settings: PydanticBaseSettingsSource,
        dotenv_settings: PydanticBaseSettingsSource,
        file_secret_settings: PydanticBaseSettingsSource,
    ) -> Tuple[PydanticBaseSettingsSource, ...]:
        """Customize the settings sources and their priority order.

        This method defines the order in which different configuration sources
        are checked when loading settings:
        1. Constructor-provided values
        2. YAML configuration file
        3. Environment variables

        Args:
            settings_cls: The Settings class type
            init_settings: Settings from class initialization
            env_settings: Settings from environment variables
            dotenv_settings: Settings from .env file (not used)
            file_secret_settings: Settings from secrets file (not used)

        Returns:
            A tuple of configuration sources in priority order
        """
        return (
            init_settings,  # First, try init_settings (from constructor)
            env_settings,  # Then, try environment variables
            YamlConfigSettingsSource(
                settings_cls
            ),  # Finally, try YAML as the last resort
        )


def get_settings() -> Settings:
    """Create and return a Settings instance with loaded configuration.

    Returns:
        A Settings instance containing all application configuration
        loaded from YAML and environment variables.
    """
    return Settings()

이러한 구성을 통해 런타임을 유연하게 업데이트할 수 있습니다. 초기 배포에서는 첫 번째 기본 구성을 갖도록 settings.yaml 구성을 사용합니다. 그런 다음 콘솔을 통해 환경 변수를 유연하게 업데이트하고 기본 YAML 구성에 비해 환경 변수의 우선순위를 높여 다시 배포할 수 있습니다.

이제 다음 단계인 서비스 빌드로 이동할 수 있습니다.

3. Gradio를 사용하여 프런트엔드 서비스 빌드

다음과 같은 채팅 웹 인터페이스를 빌드합니다.

여기에는 사용자가 텍스트를 전송하고 파일을 업로드할 수 있는 입력란이 포함됩니다. 또한 사용자는 추가 입력란에서 Gemini API로 전송될 시스템 안내를 덮어쓸 수도 있습니다.

Gradio를 사용하여 프런트엔드 서비스를 빌드합니다. main.py의 이름을 frontend.py로 바꾸고 다음 코드를 사용하여 코드를 덮어씁니다.

import gradio as gr
import requests
import base64
from pathlib import Path
from typing import List, Dict, Any
from settings import get_settings, DEFAULT_SYSTEM_PROMPT

settings = get_settings()

IMAGE_SUFFIX_MIME_MAP = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".heic": "image/heic",
    ".heif": "image/heif",
    ".webp": "image/webp",
}
DOCUMENT_SUFFIX_MIME_MAP = {
    ".pdf": "application/pdf",
}


def get_mime_type(filepath: str) -> str:
    """Get the MIME type for a file based on its extension.

    Args:
        filepath: Path to the file.

    Returns:
        str: The MIME type of the file.

    Raises:
        ValueError: If the file type is not supported.
    """
    filepath = Path(filepath)
    suffix = filepath.suffix

    # modify ".jpg" suffix to ".jpeg" to unify the mime type
    suffix = suffix if suffix != ".jpg" else ".jpeg"

    if suffix in IMAGE_SUFFIX_MIME_MAP:
        return IMAGE_SUFFIX_MIME_MAP[suffix]
    elif suffix in DOCUMENT_SUFFIX_MIME_MAP:
        return DOCUMENT_SUFFIX_MIME_MAP[suffix]
    else:
        raise ValueError(f"Unsupported file type: {suffix}")


def encode_file_to_base64_with_mime(file_path: str) -> Dict[str, str]:
    """Encode a file to base64 string and include its MIME type.

    Args:
        file_path: Path to the file to encode.

    Returns:
        Dict[str, str]: Dictionary with 'data' and 'mime_type' keys.
    """
    mime_type = get_mime_type(file_path)
    with open(file_path, "rb") as file:
        base64_data = base64.b64encode(file.read()).decode("utf-8")

    return {"data": base64_data, "mime_type": mime_type}


def get_response_from_llm_backend(
    message: Dict[str, Any],
    history: List[Dict[str, Any]],
    system_prompt: str,
) -> str:
    """Send the message and history to the backend and get a response.

    Args:
        message: Dictionary containing the current message with 'text' and optional 'files' keys.
        history: List of previous message dictionaries in the conversation.
        system_prompt: The system prompt to be sent to the backend.

    Returns:
        str: The text response from the backend service.
    """

    # Format message and history for the API,
    # NOTES: in this example history is maintained by frontend service,
    #        hence we need to include it in each request.
    #        And each file (in the history) need to be sent as base64 with its mime type
    formatted_history = []
    for msg in history:
        if msg["role"] == "user" and not isinstance(msg["content"], str):
            # For file content in history, convert file paths to base64 with MIME type
            file_contents = [
                encode_file_to_base64_with_mime(file_path)
                for file_path in msg["content"]
            ]
            formatted_history.append({"role": msg["role"], "content": file_contents})
        else:
            formatted_history.append({"role": msg["role"], "content": msg["content"]})

    # Extract files and convert to base64 with MIME type
    files_with_mime = []
    if uploaded_files := message.get("files", []):
        for file_path in uploaded_files:
            files_with_mime.append(encode_file_to_base64_with_mime(file_path))

    # Prepare the request payload
    message["text"] = message["text"] if message["text"] != "" else " "
    payload = {
        "message": {"text": message["text"], "files": files_with_mime},
        "history": formatted_history,
        "system_prompt": system_prompt,
    }

    # Send request to backend
    try:
        response = requests.post(settings.BACKEND_URL, json=payload)
        response.raise_for_status()  # Raise exception for HTTP errors

        result = response.json()
        if error := result.get("error"):
            return f"Error: {error}"

        return result.get("response", "No response received from backend")
    except requests.exceptions.RequestException as e:
        return f"Error connecting to backend service: {str(e)}"


if __name__ == "__main__":
    demo = gr.ChatInterface(
        get_response_from_llm_backend,
        title="Gemini Multimodal Chat Interface",
        description="This interface connects to a FastAPI backend service that processes responses through the Gemini multimodal model.",
        type="messages",
        multimodal=True,
        textbox=gr.MultimodalTextbox(file_count="multiple"),
        additional_inputs=[
            gr.Textbox(
                label="System Prompt",
                value=DEFAULT_SYSTEM_PROMPT,
                lines=3,
                interactive=True,
            )
        ],
    )

    demo.launch(
        server_name="0.0.0.0",
        server_port=8080,
    )

그런 다음 다음 명령어를 사용하여 프런트엔드 서비스를 실행해 볼 수 있습니다. main.py 파일의 이름을 frontend.py로 바꾸어야 합니다.

uv run frontend.py

Cloud 콘솔에 다음과 유사한 출력이 표시됩니다.

* Running on local URL:  http://0.0.0.0:8080

To create a public link, set `share=True` in `launch()`.

그런 다음 로컬 URL 링크를 Ctrl+클릭하면 웹 인터페이스를 확인할 수 있습니다. 또는 Cloud 편집기 오른쪽 상단에 있는 웹 미리보기 버튼을 클릭하고 포트 8080에서 미리보기를 선택하여 프런트엔드 애플리케이션에 액세스할 수도 있습니다.

웹 인터페이스가 표시되지만 아직 설정되지 않은 백엔드 서비스로 인해 채팅을 제출하려고 하면 예상되는 오류가 발생합니다.

이제 서비스를 실행하고 아직 종료하지 않습니다. 그동안 중요한 코드 구성요소를 여기에서 논의할 수 있습니다.

코드 설명

웹 인터페이스에서 백엔드로 데이터를 전송하는 코드가 이 부분에 있습니다.

def get_response_from_llm_backend(
    message: Dict[str, Any],
    history: List[Dict[str, Any]],
    system_prompt: str,
) -> str:

    ... 
    # Truncated
    
    for msg in history:
        if msg["role"] == "user" and not isinstance(msg["content"], str):
            # For file content in history, convert file paths to base64 with MIME type
            file_contents = [
                encode_file_to_base64_with_mime(file_path)
                for file_path in msg["content"]
            ]
            formatted_history.append({"role": msg["role"], "content": file_contents})
        else:
            formatted_history.append({"role": msg["role"], "content": msg["content"]})

    # Extract files and convert to base64 with MIME type
    files_with_mime = []
    if uploaded_files := message.get("files", []):
        for file_path in uploaded_files:
            files_with_mime.append(encode_file_to_base64_with_mime(file_path))

    # Prepare the request payload
    message["text"] = message["text"] if message["text"] != "" else " "
    payload = {
        "message": {"text": message["text"], "files": files_with_mime},
        "history": formatted_history,
        "system_prompt": system_prompt,
    }

    # Truncated
    ...

멀티모달 데이터를 Gemini로 전송하고 서비스 간에 데이터에 액세스할 수 있도록 하려면 데이터를 코드에 선언된 base64 데이터 유형으로 변환하는 메커니즘을 사용할 수 있습니다. 또한 데이터의 MIME 유형을 선언해야 합니다. 그러나 Gemini API는 기존 MIME 유형을 모두 지원할 수 없으므로 이 문서에서 Gemini에서 지원하는 MIME 유형을 확인하는 것이 중요합니다 . 각 Gemini API 기능 (예: Vision)에서 정보를 확인할 수 있습니다.

또한 채팅 인터페이스에서는 Gemini에 대화의 '메모리'를 제공하기 위해 채팅 기록을 추가 컨텍스트로 전송하는 것도 중요합니다. 따라서 이 웹 인터페이스에서는 Gradio에서 웹 세션별로 관리하는 채팅 기록도 전송하고 사용자의 메시지 입력과 함께 전송합니다. 또한 사용자가 시스템 안내를 수정하고 전송할 수도 있습니다.

4. FastAPI를 사용하여 백엔드 서비스 빌드

다음으로 이전에 설명한 페이로드인 마지막 사용자 메시지, 채팅 기록, 시스템 안내를 처리할 수 있는 백엔드를 빌드해야 합니다. FastAPI를 사용하여 HTTP 백엔드 서비스를 만듭니다.

새 파일을 만들고 파일 -> 새 텍스트 파일을 클릭한 다음 다음 코드를 복사하여 붙여넣고 backend.py로 저장합니다.

import base64
from fastapi import FastAPI, Body
from google.genai.types import Content, Part
from google.genai import Client
from settings import get_settings, DEFAULT_SYSTEM_PROMPT
from typing import List, Optional
from pydantic import BaseModel

app = FastAPI(title="Gemini Multimodal Service")

settings = get_settings()
GENAI_CLIENT = Client(
    location=settings.VERTEXAI_LOCATION,
    project=settings.VERTEXAI_PROJECT_ID,
    vertexai=True,
)
GEMINI_MODEL_NAME = "gemini-2.0-flash-001"


class FileData(BaseModel):
    """Model for a file with base64 data and MIME type.

    Attributes:
        data: Base64 encoded string of the file content.
        mime_type: The MIME type of the file.
    """

    data: str
    mime_type: str


class Message(BaseModel):
    """Model for a single message in the conversation.

    Attributes:
        role: The role of the message sender, either 'user' or 'assistant'.
        content: The text content of the message or a list of file data objects.
    """

    role: str
    content: str | List[FileData]


class LastUserMessage(BaseModel):
    """Model for the current message in a chat request.

    Attributes:
        text: The text content of the message.
        files: List of file data objects containing base64 data and MIME type.
    """

    text: str
    files: List[FileData] = []


class ChatRequest(BaseModel):
    """Model for a chat request.

    Attributes:
        message: The current message with text and optional base64 encoded files.
        history: List of previous messages in the conversation.
        system_prompt: Optional system prompt to be used in the chat.
    """

    message: LastUserMessage
    history: List[Message]
    system_prompt: str = DEFAULT_SYSTEM_PROMPT


class ChatResponse(BaseModel):
    """Model for a chat response.

    Attributes:
        response: The text response from the model.
        error: Optional error message if something went wrong.
    """

    response: str
    error: Optional[str] = None


def handle_multimodal_data(file_data: FileData) -> Part:
    """Converts Multimodal data to a Google Gemini Part object.

    Args:
        file_data: FileData object with base64 data and MIME type.

    Returns:
        Part: A Google Gemini Part object containing the file data.
    """
    data = base64.b64decode(file_data.data)  # decode base64 string to bytes
    return Part.from_bytes(data=data, mime_type=file_data.mime_type)


def format_message_history_to_gemini_standard(
    message_history: List[Message],
) -> List[Content]:
    """Converts message history format to Google Gemini Content format.

    Args:
        message_history: List of message objects from the chat history.
            Each message contains 'role' and 'content' attributes.

    Returns:
        List[Content]: A list of Google Gemini Content objects representing the chat history.

    Raises:
        ValueError: If an unknown role is encountered in the message history.
    """
    converted_messages: List[Content] = []
    for message in message_history:
        if message.role == "assistant":
            converted_messages.append(
                Content(role="model", parts=[Part.from_text(text=message.content)])
            )
        elif message.role == "user":
            # Text-only messages
            if isinstance(message.content, str):
                converted_messages.append(
                    Content(role="user", parts=[Part.from_text(text=message.content)])
                )

            # Messages with files
            elif isinstance(message.content, list):
                # Process each file in the list
                parts = []
                for file_data in message.content:
                    for file_data in message.content:
                        parts.append(handle_multimodal_data(file_data))

                # Add the parts to a Content object
                if parts:
                    converted_messages.append(Content(role="user", parts=parts))

            else:
                raise ValueError(f"Unexpected content format: {type(message.content)}")

        else:
            raise ValueError(f"Unknown role: {message.role}")

    return converted_messages


@app.post("/chat", response_model=ChatResponse)
async def chat(
    request: ChatRequest = Body(...),
) -> ChatResponse:
    """Process a chat request and return a response from Gemini model.

    Args:
        request: The chat request containing message and history.

    Returns:
        ChatResponse: The model's response to the chat request.
    """
    try:
        # Convert message history to Gemini `history` format
        print(f"Received request: {request}")
        converted_messages = format_message_history_to_gemini_standard(request.history)

        # Create chat model
        chat_model = GENAI_CLIENT.chats.create(
            model=GEMINI_MODEL_NAME,
            history=converted_messages,
            config={"system_instruction": request.system_prompt},
        )

        # Prepare multimodal content
        content_parts = []

        # Handle any base64 encoded files in the current message
        if request.message.files:
            for file_data in request.message.files:
                content_parts.append(handle_multimodal_data(file_data))

        # Add text content
        content_parts.append(Part.from_text(text=request.message.text))

        # Send message to Gemini
        response = chat_model.send_message(content_parts)
        print(f"Generated response: {response}")

        return ChatResponse(response=response.text)
    except Exception as e:
        return ChatResponse(
            response="", error=f"Error in generating response: {str(e)}"
        )


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8081)

backend.py로 저장해야 합니다. 그런 다음 백엔드 서비스를 실행해 볼 수 있습니다. 이전 단계에서 프런트엔드 서비스를 실행했습니다. 이제 새 터미널을 열고 이 백엔드 서비스를 실행해 보겠습니다.

새 터미널을 만듭니다. 하단 영역의 터미널로 이동하여 '+' 버튼을 찾아 새 터미널을 만듭니다. 또는 Ctrl + Shift + C를 눌러 새 터미널을 열 수 있습니다.

그런 다음 작업 디렉터리 gemini-multimodal-chat-assistant에 있는지 확인한 다음 다음 명령어를 실행합니다.

uv run backend.py

성공하면 다음과 같은 출력이 표시됩니다.

INFO:     Started server process [xxxxx]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8081 (Press CTRL+C to quit)

코드 설명

채팅 요청을 수신할 HTTP 경로 정의

FastAPI에서는 app 데코레이터를 사용하여 경로를 정의합니다. Pydantic을 사용하여 API 계약도 정의합니다. 응답을 생성하는 경로가 POST 메서드와 함께 /chat 경로에 있다고 지정합니다. 다음 코드에 선언된 이러한 기능

class FileData(BaseModel):
    data: str
    mime_type: str

class Message(BaseModel):
    role: str
    content: str | List[FileData]

class LastUserMessage(BaseModel):
    text: str
    files: List[FileData] = []

class ChatRequest(BaseModel):
    message: LastUserMessage
    history: List[Message]
    system_prompt: str = DEFAULT_SYSTEM_PROMPT

class ChatResponse(BaseModel):
    response: str
    error: Optional[str] = None

    ...

@app.post("/chat", response_model=ChatResponse)
async def chat(
    request: ChatRequest = Body(...),
) -> ChatResponse:
    
    # Truncated
    ...

Gemini SDK 채팅 기록 형식 준비하기

이해해야 하는 중요한 사항 중 하나는 나중에 Gemini 클라이언트를 초기화할 때 history 인수 값으로 삽입될 수 있도록 채팅 기록을 재구성하는 방법입니다. 아래 코드를 검사할 수 있습니다.

def format_message_history_to_gemini_standard(
    message_history: List[Message],
) -> List[Content]:
    
    ...
    # Truncated    

    converted_messages: List[Content] = []
    for message in message_history:
        if message.role == "assistant":
            converted_messages.append(
                Content(role="model", parts=[Part.from_text(text=message.content)])
            )
        elif message.role == "user":
            # Text-only messages
            if isinstance(message.content, str):
                converted_messages.append(
                    Content(role="user", parts=[Part.from_text(text=message.content)])
                )

            # Messages with files
            elif isinstance(message.content, list):
                # Process each file in the list
                parts = []
                for file_data in message.content:
                    parts.append(handle_multimodal_data(file_data))

                # Add the parts to a Content object
                if parts:
                    converted_messages.append(Content(role="user", parts=parts))
    
    #Truncated
    ...

    return converted_messages

Gemini SDK에 채팅 기록을 제공하려면 데이터 형식을 List[Content] 데이터 유형으로 지정해야 합니다. 각 콘텐츠에는 역할 및 부분 값이 하나 이상 있어야 합니다. 역할은 사용자 또는 모델 등 메시지의 소스를 나타냅니다. 여기서 parts는 프롬프트 자체를 나타내며, 텍스트일 수도 있고 다양한 모달리티의 조합일 수도 있습니다. 이 문서에서 콘텐츠 인수를 구성하는 방법을 자세히 알아보세요.

텍스트가 아닌 ( 멀티모달) 데이터 처리

프런트엔드 섹션에서 이전에 언급한 것처럼 텍스트가 아닌 데이터 또는 멀티모달 데이터를 전송하는 방법 중 하나는 데이터를 base64 문자열로 전송하는 것입니다. 또한 데이터를 올바르게 해석할 수 있도록 데이터의 MIME 유형을 지정해야 합니다. 예를 들어 .jpg 접미사가 있는 이미지 데이터를 전송하는 경우 image/jpeg MIME 유형을 제공합니다.

이 코드 부분은 base64 데이터를 Gemini SDK의 Part.from_bytes 형식으로 변환합니다.

def handle_multimodal_data(file_data: FileData) -> Part:
    """Converts Multimodal data to a Google Gemini Part object.

    Args:
        file_data: FileData object with base64 data and MIME type.

    Returns:
        Part: A Google Gemini Part object containing the file data.
    """
    data = base64.b64decode(file_data.data)  # decode base64 string to bytes
    return Part.from_bytes(data=data, mime_type=file_data.mime_type)

5. 통합 테스트

이제 여러 Cloud 콘솔 탭에서 여러 서비스가 실행되고 있습니다.

포트 8080에서 실행되는 프런트엔드 서비스

* Running on local URL:  http://0.0.0.0:8080

To create a public link, set `share=True` in `launch()`.

포트 8081에서 실행되는 백엔드 서비스

INFO:     Started server process [xxxxx]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8081 (Press CTRL+C to quit)

현재 상태에서 포트 8080의 웹 애플리케이션에서 어시스턴트와 원활하게 채팅으로 문서를 보낼 수 있습니다. 파일을 업로드하고 질문을 하여 실험을 시작할 수 있습니다. 일부 파일 형식은 아직 지원되지 않으며 오류가 발생합니다.

텍스트 상자 아래의 추가 입력 필드에서 시스템 안내를 수정할 수도 있습니다.

6. Cloud Run에 배포

이제 이 멋진 앱을 다른 사용자에게 소개하고자 합니다. 이를 위해 이 애플리케이션을 패키징하고 다른 사용자가 액세스할 수 있는 공개 서비스로 Cloud Run에 배포할 수 있습니다. 이를 위해 아키텍처를 다시 살펴보겠습니다.

이 Codelab에서는 프런트엔드 서비스와 백엔드 서비스를 모두 하나의 컨테이너에 배치합니다. 두 서비스를 모두 관리하려면 supervisord의 도움이 필요합니다.

새 파일을 만들고 파일 -> 새 텍스트 파일을 클릭한 다음 다음 코드를 복사하여 붙여넣고 supervisord.conf로 저장합니다.

[supervisord]
nodaemon=true
user=root
logfile=/dev/stdout
logfile_maxbytes=0
pidfile=/var/run/supervisord.pid

[program:backend]
command=uv run backend.py
directory=/app
autostart=true
autorestart=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
startsecs=10
startretries=3

[program:frontend]
command=uv run frontend.py
directory=/app
autostart=true
autorestart=true
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
startsecs=10
startretries=3

다음으로 Dockerfile이 필요합니다. 파일 -> 새 텍스트 파일을 클릭하고 다음 코드를 복사하여 붙여넣은 후 Dockerfile로 저장합니다.

FROM python:3.12-slim
COPY --from=ghcr.io/astral-sh/uv:0.6.6 /uv /uvx /bin/

RUN apt-get update && apt-get install -y \
    supervisor curl \
    && rm -rf /var/lib/apt/lists/*

ADD . /app
WORKDIR /app

RUN uv sync --frozen

EXPOSE 8080

# Copy supervisord configuration
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf

ENV PYTHONUNBUFFERED=1

ENTRYPOINT ["/usr/bin/supervisord", "-c", "/etc/supervisor/conf.d/supervisord.conf"]

이제 Cloud Run에 애플리케이션을 배포하는 데 필요한 모든 파일이 있으므로 배포해 보겠습니다. Cloud Shell 터미널로 이동하여 현재 프로젝트가 활성 프로젝트로 구성되어 있는지 확인합니다. 그렇지 않은 경우 gcloud configure 명령어를 사용하여 프로젝트 ID를 설정합니다.

gcloud config set project [PROJECT_ID]

그런 다음 다음 명령어를 실행하여 Cloud Run에 배포합니다.

gcloud run deploy --source . \
                  --env-vars-file settings.yaml \
                  --port 8080 \
                  --region us-central1

서비스 이름을 입력하라는 메시지가 표시됩니다(예: 'gemini-multimodal-chat-assistant'). 애플리케이션 작업 디렉터리에 Dockerfile이 있으므로 Docker 컨테이너를 빌드하고 Artifact Registry에 푸시합니다. 또한 리전에 Artifact Registry 저장소가 생성된다는 메시지가 표시되면 'Y'를 선택합니다. 인증되지 않은 호출을 허용할지 묻는 메시지가 표시되면 'y'라고 말합니다. 이 데모 애플리케이션에서는 인증되지 않은 액세스를 허용합니다. 엔터프라이즈 및 프로덕션 애플리케이션에 적절한 인증을 사용하는 것이 좋습니다.

배포가 완료되면 다음과 유사한 링크가 표시됩니다.

https://gemini-multimodal-chat-assistant-*******.us-central1.run.app

시크릿 창이나 휴대기기에서 애플리케이션을 사용해 보세요. 이미 게시되어 있어야 합니다.

7. 도전과제

이제 탐색 기술을 연마하고 빛을 발할 때입니다. 어시스턴트가 오디오 파일 또는 동영상 파일 읽기를 지원할 수 있도록 코드를 변경할 수 있나요?

8. 삭제

이 Codelab에서 사용한 리소스의 비용이 Google Cloud 계정에 청구되지 않도록 하려면 다음 단계를 따르세요.

Google Cloud 콘솔에서 리소스 관리 페이지로 이동합니다.
프로젝트 목록에서 삭제할 프로젝트를 선택하고 삭제를 클릭합니다.
대화상자에서 프로젝트 ID를 입력하고 종료를 클릭하여 프로젝트를 삭제합니다.
또는 콘솔에서 Cloud Run으로 이동하여 방금 배포한 서비스를 선택하고 삭제할 수도 있습니다.