Getting Started with Vector Embeddings in AlloyDB AI

About this codelab

Last updated: May 13, 2025

Written by: Gleb Otochkin

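1. Introduction

In this codelab you will learn how to use AlloyDB AI by combining vector search with Vertex AI embeddings.

Prerequisites

A basic understanding of the Google Cloud console
Basic skills with the command-line interface and Cloud Shell

What you'll learn

How to deploy an AlloyDB cluster and primary instance
How to connect to AlloyDB from a Google Compute Engine VM
How to create a database and enable AlloyDB AI
How to load data into the database
How to use a Vertex AI embedding model in AlloyDB
How to enrich the results using a Vertex AI generative model
How to improve performance using a vector index

What you'll need

A Google Cloud account and a Google Cloud project
A web browser such as Chrome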

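Self-paced environment setup

Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

The Project name is the display name for this project's participants. It is a character string not used by Google APIs, and you can update it at any time.
The Project ID is unique across all Google Cloud projects and is immutable (it cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you can generate another random one. Alternatively, you can try your own and see whether it is available. It cannot be changed after this step and remains for the duration of the project.
For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.

Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much. To shut down resources and avoid incurring billing beyond this tutorial, delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will use Google Cloud Shell, a command-line environment running in the Cloud.

From the Google Cloud Console, click the Cloud Shell icon on the top right toolbar.

Provisioning and connecting to the environment takes a few minutes. When it finishes, Cloud Shell is ready for use.

This virtual machine is loaded with all the development tools you need. It offers a persistent 5GB home directory and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this codelab can be done within a browser. You do not need to install anything.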

Enable API

Inside Cloud Shell, make sure that your project ID is set:

gcloud config set project [YOUR-PROJECT-ID]

Set the PROJECT_ID environment variable:

PROJECT_ID=$(gcloud config get-value project)

Enable all necessary services:

gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       aiplatform.googleapis.com

Expected output:

student@cloudshell:~ (test-project-001-402417)$ gcloud config set project test-project-001-402417
Updated property [core/project].
student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-14650]
student@cloudshell:~ (test-project-001-402417)$ 
student@cloudshell:~ (test-project-001-402417)$ gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       aiplatform.googleapis.com
Operation "operations/acat.p2-4470404856-1f44ebd8-894e-4356-bea7-b84165a57442" finished successfully.

Configure your default region so that you can use the Vertex AI embedding models. Read more about the locations where Vertex AI is available. This example uses the us-central1 region.

gcloud config set compute/region us-central1

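4. Deploy AlloyDB

Before creating an AlloyDB cluster, the VPC needs an available private IP range for the future AlloyDB instance to use. If no such range exists, create one and allocate it for use by internal Google services. After that, you can create the cluster and the instance.

Create private IP range

You need to configure Private Service Access in the VPC for AlloyDB. The assumption here is that the project has a "default" VPC network and that it is used for all actions.

Create the private IP range: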

gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default

Create the private connection using the allocated IP range:

gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default
Created [https://www.googleapis.com/compute/v1/projects/test-project-402417/global/addresses/psa-range].

student@cloudshell:~ (test-project-402417)$ gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default
Operation "operations/pssn.p24-4470404856-595e209f-19b7-4669-8a71-cbd45de8ba66" finished successfully.

student@cloudshell:~ (test-project-402417)$
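
To get a feel for what --prefix-length=24 reserves, the following minimal Python sketch counts the addresses covered by a prefix length. The base address 0.0.0.0 is only a placeholder (an assumption for illustration): Google Cloud picks the actual range, and only the prefix length matters for the count.

```python
import ipaddress


def addresses_in_prefix(prefix_length: int) -> int:
    """Return how many IPv4 addresses a CIDR block of this prefix length covers."""
    # 0.0.0.0 is a placeholder base address; the real range is chosen by Google Cloud.
    network = ipaddress.ip_network(f"0.0.0.0/{prefix_length}")
    return network.num_addresses


# The psa-range created above uses --prefix-length=24.
print(addresses_in_prefix(24))
```

A shorter prefix reserves a larger block, so a /20 covers sixteen times as many addresses as the /24 used here.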

Create AlloyDB Cluster

In this section you create an AlloyDB cluster in the us-central1 region.

Define a password for the postgres user. You can define your own password or use a random function to generate one:

export PGPASSWORD=`openssl rand -hex 12`

Expected console output:

student@cloudshell:~ (test-project-402417)$ export PGPASSWORD=`openssl rand -hex 12`

Note the PostgreSQL password for future use:

echo $PGPASSWORD

Expected console output:

student@cloudshell:~ (test-project-402417)$ echo $PGPASSWORD
bbefbfde7601985b0dee5723
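
For readers who prefer to generate the password outside the shell, here is a rough Python equivalent of `openssl rand -hex 12`. It is a sketch rather than part of the lab: 12 random bytes rendered as hexadecimal give a 24-character string, the same shape as the value shown above.

```python
import secrets
import string

# 12 cryptographically random bytes, rendered as 24 hexadecimal characters.
password = secrets.token_hex(12)

# Sanity checks on the shape of the generated value.
is_hex = all(ch in string.hexdigits for ch in password)
print(len(password), is_hex)
```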

Create a Free Trial Cluster

If you haven't used AlloyDB before, you can create a free trial cluster.

Define the region and the AlloyDB cluster name. This codelab uses the us-central1 region and alloydb-aip-01 as the cluster name:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01

Run the following command to create the cluster:

gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION \
    --subscription-type=TRIAL

Expected console output:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION \
    --subscription-type=TRIAL
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.

Create an AlloyDB primary instance for the cluster in the same Cloud Shell session. If you are disconnected, you need to define the region and cluster name environment variables again:

gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=8 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=8 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.

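Create AlloyDB Standard Cluster

If this is not the first AlloyDB cluster in the project, proceed with creating a standard cluster.

Define the region and the AlloyDB cluster name. This codelab uses the us-central1 region and alloydb-aip-01 as the cluster name: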

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01

Run the following command to create the cluster:

gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION

Expected console output:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION 
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.

Create an AlloyDB primary instance for the cluster in the same Cloud Shell session. If you are disconnected, you need to define the region and cluster name environment variables again:

gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.

5. Connect to AlloyDB

AlloyDB is deployed using a private-only connection, so you need a VM with a PostgreSQL client installed to work with the database.

Deploy GCE VM

Create a GCE VM in the same region and VPC as the AlloyDB cluster.

In Cloud Shell, execute:

export ZONE=us-central1-a
gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --create-disk=auto-delete=yes,boot=yes,image=projects/debian-cloud/global/images/$(gcloud compute images list --filter="family=debian-12 AND family!=debian-12-arm64" --format="value(name)") \
    --scopes=https://www.googleapis.com/auth/cloud-platform

Expected console output:

student@cloudshell:~ (test-project-402417)$ export ZONE=us-central1-a
gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --create-disk=auto-delete=yes,boot=yes,image=projects/debian-cloud/global/images/$(gcloud compute images list --filter="family=debian-12 AND family!=debian-12-arm64" --format="value(name)") \
    --scopes=https://www.googleapis.com/auth/cloud-platform

Created [https://www.googleapis.com/compute/v1/projects/test-project-402417/zones/us-central1-a/instances/instance-1].
NAME: instance-1
ZONE: us-central1-a
MACHINE_TYPE: n1-standard-1
PREEMPTIBLE: 
INTERNAL_IP: 10.128.0.2
EXTERNAL_IP: 34.71.192.233
STATUS: RUNNING

Install Postgres Client

Install the PostgreSQL client software on the deployed VM.

Connect to the VM:

gcloud compute ssh instance-1 --zone=us-central1-a

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud compute ssh instance-1 --zone=us-central1-a
Updating project ssh metadata...working..Updated [https://www.googleapis.com/compute/v1/projects/test-project-402417].                                                                                                                                                         
Updating project ssh metadata...done.                                                                                                                                                                                                                                              
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.5110295539541121102' (ECDSA) to the list of known hosts.
Linux instance-1.us-central1-a.c.gleb-test-short-001-418811.internal 6.1.0-18-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
student@instance-1:~$

Install the client software by running the following commands inside the VM:

sudo apt-get update
sudo apt-get install --yes postgresql-client

Expected console output:

student@instance-1:~$ sudo apt-get update
sudo apt-get install --yes postgresql-client
Get:1 https://packages.cloud.google.com/apt google-compute-engine-bullseye-stable InRelease [5146 B]
Get:2 https://packages.cloud.google.com/apt cloud-sdk-bullseye InRelease [6406 B]   
Hit:3 https://deb.debian.org/debian bullseye InRelease  
Get:4 https://deb.debian.org/debian-security bullseye-security InRelease [48.4 kB]
Get:5 https://packages.cloud.google.com/apt google-compute-engine-bullseye-stable/main amd64 Packages [1930 B]
Get:6 https://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:7 https://deb.debian.org/debian bullseye-backports InRelease [49.0 kB]
...redacted...
update-alternatives: using /usr/share/postgresql/13/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-client (13+225) ...
Processing triggers for man-db (2.9.4-2) ...
Processing triggers for libc-bin (2.31-13+deb11u7) ...

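Connect to the Instance

Connect to the primary instance from the VM using psql.

In the same Cloud Shell tab with the open SSH session to your instance-1 VM, use the noted AlloyDB password (PGPASSWORD) value and the AlloyDB cluster ID to connect to AlloyDB from the GCE VM: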

export PGPASSWORD=<Noted password>

export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
export INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres sslmode=require"

Expected console output:

student@instance-1:~$ export PGPASSWORD=CQhOi5OygD4ps6ty
student@instance-1:~$ ADBCLUSTER=alloydb-aip-01
student@instance-1:~$ REGION=us-central1
student@instance-1:~$ INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
gleb@instance-1:~$ psql "host=$INSTANCE_IP user=postgres sslmode=require"
psql (15.6 (Debian 15.6-0+deb12u1), server 15.5)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=>

Close the psql session:

exit

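6. Prepare Database

You need to create a database, enable the Vertex AI integration, create database objects, and import the data.

Grant Necessary Permissions to AlloyDB

Add Vertex AI permissions to the AlloyDB service agent.

Open another Cloud Shell tab using the "+" sign at the top.

In the new Cloud Shell tab, execute: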

PROJECT_ID=$(gcloud config get-value project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-11039]
student@cloudshell:~ (test-project-001-402417)$ gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
Updated IAM policy for project [test-project-001-402417].
bindings:
- members:
  - serviceAccount:service-4470404856@gcp-sa-alloydb.iam.gserviceaccount.com
  role: roles/aiplatform.user
- members:
...
etag: BwYIEbe_Z3U=
version: 1
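
The --member value in the command above is assembled from the project number rather than the project ID. As a small illustration of that naming pattern, the Python sketch below rebuilds the member string; the function name is hypothetical, and the project number is the one shown in the sample output.

```python
def alloydb_service_agent_member(project_number: int) -> str:
    """Build the IAM member string for a project's AlloyDB service agent."""
    # Pattern taken from the gcloud command above: service-<PROJECT_NUMBER>@gcp-sa-alloydb...
    return (
        f"serviceAccount:service-{project_number}"
        "@gcp-sa-alloydb.iam.gserviceaccount.com"
    )


# Project number from the sample output in this codelab.
print(alloydb_service_agent_member(4470404856))
```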

Close the tab by running the "exit" command in it:

exit

Create Database

Create the quickstart database.

In the GCE VM session, execute the following to create the database:

psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE quickstart_db"

Expected console output:

student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE quickstart_db"
CREATE DATABASE
student@instance-1:~$

Enable Vertex AI Integration

Enable the Vertex AI integration and the pgvector extension in the database.

In the GCE VM, execute:

psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE"
psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS vector"

Expected console output:

student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE"
psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS vector"
CREATE EXTENSION
CREATE EXTENSION
student@instance-1:~$

Import Data

Download the prepared data and import it into the new database.

In the GCE VM, execute:

gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_demo_schema.sql |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"
gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_products from stdin csv header"
gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_inventory.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_inventory from stdin csv header"
gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_stores.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_stores from stdin csv header"

Expected console output:

student@instance-1:~$ gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_demo_schema.sql |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)
SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
ALTER TABLE
ALTER SEQUENCE
ALTER TABLE
ALTER TABLE
ALTER TABLE
student@instance-1:~$ gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_products from stdin csv header"
COPY 941
student@instance-1:~$ gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_inventory.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_inventory from stdin csv header"
COPY 263861
student@instance-1:~$ gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_stores.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_stores from stdin csv header"
COPY 4654
student@instance-1:~$
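
The "csv header" option in the \copy commands tells psql that the first line of each file holds column names rather than data, which is why the reported COPY counts exclude it. The Python sketch below mimics that behaviour on a tiny in-memory sample; the column names and rows are hypothetical and are not taken from the real CSV files.

```python
import csv
import io

# Hypothetical sample; the real files live in the Cloud Storage bucket used above.
sample_csv = "store_id,name,zip_code\n1583,Example Store,93230\n1584,Other Store,93231\n"


def load_rows(text: str):
    """Split CSV text into its header line and its data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line is the header, not data
    return header, list(reader)


header, rows = load_rows(sample_csv)
print(f"COPY {len(rows)}")
```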

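After the import, the cymbal_products table holds the product data, the cymbal_inventory table holds the inventory showing how many of each product are available in each store, and the cymbal_stores table holds the list of stores. The next step is to calculate vector data based on the product descriptions, using the embedding function. This function calls Vertex AI through the integration to compute vector data from each product description and add it to the table. You can read more about the technology used in the documentation.

Create Embedding Column

Connect to the database using psql and add a generated column holding the vector data to the cymbal_products table, using the embedding function. The embedding function returns vector data from Vertex AI based on the data supplied from the product_description column.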

psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"

After connecting to the database, execute the following in the psql session:

ALTER TABLE cymbal_products ADD COLUMN embedding vector(768) GENERATED ALWAYS AS (embedding('text-embedding-005',product_description)) STORED;

This command creates the generated column and populates it with the vector data.

Expected console output:

student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"
psql (13.11 (Debian 13.11-0+deb11u1), server 14.7)
WARNING: psql major version 13, server major version 14.
         Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

quickstart_db=> ALTER TABLE cymbal_products ADD COLUMN embedding vector(768) GENERATED ALWAYS AS (embedding('text-embedding-005',product_description)) STORED;
ALTER TABLE
quickstart_db=>

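8. Run Similarity Search

You can now run a similarity search based on the vector values calculated for the descriptions and the vector value obtained for the request.

The SQL queries can be executed from the same psql command-line interface or, as an alternative, from AlloyDB Studio. Multi-row and complex output may look better in AlloyDB Studio.

Connect to AlloyDB Studio

In the following chapters, all SQL commands that require a connection to the database can alternatively be executed in AlloyDB Studio. To run the commands, open the web console interface for your AlloyDB cluster by clicking on the primary instance.

Then click AlloyDB Studio on the left.

Choose the quickstart_db database and the user postgres, and provide the password noted when you created the cluster. Then click the "Authenticate" button.

This opens the AlloyDB Studio interface. To run commands in the database, click the "Editor 1" tab on the right.

It opens an interface where you can run SQL commands.

If you prefer to use command-line psql, follow the alternative route and connect to the database from your VM SSH session as described in the previous chapters.

Run Similarity Search from psql

If your database session was disconnected, connect to the database again using psql or AlloyDB Studio.

Connect to the database: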

psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"

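Run a query to get a list of available products most closely related to a client's request. The request passed to Vertex AI to get the vector value is "What kind of fruit trees grow well here?"

Here is a query you can run to choose the first 10 items most suitable for the request: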

SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        (cp.embedding <=> embedding('text-embedding-005','What kind of fruit trees grow well here?')::vector) as distance
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 10;

And here is the expected output:

quickstart_db=> SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        (cp.embedding <=> embedding('text-embedding-005','What kind of fruit trees grow well here?')::vector) as distance
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 10;
      product_name       |                                   description                                    | sale_price | zip_code |      distance       
-------------------------+----------------------------------------------------------------------------------+------------+----------+---------------------
 Cherry Tree             | This is a beautiful cherry tree that will produce delicious cherries. It is an d |      75.00 |    93230 | 0.43922018972266397
 Meyer Lemon Tree        | Meyer Lemon trees are California's favorite lemon tree! Grow your own lemons by  |         34 |    93230 |  0.4685112926118228
 Toyon                   | This is a beautiful toyon tree that can grow to be over 20 feet tall. It is an e |      10.00 |    93230 |  0.4835677149651668
 California Lilac        | This is a beautiful lilac tree that can grow to be over 10 feet tall. It is an d |       5.00 |    93230 |  0.4947204525907498
 California Peppertree   | This is a beautiful peppertree that can grow to be over 30 feet tall. It is an e |      25.00 |    93230 |  0.5054166905547247
 California Black Walnut | This is a beautiful walnut tree that can grow to be over 80 feet tall. It is a d |     100.00 |    93230 |  0.5084219510932597
 California Sycamore     | This is a beautiful sycamore tree that can grow to be over 100 feet tall. It is  |     300.00 |    93230 |  0.5140519790508755
 Coast Live Oak          | This is a beautiful oak tree that can grow to be over 100 feet tall. It is an ev |     500.00 |    93230 |  0.5143126438081371
 Fremont Cottonwood      | This is a beautiful cottonwood tree that can grow to be over 100 feet tall. It i |     200.00 |    93230 |  0.5174774727252058
 Madrone                 | This is a beautiful madrona tree that can grow to be over 80 feet tall. It is an |      50.00 |    93230 |  0.5227400803389093
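
The distance column in the query comes from pgvector's `<=>` operator, which computes cosine distance: one minus the cosine similarity of the two vectors, so smaller values mean closer matches. The Python sketch below reproduces that calculation and the ORDER BY on toy three-dimensional vectors. The vectors and product labels are made up for illustration; the real embeddings in this lab have 768 dimensions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy embeddings, not real model output.
query = [1.0, 0.0, 0.0]
products = {
    "fruit tree": [0.9, 0.1, 0.0],
    "garden hose": [0.0, 1.0, 0.0],
}

# Equivalent of ORDER BY distance ASC: the closest vector comes first.
ranked = sorted(products, key=lambda name: cosine_distance(products[name], query))
print(ranked)
```

Because cosine distance ignores vector length and looks only at direction, two descriptions of very different length can still rank as close matches.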

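9. Improve Response

You can improve the response to a client application by using the query result as part of a prompt to a Vertex AI generative foundation language model, which turns the result into a meaningful output.

To achieve that, the plan is to generate JSON from the vector search results and then add the generated JSON to the prompt for a text LLM in Vertex AI. In the first step you generate the JSON, then you test it in Vertex AI Studio, and in the last step you incorporate it into a SQL statement that can be used in an application.

Generate output in JSON format

Modify the query to generate the output in JSON format and return only one row to pass to Vertex AI.

Here is an example of the query: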

WITH trees as (
SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        cp.uniq_id as product_id
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        (cp.embedding <=> embedding('text-embedding-005','What kind of fruit trees grow well here?')::vector) ASC
LIMIT 1)
SELECT json_agg(trees) FROM trees;

And here is the expected JSON in the output:

[{"product_name":"Cherry Tree","description":"This is a beautiful cherry tree that will produce delicious cherries. It is an d","sale_price":75.00,"zip_code":93230,"product_id":"d536e9e823296a2eba198e52dd23e712"}]
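
As a rough illustration of how an application might combine this JSON with a question before sending it to a generative model, the Python sketch below parses the JSON and assembles a prompt string. The function name and the shortened prompt wording are assumptions made for the example; it does not call Vertex AI.

```python
import json

# JSON as returned by the json_agg query above.
rows_json = (
    '[{"product_name":"Cherry Tree","description":"This is a beautiful cherry tree '
    'that will produce delicious cherries. It is an d","sale_price":75.00,'
    '"zip_code":93230,"product_id":"d536e9e823296a2eba198e52dd23e712"}]'
)


def build_prompt(products_json: str, question: str) -> str:
    """Combine the product list and the customer question into one prompt."""
    products = json.loads(products_json)  # fails early if the JSON is malformed
    return (
        "You are a friendly advisor helping to find a product based on the customer's needs.\n"
        "Here is the list of products:\n"
        f"{json.dumps(products)}\n"
        f'The customer asked "{question}"\n'
        "You should give information about the product, price and some supplemental information."
    )


prompt = build_prompt(rows_json, "What kind of fruit trees grow well here?")
print(prompt)
```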

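Run the prompt in Vertex AI Studio

You can use the generated JSON as part of the prompt to a generative AI text model in Vertex AI Studio.

Open Vertex AI Studio in the Cloud console.

Click the "Agree and Continue" button.

Write your prompt at the bottom of the interface.

You may be asked to enable additional APIs, but you can ignore the request. No additional APIs are needed to finish the lab.

Here is the prompt to use with the JSON output of the initial query about the trees: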

You are a friendly advisor helping to find a product based on the customer's needs.

Based on the client request we have loaded a list of products closely related to search.

The list in JSON format with list of values like {"product_name":"name","description":"some description","sale_price":10,"zip_code": 10234, "product_id": "02056727942aeb714dc9a2313654e1b0"}

Here is the list of products:

{"product_name":"Cherry Tree","description":"This is a beautiful cherry tree that will produce delicious cherries. d","sale_price":75.00,"zip_code":93230,"product_id":"d536e9e823296a2eba198e52dd23e712"}

The customer asked "What tree is growing the best here?"

You should give information about the product, price and some supplemental information.

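When the prompt was run with the JSON values on the gemini-2.0-flash-001 model, the answer in this example was as follows. Your answer may differ, because the model and its parameters change over time.

"I can help you with that. Based on the limited product list, the Cherry Tree could be a good choice.

Here is what I found:

Product: Cherry Tree

Description: 'This is a beautiful cherry tree that will produce delicious cherries. d' (unfortunately, the description is incomplete)

Price: $75.00

Zip code: 93230 (important for determining whether it will grow well in your area)"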

Run the prompt in psql

The AlloyDB AI integration with Vertex AI lets you get the same response from a generative model using SQL directly in the database. But to use the gemini-2.0-flash-001 model, you need to register it first.

Verify the google_ml_integration extension. It should be version 1.4.2 or higher.

Connect to the quickstart_db database from psql as shown before (or use AlloyDB Studio) and execute:

SELECT extversion from pg_extension where extname='google_ml_integration';

Check the google_ml_integration.enable_model_support database flag:

show google_ml_integration.enable_model_support;

The expected output from the psql session is "on":

postgres=> show google_ml_integration.enable_model_support;
 google_ml_integration.enable_model_support 
--------------------------------------------
 on
(1 row)

If it shows "off", you need to set the google_ml_integration.enable_model_support database flag to "on". To do so, you can use the AlloyDB web console interface or run the following gcloud command:

PROJECT_ID=$(gcloud config get-value project)
REGION=us-central1
ADBCLUSTER=alloydb-aip-01
gcloud beta alloydb instances update $ADBCLUSTER-pr \
  --database-flags google_ml_integration.enable_model_support=on \
  --region=$REGION \
  --cluster=$ADBCLUSTER \
  --project=$PROJECT_ID \
  --update-mode=FORCE_APPLY

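The command takes around 3-5 minutes to execute in the background. Then you can verify the flag again.

Now you need to register two models. The first one is the text-embedding-005 model used earlier. It needs to be registered because you enabled the model registration capabilities.

To register the model, run the following code in psql or AlloyDB Studio: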

CALL
  google_ml.create_model(
    model_id => 'text-embedding-005',
    model_provider => 'google',
    model_qualified_name => 'text-embedding-005',
    model_type => 'text_embedding',
    model_auth_type => 'alloydb_service_agent_iam',
    model_in_transform_fn => 'google_ml.vertexai_text_embedding_input_transform',
    model_out_transform_fn => 'google_ml.vertexai_text_embedding_output_transform');

The next model to register is gemini-2.0-flash-001, which is used to generate the user-friendly output:

CALL
  google_ml.create_model(
    model_id => 'gemini-2.0-flash-001',
    model_request_url => 'publishers/google/models/gemini-2.0-flash-001:streamGenerateContent',
    model_provider => 'google',
    model_auth_type => 'alloydb_service_agent_iam');

You can always verify the list of registered models by selecting information from google_ml.model_info_view:

select model_id,model_type from google_ml.model_info_view;

Here is sample output:

quickstart_db=> select model_id,model_type from google_ml.model_info_view;
        model_id         |   model_type   
-------------------------+----------------
 textembedding-gecko     | text_embedding
 textembedding-gecko@001 | text_embedding
 text-embedding-005      | text_embedding
 gemini-2.0-flash-001    | generic
(4 rows)

Now you can use the JSON generated in a subquery as part of the prompt to the generative AI text model using SQL.

In the psql or AlloyDB Studio session to the database, run the query:

WITH trees AS (
SELECT
        cp.product_name,
        cp.product_description AS description,
        cp.sale_price,
        cs.zip_code,
        cp.uniq_id AS product_id
FROM
        cymbal_products cp
JOIN cymbal_inventory ci ON
        ci.uniq_id = cp.uniq_id
JOIN cymbal_stores cs ON
        cs.store_id = ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        (cp.embedding <=> embedding('text-embedding-005',
        'What kind of fruit trees grow well here?')::vector) ASC
LIMIT 1),
prompt AS (
SELECT
        'You are a friendly advisor helping to find a product based on the customer''s needs.
Based on the client request we have loaded a list of products closely related to search.
The list in JSON format with list of values like {"product_name":"name","product_description":"some description","sale_price":10}
Here is the list of products:' || json_agg(trees) || 'The customer asked "What kind of fruit trees grow well here?"
You should give information about the product, price and some supplemental information' AS prompt_text
FROM
        trees),
response AS (
SELECT
        json_array_elements(google_ml.predict_row( model_id =>'gemini-2.0-flash-001',
        request_body => json_build_object('contents',
        json_build_object('role',
        'user',
        'parts',
        json_build_object('text',
        prompt_text)))))->'candidates'->0->'content'->'parts'->0->'text' AS resp
FROM
        prompt)
SELECT
        string_agg(resp::text,
        ' ')
FROM
        response;
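
The response step of the query walks each element of the returned JSON array down the path candidates, 0, content, parts, 0, text and then concatenates the pieces. The Python sketch below mirrors that navigation on hypothetical chunks shaped like the path the SQL follows; the chunk contents are invented and the sketch makes no call to Vertex AI.

```python
def extract_text(chunks):
    """Join the first candidate's first text part from every streamed chunk."""
    pieces = []
    for chunk in chunks:
        # Same path as ->'candidates'->0->'content'->'parts'->0->'text' in the SQL.
        pieces.append(chunk["candidates"][0]["content"]["parts"][0]["text"])
    return "".join(pieces)


# Hypothetical chunks, for illustration only.
sample_chunks = [
    {"candidates": [{"content": {"parts": [{"text": "Okay"}]}}]},
    {"candidates": [{"content": {"parts": [{"text": ", based on"}]}}]},
    {"candidates": [{"content": {"parts": [{"text": " the product list."}]}}]},
]

print(extract_text(sample_chunks))
```

In the SQL version each fragment is still a JSON value when it is cast to text, so the aggregated output keeps the surrounding quotation marks and escaped newlines; the sketch joins already-decoded strings instead.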

Here is the expected output of the query. Your output may differ depending on the model version and parameters:

"Okay" ", based on" " the product list, the \"Cherry Tree\" seems like a potential option for you.\n\n" "* **Product:** Cherry Tree\n* **Description:** It's a beautiful" " deciduous tree that grows to about 15 feet tall. You'll get dark green leaves in the summer that turn red in the fall. These trees are known for" " their beauty, shade, and privacy. Plus, you'll get delicious cherries!\n* **Growing Conditions:** Cherry trees prefer a cool, moist climate" " and sandy soil.\n* **USDA Zones:** They are best suited for USDA zones 4-9. (You may want to confirm that zone 4-9 is appropriate for your location.)\n* **Price:** \\$" "75.00\n\n**To make sure this is the *best* fit for you, could you tell me:**\n\n1. **Your Zip Code:** While the product lists zip code 93230, I" " would like to confirm where you are to verify that the USDA zone is a match for your area.\n2. **What kind of soil do you have?** The product description says that cherry trees prefer sandy soil.\n\nOnce I have this information, I can give you a more confident recommendation!\n"

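10. Create Vector Index

The dataset is quite small, and the response time depends primarily on the interaction with the AI models. But with millions of vectors, the vector search part can take a significant portion of the response time and put a high load on the system. To improve that, you can build an index on top of the vectors.

Create ScaNN index

To build the ScaNN index, you need to enable one more extension. The alloydb_scann extension provides an interface for working with ANN-type vector indexes that use the Google ScaNN algorithm.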

CREATE EXTENSION IF NOT EXISTS alloydb_scann;

Expected output:

quickstart_db=> CREATE EXTENSION IF NOT EXISTS alloydb_scann;
CREATE EXTENSION
Time: 27.468 ms
quickstart_db=>

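Now you can create the index. The following example leaves most parameters at their defaults and provides only the number of partitions (num_leaves) and the maximum number of tree levels (max_num_levels) for the index: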

CREATE INDEX cymbal_products_embeddings_scann ON cymbal_products
  USING scann (embedding cosine)
  WITH (num_leaves=31, max_num_levels = 2);

You can read about tuning index parameters in the documentation.

Expected output:

quickstart_db=> CREATE INDEX cymbal_products_embeddings_scann ON cymbal_products
  USING scann (embedding cosine)
  WITH (num_leaves=31, max_num_levels = 2);
CREATE INDEX
quickstart_db=>
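
The value num_leaves=31 lines up with a commonly cited starting point for partition-based indexes: roughly the square root of the number of rows. That rule of thumb is an assumption in this sketch rather than something this codelab states, so check the index tuning documentation before relying on it. With the 941 products loaded earlier, the arithmetic works out as follows.

```python
import math


def suggested_num_leaves(row_count: int) -> int:
    """Square-root heuristic for a starting num_leaves value (an assumed rule of thumb)."""
    return max(1, round(math.sqrt(row_count)))


# 941 is the row count reported by COPY for cymbal_products.
print(suggested_num_leaves(941))
```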

Compare Response

Now you can run the vector search query in EXPLAIN mode and verify whether the index has been used:

EXPLAIN (analyze) 
WITH trees as (
SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        cp.uniq_id as product_id
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        (cp.embedding <=> embedding('text-embedding-005','What kind of fruit trees grow well here?')::vector) ASC
LIMIT 1)
SELECT json_agg(trees) FROM trees;

Expected output:

Aggregate (cost=16.59..16.60 rows=1 width=32) (actual time=2.875..2.877 rows=1 loops=1)
  -> Subquery Scan on trees (cost=8.42..16.59 rows=1 width=142) (actual time=2.860..2.862 rows=1 loops=1)
        -> Limit (cost=8.42..16.58 rows=1 width=158) (actual time=2.855..2.856 rows=1 loops=1)
              -> Nested Loop (cost=8.42..6489.19 rows=794 width=158) (actual time=2.854..2.855 rows=1 loops=1)
                    -> Nested Loop (cost=8.13..6466.99 rows=794 width=938) (actual time=2.742..2.743 rows=1 loops=1)
                          -> Index Scan using cymbal_products_embeddings_scann on cymbal_products cp (cost=7.71..111.99 rows=876 width=934) (actual time=2.724..2.724 rows=1 loops=1)
                                Order By: (embedding <=> '[0.008864171,0.03693164,-0.024245683,-0.00355923,0.0055611245,0.015985578,...<redacted>...5685,-0.03914233,-0.018452475,0.00826032,-0.07372604]'::vector)
                          -> Index Scan using walmart_inventory_pkey on cymbal_inventory ci (cost=0.42..7.26 rows=1 width=37) (actual time=0.015..0.015 rows=1 loops=1)
                                Index Cond: ((store_id = 1583) AND (uniq_id = (cp.uniq_id)::text))

From the output, you can clearly see that the query used "Index Scan using cymbal_products_embeddings_scann on cymbal_products".

If you run the query without EXPLAIN:

WITH trees as (
SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        cp.uniq_id as product_id
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        (cp.embedding <=> embedding('text-embedding-005','What kind of fruit trees grow well here?')::vector) ASC
LIMIT 1)
SELECT json_agg(trees) FROM trees;

Expected output:

[{"product_name":"Meyer Lemon Tree","description":"Meyer Lemon trees are California's favorite lemon tree! Grow your own lemons by ","sale_price":34,"zip_code":93230,"product_id":"02056727942aeb714dc9a2313654e1b0"}]

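The result is slightly different: the query returns the Meyer Lemon Tree, which was the second choice in the search without the index, rather than the Cherry Tree that ranked first. ScaNN is an approximate nearest neighbor index, so it trades a small amount of accuracy for speed. Here the index delivers the performance gain while remaining accurate enough to give good results.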
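
One way to put a number on "accurate enough" is recall at k: the share of the exact top-k results that the indexed search also returns. The Python sketch below computes it. The exact list follows the order of the similarity search output earlier in this codelab, while the approximate list beyond its first entry is hypothetical.

```python
def recall_at_k(exact, approximate, k):
    """Fraction of the exact top-k items that also appear in the approximate top-k."""
    exact_top = set(exact[:k])
    approx_top = set(approximate[:k])
    return len(exact_top & approx_top) / k


# Exact order from the search without the index.
exact = ["Cherry Tree", "Meyer Lemon Tree", "Toyon"]
# First entry matches the indexed result above; the rest is a hypothetical ordering.
approximate = ["Meyer Lemon Tree", "Cherry Tree", "Toyon"]

print(recall_at_k(exact, approximate, 1))
print(recall_at_k(exact, approximate, 3))
```

With LIMIT 1 a single swapped position already drops recall to zero, while over the top three the same results are all present, which is why recall is usually evaluated at a k larger than one.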

On the documentation page you can try the different indexes available for vectors and find more labs and examples with langchain integration.

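11. Clean up environment

Destroy the AlloyDB instances and cluster when you are done with the lab.

Delete AlloyDB cluster and all instances

The cluster is destroyed with the --force option, which also deletes all the instances belonging to the cluster.

In Cloud Shell, define the project and environment variables if you have been disconnected and all the previous settings are lost: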

gcloud config set project <your project id>

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
export PROJECT_ID=$(gcloud config get-value project)

Delete the cluster:

gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force
All of the cluster data will be lost when the cluster is deleted.

Do you want to continue (Y/n)?  Y

Operation ID: operation-1697820178429-6082890a0b570-4a72f7e4-4c5df36f
Deleting cluster...done.

Delete AlloyDB Backups

Delete all AlloyDB backups for the cluster:

for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done
Operation ID: operation-1697826266108-60829fb7b5258-7f99dc0b-99f3c35f
Deleting backup...done.

Now you can destroy the VM.

Delete GCE VM

In Cloud Shell, execute:

export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet
Deleted

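12. Congratulations

Congratulations on completing the codelab.

What we've covered

How to deploy an AlloyDB cluster and primary instance
How to connect to AlloyDB from a Google Compute Engine VM
How to create a database and enable AlloyDB AI
How to load data into the database
How to use a Vertex AI embedding model in AlloyDB
How to enrich the results using a Vertex AI generative model
How to improve performance using a vector index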
