Google uses AI technology to translate content into your preferred language. AI translations can contain errors.

AlloyDB でハイブリッド検索を使ってみる

1. はじめに

この Codelab では、（ランキング更新メソッド）RUM 拡張機能とスケーラブルな近似最近傍（ScaNN）インデックスを使用して AlloyDB でハイブリッド検索を実行する方法を学びます。このラボは、AlloyDB AI 機能専用のラボコレクションの一部です。詳細については、ドキュメントの AlloyDB AI ページをご覧ください。

前提条件

Google Cloud コンソールの基本的な知識
コマンドラインインターフェースと Google Shell の基本的なスキル

学習内容

AlloyDB クラスタとプライマリインスタンスをデプロイする方法
Google Compute Engine VM から AlloyDB に接続する方法
データベースを作成して AlloyDB AI を有効にする方法
データベースにデータを読み込む方法
AlloyDB Studio の使用方法
Vertex AI でエンベディングを生成する
ScaNN ベクトルインデックスを作成してベクトル検索を高速化する方法
Elasticsearch の外部データラッパー（FDW）を作成する方法
AlloyDB のセマンティック検索と Elastic の全文検索を組み合わせてハイブリッド検索を行います。

必要なもの

Google Cloud アカウントと Google Cloud プロジェクト
ウェブブラウザ（Chrome など）

2. 設定と要件

プロジェクトのセットアップ

Google Cloud コンソールにログインします。Gmail アカウントも Google Workspace アカウントもまだお持ちでない場合は、アカウントを作成してください。

仕事用または学校用のアカウントではなく、個人用のアカウントを使用します。

Google Cloud プロジェクトの作成

Google Cloud コンソールのプロジェクトセレクタページで、Google Cloud プロジェクトを選択または作成します。
Cloud プロジェクトに対して課金が有効になっていることを確認します。プロジェクトで課金が有効になっているかどうかを確認する方法をご覧ください。

課金を有効にする

課金を有効にするには、次の 2 つの方法があります。個人用の請求先アカウントを使用するか、次の手順でクレジットを利用できます。

個人用の請求先アカウントを設定する

Google Cloud クレジットを使用して課金を設定した場合は、この手順をスキップできます。

個人用の請求先アカウントを設定するには、Cloud コンソールでこちらに移動して課金を有効にします。

注意事項:

このラボを完了するのにかかる Cloud リソースの費用は 3 米ドル未満です。
このラボの最後の手順に沿ってリソースを削除すると、それ以上の料金は発生しません。
新規ユーザーは、300 米ドル分の無料トライアルをご利用いただけます。

Cloud Shell の起動

Google Cloud はノートパソコンからリモートで操作できますが、この Codelab では、Google Cloud Shell（Cloud 上で動作するコマンドライン環境）を使用します。

Cloud Shell は、必要なツールがプリロードされた Google Cloud で動作するコマンドライン環境です。

Google Cloud コンソールの上部にある [Cloud Shell をアクティブにする] をクリックします。
Cloud Shell に接続したら、認証を確認します。
```
gcloud auth list
```
プロジェクトが構成されていることを確認します。
```
gcloud config get project
```
プロジェクトが想定どおりに設定されていない場合は、設定します。
```
export PROJECT_ID=<YOUR_PROJECT_ID>
gcloud config set project $PROJECT_ID
```

この仮想マシンには、必要な開発ツールがすべて用意されています。永続的なホームディレクトリが 5 GB 用意されており、Google Cloud で稼働します。そのため、ネットワークのパフォーマンスと認証機能が大幅に向上しています。この Codelab での作業はすべて、ブラウザ内から実行できます。インストールは不要です。

3. はじめに

API を有効にする

出力:

AlloyDB、Compute Engine、ネットワーキングサービス、Vertex AI を使用するには、Google Cloud プロジェクトでそれぞれの API を有効にする必要があります。

API の有効化

Cloud Shell のターミナルで、プロジェクト ID が設定されていることを確認します。

gcloud config set project [YOUR-PROJECT-ID]

環境変数 PROJECT_ID を設定します。

PROJECT_ID=$(gcloud config get-value project)

必要な API をすべて有効にします。

gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       aiplatform.googleapis.com \
                       secretmanager.googleapis.com

想定される出力

student@cloudshell:~ (test-project-001-402417)$ gcloud config set project test-project-001-402417
Updated property [core/project].
student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-14650]
student@cloudshell:~ (test-project-001-402417)$ 
student@cloudshell:~ (test-project-001-402417)$ gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       aiplatform.googleapis.com \
                       secretmanager.googleapis.com
Operation "operations/acat.p2-4470404856-1f44ebd8-894e-4356-bea7-b84165a57442" finished successfully.

API の概要

AlloyDB API（alloydb.googleapis.com）を使用すると、AlloyDB for PostgreSQL クラスタの作成、管理、スケーリングを行うことができます。要求の厳しいエンタープライズトランザクションワークロードと分析ワークロード用に設計された、PostgreSQL 互換のフルマネージドデータベースサービスを提供します。
Compute Engine API（compute.googleapis.com）を使用すると、仮想マシン（VM）、永続ディスク、ネットワーク設定を作成して管理できます。これは、ワークロードの実行と、多くのマネージドサービスの基盤となるインフラストラクチャのホストに必要な、Infrastructure-as-a-Service（IaaS）の基盤となるものです。
Cloud Resource Manager API（cloudresourcemanager.googleapis.com）を使用すると、Google Cloud プロジェクトのメタデータと構成をプログラムで管理できます。これにより、リソースの整理、Identity and Access Management（IAM）ポリシーの処理、プロジェクト階層全体での権限の検証が可能になります。
Service Networking API（servicenetworking.googleapis.com）を使用すると、Virtual Private Cloud（VPC）ネットワークと Google のマネージドサービス間のプライベート接続の設定を自動化できます。AlloyDB などのサービスが他のリソースと安全に通信できるように、プライベート IP アクセスを確立するために必要です。
Vertex AI API（aiplatform.googleapis.com）を使用すると、アプリケーションで ML モデルを構築、デプロイ、スケーリングできます。これは、生成 AI モデル（Gemini など）へのアクセスやカスタムモデルのトレーニングなど、Google Cloud のすべての AI サービスに統合インターフェースを提供します。
Secret Manager API（secretmanager.googleapis.com）は、API キー、ユーザー名、パスワード、証明書などの機密データを保存して管理できるシークレットと認証情報の管理サービスです。

必要に応じて、Vertex AI エンベディングモデルを使用するようにデフォルトのリージョンを構成できます。Vertex AI で使用可能なロケーションの詳細を確認する。この例では、us-central1 リージョンを使用しています。

gcloud config set compute/region us-central1

4. AlloyDB をデプロイする

AlloyDB クラスタを作成する前に、将来の AlloyDB インスタンスで使用する VPC で使用可能なプライベート IP 範囲が必要です。ない場合は、作成して内部の Google サービスで使用されるように割り当てる必要があります。その後、クラスタとインスタンスを作成できます。

プライベート IP 範囲を作成する

AlloyDB の VPC でプライベートサービスアクセス構成を構成する必要があります。ここでは、プロジェクトに「デフォルト」の VPC ネットワークがあり、すべてのアクションで使用されることを前提としています。

プライベート IP 範囲を作成します。

gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default

割り振られた IP 範囲を使用してプライベート接続を作成します。

gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default \
    --export-custom-routes

想定されるコンソール出力:

student@cloudshell:~ (test-project-402417)$ gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default
Created [https://www.googleapis.com/compute/v1/projects/test-project-402417/global/addresses/psa-range].

student@cloudshell:~ (test-project-402417)$ gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default
Operation "operations/pssn.p24-4470404856-595e209f-19b7-4669-8a71-cbd45de8ba66" finished successfully.

student@cloudshell:~ (test-project-402417)$

AlloyDB クラスタを作成する

このセクションでは、us-central1 リージョンに AlloyDB クラスタを作成します。

postgres ユーザーのパスワードを定義します。独自のパスワードを定義することも、ランダム関数を使用してパスワードを生成することもできます。

export PGPASSWORD=`openssl rand -hex 12`

想定されるコンソール出力:

student@cloudshell:~ (test-project-402417)$ export PGPASSWORD=`openssl rand -hex 12`

後で使用できるように PostgreSQL のパスワードをメモしておきます。

echo $PGPASSWORD

このパスワードは、後で postgres ユーザーとしてインスタンスに接続するために必要になります。安全な場所（パスワードマネージャーなど）にコピーすることをおすすめします。

想定されるコンソール出力:

student@cloudshell:~ (test-project-402417)$ echo $PGPASSWORD
<generated password>

AlloyDB クラスタを作成する

リージョンと AlloyDB クラスタ名を定義します。ここでは、us-central1 リージョンと alloydb-hybrid-search をクラスタ名として使用します。

export REGION=us-central1
export ADBCLUSTER=alloydb-hybrid-search

コマンドを実行してクラスタを作成します。

gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION

想定されるコンソール出力:

export REGION=us-central1
export ADBCLUSTER=alloydb-hybrid-search
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.

同じ Cloud Shell セッションで、クラスタの AlloyDB プライマリインスタンスを作成します。切断された場合は、リージョンとクラスタ名の環境変数を再度定義する必要があります。

gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --cluster=$ADBCLUSTER

想定されるコンソール出力:

student@cloudshell:~ (alloydb-hybrid-search)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.

5. AlloyDB に接続する

AlloyDB はプライベート接続のみを使用してデプロイされるため、データベースを操作するには PostgreSQL クライアントがインストールされた VM が必要です。この VM を使用して Elasticsearch インスタンスも実行します。

GCE VM をデプロイする

AlloyDB クラスタと同じリージョンと VPC に GCE VM を作成し、ブートディスクが Elastic を実行するのに十分な大きさであることを確認します。ここでは、--create-disk フラグで 20 GB のブートディスクを指定します。

Cloud Shell で、次のコマンドを実行します。

export ZONE=us-central1-a
gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --create-disk=auto-delete=yes,boot=yes,size=20,image=projects/debian-cloud/global/images/$(gcloud compute images list --filter="family=debian-12 AND family!=debian-12-arm64" --format="value(name)") \
    --scopes=https://www.googleapis.com/auth/cloud-platform

想定されるコンソール出力:

student@cloudshell:~ (alloydb-hybrid-search)$ export ZONE=us-central1-a
student@cloudshell:~ (talloydb-hybrid-search)$ export ZONE=us-central1-a
gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --create-disk=auto-delete=yes,boot=yes,image=projects/debian-cloud/global/images/$(gcloud compute images list --filter="family=debian-12 AND family!=debian-12-arm64" --format="value(name)") \
    --scopes=https://www.googleapis.com/auth/cloud-platform

Created [https://www.googleapis.com/compute/v1/projects/test-project-402417/zones/us-central1-a/instances/instance-1].
NAME: instance-1
ZONE: us-central1-a
MACHINE_TYPE: n1-standard-1
PREEMPTIBLE:
INTERNAL_IP: 10.128.0.2
EXTERNAL_IP: 34.71.192.233
STATUS: RUNNING

Postgres クライアントをインストールする

デプロイされた VM に PostgreSQL クライアントソフトウェアをインストールします。

VM に接続します。

gcloud compute ssh instance-1 --zone=us-central1-a

想定されるコンソール出力:

student@cloudshell:~ (alloydb-hybrid-search)$ gcloud compute ssh instance-1 --zone=us-central1-a
Updating project ssh metadata...working..Updated [https://www.googleapis.com/compute/v1/projects/alloydb-hybrid-search].                                                                                                                                                         
Updating project ssh metadata...done.                                                                                                                                                                                                                                              
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.5110295539541121102' (ECDSA) to the list of known hosts.
Linux instance-1.us-central1-a.c.gleb-test-short-001-418811.internal 6.1.0-18-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
student@instance-1:~$

VM 内でソフトウェア実行コマンドをインストールします。

sudo apt-get update
sudo apt-get install --yes postgresql-client

想定されるコンソール出力:

student@instance-1:~$ sudo apt-get update
sudo apt-get install --yes postgresql-client
Get:1 https://packages.cloud.google.com/apt google-compute-engine-bullseye-stable InRelease [5146 B]
Get:2 https://packages.cloud.google.com/apt cloud-sdk-bullseye InRelease [6406 B]   
Hit:3 https://deb.debian.org/debian bullseye InRelease  
Get:4 https://deb.debian.org/debian-security bullseye-security InRelease [48.4 kB]
Get:5 https://packages.cloud.google.com/apt google-compute-engine-bullseye-stable/main amd64 Packages [1930 B]
Get:6 https://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:7 https://deb.debian.org/debian bullseye-backports InRelease [49.0 kB]
...redacted...
update-alternatives: using /usr/share/postgresql/13/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-client (13+225) ...
Processing triggers for man-db (2.9.4-2) ...
Processing triggers for libc-bin (2.31-13+deb11u7) ...

インスタンスに接続する

psql を使用して VM からプライマリインスタンスに接続します。

インスタンス 1 VM への SSH セッションが開いている Cloud Shell の同じタブで、次の操作を行います。

メモした AlloyDB パスワード（PGPASSWORD）の値と AlloyDB クラスタ ID を使用して、GCE VM から AlloyDB に接続します。

export PGPASSWORD=<Noted password>

export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export ADBCLUSTER=alloydb-hybrid-search
export INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
psql "host=$INSTANCE_IP user=postgres sslmode=require"

想定されるコンソール出力:

student@instance-1:~$ export PGPASSWORD=<noted password>
student@instance-1:~$ ADBCLUSTER=alloydb-aip-01
student@instance-1:~$ REGION=us-central1
student@instance-1:~$ INSTANCE_IP=$(gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="value(ipAddress)")
gleb@instance-1:~$ psql "host=$INSTANCE_IP user=postgres sslmode=require"
psql (15.6 (Debian 15.6-0+deb12u1), server 15.5)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=>

psql セッションを閉じます。

exit

6. データベースを準備する

データベースを作成し、Vertex AI インテグレーションを有効にして、データベースオブジェクトを作成し、データをインポートする必要があります。

AlloyDB に必要な権限を付与する

AlloyDB サービスエージェントに Vertex AI 権限を追加します。

上部の「+」記号を選択して、別の Cloud Shell タブを開きます。

新しい Cloud Shell タブで、次のコマンドを実行します。

PROJECT_ID=$(gcloud config get-value project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

想定されるコンソール出力:

student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-11039]
student@cloudshell:~ (test-project-001-402417)$ gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
Updated IAM policy for project [test-project-001-402417].
bindings:
- members:
  - serviceAccount:service-4470404856@gcp-sa-alloydb.iam.gserviceaccount.com
  role: roles/aiplatform.user
- members:
...
etag: BwYIEbe_Z3U=
version: 1

[X] をクリックするか、次のコマンドを実行してタブを閉じます。

exit

データベースを作成する

quickstart という名前のデータベースを作成します。

GCE VM セッションで、次のコマンドを実行します。

データベースを作成します。

psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE quickstart_db"

想定されるコンソール出力:

student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE quickstart_db"
CREATE DATABASE
student@instance-1:~$

Vertex AI の統合を有効にする

Vertex AI の統合とデータベースの pgvector 拡張機能を有効にします。

GCE VM で次のコマンドを実行します。

psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE"
psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS vector"

想定されるコンソール出力:

student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE"
psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS vector"
CREATE EXTENSION
CREATE EXTENSION
student@instance-1:~$

データをインポート

準備したデータをダウンロードして、新しいデータベースにインポートします。

GCE VM で次のコマンドを実行します。

gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_demo_schema.sql |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"
gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_products from stdin csv header"
gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_inventory.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_inventory from stdin csv header"
gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_stores.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_stores from stdin csv header"

想定されるコンソール出力:

student@instance-1:~$ gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_demo_schema.sql |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)
SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
ALTER TABLE
ALTER SEQUENCE
ALTER TABLE
ALTER TABLE
ALTER TABLE
student@instance-1:~$ gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_products from stdin csv header"
COPY 941
student@instance-1:~$ gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_inventory.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_inventory from stdin csv header"
COPY 263861
student@instance-1:~$ gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_stores.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_stores from stdin csv header"
COPY 4654
student@instance-1:~$

次に、必要なデータベースフラグを設定します。ウェブコンソールを使用してプライマリインスタンスでフラグを管理するか、次のように gcloud コマンドを使用できます。

export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export ADBCLUSTER=alloydb-hybrid-search
gcloud beta alloydb instances update $ADBCLUSTER-pr \
   --database-flags google_ml_integration.enable_faster_embedding_generation=on,scann.enable_preview_features=on,google_ml_integration.enable_preview_ai_functions=on,google_ml_integration.enable_ai_query_engine=on \
   --region=$REGION \
   --cluster=$ADBCLUSTER \
   --project=$PROJECT_ID \
   --update-mode=FORCE_APPLY

想定されるコンソール出力

export PROJECT_ID=$(gcloud config get-value project)
export REGION=us-central1
export ADBCLUSTER=alloydb-hybrid-search
gcloud beta alloydb instances update $ADBCLUSTER-pr \
   --database-flags google_ml_integration.enable_faster_embedding_generation=on,scann.enable_preview_features=on,google_ml_integration.enable_preview_ai_functions=on,google_ml_integration.enable_ai_query_engine=on \
   --region=$REGION \
   --cluster=$ADBCLUSTER \
   --project=$PROJECT_ID \
   --update-mode=FORCE_APPLY
Your active configuration is: [cloudshell-724]
Operation ID: operation-1775159889986-64e7f9ea9858a-b031e866-4c7c36df

データベースフラグを有効にするには、インスタンスの再起動が必要で、数分かかります。完了すると、AlloyDB インスタンスのステータスが [Ready] になります。

7. ベクトルエンベディングを生成する

データをインポートすると、商品に関する情報を格納する cymbal_products、各店舗の商品の在庫を追跡する cymbal_inventory、店舗のリストである cymbal_stores の各テーブルが作成されます。商品に対してセマンティック検索を実行するには、initialize_embeddings 関数を使用して商品説明のベクトルエンベディングを生成する必要があります。Vertex AI のインテグレーションを使用して、商品説明に基づいてベクトルデータを計算し、テーブルに追加します。使用されているテクノロジーの詳細については、ドキュメントをご覧ください。

統合を使用するには、AlloyDB Studio を使用してデータベースに接続するか、AlloyDB インスタンスの IP と postgres パスワードを使用して VM から psql を使用して接続します。

psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"

google_ml_integration 拡張機能のバージョンを確認します。

SELECT extversion FROM pg_extension WHERE extname = 'google_ml_integration';

バージョンは 1.5.2 以降である必要があります。出力例を次に示します。

quickstart_db=> SELECT extversion FROM pg_extension WHERE extname = 'google_ml_integration';
 extversion
------------
 1.5.2
(1 row)

デフォルトのバージョンは 1.5.2 以降ですが、インスタンスに古いバージョンが表示されている場合は、更新が必要になる可能性があります。インスタンスのメンテナンスが無効になっているかどうかを確認します。

ベクトル拡張機能をインストールし、エンベディングを cymbal_products に保存する新しい列を作成する

CREATE EXTENSION IF NOT EXISTS vector;
ALTER TABLE cymbal_products ADD COLUMN product_embedding vector(768);

想定されるコンソール出力:

quickstart_db=> ALTER TABLE cymbal_products ADD COLUMN product_embedding vector(768);
ALTER TABLE
quickstart_db=>

バッチエンベディング生成を使用して効率を向上させます。さまざまなエンベディング生成オプションと手法については、ガイドをご覧ください。以前に goole_ml_integration.enable_faster_embedding_generation フラグを有効にしました。これにより、エンベディング生成をバッチ処理できます。

最後に、関数呼び出しに incremental_refresh_mode 引数を含めることで、列の値が変更されたときにエンベディングが更新されるようにします。これによりデータベースにオーバーヘッドが発生しますが、エンベディングとコンテンツの同期を自動的に維持するためのトレードオフです。エンベディングを手動で更新する場合は、ドキュメントの手順をご覧ください。

すべてをまとめてエンベディングを生成するには、initialize_embeddings 関数を使用し、バッチヒントとして batch_size の 50 を渡し、incremental_refresh_mode を transactional に設定します。

CALL ai.initialize_embeddings(
    model_id => 'text-embedding-005',
    table_name => 'cymbal_products',
    content_column => 'product_description',
    embedding_column => 'product_embedding',
    batch_size => 50,
    incremental_refresh_mode => 'transactional'
);

product_embedding 列に NULL 値を指定してテーブルに新しい行を挿入すると、

INSERT INTO "cymbal_products" ("uniq_id", "crawl_timestamp", "product_url", "product_name", "product_description", "list_price", "sale_price", "brand", "item_number", "gtin", "package_size", "category", "postal_code", "available", "product_embedding") VALUES ('fd604542e04b470f9e6348e640cff794', NOW(), 'https://example.com/new_product', 'New Cymbal Product', 'This is a new cymbal product description.', 199.99, 149.99, 'Example Brand', 'EB123', '1234567890', 'Single', 'Cymbals', '12345', TRUE, NULL);

挿入した行をクエリすると、product_embedding 列が自動的に更新されていることがわかります。

SELECT uniq_id, (product_embedding::real[])[1:5] as product_embedding  FROM cymbal_products WHERE uniq_id='fd604542e04b470f9e6348e640cff794';

出力は次のようになります。

quickstart_db=> SELECT uniq_id,(product_embedding::real[])[1:5] as product_embedding  FROM cymbal_products WHERE uniq_id='fd604542e04b470f9e6348e640cff794';
             uniq_id              |                      product_embedding                       
----------------------------------+---------------------------------------------------------------
 fd604542e04b470f9e6348e640cff794 | {0.015003494,-0.005349732,-0.059790313,-0.0087091,-0.0271452}
(1 row)

Time: 3.295 ms

8. ベクトルインデックスを作成する

ベクトル検索のパフォーマンスを改善するために、ScaNN インデックスを追加します。

ScaNN インデックスを作成する

SCANN インデックスを構築するには、もう 1 つの拡張機能を有効にする必要があります。拡張機能 alloydb_scann は、Google の ScaNN アルゴリズムを使用して ANN タイプのベクトルインデックスを操作するためのインターフェースを提供します。

CREATE EXTENSION IF NOT EXISTS alloydb_scann;

予想される出力:

quickstart_db=> CREATE EXTENSION IF NOT EXISTS alloydb_scann;
CREATE EXTENSION
Time: 27.468 ms
quickstart_db=>

インデックスは MANUAL モードまたは AUTO モードで作成できます。MANUAL モードはデフォルトで有効になっており、他のインデックスと同様にインデックスを作成して維持できます。ただし、AUTO モードを有効にすると、ユーザー側でメンテナンスを必要としないインデックスを作成できます。すべてのオプションの詳細については、ドキュメントをご覧ください。この例では、自動モードでインデックスを作成するのに十分な行がないため、手動で作成し、チューニングパラメータを含めます。インデックスパラメータのチューニングについては、ドキュメントをご覧ください。

CREATE INDEX cymbal_products_embeddings_scann ON cymbal_products
  USING scann (product_embedding cosine)
  WITH (mode='MANUAL', num_leaves=31, max_num_levels = 2);

予想される出力:

quickstart_db=> CREATE INDEX cymbal_products_embeddings_scann ON cymbal_products
  USING scann (product_embedding cosine)
  WITH (num_leaves=31, max_num_levels = 2);
CREATE INDEX
quickstart_db=>

インデックスの使用状況を検査する

これで、EXPLAIN モードでベクトル検索クエリを実行し、インデックスが使用されているかどうかを確認できます。

EXPLAIN (analyze)
WITH trees as (
SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        cp.uniq_id as product_id
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        (cp.product_embedding <=> embedding('text-embedding-005','What kind of fruit trees grow well here?')::vector) ASC
LIMIT 1)
SELECT json_agg(trees) FROM trees;

想定される出力（明確にするために記すと秘匿化済み）:

...
Aggregate (cost=16.59..16.60 rows=1 width=32) (actual time=2.875..2.877 rows=1 loops=1)
-> Subquery Scan on trees (cost=8.42..16.59 rows=1 width=142) (actual time=2.860..2.862 rows=1 loops=1)
-> Limit (cost=8.42..16.58 rows=1 width=158) (actual time=2.855..2.856 rows=1 loops=1)
-> Nested Loop (cost=8.42..6489.19 rows=794 width=158) (actual time=2.854..2.855 rows=1 loops=1)
-> Nested Loop (cost=8.13..6466.99 rows=794 width=938) (actual time=2.742..2.743 rows=1 loops=1)
-> Index Scan using cymbal_products_embeddings_scann on cymbal_products cp (cost=7.71..111.99 rows=876 width=934) (actual time=2.724..2.724 rows=1 loops=1)
Order By: (embedding <=> '[0.008864171,0.03693164,-0.024245683,-0.00355923,0.0055611245,0.015985578,...<redacted>...5685,-0.03914233,-0.018452475,0.00826032,-0.07372604]'::vector)
...

出力から、クエリが「cymbal_products の cymbal_products_embeddings_scann を使用したインデックススキャン」を使用していることがわかります。

9. エラスティックインスタンスの作成

Elasticsearch は Elastic NV の商標であり、Google LLC との提携関係はありません。ハイブリッド検索の全文検索（FTS）部分には Elasticsearch を使用します。Elasticsearch インスタンスがある場合は、次の手順をスキップして、AlloyDB が Elasticsearch クラスタへのアクセスに使用できる読み取り専用の個人用/ユーザー API キーを作成できます。それ以外の場合は、前に作成した VM で Elasticsearch インスタンスをスピンアップできます。

VM に SSH 接続して Docker をインストールする

sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/debian/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/debian \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

これで、ユーザーが実行する docker コマンドを変更できます。

sudo usermod -aG docker $USER
newgrp docker

VM ターミナルで次のコマンドを実行して、エラスティックコンテナを作成します。

curl -fsSL https://elastic.co/start-local | sh

想定される出力（秘匿化済み）

🎉 Congrats, Elasticsearch and Kibana are installed and running in Docker!

🌐 Open your browser at http://localhost:5601

   Username: elastic
   Password: [password_value]

🔌 Elasticsearch API endpoint: http://localhost:9200
🔑 API key: [API Key]

Learn more at https://github.com/elastic/start-local

出力をコピーして、次のファイルに貼り付けます。

nano elastic-last-run.txt

Ctrl+O、Enter、Ctrl+X の順に押して保存して終了します。

デフォルトでは、Docker コンテナは http://localhost:9200 をリッスンするため、AlloyDB からの外部リクエストの受信に問題が発生する可能性があります。9200:9200 をリッスンするように docker-compose.yml を構成する必要があります。

スクリプトを実行すると、新しい elastic-start-local ディレクトリが作成されます。このディレクトリに移動して、Dockerfile を編集します。

cd elastic-start-local/

nano docker-compose.yml

上部には、

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:${ES_LOCAL_VERSION}
    container_name: ${ES_LOCAL_CONTAINER_NAME}
    volumes:
      - dev-elasticsearch:/usr/share/elasticsearch/data
    ports:
      - localhost:9200

ports フィールドを次のように変更します。

    ports:
      - 9200:9200

Ctrl + O、Enter、Ctrl + X の順にキーを押して、保存して終了します。スタックを再起動します。

docker compose up -d

次に、Elastic インスタンスに商品の説明と名前を入力します。Cloud Storage から VM に商品 CSV をコピーします。

gcloud storage cp gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv .

想定される出力

gcloud storage cp gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv .
Copying gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv to file://./cymbal_products.csv
  Completed files 1/1 | 1.4MiB/1.4MiB                                                                       

Average throughput: 147.9MiB/s

次に、CSV を抽出し、一括アップロード用にデータを NDJSON にフォーマットする Python スクリプトを作成します。

nano convert.py

ファイルに次の内容を貼り付けます。

import csv
import json

# Configuration
input_file = 'cymbal_products.csv'
output_file = 'products.json'
index_name = 'elasticindexdemo'

def convert():
    try:
        with open(input_file, mode='r', encoding='utf-8') as f_in, \
             open(output_file, mode='w', encoding='utf-8') as f_out:
            
            reader = csv.DictReader(f_in)
            
            count = 0
            for row in reader:
                metadata = {
                    "index": {
                        "_index": index_name,
                        "_id": row['uniq_id'].strip()
                    }
                }
                
                # 2. Data/Source line
                document = {
                    "uniq_id": row['uniq_id'].strip(),
                    "product_name": row['product_name'].strip(),
                    "product_description": row['product_description'].strip()
                }
                
                # Write to file
                f_out.write(json.dumps(metadata) + '\n')
                f_out.write(json.dumps(document) + '\n')
                count += 1
                
            print(f"Success: Processed {count} products.")
            print(f"Output saved to: {output_file}")

    except Exception as e:
        print(f"An error occurred: {e}")

if __name__ == "__main__":
    convert()

ファイルを保存して実行する

python3 convert.py

想定される出力

~$ python3 convert.py
Success: Processed 941 products.
Output saved to: products.json

一括データ読み込みでは、前回の実行ファイルから API キーを取得し、それを使用してインスタンスへの認証を行う必要があります。次に、以前に作成した JSON ファイルが渡されます。本番環境のシナリオでは、Elasticsearch でインデックスとデータ型を定義することをおすすめします。この Codelab のデータセットは小さいため、Elasticsearch にデータを処理させます。まず、ファイルから API キーを取得して変数に保存します。

AUTH_HDR=$(grep "API key" elastic-last-run.txt | sed -e "s/^.*API key:[[:space:]]*/ApiKey /g")

保存した変数を使用して、次の curl コマンドを実行し、JSON からデータをアップロードします。

curl -s -X POST "localhost:9200/elasticindexdemo/_bulk?pretty" \
  -H "Content-Type: application/x-ndjson" \
  -H "Authorization: $AUTH_HDR" \
  --data-binary "@products.json"

想定される出力

~$ curl -s -X POST "localhost:9200/elasticindexdemo/_bulk?pretty" \
  -H "Content-Type: application/x-ndjson" \
  -H "Authorization: $AUTH_HDR" \
  --data-binary "@products.json"
{
  "errors" : false,
  "took" : 1003,
  "items" : [
    {
      "index" : {
        "_index" : "elasticindexdemo",
        "_id" : "a73d5f754f225ecb9fdc64232a57bc37",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
...]
}
Done

最後に、AlloyDB で使用できるように保存するために、Elasticsearch インスタンスの API キーを取得する必要があります。次のコマンドを実行して、API キーの値をコピーします。

cat elastic-last-run.txt

API キーを使用して、シークレットマネージャーにシークレットを作成する必要があります。Cloud Shell で、API キーを使用して次のコマンドを実行します。

echo -n "[YOUR_API_KEY_VALUE]" | \
gcloud secrets create elasticsearch \
    --replication-policy="automatic" \
    --data-file=-

10. AlloyDB での外部データラッパーの作成

Duration 20:00

AlloyDB から Elasticsearch に保存されているデータをクエリするには、Elastic 用の外部データラッパー（FDW）と外部テーブルを作成する必要があります。以前は、Elastic API キーを Secret Manager に保存していました。AlloyDB がシークレットにアクセスできるように、サービスアカウントに必要な権限を付与します。

Cloudshell で、サービスアカウントに elasticsearch シークレットへのアクセス権を付与します。

gcloud secrets add-iam-policy-binding elasticsearch \
    --member="serviceAccount:service-$(gcloud projects describe $(gcloud config get-value project) --format='value(projectNumber)')@gcp-sa-alloydb.iam.gserviceaccount.com" \
    --role="roles/secretmanager.secretAccessor"

想定される出力

gcloud secrets add-iam-policy-binding elasticsearch \
    --member="serviceAccount:service-$(gcloud projects describe $(gcloud config get-value project) --format='value(projectNumber)')@gcp-sa-alloydb.iam.gserviceaccount.com" \
    --role="roles/secretmanager.secretAccessor"
Updated IAM policy for secret [elasticsearch].
bindings:
- members:
  - serviceAccount:service-257907437930@gcp-sa-alloydb.iam.gserviceaccount.com
  role: roles/secretmanager.secretAccessor
etag: BwZOghJiP5U=
version: 1

AlloyDB クラスタに移動して AlloyDB Studio を開きます（VM から psql を使用して接続することもできます）。postgres ユーザーとして quickstart_db にログインします。

FDW 拡張機能を有効にする

CREATE EXTENSION external_search_fdw;

想定される出力

Statement executed successfully

Elasticsearch にアクセスするには、外部データサーバーを作成します。VM の内部 IP アドレスは、[Compute Engine] > [VM インスタンス] で確認できます。シークレットパスについては、Secret Manager に移動してシークレットを選択します。パスは上部に表示されます。/versions/latest を含めて、シークレットの最新バージョンを取得してください。

CREATE SERVER elastic_demo_server
FOREIGN DATA WRAPPER external_search_fdw
OPTIONS(
    server 'http://[VM INTERNAL IP ADDRESS]:9200',
    search_provider 'elastic',
    auth_method 'ApiKey',
    secret_path '[SECRET PATH]/versions/latest'
);

次に、外部テーブルを定義します。メタデータの後に、以前に読み込んだデータと一致する Elasticsearch フィールドスキーマ定義を指定します。リモートテーブルで、Elastic インデックスの名前を指定します。

CREATE FOREIGN TABLE elasticindexdemo (
    metadata external_search_fdw_schema.OpaqueMetadata,
    uniq_id TEXT,
    product_name TEXT,
    product_description TEXT
)
SERVER elastic_demo_server
OPTIONS(
    remote_table_name 'elasticindexdemo'
);

サーバーのユーザーマッピングを作成する

CREATE USER MAPPING FOR CURRENT_USER SERVER elastic_demo_server;

これで、外部テーブルをテストできるようになりました。

SELECT uniq_id, product_name
FROM elasticindexdemo
ORDER BY metadata <@> 'product_description:lamp' DESC
limit 10;

想定される出力

"uniq_id","product_name"
"59c05332f09426c23d8d005528e3c12e","CVL Mini Vintage Metal Lamp Shade: Black Metal with Punched Pattern"
"c24dd78c0d570105256e1bf1cb2fea9d","Better Homes & Gardens Tapered Drum Shade, White Box Pleat"
"7ba20db2bcfab28f88fc714d73af1eb8","4 Pack E27 Wireless Remote Control Light Lamp Bulb Holder Cap Socket Switch 30m"
"0fad1469ea9dfa80b35cfe5266b8bfe7","Star Projector Lamp, 360 Degree Star Night Light Romantic Room Rotating Cosmos Star Projuctor With USB Cable, Light Lamp Starry Moon Sky Night Projector Kid Bedroom Lamp"
"70b37e483ef3678078236d36954525ce","Lucille 10.5\""h Duck Egg Blue Empire Stitched Lamp Shade 7x10x8"
"b7a4b9151598f4cae7707cbedabe3c1b","10x12x8\"" SLIP UNO FITTER Hardback Shallow Drum Lamp Shade Textured Slate"
"5962cf47b88186eed76d14f6376882df","E27 To E14 Lampshade Lamp Light Shades Socket Reducing Ring Adapter Washer"
"7c54fdebfe0b1dd3f649741b8928a95b","iMounTEK LED Projector Lamp Kids Night Light Star Moon Projection Night Lamp 360\u00b0 Rotation Timer for Children Bedroom"
"4531201095c2653530747e215fcc1435","Home Concept Inc 11 Classics Brass Empire Lamp Shade"
"350527adb4299a015bcce74dee97805e","6 Colors LED Star Projector Lamp 360 Degree Romantic Rotating Night Cosmos Star Sky Moon Projector Kids Sleep Night Light For Children Gift Bedroom Decor"

11. ハイブリッド検索の使用

Duration 15:00

これで、ai.hybrid_search() 関数を使用して、ベクトル検索と全文検索を組み合わせることができます。ハイブリッド検索の詳細については、ドキュメントをご覧ください。ハイブリッド検索を使用する場合、デフォルトでは、クエリ結果は Reciprocal Rank Fusion アルゴリズムを使用して、複数のクエリの結果をランク付けします。まず、ベクトル検索とハイブリッド検索を個別に試して、その違いを分析してみましょう。

次のクエリは、ベクトル検索を実行して、チェリーに類似した商品を検索します。配列は実行する検索のリストを提供します。この例ではベクトル検索のみを使用しますが、後でベクトルと FTS の両方を提供します。

SELECT id, score, cymbal_products.product_name, cymbal_products.product_description
FROM ai.hybrid_search(
  ARRAY[
      '{
        "data_type": "vector",
        "table_name": "cymbal_products",
        "key_column": "uniq_id",
        "vec_column": "product_embedding",
        "distance_operator": "public.<=>",
        "limit": 3,
        "query_vector": "ai.embedding(''text-embedding-005'', ''cherry'')::vector"
      }'::JSONB
  ]
) JOIN cymbal_products ON id = cymbal_products.uniq_id;

出力では、最初の結果は cherry tree ですが、次の 2 つも果樹であることに注意してください。これは、product_description 列でベクトル検索を使用すると、検索条件にセマンティックマッチが見つかるためです。

"id","score","product_name","product_description"
"d536e9e823296a2eba198e52dd23e712","0.01639344262295082","Cherry Tree","This is a beautiful cherry tree that will produce delicious cherries. It is an deciduous tree that grows to be about 15 feet tall. The leaves are dark green in the summer and turn a beautiful red in the fall. Cherry trees are known for their beauty and their ability to provide shade and privacy. Cherry trees prefer a cool, moist climate and sandy soil. They are best suited for USDA zones 4-9."
"b70c44b1a38c0a2329fa583c9109a80f","0.016129032258064516","Peach Tree","This is a beautiful peach tree that will produce delicious peaches. It is an evergreen tree that grows to be about 20 feet tall. The leaves are dark green in the summer and turn a beautiful yellow in the fall. Peach trees are known for their beauty and their ability to provide shade and privacy. Peach trees prefer a cool, moist climate and sandy soil. They are best suited for USDA zones 2-9."
"23e41a71d63d8bbc9bdfa1d118cfddc5","0.015873015873015872","Apple Tree","This is a beautiful apple tree that will produce delicious apples. It is a deciduous tree that grows to be about 30 feet tall. The leaves are dark green in the summer and turn a beautiful red, orange, and yellow in the fall. Apple trees are known for their strength and durability. They are also a popular choice for shade trees. Apple trees prefer a cool, moist climate and loamy soil. They are best suited for USDA zones 4-8."

全文検索を実行するには、次のクエリを実行します。

SELECT id, score, cymbal_products.product_name, cymbal_products.product_description
FROM ai.hybrid_search(
  ARRAY[
      '{
        "limit": 3,
        "data_type": "external_search_fdw",
        "table_name": "elasticindexdemo",
        "key_column": "uniq_id",
        "query_text_input": "product_description:(cherry)"
      }'::JSONB
  ]
) JOIN cymbal_products ON id = cymbal_products.uniq_id;

全文検索では完全一致が使用されるため、結果には商品名に「cherry」という単語が含まれているものが返されます。

"id","score","product_name","product_description"
"d536e9e823296a2eba198e52dd23e712","0.01639344262295082","Cherry Tree","This is a beautiful cherry tree that will produce delicious cherries. It is an deciduous tree that grows to be about 15 feet tall. The leaves are dark green in the summer and turn a beautiful red in the fall. Cherry trees are known for their beauty and their ability to provide shade and privacy. Cherry trees prefer a cool, moist climate and sandy soil. They are best suited for USDA zones 4-9."
"390cf08feac229e7b752709fd1f943b3","0.016129032258064516","Woven Round Placemat, Set of Twelve, Grass","...These placemats are great for special occasions and holidays, but are also perfect to accessorize your everyday place settings.|Measurements. 15-inch round diameter is the perfect size for most table sizes and shapes.|Pop Colors. Choose from 7 pop woven color placemats including: Black, Cherry, Grass, Taupe, Navy, Sun and Graphite."
"2c9aa7ac98c30abf78dd9c62a68a34e6","0.015873015873015872","48 Scented Wax Melts Wax Cubes: Jelly Belly Jelly Beans Candy Bulk Soy Wax Melts For Candle Warmer, Wax Warmers, Wax Melt Warmers In 8 Pack Set","...From These Flavors: Lemon Drop, Mixed Berry Smoothie, Sizzling Cinnamon, Crushed Pineapple, Juicy Pear, Cotton Candy, Toasted Marshmallow, French Vanilla, Watermelon, Red Apple, Very Cherry, Buttered Popcorn..."

セマンティック検索と FTS を組み合わせて、より有意義な結果を得ることができます。たとえば、家よりも高く成長するカリフォルニア産の樹木を検索するとします。クエリを分割して、セマンティックインテントとリテラルマッチングを活用します。ベクトル検索は、正確なキーワードを必要とせずに高さとスケールの概念を理解するため、「家よりも高く成長する木」という説明部分を処理します。一方、全文検索では「California」が厳密なフィルタとして処理され、概念的に類似したものではなく、地理的に完全に一致するものが確実に取得されます。

SELECT id, score, cymbal_products.product_name, cymbal_products.product_description
FROM ai.hybrid_search(
  ARRAY[
    '{
        "data_type": "vector",
        "table_name": "cymbal_products",
        "key_column": "uniq_id",
        "vec_column": "product_embedding",
        "distance_operator": "public.<=>",
        "limit": 3,
        "query_vector": "ai.embedding(''text-embedding-005'', ''tree that can grow taller than a house'')::vector"
      }'::JSONB,
      '{
        "limit": 3,
        "data_type": "external_search_fdw",
        "table_name": "elasticindexdemo",
        "key_column": "uniq_id",
        "query_text_input": "product_description:(California)"
      }'::JSONB
  ]
) JOIN cymbal_products ON id = cymbal_products.uniq_id;

期待される結果:

"id","score","product_name","product_description"
"a589fd36a8a20fd9472d2403d6ed692a","0.00819672631147241","California Redwood","This is a beautiful redwood tree that can grow to be over 300 feet tall. It is an evergreen tree that grows in the coastal forests of California. Redwoods are known for their beauty and their strength. They are best suited for USDA zones 7-10."
"ef9432802da24041594c2cf368dfb4d2","0.008064521129029258","Madrone","This is a beautiful madrona tree that can grow to be over 80 feet tall. It is an evergreen tree that grows in the coastal forests of California. Madronas are known for their beauty and their bark. They are best suited for USDA zones 7-10."
"1360d8642bc218e4ea28e9c32b2e1721","0.007936512936504936","California Sycamore","This is a beautiful sycamore tree that can grow to be over 100 feet tall. It is an deciduous tree that grows in the valleys and foothills of California. California sycamores are known for their beauty and their shade. They are best suited for USDA zones 7-10."

12. 環境をクリーンアップする

ラボの終了時に AlloyDB インスタンスとクラスタを破棄します。

AlloyDB クラスタとすべてのインスタンスを削除する

AlloyDB の試用版を使用した場合。トライアルクラスタを使用して他のラボやリソースをテストする予定がある場合は、トライアルクラスタを削除しないでください。同じプロジェクトに別のトライアルクラスタを作成することはできません。

クラスタは force オプションで破棄され、クラスタに属するすべてのインスタンスも削除されます。

接続が切断され、以前の設定がすべて失われた場合は、Cloud Shell でプロジェクトと環境変数を定義します。

gcloud config set project <your project id>

export REGION=us-central1
export ADBCLUSTER=alloydb-hybrid-search
export PROJECT_ID=$(gcloud config get-value project)

クラスタを削除します。

gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force

想定されるコンソール出力:

student@cloudshell:~ (test-project-001-402417)$ gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force
All of the cluster data will be lost when the cluster is deleted.

Do you want to continue (Y/n)?  Y

Operation ID: operation-1697820178429-6082890a0b570-4a72f7e4-4c5df36f
Deleting cluster...done.

AlloyDB バックアップを削除する

クラスタの AlloyDB バックアップをすべて削除します。

for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done

想定されるコンソール出力:

student@cloudshell:~ (test-project-001-402417)$ for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done
Operation ID: operation-1697826266108-60829fb7b5258-7f99dc0b-99f3c35f
Deleting backup...done.

これで、VM を破棄できます。

GCE VM を削除する

Cloud Shell で、次のコマンドを実行します。

export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet

想定されるコンソール出力:

student@cloudshell:~ (test-project-001-402417)$ export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet
Deleted

13. 完了

以上で、この Codelab は完了です。

学習した内容

AlloyDB クラスタとプライマリインスタンスをデプロイする方法
Google Compute Engine VM から AlloyDB に接続する方法
データベースを作成して AlloyDB AI を有効にする方法
データベースにデータを読み込む方法
AlloyDB Studio の使用方法
Vertex AI でエンベディングを生成する
ScaNN ベクトルインデックスを作成してベクトル検索を高速化する方法
Elasticsearch の外部データラッパー（FDW）を作成する方法
AlloyDB のセマンティック検索と Elastic の全文検索を組み合わせてハイブリッド検索を行います。

次のステップ

AlloyDB のその他の Codelab については、公式の Codelab サイトをご覧ください。

AlloyDB でハイブリッド検索を使ってみる

1. はじめに

前提条件

学習内容

必要なもの

2. 設定と要件

プロジェクトのセットアップ

Google Cloud プロジェクトの作成

課金を有効にする

個人用の請求先アカウントを設定する

Cloud Shell の起動

3. はじめに

API を有効にする

API の有効化

API の概要

4. AlloyDB をデプロイする

プライベート IP 範囲を作成する

AlloyDB クラスタを作成する

AlloyDB クラスタを作成する

5. AlloyDB に接続する

GCE VM をデプロイする

Postgres クライアントをインストールする

インスタンスに接続する

6. データベースを準備する

AlloyDB に必要な権限を付与する

データベースを作成する

Vertex AI の統合を有効にする

データをインポート

7. ベクトル エンベディングを生成する

8. ベクトル インデックスを作成する

ScaNN インデックスを作成する

インデックスの使用状況を検査する

9. エラスティック インスタンスの作成

10. AlloyDB での外部データ ラッパーの作成

11. ハイブリッド検索の使用

12. 環境をクリーンアップする

AlloyDB クラスタとすべてのインスタンスを削除する

AlloyDB バックアップを削除する

GCE VM を削除する

13. 完了

学習した内容

次のステップ

7. ベクトルエンベディングを生成する

8. ベクトルインデックスを作成する

9. エラスティックインスタンスの作成

10. AlloyDB での外部データラッパーの作成