AlloyDB Omni and EmbeddingGemma on Kubernetes

1. Introduction

In this codelab you will learn how to deploy AlloyDB Omni on GKE and use it with an open embedding model deployed in the same Kubernetes cluster. Deploying a model next to the database instance in the same GKE cluster reduces latency and dependencies on third-party services. In addition, local deployment may be a security and compliance requirement when data must not leave the organization and the use of third-party services is not allowed.


Prerequisites

  • A basic understanding of Google Cloud and the Cloud console
  • Basic knowledge of Kubernetes and GKE
  • Basic skills in command line interface and Cloud Shell

What you'll learn

  • How to deploy AlloyDB Omni on a Google Kubernetes Engine (GKE) cluster
  • How to connect to AlloyDB Omni
  • How to load data into AlloyDB Omni
  • How to deploy an open embedding model to GKE
  • How to register an embedding model in AlloyDB Omni
  • How to generate embeddings for semantic search
  • How to use the generated embeddings for semantic search in AlloyDB Omni
  • How to create and use vector indexes in AlloyDB Omni

What you'll need

  • A Google Cloud Account and Google Cloud Project
  • A web browser such as Chrome supporting Google Cloud console and Cloud Shell

2. Setup and Requirements

Project Setup

  1. Sign in to the Google Cloud Console. If you don't already have a Gmail or Google Workspace account, you must create one.

Use a personal account instead of a work or school account.

  2. Create a new project or reuse an existing one. To create a new project in the Google Cloud console, click the Select a project button in the header, which will open a popup window.


In the Select a project window, click the New Project button, which will open a dialog box for the new project.


In the dialog box, enter your preferred Project name and choose the location.


  • The Project name is the display name for this project's participants. The project name isn't used by Google APIs, and it can be changed at any time.
  • The Project ID is unique across all Google Cloud projects and is immutable (it can't be changed after it has been set). The Google Cloud console automatically generates a unique ID, but you can customize it. If you don't like the generated ID, you can generate another random one or provide your own to check its availability. In most codelabs, you'll need to reference your project ID, which is typically identified with the placeholder PROJECT_ID.
  • For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.

Enable Billing

Set up a personal billing account

If you set up billing using Google Cloud credits, you can skip this step.

To set up a personal billing account, go to the billing page in the Cloud Console.

Some Notes:

  • Completing this lab should cost less than $3 USD in Cloud resources.
  • You can follow the steps at the end of this lab to delete resources to avoid further charges.
  • New users are eligible for the $300 USD Free Trial.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

From the Google Cloud Console, click the Cloud Shell icon on the top right toolbar:

Activate Cloud Shell

Alternatively, you can press G then S while in the Google Cloud Console to activate Cloud Shell, or use this link.

It should only take a few moments to provision and connect to the environment. When it is finished, you should see something like this:

Screenshot of Google Cloud Shell terminal showing that the environment has connected

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this codelab can be done within a browser. You do not need to install anything.

3. Before you begin

Enable APIs

To use Google Kubernetes Engine (GKE) for AlloyDB Omni and open models deployments you need to enable their respective APIs in your Google Cloud project.

Inside Cloud Shell, make sure that your project ID is set:

PROJECT_ID=$(gcloud config get-value project)
echo $PROJECT_ID

If it is not defined in the Cloud Shell configuration, set it using the following commands:

export PROJECT_ID=<your project>
gcloud config set project $PROJECT_ID

Enable all necessary services:

gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com

Expected output

student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=test-project-001-402417
student@cloudshell:~ (test-project-001-402417)$ gcloud config set project test-project-001-402417
Updated property [core/project].
student@cloudshell:~ (test-project-001-402417)$ gcloud services enable compute.googleapis.com
gcloud services enable container.googleapis.com
Operation "operations/acat.p2-4470404856-1f44ebd8-894e-4356-bea7-b84165a57442" finished successfully.

Introducing the APIs

  • Kubernetes Engine API (container.googleapis.com) allows you to create and manage Google Kubernetes Engine (GKE) clusters. It provides a managed environment for deploying, managing, and scaling your containerized applications using Google's infrastructure.
  • Compute Engine API (compute.googleapis.com) allows you to create and manage virtual machines (VMs), persistent disks, and network settings. It provides the core Infrastructure-as-a-Service (IaaS) foundation required to run your workloads and host the underlying infrastructure for many managed services.

4. Deploy AlloyDB Omni on GKE

To deploy AlloyDB Omni on GKE we need to prepare a Kubernetes cluster that meets the AlloyDB Omni operator requirements.

Create a GKE Cluster

We need to deploy a standard GKE cluster with a node pool configuration sufficient to run a pod with an AlloyDB Omni instance. AlloyDB Omni needs at least 2 CPUs and 8 GB of RAM, with some room left for the operator and monitoring service containers. We are going to use the e2-standard-4 VM type.

Set up the environment variables for your deployment.

export PROJECT_ID=$(gcloud config get project)
export LOCATION=us-central1
export CLUSTER_NAME=alloydb-ai-gke
export MACHINE_TYPE=e2-standard-4

Then we use gcloud to create the GKE standard cluster.

gcloud container clusters create ${CLUSTER_NAME} \
  --project=${PROJECT_ID} \
  --region=${LOCATION} \
  --workload-pool=${PROJECT_ID}.svc.id.goog \
  --release-channel=rapid \
  --machine-type=${MACHINE_TYPE} \
  --num-nodes=1

Expected console output:

student@cloudshell:~ (gleb-test-short-001-415614)$ export PROJECT_ID=$(gcloud config get project)
export LOCATION=us-central1
export CLUSTER_NAME=alloydb-ai-gke
export MACHINE_TYPE=e2-standard-4
Your active configuration is: [gleb-test-short-001-415614]
student@cloudshell:~ (gleb-test-short-001-415614)$ gcloud container clusters create ${CLUSTER_NAME} \
  --project=${PROJECT_ID} \
  --region=${LOCATION} \
  --workload-pool=${PROJECT_ID}.svc.id.goog \
  --release-channel=rapid \
  --machine-type=${MACHINE_TYPE} \
  --num-nodes=1
Note: The Kubelet readonly port (10255) is now deprecated. Please update your workloads to use the recommended alternatives. See https://cloud.google.com/kubernetes-engine/docs/how-to/disable-kubelet-readonly-port for ways to check usage and for migration instructions.
Note: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s).
Creating cluster alloydb-ai-gke in us-central1..



Prepare the Cluster

We need to install required components such as cert-manager, a native certificate manager for Kubernetes. You can follow the steps in the cert-manager installation documentation.

We use the Kubernetes command-line tool, kubectl, which is installed in Cloud Shell by default. Before using the utility we need to get credentials for our cluster.

gcloud container clusters get-credentials ${CLUSTER_NAME} --region=${LOCATION}

Now we can use kubectl to install the cert-manager:

kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.19.2/cert-manager.yaml

Expected console output (redacted):

student@cloudshell:~$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.19.2/cert-manager.yaml
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
...
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created

Install AlloyDB Omni

The AlloyDB Omni operator can be installed using the helm utility.

Run the following command to install AlloyDB Omni operator:

export GCS_BUCKET=alloydb-omni-operator
export HELM_PATH=$(gcloud storage cat gs://$GCS_BUCKET/latest)
export OPERATOR_VERSION="${HELM_PATH%%/*}"
gcloud storage cp gs://$GCS_BUCKET/$HELM_PATH ./ --recursive
helm install alloydbomni-operator alloydbomni-operator-${OPERATOR_VERSION}.tgz \
--create-namespace \
--namespace alloydb-omni-system \
--atomic \
--timeout 5m

Expected console output (redacted):

student@cloudshell:~$ gcloud storage cp gs://$GCS_BUCKET/$HELM_PATH ./ --recursive
Copying gs://alloydb-omni-operator/1.2.0/alloydbomni-operator-1.2.0.tgz to file://./alloydbomni-operator-1.2.0.tgz
  Completed files 1/1 | 126.5kiB/126.5kiB
student@cloudshell:~$ helm install alloydbomni-operator alloydbomni-operator-${OPERATOR_VERSION}.tgz \
> --create-namespace \
> --namespace alloydb-omni-system \
> --atomic \
> --timeout 5m
NAME: alloydbomni-operator
LAST DEPLOYED: Mon Jan 20 13:13:20 2025
NAMESPACE: alloydb-omni-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
student@cloudshell:~$

Once the AlloyDB Omni operator is installed, we can proceed with the deployment of our database cluster.

Here is an example deployment manifest with the googleMLExtension parameter enabled and an internal (private) load balancer:

apiVersion: v1
kind: Secret
metadata:
  name: db-pw-my-omni
type: Opaque
data:
  my-omni: "VmVyeVN0cm9uZ1Bhc3N3b3Jk"
---
apiVersion: alloydbomni.dbadmin.goog/v1
kind: DBCluster
metadata:
  name: my-omni
spec:
  databaseVersion: "15.13.0"
  primarySpec:
    adminUser:
      passwordRef:
        name: db-pw-my-omni
    features:
      googleMLExtension:
        enabled: true
    resources:
      cpu: 1
      memory: 8Gi
      disks:
      - name: DataDisk
        size: 20Gi
        storageClass: standard
    dbLoadBalancerOptions:
      annotations:
        networking.gke.io/load-balancer-type: "internal"
  allowExternalIncomingTraffic: true

The secret value for the password is the Base64 representation of the password "VeryStrongPassword". A more reliable way is to use Google Secret Manager to store the password value; you can read more about it in the documentation.
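You can reproduce and verify that Base64 value locally with standard shell tools; this is just a sanity-check sketch:

```shell
# Encode the password exactly as it appears in the Secret manifest.
# printf (no trailing newline) matters here: echo would append a newline
# and change the encoded value.
printf '%s' "VeryStrongPassword" | base64
# Prints: VmVyeVN0cm9uZ1Bhc3N3b3Jk

# Decode it back to double-check.
printf '%s' "VmVyeVN0cm9uZ1Bhc3N3b3Jk" | base64 --decode
# Prints: VeryStrongPassword
```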

Save the manifest as my-omni.yaml to be applied in the next step. If you are in the Cloud Shell you can do it using the editor by pressing the "Open Editor" button on the top right of the terminal.


After saving the file with the name my-omni.yaml return back to the terminal by pressing the "Open Terminal" button.


Apply the my-omni.yaml manifest to the cluster using kubectl utility:

kubectl apply -f my-omni.yaml

Expected console output:

secret/db-pw-my-omni created
dbcluster.alloydbomni.dbadmin.goog/my-omni created

Check the status of your my-omni cluster using the kubectl utility:

kubectl get dbclusters.alloydbomni.dbadmin.goog my-omni -n default

During deployment the cluster goes through different phases and should eventually end in the DBClusterReady state.

Expected console output:

$ kubectl get dbclusters.alloydbomni.dbadmin.goog my-omni -n default
NAME      PRIMARYENDPOINT   PRIMARYPHASE   DBCLUSTERPHASE   HAREADYSTATUS   HAREADYREASON
my-omni   10.131.0.33        Ready          DBClusterReady

Connect to AlloyDB Omni

Connect Using Kubernetes Pod

When the cluster is ready we can use the PostgreSQL client binaries on the AlloyDB Omni instance pod. We find the pod name and then use kubectl to connect directly to the pod and run the client software. The password is VeryStrongPassword, as set via the Kubernetes secret in the my-omni.yaml manifest:

DB_CLUSTER_NAME=my-omni
DB_CLUSTER_NAMESPACE=default
DBPOD=`kubectl get pod --selector=alloydbomni.internal.dbadmin.goog/dbcluster=$DB_CLUSTER_NAME,alloydbomni.internal.dbadmin.goog/task-type=database -n $DB_CLUSTER_NAMESPACE -o jsonpath='{.items[0].metadata.name}'`
kubectl exec -ti $DBPOD -n $DB_CLUSTER_NAMESPACE -c database -- psql -h localhost -U postgres

Sample console output:

DB_CLUSTER_NAME=my-omni
DB_CLUSTER_NAMESPACE=default
DBPOD=`kubectl get pod --selector=alloydbomni.internal.dbadmin.goog/dbcluster=$DB_CLUSTER_NAME,alloydbomni.internal.dbadmin.goog/task-type=database -n $DB_CLUSTER_NAMESPACE -o jsonpath='{.items[0].metadata.name}'`
kubectl exec -ti $DBPOD -n $DB_CLUSTER_NAMESPACE -c database -- psql -h localhost -U postgres
Password for user postgres: 
psql (15.7)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_128_GCM_SHA256, compression: off)
Type "help" for help.

postgres=#

5. Deploy AI Model on GKE

To test the AlloyDB Omni AI integration with local models we need to deploy a model to the cluster. We are going to use Google's EmbeddingGemma model.

Create a Node Pool for the Model

To run the model we need to prepare a node pool for inference. We can use a CPU-only pool or a pool with GPU accelerators. The CPU-only approach might be more feasible in some regions due to high contention for GPU resources. In this lab we are going to use the CPU approach, but the best option from a performance point of view is a pool with GPU accelerators, using a node configuration like g2-standard-8 with an NVIDIA L4 accelerator.

CPU Based Node Pool

Create a node pool with c3-standard-8 nodes. We are going to limit our pool to one node to save resources.

export PROJECT_ID=$(gcloud config get project)
export LOCATION=us-central1
export CLUSTER_NAME=alloydb-ai-gke
gcloud container node-pools create cpupool \
  --project=${PROJECT_ID} \
  --location=${LOCATION} \
  --node-locations=${LOCATION}-a \
  --cluster=${CLUSTER_NAME} \
  --machine-type=c3-standard-8 \
  --num-nodes=1

Expected output

student@cloudshell$ export PROJECT_ID=$(gcloud config get project)
Your active configuration is: [pant]
export LOCATION=us-central1
export CLUSTER_NAME=alloydb-ai-gke
student@cloudshell$ gcloud container node-pools create cpupool \
>   --project=${PROJECT_ID} \
>   --location=${LOCATION} \
>   --node-locations=${LOCATION}-a \
>   --cluster=${CLUSTER_NAME} \
>   --machine-type=c3-standard-8 \
>   --num-nodes=1
Creating node pool cpupool...done.
Created [https://container.googleapis.com/v1/projects/gleb-test-short-003-483115/zones/us-central1/clusters/alloydb-ai-gke/nodePools/cpupool].
NAME     MACHINE_TYPE    DISK_SIZE_GB  NODE_VERSION
cpupool  c3-standard-8  100           1.34.1-gke.3355002

Get Hugging Face Token

In this lab we deploy the EmbeddingGemma model from Hugging Face, and to do so we need a Hugging Face token.

Follow the steps below to generate a new token if you don't already have one.

  1. Log in or sign up on the Hugging Face site using either Log In or Sign Up links in the top right corner.
  2. Click Your Profile -> Access Tokens
  3. Confirm your identity
  4. Click on Create new token
  5. Choose a name for your token
  6. Select a role for the token - you need at least Read privilege
  7. Click Create token at the bottom of the page
  8. Copy the generated token and save it for use later

You also need to accept the conditions to access files and content related to EmbeddingGemma on the Hugging Face page https://huggingface.co/google/embeddinggemma-300m

Create a Kubernetes secret using the token

In the Cloud Shell session, execute the following (replace the HF_TOKEN value with your Hugging Face token).

export HF_TOKEN=hf_QjgW...lfrXF
kubectl create secret generic hf-secret \
    --from-literal=hf_api_token=$HF_TOKEN \
    --dry-run=client -o yaml | kubectl apply -f -

Prepare Deployment Manifest

To deploy the model we need to prepare a deployment manifest.

We are using Google's EmbeddingGemma model from Hugging Face. You can read the model card here. To deploy the model we are going to use an approach based on the instructions from Hugging Face and the deployment package from GitHub.

Clone the package from GitHub

git clone https://github.com/huggingface/Google-Cloud-Containers

Adjust the manifest for TEI (Text Embeddings Inference) on CPU nodes. We need to replace several parameters, including the model, the image, and the resource allocation, and add the Hugging Face token secret to the configuration.

Edit the manifest (using any available editor)

vi Google-Cloud-Containers/examples/gke/tei-deployment/cpu-config/deployment.yaml

Here is the corrected manifest for deployment on a CPU-based pool.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tei-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tei-server
  template:
    metadata:
      labels:
        app: tei-server
        hf.co/model: Google--embeddinggemma-300m
        hf.co/task: text-embeddings
    spec:
      containers:
        - name: tei-container
          image: ghcr.io/huggingface/text-embeddings-inference:cpu-latest
          #image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cpu.1-4:latest
          resources:
            requests:
              cpu: "6"
              memory: "24Gi"
            limits:
              cpu: "6"
              memory: "24Gi"
          env:
            - name: MODEL_ID
              value: google/embeddinggemma-300m
            - name: NUM_SHARD
              value: "1"
            - name: PORT
              value: "8080"
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-secret
                  key: hf_api_token
          volumeMounts:
            - mountPath: /tmp
              name: tmp
      volumes:
        - name: tmp
          emptyDir: {}
      nodeSelector:
        #cloud.google.com/compute-class: "Performance"
        cloud.google.com/machine-family: "c3"

Deploy the Model

Deploy the model by applying the modified manifest for CPU deployments.

kubectl apply -f Google-Cloud-Containers/examples/gke/tei-deployment/cpu-config

Verify the deployments

kubectl get pods

Verify the model service

kubectl get service tei-service

It should show a running service of type ClusterIP.

Sample output:

student@cloudshell$ kubectl get service tei-service
NAME          TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
tei-service   ClusterIP   34.118.233.48   <none>        8080/TCP   10m

The CLUSTER-IP for the service is what we are going to use as our endpoint address. The model serves embeddings at the URI http://34.118.233.48:8080/embed. It will be used later when you register the model in AlloyDB Omni.

We can test it by exposing it using the kubectl port-forward command.

kubectl port-forward service/tei-service 8080:8080

If you are using Cloud Shell, keep the port forwarding running in one Cloud Shell session and open another session to test it.

Open another Cloud Shell tab using the "+" sign at the top.


And run a curl command in the new shell session.

curl http://localhost:8080/embed \
    -X POST \
    -d '{"inputs":"Test"}' \
    -H 'Content-Type: application/json'

It should return a vector array like in the following sample output (redacted):

curl http://localhost:8080/embed \
>     -X POST \
>     -d '{"inputs":"Test"}' \
>     -H 'Content-Type: application/json'
[[-0.018975832,0.0071419072,0.06347208,0.022992613,0.014205903
...
-0.03677433,0.01636146,0.06731572]]

If we see the numbers, we can confirm that we have successfully tested the model and can now register it in our AlloyDB Omni to be used directly from SQL.
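The /embed endpoint returns a JSON array with one embedding per input string. A quick way to inspect the dimensionality of a returned vector is to pipe the response into python3; the sketch below uses a truncated sample response in place of real curl output:

```shell
# Count the elements of the first (and only) embedding in the response.
# RESPONSE is a truncated stand-in for the real curl output.
RESPONSE='[[-0.018975832,0.0071419072,0.06347208]]'
echo "$RESPONSE" | python3 -c "import json,sys; print(len(json.load(sys.stdin)[0]))"
# Prints 3 for this truncated sample; the full model response has many more dimensions.
```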

6. Register the Model in AlloyDB Omni

To test how our AlloyDB Omni works with the deployed model we need to create a database and register the model.

Create Database

Create a GCE VM to serve as a jump box, then connect to AlloyDB Omni from this client VM and create a database.

We need the jump box because the GKE internal load balancer for Omni gives you access from the VPC using private IP addressing but doesn't allow connections from outside the VPC. This is more secure in general and doesn't expose your database instance to the internet. Please check the diagram for clarity.

b4f24ddb5c8c8bf4.png

To create a VM in the Cloud Shell session execute:

export ZONE=us-central1-a
gcloud compute instances create instance-1 \
    --zone=$ZONE 

Find AlloyDB Omni endpoint IP using kubectl in the Cloud Shell:

kubectl get dbclusters.alloydbomni.dbadmin.goog my-omni -n default

Note down the PRIMARYENDPOINT.

Here is an example output:

student@cloudshell:~$ kubectl get dbclusters.alloydbomni.dbadmin.goog my-omni -n default
NAME      PRIMARYENDPOINT   PRIMARYPHASE   DBCLUSTERPHASE   HAREADYSTATUS   HAREADYREASON
my-omni   10.131.0.33        Ready          DBClusterReady
student@cloudshell:~$

The 10.131.0.33 is the IP we will be using in our examples to connect to the AlloyDB Omni instance.

Connect to VM using gcloud:

gcloud compute ssh instance-1 --zone=$ZONE 

If prompted for ssh key generation, follow the instructions. Read more about ssh connection in the documentation.

In the ssh session on the VM, install the PostgreSQL client:

sudo apt-get update
sudo apt-get install --yes postgresql-client

Export the AlloyDB Omni load balancer IP into a variable, as in the following example (replace the IP with your load balancer IP):

export INSTANCE_IP=10.131.0.33

Connect to the AlloyDB Omni instance; the password is VeryStrongPassword, as set via the Kubernetes secret in my-omni.yaml:

psql "host=$INSTANCE_IP user=postgres sslmode=require"

In the established psql session execute:

create database demo;

Exit the session and connect to the demo database (or you can simply run \c demo in the same session):

psql "host=$INSTANCE_IP user=postgres sslmode=require dbname=demo"

Create transform functions

For third-party embedding models we need to create transform functions which format the input and output to the formats expected by the model and by our internal functions. These functions act as translators, converting between the different interfaces.

Here is the transform function which handles the input:

-- Input Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION tei_text_input_transform(model_id VARCHAR(100), input_text TEXT)
RETURNS JSON
LANGUAGE plpgsql
AS $$
DECLARE
  transformed_input JSON;
  model_qualified_name TEXT;
BEGIN
  SELECT json_build_object('inputs', input_text, 'truncate', true)::JSON INTO transformed_input;
  RETURN transformed_input;
END;
$$;

Execute the provided code while connected to the demo database, as shown in the sample output:

demo=# -- Input Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION tei_text_input_transform(model_id VARCHAR(100), input_text TEXT)
RETURNS JSON
LANGUAGE plpgsql
AS $$
DECLARE
  transformed_input JSON;
  model_qualified_name TEXT;
BEGIN
  SELECT json_build_object('inputs', input_text, 'truncate', true)::JSON INTO transformed_input;
  RETURN transformed_input;
END;
$$;
CREATE FUNCTION
demo=#
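You can preview the JSON payload this transform produces without touching the database; the sketch below builds the same {"inputs": ..., "truncate": true} document with python3 (the input text is an arbitrary example):

```shell
# Build the same payload that tei_text_input_transform sends to the
# model endpoint.
python3 -c "import json; print(json.dumps({'inputs': 'What is AlloyDB Omni?', 'truncate': True}))"
# Prints: {"inputs": "What is AlloyDB Omni?", "truncate": true}
```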

And here is the output function, which transforms the response from the model into an array of real numbers:

-- Output Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION tei_text_output_transform(model_id VARCHAR(100), response_json JSON)
RETURNS REAL[]
LANGUAGE plpgsql
AS $$
DECLARE
  transformed_output REAL[];
BEGIN
  SELECT ARRAY(SELECT json_array_elements_text(response_json->0)) INTO transformed_output;
  RETURN transformed_output;
END;
$$;

Execute it in the same session:

demo=# -- Output Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION tei_text_output_transform(model_id VARCHAR(100), response_json JSON)
RETURNS REAL[]
LANGUAGE plpgsql
AS $$
DECLARE
  transformed_output REAL[];
BEGIN
  SELECT ARRAY(SELECT json_array_elements_text(response_json->0)) INTO transformed_output;
  RETURN transformed_output;
END;
$$;
CREATE FUNCTION
demo=#
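The output transform is the mirror image: it takes the model's response (a JSON array of embeddings, as seen in the curl test earlier) and keeps only the first embedding as a flat array of reals. The same extraction sketched in shell, with a truncated sample response:

```shell
# Extract the first embedding from the response, mirroring what
# tei_text_output_transform does with response_json->0.
RESPONSE='[[-0.5,0.25,0.125]]'
echo "$RESPONSE" | python3 -c "import json,sys; print(json.load(sys.stdin)[0])"
# Prints: [-0.5, 0.25, 0.125]
```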

Register the model

Now we can register the model in the database.

Here is the procedure call to register the model under the name embeddinggemma. We use the tei-service service name in the model_request_url parameter when we register the model. That is the internal Kubernetes service name, which resolves to the service's internal IP in the GKE cluster:

CALL
  google_ml.create_model(
    model_id => 'embeddinggemma',
    model_request_url => 'http://tei-service:8080/embed',
    model_provider => 'custom',
    model_type => 'text_embedding',
    model_in_transform_fn => 'tei_text_input_transform',
    model_out_transform_fn => 'tei_text_output_transform');

Execute the provided code while connected to the demo database:

demo=# CALL
  google_ml.create_model(
    model_id => 'embeddinggemma',
    model_request_url => 'http://tei-service:8080/embed',
    model_provider => 'custom',
    model_type => 'text_embedding',
    model_in_transform_fn => 'tei_text_input_transform',
    model_out_transform_fn => 'tei_text_output_transform');
CALL
demo=#

We can test the registered model using the following test query, which should return an array of real numbers.

select google_ml.embedding('embeddinggemma','What is AlloyDB Omni?');

Don't be surprised by the prolonged delay before getting the vector data back. For this test we use a CPU-based node pool to host the embedding model; it works much faster on nodes with GPUs.

7. Test the Model in AlloyDB Omni

Load Data

To test how our AlloyDB Omni works with the deployed model we need to load some data. We use the same data as in one of the other codelabs for vector search in AlloyDB.

One way to load the data is to use the Google Cloud SDK and the PostgreSQL client software. We can use the same client VM. The Google Cloud SDK should already be installed there if you've used the defaults for the VM image. If you've used a custom image without the Google Cloud SDK, you can add it by following the documentation.

Export the AlloyDB Omni load balancer IP as in the following example (replace the IP with your load balancer IP):

export INSTANCE_IP=10.131.0.33

Connect to the database and enable pgvector extension.

psql "host=$INSTANCE_IP user=postgres sslmode=require dbname=demo"

In the psql session:

CREATE EXTENSION IF NOT EXISTS vector;

Exit the psql session, and from the command line execute the following commands to load the data into the demo database.

Create the tables. The following command fetches the cymbal_demo_schema.sql file and executes the SQL with all table definitions in the demo database:

gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_demo_schema.sql |psql "host=$INSTANCE_IP user=postgres dbname=demo"

Expected console output:

student@cloudshell:~$ gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_demo_schema.sql |psql "host=$INSTANCE_IP user=postgres dbname=demo"
Password for user postgres:
SET
SET
SET
SET
SET
 set_config
------------

(1 row)

SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
ALTER TABLE
ALTER SEQUENCE
ALTER TABLE
ALTER TABLE
ALTER TABLE
student@cloudshell:~$ 

Here is the list of created tables:

psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "\dt+"

Output:

student@cloudshell:~$ psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "\dt+"
Password for user postgres: 
                                           List of relations
 Schema |       Name       | Type  |  Owner   | Persistence | Access method |    Size    | Description 
--------+------------------+-------+----------+-------------+---------------+------------+-------------
 public | cymbal_embedding | table | postgres | permanent   | heap          | 8192 bytes | 
 public | cymbal_inventory | table | postgres | permanent   | heap          | 8192 bytes | 
 public | cymbal_products  | table | postgres | permanent   | heap          | 8192 bytes | 
 public | cymbal_stores    | table | postgres | permanent   | heap          | 8192 bytes | 
(4 rows)
student@cloudshell:~$ 

Load data to the cymbal_products table:

gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv |psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "\copy cymbal_products from stdin csv header"

Expected console output:

student@cloudshell:~$ gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv |psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "\copy cymbal_products from stdin csv header"
COPY 941
student@cloudshell:~$ 

Here is a sample of a few rows from the cymbal_products table.

psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "SELECT uniq_id,left(product_name,30),left(product_description,50),sale_price FROM cymbal_products limit 3"

Output:

student@cloudshell:~$ psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "SELECT uniq_id,left(product_name,30),left(product_description,50),sale_price FROM cymbal_products limit 3"
Password for user postgres: 
             uniq_id              |              left              |                        left                        | sale_price 
----------------------------------+--------------------------------+----------------------------------------------------+------------
 a73d5f754f225ecb9fdc64232a57bc37 | Laundry Tub Strainer Cup       |   Laundry tub strainer cup Chrome For 1-.50, drain |      11.74
 41b8993891aa7d39352f092ace8f3a86 | LED Starry Star Night Light La |  LED Starry Star Night Light Laser Projector 3D Oc |      46.97
 ed4a5c1b02990a1bebec908d416fe801 | Surya Horizon HRZ-1060 Area Ru |  The 100% polypropylene construction of the Surya  |       77.4
(3 rows)
student@cloudshell:~$ 

Load data to the cymbal_inventory table:

gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_inventory.csv |psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "\copy cymbal_inventory from stdin csv header"

Expected console output:

student@cloudshell:~$ gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_inventory.csv |psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "\copy cymbal_inventory from stdin csv header"
Password for user postgres: 
COPY 263861
student@cloudshell:~$ 

Here is a sample of a few rows from the cymbal_inventory table.

psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "SELECT * FROM cymbal_inventory LIMIT 3"

Output:

student@cloudshell:~$ psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "SELECT * FROM cymbal_inventory LIMIT 3"
Password for user postgres: 
 store_id |             uniq_id              | inventory 
----------+----------------------------------+-----------
     1583 | adc4964a6138d1148b1d98c557546695 |         5
     1490 | adc4964a6138d1148b1d98c557546695 |         4
     1492 | adc4964a6138d1148b1d98c557546695 |         3
(3 rows)
student@cloudshell:~$ 

Load data to the cymbal_stores table:

gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_stores.csv |psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "\copy cymbal_stores from stdin csv header"

Expected console output:

student@cloudshell:~$ gcloud storage cat gs://cloud-training/gcc/gcc-tech-004/cymbal_stores.csv |psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "\copy cymbal_stores from stdin csv header"
Password for user postgres: 
COPY 4654
student@cloudshell:~$

Here is a sample of a few rows from the cymbal_stores table.

psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "SELECT store_id, name, zip_code FROM cymbal_stores limit 3"

Output:

student@cloudshell:~$ psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "SELECT store_id, name, zip_code FROM cymbal_stores limit 3"
Password for user postgres: 
 store_id |       name        | zip_code 
----------+-------------------+----------
     1990 | Mayaguez Store    |      680
     2267 | Ware Supercenter  |     1082
     4359 | Ponce Supercenter |      780
(3 rows)
student@cloudshell:~$ 
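With all three tables loaded, an optional sanity check (a sketch, not part of the original lab steps) is to compare the row counts against the COPY output shown above:

psql "host=$INSTANCE_IP user=postgres dbname=demo" -c "SELECT (SELECT count(*) FROM cymbal_products) AS products, (SELECT count(*) FROM cymbal_inventory) AS inventory, (SELECT count(*) FROM cymbal_stores) AS stores"

The inventory and stores counts should match the COPY results above (263861 and 4654 rows respectively).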

Build Embeddings

Connect to the demo database using psql and build embeddings for the products in the cymbal_products table based on the product descriptions.

Connect to the demo database:

psql "host=$INSTANCE_IP user=postgres sslmode=require dbname=demo"

We use the cymbal_embedding table with its embedding column to store the embeddings, passing the product description as the text input to the embedding function.

Enable timing for your queries so you can compare execution times later, for example with remote models:

\timing

Run the query to build the embeddings:

INSERT INTO cymbal_embedding(uniq_id,embedding)  SELECT uniq_id, google_ml.embedding('embeddinggemma',product_description)::vector FROM cymbal_products;

Expected console output:

demo=# INSERT INTO cymbal_embedding(uniq_id,embedding)  SELECT uniq_id, google_ml.embedding('embeddinggemma',product_description)::vector FROM cymbal_products;
INSERT 0 941
Time: 497878.136 ms (08:17.878)
demo=#

In this example, building embeddings took about 8 minutes. That is expected for a CPU-based node pool. For a pool with GPU accelerators it can be significantly faster, depending on the GPU type.
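As an optional check (a sketch using the standard pgvector vector_dims() function, not part of the original lab steps), you can confirm how many embeddings were stored and their dimensionality:

SELECT count(*), vector_dims(embedding) AS dims FROM cymbal_embedding GROUP BY 2;

The count should match the INSERT result above (941 rows).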

Run Test Queries

Connect to the demo database using psql and enable timing to measure execution time for our queries as we did for building embeddings.

Let's find the top 5 products matching a request like "What kind of fruit trees grow well here?" using cosine distance as the distance metric for vector search.

In the psql session execute:

SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        (ce.embedding <=> google_ml.embedding('embeddinggemma','What kind of fruit trees grow well here?')::vector) as distance
FROM
        cymbal_products cp
JOIN cymbal_embedding ce on
        ce.uniq_id=cp.uniq_id
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 5;

Expected console output:

demo=# SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        (ce.embedding <=> google_ml.embedding('embeddinggemma','What kind of fruit trees grow well here?')::vector) as distance
FROM
        cymbal_products cp
JOIN cymbal_embedding ce on
        ce.uniq_id=cp.uniq_id
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 5;
     product_name      |                                   description                                    | sale_price | zip_code |      distance
-----------------------+----------------------------------------------------------------------------------+------------+----------+--------------------
 Cherry Tree           | This is a beautiful cherry tree that will produce delicious cherries. It is an d |      75.00 |    93230 | 0.5210549378080666
 California Lilac      | This is a beautiful lilac tree that can grow to be over 10 feet tall. It is an d |       5.00 |    93230 | 0.5639421771781971
 Toyon                 | This is a beautiful toyon tree that can grow to be over 20 feet tall. It is an e |      10.00 |    93230 | 0.5670010914504852
 Rose Bush             | This is a beautiful rose bush that will produce fragrant roses. It is a perennia |      50.00 |    93230 | 0.5731542622882957
 California Peppertree | This is a beautiful peppertree that can grow to be over 30 feet tall. It is an e |      25.00 |    93230 | 0.5750934653011995
(5 rows)

Time: 83.610 ms
demo=#

The query ran in about 84 ms and returned a list of trees from the cymbal_products table that match the request and have inventory available in store number 1583.
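The <=> operator used in the query is pgvector's cosine distance: 0 means identical direction, 1 orthogonal, and 2 opposite. A quick illustration with toy 3-D vectors (not part of the lab data) that you can run in the same psql session:

SELECT '[1,0,0]'::vector <=> '[0,1,0]'::vector AS orthogonal,
       '[1,0,0]'::vector <=> '[2,0,0]'::vector AS same_direction;

This returns 1 for the orthogonal pair and 0 for the pair pointing in the same direction, which is why smaller distances in the search results mean closer semantic matches.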

Build ANN Index

With a small data set, exact search that scans all embeddings works well, but as the data grows, load and response times grow with it. To improve performance, you can build an index on your embedding data. Here is an example using the Google ScaNN index for vector data.

Reconnect to the demo database if you've lost the connection:

psql "host=$INSTANCE_IP user=postgres sslmode=require dbname=demo"

Enable alloydb_scann extension:

CREATE EXTENSION IF NOT EXISTS alloydb_scann;

Build the index:

CREATE INDEX cymbal_embedding_scann ON cymbal_embedding USING scann (embedding cosine);
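The statement above uses the default index settings. The alloydb_scann extension also accepts tuning parameters; for example, num_leaves controls the number of partitions and trades recall against speed. A hedged sketch (the value 20 is illustrative only; check the ScaNN index documentation for values recommended for your data size, and drop the existing index first since the name is already taken):

DROP INDEX IF EXISTS cymbal_embedding_scann;
CREATE INDEX cymbal_embedding_scann ON cymbal_embedding
  USING scann (embedding cosine) WITH (num_leaves = 20);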

Try the same query as before and compare the results:

demo=# SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        (ce.embedding <=> google_ml.embedding('embeddinggemma','What kind of fruit trees grow well here?')::vector) as distance
FROM
        cymbal_products cp
JOIN cymbal_embedding ce on
        ce.uniq_id=cp.uniq_id
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 5;
     product_name      |                                   description                                    | sale_price | zip_code |      distance
-----------------------+----------------------------------------------------------------------------------+------------+----------+--------------------
 Cherry Tree           | This is a beautiful cherry tree that will produce delicious cherries. It is an d |      75.00 |    93230 | 0.5210549378080666
 California Lilac      | This is a beautiful lilac tree that can grow to be over 10 feet tall. It is an d |       5.00 |    93230 | 0.5639421771781971
 Toyon                 | This is a beautiful toyon tree that can grow to be over 20 feet tall. It is an e |      10.00 |    93230 | 0.5670010914504852
 Rose Bush             | This is a beautiful rose bush that will produce fragrant roses. It is a perennia |      50.00 |    93230 | 0.5731542622882957
 California Peppertree | This is a beautiful peppertree that can grow to be over 30 feet tall. It is an e |      25.00 |    93230 | 0.5750934653011995
(5 rows)

Time: 64.783 ms

The query execution time is slightly lower, and the gain would be more noticeable on larger datasets. The results are quite similar: we got the same top 5 trees.
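To confirm that the planner actually uses the ScaNN index rather than an exact scan, you can prefix a simplified query with EXPLAIN (a sketch; the exact plan output depends on your data and version):

EXPLAIN SELECT uniq_id
FROM cymbal_embedding
ORDER BY embedding <=> google_ml.embedding('embeddinggemma','fruit trees')::vector
LIMIT 5;

The plan should include an index scan on cymbal_embedding_scann; if you see a sequential scan instead, the planner chose exact search.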

Try other queries, and read more about choosing a vector index in the documentation.

And don't forget that AlloyDB Omni has many more features and labs to explore.

8. Clean up environment

Now we can delete the GKE cluster with AlloyDB Omni and the AI model.

Delete GKE Cluster

In Cloud Shell execute:

export PROJECT_ID=$(gcloud config get project)
export LOCATION=us-central1
export CLUSTER_NAME=alloydb-ai-gke
gcloud container clusters delete ${CLUSTER_NAME} \
  --project=${PROJECT_ID} \
  --region=${LOCATION}

Expected console output:

student@cloudshell:~$ gcloud container clusters delete ${CLUSTER_NAME} \
>   --project=${PROJECT_ID} \
>   --region=${LOCATION}
The following clusters will be deleted.
 - [alloydb-ai-gke] in [us-central1]

Do you want to continue (Y/n)?  Y

Deleting cluster alloydb-ai-gke...done.
Deleted

Delete VM

In Cloud Shell execute:

export PROJECT_ID=$(gcloud config get project)
export ZONE=us-central1-a
gcloud compute instances delete instance-1 \
  --project=${PROJECT_ID} \
  --zone=${ZONE}

Expected console output:

student@cloudshell:~$ export PROJECT_ID=$(gcloud config get project)
export ZONE=us-central1-a
gcloud compute instances delete instance-1 \
  --project=${PROJECT_ID} \
  --zone=${ZONE}
Your active configuration is: [cloudshell-5399]
The following instances will be deleted. Any attached disks configured to be auto-deleted will be deleted unless they are attached to any other instances or the `--keep-disks` flag is given and specifies them for keeping. Deleting a disk 
is irreversible and any data on the disk will be lost.
 - [instance-1] in [us-central1-a]

Do you want to continue (Y/n)?  Y

Deleted

If you created a new project for this codelab, you can instead delete the full project: https://console.cloud.google.com/cloud-resource-manager

9. Congratulations

Congratulations on completing the codelab.

What we've covered

  • How to deploy AlloyDB Omni on a Google Kubernetes Engine cluster
  • How to connect to AlloyDB Omni
  • How to load data into AlloyDB Omni
  • How to deploy an open embedding model to GKE
  • How to register an embedding model in AlloyDB Omni
  • How to generate embeddings for semantic search
  • How to use generated embeddings for semantic search in AlloyDB Omni
  • How to create and use vector indexes in AlloyDB

You can read more about working with AI in AlloyDB Omni in the documentation.

10. Survey


How will you use this tutorial?

  • Only read through it
  • Read it and complete the exercises