Getting started with Vector Embeddings with AlloyDB AI

1. Introduction

In this codelab you will learn how to use AlloyDB AI by combining vector search with Vertex AI embeddings.

Prerequisites

  • A basic understanding of the Google Cloud Console
  • Basic skills with the command-line interface and Cloud Shell

What you'll learn

  • How to deploy an AlloyDB cluster and primary instance
  • How to connect to AlloyDB from a Google Compute Engine VM
  • How to create a database and enable AlloyDB AI
  • How to load data into the database
  • How to use a Vertex AI embedding model in AlloyDB
  • How to enrich the result using a Vertex AI generative model

What you'll need

  • A Google Cloud Account and Google Cloud Project
  • A web browser such as Chrome

2. Setup and Requirements

Self-paced environment setup

  1. Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

  • The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
  • The Project ID is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you might generate another random one. Alternatively, you can try your own, and see if it's available. It can't be changed after this step and remains for the duration of the project.
  • For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.
  2. Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

From the Google Cloud Console, click the Cloud Shell icon on the top right toolbar.

It should only take a few moments to provision and connect to the environment.

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this codelab can be done within a browser. You do not need to install anything.

3. Before you begin

Enable API

Inside Cloud Shell, make sure that your project ID is set up:

gcloud config set project [YOUR-PROJECT-ID]
PROJECT_ID=$(gcloud config get-value project)

Configure your default region to us-central1 to use the Vertex AI embedding models. Read more about regional restrictions.

gcloud config set compute/region us-central1

Enable all necessary services:

gcloud services enable aiplatform.googleapis.com
gcloud services enable alloydb.googleapis.com
gcloud services enable compute.googleapis.com
gcloud services enable servicenetworking.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com

Expected output

student@cloudshell:~ (test-project-001-402417)$ gcloud config set project test-project-001-402417
Updated property [core/project].
student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-14650]
student@cloudshell:~ (test-project-001-402417)$ 
student@cloudshell:~ (test-project-001-402417)$ gcloud services enable aiplatform.googleapis.com
gcloud services enable alloydb.googleapis.com
gcloud services enable compute.googleapis.com
gcloud services enable servicenetworking.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com
Operation "operations/acat.p2-4470404856-1f44ebd8-894e-4356-bea7-b84165a57442" finished successfully.
Operation "operations/acat.p2-4470404856-895b1c7d-b3ab-438b-b8bd-8ccf887c34bb" finished successfully.
Operation "operations/acf.p2-4470404856-e8e5f4b5-6f28-41cc-972c-d0595d95c664" finished successfully.
Operation "operations/acat.p2-4470404856-2cd1a372-a7f4-49a6-849c-aca8218a9503" finished successfully.
Operation "operations/acat.p2-4470404856-5ce4f43c-53e9-40dc-8a7b-26e99c8dbf74" finished successfully.
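The five services can also be enabled in a single call, because gcloud services enable accepts several service names at once. The following is a minimal sketch, not part of the original codelab: the SERVICES variable and the enable_cmd function are hypothetical helpers, and the function only prints the command so that it can be reviewed first.

```shell
# Sketch: build one `gcloud services enable` call for all five APIs.
# SERVICES and enable_cmd are hypothetical helpers, not part of the codelab.
SERVICES="aiplatform.googleapis.com alloydb.googleapis.com compute.googleapis.com servicenetworking.googleapis.com cloudresourcemanager.googleapis.com"

# Print the command instead of running it; remove `echo` to execute it.
enable_cmd() {
    echo gcloud services enable $SERVICES
}

enable_cmd
```

Printing the command first keeps the example safe to run anywhere, including outside Cloud Shell.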

4. Deploy AlloyDB

Before creating an AlloyDB cluster, we need to allocate a private IP range in our VPC to be used by the future AlloyDB instance. After that we will be able to create the cluster and instance.

Create private IP range

We need to configure Private Service Access in our VPC for AlloyDB. The assumption here is that the project has the "default" VPC network and that it is going to be used for all actions.

Create the private IP range:

gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=16 \
    --description="VPC private service access" \
    --network=default
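To get a feel for what --prefix-length=16 reserves, the size of an IPv4 range is 2 raised to the power of (32 minus the prefix length). The small sketch below works that out in the shell; range_size is a hypothetical helper name used only for illustration.

```shell
# Number of IPv4 addresses in a range with the given prefix length.
# range_size is a hypothetical helper, not a gcloud feature.
range_size() {
    echo $(( 1 << (32 - $1) ))
}

range_size 16   # size of the psa-range allocated above: 65536
```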

Create private connection using the allocated IP range:

gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default

Expected console output:

student@cloudshell:~ (test-project)$ gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=16 \
    --description="VPC private service access" \
    --network=default
Created [https://www.googleapis.com/compute/v1/projects/test-project/global/addresses/psa-range].

student@cloudshell:~ (test-project)$ gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default
Operation "operations/pssn.p24-4470404856-595e209f-19b7-4669-8a71-cbd45de8ba66" finished successfully.

student@cloudshell:~ (test-project)$

Create AlloyDB Cluster

Create an AlloyDB cluster in the default region:

export PGPASSWORD=`openssl rand -base64 12`
export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION

Expected console output:

student@cloudshell:~ (test-project)$ export PGPASSWORD=`openssl rand -base64 12`
export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.                                                                                                                                                                                                                                                           

Note the PostgreSQL password for future use:

echo $PGPASSWORD

Expected console output:

student@cloudshell:~ (test-project)$ echo $PGPASSWORD
<your password for user postgres>

Create AlloyDB Primary Instance

Create an AlloyDB primary instance for our cluster:

export REGION=us-central1
gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER

Expected console output:

student@cloudshell:~ (test-project)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.                                                                                                                                                                                                                                                     

5. Connect to AlloyDB

AlloyDB is deployed using a private-only connection, so we need a VM with PostgreSQL client installed to work with the database.

Deploy GCE VM

Create a GCE VM in the same region and VPC as the AlloyDB cluster.

In Cloud Shell execute:

export ZONE=us-central1-a
gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --scopes=https://www.googleapis.com/auth/cloud-platform

Expected console output:

student@cloudshell:~ (test-project-402417)$ export ZONE=us-central1-a
student@cloudshell:~ (test-project-402417)$ gcloud compute instances create instance-1 \
    --zone=$ZONE \
    --scopes=https://www.googleapis.com/auth/cloud-platform
Created [https://www.googleapis.com/compute/v1/projects/test-project-402417/zones/us-central1-a/instances/instance-1].
NAME: instance-1
ZONE: us-central1-a
MACHINE_TYPE: n1-standard-1
PREEMPTIBLE: 
INTERNAL_IP: 10.128.0.2
EXTERNAL_IP: 34.71.192.233
STATUS: RUNNING

Install Postgres Client

Install the PostgreSQL client software on the deployed VM.

Connect to the VM:

gcloud compute ssh instance-1 --zone=us-central1-a

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud compute ssh instance-1 --zone=us-central1-a
Updating project ssh metadata...working..Updated [https://www.googleapis.com/compute/v1/projects/test-project-402417].                                                                                                                                                         
Updating project ssh metadata...done.                                                                                                                                                                                                                                              
Waiting for SSH key to propagate.
Warning: Permanently added 'compute.5110295539541121102' (ECDSA) to the list of known hosts.
Linux instance-1 5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
student@instance-1:~$ 

Install the software by running the following commands inside the VM:

sudo apt-get update
sudo apt-get install --yes postgresql-client

Expected console output:

student@instance-1:~$ sudo apt-get update
sudo apt-get install --yes postgresql-client
Get:1 https://packages.cloud.google.com/apt google-compute-engine-bullseye-stable InRelease [5146 B]
Get:2 https://packages.cloud.google.com/apt cloud-sdk-bullseye InRelease [6406 B]   
Hit:3 https://deb.debian.org/debian bullseye InRelease  
Get:4 https://deb.debian.org/debian-security bullseye-security InRelease [48.4 kB]
Get:5 https://packages.cloud.google.com/apt google-compute-engine-bullseye-stable/main amd64 Packages [1930 B]
Get:6 https://deb.debian.org/debian bullseye-updates InRelease [44.1 kB]
Get:7 https://deb.debian.org/debian bullseye-backports InRelease [49.0 kB]
...redacted...
update-alternatives: using /usr/share/postgresql/13/man/man1/psql.1.gz to provide /usr/share/man/man1/psql.1.gz (psql.1.gz) in auto mode
Setting up postgresql-client (13+225) ...
Processing triggers for man-db (2.9.4-2) ...
Processing triggers for libc-bin (2.31-13+deb11u7) ...

Connect to the Instance

Connect to the primary instance from the VM using psql.

Open another Cloud Shell tab using the "+" button at the top.

In the new Cloud Shell tab execute:

gcloud config set project <your PROJECT_ID>
export REGION=<your region>
export ADBCLUSTER=<your AlloyDB cluster name>
gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="(ipAddress)"

Expected console output:

student@cloudshell:~ (test-project-402417)$ gcloud config set project test-project-402417
Updated property [core/project].
student@cloudshell:~ (test-project-402417)$ export REGION=us-central1
student@cloudshell:~ (test-project-402417)$ export ADBCLUSTER=alloydb-aip-01
student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances describe $ADBCLUSTER-pr --cluster=$ADBCLUSTER --region=$REGION --format="(ipAddress)"
ipAddress: 10.94.0.11
student@cloudshell:~ (test-project-402417)$ 
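The describe command prints the address as a "key: value" line, so the value has to be copied out by hand. As a hedged sketch, the line can be parsed in the shell as shown below; parse_instance_ip is a hypothetical helper, and the sample line is taken from the expected output above. Using --format="value(ipAddress)" instead should print the bare address directly.

```shell
# Extract the address from a line such as "ipAddress: 10.94.0.11".
# parse_instance_ip is a hypothetical helper for illustration.
parse_instance_ip() {
    printf '%s\n' "$1" | awk -F': ' '$1 == "ipAddress" { print $2 }'
}

# Sample line copied from the expected output above.
INSTANCE_IP=$(parse_instance_ip "ipAddress: 10.94.0.11")
echo "$INSTANCE_IP"
```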

Return to the previous Cloud Shell tab with the open SSH session to your VM.

Use the noted $PGPASSWORD value and the instance IP to connect to AlloyDB from the GCE VM:

export PGPASSWORD=<Noted password>
export INSTANCE_IP=<AlloyDB Instance IP>
psql "host=$INSTANCE_IP user=postgres sslmode=require"

Expected console output:

student@instance-1:~$ export PGPASSWORD=P9...
student@instance-1:~$ export INSTANCE_IP=10.94.0.11
student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres sslmode=require"
psql (13.11 (Debian 13.11-0+deb11u1), server 14.7)
WARNING: psql major version 13, server major version 14.
         Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

postgres=> 
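The same connection string is typed many times in the following steps, with only the database name changing. One possible way to avoid typos is a small wrapper like the sketch below. The conninfo function is a hypothetical helper, not part of the codelab, and it assumes the postgres user and sslmode=require as used above.

```shell
# Build a libpq connection string for psql.
# Arguments: instance IP, optional database name (defaults to postgres).
# conninfo is a hypothetical helper for illustration.
conninfo() {
    host=$1
    db=${2:-postgres}
    if [ -z "$host" ]; then
        echo "conninfo: instance IP is required" >&2
        return 1
    fi
    echo "host=$host user=postgres dbname=$db sslmode=require"
}

conninfo 10.94.0.11 quickstart_db
```

It could then be used as psql "$(conninfo "$INSTANCE_IP" quickstart_db)".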

6. Prepare Database

We need to create a database, enable Vertex AI integration, create database objects and import the data.

Grant Necessary Permissions to AlloyDB

Add Vertex AI permissions to the AlloyDB service agent.

In the second Cloud Shell session (where you are not connected to the VM) execute:

PROJECT_ID=$(gcloud config get-value project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-11039]
student@cloudshell:~ (test-project-001-402417)$ gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
Updated IAM policy for project [test-project-001-402417].
bindings:
- members:
  - serviceAccount:service-4470404856@gcp-sa-alloydb.iam.gserviceaccount.com
  role: roles/aiplatform.user
- members:
...
etag: BwYIEbe_Z3U=
version: 1
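The --member value in the command above is the AlloyDB service agent, whose address is derived from the project number. The sketch below shows only that string construction; alloydb_service_agent is a hypothetical helper name, and the project number is the sample value from the expected output.

```shell
# Compose the AlloyDB service agent address from a project number.
# alloydb_service_agent is a hypothetical helper for illustration.
alloydb_service_agent() {
    echo "service-$1@gcp-sa-alloydb.iam.gserviceaccount.com"
}

# 4470404856 is the sample project number from the expected output.
alloydb_service_agent 4470404856
```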
 

Create Database

Create a database named quickstart_db.

In the GCE VM session execute:

psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE quickstart_db"

Expected console output:

student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres" -c "CREATE DATABASE quickstart_db"
CREATE DATABASE
student@instance-1:~$  

Enable Vertex AI Integration

Enable Vertex AI integration and the pgvector extension in the database.

In the GCE VM execute:

psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE"
psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS vector"

Expected console output:

student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS google_ml_integration CASCADE"
CREATE EXTENSION
student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "CREATE EXTENSION IF NOT EXISTS vector"  
CREATE EXTENSION
student@instance-1:~$ 

Import Data

Download the prepared data and import it into the new database.

In the GCE VM execute:

gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_demo_schema.sql |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"
gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_products from stdin csv header"
gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_inventory.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_inventory from stdin csv header"
gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_stores.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_stores from stdin csv header"

Expected console output:

student@instance-1:~$ gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_demo_schema.sql |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"
SET
SET
SET
SET
SET
 set_config 
------------
 
(1 row)
SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE TABLE
ALTER TABLE
CREATE SEQUENCE
ALTER TABLE
ALTER SEQUENCE
ALTER TABLE
ALTER TABLE
ALTER TABLE
student@instance-1:~$ gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_products.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_products from stdin csv header"
COPY 941
student@instance-1:~$ gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_inventory.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_inventory from stdin csv header"
COPY 263861
student@instance-1:~$ gsutil cat gs://cloud-training/gcc/gcc-tech-004/cymbal_stores.csv |psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db" -c "\copy cymbal_stores from stdin csv header"
COPY 4654
student@instance-1:~$

7. Calculate embeddings

After importing the data, we have product data in the cymbal_products table, inventory showing the number of available products in each store in the cymbal_inventory table, and the list of stores in the cymbal_stores table. We need to calculate vector data based on our product descriptions, and we are going to use the embedding function for that. The function uses the Vertex AI integration to calculate the vectors, which we then add to the table. You can read more about the technology used in the documentation.

Create embedding column

Connect to the database using psql and create a generated column with the vector data using the embedding function in the cymbal_products table. The embedding function returns vector data from Vertex AI based on the data supplied from the product_description column.

In the VM SSH session connect to the database:

psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"

In the psql session after connecting to the database execute:

ALTER TABLE cymbal_products ADD COLUMN embedding vector GENERATED ALWAYS AS (embedding('textembedding-gecko@001',product_description)) STORED;

The command creates the generated column and populates it with vector data.

Expected console output:

student@instance-1:~$ psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"
psql (13.11 (Debian 13.11-0+deb11u1), server 14.7)
WARNING: psql major version 13, server major version 14.
         Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

quickstart_db=> ALTER TABLE cymbal_products ADD COLUMN embedding vector GENERATED ALWAYS AS (embedding('textembedding-gecko@001',product_description)) STORED;
ALTER TABLE
quickstart_db=> 

8. Run Similarity Search

We can now run a similarity search that compares the vector values calculated for the product descriptions with the vector value we get for our request.

Run Similarity Search from psql

If your database session was disconnected then connect to the database again using psql.

Connect to the database:

psql "host=$INSTANCE_IP user=postgres dbname=quickstart_db"

Run a query to get a list of available products most closely related to a client's request. The request we are going to pass to Vertex AI to get the vector value is "What kind of fruit trees grow well here?"

Here is the query you can run to choose the first 10 items most suitable for our request:

SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        (cp.embedding <=> embedding('textembedding-gecko@001','What kind of fruit trees grow well here?')::vector) as distance
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 10;

And here is the expected output:

quickstart_db=> SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        (cp.embedding <=> embedding('textembedding-gecko@001','What kind of fruit trees grow well here?')::vector) as distance
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 10;
                                                            product_name                                                             |                                   description                                    | sale_price | zip_code |      distance       
-------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------+------------+----------+---------------------
 Meyer Lemon Tree                                                                                                                    | Meyer Lemon trees are California's favorite lemon tree! Grow your own lemons by  |         34 |    93230 |  0.3064513262485483
 Cherry Tree                                                                                                                         | This is a beautiful cherry tree that will produce delicious cherries. It is an d |      75.00 |    93230 |  0.3069243955283131
 Toyon                                                                                                                               | This is a beautiful toyon tree that can grow to be over 20 feet tall. It is an e |      10.00 |    93230 |   0.313714042750346
 Fremont Cottonwood                                                                                                                  | This is a beautiful cottonwood tree that can grow to be over 100 feet tall. It i |     200.00 |    93230 |  0.3336509009628251
 Moon & Stars Yellow Flesh Watermelon - 2 g ~25 Seeds - Heirloom, Open Pollinated, Non-GMO, Farm & Vegetable Gardening / Fruit Seeds | This watermelon is a family heirloom from Georgia although it was said that this |       2.49 |    93230 |  0.3387332564671345
 Maple Tree                                                                                                                          | This is a beautiful maple tree that will produce colorful leaves in the fall. It |     100.00 |    93230 | 0.33925288236373474
 California Sycamore                                                                                                                 | This is a beautiful sycamore tree that can grow to be over 100 feet tall. It is  |     300.00 |    93230 |  0.3404536395103116
 California Lilac                                                                                                                    | This is a beautiful lilac tree that can grow to be over 10 feet tall. It is an d |       5.00 |    93230 | 0.34110167449998563
 California Black Walnut                                                                                                             | This is a beautiful walnut tree that can grow to be over 80 feet tall. It is a d |     100.00 |    93230 | 0.34131240295701903
 Cypress Tree                                                                                                                        | This is a beautiful cypress tree that will provide shade and privacy. It is an e |      75.00 |    93230 | 0.34409911854076425
(10 rows)

quickstart_db=> 
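The distance column comes from the pgvector <=> operator, which computes cosine distance: 1 minus the cosine similarity of the two vectors. Smaller values mean closer matches, which is why the query sorts in ascending order. The sketch below reproduces the arithmetic for tiny hand-made vectors using awk. cosine_distance is a hypothetical helper, and the real embeddings are far longer than these toy inputs.

```shell
# Cosine distance between two comma-separated vectors: 1 - (a.b)/(|a||b|).
# cosine_distance is a hypothetical helper for illustration.
cosine_distance() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        n = split(a, x, ",")
        split(b, y, ",")
        for (i = 1; i <= n; i++) {
            dot += x[i] * y[i]   # dot product a.b
            na  += x[i] * x[i]   # squared length of a
            nb  += y[i] * y[i]   # squared length of b
        }
        printf "%.4f\n", 1 - dot / sqrt(na * nb)
    }'
}

cosine_distance "1,0" "0,1"       # orthogonal vectors: 1.0000
cosine_distance "1,2,3" "2,4,6"   # same direction: 0.0000
```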

9. Improve Response

You can improve the response to a client application by using the query results as part of a prompt to a Vertex AI generative foundation language model, which prepares a meaningful output.

To achieve that, we generate JSON from the vector search results, then add that JSON to a prompt for a text LLM in Vertex AI to create a meaningful output. In the first step we generate the JSON, then we test it in Vertex AI Studio, and in the last step we incorporate it into a SQL statement that can be used in an application.

Generate output in JSON format

Modify the query to generate the output in JSON format and return only one row to pass to Vertex AI.

Here is an example of the query:

WITH trees as (
SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        cp.uniq_id as product_id
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        (cp.embedding <=> embedding('textembedding-gecko@001','What kind of fruit trees grow well here?')::vector) ASC
LIMIT 1)
SELECT json_agg(trees) FROM trees;

And here is the expected output:

[{"product_name":"Meyer Lemon Tree","description":"Meyer Lemon trees are California's favorite lemon tree! Grow your own lemons by ","sale_price":34,"zip_code":93230,"product_id":"02056727942aeb714dc9a2313654e1b0"}]

Run the prompt in Vertex AI Studio

We can supply the generated JSON as part of the prompt to a generative AI text model in Vertex AI Studio.

Open Vertex AI Studio in the Cloud Console.

Here is the prompt we are going to use:

You are a friendly advisor helping to find a product based on the customer's needs.

Based on the client request we have loaded a list of products closely related to search.

The list is in JSON format with values like {"product_name":"name","description":"some description","sale_price":10,"zip_code": 10234, "product_id": "02056727942aeb714dc9a2313654e1b0"}

Here is the list of products:

[place for our JSON]

The customer asked "What tree is growing the best here?"

You should give information about the product, price and some supplemental information

When we run the prompt with our JSON values, the answer provided by the model in this example is:

"The Meyer Lemon Tree is California's favorite lemon tree! It is a citrus tree that produces fragrant, juicy lemons. The tree is easy to grow and care for, and it is a great addition to any garden. The Meyer Lemon Tree is currently on sale for $34.

Here are some additional tips for growing lemon trees:

  • Choose a sunny spot in your garden for your lemon tree.
  • Plant your lemon tree in well-drained soil.
  • Water your lemon tree regularly, but do not overwater it.
  • Fertilize your lemon tree every few months with a citrus fertilizer.
  • Prune your lemon tree regularly to keep it healthy and productive."
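The prompt used in Vertex AI Studio is plain text with the JSON dropped into the marked place. As a rough sketch, the substitution can be done in the shell before pasting the text into the Studio. build_prompt is a hypothetical helper, the prompt wording is abbreviated from the one above, and the sample JSON is a shortened version of the query result.

```shell
# Assemble the prompt text around a JSON product list.
# Arguments: JSON list, customer question.
# build_prompt is a hypothetical helper for illustration.
build_prompt() {
    printf '%s\n' \
        "You are a friendly advisor helping to find a product based on the customer's needs." \
        "Based on the client request we have loaded a list of products closely related to search." \
        "Here is the list of products:" \
        "$1" \
        "The customer asked \"$2\"" \
        "You should give information about the product, price and some supplemental information"
}

# Shortened sample row; the real JSON comes from the json_agg query.
PRODUCTS='[{"product_name":"Meyer Lemon Tree","sale_price":34}]'
build_prompt "$PRODUCTS" "What kind of fruit trees grow well here?"
```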

Run the prompt in psql

We can also supply the same generated JSON as part of the prompt to the generative AI text model using SQL.

In the psql session connected to the database, run the query:

WITH trees as (
SELECT
    cp.product_name,
    cp.product_description as description,
    cp.sale_price,
    cs.zip_code,
    cp.uniq_id as product_id
FROM
    cymbal_products cp
JOIN cymbal_inventory ci on
    ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
    cs.store_id=ci.store_id
    AND ci.inventory>0
    AND cs.store_id = 1583
ORDER BY
    (cp.embedding <=> embedding('textembedding-gecko@001','What kind of fruit trees grow well here?')::vector) ASC
LIMIT 1),
prompt as (
select
    'You are a friendly advisor helping to find a product based on the customer''s needs.
Based on the client request we have loaded a list of products closely related to search.
The list in JSON format with list of values like {"product_name":"name","product_description":"some description","sale_price":10}
Here is the list of products:' || json_agg(trees) || 'The customer asked "What kind of fruit trees grow well here?"
You should give information about the product, price and some supplemental information' as prompt
from
    trees)
select
    ml_predict_row(
 FORMAT('publishers/google/models/%s',
    'text-bison'),
    json_build_object('instances',
    json_build_object('prompt',
    prompt),
    'parameters',
    json_build_object('maxOutputTokens',
    2048))
    )->'predictions'->0->'content'
from
    prompt
;

And here is the expected output:

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 " The Meyer Lemon Tree is a popular choice for California gardens. It is a semi-dwarf tree that produces juicy lemons that are perfect for baking, seasoning, and marinades. The tree has evergreen foliage and fragrant flowers that bloom winter through fall. It is grafted onto semi-dwarf C-35 rootstock and arrives between 30 in.-36 in. tall (excluding can). The sale price is $34."
(1 row)

10. Clean up environment

Destroy the AlloyDB instances and cluster when you are done with the lab.

Delete AlloyDB cluster and all instances

The cluster is destroyed with the --force option, which also deletes all the instances belonging to the cluster.

In Cloud Shell, define the project and environment variables again if you've been disconnected and all the previous settings are lost:

gcloud config set project <your project id>
export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
export PROJECT_ID=$(gcloud config get-value project)
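Because a new or reconnected Cloud Shell session starts without these variables, the delete commands can fail or target an empty name. A small preflight check such as the sketch below can catch that early. require_vars is a hypothetical helper, not part of the codelab, and it is written for a POSIX shell.

```shell
# Return non-zero if any of the named variables is unset or empty.
# require_vars is a hypothetical helper for illustration.
require_vars() {
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing variable: $name" >&2
            missing=1
        fi
    done
    return $missing
}

REGION=us-central1
ADBCLUSTER=alloydb-aip-01
require_vars REGION ADBCLUSTER && echo "ready"
```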

Delete the cluster:

gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force
All of the cluster data will be lost when the cluster is deleted.

Do you want to continue (Y/n)?  Y

Operation ID: operation-1697820178429-6082890a0b570-4a72f7e4-4c5df36f
Deleting cluster...done.   

Delete AlloyDB Backups

Delete all AlloyDB backups for the cluster:

for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done
Operation ID: operation-1697826266108-60829fb7b5258-7f99dc0b-99f3c35f
Deleting backup...done.                                                                                                                                                                                                                                                            
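The loop above calls basename because backups list returns full resource names, while backups delete is given only the final path component together with --region. The sketch below shows that step in isolation. The backup name is a made-up sample, and backup_id is a hypothetical helper.

```shell
# A made-up backup resource name; real ones come from `backups list`.
BACKUP=projects/test-project-001-402417/locations/us-central1/backups/sample-backup-01

# Keep only the last path component, which is the backup ID.
# backup_id is a hypothetical helper for illustration.
backup_id() {
    basename "$1"
}

backup_id "$BACKUP"
```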

Now we can destroy our VM.

Destroy GCE VM

Delete the GCE VM.

In Cloud Shell execute:

export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet

Expected console output:

student@cloudshell:~ (test-project-001-402417)$ export GCEVM=instance-1
export ZONE=us-central1-a
gcloud compute instances delete $GCEVM \
    --zone=$ZONE \
    --quiet
Deleted 

11. Congratulations

Congratulations on completing the codelab.

What we've covered

  • How to deploy an AlloyDB cluster and primary instance
  • How to connect to AlloyDB from a Google Compute Engine VM
  • How to create a database and enable AlloyDB AI
  • How to load data into the database
  • How to use a Vertex AI embedding model in AlloyDB
  • How to enrich the result using a Vertex AI generative model
