Getting started with Vector Embeddings in Cloud SQL for MySQL

Getting started with Vector Embeddings in Cloud SQL for MySQL

About this codelab

subjectLast updated Apr 22, 2025
account_circleWritten by Gleb Otochkin

1. Introduction

In this codelab you will learn how to use Cloud SQL for MySQL Vertex AI integration by combining vector search with Vertex AI embeddings.

30b7c4dcdd8bb68f.png

Prerequisites

  • A basic understanding of Google Cloud, console
  • Basic skills in command line interface and Cloud Shell

What you'll learn

  • How to deploy a Cloud SQL for MySQL instance
  • How to create database and enable Cloud SQL AI integration
  • How to load data to the database
  • How to use Vertex AI embedding model in Cloud SQL
  • How to enrich the result using Vertex AI generative model
  • How to improve performance using vector index

What you'll need

  • A Google Cloud Account and Google Cloud Project
  • A web browser such as Chrome supporting Google Cloud console and Cloud Shell

2. Setup and Requirements

Self-paced environment setup

  1. Sign-in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

fbef9caa1602edd0.png

a99b7ace416376c4.png

5e3ff691252acf41.png

  • The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
  • The Project ID is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you might generate another random one. Alternatively, you can try your own, and see if it's available. It can't be changed after this step and remains for the duration of the project.
  • For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.
  1. Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Google Cloud Shell, a command line environment running in the Cloud.

From the Google Cloud Console, click the Cloud Shell icon on the top right toolbar:

55efc1aaa7a4d3ad.png

It should only take a few moments to provision and connect to the environment. When it is finished, you should see something like this:

7ffe5cbb04455448.png

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on Google Cloud, greatly enhancing network performance and authentication. All of your work in this codelab can be done within a browser. You do not need to install anything.

3. Before you begin

Enable API

Inside Cloud Shell, make sure that your project ID is setup:

gcloud config set project [YOUR-PROJECT-ID]

Set environment variable PROJECT_ID:

PROJECT_ID=$(gcloud config get-value project)

Enable all necessary services:

gcloud services enable sqladmin.googleapis.com \
                       compute
.googleapis.com \
                       cloudresourcemanager
.googleapis.com \
                       servicenetworking
.googleapis.com \
                       aiplatform
.googleapis.com

Expected output

student@cloudshell:~ (test-project-001-402417)$ gcloud config set project test-project-001-402417
Updated property [core/project].
student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-14650]
student@cloudshell:~ (test-project-001-402417)$ 
student@cloudshell:~ (test-project-001-402417)$ gcloud services enable sqladmin.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       aiplatform.googleapis.com
Operation "operations/acat.p2-4470404856-1f44ebd8-894e-4356-bea7-b84165a57442" finished successfully.

4. Create a Cloud SQL instance

Create Cloud SQL instance with database integration with Vertex AI.

Create Database password

Define password for the default database user. You can define your own password or use a random function to generate one:

export CLOUDSQL_PASSWORD=`openssl rand -hex 12`

Note the generated value for the password:

echo $CLOUDSQL_PASSWORD

Create Cloud SQL for MySQL Instance

The cloudsql_vector flag can be enabled when creating an instance. Vector support is currently available for MySQL 8.0 R20241208.01_00 or newer

In the Cloud Shell session execute:

gcloud sql instances create my-cloudsql-instance \
--database-version=MYSQL_8_4 \
--tier=db-custom-2-8192 \
--region=us-central1 \
--enable-google-ml-integration \
--edition=ENTERPRISE \
--root-password=$CLOUDSQL_PASSWORD

We can verify our connection executing from the Cloud Shell

gcloud sql connect my-cloudsql-instance --user=root

Run the command and put your password in the prompt when it is ready to connect.

The expected output:

$gcloud sql connect my-cloudsql-instance --user=root
Allowlisting your IP for incoming connection for 5 minutes...done.                                                                                                                           
Connecting to database with SQL user [root].Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 71
Server version: 8.4.4-google (Google)

Copyright (c) 2000, 2025, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Enable Vertex AI Integration

Grant necessary privileges to the internal cloud sql service account to be able to use Vertex AI integration.

Find out Cloud SQL internal service account email and export it as a variable.

SERVICE_ACCOUNT_EMAIL=$(gcloud sql instances describe my-cloudsql-instance --format="value(serviceAccountEmailAddress)")
echo $SERVICE_ACCOUNT_EMAIL

Grant access to Vertex AI to the Cloud SQL service account:

gcloud projects add-iam-policy-binding $PROJECT_ID \
 
--member="serviceAccount:$SERVICE_ACCOUNT_EMAIL" \
 
--role="roles/aiplatform.user"

Read more about instance creation and configuration in the Cloud SQL documentation here.

5. Prepare Database

Now we need to create a database and enable vectors support.

Create Database

Create a database with the name quickstart_db .To do so we have different options such as command line database clients such as mysql for mySQL, SDK or Cloud SQL Studio. We will use the SDK (gcloud) for creating the database.

In the Cloud Shell execute command to create the database

gcloud sql databases create quickstart_db --instance=my-cloudsql-instance

6. Load Data

Now we need to create objects in the database and load data. We are going to use fictional Cymbal Store data. The data are available in SQL (for schema) and CSV format (for data).

Cloud Shell will be our main environment to connect to a database, create all the objects and load the data.

First we need to add our Cloud Shell public IP to the list of authorized networks for our Cloud SQL instance. In the cloud shell execute:

gcloud sql instances patch my-cloudsql-instance --authorized-networks=$(curl ifconfig.me)

If you session was lost, reset or you work from another tool then export your CLOUDSQL_PASSWORD variable again:

export CLOUDSQL_PASSWORD=...your password defined for the instance...

Now we can create all required objects in our database. To do so we are going to use the MySQL mysql utility in combination with curl utility which gets the data from the public source.

In the cloud shell execute:

export INSTANCE_IP=$(gcloud sql instances describe my-cloudsql-instance --format="value(ipAddresses.ipAddress)")
curl -LJ https://raw.githubusercontent.com/GoogleCloudPlatform/devrel-demos/main/infrastructure/cymbal-store-embeddings/cymbal_mysql_schema.sql | mysql --host=$INSTANCE_IP --user=root --password=$CLOUDSQL_PASSWORD quickstart_db

What did we do exactly in the previous command? We connected to our database and executed the downloaded SQL code which created tables, indexes and sequences.

The next step is to load the cymbal_products data. We use the same curl and mysql utilities.

curl -LJ https://raw.githubusercontent.com/GoogleCloudPlatform/devrel-demos/main/infrastructure/cymbal-store-embeddings/cymbal_products.csv | mysql --enable-local-infile --host=$INSTANCE_IP --user=root --password=$CLOUDSQL_PASSWORD quickstart_db -e "LOAD DATA LOCAL INFILE '/dev/stdin'  INTO TABLE cymbal_products FIELDS TERMINATED BY ','  OPTIONALLY ENCLOSED BY '\"'  LINES TERMINATED BY '\n'  IGNORE 1 LINES;"

Then we continue with cymbal_stores.

curl -LJ https://raw.githubusercontent.com/GoogleCloudPlatform/devrel-demos/main/infrastructure/cymbal-store-embeddings/cymbal_stores.csv | mysql --enable-local-infile --host=$INSTANCE_IP --user=root --password=$CLOUDSQL_PASSWORD quickstart_db -e "LOAD DATA LOCAL INFILE '/dev/stdin'  INTO TABLE cymbal_stores FIELDS TERMINATED BY ','  OPTIONALLY ENCLOSED BY '\"'  LINES TERMINATED BY '\n'  IGNORE 1 LINES;"

And complete with cymbal_inventory which has the number of each product in each store.

curl -LJ https://raw.githubusercontent.com/GoogleCloudPlatform/devrel-demos/main/infrastructure/cymbal-store-embeddings/cymbal_inventory.csv | mysql --enable-local-infile --host=$INSTANCE_IP --user=root --password=$CLOUDSQL_PASSWORD quickstart_db -e "LOAD DATA LOCAL INFILE '/dev/stdin'  INTO TABLE cymbal_inventory FIELDS TERMINATED BY ','  OPTIONALLY ENCLOSED BY '\"'  LINES TERMINATED BY '\n'  IGNORE 1 LINES;"

If you have your own sample data and your CSV files compatible with the Cloud SQL import tool available from the Cloud console you can use it instead of the presented approach.

7. Create Embeddings

The next step is to build embeddings for our product descriptions using the textembedding-005 model from Google Vertex AI and store them in the new column in the table cymbal_products.

To store the vector data we need to enable vector functionality in our Cloud SQL instance. Execute in the Cloud Shell:

gcloud sql instances patch my-cloudsql-instance \
--database-flags=cloudsql_vector=on

Connect to the database:

mysql --host=$INSTANCE_IP --user=root --password=$CLOUDSQL_PASSWORD quickstart_db

And create a virtual column embedding in our cymbal_products table using the embedding function.

ALTER TABLE cymbal_products ADD COLUMN embedding vector(768) using varbinary;
UPDATE cymbal_products SET embedding = mysql.ml_embedding('text-embedding-005', product_description);

Generating vector embeddings for 2000 rows usually takes under 5 minutes, but could sometimes take slightly longer, and often finishes much faster.

8. Run Similarity Search

We can now run our search using similarity search based on vector values calculated for the descriptions and the vector value we generate for our request using the same embedding model.

The SQL query can be executed from the same command line interface or, as alternative, from Cloud SQL Studio. Any multirow and complex query is better to manage in the Cloud SQL Studio.

Create a User

We need a new user which can use the Cloud SQL Studio. We are going to create a built-in type user student with the same password as we have used for the root user.

In the Cloud Shell execute:

gcloud sql users create student  --instance=my-cloudsql-instance --password=$CLOUDSQL_PASSWORD --host=%

Start Cloud SQL Studio

In the console click on the Cloud SQL instance we've created earlier.

667b658dbf98eb0b.png

When it is open on the right panel we can see Cloud SQL Studio. Click on it.

a879e8ac914a8ce9.png

It will open a dialog where you provide database name and your credentials:

  • Database: quickstart_db
  • User: student
  • Password: your noted password for the user

And click on the "AUTHENTICATE" button.

36e6036847333d18.png

It will open the next window where you click on the "Editor" tab on the right side to open the SQL Editor.

d803b7b6a798094f.png

Now we are ready to run our queries.

Run Query

Run a query to get a list of available products most closely related to a client's request. The request we are going to pass to Vertex AI to get the vector value sounds like "What kind of fruit trees grow well here?"

Here is the query you can run to choose first 5 items most suitable for our request using cosine_distance function:

SELECT mysql.ML_EMBEDDING('text-embedding-005','What kind of fruit trees grow well here?') into @query_vector;
SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        cosine_distance(cp.embedding ,@query_vector) as distance
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 5;

Copy and paste the query to the Cloud SQL Studio editor and push the "RUN" button or paste it in your command line session connecting to the quickstart_db database.

dffc70835901cf03.png

And here is a list of products chosen matching the query.

+-----------------+----------------------------------------------------------------------------------+------------+----------+---------------------+
| product_name    | description                                                                      | sale_price | zip_code | distance            |
+-----------------+----------------------------------------------------------------------------------+------------+----------+---------------------+
| Malus Domestica | Malus Domestica, the classic apple tree, brings beauty and delicious fruit to yo |     100.00 |    93230 | 0.37740096545831603 |
| Cerasus         | Cerasus: A beautiful cherry tree that brings delicious fruit and vibrant color t |      75.00 |    93230 |   0.405704177142419 |
| Persica         | Persica: Enjoy homegrown, delicious peaches with this beautiful peach tree. Reac |     150.00 |    93230 | 0.41031799106722877 |
| Meyer Lemon     | Grow your own juicy Meyer Lemons with this semi-dwarf tree, California's favorit |      34.00 |    93230 | 0.42823360959352186 |
| Acer            | Acer, the classic maple. Known for vibrant fall foliage in reds, oranges, and ye |     100.00 |    93230 | 0.42953897057301615 |
+-----------------+----------------------------------------------------------------------------------+------------+----------+---------------------+
5 rows in set (0.13 sec)

The query execution took 0.13 sec with the cosine_distance function.

Now we run the same query but using KNN search using approx_distance function. If we don't have an ANN index for our embeddings it automatically reverts to exact search behind the scenes:

SELECT mysql.ML_EMBEDDING('text-embedding-005','What kind of fruit trees grow well here?') into @query_vector;
SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        approx_distance(cp.embedding ,@query_vector, 'distance_measure=cosine') as distance
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 5;

And here is a list of products returned by the query.

+-----------------+----------------------------------------------------------------------------------+------------+----------+---------------------+
| product_name    | description                                                                      | sale_price | zip_code | distance            |
+-----------------+----------------------------------------------------------------------------------+------------+----------+---------------------+
| Malus Domestica | Malus Domestica, the classic apple tree, brings beauty and delicious fruit to yo |     100.00 |    93230 | 0.37740096545831603 |
| Cerasus         | Cerasus: A beautiful cherry tree that brings delicious fruit and vibrant color t |      75.00 |    93230 |   0.405704177142419 |
| Persica         | Persica: Enjoy homegrown, delicious peaches with this beautiful peach tree. Reac |     150.00 |    93230 | 0.41031799106722877 |
| Meyer Lemon     | Grow your own juicy Meyer Lemons with this semi-dwarf tree, California's favorit |      34.00 |    93230 | 0.42823360959352186 |
| Acer            | Acer, the classic maple. Known for vibrant fall foliage in reds, oranges, and ye |     100.00 |    93230 | 0.42953897057301615 |
+-----------------+----------------------------------------------------------------------------------+------------+----------+---------------------+
5 rows in set, 1 warning (0.12 sec)

The query execution took only 0.12 of a second. We got the same results as for cosine_distance function.

9. Improve LLM Response Using Retrieved Data

We can improve the Gen AI LLM response to a client application using the result of the executed query and prepare a meaningful output using the supplied query results as part of the prompt to a Vertex AI generative foundation language model.

To achieve that we need to generate a JSON with our results from the vector search, then use that generated JSON as addition to a prompt for a LLM model in Vertex AI to create a meaningful output. In the first step we generate the JSON, then we test it in the Vertex AI Studio and in the last step we incorporate it into a SQL statement which can be used in an application.

Generate output in JSON format

Modify the query to generate the output in JSON format and return only one row to pass to Vertex AI

Here is the example of the query using ANN search:

SELECT mysql.ML_EMBEDDING('text-embedding-005','What kind of fruit trees grow well here?') into @query_vector;
WITH trees as (
SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        cp.uniq_id as product_id
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        (approx_distance(cp.embedding ,@query_vector, 'distance_measure=cosine')) ASC
LIMIT 1)
SELECT json_arrayagg(json_object('product_name',product_name,'description',description,'sale_price',sale_price,'zip_code',zip_code,'product_id',product_id)) FROM trees;

And here is the expected JSON in the output:

[{"zip_code": 93230, "product_id": "23e41a71d63d8bbc9bdfa1d118cfddc5", "sale_price": 100.00, "description": "Malus Domestica, the classic apple tree, brings beauty and delicious fruit to yo", "product_name": "Malus Domestica"}]

Run the prompt in Vertex AI Studio

We can use the generated JSON to supply it as a part of the prompt to generative AI text model in the Vertex AI Studio

Open the Vertex AI Studio Prompt in the cloud console.

411ffb9d164ac140.png

It may ask you to enable additional APIs but you can ignore the request. We don't need any additional APIs to finish our lab.

Here is the prompt we are going to use:

You are a friendly advisor helping to find a product based on the customer's needs.
Based on the client request we have loaded a list of products closely related to search.
The list in JSON format with list of values like {"product_name":"name","description":"some description","sale_price":10,"zip_code": 10234, "produt_id": "02056727942aeb714dc9a2313654e1b0"}
Here is the list of products:
[place your JSON here]
The customer asked "What tree is growing the best here?"
You should give information about the product, price and some supplemental information.
Do not ask any additional questions and assume location based on the zip code provided in the list of products.

And here how it looks when we replace the JSON placeholder by the response from the query:

You are a friendly advisor helping to find a product based on the customer's needs.
Based on the client request we have loaded a list of products closely related to search.
The list in JSON format with list of values like {"product_name":"name","description":"some description","sale_price":10,"zip_code": 10234, "produt_id": "02056727942aeb714dc9a2313654e1b0"}
Here is the list of products:
{"zip_code": 93230, "product_id": "23e41a71d63d8bbc9bdfa1d118cfddc5", "sale_price": 100.00, "description": "Malus Domestica, the classic apple tree, brings beauty and delicious fruit to yo", "product_name": "Malus Domestica"}
The customer asked "What tree is growing the best here?"
You should give information about the product, price and some supplemental information.
Do not ask any additional questions and assume location based on the zip code provided in the list of products.

And here is the result when we run the prompt with our JSON values and using the gemini-2.0-flash model:

9839af512686130d.png

The answer we got from the model in this example follows using the results of the semantic search and best matched product available in the mentioned zip code.

Run the prompt in PSQL

We can also use the Cloud SQL AI integration with Vertex AI to get the similar response from a generative model using SQL directly in the database.

Now we can use the generated in a subquery with JSON results to supply it as a part of the prompt to generative AI text model using SQL.

In the mysql or Cloud SQL Studio session to the database run the query

SELECT mysql.ML_EMBEDDING('text-embedding-005','What kind of fruit trees grow well here?') into @query_vector;
WITH trees AS (
SELECT
        cp.product_name,
        cp.product_description AS description,
        cp.sale_price,
        cs.zip_code,
        cp.uniq_id AS product_id
FROM
        cymbal_products cp
JOIN cymbal_inventory ci ON
        ci.uniq_id = cp.uniq_id
JOIN cymbal_stores cs ON
        cs.store_id = ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
         (approx_distance(cp.embedding ,@query_vector, 'distance_measure=cosine')) ASC
LIMIT 1),
prompt AS (
SELECT
       CONCAT( 'You are a friendly advisor helping to find a product based on the customer''s needs.
Based on the client request we have loaded a list of products closely related to search.
The list in JSON format with list of values like {"product_name":"name","product_description":"some description","sale_price":10}
Here is the list of products:', json_arrayagg(json_object('product_name',trees.product_name,'description',trees.description,'sale_price',trees.sale_price,'zip_code',trees.zip_code,'product_id',trees.product_id)) , 'The customer asked "What kind of fruit trees grow well here?"
You should give information about the product, price and some supplemental information') AS prompt_text
FROM
        trees),
response AS (
SELECT
       mysql.ML_PREDICT_ROW('publishers/google/models/gemini-2.0-flash-001:generateContent',
        json_object('contents',
        json_object('role',
        'user',
        'parts',
        json_array(
        json_object('text',
        prompt_text))))) AS resp
FROM
        prompt)
SELECT
JSON_EXTRACT(resp, '$.candidates[0].content.parts[0].text')
FROM
        response;

And here is the sample output. Your output might be different depending on the model version and parameters.:

"Okay, I see you're looking for fruit trees that grow well in your area. Based on the available product, the **Malus Domestica** (Apple Tree) is a great option to consider!\n\n* **Product:** Malus Domestica (Apple Tree)\n* **Description:** This classic apple tree grows to about 30 feet tall and provides beautiful seasonal color with green leaves in summer and fiery colors in the fall. It's known for its strength and provides good shade. Most importantly, it produces delicious apples!\n* **Price:** \\$100.00\n* **Growing Zones:** This particular apple tree is well-suited for USDA zones 4-8. Since your zip code is 93230, you are likely in USDA zone 9a or 9b. While this specific tree is rated for zones 4-8, with proper care and variety selection, apple trees can still thrive in slightly warmer climates. You may need to provide extra care during heat waves.\n\n**Recommendation:** I would recommend investigating varieties of Malus Domestica suited to slightly warmer climates or contacting a local nursery/arborist to verify if it is a good fit for your local climate conditions.\n"

The output is provided in markdown format.

10. Create a nearest-neighbor index

Our dataset is relatively small and the response time primarily depends on interactions with AI models. But when you have millions of vectors the vector search can take a significant portion of our response time and put a high load on the system. To improve that we can build an index on top of our vectors.

Create ScANN index

We are going to try the ScANN index type for our test.

To build the index for our embedding column we need to define our distance measurement for the embedding column. You can read about the parameters in detail in the documentation.

CREATE VECTOR INDEX cymbal_products_embedding_idx ON cymbal_products(embedding) USING SCANN DISTANCE_MEASURE=COSINE;

Compare Response

Now we can run the vector search query again and see the results

SELECT mysql.ML_EMBEDDING('text-embedding-005','What kind of fruit trees grow well here?') into @query_vector;
SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        approx_distance(cp.embedding ,@query_vector, 'distance_measure=cosine') as distance
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 5;

Expected output:

+-----------------+----------------------------------------------------------------------------------+------------+----------+---------------------+
| product_name    | description                                                                      | sale_price | zip_code | distance            |
+-----------------+----------------------------------------------------------------------------------+------------+----------+---------------------+
| Malus Domestica | Malus Domestica, the classic apple tree, brings beauty and delicious fruit to yo |     100.00 |    93230 | 0.37740096545831603 |
| Cerasus         | Cerasus: A beautiful cherry tree that brings delicious fruit and vibrant color t |      75.00 |    93230 |   0.405704177142419 |
| Persica         | Persica: Enjoy homegrown, delicious peaches with this beautiful peach tree. Reac |     150.00 |    93230 | 0.41031799106722877 |
| Meyer Lemon     | Grow your own juicy Meyer Lemons with this semi-dwarf tree, California's favorit |      34.00 |    93230 | 0.42823360959352186 |
| Acer            | Acer, the classic maple. Known for vibrant fall foliage in reds, oranges, and ye |     100.00 |    93230 | 0.42953897057301615 |
+-----------------+----------------------------------------------------------------------------------+------------+----------+---------------------+
5 rows in set (0.08 sec)

We can see that the execution time was only slightly different but it is expected for such a small dataset. It should be much more noticeable for big datasets with millions of vectors.

And we can have a look to the execution plan using EXPLAIN command:

SELECT mysql.ML_EMBEDDING('text-embedding-005','What kind of fruit trees grow well here?') into @query_vector;
EXPLAIN ANALYZE SELECT
        cp.product_name,
        left(cp.product_description,80) as description,
        cp.sale_price,
        cs.zip_code,
        approx_distance(cp.embedding ,@query_vector, 'distance_measure=cosine') as distance
FROM
        cymbal_products cp
JOIN cymbal_inventory ci on
        ci.uniq_id=cp.uniq_id
JOIN cymbal_stores cs on
        cs.store_id=ci.store_id
        AND ci.inventory>0
        AND cs.store_id = 1583
ORDER BY
        distance ASC
LIMIT 5;

Execution plan (excerpt):

...
-> Nested loop inner join  (cost=443 rows=5) (actual time=1.14..1.18 rows=5 loops=1)
                                -> Vector index scan on cp  (cost=441 rows=5) (actual time=1.1..1.1 rows=5 loops=1)
                                -> Single-row index lookup on cp using PRIMARY (uniq_id=cp.uniq_id)  (cost=0.25 rows=1) (actual time=0.0152..0.0152 rows=1 loops=5)

...

We can see that it was using vector index scan on cp (alias for the cymbal_products table).

You can experiment with your own data or test different search queries to see how the semantic search works in MySQL.

11. Clean up environment

Delete the Cloud SQL instance

Destroy the Cloud SQL instance when you are done with the lab

In the cloud shell define the project and environment variables if you've been disconnected and all the previous settings are lost:

export INSTANCE_NAME=my-cloudsql-instance
export PROJECT_ID=$(gcloud config get-value project)

Delete the instance:

gcloud sql instances delete $INSTANCE_NAME --project=$PROJECT_ID

Expected console output:

student@cloudshell:~$ gcloud sql instances delete $INSTANCE_NAME --project=$PROJECT_ID
All of the instance data will be lost when the instance is deleted.

Do you want to continue (Y/n)?  y

Deleting Cloud SQL instance...done.                                                                                                                
Deleted [https://sandbox.googleapis.com/v1beta4/projects/test-project-001-402417/instances/my-cloudsql-instance].

12. Congratulations

Congratulations for completing the codelab.

What we've covered

  • How to deploy a Cloud SQL for MySQL instance
  • How to create database and enable Cloud SQL AI integration
  • How to load data to the database
  • How to use Vertex AI embedding model in Cloud SQL
  • How to enrich the result using Vertex AI generative model
  • How to improve performance using vector index

Try similar codelab for AlloyDB or a codelab for Cloud SQL for Postgres

13. Survey

Output:

How will you use this tutorial?