AlloyDB AI 操作器和重新排名

1. 简介

本 Codelab 提供了有关如何部署具有 AI 运算符的 AlloyDB 并利用它们执行语义搜索、联接和结果排名等任务的指南。

前提条件

  • 对 Google Cloud 控制台有基本的了解
  • 具备命令行界面和 Cloud Shell 方面的基本技能

学习内容

  • 如何部署 AlloyDB for Postgres
  • 如何启用 AlloyDB AI 运算符
  • 如何使用不同的 AlloyDB AI 运算符
  • 如何在 AlloyDB AI 运算符中使用重新排名来改进结果输出

所需条件

  • Google Cloud 账号和 Google Cloud 项目
  • 支持 Google Cloud 控制台和 Cloud Shell 的网络浏览器,例如 Chrome

2. 设置和要求

自定进度的环境设置

  1. 登录 Google Cloud 控制台,然后创建一个新项目或重复使用现有项目。如果您还没有 Gmail 或 Google Workspace 账号,则必须创建一个

fbef9caa1602edd0.png

a99b7ace416376c4.png

5e3ff691252acf41.png

  • 项目名称是此项目参与者的显示名称。它是 Google API 尚未使用的字符串。您可以随时对其进行更新。
  • 项目 ID 在所有 Google Cloud 项目中是唯一的,并且是不可变的(一经设置便无法更改)。Cloud 控制台会自动生成一个唯一字符串;通常情况下,您无需关注该字符串。在大多数 Codelab 中,您都需要引用项目 ID(通常用 PROJECT_ID 标识)。如果您不喜欢生成的 ID,可以再随机生成一个 ID。或者,您也可以尝试自己的项目 ID,看看是否可用。完成此步骤后便无法更改该 ID,并且此 ID 在项目期间会一直保留。
  • 此外,还有第三个值,即部分 API 使用的项目编号,供您参考。如需详细了解所有这三个值,请参阅文档
  1. 接下来,您需要在 Cloud 控制台中启用结算功能,以便使用 Cloud 资源/API。运行此 Codelab 应该不会产生太多的费用(如果有的话)。若要关闭资源以避免产生超出本教程范围的结算费用,您可以删除自己创建的资源或删除项目。Google Cloud 新用户符合参与 300 美元免费试用计划的条件。

启动 Cloud Shell

虽然可以通过笔记本电脑对 Google Cloud 进行远程操作,但在此 Codelab 中,您将使用 Google Cloud Shell,这是一个在云端运行的命令行环境。

Google Cloud 控制台 中,点击右上角工具栏中的 Cloud Shell 图标:

55efc1aaa7a4d3ad.png

预配和连接到环境应该只需要片刻时间。完成后,您应该会看到如下内容:

7ffe5cbb04455448.png

这个虚拟机已加载了您需要的所有开发工具。它提供了一个持久的 5 GB 主目录,并且在 Google Cloud 中运行,大大增强了网络性能和身份验证功能。您在此 Codelab 中的所有工作都可以在浏览器中完成。您无需安装任何程序。

3. 准备工作

启用 API

在 Cloud Shell 中,确保项目 ID 已设置:

gcloud config set project [YOUR-PROJECT-ID]

设置环境变量 PROJECT_ID:

PROJECT_ID=$(gcloud config get-value project)

启用所有必要的服务:

gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       aiplatform.googleapis.com \
                       discoveryengine.googleapis.com

预期输出

student@cloudshell:~ (test-project-001-402417)$ gcloud config set project test-project-001-402417
Updated property [core/project].
student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-14650]
student@cloudshell:~ (test-project-001-402417)$ 
student@cloudshell:~ (test-project-001-402417)$ gcloud services enable alloydb.googleapis.com \
                       compute.googleapis.com \
                       cloudresourcemanager.googleapis.com \
                       servicenetworking.googleapis.com \
                       aiplatform.googleapis.com
Operation "operations/acat.p2-4470404856-1f44ebd8-894e-4356-bea7-b84165a57442" finished successfully.

4. 部署 AlloyDB

创建 AlloyDB 集群和主实例。以下步骤介绍了如何使用 Google Cloud SDK 创建 AlloyDB 集群和实例。如果您偏好使用控制台方法,可以点击此处查看相关文档。

在创建 AlloyDB 集群之前,我们需要在 VPC 中分配一个可用的专用 IP 范围,以供未来的 AlloyDB 实例使用。如果我们没有该服务账号,则需要创建该服务账号,并将其分配给内部 Google 服务使用,之后我们才能创建集群和实例。

创建专用 IP 范围

我们需要在 VPC 中为 AlloyDB 配置专用服务访问配置。这里假设我们的项目中有“默认”VPC 网络,它将用于所有操作。

创建专用 IP 范围:

gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default

使用分配的 IP 范围创建专用连接:

gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default

预期的控制台输出:

student@cloudshell:~ (test-project-402417)$ gcloud compute addresses create psa-range \
    --global \
    --purpose=VPC_PEERING \
    --prefix-length=24 \
    --description="VPC private service access" \
    --network=default
Created [https://www.googleapis.com/compute/v1/projects/test-project-402417/global/addresses/psa-range].

student@cloudshell:~ (test-project-402417)$ gcloud services vpc-peerings connect \
    --service=servicenetworking.googleapis.com \
    --ranges=psa-range \
    --network=default
Operation "operations/pssn.p24-4470404856-595e209f-19b7-4669-8a71-cbd45de8ba66" finished successfully.

student@cloudshell:~ (test-project-402417)$

创建 AlloyDB 集群

在本部分中,我们将在 us-central1 区域中创建一个 AlloyDB 集群。

为 postgres 用户定义密码。您可以自行定义密码,也可以使用随机函数生成密码

export PGPASSWORD=`openssl rand -hex 12`

预期的控制台输出:

student@cloudshell:~ (test-project-402417)$ export PGPASSWORD=`openssl rand -hex 12`

请记下该 PostgreSQL 密码,以备将来使用。

echo $PGPASSWORD

您日后需要使用该密码以 postgres 用户身份连接到实例。建议您将其写下来或复制到某个地方,以便日后使用。

预期的控制台输出:

student@cloudshell:~ (test-project-402417)$ echo $PGPASSWORD
bbefbfde7601985b0dee5723

创建免费试用集群

如果您之前未使用过 AlloyDB,可以创建一个免费试用集群:

定义区域和 AlloyDB 集群名称。我们将使用 us-central1 区域,并将 alloydb-aip-01 作为集群名称:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01

运行命令以创建集群:

gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION \
    --subscription-type=TRIAL

预期的控制台输出:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION \
    --subscription-type=TRIAL
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.                                                                                                                                                                                                                                                           

在同一 Cloud Shell 会话中为集群创建 AlloyDB 主实例。如果您断开连接,则需要再次定义区域和集群名称环境变量。

gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=8 \
    --region=$REGION \
    --cluster=$ADBCLUSTER

预期的控制台输出:

student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=8 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.                                                                                                                                                                                                                                                     

创建 AlloyDB Standard 集群

如果这不是您在项目中创建的第一个 AlloyDB 集群,请继续创建标准集群。

定义区域和 AlloyDB 集群名称。我们将使用 us-central1 区域,并将 alloydb-aip-01 作为集群名称:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01

运行命令以创建集群:

gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION

预期的控制台输出:

export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
gcloud alloydb clusters create $ADBCLUSTER \
    --password=$PGPASSWORD \
    --network=default \
    --region=$REGION 
Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4
Creating cluster...done.                                                                                                                                                                                                                                                           

在同一 Cloud Shell 会话中为集群创建 AlloyDB 主实例。如果您断开连接,则需要再次定义区域和集群名称环境变量。

gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --cluster=$ADBCLUSTER

预期的控制台输出:

student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \
    --instance-type=PRIMARY \
    --cpu-count=2 \
    --region=$REGION \
    --availability-type ZONAL \
    --cluster=$ADBCLUSTER
Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721
Creating instance...done.                                                                                                                                                                                                                                                     

5. 准备数据库

我们需要创建数据库、启用 Vertex AI 集成、创建数据库对象并导入数据。

向 AlloyDB 授予必要权限

向 AlloyDB 服务代理添加 Vertex AI 权限。

使用顶部的“+”号打开另一个 Cloud Shell 标签页。

4ca978f5142bb6ce.png

在新的 Cloud Shell 标签页中,执行以下命令:

PROJECT_ID=$(gcloud config get-value project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/discoveryengine.viewer"

预期的控制台输出:

student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project)
Your active configuration is: [cloudshell-11039]
student@cloudshell:~ (test-project-001-402417)$ gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"
Updated IAM policy for project [test-project-001-402417].
bindings:
- members:
  - serviceAccount:service-4470404856@gcp-sa-alloydb.iam.gserviceaccount.com
  role: roles/aiplatform.user
- members:
...
etag: BwYIEbe_Z3U=
version: 1
 

在标签页中执行命令“exit”,关闭该标签页:

exit

连接到 AlloyDB Studio

在以下章节中,所有需要连接到数据库的 SQL 命令都可以通过 AlloyDB Studio 执行。如需运行该命令,您需要点击主实例,打开 AlloyDB 集群的 Web 控制台界面。

ef4bfbcf0ed2ef3a.png

然后点击左侧的 AlloyDB Studio:

5c155cbcd7d43a1.png

选择 postgres 数据库、用户 postgres,并提供我们在创建集群时记下的密码。然后点击“验证”按钮。

1c9dab73c6836798.png

系统会打开 AlloyDB Studio 界面。如需在数据库中运行命令,请点击右侧的“编辑器 1”标签页。

b36c28f8165119ca.png

系统会打开一个界面,您可以在其中运行 SQL 命令

cf43aa20f292797e.png

创建数据库

创建数据库快速入门。

在 AlloyDB Studio 编辑器中,执行以下命令。

创建数据库:

CREATE DATABASE quickstart_db

预期输出:

Statement executed successfully

连接到 quickstart_db

使用切换用户/数据库按钮重新连接到工作室。

e826ad973eb23a74.png

从下拉列表中选择新的 quickstart_db 数据库,并使用与之前相同的用户和密码。

1ca70c59b5aea8c1.png

系统将打开一个新连接,您可以在其中处理 quickstart_db 数据库中的对象。

验证 google_ml 扩展程序

检查 google_ml 扩展程序版本,确保其为 1.4.4 或更高版本,以便能够使用 AI 查询引擎。

在 AlloyDB Studio 中,连接到 quickstart_db 后执行以下操作:

SELECT extversion FROM pg_extension WHERE extname = 'google_ml_integration';

预期输出:

1.4.4

如果版本低于所需版本,请更新扩展程序。

在 AlloyDB Studio 中,连接到 quickstart_db 后执行以下操作:

CALL google_ml.upgrade_to_preview_version();

预期输出:

Statement executed successfully

成功执行后,再次验证版本。

在 AlloyDB Studio 中,连接到 quickstart_db 后执行以下操作:

SELECT extversion FROM pg_extension WHERE extname = 'google_ml_integration';

预期输出:

1.4.4

6. 示例数据

现在,我们需要在数据库中创建对象并加载数据。我们将使用一个包含几行的虚构电影数据集。

将以下语句复制到 AlloyDB Studio 编辑器,然后按“运行”按钮。

-- Drop tables if they exist to prevent errors on re-running the script
DROP TABLE IF EXISTS movie_reviews;
DROP TABLE IF EXISTS movies;

-- Create the 'movies' table
-- This table stores information about each movie.
CREATE TABLE movies (
    id BIGINT PRIMARY KEY,              -- Unique identifier for the movie
    title TEXT NOT NULL,               -- Title of the movie
    description TEXT,                  -- A brief description or synopsis of the movie
    genres TEXT,                       -- Comma-separated list of genres (e.g., "Action, Adventure, Sci-Fi")
    actors TEXT                        -- Comma-separated list of main actors
);

-- Create the 'movie_reviews' table
-- This table stores reviews for the movies.
CREATE TABLE movie_reviews (
    review_id BIGINT PRIMARY KEY,      -- Unique identifier for the review
    movie_id BIGINT NOT NULL,          -- Foreign key referencing the movie being reviewed
    reviewer_name TEXT,                -- Name of the person who wrote the review
    rating INT CHECK (rating >= 1 AND rating <= 5), -- Rating from 1 to 5 stars
    review_text TEXT,                  -- The content of the review
    review_date DATE DEFAULT CURRENT_DATE, -- Date when the review was submitted
    FOREIGN KEY (movie_id) REFERENCES movies(id) ON DELETE CASCADE -- Ensures referential integrity; if a movie is deleted, its reviews are also deleted.
);

-- Insert sample data into the 'movies' table (20 rows)
INSERT INTO movies (id, title, description, genres, actors) VALUES
(1, 'Inception', 'A thief who steals information by entering people''s dreams.', 'Sci-Fi, Thriller, Action', 'Leonardo DiCaprio, Joseph Gordon-Levitt, Elliot Page'),
(2, 'The Matrix', 'A computer hacker learns about the true nature of his reality.', 'Sci-Fi, Action', 'Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss'),
(3, 'Interstellar', 'A team of explorers journey through a cosmic passage beyond our world in an attempt to ensure humanity''s survival.', 'Sci-Fi, Drama, Adventure', 'Matthew McConaughey, Anne Hathaway, Jessica Chastain'), -- Updated description
(4, 'The Dark Knight', 'When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice.', 'Action, Crime, Drama', 'Christian Bale, Heath Ledger, Aaron Eckhart'),
(5, 'Pulp Fiction', 'The lives of two mob hitmen, a boxer, a gangster and his wife, and a pair of diner bandits intertwine in four tales of violence and redemption.', 'Crime, Drama', 'John Travolta, Uma Thurman, Samuel L. Jackson'),
(6, 'Forrest Gump', 'The presidencies of Kennedy and Johnson, the Vietnam War, the Watergate scandal and other historical events unfold from the perspective of an Alabama man with an IQ of 75.', 'Drama, Romance', 'Tom Hanks, Robin Wright, Gary Sinise'),
(7, 'The Shawshank Redemption', 'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.', 'Drama', 'Tim Robbins, Morgan Freeman, Bob Gunton'),
(8, 'Gladiator', 'A former Roman General sets out to exact vengeance against the corrupt emperor who murdered his family and sent him into slavery.', 'Action, Adventure, Drama', 'Russell Crowe, Joaquin Phoenix, Connie Nielsen'),
(9, 'Fight Club', 'An insomniac office worker looking for a way to change his life crosses paths with a devil-may-care soap maker and they form an underground fight club that evolves into something much, much more.', 'Drama', 'Brad Pitt, Edward Norton, Meat Loaf'),
(10, 'The Lord of the Rings: The Return of the King', 'Gandalf and Aragorn lead the World of Men against Sauron''s army to draw his gaze from Frodo and Sam as they approach Mount Doom with the One Ring.', 'Action, Adventure, Drama', 'Elijah Wood, Viggo Mortensen, Ian McKellen'),
(11, 'Spirited Away', 'During her family''s move to the suburbs, a sullen 10-year-old girl wanders into a world ruled by gods, witches, and spirits, and where humans are changed into beasts.', 'Animation, Adventure, Family', 'Daveigh Chase, Suzanne Pleshette, Miyu Irino'),
(12, 'Parasite', 'Greed and class discrimination threaten the newly formed symbiotic relationship between the wealthy Park family and the destitute Kim clan.', 'Comedy, Drama, Thriller', 'Song Kang-ho, Lee Sun-kyun, Cho Yeo-jeong'),
(13, 'The Godfather', 'The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son.', 'Crime, Drama', 'Marlon Brando, Al Pacino, James Caan'),
(14, 'Avengers: Endgame', 'After the devastating events of Avengers: Infinity War, the universe is in ruins. With the help of remaining allies, the Avengers assemble once more in order to reverse Thanos'' actions and restore balance to the universe.', 'Action, Adventure, Drama', 'Robert Downey Jr., Chris Evans, Mark Ruffalo'),
(15, 'Joker', 'In Gotham City, mentally troubled comedian Arthur Fleck is disregarded and mistreated by society. He then embarks on a downward spiral of revolution and bloody crime.', 'Crime, Drama, Thriller', 'Joaquin Phoenix, Robert De Niro, Zazie Beetz'),
(16, 'Mad Max: Fury Road', 'In a post-apocalyptic wasteland, a woman rebels against a tyrannical ruler in search for her homeland with the help of a group of female prisoners, a psychotic worshiper, and a drifter named Max.', 'Action, Adventure, Sci-Fi', 'Tom Hardy, Charlize Theron, Nicholas Hoult'),
(17, 'Coco', 'Aspiring musician Miguel, confronted with his family''s ancestral ban on music, enters the Land of the Dead to find his great-great-grandfather, a legendary singer.', 'Animation, Adventure, Family', 'Anthony Gonzalez, Gael García Bernal, Benjamin Bratt'),
(18, 'Whiplash', 'A promising young drummer enrolls at a cut-throat music conservatory where his dreams of greatness are mentored by an instructor who will stop at nothing to realize a student''s potential.', 'Drama, Music', 'Miles Teller, J.K. Simmons, Paul Reiser'),
(19, 'The Grand Budapest Hotel', 'The adventures of Gustave H, a legendary concierge at a famous hotel from the fictional Republic of Zubrowka between the first and second World Wars, and Zero Moustafa, the lobby boy who becomes his most trusted friend.', 'Adventure, Comedy, Drama', 'Ralph Fiennes, F. Murray Abraham, Mathieu Amalric'),
(20, 'Blade Runner 2049', 'Young Blade Runner K''s discovery of a long-buried secret leads him to track down former Blade Runner Rick Deckard, who''s been missing for thirty years.', 'Action, Drama, Mystery', 'Ryan Gosling, Harrison Ford, Ana de Armas');

-- Insert sample data into the 'movie_reviews' table (30 rows)
-- Reviews are linked to movies via movie_id. Includes a mix of positive and negative reviews.
-- Movie title is prepended to the review text.
INSERT INTO movie_reviews (review_id, movie_id, reviewer_name, rating, review_text) VALUES
(1, 1, 'Alice Wonderland', 5, 'Inception: Absolutely mind-bending! A masterpiece of sci-fi.'),
(2, 1, 'Bob The Critic', 2, 'Inception: Too confusing and pretentious. Didn''t enjoy it.'),
(3, 2, 'Charlie Reviewer', 5, 'The Matrix: Revolutionary visuals and a compelling story.'),
(4, 3, 'Diana Prince', 5, 'Interstellar: Visually stunning and emotionally powerful. A must-see.'),
(5, 3, 'Edward Nigma', 4, 'Interstellar: Long, but worth it for the spectacle and ideas.'),
(6, 4, 'Fiona Glenanne', 5, 'The Dark Knight: Heath Ledger''s Joker is iconic. Dark and thrilling.'),
(7, 5, 'George Costanza', 5, 'Pulp Fiction: Quirky, violent, and endlessly quotable.'),
(8, 5, 'Hannah Montana', 1, 'Pulp Fiction: Way too violent and the timeline was confusing. Hated it.'),
(9, 6, 'Ian Malcolm', 4, 'Forrest Gump: A heartwarming story with a great performance by Hanks.'),
(10, 7, 'Jane Doe', 5, 'The Shawshank Redemption: An uplifting story of hope and friendship. Perfect.'),
(11, 7, 'John Smith', 5, 'The Shawshank Redemption: Morgan Freeman is amazing. Truly a classic.'),
(12, 8, 'Kyle Broflovski', 2, 'Gladiator: Generic plot and boring action scenes. Overrated.'),
(13, 9, 'Laura Palmer', 5, 'Fight Club: Provocative and thought-provoking. Norton and Pitt are fantastic.'),
(14, 10, 'Michael Scott', 5, 'The Lord of the Rings: The Return of the King: A fitting and epic conclusion to a legendary trilogy.'),
(15, 11, 'Nancy Drew', 5, 'Spirited Away: Beautiful animation and a magical story for all ages.'),
(16, 12, 'Oscar Martinez', 5, 'Parasite: A brilliant satire with unexpected twists. Loved it!'),
(17, 12, 'Pam Beesly', 4, 'Parasite: Very intense, but incredibly well-directed and acted.'),
(18, 13, 'Quentin Coldwater', 5, 'The Godfather: A cinematic masterpiece. Brando is unforgettable.'),
(19, 14, 'Rachel Green', 3, 'Avengers: Endgame: It was okay, but felt bloated and had too many characters.'),
(20, 14, 'Steve Rogers', 5, 'Avengers: Endgame: The culmination of a decade of storytelling. Perfect ending.'),
(21, 15, 'Tony Stark', 4, 'Joker: A dark and disturbing character study. Phoenix is mesmerizing.'),
(22, 16, 'Uma Thurman', 5, 'Mad Max: Fury Road: Non-stop action and incredible practical effects. What a ride!'),
(23, 17, 'Victor Frankenstein', 5, 'Coco: A heartwarming and visually stunning celebration of family and culture.'),
(24, 18, 'Walter White', 5, 'Whiplash: Intense and gripping. J.K. Simmons is terrifyingly good.'),
(25, 19, 'Xena Warrior', 2, 'The Grand Budapest Hotel: Too quirky for its own good. Style over substance.'),
(26, 20, 'Ygritte Snow', 5, 'Blade Runner 2049: A worthy sequel that expands on the original in meaningful ways. Visually breathtaking.'),
(27, 1, 'Zack Morris', 4, 'Inception: Kept me on the edge of my seat. Very clever.'),
(28, 4, 'Buffy Summers', 5, 'The Dark Knight: The best superhero movie ever made. Ledger is a legend.'),
(29, 8, 'Clark Kent', 3, 'Gladiator: Decent action, but the story felt predictable and dragged a bit.'),
(30, 15, 'Diana Troy', 3, 'Joker: Hard to watch at times, but a powerful performance. Felt it was a bit one-note though.');

如果您有自己的示例数据,并且您的 CSV 文件与 Cloud 控制台中提供的 Cloud SQL 导入工具兼容,则可以使用该工具,而不是采用此处介绍的方法。

7. 使用 IF 运算符

我们先尝试使用标准 PostgreSQL 方法进行经典搜索。

如果我们想搜索太空冒险电影,可以尝试以下查询

SELECT title,description AS movies_about_space
    FROM movies
    WHERE description like '%space%' OR title like '%space%';

未返回任何结果。但我确信我们至少有一部电影属于该类别。我们可以尝试使用全文搜索方法。

SELECT title,description
FROM movies
WHERE to_tsvector('english', description) @@ to_tsquery('english', 'space');

我们可能根本得不到任何结果。因此,我们需要知道一些关键字或词组,才能利用 PostgreSQL 搜索的“经典”技巧。

现在,我们可以尝试使用 google_ml.if 函数进行 AI 驱动的语义过滤。它会在后台使用 AI 根据我们的自然语言请求执行语义过滤。

SELECT title,description AS movies_about_space
    FROM movies
    WHERE
      google_ml.if(
        prompt => 'Here are descriptions of movies, can you return the ones about space adventures:  '||description);

即使请求的标题或说明中没有“太空”一词,我们也会根据请求的语义含义获取电影《星际穿越》。您可能已经注意到,我们也没有预先构建任何内容,而是完全依赖于自动内置函数。

8. 将 JOIN 与 IF 运算符搭配使用

如果我们想使用 AI 驱动的语义过滤功能联接两个表,该怎么办?例如,如果用户评价中提及了某部电影,我们可以尝试根据用户评价将用户与电影进行匹配。

在新 AlloyDB Studio 编辑器标签页中运行:

SELECT title, rating, movie_reviews 
    FROM movies
    JOIN
    movie_reviews ON 
      google_ml.if(
        prompt => 'Does the following reviews talk about a movie mentioned? The review: ' || review_text||' and the movie title is: '||title)
    AND
    title='Interstellar';

我们获得了两篇与我们的要求相符的影评,因为标题中提到了电影名称。我们甚至可以进一步简化请求:

SELECT title, rating, movie_reviews 
    FROM movies
    JOIN
    movie_reviews ON 
      google_ml.if(
        prompt => 'Do we have the movie in the review?: ' || review_text||' and the movie title is: '||title)
    AND
    title='Interstellar';

9. 根据内容对结果进行评分

movie_reviews 表包含电影的评分,但如果我们想实现自己的评分,可以使用 google_ml.rank 函数。我们可以根据自然语言定义的条件对评价进行评分,并获取电影的前 5 条评价,然后显示原始评分以供比较。

SELECT rating,review_text AS top_five
    FROM movie_reviews
    ORDER BY google_ml.rank('Score of 7 to 10 if the review says the movie was really good, 3 to 6 if the review says it''s alright is and 1 to 2 if the review says it was not worth of time. Review: ' || review_text) DESC
    LIMIT 5;

如果我们想在原始评分旁边显示新评分,可以将其添加到列列表中。

SELECT rating,
    google_ml.rank('Score of 7 to 10 if the review says the movie was really good, 3 to 6 if the review says it''s alright is and 1 to 2 if the review says it was not worth of time. Review: ' || review_text) AS ml_rank,
    review_text AS top_five
    FROM movie_reviews
    ORDER BY ml_rank DESC
    LIMIT 5;

以下是前 5 条评价。

 rating | ml_rank |                               top_five
--------+---------+-----------------------------------------------------------------------
      5 |       9 | The Dark Knight: Heath Ledger's Joker is iconic. Dark and thrilling.
      5 |       9 | The Matrix: Revolutionary visuals and a compelling story.
      5 |       9 | Interstellar: Visually stunning and emotionally powerful. A must-see.
      5 |       9 | Inception: Absolutely mind-bending! A masterpiece of sci-fi.
      5 |       9 | Pulp Fiction: Quirky, violent, and endlessly quotable.
(5 rows)
5 rows in set (0.13 sec)

对于热门评价,所选的评分是 10 分中的 9 分。

如需详细了解 AlloyDB AI 运算符,请参阅文档

10. 使用排名改进语义搜索

我们可以将语义搜索与排名相结合,以获得更精确的结果。

对重新排名模型进行访问

如要使用排名模型,我们必须启用 Discovery Engine API 并向 AlloyDB 服务账号授予“discoveryengine.viewer”角色。在实验的前几个步骤中,我们已启用 API 和角色。ai.rank 函数会自动在 Vertex AI 上查找并使用最新的重排名模型。

在查询中使用重新排名模型

现在,我们可以在查询中使用重新排名模型,从而改进语义搜索结果,使其更具体并选择最佳选项。

我们来查找动作片,然后使用“电脑和未来”作为条件对它们进行排名。

WITH
  action_movies AS (
    SELECT
      title,
      description,
      ROW_NUMBER() OVER (ORDER BY title, description) AS ref_number
    FROM
      movies
    WHERE
      google_ml.if(
        prompt => 'The following movies are action movies. The movie title: ' || title || ' and the description is: ' || description
      ) 
  ),
  ranked_documents_array AS (
    SELECT
      ARRAY_AGG(description ORDER BY ref_number) AS docs
    FROM
      action_movies
  ),
  reranked_results AS (
    SELECT
      r.index,
      r.score
    FROM
      ranked_documents_array,
      ai.rank(
        model_id => 'semantic-ranker-default',
        search_string => 'Computers and future',
        documents => ranked_documents_array.docs
      ) AS r 
  )
SELECT
  am.title,
  left(am.description,80) as description,
  rr.score
FROM
  action_movies am
JOIN
  reranked_results rr ON am.ref_number = rr.index 
ORDER BY
  rr.score DESC;

结果将列出动作片,并尽可能将有关未来和计算机的电影放在顶部。

                     title                     |                                   description                                    | score  
-----------------------------------------------+----------------------------------------------------------------------------------+--------
 The Matrix                                    | A computer hacker learns about the true nature of his reality.                   | 0.1197
 Inception                                     | A thief who steals information by entering people's dreams.                      | 0.0646
 Blade Runner 2049                             | Young Blade Runner K's discovery of a long-buried secret leads him to track down |  0.022
 Mad Max: Fury Road                            | In a post-apocalyptic wasteland, a woman rebels against a tyrannical ruler in se | 0.0206
 Gladiator                                     | A former Roman General sets out to exact vengeance against the corrupt emperor w | 0.0189
 Avengers: Endgame                             | After the devastating events of Avengers: Infinity War, the universe is in ruins | 0.0175
 Fight Club                                    | An insomniac office worker looking for a way to change his life crosses paths wi | 0.0162
 The Dark Knight                               | When the menace known as the Joker wreaks havoc and chaos on the people of Gotha | 0.0095
 The Lord of the Rings: The Return of the King | Gandalf and Aragorn lead the World of Men against Sauron's army to draw his gaze | 0.0056
(9 rows)

尝试使用不同的条件,看看排名对输出顺序有何影响。

如需详细了解这些选项和重新排名,请参阅文档

11. 清理环境

完成实验后,销毁 AlloyDB 实例和集群。

如果您曾使用 AlloyDB 试用版。如果您计划使用试用集群测试其他实验和资源,请勿删除试用集群。您将无法在同一项目中创建其他试用集群。

删除 AlloyDB 集群和所有实例

系统会通过强制选项销毁集群,该选项还会删除属于该集群的所有实例。

如果您已断开连接且之前的所有设置都已丢失,请在 Cloud Shell 中定义项目和环境变量:

gcloud config set project <your project id>
export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
export PROJECT_ID=$(gcloud config get-value project)

删除集群:

gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force

预期的控制台输出:

student@cloudshell:~ (test-project-001-402417)$ gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force
All of the cluster data will be lost when the cluster is deleted.

Do you want to continue (Y/n)?  Y

Operation ID: operation-1697820178429-6082890a0b570-4a72f7e4-4c5df36f
Deleting cluster...done.   

删除 AlloyDB 备份

删除集群的所有 AlloyDB 备份:

for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done

预期的控制台输出:

student@cloudshell:~ (test-project-001-402417)$ for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done
Operation ID: operation-1697826266108-60829fb7b5258-7f99dc0b-99f3c35f
Deleting backup...done.                                                                                                                                                                                                                                                            

12. 恭喜

恭喜您完成此 Codelab。

所学内容

  • 如何部署 AlloyDB for Postgres
  • 如何启用 AlloyDB AI 运算符
  • 如何使用不同的 AlloyDB AI 运算符
  • 如何在 AlloyDB AI 运算符中使用重新排名来改进结果输出

13. 调查问卷

输出如下:

您打算如何使用本教程?

仅通读 阅读并完成练习