关于此 Codelab
1. 简介
本 Codelab 提供了有关如何部署 AlloyDB 和 AI 运算符的指南,以及如何将其用于语义搜索、联接和结果排名等任务。
前提条件
- 对 Google Cloud 控制台有基本的了解
- 具备命令行界面和 Cloud Shell 方面的基本技能
学习内容
- 如何部署 AlloyDB for Postgres
- 如何启用 AlloyDB AI 操作器
- 如何使用不同的 AlloyDB AI 运算符
- 如何在 RAG 工作流中使用 AlloyDB AI 操作器
所需条件
- Google Cloud 账号和 Google Cloud 项目
- 支持 Google Cloud 控制台和 Cloud Shell 的网络浏览器,例如 Chrome
2. 设置和要求
自定进度的环境设置
- 登录 Google Cloud 控制台,然后创建一个新项目或重复使用现有项目。如果您还没有 Gmail 或 Google Workspace 账号,则必须创建一个。
- 项目名称是此项目参与者的显示名称。它是 Google API 尚未使用的字符串。您可以随时对其进行更新。
- 项目 ID 在所有 Google Cloud 项目中是唯一的,并且是不可变的(一经设置便无法更改)。Cloud 控制台会自动生成一个唯一字符串;通常情况下,您无需关注该字符串。在大多数 Codelab 中,您都需要引用项目 ID(通常用
PROJECT_ID
标识)。如果您不喜欢生成的 ID,可以再随机生成一个 ID。或者,您也可以尝试自己的项目 ID,看看是否可用。完成此步骤后便无法更改该 ID,并且此 ID 在项目期间会一直保留。 - 此外,还有第三个值,即部分 API 使用的项目编号,供您参考。如需详细了解所有这三个值,请参阅文档。
- 接下来,您需要在 Cloud 控制台中启用结算功能,以便使用 Cloud 资源/API。运行此 Codelab 应该不会产生太多的费用(如果有的话)。若要关闭资源以避免产生超出本教程范围的结算费用,您可以删除自己创建的资源或删除项目。Google Cloud 新用户符合参与 300 美元免费试用计划的条件。
启动 Cloud Shell
虽然可以通过笔记本电脑对 Google Cloud 进行远程操作,但在此 Codelab 中,您将使用 Google Cloud Shell,这是一个在云端运行的命令行环境。
在 Google Cloud 控制台 中,点击右上角工具栏中的 Cloud Shell 图标:
预配和连接到环境应该只需要片刻时间。完成后,您应该会看到如下内容:
这个虚拟机已加载了您需要的所有开发工具。它提供了一个持久的 5 GB 主目录,并且在 Google Cloud 中运行,大大增强了网络性能和身份验证功能。您在此 Codelab 中的所有工作都可以在浏览器中完成。您无需安装任何程序。
3. 准备工作
启用 API
在 Cloud Shell 中,确保项目 ID 已设置:
gcloud config set project [YOUR-PROJECT-ID]
设置环境变量 PROJECT_ID:
PROJECT_ID=$(gcloud config get-value project)
启用所有必要的服务:
gcloud services enable alloydb.googleapis.com \
compute.googleapis.com \
cloudresourcemanager.googleapis.com \
servicenetworking.googleapis.com \
aiplatform.googleapis.com \
discoveryengine.googleapis.com \
secretmanager.googleapis.com
预期输出
student@cloudshell:~ (test-project-001-402417)$ gcloud config set project test-project-001-402417 Updated property [core/project]. student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project) Your active configuration is: [cloudshell-14650] student@cloudshell:~ (test-project-001-402417)$ student@cloudshell:~ (test-project-001-402417)$ gcloud services enable alloydb.googleapis.com \ compute.googleapis.com \ cloudresourcemanager.googleapis.com \ servicenetworking.googleapis.com \ aiplatform.googleapis.com Operation "operations/acat.p2-4470404856-1f44ebd8-894e-4356-bea7-b84165a57442" finished successfully.
4. 部署 AlloyDB
创建 AlloyDB 集群和主实例。以下步骤介绍了如何使用 Google Cloud SDK 创建 AlloyDB 集群和实例。如果您更喜欢使用控制台方法,可以按照此处的文档操作。
在创建 AlloyDB 集群之前,我们需要在 VPC 中有一个可用的专用 IP 范围,以供未来的 AlloyDB 实例使用。如果没有,我们需要创建一个,并将其分配给 Google 内部服务使用,然后才能创建集群和实例。
创建专用 IP 范围
我们需要在 VPC 中为 AlloyDB 配置专用服务访问配置。这里假设我们的项目中有“默认”VPC 网络,它将用于所有操作。
创建专用 IP 范围:
gcloud compute addresses create psa-range \
--global \
--purpose=VPC_PEERING \
--prefix-length=24 \
--description="VPC private service access" \
--network=default
使用分配的 IP 范围创建专用连接:
gcloud services vpc-peerings connect \
--service=servicenetworking.googleapis.com \
--ranges=psa-range \
--network=default
预期的控制台输出:
student@cloudshell:~ (test-project-402417)$ gcloud compute addresses create psa-range \ --global \ --purpose=VPC_PEERING \ --prefix-length=24 \ --description="VPC private service access" \ --network=default Created [https://www.googleapis.com/compute/v1/projects/test-project-402417/global/addresses/psa-range]. student@cloudshell:~ (test-project-402417)$ gcloud services vpc-peerings connect \ --service=servicenetworking.googleapis.com \ --ranges=psa-range \ --network=default Operation "operations/pssn.p24-4470404856-595e209f-19b7-4669-8a71-cbd45de8ba66" finished successfully. student@cloudshell:~ (test-project-402417)$
创建 AlloyDB 集群
在本部分中,我们将在 us-central1 区域创建一个 AlloyDB 集群。
为 postgres 用户定义密码。您可以自行定义密码,也可以使用随机函数生成密码
export PGPASSWORD=`openssl rand -hex 12`
预期的控制台输出:
student@cloudshell:~ (test-project-402417)$ export PGPASSWORD=`openssl rand -hex 12`
请记下该 PostgreSQL 密码,以备将来使用:
echo $PGPASSWORD
预期的控制台输出:
student@cloudshell:~ (test-project-402417)$ echo $PGPASSWORD bbefbfde7601985b0dee5723
创建免费试用集群
如果您之前从未使用过 AlloyDB,则可以创建免费的试用集群:
定义区域和 AlloyDB 集群名称。我们将使用 us-central1 区域和 alloydb-aip-01 作为集群名称:
export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
运行以下命令以创建集群:
gcloud alloydb clusters create $ADBCLUSTER \
--password=$PGPASSWORD \
--network=default \
--region=$REGION \
--subscription-type=TRIAL
预期的控制台输出:
export REGION=us-central1 export ADBCLUSTER=alloydb-aip-01 gcloud alloydb clusters create $ADBCLUSTER \ --password=$PGPASSWORD \ --network=default \ --region=$REGION \ --subscription-type=TRIAL Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4 Creating cluster...done.
在同一 Cloud Shell 会话中,为我们的集群创建 AlloyDB 主实例。如果您断开连接,则需要重新定义区域和集群名称环境变量。
gcloud alloydb instances create $ADBCLUSTER-pr \
--instance-type=PRIMARY \
--cpu-count=8 \
--region=$REGION \
--cluster=$ADBCLUSTER
预期的控制台输出:
student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \ --instance-type=PRIMARY \ --cpu-count=8 \ --region=$REGION \ --availability-type ZONAL \ --cluster=$ADBCLUSTER Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721 Creating instance...done.
创建 AlloyDB Standard 集群
如果这不是项目中第一个 AlloyDB 集群,请继续创建标准集群。
定义区域和 AlloyDB 集群名称。我们将使用 us-central1 区域和 alloydb-aip-01 作为集群名称:
export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
运行以下命令以创建集群:
gcloud alloydb clusters create $ADBCLUSTER \
--password=$PGPASSWORD \
--network=default \
--region=$REGION
预期的控制台输出:
export REGION=us-central1 export ADBCLUSTER=alloydb-aip-01 gcloud alloydb clusters create $ADBCLUSTER \ --password=$PGPASSWORD \ --network=default \ --region=$REGION Operation ID: operation-1697655441138-6080235852277-9e7f04f5-2012fce4 Creating cluster...done.
在同一 Cloud Shell 会话中,为我们的集群创建 AlloyDB 主实例。如果您断开连接,则需要重新定义区域和集群名称环境变量。
gcloud alloydb instances create $ADBCLUSTER-pr \
--instance-type=PRIMARY \
--cpu-count=2 \
--region=$REGION \
--cluster=$ADBCLUSTER
预期的控制台输出:
student@cloudshell:~ (test-project-402417)$ gcloud alloydb instances create $ADBCLUSTER-pr \ --instance-type=PRIMARY \ --cpu-count=2 \ --region=$REGION \ --availability-type ZONAL \ --cluster=$ADBCLUSTER Operation ID: operation-1697659203545-6080315c6e8ee-391805db-25852721 Creating instance...done.
5. 准备数据库
我们需要创建数据库、启用 Vertex AI 集成、创建数据库对象并导入数据。
向 AlloyDB 授予必要权限
向 AlloyDB 服务代理添加 Vertex AI 权限。
使用顶部的“+”号打开另一个 Cloud Shell 标签页。
在新的 Cloud Shell 标签页中,执行以下命令:
PROJECT_ID=$(gcloud config get-value project)
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
预期的控制台输出:
student@cloudshell:~ (test-project-001-402417)$ PROJECT_ID=$(gcloud config get-value project) Your active configuration is: [cloudshell-11039] student@cloudshell:~ (test-project-001-402417)$ gcloud projects add-iam-policy-binding $PROJECT_ID \ --member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \ --role="roles/aiplatform.user" Updated IAM policy for project [test-project-001-402417]. bindings: - members: - serviceAccount:service-4470404856@gcp-sa-alloydb.iam.gserviceaccount.com role: roles/aiplatform.user - members: ... etag: BwYIEbe_Z3U= version: 1
在标签页中执行命令“exit”,关闭该标签页:
exit
连接到 AlloyDB Studio
在后续章节中,所有需要连接到数据库的 SQL 命令都可以在 AlloyDB Studio 中执行。如需运行该命令,您需要点击主实例,打开 AlloyDB 集群的 Web 控制台界面。
然后点击左侧的 AlloyDB Studio:
选择 postgres 数据库、用户 postgres,然后提供我们在创建集群时记录的密码。然后点击“身份验证”按钮。
系统随即会打开 AlloyDB Studio 界面。如需在数据库中运行命令,请点击右侧的“Editor 1”标签页。
它会打开一个界面,您可以在其中运行 SQL 命令
创建数据库
创建数据库快速入门。
在 AlloyDB Studio 编辑器中,执行以下命令。
创建数据库:
CREATE DATABASE quickstart_db
预期输出:
Statement executed successfully
连接到 quickstart_db
使用切换用户/数据库的按钮重新连接到工作室。
从下拉列表中选择新的 quickstart_db 数据库,并使用与之前相同的用户名和密码。
系统会打开一个新连接,您可以在其中处理 quickstart_db 数据库中的对象。
验证 google_ml 扩展程序
检查 google_ml 扩展程序版本,确保其为 1.4.4 或更高版本,以便使用 AI 查询引擎。
在 AlloyDB Studio 中,连接到 quickstart_db 后,执行以下命令:
SELECT extversion FROM pg_extension WHERE extname = 'google_ml_integration';
预期输出:
1.4.4
如果版本低于所需版本,请更新扩展程序。
在 AlloyDB Studio 中,连接到 quickstart_db 后,执行以下命令:
CALL google_ml.upgrade_to_preview_version();
预期输出:
Statement executed successfully
成功执行后,请再次验证版本。
在 AlloyDB Studio 中,连接到 quickstart_db 后,执行以下命令:
SELECT extversion FROM pg_extension WHERE extname = 'google_ml_integration';
预期输出:
1.4.4
6. 示例数据
现在,我们需要在数据库中创建对象并加载数据。我们将使用包含几行数据的虚构电影数据集。
将以下语句复制到 AlloyDB Studio 编辑器,然后按“Run”按钮。
-- Drop tables if they exist to prevent errors on re-running the script
DROP TABLE IF EXISTS movie_reviews;
DROP TABLE IF EXISTS movies;
-- Create the 'movies' table
-- This table stores information about each movie.
CREATE TABLE movies (
id BIGINT PRIMARY KEY, -- Unique identifier for the movie
title TEXT NOT NULL, -- Title of the movie
description TEXT, -- A brief description or synopsis of the movie
genres TEXT, -- Comma-separated list of genres (e.g., "Action, Adventure, Sci-Fi")
actors TEXT -- Comma-separated list of main actors
);
-- Create the 'movie_reviews' table
-- This table stores reviews for the movies.
CREATE TABLE movie_reviews (
review_id BIGINT PRIMARY KEY, -- Unique identifier for the review
movie_id BIGINT NOT NULL, -- Foreign key referencing the movie being reviewed
reviewer_name TEXT, -- Name of the person who wrote the review
rating INT CHECK (rating >= 1 AND rating <= 5), -- Rating from 1 to 5 stars
review_text TEXT, -- The content of the review
review_date DATE DEFAULT CURRENT_DATE, -- Date when the review was submitted
FOREIGN KEY (movie_id) REFERENCES movies(id) ON DELETE CASCADE -- Ensures referential integrity; if a movie is deleted, its reviews are also deleted.
);
-- Insert sample data into the 'movies' table (20 rows)
INSERT INTO movies (id, title, description, genres, actors) VALUES
(1, 'Inception', 'A thief who steals information by entering people''s dreams.', 'Sci-Fi, Thriller, Action', 'Leonardo DiCaprio, Joseph Gordon-Levitt, Elliot Page'),
(2, 'The Matrix', 'A computer hacker learns about the true nature of his reality.', 'Sci-Fi, Action', 'Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss'),
(3, 'Interstellar', 'A team of explorers journey through a cosmic passage beyond our world in an attempt to ensure humanity''s survival.', 'Sci-Fi, Drama, Adventure', 'Matthew McConaughey, Anne Hathaway, Jessica Chastain'), -- Updated description
(4, 'The Dark Knight', 'When the menace known as the Joker wreaks havoc and chaos on the people of Gotham, Batman must accept one of the greatest psychological and physical tests of his ability to fight injustice.', 'Action, Crime, Drama', 'Christian Bale, Heath Ledger, Aaron Eckhart'),
(5, 'Pulp Fiction', 'The lives of two mob hitmen, a boxer, a gangster and his wife, and a pair of diner bandits intertwine in four tales of violence and redemption.', 'Crime, Drama', 'John Travolta, Uma Thurman, Samuel L. Jackson'),
(6, 'Forrest Gump', 'The presidencies of Kennedy and Johnson, the Vietnam War, the Watergate scandal and other historical events unfold from the perspective of an Alabama man with an IQ of 75.', 'Drama, Romance', 'Tom Hanks, Robin Wright, Gary Sinise'),
(7, 'The Shawshank Redemption', 'Two imprisoned men bond over a number of years, finding solace and eventual redemption through acts of common decency.', 'Drama', 'Tim Robbins, Morgan Freeman, Bob Gunton'),
(8, 'Gladiator', 'A former Roman General sets out to exact vengeance against the corrupt emperor who murdered his family and sent him into slavery.', 'Action, Adventure, Drama', 'Russell Crowe, Joaquin Phoenix, Connie Nielsen'),
(9, 'Fight Club', 'An insomniac office worker looking for a way to change his life crosses paths with a devil-may-care soap maker and they form an underground fight club that evolves into something much, much more.', 'Drama', 'Brad Pitt, Edward Norton, Meat Loaf'),
(10, 'The Lord of the Rings: The Return of the King', 'Gandalf and Aragorn lead the World of Men against Sauron''s army to draw his gaze from Frodo and Sam as they approach Mount Doom with the One Ring.', 'Action, Adventure, Drama', 'Elijah Wood, Viggo Mortensen, Ian McKellen'),
(11, 'Spirited Away', 'During her family''s move to the suburbs, a sullen 10-year-old girl wanders into a world ruled by gods, witches, and spirits, and where humans are changed into beasts.', 'Animation, Adventure, Family', 'Daveigh Chase, Suzanne Pleshette, Miyu Irino'),
(12, 'Parasite', 'Greed and class discrimination threaten the newly formed symbiotic relationship between the wealthy Park family and the destitute Kim clan.', 'Comedy, Drama, Thriller', 'Song Kang-ho, Lee Sun-kyun, Cho Yeo-jeong'),
(13, 'The Godfather', 'The aging patriarch of an organized crime dynasty transfers control of his clandestine empire to his reluctant son.', 'Crime, Drama', 'Marlon Brando, Al Pacino, James Caan'),
(14, 'Avengers: Endgame', 'After the devastating events of Avengers: Infinity War, the universe is in ruins. With the help of remaining allies, the Avengers assemble once more in order to reverse Thanos'' actions and restore balance to the universe.', 'Action, Adventure, Drama', 'Robert Downey Jr., Chris Evans, Mark Ruffalo'),
(15, 'Joker', 'In Gotham City, mentally troubled comedian Arthur Fleck is disregarded and mistreated by society. He then embarks on a downward spiral of revolution and bloody crime.', 'Crime, Drama, Thriller', 'Joaquin Phoenix, Robert De Niro, Zazie Beetz'),
(16, 'Mad Max: Fury Road', 'In a post-apocalyptic wasteland, a woman rebels against a tyrannical ruler in search for her homeland with the help of a group of female prisoners, a psychotic worshiper, and a drifter named Max.', 'Action, Adventure, Sci-Fi', 'Tom Hardy, Charlize Theron, Nicholas Hoult'),
(17, 'Coco', 'Aspiring musician Miguel, confronted with his family''s ancestral ban on music, enters the Land of the Dead to find his great-great-grandfather, a legendary singer.', 'Animation, Adventure, Family', 'Anthony Gonzalez, Gael García Bernal, Benjamin Bratt'),
(18, 'Whiplash', 'A promising young drummer enrolls at a cut-throat music conservatory where his dreams of greatness are mentored by an instructor who will stop at nothing to realize a student''s potential.', 'Drama, Music', 'Miles Teller, J.K. Simmons, Paul Reiser'),
(19, 'The Grand Budapest Hotel', 'The adventures of Gustave H, a legendary concierge at a famous hotel from the fictional Republic of Zubrowka between the first and second World Wars, and Zero Moustafa, the lobby boy who becomes his most trusted friend.', 'Adventure, Comedy, Drama', 'Ralph Fiennes, F. Murray Abraham, Mathieu Amalric'),
(20, 'Blade Runner 2049', 'Young Blade Runner K''s discovery of a long-buried secret leads him to track down former Blade Runner Rick Deckard, who''s been missing for thirty years.', 'Action, Drama, Mystery', 'Ryan Gosling, Harrison Ford, Ana de Armas');
-- Insert sample data into the 'movie_reviews' table (30 rows)
-- Reviews are linked to movies via movie_id. Includes a mix of positive and negative reviews.
-- Movie title is prepended to the review text.
INSERT INTO movie_reviews (review_id, movie_id, reviewer_name, rating, review_text) VALUES
(1, 1, 'Alice Wonderland', 5, 'Inception: Absolutely mind-bending! A masterpiece of sci-fi.'),
(2, 1, 'Bob The Critic', 2, 'Inception: Too confusing and pretentious. Didn''t enjoy it.'),
(3, 2, 'Charlie Reviewer', 5, 'The Matrix: Revolutionary visuals and a compelling story.'),
(4, 3, 'Diana Prince', 5, 'Interstellar: Visually stunning and emotionally powerful. A must-see.'),
(5, 3, 'Edward Nigma', 4, 'Interstellar: Long, but worth it for the spectacle and ideas.'),
(6, 4, 'Fiona Glenanne', 5, 'The Dark Knight: Heath Ledger''s Joker is iconic. Dark and thrilling.'),
(7, 5, 'George Costanza', 5, 'Pulp Fiction: Quirky, violent, and endlessly quotable.'),
(8, 5, 'Hannah Montana', 1, 'Pulp Fiction: Way too violent and the timeline was confusing. Hated it.'),
(9, 6, 'Ian Malcolm', 4, 'Forrest Gump: A heartwarming story with a great performance by Hanks.'),
(10, 7, 'Jane Doe', 5, 'The Shawshank Redemption: An uplifting story of hope and friendship. Perfect.'),
(11, 7, 'John Smith', 5, 'The Shawshank Redemption: Morgan Freeman is amazing. Truly a classic.'),
(12, 8, 'Kyle Broflovski', 2, 'Gladiator: Generic plot and boring action scenes. Overrated.'),
(13, 9, 'Laura Palmer', 5, 'Fight Club: Provocative and thought-provoking. Norton and Pitt are fantastic.'),
(14, 10, 'Michael Scott', 5, 'The Lord of the Rings: The Return of the King: A fitting and epic conclusion to a legendary trilogy.'),
(15, 11, 'Nancy Drew', 5, 'Spirited Away: Beautiful animation and a magical story for all ages.'),
(16, 12, 'Oscar Martinez', 5, 'Parasite: A brilliant satire with unexpected twists. Loved it!'),
(17, 12, 'Pam Beesly', 4, 'Parasite: Very intense, but incredibly well-directed and acted.'),
(18, 13, 'Quentin Coldwater', 5, 'The Godfather: A cinematic masterpiece. Brando is unforgettable.'),
(19, 14, 'Rachel Green', 3, 'Avengers: Endgame: It was okay, but felt bloated and had too many characters.'),
(20, 14, 'Steve Rogers', 5, 'Avengers: Endgame: The culmination of a decade of storytelling. Perfect ending.'),
(21, 15, 'Tony Stark', 4, 'Joker: A dark and disturbing character study. Phoenix is mesmerizing.'),
(22, 16, 'Uma Thurman', 5, 'Mad Max: Fury Road: Non-stop action and incredible practical effects. What a ride!'),
(23, 17, 'Victor Frankenstein', 5, 'Coco: A heartwarming and visually stunning celebration of family and culture.'),
(24, 18, 'Walter White', 5, 'Whiplash: Intense and gripping. J.K. Simmons is terrifyingly good.'),
(25, 19, 'Xena Warrior', 2, 'The Grand Budapest Hotel: Too quirky for its own good. Style over substance.'),
(26, 20, 'Ygritte Snow', 5, 'Blade Runner 2049: A worthy sequel that expands on the original in meaningful ways. Visually breathtaking.'),
(27, 1, 'Zack Morris', 4, 'Inception: Kept me on the edge of my seat. Very clever.'),
(28, 4, 'Buffy Summers', 5, 'The Dark Knight: The best superhero movie ever made. Ledger is a legend.'),
(29, 8, 'Clark Kent', 3, 'Gladiator: Decent action, but the story felt predictable and dragged a bit.'),
(30, 15, 'Diana Troy', 3, 'Joker: Hard to watch at times, but a powerful performance. Felt it was a bit one-note though.');
如果您有自己的示例数据,并且您的 CSV 文件与 Cloud 控制台中提供的 Cloud SQL 导入工具兼容,则可以使用该工具,而不是使用上述方法。
7. 使用 IF 运算符
我们先尝试使用标准 PostgreSQL 方法进行传统搜索。
如果我们尝试搜索一部有关太空冒险的电影,可以尝试以下查询
SELECT title,description AS movies_about_space
FROM movies
WHERE description like '%space%' OR title like '%space%';
但未返回任何结果。但我确信我们至少有一部电影属于该类别。我们可以尝试使用全文搜索方法。
SELECT title,description
FROM movies
WHERE to_tsvector('english', description) @@ to_tsquery('english', 'space');
我们可能根本得不到任何结果。因此,我们需要了解一些关键字或字词组,才能使用 PostgreSQL 搜索的“传统”技术。
现在,我们可以尝试使用 google_ml.if 函数使用 AI 驱动的语义过滤。它将在后台使用 AI 技术,根据我们的自然语言请求执行语义过滤。
SELECT title,description AS movies_about_space
FROM movies
WHERE
google_ml.if(
prompt => 'Here are descriptions of movies, can you return the ones about space adventures: '||description);
我们会根据请求的语义含义来获取“星际穿越”这部电影,即使其标题和说明中都没有“太空”一词。正如您所注意到的那样,我们也没有事先构建任何内容,而是完全依赖于自动化内置函数。
8. 将 JOIN 与 IF 运算符搭配使用
如果我们想使用 AI 驱动的语义过滤功能联接两个表,该怎么办?例如,如果用户评价中提到了某部电影,我们可以尝试根据用户评价将其与相应电影进行匹配。
在新 AlloyDB Studio 编辑器标签页中运行:
SELECT title, rating, movie_reviews
FROM movies
JOIN
movie_reviews ON
google_ml.if(
prompt => 'Does the following reviews talk about a movie mentioned? The review: ' || review_text||' and the movie title is: '||title)
AND
title='Interstellar';
我们根据标题中提及的电影名称,获得了两个与我们的请求匹配的评价。我们还可以进一步简化请求:
SELECT title, rating, movie_reviews
FROM movies
JOIN
movie_reviews ON
google_ml.if(
prompt => 'Do we have the movie in the review?: ' || review_text||' and the movie title is: '||title)
AND
title='Interstellar';
9. 根据内容得出的评分结果
表 movie_reviews 包含电影的分级,但如果我们想实现自己的分级,可以使用 google_ml.rank 函数。我们可以根据自然语言定义的条件为评价评分,例如获取电影的前 5 条评价,并显示原始评分以供比较。
SELECT rating,review_text AS top_five
FROM movie_reviews
ORDER BY google_ml.rank('Score of 7 to 10 if the review says the movie was really good, 3 to 6 if the review says it''s alright is and 1 to 2 if the review says it was not worth of time. Review: ' || review_text) DESC
LIMIT 5;
如果我们想在原始评分旁边显示新评分,可以将其添加到列列表中。
SELECT rating,
google_ml.rank('Score of 7 to 10 if the review says the movie was really good, 3 to 6 if the review says it''s alright is and 1 to 2 if the review says it was not worth of time. Review: ' || review_text) AS ml_rank,
review_text AS top_five
FROM movie_reviews
ORDER BY ml_rank DESC
LIMIT 5;
以下是前 5 条评价。
rating | ml_rank | top_five
--------+---------+-----------------------------------------------------------------------
5 | 9 | The Dark Knight: Heath Ledger's Joker is iconic. Dark and thrilling.
5 | 9 | The Matrix: Revolutionary visuals and a compelling story.
5 | 9 | Interstellar: Visually stunning and emotionally powerful. A must-see.
5 | 9 | Inception: Absolutely mind-bending! A masterpiece of sci-fi.
5 | 9 | Pulp Fiction: Quirky, violent, and endlessly quotable.
(5 rows)
5 rows in set (0.13 sec)
我们为前 10 条评价选择的评分为 9 分(满分 10 分)。
10. 使用排名改进语义搜索
我们可以将语义搜索与排名结合使用,以获得更精准的结果。
注册重新排名模型
使用顶部的“+”号打开另一个 Cloud Shell 标签页。
在新的 Cloud Shell 标签页中,执行以下命令:
export PROJECT_ID=$(gcloud config get-value project)
gcloud iam service-accounts create aip-reranking
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:aip-reranking@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/discoveryengine.viewer"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:aip-reranking@$PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
gcloud iam service-accounts add-iam-policy-binding aip-reranking@$PROJECT_ID.iam.gserviceaccount.com \
--member="user:$(gcloud auth list --filter=status:ACTIVE --format="value(account)")" --role="roles/iam.serviceAccountTokenCreator"
gcloud iam service-accounts add-iam-policy-binding aip-reranking@$PROJECT_ID.iam.gserviceaccount.com \
--member="user:$(gcloud auth list --filter=status:ACTIVE --format="value(account)")" --role="roles/iam.serviceAccountUser"
生成一个访问令牌,并将其添加为名称为“rerank-secret”的 Secret:
echo -n $(gcloud auth print-access-token \
--impersonate-service-account="aip-reranking@$PROJECT_ID.iam.gserviceaccount.com" \
--lifetime=3600) | gcloud secrets create rerank-secret \
--replication-policy="automatic" \
--data-file=-
gcloud secrets add-iam-policy-binding 'rerank-secret' \
--member="serviceAccount:service-$(gcloud projects describe $PROJECT_ID --format="value(projectNumber)")@gcp-sa-alloydb.iam.gserviceaccount.com" \
--role="roles/secretmanager.secretAccessor"
在此示例中,密钥将有效 1 小时,之后应通过创建新版本进行更新。例如:
echo -n $(gcloud auth print-access-token \
--impersonate-service-account="aip-reranking@$PROJECT_ID.iam.gserviceaccount.com" \
--lifetime=3600) | gcloud secrets versions add rerank-secret \
--data-file=-
现在,我们需要切换到 AlloyDB Studio,并创建一个密钥以访问我们的重新排名模型。
在 AlloyDB Studio 控制台中,将 $PROJECT_ID 替换为您的项目后,执行以下命令:
CALL
google_ml.create_sm_secret(
secret_id => 'rerank-secret',
secret_path => 'projects/PROJECT_ID/secrets/rerank-secret/versions/latest');
然后,在更改 PROJECT_ID 为您的项目 ID 后,在同一 Studio 会话中注册模型:
CALL
google_ml.create_model(
model_id => 'semantic-ranker-512-002',
model_type => 'reranking',
model_provider => 'custom',
model_request_url =>
'https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/rankingConfigs/default_ranking_config:rank',
model_qualified_name => 'semantic-ranker-512@002',
model_auth_type => 'secret_manager',
model_auth_id => 'rerank-secret',
model_in_transform_fn => 'google_ml.vertexai_reranking_input_transform',
model_out_transform_fn => 'google_ml.vertexai_reranking_output_transform');
使用重新排名模型
现在,我们可以在查询中使用重新排名模型,改进语义搜索结果,使其更具体,并选择最佳选项。
我们来查找动作片,然后使用“计算机和未来”作为条件对其进行排名。
WITH
action_movies AS (
SELECT
title,
description,
ROW_NUMBER() OVER (ORDER BY title, description) AS ref_number
FROM
movies
WHERE
google_ml.if(
prompt => 'The following movies are action movies. The movie title: ' || title || ' and the description is: ' || description
)
),
ranked_documents_array AS (
SELECT
ARRAY_AGG(description ORDER BY ref_number) AS docs
FROM
action_movies
),
reranked_results AS (
SELECT
r.index,
r.score
FROM
ranked_documents_array,
ai.rank(
model_id => 'semantic-ranker-512-002',
search_string => 'Computers and future',
documents => ranked_documents_array.docs
) AS r
)
SELECT
am.title,
left(am.description,80) as description,
rr.score
FROM
action_movies am
JOIN
reranked_results rr ON am.ref_number = rr.index
ORDER BY
rr.score DESC;
结果中会列出动作片,并且希望将与未来和计算机有关的电影排在前面。
title | description | score
-----------------------------------------------+----------------------------------------------------------------------------------+--------
The Matrix | A computer hacker learns about the true nature of his reality. | 0.0145
Mad Max: Fury Road | In a post-apocalyptic wasteland, a woman rebels against a tyrannical ruler in se | 0.002
Avengers: Endgame | After the devastating events of Avengers: Infinity War, the universe is in ruins | 0.0018
Blade Runner 2049 | Young Blade Runner K's discovery of a long-buried secret leads him to track down | 0.0004
The Lord of the Rings: The Return of the King | Gandalf and Aragorn lead the World of Men against Sauron's army to draw his gaze | 1e-05
The Dark Knight | When the menace known as the Joker wreaks havoc and chaos on the people of Gotha | 1e-05
Inception | A thief who steals information by entering people's dreams. | 1e-05
Gladiator | A former Roman General sets out to exact vengeance against the corrupt emperor w | 1e-05
(8 rows)
尝试使用不同的条件,看看排名对输出顺序有何影响。
11. 清理环境
完成实验后销毁 AlloyDB 实例和集群
删除 AlloyDB 集群和所有实例
系统会通过强制选项销毁集群,该选项还会删除属于该集群的所有实例。
如果您已断开连接且之前的所有设置都已丢失,请在 Cloud Shell 中定义项目和环境变量:
gcloud config set project <your project id>
export REGION=us-central1
export ADBCLUSTER=alloydb-aip-01
export PROJECT_ID=$(gcloud config get-value project)
删除集群:
gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force
预期的控制台输出:
student@cloudshell:~ (test-project-001-402417)$ gcloud alloydb clusters delete $ADBCLUSTER --region=$REGION --force All of the cluster data will be lost when the cluster is deleted. Do you want to continue (Y/n)? Y Operation ID: operation-1697820178429-6082890a0b570-4a72f7e4-4c5df36f Deleting cluster...done.
删除 AlloyDB 备份
删除集群的所有 AlloyDB 备份:
for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done
预期的控制台输出:
student@cloudshell:~ (test-project-001-402417)$ for i in $(gcloud alloydb backups list --filter="CLUSTER_NAME: projects/$PROJECT_ID/locations/$REGION/clusters/$ADBCLUSTER" --format="value(name)" --sort-by=~createTime) ; do gcloud alloydb backups delete $(basename $i) --region $REGION --quiet; done Operation ID: operation-1697826266108-60829fb7b5258-7f99dc0b-99f3c35f Deleting backup...done.
12. 恭喜
恭喜您完成此 Codelab。
所学内容
- 如何部署 AlloyDB for Postgres
- 如何启用 AlloyDB AI 操作器
- 如何使用不同的 AlloyDB AI 运算符
- 如何在 RAG 工作流中使用 AlloyDB AI 操作器