Building a Google-quality Search System with Vertex AI

1. Before you begin

Here are a few things to note before continuing this codelab.

Prerequisites

Basic understanding of LLMs
Basic understanding of RAG systems

What you'll learn

How to build a Google-quality search engine that can answer your questions from the data you upload
How to create a Vertex AI data store
How to create Vertex AI agents
How to use Cloud Run to deploy the application

What you'll need

A Google Cloud account
A Google Cloud project
An IDE with terminal

Introduction

Google Search uses a massive index of web pages and other content to return relevant results for user queries. Retrieval Augmented Generation (RAG), a key technique in modern AI, pairs this kind of retrieval with a language model that generates an answer from the retrieved content.

RAG works by first retrieving relevant passages from a document corpus. This is done using a variety of methods, such as keyword matching, semantic similarity, and machine learning. Once the relevant passages have been retrieved, they are used to generate a summary or answer to the user's query.
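The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration only: it uses plain keyword overlap for retrieval, the function names and the sample corpus are hypothetical, and the final prompt would be sent to a language model in a real system. It is not how Vertex AI Search ranks results.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, passage):
    """Count how many distinct query tokens also occur in the passage."""
    passage_counts = Counter(tokenize(passage))
    return sum(1 for token in set(tokenize(query)) if passage_counts[token] > 0)


def retrieve(query, corpus, top_k=2):
    """Return up to top_k passages with a non-zero keyword overlap, best first."""
    ranked = sorted(corpus, key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked[:top_k] if score(query, p) > 0]


def build_prompt(query, passages):
    """Assemble a grounded prompt with numbered sources for citation."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below and cite them.\n"
        f"{context}\nQuestion: {query}"
    )


# Hypothetical three-passage corpus, for illustration only.
corpus = [
    "Cloud Run is a serverless platform for running containers.",
    "Cloud Storage buckets hold objects such as PDF files.",
    "Artifact Registry stores and versions container images.",
]
question = "Where are container images stored?"
passages = retrieve(question, corpus)
prompt = build_prompt(question, passages)
print(prompt)
```

Because the prompt carries numbered sources, the generated answer can cite the passage it relied on, which is what makes the result checkable.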

The main benefit of RAG is that it helps language models reduce hallucination. Hallucination is the generation of text that is not supported by the evidence in the document corpus. It can happen when a language model has no relevant source material to draw on, or cannot distinguish between relevant and irrelevant information.

RAG reduces hallucination by grounding the generated text in evidence retrieved from the document corpus. This makes the system a more reliable and trustworthy source of information.

RAG is a powerful technique that is being used in a variety of applications, including search engines, chatbots, and question answering systems. It is likely to play an increasingly important role in AI in the years to come.

Here are some examples of how RAG is being used in practice:

Many search systems use RAG to generate search results that are relevant to the user's query.
Chatbots use RAG to generate responses to user questions that are informative and engaging.
Question answering systems use RAG to generate answers to user questions that are accurate and comprehensive.

RAG is a versatile technique that can be used to generate text in a variety of domains and applications. It is a powerful tool that is helping to make AI more intelligent and informative.

In this codelab, we will build a RAG system that answers your questions from a corpus that you upload. Vertex AI Search/Agent Builder, an out-of-the-box RAG platform, helps you build RAG systems faster by sparing you the manual effort of collecting documents, parsing, chunking, generating embeddings, query expansion, and candidate retrieval and ranking. While the out-of-the-box RAG system helps you get started quickly, Google Cloud also provides discrete APIs for each of these steps, so you can build your own DIY RAG system and fine-tune it to suit your business requirements.

What you'll build

By the end of this codelab, you will have deployed a working RAG system that answers your questions with factual information, grounded and cited with the right references.

You will also have a better understanding of how to use Vertex AI Search APIs to build this RAG architecture on Google Cloud. In addition, you will learn how to deploy the application (frontend and backend) on Cloud Run, a serverless platform for deploying applications as containers on Google Cloud.

How the application works

Upload your data: Users can upload their own corpus of data, i.e., a PDF file, as input.
Ask queries in the search bar: Users can ask questions in the search bar based on the corpus of data uploaded.
Retrieve answers: Users can review the search results/candidates retrieved for their query and check the factuality/groundedness of the answer.

2. Environment Setup

In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
Ensure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.
You'll use Cloud Shell, a command-line environment running in Google Cloud. To access it, click Activate Cloud Shell at the top of the Google Cloud console.

Once connected to Cloud Shell, check that you're already authenticated using the following command:

gcloud auth list

Run the following command in Cloud Shell to confirm that the gcloud command knows about your project.

gcloud config list project

If your project is not set, use the following command to set it:

gcloud config set project <YOUR_PROJECT_ID>

Make sure that the following APIs are enabled:

Cloud Run
Vertex AI
Cloud Storage

The alternative to using the gcloud command is going through the console using this link. Refer to documentation for gcloud commands and usage.

3. Step 1: Create GCP Bucket

Go to the console and, in the search bar, type Cloud Storage.
Select Cloud Storage from the suggested results.
Click on Create Bucket

Provide a globally unique name for the bucket
Click on Continue
In the Location Type, select Multi-Region
In the drop down, make sure to select the option us (multiple regions in United States)

Click on Create Bucket

Once the bucket is created, upload the alphabet-metadata.json file from the repository to it.
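The contents of alphabet-metadata.json are not reproduced in this codelab. As a rough sketch of the general shape such a metadata file takes, each line of a JSONL file for linked unstructured documents is one JSON object that points at a file in Cloud Storage. The field names below (id, structData, content.mimeType, content.uri) follow the documented Vertex AI Search format as best understood here and should be checked against the current documentation; the bucket name, file path, and metadata values are hypothetical.

```python
import json


def metadata_line(doc_id, gcs_uri, struct_data):
    """Serialize one document entry as a single JSON line."""
    record = {
        "id": doc_id,
        # Free-form metadata; it can later drive facets and filters.
        "structData": struct_data,
        # Pointer to the actual document stored in Cloud Storage.
        "content": {"mimeType": "application/pdf", "uri": gcs_uri},
    }
    return json.dumps(record)


# Hypothetical bucket, path, and metadata, for illustration only.
line = metadata_line(
    "doc-1",
    "gs://your-bucket-name/reports/annual-report.pdf",
    {"title": "Annual report", "year": 2023},
)
print(line)
```

A JSONL file is simply one such line per document, with no enclosing array.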

4. Step 2: Create a Vertex AI Datastore

On the search bar of the console page, type "Vertex AI Agent Builder"
Select the first product, "Agent Builder"

On the Agent Builder page, click on "Data Stores" in the left side navigation bar

Click on "Create Data Store"

Select Cloud Storage as the data source for your data store
Click on "Select" below Cloud Storage icon

On the tab below "Folder" option, click on the "Browse" button
Select the bucket you created in Step 1
On the options below, make sure to select "Linked unstructured documents (JSONL with metadata)"
Click Continue

On the Configuration page, select "global" as the location of your data store
Provide an identifiable name to your data store
Click on Create

Bonus:

Just above the "Create" button, you can see the document processing option.
You can play around with different parsers like digital, OCR or layout parser
You can also enable advanced chunking and provide your own custom chunk size limits
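To give a feel for what a chunk size limit controls, here is a minimal word-based chunker with overlap. It is an illustration under simplifying assumptions, not the chunker that Vertex AI uses (which works on parsed document layout and tokens); the function name and parameters are hypothetical.

```python
def chunk_words(text, chunk_size=5, overlap=1):
    """Split text into chunks of at most chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears with some context in the next chunk.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


chunks = chunk_words(
    "one two three four five six seven eight nine", chunk_size=4, overlap=1
)
print(chunks)
```

Smaller chunks tend to give more precise retrieval but less context per result; larger chunks do the opposite, which is why the limit is worth experimenting with.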

5. Step 3: Create an Agent

Once your data store is created, click on "Apps" in the navigation bar on the left
Click on "Create App" button
Select the app type to be "Search" (you can also create agents, conversational bots, recommendations etc.)

Make sure you select "Generic" under Content. You can also select Media or Recommendations based on the data store and type of data and systems you want to build.
Make sure to turn on both Enterprise Edition and Advanced LLM features
Provide your application name
Provide your company name

Make sure to select "global" as the region
Click on "Continue"
In the next screen, select the data store you created in Step 2
Click on "Create"

6. Step 4: Dockerize your application

Open your terminal in the Google Cloud console
Clone the sample repository using the following command

git clone https://github.com/kkrishnan90/vertex-ai-search-agent-builder-demo

Change the directory and navigate to the cloned repository using the following command

cd vertex-ai-search-agent-builder-demo

Folder structure
Backend - Holds a Python-based API implementation that exposes REST endpoints for your frontend to interact with.
Frontend - Holds a React-based application that serves the UI and makes the application-level calls to the backend through its REST endpoints.
Dockerfile - Contains the commands needed to build the Docker container image.

In the root of the repository directory, run the following command to build a Docker image. (Note: use the --platform flag when building Docker images on MacBooks with Apple Silicon chips such as M1 or M2. This flag is not necessary if you are building on a Windows machine or if your CPU architecture is Intel-based.)

docker build --platform linux/amd64 -t your-image-name .

Once the Docker image build is successful, run the following command to tag the image with a version. There can be multiple versions of the application, and therefore multiple version tags on its images. Always deploying the latest stable version is a good practice from a DevOps perspective.

docker tag your-image-name REGION-docker.pkg.dev/PROJECT-ID/REPOSITORY-NAME/IMAGE-NAME:TAG

Once the image is tagged, push it to Google Artifact Registry (GAR). GAR is a fully managed Google Cloud service that helps you store and version your container images. Run the following command to push the tagged image to GAR. For more information, refer to https://cloud.google.com/artifact-registry/docs/docker/pushing-and-pulling

docker push REGION-docker.pkg.dev/PROJECT-ID/REPOSITORY-NAME/IMAGE-NAME:TAG
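The placeholders in the tag and push commands combine into a single image path. The small helper below shows how the parts fit together; the function name and the sample values are hypothetical, and the repository named here is assumed to already exist in Artifact Registry.

```python
def image_uri(region, project_id, repository, image, tag):
    """Compose an Artifact Registry Docker image path from its parts."""
    parts = {
        "region": region,
        "project_id": project_id,
        "repository": repository,
        "image": image,
        "tag": tag,
    }
    # Fail early on an empty placeholder instead of producing a broken path.
    missing = [name for name, value in parts.items() if not value]
    if missing:
        raise ValueError(f"missing values: {', '.join(missing)}")
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


# Hypothetical values, for illustration only.
uri = image_uri("us-central1", "my-project", "my-repo", "search-demo", "v1")
print(uri)
```

The same string is what you pass to both docker tag and docker push, so it helps to build it once and reuse it.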

7. Step 5: Deploy your application on Cloud Run

Minimize your terminal. On the Google Cloud console search bar, search for Cloud Run
Click on the suggested Cloud Run product

Click on "Create Service"
In the next page, make sure "Deploy one revision from an existing container image" is selected
Below, click on "Select"
You will now be prompted with a navigation bar on the right.
Make sure the Artifact Registry tab is selected
Make sure the correct project is selected
Click on the arrow to expand the accordion on your deployed container image link
Select the container tag and expand (always select the latest ones deployed - with the right latest tags i.e. v1, v2, etc.)
Click on the container image shown below the container tag name

In the Configure area
Provide a service name for your Cloud Run application (this will be a part of the url when you deploy the application on Cloud Run)
Select the appropriate region (in this case us-central1 or anything of your choice)
Under the Authentication
Ensure "Allow unauthenticated invocations" is selected
Under CPU allocation and Pricing
Select "CPU is only allocated during request processing"
Under Service autoscaling, set the minimum number of instances to 1 (for production, it is recommended to keep enough minimum instances running to handle your daily traffic; you can also leave it at 0)
Set the "Ingress Control" to "All" to allow traffic from the internet to access your application
Click on "Create"
This will deploy a Cloud Run service; provisioning may take a few minutes

Once deployed, you will be able to see the publicly available URL that you can access your web application from

8. How does it all work

Once in the home page of the application, click on the "Upload Document" button
Upload your PDF file
Once the upload is complete, click on the search bar at the top of the web page
Start searching for queries related to your uploaded document
Once you type your query and click on Search, it should show all relevant answers from the document you just uploaded
You can play around by looking into the backend code and add more configurations like the following
Adding snippets
Adding extractive segments
Adding answers
Tuning the top-k results to help the LLM summarize the answer (something like AI Overview on Google Search)
As an add-on, you can also attach metadata tags when uploading the document. This will help generate facets and filterable categories
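The configurations listed above roughly correspond to fields of a Vertex AI Search request. The sketch below only builds a request body as a dictionary and does not call any API. Whether the repository's backend sets these exact fields is an assumption; the field names follow the Discovery Engine REST search request as best understood here, so verify them against the current API reference before relying on them. The function name is hypothetical.

```python
import json


def build_search_request(query, top_k=5):
    """Build a search request body with snippets, extractive content, and a summary."""
    return {
        "query": query,
        # Number of search results to return for the query.
        "pageSize": top_k,
        "contentSearchSpec": {
            # Short highlighted snippet for each result.
            "snippetSpec": {"returnSnippet": True},
            # Verbatim answers and longer segments extracted from documents.
            "extractiveContentSpec": {
                "maxExtractiveAnswerCount": 1,
                "maxExtractiveSegmentCount": 1,
            },
            # LLM summary over the top results, with citations.
            "summarySpec": {
                "summaryResultCount": top_k,
                "includeCitations": True,
            },
        },
    }


body = build_search_request("What does the report say about revenue?", top_k=3)
print(json.dumps(body, indent=2))
```

Raising or lowering top_k changes how many results feed the summary, which is the tuning knob mentioned in the list above.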

9. Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this codelab, follow these steps:

In the Google Cloud console, go to the Manage resources page.
In the project list, select the project that you want to delete, and then click Delete.
In the dialog, type the project ID, and then click Shut down to delete the project.
Alternatively, you can go to Cloud Run in the console, select the service you just deployed, and delete it.

10. Congratulations

Congratulations! You have built an out-of-the-box RAG system that uses state-of-the-art Google models to provide Google-quality results for your search queries. This codelab is for demonstration purposes only; more security and guardrails have to be set up for production use cases. The link to the complete repository is here. By leveraging Google Cloud, in just 5 steps we can build an end-to-end RAG system that serves Google-quality results out of the box in a few minutes. As generative AI and large language models evolve, building such RAG systems also helps us avoid the pitfalls of hallucination and of surfacing uncited information.

While this is just a starting point, we can do much more with the fully customizable DIY RAG APIs, which provide even more transparency, power, and efficiency for handling every part of the pipeline.