1. Before you begin
Here are a few things to note before continuing this codelab.
Prerequisites
- Basic understanding of LLMs
- Basic understanding of RAG systems
What you'll learn
- How to build a Google-quality search engine that answers questions from the data you upload
- How to create a Vertex AI data store
- How to create Vertex AI Agents
- How to use Cloud Run to deploy the application
What you'll need
- A Google Cloud account
- A Google Cloud project
- An IDE with terminal
Introduction
Google Search is a powerful tool that uses a massive index of web pages and other content to provide relevant results to user queries. Retrieval Augmented Generation (RAG), a key technique in modern AI, applies the same retrieval idea to language models: relevant content is fetched first, and the model then answers from it.
RAG works by first retrieving relevant passages from a document corpus. This is done using a variety of methods, such as keyword matching, semantic similarity, and machine learning. Once the relevant passages have been retrieved, they are used to generate a summary or answer to the user's query.
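The retrieve-then-generate flow described above can be sketched in a few lines of Python. This is a toy illustration only: it uses plain keyword overlap for retrieval and stops at building the grounded prompt that would be handed to a language model. The function names and the sample corpus are hypothetical, and Vertex AI Search uses far more capable retrieval and ranking than this.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Count how many distinct query tokens also occur in the passage."""
    passage_counts = Counter(tokenize(passage))
    return sum(1 for token in set(tokenize(query)) if passage_counts[token] > 0)


def retrieve(query: str, corpus: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k passages with a non-zero keyword overlap, best first."""
    ranked = sorted(corpus, key=lambda p: score(query, p), reverse=True)
    return [p for p in ranked[:top_k] if score(query, p) > 0]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a grounded prompt: numbered passages first, then the question."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the passages below and cite them.\n"
        f"{context}\nQuestion: {query}"
    )


# Hypothetical three-passage corpus for illustration.
corpus = [
    "Cloud Run is a serverless platform for running containers.",
    "A data store holds the documents that the search app indexes.",
    "Artifact Registry stores container images.",
]
passages = retrieve("What is Cloud Run?", corpus)
prompt = build_prompt("What is Cloud Run?", passages)
print(prompt)
```

Because only passages that share words with the query are passed on, the generation step sees evidence rather than the whole corpus.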
The main benefit of RAG is that it helps language models reduce hallucination. Hallucination is the generation of text that is not supported by the evidence in the document corpus. This can happen when language models are not able to distinguish between relevant and irrelevant information.
RAG reduces hallucination by grounding the generated text in evidence retrieved from the document corpus. This makes the output more reliable and easier to verify.
RAG is a powerful technique that is being used in a variety of applications, including search engines, chatbots, and question answering systems. It is likely to play an increasingly important role in AI in the years to come.
Here are some examples of how RAG is being used in practice:
- Many search systems use RAG to generate search results that are relevant to the user's query.
- Chatbots use RAG to generate responses to user questions that are informative and engaging.
- Question answering systems use RAG to generate answers to user questions that are accurate and comprehensive.
RAG is a versatile technique that can be used to generate text in a variety of domains and applications. It is a powerful tool that is helping to make AI more intelligent and informative.
In this codelab, we will build a RAG system that answers your questions from a corpus that you upload. Vertex AI Search/Agent Builder is an out-of-the-box RAG platform that accelerates building RAG systems by sparing you the manual effort of collecting documents, parsing, chunking, generating embeddings, query expansion, and candidate retrieval and ranking. While the out-of-the-box system helps you get started quickly, Google Cloud also provides discrete APIs for each of these steps, so you can build your own DIY RAG system and fine-tune it to suit your business requirements.
What you'll build
By the end of this codelab, you will have a working, deployed RAG system that answers your questions with factual information that is grounded and cited with the right references.
You will also have a better understanding of how to use the Vertex AI Search APIs to build this RAG architecture on Google Cloud. In addition, you will learn how to deploy the application (frontend and backend) on Cloud Run, a serverless platform for deploying applications as containers on Google Cloud.
How the application works
- Upload your data: Users can upload their own corpus of data, i.e. a PDF file, as input.
- Ask queries in the search bar: Users can ask questions in the search bar based on the uploaded corpus.
- Retrieve answers: Users can view the search results/candidates retrieved for their query and check the factuality/groundedness of the answer.
2. Environment Setup
- In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
- Ensure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.
- You'll use Cloud Shell, a command-line environment running in Google Cloud. To access it, click Activate Cloud Shell at the top of the Google Cloud console.
- Once connected to Cloud Shell, check that you're already authenticated using the following command:
gcloud auth list
- Run the following command in Cloud Shell to confirm that the gcloud command knows about your project.
gcloud config list project
- If your project is not set, use the following command to set it:
gcloud config set project <YOUR_PROJECT_ID>
- Make sure that the following APIs are enabled:
- Cloud Run
- Vertex AI
- Cloud Storage
As an alternative to the gcloud command, you can enable the APIs through the console using this link. Refer to the documentation for gcloud commands and usage.
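The three APIs listed above can also be enabled from Cloud Shell with a single gcloud call. The sketch below only builds and prints that command. The mapping from product names to service IDs is an assumption based on the usual Google Cloud service names, and Agent Builder may additionally require the Discovery Engine API (discoveryengine.googleapis.com), which the console typically prompts you to activate.

```python
import shlex

# Assumed mapping from the product names listed above to API service IDs.
REQUIRED_SERVICES = {
    "Cloud Run": "run.googleapis.com",
    "Vertex AI": "aiplatform.googleapis.com",
    "Cloud Storage": "storage.googleapis.com",
}


def enable_services_command(services: dict[str, str]) -> list[str]:
    """Build the argv for a single `gcloud services enable` call."""
    return ["gcloud", "services", "enable", *services.values()]


command = enable_services_command(REQUIRED_SERVICES)
print(shlex.join(command))
# To actually run it in Cloud Shell:
#   import subprocess; subprocess.run(command, check=True)
```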
3. Step 1: Create GCP Bucket
- Go to the console and on the search bar type Cloud Storage.
- Select the Cloud Storage from the suggested results.
- Click on Create Bucket
- Provide a globally unique name for the bucket
- Click on Continue
- In the Location Type, select Multi-Region
- In the drop-down, make sure to select the option us (multiple regions in United States)
- Click on Create Bucket
- Once the bucket is created, upload the alphabet-metadata.json file from the repository
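The contents of alphabet-metadata.json are not shown in this codelab. As a rough guide, a metadata file for unstructured documents in Vertex AI Search is generally a JSONL file in which each line references one document in Cloud Storage. The sketch below writes one such line. The document ID, bucket path, and metadata values are hypothetical, and the field layout (id, structData, content) is an assumption based on the documented import format, so compare it with the file in the repository before relying on it.

```python
import json


def metadata_line(doc_id: str, gcs_uri: str, metadata: dict) -> str:
    """Serialize one document reference as a single JSONL line."""
    record = {
        "id": doc_id,
        "structData": metadata,  # free-form metadata, usable for facets/filters
        "content": {"mimeType": "application/pdf", "uri": gcs_uri},
    }
    return json.dumps(record)


line = metadata_line(
    "doc-1",
    "gs://your-bucket-name/reports/example.pdf",  # hypothetical path
    {"title": "Example report", "year": 2024},    # hypothetical metadata
)
print(line)
```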
4. Step 2: Create a Vertex AI Datastore
- On the search bar of the console page, type "Vertex AI Agent Builder"
- Select the first product, "Agent Builder"
- On the Agent Builder page, click on the "Data Stores" as shown in the left side navigation bar
- Click on "Create Data Store"
- Select the Cloud Storage as your data store
- Click on "Select" below Cloud Storage icon
- On the tab below "Folder" option, click on the "Browse" button
- Select the bucket you created in Step 1
- On the options below, make sure to select "Linked unstructured documents (JSONL with metadata)"
- Click Continue
- On the Configuration page, select "global" as the location of your data store
- Provide an identifiable name to your data store
- Click on Create
Bonus:
- Just above the "Create" button, you can see the document processing option.
- You can play around with different parsers like digital, OCR or layout parser
- You can also enable advanced chunking and provide your own custom chunk size limits
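To build intuition for what a chunk size limit controls, here is a minimal sketch of fixed-size chunking with overlap. It is not how Vertex AI chunks documents: the managed chunker is layout-aware and works on its own units, whereas this hypothetical chunk_words helper simply counts whitespace-separated words.

```python
def chunk_words(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into chunks of at most chunk_size words, overlapping by `overlap` words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
chunks = chunk_words(text, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Smaller chunks give more precise retrieval targets but less context per chunk. The overlap keeps sentences that straddle a boundary retrievable from either side.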
5. Step 3: Create an Agent
- Once your data store is created, open the navigation bar on the left and choose "Apps"
- Click on "Create App" button
- Select the app type to be "Search" (you can also create agents, conversational bots, recommendations etc.)
- Make sure you select "Generic" under Content. You can also select Media or Recommendations based on the data store and type of data and systems you want to build.
- Make sure to toggle on both Enterprise Edition and Advanced LLM features
- Provide your application name
- Provide your company name
- Select "global" as the region
- Click on "Continue"
- In the next screen, select the data store you created in Step 2
- Click on "Create"
6. Step 4: Dockerize your application
- Open your terminal in Google Cloud console
- Clone the repository using the following command
git clone https://github.com/kkrishnan90/vertex-ai-search-agent-builder-demo
- Change the directory and navigate to the cloned repository using the following command
cd vertex-ai-search-agent-builder-demo
- Folder structure
- Backend - This holds a Python-based API implementation that exposes RESTful endpoints for your frontend to interact with.
- Frontend - This holds a React-based application that serves the UI. It also contains the application-level calls to the backend via REST endpoints.
- Dockerfile - This file contains all the commands needed to build the Docker image.
- In the root of the repository directory, run the following command to build a Docker image. Note: use the --platform flag when building on a MacBook with an Apple Silicon chip (M1, M2, etc.). The flag is not necessary if you are building on a Windows machine or if your CPU architecture is Intel based.
docker build --platform linux/amd64 -t your-image-name .
- Once the Docker image builds successfully, run the following command to tag the image with a version. There can be multiple versions of the application, and therefore multiple version tags on its images. Always deploying the latest stable version is a good practice from a DevOps perspective.
docker tag your-image-name REGION-docker.pkg.dev/PROJECT-ID/REPOSITORY-NAME/IMAGE-NAME:TAG
- Once the image is tagged, push it to Google Artifact Registry (GAR). GAR is a fully managed Google service for storing and versioning your container images. Run the following command to push the tagged image to GAR. For more information, see https://cloud.google.com/artifact-registry/docs/docker/pushing-and-pulling
docker push REGION-docker.pkg.dev/PROJECT-ID/REPOSITORY-NAME/IMAGE-NAME:TAG
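The image path used in the tag and push commands follows a fixed pattern, so it can help to compose it once and reuse it. The sketch below builds the path and prints the three Docker commands from this step. All values are hypothetical placeholders. Note also that the push presumably requires the Artifact Registry repository to exist already and Docker to be authenticated against it (for example via gcloud artifacts repositories create and gcloud auth configure-docker), which this codelab does not walk through.

```python
import shlex


def image_uri(region: str, project_id: str, repository: str, image: str, tag: str) -> str:
    """Compose an Artifact Registry Docker image path."""
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


def build_tag_push(local_image: str, uri: str) -> list[list[str]]:
    """Return the docker build, tag, and push commands from this step as argv lists."""
    return [
        ["docker", "build", "--platform", "linux/amd64", "-t", local_image, "."],
        ["docker", "tag", local_image, uri],
        ["docker", "push", uri],
    ]


# Hypothetical values; substitute your own region, project, repository, and tag.
uri = image_uri("us-central1", "my-project", "my-repo", "rag-demo", "v1")
commands = build_tag_push("rag-demo", uri)
for argv in commands:
    print(shlex.join(argv))
```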
7. Step 5: Deploy your application on Cloud Run
- Minimize your terminal. On the Google Cloud console search bar, search for Cloud Run
- Click on the suggested Cloud Run product
- Click on "Create Service"
- In the next page, make sure "Deploy one revision from an existing container image" is selected
- Below, click on "Select"
- You will now be prompted with a navigation bar on the right.
- Check that the Artifact Registry tab is selected
- Check that the correct project is selected
- Click on the arrow to expand the accordion on your deployed container image link
- Select the container tag and expand (always select the latest ones deployed - with the right latest tags i.e. v1, v2, etc.)
- Click on the container image shown below the container tag name
- In the Configure area
- Provide a service name for your Cloud Run application (this will be a part of the url when you deploy the application on Cloud Run)
- Select the appropriate region (in this case us-central1 or anything of your choice)
- Under the Authentication
- Ensure "Allow unauthenticated invocations" is selected
- Under CPU allocation and Pricing
- Select "CPU is only allocated during request processing"
- Set the minimum number of instances under Service autoscaling to 1 (for production, it is recommended to keep enough minimum instances running to handle your daily traffic; you can also leave it at 0)
- Set the "Ingress Control" to "All" to allow traffic from the internet to access your application
- Click on "Create"
- This will deploy a Cloud Run service; provisioning may take a few minutes
- Once deployed, you will be able to see the publicly available URL that you can access your web application from
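The console choices above have a command-line counterpart. The sketch below builds a gcloud run deploy command that mirrors them, as an alternative to clicking through the console. The service name and image path are hypothetical, and the mapping of each console option to a flag is my reading of the gcloud reference, so verify it there before using it.

```python
import shlex


def deploy_command(service: str, image: str, region: str, min_instances: int = 1) -> list[str]:
    """Build a `gcloud run deploy` argv mirroring the console choices above."""
    return [
        "gcloud", "run", "deploy", service,
        "--image", image,
        "--region", region,
        "--allow-unauthenticated",  # "Allow unauthenticated invocations"
        "--cpu-throttling",         # CPU only allocated during request processing
        "--min-instances", str(min_instances),
        "--ingress", "all",         # accept traffic from the internet
    ]


# Hypothetical service name and image path.
cmd = deploy_command(
    "rag-demo",
    "us-central1-docker.pkg.dev/my-project/my-repo/rag-demo:v1",
    "us-central1",
)
print(shlex.join(cmd))
```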
8. How does it all work
- Once in the home page of the application, click on the "Upload Document" button
- Upload your PDF file
- Once the upload is complete, click on the search bar at the top of the web page
- Start searching for queries related to your uploaded document
- Once you type your query and click on Search, it should show all relevant answers from the document you just uploaded
- You can experiment by looking into the backend code and adding more configurations like the following
- Adding snippets
- Adding extractive segments
- Adding answers
- Tuning the top-k results to help the LLM summarize the answer (something like AI Overview on Google Search)
- As an addon, you can also add metadata tags while uploading the document. This will help generate facets and filterable categories
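To make the configuration options above concrete, here is a sketch of a search request body that asks for snippets, extractive content, and an LLM summary over the top-k results. It is not taken from the repository's backend, which may be structured differently. The field names follow my understanding of the Vertex AI Search (Discovery Engine) REST search method, and the helper name and default values are hypothetical.

```python
import json


def search_body(query: str, top_k: int = 5) -> dict:
    """Build a search request body with snippets, extractive content, and a summary."""
    return {
        "query": query,
        "pageSize": 10,  # number of search results to return
        "contentSearchSpec": {
            # Short highlighted snippet per result
            "snippetSpec": {"returnSnippet": True},
            # Verbatim answers and longer segments extracted from the documents
            "extractiveContentSpec": {
                "maxExtractiveAnswerCount": 1,
                "maxExtractiveSegmentCount": 1,
            },
            # Summary generated from the top_k results, with citations
            "summarySpec": {
                "summaryResultCount": top_k,
                "includeCitations": True,
            },
        },
    }


body = search_body("What was the revenue last quarter?", top_k=3)
print(json.dumps(body, indent=2))
```

Raising top_k gives the summarizer more evidence to draw on, at the cost of possibly including less relevant results.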
9. Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this codelab, follow these steps:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
- Alternatively you can go to Cloud Run on the console, select the service you just deployed and delete.
10. Congratulations
Congratulations! You have successfully built a quick, out-of-the-box RAG system that uses state-of-the-art Google models to provide Google-quality results for your search queries. This codelab is for demonstration purposes only; additional security and guardrails must be set up for production use cases. The link to the complete repository is here. By leveraging Google Cloud, in just 5 steps we can build an end-to-end RAG system that serves Google-quality results out of the box in a few minutes. As generative AI and large language models evolve, building such RAG systems also helps us avoid the pitfalls of hallucination and of surfacing uncited information.
While this is just a starting point, we can do much more with the fully customizable DIY RAG APIs, which provide even more transparency, power, and efficiency for handling every part of the pipeline.