Bookshelf Analytics: Use Gemini to build SQL applications with BigQuery and Generative AI

1. Introduction

Do you love digging into books but get overwhelmed by the sheer volume of choices? Imagine having an AI-powered app that not only recommends the perfect read but also offers a concise summary based on your genre of choice, giving you a glimpse into the book's essence. In this codelab, I'll walk you through building such an app with BigQuery, Gemini and Cloud Functions powered by Gemini.

Project Overview

Our use case centers around these 4 key components:

  • Book Database: The vast BigQuery public dataset of internet archive books will serve as our comprehensive book catalog.
  • AI Summarization Engine: Google Cloud Functions, equipped with the Gemini 1.0 Pro language model, will generate insightful summaries tailored to user requests.
  • BigQuery Integration: A remote function within BigQuery that calls our Cloud Function to deliver on-demand book summaries and themes.
  • User Interface: A web app hosted on Cloud Run that will offer a web application for users to view the results.

We will divide the implementation into 3 codelabs:

Codelab 1: Use Gemini to build a Java Cloud Function for a Gemini application.

Codelab 2: Use Gemini to build SQL apps with BigQuery and Generative AI.

Codelab 3: Use Gemini to create a Java Spring Boot web application that interacts with BigQuery.

2. Use Gemini to build SQL apps with BigQuery and Generative AI

What you'll build

You'll create a

  • Remote model in BigQuery that invokes the Vertex AI text-bison-32k endpoint to identify the genre (or theme) of the book from a list of ";" separated keywords in the table.
  • Remote function in BigQuery that will invoke the deployed generative AI Cloud Function remotely.
  • Use the remote model and function to summarize the theme and text of a book with SQL queries and write the results to a new table in the bookshelf dataset.
  • You will implement these steps with the help of Gemini

3. Requirements

Create your project

You can skip the steps below if you already activated a billing account and created a project using the link mentioned in the conditional step above.

  1. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
  2. Make sure that billing is enabled for your Cloud project. Learn how to check if billing is enabled on a project.

Activate Cloud Shell

  1. You will use Cloud Shell, a command-line environment running in Google Cloud that comes pre-loaded with bq:

From the Cloud Console, click Activate Cloud Shell on the top right corner: 6757b2fb50ddcc2d.png

  1. Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID. Run the following command in Cloud Shell to confirm that you are authenticated:

gcloud auth list

  1. Run the following command in Cloud Shell to confirm that the gcloud command knows about your project.

gcloud config list project

  1. If your project is not set, use the following command to set it:

gcloud config set project <YOUR_PROJECT_ID>

Refer documentation for gcloud commands and usage.

4. Enabling Gemini for Google Cloud and necessary APIs

Enable Gemini

  1. Navigate to Gemini for Google Cloud Marketplace to enable the API. You can also use the following command:

gcloud services enable cloudaicompanion.googleapis.com --project PROJECT_ID

  1. Visit the Gemini page and click on "Start chatting".

Enable other necessary APIs

How would we do that? Let's ask Gemini that, shall we? But before that remember:

Note: LLMs are non-deterministic. So while you are trying these prompts, the response that you receive might look different from the ones in my screenshot.

Go to the Gemini chat console by clicking the "Open Gemini" icon on the top right corner adjacent to the search bar in the Google Cloud console.

Type this question in the "Enter a prompt here" section:

How do I enable the BigQuery and Vertex AI apis using gcloud command?

You get the response as seen in the following image:

19c3fd78530794d9.png

Copy that (you can use the copy icon on the top of the command snippet) and run it in the Cloud Shell Terminal to enable the respective services:

  • bigquery.googleapis.com
  • aiplatform.googleapis.com

5. Explore the BigQuery public dataset for books data

Begin by familiarizing yourself with the BigQuery public dataset containing information on numerous internet archive books.

You can find this public dataset in the BigQuery explorer pane. You can find this on the left side when you land on the BigQuery console.

39e2ac03cc99cbac.png

Type "gdelt-bq" or "internetarchivebooks" in the search bar and click SEARCH ALL PROJECTS. Expand in the result and star internet archive books as shown in the image below:

68dba68a79cddfc9.png.

Expand the dataset, click gdelt-bq.internetarchivebooks, and then preview the data in the 1920 table. This table includes books from the year 1920 archived.

To take a look at the schema that we will be using in subsequent sections, run the following query:

select * from  `gdelt-bq.internetarchivebooks.1920` limit 5;

We will be using the following three fields for our codelab:

  • BookMeta_Title (title)
  • Themes (themes separated by ‘;')
  • BookMeta_FullText (full text of the book)

6. Create a new BigQuery dataset called bookshelf

We want to create a dataset under the project to store all the database and analytics objects that we are going to create in this lab. Let's ask Gemini how to create a BigQuery dataset. You should already have your Gemini chat open on another tab in your browser from the enabling APIs step. If not, you can do so now. Go to the Google Cloud Console by navigating to https://console.cloud.google.com and you should see the Gemini icon right next to the search bar on top. Click that and the chat opens up.

26e1491322855614.png

Enter the prompt as shown below.

Here's my prompt:

How to create a BigQuery dataset?

Here's the response:

f7a989cc9a01009.png

Let's follow the steps outlined in the response to create a dataset named "bookshelf" in your active project.

To create a BigQuery dataset, follow these steps:

  1. Go to the BigQuery page in the Google Cloud console.
  2. In the Explorer panel, click your project id.
  3. Click Create dataset (it should be in the options listed when you click the 3 dots next to your project id)
  4. Enter the dataset name as "bookshelf".
  5. Set the location as "US(Multi-region)".
  6. For steps 3, 4, 5, and 6 from the response, leave the default options.
  7. Click CREATE DATASET.

Your dataset will be created and will appear in the Explorer panel. The "bookshelf" dataset can be seen as follows:

7. Create a remote model to invoke Vertex AI LLM (text-bison-32k)

Next, we need to create a model in BigQuery that invokes the "text-bison-32k" Vertex AI model. The model will help identify a common theme, context for the book, from the list of keywords for each book in the dataset.

Let's ask Gemini this question. To do this, navigate to the tab where you have the Gemini chat console open and type the below prompt:

How will you connect BigQuery and Vertex AI to call the LLM endpoint (text-bison-32k) in BigQuery?

The response is as follows:

41904e30ce92b436.png

The response includes accurate information, such as the steps that include using CREATE MODEL statement, using BigQuery connection, and defining the endpoint. The query statement is accurate for me,, but it doesn't mean you will receive the exact same output since this is a large language model you might get responses in different format, volume and depth. If you do not see all the details I have received, feel free to probe the chat with follow-up questions. For example: Give more details about how to create the connection resource or why the connection attribute is missing or How do I connect from BigQuery to Vertex AI etc.

Here is a sample follow-up prompt (use this only if you need a follow-up, if the first response you received is sufficient, proceed with it):

What about the connection? How will I connect from BigQuery to Vertex AI?

Here is the response:

2ed9b3ed96b11bc9.png

Let's follow the steps from the response to create a BigQuery connection:

  1. Navigate to the BigQuery console.
  2. In the BigQuery Explorer pane, click the "+ADD" button and click "Connections to external data sources".
  3. At this point, it will ask you to Enable BigQuery Connection API. Click ENABLE API:

ded96126495ffe9.png

  1. Click "Connections to external data sources" and you should see the External data source slide as seen below . From the list of external sources, select "Vertex AI" source.

434cdbbb3a9436f2.png

  1. Type a connection ID (this can be an id of your choice, but for now, set it as "bq-vx" and the region (multiregion "US").
  2. Click "Create Connection".

d3a2aeebc3ecdfae.png

  1. After you create the connection, click "Go to connection".
  2. In the connection info page, copy the Service Account ID since we will be using this ID in the subsequent steps.
  3. Now that the connection is created, let's assign permissions to this service account id that we copied to be able to use Vertex AI.
  4. From the Google Cloud Console page, open Google Cloud IAM or navigate to the link.
  5. Click Grant Access under the View by Principals section.

5317eed5da0bb8c5.png

  1. In the Grant Access dialog box, enter the Service Account ID we noted earlier inside the New principles text box.
  2. Set the role to "Vertex AI User".

f213db33d220aa5f.png

The required connection is created. The necessary permission is granted for the principal (service account of the connection) to use Vertex AI from BigQuery.

Run the following DDL(Data Definition Language) statement that represents the creation of a DB object, in this case, MODEL in the BigQuery query editor.

CREATE OR REPLACE MODEL bookshelf.llm_model
  REMOTE WITH CONNECTION `us.bq-vx`
  OPTIONS (ENDPOINT = 'text-bison-32k');

Something for you to try as an alternative to the above step, you can ask Gemini to suggest a query to create the model for invoking the "text-bison-32k" model.

Note: If you used a different name for your connection, replace the " us.bq-vx" with that name in the earlier DDL statement. This query creates the remote model in the dataset, "bookshelf", that we created earlier.

8. Create a remote function that invokes the Java Cloud Function

We will now create a remote function in BigQuery using the Java Cloud Function that we created in the codelab 1 of this series to implement the Gemini model. This remote function will be used to summarize the book content.

Note: If you missed this codelab or have not deployed this Cloud Function, you can skip this step and move on to the next topic (which is, Summarize the theme of the books using the remote model).

Go to the BigQuery console and paste the following DDL statement in the Query Editor (you can create a new Query Editor Tab by clicking on the + button)

a54c0b0014666cac.png

Below is the DDL that you can copy. Remember to replace the endpoint with your deployed Cloud Function endpoint (created from the codelab 1). If you don't have an endpoint, you can replace the masked characters in the below DDL with "abis-345004" for demo purposes.

CREATE OR REPLACE FUNCTION
  `bookshelf.GEMINI_REMOTE_CALL` (context STRING) RETURNS STRING
  REMOTE WITH CONNECTION `us.bq-vx`
  OPTIONS (
    endpoint = 'https://us-central1-****-******.cloudfunctions.net/remote-gemini-call'  );

Navigate to the BigQuery console on Google Cloud platform and open a new Query Editor tab. In the BigQuery Query Editor, paste the DDL statement above. You can see the following response after running the query:

a023d5691acf6f.png

Now that the model and function are created, let's test these two BigQuery objects by running them in a SELECT query.

9. Summarize themes using the remote model

Let's use the remote model we created "bookshelf.llm_model" to generate a consolidated keyword for the book from the given list of themes:

SELECT * FROM ML.GENERATE_TEXT(MODEL `bookshelf.llm_model`,
(
SELECT
     CONCAT('Read all the words in the following text separated by ";" and come up with one single (most relating) theme that is applicable : ', Themes, '. Take the title of the book from the following value', BookMeta_Title, '. Return a meaningful and concise theme with one or two words, not more than that. If a theme is not possible, return a context from its title. But DO NOT return empty or null. ') AS prompt
    FROM `gdelt-bq.internetarchivebooks.1920` LIMIT 1
));

Optional Step: The result for the LLM-generated result field is nested. Let's add some LLM parameters and the "flatten_json_output" attribute to the query. Using the "flatten_json_output" attribute helps remove the nested structure from the LLM-generated result field.

SELECT * FROM ML.GENERATE_TEXT ( MODEL `bookshelf.llm_model`, (
 SELECT CONCAT('Read all the words in the following text separated by ";" and come up with one single (most relating) theme that is applicable : ', Themes, '. Take the title of the book from the following value', BookMeta_Title, '. Return a meaningful and concise theme with one or two words, not more than that. If a theme is not possible, return a context from its title. But DO NOT return empty or null. ') AS prompt FROM `gdelt-bq.internetarchivebooks.1920` limit 1),
    STRUCT( 0.2 AS temperature, 100 AS max_output_tokens, TRUE AS flatten_json_output));

Now, run the SELECT query in the BigQuery Editor and verify the result. We have limited the query result to 1 for testing. The result is displayed as follows:

9b0d33eca61a73d2.png

10. Summarize full text of books using the remote function

We will now try to summarize the book by running the bookshelf.GEMINI_REMOTE_CALL Cloud Function we created earlier.

Note: If you had skipped the remote FUNCTION creation (the previous topic in this codelab), remember to skip the bookshelf.GEMINI_REMOTE_CALL function call in the SELECT query as well.

Use the SELECT query that calls the remote function (GEMINI_REMOTE_CALL) we created earlier. The call to this function, GEMINI_REMOTE_CALL, includes a prompt asking for a summary of the book text:

select BookMeta_Title, `bookshelf.GEMINI_REMOTE_CALL` (
  CONCAT('This is about a book. The title of the book is this: ', BookMeta_Title,  ' . The theme of the book is as follows: ', Themes, 'The following text contains an excerpt from the book. Summarize it in less than 5 lines to create the gist. If you do not have a summary, use the title and themes to make up a short summary. DO NOT RETURN EMPTY RESPONSE. ', SUBSTR(BookMeta_FullText, 5000, 10000))) Summary 
  from `gdelt-bq.internetarchivebooks.1920` where Themes like '%EDUCATION%' limit 1;

Note that we have taken a substring of full text of the book to generate the summary on.

The result of the query is as follows:

658bb0a9c9cf0938.png

11. Store books data in a table

Now that we have tested both the LLM calls (remote model and function) from BigQuery only using SQL queries, let's create a BigQuery table to store the "bookshelf" data with theme insights, in the same dataset as the remote model and the function.

We could actually include both the LLM model call and the remote function call in this step. But since we have marked the remote function call (that invokes the Cloud Function) as an optional step, we will only use the insights from the remote model.

Here is the query we will use:

SELECT
  BookMeta_Title, Themes, ml_generate_text_llm_result Context 
   FROM
  ML.GENERATE_TEXT (
    MODEL `bookshelf.llm_model`,
    (
 SELECT
      BookMeta_Title,
      Themes,BookMeta_FullText,
      CONCAT('Read all the words in the following text separated by ";" and come up with one single (most relating) theme that is applicable : ', Themes, '. Take the title of the book from the following value', BookMeta_Title, '. Return a meaningful and concise theme with one or two words, not more than that. If a theme is not possible, return a context from its title. But DO NOT return empty or null. ') AS prompt
    FROM `gdelt-bq.internetarchivebooks.1920` limit 5),
    STRUCT(
      0.2 AS temperature,
      100 AS max_output_tokens,
      TRUE AS flatten_json_output));

When you run the query in the BigQuery Editor, the result is as follows:

2c6e08e75a680867.png

Now, let's ask Gemini to create a table named "bookshelf.books" from the above query. Navigate to the Gemini chat console on the Google Cloud Console and enter the following prompt.

Here's the prompt we will use:

Create a BigQuery table named bookshelf.books from this SELECT query: 
SELECT
  BookMeta_Title, Themes, ml_generate_text_llm_result Context 
   FROM
  ML.GENERATE_TEXT (
    MODEL `bookshelf.llm_model`,
    (
 SELECT
      BookMeta_Title,
      Themes,BookMeta_FullText,
      CONCAT('Read all the words in the following text separated by ";" and come up with one single (most relating) theme that is applicable : ', Themes, '. Take the title of the book from the following value', BookMeta_Title, '. Return a meaningful and concise theme with one or two words, not more than that. If a theme is not possible, return a context from its title. But DO NOT return empty or null. ') AS prompt
    FROM `gdelt-bq.internetarchivebooks.1920` limit 5),
    STRUCT(
      0.2 AS temperature,
      100 AS max_output_tokens,
      TRUE AS flatten_json_output));

The Gemini chat response is as follows:

df6595a4b14f7b9.png

Here is the query in case you want to copy it from here directly:

CREATE TABLE bookshelf.books (
  BookMeta_Title STRING,
  Themes STRING,
  ml_generate_text_llm_result STRING
) AS (
  SELECT
    BookMeta_Title,
    Themes,
    ml_generate_text_llm_result Context
  FROM
    ML.GENERATE_TEXT (
      MODEL `bookshelf.llm_model`,
      (
        SELECT
          BookMeta_Title,
          Themes,
          BookMeta_FullText,
          CONCAT('Read all the words in the following text separated by ";" and come up with one single (most relating) theme that is applicable : ', Themes, '. Take the title of the book from the following value', BookMeta_Title, '. Return a meaningful and concise theme with one or two words, not more than that. If a theme is not possible, return a context from its title. But DO NOT return empty or null. ') AS prompt
        FROM `gdelt-bq.internetarchivebooks.1920`
        LIMIT 5
      ),
      STRUCT(
        0.2 AS temperature,
        100 AS max_output_tokens,
        TRUE AS flatten_json_output
      )
    )
);

After running the query in the BigQuery Editor, the result is as follows:

2d1ce716f844b7ad.png

That's it! Now query the table and play around with the data for more insights.

12. Congratulations

Congratulations! We have successfully completed the following and used Gemini in some steps of the process as well:

  • Created a remote model in BigQuery that invokes the Vertex AI "text-bison-32k" endpoint to identify the genre (or theme) of the book from a list of ";" separated keywords in the table.
  • Created a remote function in BigQuery that will invoke this deployed generative AI Cloud Function remotely. This function will take the prompt as input and output a string that summarizes the book in 5 lines.
  • Used the remote model and function to summarize the theme and text of a book with SQL queries and write the results to a new table in the bookshelf dataset.
  • As a follow-up assignment, try using Gemini to get the SQL for deleting the objects created in BigQuery. This will cover the cleanup step.