Use BigQuery Remote Functions to ask questions to Vertex AI Visual Question Answering (VQA) in a SQL query

1. Introduction

Overview

BigQuery remote functions allow you to implement a function in languages other than SQL and JavaScript, or with libraries and services that are not allowed in BigQuery user-defined functions. BigQuery remote functions provide direct integration with Cloud Functions and Cloud Run. You invoke a BigQuery remote function within a SQL query by passing one or more columns as input and receiving a single value as output.
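
For illustration, once a remote function is created, calling it in SQL looks like any other function call (all names below are hypothetical):

SELECT column_a, my_dataset.my_remote_function(column_a) AS result
FROM my_table;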

Cloud Functions is a lightweight compute solution for developers to create single-purpose, standalone functions that can be triggered using HTTPS or respond to CloudEvents, without needing to manage a server or runtime environment. Cloud Functions supports Node.js, Python, Go, Java, .NET, Ruby, and PHP.

In this codelab, you'll learn how to create a BigQuery remote function that answers questions about images stored in Cloud Storage using Vertex AI Visual Question Answering (VQA). Your SQL query will retrieve an image URI from a table in BigQuery. Then, using a BigQuery remote function, you will send the image URI to a Cloud Function that responds with answers from VQA about the image.

From a development perspective, these are the steps you will complete in this codelab:

  1. Create the HTTP endpoint in Cloud Functions
  2. Create a connection of type CLOUD_RESOURCE
  3. Create a BigQuery object table for the Cloud Storage bucket
  4. Create the remote function
  5. Use the remote function in a query just like any other user-defined function

What you'll learn

  • How to create a Cloud Function that calls Vertex AI Visual Question Answering (VQA)
  • How to create a BigQuery object table over images in a Cloud Storage bucket
  • How to create a BigQuery remote function backed by a Cloud Function
  • How to call the remote function from a SQL query

2. Setup and Requirements

Prerequisites

Activate Cloud Shell

  1. From the Cloud Console, click Activate Cloud Shell.

If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If you see this screen, click Continue.

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

  1. Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

  2. Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If the project is not set, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3. Set up local environment variables

In this step, you will set a few environment variables to improve the readability of the gcloud commands used in this codelab.

PROJECT_ID=$(gcloud config get-value project)

# Cloud Function variables
FUNCTION_NAME="imagen_vqa"
FUNCTION_REGION="us-central1"

# Cloud Storage variables
BUCKET_NAME=$PROJECT_ID-imagen-vqa

# BigQuery variables
DATASET_ID="remote_function_codelab"
TABLE_NAME="images"
BQ_REGION="US"
CONNECTION_ID="imagen_vqa_connection"

4. Create the Cloud Function

To create a BigQuery remote function, you must first create an HTTP endpoint by using either Cloud Functions or Cloud Run. The endpoint must be able to process a batch of rows in a single HTTP POST request and return the results for the batch as an HTTP response.
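
For reference, BigQuery sends the batch as a JSON object whose calls field contains one array of input values per row, and it expects a JSON object with a replies array in the same order. A simplified, illustrative sketch of the exchange (the answer values are made up):

Request body sent by BigQuery (simplified):

{
  "calls": [
    ["gs://my-bucket/image1.jpg", "What objects are in the image?"],
    ["gs://my-bucket/image2.jpg", "What objects are in the image?"]
  ]
}

Expected response body:

{
  "replies": ["marbles", "a red bicycle"]
}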

This Cloud Function will receive the image storage URI and question prompt as input from your SQL query, and return the answer from Visual Question Answering (VQA).

This codelab uses a Python example for the python311 runtime, built with the Vertex AI SDK for Python.

Create the source code for the function

First, create a directory and cd into that directory.

mkdir imagen-vqa && cd $_

Then, create a requirements.txt file.

google-cloud-aiplatform[preview]
google-cloud-storage
functions-framework==3.*

Next, create a main.py source file.

from vertexai.preview.vision_models import ImageQnAModel
from vertexai.preview.vision_models import Image
from flask import jsonify
from google.cloud import storage
from urllib.parse import urlparse
import functions_framework

# This is the entry point for the cloud function
@functions_framework.http
def imagen_vqa(request):
    try:
        # See if you can parse the incoming JSON
        return_value = []
        request_json = request.get_json()
        # This grabs the input into the function as called from the SQL function 
        calls = request_json['calls']
        for call in calls:
            # We call the VQA function here in another function defined below
            ai_result = vqa(call)
            # The result to BigQuery is in the order it was prepared in 
            return_value.append(ai_result[0])
        # Prepare the response back to BigQuery
        return_json = jsonify( { "replies": return_value } )
        return return_json
    except Exception as e:
        return jsonify( { "errorMessage": str(e) } ), 400

# Helper function to split apart the GCS URI
def decode_gcs_url(url):
    # Parse the gs:// URI into its components
    p = urlparse(url)
    bucket = p.netloc
    # Strip the leading slash to get the path to the object
    file_path = p.path.lstrip('/')
    # Return the relevant pieces (bucket, path to object)
    return bucket, file_path
    
# The SDK's Image loader expects a local file path,
# so instead we read the object's bytes from Cloud Storage
def read_file(object_path):
    # Parse the path
    bucket, file_path = decode_gcs_url(object_path)
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket)
    blob = bucket.blob(file_path)
    # Return the object as bytes
    return blob.download_as_bytes()

# This is the function that calls the VQA model
def vqa(parameters):
    # This is the model we want to use
    image_qna_model = ImageQnAModel.from_pretrained("imagetext@001")
    # The location is the first parameter 
    image_loc = parameters[0]
    # Get the bytes 
    image_bytes = read_file(image_loc)
    # Load the bytes into the Image handler
    input_image = Image(image_bytes)
    # Ask the VQA the question
    results = image_qna_model.ask_question(
        image=input_image,
        # The prompt was the second parameter
        question=parameters[1],
        number_of_results=1
    )
    return results
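
Before deploying, you can optionally smoke-test the function locally with the Functions Framework. This is a sketch that assumes your Cloud Shell credentials and default project can reach Vertex AI and Cloud Storage, and that an image already exists at the placeholder URI:

pip install -r requirements.txt
functions-framework --target=imagen_vqa --debug

# In a second terminal, send a minimal one-row batch (the URI is a placeholder)
curl -s -X POST localhost:8080 \
-H "Content-Type: application/json" \
-d '{"calls": [["gs://YOUR_BUCKET/image.jpg", "What objects are in the image?"]]}'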

Deploy the Cloud Function

Now you can deploy your Cloud Function for the python311 runtime.

gcloud functions deploy $FUNCTION_NAME \
--gen2 \
--region=$FUNCTION_REGION \
--runtime=python311 \
--trigger-http \
--source=. \
--no-allow-unauthenticated

Then save the function URL in an environment variable to use later.

ENDPOINT_URL="$(gcloud functions describe $FUNCTION_NAME --gen2 --region=$FUNCTION_REGION --format='get(serviceConfig.uri)')"
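
Optionally, once an image exists in your bucket (you upload one in the next step), you can verify the endpoint directly by sending a small batch with your own identity token. This sketch assumes your account is allowed to invoke the function:

curl -s -X POST $ENDPOINT_URL \
-H "Authorization: Bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
-d '{"calls": [["gs://'$BUCKET_NAME'/image.jpg", "What objects are in the image?"]]}'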

5. Create the Cloud Storage Bucket

First, create a Cloud Storage bucket to store your images.

gcloud storage buckets create gs://$BUCKET_NAME

Next, upload an image for VQA to use. This codelab uses the example image from the VQA documentation.

You can either use the Cloud Console to upload the image directly into your bucket, or run the following command to download the example image to your current Cloud Shell directory:

wget -O image.jpg -o /dev/null "https://unsplash.com/photos/QqN25A3iF9w/download?ixid=M3wxMjA3fDB8MXxhbGx8fHx8fHx8fHwxNjk1NzYxMjY2fA&force=true"

Then upload the image to your Cloud Storage bucket.

gcloud storage cp image.jpg gs://$BUCKET_NAME
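
You can confirm the upload by listing the bucket contents:

gcloud storage ls gs://$BUCKET_NAME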

6. Create a BigQuery Cloud Resource connection

BigQuery uses a CLOUD_RESOURCE connection to interact with your Cloud Function. Run the following command to create this connection.

bq mk --connection --location=$BQ_REGION --project_id=$PROJECT_ID \
--connection_type=CLOUD_RESOURCE $CONNECTION_ID

Next, display the details of the new BigQuery connection.

bq show --connection $PROJECT_ID.$BQ_REGION.$CONNECTION_ID

From the output, save the name of the BigQuery connection service account into a variable, replacing the placeholder with your service account ID.

CONNECTION_SA="<YOUR-SERVICE-ACCOUNT-ID>@gcp-sa-bigquery-condel.iam.gserviceaccount.com"
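
If you prefer not to copy the value by hand, one option is to parse the JSON output of bq show. This is a sketch that assumes jq (preinstalled in Cloud Shell) and the cloudResource.serviceAccountId field that CLOUD_RESOURCE connections return:

CONNECTION_SA=$(bq show --format=json --connection $PROJECT_ID.$BQ_REGION.$CONNECTION_ID | jq -r '.cloudResource.serviceAccountId')
echo $CONNECTION_SA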

Grant the service account access to your Cloud Storage bucket.

gsutil iam ch serviceAccount:$CONNECTION_SA:objectAdmin gs://$BUCKET_NAME

7. Create a BigQuery Object Table

BigQuery object tables are read-only tables over unstructured data objects that reside in Cloud Storage.

Object tables let you analyze unstructured data in Cloud Storage. You can perform analysis with remote functions and then join the results of these operations with the rest of your structured data in BigQuery.

First, create a dataset.

bq --location=$BQ_REGION mk \
    --dataset \
    $DATASET_ID

The following command creates an object table based on your Cloud Storage images bucket. The resulting table will contain the URIs for all images in that bucket.

bq mk --table \
--external_table_definition=gs://$BUCKET_NAME/*@$BQ_REGION.$CONNECTION_ID \
--object_metadata=SIMPLE \
$PROJECT_ID:$DATASET_ID.$TABLE_NAME
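
To confirm the object table sees your bucket, you can query its metadata columns (object tables expose columns such as uri, content_type, size, and updated):

bq query --nouse_legacy_sql \
"SELECT uri, content_type, size, updated
FROM \`$PROJECT_ID.$DATASET_ID.$TABLE_NAME\`
LIMIT 5"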

8. Create the BigQuery Remote Function

The last step is to configure the BigQuery remote function.

First, grant the BigQuery connection service account permission to invoke the Cloud Function. Allowing unauthenticated invocation of your Cloud Function service is not recommended.

gcloud functions add-invoker-policy-binding $FUNCTION_NAME \
--region=$FUNCTION_REGION \
--member=serviceAccount:$CONNECTION_SA

Next, save the SQL query in a variable.

SQL_CREATE_FUNCTION="CREATE FUNCTION \`$PROJECT_ID.$DATASET_ID\`.vqa(uri STRING, image_prompt STRING) RETURNS STRING
REMOTE WITH CONNECTION \`$PROJECT_ID.$BQ_REGION.$CONNECTION_ID\`
OPTIONS (
  endpoint = '$ENDPOINT_URL'
)"

And now run the query.

bq query --nouse_legacy_sql "$SQL_CREATE_FUNCTION"

After running the query to create the remote function, you will see a message like Created <your-project-id>.remote_function_codelab.vqa.
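
You can optionally confirm that the function was registered by listing the routines in the dataset:

bq ls --routines $PROJECT_ID:$DATASET_ID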

9. Call the BigQuery Remote Function in a SQL query

You have now completed the development steps for creating the remote function. You can now call your Cloud Function from within a SQL query.

First, save your question and SQL query in a variable. This codelab uses the example question from the Visual Question Answering documentation. The query targets the most recently updated image in your storage bucket.

export SQL_QUERY="DECLARE question STRING DEFAULT 'What objects are in the image?';
SELECT uri, image_prompt, \`$DATASET_ID\`.vqa(uri, image_prompt) AS result
FROM (
  SELECT
    *,
    dense_rank() OVER (ORDER BY updated DESC) AS rnk,
    question AS image_prompt
  FROM \`$PROJECT_ID.$DATASET_ID.images\`) AS innertable
WHERE rnk = 1;
"

Then run the SQL query to show the response from the Vertex AI Visual Question Answering (VQA) service.

bq query --nouse_legacy_sql "$SQL_QUERY"

The results should look similar to the example output below:

+---------------------------------+--------------------------------+----------+
|               uri               |    image_prompt                |  result  |
+---------------------------------+--------------------------------+----------+
| gs://<YOUR_BUCKET>/image.jpg    | What objects are in the image? |  marbles |
+---------------------------------+--------------------------------+----------+
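
Because the question is just a function parameter, you can ask something different about the same images by changing the prompt string. For example (the question below is illustrative):

bq query --nouse_legacy_sql \
"SELECT uri, \`$PROJECT_ID.$DATASET_ID\`.vqa(uri, 'What color are the objects?') AS result
FROM \`$PROJECT_ID.$DATASET_ID.images\`
LIMIT 1"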

10. Troubleshooting

When creating the BigQuery table, if you receive the error BigQuery error in mk operation: Source URI must be a Google Cloud Storage location: gs://$BUCKET_NAME, make sure you have included the /* path after $BUCKET_NAME in the command.

When running your SQL query, if you get the error Access Denied: BigQuery BigQuery: Received response code 403 from endpoint <your-function-endpoint>, wait one to two minutes for the Cloud Function Invoker role grant to propagate to the BigQuery connection service account, and then retry.
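
For other failures, the Cloud Function logs usually show what the remote function received and why it failed. One way to read the most recent entries from Cloud Shell (assuming the gen2 flags used earlier in this codelab):

gcloud functions logs read $FUNCTION_NAME \
--gen2 \
--region=$FUNCTION_REGION \
--limit=20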

11. Congratulations!

Congratulations on completing the codelab!

We recommend reviewing the documentation on BigQuery Remote Functions and Visual Question Answering (VQA).

What we've covered

  • How to create a Cloud Function that calls Vertex AI Visual Question Answering (VQA)
  • How to create a BigQuery CLOUD_RESOURCE connection and grant its service account access to Cloud Storage
  • How to create a BigQuery object table over images in Cloud Storage
  • How to create a BigQuery remote function backed by a Cloud Function
  • How to call the remote function from a SQL query

12. Clean up

To avoid inadvertent charges (for example, if this Cloud Function is invoked more times than your monthly Cloud Function invocation allocation in the free tier), you can either delete the Cloud Function or delete the project you created in Step 2.

To delete the Cloud Function, go to the Cloud Functions page in the Cloud Console at https://console.cloud.google.com/functions/ and delete the imagen_vqa function (or $FUNCTION_NAME if you used a different name).
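
Alternatively, you can delete the individual resources from Cloud Shell. These commands mirror the ones used to create the resources in this codelab:

# Delete the Cloud Function
gcloud functions delete $FUNCTION_NAME --gen2 --region=$FUNCTION_REGION --quiet

# Delete the dataset, including the object table and remote function
bq rm -r -f --dataset $PROJECT_ID:$DATASET_ID

# Delete the BigQuery connection
bq rm --connection $PROJECT_ID.$BQ_REGION.$CONNECTION_ID

# Delete the Cloud Storage bucket and its contents
gcloud storage rm --recursive gs://$BUCKET_NAME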

If you choose to delete the entire project, you can go to https://console.cloud.google.com/cloud-resource-manager, select the project you created in Step 2, and choose Delete. If you delete the project, you'll need to change projects in your Cloud SDK. You can view the list of all available projects by running gcloud projects list.