1. Introduction
Overview
BigQuery remote functions let you implement a function in languages other than SQL and JavaScript, or with libraries and services that are not allowed in BigQuery user-defined functions. BigQuery remote functions provide a direct integration with Cloud Run functions and Cloud Run. You can invoke a BigQuery remote function within a SQL query; it takes one or more columns as input and returns a single value as output.
Cloud Run functions is a lightweight compute solution for developers to create single-purpose, stand-alone functions that can be triggered using HTTPS or respond to CloudEvents without needing to manage a server or runtime environment. Cloud Run functions supports Node.js, Python, Go, Java, .NET, Ruby, and PHP.
In this codelab, you'll learn how to create a BigQuery remote function that answers questions about images stored in Cloud Storage using Vertex AI Visual Question Answering (VQA). Your SQL query will retrieve a URI for an image from a table in BigQuery. Then, using a BigQuery remote function, you will send the image URI to a Cloud Run function, which will respond with answers from VQA about the image.
From a development perspective, these are the steps you will complete in this codelab:
- Create the HTTP endpoint in Cloud Run functions
- Create a connection of type CLOUD_RESOURCE
- Create a BigQuery object table for the Cloud Storage bucket
- Create the remote function
- Use the remote function in a query just like any other user-defined function
What you'll learn
- How to create an HTTP Cloud Run function in Python
- How to create and use a BigQuery Remote Function within a SQL query
- How to create a BigQuery object table
- How to use Visual Question Answering (VQA) with the Vertex AI SDK for Python
2. Setup and Requirements
Prerequisites
- You are logged into the Cloud Console.
- You have previously deployed an HTTP Cloud Run function. See Python quickstart.
- You have previously created a bucket in Cloud Storage. See Cloud Storage quickstart.
- You have the appropriate roles to create a dataset, table, and remote function within BigQuery. See Load and Query Data in BigQuery quickstart.
Activate Cloud Shell
- From the Cloud Console, click Activate Cloud Shell.
If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If so, click Continue.
It should only take a few moments to provision and connect to Cloud Shell.
This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.
Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.
- Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list
Command output
     Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
- Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project
Command output
[core]
project = <PROJECT_ID>
If it is not set to your project ID, you can set it with this command:
gcloud config set project <PROJECT_ID>
Command output
Updated property [core/project].
3. Set up local environment variables
In this step, you will create a few environment variables to improve the readability of the gcloud commands used in this codelab.
PROJECT_ID=$(gcloud config get-value project)

# Cloud Run function variables
FUNCTION_NAME="imagen-vqa"
FUNCTION_REGION="us-central1"

# Cloud Storage variables
BUCKET_NAME=$PROJECT_ID-imagen-vqa

# BigQuery variables
DATASET_ID="remote_function_codelab"
TABLE_NAME="images"
BQ_REGION="US"
CONNECTION_ID="imagen_vqa_connection"
4. Create the Cloud Run function
To create a BigQuery remote function, you must first create an HTTP endpoint using a Cloud Run function. The endpoint must be able to process a batch of rows in a single HTTP POST request and return the results for the batch as an HTTP response.
This Cloud Run function will receive the image storage URI and question prompt as input from your SQL query, and return the answer from Visual Question Answering (VQA).
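For reference, BigQuery batches the input rows into a calls array in the request body and expects the answers back in a replies array of the same length (the request also carries additional metadata fields, such as requestId, which this codelab ignores). A minimal sketch of the exchange for this codelab's function, with placeholder bucket and image names, looks like this:
Example request body sent by BigQuery (one inner array per row):
{"calls": [["gs://your-bucket/image.jpg", "What objects are in the image?"]]}
Expected response body:
{"replies": ["marbles"]}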
This codelab uses an example for the python311 runtime using the Vertex AI SDK for Python.
Create the source code for the function
First, create a directory and cd into that directory.
mkdir imagen-vqa && cd $_
Then, create a requirements.txt file.
google-cloud-aiplatform[preview]
google-cloud-storage
functions-framework==3.*
Next, create a main.py source file.
from vertexai.preview.vision_models import ImageQnAModel
from vertexai.preview.vision_models import Image
from flask import jsonify
from google.cloud import storage
from urllib.parse import urlparse
import functions_framework

# This is the entry point for the cloud function
@functions_framework.http
def imagen_vqa(request):
    try:
        # See if you can parse the incoming JSON
        return_value = []
        request_json = request.get_json()
        # This grabs the input into the function as called from the SQL function
        calls = request_json['calls']
        for call in calls:
            # We call the VQA function here in another function defined below
            ai_result = vqa(call)
            # The result to BigQuery is in the order it was prepared in
            return_value.append(ai_result[0])
        # Prepare the response back to BigQuery
        return_json = jsonify( { "replies": return_value } )
        return return_json
    except Exception as e:
        return jsonify( { "errorMessage": str(e) } ), 400

# Helper function to split apart the GCS URI
def decode_gcs_url(url):
    # Read the URI and parse it
    p = urlparse(url)
    bucket = p.netloc
    file_path = p.path[0:].split('/', 1)
    # Return the relevant objects (bucket, path to object)
    return bucket, file_path[1]

# We can't use the image load from local file since it expects a local path
# We use a GCS URL and get the bytes of the image
def read_file(object_path):
    # Parse the path
    bucket, file_path = decode_gcs_url(object_path)
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket)
    blob = bucket.blob(file_path)
    # Return the object as bytes
    return blob.download_as_bytes()

# This is the function that calls the VQA function
def vqa(parameters):
    # This is the model we want to use
    image_qna_model = ImageQnAModel.from_pretrained("imagetext@001")
    # The location is the first parameter
    image_loc = parameters[0]
    # Get the bytes
    image_bytes = read_file(image_loc)
    # Load the bytes into the Image handler
    input_image = Image(image_bytes)
    # Ask the VQA the question
    results = image_qna_model.ask_question(
        image=input_image,
        # The prompt was the second parameter
        question=parameters[1],
        number_of_results=1
    )
    return results
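Optionally, you can smoke-test the function locally before deploying it. The sketch below assumes you have installed the requirements, that your environment has Application Default Credentials and a default project configured for Vertex AI, and that an image.jpg exists in your bucket (you will upload one in a later step):
# Install the dependencies and start the Functions Framework locally
pip install -r requirements.txt
functions-framework --target=imagen_vqa --debug
# In a second terminal, send a sample batch with one row
curl -s -X POST localhost:8080 \
  -H "Content-Type: application/json" \
  -d '{"calls": [["gs://'$BUCKET_NAME'/image.jpg", "What objects are in the image?"]]}'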
Deploy the Cloud Run function
Now you can deploy your Cloud Run function for the python311 runtime.
To deploy a Cloud Run function directly onto Cloud Run, run the following command:
gcloud beta run deploy $FUNCTION_NAME \
  --source . \
  --function imagen_vqa \
  --region $FUNCTION_REGION \
  --no-allow-unauthenticated
If you prefer to deploy as a Cloud Functions (2nd gen) function, use the following command:
gcloud functions deploy $FUNCTION_NAME \
  --gen2 \
  --region=$FUNCTION_REGION \
  --runtime=python311 \
  --trigger-http \
  --source=. \
  --no-allow-unauthenticated
Then save the function URL as an environment variable to use later.
ENDPOINT_URL="$(gcloud beta run services describe $FUNCTION_NAME --region $FUNCTION_REGION --format='value(status.url)')"
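Optionally, you can verify the deployment by calling the endpoint directly with your own identity token. This sketch assumes your account is allowed to invoke the service (for example, through roles/run.invoker) and that image.jpg exists in your bucket, which you will upload in the next step:
curl -s -X POST $ENDPOINT_URL \
  -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  -H "Content-Type: application/json" \
  -d '{"calls": [["gs://'$BUCKET_NAME'/image.jpg", "What objects are in the image?"]]}'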
5. Create the Cloud Storage Bucket
First, create a Cloud Storage bucket to store your images.
gcloud storage buckets create gs://$BUCKET_NAME
Next, upload an image for VQA to use. This codelab uses the example image from the VQA documentation.
You can either use the Cloud Console for Cloud Storage to upload the image directly into your bucket, or you can run the following command to download the example image to your current Cloud Shell directory
wget -O image.jpg -o /dev/null "https://unsplash.com/photos/QqN25A3iF9w/download?ixid=M3wxMjA3fDB8MXxhbGx8fHx8fHx8fHwxNjk1NzYxMjY2fA&force=true"
and then upload it to your Cloud Storage bucket.
gcloud storage cp image.jpg gs://$BUCKET_NAME
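You can confirm the upload by listing the bucket contents:
gcloud storage ls gs://$BUCKET_NAME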
6. Create a BigQuery Cloud Resource connection
BigQuery uses a CLOUD_RESOURCE connection to interact with your Cloud Run function. Run the following command to create this connection.
bq mk --connection --location=$BQ_REGION --project_id=$PROJECT_ID \
    --connection_type=CLOUD_RESOURCE $CONNECTION_ID
Next, display the details of the new BigQuery connection.
bq show --connection $PROJECT_ID.$BQ_REGION.$CONNECTION_ID
Save the name of the BigQuery connection service account into a variable, as shown.
CONNECTION_SA="<YOUR-SERVICE-ACCOUNT-ID>@gcp-sa-bigquery-condel.iam.gserviceaccount.com"
Grant access to the service account to access your Cloud Storage bucket.
gsutil iam ch serviceAccount:$CONNECTION_SA:objectAdmin gs://$BUCKET_NAME
7. Create a BigQuery Object Table
BigQuery object tables are read-only tables over unstructured data objects that reside in Cloud Storage.
Object tables let you analyze unstructured data in Cloud Storage. You can perform analysis with remote functions and then join the results of these operations with the rest of your structured data in BigQuery.
First, create a dataset.
bq --location=$BQ_REGION mk \
    --dataset \
    $DATASET_ID
The following command creates an object table based on your Cloud Storage images bucket. The resulting table will contain the URIs for all images in that bucket.
bq mk --table \
    --external_table_definition=gs://$BUCKET_NAME/*@$BQ_REGION.$CONNECTION_ID \
    --object_metadata=SIMPLE \
    $PROJECT_ID:$DATASET_ID.$TABLE_NAME
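To verify the object table, you can query its metadata columns like any other table; this sketch assumes the standard object table columns uri and updated:
bq query --nouse_legacy_sql \
    "SELECT uri, updated FROM \`$PROJECT_ID.$DATASET_ID.$TABLE_NAME\` LIMIT 5"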
8. Create the BigQuery Remote Function
The last step is to configure the BigQuery remote function.
First, grant the BigQuery connection service account permission to invoke the Cloud Run function. Allowing unauthenticated invocation of your Cloud Run function service is not recommended.
gcloud run services add-iam-policy-binding $FUNCTION_NAME \
  --member=serviceAccount:$CONNECTION_SA \
  --role="roles/run.invoker" \
  --region $FUNCTION_REGION
Next, save the SQL query to a variable.
SQL_CREATE_FUNCTION="CREATE FUNCTION \`$PROJECT_ID.$DATASET_ID\`.vqa(uri STRING, image_prompt STRING) RETURNS STRING REMOTE WITH CONNECTION \`$PROJECT_ID.$BQ_REGION.$CONNECTION_ID\` OPTIONS ( endpoint = '$ENDPOINT_URL' )"
And now run the query.
bq query --nouse_legacy_sql "$SQL_CREATE_FUNCTION"
After running the query to create the remote function, you will see Created <your-project-id>.remote_function_codelab.vqa
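Optionally, you can smoke-test the remote function with a single literal call before querying the object table. The image path below assumes the image.jpg you uploaded earlier:
bq query --nouse_legacy_sql \
    "SELECT \`$PROJECT_ID.$DATASET_ID\`.vqa('gs://$BUCKET_NAME/image.jpg', 'What objects are in the image?') AS result"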
9. Call the BigQuery Remote Function in a SQL query
You have now completed the development steps for creating the remote function. You can now call your Cloud Run function from within a SQL query.
First, save your question and SQL query to a variable. This codelab uses the example question from the Visual Question Answering documentation. The query uses the most recently updated image in your storage bucket.
export SQL_QUERY="DECLARE question STRING DEFAULT 'What objects are in the image?'; SELECT uri, image_prompt ,\`$DATASET_ID\`.vqa(uri, image_prompt) as result FROM ( SELECT *, dense_rank() over (order by updated) as rnk , question as image_prompt FROM \`$PROJECT_ID.$DATASET_ID.images\`) as innertable WHERE rnk = 1; "
Then run the SQL query to show the response from the Vertex AI Visual Question Answering (VQA) service.
bq query --nouse_legacy_sql "$SQL_QUERY"
The results should look similar to the example output below:
+------------------------------+--------------------------------+---------+
|             uri              |          image_prompt          | result  |
+------------------------------+--------------------------------+---------+
| gs://<YOUR_BUCKET>/image.jpg | What objects are in the image? | marbles |
+------------------------------+--------------------------------+---------+
10. Troubleshooting
When creating the BigQuery table, if you receive the error BigQuery error in mk operation: Source URI must be a Google Cloud Storage location: gs://$BUCKET_NAME, make sure you have included the /* path after $BUCKET_NAME in the command.
When running your SQL query, if you get the error Access Denied: BigQuery BigQuery: Received response code 403 from endpoint <your-function-endpoint>, try waiting a minute or two for the Cloud Run Invoker role grant to propagate to the BigQuery connection service account, then retry.
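For other endpoint errors, inspecting the function's logs usually reveals the underlying exception. One way to read them, assuming the default Cloud Run revision labels, is:
gcloud logging read \
    'resource.type="cloud_run_revision" AND resource.labels.service_name="'$FUNCTION_NAME'"' \
    --limit 20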
11. Congratulations!
Congratulations on completing the codelab!
We recommend reviewing the documentation on BigQuery Remote Functions and Visual Question Answering (VQA).
What we've covered
- How to create an HTTP Cloud Run function in Python
- How to create and use a BigQuery remote function within a SQL query
- How to create a BigQuery object table over a Cloud Storage bucket
- How to use Visual Question Answering (VQA) with the Vertex AI SDK for Python
12. Clean up
To avoid inadvertent charges (for example, if this Cloud Run function is invoked more times than your monthly allocation of Cloud Run function invocations in the free tier), you can either delete the Cloud Run function or delete the project you created in Step 2.
To delete the Cloud Run function, go to the Cloud Console at https://console.cloud.google.com/functions/ and delete the imagen-vqa function (or $FUNCTION_NAME if you used a different name).
If you choose to delete the entire project, go to https://console.cloud.google.com/cloud-resource-manager, select the project you created in Step 2, and choose Delete. If you delete the project, you'll need to change projects in your Cloud SDK. You can view the list of all available projects by running gcloud projects list.