How to use Cloud Run functions and Gemini to summarize a text file uploaded to a Cloud Storage bucket


About this codelab

Last updated Aug 30, 2024
Written by a Googler

1. Introduction

Overview

Cloud Run functions is a new way to deploy workloads using the familiar Google Cloud Functions (GCF) eventing paradigms and function signatures. Instead of using our opinionated build process and deployment configuration, Cloud Run functions gives you direct control over the underlying service created on Cloud Run.

In this codelab, you'll learn how to deploy an event-driven function in Python that uses Gemini to summarize a plain text file uploaded to a Cloud Storage bucket.

What you'll learn

  • How to deploy an event-driven Cloud Run function that is triggered whenever an object is uploaded to a GCS bucket
  • How to create a service account with proper roles to receive an event from Cloud Storage and invoke the Cloud Run function
  • How to use Gemini to summarize a plain text document uploaded to Cloud Storage

2. Set up environment variables and enable APIs

Update gcloud CLI

This codelab requires a recent version of the gcloud CLI. You can update the CLI by running:

gcloud components update

Enable APIs

Before you start, you need to enable the APIs this codelab uses: Cloud Run, Cloud Build, Cloud Storage, Artifact Registry, Eventarc, and Vertex AI. You can enable them by running the following command:

gcloud services enable run.googleapis.com \
    cloudbuild.googleapis.com \
    storage.googleapis.com \
    artifactregistry.googleapis.com \
    eventarc.googleapis.com \
    aiplatform.googleapis.com
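If you want to confirm that all six APIs are enabled, one option is to compare your project's enabled services against the list above. The Python sketch below assumes that gcloud services list --enabled --format="value(config.name)" prints one bare service name per line (for example run.googleapis.com); the parser also accepts full resource names in case your gcloud version prints those. The helper names missing_services and fetch_enabled_services are hypothetical.

```python
import subprocess

# The six APIs enabled by the gcloud command above.
REQUIRED_SERVICES = {
    "run.googleapis.com",
    "cloudbuild.googleapis.com",
    "storage.googleapis.com",
    "artifactregistry.googleapis.com",
    "eventarc.googleapis.com",
    "aiplatform.googleapis.com",
}


def missing_services(enabled_output: str) -> set[str]:
    """Return required services absent from gcloud's enabled-services output.

    Each non-empty line may be a bare name ("run.googleapis.com") or a full
    resource name ("projects/123/services/run.googleapis.com").
    """
    enabled = {
        line.strip().rsplit("/", 1)[-1]
        for line in enabled_output.splitlines()
        if line.strip()
    }
    return REQUIRED_SERVICES - enabled


def fetch_enabled_services() -> str:
    """Run gcloud and return its output (needs an authenticated gcloud CLI)."""
    result = subprocess.run(
        ["gcloud", "services", "list", "--enabled", "--format=value(config.name)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, print(sorted(missing_services(fetch_enabled_services()))) should print an empty list once the enable command has completed.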

Set up environment variables

Set the following environment variables, which are used throughout this codelab. Replace the placeholders with your project ID and region.

PROJECT_ID=<YOUR_PROJECT_ID>
REGION=<YOUR_REGION, e.g. us-central1>

gcloud config set project $PROJECT_ID
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format='value(projectNumber)')
SERVICE_NAME=crf-vertexai-codelab
BUCKET_NAME=$PROJECT_ID-$SERVICE_NAME
TRIGGER_NAME=$SERVICE_NAME-trigger
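Because BUCKET_NAME is derived from your project ID, it can be worth checking that the result is a valid bucket name before you create the bucket. The Python sketch below mirrors the shell variables above and applies a simplified check covering only names without dots: 3 to 63 characters, only lowercase letters, digits, dashes, and underscores, starting and ending with a letter or digit, and no "goog" prefix. It is not a complete implementation of the Cloud Storage naming rules, and the function names are hypothetical.

```python
import re

# Simplified rule for dot-free bucket names: 3-63 chars of [a-z0-9_-],
# starting and ending with a lowercase letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def derive_names(project_id: str, service_name: str = "crf-vertexai-codelab") -> dict[str, str]:
    """Mirror the SERVICE_NAME, BUCKET_NAME, and TRIGGER_NAME shell variables."""
    return {
        "SERVICE_NAME": service_name,
        "BUCKET_NAME": f"{project_id}-{service_name}",
        "TRIGGER_NAME": f"{service_name}-trigger",
    }


def is_valid_bucket_name(name: str) -> bool:
    """Return True if name passes the simplified (dot-free) bucket-name check."""
    return bool(_BUCKET_RE.fullmatch(name)) and not name.startswith("goog")
```

Since project IDs are 6 to 30 characters and crf-vertexai-codelab is 20, the derived bucket name is at most 51 characters, so it stays within the 63-character limit.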

3. Create a Storage Bucket and a Service Account

Create a storage bucket

You can create a Cloud Storage bucket by running the following command:

gsutil mb -l $REGION gs://$BUCKET_NAME

Create a service account

For this example, you'll create a service account with the Eventarc Event Receiver and Cloud Run Invoker roles so that it can receive events from Cloud Storage and invoke the Cloud Run function. You'll also grant it the roles it needs to call Gemini and read the uploaded file.

First, create the service account.

SERVICE_ACCOUNT="crf-vertexai-codelab"
SERVICE_ACCOUNT_ADDRESS=$SERVICE_ACCOUNT@$PROJECT_ID.iam.gserviceaccount.com

gcloud iam service-accounts create $SERVICE_ACCOUNT \
  --display-name="Cloud Run functions Eventarc service account"

Next, grant the Eventarc Event Receiver role (roles/eventarc.eventReceiver) on the project to the service account associated with your Eventarc trigger so that the trigger can receive events from event providers.

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role=roles/eventarc.eventReceiver

Then, grant the service account the Cloud Run invoker role so that it can invoke the function.

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role=roles/run.invoker

Now, grant the service account the Vertex AI User role (roles/aiplatform.user) so it can make calls to Gemini.

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role="roles/aiplatform.user"

Also grant the service account the Storage Object Viewer role (roles/storage.objectViewer) so it can read the uploaded file.

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:$SERVICE_ACCOUNT_ADDRESS \
  --role="roles/storage.objectViewer"

The Pub/Sub service agent needs the Service Account Token Creator role (roles/iam.serviceAccountTokenCreator) on your project to create identity tokens.

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:service-$PROJECT_NUMBER@gcp-sa-pubsub.iam.gserviceaccount.com \
  --role=roles/iam.serviceAccountTokenCreator

For your trigger to receive events from Cloud Storage, the Cloud Storage service agent needs the Pub/Sub Publisher role (roles/pubsub.publisher).

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member serviceAccount:service-$PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com \
  --role=roles/pubsub.publisher
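It can help to see the six grants in this step side by side: four go to the function's service account and two go to Google-managed service agents. The Python sketch below records them as (member, role) pairs and renders the equivalent gcloud commands. It only builds strings and does not call any Google Cloud API; the helper names iam_bindings and binding_commands are hypothetical.

```python
import shlex


def iam_bindings(project_number: str, sa_address: str) -> list[tuple[str, str]]:
    """Return the (member, role) pairs granted in this step, in the same order."""
    pubsub_agent = f"service-{project_number}@gcp-sa-pubsub.iam.gserviceaccount.com"
    gcs_agent = f"service-{project_number}@gs-project-accounts.iam.gserviceaccount.com"
    sa = f"serviceAccount:{sa_address}"
    return [
        # Function's service account: receive events, invoke, call Gemini, read objects.
        (sa, "roles/eventarc.eventReceiver"),
        (sa, "roles/run.invoker"),
        (sa, "roles/aiplatform.user"),
        (sa, "roles/storage.objectViewer"),
        # Pub/Sub service agent: create identity tokens.
        (f"serviceAccount:{pubsub_agent}", "roles/iam.serviceAccountTokenCreator"),
        # Cloud Storage service agent: publish events to Pub/Sub.
        (f"serviceAccount:{gcs_agent}", "roles/pubsub.publisher"),
    ]


def binding_commands(project_id: str, project_number: str, sa_address: str) -> list[str]:
    """Render each binding as the equivalent gcloud command line."""
    return [
        shlex.join([
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}", f"--role={role}",
        ])
        for member, role in iam_bindings(project_number, sa_address)
    ]
```

Keeping the bindings as data makes it easy to review which principal holds which role before you run anything.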

4. Create and deploy the function

First, create a directory for the source code and cd into that directory.

mkdir $SERVICE_NAME && cd $_

Then, create a requirements.txt file with the following content:

functions-framework==3.*
google-cloud-aiplatform==1.63.*
google-cloud-storage==2.16.*

Next, create a main.py file with the following content, replacing <YOUR_PROJECT_ID> with your project ID:

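import functions_framework
import vertexai
from vertexai.generative_models import GenerativeModel
from google.cloud import storage

# Initialize the Vertex AI SDK once per instance, at import time.
vertexai.init(project="<YOUR_PROJECT_ID>", location="us-central1")

# The system instruction constrains every response to a one-sentence summary.
model = GenerativeModel(
    model_name="gemini-1.5-pro-001",
    system_instruction=[
        "Summarize the following document in a single sentence. Do not respond with more than one sentence.",
    ],
)

# Create the Cloud Storage client once and reuse it across invocations.
storage_client = storage.Client()


# Triggered when an object is finalized (uploaded) in the storage bucket
@functions_framework.cloud_event
def hello_gcs(cloud_event):
    data = cloud_event.data

    # Look up the uploaded object using the bucket and object name from the event.
    blob = storage_client.bucket(data["bucket"]).get_blob(data["name"])
    if blob is None:
        # The object may have been deleted before the function ran.
        print(f"Object gs://{data['bucket']}/{data['name']} not found; skipping.")
        return

    # Download the file contents and ask Gemini for a summary.
    doc = blob.download_as_text()
    response = model.generate_content([doc])

    # The "Response" prefix is what the log query in step 6 filters on.
    print(f"Response from Model: {response.text}")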
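To exercise the handler's logic without deploying, one option is to separate the Cloud Storage and Gemini calls from the event handling. The Python sketch below is a hypothetical refactor, not the codelab's code: summarize_event takes the event's data dictionary plus two injected callables (read_text and generate, both hypothetical names), so you can run it locally with fakes. The sample payload contains only the bucket and name fields that hello_gcs reads; a real google.cloud.storage.object.v1.finalized event carries more object metadata.

```python
from typing import Callable, Optional


def summarize_event(
    data: dict,
    read_text: Callable[[str, str], Optional[str]],
    generate: Callable[[str], str],
) -> Optional[str]:
    """Summarize the object named in a Cloud Storage event payload.

    read_text(bucket, name) returns the object's text, or None if it is missing.
    generate(text) returns the model's summary.
    """
    text = read_text(data["bucket"], data["name"])
    if text is None:
        return None
    summary = generate(text)
    # Same log line format as hello_gcs, so the step 6 log query still matches.
    print(f"Response from Model: {summary}")
    return summary


# Fakes standing in for Cloud Storage and Gemini during a local test.
fake_objects = {("my-bucket", "notes.txt"): "Cloud Run functions run on Cloud Run."}


def fake_read_text(bucket: str, name: str) -> Optional[str]:
    return fake_objects.get((bucket, name))


def fake_generate(text: str) -> str:
    return f"A {len(text.split())}-word note about Cloud Run."


event_data = {"bucket": "my-bucket", "name": "notes.txt"}
result = summarize_event(event_data, fake_read_text, fake_generate)
```

In the deployed function, read_text could wrap storage_client.bucket(bucket).get_blob(name) followed by download_as_text(), and generate could wrap model.generate_content([text]).text. Keeping those calls at the edges lets you test the event handling without credentials.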

Now you can deploy the Cloud Run function by running the following command:

gcloud beta run deploy $SERVICE_NAME \
      --source . \
      --function hello_gcs \
      --region $REGION \
      --no-allow-unauthenticated \
      --service-account $SERVICE_ACCOUNT_ADDRESS

Please note the following:

  • The --source flag tells Cloud Run to build the function into a runnable container-based service.
  • The --function flag (new) sets the entry point of the new service to the function you want invoked, here hello_gcs.
  • (Optional) The --no-allow-unauthenticated flag prevents your function from being publicly invocable.
  • The --service-account flag runs the function as the service account you created, which has the roles needed to call Gemini and read the uploaded file.

You may be asked "Deploying from source requires an Artifact Registry Docker repository to store built containers. A repository named [cloud-run-source-deploy] in region [<YOUR_REGION>] will be created." Accept the default (yes) to create the repository.

You can view your new service crf-vertexai-codelab by running the following command:

gcloud beta run services describe $SERVICE_NAME --region $REGION

5. Create the event

Create an Eventarc trigger that sends an event to your function every time an object is finalized (that is, created or overwritten) in your Cloud Storage bucket:

BUCKET_REGION=$REGION

gcloud eventarc triggers create $TRIGGER_NAME \
     --location=$BUCKET_REGION \
     --destination-run-service=$SERVICE_NAME \
     --destination-run-region=$REGION \
     --event-filters="type=google.cloud.storage.object.v1.finalized" \
     --event-filters="bucket=$BUCKET_NAME" \
     --service-account=$SERVICE_ACCOUNT_ADDRESS

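Please note that for the --event-filters flag, you should not include the gs:// prefix in the bucket name. Also note that a Cloud Storage trigger must be created in the same location as the bucket (--location), while --destination-run-region is the region of the Cloud Run service; in this codelab, both are $REGION.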

If you see the error "If you recently started to use Eventarc, it may take a few minutes before all necessary permissions are propagated to the Service Agent.", wait a few minutes and then try again.

For a detailed tutorial on triggering a Cloud Run service from Cloud Storage events using Eventarc, see the Cloud Run documentation: https://cloud.google.com/run/docs/tutorials/eventarc
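The trigger command in this step can also be assembled programmatically. The Python sketch below builds it as an argument list and makes two details explicit: the trigger's --location is the bucket's region while --destination-run-region is the Cloud Run service's region, and the bucket filter takes the bare bucket name, so any gs:// prefix (and trailing slash) is stripped. The flags are the ones used in the command above; the function names bucket_filter_value and trigger_create_argv are hypothetical.

```python
def bucket_filter_value(bucket: str) -> str:
    """Return the bucket name in the form --event-filters expects (no gs:// prefix)."""
    return bucket.removeprefix("gs://").rstrip("/")


def trigger_create_argv(
    trigger_name: str,
    service_name: str,
    service_region: str,
    bucket: str,
    bucket_region: str,
    service_account: str,
) -> list[str]:
    """Build the gcloud eventarc triggers create command from this step."""
    return [
        "gcloud", "eventarc", "triggers", "create", trigger_name,
        # Cloud Storage triggers are created in the bucket's location.
        f"--location={bucket_region}",
        f"--destination-run-service={service_name}",
        # The destination region is where the Cloud Run service runs.
        f"--destination-run-region={service_region}",
        "--event-filters=type=google.cloud.storage.object.v1.finalized",
        f"--event-filters=bucket={bucket_filter_value(bucket)}",
        f"--service-account={service_account}",
    ]
```

You could print shlex.join(argv) to inspect the command, or pass argv to subprocess.run(argv, check=True) to run it with an authenticated gcloud CLI.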

6. Test the Function

With the function deployed and the trigger created, you're now ready to invoke the function.

Create a plain text file and upload it to your Cloud Storage bucket. You can do this through the Cloud Console web interface or with the gsutil CLI tool, for example:

gsutil cp <YOUR_PLAIN_TEXT_FILE> gs://$BUCKET_NAME

When the file is successfully uploaded, an event will be generated and your function will call Gemini to summarize the plain text file. The summary will be printed to the logs.

You can either view the logs in the Cloud Console for the Cloud Run service, or you can run the following command:

gcloud logging read "resource.type=cloud_run_revision AND resource.labels.service_name=$SERVICE_NAME AND textPayload: Response"

For example, uploading a plain text file of the Cloud Run functions user guide for private preview produces the following log entry:

Response from Model: Cloud Run functions offer a new way to deploy serverless workloads with familiar Google Cloud Functions paradigms while providing control over the underlying Cloud Run service. 
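If you'd rather process the summaries programmatically, gcloud logging read can emit JSON when you add the global --format=json flag to the command above. The Python sketch below assumes that output is a JSON array of log entries and extracts the text after the "Response from Model: " prefix that the function prints; entries without a textPayload field are skipped. The function name extract_summaries is hypothetical.

```python
import json

PREFIX = "Response from Model: "


def extract_summaries(gcloud_json: str) -> list[str]:
    """Return model summaries from `gcloud logging read --format=json` output."""
    entries = json.loads(gcloud_json or "[]")
    summaries = []
    for entry in entries:
        text = entry.get("textPayload", "")
        # Keep only lines written by the function's final print statement.
        if text.startswith(PREFIX):
            summaries.append(text[len(PREFIX):].strip())
    return summaries
```

For example, redirect the command's JSON output to a file such as logs.json and call extract_summaries(open("logs.json").read()).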

7. Congratulations!

Congratulations on completing the codelab!

We recommend reviewing the Cloud Run functions documentation.

What we've covered

  • How to deploy an event-driven Cloud Run function that is triggered whenever an object is uploaded to a GCS bucket
  • How to create a service account with proper roles to receive an event from Cloud Storage and invoke the Cloud Run function
  • How to use Gemini to summarize a plain text document uploaded to Cloud Storage

8. Clean up

To avoid inadvertent charges (for example, if this Cloud Run service is invoked more times than your monthly Cloud Run invocation allocation in the free tier), you can either delete the Cloud Run service or delete the project you created in Step 2.

To delete the Cloud Run service, go to the Cloud Run page of the Cloud Console at https://console.cloud.google.com/run/ and delete the crf-vertexai-codelab service you created in this codelab.

If you choose to delete the entire project, you can go to https://console.cloud.google.com/cloud-resource-manager, select the project you created in Step 2, and choose Delete. If you delete the project, you'll need to change projects in your Cloud SDK. You can view the list of all available projects by running gcloud projects list.
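If you keep the project, note that the Cloud Run service is not the only resource this codelab created: the Eventarc trigger, the bucket, and the service account also remain. The Python sketch below builds delete commands for these resources and the service, and runs them only when dry_run is False. It assumes the same values as the shell variables used earlier and that the trigger was created in $REGION, as in step 5; the function names are hypothetical. Review the printed commands before running them, because deletion is irreversible.

```python
import subprocess


def cleanup_commands(
    service_name: str,
    trigger_name: str,
    region: str,
    bucket_name: str,
    service_account: str,
) -> list[list[str]]:
    """Delete commands for resources created in this codelab, trigger first."""
    return [
        ["gcloud", "eventarc", "triggers", "delete", trigger_name,
         f"--location={region}", "--quiet"],
        ["gcloud", "run", "services", "delete", service_name,
         f"--region={region}", "--quiet"],
        # Removes every object in the bucket, then the bucket itself.
        ["gsutil", "rm", "-r", f"gs://{bucket_name}"],
        ["gcloud", "iam", "service-accounts", "delete", service_account, "--quiet"],
    ]


def cleanup(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Running cleanup(...) with the default dry_run=True only prints the commands, which lets you check the resource names first.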