Prototype to Production: Getting predictions from custom trained models

1. Overview

In this lab, you'll use Vertex AI to get online and batch predictions from a custom trained model.

This lab is part of the Prototype to Production video series. Be sure to complete the previous lab before trying out this one. You can watch the accompanying video to learn more:

What you'll learn

You'll learn how to:

Upload models to Vertex AI Model Registry
Deploy a model to an endpoint
Get online and batch predictions with the UI and SDK

The total cost to run this lab on Google Cloud is about $1.

2. Intro to Vertex AI

This lab uses the newest AI product offering available on Google Cloud. Vertex AI integrates the ML offerings across Google Cloud into a seamless development experience. Previously, models trained with AutoML and custom models were accessible via separate services. The new offering combines both into a single API, along with other new products. You can also migrate existing projects to Vertex AI.

Vertex AI includes many different products to support end-to-end ML workflows. This lab focuses on the two products highlighted below: Predictions and Workbench.

Vertex product overview

3. Set up your environment

Complete the steps in the Training custom models with Vertex AI lab to set up your environment.

4. Upload model to registry

Before we can use our model to get predictions, we need to upload it to the Vertex AI Model Registry, a repository where you can manage the lifecycle of your ML models.

You can upload models when you configure a custom training job, as shown below.

training_prediction

Or you can import models after the training job has completed as long as you store the saved model artifacts in a Cloud Storage bucket. This is the option we'll use in this lab.

Navigate to the Models section within the console.

Select IMPORT

import_model

Select Import as new model and then provide a name for your model

new_model

Under Model settings, import the model with a pre-built container and use TensorFlow 2.8. You can see the full list of pre-built prediction containers here.

Then provide the path to the Cloud Storage bucket where you saved the model artifacts in the custom training job. This should look something like gs://{PROJECT_ID}-bucket/model_output.

We'll skip the Explainability section, but if you'd like to learn more about Vertex Explainable AI, check out the docs.

When the model is imported, you'll see it in the registry.

flower_model

Note that if you wanted to do this through the SDK instead of the UI, you can run the following from your Workbench notebook to upload the model.

from google.cloud import aiplatform

my_model = aiplatform.Model.upload(display_name='flower-model',
                                  artifact_uri='gs://{PROJECT_ID}-bucket/model_output',
                                  serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest')

5. Deploy model to endpoint

There are two types of prediction jobs we can run in Vertex AI: batch and online.

Batch prediction is an asynchronous request. It's a good fit when you don't require an immediate response and want to process accumulated data in a single request.

On the other hand, if you wanted to get low latency predictions from data passed to your model on the fly, you would use online prediction.

Now that the model is in the registry, we can use it for batch predictions.

But if we want to get online predictions, we'll need to deploy the model to an endpoint. This associates the saved model artifacts with physical resources for low latency predictions.

To deploy to an endpoint, select the three dots on the far right of the model, then select Deploy to endpoint.

deploy_model

Give your endpoint a name, and then leave the rest of the settings as is and click CONTINUE

endpoint_name

Endpoints support autoscaling, which means that you can set a minimum and maximum, and compute nodes will scale to meet traffic demand within those boundaries.

Since this lab is just for demonstration and we aren't going to use this endpoint for high traffic, you can set the Maximum number of compute nodes to 1, and select n1-standard-4 as the Machine type.

endpoint_compute

We'll skip Model monitoring, but if you'd like to learn more about this feature, check out the docs.

Then click DEPLOY

Deploying will take a few minutes, but once it's done, you'll see that your model's deployment status has changed to Deployed on Vertex AI.

If you would like to deploy a model via the SDK, run the command below.

my_model = aiplatform.Model("projects/{PROJECT_NUMBER}/locations/us-central1/models/{MODEL_ID}") 

endpoint = my_model.deploy(
    deployed_model_display_name='my-endpoint',
    traffic_split={"0": 100},
    machine_type="n1-standard-4",
    accelerator_count=0,
    min_replica_count=1,
    max_replica_count=1,
)

6. Get predictions

Online predictions

When your model is deployed to an endpoint, you can call it like any other REST endpoint. This means you can call it from a Cloud Function, a chatbot, a web app, and so on.
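
As a rough illustration of what calling the endpoint over REST involves, the sketch below builds the URL and JSON body for a predict request using only the standard library. The helper name `build_predict_request` and the IDs are hypothetical placeholders. The URL pattern follows the public Vertex AI REST reference for the `predict` method, and a real call would also need an OAuth access token in the `Authorization` header.

```python
import json


def build_predict_request(project, endpoint_id, instances, location="us-central1"):
    """Return the (url, body) pair for a Vertex AI online prediction call.

    Hypothetical helper for illustration; it only builds the request and
    does not send it.
    """
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/"
        f"endpoints/{endpoint_id}:predict"
    )
    # The REST API expects a JSON object with an "instances" list,
    # one entry per input example.
    body = json.dumps({"instances": instances})
    return url, body


# Placeholder values, not real resources.
url, body = build_predict_request("123456789", "987654321", [[0.0, 1.0, 2.0]])
print(url)
```

Any HTTP client could then POST `body` to `url`. The SDK used in the rest of this section wraps the same request.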

For demonstration purposes we'll call this endpoint from Workbench.

Return to the notebook you created in the previous lab. From the launcher, create a new TensorFlow 2 notebook.

tf_kernel

Import the Vertex AI Python SDK, numpy, and PIL

from google.cloud import aiplatform

import numpy as np
from PIL import Image

Download the image below and upload it to your workbench instance. We'll test out the model on this image of a dandelion.

test_image

First, define the endpoint. You'll need to replace the {PROJECT_NUMBER} and the {ENDPOINT_ID} below.

endpoint = aiplatform.Endpoint(
    endpoint_name="projects/{PROJECT_NUMBER}/locations/us-central1/endpoints/{ENDPOINT_ID}")

You can find your endpoint_id in the Endpoints section of the Cloud Console.

endpoint_id

And you can find your Project Number on the home page of the console. Note that this is different from the Project ID.

project_number

The code below opens the image with PIL.

IMAGE_PATH = "test-image.jpg"
im = Image.open(IMAGE_PATH)

Then, convert the image to a NumPy array of type float32 and then to a list. We convert to a list because NumPy arrays are not JSON serializable, so we can't send them in the body of our request.

x_test = np.asarray(im).astype(np.float32).tolist()

Finally, call endpoint.predict.

# instances is a list with one entry per input example, so wrap the single image in a list
endpoint.predict(instances=[x_test]).predictions

The result you get is the output of the model, which is a softmax layer with 5 units. If you wanted to write custom logic to return the string label instead of the index, you can use custom prediction routines.
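
For a quick client-side alternative, the sketch below maps a 5-unit softmax output to a string label. The class names and their order are an assumption (alphabetical flower classes) and should be checked against the labels used during training. `top_label` and the example probabilities are made up for illustration.

```python
# Assumed class order; check it against the class names used in training.
CLASS_NAMES = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]


def top_label(probabilities, class_names=CLASS_NAMES):
    """Return the (label, probability) pair with the highest score."""
    if len(probabilities) != len(class_names):
        raise ValueError("expected one probability per class")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]


# Made-up softmax output for a single image, for illustration only.
example_prediction = [0.02, 0.91, 0.01, 0.04, 0.02]
label, score = top_label(example_prediction)
print(label, score)
```

Because `.predictions` holds one entry per instance, you would pass a single entry (for example the first one) to `top_label`.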

Batch predictions

There are different ways to format your data for batch prediction. For simplicity, we'll dump the NumPy data to a JSON file and save the file to Cloud Storage.

import json

with open('test-data.json', 'w') as fp:
    json.dump(x_test, fp)

!gsutil cp test-data.json gs://{YOUR_BUCKET}
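
With more than one image, a common layout is JSON Lines: one JSON-encoded instance per line of the input file. The sketch below is a minimal example of that layout using tiny stand-in images. The helper name `write_jsonl` and the file name `test-data.jsonl` are hypothetical, and the file is written to a temporary directory so the example is safe to run anywhere.

```python
import json
import os
import tempfile


def write_jsonl(instances, path):
    """Write one JSON-encoded instance per line (JSON Lines)."""
    with open(path, "w") as fp:
        for instance in instances:
            fp.write(json.dumps(instance) + "\n")


# Two tiny stand-in "images" (2x2 pixels, 3 channels); real inputs would
# come from np.asarray(im).astype(np.float32).tolist() for each image.
fake_images = [
    [[[0.0, 0.0, 0.0], [255.0, 255.0, 255.0]],
     [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]],
    [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
     [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]],
]

out_path = os.path.join(tempfile.gettempdir(), "test-data.jsonl")
write_jsonl(fake_images, out_path)

# Read the file back to confirm there is one line per instance.
with open(out_path) as fp:
    lines = fp.read().splitlines()
print(len(lines))
```

If you use a file like this instead of test-data.json, point the batch prediction source path at it.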

Next, define the model. This is similar to defining the endpoint, except that you'll need to provide the MODEL_ID instead of an ENDPOINT_ID.

my_model = aiplatform.Model("projects/{PROJECT_NUMBER}/locations/us-central1/models/{MODEL_ID}")

You can find the model ID by selecting the model name and version from the Models section of the console, and then selecting VERSION DETAILS.

model_id

Lastly, use the SDK to call a batch prediction job, passing in the Cloud Storage path where you stored the JSON file, and providing a Cloud Storage location for the prediction results to be stored.

batch_prediction_job = my_model.batch_predict(
    job_display_name='flower_batch_predict',
    gcs_source='gs://{YOUR_BUCKET}/test-data.json',
    gcs_destination_prefix='gs://{YOUR_BUCKET}/prediction-results',
    machine_type='n1-standard-4',
)

You can track the job progress in the Batch Predictions section of the console. Note that running a batch prediction job for a single image is not efficient.

batch_pred
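
Once the job finishes, the results are written under the destination prefix. The sketch below assumes the output is JSON Lines, with each line an object holding an "instance" key and a "prediction" key. Verify this against the files in your own bucket before relying on it. The function name `parse_prediction_lines` and the sample line are made up for illustration.

```python
import json


def parse_prediction_lines(lines):
    """Yield (instance, prediction) pairs from batch prediction output lines.

    Assumes each non-empty line is a JSON object with "instance" and
    "prediction" keys.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield record["instance"], record["prediction"]


# Made-up output line standing in for a downloaded results file.
sample_lines = [
    '{"instance": [[[0.0, 0.0, 0.0]]], "prediction": [0.02, 0.91, 0.01, 0.04, 0.02]}',
    "",
]
results = list(parse_prediction_lines(sample_lines))
print(results[0][1])
```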

What's next

In this example, we converted the test image to NumPy first before making the prediction call. For more realistic use cases, you'll probably want to send the image itself and not have to load it into NumPy first. To do this, you'll need to adjust your TensorFlow serving function to decode image bytes. This requires a little more work, but will be a lot more efficient for larger images and application building. You can see an example in this notebook.
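
To get a feel for the efficiency difference, the sketch below compares the request size for a nested list of floats with the size for the same pixel values packed as base64-encoded bytes. It is an illustration on synthetic data, not the lab's serving setup. The `{"b64": ...}` wrapper follows the TensorFlow serving convention for binary inputs, and the exact field names your model expects depend on how its serving function is written.

```python
import base64
import json

# Synthetic 64x64 RGB "image": rows of pixels, each pixel three floats.
HEIGHT, WIDTH = 64, 64
pixels = [
    [[float((r + c) % 256), float(r % 256), float(c % 256)] for c in range(WIDTH)]
    for r in range(HEIGHT)
]

# Approach used in this lab: nested lists of floats serialized as JSON.
json_payload = json.dumps({"instances": [pixels]})

# Alternative: pack the same values as raw bytes and base64-encode them.
raw_bytes = bytes(int(v) for row in pixels for px in row for v in px)
b64_payload = json.dumps(
    {"instances": [{"b64": base64.b64encode(raw_bytes).decode("ascii")}]}
)

# Compare the two request body sizes in characters.
print(len(json_payload), len(b64_payload))
```

In practice you would usually send the compressed image file's bytes (for example the JPEG itself), which tends to be smaller still.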

🎉 Congratulations! 🎉

You've learned how to use Vertex AI to:

Upload models to the Vertex AI Model Registry
Get batch and online predictions

To learn more about different parts of Vertex, check out the documentation.

7. Cleanup

You'll want to undeploy the models from the endpoint if you're not planning to use them. You can also delete the endpoint entirely. You can always redeploy a model to an endpoint if you need to.

undeploy_model
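
The same cleanup can be sketched with the SDK. The helper name `cleanup_endpoint` is hypothetical. It relies on the `undeploy_all` and `delete` methods of `aiplatform.Endpoint`, and it takes the endpoint object as an argument so the function itself has no Google Cloud dependency.

```python
def cleanup_endpoint(endpoint):
    """Undeploy every model from the endpoint, then delete the endpoint.

    `endpoint` is expected to be an aiplatform.Endpoint, for example:
        from google.cloud import aiplatform
        endpoint = aiplatform.Endpoint(
            endpoint_name="projects/{PROJECT_NUMBER}/locations/us-central1/endpoints/{ENDPOINT_ID}")
    """
    endpoint.undeploy_all()  # frees the compute nodes backing the endpoint
    endpoint.delete()        # removes the now-empty endpoint
```

Undeploying first matters because an endpoint that still has deployed models keeps consuming compute resources.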

Workbench managed notebooks time out automatically after 180 idle minutes, so you don't need to worry about shutting the instance down. If you would like to manually shut down the instance, click the Stop button on the Vertex AI Workbench section of the console. If you'd like to delete the notebook entirely, click the Delete button.

Stop instance

To delete the storage bucket, use the Navigation menu in your Cloud Console to browse to Storage, select your bucket, and click Delete:

Delete storage