Vertex AI: Use custom prediction routines with Sklearn to preprocess and postprocess data for predictions

1. Introduction

In this lab, you'll learn how to use custom prediction routines on Vertex AI to write custom preprocessing and postprocessing logic. While this sample uses Scikit-learn, custom prediction routines can work with other Python ML frameworks such as XGBoost, PyTorch, and TensorFlow.

What you'll learn

  • Write custom prediction logic with custom prediction routines
  • Test the custom serving container and model locally
  • Test the custom serving container on Vertex AI Predictions

2. Intro to Vertex AI

This lab uses the newest AI product offering available on Google Cloud. Vertex AI integrates the ML offerings across Google Cloud into a seamless development experience. Previously, models trained with AutoML and custom models were accessible via separate services. The new offering combines both into a single API, along with other new products. You can also migrate existing projects to Vertex AI.

Vertex AI includes many different products to support end-to-end ML workflows. This lab will focus on Predictions and Workbench.


3. Use Case Overview

In this lab, you'll build a random forest regression model to predict the price of a diamond based on attributes like cut, clarity, and size.

You'll write custom preprocessing logic to check that the data at serving time is in the format expected by the model. You'll also write custom postprocessing logic to round the predictions and convert them to strings. To write this logic, you'll use custom prediction routines.

Introduction to custom prediction routines

The Vertex AI pre-built containers handle prediction requests by performing the prediction operation of the machine learning framework. Prior to custom prediction routines, if you wanted to preprocess the input before the prediction is performed, or postprocess the model's prediction before returning the result, you would need to build a custom container.

Building a custom serving container requires writing an HTTP server that wraps the trained model, translates HTTP requests into model inputs, and translates model outputs into responses.

With custom prediction routines, Vertex AI provides the serving-related components for you, so that you can focus on your model and data transformations.
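To make the division of labor concrete, here is a minimal sketch of how one request flows through these components. The names are illustrative (this is not the actual SDK code), and `EchoPredictor` is a made-up stand-in for a real model.

```python
class EchoPredictor:
    """Hypothetical predictor standing in for a real model."""

    def preprocess(self, instances):
        # custom preprocessing: coerce raw inputs to floats
        return [float(x) for x in instances]

    def predict(self, inputs):
        # stand-in for the framework's prediction operation
        return [2 * x for x in inputs]

    def postprocess(self, results):
        # custom postprocessing: wrap results in a response payload
        return {"predictions": results}


def handle_request(predictor, request_body):
    # The request handler deserializes the HTTP body, then delegates the
    # ML logic (preprocess -> predict -> postprocess) to the predictor.
    instances = request_body["instances"]
    return predictor.postprocess(predictor.predict(predictor.preprocess(instances)))
```

With custom prediction routines, the HTTP server and request handler pieces come from the SDK; only the predictor is yours to write.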

What you'll build

You will set up a VPC network called aiml-vpc consisting of a workbench subnet, used to deploy a user-managed notebook and to access the online prediction model endpoint deployed in us-central1, as illustrated in Figure 1 below.



4. Enable tutorial APIs

Step 1: Enable the Compute Engine API

Navigate to Compute Engine and select Enable if it isn't already enabled. You'll need this to create your notebook instance.

Step 2: Enable the Artifact Registry API

Navigate to Artifact Registry and select Enable if it isn't already. You'll use this to create a custom serving container.

Step 3: Enable the Vertex AI API

Navigate to the Vertex AI section of your Cloud Console and click Enable Vertex AI API.

Step 4: Enable the Notebooks API

Enable the Notebooks API if it isn't already enabled. You'll use it later in the tutorial to create a Vertex AI Workbench instance.

5. Create the aiml-vpc

This tutorial uses $variables to aid gcloud configuration in Cloud Shell.

Inside Cloud Shell, perform the following:

gcloud config list project
gcloud config set project [YOUR-PROJECT-NAME]
projectid=YOUR-PROJECT-NAME
echo $projectid

Create the aiml-vpc

Inside Cloud Shell, perform the following:

gcloud compute networks create aiml-vpc --project=$projectid --subnet-mode=custom

Create the user-managed notebook subnet

Inside Cloud Shell, create the workbench-subnet.

gcloud compute networks subnets create workbench-subnet --project=$projectid --range= --network=aiml-vpc --region=us-central1 --enable-private-ip-google-access

Cloud Router and NAT configuration

Cloud NAT is used in the tutorial to download software packages since the user managed notebook does not have an external IP address. Cloud NAT provides egress NAT capabilities, which means that internet hosts are not allowed to initiate communication with a user-managed notebook, making it more secure.

Inside Cloud Shell, create the regional Cloud Router in us-central1.

gcloud compute routers create cloud-router-us-central1-aiml-nat --network aiml-vpc --region us-central1

Inside Cloud Shell, create the regional Cloud NAT gateway in us-central1.

gcloud compute routers nats create cloud-nat-us-central1 --router=cloud-router-us-central1-aiml-nat --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges --region us-central1

6. Create the user managed notebook

Create a user managed service account (Notebook)

In the following section, you will create a user-managed service account that will be associated with the Vertex AI Workbench (Notebook) instance used in the tutorial.

In the tutorial, the service account will be granted the following roles: Storage Admin, Vertex AI User, and Artifact Registry Admin.

Inside Cloud Shell, create the service account.

gcloud iam service-accounts create user-managed-notebook-sa

Inside Cloud Shell, update the service account with the role Storage Admin.

gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:user-managed-notebook-sa@$projectid.iam.gserviceaccount.com" --role="roles/storage.admin"

Inside Cloud Shell, update the service account with the role Vertex AI User.

gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:user-managed-notebook-sa@$projectid.iam.gserviceaccount.com" --role="roles/aiplatform.user"

Inside Cloud Shell, update the service account with the role Artifact Registry Admin.

gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:user-managed-notebook-sa@$projectid.iam.gserviceaccount.com" --role="roles/artifactregistry.admin"

Inside Cloud Shell, list the service account and note the email address that will be used when creating the user-managed notebook.

gcloud iam service-accounts list

Create the user managed Notebook

In the following section, create a user-managed notebook that incorporates the previously created service account, user-managed-notebook-sa.

Inside Cloud Shell, create the workbench-tutorial instance.

gcloud notebooks instances create workbench-tutorial \
      --vm-image-project=deeplearning-platform-release \
      --vm-image-family=common-cpu-notebooks \
      --machine-type=n1-standard-4 \
      --location=us-central1-a \
      --shielded-secure-boot \
      --subnet-region=us-central1 \
      --subnet=workbench-subnet \
      --no-public-ip \
      --service-account=user-managed-notebook-sa@$projectid.iam.gserviceaccount.com

7. Write training code

Step 1: Create a cloud storage bucket

You'll store the model and preprocessing artifacts to a Cloud Storage bucket. If you already have a bucket in your project you'd like to use, you can skip this step.

From the Launcher, open a new terminal session.


From your terminal, run the following to define an env variable for your project, making sure to replace your-cloud-project with the ID of your project:

PROJECT_ID='your-cloud-project'

Next, run the following in your Terminal to create a new bucket in your project. (The bucket name below is an example; any globally unique name works.)

BUCKET="gs://${PROJECT_ID}-bucket"
gsutil mb -l us-central1 $BUCKET

Step 2: Train model

From the terminal, create a new directory called cpr-codelab and cd into it.

mkdir cpr-codelab
cd cpr-codelab

In the file browser, navigate to the new cpr-codelab directory, and then use the launcher to create a new Python 3 notebook called task.ipynb.


Your cpr-codelab directory should now look like:

+ cpr-codelab/
    + task.ipynb

In the notebook, paste in the following code.

First, write a requirements.txt file.

%%writefile requirements.txt
numpy>=1.17.3, <1.24.0

The model you deploy will have a different set of dependencies pre-installed than your notebook environment. Because of this, you'll want to list all of the dependencies for the model in requirements.txt and then use pip to install the exact same dependencies in the notebook. Later, you'll test the model locally before deploying to Vertex AI to double check that the environments match.
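As a quick illustration of this kind of environment check, the snippet below (a sketch; the `version_tuple` helper is not part of the codelab) compares an installed numpy version against the bounds pinned in requirements.txt above:

```python
import numpy as np

def version_tuple(v: str):
    # keep only the numeric major.minor components of a version string
    return tuple(int(p) for p in v.split(".")[:2])

# the pin "numpy>=1.17.3, <1.24.0" maps roughly to these major.minor bounds
installed = version_tuple(np.__version__)
satisfies_pin = (1, 17) <= installed < (1, 24)
```

If `satisfies_pin` is False in your notebook, rerun the pip install step above and restart the kernel before continuing.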

Use pip to install the dependencies in the notebook.

!pip install -U --user -r requirements.txt

Note that you'll need to restart the kernel after the pip install completes.

Next, create the directories where you'll store the model and preprocessing artifacts.

USER_SRC_DIR = "src_dir"
!mkdir $USER_SRC_DIR
!mkdir model_artifacts

# copy the requirements to the source dir
!cp requirements.txt $USER_SRC_DIR/requirements.txt

Your cpr-codelab directory should now look like:

+ cpr-codelab/
    + model_artifacts/
    + src_dir/
        + requirements.txt
    + task.ipynb
    + requirements.txt

Now that the directory structure is set up, it's time to train a model!

First, import the libraries.

import seaborn as sns
import numpy as np
import pandas as pd

from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_transformer

import joblib
import logging

# set logging to see the docker container logs
logging.basicConfig(level=logging.INFO)

Then define the following variables. Be sure to replace PROJECT_ID with your project id and BUCKET_NAME with the bucket you created in the previous step.

REGION = "us-central1"
MODEL_ARTIFACT_DIR = "sklearn-model-artifacts"
REPOSITORY = "diamonds"
IMAGE = "sklearn-image"
MODEL_DISPLAY_NAME = "diamonds-cpr"

# Replace with your project
PROJECT_ID = "your-project-id"

# Replace with your bucket
BUCKET_NAME = "gs://your-bucket-name"

Load the data from the seaborn library and then create two data frames, one with the features and the other with the label.

data = sns.load_dataset('diamonds', cache=True, data_home=None)

label = 'price'

y_train = data['price']
x_train = data.drop(columns=['price'])

Let's take a look at the training data. You can see that each row represents a diamond.

x_train.head()


And the labels, which are the corresponding prices.

y_train.head()


Now, define a scikit-learn column transform to one-hot encode the categorical features and scale the numerical features.

column_transform = make_column_transformer(
    (preprocessing.OneHotEncoder(sparse=False), [1,2,3]),
    (preprocessing.StandardScaler(), [0,4,5,6,7,8]))

Define the random forest model

regr = RandomForestRegressor(max_depth=10, random_state=0)

Next, make a sklearn pipeline. This means that data fed to this pipeline will first be encoded/scaled and then passed to the model.

my_pipeline = make_pipeline(column_transform, regr)
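As a minimal illustration of that ordering, the sketch below (synthetic data, not the diamonds set) shows a pipeline that scales inputs before fitting a regressor, all through a single fit/predict call:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# perfectly linear toy data: y = 2x
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])

# the pipeline scales X first, then fits the regressor on the scaled values
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X, y)

# predict applies the same scaling before calling the model
pred = pipe.predict(np.array([[4.0]]))
```

Because the data is exactly linear, the pipeline recovers y = 2x and predicts 8.0 for x = 4.0.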

Fit the pipeline on the training data.

my_pipeline.fit(x_train, y_train)

Let's try the model to make sure it's working as expected. Call the predict method on the model, passing in a test sample.

my_pipeline.predict([[0.23, 'Ideal', 'E', 'SI2', 61.5, 55.0, 3.95, 3.98, 2.43]])

Now we can save the pipeline to the model_artifacts dir and copy it to the Cloud Storage bucket.

joblib.dump(my_pipeline, 'model_artifacts/model.joblib')

!gsutil cp model_artifacts/model.joblib {BUCKET_NAME}/{MODEL_ARTIFACT_DIR}/

Step 3: Save a preprocessing artifact

Next you'll create a preprocessing artifact. This artifact will be loaded in the custom container when the model server starts up. Your preprocessing artifact can be of almost any form (such as a pickle file), but in this case you'll write out a dictionary to a JSON file.

clarity_dict={"Flawless": "FL",
              "Internally Flawless": "IF",
              "Very Very Slightly Included": "VVS1",
              "Very Slightly Included": "VS2",
              "Slightly Included": "SI2",
              "Included": "I3"}

The clarity feature in our training data was always in the abbreviated form (i.e. "FL" instead of "Flawless"). At serving time, we want to check that the data for this feature is also abbreviated, because our model knows how to one hot encode "FL" but not "Flawless". You'll write this custom preprocessing logic later. For now, just save this lookup table to a JSON file and then write it to the Cloud Storage bucket.

import json
with open("model_artifacts/preprocessor.json", "w") as f:
    json.dump(clarity_dict, f)

!gsutil cp model_artifacts/preprocessor.json {BUCKET_NAME}/{MODEL_ARTIFACT_DIR}/
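The serving-time check this lookup table enables can be sketched in isolation as follows. This uses a truncated copy of the dictionary and a hypothetical helper function; the real logic lives in the predictor you'll write in the next section.

```python
# truncated copy of the lookup table saved to preprocessor.json above
clarity_lookup = {"Flawless": "FL", "Internally Flawless": "IF"}

def abbreviate_clarity(value: str) -> str:
    # values already in abbreviated form pass through untouched;
    # full names are mapped to the abbreviation the model was trained on
    if value in clarity_lookup.values():
        return value
    return clarity_lookup[value]
```

Note that an unknown value (neither a known abbreviation nor a known full name) raises a KeyError, which is one way a malformed request would surface.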

Your local cpr-codelab directory should now look like:

+ cpr-codelab/
    + model_artifacts/
        + model.joblib
        + preprocessor.json
    + src_dir/
        + requirements.txt
    + task.ipynb
    + requirements.txt

8. Build a custom serving container using the CPR model server

Now that the model has been trained and the preprocessing artifact saved, it's time to build the custom serving container. Typically, building a serving container requires writing model server code. However, with custom prediction routines, Vertex AI Predictions generates a model server and builds a custom container image for you.

A custom serving container contains the following 3 pieces of code:

  1. Model server (this will be generated automatically by the SDK and stored in src_dir/)
  • HTTP server that hosts the model
  • Responsible for setting up routes/ports/etc.
  2. Request Handler
  • Responsible for webserver aspects of handling a request, such as deserializing the request body, serializing the response, setting response headers, etc.
  • In this example, you'll use the default Handler, provided in the SDK.
  3. Predictor
  • Responsible for the ML logic for processing a prediction request.

Each of these components can be customized based on the requirements of your use case. In this example, you'll only implement the predictor.

The predictor is responsible for the ML logic for processing a prediction request, such as custom preprocessing and postprocessing. To write custom prediction logic, you'll subclass the Vertex AI Predictor interface.

This release of custom prediction routines comes with reusable XGBoost and Sklearn predictors, but if you need to use a different framework you can create your own by subclassing the base predictor.
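As a sketch of what subclassing the base predictor entails, the class below shows the four methods you'd implement. The method names follow the Vertex AI predictor interface; the bodies here are placeholders, not real framework logic.

```python
class MyFrameworkPredictor:
    """Illustrative skeleton of a custom predictor for an arbitrary framework."""

    def load(self, artifacts_uri: str) -> None:
        # download and deserialize model artifacts from artifacts_uri
        self._model = None  # placeholder: load your framework's model here

    def preprocess(self, prediction_input: dict):
        # pull the instances out of the request payload
        return prediction_input["instances"]

    def predict(self, instances):
        # placeholder: call your framework's prediction operation here
        return instances

    def postprocess(self, prediction_results) -> dict:
        # wrap raw results in the response payload
        return {"predictions": prediction_results}
```

Reusable predictors like the Sklearn one shown next implement `load` and `predict` for you, so you only override the pieces you need.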

You can see an example of the Sklearn predictor below. This is all the code you would need to write in order to build this custom model server.


In your notebook, paste in the following code to subclass the SklearnPredictor and write it to a Python file in src_dir/. Note that in this example we are only customizing the load, preprocess, and postprocess methods, and not the predict method.

%%writefile $USER_SRC_DIR/predictor.py

import joblib
import numpy as np
import json

from google.cloud import storage
from google.cloud.aiplatform.prediction.sklearn.predictor import SklearnPredictor


class CprPredictor(SklearnPredictor):

    def __init__(self):
        return

    def load(self, artifacts_uri: str) -> None:
        """Loads the sklearn pipeline and preprocessing artifact."""

        super().load(artifacts_uri)

        # open preprocessing artifact
        with open("preprocessor.json", "rb") as f:
            self._preprocessor = json.load(f)

    def preprocess(self, prediction_input: np.ndarray) -> np.ndarray:
        """Performs preprocessing by checking if clarity feature is in abbreviated form."""

        inputs = super().preprocess(prediction_input)

        for sample in inputs:
            if sample[3] not in self._preprocessor.values():
                sample[3] = self._preprocessor[sample[3]]
        return inputs

    def postprocess(self, prediction_results: np.ndarray) -> dict:
        """Performs postprocessing by rounding predictions and converting to str."""

        return {"predictions": [f"${value}" for value in np.round(prediction_results)]}

Let's take a deeper look at each of these methods.

  • The load method loads in the preprocessing artifact, which in this case is a dictionary mapping the diamond clarity values to their abbreviations.
  • The preprocess method uses that artifact to ensure that at serving time the clarity feature is in its abbreviated format. If not, it converts the full string to its abbreviation.
  • The postprocess method returns the predicted value as a string with a $ sign and rounds the value.
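A tiny worked example of that postprocessing step, on made-up raw predictions:

```python
import numpy as np

# made-up raw model outputs standing in for prediction_results
raw = np.array([326.4, 2757.6])

# round each value and format it as a dollar string, as postprocess does
response = {"predictions": [f"${value}" for value in np.round(raw)]}
# 326.4 rounds to 326.0, producing the string "$326.0"
```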

Next, use the Vertex AI Python SDK to build the image. With custom prediction routines, the Dockerfile will be generated and an image will be built for you.

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

import os

from google.cloud.aiplatform.prediction import LocalModel

from src_dir.predictor import CprPredictor  # Should be path of variable $USER_SRC_DIR

local_model = LocalModel.build_cpr_model(
    USER_SRC_DIR,
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}",
    predictor=CprPredictor,
    requirements_path=os.path.join(USER_SRC_DIR, "requirements.txt"),
)

Write a test file with two samples for prediction. One of the instances has the abbreviated clarity name, but the other needs to be converted first.

import json

sample = {"instances": [
  [0.23, 'Ideal', 'E', 'VS2', 61.5, 55.0, 3.95, 3.98, 2.43],
  [0.29, 'Premium', 'J', 'Internally Flawless', 52.5, 49.0, 4.00, 2.13, 3.11]]}

with open('instances.json', 'w') as fp:
    json.dump(sample, fp)

Test the container locally by deploying a local model.

with local_model.deploy_to_local_endpoint(
    artifact_uri = 'model_artifacts/', # local path to artifacts
) as local_endpoint:
    predict_response = local_endpoint.predict(
        request_file='instances.json',
        headers={"Content-Type": "application/json"},
    )

    health_check_response = local_endpoint.run_health_check()

You can see the prediction results with:

print(predict_response.content)

9. Deploy model to Vertex AI

Now that you've tested the container locally, it's time to push the image to Artifact Registry and upload the model to Vertex AI Model Registry.

First, create a Docker repository in Artifact Registry and configure Docker to authenticate to it.

!gcloud artifacts repositories create {REPOSITORY} --repository-format=docker \
--location=us-central1 --description="Docker repository"

!gcloud auth configure-docker {REGION}-docker.pkg.dev --quiet

Then, push the image.

local_model.push_image()

And upload the model.

model = aiplatform.Model.upload(
    local_model=local_model,
    display_name=MODEL_DISPLAY_NAME,
    artifact_uri=f"{BUCKET_NAME}/{MODEL_ARTIFACT_DIR}",
)

When the model is uploaded, you should see it in the console:

Next, deploy the model so you can use it for online predictions. Custom prediction routines work with batch prediction as well, so if your use case does not require online predictions, you do not need to deploy the model.


endpoint = model.deploy(machine_type="n1-standard-2")

Lastly, test the deployed model by getting a prediction.

endpoint.predict(instances=[[0.23, 'Ideal', 'E', 'VS2', 61.5, 55.0, 3.95, 3.98, 2.43]])

🎉 Congratulations! 🎉

You've learned how to use Vertex AI to:

  • Write custom preprocessing and postprocessing logic with custom prediction routines
  • Test the custom serving container and model locally
  • Deploy the model to Vertex AI Predictions and get online predictions

Cosmopup thinks codelabs are awesome!!


What's next?

Further reading & Videos

Reference docs