1. Introduction
This tutorial will show you how to create and assess Cloud Monitoring metrics for Online Prediction by performing baseline testing from us-west1 and us-central1 against a Prediction endpoint deployed in us-central1, using the HEY web performance tool.
What you'll build
You will set up a VPC network called aiml-vpc that consists of subnets and instances in us-west1 and us-central1, which will be used to generate traffic with HEY targeting an Online Prediction endpoint and model deployed in us-central1.
Private Service Connect and Private DNS are also incorporated in the tutorial to demonstrate how on-premises and multicloud environments can take advantage of PSC to access Google APIs.
Cloud Monitoring and Network Intelligence are used in the tutorial to validate the traffic generated from HEY to the Online Prediction endpoint. Although the steps outlined in the tutorial are deployed in a VPC, you can leverage them to deploy and obtain a baseline of the Vertex APIs from on-premises or multicloud environments. The network architecture consists of the components below:
Below are details of the use case:
- Access the Online Prediction endpoint in us-central1 from a GCE instance in us-west1 using HEY
- Verify that PSC is being used to access the Vertex API
- Generate load with HEY for 10 minutes
- Validate latency using Cloud Monitoring
- Validate inter-region latency using Network Intelligence
- Access the Online Prediction endpoint in us-central1 from a GCE instance in us-central1 using HEY
- Verify that PSC is being used to access the Vertex API
- Generate load with HEY for 10 minutes
- Validate latency using Cloud Monitoring
- Validate intra-region latency using Network Intelligence
What you'll learn
- How to establish a Private Service Connect Endpoint
- How to generate load against an Online Prediction endpoint using HEY
- How to create Vertex AI metrics using Cloud Monitoring
- How to use Network Intelligence to validate intra & inter regional latency
What you'll need
- Google Cloud Project
- IAM Permissions
2. Before you begin
Update the project to support the tutorial
This tutorial uses $variables to aid the gcloud configuration in Cloud Shell.
Inside Cloud Shell, perform the following:
gcloud config list project
gcloud config set project [YOUR-PROJECT-NAME]
projectid=YOUR-PROJECT-NAME
echo $projectid
3. aiml-vpc setup
Create the aiml-vpc
Inside Cloud Shell, perform the following:
gcloud compute networks create aiml-vpc --project=$projectid --subnet-mode=custom
Inside Cloud Shell, enable the Network Management API for Network Intelligence.
gcloud services enable networkmanagement.googleapis.com
Create the user-managed notebook subnet
Inside Cloud Shell, create the workbench-subnet.
gcloud compute networks subnets create workbench-subnet --project=$projectid --range=172.16.10.0/28 --network=aiml-vpc --region=us-central1 --enable-private-ip-google-access
Inside Cloud Shell, create the us-west1-subnet.
gcloud compute networks subnets create us-west1-subnet --project=$projectid --range=192.168.10.0/28 --network=aiml-vpc --region=us-west1
Inside Cloud Shell, create the us-central1-subnet.
gcloud compute networks subnets create us-central1-subnet --project=$projectid --range=192.168.20.0/28 --network=aiml-vpc --region=us-central1
Cloud Router and NAT configuration
Cloud NAT is used in the tutorial to download software packages because the GCE instances do not have external IP addresses. Cloud NAT provides egress-only NAT, which means that internet hosts are not allowed to initiate communication with the instances, making them more secure.
Inside Cloud Shell, create the regional cloud router, us-west1.
gcloud compute routers create cloud-router-us-west1-aiml-nat --network aiml-vpc --region us-west1
Inside Cloud Shell, create the regional cloud nat gateway, us-west1.
gcloud compute routers nats create cloud-nat-us-west1 --router=cloud-router-us-west1-aiml-nat --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges --region us-west1
Inside Cloud Shell, create the regional cloud router, us-central1.
gcloud compute routers create cloud-router-us-central1-aiml-nat --network aiml-vpc --region us-central1
Inside Cloud Shell, create the regional cloud nat gateway, us-central1.
gcloud compute routers nats create cloud-nat-us-central1 --router=cloud-router-us-central1-aiml-nat --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges --region us-central1
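Optionally, you can confirm both NAT gateways from Cloud Shell. This verification step is an addition to the flow above; the commands simply list the gateways you just created.
gcloud compute routers nats list --router=cloud-router-us-west1-aiml-nat --region=us-west1
gcloud compute routers nats list --router=cloud-router-us-central1-aiml-nat --region=us-central1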
4. Create the Private Service Connect endpoint
In the following section, you will create a Private Service Connect (PSC) endpoint that will be used to access the Vertex API from the aiml-vpc.
From Cloud Shell
gcloud compute addresses create psc-ip \
--global \
--purpose=PRIVATE_SERVICE_CONNECT \
--addresses=100.100.10.10 \
--network=aiml-vpc
Store the pscendpointip for the duration of the lab.
pscendpointip=$(gcloud compute addresses list --filter=name:psc-ip --format="value(address)")
echo $pscendpointip
Create the PSC Endpoint
From Cloud Shell
gcloud compute forwarding-rules create pscvertex \
--global \
--network=aiml-vpc \
--address=psc-ip \
--target-google-apis-bundle=all-apis
List the configured Private Service Connect endpoints
From Cloud Shell
gcloud compute forwarding-rules list \
--filter target="(all-apis OR vpc-sc)" --global
Describe the configured Private Service Connect endpoints
From Cloud Shell
gcloud compute forwarding-rules describe \
pscvertex --global
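As an optional sanity check (not a required step), you can extract just the endpoint IP from the describe output and confirm it matches the reserved address 100.100.10.10:
gcloud compute forwarding-rules describe pscvertex --global --format="value(IPAddress)"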
5. Create a service account for the GCE Instances
To provide a fine level of control to the Vertex API, a user-managed service account is required that will be applied to the west and central instances. Once generated, the service account permissions can be modified based on business requirements. In the tutorial, the user-managed service account, vertex-gce-sa (display name vertex-sa), will have the following roles applied:
- Compute Instance Admin (v1)
- Vertex AI User
You must enable the Service Account API before proceeding.
Inside Cloud Shell, create the service account.
gcloud iam service-accounts create vertex-gce-sa \
--description="service account for vertex" \
--display-name="vertex-sa"
Inside Cloud Shell, update the service account with the role Compute Instance Admin (v1).
gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:vertex-gce-sa@$projectid.iam.gserviceaccount.com" --role="roles/compute.instanceAdmin.v1"
Inside Cloud Shell, update the service account with the role Vertex AI User
gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:vertex-gce-sa@$projectid.iam.gserviceaccount.com" --role="roles/aiplatform.user"
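Optionally, verify the role bindings applied to vertex-gce-sa with a quick check (an addition to the original flow):
gcloud projects get-iam-policy $projectid \
--flatten="bindings[].members" \
--filter="bindings.members:vertex-gce-sa@$projectid.iam.gserviceaccount.com" \
--format="value(bindings.role)"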
6. Create a user managed service account (Notebook)
In the following section, you will create a user managed service account that will be associated with the Vertex Workbench (Notebook) used in the tutorial.
In the tutorial, the service account will have the following roles applied:
- Storage Admin
- Vertex AI User
- Artifact Registry Admin
Inside Cloud Shell, create the service account.
gcloud iam service-accounts create user-managed-notebook-sa \
--display-name="user-managed-notebook-sa"
Inside Cloud Shell, update the service account with the role Storage Admin.
gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:user-managed-notebook-sa@$projectid.iam.gserviceaccount.com" --role="roles/storage.admin"
Inside Cloud Shell, update the service account with the role Vertex AI User.
gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:user-managed-notebook-sa@$projectid.iam.gserviceaccount.com" --role="roles/aiplatform.user"
Inside Cloud Shell, update the service account with the role Artifact Registry Admin.
gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:user-managed-notebook-sa@$projectid.iam.gserviceaccount.com" --role="roles/artifactregistry.admin"
Inside Cloud Shell, list the service account and note the email address that will be used when creating the user-managed notebook.
gcloud iam service-accounts list
7. Create the test instances
In the following section, you will create test instances to perform baseline testing from us-west1 and us-central1.
Inside Cloud Shell, create the west-client.
gcloud compute instances create west-client \
--zone=us-west1-a \
--image-family=debian-11 \
--image-project=debian-cloud \
--subnet=us-west1-subnet \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--no-address \
--shielded-secure-boot --service-account=vertex-gce-sa@$projectid.iam.gserviceaccount.com \
--metadata startup-script="#! /bin/bash
sudo apt-get update
sudo apt-get install tcpdump dnsutils -y"
Inside Cloud Shell, create the central-client.
gcloud compute instances create central-client \
--zone=us-central1-a \
--image-family=debian-11 \
--image-project=debian-cloud \
--subnet=us-central1-subnet \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--no-address \
--shielded-secure-boot --service-account=vertex-gce-sa@$projectid.iam.gserviceaccount.com \
--metadata startup-script="#! /bin/bash
sudo apt-get update
sudo apt-get install tcpdump dnsutils -y"
To allow IAP to connect to your VM instances, create a firewall rule that:
- Applies to all VM instances that you want to be accessible by using IAP.
- Allows ingress traffic from the IP range 35.235.240.0/20. This range contains all IP addresses that IAP uses for TCP forwarding.
Inside Cloud Shell, create the IAP firewall rule.
gcloud compute firewall-rules create ssh-iap-vpc \
--network aiml-vpc \
--allow tcp:22 \
--source-ranges=35.235.240.0/20
8. Create a user managed notebook
In the following section, create a user-managed notebook that incorporates the previously created service account, user-managed-notebook-sa.
Inside Cloud Shell, create the workbench-tutorial instance.
gcloud notebooks instances create workbench-tutorial \
--vm-image-project=deeplearning-platform-release \
--vm-image-family=common-cpu-notebooks \
--machine-type=n1-standard-4 \
--location=us-central1-a \
--subnet-region=us-central1 \
--shielded-secure-boot \
--subnet=workbench-subnet \
--no-public-ip --service-account=user-managed-notebook-sa@$projectid.iam.gserviceaccount.com
Navigate to Vertex AI → Workbench to view your deployed notebook.
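You can also confirm the notebook from Cloud Shell (an optional check):
gcloud notebooks instances list --location=us-central1-a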
9. Deploy the Model and Online Prediction
In the following section, use the provided codelab, Vertex AI: Use custom prediction routines with Sklearn to preprocess and post process data for predictions, starting at Section 7, since you already created a notebook in the previous step. Once the model is deployed, return to this tutorial to start the next section.
10. Create a custom monitoring dashboard for Online Prediction
Online Prediction creates a default monitoring dashboard under VERTEX AI → ONLINE PREDICTION → ENDPOINT NAME (diamonds-cpr_endpoint). However, for our testing we need to define a start and stop time, therefore a custom dashboard is required.
In the following section, you will create Cloud Monitoring metrics to obtain latency measurements based on regional access to the Online Prediction endpoint, validating the difference in latency when accessing an endpoint in us-central1 from GCE instances deployed in us-west1 and us-central1.
For the tutorial, we will use the prediction_latencies metric; additional metrics are available in aiplatform.
Metric | Description |
prediction/online/prediction_latencies | Online prediction latency of the deployed model. |
Create a chart for prediction_latencies Metric
From the Cloud Console, navigate to MONITORING → Metrics Explorer.
Insert the metric prediction/online/prediction_latencies, select the following options, then select Apply.
Update Group By based on the following option, then select Save Chart.
Select Save; you will be prompted to select a dashboard. Select New Dashboard and provide a name.
Vertex Custom Dashboard
In the following section, validate that the Vertex Custom Dashboard is displaying the correct time.
Navigate to MONITORING → Dashboard, select Vertex Custom Dashboard, then select the time range. Ensure your time zone is correct.
Be sure to expand the legend to obtain a table view.
Example expanded view:
11. Create Private DNS for the PSC Endpoint
Create a Private DNS Zone in the aiml-vpc to resolve all googleapis to the PSC endpoint IP Address 100.100.10.10.
From Cloud Shell, create a private DNS Zone.
gcloud dns --project=$projectid managed-zones create psc-googleapis --description="Private Zone to resolve googleapis to a PSC endpoint" --dns-name="googleapis.com." --visibility="private" --networks="https://www.googleapis.com/compute/v1/projects/$projectid/global/networks/aiml-vpc"
From Cloud Shell, create the A record that associates *.googleapis.com to the PSC IP.
gcloud dns --project=$projectid record-sets create *.googleapis.com. --zone="psc-googleapis" --type="A" --ttl="300" --rrdatas="100.100.10.10"
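To verify resolution, optionally run the following from either test client once logged in (dnsutils is installed by the startup script); the answer should be the PSC endpoint IP, 100.100.10.10:
dig +short us-central1-aiplatform.googleapis.com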
12. Hey testing variables
HEY offers end users the ability to customize testing based on network and application requirements. For the purpose of the tutorial we will use the options detailed below, with a sample execution string:
-c == 1 worker
-z == Duration
-m == HTTP method POST
-D == HTTP request body from file, instances.json
-n == Number of requests to run (default 200; ignored when -z is specified)
Example curl string with HEY (execution not required)
user@us-central$ ./hey_linux_amd64 -c 1 -z 1m -m POST -D instances.json -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/${projectid}/locations/us-central1/endpoints/${ENDPOINT_ID}:predict
13. Obtain the Prediction ID
Obtain your Online Prediction endpoint ID from the Cloud Console; it will be used in the subsequent steps.
Navigate to VERTEX AI → ONLINE PREDICTION
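Alternatively, the endpoint ID can be retrieved from Cloud Shell with the gcloud CLI (an optional approach; output formatting may vary):
gcloud ai endpoints list --region=us-central1 --project=$projectid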
14. Download and execute HEY (us-west1)
In the following section, you will log into the west-client to download and execute HEY against the Online Prediction located in us-central1.
From Cloud Shell, log into the west-client and download HEY
gcloud compute ssh west-client --project=$projectid --zone=us-west1-a --tunnel-through-iap
From the OS, download HEY and update the permissions.
wget https://hey-release.s3.us-east-2.amazonaws.com/hey_linux_amd64
chmod +x hey_linux_amd64
From the OS, create the following variables:
gcloud config list project
gcloud config set project [YOUR-PROJECT-NAME]
projectid=YOUR-PROJECT-NAME
echo $projectid
ENDPOINT_ID="insert-your-endpoint-id-here"
Example:
ENDPOINT_ID="2706243362607857664"
In the following section, you will create an instances.json file using the vi editor or nano and insert the data string used to obtain a prediction from the deployed model.
From the west-client OS, create an instances.json file with the data string below:
{"instances": [
[0.23, 'Ideal', 'E', 'VS2', 61.5, 55.0, 3.95, 3.98, 2.43],
[0.29, 'Premium', 'J', 'Internally Flawless', 52.5, 49.0, 4.00, 2.13, 3.11]]}
Example:
user@west-client:$ more instances.json
{"instances": [
[0.23, 'Ideal', 'E', 'VS2', 61.5, 55.0, 3.95, 3.98, 2.43],
[0.29, 'Premium', 'J', 'Internally Flawless', 52.5, 49.0, 4.00, 2.13, 3.11]]}
user@west-client:$
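If you prefer not to use vi or nano, a minimal sketch using a heredoc produces the same file (assuming the exact payload shown above):
cat > instances.json <<'EOF'
{"instances": [
[0.23, 'Ideal', 'E', 'VS2', 61.5, 55.0, 3.95, 3.98, 2.43],
[0.29, 'Premium', 'J', 'Internally Flawless', 52.5, 49.0, 4.00, 2.13, 3.11]]}
EOF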
Pre-test
From the OS, execute a curl to validate that the model and prediction endpoint are working successfully. Note the PSC endpoint IP in the verbose log and HTTP/2 200 indicating success.
curl -v -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/${projectid}/locations/us-central1/endpoints/${ENDPOINT_ID}:predict -d @instances.json
Example; note the PSC IP address used to access prediction and the successful outcome.
user@west-client:$ curl -v -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/${projectid}/locations/us-central1/endpoints/${ENDPOINT_ID}:predict -d @instances.json
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 100.100.10.10:443...
* Connected to us-central1-aiplatform.googleapis.com (100.100.10.10) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=upload.video.google.com
* start date: Jul 31 08:22:19 2023 GMT
* expire date: Oct 23 08:22:18 2023 GMT
* subjectAltName: host "us-central1-aiplatform.googleapis.com" matched cert's "*.googleapis.com"
* issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1C3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55a9f38b42c0)
> POST /v1/projects/new-test-project-396322/locations/us-central1/endpoints/2706243362607857664:predict HTTP/2
> Host: us-central1-aiplatform.googleapis.com
> user-agent: curl/7.74.0
> accept: */*
> authorization: Bearer ya29.c.b0Aaekm1LqrcaOlWFFwuEOWX_tZVXXvJgN_K-u5_hFyEAYXAi3AnBEBwwtHS8dweW_P2QGfdyFfa31nMT_6BaKBI0mC9IsfzfIiUwXc8u2yJt01gTUSJpCmGAFKZKidRMgkPYivVYCnuymzdYbRAWacIe__StkRzI9UeQOGN3jNIeESr80AdH12goaxCFXWaNWxoYRfGVhekEgUcsKs7t1OhOM-937gy4YGkXcXa8sGuHWRqF5bnulYlTqlxqQ2aAxMTrQg2lwUWRGCmGhPrym7rXJq7oim0DkAJSbAarl1qFuz0PPfNXeHGbs13zY2r1giV7u8_w4Umj_Q5M7H9fTkq7EiqnLzqRkOHXismYL368P1jOUBYM__krFQt4M3X9RJa0g01tOw3FnOh27BmUqlFQ1J2h14JZpx215Q3xzRvgfJ5iW5YYSkv67uZRQk4V04naOUXyc0plzWuVOjj4nor3fYvkS_oW0IyxJoBjeXR16Vnvln8c04svWX9dt7eobczFvBOm9nVdh4lVp8qxbp__2WtMvc1QVg6y-2i6lRpbvmyp1oadxVRjxV1e0wiQFSe-qqsinJu3bnnaMbxdU2cu5j26o8o8Xpgo0SF1UM0b1WX84iatbWpdFSphZm1llwmRagMzcFBW0aBk-i35_bXSbzwURgMfY6Qbyb9Rv9y0F-Maf34I0WxiMldv2uc57nej7dVl9OSm_Ohnro-i9zcpq9fxo9soYVB8WjaZOUjauk4znstc2_6y4atcVVsQBkeU674biR567Ri3M74Jfv4MrrF02ObfrJRdB7UJ4MU_9kWW-kYeeJzoci15UqYV0f_yJgReBwQa66Supmebee2Sn2nku6xZkRMu5Mz55mXuva0XWrpIbor7WckSsXwUFbf7rj5ipa4mOOyf2hJe1Rq0x6yeBaariRzXrhfm5bBpFBU73-zd-IekvOji0ZJQSkk0o6gpX_794Jny7j14aQJ8VxezcFpZUztimYhMnRhlO2lqms1h0h48
> content-type: application/json
> content-length: 158
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
* We are completely uploaded and fine
< HTTP/2 200
< x-vertex-ai-internal-prediction-backend: harpoon
< content-type: application/json; charset=UTF-8
< date: Sun, 20 Aug 2023 03:51:54 GMT
< vary: X-Origin
< vary: Referer
< vary: Origin,Accept-Encoding
< server: scaffolding on HTTPServer2
< cache-control: private
< x-xss-protection: 0
< x-frame-options: SAMEORIGIN
< x-content-type-options: nosniff
< accept-ranges: none
<
{
"predictions": [
"$479.0",
"$586.0"
],
"deployedModelId": "3587550310781943808",
"model": "projects/884291964428/locations/us-central1/models/6829574694488768512",
"modelDisplayName": "diamonds-cpr",
"modelVersionId": "1"
}
* Connection #0 to host us-central1-aiplatform.googleapis.com left intact
Execute HEY
From the OS, execute HEY to run a 10 minute baseline test.
./hey_linux_amd64 -c 1 -z 10m -m POST -D instances.json -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/$projectid/locations/us-central1/endpoints/${ENDPOINT_ID}:predict
15. Hey Validation (us-west1)
Now that you've executed HEY from a compute instance in us-west1, evaluate the results of the following:
- HEY results
- Vertex Custom Dashboard
- Network Intelligence
HEY results
From the OS, let's validate the HEY results based on the 10 minute execution:
- 17.5826 requests per second
- 99% in 0.0686 secs | 68 ms
- 10,550 responses with a 200 status code
user@west-client:$ ./hey_linux_amd64 -c 1 -z 10m -m POST -D instances.json -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/$projectid/locations/us-central1/endpoints/${ENDPOINT_ID}:predict
Summary:
Total: 600.0243 secs
Slowest: 0.3039 secs
Fastest: 0.0527 secs
Average: 0.0569 secs
Requests/sec: 17.5826
Response time histogram:
0.053 [1] |
0.078 [10514] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.103 [16] |
0.128 [4] |
0.153 [3] |
0.178 [1] |
0.203 [0] |
0.229 [2] |
0.254 [1] |
0.279 [5] |
0.304 [3] |
Latency distribution:
10% in 0.0546 secs
25% in 0.0551 secs
50% in 0.0559 secs
75% in 0.0571 secs
90% in 0.0596 secs
95% in 0.0613 secs
99% in 0.0686 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0527 secs, 0.3039 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0116 secs
req write: 0.0000 secs, 0.0000 secs, 0.0002 secs
resp wait: 0.0567 secs, 0.0526 secs, 0.3038 secs
resp read: 0.0001 secs, 0.0001 secs, 0.0696 secs
Status code distribution:
[200] 10550 responses
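As a sanity check, the summary is internally consistent: 17.5826 requests/sec × 600.0243 secs ≈ 10,550 requests, matching the 200 status code count.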
Vertex Custom Dashboard
Navigate to MONITORING → Dashboard and select Vertex Custom Dashboard. Enter 10m or specify your start and stop time. Ensure your time zone is correct.
The definition of Prediction Latencies indicates it is a server-side metric that measures the total time to respond to the client's request, including the time to obtain a response from the model.
- Total latency duration: The total time that a request spends in the service, which is the model latency plus the overhead latency.
In contrast, HEY is a client-side metric that takes into account the following parameters:
Client request + Total latency (includes model latency) + Client response
Network Intelligence
Let's now take a look at the inter-region network latency reported by Network Intelligence to get an idea of us-west1 to us-central1 latency as reported by Google Cloud.
Navigate to Cloud Console → Network Intelligence → Performance Dashboard and select the options detailed in the screenshot below, indicating latency of 32 to 39 ms.
HEY us-west1 baseline summary
Comparing the total latency reported by the test tools yields approximately the same latency reported by HEY. Inter-region latency contributes the bulk of the latency. Let's see how the central-client performs in the next series of tests.
Latency Tool | Duration |
Network Intelligence: us-west1 to us-central1 latency | ~32 to 39 ms |
Cloud Monitoring: Total prediction latency [99th%] | 34.58 ms (99p) |
Total latency reported by Google | ~66.58 to 73.58 ms |
HEY client-side latency distribution | 68 ms (99p) |
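Here, the "Total latency reported by Google" row is the sum of the Cloud Monitoring 99p server-side latency (34.58 ms) and the Network Intelligence inter-region range (32 to 39 ms), which lines up closely with HEY's client-side 68 ms at the 99th percentile.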
16. Download and execute HEY (us-central1)
In the following section, you will log into the central-client to download and execute HEY against the Online Prediction located in us-central1.
From Cloud Shell, log into the central-client and download HEY
gcloud compute ssh central-client --project=$projectid --zone=us-central1-a --tunnel-through-iap
From the OS, download HEY and update the permissions.
wget https://hey-release.s3.us-east-2.amazonaws.com/hey_linux_amd64
chmod +x hey_linux_amd64
From the OS, create the following variables:
gcloud config list project
gcloud config set project [YOUR-PROJECT-NAME]
projectid=YOUR-PROJECT-NAME
echo $projectid
ENDPOINT_ID="insert-your-endpoint-id-here"
Example:
ENDPOINT_ID="2706243362607857664"
In the following section, you will create an instances.json file using the vi editor or nano and insert the data string used to obtain a prediction from the deployed model.
From the central-client OS, create an instances.json file with the data string below:
{"instances": [
[0.23, 'Ideal', 'E', 'VS2', 61.5, 55.0, 3.95, 3.98, 2.43],
[0.29, 'Premium', 'J', 'Internally Flawless', 52.5, 49.0, 4.00, 2.13, 3.11]]}
Example:
user@central-client:$ more instances.json
{"instances": [
[0.23, 'Ideal', 'E', 'VS2', 61.5, 55.0, 3.95, 3.98, 2.43],
[0.29, 'Premium', 'J', 'Internally Flawless', 52.5, 49.0, 4.00, 2.13, 3.11]]}
user@central-client:$
Pre-test
From the OS, execute a curl to validate that the model and prediction endpoint are working successfully. Note the PSC endpoint IP in the verbose log and HTTP/2 200 indicating success.
curl -v -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/${projectid}/locations/us-central1/endpoints/${ENDPOINT_ID}:predict -d @instances.json
Example; note the PSC IP address used to access prediction and the successful outcome.
user@central-client:~$ curl -v -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/${projectid}/locations/us-central1/endpoints/${ENDPOINT_ID}:predict -d @instances.json
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 100.100.10.10:443...
* Connected to us-central1-aiplatform.googleapis.com (100.100.10.10) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/ssl/certs/ca-certificates.crt
* CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=upload.video.google.com
* start date: Jul 31 08:22:19 2023 GMT
* expire date: Oct 23 08:22:18 2023 GMT
* subjectAltName: host "us-central1-aiplatform.googleapis.com" matched cert's "*.googleapis.com"
* issuer: C=US; O=Google Trust Services LLC; CN=GTS CA 1C3
* SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x559b57adc2c0)
> POST /v1/projects/new-test-project-396322/locations/us-central1/endpoints/2706243362607857664:predict HTTP/2
> Host: us-central1-aiplatform.googleapis.com
> user-agent: curl/7.74.0
> accept: */*
> authorization: Bearer ya29.c.b0Aaekm1KWqq-CIXuL6f1cx9d9jHHquQq9tlSV1oVZ1y3TACi82JFFZRwsagVY7MMovycsU4PLkt9MDMkNngxZE5RzXcS-AoaUaQf1tPT9-_JMTlFI6wCcR7Yr9MeRF5AZblr_k52ZZgEZKeYGcrXoGiqGQcAAwFtHiEVAkUhLuyukteXbMoep1JM9E0zFblJj7Z0yOCMJYBH-6XHcIDYnOKpStMVBR2wcTDbnFrCE08HXbvRnQVcENatTBoI9FzSVL1ORwqUiCcdfnTSjpIXcyD-W82d6ZHjGX_RUhfnH7RPfOJqkuU8pOovwoCjq_jvM_wJUfPuQnBKHp5rxbYxPE349DMBql62po2SWFguuFo-a2eoUnb8-FQeBZqan65zgV0lexR73gZlm071y9grlXv3fmJUo7vlj5W-7_-FJXaWWg8iWc6rmjYeO1Wz2h_8qnmojkX9xSUciI6JfmwdgMWwtvwJb63ppSmdwf8oagrYiQlpMzgRI6rekbRzg-1WOBeOf5nRg5vtxUMSc9iRaoarO5XwFX8vt7rxOUBvbXYVWmo3bsdhzsS9VopMwgMlxgcIJg7bq7_F3iapB-nRjfjfhZWpR83cWIkI2Wb9f89inpsxtYjZbbzdWkZvRB8FYSsY8F8tcpiVoWWyQWZiph9z7O59fF9irWY2gtUnbFcJJ_ZcYztjlMQaR45y42ZflkM3Qn668bzge3Y3hmVI1s6ZSmxxq6m27hoMwVn21R07Y613jwljmaFJ5V8MwkR6yvFhYngrh_JrhRUQtSSMh02Rz25wMfv7g8Fiqymr-12viM4btIFjXZBM3XFqzvso_rw1omI1yYWofmbaBYggpegpJBzSeqVUZe791agjVtiMUkyjXFy__9gI0Qk9ZUarI4p25SvS4I1hX4YyBk6ol32Z5zIsVr1Seff__aklm6M2Mlkumd7nurm46hjOIoOhFpfFxrQ6yivnhYapBOJMYirgbZvigvI3dom1fnmt0-ktmRxp69w7Uzzy
> content-type: application/json
> content-length: 158
>
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
* We are completely uploaded and fine
< HTTP/2 200
< x-vertex-ai-internal-prediction-backend: harpoon
< date: Sun, 20 Aug 2023 22:25:31 GMT
< content-type: application/json; charset=UTF-8
< vary: X-Origin
< vary: Referer
< vary: Origin,Accept-Encoding
< server: scaffolding on HTTPServer2
< cache-control: private
< x-xss-protection: 0
< x-frame-options: SAMEORIGIN
< x-content-type-options: nosniff
< accept-ranges: none
<
{
"predictions": [
"$479.0",
"$586.0"
],
"deployedModelId": "3587550310781943808",
"model": "projects/884291964428/locations/us-central1/models/6829574694488768512",
"modelDisplayName": "diamonds-cpr",
"modelVersionId": "1"
}
* Connection #0 to host us-central1-aiplatform.googleapis.com left intact
Execute HEY
From the OS, execute HEY to run a 10 minute baseline test.
./hey_linux_amd64 -c 1 -z 10m -m POST -D instances.json -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/$projectid/locations/us-central1/endpoints/${ENDPOINT_ID}:predict
17. Hey Validation (us-central1)
Now that you've executed HEY from a compute instance in us-central1, evaluate the results of the following:
- HEY results
- Vertex Custom Dashboard
- Network Intelligence
HEY results
From the OS, let's validate the HEY results based on the 10 minute execution:
- 44.9408 requests per second
- 99% in 0.0353 secs | 35 ms
- 26,965 responses with a 200 status code
devops_user_1_deepakmichael_alto@central-client:~$ ./hey_linux_amd64 -c 1 -z 10m -m POST -D instances.json -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://us-central1-aiplatform.googleapis.com/v1/projects/$projectid/locations/us-central1/endpoints/${ENDPOINT_ID}:predict
Summary:
Total: 600.0113 secs
Slowest: 0.3673 secs
Fastest: 0.0184 secs
Average: 0.0222 secs
Requests/sec: 44.9408
Response time histogram:
0.018 [1] |
0.053 [26923] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
0.088 [25] |
0.123 [4] |
0.158 [0] |
0.193 [1] |
0.228 [9] |
0.263 [1] |
0.298 [0] |
0.332 [0] |
0.367 [1] |
Latency distribution:
10% in 0.0199 secs
25% in 0.0205 secs
50% in 0.0213 secs
75% in 0.0226 secs
90% in 0.0253 secs
95% in 0.0273 secs
99% in 0.0353 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 0.0184 secs, 0.3673 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0079 secs
req write: 0.0000 secs, 0.0000 secs, 0.0007 secs
resp wait: 0.0220 secs, 0.0182 secs, 0.3672 secs
resp read: 0.0002 secs, 0.0001 secs, 0.0046 secs
Status code distribution:
[200] 26965 responses
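As before, a quick consistency check: 44.9408 requests/sec × 600.0113 secs ≈ 26,965 requests, matching the 200 status code count.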
Vertex Custom Dashboard
Navigate to MONITORING → Dashboard and select Vertex Custom Dashboard. Enter 10m or specify your start and stop time. Ensure your time zone is correct.
Prediction Latencies for the last 10m yields 30.533 ms.
The definition of Prediction Latencies indicates it is a server-side metric that measures the total time to respond to the client's request, including the time to obtain a response from the model.
- Total latency duration: The total time that a request spends in the service, which is the model latency plus the overhead latency.
In contrast, HEY is a client-side metric that takes into account the following parameters:
Client request + Total latency (includes model latency) + Client response
Network Intelligence
Let's now take a look at the intra-region network latency reported by Network Intelligence to get an idea of us-central1 latency as reported by Google Cloud.
Navigate to Cloud Console → Network Intelligence → Performance Dashboard and select the options detailed in the screenshot below, indicating latency of 0.2 to 0.8 ms.
HEY us-central1 baseline summary
Comparing the total latency reported by the test tools yields lower latency than the west-client because the compute (central-client) and the Vertex endpoints (model and online prediction) reside in the same region.
Latency Tool | Duration |
Network Intelligence: us-central1 intra-region latency | ~0.2 to 0.8 ms |
Cloud Monitoring: Total prediction latency [99th%] | 30.533 ms (99p) |
Total latency reported by Google | ~30.733 to 31.333 ms |
HEY client-side latency | 35 ms (99p) |
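Here, the "Total latency reported by Google" row sums the Cloud Monitoring 99p latency (30.533 ms) with the intra-region range (0.2 to 0.8 ms); HEY's client-side 35 ms (99p) is slightly higher, as expected, since it also includes client request and response handling.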
18. Congratulations
Congratulations, you've successfully deployed and validated HEY to obtain client-side prediction baseline latency using a combination of Cloud Monitoring and Network Intelligence. Based on the testing, you were able to identify that a prediction endpoint in us-central1 can be served inter-region, although additional latency was observed.
Cosmopup thinks tutorials are awesome!!
19. Clean up
From Cloud Shell, delete tutorial components.
gcloud compute instances delete central-client --zone=us-central1-a -q
gcloud compute instances delete west-client --zone=us-west1-a -q
gcloud compute instances delete workbench-tutorial --zone=us-central1-a -q
gcloud compute forwarding-rules delete pscvertex --global --quiet
gcloud compute addresses delete psc-ip --global --quiet
gcloud compute networks subnets delete workbench-subnet --region=us-central1 --quiet
gcloud compute networks subnets delete us-west1-subnet --region=us-west1 --quiet
gcloud compute networks subnets delete us-central1-subnet --region=us-central1 --quiet
gcloud compute routers delete cloud-router-us-west1-aiml-nat --region=us-west1 --quiet
gcloud compute routers delete cloud-router-us-central1-aiml-nat --region=us-central1 --quiet
gcloud compute firewall-rules delete ssh-iap-vpc --quiet
gcloud dns record-sets delete *.googleapis.com. --zone=psc-googleapis --type=A --quiet
gcloud dns managed-zones delete psc-googleapis --quiet
gcloud compute networks delete aiml-vpc --quiet
gcloud storage rm -r gs://$projectid-cpr-bucket
From the Cloud Console, delete the following:
- Artifact Registry folder
- From Vertex AI Model Registry, undeploy the model
- From Vertex AI Online Prediction, delete the endpoint
What's next?
Check out some of these tutorials...
Further reading & Videos
- Private Service Connect overview
- What is Private Service Connect?
- How to get predictions from an ML model
- What is Vertex AI?