Deploy Model Garden on a PSC endpoint

1. Introduction

Leverage Private Service Connect to establish secure, private access for models deployed from the Vertex AI Model Garden. Instead of exposing a public endpoint, this method allows you to deploy your model to a private Vertex AI endpoint accessible only within your Virtual Private Cloud (VPC).

Private Service Connect creates an endpoint with an internal IP address inside your VPC, which connects directly to the Google-managed Vertex AI service hosting your model. This enables applications in your VPC and on-premises environments (via Cloud VPN or Interconnect) to send inference requests using private IPs. All network traffic remains on Google's network, which enhances security, reduces latency, and completely isolates your model's serving endpoint from the public internet.

What you'll build

In this tutorial, you will deploy Gemma 3 from Model Garden to Vertex AI Online Inference as a private endpoint accessible via Private Service Connect. Your end-to-end setup will include:

  1. Model Garden Model: You will select Gemma 3 from the Vertex AI Model Garden and deploy it to a Private Service Connect endpoint.
  2. Private Service Connect: You will configure a consumer endpoint in your Virtual Private Cloud (VPC) consisting of an internal IP address within your own network.
  3. Secure Connection to Vertex AI: The PSC endpoint will target the Service Attachment automatically generated by Vertex AI for your private model deployment. This establishes a private connection, ensuring traffic between your VPC and the model serving endpoint does not traverse the public internet.
  4. Client Configuration within your VPC: You will set up a client (e.g., Compute Engine VM) within your VPC to send inference requests to the deployed model using the internal IP address of the PSC endpoint.

By the end, you'll have a functional example of a Model Garden model being served privately, only accessible from within your designated VPC network.

What you'll learn

In this tutorial, you will learn how to deploy a model from Vertex AI Model Garden and make it securely accessible from your Virtual Private Cloud (VPC) using Private Service Connect (PSC). This method allows your applications within your VPC (the consumer) to privately connect to the Vertex AI model endpoint (the producer service) without traversing the public internet.

Specifically, you will learn:

  1. Understanding PSC for Vertex AI: How PSC enables private and secure consumer-to-producer connections. Your VPC can access the deployed Model Garden model using internal IP addresses.
  2. Deploying a Model with Private Access: How to configure a Vertex AI Endpoint for your Model Garden model to use PSC, making it a private endpoint.
  3. The Role of the Service Attachment: When you deploy a model to a private Vertex AI Endpoint, Google Cloud automatically creates a Service Attachment in a Google-managed tenant project. This Service Attachment exposes the model serving service to consumer networks.
  4. Creating a PSC Endpoint in Your VPC:
     • How to obtain the unique Service Attachment URI from your deployed Vertex AI Endpoint details.
     • How to reserve an internal IP address within your chosen subnet in your VPC.
     • How to create a Forwarding Rule in your VPC that acts as the PSC Endpoint, targeting the Vertex AI Service Attachment. This endpoint makes the model accessible via the reserved internal IP.
  5. Establishing Private Connectivity: How the PSC Endpoint in your VPC connects to the Service Attachment, bridging your network with the Vertex AI service securely.
  6. Sending Inference Requests Privately: How to send prediction requests from resources (like Compute Engine VMs) within your VPC to the internal IP address of the PSC Endpoint.
  7. Validation: Steps to test and confirm that you can successfully send inference requests from your VPC to the deployed Model Garden model through the private connection.

By completing this, you'll be able to host models from Model Garden that are only reachable from your private network infrastructure.

What you'll need

Google Cloud Project

IAM Permissions

2. Before you begin

Update the project to support the tutorial

This tutorial uses shell variables (for example, $projectid) to simplify the gcloud commands that you run in Cloud Shell.

Inside Cloud Shell, perform the following:

gcloud config list project
gcloud config set project [YOUR-PROJECT-ID]
projectid=[YOUR-PROJECT-ID]
echo $projectid
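Shell variables exist only in the current shell session, so a new Cloud Shell session starts without $projectid. The following is a minimal sketch of a guard you could run before later steps; require_var is a hypothetical helper introduced here for illustration, and my-project-id is a placeholder value.

```shell
# Hypothetical helper (not part of gcloud): fail early when a required
# shell variable is empty or unset.
require_var() {
    # $1 is the variable NAME; eval reads its current value.
    eval "_value=\${$1:-}"
    if [ -z "$_value" ]; then
        echo "ERROR: \$$1 is not set" >&2
        return 1
    fi
    echo "$1=$_value"
}

projectid="my-project-id"   # placeholder; use your own project ID
require_var projectid
```

If the variable is missing, the function returns a non-zero status, so a call such as "require_var projectid || exit 1" would stop a script before any gcloud command runs with an empty project.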

API Enablement

Inside Cloud Shell, perform the following:

gcloud services enable "compute.googleapis.com"
gcloud services enable "aiplatform.googleapis.com"
gcloud services enable "serviceusage.googleapis.com"
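To confirm that all three APIs are enabled before you continue, a check along these lines may help. This is a sketch under stated assumptions: missing_apis is a hypothetical helper, and the sample list stands in for real output from your project.

```shell
# Hypothetical helper: print each required API that is absent from the list
# of enabled services read from stdin (one service name per line).
missing_apis() {
    enabled=$(cat)
    for api in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string.
        printf '%s\n' "$enabled" | grep -qxF "$api" || echo "$api"
    done
}

# Sample input for illustration only; it is not output from a real project.
sample_enabled='compute.googleapis.com
aiplatform.googleapis.com'

printf '%s\n' "$sample_enabled" | missing_apis \
    compute.googleapis.com aiplatform.googleapis.com serviceusage.googleapis.com
```

In Cloud Shell, the input could instead come from gcloud services list --enabled --format="value(config.name)" piped into the function. Empty output would mean that nothing is missing.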

3. Deploy Model

Follow the steps below to deploy your model from Model Garden.

In the Google Cloud console, go to Model Garden, then search for and select Gemma 3.

Click Deploy options and select Vertex AI

In the Deploy on Vertex AI pane, select Advanced. The pre-populated region and Machine spec are selected based on available capacity. You may change these values, although the codelab is tailored for us-central1.

In the Deploy on Vertex AI pane, ensure Endpoint Access is configured as Private Service Connect then select your Project.

Leave the defaults for all other options, then select Deploy at the bottom and check your notifications for the deployment status.

In Model Garden, select the region (us-central1) that hosts the Gemma 3 model and endpoint. Model deployment takes approximately 5 minutes.

After roughly 30 minutes, the endpoint transitions to "Active" once deployment completes.

Obtain and note the Endpoint ID by selecting the endpoint.

Open Cloud Shell and perform the following to obtain the Private Service Connect Service Attachment URI. The consumer uses this URI string when deploying a PSC consumer endpoint.

Inside Cloud Shell, update the Endpoint ID, then issue the following command.

gcloud ai endpoints describe [Endpoint ID] --region=us-central1  | grep -i serviceAttachment:

Below is an example:

user@cloudshell:$ gcloud ai endpoints describe 2124795225560842240 --region=us-central1 | grep -i serviceAttachment:

Using endpoint [https://us-central1-aiplatform.googleapis.com/]
    serviceAttachment: projects/o9457b320a852208e-tp/regions/us-central1/serviceAttachments/gkedpm-52065579567eaf39bfe24f25f7981d

Copy the value after serviceAttachment: into a variable called "Service_attachment". You will need it later when creating the PSC connection.

user@cloudshell:$ Service_attachment=projects/o9457b320a852208e-tp/regions/us-central1/serviceAttachments/gkedpm-52065579567eaf39bfe24f25f7981d
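Copying the URI by hand is error-prone, so the value can also be captured with a short filter. The following is a minimal sketch: extract_service_attachment is a hypothetical helper, and the sample text is the example output shown above rather than live output.

```shell
# Hypothetical helper: print the value of the first "serviceAttachment:"
# line read from stdin. awk ignores the leading indentation.
extract_service_attachment() {
    awk '$1 == "serviceAttachment:" { print $2; exit }'
}

# Sample text copied from the example above, for illustration only.
sample_output='Using endpoint [https://us-central1-aiplatform.googleapis.com/]
    serviceAttachment: projects/o9457b320a852208e-tp/regions/us-central1/serviceAttachments/gkedpm-52065579567eaf39bfe24f25f7981d'

Service_attachment=$(printf '%s\n' "$sample_output" | extract_service_attachment)
echo "$Service_attachment"
```

In Cloud Shell, you would pipe the gcloud ai endpoints describe command into the function instead of the sample text. An empty result would suggest that the endpoint is not yet deployed with Private Service Connect.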

4. Consumer Setup

Create the Consumer VPC

Inside Cloud Shell, perform the following:

gcloud compute networks create consumer-vpc --project=$projectid --subnet-mode=custom

Create the consumer VM subnet

Inside Cloud Shell, perform the following:

gcloud compute networks subnets create consumer-vm-subnet --project=$projectid --range=192.168.1.0/24 --network=consumer-vpc --region=us-central1 --enable-private-ip-google-access

Create the PSC Endpoint subnet

gcloud compute networks subnets create pscendpoint-subnet --project=$projectid --range=10.10.10.0/28 --network=consumer-vpc --region=us-central1
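The two ranges differ a lot in size, which matters if you plan to add more PSC endpoints later. As a rough sketch, the arithmetic below computes the total and usable address counts from the prefix length. It assumes the four addresses that Google Cloud reserves in a subnet's primary IPv4 range (the first two and the last two); cidr_size and usable_size are hypothetical helper names.

```shell
# Total number of addresses in an IPv4 CIDR range, from its prefix length.
cidr_size() {
    prefix=${1#*/}
    echo $(( 1 << (32 - $prefix) ))
}

# Usable count, assuming four reserved addresses per primary subnet range.
usable_size() {
    echo $(( $(cidr_size "$1") - 4 ))
}

echo "pscendpoint-subnet 10.10.10.0/28: $(cidr_size 10.10.10.0/28) total, $(usable_size 10.10.10.0/28) usable"
echo "consumer-vm-subnet 192.168.1.0/24: $(cidr_size 192.168.1.0/24) total, $(usable_size 192.168.1.0/24) usable"
```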

5. Enable IAP

To allow IAP to connect to your VM instances, create a firewall rule that:

  • Applies to all VM instances that you want to be accessible by using IAP.
  • Allows ingress traffic from the IP range 35.235.240.0/20. This range contains all IP addresses that IAP uses for TCP forwarding.

Inside Cloud Shell, create the IAP firewall rule.

gcloud compute firewall-rules create ssh-iap-consumer \
    --network consumer-vpc \
    --allow tcp:22 \
    --source-ranges=35.235.240.0/20

6. Create consumer VM instances

Inside Cloud Shell, create the consumer VM instance, consumer-vm.

gcloud compute instances create consumer-vm \
    --project=$projectid \
    --machine-type=e2-micro \
    --image-family debian-11 \
    --no-address \
    --shielded-secure-boot \
    --image-project debian-cloud \
    --zone us-central1-a \
    --subnet=consumer-vm-subnet 

7. Private Service Connect Endpoints

The consumer creates a consumer endpoint (forwarding rule) with an internal IP address within their VPC. This PSC endpoint targets the producer's service attachment. Clients within the consumer VPC or hybrid network can send traffic to this internal IP address to reach the producer's service.

Reserve an IP address for the consumer endpoint.

Inside Cloud Shell, reserve the internal IP address.

gcloud compute addresses create psc-address \
    --project=$projectid \
    --region=us-central1 \
    --subnet=pscendpoint-subnet \
    --addresses=10.10.10.6

Verify that the IP address is reserved

Inside Cloud Shell, list the reserved IP Address.

gcloud compute addresses list 

You should see the 10.10.10.6 IP address reserved.
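If you choose a different address, it must fall inside the range of pscendpoint-subnet. The sketch below checks membership with integer arithmetic. ip_to_int and ip_in_cidr are hypothetical helpers, and the check does not account for the addresses that Google Cloud reserves at the start and end of each subnet.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if address $1 falls inside CIDR range $2 (e.g. 10.10.10.0/28).
ip_in_cidr() {
    ip=$(ip_to_int "$1")
    base=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
    [ $(( $ip & $mask )) -eq $(( $base & $mask )) ]
}

if ip_in_cidr 10.10.10.6 10.10.10.0/28; then
    echo "10.10.10.6 is inside 10.10.10.0/28"
else
    echo "10.10.10.6 is outside 10.10.10.0/28"
fi
```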

Create the consumer endpoint by passing the service attachment URI that you captured in the Deploy Model section to the --target-service-attachment flag.

Inside Cloud Shell, create the forwarding rule.

gcloud compute forwarding-rules create psc-consumer-ep \
    --network=consumer-vpc \
    --address=psc-address \
    --region=us-central1 \
    --target-service-attachment=$Service_attachment \
    --project=$projectid

Verify that the service attachment accepts the endpoint

gcloud compute forwarding-rules describe psc-consumer-ep \
    --project=$projectid \
    --region=us-central1

In the response, verify that an "ACCEPTED" status appears in the pscConnectionStatus field
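The same check can be scripted rather than read by eye. This is a minimal sketch: psc_status is a hypothetical helper, and the sample text is an abbreviated, illustrative stand-in for the describe output, not a captured response.

```shell
# Hypothetical helper: print the value of the pscConnectionStatus field
# from forwarding-rule describe output read on stdin.
psc_status() {
    awk '$1 == "pscConnectionStatus:" { print $2; exit }'
}

# Abbreviated sample for illustration only; real output has more fields.
sample_describe='name: psc-consumer-ep
pscConnectionStatus: ACCEPTED
region: us-central1'

status=$(printf '%s\n' "$sample_describe" | psc_status)
if [ "$status" = "ACCEPTED" ]; then
    echo "PSC connection accepted"
else
    echo "PSC connection not accepted (status: ${status:-unknown})" >&2
fi
```

In Cloud Shell, you would pipe the gcloud compute forwarding-rules describe command into psc_status in place of the sample text.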

8. Test from Consumer VM

Perform the following steps to give the consumer VM access to the Vertex AI Model Garden API.

SSH into Consumer VM
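Because consumer-vm has no external IP address, one way to connect from Cloud Shell is to tunnel SSH through IAP, which the firewall rule created earlier permits. The sketch below wraps the command in a hypothetical run helper that only prints it while DRY_RUN is 1, so you can review the command first. The project ID shown is a placeholder.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

DRY_RUN=1                                   # set to 0 to actually connect
projectid="${projectid:-my-project-id}"     # placeholder if not already set

run gcloud compute ssh consumer-vm \
    --project="$projectid" \
    --zone=us-central1-a \
    --tunnel-through-iap
```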

Re-authenticate with Application Default Credentials and specify Vertex AI scopes.

gcloud auth application-default login \
    --scopes=https://www.googleapis.com/auth/cloud-platform

Use the table below to build a curl command, adjusting the values for your environment.

Attribute                             Value
Protocol                              HTTP
Location                              us-central1
Online Prediction Endpoint            2133539641536544768
Project ID                            test4-473419
Model                                 gemma-3-12b-it
Private Service Connect Endpoint IP   10.10.10.6
Messages                              [{"role": "user","content": "What weighs more 1 pound of feathers or rocks?"}]

Update and execute the curl command based on your environment's details:

curl -k -v -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json" \
    http://[PSC-IP]/v1/projects/[Project-ID]/locations/us-central1/endpoints/[Predictions Endpoint]/chat/completions \
    -d '{"model": "google/gemma-3-12b-it", "messages": [{"role": "user","content": "What weighs more 1 pound of feathers or rocks?"}] }'

Example:

curl -k -v -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json" \
    http://10.10.10.6/v1/projects/test4-473419/locations/us-central1/endpoints/2133539641536544768/chat/completions \
    -d '{"model": "google/gemma-3-12b-it", "messages": [{"role": "user","content": "What weighs more 1 pound of feathers or rocks?"}] }'
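To avoid editing the URL by hand, the request path can be assembled from the table values. This is a minimal sketch: build_chat_url and the variable names are hypothetical, and the values are the example ones from the table.

```shell
# Example values from the table above; replace them with your own.
PSC_IP="10.10.10.6"
PROJECT_ID="test4-473419"
LOCATION="us-central1"
ENDPOINT_ID="2133539641536544768"

# Hypothetical helper: assemble the chat completions URL from
# PSC IP, project ID, location, and endpoint ID (in that order).
build_chat_url() {
    printf 'http://%s/v1/projects/%s/locations/%s/endpoints/%s/chat/completions\n' \
        "$1" "$2" "$3" "$4"
}

URL=$(build_chat_url "$PSC_IP" "$PROJECT_ID" "$LOCATION" "$ENDPOINT_ID")
echo "$URL"
```

The resulting "$URL" can then replace the literal URL in the curl command. Quote it so that the shell passes it as a single argument.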

FINAL RESULT - SUCCESS!!!

You should see the prediction from Gemma 3 at the bottom of the output. This shows that you reached the API endpoint privately through the PSC endpoint.

 Connection #0 to host 10.10.10.6 left intact
{"id":"chatcmpl-9e941821-65b3-44e4-876c-37d81baf62e0","object":"chat.completion","created":1759009221,"model":"google/gemma-3-12b-it","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"This is a classic trick question! They weigh the same. One pound is one pound, regardless of the material. 😊\n\n\n\n","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":106}],"usage":{"prompt_tokens":20,"total_tokens":46,"completion_tokens":26,"prompt_tokens_details":null},"prompt_logprobs":null

9. Clean up

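From Cloud Shell, delete the tutorial components. Before running the first three commands, set $ENDPOINT_ID, $DEPLOYED_MODEL_ID, and $MODEL_ID to the values for your deployment.

gcloud ai endpoints undeploy-model $ENDPOINT_ID --deployed-model-id=$DEPLOYED_MODEL_ID --region=us-central1 --quiet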

gcloud ai endpoints delete $ENDPOINT_ID --project=$projectid --region=us-central1 --quiet

gcloud ai models delete $MODEL_ID --project=$projectid --region=us-central1 --quiet

gcloud compute instances delete consumer-vm --zone=us-central1-a --quiet

gcloud compute forwarding-rules delete psc-consumer-ep --region=us-central1 --project=$projectid --quiet

gcloud compute addresses delete psc-address --region=us-central1 --project=$projectid --quiet

gcloud compute networks subnets delete pscendpoint-subnet consumer-vm-subnet --region=us-central1 --quiet

gcloud compute firewall-rules delete ssh-iap-consumer --project=$projectid

gcloud compute networks delete consumer-vpc --project=$projectid --quiet

Optionally, if you created the project only for this tutorial, delete the project itself:

gcloud projects delete $projectid --quiet

10. Congratulations

Congratulations, you've successfully configured and validated private access to the Gemma 3 API hosted on Vertex AI Prediction using a Private Service Connect Endpoint.

You created the consumer infrastructure, including reserving an internal IP address and configuring a Private Service Connect Endpoint (a forwarding rule) within your VPC. This endpoint securely connects to the Vertex AI service by targeting the service attachment associated with your deployed Gemma 3 model. This setup allows your applications within the VPC or connected networks to interact with the Gemma 3 API privately and securely, using an internal IP address, without requiring traffic to traverse the public internet.

Further reading & Videos

Reference docs