Vertex AI: Create a secure user-managed notebook

1. Introduction

Vertex AI Workbench user-managed notebooks instances let you create and manage deep learning virtual machine (VM) instances that are prepackaged with JupyterLab.

User-managed notebooks instances have a preinstalled suite of deep learning packages, including support for the TensorFlow and PyTorch frameworks. You can configure either CPU-only or GPU-enabled instances.

What you'll build

This tutorial describes the process of deploying a secure user-managed notebook based on best practices from Networking and Security. The following steps are involved:

  1. Create a VPC
  2. Create a Cloud Router and Cloud NAT
  3. Configure the notebook instance with the appropriate security settings

This tutorial provides detailed instructions for each step. It also includes tips and best practices for securing user-managed notebooks. Figure 1 is an illustration of the deployment using a Standalone VPC.

Figure 1

2292244ba0b11f71.png

What you'll learn

  • How to determine if a Shared or Standalone VPC is right for your organization
  • How to create a Standalone VPC
  • How to create a Cloud Router and Cloud NAT
  • How to create a user-managed notebook
  • How to access a user-managed notebook
  • How to monitor user-managed notebook health
  • How to create and apply an instance schedule

What you'll need

  • Google Cloud Project

IAM permissions

2. VPC Network

You can think of a VPC network the same way you'd think of a physical network, except that it is virtualized within Google Cloud. A VPC network is a global resource that consists of regional subnets. VPC networks are logically isolated from each other in Google Cloud.

Standalone VPC

Figure 2 is an example of a standalone global VPC consisting of a regional subnet (us-central1), together with the Cloud Router and Cloud NAT that allow the user-managed notebook to securely establish connectivity to the internet.

Figure 2

2292244ba0b11f71.png

Shared VPC

Shared VPC allows you to export subnets from a VPC network in a host project to service projects in the same organization. The host project contains networking resources that are shared with the service project, such as subnets, Cloud NAT, and firewall rules. The service project contains application-level resources that leverage networking resources in the host project.

Figure 3 is an illustration of a Global Shared VPC, in which the networking and security infrastructure is deployed in the host project, while the workloads are deployed in the service project.

Figure 3

1354a9323c8e5787.png

Standalone vs Shared VPC

A single VPC network is sufficient for many simple use cases, as it is easier to create, maintain, and understand than more complex alternatives. Shared VPC is an effective tool for organizations with multiple teams, as it allows them to extend the architectural simplicity of a single VPC network across multiple working groups through the use of service projects.

VPC Best Practice used in the tutorial

  • Enable Cloud NAT so the notebook can reach the internet without an external IP address.
  • Turn on Private Google Access when you create subnets.
  • Create prescriptive firewall rules to reduce unsolicited traffic. For example, don't use 0.0.0.0/0 tcp; instead, define the exact subnet(s) or host IP addresses.
  • Leverage firewall policies to deepen the scope of ingress rules, e.g. geo-locations, threat intelligence lists, and source domain names.
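
The prescriptive firewall rule practice can be sketched as a small wrapper that refuses a wide-open source range. This is an illustration only: the helper name, the rule name, and the choice of SSH (tcp:22) are assumptions and are not part of the tutorial, which does not create firewall rules. The network name securevertex-vpc is the one created later in the VPC Setup section.

```shell
# Hypothetical helper: create an ingress rule only for an explicit source range.
# Assumption: SSH (tcp:22) is the traffic being allowed; adjust --rules as needed.
create_scoped_ingress_rule() {
  rule_name=$1      # illustrative, e.g. allow-ssh-from-iap
  source_range=$2   # CIDR that is allowed to connect

  case "$source_range" in
    ""|0.0.0.0/0)
      echo "an explicit source CIDR other than 0.0.0.0/0 is required" >&2
      return 1
      ;;
  esac

  gcloud compute firewall-rules create "$rule_name" \
    --network=securevertex-vpc \
    --direction=INGRESS \
    --action=ALLOW \
    --rules=tcp:22 \
    --source-ranges="$source_range"
}

# Usage in Cloud Shell (35.235.240.0/20 is the range used by IAP TCP forwarding):
#   create_scoped_ingress_rule allow-ssh-from-iap 35.235.240.0/20
```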

3. Notebook Best Practices

Right-size your instances

  • Stop and/or delete unused instances
  • Use a smaller initial instance and iterate with smaller sample data
  • Scale up instances as required
  • Experiment with smaller datasets

Select the right machine types

  • Cost optimized VMs
  • Make better use of hardware resources to drive down costs
  • Up to 31% saving compared to N1
  • Additional savings (20-50%) for 1- or 3-year commitments
  • Increasing the machine size or adding GPUs can improve performance and help overcome memory limitation errors

Schedule your instances to shut down

  • Switch off instances when they are idle (pay only for disk storage)
  • Schedule notebook VM instances to shut down and start up automatically at specific hours

Monitor notebook health status

Security Considerations

The following are the recommended security considerations when creating a user-managed notebook:

  • Select the option for "single user only" notebook access. If the specified user is not the creator of the instance, you must grant the specified user the Service Account User role (roles/iam.serviceAccountUser) on the instance's service account.
  • Disable the following options:
      • root access
      • nbconvert
      • file downloading from JupyterLab UI
  • Cloud NAT will be used instead of assigning an external IP address to the user-managed notebook.
  • Select the following compute options:
      • Secure Boot
      • Virtual Trusted Platform Module (vTPM)
      • Integrity monitoring

4. Before you begin

Update the project to support the tutorial

This tutorial makes use of $variables to aid gcloud configuration implementation in Cloud Shell.

Inside Cloud Shell, perform the following:

gcloud config list project
gcloud config set project [your-project-name]
projectid=your-project-name
echo $projectid
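
As an alternative to typing the project ID a second time, the variable can be read back from the active gcloud configuration. The following is a minimal sketch; current_project is a hypothetical helper name that is not part of the tutorial, and it assumes gcloud config set project has already been run.

```shell
# Hypothetical helper: print the project ID of the active gcloud configuration.
current_project() {
  # gcloud writes status messages to stderr; discard them so that only
  # the project ID is captured by command substitution.
  gcloud config get-value project 2>/dev/null
}

# Usage in Cloud Shell:
#   projectid=$(current_project)
#   echo $projectid
```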

5. VPC Setup

Create the Standalone VPC

Inside Cloud Shell, perform the following:

gcloud compute networks create securevertex-vpc --project=$projectid --subnet-mode=custom

Create the user-managed notebook subnet

Inside Cloud Shell, perform the following:

gcloud compute networks subnets create securevertex-subnet-a --project=$projectid --range=10.10.10.0/28 --network=securevertex-vpc --region=us-central1 --enable-private-ip-google-access
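
The /28 range used above is deliberately small. As a rough sizing check, the sketch below computes how many addresses such a range leaves for instances, assuming the Google Cloud convention that four addresses in each subnet's primary IPv4 range are reserved. The function name usable_ips is illustrative.

```shell
# usable_ips PREFIX_LENGTH
# Print the number of assignable IPv4 addresses in a subnet's primary range.
usable_ips() {
  prefix=$1
  total=$(( 1 << (32 - prefix) ))   # addresses in the CIDR block
  echo $(( total - 4 ))             # minus the 4 addresses reserved per range
}

usable_ips 28   # prints 12
```

A /28 therefore has room for only a handful of notebooks; choose a larger range if you expect the subnet to grow.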

Cloud Router and NAT configuration

Cloud NAT is used in the tutorial for notebook software package downloads because the user-managed notebook instance does not have an external IP address. Cloud NAT also offers egress NAT capabilities, which means that internet hosts are not allowed to initiate communication with a user-managed notebook, making it more secure.

Inside Cloud Shell, create the regional cloud router.

gcloud compute routers create cloud-router-us-central1 --network securevertex-vpc --region us-central1

Inside Cloud Shell, create the regional cloud nat gateway.

gcloud compute routers nats create cloud-nat-us-central1 --router=cloud-router-us-central1 --auto-allocate-nat-external-ips --nat-all-subnet-ip-ranges --region us-central1
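
To confirm that the gateway exists before continuing, its configuration can be displayed. This is a small sketch that reuses the router, gateway, and region names from the commands above; show_nat_config is a hypothetical helper name.

```shell
# Hypothetical helper: display the Cloud NAT gateway created in this section.
show_nat_config() {
  gcloud compute routers nats describe cloud-nat-us-central1 \
    --router=cloud-router-us-central1 \
    --region=us-central1
}

# Usage in Cloud Shell:
#   show_nat_config
```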

6. Create a storage bucket

Storage buckets offer secure file upload and retrieval. In the tutorial, the Cloud Storage bucket will contain a post-startup script that installs Generative AI packages in the user-managed notebook.

Create a Cloud Storage bucket and replace BUCKET_NAME with a globally unique name you prefer.

Inside Cloud Shell, create a unique storage bucket.

gsutil mb -l us-central1 -b on gs://BUCKET_NAME

Store the bucket name in the BUCKET_NAME variable for the duration of the lab.

BUCKET_NAME=your-bucket-name
echo $BUCKET_NAME
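
Bucket creation fails on names that break the Cloud Storage naming rules, so a quick local check can save a round trip. The sketch below covers only the basic rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). It does not cover every rule, such as reserved prefixes, and the function name is an assumption.

```shell
# Hypothetical helper: succeed if the name passes the basic bucket naming rules.
is_valid_bucket_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'
}

# Usage in Cloud Shell:
#   is_valid_bucket_name "$BUCKET_NAME" || echo "choose a different bucket name"
```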

7. Create a post startup script

To enable the download of the Generative AI packages, create a post-startup script in Cloud Shell using the vi or nano editor and save it as poststartup.sh.

#!/bin/bash
echo "Current user: $(id)" >> /tmp/notebook_config.log 2>&1
echo "Changing dir to /home/jupyter" >> /tmp/notebook_config.log 2>&1
cd /home/jupyter
echo "Cloning generative-ai from github" >> /tmp/notebook_config.log 2>&1
su - jupyter -c "git clone https://github.com/GoogleCloudPlatform/generative-ai.git" >> /tmp/notebook_config.log 2>&1
echo "Current user: $(id)" >> /tmp/notebook_config.log 2>&1
echo "Installing python packages" >> /tmp/notebook_config.log 2>&1
su - jupyter -c "pip install --upgrade --no-warn-conflicts --no-warn-script-location --user \
     google-cloud-bigquery \
     google-cloud-pipeline-components \
     google-cloud-aiplatform \
     seaborn \
     kfp" >> /tmp/notebook_config.log 2>&1

Example:

vpc_admin@cloudshell$ more poststartup.sh
#!/bin/bash
echo "Current user: $(id)" >> /tmp/notebook_config.log 2>&1
echo "Changing dir to /home/jupyter" >> /tmp/notebook_config.log 2>&1
cd /home/jupyter
echo "Cloning generative-ai from github" >> /tmp/notebook_config.log 2>&1
su - jupyter -c "git clone https://github.com/GoogleCloudPlatform/generative-ai.git" >> /tmp/notebook_config.log 2>&1
echo "Current user: $(id)" >> /tmp/notebook_config.log 2>&1
echo "Installing python packages" >> /tmp/notebook_config.log 2>&1
su - jupyter -c "pip install --upgrade --no-warn-conflicts --no-warn-script-location --user \
     google-cloud-bigquery \
     google-cloud-pipeline-components \
     google-cloud-aiplatform \
     seaborn \
     kfp" >> /tmp/notebook_config.log 2>&1

Upload the post-startup script to your storage bucket from Cloud Shell using gsutil.

gsutil cp poststartup.sh gs://$BUCKET_NAME
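
Because the script only runs when the instance starts, a typo in it is easy to miss. One optional precaution, sketched below, is to let bash parse the file without executing it before (re)uploading. The helper name check_script is an assumption, and the check covers syntax only: it will not catch a mistyped redirection or a wrong package name.

```shell
# Hypothetical helper: parse a script with bash without running it.
check_script() {
  if bash -n "$1"; then
    echo "syntax OK: $1"
  else
    echo "syntax errors in $1; fix them before uploading" >&2
    return 1
  fi
}

# Usage in Cloud Shell:
#   check_script poststartup.sh && gsutil cp poststartup.sh gs://$BUCKET_NAME
```

After the notebook starts, the script's progress can be reviewed on the instance in /tmp/notebook_config.log.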

8. Create a service account

To provide a fine level of control over the user-managed notebook, a service account is required. Once generated, the service account permissions can be modified based on business requirements. In the tutorial, the service account will have the following roles applied: Storage Object Viewer and Vertex AI User.

You must enable the Service Account API before proceeding.

Inside Cloud Shell, create the service account.

gcloud iam service-accounts create user-managed-notebook-sa \
    --display-name="user-managed-notebook-sa"

Inside Cloud Shell, update the service account with the role Storage Object Viewer.

gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:user-managed-notebook-sa@$projectid.iam.gserviceaccount.com" --role="roles/storage.objectViewer"

Inside Cloud Shell, update the service account with the role Vertex AI User.

gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:user-managed-notebook-sa@$projectid.iam.gserviceaccount.com" --role="roles/aiplatform.user"

Inside Cloud Shell, list the service account and note the email address that will be used when creating the user-managed notebook.

gcloud iam service-accounts list

Example:

$ gcloud iam service-accounts list
DISPLAY NAME: user-managed-notebook-sa
EMAIL: user-managed-notebook-sa@my-project-id.iam.gserviceaccount.com
DISABLED: False
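
The email address follows a fixed pattern, so it can also be composed from the project ID rather than copied from the listing. A minimal sketch, assuming the service account name used above; notebook_sa_email and sa_email are hypothetical names.

```shell
# Hypothetical helper: compose the notebook service account email.
# $1 is the project ID.
notebook_sa_email() {
  echo "user-managed-notebook-sa@${1}.iam.gserviceaccount.com"
}

# Usage in Cloud Shell:
#   sa_email=$(notebook_sa_email "$projectid")
#   echo $sa_email
```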

9. Create a secure user-managed notebook

A user-managed notebooks instance is a Deep Learning virtual machine instance with the latest machine learning and data science libraries preinstalled. You can optionally include Nvidia GPUs for hardware acceleration.

Enable consumer APIs

Enable the Notebooks API.

Create the user-managed notebook

  1. Go to Workbench
  2. Select User-Managed Notebooks, and then select Create Notebook. The Create a user-managed notebook page opens.
  3. If an existing notebook is deployed, then select User-Managed Notebooks → New Notebook → Customize
  4. On the Create a user-managed notebook page, in the Details section, provide the following information for your new instance:
  • Name: Provide a name for your new instance.
  • Region and Zone: The tutorial will use the region us-central1 and zone us-central1-a

Select Continue

  5. In the Environment section, provide the following:
  • Operating system: Select the operating system that you want to use.
  • Environment: Select the environment that you want to use.
  • Version: Select the version that you want to use.
  • Post-startup script: Optional. Select Browse to choose a script to run after the instance starts. To use the previously created generative AI script, select poststartup.sh in your storage bucket.
  • Metadata: Optional. Provide custom metadata keys for the instance.

Select Continue

  6. In the Machine type section, provide the following:
  • Machine type: Select the number of CPUs and amount of RAM for your new instance. Vertex AI Workbench provides monthly cost estimates for each machine type that you select.
  • GPU type: Select the GPU type and Number of GPUs for your new instance. For information about the different GPUs, see GPUs on Compute Engine.
  • Select the Install NVIDIA GPU driver automatically for me checkbox.

Shielded VM

  • Turn on Secure Boot
  • Turn on vTPM
  • Turn on Integrity Monitoring

Select Continue

  7. In the Disks section, provide the following:
  • Disks: Optional: To change the default boot or data disk settings, select the Boot disk type, Boot disk size in GB, Data disk type, and Data disk size in GB that you want. For more information about disk types, see Storage options.
  • Delete to trash: Optional: Select this checkbox to use the operating system's default trash behavior. If you use the default trash behavior, files deleted by using the JupyterLab user interface are recoverable, but these deleted files still use disk space.
  • Backup: Optional: To sync a Cloud Storage location with your instance's data disk, select Browse and specify the Cloud Storage location. To learn about storage costs, see Cloud Storage pricing.
  • Encryption: Google-managed encryption key

Select Continue

  8. In the Networking section, provide the following:
  • Networking: Select either Networks in this project or Networks shared with me. If you are using a Shared VPC in the host project, you must also grant the Compute Network User role (roles/compute.networkUser) to the Notebooks Service Agent from the service project.
  • Network: Select the network that you want. The tutorial uses the network securevertex-vpc. You can select any VPC network, as long as the network has Private Google Access enabled or can access the internet.
  • Subnetwork: Select the subnetwork that you want. The tutorial uses the subnetwork securevertex-subnet-a.
  • Deselect assign external IP address
  • Select allow proxy access

Select Continue

81bb7dbe31fbf587.png

  9. In the IAM and security section, provide the following:
  • Select Single user and then, in the User email field, enter the user account that you want to grant access. If the specified user is not the creator of the instance, you must grant the specified user the Service Account User role (roles/iam.serviceAccountUser) on the instance's service account.
  • Deselect Use default Compute Engine service account on the VM to call Google Cloud APIs
  • Enter the newly created service account email address, example: user-managed-notebook-sa@my-project-id.iam.gserviceaccount.com

Security options

  • Deselect enable root access to the instance
  • Deselect enable nbconvert
  • Deselect Enable file downloading from JupyterLab UI
  • Enable terminal (Deselect for production environments)

Select Continue

e19f3cd05a2c1b7f.png

  10. In the System health section, provide the following:

Environment upgrade and system health

  • Select the Enable environment auto-upgrade checkbox.
  • Choose whether to upgrade your notebook Weekly or Monthly.

In System health and reporting, select or clear the following checkboxes:

  • Enable system health report
  • Report custom metrics to Cloud Monitoring
  • Install Cloud Monitoring agent

Select Create.

10. Validation

Vertex AI Workbench creates a user-managed notebooks instance based on your specified properties and automatically starts the instance. When the instance is ready to use, Vertex AI Workbench activates an Open JupyterLab link that allows the end user access to the notebook.

11. Observability

Monitor system and application metrics through Monitoring

For user-managed notebooks instances that have Monitoring installed, you can monitor your system and application metrics by using the Google Cloud console:

  1. In the Google Cloud console, go to the User-managed notebooks page.
  2. Click the instance name that you want to view the system and application metrics of.
  3. On the Notebook details page, click the Monitoring tab. Review the system and application metrics for your instance.

12. Create a Notebook Schedule

Instance schedules let you start and stop virtual machine (VM) instances automatically. Using instance schedules to automate deployment of your VM instances can help you optimize costs and manage VM instances more efficiently. You can use instance schedules for both recurring and one-off workloads. For example, use instance schedules to only run VM instances during working hours or to provide capacity for a one time event.

To use instance schedules, create a resource policy detailing the start and stop behavior, and then attach the policy to one or more VM instances.

The tutorial will show you how to create an instance schedule that will power on your notebook at 7 AM and power it off at 6 PM.

To create the instance schedule, you need the compute.instances.start and compute.instances.stop permissions. A custom role with these permissions, created and granted by your administrator, is therefore recommended.

Once created, the custom role will be assigned to the default Compute Engine service account in your project, which will allow the instance schedule to start and stop your notebook.

Create a custom role

Inside Cloud Shell, create a custom role, Vm_Scheduler, and include the necessary permissions.

gcloud iam roles create Vm_Scheduler --project=$projectid \
    --title=vm-scheduler-notebooks \
    --permissions="compute.instances.start,compute.instances.stop" --stage=ga

Describe the custom role from Cloud Shell.

gcloud iam roles describe Vm_Scheduler --project=$projectid

Example:

$ gcloud iam roles describe Vm_Scheduler --project=$projectid
etag: BwX991B0_kg=
includedPermissions:
- compute.instances.start
- compute.instances.stop
name: projects/$projectid/roles/Vm_Scheduler
stage: GA
title: vm-scheduler-notebooks

Update the default service account

In the following section, you will identify and update the default service account, which has the format: PROJECT_NUMBER-compute@developer.gserviceaccount.com

In Cloud Shell, identify the current project number.

gcloud projects list --filter=$projectid

In Cloud Shell, store the project number as a variable.

project_number=your_project_number
echo $project_number
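
The project number can also be fetched directly instead of being copied from the list output. The sketch below uses gcloud projects describe with a value format. Both helper names are assumptions; the email pattern is the one stated above.

```shell
# Hypothetical helper: print the project number for a project ID.
lookup_project_number() {
  gcloud projects describe "$1" --format="value(projectNumber)"
}

# Hypothetical helper: compose the default Compute Engine service account
# email from a project number.
default_compute_sa() {
  echo "${1}-compute@developer.gserviceaccount.com"
}

# Usage in Cloud Shell:
#   project_number=$(lookup_project_number "$projectid")
#   default_compute_sa "$project_number"
```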

In Cloud Shell, update the default compute service account with the custom role, Vm_Scheduler.

gcloud projects add-iam-policy-binding $projectid --member="serviceAccount:$project_number-compute@developer.gserviceaccount.com" --role="projects/$projectid/roles/Vm_Scheduler"

Create the instance schedule

In Cloud Shell, create the start and stop schedule.

gcloud compute resource-policies create instance-schedule optimize-notebooks \
    --region=us-central1 \
    --vm-start-schedule='0 7 * * *' \
    --vm-stop-schedule='0 18 * * *' \
    --timezone=America/Chicago
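
The two schedule values are standard five-field cron expressions. For readers less familiar with the format, the illustrative helper below (describe_cron is a hypothetical name) labels each field, which makes it easier to adapt the start and stop times.

```shell
# Hypothetical helper: label the five fields of a cron expression.
describe_cron() {
  # read splits the expression on whitespace; the here-document keeps
  # the "*" characters from being expanded to file names.
  read -r minute hour dom month dow <<EOF
$1
EOF
  echo "minute=$minute hour=$hour day-of-month=$dom month=$month day-of-week=$dow"
}

describe_cron '0 7 * * *'    # start: minute 0 of hour 7, every day
describe_cron '0 18 * * *'   # stop: minute 0 of hour 18, every day
```

The hours are interpreted in the time zone passed with --timezone, so 7 and 18 here mean 7 AM and 6 PM in America/Chicago.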

In Cloud Shell, store your notebook's name.

gcloud compute instances list
notebook_vm=your_notebookvm_name
echo $notebook_vm

You can attach an instance schedule to any existing VM instance that is located in the same region as the instance schedule.

In Cloud Shell, associate the schedule with your notebook.

gcloud compute instances add-resource-policies $notebook_vm \
--resource-policies=optimize-notebooks \
--zone=us-central1-a

13. Clean up

Delete the user-managed notebook from the console: navigate to Vertex AI → Workbench, then select and delete the notebook.

From Cloud Shell, delete VPC components.

gcloud compute routers nats delete cloud-nat-us-central1 --region=us-central1 --router=cloud-router-us-central1 --quiet

gcloud compute routers delete cloud-router-us-central1 --region=us-central1 --quiet

gcloud compute instances remove-resource-policies $notebook_vm \
--resource-policies=optimize-notebooks \
--zone=us-central1-a --quiet

gcloud compute resource-policies delete optimize-notebooks --region=us-central1 --quiet

gcloud compute instances delete $notebook_vm --zone=us-central1-a --quiet

gcloud compute networks subnets delete securevertex-subnet-a --region=us-central1 --quiet 

gcloud iam service-accounts delete user-managed-notebook-sa@$projectid.iam.gserviceaccount.com --quiet 

gcloud projects remove-iam-policy-binding $projectid --member="serviceAccount:$project_number-compute@developer.gserviceaccount.com" --role="projects/$projectid/roles/Vm_Scheduler"

gcloud iam roles delete Vm_Scheduler --project=$projectid

gcloud compute networks delete securevertex-vpc --quiet 
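
The commands above leave the storage bucket from step 6 in place. If it is no longer needed, it can be removed together with the uploaded script. The following is a sketch with a hypothetical helper name; it assumes BUCKET_NAME is still set in your Cloud Shell session and guards against an empty value.

```shell
# Hypothetical helper: delete the tutorial bucket and everything in it.
delete_tutorial_bucket() {
  bucket=$1
  if [ -z "$bucket" ]; then
    echo "bucket name is empty; nothing deleted" >&2
    return 1
  fi
  # rm -r on a bucket URL removes its objects and then the bucket itself.
  gsutil rm -r "gs://$bucket"
}

# Usage in Cloud Shell:
#   delete_tutorial_bucket "$BUCKET_NAME"
```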

14. Congratulations

Well done! You have successfully configured and validated a secure user-managed notebook by creating a custom Standalone VPC that follows security hardening best practices for user-managed notebooks, and you have implemented an instance schedule to optimize spending.
