Welcome to the Google Codelab for running a Lustre Parallel Filesystem cluster in Google Cloud Platform!

Data is core to the practice of High Performance Computing, and accessing large amounts of data at extremely high speeds and low latencies has always been a key challenge in running HPC workloads. This requirement for high performance storage has not changed in the cloud, and in fact the ability to utilize vast amounts of storage quickly and easily has become paramount.

HPC centers have long met this need on premises using technologies like the Lustre parallel file system. Lustre is one of the most popular open source high performance storage solutions today, and since June 2005 it has consistently been used by at least half of the top ten, and more than 60 of the top 100, fastest supercomputers in the world. Lustre can scale to hundreds of petabytes of capacity and deliver TB/s of throughput in a single namespace, making it well suited to even the most demanding HPC jobs.

To serve this demand for storage, our engineers at Google Cloud have developed and open-sourced a set of scripts to easily configure and deploy a Lustre storage cluster on Google Compute Engine using the Google Cloud Deployment Manager. Lustre on Google Cloud Platform is equally capable of delivering the maximum performance of the infrastructure it's running on. Its performance on GCP is strong enough that it placed 8th on the IO-500 storage system benchmark in 2019 with our partner DDN, the highest ranking of any cloud-based filesystem on the IO-500. Today we will walk you through deploying the open source Deployment Manager scripts for Lustre. However, if you are interested in having support for your Lustre cluster, as well as features like a management and monitoring GUI or Lustre tunings, we recommend investigating the DDN Cloud Edition for Lustre Marketplace offering.

What you'll learn

Prerequisites

Self-paced environment setup

Create a Project

If you don't already have a Google Account (Gmail or G Suite), you must create one. Sign in to the Google Cloud Platform console (console.cloud.google.com) and open the Manage resources page:

Click Create Project.

Enter a project name and make a note of the project ID. The project ID must be a unique name across all Google Cloud projects. If your project name is not unique, Google Cloud will generate a random project ID based on the project name.

Next, you'll need to enable billing in the Developers Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see the "Clean Up the Deployment" section at the end of this document). The Google Cloud Platform pricing calculator is available here.

New users of Google Cloud Platform are eligible for a $300 free trial.

Google Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab we will be using Google Cloud Shell, a command line environment running in the Cloud.

Launch Google Cloud Shell

From the GCP Console click the Cloud Shell icon on the top right toolbar:

Then click Start Cloud Shell:

It should only take a few moments to provision and connect to the environment:

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory and runs on Google Cloud, greatly enhancing network performance and simplifying authentication. Much, if not all, of your work in this lab can be done with just a web browser or a Google Chromebook.

Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID:

$ gcloud auth list


Command output:

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)
$ gcloud config list project


Command output:

[core]
project = <PROJECT_ID>


If the project ID is not set correctly you can set it with this command:

$ gcloud config set project <PROJECT_ID>

Command output:

Updated property [core/project].

Download the Lustre Deployment Manager Scripts

In the Cloud Shell session, execute the following command to clone (download) the Git repository that contains the Lustre for Google Cloud Platform Deployment Manager files:

git clone https://github.com/GoogleCloudPlatform/deploymentmanager-samples.git

Switch to the Lustre deployment configuration directory by executing the following command:

cd deploymentmanager-samples/community/lustre/

Configure Lustre Deployment YAML

Deployment Manager uses a YAML file to provide the deployment configuration. This YAML file details the configuration of the deployment, such as the Lustre version and the machine instance types to deploy. The file is configured by default to deploy in a new project without any quota increases; however, you may increase or decrease the instance sizes or capacity as desired for this codelab. This codelab is written to use these defaults, so if you do make any changes you must carry them through the rest of the codelab to avoid errors. In production, we recommend at least a 32 vCPU instance for the MDS node, and at least an 8 or 16 vCPU instance for the OSS nodes, depending on storage capacity and type.

To review or edit the YAML file in the Cloud Shell session, open the deployment configuration file lustre.yaml. You can either use your preferred command line editor (vi, nano, emacs, etc.) or use the Cloud Console Code Editor to view the file contents:

The contents of the file will look like this:

# [START cluster_yaml]
imports:
- path: lustre.jinja

resources:
- name: lustre
  type: lustre.jinja
  properties:
    ## Cluster Configuration
    cluster_name            : lustre
    zone                    : us-central1-f
    cidr                    : 10.20.0.0/16
    external_ips            : True
    ### Use these fields to deploy Lustre in an existing VPC, Subnet, and/or Shared VPC
    #vpc_net                 : < VPC Network Name >
    #vpc_subnet              : < VPC Subnet Name >
    #shared_vpc_host_proj    : < Shared VPC Host Project name >

    ## Filesystem Configuration
    fs_name                 : lustre
    ### Review https://downloads.whamcloud.com/public/ to determine version naming
    lustre_version          : latest-release
    e2fs_version            : latest

    ## Lustre MDS/MGS Node Configuration
    #mds_node_count          : 1
    mds_ip_address          : 10.20.0.2
    mds_machine_type        : n1-standard-8
    ### MDS/MGS Boot disk
    mds_boot_disk_type      : pd-standard
    mds_boot_disk_size_gb   : 10
    ### Lustre MetaData Target disk
    mdt_disk_type           : pd-ssd
    mdt_disk_size_gb        : 1000

    ## Lustre OSS Configuration
    oss_node_count          : 4
    oss_ip_range_start      : 10.20.0.5
    oss_machine_type        : n1-standard-4
    ### OSS Boot disk
    oss_boot_disk_type      : pd-standard
    oss_boot_disk_size_gb   : 10
    ### Lustre Object Storage Target disk
    ost_disk_type           : pd-standard
    ost_disk_size_gb        : 5000
#  [END cluster_yaml]

Within this YAML file there are several fields; fields marked with an asterisk (*) below are required. These fields include:

Cluster Configuration

Filesystem Configuration

MDS/MGS Configuration

OSS Configuration

Deploy the Configuration

In the Cloud Shell session, execute the following command from the deploymentmanager-samples/community/lustre/ directory:

gcloud deployment-manager deployments create lustre --config lustre.yaml

This command creates a deployment named lustre. The operation can take a few minutes to complete, so please be patient.
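
If you'd like to check on progress while you wait, one option (a minimal sketch, assuming you kept the deployment name lustre) is to query Deployment Manager from Cloud Shell:

# Show the overall status of the deployment and any errors so far
gcloud deployment-manager deployments describe lustre
# List the individual resources and their current state
gcloud deployment-manager resources list --deployment lustre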

Once the deployment has completed you will see output similar to:

Create operation operation-1572410719018-5961966591cad-e25384f6-d4c905f8 completed successfully.
NAME                                TYPE                   STATE      ERRORS  INTENT
lustre-all-internal-firewall-rule  compute.v1.firewall    COMPLETED  []
lustre-lustre-network              compute.v1.network     COMPLETED  []
lustre-lustre-subnet               compute.v1.subnetwork  COMPLETED  []
lustre-mds1                        compute.v1.instance    COMPLETED  []
lustre-oss1                        compute.v1.instance    COMPLETED  []
lustre-oss2                        compute.v1.instance    COMPLETED  []
lustre-oss3                        compute.v1.instance    COMPLETED  []
lustre-oss4                        compute.v1.instance    COMPLETED  []
lustre-ssh-firewall-rule           compute.v1.firewall    COMPLETED  []

Verify the Deployment

Follow these steps to view the deployment in Google Cloud Platform Console:

With the deployment's configuration verified, let's confirm that the cluster's instances have started. In the Cloud Platform Console, in the Products & Services menu, click Compute Engine > VM Instances.

On the VM Instances page, review the five virtual machine instances that have been created by Deployment Manager. These include lustre-mds1, lustre-oss1, lustre-oss2, lustre-oss3, and lustre-oss4.

Monitor the Installation

On the VM Instances page, click lustre-mds1 to open the Instance details page.

Click on Serial port 1 (console) to open the serial console output page. We will use this serial output to monitor the installation process of the MDS instance and wait until the startup script has completed. The node will reboot once to boot into the Lustre kernel, and then display messages similar to the following:

Startup finished in 838ms (kernel) + 6.964s (initrd) + 49.302s (userspace) = 57.105s.
Lustre: lustre-MDT0000: Connection restored to 374e2d80-0b31-0cd7-b2bf-de35b8119534 (at 0@lo)

This means Lustre is installed on the Lustre cluster, and the filesystem is ready to be utilized!
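
If you prefer to stay in Cloud Shell, you can also retrieve the same serial console output with gcloud (a minimal sketch; substitute <ZONE> with the zone you configured in the YAML file):

# Print serial port 1 output for the MDS instance
gcloud compute instances get-serial-port-output lustre-mds1 --port=1 --zone=<ZONE>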

Access the Lustre Cluster

In the Google Cloud Console, click the SSH button next to the lustre-mds1 instance on the VM Instances page. Alternatively, execute the following command in Cloud Shell, substituting <ZONE> with the lustre-mds1 node's zone:

gcloud compute ssh lustre-mds1 --zone=<ZONE>

This command logs into the lustre-mds1 virtual machine. This is the Lustre Metadata Server (MDS) instance, which also acts as the Lustre Management Server (MGS) instance. This instance handles all authentication and metadata requests for the filesystem.
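
As a quick sanity check once you're logged in, you can list the node's Lustre network identifiers (NIDs). This is a minimal sketch and assumes the Lustre kernel modules have already been loaded by the startup script:

# Show the Lustre network identifiers (NIDs) this node is serving on
sudo lctl list_nids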

Let's mount the filesystem on our lustre-mds1 instance in order to be able to test it later. Execute the following commands:

sudo mkdir /mnt/lustre
sudo mount -t lustre lustre-mds1:/lustre /mnt/lustre
cd /mnt/lustre

You've now mounted the Lustre filesystem at /mnt/lustre. Let's take a look at what we can do with this filesystem.
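
Before moving on, you can confirm the mount with standard tools (a quick check, not part of the deployment scripts themselves):

# Show mounted Lustre filesystems and their capacity
df -h -t lustre
# Or inspect the mount table directly
mount | grep lustre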

If you are not familiar with Lustre and its tools, we will walk through a few important commands here.

Lustre's low-level cluster management tool is "lctl". We can use lctl to configure and manage the Lustre cluster, and to view the Lustre cluster's services. To view the services and instances in our new Lustre cluster, execute:

sudo lctl dl

You will see output similar to below, depending on what changes you made to the Lustre YAML configuration file:

  0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 11
  1 UP mgs MGS MGS 12
  2 UP mgc MGC10.128.15.2@tcp 374e2d80-0b31-0cd7-b2bf-de35b8119534 4
  3 UP mds MDS MDS_uuid 2
  4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 3
  5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 12
  6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 3
  7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 3
  8 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 4
  9 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 10 UP osp lustre-OST0002-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 11 UP osp lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 12 UP osp lustre-OST0003-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4

We can see our Lustre Management Server (MGS) as item 1, our Lustre Metadata Server (MDS) as item 3, our Lustre Metadata Target (MDT) as item 5, and the connections from the MDS to our four Object Storage Targets (OST) as items 9 through 12. To understand what the other services are, please review the Lustre Manual.

Lustre's filesystem configuration tool is "lfs". We can use lfs to manage striping of files across our Lustre Object Storage Servers (OSS) and their respective Object Storage Targets (OST), as well as running common filesystem operations like find, df, and quota management.
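
For example, to see capacity broken down by MDT and OST, or to check quota usage for your user, you can run commands like the following (a minimal sketch; the quota report assumes quotas are enabled on the filesystem):

# Show per-MDT and per-OST capacity and usage
sudo lfs df -h /mnt/lustre
# Show quota usage for the current user, if quotas are enabled
sudo lfs quota -u $USER /mnt/lustre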

Striping allows us to configure how a file is distributed across our Lustre cluster to deliver the best performance possible. While striping a large file across as many OSSs as possible often delivers the best performance by parallelizing the IO, striping a small file may lead to worse performance than if that file were only written to a single instance.

To test this, let's set up two directories, one with a stripe count of one OSS, and one with a stripe count of "-1", indicating that the files written in that directory should be striped across as many OSSs as possible. Directories can hold striping configurations that are inherited by files created within them, but sub-directories and individual files within that directory can then be configured to be striped differently if desired. To make these two directories, execute the following commands while in the "/mnt/lustre" directory:

sudo mkdir stripe_one
sudo mkdir stripe_all
sudo lfs setstripe -c 1 stripe_one/
sudo lfs setstripe -c -1 stripe_all/

You can view the stripe settings of a file or directory using lfs getstripe:

sudo lfs getstripe stripe_all/

You will see output showing the stripe count set as -1:

stripe_all/
stripe_count:  -1 stripe_size:   1048576 pattern:    raid0 stripe_offset: -1

Now we're ready to test the performance improvements achievable by writing a large file striped across multiple OSSs.

We will run two simple tests of the Lustre IO to demonstrate the possible performance advantages and scaling capabilities of the Lustre filesystem. First, we will run a simple test using the "dd" utility to write a 5GB file to our "stripe_one" directory. Execute the following command:

sudo dd if=/dev/zero of=stripe_one/test bs=1M count=5000

Writing 5GB of data to a single Persistent Disk (PD) on a single Object Storage Server (OSS) takes around 27 seconds on average.

To test striping across multiple OSSs, and therefore multiple PDs, we simply need to change the output directory we write to. Execute the following command:

sudo dd if=/dev/zero of=stripe_all/test bs=1M count=5000

Notice we changed "of=stripe_one/test" to "of=stripe_all/test". This allows our single-stream write to distribute its writes across all of our Object Storage Servers, completing the write in about 5.5 seconds on average, roughly 4x faster with four OSSs.
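
If you're curious which OSTs the striped file actually landed on, running lfs getstripe on the file itself lists the objects backing it (the exact layout will vary from run to run):

# Show the OST objects backing the striped test file
sudo lfs getstripe stripe_all/test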

This performance continues to increase as you add Object Storage Servers; you can add OSSs while the filesystem remains online and begin striping data to them to increase capacity and performance without downtime. The possibilities are endless using Lustre on Google Cloud Platform, and we're excited to see what you can build and what problems you can solve.

Congratulations, you've created a Lustre cluster on Google Cloud Platform! You can use these scripts as a starting point to build your own Lustre cluster, and to integrate it with your cloud-based computing cluster.

Clean Up the Deployment

Log out of the Lustre node:

exit

After logging out of the Lustre cluster, you can easily clean up the deployment by executing the following command from Google Cloud Shell:

gcloud deployment-manager deployments delete lustre

When prompted, type Y to continue. This operation can take some time, please be patient.
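
Once the delete operation finishes, you can confirm that no Lustre deployments remain (a quick check, assuming you have no other Deployment Manager deployments in this project):

# List remaining deployments in the current project
gcloud deployment-manager deployments list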

Delete the Project

To clean up, we simply delete our project.
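
If you created a dedicated project for this codelab, you can delete it either from the Manage resources page in the Console or from Cloud Shell (a minimal sketch; substitute your own project ID, and note that this permanently removes every resource in the project):

# Permanently delete the project and everything in it
gcloud projects delete <PROJECT_ID>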

What we've covered

Find Support

Are you building something cool using the Lustre Deployment Manager scripts? Have questions? Chat with us in the Google Cloud Lustre discussion group. To request features, provide feedback, or report bugs, please use this form, or feel free to modify the code and submit a pull request! Want to speak to a Google Cloud expert? Reach out to the Google Cloud team today through Google Cloud's High Performance Computing website.

Learn More

Feedback

Please submit feedback about this codelab using this link. Feedback takes less than 5 minutes to complete. Thank you!