Deploy a Lustre Parallel File System on GCP

Welcome to the Google Codelab for running a Lustre Parallel file system cluster on Google Cloud Platform!

Data is core to the practice of High Performance Computing, and accessing large amounts of data at extremely high speeds and low latencies has always been a key challenge in running HPC workloads. This requirement for high performance storage has not changed in the cloud, and in fact the ability to utilize vast amounts of storage quickly and easily has become paramount.

HPC centers have long met this need on-premises using technologies like the Lustre parallel file system. Lustre is one of the most popular open source high performance storage solutions today, and since June 2005 it has consistently been used by at least half of the top ten, and more than 60 of the top 100, fastest supercomputers in the world. Lustre can scale up to hundreds of PB of capacity and deliver the maximum possible performance for HPC jobs, with systems delivering TB/s of throughput in a single namespace.

In order to serve the demand for storage, Google Cloud has taken two approaches. First, GCP partnered with DDN to bring their supported, enterprise-class DDN EXAScaler Lustre software to the GCP Marketplace. Second, our engineers at Google Cloud have developed and open-sourced a set of scripts to easily configure and deploy a Lustre storage cluster on Google Compute Engine using the Google Cloud Deployment Manager.

Lustre on Google Cloud Platform is equally capable of delivering the maximum performance of the infrastructure it's running on. Its performance on GCP is so good that it placed 8th on the IO-500 storage system benchmark in 2019 with our partner DDN, representing the highest-ranking cloud-based file system on the IO-500. Today we will walk you through deploying the open source Deployment Manager scripts for Lustre. If you are interested in an enterprise, hardened Lustre experience, with Lustre-expert support for your Lustre cluster, as well as features like a management and monitoring GUI or Lustre tunings, we recommend investigating the DDN EXAScaler Marketplace offering.

What you'll learn

  • How to use the GCP Deployment Manager Service.
  • How to configure and deploy a Lustre file system on GCP.
  • How to configure striping and test simple I/O to the Lustre file system.

Prerequisites

  • Google Cloud Platform Account and a Project with Billing
  • Basic Linux Experience

Self-paced environment setup

Create a Project

If you don't already have a Google Account (Gmail or G Suite), you must create one. Sign in to the Google Cloud Platform console (console.cloud.google.com) and open the Manage resources page:

359c06e07e6d699f.png

Click Create Project.

25c23d651abb837b.png

Enter a project name. Remember the project ID (highlighted in red in the screenshot above). The project ID must be unique across all Google Cloud projects. If your project name is not unique, Google Cloud will generate a random project ID based on the project name.

Next, you'll need to enable billing in the Developers Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could cost more if you decide to use more resources or if you leave them running (see the "Clean Up the Deployment" section at the end of this document). The Google Cloud Platform pricing calculator is available here.

New users of Google Cloud Platform are eligible for a $300 free trial.

Google Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab we will be using Google Cloud Shell, a command line environment running in the Cloud.

Launch Google Cloud Shell

From the GCP Console click the Cloud Shell icon on the top right toolbar:

dbad104cef962719.png

Then click Start Cloud Shell:

4e50db320508ac88.png

It should only take a few moments to provision and connect to the environment:

20b0aa80492144d.png

This virtual machine is loaded with all the development tools you'll need. It offers a persistent 5GB home directory, and runs on the Google Cloud, greatly enhancing network performance and simplifying authentication. Much, if not all, of your work in this lab can be done with simply a web browser or a Google Chromebook.

Once connected to the cloud shell, you should see that you are already authenticated and that the project is already set to your PROJECT_ID:

$ gcloud auth list

Command output:

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)

$ gcloud config list project

Command output:

[core]
project = <PROJECT_ID>

If the project ID is not set correctly you can set it with this command:

$ gcloud config set project <PROJECT_ID>

Command output:

Updated property [core/project].
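The check-then-set steps above can also be scripted. The following is a minimal sketch, not an official helper: the function name ensure_project is made up for illustration, and it relies only on the "gcloud config get-value" and "gcloud config set" commands.

```shell
# Sketch: make sure Cloud Shell targets the intended project.
# ensure_project is a hypothetical helper name; pass your own project ID.
ensure_project() (
  target="$1"
  # Read the project currently configured for gcloud.
  current="$(gcloud config get-value project 2>/dev/null)"
  if [ "$current" = "$target" ]; then
    echo "project already set to $target"
  else
    # Switch projects only when the configured one differs.
    gcloud config set project "$target" >/dev/null 2>&1 &&
      echo "project switched to $target"
  fi
)

# Usage (replace the placeholder with your project ID):
#   ensure_project <PROJECT_ID>
```

The function runs in a subshell, so its variables do not leak into your Cloud Shell session.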

Download the Lustre Deployment Manager Scripts

In the Cloud Shell session, execute the following command to clone (download) the Git repository that contains the Lustre for Google Cloud Platform deployment-manager files:

git clone https://github.com/GoogleCloudPlatform/deploymentmanager-samples.git

Switch to the Lustre deployment configuration directory by executing the following command:

cd deploymentmanager-samples/community/lustre/

Configure Lustre Deployment YAML

Deployment Manager uses a YAML file to provide deployment configuration. This YAML file details the configuration of the deployment, such as the Lustre version and the machine types to deploy. By default, the file is configured to deploy in a new project without any quota increases; however, you may change the machine type or capacity as desired for this codelab. This codelab is written to use these defaults, so if you do make any changes, you must carry those changes throughout this codelab to avoid errors. In production, we recommend an instance with at least 32 vCPUs for the MDS node, and an instance with at least 8 or 16 vCPUs for the OSS nodes, depending on storage capacity and type.
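If you would rather script a change than open an editor, one possible approach is a small sed helper. This is a sketch only: set_prop is a hypothetical name, it assumes the "key : value" layout used in the deployment YAML file, and it will misbehave if the new value contains "|" or "&". The demonstration runs against a temporary two-line sample rather than the real file.

```shell
# Sketch: change one "key : value" property in place.
# set_prop is a hypothetical helper; $1 = file, $2 = property, $3 = new value.
set_prop() {
  # Keep the indentation, key, and colon (group 1); replace only the value.
  # Commented-out properties (lines starting with #) are left untouched.
  sed -i.bak -E "s|^([[:space:]]*$2[[:space:]]*:[[:space:]]*).*|\1$3|" "$1"
}

# Demonstration on a temporary sample, not on the real configuration file.
sample="$(mktemp)"
printf '    oss_node_count          : 4\n    ost_disk_size_gb        : 5000\n' > "$sample"
set_prop "$sample" oss_node_count 8
cat "$sample"
rm -f "$sample" "$sample.bak"
```

The -i.bak flag leaves a backup copy next to the edited file, which is convenient if you want to return to the defaults this codelab assumes.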

To review or edit the YAML file in the Cloud Shell session, open the deployment configuration YAML file lustre.yaml. You can either use your preferred command line editor (vi, nano, emacs, etc.) or use the Cloud Console Code Editor to view the file contents:

11efd5af658f1842.png

The contents of the file will look like this:

# [START cluster_yaml]
imports:
- path: lustre.jinja

resources:
- name: lustre
  type: lustre.jinja
  properties:
    ## Cluster Configuration
    cluster_name            : lustre
    zone                    : us-central1-f
    cidr                    : 10.20.0.0/16
    external_ips            : True
    ### Use these fields to deploy Lustre in an existing VPC, Subnet, and/or Shared VPC
    #vpc_net                 : < VPC Network Name >
    #vpc_subnet              : < VPC Subnet Name >
    #shared_vpc_host_proj    : < Shared VPC Host Project name >

    ## Filesystem Configuration
    fs_name                 : lustre
    ### Review https://downloads.whamcloud.com/public/ to determine version naming
    lustre_version          : latest-release
    e2fs_version            : latest

    ## Lustre MDS/MGS Node Configuration
    #mds_node_count          : 1
    mds_ip_address          : 10.20.0.2
    mds_machine_type        : n1-standard-8
    ### MDS/MGS Boot disk
    mds_boot_disk_type      : pd-standard
    mds_boot_disk_size_gb   : 10
    ### Lustre MetaData Target disk
    mdt_disk_type           : pd-ssd
    mdt_disk_size_gb        : 1000

    ## Lustre OSS Configuration
    oss_node_count          : 4
    oss_ip_range_start      : 10.20.0.5
    oss_machine_type        : n1-standard-4
    ### OSS Boot disk
    oss_boot_disk_type      : pd-standard
    oss_boot_disk_size_gb   : 10
    ### Lustre Object Storage Target disk
    ost_disk_type           : pd-standard
    ost_disk_size_gb        : 5000
#  [END cluster_yaml]

Within this YAML file there are several fields. Fields marked below with an asterisk (*) are required. These fields include:

Cluster Configuration

  • cluster_name* - Name of the Lustre cluster, prepends all deployed resources
  • zone* - Zone to deploy the cluster into
  • cidr* - IP range in CIDR format
  • external_ips* - True/False, whether Lustre nodes have external IP addresses. If false, a Cloud NAT is set up as a NAT gateway
  • vpc_net - Define this field, and the vpc_subnet field, to deploy the Lustre cluster to an existing VPC
  • vpc_subnet - Existing VPC subnet to deploy Lustre cluster to
  • shared_vpc_host_proj - Define this field, as well as the vpc_net and vpc_subnet fields, to deploy the cluster to a Shared VPC

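File system Configuration

  • fs_name - Name of the Lustre file system
  • lustre_version - Lustre version to deploy, such as latest-release (review https://downloads.whamcloud.com/public/ to determine version naming)
  • e2fs_version - e2fsprogs version to deploy, such as latest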

MDS/MGS Configuration

  • mds_ip_address - Internal IP Address to specify for MDS/MGS node
  • mds_machine_type - Machine type to use for MDS/MGS node (see https://cloud.google.com/compute/docs/machine-types)
  • mds_boot_disk_type - Disk type to use for the MDS/MGS boot disk (pd-standard, pd-ssd)
  • mds_boot_disk_size_gb - Size of MDS boot disk in GB
  • mdt_disk_type* - Disk type to use for the Metadata Target (MDT) disk (pd-standard, pd-ssd, local-ssd)
  • mdt_disk_size_gb* - Size of MDT disk in GB

OSS Configuration

  • oss_node_count* - Number of Object Storage Server (OSS) nodes to create
  • oss_ip_range_start - Start of the IP range for the OSS node(s). If not specified, use automatic IP assignment
  • oss_machine_type - Machine type to use for OSS node(s)
  • oss_boot_disk_type - Disk type to use for the OSS boot disk (pd-standard, pd-ssd)
  • oss_boot_disk_size_gb - Size of OSS boot disk in GB
  • ost_disk_type* - Disk type to use for the Object Storage Target (OST) disk (pd-standard, pd-ssd, local-ssd)
  • ost_disk_size_gb* - Size of OST disk in GB
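To sanity-check a configuration before deploying, you can read fields back out of the YAML file and estimate the raw object storage capacity. The sketch below assumes one OST disk per OSS node, which matches the four OSTs reported for four OSS nodes later in this codelab but is not guaranteed for other configurations. get_prop is a hypothetical helper name, and the sample file holds the default values shown above.

```shell
# Sketch: read a property from the "key : value" layout of the YAML file.
# get_prop is a hypothetical helper; $1 = file, $2 = property name.
get_prop() {
  awk -v k="$2" '$1 == k && $2 == ":" { print $3 }' "$1"
}

# Demonstration with the default values from the configuration file.
sample="$(mktemp)"
printf '    oss_node_count          : 4\n    ost_disk_size_gb        : 5000\n' > "$sample"

nodes="$(get_prop "$sample" oss_node_count)"
size_gb="$(get_prop "$sample" ost_disk_size_gb)"

# Assumption: one OST disk per OSS node.
echo "raw OST capacity: $(( nodes * size_gb )) GB"
rm -f "$sample"
```

With the defaults, this prints a raw capacity of 20000 GB. Usable capacity will be somewhat lower once the file system is formatted.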

Deploy the Configuration

In the Cloud Shell session, execute the following command from the deploymentmanager-samples/community/lustre/ directory:

gcloud deployment-manager deployments create lustre --config lustre.yaml

This command creates a deployment named lustre. The operation can take 10-20 minutes to complete, so please be patient.

Once the deployment has completed you will see output similar to:

Create operation operation-1572410719018-5961966591cad-e25384f6-d4c905f8 completed successfully.
NAME                                TYPE                   STATE      ERRORS  INTENT
lustre-all-internal-firewall-rule  compute.v1.firewall    COMPLETED  []
lustre-lustre-network              compute.v1.network     COMPLETED  []
lustre-lustre-subnet               compute.v1.subnetwork  COMPLETED  []
lustre-mds1                        compute.v1.instance    COMPLETED  []
lustre-oss1                        compute.v1.instance    COMPLETED  []
lustre-oss2                        compute.v1.instance    COMPLETED  []
lustre-oss3                        compute.v1.instance    COMPLETED  []
lustre-oss4                        compute.v1.instance    COMPLETED  []
lustre-ssh-firewall-rule           compute.v1.firewall    COMPLETED  []
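A table like the one above can be checked mechanically. The following sketch filters a resource table for rows whose STATE column is not COMPLETED. pending_resources is a hypothetical helper name, and the sample rows (including the PENDING one) are invented for the demonstration. It assumes STATE is the third column, as in the output above.

```shell
# Sketch: print the name of every resource whose STATE is not COMPLETED.
# pending_resources is a hypothetical helper; it reads the table on stdin.
pending_resources() {
  awk '$1 != "NAME" && NF >= 3 && $3 != "COMPLETED" { print $1 }'
}

# Demonstration on invented sample rows.
pending_resources <<'EOF'
NAME                   TYPE                   STATE      ERRORS  INTENT
lustre-lustre-network  compute.v1.network     COMPLETED  []
lustre-mds1            compute.v1.instance    COMPLETED  []
lustre-oss2            compute.v1.instance    PENDING    []
EOF

# Against a live deployment, the same filter could be fed from:
#   gcloud deployment-manager resources list --deployment lustre | pending_resources
```

An empty result means every listed resource reached the COMPLETED state.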

Verify the Deployment

Follow these steps to view the deployment in Google Cloud Platform Console:

  • In the Cloud Platform Console, open the Products & Services menu in the top left corner of the console (three horizontal lines).
  • Click Deployment Manager.
  • Click lustre to view the details of the deployment.
  • Click Overview - lustre. The Deployment properties pane displays the overall deployment configuration.
  • Click "View" on the Config property. The Config pane displays the contents of the deployment configuration YAML file modified earlier. Verify the contents are correct before proceeding. If you need to change a deployment configuration, simply delete the deployment according to the steps in "Clean Up the Deployment", and restart the deployment according to the steps in "Configure Lustre Deployment YAML".
  • (Optional) Under the Lustre-cluster section, click each of the resources created by the lustre.jinja template and review the details.

With the deployment's configuration verified, let's confirm the cluster's instances are started. In the Cloud Platform Console, in the Products & Services menu, click Compute Engine > VM Instances.

aec8498e04a3c334.png

On the VM Instances page, review the five virtual machine instances that have been created by the deployment manager: lustre-mds1, lustre-oss1, lustre-oss2, lustre-oss3, and lustre-oss4.
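The same check can be made from Cloud Shell. The sketch below generates the instance names you would expect for a given cluster name and OSS count, following the naming visible in this codelab (one "-mds1" instance plus numbered "-oss" instances). expected_instances is a hypothetical helper name.

```shell
# Sketch: list the instance names expected for a cluster.
# expected_instances is a hypothetical helper; $1 = cluster_name, $2 = oss_node_count.
expected_instances() (
  cluster="$1"
  oss_count="$2"
  echo "${cluster}-mds1"
  i=1
  while [ "$i" -le "$oss_count" ]; do
    echo "${cluster}-oss${i}"
    i=$((i + 1))
  done
)

expected_instances lustre 4

# To compare against what is actually running, you could list the real names with:
#   gcloud compute instances list --filter="name~^lustre-" --format="value(name)" | sort
```

If you changed cluster_name or oss_node_count in the YAML file, pass those values instead of the defaults.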

Monitor the Installation

On the VM Instances page, click lustre-mds1 to open the Instance details page.

ba0bea7acdbb9527.png

Click on Serial port 1 (console) to open the serial console output page. We will use this serial output to monitor the installation process of the MDS instance, and wait until the startup script has completed. Click the "refresh" button at the top of the page to update the serial output. The node will reboot once to boot into the Lustre kernel, and then display messages similar to the following:

Startup finished in 838ms (kernel) + 6.964s (initrd) + 49.302s (userspace) = 57.105s.
Lustre: lustre-MDT0000: Connection restored to 374e2d80-0b31-0cd7-b2bf-de35b8119534 (at 0@lo)

This means Lustre is installed on the Lustre cluster, and the file system is ready to be utilized!
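If you prefer to watch for this message from Cloud Shell rather than refreshing the console page, a sketch like the following could help. It assumes the "Connection restored" line shown above is a reliable readiness marker, which may not hold for every Lustre version. mds_ready is a hypothetical helper name and ZONE is a placeholder variable you would set yourself.

```shell
# Sketch: succeed if the serial log on stdin contains the readiness message.
# mds_ready is a hypothetical helper; the pattern is based on the sample output above.
mds_ready() {
  grep -Eq 'Lustre: .*-MDT[0-9a-fA-F]{4}: Connection restored'
}

# Possible polling loop against the live instance (ZONE is a placeholder):
#   until gcloud compute instances get-serial-port-output lustre-mds1 \
#           --zone="$ZONE" 2>/dev/null | mds_ready; do
#     sleep 30
#   done
#   echo "MDS reports Lustre is up"
```

The loop re-reads the serial output every 30 seconds, so it adds little load while the startup script runs.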

Access the Lustre Cluster

On the VM Instances page in the Google Cloud Console, click the SSH button next to the lustre-mds1 instance. Alternatively, execute the following command in Cloud Shell, replacing <ZONE> with the lustre-mds1 node's zone:

gcloud compute ssh lustre-mds1 --zone=<ZONE>

This command logs into the lustre-mds1 virtual machine. This is the Lustre Metadata Server (MDS) instance, which also acts as the Lustre Management Server (MGS) instance. This instance handles all authentication and metadata requests for the file system.

Let's mount the file system on our lustre-mds1 instance in order to be able to test it later. Execute the following commands:

sudo mkdir /mnt/lustre
sudo mount -t lustre lustre-mds1:/lustre /mnt/lustre
cd /mnt/lustre

These three commands do three things. The first creates a local directory, "/mnt/lustre", to use as a mount point. The second runs "mount" to mount a file system of type "lustre": the source "lustre-mds1:/lustre" names the lustre-mds1 server and the file system name "lustre", and the target is your local "/mnt/lustre" directory. The third changes directory to /mnt/lustre, where Lustre is now mounted.
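The mount source used above packs two pieces of information into one string. As an illustration, the following sketch splits it apart with plain shell parameter expansion. parse_mount_source is a hypothetical helper name, and it assumes a single server in the "host:/fsname" form used in this codelab.

```shell
# Sketch: split a Lustre mount source of the form "host:/fsname".
# parse_mount_source is a hypothetical helper; $1 = mount source.
parse_mount_source() (
  src="$1"
  # Everything before ":/" is the MGS host; everything after is the fs name.
  echo "mgs_host=${src%%:/*} fs_name=${src##*:/}"
)

parse_mount_source lustre-mds1:/lustre
# prints: mgs_host=lustre-mds1 fs_name=lustre

# Once mounted, you could confirm capacity per target with:
#   lfs df -h /mnt/lustre
```

If you changed fs_name or cluster_name in the YAML file, the mount source changes accordingly.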

You've now mounted the Lustre file system at /mnt/lustre. Let's take a look at what we can do with this file system.

If you are not familiar with Lustre and its tools, we will walk through a few important commands here.

Lustre's low-level cluster management tool is "lctl". We can use lctl to configure and manage the Lustre cluster, and to view the Lustre cluster's services. To view the services and instances in our new Lustre cluster, execute:

sudo lctl dl

You will see output similar to below, depending on what changes you made to the Lustre YAML configuration file:

  0 UP osd-ldiskfs lustre-MDT0000-osd lustre-MDT0000-osd_UUID 11
  1 UP mgs MGS MGS 12
  2 UP mgc MGC10.128.15.2@tcp 374e2d80-0b31-0cd7-b2bf-de35b8119534 4
  3 UP mds MDS MDS_uuid 2
  4 UP lod lustre-MDT0000-mdtlov lustre-MDT0000-mdtlov_UUID 3
  5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 12
  6 UP mdd lustre-MDD0000 lustre-MDD0000_UUID 3
  7 UP qmt lustre-QMT0000 lustre-QMT0000_UUID 3
  8 UP lwp lustre-MDT0000-lwp-MDT0000 lustre-MDT0000-lwp-MDT0000_UUID 4
  9 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 10 UP osp lustre-OST0002-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 11 UP osp lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 12 UP osp lustre-OST0003-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4

We can see our Lustre Management Server (MGS) as item 1, our Lustre Metadata Server (MDS) as item 3, our Lustre Metadata Target (MDT) as item 5, and our four Lustre Object Storage Servers (OSS), represented by the MDS's connections to their Object Storage Targets, as items 9 through 12. To understand what the other services are, please review the Lustre Manual.
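On larger clusters it is easier to count devices than to read the list. This sketch counts rows of "lctl dl" output by device type, assuming the type is the third column as in the output above. count_devices is a hypothetical helper name, and the demonstration reuses a few rows from that sample output.

```shell
# Sketch: count devices of one type in `lctl dl` output read from stdin.
# count_devices is a hypothetical helper; $1 = device type (third column).
count_devices() {
  awk -v t="$1" '$3 == t { n++ } END { print n + 0 }'
}

# Demonstration on rows copied from the sample output above.
count_devices osp <<'EOF'
  5 UP mdt lustre-MDT0000 lustre-MDT0000_UUID 12
  9 UP osp lustre-OST0000-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 10 UP osp lustre-OST0002-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 11 UP osp lustre-OST0001-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
 12 UP osp lustre-OST0003-osc-MDT0000 lustre-MDT0000-mdtlov_UUID 4
EOF

# On the MDS node, the live equivalent would be:
#   sudo lctl dl | count_devices osp
```

With the default configuration, the count of osp devices should match oss_node_count.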

Lustre's file system configuration tool is "lfs". We can use lfs to manage striping of files across our Lustre Object Storage Servers (OSS) and their respective Object Storage Targets (OST), as well as running common file system operations like find, df, and quota management.

Striping allows us to configure how a file is distributed across our Lustre cluster to deliver the best performance possible. While striping a large file across as many OSSs as possible often delivers the best performance by parallelizing the IO, striping a small file may lead to worse performance than if that file were only written to a single instance.

To test this, let's set up two directories, one with a stripe count of one OSS, and one with a stripe count of "-1", indicating that the files written in that directory should be striped across as many OSSs as possible. Directories can hold striping configurations that are inherited by files created within them, but sub-directories and individual files within that directory can then be configured to be striped differently if desired. To make these two directories, execute the following commands while in the "/mnt/lustre" directory:

sudo mkdir stripe_one
sudo mkdir stripe_all
sudo lfs setstripe -c 1 stripe_one/
sudo lfs setstripe -c -1 stripe_all/

You can view the stripe settings of a file or directory using lfs getstripe:

sudo lfs getstripe stripe_all/

You will see output showing the stripe count set as -1:

stripe_all/
stripe_count:  -1 stripe_size:   1048576 pattern:    raid0 stripe_offset: -1
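To make the raid0 pattern and stripe_size concrete, the sketch below computes which stripe a given byte offset of a file falls on, using simple round-robin arithmetic. stripe_index is a hypothetical helper name. The result is a position within the file's own list of OSTs, not an OST number, because a stripe_offset of -1 lets Lustre choose the starting OST.

```shell
# Sketch: round-robin (raid0) stripe position for a byte offset.
# stripe_index is a hypothetical helper.
# $1 = byte offset, $2 = stripe_size in bytes, $3 = stripe_count.
stripe_index() {
  echo $(( ($1 / $2) % $3 ))
}

# With the 1048576-byte stripe_size above and four OSTs:
stripe_index 0 1048576 4         # first 1 MiB chunk lands on stripe 0
stripe_index 5242880 1048576 4   # sixth chunk wraps around to stripe 1
```

This is why a large sequential write keeps all four OSTs busy, while a file smaller than one stripe_size only ever touches a single OST.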

Now we're ready to test the performance improvements achievable by writing a large file striped across multiple OSSs.

We will run two simple tests of the Lustre IO to demonstrate the possible performance advantages and scaling capabilities of the Lustre file system. First, we will run a simple test using the "dd" utility to write a 5GB file to our "stripe_one" directory. Execute the following command:

sudo dd if=/dev/zero of=stripe_one/test bs=1M count=5000

The process to write 5GB of data to the file system averages around 27 seconds, writing to a single Persistent Disk (PD) on a single Object Storage Server (OSS).

To test striping across multiple OSSs, and therefore multiple PDs, we simply need to change the output directory we write to. Execute the following command:

sudo dd if=/dev/zero of=stripe_all/test bs=1M count=5000

Notice we changed "of=stripe_one/test" to "of=stripe_all/test". This allows our single-stream write to distribute its writes across all of our Object Storage Servers and complete in 5.5 seconds on average, nearly 5x as quickly with four OSSs.
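The two timings quoted in this section can be turned into throughput figures with a little arithmetic. The sketch below uses the 5000 MiB written by each dd command (bs=1M count=5000) and the average times of 27 and 5.5 seconds; your own measurements will differ. throughput is a hypothetical helper name.

```shell
# Sketch: average throughput in MiB/s.
# throughput is a hypothetical helper; $1 = MiB written, $2 = elapsed seconds.
throughput() {
  awk -v mib="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mib / s }'
}

throughput 5000 27     # single stripe: prints 185.2
throughput 5000 5.5    # striped across all OSTs: prints 909.1

# Speedup of the striped write over the single-stripe write: prints 4.9
awk 'BEGIN { printf "%.1f\n", 27 / 5.5 }'
```

Substitute the elapsed times reported by your own dd runs to compare against these averages.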

This performance continues to increase as you add Object Storage Servers, and you can add OSSs while the file system is online and begin striping data to them to increase capacity and performance. The possibilities are endless using Lustre on Google Cloud Platform, and we're excited to see what you can build, and what problems you can solve.

Congratulations, you've created a Lustre cluster on Google Cloud Platform! You can use these scripts as a starting point to build your own Lustre cluster, and to integrate it with your cloud-based computing cluster.

Clean Up the Deployment

Log out of the Lustre node:

exit

You can easily clean up the deployment when you're done by executing the following command from your Google Cloud Shell, after logging out of the Lustre cluster:

gcloud deployment-manager deployments delete lustre

When prompted, type Y to continue. This operation can take some time; please be patient.
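After the delete finishes, it can be worth confirming that nothing from the cluster is still running and billing. The sketch below filters a list of resource names for a given cluster prefix. leftovers is a hypothetical helper name, and the sample names in the demonstration are invented.

```shell
# Sketch: print names from stdin that still carry the cluster prefix.
# leftovers is a hypothetical helper; $1 = cluster_name used in the YAML file.
leftovers() {
  grep -E "^$1-" || true
}

# Demonstration on invented names.
printf 'lustre-oss1\nunrelated-vm\n' | leftovers lustre

# Against the live project, you could feed it instance and disk names:
#   gcloud compute instances list --format="value(name)" | leftovers lustre
#   gcloud compute disks list --format="value(name)" | leftovers lustre
```

No output from the live commands suggests the deployment's instances and disks are gone.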

Delete the Project

To clean up, we simply delete our project.

  • In the navigation menu, select IAM & Admin
  • Then click Settings in the submenu
  • Click the trashcan icon with the text "Delete Project"
  • Follow the prompts

What we've covered

  • How to use the GCP Deployment Manager Service.
  • How to configure and deploy a Lustre file system on GCP.
  • How to configure striping and test simple I/O to the Lustre file system.

Find Support

Are you building something cool using the Lustre deployment manager scripts? Have questions? Chat with us in the Google Cloud Lustre discussion group. To request features, provide feedback, or report bugs please use this form, or feel free to modify the code and submit a pull request! Want to speak to a Google Cloud expert? Reach out to the Google Cloud team today through Google Cloud's High Performance Computing website.

Feedback

Please submit feedback about this codelab using this link. Feedback takes less than 5 minutes to complete. Thank you!