Kubeflow is a Machine Learning toolkit for Kubernetes. The project is dedicated to making deployments of Machine Learning (ML) workflows on Kubernetes simple, portable, and scalable. The goal is to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures.

What does a Kubeflow deployment look like?

A Kubeflow deployment is a means of organizing loosely coupled microservices as a single unit and deploying them to a variety of locations, whether that's a laptop or the cloud.

This codelab will walk you through creating your own Kubeflow deployment and running a Kubeflow Pipelines workflow for model training and serving, both from the Pipelines UI and from a Jupyter notebook.

What you'll build

In this codelab, you will build a web app that summarizes GitHub issues using Kubeflow Pipelines to train and serve a model. It is based on the walkthrough provided in the Kubeflow Examples repo. Upon completion, your infrastructure will contain a GKE cluster with a Kubeflow installation, a pipeline that trains and exports a Tensor2Tensor model, a TF-Serving instance hosting the trained model, and a web app that queries the served model.

What you'll learn

The pipeline you will build trains a Tensor2Tensor model on GitHub issue data, learning to predict issue titles from issue bodies. It then exports the trained model and deploys it using TensorFlow Serving. The final step in the pipeline launches a web app, which interacts with the TF-Serving instance to get model predictions.

What you'll need

This is an advanced codelab focused on Kubeflow. For more background and an introduction to the platform, see the Introduction to Kubeflow documentation. Non-relevant concepts and code blocks are glossed over and provided for you to simply copy and paste.

Cloud Shell

Visit the GCP Console in the browser and log in with your project credentials:

Open the GCP Console

Then click the "Activate Cloud Shell" icon in the top right of the console to start up a Cloud Shell.

Set your GitHub token

This codelab calls the GitHub API to retrieve publicly available data. To prevent rate-limiting, especially at events where a large number of anonymous requests are sent to the GitHub APIs, set up an access token with no permissions. This simply authorizes you as an individual rather than an anonymous user.

  1. Navigate to https://github.com/settings/tokens and generate a new token with no scopes.
  2. Save it somewhere safe. If you lose it, you will need to delete it and create a new one.
  3. Set the GITHUB_TOKEN environment variable:
export GITHUB_TOKEN=<token>
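
For illustration only: an authenticated GitHub API call simply sends the token in an Authorization header, which raises your rate limit from 60 to 5,000 requests per hour. A minimal sketch, assuming the requests package is installed:

import os
import requests

resp = requests.get(
    'https://api.github.com/rate_limit',
    headers={'Authorization': 'token ' + os.environ['GITHUB_TOKEN']})
# The 'core' entry shows your authenticated hourly limit.
print(resp.json()['resources']['core'])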

Install pyyaml

Ensure that pyyaml is installed by running:

pip install -U --user pyyaml

Install ksonnet

Set the correct version

In Cloud Shell, set these environment variables:

export KS_VER=0.13.1
export KS_BIN=ks_${KS_VER}_linux_amd64

Install ksonnet

Download and unpack the appropriate binary, then add it to your $PATH:

wget -O /tmp/$KS_BIN.tar.gz https://github.com/ksonnet/ksonnet/releases/download/v${KS_VER}/${KS_BIN}.tar.gz

mkdir -p ${HOME}/bin
tar -xvf /tmp/${KS_BIN}.tar.gz -C ${HOME}/bin

export PATH=$PATH:${HOME}/bin/${KS_BIN}

To familiarize yourself with ksonnet concepts, see this diagram.

Install kfctl

Download and unpack kfctl, the Kubeflow command-line tool, then add it to your $PATH:

export KUBEFLOW_TAG=0.5.1
wget -P /tmp https://github.com/kubeflow/kubeflow/releases/download/v${KUBEFLOW_TAG}/kfctl_v${KUBEFLOW_TAG}_linux.tar.gz
tar -xvf /tmp/kfctl_v${KUBEFLOW_TAG}_linux.tar.gz -C ${HOME}/bin
export PATH=$PATH:${HOME}/bin

kfctl allows you to install Kubeflow on an existing cluster or create one from scratch.

Set your GCP project ID and cluster name

To find your project ID, visit the GCP Console's Home panel. If the screen is empty, click on Yes at the prompt to create a dashboard.

In the Cloud Shell terminal, run these commands to set the cluster name and project ID. We'll indicate which zone to use at the workshop.

export DEPLOYMENT_NAME=kubeflow-codelab
export PROJECT_ID=<your_project_id>
export ZONE=<your-zone>
gcloud config set project ${PROJECT_ID}
gcloud config set compute/zone ${ZONE}

Create a storage bucket

Create a Cloud Storage bucket for storing pipeline files. The bucket name must be globally unique; the command below combines "kubeflow" with your project ID. Issue the "mb" (make bucket) command:

export BUCKET_NAME=kubeflow-${PROJECT_ID}
gsutil mb gs://${BUCKET_NAME}

Alternatively, you can create a bucket via the GCP Console.

Install the Kubeflow Pipelines SDK

Run the following command to install the Kubeflow Pipelines SDK:

pip3 install -U kfp
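
The SDK includes a DSL for defining pipelines, a compiler, and a client for talking to the Pipelines service. As a quick orientation, here is a minimal, self-contained sketch (nothing you need to run now; the public bash image is used purely as an example):

import kfp.dsl as dsl
import kfp.compiler as compiler

@dsl.pipeline(name='hello-pipeline', description='A one-step example pipeline.')
def hello_pipeline(message='hello'):
    # Each pipeline step is a container invocation.
    dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['echo'],
        arguments=[message])

# Produces hello_pipeline.tar.gz, ready to upload to the Pipelines UI.
compiler.Compiler().compile(hello_pipeline, 'hello_pipeline.tar.gz')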

Pin useful dashboards

In the GCP console, pin the Kubernetes Engine and Storage dashboards for easier access.

Create a cluster

Create a managed Kubernetes cluster on Kubernetes Engine by visiting the Kubeflow Click-to-Deploy site in your browser and signing in with your GCP account:

Open Kubeflow Click-to-Deploy

Fill in the following values in the resulting form: the project ID, the deployment name (use the value of ${DEPLOYMENT_NAME}), and the zone you set earlier.

Generate the cluster by clicking Create Deployment. This creates a deployment object with everything necessary for installing Kubeflow, including GKE resource requirements and service accounts.

Set up kubectl to use your new cluster's credentials

When the cluster has been instantiated, connect your environment to the Kubernetes Engine cluster by running the following command in your Cloud Shell:

gcloud container clusters get-credentials ${DEPLOYMENT_NAME} \
  --project ${PROJECT_ID} \
  --zone ${ZONE}

This configures your kubectl context so that you can interact with your cluster. To verify the connection, run the following command:

kubectl get nodes -o wide

You should see two nodes listed, both with a status of "Ready", and other information about node age, version, external IP address, OS image, kernel version, and container runtime.

Set up your local context to default to the kubeflow namespace:

kubectl config set-context $(kubectl config current-context) --namespace=kubeflow

Set up node auto-provisioning (NAP)

In Cloud Shell, run the following command to enable node auto-provisioning (NAP) in your GKE cluster. This enables the automatic creation of an accelerator node pool, so that when you create a workload that requires GPUs, the appropriate resources are automatically added to your cluster:

gcloud beta container clusters update ${DEPLOYMENT_NAME} \
  --project ${PROJECT_ID} \
  --zone ${ZONE} \
  --enable-autoprovisioning \
  --max-cpu 48 \
  --max-memory 224 \
  --max-accelerator type=nvidia-tesla-k80,count=4 \
  --verbosity error

View the Kubeflow central dashboard

Once the cluster setup is complete, port-forward to view the Kubeflow central dashboard. In Cloud Shell, open a new tab by clicking on the plus sign.

In the new tab, run the following command to port-forward to the ambassador service, a reverse HTTP proxy that provides an entry point into the cluster:

kubectl port-forward svc/ambassador 8080:80

In Cloud Shell, click the Web Preview icon and select Preview on port 8080.

This will launch the Kubeflow central dashboard in a new browser tab.

Create a Kubernetes cluster

To create a managed Kubernetes cluster on Kubernetes Engine using kfctl, we will walk through the following steps: initializing an application directory, generating the deployment configuration, and applying it to create the cluster and service accounts.

To create an application directory with local config files and enable APIs for your project, run these commands:

cd ${HOME}
export KUBEFLOW_USERNAME=codelab-user
export KUBEFLOW_PASSWORD=password
kfctl init ${DEPLOYMENT_NAME} --platform gcp --project ${PROJECT_ID} --use_basic_auth -V

This creates the file kubeflow-codelab/app.yaml, which defines a full, default Kubeflow installation.

To generate the files used to create the deployment, including a cluster and service accounts, run these commands:

cd ${DEPLOYMENT_NAME}
kfctl generate platform -V --zone ${ZONE}

This generates several new directories, each with customized files. To use the generated files to create all the objects in your project, run this command:

kfctl apply platform -V

When kfctl has exited with the message, "KUBECONFIG context kubeflow-codelab is created," verify the connection with this command:

kubectl cluster-info

Verify that the Kubernetes master IP address shown here is the same as the Endpoint IP address for your cluster in the Google Cloud Platform Console.

Install Kubeflow

To add a default Kubeflow installation to the cluster you just built, first generate manifest files:

cd ${HOME}/${DEPLOYMENT_NAME}
kfctl generate k8s -V --zone ${ZONE}

Apply the generated manifests to the cluster:

kfctl apply k8s -V

When kfctl has exited with the message, "All components apply succeeded," continue below.

Add Seldon to the default installation

To add Seldon, ksonnet can help. ksonnet is a templating framework that lets you take common object definitions and customize them to your environment. You begin by referencing Kubeflow templates and applying environment-specific parameters. Once manifests have been generated specifically for your cluster, they can be applied like any other Kubernetes objects using `kubectl`.

Run the following commands to install Seldon using ksonnet:

cd ${HOME}/${DEPLOYMENT_NAME}/ks_app
ks generate seldon seldon
ks apply default -c seldon

Congratulations! Your cluster now contains a Kubeflow installation with Seldon. You can view the components by running:

kubectl get pods


Set up node auto-provisioning (NAP)

In Cloud Shell, run the following command to enable node auto-provisioning (NAP) in your GKE cluster. This enables the automatic creation of an accelerator node pool, so that when you create a workload that requires GPUs, the appropriate resources are automatically added to your cluster:

gcloud beta container clusters update ${DEPLOYMENT_NAME} \
  --project ${PROJECT_ID} \
  --zone ${ZONE} \
  --enable-autoprovisioning \
  --max-cpu 48 \
  --max-memory 224 \
  --max-accelerator type=nvidia-tesla-k80,count=4 \
  --verbosity error

View the Kubeflow central dashboard

To view the UI, open a new tab in Cloud Shell and run this command to open a port to the ambassador service:

kubectl port-forward svc/ambassador 8080:80

In Cloud Shell, click on the Web Preview button and select "Preview on port 8080."

This will open a new browser tab showing a login page, where you can enter the username and password you provided when you created the cluster ("codelab-user", "password"). This brings you to the Kubeflow central dashboard.

Pipelines dashboard

From the Kubeflow central dashboard, click the Pipeline Dashboard link to navigate to the Kubeflow Pipelines web UI.

Pipeline description

The pipeline you will run has three steps (a simplified sketch follows the list):

  1. It starts by training a Tensor2Tensor model using preprocessed data. (More accurately, this step starts from an existing model checkpoint and trains for only a few hundred more steps; fully training the model would take too long.) When it finishes, it exports the model in a form suitable for serving by TensorFlow Serving.
  2. The next step in the pipeline deploys a TensorFlow Serving instance using that model.
  3. The last step launches a web app for interacting with the served model to retrieve predictions.
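
Conceptually, the definition (in gh_summ.py, which you will download next) chains these steps together with the Kubeflow Pipelines DSL. Here is a simplified, hypothetical sketch; the image names and arguments are placeholders, not the real ones:

import kfp.dsl as dsl

@dsl.pipeline(name='gh_summ', description='Train, serve, and deploy a web app.')
def gh_summ(github_token='', working_dir=''):
    # Step 1: resume training from a checkpoint, then export the model.
    train = dsl.ContainerOp(
        name='train',
        image='<training-image>',  # placeholder, not the real image
        arguments=['--github-token', github_token,
                   '--working-dir', working_dir])
    # Step 2: stand up a TF-Serving instance for the exported model.
    serve = dsl.ContainerOp(
        name='serve',
        image='<serving-deployer-image>')  # placeholder
    serve.after(train)
    # Step 3: launch the web app that queries the TF-Serving instance.
    webapp = dsl.ContainerOp(
        name='webapp',
        image='<webapp-deployer-image>')  # placeholder
    webapp.after(serve)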

Download and compile the pipeline

To download the script containing the pipeline definition, execute this command from Cloud Shell:

cd ${HOME}
curl -O https://raw.githubusercontent.com/kubeflow/examples/master/github_issue_summarization/pipelines/example_pipelines/gh_summ.py

Compile the pipeline definition file by running it:

python3 gh_summ.py

You will see the file gh_summ.py.tar.gz appear as a result.
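
Why does running the script produce an archive? Pipeline definition files in the KFP examples typically end with a compile call along these lines (the actual file may differ slightly):

import kfp.compiler as compiler

if __name__ == '__main__':
    # 'gh_summ' is the @dsl.pipeline-decorated function defined earlier
    # in the file; compiling it yields an uploadable package.
    compiler.Compiler().compile(gh_summ, __file__ + '.tar.gz')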

Upload the compiled pipeline

In the Kubeflow Pipelines web UI, click on Upload pipeline and select Import by URL. Copy and paste in the following URL, which points to the same pipeline that you just compiled. (It takes a few extra steps to upload a file from Cloud Shell, so we're taking a shortcut.)

https://github.com/kubeflow/examples/raw/master/github_issue_summarization/pipelines/example_pipelines/gh_summ.py.tar.gz

Give the pipeline a name (e.g. gh_summ).

Run the pipeline

Click on the uploaded pipeline in the list (this lets you view the pipeline's static graph), then click Create an experiment to create a new Experiment using the pipeline.

Give the Experiment a name (e.g. the same name as the pipeline, gh_summ), then click Next to create it.

An Experiment is composed of multiple Runs. In Cloud Shell, execute these commands to gather the values to enter into the UI as parameters for the first Run:

gcloud config get-value project
echo ${GITHUB_TOKEN}
echo "gs://${BUCKET_NAME}/codelab"

Give the Run a name (e.g. gh_summ-1) and fill in the three parameter fields with the values you just gathered: the project ID, the GitHub token, and the working directory (the gs:// path).

After filling in the fields, click Start. View the run, then click on the first step, train, to watch its progress. Notice that the step does not execute immediately.

This is because there are no GPU nodes available in the cluster yet. Not to worry: GKE node auto-provisioning will spin up a new node and add it to the cluster. In the GCP Console, you can watch the cluster size increase from two to three nodes.

Once the node is available, the pipeline begins executing the first step.

Once the image has been pulled and the container is running, you can click on an individual step to get more information about it, including viewing its pod logs.

View the pipeline definition

While the pipeline is running, take a closer look at how it is put together and what it is doing.

View TensorBoard

The first step in the pipeline performs training and generates a model. Once this step is complete, view its Artifacts and click the blue Start TensorBoard button; once it's ready, click Open TensorBoard.

View the web app and make some predictions

The last step in the pipeline deploys a web app, which provides a UI for querying the trained model (served via TF Serving) to make predictions. After the pipeline completes, connect to the web app by visiting the Kubeflow central dashboard page and appending /webapp/ to the end of the URL. (The trailing slash is required, e.g. https://8080-dot-7377735-dot-devshell.appspot.com/webapp/.)
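
For reference, TF Serving exposes a standard prediction API; a hypothetical request against its REST endpoint (port 8501 by default) might look like the sketch below. The service host, model name, and input signature here are illustrative only; the web app handles all of this for you and may use gRPC instead:

import requests

# Hypothetical in-cluster service name and model name.
resp = requests.post(
    'http://issue-summarization:8501/v1/models/issue-summarization:predict',
    json={'instances': [{'input': 'text of a github issue body ...'}]})
print(resp.json())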


Click the Populate Random Issue button to retrieve a block of text. Click Generate Title to call the trained model and display a prediction.

If you have trouble setting up a GPU node pool or running the training pipeline

If you have any trouble running the training pipeline, or if you had any issues setting up a GPU node pool, try this shorter pipeline. It uses an already-exported TensorFlow model, skips the training step, and takes only a minute or so to run. Download the Python pipeline definition here:

https://raw.githubusercontent.com/kubeflow/examples/master/github_issue_summarization/pipelines/example_pipelines/gh_summ_serve.py

or the compiled version of the pipeline here:

https://github.com/kubeflow/examples/blob/master/github_issue_summarization/pipelines/example_pipelines/gh_summ_serve.py.tar.gz?raw=true

Create a JupyterHub instance

You can also interactively define and run Kubeflow Pipelines from a Jupyter notebook. To create a notebook, navigate to the Notebooks link on the central Kubeflow dashboard.

The first time you visit JupyterHub, you will need to create a new notebook server by clicking on the New Server button.


Give your server a name and leave all settings at their defaults. Then click the Spawn button, which generates a new pod in your cluster.

When the notebook server is available, click Connect to connect.

Download a notebook

Once JupyterHub becomes available, open a terminal.

In the Terminal window, run this command to make sure you are using the latest version of the Kubeflow Pipelines SDK:

pip3 install -U kfp

Next, run this command to download the notebook that will be used for the remainder of the lab:

curl -O https://raw.githubusercontent.com/kubeflow/examples/master/github_issue_summarization/pipelines/example_pipelines/pipelines-kubecon.ipynb

Return to the JupyterHub home screen and open the notebook you just downloaded.

Execute the notebook

In the Setup section, find the second command cell (it starts with # Define some pipeline input variables.). Fill in your own values for the variables WORKING_DIR, PROJECT_NAME, and GITHUB_TOKEN, then execute the notebook one step at a time.
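
The notebook follows the same pattern you used earlier in the lab: define a pipeline with the DSL, compile it, and submit a run with the client. A minimal sketch of the kind of cell it executes (the names here are illustrative, not the notebook's exact code):

import kfp
import kfp.dsl as dsl
import kfp.compiler as compiler

@dsl.pipeline(name='notebook-pipeline', description='Defined interactively.')
def nb_pipeline():
    dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c', 'echo hello from a notebook-defined pipeline'])

compiler.Compiler().compile(nb_pipeline, 'nb_pipeline.tar.gz')

# Running inside the cluster, the client can locate the Pipelines API directly.
client = kfp.Client()
experiment = client.create_experiment(name='notebook-experiment')
client.run_pipeline(experiment.id, 'nb-run-1', 'nb_pipeline.tar.gz')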

Follow the instructions in the notebook for the remainder of the lab.

Destroy the cluster

To remove all resources created by Click-to-Deploy, navigate to Deployment Manager in the GCP Console and delete the ${DEPLOYMENT_NAME} deployment.

Remove the GitHub token

Navigate to https://github.com/settings/tokens and remove the generated token.