Kubeflow is a machine learning toolkit for Kubernetes. The project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. The goal is to provide a straightforward way to deploy best-of-breed open-source systems for ML to diverse infrastructures.

What does a Kubeflow deployment look like?

A Kubeflow deployment is a means of organizing loosely-coupled microservices as a single unit and deploying them to a variety of locations, whether that's a laptop or the cloud. This codelab will walk you through creating your own Kubeflow deployment.

What you'll build

In this codelab, you're going to build a web app that summarizes GitHub issues using a trained model. It is based on the walkthrough provided in the Kubeflow Examples repo. Upon completion, your infrastructure will contain:

What you'll learn

What you'll need

This is an advanced codelab focused on Kubeflow. For more background and an introduction to the platform, see the Introduction to Kubeflow on Kubernetes codelab. Non-relevant concepts and code blocks are glossed over and provided for you to simply copy and paste.

Choose one of the following environments for running this codelab:

Cloud Shell

This link clones the Kubeflow Examples repo and places it in the ~/examples directory.

Download in Google Cloud Shell

Once you have the project files, check out the v0.2 branch, which contains the resources you will need:

cd ${HOME}/examples/github_issue_summarization
git checkout v0.2

Enable Boost Mode

In the Cloud Shell window, click on the Settings dropdown at the far right. Select Enable Boost Mode. This will provision a larger instance for your Cloud Shell session, resulting in speedier Docker builds. If you can't find this menu, ensure the main Navigation Menu is hidden by clicking the three lines at the top left of the screen, next to the Google Cloud Platform logo.

Local Linux or MacOS

This link downloads an archive of the Kubeflow examples repo. Unpacking the downloaded zip file will produce a root folder (examples-0.2) containing all of the official Kubeflow examples.

Download locally

Unzip and move the folder for consistency with the absolute paths in this codelab:

unzip v0.2.zip
mv examples-0.2 ${HOME}/examples

Set your GitHub token

This codelab involves the use of many different files obtained from public repos on GitHub. To prevent rate limiting, especially at events where a large number of anonymized requests are sent to the GitHub APIs, set up an access token with no permissions. This simply authorizes you as an individual rather than as an anonymous user.

  1. Navigate to https://github.com/settings/tokens and generate a new token with no permissions.
  2. Save it somewhere safe. If you lose it, you will need to delete and create a new one.
  3. Set the GITHUB_TOKEN environment variable:
export GITHUB_TOKEN=<token>
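
To confirm the token is set and accepted, you can optionally query GitHub's rate-limit endpoint (this check is an addition to the codelab steps):

# an authenticated request reports a 5000/hour limit rather than the anonymous 60/hour
curl -s -H "Authorization: token ${GITHUB_TOKEN}" https://api.github.com/rate_limit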

Installing ksonnet

Set the correct version

To install on Cloud Shell or a local Linux machine, set this environment variable:

export KS_VER=ks_0.11.0_linux_amd64

To install on a Mac, set this environment variable:

export KS_VER=ks_0.11.0_darwin_amd64

Install ksonnet

Download and unpack the appropriate binary, then add it to your $PATH:

wget -O /tmp/$KS_VER.tar.gz https://github.com/ksonnet/ksonnet/releases/download/v0.11.0/$KS_VER.tar.gz

mkdir -p ${HOME}/bin
tar -xvf /tmp/$KS_VER.tar.gz -C ${HOME}/bin

export PATH=$PATH:${HOME}/bin/$KS_VER
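
As a quick sanity check (not part of the original steps), confirm that the ks binary is now on your PATH and reports the expected version:

ks version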

To familiarize yourself with ksonnet concepts, see this diagram.

Set your GCP project ID

Set the project ID:

export PROJECT_ID=<your_project_id>
gcloud config set project ${PROJECT_ID}

Authorize Docker

Allow Docker access to your project's Container Registry:

gcloud auth configure-docker
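
This registers gcloud as a Docker credential helper for gcr.io. If you're curious, you can inspect the result; the gcr.io domains should now appear under the credHelpers key:

cat ~/.docker/config.json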

Create a service account

Create a service account with read/write access to storage buckets:

export SERVICE_ACCOUNT=github-issue-summarization
export SERVICE_ACCOUNT_EMAIL=${SERVICE_ACCOUNT}@${PROJECT_ID}.iam.gserviceaccount.com
gcloud iam service-accounts create ${SERVICE_ACCOUNT} \
  --display-name "GCP Service Account for use with kubeflow examples"

gcloud projects add-iam-policy-binding ${PROJECT_ID} --member \
  serviceAccount:${SERVICE_ACCOUNT_EMAIL} \
  --role=roles/storage.admin

Generate a credentials file for upload to the cluster:

export KEY_FILE=${HOME}/secrets/${SERVICE_ACCOUNT_EMAIL}.json
# the target directory must exist before the key can be written
mkdir -p ${HOME}/secrets
gcloud iam service-accounts keys create ${KEY_FILE} \
  --iam-account ${SERVICE_ACCOUNT_EMAIL}
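
To confirm the key was created (an optional check), list the account's keys; the new user-managed key should appear alongside any system-managed keys:

gcloud iam service-accounts keys list --iam-account ${SERVICE_ACCOUNT_EMAIL}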

Create a storage bucket

Create a Cloud Storage bucket for storing your trained model. Bucket names must be globally unique; the command below derives one from your project ID. Issue the "mb" (make bucket) command:

export BUCKET_NAME=kubeflow-${PROJECT_ID}
gsutil mb -c regional -l us-central1 gs://${BUCKET_NAME}
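
To confirm the bucket exists (optional), list it:

gsutil ls -b gs://${BUCKET_NAME}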

Create a cluster

Create a managed Kubernetes cluster on Kubernetes Engine:

gcloud container clusters create kubeflow-codelab \
  --zone us-central1-a  \
  --machine-type n1-standard-4 \
  --scopes=compute-rw,storage-rw \
  --verbosity=error

Connect your local environment to the Kubernetes Engine cluster:

gcloud container clusters get-credentials kubeflow-codelab \
  --zone us-central1-a

This configures your kubectl context so that you can interact with your cluster. To verify the connection, run the following command:

kubectl cluster-info

You should see an IP address corresponding to the Endpoint in your Google Cloud Platform Console.

To enable the installation of Kubeflow and Seldon components, create two ClusterRoleBindings, which allow the creation of the required cluster objects:

kubectl create clusterrolebinding default-admin \
  --clusterrole=cluster-admin \
  --user=$(gcloud config get-value account)

kubectl create clusterrolebinding seldon-admin \
  --clusterrole=cluster-admin \
  --serviceaccount=default:default
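
To confirm that both bindings were created (an optional check), query them by name:

kubectl get clusterrolebinding default-admin seldon-admin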

Upload service account credentials:

kubectl create secret generic user-gcp-sa \
  --from-file=user-gcp-sa.json="${KEY_FILE}"

ksonnet is a templating framework that lets us reuse common object definitions and customize them for our environment. We begin by referencing Kubeflow templates and applying environment-specific parameters. Once manifests have been generated specifically for our cluster, they can be applied like any other Kubernetes objects using kubectl.
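
Once the app and environment exist (they are created in the steps that follow), you can preview the fully expanded manifests before anything is applied to the cluster; this is a handy way to see exactly what ksonnet generates from the templates:

ks show gke -c kubeflow-core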

Initialize a ksonnet app

Inside the github_issue_summarization directory, issue the following commands to create a new ksonnet app directory, fill it with boilerplate code, and retrieve component files:

cd ${HOME}/examples/github_issue_summarization
ks init ksonnet-kubeflow
cd ksonnet-kubeflow
cp ../ks-kubeflow/components/kubeflow-core.jsonnet components
cp ../ks-kubeflow/components/params.libsonnet components
cp ../ks-kubeflow/components/seldon.jsonnet components
cp ../ks-kubeflow/components/tfjob-v1alpha2.* components
cp ../ks-kubeflow/components/ui.* components

Install packages and generate core components

Register the Kubeflow template repository:

export KUBEFLOW_VERSION=v0.2.0-rc.1
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${KUBEFLOW_VERSION}/kubeflow

Install Kubeflow core, TensorFlow, and Seldon components:

ks pkg install kubeflow/core@${KUBEFLOW_VERSION}
ks pkg install kubeflow/tf-serving@${KUBEFLOW_VERSION}
ks pkg install kubeflow/tf-job@${KUBEFLOW_VERSION}
ks pkg install kubeflow/seldon@${KUBEFLOW_VERSION}

Create the environment

Define a ksonnet environment that references our specific cluster:

ks env add gke
ks param set --env gke kubeflow-core \
  cloud "gke"
ks param set --env gke kubeflow-core \
  tfAmbassadorServiceType "LoadBalancer"
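
To verify that the parameters were recorded for the gke environment (optional), list them:

ks param list kubeflow-core --env gke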

Apply Kubeflow with Seldon to the cluster

Apply the generated ksonnet manifests to the cluster to create the default Kubeflow and Seldon components:

ks apply gke -c kubeflow-core -c seldon

Congratulations! Your cluster now contains a Kubeflow installation with Seldon. You can view the components by running:

kubectl get pods

You should see output similar to this:

In this section, you will create a component that trains a model.

Set component parameters

cd ${HOME}/examples/github_issue_summarization/ksonnet-kubeflow
ks param set --env gke tfjob-v1alpha2 image "gcr.io/kubeflow-examples/tf-job-issue-summarization:v20180629-v0.1-2-g98ed4b4-dirty-182929"
ks param set --env gke tfjob-v1alpha2 output_model_gcs_bucket "${BUCKET_NAME}"

Launch training

Apply the component manifests to the cluster:

ks apply gke -c tfjob-v1alpha2

View the running job

View the resulting pods:

kubectl get pods

Your cluster state should look similar to this:

It can take a few minutes to pull the image and start the container. Once the "tfjob-issue-summarization-master" pod is running, tail the logs:

kubectl logs -f \
  $(kubectl get pods -ltf_job_key=tfjob-issue-summarization -o=jsonpath='{.items[0].metadata.name}')

In the log output, you will see the source data (github-issues.zip) being downloaded before training begins. Continue tailing the logs until the pod exits on its own and you find yourself back at the command prompt.

To verify that training completed successfully, check to make sure all three model files were uploaded to your Cloud Storage bucket:

gsutil ls gs://${BUCKET_NAME}/github-issue-summarization-data

In this section, you will create a component that serves a trained model.

Set serving image path

export SERVING_IMAGE=gcr.io/kubeflow-examples/issue-summarization-model:v20180718-g98ed4b4-qwiklab

Create the serving component

Using a Seldon ksonnet template, generate the serving component. Navigate back to the ksonnet app directory for Kubeflow, and issue the following command:

cd ${HOME}/examples/github_issue_summarization/ksonnet-kubeflow
ks generate seldon-serve-simple issue-summarization-model \
  --name=issue-summarization \
  --image=${SERVING_IMAGE} \
  --replicas=2

Launch serving

Apply the component manifests to the cluster:

ks apply gke -c issue-summarization-model

View the running pods

You will see several new pods appear:

kubectl get pods

Your cluster state should look similar to this:

Once the pod is running, tail the logs for one of the serving containers to verify that it is running on port 9000:

kubectl logs -f \
  $(kubectl get pods \
    -lseldon-app=issue-summarization \
    -o=jsonpath='{.items[0].metadata.name}') \
  issue-summarization

In this section, you will create a component that provides browser access to the serving component.

Set parameter values

cd ${HOME}/examples/github_issue_summarization/ksonnet-kubeflow
ks param set --env gke ui image "gcr.io/kubeflow-examples/issue-summarization-ui:v20180629-v0.1-2-g98ed4b4-dirty-182929"
ks param set --env gke ui githubToken ${GITHUB_TOKEN}
ks param set --env gke ui modelUrl "http://issue-summarization.default.svc.cluster.local:8000/api/v0.1/predictions"
ks param set --env gke ui serviceType "LoadBalancer"

(Optional) Create the UI image

The UI component is now configured to use a pre-built container image which we've made available in Container Registry (gcr.io). If you would prefer to generate your own image instead, continue with this step.

Switch to the docker directory and build the image for the UI:

cd ${HOME}/examples/github_issue_summarization/docker
docker build -t gcr.io/${PROJECT_ID}/issue-summarization-ui:latest .

After the image has been successfully built, store it in Container Registry:

docker push gcr.io/${PROJECT_ID}/issue-summarization-ui:latest

Update the component parameter with a link that points to the custom image:

cd ${HOME}/examples/github_issue_summarization/ksonnet-kubeflow
ks param set --env gke ui image gcr.io/${PROJECT_ID}/issue-summarization-ui:latest

Launch the UI

Apply the component manifests to the cluster:

ks apply gke -c ui

You should see an additional pod:

View the UI

To view the UI, get the external IP address:

kubectl get svc issue-summarization-ui

In a browser, navigate to the address listed under EXTERNAL-IP to view the results, e.g. http://35.239.43.138/. You should see something like this:

Click the Populate Random Issue button to fill in the large text box with a random issue summary. Then click the Generate Title button to view the machine generated title produced by your trained model.
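
If you'd like to exercise the prediction endpoint directly rather than through the UI, you can port-forward the serving Service and POST to the same URL the UI uses. The payload shape below (an ndarray under a data key) is assumed from the Seldon REST API of this release, so treat it as a sketch rather than a definitive request; if your kubectl version cannot forward Services, target one of the serving pods instead:

# forward local port 8000 to the issue-summarization Service
kubectl port-forward svc/issue-summarization 8000:8000 &
# send a sample issue body; the response should contain a generated title
curl -s -X POST http://localhost:8000/api/v0.1/predictions \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": [["the github issue body text goes here"]]}}'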

View serving container logs

Tail the logs of one of the serving containers to verify that it is receiving a request from the UI and providing a prediction in response:

kubectl logs -f \
  $(kubectl get pods \
    -lseldon-app=issue-summarization \
    -o=jsonpath='{.items[0].metadata.name}') \
  issue-summarization

Press the Generate Title button in the UI a few times to view the POST request. Since there are two serving containers, you might need to try a few times before you see the log entry. Press Ctrl-C to return to the command prompt.

(Optional) Create the training image

In the github_issue_summarization directory, navigate to the folder containing the training code (notebooks). From there, issue a make command that builds the image and stores it in Container Registry. This places it in a location accessible from inside the cluster.

cd ${HOME}/examples/github_issue_summarization/notebooks
make PROJECT=${PROJECT_ID} push

Once the image has been built and stored in Container Registry, update the component parameter with a link that points to the custom image:

cd ${HOME}/examples/github_issue_summarization/ksonnet-kubeflow
export TAG=$(gcloud container images list-tags \
  gcr.io/${PROJECT_ID}/tf-job-issue-summarization \
  --limit=1 \
  --format='get(tags)')
ks param set --env gke tfjob-v1alpha2 image "gcr.io/${PROJECT_ID}/tf-job-issue-summarization:${TAG}"

Remove any previously applied components:

ks delete gke -c tfjob-v1alpha2

Continue from Launch Training in the Train a model section.

(Optional) Create the serving image

Retrieve the trained model files that were generated previously:

cd ${HOME}/examples/github_issue_summarization/notebooks
gsutil cp gs://${BUCKET_NAME}/github-issue-summarization-data/* .

Using a Seldon wrapper, generate image build files. This command creates a build directory and image creation script:

docker run -v $(pwd):/my_model seldonio/core-python-wrapper:0.7 \
  /my_model IssueSummarization 0.1 gcr.io \
  --base-image=python:3.6 \
  --image-name=${PROJECT_ID}/issue-summarization-model
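
The wrapper writes its output into a build subdirectory. Before building, you can optionally inspect what was generated; the build_image.sh script used in the next step should be among the files:

ls ${HOME}/examples/github_issue_summarization/notebooks/build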

Using the files created by the wrapper, generate a serving image and store it in Container Registry:

cd ${HOME}/examples/github_issue_summarization/notebooks/build
./build_image.sh
docker push gcr.io/${PROJECT_ID}/issue-summarization-model:0.1
export SERVING_IMAGE=gcr.io/${PROJECT_ID}/issue-summarization-model:0.1

Update the component parameter to reflect the new image path:

cd ${HOME}/examples/github_issue_summarization/ksonnet-kubeflow
ks param set issue-summarization-model image ${SERVING_IMAGE}

Remove any previously deployed components:

ks delete gke -c issue-summarization-model

Continue from Launch serving in the Serve the trained model section.

Destroy the cluster

Delete the previously created cluster with the following command:

gcloud container clusters delete kubeflow-codelab \
  --zone us-central1-a

Destroy images

This snippet removes all versions of the training, serving, and UI images that were stored in your project, looping once over each image name:

for IMAGE in \
  gcr.io/${PROJECT_ID}/tf-job-issue-summarization \
  gcr.io/${PROJECT_ID}/issue-summarization-model \
  gcr.io/${PROJECT_ID}/issue-summarization-ui; do
  for digest in $(gcloud container images list-tags \
    ${IMAGE} --limit=999999 \
    --format='get(digest)'); do
      gcloud container images delete -q --force-delete-tags "${IMAGE}@${digest}"
  done
done

Destroy the storage bucket

gsutil rm -r gs://${BUCKET_NAME}

Destroy the service account

Remove the IAM policy binding first, while the account still exists, then delete the account:

gcloud projects remove-iam-policy-binding ${PROJECT_ID} --member \
  serviceAccount:${SERVICE_ACCOUNT_EMAIL} \
  --role=roles/storage.admin

gcloud iam service-accounts delete ${SERVICE_ACCOUNT_EMAIL}

rm ${HOME}/secrets/${SERVICE_ACCOUNT_EMAIL}.json

Remove ksonnet

rm /tmp/${KS_VER}.tar.gz
rm -rf ${HOME}/bin/${KS_VER}

Remove sample code

rm -rf ${HOME}/examples

Remove GitHub token

Navigate to https://github.com/settings/tokens and remove the generated token.