Analyzing a financial ML model deployed on Cloud AI Platform with the What-if Tool

1. Overview

In this lab, you will use the What-if Tool to analyze an XGBoost model trained on financial data and deployed on Cloud AI Platform.

What you learn

You'll learn how to:

Train an XGBoost model on a public mortgage dataset in AI Platform Notebooks
Deploy the XGBoost model to AI Platform
Analyze the model using the What-if Tool

The total cost to run this lab on Google Cloud is about $1.

2. A quick XGBoost primer

XGBoost is a machine learning framework that uses decision trees and gradient boosting to build predictive models. It works by ensembling multiple decision trees together based on the score associated with different leaf nodes in a tree.

The diagram below is a visualization of a simple decision tree model that evaluates whether a sports game should be played based on the weather forecast:

Why are we using XGBoost for this model? While traditional neural networks have been shown to perform best on unstructured data like images and text, decision trees often perform extremely well on structured data like the mortgage dataset we'll be using in this codelab.

3. Setup your environment

You'll need a Google Cloud Platform project with billing enabled to run this codelab. To create a project, follow the instructions here.

Step 1: Enable the Cloud AI Platform Models API

Navigate to the AI Platform Models section of your Cloud Console and click Enable if it isn't already enabled.

Step 2: Enable the Compute Engine API

Navigate to Compute Engine and select Enable if it isn't already enabled. You'll need this to create your notebook instance.

Step 3: Create an AI Platform Notebooks instance

Navigate to AI Platform Notebooks section of your Cloud Console and click New Instance. Then select the latest TF Enterprise 2.x instance type without GPUs:

Use the default options and then click Create. Once the instance has been created, select Open JupyterLab:

Step 4: Install XGBoost

Once your JupyterLab instance has opened, you'll need to add the XGBoost package.

To do this, select Terminal from the launcher:

Then run the following to install the latest version of XGBoost supported by Cloud AI Platform:

pip3 install xgboost==0.90

After this completes, open a Python 3 Notebook instance from the launcher. You're ready to get started in your notebook!

Step 5: Import Python packages

In the first cell of your notebook, add the following imports and run the cell. You can run it by pressing the right arrow button in the top menu or pressing command-enter:

import pandas as pd
import xgboost as xgb
import numpy as np
import collections
import witwidget

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.utils import shuffle
from witwidget.notebook.visualization import WitWidget, WitConfigBuilder

4. Download and process data

We'll use a mortgage dataset from ffiec.gov to train an XGBoost model. We've done some preprocessing on the original dataset and created a smaller version for you to use to train the model. The model will predict whether or not a particular mortgage application will get approved.

Step 1: Download the pre-processed dataset

We've made a version of the dataset available for you in Google Cloud Storage. You can download it by running the following gsutil command in your Jupyter notebook:

!gsutil cp 'gs://mortgage_dataset_files/mortgage-small.csv' .

Step 2: Read the dataset with Pandas

Before we create our Pandas DataFrame we'll create a dict of each column's data type so that Pandas reads our dataset correctly:

COLUMN_NAMES = collections.OrderedDict({
 'as_of_year': np.int16,
 'agency_code': 'category',
 'loan_type': 'category',
 'property_type': 'category',
 'loan_purpose': 'category',
 'occupancy': np.int8,
 'loan_amt_thousands': np.float64,
 'preapproval': 'category',
 'county_code': np.float64,
 'applicant_income_thousands': np.float64,
 'purchaser_type': 'category',
 'hoepa_status': 'category',
 'lien_status': 'category',
 'population': np.float64,
 'ffiec_median_fam_income': np.float64,
 'tract_to_msa_income_pct': np.float64,
 'num_owner_occupied_units': np.float64,
 'num_1_to_4_family_units': np.float64,
 'approved': np.int8
})

Next we'll create a DataFrame, passing it the data types we specified above. It's important to shuffle our data in case the original dataset is ordered in a specific way. We use an sklearn utility called shuffle to do this, which we imported in the first cell:

data = pd.read_csv(
 'mortgage-small.csv',
 index_col=False,
 dtype=COLUMN_NAMES
)
data = data.dropna()
data = shuffle(data, random_state=2)
data.head()

data.head() lets us preview the first five rows of our dataset in Pandas. You should see something like this after running the cell above:

These are the features we'll be using to train our model. If you scroll all the way to the end, you'll see the last column approved, which is the thing we're predicting. A value of 1 indicates a particular application was approved, and 0 indicates it was denied.

To see the distribution of approved / denied values in the dataset and create a numpy array of the labels, run the following:

# Class labels - 0: denied, 1: approved
print(data['approved'].value_counts())

labels = data['approved'].values
data = data.drop(columns=['approved'])

About 66% of the dataset contains approved applications.

Step 3: Creating dummy column for categorical values

This dataset contains a mix of categorical and numerical values, but XGBoost requires that all features be numerical. Instead of representing categorical values using one-hot encoding, for our XGBoost model we'll take advantage of the Pandas get_dummies function.

get_dummies takes a column with multiple possible values and converts it into a series of columns each with only 0s and 1s. For example, if we had a column "color" with possible values of "blue" and "red," get_dummies would transform this into 2 columns called "color_blue" and "color_red" with all boolean 0 and 1 values.

To create dummy columns for our categorical features, run the following code:

dummy_columns = list(data.dtypes[data.dtypes == 'category'].index)
data = pd.get_dummies(data, columns=dummy_columns)

data.head()

When you preview the data this time, you'll see single features (like purchaser_type pictured below) split into multiple columns:

Step 4: Splitting data into train and test sets

An important concept in machine learning is train / test split. We'll take the majority of our data and use it to train our model, and we'll set aside the rest for testing our model on data it's never seen before.

Add the following code to your notebook, which uses the Scikit Learn function train_test_split to split our data:

x,y = data,labels
x_train,x_test,y_train,y_test = train_test_split(x,y)

Now you're ready to build and train your model!

5. Build, train, and evaluate an XGBoost model

Step 1: Define and train the XGBoost model

Creating a model in XGBoost is simple. We'll use the XGBClassifier class to create the model, and just need to pass the right objective parameter for our specific classification task. In this case we use reg:logistic since we've got a binary classification problem and we want the model to output a single value in the range of (0,1): 0 for not approved and 1 for approved.

The following code will create an XGBoost model:

model = xgb.XGBClassifier(
    objective='reg:logistic'
)

You can train the model with one line of code, calling the fit() method and passing it the training data and labels.

model.fit(x_train, y_train)

Step 2: Evaluate the accuracy of your model

We can now use our trained model to generate predictions on our test data with the predict() function.

Then we'll use Scikit Learn's accuracy_score function to calculate the accuracy of our model based on how it performs on our test data. We'll pass it the ground truth values along with the model's predicted values for each example in our test set:

y_pred = model.predict(x_test)
acc = accuracy_score(y_test, y_pred.round())
print(acc, '\n')

You should see accuracy around 87%, but yours will vary slightly since there is always an element of randomness in machine learning.

Step 3: Save your model

In order to deploy the model, run the following code to save it to a local file:

model.save_model('model.bst')

6. Deploy model to Cloud AI Platform

We've got our model working locally, but it would be nice if we could make predictions on it from anywhere (not just this notebook!). In this step we'll deploy it to the cloud.

Step 1: Create a Cloud Storage bucket for our model

Let's first define some environment variables that we'll be using throughout the rest of the codelab. Fill in the values below with the name of your Google Cloud project, the name of the cloud storage bucket you'd like to create (must be globally unique), and the version name for the first version of your model:

# Update these to your own GCP project, model, and version names
GCP_PROJECT = 'your-gcp-project'
MODEL_BUCKET = 'gs://storage_bucket_name'
VERSION_NAME = 'v1'
MODEL_NAME = 'xgb_mortgage'

Now we're ready to create a storage bucket to store our XGBoost model file. We'll point Cloud AI Platform at this file when we deploy.

Run this gsutil command from within your notebook to create a bucket:

!gsutil mb $MODEL_BUCKET

Step 2: Copy the model file to Cloud Storage

Next, we'll copy our XGBoost saved model file to Cloud Storage. Run the following gsutil command:

!gsutil cp ./model.bst $MODEL_BUCKET

Head over to the storage browser in your Cloud Console to confirm the file has been copied:

Step 3: Create and deploy the model

We're almost ready to deploy the model! The following ai-platform gcloud command will create a new model in your project. We'll call this one xgb_mortgage:

!gcloud ai-platform models create $MODEL_NAME --region='global'

Now it's time to deploy the model. We can do that with this gcloud command:

!gcloud ai-platform versions create $VERSION_NAME \
--model=$MODEL_NAME \
--framework='XGBOOST' \
--runtime-version=2.1 \
--origin=$MODEL_BUCKET \
--python-version=3.7 \
--project=$GCP_PROJECT \
--region='global'

While this is running, check the models section of your AI Platform console. You should see your new version deploying there:

When the deploy completes successfully you'll see a green check mark where the loading spinner is. The deploy should take 2-3 minutes.

Step 4: Test the deployed model

To make sure your deployed model is working, test it out using gcloud to make a prediction. First, save a JSON file with the first example from our test set:

%%writefile predictions.json
[2016.0, 1.0, 346.0, 27.0, 211.0, 4530.0, 86700.0, 132.13, 1289.0, 1408.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]

Test your model by running this code:

prediction = !gcloud ai-platform predict --model=xgb_mortgage --region='global' --json-instances=predictions.json --version=$VERSION_NAME --verbosity=none

print(prediction)

You should see your model's prediction in the output. This particular example was approved, so you should see a value close to 1.

7. Use the What-if Tool to interpret your model

Step 1: Create the What-if Tool visualization

To connect the What-if Tool to your AI Platform models, you need to pass it a subset of your test examples along with the ground truth values for those examples. Let's create a Numpy array of 500 of our test examples along with their ground truth labels:

num_wit_examples = 500
test_examples = np.hstack((x_test[:num_wit_examples].values,y_test[:num_wit_examples].reshape(-1,1)))

Instantiating the What-if Tool is as simple as creating a WitConfigBuilder object and passing it the AI Platform model we'd like to analyze.

We use the optional adjust_prediction parameter here because the What-if Tool expects a list of scores for each class in our model (in this case 2). Since our model only returns a single value from 0 to 1, we transform it to the correct format in this function:

def adjust_prediction(pred):
  return [1 - pred, pred]

config_builder = (WitConfigBuilder(test_examples.tolist(), data.columns.tolist() + ['mortgage_status'])
  .set_ai_platform_model(GCP_PROJECT, MODEL_NAME, VERSION_NAME, adjust_prediction=adjust_prediction)
  .set_target_feature('mortgage_status')
  .set_label_vocab(['denied', 'approved']))
WitWidget(config_builder, height=800)

Note that it'll take a minute to load the visualization. When it loads, you should see the following:

The y-axis shows us the model's prediction, with 1 being a high confidence approved prediction, and 0 being a high confidence denied prediction. The x-axis is just the spread of all loaded data points.

Step 2: Explore individual data points

The default view on the What-if Tool is the Datapoint editor tab. Here you can click on any individual data point to see it's features, change feature values, and see how that change impacts the model's prediction on an individual data point.

In the example below we chose a data point close to the .5 threshold. The mortgage application associated with this particular data point originated from the CFPB. We changed that feature to 0 and also changed the value of agency_code_Department of Housing and Urban Development (HUD) to 1 to see what would happen to the model's prediction if this loan instead originated from HUD:

As we can see in the bottom left section of the What-if Tool, changing this feature significantly decreased the model's approved prediction by 32%. This could indicate that the agency a loan originated from has a strong impact on the model's output, but we'll need to do more analysis to be sure.

In the bottom left part of the UI, we can also see the ground truth value for each data point and compare it to the model's prediction:

Step 3: Counterfactual analysis

Next, click on any datapoint and move the Show nearest counterfactual datapoint slider to the right:

Selecting this will show you the data point that has the most similar feature values to the original one you selected, but the opposite prediction. You can then scroll through the feature values to see where the two data points differed (the differences are highlighted in green and bold).

Step 4: Look at partial dependence plots

To see how each feature affects the model's predictions overall, check the Partial dependence plots box and make sure Global partial dependence plots is selected:

Here we can see that loans originating from HUD have a slightly higher likelihood of being denied. The graph is this shape because agency code is a boolean feature, so values can only be exactly 0 or 1.

applicant_income_thousands is a numerical feature, and in the partial dependence plot we can see that higher income slightly increases the likelihood of an application being approved, but only up to around $200k. After $200k, this feature doesn't impact the model's prediction.

Step 5: Explore overall performance and fairness

Next, go to the Performance & Fairness tab. This shows overall performance statistics on the model's results on the provided dataset, including confusion matrices, PR curves, and ROC curves.

Select mortgage_status as the Ground Truth Feature to see a confusion matrix:

This confusion matrix shows our model's correct and incorrect predictions as a percentage of the total. If you add up the Actual Yes / Predicted Yes and Actual No / Predicted No squares, it should add up the same accuracy as your model (around 87%).

You can also experiment with the threshold slider, raising and lowering the positive classification score the model needs to return before it decides to predicted approved for the loan, and see how that changes accuracy, false positives, and false negatives. In this case, accuracy is highest around a threshold of .55.

Next, on the left Slice by dropdown, select loan_purpose_Home_purchase:

You'll now see performance on the two subsets of your data: the "0" slice shows when the loan is not for a home purchase, and the "1" slice is for when the loan is for a home purchase. Check out the accuracy, false postive, and false negative rate between the two slices to look for differences in performance.

If you expand the rows to look at the confusion matrices, you can see that the model predicts "approved" for ~70% loan applications for home purchases and only 46% of loans that aren't for home purchases (exact percentages will vary on your model):

If you select Demographic parity from the radio buttons on the left, the two thresholds will be adjusted so that the model predicts approved for a similar percentage of applicants in both slices. What does this do to the accuracy, false positives and false negatives for each slice?

Step 6: Explore feature distribution

Finally, navigate to the Features tab in the What-if Tool. This shows you the distribution of values for each feature in your dataset:

You can use this tab to make sure your dataset is balanced. For example, it looks like very few loans in the dataset originated from the Farm Service Agency. To improve model accuracy, we might consider adding more loans from that agency if the data is available.

We've described just a few What-if Tool exploration ideas here. Feel free to keep playing around with the tool, there are plenty more areas to explore!

8. Cleanup

If you'd like to continue using this notebook, it is recommended that you turn it off when not in use. From the Notebooks UI in your Cloud Console, select the notebook and then select Stop:

If you'd like to delete all resources you've created in this lab, simply delete the notebook instance instead of stopping it.

Using the Navigation menu in your Cloud Console, browse to Storage and delete both buckets you created to store your model assets.