Building Trusted AI Products with the PAIR Guidebook

In this codelab, you'll apply best practices from the new edition of the People + AI Research (PAIR) Guidebook to design a new product with AI, with a focus on human-centered data practices and properly calibrated user trust.

Prerequisites

  • Basic understanding of what AI is.
  • Knowledge of product development workflow.

What you'll learn

In this codelab, you will learn how to use Google Research's People + AI Guidebook to build trustworthy, user-centered AI products.

Specifically, you will:

  • Learn what's new in the PAIR Guidebook's second edition.
  • Go through a series of exercises that highlight opportunities in the AI development process to calibrate user trust, with a focus on data and user-facing explainability.
  • Get an introduction to a broader toolkit of materials and resources available for further exploration.

What's new in the PAIR Guidebook

We're introducing the second edition of the PAIR Guidebook at Google I/O this year! The first edition was released two years ago, and since then it's been used by over a quarter million people across roles (developers, designers, product managers, students, etc.) all around the world. We're now excited to introduce a set of updates to make it even more actionable.

Specifically, in this second edition, we're providing a new way to navigate through the Guidebook and find content by task. We've come up with a list of key questions that you and your team may have when developing a product with a user-centered approach to AI, and that will help you find the content that you need, when you need it:

  1. When and how should I use AI in my product?
  2. How do I responsibly build my dataset?
  3. How do I help users build and calibrate trust in my AI system?
  4. How do I onboard users to new AI features?
  5. How do I explain my AI system to users?
  6. What's the right balance of user control and automation?
  7. How do I support users when something goes wrong?

Once you select a question, you'll get relevant content in smaller, more actionable units.

We've also updated the PAIR Guidebook with new content:

  1. A set of AI design patterns
  2. Case studies
  3. Updated chapters
  4. New exercises and a workshop kit

In this codelab, you'll see some of these design patterns in action in a workflow as you develop a new feature with AI.

Let's get started!

Imagine the following scenario:

You're developing a movie viewing app, and you'd like to provide users with an improved and customized experience, helping them find more movies that they enjoy.

The app's landing page currently includes the following sections:

  • A list of new movies, ordered by release date
  • A catalogue of all movies, organized by genre
  • A search box, where users can search by movie title, cast, etc.

[Figure: Movie app without personalized recommendations]

You'd like to add a new section with movie recommendations for the user, and you think that AI could be a good option to implement this feature. Before diving into any implementation, you'll want to do the following:

  • Review existing workflows: how do users currently interact with the app, and how do you think that their experience could be improved?
  • Determine if AI can add unique value: does your problem map to one that can be solved well with AI, and is AI likely to improve your product's user experience?

Using the PAIR Guidebook's chapter, User Needs + Defining Success, you review the list of use cases where AI is probably a good solution, and find that your users' need falls under the following types of problems:

  • Recommending different content to different users
  • Personalizing improves the user experience
  • Showing dynamic content is more efficient than a predictable interface

Make sure to review the list of cases where AI is probably not a better solution, too.

Now that you can see that an AI-powered solution seems like a good candidate to address this user need, you'll want to evaluate whether it will actually provide a better user experience.

Pattern:

Before you start building with AI, make sure the product or feature that you have in mind requires AI, or would be enhanced by it.

AI is well-suited for applications like:

  • Recommending different content to different users, such as movie suggestions
  • Predicting future events, such as weather events or flight price changes
  • Natural language understanding
  • Image recognition

A rule or heuristic-based solution may be better when:

  • Maintaining predictability is important
  • Users, customers or developers need complete transparency
  • People don't want a task automated

See the User Needs chapter for more on when to use AI, and when not to.

Link to full pattern: https://pair.withgoogle.com/guidebook/patterns#determine-if-ai-adds-value

You can add value to the app by highlighting the movies that each user is likely to enjoy, which gives them a richer experience than showing only the latest or overall top-rated movies. You also suspect that this feature could save them time when exploring a rapidly growing catalogue of movies.

Now that you've decided to move forward with an AI-powered solution, you're ready to start planning your next steps.

In order to train a recommendation system to provide users with movie suggestions, you're going to need to put together a dataset that your AI model will learn from.

The first thing that you'll want to do is to match the user needs with data needs.

Using the matching exercise from the PAIR Guidebook's chapter, Data Collection + Evaluation, you determine the following:

  • User: movie viewers (movie app users)
  • User need: find more movies that they like, more easily and quickly
  • User action: select and watch movies through the app
  • AI system output: which movies to suggest and why (sentiment labels, or tags)
  • AI system learning: patterns of behavior around accepting movie recommendations, viewing entire movies, writing reviews for movies, and giving these movies high ratings
  • Datasets needed: movie viewing data from the app, movie information, and movie ratings and reviews
  • Key features needed in dataset: viewer movie preferences and viewing history, movie information (e.g., genre, cast), movie star ratings, movie reviews
  • Key labels needed in dataset: viewer acceptance or rejection rate of app suggestion, viewer movie completion rate, viewer ratings and reviews, and viewer feedback as to why a suggestion was rejected
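
The outcome of this matching exercise can be written down as a small data schema. The following is a minimal Python sketch, not part of the PAIR Guidebook: every class name, field name, and the 0.8 completion threshold are assumptions made for illustration, and a real app would derive them from its own data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MovieFeatures:
    """Features for one (viewer, movie) pair. All field names are illustrative."""
    viewer_preferred_genres: List[str]
    viewer_watch_history: List[str]   # IDs of movies the viewer already watched
    movie_id: str
    movie_genre: str
    movie_cast: List[str]
    movie_star_rating: float          # average star rating for the movie


@dataclass
class MovieLabels:
    """Labels the model could learn from. None means the signal was not observed."""
    suggestion_accepted: Optional[bool] = None
    completion_rate: Optional[float] = None   # fraction of the movie watched, 0.0 to 1.0
    viewer_rating: Optional[float] = None
    rejection_reason: Optional[str] = None


@dataclass
class TrainingExample:
    features: MovieFeatures
    labels: MovieLabels = field(default_factory=MovieLabels)


def is_positive_example(example: TrainingExample, min_completion: float = 0.8) -> bool:
    """Treat an accepted suggestion that was mostly watched as a positive signal.

    The 0.8 threshold is an assumption chosen for this sketch.
    """
    labels = example.labels
    return bool(labels.suggestion_accepted) and (labels.completion_rate or 0.0) >= min_completion
```

Writing the schema down early makes it easier to check that each user need from the exercise has at least one feature or label behind it.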

Now that you've completed this exercise, you can start to see several potential data sources emerge:

  • User data from the app (specified preferences and viewing history)
  • Movie information from the app (title, year, duration, cast, etc.)
  • Movie reviews and rating information from other sources, such as IMDb and MovieLens

Once you have an idea of the type of data you will need, consider Google's AI Principles and Responsible AI Practices as examples of frameworks to help work through key considerations, such as privacy (e.g., "give opportunities for notice and consent") and fairness (e.g., "conduct iterative user testing to incorporate a diverse set of users' needs in the development cycles").

And finally, as you prepare your training dataset, make sure to gather data that is realistic and reflects the "noisy" data that is out in the world. For example, make sure to include movie reviews with spelling mistakes, abbreviations, emojis and unusual or unexpected characters, because your app's users will most likely be contributing similarly real and "noisy" reviews in the future, rather than perfectly formatted ones!

Pattern:

As you develop your training dataset, don't strive for something perfectly curated. Instead, allow some "noise" to make the data as similar as possible to the real-world data you expect to get from your users. This can help head off errors and poor-quality recommendations once you release your model into the real world.

To do this, think about the types of data that you expect to get from your users, and then ensure that data is represented in your training set.

For example, for an image recognition system, consider the data you might get from your users. If it's likely that they won't have the time to take high-quality photographs, and your model will have to work with blurry smartphone images, include blurry images in your training data.

Link to full pattern: https://pair.withgoogle.com/guidebook/patterns#embrace-noisy-data
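
For the movie reviews in this scenario, one way to check whether your training set contains this kind of noise is to measure how many reviews match a few rough patterns. The sketch below is an illustration only: the pattern names and regular expressions are assumptions, and they will miss many kinds of real-world noise (spelling mistakes, for example, would need a dictionary or a spell checker).

```python
import re
from typing import Dict, List

# Rough, illustrative patterns. Real noise detection would need more care.
NOISE_CHECKS = {
    "non_ascii": re.compile(r"[^\x00-\x7F]"),       # emojis, accented letters, etc.
    "repeated_chars": re.compile(r"(.)\1{2,}"),     # e.g. "sooo"
    "all_caps_word": re.compile(r"\b[A-Z]{3,}\b"),  # e.g. "BEST"
}


def noise_coverage(reviews: List[str]) -> Dict[str, float]:
    """Return the fraction of reviews that match each noise pattern."""
    if not reviews:
        return {name: 0.0 for name in NOISE_CHECKS}
    return {
        name: sum(1 for review in reviews if pattern.search(review)) / len(reviews)
        for name, pattern in NOISE_CHECKS.items()
    }


if __name__ == "__main__":
    sample = ["Great movie.", "sooo good 😍", "BEST film ever", "ok"]
    for name, share in noise_coverage(sample).items():
        print(f"{name}: {share:.0%} of reviews")
```

If a pattern that you expect from real users shows up in close to 0% of your training reviews, that is a hint that the dataset may be more curated than the data the app will actually receive.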

With the user needs mapped to an AI problem and dataset needs, you're ready to train the AI to provide recommendations and label movies for your app's users. While we won't cover this part of the process in this codelab, you can learn more about recommendation systems and sentiment analysis in the following resources:

As you design the user experience for your app, you'll want to plan for onboarding users to the new AI-powered feature, and for helping them set their expectations appropriately. Users shouldn't implicitly trust your AI system in all circumstances. Instead, they should be able to calibrate their trust correctly.

Setting expectations with users is a deliberate process that starts even before their first interaction with your product. You'll want to provide explanations throughout the product experience, and outside of it, in a variety of ways:

  • Explain in-the-moment. When appropriate, provide reasons for a given inference, recommendation, suggestion, etc.
  • Provide additional explanations in the product. Leverage other in-product moments, such as onboarding, to explain AI systems.
  • Go beyond the product experience. In-product information may not be sufficient, but you can support it with a variety of additional resources, such as marketing campaigns to raise awareness, and educational materials and literacy campaigns to develop mental models.

Let's take an example: a user has logged on to the app, and selects a recommended movie from the new list that's been added to their landing page. In addition to the usual information about the movie that they can expect to find there, you may want to include an explanation for why they are seeing this specific movie in their recommended list.

Using the PAIR Guidebook's search by question, and selecting "How do I explain my AI system to users?", you find the following pattern: Explain for understanding, not completeness.

Pattern:

When explaining recommendations from your AI system, focus on sharing the information that users need to make decisions and move forward. Don't attempt to explain everything that's happening in the system.

Often, the rationale behind a particular prediction is unknown or too complex to be summarized in a simple phrase or sentence. Users may also not want to be overwhelmed or distracted by superfluous explanations as they use your product.

The Explainability + Trust chapter offers examples of different approaches for crafting succinct, user-friendly explanations, which include partial explanations, progressive disclosure and model confidence displays.

If you'd like to share longer or more detailed explanations of how the overall system works, do this outside of the active user flow, for example in marketing materials or onboarding content.

Link to full pattern: https://pair.withgoogle.com/guidebook/patterns#explain-for-understanding

Applying best practices outlined in this pattern, you decide to display an explanation that looks like this:

[Figure: Movie recommendation, including rationale for the recommendation]

In this example, you're applying the PAIR Guidebook's guidance in Explainability + Trust and using data sources as the explanation, by showing the user the three top-voted user reviews.

Furthermore, you've highlighted the specific words that contributed the most to the positive sentiment for this movie, which are factors that matter to the user in their movie selection. You can learn more about exploring a sentiment classifier in this demo for PAIR's Language Interpretability Tool (LIT).
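
To make the highlighting step concrete, here is a minimal Python sketch. It assumes that you already have a per-word contribution score for a review, for example from a salience method in a tool such as LIT. The function name, the input format, and the scores used in the test are all hypothetical, and this is not how LIT itself renders its output.

```python
from typing import List, Tuple


def highlight_top_words(token_scores: List[Tuple[str, float]], top_k: int = 2) -> str:
    """Wrap the top_k most positive words in asterisks and rejoin the text.

    token_scores is a list of (word, score) pairs in reading order, where a
    higher score means a larger contribution to positive sentiment.
    """
    # Indices of words with a positive contribution, largest score first.
    ranked = sorted(
        (i for i, (_, score) in enumerate(token_scores) if score > 0),
        key=lambda i: token_scores[i][1],
        reverse=True,
    )
    to_highlight = set(ranked[:top_k])
    return " ".join(
        f"*{word}*" if i in to_highlight else word
        for i, (word, _) in enumerate(token_scores)
    )
```

Limiting the highlight to a few words follows the "explain for understanding, not completeness" pattern: the user sees what mattered most, not every score.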

[Figure: Screenshot of the Language Interpretability Tool (LIT)]

LIT can help you inspect Natural Language Processing (NLP) model behavior through a visual, interactive, and extensible tool, which allows you to test hypotheses and validate them over a dataset with relevant metrics and local explanations (e.g., salience maps and prediction visualizations). Product teams can use LIT in the following example use cases:

  • Before deploying a model
  • When testing for fairness
  • To debug individual predictions
  • When comparing a new model to an old one

Another way to set expectations with users is to display model confidence in recommendations. Rather than stating why or how the AI came to a certain decision, model confidence displays show how certain the AI is in its prediction, and the alternatives it considered. As most models can output n-best classifications and confidence scores, model confidence displays are often a readily available explanation.

Before adding confidence to the recommended movie pages shown to the user, you'll want to determine whether this confidence is helpful to the user, and if so, what the best way of displaying it may be.

Pattern:

In some situations, you can help users gauge how much trust to put in the AI output with model confidence displays that explain how certain the AI is in its prediction, and the alternatives considered.

However, in other contexts, confidence displays can be challenging for users to understand.

If you decide to use them, test different types of displays early in the product development process to find what works best for your users.

Link to full pattern: https://pair.withgoogle.com/guidebook/patterns#how-to-show-model-confidence

Reviewing recommended approaches for displaying confidence in the PAIR Guidebook's chapter on Explainability + Trust, you find the following options:

  • N-most likely classifications
  • Numeric confidence level

Given that you're presenting a list of recommended movies to the user on their landing page in the app, you opt for an ordered list, where the n-most likely movies are shown in order.

[Figure: Carousel of recommended movies, with the most likely recommendations displayed first]
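
The ordered list can be sketched in a few lines of Python. This is an illustration under assumptions: the function name is hypothetical, and it assumes that your model returns one confidence score per candidate movie, which may not match how your recommendation system exposes its output.

```python
from typing import Dict, List


def top_n_recommendations(confidence_by_movie: Dict[str, float], n: int = 3) -> List[str]:
    """Return the n movie titles with the highest confidence, most likely first."""
    ranked = sorted(confidence_by_movie.items(), key=lambda item: item[1], reverse=True)
    return [title for title, _ in ranked[:n]]
```

Note that the scores themselves are never shown to the user here. The ordering alone communicates relative confidence, which avoids asking users to interpret a raw number.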

In the last couple of steps, you've learned some best practices for setting expectations and providing explanations to users, as you help them build and calibrate their trust in the product.

Another important question that you'll want to answer is: what does the user experience look like when an error occurs? How users move forward is equally important. Focusing on what your users can do after the system fails empowers them while maintaining the usefulness of your product.

As explained in the PAIR Guidebook's chapter on Errors + Graceful Failure, you'll want to start by defining what an error is for your product, and what type of error it is (user, system, or context).

Take the following examples:

  1. The user gets a recommendation for a movie that they have already seen at the movies. While this recommendation may not be wrong, it is not really helpful to the user either.
  2. The user gets a recommendation for a movie that they have already seen at the movies, and didn't like. The user may consider this recommendation an error, based on their preferences in movies.
  3. The user gets a recommendation for a movie in a genre that they don't usually enjoy. The user may find this to be an error.
  4. The user gets a recommendation for a movie that is no longer hosted on the app. This is a clear system error.

In the second and third examples listed above, the AI has provided a recommendation that is not a helpful one for this user. In case such errors occur, you'll want to provide the user with an opportunity to give feedback on the prediction, and you'll want to communicate the time to impact, as recommended in the PAIR Guidebook's chapter on Feedback + Control.
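
As a way of defining these cases for your own product, you could tag each recommendation with the example it resembles. The Python sketch below is hypothetical throughout: the class, the labels, and the rating threshold of 3.0 are assumptions, and it presumes the app already knows which movies the user has seen and how they rated them, which may not hold for movies watched elsewhere.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ViewerProfile:
    """What the app knows about one user. Field names are illustrative."""
    seen_ratings: Dict[str, float] = field(default_factory=dict)  # movie_id -> user's rating
    disliked_genres: Set[str] = field(default_factory=set)


def review_recommendation(
    movie_id: str,
    genre: str,
    profile: ViewerProfile,
    catalogue: Set[str],
    dislike_below: float = 3.0,
) -> str:
    """Return a label describing which of the four example cases applies, or "ok"."""
    if movie_id not in catalogue:
        return "system_error"        # example 4: movie no longer hosted
    if movie_id in profile.seen_ratings:
        if profile.seen_ratings[movie_id] < dislike_below:
            return "seen_and_disliked"   # example 2
        return "already_seen"            # example 1
    if genre in profile.disliked_genres:
        return "unwanted_genre"      # example 3
    return "ok"
```

Labels like these could then decide what the user sees: a system error might be filtered out before display, while the other cases are the ones where a feedback option matters most.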

Pattern:

When your AI-enabled system behaves in a way that a user doesn't expect or want, make sure that they have an option to share feedback. And, as much as possible, use that feedback to improve your model.

Feedback in AI systems can take a range of forms, including:

  • Giving a thumbs up or thumbs down on a recommendation
  • Hiding unwanted recommendations
  • Flagging or reporting problematic recommendations
  • More traditional feedback flows, where a user manually reports a problem through a form or other mechanism

Once a user gives feedback, acknowledge that you received it. If possible, let them know how the system will respond to the feedback.

Link to full pattern: https://pair.withgoogle.com/guidebook/patterns#let-users-give-feedback

In the case of your app, this may look like:

[Figure: Users can give feedback on recommendations...]

[Figure: ...and they are notified about what will happen next]
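
The feedback flow above can be sketched in Python as follows. This is a minimal illustration, not the app's actual implementation: the enum values mirror the feedback forms listed in the pattern, while the class names, the message wording, and the stated time to impact ("the next time you open the app") are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class FeedbackKind(Enum):
    THUMBS_UP = "thumbs_up"
    THUMBS_DOWN = "thumbs_down"
    HIDE = "hide"
    REPORT = "report"


@dataclass
class Feedback:
    user_id: str
    movie_id: str
    kind: FeedbackKind


# Illustrative wording. Real copy should be tested with users.
ACKNOWLEDGEMENTS = {
    FeedbackKind.THUMBS_UP: "Thanks! We'll suggest more movies like this one.",
    FeedbackKind.THUMBS_DOWN: "Thanks! We'll suggest fewer movies like this one.",
    FeedbackKind.HIDE: "Got it. This movie won't appear in your recommendations again.",
    FeedbackKind.REPORT: "Thanks for the report. Our team will review this recommendation.",
}

# Assumed time to impact. Use whatever your system can actually deliver.
TIME_TO_IMPACT = "Your recommendations will update the next time you open the app."


def acknowledge(feedback: Feedback) -> str:
    """Confirm that feedback was received and say what happens next."""
    message = ACKNOWLEDGEMENTS[feedback.kind]
    if feedback.kind is FeedbackKind.REPORT:
        return message  # reports go to manual review, so no update timing is promised
    return f"{message} {TIME_TO_IMPACT}"
```

Keeping the acknowledgement and the time to impact in one message means that the user learns both that the feedback was received and when to expect a change.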

Congratulations! You've just been through an example workflow that shows you how to use some of the PAIR Guidebook's new resources.

Summary

In this codelab, you learned how to:

  • Translate user needs to an AI problem
  • Build a dataset for the task
  • Onboard users to the new feature
  • Explain the system and set user expectations
  • Give the user a way forward from errors
  • Gather feedback to improve the product

What's next?

You can find all of the resources highlighted in this codelab, and many more, at the following links: