LLM prompt debugging with the Learning Interpretability Tool (LIT) on GCP

1. Overview

This lab provides a detailed walk-through on deploying a LIT application server on Google Cloud Platform (GCP) to interact with Vertex AI Gemini foundation models and self-hosted third-party large language models (LLMs). It also includes guidance on how to use the LIT UI for prompt debugging and model interpretation.

By following this lab, users will learn how to:

  • Configure a LIT server on GCP.
  • Connect the LIT server to Vertex AI Gemini models or other self-hosted LLMs.
  • Utilize the LIT UI to analyze, debug, and interpret prompts for better model performance and insights.

What is LIT?

LIT is a visual, interactive model-understanding tool that supports text, image, and tabular data. It can be run as a standalone server, or inside of notebook environments such as Google Colab, Jupyter, and Google Cloud Vertex AI. LIT is available from PyPI and GitHub.

Originally built to understand classification and regression models, LIT has recently added tools for debugging LLM prompts, allowing you to explore how user, model, and system content influence generation behavior.

What is Vertex AI and Model Garden?

Vertex AI is a machine learning (ML) platform that lets you train and deploy ML models and AI applications, and customize LLMs for use in your AI-powered applications. Vertex AI combines data engineering, data science, and ML engineering workflows, enabling your teams to collaborate using a common toolset and scale your applications using the benefits of Google Cloud.

Vertex Model Garden is an ML model library that helps you discover, test, customize, and deploy Google proprietary and select third party models and assets.

What you will do

You will use Google Cloud Shell and Cloud Run to deploy a Docker container from LIT's prebuilt image.

Cloud Run is a managed compute platform that lets you run containers directly on top of Google's scalable infrastructure, including on GPUs.

Dataset

The demo uses the LIT prompt debugging sample dataset by default, or you can load your own via the UI.

Before you begin

For this reference guide, you need a Google Cloud project. You can create a new one, or select a project you already created.

2. Launch Google Cloud Console and a Cloud Shell

You will launch a Google Cloud Console and use the Google Cloud Shell in this step.

2-a: Launch a Google Cloud Console

Launch a browser and go to Google Cloud Console.

The Google Cloud Console is a powerful, secure web admin interface that lets you manage your Google Cloud resources quickly, from anywhere.

2-b: Launch a Google Cloud Shell

Cloud Shell is an online development and operations environment that you can access anywhere with your browser. You can manage your resources from its online terminal, which comes preloaded with utilities such as the gcloud command-line tool, kubectl, and more. You can also develop, build, debug, and deploy your cloud-based apps using the online Cloud Shell Editor. Cloud Shell provides a developer-ready online environment with a preinstalled set of favorite tools and 5 GB of persistent storage. You will use the command prompt in the next steps.

Launch a Google Cloud Shell using the icon in the upper right of the menu bar, circled in blue in the following screenshot.

Google Cloud Console Launch

You should see a terminal with a Bash shell at the bottom of the page.

Google Cloud Console

2-c: Set Google Cloud Project

You must set the project ID and region using the gcloud command.

# Set your GCP Project ID.
gcloud config set project your-project-id

# Set your GCP Project Region.
gcloud config set run/region your-project-region
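
To confirm the configuration took effect, you can optionally read the values back:

# Optional: verify the active project and Cloud Run region.
gcloud config get-value project
gcloud config get-value run/region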

3. Deploy the LIT App Server Docker Image with Cloud Run

3-a: Deploy LIT App to Cloud Run

You first need to set the latest version of LIT-App as the version to be deployed.

# Set latest version as your LIT_SERVICE_TAG.
export LIT_SERVICE_TAG=latest
# List all the public LIT GCP App server docker images.
gcloud container images list-tags us-east4-docker.pkg.dev/lit-demos/lit-app/gcp-lit-app

After setting the version tag, you need to name the service.

# Set your lit service name. While 'lit-app-service' is provided as a placeholder, you can customize the service name based on your preferences.
export LIT_SERVICE_NAME=lit-app-service

After that, you can run the following command to deploy the container to Cloud Run.

# Use below cmd to deploy the LIT App to Cloud Run.
gcloud run deploy $LIT_SERVICE_NAME \
--image us-east4-docker.pkg.dev/lit-demos/lit-app/gcp-lit-app:$LIT_SERVICE_TAG \
--port 5432 \
--cpu 8 \
--memory 32Gi \
--no-cpu-throttling \
--no-allow-unauthenticated

LIT also allows you to add the dataset when starting the server. To do that, set the DATASETS variable to include the data you want to load, using the format name:path, for example, data_foo:/bar/data_2024.jsonl. The dataset format should be .jsonl, where each record contains prompt and optional target and source fields. To load multiple datasets, separate them with a comma. If not set, the LIT prompt debugging sample dataset will be loaded.

# Set the dataset.
export DATASETS=[DATASETS]
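
As a concrete (hypothetical) illustration of the format, a dataset file might contain records like the ones in the comments below, and DATASETS can list one or more name:path pairs; all names and paths here are placeholders.

# Example only: each line of the .jsonl file is a JSON record with a "prompt"
# and optional "target" and "source" fields, for instance:
#   {"prompt": "Summarize: The quick brown fox jumps over the lazy dog.", "target": "A fox jumps over a dog.", "source": "demo"}
#   {"prompt": "Translate to French: Hello, world."}

# Hypothetical dataset names and paths; separate multiple datasets with a comma.
export DATASETS=data_2023:/bar/data_2023.jsonl,data_2024:/bar/data_2024.jsonl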

By setting MAX_EXAMPLES, you can set the maximum number of examples to load from each evaluation set.

# Set the max examples.
export MAX_EXAMPLES=[MAX_EXAMPLES]

Then, in the deploy command, you can add

--set-env-vars "DATASETS=$DATASETS" \
--set-env-vars "MAX_EXAMPLES=$MAX_EXAMPLES" \

3-b: View LIT App Service

After creating the LIT App server, you can find the service in the Cloud Run section of Cloud Console.

Select the LIT App service that you just created. Make sure the service name is the same as LIT_SERVICE_NAME.

Google Cloud Console Cloud Run list

You can find the service URL by clicking the service you just deployed.

Google Cloud Find Service URL
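
If you prefer the command line, you can also print the URL with gcloud instead of looking it up in the Console:

# Optional: print the service URL from the command line.
gcloud run services describe $LIT_SERVICE_NAME --format='value(status.url)'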

Then you should be able to view the LIT UI. If you encounter an error, check the Troubleshooting section.

LIT Demo home

You can check the LOGS section to monitor activity, view error messages, and track the progress of the deployment.

Google Cloud Console Cloud Run Log

You can check the METRICS section to view the metrics of the service.

Google Cloud Console Cloud Run Metrics

3-c: Load Datasets

Click the Configure option in the LIT UI and select Dataset. Load a dataset by specifying a name and providing the dataset URL. The dataset format should be .jsonl, where each record contains a prompt and optional target and source fields.

LIT Load Dataset

4. Prepare Gemini Models in Vertex AI Model Garden

Google's Gemini foundation models are available through the Vertex AI API. LIT provides the VertexAIModelGarden model wrapper to use these models for generation; simply specify the desired version (e.g., "gemini-1.5-pro-001") via the model name parameter. A key advantage of these models is that they require no additional deployment effort: by default, you have immediate access to models such as Gemini 1.0 Pro and Gemini 1.5 Pro on GCP, with no extra configuration steps.

4-a: Grant Vertex AI Permissions

To query Gemini on GCP, you need to grant Vertex AI permissions to the service account. Make sure you are using the Default compute service account, and copy its service account email.

Service Accounts of GCP

Add the service account email as a principal with the Vertex AI User role in your IAM allowlist.

Add Service Accounts Email
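
If you prefer the command line, the equivalent grant can be made with gcloud; the project ID and service account email below are placeholders, and roles/aiplatform.user is the Vertex AI User role.

# Grant the Vertex AI User role to the default compute service account (placeholders shown).
gcloud projects add-iam-policy-binding your-project-id \
--member="serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com" \
--role="roles/aiplatform.user"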

4-b: Load Gemini Models

You will load Gemini models and adjust their parameters by following the steps below.

    1. Click the Configure option in the LIT UI.
    2. Select the gemini option under the Select a base model option.
    3. Name the model in new_name.
    4. Enter your selected Gemini model as the model_name.
    5. Click Load Model.
    6. Click Submit.

LIT Load Gemini Model

5. Deploy Self-Hosted LLMs Model Server on GCP

Self-hosting LLMs with LIT's model server docker image allows you to use LIT's salience and tokenize functions to gain deeper insights into model behavior. The model server image works with KerasNLP or Hugging Face Transformers models, including library-provided and self-hosted weights, e.g., on Google Cloud Storage.

5-a: Configure Models

Each container loads one model, configured using environment variables.

Specify the model to load by setting MODEL_CONFIG. The format should be name:path, for instance model_foo:model_foo_path. The path can be a URL, a local file path, or the name of a preset for the configured deep learning framework (see the variable descriptions below for more). This server is tested with Gemma, GPT2, Llama, and Mistral on all supported DL_FRAMEWORK values. Other models should work, but adjustments might be needed.

# Set the model you want to load. While 'gemma2b' is given as a placeholder, you can load your preferred model by following the instructions above.
export MODEL_CONFIG=gemma2b:gemma_2b_en

Additionally, the LIT model server allows configuration of various environment variables using the command below. Please refer to the variable descriptions below for details. Note that each variable must be set individually.

# Customize the variable value as needed.
export [VARIABLE]=[VALUE]

Variable (accepted values): Description

  • DL_FRAMEWORK (kerasnlp, transformers): The modeling library used to load the model weights onto the specified runtime. Defaults to kerasnlp.
  • DL_RUNTIME (torch, tensorflow): The deep learning backend framework that the model runs on. All models loaded by this server use the same backend; incompatibilities will result in errors. Defaults to torch.
  • PRECISION (bfloat16, float32): Floating-point precision for the LLM. Defaults to bfloat16.
  • BATCH_SIZE (positive integers): The number of examples to process per batch. Defaults to 1.
  • SEQUENCE_LENGTH (positive integers): The maximum sequence length of the input prompt plus generated text. Defaults to 512.
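
For example, a hypothetical configuration for a Hugging Face Transformers model running on PyTorch might export the following values before deployment:

# Example values only; adjust to your model and hardware.
export DL_FRAMEWORK=transformers
export DL_RUNTIME=torch
export PRECISION=bfloat16
export BATCH_SIZE=1
export SEQUENCE_LENGTH=512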

5-b: Deploy Model Server to Cloud Run

You first need to set the latest version of Model Server as the version to be deployed.

# Set latest as MODEL_VERSION_TAG.
export MODEL_VERSION_TAG=latest
# List all the public LIT GCP model server docker images.
gcloud container images list-tags us-east4-docker.pkg.dev/lit-demos/lit-app/gcp-model-server

After setting the version tag, you need to name your model server.

# Set your Service name.
export MODEL_SERVICE_NAME='gemma2b-model-server'

After that, you can run the following command to deploy the container to Cloud Run. If you don't set the environment variables, default values will be applied. As most LLMs require expensive compute resources, it is highly recommended to use a GPU. If you prefer to run on CPU only (which works fine for small models like GPT2), you can remove the related arguments --gpu 1 --gpu-type nvidia-l4 --max-instances 7.

# Deploy the model service container.
gcloud beta run deploy $MODEL_SERVICE_NAME \
--image us-east4-docker.pkg.dev/lit-demos/lit-app/gcp-model-server:$MODEL_VERSION_TAG \
--port 5432 \
--cpu 8 \
--memory 32Gi \
--no-cpu-throttling \
--gpu 1 \
--gpu-type nvidia-l4  \
--max-instances 7  \
--set-env-vars "MODEL_CONFIG=$MODEL_CONFIG" \
--no-allow-unauthenticated

Besides, you can customize environment variables by adding the following commands. Only include the environment variables that are necessary for your specific needs.

--set-env-vars "DL_FRAMEWORK=$DL_FRAMEWORK" \
--set-env-vars "DL_RUNTIME=$DL_RUNTIME" \
--set-env-vars "PRECISION=$PRECISION" \
--set-env-vars "BATCH_SIZE=$BATCH_SIZE" \
--set-env-vars "SEQUENCE_LENGTH=$SEQUENCE_LENGTH" \

Additional environment variables may be necessary to access certain models. See instructions from Kaggle Hub (used for KerasNLP models) and Hugging Face Hub as appropriate.
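
For instance, Hugging Face Hub models commonly read an access token from HF_TOKEN, and Kaggle (used for KerasNLP presets) reads KAGGLE_USERNAME and KAGGLE_KEY; confirm the exact variable names in each hub's documentation. If needed, they can be passed to the deploy command the same way as the other variables:

# Hypothetical credential variables; verify the names against the Kaggle Hub and Hugging Face Hub docs.
--set-env-vars "HF_TOKEN=$HF_TOKEN" \
--set-env-vars "KAGGLE_USERNAME=$KAGGLE_USERNAME" \
--set-env-vars "KAGGLE_KEY=$KAGGLE_KEY" \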

5-c: Access Model Server

After creating the model server, you can find the started service in the Cloud Run section of your GCP project.

Select the model server that you just created. Make sure the service name is the same as MODEL_SERVICE_NAME.

Google Cloud Console Cloud Run list

You can find the service URL by clicking the model service you just deployed.

Google Cloud Find Service URL

You can check the LOGS section to monitor activity, view error messages, and track the progress of the deployment.

Google Cloud Console Cloud Run Log

You can check the METRICS section to view the metrics of the service.

Google Cloud Console Cloud Run Metrics

5-d: Load Self-Hosted Models

If you proxy your LIT server from Step 3 (see the Troubleshooting section), you'll need to obtain your GCP identity token by running the following command.

# Find your GCP identity token.
gcloud auth print-identity-token

You will load self-hosted models and adjust their parameters by following the steps below.

  1. Click the Configure option in the LIT UI.
  2. Select the LLM (self hosted) option under Select a base model option.
  3. You need to name the model in new_name.
  4. Enter your model server url as the base_url.
  5. Input the obtained identity token in the identity_token field if you proxy the LIT App server (see Step 3 and Step 7). Otherwise, leave it empty.
  6. Click Load Model.
  7. Click Submit.

LIT Load LLM Models

6. Interact with LIT on GCP

LIT offers a rich set of features to help you debug and understand model behavior. You can do something as simple as querying the model by typing text into a box and viewing its predictions, or inspect the model in depth with the suite of powerful features covered in the subsections below.

6-a: Query the Model via LIT

LIT automatically queries the dataset once the model and dataset are loaded. You can view the response of each model by selecting it in the columns.

LIT View Response

LIT Response

6-b: Use Sequence Salience Technique

Currently, the Sequence Salience technique in LIT only supports self-hosted models.

Sequence Salience is a visual tool that helps debug LLM prompts by highlighting which parts of a prompt are most important for a given output. For more on how to use this feature, check the full Sequence Salience tutorial.

To access salience results, click on any input or output in the prompt or response, and the salience results will be displayed.

LIT Sequence Salience Results

6-c: Manually Edit Prompt and Target

LIT allows you to manually edit the prompt and target of any existing datapoint. Clicking Add appends the new input to the dataset.

LIT Manually Edit

6-d: Compare Prompts Side-by-Side

LIT allows you to compare prompts side by side for original and edited examples. You can manually edit an example and view the prediction result and Sequence Salience analysis for both the original and edited versions simultaneously. You can modify the prompt for each datapoint, and LIT will generate the corresponding response by querying the model.

LIT Compare Datasets

6-e: Compare Multiple Models Side-by-Side

LIT enables side-by-side comparison of models on individual text generation and scoring examples, as well as on aggregated examples for specific metrics. By querying various loaded models, you can easily compare the differences in their responses.

LIT Compare Model Response

6-f: Automatic Counterfactual Generators

You can use automatic counterfactual generators to create alternative inputs, and see how your model behaves on them right away.

LIT Automatically Generate Input

6-g: Evaluate model performance

You can assess model performance using metrics (currently, BLEU and ROUGE scores are supported for text generation) across the entire dataset, or on any subset of filtered or selected examples.

LIT View Model Metrics

7. Troubleshooting

7-a: Potential access issues and solutions

As --no-allow-unauthenticated is applied when deploying to Cloud Run, you may encounter forbidden errors as shown below.

Google Cloud Forbidden Error

There are two approaches to access the LIT App service.

1. Proxy to Local Service

You can proxy the service to local host using the command below.

# Proxy the service to local host.
gcloud run services proxy $LIT_SERVICE_NAME

Then you should be able to access the LIT server by clicking the proxied service link.

2. Directly Authenticate Users

You can follow this link to authenticate users, allowing direct access to the LIT App service. This approach can also enable a group of users to access the service. For development involving collaboration with multiple people, this is a more effective option.
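
As part of that setup, you typically grant the users the Cloud Run Invoker role on the service; a minimal sketch with a placeholder email is shown below (requests still need to carry an identity token, for example via the proxy above).

# Allow a specific user to invoke the LIT App service (placeholder email).
gcloud run services add-iam-policy-binding $LIT_SERVICE_NAME \
--member="user:someone@example.com" \
--role="roles/run.invoker"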

7-b: Checks to ensure the Model Server has launched successfully

To ensure the model server has launched successfully, you can query it directly by sending a request. The model server provides three endpoints: predict, tokenize, and salience. Make sure you provide both the prompt and target fields in your request.

# Query the model server predict endpoint.
curl -X POST http://YOUR_MODEL_SERVER_URL/predict -H "Content-Type: application/json" -d '{"inputs":[{"prompt":"[YOUR PROMPT]", "target":"[YOUR TARGET]"}]}'

# Query the model server tokenize endpoint.
curl -X POST http://YOUR_MODEL_SERVER_URL/tokenize -H "Content-Type: application/json" -d '{"inputs":[{"prompt":"[YOUR PROMPT]", "target":"[YOUR TARGET]"}]}'

# Query the model server salience endpoint.
curl -X POST http://YOUR_MODEL_SERVER_URL/salience -H "Content-Type: application/json" -d '{"inputs":[{"prompt":"[YOUR PROMPT]", "target":"[YOUR TARGET]"}]}'
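
Because the server is deployed with --no-allow-unauthenticated, a direct request generally also needs an identity token; a minimal sketch for the predict endpoint (URL and payload are placeholders):

# Include an identity token when the service requires authentication.
curl -X POST http://YOUR_MODEL_SERVER_URL/predict \
-H "Authorization: Bearer $(gcloud auth print-identity-token)" \
-H "Content-Type: application/json" \
-d '{"inputs":[{"prompt":"[YOUR PROMPT]", "target":"[YOUR TARGET]"}]}'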

If you encounter an access issue, check out section 7-a above.

8. Congratulations

Well done on completing the codelab! Time to chill!

Clean up

To clean up the lab, delete all the Google Cloud Services created for the lab. Use Google Cloud Shell to run the following commands.

If the Google Cloud connection was lost because of inactivity, reset the variables by following the previous steps.

# Delete the LIT App Service.
gcloud run services delete $LIT_SERVICE_NAME

If you started a model server, you also need to delete it.

# Delete the Model Service.
gcloud run services delete $MODEL_SERVICE_NAME

Further reading

Continue learning about the LIT tool's features with the following materials:

Contact

For any questions or issues with this codelab, please reach out to us on GitHub.

License

This work is licensed under a Creative Commons Attribution 4.0 International License.