Fine-Tuning Large Language Models: How Vertex AI Takes LLMs to the Next Level

About this codelab

Last updated May 2, 2024
Written by Author: Abirami Sukumaran, Editor: Muthu Ganesh

1. Introduction

Why fine-tuning matters

Foundation models are trained for general purposes and sometimes don't perform tasks as well as you'd like. This can happen when the tasks are specialized and difficult to teach a model through prompt design alone. In these cases, you can use model tuning to improve the model's performance on specific tasks. Model tuning can also help the model adhere to specific output requirements when instructions aren't sufficient. Large language models (LLMs) hold a vast amount of information and can perform many tasks, but they perform best on specialized tasks when given specialized training. Fine-tuning lets you adapt a pre-trained LLM to your specific needs.

In this codelab, you'll learn how to fine-tune an LLM using the supervised tuning approach.

Supervised tuning improves the performance of a model by teaching it a new skill. A dataset of hundreds of labeled examples is used to teach the model to mimic a desired behavior or task. We are going to provide a labeled dataset of input text (prompt) and output text (response) to teach the model how to customize its responses for our specific use case.
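As a rough illustration of what such a labeled dataset can look like, the sketch below serializes prompt/response pairs as JSON Lines. The field names input_text and output_text are an assumption about the schema the tuning job expects, and the helper name to_jsonl and the sample pair are hypothetical placeholders, not rows from the codelab's data.

```python
import json


def to_jsonl(pairs):
    """Serialize (prompt, response) pairs as JSON Lines, one record per line."""
    lines = []
    for prompt, response in pairs:
        # Assumed schema: "input_text" holds the prompt, "output_text" the label.
        record = {"input_text": prompt, "output_text": response}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Hypothetical placeholder pair, for illustration only.
sample_pairs = [
    ("Summarize this text to generate a title: <article body>", "<headline>"),
]
jsonl_text = to_jsonl(sample_pairs)
print(jsonl_text)
```

Each line of the resulting text is an independent JSON object, which is why the file can later be read one record at a time.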

More information about model customization is available here.

What you'll build

Use case: Generate headlines for news articles

Let's assume that you want to automatically generate headlines for news articles. Using Vertex AI, you can fine-tune an LLM that generates a suitable summarized title in a specific style and customizes the title as per the news channel's guidelines.

In this codelab, you'll perform the following:

  • Use BBC FULLTEXT DATA (made available by BigQuery Public Dataset bigquery-public-data.bbc_news.fulltext).
  • Fine-tune an LLM (text-bison@002) to a new fine-tuned model called "bbc-news-summary-tuned" and compare the result to the response from the base model. A sample JSONL file is available for this codelab in the repository. You can upload the file to your Cloud Storage Bucket and execute the following fine-tuning steps:
  • Prepare your data: Start with a dataset of news articles and their corresponding headlines, like the BBC News dataset used in the example code.
  • Fine-tune a pre-trained model: Choose a base model like "text-bison@002" and fine-tune it with your news data using Vertex AI SDK for Python.
  • Evaluate the results: Compare the performance of your fine-tuned model with the base model to see the improvement in headline generation quality.
  • Deploy and use your model: Make your fine-tuned model available through an API endpoint and start generating headlines for new articles automatically.

2. Before you begin

  1. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
  2. Ensure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.
  3. Open a Colab notebook and log in with the same account as your currently active Google Cloud account.

3. Fine-tune a large language model

This codelab uses the Vertex AI SDK for Python to fine-tune the model. You can also perform fine-tuning through the other available options: HTTP, a curl command, the Java SDK, or the Console.

You can fine-tune and evaluate your model for customized responses in 5 steps. You can refer to the full code in the file llm_fine_tuning_supervised.ipynb from the repository.

4. Step 1: Install and Import dependencies

!pip install google-cloud-aiplatform
!pip install --user datasets
!pip install --user google-cloud-pipeline-components

Follow the rest of the steps as shown in the .ipynb file in the repo. Make sure you replace PROJECT_ID and BUCKET_NAME with your own project ID and bucket name.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # silence TensorFlow C++ logs
import warnings
warnings.filterwarnings('ignore')

import sys
import uuid
import json

import kfp
import pandas as pd
import vertexai
from google.auth import default
from datasets import load_dataset
from google.cloud import aiplatform
from vertexai.preview.language_models import TextGenerationModel, EvaluationTextSummarizationSpec

# PROJECT_ID and REGION must already be defined (see the notebook) before this call.
vertexai.init(project=PROJECT_ID, location=REGION)

5. Step 2: Prepare and load training data

Replace YOUR_BUCKET with the name of your bucket and upload the sample TRAIN.jsonl training data file to it. The sample data for this use case is provided in the repository mentioned above.

json_url = 'https://storage.googleapis.com/YOUR_BUCKET/TRAIN.jsonl'
df = pd.read_json(json_url, lines=True)
print(df)

This step should produce the following output:

17274866af36a47c.png
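Before uploading a training file, it can help to confirm that every record is usable. The following is a minimal sketch using only the standard library; it assumes each record needs non-empty input_text and output_text string fields, and the function name find_jsonl_problems and the sample text are hypothetical.

```python
import json

REQUIRED_FIELDS = ("input_text", "output_text")  # assumed schema


def find_jsonl_problems(text):
    """Return (line_number, message) tuples for records that look unusable."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "not valid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append((number, f"missing or empty {field}"))
    return problems


# Hypothetical two-line sample: the second record lacks output_text.
example = (
    '{"input_text": "<prompt>", "output_text": "<headline>"}\n'
    '{"input_text": "<prompt>"}'
)
print(find_jsonl_problems(example))  # [(2, 'missing or empty output_text')]
```

An empty result means no problems were found under these assumed rules; it does not guarantee that the tuning service will accept the file.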

6. Step 3: Fine-tune a large language model

You can tune any large language model at this point, subject to tuning support for that model. In this snippet, however, we tune the pretrained model "text-bison@002" with the data frame that holds the training data we loaded in the previous step:

model_display_name = 'bbc-finetuned-model' # @param {type:"string"}
tuned_model = TextGenerationModel.from_pretrained("text-bison@002")
# Launch the supervised tuning job; df is the training DataFrame from Step 2.
tuned_model.tune_model(
    training_data=df,
    train_steps=100,
    tuning_job_location="europe-west4",
    tuned_model_location="europe-west4",
    model_display_name=model_display_name,
)

This step will take a few hours to complete. You can track the progress of fine-tuning using the pipeline job link in the result.

7. Step 4: Predict with the new fine-tuned model

Once the fine-tuning job is complete, you can predict with your new tuned model:

response = tuned_model.predict("Summarize this text to generate a title: \n Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk. They say that the shrinking space on aeroplanes is not only uncomfortable it it's putting our health and safety in danger. More than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans.")
print(response.text)

You should see the following result:

67061c36b7ba39b7.png

To predict with the base model (text-bison@002) for comparison, run the following commands:

base_model = TextGenerationModel.from_pretrained("text-bison@002")
response = base_model.predict("Summarize this text to generate a title: \n Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk. They say that the shrinking space on aeroplanes is not only uncomfortable it it's putting our health and safety in danger. More than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans.")
print(response.text)

You should see the following result:

22ec58e4261405d6.png

Even though both titles generated look appropriate, the first one (generated with the fine-tuned model) is more in tune with the style of titles used in the dataset in question.
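For the comparison to be fair, both models should receive exactly the same prompt. One way to guarantee that, sketched here with a hypothetical helper named build_title_prompt, is to build the prompt once and pass the same string to both predict() calls. The instruction prefix is copied from the prompts above; the article text is shortened for brevity.

```python
# Instruction prefix taken from the prompts used in this step.
PROMPT_PREFIX = "Summarize this text to generate a title: \n "


def build_title_prompt(article_text):
    """Prepend the title-generation instruction to an article body."""
    return PROMPT_PREFIX + article_text.strip()


article = "Ever noticed how plane seats appear to be getting smaller and smaller?"
prompt = build_title_prompt(article)
# Pass this same `prompt` to tuned_model.predict(...) and base_model.predict(...).
print(prompt)
```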

Load the fine-tuned model

Loading a model that you just fine-tuned is straightforward: in step 3, the tuning call runs on the tuned_model object, so that variable still holds the tuned model. But what if you want to invoke a model that was tuned in the past?

To do this, call the get_tuned_model() method on the LLM class with the full resource name of the fine-tuned model from Vertex AI Model Registry. Note that this path uses the PROJECT_NUMBER and the MODEL_NUMBER rather than their respective IDs.

tuned_model_1 = TextGenerationModel.get_tuned_model("projects/<<PROJECT_NUMBER>>/locations/europe-west4/models/<<MODEL_NUMBER>>")
print(tuned_model_1.predict("YOUR_PROMPT"))
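Because the path is easy to mistype, a small helper can assemble it from its parts. This is a sketch only: the function name tuned_model_resource_name is hypothetical, and the numbers below are placeholders rather than real identifiers.

```python
def tuned_model_resource_name(project_number, location, model_number):
    """Build a path in the format passed to get_tuned_model() above."""
    parts = (
        ("project_number", project_number),
        ("location", location),
        ("model_number", model_number),
    )
    for label, value in parts:
        if not str(value).strip():
            raise ValueError(f"{label} must not be empty")
    return f"projects/{project_number}/locations/{location}/models/{model_number}"


# Placeholder numbers for illustration; substitute your own values.
name = tuned_model_resource_name("123456789", "europe-west4", "987654321")
print(name)
```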

8. Step 5: Evaluate the new fine-tuned model

Evaluation is a critical aspect of assessing the quality and relevance of the generated response. It involves examining the output from a generative language model to determine its coherence, accuracy, and alignment with the provided prompt. Model evaluation helps identify areas for improvement, optimize model performance, and ensure that the generated text meets the desired standards for quality and usefulness. Read more about it in the documentation. For now, we will see how to get evaluation metrics for the fine-tuned model and compare them against the base model.

  1. Load the EVALUATION dataset:
json_url = 'https://storage.googleapis.com/YOUR_BUCKET/EVALUATE.jsonl'
df = pd.read_json(json_url, lines=True)
print(df)
  2. Define the evaluation specification for a text summarization task on the fine-tuned model.
task_spec = EvaluationTextSummarizationSpec(
    task_name="summarization",
    ground_truth_data=df,
)
# Start the evaluation job (assumes the preview SDK's evaluate() method).
tuned_model.evaluate(task_spec=task_spec)

This step will take a few minutes to complete. You can track the progress using the pipeline job link in the result. After completion, you should see the following evaluation result:

387843d6c970e02.png

The rougeLSum metric in the evaluation result reports the ROUGE-L score for the summary. ROUGE-L, as described here, is a recall-based metric that measures the overlap between a generated summary and a reference summary. It is calculated by taking the length of the longest common subsequence (LCS) between the two summaries and dividing it by the length of the reference summary.

The rougeLSum score in the result shown is 0.36600753600753694, which means that the summary has a 36.6% overlap with the reference summary.
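To make the calculation concrete, here is a minimal sketch of the recall-style ROUGE-L described above, computed on lowercased whitespace tokens. It is a simplification for illustration; the managed evaluation job may tokenize and aggregate differently, so its numbers are not expected to match this function exactly. The two sentences are made-up examples.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # previous[j] is the LCS length of the tokens of `a` seen so far and b[:j].
    previous = [0] * (len(b) + 1)
    for token_a in a:
        current = [0]
        for j, token_b in enumerate(b, start=1):
            if token_a == token_b:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def rouge_l_recall(candidate, reference):
    """LCS length divided by the reference length, on whitespace tokens."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not ref_tokens:
        return 0.0
    return lcs_length(cand_tokens, ref_tokens) / len(ref_tokens)


score = rouge_l_recall("the cat is on the mat", "the cat sat on the mat")
print(round(score, 4))  # 5 in-order shared tokens / 6 reference tokens = 0.8333
```

Because the LCS only requires tokens to appear in the same order, not contiguously, the metric rewards summaries that preserve the reference's wording and sequence even when extra words are interleaved.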

If you run the same evaluation step on the base model, you will observe that the summary score is higher for the fine-tuned model.

You can find the evaluation results in the Cloud Storage output directory that you specified when creating the evaluation job. The file is named evaluation_metrics.json. For tuned models, you can also view evaluation results in the Google Cloud console on the Vertex AI Model Registry page.

9. Important considerations

  • Model support: Always check the model documentation for the latest compatibility.
  • Rapid development: The field of LLMs advances quickly. A newer, more powerful model could potentially outperform a fine-tuned model built on an older base. The good news is that you can apply these fine-tuning techniques to newer models when the capability becomes available.
  • LoRA: LoRA is a technique for efficiently fine-tuning LLMs. Instead of updating all the parameters of a massive LLM, LoRA freezes the pre-trained weights and introduces trainable low-rank decomposition matrices into the model's layers; the product of these smaller matrices is added to the original weight matrices. This significantly reduces the number of trainable parameters during fine-tuning. Read more about it here.
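The parameter savings from LoRA can be estimated with simple arithmetic. The sketch below compares a full update of one weight matrix with a low-rank update of the same matrix. The function name lora_parameter_counts is hypothetical, and the sizes are illustrative only, not the dimensions of any particular model.

```python
def lora_parameter_counts(d_out, d_in, rank):
    """Compare trainable parameters: full update vs. a low-rank update.

    A full update trains every entry of a d_out x d_in weight matrix.
    LoRA instead trains B (d_out x rank) and A (rank x d_in), whose
    product is added to the frozen weight matrix.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Illustrative sizes only.
full, lora = lora_parameter_counts(d_out=4096, d_in=4096, rank=8)
print(full, lora, f"{lora / full:.4%}")
```

With these example sizes, the low-rank update trains well under one percent of the parameters that a full update of the same matrix would.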

10. Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this codelab, follow these steps:

  1. In the Google Cloud console, go to the Manage resources page.
  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.
  4. Alternatively, go to Model Registry, open the model's Deploy and test tab, undeploy the endpoint, and then delete the deployed tuned model.

11. Congratulations

Congratulations! You have successfully used Vertex AI to fine-tune an LLM. Fine-tuning is a powerful technique that allows you to customize LLMs to your domain and tasks. With Vertex AI, you have the tools and resources you need to fine-tune your models efficiently and effectively.

Explore the GitHub repositories and experiment with the sample code to experience fine-tuning and evaluation firsthand. Consider how fine-tuned LLMs can address your specific needs, from generating targeted marketing copy to summarizing complex documents or translating languages with cultural nuance. Utilize the comprehensive suite of tools and services offered by Vertex AI to build, train, evaluate, and deploy your fine-tuned models with ease.