Fine-tune Gemini on Vertex AI

1. Introduction

In this lab, you'll work through the complete supervised fine-tuning workflow for a Google Gemini model, adapting it to a specific task: article summarization. Large language models are powerful general-purpose tools, but fine-tuning them on a high-quality dataset of examples can improve their consistency, quality, and efficiency on a target task.

You'll use Gemini 2.5 Flash, a lightweight and cost-efficient model, and perform the fine-tuning using Vertex AI.

Architecture Overview

Here's what we'll build:

  • Cloud Shell: Your development environment.
  • Cloud Storage: Stores training/validation data in JSONL format.
  • Vertex AI Training: Manages the fine-tuning job.
  • Vertex AI Endpoint: Hosts your fine-tuned model.

What you'll learn

  • Prepare high-quality datasets for supervised fine-tuning.
  • Configure and launch fine-tuning jobs using the Vertex AI SDK for Python.
  • Evaluate models using automated metrics (ROUGE scores).
  • Compare base and fine-tuned models to quantify improvements.

2. Project setup

Google Account

If you don't already have a personal Google Account, create one now.

Use a personal account instead of a work or school account.

Sign in to the Google Cloud Console

Sign in to the Google Cloud Console using a personal Google account.

Enable Billing

Redeem $5 Google Cloud credits (optional)

To run this workshop, you need a billing account with available credit. If you plan to use your own billing account, you can skip this step.

  1. Click this link and sign in with a personal Google account. You will see a page prompting you to claim your credits.
  2. Click the CLICK HERE TO ACCESS YOUR CREDITS button. This takes you to a page where you set up your billing profile.
  3. Click Confirm.

You are now connected to a Google Cloud Platform Trial Billing Account.


Create a project (optional)

If you do not have a current project you'd like to use for this lab, create a new project here.

3. Open Cloud Shell Editor

  1. Click this link to navigate directly to Cloud Shell Editor
  2. If prompted to authorize at any point today, click Authorize to continue.
  3. If the terminal doesn't appear at the bottom of the screen, open it:
    • Click View
    • Click Terminal
  4. In the terminal, set your project with this command:
    gcloud config set project [PROJECT_ID]
    
    • Example:
      gcloud config set project lab-project-id-example
      
    • If you can't remember your project ID, you can list all your project IDs with:
      gcloud projects list
      
  5. You should see this message:
    Updated property [core/project].
    

4. Enable APIs

To use Vertex AI and other services, you need to enable the necessary APIs in your Google Cloud project.

  1. In the terminal, enable the APIs:
    • Vertex AI API (aiplatform.googleapis.com): Enables the use of Vertex AI for fine-tuning and serving models.
    • Cloud Storage API (storage.googleapis.com): Enables the storage of datasets and model artifacts.
    gcloud services enable aiplatform.googleapis.com \
        storage.googleapis.com
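
    Optionally, you can confirm that both APIs are enabled:
    gcloud services list --enabled | grep -E "aiplatform|storage"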
    

5. Set up the project environment

Create a working directory

  1. In the terminal, create a directory for your project and navigate into it.
    mkdir gemini-finetuning
    cd gemini-finetuning
    

Set up environment variables

  1. In the terminal, define the environment variables for your project. We will create an env.sh file to store these variables so they can be easily reloaded if your session disconnects.
    cat <<EOF > env.sh
    export PROJECT_ID=\$(gcloud config get-value project)
    export REGION="us-central1"
    export BUCKET_NAME="\${PROJECT_ID}-gemini-tuning"
    EOF
    
    source env.sh
    

Create a Cloud Storage Bucket

  1. In the terminal, create a bucket to store your dataset and model artifacts.
    gcloud storage buckets create gs://$BUCKET_NAME --project=$PROJECT_ID --location=$REGION
    

Set up Virtual Environment

  1. We will use uv to manage our Python environment. In the terminal, run:
    uv venv .venv
    source .venv/bin/activate
    
  2. In the terminal, install the required Python packages.
    uv pip install google-cloud-aiplatform rouge-score matplotlib pandas tqdm
    

6. Prepare the training data

Quality data is the foundation of successful fine-tuning. You will use the WikiLingua dataset, transform it into the specific JSONL format that Gemini requires, and upload it to your storage bucket.
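
Each line of the converted file is a standalone JSON object in the contents/parts structure that Gemini tuning expects (this is the same shape the conversion script below produces; the article and summary text here are illustrative):

{"contents": [{"role": "user", "parts": [{"text": "Summarize the following article: ..."}]}, {"role": "model", "parts": [{"text": "A short reference summary."}]}]}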

  1. In the terminal, create a file named prepare_data.py.
    cloudshell edit prepare_data.py
    
  2. Paste the following code into prepare_data.py.
    import json
    import os
    import subprocess

    from google.cloud import storage
    
    # Configuration
    BUCKET_NAME = os.environ["BUCKET_NAME"]
    PROJECT_ID = os.environ["PROJECT_ID"]
    
    def download_data():
        print("Downloading WikiLingua dataset...")
        # Using gsutil to copy from public bucket
        subprocess.run(["gsutil", "cp", "gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/*", "."], check=True)
    
    def convert_to_gemini_format(input_file, output_file, max_samples=1000):
        print(f"Converting {input_file} to Gemini format (first {max_samples} samples)...")
        converted_data = []
        with open(input_file, 'r') as f:
            for i, line in enumerate(f):
                if i >= max_samples:
                    break
                obj = json.loads(line)
                messages = obj.get("messages", [])
    
                # Convert messages to Gemini 2.5 format
                # Input: {"messages": [{"role": "user", "content": "..."}, {"role": "model", "content": "..."}]}
                # Output: {"contents": [{"role": "user", "parts": [{"text": "..."}]}, {"role": "model", "parts": [{"text": "..."}]}]}
    
                contents = []
                for msg in messages:
                    role = msg["role"]
                    content = msg["content"]
                    contents.append({
                        "role": role,
                        "parts": [{"text": content}]
                    })
    
                converted_data.append({"contents": contents})
    
        with open(output_file, 'w') as f:
            for item in converted_data:
                f.write(json.dumps(item) + "\n")
    
        print(f"Saved {len(converted_data)} examples to {output_file}")
    
    def upload_to_gcs(local_file, destination_blob_name):
        print(f"Uploading {local_file} to gs://{BUCKET_NAME}/{destination_blob_name}...")
        storage_client = storage.Client(project=PROJECT_ID)
        bucket = storage_client.bucket(BUCKET_NAME)
        blob = bucket.blob(destination_blob_name)
        blob.upload_from_filename(local_file)
        print("Upload complete.")
    
    def main():
        download_data()
    
        # Process Training Data
        convert_to_gemini_format("sft_train_samples.jsonl", "train_gemini.jsonl")
        upload_to_gcs("train_gemini.jsonl", "datasets/train/train_gemini.jsonl")
    
        # Process Validation Data
        convert_to_gemini_format("sft_val_samples.jsonl", "val_gemini.jsonl")
        upload_to_gcs("val_gemini.jsonl", "datasets/val/val_gemini.jsonl")
    
        print("Data preparation complete!")
    
    if __name__ == "__main__":
        main()
    
  3. In the terminal, run the data preparation script.
    python prepare_data.py
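
  4. Optionally, verify that both files landed in your bucket (the paths match those created by the script):
    gcloud storage ls --recursive gs://$BUCKET_NAME/datasets/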
    

7. Establish baseline performance

Before fine-tuning, you need a benchmark. You will measure how well the base gemini-2.5-flash model performs on the summarization task using ROUGE scores.
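
As a quick refresher, ROUGE measures n-gram overlap between a generated summary and a reference summary. A minimal, self-contained example using the same rouge-score library the evaluation script relies on (the two strings are made up for illustration):

from rouge_score import rouge_scorer

# rouge1/rouge2 count unigram/bigram overlap; rougeL uses the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    "The cat sat on the mat.",         # reference summary
    "A cat was sitting on the mat.",   # generated summary
)
for name, score in scores.items():
    print(f"{name}: f1={score.fmeasure:.3f}")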

  1. In the terminal, create a file named evaluate.py.
    cloudshell edit evaluate.py
    
  2. Paste the following code into evaluate.py.
    import argparse
    import json
    import os
    import subprocess
    import time

    import matplotlib.pyplot as plt
    import pandas as pd
    from google.cloud import aiplatform
    from rouge_score import rouge_scorer
    from tqdm import tqdm
    from vertexai.generative_models import GenerativeModel, GenerationConfig, HarmCategory, HarmBlockThreshold
    
    # Configuration
    PROJECT_ID = os.environ["PROJECT_ID"]
    REGION = os.environ["REGION"]
    
    aiplatform.init(project=PROJECT_ID, location=REGION)
    
    def evaluate(model_name, test_file, max_samples=50, output_json="results.json"):
        print(f"Evaluating model: {model_name}")
    
        # Load Test Data
        test_df = pd.read_csv(test_file)
        test_df = test_df.head(max_samples)
    
        model = GenerativeModel(model_name)
    
        safety_settings = {
            HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
            HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
            HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
            HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
        }
    
        generation_config = GenerationConfig(
            temperature=0.1,
            max_output_tokens=1024,
        )
    
        scorer = rouge_scorer.RougeScorer(['rouge1', 'rouge2', 'rougeL'], use_stemmer=True)
        results = []
    
        for index, row in tqdm(test_df.iterrows(), total=len(test_df)):
            input_text = row['input_text']
            reference_summary = row['output_text']
    
            try:
                response = model.generate_content(
                    input_text,
                    generation_config=generation_config,
                    safety_settings=safety_settings
                )
                generated_summary = response.text
    
                scores = scorer.score(reference_summary, generated_summary)
    
                results.append({
                    "generated": generated_summary,
                    "reference": reference_summary,
                    "rouge1": scores['rouge1'].fmeasure,
                    "rouge2": scores['rouge2'].fmeasure,
                    "rougeL": scores['rougeL'].fmeasure
                })
            except Exception as e:
                print(f"Error processing example {index}: {e}")
                # Sleep briefly to avoid quota issues if hitting limits
                time.sleep(1)
    
        # Save results
        with open(output_json, 'w') as f:
            json.dump(results, f, indent=2)
    
        return pd.DataFrame(results)
    
    def plot_results(df, title, filename):
        os.makedirs("plots", exist_ok=True)
    
        metrics = ['rouge1', 'rouge2', 'rougeL']
        fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
        for i, metric in enumerate(metrics):
            axes[i].hist(df[metric], bins=10, alpha=0.7, color='skyblue', edgecolor='black')
            axes[i].set_title(f'{metric} Distribution')
            axes[i].set_xlabel('Score')
            axes[i].set_ylabel('Count')
    
        plt.suptitle(title)
        plt.tight_layout()
        plt.savefig(f"plots/{filename}")
        print(f"Plot saved to plots/{filename}")
    
    def compare_results(baseline_file, tuned_file):
        with open(baseline_file, 'r') as f:
            baseline_data = pd.DataFrame(json.load(f))
        with open(tuned_file, 'r') as f:
            tuned_data = pd.DataFrame(json.load(f))
    
        print("\n--- Comparison ---")
        metrics = ['rouge1', 'rouge2', 'rougeL']
        for metric in metrics:
            base_mean = baseline_data[metric].mean()
            tuned_mean = tuned_data[metric].mean()
            diff = tuned_mean - base_mean
            print(f"{metric}: Base={base_mean:.4f}, Tuned={tuned_mean:.4f}, Diff={diff:+.4f}")
    
        # Comparative Plot
        os.makedirs("plots", exist_ok=True)
        comparison_df = pd.DataFrame({
            'Metric': metrics,
            'Baseline': [baseline_data[m].mean() for m in metrics],
            'Tuned': [tuned_data[m].mean() for m in metrics]
        })
    
        comparison_df.plot(x='Metric', y=['Baseline', 'Tuned'], kind='bar', figsize=(10, 6))
        plt.title('Baseline vs Tuned Model Performance')
        plt.ylabel('Average Score')
        plt.xticks(rotation=0)
        plt.tight_layout()
        plt.savefig("plots/comparison.png")
        print("Comparison plot saved to plots/comparison.png")
    
    def main():
        parser = argparse.ArgumentParser()
        parser.add_argument("--model", type=str, default="gemini-2.5-flash", help="Model resource name")
        parser.add_argument("--baseline", type=str, help="Path to baseline results json for comparison")
        parser.add_argument("--output", type=str, default="results.json", help="Output file for results")
        args = parser.parse_args()
    
        # Ensure test data exists (it was downloaded in prepare_data step)
        if not os.path.exists("sft_test_samples.csv"):
            # Fallback download if needed
            subprocess.run(["gsutil", "cp", "gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/sft_test_samples.csv", "."], check=True)
    
        df = evaluate(args.model, "sft_test_samples.csv", output_json=args.output)
    
        print("\n--- Results Summary ---")
        print(df.describe())
    
        plot_filename = "baseline_dist.png" if not args.baseline else "tuned_dist.png"
        plot_results(df, f"ROUGE Scores - {args.model}", plot_filename)
    
        if args.baseline:
            compare_results(args.baseline, args.output)
    
    if __name__ == "__main__":
        main()
    
  3. In the terminal, run the baseline evaluation.
    python evaluate.py --model "gemini-2.5-flash" --output "baseline.json"
    
    This will generate a baseline.json file and a plot in plots/baseline_dist.png.
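
    If you want to re-inspect the numbers later without re-running the evaluation, you can load the results into pandas (a convenience one-liner, not part of the lab scripts):
    python -c "import pandas as pd; print(pd.read_json('baseline.json')[['rouge1','rouge2','rougeL']].mean())"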

8. Configure and launch fine-tuning

Now you will launch a managed fine-tuning job on Vertex AI.

  1. In the terminal, create a file named tune.py.
    cloudshell edit tune.py
    
  2. Paste the following code into tune.py.
    import os
    import time

    from google.cloud import aiplatform
    from vertexai.preview.tuning import sft
    
    # Configuration
    PROJECT_ID = os.environ["PROJECT_ID"]
    REGION = os.environ["REGION"]
    BUCKET_NAME = os.environ["BUCKET_NAME"]
    
    aiplatform.init(project=PROJECT_ID, location=REGION)
    
    def train():
        print("Launching fine-tuning job...")
    
        sft_tuning_job = sft.train(
            source_model="gemini-2.5-flash",  # Base Gemini model to tune
            train_dataset=f"gs://{BUCKET_NAME}/datasets/train/train_gemini.jsonl",
            validation_dataset=f"gs://{BUCKET_NAME}/datasets/val/val_gemini.jsonl",
            epochs=1, # Keep it short for the lab
            adapter_size=4,
            learning_rate_multiplier=1.0,
            tuned_model_display_name="gemini-2.5-flash-wikilingua",
        )
    
        print(f"Job started: {sft_tuning_job.resource_name}")
        print("Waiting for job to complete... (this may take ~45 minutes)")
    
        # Wait for the job to complete
        while not sft_tuning_job.has_ended:
            time.sleep(60)
            sft_tuning_job.refresh()
            print(f"Status: {sft_tuning_job.state.name}")
    
        print("Job completed!")
        print(f"Tuned Model Endpoint: {sft_tuning_job.tuned_model_endpoint_name}")
        return sft_tuning_job.tuned_model_endpoint_name
    
    if __name__ == "__main__":
        train()
    
  3. In the terminal, run the fine-tuning script.
    python tune.py
    
    Note: This process can take ~45 minutes. You can monitor the job in the Vertex AI Console.

9. Understand the training code

While your job is running, let's take a closer look at the tune.py script to understand how the fine-tuning works.

Managed Supervised Fine-Tuning

The script uses the sft.train method from vertexai.preview.tuning to submit a managed tuning job. This abstracts away the complexity of provisioning infrastructure, distributing the training, and managing checkpoints.

sft_tuning_job = sft.train(
    source_model="gemini-2.5-flash",
    train_dataset=f"gs://{BUCKET_NAME}/datasets/train/train_gemini.jsonl",
    # ...
)

LoRA Configuration

Instead of manually defining a LoraConfig like you might in open-source frameworks, Vertex AI simplifies this into a few key parameters:

  • adapter_size: This parameter (set to 4 in our script) controls the rank of the LoRA adapters. A larger size allows the model to learn more complex adaptations but increases the number of trainable parameters.
  • epochs: We set this to 1 for this lab to keep training short (the job still takes roughly 45 minutes end to end). In a production scenario, you might increase this to let the model learn more deeply from your data, though you should watch out for overfitting.

Model Selection

We explicitly specify source_model="gemini-2.5-flash". Vertex AI supports tuning several Gemini models; in production, pinning an exact model version (rather than a floating alias) helps keep your pipeline stable and reproducible.
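
Because tuning jobs are long-running, it's also useful to know you can reattach to a job from a fresh Python session instead of keeping tune.py alive. A minimal sketch, assuming the resource name that tune.py printed (the ID below is a placeholder):

import os

import vertexai
from vertexai.preview.tuning import sft

vertexai.init(project=os.environ["PROJECT_ID"], location=os.environ["REGION"])

# Rehydrate a handle to the running job from its resource name (placeholder ID).
job = sft.SupervisedTuningJob(
    "projects/123456789/locations/us-central1/tuningJobs/4567890123456789012"
)
job.refresh()
print(job.state, job.has_ended)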

10. Compare models

Once the fine-tuning job is complete, you can compare the performance of your new model against the baseline.

  1. Get your tuned model endpoint. It was printed at the end of the tune.py script. It will look something like projects/.../locations/.../endpoints/....
  2. Run the evaluation script again, this time passing your tuned model and the baseline results for comparison.
    # Replace [YOUR_TUNED_MODEL_ENDPOINT] with the actual endpoint name
    export TUNED_MODEL="projects/[YOUR_PROJECT_ID]/locations/[YOUR_REGION]/endpoints/[YOUR_ENDPOINT_ID]"
    
    python evaluate.py --model "$TUNED_MODEL" --baseline "baseline.json" --output "tuned.json"
    
  3. View the results. The script will output a comparison of ROUGE scores and generate a plots/comparison.png chart showing the improvement. You can view the plots by opening the plots folder in the Cloud Shell Editor. For a quick manual spot-check of the tuned endpoint, see the sketch below.
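
You can also query the tuned endpoint interactively: GenerativeModel accepts a tuned endpoint resource name in place of a model ID. A quick sketch (assumes TUNED_MODEL is exported as above; the prompt text is a placeholder):

import os

import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project=os.environ["PROJECT_ID"], location=os.environ["REGION"])

# Point GenerativeModel at the tuned endpoint instead of a base model ID.
model = GenerativeModel(os.environ["TUNED_MODEL"])
response = model.generate_content("Summarize the following article: <paste an article here>")
print(response.text)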

11. Clean up

To avoid incurring charges, delete the resources you created.

  1. In the terminal, delete the Cloud Storage bucket and the tuned model.
    gcloud storage rm -r gs://$BUCKET_NAME
    # Note: You can delete the model endpoint from the Vertex AI Console
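
If you prefer to delete the endpoint from code instead of the console, here is a sketch using the same SDK (assumes TUNED_MODEL still holds the endpoint resource name from the previous step; force=True undeploys any models first):

import os

from google.cloud import aiplatform

aiplatform.init(project=os.environ["PROJECT_ID"], location=os.environ["REGION"])

# force=True undeploys the tuned model from the endpoint before deleting it.
endpoint = aiplatform.Endpoint(os.environ["TUNED_MODEL"])
endpoint.delete(force=True)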
    

12. Congratulations!

You have successfully fine-tuned Gemini 2.5 Flash on Vertex AI!

Recap

In this lab, you:

  • Prepared a dataset in JSONL format for Gemini fine-tuning.
  • Established a baseline using the base Gemini 2.5 Flash model.
  • Launched a supervised fine-tuning job on Vertex AI.
  • Evaluated and compared the fine-tuned model against the baseline.

What's next

This lab is part of the Production-Ready AI with Google Cloud Learning Path.

Explore the full curriculum to bridge the gap from prototype to production.

Share your progress with the hashtag #ProductionReadyAI.