Gemini in Java with Vertex AI and LangChain4j

1. Introduction

This codelab focuses on the Gemini Large Language Model (LLM), hosted on Vertex AI on Google Cloud. Vertex AI is a platform that encompasses all the machine learning products, services, and models on Google Cloud.

You will use Java to interact with the Gemini API using the LangChain4j framework. You'll go through concrete examples to take advantage of the LLM for question answering, idea generation, entity and structured content extraction, retrieval augmented generation, and function calling.

What is Generative AI?

Generative AI refers to the use of artificial intelligence to create new content, such as text, images, music, audio, and videos.

Generative AI is powered by large language models (LLMs) that can multi-task and perform out-of-the-box tasks such as summarization, Q&A, classification, and more. With minimal training, foundational models can be adapted for targeted use cases with very little example data.

How does Generative AI work?

Generative AI works by using a Machine Learning (ML) model to learn the patterns and relationships in a dataset of human-created content. It then uses the learned patterns to generate new content.

The most common way to train a generative AI model is to use supervised learning. The model is given a set of human-created content and corresponding labels. It then learns to generate content that is similar to the human-created content.

What are common Generative AI applications?

Generative AI can be used to:

  • Improve customer interactions through enhanced chat and search experiences.
  • Explore vast amounts of unstructured data through conversational interfaces and summarizations.
  • Assist with repetitive tasks like replying to requests for proposals, localizing marketing content in different languages, checking customer contracts for compliance, and more.

What Generative AI offerings does Google Cloud have?

With Vertex AI, you can interact with, customize, and embed foundation models into your applications with little to no ML expertise. You can access foundation models on Model Garden, tune models via a simple UI on Vertex AI Studio, or use models in a data science notebook.

Vertex AI Search and Conversation offers developers the fastest way to build generative AI powered search engines and chatbots.

Powered by Gemini, Gemini for Google Cloud is an AI-powered collaborator available across Google Cloud and IDEs to help you get more done, faster. Gemini Code Assist provides code completion, code generation, code explanations, and lets you chat with it to ask technical questions.

What is Gemini?

Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases. Multimodal means it can process and generate different kinds of content such as text, code, images, and audio.


Gemini comes in three variations:

  • Gemini Ultra: The largest, most capable version for complex tasks.
  • Gemini Pro: Mid-sized, optimized for scaling across various tasks.
  • Gemini Nano: The most efficient, designed for on-device tasks.

Key Features:

  • Multimodality: Gemini's ability to understand and handle multiple information formats is a significant step beyond traditional text-only language models.
  • Performance: Gemini Ultra outperforms the current state-of-the-art on many benchmarks and was the first model to surpass human experts on the challenging MMLU (Massive Multitask Language Understanding) benchmark.
  • Flexibility: The different Gemini sizes make it adaptable for various use cases, from large-scale research to deployment on mobile devices.

How can you interact with Gemini on Vertex AI from Java?

You have two options:

  1. The official Vertex AI Java API for Gemini library.
  2. LangChain4j framework.

In this codelab, you will use the LangChain4j framework.

What is the LangChain4j framework?

The LangChain4j framework is an open source library for integrating LLMs in your Java applications, by orchestrating various components, such as the LLM itself, but also other tools like vector databases (for semantic searches), document loaders and splitters (to analyze documents and learn from them), output parsers, and more.

The project was inspired by the LangChain Python project, but with the goal of serving Java developers.


What you'll learn

  • How to set up a Java project to use Gemini and LangChain4j
  • How to send your first prompt to Gemini programmatically
  • How to stream responses from Gemini
  • How to create a conversation between a user and Gemini
  • How to use Gemini in a multimodal context by sending both text and images
  • How to extract useful structured information from unstructured content
  • How to manipulate prompt templates
  • How to do text classification such as sentiment analysis
  • How to chat with your own documents (Retrieval Augmented Generation)
  • How to extend your chatbots with function calling

What you'll need

  • Knowledge of the Java programming language
  • A Google Cloud project
  • A browser, such as Chrome or Firefox

2. Setup and requirements

Self-paced environment setup

  1. Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.


  • The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
  • The Project ID is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you might generate another random one. Alternatively, you can try your own, and see if it's available. It can't be changed after this step and remains for the duration of the project.
  • For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.
  2. Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Cloud Shell, a command line environment running in the Cloud.

Activate Cloud Shell

  1. From the Cloud Console, click Activate Cloud Shell.


If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If you were presented with an intermediate screen, click Continue.


It should only take a few moments to provision and connect to Cloud Shell.


This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

  2. Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
  3. Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3. Preparing your development environment

In this codelab, you're going to use the Cloud Shell terminal and Cloud Shell editor to develop your Java programs.

Enable Vertex AI APIs

In the Google Cloud console, make sure your project name is displayed at the top of the page. If it's not, click Select a project to open the Project Selector, and select your intended project.

You can enable Vertex AI APIs either from the Vertex AI section of the Google Cloud console or from the Cloud Shell terminal.

To enable them from the Google Cloud console, first go to the Vertex AI section of the Google Cloud console menu:


Click Enable All Recommended APIs in the Vertex AI dashboard.

This will enable several APIs, but the most important one for the codelab is aiplatform.googleapis.com.

Alternatively, you can also enable this API from the Cloud Shell terminal with the following command:

gcloud services enable aiplatform.googleapis.com

Clone the GitHub repository

In the Cloud Shell terminal, clone the repository for this codelab:

git clone https://github.com/glaforge/gemini-workshop-for-java-developers.git

To check that the project is ready to run, you can try running the "Hello World" program.

Make sure you're at the top level folder:

cd gemini-workshop-for-java-developers/ 

Create the Gradle wrapper:

gradle wrapper

Run with gradlew:

./gradlew run

You should see the following output:

..
> Task :app:run
Hello World!

Open and set up the Cloud Editor

Open the code with the Cloud Code Editor from Cloud Shell:


In the Cloud Code Editor, open the codelab source folder by selecting File -> Open Folder and pointing to the codelab source folder (e.g. /home/username/gemini-workshop-for-java-developers/).

Setup environment variables

Open a new terminal in Cloud Code Editor by selecting Terminal -> New Terminal. Set up two environment variables required for running the code examples:

  • PROJECT_ID — Your Google Cloud project ID
  • LOCATION — The region where the Gemini model is deployed

Export the variables as follows:

export PROJECT_ID=$(gcloud config get-value project)
export LOCATION=us-central1

Install Gradle for Java

To get the Cloud Code Editor working properly with Gradle, install the Gradle for Java extension.

First, go to the Java Projects section and press the plus sign:


Select Gradle for Java:


Select the Install Pre-Release version:


Once installed, you should see the Disable and the Uninstall buttons:


Finally, clean the workspace to have the new settings applied:


This will ask you to reload and delete the workspace. Go ahead and choose Reload and delete:


If you open one of the files, for example App.java, you should now see the editor working correctly with syntax highlighting:


You're now ready to run some samples against Gemini!

4. First call to the Gemini model

Now that the project is properly set up, it is time to call the Gemini API.

Take a look at QA.java in the app/src/main/java/gemini/workshop directory:

package gemini.workshop;

import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.model.chat.ChatLanguageModel;

public class QA {
    public static void main(String[] args) {
        ChatLanguageModel model = VertexAiGeminiChatModel.builder()
            .project(System.getenv("PROJECT_ID"))
            .location(System.getenv("LOCATION"))
            .modelName("gemini-1.0-pro")
            .build();

        System.out.println(model.generate("Why is the sky blue?"));
    }
}

In this first example, you need to import the VertexAiGeminiChatModel class, which implements the ChatLanguageModel interface.

In the main method, you configure the chat language model by using the VertexAiGeminiChatModel builder and specifying:

  • Project
  • Location
  • Model name (gemini-1.0-pro).

Now that the language model is ready, you can call the generate() method and pass your prompt (your question or instructions) to the LLM. Here, you ask a simple question about what makes the sky blue.

Feel free to change this prompt to try different questions or tasks.

Run the sample at the source code root folder:

./gradlew run -q -DjavaMainClass=gemini.workshop.QA

You should see an output similar to this one:

The sky appears blue because of a phenomenon called Rayleigh scattering.
When sunlight enters the atmosphere, it is made up of a mixture of
different wavelengths of light, each with a different color. The
different wavelengths of light interact with the molecules and particles
in the atmosphere in different ways.

The shorter wavelengths of light, such as those corresponding to blue
and violet light, are more likely to be scattered in all directions by
these particles than the longer wavelengths of light, such as those
corresponding to red and orange light. This is because the shorter
wavelengths of light have a smaller wavelength and are able to bend
around the particles more easily.

As a result of Rayleigh scattering, the blue light from the sun is
scattered in all directions, and it is this scattered blue light that we
see when we look up at the sky. The blue light from the sun is not
actually scattered in a single direction, so the color of the sky can
vary depending on the position of the sun in the sky and the amount of
dust and water droplets in the atmosphere.

Congratulations, you made your first call to Gemini!

Streaming response

Did you notice that the response was given in one go, after a few seconds? It's also possible to get the response progressively, thanks to the streaming response variant. With streaming, the model returns the response piece by piece, as it becomes available.

In this codelab, we'll stick with the non-streaming response but let's have a look at the streaming response to see how it can be done.

In StreamQA.java in the app/src/main/java/gemini/workshop directory you can see the streaming response in action:

package gemini.workshop;

import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiStreamingChatModel;
import dev.langchain4j.model.StreamingResponseHandler;

public class StreamQA {
    public static void main(String[] args) {
        StreamingChatLanguageModel model = VertexAiGeminiStreamingChatModel.builder()
            .project(System.getenv("PROJECT_ID"))
            .location(System.getenv("LOCATION"))
            .modelName("gemini-1.0-pro")
            .build();
        
        model.generate("Why is the sky blue?", new StreamingResponseHandler<>() {
            @Override
            public void onNext(String text) {
                System.out.println(text);
            }

            @Override
            public void onError(Throwable error) {
                error.printStackTrace();
            }
        });
    }
}

This time, we import the streaming variant, VertexAiGeminiStreamingChatModel, which implements the StreamingChatLanguageModel interface. You'll also need a StreamingResponseHandler.

The signature of the generate() method is also a little bit different. Instead of returning a string, the return type is void. In addition to the prompt, you have to pass a streaming response handler. Here, you implement the interface by creating an anonymous inner class, with two methods onNext(String text) and onError(Throwable error). The former is called each time a new piece of the response is available, while the latter is called only if an error occurs.

Run:

./gradlew run -q -DjavaMainClass=gemini.workshop.StreamQA

You will get a similar answer to the previous class, but this time, you will notice that the answer appears progressively in your shell, rather than waiting for the display of the full answer.

Extra configuration

For configuration, we only defined the project, the location, and the model name, but there are other parameters you can specify for the model (illustrated in the sketch after this list):

  • temperature(Float temp) — to define how creative you want the response to be (0 being low creativity and often more factual, while 1 is for more creative outputs)
  • topP(Float topP) — to select the possible words whose total probability adds up to that floating point number (between 0 and 1)
  • topK(Integer topK) — to randomly select a word out of a maximum number of probable words for the text completion (from 1 to 40)
  • maxOutputTokens(Integer max) — to specify the maximum length of the answer given by the model (generally, 4 tokens represent roughly 3 words)
  • maxRetries(Integer retries) — in case you're running past the requests-per-time quota, or the platform is facing a technical issue, you can have the model retry the call a given number of times
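
For illustration, here is a minimal sketch combining these options on the same builder used earlier (the values are arbitrary examples, not recommendations from the codelab):

ChatLanguageModel model = VertexAiGeminiChatModel.builder()
    .project(System.getenv("PROJECT_ID"))
    .location(System.getenv("LOCATION"))
    .modelName("gemini-1.0-pro")
    .temperature(0.2f)      // low creativity, more factual answers
    .topP(0.95f)            // nucleus sampling threshold
    .topK(40)               // sample among the 40 most probable tokens
    .maxOutputTokens(500)   // cap the length of the response
    .maxRetries(3)          // retry up to 3 times on quota or transient errors
    .build();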

So far, you asked a single question to Gemini, but you can also have a multi-turn conversation. That's what you'll explore in the next section.

5. Chat with Gemini

In the previous step, you asked a single question. It's now time to have a real conversation between a user and the LLM. Each question and answer can build upon the previous ones to form a real discussion.

Take a look at Conversation.java in the app/src/main/java/gemini/workshop folder:

package gemini.workshop;

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.service.AiServices;

import java.util.List;

public class Conversation {
    public static void main(String[] args) {
        ChatLanguageModel model = VertexAiGeminiChatModel.builder()
            .project(System.getenv("PROJECT_ID"))
            .location(System.getenv("LOCATION"))
            .modelName("gemini-1.0-pro")
            .build();

        MessageWindowChatMemory chatMemory = MessageWindowChatMemory.builder()
            .maxMessages(20)
            .build();

        interface ConversationService {
            String chat(String message);
        }

        ConversationService conversation =
            AiServices.builder(ConversationService.class)
                .chatLanguageModel(model)
                .chatMemory(chatMemory)
                .build();

        List.of(
            "Hello!",
            "What is the country where the Eiffel tower is situated?",
            "How many inhabitants are there in that country?"
        ).forEach( message -> {
            System.out.println("\nUser: " + message);
            System.out.println("Gemini: " + conversation.chat(message));
        });
    }
}

A couple of interesting new imports in this class:

  • MessageWindowChatMemory — a class that will help handle the multi-turn aspect of the conversation, and keep in local memory the previous questions and answers
  • AiServices — a class that will tie together the chat model and the chat memory

In the main method, you're going to set up the model, the chat memory, and the conversational chain. The model is configured as usual with the project, location, and model name information.

For the chat memory, we use MessageWindowChatMemory's builder to create a memory that keeps the last 20 messages exchanged. It's a sliding window over the conversation whose context is kept locally in our Java class client.

You then create the AI service that binds the chat model with the chat memory.

Notice how the AI service makes use of a custom ConversationService interface we've defined, that LangChain4j implements, and that takes a String query and returns a String response.

Now, it's time to have a conversation with Gemini. First, a simple greeting is sent, then a first question about the Eiffel tower to know in which country it can be found. Notice that the last question relates to the answer to the previous one, as you wonder how many inhabitants live in the country where the Eiffel tower is situated, without explicitly mentioning the country given in the previous answer. It shows that past questions and answers are sent with every prompt.

Run the sample:

./gradlew run -q -DjavaMainClass=gemini.workshop.Conversation

You should see three answers similar to these ones:

User: Hello!
Gemini: Hi there! How can I assist you today?

User: What is the country where the Eiffel tower is situated?
Gemini: France

User: How many inhabitants are there in that country?
Gemini: As of 2023, the population of France is estimated to be around 67.8 million.

You can ask single-turn questions or have multi-turn conversations with Gemini but so far, the input has been only text. What about images? Let's explore images in the next step.

6. Multimodality with Gemini

Gemini is a multimodal model. Not only does it accept text as input, but it also accepts images and even videos. In this section, you'll see a use case for mixing text and images.

Do you think Gemini will recognise this cat?


Picture of a cat in the snow, taken from Wikipedia: https://upload.wikimedia.org/wikipedia/commons/b/b6/Felis_catus-cat_on_snow.jpg

Take a look at Multimodal.java in the app/src/main/java/gemini/workshop directory:

package gemini.workshop;

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.TextContent;

public class Multimodal {

    static final String CAT_IMAGE_URL =
        "https://upload.wikimedia.org/wikipedia/" +
        "commons/b/b6/Felis_catus-cat_on_snow.jpg";


    public static void main(String[] args) {
        ChatLanguageModel model = VertexAiGeminiChatModel.builder()
            .project(System.getenv("PROJECT_ID"))
            .location(System.getenv("LOCATION"))
            .modelName("gemini-1.0-pro-vision")
            .build();

        UserMessage userMessage = UserMessage.from(
            ImageContent.from(CAT_IMAGE_URL),
            TextContent.from("Describe the picture")
        );

        Response<AiMessage> response = model.generate(userMessage);

        System.out.println(response.content().text());
    }
}

In the imports, notice we distinguish between different kinds of messages and contents. A UserMessage can contain both a TextContent and an ImageContent object. This is multimodality at play: mixing text and images. The model sends back a Response which contains an AiMessage.

You then retrieve the AiMessage from the response via content(), and then the text of the message thanks to text().

Run the sample:

./gradlew run -q -DjavaMainClass=gemini.workshop.Multimodal

The name of the picture certainly gave you a hint of what the picture contains, but Gemini's output is similar to the following:

A cat with brown fur is walking in the snow. The cat has a white patch of fur on its chest and white paws. The cat is looking at the camera.

Mixing images and text prompts opens up interesting use cases. You can create applications that can:

  • Recognize text in pictures (see the sketch after this list).
  • Check if an image is safe to display.
  • Create image captions.
  • Search through a database of images with plain text descriptions.
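
As a hedged sketch of the first use case, reusing the Multimodal example above, recognizing text in a picture is mostly a matter of changing the text part of the user message (the image URL below is a hypothetical placeholder, not part of the codelab):

UserMessage ocrMessage = UserMessage.from(
    ImageContent.from("https://example.com/scanned-receipt.png"), // hypothetical image URL
    TextContent.from("Transcribe all the text you can read in this picture")
);

Response<AiMessage> ocrResponse = model.generate(ocrMessage);
System.out.println(ocrResponse.content().text());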

In addition to extracting information from images, you can also extract information from unstructured text. That's what you're going to learn in the next section.

7. Extract structured information from unstructured text

There are many situations where important information is given in report documents, in emails, or other long form texts in an unstructured way. Ideally, you'd like to be able to extract the key details contained in the unstructured text, in the form of structured objects. Let's see how you can do that.

Let's say you want to extract the name and age of a person, given a biography or description of that person. You can instruct the LLM to extract JSON from unstructured text with a cleverly tweaked prompt (this is commonly called "prompt engineering").

Take a look at ExtractData.java in the app/src/main/java/gemini/workshop directory:

package gemini.workshop;

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.UserMessage;

public class ExtractData {

    static record Person(String name, int age) {}

    interface PersonExtractor {
        @UserMessage("""
            Extract the name and age of the person described below.
            Return a JSON document with a "name" and an "age" property, \
            following this structure: {"name": "John Doe", "age": 34}
            Return only JSON, without any markdown markup surrounding it.
            Here is the document describing the person:
            ---
            {{it}}
            ---
            JSON:
            """)
        Person extractPerson(String text);
    }

    public static void main(String[] args) {
        ChatLanguageModel model = VertexAiGeminiChatModel.builder()
            .project(System.getenv("PROJECT_ID"))
            .location(System.getenv("LOCATION"))
            .modelName("gemini-1.0-pro")
            .temperature(0f)
            .topK(1)
            .build();

        PersonExtractor extractor = AiServices.create(PersonExtractor.class, model);

        Person person = extractor.extractPerson("""
            Anna is a 23 year old artist based in Brooklyn, New York. She was born and 
            raised in the suburbs of Chicago, where she developed a love for art at a 
            young age. She attended the School of the Art Institute of Chicago, where 
            she studied painting and drawing. After graduating, she moved to New York 
            City to pursue her art career. Anna's work is inspired by her personal 
            experiences and observations of the world around her. She often uses bright 
            colors and bold lines to create vibrant and energetic paintings. Her work 
            has been exhibited in galleries and museums in New York City and Chicago.    
            """
        );

        System.out.println(person.name());  // Anna
        System.out.println(person.age());   // 23
    }
}

Let's have a look at the various steps in this file:

  • A Person record is defined to represent the details describing a person (name and age).
  • The PersonExtractor interface is defined with a method that, given an unstructured text string, returns a Person instance.
  • The extractPerson() method is annotated with a @UserMessage annotation that associates a prompt with it. That's the prompt the model will use to extract the information and return the details in the form of a JSON document, which will be parsed for you and unmarshalled into a Person instance.

Now let's look at the content of the main() method:

  • The chat model is instantiated. Notice that we use a very low temperature of zero, and a topK of only one, to ensure a very deterministic answer. This also helps the model follow the instructions better. In particular, we don't want Gemini to wrap the JSON response with extra Markdown markup.
  • A PersonExtractor object is created thanks to LangChain4j's AiServices class.
  • Then, you can simply call Person person = extractor.extractPerson(...) to extract the details of the person from the unstructured text, and get back a Person instance with the name and age.

Run the sample:

./gradlew run -q -DjavaMainClass=gemini.workshop.ExtractData

You should see the following output:

Anna
23

Yes, this is Anna and she is 23!

The benefit of this AiServices approach is that you operate with strongly typed objects. You are not interacting directly with the LLM. Instead, you are working with concrete classes, like the Person record to represent the extracted personal information, and you have a PersonExtractor object with an extractPerson() method which returns a Person instance. The notion of LLM is abstracted away, and as a Java developer, you are just manipulating normal classes and objects.

8. Structure prompts with prompt templates

When you interact with an LLM using a common set of instructions or questions, there's a part of that prompt that never changes, while other parts contain the data. For example, if you want to create recipes, you might use a prompt like "You're a talented chef, please create a recipe with the following ingredients: ...", and then you'd append the ingredients to the end of that text. That's what prompt templates are for — similar to interpolated strings in programming languages. A prompt template contains placeholders which you can replace with the right data for a particular call to the LLM.

More concretely, let's study PromptTemplate.java in the app/src/main/java/gemini/workshop directory:

package gemini.workshop;

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.output.Response;

import java.util.HashMap;
import java.util.Map;

public class PromptTemplate {
    public static void main(String[] args) {
        ChatLanguageModel model = VertexAiGeminiChatModel.builder()
            .project(System.getenv("PROJECT_ID"))
            .location(System.getenv("LOCATION"))
            .modelName("gemini-1.0-pro")
            .maxOutputTokens(500)
            .temperature(0.8f)
            .topK(40)
            .topP(0.95f)
            .maxRetries(3)
            .build();

        PromptTemplate promptTemplate = PromptTemplate.from("""
            You're a friendly chef with a lot of cooking experience.
            Create a recipe for a {{dish}} with the following ingredients: \
            {{ingredients}}, and give it a name.
            """
        );

        Map<String, Object> variables = new HashMap<>();
        variables.put("dish", "dessert");
        variables.put("ingredients", "strawberries, chocolate, and whipped cream");

        Prompt prompt = promptTemplate.apply(variables);

        Response<AiMessage> response = model.generate(prompt.toUserMessage());

        System.out.println(response.content().text());
    }
}

As usual, you configure the VertexAiGeminiChatModel, here with a high level of creativity (a high temperature, and high topP and topK values). Then you create a PromptTemplate with its from() static method, passing the string of our prompt and using the double curly-brace placeholder variables: {{dish}} and {{ingredients}}.

You create the final prompt by calling apply() that takes a map of key/value pairs that represent the name of the placeholder and the string value to replace it with.

Lastly, you call the generate() method of the Gemini model by creating a user message from that prompt, with the prompt.toUserMessage() instruction.

Run the sample:

./gradlew run -q -DjavaMainClass=gemini.workshop.PromptTemplate

You should see a generated output that looks similar to this one:

**Strawberry Shortcake**

Ingredients:

* 1 pint strawberries, hulled and sliced
* 1/2 cup sugar
* 1/4 cup cornstarch
* 1/4 cup water
* 1 tablespoon lemon juice
* 1/2 cup heavy cream, whipped
* 1/4 cup confectioners' sugar
* 1/4 teaspoon vanilla extract
* 6 graham cracker squares, crushed

Instructions:

1. In a medium saucepan, combine the strawberries, sugar, cornstarch, 
water, and lemon juice. Bring to a boil over medium heat, stirring 
constantly. Reduce heat and simmer for 5 minutes, or until the sauce has 
thickened.
2. Remove from heat and let cool slightly.
3. In a large bowl, combine the whipped cream, confectioners' sugar, and 
vanilla extract. Beat until soft peaks form.
4. To assemble the shortcakes, place a graham cracker square on each of 
6 dessert plates. Top with a scoop of whipped cream, then a spoonful of 
strawberry sauce. Repeat layers, ending with a graham cracker square.
5. Serve immediately.

**Tips:**

* For a more elegant presentation, you can use fresh strawberries 
instead of sliced strawberries.
* If you don't have time to make your own whipped cream, you can use 
store-bought whipped cream.

Prompt templates are a good way to have reusable and parameterizable instructions for LLM calls. You can pass different data and customize prompts for different values, for example values provided by your users.

9. Text classification with few-shot prompting

LLMs are pretty good at classifying text into different categories. You can help an LLM in that task by providing some examples of texts and their associated categories. This approach is often called few-shot prompting.

Take a look at TextClassification.java in the app/src/main/java/gemini/workshop directory, to do a particular type of text classification: sentiment analysis.

package gemini.workshop;

import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.output.Response;

import java.util.Map;

public class TextClassification {
    public static void main(String[] args) {
        ChatLanguageModel model = VertexAiGeminiChatModel.builder()
            .project(System.getenv("PROJECT_ID"))
            .location(System.getenv("LOCATION"))
            .modelName("gemini-1.0-pro")
            .maxOutputTokens(10)
            .maxRetries(3)
            .build();

        PromptTemplate promptTemplate = PromptTemplate.from("""
            Analyze the sentiment of the text below. Respond only with one word to describe the sentiment.

            INPUT: This is fantastic news!
            OUTPUT: POSITIVE

            INPUT: Pi is roughly equal to 3.14
            OUTPUT: NEUTRAL

            INPUT: I really disliked the pizza. Who would use pineapples as a pizza topping?
            OUTPUT: NEGATIVE

            INPUT: {{text}}
            OUTPUT: 
            """);

        Prompt prompt = promptTemplate.apply(
            Map.of("text", "I love strawberries!"));

        Response<AiMessage> response = model.generate(prompt.toUserMessage());

        System.out.println(response.content().text());
    }
}

In the main() method, you create the Gemini chat model as usual, but with a small maximum output token number, as you only want a short response: the text is POSITIVE, NEGATIVE, or NEUTRAL.

Then, you create a reusable prompt template with the few-shot prompting technique, by giving the model a few examples of inputs and outputs. This also helps the model follow the desired output format: Gemini won't reply with a full-blown sentence; instead, it's instructed to reply with just one word.

You apply the variables with the apply() method, to replace the {{text}} placeholder with the real parameter ("I love strawberries!"), and turn that template into a user message with toUserMessage().

Run the sample:

./gradlew run -q -DjavaMainClass=gemini.workshop.TextClassification

You should see a single word:

POSITIVE

Looks like loving strawberries is a positive sentiment!

10. Retrieval Augmented Generation

LLMs are trained on a large quantity of text. However, their knowledge covers only the information they have seen during training. If new information is released after the model's training cut-off date, those details won't be available to the model. Thus, the model will not be able to answer questions about information it hasn't seen.

That's why approaches like Retrieval Augmented Generation (RAG) help provide the extra information an LLM may need to fulfill its users' requests, so it can reply with information that is more current, or with private information that was not accessible at training time.

Let's come back to conversations. This time, you will be able to ask questions about your documents. You will build a chatbot that is able to retrieve relevant information from a database containing your documents split into smaller pieces ("chunks"), and that information will be used by the model to ground its answers, instead of relying solely on the knowledge contained in its training data.

In RAG, there are two phases:

  1. Ingestion phase — Documents are loaded in memory, split into smaller chunks, and vector embeddings (high-dimensional vector representations of the chunks) are calculated and stored in a vector database that is capable of doing semantic searches. This ingestion phase is normally done once, when new documents need to be added to the document corpus.


  2. Query phase — Users can now ask questions about the documents. The question will be transformed into a vector as well and compared with all the other vectors in the database. The most similar vectors are usually semantically related and are returned by the vector database. Then, the LLM is given the context of the conversation, the chunks of text that correspond to the vectors returned by the database, and it is asked to ground its answer by looking at those chunks.


Prepare your documents

For this new demo, you will ask questions about the "Attention is all you need" research paper. It describes the transformer neural network architecture, pioneered by Google, which is how all modern large language models are implemented nowadays.

You can retrieve the research paper by using the wget command to download the PDF from the internet:

wget -O /tmp/attention-is-all-you-need.pdf \
    https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

Implement the chatbot

Let's explore how to build the 2-phase approach: first with the document ingestion, and then the query time when users ask questions about the document.

In this example, both phases are implemented in the same class. Normally, you'd have one application that takes care of the ingestion, and another application that offers the chatbot interface to your users.

Also, this example uses an in-memory vector database. In a real setting, the vectors would be persisted in a standalone vector database.

Document ingestion

The very first step of the document ingestion phase is to locate the PDF file we downloaded, and prepare a PdfParser to read it:

ApachePdfBoxDocumentParser pdfParser = new ApachePdfBoxDocumentParser();
Document document = pdfParser.parse(new FileInputStream(
    "/tmp/attention-is-all-you-need.pdf"));

Instead of creating the usual chat language model, you create an instance of an embedding model. This is a particular model whose role is to create vector representations of text pieces (words, sentences or even paragraphs). It returns vectors of floating point numbers, rather than returning text responses.

VertexAiEmbeddingModel embeddingModel = VertexAiEmbeddingModel.builder()
    .endpoint(System.getenv("LOCATION") + "-aiplatform.googleapis.com:443")
    .project(System.getenv("PROJECT_ID"))
    .location(System.getenv("LOCATION"))
    .publisher("google")
    .modelName("textembedding-gecko@001")
    .maxRetries(3)
    .build();

Next, you need a few classes that work together to:

  • Load and split the PDF document into chunks.
  • Create vector embeddings for all of these chunks.

InMemoryEmbeddingStore<TextSegment> embeddingStore = 
    new InMemoryEmbeddingStore<>();

EmbeddingStoreIngestor storeIngestor = EmbeddingStoreIngestor.builder()
    .documentSplitter(DocumentSplitters.recursive(500, 100))
    .embeddingModel(embeddingModel)
    .embeddingStore(embeddingStore)
    .build();
storeIngestor.ingest(document);

An instance of InMemoryEmbeddingStore, an in-memory vector database, is created to store the vector embeddings.

The document is split into chunks thanks to the DocumentSplitters class. It is going to split the text of the PDF file into snippets of 500 characters, with an overlap of 100 characters with the following chunk, to avoid cutting words or sentences in the middle.

The store ingestor links the document splitter, the embedding model to calculate the vectors, and the in-memory vector database. Then, the ingest() method will take care of doing the ingestion.

Now, the first phase is over, the document has been transformed into text chunks with their associated vector embeddings, and stored in the vector database.

Asking questions

It's time to get ready to ask questions! Create a chat model to start the conversation:

ChatLanguageModel model = VertexAiGeminiChatModel.builder()
        .project(System.getenv("PROJECT_ID"))
        .location(System.getenv("LOCATION"))
        .modelName("gemini-1.0-pro")
        .maxOutputTokens(1000)
        .build();

You also need a retriever class to link the vector database (in the embeddingStore variable) with the embedding model. Its job is to query the vector database by computing a vector embedding for the user's query, to find similar vectors in the database:

EmbeddingStoreContentRetriever retriever =
    new EmbeddingStoreContentRetriever(embeddingStore, embeddingModel);

Outside of the main method, create an interface that represents an LLM expert assistant, that's an interface that the AiServices class will implement for you to interact with the model:

interface LlmExpert {
    String ask(String question);
}

At this point, you can configure a new AI service:

LlmExpert expert = AiServices.builder(LlmExpert.class)
    .chatLanguageModel(model)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .contentRetriever(retriever)
    .build();

This service binds together:

  • The chat language model that you configured earlier.
  • A chat memory to keep track of the conversation.
  • A content retriever that compares the vector embedding of the query to the vectors in the database.
  • A prompt template that explicitly tells the chat model to base its response on the provided information (i.e. the relevant excerpts of the documentation whose vector embeddings are similar to the vector of the user's question). This is wired in with the retrievalAugmentor() builder call shown in the following snippet:
.retrievalAugmentor(DefaultRetrievalAugmentor.builder()
    .contentInjector(DefaultContentInjector.builder()
        .promptTemplate(PromptTemplate.from("""
            You are an expert in large language models,\s
            you excel at explaining simply and clearly questions about LLMs.

            Here is the question: {{userMessage}}

            Answer using the following information:
            {{contents}}
            """))
        .build())
    .queryRouter(new DefaultQueryRouter(retriever))
    .build())

You're finally ready to ask your questions!

List.of(
    "What neural network architecture can be used for language models?",
    "What are the different components of a transformer neural network?",
    "What is attention in large language models?",
    "What is the name of the process that transforms text into vectors?"
).forEach(query ->
    System.out.printf("%n=== %s === %n%n %s %n%n", query, expert.ask(query)));

The full source code is in RAG.java, in the app/src/main/java/gemini/workshop directory.

Run the sample:

./gradlew -q run -DjavaMainClass=gemini.workshop.RAG

In the output, you should see answers to your questions:

=== What neural network architecture can be used for language models? === 

 Transformer architecture 


=== What are the different components of a transformer neural network? === 

 The different components of a transformer neural network are:

1. Encoder: The encoder takes the input sequence and converts it into a 
sequence of hidden states. Each hidden state represents the context of 
the corresponding input token.
2. Decoder: The decoder takes the hidden states from the encoder and 
uses them to generate the output sequence. Each output token is 
generated by attending to the hidden states and then using a 
feed-forward network to predict the token's probability distribution.
3. Attention mechanism: The attention mechanism allows the decoder to 
attend to the hidden states from the encoder when generating each output 
token. This allows the decoder to take into account the context of the 
input sequence when generating the output sequence.
4. Positional encoding: Positional encoding is a technique used to 
inject positional information into the input sequence. This is important 
because the transformer neural network does not have any inherent sense 
of the order of the tokens in the input sequence.
5. Feed-forward network: The feed-forward network is a type of neural 
network that is used to predict the probability distribution of each 
output token. The feed-forward network takes the hidden state from the 
decoder as input and outputs a vector of probabilities. 


=== What is attention in large language models? === 

Attention in large language models is a mechanism that allows the model 
to focus on specific parts of the input sequence when generating the 
output sequence. This is important because it allows the model to take 
into account the context of the input sequence when generating each output token.

Attention is implemented using a function that takes two sequences as 
input: a query sequence and a key-value sequence. The query sequence is 
typically the hidden state from the previous decoder layer, and the 
key-value sequence is typically the sequence of hidden states from the 
encoder. The attention function computes a weighted sum of the values in 
the key-value sequence, where the weights are determined by the 
similarity between the query and the keys.

The output of the attention function is a vector of context vectors, 
which are then used as input to the feed-forward network in the decoder. 
The feed-forward network then predicts the probability distribution of 
the next output token.

Attention is a powerful mechanism that allows large language models to 
generate text that is both coherent and informative. It is one of the 
key factors that has contributed to the recent success of large language 
models in a wide range of natural language processing tasks. 


=== What is the name of the process that transforms text into vectors? === 

The process of transforming text into vectors is called **word embedding**.

Word embedding is a technique used in natural language processing (NLP) 
to represent words as vectors of real numbers. Each word is assigned a 
unique vector, which captures its meaning and semantic relationships 
with other words. Word embeddings are used in a variety of NLP tasks, 
such as machine translation, text classification, and question 
answering.

There are a number of different word embedding techniques, but one of 
the most common is the **skip-gram** model. The skip-gram model is a 
neural network that is trained to predict the surrounding words of a 
given word. By learning to predict the surrounding words, the skip-gram 
model learns to capture the meaning and semantic relationships of words.

Once a word embedding model has been trained, it can be used to 
transform text into vectors. To do this, each word in the text is 
converted to its corresponding vector. The vectors for all of the words 
in the text are then concatenated to form a single vector, which 
represents the entire text.

Text vectors can be used in a variety of NLP tasks. For example, text 
vectors can be used to train machine translation models, text 
classification models, and question answering models. Text vectors can 
also be used to perform tasks such as text summarization and text 
clustering. 

11. Function calling

There are also situations where you would like an LLM to have access to external systems, like a remote web API that retrieves information or performs an action, or services that carry out some kind of computation. For example:

Remote web APIs:

  • Track and update customer orders.
  • Find or create a ticket in an issue tracker.
  • Fetch real time data like stock quotes or IoT sensor measurements.
  • Send an email.

Computation tools:

  • A calculator for more advanced math problems.
  • Code interpretation for running code when LLMs need reasoning logic.
  • Convert natural language requests into SQL queries so that an LLM can query a database.

Function calling is the ability for the model to request one or more function calls to be made on its behalf, so it can properly answer a user's prompt with fresher data.

Given a particular prompt from a user, and the knowledge of existing functions that can be relevant to that context, an LLM can reply with a function call request. The application integrating the LLM can then call the function, reply back to the LLM with the response, and the LLM then interprets that response and replies with a textual answer.

Four steps of function calling

Let's have a look at an example of function calling: getting information about the weather forecast.

If you ask Gemini or any other LLM about the weather in Paris, it would reply by saying that it has no information about the weather forecast. If you want the LLM to have real-time access to the weather data, you need to define some functions it can use.


1️⃣ First, a user asks about the weather in Paris. The chatbot app knows there are one or more functions at its disposal to help the LLM fulfill the query. The chatbot sends both the initial prompt and the list of functions that can be called. Here, that's a function called getWeather(), which takes a string parameter for the location.


As the LLM doesn't know about weather forecasts, instead of replying via text, it sends back a function execution request. The chatbot must call the getWeather() function with "Paris" as the location parameter.


2️⃣ The chatbot invokes that function on behalf of the LLM and retrieves the function response. Here, we imagine that the response is {"forecast": "sunny"}.


3️⃣ The chatbot app sends the JSON response back to the LLM.


4️⃣ The LLM looks at the JSON response, interprets that information, and eventually replies back with the text that the weather is sunny in Paris.

Each step as code

First, you'll configure the Gemini model as usual:

ChatLanguageModel model = VertexAiGeminiChatModel.builder()
    .project(System.getenv("PROJECT_ID"))
    .location(System.getenv("LOCATION"))
    .modelName("gemini-1.0-pro")
    .maxOutputTokens(100)
    .build();

You specify a tool specification that describes the function that can be called:

ToolSpecification weatherToolSpec = ToolSpecification.builder()
    .name("getWeatherForecast")
    .description("Get the weather forecast for a location")
    .addParameter("location", JsonSchemaProperty.STRING,
        JsonSchemaProperty.description("the location to get the weather forecast for"))
    .build();

The name of the function is defined, as well as the name and type of the parameter, but notice that both the function and the parameters are given descriptions. Descriptions are very important and help the LLM really understand what a function can do, and thus judge whether this function needs to be called in the context of the conversation.

Let's start step #1, by sending the initial question about the weather in Paris:

List<ChatMessage> allMessages = new ArrayList<>();

// 1) Ask the question about the weather
UserMessage weatherQuestion = UserMessage.from("What is the weather in Paris?");
allMessages.add(weatherQuestion);

In step #2, we pass the tool we'd like the model to use, and the model replies with a tool execution request:

// 2) The model replies with a function call request
Response<AiMessage> messageResponse = model.generate(allMessages, weatherToolSpec);
ToolExecutionRequest toolExecutionRequest = messageResponse.content().toolExecutionRequests().getFirst();
System.out.println("Tool execution request: " + toolExecutionRequest);
allMessages.add(messageResponse.content());

Step #3. At this point, we know what function the LLM would like us to call. In the code, we're not making a real call to an external API, we just return a hypothetical weather forecast directly:

// 3) We send back the result of the function call
ToolExecutionResultMessage toolExecResMsg = ToolExecutionResultMessage.from(toolExecutionRequest,
    "{\"location\":\"Paris\",\"forecast\":\"sunny\", \"temperature\": 20}");
allMessages.add(toolExecResMsg);

And in step #4, the LLM learns about the function execution result, and can then synthesize a textual response:

// 4) The model answers with a sentence describing the weather
Response<AiMessage> weatherResponse = model.generate(allMessages);
System.out.println("Answer: " + weatherResponse.content().text());

The output is:

Tool execution request: ToolExecutionRequest { id = null, name = "getWeatherForecast", arguments = "{"location":"Paris"}" }
Answer:  The weather in Paris is sunny with a temperature of 20 degrees Celsius.

You can see in the output above the tool execution request, as well as the answer.

The full source code is in FunctionCalling.java, in the app/src/main/java/gemini/workshop directory.

Run the sample:

./gradlew run -q -DjavaMainClass=gemini.workshop.FunctionCalling

You should see an output similar to the following:

Tool execution request: ToolExecutionRequest { id = null, name = "getWeatherForecast", arguments = "{"location":"Paris"}" }
Answer:  The weather in Paris is sunny with a temperature of 20 degrees Celsius.

12. LangChain4j handles function calling

In the previous step, you saw how the normal text question/answer and function request/response interactions are interleaved, and in between, you provided the requested function response directly, without calling a real function.

However, LangChain4j also offers a higher-level abstraction that can handle the function calls transparently for you, while handling the conversation as usual.

Let's have a look at FunctionCallingAssistant.java, piece by piece.

First, you create a record that will represent the function's response data structure:

record WeatherForecast(String location, String forecast, int temperature) {}

The response contains information about the location, the forecast, and the temperature.

Then you create a class that contains the actual function you want to make available to the model:

static class WeatherForecastService {
    @Tool("Get the weather forecast for a location")
    WeatherForecast getForecast(@P("Location to get the forecast for") String location) {
        if (location.equals("Paris")) {
            return new WeatherForecast("Paris", "Sunny", 20);
        } else if (location.equals("London")) {
            return new WeatherForecast("London", "Rainy", 15);
        } else {
            return new WeatherForecast("Unknown", "Unknown", 0);
        }
    }
}

Note that this class contains a single function, but it is annotated with the @Tool annotation which corresponds to the description of the function the model can request to call.

The parameter of the function (a single one here) is also annotated, but with the short @P annotation, which gives a description of the parameter. You could add as many functions as you want, to make them available to the model, for more complex scenarios.

In this class, you return some canned responses, but if you wanted to call a real external weather forecast service, it is in the body of that method that you would make the call to that service, as in the hedged sketch below.
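
Here is what such a call could look like, as a sketch only: the endpoint URL, the response format, and the parseForecast() helper are all assumptions for illustration (you would also need the java.net.http and related imports):

@Tool("Get the weather forecast for a location")
WeatherForecast getForecast(@P("Location to get the forecast for") String location) throws Exception {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://weather.example.com/forecast?location=" +
            URLEncoder.encode(location, StandardCharsets.UTF_8)))
        .build();
    HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());
    // Map the JSON body to a WeatherForecast with your JSON library of choice (omitted here)
    return parseForecast(response.body()); // parseForecast() is a hypothetical helper
}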

As we saw when you created a ToolSpecification in the previous approach, it's important to document what a function does, and describe what the parameters correspond to. This helps the model understand how and when this function can be used.

Next, LangChain4j lets you provide an interface that corresponds to the contract you want to use to interact with the model. Here, it's a simple interface that takes in a string representing the user message, and returns a string corresponding to the model's response:

interface WeatherAssistant {
    String chat(String userMessage);
}

It is also possible to use more complex signatures that involve LangChain4j's UserMessage (for a user message) or AiMessage (for a model response), or even a TokenStream, if you want to handle more advanced situations, as those more complicated objects also contain extra information such as the number of tokens consumed. But for simplicity's sake, we'll just take a String as input and return a String as output.
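
As an illustration of such a variant (a sketch, not part of the codelab samples), a streaming assistant interface could return a TokenStream instead of a String; note that this would require configuring AiServices with a streaming chat model rather than the regular one:

interface StreamingWeatherAssistant {
    // dev.langchain4j.service.TokenStream lets you register handlers
    // that are called for each new token as it arrives
    TokenStream chat(String userMessage);
}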

Let's finish with the main() method that ties all the pieces together:

public static void main(String[] args) {
    ChatLanguageModel model = VertexAiGeminiChatModel.builder()
        .project(System.getenv("PROJECT_ID"))
        .location(System.getenv("LOCATION"))
        .modelName("gemini-1.0-pro")
        .maxOutputTokens(100)
        .build();

    WeatherForecastService weatherForecastService = new WeatherForecastService();

    WeatherAssistant assistant = AiServices.builder(WeatherAssistant.class)
        .chatLanguageModel(model)
        .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
        .tools(weatherForecastService)
        .build();

    System.out.println(assistant.chat("What is the weather in Paris?"));
}

As usual, you configure the Gemini chat model. Then you instantiate your weather forecast service that contains the "function" that the model will request us to call.

Now, you use the AiServices class again to bind the chat model, the chat memory, and the tool (i.e. the weather forecast service with its function). AiServices returns an object that implements the WeatherAssistant interface you defined. The only thing left is to call the chat() method of that assistant. When invoking it, you will only see the text responses; the function call requests and the function call responses are not visible to the developer, and those requests are handled automatically and transparently. If Gemini thinks a function should be called, it'll reply with the function call request, and LangChain4j will take care of calling the local function on your behalf.

Run the sample:

./gradlew run -q -DjavaMainClass=gemini.workshop.FunctionCallingAssistant

You should see an output similar to the following:

OK. The weather in Paris is sunny with a temperature of 20 degrees.

This was an example of a single function, but you can also have multiple functions and let LangChain4j handle multiple function calls.

You can take a look at MultiFunctionCalling.java for an example that chains multiple functions to find the price of a stock, convert it into another currency, and apply a percentage to the result. You can run it as follows:

./gradlew run -q -DjavaMainClass=gemini.workshop.MultiFunctionCalling

And you should see the multiple functions called:

getStockPrice(symbol = AAPL) == 172.8022224055534
convertCurrency(fromCurrency = USD, toCurrency = EUR, amount = 172.8022224055534) == 160.70606683716468
applyPercentage(amount = 160.70606683716468, percentage = 10.0) == 16.07060668371647
10% of the AAPL stock price converted from USD to EUR is 16.07060668371647 EUR.
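
To give a rough idea of what powers that output, the tool methods could be declared along these lines (a sketch inferred from the method names and arguments printed above; the bodies are placeholder values, not the repository's actual implementation):

static class StockTools {
    @Tool("Get the current price of a stock in USD")
    double getStockPrice(@P("Stock symbol") String symbol) {
        return 172.80; // placeholder value; a real implementation would call a stock API
    }

    @Tool("Convert an amount from one currency to another")
    double convertCurrency(@P("Source currency") String fromCurrency,
                           @P("Target currency") String toCurrency,
                           @P("Amount to convert") double amount) {
        return amount * 0.93; // placeholder exchange rate
    }

    @Tool("Apply a percentage to an amount")
    double applyPercentage(@P("Amount") double amount,
                           @P("Percentage to apply") double percentage) {
        return amount * percentage / 100;
    }
}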

Towards Agents

Function calling is a great extension mechanism for large language models like Gemini. It enables you to build more complex systems, often called "agents" or "AI assistants". These agents can interact with the external world via external APIs and with services that can have side effects on the external environment (like sending emails, creating tickets, etc.)

When creating such powerful agents, you should do so responsibly. You should consider having a human in the loop before taking automated actions. It's important to keep safety in mind when designing LLM-powered agents that interact with the external world.

13. Congratulations

Congratulations, you've successfully built your first Generative AI chat application in Java using LangChain4j and the Gemini API! You discovered along the way that multimodal large language models are pretty powerful and capable of handling various tasks like question answering (even on your own documentation), data extraction, interacting with external APIs, and more.

What's next?

It's your turn to enhance your applications with powerful LLM integrations!

Further reading

Reference docs