1. Introduction
This codelab focuses on the Gemini Large Language Model (LLM), hosted on Vertex AI on Google Cloud. Vertex AI is a platform that encompasses all the machine learning products, services, and models on Google Cloud.
You will use Java to interact with the Gemini API using the LangChain4j framework. You'll go through concrete examples to take advantage of the LLM for question answering, idea generation, entity and structured content extraction, retrieval augmented generation, and function calling.
What is Generative AI?
Generative AI refers to the use of artificial intelligence to create new content, such as text, images, music, audio, and videos.
Generative AI is powered by large language models (LLMs) that can multi-task and perform out-of-the-box tasks such as summarization, Q&A, classification, and more. With minimal training, foundational models can be adapted for targeted use cases with very little example data.
How does Generative AI work?
Generative AI works by using a Machine Learning (ML) model to learn the patterns and relationships in a dataset of human-created content. It then uses the learned patterns to generate new content.
The most common way to train a generative AI model is to use supervised learning. The model is given a set of human-created content and corresponding labels. It then learns to generate content that is similar to the human-created content.
What are common Generative AI applications?
Generative AI can be used to:
- Improve customer interactions through enhanced chat and search experiences.
- Explore vast amounts of unstructured data through conversational interfaces and summarizations.
- Assist with repetitive tasks like replying to requests for proposals, localizing marketing content into different languages, checking customer contracts for compliance, and more.
What Generative AI offerings does Google Cloud have?
With Vertex AI, you can interact with, customize, and embed foundation models into your applications with little to no ML expertise. You can access foundation models on Model Garden, tune models via a simple UI on Vertex AI Studio, or use models in a data science notebook.
Vertex AI Search and Conversation offers developers the fastest way to build generative AI powered search engines and chatbots.
Gemini for Google Cloud is an AI-powered collaborator, available across Google Cloud and in IDEs, that helps you get more done, faster. Gemini Code Assist provides code completion, code generation, and code explanations, and lets you chat with it to ask technical questions.
What is Gemini?
Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases. Multimodal means it can process and generate different kinds of content such as text, code, images, and audio.
Gemini comes in different variations and sizes:
- Gemini Ultra: The largest, most capable version for complex tasks.
- Gemini Flash: Fastest and most cost-effective, optimized for high-volume tasks.
- Gemini Pro: Mid-sized, optimized for scaling across various tasks.
- Gemini Nano: The most efficient, designed for on-device tasks.
Key Features:
- Multimodality: Gemini's ability to understand and handle multiple information formats is a significant step beyond traditional text-only language models.
- Performance: Gemini Ultra outperforms the current state-of-the-art on many benchmarks and was the first model to surpass human experts on the challenging MMLU (Massive Multitask Language Understanding) benchmark.
- Flexibility: The different Gemini sizes make it adaptable for various use cases, from large-scale research to deployment on mobile devices.
How can you interact with Gemini on Vertex AI from Java?
You have two options:
- The official Vertex AI Java API for Gemini library.
- LangChain4j framework.
In this codelab, you will use the LangChain4j framework.
What is the LangChain4j framework?
The LangChain4j framework is an open source library for integrating LLMs into your Java applications, by orchestrating various components such as the LLM itself, but also other tools like vector databases (for semantic search), document loaders and splitters (to analyze documents and learn from them), output parsers, and more.
The project was inspired by the LangChain Python project, but with the goal of serving Java developers.
What you'll learn
- How to set up a Java project to use Gemini and LangChain4j
- How to send your first prompt to Gemini programmatically
- How to stream responses from Gemini
- How to create a conversation between a user and Gemini
- How to use Gemini in a multimodal context by sending both text and images
- How to extract useful structured information from unstructured content
- How to manipulate prompt templates
- How to do text classification such as sentiment analysis
- How to chat with your own documents (Retrieval Augmented Generation)
- How to extend your chatbots with function calling
- How to use Gemma locally with Ollama and TestContainers
What you'll need
- Knowledge of the Java programming language
- A Google Cloud project
- A browser, such as Chrome or Firefox
2. Setup and requirements
Self-paced environment setup
- Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.
- The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
- The Project ID is unique across all Google Cloud projects and is immutable (it cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you might generate another random one. Alternatively, you can try your own, and see if it's available. It can't be changed after this step and remains for the duration of the project.
- For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.
- Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.
Start Cloud Shell
While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Cloud Shell, a command line environment running in the Cloud.
Activate Cloud Shell
- From the Cloud Console, click Activate Cloud Shell.
If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If you were presented with an intermediate screen, click Continue.
It should only take a few moments to provision and connect to Cloud Shell.
This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.
Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.
- Run the following command in Cloud Shell to confirm that you are authenticated:
gcloud auth list
Command output
Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`
- Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:
gcloud config list project
Command output
[core]
project = <PROJECT_ID>
If it is not, you can set it with this command:
gcloud config set project <PROJECT_ID>
Command output
Updated property [core/project].
3. Preparing your development environment
In this codelab, you're going to use the Cloud Shell terminal and Cloud Shell editor to develop your Java programs.
Enable Vertex AI APIs
In the Google Cloud console, make sure your project name is displayed at the top of the page. If it's not, click Select a project to open the Project Selector, and select your intended project.
You can enable Vertex AI APIs either from the Vertex AI section of Google Cloud console or from Cloud Shell terminal.
To enable from the Google Cloud console, first, go to the Vertex AI section of Google Cloud console menu:
Click Enable All Recommended APIs in the Vertex AI dashboard.
This will enable several APIs, but the most important one for this codelab is aiplatform.googleapis.com.
Alternatively, you can also enable this API from the Cloud Shell terminal with the following command:
gcloud services enable aiplatform.googleapis.com
Clone the GitHub repository
In the Cloud Shell terminal, clone the repository for this codelab:
git clone https://github.com/glaforge/gemini-workshop-for-java-developers.git
To check that the project is ready to run, you can try running the "Hello World" program.
Make sure you're at the top level folder:
cd gemini-workshop-for-java-developers/
Create the Gradle wrapper:
gradle wrapper
Run with gradlew:
./gradlew run
You should see the following output:
..
> Task :app:run
Hello World!
Open and setup Cloud Editor
Open the code with the Cloud Code Editor from Cloud Shell:
In the Cloud Code Editor, open the codelab source folder by selecting File -> Open Folder and pointing to the codelab source folder (e.g. /home/username/gemini-workshop-for-java-developers/).
Install Gradle for Java
To get the Cloud Code Editor working properly with Gradle, install the Gradle for Java extension.
First, go to the Java Projects section and press the plus sign:
Select Gradle for Java:
Select the Install Pre-Release version:
Once installed, you should see the Disable and the Uninstall buttons:
Finally, clean the workspace to have the new settings applied:
This will ask you to reload and delete the workspace. Go ahead and choose Reload and delete:
If you open one of the files, for example App.java, you should now see the editor working correctly with syntax highlighting:
You're now ready to run some samples against Gemini!
Setup environment variables
Open a new terminal in the Cloud Code Editor by selecting Terminal -> New Terminal. Set up two environment variables required for running the code examples:
- PROJECT_ID — Your Google Cloud project ID
- LOCATION — The region where the Gemini model is deployed
Export the variables as follows:
export PROJECT_ID=$(gcloud config get-value project)
export LOCATION=us-central1
4. First call to the Gemini model
Now that the project is properly set up, it is time to call the Gemini API.
Take a look at QA.java in the app/src/main/java/gemini/workshop directory:
package gemini.workshop;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.model.chat.ChatLanguageModel;
public class QA {
public static void main(String[] args) {
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.build();
System.out.println(model.generate("Why is the sky blue?"));
}
}
In this first example, you need to import the VertexAiGeminiChatModel class, which implements the ChatLanguageModel interface.
In the main method, you configure the chat language model by using the builder for the VertexAiGeminiChatModel and specify:
- Project
- Location
- Model name (gemini-1.5-flash-001)
Now that the language model is ready, you can call the generate() method and pass your prompt (the question or instructions you want to send to the LLM). Here, you ask a simple question about what makes the sky blue.
Feel free to change this prompt to try different questions or tasks.
Run the sample from the source code root folder:
./gradlew run -q -DjavaMainClass=gemini.workshop.QA
You should see an output similar to this one:
The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the atmosphere, it is made up of a mixture of different wavelengths of light, each with a different color. The different wavelengths of light interact with the molecules and particles in the atmosphere in different ways. The shorter wavelengths of light, such as those corresponding to blue and violet light, are more likely to be scattered in all directions by these particles than the longer wavelengths of light, such as those corresponding to red and orange light. This is because the shorter wavelengths of light have a smaller wavelength and are able to bend around the particles more easily. As a result of Rayleigh scattering, the blue light from the sun is scattered in all directions, and it is this scattered blue light that we see when we look up at the sky. The blue light from the sun is not actually scattered in a single direction, so the color of the sky can vary depending on the position of the sun in the sky and the amount of dust and water droplets in the atmosphere.
Congratulations, you made your first call to Gemini!
Streaming response
Did you notice that the response was given in one go, after a few seconds? It's also possible to get the response progressively, thanks to the streaming response variant. With streaming, the model returns the response piece by piece, as it becomes available.
In this codelab, we'll stick with the non-streaming response, but let's have a look at streaming to see how it can be done.
In StreamQA.java, in the app/src/main/java/gemini/workshop directory, you can see the streaming response in action:
package gemini.workshop;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiStreamingChatModel;
import dev.langchain4j.model.StreamingResponseHandler;
public class StreamQA {
public static void main(String[] args) {
StreamingChatLanguageModel model = VertexAiGeminiStreamingChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.build();
model.generate("Why is the sky blue?", new StreamingResponseHandler<>() {
@Override
public void onNext(String text) {
System.out.println(text);
}
@Override
public void onError(Throwable error) {
error.printStackTrace();
}
});
}
}
This time, we import the streaming class variant VertexAiGeminiStreamingChatModel, which implements the StreamingChatLanguageModel interface. You'll also need a StreamingResponseHandler.
This time, the signature of the generate() method is a little bit different. Instead of returning a string, the return type is void. In addition to the prompt, you have to pass a streaming response handler. Here, you implement the interface by creating an anonymous inner class with two methods, onNext(String text) and onError(Throwable error). The former is called each time a new piece of the response is available, while the latter is called only if an error occurs.
Run:
./gradlew run -q -DjavaMainClass=gemini.workshop.StreamQA
You will get an answer similar to that of the previous class, but this time you'll notice that the answer appears progressively in your shell, rather than waiting for the full answer to be displayed.
Extra configuration
For configuration, we only defined the project, the location, and the model name, but there are other parameters you can specify for the model:
- temperature(Float temp) — to define how creative you want the response to be (0 being less creative and often more factual, while 1 is for more creative outputs)
- topP(Float topP) — to select the possible words whose total probability adds up to that floating point number (between 0 and 1)
- topK(Integer topK) — to randomly select a word out of a maximum number of probable words for the text completion (from 1 to 40)
- maxOutputTokens(Integer max) — to specify the maximum length of the answer given by the model (generally, 4 tokens represent roughly 3 words)
- maxRetries(Integer retries) — in case you're running past the requests-per-time quota, or the platform is facing some technical issue, to have the model retry the call up to that many times
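For example, a configuration that sets all of these parameters could look like the following sketch (the values are purely illustrative):
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
    .project(System.getenv("PROJECT_ID"))
    .location(System.getenv("LOCATION"))
    .modelName("gemini-1.5-flash-001")
    .temperature(0.2f)      // low creativity, more factual answers
    .topP(0.95f)            // consider words whose cumulative probability reaches 0.95
    .topK(20)               // sample among at most the 20 most probable words
    .maxOutputTokens(256)   // cap the length of the answer
    .maxRetries(3)          // retry up to 3 times on quota or transient errors
    .build();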
So far, you asked a single question to Gemini, but you can also have a multi-turn conversation. That's what you'll explore in the next section.
5. Chat with Gemini
In the previous step, you asked a single question. It's now time to have a real conversation between a user and the LLM. Each question and answer can build upon the previous ones to form a real discussion.
Take a look at Conversation.java in the app/src/main/java/gemini/workshop folder:
package gemini.workshop;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.service.AiServices;
import java.util.List;
public class Conversation {
public static void main(String[] args) {
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.build();
MessageWindowChatMemory chatMemory = MessageWindowChatMemory.builder()
.maxMessages(20)
.build();
interface ConversationService {
String chat(String message);
}
ConversationService conversation =
AiServices.builder(ConversationService.class)
.chatLanguageModel(model)
.chatMemory(chatMemory)
.build();
List.of(
"Hello!",
"What is the country where the Eiffel tower is situated?",
"How many inhabitants are there in that country?"
).forEach( message -> {
System.out.println("\nUser: " + message);
System.out.println("Gemini: " + conversation.chat(message));
});
}
}
There are a couple of new interesting imports in this class:
- MessageWindowChatMemory — a class that helps handle the multi-turn aspect of the conversation, keeping the previous questions and answers in local memory
- AiServices — a class that ties together the chat model and the chat memory
In the main method, you're going to set up the model, the chat memory, and the AI service. The model is configured as usual with the project, location, and model name information.
For the chat memory, we use MessageWindowChatMemory's builder to create a memory that keeps the last 20 messages exchanged. It's a sliding window over the conversation, whose context is kept locally in our Java client.
You then create the AI service that binds the chat model with the chat memory.
Notice how the AI service makes use of the custom ConversationService interface we've defined, which LangChain4j implements, and which takes a String query and returns a String response.
Now, it's time to have a conversation with Gemini. First, a simple greeting is sent, then a first question about the Eiffel tower, to find out in which country it can be found. Notice that the last question relates to the answer of the previous one, as you ask how many inhabitants live in the country where the Eiffel tower is situated, without explicitly naming the country given in the previous answer. This shows that past questions and answers are sent along with every prompt.
Run the sample:
./gradlew run -q -DjavaMainClass=gemini.workshop.Conversation
You should see three answers similar to these ones:
User: Hello!
Gemini: Hi there! How can I assist you today?

User: What is the country where the Eiffel tower is situated?
Gemini: France

User: How many inhabitants are there in that country?
Gemini: As of 2023, the population of France is estimated to be around 67.8 million.
You can ask single-turn questions or have multi-turn conversations with Gemini but so far, the input has been only text. What about images? Let's explore images in the next step.
6. Multimodality with Gemini
Gemini is a multimodal model. Not only does it accept text as input, but it also accepts images and even videos. In this section, you'll see a use case for mixing text and images.
Do you think Gemini will recognise this cat?
Picture of a cat in the snow, taken from Wikipedia: https://upload.wikimedia.org/wikipedia/commons/b/b6/Felis_catus-cat_on_snow.jpg
Take a look at Multimodal.java in the app/src/main/java/gemini/workshop directory:
package gemini.workshop;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.output.Response;
import dev.langchain4j.data.message.ImageContent;
import dev.langchain4j.data.message.TextContent;
public class Multimodal {
static final String CAT_IMAGE_URL =
"https://upload.wikimedia.org/wikipedia/" +
"commons/b/b6/Felis_catus-cat_on_snow.jpg";
public static void main(String[] args) {
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.build();
UserMessage userMessage = UserMessage.from(
ImageContent.from(CAT_IMAGE_URL),
TextContent.from("Describe the picture")
);
Response<AiMessage> response = model.generate(userMessage);
System.out.println(response.content().text());
}
}
In the imports, notice we distinguish between different kinds of messages and contents. A UserMessage can contain both a TextContent and an ImageContent object. This is multimodality at play: mixing text and images. The model sends back a Response which contains an AiMessage.
You then retrieve the AiMessage from the response via content(), and then the text of the message thanks to text().
Run the sample:
./gradlew run -q -DjavaMainClass=gemini.workshop.Multimodal
The name of the picture certainly gave you a hint of what it contained, but Gemini's output is similar to the following:
A cat with brown fur is walking in the snow. The cat has a white patch of fur on its chest and white paws. The cat is looking at the camera.
Mixing images and text prompts opens up interesting use cases. You can create applications that can:
- Recognize text in pictures.
- Check if an image is safe to display.
- Create image captions.
- Search through a database of images using plain text descriptions.
In addition to extracting information from images, you can also extract information from unstructured text. That's what you're going to learn in the next section.
7. Extract structured information from unstructured text
There are many situations where important information is given in report documents, in emails, or other long form texts in an unstructured way. Ideally, you'd like to be able to extract the key details contained in the unstructured text, in the form of structured objects. Let's see how you can do that.
Let's say you want to extract the name and age of a person, given a biography or description of that person. You can instruct the LLM to extract JSON from unstructured text with a cleverly tweaked prompt (this is commonly called "prompt engineering").
Take a look at ExtractData.java in app/src/main/java/gemini/workshop:
package gemini.workshop;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.UserMessage;
public class ExtractData {
static record Person(String name, int age) {}
interface PersonExtractor {
@UserMessage("""
Extract the name and age of the person described below.
Return a JSON document with a "name" and an "age" property, \
following this structure: {"name": "John Doe", "age": 34}
Return only JSON, without any markdown markup surrounding it.
Here is the document describing the person:
---
{{it}}
---
JSON:
""")
Person extractPerson(String text);
}
public static void main(String[] args) {
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.temperature(0f)
.topK(1)
.build();
PersonExtractor extractor = AiServices.create(PersonExtractor.class, model);
Person person = extractor.extractPerson("""
Anna is a 23 year old artist based in Brooklyn, New York. She was born and
raised in the suburbs of Chicago, where she developed a love for art at a
young age. She attended the School of the Art Institute of Chicago, where
she studied painting and drawing. After graduating, she moved to New York
City to pursue her art career. Anna's work is inspired by her personal
experiences and observations of the world around her. She often uses bright
colors and bold lines to create vibrant and energetic paintings. Her work
has been exhibited in galleries and museums in New York City and Chicago.
"""
);
System.out.println(person.name()); // Anna
System.out.println(person.age()); // 23
}
}
Let's have a look at the various steps in this file:
- A Person record is defined to represent the details describing a person (name and age).
- The PersonExtractor interface is defined with a method that, given an unstructured text string, returns a Person instance.
- The extractPerson() method is annotated with a @UserMessage annotation that associates a prompt with it. That's the prompt the model will use to extract the information and return the details in the form of a JSON document, which will be parsed for you and unmarshalled into a Person instance.
Now let's look at the content of the main() method:
- The chat model is instantiated. Notice that we use a very low temperature of zero, and a topK of only one, to ensure a very deterministic answer. This also helps the model follow the instructions better. In particular, we don't want Gemini to wrap the JSON response with extra Markdown markup.
- A PersonExtractor object is created thanks to LangChain4j's AiServices class.
- Then, you can simply call Person person = extractor.extractPerson(...) to extract the details of the person from the unstructured text, and get back a Person instance with the name and age.
Run the sample:
./gradlew run -q -DjavaMainClass=gemini.workshop.ExtractData
You should see the following output:
Anna 23
Yes, this is Anna and they are 23!
With this AiServices approach, you operate with strongly typed objects. You are not interacting directly with the LLM. Instead, you are working with concrete classes, like the Person record to represent the extracted personal information, and you have a PersonExtractor object with an extractPerson() method which returns a Person instance. The notion of LLM is abstracted away, and as a Java developer, you are just manipulating normal classes and objects.
8. Structure prompts with prompt templates
When you interact with an LLM using a common set of instructions or questions, there's a part of that prompt that never changes, while other parts contain the data. For example, if you want to create recipes, you might use a prompt like "You're a talented chef, please create a recipe with the following ingredients: ...", and then you'd append the ingredients to the end of that text. That's what prompt templates are for — similar to interpolated strings in programming languages. A prompt template contains placeholders which you can replace with the right data for a particular call to the LLM.
More concretely, let's study TemplatePrompt.java in the app/src/main/java/gemini/workshop directory:
package gemini.workshop;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.output.Response;
import java.util.HashMap;
import java.util.Map;
public class TemplatePrompt {
public static void main(String[] args) {
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.maxOutputTokens(500)
.temperature(0.8f)
.topK(40)
.topP(0.95f)
.maxRetries(3)
.build();
PromptTemplate promptTemplate = PromptTemplate.from("""
You're a friendly chef with a lot of cooking experience.
Create a recipe for a {{dish}} with the following ingredients: \
{{ingredients}}, and give it a name.
"""
);
Map<String, Object> variables = new HashMap<>();
variables.put("dish", "dessert");
variables.put("ingredients", "strawberries, chocolate, and whipped cream");
Prompt prompt = promptTemplate.apply(variables);
Response<AiMessage> response = model.generate(prompt.toUserMessage());
System.out.println(response.content().text());
}
}
As usual, you configure the VertexAiGeminiChatModel model, this time with a high level of creativity thanks to a high temperature and high topP and topK values. Then you create a PromptTemplate with its from() static method, passing the string of our prompt and using the double curly-brace placeholder variables {{dish}} and {{ingredients}}.
You create the final prompt by calling apply(), which takes a map of key/value pairs representing the name of each placeholder and the string value to replace it with.
Lastly, you call the generate() method of the Gemini model by creating a user message from that prompt, with the prompt.toUserMessage() instruction.
Run the sample:
./gradlew run -q -DjavaMainClass=gemini.workshop.TemplatePrompt
You should see a generated output that looks similar to this one:
**Strawberry Shortcake**

Ingredients:

* 1 pint strawberries, hulled and sliced
* 1/2 cup sugar
* 1/4 cup cornstarch
* 1/4 cup water
* 1 tablespoon lemon juice
* 1/2 cup heavy cream, whipped
* 1/4 cup confectioners' sugar
* 1/4 teaspoon vanilla extract
* 6 graham cracker squares, crushed

Instructions:

1. In a medium saucepan, combine the strawberries, sugar, cornstarch, water, and lemon juice. Bring to a boil over medium heat, stirring constantly. Reduce heat and simmer for 5 minutes, or until the sauce has thickened.
2. Remove from heat and let cool slightly.
3. In a large bowl, combine the whipped cream, confectioners' sugar, and vanilla extract. Beat until soft peaks form.
4. To assemble the shortcakes, place a graham cracker square on each of 6 dessert plates. Top with a scoop of whipped cream, then a spoonful of strawberry sauce. Repeat layers, ending with a graham cracker square.
5. Serve immediately.

**Tips:**

* For a more elegant presentation, you can use fresh strawberries instead of sliced strawberries.
* If you don't have time to make your own whipped cream, you can use store-bought whipped cream.
Feel free to change the values of dish and ingredients in the map, tweak the temperature, topK and topP, and re-run the code. This will allow you to observe the effect of changing these parameters on the LLM's output.
Prompt templates are a good way to have reusable and parameterizable instructions for LLM calls. You can pass data and customize prompts for different values, provided by your users.
9. Text classification with few-shot prompting
LLMs are pretty good at classifying text into different categories. You can help an LLM with that task by providing a few examples of texts and their associated categories. This approach is often called few-shot prompting.
Take a look at TextClassification.java in the app/src/main/java/gemini/workshop directory, to do a particular type of text classification: sentiment analysis.
package gemini.workshop;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.vertexai.VertexAiGeminiChatModel;
import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.model.input.Prompt;
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.output.Response;
import java.util.Map;
public class TextClassification {
public static void main(String[] args) {
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.maxOutputTokens(10)
.maxRetries(3)
.build();
PromptTemplate promptTemplate = PromptTemplate.from("""
Analyze the sentiment of the text below. Respond only with one word to describe the sentiment.
INPUT: This is fantastic news!
OUTPUT: POSITIVE
INPUT: Pi is roughly equal to 3.14
OUTPUT: NEUTRAL
INPUT: I really disliked the pizza. Who would use pineapples as a pizza topping?
OUTPUT: NEGATIVE
INPUT: {{text}}
OUTPUT:
""");
Prompt prompt = promptTemplate.apply(
Map.of("text", "I love strawberries!"));
Response<AiMessage> response = model.generate(prompt.toUserMessage());
System.out.println(response.content().text());
}
}
In the main() method, you create the Gemini chat model as usual, but with a small maximum output token number, as you only want a short response: the text is POSITIVE, NEGATIVE, or NEUTRAL.
Then, you create a reusable prompt template with the few-shot prompting technique, giving the model a few examples of inputs and their expected outputs. This also helps the model follow the desired output format: Gemini won't reply with a full-blown sentence; instead, it's instructed to reply with just one word.
You apply the variables with the apply() method, to replace the {{text}} placeholder with the real parameter ("I love strawberries!"), and turn that template into a user message with toUserMessage().
Run the sample:
./gradlew run -q -DjavaMainClass=gemini.workshop.TextClassification
You should see a single word:
POSITIVE
Looks like loving strawberries is a positive sentiment!
10. Retrieval Augmented Generation
LLMs are trained on a large quantity of text. However, their knowledge covers only the information they have seen during training. If new information is released after the model's training cutoff date, those details won't be available to the model. Thus, the model will not be able to answer questions about information it hasn't seen.
That's why approaches like Retrieval Augmented Generation (RAG) help provide the extra information an LLM may need to fulfill its users' requests, whether to reply with information that is more current or with private information that was not accessible at training time.
Let's come back to conversations. This time, you will be able to ask questions about your own documents. You will build a chatbot that is able to retrieve relevant information from a database containing your documents, split into smaller pieces ("chunks"), and that information will be used by the model to ground its answers, instead of relying solely on the knowledge acquired during its training.
In RAG, there are two phases:
- Ingestion phase — Documents are loaded in memory, split into smaller chunks, and vector embeddings (a high-dimensional vector representation of the chunks) are calculated and stored in a vector database that is capable of doing semantic searches. This ingestion phase is normally done once, when new documents need to be added to the document corpus.
- Query phase — Users can now ask questions about the documents. The question will be transformed into a vector as well and compared with all the other vectors in the database. The most similar vectors are usually semantically related and are returned by the vector database. Then, the LLM is given the context of the conversation, the chunks of text that correspond to the vectors returned by the database, and it is asked to ground its answer by looking at those chunks.
Prepare your documents
For this new demo, you will ask questions about the "Attention is all you need" research paper. It describes the transformer neural network architecture, pioneered by Google, which is how all modern large language models are implemented nowadays.
The paper is already downloaded to attention-is-all-you-need.pdf in the repository.
Implement the chatbot
Let's explore how to build the 2-phase approach: first with the document ingestion, and then the query time when users ask questions about the document.
In this example, both phases are implemented in the same class. Normally, you'd have one application that takes care of the ingestion, and another application that offers the chatbot interface to your users.
Also, in this example we will use an in-memory vector database; in a real production scenario, the vectors would be persisted in a standalone vector database.
Document ingestion
The very first step of the document ingestion phase is to locate the PDF file that we already downloaded and prepare a PdfParser to read it:
URL url = new URI("https://github.com/glaforge/gemini-workshop-for-java-developers/raw/main/attention-is-all-you-need.pdf").toURL();
ApachePdfBoxDocumentParser pdfParser = new ApachePdfBoxDocumentParser();
Document document = pdfParser.parse(url.openStream());
Instead of creating the usual chat language model, you create an instance of an embedding model. This is a particular model whose role is to create vector representations of text pieces (words, sentences or even paragraphs). It returns vectors of floating point numbers, rather than returning text responses.
VertexAiEmbeddingModel embeddingModel = VertexAiEmbeddingModel.builder()
.endpoint(System.getenv("LOCATION") + "-aiplatform.googleapis.com:443")
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.publisher("google")
.modelName("textembedding-gecko@003")
.maxRetries(3)
.build();
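To get a feel for what the embedding model produces, you can embed a single sentence and inspect the resulting vector. This snippet is not part of the codelab, just an illustration (it assumes the dev.langchain4j.data.embedding.Embedding and dev.langchain4j.model.output.Response imports):
Response<Embedding> embeddingResponse =
    embeddingModel.embed("The sky is blue because of Rayleigh scattering.");
float[] vector = embeddingResponse.content().vector();
// textembedding-gecko returns 768-dimensional vectors
System.out.println("Embedding dimension: " + vector.length);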
Next, you will need a few classes to collaborate together to:
- Load and split the PDF document in chunks.
- Create vector embeddings for all of these chunks.
InMemoryEmbeddingStore<TextSegment> embeddingStore =
new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor storeIngestor = EmbeddingStoreIngestor.builder()
.documentSplitter(DocumentSplitters.recursive(500, 100))
.embeddingModel(embeddingModel)
.embeddingStore(embeddingStore)
.build();
storeIngestor.ingest(document);
An instance of InMemoryEmbeddingStore, an in-memory vector database, is created to store the vector embeddings.
The document is split into chunks thanks to the DocumentSplitters class. It splits the text of the PDF file into snippets of 500 characters, with an overlap of 100 characters with the following chunk, to avoid cutting words or sentences into bits and pieces.
The store ingestor links the document splitter, the embedding model to calculate the vectors, and the in-memory vector database. Then, the ingest() method takes care of doing the ingestion.
Now the first phase is over: the document has been transformed into text chunks with their associated vector embeddings, all stored in the vector database.
Asking questions
It's time to get ready to ask questions! Create a chat model to start the conversation:
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.maxOutputTokens(1000)
.build();
You also need a retriever class to link the vector database (in the embeddingStore variable) with the embedding model. Its job is to query the vector database by computing a vector embedding for the user's query, to find similar vectors in the database:
EmbeddingStoreContentRetriever retriever =
new EmbeddingStoreContentRetriever(embeddingStore, embeddingModel);
Outside of the main method, create an interface that represents an LLM expert assistant. It's an interface that the AiServices class will implement for you, so you can interact with the model:
interface LlmExpert {
String ask(String question);
}
At this point, you can configure a new AI service:
LlmExpert expert = AiServices.builder(LlmExpert.class)
.chatLanguageModel(model)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.contentRetriever(retriever)
.build();
This service binds together:
- The chat language model that you configured earlier.
- A chat memory to keep track of the conversation.
- The retriever, which compares the query's vector embedding to the vectors in the database.
- A prompt template, which explicitly says that the chat model should reply by basing its response on the provided information (i.e. the relevant excerpts of the document whose vector embeddings are similar to the vector of the user's question). In the AiServices builder, you can plug in such a prompt template by configuring a retrieval augmentor instead of the plain content retriever:
.retrievalAugmentor(DefaultRetrievalAugmentor.builder()
.contentInjector(DefaultContentInjector.builder()
.promptTemplate(PromptTemplate.from("""
You are an expert in large language models,\s
you excel at explaining simply and clearly questions about LLMs.
Here is the question: {{userMessage}}
Answer using the following information:
{{contents}}
"""))
.build())
.contentRetriever(retriever)
.build())
You're finally ready to ask your questions!
List.of(
"What neural network architecture can be used for language models?",
"What are the different components of a transformer neural network?",
"What is attention in large language models?",
"What is the name of the process that transforms text into vectors?"
).forEach(query ->
System.out.printf("%n=== %s === %n%n %s %n%n", query, expert.ask(query)));
The full source code is in RAG.java, in the app/src/main/java/gemini/workshop directory.
Run the sample:
./gradlew -q run -DjavaMainClass=gemini.workshop.RAG
In the output, you should see answers to your questions:
=== What neural network architecture can be used for language models? ===

Transformer architecture

=== What are the different components of a transformer neural network? ===

The different components of a transformer neural network are:

1. Encoder: The encoder takes the input sequence and converts it into a sequence of hidden states. Each hidden state represents the context of the corresponding input token.
2. Decoder: The decoder takes the hidden states from the encoder and uses them to generate the output sequence. Each output token is generated by attending to the hidden states and then using a feed-forward network to predict the token's probability distribution.
3. Attention mechanism: The attention mechanism allows the decoder to attend to the hidden states from the encoder when generating each output token. This allows the decoder to take into account the context of the input sequence when generating the output sequence.
4. Positional encoding: Positional encoding is a technique used to inject positional information into the input sequence. This is important because the transformer neural network does not have any inherent sense of the order of the tokens in the input sequence.
5. Feed-forward network: The feed-forward network is a type of neural network that is used to predict the probability distribution of each output token. The feed-forward network takes the hidden state from the decoder as input and outputs a vector of probabilities.

=== What is attention in large language models? ===

Attention in large language models is a mechanism that allows the model to focus on specific parts of the input sequence when generating the output sequence. This is important because it allows the model to take into account the context of the input sequence when generating each output token.

Attention is implemented using a function that takes two sequences as input: a query sequence and a key-value sequence. The query sequence is typically the hidden state from the previous decoder layer, and the key-value sequence is typically the sequence of hidden states from the encoder. The attention function computes a weighted sum of the values in the key-value sequence, where the weights are determined by the similarity between the query and the keys.

The output of the attention function is a vector of context vectors, which are then used as input to the feed-forward network in the decoder. The feed-forward network then predicts the probability distribution of the next output token.

Attention is a powerful mechanism that allows large language models to generate text that is both coherent and informative. It is one of the key factors that has contributed to the recent success of large language models in a wide range of natural language processing tasks.

=== What is the name of the process that transforms text into vectors? ===

The process of transforming text into vectors is called **word embedding**.

Word embedding is a technique used in natural language processing (NLP) to represent words as vectors of real numbers. Each word is assigned a unique vector, which captures its meaning and semantic relationships with other words. Word embeddings are used in a variety of NLP tasks, such as machine translation, text classification, and question answering.

There are a number of different word embedding techniques, but one of the most common is the **skip-gram** model. The skip-gram model is a neural network that is trained to predict the surrounding words of a given word. By learning to predict the surrounding words, the skip-gram model learns to capture the meaning and semantic relationships of words.

Once a word embedding model has been trained, it can be used to transform text into vectors. To do this, each word in the text is converted to its corresponding vector. The vectors for all of the words in the text are then concatenated to form a single vector, which represents the entire text.

Text vectors can be used in a variety of NLP tasks. For example, text vectors can be used to train machine translation models, text classification models, and question answering models. Text vectors can also be used to perform tasks such as text summarization and text clustering.
11. Function calling
There are also situations where you would like an LLM to have access to external systems, like a remote web API that retrieves information or triggers an action, or services that perform some kind of computation. For example:
Remote web APIs:
- Track and update customer orders.
- Find or create a ticket in an issue tracker.
- Fetch real time data like stock quotes or IoT sensor measurements.
- Send an email.
Computation tools:
- A calculator for more advanced math problems.
- Code interpretation for running code when LLMs need reasoning logic.
- Convert natural language requests into SQL queries so that an LLM can query a database.
Function calling is the ability for the model to request one or more function calls to be made on its behalf, so it can properly answer a user's prompt with fresher data.
Given a particular prompt from a user, and the knowledge of existing functions that might be relevant in that context, an LLM can reply with a function call request. The application integrating the LLM can then call the function, send the result back to the LLM, and the LLM then interprets that result and replies with a textual answer.
Four steps of function calling
Let's have a look at an example of function calling: getting information about the weather forecast.
If you ask Gemini or any other LLM about the weather in Paris, it would reply that it has no information about the current weather forecast. If you want the LLM to have real-time access to weather data, you need to define some functions it can use.
Take a look at the following diagram:
1️⃣ First, a user asks about the weather in Paris. The chatbot app knows there are one or more functions at its disposal to help the LLM fulfill the query. The chatbot sends both the initial prompt and the list of functions that can be called. Here, that's a function called getWeather() which takes a string parameter for the location.

As the LLM doesn't know about weather forecasts, instead of replying via text, it sends back a function execution request. The chatbot must call the getWeather() function with "Paris" as the location parameter.
2️⃣ The chatbot invokes that function on behalf of the LLM and retrieves the function response. Here, we imagine that the response is {"forecast": "sunny"}.
3️⃣ The chatbot app sends the JSON response back to the LLM.
4️⃣ The LLM looks at the JSON response, interprets that information, and eventually replies back with the text that the weather is sunny in Paris.
Each step as code
First, you'll configure the Gemini model as usual:
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.maxOutputTokens(100)
.build();
You specify a tool specification that describes the function that can be called:
ToolSpecification weatherToolSpec = ToolSpecification.builder()
.name("getWeatherForecast")
.description("Get the weather forecast for a location")
.addParameter("location", JsonSchemaProperty.STRING,
JsonSchemaProperty.description("the location to get the weather forecast for"))
.build();
The name of the function is defined, as well as the name and type of the parameter, but notice that both the function and the parameters are given descriptions. Descriptions are very important and help the LLM really understand what a function can do, and thus judge whether this function needs to be called in the context of the conversation.
Let's start step #1, by sending the initial question about the weather in Paris:
List<ChatMessage> allMessages = new ArrayList<>();
// 1) Ask the question about the weather
UserMessage weatherQuestion = UserMessage.from("What is the weather in Paris?");
allMessages.add(weatherQuestion);
In step #2, we pass the tool we'd like the model to be able to use, and the model replies with a tool execution request:
// 2) The model replies with a function call request
Response<AiMessage> messageResponse = model.generate(allMessages, weatherToolSpec);
ToolExecutionRequest toolExecutionRequest = messageResponse.content().toolExecutionRequests().getFirst();
System.out.println("Tool execution request: " + toolExecutionRequest);
allMessages.add(messageResponse.content());
Step #3: at this point, we know which function the LLM would like us to call. In the code, we're not making a real call to an external API; we just return a hypothetical weather forecast directly:
// 3) We send back the result of the function call
ToolExecutionResultMessage toolExecResMsg = ToolExecutionResultMessage.from(toolExecutionRequest,
"{\"location\":\"Paris\",\"forecast\":\"sunny\", \"temperature\": 20}");
allMessages.add(toolExecResMsg);
And in step #4, the LLM learns about the function execution result, and can then synthesize a textual response:
// 4) The model answers with a sentence describing the weather
Response<AiMessage> weatherResponse = model.generate(allMessages);
System.out.println("Answer: " + weatherResponse.content().text());
When you run the sample, you'll see both the tool execution request and the final answer in the output.
The full source code is in FunctionCalling.java, in the app/src/main/java/gemini/workshop directory.
Run the sample:
./gradlew run -q -DjavaMainClass=gemini.workshop.FunctionCalling
You should see an output similar to the following:
Tool execution request: ToolExecutionRequest { id = null, name = "getWeatherForecast", arguments = "{"location":"Paris"}" }
Answer: The weather in Paris is sunny with a temperature of 20 degrees Celsius.
12. LangChain4j handles function calling
In the previous step, you saw how the normal text question/answer and function request/response interactions are interleaved, and in between, you provided the requested function response directly, without calling a real function.
However, LangChain4j also offers a higher-level abstraction that takes care of function calls transparently for you, while handling the conversation as usual.
Single function call
Let's have a look at FunctionCallingAssistant.java, piece by piece.
First, you create a record that will represent the function's response data structure:
record WeatherForecast(String location, String forecast, int temperature) {}
The response contains information about the location, the forecast, and the temperature.
Then you create a class that contains the actual function you want to make available to the model:
static class WeatherForecastService {
@Tool("Get the weather forecast for a location")
WeatherForecast getForecast(@P("Location to get the forecast for") String location) {
if (location.equals("Paris")) {
return new WeatherForecast("Paris", "Sunny", 20);
} else if (location.equals("London")) {
return new WeatherForecast("London", "Rainy", 15);
} else {
return new WeatherForecast("Unknown", "Unknown", 0);
}
}
}
Note that this class contains a single function, annotated with the @Tool annotation, which corresponds to the description of the function the model can request to call.
The parameter of the function (a single one here) is also annotated, with the short @P annotation, which gives a description of the parameter. You could add as many functions as you want, to make them available to the model, for more complex scenarios.
In this class, you return some canned responses, but if you wanted to call a real external weather forecast service, it is in the body of that method that you would make the call to that service.
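For illustration only, here is a rough sketch of what such a call could look like using the JDK's java.net.http.HttpClient. The weather endpoint URL, its JSON format, and the parseForecast() helper are made up for this sketch:
@Tool("Get the weather forecast for a location")
WeatherForecast getForecast(@P("Location to get the forecast for") String location) {
    try {
        // Hypothetical HTTP endpoint; java.net, java.net.http and java.nio.charset imports assumed.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://example.com/weather?location=" +
                URLEncoder.encode(location, StandardCharsets.UTF_8)))
            .build();
        HttpResponse<String> httpResponse =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        // parseForecast() is a hypothetical helper that maps the JSON body to a WeatherForecast.
        return parseForecast(httpResponse.body());
    } catch (Exception e) {
        return new WeatherForecast(location, "Unknown", 0);
    }
}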
As we saw when you created a ToolSpecification in the previous approach, it's important to document what a function does, and describe what the parameters correspond to. This helps the model understand how and when this function can be used.
Next, LangChain4j lets you provide an interface that corresponds to the contract you want to use to interact with the model. Here, it's a simple interface that takes in a string representing the user message, and returns a string corresponding to the model's response:
interface WeatherAssistant {
String chat(String userMessage);
}
It is also possible to use more complex signatures that involve LangChain4j's UserMessage (for a user message) or AiMessage (for a model response), or even a TokenStream, if you want to handle more advanced situations, as those more complex objects also carry extra information such as the number of tokens consumed. But for simplicity's sake, we'll just take a string as input and return a string as output.
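For instance, a streaming variant of the assistant could return a TokenStream instead of a String. The sketch below is not part of the codelab and simply streams a plain chat answer with the streaming model seen earlier (whether tools can be combined with streaming depends on the LangChain4j version):
interface StreamingWeatherAssistant {
    TokenStream chat(String userMessage);
}

StreamingChatLanguageModel streamingModel = VertexAiGeminiStreamingChatModel.builder()
    .project(System.getenv("PROJECT_ID"))
    .location(System.getenv("LOCATION"))
    .modelName("gemini-1.5-flash-001")
    .build();

StreamingWeatherAssistant streamingAssistant = AiServices.builder(StreamingWeatherAssistant.class)
    .streamingChatLanguageModel(streamingModel)
    .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
    .build();

streamingAssistant.chat("Describe today's weather in general terms.")
    .onNext(System.out::print)           // called for each new chunk of text
    .onError(Throwable::printStackTrace) // called if an error occurs
    .start();                            // triggers the streaming call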
Let's finish with the main() method that ties all the pieces together:
public static void main(String[] args) {
ChatLanguageModel model = VertexAiGeminiChatModel.builder()
.project(System.getenv("PROJECT_ID"))
.location(System.getenv("LOCATION"))
.modelName("gemini-1.5-flash-001")
.maxOutputTokens(100)
.build();
WeatherForecastService weatherForecastService = new WeatherForecastService();
WeatherAssistant assistant = AiServices.builder(WeatherAssistant.class)
.chatLanguageModel(model)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.tools(weatherForecastService)
.build();
System.out.println(assistant.chat("What is the weather in Paris?"));
}
As usual, you configure the Gemini chat model. Then you instantiate your weather forecast service that contains the "function" that the model will request us to call.
Now, you use the AiServices class again to bind the chat model, the chat memory, and the tool (i.e. the weather forecast service with its function). AiServices returns an object that implements the WeatherAssistant interface you defined. The only thing left is to call the chat() method of that assistant. When invoking it, you will only see the text responses: the function call requests and the function call responses are not visible to the developer, as those requests are handled automatically and transparently. If Gemini thinks a function should be called, it'll reply with a function call request, and LangChain4j will take care of calling the local function on your behalf.
Run the sample:
./gradlew run -q -DjavaMainClass=gemini.workshop.FunctionCallingAssistant
You should see an output similar to the following:
OK. The weather in Paris is sunny with a temperature of 20 degrees.
This was an example of a single function.
Multiple function calls
You can also have multiple functions and let LangChain4j handle multiple function calls on your behalf. Take a look at MultiFunctionCallingAssistant.java for a multiple function example.
It has a function to convert currencies:
@Tool("Convert amounts between two currencies")
double convertCurrency(
@P("Currency to convert from") String fromCurrency,
@P("Currency to convert to") String toCurrency,
@P("Amount to convert") double amount) {
double result = amount;
if (fromCurrency.equals("USD") && toCurrency.equals("EUR")) {
result = amount * 0.93;
} else if (fromCurrency.equals("USD") && toCurrency.equals("GBP")) {
result = amount * 0.79;
}
System.out.println(
"convertCurrency(fromCurrency = " + fromCurrency +
", toCurrency = " + toCurrency +
", amount = " + amount + ") == " + result);
return result;
}
Another function to get the value of a stock:
@Tool("Get the current value of a stock in US dollars")
double getStockPrice(@P("Stock symbol") String symbol) {
double result = 170.0 + 10 * new Random().nextDouble();
System.out.println("getStockPrice(symbol = " + symbol + ") == " + result);
return result;
}
Another function to apply a percentage to a given amount:
@Tool("Apply a percentage to a given amount")
double applyPercentage(@P("Initial amount") double amount, @P("Percentage between 0-100 to apply") double percentage) {
double result = amount * (percentage / 100);
System.out.println("applyPercentage(amount = " + amount + ", percentage = " + percentage + ") == " + result);
return result;
}
You can then combine all these functions in a MultiTools class and ask questions like "What is 10% of the AAPL stock price converted from USD to EUR?":
public static void main(String[] args) {
    ChatLanguageModel model = VertexAiGeminiChatModel.builder()
        .project(System.getenv("PROJECT_ID"))
        .location(System.getenv("LOCATION"))
        .modelName("gemini-1.5-flash-001")
        .maxOutputTokens(100)
        .build();

    MultiTools multiTools = new MultiTools();

    MultiToolsAssistant assistant = AiServices.builder(MultiToolsAssistant.class)
        .chatLanguageModel(model)
        .chatMemory(withMaxMessages(10))
        .tools(multiTools)
        .build();

    System.out.println(assistant.chat(
        "What is 10% of the AAPL stock price converted from USD to EUR?"));
}
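The MultiToolsAssistant interface used above is not reproduced in this section; it presumably follows the same pattern as WeatherAssistant, along these lines:
interface MultiToolsAssistant {
    String chat(String userMessage);
}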
Run it as follows:
./gradlew run -q -DjavaMainClass=gemini.workshop.MultiFunctionCallingAssistant
And you should see the multiple functions called:
getStockPrice(symbol = AAPL) == 172.8022224055534
convertCurrency(fromCurrency = USD, toCurrency = EUR, amount = 172.8022224055534) == 160.70606683716468
applyPercentage(amount = 160.70606683716468, percentage = 10.0) == 16.07060668371647
10% of the AAPL stock price converted from USD to EUR is 16.07060668371647 EUR.
Towards Agents
Function calling is a great extension mechanism for large language models like Gemini. It enables you to build more complex systems, often called "agents" or "AI assistants", which can interact with the external world via external APIs and with services that have side effects on their environment (like sending emails, creating tickets, etc.).
When creating such powerful agents, you should do so responsibly: consider keeping a human in the loop before taking automated actions, and keep safety in mind when designing LLM-powered agents that interact with the external world.
13. Running Gemma with Ollama and TestContainers
So far, we've been using Gemini, but there's also Gemma, its little sister model.
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Gemma comes in two variations, Gemma 1 and Gemma 2, each available in several sizes: Gemma 1 in 2B and 7B, and Gemma 2 in 9B and 27B. Their weights are freely available, and their small size means you can run them on your own, even on your laptop or in Cloud Shell.
How do you run Gemma?
There are many ways to run Gemma: in the cloud, via Vertex AI with a click of a button, or on GKE with some GPUs, but you can also run it locally.
One good option for running Gemma locally is Ollama, a tool that lets you run small models like Llama 2, Mistral, and many others on your local machine. It's similar to Docker, but for LLMs.
Install Ollama by following the instructions for your operating system.
If you are using a Linux environment, you will need to start Ollama after installing it:
ollama serve > /dev/null 2>&1 &
Once installed locally, you can run commands to pull a model:
ollama pull gemma:2b
Wait for the model to be pulled. This can take some time.
Run the model:
ollama run gemma:2b
Now, you can interact with the model:
>>> Hello!
Hello! It's nice to hear from you. What can I do for you today?
To exit the prompt, press Ctrl+D.
Running Gemma in Ollama on TestContainers
Instead of having to install and run Ollama locally, you can use Ollama within a container, handled by TestContainers.
TestContainers is not only useful for testing, but you can also use it for executing containers. There's even a specific OllamaContainer you can take advantage of!
Implementation
Let's have a look at GemmaWithOllamaContainer.java, piece by piece.
First, you need to create a derived Ollama container that pulls in the Gemma model. Either this image already exists from a previous run, or it will be created. If the image already exists, you just tell TestContainers that you want to substitute the default Ollama image with your Gemma-powered variant:
private static final String TC_OLLAMA_GEMMA_2_B = "tc-ollama-gemma-2b";

// Creating an Ollama container with Gemma 2B if it doesn't exist.
private static OllamaContainer createGemmaOllamaContainer() throws IOException, InterruptedException {
    // Check if the custom Gemma Ollama image exists already
    List<Image> listImagesCmd = DockerClientFactory.lazyClient()
        .listImagesCmd()
        .withImageNameFilter(TC_OLLAMA_GEMMA_2_B)
        .exec();

    if (listImagesCmd.isEmpty()) {
        System.out.println("Creating a new Ollama container with Gemma 2B image...");
        OllamaContainer ollama = new OllamaContainer("ollama/ollama:0.1.26");
        ollama.start();
        ollama.execInContainer("ollama", "pull", "gemma:2b");
        ollama.commitToImage(TC_OLLAMA_GEMMA_2_B);
        return ollama;
    } else {
        System.out.println("Using existing Ollama container with Gemma 2B image...");
        // Substitute the default Ollama image with our Gemma variant
        return new OllamaContainer(
            DockerImageName.parse(TC_OLLAMA_GEMMA_2_B)
                .asCompatibleSubstituteFor("ollama/ollama"));
    }
}
Next, you create and start an Ollama test container, and then create an Ollama chat model by pointing at the address and port of the container with the model you want to use. Finally, you just invoke model.generate(yourPrompt) as usual:
public static void main(String[] args) throws IOException, InterruptedException {
    OllamaContainer ollama = createGemmaOllamaContainer();
    ollama.start();

    ChatLanguageModel model = OllamaChatModel.builder()
        .baseUrl(String.format("http://%s:%d", ollama.getHost(), ollama.getFirstMappedPort()))
        .modelName("gemma:2b")
        .build();

    String response = model.generate("Why is the sky blue?");
    System.out.println(response);
}
Run it as follows:
./gradlew run -q -DjavaMainClass=gemini.workshop.GemmaWithOllamaContainer
The first run will take a while to create and start the container, but once done, you should see Gemma respond:
INFO: Container ollama/ollama:0.1.26 started in PT2.827064047S
The sky appears blue due to Rayleigh scattering. Rayleigh scattering is a phenomenon that occurs when sunlight interacts with molecules in the Earth's atmosphere.
* **Scattering particles:** The main scattering particles in the atmosphere are molecules of nitrogen (N2) and oxygen (O2).
* **Wavelength of light:** Blue light has a shorter wavelength than other colors of light, such as red and yellow.
* **Scattering process:** When blue light interacts with these molecules, it is scattered in all directions.
* **Human eyes:** Our eyes are more sensitive to blue light than other colors, so we perceive the sky as blue.
This scattering process results in a blue appearance for the sky, even though the sun is actually emitting light of all colors.
In addition to Rayleigh scattering, other atmospheric factors can also influence the color of the sky, such as dust particles, aerosols, and clouds.
You have Gemma running in Cloud Shell!
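Since OllamaChatModel implements the same ChatLanguageModel interface as the Vertex AI Gemini model, you could in principle reuse the AiServices pattern from the earlier sections with the local Gemma model. Here is a minimal, untested sketch (no tools, just chat memory); the class and interface names are made up for the example, and it assumes an Ollama server with gemma:2b is reachable on the given address:
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.service.AiServices;

public class GemmaAssistantExample {

    interface GemmaAssistant {
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        // Assumes Ollama is listening locally on its default port,
        // or use the host/port exposed by the OllamaContainer above
        ChatLanguageModel model = OllamaChatModel.builder()
            .baseUrl("http://localhost:11434")
            .modelName("gemma:2b")
            .build();

        GemmaAssistant assistant = AiServices.builder(GemmaAssistant.class)
            .chatLanguageModel(model)
            .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
            .build();

        System.out.println(assistant.chat("Give me three fun facts about the color blue."));
    }
}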
14. Congratulations
Congratulations, you've successfully built your first Generative AI chat application in Java using LangChain4j and the Gemini API! Along the way, you discovered that multimodal large language models are pretty powerful and capable of handling various tasks like question answering (even on your own documentation), data extraction, interacting with external APIs, and more.
What's next?
It's your turn to enhance your applications with powerful LLM integrations!
Further reading
- Generative AI common use cases
- Training resources on Generative AI
- Interact with Gemini through Generative AI Studio
- Responsible AI