Generative AI powered chat with users and docs in Java with PaLM and LangChain4J

About this codelab

Last updated Feb 5, 2024

Written by Guillaume Laforge

1. Introduction

What is Generative AI

Generative AI or generative artificial intelligence refers to the use of AI to create new content, like text, images, music, audio, and videos.

Generative AI is powered by foundation models (large AI models) that can multi-task and perform out-of-the-box tasks, including summarization, Q&A, classification, and more. Plus, with minimal training required, foundation models can be adapted for targeted use cases with very little example data.

How does Generative AI work?

Generative AI works by using an ML (Machine Learning) model to learn the patterns and relationships in a dataset of human-created content. It then uses the learned patterns to generate new content.

The most common way to train a generative AI model is to use supervised learning — the model is given a set of human-created content and corresponding labels. It then learns to generate content that is similar to the human-created content and labeled with the same labels.

What are common Generative AI applications?

Generative AI processes vast content, creating insights and answers via text, images, and user-friendly formats. Generative AI can be used to:

Improve customer interactions through enhanced chat and search experiences
Explore vast amounts of unstructured data through conversational interfaces and summarizations
Assist with repetitive tasks like replying to requests for proposals (RFPs), localizing marketing content in five languages, checking customer contracts for compliance, and more

What Generative AI offerings does Google Cloud have?

With Vertex AI, interact with, customize, and embed foundation models into your applications — little to no ML expertise required. Access foundation models on Model Garden, tune models via a simple UI on Generative AI Studio, or use models in a data science notebook.

Vertex AI Search and Conversation offers developers the fastest way to build generative AI powered search engines and chatbots.

And, Duet AI is your AI-powered collaborator available across Google Cloud and IDEs to help you get more done, faster.

What is this codelab focusing on?

This codelab focuses on the PaLM 2 Large Language Model (LLM), hosted on Vertex AI, the Google Cloud platform that encompasses all of its machine learning products and services.

You will use Java to interact with the PaLM API, in conjunction with the LangChain4J LLM orchestration framework. You'll go through different concrete examples to take advantage of the LLM for question answering, idea generation, entity and structured content extraction, and summarization.

Tell me more about the LangChain4J framework!

The LangChain4J framework is an open source library for integrating large language models in your Java applications, by orchestrating various components, such as the LLM itself, but also other tools like vector databases (for semantic searches), document loaders and splitters (to analyze documents and learn from them), output parsers, and more.

From the GitHub project page:

The goal of this project is to simplify the integration of AI/LLM capabilities into your Java application.

This can be achieved thanks to:

A simple and coherent layer of abstractions, designed to ensure that your code does not depend on concrete implementations such as LLM providers, embedding store providers, etc. This allows for easy swapping of components.
Numerous implementations of the above-mentioned abstractions, providing you with a variety of LLMs and embedding stores to choose from.
Range of in-demand features on top of LLMs, such as:
The capability to ingest your own data (documentation, codebase, etc.), allowing the LLM to act and respond based on your data.
Autonomous agents for delegating tasks (defined on the fly) to the LLM, which will strive to complete them.
Prompt templates to help you achieve the highest possible quality of LLM responses.
Memory to provide context to the LLM for your current and past conversations.
Structured outputs for receiving responses from the LLM with a desired structure as Java POJOs.
"AI Services" for declaratively defining complex AI behavior behind a simple API.
Chains to reduce the need for extensive boilerplate code in common use-cases.
Auto-moderation to ensure that all inputs and outputs to/from the LLM are not harmful.

What you'll learn

How to set up a Java project to use PaLM and LangChain4J
How to extract useful information from unstructured content (entity or keyword extraction, output in JSON)
How to create a conversation with your users
How to use the chat model to ask questions on your own documentation

What you'll need

Knowledge of the Java programming language
A Google Cloud project
A browser, such as Chrome or Firefox

2. Setup and requirements

Self-paced environment setup

Sign in to the Google Cloud Console and create a new project or reuse an existing one. If you don't already have a Gmail or Google Workspace account, you must create one.

The Project name is the display name for this project's participants. It is a character string not used by Google APIs. You can always update it.
The Project ID is unique across all Google Cloud projects and is immutable (cannot be changed after it has been set). The Cloud Console auto-generates a unique string; usually you don't care what it is. In most codelabs, you'll need to reference your Project ID (typically identified as PROJECT_ID). If you don't like the generated ID, you might generate another random one. Alternatively, you can try your own, and see if it's available. It can't be changed after this step and remains for the duration of the project.
For your information, there is a third value, a Project Number, which some APIs use. Learn more about all three of these values in the documentation.

Next, you'll need to enable billing in the Cloud Console to use Cloud resources/APIs. Running through this codelab won't cost much, if anything at all. To shut down resources to avoid incurring billing beyond this tutorial, you can delete the resources you created or delete the project. New Google Cloud users are eligible for the $300 USD Free Trial program.

Start Cloud Shell

While Google Cloud can be operated remotely from your laptop, in this codelab you will be using Cloud Shell, a command line environment running in the Cloud.

Activate Cloud Shell

From the Cloud Console, click Activate Cloud Shell.

If this is your first time starting Cloud Shell, you're presented with an intermediate screen describing what it is. If so, click Continue.

It should only take a few moments to provision and connect to Cloud Shell.

This virtual machine is loaded with all the development tools needed. It offers a persistent 5 GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this codelab can be done with a browser.

Once connected to Cloud Shell, you should see that you are authenticated and that the project is set to your project ID.

Run the following command in Cloud Shell to confirm that you are authenticated:

gcloud auth list

Command output

 Credentialed Accounts
ACTIVE  ACCOUNT
*       <my_account>@<my_domain.com>

To set the active account, run:
    $ gcloud config set account `ACCOUNT`

Run the following command in Cloud Shell to confirm that the gcloud command knows about your project:

gcloud config list project

Command output

[core]
project = <PROJECT_ID>

If it is not, you can set it with this command:

gcloud config set project <PROJECT_ID>

Command output

Updated property [core/project].

3. Preparing your development environment

In this codelab, you're going to use the Cloud Shell terminal and code editor to develop your Java programs.

Enable Vertex AI APIs

In the Google Cloud console, make sure your project name is displayed at the top of your Google Cloud console. If it's not, click Select a project to open the Project Selector, and select your intended project.
If you aren't in the Vertex AI portion of the Google Cloud console, do the following:
In Search, enter Vertex AI, then press Return.
In the search results, click Vertex AI. The Vertex AI dashboard appears.
Click Enable All Recommended APIs in the Vertex AI dashboard.

This will enable several APIs, but the most important one for the codelab is aiplatform.googleapis.com, which you can also enable from the command line, in the Cloud Shell terminal, by running the following command:

$ gcloud services enable aiplatform.googleapis.com

Creating the project structure with Gradle

To build your Java code examples, you'll be using the Gradle build tool and version 17 of Java. To set up your project with Gradle, in the Cloud Shell terminal, create a directory (here, palm-workshop), and run the gradle init command in that directory:

$ mkdir palm-workshop
$ cd palm-workshop

$ gradle init

Select type of project to generate:
  1: basic
  2: application
  3: library
  4: Gradle plugin
Enter selection (default: basic) [1..4] 2

Select implementation language:
  1: C++
  2: Groovy
  3: Java
  4: Kotlin
  5: Scala
  6: Swift
Enter selection (default: Java) [1..6] 3

Split functionality across multiple subprojects?:
  1: no - only one application project
  2: yes - application and library projects
Enter selection (default: no - only one application project) [1..2] 1

Select build script DSL:
  1: Groovy
  2: Kotlin
Enter selection (default: Groovy) [1..2] 1

Generate build using new APIs and behavior (some features may change in the next minor release)? (default: no) [yes, no] 

Select test framework:
  1: JUnit 4
  2: TestNG
  3: Spock
  4: JUnit Jupiter
Enter selection (default: JUnit Jupiter) [1..4] 4

Project name (default: palm-workshop): 
Source package (default: palm.workshop): 

> Task :init
Get more help with your project: https://docs.gradle.org/7.4/samples/sample_building_java_applications.html

BUILD SUCCESSFUL in 51s
2 actionable tasks: 2 executed

You will build an application (option 2), using the Java language (option 3), without subprojects (option 1), using the Groovy syntax for the build file (option 1), without the new build features (option no), and generating tests with JUnit Jupiter (option 4). For the project name you can use palm-workshop, and for the source package you can use palm.workshop.

The project structure will look as follows:

├── gradle 
│   └── ...
├── gradlew 
├── gradlew.bat 
├── settings.gradle 
└── app
    ├── build.gradle 
    └── src
        ├── main
        │   └── java 
        │       └── palm
        │           └── workshop
        │               └── App.java
        └── test
            └── ...

Let's update the app/build.gradle file to add some needed dependencies. You can remove the guava dependency if it is present, and replace it with the dependencies for the LangChain4J project, and the logging library to avoid nagging missing logger messages:

dependencies {
    // Use JUnit Jupiter for testing.
    testImplementation 'org.junit.jupiter:junit-jupiter:5.8.1'

    // Logging library
    implementation 'org.slf4j:slf4j-jdk14:2.0.9'

    // This dependency is used by the application.
    implementation 'dev.langchain4j:langchain4j-vertex-ai:0.24.0'
    implementation 'dev.langchain4j:langchain4j:0.24.0'
}

There are 2 dependencies for LangChain4J:

one on the core project,
and one for the dedicated Vertex AI module.

In order to use Java 17 for compiling and running our programs, add the following block below the plugins {} block:

java {
    toolchain {
        languageVersion = JavaLanguageVersion.of(17)
    }
}

One more change to make: update the application block of app/build.gradle, so that users can override, on the command line, the main class to run when invoking the build tool:

application {
    mainClass = providers.systemProperty('javaMainClass')
                         .orElse('palm.workshop.App')
}

To check that your build file is ready to run your application, you can run the default main class which prints a simple Hello World! message:

$ ./gradlew run -DjavaMainClass=palm.workshop.App

> Task :app:run
Hello World!

BUILD SUCCESSFUL in 3s
2 actionable tasks: 2 executed

Now you are ready to program with the PaLM large language text model, by using the LangChain4J project!

For reference, here's what the full app/build.gradle build file should look like now:

plugins {
    // Apply the application plugin to add support for building a CLI application in Java.
    id 'application'
}

java {
    toolchain {
        // Ensure we compile and run on Java 17
        languageVersion = JavaLanguageVersion.of(17)
    }
}

repositories {
    // Use Maven Central for resolving dependencies.
    mavenCentral()
}

dependencies {
    // Use JUnit Jupiter for testing.
    testImplementation 'org.junit.jupiter:junit-jupiter:5.8.1'

    // This dependency is used by the application.
    implementation 'dev.langchain4j:langchain4j-vertex-ai:0.24.0'
    implementation 'dev.langchain4j:langchain4j:0.24.0'
    implementation 'org.slf4j:slf4j-jdk14:2.0.9'
}

application {
    mainClass = providers.systemProperty('javaMainClass').orElse('palm.workshop.App')
}

tasks.named('test') {
    // Use JUnit Platform for unit tests.
    useJUnitPlatform()
}

4. Making your first call to PaLM's chat model

Now that the project is properly set up, it is time to call the PaLM API.

Create a new class called ChatPrompts.java in the app/src/main/java/palm/workshop directory (alongside the default App.java class), and type the following content:

package palm.workshop;

import dev.langchain4j.model.vertexai.VertexAiChatModel;
import dev.langchain4j.chain.ConversationalChain;

public class ChatPrompts {
    public static void main(String[] args) {
        VertexAiChatModel model = VertexAiChatModel.builder()
            .endpoint("us-central1-aiplatform.googleapis.com:443")
            .project("YOUR_PROJECT_ID")
            .location("us-central1")
            .publisher("google")
            .modelName("chat-bison@001")
            .maxOutputTokens(400)
            .maxRetries(3)
            .build();

        ConversationalChain chain = ConversationalChain.builder()
            .chatLanguageModel(model)
            .build();

        String message = "What are large language models?";
        String answer = chain.execute(message);
        System.out.println(answer);

        System.out.println("---------------------------");

        message = "What can you do with them?";
        answer = chain.execute(message);
        System.out.println(answer);

        System.out.println("---------------------------");

        message = "Can you name some of them?";
        answer = chain.execute(message);
        System.out.println(answer);
    }
}

In this first example, you need to import the VertexAiChatModel class, and the LangChain4J ConversationalChain class, which makes it easier to handle the multi-turn aspect of conversations.

Next, in the main method, you're going to configure the chat language model, by using the builder for the VertexAiChatModel, to specify:

the endpoint,
the project,
the region,
the publisher,
and the name of the model (chat-bison@001).

Now that the language model is ready, you can prepare a ConversationalChain. This is a higher-level abstraction offered by LangChain4J to wire together the different components that handle a conversation, like the chat language model itself, but potentially also other components that handle the history of the chat conversation, or tools like retrievers that fetch information from vector databases. But don't worry, we'll come back to that later on in this codelab.

Then, you are going to have a multi-turn conversation with the chat model, asking several interrelated questions. First you wonder about LLMs, then you ask what you can do with them, and finally you ask for some examples of them. Notice how you don't have to repeat yourself: the LLM knows that "them" means LLMs, in the context of that conversation.

To take a turn in that multi-turn conversation, you just call the execute() method on the chain. It adds your message to the context of the conversation, then the chat model generates a reply, which is added to the chat history as well.
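
To make the role of the chat history more concrete, here is a minimal, library-free sketch of what happens on each turn. The ToyChain class and the Function standing in for the chat model are hypothetical illustrations, not LangChain4J APIs; the real ConversationalChain takes care of this bookkeeping for you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a conversational chain: NOT a LangChain4J class.
class ToyChain {
    private final List<String> history = new ArrayList<>();
    private final Function<List<String>, String> model;

    ToyChain(Function<List<String>, String> model) {
        this.model = model;
    }

    // Append the user message, let the "model" see the whole history,
    // then append its reply so that the next turn has the full context.
    String execute(String userMessage) {
        history.add("User: " + userMessage);
        String reply = model.apply(List.copyOf(history));
        history.add("AI: " + reply);
        return reply;
    }

    List<String> history() {
        return List.copyOf(history);
    }

    public static void main(String[] args) {
        // A fake model that only reports how many messages it was given.
        ToyChain chain = new ToyChain(messages -> "I can see " + messages.size() + " message(s)");
        System.out.println(chain.execute("What are large language models?"));
        System.out.println(chain.execute("What can you do with them?"));
    }
}
```

On the second turn the fake model receives three messages (the first question, the first reply, and the new question), which is why a follow-up question can refer back to "them".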

To run this class, run the following command in the Cloud Shell terminal:

./gradlew run -DjavaMainClass=palm.workshop.ChatPrompts

You should see an output similar to this one:

$ ./gradlew run -DjavaMainClass=palm.workshop.ChatPrompts
Starting a Gradle Daemon, 2 incompatible and 2 stopped Daemons could not be reused, use --status for details

> Task :app:run
Large language models (LLMs) are artificial neural networks that are trained on massive datasets of text and code. They are designed to understand and generate human language, and they can be used for a variety of tasks, such as machine translation, question answering, and text summarization.
---------------------------
LLMs can be used for a variety of tasks, such as:

* Machine translation: LLMs can be used to translate text from one language to another.
* Question answering: LLMs can be used to answer questions posed in natural language.
* Text summarization: LLMs can be used to summarize text into a shorter, more concise form.
* Code generation: LLMs can be used to generate code, such as Python or Java code.
* Creative writing: LLMs can be used to generate creative text, such as poems, stories, and scripts.

LLMs are still under development, but they have the potential to revolutionize a wide range of industries. For example, LLMs could be used to improve customer service, create more personalized marketing campaigns, and develop new products and services.
---------------------------
Some of the most well-known LLMs include:

* GPT-3: Developed by OpenAI, GPT-3 is a large language model that can generate text, translate languages, write different kinds of creative content, and answer your questions in an informative way.
* LaMDA: Developed by Google, LaMDA is a large language model that can chat with you in an open-ended way, answering your questions, telling stories, and providing different kinds of creative content.
* PaLM 2: Developed by Google, PaLM 2 is a large language model that can perform a wide range of tasks, including machine translation, question answering, and text summarization.
* T5: Developed by Google, T5 is a large language model that can be used for a variety of tasks, including text summarization, question answering, and code generation.

These are just a few examples of the many LLMs that are currently being developed. As LLMs continue to improve, they are likely to play an increasingly important role in our lives.

BUILD SUCCESSFUL in 25s
2 actionable tasks: 2 executed

PaLM replied to your 3 related questions!

The VertexAiChatModel builder lets you define optional parameters, which already have default values that you can override. Here are some examples:

.temperature(0.2): to define how creative you want the response to be (0 gives less creative, often more factual answers, while 1 gives more creative outputs)
.maxOutputTokens(50): to limit the length of the generated answer; in the example, 400 tokens were requested (4 tokens are roughly equivalent to 3 words)
.topK(20): to randomly select a word out of a maximum number of probable words for the text completion (from 1 to 40)
.topP(0.95): to select the possible words whose total probability adds up to that floating point number (between 0 and 1)
.maxRetries(3): in case you exceed the requests-per-time quota, you can have the model retry the call, 3 times for example
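
To give an intuition of how topK and topP narrow down the candidate words, here is a small sketch over a made-up probability distribution. The SamplingFilters class is hypothetical and only illustrates the idea; the actual filtering happens inside the model service, and its exact implementation is not described in this codelab.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: real models apply these filters internally, on the server side.
class SamplingFilters {

    // Keep at most topK of the most probable tokens, stopping early once
    // their cumulative probability reaches topP.
    static List<String> candidates(Map<String, Double> probabilities, int topK, double topP) {
        List<Map.Entry<String, Double>> sorted = new ArrayList<>(probabilities.entrySet());
        sorted.sort(Map.Entry.<String, Double>comparingByValue().reversed());

        List<String> kept = new ArrayList<>();
        double cumulative = 0.0;
        for (Map.Entry<String, Double> entry : sorted) {
            if (kept.size() >= topK || cumulative >= topP) {
                break;
            }
            kept.add(entry.getKey());
            cumulative += entry.getValue();
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up probabilities for the next word.
        Map<String, Double> next = new LinkedHashMap<>();
        next.put("cat", 0.5);
        next.put("dog", 0.3);
        next.put("fish", 0.15);
        next.put("rock", 0.05);
        System.out.println(candidates(next, 3, 0.95)); // limited by topK
        System.out.println(candidates(next, 40, 0.7)); // limited by topP
    }
}
```

The model then picks the next word at random among the remaining candidates, with the temperature controlling how strongly the most probable ones are favored.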

5. A useful chatbot with a personality!

In the previous section, you started right away asking questions to the LLM chatbot without giving it any particular context. But you can specialize such a chatbot to become an expert at a particular task, or on a particular topic.

How do you do that? By setting the stage: by explaining to the LLM the task at hand and the context, maybe giving a few examples of what it has to do, and telling it what persona it should have, in which format you'd like to get responses, and potentially which tone to use, if you want the chatbot to behave in a certain way.

This article on crafting prompts illustrates this approach nicely with this graphic:

https://medium.com/@eldatero/master-the-perfect-chatgpt-prompt-formula-c776adae8f19

To illustrate this point, let's get some inspiration from the prompts.chat website, which lists lots of great and fun ideas for custom-tailored chatbots that can act as:

an emoji translator — to translate user messages into emojis
a prompt enhancer — to create better prompts
a journal reviewer — to help review research papers
a personal stylist — to get clothing style suggestions

There's one example to turn an LLM chatbot into a chess player! Let's implement that!

Update the ChatPrompts class as follows:

package palm.workshop;

import dev.langchain4j.chain.ConversationalChain;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.vertexai.VertexAiChatModel;
import dev.langchain4j.store.memory.chat.InMemoryChatMemoryStore;

public class ChatPrompts {
    public static void main(String[] args) {
        VertexAiChatModel model = VertexAiChatModel.builder()
            .endpoint("us-central1-aiplatform.googleapis.com:443")
            .project("YOUR_PROJECT_ID")
            .location("us-central1")
            .publisher("google")
            .modelName("chat-bison@001")
            .maxOutputTokens(7)
            .maxRetries(3)
            .build();

        InMemoryChatMemoryStore chatMemoryStore = new InMemoryChatMemoryStore();

        MessageWindowChatMemory chatMemory = MessageWindowChatMemory.builder()
            .chatMemoryStore(chatMemoryStore)
            .maxMessages(200)
            .build();

        chatMemory.add(SystemMessage.from("""
            You're an expert chess player with a high ELO ranking.
            Use the PGN chess notation to reply with the best next possible move.
            """
        ));


        ConversationalChain chain = ConversationalChain.builder()
            .chatLanguageModel(model)
            .chatMemory(chatMemory)
            .build();

        String pgn = "";
        String[] whiteMoves = { "Nf3", "c4", "Nc3", "e3", "Dc2", "Cd5"};
        for (int i = 0; i < whiteMoves.length; i++) {
            pgn += " " + (i+1) + ". " + whiteMoves[i];
            System.out.println("Playing " + whiteMoves[i]);
            pgn = chain.execute(pgn);
            System.out.println(pgn);
        }
    }
}

Let's break it down step by step:

Some new imports are needed to handle the memory of the chat.
You instantiate the chat model, but with a small number of maximum tokens (here 7), as we just want to generate the next move, not a whole dissertation on chess!
Next, you create a chat memory store to save the chat conversations.
You create an actual windowed chat memory, to retain the last moves.
In the chat memory, you add a "system" message that instructs the chat model about who it is supposed to be (i.e., an expert chess player). The "system" message adds some context, whereas "user" and "AI" messages are the actual discussion.
You create a conversational chain that combines the memory and the chat model.
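Then, you iterate over a list of moves for white. The chain is executed with the next white move each time, and the chat model replies with the next best move. Note that the last two moves, Dc2 and Cd5, use French piece letters (D for dame, C for cavalier); in standard English PGN they would be written Qc2 and Nd5.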
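
To picture what a "windowed" chat memory does, here is a minimal sketch that keeps only the most recent messages. ToyWindowMemory is a hypothetical simplification, not the LangChain4J MessageWindowChatMemory: in particular, it treats every message alike and does not model any special handling of "system" messages.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical simplification of a windowed chat memory: NOT the LangChain4J implementation.
class ToyWindowMemory {
    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    ToyWindowMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    // Add a message and evict the oldest ones once the window is full.
    void add(String message) {
        messages.addLast(message);
        while (messages.size() > maxMessages) {
            messages.removeFirst();
        }
    }

    List<String> messages() {
        return List.copyOf(messages);
    }

    public static void main(String[] args) {
        ToyWindowMemory memory = new ToyWindowMemory(3);
        for (String move : new String[] {"1. Nf3", "1... e5", "2. c4", "2... Nc6"}) {
            memory.add(move);
        }
        // Only the 3 most recent messages remain: [1... e5, 2. c4, 2... Nc6]
        System.out.println(memory.messages());
    }
}
```

With a window of 200 messages, as in the chess example, a short game fits entirely in memory; the eviction only matters for long conversations.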

When you run this class with these moves, you should see the following output:

$ ./gradlew run -DjavaMainClass=palm.workshop.ChatPrompts
Starting a Gradle Daemon (subsequent builds will be faster)

> Task :app:run
Playing Nf3
1... e5
Playing c4
2... Nc6
Playing Nc3
3... Nf6
Playing e3
4... Bb4
Playing Dc2
5... O-O
Playing Cd5
6... exd5

Whoa! PaLM knows how to play chess? Well, not exactly, but during its training, the model must have seen some chess game commentaries, or even the PGN (Portable Game Notation) files of past games. This chatbot will likely not win against AlphaZero though (the AI that defeats the best Go, shogi, and chess players), and the conversation might derail further down the road, with the model not really remembering the actual state of the game.

Chat models are very powerful, and can create rich interactions with your users, and handle various contextual tasks. In the next section, we'll have a look at a useful task: extracting structured data from text.

6. Extracting information from unstructured text

In the previous section, you created conversations between a user and a chat language model. But with LangChain4J, you can also use a chat model to extract structured information from unstructured text.

Let's say you want to extract the name and age of a person, given a biography or description of that person. You can instruct the large language model to generate JSON data structures with a cleverly tweaked prompt (this is commonly called "prompt engineering").

You will update the ChatPrompts class as follows:

package palm.workshop;

import dev.langchain4j.model.vertexai.VertexAiChatModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.UserMessage;

public class ChatPrompts {

    static class Person {
        String name;
        int age;
    }

    interface PersonExtractor {
        @UserMessage("""
            Extract the name and age of the person described below.
            Return a JSON document with a "name" and an "age" property, \
            following this structure: {"name": "John Doe", "age": 34}
            Return only JSON, without any markdown markup surrounding it.
            Here is the document describing the person:
            ---
            {{it}}
            ---
            JSON: 
            """)
        Person extractPerson(String text);
    }

    public static void main(String[] args) {
        VertexAiChatModel model = VertexAiChatModel.builder()
            .endpoint("us-central1-aiplatform.googleapis.com:443")
            .project("YOUR_PROJECT_ID")
            .location("us-central1")
            .publisher("google")
            .modelName("chat-bison@001")
            .maxOutputTokens(300)
            .build();
        
        PersonExtractor extractor = AiServices.create(PersonExtractor.class, model);

        Person person = extractor.extractPerson("""
            Anna is a 23 year old artist based in Brooklyn, New York. She was born and 
            raised in the suburbs of Chicago, where she developed a love for art at a 
            young age. She attended the School of the Art Institute of Chicago, where 
            she studied painting and drawing. After graduating, she moved to New York 
            City to pursue her art career. Anna's work is inspired by her personal 
            experiences and observations of the world around her. She often uses bright 
            colors and bold lines to create vibrant and energetic paintings. Her work 
            has been exhibited in galleries and museums in New York City and Chicago.    
            """
        );

        System.out.println(person.name);
        System.out.println(person.age);
    }
}

Let's have a look at the various steps in this file:

A Person class is defined to represent the details describing a person (their name and age).
The PersonExtractor interface is created with a method that, given an unstructured text string, returns a Person instance.
The extractPerson() method is annotated with a @UserMessage annotation that associates a prompt with it. That's the prompt that the model will use to extract the information and return the details in the form of a JSON document, which will be parsed for you and unmarshalled into a Person instance.
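
To demystify the two steps that happen behind the scenes (filling in the {{it}} placeholder, then turning the JSON reply into an object), here is a hand-rolled sketch. The ToyPersonExtraction class, its render() and parse() helpers, and the regex-based parsing are all assumptions made for illustration; they are not how LangChain4J is implemented, and a real JSON library would be a better choice in production code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-rolled illustration only; LangChain4J performs these steps for you.
class ToyPersonExtraction {

    record Person(String name, int age) {}

    // Substitute the {{it}} placeholder with the method argument.
    static String render(String template, String text) {
        return template.replace("{{it}}", text);
    }

    // Naive parser for a flat {"name": "...", "age": N} document.
    static Person parse(String json) {
        Matcher name = Pattern.compile("\"name\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        Matcher age = Pattern.compile("\"age\"\\s*:\\s*(\\d+)").matcher(json);
        if (!name.find() || !age.find()) {
            throw new IllegalArgumentException("Unexpected JSON: " + json);
        }
        return new Person(name.group(1), Integer.parseInt(age.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(render("Describe: {{it}}", "Anna is 23."));
        // A reply shaped like the structure requested in the prompt.
        System.out.println(parse("{\"name\": \"Anna\", \"age\": 23}"));
    }
}
```

If the model wraps its reply in extra text or markup, a parser like this one fails, which is why the prompt explicitly asks for JSON only.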

Now let's look at the content of the main() method:

The chat model is instantiated.
A PersonExtractor object is created thanks to LangChain4J's AiServices class.
Then, you can simply call Person person = extractor.extractPerson(...) to extract the details of the person from the unstructured text, and get back a Person instance with the name and age.

Now, run this class with the following command:

$ ./gradlew run -DjavaMainClass=palm.workshop.ChatPrompts

> Task :app:run
Anna
23

Yes! This is Anna, she is 23!

What is of particular interest with this AiServices approach is that you operate with strongly typed objects. You are not interacting directly with the chat LLM. Instead, you are working with concrete types, like the Person class to represent the extracted personal information, and a PersonExtractor interface with an extractPerson() method which returns a Person instance. The notion of LLM is abstracted away, and as a Java developer, you are just manipulating normal classes and objects.

7. Retrieval Augmented Generation: chatting with your docs

Let's come back to conversations. This time, you will be able to ask questions about your documents. You will build a chatbot that is able to retrieve relevant information from a database of extracts of your documents, and that information will be used by the model to "ground" its answers, rather than trying to generate responses coming from its training. This pattern is called RAG, or Retrieval Augmented Generation.

In Retrieval Augmented Generation, in a nutshell, there are two phases:

Ingestion phase — Documents are loaded, split into smaller chunks, and a vectorial representation of them (a "vector embedding") is stored in a "vector database" that is capable of doing semantic searches.

Query phase — Users can now ask your chatbot questions about the documentation. The question will be transformed into a vector as well, and compared with all the other vectors in the database. The most similar vectors are usually semantically related, and are returned by the vector database. Then, the LLM is given the context of the conversation, the snippets of text that correspond to the vectors returned by the database, and it is asked to ground its answer by looking at those snippets.
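
The comparison step of the query phase can be sketched in a few lines. This example assumes cosine similarity as the measure of closeness between vectors (this codelab does not say which measure the vector database uses), and the ToyVectorSearch class, as well as the tiny 3-dimensional vectors, are made up for the illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative brute-force search; real vector databases use optimized indexes.
class ToyVectorSearch {

    // Cosine similarity: 1 for vectors pointing the same way, 0 for orthogonal ones.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the keys of the `limit` stored vectors most similar to the query.
    static List<String> mostSimilar(Map<String, double[]> store, double[] query, int limit) {
        return store.entrySet().stream()
            .sorted(Comparator.comparingDouble(
                (Map.Entry<String, double[]> e) -> cosine(e.getValue(), query)).reversed())
            .limit(limit)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up 3-dimensional "embeddings"; real ones have hundreds of dimensions.
        Map<String, double[]> store = Map.of(
            "attention mechanism", new double[] {0.9, 0.1, 0.0},
            "training schedule", new double[] {0.1, 0.9, 0.1},
            "cooking recipe", new double[] {0.0, 0.1, 0.9});
        double[] query = {0.8, 0.2, 0.1};
        // Prints [attention mechanism, training schedule]
        System.out.println(mostSimilar(store, query, 2));
    }
}
```

The text snippets attached to the best-ranked vectors are what gets passed to the LLM as grounding context.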

Preparing your documents

For this new demo, you will ask questions about the "transformer" neural network architecture, pioneered by Google, which underpins most modern large language models.

You can retrieve the research paper that described this architecture ("Attention is all you need"), by using the wget command to download the PDF from the internet:

wget -O attention-is-all-you-need.pdf \
    https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

Implementing a conversational retrieval chain

Let's explore, piece by piece, how to build the 2-phase approach, first with the document ingestion, and then the query time when users ask questions about the document.

Document ingestion

The very first step of the document ingestion phase is to locate the PDF file that you downloaded, and prepare a PdfDocumentParser to read it:

PdfDocumentParser pdfParser = new PdfDocumentParser();
Document document = pdfParser.parse(
    new FileInputStream(new File("/home/YOUR_USER_NAME/palm-workshop/attention-is-all-you-need.pdf")));

Before creating the usual chat language model, you'll first create an instance of an "embedding" model. This is a particular model and endpoint whose role is to create vector representations of pieces of text (words, sentences, or even paragraphs).

VertexAiEmbeddingModel embeddingModel = VertexAiEmbeddingModel.builder()
    .endpoint("us-central1-aiplatform.googleapis.com:443")
    .project("YOUR_PROJECT_ID")
    .location("us-central1")
    .publisher("google")
    .modelName("textembedding-gecko@001")
    .maxRetries(3)
    .build();

Next, you will need a few classes to work together to:

Load and split the PDF document in chunks.
Create vector embeddings for all of these chunks.

InMemoryEmbeddingStore<TextSegment> embeddingStore = 
    new InMemoryEmbeddingStore<>();

EmbeddingStoreIngestor storeIngestor = EmbeddingStoreIngestor.builder()
    .documentSplitter(DocumentSplitters.recursive(500, 100))
    .embeddingModel(embeddingModel)
    .embeddingStore(embeddingStore)
    .build();
storeIngestor.ingest(document);

An instance of InMemoryEmbeddingStore, an in-memory vector database, is created to store the vector embeddings.

The document is split into chunks by a splitter obtained from the DocumentSplitters class. It splits the text of the PDF file into snippets of 500 characters, with an overlap of 100 characters between consecutive chunks, so that words or sentences near a chunk boundary are less likely to be cut into bits and pieces.
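To make the two numbers more concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Java. It is not how DocumentSplitters.recursive() works internally (the library's splitter is more elaborate), and the OverlapChunker class and its chunk() method are hypothetical names used only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class OverlapChunker {

    // Splits text into windows of at most maxChars characters. Each window
    // starts (maxChars - overlap) characters after the previous one, so
    // consecutive windows share up to `overlap` characters.
    static List<String> chunk(String text, int maxChars, int overlap) {
        if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
            throw new IllegalArgumentException("need 0 <= overlap < maxChars");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChars - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Tiny numbers instead of 500/100 so the overlap is easy to see.
        for (String c : chunk("abcdefghij", 4, 2)) {
            System.out.println(c);
        }
    }
}
```

With a window of 4 and an overlap of 2, the sample prints abcd, cdef, efgh and ghij: every chunk repeats the last two characters of the previous one.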

The store "ingestor" links the document splitter, the embedding model to calculate the vectors, and the in-memory vector database. Then, the ingest() method will take care of doing the ingestion.

The first phase is now over: the document has been transformed into text chunks with their associated vector embeddings, and stored in the vector database.

Asking questions

It's time to get ready to ask questions! The usual chat model can be created to start the conversation:

VertexAiChatModel model = VertexAiChatModel.builder()
    .endpoint("us-central1-aiplatform.googleapis.com:443")
    .project("YOUR_PROJECT_ID")
    .location("us-central1")
    .publisher("google")
    .modelName("chat-bison@001")
    .maxOutputTokens(1000)
    .build();

You will also need a "retriever" class that will link the vector database (in the embeddingStore variable) and the embedding model. Its job is to query the vector database by computing a vector embedding for the user's query, to find similar vectors in the database:

EmbeddingStoreRetriever retriever = 
    EmbeddingStoreRetriever.from(embeddingStore, embeddingModel);
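As a rough illustration of what "finding similar vectors" means, the sketch below ranks a few stored chunks by cosine similarity to a query vector and keeps the best matches. This is only an assumption about the general technique, not LangChain4J's actual implementation: ToyRetriever, Entry and findRelevant() are hypothetical names, and the three-dimensional vectors are made up (real embeddings have hundreds of dimensions).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ToyRetriever {

    // One stored chunk: its text and its (made-up) embedding vector.
    static final class Entry {
        final String text;
        final double[] vector;

        Entry(String text, double[] vector) {
            this.text = text;
            this.vector = vector;
        }
    }

    // Cosine similarity: dot product divided by the product of the norms.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the texts of the k entries most similar to the query vector.
    static List<String> findRelevant(List<Entry> store, double[] query, int k) {
        List<Entry> sorted = new ArrayList<>(store);
        // Negate the score so that the highest similarity comes first.
        sorted.sort(Comparator.comparingDouble((Entry e) -> -cosine(e.vector, query)));
        List<String> result = new ArrayList<>();
        for (Entry e : sorted.subList(0, Math.min(k, sorted.size()))) {
            result.add(e.text);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Entry> store = List.of(
            new Entry("attention mechanisms", new double[] {0.9, 0.1, 0.0}),
            new Entry("training hardware", new double[] {0.0, 0.2, 0.9}),
            new Entry("encoder and decoder", new double[] {0.7, 0.6, 0.1}));
        double[] query = {1.0, 0.2, 0.0}; // stands in for the embedded question
        System.out.println(findRelevant(store, query, 2));
    }
}
```

The sample prints [attention mechanisms, encoder and decoder], the two chunks whose vectors point in nearly the same direction as the query.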

At this point, you can instantiate the ConversationalRetrievalChain class (this is just a different name for the Retrieval Augmented Generation pattern):

ConversationalRetrievalChain rag = ConversationalRetrievalChain.builder()
    .chatLanguageModel(model)
    .retriever(retriever)
    .promptTemplate(PromptTemplate.from("""
        Answer to the following query the best as you can: {{question}}
        Base your answer on the information provided below:
        {{information}}
        """
    ))
    .build();

This "chain" binds together:

The chat language model that you configured earlier.
The retriever, which compares the vector embedding of the query to the vectors in the database.
A prompt template that explicitly tells the chat model to base its response on the provided information (i.e. the relevant excerpts of the documentation whose vector embeddings are similar to the vector of the user's question).
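To visualize what the chat model ends up receiving, here is a small plain-Java sketch that fills the two placeholders of the template by hand. In the real chain, LangChain4J's PromptTemplate does this substitution for you. The PromptSketch class and its fill() method are hypothetical, and joining the snippets with a blank line is an assumption made for this example:

```java
import java.util.List;

class PromptSketch {

    // Same wording as the template passed to the chain above.
    static final String TEMPLATE = """
        Answer to the following query the best as you can: {{question}}
        Base your answer on the information provided below:
        {{information}}
        """;

    // Replaces the two placeholders with the question and the joined snippets.
    static String fill(String question, List<String> snippets) {
        return TEMPLATE
            .replace("{{question}}", question)
            .replace("{{information}}", String.join("\n\n", snippets));
    }

    public static void main(String[] args) {
        System.out.println(fill(
            "What is attention in large language models?",
            List.of("First retrieved snippet.", "Second retrieved snippet.")));
    }
}
```

The printed text is the grounded prompt: the user's question on the first line, followed by the retrieved excerpts the model is asked to rely on.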

And now you're finally ready to ask your questions!

String result = rag.execute("What neural network architecture can be used for language models?");
System.out.println(result);
System.out.println("------------");

result = rag.execute("What are the different components of a transformer neural network?");
System.out.println(result);
System.out.println("------------");

result = rag.execute("What is attention in large language models?");
System.out.println(result);
System.out.println("------------");

result = rag.execute("What is the name of the process that transforms text into vectors?");
System.out.println(result);

Run the program with:

$ ./gradlew run -DjavaMainClass=palm.workshop.ChatPrompts

In the output, you should see the answers to your questions:

The Transformer is a neural network architecture that can be used for 
language models. It is based solely on attention mechanisms, dispensing 
with recurrence and convolutions. The Transformer has been shown to 
outperform recurrent neural networks and convolutional neural networks on 
a variety of language modeling tasks.
------------
The Transformer is a neural network architecture that can be used for 
language models. It is based solely on attention mechanisms, dispensing 
with recurrence and convolutions. The Transformer has been shown to 
outperform recurrent neural networks and convolutional neural networks on a 
variety of language modeling tasks. The Transformer consists of an encoder 
and a decoder. The encoder is responsible for encoding the input sequence 
into a fixed-length vector representation. The decoder is responsible for 
decoding the output sequence from the input sequence. The decoder uses the 
attention mechanism to attend to different parts of the input sequence when 
generating the output sequence.
------------
Attention is a mechanism that allows a neural network to focus on specific 
parts of an input sequence. In the context of large language models, 
attention is used to allow the model to focus on specific words or phrases 
in a sentence when generating output. This allows the model to generate 
more relevant and informative output.
------------
The process of transforming text into vectors is called word embedding. 
Word embedding is a technique that represents words as vectors in a 
high-dimensional space. The vectors are typically learned from a large 
corpus of text, and they capture the semantic and syntactic relationships 
between words. Word embedding has been shown to be effective for a variety 
of natural language processing tasks, such as machine translation, question 
answering, and sentiment analysis.

The full solution

To facilitate copying and pasting, here's the full content of the ChatPrompts class:

package palm.workshop;

import dev.langchain4j.chain.ConversationalRetrievalChain;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.parser.PdfDocumentParser;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment; 
import dev.langchain4j.model.input.PromptTemplate;
import dev.langchain4j.model.vertexai.VertexAiChatModel;
import dev.langchain4j.model.vertexai.VertexAiEmbeddingModel;
import dev.langchain4j.retriever.EmbeddingStoreRetriever;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class ChatPrompts {
    public static void main(String[] args) throws IOException {
        PdfDocumentParser pdfParser = new PdfDocumentParser();
        Document document = pdfParser.parse(new FileInputStream(new File("/ABSOLUTE_PATH/attention-is-all-you-need.pdf")));

        VertexAiEmbeddingModel embeddingModel = VertexAiEmbeddingModel.builder()
            .endpoint("us-central1-aiplatform.googleapis.com:443")
            .project("YOUR_PROJECT_ID")
            .location("us-central1")
            .publisher("google")
            .modelName("textembedding-gecko@001")
            .maxRetries(3)
            .build();

        InMemoryEmbeddingStore<TextSegment> embeddingStore = 
            new InMemoryEmbeddingStore<>();

        EmbeddingStoreIngestor storeIngestor = EmbeddingStoreIngestor.builder()
            .documentSplitter(DocumentSplitters.recursive(500, 100))
            .embeddingModel(embeddingModel)
            .embeddingStore(embeddingStore)
            .build();
        storeIngestor.ingest(document);

        EmbeddingStoreRetriever retriever = EmbeddingStoreRetriever.from(embeddingStore, embeddingModel);

        VertexAiChatModel model = VertexAiChatModel.builder()
            .endpoint("us-central1-aiplatform.googleapis.com:443")
            .project("YOUR_PROJECT_ID")
            .location("us-central1")
            .publisher("google")
            .modelName("chat-bison@001")
            .maxOutputTokens(1000)
            .build();

        ConversationalRetrievalChain rag = ConversationalRetrievalChain.builder()
            .chatLanguageModel(model)
            .retriever(retriever)
            .promptTemplate(PromptTemplate.from("""
                Answer to the following query the best as you can: {{question}}
                Base your answer on the information provided below:
                {{information}}
                """
            ))
            .build();

        String result = rag.execute("What neural network architecture can be used for language models?");
        System.out.println(result);
        System.out.println("------------");

        result = rag.execute("What are the different components of a transformer neural network?");
        System.out.println(result);
        System.out.println("------------");

        result = rag.execute("What is attention in large language models?");
        System.out.println(result);
        System.out.println("------------");

        result = rag.execute("What is the name of the process that transforms text into vectors?");
        System.out.println(result);
    }
}

8. Congratulations

Congratulations, you've successfully built your first Generative AI chat application in Java using LangChain4J and the PaLM API! Along the way, you discovered that large language chat models are powerful and capable of handling various tasks, such as question answering (even on your own documentation) and data extraction, and that, to some extent, they can even play chess!

What's next?

Check out some of the following codelabs to go further with PaLM in Java:

Generative AI text generation with PaLM and LangChain4J

Reference docs
