Deterministic generative AI with Gemini function calling in Java

1. Introduction

Generative AI models are remarkable at understanding and responding to natural language. But what if you need precise, predictable outputs for critical tasks like address standardization? Traditional generative models can sometimes provide different responses at different times for the same prompts, potentially leading to inconsistencies. That's where Gemini's Function Calling capability shines, allowing you to deterministically control elements of the AI's response.

This codelab illustrates the concept with an address completion and standardization use case. You will build a Java Cloud Function that performs the following tasks:

  1. Takes latitude and longitude coordinates.
  2. Calls the Google Maps Geocoding API to get the corresponding addresses.
  3. Uses the Gemini 1.0 Pro function calling feature to deterministically standardize and summarize those addresses in the specific format that we need.

Let's dive in!

2. Gemini function calling

Gemini Function Calling stands out in the Generative AI era because it lets you blend the flexibility of generative language models with the precision of traditional programming.

Here are the tasks that you need to complete to implement Gemini function calling:

  1. Define functions: describe the functions clearly. The descriptions must include the following information:
  • The name of the function, such as getAddress.
  • The parameters that the function expects, such as latlng as a string.
  • The type of data the function returns, such as a list of address strings.
  2. Create tools for Gemini: package the function descriptions, in the form of an API specification, into tools. Think of a tool as a specialized toolbox that Gemini can use to understand the functionality of the API.
  3. Orchestrate APIs using Gemini: when you send a prompt to Gemini, it can analyze your request and recognize where it can use the tools you've provided. Gemini then acts as a smart orchestrator by performing the following tasks:
  • Generates the necessary API parameters to call your defined functions. Gemini doesn't call the API on your behalf. You must call the API based on the parameters and signature that Gemini function calling has generated for you.
  • Processes the results: you feed the output of your API calls back into Gemini's generation, and Gemini incorporates the structured information into its final response. You can process this information in whatever way your application requires.
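
The division of labor described above can be sketched in plain Java, without the Vertex AI SDK. In this minimal sketch, the FunctionCallDispatcher and ProposedCall classes are hypothetical names introduced only for illustration, and the geocoding lookup is stubbed. It shows just one point: the model proposes a function name and arguments, and the application decides whether and how to execute the call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java sketch of the function calling hand-off; no Vertex AI SDK is involved. */
final class FunctionCallDispatcher {

    /** Hypothetical stand-in for the function call that the model proposes. */
    static final class ProposedCall {
        final String name;
        final Map<String, String> args;

        ProposedCall(String name, Map<String, String> args) {
            this.name = name;
            this.args = args;
        }
    }

    // Registry of the functions that the application itself knows how to execute.
    private final Map<String, Function<Map<String, String>, String>> handlers = new HashMap<>();

    void register(String name, Function<Map<String, String>, String> handler) {
        handlers.put(name, handler);
    }

    /** Executes the proposed call in application code; the model never does this itself. */
    String dispatch(ProposedCall call) {
        Function<Map<String, String>, String> handler = handlers.get(call.name);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown function: " + call.name);
        }
        return handler.apply(call.args);
    }

    public static void main(String[] args) {
        FunctionCallDispatcher dispatcher = new FunctionCallDispatcher();
        // Stubbed handler: a real one would call the Reverse Geocoding API.
        dispatcher.register("getAddress", a -> "stubbed address for " + a.get("latlng"));

        ProposedCall call =
            new ProposedCall("getAddress", Map.of("latlng", "40.714224,-73.961452"));
        System.out.println(dispatcher.dispatch(call));
    }
}
```

Rejecting unknown function names keeps the application in control: only functions that you registered explicitly can ever run, whatever the model proposes.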

The following image shows the flow of data, steps involved in the implementation, and the owner for each step, such as application, LLM, or API:

b9a39f55567072d3.png

What you'll build

You'll create and deploy a Java Cloud Function that does the following:

  • Takes latitude and longitude coordinates.
  • Calls the Google Maps Geocoding API to get the corresponding addresses.
  • Uses the Gemini 1.0 Pro function calling feature to deterministically standardize and summarize those addresses in a specific format.

3. Requirements

  • A browser, such as Chrome or Firefox.
  • A Google Cloud project with billing enabled.

4. Before you begin

  1. In the Google Cloud Console, on the project selector page, select or create a Google Cloud project.
  2. Ensure that billing is enabled for your Google Cloud project. Learn how to check if billing is enabled on a project.
  3. Activate Cloud Shell from Google Cloud console. For more information, see Use Cloud Shell.
  4. If your project is not set, use the following command to set your project:
gcloud config set project <YOUR_PROJECT_ID>
  5. In the Cloud Shell, set the following environment variables:
export GCP_PROJECT=<YOUR_PROJECT_ID>
export GCP_REGION=us-central1
  6. Enable the necessary Google Cloud APIs by executing the following commands in the Cloud Shell:
gcloud services enable cloudbuild.googleapis.com cloudfunctions.googleapis.com run.googleapis.com logging.googleapis.com storage-component.googleapis.com cloudaicompanion.googleapis.com aiplatform.googleapis.com
  7. Open the Cloud Shell Editor, click Extensions and then install the Gemini + Google Cloud Code extension.

5. Implement the Cloud Function

  1. Launch the Cloud Shell Editor.
  2. Click Cloud Code and then expand the Cloud Functions section.
  3. Click the Create Function (+) icon.
  4. In the Create New Application dialog, select the Java: Hello World option.
  5. Provide a name for the project in the project path, such as GeminiFunctionCalling.
  6. Click Explorer to view the project structure and then open the pom.xml file. The following image shows the project structure:

bdf07515f413dd9e.png

  7. Add the necessary dependencies within the <dependencies>... </dependencies> tag in the pom.xml file. You can access the entire pom.xml from this project's GitHub repository. Copy the pom.xml from there into the pom.xml file of your current project.
  8. Copy the HelloWorld.java class from the GeminiFunctionCalling GitHub link. You must update API_KEY and project_id with your Geocoding API key and Google Cloud project ID respectively.

6. Understand function calling by using the HelloWorld.java class

Prompt input

In this example, the following is the input prompt: What's the address for the latlong value '40.714224,-73.961452'?

The following is the code snippet corresponding to the input prompt in the file:

String promptText = "What's the address for the latlong value '" + latlngString + "'?"; //40.714224,-73.961452

API specification

The Reverse Geocoding API is used in this example. The following is the API specification:

/* Declare the function for the model to call (backed by the Reverse Geocoding API) */
FunctionDeclaration functionDeclaration =
    FunctionDeclaration.newBuilder()
        .setName("getAddress")
        .setDescription("Get the address for the given latitude and longitude value.")
        .setParameters(
            Schema.newBuilder()
                .setType(Type.OBJECT)
                .putProperties(
                    "latlng",
                    Schema.newBuilder()
                        .setType(Type.STRING)
                        .setDescription("This must be a string of latitude and longitude coordinates separated by comma")
                        .build())
                .addRequired("latlng")
                .build())
        .build();

Orchestrate the prompt with Gemini

The prompt input and the API specification are sent to Gemini:

// Add the function to a "tool"
Tool tool = Tool.newBuilder()
    .addFunctionDeclarations(functionDeclaration)
    .build();

// Invoke the Gemini model with the tool to generate the API parameters from the prompt input.
GenerativeModel model = GenerativeModel.newBuilder()
    .setModelName(modelName)
    .setVertexAi(vertexAI)
    .setTools(Arrays.asList(tool))
    .build();
GenerateContentResponse response = model.generateContent(promptText);
Content responseJSONCnt = response.getCandidates(0).getContent();

The response contains the function call, with the parameters that Gemini generated for the API. Here is an example output:

role: "model"
parts {
 function_call {
   name: "getAddress"
   args {
     fields {
       key: "latlng"
       value {
         string_value: "40.714224,-73.961452"
       }
     }
   }
 }
}

Pass the following parameter to the Reverse Geocoding API: "latlng=40.714224,-73.961452"

Match the orchestrated result to the format "latlng=VALUE".
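
Because the model generates the latlng argument, it can be worth validating the value before it goes into a URL. The following is a minimal sketch that assumes the latlng string has already been read out of the function_call args. The GeocodingParams class and its toQueryParam method are hypothetical names and are not part of the codelab's HelloWorld.java.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical helper that turns a model-generated "lat,lng" string into a query parameter. */
final class GeocodingParams {
    // Two signed decimal numbers separated by a comma, for example "40.714224,-73.961452".
    private static final Pattern LAT_LNG =
        Pattern.compile("-?\\d+(\\.\\d+)?\\s*,\\s*-?\\d+(\\.\\d+)?");

    /** Returns "latlng=VALUE" with VALUE validated, stripped of whitespace, and URL-encoded. */
    static String toQueryParam(String latlng) {
        if (latlng == null || !LAT_LNG.matcher(latlng.trim()).matches()) {
            throw new IllegalArgumentException("Expected \"lat,lng\" but got: " + latlng);
        }
        String compact = latlng.replaceAll("\\s+", "");
        return "latlng=" + URLEncoder.encode(compact, StandardCharsets.UTF_8);
    }
}
```

URL encoding turns the comma into %2C, which is the standard encoding for a comma inside a query string. The sketch checks only the shape of the value, not whether the coordinates fall within valid latitude and longitude ranges.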

Invoke the API

The following is the section of the code that invokes the API:

// Create a request. For the URL to be valid, params must begin with "&", for example "&latlng=VALUE".
String url = API_STRING + "?key=" + API_KEY + params;
java.net.http.HttpRequest request = java.net.http.HttpRequest.newBuilder()
    .uri(URI.create(url))
    .GET()
    .build();
// Send the request and get the response
java.net.http.HttpResponse<String> httpresponse = client.send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
// Save the response body, which is already a String
String jsonResult = httpresponse.body();

The string jsonResult holds the response from the Reverse Geocoding API. The following is a formatted version of the output:

"...277 Bedford Ave, Brooklyn, NY 11211, USA; 279 Bedford Ave, Brooklyn, NY 11211, USA; 277 Bedford Ave, Brooklyn, NY 11211, USA;..."

Process the API response and prepare the prompt

The following code processes the response from the API and prepares the prompt with instructions on how to process the response:

String promptString =
    "You are an AI address standardizer for assisting with standardizing addresses accurately. Your job is to give the accurate address in the standard format as a JSON object containing the fields DOOR_NUMBER, STREET_ADDRESS, AREA, CITY, TOWN, COUNTY, STATE, COUNTRY, ZIPCODE, LANDMARK by leveraging the address string that follows in the end. Remember the response cannot be empty or null. ";

// Provide an answer to the model so that it knows what the result
// of a "function call" is.
Content content =
    ContentMaker.fromMultiModalData(
        PartMaker.fromFunctionResponse(
            "getAddress",
            Collections.singletonMap("address", formattedAddress)));

// Extract the address from the text form of the content: the value between
// the first string_value: " marker and the next double quote.
String contentString = content.toString();
String marker = "string_value: \"";
int start = contentString.indexOf(marker) + marker.length();
String address = contentString.substring(start, contentString.indexOf('"', start));

// Relax the blocking thresholds for two harm categories.
List<SafetySetting> safetySettings = Arrays.asList(
    SafetySetting.newBuilder()
        .setCategory(HarmCategory.HARM_CATEGORY_HATE_SPEECH)
        .setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH)
        .build(),
    SafetySetting.newBuilder()
        .setCategory(HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT)
        .setThreshold(SafetySetting.HarmBlockThreshold.BLOCK_ONLY_HIGH)
        .build());
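
The snippet above uses a formattedAddress value whose derivation from jsonResult is not shown. The following is a minimal sketch of one way to build such a string. It assumes that the Geocoding API response carries each address in a formatted_address field, and it uses a regular expression only to stay dependency-free. The FormattedAddressExtractor class is a hypothetical name, and a JSON library such as Gson or Jackson would be a more robust choice in a real function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that pulls formatted_address values out of a Geocoding API response. */
final class FormattedAddressExtractor {
    // Matches "formatted_address" : "..." and captures the raw string value.
    // Escape sequences inside the value are allowed but are not unescaped.
    private static final Pattern FORMATTED_ADDRESS =
        Pattern.compile("\"formatted_address\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Collects every formatted_address value, in order of appearance. */
    static List<String> extract(String jsonResult) {
        List<String> addresses = new ArrayList<>();
        Matcher matcher = FORMATTED_ADDRESS.matcher(jsonResult);
        while (matcher.find()) {
            addresses.add(matcher.group(1));
        }
        return addresses;
    }

    /** Joins the addresses with "; " to form a single string for the follow-up prompt. */
    static String join(String jsonResult) {
        return String.join("; ", extract(jsonResult));
    }
}
```

Calling FormattedAddressExtractor.join(jsonResult) would yield a semicolon-separated string similar to the formatted output shown earlier, which can then serve as the formattedAddress value.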

Invoke Gemini and return the standardized address

The following code passes the processed output from the previous step as a prompt to Gemini:

GenerativeModel modelForFinalResponse = GenerativeModel.newBuilder()
    .setModelName(modelName)
    .setVertexAi(vertexAI)
    .build();
GenerateContentResponse finalResponse = modelForFinalResponse.generateContent(promptString + ": " + address, safetySettings);
System.out.println("promptString + content: " + promptString + ": " + address);
// See what the model replies now
System.out.println("Print response: ");
System.out.println(finalResponse.toString());
String finalAnswer = ResponseHandler.getText(finalResponse);
System.out.println(finalAnswer);

The finalAnswer variable has the standardized address in JSON format. The following is an example output:

{"replies":["{ \"DOOR_NUMBER\": null, \"STREET_ADDRESS\": \"277 Bedford Ave\", \"AREA\": \"Brooklyn\", \"CITY\": \"New York\", \"TOWN\": null, \"COUNTY\": null, \"STATE\": \"NY\", \"COUNTRY\": \"USA\", \"ZIPCODE\": \"11211\", \"LANDMARK\": null} null}"]}
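
Model replies are free text, so the JSON object can arrive wrapped in Markdown fences or followed by stray characters. The following is a minimal sketch of a clean-up step that you could apply to finalAnswer before passing it to downstream code. The ModelReplyCleaner class is a hypothetical name and is not part of the codelab's HelloWorld.java.

```java
/** Hypothetical helper that isolates the JSON object inside a model reply. */
final class ModelReplyCleaner {
    /**
     * Returns the text from the first '{' to the last '}' in the reply,
     * or null if no such span exists. This drops surrounding Markdown fences
     * or commentary but does not validate that the span is well-formed JSON.
     */
    static String extractJsonObject(String reply) {
        if (reply == null) {
            return null;
        }
        int start = reply.indexOf('{');
        int end = reply.lastIndexOf('}');
        if (start < 0 || end < start) {
            return null;
        }
        return reply.substring(start, end + 1);
    }
}
```

Because the span is not validated, it is still advisable to parse the result with a JSON library and handle parse failures, for example by retrying the request.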

Now that you have understood how Gemini Function Calling works with the address standardization use case, you can go ahead and deploy the Cloud Function.

7. Deploy and test

  1. If you have already created the GeminiFunctionCalling project and implemented the Cloud Function, proceed to step 2. If you haven't created the project, go to the Cloud Shell terminal and clone this repo: git clone https://github.com/AbiramiSukumaran/GeminiFunctionCalling
  2. Navigate to the project folder: cd GeminiFunctionCalling
  3. Run the following command to build and deploy the Cloud Function:
gcloud functions deploy gemini-fn-calling --gen2 --region=us-central1 --runtime=java11 --source=. --entry-point=cloudcode.helloworld.HelloWorld --trigger-http

The following is the URL format after deployment: https://us-central1-YOUR_PROJECT_ID.cloudfunctions.net/gemini-fn-calling

  4. Test the Cloud Function by running the following command from the terminal:
gcloud functions call gemini-fn-calling --region=us-central1 --gen2 --data '{"calls":[["40.714224,-73.961452"]]}'

The following is a response for a random sample prompt: '{"replies":["{ "DOOR_NUMBER": "277", "STREET_ADDRESS": "Bedford Ave", "AREA": null, "CITY": "Brooklyn", "TOWN": null, "COUNTY": "Kings County", "STATE": "NY", "COUNTRY": "USA", "ZIPCODE": "11211", "LANDMARK": null}}```"]}'
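
The same test call can also be made from Java code. The following is a minimal sketch that only builds the HTTP request, using the URL format and the payload shape shown above. The FunctionRequestBuilder class is a hypothetical name. The identity token parameter reflects an assumption that your deployment requires authenticated calls, and the helper assumes that the coordinate string contains no characters that need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds a test request for the deployed function. */
final class FunctionRequestBuilder {
    /** Builds the JSON body in the {"calls":[["lat,lng"]]} shape used by the test command. */
    static String body(String latlng) {
        return "{\"calls\":[[\"" + latlng + "\"]]}";
    }

    /**
     * Builds a POST request for the function URL. The identityToken may be null
     * for deployments that allow unauthenticated calls.
     */
    static HttpRequest build(String functionUrl, String latlng, String identityToken) {
        HttpRequest.Builder builder = HttpRequest.newBuilder()
            .uri(URI.create(functionUrl))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body(latlng)));
        if (identityToken != null) {
            builder.header("Authorization", "Bearer " + identityToken);
        }
        return builder.build();
    }
}
```

To send the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) and read the body of the returned response.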

8. Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this codelab, follow these steps:

  1. In the Google Cloud console, go to the Manage resources page.
  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.
  4. If you want to keep your project, skip the preceding steps and delete only the Cloud Function: go to Cloud Functions, select the function that you want to delete from the list of functions, and then click DELETE.

9. Congratulations

Congratulations! You have successfully used the Gemini function calling feature in a Java application and transformed a generative AI task into a deterministic, reliable process. To learn more about available models, see Vertex AI LLM product documentation.