Set Up a Basic OpenTelemetry Plugin in gRPC Java

1. Introduction

In this codelab, you'll take a simple gRPC client and server written in Java and instrument them with the gRPC OpenTelemetry plugin.

By the end of the tutorial, you will have a simple gRPC HelloWorld application instrumented with the gRPC OpenTelemetry plugin and be able to see the exported observability metrics in Prometheus.

What you'll learn

  • How to set up the OpenTelemetry plugin for an existing gRPC Java application
  • How to run a local Prometheus instance
  • How to export metrics to Prometheus
  • How to view metrics on the Prometheus dashboard

2. Before you begin

What you'll need

  • git
  • curl
  • JDK v8 or higher

Install the prerequisites:

sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install -y git curl

Get the code

To streamline your learning, this codelab offers a pre-built source code scaffold to help you get started. The following steps will guide you through instrumenting an application with the gRPC OpenTelemetry plugin.

grpc-codelabs

The scaffold source code for this codelab is available in this GitHub directory. If you prefer not to implement the code yourself, the completed source code is available in the completed directory.

First, clone the grpc-codelabs repo and cd into the grpc-java-opentelemetry folder:

git clone https://github.com/grpc-ecosystem/grpc-codelabs.git
cd grpc-codelabs/codelabs/grpc-java-opentelemetry/

Alternatively, you can download the .zip file containing only the codelab directory and manually unzip it.
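
For example, one way to do this (a sketch that fetches the entire repository archive, assuming the default branch is main, rather than only the codelab directory):

curl -L -o grpc-codelabs.zip https://github.com/grpc-ecosystem/grpc-codelabs/archive/refs/heads/main.zip
unzip grpc-codelabs.zip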

3. Register the OpenTelemetry Plugin

The gRPC OpenTelemetry plugin needs an application to instrument. In this codelab, we will use a simple gRPC HelloWorld client and server and instrument them with the plugin.

Your first step is to register the OpenTelemetry plugin, configured with a Prometheus exporter, in the client. Open codelabs/grpc-java-opentelemetry/start_here/src/main/java/io/grpc/codelabs/opentelemetry/OpenTelemetryClient.java in your favorite editor, then modify main to add code that sets up the gRPC Java OpenTelemetry API.
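
The snippets that follow assume these imports (the start_here scaffold may already include them):

import io.grpc.opentelemetry.GrpcOpenTelemetry;
import io.opentelemetry.exporter.prometheus.PrometheusHttpServer;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.metrics.SdkMeterProvider;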

Setup instrumentation on the client

Create Prometheus exporter

Create a PrometheusHttpServer to convert OpenTelemetry metrics to the Prometheus format and expose them via an HTTP server. The following code snippet creates a new Prometheus exporter.

// The default Prometheus port, `prometheusPort`, has been initialized to 9465.
PrometheusHttpServer prometheusExporter = PrometheusHttpServer.builder()
        .setPort(prometheusPort)
        .build();

Create OpenTelemetry SDK instance

Register the prometheusExporter created above as a MetricReader on an SdkMeterProvider, which is used to configure metric settings.

SdkMeterProvider sdkMeterProvider = SdkMeterProvider.builder()
        .registerMetricReader(prometheusExporter)
        .build();

Create an instance of OpenTelemetrySdk with the sdkMeterProvider created above; this is the SDK implementation of OpenTelemetry.

OpenTelemetrySdk openTelemetrySdk = OpenTelemetrySdk.builder()
        .setMeterProvider(sdkMeterProvider)
        .build();

Create GrpcOpenTelemetry instance

Using the GrpcOpenTelemetry builder, set the OpenTelemetry SDK instance that uses the Prometheus metric exporter.

GrpcOpenTelemetry grpcOpenTelemetry = GrpcOpenTelemetry.newBuilder()
        .sdk(openTelemetrySdk)
        .build();

// Registers gRPC OpenTelemetry globally.
grpcOpenTelemetry.registerGlobal();

Once a GrpcOpenTelemetry instance is registered globally using registerGlobal(), all subsequently created gRPC clients and servers will be instrumented with OpenTelemetry.
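
If you would rather not instrument everything globally, GrpcOpenTelemetry can also be applied to an individual channel or server builder. A minimal sketch (using this codelab's server address; the builder setup in your scaffold may differ):

// Instrument only this channel, instead of registering globally.
ManagedChannelBuilder<?> channelBuilder =
    Grpc.newChannelBuilder("localhost:50051", InsecureChannelCredentials.create());
grpcOpenTelemetry.configureChannelBuilder(channelBuilder);
ManagedChannel channel = channelBuilder.build();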

Shut down the OpenTelemetry SDK

Shutdown needs to happen inside a JVM shutdown hook. Calling openTelemetrySdk.close() shuts down the SDK, which in turn calls shutdown on the SdkMeterProvider.
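
For example, a minimal sketch of such a hook (the scaffold may structure this differently):

Runtime.getRuntime().addShutdownHook(new Thread(() -> {
  // Closing the SDK also shuts down the SdkMeterProvider and stops
  // the Prometheus HTTP server backing the exporter.
  openTelemetrySdk.close();
}));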

Setup instrumentation on the server

Similarly, let's add GrpcOpenTelemetry to the server. Open codelabs/grpc-java-opentelemetry/start_here/src/main/java/io/grpc/codelabs/opentelemetry/OpenTelemetryServer.java and add code to initialize GrpcOpenTelemetry.

Create Prometheus exporter

Since the client and server in this codelab will likely run on the same machine, we use a different port for the server-side metrics to avoid a port conflict when creating the PrometheusHttpServer.

// The default Prometheus port, `prometheusPort`, has been set to 9464.

PrometheusHttpServer prometheusExporter = PrometheusHttpServer.builder()
        .setPort(prometheusPort)
        .build();

Create OpenTelemetry SDK instance

SdkMeterProvider sdkMeterProvider = SdkMeterProvider.builder()
        .registerMetricReader(prometheusExporter)
        .build();

Then create an OpenTelemetrySdk instance with the sdkMeterProvider:

OpenTelemetrySdk openTelemetrySdk = OpenTelemetrySdk.builder()
        .setMeterProvider(sdkMeterProvider)
        .build();

Create GrpcOpenTelemetry instance

GrpcOpenTelemetry grpcOpenTelemetry = GrpcOpenTelemetry.newBuilder()
        .sdk(openTelemetrySdk)
        .build();

// Registers gRPC OpenTelemetry globally.
grpcOpenTelemetry.registerGlobal();

Shut down the OpenTelemetry SDK

After the gRPC server is shut down, call openTelemetrySdk.close(); this shuts down the SDK and also calls shutdown on the SdkMeterProvider.
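
A minimal sketch of that ordering, assuming the scaffold's Server variable is named server:

// Stop the gRPC server first so in-flight RPCs finish recording metrics,
server.shutdown().awaitTermination(30, TimeUnit.SECONDS);
// then close the SDK, which flushes and stops the Prometheus exporter.
openTelemetrySdk.close();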

4. Running the example and viewing metrics

To run the server:

cd start_here
../gradlew installDist
./build/install/start_here/bin/opentelemetry-server

With a successful setup, you will see the following output from the server:

[date and time] io.grpc.codelabs.opentelemetry.OpenTelemetryServer start
INFO: Server started, listening on 50051

While the server is running, run the client in another terminal:

./build/install/start_here/bin/opentelemetry-client world

A successful run will look like this:

[date and time] io.grpc.codelabs.opentelemetry.OpenTelemetryClient greet
INFO: Greeting: Hello world
[date and time] io.grpc.codelabs.opentelemetry.OpenTelemetryClient greet
INFO: Will try to greet world ...
[date and time] io.grpc.codelabs.opentelemetry.OpenTelemetryClient greet
INFO: Greeting: Hello world

Since we have set up the gRPC OpenTelemetry plugin to export metrics using Prometheus, those metrics are available on localhost:9464 for the server and localhost:9465 for the client.

To see client metrics:

curl localhost:9465/metrics

The result would be of the form:

# HELP grpc_client_attempt_duration_seconds Time taken to complete a client call attempt
# TYPE grpc_client_attempt_duration_seconds histogram
grpc_client_attempt_duration_seconds_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="0.002"} 0
grpc_client_attempt_duration_seconds_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="0.003"} 2
grpc_client_attempt_duration_seconds_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="0.004"} 14
grpc_client_attempt_duration_seconds_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="0.005"} 29
grpc_client_attempt_duration_seconds_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="0.1"} 33
grpc_client_attempt_duration_seconds_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="+Inf"} 34
grpc_client_attempt_duration_seconds_count{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0"} 34
grpc_client_attempt_duration_seconds_sum{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0"} 0.46512665300000006
# HELP grpc_client_attempt_rcvd_total_compressed_message_size_bytes Compressed message bytes received per call attempt
# TYPE grpc_client_attempt_rcvd_total_compressed_message_size_bytes histogram
grpc_client_attempt_rcvd_total_compressed_message_size_bytes_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="0.0"} 0
grpc_client_attempt_rcvd_total_compressed_message_size_bytes_sum{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0"} 442.0
# HELP grpc_client_attempt_sent_total_compressed_message_size_bytes Compressed message bytes sent per client call attempt
# TYPE grpc_client_attempt_sent_total_compressed_message_size_bytes histogram
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="0.0"} 0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="1024.0"} 34
grpc_client_attempt_sent_total_compressed_message_size_bytes_sum{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0"} 238.0
# HELP grpc_client_attempt_started_total Number of client call attempts started
# TYPE grpc_client_attempt_started_total counter
grpc_client_attempt_started_total{grpc_method="helloworld.Greeter/SayHello",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0"} 34.0
# HELP grpc_client_call_duration_seconds Time taken by gRPC to complete an RPC from application's perspective
# TYPE grpc_client_call_duration_seconds histogram
grpc_client_call_duration_seconds_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="0.0"} 0
grpc_client_call_duration_seconds_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="0.003"} 2
grpc_client_call_duration_seconds_bucket{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0",le="+Inf"} 34
grpc_client_call_duration_seconds_count{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0"} 34
grpc_client_call_duration_seconds_sum{grpc_method="helloworld.Greeter/SayHello",grpc_status="OK",grpc_target="dns:///localhost:50051",otel_scope_name="grpc-java",otel_scope_version="1.66.0"} 0.512708707
# TYPE target_info gauge
target_info{service_name="unknown_service:java",telemetry_sdk_language="java",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.40.0"} 1

Similarly, for the server-side metrics:

curl localhost:9464/metrics

5. Viewing metrics on Prometheus

Here, we will set up a Prometheus instance that scrapes the example client and server, which are exporting metrics in Prometheus format.

Download the latest release of Prometheus for your platform, then extract it:
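
For example, with curl (the URL below is illustrative; substitute the actual release version and platform for your machine):

curl -LO https://github.com/prometheus/prometheus/releases/download/v<VERSION>/prometheus-<VERSION>.linux-amd64.tar.gz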

tar xvfz prometheus-*.tar.gz
cd prometheus-*

Create a Prometheus configuration file with the following:

cat > grpc_otel_java_prometheus.yml <<EOF
scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "grpc-otel-java"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9464", "localhost:9465"]
EOF

Start Prometheus with the new configuration:

./prometheus --config.file=grpc_otel_java_prometheus.yml

This configures Prometheus to scrape metrics from the codelab's client and server processes every 5 seconds.

Go to http://localhost:9090/graph to view the metrics. For example, the query:

histogram_quantile(0.5, rate(grpc_client_attempt_duration_seconds_bucket[1m]))

will show a graph of the median attempt latency, using a 1-minute window for the quantile calculation.

To see the rate of queries:

increase(grpc_client_attempt_duration_seconds_bucket[1m])
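
Note that increase() reports the raw growth over the 1-minute window rather than a per-second rate. For a per-second QPS view, you can also rate the started-attempts counter seen in the sample output above:

rate(grpc_client_attempt_started_total[1m])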

6. (Optional) Exercise for the User

In the Prometheus dashboard, you'll notice that the QPS is low. See if you can identify the suspicious code in the example that is limiting the QPS.

As a hint for the enthusiastic: the client code limits itself to a single pending RPC at any given moment. It can be modified so that the client sends more RPCs without waiting for the previous ones to complete. (The solution is not provided.)