Set up a basic OpenTelemetry plugin in gRPC Python

1. Introduction

In this codelab, you use gRPC to build a simple client and server written in Python and instrument them for observability.

By the end of this tutorial, you will have a simple gRPC HelloWorld application instrumented with the gRPC OpenTelemetry plugin, and you will be able to view the exported observability metrics in Prometheus.

What you'll learn

How to set up the OpenTelemetry plugin for an existing gRPC Python application
How to run a local Prometheus instance
How to export metrics to Prometheus
How to view metrics from the Prometheus dashboard

2. Before you begin

What you'll need

git
curl
build-essential
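Python 3.9 or later. For platform-specific installation instructions, see Python Setup and Usage. Alternatively, install a non-system Python with a tool such as uv or pyenv.
pip version 9.0.1 or later, to install Python packages.
venv, to create Python virtual environments.

Install the prerequisites with the following commands: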

sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install -y git curl build-essential clang
sudo apt install python3
sudo apt install python3-pip python3-venv

Get the code

To streamline your learning, this codelab provides a pre-built source code scaffold to help you get started. The following steps show how to instrument an application with the gRPC OpenTelemetry plugin.

grpc-codelabs

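The scaffold source code for this codelab is available in this GitHub directory. If you prefer not to implement the code yourself, the finished source code is available in the completed directory.

First, clone the grpc codelab repository and change into the grpc-python-opentelemetry folder: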

git clone https://github.com/grpc-ecosystem/grpc-codelabs.git
cd grpc-codelabs/codelabs/grpc-python-opentelemetry/

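Alternatively, you can download a .zip file that contains only the codelab directory and unzip it manually.

Next, create a new Python virtual environment (venv) to isolate the project's dependencies from the system packages: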

python3 -m venv --upgrade-deps .venv

To activate the virtual environment in a bash or zsh shell:

source .venv/bin/activate

For Windows and non-standard shells, see the table at https://docs.python.org/3/library/venv.html#how-venvs-work.

Then install the dependencies into the environment with the following command:

python -m pip install -r requirements.txt
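Before moving on, it can help to confirm that the environment is usable. The following is a minimal sketch, not part of the codelab sources: it checks the interpreter version against the stated minimum and looks for the top-level modules that the client and server scripts import. The helper names python_is_supported and missing_modules are made up for this illustration.

```python
import importlib.util
import sys

# Top-level modules imported by the codelab's client and server scripts.
REQUIRED_MODULES = ("grpc", "grpc_observability", "opentelemetry", "prometheus_client")


def python_is_supported(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the interpreter meets the codelab's minimum version."""
    return tuple(version_info[:2]) >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python OK:", python_is_supported())
print("Missing modules:", missing_modules() or "none")
```

If any module is reported as missing, the virtual environment is probably not active in the current shell, or the requirements install did not finish.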

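3. Register the OpenTelemetry plugin

To add the gRPC OpenTelemetry plugin, you need a gRPC application. This codelab uses a simple gRPC HelloWorld client and server, which you instrument with the gRPC OpenTelemetry plugin.

Start with the client, where you register the OpenTelemetry plugin configured with a Prometheus exporter. Open start_here/observability_greeter_client.py in your editor of choice. First, add the required imports and constants so that the top of the file looks like this: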

import logging
import time

import grpc
import grpc_observability
import helloworld_pb2
import helloworld_pb2_grpc
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from prometheus_client import start_http_server

_SERVER_PORT = "50051"
_PROMETHEUS_PORT = 9465

Next, change run() to look like this:

def run():
    # Start Prometheus client
    start_http_server(port=_PROMETHEUS_PORT, addr="0.0.0.0")
    meter_provider = MeterProvider(metric_readers=[PrometheusMetricReader()])

    otel_plugin = grpc_observability.OpenTelemetryPlugin(
        meter_provider=meter_provider
    )
    otel_plugin.register_global()

    with grpc.insecure_channel(target=f"localhost:{_SERVER_PORT}") as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        # Continuously send RPCs every second.
        while True:
            try:
                response = stub.SayHello(helloworld_pb2.HelloRequest(name="You"))
                print(f"Greeter client received: {response.message}")
                time.sleep(1)
            except grpc.RpcError as rpc_error:
                print("Call failed with code: ", rpc_error.code())

    # Deregister is not called in this example, but this is required to clean up.
    otel_plugin.deregister_global()

The next step is to add the OpenTelemetry plugin to the server. Open start_here/observability_greeter_server.py and add the required imports and constants so that the top of the file looks like this:

from concurrent import futures
import logging
import time

import grpc
import grpc_observability
import helloworld_pb2
import helloworld_pb2_grpc
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

_SERVER_PORT = "50051"
_PROMETHEUS_PORT = 9464

Next, change serve() to look like this:

def serve():
    # Start Prometheus client
    start_http_server(port=_PROMETHEUS_PORT, addr="0.0.0.0")

    meter_provider = MeterProvider(metric_readers=[PrometheusMetricReader()])

    otel_plugin = grpc_observability.OpenTelemetryPlugin(
        meter_provider=meter_provider
    )
    otel_plugin.register_global()

    server = grpc.server(
        thread_pool=futures.ThreadPoolExecutor(max_workers=10),
    )
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port("[::]:" + _SERVER_PORT)
    server.start()
    print("Server started, listening on " + _SERVER_PORT)

    server.wait_for_termination()

    # Deregister is not called in this example, but this is required to clean up.
    otel_plugin.deregister_global()
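Both scripts note that deregister_global() is never actually reached, because the client loops forever and the server blocks until it is terminated. One way to guarantee cleanup is to wrap the registration in try/finally. The sketch below shows only that pattern: FakePlugin is a hypothetical stand-in that exposes the same two method names used above, so the example runs without gRPC installed. It does not claim that grpc_observability ships such a context manager.

```python
import contextlib


class FakePlugin:
    """Hypothetical stand-in exposing the two methods the codelab calls."""

    def __init__(self):
        self.registered = False

    def register_global(self):
        self.registered = True

    def deregister_global(self):
        self.registered = False


@contextlib.contextmanager
def registered(plugin):
    """Register `plugin` globally and always deregister it on exit."""
    plugin.register_global()
    try:
        yield plugin
    finally:
        plugin.deregister_global()


plugin = FakePlugin()
try:
    with registered(plugin):
        state_during = plugin.registered
        raise KeyboardInterrupt  # simulate Ctrl+C while the RPC loop is running
except KeyboardInterrupt:
    pass
state_after = plugin.registered
print("registered during loop:", state_during, "| after exit:", state_after)
```

In the real scripts, the body of the with block would be the RPC loop in run() or the server.wait_for_termination() call in serve().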

4. Run the example and view metrics

To start the server, run the following commands:

cd start_here
python -m observability_greeter_server

If the setup succeeded, the server prints the following output:

Server started, listening on 50051

While the server is running, run the client in another terminal:

# Run the below commands to cd to the working directory and activate virtual environment in the new terminal
cd grpc-codelabs/codelabs/grpc-python-opentelemetry/
source .venv/bin/activate

cd start_here
python -m observability_greeter_client

On a successful run, the client prints output like this:

Greeter client received: Hello You
Greeter client received: Hello You
Greeter client received: Hello You

Because the gRPC OpenTelemetry plugin is set up to export metrics using Prometheus, the metrics are available at localhost:9464 for the server and localhost:9465 for the client.

To view the client metrics:

curl localhost:9465/metrics

The result has the following form:

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 241.0
python_gc_objects_collected_total{generation="1"} 163.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 78.0
python_gc_collections_total{generation="1"} 7.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="10",patchlevel="9",version="3.10.9"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.868988416e+09
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.1680896e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.72375679833e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.38
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 9.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 4096.0
# HELP target_info Target metadata
# TYPE target_info gauge
target_info{service_name="unknown_service",telemetry_sdk_language="python",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.26.0"} 1.0
# HELP grpc_client_attempt_started_total Number of client call attempts started
# TYPE grpc_client_attempt_started_total counter
grpc_client_attempt_started_total{grpc_method="other",grpc_target="localhost:50051"} 18.0
# HELP grpc_client_attempt_sent_total_compressed_message_size_bytes Compressed message bytes sent per client call attempt
# TYPE grpc_client_attempt_sent_total_compressed_message_size_bytes histogram
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="0.0"} 0.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="5.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="10.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="25.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="50.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="75.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="100.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="250.0"} 18.0

Similarly, to view the server-side metrics:

curl localhost:9464/metrics
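The output of these endpoints is plain text, so the gRPC metrics are easy to pick out programmatically. The following sketch parses sample lines like the ones shown above using only the standard library. It is a simplified reader for illustration, not a complete parser for the exposition format: timestamps are ignored, and escape sequences inside label values are kept verbatim. The function name parse_samples is made up for this example.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return (name, labels, value) tuples for every sample line in `text`."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# A few lines copied from the client output shown earlier.
SAMPLE_OUTPUT = """\
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 9.0
# TYPE grpc_client_attempt_started_total counter
grpc_client_attempt_started_total{grpc_method="other",grpc_target="localhost:50051"} 18.0
"""

for name, labels, value in parse_samples(SAMPLE_OUTPUT):
    if name.startswith("grpc_"):
        print(name, labels, value)
```

To use it against a running process, pass in the text returned by the /metrics endpoint instead of SAMPLE_OUTPUT.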

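5. View metrics in Prometheus

Here, you set up a Prometheus instance that scrapes the gRPC example client and server, both of which export their metrics using Prometheus.

Download the latest release of Prometheus for your platform using the link provided, or use the following command: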

curl -sLO https://github.com/prometheus/prometheus/releases/download/v3.7.3/prometheus-3.7.3.linux-amd64.tar.gz

Extract the archive and change into the extracted directory with the following commands:

tar xvfz prometheus-*.tar.gz
cd prometheus-*

Create a Prometheus configuration file with the following contents:

cat > grpc_otel_python_prometheus.yml <<EOF
scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "grpc-otel-python"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9464", "localhost:9465"]
EOF

Start Prometheus with the new configuration:

./prometheus --config.file=grpc_otel_python_prometheus.yml

This configures Prometheus to scrape metrics from the client and server codelab processes every 5 seconds.

Go to http://localhost:9090/graph to view the metrics. For example, consider the following query:

histogram_quantile(0.5, rate(grpc_client_attempt_duration_seconds_bucket[1m]))

This query displays a graph of the median attempt latency, using a 1-minute time window for the quantile calculation.
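To make the quantile estimate less opaque, the following sketch shows the usual approach of estimating a quantile from cumulative histogram buckets by linear interpolation. It is a simplified illustration under stated assumptions, not Prometheus's actual implementation: it only handles 0 < q <= 1, it expects buckets sorted by upper bound and ending with the +Inf bucket, and the bucket values in the example are invented. In the query above, rate() turns the raw counts into per-second rates first, but the interpolation step is the same.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile `q` from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with the +Inf bucket,
    mirroring the `le` label of a Prometheus histogram.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for index, (bound, count) in enumerate(buckets):
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            if index == 0 and bound <= 0:
                return bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Invented example: 20 observations, half of them at or below 0.5 seconds.
example = [(0.1, 0), (0.5, 10), (1.0, 20), (math.inf, 20)]
print("median estimate:", histogram_quantile(0.5, example))
```

Because of the interpolation, the result is only as precise as the bucket boundaries allow. A latency that always falls into one wide bucket produces an estimate somewhere inside that bucket, not the exact value.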

Rate of queries:

increase(grpc_client_attempt_duration_seconds_bucket[1m])
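The same queries can also be issued without the web UI. The sketch below builds a request URL for the instant-query endpoint of the Prometheus HTTP API (/api/v1/query), assuming the instance started above is listening on localhost:9090. The helper names build_query_url and run_query are made up for this illustration, and run_query only works while Prometheus is running.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumed address of the local instance


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_query(promql, base_url=PROMETHEUS_URL):
    """Run `promql` against a live Prometheus server and return the decoded JSON."""
    with urllib.request.urlopen(build_query_url(promql, base_url), timeout=5) as resp:
        return json.load(resp)


print(build_query_url("increase(grpc_client_attempt_duration_seconds_bucket[1m])"))
```

With the client, server, and Prometheus all running, calling run_query with either query from this section returns the JSON response from the server, which you can inspect or feed into your own tooling.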

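6. (Optional) Exercise for the user

In the Prometheus dashboard, you'll notice that the QPS is low. See if you can identify the suspicious code in the example that limits the QPS.

For the enthusiastic: the client code limits itself to a single pending RPC at any given moment. You can modify it so that the client sends more RPCs without waiting for the previous ones to complete. (The solution is not provided.)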