
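Set up a basic OpenTelemetry plugin in gRPC Python

1. Introduction

In this codelab, you'll use gRPC to create a client and a server for a simple Greeter (HelloWorld) application written in Python.

By the end of this tutorial, you'll have a simple gRPC HelloWorld application instrumented with the gRPC OpenTelemetry plugin, and you'll be able to see the exported observability metrics in Prometheus.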

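What you'll learn

How to set up the OpenTelemetry plugin for an existing gRPC Python application
How to run a local Prometheus instance
How to export metrics to Prometheus
How to view metrics in the Prometheus dashboard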

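2. Before you begin

What you'll need

git
curl
build-essential
Python 3.9 or later. For platform-specific Python installation instructions, see Python Setup and Usage. Alternatively, install a non-system Python with a tool such as uv or pyenv.
pip version 9.0.1 or later, to install Python packages.
venv, to create Python virtual environments.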

Install the prerequisites:

sudo apt-get update -y
sudo apt-get upgrade -y
sudo apt-get install -y git curl build-essential clang
sudo apt-get install -y python3 python3-pip python3-venv

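Get the code

To make learning easier, this codelab provides a prebuilt source-code scaffold to help you get started. The following steps walk you through instrumenting the application with the gRPC OpenTelemetry plugin.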


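The scaffold source code for this codelab is in the codelabs/grpc-python-opentelemetry directory of the grpc-codelabs GitHub repository. If you'd rather not implement the code yourself, you can find the completed source code in the completed directory.

First, clone the grpc-codelabs repository, then cd into the grpc-python-opentelemetry folder: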

git clone https://github.com/grpc-ecosystem/grpc-codelabs.git
cd grpc-codelabs/codelabs/grpc-python-opentelemetry/

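Alternatively, you can download a .zip file that contains only the codelab directory and extract it manually.

First, create a new Python virtual environment (venv) to isolate the project's dependencies from system packages: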

python3 -m venv --upgrade-deps .venv

To activate the virtual environment in a bash or zsh shell, run:

source .venv/bin/activate

For Windows and non-standard shells, see the table at https://docs.python.org/3/library/venv.html#how-venvs-work.
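To confirm that the environment is active, you can compare Python's prefixes: inside a venv, sys.prefix points at the environment directory, while sys.base_prefix still points at the interpreter the venv was created from. A minimal sketch; in_virtualenv is a hypothetical helper, not part of the codelab:

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a venv."""
    # venv sets sys.prefix to the environment directory, while
    # sys.base_prefix keeps pointing at the base interpreter.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("venv active:", in_virtualenv(), "| prefix:", sys.prefix)
```

Run it with python after activating the environment; the first value printed should be True.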

Next, install the dependencies into the environment with the following command:

python -m pip install -r requirements.txt
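To check that the install succeeded, you can look up the modules that the codelab's client and server import, without actually running them. A minimal sketch; the module list is copied from the import statements in this codelab, and helloworld_pb2/helloworld_pb2_grpc are left out on the assumption that they are generated protobuf modules in the codelab directory rather than pip packages:

```python
import importlib.util

# Modules imported by observability_greeter_client.py and
# observability_greeter_server.py (generated helloworld_* modules omitted).
REQUIRED_MODULES = [
    "grpc",
    "grpc_observability",
    "opentelemetry.exporter.prometheus",
    "opentelemetry.sdk.metrics",
    "prometheus_client",
]


def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For dotted names, find_spec imports the parent packages first
            # and raises if a parent (e.g. "opentelemetry") is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("All dependencies found." if not missing else f"Missing: {missing}")
```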

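3. Register the OpenTelemetry plugin

We need a gRPC application to add the gRPC OpenTelemetry plugin to. In this codelab, we'll use a simple gRPC HelloWorld client and server and instrument them with the gRPC OpenTelemetry plugin.

The first step is to register, in the client, an OpenTelemetry plugin configured with a Prometheus exporter. Open start_here/observability_greeter_client.py in your favorite editor. First, add the relevant imports and constants, as follows: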

import logging
import time

import grpc
import grpc_observability
import helloworld_pb2
import helloworld_pb2_grpc
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from prometheus_client import start_http_server

_SERVER_PORT = "50051"
_PROMETHEUS_PORT = 9465

Then, modify run() so that it looks like this:

def run():
    # Start Prometheus client
    start_http_server(port=_PROMETHEUS_PORT, addr="0.0.0.0")
    meter_provider = MeterProvider(metric_readers=[PrometheusMetricReader()])

    otel_plugin = grpc_observability.OpenTelemetryPlugin(
        meter_provider=meter_provider
    )
    otel_plugin.register_global()

    with grpc.insecure_channel(target=f"localhost:{_SERVER_PORT}") as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        # Continuously send RPCs every second.
        while True:
            try:
                response = stub.SayHello(helloworld_pb2.HelloRequest(name="You"))
                print(f"Greeter client received: {response.message}")
            except grpc.RpcError as rpc_error:
                print("Call failed with code: ", rpc_error.code())
            # Wait one second after failures too, so an unreachable server
            # is not retried in a tight loop.
            time.sleep(1)

    # Never reached, because the loop above runs forever. A real application
    # should call deregister_global() during shutdown to clean up.
    otel_plugin.deregister_global()
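Because the while True loop never exits, the deregister_global() call at the end of run() is never reached. If you want cleanup to happen even when you stop the client with Ctrl+C, one option is a small context manager around the two methods used above. This is a sketch, not part of grpc_observability: registered_globally is a hypothetical helper that accepts any object exposing register_global() and deregister_global().

```python
import contextlib


@contextlib.contextmanager
def registered_globally(plugin):
    """Register a plugin globally and always deregister it on exit.

    `plugin` is any object with register_global() and deregister_global(),
    such as the OpenTelemetryPlugin created in run().
    """
    plugin.register_global()
    try:
        yield plugin
    finally:
        # Runs on normal exit and when an exception such as
        # KeyboardInterrupt (Ctrl+C) propagates out of the with block.
        plugin.deregister_global()
```

In run(), you could then drop the explicit register_global() and deregister_global() calls and wrap the channel block in with registered_globally(otel_plugin):.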

The next step is to add the OpenTelemetry plugin to the server. Open start_here/observability_greeter_server.py and add the relevant imports and constants so that it looks like this:

from concurrent import futures
import logging
import time

import grpc
import grpc_observability
import helloworld_pb2
import helloworld_pb2_grpc
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from prometheus_client import start_http_server

_SERVER_PORT = "50051"
_PROMETHEUS_PORT = 9464

Then, modify serve() so that it looks like this:

def serve():
    # Start Prometheus client
    start_http_server(port=_PROMETHEUS_PORT, addr="0.0.0.0")

    meter_provider = MeterProvider(metric_readers=[PrometheusMetricReader()])

    otel_plugin = grpc_observability.OpenTelemetryPlugin(
        meter_provider=meter_provider
    )
    otel_plugin.register_global()

    server = grpc.server(
        thread_pool=futures.ThreadPoolExecutor(max_workers=10),
    )
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port("[::]:" + _SERVER_PORT)
    server.start()
    print("Server started, listening on " + _SERVER_PORT)

    server.wait_for_termination()

    # Only reached once wait_for_termination() returns; stopping the server with
    # Ctrl+C raises KeyboardInterrupt and skips it. Deregistering is required to clean up.
    otel_plugin.deregister_global()

4. Run the example and view metrics

To run the server, run the following commands:

cd start_here
python -m observability_greeter_server

If setup succeeds, you'll see the following server output:

Server started, listening on 50051

While the server is running, run the client in another terminal:

# Run the below commands to cd to the working directory and activate virtual environment in the new terminal
cd grpc-codelabs/codelabs/grpc-python-opentelemetry/
source .venv/bin/activate

cd start_here
python -m observability_greeter_client

When it runs successfully, the output looks like this:

Greeter client received: Hello You
Greeter client received: Hello You
Greeter client received: Hello You

Because we configured the gRPC OpenTelemetry plugin to export metrics with Prometheus, the metrics are served at localhost:9464 (server) and localhost:9465 (client).

To view the client metrics, run:

curl localhost:9465/metrics

The output should look like this (truncated):

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 241.0
python_gc_objects_collected_total{generation="1"} 163.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 78.0
python_gc_collections_total{generation="1"} 7.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="10",patchlevel="9",version="3.10.9"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.868988416e+09
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.1680896e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.72375679833e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.38
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 9.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 4096.0
# HELP target_info Target metadata
# TYPE target_info gauge
target_info{service_name="unknown_service",telemetry_sdk_language="python",telemetry_sdk_name="opentelemetry",telemetry_sdk_version="1.26.0"} 1.0
# HELP grpc_client_attempt_started_total Number of client call attempts started
# TYPE grpc_client_attempt_started_total counter
grpc_client_attempt_started_total{grpc_method="other",grpc_target="localhost:50051"} 18.0
# HELP grpc_client_attempt_sent_total_compressed_message_size_bytes Compressed message bytes sent per client call attempt
# TYPE grpc_client_attempt_sent_total_compressed_message_size_bytes histogram
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="0.0"} 0.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="5.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="10.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="25.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="50.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="75.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="100.0"} 18.0
grpc_client_attempt_sent_total_compressed_message_size_bytes_bucket{grpc_method="other",grpc_status="OK",grpc_target="localhost:50051",le="250.0"} 18.0

Similarly, for the server-side metrics:

curl localhost:9464/metrics
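Instead of reading the curl output by eye, you can scrape an endpoint from Python. The sketch below uses only the standard library and handles just simple sample lines like those shown above (a metric name, optional {labels}, and a value); it is not a complete parser of the Prometheus text format. The helper names are hypothetical. If you prefer a full implementation, prometheus_client provides prometheus_client.parser.text_string_to_metric_families.

```python
import re
import urllib.request

# name{label="value",...} value   (the {...} part is optional)
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(url="http://localhost:9465/metrics", timeout=5.0):
    """Download the text exposition from a /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping # HELP and # TYPE lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


def total(samples, name):
    """Sum every sample of one metric across all label combinations."""
    return sum(value for n, _, value in samples if n == name)
```

With the client running, total(parse_samples(fetch_metrics()), "grpc_client_attempt_started_total") returns the number of client call attempts started so far; pass "http://localhost:9464/metrics" to fetch_metrics for the server.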

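5. View metrics in Prometheus

In this section, we'll set up a Prometheus instance that scrapes the gRPC example client and server, both of which export metrics in the Prometheus format.

Download the latest Prometheus release for your platform from the Prometheus download page, or use the following command, which fetches version 3.7.3 for Linux on amd64: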

curl -sLO https://github.com/prometheus/prometheus/releases/download/v3.7.3/prometheus-3.7.3.linux-amd64.tar.gz

Then extract the archive and change into the extracted directory:

tar xvfz prometheus-*.tar.gz
cd prometheus-*

Create a Prometheus configuration file with the following contents:

cat > grpc_otel_python_prometheus.yml <<EOF
scrape_configs:
  - job_name: "prometheus"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "grpc-otel-python"
    scrape_interval: 5s
    static_configs:
      - targets: ["localhost:9464", "localhost:9465"]
EOF

Start Prometheus with the new configuration:

./prometheus --config.file=grpc_otel_python_prometheus.yml

With this configuration, Prometheus scrapes metrics from the client and server codelab processes every 5 seconds.
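If a graph stays empty, first check that both scrape targets respond. A minimal standard-library sketch, assuming the two exporter ports from the configuration above; check_targets is a hypothetical helper that simply requests each /metrics URL, which is roughly what a Prometheus scrape does:

```python
import http.client
import urllib.error
import urllib.request


def check_targets(targets, timeout=2.0):
    """Map each host:port target to True if its /metrics URL returns HTTP 200."""
    status = {}
    for target in targets:
        url = f"http://{target}/metrics"
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                status[target] = resp.status == 200
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Connection refused, timeout, or HTTP error status:
            # a scrape of this target would fail too.
            status[target] = False
    return status
```

For example, check_targets(["localhost:9464", "localhost:9465"]) should map both targets to True while the server and client from the previous section are running.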

Go to http://localhost:9090/graph to view the metrics. For example, the following query:

histogram_quantile(0.5, rate(grpc_client_attempt_duration_seconds_bucket[1m]))

displays a graph of the median attempt latency, using a 1-minute time window for the quantile calculation.
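histogram_quantile does not see individual latencies. It estimates the quantile from cumulative bucket counts (the le label) by assuming observations are spread linearly within each bucket. The sketch below is a simplified re-implementation of that interpolation for illustration only, assuming non-negative bucket bounds and non-decreasing counts; Prometheus itself handles more edge cases. In the query above the inputs are per-second rates from rate(...[1m]) rather than raw counts, which works the same way because scaling every bucket by one factor does not move the interpolated quantile.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper bound,
    ending with (math.inf, total_count), i.e. the le="+Inf" bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if len(buckets) < 2 or total == 0:
        return math.nan
    rank = q * total  # number of observations at or below the quantile
    prev_bound, prev_count = 0.0, 0.0  # the first bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return math.nan
```

For example, with buckets [(0.005, 2), (0.01, 8), (0.025, 10), (math.inf, 10)], the median falls in the 0.005 to 0.01 bucket and interpolates to about 0.0075 seconds.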

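To look at the request rate, for example the number of attempts recorded in each latency bucket over the last minute (the le="+Inf" series counts all attempts), query: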

increase(grpc_client_attempt_duration_seconds_bucket[1m])
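The same PromQL expressions can also be evaluated outside the UI through Prometheus's HTTP API (the /api/v1/query endpoint). A minimal standard-library sketch, assuming Prometheus listens on localhost:9090 as started above; it only handles instant-vector results, and the function names are hypothetical:

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(expr, base=PROMETHEUS_URL):
    """Build an instant-query URL for the /api/v1/query endpoint."""
    return f"{base}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def parse_vector(payload):
    """Turn an instant-vector response into (labels, float value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample value is encoded as [unix_timestamp, "value as a string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def instant_query(expr, timeout=5.0):
    """Evaluate a PromQL expression at the current time."""
    with urllib.request.urlopen(build_query_url(expr), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

For example, instant_query("increase(grpc_client_attempt_duration_seconds_bucket[1m])") returns one (labels, value) pair per bucket series, with the bucket's upper bound in labels["le"].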

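6. (Optional) Exercise for the user

In the Prometheus dashboard, you'll notice that the QPS is low. See if you can spot the code in the example that limits the QPS.

For the enthusiastic developer: the client code limits itself to a single pending RPC at any given moment. You can modify it so that the client sends more RPCs without waiting for previous ones to complete. (A solution to this exercise is not provided.)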