Unable to collect metrics for SPM #3975
-
Hi, I already went through troubleshooting, but it seems I have something misconfigured or misunderstood, which results in the Monitor section being empty.

I'm using Python with Jaeger (traces) and Prometheus (metrics), and I'm able to successfully export traces and metrics respectively. For the sake of this question, and to keep it simple, let's suppose my Python program consumes messages from Kafka, processes their contents, and just relays them over to an HTTP destination.

A simple example of the pattern (excerpted from a class whose create_http_client() returns an httpx.AsyncClient):

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def send_post_request(self, url: str, data: dict[str, object]) -> None:
    with tracer.start_as_current_span('send_post_request'):
        async with self.create_http_client() as async_client:
            await async_client.post(url, data=data)

I'm using HTTPX to send the messages, which are POST requests. Will SPM be populated automatically, or do I need to manually and explicitly tell it how to generate the metrics? (A sketch of the app-side instrumentation follows the dependency list below.)

Here's my current configuration; what am I missing to make SPM functional? Thanks.

prometheus.yml

global:
scrape_interval: 15s # By default, scrape targets every 15 seconds.
evaluation_interval: 15s
# Attach these labels to any time series or alerts when communicating with
# external systems (federation, remote storage, Alertmanager).
external_labels:
monitor: 'my-program'
query_log_file: 'prometheus.log'
# A scrape configuration containing exactly one endpoint to scrape:
scrape_configs:
# The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
# https://github.com/prometheus/node_exporter
- job_name: 'node_exporter'
honor_labels: true
honor_timestamps: true
static_configs:
- targets: ['node_exporter:9100']
- job_name: my_program
scrape_interval: 15s
static_configs:
- targets: ['host.docker.internal:8000']
metric_relabel_configs:
- source_labels: [__name__]
regex: '.*grpc_io.*'
action: drop

otel-collector-config.yml

receivers:
jaeger:
protocols:
grpc:
# Dummy receiver that's never used, because a pipeline is required to have one.
otlp/spanmetrics:
protocols:
grpc:
endpoint: 'localhost:65535'
otlp:
protocols:
grpc:
prometheus:
config:
scrape_configs:
- job_name: open-telemetry
scrape_interval: 15s
static_configs:
- targets: ['localhost:8888']
metric_relabel_configs:
- source_labels: [__name__]
regex: '.*grpc_io.*'
action: drop
exporters:
logging:
loglevel: debug
prometheus:
endpoint: '0.0.0.0:8889'
resource_to_telemetry_conversion:
enabled: true
jaeger:
endpoint: 'jaeger:14250'
tls:
insecure: true
otlp/spanmetrics:
endpoint: 'localhost:55677'
tls:
insecure: true
processors:
# https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md
batch:
# https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiterprocessor/README.md
memory_limiter:
check_interval: 5s
limit_mib: 819
spike_limit_mib: 256
spanmetrics:
metrics_exporter: prometheus
latency_histogram_buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
dimensions_cache_size: 1500
# Additional list of dimensions on top of:
# - service.name
# - operation
# - span.kind
# - status.code
dimensions:
# If the span is missing http.method, the processor will insert
# the http.method dimension with value 'GET'.
# For example, in the following scenario, http.method is not present in a span and so will be added as a dimension to the metric with value "GET":
# - calls_total{http_method="GET",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 1
- name: http.method
default: GET
# If a default is not provided, the http.status_code dimension will be omitted
# if the span does not contain http.status_code.
# For example, consider a scenario with two spans, one span having http.status_code=200 and another missing http.status_code. Two metrics would result with this configuration, one with the http_status_code omitted and the other included:
# - calls_total{http_status_code="200",operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 1
# - calls_total{operation="/Address",service_name="shippingservice",span_kind="SPAN_KIND_SERVER",status_code="STATUS_CODE_UNSET"} 1
- name: http.status_code
# The aggregation temporality of the generated metrics.
# Default: "AGGREGATION_TEMPORALITY_CUMULATIVE"
aggregation_temporality: 'AGGREGATION_TEMPORALITY_CUMULATIVE'
extensions:
health_check:
memory_ballast:
pprof:
endpoint: :1888
zpages:
# http://localhost:55679/debug/tracez
endpoint: :55679
service:
extensions: [memory_ballast, health_check, zpages, pprof]
telemetry:
metrics:
address: :8888
logs:
level: debug
pipelines:
traces:
receivers: [jaeger]
processors: [memory_limiter, batch, spanmetrics]
exporters: [logging, jaeger]
metrics:
receivers: [prometheus]
processors: [memory_limiter, batch]
exporters: [logging]
# The exporter name must match the metrics_exporter name.
# The receiver is just a dummy and never used; added to pass validation requiring at least one receiver in a pipeline.
metrics/spanmetrics:
receivers: [otlp/spanmetrics]
exporters: [prometheus]
logs:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [logging]

docker-compose.yml

---
version: "3.9"
services:
jaeger:
networks:
- backend
container_name: jaeger
image: jaegertracing/all-in-one:latest
volumes:
- "./jaeger-ui.json:/etc/jaeger/jaeger-ui.json"
command: --query.ui-config /etc/jaeger/jaeger-ui.json
# https://www.jaegertracing.io/docs/1.38/cli/
# https://github.com/jaegertracing/jaeger/discussions/2834?sort=top
environment:
- METRICS_STORAGE_TYPE=prometheus
- PROMETHEUS_SERVER_URL=http://prometheus:9090
- COLLECTOR_ENABLE_SPAN_SIZE_METRICS=true
ports:
- 14269:14269 # admin port
- 14250:14250 # gRPC
- 14268:14268
- 6831:6831/udp # Thrift compact, might delete
- 16686:16686 # UI
- 16685:16685 # Protobuf
- 9411:9411
- 16687:16687
- 5775:5775/udp
- 6832:6832
- 5778:5778
restart: on-failure
otel_collector:
networks:
- backend
image: otel/opentelemetry-collector-contrib:latest
container_name: otel_collector
volumes:
- "./otel-collector-config.yml:/etc/otelcol/otel-collector-config.yml"
command: --config /etc/otelcol/otel-collector-config.yml
depends_on:
- jaeger
#TODO!: Do I need these ports? probably..
ports:
- 1888:1888 # pprof extension
- 8888:8888 # Prometheus metrics exposed by the collector
- 8889:8889 # Prometheus exporter metrics
- 13133:13133 # health_check extension
- 4317:4317 # OTLP gRPC receiver
- 55679:55679 # zpages extension
restart: on-failure
prometheus:
container_name: prometheus
networks:
- backend
image: prom/prometheus:latest
volumes:
# - ./prometheus:/etc/prometheus
# - ./prometheus_data:/prometheus
- "./prometheus.yml:/etc/prometheus/prometheus.yml"
- "./prometheus-alerts.yml:/etc/prometheus/prometheus-alerts.yml"
command:
- '--config.file=/etc/prometheus/prometheus.yml'
# - '--storage.tsdb.path=/prometheus'
# - '--storage.tsdb.retention.time=200h'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--web.console.templates=/etc/prometheus/consoles'
- '--web.enable-lifecycle'
ports:
- 9090:9090
restart: on-failure
node_exporter:
image: prom/node-exporter:latest
container_name: node_exporter
restart: on-failure
# Additional volumes?
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro # Docker
# - '/:/host:ro,rslave' # Host machine
command:
- '--path.procfs=/host/proc'
- '--path.rootfs=/rootfs' # Docker
# - '--path.rootfs=/host' # Host machine
- '--path.sysfs=/host/sys'
- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc|var/lib/docker/.+)($$|/)'
# /roots/etc/os-release permission denied node exporter
ports:
- 9100:9100
networks:
- backend
grafana:
container_name: grafana
networks:
- backend
image: grafana/grafana:latest
volumes:
# - grafana_data:/var/lib/grafana
# - ./grafana/provisioning:/etc/grafana/provisioning
- ./grafana.ini:/etc/grafana/grafana.ini
- ./datasource.yml:/etc/grafana/provisioning/datasources/datasource.yaml
# environment:
# - GF_SECURITY_ADMIN_USER=admin
# - GF_SECURITY_ADMIN_PASSWORD=admin
# - GF_USERS_ALLOW_SIGN_UP=false
ports:
- 3000:3000
restart: on-failure
networks:
backend:
pyproject.toml (telemetry dependencies)

[tool.poetry.group.telemetry.dependencies]
opentelemetry-distro = {extras = ["otlp"], version = "^0.34b0"}
opentelemetry-exporter-jaeger-proto-grpc = "^1.13.0"
opentelemetry-instrumentation-httpx = "^0.34b0"
opentelemetry-instrumentation-kafka-python = "^0.34b0"
opentelemetry-instrumentation-logging = "^0.34b0"
opentelemetry-instrumentation-sqlalchemy = "^0.34b0"
opentelemetry-instrumentation-system-metrics = "^0.34b0"
opentelemetry-exporter-prometheus = "^1.12.0rc1"
prometheus-client = "^0.14.1"
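To the question of whether SPM is populated automatically: the collector's spanmetrics processor derives calls_total and the latency metrics purely from the spans it receives, so nothing metric-specific is needed in the application; the only requirement is that the HTTPX spans actually reach the collector's traces pipeline. Below is a minimal app-side sketch, assuming the dependencies listed above; the service name, the OTLP endpoint, and the use of HTTPXClientInstrumentor are illustrative assumptions, not taken from the original post.

```python
# Sketch (assumptions noted above): export spans to the collector over OTLP gRPC
# and auto-instrument httpx so every outgoing POST produces a CLIENT span.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.httpx import HTTPXClientInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor


def init_tracing(service_name: str = "my-program") -> None:
    provider = TracerProvider(resource=Resource.create({"service.name": service_name}))
    # 4317 is the OTLP gRPC port published by the otel_collector service below;
    # use the container name instead of localhost if the app runs inside the compose network.
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
    )
    trace.set_tracer_provider(provider)
    # Spans are now created for each httpx request automatically; the manual
    # tracer.start_as_current_span() span above still works alongside them.
    HTTPXClientInstrumentor().instrument()
```

Note that for this OTLP path to work, the collector's traces pipeline also needs the otlp receiver in its receivers list; the replies below touch on exactly this mismatch.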
-
Isn't this your problem? If your OTEL Collector is not receiving any spans, there's nothing for the rest of the pipeline to do, so you'll get no metrics. Your application needs to export data to the OTEL Collector, not directly to Jaeger.
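A quick way to verify this reply's point is to check whether the spanmetrics output exists at all: the collector's prometheus exporter in the config above listens on 0.0.0.0:8889, and if no calls_total series show up there, the collector is not receiving spans. A small sketch, assuming the 8889:8889 port mapping from the compose file and that the check runs on the Docker host:

```python
# Sketch: scrape the collector's prometheus exporter (port 8889 per the config above)
# and look for calls_total series emitted by the spanmetrics processor.
from urllib.request import urlopen


def spanmetrics_series(url: str = "http://localhost:8889/metrics") -> list[str]:
    with urlopen(url, timeout=5) as resp:
        body = resp.read().decode("utf-8")
    return [line for line in body.splitlines() if line.startswith("calls_total")]


if __name__ == "__main__":
    series = spanmetrics_series()
    print(f"found {len(series)} calls_total series")
    for line in series[:10]:
        print(line)
```

Separately, note that the prometheus.yml above only defines node_exporter and my_program scrape jobs; for Jaeger's Monitor tab to query these metrics via PROMETHEUS_SERVER_URL, the Prometheus server would also need a scrape job targeting this 8889 endpoint.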
-
In case it helps you figure out where the problem is, @yurishkuro @albertteoh, I'm posting a log from the Docker container.
-
The problem might be this:

Since you're using OTLP exporters, the jaeger receivers won't know how to read spans from your Python script, and that's probably the cause of the errors you see. The second issue is that the second line seems to overwrite the first.

As I couldn't get your Python script to work quickly, I just threw in a microsim container into your docker-compose file and, using a jaeger receiver, I could see metrics in SPM:
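For completeness, a hedged sketch of one way to resolve the exporter/receiver mismatch described here: switch the Python app to the Jaeger gRPC exporter that is already in the pyproject above, so the collector's jaeger receiver can read its spans. This assumes the collector's jaeger receiver gRPC endpoint (default port 14250) is reachable from the app; in the compose file above that host port is currently published by the jaeger container, not by otel_collector, so the port mapping would need adjusting, and the endpoint below is an assumption rather than something taken from the thread.

```python
# Sketch (assumptions in the note above): send spans in Jaeger's native gRPC
# protocol so the collector's `jaeger` receiver, and hence the spanmetrics
# pipeline, can consume them. Adjust the endpoint to your network layout;
# it must point at the otel_collector's jaeger receiver, not at the
# jaeger all-in-one container.
from opentelemetry import trace
from opentelemetry.exporter.jaeger.proto.grpc import JaegerExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "my-program"}))
provider.add_span_processor(
    BatchSpanProcessor(JaegerExporter(collector_endpoint="localhost:14250", insecure=True))
)
trace.set_tracer_provider(provider)
```

The alternative is to keep the OTLP exporter in the app and simply add the otlp receiver (already defined in the collector config) to the traces pipeline's receivers list; either way, the exporter protocol and the receiver protocol have to match.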