-
hi! first of all, thanks for providing this tool to us. The metrics page looks good unless we use multiple workers (we use gunicorn to start 4 workers for an API). On every request the metrics page is served by a different (and mostly unpredictable) worker, which returns the metrics of this specific worker only. As an example, this leads to "jumping" counter metrics, making them mostly unusable in Prometheus/Grafana. We use a simple config for the requests metrics:
And then access the metrics endpoint.
Requests #1 and #4 seem to be served by the same worker. So even if there's no traffic on the page, the resulting graph jumps between the per-worker values. I was already thinking about including the worker ID and then grouping the results, but that's not very reliable at all. Have you seen this before, or are we missing something? greetings, Hans
Replies: 4 comments 1 reply
-
Hey Hans, I'll take a look at it this evening and provide a proper solution / answer. The issue should be one of the following:
-
Hi Tim, I presume the fault is on our side. We missed the hint on multiproc mentioned in the automatic doc of the expose function. It's mentioned in the raise condition ("If prometheus_multiproc_dir env var is found ..."). THANKS for the tip! Together with https://github.com/prometheus/client_python#multiprocess-mode-gunicorn we were able to reconfigure the API. It looks good now, but we will need some more investigation. best Hans
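For anyone landing here later: the multiprocess setup from the linked client_python README boils down to setting the multiproc directory before any metrics are created, so every worker writes its samples to files there, and then aggregating those files into one registry when serving `/metrics`. A minimal sketch (the metric name is illustrative, not from the thread):

```python
import os
import tempfile

# Must be set BEFORE prometheus_client creates any metric values.
# Newer client versions read PROMETHEUS_MULTIPROC_DIR; older ones
# read the lowercase prometheus_multiproc_dir, so set both here.
multiproc_dir = tempfile.mkdtemp()
os.environ["PROMETHEUS_MULTIPROC_DIR"] = multiproc_dir
os.environ["prometheus_multiproc_dir"] = multiproc_dir

from prometheus_client import CollectorRegistry, Counter, generate_latest, multiprocess

# In multiprocess mode this counter's value lives in an mmap'd file
# under the directory above, not in this process's memory.
REQUESTS = Counter("demo_requests_total", "Example request counter")
REQUESTS.inc(3)

# When serving /metrics, merge the files written by all workers
# into a fresh registry and render that instead of the default one.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
print(generate_latest(registry).decode())
```

In a real gunicorn deployment you would also call `multiprocess.mark_process_dead(worker.pid)` from a `child_exit` hook in `gunicorn.conf.py`, so that files left behind by dead workers get cleaned up, as described in the linked README.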
-
Wow, thanks a lot! I also got stuck on this today for quite some time, haha. Would it be an idea to add a section to the README about this? I think it could prevent quite a few people from making the same mistake :). Because as soon as you containerize FastAPI in the default way (https://hub.docker.com/r/tiangolo/uvicorn-gunicorn-fastapi), you'll run into this problem.
-
Hey, I have a quick question (sorry if this doesn't fit perfectly here): I'd like to know how to handle a situation where we have multiple instances of the app, each with 4-5 workers (e.g., running on ECS or Kubernetes). I understand that prometheus_multiproc_dir works well when we have multiple workers within the same instance, but how do we aggregate metrics across different instances? How can we ensure the metrics are merged correctly across all of them? Thank you in advance!