-
hi! first of all, thanks for providing this tool to us. The metrics page looks good unless we use multiple workers (we use gunicorn to start 4 workers for an API). On every request the metrics page is served by a different (and mostly unpredictable) worker, which returns the metrics of this specific worker only. As an example, this leads to "jumping" counter metrics, making them mostly unusable in Prometheus/Grafana. We use a simple config for the requests metrics:
And then access the metrics endpoint.
Requests #1 and #4 seem to be served by the same worker. So even if there's no traffic on the page, the resulting graph jumps between the per-worker values. I was already thinking about including the worker ID and then grouping the results, but that's not very reliable at all. Have you seen this before, or are we missing something? greetings, Hans
Replies: 4 comments 1 reply
-
Hey Hans, I'll take a look at it this evening and provide a proper solution / answer. The issue should be one of the following:
-
Hi Tim, I presume the fault is on our side. We missed the hint on multiproc mentioned in the automatic doc of the expose function. It's mentioned in the raise condition ("If prometheus_multiproc_dir env var is found ..."). THANKS for the tip! Together with https://github.com/prometheus/client_python#multiprocess-mode-gunicorn we were able to reconfigure the API. It looks good now, but we will need some more investigation. best Hans
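For anyone landing here later: the multiprocess setup from the linked client_python README boils down to setting the multiproc directory before any metrics are created, so every worker writes its samples to files there, and then aggregating those files into one registry when serving `/metrics`. A minimal sketch (the metric name is illustrative, not from the thread):

```python
import os
import tempfile

# Must be set BEFORE prometheus_client creates any metric values.
# Newer client versions read PROMETHEUS_MULTIPROC_DIR; older ones
# read the lowercase prometheus_multiproc_dir, so set both here.
multiproc_dir = tempfile.mkdtemp()
os.environ["PROMETHEUS_MULTIPROC_DIR"] = multiproc_dir
os.environ["prometheus_multiproc_dir"] = multiproc_dir

from prometheus_client import CollectorRegistry, Counter, generate_latest, multiprocess

# In multiprocess mode this counter's value lives in an mmap'd file
# under the directory above, not in this process's memory.
REQUESTS = Counter("demo_requests_total", "Example request counter")
REQUESTS.inc(3)

# When serving /metrics, merge the files written by all workers
# into a fresh registry and render that instead of the default one.
registry = CollectorRegistry()
multiprocess.MultiProcessCollector(registry)
print(generate_latest(registry).decode())
```

In a real gunicorn deployment you would also call `multiprocess.mark_process_dead(worker.pid)` from a `child_exit` hook in `gunicorn.conf.py`, so that files left behind by dead workers get cleaned up, as described in the linked README.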
-
Wow, thanks a lot! I also got stuck on this today for quite some time, haha. Would it be an idea to add a section to the README about this? I think it could prevent quite a few people from making the same mistake :). Because as soon as you containerize FastAPI in the default way (https://hub.docker.com/r/tiangolo/uvicorn-gunicorn-fastapi), you'll run into this problem.
-
Hey, I have a quick question (sorry if this doesn't fit perfectly here): I'd like to know how to handle a situation where we have multiple instances of the app, each with 4-5 workers (e.g., running on ECS or Kubernetes). I understand that prometheus_multiproc_dir works well when we have multiple workers within the same instance, but how do we aggregate metrics across different instances? How can we ensure the metrics are merged correctly across all of them? Thank you in advance!