Metrics disappear when setting PROMETHEUS_MULTIPROC_DIR #282

Open
lucasalvarezlacasa opened this issue Jan 16, 2024 · 4 comments

@lucasalvarezlacasa

lucasalvarezlacasa commented Jan 16, 2024

I'm serving my FastAPI application with more than one worker. For this, I had to set PROMETHEUS_MULTIPROC_DIR and point it at a directory that all workers can read metrics from and write metrics to.

However, I noticed that the resulting /metrics endpoint exposes far fewer metrics than when I don't set it. For instance, the garbage collector metrics, the process information (CPU time, memory, open file descriptors), the Python version, etc. are no longer exposed. All I see now are the metrics related to HTTP requests and responses, plus my custom nlu_call_times_total counter.

Any ideas why? Am I doing something wrong?

These are the metrics I get when running with more than one worker:

# HELP http_request_size_bytes Content length of incoming requests by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_request_size_bytes summary
http_request_size_bytes_count{handler="/metrics"} 59.0
http_request_size_bytes_sum{handler="/metrics"} 404.0
http_request_size_bytes_count{handler="/status"} 7.0
http_request_size_bytes_sum{handler="/status"} 707.0
# HELP http_response_size_bytes Content length of outgoing responses by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_response_size_bytes summary
http_response_size_bytes_count{handler="/metrics"} 59.0
http_response_size_bytes_sum{handler="/metrics"} 223520.0
http_response_size_bytes_count{handler="/status"} 7.0
http_response_size_bytes_sum{handler="/status"} 14.0
# HELP http_request_duration_highr_seconds Latency with many buckets but no API specific labels. Made for more accurate percentile calculations. 
# TYPE http_request_duration_highr_seconds histogram
http_request_duration_highr_seconds_sum 0.35216000000000003
http_request_duration_highr_seconds_bucket{le="0.01"} 54.0
http_request_duration_highr_seconds_bucket{le="0.025"} 66.0
http_request_duration_highr_seconds_bucket{le="0.05"} 66.0
http_request_duration_highr_seconds_bucket{le="0.075"} 66.0
http_request_duration_highr_seconds_bucket{le="0.1"} 66.0
http_request_duration_highr_seconds_bucket{le="0.25"} 66.0
http_request_duration_highr_seconds_bucket{le="0.5"} 66.0
http_request_duration_highr_seconds_bucket{le="0.75"} 66.0
http_request_duration_highr_seconds_bucket{le="1.0"} 66.0
http_request_duration_highr_seconds_bucket{le="1.5"} 66.0
http_request_duration_highr_seconds_bucket{le="2.0"} 66.0
http_request_duration_highr_seconds_bucket{le="2.5"} 66.0
http_request_duration_highr_seconds_bucket{le="3.0"} 66.0
http_request_duration_highr_seconds_bucket{le="3.5"} 66.0
http_request_duration_highr_seconds_bucket{le="4.0"} 66.0
http_request_duration_highr_seconds_bucket{le="4.5"} 66.0
http_request_duration_highr_seconds_bucket{le="5.0"} 66.0
http_request_duration_highr_seconds_bucket{le="7.5"} 66.0
http_request_duration_highr_seconds_bucket{le="10.0"} 66.0
http_request_duration_highr_seconds_bucket{le="30.0"} 66.0
http_request_duration_highr_seconds_bucket{le="60.0"} 66.0
http_request_duration_highr_seconds_bucket{le="+Inf"} 66.0
http_request_duration_highr_seconds_count 66.0
# HELP http_request_duration_seconds Latency with only few buckets by handler. Made to be only used if aggregation by handler is important. 
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_sum{handler="/metrics",method="GET"} 0.34132999999999997
http_request_duration_seconds_sum{handler="/status",method="GET"} 0.01083
http_request_duration_seconds_bucket{handler="/metrics",le="0.1",method="GET"} 59.0
http_request_duration_seconds_bucket{handler="/metrics",le="0.5",method="GET"} 59.0
http_request_duration_seconds_bucket{handler="/metrics",le="1.0",method="GET"} 59.0
http_request_duration_seconds_bucket{handler="/metrics",le="+Inf",method="GET"} 59.0
http_request_duration_seconds_count{handler="/metrics",method="GET"} 59.0
http_request_duration_seconds_bucket{handler="/status",le="0.1",method="GET"} 7.0
http_request_duration_seconds_bucket{handler="/status",le="0.5",method="GET"} 7.0
http_request_duration_seconds_bucket{handler="/status",le="1.0",method="GET"} 7.0
http_request_duration_seconds_bucket{handler="/status",le="+Inf",method="GET"} 7.0
http_request_duration_seconds_count{handler="/status",method="GET"} 7.0
# HELP nlu_call_times_total Number of times the NLU has been called
# TYPE nlu_call_times_total counter
nlu_call_times_total 0.0
# HELP http_requests_total Total number of requests by method, status and handler.
# TYPE http_requests_total counter
http_requests_total{handler="/metrics",method="GET",status="2xx"} 59.0
http_requests_total{handler="/status",method="GET",status="2xx"} 7.0

These are the metrics I get when running with a single worker (and therefore without PROMETHEUS_MULTIPROC_DIR):

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 29056.0
python_gc_objects_collected_total{generation="1"} 20925.0
python_gc_objects_collected_total{generation="2"} 3496.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 460.0
python_gc_collections_total{generation="1"} 41.0
python_gc_collections_total{generation="2"} 3.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="11",patchlevel="0",version="3.11.0"} 1.0
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 2.09629184e+08
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.06852352e+08
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.70540120624e+09
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 1.4400000000000002
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 17.0
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP nlu_call_times_total Number of times the NLU has been called
# TYPE nlu_call_times_total counter
nlu_call_times_total 0.0
# HELP nlu_call_times_created Number of times the NLU has been called
# TYPE nlu_call_times_created gauge
nlu_call_times_created 1.7054012078114848e+09
# HELP http_requests_total Total number of requests by method, status and handler.
# TYPE http_requests_total counter
http_requests_total{handler="/metrics",method="GET",status="2xx"} 4.0
http_requests_total{handler="/status",method="GET",status="2xx"} 3.0
# HELP http_requests_created Total number of requests by method, status and handler.
# TYPE http_requests_created gauge
http_requests_created{handler="/metrics",method="GET",status="2xx"} 1.7054012132842155e+09
http_requests_created{handler="/status",method="GET",status="2xx"} 1.7054012154598083e+09
# HELP http_request_size_bytes Content length of incoming requests by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_request_size_bytes summary
http_request_size_bytes_count{handler="/metrics"} 4.0
http_request_size_bytes_sum{handler="/metrics"} 303.0
http_request_size_bytes_count{handler="/status"} 3.0
http_request_size_bytes_sum{handler="/status"} 303.0
# HELP http_request_size_bytes_created Content length of incoming requests by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_request_size_bytes_created gauge
http_request_size_bytes_created{handler="/metrics"} 1.7054012132842422e+09
http_request_size_bytes_created{handler="/status"} 1.7054012154598227e+09
# HELP http_response_size_bytes Content length of outgoing responses by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_response_size_bytes summary
http_response_size_bytes_count{handler="/metrics"} 4.0
http_response_size_bytes_sum{handler="/metrics"} 26713.0
http_response_size_bytes_count{handler="/status"} 3.0
http_response_size_bytes_sum{handler="/status"} 6.0
# HELP http_response_size_bytes_created Content length of outgoing responses by handler. Only value of header is respected. Otherwise ignored. No percentile calculated. 
# TYPE http_response_size_bytes_created gauge
http_response_size_bytes_created{handler="/metrics"} 1.70540121328427e+09
http_response_size_bytes_created{handler="/status"} 1.7054012154598377e+09
# HELP http_request_duration_highr_seconds Latency with many buckets but no API specific labels. Made for more accurate percentile calculations. 
# TYPE http_request_duration_highr_seconds histogram
http_request_duration_highr_seconds_bucket{le="0.01"} 7.0
http_request_duration_highr_seconds_bucket{le="0.025"} 7.0
http_request_duration_highr_seconds_bucket{le="0.05"} 7.0
http_request_duration_highr_seconds_bucket{le="0.075"} 7.0
http_request_duration_highr_seconds_bucket{le="0.1"} 7.0
http_request_duration_highr_seconds_bucket{le="0.25"} 7.0
http_request_duration_highr_seconds_bucket{le="0.5"} 7.0
http_request_duration_highr_seconds_bucket{le="0.75"} 7.0
http_request_duration_highr_seconds_bucket{le="1.0"} 7.0
http_request_duration_highr_seconds_bucket{le="1.5"} 7.0
http_request_duration_highr_seconds_bucket{le="2.0"} 7.0
http_request_duration_highr_seconds_bucket{le="2.5"} 7.0
http_request_duration_highr_seconds_bucket{le="3.0"} 7.0
http_request_duration_highr_seconds_bucket{le="3.5"} 7.0
http_request_duration_highr_seconds_bucket{le="4.0"} 7.0
http_request_duration_highr_seconds_bucket{le="4.5"} 7.0
http_request_duration_highr_seconds_bucket{le="5.0"} 7.0
http_request_duration_highr_seconds_bucket{le="7.5"} 7.0
http_request_duration_highr_seconds_bucket{le="10.0"} 7.0
http_request_duration_highr_seconds_bucket{le="30.0"} 7.0
http_request_duration_highr_seconds_bucket{le="60.0"} 7.0
http_request_duration_highr_seconds_bucket{le="+Inf"} 7.0
http_request_duration_highr_seconds_count 7.0
http_request_duration_highr_seconds_sum 0.023
# HELP http_request_duration_highr_seconds_created Latency with many buckets but no API specific labels. Made for more accurate percentile calculations. 
# TYPE http_request_duration_highr_seconds_created gauge
http_request_duration_highr_seconds_created 1.7054012078115673e+09
# HELP http_request_duration_seconds Latency with only few buckets by handler. Made to be only used if aggregation by handler is important. 
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{handler="/metrics",le="0.1",method="GET"} 4.0
http_request_duration_seconds_bucket{handler="/metrics",le="0.5",method="GET"} 4.0
http_request_duration_seconds_bucket{handler="/metrics",le="1.0",method="GET"} 4.0
http_request_duration_seconds_bucket{handler="/metrics",le="+Inf",method="GET"} 4.0
http_request_duration_seconds_count{handler="/metrics",method="GET"} 4.0
http_request_duration_seconds_sum{handler="/metrics",method="GET"} 0.01729
http_request_duration_seconds_bucket{handler="/status",le="0.1",method="GET"} 3.0
http_request_duration_seconds_bucket{handler="/status",le="0.5",method="GET"} 3.0
http_request_duration_seconds_bucket{handler="/status",le="1.0",method="GET"} 3.0
http_request_duration_seconds_bucket{handler="/status",le="+Inf",method="GET"} 3.0
http_request_duration_seconds_count{handler="/status",method="GET"} 3.0
http_request_duration_seconds_sum{handler="/status",method="GET"} 0.00571
# HELP http_request_duration_seconds_created Latency with only few buckets by handler. Made to be only used if aggregation by handler is important. 
# TYPE http_request_duration_seconds_created gauge
http_request_duration_seconds_created{handler="/metrics",method="GET"} 1.7054012132843099e+09
http_request_duration_seconds_created{handler="/status",method="GET"} 1.7054012154598625e+09
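To pin down exactly which families vanish, the two dumps can be diffed by their `# TYPE` lines. This is a stdlib-only sketch written for that comparison: `metric_families` is a hypothetical helper, not part of any library, and the two strings are trimmed excerpts of the outputs above. It compares metric families, not individual samples.

```python
def metric_families(exposition: str) -> set[str]:
    """Return the family names declared by '# TYPE <name> <type>' lines."""
    families = set()
    for line in exposition.splitlines():
        parts = line.split()
        # A TYPE comment looks like: "# TYPE process_open_fds gauge"
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            families.add(parts[2])
    return families


# Trimmed excerpts of the single-worker and multi-worker /metrics outputs.
SINGLE_WORKER = """\
# TYPE python_gc_objects_collected_total counter
# TYPE python_info gauge
# TYPE process_cpu_seconds_total counter
# TYPE nlu_call_times_total counter
# TYPE nlu_call_times_created gauge
# TYPE http_requests_total counter
"""

MULTI_WORKER = """\
# TYPE nlu_call_times_total counter
# TYPE http_requests_total counter
"""

# Families present with one worker but missing in multiprocess mode.
missing = sorted(metric_families(SINGLE_WORKER) - metric_families(MULTI_WORKER))
for name in missing:
    print(name)
```

On these excerpts it lists the `python_*` and `process_*` families and the `nlu_call_times_created` gauge; pasting in the full dumps gives the complete list.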

This is the code I'm using to register the instrumentator:

from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator, metrics


def register_instrumentator(app: FastAPI) -> None:
    """Registers the instrumentator into the application"""
    # Settings and get_settings come from the application's own config module.
    settings: Settings = get_settings()
    instrumentator: Instrumentator = Instrumentator(
        should_round_latency_decimals=settings.METRICS_SHOULD_ROUND_LATENCY_DECIMALS,
        round_latency_decimals=settings.METRICS_LATENCY_DECIMALS,
        excluded_handlers=settings.METRICS_EXCLUDE_HANDLERS,
        should_respect_env_var=True,
    )
    instrumentator.add(metrics.default())  # this is needed to have all default + custom metrics
    instrumentator.instrument(app=app).expose(app=app, endpoint=settings.METRICS_ENDPOINT)

@angel18megha

Any updates? I am facing the same issue.

@Zwujun

Zwujun commented Aug 27, 2024

How did you set the environment variable? If you set it in code with os.environ[key] = val, that line needs to run before prometheus_fastapi_instrumentator is imported.
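The order matters because prometheus_client (which prometheus_fastapi_instrumentator imports) chooses how to store metric values, in memory or in per-process files under PROMETHEUS_MULTIPROC_DIR, once, while its module is first imported. The toy below reproduces that behaviour with a hypothetical stand-in module, `fake_client`, invented for illustration; it is not the real library, it only mimics the import-time check.

```python
import os
import tempfile
import types

# Hypothetical stand-in for a library that, like prometheus_client, checks the
# environment once, while its module body runs at import time.
FAKE_CLIENT_SOURCE = """
import os
MULTIPROCESS = "PROMETHEUS_MULTIPROC_DIR" in os.environ
"""


def import_fake_client() -> types.ModuleType:
    """Simulate a fresh import by executing the module body in a new module."""
    module = types.ModuleType("fake_client")
    exec(FAKE_CLIENT_SOURCE, module.__dict__)
    return module


os.environ.pop("PROMETHEUS_MULTIPROC_DIR", None)

# Wrong order: import first, set the variable afterwards.
early = import_fake_client()
os.environ["PROMETHEUS_MULTIPROC_DIR"] = tempfile.mkdtemp()
print("imported before setting:", early.MULTIPROCESS)

# Right order: the variable is already set when the module body runs.
late = import_fake_client()
print("imported after setting:", late.MULTIPROCESS)
```

In a real app this means the os.environ assignment has to come before any import that pulls in prometheus_client; exporting the variable in the shell or process manager that starts the workers sidesteps the ordering question.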

@therrj

therrj commented Oct 22, 2024

I'm running into a similar issue. In my case, PROMETHEUS_MULTIPROC_DIR is set in the environment that FastAPI is started from.

My instrumentator setup is pretty simple:

from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

app = FastAPI()
instrumentator = Instrumentator(excluded_handlers=["/metrics"]).instrument(app)

and then, at API startup, I run:

instrumentator.expose(app)

The process metrics were working before I set PROMETHEUS_MULTIPROC_DIR, but I need to set it because I have multiple workers running, which was messing up the metrics scraping.
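Why multiple workers break scraping without a shared directory: each worker keeps its own in-memory counters, and each scrape of /metrics is answered by whichever worker accepts the connection. The stdlib-only simulation below assumes a strict round-robin between two workers and uses made-up counter values (neither comes from this issue); it shows how counters that only ever increase per worker appear to drop, which Prometheus interprets as counter resets.

```python
def scraped_series(workers: list[list[int]]) -> list[int]:
    """Scrape t is answered by worker t % len(workers), reporting its own count."""
    return [workers[t % len(workers)][t] for t in range(len(workers[0]))]


def count_resets(series: list[int]) -> int:
    """Prometheus treats any decrease in a counter as a reset."""
    return sum(1 for prev, cur in zip(series, series[1:]) if cur < prev)


# Hypothetical per-worker request counters at six successive scrape times.
# Each worker's own counter never decreases.
worker_a = [10, 11, 12, 13, 14, 15]
worker_b = [3, 3, 4, 4, 5, 6]

series = scraped_series([worker_a, worker_b])
print(series)
print("apparent resets:", count_resets(series))
```

With PROMETHEUS_MULTIPROC_DIR set, every worker writes its values to files in the shared directory and the worker answering the scrape aggregates all of them, so the scraped series stays monotonic.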

@vincentnonim

Hello,

I'm facing the same issue. I found this README.md explaining that it's a limitation of the Prometheus client library (documented here):

No metrics for things like CPU and memory. They come from components like the ProcessCollector and PlatformCollector which are not supported by the Prometheus client library in multi process mode.
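For reference, this is roughly the collection path the prometheus_client documentation describes for multiprocess mode: a fresh CollectorRegistry per scrape with only a MultiProcessCollector attached, which reads the per-process files under PROMETHEUS_MULTIPROC_DIR. The process, platform and GC collectors store nothing in those files, so nothing from them reaches this registry. The sketch assumes prometheus_client is installed and uses a throwaway temporary directory; the counter name `demo_requests` is made up for the example.

```python
import os
import tempfile

# Multiprocess mode must be enabled before prometheus_client is imported.
os.environ["PROMETHEUS_MULTIPROC_DIR"] = tempfile.mkdtemp()

from prometheus_client import CollectorRegistry, Counter, generate_latest, multiprocess

# Hypothetical application metric; in multiprocess mode its value is written
# to a file in PROMETHEUS_MULTIPROC_DIR rather than kept only in memory.
DEMO_REQUESTS = Counter("demo_requests", "Demo requests handled")
DEMO_REQUESTS.inc()


def render_multiprocess_metrics() -> str:
    """Build a fresh registry that aggregates every process's metric files."""
    registry = CollectorRegistry()
    multiprocess.MultiProcessCollector(registry)
    return generate_latest(registry).decode("utf-8")


output = render_multiprocess_metrics()
print(output)
```

The output contains the file-backed `demo_requests_total` counter but no `process_*` or `python_*` families, which matches the multi-worker dump at the top of this issue.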
