
[prometheus][remote_write] Failing to parse some histogram fields #7893

Closed
tetianakravchenko opened this issue Sep 20, 2023 · 6 comments

@tetianakravchenko
Contributor

tetianakravchenko commented Sep 20, 2023

Some documents are dropped due to:

"prometheus\":{\"apiserver_flowcontrol_priority_level_request_utilization\":{\"histogram\":{\"counts\":[5000945144],\"values\":[0.25]}},\"labels\":{\"instance\":\"10.128.0.10:443\",\"job\":\"kubernetes-apiservers\",\"phase\":\"waiting\",\"priority_level\":\"node-high\"}},\"service\":{\"type\":\"prometheus\"}}, Private:interface {}(nil), TimeSeries:true}, Flags:0x0, Cache:publisher.EventCache{m:mapstr.M(nil)}} (status=400): {\"type\":\"document_parsing_exception\",
    
    \"reason\":\"[1:2472] failed to parse field [prometheus.apiserver_flowcontrol_priority_level_request_utilization.histogram] of type [histogram]\",\"caused_by\":{\"type\":\"illegal_argument_exception\",
    
    "reason\":\"[1:2482] Numeric value (5000945144) out of range of int (-2147483648 - 2147483647)\\n at 
"reason":"[1:3039] failed to parse field [prometheus.go_gc_pauses_seconds_total.histogram] of type [histogram]","caused_by":{"type":"document_parsing_exception","reason":"[1:3039] error parsing field [prometheus.go_gc_pauses_seconds_total.histogram], [values] values must be in increasing order, got [-4.9E-324] but previous value was [0.0]"}}, dropping event!

This could be related to the fact that the data stream was actually dropped first to empty the index.
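
For context, the first rejection can be reproduced against a bare Elasticsearch histogram field, independent of the remote_write flow. This is only a sketch - the index name, field name and localhost:9200 endpoint are assumptions, and it applies to versions affected by this issue:

# hypothetical index with a single histogram-mapped field
curl -s -X PUT "localhost:9200/histo-test" -H 'Content-Type: application/json' -d '
{
  "mappings": { "properties": { "request_utilization": { "type": "histogram" } } }
}'

# a bucket count above 2147483647 (max signed 32-bit int) is rejected with the
# same kind of document_parsing_exception / illegal_argument_exception as above
curl -s -X POST "localhost:9200/histo-test/_doc" -H 'Content-Type: application/json' -d '
{
  "request_utilization": { "values": [0.25], "counts": [5000945144] }
}'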

@tetianakravchenko tetianakravchenko changed the title [prometheus][remote_write] [prometheus][remote_write] Failing to parse some histogram fields Sep 20, 2023
@tetianakravchenko tetianakravchenko self-assigned this Sep 20, 2023
@tetianakravchenko
Contributor Author

The second error ([values] values must be in increasing order, got [-4.9E-324] but previous value was [0.0]) is related to elastic/beats#36317 and is going to be fixed soon.

@pjbertels

pjbertels commented Sep 21, 2023

job_name: 'kubernetes-apiservers' and job_name: 'kubernetes-cadvisor' are the two scrape targets that generate the histograms in my setup.

@tetianakravchenko
Contributor Author

I was able to reproduce the issue on my setup as well for multiple apiserver_flowcontrol_* histograms; it is actually just 3 metrics:
apiserver_flowcontrol_priority_level_request_utilization,
apiserver_flowcontrol_demand_seats,
apiserver_flowcontrol_read_vs_write_current_requests
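
The raw cumulative bucket counts for these metrics can also be inspected straight from the apiserver metrics endpoint, to check whether they exceed the signed 32-bit int range - for example (assuming kubectl access to the cluster; the metric name in the grep is just one of the three):

kubectl get --raw /metrics | grep apiserver_flowcontrol_priority_level_request_utilization_bucket | head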

After some time, I see the histogram metric prometheus.apiserver_flowcontrol_priority_level_request_utilization.histogram, but it is empty - {"values":[],"counts":[]} - I am not sure if this is a correct value:
[Screenshot 2023-09-22 at 14 24 02]

@tetianakravchenko
Contributor Author

tetianakravchenko commented Sep 22, 2023

Opened an Elasticsearch issue: elastic/elasticsearch#99820.
One thing I can think of for now: add a check on the Beats side, so that the whole document with all the other metrics is not dropped.
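
To illustrate the kind of client-side check meant here (purely a sketch, not actual Beats code - the sample document and the use of jq are made up for this example), the two constraints these documents were rejected on can be validated before indexing:

# illustrative pre-flight validation of a histogram object with jq:
# counts must fit in a signed 32-bit int and values must be in increasing order
# (approximated here as "already sorted")
echo '{"values":[0.25],"counts":[5000945144]}' | jq '
  (.counts | map(. >= -2147483648 and . <= 2147483647) | all)
  and (.values == (.values | sort))'
# prints "false" for this sample, so the event could be sanitized or skipped
# instead of the whole document being rejected with a 400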

@tetianakravchenko
Contributor Author

tetianakravchenko commented Sep 26, 2023

Regarding the error: "reason":"[1:2805] failed to parse field [prometheus.go_gc_pauses_seconds_total.histogram] of type [histogram]","caused_by":{"type":"document_parsing_exception","reason":"[1:2805] error parsing field [prometheus.go_gc_pauses_seconds_total.histogram], [values] values must be in increasing order, got [-4.9E-324] but previous value was [0.0]"}}, dropping event!

All similar errors seem to be coming from the kubernetes-nodes job.

The actual metric looks like:

curl -s localhost:10249/metrics | grep go_gc_pauses_seconds_total
# HELP go_gc_pauses_seconds_total Distribution individual GC-related stop-the-world pause latencies.
# TYPE go_gc_pauses_seconds_total histogram
go_gc_pauses_seconds_total_bucket{le="-5e-324"} 0
go_gc_pauses_seconds_total_bucket{le="9.999999999999999e-10"} 0
go_gc_pauses_seconds_total_bucket{le="9.999999999999999e-09"} 0
go_gc_pauses_seconds_total_bucket{le="9.999999999999998e-08"} 0
go_gc_pauses_seconds_total_bucket{le="1.0239999999999999e-06"} 0
go_gc_pauses_seconds_total_bucket{le="1.0239999999999999e-05"} 24575
go_gc_pauses_seconds_total_bucket{le="0.00010239999999999998"} 25754
go_gc_pauses_seconds_total_bucket{le="0.0010485759999999998"} 51322
go_gc_pauses_seconds_total_bucket{le="0.010485759999999998"} 51579
go_gc_pauses_seconds_total_bucket{le="0.10485759999999998"} 51628
go_gc_pauses_seconds_total_bucket{le="+Inf"} 51628
go_gc_pauses_seconds_total_sum NaN
go_gc_pauses_seconds_total_count 51628

The first bucket boundary is actually a negative value - le="-5e-324".

The same behavior occurs for some other metrics, e.g. go_sched_latencies_seconds.

This will be fixed by elastic/beats#36647.
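
For reference, the increasing-order rejection can also be reproduced against a bare histogram field, without Metricbeat in the loop. This is only a sketch - the index name, field name and localhost:9200 endpoint are assumptions, and the values just mimic the 0.0 followed by -4.9e-324 pair from the error above:

# hypothetical index with a single histogram-mapped field
curl -s -X PUT "localhost:9200/histo-order-test" -H 'Content-Type: application/json' -d '
{
  "mappings": { "properties": { "gc_pauses": { "type": "histogram" } } }
}'

# values must be in increasing order, so a value below the previous 0.0 is rejected
curl -s -X POST "localhost:9200/histo-order-test/_doc" -H 'Content-Type: application/json' -d '
{
  "gc_pauses": { "values": [0.0, -4.9e-324], "counts": [0, 24575] }
}'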

@tetianakravchenko
Contributor Author

First error - Numeric value (5000945144) out of range of int (-2147483648 - 2147483647) - should be fixed by elastic/elasticsearch#99820.

Second error - [values] values must be in increasing order, got [-4.9E-324] but previous value was [0.0] - should be fixed by elastic/beats#36647.

Both PRs were merged and will be available in 8.11.0.
