Enhance Histogram feature implementation of Prometheus Server Integration #5042

gizas · 2023-01-18T14:20:51Z

Context

Histograms are one of the metric types supported by the Prometheus toolkit. In general, histograms provide pre-aggregated numerical values grouped into buckets.

In our Prometheus integration we currently support the histogram type through the Use types option. When this option is enabled, we retrieve Prometheus metrics categorised as histograms and index them in Elasticsearch. We have identified that supporting histograms in Elasticsearch requires specific pre-processing at index time in our integration package. Additionally, related efforts (1165, 26903) revealed possible enhancements that can be added to our code.

Diagnosis

Users have reported differences between the histograms scraped from Prometheus and the ones we store in Elasticsearch. This revealed the extra calculation we perform at ingestion time, and also the need to document and explain the procedure to our users.

Prometheus Buckets scraped:

Bucket	Value
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="+Inf"}	1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="0.1"}	1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="0.2"}	1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="0.4"}	1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="1"}	1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="120"}	1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="20"}	1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="3"}	1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="60"}	1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="8"}	1

Elasticsearch histograms we ingest (retrieved from Kibana Discover):

"prometheus": {
      "prometheus_http_request_duration_seconds": {
        "histogram": {
          "counts": [
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0
          ],
          "values": [
            0.05,
            0.15000000000000002,
            0.30000000000000004,
            0.7,
            2,
            5.5,
            14,
            40,
            90,
            180
          ]
        }
      },
      "labels": {
        "handler": "/api/v1/label/:name/values",
        "instance": "prometheus-server-server.kube-system:80",
        "job": "prometheus"
      }

Questions that we need to answer:

Why are the le bucket values different from the ones we see in Elastic?
What is the value in Elasticsearch of the +Inf bucket?
(Code Ref) In our example: 120 + (120 - 60) = 180, so it matches the value 180.
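To make the arithmetic behind these questions concrete, here is a minimal Python sketch that reproduces the Elasticsearch values from the scraped le bounds. It is inferred only from the sample data in this issue (the midpoint of each bucket, and the last finite bound plus the width of the previous bucket for +Inf). The function name centroids_from_upper_bounds is hypothetical and this is not the actual Metricbeat code. The bounds must be sorted numerically first, since the scraped listing above is in string order.

```python
import math


def centroids_from_upper_bounds(upper_bounds):
    """Map numerically sorted le bounds (the last may be +Inf) to centroid values.

    Assumption: the first bucket starts at 0, as the sample values suggest.
    """
    values = []
    lower = 0.0  # lower edge of the bucket currently being processed
    for i, upper in enumerate(upper_bounds):
        if math.isinf(upper):
            # +Inf has no midpoint: extend the last finite bound by the
            # width of the preceding bucket, e.g. 120 + (120 - 60) = 180.
            prev_lower = upper_bounds[i - 2] if i >= 2 else 0.0
            values.append(lower + (lower - prev_lower))
        else:
            # Finite bucket: midpoint between its lower and upper edge.
            values.append(lower + (upper - lower) / 2.0)
            lower = upper
    return values


bounds = [0.1, 0.2, 0.4, 1, 3, 8, 20, 60, 120, float("inf")]
print(centroids_from_upper_bounds(bounds))
```

With the bounds from this issue, the result matches the "values" array shown above up to floating-point rounding, which would also explain entries such as 0.15000000000000002.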

Additionally, for prometheus_http_request_duration_seconds, Prometheus offers prometheus_http_request_duration_seconds_count and prometheus_http_request_duration_seconds_sum:

prometheus_http_request_duration_seconds_count{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus"}: 1

prometheus_http_request_duration_seconds_sum{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus"}: 1

Count and Sum values are not returned by our code, so they are not present in Elasticsearch. Is there any valid scenario where they might be needed?
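As background for this question, Prometheus bucket counters are cumulative, so the total count is always recoverable from the le="+Inf" bucket, whereas the exact sum cannot be rebuilt from buckets alone. The sketch below only illustrates that relationship; the helper names per_bucket_counts and total_count are hypothetical and are not part of our code base.

```python
def per_bucket_counts(cumulative):
    """Turn cumulative bucket counts (sorted by le) into per-bucket counts."""
    counts = []
    previous = 0
    for value in cumulative:
        counts.append(value - previous)
        previous = value
    return counts


def total_count(cumulative):
    """The last (+Inf) cumulative bucket equals the histogram's _count."""
    return cumulative[-1] if cumulative else 0


# Cumulative values from the scraped buckets above, sorted by le.
cumulative = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(per_bucket_counts(cumulative))  # one observation, in the first bucket
print(total_count(cumulative))        # same number as the _count series
```

So storing count separately is mostly a convenience, while sum carries information that the buckets do not.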

Also, the prometheus_http_request_duration_seconds_histogram field is not available for searching or filtering in Kibana Discover.

Comparing to other fields:

Action

This story summarizes all the actions we have categorised that are needed in order to enhance the Prometheus Histogram support in our integration:

Code Enhancements:

Account for negative count values inside initial buckets
Use the preceding bucket's value for +Inf "le"
For the first bucket only: if it has a negative "le", use the value as-is; otherwise use half its value (midpoint to zero)
Investigate whether we need to provide sum and count values in addition to the ones we provide now
Can we retrieve and index histogram buckets exactly as retrieved from Prometheus? If not, we need to document this; if yes, we need to evaluate whether to support it as a new enhancement in the code. Are there any Elasticsearch limitations that prevent us from doing this?
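The two value rules in this list (the +Inf rule and the first-bucket rule) can be sketched as follows. This is only one reading of the bullet points, written in Python for illustration; the function name proposed_values is hypothetical, and the handling of negative counts is left out because the list does not say how they should be treated.

```python
import math


def proposed_values(upper_bounds):
    """Bucket values under the proposed rules, for numerically sorted le bounds."""
    values = []
    for i, upper in enumerate(upper_bounds):
        if math.isinf(upper):
            # +Inf "le": reuse the preceding bucket's upper bound.
            # Assumption: fall back to 0 if +Inf is the only bucket.
            values.append(upper_bounds[i - 1] if i > 0 else 0.0)
        elif i == 0:
            # First bucket: a negative bound is used as-is, otherwise
            # take the midpoint between zero and the bound.
            values.append(upper if upper < 0 else upper / 2.0)
        else:
            # Every other bucket: midpoint between its two bounds.
            lower = upper_bounds[i - 1]
            values.append(lower + (upper - lower) / 2.0)
    return values


print(proposed_values([0.1, 0.2, 0.4, 1, 3, 8, 20, 60, 120, float("inf")]))
print(proposed_values([-1, 0, 1, float("inf")]))
```

Compared with the values currently stored, only the last entry changes for the sample in this issue: the +Inf bucket would map to 120 instead of 180.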

Kibana Support:

We need to create a visualisation based on histograms and understand the different functions that are suggested for use with histograms, such as aggregations and buckets.

Check available Use Cases of histograms here

Documentation Enhancement:

Document and explain the current centroid calculation (https://github.com/elastic/beats/blob/main/x-pack/metricbeat/module/prometheus/collector/histogram.go#L34)
Document why we have chosen to implement only the T-Digest type of histogram (see comment here). We will probably need to sync with the Elasticsearch team to understand more about the logic behind the choice

Deliverables

Relevant code improvements in Prometheus code base
Documentation updates that will explain the end-to-end user journey and support of Histogram type

Relevant Links

https://www.elastic.co/guide/en/elasticsearch/reference/current/histogram.html --- We support only T-Digest type of histograms
Prometheus code base

Useful External links


ruflin · 2023-01-23T12:07:59Z

For the histograms in Elasticsearch that you showed above as sample outputs, are these stored with histogram field type? If yes, is this done by a dynamic template? In the case of APM Agents AFAIK there is a dynamic template name that gets assigned in the ingest pipeline: https://github.com/elastic/apm-server/blob/main/apmpackage/apm/data_stream/app_metrics/elasticsearch/ingest_pipeline/default.yml#L16-L35

Can you get the exact mapping of the index that is created for the fields you listed above? This will also help investigate why the fields do not show up in the query.

gizas · 2023-01-27T10:53:04Z

Indeed, they are stored as .histogram type when Use Types is enabled.

Also, to add some more details after further testing:

When Use_types is disabled (we also need to disable Rate Counters) then bucket fields are present in Elasticsearch although they come as Unmapped
When Use Types is enabled, we call the PromHistogramToES function, which recalculates the buckets and returns .histogram fields.
The mapping in this case is based on https://github.com/elastic/integrations/blob/main/packages/prometheus/data_stream/collector/fields/fields.yml#L36

All the histograms do not appear on the filter search box. So I can not create a filter with histogram type fields maybe?

ruflin · 2023-01-30T08:16:14Z

then bucket fields are present in Elasticsearch although they come as Unmapped

What does Unmapped mean in this context? Is it a keyword?

All the histograms do not appear on the filter search box. So I can not create a filter with histogram type fields maybe?

@gizas Can you share the query / filter you would want to run on this histogram? You are correct that you can't filter on a histogram value but only run aggregations.

gizas · 2023-01-31T13:59:11Z

Unmapped= Fields that are not explicitly matched to a field data type

Our mappings file contains no reference to the .bucket fields mentioned above, which appear when Use_types is disabled.

And as for the filter I was trying the simple exists filter, like: prometheus_http_request_duration_seconds.histogram: *

ruflin · 2023-02-01T08:20:43Z

Unmapped= Fields that are not explicitly matched to a field data type

Ok, I assume all these fields fall back to keyword as it is set as default mapping?

And as for the filter I was trying the simple exists filter, like: prometheus_http_request_duration_seconds.histogram: *

Is this just for testing or is it what you will want to use in some visualisations? Exists query only works when the field is indexed which is not the case for histogram and likely not the case for most metrics in the future. The assumption we are following is that most metrics are not used for filtering but aggregations. Does this apply here too?
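As an illustration of the aggregation-only usage described here, the following Python sketch builds a percentiles aggregation request body for the histogram field. The full field path is an assumption taken from the sample document earlier in this issue, and the helper name percentiles_request and the aggregation name request_duration are hypothetical.

```python
import json

# Assumed field path, based on the nesting in the sample document above.
FIELD = "prometheus.prometheus_http_request_duration_seconds.histogram"


def percentiles_request(field, percents):
    """Build a search body that aggregates on a histogram field without
    returning any documents (size 0)."""
    return {
        "size": 0,
        "aggs": {
            "request_duration": {
                "percentiles": {"field": field, "percents": list(percents)}
            }
        },
    }


print(json.dumps(percentiles_request(FIELD, [50, 95, 99]), indent=2))
```

The printed JSON could be sent as the body of a search request against the data stream, which is the kind of query a visualisation would run instead of an exists filter.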

gizas · 2023-02-01T09:51:08Z

_bucket fields are coming as Numbers, just double checked.

And yes, the filter was just for testing. For now it indeed does not seem to be a valid user scenario.

gizas · 2023-02-01T11:35:47Z

I was performing the tests with a managed agent and was switching Use_types and Rate_Counters on and off. That is why the fields come as Unmapped until we refresh the whole browser. It seems Kibana has some kind of cache or does some mapping in the background, and until you refresh you don't get the updates.

So, bottom line: _bucket fields come as Numbers, which are actually mapped as double based on here

And _histogram fields are mapped as histograms based on here.

For documentation on histograms not being indexed, see here

ruflin · 2023-02-02T07:21:05Z

_bucket fields are coming as Numbers, just double checked.

For these mappings, let's make sure we always refer to the Elasticsearch types / mappings and not Kibana data views, as the ES ones are the ones that count. Glad to see the follow-up details on the actual mapping in Elasticsearch.

@gizas Let me know if there is anything further you need on my end.

tetianakravchenko · 2023-09-22T09:17:43Z

Code Enhancements:

Account for negative count values inside initial buckets

Use the preceding bucket's value for +Inf "le"

for the first bucket only: if it has a negative "le", use the value as-is; otherwise use half its value (midpoint to zero)

The 3 points above are covered by elastic/beats#36647

Investigate whether we need to provide sum and count values in addition to the ones we provide now

Can we retrieve and index histogram buckets exactly as retrieved from Prometheus? If not, we need to document this; if yes, we need to evaluate whether to support it as a new enhancement in the code. Are there any Elasticsearch limitations that prevent us from doing this?

This needs investigation

gizas added enhancement New feature or request Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team [elastic/obs-cloudnative-monitoring] labels Jan 18, 2023

tetianakravchenko self-assigned this Feb 2, 2023

tetianakravchenko mentioned this issue Feb 2, 2023

Adjust Prometheus histogram buckets calculation elastic/beats#26903

Closed

tetianakravchenko mentioned this issue Sep 21, 2023

[Prometheus] Align on the algorithm used to transform Prometheus histograms into Elasticsearch histograms elastic/beats#36647

Merged

6 tasks

andrewkroh added the Integration:prometheus Prometheus label Aug 14, 2024

henrikno mentioned this issue Nov 20, 2024

Missing count and sum in prometheus histogram metrics elastic/beats#41573

Open
