prometheus-operator provided by user interferes with operator provided by kyma #14379
Labels
- area/monitoring — Issues or PRs related to the monitoring module (deprecated)
- area/telemetry — Issues or PRs related to the telemetry module
- kind/bug — Categorizes issue or PR as related to a bug.
- lifecycle/frozen — Indicates that an issue or PR should not be auto-closed due to staleness.
Description
Kyma ships a prometheus-operator to manage the in-cluster Prometheus installation in the kyma-system namespace. The purpose of that instance is to collect and serve metrics provided by the system components. Custom metrics can be added to this open-source-based installation as well; however, the stack might not be sufficient for all use cases and a custom installation alongside it may be required. For that, users might want to install their own kube-prometheus stack.
Running an own stack side-by-side brings potential problems. In particular, the overlapping ownership of Prometheus resources can have bad effects: a default installation via Helm will manage all Prometheus resources in the whole cluster. With that, both operators try to manage the one Prometheus resource defined in the kyma-system namespace. The effect is a constant reconciliation of the Prometheus StatefulSet by both operators, resulting in high CPU usage of the operators and a non-deterministic reconciliation outcome.
Possible Solution
The Kyma base components should be independent of any prometheus-operator by managing the components on their own: introduce a plain StatefulSet with a scrape config that is driven not by ServiceMonitors but by something like individual annotations.
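As a rough sketch, an annotation-driven scrape config could look like the fragment below. The annotation names (prometheus.io/scrape, prometheus.io/path) follow a common community convention and are an assumption here, not an existing Kyma configuration:

```yaml
# prometheus.yaml (sketch) — scrape only pods that opt in via annotation,
# instead of relying on ServiceMonitor resources.
scrape_configs:
  - job_name: kyma-system-pods
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [kyma-system]
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Optionally override the metrics path via prometheus.io/path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```

Because this config lives entirely inside the Kyma-managed Prometheus, no operator needs to own the ServiceMonitor CRDs, which is what removes the conflict.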
With that, the user has the freedom to introduce the operator for scenarios that are not possible with the Kyma setup (like Thanos support).
That solution fits nicely with the vision of separating the metric collection from the actual shipping and storage, as described in #13079.
Workaround
As a workaround, a user who installs an own prometheus-operator should exclude the kyma-system namespace from that operator. This can be achieved in the values.yaml of the Helm chart by configuring the denyNamespaces attribute (https://github.com/prometheus-community/helm-charts/blob/2e86d1618eb5d599576789c1202853dc5bc808c0/charts/kube-prometheus-stack/values.yaml#L1528)
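For example, a values.yaml fragment for the kube-prometheus-stack chart (using the denyNamespaces key from the chart linked above) could look like:

```yaml
# values.yaml for the user-provided kube-prometheus-stack
prometheusOperator:
  # Prevent this operator from reconciling resources in kyma-system,
  # leaving them to the Kyma-shipped operator.
  denyNamespaces:
    - kyma-system
```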
or by setting the corresponding args directly in the plain deployment (https://github.com/prometheus-community/helm-charts/blob/2e86d1618eb5d599576789c1202853dc5bc808c0/charts/kube-prometheus-stack/templates/prometheus-operator/deployment.yaml#L53)
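Without Helm, the same effect can be achieved by passing the prometheus-operator's --deny-namespaces flag in the Deployment manifest. A minimal excerpt (container name and image are illustrative):

```yaml
# Excerpt of the user-provided prometheus-operator Deployment
spec:
  template:
    spec:
      containers:
        - name: prometheus-operator
          image: quay.io/prometheus-operator/prometheus-operator
          args:
            # Ignore all Prometheus CRDs in kyma-system
            - --deny-namespaces=kyma-system
```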