prometheus-operator provided by user interferes with operator provided by kyma #14379
Labels
- area/monitoring — Issues or PRs related to the monitoring module (deprecated)
- area/telemetry — Issues or PRs related to the telemetry module
- kind/bug — Categorizes issue or PR as related to a bug.
- lifecycle/frozen — Indicates that an issue or PR should not be auto-closed due to staleness.
Description
Kyma ships a prometheus-operator to manage the in-cluster Prometheus installation in the kyma-system namespace. The purpose of that instance is to collect and serve metrics provided by the system components. Custom metrics can be added to this open-source-based installation as well; however, the stack might not be sufficient for all use cases and a custom installation alongside it may be required. For that, users might want to install their own kube-prometheus stack.
Running an own stack side-by-side brings potential problems. In particular, the overlapping ownership of Prometheus resources can have bad effects: a default installation via Helm will manage all Prometheus resources in the whole cluster. With that, both operators try to manage the one Prometheus resource defined in the kyma-system namespace. The effect is a constant reconciliation of the Prometheus StatefulSet by both operators, resulting in high CPU usage of the operators and a non-deterministic reconciliation outcome.
Possible Solution
The Kyma base components should be independent of any prometheus-operator by managing the components on their own: introduce a plain StatefulSet with a scrape config that is driven not by ServiceMonitors but by something like individual annotations.
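As a rough sketch, an annotation-driven scrape config could look like the fragment below. The annotation names (prometheus.io/scrape, prometheus.io/path) follow a common community convention and are an assumption here, not an existing Kyma configuration:

```yaml
# prometheus.yaml (sketch) — scrape only pods that opt in via annotation,
# instead of relying on ServiceMonitor resources.
scrape_configs:
  - job_name: kyma-system-pods
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [kyma-system]
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Optionally override the metrics path via prometheus.io/path
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
```

Because this config lives entirely inside the Kyma-managed Prometheus, no operator needs to own the ServiceMonitor CRDs, which is what removes the conflict.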
With that, the user has the freedom to introduce the operator for scenarios that are not possible with the Kyma setup (like Thanos support).
That solution fits nicely with the vision of separating the metric collection from the actual shipping and storage, as described in #13079.
Workaround
As a workaround, a user who installs an own prometheus-operator should exclude the kyma-system namespace from that operator. This can be achieved in the values.yaml of the Helm chart by configuring the denyNamespaces attribute (https://github.com/prometheus-community/helm-charts/blob/2e86d1618eb5d599576789c1202853dc5bc808c0/charts/kube-prometheus-stack/values.yaml#L1528)
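For example, a values.yaml fragment for the kube-prometheus-stack chart (using the denyNamespaces key from the chart linked above) could look like:

```yaml
# values.yaml for the user-provided kube-prometheus-stack
prometheusOperator:
  # Prevent this operator from reconciling resources in kyma-system,
  # leaving them to the Kyma-shipped operator.
  denyNamespaces:
    - kyma-system
```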
or by setting the corresponding args directly in the plain deployment (https://github.com/prometheus-community/helm-charts/blob/2e86d1618eb5d599576789c1202853dc5bc808c0/charts/kube-prometheus-stack/templates/prometheus-operator/deployment.yaml#L53)
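Without Helm, the same effect can be achieved by passing the prometheus-operator's --deny-namespaces flag in the Deployment manifest. A minimal excerpt (container name and image are illustrative):

```yaml
# Excerpt of the user-provided prometheus-operator Deployment
spec:
  template:
    spec:
      containers:
        - name: prometheus-operator
          image: quay.io/prometheus-operator/prometheus-operator
          args:
            # Ignore all Prometheus CRDs in kyma-system
            - --deny-namespaces=kyma-system
```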