prometheus-operator provided by user interferes with operator provided by kyma #14379

Closed
a-thaler opened this issue May 24, 2022 · 5 comments
Labels
area/monitoring Issues or PRs related to the monitoring module (deprecated) area/telemetry Issues or PRs related to the telemetry module kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

a-thaler commented May 24, 2022

Description

Kyma ships a prometheus-operator to manage the in-cluster Prometheus installation in the kyma-system namespace. The purpose of that instance is to collect and serve metrics provided by system components. Custom metrics can be added in an open-source-based installation as well; however, the stack might not be sufficient, in which case a custom installation is required alongside it. For that, users might want to install their own kube-prometheus stack.

Running your own stack side by side can cause problems:

  1. There can be only one set of CRDs and only one webhook should listen for resource validation.
  2. Multiple operators might manage the same prometheus instances or configure the same servicemonitors.

The second point in particular can cause problems. A default installation via Helm manages all Prometheus resources in the whole cluster. As a result, both operators try to manage the one Prometheus resource defined in the kyma-system namespace. Both operators then constantly reconcile the Prometheus StatefulSet, which results in high CPU usage of the operators and a nondeterministic reconciliation result.

Possible Solution
The Kyma base components should be independent of prometheus-operator by managing the components on their own. That means introducing a plain StatefulSet with a scrape config that is not served via ServiceMonitors but via something like individual annotations.
With that, the user has the freedom to introduce the operator for scenarios not possible with the Kyma setup (like Thanos support).
That solution fits nicely with the vision of separating metric collection from the actual shipping and storage, as described in #13079
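
To illustrate what an annotation-driven scrape config could look like, here is a minimal sketch of a plain Prometheus configuration using Kubernetes service discovery. The prometheus.io/* annotation names and the job name are assumptions based on a widely used convention; they are not defined anywhere in this issue, and the actual Kyma setup may differ.

```yaml
# prometheus.yml (sketch): discover pods through the Kubernetes API and
# scrape only those that opt in via annotations. The prometheus.io/*
# annotation names are an assumed convention, not part of this issue.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: 'true'
      # Let a pod override the metrics path via prometheus.io/path.
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        regex: '(.+)'
        target_label: __metrics_path__
      # Let a pod override the scrape port via prometheus.io/port.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: '([^:]+)(?::\d+)?;(\d+)'
        replacement: '$1:$2'
        target_label: __address__
      # Carry namespace and pod name over as regular labels.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

With a config like this, no ServiceMonitor CRDs are involved, so a user-installed operator would have nothing in kyma-system to reconcile.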

Workaround

As a workaround, a user who installs their own prometheus-operator should exclude the kyma-system namespace from the operator. This can be achieved in the values.yaml of the Helm chart by configuring the denyNamespaces attribute (https://github.com/prometheus-community/helm-charts/blob/2e86d1618eb5d599576789c1202853dc5bc808c0/charts/kube-prometheus-stack/values.yaml#L1528):

prometheusOperator:
  denyNamespaces:
  - kyma-system

or by setting the argument directly on the operator deployment (https://github.com/prometheus-community/helm-charts/blob/2e86d1618eb5d599576789c1202853dc5bc808c0/charts/kube-prometheus-stack/templates/prometheus-operator/deployment.yaml#L53):

--deny-namespaces=kyma-system
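
For context, this is roughly where the flag would sit when the operator is deployed without Helm. It is a sketch only: the Deployment name, the monitoring namespace, the labels, and the image tag placeholder are assumptions for illustration, and a real manifest needs more (service account, RBAC, and so on).

```yaml
# Fragment of a prometheus-operator Deployment (sketch, not a full manifest).
# Name, namespace, labels and the <version> tag are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus-operator
  template:
    metadata:
      labels:
        app.kubernetes.io/name: prometheus-operator
    spec:
      containers:
        - name: prometheus-operator
          image: quay.io/prometheus-operator/prometheus-operator:<version>
          args:
            # Tell this operator to ignore everything in kyma-system.
            - --deny-namespaces=kyma-system
```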
@a-thaler a-thaler added kind/bug Categorizes issue or PR as related to a bug. area/monitoring Issues or PRs related to the monitoring module (deprecated) labels May 24, 2022
@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 13, 2022
@a-thaler a-thaler removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 15, 2022
@a-thaler a-thaler added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Sep 5, 2022

a-thaler commented Sep 9, 2022

As a workaround, I started documenting how you can bring your own stack at https://github.com/kyma-project/examples/tree/main/prometheus

a-thaler commented Feb 7, 2023

The monitoring module was deprecated with 2.10, and its removal will solve the problem implicitly.
As part of #11300, we are already removing the usage of the prometheus-operator CRDs in kyma-system itself.

If the removal of the component needs to be delayed, we could already remove the operator and switch to a plain Prometheus installation while still keeping the feature.

@a-thaler a-thaler added the area/telemetry Issues or PRs related to the telemetry module label Mar 8, 2023
@a-thaler

In the meantime, we fixed the upstream Helm chart to also respect the namespace exclusion in the webhooks: kyma-project/examples#241
I adjusted the example by removing the warning notes and re-enabling the webhook configuration: kyma-project/examples#241

@a-thaler

With Kyma 2.20, the monitoring stack was removed, so any installation of a prometheus-operator is now possible.
The Kyma specifics of the installation example will be removed with kyma-project/examples#277
