Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MCO: add service monitor to alertmanager #1319

Merged

Conversation

thibaultmg
Copy link
Contributor

@thibaultmg thibaultmg commented Feb 9, 2024

Adds platform serviceMonitor for alertManager following platform monitoring recommendations:

  • Adds serviceMonitor configured with tls
  • Adds kube-rbac-proxy sidecar for protecting the metrics endpoint, exposes it on port 9096
  • Removes exposed 9093 port from alertmanager service
  • Adds new service for port 9096 because existing services have incompatible settings for the metrics endpoint needs (sessionAffinity)
  • Adds rbac for fetching configmap in kube-system namespace
  • Adds watcher for the configmap in kube-system

As a side note, kube-rbac-proxy automatically reloads TLS and client CA files. So no need to restart the service when they change.

I have tested the manifests and the new serviceMonitor is well being picked up by platform monitoring, and it is able to fetch those new metrics.

Jira task: https://issues.redhat.com/browse/RHOBS-1002

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Copy link

openshift-ci bot commented Feb 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
@thibaultmg thibaultmg marked this pull request as ready for review February 13, 2024 11:12
@thibaultmg thibaultmg enabled auto-merge (squash) February 13, 2024 11:13
@philipgough
Copy link
Contributor

/test test-unit

namespace: open-cluster-management
namespace: open-cluster-management-observability
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change necessary? Is it breaking? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All those namespaces are reset by the controller. So personally, I would remove all of them from the manifests. But I am not sure it has not impact 🤷
Resetting this one as it wasn't changed on purpose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all unnecessary namespaces. Unit tests are checking them now

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
@douglascamata
Copy link
Contributor

@thibaultmg something's badly wrong with Prow, it seems.

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
…icemonitor

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
…icemonitor

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Copy link
Contributor

@douglascamata douglascamata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :shipit:

@@ -2,6 +2,5 @@ apiVersion: v1
kind: ServiceAccount
metadata:
name: observability-alertmanager-accessor
namespace: open-cluster-management
Copy link
Contributor

@douglascamata douglascamata Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this expected? In which namespace will this land on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Service account resources are handled by "ServiceAccount": r.renderer.RenderNamespace,
So it will land as before in the namespace := mcoconfig.GetDefaultNamespace() which is open-cluster-management-observability
I removed all of them so that there is no ambiguity. And they are all checked in unit tests.

@openshift-ci openshift-ci bot added the lgtm label Feb 28, 2024
@douglascamata
Copy link
Contributor

/hold

Copy link

openshift-ci bot commented Feb 28, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: douglascamata, thibaultmg

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@douglascamata
Copy link
Contributor

/hold cancel

@openshift-ci openshift-ci bot removed the lgtm label Mar 1, 2024
Copy link

openshift-ci bot commented Mar 1, 2024

New changes are detected. LGTM label has been removed.

Copy link

openshift-ci bot commented Mar 1, 2024

Thanks for your pull request. Before we can look at it, you'll need to add a 'DCO signoff' to your commits.

📝 Please follow instructions in the contributing guide to update your commits with the DCO

Full details of the Developer Certificate of Origin can be found at developercertificate.org.

The list of commits missing DCO signoff:

  • 8c8398f Merge branch 'main' into add_alertmanager_servicemonitor
  • de98b30 Merge remote-tracking branch 'origin/main' into add_alertmanager_servicemonitor
  • cb87dea Merge remote-tracking branch 'origin/main' into add_alertmanager_servicemonitor
  • c965b32 Merge remote-tracking branch 'origin/main' into add_alertmanager_servicemonitor

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Copy link

sonarqubecloud bot commented Mar 1, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
56.7% Coverage on New Code (required ≥ 70%)
C Security Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint

Copy link

openshift-ci bot commented Mar 1, 2024

@thibaultmg: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-kind 4b3d030 link true /test e2e-kind
ci/prow/test-e2e 4b3d030 link true /test test-e2e
ci/prow/sonarcloud c965b32 link true /test sonarcloud

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@thibaultmg thibaultmg disabled auto-merge March 1, 2024 09:27
@thibaultmg thibaultmg merged commit b265920 into stolostron:main Mar 1, 2024
8 of 14 checks passed
@thibaultmg thibaultmg deleted the add_alertmanager_servicemonitor branch March 1, 2024 09:27
coleenquadros pushed a commit that referenced this pull request Mar 1, 2024
* init

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add cm watcher

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* cleaning

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* change rbac-proxy image

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix namespaces names

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix unit tests

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix serviceMonitor namespace

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add e2e test debug info

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add mch-image-manifest debug info

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve image replacement logging

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix call to image replacer

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add service monitor crd to kind e2e

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add debug logging oba e2e tests

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve logs, reduce volume

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve ci logs

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve ci logs

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add oba logs

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve oba logging

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add unit tests

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add apiextensionsv1 scheme

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add resource version to crd update

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Signed-off-by: Coleen Iona Quadros <coleen.quadros27@gmail.com>
sradco pushed a commit to sradco/multicluster-observability-operator that referenced this pull request Mar 17, 2024
* init

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add cm watcher

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* cleaning

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* change rbac-proxy image

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix namespaces names

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix unit tests

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix serviceMonitor namespace

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add e2e test debug info

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add mch-image-manifest debug info

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve image replacement logging

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix call to image replacer

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add service monitor crd to kind e2e

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add debug logging oba e2e tests

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve logs, reduce volume

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve ci logs

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve ci logs

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* fix

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add oba logs

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* improve oba logging

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add unit tests

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add apiextensionsv1 scheme

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

* add resource version to crd update

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>

---------

Signed-off-by: Thibault Mange <22740367+thibaultmg@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants