fix: Adding service monitor for retina-operator #848

Merged
merged 4 commits into from
Oct 15, 2024

Conversation

Contributor

@mereta mereta commented Oct 11, 2024

Description

Adding a ServiceMonitor for retina-operator

  • parameterized and applied the retina-operator name
  • added Service and ServiceMonitor resources for retina-operator
  • applied relabeling and metric relabeling config to align with the retina-jobs additional scrape config
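
As a rough illustration of the resources listed above, a Service and ServiceMonitor pair for the operator might look like the sketch below. This is not the chart's actual template: the `app` label, the `metrics` port name, and both relabeling rules are assumptions made for the example. The ServiceMonitor name and namespace match those in the e2e error later in this thread, and port 8080 comes from the linked issue.

```yaml
# Illustrative sketch only; labels, port name, and relabel rules are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: retina-operator
  namespace: kube-system
  labels:
    app: retina-operator          # assumed Service label
spec:
  selector:
    app: retina-operator          # assumed pod label
  ports:
    - name: metrics               # assumed port name
      port: 8080                  # operator port per the linked issue
      targetPort: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: retina-operator-servicemonitor
  namespace: kube-system
spec:
  jobLabel: app                   # job label is read from the Service's "app" label
  selector:
    matchLabels:
      app: retina-operator
  namespaceSelector:
    matchNames:
      - kube-system
  endpoints:
    - port: metrics               # refers to the named Service port above
      path: /metrics
      relabelings:                # hypothetical rule, applied to the target before scraping
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          targetLabel: instance
          action: replace
      metricRelabelings:          # hypothetical rule, applied to scraped samples
        - sourceLabels: [__name__]
          regex: "go_gc_.*"
          action: drop
```

With `jobLabel: app`, the scraped series would carry `job="retina-operator"`, which is the selector used in the testing screenshots.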

Related Issue

retina-operator wasn't being scraped for metrics by Prometheus.
Initially it appeared in the 'retina-pods' job and failed, as reported in this issue:
#738

A partial fix was merged to remove the operator pod from the list here:
#770

Checklist

  • I have read the contributing documentation.
  • I signed and signed-off the commits (git commit -S -s ...). See this documentation on signing commits.
  • I have correctly attributed the author(s) of the code.
  • I have tested the changes locally.
  • I have followed the project's style guidelines.
  • I have updated the documentation, if necessary.
  • I have added tests, if applicable.

Screenshots (if applicable) or Testing Completed

image

Operator specific metrics with job='retina-operator' selector:

image

Additional Notes

The proposed next step is to align the way we add scrape configs:
#847


Please refer to the CONTRIBUTING.md file for more information on how to contribute to this project.

@mereta mereta changed the title from "Adding service monitor for retina-operator" to "fix: Adding service monitor for retina-operator" Oct 11, 2024
@mereta
Contributor Author

mereta commented Oct 11, 2024

@microsoft-github-policy-service agree company="Microsoft"

@mereta mereta marked this pull request as ready for review October 11, 2024 13:30
@mereta mereta requested a review from a team as a code owner October 11, 2024 13:30
Member

@SRodi SRodi left a comment

this LGTM @mereta - can we also push the chart update for hubble control plane deploy as part of this PR?

@mereta
Contributor Author

mereta commented Oct 11, 2024

this LGTM @mereta - can we also push the chart update for hubble control plane deploy as part of this PR?

Working on it! ✅

@mereta
Contributor Author

mereta commented Oct 14, 2024

this LGTM @mereta - can we also push the chart update for hubble control plane deploy as part of this PR?

@SRodi I will hold off on the Hubble charts. I ran into a few issues with the operator: I can't seem to get any metrics from the /metrics endpoint. It seems to behave differently from the legacy operator. I reached out to @rbtr to have a look and maybe advise.
Maybe we should create a separate issue for that.

SRodi
SRodi previously approved these changes Oct 14, 2024
@mereta mereta added this pull request to the merge queue Oct 14, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 14, 2024
@SRodi
Member

SRodi commented Oct 14, 2024

@mereta it looks like E2E tests need to be updated as part of this PR - see this failure https://github.com/microsoft/retina/actions/runs/11331610343/job/31512301186#step:6:213

    runner.go:27: 
        	Error Trace:	/home/runner/work/retina/retina/test/e2e/framework/types/runner.go:27
        	            				/home/runner/work/retina/retina/test/e2e/retina_e2e_test.go:77
        	Error:      	Received unexpected error:
        	            	did not expect error from step InstallHelmChart but got error: failed to install chart: unable to build kubernetes objects from release manifest: resource mapping not found for name: "retina-operator-servicemonitor" namespace: "kube-system" from "": no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"
        	            	ensure CRDs are installed first
        	Test:       	TestE2ERetina

@mereta
Contributor Author

mereta commented Oct 15, 2024

@mereta it looks like E2E tests need to be updated as part of this PR - see this failure https://github.com/microsoft/retina/actions/runs/11331610343/job/31512301186#step:6:213

I'm going to create a separate PR for this. The ServiceMonitor CRD is not available in the cluster because Prometheus is not installed.
For ServiceMonitor to be the default way to collect metrics, we need to address this by pre-installing the CRD in the e2e cluster. This is a prerequisite for #847.
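
For context, one common way to keep a chart installable on clusters that lack the Prometheus Operator CRDs is to guard the template. The sketch below is not what this PR does. The values key `operator.serviceMonitor.enabled`, the `app` label, and the `metrics` port name are all hypothetical, and the key is assumed to be defined in values.yaml. `.Capabilities.APIVersions.Has` is Helm's built-in check for API versions served by the target cluster.

```yaml
{{- /* Hypothetical guard: render only when enabled and the API version is served. */ -}}
{{- if and .Values.operator.serviceMonitor.enabled (.Capabilities.APIVersions.Has "monitoring.coreos.com/v1") }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: retina-operator-servicemonitor
  namespace: {{ .Release.Namespace }}
spec:
  selector:
    matchLabels:
      app: retina-operator   # assumed Service label
  endpoints:
    - port: metrics          # assumed named Service port
      path: /metrics
{{- end }}
```

With a guard like this, installing into a cluster without the CRD would skip the ServiceMonitor instead of failing with "no matches for kind", at the cost of silently collecting no operator metrics there.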

Member

@SRodi SRodi left a comment

Thanks @mereta! Just to clarify for everyone: the retina-operator ServiceMonitor is currently disabled because we need to update the E2E tests to install the Prometheus ServiceMonitor CRDs. That will be captured in a separate GH issue and tackled in another PR.

@mereta mereta added this pull request to the merge queue Oct 15, 2024
Merged via the queue into microsoft:main with commit f800786 Oct 15, 2024
22 checks passed
@mereta mereta deleted the mereta/svcmonitor branch October 15, 2024 15:07
github-merge-queue bot pushed a commit that referenced this pull request Oct 23, 2024
…a-operator (#870)

# Description

For a ServiceMonitor to be the default way of getting Prometheus metrics
from retina-operator, the ServiceMonitor CRD needs to be installed for the
e2e test flow. A missing CRD causes the test to fail.

## Related Issue

A ServiceMonitor was added in the PR below but was disabled due to an e2e
issue.
#848 
This PR resolves that issue.

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

@mereta mereta mentioned this pull request Oct 24, 2024
7 tasks
github-merge-queue bot pushed a commit that referenced this pull request Oct 24, 2024
# Description

Disabling the ServiceMonitor. We need to find a way to install the
Prometheus CRDs as part of Retina.

## Related Issue

#848 

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

Development

Successfully merging this pull request may close these issues.

Kubernetes cannot scrape the operator pod (tries port 80, but the operator uses port 8080)?
2 participants