fix: Adding service monitor for retina-operator (#848)
# Description

Adding a ServiceMonitor for retina-operator
* parameterized & applied retina-operator name
* adding Service & ServiceMonitor resources for retina-operator
* applied appropriate relabeling & metric relabeling config to align
with retina-jobs additional scrape config
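
As a rough illustration of the new Service, this is approximately what the template should render to with the defaults added in this commit (`operator.name: retina-operator`, `operatorService.port` and `targetPort` of `8080`). The `kube-system` namespace is an assumption, since the default for `.Values.namespace` is not part of this diff.

```yaml
# Sketch of the rendered Service under the default values from this commit.
apiVersion: v1
kind: Service
metadata:
  name: retina-operator
  namespace: kube-system # assumed default of .Values.namespace
  labels:
    app: retina-operator
spec:
  ports:
    - name: retina-operator # .Values.operatorService.name
      port: 8080 # .Values.operatorService.port
      protocol: TCP
      targetPort: 8080 # .Values.operatorService.targetPort
  selector:
    app: retina-operator
    control-plane: retina-operator
```

The `app` label on this Service is what the new ServiceMonitor selects on.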

## Related Issue

retina-operator wasn't being scraped for metrics by Prometheus.
Initially it appeared in the 'retina-pods' job and failed, as
reported in this issue:
#738


A partial fix was merged to remove the operator pod from the list here:
#770

## Checklist

- [x] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [x] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [x] I have correctly attributed the author(s) of the code.
- [x] I have tested the changes locally.
- [x] I have followed the project's style guidelines.
- [x] I have updated the documentation, if necessary.
- [x] I have added tests, if applicable.

## Screenshots (if applicable) or Testing Completed


![image](https://github.com/user-attachments/assets/b1722546-d013-4ab3-8565-1c0357eea0da)

Operator-specific metrics with the job='retina-operator' selector:


![image](https://github.com/user-attachments/assets/821bc2ae-3d16-40a5-a527-4b36ecbb82e4)
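
For anyone reproducing this, a values override along the following lines should be enough to get the operator scraped. This is a minimal sketch: the file name and the `release` label value are hypothetical and must match whatever label your Prometheus instance uses to select ServiceMonitors, and it assumes the operator templates are gated on `operator.enabled`.

```yaml
# operator-metrics-values.yaml (hypothetical file name)
operator:
  enabled: true # assumption: gates rendering of the operator templates
  name: retina-operator
operatorService:
  port: 8080
  targetPort: 8080
metrics:
  serviceMonitor:
    enabled: true
    additionalLabels:
      release: my-prometheus # hypothetical; use your Prometheus release label
```

Because Helm merges map values, overriding only `release` here should leave the default `app.kubernetes.io/component: metrics` label in place.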


## Additional Notes

The proposed next step is to align the way we add scrape configs:
#847

---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.
mereta authored Oct 15, 2024
1 parent 3009f28 commit f800786
Showing 4 changed files with 121 additions and 32 deletions.
117 changes: 91 additions & 26 deletions deploy/legacy/manifests/controller/helm/retina/templates/operator.yaml
@@ -2,29 +2,31 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: retina-operator
namespace: kube-system
name: {{ .Values.operator.name }}
namespace: {{ .Values.namespace }}
labels:
app: retina-operator
control-plane: retina-operator
app: {{ .Values.operator.name }}
control-plane: {{ .Values.operator.name }}
app.kubernetes.io/name: deployment
app.kubernetes.io/instance: retina-operator
app.kubernetes.io/component: retina-operator
app.kubernetes.io/instance: {{ .Values.operator.name }}
app.kubernetes.io/component: {{ .Values.operator.name }}
app.kubernetes.io/created-by: operator
app.kubernetes.io/part-of: operator
app.kubernetes.io/managed-by: kustomize
spec:
selector:
matchLabels:
control-plane: retina-operator
control-plane: {{ .Values.operator.name }}
replicas: 1
template:
metadata:
annotations:
kubectl.kubernetes.io/default-container: retina-operator
kubectl.kubernetes.io/default-container: {{ .Values.operator.name }}
prometheus.io/port: "{{ .Values.operatorService.port }}"
prometheus.io/scrape: "true"
labels:
app: retina-operator
control-plane: retina-operator
app: {{ .Values.operator.name }}
control-plane: {{ .Values.operator.name }}
spec:
# TODO(user): Uncomment the following code to configure the nodeAffinity expression
# according to the platforms which are supported by your solution.
@@ -51,21 +53,24 @@ spec:
runAsNonRoot: true
containers:
- image: {{ .Values.operator.repository }}:{{ .Values.operator.tag }}
name: retina-operator
name: {{ .Values.operator.name }}
{{- if .Values.operator.container.command }}
command:
{{- range .Values.operator.container.command }}
- {{ . }}
{{- end }}
{{- end }}
{{- if .Values.operator.container.args}}
ports:
- containerPort: {{ .Values.operatorService.port }}
name: {{ .Values.operatorService.name }}
args:
{{- range $.Values.operator.container.args}}
- {{ . | quote }}
{{- end}}
{{- end}}
volumeMounts:
- name: retina-operator-config
- name: "{{ .Values.operator.name }}-config"
mountPath: /retina/
{{- if .Values.capture.enableManagedStorageAccount }}
- name: cloud-config
@@ -91,12 +96,12 @@ spec:
periodSeconds: 10
resources:
{{- toYaml .Values.operator.resources | nindent 12 }}
serviceAccountName: retina-operator
serviceAccountName: {{ .Values.operator.name }}
terminationGracePeriodSeconds: 10
volumes:
- name: retina-operator-config
- name: "{{ .Values.operator.name }}-config"
configMap:
name: retina-operator-config
name: "{{ .Values.operator.name }}-config"
{{- if .Values.capture.enableManagedStorageAccount }}
- name: cloud-config
secret:
@@ -108,19 +113,19 @@ kind: ServiceAccount
metadata:
labels:
app.kubernetes.io/name: serviceaccount
app.kubernetes.io/instance: retina-operator
app.kubernetes.io/instance: {{ .Values.operator.name }}
app.kubernetes.io/component: rbac
app.kubernetes.io/created-by: operator
app.kubernetes.io/part-of: operator
app.kubernetes.io/managed-by: kustomize
name: retina-operator
namespace: kube-system
name: {{ .Values.operator.name }}
namespace: {{ .Values.namespace }}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
creationTimestamp: null
name: retina-operator-role
name: "{{ .Values.operator.name }}-role"
rules:
- apiGroups:
- "apiextensions.k8s.io"
@@ -271,25 +276,25 @@ kind: ClusterRoleBinding
metadata:
labels:
app.kubernetes.io/name: clusterrolebinding
app.kubernetes.io/instance: retina-operator-rolebinding
app.kubernetes.io/instance: "{{ .Values.operator.name }}-rolebinding"
app.kubernetes.io/component: rbac
app.kubernetes.io/created-by: operator
app.kubernetes.io/part-of: operator
app.kubernetes.io/managed-by: kustomize
name: retina-operator-rolebinding
name: "{{ .Values.operator.name }}-rolebinding"
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: retina-operator-role
name: "{{ .Values.operator.name }}-role"
subjects:
- kind: ServiceAccount
name: retina-operator
namespace: kube-system
name: {{ .Values.operator.name }}
namespace: {{ .Values.namespace }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: retina-operator-config
name: "{{ .Values.operator.name }}-config"
namespace: {{ .Values.namespace }}
data:
operator-config.yaml: |-
@@ -308,7 +313,7 @@ apiVersion: v1
kind: Secret
metadata:
name: azure-cloud-config
namespace: kube-system
namespace: {{ .Values.namespace }}
type: Opaque
stringData:
azure.json: |-
@@ -333,3 +338,63 @@ stringData:
}
{{- end }}
{{- end }}
---
apiVersion: v1
kind: Service
metadata:
name: {{ .Values.operator.name }}
namespace: {{ .Values.namespace }}
labels:
app: {{ .Values.operator.name }}
spec:
ports:
- name: {{ .Values.operatorService.name }}
port: {{ .Values.operatorService.port }}
protocol: TCP
targetPort: {{ .Values.operatorService.targetPort }}
selector:
app: {{ .Values.operator.name }}
control-plane: {{ .Values.operator.name }}
---
{{- if .Values.metrics.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: "{{ .Values.operator.name }}-servicemonitor"
namespace: {{ ternary .Values.metrics.serviceMonitor.namespace .Values.namespace (not (empty .Values.metrics.serviceMonitor.namespace)) }}
labels:
app: {{ .Values.operator.name }}
{{- if .Values.metrics.serviceMonitor.additionalLabels }}
{{- toYaml .Values.metrics.serviceMonitor.additionalLabels | nindent 4 }}
{{- end }}
spec:
endpoints:
- targetPort: retina-operator
path: /metrics
{{- if .Values.metrics.serviceMonitor.interval }}
interval: {{ .Values.metrics.serviceMonitor.interval }}
{{- end }}
{{- if .Values.metrics.serviceMonitor.scrapeTimeout }}
scrapeTimeout: {{ .Values.metrics.serviceMonitor.scrapeTimeout }}
{{- end }}
{{- if .Values.metrics.serviceMonitor.scheme }}
scheme: {{ .Values.metrics.serviceMonitor.scheme }}
{{- end }}
{{- if .Values.metrics.serviceMonitor.tlsConfig }}
tlsConfig: {{- toYaml .Values.metrics.serviceMonitor.tlsConfig | nindent 8 }}
{{- end }}
{{- if .Values.metrics.serviceMonitor.relabelings }}
relabelings:
{{- toYaml .Values.metrics.serviceMonitor.relabelings | nindent 8 }}
{{- end }}
{{- if .Values.metrics.serviceMonitor.metricRelabelings }}
metricRelabelings:
{{- toYaml .Values.metrics.serviceMonitor.metricRelabelings | nindent 8 }}
{{- end }}
namespaceSelector:
matchNames:
- {{ .Values.namespace }}
selector:
matchLabels:
app: {{ .Values.operator.name }}
{{- end }}
@@ -9,8 +9,10 @@ metadata:
app.kubernetes.io/component: networking
spec:
ports:
- port: {{ .Values.service.port }}
targetPort: {{ .Values.service.targetPort }}
- name: {{ .Values.service.name }}
port: {{ .Values.service.port }}
protocol: TCP
targetPort: {{ .Values.service.targetPort }}
selector:
{{- include "retina.selectorLabels" . | nindent 4 }}
app.kubernetes.io/component: workload
@@ -7,7 +7,6 @@ metadata:
labels:
k8s-app: {{ include "retina.name" . }}
{{- include "retina.labels" . | nindent 4 }}
app.kubernetes.io/component: metrics
{{- if .Values.metrics.serviceMonitor.additionalLabels }}
{{- toYaml .Values.metrics.serviceMonitor.additionalLabels | nindent 4 }}
{{- end }}
29 changes: 26 additions & 3 deletions deploy/legacy/manifests/controller/helm/retina/values.yaml
@@ -8,6 +8,7 @@ os:
windows: true

operator:
name: retina-operator
enabled: false
repository: ghcr.io/microsoft/retina/retina-operator
tag: "v0.0.2"
@@ -116,6 +117,12 @@ service:
targetPort: 10093
name: retina

operatorService:
type: ClusterIP
port: 8080
targetPort: 8080
name: retina-operator

serviceAccount:
annotations: {}
name: "retina-agent"
@@ -237,8 +244,11 @@ metrics:
scrapeTimeout: 30s
## @param metrics.serviceMonitor.additionalLabels [object] Additional labels that can be used so serviceMonitor will be discovered by Prometheus
## ref: https://github.com/coreos/prometheus-operator/blob/master/Documentation/api.md#prometheusspec
## 'release: prometheus' label is needed for Prometheus to discover ServiceMonitors
##
additionalLabels: {}
additionalLabels:
release: prometheus
app.kubernetes.io/component: metrics
## @param metrics.serviceMonitor.scheme Scheme to use for scraping
##
scheme: http
@@ -253,7 +263,20 @@ metrics:
tlsConfig: {}
## @param metrics.serviceMonitor.relabelings [array] Prometheus relabeling rules to apply to samples before scraping
##
relabelings: []
relabelings:
- sourceLabels:
[__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
separator: ":"
regex: ([^:]+)(?::\d+)?
targetLabel: __address__
replacement: ${1}:${2}
action: replace
- sourceLabels: [__meta_kubernetes_pod_node_name]
action: replace
targetLabel: instance
## @param metrics.serviceMonitor.metricRelabelings [array] Prometheus relabeling rules to apply to samples before ingestion
##
metricRelabelings: []
metricRelabelings:
- sourceLabels: [__name__]
action: keep
regex: (.*)
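
One thing worth checking in the default `relabelings` above: the `__address__` rule joins its source labels with `:` and references `${2}`, but its regex defines only one capture group. For comparison, here is a sketch of the form commonly seen in Prometheus's Kubernetes example configuration, translated to ServiceMonitor field names. It is not what this commit ships, and whether it suits this chart is an assumption that should be verified against a live target.

```yaml
relabelings:
  # Join the address and the port annotation with ";" so the annotation
  # value lands in its own capture group.
  - sourceLabels:
      - __address__
      - __meta_kubernetes_pod_annotation_prometheus_io_port
    separator: ";"
    # Group 1: host without any existing port. Group 2: annotated port.
    regex: ([^:]+)(?::\d+)?;(\d+)
    targetLabel: __address__
    replacement: $1:$2
    action: replace
```

Prometheus anchors relabel regexes to the whole joined value, so a rule whose regex does not cover the full string leaves `__address__` unchanged.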
