CP-22444: CloudZero KSM (Feature Branch) #102

Status: Open. Wants to merge 2 commits into base: develop
4 changes: 2 additions & 2 deletions charts/cloudzero-agent/BETA-INSTALLATION.md
@@ -45,7 +45,7 @@ helm install <RELEASE_NAME> cloudzero-beta/cloudzero-agent \
--set clusterName=<CLUSTER_NAME> \
--set-string cloudAccountId=<CLOUD_ACCOUNT_ID> \
--set region=<REGION> \
--set kube-state-metrics.enabled=<true|false> \
--set kube_state_metrics.enabled=<true|false> \
Collaborator: Do we want to remove this? I would think that we wouldn't want to show this option now.

--create-namespace
```

@@ -63,7 +63,7 @@ helm install <RELEASE_NAME> cloudzero-beta/cloudzero-agent \
--set clusterName=<CLUSTER_NAME> \
--set-string cloudAccountId=<CLOUD_ACCOUNT_ID> \
--set region=<REGION> \
--set kube-state-metrics.enabled=<true|false> \
--set kube_state_metrics.enabled=<true|false> \
--create-namespace
```

4 changes: 2 additions & 2 deletions charts/cloudzero-agent/Chart.lock
@@ -5,5 +5,5 @@ dependencies:
- name: prometheus-node-exporter
repository: https://prometheus-community.github.io/helm-charts
version: 4.24.0
digest: sha256:827a33fa07fde17be0bf808e0beba3ca7b23c9fc1960580b2ba6d0ecc0b57a3f
generated: "2024-03-20T11:42:44.034766-04:00"
digest: sha256:254bcb4b6b7f42a53ad1ec5885e079958efa2a09f30ffafe03c6ad0eccd06f7d
generated: "2024-11-14T04:46:38.987981-08:00"
2 changes: 1 addition & 1 deletion charts/cloudzero-agent/Chart.yaml
@@ -11,7 +11,7 @@ dependencies:
- name: kube-state-metrics
version: "5.15.*"
repository: https://prometheus-community.github.io/helm-charts
condition: kube-state-metrics.enabled
condition: kubeStateMetrics.enabled
- name: prometheus-node-exporter
version: "4.24.*"
repository: https://prometheus-community.github.io/helm-charts
74 changes: 0 additions & 74 deletions charts/cloudzero-agent/README.md
@@ -39,8 +39,6 @@ helm install <RELEASE_NAME> cloudzero/cloudzero-agent \
--set clusterName=<CLUSTER_NAME> \
--set-string cloudAccountId=<CLOUD_ACCOUNT_ID> \
--set region=<REGION> \
# optionally deploy kube-state-metrics if it doesn't exist in the cluster already
--set kube-state-metrics.enabled=<true|false>
```

### Update Helm Chart
@@ -58,7 +56,6 @@ helm upgrade <RELEASE_NAME> cloudzero/cloudzero-agent \
--set clusterName=<CLUSTER_NAME> \
--set-string cloudAccountId=<CLOUD_ACCOUNT_ID> \
--set region=<REGION> \
--set kube-state-metrics.enabled=<true|false>
```

### Mandatory Values
@@ -109,33 +106,6 @@ helm install <RELEASE_NAME> cloudzero/cloudzero-agent \
-f values-override.yaml
```

### Metric Exporters

This chart depends on metrics from [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics). There are two installation options for providing the `kube-state-metrics` metrics to the cloudzero-agent. If you don't know which option is right for you, use the second option.

#### Option 1 (default): Use existing kube-state-metrics

Using an existing `kube-state-metrics` exporter may be desirable for minimizing cost. By default, the `cloudzero-agent` will attempt to find an existing `kube-state-metrics` K8s Service by searching for a K8s Service with the annotation `prometheus.io/scrape: "true"`. If an existing `kube-state-metrics` Service exists but does not have that annotation and you do not wish to add it, see the **Custom Scrape Configs** section below.

In addition to the above, the existing `kube-state-metrics` Service address should be added in `values-override.yaml` as shown below so that the `cloudzero-agent` can validate the connection:

```yaml
validator:
serviceEndpoints:
kubeStateMetrics: <kube-state-metrics>.<example-namespace>.svc.cluster.local:8080
```


#### Option 2: Use kube-state-metrics subchart

Alternatively, deploy the `kube-state-metrics` subchart that comes packaged with this chart. This is done by enabling settings in `values-override.yaml` as shown:

```yaml
kube-state-metrics:
enabled: true
```
In this option, no additional configuration is required in the `validator` field.

### Secret Management

The chart requires a CloudZero API key to send metric data. Admins can retrieve API keys [here](https://app.cloudzero.com/organization/api-keys).
@@ -174,50 +144,6 @@ kube-state-metrics:
repository: my-custom-kube-state-metrics/kube-state-metrics
```

### Custom Scrape Configs

If running without the default `kube-state-metrics` exporter subchart and your existing `kube-state-metrics` deployment does not have the required `prometheus.io/scrape: "true"`, adjust the Prometheus scrape configs as shown:

`values-override.yaml`
```yaml
prometheusConfig:
scrapeJobs:
kubeStateMetrics:
enabled: false # this disables the default kube-state-metrics scrape job, which will be replaced by an entry in additionalScrapeJobs
additionalScrapeJobs:
- job_name: custom-kube-state-metrics
honor_timestamps: true
scrape_interval: 1m
scrape_timeout: 10s
metrics_path: /metrics
static_configs:
- targets:
- 'my-kube-state-metrics-service.default.svc.cluster.local:8080'
relabel_configs:
- separator: ;
regex: __meta_kubernetes_service_label_(.+)
replacement: $1
action: labelmap
- source_labels: [__meta_kubernetes_namespace]
separator: ;
regex: (.*)
target_label: namespace
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_name]
separator: ;
regex: (.*)
target_label: service
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_pod_node_name]
separator: ;
regex: (.*)
target_label: node
replacement: $1
action: replace
```

### Exporting Pod Labels

Pod labels can be exported as metrics using kube-state-metrics. To customize the labels for export, modify the values-override.yaml file as shown below:
51 changes: 9 additions & 42 deletions charts/cloudzero-agent/templates/cm.yaml
@@ -18,11 +18,10 @@ data:
scrape_interval: {{ .Values.prometheusConfig.globalScrapeInterval }}
scrape_configs:
{{- if .Values.prometheusConfig.scrapeJobs.kubeStateMetrics.enabled }}
- job_name: cloudzero-service-endpoints # kube_*, node_* metrics
honor_labels: true
- job_name: static-kube-state-metrics
honor_timestamps: true
track_timestamps_staleness: false
scrape_interval: {{ .Values.prometheusConfig.scrapeJobs.kubeStateMetrics.scrapeInterval }}
scrape_interval: 1m
scrape_timeout: 10s
scrape_protocols:
- OpenMetricsText1.0.0
@@ -34,38 +33,6 @@ data:
follow_redirects: true
enable_http2: true
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
separator: ;
regex: "true"
replacement: $1
action: keep
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape_slow]
separator: ;
regex: "true"
replacement: $1
action: drop
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
separator: ;
regex: (https?)
target_label: __scheme__
replacement: $1
action: replace
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
separator: ;
regex: (.+)
target_label: __metrics_path__
replacement: $1
action: replace
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
separator: ;
regex: (.+?)(?::\d+)?;(\d+)
target_label: __address__
replacement: $1:$2
action: replace
- separator: ;
regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
replacement: __param_$1
action: labelmap
- separator: ;
regex: __meta_kubernetes_service_label_(.+)
replacement: $1
@@ -92,13 +59,13 @@ data:
- source_labels: [__name__]
regex: "^({{ join "|" .Values.kubeMetrics }})$"
action: keep
- action: labelkeep
regex: "^({{ include "cloudzero-agent.requiredMetricLabels" . }})$"
kubernetes_sd_configs:
- role: endpoints
kubeconfig_file: ""
follow_redirects: true
enable_http2: true
- separator: ;
regex: ^(board_asset_tag|container|created_by_kind|created_by_name|image|instance|name|namespace|node|node_kubernetes_io_instance_type|pod|product_name|provider_id|resource|unit|uid|_.*|label_.*|app.kubernetes.io/*|k8s.*)$
Collaborator: It might be a good idea to remove the named template that this replaces if we're not going to use it. I do like this change, though; it's more readable for sure.

replacement: $1
action: labelkeep
static_configs:
- targets:
- {{ printf "%s-kube-state-metrics.%s.svc.cluster.local:%d" .Release.Name .Release.Namespace (int .Values.kubeStateMetrics.service.port) }}
{{- end }}
{{- if .Values.prometheusConfig.scrapeJobs.cadvisor.enabled }}
- job_name: cloudzero-nodes-cadvisor # container_* metrics
2 changes: 1 addition & 1 deletion charts/cloudzero-agent/templates/validatorcm.yaml
@@ -33,7 +33,7 @@ data:
{{- if .Values.validator.serviceEndpoints.kubeStateMetrics }}
kube_state_metrics_service_endpoint: http://{{ .Values.validator.serviceEndpoints.kubeStateMetrics }}/
{{- else }}
kube_state_metrics_service_endpoint: http://{{- if .Release.Name }}{{.Release.Name}}-{{- end }}kube-state-metrics:8080/
kube_state_metrics_service_endpoint: http://{{- if .Release.Name }}{{.Release.Name}}-{{- end }}state-metrics:8080/
{{- end }}
{{- if .Values.validator.serviceEndpoints.prometheusNodeExporter }}
prometheus_node_exporter_service_endpoint: http://{{ .Values.validator.serviceEndpoints.prometheusNodeExporter }}/
12 changes: 10 additions & 2 deletions charts/cloudzero-agent/values.yaml
@@ -47,10 +47,18 @@ prometheusConfig:
# -- Any items added to this list will be added to the Prometheus scrape configuration.
additionalScrapeJobs: []

kube-state-metrics:
enabled: false
kubeStateMetrics:
Collaborator: Does this work if `alias` is not set in the dependencies section? I would have thought that it would need to be set, but maybe I'm just not up to date.
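
For context on the question above: Helm passes values to a subchart under the dependency's chart name, or under its `alias` when one is declared, while `condition` is only a path looked up in the parent chart's values. A minimal sketch of what declaring the alias might look like in `Chart.yaml` follows. The `alias` line is an assumption added for illustration and is not part of this diff; the other fields are copied from the `Chart.yaml` hunk in this pull request.

```yaml
dependencies:
  - name: kube-state-metrics
    version: "5.15.*"
    repository: https://prometheus-community.github.io/helm-charts
    # Hypothetical addition (not in this PR): expose the subchart under the
    # camelCase key so that values set under `kubeStateMetrics:` reach it.
    alias: kubeStateMetrics
    condition: kubeStateMetrics.enabled
```

With an alias like this, the rendered resource names of the subchart may also change unless `fullnameOverride` is set, so the hard-coded service names in `cm.yaml` and `validatorcm.yaml` would need to be checked against the rendered output.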

enabled: true
fullnameOverride: "cloudzero-state-metrics"
extraArgs:
- --metric-labels-allowlist=pods=[app.kubernetes.io/component]
# Disable CloudZero KSM as a Scrape Target since the service endpoint is explicitly defined
# by the Validators config file.
prometheusScrape: false
# Set a default port other than 8080 to avoid collisions with any existing KSM services.
service:
port: 8080

prometheus-node-exporter:
enabled: false

21 changes: 21 additions & 0 deletions docs/releases/0.0.30-beta.md
@@ -0,0 +1,21 @@
## [0.0.30-beta](https://github.com/cloudzero/cloudzero-charts/compare/v0.0.28...v0.0.30-beta) (2024-11-12)

Improve Kube State Metrics Install

### Upgrade Steps
To install, specify the version of the beta chart:

``` bash
helm upgrade --install -n cz-prom-agent cz-prom-agent charts/cloudzero-agent \
--set apiKey=$api_key \
--set clusterName='cluster' \
--set-string cloudAccountId="account_id" \
--set region='region' \
--version 0.0.30-beta

```

### Improvements
* **CloudZero Metrics:** CloudZero State Metrics is enabled/installed by default.

```