Release 1.0.0-rc1 (2025-01-23)
This release contains several improvements over 1.0.0-beta-10:
- The name of the initialization Job that gathers information about the existing state of a cluster now includes the chart version and the image tag used in the Pod.
- The `initScrapeJob` field is deprecated in favor of `initBackfillJob`. However, this is not a breaking change; `initScrapeJob` can still be used without issue.
- A new boolean field, `server.agentMode`, is now available.
- Resource consumption of the agent-server pod is reduced.
- Metrics from the agent-server pod are now exported for monitoring.
Upgrade Steps
Optionally, rename the `initScrapeJob` field to `initBackfillJob` in any override files. `initBackfillJob` is the preferred field name, but configurations that use `initScrapeJob` will continue to work.
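If you maintain many override files, the optional rename can be scripted. The sketch below is a hypothetical helper, not part of the chart. It assumes `initScrapeJob` appears as a top-level key starting at column 0 and edits the raw text with a regular expression, so comments and formatting are preserved. It does not handle files that nest the key under a parent (for example, when the chart is consumed as a subchart), and it does not check whether the file already defines `initBackfillJob`, which would leave a duplicate key.

```python
import re
from pathlib import Path


def rename_top_level_key(text: str, old: str = "initScrapeJob",
                         new: str = "initBackfillJob") -> str:
    """Hypothetical helper: rename a top-level YAML key in raw text.

    Only keys starting at column 0 followed by optional spaces/tabs and ':'
    are renamed; indented (nested) keys and longer names are left untouched.
    """
    pattern = re.compile(rf"^{re.escape(old)}(?=[ \t]*:)", re.MULTILINE)
    return pattern.sub(new, text)


def rename_in_file(path: Path) -> None:
    """Rewrite an override file in place with the renamed key."""
    path.write_text(rename_top_level_key(path.read_text()))
```

For example, `rename_in_file(Path("overrides.yaml"))` (a hypothetical file name) rewrites that file in place; review the resulting diff before running the upgrade.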
Upgrade using the following command:
```shell
helm upgrade --install <RELEASE_NAME> cloudzero/cloudzero-agent -n <NAMESPACE> --create-namespace -f configuration.example.yaml --version 1.0.0-rc1
```
Improvements
- Initialization Job Name Changes With Releases: Previously, release upgrades could fail if the container image used in the Job changed, because the `image` field in a Job spec is immutable. To prevent this, a new Job is now created whenever the Helm chart version or the image used in the Job changes. This also ensures that changes to the underlying `insights-controller` application are picked up by the new backfill of existing cluster state data.
- Clarified Field Names: The Job that gathers existing cluster data was previously controlled by a field named `initScrapeJob`. This term was overloaded, because the chart also uses "scrape job" in the Prometheus sense, and it caused confusion. The field is therefore renamed to `initBackfillJob`. `initScrapeJob` is still usable: its values are merged with those of `initBackfillJob`, and `initBackfillJob` takes precedence.
- Easier Debugging: The `server.agentMode` field can now be set to `false`. By default it is `true`, so the Prometheus server runs in agent mode to keep resource usage manageable. Setting it to `false` takes the Prometheus server out of agent mode, which helps when debugging issues with the Prometheus agent-server.
- Resource Consumption Reduction: The Prometheus scrape job that collects metrics from the `insights-controller` pods now keeps only the metrics explicitly listed in `values.yaml`, so the internal TSDB holds less data.
- Improved Observability: The agent-server now scrapes its own metrics and exports them to the CloudZero platform for monitoring. This allows issues within a cluster to be detected sooner and with better visibility into their cause.
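The idea behind "Initialization Job Name Changes With Releases" can be sketched as a deterministic naming function. The exact name format the chart generates is not documented here, so `backfill_job_name`, the `base` prefix, and the sanitization rules below are assumptions for illustration. The point is that the name depends on the chart version and image tag, so changing either one produces a new Job instead of an attempt to modify the immutable `image` field of an existing Job.

```python
import re

# Assumed cap: keep names within 63 characters, since a Job's name is
# also copied into a label value and label values are limited to 63 chars.
MAX_NAME_LEN = 63


def backfill_job_name(base: str, chart_version: str, image_tag: str) -> str:
    """Hypothetical: derive a Job name from the chart version and image tag."""
    raw = f"{base}-{chart_version}-{image_tag}".lower()
    # Replace anything outside [a-z0-9-] (e.g. dots in versions) with '-'.
    name = re.sub(r"[^a-z0-9-]", "-", raw)
    # Truncate, then drop leading/trailing '-' so the name stays valid.
    return name[:MAX_NAME_LEN].strip("-")
```

With identical inputs the name is stable across `helm upgrade` runs, so an unchanged release reuses its Job; bumping either the chart version or the image tag yields a different name.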
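The precedence rule under "Clarified Field Names" (values from `initScrapeJob` merged with `initBackfillJob`, the latter winning) can be modeled as a dictionary merge. This sketch assumes a recursive merge, similar in spirit to Helm's `merge` template function; the release notes do not say whether the chart merges nested keys recursively, and `deep_merge` and `effective_backfill_config` are hypothetical names.

```python
from typing import Any, Dict


def deep_merge(preferred: Dict[str, Any], fallback: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict: keys from `preferred` win; nested dicts merge recursively."""
    result = dict(fallback)
    for key, value in preferred.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(value, result[key])
        else:
            result[key] = value
    return result


def effective_backfill_config(values: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: combine the deprecated and new fields, the new field winning."""
    return deep_merge(values.get("initBackfillJob") or {},
                      values.get("initScrapeJob") or {})
```

Under this model, an override file that sets a key only under `initScrapeJob` still takes effect, while the same key set under `initBackfillJob` overrides it.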
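The allowlist behavior under "Resource Consumption Reduction" amounts to keeping only samples whose metric name is explicitly listed. In Prometheus this is commonly expressed as a `metric_relabel_configs` rule with `action: keep` on `__name__`, whose regex is fully anchored; the chart's actual configuration is not shown here, so the sketch below only models that filtering in plain Python, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, Pattern


def build_keep_regex(allowed: Iterable[str]) -> Pattern[str]:
    """Build an alternation of literal metric names (used with fullmatch, i.e. anchored)."""
    return re.compile("(?:" + "|".join(re.escape(name) for name in allowed) + ")")


def filter_samples(sample_names: Iterable[str], allowed: Iterable[str]) -> List[str]:
    """Keep only names that exactly match an allowlisted metric name."""
    keep = build_keep_regex(allowed)
    return [name for name in sample_names if keep.fullmatch(name)]
```

Because matching is exact, a metric whose name merely starts with an allowlisted name is dropped, and an empty allowlist keeps nothing; everything that is dropped never reaches the internal TSDB.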