spec.Manifest != release.Manifest when helm "lookup" func used in template, causing repeat reconciliation #209
First, the approach you have for secret generation (generate new if not exist, otherwise use existing) is one I have not seen before. I didn't know that was possible with helm, and it's a very elegant solution to the "you can't use random values in the helm charts managed by the helm operator" problem. If we can figure out this other problem, I would really like to see this in our docs!
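For readers who haven't seen the pattern being described, here is a minimal sketch reconstructed from the description above (the Secret and key names are hypothetical): `lookup` the existing Secret and only generate a random value when it doesn't exist yet.

```yaml
{{- /* Reuse the existing Secret's value if present; otherwise generate one. */ -}}
{{- $existing := lookup "v1" "Secret" .Release.Namespace "app-credentials" -}}
apiVersion: v1
kind: Secret
metadata:
  name: app-credentials
type: Opaque
data:
  {{- if $existing }}
  password: {{ index $existing.data "password" }}
  {{- else }}
  password: {{ randAlphaNum 32 | b64enc }}
  {{- end }}
```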
^ That's a release upgrade, which wouldn't follow the patch code path. The patch code path is followed when the manifests match. On the release upgrade path, I wonder if there's some inconsequential difference (different spacing, or base64 vs. not base64)? Can you turn up log verbosity to get these diff log lines in the output?

helm-operator-plugins/pkg/reconciler/reconciler.go, lines 795 to 797 in 6e30bde
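One way to turn up verbosity, sketched under the assumption that the operator's main.go binds controller-runtime's standard zap flags, is to start the operator with `--zap-log-level=4`:

```go
package main

import (
	"flag"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/log/zap"
)

func main() {
	// Bind the standard zap flags so --zap-log-level=4 is accepted,
	// which surfaces V(4) log lines such as the manifest diff above.
	opts := zap.Options{}
	opts.BindFlags(flag.CommandLine)
	flag.Parse()
	ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))
	// ... manager setup elided ...
}
```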
Hey @joelanford, thanks! I can't take any credit for it; it was an idea I borrowed from some Stack Overflow post a while ago. I have actually moved away from helm-operator-plugins a few days ago in favor of calling helm directly:

```go
type AkSpec struct {
	// +kubebuilder:pruning:PreserveUnknownFields
	// +kubebuilder:validation:Schemaless
	// Values is the helm chart values map used to override chart defaults. This is often further adapted by the
	// controller to add additional resources like declarative blueprints into the deployments. Values is a loose,
	// unstructured datatype. It will not complain if the values do not override anything, or do anything at all.
	Values json.RawMessage `json:"values,omitempty"`
}
```

Reconciler logic (apologies for the length; I included it here for completeness):

```go
func (r *AkReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	l := klog.FromContext(ctx)
	// Parse options to make them available. TODO: pass them in rather than re-reading continuously.
	o := utils.Opts{}
	arg.MustParse(&o)
	actionConfig, err := r.GetActionConfig(req.NamespacedName.Namespace, l)
	if err != nil {
		return ctrl.Result{}, err
	}
	// Get the Ak resource
	crd := &akmv1a1.Ak{}
	err = r.Get(ctx, req.NamespacedName, crd)
	if err != nil {
		if errors.IsNotFound(err) {
			// Request object not found; it could have been deleted after the reconcile request.
			// Owned objects are automatically garbage collected. For additional cleanup logic use finalizers.
			// Return and don't requeue.
			l.Info("Ak resource reconciliation triggered but the resource has disappeared. Checking for a residual chart to uninstall, then ignoring, since the object must have been deleted.")
			_, err := r.UninstallChart(req.NamespacedName, actionConfig)
			if err != nil {
				return ctrl.Result{}, err
			}
			return ctrl.Result{}, nil
		}
		// Error reading the object - requeue the request.
		l.Error(err, "Failed to get Ak resource. Likely fetch error. Retrying.")
		return ctrl.Result{}, err
	}
	l.Info(fmt.Sprintf("Found Ak resource `%v` in `%v`.", crd.Name, crd.Namespace))
	// Helm chart identification
	u, err := url.Parse(fmt.Sprintf("file://workspace/helm-charts/ak-%v.tgz", o.SrcVersion))
	if err != nil {
		return ctrl.Result{}, err
	}
	// Helm install or upgrade the chart
	var vals map[string]interface{}
	err = json.Unmarshal(crd.Spec.Values, &vals)
	if err != nil {
		return ctrl.Result{}, err
	}
	_, err = r.UpgradeOrInstallChart(req.NamespacedName, u, actionConfig, vals)
	if err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}

// UpgradeOrInstallChart upgrades a chart in cluster or installs it anew if it does not already exist.
// The url format is [scheme:][//[userinfo@]host][/]path[?query][#fragment], e.g. file://workspace/helm-charts/ak-0.1.0.tgz
func (r *AkReconciler) UpgradeOrInstallChart(nn types.NamespacedName, u *url.URL, a *action.Configuration, o map[string]interface{}) (*release.Release, error) {
	// Helm list action to determine whether the release already exists
	listAction := action.NewList(a)
	releases, err := listAction.Run()
	if err != nil {
		return nil, err
	}
	toUpgrade := false
	for _, release := range releases {
		if release.Name == nn.Name {
			toUpgrade = true
		}
	}
	c, err := r.LoadHelmChart(u)
	if err != nil {
		return nil, err
	}
	fmt.Println(o) // debug: print the values map
	var rel *release.Release
	if toUpgrade {
		// Helm upgrade
		updateAction := action.NewUpgrade(a)
		rel, err = updateAction.Run(nn.Name, c, o)
		if err != nil {
			return nil, err
		}
	} else {
		// Helm install
		installAction := action.NewInstall(a)
		installAction.Namespace = nn.Namespace
		installAction.ReleaseName = nn.Name
		rel, err = installAction.Run(c, o)
		if err != nil {
			return nil, err
		}
	}
	return rel, nil
}

// UninstallChart uninstalls the named release.
func (r *AkReconciler) UninstallChart(nn types.NamespacedName, a *action.Configuration) (*release.UninstallReleaseResponse, error) {
	uninstallAction := action.NewUninstall(a)
	releaseResponse, err := uninstallAction.Run(nn.Name)
	if err != nil {
		return nil, err
	}
	return releaseResponse, nil
}

// GetActionConfig gets the Helm action config from the in-cluster service account.
func (r *AkReconciler) GetActionConfig(namespace string, l logr.Logger) (*action.Configuration, error) {
	actionConfig := new(action.Configuration)
	var kubeConfig *genericclioptions.ConfigFlags
	config, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	// Set properties manually from the official rest config
	kubeConfig = genericclioptions.NewConfigFlags(false)
	kubeConfig.APIServer = &config.Host
	kubeConfig.BearerToken = &config.BearerToken
	kubeConfig.CAFile = &config.CAFile
	kubeConfig.Namespace = &namespace
	if err := actionConfig.Init(kubeConfig, namespace, os.Getenv("HELM_DRIVER"), log.Printf); err != nil {
		return nil, err
	}
	return actionConfig, nil
}

// GetKubeClient gets a connection client to Kubernetes.
func (r *AkReconciler) GetKubeClient() (*kubernetes.Clientset, error) {
	config, err := rest.InClusterConfig()
	if err != nil {
		return nil, err
	}
	// Create the clientset
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		return nil, err
	}
	return clientset, nil
}

// LoadHelmChart loads a helm chart from a given file URL.
func (r *AkReconciler) LoadHelmChart(u *url.URL) (*chart.Chart, error) {
	if u.Scheme != "file" {
		err := errors.NewInvalid(
			schema.GroupKind{
				Group: "akm.goauthentik.io",
				Kind:  "Ak",
			},
			fmt.Sprintf("Url scheme `%v` != `file`, unsupported scheme.", u.Scheme),
			field.ErrorList{})
		return nil, err
	}
	// Load the chart from the filepath (which is part of host in the url)
	path, err := filepath.Abs(u.Host + u.Path)
	if err != nil {
		return nil, err
	}
	_, err = utils.Exists(path)
	if err != nil {
		return nil, err
	}
	return chartLoader.Load(path)
}
```

Which has much the same effect.
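For context on the `json.RawMessage` values field: the CR can carry arbitrary chart values, passed through verbatim to helm. A hypothetical Ak resource (the apiVersion is guessed from the `akmv1a1` alias; field names follow the struct tags above) might look like:

```yaml
apiVersion: akm.goauthentik.io/v1alpha1
kind: Ak
metadata:
  name: ak
  namespace: auth
spec:
  values:
    # Unmarshalled into the values map handed to UpgradeOrInstallChart
    replicas: 2
    ingress:
      enabled: true
```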
This just bit us pretty badly. We spent a few hours on an infinite reconcile loop stemming from this code in the helm chart template:
```yaml
metadata:
  labels:
    app: {{ .Values.metadata.name }}
    backstage.io/kubernetes-id: {{ .Values.metadata.annotations.backstageName }}
    kuma.io/region: {{ include "webapp.cluster.region" . }} # <--- this line
```

which calls this helper:

```yaml
{{- define "webapp.cluster.region" -}}
{{- $region := "noregion" }}
{{- if $config := (lookup "v1" "ConfigMap" "runway-operator" "kubernetes-info") -}}
{{- $region = $config.data.region_name -}}
{{- end }}
{{- printf "%v" $region -}}
{{- end -}}
```
When the release spec is fetched, it doesn't account for lookup values and therefore sees the two manifests as different. Is this considered a bug or expected behavior?
Lookup values should be accounted for. The release secret of the current release should have the value populated from the lookup function. The next time reconcile happens, I would expect the dry-run upgrade to also populate the value from the lookup. If the value of the lookup changes, that would trigger a diff in the manifest and a real upgrade would proceed. Can you turn logging up to level 4 to get the release diff in the logs and see what the difference is (if there is a difference at all)?

helm-operator-plugins/pkg/reconciler/reconciler.go, lines 755 to 757 in 8c7e1e2
Update: it's because of this line.

When DryRun is set to true, lookup values are not respected, and the upgrade sameness check will always fail because the template has default values while the actual release has the looked-up values.
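To make the failure mode concrete, here is a minimal sketch of a dry-run render against the Helm SDK (`cfg`, `name`, `chrt`, and `vals` are placeholders, not names from the reconciler source):

```go
import (
	"helm.sh/helm/v3/pkg/action"
	"helm.sh/helm/v3/pkg/chart"
)

// dryRunRender renders an upgrade the way the sameness check does. With
// DryRun set, Helm renders the chart without consulting the live cluster,
// so `lookup` returns an empty object and the template falls back to its
// defaults ("noregion" in the helper above). The returned manifest then
// never matches the deployed release's manifest.
func dryRunRender(cfg *action.Configuration, name string, chrt *chart.Chart, vals map[string]interface{}) (string, error) {
	upgrade := action.NewUpgrade(cfg)
	upgrade.DryRun = true
	rel, err := upgrade.Run(name, chrt, vals)
	if err != nil {
		return "", err
	}
	return rel.Manifest, nil
}
```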
@joelanford Apologies for the delay; I didn't see your comment. Here is a short two-second log dump run: https://pastebin.com/WKCRk8Ns. The diff that it logs every time is identical. The actual upgrade check, done here:

helm-operator-plugins/pkg/reconciler/reconciler.go, lines 727 to 730 in ebfbea8

is done with a dry-run flag, which does not actually look up values in the cluster, so the string comparison fails because it fell back to the default value. As documented here: https://helm.sh/docs/chart_template_guide/functions_and_pipelines/#using-the-lookup-function
Oh, it is unfortunate that the lookup function is non-functional during dry runs. The dry run and manifest comparison are fundamental aspects of the reconciliation, and we have not been able to find an alternative. Some context is documented here: operator-framework/operator-sdk#5059 (comment)
@joelanford: Would it be possible to use the new server-side dry-run support?
We can certainly try that out! My initial instinct is that we can just change the existing client-side dry run used to calculate the manifest diff to a server-side dry run. Do we know of any examples where switching from client dry run to server dry run would break existing usage? The goal of the dry-run manifest generation is to simulate a real upgrade as closely as possible. So my personal feeling is that since a real upgrade obviously contacts the server and supports templates with the lookup function, the dry run should too. Can you try using the code from #344 to see if it would resolve this issue?
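For reference, a minimal sketch of what that switch might look like, assuming a recent Helm SDK (v3.13+, where the upgrade action exposes a `DryRunOption` field; `cfg`, `name`, `chrt`, and `vals` are placeholders):

```go
import (
	"helm.sh/helm/v3/pkg/action"
	"helm.sh/helm/v3/pkg/chart"
	"helm.sh/helm/v3/pkg/release"
)

// serverSideDryRun renders via a server-side dry run: the API server is
// contacted during rendering, so template functions like `lookup` return
// live objects instead of falling back to chart defaults.
func serverSideDryRun(cfg *action.Configuration, name string, chrt *chart.Chart, vals map[string]interface{}) (*release.Release, error) {
	upgrade := action.NewUpgrade(cfg)
	upgrade.DryRun = true
	upgrade.DryRunOption = "server" // "client" or "none" would skip cluster lookups
	return upgrade.Run(name, chrt, vals)
}
```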
@miquella If you have a normal helm operator, this feature is already in operator-sdk, and can be enabled by updating the …
@joelanford: I apparently crossed some wires in my research 😝 It was pointed out to me that our current solution isn't using the hybrid operator, so I don't think I have a way to test your change. But it sounds like we can take advantage of the solution that @acornett21 suggested! Thanks, guys!
Hey, I have been building a hybrid operator (https://gitlab.com/GeorgeRaven/authentik-manager/-/tree/staging/operator) with the helm controller, which attempts to reconcile one of my helm charts (https://gitlab.com/GeorgeRaven/authentik-manager/-/tree/staging/operator/helm-charts/ak).
This works overall; however, on closer inspection I noticed that the reconciler repeatedly tries to reconcile a resource that it later recognizes as correct or requiring no change. This leads to extremely high resource usage.
I believe this is related to helm lookup functions and attempting to impute values from already-existing resources. The reason I say this is that the affected resource appears to be one of my secrets:
which I believe may be tripping this check, causing repeat reconciles:
helm-operator-plugins/pkg/reconciler/reconciler.go, lines 727 to 730 in ebfbea8
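For readers without the source handy, a hypothetical paraphrase of the kind of check referenced above (not the verbatim code; `doUpgrade` is an invented helper name) is:

```go
// The reconciler compares the manifest rendered by a dry-run upgrade with
// the manifest stored in the deployed release; any byte difference forces
// a real upgrade. Because lookup-backed values are absent from the
// dry-run render, the strings differ on every pass and the loop repeats.
if specRelease.Manifest != currentRelease.Manifest {
	return r.doUpgrade(ctx, obj) // hypothetical upgrade helper
}
```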
In contrast, the actual action-client reconciler is likely properly generating an empty patch:
helm-operator-plugins/pkg/client/actionclient.go, lines 276 to 313 in ebfbea8
This also affects my sub-charts (Bitnami Redis, Bitnami PostgreSQL), which also use the lookup function and fail, although they are disabled here to focus on the clearer example.
Here is a full log of a run where I apply my watched-for helm resource:
I can confirm it is related to the secret, as when it is disabled I have no such reconciliation problems:
If I am doing anything particularly wrong, please let me know. I will try to see if there is a specific in-code fix possible, and potentially PR it if I find one.