Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static HelmRepository OCI #1243

Merged
merged 4 commits into from
Nov 22, 2023
Merged

Static HelmRepository OCI #1243

merged 4 commits into from
Nov 22, 2023

Conversation

darkowlzz
Copy link
Contributor

@darkowlzz darkowlzz commented Sep 27, 2023

To address #1249, this change makes the HelmRepository of type OCI static object, no status.
With this change HelmRepository OCI objects become:

apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  creationTimestamp: "2023-09-26T13:59:24Z"
  generation: 3
  name: example
  namespace: default
  resourceVersion: "3713789"
  uid: 67efbe67-4290-4e7d-9c1c-e0fc01d5ee4a
spec:
  provider: generic
  type: oci
  url: oci://ghcr.io/stefanprodan/charts
status: {}

The empty status can't be removed as the API still has it for the default HelmRepository API.
This HelmRepository spec fields interval has been made optional with an internal default. Unless an interval is explicitly set, no interval is set on the object. The default value of timeout at the API level has been removed and an internal default is added. Unless a timeout is explicitly set, no timeout is set on the object.
Due to the default values in the HelmRepository API, the kubernetes API server automatically fills the .status.observedGeneration with value -1. This can be removed using a migration that can be performed by the HelmRepositoryReconciler. Every newly created object goes through the migration to be patched with an empty status.

For migration of the existing HelmRepository OCI objects, the HelmRepositoryReconciler performs the migration by removing finalizers and status. When converting the object from HelmRepository default to OCI, HelmRepositoryReconciler migrates the object by also removing any artifact that previously existed.

A new API level validation is added for the .spec.url field to ensure only http, https and oci URL schemes are allowed. This performs some validation to the URL that's lost by removing the status and the HelmRepositoryOCIReconciler.

The HelmRepositoryTypePredicate has be removed as all the HelmRepositories are now handled by the same reconciler. A new predicate HelmRepositoryOCIMigrationPredicate is introduced to allow migration reconciliation for objects that are of OCI type but need migration. This ensures that once migrated, these objects are not reconciled again. This predicate readily allows non-OCI HelmRepository events.

Also added a test TestHelmRepositoryReconciler_ociMigration to verify that the migration of objects are handled correctly in different scenarios.

Events related to object migration when the object type is switched:

Events:
  Type    Reason                      Age                 From               Message
  ----    ------                      ----                ----               -------
  Normal  NewArtifact                 83s                 source-controller  stored fetched index of size 63.32kB from 'https://stefanprodan.github.io/podinfo'
  Normal  Migration                   28s (x2 over 119s)  source-controller  removed artifact and finalizer to migrate to static HelmRepository of type OCI
  Normal  GarbageCollectionSucceeded  28s                 source-controller  garbage collected artifacts for deleted resource

Fix: #1249

@makkes
Copy link
Member

makkes commented Sep 27, 2023

Care to elaborate on the reason for this change?

@darkowlzz
Copy link
Contributor Author

Care to elaborate on the reason for this change?

@makkes I'll add more details in the PR description but this was a quick draft to discuss about the results of this experiment in the community meeting today.
In short, the HelmRepository OCI reconciler is only performing URL validation and registry host authentication. The authentication at HelmRepo level has been observed to be misleading. HelmChart may still encounter authentication failure. This is because HelmRepo OCI reconciler performs authentication only against the registry auth. It doesn't try to pull the chart. HelmRepository OCI also doesn't produce any artifact that HelmChart needs.
In line with Notification-controller Alerts and Providers becoming static in fluxcd/notification-controller#540, this too can become static.

@makkes
Copy link
Member

makkes commented Sep 27, 2023

Care to elaborate on the reason for this change?

@makkes I'll add more details in the PR description but this was a quick draft to discuss about the results of this experiment in the community meeting today.
In short, the HelmRepository OCI reconciler is only performing URL validation and registry host authentication. The authentication at HelmRepo level has been observed to be misleading. HelmChart may still encounter authentication failure. This is because HelmRepo OCI reconciler performs authentication only against the registry auth. It doesn't try to pull the chart. HelmRepository OCI also doesn't produce any artifact that HelmChart needs.
In line with Notification-controller Alerts and Providers becoming static in fluxcd/notification-controller#540, this too can become static.

For transparency's sake I would recommend creating an issue that describes the observed problems with the status field.

@darkowlzz darkowlzz force-pushed the helmrepo-oci-static branch 4 times, most recently from cb28864 to 4230894 Compare September 27, 2023 22:17
@darkowlzz darkowlzz added area/helm Helm related issues and pull requests area/oci OCI related issues and pull requests area/api API related issues and pull requests labels Sep 27, 2023
@darkowlzz darkowlzz force-pushed the helmrepo-oci-static branch 2 times, most recently from 93b3ac1 to 7b02790 Compare October 18, 2023 03:34
@darkowlzz darkowlzz force-pushed the helmrepo-oci-static branch from 7b02790 to 778b263 Compare October 31, 2023 19:58
@darkowlzz darkowlzz marked this pull request as ready for review October 31, 2023 20:19
Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with kstatus and Flux kustomize-controller after the status is reset, will it be stuck forever in waiting? Assuming a change is made in desired state, KC will update the HelmRepo, the API will set the observed generation to -1, then SC will reset the observed generation to 0 but the actual generation is >1.

@darkowlzz
Copy link
Contributor Author

What happens with kstatus and Flux kustomize-controller after the status is reset, will it be stuck forever in waiting? Assuming a change is made in desired state, KC will update the HelmRepo, the API will set the observed generation to -1, then SC will reset the observed generation to 0 but the actual generation is >1.

I may be missing something but I'm not sure what exactly you mean by kstatus getting stuck in waiting. In fluxcd/notification-controller#540 (comment), we discussed about how empty status is considered ready by kstatus. And I also created fluxcd/flux2#4311 to verify that theory for alerts, providers and helmrepo OCI. The migration process ensure that status is removed from the object if anyone adds it back. Because I'm not sure about the concern, I tried computing the helmrepo object status using kstatus compute function. When the observed generation is -1, kstatus results in inprogress. When the client polls again and the object undergoes migration and status is removed, kstatus checkGeneration() which looks for status.observedGeneration results in found=false and observedGeneration=0, which returns an empty result, not in progress. The other checks that kstatus performs are based on ready condition. Since there's no condition, the object is assumed to be ready.
Does this answer the question or did I misunderstand the question?

@stefanprodan
Copy link
Member

The controller does not remove the status, instead it does obj.Status = helmv1.HelmRepositoryStatus{}, when kustomize-controller applies an update, the Kubernetes API will see an empty status and I guess it will apply the default from here: // +kubebuilder:default={"observedGeneration":-1}.

@darkowlzz
Copy link
Contributor Author

darkowlzz commented Nov 7, 2023

The controller does not remove the status, instead it does obj.Status = helmv1.HelmRepositoryStatus{}, when kustomize-controller applies an update, the Kubernetes API will see an empty status and I guess it will apply the default from here: // +kubebuilder:default={"observedGeneration":-1}.

As I mentioned above, even if this happens, the migration would make it work nicely. Any update will emit an event that'll trigger the migration.

But to investigate this behavior further, why don't I seeing any defaulting in the status on updates, I read some docs and tried a few things. As per https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#defaulting-and-nullable, defaulting happens based on the null value of fields. Our API fields are not nullable so a null value is removed unless they have a default value.

Null values for fields that either don't specify the nullable flag, or give it a false value, will be pruned before defaulting happens. If a default is present, it will be applied.

Because of this, when a new object is created without any status, the null value of the status is replaced by the configured default value by the API server. Hence, we see

status:
  observedGeneration: -1

null is how a field can be set to null or just removing the field also makes it null. But null is different from {}, which is more like zero value of a type. If I edit the status (I used kubectl edit --subresource=status ... to edit) to be {}, it doesn't result in any defaulting. The result is:

status: {}

If I again edit the status to remove {}:

status:

or add null:

status: null

it results in the default to be applied.

status:
  observedGeneration: -1

Since the HelmRepository OCI migration process sets the status to be {}, I think we are safe from any unexpected defaulting.

@stefanprodan
Copy link
Member

Since the HelmRepository OCI migration process sets the status to be {}, I think we are safe from any unexpected defaulting.

Ok great, thanks for investigating this.

Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@darkowlzz darkowlzz force-pushed the helmrepo-oci-static branch 2 times, most recently from 4777769 to 2ca1247 Compare November 22, 2023 15:14
Remove the HelmRepositoryOCI reconciler and make HelmRepository of type
OCI static. The existing HelmRepository OCI objects are migrated to
static object by removing their finalizers and status. New
HelmRepository OCI objects go through one time migration to remove the
status. These are not reconciled again, unless the type is changed to
default. On type switching from HelmRepository default to OCI, the
finalizer, status and artifact are removed to make the object static. On
switching from OCI to default, a complete reconciliation of
HelmRepository takes place to build artifact and add status and
finalizer.

The HelmRepository .spec.url has a new validation to check the URL
scheme. This is to add some validation to HelmRepository OCI since it's
not backed by a reconciler for full validation.

Add HelmRepositoryOCIMigrationPredicate predicate to detect and allow
reconciliation of HelmRepository OCI objects that need migration. The
other predicates that filtered the HelmRepository events based on the
type have been removed as all the HelmRepositories will now be
reconciled by a single reconciler. HelmRepositoryOCIMigrationPredicate
readily allows non-OCI objects and only checks if a migration is needed
for OCI type object.

Add controller tests for different migration scenarios.

Signed-off-by: Sunny <[email protected]>
With static HelmRepository OCI, the interval become optional. Make
interval optional in the API. Introduce getters for interval, in the
form of GetRequeueAfter(), and timeout with internal default values.

HelmRepository will not have interval and timeout fields unless it's
explicitly set.

Signed-off-by: Sunny <[email protected]>
Although all the APIs had interval as a required field, when tests
objects were created, they had the zero value of interval, which the API
server accepts. A zero interval value results in the test objects to
reconcile only once when they are created and never reconcile again
unless there's an update to the object. Most of the tests worked with
this behavior.

With HelmRepository removing the interval requirement and adding an
internal default, all the HelmRepository objects created in the tests
without any interval have a default interval value which results in
objects to reconcile automatically if they are not cleaned up after
running tests. TestHelmRepositoryReconciler_InMemoryCaching and
TestHelmChartReconciler_Reconcile create HelmRepository but doesn't
delete it at the end. This leads to a reconciliation of HelmRepository
outside of the test in the envtest environment. It just happened to be
that the reconciliation time matches with the end of test time. At the
end of the test run, the reconcilers receive shutdown signal and any
test server, like helmrepository server, are stopped. A HelmRepository
reconciliation triggered just before the shutdown signal gets stuck in
the reconciliation. HelmRepository can't download the index as the test
index server has stopped and hangs for some time. The HelmRepository
reconciler worker remains in active state, unlike other reconciler
workers that shut down, resulting in the test to timeout at the end.

The is fixed by deleting the HelmRepository object created in
TestHelmRepositoryReconciler_InMemoryCaching and
TestHelmChartReconciler_Reconcile at the end of the test similar to
other tests.

Signed-off-by: Sunny <[email protected]>
Copy link
Member

@stefanprodan stefanprodan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Went over it one more time, thanks @darkowlzz 🏅

@stefanprodan stefanprodan merged commit 936cfd6 into main Nov 22, 2023
10 checks passed
@stefanprodan stefanprodan deleted the helmrepo-oci-static branch November 22, 2023 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api API related issues and pull requests area/helm Helm related issues and pull requests area/oci OCI related issues and pull requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confusing HelmRepository OCI status
5 participants