
Patch already existing versions on channel sync #533

Merged

Conversation

@davidcassany davidcassany commented Oct 3, 2023

This PR adds an owner label to the ManagedOSVersions created by a channel. This makes it easy to later fetch the versions owned by a given channel. Having a map of owned versions makes it easy to patch already existing versions.

Note this PR does not apply the label to already existing versions. I have not handled this case, both for simplicity and because the workaround is as simple as deleting the versions and re-syncing, or deleting and recreating the channels. Moreover, this is not a regression in any case.

Deleting ManagedOSVersions is not implemented, as I am not sure we have clear criteria for doing so. In any case, most of the logic is already there and it could easily be added later if required.

Fixes #513
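
For readers who want a concrete picture, here is a minimal sketch of the idea described above, not the operator's actual code: list the versions carrying the owner label for a channel and index them by name, so already existing versions can be patched. The label key, helper name and import paths are assumptions.

```go
import (
	"context"

	elementalv1 "github.com/rancher/elemental-operator/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// channelOwnerLabel is a hypothetical label key, used here for illustration only.
const channelOwnerLabel = "example.io/channel-owner"

// ownedVersions returns the ManagedOSVersions labelled as owned by the given
// channel, indexed by name, so existing versions can be patched instead of
// blindly re-created.
func ownedVersions(ctx context.Context, cl client.Client, channel *elementalv1.ManagedOSVersionChannel) (map[string]*elementalv1.ManagedOSVersion, error) {
	list := &elementalv1.ManagedOSVersionList{}
	err := cl.List(ctx, list,
		client.InNamespace(channel.Namespace),
		client.MatchingLabels{channelOwnerLabel: channel.Name},
	)
	if err != nil {
		return nil, err
	}

	owned := map[string]*elementalv1.ManagedOSVersion{}
	for i := range list.Items {
		owned[list.Items[i].Name] = &list.Items[i]
	}
	return owned, nil
}
```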

@davidcassany davidcassany requested a review from a team as a code owner October 3, 2023 14:41
@github-actions github-actions bot added the area/tests test related changes label Oct 3, 2023

codecov bot commented Oct 3, 2023

Codecov Report

Attention: 21 lines in your changes are missing coverage. Please review.

Comparison is base (662d618) 53.62% compared to head (4010e96) 53.65%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #533      +/-   ##
==========================================
+ Coverage   53.62%   53.65%   +0.03%     
==========================================
  Files          39       39              
  Lines        5643     5690      +47     
==========================================
+ Hits         3026     3053      +27     
- Misses       2351     2368      +17     
- Partials      266      269       +3     
Files | Coverage Δ
controllers/managedosversionchannel_controller.go | 81.56% <68.65%> (-3.65%) ⬇️


fgiudici previously approved these changes Oct 4, 2023

@fgiudici (Member) left a comment:

The patch looks good.
Anyway, wondering... wouldn't it be more straightforward to add an owner reference instead of an owner label?
Having the child resources deleted along with the owner is something I would expect in any case (it would make no sense to keep the generated ManagedOSVersion resources with a label referencing the ownership of a deleted resource).

@anmazzotti (Contributor) commented:

I find the entire process a bit hard to read and follow.
I see now that we create a Pod to "load" the ManagedOSVersionChannel image and dump the JSON content into the logs, so that we can read them, extract that JSON and patch it into all the ManagedOSVersions.

I wonder if we could find a simpler approach for this, like maintaining a static .json file somewhere, even though that would most likely complicate any air-gapped scenario a bit (since it would require blob storage or something else to serve the file).

Regarding this patch, however, I have doubts about deleting old ManagedOSVersions. If the updated channel no longer includes existing (and owned) versions, should they be deleted or not? I assume not, for safety, but long term this may be a problem. I guess users can delete them manually.

Still approving despite the questions because I think the PR is OK.

@anmazzotti (Contributor) commented:

> The patch looks good. Anyway, wondering... wouldn't it be more straightforward to add an owner reference instead of an owner label? Having the child resources deleted along with the owner is something I would expect in any case (it would make no sense to keep the generated ManagedOSVersion resources with a label referencing the ownership of a deleted resource).

I think the owner reference is already there; the label is for filtered searches only.

@davidcassany (Contributor, Author) commented:

> The patch looks good. Anyway, wondering... wouldn't it be more straightforward to add an owner reference instead of an owner label?

We already have the owner reference; we always had it. But we can't query owned resources by owner in k8s: for that you have to list them all and search through their owner references, which are in turn another array. It is an O(n^2) operation. We could argue we will never have such a large number of references, but still, the implementation without the labels would have been uglier.

> Having the child resources deleted along with the owner is something I would expect in any case (it would make no sense to keep the generated ManagedOSVersion resources with a label referencing the ownership of a deleted resource).

This is already the case.
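
To make the two mechanisms in this exchange concrete, here is a rough sketch under the assumption of a standard controller-runtime setup; the function name and the `channelOwnerLabel` key from the earlier sketch are illustrative, not the operator's code. The owner reference is what gives cascade deletion when the channel goes away, while the label is what makes owned versions cheap to query.

```go
import (
	"context"

	elementalv1 "github.com/rancher/elemental-operator/api/v1beta1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// createOwnedVersion sets both ownership markers before creating a version.
func createOwnedVersion(ctx context.Context, cl client.Client, scheme *runtime.Scheme,
	channel *elementalv1.ManagedOSVersionChannel, version *elementalv1.ManagedOSVersion) error {
	// Owner reference: Kubernetes garbage collection deletes the version
	// together with its channel, as discussed above.
	if err := controllerutil.SetControllerReference(channel, version, scheme); err != nil {
		return err
	}

	// Owner label: lets the controller fetch owned versions with a single
	// label-selector List instead of walking every item's OwnerReferences
	// slice (the O(n^2) scan mentioned above).
	if version.Labels == nil {
		version.Labels = map[string]string{}
	}
	version.Labels[channelOwnerLabel] = channel.Name // hypothetical label key

	return cl.Create(ctx, version)
}
```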

@davidcassany (Contributor, Author) commented:

> I find the entire process a bit hard to read and follow. I see now that we create a Pod to "load" the ManagedOSVersionChannel image and dump the JSON content into the logs, so that we can read them, extract that JSON and patch it into all the ManagedOSVersions.

This is out of scope for this PR. This has always been the case with channels; in #529 it got refactored, but the pod was already there dumping content to stdout. The logic just moved from the syncer package directly into the controller, so that the controller owns the whole process; before, it was simply hidden in an unwieldy Sync method.

> I wonder if we could find a simpler approach for this, like maintaining a static .json file somewhere, even though that would most likely complicate any air-gapped scenario a bit (since it would require blob storage or something else to serve the file).

That's already possible; it is what the JSON syncer does. We also used to download it within the reconcile loop, which I consider a really bad choice (that was also fixed in #529). Sandboxing the synchronization in a separate process (Pods are a good, extensible and scalable approach in a k8s environment) is also a matter of performance and security. The reconcile loop should never block while downloading stuff from the internet.
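
For context on the flow being discussed, here is a rough sketch of the "read the JSON from the sync pod's logs" step, assuming a plain client-go clientset; the function, parameter and type names are illustrative, not the operator's actual syncer code.

```go
import (
	"context"
	"encoding/json"
	"io"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
)

// readChannelJSON streams the logs of the finished sync pod and decodes the
// JSON document the syncer container dumped to stdout. The decoded entries
// would then be used to create or patch ManagedOSVersions.
func readChannelJSON(ctx context.Context, kcl kubernetes.Interface, pod *corev1.Pod, container string) ([]map[string]interface{}, error) {
	req := kcl.CoreV1().Pods(pod.Namespace).GetLogs(pod.Name, &corev1.PodLogOptions{Container: container})
	stream, err := req.Stream(ctx)
	if err != nil {
		return nil, err
	}
	defer stream.Close()

	raw, err := io.ReadAll(stream)
	if err != nil {
		return nil, err
	}

	// Illustrative only: decode into generic maps; the operator decodes into
	// its own version types instead.
	var versions []map[string]interface{}
	if err := json.Unmarshal(raw, &versions); err != nil {
		return nil, err
	}
	return versions, nil
}
```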

@davidcassany davidcassany force-pushed the patch_owned_managedosversions branch from 582bdde to d61c309 Compare October 5, 2023 16:01
@davidcassany davidcassany requested a review from fgiudici October 5, 2023 16:01
@davidcassany davidcassany force-pushed the patch_owned_managedosversions branch from d61c309 to 401ee30 Compare October 5, 2023 16:16
@@ -140,7 +140,7 @@ func (r *ManagedOSVersionChannelReconciler) reconcile(ctx context.Context, manag
 	meta.SetStatusCondition(&managedOSVersionChannel.Status.Conditions, metav1.Condition{
 		Type:   elementalv1.ReadyCondition,
 		Reason: elementalv1.InvalidConfigurationReason,
-		Status: metav1.ConditionTrue,
+		Status: metav1.ConditionFalse,
@davidcassany (Contributor, Author) commented:

This way it also shows up in the UI; this true|false status was a bit inconsistent before. Now the channel is only in a ready state after a successful sync, and on creation syncing is the first thing it does. I don't think we need multiple conditions to track resource creation per se and synchronization status.

Also, what the UI tracks is the ready condition.
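
A minimal sketch of the condition handling described in this comment, using the apimachinery helpers; the reason values and function names are placeholders, the operator uses its own constants.

```go
import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// While a sync is in progress or has failed, the ready condition is False,
// so the channel only reports Ready after a successful sync.
func setSyncing(conditions *[]metav1.Condition, reason, msg string) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:    "Ready",               // elementalv1.ReadyCondition in the operator
		Status:  metav1.ConditionFalse, // not ready until the sync succeeds
		Reason:  reason,
		Message: msg,
	})
}

// Once the pod succeeded and the versions were patched, flip it to True.
func setSynced(conditions *[]metav1.Condition, reason string) {
	meta.SetStatusCondition(conditions, metav1.Condition{
		Type:   "Ready",
		Status: metav1.ConditionTrue,
		Reason: reason,
	})
}
```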

@@ -204,18 +204,22 @@ func (r *ManagedOSVersionChannelReconciler) reconcile(ctx context.Context, manag
 		return ctrl.Result{}, err
 	}

-	return r.handleSyncPod(ctx, pod, managedOSVersionChannel, interval)
+	return r.handleSyncPod(ctx, pod, managedOSVersionChannel, interval), nil
@davidcassany (Contributor, Author) commented:

Once the pod has started we never return an error; at most we track the failed status in the ready condition. This is to prevent infinite error loops: if we returned an error here we would directly trigger a new reconcile loop, which would start a new sync attempt.
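
A sketch of the pattern being described, assuming the usual controller-runtime reconciler shape rather than the operator's exact code: returning a Result with RequeueAfter instead of an error means a failure is retried at the normal sync interval, not through the controller's error backoff.

```go
import (
	"time"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// Once the sync pod exists, failures are recorded in the ready condition and
// the reconcile is simply rescheduled, never returned as an error (which
// would trigger an immediate retry loop).
func scheduleNextSync(pod *corev1.Pod, interval time.Duration) ctrl.Result {
	switch pod.Status.Phase {
	case corev1.PodSucceeded, corev1.PodFailed:
		// Terminal phases: sync again after the configured interval.
		return ctrl.Result{RequeueAfter: interval}
	default:
		// Pod still running: check back shortly (arbitrary, illustrative delay).
		return ctrl.Result{RequeueAfter: 10 * time.Second}
	}
}
```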

if err != nil {
logger := ctrl.LoggerFrom(ctx)
now := metav1.Now()
ch.Status.LastSyncedTime = &now
@davidcassany (Contributor, Author) commented Oct 5, 2023:

So we essentially log the error and set lastSyncedTime to the current time, so it also records the time of the failure.

 	case corev1.PodSucceeded:
 		data, err = r.syncerProvider.ReadPodLogs(ctx, r.kcl, pod, displayContainer)
 		if err != nil {
-			return ctrl.Result{}, err
+			return ctrl.Result{RequeueAfter: interval}
@davidcassany (Contributor, Author) commented:

On error we just requeue for the next synchronization interval. The status (failed or not) is tracked in the logs and in the ready condition.

@davidcassany davidcassany force-pushed the patch_owned_managedosversions branch from 401ee30 to 4010e96 Compare October 5, 2023 16:17
@anmazzotti (Contributor) left a comment:

Looks good to me!

@davidcassany davidcassany merged commit b359b1e into rancher:main Oct 6, 2023
14 checks passed
@davidcassany davidcassany deleted the patch_owned_managedosversions branch October 6, 2023 14:33
Successfully merging this pull request may close these issues:
Support patching (and deleting) managedOSVersions from channel (#513)