Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding the cleanup command to manage finalizers #536

Closed
wants to merge 1 commit into from

Conversation

davidcassany
Copy link
Contributor

@davidcassany davidcassany commented Oct 9, 2023

This PR adds predelete helm hook in crds chart. This predelete hooks removes finalizers from machineinventories resources.

This implies the hook requires an RBAC setup. Hence this includes a role, serviceaccound and all that for the CRDS chart. Note we can't reuse the ones defined within the elemental-operator as they are likely to be already gone when this is executed. Moreover this way we can have some focused and limited set of permissions.

Fixes #515

@davidcassany davidcassany requested a review from a team as a code owner October 9, 2023 07:31
@davidcassany davidcassany marked this pull request as draft October 9, 2023 07:31
@codecov
Copy link

codecov bot commented Oct 9, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (961b0c5) 53.65% compared to head (771d048) 53.76%.
Report is 6 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #536      +/-   ##
==========================================
+ Coverage   53.65%   53.76%   +0.10%     
==========================================
  Files          39       39              
  Lines        5690     5690              
==========================================
+ Hits         3053     3059       +6     
+ Misses       2368     2364       -4     
+ Partials      269      267       -2     

see 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@kkaempf kkaempf added the kind/bug Something isn't working label Oct 9, 2023
@davidcassany davidcassany force-pushed the cleanup_command branch 9 times, most recently from e007922 to 5dd36a2 Compare October 9, 2023 12:52
@github-actions github-actions bot added the area/operator operator related changes label Oct 9, 2023
@davidcassany davidcassany force-pushed the cleanup_command branch 4 times, most recently from 1bcf564 to d3acc41 Compare October 9, 2023 20:23
@davidcassany davidcassany marked this pull request as ready for review October 9, 2023 20:23
This commit is to ensures helm is capable to delete custom resources
and its definition when uninstalling the CRDS chart.

Signed-off-by: David Cassany <[email protected]>
@fgiudici
Copy link
Member

Pretty nice technical solution with the pre-deletion job 👍🏼
I took some time thinking about this PR: it seems to me we are overloading the CRDS chart.
Having the CRDS chart to remove the finalizers added by the Operator (which is in a different chart) looks wrong to me.
The CRDS chart should be kept simple and just carry the CustomResourceDefinitions: also having the need to pull an image is something that looks wrong to me for a CRDS chart.

So, I see, what should we do then? The operator should not touch the objects when being removed, as one may install a newer version later and we don't want to mess with the existing resources.

Well, I would do nothing: if you remove the CRDS chart while having the resources around, you get stuck with the finalizers and you have to fix it manually. I would just document the procedure in our docs page.
How would that look?

@davidcassany
Copy link
Contributor Author

Indeed this is an aggressive approach and adding jobs to CRDs charts feels like an overkill.

To my eyes the real issue to solve here is the weird behavior when the finalizers are around an installation of both charts is executed, leaving some resources alive (including the related CRDs) alive marked for deletion. This is a really obfuscate situation as then when reinstalling the CRDs all works fine and then when reinstalling the elemental-operator as soon the operator starts reconciling it deletes all the previous resources in cascade including the CRD, leaving the operator running in a broken installation. To fix it a simple re-uninstall of both and reinstall of both does the trick, but it all feels poorly handled.

Another option I don't know if it is actually feasible is error out on new installation if a previous one was not fully cleaned. Something like fail on CRDs chart installation (or probably the operator chart install) if some CRD is already around and marked for deletion. If we could error out in such a case documenting a manual cleanup process should be enough IMHO. So at least finding a way to prevent silently broken installations.

@fgiudici
Copy link
Member

fgiudici commented Oct 19, 2023

[...]

Another option I don't know if it is actually feasible is error out on new installation if a previous one was not fully cleaned. Something like fail on CRDs chart installation (or probably the operator chart install) if some CRD is already around and marked for deletion. If we could error out in such a case documenting a manual cleanup process should be enough IMHO. So at least finding a way to prevent silently broken installations.

Well, if we put that logic in the CRDs chart, we would end up in (almost) the same issue (crds chart "overload").
What instead if we put the logic in the elemental-operator chart? If objects have finalizers we warn the user and stop the chart installation (and maybe we add a variable to force the installation in that case, self documented in the warning).

@davidcassany
Copy link
Contributor Author

Well, if we put that logic in the CRDs chart, we would end up in (almost) the same issue (crds chart "overload").
What instead if we put the logic in the elemental-operator chart? If objects have finalizers we warn the user and stop the chart installation (and maybe we add a variable to force the installation in that case, self documented in the warning).

@fgiudici yes i have been wondering about it too. Probably this is what we should do, just add some sort of pre-install check, so before the operator is installed include some sort of crds installation check, something verify all CRDs are there and without the deletion mark. If some deletion detected we could error out and instruct to either omit the hook or remove finalizers and reinstall CRDs chart. Feels also a bit convoluted but at least the logic would be in a more appropriate chart.

@davidcassany
Copy link
Contributor Author

closing in favor of #553, thanks for the reviews so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/operator operator related changes kind/bug Something isn't working
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

e2e: cannot delete remaining Elemental cluster after uninstallation of operator
3 participants