AWS provider can't delete VPC if dependencies are present #152
Comments
Also the cluster deletion process may fall into an infinite loop of other AWSCluster resource removal, i.e.: |
This E2E test has reproduced the issue: https://github.com/Mirantis/hmc/actions/runs/10800786965/job/29959547787 Here's the log artifacts from the test: e2e-test.zip In this case we create a The |
@squizzi have you enabled the ExternalResourceGC feature gate?
@Kshatrix nope, didn't realize that was a feature gate. I'll try it. If it works, perhaps we should make that a default annotation in our templates?
Unfortunately, the issue from my ticket that got merged into this one is still present:
The elb has been deleted, but the sg associated with the elb is holding the vpc hostage.
Note the |
Looking further, I don't even see external GC running. None of the logs that start the gc_service are showing up, and the GC service is supposed to get registered prior to network deletion:

    if r.ExternalResourceGC {
        gcSvc := gc.NewService(clusterScope, gc.WithGCStrategy(r.AlternativeGCStrategy))
        if gcErr := gcSvc.ReconcileDelete(ctx); gcErr != nil {
            allErrs = append(allErrs, fmt.Errorf("failed delete reconcile for gc service: %w", gcErr))
        }
    }
    if err := networkSvc.DeleteNetwork(); err != nil {
        allErrs = append(allErrs, errors.Wrap(err, "error deleting network"))
    }

Based on this, I guess it was just pure luck that the igw deletion went better. I'm going to build a custom CAPA image with some debug logging added and see if I can get to the bottom of this. I guess I should own this bug too.
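The ordering in the excerpt above can be sketched in isolation. This is a minimal, hypothetical model rather than CAPA code: `step`, `reconcileDelete`, and `simulate` are names invented for illustration, and the "leftover security group" is just a boolean. It only shows why a skipped GC step leaves network deletion failing on a dependency.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// step is a hypothetical stand-in for one delete routine (GC or network).
type step func(ctx context.Context) error

// reconcileDelete follows the ordering in the excerpt: the optional GC step
// runs first, network deletion second, and all errors are collected.
func reconcileDelete(ctx context.Context, gcEnabled bool, gc, network step) error {
	var allErrs []error
	if gcEnabled {
		if err := gc(ctx); err != nil {
			allErrs = append(allErrs, fmt.Errorf("failed delete reconcile for gc service: %w", err))
		}
	}
	if err := network(ctx); err != nil {
		allErrs = append(allErrs, fmt.Errorf("error deleting network: %w", err))
	}
	return errors.Join(allErrs...)
}

// simulate models a VPC holding one leftover security group (an assumption
// for illustration). Network deletion fails unless GC removed the group first.
func simulate(gcEnabled bool) ([]string, error) {
	var calls []string
	leftoverSG := true
	gc := func(context.Context) error {
		calls = append(calls, "gc")
		leftoverSG = false
		return nil
	}
	network := func(context.Context) error {
		calls = append(calls, "network")
		if leftoverSG {
			return errors.New("vpc has dependencies")
		}
		return nil
	}
	err := reconcileDelete(context.Background(), gcEnabled, gc, network)
	return calls, err
}

func main() {
	for _, enabled := range []bool{false, true} {
		calls, err := simulate(enabled)
		fmt.Printf("gcEnabled=%v calls=%v err=%v\n", enabled, calls, err)
	}
}
```

With the flag off, only the network step runs and it returns the dependency error, which matches the symptom described in this thread when the feature gate is not actually enabled.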
* Bump aws templates to 0.1.3
* Remove testing around CCM features from hosted template until fixed
Closes: #152
Signed-off-by: Kyle Squizzato <[email protected]>
Alright, I figured out what was going on and fixed how we enable the feature gate in the provider. It appears to have fixed the issue 🎉, but when deleting even the standalone cluster we end up with orphaned I've added a commit to the testing PR which adds these changes so they can be tested when #242 merges. If all goes well over there we can most likely consider this resolved when that merges. |
* Bump aws-*-cp templates to 0.1.3
* Bump cluster-api-provider-aws template to 0.1.2
* Delete csi-driver, ccm validation tests from aws-hosted-cp test until #290 is resolved so that we don't get stuck there and can properly test deletion.
Closes: #152
Signed-off-by: Kyle Squizzato <[email protected]>
* Break KubeClient helpers into provider specific file
* Finish aws-hosted-cp test and add comments through test to make it easier to understand.
* Use GinkgoHelper across e2e tests, populate hosted vars from AWSCluster.
* No longer rely on local registry for images in test/e2e.
* Support OS for awscli install.
* Prepend hostname to collected log artifacts.
* Support no cleanup of provider specs, differentiate ci cluster names.
* Add docs on running tests, do not wait for all providers if configured.
* Reinstantiate resource validation map on each instance of validation.
* Enable the external-gc feature via annotation, featureGate bool. (Closes: #152)
* Bump aws-*-cp templates to 0.1.3
* Bump cluster-api-provider-aws template to 0.1.2
* Improve test logging to log template name and validation phase.
* Bump k0s version to v1.30.4+k0s.0, set CCM nodeSelector to null for aws-hosted-cp. (Closes: #290)
* Break cleanup into separate job so that it is unaffected by concurrency group cancellations.
Closes: #212
Signed-off-by: Kyle Squizzato <[email protected]>
* Break KubeClient helpers into provider specific file
* Finish aws-hosted-cp test and add comments through test to make it easier to understand.
* Use GinkgoHelper across e2e tests, populate hosted vars from AWSCluster.
* No longer rely on local registry for images in test/e2e.
* Support OS for awscli install.
* Prepend hostname to collected log artifacts.
* Support no cleanup of provider specs, differentiate ci cluster names.
* Add docs on running tests, do not wait for all providers if configured.
* Reinstantiate resource validation map on each instance of validation.
* Enable the external-gc feature via annotation, featureGate bool. (Closes: #152)
* Bump aws-*-cp templates to 0.1.3
* Bump cluster-api-provider-aws template to 0.1.2
* Improve test logging to log template name and validation phase.
* Bump k0s version to v1.30.4+k0s.0, set CCM nodeSelector to null for aws-hosted-cp. (Closes: #290)
* Break cleanup into separate job so that it is unaffected by concurrency group cancellations.
* Make dev-aws-nuke target less PHONY.
Closes: #212
Signed-off-by: Kyle Squizzato <[email protected]>
* Break KubeClient helpers into provider specific file.
* Try to simplify the validation process for lots of different providers with different requirements.
* Finish aws-hosted-cp test and add comments through test to make it easier to understand.
* Use GinkgoHelper across e2e tests, populate hosted vars from AWSCluster.
* No longer rely on local registry for images in test/e2e.
* Support OS for awscli install.
* Prepend hostname to collected log artifacts.
* Support no cleanup of provider specs, differentiate ci cluster names.
* Add docs on running tests, do not wait for all providers if configured.
* Reinstantiate resource validation map on each instance of validation.
* Enable the external-gc feature via annotation, featureGate bool. (Closes: #152)
* Bump aws-*-cp templates to 0.1.3
* Bump cluster-api-provider-aws template to 0.1.2
* Improve test logging to log template name and validation phase.
* Bump k0s version to v1.30.4+k0s.0, set CCM nodeSelector to null for aws-hosted-cp. (Closes: #290)
* Break cleanup into separate job so that it is unaffected by concurrency group cancellations.
* Make dev-aws-nuke target less PHONY.
Closes: #212
Signed-off-by: Kyle Squizzato <[email protected]>
* Break KubeClient helpers into provider specific file.
* Try to simplify the validation process for lots of different providers with different requirements.
* Finish aws-hosted-cp test and add comments through test to make it easier to understand.
* Use GinkgoHelper across e2e tests, populate hosted vars from AWSCluster.
* No longer rely on local registry for images in test/e2e.
* Support OS for awscli install.
* Prepend hostname to collected log artifacts.
* Support no cleanup of provider specs, differentiate ci cluster names.
* Add docs on running tests, do not wait for all providers if configured.
* Reinstantiate resource validation map on each instance of validation.
* Enable the external-gc feature via annotation, featureGate bool. (Closes: #152)
* Bump aws-*-cp templates to 0.1.3
* Bump cluster-api-provider-aws template to 0.1.2
* Improve test logging to log template name and validation phase.
* Bump k0s version to v1.30.4+k0s.0, set CCM nodeSelector to null for aws-hosted-cp. (Closes: #290)
* Break cleanup into separate job so that it is unaffected by concurrency group cancellations.
* Make dev-aws-nuke target less PHONY.
* Only build linux/amd64 arch since CI does not need arm.
Signed-off-by: Kyle Squizzato <[email protected]>
* Break KubeClient helpers into provider specific file.
* Try to simplify the validation process for lots of different providers with different requirements.
* Finish aws-hosted-cp test and add comments through test to make it easier to understand.
* Use GinkgoHelper across e2e tests, populate hosted vars from AWSCluster.
* No longer rely on local registry for images in test/e2e.
* Support OS for awscli install.
* Prepend hostname to collected log artifacts.
* Support no cleanup of provider specs, differentiate ci cluster names.
* Add docs on running tests, do not wait for all providers if configured.
* Reinstantiate resource validation map on each instance of validation.
* Enable the external-gc feature via annotation, featureGate bool. (Closes: K0rdent#152)
* Bump aws-*-cp templates to 0.1.3
* Bump cluster-api-provider-aws template to 0.1.2
* Improve test logging to log template name and validation phase.
* Bump k0s version to v1.30.4+k0s.0, set CCM nodeSelector to null for aws-hosted-cp. (Closes: K0rdent#290)
* Break cleanup into separate job so that it is unaffected by concurrency group cancellations.
* Make dev-aws-nuke target less PHONY.
* Only build linux/amd64 arch since CI does not need arm.
Signed-off-by: Kyle Squizzato <[email protected]>
The CAPI AWS provider can’t delete a VPC if any resource (such as a load balancer or security group) is still present in it. As a result, the cluster gets stuck in the Deleting state indefinitely. The logs give no indication of which resource is preventing deletion, so the blocking resource has to be found and removed manually.
We should research whether the provider can recursively delete all resources in the VPC, or whether a fix is required in the AWS provider code base.
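To make the ask more concrete, here is a rough sketch of what naming the blocking resources could look like. Everything in it is hypothetical: the `dependent` type, the kind strings, and the IDs are placeholders, and nothing here calls the AWS API or reflects how the provider is implemented. The one ordering assumption is that a load balancer has to go before the security group attached to it.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// dependent is a hypothetical record of a resource still living in a VPC.
type dependent struct {
	Kind string // e.g. "load-balancer" or "security-group"
	ID   string
}

// deletionRank orders kinds so that resources holding references go first.
// Assumption: load balancers are removed before the security groups they use.
var deletionRank = map[string]int{
	"load-balancer":  0,
	"security-group": 1,
}

// planDeletion returns a copy of deps in the order they should be removed.
// Unknown kinds sort last so they are still reported.
func planDeletion(deps []dependent) []dependent {
	plan := append([]dependent(nil), deps...)
	rank := func(d dependent) int {
		if r, ok := deletionRank[d.Kind]; ok {
			return r
		}
		return len(deletionRank)
	}
	sort.SliceStable(plan, func(i, j int) bool { return rank(plan[i]) < rank(plan[j]) })
	return plan
}

// describeBlockers builds a log line naming every resource blocking deletion.
func describeBlockers(vpcID string, deps []dependent) string {
	if len(deps) == 0 {
		return fmt.Sprintf("vpc %s has no remaining dependents", vpcID)
	}
	parts := make([]string, 0, len(deps))
	for _, d := range planDeletion(deps) {
		parts = append(parts, d.Kind+"/"+d.ID)
	}
	return fmt.Sprintf("vpc %s cannot be deleted, blocked by: %s", vpcID, strings.Join(parts, ", "))
}

func main() {
	deps := []dependent{
		{Kind: "security-group", ID: "sg-example"},
		{Kind: "load-balancer", ID: "elb-example"},
	}
	fmt.Println(describeBlockers("vpc-example", deps))
}
```

A message in this shape would at least tell an operator what to clean up by hand, even if recursive deletion turns out not to be feasible in the provider.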