-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
awscloud/secure-instance: fix cleaning up fleets with creation errors #4489
base: main
Are you sure you want to change the base?
Conversation
We're seeing some behaviour where create fleet is not retried and subsequently the SI cleanup fails due to the security group already being tied to an existing instance. There is no error that an instance was launched anyway.
By surfacing the output even in case of an error, the fleet ID and instance ID can be extracted if present. Thus the instance can be terminated before its dependencies are deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just minor things
if len(createFleetOutput.Instances) > 0 && len(createFleetOutput.Instances[0].InstanceIds) > 0 { | ||
secureInstance.InstanceID = createFleetOutput.Instances[0].InstanceIds[0] | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could as well move this block before the if err != nil {
and comment that we need this info regardless of err or not and remove the code duplication below, right?
but I'm fine with this, too
@@ -639,6 +650,7 @@ func doCreateFleetRetry(cfOutput *ec2.CreateFleetOutput) (bool, []string) { | |||
msg := []string{} | |||
retry := false | |||
for _, err := range cfOutput.Errors { | |||
logrus.Infof("Comparing create fleet error %s (msg: %s)", *err.ErrorCode, *err.ErrorMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe more descriptive why we compare?
logrus.Infof("Comparing create fleet error %s (msg: %s)", *err.ErrorCode, *err.ErrorMessage) | |
logrus.Infof("Checking to retry fleet create on error %s (msg: %s)", *err.ErrorCode, *err.ErrorMessage) |
By surfacing the output even in case of an error, the fleet ID and
instance ID can be extracted if present. Thus the instance can be
terminated before its dependencies are deleted.
We're seeing some behaviour where create fleet is not retried and subsequently the SI cleanup fails due to the security group already being tied to an existing instance. There is no error that an instance was launched anyway.