Revert "operator: always delete terminated pending nodes (#2545)" #2596

3u13r · 2023-11-13T18:42:09Z

This reverts commit 5267ad0.

Context

Proposed change(s)

We need to revert this. With this patch we've observed 30+ nodes being scheduled during the upgrade of a 1:1 cluster on GCP. This presumably happens because if a node is not found then it's state within the operator is "Terminated". This likely happens in the first seconds of the new node being created. With is patch the "Terminated" node already has reached its goal. It's true goal was joining the cluster though.

The idea before (and after) this patch is that the joiningNode has a deadline set until it must join the cluster. Otherwise it's goal is set to "Leaving" which is reached if the node never joined the cluster because it was shut down. This should've already handled the AWS case.

Additional info

Checklist

Add labels (e.g., for changelog category)
Link to Milestone

This reverts commit 5267ad0.

netlify · 2023-11-13T18:42:25Z

✅ Deploy Preview for constellation-docs canceled.

Name	Link
🔨 Latest commit	`c2e2ae3`
🔍 Latest deploy log	https://app.netlify.com/sites/constellation-docs/deploys/65526e036c278b00084f3205

malt3 · 2023-11-13T18:45:43Z

I agree that reverting is better. However, we should not forget about the AWS bug, where we will now schedule too many nodes. The AWS case is not handled because of weird behavior of the AWS state machine (the per instance state can be unexpected).

msanft

approving here while workaround for AWS should be implemented separately

Revert "operator: always delete terminated pending nodes (#2545)"

c2e2ae3

This reverts commit 5267ad0.

3u13r added the no changelog Change won't be listed in release changelog label Nov 13, 2023

3u13r added this to the v2.13.0 milestone Nov 13, 2023

3u13r requested a review from malt3 as a code owner November 13, 2023 18:42

3u13r requested a review from daniel-weisse November 13, 2023 18:45

msanft approved these changes Nov 13, 2023

View reviewed changes

3u13r merged commit 56ab3e9 into main Nov 13, 2023
8 checks passed

3u13r deleted the fix/revert/2545 branch November 13, 2023 19:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revert "operator: always delete terminated pending nodes (#2545)" #2596

Revert "operator: always delete terminated pending nodes (#2545)" #2596

3u13r commented Nov 13, 2023 •

edited

Loading

netlify bot commented Nov 13, 2023

malt3 commented Nov 13, 2023 •

edited

Loading

msanft left a comment

Revert "operator: always delete terminated pending nodes (#2545)" #2596

Revert "operator: always delete terminated pending nodes (#2545)" #2596

Conversation

3u13r commented Nov 13, 2023 • edited Loading

Context

Proposed change(s)

Additional info

Checklist

netlify bot commented Nov 13, 2023

✅ Deploy Preview for constellation-docs canceled.

malt3 commented Nov 13, 2023 • edited Loading

msanft left a comment

Choose a reason for hiding this comment

3u13r commented Nov 13, 2023 •

edited

Loading

malt3 commented Nov 13, 2023 •

edited

Loading