This PR fixes a rare race condition that only occurred in environments where the same app is migrated many times.
In particular, the bug only appeared when an application migrated away from a host and later migrated back to it. Migrating to a new host (relative to the previous scheduling decision) requires one of the migrated-to ranks to run the world initialisation, which sets the local-remote leaders and the in-memory queues. However, the migration back to the original host did not trigger this "new world" migration procedure, because the world had lingered in the per-node registry.
The bug materialised as applications holding a stale version of the host-port mappings and failing to start.
The fix is to detect when a host is being evicted for a given world id and, if so, to clear that world from the registry.
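As a rough illustration of the idea, here is a minimal sketch in Python. It is not the code changed in this PR: `World`, `WorldRegistry`, `get_or_init`, `on_migration` and the host names are all hypothetical, and the sketch assumes that a host counts as evicted when none of the world's ranks remain on it after a migration.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    world_id: int
    # rank -> (host, port), captured when the world is initialised (hypothetical layout)
    host_port_map: dict[int, tuple[str, int]]


@dataclass
class WorldRegistry:
    """Hypothetical per-node registry of initialised worlds."""
    this_host: str
    worlds: dict[int, World] = field(default_factory=dict)
    init_count: int = 0  # how many times world initialisation has run

    def get_or_init(self, world_id, host_port_map):
        # Initialisation only runs if the world is NOT already registered.
        # A lingering entry would skip it and keep the old mappings.
        if world_id not in self.worlds:
            self.worlds[world_id] = World(world_id, dict(host_port_map))
            self.init_count += 1
        return self.worlds[world_id]

    def on_migration(self, world_id, new_host_port_map):
        # Assumption: this host is evicted if no rank is mapped to it any more.
        hosts = {host for host, _ in new_host_port_map.values()}
        if self.this_host not in hosts:
            self.worlds.pop(world_id, None)  # clear the lingering world
            return True
        return False


registry = WorldRegistry(this_host="hostA")

# The world starts on hostA.
first_map = {0: ("hostA", 8000), 1: ("hostA", 8001)}
registry.get_or_init(1, first_map)

# It migrates away: hostA is evicted, so the registry entry is cleared.
away_map = {0: ("hostB", 8000), 1: ("hostB", 8001)}
evicted = registry.on_migration(1, away_map)

# It migrates back: initialisation runs again with the new mappings.
back_map = {0: ("hostA", 9000), 1: ("hostA", 9001)}
world = registry.get_or_init(1, back_map)
```

Without the `pop` in `on_migration`, the final `get_or_init` call would return the entry created from `first_map`, which is the stale-mapping situation described above.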