server: fix panic if heartbeat reset happens for GC'd node (#23383)
When setting up the timer for heartbeat invalidation, there's no control that
allows us to remove that timer when the node is GC'd. If the GC window is narrow
enough, it's possible to GC a node that has a waiting heartbeat timer. In this
case, we hit a bug where querying for the node returns `nil` and this is
incorrectly handled when checking for disconnect/reconnect state. Fix this bug
by correctly handling a `nil` node and allowing the `Node.Update` RPC to fire
normally (which then errors correctly).

Fixes: #23376
Ref: https://hashicorp.atlassian.net/browse/NET-10109
tgross authored Jun 20, 2024
1 parent ca97aa5 commit ee48bdd
Showing 2 changed files with 7 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .changelog/23383.txt
@@ -0,0 +1,3 @@
```release-note:bug
server: Fixed a bug where expiring heartbeats for garbage collected nodes could panic the server
```
4 changes: 4 additions & 0 deletions nomad/heartbeat.go
@@ -183,6 +183,10 @@ func (h *nodeHeartbeater) disconnectState(id string) (bool, bool) {
 		h.logger.Error("error retrieving node by id", "error", err)
 		return false, false
 	}
+	if node == nil {
+		h.logger.Error("node not found", "node_id", id)
+		return false, false
+	}

 	// Exit if the node is already down or just initializing.
 	if node.Status == structs.NodeStatusDown || node.Status == structs.NodeStatusInit {
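
The failure mode described in the commit message can be illustrated with a small standalone sketch. It is not Nomad's code: the `Node` and `store` types, the `GC` method, and the `isDown` helper are hypothetical stand-ins, and the timer firing is simulated by calling the check directly after the node has been collected. The only behavior assumed from the commit is that a lookup for a GC'd node returns a `nil` node with no error, so the caller must guard before reading `Status`.

```go
package main

import (
	"fmt"
	"sync"
)

// Node is a hypothetical stand-in for a node record (not Nomad's structs.Node).
type Node struct {
	ID     string
	Status string
}

// store is a hypothetical in-memory state store used only for illustration.
type store struct {
	mu    sync.Mutex
	nodes map[string]*Node
}

// NodeByID returns (nil, nil) when the node does not exist, which is the
// case the commit message describes for a garbage collected node.
func (s *store) NodeByID(id string) (*Node, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.nodes[id], nil // a missing key yields a nil *Node and no error
}

// GC removes the node, simulating garbage collection while a heartbeat
// timer for that node is still pending.
func (s *store) GC(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.nodes, id)
}

// isDown is a hypothetical check in the spirit of the patched function.
// Without the nil guard, node.Status would dereference a nil pointer.
func isDown(s *store, id string) (down bool, found bool) {
	node, err := s.NodeByID(id)
	if err != nil {
		return false, false
	}
	if node == nil {
		// Node was removed before the timer fired; report "not found"
		// instead of panicking.
		return false, false
	}
	return node.Status == "down", true
}

func main() {
	s := &store{nodes: map[string]*Node{"n1": {ID: "n1", Status: "ready"}}}

	down, found := isDown(s, "n1")
	fmt.Println(down, found) // node still present

	s.GC("n1") // node collected while its heartbeat timer is outstanding

	down, found = isDown(s, "n1") // simulated timer callback after GC
	fmt.Println(down, found)
}
```

Returning a distinct "not found" result lets the caller continue on its normal path rather than crash, which matches the commit's stated intent of letting the subsequent update fail with an ordinary error.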