storcon: separate drivers for pageserver and safekeeper heartbeats #10967
Labels: a/tech_debt (Area: related to tech debt), c/storage/controller (Component: Storage Controller), c/storage (Component: storage)
The storage controller heartbeats all pageservers and safekeepers in a region.
The heartbeats are sent concurrently, but the code that handles the results waits
for both futures to complete.
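A minimal sketch of why this coupling matters, under stated assumptions: it uses `std::thread` rather than tokio futures so it has no dependencies, and `heartbeat_pageservers`, `heartbeat_safekeepers` and `coupled_round` are hypothetical stand-ins, not storage controller APIs. Both batches run concurrently, but because the handler joins both before acting, pageserver results can only be applied once the slower safekeeper batch has finished.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical stand-ins for the heartbeat RPCs: each sleeps for `latency`
// and returns the ids of the nodes that answered.
fn heartbeat_pageservers(latency: Duration) -> Vec<u64> {
    thread::sleep(latency);
    vec![1, 2, 3]
}

fn heartbeat_safekeepers(latency: Duration) -> Vec<u64> {
    thread::sleep(latency);
    vec![10, 11]
}

/// One round of a coupled driver: both batches run concurrently, but results
/// are handled only after *both* have completed. Returns how long it took
/// before pageserver availability could be updated.
fn coupled_round(ps_latency: Duration, sk_latency: Duration) -> Duration {
    let start = Instant::now();
    let ps = thread::spawn(move || heartbeat_pageservers(ps_latency));
    let sk = thread::spawn(move || heartbeat_safekeepers(sk_latency));

    // Analogous to joining both futures: the slow side gates the fast side.
    let ps_nodes = ps.join().unwrap();
    let sk_nodes = sk.join().unwrap();

    // Only at this point can pageserver node availability be updated.
    let ps_handled_after = start.elapsed();
    println!(
        "handled {} pageservers and {} safekeepers after {:?}",
        ps_nodes.len(),
        sk_nodes.len(),
        ps_handled_after
    );
    ps_handled_after
}

fn main() {
    // Fast pageservers (10ms), slow safekeepers (200ms): made-up latencies.
    coupled_round(Duration::from_millis(10), Duration::from_millis(200));
}
```

With these made-up latencies, pageserver handling is delayed by at least the full 200ms of the safekeeper batch, even though the pageservers answered after roughly 10ms.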
This means that handling of pageserver heartbeats can stall on safekeeper heartbeats,
and vice versa. Some things currently rely on up-to-date pageserver node availability
(e.g. filling a node after a restart is blocked until it transitions from the warming-up state to active).
There's a timeout in place for the safekeeper heartbeats, but since that's likely not a long-term
solution, we should have separate heartbeat drivers. The cost is a new tokio task that's
idle most of the time.