The drainer is a component that runs on the leader that services requests from
the nomad node drain
command and related workflows in the web UI. For a
play-by-play from the user's perspective, see node drain tutorial. This
document describes the internals of the drainer for Nomad developers.
The high-level workflow is that:
- The user sets the drain state of the Client ("Node") in the state store.
- Allocations are migrated according to their
migrate
block. - The drainer creates a watcher for the Node to fire an event when the work is done or the drain's deadline is reached.
- The drainer creates watchers for each job's allocs on the Node, to fire progress events.
Effectively the drainer marks allocations for migration and emits an eval, and then lets the scheduler take it from there.
There are four major components of the drainer:
-
NodeDrainer
: The entrypoint struct for thenomad/drainer
package. This struct runs a top-level event loop that's enabled only on the leader. It's configured with the three "watcher" interfaces described below. -
DrainingNodeWatcher
: A watcher interface implemented by thenodeDrainWatcher
struct inwatch_nodes.go
. Runs a loop that watches for changes to Nodes. If a Node change transitions a Node to draining, theDrainingNodeWatcher
adds the Node to its tracker. It queries the state store to get the jobs that have allocations running on that Node, and registers those jobs with theDrainingJobWatcher
. -
DrainingJobWatcher
: A watcher interface implemented by thedrainingJobWatcher
struct inwatch_jobs.go
. Runs a loop that watches jobs registered by theDrainingNodeWatcher
and allocations in the state store. The job watcher is where the job'smigrate
block is handled so that only the correct number of allocations are being drained at a time. The job watcher exposes two methods that return channels:-
Drain()
returns a channel that produces aDrainRequest
for all the allocs on these jobs that need to be drained. TheNodeDrainer
turns this request into Raft writes viaAllocUpdateDesiredTransition
. -
Migrated()
returns a channel that produces slices of allocations that have completed migration. "Completed" should mean exactly what the end user would expect; the replacement allocations have been placed,ephemeral_disk
has been migrated (if possible), and the old allocation is fully stopped.
-
-
DrainDeadlineNotifier
: A watcher interface implemented by thedeadlineHeap
struct indrain_heap.go
. Runs a loop that tracks the Nodes being drained against their deadline timers. TheNodeDrainer
can watch the channel returned by theNextBatch
method to get slices of Nodes that have failed to complete their migrations by the deadline.
There is also a collection of other minor components important to understanding the workflow:
-
Raft shims: Because the
nomad/drainer
package is not in the same package as the server code, the server configures theNodeDrainer
with shim functions that close over the small set of Raft apply functions the drainer needs. For this reason they are located innomad/drainer_shims.go
rather than thenomad/drainer
package.-
AllocUpdateDesiredTransition
includes allocation desired status changes and the evaluations that will need to be processed. -
NodesDrainComplete
includes updates for the drained Node.
-
-
drainingNode
: This struct represents the state of a single Node whose drain is being tracked. Created by theDrainingNodeWatcher
whenever a Node is marked for draining in the state store. -
DrainRequest
: This struct represents a set of allocations that should be marked for drain. Created byDrainingJobWatcher
whenever it receives a job to drain.
A note on code style: the drainer is implemented with an unusual amount of
dependency injection via factory functions that return interfaces because it has
to handle state and raft writes without being in the top-level nomad
package
itself. It also can't import the top-level nomad
package because the drainer
is instantiated by the server, and that would create a circular
import. Generally speaking we don't want to emulate this style elsewhere in
Nomad because it makes implementation harder to follow, but it makes sense in
this limited case.
The components combine into three high-level flow of events. The first is the
flow of a newly draining Node. The NodeWatcher
gets the Node from a blocking
query. It registers the job with the JobWatcher
. The JobWatcher
determines
which allocations need draining. These are polled from the Drain()
channel by
NodeDrainer
and written to raft via the AllocUpdateDesiredTransition
shim. Then the scheduler and clients picks up the changes.
flowchart TD
%% entities
clients
scheduler
user(user)
NodeDrainer([NodeDrainer])
NodeWatcher([DrainingNodeWatcher])
JobWatcher([DrainingJobWatcher])
StateStore([state store])
%% style classes
classDef component fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
classDef other fill:#d5f6ea,stroke:#1d9467
class user,clients,scheduler,StateStore other;
class NodeDrainer,NodeWatcher,JobWatcher component;
user -. "1. enable drain\nfor node" .-> StateStore
StateStore -- "2. blocking query for\nnewly draining nodes" --> NodeWatcher
NodeWatcher -- "3. RegisterJobs(jobs)" --> JobWatcher
JobWatcher -- "4. Drain(): allocs for job that need draining" --> NodeDrainer
NodeDrainer -- "5. AllocUpdateDesiredTransition\n(raft shim)" --> StateStore
StateStore -. 6. EvalDequeue .-> scheduler
StateStore -. 7. GetAllocs .-> clients
The second is when allocation migrations are complete. The clients update the
state of the migrated allocs. The JobWatcher
has a blocking query that detects
these changes. Allocs that are done migrating get sent on the Migrated()
channel that's polled by the NodeDrainer
. The NodeDrainer
determines whether
the Node is done being drained, and writes an update via the
NodesDrainComplete
raft shim.
flowchart TD
%% entities
clients
StateStore([state store])
NodeDrainer([NodeDrainer])
JobWatcher([DrainingJobWatcher])
%% style classes
classDef component fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
classDef other fill:#d5f6ea,stroke:#1d9467
class clients,StateStore other;
class NodeDrainer,JobWatcher component;
clients -. "1. UpdateAlloc" .-> StateStore
StateStore -- "2. blocking query\nfor allocs" --> JobWatcher
JobWatcher -- "3. Migrated(): allocs that are done" --> NodeDrainer
NodeDrainer -- "4. NodesDrainComplete\n(raft shim)" --> StateStore
And the third is when Nodes pass their deadline. The NodeWatcher
is
responsible for adding and removing the watch in the DeadlineNotifier
. The
DeadlineNotifier
is responsible for watching the timer. If the Node isn't
removed before the deadline, the DeadlineNotifier
tells the NodeDrainer
and
the NodeDrainer
updates the state via the NodesDrainComplete
shim. At this
point the remaining allocs will be forced to shutdown immediately.
flowchart TD
%% entities
NodeDrainer([NodeDrainer])
NodeWatcher([DrainingNodeWatcher])
DeadlineNotifier([DrainDeadlineNotifier])
StateStore([state store])
%% style classes
classDef component fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
classDef other fill:#d5f6ea,stroke:#1d9467
class StateStore other;
class NodeDrainer,NodeWatcher,DeadlineNotifier component;
NodeWatcher -- "1. Watch()" --> DeadlineNotifier
NodeWatcher -- "2a. Remove()" --> DeadlineNotifier
DeadlineNotifier -- "2b. watch the clock" --> DeadlineNotifier
DeadlineNotifier -- "3. node has passed deadline" --> NodeDrainer
NodeDrainer -- "4. NodesDrainComplete\n(raft shim)" --> StateStore