Recover pipeline from brutal shutdown (checkpointing) #516
Closed
chubei
started this conversation in
Feature Requests
Replies: 2 comments 3 replies
-
Maybe this is unnecessarily complicated. With the |
Beta Was this translation helpful? Give feedback.
0 replies
-
@chubei Why not maintain a log of all |
Beta Was this translation helpful? Give feedback.
3 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
Precondition
DAG is
partially consistent
, which meanscheckpoint
.checkpoint
is less than or equal to all its parents.checkpoint
is greater than or euqal to all other source nodes.Checkpoint
A
checkpoint
is aHashMap
from source node handles to (txn id, seq number) pairs. We definecheckpoint
's compare function as followsEqual
, returnEqual
.Less
and other values compareEqual
, returnLess
.Greater
and other values compareEqual
, returnGreater
.Postcondition
DAG is
fully consistent
, which meanscheckpoint
s are equal to that of the maximum node.Algorithm
checkpoint
target
.checkpoint
, if it's less than that intarget
, run Source Replay subroutine.checkpoint
is guaranteed to betarget
.checkpoint
s are guaranteed to be equal totarget
.checkpoint
is equal totarget
, skip.checkpoint
is guaranteed to betarget
.Transaction Log
Source Replay and Receiver Replay subroutines read from and write to transaction logs.
Multiple transaction log entries are maintained for each output port. A transaction log entry looks like following
We guarantee
entry.from < entry.to
.If we sort all the transaction log entries by
from
, we guaranteeentries[i + 1].from == entries[i].to
.A node's
checkpoint
isentries.last().unwrap().to
.Source Replay
Input
current
checkpointtarget
checkpointdyn Source
.Precondition
current < target
.current
contains an entry for this node. Call its value(current txn, current_seq)
.target
contains an entry for this node. Call its values(target_txn, target_seq)
.Postcondition
current
totarget
for all output ports are added to the database.Algorithm
Start the source from
(current txn, current_seq)
, when it reaches(target_txn, target_seq)
, commit the transaction log entry and stop the source.Receiver Replay
Input
dyn Processor
ordyn Sink
.current
checkpoint.target
checkpoint.Precondition
entries.first().unwrap().from == current
,entries.last().unwrap().to ==
targetand
entries[i + 1].from == entries[i].to. Call such log entries "replay entries from
currentto
target`".Postcondition
current
totarget
for all output ports are added to the database.Algorithm
current
totarget
.ops
in all the entries and commit the transaction log entry.@snork-alt Check this?
Beta Was this translation helpful? Give feedback.
All reactions