-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(WIP) Add a draft spec for RMN OffChain Blessing #1043
Conversation
d6610f5
to
2223947
Compare
2223947
to
0b67563
Compare
0b67563
to
80558ae
Compare
80558ae
to
a1d27cf
Compare
a1d27cf
to
31ec17f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall makes sense but is a bit tough to follow. There is an implicit state machine in the existing impl that basically matches what you have laid out here:
- Determine intervals to be used in the next round
- If previous outcome is not nil, read messages from readable chains, include those in the observation.
- In outcome, agree on intervals.
re: your point here:
the complications that can arise if a report is not successfully transmitted (as we explicitly only continue once we know the previous report has been committed).
This is typically what ShouldAccept and ShouldTransmit are used for, it seems a bit odd to me to have states regarding report acceptance onchain.
Some things that would help me personally:
- State machine diagram with clear start and terminal states
- Handling the nil outcome in the spec code
case ReportGenerated(signed_intervals): | ||
return CommitReport(signed_intervals) | ||
case _: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to figure out how token/gas prices fit in this design.
pass | ||
|
||
# TODO: doc | ||
def query(self, previous_outcome: CommitOutcome) -> CommitQuery: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are missing validation,shouldTransmit,shouldAccept
phases, is it a followup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example we need to define what's happening after e.g. shouldAccept
returns false
. In that case and based on the previous outcome state we'll keep waiting (until reaching retries limit) for the report to be committed even though this will never happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't thought much about validation, shouldTransmit, shouldAccept
honestly. I don't think there's much we can do besides exhaust the "max committed seq num" checks. How acceptable that is depends on how often we expect shouldTransmit/shouldAccept to fail.
9df28ae
to
977d264
Compare
@@ -0,0 +1,382 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe move PR to https://github.com/smartcontractkit/chainlink-ccip
977d264
to
3132f5a
Compare
- Determine intervals to be used in the next round | ||
2. BuildingReport | ||
- Build a report from the intervals determined in the previous round | ||
3. CheckingForUpdatedMaxCommittedSeqNums |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be easier to follow the states if we name them in relation to the state of the report? Like SelectingMessagesForReport, BuildingReport, WaitingForReportInclusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
observed_merkle_roots = self.flatten_merkle_root_observations(observations) | ||
signed_intervals = self.get_signed_intervals_to_report(intervals, query.signed_intervals, | ||
observed_merkle_roots) | ||
return ReportGenerated(signed_intervals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its possible that signed_intervals ends up being empty (unlikely, but say briefly >f_chain nodes goes down in either RMN or CCIP after providing a set of intervals). For that case I think we'd want to go back to ChoosingIntervals to start gathering potentially a larger batch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Absolutely, great call out!
pass | ||
|
||
# The OCR3 implementation of Outcome | ||
def outcome( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe worth noting somewhere that if this returns an error repeatedly, we could get stuck in a state as if outcome errors the next round has the same prevOutcome. However since its a pure function we expect to fully control the error scenarios
# have different merkle roots for the same chain, this chain is not included in the output. Additionally, if there | ||
# are chains that don't require RMN support, these chains will be in observed_merkle_roots but not | ||
# rmn_signed_intervals, and will be included in the output (with an empty set of RMN signatures). | ||
def get_signed_intervals_to_report( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function is worth spelling out in the spec. You'd need CCIP f_chain and RMN f_chain defined on class or as inputs. We can include them in config for now (can just copy in CommitConfig in the commit_plugin.py)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'll be simpler to implement this first in the Go code and then port it over
pass | ||
|
||
# Verify the RMN signatures on the given signed_intervals | ||
def verify_signed_intervals( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe to be called inside of get_signed_intervals_to_report? Then if an sig doesn't validate we exclude that observation but don't error
) -> Dict[ChainSelector, SignedInterval]: | ||
pass | ||
|
||
# Given a list of SequenceNumbersObservation, return a flattened consensus on the max committed sequence number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
worth noting same logic as here
func maxSeqNumsConsensus( |
3132f5a
to
367a8b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice! here are some initial questions
def request_max_seq_nums( | ||
self, | ||
chains: List[ChainSelector] | ||
) -> Dict[ChainSelector, int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be signed?
|
||
|
||
@dataclass | ||
class SignedInterval: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this name a little confusing, it's the root that's signed, no?
return ReportNotYetTransmitted(previous_max_committed_seq_nums, attempts + 1) | ||
|
||
# The OCR3 implementation of Report | ||
def report(self, outcome: CommitOutcome) -> Optional[CommitReport]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be interesting to split into multiple reports in case they get too large?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah potentially once we have enough chains s.t. roots can't easily fit in one report (long way away though)
observed_merkle_roots = self.flatten_merkle_root_observations(observations) | ||
signed_intervals = self.get_signed_intervals_to_report(intervals, query.signed_intervals, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we check that only nodes that have been assigned the role of reading from a chain include it in its intervals?
# If we are choosing the next intervals this round, we need to query RMN for the max uncommitted sequence | ||
# numbers it has for each source chain, so we can set appropriate upper ranges for our intervals. | ||
case SelectingIntervalsForReport(): | ||
rmn_max_seq_nums = self.rmn_client.request_max_seq_nums(self.all_source_chains) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this sync or async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed live - we want async because:
- Avoids long query timeout, which leads to longer deltaProgress (slowing OCR leader rotation)
- Can do the network calls themselves in parallel
pass | ||
|
||
# The OCR3 implementation of Outcome | ||
def outcome( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double checking,this function is pure, right?
return SequenceNumbersObservation(self.get_max_committed_seq_nums(), {}) | ||
|
||
# Given a list of MerkleRootObservations, return a flattened consensus on the merkle root for each source chain | ||
def flatten_merkle_root_observations(self, observations: List[CommitObservation]) -> Dict[ChainSelector, bytes]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this function work internally?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max voted on root which has at least f+1 votes
# exhausted, or return ReportNotYetTransmitted with an incremented "attempts" value otherwise | ||
case WaitingForReportTransmission(previous_max_committed_seq_nums, attempts): | ||
max_committed_seq_nums = self.flatten_max_committed_seq_nums_observations(observations) | ||
if self.max_committed_seq_nums_are_updated(max_committed_seq_nums, previous_max_committed_seq_nums): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this evaluate to true whenever they change, or only when they match what we transmitted?
dest_chain: ChainSelector | ||
chain_readers: Dict[ChainSelector, ChainReader] | ||
f: int | ||
max_check_report_persisted_attempts: int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider making this wall clock time?
consider checking vs SharedConfig
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
This PR was closed because it has been stalled for 10 days with no activity. |
Motivation
Solution