Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

approval-voting: Fix sending of assignments after restart #6973

Merged
merged 7 commits into from
Jan 14, 2025

Conversation

alexggh
Copy link
Contributor

@alexggh alexggh commented Dec 20, 2024

There is a problem on restart where nodes will not trigger their needed assignment if they were offline while the time of the assignment passed.

That happens because after restart we will hit this condition

if !t.triggered() && t.tranche() > considered {
and considered will be tick_now which is already higher than the tick of our assignment.

The fix is to schedule a wakeup for untriggered assignments at restart and let the logic of processing an wakeup decide if it needs to trigger the assignment or not.

One thing that we need to be careful here is to make sure we don't schedule the wake up immediately after restart because, the node would still be behind with all the assignments that should have received and might make it wrongfully decide it needs to trigger its assignment, so I added a RESTART_WAKEUP_DELAY: Tick = 12 which should be more than enough for the node to catch up.

Signed-off-by: Alexandru Gheorghe <[email protected]>
Copy link
Member

@ordian ordian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable.

polkadot/node/core/approval-voting/src/lib.rs Outdated Show resolved Hide resolved
polkadot/node/core/approval-voting/src/lib.rs Outdated Show resolved Hide resolved
@alexggh alexggh added the A4-needs-backport Pull request must be backported to all maintained releases. label Jan 13, 2025
@paritytech-workflow-stopper
Copy link

All GitHub workflows were cancelled due to failure one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12750710096
Failed job name: test-linux-stable-no-try-runtime

@alexggh alexggh enabled auto-merge January 14, 2025 16:12
@alexggh alexggh added this pull request to the merge queue Jan 14, 2025
Merged via the queue into master with commit d38bb95 Jan 14, 2025
192 of 204 checks passed
@alexggh alexggh deleted the alexggh/retrigger_assignments_after_restart branch January 14, 2025 17:44
github-actions bot pushed a commit that referenced this pull request Jan 14, 2025
There is a problem on restart where nodes will not trigger their needed
assignment if they were offline while the time of the assignment passed.

That happens because after restart we will hit this condition
https://github.com/paritytech/polkadot-sdk/blob/4e805ca05067f6ed970f33f9be51483185b0cc0b/polkadot/node/core/approval-voting/src/lib.rs#L2495
and considered will be `tick_now` which is already higher than the tick
of our assignment.

The fix is to schedule a wakeup for untriggered assignments at restart
and let the logic of processing an wakeup decide if it needs to trigger
the assignment or not.

One thing that we need to be careful here is to make sure we don't
schedule the wake up immediately after restart because, the node would
still be behind with all the assignments that should have received and
might make it wrongfully decide it needs to trigger its assignment, so I
added a `RESTART_WAKEUP_DELAY: Tick = 12` which should be more than
enough for the node to catch up.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: ordian <[email protected]>
Co-authored-by: Andrei Eres <[email protected]>
(cherry picked from commit d38bb95)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2407:

github-actions bot pushed a commit that referenced this pull request Jan 14, 2025
There is a problem on restart where nodes will not trigger their needed
assignment if they were offline while the time of the assignment passed.

That happens because after restart we will hit this condition
https://github.com/paritytech/polkadot-sdk/blob/4e805ca05067f6ed970f33f9be51483185b0cc0b/polkadot/node/core/approval-voting/src/lib.rs#L2495
and considered will be `tick_now` which is already higher than the tick
of our assignment.

The fix is to schedule a wakeup for untriggered assignments at restart
and let the logic of processing an wakeup decide if it needs to trigger
the assignment or not.

One thing that we need to be careful here is to make sure we don't
schedule the wake up immediately after restart because, the node would
still be behind with all the assignments that should have received and
might make it wrongfully decide it needs to trigger its assignment, so I
added a `RESTART_WAKEUP_DELAY: Tick = 12` which should be more than
enough for the node to catch up.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: ordian <[email protected]>
Co-authored-by: Andrei Eres <[email protected]>
(cherry picked from commit d38bb95)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2409:

github-actions bot pushed a commit that referenced this pull request Jan 14, 2025
There is a problem on restart where nodes will not trigger their needed
assignment if they were offline while the time of the assignment passed.

That happens because after restart we will hit this condition
https://github.com/paritytech/polkadot-sdk/blob/4e805ca05067f6ed970f33f9be51483185b0cc0b/polkadot/node/core/approval-voting/src/lib.rs#L2495
and considered will be `tick_now` which is already higher than the tick
of our assignment.

The fix is to schedule a wakeup for untriggered assignments at restart
and let the logic of processing an wakeup decide if it needs to trigger
the assignment or not.

One thing that we need to be careful here is to make sure we don't
schedule the wake up immediately after restart because, the node would
still be behind with all the assignments that should have received and
might make it wrongfully decide it needs to trigger its assignment, so I
added a `RESTART_WAKEUP_DELAY: Tick = 12` which should be more than
enough for the node to catch up.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: ordian <[email protected]>
Co-authored-by: Andrei Eres <[email protected]>
(cherry picked from commit d38bb95)
@paritytech-cmd-bot-polkadot-sdk

Successfully created backport PR for stable2412:

EgorPopelyaev pushed a commit that referenced this pull request Jan 15, 2025
Backport #6973 into `stable2409` from alexggh.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Alexandru Gheorghe <[email protected]>
EgorPopelyaev pushed a commit that referenced this pull request Jan 15, 2025
Backport #6973 into `stable2412` from alexggh.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

Co-authored-by: Alexandru Gheorghe <[email protected]>
EgorPopelyaev pushed a commit that referenced this pull request Jan 15, 2025
Backport #6973 into `stable2407` from alexggh.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Alexandru Gheorghe <[email protected]>
ordian added a commit that referenced this pull request Jan 16, 2025
* master: (33 commits)
  Implement `pallet-asset-rewards` (#3926)
  [pallet-revive] Add host function `to_account_id` (#7091)
  [pallet-revive] Remove revive events (#7164)
  [pallet-revive] Remove debug buffer (#7163)
  litep2p: Provide partial results to speedup GetRecord queries (#7099)
  [pallet-revive] Bump asset-hub westend spec version (#7176)
  Remove 0 as a special case in gas/storage meters (#6890)
  [pallet-revive] Fix `caller_is_root` return value (#7086)
  req-resp/litep2p: Reject inbound requests from banned peers (#7158)
  Add "run to block" tools (#7109)
  Fix reversed error message in DispatchInfo (#7170)
  approval-voting: Make importing of duplicate assignment idempotent (#6971)
  Parachains: Use relay chain slot for velocity measurement (#6825)
  PRDOC: Document `validate: false` (#7117)
  xcm: convert properly assets in xcmpayment apis (#7134)
  CI: Only format umbrella crate during umbrella check (#7139)
  approval-voting: Fix sending of assignments after restart (#6973)
  Retry approval on availability failure if the check is still needed (#6807)
  [pallet-revive-eth-rpc] persist eth transaction hash (#6836)
  litep2p: Sufix litep2p to the identify agent version for visibility (#7133)
  ...
Nathy-bajo pushed a commit to Nathy-bajo/polkadot-sdk that referenced this pull request Jan 21, 2025
…#6973)

There is a problem on restart where nodes will not trigger their needed
assignment if they were offline while the time of the assignment passed.

That happens because after restart we will hit this condition
https://github.com/paritytech/polkadot-sdk/blob/4e805ca05067f6ed970f33f9be51483185b0cc0b/polkadot/node/core/approval-voting/src/lib.rs#L2495
and considered will be `tick_now` which is already higher than the tick
of our assignment.

The fix is to schedule a wakeup for untriggered assignments at restart
and let the logic of processing an wakeup decide if it needs to trigger
the assignment or not.

One thing that we need to be careful here is to make sure we don't
schedule the wake up immediately after restart because, the node would
still be behind with all the assignments that should have received and
might make it wrongfully decide it needs to trigger its assignment, so I
added a `RESTART_WAKEUP_DELAY: Tick = 12` which should be more than
enough for the node to catch up.

---------

Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: ordian <[email protected]>
Co-authored-by: Andrei Eres <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A4-needs-backport Pull request must be backported to all maintained releases.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants