Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgraded concurrency framework to support panic=unwind mode #16

Merged
merged 9 commits into from
Oct 31, 2023

Conversation

pompon0
Copy link
Contributor

@pompon0 pompon0 commented Oct 27, 2023

We are going to use era-consensus code in zksync-era, which doesn't have panic=abort enabled for the time being. This PR adjust the concurrency framework so that it behaves in the expected way in the presence of panics:

  • scope::run/run_blocking will wait for all tasks to either complete OR panic
  • scope's context gets cancelled immediately if any of the tasks returns an error OR panics
  • scope::run/run_blocking will panic if any of the scope's tasks panicked (but only after all tasks complete or panic). This also means that even if task A returns an error before task B panics, the whole scope will panic anyway (A's error will be ignored).
  • the root task of the scope::run/run_blocking call will NOT be executed in the same tokio task as the run call any more (so that a panic in the root task won't prevent scope call from awaiting for completion of all tasks).

@pompon0 pompon0 requested a review from slowli as a code owner October 27, 2023 11:18
brunoffranca
brunoffranca previously approved these changes Oct 27, 2023
popzxc
popzxc previously approved these changes Oct 27, 2023
node/libs/concurrency/src/scope/mod.rs Outdated Show resolved Hide resolved
node/libs/concurrency/src/scope/mod.rs Outdated Show resolved Hide resolved
async fn replica_state(&self, ctx: &ctx::Ctx) -> StorageResult<Option<ReplicaState>> {
scope::wait_blocking(ctx, || {
async fn replica_state(&self, _ctx: &ctx::Ctx) -> StorageResult<Option<ReplicaState>> {
scope::wait_blocking(|| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've discussed this previously, but it could be worth bringing up another time. If the context is canceled, wouldn't it make sense to immediately return a cancellation error from wait_blocking, even if the provided closure runs to completion (because we cannot stop it)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't changed the semantics - scope waits for all tasks to finish no matter if they ignore the ctx cancellation or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaict, our use cases for this function are about running a blocking function in non-blocking context rather than allowing things to run after context cancellation. In general it is a bug to spawn a blocking context-unaware method from within a scope (it is against structured concurrency principle), so I don't want to facilitate such an antipattern (with the current semantics a context-unaware function should cause a deadlock which will propagate up the stack and should be relatively easy to notice). In case of rocksdb access it is ok-ish, because it is expected to finish in a bounded amount of time (in fact all filesystem accesses in unix don't have a proper notion of cancellability).

@pompon0 pompon0 dismissed stale reviews from brunoffranca and popzxc via 0c92b56 October 30, 2023 15:03
@pompon0 pompon0 merged commit cb34e4c into main Oct 31, 2023
4 checks passed
@pompon0 pompon0 deleted the gprusak-must-complete branch October 31, 2023 09:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants