Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zenoh::open crashes in single threaded environment #835

Closed
stepkun opened this issue Mar 15, 2024 · 9 comments · Fixed by #844
Closed

zenoh::open crashes in single threaded environment #835

stepkun opened this issue Mar 15, 2024 · 9 comments · Fixed by #844
Labels
bug Something isn't working

Comments

@stepkun
Copy link

stepkun commented Mar 15, 2024

Describe the bug

With current main branch (#c401e3a( zenoh seem not to work anymore in a single threaded tokio environment. It worked with #ba50eec

To reproduce

can be produced with

	#[tokio::test]
	async fn zenoh_create_default() {
		let _zenoh = zenoh::open(config::default()).res_sync();
	}

will panic with something like

---- com::communicator::tests::zenoh_create_default stdout ----
thread '...::tests::zenoh_create_default' panicked at ~/.cargo/git/checkouts/zenoh-cc237f2570fab813/c401e3a/commons/zenoh-runtime/src/lib.rs:113:9:
can call blocking only when running on the multi-threaded runtime
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

System info

Plattform; ubuntu 22.04 on Intel i7 gen 4

@stepkun stepkun added the bug Something isn't working label Mar 15, 2024
@stepkun
Copy link
Author

stepkun commented Mar 15, 2024

Created a few more tests - the first one succeeds the latter two not:

	#[test]
	fn zenoh_create_default_sync() {
		let _zenoh = zenoh::open(config::default()).res_sync();
	}

	#[tokio::test(flavor = "current_thread")]
	async fn zenoh_create_default_sync_in_async() {
		let _zenoh = zenoh::open(config::default()).res_sync();
	}

	#[tokio::test(flavor = "current_thread")]
	async fn zenoh_create_default_async() {
		let _zenoh = zenoh::open(config::default()).res_async().await;
	}

@Mallets
Copy link
Member

Mallets commented Mar 18, 2024

@YuanYuYuan would you mind performing some additional test on this issue?

@YuanYuYuan
Copy link
Contributor

YuanYuYuan commented Mar 18, 2024

I'm still studying a perfect solution to it. Try to summarise what I know so far.

  1. Recap: Currently, we have demands of nested block_on in zenoh whereas it's not allowed in tokio due to the error below.
fn main() {
    let rt1 = Builder::new_multi_thread().enable_all().build().unwrap();
    rt1.block_on(async {
        let rt2 = Builder::new_multi_thread().enable_all().build().unwrap();
        rt2.block_on(async {
            println!("Done");
        })
    });
}
Cannot start a runtime from within a runtime. This happens because a function (like `block_on`) attempted to block the current thread while the thread is being used to drive asynchronous tasks.

To work around it, we wrap the block_on with a block_in_place.

fn main() {
    let rt1 = Builder::new_multi_thread().enable_all().build().unwrap();
    rt1.block_on(async {
        let rt2 = Builder::new_multi_thread().enable_all().build().unwrap();
        tokio::task::block_in_place(move ||
            rt2.block_on(async {
                println!("Done");
            })
        )
    });
}

This lets us block_on a future on our ZRuntime (dedicated tokio runtimes under the hood). The trick is simply mark the current thread as a non-worker thread and behaves like a thread outside the runtime (Ref). Like the example below.

fn main() {
    let rt1 = Builder::new_multi_thread().enable_all().build().unwrap();
    rt1.block_on(async {
        std::thread::spawn(|| {
            let rt2 = Builder::new_multi_thread().enable_all().build().unwrap();
            rt2.block_on(async {
                println!("Done");
            })
        })
        .join()
        .unwrap();
    });
}
  1. Root cause: The macro #[tokio::main(flavor = "current_thread")] and #[tokio::main(flavor = "multi_thread")] can be expanded to
fn main() {
    Builder::new_current_thread().enable_all().build().unwrap().block_on(async { ... });
}

and

fn main() {
    Builder::new_multi_thread().enable_all().build().unwrap().block_on(async { ... });
}

The issue @stepkun reported here is tokio restricts block_in_place in a current thread scheduler (https://docs.rs/tokio/latest/tokio/task/fn.block_in_place.html)

fn main() {
    let rt1 = Builder::new_current_thread().enable_all().build().unwrap();    // <-- current thread scheduler
    rt1.block_on(async {
        let rt2 = Builder::new_multi_thread().enable_all().build().unwrap();
        tokio::task::block_in_place(move ||                                   // <-- error
            rt2.block_on(async {
                println!("Done");
            })
        )
    });
}
can call blocking only when running on the multi-threaded runtime

Although we can use std::thread::spawn to tackle this,

fn main() {
    let rt1 = Builder::new_current_thread().enable_all().build().unwrap();
    rt1.block_on(async {
        std::thread::spawn(|| {
            let rt2 = Builder::new_multi_thread().enable_all().build().unwrap();
            rt2.block_on(async {
                println!("Done");
            })
        })
        .join()
        .unwrap();
    });
}

it's not easy to adopt this solution since std::thread::spawn(f) forces the function f and its return be 'static + Send and would change our codebase a lot. (Using either a lightweighted executor futures-lite, tokio's EneterGuard, or a message passing channel like flume can not solve the problem.)

  1. Workaround: Basically, the issue is caused by the interaction between the current thread scheduler and the block_in_place within zenoh library. The issue disappears once the user don't use it. For instance, the following examples work normally.
#[async_std::main]
async fn main() {
    let rt = Builder::new_multi_thread().enable_all().build().unwrap();
    tokio::task::block_in_place(move ||
        rt.block_on(async {
            println!("Done");
        })
    )
}

fn main() {
    let rt = Builder::new_multi_thread().enable_all().build().unwrap();
    tokio::task::block_in_place(move ||
        rt.block_on(async {
            println!("Done");
        })
    )
}

Furthermore, users still can call zenoh from a single threaded environment

#[tokio::main(flavor = "multi_thread", worker_threads = 1)]                        // Use one single thread to call zenoh (and process maybe other stuffs)
async fn main() {
    let rt = Builder::new_multi_thread()
        .thread_name("ZRuntime")
        .enable_all()
        .build()
        .unwrap();
    tokio::task::block_in_place(move || {
        rt.block_on(async {
            tokio::time::sleep(std::time::Duration::from_secs(10000)).await;
            println!("Done");
        })
    })
}
           PID       User   │ TTY    CPU MEM TCP CPU Time │ Command
                            │        [%] [%]              │
 └┬─────── 1         root   │        0.0 0.1     00:00:04 │ /sbin/init
  └┬────── 6172      user   │        0.0 0.1  [] 00:31:44 │ tmux attach-session
   └┬───── 2420082   user   │ pts/3  0.0 0.0  [] 00:00:05 │ -zsh
    └┬──── 3442553   user   │ pts/3  0.0 0.0  [] 00:00:00 │ target/debug/tokio-dev
     ├──── [3442664] user   │ pts/3  0.0 0.0     00:00:00 │ tokio-runtime-w          // The so-called single threaded environment
     ├──── [3442665] user   │ pts/3  0.0 0.0     00:00:00 │ ZRuntime                 // ZRuntimes are used internally
     ├──── [3442666] user   │ pts/3  0.0 0.0     00:00:00 │ ZRuntime
     ├──── [3442667] user   │ pts/3  0.0 0.0     00:00:00 │ ZRuntime
     ├──── [3442668] user   │ pts/3  0.0 0.0     00:00:00 │ ZRuntime
     ├──── [3442669] user   │ pts/3  0.0 0.0     00:00:00 │ ZRuntime
     ├──── [3442670] user   │ pts/3  0.0 0.0     00:00:00 │ ZRuntime
     ├──── [3442671] user   │ pts/3  0.0 0.0     00:00:00 │ ZRuntime
     └──── [3442672] user   │ pts/3  0.0 0.0     00:00:00 │ ZRuntime

There might be confusion between the single-threaded environment/runtime and tokio's current thread scheduler. Actually we still can use single-threaded runtime with a multi thread scheduler. On the other hand, the current thread scheduler is designed for single thread and any blocking on it should be careful.

As a quick fix to this issue, we may forbid the calling of zenoh from a current thread scheduler. And a long term solution is to remove the unsoundness of nested block_on in our codes.

*Bonus: I found that an additional Handle can solve this problem as well. But as I said above it's error-prone and only Runtime::block_on can drive the IO and timers(https://docs.rs/tokio/latest/tokio/runtime/struct.Handle.html#method.block_on). Plus, this way doesn't work in macro #[tokio::main] .

fn main() {
    let rt1 = Builder::new_current_thread().enable_all().build().unwrap();
    rt1.handle().block_on(async {
        let rt2 = Builder::new_multi_thread().enable_all().build().unwrap();
        tokio::task::block_in_place(move ||
            rt2.block_on(async {
                println!("Done");
            })
        )
    });
}

@stepkun, thanks for reporting the issue and any comment is welcome!

@stepkun
Copy link
Author

stepkun commented Mar 18, 2024

I also only found possible solutions using (hidden) handles.
This one:

fn main() {
    let rt1 = Builder::new_current_thread().enable_all().build().unwrap();
    rt1.block_on(async move {
        let _r = tokio::task::spawn(async {
            println!("Done");
        }).await;
    });
}

Or this, if a non async closure is needed:

fn main() {
    let rt1 = Builder::new_current_thread().enable_all().build().unwrap();
    rt1.block_on(async {
        let _r = tokio::task::spawn_blocking(move || {
            println!("Done");
        }).await;
    });
}

credits to: Alice Rhyl

@YuanYuYuan
Copy link
Contributor

I also only found possible solutions using (hidden) handles. This one:

fn main() {
    let rt1 = Builder::new_current_thread().enable_all().build().unwrap();
    rt1.block_on(async move {
        let _r = tokio::task::spawn(async {
            println!("Done");
        }).await;
    });
}

Or this, if a non async closure is needed:

fn main() {
    let rt1 = Builder::new_current_thread().enable_all().build().unwrap();
    rt1.block_on(async {
        let _r = tokio::task::spawn_blocking(move || {
            println!("Done");
        }).await;
    });
}

credits to: Alice Rhyl

Yes, we have studied Alice's post before. Unfortunately, spawnand spawn_blocking have the same problem as std::thread::spawn that it would impose the condition 'static + Send. 🤷‍♂️

YuanYuYuan added a commit to YuanYuYuan/zenoh that referenced this issue Mar 19, 2024
@stepkun
Copy link
Author

stepkun commented Mar 19, 2024

Yes that is inevitable.
I had to ensure it too within my coding when switching from async-std to tokio.

@YuanYuYuan
Copy link
Contributor

YuanYuYuan commented Mar 19, 2024

BTW, I just found that both #[tokio::main] and tokio::runtime::Runtime::new use multi-threaded runtime by default. However, #[tokio::test] uses the current thread runtime and therefore has this problem. According to tokio's doc, the current thread runtime is designed for specific usage and has some limitation. That's why they suggest using multi-threaded runtime for general purposes.

@YuanYuYuan
Copy link
Contributor

Yes that is inevitable. I had to ensure it too within my coding when switching from async-std to tokio.

We do have the same experience a few months ago. Feel free to contact us if you have any issues. You can either open an issue or leave a message in Discord. 😃

According to Alice's words, async-std tends to process this problematic stuff silently. It may be a good procedure to inform users of the risk beforehand.

@stepkun
Copy link
Author

stepkun commented Mar 19, 2024

Thank you.
I totally agree with Alice's words. With async-std I had to serialize a bunch of tests because they sometimes went into a neverending state.
With tokio this is no longer necessary.

Mallets pushed a commit that referenced this issue Mar 19, 2024
* Fix issue #835

* Bugfix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants