starving timeout with async_io on closed connections #6971

Open
flumm opened this issue Nov 13, 2024 · 7 comments
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-io Module: tokio/io

Comments

@flumm

flumm commented Nov 13, 2024

Version

tokio v1.41.1
tokio-macros v2.4.0

Platform
Linux hostname 6.11.0-1-pve #1 SMP PREEMPT_DYNAMIC PMX 6.11.0-1 (2024-10-23T15:32Z) x86_64 GNU/Linux

Description
Is it expected behaviour that I can easily prevent a timeout from firing by using async_io on a closed connection?

I tried this code:

use std::{io, time::Duration};

use anyhow::Error;
use tokio::io::Interest;

#[tokio::main]
async fn main() -> Result<(), Error> {
    let server = tokio::net::TcpListener::bind("127.0.0.1:65000").await?;

    let (incoming, _socket) = server.accept().await?;
    eprintln!("connection established");

    let future = incoming.async_io(Interest::READABLE, || {
        Err::<(), io::Error>(io::ErrorKind::WouldBlock.into())
    });

    let _ = tokio::time::timeout(Duration::from_secs(5), future).await;

    eprintln!("timeout reached");

    Ok(())
}

(In our real code we call peek on the underlying socket to check if there is enough data available, and if not, we return WouldBlock)

After that, I connect to the server with nc localhost 65000.

If I cancel the nc connection (with CTRL+C) before the 5 seconds are up, the timeout is never reached and the program loops forever in the async_io callback.
If I don't close the connection (regardless of whether I send data), the timeout is reached normally.

I would have expected the timeout to trigger in both cases, but it does not.

@flumm flumm added A-tokio Area: The main tokio crate C-bug Category: This is a bug. labels Nov 13, 2024
@Darksonn Darksonn added the M-io Module: tokio/io label Nov 13, 2024
@Darksonn
Contributor

Please share the actual code you are using. Your example violates the API contract of async_io by returning WouldBlock without having received that error from the OS in a syscall on the socket.
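
For illustration, a minimal, hedged sketch (not the reporter's code; the helper name and buffer size are made up) of a closure that keeps the contract: it performs a real non-blocking syscall and returns the OS result unchanged, so WouldBlock is only reported when the peek syscall itself returned it.

use std::io;
use std::mem::ManuallyDrop;
use std::os::fd::{AsRawFd, FromRawFd};
use tokio::io::Interest;
use tokio::net::TcpStream;

// Sketch: peek through the raw fd, but pass the OS result through unchanged.
async fn peek_once(stream: &TcpStream) -> io::Result<usize> {
    stream
        .async_io(Interest::READABLE, || {
            // ManuallyDrop keeps the std wrapper from closing the shared fd.
            let std_stream = ManuallyDrop::new(unsafe {
                std::net::TcpStream::from_raw_fd(stream.as_raw_fd())
            });
            let mut buf = [0u8; 16];
            // Err(WouldBlock) here comes from the OS, as the contract expects.
            std_stream.peek(&mut buf)
        })
        .await
}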

@flumm
Author

flumm commented Nov 13, 2024

The original code in the async_io callback is this:

let mut buf = [0; HANDSHAKE_BYTES_LEN];

let raw_fd = incoming_stream.as_raw_fd();
let std_stream =
    unsafe { ManuallyDrop::new(std::net::TcpStream::from_raw_fd(raw_fd)) };

let peek_res = std_stream.peek(&mut buf);

match peek_res {
    Ok(peek_len) if peek_len < HANDSHAKE_BYTES_LEN => {
        Err(io::ErrorKind::WouldBlock.into())
    }
    res => res.map(|_| contains_tls_handshake_fragment(&buf)),
}

(see the original code: https://git.proxmox.com/?p=proxmox.git;a=blob;f=proxmox-rest-server/src/connection.rs;h=3815a8f40c082410c8e0091f4a57541f616003d2;hb=refs/heads/master#l473 )

The reason we do this is that we want to look at the data sent by the client and decide whether it's a TLS handshake, so we can either continue with an encrypted connection or handle an unencrypted HTTP request.

While I understand that this violates the contract of async_io, it's still unexpected that it breaks the timeout in a way that the future never finishes, since the contract only talks about the socket in this case, not about how the future is polled/executed.
(It says:

[...] which can cause the socket to behave incorrectly.

)

@Darksonn
Contributor

Yeah, that definitely violates the API contract. Unfortunately the OS provides no way to do what you want, so you're out of luck. It's not a Tokio limitation.

As for the issue itself, I guess the request is to fail more gracefully when the API contract is violated?

@ThomasLamprecht

ThomasLamprecht commented Nov 13, 2024

One question is why the timeout future is not being scheduled anymore when running into this.

I get that we're holding this wrong, and FWIW we have a stop-gap for this specific case: we check whether the peek_len from the previous call differs from the current one, and if it stays the same we assume the socket became ready without new data being available and abort the waiting (see the sketch below). But that probably cannot work in the generic case.

Having to do that to avoid spinning is not what surprises me. Rather, for the current code I'd have expected that if the client disconnects, the tokio timeout would still get hit after 10s, leading "only" to at most 10s of a spinning thread due to us holding it wrong. So: why does the timeout future make no progress anymore, and why isn't it taken over by another thread if this one is blocked?
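
To make that stop-gap concrete, a rough sketch of the idea (hypothetical names and structure, not the actual Proxmox code): remember the byte count from the previous peek and give up when it did not grow.

use std::io;

/// Compare the current peek length with the one seen on the previous wakeup
/// and abort when it did not grow, instead of returning WouldBlock forever.
fn handshake_progress(
    peek_len: usize,
    last_peek_len: &mut Option<usize>,
    needed: usize,
) -> io::Result<()> {
    if peek_len >= needed {
        // enough bytes buffered; the caller can now inspect the handshake
        Ok(())
    } else if *last_peek_len == Some(peek_len) {
        // woken up again without new data: the peer most likely went away
        Err(io::ErrorKind::UnexpectedEof.into())
    } else {
        *last_peek_len = Some(peek_len);
        // still waiting for more data (this is the contract-violating part)
        Err(io::ErrorKind::WouldBlock.into())
    }
}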

@Darksonn
Contributor

The problem is that async_io ends up blocking the thread.

@flumm
Author

flumm commented Nov 14, 2024

The problem is that async_io ends up blocking the thread.

OK, while I get that, two things are still surprising here:

  1. async_io returns a future that is seemingly easy to turn into blocking code; IMHO a future produced by a library should not block indefinitely, even when "held wrong", or should at least mention this in the docs.
  2. The timeout future is not scheduled even when there are multiple idle threads available.
    (I also tried a "manual" timeout future with sleep + select!, but that didn't work either.)

The questions are: should async_io "yield" (or the Rust equivalent) between calls to the async_io callback, and should tokio schedule the timeout future more intelligently? A manual variant with an explicit await point is sketched below.
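
For completeness, a sketch of such a manual loop (made-up names and placeholder values, not our real code): peek via try_io and sleep briefly between attempts, so the task yields and an outer tokio::time::timeout can actually fire even when the peer has closed the connection.

use std::io;
use std::mem::ManuallyDrop;
use std::os::fd::{AsRawFd, FromRawFd};
use std::time::Duration;
use tokio::io::Interest;
use tokio::net::TcpStream;

async fn wait_for_handshake_bytes(stream: &TcpStream, needed: usize) -> io::Result<usize> {
    loop {
        stream.readable().await?;

        // Non-blocking peek on the underlying fd; try_io clears readiness
        // for us if the OS reports WouldBlock.
        let res = stream.try_io(Interest::READABLE, || {
            let std_stream = ManuallyDrop::new(unsafe {
                std::net::TcpStream::from_raw_fd(stream.as_raw_fd())
            });
            let mut buf = vec![0u8; needed];
            std_stream.peek(&mut buf)
        });

        match res {
            Ok(n) if n >= needed => return Ok(n),
            Ok(0) => return Err(io::ErrorKind::UnexpectedEof.into()), // peer closed
            Ok(_) => {}                                               // not enough data yet
            Err(e) if e.kind() == io::ErrorKind::WouldBlock => {}     // spurious readiness
            Err(e) => return Err(e),
        }

        // Explicit await point: gives the runtime a chance to poll the timer,
        // so a surrounding timeout can cancel this loop.
        tokio::time::sleep(Duration::from_millis(10)).await;
    }
}

Wrapped as tokio::time::timeout(Duration::from_secs(5), wait_for_handshake_bytes(&stream, HANDSHAKE_BYTES_LEN)), the timeout then fires even if the client disconnects early.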

@Darksonn
Contributor

The timeout future can't be scheduled for execution because it's already busy running. It's stuck inside this call to poll, which never returns as the future is blocking the thread.

It is reasonable to make async_io behave less badly when you hold it wrong, though we shouldn't make correct usage worse by yielding unnecessarily.
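
The underlying effect can be reproduced without any sockets; a minimal, made-up sketch (not from this issue's code) where the inner future never returns from a single call to poll, so the wrapping timeout never gets a chance to check its deadline, no matter how many idle worker threads exist:

use std::time::Duration;

#[tokio::main(flavor = "multi_thread", worker_threads = 4)]
async fn main() {
    // No .await point is ever reached, so poll never returns and control
    // never goes back to the runtime or to the Timeout wrapper.
    let stuck = async {
        loop {
            std::thread::sleep(Duration::from_millis(100));
        }
    };

    // Never completes: the 5s deadline can only be checked when `stuck`
    // yields, which it never does.
    let _ = tokio::time::timeout(Duration::from_secs(5), stuck).await;
    eprintln!("never reached");
}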
