fix(cloud-hypervisor/vsock): apply a workaround for notify sockets #297

Open: wants to merge 1 commit into main

RaitoBezarius commented Nov 20, 2024

Since systemd/systemd@13b67b6, systemd shuts down the write side of its end of the VSOCK stream.

cloud-hypervisor's virtio-vsock code does not handle such a partial shutdown of the stream well.

The socat `-T` flag can serve as a workaround that masks this bug:

> Total inactivity timeout: when socat is already in the transfer loop and
> nothing has happened for `<timeout>` [timeval] seconds (no data arrived, no
> interrupt occurred...) then it terminates. Useful with protocols like UDP that
> cannot transfer EOF.

Co-authored-by: Puck Meerburg <[email protected]>
Signed-off-by: Raito Bezarius <[email protected]>

All thanks to Puck for debugging this.
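The half-close described above can be illustrated with a minimal Python sketch. It assumes an AF_UNIX `socket.socketpair()` as a local stand-in for the VSOCK stream (exercising real AF_VSOCK needs a VM), and `READY=1` is only an example sd_notify payload. `notify_and_half_close` and `read_until_eof` are hypothetical helper names, not systemd or cloud-hypervisor code. A peer that handles a partial shutdown correctly receives the message, then sees EOF, while the opposite direction stays usable.

```python
import socket


def notify_and_half_close(sock: socket.socket, message: bytes) -> None:
    """Sender side: write the message, then shut down only the write side.

    The socket stays open for reading; the peer sees EOF on its read side.
    """
    sock.sendall(message)
    sock.shutdown(socket.SHUT_WR)


def read_until_eof(sock: socket.socket) -> bytes:
    """Receiver side: read until the peer's half-close shows up as EOF."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # b"" means the peer shut down its write side
            break
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Assumption: a local stream socketpair stands in for the VSOCK connection.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    notify_and_half_close(sender, b"READY=1")
    print(read_until_eof(receiver))  # b'READY=1'
    # The reverse direction still works after the half-close.
    receiver.sendall(b"ack")
    print(sender.recv(16))  # b'ack'
    sender.close()
    receiver.close()
```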


astro (Owner) commented Nov 20, 2024

Cool, a solution, albeit it is a hack.

Will the constant timeout (and reconnect) work reliably even under high load?

As the hang occurred before the actual notification is sent, does cloud-hypervisor actually reconnect?

Do you know if there is an upstream cloud-hypervisor issue? I didn't find one.

RaitoBezarius (Author) commented

> Cool, a solution, albeit it is a hack.

I wouldn't take offense if we don't merge it; it's also up here as documentation for a quick workaround.

> Will the constant timeout (and reconnect) work reliably even under high load?

It's an inactivity timeout, not a total one: you only get disconnected after nothing has happened for the whole timeout period, at which point you are done anyway? That said, it's a fair question.
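The difference between an inactivity timeout and a fixed one can be sketched as a small relay loop. This is a simplified Python model of the quoted `-T` description, not socat's implementation: `relay_with_inactivity_timeout` is a hypothetical helper, and unlike socat it stops at the first EOF instead of half-closing and draining the other direction. Because the timeout is re-armed on every `select()` call, steady traffic never trips it; only `timeout` seconds with no data on either side end the loop.

```python
import selectors
import socket


def relay_with_inactivity_timeout(a: socket.socket, b: socket.socket,
                                  timeout: float) -> str:
    """Copy bytes between two stream sockets until EOF or until `timeout`
    seconds pass with no data on either side (simplified model of socat -T).

    Returns the reason the loop ended: "eof" or "timeout".
    """
    sel = selectors.DefaultSelector()
    sel.register(a, selectors.EVENT_READ, b)  # data read from a goes to b
    sel.register(b, selectors.EVENT_READ, a)  # data read from b goes to a
    try:
        while True:
            # The wait restarts on every iteration, so any activity
            # resets the inactivity window.
            events = sel.select(timeout)
            if not events:
                return "timeout"  # nothing happened for `timeout` seconds
            for key, _ in events:
                data = key.fileobj.recv(4096)
                if not data:
                    return "eof"  # simplification: stop on the first EOF
                key.data.sendall(data)
    finally:
        sel.close()
```

In this model, a connection that carries a notification and then goes quiet is torn down roughly `timeout` seconds later, while a connection that keeps exchanging data is relayed indefinitely.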

> As the hang occurred before the actual notification is sent, does cloud-hypervisor actually reconnect?

The hang occurs when `sendto` is successfully called and systemd performs a `shutdown(SHUT_WR)`, right? I can look at an strace again if needed.

> Do you know if there is an upstream cloud-hypervisor issue? I didn't find one.

I will file an issue upstream tomorrow-ish; I just didn't have time to write a trivial reproducer proving that the bug is indeed on the CH VSOCK code side.
