fix(cloud-hypervisor/vsock): apply a workaround for notify sockets #297

RaitoBezarius · 2024-11-20T20:29:25Z

Since systemd/systemd@13b67b6,
systemd shuts down the write end of one end of the VSOCK.

cloud-hypervisor's virtio-vsock code does not handle a partial shutdown of the
stream well.
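
As a minimal sketch of what a partial shutdown means here: the notifier sends its message and then closes only the write direction, so the peer sees the payload followed by EOF while the reverse direction stays open. The example uses an `AF_UNIX` socket pair as a stand-in for `AF_VSOCK` (an assumption made so it runs anywhere), and the `ack` reply is purely hypothetical.

```python
import socket

# Stand-in for the VSOCK stream: a connected AF_UNIX pair (assumption for
# illustration; the PR concerns AF_VSOCK between guest and host).
notifier, listener = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# Notifier side: send the message, then close only the write direction.
notifier.sendall(b"READY=1")
notifier.shutdown(socket.SHUT_WR)

# Listener side: the payload arrives, followed by EOF (an empty read).
payload = listener.recv(4096)
eof = listener.recv(4096)

# The stream is only half-closed: the listener can still write back,
# and the notifier can still read. The "ack" reply is hypothetical.
listener.sendall(b"ack")
reply = notifier.recv(4096)

notifier.close()
listener.close()
```

A transport that treats the half-close as a full close, or that never forwards it, would break one of the two directions shown above.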

socat's `-T` flag can serve as a workaround to mask this bug.

> Total inactivity timeout: when socat is already in the transfer loop and
> nothing has happened for <timeout> [timeval] seconds (no data arrived, no
> interrupt occurred...) then it terminates. Useful with protocols like UDP that
> cannot transfer EOF.
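
The behaviour described in that excerpt can be modelled roughly as follows. This is a sketch of an inactivity timeout in a relay loop, not socat's implementation; `relay_until_idle` and the socket names are hypothetical, and `AF_UNIX` pairs again stand in for the guest and host streams. The timer restarts on every loop iteration, so it only fires after a quiet period, never while data is flowing.

```python
import select
import socket


def relay_until_idle(a, b, idle_timeout):
    """Relay bytes between a and b until EOF both ways or idle_timeout seconds of silence."""
    routes = {a: b, b: a}  # readable source -> destination
    while routes:
        readable, _, _ = select.select(list(routes), [], [], idle_timeout)
        if not readable:
            return "idle"  # nothing happened for idle_timeout seconds
        for src in readable:
            dst = routes[src]
            data = src.recv(4096)
            if data:
                dst.sendall(data)
            else:
                # EOF on src: pass it on as a half-close and stop reading src.
                dst.shutdown(socket.SHUT_WR)
                del routes[src]
    return "eof"


# Hypothetical setup: AF_UNIX pairs stand in for the guest and host streams.
guest, relay_in = socket.socketpair()
relay_out, host = socket.socketpair()

guest.sendall(b"READY=1")
guest.shutdown(socket.SHUT_WR)  # half-close, as in the PR description

# The host side never closes its end, so only the timeout ends the relay.
result = relay_until_idle(relay_in, relay_out, idle_timeout=0.2)
received = host.recv(4096)
```

In this model the message is still delivered, and the relay exits after the idle period instead of waiting indefinitely for a close that never arrives, which is the sense in which the timeout masks the bug.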

Co-authored-by: Puck Meerburg [email protected]
Signed-off-by: Raito Bezarius [email protected]

All thanks to Puck for the debugging of this.


astro · 2024-11-20T22:57:54Z

Cool, a solution, albeit it is a hack.

Will the constant timeout (and reconnect) work reliably even under high load?

As the hang occurred before the actual notification is sent, does cloud-hypervisor actually reconnect?

Do you know if there is an upstream cloud-hypervisor issue? I didn't find one.

RaitoBezarius · 2024-11-21T00:05:42Z

> Cool, a solution, albeit it is a hack.

I wouldn't take offense if we don't merge it; it's also up there as documentation of a quick workaround.

> Will the constant timeout (and reconnect) work reliably even under high load?

It's an inactivity timeout, no? So you would only be disconnected once you are done, not during activity. Though this is a real question.

> As the hang occurred before the actual notification is sent, does cloud-hypervisor actually reconnect?

The hang occurs after `sendto` is called successfully and systemd performs a `shutdown(SHUT_WR)`, right? I can look at an strace again if needed.

> Do you know if there is an upstream cloud-hypervisor issue? I didn't find one.

I will file an issue upstream tomorrow-ish; I just haven't had time to write a trivial reproducer to prove that the problem is indeed on the cloud-hypervisor (CH) VSOCK code side.
