Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not fill terminal with noise when a device is surprise disconnected #452

Merged
merged 4 commits into from
Oct 21, 2023

Conversation

whitequark
Copy link
Member

@whitequark whitequark commented Oct 21, 2023

"Surprise disconnected" means disconnecting a device while it is being used by an applet, script, or REPL; whether there are ongoing transfers or not.

After this commit, in general, only one message should be printed:

E: g.cli: device disconnected

However, applet logic will often not handle cancellation or USB transfer errors, so certain individual applets may still print more noise.

This has been one of the more irritating papercuts, especially as all that noise will make recovering from an accidentally unplugged device all that more annoying, and it's confusing to people who have not seen the error before too.

See individual commit messages for details of implementation, which is nightmarish.

Fixes #386.

Before this commit, a queue with only finished tasks was considered
empty. This is problematic because the finished tasks still carry their
result value and, more visibly, exceptions; if not retrieved, asyncio
will complain about tasks with exceptions on interpreter shutdown.

After this commit, any tasks in a queue make it non-empty. (That is,
casting to bool will make the queue truthful, and the length includes
every task.) The only user of these API calls is the demultiplexer,
and it only uses them in one call site, which is improved by the change.
Before this commit, cancelling a task queue would cancel all the tasks
and, once they finish, clear them from the queue, ignoring their result
or exception. This isn't correct since it ignores errors that could have
been raised before or during cancellation, and it also causes asyncio to
print a *lot* of backtraces (one per task) on interpreter shutdown.

After this commit, all tasks are awaited, and then the first exception
that isn't asyncio.CancelledError is re-raised. As a result, unplugging
a device while it's open results in a single error message...

    T: g.applet.radio.sx1276: FIFO: cancelling operations
    E: g.cli: device lost

... rather than sixteen backtraces.

However, more work is necessary to handle surprise disconnection while
there are ongoing transfers.
This commit adds handling of the LIBUSB_ERROR_NO_DEVICE error during
transfer submission. This error was already handled during transfer
completion, but since surprise disconnection is (by definition)
asynchronous, this it can happen right before submission too.

The message is also changed from ambiguous "device lost" (lost where?)
to unambiguous "device disconnected".

The code in _do_transfer with the exception handler nested three levels
deep is some of the most cursed and convoluted logic I have ever written
but as far as I can tell it is reasonably appropriate to the task. Sigh.

This is still not quite enough to reduce the errors printed for
a surprise disconnection to a single line, but we're getting there.
Otherwise asyncio will print a very long backtrace corresponding to
the applet task (without any useful information) if, most commonly,
the device was surprise unplugged and all of the other tasks that need
USB communication fail.

After this commit, finally, this works as it should:

    $ glasgow run benchmark -c 1000000000
    I: g.device.hardware: generating bitstream ID 6b120f4f94cf4ccd95c51417abc09069
    I: g.cli: running handler for applet 'benchmark'
    I: g.applet.internal.benchmark: running benchmark mode source for 953.674 MiB
    E: g.cli: device disconnected
Copy link
Member

@attie attie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed / tested with ps2-host - this is great, thank you!

As mentioned in #386, uart still blocks, but is at least much quieter about the situation.

@attie
Copy link
Member

attie commented Oct 21, 2023

Oh, hang on...

After this commit, in general, only one message should be printed:

E: g.cli: device disconnected

I see:

$ glasgow -vv run ps2-host -V3.3
[...]
T: g.applet.interface.ps2_host: deasserting reset
T: g.device.hardware: register 0 write: 0x00
T: g.device.hardware: USB: CONTROL OUT type=0x40 request=0x13 value=0x0000 index=0x0000 data=<00> (submit)
T: g.device.hardware: USB: CONTROL OUT (completed)
T: g.applet.interface.ps2_host: FIFO: write <7f>
T: g.applet.interface.ps2_host: FIFO: flush
T: g.applet.interface.ps2_host: FIFO: need 1 bytes
T: g.device.hardware: USB: BULK EP2 OUT data=<7f> (submit)
T: g.device.hardware: USB: BULK EP2 OUT (completed)
T: g.applet.interface.ps2_host: FIFO: flush
T: g.applet.interface.ps2_host: FIFO: wait for flush
T: g.applet.interface.ps2_host: FIFO: cancelling operations
E: g.cli: transfer error: LIBUSB_TRANSFER_ERROR

@whitequark
Copy link
Member Author

I see:

This is also fine--the transfer error: message is printed by an adjacent code path. In general we make no guarantees about exactly what happens when an error is asynchronously thrown like this other than that it's one something.

@whitequark
Copy link
Member Author

The difference is basically that in one case a planned transfer was never scheduled because the device was gone, and in another a scheduled transfer failed because the device went poof while it was in progress. Not much more I can do.

@attie
Copy link
Member

attie commented Oct 21, 2023

Not much more I can do.

I was just digging in to it too... it's not as shiny as a "device disconnected", but I agree it's the best we can do.

@attie attie added this pull request to the merge queue Oct 21, 2023
Merged via the queue into GlasgowEmbedded:main with commit 21c4f66 Oct 21, 2023
10 of 12 checks passed
@whitequark whitequark deleted the surprise-disconnection branch July 21, 2024 20:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Glasgow does not gracefully shutdown if the device is unplugged...
2 participants