Gracefully shutting down nurseries on KeyboardInterrupt #143

Closed
merrellb opened this issue Apr 28, 2017 · 3 comments


merrellb commented Apr 28, 2017

I am experimenting with nested nurseries to handle multiple tasks associated with each incoming connection. I would like to send a final message on each connection as the nursery exits. I've run some tests and noticed a few things that seem to make this difficult.

  1. KeyboardInterrupt is only raised on the outer Nursery (at which point the nested nurseries, tasks, and incoming sockets have been dismantled)
  2. KeyboardInterrupt triggers a trio.Cancelled in tasks (which can be caught) but not in children nursery scopes. If we push enough state down into the tasks this might be workable, but it would be easier to handle at the parent/nursery level.
  3. finally seems the only way to trigger code as we leave a child nursery but as soon as we await, we exit (presumably because the scope has been cancelled)

What is the best way to trigger "awaitable" code (eg send one last message) as we exit a child nursery from a KeyboardInterrupt?

The following code illustrates the issue (without using any networking :-). I would like to trigger my shutdown code in the "inner" scope:

import trio

async def print_message(msg):
	try:
		while True:
			print(msg)
			await trio.sleep(1)
	except trio.Cancelled:
		print("Cancelled Print Message")
		await trio.sleep(1)
		print("Cancelled Blocking Print Message")
	finally:
		print("Finally Print Message")
		await trio.sleep(1)
		print("Finally Blocking Print Message")

async def print_spawner(nursery):
	count = 0
	while True:
		nursery.spawn(print_message, "msg {}".format(count))
		count += 1
		await trio.sleep(1)

async def child():
	try:
		async with trio.open_nursery() as inner_nursery:
			inner_nursery.spawn(print_spawner, inner_nursery)
	except KeyboardInterrupt:
		print("Inner Keyboard")
	except trio.Cancelled:
		print("Inner Cancelled")
		await trio.sleep(1)
		print("Inner Blocking Cancelled")
	finally:
		print("Finally Inner")
		await trio.sleep(1)
		print("Finally Blocking Inner")
	print("Inner Nursery Finished")


async def parent():
	try:
		async with trio.open_nursery() as outer_nursery:
			outer_nursery.spawn(child)
	except KeyboardInterrupt:
		print("Outer Keyboard")
	except trio.Cancelled:
		print("Outer Cancelled")
		await trio.sleep(1)
		print("Outer Blocking Cancelled")
	finally:
		print("Finally Outer")
		await trio.sleep(1)
		print("Finally Blocking Outer")
	print("Outer Nursery Finished")

trio.run(parent)

Output:

msg 0
msg 0
msg 1
^CCancelled Print Message
Cancelled Print Message
Finally Print Message
Finally Print Message
Finally Inner
Outer Keyboard
Finally Outer
Finally Blocking Outer
Outer Nursery Finished
Member

njsmith commented May 1, 2017

> KeyboardInterrupt is only raised on the outer Nursery (at which point the nested nurseries, tasks, and incoming sockets have been dismantled)

This is partly a matter of luck – KeyboardInterrupt can get raised absolutely anywhere. But it does tend to end up in the outermost task because that's who gets it if the KeyboardInterrupt arrives at a moment when no tasks are running. You can think of this as, it's getting raised right at the end of the outermost nursery block, and then triggering the regular nursery cleanup logic. Which, of course, starts by cancelling that outermost nursery's cancel scope.

> KeyboardInterrupt triggers a trio.Cancelled in tasks (which can be caught) but not in children nursery scopes. If we push enough state down into the tasks this might be workable, but it would be easier to handle at the parent/nursery level.

What do you mean that children nursery scopes don't get a Cancelled exception? Once the outermost nursery has its cancel scope cancelled, then everything (children, grandchildren, great-grandchildren, ...) below it should get a Cancelled exception the next time it executes a checkpoint.

Oh, actually, I have a guess – there are several grandchildren that get Cancelled exceptions, and those propagate into their parent, one of the first-level children. And since it's supervising multiple grandchildren and they all crashed, it bundles their exceptions up into a MultiError. So if you do except Cancelled:, that won't catch a MultiError representing multiple Cancelleds. A simple check would be to catch BaseException as exc and print(repr(exc)). Unfortunately MultiErrors are not super easy to work with because they don't fit very neatly into Python's idea of how exceptions work, but there are some tools to help, in particular with MultiError.catch(filter_callback): ..., where filter_callback gets called with all the individual exceptions and can let them continue to propagate, catch them, or replace them with a different exception.

> finally seems the only way to trigger code as we leave a child nursery but as soon as we await, we exit (presumably because the scope has been cancelled)

As noted above, catching MultiError will also work, but if you want to run something every time you exit then finally is a better choice, or if you want to run something on all exceptional exits then an except: ... raise block is totally legit. But yes, once a scope has been cancelled, all blocking operations are disabled. See this section, in particular the discussion of "level triggered" cancellation.

> What is the best way to trigger "awaitable" code (eg send one last message) as we exit a child nursery from a KeyboardInterrupt?

So the fundamental challenge here is that there isn't really any way to know "why" you were cancelled: maybe the peer has totally disappeared and a timeout finally expired, maybe there was a KeyboardInterrupt and the program is shutting down, ... and this is intentional, because it turns out that once you allow multiple "types" of cancellation it makes life really messy really fast. What if a cancel scope had its timeout expire and someone called .cancel() on it explicitly, how should that be signaled, that kind of thing.

One option is discussed in that doc section I linked above: while a scope that's been cancelled prevents all blocking operations inside it by default, you can disable this by wrapping your "say goodbye" code in a new cancel scope with its shield attribute set to True. I strongly recommend that if you do this, you also set a sensible timeout on the shielded cancel scope, because if the remote host doesn't respond or whatever then your goodbye code will hang forever and there's nothing outside the shield that can stop it. But for your use case a timeout of like 0.1 seconds might be plenty, if it's just a best effort attempt to stick a few bytes on the wire before exiting. Of course, this will then run no matter why your code was cancelled.

Alternatively, if you really want control-C specifically to be treated differently than other kinds of shutdown/crash/cancellation events, you absolutely can set up a custom handler for it that does whatever clever thing you want. For example, you could set a global flag exiting_due_to_KeyboardInterrupt = True and then raise KeyboardInterrupt, and then your individual tasks could consult that flag. There's an example here. The downside is that this kind of custom handler doesn't work if your code gets stuck in an infinite loop and isn't letting trio schedule tasks, but life always has trade-offs... I guess it would also be possible to make a custom signal handler that sets that flag and then invokes trio's normal control-C logic. There's some discussion of this in #134, but if you want to hack it you could do something like:

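import signal

got_KeyboardInterrupt = False

# Run this after trio.run has started, so that getsignal returns
# trio's own SIGINT handler rather than Python's default one:
trio_SIGINT_handler = signal.getsignal(signal.SIGINT)
def my_SIGINT_handler(signo, frame):
    # set our flag:
    global got_KeyboardInterrupt
    got_KeyboardInterrupt = True
    # then run trio's normal handler:
    trio_SIGINT_handler(signo, frame)
signal.signal(signal.SIGINT, my_SIGINT_handler)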
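
The same chaining pattern can be tried without trio at all. The sketch below wraps whatever SIGINT handler is currently installed (normally Python's default, which raises KeyboardInterrupt) and then delivers a SIGINT to itself. The function names are hypothetical, and it assumes it runs in the main thread, since that is the only place signal.signal is allowed.

```python
import signal

got_keyboard_interrupt = False


def install_flagging_handler():
    """Wrap the current SIGINT handler so a flag is set before it runs."""
    previous = signal.getsignal(signal.SIGINT)

    def handler(signo, frame):
        global got_keyboard_interrupt
        got_keyboard_interrupt = True
        # previous may be SIG_DFL or SIG_IGN, which are not callable.
        if callable(previous):
            previous(signo, frame)

    signal.signal(signal.SIGINT, handler)
    return previous


def simulate_control_c():
    previous = install_flagging_handler()
    try:
        signal.raise_signal(signal.SIGINT)  # same effect as pressing control-C
    except KeyboardInterrupt:
        return "KeyboardInterrupt"
    finally:
        signal.signal(signal.SIGINT, previous)  # restore the old handler
    return "no exception"


outcome = simulate_control_c()
```

By the time any except or finally block sees the KeyboardInterrupt, the flag has already been set, so cleanup code can consult it to decide whether this is a control-C shutdown.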

Does any of that help? It's kind of hard to give specific advice without knowing what you're ultimately trying to do :-)

Author

merrellb commented May 1, 2017

Thanks for the detailed response. My ultimate use case isn't all that mysterious (#124 :-) Basically I want to gracefully send the close message to each WebSocket connection when shutting down the server.

As an async/trio newbie, I am still struggling to wrap my head around the idea that the child nurseries aren't nested in the typical stack sense, ready to catch any exception before a parent nursery could possibly see it (eg if all of my leaf nurseries wrap KeyboardInterrupt at least one of them should catch it). However, it does make sense that while idle it wouldn't be clear where, other than the outer nursery, to raise the exception (and regardless it seems a bad idea to depend on the vagaries of what happens to be executing when one hits Control-C).

It was the MultiError that was keeping me from catching the nursery's Cancelled, although I think finally is what I really need to gracefully close connections/nurseries.

In finally, open_cancel_scope with shield=True seems to do the trick. However, given the importance of timeouts, I am a bit surprised to see none of the convenience functions (eg move_on_after) accept the shield argument (feature request? :-)

As always, thanks for your patience as I try to figure this stuff out.

Author

merrellb commented May 2, 2017

I've created a new issue for adding shield to the convenience functions (#147) and will close this issue as there doesn't seem to be much else to add.
