-
-
Notifications
You must be signed in to change notification settings - Fork 342
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Discussion: what's the best strategy for "nowait" and "select" operations? #242
Comments
For reference, here's a minimal |
Maybe the key idea I needed to make the code in the previous comment work is: you need to make the actual operation happen synchronously with the reschedule, because:
We've already moved in this direction for OTOH, there are also operations where you simply can't perform the operation synchronously at reschedule time: Unix socket operations are like this, because it's possible for the Are there other operations like this? IOCP is a whole other issue, but with IOCP it's literally impossible to get exactly-one-of-these semantics no matter what you do, so I guess we can write that off. I don't know that anyone actually cares about the ability to do exactly-one-of-these on sockets anyway. It's stuff like |
@jchayat in chat described a use case they have for More discussion: https://gitter.im/python-trio/general?at=59b07c66b16f26464212fe0c It sounds like at least in this case, a multi-task strategy is a viable alternative, though @jchayat still feels:
|
There are some fun error cases to handle. If some wait setup call errors, then you have to go and abort all the previous wait setup calls to unwind One way a wait setup call can error is if the same task is trying to wait twice on the same event, e.g. trying to simultaneously put two items into the same queue. Or how should this be handled? The annoying thing is that if we just iterate through the |
Oh, another fun case: if we want to support this for |
Some interesting arguments against |
Thanks for sharing those links, @njsmith. They have been quite revealing to me. The most insightful was indeed the Select statement considered harmful and the "Two Big Use Cases" idea. So simple, but so "demolishing" at the same time. Of course, they are an attempt to over-simplify all concurrency problems under some general patterns... but that's exactly where their power resides! |
Here's another place where a I guess it'd actually be possible to have a "no wait" mode, that could handle this in a generic way without going full concurrent-ML: basically it would be like cancellation, except that calls to |
In #586 (the PR for adding channels), we've run into an interesting problem that's actually closely related to this: See: #586 (comment) |
It looks like the |
I've been working on incorporating some of the CML ideas into Trio, representing operations as (synchronous) generator functions that yield twice. (Ten-second overview: the part before the first yield is "attempt", between the first and second yields is "publish", and after the second yield is "unpublish". At the first yield the operation is sent a handle, which it arranges for another task to call So far I've reimplemented I'm pretty pleased with the ergonomics of this approach so far in terms of making it easy to write low-level operations that are robust:
It's like a friendlier, more composable, less flexible (because you can't refuse a cancellation) I'm still polishing it, but I'm wondering if this is something that Trio might be interested in having in the core, or whether I should target an external library instead? Having it in the core seems like a substantial force multiplier -- if it's in a library, you can only compose operations in that library or its dependencies, which probably winds up resulting in that library providing its own synchronization primitives that mostly reimplement Trio's core ones. On the other hand, it's a nontrivial chunk of functionality ( |
Some examples: ParkingLot.park():
Event.wait(): simple delegation
Condition.wait(): delegation with extra publish step and async cleanup
Memory channel send/receive:
|
Oh cool!
I think this is too complicated a decision to make quickly :-). But having actual code will tell us a lot! Some things I want to understand better before forming any conclusions:
...Looking at that list, some of those feel like they're going to need to cook for a while before we can draw conclusions. So maybe it makes most sense to put it in a library for now? It's true that if it does turn out to be a big win then having it in the core will act as a force multiplier, but we can probably learn things without that? |
Cool, glad this seems interesting! I'm going to keep prototyping it as a PR to Trio because that gives a nice collection of ready-made examples (and tests!) of how it might simplify low-level operations, but I won't be sad if you want to hold off on reviewing/merging for a while/ever. :-)
If you want to select between
Of course (Should these be in hazmat? Right now I have them in hazmat, because writing combinations of unevaluated operations is a little bit of a departure from trio's traditional worldview. So far
This is probably one of those questions that requires some time to marinate :-) but my intuition here is that Operations are also useful even without select, because of how the composition mechanisms let you run additional "unpublish" code synchronously with the reschedule/abort.
Definitely a major open question! I'll see what I can come up with :-)
Each operation (including each individual branch of a select()) gets its own completion handle, which can go in the wait queue of your choice. When one of them gets called, the others synchronously remove themselves. I don't understand some of the WFQ discussions well enough to know whether this system would compromise our ability to move in that direction, but it plays totally fine with the current strict-FIFO fairness.
They both get published (in left-to-right order). Everything is indexed by the completion handle, not the task, and each branch of the select() gets a different completion handle. Whichever one gets completed first will synchronously unpublish the other one. So it's kind of a silly thing to do, the second copy of the operation doesn't add anything, but it's not going to totally blow up. (All the things I'm calling "operation" are wrappers around the generator function plus its arguments, so there's no issue with calling the same one multiple times. This is needed to support retrying, to preserve BSD socket semantics. Internally there are also "opiters" i.e. generator iterators of operation functions, but those aren't exposed publicly.) |
Code is on the |
The I was looking at this article again, and its running example for a novel selectable operation is I think a closely related issue is how in classic concurrent ML, when you write a # concurrent ML style
await select({
send_channel.send.operation("foo"): lambda _: print("sent"),
receive_channel.receive.operation(): lambda value: print(f"got: {value}"),
})
# prototype style
branch, value = await select_with_tag({
send_channel.send.operation("foo"): "sent",
receive_channel.receive.operation(): "received",
})
if branch == "sent":
print("sent")
elif branch == "received":
print(f"got: {value}") But, they're different in a crucial case: in the first one, if you make it The select_builder = SelectBuilder()
@select_builder(send_channel.send.operation("foo"))
def handler(_):
print("sent")
@select_builder(receive_channel.receive)
def handler(value):
print(f"got {value}")
new_op = select_builder.finish() ...oh, I see, but then this is pretty awkward when you want to actually wrap this up into a new public API like your another random thought: await select(
partial(send_channel.send, "foo"), # or trio.withargs or something, maybe
receive_channel.receive,
) |
It always runs if control flow reaches the The actual async cleanup functions run, shielded, in the woken task after it gets rescheduled. If there are multiple, the current implementation runs them in parallel in a nursery; maybe this should be one-at-a-time in the same order as the sync cleanups, but then it gets confusing that we run all the sync cleanups before any of the async ones. I added async cleanup mostly to support I don't think async cleanup is too much of an attractive nuisance given that it's kind of awkward to use it (you have to define a separate async function, you can't just write
Yep, the two cases (running as part of the operation vs running afterward) definitely carry different semantics and supporting both of them makes sense. One especially notable difference in my implementation is:
My first draft did some magic so all of
would work. I deemed that too magical and switched to the current version where the way you're planning on using the function gets written before the parens. It could change back though. One wrinkle is: how do you turn Currently I have the post-decorator I guess we could combine these tricks, actually: if you're expecting an operation, and you get a callable instead, try calling it with no arguments; if you get a coroutine object for
|
Demand for However, there are a couple of pieces of code I'd like to translate from Go to Python (because, you know, Python ;-) ) and having |
The blog post in the OP also mentions Reagents as a generalization of CML. From the comments:
Basically, using this comment's notation, THEN performs op1 and op2 sequentially, passing the output of op1 as the input of op2, while AND performs them in parallel, returning the output of both op1 and op2 in a tuple (which basically corresponds to join patterns). The key is that all these forms of composition can be combined arbitrarily, so you can AND things together and then use the resulting operation as a single unit in a select, or whatever. (More links, in the context of OCaml this time.) Reagents was originally conceived as a low-level library for lock-free programming:
But I don't know whether that is somehow intrinsic to the whole concept, or if the basic programming model is separable from that. (I'd guess that it would be, but it's just a guess.) Anyway, I don't know how relevant any of this might be to Trio :), just figured it might be worth a mention. @oremanj mentioned that select can be emulated to some extent using structured concurrency and cancellation, so it makes me wonder if there are any other connections. |
Yeah,
Well,
Heh, I just stumbled on those through some completely different path yesterday... I actually have Aaron Turon's thesis open, but I haven't wrapped my head around these yet :-). However, based on first principles, I feel like there must be some pretty substantial limitations. In Trio, you can sorta fake But it also majorly limits what kinds of operations you can In general, OTOH, that might still be powerful enough to implement all the operations that we actually want to support |
This article seems relevant here: https://medium.com/@elizarov/deadlocks-in-non-hierarchical-csp-e5910d137cc |
In #1411 I was asked to present my uses cases for
|
So from @wingo's blog I have learned about Concurrent ML. (His concurrency tag is also worth perusing generally.)
The core insight, IIUC, is to take the atomic blocking operations – roughly, the same ones our current cancellation guarantee applies to – and systematically split them into a few consistent sub-operations:
We already implicitly have this basic structure open-coded in a bunch of places, e.g. the
_try_sync
helper intrio/socket.py
, the classes intrio/_sync.py
, etc. Pretty much anywhere you see theyield_if_cancelled
/yield_briefly_no_cancel
pair in trio fits this general pattern, and "unpublish" is basically just our abort callback. So the advantages of reifying this would partly be just to simplify the code by having a single implementation of the overall pattern that we could slot things into – but even more, so because given the above pieces, you can create generic implementations of three variants:*_nowait
variants that we currently implement in an ad hoc way)select
, where you can say "perform exactly one of the following operations: ..."(The first is done by just calling the "try this" sub-operation; the second you do by trying and then blocking if you fail, in a loop, with the unpublish operation as the abort callback; the third is done by calling a bunch of "try this" sub-operations and if one succeeds you're done, otherwise publish all the operations and go to sleep. There's some subtleties around knowing which operation woke you up, and when unpublish happens, etc., but that's the basic idea.)
Right now we have a bunch of manual implementations of
await x()
/x_nowait()
primitives. It's not clear that we have enough, either; #14 has an example of a case where you semantically needaccept_nowait
, and for HTTP client connection pooling when you pull a connection out of the pool you need something likereceive_some_nowait
to check whether the server has closed it before you try to use it.Also, a golang-style
select
is potentially quite useful but isn't possible right now (at least for trio's built-in operations, of course you certainly could build a golang-style channels library on top, but thenselect
would only work on that library's operations). You can spawn some child tasks to try doing all the things concurrently, but there's no way to make sure that no more than one complete – for that you would need to be able to (1) guarantee that all of the operations really are cleanly cancellable, and (2) perform the cancellation synchronously with the first operation completing, which isn't possible if the last thing it does after committing its work is to callyield_briefly_no_cancel
.An ancillary benefit is that if we expose these things as a standard interface to users, then this would also serve as very clear documentation of which the actual atomic cancellable operations are.
But, there are also some issues, I think:
IOCP: the above pattern works for BSD-style non-blocking socket operations, but not for Windows IOCP operations (#52). You can implement cancellable blocking operations as IOCP calls (that's basically what they are), and nowait operations using Windows' non-blocking calls, but IOCP has no generic equivalent to epoll to implement wake-me-when-this-non-blocking-call-might-succed, which means that golang-
select
is not possible. All you can do is ask the kernel to start initiating all the operations, and then by the time you find out that, say, yourrecv
has finished, then yoursend
might also have finished. I guess this might be possible to work around: forsend
I'm pretty sure we can treat IOCP as a kind of epoll, and then use non-blocking send. Forrecv
I'm not sure if this works and foraccept
I'm pretty sure it doesn't, but for these operations you can more-or-less fake them as being always cancellable by using a little user-space buffer to push back any result that you want to pretend didn't happen. Are there any other cases? Do we need to change the pattern to accomodate this? The pushback trick doesn't seem compatible with a strict separation between "try it" and "wake me when you want to try again" primitives.The main problem with HTTP connection pooling isn't checking if the socket is readable – we can already do that in the specific case of a raw socket. It's that it isn't something you can reasonably abstract in terms of generic "streams". In particular, if you have an TLS-wrapped socket, then you actually don't want to check if the TLS layer has data available, you really do want to check the raw socket underneath. And in any case I'm not sure that this would help make it easier to implement
receive_some_nowait
as a generic feature on streams, because receiving on a TLS stream is a very complex operation that may require things like lock acquisition, and all of the operations above have to be synchronous. So maybe the HTTP connection pool case isn't really a good motivator anyway.Lock.acquire
andLock.acquire_nowait
are tricky because of fairness (#54); it's not the case thatacquire
is just a loop likewhile acquire_nowait fails: sleep until it might succeed
, because that creates a race on wakeup. I don't think it's possible to implement a fair mutex using just the framework described above. The problem is that we really need the two operations to be "try it" and "block until the operation is done"; a retry loop just doesn't work. So maybe this is actually equivalent to the IOCP case? Maybe we need primitives:I think this is flexible enough to implement fair synchronization primitives and handle all the public operations. E.g. for golang-
select
we would want to arrange so that when one operation gets rescheduled, then we immediately abort all the other operations, before waiting to actually be woken up – this would need to happen from the context of the task that's handing off the mutex (for example).But... this isn't quite right for stuff like non-blocking socket operations, where you actually need a try-sleep-retry-sleep-retry-... loop. Need to think some more about this.
The text was updated successfully, but these errors were encountered: