[RFC]: Lazy state changes. #228
Comments
I should note that this isn't quite as trivial as I made it sound above. In particular, it is possible for multiple events to pop for a single frame. I think the only situation in which this can happen is when receiving a frame that has the END_STREAM flag set, when we will emit one of {Request,Response,InformationalResponse,Trailers,Data}Received and then immediately StreamEnded. I think this is OK.
Quick reaction: I'm worried that this would be harder to use. What would iterating g1 actually do? What would iterating g2 first do? What would iterating them in an interleaved fashion do? I think it's much, much easier to reason about the state machine if at any point in time it's safe to call any method on it: having it defer actions it will take until some other object has been used means it is no longer safe to call methods on it. One way we could make it work is to just have a single infinite iterator of events from the connection, with one event it can return/raise being NoMoreEvents (or we could have a generator that we raise StopIteration from multiple times, but I think that is going to be rather confusing to people and code :).
Taking two references to pending_events would be fine: they'd be the same iterator.
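That property falls out of how Python generators work: two names bound to the same generator object share one position in the stream. A toy sketch (the name `pending_events` here is a stand-in, not h2's actual API):

```python
def pending_events(events):
    """Toy generator standing in for a lazy event stream:
    yields buffered events one at a time."""
    for event in events:
        yield event

# Two references to the same generator object share its state.
gen = pending_events(["DataReceived", "WindowUpdated", "StreamEnded"])
g1 = gen
g2 = gen

first = next(g1)   # consumes "DataReceived"
second = next(g2)  # same iterator underneath, so this yields "WindowUpdated"
```

Interleaving `g1` and `g2` is therefore well-defined: each `next()` call advances the single shared iterator, regardless of which name it goes through.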
Oh, another consideration: do we need to worry about folk doing something we will know is impossible once we process more of the already-received data? I guess in principle a slow TCP link could introduce any possible race we might have, but I figure it's worth raising as a consideration.
From IRC, without prejudice :) Two aspects: secondly, the result of an advance can't be a single event because of the nuance @Lukasa noted, so what about making the result a frozenset of events? When two events, like DataReceived and StreamEnded, happen in the same state transition, we expose them both, and we make sure the fact that they are not ordered, because they are simultaneous, is clear to users.
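The frozenset idea can be sketched as follows. This is a toy model, not h2's actual parsing code; the `advance` name and the frame dicts are hypothetical. The point is that each state transition yields one unordered set of events, so a DATA frame carrying END_STREAM surfaces both of its events at once with no implied ordering:

```python
def advance(frames):
    """Toy sketch: each received frame maps to a frozenset of the
    events produced by that single state transition. Simultaneous
    events (e.g. DATA with END_STREAM) appear in one set."""
    for frame in frames:
        events = {"DataReceived"} if frame["type"] == "DATA" else {frame["type"]}
        if frame.get("end_stream"):
            events.add("StreamEnded")
        yield frozenset(events)

frames = [
    {"type": "HEADERS"},
    {"type": "DATA"},
    {"type": "DATA", "end_stream": True},
]
results = list(advance(frames))
# The final frame yields both events in one unordered set.
```

Using `frozenset` rather than a tuple makes the "these are simultaneous, do not assume an order" contract explicit in the type itself.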
@rbtcollins: oh hello, fancy meeting you here :-) @Lukasa et al.: just from an API evolution perspective, another natural option would be to add two new functions alongside the existing ones (ideally one would come up with better names than the obvious candidates).
Also, FWIW: h11 handles events that happen "at the same time" by serializing them. E.g. it's very common that the same byte is both the end of the content and also indicates end-of-message, and h11 treats this as a final byte of data followed by a separate zero-byte end-of-message, updating its state machine in sequence. (In fact, this exact case is the one that triggered my switching h11 to be lazy: a bug in a response handler caused it to fail before consuming a final zero-width EndOfMessage event, and then things got out of sync.) This may or may not make sense for h2 if your state machine advances frame-by-frame instead of event-by-event, but semantically it's quite clean.
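The serialization approach can be sketched too. Again this is a toy model, not h11's real internals: a frame that both carries data and ends the message is split into two sequential events, the second being a zero-byte end-of-message marker, so state can advance once per event:

```python
def serialized_events(frames):
    """Sketch of the h11-style approach: a frame carrying both data
    and end-of-message is split into two sequential events, the
    second a zero-byte end-of-message marker."""
    for frame in frames:
        if frame["data"]:
            yield ("Data", frame["data"])
        if frame.get("end_stream"):
            # Zero-width event: no payload, just a state transition.
            yield ("EndOfMessage", b"")

frames = [{"data": b"hello"}, {"data": b"world", "end_stream": True}]
events = list(serialized_events(frames))
```

Here the final frame produces `("Data", b"world")` followed by `("EndOfMessage", b"")` as two distinct events, which is what lets a lazy consumer stop cleanly between them.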
Yeah, so we could do that, but it's really a bit inaccurate. For example, if a DATA frame carrying the END_STREAM flag is received, at that point there are two events and two state transitions; both happen sequentially, but effectively at once. However, I think the END_STREAM flag in this instance mostly restricts what the remote peer can do, which is why I think in practice this concern is relatively minimal. With all of that said, I think we could consider a broader refactor to get more in line with h11's view of the world. Right now when a frame is received the processing fires both events at once with no room to get in the way of it, but we could play some games with the state machine to ensure that the END_STREAM flag is processed separately. So I see a few options here.
This is being postponed from 3.0.0: it just hasn't come together yet.
Inspired by @njsmith in njsmith/h11#4.
Right now there is a wart in the h2 API when working on a busy connection. The problem exists when working with `receive_data`. For example, it is possible that you might encounter the following set of events returned from `receive_data`: `[DataReceived(stream_id=1), WindowUpdated(stream_id=1), StreamReset(stream_id=1)]`.

In response to that `WindowUpdated` you may want to send another data frame on stream 1, but if you do so without processing the entire event list, h2 will throw a `ProtocolError` at you because the stream is now in the CLOSED state. However, the user doesn't know that: they're processing the events serially, and haven't gotten to the event that actually closed the stream.

The outcome of this is not too bad: while the `ProtocolError` does get thrown, the connection state is still fine and no damage has been done to your ability to continue to use the connection. However, the exception may cause an incautious user to run into trouble: if they're not expecting and handling this exception, it'll quite probably cause them to lose a lot of application state.

In njsmith/h11#4, @njsmith has proposed a split API for receiving data: a single `receive_data` call returns no events, but instead you need to call `next_event` to advance the state machine. This proposal is essentially a "split" API much like we have for sending.

Making that change is a pretty substantial API change and I don't know how I feel about it. However, it turns out that hyper-h2 is capable of emulating that change with a very simple change to the logic of `receive_data`.

Right now, `receive_data` returns a list of events. However, it would be a trivial change for `receive_data` to return a generator instead. That generator would then iterate across events, one event at a time. The effect of doing that would be to avoid updating the state machines until the user has an event in hand, and would essentially immediately change hyper-h2 to an API that lazily updates stream state. This would ensure that a user cannot see a `ProtocolError` for sending data on a closed stream unless they have iterated past an event that closed the stream (or they closed it themselves).

I'm interested in what @python-hyper/contributors think, as well as our wider user base. I think making such a change would be extremely valuable, but it represents a breaking API change (`receive_data` is documented as returning a `list`).

Do people think this change is sensible?
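The behavior being proposed can be illustrated with a minimal sketch. This is a simplified toy model, not hyper-h2's actual internals (the class, the hard-coded event names, and the use of `RuntimeError` in place of `ProtocolError` are all stand-ins): the stream's state only changes as the caller pulls events out of the generator, so sending on the stream remains legal right up until the closing event has been consumed:

```python
class ToyConnection:
    """Toy model of lazy state changes: stream state advances only
    as the caller iterates events, so a stream is not seen as
    CLOSED until the closing event has actually been consumed."""

    def __init__(self):
        self.stream_states = {1: "OPEN"}

    def _parse(self, data):
        # Pretend parsing produced these events for stream 1.
        return ["DataReceived", "WindowUpdated", "StreamReset"]

    def receive_data(self, data):
        # Returns a generator: no state changes until iteration.
        def events():
            for event in self._parse(data):
                if event == "StreamReset":
                    # The state transition happens lazily, only
                    # when this event is yielded to the caller.
                    self.stream_states[1] = "CLOSED"
                yield event
        return events()

    def send_data(self, stream_id, data):
        if self.stream_states[stream_id] == "CLOSED":
            raise RuntimeError("ProtocolError: stream is closed")

conn = ToyConnection()
events = conn.receive_data(b"...")
next(events)               # DataReceived
next(events)               # WindowUpdated
conn.send_data(1, b"ok")   # safe: the reset has not been consumed yet
next(events)               # StreamReset -- only now does the stream close
```

With the current list-returning API, all three events would be processed eagerly inside `receive_data`, so the `send_data` call above would already raise even though the user hadn't yet seen the reset.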