How should event stream consumers handle out-of-order sequence numbers? #1552
https://atproto.com/specs/event-stream#sequence-numbers says:
I assume "increase monotonically" means that, for a given event stream connection, servers should emit messages with sequence numbers in monotonically increasing order, i.e. they shouldn't send frames with seq 1, then 3, then 2. If so, ok. I'm curious though: if that does happen, how should an event stream consumer handle it? Specifically, if/when that happens, is the BGS expected to keep using the repo at commit 1 until it later receives commit 2, at which point it has all of the commits it needs, so it advances its copy of the repo to commit 3 and continues on? Or is it ok for the BGS to break, be unable to recover that repo, and leave it stuck at commit 1 indefinitely?
I plan to handle this case the same as if 2 had been dropped entirely. When 3 comes in, I won't necessarily know that a 2 ever existed or is missing, but I will notice that I'm missing MST and/or record blocks. When I notice that, I'll start trying to catch up "out-of-band" using getBlocks requests, recursively. By the time 2 finally arrives, it probably won't be telling me any new information at all, so I can safely ignore it. (I won't even attempt to make use of out-of-order events; I will simply drop them unconditionally.) The logic to do this ought to come "for free" as part of handling recovery after extended outages, beyond the firehose cursor history window.
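For readers who want to see the shape of this, here is a minimal sketch of the approach described above, not anyone's actual implementation. The `Event` fields (`repo`, `needs`, `blocks`) and the `catch_up` queue are hypothetical names invented for illustration; a real consumer would decode frames from the stream and issue getBlocks requests instead of appending to a list.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    seq: int                # sequence number from the stream frame
    repo: str               # hypothetical repo identifier
    needs: frozenset[str]   # hypothetical: block IDs this commit depends on
    blocks: frozenset[str]  # hypothetical: block IDs carried in the frame


@dataclass
class Consumer:
    last_seq: int = 0
    # Blocks held locally, per repo.
    have: dict[str, set[str]] = field(default_factory=dict)
    # (repo, block) pairs to fetch out-of-band, e.g. via getBlocks.
    catch_up: list[tuple[str, str]] = field(default_factory=list)

    def handle(self, ev: Event) -> bool:
        """Return True if the event was applied, False if it was dropped."""
        if ev.seq <= self.last_seq:
            # Out-of-order or duplicate: drop unconditionally.
            return False
        self.last_seq = ev.seq
        known = self.have.setdefault(ev.repo, set())
        known |= ev.blocks
        # Anything the commit depends on that we still lack gets queued
        # for out-of-band recovery; we never wait for the skipped event.
        for block in sorted(ev.needs - known):
            self.catch_up.append((ev.repo, block))
        return True
```

With frames arriving as seq 1, 3, 2, the consumer applies 1 and 3, queues the blocks that 3 depends on but did not carry, and drops 2 when it finally shows up.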
Thanks! This all makes sense. Definitely nice that at least for
If you're emitting events, you should work very hard to guarantee in-order and consistent sequence numbers for events.
Our services will just ignore any out-of-order event.
However, this is a good sign that something went wrong with the providing service.
Services may resync using getRepo or getBlocks.
Persistent failures suggest a faulty service, and I would expect consumers to stop subscribing to them. This is similar to publishing poorly signed commits.
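As a rough illustration of "stop subscribing after persistent failures", a consumer could count ordering violations per provider. This is only a sketch under assumptions: the `ProviderHealth` class, the `observe` method, and the threshold value are all hypothetical, and nothing in the spec prescribes a particular limit.

```python
from collections import defaultdict

MAX_VIOLATIONS = 5  # hypothetical threshold; choose per deployment


class ProviderHealth:
    """Counts out-of-order frames per provider (illustrative only)."""

    def __init__(self, max_violations: int = MAX_VIOLATIONS) -> None:
        self.max_violations = max_violations
        self.last_seq: dict[str, int] = {}
        self.violations: defaultdict[str, int] = defaultdict(int)

    def observe(self, provider: str, seq: int) -> bool:
        """Record a frame; return True while the provider is still trusted."""
        last = self.last_seq.get(provider)
        if last is not None and seq <= last:
            # Sequence number did not increase: count it against the provider.
            self.violations[provider] += 1
        else:
            self.last_seq[provider] = seq
        return self.violations[provider] < self.max_violations
```

Once `observe` returns False, the caller would close the subscription. A fuller version would probably decay the count over time so that one bad day does not condemn a provider forever.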
If the providing service is out of order because they reset their cursor (ie curso…