Querying/Filtering on the Firehose stream #2418
Replies: 2 comments
-
Agreed on this. Ways to make the Bluesky firehose easier to consume would be really nice. I'm worried about my hosting costs if the platform grows successfully and becomes 100x bigger. Firehoses that only carry certain operations would be a good basic way of improving things for feed maintainers. Most feeds are only interested in posts, so every action other than post creation and deletion is wasted time and bandwidth, for both the feed maintainer and Bluesky. This is especially true since there are far more likes on the network than posts. Maybe as an easy interim measure Bluesky could host an additional firehose endpoint that only contains post creations and deletions? That should be sufficient for a lot of people (and would be easier for feed devs to work with).
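For illustration, here is a minimal sketch of the "post creations and deletions only" filter, applied client-side to the ops of an already-decoded commit. It assumes each op is a plain dict with an `action` of `create`, `update`, or `delete` and a `path` of the form `<collection>/<rkey>`; the function name and sample record keys are made up for the example, and decoding the firehose frames themselves is out of scope here.

```python
from typing import Dict, Iterable, Iterator

POST_COLLECTION = "app.bsky.feed.post"
WANTED_ACTIONS = {"create", "delete"}


def post_creates_and_deletes(ops: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Yield only the ops that create or delete a post record.

    Assumes each op looks like {"action": ..., "path": "<collection>/<rkey>"}.
    """
    for op in ops:
        # The collection NSID is everything before the first "/" in the path.
        collection, _, _rkey = op["path"].partition("/")
        if collection == POST_COLLECTION and op["action"] in WANTED_ACTIONS:
            yield op


# Hypothetical sample ops; the record keys are placeholders.
sample_ops = [
    {"action": "create", "path": "app.bsky.feed.post/3kabc"},
    {"action": "create", "path": "app.bsky.feed.like/3kdef"},
    {"action": "update", "path": "app.bsky.feed.post/3kghi"},
    {"action": "delete", "path": "app.bsky.feed.post/3kjkl"},
]
kept = list(post_creates_and_deletes(sample_ops))
```

A filter like this still pays the bandwidth cost of receiving everything, which is why the same predicate would be more useful on the server side.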
-
So I built a thing for this called Jetstream. It lets you use one firehose connection and serve a bunch of local websocket streams in JSON or CBOR without all of the MST bits, and it supports filtering by a set of collections or a set of repos you care about. It can also write the records to a Kafka topic if you want services in your infra to consume from Kafka or something. I talked about this on the app in Juni's thread but wanted to update here so folks who see this on GH can find a solution.
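As a rough sketch of what subscribing with a collection or repo filter could look like, the helper below only builds the websocket URL. The query parameter names `wantedCollections` and `wantedDids` are what I recall from the Jetstream README and may differ between versions; the base URL, the DID, and the function name are placeholders, so check them against the instance you actually run.

```python
from typing import Iterable
from urllib.parse import urlencode


def jetstream_url(base: str,
                  collections: Iterable[str] = (),
                  dids: Iterable[str] = ()) -> str:
    """Build a subscribe URL with repeated filter parameters.

    Parameter names are an assumption based on the Jetstream README.
    """
    params = [("wantedCollections", c) for c in collections]
    params += [("wantedDids", d) for d in dids]
    query = urlencode(params)  # percent-encodes values such as the ":" in DIDs
    return f"{base}?{query}" if query else base


# Placeholder base URL for a locally hosted instance.
url = jetstream_url(
    "ws://localhost:6008/subscribe",
    collections=["app.bsky.feed.post"],
    dids=["did:plc:example"],
)
```

The resulting URL can be handed to any websocket client; each filtered stream then carries only the collections or repos asked for.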
-
Over the last 6 months I've been working on some Bluesky feeds and bots (the most popular being the reminder bot), and an issue I've come across is that >99% of the data from the firehose is completely ignored. I've been running the bots separately, without any sort of local relay server (a mistake I'm already working on correcting), so my total data usage for March was 6TB.
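To make the local relay idea concrete, here is a minimal sketch of the fan-out pattern: one upstream stream is read once, and each event is handed only to the consumers that want it. The upstream is faked with an in-memory list, and the event shape (a dict with a `collection` key) and all names are assumptions for the example, not the real firehose format.

```python
import asyncio
from typing import AsyncIterator, Callable, Iterable, List, Tuple

Event = dict
Subscriber = Tuple[Callable[[Event], bool], asyncio.Queue]


async def fan_out(source: AsyncIterator[Event], subscribers: List[Subscriber]) -> None:
    """Read each upstream event once and queue it for every interested subscriber."""
    async for event in source:
        for wants, queue in subscribers:
            if wants(event):
                await queue.put(event)
    for _, queue in subscribers:
        await queue.put(None)  # sentinel: upstream is finished


async def fake_upstream(events: Iterable[Event]) -> AsyncIterator[Event]:
    """Stand-in for the single real firehose connection."""
    for event in events:
        yield event


def drain(queue: asyncio.Queue) -> List[Event]:
    """Collect everything queued before the sentinel."""
    items = []
    while True:
        item = queue.get_nowait()
        if item is None:
            return items
        items.append(item)


def demo() -> Tuple[List[Event], List[Event]]:
    async def main():
        bot_q, feed_q = asyncio.Queue(), asyncio.Queue()
        subscribers = [
            (lambda e: e["collection"] == "app.bsky.feed.post", bot_q),
            (lambda e: e["collection"] in {"app.bsky.feed.post", "app.bsky.feed.like"}, feed_q),
        ]
        events = [
            {"collection": "app.bsky.feed.post"},
            {"collection": "app.bsky.feed.like"},
            {"collection": "app.bsky.graph.follow"},
        ]
        await fan_out(fake_upstream(events), subscribers)
        return drain(bot_q), drain(feed_q)

    return asyncio.run(main())
```

With this shape, adding another bot means adding another subscriber rather than another firehose connection, so upstream bandwidth stays constant as the number of bots grows.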
My initial hope was to find a way to filter what the firehose sends to the client. It could be as basic as filtering by repo operation or payload type (post, like, follow, etc.), or as fine-grained as the query options for `app.bsky.feed.searchPosts`. I brought this up today in this thread on Bluesky and, as many folks probably know, this is not yet possible, though we had a good conversation about some current workarounds/alternatives, as well as possibilities for future implementations.
I'm still working on getting a more complete grasp of atproto in general, so there may be other solutions/alternatives I've overlooked (and I'd love to hear about them), but I want to avoid having to make `searchPosts` calls on an interval. I'd just prefer to do it with a websocket connection/stream.
I would love to see some sort of firehose/stream filtering on the server side in the future, but I also understand that it's not as easy as the initial idea sounds.