Delete processed events #94
base: main
Conversation
Bloom filters, cool! But after doing some research I'm confused about their suitability here. Bloom filters are susceptible to false positives, and since we are using them to skip processing of events, any false positive means an event that should have been processed won't be. So with an error rate of 0.1%, we would skip one in every thousand events we should have processed? What am I missing here?
The error rate is actually lower: it's currently set to 0.01%, and we could tighten it further if needed (e.g., down to 0.001%). The idea is that even if we miss a few events due to false positives, we'll catch up on the next update for those events. Since we're dealing with replaceable events, missing one update isn't the end of the world; we'll process the event the next time it comes through. Assuming independent lookups, the chance of missing the same event twice in a row is extremely low (0.01% × 0.01% = 0.000001%, or lower still if we tighten the rate).

We've already accepted some level of event loss (like with events that are too large), so this feels like a natural extension of that trade-off. The occasional miss from the Bloom filter is a small price to pay for the benefits: it simplifies the system, removes the need for a dedicated table for duplicate checks, and keeps the server healthier. In this case, I think prioritizing reliability and scalability over 100% precision makes sense. That said, if these occasional misses feel like a no-go, we can always switch back to using a table for duplicate checks. Let me know what you think!
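The compound-miss arithmetic can be sanity-checked with a quick sketch. The 0.01% rate is from this thread; treating consecutive lookups as independent is an assumption on my part:

```python
# False positive rate of the Bloom filter (0.01%, the current setting).
fp_rate = 0.0001

# Probability the same replaceable event is skipped on two consecutive
# updates, assuming the two lookups are independent.
miss_twice = fp_rate ** 2

print(f"miss once:  {fp_rate:.4%}")   # 0.0100%
print(f"miss twice: {miss_twice:.6%}")
```

So at the 0.01% setting, losing the same event on two consecutive updates is a one-in-a-hundred-million occurrence per event.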
This is for https://github.com/verse-pbc/issues/issues/142
This PR adds event cleanup to prevent indefinite storage in the event service. Events are now deleted after successful processing, and a rotating Bloom filter is used to detect duplicates. Unused event retrieval endpoints and handlers have been removed, and metrics for Bloom filter usage and saturation have been added.
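A rotating Bloom filter as described above could look roughly like the following sketch. This is not the PR's implementation; the two-generation rotation scheme, sizes, and thresholds are all illustrative assumptions:

```python
import hashlib


class RotatingBloom:
    """Two-generation rotating Bloom filter for event-ID dedup.

    New IDs are recorded in the current generation; lookups check both
    generations. When the current generation passes a saturation
    threshold, it becomes the previous generation and a fresh one
    starts, bounding memory without ever clearing recent history.
    All parameters here are illustrative, not the PR's actual values.
    """

    def __init__(self, bits: int = 1 << 20, hashes: int = 7,
                 rotate_after: int = 100_000) -> None:
        self.bits = bits
        self.hashes = hashes
        self.rotate_after = rotate_after
        self.current = bytearray(bits // 8)
        self.previous = bytearray(bits // 8)
        self.count = 0  # insertions into the current generation

    def _positions(self, item: bytes):
        # Derive k bit positions by salting a SHA-256 hash per index.
        for i in range(self.hashes):
            digest = hashlib.sha256(i.to_bytes(2, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    @staticmethod
    def _test(buf: bytearray, pos: int) -> int:
        return buf[pos // 8] & (1 << (pos % 8))

    def seen(self, event_id: str) -> bool:
        """Return True if event_id was (probably) seen; else record it."""
        positions = list(self._positions(event_id.encode()))
        if all(self._test(self.current, p) for p in positions) or \
           all(self._test(self.previous, p) for p in positions):
            return True
        for p in positions:
            self.current[p // 8] |= 1 << (p % 8)
        self.count += 1
        if self.count >= self.rotate_after:
            # Promote current to previous; start a fresh generation.
            self.previous, self.current = self.current, bytearray(self.bits // 8)
            self.count = 0
        return False


bloom = RotatingBloom()
print(bloom.seen("event-abc"))  # False: first sighting, now recorded
print(bloom.seen("event-abc"))  # True: duplicate detected
```

Because lookups consult both generations, an event recorded just before a rotation is still recognized as a duplicate for one more full generation, which is what lets the filter forget old IDs gradually instead of all at once.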