
Delete processed events #94

Open · wants to merge 12 commits into main
Conversation

dcadenas (Contributor)

This is for https://github.com/verse-pbc/issues/issues/142

This PR adds event cleanup to prevent indefinite storage in the event service. Events are now deleted after successful processing, and a rotating Bloom filter is used to detect duplicates. Unused event retrieval endpoints and handlers have been removed, and metrics for Bloom filter usage and saturation have been added.
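
As a rough illustration of the rotating Bloom filter approach described above, here is a minimal Rust sketch. The two-filter rotation scheme, the `RotatingBloom` and `check_and_insert` names, and all sizes are assumptions made for illustration, not the code actually shipped in this PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

/// Fixed-size Bloom filter using Kirsch-Mitzenmacher double hashing
/// on top of std's DefaultHasher.
struct BloomFilter {
    bits: Vec<u64>,
    num_bits: usize,
    num_hashes: u32,
}

impl BloomFilter {
    fn new(num_bits: usize, num_hashes: u32) -> Self {
        Self { bits: vec![0; (num_bits + 63) / 64], num_bits, num_hashes }
    }

    fn bit_indexes(&self, item: &impl Hash) -> Vec<usize> {
        let mut hasher = DefaultHasher::new();
        item.hash(&mut hasher);
        let h1 = hasher.finish();
        let mut hasher = DefaultHasher::new();
        h1.hash(&mut hasher);
        let h2 = hasher.finish();
        // k bit positions derived as h1 + i * h2 (mod m)
        (0..self.num_hashes as u64)
            .map(|i| (h1.wrapping_add(i.wrapping_mul(h2)) % self.num_bits as u64) as usize)
            .collect()
    }

    fn insert(&mut self, item: &impl Hash) {
        for idx in self.bit_indexes(item) {
            self.bits[idx / 64] |= 1u64 << (idx % 64);
        }
    }

    fn contains(&self, item: &impl Hash) -> bool {
        self.bit_indexes(item)
            .into_iter()
            .all(|idx| (self.bits[idx / 64] & (1u64 << (idx % 64))) != 0)
    }
}

/// Two filters rotated on a timer: inserts go to the current filter,
/// lookups consult both, and the oldest filter is dropped on rotation
/// so memory stays bounded even though processed events are deleted.
struct RotatingBloom {
    current: BloomFilter,
    previous: BloomFilter,
    rotated_at: Instant,
    rotation_period: Duration,
    num_bits: usize,
    num_hashes: u32,
}

impl RotatingBloom {
    fn new(num_bits: usize, num_hashes: u32, rotation_period: Duration) -> Self {
        Self {
            current: BloomFilter::new(num_bits, num_hashes),
            previous: BloomFilter::new(num_bits, num_hashes),
            rotated_at: Instant::now(),
            rotation_period,
            num_bits,
            num_hashes,
        }
    }

    /// Returns true if the event id was (probably) seen before;
    /// otherwise records it and returns false.
    fn check_and_insert(&mut self, event_id: &str) -> bool {
        if self.rotated_at.elapsed() >= self.rotation_period {
            self.previous = std::mem::replace(
                &mut self.current,
                BloomFilter::new(self.num_bits, self.num_hashes),
            );
            self.rotated_at = Instant::now();
        }
        let seen = self.current.contains(&event_id) || self.previous.contains(&event_id);
        if !seen {
            self.current.insert(&event_id);
        }
        seen
    }
}
```
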

dcadenas requested a review from mplorentz · January 31, 2025, 18:05
mplorentz (Member) left a comment

Bloom filters, cool! But after doing some research I'm confused about their suitability here. Bloom filters are susceptible to false positives, and since we are using them to skip processing of events, any false positive will result in an event that should have been processed not being processed. So with an error rate of 0.1%, we will skip one in every thousand events we should have processed? What am I missing here?

dcadenas (Contributor, Author) commented Feb 5, 2025

The error rate is actually lower; it's currently set to 0.01%, and we could tweak it further if needed (e.g., down to 0.001%). The idea is that even if we miss a few events due to false positives, we'll catch up on the next update for those events. Since we're dealing with replaceable events, missing one update isn't the end of the world; we'll process it the next time it comes through. The chance of missing the same event twice is extremely low (0.01% × 0.01% = 0.000001%, about one in 100 million, and even lower if we adjust the rate).
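
For context on how a 0.01% target translates into memory and hash count, the standard Bloom filter sizing formulas give roughly the following. The numbers below are purely illustrative and are not taken from this PR's actual configuration.

```rust
/// Standard Bloom filter sizing: for n expected items and target
/// false-positive rate p, use m = -n * ln(p) / (ln 2)^2 bits and
/// k = (m / n) * ln 2 hash functions.
fn bloom_params(n: f64, p: f64) -> (u64, u32) {
    let ln2 = std::f64::consts::LN_2;
    let m = (-n * p.ln() / (ln2 * ln2)).ceil();
    let k = ((m / n) * ln2).round().max(1.0);
    (m as u64, k as u32)
}

fn main() {
    // Illustrative: 1 million events at a 0.01% (1e-4) target rate
    // needs roughly 19.2 million bits (~2.3 MiB) and 13 hash functions.
    let (bits, hashes) = bloom_params(1_000_000.0, 1e-4);
    println!("{bits} bits, {hashes} hash functions");

    // Missing the same replaceable event on two consecutive updates
    // requires two independent false positives: 1e-4 * 1e-4 = 1e-8.
    println!("p(two misses) = {:e}", 1e-4_f64 * 1e-4);
}
```
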

We’ve already been okay with some level of event loss (like with events that are too large), so this feels like a natural extension of that trade-off. The occasional miss from the Bloom filter is a small price to pay for the benefits: it simplifies the system, removes the need for a dedicated table for duplicate checks, and keeps the server healthier. In this case, I think prioritizing reliability and scalability over 100% precision makes sense.

That said, if these occasional misses feel like a no-go, we can always switch back to using a table for duplicate checks. Let me know what you think!
