[meta] `EventCache` storage #3280

Hywan · 2024-03-27T11:05:34Z

matrix_sdk::EventCache is a new module that has been introduced with #3058. The idea is to manage all events inside the Rust SDK in a single place. An important user of EventCache is the matrix_sdk_ui::Timeline.

EventCache uses a new data structure to organize all events, LinkedChunk, since:

This issue describes the plan to get a persistent storage for EventCache, along with the ability to make it reactive.

Persistent storage

In order to create a persistent storage for EventCache, we need a mechanism that listens to LinkedChunk updates, and map these updates to database operations (like INSERT, DELETE and so on).

We initially went to using a reactive mechanism (like ObservableLinkedChunk), but like any reactive mechanisms, we need to handle the lag.

Note

What is a lag? When new updates are generated, they are usually accumulated in a buffer. This buffer is drained by subscribers of the observed value. When the buffer is full, which can happen because subscribers lag to consume updates, the buffer is reset.

The lag, in this case, is pretty problematic. In case of a lag, the database will be out of sync —data will be missing—, and there is no easy way to get them again. In case of a lag, we might imagine to reset the database and to rewrite everything again, but not all events are loaded in memory. Alternatively, we may want to recompute all the differences between what is in memory and what is inside the database, but again, it's not an easy problem. Anyway, it implies more guards and more complexity.

Instead, we've decided to define a LinkedChunkListener trait, used by LinkedChunk, onto which methods will be called on some particular operations, like insert_chunk, remove_chunk, insert_events and remove_events, that's probably all we need.

The cons:

It's a new API to manage,
It cannot be re-used by something else. It exists for one usage only.

The pros:

It's immediate: it removes the problem of the lag introduced by the initial reactive approach,
It's easily unit-testable!
The API is quite small and clearly not complex at all, it's straighforward.

Tasks

Reactive `EventCache`

Right now, there is no satisfying way to get a stream of EventCache updates. The only mechanism that exists so far is RoomEventCache::subscribe. It returns a tokio::sync::mpsc::Receiver<RoomEventCacheUpdate>. RoomEventCacheUpdate is defined like so:

matrix-rust-sdk/crates/matrix-sdk/src/event_cache/mod.rs

Lines 816 to 838 in ac0bc95

    
           /// An update related to events happened in a room. 
        
           #[derive(Debug, Clone)] 
        
           pub enum RoomEventCacheUpdate { 
        
               /// The room has been cleared from events. 
        
               Clear, 
        
               /// The room has new events. 
        
               Append { 
        
                   /// All the new events that have been added to the room's timeline. 
        
                   events: Vec<SyncTimelineEvent>, 
        
                   /// XXX: this is temporary, until account data lives in the event cache 
        
                   /// — or will it live there? 
        
                   account_data: Vec<Raw<AnyRoomAccountDataEvent>>, 
        
                   /// XXX: this is temporary, until read receipts are handled in the event 
        
                   /// cache 
        
                   ephemeral: Vec<Raw<AnySyncEphemeralRoomEvent>>, 
        
                   /// Collection of ambiguity changes that room member events trigger. 
        
                   /// 
        
                   /// This is a map of event ID of the `m.room.member` event to the 
        
                   /// details of the ambiguity change. 
        
                   ambiguity_changes: BTreeMap<OwnedEventId, AmbiguityChange>, 
        
               }, 
        
           }

Constraint: Prepare for reconciliation

This is OK-ish for the moment (at the time of writing), but it will quickly show limitations, in particular with the reconciliation.

Note

When events are received in different orders —e.g. between /messages, /context or /sync—, it's important to re-order them. We call that reconciliation. It's far, far, faaar from trivial. Actually, there is no solution to this problem, but we will try to make the best heuristics as possible.

Why reconciliation is going to create a problem here? Because it's impossible to represent an update like: “remove item at position $p$, and insert item at position $q$”. The only possible updates so far are “clear” and “append”.

The assiduous reader (oh, hi¹) will think: “How does it work with back- or front-pagination?”. Thanks for asking. Let's make a detour.

Constraint: Pagination

Frontpagination isn't implemented yet (not something hard, just not here yet). Backpagination is done with RoomEventCache::backpaginate. It returns a BackPaginationOutcome, defined like so:

matrix-rust-sdk/crates/matrix-sdk/src/event_cache/mod.rs

Lines 792 to 814 in ac0bc95

    
           /// The result of a single back-pagination request. 
        
           #[derive(Debug)] 
        
           pub enum BackPaginationOutcome { 
        
               /// The back-pagination succeeded, and new events have been found. 
        
               Success { 
        
                   /// Did the back-pagination reach the start of the timeline? 
        
                   reached_start: bool, 
        
                   /// All the events that have been returned in the back-pagination 
        
                   /// request. 
        
                   /// 
        
                   /// Events are presented in reverse order: the first element of the vec, 
        
                   /// if present, is the most "recent" event from the chunk (or 
        
                   /// technically, the last one in the topological ordering). 
        
                   /// 
        
                   /// Note: they're not deduplicated (TODO: smart reconciliation). 
        
                   events: Vec<TimelineEvent>, 
        
               }, 
        
               /// The back-pagination token was unknown to the event cache, and the caller 
        
               /// must retry after obtaining a new back-pagination token. 
        
               UnknownBackpaginationToken, 
        
           }

Ah. Isn't it using RoomEventCacheUpdate? Well, no, because of the Timeline! The API from EventCache has been extracted from the Timeline. The Timeline had and still has 2 sources of updates: /sync and /messages for pagination.

Would it be hard to switch to a single Stream<Item = SyncTimelineEvent>? Well. Yes and no.

For /sync, the Timeline listens to RoomEventCache::subscribe:

matrix-rust-sdk/crates/matrix-sdk-ui/src/timeline/builder.rs

Lines 133 to 137 in ac0bc95

    
           // Subscribe the event cache to sync responses, in case we hadn't done it yet. 
        
           event_cache.subscribe()?; 
        
           let (room_event_cache, event_cache_drop) = room.event_cache().await?; 
        
           let (events, mut event_subscriber) = room_event_cache.subscribe().await?;

For /messages, the Timeline has a dedicated mechanism that calls /messages in a loop until $n$ TimelineItems are received. This is different of $n$ SyncTimelineItem, because some events are state events and cannot be displayed to the user:

matrix-rust-sdk/crates/matrix-sdk-ui/src/timeline/pagination.rs

Lines 49 to 51 in ac0bc95

    
           while let Some(batch_size) = options.next_event_limit(outcome) { 
        
               loop { 
        
                   match self.event_cache.backpaginate(batch_size, token).await? {

What do we want? A single source of data for Timeline.
What does it require? Change the backpagination mechanism. Not a big deal, it's doable. The biggest difficulty is that the Timeline will ask for data that will be injected to another place (Timeline::paginate_backwards will see the results of RoomEventCache::backpaginate via RoomEventCache::subscribe). It's not easy to connect both.
How to do it? Glad you ask.

The solution: Step 1

LinkedChunk must expose a Stream<Item = Vec<VectorDiff<SyncTimelineEvent>>>-like API, something like:

impl LinkedChunk {
    fn subscribe_as_vector(&self) -> (Vec<SyncTimelineEvent>, impl Stream<Item = Vec<VectorDiff<SyncTimelineEvent>>>) {
        todo!()
    }

Easy right? Well. No. LinkedChunk is not a Vector. The algorithm is going to be fun here. LinkedChunk::subscribe_as_vector should fake it's a Vector and should emit VectorDiff, à la eyeball_im::ObservableVector.

Of course, RoomEventCache::subscribe must be rewritten to use LinkedChunk::subscribe_as_vector.

The solution: Step 2

Timeline must listen to RoomEventCache::subscribe but for all updates. Then, Timeline will map Stream<Item = Vec<VectorDiff<SyncTimelineItem>>> into TimelineItems that will be inserted/deleted/moved in the correct places inside its own ObservableVector<TimelineItem> inside TimelineInnerState:

matrix-rust-sdk/crates/matrix-sdk-ui/src/timeline/inner/state.rs

Lines 71 to 75 in ac0bc95

    
           #[derive(Debug)] 
        
           pub(in crate::timeline) struct TimelineInnerState { 
        
               pub items: ObservableVector<Arc<TimelineItem>>, 
        
               pub meta: TimelineInnerMetadata, 
        
           }

This is going to be delicate.

Tasks

The 2 following lists can be done in parallel:

Tasks on EventCache:

Create LinkedChunk::subscribe_as_vector:
- feat(sdk): Introduce linked_chunk::Updates #3384
- feat(sdk): Add LinkedChunk::subscribe_as_vector #3392

Tasks on EventCache and Timeline:

Improve/refactor the pagination mechanism of Timeline to EventCache so that EventCache is the only place to have pagination logics. Timeline must entirely depend on EventCache.

The following list must be done after the 2 previous lists:

Lazy `EventCache`: Combine pagination and persistent storage

Before being able to enable persistent storage in EventCache, one last problem must be addressed.

RoomEventCache will use RoomEvents (which uses LinkedChunk) to load events from the persistent storage. That's fine. However, we don't want to load all events from the persistent storage. Imagine rooms with 1'000'000 events: do we want to load all the events in memory? Absolutely no! RoomEventCache must be lazy: it must load only the $n$ newest chunks from the persistent storage.

OK, but what happens when we backpaginate?

We must load events from the persistent storage first,
We must run a proper backpagination (with /messages) to see if events aren't missing from the federation that could have missed by /sync,
We must reconciliate both.

Note

A more detailed plan is being drafted by @bnjbvr and @Hywan to define when running a network request (/messages) is necessary or not necessary. The heuristic is not trivial, all edge-cases must be well-defined and considered carefully.

Once we have this mechanism in-place, we can enable the persistent storage for EventCache.

Tasks

RoomEventCache is lazy: load chunks on-demand,
RoomEventCache is able to backpaginate from its persistent storage and from /messages at the same time,
Finally, enable the persistent storage for EventCache 🎉.

Naive reconciliation

To start with, we can implement a super native reconciliation algorithm that will simply remove duplicated events. For a first shot, it's totally fine. A better reconciliation algorithm can be defined later. For the record, what is present now is the remove duplicated events approach.

What we want to ensure is that the reconciliation algorithm will modify events in LinkedChunk, which will propagate to the persistent storage via LinkedChunkListener and to the Timeline via RoomEventCache::subscribe. That's one of the main goal of this proposed designed.

Tasks

feat(sdk): Introduce event_cache::Deduplicator #4174

Conclusion

This plan provides a solution to support a reconciliation mechanism in a single place. It will benefit to other users, like Timeline. The Timeline needs to be refactored due to having 2 source of updates, one for /sync (live events), and one for /messages (front- and back-paginations of old events).

Because EventCache will be reactive, it will simplify a lot all the testing aspects:

EventCache will be easy to test: assert a stream,
Timeline will be super easy to test without having to mock a server will all its possible requests: simply generate a stream that will emit what we want to test exactly, it's a matter of writing the following code for the inputs of the Timeline:

let inputs = stream! {
    yield vec![VectorDiff::Append { values: events![…, …, …] }];
    yield vec![VectorDiff::Remove { index: … }];
    yield vec![VectorDiff::Insert { index: …, value: … }];
};

Components will be fully decoupled. Each part will be unit-tested. It will increase the robustness of the Timeline.

Bonus

LinkedChunk uses mutation-based testing and property-based testing:
- feat(sdk): Mutation-based testing and property-based testing for LinkedChunk #3460

fix(sdk): Fix a substract with overflow in LinkedChunk #3578

Address [meta] Event Cache API 🕸 #3058
Address [Story] 🦀 Event cache storage element-hq/element-meta#2405

Mark? ↩

The text was updated successfully, but these errors were encountered:

manuroe · 2024-10-11T11:39:45Z

Constraint: Pagination

The biggest difficulty is that the Timeline will ask for data that will be injected to another place (Timeline::paginate_backwards will see the results of RoomEventCache::backpaginate via RoomEventCache::subscribe). It's not easy to connect both.

For info, this is exactly the shape used in the Android SDK (hello @ganfra!). The timeline observes a reactive DB that is fed by other mechanisms. The actioner and the consumers are deconnected.

Lazy EventCache: Combine pagination and persistent storage

We must run a proper backpagination (with /messages) to see if events aren't missing from the federation that could have missed by /sync,

This is not mandatory. We must use the pagination token provided by the backend on /sync, /messages, /context requests. The backend will provide late messages through this stream.
How to render or reorder a late message is a tricky question where we need help from the product. There is no good answer as far as I know:

if you reorder the message in the existing timeline, the user will miss it
if you don't , the timeline ordering will look weird

I think most matrix clients take the second approach (which is also the lazy one). If we choose this path too, there might be things to do in the timeline module to flag late messages to offer a better UX. But again, we need a product decision first.

Hywan · 2024-11-20T14:07:19Z

Task distribution. First level items must be executed in that order. Other level items of the same level can be done in parallel.

EventCacheStore
- Add EventCacheStore::handle_linked_chunk_update with empty implementations for MemoryStore and SQLite
  - feat(base): Add EventCacheStore::handle_linked_chunk_updates #4299
- Implement EventCacheStore::handle_linked_chunk_update for MemoryStore
  - feat(base) Implement RelationalLinkedChunk #4298
- Implement EventCacheStore::handle_linked_chunk_update for the sqlite backend
  - feat(event cache): handle linked chunk updates and reload in the sqlite backend #4340
- refactor(linked chunk): replace LinkedChunk::len() with a simpler impl #4334
- feat(linked chunk): add a way to reconstruct a linked chunk from its raw representation #4318
- feat(event cache): reload a linked chunk from a store #4362
- Check-in a sqlite database with an initial set of events, so it's impossible to regress loading from an old format without implementing some kind of migration. (Make sure to include meaningful, realistic events with UTD info, etc.)
EventCache
- Load from store
  - Plug reloading from the store into the event cache
    - feat(event cache): connect the room event cache and persistent storage #4347
- Sync
  - Store events
    - Plug storing from the event cache into the store
      - feat(event cache): connect the room event cache and persistent storage #4347
    - Do not store bundled events / do not attempt to use them if they come from the cache
      - feat(event cache): strip bundled relations before persisting events #4373
  - Do not store previous-batch tokens unnecessarily
    - Stop assuming no previous-batch token means it'll come from sync
    - Don't store a previous-batch token for a back-pagination that only returned known events
- Enable storage by default
  - Get rid of EventCache::add_initial_events() (replace it with custom prefilled store in tests)
    - test(event cache): rely less on EventCache::add_initial_events() #4372
  - Don't notify clear for gappy timeline responses, if storage's enabled
    - feat(event cache): don't clear a timeline if storage's enabled #4374
  - Have the RoomEventCache propagate remove/insertion updates to observers (so back-paginations make sense)
  - Some kind of lazy-loading (see below)
- Pagination
  - Load from the store
  - Load from the network
  - Store events from network
  - Lazy-load events
    - Implement an equivalent of a reverse Stream::limit in eyeball?
    - Use the rlimit stream adapter on top of vectordiff, for the timeline to listen to
    - Being able to build a LinkedChunk chunk-by-chunk manually
  - Respect the room retention policy via observing the state m.room.retention event, and automatically remove events
Clean up / followups
- Sliding Sync
  - Kill timeline_queue
- When deleting (forgetting) a room, delete the accompanying room events from the event cache store
  - feat(event cache): remove events when forgetting a room #4521
- Redacted events should be really removed from the linked chunk (and storage)
- Measure performance of storage
  - Add benchmark to measure how long it takes to load events from the cache
  - Add benchmark to measure how long it takes to create a timeline with those events
- Check if the pinned events timeline makes use of the event cache
  - And make it work if it doesn't
- Remove AllEvents
  - Update the MemoryStore and SQLite representation to include a relates_to column to create 1:N relations between events
  - Write a migration for the above
  - Get rid of AllEvents once the above has been done
- Think about the cross-process locking situation

This patch adds the `handle_linked_chunk_updates` method on the `EventCacheStore` trait. Part of matrix-org#3280.

Hywan mentioned this issue Mar 28, 2024

feat(sdk): Introduce LinkedChunkUpdate #3281

Merged

manuroe mentioned this issue Apr 24, 2024

[Story] 🦀 Event cache storage element-hq/element-meta#2405

Open

6 tasks

bnjbvr mentioned this issue Apr 29, 2024

[meta] Event Cache API 🕸 #3058

Open

2 tasks

This was referenced May 6, 2024

feat(sdk): Introduce linked_chunk::Updates #3384

Merged

feat(sdk): Add LinkedChunk::subscribe_as_vector #3392

Merged

This was referenced May 27, 2024

feat(sdk): Improve RoomEventCacheUpdate #3471

Merged

feat(sdk): Add RoomPagination::run_backwards(…, until) #3472

Merged

This was referenced Jun 13, 2024

fix(ui): Change the behaviour when a duplicated event is received by the Timeline #3550

Merged

fix(sdk): Fix a substract with overflow in LinkedChunk #3578

Merged

feat(sdk): Implement event_cache::Deduplicator #3580

Closed

bnjbvr changed the title ~~[meta] EventCache storage, story of a plan~~ [meta] EventCache storage Aug 21, 2024

This was referenced Aug 21, 2024

Event cache: Move decryption retry in there #3872

Open

Reconcile or unify RoomEvents event chunks with AllEventsCache events #3886

Open

Hywan mentioned this issue Oct 9, 2024

feat(sdk): Dropping an UpdatesSubscriber release its reader token for the garbage collector #4102

Merged

bnjbvr mentioned this issue Oct 17, 2024

Follow an ignore/unignore user event with an initial sync #1703

Open

This was referenced Oct 28, 2024

test(sdk): Improve the test suite of RoomEvents #4172

Merged

feat(sdk): Add LinkedChunk::remove_item_at #4173

Merged

feat(sdk): Introduce event_cache::Deduplicator #4174

Merged

feat: Implement cross-process lock for the EventCache #4192

Merged

Hywan added a commit to Hywan/matrix-rust-sdk that referenced this issue Nov 20, 2024

feat(base): Add EventCacheStore::handle_linked_chunk_updates.

67ff82e

This patch adds the `handle_linked_chunk_updates` method on the `EventCacheStore` trait. Part of matrix-org#3280.

Hywan mentioned this issue Nov 20, 2024

feat(base): Add EventCacheStore::handle_linked_chunk_updates #4299

Merged

bnjbvr mentioned this issue Jan 21, 2025

event cache: be more conservative when adding duplicated events to a linked chunk #4563

Open

Hywan mentioned this issue Feb 4, 2025

fix(sdk): Improve performance of RoomEvents::maybe_apply_new_redaction #4616

Merged

Hywan self-assigned this Feb 4, 2025

This was referenced Feb 5, 2025

chore(sdk): Remove RoomEventCacheUpdate::Clear #4627

Merged

feat(sdk): Implement EventCache lazy-loading #4632

Merged

bnjbvr mentioned this issue Feb 10, 2025

fix(timeline): maintain aggregations when an event is deduplicated #4576

Merged

This was referenced Feb 11, 2025

feat(sdk): event_cache::Deduplicator is able to learn without scanning #4652

Closed

refactor(common): LinkedChunk can start by a gap #4653

Merged

bnjbvr mentioned this issue Feb 12, 2025

refactor(event cache): extract deduplication out of RoomEvents #4656

Merged

Hywan mentioned this issue Feb 12, 2025

feat(base): Add EventCacheStore::filter_duplicated_events #4659

Merged

bnjbvr mentioned this issue Feb 12, 2025

feat(event cache): deduplicate using a store when available #4662

Merged

This was referenced Feb 20, 2025

feat(event cache): unload a linked chunk whenever we get a limited sync #4694

Merged

feat(event cache): automatically shrink a room's linked chunk when all subscribers are gone #4703

Merged

Hywan mentioned this issue Feb 21, 2025

fix: Duplicated events are always removed #4706

Merged

2 tasks

Hywan mentioned this issue Feb 28, 2025

fix(sqlite): SqliteEventCacheStore is 35 times faster #4739

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[meta] `EventCache` storage #3280

[meta] `EventCache` storage #3280

Hywan commented Mar 27, 2024 •

edited

Loading

manuroe commented Oct 11, 2024

Constraint: Pagination

Lazy EventCache: Combine pagination and persistent storage

Hywan commented Nov 20, 2024 •

edited by bnjbvr

Loading

[meta] EventCache storage #3280

[meta] EventCache storage #3280

Comments

Hywan commented Mar 27, 2024 • edited Loading

Persistent storage

Tasks

Reactive EventCache

Constraint: Prepare for reconciliation

Constraint: Pagination

The solution: Step 1

The solution: Step 2

Tasks

Lazy EventCache: Combine pagination and persistent storage

Tasks

Naive reconciliation

Tasks

Conclusion

Bonus

Footnotes

manuroe commented Oct 11, 2024

Constraint: Pagination

Lazy EventCache: Combine pagination and persistent storage

Hywan commented Nov 20, 2024 • edited by bnjbvr Loading

[meta] `EventCache` storage #3280

[meta] `EventCache` storage #3280

Hywan commented Mar 27, 2024 •

edited

Loading

Reactive `EventCache`

Lazy `EventCache`: Combine pagination and persistent storage

Hywan commented Nov 20, 2024 •

edited by bnjbvr

Loading