# Race condition where a second join races with cache invalidation over replication and erroneously rejects joining Restricted Room #13185
Rough conclusions from the discussion: We have this problem in quite a few areas of Synapse. SyTest's approach here would be to retry in a loop until the 200 comes back; let's do this for now, as doing anything more proper is tricky. (It would also be likely to worsen latency if all requests had to wait for things to catch up.) In real-world situations this is unlikely to be a problem, because the invalidation would usually propagate in a few milliseconds.
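As a rough illustration of that retry-in-a-loop approach, here is a hypothetical test helper (not SyTest or Synapse code; the timeout and backoff values are invented, the join endpoint is the standard Matrix client-server API):

```python
import time

import requests


def join_with_retry(base_url: str, room_id: str, token: str,
                    timeout_s: float = 10.0, backoff_s: float = 0.1) -> dict:
    """Retry /join until it returns 200, papering over replication lag.

    Illustrative sketch only: a 403 here may just mean the worker's
    cache hasn't been invalidated yet, so we retry until the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        resp = requests.post(
            f"{base_url}/_matrix/client/v3/rooms/{room_id}/join",
            headers={"Authorization": f"Bearer {token}"},
            json={},
        )
        if resp.status_code == 200:
            return resp.json()
        if time.monotonic() >= deadline:
            resp.raise_for_status()  # give up: surface the last error
        time.sleep(backoff_s)
```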
In a nutshell, we've written the test expecting "read-after-write consistency", but we don't have that. (I say this because I think it's useful to have the phrase in mind.)
👍 Just experienced the stale cache racing while waiting for invalidation from replication in another Complement test, but this time it's all within the same worker.

Edit: Split off to #14211
Spawning from matrix-org/synapse#14028 (comment)

CI failure: https://github.com/matrix-org/synapse/actions/runs/3182998161/jobs/5189731097#step:6:15343 (from [discussion](matrix-org/synapse#14028 (comment)))

```
WORKERS=1 POSTGRES=1 COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestJumpToDateEndpoint/parallel/federation/can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint
```

### Why did this test fail with workers before?

The reason they failed before was that we're fighting against stale caches across workers racing while waiting for invalidation, see matrix-org/synapse#13185 (comment):

> Here is what happens:
>
> 1. `serverB` has `event1` stored as an `outlier` from previous requests (specifically from MSC3030 jump to date pulling in a missing `prev_event` after backfilling)
> 1. Client on `serverB` calls `/messages?dir=b`
> 1. `serverB:client_reader1` accepts the request and drives things
> 1. `serverB:client_reader1` has some backward extremities in range and requests `/backfill` from `serverA`
> 1. `serverB:client_reader1` processes the events from backfill, including `event1`, and puts them in the `_event_persist_queue`
> 1. `serverB:master` picks up the events from the `_event_persist_queue`, persists them to the database, de-outliers `event1`, invalidates its own cache, and sends the invalidations over replication
> 1. `serverB:client_reader1` starts assembling the `/messages` response and gets `event1` out of the stale cache, still as an `outlier`
> 1. `serverB:client_reader1` responds to the `/messages` request without `event1`, because `outliers` are filtered out
> 1. `serverB:client_reader1` finally gets the replication data and invalidates its own cache for `event1` (too late; we already got the events from the stale cache and responded)
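To make the shape of that race concrete, here is a toy model (not Synapse code; the cache layout, worker roles, and delays are invented for illustration) of a reader worker serving a response from its cache before the master's invalidation arrives over replication:

```python
import threading
import time

# Toy per-worker cache: event_id -> {"outlier": bool}. In Synapse the real
# caches are invalidated by commands sent over the replication stream.
reader_cache = {"event1": {"outlier": True}}


def master_persists_and_replicates() -> None:
    # master de-outliers event1, then (after some replication delay) other
    # workers are told to invalidate their cached copy.
    time.sleep(0.2)  # replication latency, roughly the ~200ms seen in CI
    reader_cache.pop("event1", None)  # invalidation finally arrives


def reader_serves_messages() -> bool:
    # The reader assembles /messages *before* the invalidation lands, so it
    # still sees event1 as an outlier and filters it out of the response.
    cached = reader_cache.get("event1", {"outlier": False})
    return not cached["outlier"]  # False => event1 dropped from response


t = threading.Thread(target=master_persists_and_replicates)
t.start()
included = reader_serves_messages()  # races ahead of the invalidation
t.join()
print("event1 included in /messages response:", included)  # False: the race
```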
(This is the cause of a flake in Complement's `TestRestrictedRoomsLocalJoin` tests when running with workers.)

If room A is a restricted room, restricted to members of room B, then if:

1. the user joins room B (on an event creator worker),
2. the join event is persisted (on the event persister),
3. the user then attempts to join room A,

the user can be erroneously rejected from joining room A (in step (3)).
This is because there is a cache, `get_rooms_for_user_with_stream_ordering`, which is populated in step (1) on the event creator. In step (2), the event persister will invalidate that cache and send a command over replication for other workers to do the same.
This issue then occurs if the event creator doesn't receive that command until after step (3).
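To illustrate where that stale read bites, here is a minimal self-contained sketch. The cache name matches the one above, but `FakeStore`, the entry type, and the check itself are invented stand-ins for Synapse's much larger auth code:

```python
import asyncio
from dataclasses import dataclass


@dataclass(frozen=True)
class RoomEntry:
    room_id: str


class FakeStore:
    """Stand-in for the real datastore; only models the cached read."""

    def __init__(self, cached_rooms):
        self._cached_rooms = cached_rooms

    async def get_rooms_for_user_with_stream_ordering(self, user_id: str):
        # Served from a per-worker cache; stale until the invalidation
        # command arrives over replication.
        return self._cached_rooms


async def check_restricted_join(store, user_id: str, allowed_room_id: str):
    joined = await store.get_rooms_for_user_with_stream_ordering(user_id)
    if allowed_room_id not in {e.room_id for e in joined}:
        raise PermissionError("403: not a member of any allowed room")


# Stale cache: the join to room B hasn't been reflected here yet.
stale_store = FakeStore(cached_rooms=frozenset())
try:
    asyncio.run(
        check_restricted_join(stale_store, "@alice:example.org", "!roomB:example.org")
    )
except PermissionError as e:
    print(e)  # erroneous 403, purely because the cache is stale
```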
This race occurs easily in Complement+workers on CI, where it takes ~200 ms for the invalidation to be received on the other worker (CPU contention in CI is probably playing a big part).
An obvious workaround is for the client to just sleep & retry, but it seems very ugly that we're issuing 403s to clients when they're making fully serial requests.
Another incomplete workaround is for the event creator to invalidate its own cache, but that won't help if the join to room B takes place on a different event creator.
I think a potential solution that would work is to:

1. when handling the second join in step (3), fetch the latest position of the replication stream, and
2. wait for the local worker to catch up to that position before performing the membership check.
(I'm not exactly sure how you 'fetch the latest position'; I was rather hoping there'd be `SETMAX` in Redis but didn't find it. Alternatively, it's probably possible to just PING Redis and treat the PONG as the barrier, but it's not a very nice solution.)
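Here is a minimal sketch of that wait-for-position barrier. The handler class and method names are invented for illustration; Synapse's real replication machinery differs:

```python
import asyncio


class ToyReplicationHandler:
    """Toy model of a worker's view of the replication stream position."""

    def __init__(self) -> None:
        self._position = 0
        self._moved = asyncio.Condition()

    async def advance_to(self, position: int) -> None:
        # Called when replication commands up to `position` are processed.
        async with self._moved:
            self._position = max(self._position, position)
            self._moved.notify_all()

    async def wait_for_position(self, position: int, timeout_s: float = 5.0) -> None:
        # The barrier: block until this worker has caught up to `position`.
        async def _wait() -> None:
            async with self._moved:
                await self._moved.wait_for(lambda: self._position >= position)

        await asyncio.wait_for(_wait(), timeout_s)


async def handle_second_join(repl: ToyReplicationHandler, latest_position: int) -> None:
    # Step (3): before the restricted-room membership check, wait until the
    # local caches reflect everything up to the writer's latest position.
    await repl.wait_for_position(latest_position)
    # ...now a read of get_rooms_for_user_with_stream_ordering would be fresh...


async def _simulate_replication(repl: ToyReplicationHandler, position: int) -> None:
    await asyncio.sleep(0.05)  # replication latency
    await repl.advance_to(position)


async def main() -> None:
    repl = ToyReplicationHandler()
    latest = 42  # latest position, fetched from the stream writer (or Redis)
    asyncio.create_task(_simulate_replication(repl, latest))
    await handle_second_join(repl, latest)
    print("caught up to", latest, "- membership check sees fresh data")


asyncio.run(main())
```

The PING-and-wait-for-PONG idea in the parenthetical above would serve as roughly the same barrier, at the cost of a round trip on every such request.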