squash doc history in separate task #58
base: main
Conversation
I like that we don't need to query the last timestamp anymore. This should now make it easy to implement document TTL for
Oh yeah, totally forgot about that. Let me do so.
Should be the exact same behavior as a write occurring after a squash, since
@davidbrochart Wait, we still need a timestamp if we want to handle the case where the server is not kept alive during a document's TTL. Take this scenario:
In this scenario, the document history grows without bound. I think we need to handle this case as it's fairly common for users running Jupyter locally on a laptop (in which case disk space is also tight), but I'd like @ellisonbg's opinion on this matter as well.
To handle the above case, we'd need to squash on
I agree we do need the timestamp in cases where the server restarts, but why wouldn't we need it otherwise? I am trying to understand the logic here. My idea of how this would work is this:
This logic is needed to ensure that this cleanup doesn't happen while users are editing the document.
In the current state of the code, is the idea that you simply count the number of updates to the doc, and if it hasn't changed within the TTL, then you collapse?
@ellisonbg The implementation in this PR basically just creates a task that sleeps for the duration of the document TTL and then squashes the document history . Any updates cancel the existing task and create a new task. Therefore having a process repeatedly querying the database isn't needed, and thus the timestamp isn't needed for this case. This is how cron is typically implemented. |
I think the key question here is this: when is it safe to squash the history? Here is my understanding (@davidbrochart please add more details or correct any inaccuracies here). When the first user opens a document, the full state of the document is loaded into the memory of the server. As that user makes changes, updates are 1) forwarded to other users, 2) stored in the db, and 3) applied to the in-memory document on the server. The server is unable to distinguish between users closing the document and network disruptions (both appear as network disconnections). The database is used when a client disconnects and reconnects, to get them any missing updates. When all clients have been disconnected for a certain amount of time (I think 1 minute by default), the state of the document is flushed from memory. The db is also used the next time a user opens the document after it has been flushed from memory. The failure mode of squashing history is that we would squash updates that are needed by a user. Let's imagine the following scenario:
A few questions:
That's correct, just mentioning that 3) is the cause of 1) and 2). We apply changes from the clients to the "server peer", and listen to changes made on this peer to broadcast them to clients and store them.
Yes, but only if no client is accessing the document for some time. If there is at least one client accessing the document, the missing updates will be sent to new clients from memory, as the in-memory document would still be present.
Unless the document is still in memory, in which case user 1 will get the full history of changes. Otherwise, yes, they will get a document with no history, but I think this is the desired behavior. If we agree on this, I think your questions don't apply anymore, right?
Oh I see your point. Let's assume that updates have been flushed from memory to disk, and squashed.
No, I don't think it will give user 1 their missing updates. It will give them a big update consisting of the whole document, leading to content duplication.
Yes, the point being to make sure that all clients closed their document in the front-end, i.e. willingly disconnected.
True, although this will never eliminate the risk. Actually, the real solution would be to do what Kevin once suggested: introduce the notion of a document session. If the user reconnects with a document session, and that document session no longer exists in the back-end, invalidate the shared document in the front-end and recreate it from the back-end.
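To make the session idea concrete, here is a rough sketch under invented names (none of this exists in the codebase): the back-end mints a fresh session id whenever a document is loaded into memory, and a reconnecting client whose session id no longer matches must discard its local state and resync from the full back-end state:

```python
import uuid


class InMemoryDocument:
    """Hypothetical wrapper around a server-side shared document."""

    def __init__(self, ydoc):
        self.ydoc = ydoc
        # a fresh id every time the document is (re)loaded into memory
        self.session_id = uuid.uuid4().hex

    def can_apply_incremental_updates(self, client_session_id: str) -> bool:
        # if the id doesn't match, the server-side session the client knew
        # about is gone: the front-end must recreate its shared document
        # from the back-end's full state instead of applying a diff
        return client_session_id == self.session_id
```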
@davidbrochart Revision summary:
Awesome, thanks a lot David.
I'll leave it open so that others can review.
I guess we can implement TTL for FileYStore later.
Thanks for tagging this issue in #14031 @davidbrochart - I am not familiar with all the details, but it seems that some very robust logic for invalidation and full document refresh in the frontend is indeed necessary. Google Docs evidently does that on occasion - I've had it tell me that my document was stale and force a reload, after refusing any more edits into a doc.

I'm spitballing here, not knowing the implementation details, but it seems to me that any squashing should be done "behind a barrier" - a point in the history that defines a fixed state that updates can't traverse with any granularity. Anyone needing to sync state across that barrier simply needs to re-sync to the state at the barrier, and then get the updates after that.

I'll come to the team meeting tomorrow and hopefully we can discuss this a bit. I was willing to dogfood RTC in production for a while, but at this point I may need to pull the plug unless we find a viable solution path quickly, as I can't really tolerate any data loss/damage in that environment. It would be a shame to lose the opportunity of this environment for real-world feedback on various aspects of UX and performance, so 🤞 that we can figure this out :)
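As a loose illustration of the barrier idea (all names invented here, and this glosses over CRDT details): squashing folds everything before the barrier into a single checkpoint state, and a client whose last-seen position predates the barrier cannot be served a diff and must first re-sync to the checkpoint:

```python
from dataclasses import dataclass, field


@dataclass
class History:
    checkpoint: bytes = b""  # full document state at the barrier
    barrier: int = 0         # global index of the barrier in the update log
    updates: list[bytes] = field(default_factory=list)  # updates after the barrier

    def squash(self, new_barrier: int, state: bytes) -> None:
        """Fold all updates before `new_barrier` into the checkpoint `state`."""
        self.updates = self.updates[new_barrier - self.barrier:]
        self.checkpoint = state
        self.barrier = new_barrier

    def sync(self, client_point: int):
        """Serve a client that has seen updates up to `client_point`."""
        if client_point < self.barrier:
            # the client's state predates the barrier: full re-sync to the
            # checkpoint, then replay the retained updates
            return ("resync", self.checkpoint, self.updates)
        return ("diff", None, self.updates[client_point - self.barrier:])
```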
Squashes document history in a separate scheduled task rather than on write. This will significantly improve performance for document writes in the case where the time since the last update exceeds the document TTL.
Open concerns
How do we mock `asyncio.sleep()` in the unit tests? Testing this is tricky because it's not as simple as doing `unittest.mock.patch("asyncio.sleep")`, since that causes the document history to be squashed on every write. What I'm looking for is a utility that lets me imperatively advance the event loop clock, e.g. `loop.advance(123)`. I found asynctest, but it doesn't seem to work and is abandoned.
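One possible shape for such a utility, sketched here as a hand-rolled fake clock (hypothetical, not tested against this PR): patch `asyncio.sleep` with a coroutine that parks sleepers on futures, and complete those futures when the test imperatively advances virtual time:

```python
import asyncio

_real_sleep = asyncio.sleep  # keep a handle on the real sleep before patching


class VirtualClock:
    def __init__(self):
        self.time = 0.0
        self._sleepers = []  # list of (deadline, future) pairs

    async def sleep(self, delay, result=None):
        """Drop-in replacement for asyncio.sleep backed by virtual time."""
        if delay <= 0:
            return result
        future = asyncio.get_running_loop().create_future()
        self._sleepers.append((self.time + delay, future))
        await future
        return result

    async def advance(self, dt):
        """Imperatively advance virtual time, waking any due sleepers."""
        self.time += dt
        due = [f for deadline, f in self._sleepers if deadline <= self.time]
        self._sleepers = [(d, f) for d, f in self._sleepers if d > self.time]
        for future in due:
            future.set_result(None)
        await _real_sleep(0)  # yield so the woken coroutines get to run
```

A test could then run the code under test inside `with unittest.mock.patch("asyncio.sleep", clock.sleep):` and call `await clock.advance(document_ttl)` to trigger the squash deterministically, without real waiting.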