Creates document session #108

hbcarlos · 2023-02-23T11:34:44Z

Fixes:

github-actions · 2023-02-23T11:35:06Z

👈 Launch a Binder on branch hbcarlos/jupyter_collaboration/document_sessions

jupyter_collaboration/handlers.py

davidbrochart · 2023-02-23T13:11:54Z

jupyter_collaboration/handlers.py

-            self.log.info("Request for Y document '%s' with room ID: %s", path, ws_url)
-            return self.finish(ws_url)
+            self.log.info("Request for Y document '%s' with room ID: %s", path, idx)
+            data = json.dumps({"file_id": idx, "session": DOCUMENT_SESSION})


Why did you remove format:type: from file_id?

Because we are not using it.

The WebSocketProvider requests the file id and sends the format and type in that request, then the backend retrieves the file id, composes the room id with format:type:file_id, and returns the room id. We have that info in the front. We only need the file id. We cannot even retrieve the file id when we don't have the format and type. So I don't think sending that info just to compose the room id in the backend makes sense. I believe it is better to retrieve the file id and compose the room id on the client side

Now I'm thinking that maybe is better to use a get request since we only need to retrieve the file id.

Also, we are not setting anything with that request. We are only retrieving information. The one deciding to index the file if it's not indexed yet is the backend, not the frontend. The frontend is just asking for data.

The PUT /api/yjs/roomid/{path} endpoint is for getting a room ID for a document path, format and type. What is returned is an implementation detail, it could also be a pure UUID. If it returns only the file ID, we cannot distinguish between a document e.g. as a plain file or as a notebook type.

Maybe using WebSocket subprotocols for the Y-WebSocket? The client sends everything it supports, and the server checks them and accepts or closes the connection.

The client will unlikely send everything it supports because it will need to come with a specific version of jupyter_ydoc (whatever the language) and any doc provider. And I don't see us going into the trouble of supporting multiple versions within the same package. So my guess is that this will be simplified to the client sends the "version" it supports.

I'm not sure supporting multiple versions is such an issue. The problem I see with the client sending only one supported version is that it could quickly lead to incompatible clients. But I'm not sure either this will be the case.
In any case, using WebSocket subprotocols leaves the door open.

Maybe using WebSocket subprotocols for the Y-WebSocket? The client sends everything it supports, and the server checks them and accepts or closes the connection.

I don't think using subprotocols is the solution. In my opinion, we should validate the requirements before opening a connection.

After investigating a little bit, I believe the only way we can clean and reset a client, it is by disposing the shared model (destroying the YDoc and YWebSocketProvider) and instantiating a new one. I don't think it is possible to remove the updates from a YDoc and reinitialize it again (It doesn't make sense either).

The document session requirements are the information we need before establishing the communication. Once clients start communicating, those requirements can not change. Even in the case of squashing updates in the Store, we can only do it when the room is empty. Otherwise, we will need to:

1 - communicate to every client that there was a squash in the document, then on each client
2 - we will need to close the connection
3 - dispose the resources
4 - instantiate a new shared document
5 - open a new connection

That's inconvenient from a user perspective. Imagine you are working on a document, and every X minutes, the document closes (or asks you to close it) and reopens.

I also thought using subprotocols was the best solution, but after debugging this issue and investigating a little, I don't think it is the right solution.

I don't think using subprotocols is the solution. In my opinion, we should validate the requirements before opening a connection.

With WebSocket subprotocols, the requirements are validated before opening the connection.

fcollonval

Thanks @hbcarlos

In addition to the point raised by David, could you pass the ITranslator object to the provider to the error message can be translated?

I guess you need to pass it through the IDrive at

https://github.com/jupyterlab/jupyter_collaboration/blob/d8481ce01934a671e5da049f31062bbbb6bccd56/packages/collaboration-extension/src/filebrowser.ts#L44

packages/docprovider/src/yprovider.ts

jupyter_collaboration/handlers.py

davidbrochart · 2023-02-25T12:29:50Z

@hbcarlos do you need help to finish this PR? I think it's almost ready, as a quick fix to the duplication issue, and then we can iterate and improve on it. You just need to revert the changes to the PUT /api/yjs/roomid/{path} handler (keeping the returned session of course).

fcollonval

Thanks @hbcarlos

The new endpoints are better as they are based on meaning rather than technology and avoid the need to deal with regex order. Moreover as we need to backport this, this will allow to not break the client installation as for JLab 3.6.x, the user may end up calling the previous roomid not expecting a JSON answer.

There is naming convention that requires some changes but otherwise it is good to go.

packages/docprovider/src/ydrive.ts

packages/docprovider/src/yprovider.ts

jupyter_collaboration/app.py

davidbrochart · 2023-02-27T08:16:27Z

jupyter_collaboration/app.py

@@ -59,5 +64,7 @@ def initialize_handlers(self):
            [
                (r"/api/yjs/roomid/(.*)", YDocRoomIdHandler),
                (r"/api/yjs/(.*)", YDocWebSocketHandler),
+                (r"/api/collaboration/room/(.*)", YDocWebSocketHandler),
+                (r"/api/collaboration/session/(.*)", DocSessionHandler),


I don't think we should grow the API surface.
I think we should use the already existing PUT /api/yjs/roomid/{path} endpoint. It could return a value that is a combination of the file ID, the document type and format, and the session ID. That way, the frontend doesn't have to ask for a session ID, all it asks for is a room ID. When it tries to connect to the WebSocket with a room ID that doesn't exist (because the server session has changed), the room cannot be found and the frontend asks for the room ID again, discarding the shared document that it may have.

I'm afraid I have to disagree. Returning a string with multiple attributes in it is not the proper solution.

By combination, I don't mean necessarily concatenating multiple attributes, it can be a UUID and the server keeps a mapping of this UUID to the attributes.

The trouble with the opaque UUID is that on the frontend you will not be able to provide good error message to the user and follow-up action. In particular if the reason the websocket cannot be opened is the session id, we must tell the user to not use the current browser tab.

It is easy for the frontend to know that the session has changed: it fails to connect with its room ID. It can now ask for the room ID again, if that fails it's because the document doesn't exist, otherwise it's because the server has restarted, and in that case it can tell the user to not use the tab.

Implicit deduction is never a good path for code evolution, maintenance and understanding.

jupyter_collaboration/handlers.py

davidbrochart · 2023-02-27T09:48:10Z

So to summarize, this PR (correct me if I'm wrong) introduces a new session ID on top of the room ID, and is a breaking change.
While what I'm suggesting it to only deal with a room ID, and not break the API.
I'm not sure a new session ID is needed, so we should make sure it is before merging and breaking the API.

hbcarlos · 2023-02-27T11:32:14Z

So to summarize, this PR (correct me if I'm wrong) introduces a new session ID on top of the room ID, and is a breaking change.

It is not a breaking change. We did not change any API. We are adding a new endpoint to retrieve session information next to the room id.

I'm not sure a new session ID is needed

Why is not needed anymore?

davidbrochart · 2023-02-27T11:52:56Z

It is not a breaking change. We did not change any API. We are adding a new endpoint to retrieve session information next to the room id.

But you are deprecating the old API, isn't that a breaking change?

Why is not needed anymore?

Because we can do everything with the already existing room ID (see my comment).

fcollonval · 2023-02-27T13:30:14Z

Ok let's move forward keeping a single opaque identifier. This will likely induce the need for storing a mapping between the identifier and the room + session information to make the bridge between the request of the single identifier and the request of the document connection.

davidbrochart · 2023-02-28T08:44:12Z

@hbcarlos mentioned the issue of transient rooms which don't require a room ID, so we cannot rely on room IDs to support sessions for them. I cannot think of a solution for them, that wouldn't involve a session ID, so let's got with explicit session ID and merge this PR.
Thanks Carlos for exploring all possible solutions.

…ty and deprecates old endpoints

hbcarlos · 2023-02-28T12:00:47Z

Thanks, @fcollonval and @davidbrochart, for looking into this PR.

I reverted my latest commits to come back to the session ID implementation.

davidbrochart · 2023-03-01T13:38:36Z

I just tested that PR on JupyterLab master with the following steps:

Launchr JupyterLab, create a notebook
Open a new browser window and open the same notebook
Stop and restart the server, but don't close the two browser windows
A window detects that the session has changed and asks to reload the page, accept
A blank notebook appears (no cell)

@hbcarlos have you tried that scenario?

davidbrochart · 2023-03-01T13:41:47Z

It seems to be a jupyter_ydoc version issue because I can see these errors in the backend:

[E 2023-03-01 14:34:27.379 ServerApp] Uncaught exception GET /api/collaboration/room/json:notebook:01aac1bc-a0c5-42c7-8ac9-6a147601e341?sessionId=6a88536d-04df-4935-9bf2-549b6254c915 (127.0.0.1)
    HTTPServerRequest(protocol='http', host='127.0.0.1:8888', method='GET', uri='/api/collaboration/room/json:notebook:01aac1bc-a0c5-42c7-8ac9-6a147601e341?sessionId=6a88536d-04df-4935-9bf2-549b6254c915', version='HTTP/1.1', remote_ip='127.0.0.1')
    Traceback (most recent call last):
      File "/home/david/micromamba/envs/jupyterlab/lib/python3.11/site-packages/tornado/websocket.py", line 944, in _accept_connection
        await open_result
      File "/home/david/github/davidbrochart/jupyter_collaboration/jupyter_collaboration/handlers.py", line 236, in open
        if self.room.document.source != model["content"]:
           ^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/david/micromamba/envs/jupyterlab/lib/python3.11/site-packages/jupyter_ydoc/ybasedoc.py", line 60, in source
        return self.get()
               ^^^^^^^^^^
      File "/home/david/micromamba/envs/jupyterlab/lib/python3.11/site-packages/jupyter_ydoc/ynotebook.py", line 213, in get
        metadata=meta["metadata"],
                 ~~~~^^^^^^^^^^^^
    KeyError: 'metadata'

davidbrochart · 2023-03-01T13:47:13Z

It seemed to be a .jupyter_ystore.db issue, it works fine after removing it 👍

jupyter_collaboration/handlers.py

davidbrochart · 2023-03-01T14:17:16Z

Thanks @hbcarlos !

* Creates document session * Uses translator * Update jupyter_collaboration/handlers.py * Create a new endpoint for document sessions, changes tranlator property and deprecates old endpoints --------- Co-authored-by: David Brochart <[email protected]>

* Creates document session (#108) * Creates document session * Uses translator * Update jupyter_collaboration/handlers.py * Create a new endpoint for document sessions, changes tranlator property and deprecates old endpoints --------- Co-authored-by: David Brochart <[email protected]> * Deprecate endpoints * Fixes version, pre-commit, and runs pre-commit * Apply suggestions from code review * Update jupyter_server_ydoc/__init__.py --------- Co-authored-by: David Brochart <[email protected]> Co-authored-by: Frédéric Collonval <[email protected]>

fperez · 2023-03-14T15:59:03Z

Quick question - if we want to use this to mitigate the document corruption problems, do we need to wait for a 3.6.2 release or can we update a separate subpackage? I'm not sure whether any of this code goes into the core lab package or not, sorry!

This is on the critical path for us to test RTC in teaching settings at Berkeley, so I'm trying to figure out what the order of operations is for us. Thanks for the great work!

fperez · 2023-03-14T16:00:15Z

Ah, I just realized that 14128 may be the answer to my question, sorry for the noise. In that case I presume a 3.6.2 is coming soon with these fixes! 🎉

hbcarlos · 2023-03-14T16:28:26Z

Ah, I just realized that jupyterlab/jupyterlab#14128 may be the answer to my question, sorry for the noise. In that case I presume a 3.6.2 is coming soon with these fixes! 🎉

Yep, that is fixed and backported to lab 3.6 we just need to wait for the 3.6.2 release.

hbcarlos added the bug Something isn't working label Feb 23, 2023

hbcarlos requested review from davidbrochart and fcollonval February 23, 2023 11:34

hbcarlos self-assigned this Feb 23, 2023

davidbrochart reviewed Feb 23, 2023

View reviewed changes

jupyter_collaboration/handlers.py Outdated Show resolved Hide resolved

davidbrochart reviewed Feb 23, 2023

View reviewed changes

fcollonval requested changes Feb 23, 2023

View reviewed changes

packages/docprovider/src/yprovider.ts Outdated Show resolved Hide resolved

packages/docprovider/src/yprovider.ts Outdated Show resolved Hide resolved

packages/docprovider/src/yprovider.ts Outdated Show resolved Hide resolved

hbcarlos force-pushed the document_sessions branch from d8fab50 to bef99bc Compare February 23, 2023 15:36

davidbrochart reviewed Feb 23, 2023

View reviewed changes

jupyter_collaboration/handlers.py Outdated Show resolved Hide resolved

fperez mentioned this pull request Feb 24, 2023

Damaged notebook (duplicate cells) in RTC-active hosted hub jupyterlab/jupyterlab#14031

Open

fcollonval requested changes Feb 27, 2023

View reviewed changes

davidbrochart requested changes Feb 27, 2023

View reviewed changes

fcollonval approved these changes Feb 27, 2023

View reviewed changes

davidbrochart reviewed Feb 27, 2023

View reviewed changes

jupyter_collaboration/handlers.py Outdated Show resolved Hide resolved

hbcarlos and others added 4 commits February 28, 2023 12:48

Creates document session

f69a150

Uses translator

0cea8f0

Update jupyter_collaboration/handlers.py

626339f

Create a new endpoint for document sessions, changes tranlator proper…

7d9e5bf

…ty and deprecates old endpoints

hbcarlos force-pushed the document_sessions branch from d83973b to 7d9e5bf Compare February 28, 2023 11:59

davidbrochart approved these changes Mar 1, 2023

View reviewed changes

jupyter_collaboration/handlers.py Show resolved Hide resolved

davidbrochart merged commit 9d57e8b into jupyterlab:main Mar 1, 2023

hbcarlos deleted the document_sessions branch March 1, 2023 14:41

hbcarlos mentioned this pull request Mar 5, 2023

Doc session jupyterlab/jupyterlab#14128

Merged

fperez mentioned this pull request Mar 14, 2023

Real Time Collaboration (RTC) rollout plan for Spring 2023 and later (starting with Stat 159) berkeley-dsep-infra/datahub#4223

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Creates document session #108

Creates document session #108

hbcarlos commented Feb 23, 2023

github-actions bot commented Feb 23, 2023

davidbrochart Feb 23, 2023

hbcarlos Feb 23, 2023

hbcarlos Feb 23, 2023

hbcarlos Feb 23, 2023

davidbrochart Feb 23, 2023

davidbrochart Feb 23, 2023

fcollonval Feb 24, 2023

davidbrochart Feb 24, 2023 •

edited

Loading

hbcarlos Feb 24, 2023 •

edited

Loading

davidbrochart Feb 24, 2023

fcollonval left a comment

davidbrochart commented Feb 25, 2023

fcollonval left a comment

davidbrochart Feb 27, 2023 •

edited

Loading

hbcarlos Feb 27, 2023

davidbrochart Feb 27, 2023

fcollonval Feb 27, 2023

davidbrochart Feb 27, 2023

fcollonval Feb 27, 2023

davidbrochart commented Feb 27, 2023

hbcarlos commented Feb 27, 2023

davidbrochart commented Feb 27, 2023

fcollonval commented Feb 27, 2023

davidbrochart commented Feb 28, 2023

hbcarlos commented Feb 28, 2023

davidbrochart commented Mar 1, 2023 •

edited

Loading

davidbrochart commented Mar 1, 2023

davidbrochart commented Mar 1, 2023

davidbrochart commented Mar 1, 2023

fperez commented Mar 14, 2023

fperez commented Mar 14, 2023 •

edited

Loading

hbcarlos commented Mar 14, 2023

Creates document session #108

Creates document session #108

Conversation

hbcarlos commented Feb 23, 2023

github-actions bot commented Feb 23, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidbrochart Feb 24, 2023 • edited Loading

Choose a reason for hiding this comment

hbcarlos Feb 24, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fcollonval left a comment

Choose a reason for hiding this comment

davidbrochart commented Feb 25, 2023

fcollonval left a comment

Choose a reason for hiding this comment

davidbrochart Feb 27, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davidbrochart commented Feb 27, 2023

hbcarlos commented Feb 27, 2023

davidbrochart commented Feb 27, 2023

fcollonval commented Feb 27, 2023

davidbrochart commented Feb 28, 2023

hbcarlos commented Feb 28, 2023

davidbrochart commented Mar 1, 2023 • edited Loading

davidbrochart commented Mar 1, 2023

davidbrochart commented Mar 1, 2023

davidbrochart commented Mar 1, 2023

fperez commented Mar 14, 2023

fperez commented Mar 14, 2023 • edited Loading

hbcarlos commented Mar 14, 2023

davidbrochart Feb 24, 2023 •

edited

Loading

hbcarlos Feb 24, 2023 •

edited

Loading

davidbrochart Feb 27, 2023 •

edited

Loading

davidbrochart commented Mar 1, 2023 •

edited

Loading

fperez commented Mar 14, 2023 •

edited

Loading