-
-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Creates document session #108
Conversation
jupyter_collaboration/handlers.py
Outdated
self.log.info("Request for Y document '%s' with room ID: %s", path, ws_url) | ||
return self.finish(ws_url) | ||
self.log.info("Request for Y document '%s' with room ID: %s", path, idx) | ||
data = json.dumps({"file_id": idx, "session": DOCUMENT_SESSION}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove format:type:
from file_id
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we are not using it.
The WebSocketProvider requests the file id and sends the format and type in that request, then the backend retrieves the file id, composes the room id with format:type:file_id
, and returns the room id. We have that info in the front. We only need the file id. We cannot even retrieve the file id when we don't have the format and type. So I don't think sending that info just to compose the room id in the backend makes sense. I believe it is better to retrieve the file id and compose the room id on the client side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I'm thinking that maybe is better to use a get request since we only need to retrieve the file id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we are not setting anything with that request. We are only retrieving information. The one deciding to index the file if it's not indexed yet is the backend, not the frontend. The frontend is just asking for data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PUT /api/yjs/roomid/{path}
endpoint is for getting a room ID for a document path, format and type. What is returned is an implementation detail, it could also be a pure UUID. If it returns only the file ID, we cannot distinguish between a document e.g. as a plain file or as a notebook type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe using WebSocket subprotocols for the Y-WebSocket? The client sends everything it supports, and the server checks them and accepts or closes the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The client will unlikely send
everything
it supports because it will need to come with a specific version ofjupyter_ydoc
(whatever the language) and any doc provider. And I don't see us going into the trouble of supporting multiple versions within the same package. So my guess is that this will be simplified to the client sends the "version" it supports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure supporting multiple versions is such an issue. The problem I see with the client sending only one supported version is that it could quickly lead to incompatible clients. But I'm not sure either this will be the case.
In any case, using WebSocket subprotocols leaves the door open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe using WebSocket subprotocols for the Y-WebSocket? The client sends everything it supports, and the server checks them and accepts or closes the connection.
I don't think using subprotocols is the solution. In my opinion, we should validate the requirements before opening a connection.
After investigating a little bit, I believe the only way we can clean and reset a client, it is by disposing the shared model (destroying the YDoc and YWebSocketProvider) and instantiating a new one. I don't think it is possible to remove the updates from a YDoc and reinitialize it again (It doesn't make sense either).
The document session requirements are the information we need before establishing the communication. Once clients start communicating, those requirements can not change. Even in the case of squashing updates in the Store, we can only do it when the room is empty. Otherwise, we will need to:
1 - communicate to every client that there was a squash in the document, then on each client
2 - we will need to close the connection
3 - dispose the resources
4 - instantiate a new shared document
5 - open a new connection
That's inconvenient from a user perspective. Imagine you are working on a document, and every X minutes, the document closes (or asks you to close it) and reopens.
I also thought using subprotocols was the best solution, but after debugging this issue and investigating a little, I don't think it is the right solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think using subprotocols is the solution. In my opinion, we should validate the requirements before opening a connection.
With WebSocket subprotocols, the requirements are validated before opening the connection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hbcarlos
In addition to the point raised by David, could you pass the ITranslator
object to the provider to the error message can be translated?
I guess you need to pass it through the IDrive
at
d8fab50
to
bef99bc
Compare
@hbcarlos do you need help to finish this PR? I think it's almost ready, as a quick fix to the duplication issue, and then we can iterate and improve on it. You just need to revert the changes to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hbcarlos
The new endpoints are better as they are based on meaning rather than technology and avoid the need to deal with regex order. Moreover as we need to backport this, this will allow to not break the client installation as for JLab 3.6.x, the user may end up calling the previous roomid not expecting a JSON answer.
There is naming convention that requires some changes but otherwise it is good to go.
@@ -59,5 +64,7 @@ def initialize_handlers(self): | |||
[ | |||
(r"/api/yjs/roomid/(.*)", YDocRoomIdHandler), | |||
(r"/api/yjs/(.*)", YDocWebSocketHandler), | |||
(r"/api/collaboration/room/(.*)", YDocWebSocketHandler), | |||
(r"/api/collaboration/session/(.*)", DocSessionHandler), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should grow the API surface.
I think we should use the already existing PUT /api/yjs/roomid/{path}
endpoint. It could return a value that is a combination of the file ID, the document type and format, and the session ID. That way, the frontend doesn't have to ask for a session ID, all it asks for is a room ID. When it tries to connect to the WebSocket with a room ID that doesn't exist (because the server session has changed), the room cannot be found and the frontend asks for the room ID again, discarding the shared document that it may have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid I have to disagree. Returning a string with multiple attributes in it is not the proper solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By combination, I don't mean necessarily concatenating multiple attributes, it can be a UUID and the server keeps a mapping of this UUID to the attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trouble with the opaque UUID is that on the frontend you will not be able to provide good error message to the user and follow-up action. In particular if the reason the websocket cannot be opened is the session id, we must tell the user to not use the current browser tab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is easy for the frontend to know that the session has changed: it fails to connect with its room ID. It can now ask for the room ID again, if that fails it's because the document doesn't exist, otherwise it's because the server has restarted, and in that case it can tell the user to not use the tab.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implicit deduction is never a good path for code evolution, maintenance and understanding.
So to summarize, this PR (correct me if I'm wrong) introduces a new session ID on top of the room ID, and is a breaking change. |
It is not a breaking change. We did not change any API. We are adding a new endpoint to retrieve session information next to the room id.
Why is not needed anymore? |
But you are deprecating the old API, isn't that a breaking change?
Because we can do everything with the already existing room ID (see my comment). |
Ok let's move forward keeping a single opaque identifier. This will likely induce the need for storing a mapping between the identifier and the room + session information to make the bridge between the request of the single identifier and the request of the document connection. |
@hbcarlos mentioned the issue of transient rooms which don't require a room ID, so we cannot rely on room IDs to support sessions for them. I cannot think of a solution for them, that wouldn't involve a session ID, so let's got with explicit session ID and merge this PR. |
d83973b
to
7d9e5bf
Compare
Thanks, @fcollonval and @davidbrochart, for looking into this PR. I reverted my latest commits to come back to the session ID implementation. |
I just tested that PR on JupyterLab
@hbcarlos have you tried that scenario? |
It seems to be a
|
It seemed to be a |
Thanks @hbcarlos ! |
* Creates document session * Uses translator * Update jupyter_collaboration/handlers.py * Create a new endpoint for document sessions, changes tranlator property and deprecates old endpoints --------- Co-authored-by: David Brochart <[email protected]>
* Creates document session (#108) * Creates document session * Uses translator * Update jupyter_collaboration/handlers.py * Create a new endpoint for document sessions, changes tranlator property and deprecates old endpoints --------- Co-authored-by: David Brochart <[email protected]> * Deprecate endpoints * Fixes version, pre-commit, and runs pre-commit * Apply suggestions from code review * Update jupyter_server_ydoc/__init__.py --------- Co-authored-by: David Brochart <[email protected]> Co-authored-by: Frédéric Collonval <[email protected]>
Quick question - if we want to use this to mitigate the document corruption problems, do we need to wait for a 3.6.2 release or can we update a separate subpackage? I'm not sure whether any of this code goes into the core lab package or not, sorry! This is on the critical path for us to test RTC in teaching settings at Berkeley, so I'm trying to figure out what the order of operations is for us. Thanks for the great work! |
Ah, I just realized that 14128 may be the answer to my question, sorry for the noise. In that case I presume a 3.6.2 is coming soon with these fixes! 🎉 |
Yep, that is fixed and backported to lab 3.6 we just need to wait for the 3.6.2 release. |
Fixes: