-
-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop modelDB and use Yjs directly #253
Comments
Please take my comments with very limited value as I'm not deeply knowledgeable about this specific part of the codebase, and I may well be missing an important conterpoint. But a while back in Jupyter (before the name even existed, back when it was all IPython) we tried to really stick to a YAGNI principle, as we were bitten by premature and unnecessary generalizations. A clear example was our early adoption of a concept of "pages" in the notebook format without any UI to actually support it. Eventually there was an Emacs frontend that supported that feature, and it wasn't great when we decided to remove it and offer other simpler navigation functionality, as it meant deprecating something that (small, but still important) user community might have come to rely on. This seems like a perfect opportunity to apply that stricter principle and remove this otherwise unnecessary layer that is having current real costs for a supposed future benefit that we may never actually realize/need. |
Thanks @davidbrochart for opening the issue. As discussed in jupyterlab/jupyterlab#16481, one of the main challenges with this proposal will be the breaking changes it will introduce, and the need for a new major release of the related package(s). Posting my comment from jupyterlab/jupyterlab#16481 (comment) here for visibility:
If breaking changes can't be avoided, maybe there could be a way to improve the extension migration script to automatically perform the necessary changes on an extension code base. This would help extension authors migrate their code more easily with the less friction. Also, it would be useful to provide some examples of the kind of changes extension authors should expect, for example in a form of a before / after or diff. |
I very much support @jtpio. I am afraid that some folks my not appreciate how dire the situation is and how exhausted extension authors are with constant breaking changes, and users with the extensions not working correctly or at all in the supported versions. I am also afraid that this can increase the learning curve for developing extensions which interact with documents. Already with the previous migration changes were made without documenting how to do basic interactions with the documents, nor proper migration guidance: Edit: and there are outstanding issues that broke things in editors severely degrading the UX for me that have not been fixed since, like:
After this change, an extension author will need to learn one more thing which is the internals of yjs, and what yjs is in the first place. Right now they are not required to do so. I will likely be voting "No" but not because I oppose this change in principle and will oppose it always - I think that the advantages of the change do no outweigh the risks at this time, this can change if we come back to this in a years time for example. Quoting from jupyterlab/jupyterlab#16481 (comment)
In principle this could be solved by extending the ModelDB, right? We could have |
Maybe but it's not the only missing feature. |
A few years ago, we already had those discussions, and luckily we have today a ydoc package that proves that the abstraction approach makes sense. Other projects (SyncedStore...) outside of JupyterLab in the meantime have also successfully taken the abstraction approach, creating collaborative data structures, or distributed state management solutions, on top of Y.js. To be coherent, this proposal should be extended to the ydoc package. ALL the collaborative data structure of JupyterLab should be architectured the same way. Therefor, this proposal should be simply be closed and reopened covering both the
I was thinking to ask the same thing on the initial discussion but suddenly has been taken in this vote process... anyway, thank you @jtpio, this will help voters to make sense of the expected changes. Would it be possible to give 2 examples: a simple use case, and a complex use case? As additional information, could you also please list the impacted packages and identify the impact on the external jupyterlab repositories (including the non-official extensions), this will help voters.
@fperez this proposal is about removing the abstraction for the
@krassowski Like you and @jtpio I am much worried on the impacts for JupyterLab third-party consumers and extenders. You may be answered with a some crafted migration tooling (the complexity of the current tooling is btw the biggest complaints I hear), release and communication plans, by this proposal supporters. My strongest concern is however the proposed architecture. It looks to me that instead of making better an existing good and working thing, it it proposed to drop it and replace it with something weaker and bringing others and more issues, leading to non-coherent implementations (raw yjs for observables, abstraction layer for ydoc). I am voting -1. (to be clear, even if this proposal would cover both observables and ydoc, I would be voting -1 as favoring abstraction for that specific aspect). |
It only proves that we made it work with the modelDB abstraction, but as I said there are a number of issues with that.
Of course I would also drop modelDB in jupyter-ydoc.
You must have misunderstood me, jupyter-ydoc must be updated accordingly, as any other extension that depends on modelDB.
How is Yjs weaker than modelDB? What issues does it bring? |
@davidbrochart Can you clarify. Do you want to drop YNotebook as part of this proposal ? https://github.com/jupyter-server/jupyter_ydoc/blob/b3905b3aef1e6f821819e8ed120370145b0bed28/javascript/src/ynotebook.ts#L38-L40 |
Of course not, but |
Turning round again and again... the arguments you are using to justify the change you propose don't make sense to me and this change is just driving to bicephal implementations (raw yjs in some case, and abstraction in other case). I will further abstain on this thread and your replies to avoid cluttering the discussion. |
Sorry if I'm not clear enough. I just want to drop the modelDB abstraction, both in JupyterLab and jupyter-ydoc. No abstraction layer anymore, I hope that this is clear. |
We can actually close this one and jupyterlab/jupyterlab#16481 too. As it is solved by jupyterlab/jupyterlab#12695 + jupyterlab/jupyterlab#13168 + jupyterlab/jupyterlab#13247 for those in doubt we are still using
When refactoring the API to ease the integration of the shared models, Carlos and I after various discussions and reading of various issues and PRs decided to create ydoc to have a semantic API hiding YJs to ease the understanding of the developers. As any abstraction, it can definitely be improved and no solution are provided to deal with advanced features of YJs like transaction (although that should not be needed if the semantic API is well done) or the undo manager (that one can be tricky). The data duplication between a JupyterLab model and a shared model is still happening for the cell output model and the attachments (search for globalModelDBMutex). This was left as is by lack of time and to not block 4.0. If changing those parts (and the outputs is probably what you @davidbrochart have in mine) is the goal of this, then drafting a PR to see the extend of API breaking changes sound reasonable and not too expensive. Regarding using Yjs directly, I'm strongly against it because there are lots of subtle things about it and having a semantic layer as ydoc is much better for developers. Note ydoc does not depend on JupyterLab objects; hence not on the observables package. "dependencies": {
"@jupyterlab/nbformat": "^3.0.0 || ^4.0.0-alpha.21 || ^4.0.0",
"@lumino/coreutils": "^1.11.0 || ^2.0.0",
"@lumino/disposable": "^1.10.0 || ^2.0.0",
"@lumino/signaling": "^1.10.0 || ^2.0.0",
"y-protocols": "^1.0.5",
"yjs": "^13.5.40"
} |
When I say using Yjs directly, I don't mean getting rid of jupyter-ydoc's API, but not using the |
Overall, I am in favor of this proposed direction, but it sounds like there are still technical details to work through. However, I also agree with others comments about the need for us to be careful about breaking APIs. @davidbrochart could you update the proposal with more information about what parts of this work can be done in the 4.x series of releases (backwards compatible) and which parts would need to go into major releases (presumably 5.x). For example, can we switch to having Yjs be the single source of truth for all of our own code, but still offer a ModelDB compatibility layer for the rest of the 4.x series? I am generally aware of a number of other backward incompatible changes that are needed across different JupyterLab and Jupyter Server repos, so we can't avoid breaking APIs forever. It maybe helpful for planning purposes for us to begin mapping out upcoming breaking changes, so we can better assess the timing and tradeoffs. |
Is it the |
Only the |
Problem
JupyterLab has an abstraction layer called modelDB in the observables package, which currently serves as a proxy for the Yjs shared types. At the time real-time collaboration was being implemented in JupyterLab using Yjs, there was the question of whether to drop modelDB or keep it as an abstraction layer on top of Yjs, and it was decided to keep it so that another CRDT backend could be used, if needed. There are a number of issues with that:
All of this results in additional complexity, for really no benefit because JupyterLab already depends on jupyter-ydoc which depends on Yjs, so JupyterLab is not CRDT-agnostic (although I agree the CRDT-specific part is well contained in jupyter-ydoc).
Additionally, as Jupyter is moving towards server-side execution, new issues start to appear because the modelDB abstraction doesn't exist in the backend, where we directly use pycrdt (Python bindings to Yrs, the Rust port of Yjs). Therefore we cannot communicate at the modelDB level between the backend and the frontend.
Finally, the possibility of using another CRDT library in the future is getting smaller as Yjs is getting more and more popular. The only real alternative is Automerge that a team of JupyterLab developers evaluated and which was eventually discarded.
Proposed Solution
I propose to drop modelDB altogether from the JupyterLab code base, and directly use Yjs instead.
See jupyterlab/jupyterlab#16481 for more discussions.
@jupyterlab/jupyterlab-council Let's vote:
Jason Weill (@JasonWeill):
Afshin Darian (@afshin):
Alex Bozarth (@ajbozarth):
Andrii Ieroshenko (@andrii-i):
Steve Silvester (@blink1073):
Nicolas Brichet (@brichet):
Damián Avila (@damianavila):
Eric Charles (@echarles):
Brian Granger (@ellisonbg):
Eric Gentry (@ericsnekbytes):
Frederic Collonval (@fcollonval):
Fernando Perez (@fperez):
Gabriel Fouasnon (@gabalafou):
Isabela Presedo-Floyd (@isabela-pf):
Paul Ivanov (@ivanov):
Jason Grout (@jasongrout):
Jeremy Tuloup (@jtpio):
Michał Krassowski (@krassowski):
Martha Cryan (@marthacryan):
Mehmet Bektas (@mbektas):
R Ely (@ohrely):
Rosio Reyes (@RRosio):
Sylvain Corlay (@SylvainCorlay):
Zach Sailer (@Zsailer):
EDIT: I added @RRosio, @SylvainCorlay and @Zsailer to the list of voters, according to this list.
The text was updated successfully, but these errors were encountered: