feat: Less data loss in external HTML #1605

Open
wants to merge 6 commits into
base: unit-tests-setup

matthewlipski
Collaborator

@matthewlipski matthewlipski commented Apr 11, 2025

Converting to lossy HTML and back

This PR re-adds data-* attributes to external HTML, so that you can export blocks to HTML, then re-import them, and only lose block nesting (which is still preserved for list items). It also fixes some bugs that prevented props from being parsed properly from external HTML.

I've also added a bunch of test cases for this, which you can find in

tests/src/unit/formatConversion/exportParseEquality/exportParseEqualityTestInstances.ts

If we decide we don't want to re-add the data-* attributes in order to not clutter the external HTML, we can revert the changes in serializeBlocksExternalHTML (and the snapshot changes caused by those changes), and the tests + other fixes will still be worth keeping. Just note that some of the new tests will then fail, since we won't be able to read props like text/background color.

Also, the data-* attributes are not the "ideal" solution (which is why we may not want to keep that change in this PR). Ideally, we would implement toExternalHTML for inline content and styles (this is the rabbit hole I went down last week), so that we can serialize things we would normally just put in data-* attributes to a more native HTML equivalent (e.g. style="color: red;" for text color). However, this is a lot of work to implement, so it's better to leave it for the future.
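To make the "native HTML equivalent" idea concrete, here is a minimal sketch of what serializing styled text without data-* attributes could look like. The `TextStyles` shape and the `styledTextToExternalHTML` helper are hypothetical names made up for illustration; they are not BlockNote's actual `toExternalHTML` API, and the mapping of color values straight into CSS is an assumption.

```typescript
// Hypothetical style shape, for illustration only (not BlockNote's types).
type TextStyles = {
  textColor?: string;
  backgroundColor?: string;
  bold?: boolean;
};

// Escape characters that are unsafe in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize a run of styled text using native tags and inline CSS
// instead of data-* attributes.
function styledTextToExternalHTML(text: string, styles: TextStyles): string {
  const css: string[] = [];
  if (styles.textColor) css.push(`color: ${styles.textColor};`);
  if (styles.backgroundColor) {
    css.push(`background-color: ${styles.backgroundColor};`);
  }

  let html = escapeHtml(text);
  if (styles.bold) html = `<strong>${html}</strong>`;
  if (css.length === 0) return html;
  return `<span style="${escapeHtml(css.join(" "))}">${html}</span>`;
}
```

For example, `styledTextToExternalHTML("hi", { textColor: "red" })` yields `<span style="color: red;">hi</span>`, which other HTML consumers can render without knowing anything about BlockNote.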

Importing external HTML

This PR also adds additional parsing cases for the following:

  • Text color props from inline color styles
  • Text color marks from spans with inline color styles
  • Background color props from inline background-color styles
  • Background color marks from spans with inline background-color styles
  • Text alignment props from inline text-align styles
  • Numbered list item start index from start attribute
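As a rough illustration of the parsing cases listed above, the following sketch reads props from an inline `style` string and from a `start` attribute. It is a standalone approximation, not the code in this PR: `ParsedProps`, `propsFromInlineStyle`, and `startIndexFromAttribute` are hypothetical names, and splitting on `;` is a simplification that would break on values containing semicolons (e.g. data URLs).

```typescript
// Hypothetical prop shape; the names are illustrative only.
type ParsedProps = {
  textColor?: string;
  backgroundColor?: string;
  textAlignment?: string;
};

// Split an inline `style` attribute value into a property -> value map.
function parseInlineStyle(style: string): Map<string, string> {
  const declarations = new Map<string, string>();
  for (const declaration of style.split(";")) {
    const separator = declaration.indexOf(":");
    if (separator === -1) continue; // skip empty or malformed declarations
    const property = declaration.slice(0, separator).trim().toLowerCase();
    const value = declaration.slice(separator + 1).trim();
    if (property && value) declarations.set(property, value);
  }
  return declarations;
}

// Map HTML-native inline styles onto block props.
function propsFromInlineStyle(style: string): ParsedProps {
  const declarations = parseInlineStyle(style);
  const props: ParsedProps = {};

  const color = declarations.get("color");
  if (color !== undefined) props.textColor = color;

  const background = declarations.get("background-color");
  if (background !== undefined) props.backgroundColor = background;

  const alignment = declarations.get("text-align");
  if (alignment !== undefined) props.textAlignment = alignment;

  return props;
}

// Read a numbered list's start index from its `start` attribute, if valid.
function startIndexFromAttribute(start: string | null): number | undefined {
  if (start === null) return undefined;
  const parsed = parseInt(start, 10);
  return isNaN(parsed) ? undefined : parsed;
}
```

Only properties that are actually present end up in the result, so a caller can fall back to default prop values for anything missing.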

Test cases have also been added for these, along with tests for parsing other styles in various formats. For example, the tests now make sure that bold marks are read from font-weight: bold inline styles as well as b and strong tags. You can find these tests in

tests/src/unit/formatConversion/parse/parseTestInstances.ts

This is for cases when markup data is stored in HTML-native tags/attributes/inline styles that aren't used in BlockNote's internal HTML.
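The bold example can be sketched as a single predicate that accepts either form. This is only an approximation of the behavior the tests check, written as a hypothetical `isBold` helper; it deliberately handles just the `font-weight: bold` keyword named above and ignores numeric weights.

```typescript
// Decide whether an element should produce a bold mark, based on its tag
// name and inline style. Hypothetical helper, not code from this PR.
function isBold(tagName: string, inlineStyle: string): boolean {
  const tag = tagName.toLowerCase();
  if (tag === "b" || tag === "strong") return true;

  // Find a `font-weight` declaration anywhere in the inline style string.
  const match = /(?:^|;)\s*font-weight\s*:\s*([^;]+)/i.exec(inlineStyle);
  if (!match) return false;

  // Assumption: only the `bold` keyword counts; numeric weights are ignored.
  const weight = (match[1] ?? "").trim().toLowerCase();
  return weight === "bold";
}
```

With this, `<b>`, `<strong>`, and `<span style="font-weight: bold">` all resolve to the same mark, while `font-weight: normal` does not.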

Unsolved issues

Block prop parsing

Text color, background color, and text alignment props are still not parsed from external HTML properly. This is because the parsing for these is done in TipTap extensions, which the DOMParser doesn't see.

To fix this, we need to get rid of the extensions and add the TipTap attributes from these nodes to the necessary nodes in their node specs. I was making quite some progress with this, but ran into issues adding color props to tableCell nodes.

List item nesting

Converting nested list items to lossy HTML and back does not preserve nesting. This is a bigger issue, as it stems from us needing to convert single elements into both blockContainer and blockContent nodes. This is because data is split across these 2 nodes (e.g. the blockContainer stores the block ID while the blockContent stores text alignment), but gets merged into a single element when serializing to external HTML.

I managed to get parsing a single element into both nodes working by adding consuming: false to the blockContainer parse rule, and making it parse any element with the data-node-type="blockContainer" or data-node-type="blockOuter" attributes. This works fine for other elements like p and h1, but li elements can be nested, and parsing them as blockContainers breaks the nesting.
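For readers unfamiliar with the option, here is a rough sketch of what such parse rules could look like. `TagParseRuleSketch` is a local stand-in for the subset of ProseMirror's tag parse rule fields used here (`tag`, `consuming`, `getAttrs`), so the snippet compiles without the library. Reading the block ID from a `data-id` attribute is an assumption, and the rules in this PR may be written differently.

```typescript
// Minimal element shape, so the sketch does not depend on the DOM typings.
interface ElementLike {
  getAttribute(name: string): string | null;
}

// Local stand-in for the ProseMirror parse rule fields used in this sketch.
interface TagParseRuleSketch {
  tag: string;
  consuming?: boolean;
  getAttrs?: (element: ElementLike) => Record<string, unknown> | false | null;
}

const containerNodeTypes = ["blockContainer", "blockOuter"];

// One rule per data-node-type value. `consuming: false` lets later rules
// (such as the blockContent rule) also match the same element.
const blockContainerParseRules: TagParseRuleSketch[] = containerNodeTypes.map(
  (nodeType): TagParseRuleSketch => ({
    tag: `[data-node-type="${nodeType}"]`,
    consuming: false,
    // Assumption: the block ID is stored in a `data-id` attribute.
    getAttrs: (element) => ({ id: element.getAttribute("data-id") }),
  })
);
```

Because the rule does not consume the element, a single `p` or `h1` can yield both a blockContainer and a blockContent node. The same behavior is what causes trouble for nested `li` elements.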

It might be possible to get around this by storing the blockContainer attributes on the li element and the blockContent attributes on the p element within it when serializing the blocks. However, I didn't have that much time to play around with it.
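A minimal sketch of that idea, untested against the real serializer: the `ListItemBlock` shape, the `listItemToExternalHTML` helper, and the `data-id` / `data-text-alignment` attribute names are all hypothetical, and bullet lists (`ul`) are assumed for nesting.

```typescript
// Hypothetical, simplified list item block for illustration.
type ListItemBlock = {
  id: string;
  textAlignment: string;
  text: string;
  children: ListItemBlock[];
};

// Escape characters that are unsafe in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Put blockContainer data (the ID) on the <li> and blockContent data
// (text alignment) on the <p> inside it, so each node has its own element.
function listItemToExternalHTML(block: ListItemBlock): string {
  const nested =
    block.children.length > 0
      ? `<ul>${block.children.map((child) => listItemToExternalHTML(child)).join("")}</ul>`
      : "";
  return (
    `<li data-id="${escapeHtml(block.id)}">` +
    `<p data-text-alignment="${escapeHtml(block.textAlignment)}">` +
    `${escapeHtml(block.text)}</p>${nested}</li>`
  );
}
```

With the data split this way, the `li` could be parsed as a blockContainer and the `p` as blockContent without a non-consuming rule, which might leave the nested `ul` free to be parsed as child blocks.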


vercel bot commented Apr 11, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
blocknote ✅ Ready (Inspect) Visit Preview Apr 12, 2025 0:08am
blocknote-website ✅ Ready (Inspect) Visit Preview Apr 12, 2025 0:08am


pkg-pr-new bot commented Apr 11, 2025

@blocknote/ariakit

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/ariakit@1605

@blocknote/code-block

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/code-block@1605

@blocknote/core

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/core@1605

@blocknote/mantine

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/mantine@1605

@blocknote/react

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/react@1605

@blocknote/server-util

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/server-util@1605

@blocknote/shadcn

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/shadcn@1605

@blocknote/xl-docx-exporter

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/xl-docx-exporter@1605

@blocknote/xl-multi-column

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/xl-multi-column@1605

@blocknote/xl-odt-exporter

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/xl-odt-exporter@1605

@blocknote/xl-pdf-exporter

npm i https://pkg.pr.new/TypeCellOS/BlockNote/@blocknote/xl-pdf-exporter@1605

commit: e613254
