Several issues with getSemanticHTML not preserving html represented in editor #4289

enzedonline · 2024-07-04T02:24:05Z

Quill documentation describes getSemanticHTML as:

Get the HTML representation of the editor contents. This method is useful for exporting the contents of the editor in a format that can be used in other applications.

It should be a usable HTML representation of the editor contents. This is a critical requirement for using Quill as a form widget.

In practice, the HTML is not preserved for syntax, video, or formula blocks, or for check lists. Of those, only the syntax block is fully recoverable (by reapplying highlight.js on render). The information needed to rebuild video and formula blocks is lost, and check lists require some JavaScript wrangling on render.

Syntax block:

The syntax highlighting markup is stripped out. The code block is instead just wrapped by a <pre> tag.

Editor:

<div class="ql-code-block-container" spellcheck="false">
    <select class="ql-ui" contenteditable="false">
        ....
    </select>
    <div class="ql-code-block" data-language="python"><span class="ql-token hljs-keyword">def</span> <span
            class="ql-token hljs-title">is_absolute_url</span>(<span class="ql-token hljs-params">url</span>):</div>
    <div class="ql-code-block" data-language="python"> parsed_url = urlparse(url)</div>
    <div class="ql-code-block" data-language="python"> <span class="ql-token hljs-keyword">return</span> <span
            class="ql-token hljs-built_in">bool</span>(parsed_url.scheme <span class="ql-token hljs-keyword">and</span>
        parsed_url.netloc)</div>
</div>

getSemanticHTML:

<pre data-language="python">
def is_absolute_url(url):
    parsed_url = urlparse(url)
    return bool(parsed_url.scheme and parsed_url.netloc)
</pre>

At the very least, this should be <pre><code class="language-${data-language-value}">...</code></pre>; otherwise it is just rendered as plain text with whitespace preserved. But why strip out the formatting? This means highlight.js needs to be reapplied on each render.
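As a stopgap on the consuming side, the exported markup can be rewritten into the shape highlight.js expects. The sketch below is a minimal string-level transform; wrapPreInCode is a hypothetical helper name (not part of Quill), and it assumes the pre tag carries at most a data-language attribute, as in the output shown above.

```typescript
// Hypothetical helper, not part of Quill: rewrites <pre data-language="...">
// blocks (as shown in the getSemanticHTML output above) into
// <pre><code class="language-...">...</code></pre>.
function wrapPreInCode(semanticHtml: string): string {
  // Assumption: the <pre> tag has either no attributes or only data-language.
  const prePattern = /<pre(?:\s+data-language="([^"]*)")?\s*>([\s\S]*?)<\/pre>/g;
  return semanticHtml.replace(
    prePattern,
    (_match: string, language: string | undefined, body: string): string => {
      const classAttr = language ? ` class="language-${language}"` : "";
      // A newline directly after <pre> is ignored by HTML parsers, but it would
      // become a visible blank line once it sits inside <code>, so drop it.
      const code = body.replace(/^\r?\n/, "");
      return `<pre><code${classAttr}>${code}</code></pre>`;
    },
  );
}
```

The result still has to be highlighted on render; this only fixes the wrapper so that highlight.js can pick the language from the class name.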

Video block

Iframes inserted by the Quill video block are stripped and replaced by a hyperlink.

Editor:

<iframe class="ql-video" frameborder="0" allowfullscreen="true" class="ql-iframe-align-right"
    height="270" width="542"
    src="https://www.youtube.com/embed/2o0zV4VOQ54?showinfo=0" 
></iframe>

getSemanticHTML:

<a href="https://www.youtube.com/embed/2o0zV4VOQ54?showinfo=0" target="_blank" rel="nofollow 
noopener">https://www.youtube.com/embed/2o0zV4VOQ54?showinfo=0</a>

Playground example.

The iframe needs to be preserved along with all attributes.

Formula block:

The katex markup is stripped out and replaced with a plain text span:

Editor:

<p>
    <span class="ql-formula" data-value="y=x^2">
        <span contenteditable="false">
            <span class="katex">
                <span class="katex-mathml">
                    <math xmlns="http://www.w3.org/1998/Math/MathML">
                        <semantics>
                            <mrow><mi>y</mi><mo>=</mo><msup><mi>x</mi><mn>2</mn></msup></mrow>
                            <annotation encoding="application/x-tex">y=x^2</annotation>
                        </semantics>
                    </math>
                </span>
                <span class="katex-html" aria-hidden="true">
                    <span class="base">
                        <span class="strut" style="height: 0.625em; vertical-align: -0.1944em;"></span>
                        <span class="mord mathnormal" style="margin-right: 0.0359em;">y</span>
                        <span class="mspace" style="margin-right: 0.2778em;"></span><span class="mrel">=</span>
                        <span class="mspace" style="margin-right: 0.2778em;"></span>
                    </span>
                    <span class="base">
                        <span class="strut" style="height: 0.8141em;"></span>
                        <span class="mord"><span class="mord mathnormal">x</span>
                        <span class="msupsub">
                            <span class="vlist-t">
                                <span class="vlist-r">
                                    <span class="vlist" style="height: 0.8141em;">
                                        <span class="" style="top: -3.063em; margin-right: 0.05em;">
                                            <span class="pstrut" style="height: 2.7em;"></span>
                                            <span class="sizing reset-size6 size3 mtight">
                                                <span class="mord mtight">2</span>
                                            </span>
                                        </span>
                                    </span>
                                </span>
                            </span>
                        </span>
                    </span>
                </span>
            </span>
        </span>
    </span>
</span> 
</p>

getSemanticHTML:

<p>
    <span>y=x^2</span> 
</p>

The KaTeX markup should be preserved. At the very least, the output should carry some identifier marking this as a Quill formula block so that KaTeX can be applied on render (though this is not a favourable solution).
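To illustrate the "identifier" fallback, here is a rough sketch of what a tagged export could look like. Both function names are hypothetical and nothing here is Quill's API; the ql-formula class and data-value attribute are simply reused from the editor markup above so that a renderer could find the span and run KaTeX on it.

```typescript
// Hypothetical helper, not part of Quill: escapes text for HTML content and
// double-quoted attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical export shape: keeps the TeX source as readable text and marks
// the span with the same class and data-value the editor uses.
function formulaToTaggedSpan(tex: string): string {
  const escaped = escapeHtml(tex);
  return `<span class="ql-formula" data-value="${escaped}">${escaped}</span>`;
}
```

With this shape, formulaToTaggedSpan("y=x^2") yields a span that still reads as y=x^2 without KaTeX but can be located by its class on render.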

Check lists

A single check list is converted to one unordered list per list item.

Editor:

<ol>
    <li data-list="unchecked"><span class="ql-ui" contenteditable="false"></span>one</li>
    <li data-list="checked"><span class="ql-ui" contenteditable="false"></span>two</li>
    <li data-list="unchecked"><span class="ql-ui" contenteditable="false"></span>three</li>
</ol>

getSemanticHTML:

<ul><li data-list="unchecked">one</li></ul>
<ul><li data-list="checked">two</li></ul>
<ul><li data-list="unchecked">three</li></ul>

This needs to be preserved as a single unordered list.
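Until that happens, the split lists can be stitched back together after export. This is a minimal sketch, assuming flat (non-nested) check lists in exactly the shape shown above; mergeCheckLists is a hypothetical helper name, not a Quill function.

```typescript
// Hypothetical helper, not part of Quill: joins adjacent <ul> elements when the
// items on both sides of the boundary are check-list items.
function mergeCheckLists(semanticHtml: string): string {
  // Group 1: one or more check items at the end of a list (no nested lists).
  // Then the "</ul> <ul>" boundary, which is dropped when the next list also
  // starts with a check item.
  const boundary =
    /(<li data-list="(?:checked|unchecked)">(?:(?!<\/?[uo]l\b)[\s\S])*?<\/li>)<\/ul>\s*<ul>(?=<li data-list="(?:checked|unchecked)">)/g;
  return semanticHtml.replace(boundary, "$1");
}
```

Plain bullet lists are left alone because their items carry no data-list attribute in the exported markup.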


raffaele-clevermind · 2024-07-12T12:32:14Z

I'm having problems too. When text-align is applied to list items, the alignment is removed in the semantic version.

Editor:

<ol>
  <li data-list="bullet" style="text-align: center;"><span class="ql-ui" contenteditable="false"></span>one</li>
  <li data-list="bullet" style="text-align: center;"><span class="ql-ui" contenteditable="false"></span>two</li>
  <li data-list="bullet" style="text-align: center;"><span class="ql-ui" contenteditable="false"></span>three</li>
</ol>

getSemanticHTML:

<ul>
  <li>one</li>
  <li>two</li>
  <li>three</li>
</ul>

As a result, the list alignment is not maintained if the HTML is exported for use somewhere else.

There is also a partial fix open at the moment: #4273

medi6 · 2024-07-26T07:46:29Z

Hi, you can temporarily fix the LI display using this Less code.
But you definitely lose alignment...

        padding-left: 21px;
        li {
            >ol, >ul {
                padding-left: 42px;
            }
            padding-left: 21px;
            list-style-type: none;            
            &:before {
                display: inline-block;
                margin-left: -21px;
                margin-right: 4px;
                text-align: right;
                white-space: nowrap;
                width: 17px;
                content:'\2022';
            }            
        }        
    }  
    ul {
        li {
            &:before {
                content:'\2022';
            }
        }
    }
    .ms-pub-body>ol {
        counter-reset: ol1;
        >li {
            counter-increment: ol1;
            &:before {
                content:counter(ol1, decimal) '. '
            }            
            >ol {
                counter-reset: ol2;
                >li {
                    counter-increment: ol2;
                    &:before {
                        content:counter(ol2, lower-alpha) '. ';
                        margin-right: 2px;
                        width: 19px;                        
                    }            
                    >ol {
                        counter-reset: ol3;
                        >li {
                            counter-increment: ol3;
                            &:before {
                                content:counter(ol3, lower-roman) '. ';
                                margin-right: 2px;
                                width: 19px;  
                            }    
                            >ol {
                                counter-reset: ol4;
                                >li {
                                    counter-increment: ol4;
                                    &:before {
                                        content:counter(ol4, decimal) '. '
                                    }            
                                    >ol {
                                        counter-reset: ol5;
                                        >li {
                                            counter-increment: ol5;
                                            &:before {
                                                content:counter(ol5, lower-alpha) '. '
                                            }            
                                            >ol {
                                                counter-reset: ol6;
                                                >li {
                                                    counter-increment: ol6;
                                                    &:before {
                                                        content:counter(ol6, lower-roman) '. ';
                                                        margin-right: 2px;
                                                        width: 19px;  
                                                    }      
                                                    >ol {
                                                        counter-reset: ol7;
                                                        >li {
                                                            counter-increment: ol7;
                                                            &:before {
                                                                content:counter(ol7, decimal) '. '
                                                            }    
                                                            >ol {
                                                                counter-reset: ol8;
                                                                >li {
                                                                    counter-increment: ol8;
                                                                    &:before {
                                                                        content:counter(ol8, lower-alpha) '. '
                                                                    }            
                                                                    >ol {
                                                                        counter-reset: ol9;
                                                                        >li {
                                                                            counter-increment: ol9;
                                                                            &:before {
                                                                                content:counter(ol9, lower-roman) '. ';
                                                                                margin-right: 2px;
                                                                                width: 19px;  
                                                                            }            
                                                                        }
                                                                    }                                                                        
                                                                }
                                                            }                                                                      
                                                        }
                                                    }                                                            
                                                }
                                            }                                                
                                        }
                                    }                                        
                                }
                            }                                    
                        }
                    }
                }
            }
        }
    }

markuso · 2024-07-30T22:11:08Z

I had the same issue with the Video block. I managed to track down the cause and solved it locally, without waiting for Quill to change how it generates the HTML from the getSemanticHTML() call. It is worth fixing in core, for sure, but in many cases we need certain things to work differently from the default anyway.

Below is what the current Video block class looks like in the Quill package, at the file location quill/formats/video.js.

class Video extends BlockEmbed {
  static blotName = 'video';
  static className = 'ql-video';
  static tagName = 'IFRAME';
  static create(value) {
    ...
  }
  static formats(domNode) {
    ...
  }
  static sanitize(url) {
    ...
  }
  static value(domNode) {
    return domNode.getAttribute('src');
  }
  format(name, value) {
    ...
  }
  html() {
    const {
      video
    } = this.value();
    return `<a href="${video}">${video}</a>`;
  }
}

You will notice that the class above has an instance method of html() that just returns a hyperlink rather than the actual video iframe code block. It is only using the URL of the video to make a hyperlink when it converts it to semantic html.

To change this, I created my own class and named it VideoBlock (but you can name it anything) that extends the original Video class and used it in my setup rather than the original. Below is the simple override of the html() method, and you may not need to override anything else on that class, unless you want to.

class VideoBlock extends Video {
  html () {
    return this.domNode.outerHTML;
  }
}

The return this.domNode.outerHTML; line above returns the block's actual HTML intact instead of replacing it with a basic link. I normally write my own blot classes, even for built-in ones, since I usually need to override some part of the behavior. For example, at times I need to allow the style attribute to be used rather than stripped out.

I hope this helps someone out there. I believe that the same thing may be applied to some of the other Quill blocks mentioned in this issue by @enzedonline.

enzedonline · 2024-09-22T05:15:52Z

The fixes I'm using for code-block and video, in case anyone needs them:

Code (also available via npm i quill-syntax-code-block-container-html)

const QuillCodeBlockContainer = Quill.import('formats/code-block-container') as any;

class CodeBlockContainer extends QuillCodeBlockContainer {
    html(index: number, length: number): string {
        // Quill returns <pre data-language="...">...</pre> - highlight js doesn't recognise this format
        // return html formatted for hljs : <pre><code class="language-...">...</code></pre>
        // wrap the innerHTML of the returned <pre> in a <code> tag
        // add the hljs language class to the code tag using the data-language value of the <pre> tag
        const markup: string = super.html(index, length);
        const tempDiv: HTMLElement = document.createElement('div');
        tempDiv.innerHTML = markup;
        const preTag: HTMLElement | null = tempDiv.querySelector('pre');
        if (preTag) {
            const language: string = preTag.getAttribute('data-language') || '';
            const codeTag: HTMLElement = document.createElement('code');
            if (!!language) {
                codeTag.className = `language-${language}`;
            }
            codeTag.innerHTML = preTag.innerHTML;
            preTag.innerHTML = '';
            preTag.removeAttribute('data-language');
            preTag.appendChild(codeTag);
            return preTag.outerHTML;
        }
        return markup; // fallback
    }
}

Quill.register('formats/code-block-container', CodeBlockContainer, true);

Video (also adds aspect-ratio and full width instead of the default postage-stamp):

const VideoEmbed = Quill.import("formats/video") as any;

class VideoResponsive extends VideoEmbed {
    static aspectRatio: string = "16 / 9 auto"
    static create(value: string) {
        const node = super.create(value);
        node.setAttribute('width', '100%');
        node.style.aspectRatio = this.aspectRatio;
        return node;
    }
    html () {
        return this.domNode.outerHTML;
    }
}

Quill.register(VideoResponsive, true);

banders · 2024-11-12T19:33:35Z

I am also experiencing the problem caused by inconsistent list formatting between the editor and what's returned by getSemanticHTML().

Here's an example in which one list containing both numbered items and bullets in the editor is treated as three lists in getSemanticHTML:

In the editor:

<ol>
  <li data-list="ordered"><span class="ql-ui" contenteditable="false"></span>Number One</li>
  <li data-list="ordered"><span class="ql-ui" contenteditable="false"></span>Number Two</li>
  <li data-list="bullet"><span class="ql-ui" contenteditable="false"></span>First Bullet</li>
  <li data-list="bullet"><span class="ql-ui" contenteditable="false"></span>Second Bullet</li>
  <li data-list="ordered"><span class="ql-ui" contenteditable="false"></span>Number Three</li>
</ol>

Returned by getSemanticHTML():

<ol>
  <li>Number One</li>
  <li>Number Two</li>
</ol>
<ul>
  <li>First Bullet</li>
  <li>Second Bullet</li>
</ul>
<ol>
  <li>Number Three</li>
</ol>

Ideally the editor and getSemanticHTML would be consistent so the final list item is labelled "3".
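In the meantime, the numbering can be carried across the split lists after export by setting the standard start attribute on each later ol. This is only a sketch under a few assumptions: the lists are flat (no nesting), the list tags have no attributes, and a run of lists ends as soon as anything other than whitespace sits between two lists. continueOrderedLists is a hypothetical helper name, not part of Quill.

```typescript
// Hypothetical helper, not part of Quill: makes adjacent <ol> elements continue
// each other's numbering, even when a <ul> sits between them.
function continueOrderedLists(html: string): string {
  const listPattern = /<(ol|ul)>([\s\S]*?)<\/\1>/g;
  let numbered = 0; // ordered items seen so far in the current run of lists
  let previousEnd = 0; // index just past the previous list in the input
  return html.replace(
    listPattern,
    (match: string, tag: string, body: string, offset: number): string => {
      // Anything other than whitespace between two lists ends the run.
      if (html.slice(previousEnd, offset).trim() !== "") {
        numbered = 0;
      }
      previousEnd = offset + match.length;
      if (tag === "ul") {
        return match; // bullets do not consume numbers
      }
      const start = numbered + 1;
      numbered += (body.match(/<li[\s>]/g) ?? []).length;
      return start === 1 ? match : `<ol start="${start}">${body}</ol>`;
    },
  );
}
```

Applied to the output above, the first ol is unchanged and the last one becomes ol start="3", so "Number Three" is labelled 3 when rendered.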

KillerCodeMonkey mentioned this issue Jul 22, 2024

List bullet and ordered KillerCodeMonkey/ngx-quill#1896

Closed

BBboy01 mentioned this issue Aug 28, 2024

formula format content lost format info KillerCodeMonkey/ngx-quill#1925

Closed

enzedonline mentioned this issue Nov 16, 2024

getHTML/getSemanticHTML strips embed, replaces with hyperlink #4280

Closed
