Several issues with getSemanticHTML not preserving html represented in editor #4289

Open
enzedonline opened this issue Jul 4, 2024 · 5 comments

Comments

@enzedonline

enzedonline commented Jul 4, 2024

Quill documentation describes getSemanticHTML as:

Get the HTML representation of the editor contents. This method is useful for exporting the contents of the editor in a format that can be used in other applications.

It should be a usable HTML representation of the editor contents. This is a critical requirement for using Quill as a form widget.

What happens is that the HTML is not preserved for syntax, video or formula blocks, or for check lists. Of those, only the syntax block is recoverable (by reapplying highlight.js on render). The information necessary for video and formula blocks is lost, while check lists require some wrangling in JavaScript on render.

Syntax block:

The syntax highlighting markup is stripped out. The code block is instead just wrapped by a <pre> tag.

Editor:

<div class="ql-code-block-container" spellcheck="false">
    <select class="ql-ui" contenteditable="false">
        ....
    </select>
    <div class="ql-code-block" data-language="python"><span class="ql-token hljs-keyword">def</span> <span
            class="ql-token hljs-title">is_absolute_url</span>(<span class="ql-token hljs-params">url</span>):</div>
    <div class="ql-code-block" data-language="python"> parsed_url = urlparse(url)</div>
    <div class="ql-code-block" data-language="python"> <span class="ql-token hljs-keyword">return</span> <span
            class="ql-token hljs-built_in">bool</span>(parsed_url.scheme <span class="ql-token hljs-keyword">and</span>
        parsed_url.netloc)</div>
</div>

getSemanticHTML:

<pre data-language="python">
def is_absolute_url(url):
    parsed_url = urlparse(url)
    return bool(parsed_url.scheme and parsed_url.netloc)
</pre>

At the very least, this should be <pre><code class="language-${data-language-value}">...</code></pre>; otherwise it is just rendered as plain text with whitespace preserved. But why strip out the formatting? This means highlight.js needs to be reapplied on each render.
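As a stopgap, the exported string can be rewritten into the markup highlight.js expects before it is stored or rendered. The following is a minimal sketch, not part of Quill: the function name `toHighlightJsMarkup` is hypothetical, and it assumes the export contains `<pre data-language="...">` blocks exactly as shown above, with the code text already HTML-escaped.

```typescript
// Hypothetical helper (not a Quill API): rewrites <pre data-language="x">...</pre>
// into <pre><code class="language-x">...</code></pre>.
// Assumption: the code inside <pre> is already escaped, so it never contains "</pre>".
function toHighlightJsMarkup(html: string): string {
  return html.replace(
    /<pre data-language="([^"]*)">([\s\S]*?)<\/pre>/g,
    (_match: string, language: string, code: string) => {
      // Only add the class when a language was recorded on the <pre> tag.
      const cls = language ? ` class="language-${language}"` : "";
      return `<pre><code${cls}>${code}</code></pre>`;
    },
  );
}
```

Because it works on the string alone, this can run wherever the HTML is consumed, including outside the browser. highlight.js still has to be applied on render.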

Video block

Iframes inserted by the Quill video block are stripped and replaced by a hyperlink.

Editor:

<iframe class="ql-video" frameborder="0" allowfullscreen="true" class="ql-iframe-align-right"
    height="270" width="542"
    src="https://www.youtube.com/embed/2o0zV4VOQ54?showinfo=0" 
></iframe>

getSemanticHTML:

<a href="https://www.youtube.com/embed/2o0zV4VOQ54?showinfo=0" target="_blank" rel="nofollow 
noopener">https://www.youtube.com/embed/2o0zV4VOQ54?showinfo=0</a>

Playground example.

The iframe needs to be preserved along with all attributes.

Formula block:

The katex markup is stripped out and replaced with a plain text span:

Editor:

<p>
    <span class="ql-formula" data-value="y=x^2">
        <span contenteditable="false">
            <span class="katex">
                <span class="katex-mathml">
                    <math xmlns="http://www.w3.org/1998/Math/MathML">
                        <semantics>
                            <mrow><mi>y</mi><mo>=</mo><msup><mi>x</mi><mn>2</mn></msup></mrow>
                            <annotation encoding="application/x-tex">y=x^2</annotation>
                        </semantics>
                    </math>
                </span>
                <span class="katex-html" aria-hidden="true">
                    <span class="base">
                        <span class="strut" style="height: 0.625em; vertical-align: -0.1944em;"></span>
                        <span class="mord mathnormal" style="margin-right: 0.0359em;">y</span>
                        <span class="mspace" style="margin-right: 0.2778em;"></span><span class="mrel">=</span>
                        <span class="mspace" style="margin-right: 0.2778em;"></span>
                    </span>
                    <span class="base">
                        <span class="strut" style="height: 0.8141em;"></span>
                        <span class="mord"><span class="mord mathnormal">x</span>
                        <span class="msupsub">
                            <span class="vlist-t">
                                <span class="vlist-r">
                                    <span class="vlist" style="height: 0.8141em;">
                                        <span class="" style="top: -3.063em; margin-right: 0.05em;">
                                            <span class="pstrut" style="height: 2.7em;"></span>
                                            <span class="sizing reset-size6 size3 mtight">
                                                <span class="mord mtight">2</span>
                                            </span>
                                        </span>
                                    </span>
                                </span>
                            </span>
                        </span>
                    </span>
                </span>
            </span>
        </span>
    </span>
</span> 
</p>

getSemanticHTML:

<p>
    <span>y=x^2</span> 
</p>

The KaTeX markup should be preserved. At the very least, there should be some identifier that this is a Quill formula block so that KaTeX can be applied on render (though this is not a favourable solution).
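To illustrate the "identifier" fallback, here is a minimal sketch of the markup such an export could produce. It is an assumption, not Quill's behaviour: the helper names `escapeHtml` and `formulaToSemanticHtml` are hypothetical, and the `ql-formula` class and `data-value` attribute are simply borrowed from the editor markup shown above.

```typescript
// Escape the characters that are unsafe in HTML text and attribute values.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper (not a Quill API): keeps the TeX source both as text and
// in a data attribute, so a renderer can find formula spans and apply KaTeX.
function formulaToSemanticHtml(tex: string): string {
  const safe = escapeHtml(tex);
  return `<span class="ql-formula" data-value="${safe}">${safe}</span>`;
}
```

With markup like this, the consuming page could select `span.ql-formula` elements and typeset each one from its `data-value`.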

Check lists

A single check list is converted to one unordered list per list item.

Editor:

<ol>
    <li data-list="unchecked"><span class="ql-ui" contenteditable="false"></span>one</li>
    <li data-list="checked"><span class="ql-ui" contenteditable="false"></span>two</li>
    <li data-list="unchecked"><span class="ql-ui" contenteditable="false"></span>three</li>
</ol>

getSemanticHTML:

<ul><li data-list="unchecked">one</li></ul>
<ul><li data-list="checked">two</li></ul>
<ul><li data-list="unchecked">three</li></ul>

This needs to be preserved as a single unordered list.
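Until this is fixed, the "wrangling" can also be done on the exported string. The sketch below is one possible post-processing step, not a Quill feature: `mergeCheckLists` is a hypothetical name, and it assumes flat (non-nested) check lists whose items look exactly like the output above.

```typescript
// One check item whose content contains no list tags (flat lists only).
const ITEM = String.raw`<li data-list="(?:un)?checked">(?:(?!</?(?:ul|ol|li)\b)[\s\S])*</li>`;
// A <ul> made up solely of check items.
const LIST = String.raw`<ul>(?:${ITEM})+</ul>`;
// Two or more such lists separated only by whitespace.
const RUN = new RegExp(String.raw`${LIST}(?:\s*${LIST})+`, "g");
const ITEM_RE = new RegExp(ITEM, "g");

// Hypothetical helper (not a Quill API): collapses each run of adjacent
// single-purpose check lists into one <ul>, keeping the items in order.
function mergeCheckLists(html: string): string {
  return html.replace(RUN, (run: string) => {
    const items = run.match(ITEM_RE) ?? [];
    return `<ul>${items.join("")}</ul>`;
  });
}
```

Ordinary bullet lists are left alone because their items carry no `data-list="checked"` or `data-list="unchecked"` attribute.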

@raffaele-clevermind

raffaele-clevermind commented Jul 12, 2024

I'm having problems too. When text-align is used in the list format, the alignment is removed in the semantic version.

Editor:

<ol>
  <li data-list="bullet" style="text-align: center;"><span class="ql-ui" contenteditable="false"></span>one</li>
  <li data-list="bullet" style="text-align: center;"><span class="ql-ui" contenteditable="false"></span>two</li>
  <li data-list="bullet" style="text-align: center;"><span class="ql-ui" contenteditable="false"></span>three</li>
</ol>

getSemanticHTML:

<ul>
  <li>one</li>
  <li>two</li>
  <li>three</li>
</ul>

This means the list alignment is not maintained if the HTML is exported to be used somewhere else.

There is also a partial fix open at the moment: #4273

@medi6
Contributor

medi6 commented Jul 26, 2024

Hi, you can temporarily fix the LI display using this Less code.
But you definitely lose the alignment...

        padding-left: 21px;
        li {
            >ol, >ul {
                padding-left: 42px;
            }
            padding-left: 21px;
            list-style-type: none;            
            &:before {
                display: inline-block;
                margin-left: -21px;
                margin-right: 4px;
                text-align: right;
                white-space: nowrap;
                width: 17px;
                content:'\2022';
            }            
        }        
    }  
    ul {
        li {
            &:before {
                content:'\2022';
            }
        }
    }
    .ms-pub-body>ol {
        counter-reset: ol1;
        >li {
            counter-increment: ol1;
            &:before {
                content:counter(ol1, decimal) '. '
            }            
            >ol {
                counter-reset: ol2;
                >li {
                    counter-increment: ol2;
                    &:before {
                        content:counter(ol2, lower-alpha) '. ';
                        margin-right: 2px;
                        width: 19px;                        
                    }            
                    >ol {
                        counter-reset: ol3;
                        >li {
                            counter-increment: ol3;
                            &:before {
                                content:counter(ol3, lower-roman) '. ';
                                margin-right: 2px;
                                width: 19px;  
                            }    
                            >ol {
                                counter-reset: ol4;
                                >li {
                                    counter-increment: ol4;
                                    &:before {
                                        content:counter(ol4, decimal) '. '
                                    }            
                                    >ol {
                                        counter-reset: ol5;
                                        >li {
                                            counter-increment: ol5;
                                            &:before {
                                                content:counter(ol5, lower-alpha) '. '
                                            }            
                                            >ol {
                                                counter-reset: ol6;
                                                >li {
                                                    counter-increment: ol6;
                                                    &:before {
                                                        content:counter(ol6, lower-roman) '. ';
                                                        margin-right: 2px;
                                                        width: 19px;  
                                                    }      
                                                    >ol {
                                                        counter-reset: ol7;
                                                        >li {
                                                            counter-increment: ol7;
                                                            &:before {
                                                                content:counter(ol7, decimal) '. '
                                                            }    
                                                            >ol {
                                                                counter-reset: ol8;
                                                                >li {
                                                                    counter-increment: ol8;
                                                                    &:before {
                                                                        content:counter(ol8, lower-alpha) '. '
                                                                    }            
                                                                    >ol {
                                                                        counter-reset: ol9;
                                                                        >li {
                                                                            counter-increment: ol9;
                                                                            &:before {
                                                                                content:counter(ol9, lower-roman) '. ';
                                                                                margin-right: 2px;
                                                                                width: 19px;  
                                                                            }            
                                                                        }
                                                                    }                                                                        
                                                                }
                                                            }                                                                      
                                                        }
                                                    }                                                            
                                                }
                                            }                                                
                                        }
                                    }                                        
                                }
                            }                                    
                        }
                    }
                }
            }
        }
    }

@markuso

markuso commented Jul 30, 2024

I had the same issue with the Video block. I managed to find the cause and solved it locally, without waiting for Quill to adjust how it generates the HTML from the getSemanticHTML() call. It is worth fixing in core, for sure, but in many cases we need certain things to work differently from the default anyway.

Below is what the current Video block class looks like in the Quill package, at the file location quill/formats/video.js.

class Video extends BlockEmbed {
  static blotName = 'video';
  static className = 'ql-video';
  static tagName = 'IFRAME';
  static create(value) {
    ...
  }
  static formats(domNode) {
    ...
  }
  static sanitize(url) {
    ...
  }
  static value(domNode) {
    return domNode.getAttribute('src');
  }
  format(name, value) {
    ...
  }
  html() {
    const {
      video
    } = this.value();
    return `<a href="${video}">${video}</a>`;
  }
}

You will notice that the class above has an instance method of html() that just returns a hyperlink rather than the actual video iframe code block. It is only using the URL of the video to make a hyperlink when it converts it to semantic html.

To change this, I created my own class and named it VideoBlock (but you can name it anything) that extends the original Video class and used it in my setup rather than the original. Below is the simple override of the html() method, and you may not need to override anything else on that class, unless you want to.

class VideoBlock extends Video {
  html () {
    return this.domNode.outerHTML;
  }
}

The above return this.domNode.outerHTML; line is what will return the actual block's html code intact rather than changing it for a basic link for some strange reason. I normally try to make my own class blots, even for built-in ones, as I normally need to override something about the behavior. For example, at times I need to allow the style attribute to be used and not stripped out.

I hope this helps someone out there. I believe that the same thing may be applied to some of the other Quill blocks mentioned in this issue by @enzedonline.

@enzedonline
Author

enzedonline commented Sep 22, 2024

The fixes I'm using for code-block and video, in case anyone needs them:

Code (also available via npm i quill-syntax-code-block-container-html)

const QuillCodeBlockContainer = Quill.import('formats/code-block-container') as any;

class CodeBlockContainer extends QuillCodeBlockContainer {
    html(index: number, length: number): string {
        // Quill returns <pre data-language="...">...</pre> - highlight js doesn't recognise this format
        // return html formatted for hljs : <pre><code class="language-...">...</code></pre>
        // wrap the innerHTML of the returned <pre> in a <code> tag
        // add the hljs language class to the code tag using the data-language value of the <pre> tag
        const markup: string = super.html(index, length);
        const tempDiv: HTMLElement = document.createElement('div');
        tempDiv.innerHTML = markup;
        const preTag: HTMLElement | null = tempDiv.querySelector('pre');
        if (preTag) {
            const language: string = preTag.getAttribute('data-language') || '';
            const codeTag: HTMLElement = document.createElement('code');
            if (language) {
                codeTag.className = `language-${language}`;
            }
            codeTag.innerHTML = preTag.innerHTML;
            preTag.innerHTML = '';
            preTag.removeAttribute('data-language');
            preTag.appendChild(codeTag);
            return preTag.outerHTML;
        }
        return markup; // fallback
    }
}

Quill.register('formats/code-block-container', CodeBlockContainer, true);

Video (also adds aspect-ratio and full width instead of the default postage-stamp):

const VideoEmbed = Quill.import("formats/video") as any;

class VideoResponsive extends VideoEmbed {
    static aspectRatio: string = "16 / 9 auto"
    static create(value: string) {
        const node = super.create(value);
        node.setAttribute('width', '100%');
        node.style.aspectRatio = this.aspectRatio;
        return node;
    }
    html () {
        return this.domNode.outerHTML;
    }
}

Quill.register(VideoResponsive, true);

@banders

banders commented Nov 12, 2024

I am also experiencing the problem caused by inconsistent list formatting between the editor and what's returned by getSemanticHTML():

Here's an example in which one list containing both numbered items and bullets in the editor is treated as three lists in getSemanticHTML:

In the editor:

<ol>
  <li data-list="ordered"><span class="ql-ui" contenteditable="false"></span>Number One</li>
  <li data-list="ordered"><span class="ql-ui" contenteditable="false"></span>Number Two</li>
  <li data-list="bullet"><span class="ql-ui" contenteditable="false"></span>First Bullet</li>
  <li data-list="bullet"><span class="ql-ui" contenteditable="false"></span>Second Bullet</li>
  <li data-list="ordered"><span class="ql-ui" contenteditable="false"></span>Number Three</li>
</ol>

image

Returned by getSemanticHTML():

<ol>
  <li>Number One</li>
  <li>Number Two</li>
</ol>
<ul>
  <li>First Bullet</li>
  <li>Second Bullet</li>
</ul>
<ol>
  <li>Number Three</li>
</ol>

image

Ideally the editor and getSemanticHTML would be consistent so the final list item is labelled "3".
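As a workaround on the consuming side, the numbering can be carried across the split lists with the `start` attribute of `<ol>`. This is a rough sketch under stated assumptions, not something Quill provides: `continueOrderedNumbering` is a hypothetical name, and it only handles flat lists written as plain `<ol>` and `<ul>` tags, as in the output above.

```typescript
// Matches one flat list: no nested <ol>/<ul> inside, no attributes on the tag.
const FLAT_LIST = /<(ol|ul)>((?:(?!<\/?(?:ol|ul)\b)[\s\S])*)<\/\1>/g;

// Hypothetical helper (not a Quill API): within a run of lists separated only
// by whitespace, later <ol> elements continue counting from the earlier ones.
function continueOrderedNumbering(html: string): string {
  let count = 0;    // ordered items seen so far in the current run
  let lastEnd = -1; // index just past the previous list in the input
  return html.replace(
    FLAT_LIST,
    (match: string, tag: string, body: string, offset: number) => {
      // Anything other than whitespace between two lists starts a new run.
      const adjacent = lastEnd >= 0 && html.slice(lastEnd, offset).trim() === "";
      if (!adjacent) count = 0;
      lastEnd = offset + match.length;
      if (tag !== "ol") return match; // bullets do not affect the count
      const start = count + 1;
      count += (body.match(/<li[\s>]/g) ?? []).length;
      return start > 1 ? `<ol start="${start}">${body}</ol>` : match;
    },
  );
}
```

Applied to the output above, the third list becomes `<ol start="3">`, so "Number Three" is labelled "3" again. Lists separated by other content, such as a paragraph, restart at 1.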
