SVG cleaning should not use HTML cleaner #1894

akx · 2022-10-28T05:17:36Z

The templates for actual image/svg+xml cells (e.g.

nbconvert/share/templates/lab/base.html.j2

Line 166 in a806744

{{ output.data['image/svg+xml'].encode("utf-8") | clean_html }}

) should use an XML-savvy cleaner, since image/svg+xml is not HTML to begin with.

Originally posted by @akx in #1890 (comment)

(I just wanted to make an issue out of this so it's searchable...)
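For illustration, here is a minimal sketch of what an XML-aware cleaner for SVG could look like, using only the standard library. This is not nbconvert's implementation: clean_svg, local_name, and BLOCKED_TAGS are hypothetical names, and the blocklist is deliberately tiny (script and foreignObject elements, on* event attributes, javascript: links). A real sanitizer would need an allowlist and a hardened parser for untrusted input.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Keep the usual prefixes when serializing instead of ns0:, ns1:, ...
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", XLINK_NS)

# Hypothetical, intentionally small blocklist (an assumption for this sketch).
BLOCKED_TAGS = {"script", "foreignObject"}


def local_name(qname):
    """Strip the '{namespace}' prefix ElementTree puts on tags and attributes."""
    return qname.rsplit("}", 1)[-1]


def clean_svg(svg_text):
    """Parse SVG as XML, drop blocked elements and script-like attributes."""
    root = ET.fromstring(svg_text)
    # Remove blocked child elements from every parent.
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) in BLOCKED_TAGS:
                parent.remove(child)
    # Remove event-handler attributes and javascript: links.
    for elem in root.iter():
        for name in list(elem.attrib):
            value = elem.attrib[name].strip().lower()
            attr = local_name(name).lower()
            if attr.startswith("on"):
                del elem.attrib[name]
            elif attr == "href" and value.startswith("javascript:"):
                del elem.attrib[name]
    return ET.tostring(root, encoding="unicode")
```

Because the input is parsed as XML rather than HTML, namespaced attributes such as xlink:href and the values of style attributes pass through unchanged.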

hadim · 2022-10-28T18:22:59Z

I confirm that | clean_html renders corrupted SVG (black background, deformed shapes, etc.). When I remove it, the SVG renders correctly.

This happens when generating an HTML MkDocs page with mkdocs-jupyter from SVGs rendered with data and rdkit.

desilinguist · 2022-11-01T19:28:53Z

I am also seeing the same issue in my nbconvert-ed HTML pages. See #1849 (comment)

therzka · 2022-11-15T20:27:23Z

👋 A colleague and I have been looking into this as well. It looks like, despite this PR, although the style attribute is now allowed on valid SVG elements, SVGs still don't render because the value of the style attribute is stripped.

Our current workaround is to revert to the old sanitization behavior and define our own lxml Cleaner() to override default_filters["clean_html"]:

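from lxml.html.clean import Cleaner
from nbconvert.exporters.templateexporter import default_filters

# Strip <style> and <script> elements, but keep inline style attributes,
# non-whitelisted attributes, and unknown (e.g. SVG) tags.
cleaner = Cleaner(
    style=True,
    scripts=True,
    inline_style=False,
    safe_attrs_only=False,
    remove_unknown_tags=False,
)
# Replace nbconvert's default clean_html filter with the permissive cleaner.
default_filters["clean_html"] = cleaner.clean_html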

stefmolin · 2022-11-20T23:29:26Z

I'll also share my hack while we wait for a fix. Change {{ output.data['image/svg+xml'].encode("utf-8") | clean_html }} to just {{ output.data['image/svg+xml'] }} in share/jupyter/nbconvert/templates/lab/base.html.j2 (this will be in your virtual environment).

I'm curious why there is an option to turn off HTML cleaning, but it doesn't apply to the SVG output (like line 173).

desilinguist · 2022-12-01T20:07:28Z

I use nbconvert programmatically, so modifying the base.html.j2 template, while effective, wasn't an ideal solution for me. Here's the fix that worked for me. The goal is to turn the offending clean_html into a no-op. To achieve this, I did:

from nbconvert.exporters import HTMLExporter
from nbconvert.exporters.templateexporter import default_filters


def custom_clean_html(element):
    # No-op replacement for clean_html: return the markup unchanged, as text.
    return element.decode() if isinstance(element, bytes) else str(element)


def convert_to_html(notebook_file, html_file):
    # Swap out the filter before the exporter is created.
    default_filters["clean_html"] = custom_clean_html

    html_exporter = HTMLExporter()
    output, _ = html_exporter.from_filename(notebook_file)
    # Use a context manager so the output file is closed promptly.
    with open(html_file, mode="w", encoding="utf-8") as f:
        f.write(output)

This works for me with nbconvert=7.2.5 and my SVG figures are now all back! 🎉
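One caveat with this kind of monkey patch is that it changes the module-level default_filters dictionary for the rest of the process. As a hedged sketch (override_filter and passthrough are hypothetical helpers, not part of nbconvert), a context manager can restore the original entry afterwards. It works on any dict, so it assumes only that default_filters behaves like one:

```python
from contextlib import contextmanager


@contextmanager
def override_filter(filters, name, replacement):
    """Temporarily replace filters[name], restoring the old value on exit."""
    missing = object()  # sentinel: distinguishes "absent" from "set to None"
    original = filters.get(name, missing)
    filters[name] = replacement
    try:
        yield filters
    finally:
        if original is missing:
            del filters[name]
        else:
            filters[name] = original


def passthrough(value):
    """No-op cleaner: return the markup unchanged, decoded to text."""
    return value.decode() if isinstance(value, bytes) else str(value)
```

With nbconvert this would be used as `with override_filter(default_filters, "clean_html", passthrough):`, keeping both the HTMLExporter() construction and the from_filename() call inside the with block. That placement is an assumption: the exporter has to pick up its filters while the override is still active.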

hadim · 2022-12-01T20:51:36Z

Thanks for sharing this fix.

For people like me who use it through an external library (to generate documentation with mkdocs-jupyter, in my case), this will need to be fixed in nbconvert directly.

carlosefr · 2022-12-21T12:17:23Z

This is also mentioned in #1863.

josephmcasey · 2023-03-01T05:31:46Z

Related to mkdocs-jupyter a consumer of nbconvert:
Thanks for writing that workaround, @desilinguist. @hadim, I opened a PR that leverages this solution for this mkdocs plugin, and it looks fully functional with the existing test suite. If you think you have a particularly complex SVG to render, it would be great to add your example to the repository.

Related to nbconvert:
As for this particular break, I think the obvious suggestion is that this library abide by the W3C specifications for SVG, so I will try to pose a question that might be slightly more original and reduce the overall burden of library maintenance.

If this suggestion is not too lazy, would the maintainers find it acceptable to introduce a GitHub Action that renders the generated HTML in a headless browser like Chrome as a form of validation testing? I ask because that would quickly outsource to Google and to consumers of the library most of the complexity of writing unit tests that cover the entire spectrum of W3C standards. When a consumer finds a break, grab their notebook and add it to the array of validation tests.

yksantaro · 2023-05-03T17:31:34Z

I have the same problem with the xlink:href SVG attribute (e.g. axis labels in matplotlib figures).
I think inline SVG image data can bypass clean_html() rather than be processed by it, because most images will already have been displayed in a JupyterLab window and will be exported and displayed correctly in HTML without any cleaning.

So I added a bypass_svg() function to strings.py (in lib/python3.11/site-packages/nbconvert/filters) and updated the related files, templateexporter.py (in lib/python3.11/site-packages/nbconvert/exporters) and base.html.j2 (in share/jupyter/nbconvert/templates/lab), as in the attached patches below.

templateexporter.py.patch
base.html.j2.patch
strings.py.patch

This workaround works fine for me.

jstorrs · 2023-07-06T15:29:19Z

I don't know if this is correct, but while I was researching methods to sanitize SVG, this site was mentioned on security.stackexchange.com, and it suggests that SVG loaded via img tags won't execute scripts. ~~Since the SVG is already inside an img tag [Edit: incorrect, misread the template]~~ Could we just put it in an img tag and base64-encode the SVG content?

Edit: this fixed nbconvert (7.6.0) html and webpdf output for me when using matplotlib with SVG output:

$ cd share/jupyter/nbconvert/templates/lab/
$ diff base.html.j2 base.html.j2.orig
167c167
< <img src="data:image/svg+xml;base64,{{ output.data['image/svg+xml'] | text_base64 | escape_html }}">
---
> {{ output.data['image/svg+xml'].encode("utf-8") | clean_html }}
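Outside of the Jinja template, the same idea can be sketched in plain Python: base64-encode the SVG text and embed it as a data URI in an img tag. svg_to_img_tag is a hypothetical helper written for illustration; it only approximates what the text_base64 and escape_html filters do in the template line above.

```python
import base64
from html import escape


def svg_to_img_tag(svg_text):
    """Wrap SVG markup in an <img> tag as a base64-encoded data URI."""
    encoded = base64.b64encode(svg_text.encode("utf-8")).decode("ascii")
    # Base64 output contains no HTML-special characters, so escaping here
    # is only a defensive measure.
    return '<img src="data:image/svg+xml;base64,{}">'.format(
        escape(encoded, quote=True)
    )
```

Since the SVG is never inlined into the HTML document, it does not go through the HTML cleaner at all.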

manzt mentioned this issue Nov 1, 2022

Widget with "<" or ">" characters in Unicode state embeds "&lt" and "&gt" in emitted HTML #1900

Closed

Carreau mentioned this issue Nov 1, 2022

Post #1015 task. jupyter/nbviewer#1025

Open

desilinguist mentioned this issue Dec 1, 2022

No plots in report with nbconvert 7.0 EducationalTestingService/rsmtool#571

Closed

desilinguist mentioned this issue Dec 2, 2022

Incorrect conversion of matplotlib SVG plots #1849

Closed

josephmcasey pushed a commit to josephmcasey/mkdocs-jupyter that referenced this issue Mar 1, 2023

fix(nbconvert2): monkey patch partially functional clean_html

a457bf0

details at jupyter/nbconvert#1894

josephmcasey mentioned this issue Mar 1, 2023

dep(nbconvert): major version upgrade danielfrg/mkdocs-jupyter#127

Merged

jstorrs mentioned this issue Jul 6, 2023

html: write image/svg+xml data as base64 and skip clean_html #2018

Merged

blink1073 closed this as completed in #2018 Jul 15, 2023
