Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html: write image/svg+xml data as base64 and skip clean_html #2018

Merged
merged 5 commits into from
Jul 15, 2023

Conversation

jstorrs
Copy link
Contributor

@jstorrs jstorrs commented Jul 6, 2023

Fixes #1894 and #1863

The current HTML output template directly inlines SVG and attempts to sanitize it by passing it through the clean_html filter. Unfortunately, the clean_html filter is not appropriate for SVG and leads to corruption. Work arounds have included simply removing clean_html, but removing the filter is claimed to have security implications.

This patch instead passes the data through the text_base64 and escape_html filters to base64 encode the image and places it inside an <img src="data:image/svg+xml;base64..."> tag.

May address Security / Issue 6: XSS in output data image/svg+xml cells

@jstorrs jstorrs changed the title write SVG data as base64 instead of applying html_clean html: write image/svg+xml data as base64 and skip clean_html Jul 6, 2023
@maartenbreddels
Copy link
Collaborator

I think with the fix discussed in #1934 this may also be fixed.

@jstorrs
Copy link
Contributor Author

jstorrs commented Jul 6, 2023

@maartenbreddels I'm not entirely sure I understood your suggestion, but I just tried adding escape_html_script and updated the template this way:

❯ diff templates/lab/base.html.j2 templates/lab/base.html.j2.orig
167c167
< {{ output.data['image/svg+xml'] | escape_html_script }}
---
> {{ output.data['image/svg+xml'].encode("utf-8") | clean_html }}

But that seems to destroy the SVG completely in my test case. From a number of bug reports, this is known to work for fixing the SVG:

❯ diff templates/lab/base.html.j2 templates/lab/base.html.j2.orig
167c167
< {{ output.data['image/svg+xml'] }}
---
> {{ output.data['image/svg+xml'].encode("utf-8") | clean_html }}

but it's just dropping the SVG data straight into the DOM so any javascript within the SVG can do whatever it wants (from what I understand). Or if its malformed it might break the page.

To me it makes more sense to just treat SVG the same way PNG or JPG are handled and stuff them in as a base64 blob for the browser sandbox. This completely isolates the SVG content from any of the processing and eliminates the possibility of the filters messing with the SVG (as if the SVG were stored in an external resource).

IMHO it's just easier to not even have to think about sanitizing the SVG.

@maartenbreddels
Copy link
Collaborator

Yeah, you are right. It you compare it to normal images this makes more sense. Also less css issues possible!

@jstorrs
Copy link
Contributor Author

jstorrs commented Jul 6, 2023

The failures (on 3.8 and newer) seem related to jupyter/nbformat#368

@blink1073 blink1073 added the bug label Jul 15, 2023
Copy link
Contributor

@blink1073 blink1073 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks! I'll make a release early next week. I'm not worried about the Python 3.7 failure, #2008 removes support for 3.7

@gwincr11
Copy link
Contributor

gwincr11 commented Feb 7, 2024

This change will prevent GitHub from updating to the newest version of NbConvert as base64 encoded images are a security issue. I am going to try to open a toggle on this so that this does not happen, we also do not use bleach since it's performance is not fast enough for our needs.

@jstorrs
Copy link
Contributor Author

jstorrs commented Feb 13, 2024

base64 encoded images are a security issue

Just in case anyone reads this thread in the future, this claim is exactly backwards. When SVG is inlined (which is what nbconvert does), base64 encoding prevents injection attacks because the browser treats the base64-coded SVG as a separate resource (as if it were in a separate file) and isolates the content. See Security Issue 6 (GHSL-2021-1018):

https://securitylab.github.com/advisories/GHSL-2021-1013_1028-nbconvert/

The reason bleach was used was to try and patch the GHSL-2021-1018 injection vector, but bleach is an HTML sanitizer and was mangling SVG. So by encoding the SVG with base64 we plug the GHSL-2021-1018 injection vector by isolating the SVG from the DOM entirely instead. If after base64 encoding the SVG an injection vector remains, it's a browser vulnerability, not a nbconvert vulnerability.

@gwincr11
Copy link
Contributor

yes I apologize it is a browser vulnerability we are concerned about, and generally not because of the approach of converting svg's to base64 image but rather that we cannot allow any base64 images from a user and cannot discern the difference between a user's base64 image and a converted svg.

@maartenbreddels
Copy link
Collaborator

GitHub renders images in notebooks right? (Just tested at https://gist.github.com/maartenbreddels/c1fd9045c276efef9a080df1d3e9e9ae which includes a base64 encoded image)? All images in notebooks are base64 encoded AFAIK (except for SVG apparently until this PR).

@gwincr11
Copy link
Contributor

Hmm, I will have to check with security to further dive into this, I do know that what ever was happening with the svg encoding related to this change was not escaping our cleaning scripts, which were chosen for specific security concerns around base64 images.

@gwincr11
Copy link
Contributor

Yes you are correct, we do display some base 64 images, the stance has been that we would not display any images with data:image/svg+xml; as this could in fact contain an inlined script. I am revisiting this with security and will check back in with our findings, in the mean time the patch I submitted has allowed us to upgrade, so thanks for getting that in.

@jstorrs
Copy link
Contributor Author

jstorrs commented Feb 15, 2024

Maybe the issue is related to recent use of SVG itself

https://blog.talosintelligence.com/html-smugglers-turn-to-svg-images/

The base64 encoding prevents the injection attack, but apparently it is also correct that SVG can contain scripts that do malicious things that are not injection attacks.

To be clear: dropping that SVG directly in the DOM is even worse, but it is easier to scan and analyze if you want to enforce script-free SVG or have some way of analyzing the scripts.

Script content is legitimate SVG, but I think what may be going on is that some sort of security scanner enforces SVG that has no scripts? The scanner can safely ignore base64 PNG etc because they can't contain scripts.

So basically reading between the lines it sounds like GitHub turns off both the HTML sanitizing (which was destroying the SVG) and the base64 and then applies a different scanner on the output. If all they did was turn off the base64 and disable the HTML sanitizing it would be concerning, but it sounds like they layer on a different solution that checks HTML and script behaviors. So ultimately it's just swapping one solution for another.

I don't have a problem with making a toggle available, I just didn't want people to get the wrong idea.

@gwincr11
Copy link
Contributor

So basically reading between the lines it sounds like GitHub turns off both the HTML sanitizing (which was destroying the SVG) and the base64 and then applies a different scanner on the output. If all they did was turn off the base64 and disable the HTML sanitizing it would be concerning, but it sounds like they layer on a different solution that checks HTML and script behaviors. So ultimately it's just swapping one solution for another.

Yes, we have our own cleaning process outside of NbConvert, and the cleaning being done in this pr was interfering with it. Thanks for understanding, sorry I need to be a bit cryptic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SVG cleaning should not use HTML cleaner
4 participants