-
Notifications
You must be signed in to change notification settings - Fork 568
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
html: write image/svg+xml data as base64 and skip clean_html #2018
Conversation
I think with the fix discussed in #1934 this may also be fixed. |
@maartenbreddels I'm not entirely sure I understood your suggestion, but I just tried adding
But that seems to destroy the SVG completely in my test case. From a number of bug reports, this is known to work for fixing the SVG:
but it's just dropping the SVG data straight into the DOM so any javascript within the SVG can do whatever it wants (from what I understand). Or if its malformed it might break the page. To me it makes more sense to just treat SVG the same way PNG or JPG are handled and stuff them in as a base64 blob for the browser sandbox. This completely isolates the SVG content from any of the processing and eliminates the possibility of the filters messing with the SVG (as if the SVG were stored in an external resource). IMHO it's just easier to not even have to think about sanitizing the SVG. |
Yeah, you are right. It you compare it to normal images this makes more sense. Also less css issues possible! |
The failures (on 3.8 and newer) seem related to jupyter/nbformat#368 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks! I'll make a release early next week. I'm not worried about the Python 3.7 failure, #2008 removes support for 3.7
This change will prevent GitHub from updating to the newest version of NbConvert as base64 encoded images are a security issue. I am going to try to open a toggle on this so that this does not happen, we also do not use bleach since it's performance is not fast enough for our needs. |
Just in case anyone reads this thread in the future, this claim is exactly backwards. When SVG is inlined (which is what nbconvert does), base64 encoding prevents injection attacks because the browser treats the base64-coded SVG as a separate resource (as if it were in a separate file) and isolates the content. See Security Issue 6 (GHSL-2021-1018): https://securitylab.github.com/advisories/GHSL-2021-1013_1028-nbconvert/ The reason bleach was used was to try and patch the GHSL-2021-1018 injection vector, but bleach is an HTML sanitizer and was mangling SVG. So by encoding the SVG with base64 we plug the GHSL-2021-1018 injection vector by isolating the SVG from the DOM entirely instead. If after base64 encoding the SVG an injection vector remains, it's a browser vulnerability, not a nbconvert vulnerability. |
yes I apologize it is a browser vulnerability we are concerned about, and generally not because of the approach of converting svg's to base64 image but rather that we cannot allow any base64 images from a user and cannot discern the difference between a user's base64 image and a converted svg. |
GitHub renders images in notebooks right? (Just tested at https://gist.github.com/maartenbreddels/c1fd9045c276efef9a080df1d3e9e9ae which includes a base64 encoded image)? All images in notebooks are base64 encoded AFAIK (except for SVG apparently until this PR). |
Hmm, I will have to check with security to further dive into this, I do know that what ever was happening with the svg encoding related to this change was not escaping our cleaning scripts, which were chosen for specific security concerns around base64 images. |
Yes you are correct, we do display some base 64 images, the stance has been that we would not display any images with |
Maybe the issue is related to recent use of SVG itself https://blog.talosintelligence.com/html-smugglers-turn-to-svg-images/ The base64 encoding prevents the injection attack, but apparently it is also correct that SVG can contain scripts that do malicious things that are not injection attacks. To be clear: dropping that SVG directly in the DOM is even worse, but it is easier to scan and analyze if you want to enforce script-free SVG or have some way of analyzing the scripts. Script content is legitimate SVG, but I think what may be going on is that some sort of security scanner enforces SVG that has no scripts? The scanner can safely ignore base64 PNG etc because they can't contain scripts. So basically reading between the lines it sounds like GitHub turns off both the HTML sanitizing (which was destroying the SVG) and the base64 and then applies a different scanner on the output. If all they did was turn off the base64 and disable the HTML sanitizing it would be concerning, but it sounds like they layer on a different solution that checks HTML and script behaviors. So ultimately it's just swapping one solution for another. I don't have a problem with making a toggle available, I just didn't want people to get the wrong idea. |
Yes, we have our own cleaning process outside of NbConvert, and the cleaning being done in this pr was interfering with it. Thanks for understanding, sorry I need to be a bit cryptic. |
Fixes #1894 and #1863
The current HTML output template directly inlines SVG and attempts to sanitize it by passing it through the
clean_html
filter. Unfortunately, theclean_html
filter is not appropriate for SVG and leads to corruption. Work arounds have included simply removing clean_html, but removing the filter is claimed to have security implications.This patch instead passes the data through the
text_base64
andescape_html
filters to base64 encode the image and places it inside an<img src="data:image/svg+xml;base64...">
tag.May address Security / Issue 6: XSS in output data image/svg+xml cells