Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitizing SVG tag contents against XXS/XXE #409

Open
ImrePyhvel opened this issue Nov 18, 2022 · 4 comments
Open

Sanitizing SVG tag contents against XXS/XXE #409

ImrePyhvel opened this issue Nov 18, 2022 · 4 comments

Comments

@ImrePyhvel
Copy link

ImrePyhvel commented Nov 18, 2022

SVG images could be embedded inside html and include XSS similar to html, ex:

<svg xmlns=“http://www.w3.org/1999/svg“>
    <script>alert(1)</script>
</svg>

.. and many other nuisances (ex, see: The Image that called me.pdf, slide 11. Then there are also XXE attacks via <!ENTITY> declarations ,ex "Billion Laughs". Just like by spec HTML, while some SVG content may be harmful, it can be a legitimate part of HTML content, if safe. In my case, a required feature.

I'd imagine HtmlSanitizer would be able to traverse and clean up the SVG tag content just as well as HTML, but would require a significantly different configuration regarding safe elements/attributes compared to the default conf. Could HtmlSanitizer be used for SVG? Are there any known blockers?

Since SVG contains a ton of safe tags and attributes, it would be cool to have (optional) by-SVG-standard configuration preset for SVG (standalone or inside HTML)? A shared implementation would be more secure compared to each rolling their own.

This seems to chime in with the talk on presets in #350 .

@tiesont
Copy link

tiesont commented Nov 18, 2022

Could HtmlSanitizer be used for SVG? Are there any known blockers?

I think the answer to both of those questions are: it depends on AngleSharp. AngleSharp does the actual markup parsing, so if it can handle XML that isn't XHTML, I'd imagine it wouldn't take much to wrangle it into parsing SVG.

In fact, it does look like AngleSharp supports SVG processing: AngleSharp/AngleSharp#886

So something like this example might work for you: https://github.com/mganss/HtmlSanitizer/wiki/Examples#ex3-replacing-the-default-formatter - this assumes you want parse the SVG separately from anything else.

@mganss
Copy link
Owner

mganss commented Nov 21, 2022

AFAICT this use case is already possible as discussed in #287.

@ImrePyhvel Do you think there currently is potential for malicious markup to slip through in the default configuration? Do you have examples?

Re presets: Are there known allow lists for SVG elements? How do other sanitizers handle this?

@ImrePyhvel
Copy link
Author

Yes, #287 hints the use case is possible, though the path of blindly trust the entire SVG tag content is not ok for me. Explicitly configuring allowedTags and allowedAttributes with XHtmlFormatter trick should be doable but requires enough knowledge about SVG and possible threats. Haven't had time yet to test blocking XXE and Billion-laughs-likes, though.

@mganss AFAIK the current default HtmlSanitizer configuration seems perfectly safe as it is stripping it out everything from svg root. this also prevents hopping into SVG XML-like behavior inside HTML.

I'm not too familiar with SVG or SVG-based attacks, which is why I'm reluctant to build an SVG whitelist myself and would prefer to find some community-verified whitelist. Also, I have a feeling the possible HTML-in-XML-in-HTML mess can cause all kinds of funky cases that a simple tag/attribute whitelist may require cutting big SVG features just to be on the safe side.

The key would be to get a comprehensive list of threat examples to sanitize against. Some things to consider:

Regarding existing tools and whitelists, I found surprisingly little. More promising leads:

@ImrePyhvel
Copy link
Author

ImrePyhvel commented Nov 22, 2022

Did some testing..

Testfile

<?xml version="1.0" encoding="UTF-8"?>
<!--comment -->
<!DOCTYPE text [
	<!ENTITY xxe SYSTEM "file:///etc/hostname">
	<!ENTITY % data SYSTEM "php://filter/convert.base64-encode/resource=/etc/hostname">
	<!ENTITY % param1 SYSTEM "ftp://example.org:2121/">
	<!ENTITY lol1 "laugh">
	<!ENTITY lol2 "&lol1;&lol1;&lol1;">
	<!ENTITY lol "&lol2;&lol2;&lol2;">
]>	
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:ev="http://www.w3.org/2001/xmlevents" onload="alert('loadSvg')">
	<g fill="white" stroke="green" stroke-width="5" onload="alert(0)">
		<circle cx="40" cy="40" r="25" />
		<circle cx="60" cy="60" r="25" />
	</g>

	<text font-size="16" x ="20" y="20">&lol;</text>
	<rect width="10" height="100">
		<animate xlink:href="alert(1)" attributeName="rx" values="0;5;0" dur="10s" repeatCount="indefinite"/>
		<set attributeName="onmouseover" to="alert(3)"/>
	</rect>
	<foreignObject xlink:href="alert(2)">
		<b> some html </b>
	</foreignObject>	
	<handler ev:event="load">alert(4)</handler>

	<script>
		alert('script')
	</script>

	<![CDATA[<SCRIPT>alert(5)</SCRIPT>]]>
	
	<style>
		svg { background-url : "javascript:alert(css)"; }
		<SCRIPT>alert(style)</SCRIPT>
	</style>
</svg>

I could not trigger the xlink:href and many other shown alerts as XSS, but retained them in sample file just in case. They should be removed to be on the safe side.

The code

Sanitizing code using whitelist from https://github.com/darylldoyle/svg-sanitizer:

var sanitizer = new HtmlSanitizer(
	allowedTags: AllowedSvgTags,
	allowedAttributes: AllowedSvgAttributes,
	allowedCssProperties: HtmlSanitizer.DefaultAllowedCssProperties,
	allowedSchemes: Enumerable.Empty<String>()
 );
var formatter = AngleSharp.Xhtml.XhtmlMarkupFormatter.Instance;
var sanitizedSvg = sanitizer.Sanitize(svg, outputFormatter: formatter);

The result

I removed manually a lot of empty whitespace left behind for brevity:

]&gt;
<svg xmlns="http://www.w3.org/2000/svg">
	<g fill="white" stroke="green" stroke-width="5">
		<circle cx="40" cy="40" r="25" />
		<circle cx="60" cy="60" r="25" />
	</g>

	<text font-size="16" x="20" y="20">&amp;lol;</text>
	<rect width="10" height="100">
	</rect>
	&lt;SCRIPT&gt;alert(5)&lt;/SCRIPT&gt;
</svg>

This works ok-ish, but it treats the input as html and hence:

  • strips the <?xml ... . Which is not a big deal but also not ideal.
  • does strip entire <!DOCTYPE ... >, including custom entities within, but keeps some ugly leftovers like ]&gt; in output.
  • leaves behind a lot of whitespace around removed elements - does not matter much.

I could remove <!doctype> cleanly using a preprocessing step, but that is not ideal as it causes to parse the file twice. For the sake of anti-XXE, I think it is ok to lose doctype entirely from SVG file, though it would be preferable to lose just <!Entity> elements. Could be improved if this starts to matter..

Also saw that Anglesharp.XML can also parse simpler SVG, but for some reason it choked with parser error when doctype was included. Anyway, it should not matter as there does not seem to be a way to reuse the already parsed ISvgDocument/IDocument for HtmlSanitizer.

To sum it up

it kinda works and checking with some real world samples some files will be hit by stripped features implemented as xml extensions in separate xmlns, including

  • metadata (rdf)
  • licensing (cc, copyright may also be in comments)
  • embedded raster images (ex: <image xlink:href="data:image/png;base64, ... "/>

Whitelisting some tags/attributes from xmlns extensions is trickier as xmlns could be set up with any alias name + it is difficult to assess if/which extension tags could be considered unsafe in their reader. The scope would become a mess easily.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants