JsHtmlSanitizer
(legacy summary: How to use caja as a stand-alone client side sanitizer)
The Caja project includes an HTML sanitizer written in JavaScript that can be used independently of the cajoler. You can use it to remove potentially executable JavaScript from a snippet of HTML. To use it, first build html-sanitizer-minified.js by running ant.
Use a <script> tag to include the resulting com/google/caja/plugin/html-sanitizer-minified.js in your program. To sanitize a snippet of HTML, call html_sanitize(htmlSnippet, urlTransformer, nameIdClassTransformer), where:
- htmlSnippet is the snippet you want to sanitize.
- urlTransformer is a function that is called on every URL in htmlSnippet. javascript: URLs are removed before the urlTransformer is called. The transformer allows you to whitelist URLs or rewrite them. For example, you may only want to allow URLs to a particular domain.
- nameIdClassTransformer is a function that is called on every id, name and class in htmlSnippet.
The return value is the HTML snippet with all script and style tags removed, and URLs, ids, names and classes rewritten according to the transformers.
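As a rough illustration of the two transformer arguments, the sketch below shows one way they might be written. The names allowListUrl, prefixNames and ALLOWED_HOSTS, the example.com hosts and the user- prefix are all hypothetical. Returning undefined for a rejected URL follows the pattern of the urlX example on this page. The sketch also assumes the runtime provides the standard URL constructor and that a class value may hold several space-separated tokens.

```javascript
// Hypothetical allow-list; example.com is a placeholder domain.
var ALLOWED_HOSTS = ['example.com', 'www.example.com'];

// Candidate urlTransformer: keep absolute http(s) URLs on allowed hosts,
// return undefined for everything else.
function allowListUrl(url) {
  var parsed;
  try {
    parsed = new URL(url); // throws on relative or malformed URLs
  } catch (e) {
    return undefined;
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return undefined;
  }
  return ALLOWED_HOSTS.indexOf(parsed.hostname) !== -1 ? url : undefined;
}

// Candidate nameIdClassTransformer: prefix each whitespace-separated token
// so that sanitized ids, names and classes cannot collide with the host page.
function prefixNames(value) {
  return value
    .split(/\s+/)
    .filter(Boolean)
    .map(function (token) { return 'user-' + token; })
    .join(' ');
}
```

These would then be passed as html_sanitize(htmlSnippet, allowListUrl, prefixNames).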
The sanitizer removes style tags because they can include code that is interpreted as JavaScript on some browsers and because styles can affect the entire page, not just the snippet being sanitized. Style attributes can be safely contained if they are sanitized. If you'd like to sanitize style attributes (rather than style tags), you can include com/google/caja/plugin/html-css-sanitizer-minified.js instead. This exposes exactly the same API as html_sanitize but also allows sanitized CSS property names and values in style attributes and rewrites any URLs in inline styles using the urlTransformer.
If you need more control, you can use html.makeSaxParser to create your own SAX-style processor. makeSaxParser takes as its argument an object that contains event handlers like:
var mySaxParser = html.makeSaxParser({
  startDoc: function (x) {
    // called first, before processing starts.
  },
  startTag: function (tagNameLowerCase, attribs, x) {
    // called on start tags. may modify attribs.
  },
  endTag: function (tagName, x) {
    // called on end tags.
  },
  pcdata: function (plainText, x) {
    // plainText has entities replaced with the literal value.
  },
  rcdata: function (plainText, x) {
    // contents of a TITLE, TEXTAREA, or similar tag.
  },
  cdata: function (plainText, x) {
    // contents of a SCRIPT, STYLE, XMP, or similar tag.
  },
  endDoc: function (x) {
    // called when processing has finished.
  }
});
After this call, mySaxParser is a function that takes HTML text and an arbitrary value that will be passed as the parameter x to the event handlers above.
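To make the role of the x parameter concrete, here is a minimal sketch of a plain-text extractor built on the handler interface described above. The function name makeTextExtractor is hypothetical. It takes the parser factory as an argument, so the sketch does not itself depend on the sanitizer script being loaded. Every handler is supplied, as a no-op where unused, because this page does not say whether handlers may be omitted.

```javascript
// Build a text extractor from a makeSaxParser implementation
// (for example html.makeSaxParser once the sanitizer script is loaded).
function makeTextExtractor(makeSaxParser) {
  function noop() {}
  var parse = makeSaxParser({
    startDoc: noop,
    startTag: noop,
    endTag: noop,
    // `out` is the value passed as the second argument to the parser.
    pcdata: function (plainText, out) { out.push(plainText); },
    rcdata: function (plainText, out) { out.push(plainText); },
    cdata: noop, // SCRIPT, STYLE and similar content is dropped
    endDoc: noop
  });
  return function (htmlText) {
    var out = [];
    parse(htmlText, out);
    return out.join('');
  };
}
```

With the sanitizer loaded, usage would be along the lines of var extractText = makeTextExtractor(html.makeSaxParser); followed by extractText(someHtml).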
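For example, the following page sanitizes a snippet with html_sanitize and shows the result:
<script src="html-sanitizer-minified.js"></script>
<script>
  // Allow only absolute http and https URLs; anything else returns undefined.
  function urlX(url) { if (/^https?:\/\//.test(url)) { return url; } }
  // Leave ids, names and classes unchanged.
  function idX(id) { return id; }
  alert(html_sanitize('<b>hello</b><img src="http://asdf"><a href="javascript:alert(0)"><script src="http://dfd"><\/script>', urlX, idX));
</script>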