Lightweight schema-based XML data extraction to plain objects (JSON)
To read data from the following XML:
```xml
<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
  <channel>
    <title>MRSS Title</title>
    <item>
      <title>Item 1</title>
      <media:content url="http://example.com/1.mp4" medium="video"/>
      <media:comment>Item 1 Comment</media:comment>
    </item>
    <item>
      <title>Item 2</title>
      <media:content url="http://example.com/2.mp4" medium="video"/>
      <media:comment>Item 2 Comment</media:comment>
    </item>
  </channel>
</rss>
```
Data extraction schema:
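```javascript
import { attribute, content, string, readXML, prefixNamespace } from "xmldom-js";

// Schema keys mirror element names; an array wraps the schema of a repeated element.
var rssSchema = {
  rss: {
    channel: {
      title: content(string), // text content, read as a string
      item: [{
        title: content(string),
        "media:content": { url: attribute() }, // value of the url attribute
        "media:comment": content(string)
      }]
    }
  }
};
```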
First, the XML string is parsed with the browser's `DOMParser`; the resulting document is then read into plain JS objects using the schema:
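```javascript
// rssString holds the XML shown above as a string; DOMParser is the browser built-in.
var rssXml = new DOMParser().parseFromString(rssString, "text/xml");
var rssJSON = readXML(rssXml, rssSchema, prefixNamespace);
```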
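To make the schema semantics concrete, here is a minimal sketch of how a reader like `readXML` could walk such a schema. It is not the xmldom-js implementation: `content`, `attribute` and `string` below are hypothetical stand-ins that only share the names used above, and instead of a browser DOM the sketch uses an assumed `{ name, attrs, children, text }` element model so it runs in Node. Element names are compared as written, prefix included, which roughly corresponds to `prefixNamespace`. Reading only the first match for an object schema is also an assumption of this sketch.

```javascript
// Hypothetical stand-ins for the schema helpers (NOT the xmldom-js code).
class Content {
  constructor(convert) { this.convert = convert; }
}
class Attribute {}
const content = (convert) => new Content(convert);
const attribute = () => new Attribute();
const string = (text) => String(text);

// Assumed element model: { name, attrs, children, text }, name = "prefix:local".
function readElement(el, schema) {
  if (schema instanceof Content) return schema.convert(el.text);
  const out = {};
  for (const [key, sub] of Object.entries(schema)) {
    if (sub instanceof Attribute) {
      // The schema key doubles as the attribute name.
      if (Object.prototype.hasOwnProperty.call(el.attrs, key)) out[key] = el.attrs[key];
      continue;
    }
    // Array schema: every matching child. Object schema: the first match only (assumption).
    const repeated = Array.isArray(sub);
    const childSchema = repeated ? sub[0] : sub;
    const matches = el.children.filter((child) => child.name === key);
    if (repeated) out[key] = matches.map((child) => readElement(child, childSchema));
    else if (matches.length > 0) out[key] = readElement(matches[0], childSchema);
  }
  return out;
}

// The top-level schema key names the document element.
function readTree(root, schema) {
  const out = {};
  for (const [key, sub] of Object.entries(schema)) {
    if (root.name === key) out[key] = readElement(root, sub);
  }
  return out;
}

// Tiny builder that recreates the sample RSS document in the assumed model.
const el = (name, attrs = {}, children = [], text = "") => ({ name, attrs, children, text });
const item = (n) =>
  el("item", {}, [
    el("title", {}, [], `Item ${n}`),
    el("media:content", { url: `http://example.com/${n}.mp4`, medium: "video" }),
    el("media:comment", {}, [], `Item ${n} Comment`),
  ]);
const tree = el("rss", { version: "2.0" }, [
  el("channel", {}, [el("title", {}, [], "MRSS Title"), item(1), item(2)]),
]);

const rssSchema = {
  rss: {
    channel: {
      title: content(string),
      item: [{
        title: content(string),
        "media:content": { url: attribute() },
        "media:comment": content(string),
      }],
    },
  },
};

const rssJSON = readTree(tree, rssSchema);
console.log(JSON.stringify(rssJSON, null, 2));
```

With this model the sketch reproduces the JSON shown next; the `medium` attribute is dropped simply because the schema does not ask for it.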
The resulting JSON:
```json
{
  "rss": {
    "channel": {
      "title": "MRSS Title",
      "item": [
        {
          "title": "Item 1",
          "media:content": {
            "url": "http://example.com/1.mp4"
          },
          "media:comment": "Item 1 Comment"
        },
        {
          "title": "Item 2",
          "media:content": {
            "url": "http://example.com/2.mp4"
          },
          "media:comment": "Item 2 Comment"
        }
      ]
    }
  }
}
```
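Because the result consists only of plain objects, arrays and strings, it can be consumed with ordinary property access. A short sketch, with the output above restated as a literal; note that prefixed keys such as `"media:content"` need bracket notation, since `:` is not allowed in a dot-access identifier.

```javascript
// The extraction result above, restated as a literal for this example.
const rssJSON = {
  rss: {
    channel: {
      title: "MRSS Title",
      item: [
        { title: "Item 1", "media:content": { url: "http://example.com/1.mp4" }, "media:comment": "Item 1 Comment" },
        { title: "Item 2", "media:content": { url: "http://example.com/2.mp4" }, "media:comment": "Item 2 Comment" },
      ],
    },
  },
};

// Repeated elements arrive as a real array, so Array methods apply directly.
const videoUrls = rssJSON.rss.channel.item.map((entry) => entry["media:content"].url);
// videoUrls: ["http://example.com/1.mp4", "http://example.com/2.mp4"]

const comments = rssJSON.rss.channel.item.map((entry) => entry["media:comment"]);

// Only plain data is involved, so a JSON round trip preserves the structure.
const roundTrip = JSON.parse(JSON.stringify(rssJSON));
```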
Several namespace-handling modes are available:
- `ignoreNamespace` - namespace URIs and prefixes are both ignored.
- `prefixNamespace` - namespace prefixes are used as written and the namespaceURI is ignored. Avoid this mode for XML produced by third parties: the prefix bound to a namespace is chosen by the document's author and may change, so only the URI identifies the namespace reliably. It is reasonably safe for in-house XML whose prefixes you control.
- `namespaces(defaultURI, { prefix: URI })` - supply the default namespace URI and a map of the supported prefixes to their namespace URIs.
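The practical difference between the modes is how a schema key like `"media:content"` is compared with an element. The sketch below illustrates this under stated assumptions and is not the library's code: `ignoreMode`, `prefixMode` and `namespacesMode` are hypothetical names, elements are modelled by the standard DOM `Element` properties `prefix`, `localName` and `namespaceURI`, and `ignoreMode` is assumed to drop the prefix from the schema key. It shows why prefix-based matching is fragile for third-party feeds: a producer that binds the MRSS URI to `m:` instead of `media:` emits the same data under a different prefix.

```javascript
// Hypothetical matchers: each answers "does element `el` match schema key `key`?"
// `el` carries the DOM Element fields prefix, localName and namespaceURI.
const MRSS = "http://search.yahoo.com/mrss/";

// Split a schema key such as "media:content" into prefix and local name.
function splitKey(key) {
  const i = key.indexOf(":");
  return i < 0
    ? { prefix: null, local: key }
    : { prefix: key.slice(0, i), local: key.slice(i + 1) };
}

// ignoreNamespace-like: compare local names only (assumes the key's prefix is dropped).
function ignoreMode(key, el) {
  return el.localName === splitKey(key).local;
}

// prefixNamespace-like: compare the prefix as written, never the URI.
function prefixMode(key, el) {
  const { prefix, local } = splitKey(key);
  return (el.prefix || null) === prefix && el.localName === local;
}

// namespaces(defaultURI, map)-like: resolve the key's prefix to a URI, then compare URIs.
function namespacesMode(defaultURI, prefixes) {
  return (key, el) => {
    const { prefix, local } = splitKey(key);
    const uri = prefix === null ? defaultURI : prefixes[prefix];
    return uri !== undefined && el.namespaceURI === uri && el.localName === local;
  };
}

// The same MRSS element, once with the usual prefix and once with another one.
const usual = { prefix: "media", localName: "content", namespaceURI: MRSS };
const renamed = { prefix: "m", localName: "content", namespaceURI: MRSS };
// RSS itself declares no default namespace, so plain elements have a null namespaceURI.
const byUri = namespacesMode(null, { media: MRSS });

console.log(prefixMode("media:content", usual));   // true
console.log(prefixMode("media:content", renamed)); // false: prefix differs
console.log(byUri("media:content", renamed));      // true: same namespace URI
console.log(ignoreMode("media:content", renamed)); // true: local name only
```

The URI-based matcher is the only one that both accepts the renamed prefix and still tells MRSS elements apart from same-named elements in another namespace.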
An npm-compatible bundler (such as webpack) is required. Both CommonJS and ES6 module builds are provided, transpiled to ES5.