Not really a feature request after all, more of a I'm stupid and realized XMLBuilder2 is awesome, doing exactly what I thought it didn't.~~question: How to convert existing JSON to XML?~~ ~~Feature Request: Compatibility with xq/yq~~ #129

Closed
jasonkhanlar opened this issue Apr 5, 2022 · 6 comments

@jasonkhanlar

Is your feature request related to a problem? Please describe.
I am using the command-line tool xq to convert an XML file to JSON, and I would like to convert that JSON back into exactly the same XML using Node.js, because yq -x fails in some cases with "yq: Error running jq: ReaderError: unacceptable character #x0081: control characters are not allowed" on data that I want to preserve if possible. I prefer the JavaScript object structure produced by xq over the layout generated by xmlbuilder2. At the very least, I'd rather not have to manipulate the object generated by xq to conform to xmlbuilder2's conventions just to write out exactly the same XML that was originally processed (as a first check that the round trip works smoothly).
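To narrow down which values trigger the yq -x failure, one option is to scan the text for C1 control characters (U+0080 to U+009F), the range that #x0081 falls in. This is a minimal sketch; findC1Controls is a hypothetical helper, and it assumes the offending characters are in this range (the YAML reader behind yq may accept some of them, such as U+0085).

```javascript
// Hypothetical helper: list every C1 control character (U+0080 to U+009F)
// in a string, reporting its index and its code in the same #xNNNN form
// used by the yq error message.
function findC1Controls(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (code >= 0x80 && code <= 0x9f) {
      hits.push({ index: i, char: '#x' + code.toString(16).toUpperCase().padStart(4, '0') });
    }
  }
  return hits;
}

// Example: one U+0081 character at index 2
console.log(findC1Controls('Hi\u0081there'));
```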

Describe the solution you'd like
I would like the JavaScript object structure generated by existing XML<->JSON Linux tools to be easy to work with in Node.js, so that Node.js libraries can serve as drop-in replacements.

Describe alternatives you've considered
I can't keep track of all the Node.js scripts I've tried over the last month or two, but I haven't found an ideal Node.js solution that converts XML to JSON and JSON back to XML seamlessly, especially while preserving data without modification (all sorts of data from Wikipedia XML dumps: Templates, Modules, and other namespaces). Some of the libraries didn't recognize the @ convention for attribute names, or placed attributes at nonstandard locations in the JS object hierarchy, which makes converting back and forth between tools difficult.

Additional context

A brief example:

Input XML (e.g. from the Wikipedia export at https://en.wikipedia.org/wiki/Special:Export):

<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.mediawiki.org/xml/export-0.10/ http://www.mediawiki.org/xml/export-0.10.xsd" version="0.10" xml:lang="en">
  <siteinfo>
    <sitename>Wikipedia</sitename>
    <dbname>enwiki</dbname>
    <base>https://en.wikipedia.org/wiki/Main_Page</base>
    <generator>MediaWiki 9000</generator>
    <case>first-letter</case>
    <namespaces>
      <namespace key="-2" case="first-letter">Media</namespace>
      <namespace key="-1" case="first-letter">Special</namespace>
      <namespace key="0" case="first-letter" />
      <namespace key="1" case="first-letter">Talk</namespace>
      <namespace key="2" case="first-letter">User</namespace>
      <namespace key="3" case="first-letter">User talk</namespace>
      <namespace key="4" case="first-letter">Wikipedia</namespace>
      <namespace key="5" case="first-letter">Wikipedia talk</namespace>
      <namespace key="6" case="first-letter">File</namespace>
      <namespace key="7" case="first-letter">File talk</namespace>
      <namespace key="8" case="first-letter">MediaWiki</namespace>
      <namespace key="9" case="first-letter">MediaWiki talk</namespace>
      <namespace key="10" case="first-letter">Template</namespace>
      <namespace key="11" case="first-letter">Template talk</namespace>
      <namespace key="12" case="first-letter">Help</namespace>
      <namespace key="13" case="first-letter">Help talk</namespace>
      <namespace key="14" case="first-letter">Category</namespace>
      <namespace key="15" case="first-letter">Category talk</namespace>
      <namespace key="100" case="first-letter">Portal</namespace>
      <namespace key="101" case="first-letter">Portal talk</namespace>
      <namespace key="118" case="first-letter">Draft</namespace>
      <namespace key="119" case="first-letter">Draft talk</namespace>
      <namespace key="710" case="first-letter">TimedText</namespace>
      <namespace key="711" case="first-letter">TimedText talk</namespace>
      <namespace key="828" case="first-letter">Module</namespace>
      <namespace key="829" case="first-letter">Module talk</namespace>
      <namespace key="2300" case="first-letter">Gadget</namespace>
      <namespace key="2301" case="first-letter">Gadget talk</namespace>
      <namespace key="2302" case="case-sensitive">Gadget definition</namespace>
      <namespace key="2303" case="case-sensitive">Gadget definition talk</namespace>
    </namespaces>
  </siteinfo>
  <page>
    <title>Template:Template page name here</title>
    <ns>10</ns>
    <id>1</id>
    <revision>
      <id>1</id>
      <timestamp>1970-01-01T00:00:00Z</timestamp>
      <contributor>
        <username>User</username>
        <id>1</id>
      </contributor>
      <comment>/doc</comment>
      <model>wikitext</model>
      <format>text/x-wiki</format>
      <text bytes="2" xml:space="preserve">Hi</text>
      <sha1>94dd9e08c129c785f7f256e82fbe0a30e6d1ae40</sha1>
    </revision>
  </page>
</mediawiki>

Generated JavaScript object (JSON) from cat "${file}" | xq:

{
  mediawiki: {
    '@xmlns': 'http://www.mediawiki.org/xml/export-0.10/',
    '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
    '@xsi:schemaLocation': 'http://www.mediawiki.org/xml/export-0.10/ http://www.mediawiki.org/xml/export-0.10.xsd',
    '@version': '0.10',
    '@xml:lang': 'en',
    siteinfo: {
      sitename: 'Wikipedia',
      dbname: 'enwiki',
      base: 'https://en.wikipedia.org/wiki/Main_Page',
      generator: 'MediaWiki 9000',
      case: 'first-letter',
      namespaces: {
        namespace: [
          { '@key': '-2', '@case': 'first-letter', '#text': 'Media' },
          { '@key': '-1', '@case': 'first-letter', '#text': 'Special' },
          { '@key': '0', '@case': 'first-letter' },
          { '@key': '1', '@case': 'first-letter', '#text': 'Talk' },
          { '@key': '2', '@case': 'first-letter', '#text': 'User' },
          { '@key': '3', '@case': 'first-letter', '#text': 'User talk' },
          { '@key': '4', '@case': 'first-letter', '#text': 'Wikipedia' },
          { '@key': '5', '@case': 'first-letter', '#text': 'Wikipedia talk' },
          { '@key': '6', '@case': 'first-letter', '#text': 'File' },
          { '@key': '7', '@case': 'first-letter', '#text': 'File talk' },
          { '@key': '8', '@case': 'first-letter', '#text': 'MediaWiki' },
          { '@key': '9', '@case': 'first-letter', '#text': 'MediaWiki talk' },
          { '@key': '10', '@case': 'first-letter', '#text': 'Template' },
          { '@key': '11', '@case': 'first-letter', '#text': 'Template talk' },
          { '@key': '12', '@case': 'first-letter', '#text': 'Help' },
          { '@key': '13', '@case': 'first-letter', '#text': 'Help talk' },
          { '@key': '14', '@case': 'first-letter', '#text': 'Category' },
          { '@key': '15', '@case': 'first-letter', '#text': 'Category talk' },
          { '@key': '100', '@case': 'first-letter', '#text': 'Portal' },
          { '@key': '101', '@case': 'first-letter', '#text': 'Portal talk' },
          { '@key': '118', '@case': 'first-letter', '#text': 'Draft' },
          { '@key': '119', '@case': 'first-letter', '#text': 'Draft talk' },
          { '@key': '710', '@case': 'first-letter', '#text': 'TimedText' },
          { '@key': '711', '@case': 'first-letter', '#text': 'TimedText talk' },
          { '@key': '828', '@case': 'first-letter', '#text': 'Module' },
          { '@key': '829', '@case': 'first-letter', '#text': 'Module talk' },
          { '@key': '2300', '@case': 'first-letter', '#text': 'Gadget' },
          { '@key': '2301', '@case': 'first-letter', '#text': 'Gadget talk' },
          { '@key': '2302', '@case': 'case-sensitive', '#text': 'Gadget definition' },
          { '@key': '2303', '@case': 'case-sensitive', '#text': 'Gadget definition talk' }
        ]
      }
    },
    page: {
      title: 'Template:Template page name here',
      ns: '10',
      id: '1',
      revision: {
        id: '1',
        timestamp: '1970-01-01T00:00:00Z',
        contributor: { username: 'User', id: '1' },
        comment: '/doc',
        model: 'wikitext',
        format: 'text/x-wiki',
        text: { '@bytes': '2', '@xml:space': 'preserve', '#text': 'Hi' },
        sha1: '94dd9e08c129c785f7f256e82fbe0a30e6d1ae40'
      }
    }
  }
}
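For reference, the xq layout is simple enough to serialize by hand: '@name' keys become attributes, '#text' becomes the element's text, and an array means the element is repeated. The sketch below is a hypothetical objectToXml helper (not part of xq or xmlbuilder2), assuming no mixed content, comments, CDATA, or processing instructions; it writes no indentation and no XML declaration.

```javascript
// Escape the characters that cannot appear literally in text or attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical serializer for one element in the xq-style layout.
function elementToXml(name, value) {
  // An array means the element repeats: serialize each item under the same tag
  if (Array.isArray(value)) {
    return value.map((item) => elementToXml(name, item)).join('');
  }
  // null, empty string, or a scalar: an element with text only (or empty)
  if (value === null || typeof value !== 'object') {
    return value === null || value === ''
      ? `<${name}/>`
      : `<${name}>${escapeXml(value)}</${name}>`;
  }
  let attrs = '';
  let body = '';
  for (const [key, child] of Object.entries(value)) {
    if (key.startsWith('@')) {
      attrs += ` ${key.slice(1)}="${escapeXml(child)}"`; // attribute
    } else if (key === '#text') {
      body += escapeXml(child); // element text
    } else {
      body += elementToXml(key, child); // child element(s)
    }
  }
  return body === '' ? `<${name}${attrs}/>` : `<${name}${attrs}>${body}</${name}>`;
}

// The top-level object has a single key: the root element's name.
function objectToXml(obj) {
  const [rootName] = Object.keys(obj);
  return elementToXml(rootName, obj[rootName]);
}

const text = { '@bytes': '2', '@xml:space': 'preserve', '#text': 'Hi' };
console.log(objectToXml({ text })); // <text bytes="2" xml:space="preserve">Hi</text>
```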

Note that mediawiki.page becomes an array instead of an object when there is more than one page element, like:

    ...
    page: [
      {
        id: '1',
        timestamp: '1970-01-01T00:00:00Z',
        contributor: { username: 'User', id: '1' },
        comment: '/doc',
        model: 'wikitext',
        format: 'text/x-wiki',
        text: { '@bytes': '2', '@xml:space': 'preserve', '#text': 'Hi' },
        sha1: '94dd9e08c129c785f7f256e82fbe0a30e6d1ae40'
      },
      {
        id: '2',
        timestamp: '1970-01-01T00:00:00Z',
        contributor: { username: 'User', id: '2' },
        comment: '/doc',
        model: 'wikitext',
        format: 'text/x-wiki',
        text: { '@bytes': '3', '@xml:space': 'preserve', '#text': 'Bye' },
        sha1: 'f792424064d0ca1a7d14efe0588f10c052d28e69'
      }
    ]
    ...
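Because of this, code that reads xq output usually has to accept both shapes. A minimal sketch, using a hypothetical asArray helper and made-up page titles; it treats a missing key as "no elements" but keeps null, which is how an empty element may come through.

```javascript
// Hypothetical helper: normalize "one child as an object" vs "several children
// as an array" into an array. A missing key yields []; null (an empty element)
// is kept as a single item.
function asArray(value) {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

// Made-up sample documents in the xq layout
const onePage = { mediawiki: { page: { title: 'Template:A', ns: '10' } } };
const twoPages = { mediawiki: { page: [{ title: 'Template:A' }, { title: 'Template:B' }] } };

for (const doc of [onePage, twoPages]) {
  // The same loop works whether there is one page or many
  console.log(asArray(doc.mediawiki.page).map((p) => p.title));
}
```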

I also prepared this with the idea of suggesting something similar to other XML<->JSON projects, so that they might consider the same kind of interoperability.

@jasonkhanlar jasonkhanlar added the enhancement New feature or request label Apr 5, 2022
@jasonkhanlar
Author

jasonkhanlar commented Apr 5, 2022

Also, for comparison with the xq output above, here is the same XML converted to a JavaScript object with xmlbuilder2:

let data = await fs.promises.readFile(file, { encoding: 'utf8' });
printLog('test',xmlbuilder2.convert(data, { format: "object" }));

This shows almost exactly the same thing; the only difference is that the text content is under mediawiki.page.revision.text['#'] instead of mediawiki.page.revision.text['#text'], and likewise '#' instead of '#text' in the namespace entries.
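If the xmlbuilder2 object has to match xq's layout exactly, a small recursive key rename covers the '#' vs '#text' difference. This is a minimal sketch with a hypothetical renameKey helper; it only renames keys that are exactly '#'. xmlbuilder2's builder options also document a convert setting for these prefixes, which may make the rename unnecessary; check the documentation for your version.

```javascript
// Hypothetical helper: recursively rename one object key to another,
// e.g. xmlbuilder2's default text key '#' to xq's '#text'.
// Arrays and nested objects are walked; other values are returned unchanged.
function renameKey(value, from, to) {
  if (Array.isArray(value)) {
    return value.map((item) => renameKey(item, from, to));
  }
  if (value === null || typeof value !== 'object') {
    return value;
  }
  const out = {};
  for (const [key, child] of Object.entries(value)) {
    out[key === from ? to : key] = renameKey(child, from, to);
  }
  return out;
}

const fromXmlbuilder2 = { text: { '@bytes': '2', '@xml:space': 'preserve', '#': 'Hi' } };
// The '#' key comes back as '#text'; attributes are untouched
console.log(renameKey(fromXmlbuilder2, '#', '#text'));
```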

This is actually much better than I initially noticed when reading the documentation at https://npmjs.com/package/xmlbuilder2. I also hadn't realized it converts XML to JSON; I was looking more into how to convert the JSON I already have into XML, to reproduce the source XML file exactly. I initially thought xmlbuilder2 would be able to convert my existing JSON into XML, but I don't see how to do that.

@jasonkhanlar jasonkhanlar changed the title Feature Request: Compatibility with xq/yq Not really a feature request after all, more of a question: How to convert existing JSON to XML? ~~Feature Request: Compatibility with xq/yq~~ Apr 5, 2022
@jasonkhanlar
Author

jasonkhanlar commented Apr 5, 2022

I figured out how to convert JSON to XML:

xmlbuilder2.convert(archive, { prettyPrint: true });

and the output is perfect! It even adds <?xml version="1.0"?> at the very beginning, which exposes the fact that MediaWiki doesn't include an XML declaration in its exports.
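When checking that the regenerated XML matches the original, the added declaration gets in the way of a plain string comparison. A minimal sketch that strips a leading declaration before comparing, assuming both documents use the same indentation, line endings, and empty-element style (stripXmlDeclaration and sameDocument are hypothetical helpers). xmlbuilder2's writer options also include a headless flag for omitting the declaration, if your version supports it.

```javascript
// Hypothetical helper: remove an optional BOM and a leading <?xml ...?> declaration.
function stripXmlDeclaration(xml) {
  return xml.replace(/^\uFEFF?\s*<\?xml[^?]*\?>\s*/, '');
}

// Hypothetical helper: compare two serializations, ignoring the declaration
// and surrounding whitespace. Everything else must match character for character.
function sameDocument(original, regenerated) {
  return stripXmlDeclaration(original).trim() === stripXmlDeclaration(regenerated).trim();
}

const original = '<page>\n  <ns>10</ns>\n</page>\n';
const regenerated = '<?xml version="1.0"?>\n<page>\n  <ns>10</ns>\n</page>';
console.log(sameDocument(original, regenerated)); // true
```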

@jasonkhanlar jasonkhanlar changed the title Not really a feature request after all, more of a question: How to convert existing JSON to XML? ~~Feature Request: Compatibility with xq/yq~~ Not really a feature request after all, more of a I'm stupid and realized XMLBuilder2 is awesome, doing exactly what I thought it didn't.~~question: How to convert existing JSON to XML?~~ ~~Feature Request: Compatibility with xq/yq~~ Apr 5, 2022
@oozcitak
Owner

oozcitak commented Apr 5, 2022

This is actually much better than I initially noticed when reading documentation at https://npmjs.com/package/xmlbuilder2 ...

FYI, the full documentation is here: https://oozcitak.github.io/xmlbuilder2/

@jasonkhanlar
Author

I'll just stick this in here:

I glanced at https://en.wikipedia.org/wiki/XML#Key_terminology and noticed encoding="UTF-8" in the declaration, so I tried

xmlbuilder2.convert(json, { encoding: 'utf8', prettyPrint: true });

thinking the output might then appear as <?xml version="1.0" encoding="UTF-8"?>, but it didn't; it stayed as <?xml version="1.0"?>. That's alright in my case, but I'm curious (I haven't checked) whether an encoding other than UTF-8/16/32 would also be ignored if specified: https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings

@jasonkhanlar
Author

Aha! As I learned from #130 (comment), the encoding option is only recognized when it is passed as BuilderOptions, i.e. as the first argument to convert, ahead of the contents and the writer options.

Therefore, this code works to set the encoding value:

let xmlverify = xmlbuilder2.convert({ encoding: 'utf8' }, json, { format: 'xml', prettyPrint: true, spaceBeforeSlash: true });
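The encoding ends up in the XML declaration rather than in how the body is written, which is presumably why it lives with the builder options rather than the writer options. The sketch below (a hypothetical xmlDeclaration helper, not xmlbuilder2 code) shows how a declaration is assembled per the XML 1.0 grammar. Assuming xmlbuilder2 writes the encoding value verbatim, passing 'UTF-8', the IANA name the XML specification recommends, would give encoding="UTF-8" rather than encoding="utf8".

```javascript
// Hypothetical helper: build an XML declaration following the XML 1.0 grammar
//   '<?xml' VersionInfo EncodingDecl? SDDecl? '?>'
// encoding and standalone are optional and omitted when not given.
function xmlDeclaration({ version = '1.0', encoding, standalone } = {}) {
  let decl = `<?xml version="${version}"`;
  if (encoding !== undefined) decl += ` encoding="${encoding}"`;
  if (standalone !== undefined) decl += ` standalone="${standalone ? 'yes' : 'no'}"`;
  return decl + '?>';
}

console.log(xmlDeclaration());                      // <?xml version="1.0"?>
console.log(xmlDeclaration({ encoding: 'UTF-8' })); // <?xml version="1.0" encoding="UTF-8"?>
```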

@oozcitak
Owner

oozcitak commented Apr 7, 2022
