Elements are ignored if a text node is present at the same level #9

Open
rimutaka opened this issue Oct 26, 2020 · 2 comments

@rimutaka
Collaborator

This issue describes an uncommon scenario where text and elements are mixed together at the same level. I have encountered it in the wild, but not in the context of XML to JSON conversion.

Example

Consider the following well-formed XML example:

<Root>
Some text is totally valid here
  <TaxRate>7.25</TaxRate>
  <Data>
  and also at this level
    <Category>A</Category>
    <Quantity>3</Quantity>
    <Price>24.50</Price>
  </Data>
</Root>

It has 2 instances of text and element nodes at the same level. The expected JSON would be:

{
  "Root": {
    "Data": {
      "Category": "A",
      "Price": 24.5,
      "Quantity": 3,
      "txt": "and also at this level"
    },
    "TaxRate": 7.25,
    "txt": "Some text is totally valid here"
  }
}

However, the code first checks for the presence of a text node ( if el.text().trim() != "" { ...) and only handles child elements in the else branch, so the resulting JSON loses the elements:

{
  "Root": "Some text is totally valid here"
}

Solution

The solution would be to refactor fn convert_node in lib.rs to process the children recursively regardless of the presence of the text node.
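
A minimal sketch of what such a refactoring could look like is given below. It does not use the crate's real types: Element, Json, el and scalar are hypothetical stand-ins invented for this illustration so the example compiles on its own, and the "txt" key is simply the convention used in the expected JSON above. Treat it as one possible shape for the logic, not as the actual implementation in lib.rs.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed XML element (not the crate's real type).
#[derive(Debug, Clone)]
pub struct Element {
    pub name: String,
    /// Text found directly inside this element, excluding child elements.
    pub text: String,
    pub children: Vec<Element>,
}

/// Hypothetical minimal JSON value, used here instead of serde_json::Value.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Number(f64),
    Text(String),
    Array(Vec<Json>),
    Object(BTreeMap<String, Json>),
}

/// Convenience constructor for building test trees.
pub fn el(name: &str, text: &str, children: Vec<Element>) -> Element {
    Element { name: name.to_string(), text: text.to_string(), children }
}

/// Turn trimmed text into a finite number when it parses as one, else a string.
fn scalar(text: &str) -> Json {
    match text.parse::<f64>() {
        Ok(n) if n.is_finite() => Json::Number(n),
        _ => Json::Text(text.to_string()),
    }
}

/// Convert the contents of `el`, visiting children even when text is present.
pub fn convert_node(el: &Element) -> Json {
    let text = el.text.trim();

    // Leaf element: return the bare value, as in the expected JSON above.
    if el.children.is_empty() {
        return scalar(text);
    }

    let mut obj = BTreeMap::new();

    // Mixed content: store the text under "txt" instead of returning early.
    // Assumption: no child element is itself named "txt".
    if !text.is_empty() {
        obj.insert("txt".to_string(), scalar(text));
    }

    for child in &el.children {
        let value = convert_node(child);
        // Repeated sibling names are collected into an array.
        match obj.remove(&child.name) {
            None => {
                obj.insert(child.name.clone(), value);
            }
            Some(Json::Array(mut items)) => {
                items.push(value);
                obj.insert(child.name.clone(), Json::Array(items));
            }
            Some(first) => {
                obj.insert(child.name.clone(), Json::Array(vec![first, value]));
            }
        }
    }

    Json::Object(obj)
}
```

Applied to the contents of Root from the example, this sketch keeps TaxRate and Data alongside a "txt" entry rather than collapsing everything to the text. The wrapping under the element's own name ("Root": { ... }) is left to the caller here.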

This is a low priority issue. No action is expected unless we actually have someone affected by it.

@rimutaka changed the title from "Elements lements are ignore if a text node is present at the same level" to "Elements are ignored if a text node is present at the same level" on Nov 21, 2020
@apolo49

apolo49 commented Apr 14, 2024

I am unsure if this is related, but from what I can tell it seems to be...

I am working on parsing a PolyGlot save file, and have the output XML here:

<EtymologyCollection>
  <EtymologyInternalRelation>
    16
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    17
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    18
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    3
    <EtymologyInternalChild>
      6
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    19
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    20
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    6
    <EtymologyInternalChild>
      5
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    7
    <EtymologyInternalChild>
      10
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    8
    <EtymologyInternalChild>
      10
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    9
    <EtymologyInternalChild>
      10
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    14
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
  <EtymologyInternalRelation>
    15
    <EtymologyInternalChild>
      13
    </EtymologyInternalChild>
  </EtymologyInternalRelation>
</EtymologyCollection>

This converts into the following JSON. The EtymologyInternalChild nodes are dropped, while the text of each EtymologyInternalRelation node is preserved:

"EtymologyCollection": {
    "EtymologyInternalRelation": [
        16,
        17,
        18,
        3,
        19,
        20,
        6,
        7,
        8,
        9,
        14,
        15
    ]
}

I hope this helps, I don't know if it will though!

@AlecTroemel
Owner

@apolo49 in your example, what would your desired JSON have been? It seems like we'll have to invent a convention for how to parse this, for example the way @rimutaka introduced the txt field in the initial example.
