saving as RDF/XML of OWL may fail for XML elements with XMLLiterals #439

Closed
egonw opened this issue Sep 9, 2015 · 2 comments

egonw commented Sep 9, 2015

When I import the NanoParticle Ontology with OWLAPI 4.0.2, do some work on it (not on npo:definition), and then save it as RDF/XML again, the XMLLiteral gets expanded as normal XML but the rdf:datatype attribute is not removed, leading to this parsing error (when reading the file back with OWLAPI):

Parser: org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser@172c6c5
org.semanticweb.owlapi.rdf.rdfxml.parser.RDFParserException: [line=1088:column=111] rdf:datatype specified on a node with resource value.

Line 1088 matches the npo:definition element here:

    <owl:Class rdf:about="http://purl.bioontology.org/ontology/npo#NPO_1193">
        <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">cell concentration</rdfs:label>
        <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/PATO_0000033"/>
        <npo:FULL_SYN rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"><ncicp:ComplexTerm><ncicp:term-name>cell concentration</ncicp:term-name><ncicp:term-group>PT</ncicp:term-group><ncicp:term-source>NCI</ncicp:term-source></ncicp:ComplexTerm></npo:FULL_SYN>
        <npo:code rdf:datatype="http://www.w3.org/2001/XMLSchema#string">NPO_1193</npo:code>
        <npo:definition rdf:datatype="http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"><ncicp:ComplexDefinition><ncicp:def-source>NPO</ncicp:def-source><ncicp:def-definition>Concentration, which is the amount of cells in a medium.</ncicp:def-definition>   </ncicp:ComplexDefinition></npo:definition>
        <npo:preferred_Name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">cell concentration</npo:preferred_Name>
    </owl:Class>

The original content of the OWL file looks like this (shown in Manchester syntax, not RDF/XML):

Class: <http://purl.bioontology.org/ontology/npo#NPO_1193>

    Annotations: 
        <http://purl.bioontology.org/ontology/npo#preferred_Name> "cell concentration"^^xsd:string,
        <http://purl.bioontology.org/ontology/npo#definition> "<ncicp:ComplexDefinition><ncicp:def-source>NPO</ncicp:def-source><ncicp:def-definition>Concentration, which is the amount of cells in a medium.</ncicp:def-definition></ncicp:ComplexDefinition>"^^rdf:XMLLiteral,
        <http://purl.bioontology.org/ontology/npo#FULL_SYN> "<ncicp:ComplexTerm><ncicp:term-name>cell concentration</ncicp:term-name><ncicp:term-group>PT</ncicp:term-group><ncicp:term-source>NCI</ncicp:term-source></ncicp:ComplexTerm>"^^rdf:XMLLiteral,
        rdfs:label "cell concentration"^^xsd:string,
        <http://purl.bioontology.org/ontology/npo#code> "NPO_1193"^^xsd:string

    SubClassOf: 
        <http://purl.bioontology.org/ontology/npo#has_unit_of_measure> some    <http://purl.org/obo/owl/UO#UO_0000200>,
        <http://purl.bioontology.org/ontology/npo#NPO_1691>

I am not sure whether this is a bug in the input file, in the OWLAPI writer, in the OWLAPI reader, or something I am doing wrong. For full disclosure, the code is available here: https://github.com/enanomapper/slimmer/blob/792a8bad77fad2279f5528c110dd3b68701c7f68/src/main/java/com/github/enanomapper/Slimmer.java

@ignazio1977
Contributor

This is related to (or better, one of the problems also appears in) #412 where an XMLLiteral with ncicp: also appears. The XMLLiteral is not well formed, as the ncicp prefix is not defined in the XML. The specs require the XML fragment to be well formed.

I'm assuming that, when the XML gets parsed, the fragment does not match what an XML literal is supposed to look like (i.e., a self-contained XML fragment), so the parser treats it as part of the RDF/XML stream and ends up more or less interpreting it as an individual. This then causes a problem with the previous triple, which mentions a datatype and is therefore assumed to have a literal as its object.

For historical context, up to 4.0.1 we had a nonstandard behaviour here: XML literals would be escaped so that they contained no < or > characters. However, this was not spec compliant. Once that was fixed, though, invalid XML literals gained the power to disrupt ontology parsing. Some you really can't win...
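
To illustrate the point above about the undeclared ncicp prefix, here is a minimal sketch that checks whether a fragment parses on its own as namespace-aware XML, using only the JDK's built-in parser. It is not how OWLAPI validates literals. The class name, the method name, and the namespace URI `http://example.org/ncicp#` are made up for the illustration, and treating the literal as a single-rooted document is a simplification.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XmlLiteralCheck {

    /**
     * Returns true if the fragment parses standalone as namespace-aware XML.
     * Simplification: the fragment is treated as a document with one root element.
     */
    public static boolean parsesStandalone(String fragment) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness is what makes an unbound prefix an error.
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Rethrow parse errors instead of letting the default handler print them.
            builder.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // Warnings do not affect the result.
                }

                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(fragment)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same shape as the literal in the ontology: the prefix is used but never declared.
        String bare = "<ncicp:def-source>NPO</ncicp:def-source>";
        // Hypothetical namespace URI, declared on the fragment itself.
        String declared =
            "<ncicp:def-source xmlns:ncicp=\"http://example.org/ncicp#\">NPO</ncicp:def-source>";

        System.out.println("bare fragment parses: " + parsesStandalone(bare));
        System.out.println("declared fragment parses: " + parsesStandalone(declared));
    }
}
```

The first fragment should be rejected because the ncicp prefix is not bound, while the second should parse because it carries its own namespace declaration.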

egonw commented Sep 9, 2015

Yes, that indeed looks like the same issue! I did some searching, but had not seen that one.
