Auto-generated blank node labels are syntactically invalid #68

Open
wouterbeek opened this issue Oct 26, 2017 · 3 comments


@wouterbeek
Contributor

wouterbeek commented Oct 26, 2017

When a dataset that contains blank nodes is loaded by the Semantic Web standard libraries, the blank nodes are assigned auto-generated labels. The URI representation of the file path from which the data is loaded forms part of these generated labels (see the example below). Unfortunately, forward slashes are not allowed in Turtle-family blank node label syntax. This means that Prolog blank node labels cannot be emitted directly when generating a Turtle-family export or a SPARQL result set.

?- [library(semweb/rdf11)].
?- [library(semweb/turtle)].
?- rdf_load('vocab.trig', [format(trig)]).
?- rdf(S, P, O).
S = '_:file:///home/wbeek/git/Triply/cshapes/vocab.trig1',
P = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#first',
O = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type' ;

The predicates that perform batch exports, i.e., that export complete files at once, do turn these internal blank node labels into standards-compliant serialization output.
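For reference, a batch round trip could look like the sketch below. It uses rdf_save_trig/2 from library(semweb/turtle); the output file name is only a placeholder, and I have not included the resulting file here.

```prolog
?- [library(semweb/rdf11)].
?- [library(semweb/turtle)].
?- rdf_load('vocab.trig', [format(trig)]).
% Batch export: the writer takes care of relabelling blank nodes.
% 'vocab-out.trig' is a hypothetical output file name.
?- rdf_save_trig('vocab-out.trig', []).
```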

The problem remains for applications that stream through the data. Specifically, it is currently not possible to 'recode' an RDF dataset from one format into another using a statement-wide window. Renaming internal blank node labels to standards-compliant external blank node labels requires an in-memory mapping (turtle.c uses a hash map for this), which can become arbitrarily large for arbitrarily long data streams.
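To make the memory issue concrete, here is a minimal sketch of such a renaming step. It is not the turtle.c implementation: it uses an AVL tree from library(assoc) as a stand-in for the hash map, and rename_bnode/4 and the state/2 term are names made up for this illustration.

```prolog
:- use_module(library(assoc)).

%!  rename_bnode(+Internal, -External, +State0, -State) is det.
%
%   Hypothetical helper. State is state(NextId, Map), where Map
%   maps internal labels to short external labels.
rename_bnode(Internal, External, state(N, Map), state(N, Map)) :-
    get_assoc(Internal, Map, External),      % seen before: reuse the label
    !.
rename_bnode(Internal, External, state(N0, Map0), state(N, Map)) :-
    format(atom(External), '_:b~d', [N0]),   % new node: mint a short label
    put_assoc(Internal, Map0, External, Map),
    N is N0 + 1.

% Example use:
% ?- empty_assoc(M0),
%    rename_bnode('_:file:///a.trig1', E1, state(0, M0), S1),
%    rename_bnode('_:file:///a.trig2', E2, S1, S2),
%    rename_bnode('_:file:///a.trig1', E3, S2, _).
% E1 and E3 are both '_:b0', E2 is '_:b1'.
```

The map gains one entry per distinct blank node and nothing is ever removed, because a streaming writer cannot know whether a label will occur again later in the stream.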

@wouterbeek added the bug label Oct 26, 2017

@JanWielemaker
Member

Well, the Turtle writer will rename Prolog's blank nodes into nice and short ones. I'm not a big fan of hashes as they make debugging hard. I see various options: make sure they never leak through standard protocols; use an encoding that can be reverted (e.g., the URL-friendly base64 variant), so we can use portray or other tools to make them readable again; or use a hash.
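A rough sketch of the reversible option, assuming base64url/2 from library(base64) is available. The predicate names and the 'b' prefix are hypothetical; the prefix is there so that the label never starts with '-', which the Turtle grammar does not allow as the first character. The sketch also assumes the internal label is plain ASCII, which should hold for percent-encoded file URIs.

```prolog
:- use_module(library(base64)).

%!  bnode_external_label(+Internal, -External) is det.
%
%   Hypothetical helper: encode everything after '_:' with the
%   URL-friendly base64 alphabet (A-Z, a-z, 0-9, '-', '_').
bnode_external_label(Internal, External) :-
    atom_concat('_:', Local, Internal),
    base64url(Local, Encoded),
    atom_concat('_:b', Encoded, External).

%!  bnode_internal_label(+External, -Internal) is det.
%
%   Inverse of bnode_external_label/2, e.g. for use from portray.
bnode_internal_label(External, Internal) :-
    atom_concat('_:b', Encoded, External),
    base64url(Local, Encoded),
    atom_concat('_:', Local, Internal).
```

Because each label is translated on its own, this needs no mapping table, at the cost of longer labels in the output.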

@wouterbeek
Contributor Author

wouterbeek commented Oct 26, 2017

Indeed, the writers fix this. I was writing atoms (bnodes and IRIs) to N-Triples directly, but that is a recipe for disaster :) Thanks for pointing that out.

@wouterbeek
Contributor Author

wouterbeek commented Oct 26, 2017

I've updated the issue to make clearer that blank node renaming is currently not implemented for streamed writers.
