-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement LiteralOrIri
#1186
Implement LiteralOrIri
#1186
Conversation
Hi @joka921 , I finally got around to start working on the implementation of the This is so far just a draft and I would highly appreciate your opinion if the interface looks as you expected. The actual implementation is so far quite basic, and I wanted to introduce more logic once we are happy with the interface itself. |
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #1186 +/- ##
==========================================
+ Coverage 87.29% 87.52% +0.22%
==========================================
Files 305 309 +4
Lines 27514 27911 +397
Branches 3104 3122 +18
==========================================
+ Hits 24019 24429 +410
+ Misses 2365 2347 -18
- Partials 1130 1135 +5 ☔ View full report in Codecov by Sentry. |
Kudos, SonarCloud Quality Gate passed!
|
6ee3e90
to
3524d3a
Compare
* removing temporary main method used for testing
…ead of std::string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some initial comments.
Most of them are trivial cleanups which we can do ourselves.
For the deeper things (where does the parsing/normalization happen), that is something that we can discuss in person maybe next week.
src/parser/IriType.h
Outdated
// Created a new iri object | ||
explicit IriType(NormalizedString iri); | ||
|
||
// Returns the string value of the iri object | ||
[[nodiscard]] NormalizedStringView getIri() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interface should probably be something like
static Iri parseFromRdf(std::string_view rdf); // takes "<someIriWithEscapingsEtc>" and calls the appropriate normalization from RdfsEsaping)"
NormalizedStringView getNormalizedContent; // The normalized value without the <>
std::string toRdf () // Redo the escaping and readd the <>. Typically not used so much, as we have dedicated exporting routines, but at least for debugging and printing this is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just getContent()
then it's the same as for the Literal
class which makes the usage easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay after further thinking:
Maybe we mostly use the parser for the parsing and taking the NormalizedString directly is fine, but then you really really have to document whether this should be with or without the <>
. And maybe warn if it starts or ends with those (theoretically you can escape them, but what are the odds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So bottomline: It is important to maake it very clear when you get <>
and when not.
src/parser/LiteralType.h
Outdated
// Returns true if the literal has an assigned datatype | ||
bool hasDatatype() const; | ||
|
||
// Returns the value of the literal, without any datatype or language tag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also make it always clear whether you get the quotation marks.
I am still not sure the more I am thinking about this, whether we shouldn't doe the parsing directly inside these classes, then we can enforce more invariants about the " @ and <> characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at least make the constructor private and have only certain authorized parsers create such objects.
But we can discuss this at a later point.
src/parser/LiteralOrIriType.cpp
Outdated
if (!isIri()) { | ||
AD_THROW( | ||
"LiteralOrIriType object does not contain an IriType object and thus " | ||
"cannot return it"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also use `AD_CONTRACT_CHECK as soon as I've merged it with the additional message.
src/parser/LiteralOrIriType.cpp
Outdated
if (!isLiteral()) { | ||
AD_THROW( | ||
"LiteralOrIriType object does not contain an LiteralType object and " | ||
"thus cannot return it"); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment about AD_CONTRACT_CHECK
@joka921 Thank you for all the comments! I'll add the suggested changes and missing documentation on Monday ! :) I created a parser If you are available, I´d be happy for a call to discuss some open points :) I'll write you a mail for scheduling. |
* Added documentation to Iri interface
* Made code more c++-esque * minor bug fixes
* removed unnecessary todo
@joka921 Hi :) I implemented the changes as discussed on Monday. Does it now looks as expected? The CI step |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I did not send the review last week...
However this already looks very nice.
I will get back to you via Email.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be almost the last round,
we are converging.
// __________________________________________ | ||
Iri Iri::prefixed(const Iri& prefix, const std::string& suffix) { | ||
return Iri{prefix, RdfEscaping::normalizeIriWithoutBrackets(suffix)}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think technically this might be subtely wrong, but that is due to preexisting bugs in corner cases of the escaping module, which are unrelated to this PR.
Most notably there actually is not much to "escape" for an iriref, but that requires some other changes by me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which way might this be subtly wrong?
…reated using the LiteralOrIri::literalWith[out]Quotes
Hi @joka921 Thank you for fixing the subtle bug and for the review! I implemented all the suggested changes, thank you especially for the suggestion how to handle the Literal creation functions! :) Is there anything missing in this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are almost there, it is only very minor stuff:)
Thank you! :) Changes are implemented |
…ation' into literal-or-iri-type-implementation # Conflicts: # src/parser/Literal.cpp # src/parser/Literal.h # src/parser/LiteralOrIri.h # test/parser/LiteralOrIriTest.cpp
|
So far, literals and IRIs were stored as a `std::string` (in a normalized way, that is, without escaping) and the distinction between literals and IRIs was made via the first character (`"` or `<`). The code to deal with datatypes (the stuff after `^^`) and language tags (e.g., `@en`) was also ad-hoc using low-level string operations. Now there are proper classes `Literal` and `Iri`, which internally still store their data as normalized strings starting with `"` or `<` (just like before), but they are now two different types and each with a proper interface. This continues work started with #1186. The new classes are used in the `TripleComponent` class, which is used by the Turtle and SPARQL parsers. As a consequence, a lot of code is affected by this change. The new classes are not yet used by `LocalVocab` and `ExportQueryExecutionTrees`.
We plan for this class to be consistently used across all of QLever to represent values that cannot be folded into an
ID
directly (e.g. during the parsing, for local vocabs, expressions, etc.).It is a strong type that stores the contents in a normalized way that makes it easy to work with and reason about.
This commit only introduces this type together with some unit tests, follow-up commits will roll it out across the codebase.