Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement LiteralOrIri #1186

Merged

Conversation

greenBene
Copy link
Contributor

@greenBene greenBene commented Dec 11, 2023

We plan for this class to be consistently used across all of QLever to represent values that cannot be folded into an ID directly (e.g. during the parsing, for local vocabs, expressions, etc.).
It is a strong type that stores the contents in a normalized way that makes it easy to work with and reason about.
This commit only introduces this type together with some unit tests, follow-up commits will roll it out across the codebase.

@greenBene greenBene marked this pull request as draft December 11, 2023 15:21
@greenBene
Copy link
Contributor Author

Hi @joka921 ,

I finally got around to start working on the implementation of the LiteralOrIri class we spoke about last month.

This is so far just a draft and I would highly appreciate your opinion if the interface looks as you expected. The actual implementation is so far quite basic, and I wanted to introduce more logic once we are happy with the interface itself.

Copy link

codecov bot commented Dec 11, 2023

Codecov Report

Attention: 5 lines in your changes are missing coverage. Please review.

Comparison is base (4ccc4d6) 87.29% compared to head (fdebba3) 87.52%.
Report is 7 commits behind head on master.

Files Patch % Lines
src/parser/LiteralOrIri.cpp 94.82% 2 Missing and 1 partial ⚠️
src/parser/RdfEscaping.cpp 96.77% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1186      +/-   ##
==========================================
+ Coverage   87.29%   87.52%   +0.22%     
==========================================
  Files         305      309       +4     
  Lines       27514    27911     +397     
  Branches     3104     3122      +18     
==========================================
+ Hits        24019    24429     +410     
+ Misses       2365     2347      -18     
- Partials     1130     1135       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 28 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@greenBene greenBene force-pushed the literal-or-iri-type-implementation branch from 6ee3e90 to 3524d3a Compare January 18, 2024 14:45
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some initial comments.
Most of them are trivial cleanups which we can do ourselves.
For the deeper things (where does the parsing/normalization happen), that is something that we can discuss in person maybe next week.

src/parser/CMakeLists.txt Outdated Show resolved Hide resolved
Comment on lines 17 to 21
// Created a new iri object
explicit IriType(NormalizedString iri);

// Returns the string value of the iri object
[[nodiscard]] NormalizedStringView getIri() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface should probably be something like

static Iri parseFromRdf(std::string_view rdf); // takes "<someIriWithEscapingsEtc>" and calls the appropriate normalization from RdfsEsaping)"
NormalizedStringView getNormalizedContent; // The normalized value without the <>
std::string toRdf () // Redo the escaping and readd the <>. Typically not used so much, as we have dedicated exporting routines, but at least for debugging and printing this is useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just getContent() then it's the same as for the Literal class which makes the usage easier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay after further thinking:
Maybe we mostly use the parser for the parsing and taking the NormalizedString directly is fine, but then you really really have to document whether this should be with or without the <>. And maybe warn if it starts or ends with those (theoretically you can escape them, but what are the odds?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So bottomline: It is important to maake it very clear when you get <> and when not.

src/parser/IriType.h Outdated Show resolved Hide resolved
src/parser/LiteralType.h Outdated Show resolved Hide resolved
// Returns true if the literal has an assigned datatype
bool hasDatatype() const;

// Returns the value of the literal, without any datatype or language tag
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also make it always clear whether you get the quotation marks.
I am still not sure the more I am thinking about this, whether we shouldn't doe the parsing directly inside these classes, then we can enforce more invariants about the " @ and <> characters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at least make the constructor private and have only certain authorized parsers create such objects.
But we can discuss this at a later point.

Comment on lines 15 to 19
if (!isIri()) {
AD_THROW(
"LiteralOrIriType object does not contain an IriType object and thus "
"cannot return it");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also use `AD_CONTRACT_CHECK as soon as I've merged it with the additional message.

src/parser/LiteralOrIriType.cpp Outdated Show resolved Hide resolved
Comment on lines 33 to 37
if (!isLiteral()) {
AD_THROW(
"LiteralOrIriType object does not contain an LiteralType object and "
"thus cannot return it");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment about AD_CONTRACT_CHECK

src/parser/LiteralOrIriType.cpp Outdated Show resolved Hide resolved
src/parser/LiteralOrIriType.cpp Outdated Show resolved Hide resolved
@greenBene
Copy link
Contributor Author

@joka921 Thank you for all the comments! I'll add the suggested changes and missing documentation on Monday ! :)

I created a parser LiteralOrIriType LiteralOrIriType::fromStringToLiteralOrIri(std::stringview) and think that this should be the main way to create LiteralOrIriType from string. We can change the visibility of the constructors accordingly.

If you are available, I´d be happy for a call to discuss some open points :) I'll write you a mail for scheduling.

* removed unnecessary todo
@greenBene
Copy link
Contributor Author

@joka921 Hi :) I implemented the changes as discussed on Monday. Does it now looks as expected?

The CI step measure-code-coverage / build fails and I don't quite understand why. If you have time the coming days could you maybe have look?

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, I did not send the review last week...
However this already looks very nice.
I will get back to you via Email.

src/parser/Literal.h Outdated Show resolved Hide resolved
src/parser/LiteralOrIri.h Show resolved Hide resolved
src/parser/LiteralOrIri.h Outdated Show resolved Hide resolved
src/parser/Iri.h Outdated Show resolved Hide resolved
src/parser/Iri.h Show resolved Hide resolved
src/parser/NormalizedString.h Outdated Show resolved Hide resolved
src/parser/NormalizedString.cpp Outdated Show resolved Hide resolved
test/parser/LiteralOrIriTypeTest.cpp Outdated Show resolved Hide resolved
src/parser/NormalizedString.cpp Outdated Show resolved Hide resolved
src/parser/RdfEscaping.cpp Outdated Show resolved Hide resolved
Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be almost the last round,
we are converging.

// __________________________________________
Iri Iri::prefixed(const Iri& prefix, const std::string& suffix) {
return Iri{prefix, RdfEscaping::normalizeIriWithoutBrackets(suffix)};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think technically this might be subtely wrong, but that is due to preexisting bugs in corner cases of the escaping module, which are unrelated to this PR.
Most notably there actually is not much to "escape" for an iriref, but that requires some other changes by me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In which way might this be subtly wrong?

src/parser/Literal.h Outdated Show resolved Hide resolved
src/parser/LiteralOrIri.h Outdated Show resolved Hide resolved
test/parser/CMakeLists.txt Outdated Show resolved Hide resolved
test/parser/LiteralOrIriTest.cpp Show resolved Hide resolved
@greenBene greenBene requested a review from joka921 February 1, 2024 12:51
@greenBene
Copy link
Contributor Author

Hi @joka921

Thank you for fixing the subtle bug and for the review!

I implemented all the suggested changes, thank you especially for the suggestion how to handle the Literal creation functions! :)

Is there anything missing in this PR?

Copy link
Member

@joka921 joka921 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are almost there, it is only very minor stuff:)

src/parser/Literal.h Outdated Show resolved Hide resolved
src/parser/Literal.h Outdated Show resolved Hide resolved
src/parser/Literal.cpp Outdated Show resolved Hide resolved
src/parser/LiteralOrIri.h Outdated Show resolved Hide resolved
test/parser/LiteralOrIriTest.cpp Outdated Show resolved Hide resolved
@greenBene
Copy link
Contributor Author

Thank you! :) Changes are implemented

…ation' into literal-or-iri-type-implementation

# Conflicts:
#	src/parser/Literal.cpp
#	src/parser/Literal.h
#	src/parser/LiteralOrIri.h
#	test/parser/LiteralOrIriTest.cpp
Copy link

sonarqubecloud bot commented Feb 2, 2024

Quality Gate Passed Quality Gate passed

The SonarCloud Quality Gate passed, but some issues were introduced.

3 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@joka921 joka921 changed the title LiteralOrIriType Implementation Implement LiteralOrIri Feb 2, 2024
@joka921 joka921 merged commit 9813a28 into ad-freiburg:master Feb 2, 2024
17 of 18 checks passed
hannahbast pushed a commit that referenced this pull request Mar 22, 2024
So far, literals and IRIs were stored as a `std::string` (in a normalized way, that is, without escaping) and the distinction between literals and IRIs was made via the first character (`"` or `<`). The code to deal with datatypes (the stuff after `^^`) and language tags (e.g., `@en`) was also ad-hoc using low-level string operations. Now there are proper classes `Literal` and `Iri`, which internally still store their data as normalized strings starting with `"` or `<` (just like before), but they are now two different types and each with a proper interface. This continues work started with #1186.

The new classes are used in the `TripleComponent` class, which is used by the Turtle and SPARQL parsers. As a consequence, a lot of code is affected by this change. The new classes are not yet used by `LocalVocab` and `ExportQueryExecutionTrees`.
hannahbast pushed a commit that referenced this pull request May 3, 2024
Continue work started with #1186 (which introduced the class `LiteralOrIri`) and #1318 (which used `LiteralOrIri` instead of `std::string` in all SPARQL expressions). This change now uses `LiteralOrIri` instead of `std::string` whenever dealing with words from a `LocalVocab`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants