Implement LiteralOrIri #1186
Merged: joka921 merged 47 commits into ad-freiburg:master from greenBene:literal-or-iri-type-implementation on Feb 2, 2024
Commits
16c6904 * Rebase (greenBene)
66fd51c * Rebase (greenBene)
bac405d * Formatting (greenBene)
d12a556 * Added test cases for draft implementation (greenBene)
959636f * Formatting (greenBene)
3524d3a * Formatting of header (greenBene)
a8cebea * basic implementation of NormalizedString (greenBene)
eca253d * fixed formatting (greenBene)
f326fae * added warning (greenBene)
8e52000 * Updated Literal, Iri, and LiteralOrIri to use NormalizedString inst… (greenBene)
3c6153d * tidying up and adding comments to interface (greenBene)
49d267d * created function to parse arbitrary string to LiteralOrIriType incl… (greenBene)
bd32608 * Removed unnecessary filenames from parser/CMakeLists (greenBene)
e27f682 * Removed "Type" suffix from new classes (greenBene)
3b614fe * Removed "Type" suffix from new classes (greenBene)
9507c9d * Updated Iri interface to be more similar to Literal interface (greenBene)
6ff0ecf * Improved documentation (greenBene)
e0372aa * Removed unused code (greenBene)
bb8b089 * Fixed code smells reported by sonarcloud (greenBene)
f3ba1db * Adapted code formatting (greenBene)
bd9b819 * Improved comments and variable naming (greenBene)
a234ef7 * typos (greenBene)
a075f44 * added toRdf export functions to Iri, Literal, and LiteralOrIri (greenBene)
7683bfe * added exception to Literal::toRdf() thrown if descriptorValue_ has … (greenBene)
a6d2cc1 * improved documentation (greenBene)
766c2fb * improved documentation (greenBene)
8093a4d * removed previously missed `this->var` occurrences (greenBene)
0b80900 * code formatting (greenBene)
d3fd6e3 * Literal now uses IRI for datatype descriptors (greenBene)
30c2815 * wrote normalizing functions for Iri, Literal, and LanguageTag strings (greenBene)
5bc1345 Merge branch 'master' into literal-or-iri-type-implementation (greenBene)
8223519 * formatting (greenBene)
d3e4e97 * formatting (greenBene)
e072cf4 * Implemented Review Feedback (greenBene)
f74084b Move the `Literal` type into a namespace. (joka921)
2ca364a Move everything into a namespace. (joka921)
4aef562 * Unified Literal constructing functions (greenBene)
0fb6684 * added new line in test/parser/CMakeLists.txt (greenBene)
72a9673 * Added test to ensure the given literal is encoded if a literal is c… (greenBene)
74797f7 * Removed code duplication (greenBene)
a5d0649 * Used std::visit to avoid runtime `AD_THROW` (greenBene)
cb5155e * Adapted comments (greenBene)
d91739e * switched from raw literals to standard literals (greenBene)
f91df95 * adapted documentation to avoid duplication (greenBene)
7d1c9b8 Some small improvements for the interface. (joka921)
fe42703 Merge remote-tracking branch 'greenBene/literal-or-iri-type-implement… (joka921)
fdebba3 A tiny change. (joka921)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
// Copyright 2023, University of Freiburg, | ||
// Chair of Algorithms and Data Structures. | ||
// Author: Benedikt Maria Beckermann <[email protected]> | ||
|
||
#include "parser/Iri.h" | ||
|
||
#include <utility> | ||
|
||
#include "util/StringUtils.h" | ||
|
||
namespace ad_utility::triple_component { | ||
// __________________________________________ | ||
Iri::Iri(NormalizedString iri) : iri_{std::move(iri)} {} | ||
|
||
// __________________________________________ | ||
Iri::Iri(const Iri& prefix, NormalizedStringView suffix) | ||
: iri_{NormalizedString{prefix.getContent()} + suffix} {} | ||
|
||
// __________________________________________ | ||
NormalizedStringView Iri::getContent() const { return iri_; } | ||
|
||
// __________________________________________ | ||
Iri Iri::iriref(std::string_view stringWithBrackets) { | ||
return Iri{RdfEscaping::normalizeIriWithBrackets(stringWithBrackets)}; | ||
} | ||
|
||
// __________________________________________ | ||
Iri Iri::prefixed(const Iri& prefix, std::string_view suffix) { | ||
return Iri{prefix, | ||
RdfEscaping::normalizeIriWithoutBrackets(suffix)}; | ||
} | ||
|
||
} // namespace ad_utility::triple_component |
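A side note on `Iri::prefixed` above: `prefix` arrives as `const Iri&`, so wrapping it in `std::move` cannot trigger a move. The following is a minimal sketch of that language rule, assuming C++17; `Tracked` and `forwardConst` are hypothetical names made up for this illustration and are not part of the PR.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical stand-in type that records how it was constructed.
struct Tracked {
  std::string value;
  int copies = 0;
  int moves = 0;

  explicit Tracked(std::string v) : value{std::move(v)} {}
  Tracked(const Tracked& other)
      : value{other.value}, copies{other.copies + 1}, moves{other.moves} {}
  Tracked(Tracked&& other) noexcept
      : value{std::move(other.value)},
        copies{other.copies},
        moves{other.moves + 1} {}
};

// Same shape as `Iri::prefixed`: the argument is a reference to const.
Tracked forwardConst(const Tracked& prefix) {
  // `std::move(prefix)` yields `const Tracked&&`. That cannot bind to the
  // `Tracked&&` parameter of the move constructor, so the copy constructor
  // is selected instead.
  return Tracked{std::move(prefix)};
}

// The cast keeps the const qualifier.
static_assert(
    std::is_same<decltype(std::move(std::declval<const Tracked&>())),
                 const Tracked&&>::value,
    "std::move on a const lvalue yields a const rvalue reference");
```

In this sketch the result of `forwardConst` reports one copy and no moves, and the argument is left untouched, which is why the `std::move` in `Iri::prefixed` has no effect on behavior.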
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
// Copyright 2023, University of Freiburg, | ||
// Chair of Algorithms and Data Structures. | ||
// Author: Benedikt Maria Beckermann <[email protected]> | ||
|
||
#pragma once | ||
|
||
#include <string_view> | ||
|
||
#include "parser/NormalizedString.h" | ||
|
||
namespace ad_utility::triple_component { | ||
|
||
// A class to hold IRIs. It does not store the leading or trailing | ||
// angled bracket. | ||
// | ||
// E.g. For the input "<http://example.org/books/book1>", | ||
// only "http://example.org/books/book1" is to be stored in the iri_ variable. | ||
class Iri { | ||
private: | ||
// Store the string value of the IRI without any leading or trailing angled | ||
// brackets. | ||
NormalizedString iri_; | ||
|
||
// Create a new iri object | ||
explicit Iri(NormalizedString iri); | ||
|
||
|
||
// Create a new iri using a prefix | ||
Iri(const Iri& prefix, NormalizedStringView suffix); | ||
|
||
public: | ||
// Create a new iri given an iri with brackets | ||
static Iri iriref(std::string_view stringWithBrackets); | ||
|
||
// Create a new iri given a prefix iri and its suffix | ||
static Iri prefixed(const Iri& prefix, std::string_view suffix); | ||
|
||
// Return the string value of the iri object without any leading or trailing | ||
// angled brackets. | ||
NormalizedStringView getContent() const; | ||
}; | ||
|
||
} // namespace ad_utility::triple_component |
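The header keeps both constructors private and exposes only the static factories `iriref` and `prefixed`, so every `Iri` has to pass through normalization. A minimal sketch of that pattern follows, under loud assumptions: `SketchIri` is a hypothetical stand-in that uses `std::string` instead of `NormalizedString`, performs no escaping at all, and rejects input without angle brackets (whether the real `RdfEscaping::normalizeIriWithBrackets` does so is not shown in this diff).

```cpp
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical, simplified stand-in for the `Iri` class in this PR.
class SketchIri {
  // The IRI without the surrounding angle brackets.
  std::string iri_;

  // Private: callers have to go through the factories below.
  explicit SketchIri(std::string iri) : iri_{std::move(iri)} {}

 public:
  // Build from "<...>". Rejecting unbracketed input is an assumption of this
  // sketch, not documented behavior of the real class.
  static SketchIri iriref(std::string_view withBrackets) {
    if (withBrackets.size() < 2 || withBrackets.front() != '<' ||
        withBrackets.back() != '>') {
      throw std::invalid_argument("expected an IRI enclosed in <...>");
    }
    withBrackets.remove_prefix(1);
    withBrackets.remove_suffix(1);
    return SketchIri{std::string(withBrackets)};
  }

  // Build by appending `suffix` to an already constructed prefix IRI.
  static SketchIri prefixed(const SketchIri& prefix, std::string_view suffix) {
    std::string joined{prefix.iri_};
    joined.append(suffix.data(), suffix.size());
    return SketchIri{std::move(joined)};
  }

  // The stored IRI, without angle brackets.
  std::string_view getContent() const { return iri_; }
};
```

For example, `SketchIri::iriref("<http://example.org/books/>")` followed by `SketchIri::prefixed(base, "book1")` gives an object whose content is `http://example.org/books/book1`, matching the example in the class comment.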
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,95 @@ | ||
// Copyright 2023, University of Freiburg, | ||
// Chair of Algorithms and Data Structures. | ||
// Author: Benedikt Maria Beckermann <[email protected]> | ||
|
||
#include "parser/Literal.h" | ||
|
||
#include <utility> | ||
#include <variant> | ||
|
||
namespace ad_utility::triple_component { | ||
// __________________________________________ | ||
Literal::Literal(NormalizedString content) : content_{std::move(content)} {} | ||
|
||
// __________________________________________ | ||
Literal::Literal(NormalizedString content, Iri datatype) | ||
: content_{std::move(content)}, descriptor_{std::move(datatype)} {} | ||
|
||
// __________________________________________ | ||
Literal::Literal(NormalizedString content, NormalizedString languageTag) | ||
: content_{std::move(content)}, descriptor_{std::move(languageTag)} {} | ||
|
||
// __________________________________________ | ||
bool Literal::hasLanguageTag() const { | ||
return std::holds_alternative<NormalizedString>(descriptor_); | ||
} | ||
|
||
// __________________________________________ | ||
bool Literal::hasDatatype() const { | ||
return std::holds_alternative<Iri>(descriptor_); | ||
} | ||
|
||
// __________________________________________ | ||
NormalizedStringView Literal::getContent() const { return content_; } | ||
|
||
// __________________________________________ | ||
Iri Literal::getDatatype() const { | ||
if (!hasDatatype()) { | ||
AD_THROW("The literal does not have an explicit datatype."); | ||
} | ||
return std::get<Iri>(descriptor_); | ||
} | ||
|
||
// __________________________________________ | ||
NormalizedStringView Literal::getLanguageTag() const { | ||
if (!hasLanguageTag()) { | ||
AD_THROW("The literal does not have an explicit language tag."); | ||
} | ||
return std::get<NormalizedString>(descriptor_); | ||
} | ||
|
||
// __________________________________________ | ||
Literal Literal::literalWithQuotes( | ||
std::string_view rdfContentWithQuotes, | ||
std::optional<std::variant<Iri, string>> descriptor) { | ||
NormalizedString content = | ||
RdfEscaping::normalizeLiteralWithQuotes(rdfContentWithQuotes); | ||
|
||
return literalWithNormalizedContent(std::move(content), | ||
std::move(descriptor)); | ||
} | ||
|
||
// __________________________________________ | ||
Literal Literal::literalWithoutQuotes( | ||
std::string_view rdfContentWithoutQuotes, | ||
std::optional<std::variant<Iri, string>> descriptor) { | ||
NormalizedString content = | ||
RdfEscaping::normalizeLiteralWithoutQuotes(rdfContentWithoutQuotes); | ||
|
||
return literalWithNormalizedContent(std::move(content), | ||
std::move(descriptor)); | ||
} | ||
|
||
// __________________________________________ | ||
Literal Literal::literalWithNormalizedContent( | ||
NormalizedString normalizedRdfContent, | ||
std::optional<std::variant<Iri, string>> descriptor) { | ||
if (!descriptor.has_value()) { | ||
return Literal(std::move(normalizedRdfContent)); | ||
} | ||
|
||
using namespace RdfEscaping; | ||
auto visitLanguageTag = | ||
[&normalizedRdfContent](std::string&& languageTag) -> Literal { | ||
return {std::move(normalizedRdfContent), | ||
normalizeLanguageTag(std::move(languageTag))}; | ||
}; | ||
|
||
auto visitDatatype = [&normalizedRdfContent](Iri&& datatype) -> Literal { | ||
return {std::move(normalizedRdfContent), std::move(datatype)}; | ||
}; | ||
|
||
return std::visit( | ||
ad_utility::OverloadCallOperator{visitDatatype, visitLanguageTag}, | ||
std::move(descriptor.value())); | ||
} | ||
|
||
} // namespace ad_utility::triple_component |
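`literalWithNormalizedContent` dispatches on the descriptor with `std::visit` and one lambda per alternative. The sketch below shows the same dispatch in isolation, assuming C++17. `Overloaded` is a local helper that is assumed to play the role of `ad_utility::OverloadCallOperator` (whose definition is not part of this diff), and `Datatype`, `Descriptor`, and `describe` are hypothetical names that build a plain string instead of a `Literal`.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <variant>

// Local helper that merges several callables into one overload set.
// Assumed to be similar in spirit to `ad_utility::OverloadCallOperator`.
template <typename... Fs>
struct Overloaded : Fs... {
  using Fs::operator()...;
};
template <typename... Fs>
Overloaded(Fs...) -> Overloaded<Fs...>;

// Hypothetical stand-in for `Iri` when used as a datatype.
struct Datatype {
  std::string iri;
};

using Descriptor = std::variant<Datatype, std::string>;

// Render content plus optional descriptor. One lambda per alternative, so
// there is no "unknown alternative" branch that could throw at runtime.
std::string describe(std::string content,
                     std::optional<Descriptor> descriptor) {
  if (!descriptor.has_value()) {
    return content;
  }
  auto visitDatatype = [&content](Datatype&& datatype) {
    return std::move(content) + "^^<" + datatype.iri + ">";
  };
  auto visitLanguageTag = [&content](std::string&& languageTag) {
    return std::move(content) + "@" + languageTag;
  };
  // Moving the variant makes `std::visit` pass rvalues to the lambdas.
  return std::visit(Overloaded{visitDatatype, visitLanguageTag},
                    std::move(descriptor.value()));
}
```

Because the visitor has to cover every alternative of the variant, forgetting a case is a compile-time error rather than a runtime exception.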
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
// Copyright 2023, University of Freiburg, | ||
// Chair of Algorithms and Data Structures. | ||
// Author: Benedikt Maria Beckermann <[email protected]> | ||
|
||
#pragma once | ||
|
||
#include <optional> | ||
#include <string> | ||
#include <variant> | ||
|
||
#include "parser/Iri.h" | ||
#include "parser/NormalizedString.h" | ||
|
||
namespace ad_utility::triple_component { | ||
// A class to hold literal values. | ||
class Literal { | ||
private: | ||
// Store the string value of the literal without the surrounding quotation | ||
// marks or trailing descriptor. | ||
// "Hello World"@en -> Hello World | ||
NormalizedString content_; | ||
|
||
using LiteralDescriptorVariant = | ||
std::variant<std::monostate, NormalizedString, Iri>; | ||
|
||
// Store the optional language tag or the optional datatype if applicable | ||
// without their prefixes. | ||
// "Hello World"@en -> en | ||
// "Hello World"^^test:type -> test:type | ||
LiteralDescriptorVariant descriptor_; | ||
|
||
// Create a new literal without any descriptor | ||
explicit Literal(NormalizedString content); | ||
|
||
// Create a new literal with a datatype | ||
Literal(NormalizedString content, Iri datatype); | ||
|
||
// Create a new literal with a language tag | ||
Literal(NormalizedString content, NormalizedString languageTag); | ||
|
||
// Similar to `literalWithQuotes`, except the rdfContent is expected to | ||
// already be normalized | ||
static Literal literalWithNormalizedContent( | ||
NormalizedString normalizedRdfContent, | ||
std::optional<std::variant<Iri, string>> descriptor = std::nullopt); | ||
|
||
public: | ||
// Return true if the literal has an assigned language tag | ||
bool hasLanguageTag() const; | ||
|
||
// Return true if the literal has an assigned datatype | ||
bool hasDatatype() const; | ||
|
||
// Return the value of the literal without quotation marks and without any | ||
// datatype or language tag | ||
NormalizedStringView getContent() const; | ||
|
||
// Return the language tag of the literal, if available, without leading @ | ||
// character. Throws an exception if the literal has no language tag. | ||
NormalizedStringView getLanguageTag() const; | ||
|
||
// Return the datatype of the literal, if available, without leading ^^ | ||
// prefix. Throws an exception if the literal has no datatype. | ||
Iri getDatatype() const; | ||
|
||
// For documentation, see documentation of function | ||
// LiteralOrIri::literalWithQuotes | ||
static Literal literalWithQuotes( | ||
std::string_view rdfContentWithQuotes, | ||
std::optional<std::variant<Iri, string>> descriptor = std::nullopt); | ||
|
||
// For documentation, see documentation of function | ||
// LiteralOrIri::literalWithoutQuotes | ||
static Literal literalWithoutQuotes( | ||
std::string_view rdfContentWithoutQuotes, | ||
std::optional<std::variant<Iri, string>> descriptor = std::nullopt); | ||
}; | ||
} // namespace ad_utility::triple_component |
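The descriptor is stored as `std::variant<std::monostate, NormalizedString, Iri>`, so "no descriptor", "language tag", and "datatype" are mutually exclusive states of one member. A minimal sketch of that layout follows, assuming C++17; `SketchLiteral` and `SketchDatatype` are hypothetical stand-ins, `std::string` replaces `NormalizedString`, and `std::logic_error` replaces `AD_THROW`.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for `Iri` when used as a datatype.
struct SketchDatatype {
  std::string iri;
};

// Hypothetical, simplified stand-in for the `Literal` class in this PR.
class SketchLiteral {
  std::string content_;
  // `std::monostate` comes first, so a default-constructed variant means
  // "no descriptor".
  std::variant<std::monostate, std::string, SketchDatatype> descriptor_;

 public:
  explicit SketchLiteral(std::string content) : content_{std::move(content)} {}
  SketchLiteral(std::string content, std::string languageTag)
      : content_{std::move(content)}, descriptor_{std::move(languageTag)} {}
  SketchLiteral(std::string content, SketchDatatype datatype)
      : content_{std::move(content)}, descriptor_{std::move(datatype)} {}

  bool hasLanguageTag() const {
    return std::holds_alternative<std::string>(descriptor_);
  }
  bool hasDatatype() const {
    return std::holds_alternative<SketchDatatype>(descriptor_);
  }

  // Throws if the literal carries no language tag.
  const std::string& getLanguageTag() const {
    if (!hasLanguageTag()) {
      throw std::logic_error("The literal does not have a language tag.");
    }
    return std::get<std::string>(descriptor_);
  }

  // Throws if the literal carries no datatype.
  const SketchDatatype& getDatatype() const {
    if (!hasDatatype()) {
      throw std::logic_error("The literal does not have a datatype.");
    }
    return std::get<SketchDatatype>(descriptor_);
  }
};
```

A single variant member rules out a literal that carries both a language tag and a datatype, which two separate optional members would allow.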
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
// Copyright 2023, University of Freiburg, | ||
// Chair of Algorithms and Data Structures. | ||
// Author: Benedikt Maria Beckermann <[email protected]> | ||
|
||
#include "parser/LiteralOrIri.h" | ||
|
||
#include <algorithm> | ||
#include <utility> | ||
|
||
namespace ad_utility::triple_component { | ||
// __________________________________________ | ||
LiteralOrIri::LiteralOrIri(Iri iri) : data_{std::move(iri)} {} | ||
|
||
// __________________________________________ | ||
LiteralOrIri::LiteralOrIri(Literal literal) : data_{std::move(literal)} {} | ||
|
||
// __________________________________________ | ||
bool LiteralOrIri::isIri() const { return std::holds_alternative<Iri>(data_); } | ||
|
||
// __________________________________________ | ||
const Iri& LiteralOrIri::getIri() const { | ||
if (!isIri()) { | ||
AD_THROW( | ||
"LiteralOrIri object does not contain an Iri object and thus " | ||
"cannot return it"); | ||
} | ||
return std::get<Iri>(data_); | ||
} | ||
|
||
// __________________________________________ | ||
NormalizedStringView LiteralOrIri::getIriContent() const { | ||
return getIri().getContent(); | ||
} | ||
|
||
// __________________________________________ | ||
bool LiteralOrIri::isLiteral() const { | ||
return std::holds_alternative<Literal>(data_); | ||
} | ||
|
||
// __________________________________________ | ||
const Literal& LiteralOrIri::getLiteral() const { | ||
if (!isLiteral()) { | ||
AD_THROW( | ||
"LiteralOrIri object does not contain a Literal object and " | ||
"thus cannot return it"); | ||
} | ||
return std::get<Literal>(data_); | ||
} | ||
|
||
// __________________________________________ | ||
bool LiteralOrIri::hasLanguageTag() const { | ||
return getLiteral().hasLanguageTag(); | ||
} | ||
|
||
// __________________________________________ | ||
bool LiteralOrIri::hasDatatype() const { return getLiteral().hasDatatype(); } | ||
|
||
// __________________________________________ | ||
NormalizedStringView LiteralOrIri::getLiteralContent() const { | ||
return getLiteral().getContent(); | ||
} | ||
|
||
// __________________________________________ | ||
NormalizedStringView LiteralOrIri::getLanguageTag() const { | ||
return getLiteral().getLanguageTag(); | ||
} | ||
|
||
// __________________________________________ | ||
Iri LiteralOrIri::getDatatype() const { return getLiteral().getDatatype(); } | ||
|
||
// __________________________________________ | ||
NormalizedStringView LiteralOrIri::getContent() const { | ||
if (isLiteral()) | ||
return getLiteralContent(); | ||
else if (isIri()) | ||
return getIriContent(); | ||
else | ||
AD_THROW("LiteralOrIri object contains neither Iri nor Literal"); | ||
} | ||
|
||
// __________________________________________ | ||
LiteralOrIri LiteralOrIri::iriref(const std::string& stringWithBrackets) { | ||
return LiteralOrIri{Iri::iriref(stringWithBrackets)}; | ||
} | ||
|
||
// __________________________________________ | ||
LiteralOrIri LiteralOrIri::prefixedIri(const Iri& prefix, | ||
std::string_view suffix) { | ||
return LiteralOrIri{Iri::prefixed(prefix, suffix)}; | ||
} | ||
|
||
// __________________________________________ | ||
LiteralOrIri LiteralOrIri::literalWithQuotes( | ||
std::string_view rdfContentWithQuotes, | ||
std::optional<std::variant<Iri, string>> descriptor) { | ||
return LiteralOrIri( | ||
Literal::literalWithQuotes(rdfContentWithQuotes, std::move(descriptor))); | ||
} | ||
|
||
// __________________________________________ | ||
LiteralOrIri LiteralOrIri::literalWithoutQuotes( | ||
std::string_view rdfContentWithoutQuotes, | ||
std::optional<std::variant<Iri, string>> descriptor) { | ||
return LiteralOrIri(Literal::literalWithoutQuotes(rdfContentWithoutQuotes, | ||
std::move(descriptor))); | ||
} | ||
} // namespace ad_utility::triple_component |
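`LiteralOrIri::getContent` above ends in an `else` branch that throws, although a two-alternative variant normally holds one of its two types. One possible way to drop that branch is a generic visitor, sketched below under the assumption of C++17. `IriStub`, `LiteralStub`, and `SketchLiteralOrIri` are hypothetical names for this illustration, and the sketch is not a claim about how the PR resolved this.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical stand-ins: both expose `getContent()`, like `Iri` and
// `Literal` in this PR.
struct IriStub {
  std::string iri;
  std::string_view getContent() const { return iri; }
};
struct LiteralStub {
  std::string content;
  std::string_view getContent() const { return content; }
};

class SketchLiteralOrIri {
  std::variant<LiteralStub, IriStub> data_;

 public:
  explicit SketchLiteralOrIri(LiteralStub literal)
      : data_{std::move(literal)} {}
  explicit SketchLiteralOrIri(IriStub iri) : data_{std::move(iri)} {}

  bool isIri() const { return std::holds_alternative<IriStub>(data_); }
  bool isLiteral() const { return std::holds_alternative<LiteralStub>(data_); }

  // The generic lambda is instantiated once per alternative, so no fallback
  // branch is needed.
  std::string_view getContent() const {
    return std::visit(
        [](const auto& value) -> std::string_view { return value.getContent(); },
        data_);
  }
};
```

The explicit return type on the lambda keeps both instantiations at `std::string_view`, which `std::visit` requires.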
I think technically this might be subtly wrong, but that is due to preexisting bugs in corner cases of the escaping module, which are unrelated to this PR.
Most notably there actually is not much to "escape" for an iriref, but that requires some other changes by me.
In which way might this be subtly wrong?