Implement LiteralOrIri #1186

Merged
Changes from 11 commits
47 commits
16c6904
* Rebase
greenBene Jan 18, 2024
66fd51c
* Rebase
greenBene Dec 11, 2023
bac405d
* Formatting
greenBene Dec 11, 2023
d12a556
* Added test cases for draft implementation
greenBene Dec 11, 2023
959636f
* Formatting
greenBene Dec 11, 2023
3524d3a
* Formatting of header
greenBene Dec 11, 2023
a8cebea
* basic implementation of NormalizedString
greenBene Jan 19, 2024
eca253d
* fixed formatting
greenBene Jan 19, 2024
f326fae
* added warning
greenBene Jan 19, 2024
8e52000
* Updated Literal, Iri, and LiteralOrIri to use NormalizedString inst…
greenBene Jan 19, 2024
3c6153d
* tidying up and adding comments to interface
greenBene Jan 19, 2024
49d267d
* created function to parse arbitrary string to LiteralOrIriType incl…
greenBene Jan 19, 2024
bd32608
* Removed unnecessary filenames from parser/CMakeLists
greenBene Jan 22, 2024
e27f682
* Removed "Type" suffix from new classes
greenBene Jan 22, 2024
3b614fe
* Removed "Type" suffix from new classes
greenBene Jan 22, 2024
9507c9d
* Updated Iri interface to be more similar to Literal interface
greenBene Jan 22, 2024
6ff0ecf
* Improved documentation
greenBene Jan 22, 2024
e0372aa
* Removed unused code
greenBene Jan 22, 2024
bb8b089
* Fixed code smells reported by sonarcloud
greenBene Jan 22, 2024
f3ba1db
* Adapted code formatting
greenBene Jan 22, 2024
bd9b819
* Improved comments and variable naming
greenBene Jan 22, 2024
a234ef7
* typos
greenBene Jan 22, 2024
a075f44
* added toRdf export functions to Iri, Literal, and LiteralOrIri
greenBene Jan 22, 2024
7683bfe
* added exception to Literal::toRdf() thrown if descriptorValue_ has …
greenBene Jan 22, 2024
a6d2cc1
* improved documentation
greenBene Jan 22, 2024
766c2fb
* improved documentation
greenBene Jan 22, 2024
8093a4d
* removed previously missed `this->var` occurrences
greenBene Jan 22, 2024
0b80900
* code formatting
greenBene Jan 22, 2024
d3fd6e3
* Literal now uses IRI for datatype descriptors
greenBene Jan 25, 2024
30c2815
* wrote normalizing functions for Iri, Literal, and LanguageTag strings
greenBene Jan 26, 2024
5bc1345
Merge branch 'master' into literal-or-iri-type-implementation
greenBene Jan 26, 2024
8223519
* formatting
greenBene Jan 26, 2024
d3e4e97
* formatting
greenBene Jan 26, 2024
e072cf4
* Implemented Review Feedback
greenBene Jan 29, 2024
f74084b
Move the `Literal` type into a namespace.
joka921 Jan 30, 2024
2ca364a
Move everything into a namespace.
joka921 Jan 30, 2024
4aef562
* Unified Literal constructing functions
greenBene Feb 1, 2024
0fb6684
* added new line in test/parser/CMakeLists.txt
greenBene Feb 1, 2024
72a9673
* Added test to ensure the given literal is encoded if a literal is c…
greenBene Feb 1, 2024
74797f7
* Removed code duplication
greenBene Feb 1, 2024
a5d0649
* Used std::visit to avoid runtime `AD_THROW`
greenBene Feb 1, 2024
cb5155e
* Adapted comments
greenBene Feb 1, 2024
d91739e
* switched from raw literals to standard literals
greenBene Feb 1, 2024
f91df95
* adapted documentation to avoid duplication
greenBene Feb 2, 2024
7d1c9b8
Some small improvements for the interface.
joka921 Feb 2, 2024
fe42703
Merge remote-tracking branch 'greenBene/literal-or-iri-type-implement…
joka921 Feb 2, 2024
fdebba3
A tiny change.
joka921 Feb 2, 2024
10 changes: 9 additions & 1 deletion src/parser/CMakeLists.txt
Expand Up @@ -21,6 +21,14 @@ add_library(parser
SelectClause.cpp
GraphPatternOperation.cpp
# The `Variable.cpp` from the subdirectory is linked here because otherwise we get linking errors.
- GraphPattern.cpp data/VariableToColumnMapPrinters.cpp)
+ GraphPattern.cpp data/VariableToColumnMapPrinters.cpp
+ LiteralType.h
+ IriType.h
+ IriType.cpp
+ LiteralType.cpp
+ LiteralOrIriType.cpp
+ LiteralOrIriType.h
+ NormalizedString.cpp
+ NormalizedString.h)
qlever_target_link_libraries(parser sparqlParser parserData sparqlExpressions rdfEscaping re2::re2 util engine)

11 changes: 11 additions & 0 deletions src/parser/IriType.cpp
@@ -0,0 +1,11 @@
// Copyright 2023, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Benedikt Maria Beckermann <[email protected]>

#include "IriType.h"

#include <utility>

IriType::IriType(NormalizedString iri) { this->iri = std::move(iri); }

NormalizedStringView IriType::getIri() const { return this->iri; }
22 changes: 22 additions & 0 deletions src/parser/IriType.h
@@ -0,0 +1,22 @@
// Copyright 2023, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Benedikt Maria Beckermann <[email protected]>

#pragma once

#include <string>

#include "NormalizedString.h"

class IriType {
private:
// Stores the string value of the IRI
NormalizedString iri;

public:
// Creates a new IRI object
explicit IriType(NormalizedString iri);

// Returns the string value of the iri object
[[nodiscard]] NormalizedStringView getIri() const;
Member

The interface should probably be something like

static Iri parseFromRdf(std::string_view rdf);  // Takes "<someIriWithEscapingsEtc>" and calls the appropriate normalization from RdfEscaping.
NormalizedStringView getNormalizedContent();  // The normalized value without the <>.
std::string toRdf();  // Redoes the escaping and re-adds the <>. Typically not used much, as we have dedicated exporting routines, but useful at least for debugging and printing.

Member

Just `getContent()`; then it is the same as for the `Literal` class, which makes usage easier.

Member

Okay, after further thinking:
Maybe we mostly use the parser for the parsing, and taking the NormalizedString directly is fine. But then you really have to document whether it should be with or without the <>, and maybe warn if it starts or ends with those (theoretically you can escape them, but what are the odds?).

Member

So, bottom line: it is important to make it very clear when you get the <> and when you do not.
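
The interface discussed in this thread could be sketched roughly as follows. This is only an illustration under stated assumptions: `IriSketch` and `fromRdf` are hypothetical names, plain `std::string` stands in for `NormalizedString`, and no escaping or normalization is performed. The point is that each function documents whether the angle brackets are included.

```cpp
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical sketch, not the class from this PR. Plain std::string stands
// in for NormalizedString, and no escaping or normalization is performed.
class IriSketch {
 public:
  // Expects the RDF form *with* angle brackets, e.g. "<http://example.org/a>".
  static IriSketch fromRdf(std::string_view rdf) {
    if (rdf.size() < 2 || rdf.front() != '<' || rdf.back() != '>') {
      throw std::invalid_argument("An IRI must be enclosed in <>");
    }
    rdf.remove_prefix(1);
    rdf.remove_suffix(1);
    return IriSketch(std::string(rdf));
  }

  // Returns the content *without* the angle brackets.
  std::string_view getContent() const { return content_; }

  // Returns the RDF form *with* the angle brackets.
  std::string toRdf() const { return "<" + content_ + ">"; }

 private:
  // Private, so that objects can only be created via fromRdf().
  explicit IriSketch(std::string content) : content_(std::move(content)) {}

  std::string content_;  // Stored without the angle brackets.
};
```

With this shape, a caller never has to guess: input and `toRdf()` carry the `<>`, while `getContent()` does not.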

};
64 changes: 64 additions & 0 deletions src/parser/LiteralOrIriType.cpp
@@ -0,0 +1,64 @@
// Copyright 2023, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Benedikt Maria Beckermann <[email protected]>

#include "LiteralOrIriType.h"

LiteralOrIriType::LiteralOrIriType(IriType data) : data(data) {}
LiteralOrIriType::LiteralOrIriType(LiteralType data) : data(data) {}

bool LiteralOrIriType::isIri() const {
return std::holds_alternative<IriType>(data);
}

IriType& LiteralOrIriType::getIriTypeObject() {
if (!isIri()) {
AD_THROW(
"LiteralOrIriType object does not contain an IriType object and thus "
"cannot return it");
}
Member

Also use `AD_CONTRACT_CHECK` as soon as I've merged it with the additional message.

return std::get<IriType>(data);
}

NormalizedStringView LiteralOrIriType::getIriString() {
IriType& iriType = getIriTypeObject();
return iriType.getIri();
}

bool LiteralOrIriType::isLiteral() const {
return std::holds_alternative<LiteralType>(data);
}

LiteralType& LiteralOrIriType::getLiteralTypeObject() {
if (!isLiteral()) {
AD_THROW(
"LiteralOrIriType object does not contain a LiteralType object and "
"thus cannot return it");
}
Member

Same comment about `AD_CONTRACT_CHECK`.

return std::get<LiteralType>(data);
}

bool LiteralOrIriType::hasLanguageTag() {
LiteralType& literal = getLiteralTypeObject();
return literal.hasLanguageTag();
}

bool LiteralOrIriType::hasDatatype() {
LiteralType& literal = getLiteralTypeObject();
return literal.hasDatatype();
}

NormalizedStringView LiteralOrIriType::getLiteralContent() {
LiteralType& literal = getLiteralTypeObject();
return literal.getContent();
}

NormalizedStringView LiteralOrIriType::getLanguageTag() {
LiteralType& literal = getLiteralTypeObject();
return literal.getLanguageTag();
}

NormalizedStringView LiteralOrIriType::getDatatype() {
LiteralType& literal = getLiteralTypeObject();
return literal.getDatatype();
}
50 changes: 50 additions & 0 deletions src/parser/LiteralOrIriType.h
@@ -0,0 +1,50 @@
// Copyright 2023, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Benedikt Maria Beckermann <[email protected]>

#pragma once

#include <variant>

#include "IriType.h"
#include "LiteralType.h"

class LiteralOrIriType {
private:
using LiteralOrIriVariant = std::variant<LiteralType, IriType>;
LiteralOrIriVariant data;

// Returns contained IriType object if available, throws exception otherwise
IriType& getIriTypeObject();
// Returns contained LiteralType object if available, throws exception
// otherwise
LiteralType& getLiteralTypeObject();
Member

Just `getIri` and `getLiteral`, but also add the `const& getBla() const;` overloads.
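
A minimal sketch of what such const and non-const overloads might look like. The types `IriValue`, `LiteralValue`, and `LiteralOrIriSketch` are hypothetical stand-ins, and `std::logic_error` replaces the project's own exception macros, so this is an illustration rather than the code of this PR.

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins for the Iri and Literal classes of this PR.
struct IriValue {
  std::string content;
};
struct LiteralValue {
  std::string content;
};

class LiteralOrIriSketch {
 public:
  explicit LiteralOrIriSketch(IriValue iri) : data_(std::move(iri)) {}
  explicit LiteralOrIriSketch(LiteralValue literal)
      : data_(std::move(literal)) {}

  bool isIri() const { return std::holds_alternative<IriValue>(data_); }
  bool isLiteral() const {
    return std::holds_alternative<LiteralValue>(data_);
  }

  // Const overloads: callable on const objects, return read-only references.
  const IriValue& getIri() const {
    if (!isIri()) {
      throw std::logic_error("getIri() was called, but no IRI is stored");
    }
    return std::get<IriValue>(data_);
  }
  const LiteralValue& getLiteral() const {
    if (!isLiteral()) {
      throw std::logic_error(
          "getLiteral() was called, but no literal is stored");
    }
    return std::get<LiteralValue>(data_);
  }

  // Non-const overloads: reuse the checks of the const versions.
  IriValue& getIri() {
    return const_cast<IriValue&>(std::as_const(*this).getIri());
  }
  LiteralValue& getLiteral() {
    return const_cast<LiteralValue&>(std::as_const(*this).getLiteral());
  }

 private:
  std::variant<LiteralValue, IriValue> data_;
};
```

Delegating from the non-const to the const overload keeps the check in one place; writing the check twice would be an equally valid choice.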


public:
// Creates a new LiteralOrIriType based on a LiteralType object
explicit LiteralOrIriType(LiteralType data);
// Creates a new LiteralOrIriType based on an IriType object
explicit LiteralOrIriType(IriType data);

// Returns true, if object contains an IriType object
[[nodiscard]] bool isIri() const;
// Returns iri string of contained IriType object if available, throws
// exception otherwise
NormalizedStringView getIriString();

// Returns true if the object contains a LiteralType object
[[nodiscard]] bool isLiteral() const;
// Returns true if contained LiteralType object has a language tag, throws
// exception if no LiteralType object is contained
bool hasLanguageTag();
// Returns true if contained LiteralType object has a datatype, throws
// exception if no LiteralType object is contained
bool hasDatatype();
// Returns content of contained LiteralType as string, throws exception if no
// LiteralType object is contained
NormalizedStringView getLiteralContent();
// Returns the language tag of the contained LiteralType, throws exception if
// no LiteralType object is contained or object has no language tag
NormalizedStringView getLanguageTag();
// Returns the datatype of the contained LiteralType, throws exception if no
// LiteralType object is contained or object has no datatype
NormalizedStringView getDatatype();
Member

  1. If you want to do it like this, you also need getIriContent() for consistency.
  2. We also need a function getContent that gives us the normalized content of either the literal or the IRI.
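
A `getContent` that works for either alternative could be sketched with `std::visit`, which avoids a runtime check because both alternatives expose the same member. The names `IriPart`, `LiteralPart`, and `EitherSketch` are hypothetical, and `std::string` stands in for `NormalizedString`.

```cpp
#include <string>
#include <string_view>
#include <utility>
#include <variant>

// Hypothetical stand-ins; the content is stored without <> or quotes.
struct IriPart {
  std::string content;
};
struct LiteralPart {
  std::string content;
};

class EitherSketch {
 public:
  explicit EitherSketch(IriPart iri) : data_(std::move(iri)) {}
  explicit EitherSketch(LiteralPart literal) : data_(std::move(literal)) {}

  // Returns the content of whichever alternative is stored. The generic
  // lambda is instantiated once per alternative, so no exception is needed.
  std::string_view getContent() const {
    return std::visit(
        [](const auto& alternative) -> std::string_view {
          return alternative.content;
        },
        data_);
  }

 private:
  std::variant<LiteralPart, IriPart> data_;
};
```

The returned view refers to the string stored inside the object, so it is only valid as long as the object is alive and unmodified.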

};
44 changes: 44 additions & 0 deletions src/parser/LiteralType.cpp
@@ -0,0 +1,44 @@
// Copyright 2023, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Benedikt Maria Beckermann <[email protected]>

#include "LiteralType.h"

#include <utility>

LiteralType::LiteralType(NormalizedString content) {
this->content = std::move(content);
this->descriptorType = LiteralDescriptor::NONE;
}

LiteralType::LiteralType(NormalizedString content,
NormalizedString datatypeOrLanguageTag,
LiteralDescriptor type) {
this->content = std::move(content);
this->descriptorType = type;
this->descriptorValue = std::move(datatypeOrLanguageTag);
}

bool LiteralType::hasLanguageTag() const {
return this->descriptorType == LiteralDescriptor::LANGUAGE_TAG;
}

bool LiteralType::hasDatatype() const {
return this->descriptorType == LiteralDescriptor::DATATYPE;
}

NormalizedStringView LiteralType::getContent() const { return this->content; }

NormalizedStringView LiteralType::getDatatype() const {
if (!hasDatatype()) {
AD_THROW("The literal does not have an explicit datatype.");
}
Member

I am currently working on an implementation of
AD_CONTRACT_CHECK(hasDatatype(), "getDatatype() was called on a Literal which has no datatype");
(More concise). I will let you know when this is merged.
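
To illustrate the idea of a check macro that takes an additional message, here is a rough sketch. `SKETCH_CONTRACT_CHECK` and `LiteralSketch` are hypothetical; the actual `AD_CONTRACT_CHECK` in QLever may be defined and behave differently.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a contract-check macro with a message. The real
// AD_CONTRACT_CHECK may look and behave differently.
#define SKETCH_CONTRACT_CHECK(condition, message)                    \
  do {                                                               \
    if (!(condition)) {                                              \
      throw std::logic_error(std::string{"Contract violated: "} +    \
                             #condition + ". " + (message));         \
    }                                                                \
  } while (false)

struct LiteralSketch {
  std::string content;
  std::string datatype;  // Empty means "no datatype" in this sketch only.

  bool hasDatatype() const { return !datatype.empty(); }

  // One line replaces the explicit if/throw block.
  const std::string& getDatatype() const {
    SKETCH_CONTRACT_CHECK(
        hasDatatype(),
        "getDatatype() was called on a literal which has no datatype");
    return datatype;
  }
};
```

The stringified condition (`#condition`) ends up in the exception text, which is one way to make such a macro more informative than a bare `throw`.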

return this->descriptorValue;
}

NormalizedStringView LiteralType::getLanguageTag() const {
if (!hasLanguageTag()) {
AD_THROW("The literal does not have an explicit language tag.");
}
return this->descriptorValue;
}
45 changes: 45 additions & 0 deletions src/parser/LiteralType.h
@@ -0,0 +1,45 @@
// Copyright 2023, University of Freiburg,
// Chair of Algorithms and Data Structures.
// Author: Benedikt Maria Beckermann <[email protected]>

#pragma once

#include "NormalizedString.h"

enum LiteralDescriptor { NONE, LANGUAGE_TAG, DATATYPE };

class LiteralType {
private:
// Stores the string value of the literal
NormalizedString content;
// Stores the optional language tag or the optional datatype if applicable
NormalizedString descriptorValue;
// Stores whether the literal has a language tag, a datatype, or none of
// these two assigned to it
LiteralDescriptor descriptorType;

public:
// Creates a new literal without any descriptor
LiteralType(NormalizedString content);

// Creates a new literal with the given descriptor
LiteralType(NormalizedString content, NormalizedString datatypeOrLanguageTag,
LiteralDescriptor type);

// Returns true if the literal has an assigned language tag
bool hasLanguageTag() const;

// Returns true if the literal has an assigned datatype
bool hasDatatype() const;

// Returns the value of the literal, without any datatype or language tag
Member

Also always make it clear whether you get the quotation marks.
The more I think about this, the less sure I am whether we shouldn't do the parsing directly inside these classes; then we could enforce more invariants about the ", @, and <> characters.

Member

Or at least make the constructor private and have only certain authorized parsers create such objects.
But we can discuss this at a later point.
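
One way the "private constructor plus authorized parser" idea might look is sketched here. `QuotedLiteralSketch` and `fromRdf` are hypothetical names; the sketch handles only an optional `@lang` suffix, ignores datatypes, and does no unescaping, so it is far from a complete literal parser.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical sketch: only fromRdf() can create objects, so the invariants
// about the quotation marks and the '@' are enforced in a single place.
// Datatypes (^^<...>) and escaped characters are not handled here.
class QuotedLiteralSketch {
 public:
  // Expects the RDF form, e.g. "\"hello\"" or "\"hello\"@en".
  static QuotedLiteralSketch fromRdf(std::string_view rdf) {
    if (rdf.size() < 2 || rdf.front() != '"') {
      throw std::invalid_argument("A literal must start with a quote");
    }
    const std::size_t closing = rdf.rfind('"');
    if (closing == 0) {
      throw std::invalid_argument("The closing quote is missing");
    }
    std::string_view content = rdf.substr(1, closing - 1);
    std::string_view suffix = rdf.substr(closing + 1);
    std::string languageTag;
    if (!suffix.empty()) {
      if (suffix.front() != '@' || suffix.size() < 2) {
        throw std::invalid_argument("Expected '@' and a language tag");
      }
      languageTag = std::string(suffix.substr(1));
    }
    return QuotedLiteralSketch(std::string(content), std::move(languageTag));
  }

  // Returns the content *without* the quotation marks.
  std::string_view getContent() const { return content_; }
  bool hasLanguageTag() const { return !languageTag_.empty(); }
  // Returns the language tag *without* the leading '@'.
  std::string_view getLanguageTag() const { return languageTag_; }

 private:
  QuotedLiteralSketch(std::string content, std::string languageTag)
      : content_(std::move(content)), languageTag_(std::move(languageTag)) {}

  std::string content_;
  std::string languageTag_;
};
```

Because the constructor is private, every object has passed the checks in `fromRdf`, which is the kind of invariant the comment above is asking for.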

NormalizedStringView getContent() const;

// Returns the language tag of the literal if available.
// Throws an exception if the literal has no language tag.
NormalizedStringView getLanguageTag() const;

// Returns the datatype of the literal if available.
// Throws an exception if the literal has no datatype.
NormalizedStringView getDatatype() const;
};
32 changes: 32 additions & 0 deletions src/parser/NormalizedString.cpp
@@ -0,0 +1,32 @@
//
// Created by beckermann on 1/19/24.
//

#include "NormalizedString.h"

#include <algorithm>
#include <iostream>

std::ostream& operator<<(std::ostream& str, NormalizedChar c) {
str << c.c_;
return str;
}

Codecov (codecov/patch) warning on line 12 of src/parser/NormalizedString.cpp: added lines #L9-L12 were not covered by tests.

NormalizedString fromStringUnsafe(std::string_view input) {
NormalizedString normalizedString;
normalizedString.resize(input.size());

std::transform(input.begin(), input.end(), normalizedString.begin(),
[](char c) { return NormalizedChar{c}; });

return normalizedString;
}

NormalizedString normalizeFromLiteralContent(std::string_view literal) {

Codecov (codecov/patch) warning on line 24 of src/parser/NormalizedString.cpp: added line #L24 was not covered by tests.
// TODO remove and replace invalid characters
return fromStringUnsafe(literal);
}

Codecov (codecov/patch) warning on line 27 of src/parser/NormalizedString.cpp: added lines #L26-L27 were not covered by tests.

std::string_view asStringView(NormalizedStringView normalizedStringView) {
return {reinterpret_cast<const char*>(normalizedStringView.data()),
normalizedStringView.size()};
}
Member

This function is also unsafe in some way.
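
The comment does not say which kind of unsafety is meant. One reading is that the `reinterpret_cast` relies on layout assumptions about the wrapper struct; another is that the resulting `std::string_view` loses the "normalized" type information. For the first reading, the assumptions could at least be stated at compile time, as in this sketch. `NormalizedCharSketch` and `asStringViewSketch` are hypothetical names, and the function takes a pointer and a size instead of a `NormalizedStringView` to keep the example self-contained.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

// Hypothetical copy of the wrapper struct from this PR.
struct NormalizedCharSketch {
  char c_;
};

// Layout assumptions behind the cast, checked at compile time. They narrow
// the risk, but are not a full language-level guarantee.
static_assert(sizeof(NormalizedCharSketch) == sizeof(char));
static_assert(alignof(NormalizedCharSketch) == alignof(char));
static_assert(std::is_standard_layout_v<NormalizedCharSketch>);
static_assert(std::is_trivially_copyable_v<NormalizedCharSketch>);

// Views `size` wrapped characters starting at `data` as plain characters.
// The caller must keep the underlying storage alive.
inline std::string_view asStringViewSketch(const NormalizedCharSketch* data,
                                           std::size_t size) {
  return {reinterpret_cast<const char*>(data), size};
}
```

If a later change adds a member to the struct, the `static_assert`s fail to compile instead of silently producing garbage.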

27 changes: 27 additions & 0 deletions src/parser/NormalizedString.h
@@ -0,0 +1,27 @@
//
// Created by beckermann on 1/19/24.
//

#pragma once

#include <string>
#include <string_view>

struct NormalizedChar {
char c_;
};

using NormalizedStringView = std::basic_string_view<NormalizedChar>;
using NormalizedString = std::basic_string<NormalizedChar>;

// Creates a new NormalizedString object by just copying the contents of the
// input.
// Warning: This function should only be used for testing and is to be removed
// once the normalizeFromLiteralContent function is implemented
NormalizedString fromStringUnsafe(std::string_view input);

// Normalizes the given literal and returns it as a new NormalizedString object
NormalizedString normalizeFromLiteralContent(std::string_view literal);
Member

There are various different normalizations; maybe we should talk in person about how best to do this.


// Returns the given NormalizedStringView as a string_view.
std::string_view asStringView(NormalizedStringView normalizedStringView);
1 change: 1 addition & 0 deletions test/parser/CMakeLists.txt
@@ -1 +1,2 @@
addLinkAndDiscoverTest(ParallelBufferTest parser)
addLinkAndDiscoverTest(LiteralOrIriTypeTest)