Issues and new features #4

Closed
rlaiola opened this issue Oct 7, 2024 · 50 comments
Labels: bug, enhancement

Comments

@rlaiola

rlaiola commented Oct 7, 2024

Below are some use cases that need further investigation.

Issues (potentially)

  • Queries 1 and 2
-- Dataset: RST
-- Should return all tuples in R, but returns an empty set
{ r | R(r) and ∀s (S(s)) }
{ t | R(t) ∨ t.a < 0}
  • Query 3
-- Dataset: RST
-- Error: join would result in non unique column names R.a, R.b, R.c
-- Suggestion: always rename relation using variable name
--             (either free or bound to a quantified operator)
{ r | R(r) and ∃p (R(p) and p.a > 1) }
  • Query 4
-- Dataset: RST
-- Error: at line 2: Expected "(", ".", or [a-zA-Z0-9_] but " " found.
-- Should consider the schema of the tuple variable
{ t | R(t) ∧ a > 0 }
  • Query 5
-- Dataset: RST
-- Error: could not find column "S.b" in schema [R.a : number, R.b : string, R.c : string]
{ t.a, p.b | R(t) and S(p) and t.a = p.b }
  • Query 6
-- Dataset: RST
-- Error: at line 1: Expected ".", "|", or [a-zA-Z0-9_] but "," found.
{ t, p | R(t) and S(p) and t.a = p.b }
  • Query 7
-- Dataset: RST
-- Error: at line 6: Expected ".", "[", "|", or [a-zA-Z0-9_] but "(" found.
-- Should allow generalized projection with value expression
-- See ValueExprFunction at grammar_ra.d.ts
{ concat(t.b, t.c) | R(t) }
{ (t.a > 1)->a | R(t) }
  • Query 8
-- Dataset: RST
-- Error: at line 6: Expected "[" but "}" found.
-- Should consider null value
{ t.b | R(t) and t.b = null }
  • Query 9
-- Dataset: RST
-- Should consider boolean values (as well as date, number and string)
{ t | R(t) and ((a>1) = true) }
  • Query 10
-- Dataset: Employee
-- Error: at line 6: Expected ")" or [a-zA-Z0-9_] but "." found.
{ t | DEPENDENT(t) and date(t.Bdate) > date('1990-01-01') }

New features

  • variable assignment : X = { t | R(t) and t.a = 1 }
@KPMGE
Owner

KPMGE commented Oct 7, 2024

  • Query 5
-- Dataset: RST
-- Error: could not find column "S.b" in schema [R.a : number, R.b : string, R.c : string]
{ t.a, p.b | R(t) and S(p) and t.a = p.b }
  • Query 6
-- Dataset: RST
-- Error: at line 1: Expected ".", "|", or [a-zA-Z0-9_] but "," found.
{ t, p | R(t) and S(p) and t.a = p.b }

There's no support for multiple tuple variables at the moment. Maybe I should make that explicit with an error message.

@KPMGE
Owner

KPMGE commented Oct 7, 2024

  • Query 10
-- Dataset: Employee
-- Error: at line 6: Expected ")" or [a-zA-Z0-9_] but "." found.
{ t | DEPENDENT(t) and date(t.Bdate) > date('1990-01-01') }

The problem here is that there's no conversion into dates yet. Also, it's assumed that the right-hand side of the predicate is always a RelationPredicate. I'll investigate further and see if it can be generalized.

@KPMGE
Owner

KPMGE commented Oct 7, 2024

  • Query 3
-- Dataset: RST
-- Error: join would result in non unique column names R.a, R.b, R.c
-- Suggestion: always rename relation using variable name
--             (either free or bound to a quantified operator)
{ r | R(r) and ∃p (R(p) and p.a > 1) }

Yes, I also thought about this issue before. IMO the renaming should be bound to the existential or universal operator, as each introduces a new "scope" in which the variable should be treated differently.
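
One way to picture the variable-scoped renaming discussed here is the Python sketch below. It is only an illustration, not the relax implementation: the `qualify` helper and the sample rows for R are hypothetical, and relations are modelled as lists of dicts.

```python
# Hypothetical sample data for R(a, b, c); not the actual RST dataset.
R = [
    {"a": 1, "b": "x", "c": "u"},
    {"a": 2, "b": "y", "c": "v"},
]

def qualify(relation, var):
    """Prefix every column with the tuple variable name, e.g. a -> r.a."""
    return [{f"{var}.{col}": val for col, val in row.items()} for row in relation]

# { r | R(r) and ∃p (R(p) and p.a > 1) }
outer = qualify(R, "r")  # columns r.a, r.b, r.c
inner = qualify(R, "p")  # columns p.a, p.b, p.c, so no clash with r.*

# The quantified part does not mention r, so it is evaluated once.
exists_p = any(p["p.a"] > 1 for p in inner)
result = [r for r in outer if exists_p]
```

Because each scope gets its own prefix, joining R with itself no longer produces duplicate column names.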

@KPMGE
Owner

KPMGE commented Oct 7, 2024

{ t | R(t) ∨ t.a < 0}

The problem here is that R(t) is being skipped. I thought it wouldn't make a difference but, as seen here, it does. I'll take it into account.
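
To illustrate why skipping R(t) changes the result, here is a small Python sketch. The rows are hypothetical (all values of a are non-negative), and drawing the candidate tuples from R is an assumption made to keep the query range-restricted.

```python
# Hypothetical rows for R; every a is >= 0.
R = [{"a": 1, "b": "x"}, {"a": 3, "b": "y"}]

def in_R(t):
    """R(t): membership test for the relation predicate."""
    return t in R

# { t | R(t) ∨ t.a < 0 }
# If R(t) is dropped from the formula, only t.a < 0 remains.
skipping_relation_predicate = [t for t in R if t["a"] < 0]

# Evaluating the whole disjunction keeps every tuple of R.
full_disjunction = [t for t in R if in_R(t) or t["a"] < 0]
```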

@rlaiola
Author

rlaiola commented Oct 7, 2024

  • Query 3
-- Dataset: RST
-- Error: join would result in non unique column names R.a, R.b, R.c
-- Suggestion: always rename relation using variable name
--             (either free or bound to a quantified operator)
{ r | R(r) and ∃p (R(p) and p.a > 1) }

Yes, i also thought about this issue before. Imo the renaming should be bound to the existential or universal operator, as they introduce a new "scope" where now the variable should be treated differently.

Good! It would meet the needs.

@rlaiola
Author

rlaiola commented Oct 7, 2024

  • Query 5
-- Dataset: RST
-- Error: could not find column "S.b" in schema [R.a : number, R.b : string, R.c : string]
{ t.a, p.b | R(t) and S(p) and t.a = p.b }
  • Query 6
-- Dataset: RST
-- Error: at line 1: Expected ".", "|", or [a-zA-Z0-9_] but "," found.
{ t, p | R(t) and S(p) and t.a = p.b }

There's no support for multiple tuple variables at the moment. Maybe i should make that explicit with an error message

Perhaps one way to solve that would be to treat the base relation as R x S.
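
A rough Python sketch of that idea, using hypothetical rows and modelling relations as lists of dicts (this is not how relax translates the query, only the cross-product reading of it):

```python
from itertools import product

# Hypothetical rows; not the actual RST dataset.
R = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
S = [{"b": 1, "d": 10}, {"b": 3, "d": 30}]

# { t.a, p.b | R(t) and S(p) and t.a = p.b }
# Step 1: the base relation is the cross product R x S.
pairs = product(R, S)

# Step 2: filter with the remaining condition, then project t.a and p.b.
result = [
    {"t.a": t["a"], "p.b": p["b"]}
    for t, p in pairs
    if t["a"] == p["b"]
]
```

Qualifying the output columns with the variable names (t.a, p.b) avoids the "could not find column" lookup against a single relation's schema.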

@rlaiola
Author

rlaiola commented Oct 7, 2024

  • Query 10
-- Dataset: Employee
-- Error: at line 6: Expected ")" or [a-zA-Z0-9_] but "." found.
{ t | DEPENDENT(t) and date(t.Bdate) > date('1990-01-01') }

The problem here is that there's no such conversion into dates yet. Also, it's assumed that the right hand side of the predicate is always a RelationPredicate, I'll investigate it further and see if it can be generalized.

Perhaps the generalization could be something similar to valueExpressions from the RA grammar:
https://github.com/KPMGE/relax/blob/development/src/db/parser/grammar_ra.pegjs#L1337

@KPMGE
Owner

KPMGE commented Oct 7, 2024

{ t | R(t) ∨ t.a < 0}

The problem here is that the R(t) is being skipped, i thought it wouldn't make a difference, but as seen here, it does. I'll take it into account.

@rlaiola Fixed on #5

@KPMGE
Owner

KPMGE commented Oct 7, 2024

  • Query 3
-- Dataset: RST
-- Error: join would result in non unique column names R.a, R.b, R.c
-- Suggestion: always rename relation using variable name
--             (either free or bound to a quantified operator)
{ r | R(r) and ∃p (R(p) and p.a > 1) }

Yes, i also thought about this issue before. Imo the renaming should be bound to the existential or universal operator, as they introduce a new "scope" where now the variable should be treated differently.

Good! It would meet the needs.

@rlaiola fixed on #6. I also generalized the relation predicate positioning. Now it can be placed anywhere as long as it is in the correct scope. I've added an error message for when it is missing.

Now a query like this works:

 { r | ∃p (p.a > 1 and R(p) ) and R(r) }

@KPMGE
Owner

KPMGE commented Oct 7, 2024

  • Query 5
-- Dataset: RST
-- Error: could not find column "S.b" in schema [R.a : number, R.b : string, R.c : string]
{ t.a, p.b | R(t) and S(p) and t.a = p.b }
  • Query 6
-- Dataset: RST
-- Error: at line 1: Expected ".", "|", or [a-zA-Z0-9_] but "," found.
{ t, p | R(t) and S(p) and t.a = p.b }

There's no support for multiple tuple variables at the moment. Maybe i should make that explicit with an error message

Perhaps, a way to solve that would be to consider the base relation as R x S.

I think it may be more complicated than that, but I'll take a look at it! Thanks for the suggestion!

@KPMGE
Owner

KPMGE commented Oct 8, 2024

  • Query 9
-- Dataset: RST
-- Should consider boolean values (as well as date, number and string)
{ t | R(t) and ((a>1) = true) }

I don't know, it looks a bit redundant. Also, as TRC relies on the truth or satisfaction of the conditions specified in the query, I don't see the benefit of adding these.

What do you think?

@KPMGE
Owner

KPMGE commented Oct 8, 2024

{ (t.a > 1)->a | R(t) }

I'm not sure if this query makes sense. In this case there's no separation between the condition and the tuple variable, which goes against the very definition of a TRC query, which is of the form:

{ t | Condition(t) }

The pipe separates the condition from the tuple variable, which does not happen in this query.

What do you think?

@rlaiola
Author

rlaiola commented Oct 8, 2024

{ (t.a > 1)->a | R(t) }

I'm not sure if this query makes sense. In this case there's no separation between the condition and the tuple variable, which goes against the very definition of a TRC query, which is of the form:

{ t | Condition(t) }

The pipe separates the condition from the tuple variable, which does not happen in this query.

What do you think?

Indeed, this one does not make sense (copy and paste from another example). The expression that I tried to represent has a rename operator in the projection, as follows:

{ t.a->z | R(t) }

@rlaiola
Author

rlaiola commented Oct 8, 2024

  • Query 9
-- Dataset: RST
-- Should consider boolean values (as well as date, number and string)
{ t | R(t) and ((a>1) = true) }

Idk, it looks a bit redundant, also as TRC relies on the truth or satisfaction of the conditions specified in the query, i don't see the benefit of adding these.

What do you think?

This expression could definitely be rewritten in a more elegant way but the main point that I wanted to convey here was that a predicate should be a general boolean value expression, as represented in the Relational Algebra grammar.

For instance, this works:

sigma (a>1) = true (R)

I have the impression that we should try using a similar approach (represent predicates as valueExpressions).
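
As a sketch of what "predicates as value expressions" could mean, the Python snippet below evaluates a tiny expression tree and treats a predicate as any expression whose value is a boolean. The tuple-based node format and the `evaluate` function are hypothetical and unrelated to the actual grammar_ra.pegjs structures.

```python
import operator

# Supported binary operators for this sketch only.
OPS = {">": operator.gt, "=": operator.eq}

def evaluate(expr, row):
    """Evaluate a nested-tuple expression against one row."""
    kind = expr[0]
    if kind == "const":
        return expr[1]
    if kind == "col":
        return row[expr[1]]
    if kind == "binop":
        _, op, left, right = expr
        return OPS[op](evaluate(left, row), evaluate(right, row))
    raise ValueError(f"unknown expression kind: {kind}")

# (a > 1) = true
predicate = (
    "binop", "=",
    ("binop", ">", ("col", "a"), ("const", 1)),
    ("const", True),
)

R = [{"a": 1}, {"a": 2}]  # hypothetical rows
selected = [row for row in R if evaluate(predicate, row) is True]
```

With this shape, comparisons, boolean constants and function calls are all just expressions, so the selection only needs to check that the final value is true.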

@rlaiola
Author

rlaiola commented Oct 8, 2024

  • Query 3
-- Dataset: RST
-- Error: join would result in non unique column names R.a, R.b, R.c
-- Suggestion: always rename relation using variable name
--             (either free or bound to a quantified operator)
{ r | R(r) and ∃p (R(p) and p.a > 1) }

Yes, i also thought about this issue before. Imo the renaming should be bound to the existential or universal operator, as they introduce a new "scope" where now the variable should be treated differently.

Good! It would meet the needs.

@rlaiola fixed on #6. I also generalized the relation predicate positioning. Now, it can be placed anywhere as long as it is in the correct scope. I've added an error message it is missing.

Now a query like this works:

 { r | ∃p (p.a > 1 and R(p) ) and R(r) }

Great job!

Would something like this also work? Note that I refer to r.a in the quantified expression.

{ r | ∃p (p.a > r.a and R(p) ) and R(r) }

@KPMGE
Owner

KPMGE commented Oct 8, 2024

  • Query 3
-- Dataset: RST
-- Error: join would result in non unique column names R.a, R.b, R.c
-- Suggestion: always rename relation using variable name
--             (either free or bound to a quantified operator)
{ r | R(r) and ∃p (R(p) and p.a > 1) }

Yes, i also thought about this issue before. Imo the renaming should be bound to the existential or universal operator, as they introduce a new "scope" where now the variable should be treated differently.

Good! It would meet the needs.

@rlaiola fixed on #6. I also generalized the relation predicate positioning. Now, it can be placed anywhere as long as it is in the correct scope. I've added an error message it is missing.
Now a query like this works:

 { r | ∃p (p.a > 1 and R(p) ) and R(r) }

Great job!

Something like this would also work? Note that I refer to r.a in the quantified expression.

{ r | ∃p (p.a > r.a and R(p) ) and R(r) }

Yes, definitely! As long as you define the RelationPredicate in the right scope, everything should work as expected.

@KPMGE
Owner

KPMGE commented Oct 8, 2024

{ (t.a > 1)->a | R(t) }

I'm not sure if this query makes sense. In this case there's no separation between the condition and the tuple variable, which goes against the very definition of a TRC query, which is of the form:
{ t | Condition(t) }
The pipe separates the condition from the tuple variable, which does not happen in this query.
What do you think?

Indeed, this one does not make sense (copy and paste from another example). The expression that I tried to represent has a rename operator in the projection, as follows:

{ t.a->z | R(t) }

Oh, I see. In this case maybe it does make sense. I've got an idea on how to implement it; I'll take a look at it later. Thanks for the suggestion.

@rlaiola
Author

rlaiola commented Oct 8, 2024

{ (t.a > 1)->a | R(t) }

I'm not sure if this query makes sense. In this case there's no separation between the condition and the tuple variable, which goes against the very definition of a TRC query, which is of the form:
{ t | Condition(t) }
The pipe separates the condition from the tuple variable, which does not happen in this query.
What do you think?

Indeed, this one does not make sense (copy and paste from another example). The expression that I tried to represent has a rename operator in the projection, as follows:

{ t.a->z | R(t) }

Oh, i see. In this case maybe it does make sense. I've got an idea on how to implement it, i'll take a look at it later. Thanks for the suggestion.

It might help: https://github.com/KPMGE/relax/blob/development/src/db/parser/grammar_ra.pegjs#L418-L440

@KPMGE
Owner

KPMGE commented Oct 8, 2024

{ (t.a > 1)->a | R(t) }

I'm not sure if this query makes sense. In this case there's no separation between the condition and the tuple variable, which goes against the very definition of a TRC query, which is of the form:
{ t | Condition(t) }
The pipe separates the condition from the tuple variable, which does not happen in this query.
What do you think?

Indeed, this one does not make sense (copy and paste from another example). The expression that I tried to represent has a rename operator in the projection, as follows:

{ t.a->z | R(t) }

Oh, i see. In this case maybe it does make sense. I've got an idea on how to implement it, i'll take a look at it later. Thanks for the suggestion.

It might help: https://github.com/KPMGE/relax/blob/development/src/db/parser/grammar_ra.pegjs#L418-L440

@rlaiola Implemented on #7
Yes. I've used a similar approach, but one more suitable for the already developed TRC grammar. Now you can rename the columns. It's worth pointing out that renaming and projecting can also be combined. All the following queries work now:

{ r.a->x, r.b->y, r.c->z | R(r) }
{ r.a->x, r.b, r.c | R(r) }
{ r.a, r.b, r.c | R(r) }

I've also added the rename operator (arrow) to the editor tab.
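
The mixed rename/projection behaviour described above can be sketched in Python as follows. The `project` helper and the sample row are hypothetical; the sketch only shows the intended result, not the code merged in #7.

```python
def project(rows, columns):
    """Project rows onto columns.

    columns is a list of (source, alias) pairs; an alias of None keeps
    the original column name.
    """
    return [
        {(alias or src): row[src] for src, alias in columns}
        for row in rows
    ]

R = [{"a": 1, "b": "x", "c": "u"}]  # hypothetical row

# { r.a->x, r.b->y, r.c->z | R(r) }
all_renamed = project(R, [("a", "x"), ("b", "y"), ("c", "z")])
# { r.a->x, r.b, r.c | R(r) }
mixed = project(R, [("a", "x"), ("b", None), ("c", None)])
# { r.a, r.b, r.c | R(r) }
plain = project(R, [("a", None), ("b", None), ("c", None)])
```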

@KPMGE
Owner

KPMGE commented Oct 8, 2024

  • Query 9
-- Dataset: RST
-- Should consider boolean values (as well as date, number and string)
{ t | R(t) and ((a>1) = true) }

Idk, it looks a bit redundant, also as TRC relies on the truth or satisfaction of the conditions specified in the query, i don't see the benefit of adding these.
What do you think?

This expression could definitely be rewritten in a more elegant way but the main point that I wanted to convey here was that a predicate should be a general boolean value expression, as represented in the Relational Algebra grammar.

For instance, this works:

sigma (a>1) = true (R)

I have the impression that we should try using a similar approach (represent predicates as valueExpressions).

Oh, I see, that's a good point. I'll revisit it later. Thanks for the suggestion.

@KPMGE
Owner

KPMGE commented Oct 9, 2024

  • Query 8
-- Dataset: RST
-- Error: at line 6: Expected "[" but "}" found.
-- Should consider null value
{ t.b | R(t) and t.b = null }
  • Query 9
-- Dataset: RST
-- Should consider boolean values (as well as date, number and string)
{ t | R(t) and ((a>1) = true) }
  • Query 10
-- Dataset: Employee
-- Error: at line 6: Expected ")" or [a-zA-Z0-9_] but "." found.
{ t | DEPENDENT(t) and date(t.Bdate) > date('1990-01-01') }

@rlaiola Solved in #10. I've incorporated the valueExpr from RA into the TRC grammar. The predicate is now a valueExpr, so with some adjustments these cases are covered.

It's worth pointing out that the not operator must now have parentheses for predicates, just like ! does, so I don't think this is a problem. The tests were adjusted accordingly.

I've also taken the liberty of coming up with a slightly more interesting test case:

{ t | DEPENDENT(t) and date(t.Bdate) < date('1964-09-15') }

This one is a bit more interesting, as it actually filters and returns some tuples based on the condition. The previous example also works, but it returns no tuples (as expected).
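
For illustration, the two date comparisons can be mimicked in Python with the standard `datetime` module. The rows below are hypothetical (only the Bdate column name comes from the queries above; Dependent_name is assumed for readability), so the counts do not reflect the real Employee dataset.

```python
from datetime import date

# Hypothetical rows; Bdate is stored as an ISO-formatted string here.
DEPENDENT = [
    {"Dependent_name": "Alice", "Bdate": "1958-05-03"},
    {"Dependent_name": "Bob", "Bdate": "1988-12-30"},
]

def to_date(value):
    """Rough stand-in for date(...): parse an ISO string into a date."""
    return date.fromisoformat(value)

# { t | DEPENDENT(t) and date(t.Bdate) < date('1964-09-15') }
older = [t for t in DEPENDENT if to_date(t["Bdate"]) < to_date("1964-09-15")]

# { t | DEPENDENT(t) and date(t.Bdate) > date('1990-01-01') }
newer = [t for t in DEPENDENT if to_date(t["Bdate"]) > to_date("1990-01-01")]
```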

@rlaiola
Author

rlaiola commented Oct 9, 2024

Summary:

  • Query 1
  • Query 2
  • Query 3
  • Query 4
  • Query 5
  • Query 6
  • Query 7
  • Query 8
  • Query 9
  • Query 10

@KPMGE
Owner

KPMGE commented Oct 9, 2024

{ r | R(r) and ∀s (S(s)) }

Fixed on #13

@KPMGE
Owner

KPMGE commented Oct 10, 2024

  • Query 5
-- Dataset: RST
-- Error: could not find column "S.b" in schema [R.a : number, R.b : string, R.c : string]
{ t.a, p.b | R(t) and S(p) and t.a = p.b }
  • Query 6
-- Dataset: RST
-- Error: at line 1: Expected ".", "|", or [a-zA-Z0-9_] but "," found.
{ t, p | R(t) and S(p) and t.a = p.b }
  • Query 7
-- Dataset: RST
-- Error: at line 6: Expected ".", "[", "|", or [a-zA-Z0-9_] but "(" found.
-- Should allow generalized projection with value expression
-- See ValueExprFunction at grammar_ra.d.ts
{ concat(t.b, t.c) | R(t) }
{ (t.a > 1)->a | R(t) }

@rlaiola Implemented on #15

Summary:

  • Grammar updated
  • Translation updated to deal with multiple variables
  • New tests implemented

Now a query like this works:

{ t.a->z, p.b, k | R(t) and S(p) and T(k) }

This example is quite interesting because it shows that renaming and projecting can be combined in any way with this implementation (these scenarios are covered by unit tests).

As pointed out in #9, if one tries to define a RelationPredicate twice for any tuple variable in the same scope, an error is shown as before.
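
A minimal Python sketch of such a per-scope check is shown below. The `bind_variables` function and its error text are hypothetical; they only illustrate the rule that a tuple variable may be bound by one RelationPredicate per scope.

```python
def bind_variables(relation_predicates):
    """Map each tuple variable to its relation within one scope.

    relation_predicates is a list of (relation, variable) pairs.
    Raises ValueError if a variable is bound twice.
    """
    bindings = {}
    for relation, var in relation_predicates:
        if var in bindings:
            raise ValueError(
                f"tuple variable '{var}' is already bound to {bindings[var]}"
            )
        bindings[var] = relation
    return bindings

# { t.a->z, p.b, k | R(t) and S(p) and T(k) }
ok = bind_variables([("R", "t"), ("S", "p"), ("T", "k")])
```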

image

@rlaiola
Author

rlaiola commented Oct 10, 2024

@rlaiola Implemented on #15

Summary:

  • Grammar updated
  • Translation updated to deal with multiple variables
  • New tests implemented

Now a query like this works:

{ t.a->z, p.b, k | R(t) and S(p) and T(k) }

This example is quite interesting because it shows that any combination of renaming and projecting can also be combined with this implementation (these scenarios were covered with unit tests)

As pointed out in the #9, if one tries to define a RelationPredicate twice for any tuple variable in the same scope, an error is shown as before

@KPMGE, you rock! With joins working the possibilities are endless! 🚀

Some remarks:

  1. Suggestion: swap the order of the rename operator with the projection (root). Otherwise, the second expression below fails, even though it is correct.
-- Works
{ t.a->z | R(t) }
-- Does not work: Relation R has a column b
-- Error: could not add column "t.b" because of ambiguity
{ t.a->b | R(t) }
  2. Somehow, your last query is not working for me:
Screenshot 2024-10-10 at 09 11 35
  3. Just 1 test is failing: translate trc ast to relational algebra > Projection > Multiple tuple variables: test mixed projection approaches
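
The first remark can be pictured with the Python sketch below, assuming the clash comes from renaming a to b while R's own column b is still present. The `rename` and `project` helpers are hypothetical stand-ins for the RA operators.

```python
def rename(rows, old, new):
    """Rename column old to new; fail if new already exists in a row."""
    out = []
    for row in rows:
        if new in row and new != old:
            raise ValueError(f'could not add column "{new}" because of ambiguity')
        out.append({(new if k == old else k): v for k, v in row.items()})
    return out

def project(rows, cols):
    """Keep only the listed columns."""
    return [{c: row[c] for c in cols} for row in rows]

R = [{"a": 1, "b": "x", "c": "u"}]  # hypothetical row

# { t.a->b | R(t) }
# Project first, then rename: only column a is left, so a -> b is safe.
works = rename(project(R, ["a"]), "a", "b")

# Rename first: R still has a column b, so the rename clashes.
try:
    rename(R, "a", "b")
    clashed = False
except ValueError:
    clashed = True
```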

@KPMGE
Owner

KPMGE commented Oct 10, 2024

2. Somehow, your last query is not working for me:

Screenshot 2024-10-10 at 09 11 35

3. Just 1 test is failing: translate trc ast to relational algebra > Projection > Multiple tuple variables: test mixed projection approaches

Oh, it was a typo made while I was renaming some variables that caused this, lol. Sorry about that, it was kind of late by then. Now it seems to work fine:

image
image

@KPMGE
Owner

KPMGE commented Oct 10, 2024

{ t.a->b | R(t) }

@rlaiola Fixed on #16

I've also improved the implementation using the columnValue and namedColumnExpr from the RA grammar. It simplified the implementation quite a bit.

@rlaiola
Author

rlaiola commented Oct 11, 2024

Well done!

To not lose track, I'm listing here the missing features that would be nice to have:

  1. Use of function expressions in projections and predicates
-- Dataset: RST
-- Error: Relation predicate must be defined!
{ concat(t.b, t.c)->bla | R(t) }
-- or
{ (t.b || t.c)->bla | R(t) }
-- Dataset: RST
-- sigma abs(a)>0 (R)
-- Works!
{ t | R(t) and (abs(a)>0) }
-- Does not work
{ t | R(t) and abs(a)>0 }

  2. Variable assignment

X = { t | R(t) and t.a = 1 }
{ p | S(p) and ∃t (X(j)) and p.b = j.b }

@KPMGE
Owner

KPMGE commented Oct 12, 2024

-- Dataset: RST
-- sigma abs(a)>0 (R)
-- Works!
{ t | R(t) and (abs(a)>0) }
-- Does not work
{ t | R(t) and abs(a)>0 }

Fixed on #17

@KPMGE
Owner

KPMGE commented Oct 12, 2024

-- Dataset: RST
-- Error: Relation predicate must be defined!
{ concat(t.b, t.c)->bla | R(t) }
-- or
{ (t.b || t.c)->bla | R(t) }

Fixed on #18

@KPMGE
Owner

KPMGE commented Oct 12, 2024

  2. Variable assignment

X = { t | R(t) and t.a = 1 }
{ p | S(p) and ∃t (X(j)) and p.b = j.b }

@rlaiola I'm not sure if variable assignment makes sense for TRC. RA and TRC are different by nature. RA is a procedural language, meaning it tells "how to retrieve x", so you can "build up" a query by splitting these steps and saving them into variables.

For TRC, it's the complete opposite, as it is a declarative language, meaning it tells "what you want to retrieve" instead of the steps to get to it. In this context, there's not really a notion of intermediate steps; the whole TRC expression is a unified logical expression that states the conditions for the data to be retrieved.

As I see it, in TRC the concept of variables is "encapsulated" in quantified operations, namely ∃ and ∀.

In your example, let's say:

X = { t | R(t) and t.a = 1 }

That part could make sense: you kind of give a "label" to the tuples retrieved from this query. But as soon as you want to use it, it becomes problematic:

 { p | S(p) and ∃t (X(j)) and p.b = j.b }

You can't "inject" a variable into a set of tuples (which is the result of the previous query, stored in X). In TRC it's not possible to use X like "calling a function"; the tuples must be quantified, and that needs to be done in the TRC query context.

Now that there's support for multiple tuple variables, this kind of query becomes even easier to represent correctly:

{ p | S(p) and ∃t (R(t) and t.a = 1 and p.b = t.b) }

What do you think?
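
As an illustration of how the single-expression version reads, here is a Python sketch of the correlated existential. The rows are hypothetical, and the list comprehension is only one way to express the semantics, not the translation relax performs.

```python
# Hypothetical rows; not the actual RST dataset.
R = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
S = [{"b": "x", "d": 10}, {"b": "y", "d": 20}]

# { p | S(p) and ∃t (R(t) and t.a = 1 and p.b = t.b) }
# For each p in S, check whether some t in R satisfies the inner condition.
result = [
    p for p in S
    if any(t["a"] == 1 and p["b"] == t["b"] for t in R)
]
```

The variable t only exists inside the `any(...)` call, which mirrors the scope introduced by the quantifier.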

@rlaiola
Author

rlaiola commented Oct 16, 2024

@rlaiola I'm not sure if variable assignment makes sense for TRC. RA and TRC are different by nature, RA is a procedural language, meaning it tells "how to retrieve x" so that, you can "build up" a query by just splitting these steps and saving them into variables.

For TRC, it's the complete opposite, as it is a declarative language, meaning it tells "what you want to retrieve" instead of telling the steps to get to it. In this context, there's no really a notion of intermediate steps, the whole TRC expression is an unified logical expression that states the conditions for the data to be retrieved.

As I see it, in TRC the concept of variables is "encapsulated" in quantified operations, namely ∃ and ∀.

In your example, let's say:

X = { t | R(t) and t.a = 1 }

That part could make sense, you kinda give a "label" to the tuples retrieved from this query. But now, as soon as you wanna use it, it becomes problematic:

 { p | S(p) and ∃t (X(j)) and p.b = j.b }

You can't "inject" a variable into a set of tuples(which is the result of the previous query, stored in X). In TRC it's not possible to use X as "calling a function", the tuples must be quantified and that needs to be done in the TRC query context.

As now there's support for multiple tuple variables, this kind of query becomes even easier to represent correctly:

{ p | S(p) and ∃t (R(t) and t.a = 1 and p.b = t.b) }

What do you think?

@KPMGE, I'm very happy with your answer! It shows that you really got the point!

That said, I'll try to make an argument. While TRC doesn't formally support storing intermediate results the way procedural languages do, breaking down complex queries into smaller, reusable parts can greatly enhance legibility, maintainability, and reusability. We can see this modular and structured approach as an extension borrowed from Higher-Order Logic. In certain scenarios this would make TRC queries easier to write, understand, and maintain. We could say that these are as well some of the benefits of relational algebra's variables or views and CTEs in SQL.

Not really mandatory, but I see it as an interesting feature, especially for learning purposes. And as this implementation is based on the RA one, the realization seems "an inch away".

My two cents!

@rlaiola
Author

rlaiola commented Oct 16, 2024

Testing the latest commit, I found the following issues:

-- Generalized projection
-- Works in RA
-- pi (a*2)->doublea (R)
-- Does not work in TRC
-- Error: Relation predicate must be defined!
{ (t.a*2)->doublea | R(t) }
-- Value expressions with date (work in neither RA nor TRC)
-- Error: Cannot read properties of undefined (reading '0')
-- sigma date(Bdate) > date('1990-01-01') (DEPENDENT)
-- { t | DEPENDENT(t) and date(t.Bdate) > date('1990-01-01') }
-- sigma date(Bdate)<date('1964-09-15') (DEPENDENT)
{ t | DEPENDENT(t) and date(t.Bdate) < date('1964-09-15') }

This last one might be my fault 🙄

@KPMGE
Owner

KPMGE commented Oct 16, 2024

-- Generalized projection
-- Works in RA
-- pi (a*2)->doublea (R)
-- Does not work in TRC
-- Error: Relation predicate must be defined!
{ (t.a*2)->doublea | R(t) }

@rlaiola fixed on #19. Thanks for pointing it out. Unfortunately, JS sets undefined by default, which broke the program. I've just filtered the result, and now it works.

image

@KPMGE
Owner

KPMGE commented Oct 16, 2024

-- Value Expressions with date (Do not work in RA not TRC)
-- Error: Cannot read properties of undefined (reading '0')
-- sigma date(Bdate) > date('1990-01-01') (DEPENDENT)
-- { t | DEPENDENT(t) and date(t.Bdate) > date('1990-01-01') }
-- sigma date(Bdate)<date('1964-09-15') (DEPENDENT)
{ t | DEPENDENT(t) and date(t.Bdate) < date('1964-09-15') }

That's very strange, I already solved this issue before: #4 (comment). I recently updated my repository with the code from relax, so maybe that's where the issue came from.

@KPMGE
Owner

KPMGE commented Oct 16, 2024

-- Value Expressions with date (Do not work in RA not TRC)
-- Error: Cannot read properties of undefined (reading '0')
-- sigma date(Bdate) > date('1990-01-01') (DEPENDENT)
-- { t | DEPENDENT(t) and date(t.Bdate) > date('1990-01-01') }
-- sigma date(Bdate)<date('1964-09-15') (DEPENDENT)
{ t | DEPENDENT(t) and date(t.Bdate) < date('1964-09-15') }

That's very strange, i've already solved this issue before #4 (comment). I recently updated my repository with the code from relax, maybe that's the issue.

@rlaiola I've solved the issue. It was indeed a recently merged request on relax that caused it: dbis-uibk#201. I've opened a PR solving it: dbis-uibk#217

In this repository I've already merged it, though, so now it's working again:
image

@rlaiola
Author

rlaiola commented Oct 16, 2024

@KPMGE sorry, this thread is getting quite long.

I've found another issue. I assumed that it was working before.

-- Works
{ t | R(t) }
-- Does not work
-- Error: at line 2: Expected "!=", "%", "*", "+", "-", "/", "<", "<=", "<>", "=", ">", ">=", "ilike", "like", "||", "}", "≠", "≤", "≥", logical AND, logical IMPLICATION, logical OR, or logical XOR but "∈" found.
{ t | t ∈ R }
-- Does not work
{ t | t in R }

@KPMGE
Owner

KPMGE commented Oct 16, 2024

-- Works
{ t | R(t) }
-- Do not work
-- Error: at line 2: Expected "!=", "%", "*", "+", "-", "/", "<", "<=", "<>", "=", ">", ">=", "ilike", "like", "||", "}", "≠", "≤", "≥", logical AND, logical IMPLICATION, logical OR, or logical XOR but "∈" found.
{ t | t ∈ R }
-- Do not work
{ t | t in R }

@rlaiola thanks for pointing that out! This issue has to do with the TRC notation we're using. To solve the issue in #4 (comment), I changed the order of evaluation of an AtomicFormula so that queries like the one you suggested would work.
image

But then the grammar matches incorrectly when you have a RelationPredicate alone, as you've pointed out here. If I change the order back, this issue will be solved, but the other one will come back.

I'd suggest removing the R(t) notation for relation predicates and leaving only the last two; that would work. What do you think?

@KPMGE
Owner

KPMGE commented Oct 16, 2024

@rlaiola I'm not sure if variable assignment makes sense for TRC. RA and TRC are different by nature, RA is a procedural language, meaning it tells "how to retrieve x" so that, you can "build up" a query by just splitting these steps and saving them into variables.
For TRC, it's the complete opposite, as it is a declarative language, meaning it tells "what you want to retrieve" instead of telling the steps to get to it. In this context, there's no really a notion of intermediate steps, the whole TRC expression is an unified logical expression that states the conditions for the data to be retrieved.
As I see it, in TRC the concept of variables is "encapsulated" in quantified operations, namely ∃ and ∀.
In your example, let's say:

X = { t | R(t) and t.a = 1 }

That part could make sense, you kinda give a "label" to the tuples retrieved from this query. But now, as soon as you wanna use it, it becomes problematic:

 { p | S(p) and ∃t (X(j)) and p.b = j.b }

You can't "inject" a variable into a set of tuples(which is the result of the previous query, stored in X). In TRC it's not possible to use X as "calling a function", the tuples must be quantified and that needs to be done in the TRC query context.
As now there's support for multiple tuple variables, this kind of query becomes even easier to represent correctly:

{ p | S(p) and ∃t (R(t) and t.a = 1 and p.b = t.b) }

What do you think?

@KPMGE, I'm very glad with your answer! It shows that you really got the point!

That said, I'll try to make an argument. While TRC doesn't formally support storing intermediate results the way procedural languages do, breaking down complex queries into smaller, reusable parts can greatly enhance legibility, maintainability, and reusability. We can see this modular and structured approach as an extension borrowed from Higher-Order Logic. In certain scenarios this would make TRC queries easier to write, understand, and maintain. We could say that these are as well some of the benefits of relational algebra's variables or views and CTEs in SQL.

Not really mandatory but I see it as a interesting feature, especially for learning purposes. And as this implementation is based on the RA one, the realization seems "an inch away".

My two cents!

@rlaiola Good points! I agree that in SQL and RA, variable assignment is an interesting feature that greatly reduces the complexity of some queries.

But I'm not really sure that it's suitable for TRC, though. As I pointed out, both SQL and RA are procedural languages, which makes their expressions far more "self-contained", as they're kind of like "instructions". So in this case it makes sense to me to break down a big query into smaller pieces.

Also, because of the "procedureness" of SQL and RA, the expressions have a well-defined scope. For example:

pi a(sigma a > 3 (R))

In this case, it's clear that the query will be executed starting from the inner expression R, which will then be filtered and projected. This way, it makes sense to break it down:

k = sigma a > 3 (R)
pi a(k)

But in TRC, the scope for the variable must be defined, and for that a whole TRC query is necessary. For example, the following (partial) query doesn't make sense:

R(t) and t.a > 3

So, if you need a whole query every time, you end up creating a whole new scope. For example, this hypothetical query has 2 scopes:

x = { t | R(t) and t.a > 3 }
{ t.a | R(t) and x  }

In this case, the second query does not make sense, as the t in the first query and the t in the second are different: they live in different scopes!

So, mixing them up and trying to make it work would require coming up with a new syntax for TRC, which I'm not sure would be good for learning purposes.

I think having a TRC query as a whole expression that tells what is to be retrieved is more suitable in this case, as breaking down a query and having to learn a new notation just to use this feature here would be a bit confusing. Wouldn't it?

For me, one should think of a TRC query with its "declarativeness" in mind.

I'm open to suggestions, though.

@rlaiola
Author

rlaiola commented Oct 16, 2024

x = { t | R(t) and t.a > 3 }
{ t.a | R(t) and x  }

In this case, the second query does not make sense, as the t in the first query and the t in the second are different: they live in different scopes!

In this example, the variable X acts as a macro for a relation, so the expressions could be rewritten as:

X = { t | R(t) and t.a > 3 } -- in this scope the tuple variable is t
{ k.a | X(k)}  -- this is a new scope so the tuple variable is k (t is unknown)
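
One way to read this proposal is through Python set comprehensions, which resemble TRC's set-builder form. This is a sketch only, assuming made-up rows and a hypothetical `Row` type: the name `X` stands for a relation, and each comprehension introduces its own tuple variable.

```python
from collections import namedtuple

# Hypothetical row type and sample data, for illustration only.
Row = namedtuple("Row", ["a", "b"])
R = {Row(1, "x"), Row(4, "y"), Row(5, "z")}

# X = { t | R(t) and t.a > 3 }
# The tuple variable t is local to this comprehension.
X = {t for t in R if t.a > 3}

# { k.a | X(k) }
# New scope: k ranges over X, and t is not visible here.
result = {k.a for k in X}

# The equivalent single query, without the intermediate name.
inlined = {t.a for t in R if t.a > 3}
```

In this reading the intermediate name only abbreviates a relation, so the two-step form and the single query produce the same set.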

So, mixing them up and trying to make it work would require coming up with a new syntax for TRC, which I'm not sure would be good for learning purposes.

The syntax would stay the same; this would just be an alternative way of writing queries.

I think having a TRC query as a whole expression that tells what is to be retrieved is more suitable in this case, as breaking down a query and having to learn a new notation just to use this feature here would be a bit confusing. Wouldn't it?

Perhaps. We'd need to run a user study to measure that anyway.

For me, one should think of a TRC query having in mind its "declarativeness".

I'm open to suggestions, though.

I do agree with you. Let's at least keep it as a suggestion for future work.

@rlaiola
Author

rlaiola commented Oct 17, 2024

-- Works
{ t | R(t) }
-- Does not work
-- Error: at line 2: Expected "!=", "%", "*", "+", "-", "/", "<", "<=", "<>", "=", ">", ">=", "ilike", "like", "||", "}", "≠", "≤", "≥", logical AND, logical IMPLICATION, logical OR, or logical XOR but "∈" found.
{ t | t ∈ R }
-- Does not work
{ t | t in R }

@rlaiola thanks for pointing that out! This issue has to do with the TRC notation we're using. To solve the issue in #4 (comment), I changed the order in which an AtomicFormula is evaluated, so that queries like the one you suggested would work. image

But then the grammar matches incorrectly when you have a RelationPredicate alone, as you've pointed out here. If I change the order back, this issue will be solved, but the other one will come back.

I'd suggest removing the R(t) notation for relation predicates and keeping only the last two; that would work. What do you think?

PR #22 should fix this.

@rlaiola
Author

rlaiola commented Oct 17, 2024

@KPMGE found an issue when using a column by its index:

-- Works
{ t.a, t.c, p.b, p.d | R(t) and S(p) }
-- Works
{ t.[1], t.[3], p.b, p.d | R(t) and S(p) }
-- Does not work
-- Error: invalid projection "t.a, t.c, p.[1], p.[2]": column index "p.[1]" is out of range in schema [t.a : number, t.b : string, t.c : string, p.b : string, p.d : number]; index starts at 1
{ t.a, t.c, p.[1], p.[2] | R(t) and S(p) }

@KPMGE
Owner

KPMGE commented Oct 17, 2024

-- Works
{ t | R(t) }
-- Does not work
-- Error: at line 2: Expected "!=", "%", "*", "+", "-", "/", "<", "<=", "<>", "=", ">", ">=", "ilike", "like", "||", "}", "≠", "≤", "≥", logical AND, logical IMPLICATION, logical OR, or logical XOR but "∈" found.
{ t | t ∈ R }
-- Does not work
{ t | t in R }

@rlaiola thanks for pointing that out! This issue has to do with the TRC notation we're using. To solve the issue in #4 (comment), I changed the order in which an AtomicFormula is evaluated, so that queries like the one you suggested would work. image
But then the grammar matches incorrectly when you have a RelationPredicate alone, as you've pointed out here. If I change the order back, this issue will be solved, but the other one will come back.
I'd suggest removing the R(t) notation for relation predicates and keeping only the last two; that would work. What do you think?

PR #22 should fix this.

No, it won't.

As I explained, the problem is that abs(something) has the same structure as R(t): both are a string followed by a string enclosed in parentheses. It only has to do with the order in which an AtomicFormula is evaluated:

image

If a Predicate is evaluated first, it will try matching a Predicate and fail:

image

If you change the order, then, as you've pointed out, it will work again:

image

image

But then this issue comes back: #4 (comment)

image

Here, since abs(a) has the structure of a RelationPredicate, the grammar incorrectly matches it as one.

There are 2 possible solutions:

  1. Keep the relation predicate as it is and require parentheses around the predicate in some formulas:
    image

  2. Try to match the predicate first and remove the R(t) syntax, leaving only t in R and t ∈ R. In this case there's no need for extra parentheses in the previous expression:
    image

IMO, the second approach is a bit better, but I'm open to suggestions.

What do you think?
Lemme know if I wasn't clear enough.
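
The ordering problem described above can be sketched with a toy ordered-choice parser in Python. This is not the project's grammar: the regular expressions and the function names (`parse_relation_predicate`, `parse_comparison`, `parse_atomic`) are assumptions made up for illustration. It only shows how an alternative that matches the shape `name(name)` can claim `abs(a)` when it is tried first.

```python
import re

# Toy patterns, not the real grammar: both R(t) and abs(a) fit CALL_SHAPE.
CALL_SHAPE = re.compile(r"\s*([A-Za-z_]\w*)\(\s*([A-Za-z_]\w*)\s*\)")
COMPARISON = re.compile(
    r"\s*([A-Za-z_]\w*\([^)]*\)|[A-Za-z_][\w.]*)\s*(<=|>=|<>|!=|=|<|>)\s*(\w+)"
)


def parse_relation_predicate(text):
    """Match something shaped like R(t); return (node, remaining text) or None."""
    m = CALL_SHAPE.match(text)
    return (("relation", m.group(1), m.group(2)), text[m.end():]) if m else None


def parse_comparison(text):
    """Match <value> <operator> <value>; return (node, remaining text) or None."""
    m = COMPARISON.match(text)
    if not m:
        return None
    return (("comparison", m.group(1), m.group(2), m.group(3)), text[m.end():])


def parse_atomic(text, alternatives):
    """Ordered choice: commit to the first alternative that matches a prefix."""
    for alternative in alternatives:
        parsed = alternative(text)
        if parsed is not None:
            node, rest = parsed
            if rest.strip():
                raise SyntaxError(f"unexpected input after {node[0]}: {rest.strip()!r}")
            return node
    raise SyntaxError("no alternative matched")
```

In this toy model, `parse_atomic("abs(a) > 3", [parse_relation_predicate, parse_comparison])` raises a `SyntaxError` because the relation alternative consumes `abs(a)` and leaves `> 3` behind, while passing the alternatives in the opposite order parses the same input as a comparison.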

@KPMGE
Owner

KPMGE commented Oct 17, 2024

@KPMGE found an issue when using a column by its index:

-- Works
{ t.a, t.c, p.b, p.d | R(t) and S(p) }
-- Works
{ t.[1], t.[3], p.b, p.d | R(t) and S(p) }
-- Does not work
-- Error: invalid projection "t.a, t.c, p.[1], p.[2]": column index "p.[1]" is out of range in schema [t.a : number, t.b : string, t.c : string, p.b : string, p.d : number]; index starts at 1
{ t.a, t.c, p.[1], p.[2] | R(t) and S(p) }

This issue has to do with the way the RA implementation was written. The same error occurs when trying to do the same in an RA query:
image

The issue is that indexes refer to the result schema, which starts from 1. Because a cross join was made, p's indexes actually start from 4: you have the 3 columns of R followed by the 2 columns of S, giving [t.a, t.b, t.c, p.b, p.d].

image

That works in the TRC implementation as well:
image
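
The indexing rule described above can be sketched in Python as follows. This is a minimal model, assuming indexes are 1-based positions in the joined schema, as the explanation suggests; the `resolve` helper is hypothetical and does not mirror the tool's code.

```python
def resolve(schema, variable, index):
    """Resolve variable.[index] against the joined schema (1-based positions)."""
    if not 1 <= index <= len(schema):
        raise IndexError(f"column index {index} is out of range; index starts at 1")
    column = schema[index - 1]
    # The position must hold a column that belongs to the given tuple variable.
    if not column.startswith(variable + "."):
        raise IndexError(
            f"{variable}.[{index}] is out of range: position {index} holds {column}"
        )
    return column


# Joined schema from the error message: 3 columns of R (t), then 2 columns of S (p).
schema = ["t.a", "t.b", "t.c", "p.b", "p.d"]

first_p_column = resolve(schema, "p", 4)   # p's first column sits at position 4
second_p_column = resolve(schema, "p", 5)  # p's second column sits at position 5
```

Under this model `p.[1]` is rejected because position 1 holds `t.a`, which matches the behaviour reported in the error message.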

@rlaiola
Author

rlaiola commented Oct 17, 2024

@KPMGE found an issue when using a column by its index:

-- Works
{ t.a, t.c, p.b, p.d | R(t) and S(p) }
-- Works
{ t.[1], t.[3], p.b, p.d | R(t) and S(p) }
-- Does not work
-- Error: invalid projection "t.a, t.c, p.[1], p.[2]": column index "p.[1]" is out of range in schema [t.a : number, t.b : string, t.c : string, p.b : string, p.d : number]; index starts at 1
{ t.a, t.c, p.[1], p.[2] | R(t) and S(p) }

The issue is that indexes refer to the result schema, which starts from 1. Because a cross join was made, p's indexes actually start from 4: you have the 3 columns of R followed by the 2 columns of S, giving [t.a, t.b, t.c, p.b, p.d].

Good to know. It is worth submitting an issue in the main repository.

@rlaiola
Author

rlaiola commented Oct 17, 2024

No, it won't.

@KPMGE just to make sure, have you tried running the app with the changes in #22?

If not, please checkout and try this branch: https://github.com/rlaiola/relax/tree/add-iff-op_fix-in-op_cleanup

All the examples seem to be working here.

-- Works
{ t | R(t) and abs(a)>3 }
-- Works
{ t | abs(a)>3 and R(t) }
-- Works
{ t | t in R and abs(a)>3 }
-- Works
{ t | abs(a)>3 and t in R }
-- Works
{ t | t ∈ R and abs(a)>3 }
-- Works
{ t | abs(a)>3 and t ∈ R }

Thanks!

@rlaiola
Author

rlaiola commented Oct 17, 2024

This might help with testing:

# clone repo, checkout remote branch and apply changes
git clone https://github.com/KPMGE/relax.git \
&& cd relax \
&& git remote add fork_rlaiola https://github.com/rlaiola/relax.git \
&& git fetch fork_rlaiola fix-bags \
&& git fetch fork_rlaiola add-iff-op_fix-in-op_cleanup \
&& git checkout -b fix-bags fork_rlaiola/fix-bags \
&& git checkout -b add-iff-op_fix-in-op_cleanup fork_rlaiola/add-iff-op_fix-in-op_cleanup \
&& git remote remove fork_rlaiola \
&& git checkout implement-trc \
&& git merge fix-bags -m 'merge 1' \
&& git merge add-iff-op_fix-in-op_cleanup -m 'merge 2'
# open in vs code
code .
# Restore repo
git reset --merge HEAD~2

@KPMGE
Owner

KPMGE commented Oct 18, 2024

It might help testing

# clone repo, checkout remote branch and apply changes
git clone https://github.com/KPMGE/relax.git \
&& cd relax \
&& git remote add fork_rlaiola https://github.com/rlaiola/relax.git \
&& git fetch fork_rlaiola fix-bags \
&& git fetch fork_rlaiola add-iff-op_fix-in-op_cleanup \
&& git checkout -b fix-bags fork_rlaiola/fix-bags \
&& git checkout -b add-iff-op_fix-in-op_cleanup fork_rlaiola/add-iff-op_fix-in-op_cleanup \
&& git remote remove fork_rlaiola \
&& git checkout implement-trc \
&& git merge fix-bags -m 'merge 1' \
&& git merge add-iff-op_fix-in-op_cleanup -m 'merge 2'
# open in vs code
code .
# Restore repo
git reset --merge HEAD~2

@rlaiola sorry, I think I had a problem with the cache or something. I've cleared the cache and reinstalled the application, and now it works!

Sorry about that, and thanks for the help. You've come up with a very interesting solution! Congrats.

@KPMGE
Owner

KPMGE commented Jan 1, 2025

@rlaiola since all issues have been resolved and all limitations have been addressed, do you think this issue can be closed?

Thanks for the help.

KPMGE self-assigned this Jan 1, 2025
KPMGE added the bug and enhancement labels Jan 1, 2025
@KPMGE
Owner

KPMGE commented Jan 6, 2025

@rlaiola Just to let you know, I was testing the project and found that the TRC error reporting could be improved a bit. I did that in #27.

rlaiola closed this as completed Feb 3, 2025