diff --git a/404.html b/404.html deleted file mode 100644 index 706934c63..000000000 --- a/404.html +++ /dev/null @@ -1,42 +0,0 @@ - - -
- -The goal of this tutorial is to give a quick introduction how to use the relational algebra calculator + and its concepts. It assumes that you already know the relational algebra or are learning it from other + sources.
+ +There is no real standard for the relational algebra like there is for SQL. So every book or teacher
+ might have its slightly different interpretation and notation.
+ The goal of this progam was to support the most commonly used "mathematical" notation used by Database Systems The Complete Book 2nd
+ Edition by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom, Datenbanksysteme: Eine Einführung
+ by Alfons Kemper and André Eickler and Wikipedia - Relational Algebra and
+ others.
+
The core element of the calculator is the relation (or table) which consists of a fixed number of + attributes (or columns) in a fixed order (this is called the schema of the relation) and a set of tuples + or rows containing the specific values.
+ +Each attribute has three distinct properties: its type, its position and its name.
+ +The type or domain of an attribute is either string, number, date or boolean.
+ The type is used for example to determine if two values can be compared in a boolean expression or if
+ two schemas are union compatible. In most cases the type of the attributes are obvious if you look at
+ the values.
The position of each attribute in a schema is fixed and can be used to adress the attributes.
+ An example would be the projection of the first and third attribute or column of an arbitrary relation
+ R: π [1], [3] ( R )
+
The full qualified name of the attribute is a unique identifier of the attribute within the schema of its
+ relation.
+ It consists of the name itself and a relation qualifier and are written like in SQL as R.a
+ where a is the name and R is the relation qualifier.
+ An example would be the projectoin of the attributes a, b from a relation R: π R.a, R.b ( R )
+ The default relation qualifier of each attribute is the name of its relation.
+
+ If the attributes name without the qualifier is unique within the relation's schema, it also can be used
+ to address a specific attribute.
+ The previous example could also be written as π a, b ( R )
.
+
Each relation has a set of tuples (or rows). This means that there are no duplicate tuples within one
+ relation and the duplicate-elimination is implicitly executed after every single step of the
+ calculation.
+ The tuples in the calculator have a defined order and unlike a normal database system all operations are
+ implemented to preserve that order. This should help the users to see what has changed from one step to
+ the next.
+
so far we covered that
+π [1], [2] ( R )
,π a, b ( R )
π R.a,
+ S.a ( R x S )
Before we introduce how to use the operators this should be a quick introduction of a very handy feature + of the relational algebra calculator: the alternative plain text notation
+ +The "classic" mathematical notation uses greek letters like π, σ for the unary operations and
+ special symbols like the join symbol ⋈ or the union symbol ∪
+ for some binary operations.
+ This symbols can be entered using the toolbar at the top of the editor.
+
This calculator also supports a alternative syntax for all this symbols that follows two very simple + rules: Every greek letter can be substituted with its name spelled out ("pi" for π, "gamma" + for γ) and every other symbol has an equivalent name that is borrowed from SQL, programming + languages like C and Set theory.
+ +This substitutions should be easy to read and much more important very easy to write because you don't
+ need any toolbar or mouse. The calculator also supports autocomplete: just press [CTRL]+[SPACE] to
+ complete the current keyword.
+ This feature should help you to write your statements more quickly and fluently.
+
π R.a, S.a, S.b
+σ R.a = S.a ∧ ( R.a > 5 ∨ R.a < 0 ) (
+ R ⨯ S
+)
+
+ is equivalent to:
+
+ pi R.a, S.a, S.b
+sigma R.a = S.a and ( R.a > 5 or R.a < 0 ) (
+ R cross join S
+)
+
+ In the following table you can see a list of all supported substitutions:
+ classical notation | +alternative notation | +
---|---|
π | +pi | +
σ | +sigma | +
ρ | +rho | +
τ | +tau | +
γ | +gamma | +
∩ | +intersect | +
∪ | +union | +
- | +\ | +
÷ | +/ | +
⨯ | +
+
|
+
⋈ | +
+
|
+
⟕ | +
+
|
+
⟖ | +
+
|
+
⟗ | +full outer join | +
⋉ | +left semi join | +
⋊ | +right semi join | +
▷ | +
+
|
+
← | +<-
+ pi new_name <- a ( R )
+ |
+
→ | +->
+ pi a -> new_name ( R )
+ |
+
For this Part we use the "bank example" + Dataset with 3 relations: Customers, Accounts and PremiumCustomers. By + convention relations start with a uppercase letter and attributes with a lower case letter.
+ +Open the dataset used in this tutorial using the following link to the "bank example" Dataset.
+ +You find the relations and their attributes listed on the side and if you hover a relations name a + preview of the first view tuples is displayed.
+ + +After you have found the Dataset you can formulate the very first and most basic query in relational + algebra: a relation without any further manipulation.
+ + +Just enter the name of a relation into the code editor or click on the relation/attribute names to insert
+ them into the code editor.
Note that the editor supports auto completing the relation/attribute names
+ of the current dataset and the operators with [CTRL]+[SPACE]
So if you want all tuples of the relation Customer you should have the following statement: Customer
.
+ And get all the tuples if you press the execute button or press [CTRL]+[RETURN].
All unary operations have the same basic syntax FUNCTION ARGUMENT (
+ CHILD_EXPRESSION )
.
The braces around the CHILD_EXPRESSION
can be omitted. In this case the
+ predefined operator precedence for relational algebra applies.
+
A complete list of the supported relalg operations can be found here: general + syntax, unary operations and binary operations.
+ +The projection is one of the basic operations that allow to choose which of the attributes of the parent + relations should be included in the new one and in which order they should be.
+ +Renaming a relation (ρ) changes the qualifiers of all the relations attributes but does not touch the + tuples.
+ +Renaming an attribute (ρ) only changes the name of a specific attribute (and leaves his + relation-qualifier unchanged).
+ +The statement pi balance ( Accounts )
returns a new relation with only the balance
+ attribute.
The next statement gets the balance with the account-id after renaming the relation to A and + renames one of the attributes.
+rho account_number <- aid (
+ pi aid, A.balance (
+ rho A (
+ Accounts
+ )
+ )
+)
+
+ Like in SQL or most programming languages you can format your statement and use SQL like
+ comments (with -- ...
or /* ... */
) to increase the readability.
The next statement uses a selection to filter the tuples of a relation based on a boolean expression. The
+ calculator supports complex boolean expression with functions and built in operator precedence.
+
The attributes in the boolean expression can be given as name or numeric position like with the
+ projection.
+
The next statement selects all customers-ids of customers who have the same value for their firstname and + lastname.
+-- this should return an relation with no tuples:
+pi cid (
+ sigma firstname=lastname ( Customers )
+)
+
+
+ The next example uses a more complex expression to get all accounts with a balance over 100 or under + -100.
+sigma balance > 100 or (balance*-1 > 100) ( Accounts )
+-- (balance < -100) would also be correct
+
+ As a shorter alternative you can use a function in your expression to get + the same result:
+sigma abs(balance) > 100 ( Accounts )
+
+
+
+
+
+
+ Everybody can provide datasets that can be used in the relational algebra calculator and share them with
+ others.
+ We assume the scenario of a teacher wanting to provide a dataset for his/her students for this short
+ tutorial.
+
+ The datasets are specified in a simple text format and can be shared with others via GitHub Gists (a simple and free platform to share snippets).
+ The shared gist gets an unique ID and the relational algebra calculator can load the dataset directly
+ using this ID.
+
+ The fist step is the creation of the dataset which is actually only a group of relation definitions with
+ some additional information and is therefore refered as group in the specified format.
+ The relations can then be used by the students to formulate the there statements.
+ Lets assume we want to create a dataset of employees of a company.
+
group: bank example
+
+ every header field starts with the name of the field and is followed by a colon for single line values.
+
+ The next (optional) header field we should specify is the description. It should contain information like
+ who is maintaining this dataset or where to find additional information.In our example we add a description that takes more than a single line and therefore we enclose the value + in double brackets instead of using the colon.
+ +group: bank example
+description[[ the data for this dataset was generated using <http://www.generatedata.com/>
+
+* the relation _Customers_ contains basic information about the customers of the bank.
+The relation _Accounts_ contains the basic information of a single account.
+Note that a customer can have any number of accounts.
+* the relation _PremiumCustomers_ contains
+the customer-ids of all customers with a total balance over 1000
+]]
+
+
+ The next step is to actually add the relations the students can use for their queries.
+
+ The relation definitions are use the relational algebra syntax that can be used in this tool.
+
+ Every relation is defined by a single variable assignment where the name of the variable is used as the
+ relations name and the result of the expression defines the relation.
+ The relalg expression can use all features that can be used in the tool and can also use other relations
+ defined within the same dataset.
+ Note that the name of the relation is used as the qualifier of each attribute/column.
For the relation Customers
relation we use the inline
+ relation syntax as the most basic method to define the relation and can also be edited using the
+ relation editor which is a simple spread-sheet like
+ editor.
+ The inline relations in combination with the editor should be very easy to use if you enter the data
+ directly or if you have them as a csv/spread-sheet file.
+
group: bank example
+description[[ the data for this dataset was generated using <http://www.generatedata.com/>
+
+* the relation _Customers_ contains basic information about the customers of the bank.
+* the relation _Accounts_ contains the basic information of a single account. Note that a customer can
+have any number of accounts.
+* the relation _PremiumCustomers_ contains the customer-ids of all customers with a total balance over
+1000
+]]
+
+Customers = { cid firstname lastname
+1 Dexter Simpson
+2 Kaseem Gallagher
+3 Kuame Hamilton
+4 Robert Thompson
+5 Rhiannon Valentine
+6 Calvin Mays
+}
+
+Accounts = {
+aid, cid, balance:number
+1, 1, 66
+2, 1, -304
+3, 2, 272
+4, 3, 3472
+5, 4, 975.7
+6, 4, 93
+7, 5, 534
+8, 5, -75.5
+}
+
+
+
+ As we can see the name of the relations are defined by the assignments.
+ The inline-relations are enclosed by curly brackets and contain the names of the attributes/columns in
+ the first line and then a tuple/row per line where the values are simply separated by whitespace. You
+ can also other seperators and can define the types explicitly as we can see at the Accounts relation.
+ For further information can be found at inline relation
+ description.
+
+ Note that, unlike the variables used in a query, the definition of a new relation implicitly sets the
+ attribute qualifier to the name of the relation.
+ So the schema of the account relation is (Accounts.aid, Accounts.cid, Accounts.balance)
.
+ This allows each attribute to be accessible with this name.
+
The last relation we need to add in this example is the relation containing the banks premium Customers.
+
+ They are specified by using the other two relations in a simple relational algebra statement that
+ selects all customers with a total balance over 1000.
+
group: bank example
+description[[ the data for this dataset was generated using <http://www.generatedata.com/>
+
+* the relation _Customers_ contains basic information about the customers of the bank.
+* the relation _Accounts_ contains the basic information of a single account. Note that a customer can
+have any number of accounts.
+* the relation _PremiumCustomers_ contains the customer-ids of all customers with a total balance over
+1000
+]]
+
+Customers = { cid firstname lastname
+1 Dexter Simpson
+2 Kaseem Gallagher
+3 Kuame Hamilton
+4 Robert Thompson
+5 Rhiannon Valentine
+6 Calvin Mays
+}
+
+Accounts = {
+aid, cid, balance:number
+1, 1, 66
+2, 1, -304
+3, 2, 272
+4, 3, 3472
+5, 4, 975.7
+6, 4, 93
+7, 5, 534
+8, 5, -75.5
+}
+
+PremiumCustomers =
+ pi cid (
+ sigma sum > 1000 (
+ gamma cid; sum(balance)->sum (
+ Accounts
+ join Customers
+ )
+ )
+ )
+
+
+ We have now seen how to define a dataset with its header containing the name and the description followed + by relational algebra assignments defining the relations of the dataset.
+ +We can paste this definition directly in the Group Editor to load and use it.
+ In the next section we want to look at how we can publish this definition so that we give our students a
+ single url or id to directly load the dataset.
+
If we want to share the definition of a dataset with other people we could simply give them the + definition and they load it using the group editor, but that would not be that practical for most + cases.
+ +The simpler solution (for the users) is to publish the definition as a GitHub Gist and share the ID of the gist with others.
+ +Just create a gist with the definition as its content (the filename does not matter) and publish it. The + ID of the Gist can now be found at the top of the page as gist:xxxxxxxxxxxx or in the url after + the last slash.
+ +This ID can then be shared and loaded in the interface or the calculator can be called directly with a
+ specific ID by using using the parameter ?data=gist:xxxxxxxxxxxxxx
.
For example the simple bank definition of this tutorial has been published as a Gist with the ID
+ 2cfb981fbc5676182d64 and can therefore be loaded directly by appending ?data=gist:2cfb981fbc5676182d64
+ to the url of the calculator.
syntax | +NAME = EXPRESSION |
+
---|
Defines a new local variable with the name NAME; its content is defined by + EXPRESSION
+ +The name of the new relation must be unique.
+ +The definition has to be executed with the statement.
+ +TestA = π a,b R
+TestB = σ d > 0 S
+
+-- statement using the variable
+TestA join TestB
+ An assignment (= definition of a variable) is no valid relational algebra expression on its own. + If you miss the acutal query a error is thrown (Error: only assignments found; query is + missing). +
+ +-- this is the definition of the variable
+Test = π Customer.firstname, surname ( Customer )
+
+-- this is the actual query/statement using the variable
+Test
+
+ The defined variable can be used like the assigned expression could be used because
+ every usage of the variable gets replaced with its definition before the query gets executed.
+
This also means that the variable-name has no influence on the schema of the expression
+ and the names of the attributes/columns are not affected by assignment:
+ X = R
+X join S
+ The attributes of the relation R are only accessible with its original names (R.a, R.b, ..),
+ and are not affected by the assignment.
+
There is a known problem when the last assignment ends with a natural join and the query consists + of a single relation:
+A = S join R
+A -- this is the query
+
+
+ The statement is ambiguous and the parser interprets it as A = (S join R A)
+ where R is interpreted as a column argument for the theta-join and therefore detects a
+ cyclic usage of the variable A.
+
+ Solution: To get the expected behaviour you have to set braces around the assigned expression
+ like A = (S join R)
+ A
+
syntax | +-- COMMENT_TEXTEXPRESSION |
+
---|
the '--' must be followed by at least one whitespace charater! inserts a comment; its text goes + until the end of the line
+ +comments are recognized as whitespace
+ +Test =
+-- This is the expression that is assigned to Test:
+π Customer.firstname, surname ( Customer )
+ syntax | +RELATION_NAME |
+
---|
Uses a pre defined relation with the name RELATION_NAME
+ +The code completion only works for this relations.
+ +( Customers ) cross join ( Accounts )
+ syntax | +/* COMMENT_TEXT */
+ EXPRESSION |
+
+
---|
inserts a comment that can span multiple lines
+ +comments are recognized as whitespace
+ +/* This is
+a very
+very
+long comment */
+Test = π Customer.firstname, surname ( Customer )
+ syntax | +{COLUMN_NAME_1:COLUMN_TYPE_1 ...
+ROW_1
+ROW_2
+...
+}
+ |
+
---|
The inline-relation is a temporary relation that can be defined directly in the statement. It is + only valid in the defining statement
+ +Every inline-relation is a valid expression and thus can be used at any position a + EXPRESSION is expected.
+ +The inline-relation is defined by a header, that specifies the schema of the relation + and the rows containing the values and is surrounded by curly brackets.
+ +QUALIFIER.COLUMN_NAME:COLUMN_TYPE
separated by any whitespace,
+ comma or semicolon.
+ The QUALIFIER is optional. Also the COLUMN_TYPE can be omitted if the type is well
+ defined by the values of that column. The first non null value of a column defines its type.
+
+ The rows of the relation are defined by a list of values per row with the type of the
+ corresponding column. The values are separated by whitespace comma or semicolon.
+
Simple strings only containing letters, numbers, hyphens, underlines, dots or periods
+ ([0-9a-zA-Z\-_\.]+) can be written without single quotes. NULL and null are
+ constant values. If null, true or false should be used as string they have te be quoted.
+
More complex strings must be surrounded by single quotes: 'content' or content
+ but '' or 'long content containing spaces and special characters like !' or 'null'.
+
Dates are written in ISO-format: YYYY-MM-DD without single quotes
+
A null-value can be written as null or NULL (without single quotes).
+
Numbers could be integers in the form (-?[0-9]+) or floats in the form (-?[0-9]+\.[0-9]+).
+
Numbers in single quotes are recognized as numbers if the column type is defined as number
+ or has been detected to be number from a previous value; otherwise it will be a string value..
+
A boolean value is denoted as either true or false (case insensitive).
+
The header and rows can be indented if needed.
+ +-- type for column b is defined by the first value
+rho A {
+ a:number, b
+ 1, 2
+ 3, 4
+}
+cross join
+{
+ a:string X.b:date c:number
+ Alpha 1970-01-01 1
+ 'Beta 2' 1970-01-02 3
+ '' 1970-01-03 4
+}
+ A valid relational algebra expression is built by connecting relation-name or inline-relation + as atoms with the defined unary and binary operators.
+ + So a relational algebra expression is recursively defined as follows: + +FUNCTION ARGUMENT ( CHILD_EXPRESSION )
+ symbol | +π | +
---|---|
alternative syntax | +pi | +
example | +
+ pi a, b ( R )
+ |
+
The argument is a subset of columns of the schema of the CHILD_EXPRESSION or a value expression
+ +π Customer.firstname, surname ( Customer )
+ pi c.id, [1] ( ρ c ( Customer ) )
+ pi c.id, lower(username)->user, concat(firstname, concat(' ', lastname))->fullname (
+ ρ c ( Customer )
+)
+ rownum()
expression can be used to get the same information. And it can also be used
+ directly in the boolean condition of a selection or join.
+
+ pi firstname, lastname
+sigma rownum() <= 5
+tau countOrders desc
+Customer
+
+ symbol | +σ | +
---|---|
alternative syntax | +pi | +
example | +
+ sigma a > 2 ( R )
+ |
+
The argument is a boolean expression that each row of CHILD_EXPRESSION + is checked on
+ +σ firstname = 'Bob' or firstname = 'Alice' ( Customer )
+ σ (id > 10 and id < 100) or id = 42 ( Customer )
+ σ mod(length(firstname),2) = 0 ( Customer )
+ symbol | +ρ | +
---|---|
alternative syntax | +rho | +
example | +
+ ( R ) join R.a = X.b (rho X ( R ))
+ |
+
π a.id, a.firstname ( ρ a ( Customer ) )
+ symbol | +ρ | +
---|---|
alternative syntax | +rho | +
example | +
+ the old and the new column names in a list (see example) + "←" can be substituted with "<-" + pi x, b rho a->x {a, b
+ 1, 2
+ 3, 4
+}
+ |
+
ρ myId←id, foobar←firstname (π id, firstname ( Customer ) )
+ symbol | +τ | +
---|---|
alternative syntax | +tau | +
example | +
+ tau a asc, b desc ( R )
+ |
+
τ [1], firstname desc (π id, firstname ( Customer ) )
+ symbol | +γ | +
---|---|
alternative syntax | +gamma | +
example | +
+ gamma a, count(*)->x ( R )
+ |
+
γ a, b ; sum(c)->x ( Customer )
+ If no grouping columns are provided the entire relation is the group.
+ ++ | number | +string | +date | +
---|---|---|---|
COUNT( * ) | +yes | +yes | +yes | +
COUNT( column ) | +yes | +yes | +yes | +
MIN( column ) | +yes | +yes | +yes | +
MAX( column ) | +yes | +yes | +yes | +
SUM( column ) | +yes | +no | +no | +
AVG( column ) | +yes | +no | +no | +
( CHILD_EXPRESSION ) FUNCTION ARGUMENT (
+ CHILD_EXPRESSION )
+ symbol | +∩ | +
---|---|
alternative syntax | +intersect | +
( Customer ) ∩ ( Customer )
+ symbol | +∪ | +
---|---|
alternative syntax | +union | +
( Customer ) ∪ ( Customer )
+ symbol | +÷ | +
---|---|
alternative syntax | +/ | +
( Customer ) ÷ ( Customer )
+ - | +∪ | +
---|---|
alternative syntax | +\ + except + |
+
( pi firstname ( Customer ) ) - ( rho test<-lastname (
+ pi lastname ( Customer )
+) )
+ symbol | +⨯ | +
---|---|
alternative syntax | +cross join | +
symbol | +⋈ | +
---|---|
alternative syntax | +join inner join |
+
+ Special case:
+ The name of a single boolean column (like R join a S
) can not be used directly
+ as a join condition due to ambiguities in the relational algebra syntax.
+
+ The column must either be specified with its qualifier (R join R.a S
) or wrapped in
+ parentheses (R join (a) S
).
+
+ This is not necessary for more complex boolean expressions. The problem is only that the single
+ column name
+ can not be distinguished from a relation name: the expression X=R join S X
could be
+ interpreted as A=(R join S A)
instead of A=(R join S) A
.
+
symbol | +⋈ | +
---|---|
alternative syntax | +join natural join |
+
ρ a ( Customer )
+⋈ a.name < b.name ( ρ b ( Customer ) )
+ symbol | +⟕ | +
---|---|
alternative syntax | +left outer join left join |
+
symbol | +⟖ | +
---|---|
alternative syntax | +right outer join right join |
+
symbol | +⟗ | +
---|---|
alternative syntax | +full outer join | +
symbol | +⋉ | +
---|---|
alternative syntax | +left semi join | +
symbol | +⋊ | +
---|---|
alternative syntax | +right semi join | +
symbol | +▷ | +
---|---|
alternative syntax | +anti semi join anti join |
+
The operator precedence allows to obmit most of braces.
+
The used precedence is shown in the table below.
+
All operators are left associative.
+
order of precedence | +Operator | +
---|---|
0 | ++ relation-name, + inline-relation + | +
1 | ++ projection, + selection, + rename (columns), + rename (relations), + group, + order by + | +
2 | ++ cross product, + theta join, + natural join, + left outer join, + right outer join, + full outer join, + left semi-join, + right semi-join, + anti semi-join, + division + | +
3 | ++ intersection + | +
4 | ++ union, + subtraction + | +
A join B x C
+ ((A join B) x C)
+ sigma true A join sigma true B
+ (sigma true (A)) join (sigma true (B))
+ rownum()
expression always represents the index of the lefthand relation
+ syntax | +returns type | +description | +
---|---|---|
a AND b |
+ boolean | +logical and | +
a OR b |
+ boolean | +logical or | +
a XOR b |
+ boolean | +logical exclusive or | +
NOT b |
+ boolean | +logical not | +
+ a = b
+a != b
+a < b
+a > b
+a <= b
+a >= b
+a != b
+ |
+ boolean | +compares two values of the same type | +
a:string LIKE 'PATTERN' |
+ boolean | +returns true if expression evaluating to a string a matches the pattern
+ given as the second operand.
+ The pattern has to be given as a string literal; + +
+ An underscore ( |
+
a:string ILIKE 'PATTERN' |
+ boolean | +
+ same as LIKE but matches case-insensitive.
+ This is not in the SQL standard but is a PostgreSQL extension. + |
+
+a + b
+a - b
+a * b
+a / b
+a % b
+ |
+ number | +arithmetic addition, subtraction, multiplication, division, modulo | +
rand() |
+ number | +returns a random number in the interval [0, 1] | +
rownum() |
+ number | +
+ returns the index of the current row (starting with 0).
+ If the function is used in a binary relational algebra expression (e.g. a join) + it always represents the index of the left operand. + |
+
length(a:string) |
+ number | +returns the length of the string | +
date(a:string) |
+ date | +parses the given string to a date object. The string must be in the form YYYY-MM-DD | +
adddate(a:date, b:number) |
+ date | +adds the given number of days to date a |
+
subdate(a:date, b:number) |
+ date | +subtracts the given number of days from date a |
+
now()
+ transaction_timestamp()
+ statement_timestamp() |
+ date | +returns a timestamp representing the start of the query execution
+ transaction- and statement start are the very same value due to the lack of a transaction + concept + |
+
clock_timestamp() |
+ date | +returns the current timestamp while executing the query | +
year(a:date) |
+ number | +returns the year component of a given date | +
month(a:date) |
+ number | +returns the month component of a given date as a number. (1 = Sunday, 2 = Monday, ..., 7 = + Saturday) + | +
day(a:date)
+ dayofmonth(a:date) |
+ number | +returns the day component of a given date as a number in the range 1 to 31 | +
hour(a:date) |
+ number | +returns the hour component of a given date as a number in the range 0 to 23 | +
minute(a:date) |
+ number | +returns the minute component of a given date as a number in the range 0 to 59 | +
second(a:date) |
+ number | +returns the second component of a given date as a number in the range 0 to 59 | +
concat(a:string [, ...]) |
+ string | +concatenates the given strings | +
upper(a:string)
+ ucase(a:string) |
+ string | +converts the given string to upper-case | +
lower(a:string)
+ lcase(a:string) |
+ string | +converts the given string to lower-case | +
strlen(a:string) |
+ number | +number of characters of the string | +
abs(a:number) |
+ number | +the absolute value of the given number | +
+ add(a, b)
+sub(a, b)
+mul(a, b)
+div(a, b)
+mod(a, b)
+ |
+ number | +arithmetic addition, + subtraction, + multiplication, + division or + modulo of the given numbers + |
+
+ round(a)
+floor(a)
+ceil(a)
+ |
+ number | +round to nearest integer, + largest integer not greater than the argument or + smallest integer not less than the argument + |
+
coalesce(value [, ...]) |
+ type of value | +returns the first of its arguments that is not null or null if all arguments are null. Note that + all arguments must have the same datatype. + | +
CASE WHEN condition THEN result
+[WHEN ...]
+[ELSE result]
+END
+ |
+ type of result | +returns the first result where the condition evaluates to true. If all conditions are false the + else part is executed or null is returnt if the else part is missing. Note that all results must + have the same datatype. + | +
order of precedence | +Operators | +
---|---|
0 | +functions, constants, columns | +
1 | +! (boolean not) | +
2 | +- (unary minus) | +
3 | +*, /, % | +
4 | +-, + | +
5 | += (comparison), >=, >, <=, <, <>, !=, LIKE, ILIKE | +
6 | +CASE, WHEN, THEN, ELSE | +
7 | +AND | +
8 | +XOR | +
9 | +OR, || | +
The goal of the SQL mode of the relational algebra calculator is to provide a translation from SQL to + relational algebra to show how they are related. + It does not support all features a real database system like PostgreSQL or MySQL does because the goal is to provide a + translation into relational algebra. + This means that many features like correlated-substatements are not supported because the translation + into relational algebra is not trivial and modern database systems use an extended set of operators + internally that do not require a one-to-one translation into "classical" relational algebra. + Therefore the learning effect for users of this tool would not be that big. +
+ + +All keywords are case insensitv.
+ +The following Synopsis is a adapted version of PostgreSQL and + shows the general syntax of the supported SQL. Brackets indicate optional parts. Braces and vertical + lines indicate that one of the alternatives has to be chosen. Dots mean that the preceding element can + be repeated.
+[ WITH with_query [, ...] ]
+SELECT [ DISTINCT ]
+* | expression [ [ AS ] output_name ] [, ...]
+FROM from_item [, ...]
+[ WHERE condition ]
+[ GROUP BY column [, ...] ]
+[ HAVING condition ]
+[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
+[ ORDER BY column [ ASC | DESC ] [, ...] ]
+[ LIMIT { count | ALL } ]
+[ OFFSET start [ ROW | ROWS ] ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+
+where from_item can be one of:
+
+table_name [ AS alias ]
+with_query_name [ AS alias ]
+( select ) AS alias
+from_item CROSS JOIN from_item
+from_item NATURAL JOIN from_item
+from_item [ INNER ] JOIN from_item ON join_condition
+from_item [ INNER ] JOIN from_item NATURAL
+from_item [ INNER ] JOIN from_item USING ( join_column [, ...] )
+from_item
+from_item { LEFT | RIGHT | FULL } [ OUTER ] JOIN ON join_condition
+from_item
+from_item { LEFT | RIGHT | FULL } [ OUTER ] JOIN NATURAL from_item
+from_item { LEFT | RIGHT | FULL } [ OUTER ] JOIN USING ( join_column [, ...] ) from_item
+
+and with_query is:
+
+with_query_name AS ( select )
+
+ The SQL statement is translated directly into relational algebra. To understand some of the effects of + the tool it might be helpful to understand the steps of the translation process. + As mentioned before, real database systems might take a different (more complex) aproach but this should + help to get an idea of how SQL could be translated into "classical" relational algebra.
+ +The following list shows the translation from SQL into relational algebra starting with the inner most + relational algebra expression at the top.
+,
+ is a cross join)
+ The direct translation into relational algebra with implicit duplication elimination requires the distinct
+ keyword to be equivalent. A warning is shown if you omit it.
The select-clause is translated into up to two relalg operators.
+select * ...
where no changes are made to the
+ schema of the result. Therefore no projection is needed. You can use the optional table-alias-prefix
+ if the columns of a single table/relation should be returned only: select distinct
+ R.* from R inner join S
select a, b from ...
) then
+ a single projection is used.
+ as foo
.
+ e.g. select a as foo from ...
The allowed expressions are the same as for the projection.
+ So it can be either the name of the column (with optional renaming using as
) or a complex
+ value expression. In the latter case a output-name must be given using
+ as
because the tool requires every column to have a name.
+
In its simplest form the from-clause holds a single table/relation name that is used directly in the
+ relalg statement. If the optional table-alias is specified with table as foo
this is
+ reflected by wrapping the relation in a rename-relation
+ operator with the given output-name.
select distinct x.a, x.b
+ from R as x
+
+ Joins can be expressed using the ANSI join syntax
+ +select distinct *
+from A, B
+ inner join C on ...
+ inner join D natural
+ natural join E
+ left outer join F on ...
+ left outer join G natural
+ right outer join H on ...
+ right outer join I natural
+ full outer join J on ...
+ full outer join K natural
+where ...
+
+
+ The comma is part of the old join syntax and is translated into a cross + join.
+select distinct *
+from R, S as s, T
+where s.a = R.a
+
+ Instead of the name of relation a non-correlated substatement can be used. A table alias must be provided
+ with (...) as foo
.
+
A non-correlated substatement can be directly translated into relational algebra by just translate
+ the sub-statement into relational algebra and use the resulting operator tree instead of the relation.
+
Non-correlated means that it must not reference/use any columns of tables defined in the outer scope.
+
This limitation is intentionally because the translation into relational algebra is not trivial and
+ modern database systems use an extended set of operators internally that do not require a one-to-one
+ translation into "classical" relational algebra. Therefore the learning effect for users of this tool
+ would not be that big.
select distinct *
+from R, (select * from S where a > 0) as x
+where x.a = R.a
+
+ The boolean condition in the where-clause can be any expression evaluating + to boolean.
+ +The where clause is directly translated to an relational algebra + selection with the very same condition. This selection is applied after joining relations of the + from-clause therefore has to use the original column names.
+ +Subquery Expressions like EXISTS
, IN
, ANY/SOME
or ALL
+ are not supported because their translation into relational algebra is not trivial and
+ modern database systems use an extended set of operators internally that do not require a one-to-one
+ translation into "classical" relational algebra. Therefore the learning effect for users of this tool
+ would not be that big.
The GROUP-BY-clause takes a list of column names only argument.
+ +The GROUP-BY-clause is directly translated to the relational algebra + group-by operation and is executed directly after the selection built from the WHERE-clause and + before the projection/renaming build from the SELECT-clause. Therefore the column names that can be used + are the ones available after all joining all tables.
+ +The aggregations used in the relational algebra group-by
+ operation are taken from the SELECT-clause and an output-name must be given using as
+ because the tool requires every column to have a name.
If no aggregations are present in the SELECT-clause a projection is used instead of the group-by + operation because sigma without aggregation has the very same + effect.
+ +Every non-aggregation-column in the SELECT-clause must be present in the group by clause because the + would not be available after the grouping.
+ + +The HAVING-Clause represents an optional relational algebra + selection. The boolean condition can be any expression evaluating to + boolean.
+ +The resulting selection is executed directly after the relational + algebra group-by operation.
+ +Unlike PostgreSQL + the HAVING-clause is only allowed when either a aggregation or grouping is present.
+ + +Order by takes a list of column names or indices of columns (starting with 1) as its argument.
+ +It is directly translated to the extended relational algebra + operation order by (tau).
+ + +The LIMIT-clause can be either specified with the LIMIT-OFFSET syntax used by PostgreSQL and MySQL or the + FETCH-FIRST syntax introduced in SQL:2008.
+ +It is translated into a relational algebra selection using the
+ rownum()
-function to limit the number of rows returned.
The Set-Operators UNION, INTERSECT and EXCEPT directly map to the relational algebra operators union, intersection and subtraction.
+ +The keyword DISTINCT
is optional because it represents the default behavior. The keyword
+ ALL
is ignored and a warning is shown because the targeted relational algebra has a
+ implicit elimination duplicate rows.
+ Parentheses can be used to create more complex statements:
+ (
+ select distinct * from S
+ union
+ select distinct * from T
+)
+except select distinct * from T
+order by 1
+limit 1
+
The WITH-clause (also known as common table expressions) provides a way to define subqueries for + single or multiple use in a statement. This can be thought as defining a temporary table for that query + in SQL terminology or creating variables in relational algebra. + Recursive evaluation is not supported.
+ +Each subquery can be referenced by the name from the WITH-clause. The subquery is automatically renamed + to the name used in the WITH-clause.
+ + +Value expressions are used for boolean expressions for WHERE- and HAVING-clause, the boolean conditions + of joins and calculated values in the SELECT-clause. + The type of a expression is either string, number, date or boolean and is + determined by the used operations and columns.
+ +The supported functions and operations are the same for SQL and relational algebra: value expression
+ + + + +
+
+
+
+ This work by
+ Johannes Kessler
+ is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
+
calculates any relational algebra statement like
+ ( σ a > 42 ( A ) ) ⋈ ( π a, b ( B ) )
+ on a set of relations.
+
If you want to learn SQL you take a database system and try some queries.
+
But if you want to learn relational algebra what do you use? Pen and paper?
The relational algebra calculator helps you learn relational algebra (RelAlg) by executing it.
+ +suId | +name | +
---|---|
0 | +Computing | +
1 | +Maths | +
2 | +English | +
3 | +History | +
stId | +name | +
---|---|
1 | +John | +
2 | +Mike | +
3 | +Lisa | +
4 | +Julia | +
suId | +stId | +mark | +
---|---|---|
0 | +1 | +A | +
0 | +2 | +B | +
0 | +3 | +C | +
0 | +4 | +A | +
2 | +2 | +A | +
3 | +2 | +F | +
4 | +1 | +C | +
Name | +Source | +Language | +
---|---|---|
Kemper Datenbanksysteme | +Alfons Kemper, André Eickler: + Datenbanksysteme: Eine Einführung 8. Auflage, Seite 84, Abbildung 3.8 + | +de | +
UIBK - KursDB | +Tables from and for the lecture Databases: + Foundations, Data Models and System Concepts - University of Innsbruck chapter 3 + | +de | +
UIBK - R, S, T | +Tables from and for the lecture Databases: + Foundations, Data Models and System Concepts - University of Innsbruck chapter 3 + | +de | +
The Complete Book - Exercise 2.4.1 | +Sample Data from + Database Systems The Complete Book 2nd + Edition by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom: + Exercise 2.4.1 Page 52-55 + | +en | +
The Complete Book - Exercise 2.4.3 | +Sample Data from + Database Systems The Complete Book 2nd + Edition by Hector Garcia-Molina, Jeff Ullman, and Jennifer Widom: + Exercise 2.4.1 Page 55-57 + | +en | +
The tool is not meant to be a full database system. + The goal of the implementation was to create a tool to support people to learn RelAlg. +
+ + ++ | Pen and Paper | +relational algebra calculator | +MySQL | +
---|---|---|---|
SQL support | +yes :-) | +basic SQL | +yes | +
relational algebra | +yes | +yes | +no | +
use variables for RelAlg | +yes | +yes | +no | +
large datasets | +no | +no | +yes | +
query plan | +no | +yes | +no | +
intermediate results | +no | +yes | +no | +
iterative working | +no | +yes | +yes | +
pi 'abc' like 'a%'->x R
+ ||
-concat
+ operatorexcept
as alternative syntax for the relational algebra set-difference
+ operatorA=R join S A
was interpreted as
+ A=(R join S A)
instead of A=(R join S) A
.
+ pi hour(now())->x, minute(now())->y, second(now())->z ( R )
+ pi concat(b, '_', c)->x R
+ pi coalesce(a, b, 1)->x R
, select coalesce(a, b, 1) as x from R
+ select case when a > 3 then a+1 else a end as x from R
+ (select * from S union ( select * from T except select * from T )) order by 1
+ select distinct * from R, (S natural join T)
+ select * from R offset 2 rows fetch first 10 rows only
+ select distinct a+2 as x from R where length(b) > 2
+ 1970-01-01
could not be used any more because it is
+ ambiguous (arithmetic operaton or date).
+ The SQL notation (date('1970-01-01')
) is now used for the relational algebra
+ mode.
+ The relational algebra calculator was created by Johannes Kessler BSc at Databases and Information Systems Group + at the Institute of Computer Science at the University of Innsbruck under supervision of + Michael Tschuggnall PhD and Prof. Dr. Günther Specht
+ + +This tool was not written from scratch but many different external resources/frameworks/projects/libs are + used.
+ +This is a list of resources/frameworks/projects/libs used for this tool (in alphabetical order) to give + credit where credit is due and guide anyone interested to them without having to look through the + code.
+Source:
') + .append($('gist.github.com') + .attr('href', data.html_url) + ); + var additional = $('' + numRows + ' row' + (numRows == 1 ? '' : 's') + '
'); + + // add inline table definition + if (n.hasMetaData('isInlineRelation') && n.getMetaData('isInlineRelation') === true && n.hasMetaData('inlineRelationDefinition')) { + container.append($('').text(n.getMetaData('inlineRelationDefinition'))); + } + + return container; + })(n), + html: true + }); + + formula.append(n.getFormulaHtml(false, false)); + formula.data('raNode', n); + + li.append(formula); + + // first child + if (n.hasChild() || n.hasChild2()) { + var ul = $('