Like operator #160

Closed
wants to merge 8 commits into from
53 changes: 47 additions & 6 deletions optimade.md
@@ -1287,8 +1287,23 @@ In addition to the standard equality and inequality operators, matching of parti

OPTIONAL features:

The following comparison operators are OPTIONAL:

* `identfier LIKE x`
Member

Suggested change
* `identfier LIKE x`
* `identifier LIKE x`


* `identfier UNLIKE x`
Comment on lines +1292 to +1294
Member

Suggested change
* `identfier LIKE x`
* `identfier UNLIKE x`
* `identifier LIKE pattern`: Is true if the property matches the provided `pattern`.
* `identifier UNLIKE pattern`: Is true if the property does not match the provided `pattern`.

Adds a bit of explanation matching the format used for CONTAINS etc. above. I think `pattern` is clearer than `x` here, since `x` is used as a placeholder for the substrings above. Also fixes a typo in "identifier"; I will also add a separate suggestion for this in isolation.

Member

Suggested change
* `identfier UNLIKE x`
* `identifier UNLIKE x`

Member

Is there a strong preference for UNLIKE rather than just `NOT(identifier LIKE pattern)`? Is UNLIKE an SQL construct that I have not come across?

I guess the precedent for this is our inclusion of KNOWN and UNKNOWN. I just don't think I like having two separately optional ways of doing the same thing: e.g., an implementation could choose to implement only LIKE, and therefore has to support `NOT(identifier LIKE pattern)`, but could still choose to ignore UNLIKE. At least KNOWN/UNKNOWN are both mandatory. I guess we could add to the spec that if LIKE is implemented then UNLIKE must also be implemented.

Contributor Author

> Is there a strong preference for UNLIKE rather than just `NOT(identifier LIKE pattern)`? Is UNLIKE an SQL construct that I have not come across?

UNLIKE is shorter, which is significant for queries embedded in URLs.

> I guess the precedent for this is our inclusion of KNOWN and UNKNOWN. I just don't think I like having two separately optional ways of doing the same thing: e.g., an implementation could choose to implement only LIKE, and therefore has to support `NOT(identifier LIKE pattern)`, but could still choose to ignore UNLIKE. At least KNOWN/UNKNOWN are both mandatory. I guess we could add to the spec that if LIKE is implemented then UNLIKE must also be implemented.

I would say that if LIKE is implemented, UNLIKE also MUST be implemented; i.e. `x UNLIKE "%s%"` and `NOT (x LIKE "%s%")` MUST be synonyms. Can we decide on that? If LIKE is implemented, implementing UNLIKE requires no extra effort.


* Support for x to be an identifier, rather than a string is OPTIONAL.

If implemented, the "LIKE" operator MUST behave as the correspoding standard SQL operator. In particular,
The `x` string MUST be interpreted as a pattern where an underscore character ('_', ASCII DEC 95, HEX 5F)
matches any single character and a percent character ('%', ASCII DEC 37, HEX 25) matches an arbitrary
sequence of characters (including zero characters).
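The wildcard semantics described above can be sketched in Python by translating a LIKE pattern into an anchored regular expression. This is a minimal illustration under two assumptions that the proposed text does not settle: matching is case-sensitive, and there is no escape mechanism (every character other than `_` and `%` is literal). The function names are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate an SQL-style LIKE pattern into a regular expression.

    Only the two wildcards from the proposed text are handled:
    '_' matches exactly one character, '%' matches any sequence
    (including the empty one). All other characters are literal;
    no escape mechanism is assumed here.
    """
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    # DOTALL lets the wildcards also match newline characters.
    return re.compile("".join(parts), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """Return True if the whole of `value` matches the LIKE `pattern`."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("H2 O2", "H2 O_"))         # True
print(like("C6 H12 O6", "C% O6"))     # True
print(like("C6 H12 O6", "C6 H_ O6"))  # False: '_' matches exactly one character
```

Using `fullmatch` mirrors SQL, where a LIKE pattern has to match the whole value rather than a substring of it.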
Contributor

Shouldn't we then also explain how to escape these characters, and how to escape the escape character itself (which should be `\`, IIRC)?

Member

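Good catch. The MySQL documentation says that to search for `\` with LIKE one has to write `\\\\` (instead of `\\`, the common way to escape the escape character in C and similar languages), because the backslashes are stripped once by the string-literal parser and again by the pattern matcher. Which convention will we choose?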

Contributor Author

Thanks @giovannipizzi & @merkys for spotting this deficiency; I'll look up the existing escape rules and suggest a version in the text.

I agree with @giovannipizzi that we can use a single backslash to escape characters, but we need to think through all possible combinations. I'll propose a variant shortly.
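To illustrate the two layers of escaping that the MySQL remark above refers to, the sketch below first decodes the filter's string literal (where, per the `EscapedChar` rule in the grammar, only a backslash followed by `"` or by another backslash is an escape) and then interprets the result as a LIKE pattern in which a single backslash makes the next character literal. That second rule is an assumption following @giovannipizzi's suggestion, not something the specification defines yet; all function names are hypothetical.

```python
import re


def unescape_filter_string(body: str) -> str:
    """Decode the body of a filter String token (layer 1).

    Following the EscapedChar rule, a backslash may only be followed by
    a double quote or by another backslash.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 >= len(body) or body[i + 1] not in '"\\':
                raise ValueError(f"invalid string escape at position {i}")
            out.append(body[i + 1])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def like_to_regex_escaped(pattern: str) -> "re.Pattern[str]":
    """Translate a LIKE pattern (layer 2) in which a backslash makes the
    next character literal. HYPOTHETICAL escape rule, not yet specified."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            if i + 1 >= len(pattern):
                raise ValueError("pattern ends with a lone escape character")
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def filter_like(value: str, string_token_body: str) -> bool:
    """Decode the string literal first, then apply it as a LIKE pattern."""
    pattern = unescape_filter_string(string_token_body)
    return like_to_regex_escaped(pattern).fullmatch(value) is not None


# Raw strings hold the characters exactly as they appear in the filter text.
print(filter_like("100%", r"100\\%"))  # True: '%' is literal here
print(filter_like("1000", r"100\\%"))  # False
print(filter_like("a\\b", r"a\\\\b"))  # True: four backslashes match one
print(filter_like("a\\b", r"a\\b"))    # False: the backslash escapes 'b'
```

Under these assumptions, matching a single literal backslash requires `\\\\` in the filter text, the same convention as the MySQL one quoted above.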

Comment on lines +1298 to +1301
Member

Suggested change
If implemented, the "LIKE" operator MUST behave as the correspoding standard SQL operator. In particular,
The `x` string MUST be interpreted as a pattern where an underscore character ('_', ASCII DEC 95, HEX 5F)
matches any single character and a percent character ('%', ASCII DEC 37, HEX 25) matches an arbitrary
sequence of characters (including zero characters).
If implemented, the `LIKE` operator MUST behave as the corresponding standard SQL operator.
The `x` string MUST be interpreted as a string-matching pattern where an underscore character ('_', ASCII DEC 95, HEX 5F) matches any single character and a percent character ('%', ASCII DEC 37, HEX 25) matches an arbitrary sequence of characters (including zero characters).

Fixes a few typos and enforces one line per sentence.

Contributor Author

@sauliusg makes a good point about the escapes. The text in the JSON Schema standard points to the syntax standardized in ECMA-262, section 21.2.1, which says the syntax is "modeled" after Perl 5. However, the JSON Schema standard then says (at the SHOULD level) that expressions should be "limit[ed] [...] to the following regular expression tokens", a list that does not include any description of the escape token.

This is likely an oversight. In the ECMA standard, escapes are handled in the grammar, where an Atom can be `\` followed by an AtomEscape, which is discussed further in section 21.2.2.8.1.

If we describe the RE syntax independently in the OPTIMADE specification, as suggested, we can strengthen the requirements as we see fit. This is a good opportunity to correct the oversight, if it really was one.


If the `UNLIKE` operator is supported, the behavior of this operator MUST be the negation of the "LIKE" operator; i.e.

an expression `(property UNLIKE "value")" must behave exactly as `(NOT (property LIKE "value"))`.
Contributor

Suggested change
an expression `(property UNLIKE "value")" must behave exactly as `(NOT (property LIKE "value"))`.
expression `(property UNLIKE "value")` must behave exactly as `(NOT (property LIKE "value"))`.
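The point made earlier in the thread, that supporting UNLIKE costs nothing once LIKE is implemented, can be illustrated with a desugaring pass over a parsed filter. The `Like`, `Unlike` and `Not` node classes below are hypothetical and not taken from any existing OPTIMADE parser; a real syntax tree would have many more node types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Like:
    prop: str
    pattern: str


@dataclass(frozen=True)
class Unlike:
    prop: str
    pattern: str


@dataclass(frozen=True)
class Not:
    expr: "Node"


# Hypothetical, deliberately tiny node set for this illustration.
Node = Union[Like, Unlike, Not]


def desugar(node: Node) -> Node:
    """Rewrite every `p UNLIKE v` into `NOT (p LIKE v)`.

    After this pass a backend only ever has to evaluate LIKE and NOT.
    """
    if isinstance(node, Unlike):
        return Not(Like(node.prop, node.pattern))
    if isinstance(node, Not):
        return Not(desugar(node.expr))
    return node


print(desugar(Unlike("chemical_formula", "C6 H12 O6")))
```

Because the rewrite happens before evaluation, the two forms cannot diverge in behaviour, which is exactly what the MUST-level equivalence requires.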


Examples:

* `chemical_formula_anonymous CONTAINS "C2" AND chemical_formula_anonymous STARTS WITH "A2"`
@@ -2188,7 +2203,13 @@ ValueOpRhs = Operator, Value ;

KnownOpRhs = IS, ( KNOWN | UNKNOWN ) ;

FuzzyStringOpRhs = CONTAINS, String | STARTS, [ WITH ], String | ENDS, [ WITH ], String ;
StringProperty = String | Property ;

FuzzyStringOpRhs = CONTAINS, StringProperty |
STARTS, [ WITH ], StringProperty |
ENDS, [ WITH ], StringProperty |
MATCH, ( RegularExpression | StringProperty ) |
NOT, MATCH, ( RegularExpression | StringProperty ) ;

SetOpRhs = HAS, ( [ Operator ], Value | ALL, ValueList | ANY, ValueList | ONLY, ValueList ) ;
(* Note: support for ONLY in SetOpRhs is OPTIONAL *)
@@ -2236,6 +2257,8 @@ ALL = 'A', 'L', 'L', [Spaces] ;
ONLY = 'O', 'N', 'L', 'Y', [Spaces] ;
ANY = 'A', 'N', 'Y', [Spaces] ;

MATCH = 'M', 'A', 'T', 'C', 'H', [Spaces];

(* OperatorComparison operator tokens: *)

Operator = ( '<', [ '=' ] | '>', [ '=' ] | '=' | '!', '=' ), [Spaces] ;
@@ -2262,16 +2285,34 @@ LowercaseLetter =

String = '"', { EscapedChar }, '"', [Spaces] ;

EscapedChar = UnescapedChar | '\', '"' | '\', '\' ;
UnescapedChar = Letter | Digit | Space | '/' |
Punctuator | RegexpMetacharacter |
UnicodeHighChar ;

UnescapedChar = Letter | Digit | Space | Punctuator | UnicodeHighChar ;
EscapedChar = UnescapedChar | '\', '"' | '\', '\' ;

Punctuator =
'!' | '#' | '$' | '%' | '&' | "'" | '(' | ')' | '*' | '+' | ',' |
'-' | '.' | '/' | ':' | ';' | '<' | '=' | '>' | '?' | '@' | '[' |
']' | '^' | '`' | '{' | '|' | '}' | '~'
'!' | '#' | '%' | '&' | "'" | ',' |
'-' | ':' | ';' | '<' | '=' | '>' | '@' |
'`' | '~'
;

RegexpMetacharacter =
'(' | ')' | '[' | ']' | '+' | '*' | '?' | '.' | '{' | '}' |
'|' | '^' | '$'
;

(* Regular expressions: *)

UnescapedREChar = Letter | Digit | Space | '"' |
Punctuator | RegexpMetacharacter |
UnicodeHighChar ;

EscapedREChar = UnescapedREChar | '\', '/' | '\', '\' |
'\', RegexpMetacharacter ;

RegularExpression = '/', { EscapedREChar }, '/', [Spaces] ;

(* BEGIN EBNF GRAMMAR Number *)
(* Number token syntax: *)

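The `String` and `RegularExpression` token rules in the grammar diff above can be sketched as a small hand-written reader in Python. It checks only the escape structure (which characters may follow a backslash) and does not enforce the `Letter`/`Punctuator`/`UnicodeHighChar` character classes. Unescaping `\/` while keeping an escaped backslash or metacharacter verbatim for the regex engine is an assumption about how an implementation would use the token. The usage lines read the right-hand sides of test cases Filter_075 and Filter_076.

```python
from typing import Tuple

# RegexpMetacharacter, as listed in the grammar in this PR.
REGEXP_METACHARACTERS = set("()[]+*?.{}|^$")


def read_string(text: str, pos: int) -> Tuple[str, int]:
    """Read a String token whose opening double quote is at text[pos].

    Returns the decoded value and the position after the closing quote.
    A backslash may only be followed by a double quote or a backslash.
    """
    if text[pos] != '"':
        raise ValueError(f"expected a double quote at position {pos}")
    out = []
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            return "".join(out), i + 1
        if ch == "\\":
            nxt = text[i + 1 : i + 2]
            if nxt not in ('"', "\\"):
                raise ValueError(f"invalid string escape at position {i}")
            out.append(nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated string")


def read_regex(text: str, pos: int) -> Tuple[str, int]:
    """Read a RegularExpression token whose opening slash is at text[pos].

    An escaped slash is unescaped (it only protects the delimiter); an
    escaped backslash or metacharacter is kept for the regex engine.
    """
    if text[pos] != "/":
        raise ValueError(f"expected a slash at position {pos}")
    out = []
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == "/":
            return "".join(out), i + 1
        if ch == "\\":
            nxt = text[i + 1 : i + 2]
            if nxt == "/":
                out.append("/")
            elif nxt == "\\" or nxt in REGEXP_METACHARACTERS:
                out.append("\\" + nxt)
            else:
                raise ValueError(f"invalid regex escape at position {i}")
            i += 2
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated regular expression")


filter_075 = 'property MATCH /"^.?abc+[a-z0-9]*$"/'
filter_076 = r'property MATCH "\"^.?abc+[a-z0-9]\\*$\""'

print(read_regex(filter_075, filter_075.index("/"))[0])   # "^.?abc+[a-z0-9]*$"
print(read_string(filter_076, filter_076.index('"'))[0])  # "^.?abc+[a-z0-9]\*$"
```

Read this way, the double quotes in Filter_075 become part of the regular expression itself rather than delimiting it, which may be worth double-checking in that test case.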
1 change: 1 addition & 0 deletions tests/cases/Filter_072.inp
@@ -0,0 +1 @@
chemical_formula LIKE "H2 O2"
1 change: 1 addition & 0 deletions tests/cases/Filter_072.opt
1 change: 1 addition & 0 deletions tests/cases/Filter_073.inp
@@ -0,0 +1 @@
chemical_formula NOT LIKE "C6 H12 O6"
1 change: 1 addition & 0 deletions tests/cases/Filter_073.opt
1 change: 1 addition & 0 deletions tests/cases/Filter_074.inp
@@ -0,0 +1 @@
chemical_formula UNLIKE "C6 H12 O6"
1 change: 1 addition & 0 deletions tests/cases/Filter_074.opt
1 change: 1 addition & 0 deletions tests/cases/Filter_075.inp
@@ -0,0 +1 @@
property MATCH /"^.?abc+[a-z0-9]*$"/
1 change: 1 addition & 0 deletions tests/cases/Filter_075.opt
1 change: 1 addition & 0 deletions tests/cases/Filter_076.inp
@@ -0,0 +1 @@
property MATCH "\"^.?abc+[a-z0-9]\\*$\""
1 change: 1 addition & 0 deletions tests/cases/Filter_076.opt
4 changes: 3 additions & 1 deletion tests/outputs/Filter_019.out
@@ -56,7 +56,9 @@ Filter(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 26
Error: in tests/cases/Filter_019.inp: line 1:
unexpected token "4", expected """
unexpected token "4", expected one of """, "a", "b", "c", "d",
"e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q",
"r", "s", "t", "u", "v", "w", "x", "y", "z", or "_"

chemical_formula CONTAINS 42
^
4 changes: 3 additions & 1 deletion tests/outputs/Filter_020.out
@@ -56,7 +56,9 @@ Filter(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 26
Error: in tests/cases/Filter_020.inp: line 1:
unexpected token "S", expected """
unexpected token "S", expected one of """, "a", "b", "c", "d",
"e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q",
"r", "s", "t", "u", "v", "w", "x", "y", "z", or "_"

chemical_formula CONTAINS STARTS "Al"
^
101 changes: 52 additions & 49 deletions tests/outputs/Filter_021.out
@@ -55,22 +55,23 @@ Filter(9999)
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 26
String(9999)
TOKEN_3(9999): """, line: 1, col: 27
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_33(9999): "A", line: 1, col: 28
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
LowercaseLetter(9999)
TOKEN_76(9999): "l", line: 1, col: 29
TOKEN_3(9999): """, line: 1, col: 30
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 31
StringProperty(9999)
String(9999)
TOKEN_3(9999): """, line: 1, col: 27
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_33(9999): "A", line: 1, col: 28
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
LowercaseLetter(9999)
TOKEN_76(9999): "l", line: 1, col: 29
TOKEN_3(9999): """, line: 1, col: 30
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 31
AND(9999)
TOKEN_33(9999): "A", line: 1, col: 32
TOKEN_46(9999): "N", line: 1, col: 33
@@ -130,22 +131,23 @@ Filter(9999)
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 59
String(9999)
TOKEN_3(9999): """, line: 1, col: 60
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_33(9999): "A", line: 1, col: 61
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
LowercaseLetter(9999)
TOKEN_76(9999): "l", line: 1, col: 62
TOKEN_3(9999): """, line: 1, col: 63
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 64
StringProperty(9999)
String(9999)
TOKEN_3(9999): """, line: 1, col: 60
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_33(9999): "A", line: 1, col: 61
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
LowercaseLetter(9999)
TOKEN_76(9999): "l", line: 1, col: 62
TOKEN_3(9999): """, line: 1, col: 63
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 64
AND(9999)
TOKEN_33(9999): "A", line: 1, col: 65
TOKEN_46(9999): "N", line: 1, col: 66
@@ -203,20 +205,21 @@ Filter(9999)
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 90
String(9999)
TOKEN_3(9999): """, line: 1, col: 91
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_33(9999): "A", line: 1, col: 92
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
LowercaseLetter(9999)
TOKEN_76(9999): "l", line: 1, col: 93
TOKEN_3(9999): """, line: 1, col: 94
Spaces(9999)
Space(9999)
nl(9999)
SPECIAL_1(9999): "(...)", line: 1, col: 95
StringProperty(9999)
String(9999)
TOKEN_3(9999): """, line: 1, col: 91
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_33(9999): "A", line: 1, col: 92
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
LowercaseLetter(9999)
TOKEN_76(9999): "l", line: 1, col: 93
TOKEN_3(9999): """, line: 1, col: 94
Spaces(9999)
Space(9999)
nl(9999)
SPECIAL_1(9999): "(...)", line: 1, col: 95
2 changes: 1 addition & 1 deletion tests/outputs/Filter_022.out
@@ -46,7 +46,7 @@ Filter(9999)
TOKEN_1(9999): " ", line: 1, col: 18
Error: in tests/cases/Filter_022.inp: line 1:
unexpected token "U", expected one of "<", ">", "=", "!", "I",
"C", "S", "E", "H", or ":"
"C", "S", "E", "M", "N", "H", or ":"

prototype_formula UNKNOWN
^
2 changes: 1 addition & 1 deletion tests/outputs/Filter_023.out
@@ -28,7 +28,7 @@ Filter(9999)
TOKEN_1(9999): " ", line: 1, col: 9
Error: in tests/cases/Filter_023.inp: line 1:
unexpected token "f", expected one of "<", ">", "=", "!", "I",
"C", "S", "E", "H", or ":"
"C", "S", "E", "M", "N", "H", or ":"

chemical formula IS KNOWN 42
^
2 changes: 1 addition & 1 deletion tests/outputs/Filter_024.out
@@ -44,7 +44,7 @@ Filter(9999)
TOKEN_1(9999): " ", line: 1, col: 17
Error: in tests/cases/Filter_024.inp: line 1:
unexpected token "K", expected one of "<", ">", "=", "!", "I",
"C", "S", "E", "H", or ":"
"C", "S", "E", "M", "N", "H", or ":"

chemical_formula KNOWN
^
2 changes: 1 addition & 1 deletion tests/outputs/Filter_028.out
@@ -28,7 +28,7 @@ Filter(9999)
TOKEN_1(9999): " ", line: 1, col: 9
Error: in tests/cases/Filter_028.inp: line 1:
unexpected token "L", expected one of "<", ">", "=", "!", "I",
"C", "S", "E", "H", or ":"
"C", "S", "E", "M", "N", "H", or ":"

elements LENGTH 42
^
56 changes: 29 additions & 27 deletions tests/outputs/Filter_059.out
@@ -57,21 +57,22 @@ Filter(9999)
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 27
String(9999)
TOKEN_3(9999): """, line: 1, col: 28
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_35(9999): "C", line: 1, col: 29
EscapedChar(9999)
UnescapedChar(9999)
Digit(9999)
TOKEN_18(9999): "2", line: 1, col: 30
TOKEN_3(9999): """, line: 1, col: 31
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 32
StringProperty(9999)
String(9999)
TOKEN_3(9999): """, line: 1, col: 28
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_35(9999): "C", line: 1, col: 29
EscapedChar(9999)
UnescapedChar(9999)
Digit(9999)
TOKEN_18(9999): "2", line: 1, col: 30
TOKEN_3(9999): """, line: 1, col: 31
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 32
AND(9999)
TOKEN_33(9999): "A", line: 1, col: 33
TOKEN_46(9999): "N", line: 1, col: 34
@@ -141,15 +142,16 @@ Filter(9999)
Spaces(9999)
Space(9999)
TOKEN_1(9999): " ", line: 1, col: 66
String(9999)
TOKEN_3(9999): """, line: 1, col: 67
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_33(9999): "A", line: 1, col: 68
EscapedChar(9999)
UnescapedChar(9999)
Digit(9999)
TOKEN_18(9999): "2", line: 1, col: 69
TOKEN_3(9999): """, line: 1, col: 70
StringProperty(9999)
String(9999)
TOKEN_3(9999): """, line: 1, col: 67
EscapedChar(9999)
UnescapedChar(9999)
Letter(9999)
UppercaseLetter(9999)
TOKEN_33(9999): "A", line: 1, col: 68
EscapedChar(9999)
UnescapedChar(9999)
Digit(9999)
TOKEN_18(9999): "2", line: 1, col: 69
TOKEN_3(9999): """, line: 1, col: 70