-
-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Contains, startsWith and endsWith (LIKE queries) #22
Comments
refs: #22 Implemented basic support and tests for contains, startsWith and endsWith in NQL - This should unblock development, whilst deeper testing is done This adds 3 new operators ~, ~^ and ~$ for contains, startsWith and endsWith - These operators are now not permitted as the first characters of literals, but it's extremely unlikely this will cause problems The literals used with these operators are parsed into case insensitive regular expressions - Starts with and ends with use the same $regex and $not operators in mongo, but modify the regex - Because of the need to modify the literal dependent on the operator, the pattern in the parser in nql.y is slightly different - It groups the parsing of the operator and value together - This also enforces that these operators only work with literals, which may be a problem & needs some more testing
ref: #22 - NQL lang will parse ~ ~^ and ~$ into $regex and $not - This adds support in the convertor for $regex and $not in these simple cases - As we are moving into SQL, we have to de-escape the regex and convert it back into a string but regexp.source makes this fairly easy - Had to add regexEnabledJSONStringify as a side effect because JSON.stringify doesn't expect or handle regex
refs: #22 - Added some tests to show that these queries work against both SQL and JSON
There is a basic implementation in place now, but this only works with literals. Given the limitations on single character literals and that numbers are parsed as numbers, not literals, I think I probably need to at change this to at least support strings if not implement a concept of "regex" which is anything which comes after these operators. |
Having slept on this and thought about it a bit, the main use case is type-ahead style lookups for matching values. In this case, the user will be typing a single character first, followed by multiple characters that they expect to be treated literally. We also expect that users might type spaces, e.g. in a name they're looking for. Email addresses also commonly include the To resolve all of these use-cases in one fell swoop, the correct value type in NQL is STRING, not LITERAL. Literals are for lazy matching, and work best for slugs, not user input strings. Therefore rather than looking at adding support for multiple value types with regex operators, instead we'll be changing to only support strings. |
refs: TryGhost#22 The main use case for regex operators is doing type-ahead type matches on values. These kinds of values will start with a single char, contain spaces, and contain characters that should be interpretted literally. To support this, the right value type in NQL is STRING, not LITERAL. LITERALs are designed for matching tag slugs and similar computed values. This commit changes the behaviour completely to only support STRING. We also need to be super careful about how we handle regex characters and strings so I've added some more tests to demonstrate that the right thing is happening
It turns out that my original spec around LIKE not being case insensitive is wrong. That's what MySQL and SQLite do by default. I also missed that knex has |
refs: TryGhost#22 - The main use case for regex operators is doing type-ahead type matches on values. These kinds of values will start with a single char, contain spaces, and contain characters that should be interpretted literally. - To support this, the right value type in NQL is STRING, not LITERAL. LITERALs are designed for matching tag slugs and similar computed values. This commit changes the behaviour completely to only support STRING. - We also need to be super careful about how we handle regex characters and strings with SQL There I've added some more tests to demonstrate that the right thing is happening. I reordered some regex chars as the order ${} means something in JS template literals so this made it easier to test. And I added a few test cases for some scary-looking sqli attempts which should not work thanks to our usage of knex. NOTE: In testing this, I realised we don't need to be using whereRaw, as knex does actually have whereLike and whereILike which work correctly.
refs: #22 - The main use case for regex operators is doing type-ahead type matches on values. These kinds of values will start with a single char, contain spaces, and contain characters that should be interpretted literally. - To support this, the right value type in NQL is STRING, not LITERAL. LITERALs are designed for matching tag slugs and similar computed values. This commit changes the behaviour completely to only support STRING. - We also need to be super careful about how we handle regex characters and strings with SQL There I've added some more tests to demonstrate that the right thing is happening. I reordered some regex chars as the order ${} means something in JS template literals so this made it easier to test. And I added a few test cases for some scary-looking sqli attempts which should not work thanks to our usage of knex. NOTE: In testing this, I realised we don't need to be using whereRaw, as knex does actually have whereLike and whereILike which work correctly.
Not sure if this is the right place, but the current implementation fails when querying nested properties. For example, filtering posts that have a tag with a slug that starts with a certain string: The DB logs for the query: {
method: 'select',
options: {},
timeout: false,
cancelOnTimeout: false,
bindings: [
'published',
'post',
/^hash-test/i,
15
],
__knexQueryUid: '-ukkP2B0O_WFqzb_B8AoP',
sql: 'select id,uuid,title,slug,html,comment_id,plaintext,feature_image,featured,type,status,locale,visibility,email_recipient_filter,created_at,created_by,updated_at,updated_by,published_at,published_by,custom_excerpt,codeinjection_head,codeinjection_foot,custom_template,canonical_url,newsletter_id,show_title_and_feature_image from `posts` where (`posts`.`status` = ? and (`posts`.`type` = ? and `posts`.`id` in (select `posts_tags`.`post_id` from `posts_tags` inner join `tags` on `tags`.`id` = `posts_tags`.`tag_id` where `tags`.`slug` like ?))) order by `posts`.`published_at` DESC limit ?'
} As you can see in the bindings, the regexp wasn't transformed to a LIKE expression. |
There's been a long-time desire to support LIKE queries in Ghost.
From a user perspective, this would allow us to do "contains", "startsWith" and "endsWith" type filters.
Reference implementations
NQL Syntax
~
contains~^
starts with~$
ends withAll 3 can be prefixed with
-
for notExamples
members.email:~@gmail
- contains "@gmail"members.email:~^fred@
- starts with "fred@"members.email:[email protected]
- ends with "@gmail.com"members.email:-~@gmail
- does not contain "@gmail"Mongo Syntax
Under the hood, NQL gets converted into Mongo's JSON query representation.
To check “contains” in mongo you need to do one of the below ref:
{ email: /fred@/i }
{ email: { $regex: /fred@/i } }
Not contains is one of:
{ email: { $not: /fred@/i } }
{ email: { $not: { $regex: /fred@/i } } }
The
i
is used to make the query case insensitive.To make this easy to parse and process, we'll use the following two patterns:
{ email: { $regex: /fred@/i } }
{ email: { $not: /fred@/i } }
Starts with and ends with will become modifications on the regex:
{ email: { $regex: /^fred@/i } }
{ email: { $regex: /@gmail.com$/i } }
Have confirmed this works in
mingo
(the tool we use to query JSON directly).A gotcha here is it will treat characters as regex characters, so we probably have to escape regex chars in the transform from NQL → mongo and then unescape them in the transform from mongo → SQL 🤔
SQL Syntax
SQL has two operators we could use here - LIKE or REGEXP:
Neither of these is case insensitive by default. LIKE only does wildcard matches, REGEXP allows for a wider range of queries & is therefore slower. We don’t need that power, complexity or risk right now, nor has there ever been a request for it, so we will stick to LIKE.
In order to use LIKE and case insensitive matching the SQL queries we need are:
The text was updated successfully, but these errors were encountered: