Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Contains, startsWith and endsWith (LIKE queries) #22

Open
ErisDS opened this issue Mar 1, 2022 · 4 comments
Open

Contains, startsWith and endsWith (LIKE queries) #22

ErisDS opened this issue Mar 1, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@ErisDS
Copy link
Member

ErisDS commented Mar 1, 2022

There's been a long-time desire to support LIKE queries in Ghost.

From a user perspective, this would allow us to do "contains", "startsWith" and "endsWith" type filters.

Reference implementations

NQL Syntax

  • ~ contains
  • ~^ starts with
  • ~$ ends with

All 3 can be prefixed with - for not

Examples

  • members.email:~@gmail - contains "@gmail"
  • members.email:~^fred@ - starts with "fred@"
  • members.email:[email protected] - ends with "@gmail.com"
  • members.email:-~@gmail - does not contain "@gmail"

Mongo Syntax

Under the hood, NQL gets converted into Mongo's JSON query representation.

To check “contains” in mongo you need to do one of the below ref:

{ email: /fred@/i }

{ email: { $regex: /fred@/i } }

Not contains is one of:

{ email: { $not: /fred@/i } }

{ email: { $not: { $regex: /fred@/i } } }

The i is used to make the query case insensitive.

To make this easy to parse and process, we'll use the following two patterns:

{ email: { $regex: /fred@/i } }
{ email: { $not: /fred@/i } }

Starts with and ends with will become modifications on the regex:

{ email: { $regex: /^fred@/i } }
{ email: { $regex: /@gmail.com$/i } }

Have confirmed this works in mingo (the tool we use to query JSON directly).

A gotcha here is it will treat characters as regex characters, so we probably have to escape regex chars in the transform from NQL → mongo and then unescape them in the transform from mongo → SQL 🤔

SQL Syntax

SQL has two operators we could use here - LIKE or REGEXP:

Neither of these is case insensitive by default. LIKE only does wildcard matches, REGEXP allows for a wider range of queries & is therefore slower. We don’t need that power, complexity or risk right now, nor has there ever been a request for it, so we will stick to LIKE.

In order to use LIKE and case insensitive matching the SQL queries we need are:

select * from members.email where LOWER(email) LIKE LOWER('%fred@%'); // contains
select * from members.email where LOWER(email) LIKE LOWER('fred@%'); // starts with
select * from members.email where LOWER(email) LIKE LOWER('%@gmail.com'); //ends with
ErisDS added a commit that referenced this issue Mar 2, 2022
refs: #22

Implemented basic support and tests for contains, startsWith and endsWith in NQL
  - This should unblock development, whilst deeper testing is done
This adds 3 new operators ~, ~^ and ~$ for contains, startsWith and endsWith
  - These operators are now not permitted as the first characters of literals, but it's extremely unlikely this will cause problems
The literals used with these operators are parsed into case insensitive regular expressions
  - Starts with and ends with use the same $regex and $not operators in mongo, but modify the regex
  - Because of the need to modify the literal dependent on the operator, the pattern in the parser in nql.y is slightly different
  - It groups the parsing of the operator and value together
  - This also enforces that these operators only work with literals, which may be a problem & needs some more testing
ErisDS added a commit that referenced this issue Mar 2, 2022
ref: #22

- NQL lang will parse ~ ~^ and ~$ into $regex and $not
- This adds support in the convertor for $regex and $not in these simple cases
- As we are moving into SQL, we have to de-escape the regex and convert it back into a string but regexp.source makes this fairly easy
- Had to add regexEnabledJSONStringify as a side effect because JSON.stringify doesn't expect or handle regex
ErisDS added a commit that referenced this issue Mar 2, 2022
refs: #22

- Added some tests to show that these queries work against both SQL and JSON
@ErisDS
Copy link
Member Author

ErisDS commented Mar 3, 2022

There is a basic implementation in place now, but this only works with literals.

Given the limitations on single character literals and that numbers are parsed as numbers, not literals, I think I probably need to at change this to at least support strings if not implement a concept of "regex" which is anything which comes after these operators.

@ErisDS
Copy link
Member Author

ErisDS commented Mar 8, 2022

Having slept on this and thought about it a bit, the main use case is type-ahead style lookups for matching values.

In this case, the user will be typing a single character first, followed by multiple characters that they expect to be treated literally. We also expect that users might type spaces, e.g. in a name they're looking for. Email addresses also commonly include the + char.

To resolve all of these use-cases in one fell swoop, the correct value type in NQL is STRING, not LITERAL.

Literals are for lazy matching, and work best for slugs, not user input strings.

Therefore rather than looking at adding support for multiple value types with regex operators, instead we'll be changing to only support strings.

ErisDS added a commit to ErisDS/NQL that referenced this issue Mar 8, 2022
refs: TryGhost#22

The main use case for regex operators is doing type-ahead type matches on values.
These kinds of values will start with a single char, contain spaces, and contain characters
that should be interpretted literally.

To support this, the right value type in NQL is STRING, not LITERAL.

LITERALs are designed for matching tag slugs and similar computed values.

This commit changes the behaviour completely to only support STRING.

We also need to be super careful about how we handle regex characters and strings so
I've added some more tests to demonstrate that the right thing is happening
@ErisDS
Copy link
Member Author

ErisDS commented Mar 8, 2022

It turns out that my original spec around LIKE not being case insensitive is wrong. That's what MySQL and SQLite do by default.

I also missed that knex has whereLike and whereILike functions that we could use instead of whereRaw. It feels nicer, so I'll have a stab at fixing this later.

ErisDS added a commit to ErisDS/NQL that referenced this issue Mar 8, 2022
refs: TryGhost#22

- The main use case for regex operators is doing type-ahead type matches on values.
These kinds of values will start with a single char, contain spaces, and contain characters
that should be interpretted literally.

- To support this, the right value type in NQL is STRING, not LITERAL.
LITERALs are designed for matching tag slugs and similar computed values.
This commit changes the behaviour completely to only support STRING.

- We also need to be super careful about how we handle regex characters and strings with SQL
There I've added some more tests to demonstrate that the right thing is happening.
I reordered some regex chars as the order ${} means something in JS template literals so this
made it easier to test.
And I added a few test cases for some scary-looking sqli attempts which should not work thanks
to our usage of knex.

NOTE: In testing this, I realised we don't need to be using whereRaw, as knex does actually have
whereLike and whereILike which work correctly.
ErisDS added a commit that referenced this issue Mar 8, 2022
refs: #22

- The main use case for regex operators is doing type-ahead type matches on values.
These kinds of values will start with a single char, contain spaces, and contain characters
that should be interpretted literally.

- To support this, the right value type in NQL is STRING, not LITERAL.
LITERALs are designed for matching tag slugs and similar computed values.
This commit changes the behaviour completely to only support STRING.

- We also need to be super careful about how we handle regex characters and strings with SQL
There I've added some more tests to demonstrate that the right thing is happening.
I reordered some regex chars as the order ${} means something in JS template literals so this
made it easier to test.
And I added a few test cases for some scary-looking sqli attempts which should not work thanks
to our usage of knex.

NOTE: In testing this, I realised we don't need to be using whereRaw, as knex does actually have
whereLike and whereILike which work correctly.
@daniellockyer daniellockyer added the enhancement New feature or request label Aug 29, 2023
@ramrami
Copy link

ramrami commented Oct 29, 2024

Not sure if this is the right place, but the current implementation fails when querying nested properties.

For example, filtering posts that have a tag with a slug that starts with a certain string:
{{#get "posts" filter="tags.slug:~^'hash-test'" include="tags" }}

The DB logs for the query:

{
  method: 'select',
  options: {},
  timeout: false,
  cancelOnTimeout: false,
  bindings: [ 
     'published', 
     'post', 
     /^hash-test/i, 
     15 
  ],
  __knexQueryUid: '-ukkP2B0O_WFqzb_B8AoP',
  sql: 'select id,uuid,title,slug,html,comment_id,plaintext,feature_image,featured,type,status,locale,visibility,email_recipient_filter,created_at,created_by,updated_at,updated_by,published_at,published_by,custom_excerpt,codeinjection_head,codeinjection_foot,custom_template,canonical_url,newsletter_id,show_title_and_feature_image from `posts` where (`posts`.`status` = ? and (`posts`.`type` = ? and `posts`.`id` in (select `posts_tags`.`post_id` from `posts_tags` inner join `tags` on `tags`.`id` = `posts_tags`.`tag_id` where `tags`.`slug` like ?))) order by `posts`.`published_at` DESC limit ?'
}

As you can see in the bindings, the regexp wasn't transformed to a LIKE expression.
Tested on Ghost 5.98 with local sqlite db.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants