Contains, startsWith and endsWith (LIKE queries) #22

ErisDS · 2022-03-01T18:06:36Z

There's been a long-time desire to support LIKE queries in Ghost.

From a user perspective, this would allow us to do "contains", "startsWith" and "endsWith" type filters.

Reference implementations

NQL Syntax

~ contains
~^ starts with
~$ ends with

All 3 can be prefixed with - for not

Examples

members.email:~@gmail - contains "@gmail"
members.email:~^fred@ - starts with "fred@"
members.email:[email protected] - ends with "@gmail.com"
members.email:-~@gmail - does not contain "@gmail"

Mongo Syntax

Under the hood, NQL gets converted into Mongo's JSON query representation.

To check “contains” in mongo you need to do one of the below ref:

{ email: /fred@/i }

{ email: { $regex: /fred@/i } }

Not contains is one of:

{ email: { $not: /fred@/i } }

{ email: { $not: { $regex: /fred@/i } } }

The i is used to make the query case insensitive.

To make this easy to parse and process, we'll use the following two patterns:

{ email: { $regex: /fred@/i } }
{ email: { $not: /fred@/i } }

Starts with and ends with will become modifications on the regex:

{ email: { $regex: /^fred@/i } }
{ email: { $regex: /@gmail.com$/i } }

Have confirmed this works in mingo (the tool we use to query JSON directly).

A gotcha here is it will treat characters as regex characters, so we probably have to escape regex chars in the transform from NQL → mongo and then unescape them in the transform from mongo → SQL 🤔

SQL Syntax

SQL has two operators we could use here - LIKE or REGEXP:

Neither of these is case insensitive by default. LIKE only does wildcard matches, REGEXP allows for a wider range of queries & is therefore slower. We don’t need that power, complexity or risk right now, nor has there ever been a request for it, so we will stick to LIKE.

In order to use LIKE and case insensitive matching the SQL queries we need are:

select * from members.email where LOWER(email) LIKE LOWER('%fred@%'); // contains
select * from members.email where LOWER(email) LIKE LOWER('fred@%'); // starts with
select * from members.email where LOWER(email) LIKE LOWER('%@gmail.com'); //ends with

The text was updated successfully, but these errors were encountered:

refs: #22 Implemented basic support and tests for contains, startsWith and endsWith in NQL - This should unblock development, whilst deeper testing is done This adds 3 new operators ~, ~^ and ~$ for contains, startsWith and endsWith - These operators are now not permitted as the first characters of literals, but it's extremely unlikely this will cause problems The literals used with these operators are parsed into case insensitive regular expressions - Starts with and ends with use the same $regex and $not operators in mongo, but modify the regex - Because of the need to modify the literal dependent on the operator, the pattern in the parser in nql.y is slightly different - It groups the parsing of the operator and value together - This also enforces that these operators only work with literals, which may be a problem & needs some more testing

ref: #22 - NQL lang will parse ~ ~^ and ~$ into $regex and $not - This adds support in the convertor for $regex and $not in these simple cases - As we are moving into SQL, we have to de-escape the regex and convert it back into a string but regexp.source makes this fairly easy - Had to add regexEnabledJSONStringify as a side effect because JSON.stringify doesn't expect or handle regex

refs: #22 - Added some tests to show that these queries work against both SQL and JSON

ErisDS · 2022-03-03T16:17:06Z

There is a basic implementation in place now, but this only works with literals.

Given the limitations on single character literals and that numbers are parsed as numbers, not literals, I think I probably need to at change this to at least support strings if not implement a concept of "regex" which is anything which comes after these operators.

ErisDS · 2022-03-08T14:29:57Z

Having slept on this and thought about it a bit, the main use case is type-ahead style lookups for matching values.

In this case, the user will be typing a single character first, followed by multiple characters that they expect to be treated literally. We also expect that users might type spaces, e.g. in a name they're looking for. Email addresses also commonly include the + char.

To resolve all of these use-cases in one fell swoop, the correct value type in NQL is STRING, not LITERAL.

Literals are for lazy matching, and work best for slugs, not user input strings.

Therefore rather than looking at adding support for multiple value types with regex operators, instead we'll be changing to only support strings.

refs: TryGhost#22 The main use case for regex operators is doing type-ahead type matches on values. These kinds of values will start with a single char, contain spaces, and contain characters that should be interpretted literally. To support this, the right value type in NQL is STRING, not LITERAL. LITERALs are designed for matching tag slugs and similar computed values. This commit changes the behaviour completely to only support STRING. We also need to be super careful about how we handle regex characters and strings so I've added some more tests to demonstrate that the right thing is happening

ErisDS · 2022-03-08T16:19:22Z

It turns out that my original spec around LIKE not being case insensitive is wrong. That's what MySQL and SQLite do by default.

I also missed that knex has whereLike and whereILike functions that we could use instead of whereRaw. It feels nicer, so I'll have a stab at fixing this later.

refs: TryGhost#22 - The main use case for regex operators is doing type-ahead type matches on values. These kinds of values will start with a single char, contain spaces, and contain characters that should be interpretted literally. - To support this, the right value type in NQL is STRING, not LITERAL. LITERALs are designed for matching tag slugs and similar computed values. This commit changes the behaviour completely to only support STRING. - We also need to be super careful about how we handle regex characters and strings with SQL There I've added some more tests to demonstrate that the right thing is happening. I reordered some regex chars as the order ${} means something in JS template literals so this made it easier to test. And I added a few test cases for some scary-looking sqli attempts which should not work thanks to our usage of knex. NOTE: In testing this, I realised we don't need to be using whereRaw, as knex does actually have whereLike and whereILike which work correctly.

refs: #22 - The main use case for regex operators is doing type-ahead type matches on values. These kinds of values will start with a single char, contain spaces, and contain characters that should be interpretted literally. - To support this, the right value type in NQL is STRING, not LITERAL. LITERALs are designed for matching tag slugs and similar computed values. This commit changes the behaviour completely to only support STRING. - We also need to be super careful about how we handle regex characters and strings with SQL There I've added some more tests to demonstrate that the right thing is happening. I reordered some regex chars as the order ${} means something in JS template literals so this made it easier to test. And I added a few test cases for some scary-looking sqli attempts which should not work thanks to our usage of knex. NOTE: In testing this, I realised we don't need to be using whereRaw, as knex does actually have whereLike and whereILike which work correctly.

ramrami · 2024-10-29T15:35:52Z

Not sure if this is the right place, but the current implementation fails when querying nested properties.

For example, filtering posts that have a tag with a slug that starts with a certain string:
{{#get "posts" filter="tags.slug:~^'hash-test'" include="tags" }}

The DB logs for the query:

{
  method: 'select',
  options: {},
  timeout: false,
  cancelOnTimeout: false,
  bindings: [ 
     'published', 
     'post', 
     /^hash-test/i, 
     15 
  ],
  __knexQueryUid: '-ukkP2B0O_WFqzb_B8AoP',
  sql: 'select id,uuid,title,slug,html,comment_id,plaintext,feature_image,featured,type,status,locale,visibility,email_recipient_filter,created_at,created_by,updated_at,updated_by,published_at,published_by,custom_excerpt,codeinjection_head,codeinjection_foot,custom_template,canonical_url,newsletter_id,show_title_and_feature_image from `posts` where (`posts`.`status` = ? and (`posts`.`type` = ? and `posts`.`id` in (select `posts_tags`.`post_id` from `posts_tags` inner join `tags` on `tags`.`id` = `posts_tags`.`tag_id` where `tags`.`slug` like ?))) order by `posts`.`published_at` DESC limit ?'
}

As you can see in the bindings, the regexp wasn't transformed to a LIKE expression.
Tested on Ghost 5.98 with local sqlite db.

ErisDS added a commit that referenced this issue Mar 2, 2022

Added integration test coverage for contains

fa068eb

refs: #22 - Added some tests to show that these queries work against both SQL and JSON

ErisDS mentioned this issue Mar 8, 2022

Changed regex operators to only work with STRINGs #31

Merged

daniellockyer added the enhancement New feature or request label Aug 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Contains, startsWith and endsWith (LIKE queries) #22

Contains, startsWith and endsWith (LIKE queries) #22

ErisDS commented Mar 1, 2022 •

edited

Loading

ErisDS commented Mar 3, 2022

ErisDS commented Mar 8, 2022

ErisDS commented Mar 8, 2022

ramrami commented Oct 29, 2024

Contains, startsWith and endsWith (LIKE queries) #22

Contains, startsWith and endsWith (LIKE queries) #22

Comments

ErisDS commented Mar 1, 2022 • edited Loading

NQL Syntax

Examples

Mongo Syntax

SQL Syntax

ErisDS commented Mar 3, 2022

ErisDS commented Mar 8, 2022

ErisDS commented Mar 8, 2022

ramrami commented Oct 29, 2024

ErisDS commented Mar 1, 2022 •

edited

Loading