From 3d14dafc332098b8d6495efd19b233949e413627 Mon Sep 17 00:00:00 2001
From: Damien de Lemeny
Date: Tue, 19 Mar 2024 10:00:25 -0500
Subject: [PATCH] Improve QL docs

Co-Authored-By: trinity-1686a
---
 docs/get-started/query-language-intro.md | 16 ++++-----
 docs/reference/query-language.md         | 41 ++++++++++--------------
 2 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/docs/get-started/query-language-intro.md b/docs/get-started/query-language-intro.md
index c2cf2081c0b..93d1d09b1b1 100644
--- a/docs/get-started/query-language-intro.md
+++ b/docs/get-started/query-language-intro.md
@@ -13,7 +13,7 @@ The main concept of this language is a clause, which represents a simple conditi
 A clause operates on fields of your document. It has the following syntax :
 ```
-field: condition
+field:condition
 ```
 For example, when searching documents where the field `app_name` contains the token `tantivy`, you would write the following clause:
@@ -29,14 +29,14 @@ Quickwit support various types of clauses to express different kinds of conditio
 | type | syntax | examples | description| `default_search_field`|
 |-------------|--------|----------|------------|-----------------------|
-| term | `field: token` | `app_name: tantivy`
`process_id:1234`
`word` | A term clause tests the existence of avalue in the field's tokens | yes |
-| term prefix | `field: prefix*` | `app_name: tant*`
`quick*` | A term clause tests the existence of a token starting with the provided value | yes |
-| term set | `field: IN [token token ..]` |`severity: IN [error warn]` | A term set clause tests the existence of any of the provided value in the field's tokens| yes |
-| phrase | `field: "sequence of tokens"` | `full_name: "john doe"` | A phrase clause tests the existence of the provided sequence of tokens | yes |
-| phrase prefix | `field: "sequence of tokens"*` | `title: "how to m"*` | A phrase prefix clause tests the exsitence of a sequence of tokens, the last one used like in a prefix clause | yes |
+| term | `field:token` | `app_name:tantivy`
`process_id:1234`
`word` | A term clause tests the existence of a value in the field's tokens | yes |
+| term prefix | `field:prefix*` | `app_name:tant*`
`quick*` | A term prefix clause tests the existence of a token starting with the provided value | yes |
+| term set | `field:IN [token token ..]` |`severity:IN [error warn]` | A term set clause tests the existence of any of the provided values in the field's tokens| yes |
+| phrase | `field:"sequence of tokens"` | `full_name:"john doe"` | A phrase clause tests the existence of the provided sequence of tokens | yes |
+| phrase prefix | `field:"sequence of tokens"*` | `title:"how to m"*` | A phrase prefix clause tests the existence of a sequence of tokens, the last one used like in a prefix clause | yes |
 | all | `*` | `*` | A match-all clause will match every document | no |
-| exist | `field: *` | `error: *` | An exist clause tests the existence of any value for the field, it will match only if the field exists | no |
-| range | `field: bounds` |`duration: [0 1000}`
`last_name: [banner miller]` | A term clause tests the existence of a token between the provided bounds | no |
+| exist | `field:*` | `error:*` | An exist clause tests the existence of any value for the field, it will match only if the field exists | no |
+| range | `field:bounds` |`duration:[0 TO 1000}`
`last_name:[banner TO miller]` | A range clause tests the existence of a token between the provided bounds | no |
 ## Queries
diff --git a/docs/reference/query-language.md b/docs/reference/query-language.md
index b06b5b36461..68fe7f7356e 100644
--- a/docs/reference/query-language.md
+++ b/docs/reference/query-language.md
@@ -9,6 +9,7 @@ sidebar_position: 40
 query = '(' query ')'
       | query operator query
       | unary_operator query
+      | query query
       | clause
 operator = 'AND' | 'OR'
@@ -26,12 +27,13 @@ defaultable_clause = term | term_prefix | term_set | phrase | phrase_prefix
 ## Writing Queries
 ### Escaping Special Characters
-Special reserved characters are: `+` , `^`, `` ` ``, `:`, `{`, `}`, `"`, `[`, `]`, `(`, `)`, `~`, `!`, `\\`, `*`, `SPACE`. Such characters can still appear in query terms, but they need to be escaped by an anti-slash `\` .
+Some characters need to be escaped in unquoted terms because they are otherwise syntactically significant. The reserved characters are: `+` , `^`, `` ` ``, `:`, `{`, `}`, `"`, `[`, `]`, `(`, `)`, `~`, `!`, `\\`, `*`, `SPACE`. If such characters appear in query terms, they need to be escaped by prefixing them with a backslash `\`.
-
+In quoted terms, the quote character in use (`'` or `"`) needs to be escaped.
-### Allowed characters in field names
-
+###### Allowed characters in field names
+
+See the [Field name validation rules](https://quickwit.io/docs/configuration/index-config#field-name-validation-rules) in the index config documentation.
 ### Addressing nested structures
@@ -66,7 +68,7 @@ There is no support for searching for a range of IP using CIDR notation, but you
 ### Term `field:term`
 ```
-term: term_char+
+term = term_char+
 ```
 Matches documents if the targeted field contains a token equal to the provided term.
@@ -75,15 +77,14 @@ Matches documents if the targeted field contains a token equal to the provided t
 ### Term Prefix `field:prefix*`
 ```
-term_prefix: term '*'
+term_prefix = term '*'
 ```
 Matches documents if the targeted field contains a token which starts with the provided value.
 `field:quick*` will match any document where the field 'field' has a token like `quickwit` or `quickstart`, but not `qui` or `abcd`.
-
-### Term set `field: IN [a b c]`
+### Term set `field:IN [a b c]`
 ```
 term_set = 'IN' '[' term_list ']'
 term_list = term_list term
@@ -92,7 +93,7 @@ term_list = term_list term
 Matches if the document contains any of the tokens provided.
 ###### Examples
-`field: IN [ab cd]` will match 'ab' or 'cd', but nothing else.
+`field:IN [ab cd]` will match 'ab' or 'cd', but nothing else.
 ###### Perfomance Note
 This is a lot like writing `field:ab OR field:cd`. When there are only a handful of terms to search for, using ORs is usually faster.
@@ -133,24 +134,20 @@ There is no slop for phrase prefix queries.
 ###### Limitation
-Quickwit may trim some results matched by this clause in some cases. If you search for `"thanks for your co"*`, it will enumerate the first 50 tokens which start with "co", and search for any documents where "thanks for your" is followed by any of these tokens.
+Quickwit may trim some results matched by this clause in some cases. If you search for `"thanks for your co"*`, it will enumerate the first 50 tokens which start with "co" (in their storage order), and search for any documents where "thanks for your" is followed by any of these tokens.
 If there are many tokens starting with "co", "contribution" might not be one of the 50 selected tokens, and the query won't match a document containing "thanks for your contribution". Normal prefix queries don't suffer from this issue.
-
-
-
-
-### Range `field:[low_bound TO high_bound}`
 ```
 range = explicit_range | comparison_half_range
 explicit_range = left_bound_char bounds right_bound_char
 left_bound_char = '[' | '{'
 right_bound_char = '}' | ']'
-bounds = term term
-       | term '*'
-       | '*' term
+bounds = term 'TO' term
+       | term 'TO' '*'
+       | '*' 'TO' term
 comparison_range = comparison_operator term
 comparision_operator = '<' | '>' | '<=' | '>='
@@ -171,7 +168,7 @@ Exclusive bounds are represented by curly brackets `{}`. They will not match tok
 You can make an half open range by using `*` as one of the bounds. `field:[b TO *]` will match 'bb' and 'zz', but not 'ab'.
 You can also use a comparison based syntax:`field:<b`, `field:>b`, `field:<=b` or `field:>=b`.
-
+
 ###### Examples
 - Inclusive Range: `ip:[127.0.0.1 TO 127.0.0.50]`
@@ -224,13 +221,9 @@ Without parentheses, `AND` takes precedence over `OR`. That is, `a AND b OR c` i
 ## Other considerations
 ### Default Search Fields
-In many case it is possible to omit the field you search if it was configured in the `default_search_fields` array of the index configuration.
-
-
+In many cases it is possible to omit the field you search if it was configured in the `default_search_fields` array of the index configuration. If more than one field is configured as default, the resulting implicit clauses are combined using a disjunction ('OR').
 ### Tokenization
 Note that the result of a query can depend on the tokenizer used for the field getting searched. Hence this document always speaks of tokens, which may be the exact value the document contain (in case of the raw tokenizer), or a subset of it (for instance any tokenizer cutting on spaces).
-Quickwit uses a query mini-language which is used by providing a `query` parameter to the search endpoints.
-
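
Note on the documented syntax: the escaping rule and the clause forms touched by this patch can be illustrated with a small client-side helper. The Python sketch below is not part of the patch or of Quickwit. The function names (`escape_term`, `term_clause`, `term_set_clause`, `range_clause`) are hypothetical, and the sketch assumes that the reserved-character list in the patched "Escaping Special Characters" section is complete and that field names need no escaping.

```python
# Hypothetical helpers, not part of Quickwit: they only build query strings
# that follow the syntax documented in this patch.

# Reserved characters from "Escaping Special Characters" (SPACE is " ").
RESERVED = set('+^`:{}"[]()~!\\* ')


def escape_term(term: str) -> str:
    """Backslash-escape every reserved character of an unquoted term."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in term)


def term_clause(field: str, term: str) -> str:
    """Build `field:term`, with no space after the colon."""
    return f"{field}:{escape_term(term)}"


def term_set_clause(field: str, terms: list[str]) -> str:
    """Build `field:IN [a b c]`."""
    return f"{field}:IN [{' '.join(escape_term(t) for t in terms)}]"


def range_clause(field, low=None, high=None, include_low=True, include_high=True):
    """Build `field:[low TO high}`; a bound of None becomes the open bound `*`."""
    if low is None and high is None:
        # The grammar in the patch lists no `* TO *` form.
        raise ValueError("at least one bound is required")
    left = "[" if include_low else "{"
    right = "]" if include_high else "}"
    low_text = "*" if low is None else escape_term(low)
    high_text = "*" if high is None else escape_term(high)
    return f"{field}:{left}{low_text} TO {high_text}{right}"


if __name__ == "__main__":
    print(term_clause("app_name", "tantivy"))
    print(term_set_clause("severity", ["error", "warn"]))
    print(range_clause("duration", "0", "1000", include_high=False))
    print(range_clause("field", "b"))
```

Escaping is applied per term rather than to the whole query string, so syntactic characters such as the brackets of a range or the `*` of an open bound are left intact.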