Improve QL docs

Co-Authored-By: trinity-1686a <[email protected]>
quickwit-oss · Mar 19, 2024 · 3d14daf
1 parent 318e0f5
commit 3d14daf
Showing 2 changed files with 25 additions and 32 deletions.
diff --git a/docs/get-started/query-language-intro.md b/docs/get-started/query-language-intro.md
@@ -13,7 +13,7 @@ The main concept of this language is a clause, which represents a simple conditi
 
 A clause operates on fields of your document. It has the following syntax:
 ```
-field: condition
+field:condition
 ```
 
 For example, when searching documents where the field `app_name` contains the token `tantivy`, you would write the following clause:
@@ -29,14 +29,14 @@ Quickwit support various types of clauses to express different kinds of conditio
 
 | type | syntax | examples | description| `default_search_field`|
 |-------------|--------|----------|------------|-----------------------|
-| term | `field: token` | `app_name: tantivy` <br/> `process_id:1234` <br/> `word` | A term clause tests the existence of avalue in the field's tokens | yes |
-| term prefix | `field: prefix*` | `app_name: tant*` <br/> `quick*` | A term clause tests the existence of a token starting with the provided value | yes |
-| term set | `field: IN [token token ..]` |`severity: IN [error warn]` | A term set clause tests the existence of any of the provided value in the field's tokens| yes |
-| phrase | `field: "sequence of tokens"` | `full_name: "john doe"` | A phrase clause tests the existence of the provided sequence of tokens | yes |
-| phrase prefix | `field: "sequence of tokens"*` | `title: "how to m"*` | A phrase prefix clause tests the exsitence of a sequence of tokens, the last one used like in a prefix clause | yes |
+| term | `field:token` | `app_name:tantivy` <br/> `process_id:1234` <br/> `word` | A term clause tests the existence of a value in the field's tokens | yes |
+| term prefix | `field:prefix*` | `app_name:tant*` <br/> `quick*` | A term prefix clause tests the existence of a token starting with the provided value | yes |
+| term set | `field:IN [token token ..]` |`severity:IN [error warn]` | A term set clause tests the existence of any of the provided values in the field's tokens| yes |
+| phrase | `field:"sequence of tokens"` | `full_name:"john doe"` | A phrase clause tests the existence of the provided sequence of tokens | yes |
+| phrase prefix | `field:"sequence of tokens"*` | `title:"how to m"*` | A phrase prefix clause tests the existence of a sequence of tokens, the last one used like in a prefix clause | yes |
 | all | `*` | `*` | A match-all clause will match every document | no |
-| exist | `field: *` | `error: *` | An exist clause tests the existence of any value for the field, it will match only if the field exists | no |
-| range | `field: bounds` |`duration: [0 1000}` <br/> `last_name: [banner miller]` | A term clause tests the existence of a token between the provided bounds | no |
+| exist | `field:*` | `error:*` | An exist clause tests the existence of any value for the field, it will match only if the field exists | no |
+| range | `field:bounds` |`duration:[0 TO 1000}` <br/> `last_name:[banner TO miller]` | A range clause tests the existence of a token between the provided bounds | no |
 
 ## Queries
 

diff --git a/docs/reference/query-language.md b/docs/reference/query-language.md
@@ -9,6 +9,7 @@ sidebar_position: 40
 query = '(' query ')'
       | query operator query
       | unary_operator query
+      | query query
       | clause
 
 operator = 'AND' | 'OR'
@@ -26,12 +27,13 @@ defaultable_clause = term | term_prefix | term_set | phrase | phrase_prefix
 ## Writing Queries
 ### Escaping Special Characters
 
-Special reserved characters are: `+` , `^`, `` ` ``, `:`, `{`, `}`, `"`, `[`, `]`, `(`, `)`, `~`, `!`, `\\`, `*`, `SPACE`. Such characters can still appear in query terms, but they need to be escaped by an anti-slash `\` .
+Some characters are syntactically significant and must be escaped in non-quoted terms. The reserved characters are: `+`, `^`, `` ` ``, `:`, `{`, `}`, `"`, `[`, `]`, `(`, `)`, `~`, `!`, `\\`, `*`, `SPACE`. If such characters appear in a query term, escape them by prefixing them with a backslash `\`.
 
-<!-- NEED CLARIFICATION: where is escaping necessary ? non-quoted terms ? field names ?-->
+In quoted terms, the quote character in use (`'` or `"`) needs to be escaped.
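
As a rough illustration of the escaping rule for non-quoted terms, the Python sketch below prefixes each reserved character with a backslash. `escape_term` and `RESERVED` are hypothetical names made up for this example and are not part of Quickwit. The sketch also assumes that the `\\` entry in the list stands for a single backslash.

```python
# Reserved characters in non-quoted terms, taken from the list above.
# Assumption: the `\\` entry denotes one backslash; SPACE is " ".
RESERVED = set('+^`:{}"[]()~!\\* ')


def escape_term(term: str) -> str:
    """Prefix every reserved character in a non-quoted term with a backslash."""
    return "".join("\\" + ch if ch in RESERVED else ch for ch in term)


print(escape_term("a+b"))          # a\+b
print(escape_term("hello world"))  # hello\ world
```

Quoted terms would need a different helper, since there the quote character in use is the one to escape.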
 
-### Allowed characters in field names
-<!-- NEED CLARIFICATION: this should refer to a section of the index documentation that explains allowed field names -->
+###### Allowed characters in field names
+
+See the [Field name validation rules](https://quickwit.io/docs/configuration/index-config#field-name-validation-rules) in the index config documentation.
 
 ### Addressing nested structures
 
@@ -66,7 +68,7 @@ There is no support for searching for a range of IP using CIDR notation, but you
 
 ### Term `field:term`
 ```
-term: term_char+
+term = term_char+
 ```
 
 Matches documents if the targeted field contains a token equal to the provided term. 
@@ -75,15 +77,14 @@ Matches documents if the targeted field contains a token equal to the provided t
 
 ### Term Prefix `field:prefix*`
 ```
-term_prefix: term '*'
+term_prefix = term '*'
 ```
 
 Matches documents if the targeted field contains a token which starts with the provided value.
 
 `field:quick*` will match any document where the field 'field' has a token like `quickwit` or `quickstart`, but not `qui` or `abcd`.
 
-
-### Term set `field: IN [a b c]`
+### Term set `field:IN [a b c]`
 ```
 term_set = 'IN' '[' term_list ']'
 term_list = term_list term
@@ -92,7 +93,7 @@ term_list = term_list term
 Matches if the document contains any of the tokens provided. 
 
 ###### Examples
-`field: IN [ab cd]` will match 'ab' or 'cd', but nothing else.
+`field:IN [ab cd]` will match 'ab' or 'cd', but nothing else.
 
 ###### Performance Note
 This is a lot like writing `field:ab OR field:cd`. When there are only a handful of terms to search for, using ORs is usually faster.
@@ -133,24 +134,20 @@ There is no slop for phrase prefix queries.
 
 ###### Limitation
 
-Quickwit may trim some results matched by this clause in some cases.  If you search for `"thanks for your co"*`, it will enumerate the first 50 tokens which start with "co", and search for any documents where "thanks for your" is followed by any of these tokens.
+Quickwit may trim some of the results matched by this clause. If you search for `"thanks for your co"*`, it will enumerate the first 50 tokens which start with "co" (in their storage order), and search for any documents where "thanks for your" is followed by any of these tokens.
 
 If there are many tokens starting with "co", "contribution" might not be one of the 50 selected tokens, and the query won't match a document containing "thanks for your contribution". Normal prefix queries don't suffer from this issue.
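
The trimming described above can be simulated with a small Python sketch. It is only an illustration: `expand_prefix` and the token dictionary are hypothetical, and sorted order is used as a stand-in for the index's storage order, which is an assumption.

```python
MAX_EXPANSIONS = 50  # limit quoted in the text above


def expand_prefix(tokens, prefix, limit=MAX_EXPANSIONS):
    """Return the first `limit` tokens starting with `prefix`.

    Sorted order stands in for the index's storage order (an assumption).
    """
    matches = [t for t in sorted(tokens) if t.startswith(prefix)]
    return matches[:limit]


# Hypothetical dictionary: 60 tokens "coa00".."coa59" sort before "contribution".
dictionary = [f"coa{i:02d}" for i in range(60)] + ["contribution"]

selected = expand_prefix(dictionary, "co")
print(len(selected))               # 50
print("contribution" in selected)  # False: trimmed, so the phrase is not matched
```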
 
-
-<!-- NEEDS CLARIFICATION : what does "first 50 tokens" mean ? in what order ? can the value be tuned ? -->
-
-
-### Range `field: [low_bound high_bound}`
+### Range `field:[low_bound TO high_bound}`
 ```
 range = explicit_range | comparison_half_range
 
 explicit_range = left_bound_char bounds right_bound_char
 left_bound_char = '[' | '{' 
 right_bound_char = '}' | ']'
-bounds = term term
-       | term '*'
-       | '*' term
+bounds = term 'TO' term
+       | term 'TO' '*'
+       | '*' 'TO' term
 
 comparison_half_range = comparison_operator term
 comparison_operator = '<' | '>' | '<=' | '>='
@@ -171,7 +168,7 @@ Exclusive bounds are represented by curly brackets `{}`. They will not match tok
 You can make a half-open range by using `*` as one of the bounds. `field:[b TO *]` will match 'bb' and 'zz', but not 'ab'.
 You can also use a comparison-based syntax: `field:<b`, `field:>b`, `field:<=b` or `field:>=b`.
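
The bound semantics can be sketched in Python as follows. This is a minimal illustration rather than Quickwit's implementation: `in_range` is a hypothetical helper, and plain Python string comparison is assumed as a stand-in for the ordering of tokens in a text field.

```python
def in_range(token, low, high, low_inclusive=True, high_inclusive=True):
    """Check a token against range bounds; None plays the role of `*`."""
    if low is not None:
        if token < low or (token == low and not low_inclusive):
            return False
    if high is not None:
        if token > high or (token == high and not high_inclusive):
            return False
    return True


# field:[b TO *]
print([t for t in ["ab", "bb", "zz"] if in_range(t, "b", None)])  # ['bb', 'zz']
# field:{b TO d}  (exclusive bounds do not match the bounds themselves)
print(in_range("b", "b", "d", low_inclusive=False, high_inclusive=False))  # False
```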
 
-<!-- NEEDS CLARIFICATION : ordering of empty values ? -->
+<!-- NOTE : empty values likely not indexed -->
 
 ###### Examples
 - Inclusive Range: `ip:[127.0.0.1 TO 127.0.0.50]`
@@ -224,13 +221,9 @@ Without parentheses, `AND` takes precedence over `OR`. That is, `a AND b OR c` i
 ## Other considerations 
 
 ### Default Search Fields
-In many case it is possible to omit the field you search if it was configured in the `default_search_fields` array of the index configuration.
-
-<!-- NEED CLARIFICATION : default fields clauses behavior on an array is combined using OR or AND ? -->
+In many cases it is possible to omit the field you search if it was configured in the `default_search_fields` array of the index configuration. If more than one field is configured as default, the resulting implicit clauses are combined using a disjunction (`OR`).
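
To make the `OR` combination concrete, the sketch below rewrites a field-less term into one clause per default field. `expand_default_fields` is a hypothetical helper and the `body` field is invented for the example. It illustrates the semantics only, not how Quickwit builds the query internally.

```python
def expand_default_fields(term, default_search_fields):
    """Rewrite a field-less term as one clause per default field, joined by OR."""
    clauses = [f"{field}:{term}" for field in default_search_fields]
    if len(clauses) == 1:
        return clauses[0]
    return "(" + " OR ".join(clauses) + ")"


print(expand_default_fields("tantivy", ["app_name", "body"]))
# (app_name:tantivy OR body:tantivy)
```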
 
 ### Tokenization
 Note that the result of a query can depend on the tokenizer used for the field being searched. Hence this document always speaks of tokens, which may be the exact value the document contains (in case of the raw tokenizer), or a subset of it (for instance any tokenizer cutting on spaces).
 
 <!-- NOTE : should dig deeper ? -->
-Quickwit uses a query mini-language which is used by providing a `query` parameter to the search endpoints.
-<!-- todo also used in some place in ES: where? -->