Merge pull request #3554 from vespa-engine/kkraune/doc-selection

Kkraune/doc selection
vespa-engine · Jan 2, 2025 · 0fa74fc
2 parents 02a10a7 + 3df0f61
commit 0fa74fc
Showing 1 changed file with 67 additions and 55 deletions.
diff --git a/en/reference/document-select-language.html b/en/reference/document-select-language.html
@@ -13,39 +13,49 @@
 
 
 <h2 id="examples">Examples</h2>
-<dl>
-<dt><code>music.author and music.length &lt;= 1000</code>
-<dd>This expression selects all documents with the above two conditions
-    set. The first condition states that the documents should be of type music,
-    and the author field must exist. The second states that the field length
-    must be set, and be less than 1000.
-<dt><code>book.author = "*John*Doe\n" or not book.author</code>
-<dd>This expression selects all documents where either of the subexpressions are
-    true. The first one states that the author field should include the name
-    John Doe, with anything in between or in front. The \n escape is converted
-    to a newline before the field comparison is done. Thus requiring the field
-    to end with Doe and a newline for a match to be true. The second expression
-    selects all books where no author is defined.
-<dt><code>not (music.length &gt; 1000) and (false or music.test)</code>
-<dd>Here is an example of how parentheses are used to group expressions.
-    Also, a constant value false has been used. Note that the <code>(false or music.test)</code>
-    sub-expression could be exchanged
-    with just <code>music.test</code> without altering the result of the
-    selection. The sub-expression within the <code>not</code> clause selects all documents
-    where the size field is above 1000 and the test field is defined. The <code>not</code>
-    clause inverts the selection, thus selecting all documents with size
-    less than or equal to 1000 or the test field undefined.
-</dl>
+<p>Match all documents in the <code>music</code> schema:</p>
+<p><code>music</code></p>
+<p>
+  As applications can have multiple schemas,
+  match on the schema first, then add field-specific conditions:
+</p>
+<p><code>music and music.artistname == "Coldplay"</code></p>
+<p>
+  The following selects all documents matching two conditions.
+  The first condition states that the documents must be of type music,
+  and that the author field must exist.
+  The second states that the length field must be set, and be less than or equal to 1000:
+</p>
+<p><code>music.author and music.length &lt;= 1000</code></p>
+<p>
+  This expression selects all documents where either of the subexpressions is true.
+  The first one states that the author field must contain the name John followed by Doe, with anything in between or in front.
+  The <code>\n</code> escape is converted to a newline before the field comparison is done,
+  so the field must end with Doe followed by a newline for the comparison to match.
+  The second expression selects all books where no author is defined:
+</p>
+<p><code>book.author = "*John*Doe\n" or not book.author</code></p>
+<p>
+  Here is an example of how parentheses are used to group expressions.
+  It also uses the constant value <code>false</code>. Note that the <code>(false or music.test)</code>
+  sub-expression could be replaced with just <code>music.test</code> without altering the result of the selection.
+  The <code>not</code> operator binds tighter than <code>and</code>,
+  so it negates only <code>(music.length &gt; 1000)</code>.
+  The expression thus selects all documents where the length field is not above 1000
+  and the test field is defined:
+</p>
+<p><code>not (music.length &gt; 1000) and (false or music.test)</code></p>
 <p>Other examples:</p>
 <ul>
-<li><code>music.version() == 3 and (music.givenname + " " + music.surname).lowercase() = "bruce springsteen"</code></li>
-<li><code>id.user.hash().abs() % 300 % 7 = 1</code></li>
-<li><code>music.wavstream.hash() == music.checksum</code></li>
-<li><code>music.size / music.length &gt; 10</code></li>
-<li><code>music.expire &gt; now() - 7200</code></li>
+  <li><code>music.version() == 3 and (music.givenname + " " + music.surname).lowercase() = "bruce spring*"</code></li>
+  <li><code>id.user.hash().abs() % 300 % 7 = 1</code></li>
+  <li><code>music.wavstream.hash() == music.checksum</code></li>
+  <li><code>music.size / music.length &gt; 10</code></li>
+  <li><code>music.expire &gt; now() - 7200</code></li>
 </ul>
 
 
+
 <h2 id="case-sensitiveness">Case sensitiveness</h2>
 <p>
 The identifiers used in this language (<code>and or not true false null id
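One way to try the selection expressions from the examples above against a running application is to pass them as the `selection` parameter when visiting documents through the /document/v1 API. The sketch below only builds the request URL and sends nothing. The endpoint `http://localhost:8080`, the namespace `mynamespace` and the helper name `visit_url` are assumptions for illustration, not part of this change. The main point it shows is that an expression must be percent-encoded, since it contains spaces, quotes and comparison operators.

```python
from urllib.parse import quote, urlencode


def visit_url(endpoint: str, namespace: str, doctype: str, selection: str) -> str:
    """Build a /document/v1 visit URL that filters documents by a selection expression.

    Hypothetical helper: endpoint and namespace are caller-supplied assumptions.
    """
    # quote_via=quote encodes spaces as %20 (instead of '+') and escapes
    # '"', '<', '=' and other characters used in selection expressions.
    query = urlencode({"selection": selection}, quote_via=quote)
    return f"{endpoint}/document/v1/{namespace}/{doctype}/docid?{query}"


if __name__ == "__main__":
    # Hypothetical endpoint and namespace, used for illustration only.
    for expr in [
        "music",
        'music and music.artistname == "Coldplay"',
        "music.author and music.length <= 1000",
    ]:
        print(visit_url("http://localhost:8080", "mynamespace", "music", expr))
```

The resulting URL can be fetched with any HTTP client. Whether additional query parameters are needed, for example when the application has more than one content cluster, is described in the /document/v1 API reference.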
@@ -245,19 +255,21 @@ <h2 id="null-behavior-with-imported-fields">Null behavior with imported fields</
   If you only want your cluster to retain recordings from artists that are certifiably cool,
   you might be tempted to write a selection like the following:
 </p>
-<pre>
-&lt;document type="music_recording" selection="music_recording.artist_is_cool == true"&gt;
-</pre>
+<pre>{% highlight xml %}
+<document type="music_recording"
+          selection="music_recording.artist_is_cool == true">
+{% endhighlight %}</pre>
 <p>
 <strong>This won't work as expected</strong>, because this expression is evaluated as part of the feeding pipeline to figure
 out if a cluster should accept a given document. At that point in time, there is no access to the parent document.
 Consequently, the field will return <code>null</code> and the document won't be routed to the cluster.
 </p><p>
 Instead, write your expressions to handle the case where the parent document <em>may not exist</em>:
 </p>
-<pre>
-&lt;document type="music_recording" selection="(music_recording.artist_is_cool == null) or (music_recording.artist_is_cool == true)"&gt;
-</pre>
+<pre>{% highlight xml %}
+<document type="music_recording"
+          selection="(music_recording.artist_is_cool == null) or (music_recording.artist_is_cool == true)">
+{% endhighlight %}</pre>
 <p>
 With this selection, we explicitly let a document be accepted into the cluster if its imported field
 is <em>not</em> available. However, if it <em>is</em> available, we allow it to be used for GC.
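The reasoning above can be illustrated with a toy model, where Python's `None` stands in for an imported field whose parent document is not reachable when the expression is evaluated during feeding. This is not how Vespa evaluates selections. The two functions (hypothetical names) only mirror the accept/reject outcome described above for each of the two expressions.

```python
from typing import Optional


def accepts_naive(artist_is_cool: Optional[bool]) -> bool:
    """Toy model of: music_recording.artist_is_cool == true"""
    # During feeding the parent document is unavailable, so the imported
    # field is null (None here) and the comparison does not match.
    return artist_is_cool is True


def accepts_null_tolerant(artist_is_cool: Optional[bool]) -> bool:
    """Toy model of:
    (music_recording.artist_is_cool == null) or (music_recording.artist_is_cool == true)
    """
    # Accept when the imported value is missing; otherwise require true.
    return artist_is_cool is None or artist_is_cool is True


if __name__ == "__main__":
    for value in (None, True, False):
        print(f"{value!s:>5}: naive={accepts_naive(value)}, "
              f"null-tolerant={accepts_null_tolerant(value)}")
```

With `None` (parent not reachable during feeding), only the null-tolerant expression accepts the document. With `False` (parent available, artist not cool), both reject it, which is the case where the expression can be used for GC.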
@@ -456,40 +468,40 @@ <h3 id="example">Example</h3>
 <p>Grandparent schema:</p>
 <pre>
 schema grandparent {
-  document grandparent {
-    field a1 type int {
-      indexing: attribute | summary
+    document grandparent {
+        field a1 type int {
+            indexing: attribute | summary
+        }
     }
-  }
 }
 </pre>
 <p>Parent schema, with reference to grandparent:</p>
 <pre>
 schema parent {
-  document parent {
-    field a2 type int {
-      indexing: attribute | summary
-    }
-    field ref type reference&lt;grandparent&gt; {
-      indexing: attribute | summary
+    document parent {
+        field a2 type int {
+            indexing: attribute | summary
+        }
+        field ref type reference&lt;grandparent&gt; {
+            indexing: attribute | summary
+        }
     }
-  }
-  import field ref.a1 as a1 {}
+    import field ref.a1 as a1 {}
 }
 </pre>
 <p>Child schema, with reference to parent and (transitively) grandparent:</p>
 <pre>
 schema child {
-  document child {
-    field a3 type int {
-      indexing: attribute | summary
-    }
-    field ref type reference&lt;parent&gt; {
-      indexing: attribute | summary
+    document child {
+        field a3 type int {
+            indexing: attribute | summary
+        }
+        field ref type reference&lt;parent&gt; {
+            indexing: attribute | summary
+        }
     }
-  }
-  import field ref.a1 as a1 {}
-  import field ref.a2 as a2 {}
+    import field ref.a1 as a1 {}
+    import field ref.a2 as a2 {}
 }
 </pre>
 <p>Using these in document selection expressions is easy:</p>