Add vector search tuning tips (hazelcast#1196)

Results of internal benchmarking

---------

Co-authored-by: rebekah-lawrence <[email protected]>
Co-authored-by: Oliver Howell <[email protected]>
oliverhowell · Jul 29, 2024 · 24be30f
1 parent c853e7d
Showing 1 changed file with 5 additions and 0 deletions.
diff --git a/docs/modules/data-structures/pages/vector-search-overview.adoc b/docs/modules/data-structures/pages/vector-search-overview.adoc
@@ -231,3 +231,8 @@ If using partitions that are larger than the recommended size, ensure that you h
 To decrease pressure on heap memory, you can decrease the number of parallel migrations using `hazelcast.partition.max.parallel.migrations` and `hazelcast.partition.max.parallel.replications`.
 ====
 
+== Tuning tips
+
+1. For searches with a small `topK` (for example, 10), it can be beneficial to artificially increase `topK`, adjust `partitionLimit` accordingly, and discard the extra results. With a very small `topK` or `partitionLimit`, the search algorithm is less able to escape local minima and find the best results. If you need 10 results, a good starting point for tuning is `topK=100` with a `partitionLimit` between 50 and 100. This makes the search slower, but it also improves result quality, sometimes significantly. Overall, this approach can be more efficient than increasing the index build parameters (`max-degree`, `ef-construction`), which results in both slower index builds and slower searches.
+2. Vector deduplication does not incur significant overhead for uploads (usually less than 1%) or for searches. If your dataset does not contain duplicate vectors, you may consider disabling deduplication to get slightly better performance and lower memory usage. However, if deduplication is disabled and the dataset contains many duplicate vectors, similarity searches may return poor-quality results.
+
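The first tip (over-fetch, then discard) can be sketched in plain Java as follows. This is a minimal sketch under stated assumptions, not Hazelcast's API: `SearchHit`, `suggestedTopK`, `suggestedPartitionLimitRange`, and `keepBest` are hypothetical helpers, and the fixed 10x ratio for `topK` and the half-of-`topK` lower bound for `partitionLimit` are assumptions extrapolated from the single example in the tip (10 results, `topK=100`, `partitionLimit` between 50 and 100). In a real application the over-fetched list would come from a vector collection search; here it is simulated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Sketch of the "increase topK, then discard extra results" tuning tip.
 * All names here are hypothetical illustrations, not part of the Hazelcast API.
 */
public class OverfetchTuning {

    /** Hypothetical search result: a key and its similarity score (higher means more similar). */
    record SearchHit(String key, float score) {}

    /** Assumed starting topK: ten times the number of results needed (100 for 10). */
    static int suggestedTopK(int needed) {
        return needed * 10;
    }

    /** Assumed starting partitionLimit range: from half of topK up to topK (50 to 100 for topK=100). */
    static int[] suggestedPartitionLimitRange(int topK) {
        return new int[] {topK / 2, topK};
    }

    /** Keeps only the `needed` highest-scoring hits from an over-fetched result list. */
    static List<SearchHit> keepBest(List<SearchHit> overfetched, int needed) {
        List<SearchHit> sorted = new ArrayList<>(overfetched);
        Comparator<SearchHit> byScore = Comparator.comparingDouble(SearchHit::score);
        sorted.sort(byScore.reversed()); // best (highest score) first
        return new ArrayList<>(sorted.subList(0, Math.min(needed, sorted.size())));
    }

    public static void main(String[] args) {
        int needed = 10;
        int topK = suggestedTopK(needed);
        int[] range = suggestedPartitionLimitRange(topK);
        System.out.printf("topK=%d, partitionLimit between %d and %d%n", topK, range[0], range[1]);

        // Simulated stand-in for the topK results returned by a vector search.
        List<SearchHit> overfetched = new ArrayList<>();
        for (int i = 0; i < topK; i++) {
            overfetched.add(new SearchHit("key-" + i, (i * 37 % topK) / (float) topK));
        }

        // Discard the extra results and keep only the ones the caller asked for.
        List<SearchHit> best = keepBest(overfetched, needed);
        best.forEach(hit -> System.out.println(hit.key() + " " + hit.score()));
    }
}
```

Treat the suggested values only as a starting point for tuning; measure both latency and result quality on your own data before settling on `topK` and `partitionLimit`.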
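The second tip depends on knowing whether your dataset contains duplicate vectors. The following minimal Java sketch counts exact duplicates before upload so you can judge whether disabling deduplication is worth testing. `VectorKey` and `countDuplicates` are hypothetical helpers, not part of the Hazelcast API, and the check assumes that "duplicate" means component-wise identical as compared by `Arrays.equals`, so near-duplicate vectors are not detected.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch: counts exact duplicate vectors in a dataset.
 * VectorKey and countDuplicates are hypothetical helpers, not Hazelcast API.
 */
public class DuplicateVectorCheck {

    /** Wraps a float[] so that equality and hashing compare contents, not array references. */
    record VectorKey(float[] values) {
        @Override
        public boolean equals(Object o) {
            return o instanceof VectorKey other && Arrays.equals(values, other.values);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(values);
        }
    }

    /** Returns how many vectors are exact copies of a vector that appeared earlier in the list. */
    static int countDuplicates(List<float[]> vectors) {
        Set<VectorKey> seen = new HashSet<>();
        int duplicates = 0;
        for (float[] v : vectors) {
            // Set.add returns false when an equal vector is already present.
            if (!seen.add(new VectorKey(v))) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<float[]> vectors = List.of(
                new float[] {0.1f, 0.2f, 0.3f},
                new float[] {0.4f, 0.5f, 0.6f},
                new float[] {0.1f, 0.2f, 0.3f}); // exact copy of the first vector
        int duplicates = countDuplicates(vectors);
        // Only consider disabling deduplication when the dataset has no duplicates.
        System.out.println(duplicates == 0
                ? "No duplicates found: disabling deduplication may be worth testing."
                : duplicates + " duplicate(s) found: keep deduplication enabled.");
    }
}
```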