Minimal proof for Bulk DocIdSetIterator for Lucene PR 13149 #257

antonha · 2024-03-13T06:25:39Z

This PR is the smallest change I could make to demonstrate that the changes in apache/lucene/pull/13149 work. The one exception is the number of LongNrq queries, which could probably be lower.

I aimed to reproduce this on wikimediumall. The benchmark needs to run with optimize = True for indexing and commitPoint = 'single' for the competition; otherwise the performance difference is hard to see. The reason is that each segment would otherwise hold too few documents, so the BkdTree IntsWriter picks a very compact encoding.
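To illustrate why the number of documents per segment matters, here is a minimal sketch. It assumes the writer stores doc IDs at the narrowest fixed width that fits the largest ID in the segment. The function name `bits_per_doc_id`, the candidate widths (16, 24, 32), and the segment sizes are hypothetical choices for illustration, not Lucene's actual encoding logic.

```python
def bits_per_doc_id(max_doc: int) -> int:
    """Return the narrowest assumed width (in bits) that can hold IDs in [0, max_doc).

    The widths 16/24/32 are an illustrative assumption, not Lucene's real table.
    """
    for width in (16, 24, 32):
        if max_doc <= (1 << width):
            return width
    raise ValueError("max_doc does not fit in 32 bits")


# Hypothetical segment sizes: many small segments vs. one large merged segment.
small_segment = bits_per_doc_id(50_000)
medium_segment = bits_per_doc_id(1_000_000)
merged_segment = bits_per_doc_id(30_000_000)

print(small_segment, medium_segment, merged_segment)
```

Under these assumptions, only the large merged segment needs the widest encoding, which is consistent with forcing a single segment before measuring.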

I'm not sure if this should be merged - the PR is mostly here for reference.

antonha · 2024-03-13T06:35:08Z

src/python/competition.py

@@ -422,7 +422,7 @@ def __init__(self, cold=False,
               # Pass fixed randomSeed so separate runs are comparable (pick the same tasks):
               randomSeed=None,
               benchSearch=True,
-               taskCountPerCat = 1,
+               taskCountPerCat = 20,


This is important: it triggers multiple implementations of the IntersectVisitor in PointRangeQueries.

I should also note that this might make the benchmark slower, since having more implementations of the IntersectVisitor might drag down performance due to virtual calls now being used.

In real Lucene applications, multiple implementations are probably the norm, though, so this makes the benchmark more realistic. apache/lucene#13149 should reduce the performance decrease from this.
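The effect of bulk visiting on the number of dispatched calls can be sketched as follows. This is a structural illustration only: Python does not reproduce JVM inlining behaviour, and the class and function names (`CollectingVisitor`, `CountingVisitor`, `read_leaf_per_doc`, `read_leaf_bulk`) are hypothetical stand-ins rather than Lucene's API. It shows that handing a visitor an iterator over a whole leaf costs one dispatched call per leaf instead of one per document.

```python
from typing import Iterable, List


class CollectingVisitor:
    """Hypothetical visitor that stores every doc ID it is given."""

    def __init__(self) -> None:
        self.docs: List[int] = []
        self.calls = 0  # number of dispatched visitor calls

    def visit(self, doc_id: int) -> None:
        self.calls += 1
        self.docs.append(doc_id)

    def visit_bulk(self, doc_ids: Iterable[int]) -> None:
        self.calls += 1
        self.docs.extend(doc_ids)


class CountingVisitor:
    """A second hypothetical implementation, so the reader sees more than one type."""

    def __init__(self) -> None:
        self.count = 0
        self.calls = 0

    def visit(self, doc_id: int) -> None:
        self.calls += 1
        self.count += 1

    def visit_bulk(self, doc_ids: Iterable[int]) -> None:
        self.calls += 1
        self.count += sum(1 for _ in doc_ids)


def read_leaf_per_doc(doc_ids: List[int], visitor) -> None:
    # One dynamically dispatched call per document.
    for doc_id in doc_ids:
        visitor.visit(doc_id)


def read_leaf_bulk(doc_ids: List[int], visitor) -> None:
    # One dispatched call per leaf; the visitor iterates internally.
    visitor.visit_bulk(iter(doc_ids))


leaf = list(range(512))  # hypothetical leaf of 512 doc IDs

per_doc = CollectingVisitor()
read_leaf_per_doc(leaf, per_doc)

bulk = CollectingVisitor()
read_leaf_bulk(leaf, bulk)

counting = CountingVisitor()
read_leaf_bulk(leaf, counting)

print(per_doc.calls, bulk.calls, counting.count)
```

Both paths deliver the same doc IDs; only the number of calls crossing the visitor interface differs.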

Minimal proof for Bulk DocIdSetIterator for Lucene PR 13149

6d48713


antonha mentioned this pull request Mar 13, 2024

Made DocIdsWriter use DISI when reading documents with an IntersectVisitor apache/lucene#13149

Merged
