-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager #240
Changes from 3 commits
e23d11b
776759d
0c2c12b
4af7740
86bac52
3a3a86a
d21e5f9
be6977d
2df81c5
81163de
aa114d7
0f7b327
06b0f5a
84641b4
a9a6828
0b4f9cd
d588a82
8f2797f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,7 +32,7 @@ | |
import org.apache.lucene.search.BooleanQuery; | ||
import org.apache.lucene.search.Query; | ||
import org.apache.lucene.search.TermQuery; | ||
import org.apache.lucene.search.TotalHitCountCollector; | ||
import org.apache.lucene.search.TotalHitCountCollectorManager; | ||
import org.apache.lucene.util.BytesRef; | ||
|
||
/** | ||
|
@@ -179,10 +179,8 @@ private Map<BytesRef, Integer> getWordFreqForClassess(String word) throws IOExce | |
if (query != null) { | ||
booleanQuery.add(query, BooleanClause.Occur.MUST); | ||
} | ||
TotalHitCountCollector totalHitCountCollector = new TotalHitCountCollector(); | ||
indexSearcher.search(booleanQuery.build(), totalHitCountCollector); | ||
|
||
int ret = totalHitCountCollector.getTotalHits(); | ||
int ret = indexSearcher.search(booleanQuery.build(), new TotalHitCountCollectorManager()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could make it even better by using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
if (ret != 0) { | ||
searched.put(cclass, ret); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,7 +35,7 @@ | |
import org.apache.lucene.search.IndexSearcher; | ||
import org.apache.lucene.search.Query; | ||
import org.apache.lucene.search.TermQuery; | ||
import org.apache.lucene.search.TotalHitCountCollector; | ||
import org.apache.lucene.search.TotalHitCountCollectorManager; | ||
import org.apache.lucene.search.WildcardQuery; | ||
import org.apache.lucene.util.BytesRef; | ||
|
||
|
@@ -169,7 +169,6 @@ protected int countDocsWithClass() throws IOException { | |
Terms terms = MultiTerms.getTerms(this.indexReader, this.classFieldName); | ||
int docCount; | ||
if (terms == null || terms.getDocCount() == -1) { // in case codec doesn't support getDocCount | ||
TotalHitCountCollector classQueryCountCollector = new TotalHitCountCollector(); | ||
BooleanQuery.Builder q = new BooleanQuery.Builder(); | ||
q.add( | ||
new BooleanClause( | ||
|
@@ -179,8 +178,7 @@ protected int countDocsWithClass() throws IOException { | |
if (query != null) { | ||
q.add(query, BooleanClause.Occur.MUST); | ||
} | ||
indexSearcher.search(q.build(), classQueryCountCollector); | ||
docCount = classQueryCountCollector.getTotalHits(); | ||
docCount = indexSearcher.search(q.build(), new TotalHitCountCollectorManager()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. let's simplify by using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. Also updated the implementation of |
||
} else { | ||
docCount = terms.getDocCount(); | ||
} | ||
|
@@ -276,9 +274,7 @@ private int getWordFreqForClass(String word, Term term) throws IOException { | |
if (query != null) { | ||
booleanQuery.add(query, BooleanClause.Occur.MUST); | ||
} | ||
TotalHitCountCollector totalHitCountCollector = new TotalHitCountCollector(); | ||
indexSearcher.search(booleanQuery.build(), totalHitCountCollector); | ||
return totalHitCountCollector.getTotalHits(); | ||
return indexSearcher.search(booleanQuery.build(), new TotalHitCountCollectorManager()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. likewise here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
} | ||
|
||
private double calculateLogPrior(Term term, int docsWithClassSize) throws IOException { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -66,9 +66,9 @@ | |
* match lots of documents, counting the number of hits may take much longer than computing the top | ||
* hits so this trade-off allows to get some minimal information about the hit count without slowing | ||
* down search too much. The {@link TopDocs#scoreDocs} array is always accurate however. If this | ||
* behavior doesn't suit your needs, you should create collectors manually with either {@link | ||
* TopScoreDocCollector#create} or {@link TopFieldCollector#create} and call {@link #search(Query, | ||
* Collector)}. | ||
* behavior doesn't suit your needs, you should create collectorManagers manually with either {@link | ||
* TopScoreDocCollectorManager#create} or {@link TopFieldCollectorManager#create} and call {@link | ||
* #search(Query, CollectorManager)}. | ||
* | ||
* <p><a id="thread-safety"></a> | ||
* | ||
|
@@ -478,34 +478,8 @@ public TopDocs searchAfter(ScoreDoc after, Query query, int numHits) throws IOEx | |
} | ||
|
||
final int cappedNumHits = Math.min(numHits, limit); | ||
|
||
final CollectorManager<TopScoreDocCollector, TopDocs> manager = | ||
new CollectorManager<TopScoreDocCollector, TopDocs>() { | ||
|
||
private final HitsThresholdChecker hitsThresholdChecker = | ||
(executor == null || leafSlices.length <= 1) | ||
? HitsThresholdChecker.create(Math.max(TOTAL_HITS_THRESHOLD, numHits)) | ||
: HitsThresholdChecker.createShared(Math.max(TOTAL_HITS_THRESHOLD, numHits)); | ||
|
||
private final MaxScoreAccumulator minScoreAcc = | ||
(executor == null || leafSlices.length <= 1) ? null : new MaxScoreAccumulator(); | ||
|
||
@Override | ||
public TopScoreDocCollector newCollector() throws IOException { | ||
return TopScoreDocCollector.create( | ||
cappedNumHits, after, hitsThresholdChecker, minScoreAcc); | ||
} | ||
|
||
@Override | ||
public TopDocs reduce(Collection<TopScoreDocCollector> collectors) throws IOException { | ||
final TopDocs[] topDocs = new TopDocs[collectors.size()]; | ||
int i = 0; | ||
for (TopScoreDocCollector collector : collectors) { | ||
topDocs[i++] = collector.topDocs(); | ||
} | ||
return TopDocs.merge(0, cappedNumHits, topDocs); | ||
} | ||
}; | ||
CollectorManager<TopScoreDocCollector, TopDocs> manager = | ||
new TopScoreDocCollectorManager(cappedNumHits, after, TOTAL_HITS_THRESHOLD); | ||
|
||
return search(query, manager); | ||
} | ||
|
@@ -527,7 +501,10 @@ public TopDocs search(Query query, int n) throws IOException { | |
* | ||
* @throws TooManyClauses If a query would exceed {@link IndexSearcher#getMaxClauseCount()} | ||
* clauses. | ||
* @deprecated This method is being deprecated in favor of {@link IndexSearcher#search(Query, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we like the Is your reasoning behind keeping this in there for now that all of our uses of this (internal to Lucene) haven't yet migrated as part of this change? Would the plan me to migrate all usages off of this internally and then actually remove it on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Do you mean
As explained in https://issues.apache.org/jira/browse/LUCENE-10002?focusedCommentId=17397827&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17397827, I use deprecation instead of direct removal in this PR exactly for the reasons you mentioned:
I would be interested in seeing how folks feel about the timing of removal, but after #1 is completed, complete removal of IndexSearcher#search(Query, Collector) in lucene codebase should require only straightforward deletion changes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should try to keep There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 makes sense to keep it in 9.0. As for my comment about There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Though, I think there are both |
||
* CollectorManager)} due to its support for concurrency in IndexSearcher | ||
*/ | ||
@Deprecated | ||
public void search(Query query, Collector results) throws IOException { | ||
query = rewrite(query); | ||
search(leafContexts, createWeight(query, results.scoreMode(), 1), results); | ||
|
@@ -614,33 +591,7 @@ private TopFieldDocs searchAfter( | |
final Sort rewrittenSort = sort.rewrite(this); | ||
|
||
final CollectorManager<TopFieldCollector, TopFieldDocs> manager = | ||
new CollectorManager<>() { | ||
|
||
private final HitsThresholdChecker hitsThresholdChecker = | ||
(executor == null || leafSlices.length <= 1) | ||
? HitsThresholdChecker.create(Math.max(TOTAL_HITS_THRESHOLD, numHits)) | ||
: HitsThresholdChecker.createShared(Math.max(TOTAL_HITS_THRESHOLD, numHits)); | ||
|
||
private final MaxScoreAccumulator minScoreAcc = | ||
(executor == null || leafSlices.length <= 1) ? null : new MaxScoreAccumulator(); | ||
|
||
@Override | ||
public TopFieldCollector newCollector() throws IOException { | ||
// TODO: don't pay the price for accurate hit counts by default | ||
return TopFieldCollector.create( | ||
rewrittenSort, cappedNumHits, after, hitsThresholdChecker, minScoreAcc); | ||
} | ||
|
||
@Override | ||
public TopFieldDocs reduce(Collection<TopFieldCollector> collectors) throws IOException { | ||
final TopFieldDocs[] topDocs = new TopFieldDocs[collectors.size()]; | ||
int i = 0; | ||
for (TopFieldCollector collector : collectors) { | ||
topDocs[i++] = collector.topDocs(); | ||
} | ||
return TopDocs.merge(rewrittenSort, 0, cappedNumHits, topDocs); | ||
} | ||
}; | ||
new TopFieldCollectorManager(rewrittenSort, cappedNumHits, after, TOTAL_HITS_THRESHOLD); | ||
|
||
TopFieldDocs topDocs = search(query, manager); | ||
if (doDocScores) { | ||
|
@@ -659,9 +610,12 @@ public TopFieldDocs reduce(Collection<TopFieldCollector> collectors) throws IOEx | |
*/ | ||
public <C extends Collector, T> T search(Query query, CollectorManager<C, T> collectorManager) | ||
throws IOException { | ||
if (executor == null || leafSlices.length <= 1) { | ||
if (executor == null || leafSlices.length == 0) { | ||
gsmiller marked this conversation as resolved.
Show resolved
Hide resolved
|
||
final C collector = collectorManager.newCollector(); | ||
search(query, collector); | ||
final Query rewrittenQuery = rewrite(query); | ||
|
||
search(leafContexts, createWeight(rewrittenQuery, collector.scoreMode(), 1), collector); | ||
|
||
return collectorManager.reduce(Collections.singletonList(collector)); | ||
} else { | ||
final List<C> collectors = new ArrayList<>(leafSlices.length); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,7 +18,6 @@ | |
|
||
import java.io.IOException; | ||
import java.util.Arrays; | ||
import java.util.Collection; | ||
import java.util.Comparator; | ||
import java.util.List; | ||
import java.util.Objects; | ||
|
@@ -174,7 +173,7 @@ private static boolean canEarlyTerminateOnPrefix(Sort searchSort, Sort indexSort | |
* Implements a TopFieldCollector over one SortField criteria, with tracking | ||
zacharymorn marked this conversation as resolved.
Show resolved
Hide resolved
|
||
* document scores and maxScore. | ||
*/ | ||
private static class SimpleFieldCollector extends TopFieldCollector { | ||
static class SimpleFieldCollector extends TopFieldCollector { | ||
final Sort sort; | ||
final FieldValueHitQueue<Entry> queue; | ||
|
||
|
@@ -214,7 +213,7 @@ public void collect(int doc) throws IOException { | |
/* | ||
* Implements a TopFieldCollector when after != null. | ||
*/ | ||
private static final class PagingFieldCollector extends TopFieldCollector { | ||
static final class PagingFieldCollector extends TopFieldCollector { | ||
|
||
final Sort sort; | ||
int collectedHits; | ||
|
@@ -383,9 +382,13 @@ protected void updateMinCompetitiveScore(Scorable scorer) throws IOException { | |
* count of the result will be accurate. {@link Integer#MAX_VALUE} may be used to make the hit | ||
* count accurate, but this will also make query processing slower. | ||
* @return a {@link TopFieldCollector} instance which will sort the results by the sort criteria. | ||
* @deprecated This method is being deprecated in favor of {@link | ||
* TopFieldCollectorManager#create(Sort, int, int)} due to its support for concurrency in | ||
* IndexSearcher | ||
*/ | ||
@Deprecated | ||
gsmiller marked this conversation as resolved.
Show resolved
Hide resolved
|
||
public static TopFieldCollector create(Sort sort, int numHits, int totalHitsThreshold) { | ||
return create(sort, numHits, null, totalHitsThreshold); | ||
return TopFieldCollectorManager.create(sort, numHits, totalHitsThreshold).newCollector(); | ||
} | ||
|
||
/** | ||
|
@@ -407,97 +410,13 @@ public static TopFieldCollector create(Sort sort, int numHits, int totalHitsThre | |
* field is indexed both with doc values and points. In this case, there is an assumption that | ||
* the same data is stored in these points and doc values. | ||
* @return a {@link TopFieldCollector} instance which will sort the results by the sort criteria. | ||
* @deprecated This method is being deprecated in favor of using the constructor of {@link | ||
zacharymorn marked this conversation as resolved.
Show resolved
Hide resolved
|
||
* TopFieldCollectorManager} due to its support for concurrency in IndexSearcher | ||
*/ | ||
@Deprecated | ||
gsmiller marked this conversation as resolved.
Show resolved
Hide resolved
|
||
public static TopFieldCollector create( | ||
Sort sort, int numHits, FieldDoc after, int totalHitsThreshold) { | ||
if (totalHitsThreshold < 0) { | ||
throw new IllegalArgumentException( | ||
"totalHitsThreshold must be >= 0, got " + totalHitsThreshold); | ||
} | ||
|
||
return create( | ||
sort, | ||
numHits, | ||
after, | ||
HitsThresholdChecker.create(Math.max(totalHitsThreshold, numHits)), | ||
null /* bottomValueChecker */); | ||
} | ||
|
||
/** | ||
* Same as above with additional parameters to allow passing in the threshold checker and the max | ||
* score accumulator. | ||
*/ | ||
static TopFieldCollector create( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since you're fully killing two methods here, does that mean that you're not intending to maintain back compat on 9.0? I assume these would get added back in and marked deprecated if backporting to 8x? I think it's reasonable, just checking what your intention is. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry just realized I missed a few comments earlier (they were collapsed on github UI). I think this one and the one from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Indeed we don't support access to pkg-private code. |
||
Sort sort, | ||
int numHits, | ||
FieldDoc after, | ||
HitsThresholdChecker hitsThresholdChecker, | ||
MaxScoreAccumulator minScoreAcc) { | ||
|
||
if (sort.fields.length == 0) { | ||
throw new IllegalArgumentException("Sort must contain at least one field"); | ||
} | ||
|
||
if (numHits <= 0) { | ||
throw new IllegalArgumentException( | ||
"numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count"); | ||
} | ||
|
||
if (hitsThresholdChecker == null) { | ||
throw new IllegalArgumentException("hitsThresholdChecker should not be null"); | ||
} | ||
|
||
FieldValueHitQueue<Entry> queue = FieldValueHitQueue.create(sort.fields, numHits); | ||
|
||
if (after == null) { | ||
return new SimpleFieldCollector(sort, queue, numHits, hitsThresholdChecker, minScoreAcc); | ||
} else { | ||
if (after.fields == null) { | ||
throw new IllegalArgumentException( | ||
"after.fields wasn't set; you must pass fillFields=true for the previous search"); | ||
} | ||
|
||
if (after.fields.length != sort.getSort().length) { | ||
throw new IllegalArgumentException( | ||
"after.fields has " | ||
+ after.fields.length | ||
+ " values but sort has " | ||
+ sort.getSort().length); | ||
} | ||
|
||
return new PagingFieldCollector( | ||
sort, queue, after, numHits, hitsThresholdChecker, minScoreAcc); | ||
} | ||
} | ||
|
||
/** | ||
* Create a CollectorManager which uses a shared hit counter to maintain number of hits and a | ||
* shared {@link MaxScoreAccumulator} to propagate the minimum score accross segments if the | ||
* primary sort is by relevancy. | ||
*/ | ||
public static CollectorManager<TopFieldCollector, TopFieldDocs> createSharedManager( | ||
Sort sort, int numHits, FieldDoc after, int totalHitsThreshold) { | ||
return new CollectorManager<>() { | ||
|
||
private final HitsThresholdChecker hitsThresholdChecker = | ||
HitsThresholdChecker.createShared(Math.max(totalHitsThreshold, numHits)); | ||
private final MaxScoreAccumulator minScoreAcc = new MaxScoreAccumulator(); | ||
|
||
@Override | ||
public TopFieldCollector newCollector() throws IOException { | ||
return create(sort, numHits, after, hitsThresholdChecker, minScoreAcc); | ||
} | ||
|
||
@Override | ||
public TopFieldDocs reduce(Collection<TopFieldCollector> collectors) throws IOException { | ||
final TopFieldDocs[] topDocs = new TopFieldDocs[collectors.size()]; | ||
int i = 0; | ||
for (TopFieldCollector collector : collectors) { | ||
topDocs[i++] = collector.topDocs(); | ||
} | ||
return TopDocs.merge(sort, 0, numHits, topDocs); | ||
} | ||
}; | ||
return new TopFieldCollectorManager(sort, numHits, after, totalHitsThreshold).newCollector(); | ||
} | ||
|
||
/** | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if anyone out there has sub-classed this to provide their own
Collector
through overriding this method? It's possible right? Would we want to support this with aCollectorManager
now instead? You might consider creating a new protected method--createCollectorManager
--that sub-classes could override if they want, then add javadoc here with a@lucene.deprecated
tag pointing users to the new method.Also, if this change is going onto
main
, I might kill this method entirely and then add it back in as deprecated if you choose to backport to 8x.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm you are right. I reverted some changes in this class to still use the
ReadTask#createCollector
method, in case there's overriding from users.I think this
createCollector
method was originally added (11 years ago) to support benchmarking, and not sure how widely this is used actually? So hopefully marking it as deprecated for release 9.0 and removing it entirely once we release 10.0 should be good enough?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was more thinking along the lines of keeping the change you had but providing a protected method that would allow sub-classes to provide their own
CollectorManager
if they wanted to do so (which would give users a migration path if they were previously providing their ownCollector
). So in 9.0, I think you'd move completely to theCollectorManager
approach like you had. What do you think of that approach?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see where you are coming from. However, upon looking at this method more closely, I'm afraid this method is effectively not useful, since the result of using this collector was commented out :D :
lucene/lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/ReadTask.java
Lines 119 to 121 in 3b3f960
So I'm now leaning more towards just removing this method altogether if no users ever noticed / complaint about this. What do you think @gsmiller @jpountz ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect it would still be useful to assess the performance of a collector but you probably have a point that if performance is the only thing that the benchmark module reports, then there's a chance that it's not actually used much.
I did a quick GitHub search to see how much code I could find that extends this code:
"extends ReadTask" createCollector
: no hits"extends SearchTask" createCollector -SearchWithCollectorTask
: no hits (I had to excludeSearchWithCollectorTask
or all Lucene forks would be returned).Given that the benchmark module is quite expert, I'd be ok with breaking hard here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I've removed this method, and also fixed the commented out code.