-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager) - TopFieldCollectorManager & TopScoreDocCollectorManager #240
Changes from 5 commits
e23d11b
776759d
0c2c12b
4af7740
86bac52
3a3a86a
d21e5f9
be6977d
2df81c5
81163de
aa114d7
0f7b327
06b0f5a
84641b4
a9a6828
0b4f9cd
d588a82
8f2797f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,7 +32,7 @@ | |
import org.apache.lucene.search.BooleanQuery; | ||
import org.apache.lucene.search.Query; | ||
import org.apache.lucene.search.TermQuery; | ||
import org.apache.lucene.search.TotalHitCountCollector; | ||
import org.apache.lucene.search.TotalHitCountCollectorManager; | ||
import org.apache.lucene.util.BytesRef; | ||
|
||
/** | ||
|
@@ -179,10 +179,8 @@ private Map<BytesRef, Integer> getWordFreqForClassess(String word) throws IOExce | |
if (query != null) { | ||
booleanQuery.add(query, BooleanClause.Occur.MUST); | ||
} | ||
TotalHitCountCollector totalHitCountCollector = new TotalHitCountCollector(); | ||
indexSearcher.search(booleanQuery.build(), totalHitCountCollector); | ||
|
||
int ret = totalHitCountCollector.getTotalHits(); | ||
int ret = indexSearcher.search(booleanQuery.build(), new TotalHitCountCollectorManager()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We could make it even better by using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
if (ret != 0) { | ||
searched.put(cclass, ret); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,7 +35,7 @@ | |
import org.apache.lucene.search.IndexSearcher; | ||
import org.apache.lucene.search.Query; | ||
import org.apache.lucene.search.TermQuery; | ||
import org.apache.lucene.search.TotalHitCountCollector; | ||
import org.apache.lucene.search.TotalHitCountCollectorManager; | ||
import org.apache.lucene.search.WildcardQuery; | ||
import org.apache.lucene.util.BytesRef; | ||
|
||
|
@@ -169,7 +169,6 @@ protected int countDocsWithClass() throws IOException { | |
Terms terms = MultiTerms.getTerms(this.indexReader, this.classFieldName); | ||
int docCount; | ||
if (terms == null || terms.getDocCount() == -1) { // in case codec doesn't support getDocCount | ||
TotalHitCountCollector classQueryCountCollector = new TotalHitCountCollector(); | ||
BooleanQuery.Builder q = new BooleanQuery.Builder(); | ||
q.add( | ||
new BooleanClause( | ||
|
@@ -179,8 +178,7 @@ protected int countDocsWithClass() throws IOException { | |
if (query != null) { | ||
q.add(query, BooleanClause.Occur.MUST); | ||
} | ||
indexSearcher.search(q.build(), classQueryCountCollector); | ||
docCount = classQueryCountCollector.getTotalHits(); | ||
docCount = indexSearcher.search(q.build(), new TotalHitCountCollectorManager()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. let's simplify by using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. Also updated the implementation of |
||
} else { | ||
docCount = terms.getDocCount(); | ||
} | ||
|
@@ -276,9 +274,7 @@ private int getWordFreqForClass(String word, Term term) throws IOException { | |
if (query != null) { | ||
booleanQuery.add(query, BooleanClause.Occur.MUST); | ||
} | ||
TotalHitCountCollector totalHitCountCollector = new TotalHitCountCollector(); | ||
indexSearcher.search(booleanQuery.build(), totalHitCountCollector); | ||
return totalHitCountCollector.getTotalHits(); | ||
return indexSearcher.search(booleanQuery.build(), new TotalHitCountCollectorManager()); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. likewise here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
} | ||
|
||
private double calculateLogPrior(Term term, int docsWithClassSize) throws IOException { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -66,9 +66,9 @@ | |
* match lots of documents, counting the number of hits may take much longer than computing the top | ||
* hits so this trade-off allows to get some minimal information about the hit count without slowing | ||
* down search too much. The {@link TopDocs#scoreDocs} array is always accurate however. If this | ||
* behavior doesn't suit your needs, you should create collectors manually with either {@link | ||
* TopScoreDocCollector#create} or {@link TopFieldCollector#create} and call {@link #search(Query, | ||
* Collector)}. | ||
* behavior doesn't suit your needs, you should create collectorManagers manually with either {@link | ||
* TopScoreDocCollectorManager} or {@link TopFieldCollectorManager} and call {@link #search(Query, | ||
* CollectorManager)}. | ||
* | ||
* <p><a id="thread-safety"></a> | ||
* | ||
|
@@ -478,34 +478,10 @@ public TopDocs searchAfter(ScoreDoc after, Query query, int numHits) throws IOEx | |
} | ||
|
||
final int cappedNumHits = Math.min(numHits, limit); | ||
|
||
final CollectorManager<TopScoreDocCollector, TopDocs> manager = | ||
new CollectorManager<TopScoreDocCollector, TopDocs>() { | ||
|
||
private final HitsThresholdChecker hitsThresholdChecker = | ||
(executor == null || leafSlices.length <= 1) | ||
? HitsThresholdChecker.create(Math.max(TOTAL_HITS_THRESHOLD, numHits)) | ||
: HitsThresholdChecker.createShared(Math.max(TOTAL_HITS_THRESHOLD, numHits)); | ||
|
||
private final MaxScoreAccumulator minScoreAcc = | ||
(executor == null || leafSlices.length <= 1) ? null : new MaxScoreAccumulator(); | ||
|
||
@Override | ||
public TopScoreDocCollector newCollector() throws IOException { | ||
return TopScoreDocCollector.create( | ||
cappedNumHits, after, hitsThresholdChecker, minScoreAcc); | ||
} | ||
|
||
@Override | ||
public TopDocs reduce(Collection<TopScoreDocCollector> collectors) throws IOException { | ||
final TopDocs[] topDocs = new TopDocs[collectors.size()]; | ||
int i = 0; | ||
for (TopScoreDocCollector collector : collectors) { | ||
topDocs[i++] = collector.topDocs(); | ||
} | ||
return TopDocs.merge(0, cappedNumHits, topDocs); | ||
} | ||
}; | ||
final boolean supportsConcurrency = executor != null && leafSlices.length > 1; | ||
CollectorManager<TopScoreDocCollector, TopDocs> manager = | ||
new TopScoreDocCollectorManager( | ||
cappedNumHits, after, TOTAL_HITS_THRESHOLD, supportsConcurrency); | ||
|
||
return search(query, manager); | ||
} | ||
|
@@ -527,7 +503,10 @@ public TopDocs search(Query query, int n) throws IOException { | |
* | ||
* @throws TooManyClauses If a query would exceed {@link IndexSearcher#getMaxClauseCount()} | ||
* clauses. | ||
* @deprecated This method is being deprecated in favor of {@link IndexSearcher#search(Query, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we like the Is your reasoning behind keeping this in there for now that all of our uses of this (internal to Lucene) haven't yet migrated as part of this change? Would the plan me to migrate all usages off of this internally and then actually remove it on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Do you mean
As explained in https://issues.apache.org/jira/browse/LUCENE-10002?focusedCommentId=17397827&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17397827, I use deprecation instead of direct removal in this PR exactly for the reasons you mentioned:
I would be interested in seeing how folks feel about the timing of removal, but after #1 is completed, complete removal of IndexSearcher#search(Query, Collector) in lucene codebase should require only straightforward deletion changes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we should try to keep There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 makes sense to keep it in 9.0. As for my comment about There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Though, I think there are both |
||
* CollectorManager)} due to its support for concurrency in IndexSearcher | ||
*/ | ||
@Deprecated | ||
public void search(Query query, Collector results) throws IOException { | ||
query = rewrite(query); | ||
search(leafContexts, createWeight(query, results.scoreMode(), 1), results); | ||
|
@@ -613,34 +592,10 @@ private TopFieldDocs searchAfter( | |
final int cappedNumHits = Math.min(numHits, limit); | ||
final Sort rewrittenSort = sort.rewrite(this); | ||
|
||
final boolean supportsConcurrency = executor != null && leafSlices.length > 1; | ||
final CollectorManager<TopFieldCollector, TopFieldDocs> manager = | ||
new CollectorManager<>() { | ||
|
||
private final HitsThresholdChecker hitsThresholdChecker = | ||
(executor == null || leafSlices.length <= 1) | ||
? HitsThresholdChecker.create(Math.max(TOTAL_HITS_THRESHOLD, numHits)) | ||
: HitsThresholdChecker.createShared(Math.max(TOTAL_HITS_THRESHOLD, numHits)); | ||
|
||
private final MaxScoreAccumulator minScoreAcc = | ||
(executor == null || leafSlices.length <= 1) ? null : new MaxScoreAccumulator(); | ||
|
||
@Override | ||
public TopFieldCollector newCollector() throws IOException { | ||
// TODO: don't pay the price for accurate hit counts by default | ||
return TopFieldCollector.create( | ||
rewrittenSort, cappedNumHits, after, hitsThresholdChecker, minScoreAcc); | ||
} | ||
|
||
@Override | ||
public TopFieldDocs reduce(Collection<TopFieldCollector> collectors) throws IOException { | ||
final TopFieldDocs[] topDocs = new TopFieldDocs[collectors.size()]; | ||
int i = 0; | ||
for (TopFieldCollector collector : collectors) { | ||
topDocs[i++] = collector.topDocs(); | ||
} | ||
return TopDocs.merge(rewrittenSort, 0, cappedNumHits, topDocs); | ||
} | ||
}; | ||
new TopFieldCollectorManager( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wonderful to see all this redundant code go away! |
||
rewrittenSort, cappedNumHits, after, TOTAL_HITS_THRESHOLD, supportsConcurrency); | ||
|
||
TopFieldDocs topDocs = search(query, manager); | ||
if (doDocScores) { | ||
|
@@ -659,9 +614,12 @@ public TopFieldDocs reduce(Collection<TopFieldCollector> collectors) throws IOEx | |
*/ | ||
public <C extends Collector, T> T search(Query query, CollectorManager<C, T> collectorManager) | ||
throws IOException { | ||
if (executor == null || leafSlices.length <= 1) { | ||
if (executor == null || leafSlices.length == 0) { | ||
gsmiller marked this conversation as resolved.
Show resolved
Hide resolved
|
||
final C collector = collectorManager.newCollector(); | ||
search(query, collector); | ||
final Query rewrittenQuery = rewrite(query); | ||
|
||
search(leafContexts, createWeight(rewrittenQuery, collector.scoreMode(), 1), collector); | ||
|
||
return collectorManager.reduce(Collections.singletonList(collector)); | ||
} else { | ||
final List<C> collectors = new ArrayList<>(leafSlices.length); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if anyone out there has sub-classed this to provide their own
Collector
through overriding this method? It's possible right? Would we want to support this with aCollectorManager
now instead? You might consider creating a new protected method--createCollectorManager
--that sub-classes could override if they want, then add javadoc here with a@lucene.deprecated
tag pointing users to the new method.Also, if this change is going onto
main
, I might kill this method entirely and then add it back in as deprecated if you choose to backport to 8x.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm you are right. I reverted some changes in this class to still use the
ReadTask#createCollector
method, in case there's overriding from users.I think this
createCollector
method was originally added (11 years ago) to support benchmarking, and not sure how widely this is used actually? So hopefully marking it as deprecated for release 9.0 and removing it entirely once we release 10.0 should be good enough?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was more thinking along the lines of keeping the change you had but providing a protected method that would allow sub-classes to provide their own
CollectorManager
if they wanted to do so (which would give users a migration path if they were previously providing their ownCollector
). So in 9.0, I think you'd move completely to theCollectorManager
approach like you had. What do you think of that approach?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see where you are coming from. However, upon looking at this method more closely, I'm afraid this method is effectively not useful, since the result of using this collector was commented out :D :
lucene/lucene/benchmark/src/java/org/apache/lucene/benchmark/byTask/tasks/ReadTask.java
Lines 119 to 121 in 3b3f960
So I'm now leaning more towards just removing this method altogether if no users ever noticed / complaint about this. What do you think @gsmiller @jpountz ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect it would still be useful to assess the performance of a collector but you probably have a point that if performance is the only thing that the benchmark module reports, then there's a chance that it's not actually used much.
I did a quick GitHub search to see how much code I could find that extends this code:
"extends ReadTask" createCollector
: no hits"extends SearchTask" createCollector -SearchWithCollectorTask
: no hits (I had to excludeSearchWithCollectorTask
or all Lucene forks would be returned).Given that the benchmark module is quite expert, I'd be ok with breaking hard here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I've removed this method, and also fixed the commented out code.