Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove IndexSearcher#search(Query,Collector) in favor of IndexSearcher#search(Query,CollectorManager) [LUCENE-10002] #11041

Closed
asfimport opened this issue Jun 14, 2021 · 24 comments

Comments

@asfimport
Copy link

It's a bit trappy that you can create an IndexSearcher with an executor, but that it would always search on the caller thread when calling IndexSearcher#search(Query,Collector).

 Let's remove IndexSearcher#search(Query,Collector), point our users to IndexSearcher#search(Query,CollectorManager) instead, and change factory methods of our main collectors (e.g. TopScoreDocCollector#create) to return a CollectorManager instead of a Collector?


Migrated from LUCENE-10002 by Adrien Grand (@jpountz), updated Apr 06 2022
Pull requests: #240

@asfimport
Copy link
Author

Greg Miller (@gsmiller) (migrated from JIRA)

+1. I also don't love that users can setup an IndexSearcher with an Executor and then not leverage it since they're not providing a CollectorManager. We have the same mixed pattern in DrillSideways as well. If we end up moving forward with this change in IndexSearcher, I'll create another issue to make the same update in DrillSideways for consistency.

@asfimport
Copy link
Author

Zach Chen (@zacharymorn) (migrated from JIRA)

Hi @jpountz @gsmiller, I have created a PR for this to deprecate the collector API in favor of the collector manager API, as well as some initial refactoring to some tests and the parts in DrillSideways that use TopScoreDocCollector & TopFieldCollector to use the latter API. I plan to submit more PRs afterward for other areas in the codebase.

Please note that I did first try to remove the collector API entirely, but that ended up resulting in way too many changes than I'm comfortable with in a single PR, and I also feel this API is such a commonly used one that users may prefer a more gradual deprecation / transition period. Hence I reverted my previous effort and adopted a phased approach.

@asfimport
Copy link
Author

asfimport commented Aug 14, 2021

Greg Miller (@gsmiller) (migrated from JIRA)

Nice @zacharymorn! Quite a large change for sure! I took a look at the DrillSideways changes and they appear correct to me at first glance, but I'll see if I can spend more time going through the whole PR in the next couple of days.

In the meantime, I went ahead and spun off #11088 to track making a similar API change to DrillSideways.

@asfimport
Copy link
Author

asfimport commented Aug 15, 2021

Zach Chen (@zacharymorn) (migrated from JIRA)

Nice @zacharymorn! Quite a large change for sure! I took a look at the DrillSideways changes and they appear correct to me at first glance, but I'll see if I can spend more time going through the whole PR in the next couple of days.

In the meantime, I went ahead and spun off #11088 to track making a similar API change to DrillSideways.

Sounds great, thanks Greg!

@asfimport
Copy link
Author

Mark Robert Miller (@markrmiller) (migrated from JIRA)

I did this some time ago. Read in Mike's blog you can just toss in an executor, tossed it, looked for my that was easy button.

+1 on making it obvious that I literally had changed nothing. Unfortunately, would not have saved me from the complications around implementing the "map reduce" collectors, all hopefully efficiently, and working out when it makes sense to use them vs the std search method, given the overhead and the index size and number of segments needed to make the tradeoff worth it, or at least not a performance hit. Or working out how to deal with changing segment count makeup, what kind of merge control or shenanigans make sense when or how and when and why to manage the segment slices and how they are grouped. Etc. Etc.

Went from, "drop in an executor? sweet" to ...

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 11006fb in lucene's branch refs/heads/main from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=11006fb

LUCENE-10002: Replace simple usages of TotalHitCountCollector with IndexSearcher#count (#612)

In case only number of documents are collected, IndexSearcher#search(Query, Collector) is commonly used, which does not use the executor that's been eventually set to the searcher. Calling IndexSearcher#count(Query) makes the code more concise and is also more correct as it honours the executor that's been set to the searcher instance.

Co-authored-by: Adrien Grand <[email protected]>

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 794f941 in lucene's branch refs/heads/branch_9x from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=794f941

LUCENE-10002: Replace simple usages of TotalHitCountCollector with IndexSearcher#count (#612)

In case only number of documents are collected, IndexSearcher#search(Query, Collector) is commonly used, which does not use the executor that's been eventually set to the searcher. Calling IndexSearcher#count(Query) makes the code more concise and is also more correct as it honours the executor that's been set to the searcher instance.

Co-authored-by: Adrien Grand <[email protected]>

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit ee7a8d6 in lucene's branch refs/heads/main from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=ee7a8d6

LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests (#639)

Also use LuceneTestCase#newSearcher

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit dba9b18 in lucene's branch refs/heads/branch_9x from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=dba9b18

LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests (#639)

Also use LuceneTestCase#newSearcher

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 1b083ea in lucene's branch refs/heads/main from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=1b083ea

LUCENE-10002: Replace test usages of TopScoreDocCollector with a corresponding collector manager (#716)

In the effort or replacing usages of IndexSearcher#search(Query, Collector) with IndexSearcher#search(Query, CollectorManager), this commit replaces many test usages of TopScoreDocCollector with its corresponding CollectorManager created by calling TopScoreDocCollector#createSharedManager.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit bfe7096 in lucene's branch refs/heads/branch_9x from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=bfe7096

LUCENE-10002: Replace test usages of TopScoreDocCollector with a corresponding collector manager (#716)

In the effort or replacing usages of IndexSearcher#search(Query, Collector) with IndexSearcher#search(Query, CollectorManager), this commit replaces many test usages of TopScoreDocCollector with its corresponding CollectorManager created by calling TopScoreDocCollector#createSharedManager.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit bff4246 in lucene's branch refs/heads/main from Adrien Grand
https://gitbox.apache.org/repos/asf?p=lucene.git;h=bff4246

LUCENE-10002: Fix test failure.

When IndexSearcher is created with a threadpool it becomes impossible to assert
on the number of evaluated hits overall.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 2a6b2ca in lucene's branch refs/heads/branch_9x from Adrien Grand
https://gitbox.apache.org/repos/asf?p=lucene.git;h=2a6b2ca

LUCENE-10002: Fix test failure.

When IndexSearcher is created with a threadpool it becomes impossible to assert
on the number of evaluated hits overall.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 66bbc95 in lucene's branch refs/heads/main from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=66bbc95

LUCENE-10002: Add FixedBitSetCollector and corresponding collector manager to test framework (#766)

Some tests collect matching docs in a FixedBitSet. In the effort of moving such tests to using IndexSearcher#search(Query, CollectorManager) as part of LUCENE-10002, this commit adds a new FixedBitSetCollector class that exposes this functionality as well as a createManager method that returns a corresponding CollectorManager.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 3bd6e32 in lucene's branch refs/heads/branch_9x from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=3bd6e32

LUCENE-10002: Add FixedBitSetCollector and corresponding collector manager to test framework (#766)

Some tests collect matching docs in a FixedBitSet. In the effort of moving such tests to using IndexSearcher#search(Query, CollectorManager) as part of LUCENE-10002, this commit adds a new FixedBitSetCollector class that exposes this functionality as well as a createManager method that returns a corresponding CollectorManager.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit e7f9f2c in lucene's branch refs/heads/main from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=e7f9f2c50d2

LUCENE-10002: Replaces usages of FacetsCollector with FacetsCollectorManager (#764)

In the effort of decreasing usages of IndexSearcher#search(query, Collector) by using the corresponding method that accepts a collector manager, this commit replaces many usages of FacetsCollector with its corresponding existing collector manager.

@asfimport
Copy link
Author

asfimport commented Apr 5, 2022

ASF subversion and git services (migrated from JIRA)

Commit 7ed0f3d in lucene's branch refs/heads/main from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=7ed0f3d7adb

#11520: Add support for concurrent facets random sampling (#765)

This commit adds a new createManager static method to RandomSamplingFacetsCollector that allows users to perform random sampling concurrently. The returned collector manager is very similar to the existing FacetsCollectorManager but it exposes a specialized reduced RandomSamplingFacetsCollector.

This relates to LUCENE-10002. It allows users to use a collector manager instead of a collector when doing random sampling, in the effort of reducing usages of IndexSearcher#search(Query, Collector).

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 395126a in lucene's branch refs/heads/branch_9x from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=395126a35b9

LUCENE-10002: Replaces usages of FacetsCollector with FacetsCollectorManager (#764)

In the effort of decreasing usages of IndexSearcher#search(query, Collector) by using the corresponding method that accepts a collector manager, this commit replaces many usages of FacetsCollector with its corresponding existing collector manager.

@asfimport
Copy link
Author

asfimport commented Apr 5, 2022

ASF subversion and git services (migrated from JIRA)

Commit b0ecba4 in lucene's branch refs/heads/branch_9x from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=b0ecba4fb24

#11520: Add support for concurrent facets random sampling (#765)

This commit adds a new createManager static method to RandomSamplingFacetsCollector that allows users to perform random sampling concurrently. The returned collector manager is very similar to the existing FacetsCollectorManager but it exposes a specialized reduced RandomSamplingFacetsCollector.

This relates to LUCENE-10002. It allows users to use a collector manager instead of a collector when doing random sampling, in the effort of reducing usages of IndexSearcher#search(Query, Collector).

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 74e9716 in lucene's branch refs/heads/main from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=74e9716aec7

LUCENE-10002: move MemoryIndex to search(Query, CollectorManager) (#785)

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 1cf1b30 in lucene's branch refs/heads/main from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=1cf1b301af0

LUCENE-10002: replace more usages of search(Query, Collector) in tests (#787)

This commit replaces more usages of search(Query, Collector) with calling the corresponding search(Query, CollectorManager) instead. This round focuses on tests that implement custom collector, that need a corresponding collector manager.

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit 37434ff in lucene's branch refs/heads/branch_9x from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=37434ffb1fc

LUCENE-10002: move MemoryIndex to search(Query, CollectorManager) (#785)

@asfimport
Copy link
Author

ASF subversion and git services (migrated from JIRA)

Commit ccd21fa in lucene's branch refs/heads/branch_9x from Luca Cavanna
https://gitbox.apache.org/repos/asf?p=lucene.git;h=ccd21fa5d9d

LUCENE-10002: replace more usages of search(Query, Collector) in tests (#787)

This commit replaces more usages of search(Query, Collector) with calling the corresponding search(Query, CollectorManager) instead. This round focuses on tests that implement custom collector, that need a corresponding collector manager.

javanna added a commit to javanna/lucene that referenced this issue Sep 6, 2024
These can now be replaced by the corresponding methods added to FacetsCollectorManager
that accept a FacetsCollectorManager as last argument in place of a Collector

Relates to apache#11041
javanna added a commit that referenced this issue Sep 7, 2024
These can now be replaced by the corresponding methods added to FacetsCollectorManager
that accept a FacetsCollectorManager as last argument in place of a Collector

Relates to #11041
@javanna
Copy link
Contributor

javanna commented Sep 14, 2024

Only the actual removal of the deprecated search(Query, Collector) is left to do. That is tracked by #12892 hence this issue can be closed.

@javanna javanna closed this as completed Sep 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants