-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Update similarity search with new base classes : Query Search #1508
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Thank you for contributing to
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. All experimental so there some leeway, not going to say I went through every function with a fine tooth comb. Some of the comments may be a bit more general and apply to more than the highlighted code.
Would be good to update aeon-toolkit/aeon-admin#1 with the contents of your comments just so there a centralised document for interested people until we decide to make this non-experimental.
I would be careful with the base class parameters, if it does not apply to all (or the vast majority) of inheriting classes then it probably should not be here. Not saying that is the case, but there are some bases i.e. clustering with n_clusters
where we are going to run into issues later.
…oolkit/aeon into ag/similaritysearchupdate
Any reason for the new base class direction? Do they other base classes not have much shared functionality? |
Yeah, after doing some pen and paper for the other two classes, the base class for similarity search would be pretty much empty. For example, index search will actually do things differently during fit as it need to build a similarity model, while series-search in the most naive way (without computational optimisations like MP) is simply looping over a query search for all possible candidates. The computational optimisations do require some rethinking of the fit also compared to query search. |
On second thought, we could call the preprocess function of the CollectionEstimator for the 3D data given during fit method in a BaseSimilaritySearch, but that's about it ... Would that be better for structure’s sake's ? |
Couple of things to consider but ofc can change later as you say. If there are a significant number of shared parameters/attributes/functions used then it may be a good idea to keep even if its mostly abstract methods. Also there may be some situations where you want to use |
For some reason, I forgot to consider this ... I'm a bit out of touch today ! Swapping structure to add it back with the adjustment. |
Noticed some issue with the base class structure for the case of #1311, where the optimisation relies on lower bounding and not returning the distance profile fully computed, so I'll revamp it to be useable for all type of optimisations. |
Sorry for the mess in this PR ... To summarize : The previous Additionally, there was the problem of computational optimisations that only compute part of the distance profiles (e.g. dtw lower bounding). These optimisations need to know the matching condition (top-k, ...) to work. The previous structure would not have allowed that, as distance profile computations were happening in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with this going in if you are happy with it @baraline. If any issues arise can be dealt with in a separate PR.
Reference Issues/PRs
Part of #1243
What does this implement/fix? Explain your changes.
As described in #1243, this is a first PR that implement base classes for query search task, and transfers the existing code under this new submodule.
Remaining TODOs:
Reported to another PR as this one is already big enough
PR checklist
For new estimators and functions
__maintainer__
at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.