Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Update similarity search with new base classes : Query Search #1508

Merged
merged 20 commits into from
Jun 17, 2024

Conversation

baraline
Copy link
Member

@baraline baraline commented May 9, 2024

Reference Issues/PRs

Part of #1243

What does this implement/fix? Explain your changes.

As described in #1243, this is a first PR that implement base classes for query search task, and transfers the existing code under this new submodule.

Remaining TODOs:

  • Adapt distance profile function for unequal length
  • Add tests for unequal length for both dummy and top-k
  • Fix missing docstrings

Reported to another PR as this one is already big enough

  • Add typing to function/class parameters

PR checklist

For new estimators and functions
  • I've added the estimator to the online API documentation.
  • (OPTIONAL) I've added myself as a __maintainer__ at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.

@baraline baraline linked an issue May 9, 2024 that may be closed by this pull request
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@aeon-actions-bot aeon-actions-bot bot added examples Example notebook related maintenance Continuous integration, unit testing & package distribution similarity search Similarity search package labels May 9, 2024
@aeon-actions-bot
Copy link
Contributor

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ $\color{#EC843A}{\textsf{maintenance}}$ ].
I have added the following labels to this PR based on the changes made: [ $\color{#45FD64}{\textsf{examples}}$, $\color{#006b75}{\textsf{similarity search}}$ ]. Feel free to change these if they do not properly represent the PR.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

@baraline baraline marked this pull request as ready for review May 18, 2024 20:12
@baraline baraline added the codecov actions Run the codecov action on a PR label May 18, 2024
Copy link
Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. All experimental so there some leeway, not going to say I went through every function with a fine tooth comb. Some of the comments may be a bit more general and apply to more than the highlighted code.

Would be good to update aeon-toolkit/aeon-admin#1 with the contents of your comments just so there a centralised document for interested people until we decide to make this non-experimental.

I would be careful with the base class parameters, if it does not apply to all (or the vast majority) of inheriting classes then it probably should not be here. Not saying that is the case, but there are some bases i.e. clustering with n_clusters where we are going to run into issues later.

aeon/similarity_search/query_search/base.py Outdated Show resolved Hide resolved
aeon/similarity_search/tests/test_base.py Outdated Show resolved Hide resolved
aeon/similarity_search/series_search/__init__.py Outdated Show resolved Hide resolved
aeon/similarity_search/query_search/dummy.py Outdated Show resolved Hide resolved
@MatthewMiddlehurst
Copy link
Member

Any reason for the new base class direction? Do they other base classes not have much shared functionality?

@baraline
Copy link
Member Author

baraline commented Jun 9, 2024

Any reason for the new base class direction? Do they other base classes not have much shared functionality?

Yeah, after doing some pen and paper for the other two classes, the base class for similarity search would be pretty much empty. For example, index search will actually do things differently during fit as it need to build a similarity model, while series-search in the most naive way (without computational optimisations like MP) is simply looping over a query search for all possible candidates. The computational optimisations do require some rethinking of the fit also compared to query search.
So if we consider the three submodule, at least for now, I don't see a use for BaseSimilaritySearch class. Still possible to refactor afterward if we find a good reason to.

@baraline
Copy link
Member Author

baraline commented Jun 9, 2024

On second thought, we could call the preprocess function of the CollectionEstimator for the 3D data given during fit method in a BaseSimilaritySearch, but that's about it ... Would that be better for structure’s sake's ?

@MatthewMiddlehurst
Copy link
Member

Couple of things to consider but ofc can change later as you say.

If there are a significant number of shared parameters/attributes/functions used then it may be a good idea to keep even if its mostly abstract methods.

Also there may be some situations where you want to use isinstance to cover all of them? Maybe not also, not thought about it that much 🙂.

@baraline
Copy link
Member Author

baraline commented Jun 9, 2024

If there are a significant number of shared parameters/attributes/functions used then it may be a good idea to keep even if its mostly abstract methods.

For some reason, I forgot to consider this ... I'm a bit out of touch today ! Swapping structure to add it back with the adjustment.

@baraline baraline marked this pull request as draft June 14, 2024 14:06
@baraline
Copy link
Member Author

Noticed some issue with the base class structure for the case of #1311, where the optimisation relies on lower bounding and not returning the distance profile fully computed, so I'll revamp it to be useable for all type of optimisations.

@baraline
Copy link
Member Author

baraline commented Jun 16, 2024

Sorry for the mess in this PR ... To summarize :

The previous BaseQuerySearch class was made to allow any matching condition on the output of the similarity search (e.g. top k matches, all matches below a distance threshold, ...). In practice, as there are only few plausible conditions: top k and/or threshold and worse-k and/or threshold, I just made a QuerySearch estimator with k, threshold and inverse_distance parameter that cover all these cases.

Additionally, there was the problem of computational optimisations that only compute part of the distance profiles (e.g. dtw lower bounding). These optimisations need to know the matching condition (top-k, ...) to work. The previous structure would not have allowed that, as distance profile computations were happening in the BaseQuerySearch class, which didn't have access to the matching condition.

@baraline baraline marked this pull request as ready for review June 16, 2024 18:24
Copy link
Member

@MatthewMiddlehurst MatthewMiddlehurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine with this going in if you are happy with it @baraline. If any issues arise can be dealt with in a separate PR.

@baraline baraline merged commit 42e55c2 into main Jun 17, 2024
16 checks passed
@baraline baraline deleted the ag/similaritysearchupdate branch June 17, 2024 13:17
@MatthewMiddlehurst MatthewMiddlehurst added enhancement New feature, improvement request or other non-bug code enhancement and removed maintenance Continuous integration, unit testing & package distribution labels Jul 10, 2024
@MatthewMiddlehurst MatthewMiddlehurst changed the title [MNT] Update similarity search with new base classes : Query Search [ENH] Update similarity search with new base classes : Query Search Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
codecov actions Run the codecov action on a PR enhancement New feature, improvement request or other non-bug code enhancement examples Example notebook related similarity search Similarity search package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[MNT] Update similarity search with new base classes
2 participants