Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC] Add individual estimator capabilities in the documentation #1430

Open
baraline opened this issue Apr 16, 2024 · 13 comments · May be fixed by #2468
Open

[DOC] Add individual estimator capabilities in the documentation #1430

baraline opened this issue Apr 16, 2024 · 13 comments · May be fixed by #2468
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@baraline
Copy link
Member

Describe the issue linked to the documentation

Following #1242 and #1426, we would like to add the estimator capabilities in the API documentation. While an automatic way to do this would be best, another option is to manually add a table in the docstring of each estimator.

One way to (manually) achieve the result would be to use the notes sections of the class docstring as follows:

Notes
-----
   +-----------------+----------------+
   | Capability      |    Support     |
   +=================+================+
   | multivariate    |       ✗        |
   +-----------------+----------------+
   | missing values  |       ✓        |
   +-----------------+----------------+
   | unequal length  |       ✓        |
   +-----------------+----------------+

The value of these tags can be determined by calling the estimator.get_class_tags() .

Suggest a potential alternative/fix

No response

@baraline baraline added the documentation Improvements or additions to documentation label Apr 16, 2024
@TonyBagnall TonyBagnall added the good first issue Good for newcomers label Jun 8, 2024
@inclinedadarsh
Copy link

inclinedadarsh commented Dec 18, 2024

Hello @baraline,
this seems to be a pretty good issue to start with, I'd like to attempt it.

As much as I have understood, there are a lot of estimators in the package (coming from estimator overview page)

Taking BOSSEnsemble as an example, I'll have to add the following lines in this file in the NOTES section:

   +-----------------+----------------+
   | Capability      |    Support     |
   +=================+================+
   | missing values  |       ✗        |
   +-----------------+----------------+
   | multithreading  |       ✓        |
   +-----------------+----------------+
   | univariate      |       ✓        |
   +-----------------+----------------+
   | multivariate    |       ✗        |
   +-----------------+----------------+
   | unequal length  |       ✗        |
   +-----------------+----------------+
   | train estimat   |       ✓        |
   +-----------------+----------------+
   | contractable    |       ✗        |
   +-----------------+----------------+

Please let me if I'm missing something

Thank you!

@baraline
Copy link
Member Author

baraline commented Dec 18, 2024

Hi @inclinedadarsh, this is the idea, yes. I think we could default to only adding the tags that are set to True to avoid bloating the notes sections.

Although the ideal solution would be to have this done through a script like for the estimator page in #1426, as we have numerous estimators and their capabilities might change on rare occasions.

@SebastianSchmidl
Copy link
Member

If we add only the true tags, then a table is not necessary and a simple list will do.

For anomaly detectors, we already have a list of capabilities in the form of a table:

.. list-table:: Capabilities
:stub-columns: 1
* - Input data format
- univariate
* - Output data format
- anomaly scores
* - Learning Type
- unsupervised

This renders as seen here: https://www.aeon-toolkit.org/en/latest/api_reference/auto_generated/aeon.anomaly_detection.DWT_MLEAD.html#id4

This capabilities table does not include all kinds of tags and, thus, needs to be adapted or changed to the new format.

@inclinedadarsh
Copy link

Hi @inclinedadarsh, this is the idea, yes. I think we could default to only adding the tags that are set to True to avoid bloating the notes sections.

Although the ideal solution would be to have this done through a script like for the estimator page in #1426, as we have numerous estimators and their capabilities might change on rare occasions.

Okay @baraline
Here's what I understand --

I'll have to add a script in conf.py file just like someone did in #1426
The script should:

  1. Get all the estimators
  2. Get the capabilities of those estimators
  3. Add capabilities of the estimators in their respective files

Let me know if I'm going in the wrong direction.

@baraline
Copy link
Member Author

I'm not 100% sure how such a script would actually perform the insertion in the api docs files, but if we already have a format as the one presented by @SebastianSchmidl, we should follow this (e.g. the dwt_mlead example he gave) instead of the table I described in the original issue.

But yeah the script should loop through all estimators, get their _tags, create the table based on them, and insert it in the generated api doc files.

Note that i'm unsure if it is possible to do it like this, so you can explore a bit. But manually adding these tables for all estimators is kind of the last resort solution.

@inclinedadarsh
Copy link

Okay that works

I'll be experimenting a bit. Any other resource that aeon has used to achieve anything similar to this will be helpful.
Also, If we're only going to list the true tags, then we won't be needing a table. Will a simple list like this work?

**Capabilities**:
- Multi Threading
- Univariate
- Train Estimate

Which will render to:
Capabilities:

  • Multi Threading
  • Univariate
  • Train Estimate

If not this, then I'll be glad if @baraline @SebastianSchmidl can help me figure out what way we can present this.


Also @SebastianSchmidl I tried looking through the codebase to understand how that capabilities table in all the anomaly detection modules are added, but couldn't :(

Can you let me know if those tables are added manually or a script was used to do that?

@baraline
Copy link
Member Author

You can copy the list format used in Sebastian example. And they added it manually in the mentioned estimators I'm afraid.

@SebastianSchmidl
Copy link
Member

Yes, the capabilities-tables in the AD module were added manually. So, I don't have any other resources for adding this manually, I'm afraid.

@inclinedadarsh
Copy link

@baraline there are two ways we can write the script:

  1. The script will add the table in the actual documentation and then the html will be generated, as a side-effect the table will remain in all the documentations (& will also be pushed to github)

The problem with this is, how will the program understand if the table that already exists in the files is correct or not, to fix this we can create it such that everytime documentation is generated, it simply removes the table if it exists and adds a new one

  1. the script will first generate the html first, and then the table will be added

i would want it to happen this way, but I'm not sure if we can achieve that using Sphinx.

Please let me know how i should proceed

@inclinedadarsh
Copy link

Moreover, here's what im currently exploring:

I am writing a custom Sphinx, which basically --

  1. calls the all_estimators() function
  2. gets all the estimators
  3. crawls all the modules using walk_packages function from pkgutil (all_estimators function also uses this`)
  4. if the module name is similar to the estimators from all_estimators() function then get it's path and update the table there (by calling estimator.get_class_tags()

this is kinda slow because I'll be crawling through all the modules but this is the only way I can think of

something else I'll try after this is successful is that, i'll try to pass the file path of the estimator module directly from the all_estimators() function because it's already crawling all the modules once, we don't really need to do the same thing again

@baraline
Copy link
Member Author

Hey,

I think a good start is looking at the sphinx event callback graph here to find at which step we could insert the tables into the docs.

Having to modify the GitHub sources is not ideal, but we indeed have to deal with the case where the table is already present in the estimator documentation, as it is the case for some anomaly detector. This will likely be done through some regular expression on the doc string.

Concerning the rest, I think you got it right, use the all estimator function and loop through them, checking the tags, the builded doc, and making the adjustement/additions needed. So I think we should use a callback after "build", so we can browse (and modify) the generated HTML before it is displayed.

@inclinedadarsh
Copy link

Aha, got it!

Thanks for this @baraline, searched for the event callback we need and figured out that's html-page-context
Using it I'm able to edit & build the context sources before building without affecting the actual source files.

Now I just need to figure out how to get the NOTES section and put our table there. Most probably using regex.

Also, assigning this issue to myself as I'm working on it.

@inclinedadarsh
Copy link

@aeon-actions-bot assign @inclinedadarsh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants