Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple calls of .fit() has inconsistent behavior with scikit-learn #167

Open
hombit opened this issue Mar 14, 2024 · 6 comments
Open

Multiple calls of .fit() has inconsistent behavior with scikit-learn #167

hombit opened this issue Mar 14, 2024 · 6 comments

Comments

@hombit
Copy link
Member

hombit commented Mar 14, 2024

Currently, AADForest.fit() would do nothing when called second time, while with scikit-learn it would would cause model retraining. The same applies for fit_known() which just ignores data argument, even if it differs from data the model was previously trained with.

Here I propose to modify the Coniferest interface to add an additional method, .tune_known(known_data, known_labels). In this case:

  • .fit(data) would refit every time it is called dropping all the previous training
  • .fit_known(data, known_data, known_labels) would also refit
  • .tune_known(known_data, known_labels) would use the same "base" (isolation forest) model and tune it for labeled data. It would fail if called before .fit or .fit_known
@hombit hombit added this to the coniferest 0.1 release milestone Mar 14, 2024
@matwey
Copy link
Contributor

matwey commented Mar 14, 2024

Are this names (fit_known, tune_known) exist in sklearn?

@matwey
Copy link
Contributor

matwey commented Mar 14, 2024

Related to #113

@hombit
Copy link
Member Author

hombit commented Mar 14, 2024

Are this names (fit_known, tune_known) exist in sklearn?

No, but it would be weird if .fit and fit_known behavior would be different in this way

@matwey
Copy link
Contributor

matwey commented Mar 14, 2024

Then I would propose the following alternative since it seems that having three functions is redundant:

  • .fit(data, known_labels = None, known_data = None) would refit every time it is called dropping all the previous training.
  • .fit_known(known_labels, known_data = None) doesn't accept data and doesn't do refit.

Mind known_labels and known_data order. If known_data is missed then known_labels are associated with data itself.

@hombit
Copy link
Member Author

hombit commented Mar 14, 2024

I would try to be duck-consistent with scikit-learn, including .fit(X, y)

@matwey
Copy link
Contributor

matwey commented Mar 14, 2024

And .fit(data, known_labels) is mostly the same as .fit(X, y).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants