
Unnecessary level of abstraction #45

Open
vincent-laurent opened this issue Apr 29, 2024 · 1 comment
vincent-laurent (Contributor) commented Apr 29, 2024

The ModelSelector API seems redundant with the Optimizer class:

Something like this would be simpler and more suitable:

engine = FlamlOptimizer(**parameters)
engine.fit(project)

For naming consistency, we should also have:

model = engine.best_model_
evaluator = ModelEvaluator(model=model) 

This would be preferable to the confusing estimator/model naming.
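
A rough sketch of what such a wrapper could look like (hypothetical: it assumes FLAML's AutoML underneath; the fit signature and the best_model_ attribute are the proposal, not existing palma code):

from flaml import AutoML

class FlamlOptimizer:
    def __init__(self, **parameters):
        self._engine = AutoML(**parameters)
        self.best_model_ = None

    def fit(self, project):
        # Run the FLAML search on the project's data and keep the winner
        self._engine.fit(X_train=project.X, y_train=project.y,
                         task=project.problem)
        self.best_model_ = self._engine.model.model
        return self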

Related issue: #40

mancellin (Member) commented

I tried to strip out most of the abstraction and got the code below.
As long as we have a single optimizer, I'm not sure it makes sense to factor out more into the AbstractOptimizer class.

I will try to include this in the code and run the test suite.

import logging
from contextlib import contextmanager
from datetime import datetime

import pandas as pd
from flaml import AutoML

from palma import logger
from palma.utils.utils import get_hash

@contextmanager
def disable_logging():
    # Temporarily silence all logging (FLAML is verbose during the search)
    logging.disable(logging.CRITICAL)
    try:
        yield
    finally:
        # logging.basicConfig does not undo logging.disable; reset the override instead
        logging.disable(logging.NOTSET)


class AbstractOptimizer:
    """Stores the run metadata (date, run id, parameters) shared by all optimizers."""

    def __init__(self, **parameters):
        self.date = datetime.now()
        self.run_id = get_hash(date=self.date)
        self.parameters = parameters


class FlamlOptimizer(AbstractOptimizer):
    def __init__(self, **parameters):
        super().__init__(**parameters)
        # The actual search engine is FLAML's AutoML
        self.optimizer = AutoML(**parameters)

    def find_best_model(self, project):
        # Restrict the data to the training split defined by the project
        X = project.X.iloc[project.validation_strategy.train_index]
        y = project.y.iloc[project.validation_strategy.train_index]

        # Forward the project's splitting strategy (and groups, if any) to FLAML
        splitter = project.validation_strategy
        split_type = None if splitter is None else splitter.splitter
        groups = None if splitter is None else splitter.groups
        groups = groups if groups is None else groups[splitter.train_index]

        with disable_logging():
            self.optimizer.fit(
                X_train=pd.DataFrame(X.values, index=range(len(X))),
                y_train=pd.Series(y.values, index=range(len(X))),
                split_type=split_type,
                groups=groups,
                mlflow_logging=False,
                task=project.problem,
            )

        # optimizer.model is FLAML's estimator wrapper; .model is the underlying model
        best_model = self.optimizer.model.model
        logger.logger.log_artifact(best_model, self.run_id)
        return best_model

optim = FlamlOptimizer(time_budget=2)
model = optim.find_best_model(project)
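
If the sklearn-style fit / best_model_ naming proposed above is wanted on top of this, a thin addition to the FlamlOptimizer above should be enough (hypothetical sketch):

    def fit(self, project):
        # Delegate to find_best_model but keep the result on the instance
        self.best_model_ = self.find_best_model(project)
        return self

Usage would then match the proposal:

engine = FlamlOptimizer(time_budget=2)
engine.fit(project)
model = engine.best_model_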
