
Unnecessary level of abstraction #45

Open
vincent-laurent opened this issue Apr 29, 2024 · 1 comment
vincent-laurent (Contributor) commented Apr 29, 2024

The ModelSelector API seems redundant with the Optimizer class:

Something like this would be simpler and more suitable:

engine = FlamlOptimizer(**parameters)
engine.fit(project)

For naming consistency, we should also have:

model = engine.best_model_
evaluator = ModelEvaluator(model=model) 

This would be preferable to the confusing estimator/model naming.
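
A rough sketch of what such a wrapper could look like (hypothetical: it assumes FLAML's AutoML underneath; the fit signature and the best_model_ attribute are the proposal, not existing palma code):

from flaml import AutoML

class FlamlOptimizer:
    def __init__(self, **parameters):
        self._engine = AutoML(**parameters)
        self.best_model_ = None

    def fit(self, project):
        # Run the FLAML search on the project's data and keep the winner
        self._engine.fit(X_train=project.X, y_train=project.y,
                         task=project.problem)
        self.best_model_ = self._engine.model.model
        return self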

Related issue: #40

mancellin (Member) commented

I tried to strip out most of the abstraction and got the code below.
As long as we have a single optimizer, I'm not sure it makes sense to factor out more into the AbstractOptimizer class.

I will try to include this in the code and run the test suite.

import logging
from contextlib import contextmanager
from datetime import datetime

import pandas as pd
from flaml import AutoML

from palma import logger
from palma.utils.utils import get_hash

@contextmanager
def disable_logging():
    # Temporarily silence all logging (FLAML is verbose during the search)
    logging.disable(logging.CRITICAL)
    try:
        yield
    finally:
        # logging.basicConfig does not undo logging.disable; reset the override instead
        logging.disable(logging.NOTSET)


class AbstractOptimizer:
    """Stores the run metadata (date, run id, parameters) shared by all optimizers."""

    def __init__(self, **parameters):
        self.date = datetime.now()
        self.run_id = get_hash(date=self.date)
        self.parameters = parameters


class FlamlOptimizer(AbstractOptimizer):
    def __init__(self, **parameters):
        super().__init__(**parameters)
        # The actual search engine is FLAML's AutoML
        self.optimizer = AutoML(**parameters)

    def find_best_model(self, project):
        # Restrict the data to the training split defined by the project
        X = project.X.iloc[project.validation_strategy.train_index]
        y = project.y.iloc[project.validation_strategy.train_index]

        # Forward the project's splitting strategy (and groups, if any) to FLAML
        splitter = project.validation_strategy
        split_type = None if splitter is None else splitter.splitter
        groups = None if splitter is None else splitter.groups
        groups = groups if groups is None else groups[splitter.train_index]

        with disable_logging():
            self.optimizer.fit(
                X_train=pd.DataFrame(X.values, index=range(len(X))),
                y_train=pd.Series(y.values, index=range(len(X))),
                split_type=split_type,
                groups=groups,
                mlflow_logging=False,
                task=project.problem,
            )

        # optimizer.model is FLAML's estimator wrapper; .model is the underlying model
        best_model = self.optimizer.model.model
        logger.logger.log_artifact(best_model, self.run_id)
        return best_model

optim = FlamlOptimizer(time_budget=2)
model = optim.find_best_model(project)
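
If the sklearn-style fit / best_model_ naming proposed above is wanted on top of this, a thin addition to the FlamlOptimizer above should be enough (hypothetical sketch):

    def fit(self, project):
        # Delegate to find_best_model but keep the result on the instance
        self.best_model_ = self.find_best_model(project)
        return self

Usage would then match the proposal:

engine = FlamlOptimizer(time_budget=2)
engine.fit(project)
model = engine.best_model_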
