Make HEBO sampler support the Define-by-Run manner, maximization, parallelization, and constant_liar #196

Merged: 28 commits, Dec 10, 2024

Commits:
- `0e8592b` Update sampler.py (eukaryo, Dec 4, 2024)
- `1e4f749` fix lint (eukaryo, Dec 4, 2024)
- `320872e` fix mypy (eukaryo, Dec 4, 2024)
- `120f968` Update README.md (eukaryo, Dec 5, 2024)
- `7855dce` fix lint (eukaryo, Dec 5, 2024)
- `e5ff63a` Update README.md (eukaryo, Dec 5, 2024)
- `ae5e5a6` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `f339b9b` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `0574576` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `cd4f8f9` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `02f146b` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `8d2cf0b` Update package/samplers/hebo/README.md (eukaryo, Dec 8, 2024)
- `2a2069b` Update package/samplers/hebo/README.md (eukaryo, Dec 8, 2024)
- `0aa860f` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `95800b5` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `810d701` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `39f2671` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `d9c00b2` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `a82754e` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `4e1a019` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `ee03025` Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- `41ce715` Update sampler.py (eukaryo, Dec 8, 2024)
- `9803705` fix ruff (eukaryo, Dec 8, 2024)
- `a46a02e` Update README.md (eukaryo, Dec 8, 2024)
- `1de364d` Update package/samplers/hebo/sampler.py (eukaryo, Dec 10, 2024)
- `2d52a31` Update package/samplers/hebo/sampler.py (eukaryo, Dec 10, 2024)
- `34ec5f3` Update package/samplers/hebo/sampler.py (eukaryo, Dec 10, 2024)
- `bb3b9f8` fix lint (eukaryo, Dec 10, 2024)
48 changes: 43 additions & 5 deletions package/samplers/hebo/README.md
@@ -20,16 +20,54 @@ cd HEBO/HEBO
pip install -e .
```

## APIs

- `HEBOSampler(search_space: dict[str, BaseDistribution] | None = None, *, seed: int | None = None, constant_liar: bool = False, independent_sampler: BaseSampler | None = None)`
- `search_space`: Specifying `search_space` makes sampling at each iteration slightly faster, but this argument is not required to run this sampler.

Example:

```python
search_space = {
"x": optuna.distributions.FloatDistribution(-5, 5),
"y": optuna.distributions.FloatDistribution(-5, 5),
}
HEBOSampler(search_space=search_space)
```

- `seed`: Seed for random number generator.

- `constant_liar`: If `True`, penalize running trials to avoid suggesting parameter configurations nearby. Default is `False`.

- Note: Abnormally terminated trials often leave behind a record with a state of `RUNNING` in the storage. Such "zombie" trial parameters will be avoided by the constant liar algorithm during subsequent sampling. When using an `optuna.storages.RDBStorage`, it is possible to enable the `heartbeat_interval` to change the records for abnormally terminated trials to `FAIL`. (This note is quoted from [TPESampler](https://github.com/optuna/optuna/blob/v4.1.0/optuna/samplers/_tpe/sampler.py#L215-L222).)
- Note: It is recommended to set this value to `True` during distributed optimization to avoid having multiple workers evaluating similar parameter configurations. In particular, if each objective function evaluation is costly and the durations of the running states are significant, and/or the number of workers is high. (This note is quoted from [TPESampler](https://github.com/optuna/optuna/blob/v4.1.0/optuna/samplers/_tpe/sampler.py#L224-L229).)
- Note: The HEBO algorithm involves multi-objective optimization of multiple acquisition functions. While `constant_liar` is a simple way to obtain diverse parameters for parallel optimization, it may not be the best approach for HEBO.

- `independent_sampler`: An `optuna.samplers.BaseSampler` instance used for independent sampling. Parameters not contained in the relative search space are sampled by this sampler. If `None` is specified, `optuna.samplers.RandomSampler` is used as the default.
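The constant-liar idea described above can be sketched without HEBO itself. The following is a hypothetical illustration of the mechanism, not this sampler's actual code: `RUNNING` trials have no objective value yet, so they are imputed with the worst completed value before the surrogate model observes them.

```python
import numpy as np

# Constant-liar sketch (hypothetical, minimization): two trials are complete,
# two are still RUNNING and therefore have no objective value yet (NaN).
values = np.array([0.5, 2.0, np.nan, np.nan])

# "Lie" that every running trial is as bad as the worst completed one, so the
# surrogate model avoids suggesting configurations near the running trials.
worst = np.nanmax(values)
imputed = np.where(np.isnan(values), worst, values)
print(imputed.tolist())  # [0.5, 2.0, 2.0, 2.0]
```

Because the lie is the worst observed value, the model is discouraged from re-sampling near running trials without being told anything it could not eventually learn.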

## Example

```python
import optuna
import optunahub


def objective(trial: optuna.trial.Trial) -> float:
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_int("y", -10, 10)
    return x**2 + y**2


module = optunahub.load_module("samplers/hebo")
sampler = module.HEBOSampler(search_space={
    "x": optuna.distributions.FloatDistribution(-10, 10),
    "y": optuna.distributions.IntDistribution(-10, 10),
})
# sampler = module.HEBOSampler()  # Note: `search_space` is not required, so this works too.
study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=100)

print(study.best_trial.params, study.best_trial.value)
```

See [`example.py`](https://github.com/optuna/optunahub-registry/blob/main/package/samplers/hebo/example.py) for a full example.
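Since the HEBO backend minimizes, this PR supports maximization by negating objective values before they are observed. A minimal standalone sketch of that sign convention (illustrative names, not this sampler's code):

```python
import numpy as np

# Sign-flip sketch: a minimization-only backend can serve a maximization
# study by observing negated values; the backend's argmin then corresponds
# to the study's argmax.
direction = "maximize"
observed = np.array([1.0, 3.0, 2.0])
sign = -1 if direction == "maximize" else 1
backend_view = sign * observed
best_index = int(np.argmin(backend_view))
print(best_index)  # 1, the index of the maximum of `observed`
```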
165 changes: 155 additions & 10 deletions package/samplers/hebo/sampler.py
@@ -1,13 +1,19 @@
from __future__ import annotations

from collections.abc import Sequence
from typing import Any
import warnings

import numpy as np
import optuna
from optuna.distributions import BaseDistribution
from optuna.distributions import CategoricalDistribution
from optuna.distributions import FloatDistribution
from optuna.distributions import IntDistribution
from optuna.samplers import BaseSampler
from optuna.search_space import IntersectionSearchSpace
from optuna.study import Study
from optuna.study._study_direction import StudyDirection
from optuna.trial import FrozenTrial
from optuna.trial import TrialState
import optunahub
@@ -18,19 +24,131 @@


class HEBOSampler(optunahub.samplers.SimpleBaseSampler):
"""A sampler using `HEBO <https://github.com/huawei-noah/HEBO/tree/master/HEBO>`__ as the backend.

For further information about the HEBO algorithm, please refer to the following paper:
- `HEBO: Pushing The Limits of Sample-Efficient Hyperparameter Optimisation <https://arxiv.org/abs/2012.03826>`__

Args:
search_space:
By specifying ``search_space``, sampling at each iteration becomes slightly faster; however, this argument is not required to run this sampler. Default is :obj:`None`.

seed:
A seed for ``HEBOSampler``. Default is :obj:`None`.

constant_liar:
If :obj:`True`, penalize running trials to avoid suggesting parameter configurations
nearby. Default is :obj:`False`.

.. note::
Abnormally terminated trials often leave behind a record with a state of
``RUNNING`` in the storage.
Such "zombie" trial parameters will be avoided by the constant liar algorithm
during subsequent sampling.
When using an :class:`~optuna.storages.RDBStorage`, it is possible to enable the
``heartbeat_interval`` to change the records for abnormally terminated trials to
``FAIL``.
(This note is quoted from `TPESampler <https://github.com/optuna/optuna/blob/v4.1.0/optuna/samplers/_tpe/sampler.py#L215-L222>`__.)

.. note::
It is recommended to set this value to :obj:`True` during distributed
optimization to avoid having multiple workers evaluating similar parameter
configurations. In particular, if each objective function evaluation is costly
and the durations of the running states are significant, and/or the number of
workers is high.
(This note is quoted from `TPESampler <https://github.com/optuna/optuna/blob/v4.1.0/optuna/samplers/_tpe/sampler.py#L224-L229>`__.)

.. note::
HEBO algorithm involves multi-objective optimization of multiple acquisition functions.
While `constant_liar` is a simple way to get diverse params for parallel optimization,
it may not be the best approach for HEBO.

independent_sampler:
A :class:`~optuna.samplers.BaseSampler` instance that is used for independent
sampling. The parameters not contained in the relative search space are sampled
by this sampler. If :obj:`None` is specified, :class:`~optuna.samplers.RandomSampler`
is used as the default.
""" # NOQA

def __init__(
self,
search_space: dict[str, BaseDistribution] | None = None,
*,
seed: int | None = None,
constant_liar: bool = False,
independent_sampler: BaseSampler | None = None,
) -> None:
super().__init__(search_space, seed)
if search_space is not None and not constant_liar:
self._hebo = HEBO(self._convert_to_hebo_design_space(search_space), scramble_seed=seed)
else:
self._hebo = None
self._intersection_search_space = IntersectionSearchSpace()
self._independent_sampler = independent_sampler or optuna.samplers.RandomSampler(seed=seed)
self._is_independent_sample_necessary = False
self._constant_liar = constant_liar
self._rng = np.random.default_rng(seed)

def _sample_relative_define_and_run(
self, study: Study, trial: FrozenTrial, search_space: dict[str, BaseDistribution]
) -> dict[str, float]:
return {
name: row.iloc[0]
for name, row in self._hebo.suggest().items()
if name in search_space.keys()
}

def _sample_relative_stateless(
self, study: Study, trial: FrozenTrial, search_space: dict[str, BaseDistribution]
) -> dict[str, float]:
if self._constant_liar:
target_states = [TrialState.COMPLETE, TrialState.RUNNING]
else:
target_states = [TrialState.COMPLETE]

use_cache = not self._constant_liar
trials = study._get_trials(deepcopy=False, states=target_states, use_cache=use_cache)
is_complete = np.array([t.state == TrialState.COMPLETE for t in trials])
if not np.any(is_complete):
# note: The backend HEBO implementation uses Sobol sampling here.
# This sampler does not call `hebo.suggest()` here because
# Optuna needs to know search space by running the first trial in Define-by-Run.
self._is_independent_sample_necessary = True
return {}
else:
self._is_independent_sample_necessary = False
trials = [t for t in trials if set(search_space.keys()) <= set(t.params.keys())]

# Assume that the back-end HEBO implementation aims to minimize.
values = np.array([t.value if t.state == TrialState.COMPLETE else np.nan for t in trials])
worst_value = (
np.nanmax(values) if study.direction == StudyDirection.MINIMIZE else np.nanmin(values)
)
sign = 1 if study.direction == StudyDirection.MINIMIZE else -1

seed = int(self._rng.integers(low=1, high=(1 << 31)))
hebo = HEBO(self._convert_to_hebo_design_space(search_space), scramble_seed=seed)
params = pd.DataFrame([t.params for t in trials])
values[np.isnan(values)] = worst_value
values *= sign
hebo.observe(params, values)
> **Reviewer comment (Contributor):** I will check if the shape is correct here (seems (1, N) is correct?) Anyways, I will benchmark your code and merge this PR!
>
> **@eukaryo (Contributor Author, Dec 10, 2024):** Thanks! FYI: I confirmed that simply changing line 134 to `hebo.observe(params, values[None, :])` or `hebo.observe(params, values[None, :].T)` fails.
return {
name: row.iloc[0]
for name, row in hebo.suggest().items()
if name in search_space.keys()
}

def sample_relative(
self, study: Study, trial: FrozenTrial, search_space: dict[str, BaseDistribution]
) -> dict[str, float]:
if study._is_multi_objective():
raise ValueError(
f"{self.__class__.__name__} does not support multi-objective optimization."
)
if self._hebo is None or self._constant_liar is True:
return self._sample_relative_stateless(study, trial, search_space)
else:
return self._sample_relative_define_and_run(study, trial, search_space)
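As an aside on the stateless path above: a fresh HEBO instance is rebuilt per call, seeded from the sampler's own `np.random.default_rng`. A minimal sketch of that per-call seeding pattern (illustrative only, not the sampler's code):

```python
import numpy as np

# Per-call seeding sketch: a parent generator seeded once derives a fresh
# backend seed on every call, so individual calls differ while the whole
# sequence stays reproducible for a fixed parent seed.
rng = np.random.default_rng(42)
seeds_a = [int(rng.integers(low=1, high=(1 << 31))) for _ in range(3)]

rng = np.random.default_rng(42)  # re-seeding reproduces the same sequence
seeds_b = [int(rng.integers(low=1, high=(1 << 31))) for _ in range(3)]

print(seeds_a == seeds_b)  # True
```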

def after_trial(
self,
@@ -39,7 +157,11 @@ def after_trial(
state: TrialState,
values: Sequence[float] | None,
) -> None:
if self._hebo is not None and values is not None:
# Assume that the back-end HEBO implementation aims to minimize.
if study.direction == StudyDirection.MAXIMIZE:
values = [-x for x in values]
self._hebo.observe(pd.DataFrame([trial.params]), np.asarray([values]))

def _convert_to_hebo_design_space(
self, search_space: dict[str, BaseDistribution]
@@ -103,3 +225,26 @@ def _convert_to_hebo_design_space(
else:
raise NotImplementedError(f"Unsupported distribution: {distribution}")
return DesignSpace().parse(design_space)

def infer_relative_search_space(
self, study: Study, trial: FrozenTrial
) -> dict[str, BaseDistribution]:
return optuna.search_space.intersection_search_space(
study._get_trials(deepcopy=False, use_cache=True)
)

def sample_independent(
self,
study: Study,
trial: FrozenTrial,
param_name: str,
param_distribution: BaseDistribution,
) -> Any:
if not self._is_independent_sample_necessary:
warnings.warn(
"`HEBOSampler` falls back to `RandomSampler` due to dynamic search space."
)

return self._independent_sampler.sample_independent(
study, trial, param_name, param_distribution
)
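The Define-by-Run flow that `infer_relative_search_space` and `sample_independent` enable can be sketched in isolation. The following is a hypothetical toy (the `suggest` helper and `known_search_space` dict are invented for illustration), showing why the first trial must fall back to independent sampling: the search space only becomes known after at least one trial has run.

```python
import random

# Toy Define-by-Run sketch (hypothetical): the search space is discovered
# while the objective runs, so a model-based sampler cannot act on the very
# first trial and must fall back to independent (random) sampling.
known_search_space: dict[str, tuple[float, float]] = {}

def suggest(name: str, low: float, high: float) -> float:
    if name not in known_search_space:
        known_search_space[name] = (low, high)  # first sighting: record it
        return random.uniform(low, high)  # independent fallback
    lo, hi = known_search_space[name]
    return (lo + hi) / 2  # stand-in for a model-based suggestion

first = suggest("x", -10, 10)   # random: space not yet known
second = suggest("x", -10, 10)  # model-based: space is known now
print(second)  # 0.0
```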