Make HEBO sampler support Define-by-Run manner, maximization, parallelization, and constant_liar #196

Merged · 28 commits · Dec 10, 2024

Changes from 6 commits

Commits (28)
- 0e8592b: Update sampler.py (eukaryo, Dec 4, 2024)
- 1e4f749: fix lint (eukaryo, Dec 4, 2024)
- 320872e: fix mypy (eukaryo, Dec 4, 2024)
- 120f968: Update README.md (eukaryo, Dec 5, 2024)
- 7855dce: fix lint (eukaryo, Dec 5, 2024)
- e5ff63a: Update README.md (eukaryo, Dec 5, 2024)
- ae5e5a6: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- f339b9b: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- 0574576: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- cd4f8f9: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- 02f146b: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- 8d2cf0b: Update package/samplers/hebo/README.md (eukaryo, Dec 8, 2024)
- 2a2069b: Update package/samplers/hebo/README.md (eukaryo, Dec 8, 2024)
- 0aa860f: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- 95800b5: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- 810d701: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- 39f2671: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- d9c00b2: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- a82754e: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- 4e1a019: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- ee03025: Update package/samplers/hebo/sampler.py (eukaryo, Dec 8, 2024)
- 41ce715: Update sampler.py (eukaryo, Dec 8, 2024)
- 9803705: fix ruff (eukaryo, Dec 8, 2024)
- a46a02e: Update README.md (eukaryo, Dec 8, 2024)
- 1de364d: Update package/samplers/hebo/sampler.py (eukaryo, Dec 10, 2024)
- 2d52a31: Update package/samplers/hebo/sampler.py (eukaryo, Dec 10, 2024)
- 34ec5f3: Update package/samplers/hebo/sampler.py (eukaryo, Dec 10, 2024)
- bb3b9f8: fix lint (eukaryo, Dec 10, 2024)
47 changes: 42 additions & 5 deletions package/samplers/hebo/README.md
@@ -20,16 +20,53 @@ cd HEBO/HEBO
pip install -e .
```

## APIs

- `HEBOSampler(*, search_space: dict[str, BaseDistribution] | None = None, seed: int | None = None, constant_liar: bool = False, independent_sampler: BaseSampler | None = None)`
- `search_space`: A search space required for the Define-and-Run manner.

Example:

```python
search_space = {
    "x": optuna.distributions.FloatDistribution(-5, 5),
    "y": optuna.distributions.FloatDistribution(-5, 5),
}
HEBOSampler(search_space=search_space)
```

- `seed`: Seed for random number generator.

- `constant_liar`: If `True`, penalize running trials to avoid suggesting parameter configurations nearby. Default is `False`.

- Note: Abnormally terminated trials often leave behind a record with a state of `RUNNING` in the storage. Such "zombie" trial parameters will be avoided by the constant liar algorithm during subsequent sampling. When using an `optuna.storages.RDBStorage`, it is possible to enable the `heartbeat_interval` to change the records for abnormally terminated trials to `FAIL`. (This note is quoted from [TPESampler](https://github.com/optuna/optuna/blob/v4.1.0/optuna/samplers/_tpe/sampler.py#L215-L222).)
- Note: It is recommended to set this value to `True` during distributed optimization to avoid having multiple workers evaluating similar parameter configurations. In particular, if each objective function evaluation is costly and the durations of the running states are significant, and/or the number of workers is high. (This note is quoted from [TPESampler](https://github.com/optuna/optuna/blob/v4.1.0/optuna/samplers/_tpe/sampler.py#L224-L229).)
- Note: The HEBO algorithm involves multi-objective optimization of multiple acquisition functions. While `constant_liar` is a simple way to get diverse params for parallel optimization, it may not be the best approach for HEBO. A usage sketch for parallel optimization follows this list.

- `independent_sampler`: An `optuna.samplers.BaseSampler` instance that is used for independent sampling. The parameters not contained in the relative search space are sampled by this sampler. If `None` is specified, `optuna.samplers.RandomSampler` is used as the default.
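
For instance, a minimal sketch of combining `constant_liar` with a shared storage for distributed optimization. The SQLite URL, study name, heartbeat settings, and trial count below are illustrative assumptions, not part of this package:

```python
import optuna
import optunahub


def objective(trial: optuna.trial.Trial) -> float:
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_int("y", -10, 10)
    return x**2 + y**2


module = optunahub.load_module("samplers/hebo")
# constant_liar=True penalizes RUNNING trials so that concurrent workers
# are steered away from (almost) identical parameter configurations.
sampler = module.HEBOSampler(constant_liar=True, seed=42)

# Illustrative storage; heartbeat_interval lets abnormally terminated
# ("zombie") RUNNING trials be marked as FAIL, as described in the note above.
storage = optuna.storages.RDBStorage(
    url="sqlite:///hebo_example.db",
    heartbeat_interval=60,
    grace_period=120,
)

study = optuna.create_study(
    study_name="hebo-parallel",
    storage=storage,
    sampler=sampler,
    load_if_exists=True,
)
# In practice, each worker process would run this same script against the shared storage.
study.optimize(objective, n_trials=50)
```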

## Example

```python
-search_space = {
-    "x": FloatDistribution(-10, 10),
-    "y": IntDistribution(0, 10),
-}
-sampler = HEBOSampler(search_space)
+import optuna
+import optunahub
+
+
+def objective(trial: optuna.trial.Trial) -> float:
+    x = trial.suggest_float("x", -10, 10)
+    y = trial.suggest_int("y", -10, 10)
+    return x**2 + y**2
+
+
+module = optunahub.load_module("samplers/hebo")
+sampler = module.HEBOSampler(search_space={
+    "x": optuna.distributions.FloatDistribution(-10, 10),
+    "y": optuna.distributions.IntDistribution(-10, 10),
+})
 study = optuna.create_study(sampler=sampler)
 study.optimize(objective, n_trials=100)
+
+print(study.best_trial.params, study.best_trial.value)
```

See [`example.py`](https://github.com/optuna/optunahub-registry/blob/main/package/samplers/hebo/example.py) for a full example.
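
Because this pull request also adds Define-by-Run support, the sampler can be used without passing `search_space`; a minimal sketch, where the objective, bounds, and seed are illustrative:

```python
import optuna
import optunahub


def objective(trial: optuna.trial.Trial) -> float:
    # The search space is defined on the fly inside the objective (Define-by-Run).
    x = trial.suggest_float("x", -10, 10)
    y = trial.suggest_int("y", -10, 10)
    return x**2 + y**2


module = optunahub.load_module("samplers/hebo")
# Without search_space, the relative search space is inferred from past trials,
# so the very first trial is handled by the independent sampler instead of HEBO.
sampler = module.HEBOSampler(seed=42)

study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=30)
print(study.best_trial.params, study.best_trial.value)
```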
160 changes: 155 additions & 5 deletions package/samplers/hebo/sampler.py
@@ -1,13 +1,19 @@
from __future__ import annotations

from collections.abc import Sequence
from typing import Any
import warnings

import numpy as np
import optuna
from optuna.distributions import BaseDistribution
from optuna.distributions import CategoricalDistribution
from optuna.distributions import FloatDistribution
from optuna.distributions import IntDistribution
from optuna.samplers import BaseSampler
from optuna.search_space import IntersectionSearchSpace
from optuna.study import Study
from optuna.study._study_direction import StudyDirection
from optuna.trial import FrozenTrial
from optuna.trial import TrialState
import optunahub
@@ -18,11 +24,70 @@


class HEBOSampler(optunahub.samplers.SimpleBaseSampler):
-    def __init__(self, search_space: dict[str, BaseDistribution]) -> None:
-        super().__init__(search_space)
-        self._hebo = HEBO(self._convert_to_hebo_design_space(search_space))
-    def sample_relative(
+    """A sampler using `HEBO <https://github.com/huawei-noah/HEBO/tree/master/HEBO>`__ as the backend.
+
+    For further information about the HEBO algorithm, please refer to the following paper:
+
+    - `Cowen-Rivers, Alexander I., et al. An Empirical Study of Assumptions in Bayesian Optimisation. arXiv preprint arXiv:2012.03826 (2021). <https://arxiv.org/abs/2012.03826>`__

Args:
search_space:
A search space required for the Define-and-Run manner. Default is :obj:`None`.

seed:
A seed for ``HEBOSampler``. Default is :obj:`None`.

constant_liar:
If :obj:`True`, penalize running trials to avoid suggesting parameter configurations
nearby. Default is :obj:`False`.

.. note::
Abnormally terminated trials often leave behind a record with a state of
``RUNNING`` in the storage.
Such "zombie" trial parameters will be avoided by the constant liar algorithm
during subsequent sampling.
When using an :class:`~optuna.storages.RDBStorage`, it is possible to enable the
``heartbeat_interval`` to change the records for abnormally terminated trials to
``FAIL``.
(This note is quoted from `TPESampler <https://github.com/optuna/optuna/blob/v4.1.0/optuna/samplers/_tpe/sampler.py#L215-L222>`__.)

.. note::
It is recommended to set this value to :obj:`True` during distributed
optimization to avoid having multiple workers evaluating similar parameter
configurations. In particular, if each objective function evaluation is costly
and the durations of the running states are significant, and/or the number of
workers is high.
(This note is quoted from `TPESampler <https://github.com/optuna/optuna/blob/v4.1.0/optuna/samplers/_tpe/sampler.py#L224-L229>`__.)

.. note::
The HEBO algorithm involves multi-objective optimization of multiple acquisition functions.
While ``constant_liar`` is a simple way to get diverse params for parallel optimization,
it may not be the best approach for HEBO.

independent_sampler:
A :class:`~optuna.samplers.BaseSampler` instance that is used for independent
sampling. The parameters not contained in the relative search space are sampled
by this sampler. If :obj:`None` is specified, :class:`~optuna.samplers.RandomSampler`
is used as the default.
""" # NOQA

def __init__(
self,
search_space: dict[str, BaseDistribution] | None = None,
seed: int | None = None,
constant_liar: bool = False,
independent_sampler: BaseSampler | None = None,
) -> None:
super().__init__(search_space, seed)
if search_space is not None and constant_liar is False:
self._hebo = HEBO(self._convert_to_hebo_design_space(search_space), scramble_seed=seed)
else:
self._hebo = None
self._intersection_search_space = IntersectionSearchSpace()
self._independent_sampler = independent_sampler or optuna.samplers.RandomSampler(seed=seed)
self._is_independent_sampler_specified = independent_sampler is not None
self._constant_liar = constant_liar

def _sample_relative_define_and_run(
self, study: Study, trial: FrozenTrial, search_space: dict[str, BaseDistribution]
) -> dict[str, float]:
params_pd = self._hebo.suggest()
@@ -32,14 +97,76 @@ def sample_relative(
params[name] = params_pd[name].to_numpy()[0]
return params

def _sample_relative_stateless(
self, study: Study, trial: FrozenTrial, search_space: dict[str, BaseDistribution]
) -> dict[str, float]:
if self._constant_liar:
target_states = [TrialState.COMPLETE, TrialState.RUNNING]
else:
target_states = [TrialState.COMPLETE]

trials = study.get_trials(deepcopy=False, states=target_states)
if len([t for t in trials if t.state == TrialState.COMPLETE]) < 1:
# note: The backend HEBO implementation uses Sobol sampling here.
# This sampler does not call `hebo.suggest()` here because
# Optuna needs to know search space by running the first trial in Define-by-Run.
return {}

# Assume that the back-end HEBO implementation aims to minimize.
if study.direction == StudyDirection.MINIMIZE:
worst_values = max([t.values for t in trials if t.state == TrialState.COMPLETE])
else:
worst_values = min([t.values for t in trials if t.state == TrialState.COMPLETE])
sign = 1 if study.direction == StudyDirection.MINIMIZE else -1

hebo = HEBO(self._convert_to_hebo_design_space(search_space), scramble_seed=self._seed)
for t in trials:
if t.state == TrialState.COMPLETE:
hebo_params = {name: t.params[name] for name in search_space.keys()}
hebo.observe(
pd.DataFrame([hebo_params]),
np.asarray([x * sign for x in t.values]),
)
elif t.state == TrialState.RUNNING:
try:
hebo_params = {name: t.params[name] for name in search_space.keys()}
except: # NOQA
# There are one or more params which are not suggested yet.
continue
# If `constant_liar == True`, assume that the RUNNING params result in bad values,
# thus preventing the simultaneous suggestion of (almost) the same params
# during parallel execution.
hebo.observe(pd.DataFrame([hebo_params]), np.asarray([worst_values]))
else:
assert False
params_pd = hebo.suggest()
params = {}
for name in search_space.keys():
params[name] = params_pd[name].to_numpy()[0]
return params

def sample_relative(
self, study: Study, trial: FrozenTrial, search_space: dict[str, BaseDistribution]
) -> dict[str, float]:
if study._is_multi_objective():
raise ValueError("This function does not support multi-objective optimization study.")
if self._hebo is None or self._constant_liar is True:
return self._sample_relative_stateless(study, trial, search_space)
else:
return self._sample_relative_define_and_run(study, trial, search_space)

def after_trial(
self,
study: Study,
trial: FrozenTrial,
state: TrialState,
values: Sequence[float] | None,
) -> None:
-        self._hebo.observe(pd.DataFrame([trial.params]), np.asarray([values]))
+        if self._hebo is not None and values is not None:
+            # Assume that the back-end HEBO implementation aims to minimize.
+            if study.direction == StudyDirection.MAXIMIZE:
+                values = [-x for x in values]
+            self._hebo.observe(pd.DataFrame([trial.params]), np.asarray([values]))

def _convert_to_hebo_design_space(
self, search_space: dict[str, BaseDistribution]
@@ -103,3 +230,26 @@ def _convert_to_hebo_design_space(
else:
raise NotImplementedError(f"Unsupported distribution: {distribution}")
return DesignSpace().parse(design_space)
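
For reference, a rough sketch of the kind of mapping this conversion performs. The HEBO design-space field names and type strings (`"num"`, `"int"`, `"cat"`, `"pow"`, `"lb"`, `"ub"`, `"categories"`) are assumptions based on HEBO's documentation and may differ in detail; the parameter names are made up:

```python
from optuna.distributions import CategoricalDistribution
from optuna.distributions import FloatDistribution
from optuna.distributions import IntDistribution

# Hypothetical Optuna search space.
search_space = {
    "lr": FloatDistribution(1e-5, 1e-1, log=True),
    "n_layers": IntDistribution(1, 5),
    "activation": CategoricalDistribution(["relu", "tanh"]),
}

# Roughly corresponding HEBO design-space specification, of the shape consumed
# by DesignSpace().parse(...); a log-scaled float may map to the "pow" type.
design_space = [
    {"name": "lr", "type": "pow", "lb": 1e-5, "ub": 1e-1},
    {"name": "n_layers", "type": "int", "lb": 1, "ub": 5},
    {"name": "activation", "type": "cat", "categories": ["relu", "tanh"]},
]
```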

def infer_relative_search_space(
self, study: Study, trial: FrozenTrial
) -> dict[str, BaseDistribution]:
return optuna.search_space.intersection_search_space(
study._get_trials(deepcopy=False, use_cache=True)
)

def sample_independent(
self,
study: Study,
trial: FrozenTrial,
param_name: str,
param_distribution: BaseDistribution,
) -> Any:
if not self._is_independent_sampler_specified:
warnings.warn(
"`HEBOSampler` falls back to `RandomSampler` due to dynamic search space. Is this intended?"
)

return self._independent_sampler.sample_independent(
study, trial, param_name, param_distribution
)
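
As a usage note, a minimal sketch of the dynamic-search-space situation that triggers this warning, with an explicit `independent_sampler` supplied so that the warning is not emitted. The conditional objective and the choice of `TPESampler` are illustrative assumptions:

```python
import optuna
import optunahub


def objective(trial: optuna.trial.Trial) -> float:
    # Conditional ("dynamic") search space: "y" only exists in some trials, so the
    # intersection search space shrinks and those parameters are sampled by the
    # independent sampler rather than by HEBO.
    x = trial.suggest_float("x", -10, 10)
    if trial.suggest_categorical("use_y", [True, False]):
        y = trial.suggest_float("y", -10, 10)
        return x**2 + y**2
    return x**2


module = optunahub.load_module("samplers/hebo")
# Specifying independent_sampler explicitly means the fallback is intentional,
# so HEBOSampler does not warn about it.
sampler = module.HEBOSampler(independent_sampler=optuna.samplers.TPESampler(seed=0))

study = optuna.create_study(sampler=sampler)
study.optimize(objective, n_trials=20)
```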