JAHSBench #19
@@ -1,3 +1,109 @@

# JAHS Benchmark Suite

This module contains a DeepHyper wrapper for
[JAHS-Bench-201](https://github.com/automl/jahs_bench_201).

JAHSBench implements a random forest surrogate model, trained on real-world
performance data for neural networks trained on three standard benchmark
problems:
- ``cifar10`` (default),
- ``colorectal_histology``, and
- ``fashion_mnist``.

> **Review comment:** JAHS is using XGBoost I think: https://openreview.net/pdf?id=_HLcjaVlqJ

Using these models as surrogates for the true performance, we can use this
benchmark problem to study the performance of AutoML techniques on joint
architecture-hyperparameter search tasks at minimal expense.

The models allow us to tune two continuous training hyperparameters
- ``LearningRate`` and
- ``WeightDecay``,

two categorical training hyperparameters
- ``Activation`` and
- ``TrivialAugment``,

and six categorical architecture parameters
- ``Op{i}`` for ``i = 1, ..., 6``.

For DeepHyper's implementation, we have added an additional integer-valued
parameter, the number of training epochs
- ``nepochs``.
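For illustration, a complete input to the benchmark can be represented as a plain dictionary keyed by the parameter names above (with ``Op1``..``Op6`` following the six architecture operations defined in the ``hpo.py`` diff below). The specific values here, and the name ``sample_config``, are hypothetical, not recommended settings:

```python
# A hypothetical configuration for the search space described above.
# Keys follow the parameter names listed in this README; the values
# themselves are illustrative, not recommended settings.
sample_config = {
    "LearningRate": 0.1,     # continuous, in (1e-3, 1.0)
    "WeightDecay": 1e-4,     # continuous, in (1e-5, 1e-3)
    "Activation": "ReLU",    # categorical: ReLU, Hardswish, or Mish
    "TrivialAugment": "on",  # categorical: on or off
    "nepochs": 100,          # integer number of training epochs
}
for i in range(1, 7):
    # one categorical choice per architecture cell operation
    sample_config[f"Op{i}"] = 0

print(len(sample_config))  # 11 parameters in total
```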

When run with the option ``sleep=True``, ``JAHSBench`` will wait for an
amount of time proportional to the ``runtime`` field returned by
JAHS-Bench-201's surrogates. By default, this is 1% of the true runtime.
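This throttling behavior can be sketched in isolation. The following is a minimal stand-in, not the benchmark's code; ``fake_surrogate`` and ``timed_run`` are hypothetical names, and only the ``runtime``-scaled sleep mirrors the described behavior:

```python
import time

def fake_surrogate(config):
    # Hypothetical stand-in for the JAHS-Bench-201 surrogate: it returns
    # a predicted validation accuracy and a predicted runtime in seconds.
    return {"valid-acc": 90.0, "runtime": 50.0}

def timed_run(config, sleep=False, sleep_scale=0.01):
    # Mirror the benchmark's throttling: optionally sleep for a fraction
    # (by default 1%) of the surrogate-predicted runtime.
    result = fake_surrogate(config)
    if sleep:
        time.sleep(result["runtime"] * sleep_scale)
    return result

start = time.perf_counter()
timed_run({}, sleep=True)  # sleeps for about 0.5 seconds (1% of 50 s)
elapsed = time.perf_counter() - start
```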

The benchmark can be run to tune a single objective (``valid-acc``) or
three objectives (``valid-acc``, ``latency``, and ``size_MB``).
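The two modes return differently shaped objectives. A hedged sketch of the conversion, following the sign convention in the ``run()`` function later in this PR (``to_objective`` and the sample values are illustrative; DeepHyper maximizes every objective, so minimized quantities are negated):

```python
def to_objective(result, multiobj=True):
    # DeepHyper maximizes every objective, so quantities that should be
    # minimized (latency, model size) are negated in the multiobjective case.
    if multiobj:
        return [result["valid-acc"], -result["latency"], -result["size_MB"]]
    return result["valid-acc"]

# Illustrative surrogate output (values made up):
surrogate_result = {"valid-acc": 92.3, "latency": 1.7, "size_MB": 4.2}
print(to_objective(surrogate_result))                  # [92.3, -1.7, -4.2]
print(to_objective(surrogate_result, multiobj=False))  # 92.3
```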

For further information, see:

```
@inproceedings{NEURIPS2022_fd78f2f6,
  author = {Bansal, Archit and Stoll, Danny and Janowski, Maciej and Zela, Arber and Hutter, Frank},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
  pages = {38788--38802},
  publisher = {Curran Associates, Inc.},
  title = {JAHS-Bench-201: A Foundation For Research On Joint Architecture And Hyperparameter Search},
  url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/fd78f2f65881c1c7ce47e26b040cf48f-Paper-Datasets_and_Benchmarks.pdf},
  volume = {35},
  year = {2022}
}
```

## Usage

To use the benchmark, follow this example set of instructions:

```python
import deephyper_benchmark as dhb

# Install JAHS-bench-201 and fetch data
dhb.install("JAHSBench")

# Load JAHS-bench-201
dhb.load("JAHSBench")

from deephyper_benchmark.lib.jahsbench import hpo

# Example of running one evaluation of JAHSBench
from deephyper.evaluator import RunningJob

config = hpo.problem.jahs_obj.__sample__()  # get a default config to test
res = hpo.run(RunningJob(parameters=config))
```

Note that JAHS-Bench-201 uses XGBoost, which may not be compatible with older
versions of macOS.
Additionally, the surrogate data has been pickled with an older version
of scikit-learn, and newer versions will fail to correctly load the surrogate
models.

For more information, see the following GitHub issues:
- https://github.com/automl/jahs_bench_201/issues/6
- https://github.com/automl/jahs_bench_201/issues/18

## Evaluating Results

To evaluate the results, the AutoML team recommends using the validation
error for single-objective runs, or the hypervolume metric over both
validation error and evaluation latency for multiobjective runs.
See their
[Evaluation Protocol](https://automl.github.io/jahs_bench_201/evaluation_protocol)
for more details.

For multiobjective runs, we recommend a reference point of
``(val_acc = 0, latency = 10, size_MB = 100)``, as discussed in
[this GitHub issue](https://github.com/automl/jahs_bench_201/issues/19).

To evaluate hypervolume with this reference point, use our metrics:

```python
from deephyper_benchmark.lib.jahsbench import metrics

evaluator = metrics.PerformanceEvaluator()
hv = evaluator.hypervolume(res)
```
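As a quick sanity check on the reference point above, the hypervolume contributed by a single solution is just the volume of the axis-aligned box between it and the reference point. A self-contained sketch (not the benchmark's implementation; ``single_point_hv`` and the sample values are illustrative):

```python
def single_point_hv(point, ref=(0.0, 10.0, 100.0)):
    # Hypervolume dominated by a single (val_acc, latency, size_MB) point
    # relative to the reference point (val_acc=0, latency=10, size_MB=100),
    # where val_acc is maximized and latency and size_MB are minimized.
    val_acc, latency, size_mb = point
    widths = (val_acc - ref[0], ref[1] - latency, ref[2] - size_mb)
    if any(w <= 0 for w in widths):
        return 0.0  # the point does not dominate the reference point
    return widths[0] * widths[1] * widths[2]

print(single_point_hv((90.0, 1.0, 10.0)))   # 90 * 9 * 90 = 72900.0
print(single_point_hv((50.0, 12.0, 10.0)))  # latency worse than reference: 0.0
```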
@@ -0,0 +1,2 @@

```
jahs-bench
xgboost
```
@@ -0,0 +1 @@

```python
__version__ = "0.0.1"
```
@@ -0,0 +1,20 @@

```python
import os

from deephyper_benchmark import *

DIR = os.path.dirname(os.path.abspath(__file__))


class JAHS201Benchmark(Benchmark):

    version = "0.0.1"
    requires = {
        "py-pip-requirements": {
            "type": "pip",
            "name": "-r " + os.path.join(DIR, "REQUIREMENTS.txt"),
        },
        "bash-install": {
            "type": "cmd",
            "cmd": "cd . && bash " + os.path.join(DIR, "./install.sh"),
        },
    }
```

> **Review comment** (on the ``REQUIREMENTS.txt`` line): using the more standard […]
@@ -0,0 +1,56 @@

```python
import os
import time

from deephyper.evaluator import profile, RunningJob
from deephyper.problem import HpProblem

from . import model

# Read in whether to do single- or multi-objective optimization
multiobj = int(os.environ.get("DEEPHYPER_BENCHMARK_MOO", 1))

# Create problem
problem = HpProblem()
jahs_obj = model.jahs_bench()
# 2 continuous hyperparameters
problem.add_hyperparameter((1.0e-3, 1.0), "LearningRate")
problem.add_hyperparameter((1.0e-5, 1.0e-3), "WeightDecay")
# 2 categorical hyperparameters
problem.add_hyperparameter(["ReLU", "Hardswish", "Mish"], "Activation")
problem.add_hyperparameter(["on", "off"], "TrivialAugment")
# 6 categorical architecture design variables
for i in range(1, 7):
    problem.add_hyperparameter([0, 1, 2, 3, 4], f"Op{i}")
# 1 integer hyperparameter: the number of training epochs (1 to 200)
problem.add_hyperparameter((1, 200), "nepochs")


@profile
def run(job: RunningJob, sleep=False, sleep_scale=0.01) -> dict:
    config = job.parameters
    result = jahs_obj(config)

    if sleep:
        t_sleep = result["runtime"] * sleep_scale
        time.sleep(t_sleep)

    dh_data = {}
    dh_data["metadata"] = result
    if multiobj:
        dh_data["objective"] = [
            result["valid-acc"],
            -result["latency"],
            -result["size_MB"],
        ]
    else:
        dh_data["objective"] = result["valid-acc"]
    return dh_data


if __name__ == "__main__":
    print(problem)
    default_config = problem.default_configuration
    print(f"{default_config=}")
    result = run(RunningJob(parameters=default_config))
    print(f"{result=}")
```
@@ -0,0 +1 @@

```shell
python -m jahs_bench.download --target surrogates
```
@@ -0,0 +1,100 @@

```python
import os

import numpy as np

from deephyper.skopt.moo import pareto_front, hypervolume


class PerformanceEvaluator:
    """A class defining performance evaluators for JAHS-Bench-201 problems.

    Contains the following public methods:

    * `__init__()` constructs a new instance by reading the problem
      definition from environment variables,
    * `hypervolume(pts)` calculates the total hypervolume dominated by
      the current solution, using the Nadir point as the reference point
      and filtering out solutions that do not dominate the Nadir point,
    * `nadirPt()` calculates the Nadir point for the current problem, and
    * `numPts(pts)` calculates the number of solution points that dominate
      the Nadir point.

    """

    def __init__(self, p_name="fashion_mnist"):
        """Read the current problem definition from environment variables."""
        self.p_name = p_name
        multiobj = int(os.environ.get("DEEPHYPER_BENCHMARK_MOO", 1))
        if multiobj:
            self.nobjs = 3
        else:
            self.nobjs = 1

    def hypervolume(self, pts):
        """Calculate the hypervolume dominated by soln, wrt the Nadir point.

        Args:
            pts (numpy.ndarray): A 2d array of objective values.
                Each row is an objective value in the solution set.

        Returns:
            float: The total hypervolume dominated by the current solution,
            filtering out points worse than the Nadir point and using the
            Nadir point as the reference.

        """
        if self.nobjs < 2:
            raise ValueError("Cannot calculate hypervolume for 1 objective")
        # Flip the sign convention so that all objectives are minimized
        if pts.size > 0 and pts[0, 0] > 0:
            filtered_pts = -pts.copy()
        else:
            filtered_pts = pts.copy()
        # Clip points that do not dominate the Nadir point
        nadir = self.nadirPt()
        for i in range(pts.shape[0]):
            if np.any(filtered_pts[i, :] > nadir):
                filtered_pts[i, :] = nadir
        return hypervolume(filtered_pts, nadir)

    def nadirPt(self):
        """Calculate the Nadir point for the given problem definition."""
        if self.p_name in ["cifar10", "colorectal_histology", "fashion_mnist"]:
            nadir = np.ones(self.nobjs)
            nadir[0] = 0.0
            if self.nobjs > 1:
                nadir[1] = 10.0
                nadir[2] = 100.0
            return nadir
        else:
            raise ValueError(f"{self.p_name} is not a valid problem")

    def numPts(self, pts):
        """Calculate the number of solutions that dominate the Nadir point.

        Args:
            pts (numpy.ndarray): A 2d array of objective values.
                Each row is an objective value in the solution set.

        Returns:
            int: The number of fi in pts such that all(fi < self.nadirPt).

        """
        # Restore the minimization sign convention before filtering
        if np.any(pts < 0):
            pareto_pts = pareto_front(-pts)
        else:
            pareto_pts = pareto_front(pts)
        return sum([all(fi <= self.nadirPt()) for fi in pareto_pts])


if __name__ == "__main__":
    # Driver code to test performance metrics
    result = np.array([
        [80, -8, -10],
        [90, -9, -90],
        [10, -9.1, -99],
        [99.0, -1.0, -200.0],
    ])

    evaluator = PerformanceEvaluator()

    assert abs(evaluator.hypervolume(result) - 14500) < 1.0e-8
    assert evaluator.numPts(result) == 2
    assert np.all(np.abs(evaluator.nadirPt() - np.array([0, 10, 100]))
                  < 1.0e-8)
```
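The value ``14500`` asserted in the driver above can be reproduced independently by inclusion-exclusion over the boxes dominated by each point, under the same sign-flip and clipping conventions as the evaluator (a self-contained check; the fourth test point is clipped to the Nadir point and contributes no volume):

```python
from itertools import combinations

# Nadir/reference point in the minimization convention used by the evaluator.
nadir = (0.0, 10.0, 100.0)

# The driver's test points after sign flipping; the fourth point fails the
# size_MB bound, is clipped to the Nadir point, and contributes no volume.
pts = [(-80.0, 8.0, 10.0), (-90.0, 9.0, 90.0), (-10.0, 9.1, 99.0)]

def box_volume(corners):
    # Volume of the intersection of the boxes [p, nadir] for p in corners.
    lower = [max(c[d] for c in corners) for d in range(3)]
    widths = [max(nadir[d] - lower[d], 0.0) for d in range(3)]
    return widths[0] * widths[1] * widths[2]

# Inclusion-exclusion over all non-empty subsets of the points.
hv = 0.0
for k in range(1, len(pts) + 1):
    for subset in combinations(pts, k):
        hv += (-1) ** (k + 1) * box_volume(subset)

print(round(hv, 6))  # approximately 14500.0
```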
> **Review comment:** For each benchmark documentation we need to include at least the following sections