Multi fidelity is not "multi" for some fidelity boundaries #138

AwePhD · 2024-08-22T15:12:27Z

Hi,

PriorBand's multi-fidelity does not seem to be applied correctly, probably because I did not configure the optimizer. I would like to know whether this is normal or whether I misunderstood something about the multi-fidelity setup in NePS.

Here is a Python script that reproduces the behaviour with neps 0.12.2:

from pathlib import Path
from neps import IntegerParameter, FloatParameter, run

MIN_EPOCH, MAX_EPOCH, TOTAL_BUDGET_EPOCHS = ..., ..., ...

pipeline_space = {
    "epoch": IntegerParameter(lower=MIN_EPOCH, upper=MAX_EPOCH, is_fidelity=True),
    "p1": IntegerParameter(lower=5, upper=15, default=10),
    "p2": FloatParameter(lower=.5, upper=5, default=3),
}


def run_pipeline(epoch, p1, p2) -> dict | float:
    loss = (p1 + p2) / epoch

    return {"loss": loss, "cost": epoch}


def main():
    run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory=Path("/tmp/debug_neps_fidelity"),
        max_cost_total=TOTAL_BUDGET_EPOCHS,
    )


if __name__ == '__main__':
    main()

When I run it with different values for the bounds, I get different numbers of fidelity levels (they are called rungs, right?):

# 9 configs, all with 200 epochs
MIN_EPOCH, MAX_EPOCH, TOTAL_BUDGET_EPOCHS = 80, 200, 2_000
# 11 configs: 2 reach 3 trials, 2 reach 2 trials, and the rest have one trial
MIN_EPOCH, MAX_EPOCH, TOTAL_BUDGET_EPOCHS = 1, 10, 40

The first example matters to me because I debug on a tiny subset of my dataset. I need many epochs to overfit the model (300M+ parameters) on a few samples; before 80 epochs the model does not learn at all. Is it a problem that the fidelity starts at 80? Is this the expected (multi-)fidelity strategy? Is it because the number of rungs is computed from the ratio max_fidelity/min_fidelity? (I read that in the PriorBand paper, if I remember correctly.)

Should I set the eta parameter? By the way, I am happy with the second example, which represents an actual training run on the whole dataset.


AwePhD · 2024-08-23T13:39:00Z

One more observation about this multi-fidelity setting. If I use max_evaluations_total instead of max_cost_total, PriorBand only evaluates at the maximum fidelity (200).

from pathlib import Path
from neps import IntegerParameter, FloatParameter, run

WORKDIR_PATH = "/tmp/debug_neps_fidelity"

MIN_EPOCH, MAX_EPOCH = 80, 200
TOTAL_EVALUATIONS = 20

pipeline_space = {
    "epoch": IntegerParameter(lower=MIN_EPOCH, upper=MAX_EPOCH, is_fidelity=True),
    "p1": IntegerParameter(lower=5, upper=15, default=10),
    "p2": FloatParameter(lower=.5, upper=5, default=3),
}

def run_pipeline(epoch, p1, p2) -> dict | float:
    loss = (p1 + p2) / epoch

    return loss


def main():
    run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory=Path(WORKDIR_PATH),
        max_evaluations_total=TOTAL_EVALUATIONS,
    )


if __name__ == '__main__':
    main()

Once again, I have probably misunderstood something about how the fidelity is chosen. Maybe this behaviour is OK.

eddiebergman · 2024-08-30T17:12:38Z

@Neeratyoy

Neeratyoy · 2024-09-07T07:58:49Z

Hi @AwePhD
Sorry for the late response, and thanks for the reproducible example.

Looking at the example and what you described, the behaviour you see is expected.
The short answer is that when MAX_BUDGET is 200 and ETA is 3 (the default), the next lower fidelity is approximately 200 // 3 = 66, which is below MIN_BUDGET (= 80) in your first case above. Hence, it falls back to a single rung, namely MAX_BUDGET.

You can use this function to approximately check what will happen with HyperBand budgets given your fidelity bounds:

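import math

def check_budget_levels(MIN_EPOCH, MAX_EPOCH, ETA=3):
    # Approximate check: print each budget level from MAX_EPOCH downwards.
    budget = MAX_EPOCH
    # Label of the highest level, derived from the max/min ratio.
    fid_level = math.ceil(math.log(MAX_EPOCH / MIN_EPOCH) / math.log(ETA))
    # Divide the budget by ETA until it drops below MIN_EPOCH.
    while budget >= MIN_EPOCH:
        print(f"Level: {fid_level} -> {budget}")
        budget = budget // ETA
        fid_level -= 1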
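Applied to the two settings reported in this issue, the same integer-division rule can be written so that it returns the budgets instead of printing them. This is only a sketch of the approximation described in this comment: `rung_budgets` is a hypothetical helper name, not part of NePS, and the exact rounding inside NePS may differ.

```python
def rung_budgets(min_budget: int, max_budget: int, eta: int = 3) -> list[int]:
    """Approximate budget levels, highest first (hypothetical helper, not NePS API)."""
    if min_budget < 1 or max_budget < min_budget or eta < 2:
        raise ValueError("expected 1 <= min_budget <= max_budget and eta >= 2")
    budgets = []
    budget = max_budget
    # Keep dividing by eta while the budget stays within the fidelity bounds.
    while budget >= min_budget:
        budgets.append(budget)
        budget //= eta
    return budgets


print(rung_budgets(80, 200))  # [200]: 200 // 3 = 66 is already below 80
print(rung_budgets(1, 10))    # [10, 3, 1]
```

Under this approximation, the (80, 200) bounds leave a single level with eta=3, which matches the nine configurations that all ran for 200 epochs, while (1, 10) gives three levels.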

Do you think it would be a better interface if we raised a warning and broke out when only one budget level is available (like your case 1 with 80, 200) and we cannot really run HyperBand (multi-fidelity)?

AwePhD · 2024-09-09T11:46:22Z

Hi,

No problem at all about the delay.

Thanks for the function. I come from vision/language deep learning and have little familiarity with HPO research, although I have read some papers. I think it would be good to document what the rung values are for a multi-fidelity optimizer, namely how they follow from ETA and from the min and max values of the fidelity parameter.* I also think the fidelity levels should be logged somewhere, either in the console or in .optimizer_info.yaml. I know the eta parameter is recorded there but, as you showed, it takes a bit of math to get from it to the fidelity levels.

To answer more directly: when a fidelity parameter is passed and there is only one level of fidelity, I think a warning should be logged. A user who sets a fidelity parameter most likely does not want the number of rungs to be 1. I do not think a breakout is necessary.
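A warning of this kind might look roughly like the sketch below. It assumes the approximate integer-division rule from the maintainer's function; `count_rungs` and `warn_if_single_rung` are hypothetical names and are not an existing NePS check.

```python
import warnings


def count_rungs(min_budget: int, max_budget: int, eta: int = 3) -> int:
    """Approximate number of fidelity levels (hypothetical helper, not NePS API)."""
    if min_budget < 1 or max_budget < min_budget or eta < 2:
        raise ValueError("expected 1 <= min_budget <= max_budget and eta >= 2")
    rungs = 0
    budget = max_budget
    while budget >= min_budget:
        rungs += 1
        budget //= eta
    return rungs


def warn_if_single_rung(min_budget: int, max_budget: int, eta: int = 3) -> int:
    """Warn, without aborting, when the bounds leave only one fidelity level."""
    rungs = count_rungs(min_budget, max_budget, eta)
    if rungs == 1:
        warnings.warn(
            f"Fidelity bounds [{min_budget}, {max_budget}] with eta={eta} give a "
            "single rung; every configuration will run at the maximum fidelity.",
            UserWarning,
            stacklevel=2,
        )
    return rungs


warn_if_single_rung(80, 200)  # emits the warning
warn_if_single_rung(1, 10)    # three levels, no warning
```

Using `warnings.warn` rather than raising keeps the run going, in line with the preference above for a warning over a breakout.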

*: I know this computation is described in the HyperBand paper and probably in other multi-fidelity papers. But in my opinion, a practical user of NePS who does not come from the HPO community should not have to read a paper to use the code. It's up to you, though; maybe you reasonably expect NePS users to have some knowledge of the HPO literature. Also, other multi-fidelity methods may (now or in the future) compute the number of rungs differently. IMO for a practical HPO user, the first thing accessible should be the number of rungs and their values, then a place in the documentation to see how they are computed based on eta and the fidelity values (or anything else).
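As an illustration of what such a logged summary could contain, the sketch below collects eta, the number of rungs, and the rung values into a plain dictionary. The function name and the keys are assumptions made for this example; they do not describe the actual contents of .optimizer_info.yaml, and the rung values use the approximate integer-division rule discussed earlier in this thread.

```python
def describe_fidelity_levels(min_budget: int, max_budget: int, eta: int = 3) -> dict:
    """Summarise approximate fidelity levels (hypothetical helper and keys)."""
    if min_budget < 1 or max_budget < min_budget or eta < 2:
        raise ValueError("expected 1 <= min_budget <= max_budget and eta >= 2")
    budgets = []
    budget = max_budget
    while budget >= min_budget:
        budgets.append(budget)
        budget //= eta
    budgets.reverse()  # lowest fidelity first, as rungs are usually listed
    return {
        "eta": eta,
        "min_budget": min_budget,
        "max_budget": max_budget,
        "num_rungs": len(budgets),
        "rung_budgets": budgets,
    }


# Compare two eta values for the debugging bounds from this issue.
for eta in (2, 3):
    print(describe_fidelity_levels(80, 200, eta=eta))
```

With these bounds, the approximation yields two levels (100 and 200) for eta=2 and a single level (200) for eta=3, so a summary like this would make the effect of eta visible before any budget is spent.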

eddiebergman · 2024-09-16T06:49:58Z

Dumping this information would definitely be useful and we should leave this open for sure.

Neeratyoy · 2024-09-16T10:07:22Z

> IMO for a practical HPO user, the first thing accessible should be the number of rungs and their values, then a place in the documentation to see how they are computed based on eta and the fidelity values (or anything else).

This is good feedback.
We do plan to have more dedicated docs for the available searchers, and that could perhaps serve as a guide for users too.

Shall we close this issue then?

eddiebergman · 2024-09-16T10:11:55Z

Nay, keep it open unless you want to create a new specific issue for this

Neeratyoy · 2024-09-17T14:05:21Z

> a new specific issue for this

Hmm, fair point. I will probably create a feature request issue for editing the docs and examples and adding some more utility functions.
Will leave this open until then.

github-project-automation bot added this to NePS Project Board Aug 22, 2024
