Multi fidelity is not "multi" for some fidelity boundaries #138

Open
AwePhD opened this issue Aug 22, 2024 · 8 comments
Comments

@AwePhD

AwePhD commented Aug 22, 2024

Hi,

Multi-fidelity optimization with PriorBand does not behave as I expected, probably because I did not configure the optimizer. I would like to know whether this is normal or whether I misunderstood something about the multi-fidelity setup in NePS.

Here is a Python script that reproduces the behaviour with neps 0.12.2:

from pathlib import Path
from neps import IntegerParameter, FloatParameter, run

MIN_EPOCH, MAX_EPOCH, TOTAL_BUDGET_EPOCHS = ..., ..., ...

pipeline_space = {
    "epoch": IntegerParameter(lower=MIN_EPOCH, upper=MAX_EPOCH, is_fidelity=True),
    "p1": IntegerParameter(lower=5, upper=15, default=10),
    "p2": FloatParameter(lower=.5, upper=5, default=3),
}


def run_pipeline(epoch, p1, p2) -> dict | float:
    loss = (p1 + p2) / epoch

    return {"loss": loss, "cost": epoch}


def main():
    run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory=Path("/tmp/debug_neps_fidelity"),
        max_cost_total=TOTAL_BUDGET_EPOCHS,
    )


if __name__ == '__main__':
    main()

When I run this with different boundary values, I get different numbers of fidelity levels (these are called rungs, right?):

# 9 configs, all evaluated at 200 epochs
MIN_EPOCH, MAX_EPOCH, TOTAL_BUDGET_EPOCHS = 80, 200, 2_000
# 11 configs: 2 reach 3 trials, 2 reach 2 trials, and the rest have one trial
MIN_EPOCH, MAX_EPOCH, TOTAL_BUDGET_EPOCHS = 1, 10, 40

The first example matters to me because I debug on a tiny subset of my dataset. I need many epochs to overfit the model (300M+ parameters) on a few samples; before 80 epochs the model does not learn at all. Is it a problem that the fidelity starts at 80? Is this the expected (multi-)fidelity strategy? Is it because the number of rungs is computed from the ratio max_fidelity/min_fidelity? (I read that in the PriorBand paper, if I remember correctly.)

Should I set the eta parameter? By the way, I am happy with the second example, which represents an actual training run on the whole dataset.

@AwePhD
Author

AwePhD commented Aug 23, 2024

One more observation about this multi-fidelity setup: if I use max_evaluations_total instead of max_cost_total, PriorBand only evaluates at the maximum fidelity (200).

from pathlib import Path
from neps import IntegerParameter, FloatParameter, run

WORKDIR_PATH = "/tmp/debug_neps_fidelity"

MIN_EPOCH, MAX_EPOCH = 80, 200
TOTAL_EVALUATIONS = 20

pipeline_space = {
    "epoch": IntegerParameter(lower=MIN_EPOCH, upper=MAX_EPOCH, is_fidelity=True),
    "p1": IntegerParameter(lower=5, upper=15, default=10),
    "p2": FloatParameter(lower=.5, upper=5, default=3),
}

def run_pipeline(epoch, p1, p2) -> dict | float:
    loss = (p1 + p2) / epoch

    return loss


def main():
    run(
        run_pipeline=run_pipeline,
        pipeline_space=pipeline_space,
        root_directory=Path(WORKDIR_PATH),
        max_evaluations_total=TOTAL_EVALUATIONS,
    )


if __name__ == '__main__':
    main()

Once again, I have probably misunderstood something about how the fidelity is chosen. Maybe this behaviour is OK.

@eddiebergman
Contributor

@Neeratyoy

@Neeratyoy
Contributor

Hi @AwePhD,
Sorry for the late response, and thanks for the reproducible example.

Looking at the example and what you described, the behaviour you see is expected.
The short answer is that when MAX_BUDGET is 200 and ETA is 3 (the default), the next lower fidelity is approximately 200 // 3 = 66, which is below MIN_BUDGET (= 80) in your first case above. Hence, it falls back to just one rung, namely MAX_BUDGET.
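
The same conclusion can be checked against the usual Hyperband rule for the number of rungs. This is only a sketch: it assumes the standard formula from the Hyperband paper, the symbols $b_{\min}$, $b_{\max}$, and $n_{\text{rungs}}$ are notation introduced here, and NePS may round slightly differently.

```latex
\[
n_{\text{rungs}} = \left\lfloor \log_{\eta} \frac{b_{\max}}{b_{\min}} \right\rfloor + 1
\]
\[
(80,\ 200):\quad \lfloor \log_3 2.5 \rfloor + 1 = \lfloor 0.83 \rfloor + 1 = 1,
\qquad
(1,\ 10):\quad \lfloor \log_3 10 \rfloor + 1 = \lfloor 2.10 \rfloor + 1 = 3
\]
```

With $\eta = 3$, the ratio $b_{\max} / b_{\min}$ has to reach at least 3 before a second rung appears, which the 80 to 200 range (ratio 2.5) does not.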

You can use this function to approximately check what will happen with HyperBand budgets given your fidelity bounds:

import math

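def check_budget_levels(MIN_EPOCH, MAX_EPOCH, ETA=3):
    """Print the approximate fidelity levels, from highest to lowest."""
    _min = MAX_EPOCH
    # Index of the highest level; it decreases by one for each lower level.
    fid_level = math.ceil(math.log(MAX_EPOCH / MIN_EPOCH) / math.log(ETA))
    # Divide the budget by ETA until it falls below the minimum fidelity.
    while _min >= MIN_EPOCH:
        print(f"Level: {fid_level} -> {_min}")
        _min = _min // ETA
        fid_level -= 1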
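
As a usage sketch for the two settings from the original report, the variant below returns the levels as a list instead of printing them. The name `budget_levels` is hypothetical and not part of NePS; it only mirrors the floor-division approximation described in this thread, so the rungs NePS actually uses may differ slightly.

```python
def budget_levels(min_epoch: int, max_epoch: int, eta: int = 3) -> list[int]:
    """Approximate fidelity levels, highest first (hypothetical helper, not NePS API)."""
    if min_epoch < 1 or max_epoch < min_epoch or eta < 2:
        raise ValueError("expected 1 <= min_epoch <= max_epoch and eta >= 2")
    levels = []
    budget = max_epoch
    # Divide the budget by eta until it drops below the minimum fidelity.
    while budget >= min_epoch:
        levels.append(budget)
        budget //= eta
    return levels


print(budget_levels(80, 200))         # [200]
print(budget_levels(1, 10))           # [10, 3, 1]
print(budget_levels(80, 200, eta=2))  # [200, 100]
```

Under this approximation, a smaller eta (2 in the last call) would give two levels for the 80 to 200 range. How eta is passed to PriorBand is not shown in this thread, so that part is left out here.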

Do you think it would be a better interface if we raised a warning and broke out when only one budget level is available (like your case 1 with 80, 200), so that HyperBand (multi-fidelity) cannot really run?

@AwePhD
Author

AwePhD commented Sep 9, 2024

Hi,

No problem for the delay at all.

Thanks for the function. I come from vision/language deep learning and have little familiarity with HPO research, although I have read some papers. I think it would be good to document, for the multi-fidelity optimizers, how the rung values follow from ETA and from the min and max values of the fidelity parameter.* I also think the fidelity levels should be logged somewhere, either in the console or in .optimizer_info.yaml. I know the eta parameter is recorded there but, as you showed, it takes a bit of math to get the fidelity levels.

To answer more directly: when a fidelity parameter is passed and there is only one level of fidelity, I think a warning should be logged. A user who sets a fidelity parameter most likely does not want the number of rungs to be 1. I do not think a breakout is necessary.

*: I know that this computation is described in the Hyperband paper and probably in other papers using multi-fidelity. But, in my opinion, a practical user of NePS who does not come from the HPO community should not have to read a paper to use the code. It's up to you, though; maybe you reasonably expect NePS users to have some knowledge of the HPO literature. Also, other multi-fidelity methods may (now or in the future) compute the number of rungs differently. IMO for a practical HPO user, the first thing accessible should be the number of rungs and their values, then a place in the documentation to see how they are computed based on eta and the fidelity values (or anything else).
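
A minimal sketch of what such logging and warning could look like, assuming the floor-division approximation given earlier in the thread. The function `report_fidelity_levels` and the logger name are hypothetical and not part of NePS.

```python
import logging
import warnings

logger = logging.getLogger("fidelity_check")  # hypothetical logger name


def report_fidelity_levels(min_fidelity: int, max_fidelity: int, eta: int = 3) -> list[int]:
    """Log the approximate rung values and warn if only one rung is available."""
    if min_fidelity < 1 or max_fidelity < min_fidelity or eta < 2:
        raise ValueError("expected 1 <= min_fidelity <= max_fidelity and eta >= 2")
    levels = []
    budget = max_fidelity
    # Divide by eta until the budget falls below the minimum fidelity.
    while budget >= min_fidelity:
        levels.append(budget)
        budget //= eta
    levels.reverse()  # lowest fidelity first
    logger.info("eta=%d gives %d rung(s): %s", eta, len(levels), levels)
    if len(levels) == 1:
        # Warn instead of aborting, as suggested in the comment above.
        warnings.warn(
            f"Only one fidelity level ({levels[0]}) fits in "
            f"[{min_fidelity}, {max_fidelity}] with eta={eta}; "
            "every configuration would run at the maximum fidelity.",
            UserWarning,
            stacklevel=2,
        )
    return levels
```

Calling `report_fidelity_levels(80, 200)` would emit the warning, while `report_fidelity_levels(1, 10)` would only log the three levels.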

@eddiebergman
Contributor

Dumping this information would definitely be useful and we should leave this open for sure.

@Neeratyoy
Contributor

> IMO for a practical HPO user, the first thing accessible should be the number of rungs and their values, then a place in the documentation to see how they are computed based on eta and the fidelity values (or anything else).

This is good feedback.
We do plan to have more dedicated docs for the available searchers, and those could perhaps serve as a guide for users too.

Shall we close this issue then?

@eddiebergman
Contributor

Nay, keep it open unless you want to create a new specific issue for this

@Neeratyoy
Contributor

> a new specific issue for this

Hmm, fair point. I will probably create a feature-request issue for editing the docs and examples and adding some more utility functions.
I will leave this open until then.
