Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SMAC sampler #170

Merged
merged 6 commits into from
Nov 8, 2024
Merged

Add SMAC sampler #170

merged 6 commits into from
Nov 8, 2024

Conversation

dengdifan
Copy link
Contributor

@dengdifan dengdifan commented Nov 1, 2024

Contributor Agreements

Please read the contributor agreements and if you agree, please click the checkbox below.

  • I agree to the contributor agreements.

Tip

Please follow the Quick TODO list to smoothly merge your PR.

Motivation

Add the SMAC3 sampler.

Description of the changes

Add SMAC sampler. This allows users to partially customize a SMAC optimizer with its hyperparameters, such as the types of surrogate models, acquisition functions, and initial designs or some important hyper-hyperparameters

TODO List towards PR Merge

Please remove this section if this PR is not an addition of a new package.
Otherwise, please check the following TODO list:

  • Copy ./template/ to create your package
  • Replace <COPYRIGHT HOLDER> in LICENSE of your package with your name
  • Fill out README.md in your package
  • Add import statements of your function or class names to be used in __init__.py
  • (Optional) Add from __future__ import annotations at the head of any Python files that include typing to support older Python versions
  • Apply the formatter based on the tips in README.md
  • Check whether your module works as intended based on the tips in README.md

@HideakiImamura
Copy link
Member

@nabenabe0928 Could you review this PR?

Copy link
Contributor

@nabenabe0928 nabenabe0928 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!
I left some minor comments and we can merge this PR after you check these points!

@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Difan Deng
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question] Do you wanna make it more general?

For example, here it says the license holder is ml4aad, but maybe do you wanna also follow this?
https://github.com/automl/SMAC3/blob/main/LICENSE.txt#L7C40-L7C62

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I am not quite sure for this. I was about to have the same license owner as the one in SMAC, but ml4aad was renamed as Automl.org then and now smac is maintained by the AutoML Hannover SMAC teams, should I change that to ml4add or our SMAC team?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, you don't have to!
It was just a question, so if you don't mind, it is totally fine:)

@@ -0,0 +1,44 @@
---
author: Difan Deng
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, maybe you want to change it.

package/samplers/smac_sampler/README.md Outdated Show resolved Hide resolved
package/samplers/smac_sampler/example.py Outdated Show resolved Hide resolved
package/samplers/smac_sampler/sampler.py Outdated Show resolved Hide resolved
def __init__(
self,
search_space: dict[str, BaseDistribution],
n_trials: int = 100, # This is required for initial design
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Question] What happens if this variable is incorrectly given?
In my environment, it seems to work without any problem even if n_trials is incompatible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This value determines the number of initial value: https://github.com/automl/SMAC3/blob/main/smac/initial_design/abstract_initial_design.py#L40

In SMAC, we set the maximal number of initial design as min(max_ration * n_trials, n_hps_per_dim * n_dim).
So if the value is larger than the value that we actually need, we might waste too many trials on the initial designs instead of BO iterations. On the opposite, if we set this value too small, then we might not have enough initial designs for early exploration.

package/samplers/smac_sampler/sampler.py Outdated Show resolved Hide resolved
package/samplers/smac_sampler/sampler.py Outdated Show resolved Hide resolved
package/samplers/smac_sampler/sampler.py Outdated Show resolved Hide resolved
@nabenabe0928
Copy link
Contributor

Important

PLEASE NOTE THAT THIS COMMENT DOES NOT HAVE TO BE ADDRESSED TO MERGE THIS PR!!!
If necessary, let's work on this followup in another PR!

Just as a followup comment:
If we would like to support the exact distributed optimization feature, we probably need to tell the results obtained by the other workers.
For example, when we have two workers (let's say Worker A and Worker B), the results by Worker B will not be reported to Worker A, leading to the loss of information.
To avoid this problem, we can have a dict attribute like self._reported_trial_numbers so that we know which trial was reported to each worker.

@dengdifan
Copy link
Contributor Author

Important

PLEASE NOTE THAT THIS COMMENT DOES NOT HAVE TO BE ADDRESSED TO MERGE THIS PR!!! If necessary, let's work on this followup in another PR!

Just as a followup comment: If we would like to support the exact distributed optimization feature, we probably need to tell the results obtained by the other workers. For example, when we have two workers (let's say Worker A and Worker B), the results by Worker B will not be reported to Worker A, leading to the loss of information. To avoid this problem, we can have a dict attribute like self._reported_trial_numbers so that we know which trial was reported to each worker.

Do you mean the case that we have two samplers? I am not quite sure if that will work with SMAC. SMAC assign a seed to each of the trial, the performance of two configurations will only be compared if their seeds are the same. So if we have another sampler that might assign a different seed to its trail, the running results of worker B might not be accepted by the worker A. But I need to dive deeper into that to check if it is really the case

Copy link
Contributor

@nabenabe0928 nabenabe0928 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changes, LGTM!

@nabenabe0928 nabenabe0928 merged commit ab6767c into optuna:main Nov 8, 2024
4 checks passed
@nabenabe0928
Copy link
Contributor

nabenabe0928 commented Nov 8, 2024

Do you mean the case that we have two samplers? I am not quite sure if that will wo

Actually, we can add some external information to each trial, so we can do it, but as this part is a bit tricky, let me get you back with the info of how to do it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new-package New packages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants