-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate neighbors and sample configurations correctly with deeply nested conditions #197
Conversation
One-exchange neighborhood iterator rigorously checks some of the generated configurations. The check is performed with the probability 5 %, as decided by `np.random`. Make the check deterministic (using an explicitly seeded `np.random.RandomState`) instead of relying on the default numpy random state, which need not be seeded.
of `get_one_exchange_neighborhood`. Update the hard-coded expected values.
The PCS contains a conditional dependency with the operator "||", which is interpreted as `OrConjunction`.
An optimization declares a hyperparameter (HP) inactive as soon as any of the HPs parents according to a condition is inactive. This logic is incorrect in case the condition is an `OrConjunction`. This commit ensures that the optimization is not applied to `OrConjunction`s.
Codecov Report
@@ Coverage Diff @@
## master #197 +/- ##
==========================================
- Coverage 67.15% 66.96% -0.20%
==========================================
Files 17 17
Lines 1629 1662 +33
==========================================
+ Hits 1094 1113 +19
- Misses 535 549 +14
Continue to review full report at Codecov.
|
When searching the hierarchy of hyperparameters (HPs), every time a HP is activated or deactivated, its children need to be re-considered for activation or deactivation. If there is no cycle in the condition graph, this process will terminate and converge. Before this commit, each HP was only considered at most once. This suffices if there is no `OrConjunction` condition: as soon as one of the parents of a HP is inactive, the HP is necessarily inactive.
This solution resolves the constraints in an arbitrary order, possibly updating the status of a HP more than once. My guess is that following a topological order of the HPs (which seems to be respected by |
Since the parents may be nested in a hierarchy of OrConjunction and AndConjunction, we may only infer inactivity without inspecting the whole hierarchy when all the parents are inactive.
Maintain a minimum heap of indices of HPs that should be visited. Since the HPs are topologically ordered by index, this ensures that all the parents of a HP are visited before that HP is visited. Forbid changing the value of an inactive HP. Raise ValueError on such attempt.
In some configuration spaces that include deeply nested conditions, sampling could yield an invalid configuration. This commit ensures that the procedure that normalizes the freshly sampled configuration handles all configuration spaces, including the complex ones, correctly. The new implementation of correct_sampled_array visits all the conditional hyperparameters in topological order. If any visited HP is deemed inactive, its vector value is set to NaN.
Break lines that are longer than 100 characters.
I have extended the changes so that generating neighbors and sampling random configurations works correctly with configuration spaces with conditions nested deeper than 1 level of |
@@ -1269,6 +1269,7 @@ class ConfigurationSpace(collections.abc.Mapping): | |||
|
|||
unconditional_hyperparameters = self.get_all_unconditional_hyperparameters() | |||
hyperparameters_with_children = list() | |||
conditional_hyperparameters = sorted(self.get_all_conditional_hyperparameters(), key=lambda hp_name: self._hyperparameter_idx[hp_name]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n ... number of conditional hyperparameters
N ... number of all hyperparameters
This implementation has the complexity O(n log n d(N)), where d(N) is the complexity of fetching an element from a dictionary indexed by N strings.
Alternative:
[hp_name for hp_name in self._hyperparameters if hp_name in self.get_all_conditional_hyperparameters()]
Complexity: O(N s(n)), where s(n) is the complexity of checking membership in a set of n strings.
to_visit = [index] | ||
|
||
# Since one hyperparameter may be reachable in several ways, we need to make sure we don't process it twice. | ||
scheduled = np.zeros(len(configuration_space), dtype=bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does using a np.ndarray
without declaring it with cdef
harm performance significantly?
This pull request probably solves the issue solved by #219. |
Hi @dengdifan and @filipbartek, As a next step I'd like to figure why the two PRs are so different in size and whether they implement different things underneath. From what I can see the PR of @difandeng touches the functions:
and the PR from @filipbartek touches
and none of your PRs touches
PR #194 is more complete than PR #219 as it handles the test added there but not vice versa. I do like the rather small changes of @difandeng to Therefore, I propose the following concrete steps forward, let me know what you think about them:
|
We need to check if this PR fixes #196 as well. |
to_disable = set() | ||
for ch in children: | ||
to_disable.add(ch.name) | ||
while len(to_disable) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop adds all the descendants of current
to disabled
. No elements are ever removed from disabled
. On lines 345 to 346, all of the HPs in disabled
are set to NaN
. However, if any of these descendants is conditioned by an OrConjunction
, deactivating current
need not suffice to deactivate the descendant. For each descendant, we should evaluate its conditions before disabling it.
disabled = [] | ||
# We maintain to_visit as a minimum heap of indices of hyperparameters that may need to be updated. | ||
# We assume that the hyperparameters are sorted topologically by their index. | ||
to_visit = [index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use a minimum heap (sorted by HP index) instead of a dequeue. Since the HP indices form a topological order with respect to the condition (directed acyclic) graph and since we process the HPs in a strictly increasing order by index, when we process a HP, we know that all of its parents have been processed already. Minimum heap is a data structure that allows us to ensure the topological order of processing with little computational overhead.
to_visit = [index] | ||
|
||
# Since one hyperparameter may be reachable in several ways, we need to make sure we don't process it twice. | ||
scheduled = np.zeros(len(configuration_space), dtype=bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scheduled
is a replacement of visited
. While visited
is a set of HP names, scheduled
is a boolean vector representation of a set of HPs, where each HP is identified by its index. I estimate that this change is probably a premature potential optimization.
for condition in conditions: | ||
if not condition._evaluate_vector(configuration_array): | ||
active = False | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value of active
for the HP current
need not be definitive. For example, let's assume that current
is being deactivated because one of its parent conditions does not hold. Since we do not process the HPs in topological order, there may be a future iteration of the to_visit
loop that activates one of the parents of current
. If current
is conditioned by an OrConjunction
, current
may need to be re-activated then. However, we will not get to re-visit and re-activate current
afterward because of its membership in visited
.
Processing the HPs in topological order resolves this issue.
if current_idx == index: | ||
if not active: | ||
raise ValueError( | ||
"Attempting to change the value of the inactive hyperparameter '%s' to '%s'." % (hp_name, hp_value)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a nice side effect of the code reorganization, we explicitly prevent the caller from changing the value of an inactive HP.
because there is another assertion at line 261 that covers mostly the same issues and because this one is rather uninformative.
To explain the changes in
When I modified the function, I attempted to make the modification rather conservative, keeping namely variable names and semantics where possible. However, I could still stay closer to the original implementation. I can try to undo some of the changes (namely switching from set I agree with merging #219 first. I will try to document the changes in |
Hi @filipbartek, thank you very much for your explanation. As discussed, I just merged #219. Could you please rebase your PR, I will then check again how to proceed. |
I am trying to minimize the changes introduced by this PR. 3 functions are affected:
I believe each of these is broken in a distinct way and can be fixed independently. Breaking the fix down might further facilitate the inspection of the changes. @mfeurer, would you prefer to have 3 PRs for these 3 fixes, or does breaking them down into sets of commits suffice? |
Hi @filipbartek, a single PR per problem would be appreciated as this will make reviewing and merging as simple and quickly as possible. Please excuse the delay, I've been on a longer break after finishing my PhD. |
Sorry to close this after so long of inactivity. I did a major rework in #346 and the code around this has completely changed. It's also been benchmarked to be just as fast/faster than the existing solution. I'm not sure if this closes the existing issue #194 though, I will have to take a deeper look |
Thanks for the update, @eddiebergman!
Consider scavenging the tests from this pull request. |
Resolves #194.