Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored and integrated Grey Wolf Optimizer #121

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

hellkite500
Copy link
Member

As noted in #99, a Grey Wolf Optimizer was implemented by @xfeng2021. This original implementation is captured in the history in 92abc92 and a small suite of unit tests were developed to exercise that implementation for its three key semantics -- creation, optimization, and "restart" from a non-zero iteration.

A few minor fixes were required on the original code to get these tests to work correctly, and then a refactor an consolidation of the code was applied.

Note that the PR doesn't currently expose this optimizer for use to search.py or __main__ just yet. That functionality is being reviewed from the original implementation and I can add those to this PR or open a new one if this gets merged.

Additions

  • ngen.cal.optimizers subpackage
  • grey_wolf.py implementation of Grey Wolf Optmization
  • test_gwo.py for testing the grey wolf implementation

Testing

  1. Tested with pytest locally

Notes

  • Additional functionality required to integrate the optimizer into the search module

Todos

  • Could consider some different attribute handling/semantics and possibly different serialization for swarm history/cost attributes.

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Target Environment support

  • Linux
  • MacOS

@xfeng2021
Copy link
Contributor

Thanks for the updates.

aaraney added 3 commits May 2, 2024 10:10
…t gwo in public `optimizers`

This refactor adds a layer of indirection to safeguard the `optimizers`
public api. The `_optimizers` subpackage provides a place for _both_
stable and unstable / experimental optimizers to live. Once an
optimizer's api and features are fleshed out, it can then be imported by
the `optimizers` subpackage in `__init__.py` effectively moving it into
the public api.
Copy link
Contributor

@robertbartel robertbartel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have several things I think need at least sanity checks. I got a bit nit-picky, especially on some simple style things, though I tried to offer suggestion inline for those things.

"""_summary_

Args:
SwarmOptimizer (_type_): _description_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this is just @hellkite500's IDE defaults. @hellkite500, can you add docs for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of an aside: I originally started my review for grey_wolf.py before it got moved to the private subpackage, but didn't finish before it got moved, and Github made me refresh and, seemingly, required I start over in the "other" grey_wolf.py. So some of my comments may appear to be duplicated.

python/ngen_cal/src/ngen/cal/optimizers/grey_wolf.py Outdated Show resolved Hide resolved
@@ -36,6 +36,7 @@ install_requires =
hydrotools.metrics
hydrotools.nwis_client
pyarrow
pyswarms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add pyswarms to requirements.txt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah? I am inclined to remove the root repo requirements.txt in favor of a ns package level requirements.txt instead. In either case, we should use pip freeze to generate requirements.txt instead of just listing out all the deps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets open this as a TODO in an issue and cycle back to this.

python/ngen_cal/src/ngen/cal/optimizers/grey_wolf.py Outdated Show resolved Hide resolved
python/ngen_cal/src/ngen/cal/optimizers/grey_wolf.py Outdated Show resolved Hide resolved
df (DataFrame): dataframe containing swarm history or cost
name (str): history or attribute variable to extract
iter_range (Optional[List], optional): If provided, only extract
hitory data and the best, otherwise extract cost for current iteration.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
hitory data and the best, otherwise extract cost for current iteration.
history data and the best, otherwise extract cost for current iteration.


pos = [np.array(df[df['iteration']==i].iloc[:,0:self.dimensions]) for i in iter_range]
current_pos = pos[len(pos)-1]
return current_pos, pos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like at least something isn't right here. pos seems like it has to be a List[np.ndarray], which would make current_pos an np.ndarray rather than an int. But I'm not sure if the hints are the off or you want the function to be doing something slightly different.

df.to_csv(fname, index=False)
return df

def read_hist_iter_file(self) -> Tuple:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function isn't actually returning anything, despite the hint and docstring.

"""Update history."""
_hist={ a: getattr(self.swarm, a, None) for a in self._cost_attrs+self._pos_attrs}
_hist['mean_pbest_cost'] = np.mean(self.swarm.pbest_cost)
_hist['mean_leader_cost'] = np.mean(self.swarm.leader_cost)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leader_cost attribute for self.swarm seems to be getting added dynamically. Just as a sanity check, are we sure update_history cannot be called on an instance for which self.swarm.leader_cost has not been added?

@@ -0,0 +1 @@
from .._optimizers.grey_wolf import GreyWolfOptimizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why move GreyWolfOptimizer to the private package but import it here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refactor adds a layer of indirection to safeguard the optimizers
public api. The _optimizers subpackage provides a place for both
stable and unstable / experimental optimizers to live. Once an
optimizer's api and features are fleshed out, it can then be imported by
the optimizers subpackage in __init__.py effectively moving it into
the public api.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. I was thrown off a bit by having in my head that the class was still experimental, since the description for the PR notes it not being exposed yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is a little confusing. I also thought we should not expose it to the public api until we have it "hooked up," but @hellkite500 thought otherwise.

@@ -0,0 +1,3 @@
from .grey_wolf import GreyWolfOptimizer

__all__ = [GreyWolfOptimizer]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think __all__ is needed in this case since we aren't importing anything other than GreyWolfOptimizer in this module. __all__ controls what gets imported when you wildcard import (e.g. from ngen.cal.optimizers import *) a module. Regardless, its kind of annoying, but the members of __all__ must be strings. So, if we choose to keep this, this will need to be changed to this:

Suggested change
__all__ = [GreyWolfOptimizer]
__all__ = ["GreyWolfOptimizer"]

@@ -36,6 +36,7 @@ install_requires =
hydrotools.metrics
hydrotools.nwis_client
pyarrow
pyswarms
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have to do this here, but we should move deps like pyswarms to an [options.extras_require]. This would let people install ngen.cal[particalswarm] for example. So, just pull in what you need.

import time
from typing import Tuple, List, Optional

def create_swarm(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def create_swarm(
def create_swarm(
n_particles: int,
dimensions: int,
bounds: Optional[Tuple[Union[np.ndarray, List], Union[np.ndarray, List]]] = None,
center: Optional[Union[np.ndarray, float]] = 1.0,
init_pos: Optional[np.ndarray] = None,
options: Optional[Dict[Any, Any]] = None,
):

bounds=None,
center=1.0,
init_pos=None,
options={}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to default to None and passed as an {} if it is None. Using a mutable type as a default argument is a foot gun in python.



class GreyWolfOptimizer(SwarmOptimizer):
"""_summary_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write up a doc string for this or just remove it, please?

"""

# Apply verbosity
if self.start_iter>0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there are a lot of style things that I would like to improve in this work. I will run something like black to format this before we merge.

# Setup Pool of processes for parallel evaluation
pool = None if n_processes is None else mp.Pool(n_processes)

ftol_history = deque(maxlen=self.ftol_iter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Close Pool of Processes
if n_processes is not None:
pool.close()
return (final_best_cost, final_best_pos)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, this will get reformatted before we merge.

See `pyswarms.base.SwarmOptimizer`'s documentation for more information.

Args:
SwarmOptimizer (_type_): _description_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hellkite500, can you add some docs here. In this comment is fine too if you want me to just push them up and add you as the author.

python/ngen_cal/src/ngen/cal/_optimizers/grey_wolf.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants