Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new command flepimop-push, flepimop-pull #296

Open
wants to merge 36 commits into
base: main
Choose a base branch
from

Conversation

fang19911030
Copy link
Collaborator

Describe your changes.
Added new commands: flepimop-push and flepimop-pull to flepimop.
What does your pull request address? Tag relevant issues.
#192
Mentions of relevant team members.
@saraloo @jcblemai
I hope these commands can be easily integrated into our current workflow. Therefore, I would appreciate it if Sara could take a look at this PR.

Copy link
Contributor

@TimothyWillard TimothyWillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know much about the underlying issue and I wasn't requested as a reviewer so I won't leave a approve/request changes, but I did leave a few notes on code quality. Seems like some of these changes, in particular to the file_paths.py file, try to undo the changes in GH-250 though.

flepimop/gempyor_pkg/src/gempyor/file_paths.py Outdated Show resolved Hide resolved
"seir", "hosp", "llik", etc. The file names are generated using the `create_file_name` function,
with specific extensions based on the type: "csv" for "seed" and "parquet" for all other types.

Parameters:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think other functions in this file have been documented following the Google style guide, this looks like the numpy style guide maybe? We should pick a consistent choice, I have a preference for the Google style guide for brevity.

flepimop/gempyor_pkg/src/gempyor/file_paths.py Outdated Show resolved Hide resolved
flepimop/gempyor_pkg/tests/utils/test_flepimop_pull.py Outdated Show resolved Hide resolved
@fang19911030 fang19911030 self-assigned this Aug 26, 2024
@shauntruelove shauntruelove added this to the flepiMoP 2.0 milestone Aug 26, 2024
@pearsonca
Copy link
Contributor

I like the idea here, but I think we need some refinement of the public facing verbs. See #336 + #337 for my overall thinking. To me, the idealized version looks like:

flepimop pull infer push config.yml

which would read as: "according to config.yml, pull the remote data, run inference, then push the results"

In that idealized version, we should be extracting the remote interaction bits from inference, and leave those all to pull / push. We might want to have flepimop infer config.yml implicitly recognize it needs to pull/push when there is information about remotes in the config, but that seems like an improvement for later.

I also think we don't want to make these their own entry points, but rather parts of the cli.py interface.

Copy link
Contributor

@pearsonca pearsonca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically seems fine to me, but not my area of expertise re the core capabilities. i can say there will need to be some re-orientation to integrate into the overall direction we're going for the flepimop CLI

Comment on lines +56 to +57
flepimop-pull = gempyor.resume_pull:fetching_resume_files
flepimop-push = gempyor.flepimop_push:flepimop_push
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's my intention that these will be shortly replaced by interacting with this capability via the core flepimop cli. makes sense to add them for the time being, but people should be advised that they will migrate soon (ideally) to the overall flepimop cli.

Comment on lines 222 to 225
type_list = ["seir", "hosp", "llik", "spar", "snpi", "hnpi", "hpar", "init", "seed"]
name_list = []
for type_name in type_list:
extension = "csv" if type_name == "seed" else "parquet"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: feels like mild code smell to have this if test inside the loop right next to the variables outside. bit less weird as, dunno, a list comprehesion outside with the test, then use the key/value pairs in the loop.

but like i said, minor complaint.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this also seems like something that we should use the gempyor.utils.get_filetype_for_resume to get? Although, just trying todo that right now would be a circular import. Punt to a new issue?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a key/value pairs out of the loop

Comment on lines +90 to +112
@click.command()
@click.option(
"--resume_location",
"resume_location",
envvar=["LAST_JOB_OUTPUT", "RESUME_LOCATION"],
type=click.STRING,
required=True,
help="the path for the last run's output",
)
@click.option(
"--discard_seeding",
"discard_seeding",
envvar="RESUME_DISCARD_SEEDING",
type=click.BOOL,
required=True,
help="required bool value for discarding seeding or not",
)
@click.option("--block_index", "flepi_block_index", envvar="FLEPI_BLOCK_INDEX", type=click.INT, required=True)
@click.option(
"--resume_run_index", "resume_run_index", envvar="RESUME_RUN_INDEX", type=click.STRING, required=True,
)
@click.option("--flepi_run_index", "flepi_run_index", envvar="FLEPI_RUN_INDEX", type=click.STRING, required=True)
@click.option("--flepi_prefix", "flepi_prefix", envvar="FLEPI_PREFIX", type=click.STRING, required=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that several of these overlap w/ push: i think it probably makes sense to have these be complementary functions in the same module, with shared option definitions.

That can be accomplished in a subsequent PR, I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that makes sense. I will also remove overlap options in the subsequent PR

@@ -909,7 +909,7 @@ def create_resume_file_names_map(
liketype=liketype,
)
input_file_name = output_file_name
if os.environ.get("FLEPI_BLOCK_INDEX") == "1":
if flepi_block_index == "1":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i assume this is just fixing a random error, that you happened to turn up fixing this issue broadly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I found this parameter is not used to replace the reading of the environmental variable. So I added this tiny fix.

Copy link
Contributor

@TimothyWillard TimothyWillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks okay overall, my two big comments are:

  1. The boto3 issue needs to be resolved, breaking the CI would cause all kinds of problems down the road (see: new command flepimop-push, flepimop-pull #296 (comment)).
  2. I like the CLI unit tests so far, but I would definitely appreciate more unit testing. For example gempyor.file_paths.create_file_name_for_push is a brand new function and having unit tests would create a maintainable baseline of behavior for the future. And more unit testing of the CLI would be helpful, maybe trying to inspect the outputs a bit more if that's possible (may require modifying the patch to return some dummy results).

Comment on lines 222 to 225
type_list = ["seir", "hosp", "llik", "spar", "snpi", "hnpi", "hpar", "init", "seed"]
name_list = []
for type_name in type_list:
extension = "csv" if type_name == "seed" else "parquet"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this also seems like something that we should use the gempyor.utils.get_filetype_for_resume to get? Although, just trying todo that right now would be a circular import. Punt to a new issue?

flepimop/gempyor_pkg/src/gempyor/flepimop_push.py Outdated Show resolved Hide resolved
flepimop/gempyor_pkg/src/gempyor/resume_pull.py Outdated Show resolved Hide resolved
Comment on lines 84 to 85
import boto3
import botocore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the cause of the CI failing. The two options are:

  1. Move boto3 and botocore from an extras install group into the requires. I think @jcblemai as expressed that he's not interested in this (is there a reason why?), or
  2. Move the import of these into the function that uses them like gempyor.utils.download_file_from_s3 does.

Slack with more details if needed: https://uncreturntocampus.slack.com/archives/C07MUAU8R0S/p1729705339433819

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I move the import into the function.

Comment on lines 234 to 235
if __name__ == "__main__":
fetching_resume_files()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about this, do we expect users to all this file directly, I would think not right? Or is this required for some other reason?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcblemai Will users call this command to pull resume file directly? Should I make it callable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's kind of conserved stylistic convention.

in my opinion, i think that's ...okay as a convention, but that we should have an overhaul that re-routes all of these kind of invocations through our click framework + issues a warning to users that an execution should happen that way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to belabor this too much because this is minor, but I am going to push back on this being "stylistic convention". These two lines are the difference between allowing users to interact with this script directly vs not:

twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume) [1]> python src/gempyor/resume_pull.py --help
Usage: resume_pull.py [OPTIONS]

Options:
  --resume_location TEXT     the path for the last run's output  [required]
  --discard_seeding BOOLEAN  required bool value for discarding seeding or not
                             [required]
  --block_index INTEGER      [required]
  --resume_run_index TEXT    [required]
  --flepi_run_index TEXT     [required]
  --flepi_prefix TEXT        [required]
  --help                     Show this message and exit.
twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume)>

vs

twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume)> python src/gempyor/resume_pull.py --help                    
twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume)>

If the goal is to have a consistent CLI then we shouldn't give users the option to run the same tool in multiple ways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair, not exactly "style" - but the current library convention seems to be making many (most? almost all?) of the scripts have a main.

i am on-board with deleting all of those (agree - definitely want fewer interface points to maintain), but that is a breaking change and should probably be done as its own issue/PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This main is removed.

flepimop/gempyor_pkg/src/gempyor/file_paths.py Outdated Show resolved Hide resolved
Comment on lines 202 to 214
flepi_run_index :
The index of the run. This is used to uniquely identify the run.

prefix :
A prefix string to be included in the file names. This is typically used to categorize or
identify the files.

flepi_slot_index :
The slot index used in the filename. This is formatted as a zero-padded nine-digit number.

flepi_block_index :
The block index used in the filename. This typically indicates a specific block or segment
of the data being processed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the spacing be changed just slightly to match this https://google.github.io/styleguide/pyguide.html#doc-function-args a bit better?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I changed comment according to this style. Let me know if you think we need further change.

@pearsonca
Copy link
Contributor

@fang19911030 per the flepimop meeting, this should probably be targeting the dev branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants