-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
new command flepimop-push, flepimop-pull #296
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know much about the underlying issue and I wasn't requested as a reviewer so I won't leave a approve/request changes, but I did leave a few notes on code quality. Seems like some of these changes, in particular to the file_paths.py
file, try to undo the changes in GH-250 though.
"seir", "hosp", "llik", etc. The file names are generated using the `create_file_name` function, | ||
with specific extensions based on the type: "csv" for "seed" and "parquet" for all other types. | ||
|
||
Parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think other functions in this file have been documented following the Google style guide, this looks like the numpy style guide maybe? We should pick a consistent choice, I have a preference for the Google style guide for brevity.
I like the idea here, but I think we need some refinement of the public facing verbs. See #336 + #337 for my overall thinking. To me, the idealized version looks like: flepimop pull infer push config.yml which would read as: "according to config.yml, pull the remote data, run inference, then push the results" In that idealized version, we should be extracting the remote interaction bits from inference, and leave those all to pull / push. We might want to have I also think we don't want to make these their own entry points, but rather parts of the cli.py interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically seems fine to me, but not my area of expertise re the core capabilities. i can say there will need to be some re-orientation to integrate into the overall direction we're going for the flepimop CLI
flepimop-pull = gempyor.resume_pull:fetching_resume_files | ||
flepimop-push = gempyor.flepimop_push:flepimop_push |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's my intention that these will be shortly replaced by interacting with this capability via the core flepimop cli. makes sense to add them for the time being, but people should be advised that they will migrate soon (ideally) to the overall flepimop cli.
type_list = ["seir", "hosp", "llik", "spar", "snpi", "hnpi", "hpar", "init", "seed"] | ||
name_list = [] | ||
for type_name in type_list: | ||
extension = "csv" if type_name == "seed" else "parquet" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: feels like mild code smell to have this if test inside the loop right next to the variables outside. bit less weird as, dunno, a list comprehesion outside with the test, then use the key/value pairs in the loop.
but like i said, minor complaint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this also seems like something that we should use the gempyor.utils.get_filetype_for_resume
to get? Although, just trying todo that right now would be a circular import. Punt to a new issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create a key/value pairs out of the loop
@click.command() | ||
@click.option( | ||
"--resume_location", | ||
"resume_location", | ||
envvar=["LAST_JOB_OUTPUT", "RESUME_LOCATION"], | ||
type=click.STRING, | ||
required=True, | ||
help="the path for the last run's output", | ||
) | ||
@click.option( | ||
"--discard_seeding", | ||
"discard_seeding", | ||
envvar="RESUME_DISCARD_SEEDING", | ||
type=click.BOOL, | ||
required=True, | ||
help="required bool value for discarding seeding or not", | ||
) | ||
@click.option("--block_index", "flepi_block_index", envvar="FLEPI_BLOCK_INDEX", type=click.INT, required=True) | ||
@click.option( | ||
"--resume_run_index", "resume_run_index", envvar="RESUME_RUN_INDEX", type=click.STRING, required=True, | ||
) | ||
@click.option("--flepi_run_index", "flepi_run_index", envvar="FLEPI_RUN_INDEX", type=click.STRING, required=True) | ||
@click.option("--flepi_prefix", "flepi_prefix", envvar="FLEPI_PREFIX", type=click.STRING, required=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that several of these overlap w/ push: i think it probably makes sense to have these be complementary functions in the same module, with shared option definitions.
That can be accomplished in a subsequent PR, I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, that makes sense. I will also remove overlap options in the subsequent PR
@@ -909,7 +909,7 @@ def create_resume_file_names_map( | |||
liketype=liketype, | |||
) | |||
input_file_name = output_file_name | |||
if os.environ.get("FLEPI_BLOCK_INDEX") == "1": | |||
if flepi_block_index == "1": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i assume this is just fixing a random error, that you happened to turn up fixing this issue broadly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I found this parameter is not used to replace the reading of the environmental variable. So I added this tiny fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks okay overall, my two big comments are:
- The
boto3
issue needs to be resolved, breaking the CI would cause all kinds of problems down the road (see: new command flepimop-push, flepimop-pull #296 (comment)). - I like the CLI unit tests so far, but I would definitely appreciate more unit testing. For example
gempyor.file_paths.create_file_name_for_push
is a brand new function and having unit tests would create a maintainable baseline of behavior for the future. And more unit testing of the CLI would be helpful, maybe trying to inspect the outputs a bit more if that's possible (may require modifying the patch to return some dummy results).
type_list = ["seir", "hosp", "llik", "spar", "snpi", "hnpi", "hpar", "init", "seed"] | ||
name_list = [] | ||
for type_name in type_list: | ||
extension = "csv" if type_name == "seed" else "parquet" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this also seems like something that we should use the gempyor.utils.get_filetype_for_resume
to get? Although, just trying todo that right now would be a circular import. Punt to a new issue?
import boto3 | ||
import botocore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the cause of the CI failing. The two options are:
- Move
boto3
andbotocore
from an extras install group into the requires. I think @jcblemai as expressed that he's not interested in this (is there a reason why?), or - Move the import of these into the function that uses them like
gempyor.utils.download_file_from_s3
does.
Slack with more details if needed: https://uncreturntocampus.slack.com/archives/C07MUAU8R0S/p1729705339433819
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I move the import into the function.
if __name__ == "__main__": | ||
fetching_resume_files() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about this, do we expect users to all this file directly, I would think not right? Or is this required for some other reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcblemai Will users call this command to pull resume file directly? Should I make it callable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's kind of conserved stylistic convention.
in my opinion, i think that's ...okay as a convention, but that we should have an overhaul that re-routes all of these kind of invocations through our click framework + issues a warning to users that an execution should happen that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to belabor this too much because this is minor, but I am going to push back on this being "stylistic convention". These two lines are the difference between allowing users to interact with this script directly vs not:
twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume) [1]> python src/gempyor/resume_pull.py --help
Usage: resume_pull.py [OPTIONS]
Options:
--resume_location TEXT the path for the last run's output [required]
--discard_seeding BOOLEAN required bool value for discarding seeding or not
[required]
--block_index INTEGER [required]
--resume_run_index TEXT [required]
--flepi_run_index TEXT [required]
--flepi_prefix TEXT [required]
--help Show this message and exit.
twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume)>
vs
twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume)> python src/gempyor/resume_pull.py --help
twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume)>
If the goal is to have a consistent CLI then we shouldn't give users the option to run the same tool in multiple ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair, not exactly "style" - but the current library convention seems to be making many (most? almost all?) of the scripts have a main.
i am on-board with deleting all of those (agree - definitely want fewer interface points to maintain), but that is a breaking change and should probably be done as its own issue/PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This main is removed.
flepi_run_index : | ||
The index of the run. This is used to uniquely identify the run. | ||
|
||
prefix : | ||
A prefix string to be included in the file names. This is typically used to categorize or | ||
identify the files. | ||
|
||
flepi_slot_index : | ||
The slot index used in the filename. This is formatted as a zero-padded nine-digit number. | ||
|
||
flepi_block_index : | ||
The block index used in the filename. This typically indicates a specific block or segment | ||
of the data being processed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the spacing be changed just slightly to match this https://google.github.io/styleguide/pyguide.html#doc-function-args a bit better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I changed comment according to this style. Let me know if you think we need further change.
@fang19911030 per the flepimop meeting, this should probably be targeting the dev branch |
Describe your changes.
Added new commands: flepimop-push and flepimop-pull to flepimop.
What does your pull request address? Tag relevant issues.
#192
Mentions of relevant team members.
@saraloo @jcblemai
I hope these commands can be easily integrated into our current workflow. Therefore, I would appreciate it if Sara could take a look at this PR.