new command flepimop-push, flepimop-pull #296

fang19911030 · 2024-08-13T17:37:21Z

Describe your changes.
Added new commands: flepimop-push and flepimop-pull to flepimop.
What does your pull request address? Tag relevant issues.
#192
Mentions of relevant team members.
@saraloo @jcblemai
I hope these commands can be easily integrated into our current workflow. Therefore, I would appreciate it if Sara could take a look at this PR.


TimothyWillard

I don't know much about the underlying issue and I wasn't requested as a reviewer, so I won't leave an approve/request changes, but I did leave a few notes on code quality. It seems like some of these changes, in particular to the file_paths.py file, try to undo the changes in GH-250, though.


TimothyWillard · 2024-08-13T18:00:55Z

flepimop/gempyor_pkg/src/gempyor/file_paths.py

+    "seir", "hosp", "llik", etc. The file names are generated using the `create_file_name` function, 
+    with specific extensions based on the type: "csv" for "seed" and "parquet" for all other types.
+
+    Parameters:


I think other functions in this file have been documented following the Google style guide, but this looks like the NumPy style guide, maybe? We should pick a consistent choice; I have a preference for the Google style guide for brevity.


pearsonca · 2024-10-12T17:33:15Z

I like the idea here, but I think we need some refinement of the public facing verbs. See #336 + #337 for my overall thinking. To me, the idealized version looks like:

flepimop pull infer push config.yml

which would read as: "according to config.yml, pull the remote data, run inference, then push the results"

In that idealized version, we should be extracting the remote interaction bits from inference and leaving those all to pull / push. We might want to have flepimop infer config.yml implicitly recognize that it needs to pull/push when there is information about remotes in the config, but that seems like an improvement for later.

I also think we don't want to make these their own entry points, but rather parts of the cli.py interface.


pearsonca

Basically seems fine to me, but this is not my area of expertise regarding the core capabilities. I can say there will need to be some re-orientation to integrate this into the overall direction we're going for the flepimop CLI.

pearsonca · 2024-10-23T17:50:59Z

flepimop/gempyor_pkg/setup.cfg

+    flepimop-pull = gempyor.resume_pull:fetching_resume_files
+    flepimop-push = gempyor.flepimop_push:flepimop_push


It's my intention that these will shortly be replaced by interacting with this capability via the core flepimop CLI. It makes sense to add them for the time being, but people should be advised that they will (ideally) migrate soon to the overall flepimop CLI.
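A minimal sketch of what that migration could look like, assuming the core CLI is (or becomes) a click group. The group name `cli`, the `config_file` argument (borrowed from the `flepimop pull infer push config.yml` idea), and the placeholder bodies are hypothetical; they are not the actual contents of cli.py or of the PR's commands.

```python
import click


@click.group()
def cli():
    """Hypothetical stand-in for the core flepimop command group in cli.py."""


@cli.command("pull")
@click.argument("config_file")
def pull(config_file):
    """Placeholder for the logic currently exposed as flepimop-pull."""
    click.echo(f"pull using {config_file}")


@cli.command("push")
@click.argument("config_file")
def push(config_file):
    """Placeholder for the logic currently exposed as flepimop-push."""
    click.echo(f"push using {config_file}")
```

With a single console-script entry pointing at the group, `flepimop pull config.yml` would dispatch to the `pull` subcommand, and the separate `flepimop-pull`/`flepimop-push` entries in setup.cfg could then be retired.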

pearsonca · 2024-10-23T17:54:35Z

flepimop/gempyor_pkg/src/gempyor/file_paths.py

+    type_list = ["seir", "hosp", "llik", "spar", "snpi", "hnpi", "hpar", "init", "seed"]
+    name_list = []
+    for type_name in type_list:
+        extension = "csv" if type_name == "seed" else "parquet"


Minor: it feels like a mild code smell to have this if test inside the loop right next to the variables outside. It would be a bit less weird as, dunno, a list comprehension outside with the test, then use the key/value pairs in the loop.

But like I said, minor complaint.


Created the key/value pairs outside of the loop.
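A sketch of the refactor suggested above: decide each extension once, outside the loop, then iterate over the type/extension pairs. The type list and the "seed is CSV, everything else is parquet" rule come from the diff; `build_names` and its `make_name` callback are hypothetical, standing in for the PR's `create_file_name` call, whose signature isn't shown here.

```python
# Resume file types, as listed in the diff above.
RESUME_FILE_TYPES = ["seir", "hosp", "llik", "spar", "snpi", "hnpi", "hpar", "init", "seed"]

# Decide each extension once, outside any loop: "seed" is CSV, the rest are parquet.
EXTENSION_BY_TYPE = {
    type_name: "csv" if type_name == "seed" else "parquet"
    for type_name in RESUME_FILE_TYPES
}


def build_names(make_name):
    """Return one file name per type.

    `make_name(type_name, extension)` is a hypothetical caller-supplied builder
    standing in for the real file-name construction.
    """
    name_list = []
    for type_name, extension in EXTENSION_BY_TYPE.items():
        name_list.append(make_name(type_name, extension))
    return name_list
```

For example, `build_names(lambda t, e: f"{t}.{e}")` starts with `"seir.parquet"` and ends with `"seed.csv"`.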


pearsonca · 2024-10-23T17:58:55Z

flepimop/gempyor_pkg/src/gempyor/resume_pull.py

+@click.command()
+@click.option(
+    "--resume_location",
+    "resume_location",
+    envvar=["LAST_JOB_OUTPUT", "RESUME_LOCATION"],
+    type=click.STRING,
+    required=True,
+    help="the path for the last run's output",
+)
+@click.option(
+    "--discard_seeding",
+    "discard_seeding",
+    envvar="RESUME_DISCARD_SEEDING",
+    type=click.BOOL,
+    required=True,
+    help="required bool value for discarding seeding or not",
+)
+@click.option("--block_index", "flepi_block_index", envvar="FLEPI_BLOCK_INDEX", type=click.INT, required=True)
+@click.option(
+    "--resume_run_index", "resume_run_index", envvar="RESUME_RUN_INDEX", type=click.STRING, required=True,
+)
+@click.option("--flepi_run_index", "flepi_run_index", envvar="FLEPI_RUN_INDEX", type=click.STRING, required=True)
+@click.option("--flepi_prefix", "flepi_prefix", envvar="FLEPI_PREFIX", type=click.STRING, required=True)


Given that several of these overlap with push, I think it probably makes sense to have these be complementary functions in the same module, with shared option definitions.

That can be accomplished in a subsequent PR, I think.

Yes, that makes sense. I will also remove the overlapping options in the subsequent PR.
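One way the shared option definitions could look, sketched with click. The flag names, destinations, and environment variables are copied from the resume_pull.py options above; `shared_options` and the `pull`/`push` bodies are hypothetical placeholders, not the PR's code.

```python
import click

# Shared definitions, copied from the resume_pull.py options above.
flepi_run_index_option = click.option(
    "--flepi_run_index", "flepi_run_index", envvar="FLEPI_RUN_INDEX", type=click.STRING, required=True
)
flepi_prefix_option = click.option(
    "--flepi_prefix", "flepi_prefix", envvar="FLEPI_PREFIX", type=click.STRING, required=True
)


def shared_options(func):
    """Apply the options common to push and pull in one place."""
    func = flepi_prefix_option(func)
    func = flepi_run_index_option(func)
    return func


@click.command()
@shared_options
def pull(flepi_run_index, flepi_prefix):
    """Placeholder pull command using the shared options."""
    click.echo(f"pull {flepi_prefix} {flepi_run_index}")


@click.command()
@shared_options
def push(flepi_run_index, flepi_prefix):
    """Placeholder push command using the shared options."""
    click.echo(f"push {flepi_prefix} {flepi_run_index}")
```

Each application of a `click.option(...)` decorator creates a fresh option object, so reusing the same decorator on both commands is safe; changing a flag or env var name then only has to happen in one place.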

pearsonca · 2024-10-23T18:00:44Z

flepimop/gempyor_pkg/src/gempyor/utils.py

@@ -909,7 +909,7 @@ def create_resume_file_names_map(
                liketype=liketype,
            )
            input_file_name = output_file_name
-            if os.environ.get("FLEPI_BLOCK_INDEX") == "1":
+            if flepi_block_index == "1":


I assume this is just fixing a random error that you happened to turn up while fixing this issue more broadly?

Yes, I found that this parameter was not being used in place of reading the environment variable, so I added this tiny fix.
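Why using the parameter matters, sketched with the `--block_index` option declared in resume_pull.py (the `report_block` command is a hypothetical illustration). Click already resolves `FLEPI_BLOCK_INDEX` into the parameter, and a value given explicitly on the command line takes precedence over the environment variable, so re-reading `os.environ` inside the function would silently ignore a command-line value.

```python
import click


@click.command()
@click.option("--block_index", "flepi_block_index", envvar="FLEPI_BLOCK_INDEX", type=click.INT, required=True)
def report_block(flepi_block_index):
    # Read the parsed parameter rather than os.environ["FLEPI_BLOCK_INDEX"]:
    # click already falls back to the env var, and an explicit flag wins over it.
    click.echo(f"block index: {flepi_block_index}")
```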

TimothyWillard

Looks okay overall. My two big comments are:

1. The boto3 issue needs to be resolved; breaking the CI would cause all kinds of problems down the road (see the comment on the boto3 imports in resume_pull.py).
2. I like the CLI unit tests so far, but I would definitely appreciate more unit testing. For example, gempyor.file_paths.create_file_name_for_push is a brand new function, and unit tests for it would create a maintainable baseline of behavior for the future. More unit testing of the CLI would also be helpful, perhaps inspecting the outputs a bit more if that's possible (this may require modifying the patch to return some dummy results).
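A pytest-style sketch of the kind of output-inspecting test suggested here. `pull_resume_files`, its `download` parameter, and the bucket/key names are hypothetical; the real pull code talks to S3 through boto3, so an actual test would need `unittest.mock.patch` (or a similar seam) at whichever call that code makes.

```python
from unittest import mock


def pull_resume_files(bucket, keys, download):
    """Hypothetical stand-in for the pull logic: fetch each key, return local paths."""
    return [download(bucket, key) for key in keys]


def test_pull_resume_files_returns_downloaded_paths():
    # The mock plays the role of the S3 download and returns dummy local paths.
    fake_download = mock.Mock(side_effect=lambda bucket, key: f"/tmp/{key}")
    paths = pull_resume_files("example-bucket", ["seir.parquet", "seed.csv"], fake_download)
    # Inspect the outputs, not just that the command ran.
    assert paths == ["/tmp/seir.parquet", "/tmp/seed.csv"]
    fake_download.assert_has_calls(
        [mock.call("example-bucket", "seir.parquet"), mock.call("example-bucket", "seed.csv")]
    )
```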

TimothyWillard · 2024-10-24T13:28:41Z

flepimop/gempyor_pkg/src/gempyor/file_paths.py

+    type_list = ["seir", "hosp", "llik", "spar", "snpi", "hnpi", "hpar", "init", "seed"]
+    name_list = []
+    for type_name in type_list:
+        extension = "csv" if type_name == "seed" else "parquet"


Yeah, this also seems like something that we should use gempyor.utils.get_filetype_for_resume to get? Although just trying to do that right now would create a circular import. Punt to a new issue?


TimothyWillard · 2024-10-24T13:34:20Z

flepimop/gempyor_pkg/src/gempyor/resume_pull.py

+import boto3
+import botocore


These are the cause of the CI failing. The two options are:

Move boto3 and botocore from an extras install group into the required dependencies. I think @jcblemai has expressed that he's not interested in this (is there a reason why?), or

Move the import of these into the function that uses them like gempyor.utils.download_file_from_s3 does.

Slack with more details if needed: https://uncreturntocampus.slack.com/archives/C07MUAU8R0S/p1729705339433819

I moved the import into the function.
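A sketch of the second option: importing boto3 inside the function that needs it, as the comment says gempyor.utils.download_file_from_s3 already does. The function below and its error message are illustrative, not the PR's code; `boto3.client("s3")` and `download_file(Bucket, Key, Filename)` are the standard boto3 calls.

```python
def download_file_from_s3_lazily(bucket, key, destination):
    """Download one S3 object, importing boto3 only when this function runs.

    Keeping the import here means the module can be imported (and the CI can
    run) without the optional AWS dependencies installed.
    """
    try:
        import boto3  # optional dependency, deliberately not imported at module level
    except ImportError as exc:
        raise ImportError(
            "boto3 is required to pull files from S3; install the optional extras that provide it"
        ) from exc
    s3 = boto3.client("s3")
    s3.download_file(bucket, key, destination)
```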

TimothyWillard · 2024-10-24T13:35:14Z

flepimop/gempyor_pkg/src/gempyor/resume_pull.py

+if __name__ == "__main__":
+    fetching_resume_files()


I'm a bit confused about this. Do we expect users to call this file directly? I would think not, right? Or is this required for some other reason?

@jcblemai Will users call this command to pull resume files directly? Should I make it callable?

It's kind of a conserved stylistic convention.

In my opinion, I think that's ...okay as a convention, but we should have an overhaul that re-routes all of these kinds of invocations through our click framework and issues a warning to users that execution should happen that way.

I don't want to belabor this too much because this is minor, but I am going to push back on this being "stylistic convention". These two lines are the difference between allowing users to interact with this script directly vs not:

twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume) [1]> python src/gempyor/resume_pull.py --help
Usage: resume_pull.py [OPTIONS]

Options:
  --resume_location TEXT     the path for the last run's output  [required]
  --discard_seeding BOOLEAN  required bool value for discarding seeding or not  [required]
  --block_index INTEGER      [required]
  --resume_run_index TEXT    [required]
  --flepi_run_index TEXT     [required]
  --flepi_prefix TEXT        [required]
  --help                     Show this message and exit.
twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume)>

vs

twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume)> python src/gempyor/resume_pull.py --help
twillard@Mac ~/D/G/H/f/f/gempyor_pkg (python_script_resume)>

If the goal is to have a consistent CLI then we shouldn't give users the option to run the same tool in multiple ways.

Fair, not exactly "style", but the current library convention seems to be making many (most? almost all?) of the scripts have a main.

I am on board with deleting all of those (agreed, we definitely want fewer interface points to maintain), but that is a breaking change and should probably be done as its own issue/PR.

This main is removed.
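If the broader overhaul suggested above goes ahead (routing direct script invocations through click and warning users), one possible shape for the warning is sketched below. `warn_direct_invocation` is a hypothetical helper; `flepimop-pull` is the entry point name from setup.cfg.

```python
import warnings


def warn_direct_invocation(script_name, cli_command):
    """Hypothetical helper: steer users from `python script.py` toward the click CLI."""
    warnings.warn(
        f"Running {script_name} directly is deprecated; use `{cli_command}` instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller (the script's main block)
    )


# Intended use in a module that still has a main block:
# if __name__ == "__main__":
#     warn_direct_invocation("resume_pull.py", "flepimop-pull")
#     fetching_resume_files()
```

With `stacklevel=2` the warning is attributed to the `__main__` module, where Python's default filters display `DeprecationWarning`.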


TimothyWillard · 2024-10-24T13:40:31Z

flepimop/gempyor_pkg/src/gempyor/file_paths.py

+        flepi_run_index :
+            The index of the run. This is used to uniquely identify the run.
+
+        prefix :
+            A prefix string to be included in the file names. This is typically used to categorize or 
+            identify the files.
+
+        flepi_slot_index :
+            The slot index used in the filename. This is formatted as a zero-padded nine-digit number.
+
+        flepi_block_index :
+            The block index used in the filename. This typically indicates a specific block or segment 
+            of the data being processed.


Could the spacing be changed just slightly to match this https://google.github.io/styleguide/pyguide.html#doc-function-args a bit better?

OK, I changed the docstring to follow this style. Let me know if you think it needs further changes.
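For reference, a small function documented in the Google style the linked guide describes: an `Args:` section with `name: description` entries, followed by `Returns:` and `Raises:`. The function itself is a hypothetical illustration built from the docstring's note that the slot index is "formatted as a zero-padded nine-digit number"; it is not the PR's code.

```python
def format_slot_index(flepi_slot_index: int) -> str:
    """Format a slot index for use in a file name.

    Args:
        flepi_slot_index: The slot index used in the filename.

    Returns:
        The index as a zero-padded, nine-digit string.

    Raises:
        ValueError: If `flepi_slot_index` is negative.
    """
    if flepi_slot_index < 0:
        # Without this check, a negative index would format as e.g. "-00000001".
        raise ValueError(f"flepi_slot_index must be non-negative, got {flepi_slot_index}")
    return f"{flepi_slot_index:09d}"
```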

pearsonca · 2024-10-29T15:28:46Z

@fang19911030 per the flepimop meeting, this should probably be targeting the dev branch

fang19911030 added 22 commits July 25, 2024 11:38

add new command flepimop-pull

d85f6e4

bug fix

edfbb68

change format

1cb9ef1

add the test

d9d59a2

change argument type

c18ecde

add file check

ea59026

add new file for push command

aafc88b

Merge branch 'main' into python_script_resume

d3a4cac

Merge branch 'main' into python_script_resume

9e838b0

add function creating file names for pushing

62c0979

add body for flepimop-push

5407c71

add command flepimop-push

e8c1c42

change error message

cde74d4

fix wrong parameter

f1a57fb

rename file

8c6b65f

wrong file name

b0d8895

update doc and fix format

fc8b4fa

fix

5fce4d4

black fix format

ce734c4

print message

6cacb69

clean

34b18cf

correct variable name

534d932

fang19911030 requested review from jcblemai and saraloo August 13, 2024 17:37

TimothyWillard reviewed Aug 13, 2024


flepimop/gempyor_pkg/src/gempyor/file_paths.py (outdated, resolved)

TimothyWillard reviewed Aug 13, 2024


flepimop/gempyor_pkg/src/gempyor/file_paths.py (resolved)

correct tests

72ef61b

TimothyWillard reviewed Aug 13, 2024


TimothyWillard mentioned this pull request Aug 13, 2024

Initial python linting #282

Closed

fang19911030 self-assigned this Aug 26, 2024

shauntruelove added this to the flepiMoP 2.0 milestone Aug 26, 2024

jcblemai added 2 commits September 13, 2024 11:28

Merge branch 'main' into python_script_resume

1cbce6c

Merge branch 'main' into python_script_resume

d4ba408

fang19911030 added 4 commits October 22, 2024 09:38

Merge branch 'main' into python_script_resume

cee4259

address comments

da0b989

address comments 2

3c21a82

Merge branch 'python_script_resume' of https://github.com/HopkinsIDD/…

1d8879a

…flepiMoP into python_script_resume

fang19911030 requested a review from pearsonca October 23, 2024 17:39

pearsonca reviewed Oct 23, 2024


fang19911030 requested a review from TimothyWillard October 23, 2024 18:14

TimothyWillard reviewed Oct 24, 2024


fang19911030 added 7 commits October 31, 2024 10:01

Merge branch 'main' into python_script_resume

c9a2307

change doc string of file_paths

29cf95d

remove main

62c56f0

remove main and relocate import

8194214

add test file

14193ff

change

22f5188

new unit test

a417e38


