-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#99 Make pp yaml more generic #101
Conversation
singhd789
commented
Jun 12, 2024
- update configure script to set experiment, platform, and target from click options
- update configure script to set experiment, platform, and target from click options
A lot of the changes are format fixes (pylint errors) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good as an interim FRE Properties-like implementation, but it's not safe to assume an underscore in the experiment name, and "stem" is really an arbitrary variable that we need to find a new properties.yaml file for perhaps.
The "expname" / "name" thing is more of an opinion. feel free to ignore that one..
fre/pp/configure_script_yaml.py
Outdated
if k in ("HISTORY_DIR", "PP_DIR", "ANALYSIS"): | ||
# replace generic variables in pp yaml | ||
stem=e.split("_",1)[0] | ||
exp=e.split("_",1)[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't safe. We can't safely assumes $(stem) is the first part of the experiment name, divided by an underscore.
Experiment, platform, and target are a special case for "FRE property" expansion as they are provided on the command line. "stem" is more of a regular "FRE property" in that it is just a regular variable name that we want to use in the pp.yamls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'm working on changing this after our discussion yesterday.
fre/pp/configure_script_yaml.py
Outdated
# replace generic variables in pp yaml | ||
stem=e.split("_",1)[0] | ||
exp=e.split("_",1)[1] | ||
replacevars=configvalue.replace("$(stem)",stem).replace("$(expname)",exp).replace("$(platform)",p).replace("$(target)",t) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, looks good. The first-draft expansion of Canopy "FRE properties".
Except we can't expand "stem" until we have a place to retrieve it from (properties.yaml?), and I suggest using "name" instead of "expname" as that is exactly as Bronx FRE Properties named them (so we have a bridge to Bronx-like behavior..)
fre/pp/ppyaml_test/pp.yaml
Outdated
ptmp_dir: "/xtmp/$USER/ptmp" | ||
analysis: "/nbhome/$USER/canopy/analysis/am5_c96L65_amip" | ||
analysis: "/nbhome/$USER/canopy/analysis/$(expname)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest "name" instead
fre/pp/ppyaml_test/pp.yaml
Outdated
history_dir: "/archive/c2b/am5/2022.01/c96L65_am5f1a0r0_amip/gfdl.ncrc5-intel22-classic-prod-openmp/history" | ||
pp_dir: "/archive/$USER/canopy/am5/c96L65_amip/gfdl.ncrc5-deploy-prod-openmp/pp" | ||
history_dir: "/archive/c2b/$(stem)/2022.01/$(expname)/$(platform)-$(target)/history" | ||
pp_dir: "/archive/$USER/canopy/$(stem)/$(expname)/$(platform)-$(target)/pp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, "stem" wants to be "am5/2022.01". We can't do this, yet. So we could probably leave the hard-coded stem in the pp_dir:
pp_dir: "/archive/$USER/canopy/am5/2022.01/$(name)/$(platform)-$(target)/pp"
fre/pp/ppyaml_test/pp.yaml
Outdated
@@ -21,10 +21,10 @@ rose-suite: | |||
do_analysis: False | |||
|
|||
directories: | |||
history_dir: "/archive/c2b/am5/2022.01/c96L65_am5f1a0r0_amip/gfdl.ncrc5-intel22-classic-prod-openmp/history" | |||
pp_dir: "/archive/$USER/canopy/am5/c96L65_amip/gfdl.ncrc5-deploy-prod-openmp/pp" | |||
history_dir: "/archive/c2b/$(stem)/2022.01/$(expname)/$(platform)-$(target)/history" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really want "/archive/c2b/$(stem)/$(name)/$(platform)-$(target)/history", with stem being "am5/2022.01" but we can't expand "stem" yet.
- can hold experiment yamls - hold FRE properties defined
- parse main yaml to use certain experiment yaml - parse main yaml for FRE properties defined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I think this is almost just what we need for AM5 runtime integration. Let's discuss..
fre/pp/configure_script_yaml.py
Outdated
|
||
# return experiment yaml and defined FRE properties | ||
return(expyaml,stem) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! FRE Property expansion happens here.
I think there has to be special case expansions for the command-line arguments: experiment, platform, and target:
For the rest, it would be ideal to have them be completely user-specified, and not limited to a particular set, e.g. "$(stem).
fre/pp/ppyaml_test/am5-yamls.yaml
Outdated
projectID: "am5" | ||
FRE_VERSION: &fre_version "canopy" | ||
AM5_VERSION: &am5_version "2022.01" #"am5f7b12r1" | ||
stem: "am5/2022.01" # stem=projectID/version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about organizing the properties under a header instead, like:
properties:
projectID: "am5"
FRE_VERSION: &fre_version "canopy"
AM5_VERSION: &am5_version "2022.01"
stem: "am5/2022.01"
In those 4 properties, projectID and stem have just one value, and FRE_VERSION and AM5_VERSION have another definitions, &fre_version
and &am5_versions
. Why the difference, and what is the &
for? (Pardon my yaml ignorance..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh! This was something I was trying to play around with.
I meant to just push up:
projectID: "am5"
FRE_VERSION: "canopy"
AM5_VERSION: "2022.01"
stem: "am5/2022.01"
I was trying to see if the fields would work so we could do:
projectID: &ID "am5"
FRE_VERSION: &fre_version "canopy"
AM5_VERSION: &am5_version "2022.01"
stem: *ID/*am5_version
But it was giving some issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Organizing under properties makes sense
fre/pp/ppyaml_test/am5-yamls.yaml
Outdated
- expname: "c96L65_am5f1a0r0_amip" | ||
platform: "gfdl.ncrc5-intel22-classic" | ||
target: "prod-openmp" | ||
ppyaml: "pp_c96L65.yaml" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, mostly. So one does a
fre pp wrapper -y am5-yamls.yaml -p gfdl.ncrc5-intel22-classic -t prod-openmp -e c96L65_am5f1a0r0_amip
and first the am5-yamls.yaml is read, and then the pp_c96L65.yaml is also read. Yes?
I like this as is, but have some reservations about target
and platform
. For postprocessing, the target and platform are mostly used for directories, while the experiment is truly specific to the pp_c96L65.yaml settings.
Since we're passing the platform, target, and experiment to the fre
tools, I think it's unnecessarily redundant to store them in the yaml also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! The am5-yamls.yaml
would be passed with the -y
option in the command, as well as the -e ...
option. So then, if the experiment passed as a click option matches the expname listed in the am5-yamls, it uses the information associated with that experiment name. From your example,
fre pp wrapper -y am5-yamls.yaml -p gfdl.ncrc5-intel22-classic -t prod-openmp -e c96L65_am5f1a0r0_amip
am5-yamls.yaml
is read-e c96L65_am5f1a0r0_amip
is given so it looks through the expname section of the yaml to match the experiment name given- uses the info associated with c96L65_am5f1a0r0_amip, being that specific experiment's ppyaml
I completely agree on the target and platform Chris! I thought about it more this morning, completely redundant and unnecessary.
fre/pp/ppyaml_test/pp_c96L65.yaml
Outdated
ptmp_dir: "/xtmp/$USER/ptmp" | ||
analysis: "/nbhome/$USER/canopy/analysis/$(name)" | ||
analysis: "/nbhome/$USER/canopy/analysis/$(name)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a strong feeling that directories
should be set above the experiments, along with the FRE properties in the "main" yaml.
There's nothing in these definitions that are really experiment-specific except the experiment name itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't the directories experiment specific though with the name? I feel like it makes sense to keep them in the experiment's ppyaml because they would be paths for that experiment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or at least putting them under the associated expname
field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The best argument for keeping it in the main yaml is minimizing duplication. Almost certainly, all the directories will be the same, except for the experiment name, stem, platform, and target.
Another argument is similarity to how Bronx handles directories. In Bronx, the directories are set per-platform, not per-experiment.
- directories defined are not experiment specific (should not be hard coded) - they are the same paths for each experiment - to limit duplication, they can be put in the main yaml instead
@singhd789, here are the draft AM5 pp.yamls I've been working with: main yaml: https://gitlab.gfdl.noaa.gov/Chris.Blanton/am5xml/-/blob/pp-yamls2/xml_include/am5.yaml and one of the pp yamls: https://gitlab.gfdl.noaa.gov/Chris.Blanton/am5xml/-/blob/pp-yamls2/xml_include/yaml_include/pp.c96_amip.yaml There are a couple needed "tricks" to make this work the way I have them there. One is that in order to have the child yaml see the user variables defined in the main yaml, the yaml must be parsed together in python, not separately. Second is that the loader needs a custom "join" constructor. To use it,
I stole this from: https://stackoverflow.com/questions/5484016/how-can-i-do-string-concatenation-or-string-replacement-in-yaml/56658597#56658597 Third, the 3 special build-in FRE properties-- name, platform, and target-- must be custom added to the yaml before parsing. e.g. prepend this to the
|
- now 2: main yaml and experiment yaml (in yaml_include)
- add functions to combine main and experiment yamls - parse combined yaml correctly - to-do: - update validation
- in process of fixing pylint messages
…amls into one - split up `yamlInfo` functionality into functions to create rose-suite and rose-apps separately from main function - start to add in ability to combine multiple experiment yamls in combined yaml (if more than one listed in main yaml)
- update main yaml to have a `fre_properties` block - update experiment yaml to inherit like sections - also overwrites fields if defined again in experiment yaml - update schema.json to validate final `combined.yaml` - schema shows issues for validating separate yamls (main and experiment individaully) because validation does not seem to recognize `!join` (custom tags)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@singhd789 thank you so much for this.
Can this be merged in now? It's functional, though there will be needed updates as we use it. I suggest we merge and make more updates as needed.
fre/pp/configure_script_yaml.py
Outdated
|
||
# Clean combined yaml to validate | ||
del final_yaml["fre_properties"] | ||
del final_yaml["shared"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible (but unlikely) that pp yamls could have no fre_properties
or shared
sections, do you agree?
If those two keys don't exist, then the del
will cause an exception.
I think a experiments
section is always required, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I suppose there could be that case. I could add an check for that now
} | ||
} | ||
} | ||
"name": {"type": "string"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want this schema to be a submodule of https://github.com/NOAA-GFDL/gfdl_msd_schemas soon, but since this works, I think we should take this as is and refactor in another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
@ceblanton oh yep! I think it can be merged in. I forgot it was still in draft mode |