Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#99 Make pp yaml more generic #101

Merged
merged 29 commits into from
Jul 26, 2024
Merged

#99 Make pp yaml more generic #101

merged 29 commits into from
Jul 26, 2024

Conversation

singhd789
Copy link
Collaborator

  • update configure script to set experiment, platform, and target from click options

Dana Singh and others added 2 commits June 10, 2024 14:36
- update configure script to set experiment, platform, and target from click options
@singhd789
Copy link
Collaborator Author

A lot of the changes are format fixes (pylint errors)

@bcc2761 bcc2761 marked this pull request as ready for review June 12, 2024 13:42
@bcc2761 bcc2761 marked this pull request as draft June 12, 2024 13:42
Copy link
Collaborator

@ceblanton ceblanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good as an interim FRE Properties-like implementation, but it's not safe to assume an underscore in the experiment name, and "stem" is really an arbitrary variable that we need to find a new properties.yaml file for perhaps.

The "expname" / "name" thing is more of an opinion. feel free to ignore that one..

if k in ("HISTORY_DIR", "PP_DIR", "ANALYSIS"):
# replace generic variables in pp yaml
stem=e.split("_",1)[0]
exp=e.split("_",1)[1]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't safe. We can't safely assumes $(stem) is the first part of the experiment name, divided by an underscore.

Experiment, platform, and target are a special case for "FRE property" expansion as they are provided on the command line. "stem" is more of a regular "FRE property" in that it is just a regular variable name that we want to use in the pp.yamls

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'm working on changing this after our discussion yesterday.

# replace generic variables in pp yaml
stem=e.split("_",1)[0]
exp=e.split("_",1)[1]
replacevars=configvalue.replace("$(stem)",stem).replace("$(expname)",exp).replace("$(platform)",p).replace("$(target)",t)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, looks good. The first-draft expansion of Canopy "FRE properties".

Except we can't expand "stem" until we have a place to retrieve it from (properties.yaml?), and I suggest using "name" instead of "expname" as that is exactly as Bronx FRE Properties named them (so we have a bridge to Bronx-like behavior..)

ptmp_dir: "/xtmp/$USER/ptmp"
analysis: "/nbhome/$USER/canopy/analysis/am5_c96L65_amip"
analysis: "/nbhome/$USER/canopy/analysis/$(expname)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest "name" instead

history_dir: "/archive/c2b/am5/2022.01/c96L65_am5f1a0r0_amip/gfdl.ncrc5-intel22-classic-prod-openmp/history"
pp_dir: "/archive/$USER/canopy/am5/c96L65_amip/gfdl.ncrc5-deploy-prod-openmp/pp"
history_dir: "/archive/c2b/$(stem)/2022.01/$(expname)/$(platform)-$(target)/history"
pp_dir: "/archive/$USER/canopy/$(stem)/$(expname)/$(platform)-$(target)/pp"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, "stem" wants to be "am5/2022.01". We can't do this, yet. So we could probably leave the hard-coded stem in the pp_dir:

pp_dir: "/archive/$USER/canopy/am5/2022.01/$(name)/$(platform)-$(target)/pp"

@@ -21,10 +21,10 @@ rose-suite:
do_analysis: False

directories:
history_dir: "/archive/c2b/am5/2022.01/c96L65_am5f1a0r0_amip/gfdl.ncrc5-intel22-classic-prod-openmp/history"
pp_dir: "/archive/$USER/canopy/am5/c96L65_amip/gfdl.ncrc5-deploy-prod-openmp/pp"
history_dir: "/archive/c2b/$(stem)/2022.01/$(expname)/$(platform)-$(target)/history"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really want "/archive/c2b/$(stem)/$(name)/$(platform)-$(target)/history", with stem being "am5/2022.01" but we can't expand "stem" yet.

@ilaflott ilaflott linked an issue Jun 21, 2024 that may be closed by this pull request
singhd789 and others added 4 commits July 5, 2024 16:02
- can hold experiment yamls
- hold FRE properties defined
- parse main yaml to use certain experiment yaml
- parse main yaml for FRE properties defined
Copy link
Collaborator

@ceblanton ceblanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I think this is almost just what we need for AM5 runtime integration. Let's discuss..


# return experiment yaml and defined FRE properties
return(expyaml,stem)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! FRE Property expansion happens here.

I think there has to be special case expansions for the command-line arguments: experiment, platform, and target: $(name), $(platform), and $(target).

For the rest, it would be ideal to have them be completely user-specified, and not limited to a particular set, e.g. "$(stem).

projectID: "am5"
FRE_VERSION: &fre_version "canopy"
AM5_VERSION: &am5_version "2022.01" #"am5f7b12r1"
stem: "am5/2022.01" # stem=projectID/version
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about organizing the properties under a header instead, like:

properties:
  projectID: "am5"
  FRE_VERSION: &fre_version "canopy"
  AM5_VERSION: &am5_version "2022.01"
  stem: "am5/2022.01"

In those 4 properties, projectID and stem have just one value, and FRE_VERSION and AM5_VERSION have another definitions, &fre_version and &am5_versions. Why the difference, and what is the & for? (Pardon my yaml ignorance..)

Copy link
Collaborator Author

@singhd789 singhd789 Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! This was something I was trying to play around with.

I meant to just push up:

  projectID: "am5"
  FRE_VERSION: "canopy"
  AM5_VERSION: "2022.01"
  stem: "am5/2022.01"

I was trying to see if the fields would work so we could do:

  projectID: &ID  "am5"
  FRE_VERSION: &fre_version "canopy"
  AM5_VERSION: &am5_version "2022.01"
  stem: *ID/*am5_version

But it was giving some issues

Copy link
Collaborator Author

@singhd789 singhd789 Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Organizing under properties makes sense

- expname: "c96L65_am5f1a0r0_amip"
platform: "gfdl.ncrc5-intel22-classic"
target: "prod-openmp"
ppyaml: "pp_c96L65.yaml"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, mostly. So one does a

fre pp wrapper -y am5-yamls.yaml -p gfdl.ncrc5-intel22-classic -t prod-openmp -e c96L65_am5f1a0r0_amip

and first the am5-yamls.yaml is read, and then the pp_c96L65.yaml is also read. Yes?

I like this as is, but have some reservations about target and platform. For postprocessing, the target and platform are mostly used for directories, while the experiment is truly specific to the pp_c96L65.yaml settings.

Since we're passing the platform, target, and experiment to the fre tools, I think it's unnecessarily redundant to store them in the yaml also.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! The am5-yamls.yaml would be passed with the -y option in the command, as well as the -e ... option. So then, if the experiment passed as a click option matches the expname listed in the am5-yamls, it uses the information associated with that experiment name. From your example,
fre pp wrapper -y am5-yamls.yaml -p gfdl.ncrc5-intel22-classic -t prod-openmp -e c96L65_am5f1a0r0_amip

  1. am5-yamls.yaml is read
  2. -e c96L65_am5f1a0r0_amip is given so it looks through the expname section of the yaml to match the experiment name given
  3. uses the info associated with c96L65_am5f1a0r0_amip, being that specific experiment's ppyaml

I completely agree on the target and platform Chris! I thought about it more this morning, completely redundant and unnecessary.

ptmp_dir: "/xtmp/$USER/ptmp"
analysis: "/nbhome/$USER/canopy/analysis/$(name)"
analysis: "/nbhome/$USER/canopy/analysis/$(name)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a strong feeling that directories should be set above the experiments, along with the FRE properties in the "main" yaml.

There's nothing in these definitions that are really experiment-specific except the experiment name itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't the directories experiment specific though with the name? I feel like it makes sense to keep them in the experiment's ppyaml because they would be paths for that experiment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or at least putting them under the associated expname field

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The best argument for keeping it in the main yaml is minimizing duplication. Almost certainly, all the directories will be the same, except for the experiment name, stem, platform, and target.

Another argument is similarity to how Bronx handles directories. In Bronx, the directories are set per-platform, not per-experiment.

Dana Singh and others added 3 commits July 8, 2024 16:16
- directories defined are not experiment specific (should not be hard coded)
- they are the same paths for each experiment
- to limit duplication, they can be put in the main yaml instead
@ceblanton
Copy link
Collaborator

ceblanton commented Jul 11, 2024

@singhd789, here are the draft AM5 pp.yamls I've been working with:

main yaml: https://gitlab.gfdl.noaa.gov/Chris.Blanton/am5xml/-/blob/pp-yamls2/xml_include/am5.yaml

and one of the pp yamls: https://gitlab.gfdl.noaa.gov/Chris.Blanton/am5xml/-/blob/pp-yamls2/xml_include/yaml_include/pp.c96_amip.yaml

There are a couple needed "tricks" to make this work the way I have them there.

One is that in order to have the child yaml see the user variables defined in the main yaml, the yaml must be parsed together in python, not separately.

Second is that the loader needs a custom "join" constructor. To use it,

def join(loader, node):
    seq = loader.construct_sequence(node)
    return ''.join([str(i) for i in seq])

yaml.add_constructor('!join', join)
combined = open('combined.yaml')
data = yaml.load(combined, yaml.Loader)

I stole this from: https://stackoverflow.com/questions/5484016/how-can-i-do-string-concatenation-or-string-replacement-in-yaml/56658597#56658597

Third, the 3 special build-in FRE properties-- name, platform, and target-- must be custom added to the yaml before parsing. e.g. prepend this to the combined.yaml:

define: &name               "myname"
define: &platform           "myplatform"
define: &target             "mytarget"

Dana Singh and others added 12 commits July 18, 2024 10:25
- now 2: main yaml and experiment yaml (in yaml_include)
- add functions to combine main and experiment yamls
- parse combined yaml correctly
- to-do:
    - update validation
- defined for my own development/troubleshooting
- in process of fixing pylint messages
…amls into one

- split up `yamlInfo` functionality into functions to create rose-suite and rose-apps separately from main function
- start to add in ability to combine multiple experiment yamls in combined yaml (if more than one listed in main yaml)
- update main yaml to have a `fre_properties` block
- update experiment yaml to inherit like sections - also overwrites fields if defined again in experiment yaml
- update schema.json to validate final `combined.yaml`
- schema shows issues for validating separate yamls (main and experiment individaully) because validation does not seem to recognize `!join` (custom tags)
@ilaflott ilaflott self-requested a review July 25, 2024 18:27
Copy link
Collaborator

@ceblanton ceblanton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@singhd789 thank you so much for this.

Can this be merged in now? It's functional, though there will be needed updates as we use it. I suggest we merge and make more updates as needed.


# Clean combined yaml to validate
del final_yaml["fre_properties"]
del final_yaml["shared"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible (but unlikely) that pp yamls could have no fre_properties or shared sections, do you agree?

If those two keys don't exist, then the del will cause an exception.

I think a experiments section is always required, though.

Copy link
Collaborator Author

@singhd789 singhd789 Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I suppose there could be that case. I could add an check for that now

}
}
}
"name": {"type": "string"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want this schema to be a submodule of https://github.com/NOAA-GFDL/gfdl_msd_schemas soon, but since this works, I think we should take this as is and refactor in another PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

@singhd789
Copy link
Collaborator Author

singhd789 commented Jul 26, 2024

@ceblanton oh yep! I think it can be merged in. I forgot it was still in draft mode

@singhd789 singhd789 marked this pull request as ready for review July 26, 2024 17:48
@singhd789 singhd789 marked this pull request as draft July 26, 2024 18:14
@singhd789 singhd789 marked this pull request as ready for review July 26, 2024 18:26
@singhd789 singhd789 merged commit ac287ad into main Jul 26, 2024
2 checks passed
@singhd789 singhd789 deleted the 99.update-pp-yaml branch July 26, 2024 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expand experiment, platform, target in pp.yaml definitions
4 participants