-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automate maintenance of generic_COBALT diagnostics #22
Automate maintenance of generic_COBALT diagnostics #22
Conversation
To simplify maintenance of registering diagnostics (in the generic_COBALT_register_diag subroutine), I've moved the info for the COBALT diagnostics to a .yaml file. I also added a small python script to generate the necessary source code from that yaml file and dump it to a separate file, which can be regenerated as needed when new diagnostics are added or diagnostic-registering syntax is changed.
This is amazing! |
I got this to compile and run a singe-column test case on my GFDL workstation. The only thing I had to do was change the file extension of the generated fortran file from .f90 to something else (.inc) so that the make script doesn't try to compile it on its own. (Then just the file extension of the file in the include line in generic_COBALT.f90 has to be changed to match). |
|
||
# Parse and check input | ||
|
||
if len(sys.argv) < 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be if len(sys.argv) < 2:
if we want to keep original
as the default when users do not provide an input argument?
|
||
# Open file for writing | ||
|
||
f = open("generic_COBALT_register_diag_body.f90", "w") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @andrew-c-ross pointed out, the current make script (as well as the GFDL FRE workflow) will try to compile files with extensions like *.F90 and *.f90. For now, I suggest we simply change the file extension to .inc
, so it becomes generic_COBALT_register_diag_body.inc
. I also quickly checked the MOM6 side, and it looks like they have mostly abandoned the INCLUDE statement except for some header files. I do not think our changes here will have any impact on them, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the only INCLUDE statement I am used to seeing is for MOM_memory.
However, MOM6 does use modules and has many statements like
use cobalt_register_diag_body only : register_diag_body
I am not sure what the motivation for having all of cobalt as one module is, but this seems like a case where having multiple .f90 files with separate modules could benefit the main code. Breaking up the generic_COBALT code into multiple modules would be part of a bigger conversation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon further consideration, I think using a separate module, as @theresa-morrison suggested, would be a better idea than using an INCLUDE statement.
|
||
# This format eliminates the creation of an intermediate vardesc variable | ||
|
||
mystr = ( " {var_to_register} = register_diag_field(package_name, {name} {axes}, &\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a comma after {name}. It should be{name} {axes},
generic_tracers/generic_COBALT.F90
Outdated
cmor_standard_name="sea_water_alkalinity_expressed_as_mole_equivalent", & | ||
cmor_long_name="Surface Total Alkalinity") | ||
|
||
vardesc_temp = vardesc("talknatos_raw","Surface Natural Total Alkalinity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we change the file extension to inc
. Please see my comments in the Python script for the reason.
# | ||
# Limitation Diagnostics | ||
# | ||
- {name: '"P_C_max_Di"' , longname: '"Diaz. Maximum Growth Rate"' , hor_grid: "'h'", z_grid: "'L'", t_grid: "'s'", units: "'sec-1'" , mem_size: "'f'", var_to_register: "phyto(DIAZO )%id_P_C_max" , axes: "axes(1:3)", missing_value: "missing_value1"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By checking the MOM6 code, I can confirm that a mix of quote styles (single vs. double quotes) exists on their side as well. @theresa-morrison, do you have any ideas for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should choose what ever is most readable here. If using only single or double quotes would mean we can eliminate having both ' " x" '
and " ' x ' "
that would be good -- I can see that tripping up a lot of users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove most of the quotes in the yaml file and have them inserted by the python code? I think we would just need to keep single quotes in the yaml file for strings with special characters like %. EDIT: I think the %s are okay here, it's mostly just the axes(#:#) that need quoted.
RE: include statements, one idea for an alternative would be to generate a complete module named something like generic_COBALT_utils that has a subroutine with all of the diagnostic registration code. Longer-term I think we can also automatically generate the diagnostic sending calls with just one addition to the yaml file, so it might be helpful to be able to put everything auto generated into one module file. EDIT: I guess this could create some ugly circular dependencies though, and not sure how cobalt's use of module variables will impact that. |
I think there are ways that we could add auto generated modules and manually generated modules to cobalt without introducing circular dependencies. It would definitely take a bit of work and involve moving things around within the ocean_BGC directory, but I think there are real advantages to not having all of cobalt in one 16,000 line .f90 file. |
# register_diag_field that were passed by position (rather than name/value) in the original | ||
# generic_COBALT code. | ||
|
||
primary = ["name", "longname", "hor_grid", "z_grid", "t_grid", "units", "mem_size", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since some of these appear to be always the same (hor_grid, t_grid, mem_size, missing_value), could we put defaults for those in the python file that can still be overwritten by the yaml if ever necessary?
It also seems like technically we only need one of z_grid and axes, since 2d variables have z_grid=1 and axes=axes(1:2) and 3D variables have z_grid=L and axes=axes(1:3).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some variables on the "1" z-axis (i.e. neither 2D nor 3D) where the axes variable can vary (either axes(1:2) or axesTi(1:1))... these are things like runoff and bottom fluxes.
But there is also a comment there indicating that the values of the axes variable is never really used, other than a length check, so there is possibly some simplification that could be done there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of automating code generation with Python. Here, I have some additional thoughts if we want to pursue this path. The pro is that Python is straightforward and easy for new-generation users to follow. The con is that if we decide to mix different languages, we will have to maintain both of them. For instance, we probably have to consider implementing unit tests to ensure that those functions in the Python scripts work properly. In this sense, we will have to enforce stricter rules for those Python scripts. Taking this Python script as an example, we better refactor the main code as a function:
import yaml
def build_register_diag_body(diag, formatopt="original"):
primary = ["name", "longname", "hor_grid", "z_grid", "t_grid", "units", "mem_size",
"var_to_register", "axes"]
output = []
for d in diag:
nvstr = ""
for key, value in d.items():
if key not in primary:
nvstr = ", &\n ".join([nvstr, "{} = {}".format(key, value)])
if formatopt == "original":
mystr = (" vardesc_temp = vardesc({name}, {longname}, {hor_grid}, {z_grid}, {t_grid}, {units}, {mem_size})\n"
" {var_to_register} = register_diag_field(package_name, vardesc_temp%name, {axes}, &\n"
" init_time, vardesc_temp%longname, vardesc_temp%units"
).format(**d)
mystr = mystr + nvstr + ")\n"
elif formatopt == "novardesc":
mystr = (" {var_to_register} = register_diag_field(package_name, {name}, {axes}, &\n"
" init_time, {longname}, {units}"
).format(**d)
mystr = mystr + nvstr + ")\n"
output.append(mystr)
if d['name'] == 'b_di14c': # begin radiocarbon block
output.append(" if (do_14c) then\n")
elif d['name'] == 'jdo14c': # end radiocarbon block
output.append(" endif\n")
return '\n'.join(output)
if __name__ == "__main__":
import sys
if len(sys.argv) < 2:
formatopt = "original"
else:
formatopt = sys.argv[1]
with open('generic_COBALT_diag.yaml', 'r') as f:
diag = yaml.safe_load(f)
output = build_register_diag_body(diag, formatopt)
with open('generic_COBALT_register_diag_body.f90', 'w') as f:
f.write(output)
By doing this we can write a unit test for this specific function.
We don't have to address this issue in this PR, but it's something we'll have to think about. I'm open to any further thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm in favor of this refactoring. I'll leave the the code as is for now because my python fluency is a little shaky when it comes to "doing things the right way" syntax.
A comment on this toy example: the 'b_di14c' tracer is the first one in the radiocarbon block, so the if statement needs to be appended before the mystr addition in that case. I added a note in the YAML file advising developers to be cautious about inserting tracers within that block, but it might be worth considering a more robust check for any specific diagnostics that include special handling, like the radiocarbon ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kakearney, thank you for your prompt action. This is truly amazing work (as you can see, everyone is encouraged by your work and cannot wait to share their thoughts 💯)! Overall, I love the whole idea. I only have a few comments on the code and how we handle the include statement (include vs modules).
Great to see all the feedback so soon! And I agree that a full discussion of how the code might be refactored would be good. There are many other places in the code (the g_tracer_add_param, g_tracer_add, g_tracer_get_values, g_send_data, and allocation and deallocation calls) that are similarly repetitive and might benefit from the same table -> autogenerated code treatment. I updated my branch to fix the typos pointed out in the .py script and to change the extension of the autogenerated file to .inc. |
@kakearney, thanks! The updates look good to me. I will start the CI test. @charliestock, just keeping you in the loop. @kakearney opened a great PR as a starting point. I recommend we spend some time in next week's cobaltV3 meeting to further discuss how we are going to handle code refactoring. |
@kakearney, oops! It looks like the CI is complaining that the head commit for this pull request is not ahead of the base commit (dev/cefi repo). Could you please check and make sure your feature branch is up-to-date with the current main branch? |
Oops, overlooked the overnight (to me) update... should be synced now. |
Great job, @kakearney 🥇 ! The PR passed the CI test. I'll approve it for now, but let's hold off on merging until we discuss code refactoring in our next COBALTv3 meeting. |
We have initiated the diagnostics cleanup process under #109. Once that is completed, we can revisit the option for automated diagnostic code generation. I will close this PR for now. |
This addresses some of the discussion in #15 by automating the generation of source code related to the generic_COBALT_register_diag subroutine.
This update adds a few new files:
The idea here is that all maintenance of diagnostic registration is done by editing the .yaml file and rerunning the .py script, with no human ever directly editing generic_COBALT_register_diag_body.f90. (Down the road, we could get fancy and automate the re-running of the .py script any time an update was pushed to the .yaml file)
A few points for discussion:
Warning: I don't yet have my gaea account, so I have not tested the Fortran code for compilation! Someone please check my work, haha.