
Automate maintenance of generic_COBALT diagnostics #22

Closed

Conversation

kakearney

This addresses some of the discussion in #15 by automating the generation of source code related to the generic_COBALT_register_diag subroutine.

This update adds a few new files:

  • generic_COBALT_diag.yaml: I chose a YAML format because it allows table-like display, is both human- and machine-readable, and allows for commenting. In this format, it's relatively easy to scan through the many (798 currently!) diagnostics to check for consistency and to add new entries as they become necessary. I preserved most comments from the existing source code here (although I dropped a few version-control-type ones indicating when certain diagnostics had been added or updated and by whom; we can re-add those if necessary).
  • generic_COBALT_diagyaml2register_diag_body.py: A simple python script to reformat the info in the .yaml file to the appropriate Fortran source code. It can run with two different options for formatting: a) "original", which replicates the existing code, and b) "novardesc", which strips out the vardesc commands as was suggested in the linked issue.
  • generic_COBALT_register_diag_body.f90: The Fortran code auto-generated by the above python script, run with the "original" formatting option. The corresponding code in the generic_COBALT.F90 file was deleted and replaced with an include statement pointing to this file.

The idea here is that all maintenance of diagnostic registration is done by editing the .yaml file and rerunning the .py script, with no human ever directly editing generic_COBALT_register_diag_body.f90. (Down the road, we could get fancy and automate the re-running of the .py script any time an update was pushed to the .yaml file)
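To make the workflow concrete, here is a minimal, hypothetical sketch (not the actual script) of how one yaml entry could be rendered as a register_diag_field call in the "novardesc" style; the dictionary keys follow the fields used in generic_COBALT_diag.yaml:

```python
# Hypothetical sketch of the yaml -> Fortran mapping in the "novardesc"
# style; the real logic lives in generic_COBALT_diagyaml2register_diag_body.py.
def render_register_call(entry):
    """Format one diagnostic entry as a register_diag_field call."""
    return (
        "     {var_to_register} = register_diag_field(package_name, {name}, {axes}, &\n"
        "          init_time, {longname}, {units})\n"
    ).format(**entry)

entry = {
    "name": "'\"P_C_max_Di\"'",
    "longname": "'\"Diaz. Maximum Growth Rate\"'",
    "units": "'sec-1'",
    "var_to_register": "phyto(DIAZO)%id_P_C_max",
    "axes": "axes(1:3)",
}
print(render_register_call(entry))
```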

A few points for discussion:

  • What are the guidelines for including non-source code (like the .yaml and .py files) within this repo?
  • Does the MOM6 code style allow for include statements? Adding the code in this manner moves the roughly 800 lines related to this process (lines 3196-3994) out of the main module, which makes the main code more readable and reduces the chance of accidental copy-paste errors. But it does add an extra file, and possibly has some impact on the building of the source code.
  • The existing code used an interesting mix of quote styles (single vs double quotes), which I preserved in the .yaml file. Was this done for a purpose or just an artifact of different coding styles? If the latter, it may make sense to edit for consistency.

Warning: I don't yet have my gaea account, so I have not tested the Fortran code for compilation! Someone please check my work, haha.

To simplify maintenance of registering diagnostics (in the
generic_COBALT_register_diag subroutine), I've moved the info for the
COBALT diagnostics to a .yaml file.  I also added a small python script
to generate the necessary source code from that yaml file and dump
it to a separate file, which can be regenerated as needed when new
diagnostics are added or diagnostic-registering syntax is changed.
@andrew-c-ross
Contributor

This is amazing!

@andrew-c-ross
Contributor

I got this to compile and run a single-column test case on my GFDL workstation. The only change I had to make was renaming the generated Fortran file from .f90 to something else (.inc) so that the make script doesn't try to compile it on its own. (The file extension in the include line in generic_COBALT.F90 then has to be changed to match.)


# Parse and check input

if len(sys.argv) < 1:
Collaborator


Should this be if len(sys.argv) < 2: if we want to keep original as the default when users do not provide an input argument?
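For reference, a small sketch of the suggested fix (hypothetical helper name, not part of the PR), where argv[0] is the script name and a user-supplied option sits at argv[1]:

```python
# Hypothetical helper illustrating the fix suggested above: argv[0] is the
# script name, so a user-supplied option appears at argv[1], which is why
# the length check needs to be `< 2` rather than `< 1`.
def choose_format(argv):
    """Return the formatting option, defaulting to "original"."""
    return argv[1] if len(argv) > 1 else "original"

# In the script itself this would be called as: formatopt = choose_format(sys.argv)
```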


# Open file for writing

f = open("generic_COBALT_register_diag_body.f90", "w")
Collaborator


As @andrew-c-ross pointed out, the current make script (as well as the GFDL FRE workflow) will try to compile files with extensions like *.F90 and *.f90. For now, I suggest we simply change the file extension to .inc, so it becomes generic_COBALT_register_diag_body.inc. I also quickly checked the MOM6 side, and it looks like they have mostly abandoned the INCLUDE statement except for some header files. I do not think our changes here will have any impact on them, though.

Contributor


I agree, the only INCLUDE statement I am used to seeing is for MOM_memory.

However, MOM6 does use modules and has many statements like
use cobalt_register_diag_body, only : register_diag_body
I am not sure what the motivation for having all of cobalt as one module is, but this seems like a case where having multiple .f90 files with separate modules could benefit the main code. Breaking up the generic_COBALT code into multiple modules would be part of a bigger conversation.

Collaborator


Upon further consideration, I think using a separate module, as @theresa-morrison suggested, would be a better idea than using an INCLUDE statement.


# This format eliminates the creation of an intermediate vardesc variable

mystr = ( " {var_to_register} = register_diag_field(package_name, {name} {axes}, &\n"
Collaborator


Missing a comma after {name}. It should be {name}, {axes},

cmor_standard_name="sea_water_alkalinity_expressed_as_mole_equivalent", &
cmor_long_name="Surface Total Alkalinity")

vardesc_temp = vardesc("talknatos_raw","Surface Natural Total Alkalinity
Collaborator


I suggest we change the file extension to inc. Please see my comments in the Python script for the reason.

#
# Limitation Diagnostics
#
- {name: '"P_C_max_Di"' , longname: '"Diaz. Maximum Growth Rate"' , hor_grid: "'h'", z_grid: "'L'", t_grid: "'s'", units: "'sec-1'" , mem_size: "'f'", var_to_register: "phyto(DIAZO )%id_P_C_max" , axes: "axes(1:3)", missing_value: "missing_value1"}
Collaborator


By checking the MOM6 code, I can confirm that a mix of quote styles (single vs. double quotes) exists on their side as well. @theresa-morrison, do you have any ideas for that?

Contributor


I think we should choose whatever is most readable here. If using only single or only double quotes means we can avoid having both '"x"' and "'x'", that would be good -- I can see that tripping up a lot of users.

Contributor

@andrew-c-ross Mar 28, 2024


Can we remove most of the quotes in the yaml file and have them inserted by the python code? I think we would just need to keep single quotes in the yaml file for strings with special characters like %. EDIT: I think the %s are okay here; it's mostly just the axes(#:#) entries that need to be quoted.
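A sketch of this suggestion (hypothetical helper, not part of the PR): store bare strings in the yaml and let the python script add the Fortran quotes, passing code-valued fields through untouched. The split into "string" vs "code" fields is an assumption for illustration:

```python
# Hypothetical helper sketching the suggestion above. Field names follow
# generic_COBALT_diag.yaml; which fields count as string-valued is an
# assumption for illustration.
STRING_FIELDS = {"name", "longname", "units", "hor_grid", "z_grid", "t_grid", "mem_size"}

def add_fortran_quotes(entry):
    """Quote string-valued fields; pass code-valued fields (axes,
    var_to_register, missing_value) through unchanged."""
    return {key: ("'{}'".format(val) if key in STRING_FIELDS else val)
            for key, val in entry.items()}
```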

@andrew-c-ross
Contributor

andrew-c-ross commented Mar 28, 2024

RE: include statements, one idea for an alternative would be to generate a complete module named something like generic_COBALT_utils that has a subroutine with all of the diagnostic registration code. Longer-term I think we can also automatically generate the diagnostic sending calls with just one addition to the yaml file, so it might be helpful to be able to put everything auto generated into one module file.

EDIT: I guess this could create some ugly circular dependencies though, and not sure how cobalt's use of module variables will impact that.

@theresa-cordero
Contributor

I think there are ways that we could add auto generated modules and manually generated modules to cobalt without introducing circular dependencies. It would definitely take a bit of work and involve moving things around within the ocean_BGC directory, but I think there are real advantages to not having all of cobalt in one 16,000 line .f90 file.

# register_diag_field that were passed by position (rather than name/value) in the original
# generic_COBALT code.

primary = ["name", "longname", "hor_grid", "z_grid", "t_grid", "units", "mem_size",
Contributor


Since some of these appear to be always the same (hor_grid, t_grid, mem_size, missing_value), could we put defaults for those in the python file that can still be overwritten by the yaml if ever necessary?

It also seems like technically we only need one of z_grid and axes, since 2d variables have z_grid=1 and axes=axes(1:2) and 3D variables have z_grid=L and axes=axes(1:3).
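A sketch of the defaults idea (hypothetical helper, not part of the PR); the default values mirror the ones that repeat throughout generic_COBALT_diag.yaml, and a yaml-supplied value would still win when present:

```python
# Hypothetical sketch of the defaults idea; the values mirror the ones that
# appear repeatedly in generic_COBALT_diag.yaml, but the helper itself is
# not part of the PR.
DEFAULTS = {
    "hor_grid": "'h'",                  # horizontal grid
    "t_grid": "'s'",                    # time grid
    "mem_size": "'f'",                  # memory size
    "missing_value": "missing_value1",
}

def with_defaults(entry):
    """Merge a yaml entry over the defaults; yaml-supplied values win."""
    return {**DEFAULTS, **entry}
```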

Author


There are some variables on the "1" z-axis (i.e. neither 2D nor 3D) where the axes variable can vary (either axes(1:2) or axesTi(1:1))... these are things like runoff and bottom fluxes.

But there is also a comment there indicating that the value of the axes variable is never really used, other than for a length check, so there is possibly some simplification that could be done there.

Collaborator


I like the idea of automating code generation with Python. Here are some additional thoughts if we want to pursue this path. The pro is that Python is straightforward and easy for new-generation users to follow. The con is that if we decide to mix different languages, we will have to maintain both of them. For instance, we will probably need unit tests to ensure that the functions in the Python scripts work properly, which means enforcing stricter rules for those scripts. Taking this Python script as an example, we should refactor the main code as a function:

import yaml

def build_register_diag_body(diag, formatopt="original"):
    primary = ["name", "longname", "hor_grid", "z_grid", "t_grid", "units", "mem_size",
               "var_to_register", "axes"]

    output = []
    for d in diag:
        nvstr = ""
        for key, value in d.items():
            if key not in primary:
                nvstr = ", &\n          ".join([nvstr, "{} = {}".format(key, value)])

        if formatopt == "original":
            mystr = ("     vardesc_temp = vardesc({name}, {longname}, {hor_grid}, {z_grid}, {t_grid}, {units}, {mem_size})\n"
                     "     {var_to_register} = register_diag_field(package_name, vardesc_temp%name, {axes}, &\n"
                     "          init_time, vardesc_temp%longname, vardesc_temp%units"
                    ).format(**d)
            mystr = mystr + nvstr + ")\n"
        elif formatopt == "novardesc":
            mystr = ("     {var_to_register} = register_diag_field(package_name, {name}, {axes}, &\n"
                     "          init_time, {longname}, {units}"
                    ).format(**d)
            mystr = mystr + nvstr + ")\n"
        
        output.append(mystr)

        if d['name'] == 'b_di14c': # begin radiocarbon block
            output.append("     if (do_14c) then\n")
        elif d['name'] == 'jdo14c': # end radiocarbon block
            output.append("     endif\n")

    return '\n'.join(output)

if __name__ == "__main__":
    import sys

    if len(sys.argv) < 2:
        formatopt = "original"
    else:
        formatopt = sys.argv[1]

    with open('generic_COBALT_diag.yaml', 'r') as f:
        diag = yaml.safe_load(f)

    output = build_register_diag_body(diag, formatopt)

    with open('generic_COBALT_register_diag_body.f90', 'w') as f:
        f.write(output)

By doing this we can write a unit test for this specific function.

We don't have to address this issue in this PR, but it's something we'll have to think about. I'm open to any further thoughts.

Author


I'm in favor of this refactoring. I'll leave the code as is for now because my python fluency is a little shaky when it comes to "doing things the right way" syntax.

A comment on this toy example: the 'b_di14c' tracer is the first one in the radiocarbon block, so the if statement needs to be appended before the mystr addition in that case. I added a note in the YAML file advising developers to be cautious about inserting tracers within that block, but it might be worth considering a more robust check for any specific diagnostics that include special handling, like the radiocarbon ones.
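A sketch of what that ordering fix could look like (illustrative only; the registration line is reduced to a comment placeholder): open the guard before the first radiocarbon tracer and close it after the last:

```python
# Illustrative sketch of the ordering fix described above: the if (do_14c)
# guard must be emitted *before* the entry for 'b_di14c' (the first
# radiocarbon tracer) and the endif *after* 'jdo14c' (the last one).
# The registration line is a placeholder comment, not the real output.
def emit_with_radiocarbon_guard(names):
    """Emit one placeholder line per diagnostic, wrapping the
    radiocarbon block in an if (do_14c) guard."""
    out = []
    for name in names:
        if name == "b_di14c":
            out.append("     if (do_14c) then\n")
        out.append("     ! register {} here\n".format(name))
        if name == "jdo14c":
            out.append("     endif\n")
    return out
```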

Collaborator

@yichengt900 left a comment


@kakearney, thank you for your prompt action. This is truly amazing work (as you can see, everyone is encouraged by your work and cannot wait to share their thoughts 💯)! Overall, I love the whole idea. I only have a few comments on the code and how we handle the include statement (include vs modules).

@kakearney
Author

Great to see all the feedback so soon! And I agree that a full discussion of how the code might be refactored would be good. There are many other places in the code (the g_tracer_add_param, g_tracer_add, g_tracer_get_values, g_send_data, and allocation and deallocation calls) that are similarly repetitive and might benefit from the same table -> autogenerated code treatment.

I updated my branch to fix the typos pointed out in the .py script and to change the extension of the autogenerated file to .inc.

@yichengt900
Collaborator

@kakearney, thanks! The updates look good to me. I will start the CI test. @charliestock, just keeping you in the loop. @kakearney opened a great PR as a starting point. I recommend we spend some time in next week's cobaltV3 meeting to further discuss how we are going to handle code refactoring.

@yichengt900
Collaborator

yichengt900 commented Mar 28, 2024

@kakearney, oops! It looks like the CI is complaining that the head commit for this pull request is not ahead of the base commit (dev/cefi repo). Could you please check and make sure your feature branch is up-to-date with the current main branch?

@kakearney
Author

Oops, overlooked the overnight (to me) update... should be synced now.

@yichengt900
Collaborator

Great job, @kakearney 🥇 ! The PR passed the CI test. I'll approve it for now, but let's hold off on merging until we discuss code refactoring in our next COBALTv3 meeting.

@yichengt900
Collaborator

We have initiated the diagnostics cleanup process under #109. Once that is completed, we can revisit the option for automated diagnostic code generation. I will close this PR for now.

@yichengt900 yichengt900 closed this Jan 7, 2025
Labels: CodeCleanUp, enhancement