Matt Norman edited this page Apr 1, 2022 · 20 revisions

miniWeatherML

miniWeatherML is a playground for learning and developing Machine Learning (ML) surrogate models. It is realistic enough to be challenging and small enough for rapid prototyping in:

  • Data generation and curation
  • Machine Learning model training
  • ML model deployment and analysis
  • End-to-end workflows

Code description

Using portable C++

The code is written in portable C++ using the Yet Another Kernel Library (YAKL, https://github.com/mrnorman/YAKL). The main code uses a real type for all floating-point values, which is typedef'd to double. In addition, miniWeatherML/model/main_header.h defines a set of YAKL Array typedefs for conveniently declaring multidimensional arrays of various types and ranks that are either const (read-only) or non-const (readable and writable) and that reside in either host or GPU device memory.

By default, all arrays are in device memory. All YAKL Array objects are contiguous under the hood and, by default, use C-style indexing: zero-based indices with row-major ordering, so the last index varies the fastest in memory. For example, where you would declare a naive C multidimensional array as real myarr[dim1][dim2][dim3];, you would write real3d myarr("myarr",dim1,dim2,dim3); in YAKL, and the C-style access myarr[0][1][2] becomes myarr(0,1,2). For Kokkos users, this is very similar to a Kokkos View with the LayoutRight template parameter.
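To make the row-major layout concrete, here is a minimal sketch in plain C++. RowMajor3D is a hypothetical stand-in written only for illustration; it is not YAKL's Array implementation, and it lives in host memory only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 3-D array with C-style (row-major) layout.
// This is an illustration only, not how YAKL implements its Array class.
struct RowMajor3D {
  std::size_t dim1, dim2, dim3;
  std::vector<double> data;   // One contiguous allocation for all elements

  RowMajor3D(std::size_t d1, std::size_t d2, std::size_t d3)
    : dim1(d1), dim2(d2), dim3(d3), data(d1*d2*d3, 0.0) {}

  // Flat offset of element (i1,i2,i3): the last index has a stride of 1,
  // the middle index a stride of dim3, and the first a stride of dim2*dim3.
  std::size_t offset(std::size_t i1, std::size_t i2, std::size_t i3) const {
    assert(i1 < dim1 && i2 < dim2 && i3 < dim3);
    return (i1*dim2 + i2)*dim3 + i3;
  }

  // Parenthesis-style access, mirroring the myarr(0,1,2) syntax
  double &operator()(std::size_t i1, std::size_t i2, std::size_t i3) {
    return data[offset(i1,i2,i3)];
  }
};
```

With dimensions (2,3,4), element (0,1,2) sits at flat offset 6 and element (0,1,3) at offset 7, which is why loops should vary the last index fastest.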

To express parallelism for, say, three tightly nested loops, the parallel_for function is used as follows (again with great similarities to Kokkos):

// for (int k=0; k < nz; k++) {
//   for (int j=0; j < ny; j++) {
//     for (int i=0; i < nx; i++) {
//       loop body...
//     }
//   }
// }
// The above loop, when parallelized, will become:
yakl::c::parallel_for( yakl::c::Bounds<3>(nz,ny,nx) ,
                       YAKL_LAMBDA (int k, int j, int i) {
  // loop body...
});

In Bounds, the left-most index is always the slowest-varying loop index, and the right-most is always the fastest-varying. For more information about using YAKL, please see the GitHub link above.
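The ordering rule can be illustrated with a serial sketch that collapses three loop bounds into one flat index and unpacks it again. The function serial_for_3d is a hypothetical helper invented for this illustration; it runs on the host only and makes no claim about how YAKL implements parallel_for internally.

```cpp
#include <cassert>

// Hypothetical serial stand-in for a 3-D parallel_for over (nz,ny,nx).
// It only demonstrates the index ordering: i (right-most) varies fastest,
// k (left-most) varies slowest.
template <class F>
void serial_for_3d(int nz, int ny, int nx, F const &body) {
  assert(nz >= 0 && ny >= 0 && nx >= 0);
  long const total = static_cast<long>(nz) * ny * nx;
  for (long idx = 0; idx < total; idx++) {
    // Unpack the flat index into the three loop indices
    int const i = static_cast<int>( idx % nx );
    int const j = static_cast<int>( (idx / nx) % ny );
    int const k = static_cast<int>( idx / (static_cast<long>(nx) * ny) );
    body(k, j, i);
  }
}
```

Calling serial_for_3d(2,3,4,...) visits (0,0,0), then (0,0,1), and reaches (0,1,0) only after all four values of i have been covered, matching the order of the commented nested loops.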

The Coupler class and modules

miniWeatherML's heart is composed of the Coupler class that stores the model state and the various "modules" that modify the model's state. The most important modules for weather-like flow that need to be a part of every experiment are the dynamics and microphysics modules.

The sections below give more information about the modules. The Coupler class contains all of the data and information about the simulation and its current fluid state. It has a DataManager object that manages all model arrays, with which the user can register_and_allocate arrays and get<type,#dimensions>("array_label") the arrays. The coupler also contains an Options object holding {key,value} pairs that define various options that modules in the model can respond to. The key is always a std::string, and the type of the value can vary. For instance, there is nearly always a coupler.set_option<std::string>( "standalone_input_file" , inFile ); with the YAML input file's path/filename.yaml, which a module can obtain with coupler.get_option<std::string>("standalone_input_file"). The coupler also houses the number of grid cells in each dimension, the domain sizes in each dimension, and the grid spacing in each dimension.
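As a rough illustration of a {key,value} store whose keys are strings and whose values vary in type, here is a minimal sketch using std::any (C++17). OptionsSketch and has_option are hypothetical names made up for this example; only the set_option / get_option call shapes are taken from the text above, and the real Options class may be implemented quite differently.

```cpp
#include <any>
#include <map>
#include <string>

// Hypothetical stand-in for the coupler's Options object.
// Keys are always std::string; each value may have a different type.
class OptionsSketch {
  std::map<std::string, std::any> entries;
public:
  // Store (or overwrite) a value of type T under the given key
  template <class T>
  void set_option(std::string const &key, T const &value) {
    entries[key] = value;
  }

  // Retrieve the value stored under key as type T.
  // Throws std::out_of_range if the key is missing and
  // std::bad_any_cast if T does not match the stored type.
  template <class T>
  T get_option(std::string const &key) const {
    return std::any_cast<T>(entries.at(key));
  }

  // Hypothetical convenience query, not taken from the source
  bool has_option(std::string const &key) const {
    return entries.count(key) > 0;
  }
};
```

In this sketch, requesting a value with the wrong type fails loudly at run time rather than silently converting, which is one reasonable design choice for options shared between independently written modules.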

The model coupler has a hard-coded fluid state that is comprised of the following 3-D variables:

  • Dry density "density_dry"
  • u-velocity (x-direction fluid velocity) "uvel"
  • v-velocity (y-direction fluid velocity) "vvel"
  • w-velocity (z-direction, vertical, fluid velocity) "wvel"
  • Temperature "temp"
  • Tracer masses, e.g. "water_vapor", cloud liquid, cloud ice, precipitation, etc.

These variables always exist in the coupler, and any additional arrays are created via coupler.dm.register_and_allocate(...).

The miniWeatherML/model/core code is fairly small (fewer than 1K lines of code) and documented. Between the code itself and the example usage in the existing subdirectories of miniWeatherML/experiments, it should become clear how to manipulate the model and the coupler's state.

Directory Structure

  • build
    • The code is built from the build directory, which contains a couple of build and cleaning scripts as well as a machines directory, which contains environment files sourced before building (to determine, e.g., whether to run the code on the CPU or GPU or what flags to use).
  • experiments
    • Users and developers are encouraged to place their experiments as subdirectories of the experiments directory. Each subdirectory will contain a set of driver .cpp source files and a CMakeLists.txt that defines how to build them.
    • experiments/my_experiment/custom_modules
      • Custom modules for use in the driver at initialization, during time stepping, or at finalization will be placed in here.
      • As will be seen later, the modules in the main modules directory are intended to have identical APIs for easy use and integration by experiments. Custom modules, while encouraged to maintain a similar API, can have whatever interface they want because they are only called by the experiment drivers in this specific experiment directory.
    • experiments/my_experiment/inputs
      • Input files for these experiments will be placed in here.
  • external
  • model
    • This contains a CMakeLists.txt file that encapsulates all of the common model infrastructure and modules into the model library. Thus, when creating a new experiment, the experiment folder's CMakeLists.txt only needs to add_subdirectory(../../model model) and link to that library to get everything in the miniWeatherML common model infrastructure.
    • model/core
      • This holds the core infrastructure of miniWeatherML, which is mostly encapsulated in the core::Coupler class, which holds all of the model's data and options at any given time, and the core::MultiField class, which allows the user / developer to aggregate multiple arrays of the same type and rank into a single aggregated array.
    • model/modules
      • This holds miscellaneous classes and functions that alter the coupler's state in various ways to enforce different behavior and physics.
      • The dynamics and microphysics modules, which form the core of miniWeatherML's physics, exist in this directory.
      • These should only include modules likely to be used by multiple experiments. Experiment-specific modules should go in the custom_modules subdirectory of the experiment directory.
      • There are only three types of modules that should be defined in this folder:
        • "Initialization modules" that take only the coupler state as a parameter: e.g., my_module_init( core::Coupler &coupler ) { ... } and are only called at model initialization
        • "Time stepping modules" that take two parameters, the coupler state and the model time step: e.g., my_module_time_stepper( core::Coupler &coupler , real dtphys ) { ... } and are called each model time step
        • "Finalization modules" that take only the coupler state as a parameter: e.g., my_module_finalize( core::Coupler &coupler ) { ... } and are called only after the time stepping is complete.
      • Modules and custom modules may be simple functions or classes with complex persistent internal states. So long as classes expose functions of the form of initialization, time stepping, or finalization modules, they can have pretty much any internal working.
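The three module types can be sketched with a class that keeps persistent internal state and a driver loop that calls it. Everything here is hypothetical and for illustration only: CouplerSketch stands in for core::Coupler (which holds far more state), and StepCounter and run_driver are invented names, not part of miniWeatherML.

```cpp
using real = double;

// Hypothetical placeholder for core::Coupler; it tracks only elapsed time.
struct CouplerSketch {
  real etime = 0;   // Elapsed simulation time
};

// A module written as a class with persistent internal state.
// It exposes one function of each module type.
class StepCounter {
  real start_time = 0;
  int  num_steps  = 0;
  real avg_dt     = 0;
public:
  // Initialization module: takes only the coupler
  void init(CouplerSketch &coupler) { start_time = coupler.etime; num_steps = 0; }

  // Time stepping module: takes the coupler and the time step
  void time_step(CouplerSketch &coupler, real dtphys) {
    (void) coupler; (void) dtphys;   // A real module would modify the coupler's state
    num_steps++;
  }

  // Finalization module: takes only the coupler
  void finalize(CouplerSketch &coupler) {
    if (num_steps > 0) avg_dt = (coupler.etime - start_time) / num_steps;
  }

  int  steps()   const { return num_steps; }
  real mean_dt() const { return avg_dt; }
};

// Sketch of a driver: initialize once, step until sim_time, finalize once
inline void run_driver(CouplerSketch &coupler, StepCounter &counter,
                       real sim_time, real dtphys) {
  counter.init(coupler);
  while (coupler.etime < sim_time) {
    // Shorten the last step so the run ends exactly at sim_time
    if (coupler.etime + dtphys > sim_time) dtphys = sim_time - coupler.etime;
    counter.time_step(coupler, dtphys);
    coupler.etime += dtphys;
  }
  counter.finalize(coupler);
}
```

Because the class only needs to expose these three call shapes, its internals (here, a step count and an average) are free to be as simple or as elaborate as the experiment requires.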

Setting up experiments

Users / developers are encouraged to place their experiments in the experiments directory. The experiments/supercell_example directory shows how to set this up, with an example driver, a CMakeLists.txt file, and an inputs directory holding YAML-formatted input files. You will almost certainly also need to add a custom_modules directory with modules that do custom things for your experiment, such as generating statistics about the data you want to emulate with an ML model, generating / curating data for your ML model, and deploying a trained ML model.

Tips and Tricks

  • Whenever you get data from the coupler and only plan to read from it, please extract it as a const type, e.g., coupler.dm.get<real const,3>("water_vapor"). The coupler's DataManager object, "dm", keeps track of variables that are potentially written to (meaning variables that are extracted as a non-const type). Everything you get from the DataManager object without a const type is flagged as "dirty", i.e., potentially written to. Everything retrieved as a const data type does not set this flag. Since ML models only need to predict the data that changes, this makes it easier to understand what data to predict in an ML model using the trick in the next item of this list.
  • If you want to create a surrogate model for a given subroutine, you can identify the arrays it potentially writes to by: (1) cleaning the "dirty bit" of all entries, (2) running the module, and (3) printing all entries that are flagged as dirty afterward. This can help guide you as to what data you should output when creating samples for ML model training. E.g.
coupler.dm.clean_all_entries();
my_timestepping_module( coupler , dtphys );
std::vector<std::string> dirty_entries = coupler.dm.get_dirty_entries();
std::cout << "The following entries have potentially been written to: ";
for (auto &entry : dirty_entries) { std::cout << entry << " , "; }
std::cout << std::endl;
  • Use the coupler's cloning function to capture a coupler state before running a module. E.g.,
core::Coupler inputs;
coupler.clone_into( inputs );
my_timestepping_module( coupler , dtphys );
// Now that the module has run, the coupler is the output state of the module
// We can pass the input and output couplers to a custom module to generate data
//    for offline ML model training like the API below:
my_data_generation_module( inputs , coupler , dtphys , etime );
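To show why const retrieval matters, here is a minimal sketch of dirty-flag tracking in plain C++. DataManagerSketch is a hypothetical stand-in: it stores 1-D host arrays and returns raw pointers, whereas the real DataManager returns YAKL arrays through get<type,#dimensions>. Only the method names register_and_allocate, get, clean_all_entries, and get_dirty_entries come from the text above; their signatures here are assumptions.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the DataManager's dirty-flag bookkeeping
class DataManagerSketch {
  struct Entry { std::vector<double> data; bool dirty = false; };
  std::map<std::string, Entry> entries;
public:
  void register_and_allocate(std::string const &name, std::size_t n) {
    entries[name].data.assign(n, 0.0);
  }

  // get<double>       returns a writable pointer and flags the entry dirty.
  // get<double const> returns a read-only pointer and leaves the flag alone.
  template <class T>
  T *get(std::string const &name) {
    static_assert(std::is_same<typename std::remove_const<T>::type, double>::value,
                  "This sketch only stores double data");
    Entry &entry = entries.at(name);
    if (!std::is_const<T>::value) entry.dirty = true;
    return entry.data.data();
  }

  void clean_all_entries() {
    for (auto &kv : entries) kv.second.dirty = false;
  }

  // Names of all entries retrieved as non-const since the last cleaning
  std::vector<std::string> get_dirty_entries() const {
    std::vector<std::string> names;
    for (auto const &kv : entries) {
      if (kv.second.dirty) names.push_back(kv.first);
    }
    return names;
  }
};
```

In this sketch, a module that reads "temp" through get<double const> and writes "water_vapor" through get<double> leaves only "water_vapor" in the dirty list, which is the signal used above to decide what a surrogate model must predict.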

Building an experiment

To build and run an experiment:

  • cd miniWeatherML/build
  • source machines/[machine_name]/[machine_environment_file].sh
  • ./cmakescript ../experiments/my_experiment_name
  • make -j [#tasks]
  • ./driver_name ./inputs/desired_input_file.yaml
  • After you're done, if you want to tidy up some, use ./cmakeclean.sh. It may not get everything, but it'll erase most things.