
Commit

Merge pull request #20 from IMMM-SFA/feature/testing_and_validation
add basic test, update validation, update readme
thurber authored Feb 18, 2021
2 parents c78bf81 + dc591f5 commit bc45a2a
Showing 17 changed files with 291 additions and 121 deletions.
24 changes: 21 additions & 3 deletions README.md
@@ -41,10 +41,28 @@ Alternatively, one can update the settings via code in the driving script:
mosart_wm.config['simulation.end_date'] = datetime(1985, 12, 31)
```

By default, key model variables are output on a monthly basis at a daily averaged resolution to `./output/<simulation name>/<simulation name>_<year>_<month>.nc`. Support for the [CSDMS standard names](https://github.com/csdms/standard_names) will be added shortly, but for now see configuration file and the `./src/_initialize_state.py` file for examples of how to modify the outputs.
By default, key model variables are output on a monthly basis at a daily averaged resolution to `./output/<simulation name>/<simulation name>_<year>_<month>.nc`. Support for the [CSDMS standard names](https://github.com/csdms/standard_names) will be added shortly, but for now see the configuration file for examples of how to modify the outputs, and the `./src/_initialize_state.py` file for the state variable names.
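A quick way to inspect one of these monthly files is with xarray. This is a minimal sketch; the simulation name, year, and month in the path are placeholders following the pattern above, and the variables present depend on the output configuration:

```python
import xarray as xr

# "tutorial" is a placeholder simulation name; adjust the path to match
# your configured simulation name and the month you want to inspect
output_file = './output/tutorial/tutorial_1981_05.nc'

with xr.open_dataset(output_file) as ds:
    print(ds)            # summary of dimensions, coordinates, and variables
    print(ds.data_vars)  # the variables written for this month
```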


### input files

More details will be added to this section soon.
Several input files in NetCDF format are required to successfully run a simulation; they are not shipped with this repository due to their large size. The grid files, reservoir files, and a small range of runoff and demand input files are available for public download as a zip archive [here](https://zenodo.org/record/4537907/files/mosartwmpy_sample_input_data_1980_1985.zip?download=1). This data can also be obtained using the download utility by running `python download.py` in the repository root and choosing option 1 for "sample_input". Currently, all input files are assumed to be at the same resolution (for the sample files, this is 1/8 degree over the CONUS). Below is a summary of the various input files, with a sketch after the table showing how to point the configuration at them:

Name | Description | Configuration Path | Notes
--- | --- | --- | ---
Grid | Spatial constants dimensioned by latitude and longitude relating to the physical properties of the river channels | `grid.path` |
Land Fraction | Fraction of grid cell that is land (as opposed to, e.g., ocean water), dimensioned by latitude and longitude | `grid.land.path` | as a TODO item, this variable should be merged into the grid file (historically it was separate for the coupled land model)
Reservoirs | Locations of reservoirs (possibly aggregated) and their physical and political properties | `water_management.reservoirs.path` |
Runoff | Surface runoff, subsurface runoff, and wetland runoff per grid cell averaged per unit of time; used to drive the river routing | `runoff.path` |
Demand | Water demand of grid cells averaged per unit of time; currently assumed to be monthly | `water_management.reservoirs.demand` | there are plans to support other time scales, such as epiweeks
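
The configuration paths in the table can be set from the driving script in the same way as the simulation dates shown earlier. In the sketch below the keys come from the table, but the file names are placeholders for wherever the sample data has been unzipped:

```python
# point the model at the input files; these file paths are placeholders and
# should be replaced with the actual locations of the downloaded data
mosart_wm.config['grid.path'] = './input/grid.nc'
mosart_wm.config['grid.land.path'] = './input/land_fraction.nc'
mosart_wm.config['water_management.reservoirs.path'] = './input/reservoirs.nc'
mosart_wm.config['runoff.path'] = './input/runoff_1980_1985.nc'
mosart_wm.config['water_management.reservoirs.demand'] = './input/demand_1980_1985.nc'
```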


### testing and validation

Before running the tests or validation, make sure to download the "sample_input" and "validation" datasets using the download utility `python download.py`.

To execute the tests, run `./test.sh` or `python -m unittest discover mosartwmpy/tests` from the repository root.

To execute the validation, run a model simulation that includes the years 1981 and 1982, note your output directory, and then run `./validation.sh` or `python validation/validate.py` from the repository root. The script will ask for your simulation output directory, think for a moment, and then open a figure with several plots showing the NMAE (Normalized Mean Absolute Error) as a percentage and the spatial sums of several key variables, compared between your simulation and the validation scenario. Use these plots to determine whether the changes you have made to the code have caused unintended deviation from the validation scenario. The NMAE should be 0% across time if you have caused no deviations. A non-zero NMAE indicates a numerical difference between your simulation and the validation scenario; this might be caused by changes to the code, or by running the simulation with a different configuration or parameters (e.g. a larger timestep, fewer iterations, etc.). The plots of the spatial sums can help you determine what changed and the overall magnitude of the changes.
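
For reference, here is a minimal sketch of one common formulation of this metric; the exact normalization used by `validation/validate.py` may differ, and the dimension names are assumptions:

```python
import numpy as np
import xarray as xr

def nmae_percent(simulated: xr.DataArray, validated: xr.DataArray) -> xr.DataArray:
    """Normalized mean absolute error per timestep, as a percentage.

    Assumes both arrays share (time, lat, lon) dimensions; normalizing by the
    mean absolute validated value is one common convention and may not match
    the convention used by the validation script.
    """
    abs_error = np.abs(simulated - validated).mean(dim=['lat', 'lon'])
    scale = np.abs(validated).mean(dim=['lat', 'lon'])
    return 100.0 * abs_error / scale
```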

Several input files are required to successfuly run a simulation, which are not shipped with this repository due to their large size. For now, please reach out to the developers for sample input files. A script will soon be added to download a small subset of input files and validated output files from a public data repository. If you have access to PNNL's Constance, you can download a small subset of inputs at: `/people/thur961/mosartwmpy_input.zip`.
If you wish to merge code changes that intentionally cause significant deviation from the validation scenario, please work with the maintainers to create a new validation dataset.
220 changes: 175 additions & 45 deletions mosartwmpy/grid/grid.py
@@ -1,17 +1,83 @@
import logging
import numpy as np
import pandas as pd
import pickle
import tempfile
import xarray as xr

from xarray import open_dataset
from benedict.dicts import benedict as Benedict
from xarray import open_dataset
from zipfile import ZIP_DEFLATED, ZipFile

from mosartwmpy.config.parameters import Parameters
from mosartwmpy.reservoirs.grid import load_reservoirs

class Grid():
"""Class to store grid related values that are constant throughout a simulation."""

# initialize all properties
id: np.ndarray = np.empty(0)
downstream_id: np.ndarray = np.empty(0)
longitude: np.ndarray = np.empty(0)
latitude: np.ndarray = np.empty(0)
unique_longitudes: np.ndarray = np.empty(0)
unique_latitudes: np.ndarray = np.empty(0)
cell_count: int = 0
longitude_spacing: float = 0.0
latitude_spacing: float = 0.0
land_mask: np.ndarray = np.empty(0)
mosart_mask: np.ndarray = np.empty(0)
area: np.ndarray = np.empty(0)
outlet_id: np.ndarray = np.empty(0)
upstream_id: np.ndarray = np.empty(0)
upstream_cell_count: np.ndarray = np.empty(0)
land_fraction: np.ndarray = np.empty(0)
iterations_main_channel: np.ndarray = np.empty(0)
iterations_subnetwork: np.ndarray = np.empty(0)
drainage_fraction: np.ndarray = np.empty(0)
local_drainage_area: np.ndarray = np.empty(0)
total_drainage_area_multi: np.ndarray = np.empty(0)
total_drainage_area_single: np.ndarray = np.empty(0)
flow_direction: np.ndarray = np.empty(0)
hillslope_manning: np.ndarray = np.empty(0)
subnetwork_manning: np.ndarray = np.empty(0)
channel_manning: np.ndarray = np.empty(0)
hillslope: np.ndarray = np.empty(0)
hillslope_length: np.ndarray = np.empty(0)
drainage_density: np.ndarray = np.empty(0)
subnetwork_slope: np.ndarray = np.empty(0)
subnetwork_width: np.ndarray = np.empty(0)
subnetwork_length: np.ndarray = np.empty(0)
channel_length: np.ndarray = np.empty(0)
channel_slope: np.ndarray = np.empty(0)
channel_width: np.ndarray = np.empty(0)
channel_floodplain_width: np.ndarray = np.empty(0)
total_channel_length: np.ndarray = np.empty(0)
grid_channel_depth: np.ndarray = np.empty(0)

# Reservoir related properties
reservoir_id: np.ndarray = np.empty(0)
reservoir_runoff_capacity: np.ndarray = np.empty(0)
reservoir_height: np.ndarray = np.empty(0)
reservoir_length: np.ndarray = np.empty(0)
reservoir_surface_area: np.ndarray = np.empty(0)
reservoir_storage_capacity: np.ndarray = np.empty(0)
reservoir_depth: np.ndarray = np.empty(0)
reservoir_use_irrigation: np.ndarray = np.empty(0)
reservoir_use_electricity: np.ndarray = np.empty(0)
reservoir_use_supply: np.ndarray = np.empty(0)
reservoir_use_flood_control: np.ndarray = np.empty(0)
reservoir_use_recreation: np.ndarray = np.empty(0)
reservoir_use_navigation: np.ndarray = np.empty(0)
reservoir_use_fish_protection: np.ndarray = np.empty(0)
reservoir_withdrawal: np.ndarray = np.empty(0)
reservoir_conveyance: np.ndarray = np.empty(0)
reservoir_count: np.ndarray = np.empty(0)
reservoir_to_grid_mapping: pd.DataFrame = pd.DataFrame()
reservoir_streamflow_schedule: xr.DataArray = xr.DataArray()
reservoir_demand_schedule: xr.DataArray = xr.DataArray()
reservoir_prerelease_schedule: xr.DataArray = xr.DataArray()

def __init__(self, config: Benedict = None, parameters: Parameters = None, empty: bool = False):
"""Initialize the Grid class.
@@ -25,49 +91,6 @@ def __init__(self, config: Benedict = None, parameters: Parameters = None, empty
if empty:
return

# initialize all properties
self.drainage_fraction = np.empty(0)
self.local_drainage_area = np.empty(0)
self.total_drainage_area_multi = np.empty(0)
self.total_drainage_area_single = np.empty(0)
self.id = np.empty(0)
self.downstream_id = np.empty(0)
self.flow_direction = np.empty(0)
self.hillslope_manning = np.empty(0)
self.subnetwork_manning = np.empty(0)
self.channel_manning = np.empty(0)
self.hillslope = np.empty(0)
self.drainage_density = np.empty(0)
self.subnetwork_slope = np.empty(0)
self.subnetwork_width = np.empty(0)
self.channel_length = np.empty(0)
self.channel_slope = np.empty(0)
self.channel_width = np.empty(0)
self.channel_floodplain_width = np.empty(0)
self.grid_channel_depth = np.empty(0)
if config.get('water_management.enabled', False):
self.reservoir_id = np.empty(0)
self.reservoir_runoff_capacity = np.empty(0)
self.reservoir_height = np.empty(0)
self.reservoir_length = np.empty(0)
self.reservoir_surface_area = np.empty(0)
self.reservoir_storage_capacity = np.empty(0)
self.reservoir_depth = np.empty(0)
self.reservoir_use_irrigation = np.empty(0)
self.reservoir_use_electricity = np.empty(0)
self.reservoir_use_supply = np.empty(0)
self.reservoir_use_flood_control = np.empty(0)
self.reservoir_use_recreation = np.empty(0)
self.reservoir_use_navigation = np.empty(0)
self.reservoir_use_fish_protection = np.empty(0)
self.reservoir_withdrawal = np.empty(0)
self.reservoir_conveyance = np.empty(0)
self.reservoir_count = np.empty(0)
self.reservoir_to_grid_mapping = pd.DataFrame()
self.reservoir_streamflow_schedule = xr.DataArray()
self.reservoir_demand_schedule = xr.DataArray()
self.reservoir_prerelease_schedule = xr.DataArray()

logging.info('Loading grid file.')

# open dataset
@@ -316,4 +339,111 @@ def __init__(self, config: Benedict = None, parameters: Parameters = None, empty
# note that reservoir grid is assumed to be the same as the domain grid
if config.get('water_management.enabled', False):
logging.debug(' - reservoirs')
load_reservoirs(self, config, parameters)
load_reservoirs(self, config, parameters)

def to_files(self, path: str) -> None:
"""Builds a dataframe from all the grid values.
Args:
path (str): the file path to save the grid zip file to
"""

if not path.endswith('.zip'):
path += '.zip'

keys = dir(self)

paths = []
names = []

# handle special cases
special = ['unique_longitudes', 'unique_latitudes', 'cell_count', 'longitude_spacing', 'latitude_spacing']
to_pickle = {}
for key in special:
keys.remove(key)
to_pickle[key] = getattr(self, key)

        # handle numpy arrays
npdf = pd.DataFrame()
for key in [key for key in keys if isinstance(getattr(self, key), np.ndarray)]:
if getattr(self, key).size > 0:
npdf[key] = getattr(self, key)

# handle dataframes
dfs = []
for key in [key for key in keys if isinstance(getattr(self, key), pd.DataFrame)]:
if getattr(self, key).size > 0:
dfs.append({
'key': key,
'frame': getattr(self, key)
})

# handle xarrays
xrs = []
for key in [key for key in keys if isinstance(getattr(self, key), xr.DataArray)]:
if getattr(self, key).size > 0:
xrs.append({
'key': key,
'data_array': getattr(self, key)
})

# write them all to files and zip
with tempfile.TemporaryDirectory() as tmpdir:
names.append('special.pickle')
paths.append(f'{tmpdir}/{names[-1]}')
with open(paths[-1], 'wb') as file:
pickle.dump(to_pickle, file)
names.append('np.feather')
paths.append(f'{tmpdir}/{names[-1]}')
npdf.to_feather(paths[-1])
for df in dfs:
names.append(f'{df["key"]}.df.nc')
paths.append(f'{tmpdir}/{names[-1]}')
df['frame'].to_xarray().to_netcdf(paths[-1], engine='h5netcdf')
for ds in xrs:
names.append(f'{ds["key"]}.xr.nc')
paths.append(f'{tmpdir}/{names[-1]}')
ds['data_array'].to_netcdf(paths[-1], engine='h5netcdf')
with ZipFile(path, 'w', compression=ZIP_DEFLATED, compresslevel=9) as zip:
for i, filename in enumerate(paths):
zip.write(filename, names[i])

@staticmethod
def from_files(path: str) -> 'Grid':
"""Creates a Grid instance from columns in a dataframe.
Args:
path (str): the file path to the zip file to load the grid from
Returns:
            Grid: a Grid instance populated from the files in the archive
"""
if not path.endswith('.zip'):
path += '.zip'

grid = Grid(empty=True)

with ZipFile(path, 'r') as zip:
for filename in zip.namelist():
with zip.open(filename) as file:
if filename.endswith('.pickle'):
from_pickle = pickle.load(file)
for key in from_pickle.keys():
setattr(grid, key, from_pickle[key])
if filename.endswith('np.feather'):
npdf = pd.read_feather(file)
for key in npdf.columns:
setattr(grid, key, npdf[key].values)
if filename.endswith('df.nc'):
key = filename.split('.')[0]
ds = xr.open_dataset(file, engine='h5netcdf')
df = ds.to_dataframe()
setattr(grid, key, df)
ds.close()
if filename.endswith('xr.nc'):
key = filename.split('.')[0]
ds = xr.open_dataarray(file, engine='h5netcdf').load()
setattr(grid, key, ds)
ds.close()

return grid
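
The new `to_files` and `from_files` methods allow a fully parsed grid to be cached and restored without re-reading the input NetCDF files. A minimal usage sketch, assuming a model has already been initialized as `mosart_wm`:

```python
from mosartwmpy.grid.grid import Grid

# cache the grid built during a previous initialization to a zip archive
mosart_wm.grid.to_files('./grid_cache.zip')

# later (or in another process), restore the grid without re-parsing inputs
grid = Grid.from_files('./grid_cache.zip')
```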
64 changes: 35 additions & 29 deletions mosartwmpy/model.py
@@ -59,7 +59,7 @@ def __init__(self):
self.git_hash = None
self.git_untracked = None

def initialize(self, config_file_path: str) -> None:
def initialize(self, config_file_path: str, grid: Grid = None, state: State = None) -> None:

t = timer()

@@ -98,38 +98,44 @@ def initialize(self, config_file_path: str) -> None:
raise e

# load grid
try:
self.grid = Grid(config=self.config, parameters=self.parameters)
except Exception as e:
logging.exception('Failed to load grid file; see below for stacktrace.')
raise e
if grid is not None:
self.grid = grid
else:
try:
self.grid = Grid(config=self.config, parameters=self.parameters)
except Exception as e:
logging.exception('Failed to load grid file; see below for stacktrace.')
raise e

# load restart file or initialize state
try:
# restart file
if self.config.get('simulation.restart_file') is not None and self.config.get('simulation.restart_file') != '':
path = self.config.get('simulation.restart_file')
logging.info(f'Loading restart file from: `{path}`.')
# set simulation start time based on file name
date = re.search(r'\d{4}_\d{2}_\d{2}', path)
if date:
date = date[len(date) - 1].split('_')
self.current_time = datetime(int(date[0]), int(date[1]), int(date[2]))
if state is not None:
self.state = state
else:
try:
# restart file
if self.config.get('simulation.restart_file') is not None and self.config.get('simulation.restart_file') != '':
path = self.config.get('simulation.restart_file')
logging.info(f'Loading restart file from: `{path}`.')
# set simulation start time based on file name
date = re.search(r'\d{4}_\d{2}_\d{2}', path)
if date:
date = date[len(date) - 1].split('_')
self.current_time = datetime(int(date[0]), int(date[1]), int(date[2]))
else:
logging.warn('Unable to parse date from restart file name, falling back to configured start date.')
self.current_time = datetime.combine(self.config.get('simulation.start_date'), time.min)
x = open_dataset(path)
self.state = State.from_dataframe(x.to_dataframe())
x.close()
                    # TODO ensure output file writes still work
else:
logging.warn('Unable to parse date from restart file name, falling back to configured start date.')
# simulation start time
self.current_time = datetime.combine(self.config.get('simulation.start_date'), time.min)
x = open_dataset(path)
self.state = State.from_dataframe(x.to_dataframe())
x.close()
# TODO ensure output file writes still workr
else:
# simulation start time
self.current_time = datetime.combine(self.config.get('simulation.start_date'), time.min)
# initialize state
self.state = State(grid=self.grid, config=self.config, parameters=self.parameters, grid_size=self.get_grid_size())
except Exception as e:
logging.exception('Failed to initialize model; see below for stacktrace.')
raise e
# initialize state
self.state = State(grid=self.grid, config=self.config, parameters=self.parameters, grid_size=self.get_grid_size())
except Exception as e:
logging.exception('Failed to initialize model; see below for stacktrace.')
raise e

# setup output file averaging
try:
(remainder of diff not shown)

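These changes also allow a preloaded grid (and optionally a preconstructed state) to be injected into `initialize`, which is useful when running many simulations that share the same grid. A hedged sketch follows; the `Model` class name and import path are assumptions, and the config file path is a placeholder:

```python
from mosartwmpy.model import Model  # assumed import path for the model class
from mosartwmpy.grid.grid import Grid

# restore a previously cached grid (see the to_files/from_files sketch above)
grid = Grid.from_files('./grid_cache.zip')

# passing the preloaded grid lets initialize skip the grid-loading step;
# state is left as None, so a fresh state is initialized from the grid
mosart_wm = Model()
mosart_wm.initialize('./config.yaml', grid=grid)
```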