
Commit

Merge pull request #20 from IMMM-SFA/feature/testing_and_validation
add basic test, update validation, update readme
thurber authored Feb 18, 2021
2 parents c78bf81 + dc591f5 commit bc45a2a
Showing 17 changed files with 291 additions and 121 deletions.
24 changes: 21 additions & 3 deletions README.md
@@ -41,10 +41,28 @@ Alternatively, one can update the settings via code in the driving script:
mosart_wm.config['simulation.end_date'] = datetime(1985, 12, 31)
```

By default, key model variables are output on a monthly basis at a daily averaged resolution to `./output/<simulation name>/<simulation name>_<year>_<month>.nc`. Support for the [CSDMS standard names](https://github.com/csdms/standard_names) will be added shortly, but for now see configuration file and the `./src/_initialize_state.py` file for examples of how to modify the outputs.
By default, key model variables are output on a monthly basis at a daily averaged resolution to `./output/<simulation name>/<simulation name>_<year>_<month>.nc`. Support for the [CSDMS standard names](https://github.com/csdms/standard_names) will be added shortly, but for now see the configuration file for examples of how to modify the outputs, and the `./src/_initialize_state.py` file for the state variable names.
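A quick way to inspect one of these monthly files is with xarray. This is a minimal sketch; the simulation name, year, and month in the path are placeholders following the pattern above, and the variables present depend on the output configuration:

```python
import xarray as xr

# "tutorial" is a placeholder simulation name; adjust the path to match
# your configured simulation name and the month you want to inspect
output_file = './output/tutorial/tutorial_1981_05.nc'

with xr.open_dataset(output_file) as ds:
    print(ds)            # summary of dimensions, coordinates, and variables
    print(ds.data_vars)  # the variables written for this month
```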


### input files

More details will be added to this section soon.
Several input files in NetCDF format are required to successfully run a simulation; they are not shipped with this repository due to their large size. The grid files, reservoir files, and a small range of runoff and demand input files are available for public download as a zip archive [here](https://zenodo.org/record/4537907/files/mosartwmpy_sample_input_data_1980_1985.zip?download=1). This data can also be obtained using the download utility by running `python download.py` in the repository root and choosing option 1 for "sample_input". Currently, all input files are assumed to be at the same resolution (for the sample files, this is 1/8 degree over the CONUS). Below is a summary of the various input files, with a sketch after the table showing how to point the configuration at them:

Name | Description | Configuration Path | Notes
--- | --- | --- | ---
Grid | Spatial constants dimensioned by latitude and longitude relating to the physical properties of the river channels | `grid.path` |
Land Fraction | Fraction of grid cell that is land (as opposed to, e.g., ocean water), dimensioned by latitude and longitude | `grid.land.path` | as a TODO item, this variable should be merged into the grid file (historically it was separate for the coupled land model)
Reservoirs | Locations of reservoirs (possibly aggregated) and their physical and political properties | `water_management.reservoirs.path` |
Runoff | Surface runoff, subsurface runoff, and wetland runoff per grid cell averaged per unit of time; used to drive the river routing | `runoff.path` |
Demand | Water demand of grid cells averaged per unit of time; currently assumed to be monthly | `water_management.reservoirs.demand` | there are plans to support other time scales, such as epiweeks
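
The configuration paths in the table can be set from the driving script in the same way as the simulation dates shown earlier. In the sketch below the keys come from the table, but the file names are placeholders for wherever the sample data has been unzipped:

```python
# point the model at the input files; these file paths are placeholders and
# should be replaced with the actual locations of the downloaded data
mosart_wm.config['grid.path'] = './input/grid.nc'
mosart_wm.config['grid.land.path'] = './input/land_fraction.nc'
mosart_wm.config['water_management.reservoirs.path'] = './input/reservoirs.nc'
mosart_wm.config['runoff.path'] = './input/runoff_1980_1985.nc'
mosart_wm.config['water_management.reservoirs.demand'] = './input/demand_1980_1985.nc'
```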


### testing and validation

Before running the tests or validation, make sure to download the "sample_input" and "validation" datasets using the download utility `python download.py`.

To execute the tests, run `./test.sh` or `python -m unittest discover mosartwmpy/tests` from the repository root.

To execute the validation, run a model simulation that includes the years 1981 and 1982, note your output directory, and then run `./validation.sh` or `python validation/validate.py` from the repository root. The script will ask for your simulation output directory, think for a moment, and then open a figure with several plots showing the NMAE (Normalized Mean Absolute Error) as a percentage and the spatial sums of several key variables, compared between your simulation and the validation scenario. Use these plots to determine whether the changes you have made to the code have caused unintended deviation from the validation scenario. The NMAE should be 0% across time if you have caused no deviations. A non-zero NMAE indicates a numerical difference between your simulation and the validation scenario; this might be caused by changes to the code, or by running the simulation with a different configuration or parameters (e.g. a larger timestep, fewer iterations, etc.). The plots of the spatial sums can help you determine what changed and the overall magnitude of the changes.
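
For reference, here is a minimal sketch of one common formulation of this metric; the exact normalization used by `validation/validate.py` may differ, and the dimension names are assumptions:

```python
import numpy as np
import xarray as xr

def nmae_percent(simulated: xr.DataArray, validated: xr.DataArray) -> xr.DataArray:
    """Normalized mean absolute error per timestep, as a percentage.

    Assumes both arrays share (time, lat, lon) dimensions; normalizing by the
    mean absolute validated value is one common convention and may not match
    the convention used by the validation script.
    """
    abs_error = np.abs(simulated - validated).mean(dim=['lat', 'lon'])
    scale = np.abs(validated).mean(dim=['lat', 'lon'])
    return 100.0 * abs_error / scale
```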

Several input files are required to successfuly run a simulation, which are not shipped with this repository due to their large size. For now, please reach out to the developers for sample input files. A script will soon be added to download a small subset of input files and validated output files from a public data repository. If you have access to PNNL's Constance, you can download a small subset of inputs at: `/people/thur961/mosartwmpy_input.zip`.
If you wish to merge code changes that intentionally cause significant deviation from the validation scenario, please work with the maintainers to create a new validation dataset.
220 changes: 175 additions & 45 deletions mosartwmpy/grid/grid.py
@@ -1,17 +1,83 @@
import logging
import numpy as np
import pandas as pd
import pickle
import tempfile
import xarray as xr

from xarray import open_dataset
from benedict.dicts import benedict as Benedict
from xarray import open_dataset
from zipfile import ZIP_DEFLATED, ZipFile

from mosartwmpy.config.parameters import Parameters
from mosartwmpy.reservoirs.grid import load_reservoirs

class Grid():
"""Class to store grid related values that are constant throughout a simulation."""

# initialize all properties
id: np.ndarray = np.empty(0)
downstream_id: np.ndarray = np.empty(0)
longitude: np.ndarray = np.empty(0)
latitude: np.ndarray = np.empty(0)
unique_longitudes: np.ndarray = np.empty(0)
unique_latitudes: np.ndarray = np.empty(0)
cell_count: int = 0
longitude_spacing: float = 0.0
latitude_spacing: float = 0.0
land_mask: np.ndarray = np.empty(0)
mosart_mask: np.ndarray = np.empty(0)
area: np.ndarray = np.empty(0)
outlet_id: np.ndarray = np.empty(0)
upstream_id: np.ndarray = np.empty(0)
upstream_cell_count: np.ndarray = np.empty(0)
land_fraction: np.ndarray = np.empty(0)
iterations_main_channel: np.ndarray = np.empty(0)
iterations_subnetwork: np.ndarray = np.empty(0)
drainage_fraction: np.ndarray = np.empty(0)
local_drainage_area: np.ndarray = np.empty(0)
total_drainage_area_multi: np.ndarray = np.empty(0)
total_drainage_area_single: np.ndarray = np.empty(0)
flow_direction: np.ndarray = np.empty(0)
hillslope_manning: np.ndarray = np.empty(0)
subnetwork_manning: np.ndarray = np.empty(0)
channel_manning: np.ndarray = np.empty(0)
hillslope: np.ndarray = np.empty(0)
hillslope_length: np.ndarray = np.empty(0)
drainage_density: np.ndarray = np.empty(0)
subnetwork_slope: np.ndarray = np.empty(0)
subnetwork_width: np.ndarray = np.empty(0)
subnetwork_length: np.ndarray = np.empty(0)
channel_length: np.ndarray = np.empty(0)
channel_slope: np.ndarray = np.empty(0)
channel_width: np.ndarray = np.empty(0)
channel_floodplain_width: np.ndarray = np.empty(0)
total_channel_length: np.ndarray = np.empty(0)
grid_channel_depth: np.ndarray = np.empty(0)

# Reservoir related properties
reservoir_id: np.ndarray = np.empty(0)
reservoir_runoff_capacity: np.ndarray = np.empty(0)
reservoir_height: np.ndarray = np.empty(0)
reservoir_length: np.ndarray = np.empty(0)
reservoir_surface_area: np.ndarray = np.empty(0)
reservoir_storage_capacity: np.ndarray = np.empty(0)
reservoir_depth: np.ndarray = np.empty(0)
reservoir_use_irrigation: np.ndarray = np.empty(0)
reservoir_use_electricity: np.ndarray = np.empty(0)
reservoir_use_supply: np.ndarray = np.empty(0)
reservoir_use_flood_control: np.ndarray = np.empty(0)
reservoir_use_recreation: np.ndarray = np.empty(0)
reservoir_use_navigation: np.ndarray = np.empty(0)
reservoir_use_fish_protection: np.ndarray = np.empty(0)
reservoir_withdrawal: np.ndarray = np.empty(0)
reservoir_conveyance: np.ndarray = np.empty(0)
reservoir_count: np.ndarray = np.empty(0)
reservoir_to_grid_mapping: pd.DataFrame = pd.DataFrame()
reservoir_streamflow_schedule: xr.DataArray = xr.DataArray()
reservoir_demand_schedule: xr.DataArray = xr.DataArray()
reservoir_prerelease_schedule: xr.DataArray = xr.DataArray()

def __init__(self, config: Benedict = None, parameters: Parameters = None, empty: bool = False):
"""Initialize the Grid class.
@@ -25,49 +91,6 @@ def __init__(self, config: Benedict = None, parameters: Parameters = None, empty
if empty:
return

# initialize all properties
self.drainage_fraction = np.empty(0)
self.local_drainage_area = np.empty(0)
self.total_drainage_area_multi = np.empty(0)
self.total_drainage_area_single = np.empty(0)
self.id = np.empty(0)
self.downstream_id = np.empty(0)
self.flow_direction = np.empty(0)
self.hillslope_manning = np.empty(0)
self.subnetwork_manning = np.empty(0)
self.channel_manning = np.empty(0)
self.hillslope = np.empty(0)
self.drainage_density = np.empty(0)
self.subnetwork_slope = np.empty(0)
self.subnetwork_width = np.empty(0)
self.channel_length = np.empty(0)
self.channel_slope = np.empty(0)
self.channel_width = np.empty(0)
self.channel_floodplain_width = np.empty(0)
self.grid_channel_depth = np.empty(0)
if config.get('water_management.enabled', False):
self.reservoir_id = np.empty(0)
self.reservoir_runoff_capacity = np.empty(0)
self.reservoir_height = np.empty(0)
self.reservoir_length = np.empty(0)
self.reservoir_surface_area = np.empty(0)
self.reservoir_storage_capacity = np.empty(0)
self.reservoir_depth = np.empty(0)
self.reservoir_use_irrigation = np.empty(0)
self.reservoir_use_electricity = np.empty(0)
self.reservoir_use_supply = np.empty(0)
self.reservoir_use_flood_control = np.empty(0)
self.reservoir_use_recreation = np.empty(0)
self.reservoir_use_navigation = np.empty(0)
self.reservoir_use_fish_protection = np.empty(0)
self.reservoir_withdrawal = np.empty(0)
self.reservoir_conveyance = np.empty(0)
self.reservoir_count = np.empty(0)
self.reservoir_to_grid_mapping = pd.DataFrame()
self.reservoir_streamflow_schedule = xr.DataArray()
self.reservoir_demand_schedule = xr.DataArray()
self.reservoir_prerelease_schedule = xr.DataArray()

logging.info('Loading grid file.')

# open dataset
@@ -316,4 +339,111 @@ def __init__(self, config: Benedict = None, parameters: Parameters = None, empty
# note that reservoir grid is assumed to be the same as the domain grid
if config.get('water_management.enabled', False):
logging.debug(' - reservoirs')
load_reservoirs(self, config, parameters)
load_reservoirs(self, config, parameters)

def to_files(self, path: str) -> None:
"""Builds a dataframe from all the grid values.
Args:
path (str): the file path to save the grid zip file to
"""

if not path.endswith('.zip'):
path += '.zip'

keys = dir(self)

paths = []
names = []

# handle special cases
special = ['unique_longitudes', 'unique_latitudes', 'cell_count', 'longitude_spacing', 'latitude_spacing']
to_pickle = {}
for key in special:
keys.remove(key)
to_pickle[key] = getattr(self, key)

        # handle numpy arrays
npdf = pd.DataFrame()
for key in [key for key in keys if isinstance(getattr(self, key), np.ndarray)]:
if getattr(self, key).size > 0:
npdf[key] = getattr(self, key)

# handle dataframes
dfs = []
for key in [key for key in keys if isinstance(getattr(self, key), pd.DataFrame)]:
if getattr(self, key).size > 0:
dfs.append({
'key': key,
'frame': getattr(self, key)
})

# handle xarrays
xrs = []
for key in [key for key in keys if isinstance(getattr(self, key), xr.DataArray)]:
if getattr(self, key).size > 0:
xrs.append({
'key': key,
'data_array': getattr(self, key)
})

# write them all to files and zip
with tempfile.TemporaryDirectory() as tmpdir:
names.append('special.pickle')
paths.append(f'{tmpdir}/{names[-1]}')
with open(paths[-1], 'wb') as file:
pickle.dump(to_pickle, file)
names.append('np.feather')
paths.append(f'{tmpdir}/{names[-1]}')
npdf.to_feather(paths[-1])
for df in dfs:
names.append(f'{df["key"]}.df.nc')
paths.append(f'{tmpdir}/{names[-1]}')
df['frame'].to_xarray().to_netcdf(paths[-1], engine='h5netcdf')
for ds in xrs:
names.append(f'{ds["key"]}.xr.nc')
paths.append(f'{tmpdir}/{names[-1]}')
ds['data_array'].to_netcdf(paths[-1], engine='h5netcdf')
with ZipFile(path, 'w', compression=ZIP_DEFLATED, compresslevel=9) as zip:
for i, filename in enumerate(paths):
zip.write(filename, names[i])

@staticmethod
def from_files(path: str) -> 'Grid':
"""Creates a Grid instance from columns in a dataframe.
Args:
path (str): the file path to the zip file to load the grid from
Returns:
            Grid: a Grid instance populated from the files in the archive
"""
if not path.endswith('.zip'):
path += '.zip'

grid = Grid(empty=True)

with ZipFile(path, 'r') as zip:
for filename in zip.namelist():
with zip.open(filename) as file:
if filename.endswith('.pickle'):
from_pickle = pickle.load(file)
for key in from_pickle.keys():
setattr(grid, key, from_pickle[key])
if filename.endswith('np.feather'):
npdf = pd.read_feather(file)
for key in npdf.columns:
setattr(grid, key, npdf[key].values)
if filename.endswith('df.nc'):
key = filename.split('.')[0]
ds = xr.open_dataset(file, engine='h5netcdf')
df = ds.to_dataframe()
setattr(grid, key, df)
ds.close()
if filename.endswith('xr.nc'):
key = filename.split('.')[0]
ds = xr.open_dataarray(file, engine='h5netcdf').load()
setattr(grid, key, ds)
ds.close()

return grid
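
The new `to_files` and `from_files` methods allow a fully parsed grid to be cached and restored without re-reading the input NetCDF files. A minimal usage sketch, assuming a model has already been initialized as `mosart_wm`:

```python
from mosartwmpy.grid.grid import Grid

# cache the grid built during a previous initialization to a zip archive
mosart_wm.grid.to_files('./grid_cache.zip')

# later (or in another process), restore the grid without re-parsing inputs
grid = Grid.from_files('./grid_cache.zip')
```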
64 changes: 35 additions & 29 deletions mosartwmpy/model.py
@@ -59,7 +59,7 @@ def __init__(self):
self.git_hash = None
self.git_untracked = None

def initialize(self, config_file_path: str) -> None:
def initialize(self, config_file_path: str, grid: Grid = None, state: State = None) -> None:

t = timer()

@@ -98,38 +98,44 @@ def initialize(self, config_file_path: str) -> None:
raise e

# load grid
try:
self.grid = Grid(config=self.config, parameters=self.parameters)
except Exception as e:
logging.exception('Failed to load grid file; see below for stacktrace.')
raise e
if grid is not None:
self.grid = grid
else:
try:
self.grid = Grid(config=self.config, parameters=self.parameters)
except Exception as e:
logging.exception('Failed to load grid file; see below for stacktrace.')
raise e

# load restart file or initialize state
try:
# restart file
if self.config.get('simulation.restart_file') is not None and self.config.get('simulation.restart_file') != '':
path = self.config.get('simulation.restart_file')
logging.info(f'Loading restart file from: `{path}`.')
# set simulation start time based on file name
date = re.search(r'\d{4}_\d{2}_\d{2}', path)
if date:
date = date[len(date) - 1].split('_')
self.current_time = datetime(int(date[0]), int(date[1]), int(date[2]))
if state is not None:
self.state = state
else:
try:
# restart file
if self.config.get('simulation.restart_file') is not None and self.config.get('simulation.restart_file') != '':
path = self.config.get('simulation.restart_file')
logging.info(f'Loading restart file from: `{path}`.')
# set simulation start time based on file name
date = re.search(r'\d{4}_\d{2}_\d{2}', path)
if date:
date = date[len(date) - 1].split('_')
self.current_time = datetime(int(date[0]), int(date[1]), int(date[2]))
else:
logging.warn('Unable to parse date from restart file name, falling back to configured start date.')
self.current_time = datetime.combine(self.config.get('simulation.start_date'), time.min)
x = open_dataset(path)
self.state = State.from_dataframe(x.to_dataframe())
x.close()
                    # TODO ensure output file writes still work
else:
logging.warn('Unable to parse date from restart file name, falling back to configured start date.')
# simulation start time
self.current_time = datetime.combine(self.config.get('simulation.start_date'), time.min)
x = open_dataset(path)
self.state = State.from_dataframe(x.to_dataframe())
x.close()
# TODO ensure output file writes still workr
else:
# simulation start time
self.current_time = datetime.combine(self.config.get('simulation.start_date'), time.min)
# initialize state
self.state = State(grid=self.grid, config=self.config, parameters=self.parameters, grid_size=self.get_grid_size())
except Exception as e:
logging.exception('Failed to initialize model; see below for stacktrace.')
raise e
# initialize state
self.state = State(grid=self.grid, config=self.config, parameters=self.parameters, grid_size=self.get_grid_size())
except Exception as e:
logging.exception('Failed to initialize model; see below for stacktrace.')
raise e

# setup output file averaging
try:
(remainder of diff not shown)

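These changes also allow a preloaded grid (and optionally a preconstructed state) to be injected into `initialize`, which is useful when running many simulations that share the same grid. A hedged sketch follows; the `Model` class name and import path are assumptions, and the config file path is a placeholder:

```python
from mosartwmpy.model import Model  # assumed import path for the model class
from mosartwmpy.grid.grid import Grid

# restore a previously cached grid (see the to_files/from_files sketch above)
grid = Grid.from_files('./grid_cache.zip')

# passing the preloaded grid lets initialize skip the grid-loading step;
# state is left as None, so a fresh state is initialized from the grid
mosart_wm = Model()
mosart_wm.initialize('./config.yaml', grid=grid)
```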