Better names for classes in code #2

mnlevy1981 · 2018-11-28T18:28:42Z

Issue Description

Right now, the way we have organized the data in our various classes is rather confusing. @klindsay28 and I chatted a bit about how the data is organized, and I think that the following is an accurate representation of the result of that conversation.

Current Hierarchy of Classes

1. Data is stored in an xarray dataset
2. A "collection" is a container class that has the xarray dataset and the following
   - Location of data on disk
   - Source of data: CESM output, observational data, etc.
   - Type of data: history files, time series, climatology, etc.
   - Role of data: should dataset be used as a reference for "truth"?
   - Years of data requested for analysis
   - Variables from the dataset to include in analysis
   - Operations to perform on data
3. An "analysis element" is a container class that has a dictionary of "collections" and the following
   - Description of analysis being done
   - Variables to analyze
   - Plots / tables requested
   - Where on disk to write output
   - Horizontal grid / vertical levels to use for output (if needed)
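
To make the current layout concrete, here is a rough sketch of it as Python dataclasses. The attribute names and the example labels ("cesm", "hist", and so on) are assumptions for illustration and are not taken from the actual code; the dataset is typed as `Any` so the sketch runs without xarray.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Collection:
    """Current design: the dataset plus everything attached to it."""

    dataset: Any  # an xarray dataset in the real code (Any keeps the sketch dependency-free)
    location: str  # location of data on disk
    source: str  # e.g. "cesm" or "obs" (labels are assumptions)
    data_type: str  # e.g. "hist", "tseries", "climo" (labels are assumptions)
    is_reference: bool = False  # should this dataset be treated as "truth"?
    years: Optional[Tuple[int, int]] = None  # years of data requested for analysis
    variables: List[str] = field(default_factory=list)  # variables to include
    operations: List[str] = field(default_factory=list)  # operations to perform


@dataclass
class AnalysisElement:
    """Current design: a dictionary of collections plus analysis settings."""

    description: str
    collections: Dict[str, Collection]
    variables: List[str]  # variables to analyze
    outputs: List[str]  # plots / tables requested
    output_dir: str  # where on disk to write output
    grid: Optional[str] = None  # horizontal grid / vertical levels, if needed


# Hypothetical usage: note that the variable list appears at both levels.
model = Collection(dataset=None, location="/path/to/model", source="cesm",
                   data_type="hist", variables=["TEMP"], operations=["climatology"])
element = AnalysisElement(description="surface state", collections={"model": model},
                          variables=["TEMP"], outputs=["plot"], output_dir="/path/to/out")
```

In this sketch, variables, years, the reference role, and operations all live on each collection, while the analysis element carries its own variable list as well.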

Proposed Hierarchy of Classes

1. Data is stored in an xarray dataset
2. There is a container class that has an xarray dataset and metadata
   - Location of data on disk
   - Source of data: CESM output, observational data, etc.
   - Type of data: history files, time series, climatology, etc.
3. There is a dictionary of these container class objects (a "collection")
4. There is a container class of the collection that contains information like
   - Description of analysis being done
   - Variables to analyze
   - Time period to use for analysis (default = all available data)
   - Which collection should be considered "truth"?
   - Plots / tables requested
   - Where on disk to write output
   - Horizontal grid / vertical levels to use for output (if needed)
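
The proposed layout could look roughly like the following. `DatasetContainer` and `CollectionContainer` are placeholder names (naming is exactly what this issue is about), the attribute names are assumptions, and the check in `__post_init__` is an extra added for illustration rather than something discussed above.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class DatasetContainer:
    """Proposed: an xarray dataset plus metadata only (placeholder name)."""

    dataset: Any  # an xarray dataset in the real code
    location: str  # location of data on disk
    source: str  # CESM output, observational data, etc.
    data_type: str  # history files, time series, climatology, etc.


@dataclass
class CollectionContainer:
    """Proposed: the collection plus all analysis settings (placeholder name)."""

    description: str
    collection: Dict[str, DatasetContainer]  # the dictionary of containers
    variables: List[str]  # the only place variables are specified
    outputs: List[str]  # plots / tables requested
    output_dir: str  # where on disk to write output
    time_period: Optional[Tuple[int, int]] = None  # None = all available data
    reference: Optional[str] = None  # key of the entry considered "truth"
    grid: Optional[str] = None  # horizontal grid / vertical levels, if needed

    def __post_init__(self) -> None:
        # Illustrative check (an assumption): the "truth" entry must exist.
        if self.reference is not None and self.reference not in self.collection:
            raise KeyError(f"reference {self.reference!r} is not in the collection")


# Hypothetical usage with made-up paths and labels.
obs = DatasetContainer(dataset=None, location="/path/to/obs", source="obs", data_type="climo")
run = DatasetContainer(dataset=None, location="/path/to/run", source="cesm", data_type="hist")
analysis = CollectionContainer(description="surface state",
                               collection={"obs": obs, "run": run},
                               variables=["TEMP"], outputs=["plot"],
                               output_dir="/path/to/out", reference="obs")
```

Compared with the current hierarchy, the per-dataset container shrinks to metadata, and the time period and the choice of reference move up to the top-level container.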

Notable changes

1. The container class of collections is the only source of variables to analyze.
   - Users will specify at the highest level if a variable should be ignored in some datasets (but included in others)
   - For the time being, requesting a variable that is not actually in the dataset will result in an error. We can open another issue ticket to discuss whether this is the best behavior, or if missing variables should just be omitted from analysis.
2. The container class for the xarray dataset no longer explicitly specifies what operations are needed on the dataset -- this should be determined by the operations requested in the container class of collections.
   - E.g. plot_state knows it needs monthly climatologies, so it will request monthly climatologies of each dataset. It's up to the collection of datasets to know if the data is already a climatology or if another operation is needed.
3. The container class for the xarray dataset no longer explicitly specifies which dataset should be used as reference for the truth.
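
As a sketch of the first two changes: the function names, the `ignore` mapping, and the "climo" label below are all hypothetical, and the real code may organize this differently.

```python
from typing import Dict, Iterable, List, Optional, Set


def variables_for_dataset(
    name: str,
    available: Iterable[str],
    requested: List[str],
    ignore: Optional[Dict[str, Set[str]]] = None,
) -> List[str]:
    """Return the variables to analyze for one dataset.

    `requested` comes from the top-level container; `ignore` maps a dataset
    name to variables that should be skipped for that dataset only.
    """
    skip = (ignore or {}).get(name, set())
    wanted = [var for var in requested if var not in skip]
    available_set = set(available)
    missing = [var for var in wanted if var not in available_set]
    if missing:
        # Current behavior per this issue: a missing variable is an error.
        raise KeyError(f"{name} is missing requested variables: {missing}")
    return wanted


def needs_monthly_climatology(data_type: str) -> bool:
    """True if the data still has to be averaged into a monthly climatology."""
    return data_type != "climo"  # "climo" is an assumed label


# Hypothetical usage: SALT is requested overall but ignored for the obs dataset.
requested = ["TEMP", "SALT"]
ignore = {"obs": {"SALT"}}
obs_vars = variables_for_dataset("obs", ["TEMP"], requested, ignore)
run_vars = variables_for_dataset("run", ["TEMP", "SALT", "UVEL"], requested, ignore)
```

If the decision is later made to omit missing variables instead of raising, only the `if missing:` branch would need to change.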

The issue is that we need names for the container classes. I think the container of datasets can now be called a data_source because the constructor only needs information about where the dataset should be read from and what kind of data it is. The compute_monthly_climatology function still exists at this level, although at some point we will replace the xarray dataset object with the esmlab dataset object and then the compute_monthly_climatology function will come from esmlab.
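
For illustration only, a data_source along these lines might look like the sketch below. The constructor arguments, the "climo" label, and the optional `dataset` argument (used here so the sketch does no file I/O) are assumptions; the climatology uses the usual xarray group-by-month idiom and assumes the dataset has a coordinate named `time`.

```python
from typing import Any, Optional


class data_source:
    """Hypothetical sketch: where the data lives, what kind of data it is."""

    def __init__(self, location: str, source: str, data_type: str,
                 dataset: Optional[Any] = None) -> None:
        self.location = location  # where the dataset should be read from
        self.source = source  # e.g. CESM output or observational data
        self.data_type = data_type  # e.g. history files, time series, climatology
        # The real code would open the data from `location`; passing the
        # dataset in directly is an assumption that keeps this sketch I/O-free.
        self.dataset = dataset

    def compute_monthly_climatology(self) -> Any:
        """Return a monthly climatology of the dataset."""
        if self.dataset is None:
            raise ValueError("no dataset has been loaded")
        if self.data_type == "climo":
            # Already a climatology, so no further operation is needed.
            return self.dataset
        # For an xarray dataset: group by calendar month, average over time.
        return self.dataset.groupby("time.month").mean("time")
```

Once the dataset object comes from esmlab, the body of `compute_monthly_climatology` would presumably reduce to a call into esmlab.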

I don't have a good name for the container of collections + metadata... we've been calling it an analysis element, but it's odd to call something that contains multiple collections an element. Maybe it's a diagnostic evaluation object?

mnlevy1981 · 2018-12-03T23:36:11Z

As of 2e0733c, the code is mostly in line with the proposed hierarchy above. The time period for the analysis is still part of the collection class (level 2 in the list of four levels), but everything else is tied to the correct component. I'm going to rename collection -> data_source and collections -> data_sources (though it will be hard to break the habit of referring to multiple data sources as a single collection).

I haven't yet come up with a good name for what we currently call the analysis element.

mnlevy1981 mentioned this issue Mar 8, 2019

Massive refactor in python #13

Open
