Right now, the way we have organized the data in our various classes is rather confusing. @klindsay28 and I chatted a bit about how the data is organized, and I think that the following is an accurate representation of the result of that conversation.
Current Hierarchy of Classes
1. Data is stored in an xarray dataset.
2. A "collection" is a container class that has the xarray dataset and the following:
   - Location of data on disk
   - Source of data: CESM output, observational data, etc.
   - Type of data: history files, time series, climatology, etc.
   - Role of data: should the dataset be used as a reference for "truth"?
   - Years of data requested for analysis
   - Variables from the dataset to include in analysis
   - Operations to perform on data
3. An "analysis element" is a container class that has a dictionary of "collections" and the following:
   - Description of analysis being done
   - Variables to analyze
   - Plots / tables requested
   - Where on disk to write output
   - Horizontal grid / vertical levels to use for output (if needed)
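A minimal sketch of the current hierarchy as Python dataclasses may make the overlap easier to see. The class, field, and method names below (`Collection`, `AnalysisElement`, `is_reference`, `reference_collections`) are hypothetical stand-ins, not the project's actual code, and the dataset is typed as `Any` so the sketch runs without xarray. Note that a list of variables appears at both levels and the "truth" role is a flag on each individual collection.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Collection:
    """Current "collection": one dataset plus everything about how to use it."""
    dataset: Any                  # an xarray.Dataset in the real code
    location: str                 # location of data on disk
    source: str                   # e.g. "CESM output", "observations"
    data_type: str                # e.g. "history", "time series", "climatology"
    is_reference: bool = False    # role: treat this dataset as "truth"?
    years: Optional[List[int]] = None                    # years requested
    variables: List[str] = field(default_factory=list)   # variables to include
    operations: List[str] = field(default_factory=list)  # operations to perform


@dataclass
class AnalysisElement:
    """Current "analysis element": a dict of collections plus analysis metadata."""
    description: str
    collections: Dict[str, Collection]
    variables: List[str]          # a second, top-level list of variables
    plots: List[str]
    output_dir: str
    grid: Optional[str] = None    # horizontal grid / vertical levels, if needed

    def reference_collections(self) -> List[str]:
        # "Truth" is a per-collection flag, so nothing stops zero or
        # several collections from claiming the role at the same time.
        return [name for name, c in self.collections.items() if c.is_reference]
```

Because the reference flag lives on each collection, the analysis element has to search for it and cannot guarantee that exactly one dataset is "truth".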
Proposed Hierarchy of Classes
1. Data is stored in an xarray dataset.
2. There is a container class that has an xarray dataset and metadata:
   - Location of data on disk
   - Source of data: CESM output, observational data, etc.
   - Type of data: history files, time series, climatology, etc.
3. There is a dictionary of these container class objects (a "collection").
4. There is a container class of the collection that contains information like:
   - Description of analysis being done
   - Variables to analyze
   - Time period to use for analysis (default = all available data)
   - Which collection should be considered "truth"?
   - Plots / tables requested
   - Where on disk to write output
   - Horizontal grid / vertical levels to use for output (if needed)
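The four proposed levels could be sketched as follows. Every name here (`DataSource`, `Collection`, `DiagnosticEvaluation`, `reference`, `time_period`) is a hypothetical placeholder rather than the project's code, and the dataset is again typed as `Any` so the sketch runs without xarray.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class DataSource:
    """Level 2: a dataset plus where it came from and what kind of data it is."""
    dataset: Any      # level 1: an xarray.Dataset in the real code
    location: str     # location of data on disk
    source: str       # e.g. "CESM output", "observations"
    data_type: str    # e.g. "history", "time series", "climatology"


# Level 3: a "collection" is simply a dict of DataSource objects keyed by name.
Collection = Dict[str, DataSource]


@dataclass
class DiagnosticEvaluation:  # placeholder name; the issue has not settled on one
    """Level 4: the collection plus everything about the analysis itself."""
    description: str
    collection: Collection
    variables: List[str]      # the only list of variables to analyze
    reference: str            # key of the data source treated as "truth"
    time_period: Optional[Tuple[str, str]] = None  # None = all available data
    plots: List[str] = field(default_factory=list)
    output_dir: str = "."
    grid: Optional[str] = None  # horizontal grid / vertical levels, if needed

    def __post_init__(self) -> None:
        # With a single `reference` field there is exactly one "truth",
        # and it must name one of the data sources in the collection.
        if self.reference not in self.collection:
            raise KeyError(f"reference {self.reference!r} is not in the collection")
```

Making the reference a single top-level field, instead of a flag on every data source, means the "exactly one truth" rule can be checked in one place.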
Notable changes
- The container class of collections is the only source of variables to analyze.
  - Users will specify at the highest level if a variable should be ignored in some datasets (but included in others).
  - For the time being, requesting a variable that is not actually in the dataset will result in an error. We can open another issue to discuss whether this is the best behavior, or if missing variables should just be omitted from analysis.
- The container class for the xarray dataset no longer explicitly specifies what operations are needed on the dataset; this should be determined by the operations requested in the container class of collections.
  - E.g. plot_state knows it needs monthly climatologies, so it will request monthly climatologies of each dataset. It's up to the collection of datasets to know if the data is already a climatology or if another operation is needed.
- The container class for the xarray dataset no longer explicitly specifies which dataset should be used as the reference for "truth".
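One way the variable rules above could work is sketched below, assuming the top-level container stores per-variable exclusions as a dict mapping a variable name to the data sources where it should be skipped. That format and the function name `variables_for_source` are hypothetical; the issue only says exclusions are specified at the highest level.

```python
from typing import Iterable, List, Mapping


def variables_for_source(
    source_name: str,
    available: Iterable[str],
    requested: List[str],
    skip: Mapping[str, List[str]],
) -> List[str]:
    """Return the requested variables to read from one data source.

    `requested` comes only from the top-level container; `skip` maps a
    variable name to the data sources in which it should be ignored
    (assumed format). A requested variable that is neither skipped nor
    present raises KeyError, the "error for now" behavior.
    """
    available_set = set(available)
    selected = []
    for var in requested:
        if source_name in skip.get(var, []):
            continue  # user asked to ignore this variable for this source
        if var not in available_set:
            raise KeyError(f"variable {var!r} not found in data source {source_name!r}")
        selected.append(var)
    return selected
```

Omitting missing variables instead, the alternative left for a separate issue, would amount to replacing the `raise` with `continue`.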
The issue is that we need names for the container classes. I think the container class for the xarray dataset can now be called a data_source, because the constructor only needs information about where the dataset should be read from and what kind of data it is. The compute_monthly_climatology function still exists at this level, although at some point we will replace the xarray dataset object with the esmlab dataset object, and then the compute_monthly_climatology function will come from esmlab.
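A sketch of how plot_state and a data_source could split the work on climatologies: the caller always asks for a monthly climatology, and the data source decides whether any averaging is needed. The class shape and the `data_type` labels ("time_series", "climatology") are assumptions; the averaging uses plain xarray `groupby("time.month").mean("time")` as a stand-in until the function comes from esmlab.

```python
import numpy as np
import pandas as pd
import xarray as xr


class DataSource:
    """Hypothetical data_source: a dataset plus what kind of data it is."""

    def __init__(self, dataset: xr.Dataset, data_type: str):
        self.dataset = dataset
        self.data_type = data_type  # assumed labels: "time_series" or "climatology"

    def compute_monthly_climatology(self) -> xr.Dataset:
        # Already a climatology: nothing to do, return the data unchanged.
        if self.data_type == "climatology":
            return self.dataset
        # Otherwise average all Januaries together, all Februaries, etc.
        return self.dataset.groupby("time.month").mean("time")


# Example: two years of monthly data collapse to 12 monthly means.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
ds = xr.Dataset({"TEMP": ("time", np.arange(24.0))}, coords={"time": time})
clim = DataSource(ds, "time_series").compute_monthly_climatology()
print(clim.sizes["month"])  # 12
```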
I don't have a good name for the container of collections + metadata... we've been calling it an analysis element, but it's odd to call something that contains multiple collections an element. Maybe it's a diagnostic evaluation object?
As of 2e0733c, the code is mostly in line with the proposed hierarchy above - the time period for the analysis is currently still part of the collection class (corresponding to 2 in the list of four levels) but everything else is tied to the correct component. I'm going to rename collection -> data_source and similarly collections -> data_sources (though it will be hard to break the habit of referring to multiple data sources as a single collection).
I haven't yet come up with a good name for what we currently call the analysis element.