diff --git a/appa.adoc b/appa.adoc index d59818f3..1ae85869 100644 --- a/appa.adoc +++ b/appa.adoc @@ -9,7 +9,16 @@ See <> for the grid mapping attributes, and <> for the distinction between **BI** and **BO**), and **-** for variables with some other purpose. +For variable attributes, the possible values of "Use" are: + +* **C** for variables containing coordinate data, +* **D** for data variables, +* **M** for geometry container variables, +* **Do** for domain variables, +* **BI** and **BO** for boundary variables (see <> for the distinction between **BI** and **BO**), +* **A** for aggregation variables (see <>), +* **-** for variables with some other purpose. + CF does not prohibit any of these attributes from being attached to variables of different kinds from those listed as their "Use" in this table, but their meanings are not defined by CF if they are used in these other ways. "Links" indicates the location of the attribute's original definition (first link) and sections where the attribute is discussed in this document (additional links as necessary). @@ -38,6 +47,18 @@ Attribute If both **`scale_factor`** and **`add_offset`** attributes are present, the data are first scaled before the offset is added. In cases where there is a strong constraint on dataset size, it is allowed to pack the coordinate variables (using add_offset and/or scale_factor), but this is not recommended in general. +| **`aggregated_data`** +| S +| A +| <> +| Records the aggregation instructions that define how to create the aggregated data of an aggregation variable. + +| **`aggregated_dimensions`** +| S +| A +| <> +| Identifies the dimensions of the aggregated data of an aggregation variable.
+ | **`ancillary_variables`** | S | D diff --git a/appl.adoc b/appl.adoc new file mode 100644 index 00000000..a2ba94bf --- /dev/null +++ b/appl.adoc @@ -0,0 +1,584 @@ +[[appendix-aggregation-examples, Appendix L, Aggregation Variable Examples]] + +[appendix] +== Aggregation Variable Examples + +This appendix contains examples of aggregation variables. +Details of how to encode and decode aggregation variables are found in <>. + +[[example-L.1]] +[caption="Example L.1 "] +.Aggregation variable example 1 +==== +---- +dimensions: + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment array dimensions + f_time = 2 ; + f_level = 1 ; + f_latitude = 1 ; + f_longitude = 1 ; + // Fragment shape dimensions + j = 4 ; // Equal to the number of aggregated dimensions + i = 2 ; // Equal to the size of the largest fragment array dimension + +variables: + // Aggregation data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "location: fragment_location + address: fragment_address + shape: fragment_shape" ; + // Coordinate variables + double time(time) ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + // Fragment array variables + string fragment_location(f_time, f_level, f_latitude, f_longitude) ; + string fragment_address ; + int fragment_shape(j, i) ; + +data: + temperature = _ ; + time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; + level = ... ; + latitude = ... ; + longitude = ... 
; + fragment_location = "January-March.nc", "April-December.nc" ; + fragment_address = "temperature" ; + fragment_shape = 3, 9, + 1, _, + 73, _, + 144, _ ; +---- +In this example, the `temperature` data variable is an aggregation variable. +Its four-dimensional aggregated data with shape `(12, 1, 73, 144)` is constructed from two non-overlapping fragments, with data shapes `(3, 1, 73, 144)` and `(9, 1, 73, 144)`, which span the first 3 and last 9 elements respectively of the `time` aggregated dimension. +The fragment dataset locations are relative-path URI references, so in this case the fragment datasets are taken to be in the same location as the aggregation file. + +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +==== + + +[[example-L.2]] +[caption="Example L.2 "] +.Aggregation variable example 2 +==== +---- +dimensions: + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment array dimensions + f_time = 2 ; + f_level = 1 ; + f_latitude = 1 ; + f_longitude = 1 ; + // Fragment shape dimensions + j = 4 ; // Equal to the number of aggregated dimensions + i = 2 ; // Equal to the size of the largest fragment array dimension + // Fragment versions dimension + versions = 2 ; // The maximum number of versions for a fragment + +variables: + // Aggregation data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "location: fragment_location + address: fragment_address + shape: fragment_shape" ; + // Coordinate variables + double time(time) ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double
longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + // Fragment array variables + string fragment_location(f_time, f_level, f_latitude, f_longitude, versions) ; + string fragment_address ; + int fragment_shape(j, i) ; + +data: + temperature = _ ; + time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_location = "file://data/January-March.nc", + _, + "file://data/April-December.nc", + "https://remote.host/data/April-December.nc" ; + fragment_address = "temperature" ; + fragment_shape = 3, 9, + 1, _, + 73, _, + 144, _ ; +---- +This example is similar to <>, but now the fragment dataset locations are absolute URIs, and two versions of the second fragment have been provided. +The `fragment_location` variable has the extra trailing dimension `versions` to accommodate the extra fragment version. +There is only one version of the first fragment, so its trailing dimension is padded with missing data. + +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity.
+==== + +[[example-L.3]] +[caption="Example L.3 "] +.Aggregation variable example 3 +==== +---- +dimensions: + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment array dimensions + f_time = 2 ; + f_level = 1 ; + f_latitude = 1 ; + f_longitude = 1 ; + // Fragment shape dimensions + j = 4 ; // Equal to the number of aggregated dimensions + j_time = 1 ; // Equal to the number of aggregated dimensions for time + i = 2 ; // Equal to the size of the largest fragment array dimension + // Fragment versions dimension + versions = 2 ; // The maximum number of versions for a fragment + +variables: + // Aggregation data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "location: fragment_location + address: fragment_address + shape: fragment_shape" ; + // Aggregation coordinate variable + double time ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + time:aggregated_dimensions = "time" ; + time:aggregated_data = "location: fragment_location_time + address: fragment_address_time + shape: fragment_shape_time" ; + // Coordinate variables + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + // Fragment array variables + string fragment_location(f_time, f_level, f_latitude, f_longitude, versions) ; + fragment_location:substitutions = "${local}: file://data/ + ${remote}: https://remote.host/data/" ; + string fragment_location_time(f_time, versions) ; + fragment_location_time:substitutions = "${local}: file://data/ + ${remote}: https://remote.host/data/" ; + string
fragment_address ; + string fragment_address_time ; + int fragment_shape(j, i) ; + int fragment_shape_time(j_time, i) ; + +data: + temperature = _ ; + time = _ ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_location = "${local}January-March.nc", _, + "${local}April-December.nc", "${remote}April-December.nc" ; + fragment_location_time = "${local}January-March.nc", _, + "${local}April-December.nc", "${remote}April-December.nc" ; + fragment_address = "temperature" ; + fragment_address_time = "time" ; + fragment_shape = 3, 9, + 1, _, + 73, _, + 144, _ ; + fragment_shape_time = 3, 9 ; +---- +This example is similar to <>, but now the fragment dataset locations have been defined using the string substitutions given by the **`substitutions`** attribute of the `fragment_location` variable. +In addition, `time` is now an aggregation coordinate variable, with its aggregated data being derived from the same fragment datasets as `temperature`. + +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. 
+==== + +[[example-L.4]] +[caption="Example L.4 "] +.Aggregation variable example 4 +==== +---- +dimensions: + level = 17 ; + latitude = 181 ; + longitude = 360 ; + // Fragment array dimensions + f_level = 1 ; + f_latitude = 3 ; + f_longitude = 2 ; + // Fragment shape dimensions + j = 3 ; // Equal to the number of aggregated dimensions + i = 3 ; // Equal to the size of the largest fragment array dimension + +variables: + // Aggregation data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "level latitude longitude" ; + temperature:aggregated_data = "location: fragment_location + address: fragment_address + shape: fragment_shape" ; + // Coordinate variables + double level(level) ; + level:standard_name = "air_pressure" ; + level:units = "hPa" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + // Fragment array variables + string fragment_location(f_level, f_latitude, f_longitude) ; + string fragment_address ; + int fragment_shape(j, i) ; + +data: + temperature = _ ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_location = "file_A.nc", "file_B.nc", + "file_C.nc", "file_D.nc", + "file_E.nc", "file_F.nc" ; + fragment_address = "temperature" ; + fragment_shape = 17, _, _, + 91, 45, 45, + 180, 180, _ ; +---- +This example is an encoding for the conceptual fragment array described in <>. +The `temperature` data variable is an aggregation of six fragments. 
+The distribution of missing values in the `fragment_shape` variable indicates that the `level` aggregated dimension is spanned by one fragment, the `latitude` aggregated dimension is spanned by three fragments, and the `longitude` aggregated dimension is spanned by two fragments; and that the shape of the implied fragment array is `(1, 3, 2)`. +The row sums of the `fragment_shape` variable are `17`, `181`, and `360`, which equal the sizes of the `level`, `latitude`, and `longitude` aggregated dimensions, respectively. + +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +==== + +[[example-L.5]] +[caption="Example L.5 "] +.Aggregation variable example 5 +==== +---- +dimensions: + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment array dimensions + f_time = 12 ; + f_level = 1 ; + f_latitude = 2 ; + f_longitude = 4 ; + // Fragment shape dimensions + j = 4 ; // Equal to the number of aggregated dimensions + i = 12 ; // Equal to the size of the largest fragment array dimension + +variables: + // Aggregation data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "location: fragment_location + address: fragment_address + shape: fragment_shape" ; + double pressure(time, level, latitude, longitude) ; + pressure:standard_name = "air_pressure" ; + pressure:units = "hPa" ; + pressure:cell_methods = "time: mean" ; + + // Coordinate variables + double time(time) ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name =
"longitude" ; + longitude:units = "degrees_east" ; + // Fragment array variables + string fragment_location(f_time, f_level, f_latitude, f_longitude) ; + string fragment_address ; + int fragment_shape(j, i) ; + +data: + temperature = _ ; + pressure = ... ; + time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_location = ... ; + fragment_address = "temperature" ; + fragment_shape = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, + 1, _, _, _, _, _, _, _, _, _, _, _, + 37, 36, _, _, _, _, _, _, _, _, _, _, + 36, 36, 36, 36, _, _, _, _, _, _, _, _ ; +---- +In this example, the `temperature` data variable is an aggregation of 96 fragments. +The implied fragment array shape is `(12, 1, 2, 4)`, indicating that three of the four aggregated dimensions are spanned by multiple fragments. +The `pressure` data variable is not an aggregation variable. + +The data for the `pressure`, `level`, `latitude` and `longitude` variables, and the `fragment_location` variable, are omitted for clarity. 
+==== + +[[example-L.6]] +[caption="Example L.6 "] +.Aggregation variable example 6 +==== +---- +dimensions: + station = 3 ; + obs = 15000 ; + // Fragment array dimensions + f_station = 3 ; + // Fragment shape dimensions + j = 1 ; // Equal to the number of aggregated dimensions + i = 3 ; // Equal to the size of the largest fragment array dimension + +variables: + // Aggregation data variable + float tas ; + tas:standard_name = "air_temperature" ; + tas:units = "K" ; + tas:coordinates = "time lat lon" ; + tas:aggregated_dimensions = "obs" ; + tas:aggregated_data = "location: fragment_location + address: fragment_address + shape: fragment_shape" ; + // DSG count variable + int row_size(station) ; + row_size:long_name = "number of observations per station" ; + row_size:sample_dimension = "obs" ; + + // Aggregation auxiliary coordinate variables + float time ; + time:standard_name = "time" ; + time:units = "days since 1970-01-01" ; + time:aggregated_dimensions = "obs" ; + time:aggregated_data = "location: fragment_location + address: fragment_address_time + shape: fragment_shape" ; + float lon ; + lon:standard_name = "longitude" ; + lon:long_name = "station longitude" ; + lon:units = "degrees_east" ; + lon:aggregated_dimensions = "station" ; + lon:aggregated_data = "location: fragment_location + address: fragment_address_lon + shape: fragment_shape_latlon" ; + float lat ; + lat:standard_name = "latitude" ; + lat:long_name = "station latitude" ; + lat:units = "degrees_north" ; + lat:aggregated_dimensions = "station" ; + lat:aggregated_data = "location: fragment_location + address: fragment_address_lat + shape: fragment_shape_latlon" ; + // Fragment array variables + string fragment_location(f_station) ; + string fragment_address ; + string fragment_address_time(f_station) ; + string fragment_address_lat ; + string fragment_address_lon ; + int fragment_shape(j, i) ; + int fragment_shape_latlon(j, i) ; + +// global attributes: +
:featureType = "timeSeries" ; + +data: + tas = _ ; + row_size = 5000, 4000, 6000 ; + time = _ ; + lat = _ ; + lon = _ ; + fragment_location = "Harwell.nc", "Abingdon.nc", "Lambourne.nc" ; + fragment_address = "tas" ; + fragment_address_time = "t1", "t2", "t3" ; + fragment_address_lat = "lat" ; + fragment_address_lon = "lon" ; + fragment_shape = 5000, 4000, 6000 ; + fragment_shape_latlon = 1, 1, 1 ; +---- +In this example, three fragments are aggregated into a collection of DSG time series features stored in the contiguous ragged array representation. +The auxiliary coordinate variables `time`, `lon`, and `lat` are also aggregation variables. +The time variables in the fragment datasets all have different netCDF variable names, which differ from the netCDF name of the `time` aggregation variable. +In this case, the fragments for all aggregation variables come from the same three fragment datasets. + +No data have been omitted from the CDL. +==== + +[[example-L.7]] +[caption="Example L.7 "] +.Aggregation variable example 7 +==== +---- +dimensions: + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment array dimensions + f_time = 2 ; + f_level = 1 ; + f_latitude = 1 ; + f_longitude = 1 ; + // Fragment shape dimensions + j = 4 ; // Equal to the number of temperature aggregated dimensions + i = 2 ; // Equal to the size of the largest fragment array dimension + j_uid = 1 ; // Equal to the number of uid aggregated dimensions + +variables: + // Aggregation data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:ancillary_variables = "uid" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "location: fragment_location + address: fragment_address + shape: fragment_shape" ; + // Aggregation ancillary variable + string uid ; + uid:long_name = "Fragment dataset unique identifiers" ; +
uid:missing_value = "N/A" ; + uid:aggregated_dimensions = "time" ; + uid:aggregated_data = "value: fragment_value_uid + shape: fragment_shape_uid" ; + // Coordinate variables + double time(time) ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + // Fragment array variables + string fragment_location(f_time, f_level, f_latitude, f_longitude) ; + string fragment_address ; + int fragment_shape(j, i) ; + string fragment_value_uid(f_time) ; + int fragment_shape_uid(j_uid, i) ; + +data: + temperature = _ ; + uid = _ ; + time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_location = "January-March.nc", "April-December.nc" ; + fragment_address = "temperature" ; + fragment_shape = 3, 9, + 1, _, + 73, _, + 144, _ ; + fragment_value_uid = "04b9-7eb5-4046-97b-0bf8", "05ee0-a183-43b3-a67-1eca" ; + fragment_shape_uid = 3, 9 ; +---- +This example is similar to <>, but now there is an aggregation ancillary variable, `uid`, whose fragments are defined by the constant values stored in the `fragment_value_uid` variable; these values are intended to be broadcast across the `time` aggregated dimension. + +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity.
+==== + +[[example-L.8]] +[caption="Example L.8 "] +.Aggregation variable example 8 +==== +---- +dimensions: + +variables: + // Aggregation data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "" ; + temperature:aggregated_data = "location: fragment_location + address: fragment_address + shape: fragment_shape" ; + // Scalar coordinate variables + double time ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double height ; + height:standard_name = "height" ; + height:units = "m" ; + double latitude ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + // Fragment array variables + string fragment_location ; + string fragment_address ; + int fragment_shape ; + +data: + temperature = _ ; + time = 0 ; + height = 1.5 ; + latitude = 18.53 ; + longitude = 73.81 ; + fragment_location = "file.nc" ; + fragment_address = "tas" ; + fragment_shape = 1 ; +---- +In this example, the `temperature` data variable is an aggregation variable with scalar aggregated data, so its **`aggregated_dimensions`** attribute is an empty string and its `fragment_shape` variable is a scalar containing the value `1`. +==== \ No newline at end of file diff --git a/bibliography.adoc b/bibliography.adoc index 5673cc8d..9a85b113 100644 --- a/bibliography.adoc +++ b/bibliography.adoc @@ -20,3 +20,4 @@ OGC document 12-063. 1st May 2015. - [[[XML]]] link:$$https://www.w3.org/TR/1998/REC-xml-19980210$$[Extensible Markup Language (XML) 1.0]. T. Bray, J. Paoli, and C.M. Sperberg-McQueen. 10 February 1998. - [[[CFDM]]] link:$$https://doi.org/10.5194/gmd-10-4619-2017$$[A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)]. Hassell, D., Gregory, J., Blower, J., Lawrence, B. N., and Taylor, K. 
E.: _Geosci. Model Dev._, 10, 4619-4646, 2017.
- [[[UGRID]]] link:$$https://ugrid-conventions.github.io/ugrid-conventions$$[UGRID Conventions for storing unstructured (or flexible mesh) data in netCDF files] +- [[[URI]]] link:$$https://doi.org/10.17487/RFC3986$$[RFC 3986. Uniform Resource Identifier (URI): Generic Syntax]. T. Berners-Lee, R. Fielding, L. Masinter. January 2005. diff --git a/cf-conventions.adoc b/cf-conventions.adoc index 01e32322..c5b308be 100644 --- a/cf-conventions.adoc +++ b/cf-conventions.adoc @@ -1,6 +1,6 @@ include::version.adoc[] = NetCDF Climate and Forecast (CF) Metadata Conventions -Brian{nbsp}Eaton; Jonathan{nbsp}Gregory; Bob{nbsp}Drach; Karl{nbsp}Taylor; Steve{nbsp}Hankin; Jon{nbsp}Blower; John{nbsp}Caron; Rich{nbsp}Signell; Phil{nbsp}Bentley; Greg{nbsp}Rappa; Heinke{nbsp}Höck; Alison{nbsp}Pamment; Martin{nbsp}Juckes; Martin{nbsp}Raspaud; Randy{nbsp}Horne; Timothy{nbsp}Whiteaker; David{nbsp}Blodgett; Charlie{nbsp}Zender; Daniel{nbsp}Lee; David{nbsp}Hassell; Alan{nbsp}D.{nbsp}Snow; Tobias{nbsp}Kölling; Dave{nbsp}Allured; Aleksandar{nbsp}Jelenak; Anders{nbsp}Meier{nbsp}Soerensen; Lucile{nbsp}Gaultier; Sylvain{nbsp}Herlédan; Fernando{nbsp}Manzano; Lars{nbsp}Bärring; Christopher{nbsp}Barker; Sadie{nbsp}Bartholomew +Brian{nbsp}Eaton; Jonathan{nbsp}Gregory; Bob{nbsp}Drach; Karl{nbsp}Taylor; Steve{nbsp}Hankin; Jon{nbsp}Blower; John{nbsp}Caron; Rich{nbsp}Signell; Phil{nbsp}Bentley; Greg{nbsp}Rappa; Heinke{nbsp}Höck; Alison{nbsp}Pamment; Martin{nbsp}Juckes; Martin{nbsp}Raspaud; Randy{nbsp}Horne; Timothy{nbsp}Whiteaker; David{nbsp}Blodgett; Charlie{nbsp}Zender; Daniel{nbsp}Lee; David{nbsp}Hassell; Alan{nbsp}D.{nbsp}Snow; Tobias{nbsp}Kölling; Dave{nbsp}Allured; Aleksandar{nbsp}Jelenak; Anders{nbsp}Meier{nbsp}Soerensen; Lucile{nbsp}Gaultier; Sylvain{nbsp}Herlédan; Fernando{nbsp}Manzano; Lars{nbsp}Bärring; Christopher{nbsp}Barker; Sadie{nbsp}Bartholomew; Bryan{nbsp}Lawrence; Neil{nbsp}Massey Version{nbsp}{current-version},{nbsp}{nbsp}{docprodtime}: 
See{nbsp}https://cfconventions.org{nbsp}for{nbsp}further{nbsp}information. :doctype: book :pdf-folio-placement: physical @@ -49,6 +49,8 @@ include::toc-extra.adoc[] * Lars Bärring, SMHI * Christopher Barker, NOAA * Sadie Bartholomew, NCAS and University of Reading +* Bryan Lawrence, NCAS and University of Reading +* Neil Massey, NCAS and STFC Many others have contributed to the development of CF through their participation in discussions about proposed changes. @@ -124,6 +126,9 @@ include::appj.adoc[] :numbered!: include::appk.adoc[] +:numbered!: +include::appl.adoc[] + :numbered!: include::history.adoc[] diff --git a/ch01.adoc b/ch01.adoc index 90a39158..4d22f667 100644 --- a/ch01.adoc +++ b/ch01.adoc @@ -57,6 +57,12 @@ Therefore CF-netCDF does not use codes, but instead relies on controlled vocabul The terms in this document that refer to components of a netCDF file are defined in the NetCDF User's Guide (NUG) <> NUG. Some of those definitions are repeated below for convenience. +aggregated data:: The data of an aggregation variable, after it has been created in memory by an application program. + +aggregated dimension:: A dimension of the aggregated data of an aggregation variable. + +aggregation variable:: A variable whose data is defined as an aggregation of fragments, rather than containing its own data. + ancestor group:: A group from which the referring group is descended via direct parent-child relationships auxiliary coordinate variable:: Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the NUG and used by this standard - see below). @@ -78,6 +84,9 @@ coordinate variable:: We use this term precisely as it is defined in the link:$$https://docs.unidata.ucar.edu/nug/current/best_practices.html#bp_Coordinate-Systems$$[NUG section on coordinate variables]. 
It is a one-dimensional variable with the same name as its dimension [e.g., **`time(time)`**], and it is defined as a numeric data type with values in strict monotonic order (all values are different, and they are arranged in either consistently increasing or consistently decreasing order). Missing values are not allowed in coordinate variables. +Note that an aggregation coordinate variable is stored as a scalar and has the same name as its aggregated dimension (see <>). + +fragment:: A constituent part, found in an external dataset, of the aggregated data of an aggregation variable. grid mapping variable:: A variable used as a container for attributes that define a specific grid mapping. The type of the variable is arbitrary since it contains no data. diff --git a/ch02.adoc b/ch02.adoc index df44de66..2f4c94ee 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -268,3 +268,206 @@ They may not be attached to a group, even if all variables within that group use If attributes are present within groups without being attached to a variable, these attributes apply to the group where they are defined, and to that group's descendants, but not to ancestor or sibling groups. If a group attribute is defined in a parent group, and one of the child group redefines the same attribute, the definition within the child group applies for the child and all of its descendants. + +[[aggregation-variables, Section 2.8, "Aggregation Variables"]] +=== Aggregation Variables + +An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ that are generally stored in __fragment datasets__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. +A fragment is an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. 
+The aggregation variable does not contain any actual data; instead, it contains instructions on how to create its __aggregated data__ in memory as an aggregation of the data from each fragment. + +Aggregation makes it possible to view, as a single entity, a dataset that has been partitioned across multiple other datasets, whilst taking up very little extra space on disk (since the aggregation file contains no copies of the data in the fragments). +Fragment datasets may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as a CF-compliant view of non-CF datasets. +Use cases for storing aggregations include, but are not limited to: data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; archive curation, as the aggregation can act as a metadata-rich archive index; and model simulations, for combining output data that have been written to disk as multiple datasets decomposed in time and space. + +An aggregation variable must be a scalar (i.e. it has no dimensions). +It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. +The data type of the aggregation variable must be the data type of the aggregated data, but the value of the aggregation variable's single element is immaterial. + +Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, etc.), but it is recommended that container variables whose data are immaterial (such as grid mapping variables) are not encoded as aggregation variables.
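To make the two special attributes concrete, here is a minimal Python sketch of how a reader might split them once their values are available as strings. It is illustrative only: the function name `parse_aggregation_attributes` is hypothetical and is not part of CF or of any existing library. It applies the rules given in this section: **`aggregated_dimensions`** is a blank-separated list of names (an empty string for scalar aggregated data), and **`aggregated_data`** holds blank-separated "__feature: variable__" elements whose case-sensitive keywords must be either `shape`, `location` and `address`, or `shape` and `value`.

```python
# Hypothetical helper for illustration; not part of CF or any existing library.
ALLOWED_FEATURE_SETS = (
    {"shape", "location", "address"},
    {"shape", "value"},
)


def parse_aggregation_attributes(aggregated_dimensions, aggregated_data):
    """Split the aggregated_dimensions and aggregated_data attribute values.

    Returns (dimension_names, features), where features maps each feature
    keyword (e.g. "shape") to the name of its fragment array variable.
    """
    # Blank-separated names; an empty string gives [] (scalar aggregated data).
    dimension_names = aggregated_dimensions.split()

    # "feature: variable" elements, separated by any blanks (including newlines).
    tokens = aggregated_data.split()
    keywords, variables = tokens[0::2], tokens[1::2]
    if len(keywords) != len(variables) or not all(k.endswith(":") for k in keywords):
        raise ValueError(f"malformed aggregated_data: {aggregated_data!r}")

    features = {}
    for keyword, variable in zip(keywords, variables):
        feature = keyword[:-1]  # strip the trailing ":"; keywords are case-sensitive
        if feature in features:
            raise ValueError(f"repeated feature keyword: {feature!r}")
        features[feature] = variable

    # The order of elements is not significant, so compare as sets.
    if set(features) not in ALLOWED_FEATURE_SETS:
        raise ValueError(f"invalid combination of features: {sorted(features)}")
    return dimension_names, features
```

Applied to the `temperature` variable of Example L.1, this would return the four names `time`, `level`, `latitude` and `longitude`, and a mapping from `location`, `address` and `shape` to the three fragment array variables. Rejecting unknown or incomplete keyword sets early means a reader never has to guess how to build the fragment array.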
+ +Any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role; and any reference to the dimensions or data of a variable applies to the aggregated dimensions or aggregated data, respectively, of an aggregation variable. +For instance: + +* The dimension of a coordinate variable of an aggregation data variable must be included as one of the aggregated dimensions of the aggregation data variable. + +* The name of an aggregation coordinate variable (which is a scalar) must +be the same as the name of its single aggregated dimension (identified by its **`aggregated_dimensions`** attribute), just as the name of a coordinate variable (which is one-dimensional) must be the same as the name of its single +dimension. + +The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. + + +[[aggregated-dimensions-and-data, Section 2.8.1, "Aggregated Dimensions and Data"]] +==== Aggregated Dimensions and Data + +The aggregated dimensions are stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. +The value of the **`aggregated_dimensions`** attribute is a blank-separated list of the aggregated dimension names given in the order which matches the dimensions of the aggregated data. +If the aggregated data is scalar then there are no aggregated dimensions and the **`aggregated_dimensions`** attribute must be an empty string. +The aggregated dimensions must exist as dimensions in the aggregation file. + +The fragments which provide the aggregated data are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. +Each dimension of the fragment array is called a __fragment array dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. 
+The size of a fragment array dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. +See the <>. + +The aggregated data are created by concatenating the canonical forms of the fragments' data (see <>) along each fragment array dimension, and in the order in which they appear in the fragment array. + +[[example-fragment-array]] +[caption="Example 2.2. "] +.Schematic representation of a fragment array for aggregated data +==== +[cols="a,a"] +|=============== +| *Fragment array position `[0, 0, 0]`* + +Fragment location: `file_A.nc` + +Fragment data shape: `(17, 91, 180)` + +`17` vertical levels + +`[90, 0]` degrees north + +`[0, 180)` degrees east | *Fragment array position `[0, 0, 1]`* + +Fragment location: `file_B.nc` + +Fragment data shape: `(17, 91, 180)` + +`17` vertical levels + +`[90, 0]` degrees north + +`[180, 360)` degrees east + +| *Fragment array position `[0, 1, 0]`* + +Fragment location: `file_C.nc` + +Fragment data shape: `(17, 45, 180)` + +`17` vertical levels + +`(0, -45]` degrees north + +`[0, 180)` degrees east | *Fragment array position `[0, 1, 1]`* + +Fragment location: `file_D.nc` + +Fragment data shape: `(17, 45, 180)` + +`17` vertical levels + +`(0, -45]` degrees north + +`[180, 360)` degrees east + +| *Fragment array position `[0, 2, 0]`* + +Fragment location: `file_E.nc` + +Fragment data shape: `(17, 45, 180)` + +`17` vertical levels + +`(-45, -90]` degrees north + +`[0, 180)` degrees east | *Fragment array position `[0, 2, 1]`* + +Fragment location: `file_F.nc` + +Fragment data shape: `(17, 45, 180)` + +`17` vertical levels + +`(-45, -90]` degrees north + +`[180, 360)` degrees east +|=============== +The fragments, stored in six fragment datasets, are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. +Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane, which has 1 degree resolution. 
+The fragments combine to create three-dimensional aggregated data that have global Z-Y-X coverage, with shape `(17, 181, 360)`.
+The Z aggregated dimension is spanned by one fragment, the Y aggregated dimension is spanned by three fragments, and the X aggregated dimension is spanned by two fragments.
+Note that, since this example is a schematic representation, the C or Fortran order of the dimensions is of no consequence.
+See <> for a CDL representation of this fragment array.
+====
+
+The fragment array must be defined by an aggregation variable's **`aggregated_data`** attribute.
+This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ which provides values for that feature.
+The features and their values unambiguously define the fragment array.
+The order of elements in the **`aggregated_data`** attribute is not significant.
+
+The features must comprise either all three of the `shape`, `location`, and `address` keywords; or else both of the `shape` and `value` keywords.
+No other combinations of keywords are allowed.
+These features are defined as follows:
+
+// Turn off section numbering for a bit
+:numbered!:
+
+===== shape
+
+The integer-valued `shape` fragment array variable defines the shape of each fragment's data in its canonical form (see <>).
+In general, the `shape` fragment array variable is two-dimensional, with the size of the slower-varying dimension (i.e. the first dimension in CDL order, representing rows) being the number of fragment array dimensions, and the size of the more rapidly-varying dimension (i.e. the second dimension in CDL order, representing columns) being the size of the largest fragment array dimension.
+The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns.
+The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension.
+See <>, which shows the `shape` fragment array variable for the fragment array described by the <>.
+If the aggregated data is scalar, then the `shape` fragment array variable must be a scalar and contain the value `1`.
+See <>.
+
+===== location
+
+The string-valued `location` fragment array variable defines the locations of fragment datasets.
+In general, its dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array.
+A fragment dataset is located with a Uniform Resource Identifier (URI) <> that must be either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://remote.host/data/file.nc`, `s3://remote.host/data/file.nc`, or `locally_meaningful_protocol://UID`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and that does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, or `data/file.nc`).
+A relative-path URI reference is taken as being relative to the location of the aggregation file.
+If the aggregation file is moved to another location, then a fragment dataset identified by an absolute URI will still be accessible, whereas a fragment dataset identified by a relative-path URI reference will also need to be moved to preserve the relative reference.
+Not all fragment dataset locations need be of the same URI type.
+See <> and <>.
+
+The `location` fragment array variable may have an extra trailing dimension that allows multiple versions of fragments to be specified.
+Each version must contain equivalent information, so that any version that exists may be selected for use in the aggregated data.
+This is useful when it is known that a fragment could be stored in a number of locations, but it is not known which of them might exist at any given time.
+For instance, when remotely stored and locally cached versions of the same fragment have been defined, an application program could choose to retrieve the remote version only if the local version does not exist.
+Every fragment must have at least one version, but not all fragments need to have the same number of versions.
+Where a fragment has fewer versions than the size of the trailing dimension, the trailing dimension must be padded with missing values.
+See <>.
+
+A fragment dataset location may be defined with any number of string substitutions, each of which is provided by the `location` fragment array variable's **`substitutions`** attribute.
+The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines part of a `location` fragment array variable value that is to be replaced by __replacement__ in order to find the actual fragment dataset location.
+A `location` fragment array variable value may include any subset of zero or more of the substitution keywords.
+After replacements have been made, the fragment dataset location must be an absolute URI or a relative-path URI reference.
+Each substitution keyword must have the form `${\*}`, where `*` represents any number of any characters.
+For instance, the fragment dataset location `\https://remote.host/data/file.nc` could be stored as `$\{path}file.nc`, in conjunction with `substitutions="$\{path}: \https://remote.host/data/"`.
+The order of elements in the **`substitutions`** attribute is not significant, and the substitutions for a given fragment must be such that applying them in any order results in the same fragment dataset location.
+The use of substitutions can save space in the aggregation file.
+In addition, if the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute rather than by changing the actual `location` fragment array variable values.
+See <>.
+
+===== address
+
+The `address` fragment array variable, which may have any data type, defines how to find each fragment within its fragment dataset, i.e. the address of the fragment.
+In general, it has the same dimensions in the same order as the `location` fragment array variable, and must contain a non-missing value corresponding to each fragment version.
+However, if the `address` fragment array variable is a scalar, then its single value applies to all versions of all fragments.
+For a netCDF fragment dataset, the address must be the string-valued netCDF variable name of the fragment.
+Addresses for other fragment dataset formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand.
+See <> and <>.
+
+===== value
+
+When all of the data values within each fragment are the same, the `value` fragment array variable allows each fragment to be represented explicitly by its unique data value, rather than by reference to a fragment dataset.
+The dimensions of the `value` fragment array variable correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array.
+The `value` fragment array variable may have any data type, and contains each fragment's unique value.
+A fragment whose data are wholly missing is specified by any of the missing values indicated by the `value` fragment array variable.
+See <>, which uses an aggregation ancillary variable to make fragment dataset global attributes available to an aggregation data variable.
+
+// Turn section numbering back on
+:numbered:
+
+
+[[fragment-interpretation, Section 2.8.2, "Fragment Interpretation"]]
+==== Fragment Interpretation
+
+The data of a fragment must be converted to its __canonical form__ prior to being inserted into the aggregated data.
+The canonical form of a fragment's data is such that:
+
+* The fragment's data, in their entirety, provide the values for a unique and contiguous part of the aggregated data.
+
+* The fragment's data dimensions correspond to the aggregated dimensions in the same order.
+
+* The fragment's data have the same units as the aggregation variable.
+
+* The fragment's data have missing values as indicated by the aggregation variable.
+
+* The fragment's data are not packed (i.e. not stored using a smaller data type than the original data).
+
+* The fragment's data have the same data type as the aggregation variable.
+
+The conversion of the fragment's data to its canonical form is carried out by the application program that is creating the aggregated data in memory.
+For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset.
+A combination of some of the following operations may be required to convert the fragment's data to its canonical form:
+
+* If, and only if, the fragment's data have been defined explicitly by their unique value (as opposed to being defined by a fragment dataset), broadcasting that value across the shape of the canonical form of the fragment's data.
+
+* Inserting missing size-1 dimensions into the fragment's data (e.g.
as required when aggregating two-dimensional fragments into three-dimensional aggregated data).
+
+* Transforming the fragment's data to have the same data type as the aggregated data.
+Note that some transformations may result in a loss of information, as can happen when casting floating-point numbers to integers.
+
+* Transforming missing values in the fragment's data to a value indicated as missing by the aggregation variable.
+Note that it is the responsibility of the creator of the aggregation file to ensure that no non-missing fragment data value coincides with any of the missing values indicated by the aggregation variable.
+
+* Transforming the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times).
+
+* Unpacking the fragment's data.
+Note that if the aggregation variable indicates that the aggregated data values are packed (as determined by the attributes defined in <>), then the canonical fragment data values will represent packed values in the aggregated data, and so will be subject to the aggregation variable's unpacking.
\ No newline at end of file
diff --git a/conformance.adoc b/conformance.adoc
index fddd28ea..d44e8c85 100644
--- a/conformance.adoc
+++ b/conformance.adoc
@@ -124,6 +124,48 @@ References can be absolute, relative or with no path, in which case, the variabl
 
 * NUG-coordinate variables that are not in the referring group or one of its direct ancestors should be referenced by absolute or relative paths rather than relying on the lateral search algorithm.
 
+[[aggregation-variables]]
+=== 2.8 Aggregation Variables
+
+*Requirements:*
+
+* An aggregation variable has an **`aggregated_dimensions`** attribute whose string value is a blank-separated list of zero or more aggregated dimension names.
+Each aggregated dimension must name a dimension in the file.
+
+* An aggregation variable must be a scalar.
+
+* An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form __feature: variable__.
+ Each __variable__ must be the name of a variable in the file.
+ The __feature__ keywords must comprise either all three of the `shape`, `location`, and `address` keywords; or else both of the `shape` and `value` keywords.
+
+ ** The `location` variable must have a string data type.
+
+ ** The `location` variable must have the same number of dimensions as there are aggregated dimensions, with the optional addition of one extra trailing dimension.
+
+ ** The `location` variable's **`substitutions`** attribute, if it exists, must be a string whose value is a list of blank-separated word pairs of the form __substitution: replacement__.
+ Each __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters.
+
+ ** A data value of a `location` variable, after any string substitutions defined by the **`substitutions`** attribute have been applied, must be either an absolute URI or else a relative-path URI reference.
+
+ ** The `address` variable must be either a scalar, or else have the same dimensions in the same order as the `location` variable.
+
+ ** The `value` variable must have the same number of dimensions as there are aggregated dimensions.
+
+ ** The `shape` variable must have an integer data type.
+
+ ** If there are zero aggregated dimensions, then the `shape` variable must be a scalar and contain the value `1`.
+
+ ** If there are one or more aggregated dimensions, then the `shape` variable must be two-dimensional.
+ *** The size of the slower-varying dimension (i.e. the first dimension in CDL order, representing rows) must be the number of aggregated dimensions.
+ *** The size of the more rapidly-varying dimension (i.e.
the second dimension in CDL order) must be either the size of the largest of the `value` variable dimensions, or else the size of the largest of the `location` variable dimensions, excluding the extra trailing dimension if the `location` variable has one. + + *** The rows correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row's non-missing values must equal the size of its corresponding aggregated dimension. + +*Recommendations:* + +* The following kinds of variable should not be aggregation variables: grid mapping variables, domain variables, mesh topology variables, geometry container variables, and interpolation variables. + + [[section-6]] [[description-of-the-data]] === 3 Description of the Data diff --git a/history.adoc b/history.adoc index 4d100eae..6102366c 100644 --- a/history.adoc +++ b/history.adoc @@ -7,6 +7,7 @@ === Working version (most recent first) +* {issues}508[Issue #508]: Introduce aggregation variables * {issues}237[Issue #237]: Clarify that the character set given in section 2.3 for variable, dimension, attribute and group names is a recommendation, not a requirement. * {issues}515[Issue #515]: Clarify the recommendation to use the convention of 4.3.3 for parametric vertical coordinates, because the previous wording caused confusion. * {issues}511[Issue #511]: Appendix B: New element in XML file header to record the "first published date" diff --git a/toc-extra.adoc b/toc-extra.adoc index eac9bb14..981aa3e5 100644 --- a/toc-extra.adoc +++ b/toc-extra.adoc @@ -37,6 +37,7 @@ J.5. <> [%hardbreaks] 2.1. <> +2.2. <> 3.1. <> 3.2. <> 3.3. <> @@ -119,4 +120,12 @@ H.19. <> H.20. <> H.21. <> H.22. <> -I.1. <> \ No newline at end of file +I.1. <> +L.1. <> +L.2. <> +L.3. <> +L.4. <> +L.5. <> +L.6. <> +L.7. <> +L.8. <>
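The decoding rules of Section 2.8.1 (the keyword pairs of the `aggregated_data` attribute, the rows of the `shape` fragment array variable, and the `substitutions` attribute of the `location` fragment array variable) can be illustrated with a minimal Python sketch. It is not part of the CF conventions or of any existing library. All function names are hypothetical, `None` is assumed to stand in for the missing-value padding of the `shape` variable, and scalar aggregated data is represented by an empty list of rows rather than by the scalar value `1`. Reading fragment datasets and converting fragments to their canonical form (Section 2.8.2) are deliberately left out.

```python
# A minimal sketch of decoding a CF aggregation variable's fragment array.
# All function names are hypothetical; None is assumed to stand in for the
# missing-value padding of the `shape` fragment array variable.
from itertools import accumulate, product
from urllib.parse import urljoin

MISSING = None


def parse_pairs(attribute):
    """Parse blank-separated 'keyword: value' elements (e.g. `aggregated_data`)."""
    tokens = attribute.split()
    if len(tokens) % 2:
        raise ValueError("expected blank-separated 'keyword: value' pairs")
    pairs = {}
    for keyword, value in zip(tokens[0::2], tokens[1::2]):
        if not keyword.endswith(":"):
            raise ValueError(f"malformed keyword {keyword!r}")
        pairs[keyword[:-1]] = value
    return pairs


def aggregated_shape(shape_rows):
    """Each aggregated dimension size is the sum of its row's non-missing values."""
    return tuple(sum(s for s in row if s is not MISSING) for row in shape_rows)


def fragment_slices(shape_rows):
    """Map each fragment array position to the part of the aggregated data it fills.

    `shape_rows` has one row per aggregated dimension, in `aggregated_dimensions`
    order. Scalar aggregated data is passed as [] and yields the single position ().
    """
    per_dimension = []
    for row in shape_rows:
        sizes = [s for s in row if s is not MISSING]
        stops = list(accumulate(sizes))  # running totals give each fragment's stop index
        starts = [0] + stops[:-1]        # the previous fragment's stop is the next start
        per_dimension.append([slice(a, b) for a, b in zip(starts, stops)])
    # Fragment array positions are generated in row-major order
    return {
        position: tuple(per_dimension[d][i] for d, i in enumerate(position))
        for position in product(*(range(len(s)) for s in per_dimension))
    }


def resolve_location(location, substitutions="", aggregation_file_uri=""):
    """Apply a `substitutions` attribute, then resolve relative-path URI references."""
    for keyword, replacement in parse_pairs(substitutions).items():
        location = location.replace(keyword, replacement)
    # An absolute URI has a scheme, and so a ':', before its first '/'; a
    # relative-path URI reference cannot, so it is joined to the aggregation file's URI.
    if aggregation_file_uri and ":" not in location.split("/", 1)[0]:
        location = urljoin(aggregation_file_uri, location)
    return location


# Usage with the `shape` rows of the fragment array in Example 2.2
rows = [[17, MISSING, MISSING], [91, 45, 45], [180, 180, MISSING]]
print(aggregated_shape(rows))            # (17, 181, 360)
print(fragment_slices(rows)[(0, 2, 1)])  # (slice(0, 17, None), slice(136, 181, None), slice(180, 360, None))
print(resolve_location("${path}file.nc", "${path}: https://remote.host/data/"))
# https://remote.host/data/file.nc
```

With the `shape` rows of the fragment array in Example 2.2, the sketch reproduces the aggregated shape `(17, 181, 360)` and places fragment `[0, 2, 1]` (`file_F.nc`) at Z indices 0 to 16, Y indices 136 to 180, and X indices 180 to 359. The same keyword-pair parser serves both `aggregated_data` and `substitutions`, since both attributes use the "__keyword: value__" form.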