From 261b55f0693627423d8b5104a3eac5ee89996bff Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 18 Apr 2024 17:54:11 +0100 Subject: [PATCH 01/59] aggregation variables first draft --- appa.adoc | 14 +- appl.adoc | 542 ++++++++++++++++++++++++++++++++++++++++++++ cf-conventions.adoc | 3 + ch01.adoc | 9 + ch02.adoc | 162 +++++++++++++ 5 files changed, 729 insertions(+), 1 deletion(-) create mode 100644 appl.adoc diff --git a/appa.adoc b/appa.adoc index d59818f3..48cda684 100644 --- a/appa.adoc +++ b/appa.adoc @@ -9,7 +9,7 @@ See <> for the grid mapping attributes, and <> for the distinction between **BI** and **BO**), and **-** for variables with some other purpose. +For variable attributes, the possible values of "Use" are: **C** for variables containing coordinate data, **D** for data variables, **M** for geometry container variables, **Do** for domain variables, **BI** and **BO** for boundary variables (see <> for the distinction between **BI** and **BO**), **A** for an aggregation variable, and **-** for variables with some other purpose. CF does not prohibit any of these attributes from being attached to variables of different kinds from those listed as their "Use" in this table, but their meanings are not defined by CF if they are used in these other ways. "Links" indicates the location of the attribute"s original definition (first link) and sections where the attribute is discussed in this document (additional links as necessary). @@ -38,6 +38,18 @@ Attribute If both **`scale_factor`** and **`add_offset`** attributes are present, the data are first scaled before the offset is added. In cases where there is a strong constraint on dataset size, it is allowed to pack the coordinate variables (using add_offset and/or scale_factor), but this is not recommended in general. +| **`aggregated_data`** +| S +| A +| <> +| Records the aggregation instructions that define how to create an aggregation variable's aggregated data. 
+ +| **`aggregated_dimensions`** +| S +| A +| <> +| Identifies the dimensions of an aggregation variable's aggregated data. + | **`ancillary_variables`** | S | D diff --git a/appl.adoc b/appl.adoc new file mode 100644 index 00000000..cd49b5f6 --- /dev/null +++ b/appl.adoc @@ -0,0 +1,542 @@ +[[appendix-aggregation-examples, Appendix L, Aggregation Variable Examples]] + +[appendix] +== Aggregation Variable Examples + +This appendix contains examples of aggregation variables. Details of how to encode and decode aggregation variables may be found in <>. + +[[example-L.1]] +[caption=] +.Example L.1 +==== +---- +dimensions: + // Aggregated dimensions + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment dimensions + f_time = 2 ; + f_level = 1 ; + f_latitude = 1 ; + f_longitude = 1 ; + // Extra dimensions + j = 4 ; // Equal to the number of aggregated dimensions + i = 2 ; // Equal to the size of the largest fragment dimension + +variables: + // Data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address + shape: fragment_shape" ; + // Coordinate variables + double time(time) ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + + // Fragment array variables + string fragment_file(f_time, f_level, f_latitude, f_longitude) ; + string fragment_format ; + string fragment_address ; + int fragment_shape(j, i) ; + +data: + temperature = _ ; + time
= 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_file = "January-March.nc", "April-December.nc" ; + fragment_format = "nc" ; + fragment_address = "temperature" ; + fragment_shape = 3, 9, + 1, _, + 73, _, + 144, _ ; +---- +In this example, the `temperature` data variable is an aggregation variable. Its four-dimensional aggregated data, with shape `(12, 1, 73, 144)`, is constructed from two non-overlapping fragments with shapes `(3, 1, 73, 144)` and `(9, 1, 73, 144)`, which span the first 3 and the last 9 elements, respectively, of the `time` aggregated dimension. Since the fragment file names are not fully qualified URIs, they are taken to be relative to the location of the aggregation file. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +==== + + +[[example-L.2]] +[caption=] +.Example L.2 +==== +---- +dimensions: + // Aggregated dimensions + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment dimensions + f_time = 2 ; + f_level = 1 ; + f_latitude = 1 ; + f_longitude = 1 ; + // Extra dimensions + j = 4 ; // Equal to the number of aggregated dimensions + i = 2 ; // Equal to the size of the largest fragment dimension + versions = 2 ; // The maximum number of versions for a fragment + +variables: + // Data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address + shape: fragment_shape" ; + // Coordinate variables + double time(time) ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; +
latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + + // Fragment array variables + string fragment_file(f_time, f_level, f_latitude, f_longitude, versions) ; + string fragment_format ; + string fragment_address ; + int fragment_shape(j, i) ; + +data: + temperature = _ ; + time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_file = "file://local/data/January-March.nc", + _, + "file://local/data/April-December.nc", + "https://remote/data/April-December.nc" ; + fragment_format = "nc" ; + fragment_address = "temperature" ; + fragment_shape = 3, 9, + 1, _, + 73, _, + 144, _ ; +---- +This example is similar to <>, but now the fragment file names are fully qualified URIs, and two versions of the second fragment have been provided. The `fragment_file` fragment array variable has the extra trailing dimension `versions` to accommodate the extra fragment version. There is only one version of the first fragment, so its trailing dimension is padded with missing values. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity.
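The rule for choosing among multiple fragment versions can be illustrated with a short Python sketch. It is non-normative and rests on assumptions not made by the conventions: `select_version` is a hypothetical helper, `None` stands in for the missing values that pad the `versions` dimension, the size-1 fragment dimensions have been dropped so that `fragment_file` becomes a list of rows, and versions are tried in order along the trailing dimension (the text only requires that some version whose file exists is used). File existence is checked through a caller-supplied function, so no particular storage library is assumed.

```python
from typing import Callable, List, Optional, Sequence

# fragment_file from the example above, with its size-1 fragment dimensions
# dropped so that each row is one fragment and each column one version.
# None is a hypothetical stand-in for the missing-value padding.
FRAGMENT_FILE: List[List[Optional[str]]] = [
    ["file://local/data/January-March.nc", None],
    ["file://local/data/April-December.nc",
     "https://remote/data/April-December.nc"],
]


def select_version(versions: Sequence[Optional[str]],
                   exists: Callable[[str], bool]) -> str:
    """Return the first non-missing version for which exists() is true.

    Assumption: versions are tried in trailing-dimension order; the
    conventions allow any version whose file exists to be used.
    """
    for uri in versions:
        if uri is not None and exists(uri):
            return uri
    raise FileNotFoundError(f"no available version among {list(versions)!r}")


# Simulate a local cache that does not hold the April-December fragment.
available = {
    "file://local/data/January-March.nc",
    "https://remote/data/April-December.nc",
}
chosen = [select_version(row, available.__contains__) for row in FRAGMENT_FILE]
print(chosen)
```

With the local copy of the second fragment absent, the sketch falls back to the remote version and selects `file://local/data/January-March.nc` and `https://remote/data/April-December.nc`.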
+==== + +[[example-L.3]] +[caption=] +.Example L.3 +==== +---- +dimensions: + // Aggregated dimensions + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment dimensions + f_time = 2 ; + f_level = 1 ; + f_latitude = 1 ; + f_longitude = 1 ; + // Extra dimensions + j = 4 ; // Equal to the number of aggregated dimensions + i = 2 ; // Equal to the size of the largest fragment dimension + versions = 2 ; // The maximum number of versions for a fragment + j_time = 1 ; // Equal to the number of aggregated dimensions for time + +variables: + // Data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address + shape: fragment_shape" ; + // Coordinate variables + double time ; // This is an aggregation coordinate variable + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + time:aggregated_dimensions = "time" ; + time:aggregated_data = "file: fragment_file_time + format: fragment_format + address: fragment_address_time + shape: fragment_shape_time" ; + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + + // Fragment array variables + string fragment_file(f_time, f_level, f_latitude, f_longitude, versions) ; + fragment_file:substitutions = "${local}: file://local/data/ + ${remote}: https://remote/data/" ; + string fragment_file_time(f_time, versions) ; + fragment_file_time:substitutions = "${local}: file://local/data/ + ${remote}: https://remote/data/" ; + string fragment_format ; + string fragment_address
; + string fragment_address_time ; + int fragment_shape(j, i) ; + int fragment_shape_time(j_time, i) ; + +data: + temperature = _ ; + time = _ ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_file = "${local}January-March.nc", + _, + "${local}April-December.nc", + "${remote}April-December.nc" ; + fragment_file_time = "${local}January-March.nc", + _, + "${local}April-December.nc", + "${remote}April-December.nc" ; + fragment_format = "nc" ; + fragment_address = "temperature" ; + fragment_address_time = "time" ; + fragment_shape = 3, 9, + 1, _, + 73, _, + 144, _ ; + fragment_shape_time = 3, 9 ; +---- +This example is similar to <>, but now the fragment file names have been defined using the string substitutions given by the **`substitutions`** attribute of the `fragment_file` fragment array variable. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. + +In addition, `time` is now an aggregation coordinate variable, with its aggregated data being derived from the same fragment files as `temperature`.
+==== + +[[example-L.4]] +[caption=] +.Example L.4 +==== +---- +dimensions: + // Aggregated dimensions + level = 17 ; + latitude = 181 ; + longitude = 360 ; + // Fragment dimensions + f_level = 1 ; + f_latitude = 3 ; + f_longitude = 2 ; + // Extra dimensions + j = 3 ; // Equal to the number of aggregated dimensions + i = 3 ; // Equal to the size of the largest fragment dimension + +variables: + // Data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "level latitude longitude" ; + temperature:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address + shape: fragment_shape" ; + // Coordinate variables + double level(level) ; + level:standard_name = "air_pressure" ; + level:units = "hPa" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + + // Fragment array variables + string fragment_file(f_level, f_latitude, f_longitude) ; + string fragment_format ; + string fragment_address ; + int fragment_shape(j, i) ; + +data: + temperature = _ ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_file = "file_A.nc", "file_B.nc", + "file_C.nc", "file_D.nc", + "file_E.nc", "file_F.nc" ; + fragment_format = "nc" ; + fragment_address = "temperature" ; + fragment_shape = 17, _, _, + 91, 45, 45, + 180, 180, _ ; +---- +This example is an encoding of the fragment array described in <>. The `temperature` data variable is an aggregation of 6 fragments. The fragment array shape is `(1, 3, 2)`, indicating that two of the three aggregated dimensions are spanned by multiple fragments. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity.
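How the rows of `fragment_shape` determine where each fragment lands in the aggregated data can be sketched in Python. This is an illustrative sketch, not part of the conventions: `dimension_spans` and `fragment_locations` are hypothetical names, `None` stands in for the missing-value padding, and fragment array positions are assumed to be enumerated in row-major order, which is the order in which CDL lists the values of `fragment_file`. Cumulative sums along each row give half-open index ranges along the corresponding aggregated dimension, and the row sums give the shape of the aggregated data.

```python
from itertools import accumulate, product
from typing import Dict, List, Optional, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, stop) index range

# fragment_shape from the example above: one row per aggregated dimension
# (level, latitude, longitude). None stands in for the missing-value padding.
FRAGMENT_SHAPE: List[List[Optional[int]]] = [
    [17, None, None],
    [91, 45, 45],
    [180, 180, None],
]
FRAGMENT_FILE = ["file_A.nc", "file_B.nc", "file_C.nc",
                 "file_D.nc", "file_E.nc", "file_F.nc"]


def dimension_spans(row: Sequence[Optional[int]]) -> List[Span]:
    """Index ranges covered by successive fragments along one aggregated dimension."""
    sizes = [size for size in row if size is not None]  # drop trailing padding
    stops = list(accumulate(sizes))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


def fragment_locations(shape_rows: Sequence[Sequence[Optional[int]]]
                       ) -> Dict[Tuple[int, ...], Tuple[Span, ...]]:
    """Map each fragment array position to the part of the aggregated data it fills.

    Positions are generated in row-major order, the same order in which CDL
    lists the values of the fragment_file variable.
    """
    spans = [dimension_spans(row) for row in shape_rows]
    return {
        position: tuple(spans[dim][index] for dim, index in enumerate(position))
        for position in product(*(range(len(s)) for s in spans))
    }


locations = fragment_locations(FRAGMENT_SHAPE)
aggregated_shape = tuple(sum(s for s in row if s is not None) for row in FRAGMENT_SHAPE)
print(aggregated_shape)  # (17, 181, 360)
for name, (position, span) in zip(FRAGMENT_FILE, locations.items()):
    print(name, position, span)
```

For this example the sketch places `file_C.nc` at fragment array position `(0, 1, 0)`, covering latitude indices 91 to 135 of the aggregated data, consistent with the schematic in Section 2.8.2.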
+==== + +[[example-L.5]] +[caption=] +.Example L.5 +==== +---- +dimensions: + // Aggregated dimensions + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment dimensions + f_time = 12 ; + f_level = 1 ; + f_latitude = 2 ; + f_longitude = 4 ; + // Extra dimensions + j = 4 ; // Equal to the number of aggregated dimensions + i = 12 ; // Equal to the size of the largest fragment dimension + +variables: + // Data variables + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address + shape: fragment_shape" ; + double pressure(time, level, latitude, longitude) ; + pressure:standard_name = "air_pressure" ; + pressure:units = "hPa" ; + pressure:cell_methods = "time: mean" ; + + // Coordinate variables + double time(time) ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + + // Fragment array variables + string fragment_file(f_time, f_level, f_latitude, f_longitude) ; + string fragment_format ; + string fragment_address ; + int fragment_shape(j, i) ; + +data: + temperature = _ ; + pressure = ... ; + time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; + level = ... ; + latitude = ... ; + longitude = ... ; + fragment_file = ...
; + fragment_format = "nc" ; + fragment_address = "temperature" ; + fragment_shape = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, + 1, _, _, _, _, _, _, _, _, _, _, _, + 37, 36, _, _, _, _, _, _, _, _, _, _, + 36, 36, 36, 36, _, _, _, _, _, _, _, _ ; +---- +In this example, the `temperature` data variable is an aggregation of 96 fragments. The fragment array shape is `(12, 1, 2, 4)`, indicating that three of the four aggregated dimensions are spanned by multiple fragments. The `pressure` data variable is not an aggregation variable. The data for the `pressure`, `level`, `latitude` and `longitude` variables, and the `fragment_file` fragment array variable, are omitted for clarity. +==== + +[[example-L.6]] +[caption=] +.Example L.6 +==== +---- +dimensions: + // Aggregated dimensions + station = 3 ; + obs = 15000 ; + // Fragment dimensions + f_station = 3 ; + f_obs = 3 ; + // Extra dimensions + j = 1 ; + i = 3 ; + +variables: + // Data variable + float tas ; + tas:standard_name = "air_temperature" ; + tas:units = "K" ; + tas:coordinates = "time lat lon" ; + tas:aggregated_dimensions = "obs" ; + tas:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address_tas + shape: fragment_shape" ; + // DSG count variable + int row_size(station) ; + row_size:long_name = "number of observations per station" ; + row_size:sample_dimension = "obs" ; + + // Auxiliary coordinate variables + float time ; + time:standard_name = "time" ; + time:units = "days since 1970-01-01" ; + time:aggregated_dimensions = "obs" ; + time:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address_time + shape: fragment_shape" ; + float lon ; + lon:standard_name = "longitude"; + lon:long_name = "station longitude"; + lon:units = "degrees_east"; + lon:aggregated_dimensions = "station" ; + lon:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address_lon + shape:
fragment_shape_latlon" ; + float lat ; + lat:standard_name = "latitude"; + lat:long_name = "station latitude" ; + lat:units = "degrees_north" ; + lat:aggregated_dimensions = "station" ; + lat:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address_lat + shape: fragment_shape_latlon" ; + + // Fragment array variables + string fragment_file(f_station) ; + string fragment_format ; + string fragment_address_tas ; + string fragment_address_time(f_station) ; + string fragment_address_lat ; + string fragment_address_lon ; + int fragment_shape(j, i) ; + int fragment_shape_latlon(j, i) ; + +// global attributes: + :featureType = "timeSeries"; + +data: + tas = _ ; + row_size = 5000, 4000, 6000 ; + time = _ ; + lat = _ ; + lon = _ ; + fragment_file = "Harwell.nc", "Abingdon.nc", "Lambourne.nc" ; + fragment_format = "nc" ; + fragment_address_tas = "tas" ; + fragment_address_time = "t1", "t2", "t3" ; + fragment_address_lat = "lat" ; + fragment_address_lon = "lon" ; + fragment_shape = 5000, 4000, 6000 ; + fragment_shape_latlon = 1, 1, 1 ; +---- +In this example, three fragments are aggregated into a collection of DSG timeSeries features stored in the contiguous ragged array representation. The auxiliary coordinate variables that span the `obs` or `station` dimension are also aggregation variables. The time variables in the fragment files all have different netCDF variable names, which also differ from the netCDF name of the `time` aggregation variable. The fragments for all of the aggregation variables come from the same three fragment files. No data have been omitted from the CDL.
+==== + +[[example-L.7]] +[caption=] +.Example L.7 +==== +---- +dimensions: + // Aggregated dimensions + time = 12 ; + level = 1 ; + latitude = 73 ; + longitude = 144 ; + // Fragment dimensions + f_time = 2 ; + f_level = 1 ; + f_latitude = 1 ; + f_longitude = 1 ; + // Extra dimensions + j = 4 ; // Equal to the number of aggregated dimensions + i = 2 ; // Equal to the size of the largest fragment dimension + +variables: + // Data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_data = "file: fragment_file + format: fragment_format + address: fragment_address + shape: fragment_shape + id: fragment_id" ; // Non-standardized feature + + // Coordinate variables + double time(time) ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double level(level) ; + level:standard_name = "height_above_mean_sea_level" ; + level:units = "m" ; + double latitude(latitude) ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude(longitude) ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + + // Fragment array variables + string fragment_file(f_time, f_level, f_latitude, f_longitude) ; + string fragment_format ; + string fragment_address ; + int fragment_shape(j, i) ; + string fragment_id(f_time, f_level, f_latitude, f_longitude) ; + fragment_id:long_name = "Fragment file unique identifiers" ; + +data: + temperature = _ ; + time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; + level = ... ; + latitude = ... ; + longitude = ...
; + fragment_file = "January-March.nc", "April-December.nc" ; + fragment_format = "nc" ; + fragment_address = "temperature" ; + fragment_shape = 3, 9, + 1, _, + 73, _, + 144, _ ; + fragment_id = "04821b9-7eb5-4046-937b-0bf06b01588", "056d1ee0-a183-43b3-ae67-1ec6aa1532a" ; +---- +This example is similar to <>, but now the **`aggregated_data`** attribute also includes the non-standardized keyword `id`, whose values are stored in the `fragment_id` fragment array variable. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +==== \ No newline at end of file diff --git a/cf-conventions.adoc b/cf-conventions.adoc index 01e32322..8ecc4002 100644 --- a/cf-conventions.adoc +++ b/cf-conventions.adoc @@ -124,6 +124,9 @@ include::appj.adoc[] :numbered!: include::appk.adoc[] +:numbered!: +include::appl.adoc[] + :numbered!: include::history.adoc[] diff --git a/ch01.adoc b/ch01.adoc index e8bda96d..76a022a7 100644 --- a/ch01.adoc +++ b/ch01.adoc @@ -57,6 +57,12 @@ Therefore CF-netCDF does not use codes, but instead relies on controlled vocabul The terms in this document that refer to components of a netCDF file are defined in the NetCDF User's Guide (NUG) <> NUG. Some of those definitions are repeated below for convenience. +aggregated data:: The data of an aggregation variable, after it has been created by an application program. + +aggregated dimension:: A dimension of the aggregated data of an aggregation variable. + +aggregation variable:: A variable whose data is defined as a virtual aggregation of fragments, rather than containing its own data. + ancestor group:: A group from which the referring group is descended via direct parent-child relationships auxiliary coordinate variable:: Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the NUG and used by this standard - see below).
@@ -78,6 +84,9 @@ coordinate variable:: We use this term precisely as it is defined in the link:$$https://docs.unidata.ucar.edu/nug/current/best_practices.html#bp_Coordinate-Systems$$[NUG section on coordinate variables]. It is a one-dimensional variable with the same name as its dimension [e.g., **`time(time)`**], and it is defined as a numeric data type with values in strict monotonic order (all values are different, and they are arranged in either consistently increasing or consistently decreasing order). Missing values are not allowed in coordinate variables. +Note that an aggregation coordinate variable is stored as a scalar, and must have the same name as its aggregated dimension (see <>). + +fragment:: A constituent part, found in an external file, of the aggregated data of an aggregation variable. grid mapping variable:: A variable used as a container for attributes that define a specific grid mapping. The type of the variable is arbitrary since it contains no data. diff --git a/ch02.adoc b/ch02.adoc index 1c5a2e2e..24e3c6a4 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -268,3 +268,165 @@ They may not be attached to a group, even if all variables within that group use If attributes are present within groups without being attached to a variable, these attributes apply to the group where they are defined, and to that group's descendants, but not to ancestor or sibling groups. If a group attribute is defined in a parent group, and one of the child group redefines the same attribute, the definition within the child group applies for the child and all of its descendants. +[[aggregation-variables, Section 2.8, "Aggregation Variables"]] +=== Aggregation Variables + +An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ stored in __fragment files__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__.
The aggregation variable does not contain any actual data; instead, it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. + +Aggregation makes it possible to view, as a single entity, a dataset that has been partitioned across multiple files, whilst taking up very little space on disk (since the aggregation file contains no copies of the data in the fragments). The fragment files may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as a CF-compliant view of non-CF datasets. Storing aggregations is useful for data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis, and for archive curation, as the aggregation can act as a metadata-rich archive index. + +An aggregation variable must be a scalar (i.e. it has no dimensions) and the value of its single element is immaterial. It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. The data type of the aggregated data is the same as the data type of the aggregation variable. + +Any variable may be an aggregation variable, and being an aggregation variable does not affect its role within CF (e.g. data variable, coordinate variable, boundary variable, or cell measure variable). + +The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. + +The conventions do not currently offer guidance to dataset creators on how to decide whether two or more fragments can be aggregated in this way.
+ + +[[aggregated-dimensions, Section 2.8.1, Aggregated Dimensions]] +==== Aggregated Dimensions + +The aggregated dimensions must be stored in the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. The value of the **`aggregated_dimensions`** attribute is a blank-separated list of the aggregated dimension names, given in the order that matches the dimensions of the aggregated data. If the aggregated data is scalar then the **`aggregated_dimensions`** attribute must be an empty string. The aggregated dimensions must exist as dimensions in the aggregation file. + +The interpretation of all variables needs to account for the fact that the aggregated dimensions of an aggregation variable have exactly the same status as the dimensions of a normal (i.e. non-aggregation) variable. For instance: + +* coordinate and auxiliary coordinate variables must share their dimensions with the aggregated dimensions of their aggregation data variable, +* an aggregation coordinate variable (which will be a scalar) must have the same name as its aggregated dimension, +* etc. + +[[aggregated-data, Section 2.8.2, Aggregated Data]] +==== Aggregated Data + +The fragments are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. Each dimension of the fragment array is called a __fragment dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. The size of a fragment dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. See <>. + + +The aggregated data is created by concatenating the fragments' data along each fragment dimension, in the order in which they appear in the fragment array. + +Once the aggregated data has been created in memory, it has exactly the same status as the data of a normal (i.e.
non-aggregation) variable. + +[[example-fragment-array]] +[caption="Example 2.2. "] +.A schematic representation of a fragment array for aggregated data +==== +[cols="a,a"] +|=============== +| *Fragment array position `[0, 0, 0]`* + +Fragment file name `file_A.nc` + +Fragment data shape `(17, 91, 180)` + +`17` vertical levels + +`[90, 0]` degrees north + +`[0, 180)` degrees east | *Fragment array position `[0, 0, 1]`* + +Fragment file name `file_B.nc` + +Fragment data shape `(17, 91, 180)` + +`17` vertical levels + +`[90, 0]` degrees north + +`[180, 360)` degrees east + +| *Fragment array position `[0, 1, 0]`* + +Fragment file name `file_C.nc` + +Fragment data shape `(17, 45, 180)` + +`17` vertical levels + +`(0, -45]` degrees north + +`[0, 180)` degrees east | *Fragment array position `[0, 1, 1]`* + +Fragment file name `file_D.nc` + +Fragment data shape `(17, 45, 180)` + +`17` vertical levels + +`(0, -45]` degrees north + +`[180, 360)` degrees east + +| *Fragment array position `[0, 2, 0]`* + +Fragment file name `file_E.nc` + +Fragment data shape `(17, 45, 180)` + +`17` vertical levels + +`(-45, -90]` degrees north + +`[0, 180)` degrees east | *Fragment array position `[0, 2, 1]`* + +Fragment file name `file_F.nc` + +Fragment data shape `(17, 45, 180)` + +`17` vertical levels + +`(-45, -90]` degrees north + +`[180, 360)` degrees east +|=============== +Six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane. The fragments combine to create three-dimensional aggregated data that has global `(Z, Y, X)` coverage, with shape `(17, 181, 360)`. The Z aggregated dimension is spanned by 1 fragment, the Y aggregated dimension is spanned by 3 fragments, and the X aggregated dimension is spanned by 2 fragments. See <> for a CDL representation of this fragment array. 
+ +==== + +The fragment array is defined by the aggregation variable's **`aggregated_data`** attribute. This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ that provides the feature's values for each fragment in the fragment array. The order of elements in the **`aggregated_data`** attribute is not significant. + +There are four standardized and mandatory features, given by the `file`, `format`, `address`, and `shape` keywords; any number of non-standardized features may also be included: + +`file` + +The string-valued `file` fragment array variable defines how to find each fragment file. In general, it has the same shape as the fragment array, and its values specify the fragment file names. Each file name must take one of the following forms: + +* A fully qualified Uniform Resource Identifier (URI, i.e. one that starts with `file://`, `http://`, `s3://`, etc.). If the aggregation file is moved to another location, the fragment files can still be found, provided that they have not themselves been moved. + +* A file path that is relative to the current location of the aggregation file. If the aggregation file is moved then the fragment files must also be moved to preserve their relative locations. + +Multiple versions of a fragment may be provided by including an extra trailing dimension in the `file` fragment array variable. Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. This is useful when it is known in advance that various file locations will be possible for the fragment, but it is not known which of them will exist at any given future time.
For instance, this feature could be used to define remotely stored and locally cached versions of a fragment, allowing an application program to only commit to the expense of accessing the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need have the same number of versions. If a fragment has fewer versions than some others, then its trailing dimension must be padded with missing values. See <>. + +A fragment file name may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. The use of substitutions can save space in the aggregation file; and in the event that the fragment files are moved from their original locations it may be possible for the fragment file names to be modified by editing the **`substitutions`** attribute, rather than by changing the `file` fragment array variable values themselves. The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__base: replacement__", where __base__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by the string defined by __replacement__ prior to locating and reading the fragment file. The order of elements is not significant. The _base_ keyword must have the form `${\...}`, where `\...` represents any characters. For instance, a fragment file name of `\file://data/store/file.nc` could also be stored as `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. See <>. + +`format` + +The string-valued `format` fragment array variable defines the format of the fragment files. In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. 
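Both the **`aggregated_data`** and **`substitutions`** attributes use the same blank-separated "key: value" layout, so a single parser can serve for both. The Python sketch below is a minimal illustration under assumptions that the conventions do not state: `parse_pairs` and `substitute` are hypothetical names, each key is a single token ending in a colon, neither keys nor values contain blanks, and no replacement string itself contains another substitution keyword (otherwise the order of replacement would matter). Because `str.split()` splits on any whitespace, the multi-line attribute values shown in the CDL examples of Appendix L parse in the same way.

```python
from typing import Dict


def parse_pairs(attribute: str) -> Dict[str, str]:
    """Parse a blank-separated "key: value" attribute string into a dict.

    Assumes, as in the examples, that every key is a single token ending in
    ':' and that no key or value contains blanks.
    """
    tokens = attribute.split()
    if len(tokens) % 2:
        raise ValueError(f"unpaired element in {attribute!r}")
    pairs: Dict[str, str] = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"malformed key {key!r} in {attribute!r}")
        pairs[key[:-1]] = value
    return pairs


def substitute(file_name: str, substitutions: str) -> str:
    """Replace every substitution keyword in a fragment file name."""
    for base, replacement in parse_pairs(substitutions).items():
        if not (base.startswith("${") and base.endswith("}")):
            raise ValueError(f"keyword {base!r} is not of the form ${{...}}")
        file_name = file_name.replace(base, replacement)
    return file_name


aggregated_data = ("file: fragment_file format: fragment_format "
                   "address: fragment_address shape: fragment_shape")
subs = "${local}: file://local/data/ ${remote}: https://remote/data/"

print(parse_pairs(aggregated_data)["file"])            # fragment_file
print(substitute("${remote}April-December.nc", subs))  # https://remote/data/April-December.nc
```

Applied to values taken from the Appendix L examples, the sketch maps the keyword `file` to the fragment array variable `fragment_file` and expands the remote substitution in front of `April-December.nc` to give `https://remote/data/April-December.nc`.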
The format of a CF-netCDF fragment file must be indicated with the value `nc`. Other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. + +`address` + +The `address` fragment array variable defines how to find the fragments within the fragment files. In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. It may have any data type. For a CF-netCDF fragment file, the address must be the fragment's netCDF variable name. Addresses for other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. + +`shape` + +The integer-valued `shape` fragment array variable defines the shape of each fragment in its canonical form (see <>). In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment dimension. Each row provides the sizes of the fragments along that dimension of the fragment array. Rows that correspond to fragment dimensions that are smaller than the largest fragment dimension are padded with missing values. When the aggregated data is a scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. See <>. + +*Non-standardized features* + +Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. 
In general, the fragment array variable for a non-standardized feature has the same shape as the fragment array (possibly with extra trailing dimensions), and its values are assumed to apply to the corresponding fragments. However, if the fragment array variable is a scalar, then its single value is assumed to apply to all fragments. + +Use cases for non-standardized features include, but are not limited to: + +* To provide extra information that enables the aggregation of fragments stored in a file format for which the `address` fragment array variable alone is insufficient to identify the fragments within the fragment files. + +* To store extra metadata that relate to the fragments, but which are not necessary for the creation of the aggregated data. For instance, it may be convenient to store in the aggregation file an attribute from each fragment file so that it is available without having to open and inspect the fragment files themselves. See <>. + + +[[fragment-interpretation, Section 2.8.3, Fragment Interpretation]] +==== Fragment Interpretation + +The only restriction on how a fragment is stored in a fragment file, of any format, is that the fragment must be convertible to its __canonical form__ by the application program that is creating the aggregated data. A fragment must be converted to its canonical form prior to being inserted into the aggregated data. It is up to the creator of the aggregation variable to ensure that it is possible to convert all fragments to their canonical forms. + +The canonical form of a fragment is such that: + +* The fragment's data, in its entirety, provides the values for a unique, contiguous part of the aggregated data. + +* The fragment's data has the same number of dimensions as the aggregated data, and each of those dimensions must uniquely correspond to an aggregated dimension, and be in the same order. + +* Each dimension of the fragment's data has the same sense of directionality (i.e.
the sense in which it is increasing in physical space) as its corresponding aggregated dimension. + +* The fragment's data has the same units as the aggregation variable. + +* The fragment's data is not packed (i.e. stored using a smaller data type than the original data). + +* The fragment's data has the same data type as the aggregation variable. + +* The fragment's data has the same indication of missing values as the aggregation variable. + +The conversion of fragments to their canonical form is the responsibility of the application program which is creating the aggregated data, and it is up to the application program to decide what to do in the event that the conversion is not possible. + +The application program is expected to allow some or all of the following operations: + +* Inserting any omitted size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). + +* Converting the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). + +* Casting the data type of the fragment's data to the aggregation variable's data type. Note that some conversions may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. + +* Unpacking the fragment's data. Note that if the aggregation variable indicates that the aggregated data is packed (as specified by attributes defined in <>), then the unpacked fragment data values must represent packed values in the aggregated data. + +* Replacing missing values in the fragment's data with values indicated by the aggregation variable as missing. Note that it is up to the creator of the aggregation variable to ensure that the non-missing fragment data values do not coincide with any of the aggregation variable's missing values. 
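The decoding rules proposed above (the `aggregated_data` and `substitutions` attributes, and the padded rows of the `shape` fragment array variable) can be prototyped in a few lines. The Python below is a hypothetical sketch, not part of the proposed text and not the API of any existing library. The function names are invented for illustration, attribute values are assumed to arrive as plain strings, and the `shape` fragment array variable is assumed to have been read into nested lists with missing values represented as `None`. It covers only attribute parsing and fragment placement; opening fragment files, choosing among fragment versions, and converting fragments to canonical form are left out.

```python
from itertools import accumulate, product


def _parse_pairs(attr):
    """Split a blank-separated "key: value" string into a dict (hypothetical helper)."""
    tokens = attr.split()
    if len(tokens) % 2:
        raise ValueError("expected blank-separated 'key: value' pairs")
    pairs = {}
    for key, value in zip(tokens[::2], tokens[1::2]):
        if not key.endswith(":"):
            raise ValueError(f"malformed keyword: {key!r}")
        pairs[key[:-1]] = value
    return pairs


def parse_aggregated_data(attr):
    """Map each feature keyword (file, format, address, shape, ...)
    to the name of its fragment array variable."""
    features = _parse_pairs(attr)
    missing = {"file", "format", "address", "shape"} - set(features)
    if missing:
        raise ValueError(f"missing mandatory features: {sorted(missing)}")
    return features


def parse_substitutions(attr):
    """Parse a substitutions attribute into {base: replacement}."""
    subs = _parse_pairs(attr)
    for base in subs:
        # Each base keyword must look like ${...}
        if not (base.startswith("${") and base.endswith("}")):
            raise ValueError(f"base must have the form ${{...}}: {base!r}")
    return subs


def apply_substitutions(name, subs):
    """Replace every occurrence of each base in a fragment file name."""
    for base, replacement in subs.items():
        name = name.replace(base, replacement)
    return name


def fragment_slices(shape_rows):
    """Map each fragment-array index to the slices of the aggregated
    data that the fragment fills.

    shape_rows has one row per fragment dimension, holding the fragment
    sizes along that dimension, padded at the end with None.
    """
    bounds = []
    for row in shape_rows:
        sizes = [s for s in row if s is not None]  # drop trailing padding
        ends = list(accumulate(sizes))              # running totals give stop indices
        starts = [0] + ends[:-1]
        bounds.append(list(zip(starts, ends)))
    return {
        index: tuple(slice(*bounds[d][i]) for d, i in enumerate(index))
        for index in product(*(range(len(b)) for b in bounds))
    }
```

For instance, with the shape rows of Example L.1 (`[[3, 9], [1, None], [73, None], [144, None]]`), `fragment_slices` places fragment `(0, 0, 0, 0)` at time indices 0 to 2 and fragment `(1, 0, 0, 0)` at time indices 3 to 11 of the `(12, 1, 73, 144)` aggregated data. In this sketch, scalar aggregated data would be passed as an empty list of rows, giving a single fragment with an empty index.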
From 616c4c0e2d48d6bc8ed112c3d5c8b85a71d93e15 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 18 Apr 2024 18:05:28 +0100 Subject: [PATCH 02/59] aggregation variables first draft --- appl.adoc | 35 +++++++++++++----- ch02.adoc | 104 +++++++++++++++++++++++++++++++++++++++++------------- 2 files changed, 107 insertions(+), 32 deletions(-) diff --git a/appl.adoc b/appl.adoc index cd49b5f6..525f35c9 100644 --- a/appl.adoc +++ b/appl.adoc @@ -3,7 +3,8 @@ [appendix] == Aggregation Variable Examples -This appendix contains examples of aggregation variables. Details of how to encode and decode aggregation variables may found in <>. +This appendix contains examples of aggregation variables. +Details of how to encode and decode aggregation variables may found in <>. [[example-L.1]] [caption=] @@ -70,7 +71,10 @@ data: 73, _, 144, _ ; ---- -In this example, the `temperature` data variable is an aggregation variable. Its four-dimensional aggregated data with shape `(12, 1, 73, 144)` is constructed from two non-overlapping fragments, with shapes `(3, 1, 73, 144)` and `(9, 1, 73, 144)`, which span the first 3 and last 9 elements respectively of the `time` aggregated dimension. The fragment files names are taken as being relative to the current directory location of the aggregation file, since they are not fully qualified URIs. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +In this example, the `temperature` data variable is an aggregation variable. +Its four-dimensional aggregated data with shape `(12, 1, 73, 144)` is constructed from two non-overlapping fragments, with shapes `(3, 1, 73, 144)` and `(9, 1, 73, 144)`, which span the first 3 and last 9 elements respectively of the `time` aggregated dimension. +The fragment files names are taken as being relative to the current directory location of the aggregation file, since they are not fully qualified URIs. 
+The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== @@ -144,7 +148,10 @@ data: 73, _, 144, _ ; ---- -This example is similar to <>, but now the fragment file names are fully qualified URIs, and two versions of the second fragment have been provided. The `fragment_file` fragment array variable has the extra trailing dimension `versions` to accommodate the extra fragment version. There is only one version of the first fragment, so its trailing dimension is padded with missing data. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +This example is similar to <>, but now the fragment file names are fully qualified URIs, and two versions of the second fragment have been provided. +The `fragment_file` fragment array variable has the extra trailing dimension `versions` to accommodate the extra fragment version. +There is only one version of the first fragment, so its trailing dimension is padded with missing data. +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== [[example-L.3]] @@ -235,7 +242,8 @@ data: 144, _ ; fragment_shape_time = 3, 9 ; ---- -This example is similar to <>, but now the fragment file names have been defined using the string substitutions given by the **`substitutions`** attribute of the `fragment_file` fragment array variable `fragment_file`. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +This example is similar to <>, but now the fragment file names have been defined using the string substitutions given by the **`substitutions`** attribute of the `fragment_file` fragment array variable `fragment_file`. +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. In addition, `time` is now an aggregation coordinate variable, with its aggregated data being derived from the same fragment files as `temperature`. 
==== @@ -300,7 +308,10 @@ data: 91, 45, 45, 180, 180, _ ; ---- -This example is an encoding for the fragment array described in <>. The `temperature` data variable is an aggregation of 6 fragments. The fragment array shape is `(1, 3, 2)`, indicating that two of the three aggregated dimensions are spanned by multiple fragments. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +This example is an encoding for the fragment array described in <>. +The `temperature` data variable is an aggregation of 6 fragments. +The fragment array shape is `(1, 3, 2)`, indicating that two of the three aggregated dimensions are spanned by multiple fragments. +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== [[example-L.5]] @@ -374,7 +385,10 @@ data: 37, 36, _, _, _, _, _, _, _, _, _, _, 36, 36, 36, 36, _, _, _, _, _, _, _, _ ; ---- -In this example, the `temperature` data variable is an aggregation of 96 fragments. The fragment array shape is `(12, 1, 2, 4)`, indicating that three of the four aggregated dimensions are spanned by multiple fragments. The `pressure` data variable is not an aggregation variable. The data for the `pressure`, `level`, `latitude` and `longitude` variables, and the `fragment_file` fragment array variable, are omitted for clarity. +In this example, the `temperature` data variable is an aggregation of 96 fragments. +The fragment array shape is `(12, 1, 2, 4)`, indicating that three of the four aggregated dimensions are spanned by multiple fragments. +The `pressure` data variable is not an aggregation variable. +The data for the `pressure`, `level`, `latitude` and `longitude` variables, and the `fragment_file` fragment array variable, are omitted for clarity. 
==== [[example-L.6]] @@ -465,7 +479,11 @@ data: fragment_shape = 5000, 4000, 6000 ; fragment_shape_latlon = 1, 1, 1 ; ---- -In this example, three fragments are aggregated into a collection of DSG timeseries feature types with contiguous ragged array representation. The auxiliary coordinate variables which span either the `obs` or `station` dimensions are also aggregation variables. The time variables in the fragment files all have different netCDF variables names, which differ from the netCDF name of the `time` aggregation variable. The fragments for all aggregation variable come from the same three fragment files. No data have been omitted from the CDL. +In this example, three fragments are aggregated into a collection of DSG timeseries feature types with contiguous ragged array representation. +The auxiliary coordinate variables which span either the `obs` or `station` dimensions are also aggregation variables. +The time variables in the fragment files all have different netCDF variables names, which differ from the netCDF name of the `time` aggregation variable. +The fragments for all aggregation variable come from the same three fragment files. +No data have been omitted from the CDL. ==== [[example-L.7]] @@ -538,5 +556,6 @@ data: 144, _ ; fragment_id = "04821b9-7eb5-4046-937b-0bf06b01588", "056d1ee0-a183-43b3-ae67-1ec6aa1532a" ; ---- -This example is similar to <>, but now the **`aggregated_data`** attribute also includes the non-standardized keyword `id`, which has the fragment array variable `fragment_id`. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +This example is similar to <>, but now the **`aggregated_data`** attribute also includes the non-standardized keyword `id`, which has the fragment array variable `fragment_id`. +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. 
==== \ No newline at end of file diff --git a/ch02.adoc b/ch02.adoc index 24e3c6a4..329a9f0d 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -271,11 +271,16 @@ If a group attribute is defined in a parent group, and one of the child group re [[aggregation-variables, Section 2.8, "Aggregation Variables"]] === Aggregation Variables -An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ stored in __fragment files__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. +An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ stored in __fragment files__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. +The aggregation variable does not contain any actual data; instead, it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. -Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple files, whilst taking up very little space on disk (since the aggregation file contains no copies of the data in the fragments). The fragment files may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as CF-compliant view of non-CF datasets. Storing aggregations is useful for data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; and for archive curation, as the aggregation can act as a metadata-rich archive index.
+Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple files, whilst taking up very little space on disk (since the aggregation file contains no copies of the data in the fragments). +The fragment files may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as a CF-compliant view of non-CF datasets. +Storing aggregations is useful for data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; and for archive curation, as the aggregation can act as a metadata-rich archive index. -An aggregation variable must be a scalar (i.e. it has no dimensions) and the value of its single element is immaterial. It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. The data type of the aggregated data is the same as the data type of the aggregation variable. +An aggregation variable must be a scalar (i.e. it has no dimensions) and the value of its single element is immaterial. +It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. +The data type of the aggregated data is the same as the data type of the aggregation variable. Any variable may be an aggregation variable, and being an aggregation variable does not affect its role within CF (i.e. data variable, coordinate variable, boundary variable, cell measure variable, etc.).
@@ -287,9 +292,13 @@ The conventions do not currently offer guidance to dataset creators on how to de [[aggregated-dimensions, Section 2.8.1, Aggregated Dimensions]] ==== Aggregated Dimensions -The aggregated dimensions must be stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. The value of the **`aggregated_dimensions`** attribute is a blank separated list of the aggregated dimension names given in the order which matches the dimensions of the aggregated data. If the aggregated data is scalar then the **`aggregated_dimensions`** attribute must be an empty string. The aggregated dimensions must exist as dimensions in the aggregation file. +The aggregated dimensions must be stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. +The value of the **`aggregated_dimensions`** attribute is a blank-separated list of the aggregated dimension names given in the order which matches the dimensions of the aggregated data. +If the aggregated data is scalar then the **`aggregated_dimensions`** attribute must be an empty string. +The aggregated dimensions must exist as dimensions in the aggregation file. -The interpretation of all variables needs to account for the fact that the aggregated dimensions of an aggregation variable have exactly the same status as the dimensions of a normal (i.e. non-aggregation) variable. For instance: +The interpretation of all variables needs to account for the fact that the aggregated dimensions of an aggregation variable have exactly the same status as the dimensions of a normal (i.e. non-aggregation) variable.
+For instance: * coordinate and auxiliary coordinate variables must share their dimensions with the aggregated dimensions of their aggregation data variable, * an aggregation coordinate variable (which will be a scalar) must have the same name as its aggregated dimension, @@ -298,7 +307,10 @@ The interpretation of all variables needs to account for the fact that the aggre [[aggregated-data, Section 2.8.2, Aggregated Data]] ==== Aggregated Data -The fragments are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. Each dimension of the fragment array is called a __fragment dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. The size of a fragment dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. See <>. +The fragments are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. +Each dimension of the fragment array is called a __fragment dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. +The size of a fragment dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. +See <>. The aggregated data is created by concatenating the fragments' data along each fragment dimension, in the order in which they appear in the fragment array. @@ -353,53 +365,94 @@ Fragment data shape `(17, 45, 180)` + `(-45, -90]` degrees north + `[180, 360)` degrees east |=============== -Six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane. The fragments combine to create three-dimensional aggregated data that has global `(Z, Y, X)` coverage, with shape `(17, 181, 360)`. 
The Z aggregated dimension is spanned by 1 fragment, the Y aggregated dimension is spanned by 3 fragments, and the X aggregated dimension is spanned by 2 fragments. See <> for a CDL representation of this fragment array. - +Six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. +Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane. +The fragments combine to create three-dimensional aggregated data that has global `(Z, Y, X)` coverage, with shape `(17, 181, 360)`. +The Z aggregated dimension is spanned by 1 fragment, the Y aggregated dimension is spanned by 3 fragments, and the X aggregated dimension is spanned by 2 fragments. +See <> for a CDL representation of this fragment array. ==== -The fragment array is defined by the aggregation variable's **`aggregated_data`** attribute. This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ that provides the feature's values for each fragment in the fragment array. The order of elements in the **`aggregated_data`** attribute is not significant. +The fragment array is defined by the aggregation variable's **`aggregated_data`** attribute. +This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ that provides the feature's values for each fragment in the fragment array. +The order of elements in the **`aggregated_data`** attribute is not significant. 
There are four standardized and mandatory features, given by the `file`, `format`, `address`, and `shape` keywords; and any number of non-standardized features are also allowed: `file` -The string-valued `file` fragment array variable defines how to find each fragment file. In general it has the same shape as the fragment array, and its values specify the fragment file names. Each file name must take one of the following forms: +The string-valued `file` fragment array variable defines how to find each fragment file. +In general it has the same shape as the fragment array, and its values specify the fragment file names. +Each file name must take one of the following forms: -* A fully qualified Uniform Resource Identifier (URI, i.e. one that starts with `file://`, `http://`, `s3://`, etc.). If the aggregation file is moved to another location then it will still be able to access the fragment files which haven't moved. +* A fully qualified Uniform Resource Identifier (URI, i.e. one that starts with `file://`, `http://`, `s3://`, etc.). +If the aggregation file is moved to another location then it will still be able to access the fragment files which haven't moved. -* A file path that is relative to the current location of the aggregation file. If the aggregation file is moved then the fragment files must also be moved to preserve their relative locations. +* A file path that is relative to the current location of the aggregation file. +If the aggregation file is moved then the fragment files must also be moved to preserve their relative locations. -Multiple versions of a fragment may be provided if an extra trailing dimension is included in the `file` fragment array variable. Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data.
This is useful when it is known in advance that various file locations will be possible for the fragment, but it is not known which of them will exist at any given future time. For instance, this feature could be used to define remotely stored and locally cached versions of a fragment, allowing an application program to only commit to the expense of accessing the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need have the same number of versions. If a fragment has fewer versions than some others, then its trailing dimension must be padded with missing values. See <>. +Multiple versions of a fragment may be provided if an extra trailing dimension is included in the `file` fragment array variable. +Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. +This is useful when it is known in advance that various file locations will be possible for the fragment, but it is not known which of them will exist at any given future time. +For instance, this feature could be used to define remotely stored and locally cached versions of a fragment, allowing an application program to only commit to the expense of accessing the remote version if the local version does not exist. +Every fragment must have at least one version, but not all fragments need have the same number of versions. +If a fragment has fewer versions than some others, then its trailing dimension must be padded with missing values. +See <>. -A fragment file name may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. 
The use of substitutions can save space in the aggregation file; and in the event that the fragment files are moved from their original locations it may be possible for the fragment file names to be modified by editing the **`substitutions`** attribute, rather than by changing the `file` fragment array variable values themselves. The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__base: replacement__", where __base__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by the string defined by __replacement__ prior to locating and reading the fragment file. The order of elements is not significant. The _base_ keyword must have the form `${\...}`, where `\...` represents any characters. For instance, a fragment file name of `\file://data/store/file.nc` could also be stored as `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. See <>. +A fragment file name may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. +The use of substitutions can save space in the aggregation file; and in the event that the fragment files are moved from their original locations it may be possible for the fragment file names to be modified by editing the **`substitutions`** attribute, rather than by changing the `file` fragment array variable values themselves. +The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__base: replacement__", where __base__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by the string defined by __replacement__ prior to locating and reading the fragment file. +The order of elements is not significant. +The _base_ keyword must have the form `${\...}`, where `\...` represents any characters. 
+For instance, a fragment file name of `\file://data/store/file.nc` could also be stored as `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. +See <>. `format` -The string-valued `format` fragment array variable defines the format of the fragment files. In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. The format of a CF-netCDF fragment file must be indicated with the value `nc`. Other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. +The string-valued `format` fragment array variable defines the format of the fragment files. +In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. +However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. +The format of a CF-netCDF fragment file must be indicated with the value `nc`. +Other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. `address` -The `address` fragment array variable defines how to find the fragments within the fragment files. In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. It may have any data type. For a CF-netCDF fragment file, the address must be the fragment's netCDF variable name. Addresses for other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. 
- +The `address` fragment array variable defines how to find the fragments within the fragment files. +In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. +However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. +It may have any data type. +For a CF-netCDF fragment file, the address must be the fragment's netCDF variable name. +Addresses for other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. + `shape` -The integer-valued `shape` fragment array variable defines the shape of each fragment in its canonical form (see <>). In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment dimension. Each row provides the sizes of the fragments along that dimension of the fragment array. Rows that correspond to fragment dimensions that are smaller than the largest fragment dimension are padded with missing values. When the aggregated data is a scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. See <>. +The integer-valued `shape` fragment array variable defines the shape of each fragment in its canonical form (see <>). +In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment dimension. 
+Each row provides the sizes of the fragments along that dimension of the fragment array. +Rows that correspond to fragment dimensions that are smaller than the largest fragment dimension are padded with missing values. +When the aggregated data is a scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. +See <>. *Non-standardized features* -Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. In general, the fragment array variable for a non-standardized feature has the same shape as the fragment array (possibly with extra trailing dimensions), and its values are assumed to apply to the corresponding fragments. However, if the fragment array variable is a scalar, then its single value is assumed to apply to all fragments. +Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. +In general, the fragment array variable for a non-standardized feature has the same shape as the fragment array (possibly with extra trailing dimensions), and its values are assumed to apply to the corresponding fragments. +However, if the fragment array variable is a scalar, then its single value is assumed to apply to all fragments. Use cases for non-standardized features include, but are not limited to: * To provide extra information that enables the aggregation of fragments stored in a file format for which the `address` fragment array variable alone is insufficient to identify the fragments within the fragment files. -* To store extra metadata that relate to the fragments, but which are not necessary for the creation of the aggregated data. 
For instance, it may be convenient to store in the aggregation file an attribute from each fragment file so that it is available without having to open and inspect the fragment files themselves. See <>. +* To store extra metadata that relate to the fragments, but which are not necessary for the creation of the aggregated data. +For instance, it may be convenient to store in the aggregation file an attribute from each fragment file so that it is available without having to open and inspect the fragment files themselves. +See <>. [[fragment-interpretation, Section 2.8.3, Fragment Interpretation]] ==== Fragment Interpretation -The only restriction on the how a fragment is stored in a fragment file, of any format, is that the fragment must be convertible to its __canonical form__ by the application program that is creating the aggregated data. A fragment must be converted to its canonical form prior to being inserted into the aggregated data. It is up to the creator of the aggregation variable to ensure that it is possible to convert all fragments to their canonical forms. +The only restriction on the how a fragment is stored in a fragment file, of any format, is that the fragment must be convertible to its __canonical form__ by the application program that is creating the aggregated data. +A fragment must be converted to its canonical form prior to being inserted into the aggregated data. +It is up to the creator of the aggregation variable to ensure that it is possible to convert all fragments to their canonical forms. The canonical form of a fragment is such that: @@ -425,8 +478,11 @@ The application program is expected to allow some or all of the following operat * Converting the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). -* Casting the data type of the fragment's data to the aggregation variable's data type. 
Note that some conversions may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. +* Casting the data type of the fragment's data to the aggregation variable's data type. +Note that some conversions may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. -* Unpacking the fragment's data. Note that if the aggregation variable indicates that the aggregated data is packed (as specified by attributes defined in <>), then the unpacked fragment data values must represent packed values in the aggregated data. +* Unpacking the fragment's data. +Note that if the aggregation variable indicates that the aggregated data is packed (as determined by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. -* Replacing missing values in the fragment's data with values indicated by the aggregation variable as missing. Note that it is up to the creator of the aggregation variable to ensure that the non-missing fragment data values do not coincide with any of the aggregation variable's missing values. +* Replacing missing values in the fragment's data with values indicated by the aggregation variable as missing. +Note that it is up to the creator of the aggregation variable to ensure that the non-missing fragment data values do not coincide with any of the aggregation variable's missing values.
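The `shape` fragment array variable described in the patch above is enough to work out where each fragment lands in the aggregated data, given that fragments are concatenated along each fragment dimension in fragment-array order. The Python sketch below is illustrative only and is not part of the CF text or of any CF library. The function names `aggregated_shape`, `fragment_locations` and `assemble_2d` are hypothetical, and `None` stands in for the netCDF missing values that pad the shorter rows. It assumes fragments have already been converted to their canonical form. It does not handle the scalar case, in which `shape` is one-dimensional and holds `1`.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# One row per aggregated dimension; None pads rows shorter than the
# largest fragment dimension (stand-in for the netCDF missing value).
ShapeRows = Sequence[Sequence[Optional[int]]]


def aggregated_shape(shape_rows: ShapeRows) -> Tuple[int, ...]:
    """Size of each aggregated dimension: the sum of its unpadded row."""
    return tuple(sum(n for n in row if n is not None) for row in shape_rows)


def fragment_locations(
    shape_rows: ShapeRows,
) -> Dict[Tuple[int, ...], Tuple[slice, ...]]:
    """Map each fragment-array index to the region of aggregated data it fills.

    A fragment's start along a dimension is the sum of the sizes of the
    fragments that precede it on that dimension's row.
    """
    # Drop the trailing padding from each row.
    sizes = [[n for n in row if n is not None] for row in shape_rows]
    # Running offsets along each dimension, e.g. [3, 2] -> [0, 3, 5].
    offsets = []
    for row in sizes:
        edges = [0]
        for n in row:
            edges.append(edges[-1] + n)
        offsets.append(edges)
    # The fragment array has one dimension per row, sized by the row length.
    locations = {}
    for index in product(*(range(len(row)) for row in sizes)):
        locations[index] = tuple(
            slice(offsets[dim][i], offsets[dim][i + 1])
            for dim, i in enumerate(index)
        )
    return locations


def assemble_2d(
    fragments: Dict[Tuple[int, int], List[List[float]]], shape_rows: ShapeRows
) -> List[List[float]]:
    """Concatenate 2-D canonical-form fragments into 2-D aggregated data."""
    n_rows, n_cols = aggregated_shape(shape_rows)
    data: List[List[float]] = [[float("nan")] * n_cols for _ in range(n_rows)]
    for index, (rows, cols) in fragment_locations(shape_rows).items():
        block = fragments[index]
        # Each fragment must have exactly the shape recorded for it.
        width = cols.stop - cols.start
        if len(block) != rows.stop - rows.start or any(
            len(r) != width for r in block
        ):
            raise ValueError(f"fragment {index} does not match the shape variable")
        for r, values in zip(range(rows.start, rows.stop), block):
            data[r][cols.start:cols.stop] = values
    return data
```

For the `(1, 3, 2)` fragment array of Example 2.2, the X row of `shape` would be `[180, 180]`. That places the second X fragment at offsets 180 to 360 of the aggregated dimension. Computing offsets from the `shape` variable alone, without opening any fragment file, is what allows an application to defer reading fragments until their data are actually needed.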
From bba211459b909c86b7a1fb89ff33fce9c9fd845a Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 18 Apr 2024 23:42:56 +0100 Subject: [PATCH 03/59] cfa --- ch02.adoc | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index 329a9f0d..889cb31d 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -301,7 +301,7 @@ The interpretation of all variables needs to account for the fact that the aggre For instance: * coordinate and auxiliary coordinate variables must share their dimensions with the aggregated dimensions of their aggregation data variable, -* an aggregation coordinate variable (which will be a scalar) must have the same name as its aggregated dimension, +* an aggregation coordinate variable must have the same name as its aggregated dimension, * etc. [[aggregated-data, Section 2.8.2, Aggregated Data]] @@ -312,7 +312,6 @@ Each dimension of the fragment array is called a __fragment dimension__, and cor The size of a fragment dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. See <>. - The aggregated data is created by concatenating the fragments' data along each fragment dimension, in the order in which they appear in the fragment array. Once the aggregated data has been created in memory, it has exactly the same status as the data of a normal (i.e. non-aggregation) variable. From e53b40018aa83640327d5638af8165373b4b46f6 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Fri, 19 Apr 2024 15:47:39 +0100 Subject: [PATCH 04/59] JMG comments --- ch02.adoc | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index 889cb31d..076faefd 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -272,6 +272,7 @@ If a group attribute is defined in a parent group, and one of the child group re === Aggregation Variables An __aggregation variable__ is a variable which has been formed by combining (i.e. 
aggregating) multiple __fragments__ stored in __fragment files__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. +A fragment is an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple files, whilst taking up very little space on disk (since the aggregation file contains no copies of the data in the fragments). @@ -284,9 +285,17 @@ The data type of the aggregated data is the same as the data type of the aggrega Any variable may be an aggregation variable, and being an aggregation variable does not affect its role within CF (i.e. data variable, coordinate variable, boundary variable, cell measure variable, etc.). +Aggregation variables may be used as data variables, ancillary variables, coordinate variables, auxiliary coordinate variables, boundary variables and cell measure variables. +Any text applying to any of these kinds of variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role, and any reference to a dimension of such a variable applies to the aggregation dimension of an aggregation variable. +For instance: + +* the dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimen +sions of the aggregation data variable, +* an aggregation coordinate variable must have the same name as its aggregated dimension. + The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. 
-The conventions do not currently offer guidance to dataset creators on how to decide if two or more fragments can be aggregated in this way. +The CF conventions do not currently offer guidance to dataset creators on how to decide if two or more fragments can be aggregated in this way. [[aggregated-dimensions, Section 2.8.1, Aggregated Dimensions]] @@ -297,12 +306,6 @@ The value of the **`aggregated_dimensions`** attribute is a blank separated list If the aggregated data is scalar then the **`aggregated_dimensions`** attribute must be an empty string. The aggregated dimensions must exist as dimensions in the aggregation file. -The interpretation of all variables needs to account for the fact that the aggregated dimensions of an aggregation variable have exactly the same status as the dimensions of a normal (i.e. non-aggregation) variable. -For instance: - -* coordinate and auxiliary coordinate variables must share their dimensions with the aggregated dimensions of their aggregation data variable, -* an aggregation coordinate variable must have the same name as its aggregated dimension, -* etc. [[aggregated-data, Section 2.8.2, Aggregated Data]] ==== Aggregated Data @@ -410,16 +413,17 @@ See <>. The string-valued `format` fragment array variable defines the format of the fragment files. In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. -The format of a CF-netCDF fragment file must be indicated with the value `nc`. +The format of a netCDF fragment file must be indicated with the value `nc`. Other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. - +The `format` fragment array variable may contain a range of different values, i.e. 
not all fragment files need to have the same format, provided that the `address` fragment array variable can still be used to find each fragment within its fragment file. + `address` -The `address` fragment array variable defines how to find the fragments within the fragment files. +The `address` fragment array variable defines how to find each fragment within its fragment file, i.e. the address of the fragment. In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. It may have any data type. -For a CF-netCDF fragment file, the address must be the fragment's netCDF variable name. +For a netCDF fragment file, the string-valued address must be the fragment's netCDF variable name. Addresses for other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. `shape` @@ -449,7 +453,7 @@ See <>. [[fragment-interpretation, Section 2.8.3, Fragment Interpretation]] ==== Fragment Interpretation -The only restriction on the how a fragment is stored in a fragment file, of any format, is that the fragment must be convertible to its __canonical form__ by the application program that is creating the aggregated data. +The only restriction on the how a fragment is stored in a fragment file, of any format, is that the fragment must contain sufficient metadata for it to be convertible to its __canonical form__ by the application program that is creating the aggregated data. A fragment must be converted to its canonical form prior to being inserted into the aggregated data. It is up to the creator of the aggregation variable to ensure that it is possible to convert all fragments to their canonical forms. 
@@ -459,11 +463,9 @@ The canonical form of a fragment is such that: * The fragment's data has the same number of dimensions as the aggregated data, and each of those dimensions must uniquely correspond to an aggregated dimension, and be in the same order. -* Each dimension of the fragment's data has the same sense of directionality (i.e. the sense in which it is increasing in physical space) as its corresponding aggregated dimension. - * The fragment's data has the same units as the aggregation variable. -* The fragment's data is not packed (i.e. stored using a smaller data type than the original data). +* The fragment's data is not packed (i.e. stored using a smaller data type than its original data). * The fragment's data has the same data type as the aggregation variable. From 0ccd1d3580774183ab846f43115f89ca421aa7b6 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 22 Apr 2024 14:26:41 +0100 Subject: [PATCH 05/59] remove outdated text --- ch02.adoc | 2 -- 1 file changed, 2 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index 076faefd..ac362a88 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -283,8 +283,6 @@ An aggregation variable must be a scalar (i.e. it has no dimensions) and the val It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. The data type of the aggregated data is the same as the data type of the aggregation variable. -Any variable may be an aggregation variable, and being an aggregation variable does not affect its role within CF (i.e. data variable, coordinate variable, boundary variable, cell measure variable, etc.). - Aggregation variables may be used as data variables, ancillary variables, coordinate variables, auxiliary coordinate variables, boundary variables and cell measure variables. 
Any text applying to any of these kinds of variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role, and any reference to a dimension of such a variable applies to the aggregation dimension of an aggregation variable. For instance: From 9eecc28e4e149eefe2c9dbaff3820e786694ae50 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 22 Apr 2024 19:49:02 +0100 Subject: [PATCH 06/59] cfa --- bibliography.adoc | 1 + ch02.adoc | 55 +++++++++++++++++++++++++++-------------------- conformance.adoc | 27 +++++++++++++++++++++++ 3 files changed, 60 insertions(+), 23 deletions(-) diff --git a/bibliography.adoc b/bibliography.adoc index 5673cc8d..1269b281 100644 --- a/bibliography.adoc +++ b/bibliography.adoc @@ -20,3 +20,4 @@ OGC document 12-063. 1st May 2015. - [[[XML]]] link:$$https://www.w3.org/TR/1998/REC-xml-19980210$$[Extensible Markup Language (XML) 1.0]. T. Bray, J. Paoli, and C.M. Sperberg-McQueen. 10 February 1998. - [[[CFDM]]] link:$$https://doi.org/10.5194/gmd-10-4619-2017$$[A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)]. Hassell, D., Gregory, J., Blower, J., Lawrence, B. N., and Taylor, K. E.: _Geosci. Model Dev._, 10, 4619-4646, 2017. - [[[UGRID]]] link:$$https://ugrid-conventions.github.io/ugrid-conventions$$[UGRID Conventions for storing unstructured (or flexible mesh) data in netCDF files] +- [[[URI]]] link:$$https://datatracker.ietf.org/doc/html/rfc3986$$[RFC3986. Uniform Resource Identifier (URI): Generic Syntax. January 2005]. diff --git a/ch02.adoc b/ch02.adoc index ac362a88..bc786cf7 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -290,6 +290,7 @@ For instance: * the dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimen sions of the aggregation data variable, * an aggregation coordinate variable must have the same name as its aggregated dimension. +* etc. 
The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. @@ -314,8 +315,8 @@ The size of a fragment dimension is equal to the number of fragments that are ne See <>. The aggregated data is created by concatenating the fragments' data along each fragment dimension, in the order in which they appear in the fragment array. +Any text applying to the data of a variable in the CF conventions applies in exactly the same way to the aggregated data of an aggregation variable. -Once the aggregated data has been created in memory, it has exactly the same status as the data of a normal (i.e. non-aggregation) variable. [[example-fragment-array]] [caption="Example 2.2. "] @@ -366,7 +367,7 @@ Fragment data shape `(17, 45, 180)` + `[180, 360)` degrees east |=============== Six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. -Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane. +Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane, which has 1 degree resolution. The fragments combine to create three-dimensional aggregated data that has global `(Z, Y, X)` coverage, with shape `(17, 181, 360)`. The Z aggregated dimension is spanned by 1 fragment, the Y aggregated dimension is spanned by 3 fragments, and the X aggregated dimension is spanned by 2 fragments. See <> for a CDL representation of this fragment array. @@ -384,42 +385,44 @@ The string-valued `file` fragment array variable defines how to find each fragme In general it has the same shape as the fragment array, and its values specify the fragment file names. Each file name must take one of the following forms: -* A fully qualified Uniform Resource Identifier (URI, i.e. one that starts with `file://`, `http://`, `s3://`, etc.). +* A fully qualified Uniform Resource Identifier (URI <>, i.e. one that starts with `file://`, `http://`, `s3://`, etc.).
If the aggregation file is moved to another location then it will still be able to access the fragment files which haven't moved. -* A file path that is relative to the current location of the aggregation file. +* A local file path that is relative to the current location of the aggregation file. If the aggregation file is moved then the fragment files must also be moved to preserve their relative locations. +Note that the only way to specify an absolute local file path (i.e. one that specifies the file location from the root directory) is with a file URI. + Multiple versions of a fragment may be provided if an extra trailing dimension is included in the `file` fragment array variable. Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. -This is useful when it is known in advance that various file locations will be possible for the fragment, but it is not known which of them will exist at any given future time. -For instance, this feature could be used to define remotely stored and locally cached versions of a fragment, allowing an application program to only commit to the expense of accessing the remote version if the local version does not exist. +This may be used when it is known in advance that various file locations will be possible for the fragment, but it is not known which of them will exist at any given future time. +For instance, this could be used to define remotely stored and locally cached versions of a fragment, allowing an application program to only commit to the expense of accessing the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need have the same number of versions. If a fragment has fewer versions than some others, then its trailing dimension must be padded with missing values. See <>.
A fragment file name may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. -The use of substitutions can save space in the aggregation file; and in the event that the fragment files are moved from their original locations it may be possible for the fragment file names to be modified by editing the **`substitutions`** attribute, rather than by changing the `file` fragment array variable values themselves. -The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__base: replacement__", where __base__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by the string defined by __replacement__ prior to locating and reading the fragment file. +The use of substitutions can save space in the aggregation file; and in the event that the fragment file names need to be modified it may be possible to achieve this by editing the **`substitutions`** attribute, rather than by changing the `file` fragment array variable values themselves. +The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__base: replacement__", where __base__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by the string defined by __replacement__ prior to locating the fragment file. The order of elements is not significant. -The _base_ keyword must have the form `${\...}`, where `\...` represents any characters. +The _base_ keyword must have the form `${\*}`, where `*` represents any number of any characters. For instance, a fragment file name of `\file://data/store/file.nc` could also be stored as `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. See <>. `format` The string-valued `format` fragment array variable defines the format of the fragment files. 
-In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. -However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. +In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. +However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragment versions. The format of a netCDF fragment file must be indicated with the value `nc`. -Other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. +Other fragment file formats may be provided, on the understanding that an application program may choose to ignore any values that it does not understand. The `format` fragment array variable may contain a range of different values, i.e. not all fragment files need to have the same format, provided that the `address` fragment array variable can still be used to find each fragment within its fragment file. `address` The `address` fragment array variable defines how to find each fragment within its fragment file, i.e. the address of the fragment. -In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment. -However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. +In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment version. +However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragment versions. It may have any data type. For a netCDF fragment file, the string-valued address must be the fragment's netCDF variable name.
Addresses for other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. @@ -429,30 +432,36 @@ Addresses for other fragment file formats are allowed, on the understanding that The integer-valued `shape` fragment array variable defines the shape of each fragment in its canonical form (see <>). In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment dimension. Each row provides the sizes of the fragments along that dimension of the fragment array. -Rows that correspond to fragment dimensions that are smaller than the largest fragment dimension are padded with missing values. +Rows corresponding to fragment dimensions that are smaller than the largest fragment dimension are padded with missing values. When the aggregated data is a scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. See <>. *Non-standardized features* Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. +The fragment array variable for a non-standardized feature must either be a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, with or without the extra trailing dimension if the `file` fragment array variable has one. + +* An a the same shape as the fragment array (possibly with extra trailing dimensions), and its values are assumed to apply to the corresponding fragments. 
+However, if the non-standardized fragment array variable is a scalar, then its single value is assumed to apply to all fragments. + + In general, the fragment array variable for a non-standardized feature has the same shape as the fragment array (possibly with extra trailing dimensions), and its values are assumed to apply to the corresponding fragments. -However, if the fragment array variable is a scalar, then its single value is assumed to apply to all fragments. +However, if the non-standardized fragment array variable is a scalar, then its single value is assumed to apply to all fragments. Use cases for non-standardized features include, but are not limited to: * To provide extra information that enables the aggregation of fragments stored in a file format for which the `address` fragment array variable alone is insufficient to identify the fragments within the fragment files. * To store extra metadata that relate to the fragments, but which are not necessary for the creation of the aggregated data. -For instance, it may be convenient to store in the aggregation file an attribute from each fragment file so that it is available without having to open and inspect the fragment files themselves. +For instance, it may be convenient to store in the aggregation file an attribute from each fragment file, making it available without having to open and inspect the fragment files themselves. See <>. [[fragment-interpretation, Section 2.8.3, Fragment Interpretation]] ==== Fragment Interpretation -The only restriction on the how a fragment is stored in a fragment file, of any format, is that the fragment must contain sufficient metadata for it to be convertible to its __canonical form__ by the application program that is creating the aggregated data. -A fragment must be converted to its canonical form prior to being inserted into the aggregated data. 
+A fragment stored in a fragment file, of any format, must be converted to its __canonical form__ prior to being inserted into the aggregated data. +This means that the fragment file must contain data with sufficient metadata that the application program creating the aggregated data can do the conversion. It is up to the creator of the aggregation variable to ensure that it is possible to convert all fragments to their canonical forms. The canonical form of a fragment is such that: @@ -473,12 +482,12 @@ The conversion of fragments to their canonical form is the responsibility of the The application program is expected to allow some or all of the following operations: -* Inserting any omitted size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). +* Inserting omitted size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). -* Converting the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). +* Transforming the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). -* Casting the data type of the fragment's data to the aggregation variable's data type. -Note that some conversions may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. +* Transforming the fragment's data to have the same data type as the aggregated data. +Note that some transformations may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. * Unpacking the fragment's data. 
Note that if the aggregation variable indicates that the aggregated data is packed (as determined by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. diff --git a/conformance.adoc b/conformance.adoc index fddd28ea..f061f65b 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -124,6 +124,33 @@ References can be absolute, relative or with no path, in which case, the variabl * NUG-coordinate variables that are not in the referring group or one of its direct ancestors should be referenced by absolute or relative paths rather than relying on the lateral search algorithm. +[[aggregation-variables]] +=== 2.8 Aggregation Variables + +*Requirements:* + +* An aggregation variable has an **`aggregated_dimensions`** attribute whose string value is a blank separated list of the aggregated dimension names. +Each aggregated dimension must name a dimension in the file. + +* An aggregation variable must be a scalar. + +* An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form **`feature: variable`**. Each **`variable`** must be the name of a variable in the file. +The **`feature`** keywords must include `file`, `format`, `address`, and `shape`. + + - The `file` variable must have a string data type and have the same number of dimensions as there are aggregated dimensions, possibly with the inclusion of one extra trailing dimension. + + - The `format` variable must have a string data type and either be a scalar, or else have the same dimensions in the same order as the `file` variable. + + - The `address` variable must either be a scalar, or else have the same dimensions in the same order as the `file` variable. + + - The `shape` variable must have an integer data type. + If there are aggregated dimensions then the `shape` variable must be two dimensional, with the size of the slower varying dimension (i.e.
the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `file` variable dimensions, excluding its extra trailing dimension if it has one. + The rows correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row must equal the size of its corresponding aggregated dimension. + If there are no aggregated dimensions then the `shape` variable must be one dimensional, of size one, and contain the value `1`. + + - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `file` variable, with or without the extra trailing dimension if the `file` variable has one. + + [[section-6]] [[description-of-the-data]] === 3 Description of the Data From 4a3a390ebc6cb371d161ba7d5eff4c57ed23da8b Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 23 Apr 2024 14:13:13 +0100 Subject: [PATCH 07/59] CFA authors --- cf-conventions.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/cf-conventions.adoc b/cf-conventions.adoc index 8ecc4002..c5b308be 100644 --- a/cf-conventions.adoc +++ b/cf-conventions.adoc @@ -1,6 +1,6 @@ include::version.adoc[] = NetCDF Climate and Forecast (CF) Metadata Conventions -Brian{nbsp}Eaton; Jonathan{nbsp}Gregory; Bob{nbsp}Drach; Karl{nbsp}Taylor; Steve{nbsp}Hankin; Jon{nbsp}Blower; John{nbsp}Caron; Rich{nbsp}Signell; Phil{nbsp}Bentley; Greg{nbsp}Rappa; Heinke{nbsp}Höck; Alison{nbsp}Pamment; Martin{nbsp}Juckes; Martin{nbsp}Raspaud; Randy{nbsp}Horne; Timothy{nbsp}Whiteaker; David{nbsp}Blodgett; Charlie{nbsp}Zender; Daniel{nbsp}Lee; David{nbsp}Hassell; Alan{nbsp}D.{nbsp}Snow; Tobias{nbsp}Kölling; Dave{nbsp}Allured; Aleksandar{nbsp}Jelenak; Anders{nbsp}Meier{nbsp}Soerensen; Lucile{nbsp}Gaultier; Sylvain{nbsp}Herlédan; Fernando{nbsp}Manzano; Lars{nbsp}Bärring; 
Christopher{nbsp}Barker; Sadie{nbsp}Bartholomew +Brian{nbsp}Eaton; Jonathan{nbsp}Gregory; Bob{nbsp}Drach; Karl{nbsp}Taylor; Steve{nbsp}Hankin; Jon{nbsp}Blower; John{nbsp}Caron; Rich{nbsp}Signell; Phil{nbsp}Bentley; Greg{nbsp}Rappa; Heinke{nbsp}Höck; Alison{nbsp}Pamment; Martin{nbsp}Juckes; Martin{nbsp}Raspaud; Randy{nbsp}Horne; Timothy{nbsp}Whiteaker; David{nbsp}Blodgett; Charlie{nbsp}Zender; Daniel{nbsp}Lee; David{nbsp}Hassell; Alan{nbsp}D.{nbsp}Snow; Tobias{nbsp}Kölling; Dave{nbsp}Allured; Aleksandar{nbsp}Jelenak; Anders{nbsp}Meier{nbsp}Soerensen; Lucile{nbsp}Gaultier; Sylvain{nbsp}Herlédan; Fernando{nbsp}Manzano; Lars{nbsp}Bärring; Christopher{nbsp}Barker; Sadie{nbsp}Bartholomew; Bryan{nbsp}Lawrence; Neil{nbsp}Massey Version{nbsp}{current-version},{nbsp}{nbsp}{docprodtime}: See{nbsp}https://cfconventions.org{nbsp}for{nbsp}further{nbsp}information. :doctype: book :pdf-folio-placement: physical @@ -49,6 +49,8 @@ include::toc-extra.adoc[] * Lars Bärring, SMHI * Christopher Barker, NOAA * Sadie Bartholomew, NCAS and University of Reading +* Bryan Lawrence, NCAS and University of Reading +* Neil Massey, NCAS and STFC Many others have contributed to the development of CF through their participation in discussions about proposed changes. From 2bcd3ea3b56aa77e5f134dff8dc49189ddf90d2e Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 23 Apr 2024 14:13:34 +0100 Subject: [PATCH 08/59] clarity --- appl.adoc | 21 +++++---------- bibliography.adoc | 2 +- ch01.adoc | 2 +- ch02.adoc | 66 +++++++++++++++++++++++++---------------------- conformance.adoc | 5 ++++ 5 files changed, 49 insertions(+), 47 deletions(-) diff --git a/appl.adoc b/appl.adoc index 525f35c9..efa6aed7 100644 --- a/appl.adoc +++ b/appl.adoc @@ -12,7 +12,6 @@ Details of how to encode and decode aggregation variables may found in <>. -The CF conventions do not currently offer guidance to dataset creators on how to decide if two or more fragments can be aggregated in this way. 
+The CF conventions do not currently offer guidance to dataset creators on how to decide if two or more fragments can be aggregated. [[aggregated-dimensions, Section 2.8.1, Aggregated Dimensions]] @@ -314,7 +324,7 @@ Each dimension of the fragment array is called a __fragment dimension__, and cor The size of a fragment dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. See <>. -The aggregated data is created by concatenating the fragments' data along each fragment dimension, in the order in which they appear in the fragment array. +The aggregated data is created by concatenating the fragments' data along each fragment dimension, and in the order in which they appear in the fragment array. Any text applying to the data of a variable in the CF conventions applies in exactly the same way to the aggregated data of an aggregation variable. @@ -367,7 +377,7 @@ Fragment data shape `(17, 45, 180)` + `[180, 360)` degrees east |=============== Six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. -Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane, which as 1 degree resolution. +Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane, which has 1 degree resolution. The fragments combine to create three-dimensional aggregated data that has global `(Z, Y, X)` coverage, with shape `(17, 181, 360)`. The Z aggregated dimension is spanned by 1 fragment, the Y aggregated dimension is spanned by 3 fragments, and the X aggregated dimension is spanned by 2 fragments. See <> for a CDL representation of this fragment array. @@ -382,50 +392,52 @@ There are four standardized and mandatory features, given by the `file`, `format `file` The string-valued `file` fragment array variable defines how to find each fragment file. -In general it has the same shape as the fragment array, and its values specify the fragment file names. 
+In general it has the same shape as the fragment array, and its values provide the fragment file names. Each file name must take one of the following forms: * A fully qualified Uniform Resource Identifier (URI <>, i.e. one that starts with `file://`, `http://`, `s3://`, etc.). If the aggregation file is moved to another location then it will still be able to access the fragment files which haven't moved. * A local file path that is relative to the current location of the aggregation file. -If the aggregation file is moved then the fragment files must also be moved to preserve their relative locations. +If the aggregation file is moved to another location then the fragment files must also be moved to preserve their relative locations. -Note that the only way to specify an absolute local file path (i.e. one that specifies the file location from the root directory) is with a file URI. +Note that the only way to specify an absolute local file path (i.e. one that specifies the file location from the root directory) is with a file URI that starts with `file://`. Multiple versions of a fragment may be provided if an extra trailing dimension is included in the `file` fragment array variable. Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. -This may be be used when it is known in advance that various file locations will be possible for the fragment, but it is not known which of them will exist at any given future time. -For instance, this could be used to define remotely stored and locally cached versions of a fragment, allowing an application program to only commit to the expense of accessing the remote version if the local version does not exist. +This may be be used when it is known that various fragment file locations will be possible, but it is not known in advance which of them might exist at any given time. 
+For instance, providing remotely stored and locally cached versions of the same fragment could allow an application program to only commit to the expense of accessing the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need have the same number of versions. If a fragment has fewer versions than some others, then its trailing dimension must be padded with missing values. See <>. A fragment file name may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. -The use of substitutions can save space in the aggregation file; and in the event that the fragment file names need to be modified it may be possible to achieve this by editing the **`substitutions`** attribute, rather than by changing the `file` fragment array variable values themselves. -The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__base: replacement__", where __base__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by the string defined by __replacement__ prior to locating the fragment file. -The order of elements is not significant. -The _base_ keyword must have the form `${\*}`, where `*` represents any number of any characters. +The use of substitutions can save space in the aggregation file; and in the event that the fragment file names need to be modified it may be possible to achieve this by editing the **`substitutions`** attribute, rather than by changing the actual `file` fragment array variable values. +The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by the string defined by __replacement__, prior to locating the fragment file. 
+The order of elements in the **`substitutions`** attribute is not significant. +The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. For instance, a fragment file name of `\file://data/store/file.nc` could also be stored as `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. See <>. `format` The string-valued `format` fragment array variable defines the format of the fragment files. -In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value corresponding each fragment version. +In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragment versions. The format of a netCDF fragment file must be indicated with the value `nc`. Other fragment file formats may be provided, on the understanding that an application program may choose to ignore any values that it does not understand. -The `format` fragment array variable may contain a range of different values, i.e. not all fragment files need to have the same format, provided that the `address` fragment array variable can still be used to find each fragment within its fragment file. +The `format` fragment array variable may contain a range of different values, i.e. not all fragment files need to have the same format, provided that the `address` fragment array variable can still be used to find each fragment within its fragment file. See <>. + `address` The `address` fragment array variable defines how to find each fragment within its fragment file, i.e. the address of the fragment. -In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value for each fragment version. 
+In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragment versions. It may have any data type. For a netCDF fragment file, the string-valued address must be the fragment's netCDF variable name. Addresses for other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. +See <> and <>. `shape` @@ -441,13 +453,6 @@ See <>. Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. The fragment array variable for a non-standardized feature must either be a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, with or without the extra trailing dimension if the `file` fragment array variable has one. -* An a the same shape as the fragment array (possibly with extra trailing dimensions), and its values are assumed to apply to the corresponding fragments. -However, if the non-standardized fragment array variable is a scalar, then its single value is assumed to apply to all fragments. - - -In general, the fragment array variable for a non-standardized feature has the same shape as the fragment array (possibly with extra trailing dimensions), and its values are assumed to apply to the corresponding fragments. -However, if the non-standardized fragment array variable is a scalar, then its single value is assumed to apply to all fragments. 
- Use cases for non-standardized features include, but are not limited to: * To provide extra information that enables the aggregation of fragments stored in a file format for which the `address` fragment array variable alone is insufficient to identify the fragments within the fragment files. @@ -461,7 +466,7 @@ See <>. ==== Fragment Interpretation A fragment stored in a fragment file, of any format, must be converted to its __canonical form__ prior to being inserted into the aggregated data. -This means that the fragment file must contain data with sufficient metadata that the application program creating the aggregated data can do the conversion. +This means that the fragment file must contain data with sufficient metadata for the application program creating the aggregated data to do the conversion. It is up to the creator of the aggregation variable to ensure that it is possible to convert all fragments to their canonical forms. The canonical form of a fragment is such that: @@ -472,15 +477,14 @@ The canonical form of a fragment is such that: * The fragment's data has the same units as the aggregation variable. -* The fragment's data is not packed (i.e. stored using a smaller data type than its original data). +* The fragment's data is not numerically packed (i.e. stored using a smaller data type than its original data). * The fragment's data has the same data type as the aggregation variable. * The fragment's data has the same indication of missing values as the aggregation variable. The conversion of fragments to their canonical form is the responsibility of the application program which is creating the aggregated data, and it is up to the application program to decide what to do in the event that the conversion is not possible. 
- -The application program is expected to allow some or all of the following operations: +The application program may need to carry out any combination of the following operations when converting a fragment to its canonical form: * Inserting omitted size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). @@ -490,7 +494,7 @@ The application program is expected to allow some or all of the following operat Note that some transformations may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. * Unpacking the fragment's data. -Note that if the aggregation variable indicates that the aggregated data is packed (as determinded by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. +Note that if the aggregation variable indicates that the aggregated data is numerically packed (as determined by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. * Replacing missing values in the fragment's data with values indicated by the aggregation variable as missing. Note that it is up to the creator of the aggregation variable to ensure that the non-missing fragment data values do not coincide with any of the aggregation variable's missing values. diff --git a/conformance.adoc b/conformance.adoc index f061f65b..182eff8f 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -134,11 +134,16 @@ Each aggregated dimension must name a dimension in the file. * An aggregation variable must be a scalar. 
+* An aggregation variable must be one of a data variable, an ancillary variable, a coordinate variable, an auxiliary coordinate variable, a scalar coordinate variable, a boundary variable, a cell measure variable, a connectivity index variable, or a location index set variable. + * An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form **`feature: variable`**. Each **`variable`** must be the name of a variable in the file. The **`feature`** keywords must include `file`, `format`, `address`, and `shape`. - The `file` variable must have a string data type and have the same number of dimensions as there are aggregated dimensions, possibly with the inclusion of one extra trailing dimension. + - The **`substitutions`** attribute of a `file` variable must be a string whose value is a list of blank-separated word pairs in the form __substitution: replacement__. + The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. + - The `format` variable must have a string data type and either be a scalar, or else have the same dimensions in the same order as the `file` variable. - The `address` variable must either be a scalar, or else have the same dimensions in the same order as the `file` variable. From 9b5f18be0c911bab86dfce5e63ffa906df376d4c Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 23 Apr 2024 23:25:17 +0100 Subject: [PATCH 09/59] tidy --- ch02.adoc | 2 -- conformance.adoc | 4 ++-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index 5d1bf916..73425b37 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -304,8 +304,6 @@ For instance: The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>.
-The CF conventions do not currently offer guidance to dataset creators on how to decide if two or more fragments can be aggregated. [[aggregated-dimensions, Section 2.8.1, Aggregated Dimensions]] ==== Aggregated Dimensions diff --git a/conformance.adoc b/conformance.adoc index 182eff8f..19904dee 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -136,8 +136,8 @@ Each aggregated dimension must name a dimension in the file. * An aggregation variable must be one of a data variable, an ancillary variable, a coordinate variable, an auxiliary coordinate variable, a scalar coordinate variable, a boundary variable, a cell measure variable, a connectivity index variable, or a location index set variable. -* An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form **`feature: variable`**. Each **`variable`** must be the name of a variable in the file. -The **`feature`** keywords must include `file`, `format`, `address`, and `shape`. +* An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form __feature: variable__. Each __variable__ must be the name of a variable in the file. +The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - The `file` variable must have a string data type and have the same number of dimensions as there are aggregated dimensions, possibly with the inclusion of one extra trailing dimension.
From 9bd81a2f6852593e80498df0052d31ad1c284214 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Wed, 24 Apr 2024 09:51:06 +0100 Subject: [PATCH 10/59] tidy --- appl.adoc | 54 ++++++++++++++++++++++++++++-------------------------- ch02.adoc | 54 +++++++++++++++++++++++++++--------------------------- 2 files changed, 55 insertions(+), 53 deletions(-) diff --git a/appl.adoc b/appl.adoc index efa6aed7..26400798 100644 --- a/appl.adoc +++ b/appl.adoc @@ -16,14 +16,14 @@ dimensions: level = 1 ; latitude = 73 ; longitude = 144 ; - // Fragment dimensions + // Fragment array dimensions f_time = 2 ; f_level = 1 ; f_latitude = 1 ; f_longitude = 1 ; - // Extra dimensions + // Fragment shape dimensions j = 4 ; // Equal to the number of aggregated dimensions - i = 2 ; // Equal to the size of the largest fragment dimension + i = 2 ; // Equal to the size of the largest fragment array dimension variables: // Data variable @@ -88,14 +88,15 @@ dimensions: level = 1 ; latitude = 73 ; longitude = 144 ; - // Fragment dimensions + // Fragment array dimensions f_time = 2 ; f_level = 1 ; f_latitude = 1 ; f_longitude = 1 ; - // Extra dimensions + // Fragment shape dimensions j = 4 ; // Equal to the number of aggregated dimensions - i = 2 ; // Equal to the size of the largest fragment dimension + i = 2 ; // Equal to the size of the largest fragment array dimension + // Fragment versions dimension versions = 2 ; // The maximum number of versions for a fragment variables: @@ -163,16 +164,17 @@ dimensions: level = 1 ; latitude = 73 ; longitude = 144 ; - // Fragment dimensions + // Fragment array dimensions f_time = 2 ; f_level = 1 ; f_latitude = 1 ; f_longitude = 1 ; - // Extra dimensions + // Fragment shape dimensions j = 4 ; // Equal to the number of aggregated dimensions - i = 2 ; // Equal to the size of the largest fragment dimension + j_time = 1 ; // Equal to the number of aggregated dimensions for time + i = 2 ; // Equal to the size of the largest fragment array dimension + // Fragment 
versions dimension versions = 2 ; // The maximum number of versions for a fragment - j_time = 1 ; // Equal to the he number of aggregated dimensions for time variables: // Data variable @@ -255,13 +257,13 @@ dimensions: level = 17 ; latitude = 181 ; longitude = 360 ; - // Fragment dimensions + // Fragment array dimensions f_level = 1 ; f_latitude = 3 ; f_longitude = 2 ; - // Extra dimensions + // Fragment shape dimensions j = 3 ; // Equal to the number of aggregated dimensions - i = 3 ; // Equal to the size of the largest fragment dimension + i = 3 ; // Equal to the size of the largest fragment array dimension variables: // Data variable @@ -269,7 +271,7 @@ variables: temperature:standard_name = "air_temperature" ; temperature:units = "K" ; temperature:cell_methods = "time: mean" ; - temperature:aggregated_dimensions = "time level latitude longitude" ; + temperature:aggregated_dimensions = "level latitude longitude" ; temperature:aggregated_data = "file: fragment_file format: fragment_format address: fragment_address @@ -286,7 +288,7 @@ variables: longitude:units = "degrees_east" ; // Fragment array variables - string fragment_file(f_level, f_latitude, f_longitude) ; + string fragment_file(f_level, f_latitude, f_longitude) ; string fragment_format ; string fragment_address ; int fragment_shape(j, i) ; @@ -307,7 +309,7 @@ data: ---- This example is an encoding for the fragment array described in <>. The `temperature` data variable is an aggregation of 6 fragments. -The fragment array shape is `(1, 3, 2)`, indicating that two of the three aggregated dimensions are spanned by multiple fragments. +The fragment array shape is `(1, 3, 2)`, indicating that two of the three aggregated dimensions are spanned by multiple fragments. 
The distribution of missing values in the `fragment_shape` fragment array variable indicates that the `level` aggregated dimension is spanned by 1 fragment, the `latitude` aggregated dimension is spanned by 3 fragments, and the `longitude` aggregated dimension is spanned by 2 fragments. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== @@ -321,14 +323,14 @@ dimensions: level = 1 ; latitude = 73 ; longitude = 144 ; - // Fragment dimensions + // Fragment array dimensions f_time = 12 ; f_level = 1 ; f_latitude = 2 ; f_longitude = 4 ; - // Extra dimensions - j = 4 ; // Equal to the number of aggregated dimensions - i = 12 ; // Equal to the size of the largest fragment dimension + // Fragment shape dimensions + j = 4 ; // Equal to the number of aggregated dimensions + i = 12 ; // Equal to the size of the largest fragment array dimension variables: // Data variable @@ -395,11 +397,11 @@ The data for the `pressure`, `level`, `latitude` and `longitude` variables, and dimensions: station = 3 ; obs = 15000 ; - // Fragment dimensions + // Fragment array dimensions f_station = 3 ; - // Extra dimensions + // Fragment shape dimensions j = 1 ; // Equal to the number of aggregated dimensions - i = 3 ; // Equal to the size of the largest fragment dimension + i = 3 ; // Equal to the size of the largest fragment array dimension variables: // Data variable @@ -490,14 +492,14 @@ dimensions: level = 1 ; latitude = 73 ; longitude = 144 ; - // Fragment dimensions + // Fragment array dimensions f_time = 2 ; f_level = 1 ; f_latitude = 1 ; f_longitude = 1 ; - // Extra dimensions + // Fragment shape dimensions j = 4 ; // Equal to the number of aggregated dimensions - i = 2 ; // Equal to the size of the largest fragment dimension + i = 2 ; // Equal to the size of the largest fragment array dimension variables: // Data variable diff --git a/ch02.adoc b/ch02.adoc index 73425b37..91725c2b 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -318,11 +318,11 @@ The 
aggregated dimensions must exist as dimensions in the aggregation file. ==== Aggregated Data The fragments are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. -Each dimension of the fragment array is called a __fragment dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. -The size of a fragment dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. +Each dimension of the fragment array is called a __fragment array dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. +The size of a fragment array dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. See <>. -The aggregated data is created by concatenating the fragments' data along each fragment dimension, and in the order in which they appear in the fragment array. +The aggregated data are created by concatenating the fragments' data along each fragment array dimension, and in the order in which they appear in the fragment array. Any text applying to the data of a variable in the CF conventions applies in exactly the same way to the aggregated data of an aggregation variable. @@ -376,7 +376,7 @@ Fragment data shape `(17, 45, 180)` + |=============== Six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane, which has 1 degree resolution. -The fragments combine to create three-dimensional aggregated data that has global `(Z, Y, X)` coverage, with shape `(17, 181, 360)`. +The fragments combine to create three-dimensional aggregated data that have global `(Z, Y, X)` coverage, with shape `(17, 181, 360)`. 
The Z aggregated dimension is spanned by 1 fragment, the Y aggregated dimension is spanned by 3 fragments, and the X aggregated dimension is spanned by 2 fragments. See <> for a CDL representation of this fragment array. ==== @@ -387,10 +387,10 @@ The order of elements in the **`aggregated_data`** attribute is not significant. There are four standardized and mandatory features, given by the `file`, `format`, `address`, and `shape` keywords; and any amount of non-standardized features are also allowed: -`file` +*file* The string-valued `file` fragment array variable defines how to find each fragment file. -In general it has the same shape as the fragment array, and its values provide the fragment file names. +In general its data have the same shape as the fragment array, and its values provide the fragment file names. Each file name must take one of the following forms: * A fully qualified Uniform Resource Identifier (URI <>, i.e. one that starts with `file://`, `http://`, `s3://`, etc.). @@ -401,12 +401,12 @@ If the aggregation file is moved to another location then the fragment files mus Note that the only way to specify an absolute local file path (i.e. one that specifies the file location from the root directory) is with a file URI that starts with `file://`. -Multiple versions of a fragment may be provided if an extra trailing dimension is included in the `file` fragment array variable. +An extra trailing dimension, that is not a fragment array dimension, may be provided for specifying multiple versions of the fragments. Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. This may be be used when it is known that various fragment file locations will be possible, but it is not known in advance which of them might exist at any given time.
For instance, providing remotely stored and locally cached versions of the same fragment could allow an application program to only commit to the expense of accessing the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need have the same number of versions. -If a fragment has fewer versions than some others, then its trailing dimension must be padded with missing values. +Where fragments have fewer versions than others, the trailing dimension must be padded with missing values. See <>. A fragment file name may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. @@ -417,39 +417,39 @@ The __substitution__ keyword must have the form `${\*}`, where `*` represents an For instance, a fragment file name of `\file://data/store/file.nc` could also be stored as `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. See <>. -`format` +*format* The string-valued `format` fragment array variable defines the format of the fragment files. -In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. -However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragment versions. +In general it must have the same dimensions in the same order as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. +However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. The format of a netCDF fragment file must be indicated with the value `nc`. Other fragment file formats may be provided, on the understanding that an application program may choose to ignore any values that it does not understand. 
The `format` fragment array variable may contain a range of different values, i.e. not all fragment files need to have the same format, provided that the `address` fragment array variable can still be used to find each fragment within its fragment file. See <>. -`address` +*address* The `address` fragment array variable defines how to find each fragment within its fragment file, i.e. the address of the fragment. -In general it has the same shape as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. -However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragment versions. +In general it must have the same dimensions in the same order as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. +However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. It may have any data type. For a netCDF fragment file, the string-valued address must be the fragment's netCDF variable name. Addresses for other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. See <> and <>. -`shape` +*shape* -The integer-valued `shape` fragment array variable defines the shape of each fragment in its canonical form (see <>). -In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment dimension. +The integer-valued `shape` fragment array variable defines the shape of the data of each fragment in its canonical form (see <>). 
+In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. Each row provides the sizes of the fragments along that dimension of the fragment array. -Rows corresponding to fragment dimensions that are smaller than the largest fragment dimension are padded with missing values. +Rows corresponding to fragment array dimensions that are smaller than the largest fragment array dimension are padded with missing values. When the aggregated data is a scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. See <>. *Non-standardized features* Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. -The fragment array variable for a non-standardized feature must either be a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, with or without the extra trailing dimension if the `file` fragment array variable has one. +The fragment array variable for a non-standardized feature must either be a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, optionally omitting the extra trailing dimension for multiple fragment versions, if it exists. Use cases for non-standardized features include, but are not limited to: @@ -464,22 +464,22 @@ See <>. ==== Fragment Interpretation A fragment stored in a fragment file, of any format, must be converted to its __canonical form__ prior to being inserted into the aggregated data. 
-This means that the fragment file must contain data with sufficient metadata for the application program creating the aggregated data to do the conversion. +The fragment file must contain an array of data with sufficient metadata for it to be convertible to its canonical form by the application program that is creating the aggregated data. It is up to the creator of the aggregation variable to ensure that it is possible to convert all fragments to their canonical forms. The canonical form of a fragment is such that: -* The fragment's data, in its entirety, provides the values for a unique, contiguous part of the aggregated data. +* The fragment's data, in its entirety, provide the values for a unique, contiguous part of the aggregated data. -* The fragment's data has the same number of dimensions as the aggregated data, and each of those dimensions must uniquely correspond to an aggregated dimension, and be in the same order. +* The fragment's data have the same number of dimensions as the aggregated data, and each of those dimensions must uniquely correspond to an aggregated dimension, and be in the same order. -* The fragment's data has the same units as the aggregation variable. +* The fragment's data have the same units as the aggregation variable. -* The fragment's data is not numerically packed (i.e. stored using a smaller data type than its original data). +* The fragment's data are not numerically packed (i.e. stored using a smaller data type than its original data). -* The fragment's data has the same data type as the aggregation variable. +* The fragment's data have the same data type as the aggregation variable. -* The fragment's data has the same indication of missing values as the aggregation variable. +* The fragment's data have the same indication of missing values as the aggregation variable. 
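The canonical-form rules above can be illustrated with a minimal Python sketch. It assumes the fragment has already been read into a flat, row-major list.

The function `to_canonical` and its parameters are hypothetical. It does four things:

- It permits only size-1 dimensions to be inserted or removed.
- It replaces the fragment's own missing-value indicator with the aggregation variable's.
- It unpacks with `scale_factor` and `add_offset`.
- It casts to the target data type.

Units conversion is assumed to have been done beforehand, since it would need a units library.

```python
import math
from typing import Callable, List, Optional, Sequence


def to_canonical(
    values: Sequence[float],
    shape: Sequence[int],
    canonical_shape: Sequence[int],
    *,
    aggregation_fill: float,
    scale_factor: Optional[float] = None,
    add_offset: Optional[float] = None,
    fragment_fill: Optional[float] = None,
    cast: Callable[[float], float] = float,
) -> List[float]:
    """Hypothetical conversion of a flat, row-major fragment to canonical form.

    Units conversion is NOT handled here; it is assumed to be done already.
    """
    if len(values) != math.prod(shape):
        raise ValueError("number of values does not match the fragment shape")
    # Inserting or removing size-1 dimensions leaves the row-major order of the
    # values unchanged, but all other sizes must agree, in the same order.
    if [n for n in shape if n != 1] != [n for n in canonical_shape if n != 1]:
        raise ValueError(
            f"cannot map fragment shape {tuple(shape)} to {tuple(canonical_shape)}"
        )
    out = []
    for v in values:
        # Missing values are detected on the stored (packed) values.
        if fragment_fill is not None and v == fragment_fill:
            out.append(cast(aggregation_fill))
            continue
        # Unpack: scale first, then add the offset.
        if scale_factor is not None:
            v = v * scale_factor
        if add_offset is not None:
            v = v + add_offset
        out.append(cast(v))
    return out
```

Missing values are compared before unpacking because CF interprets the missing-value attributes of a packed variable in terms of the stored, packed values.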
The conversion of fragments to their canonical form is the responsibility of the application program which is creating the aggregated data, and it is up to the application program to decide what to do in the event that the conversion is not possible. The application program may need to carry out any combination of the following operations when converting a fragment to its canonical form: @@ -495,4 +495,4 @@ Note that some transformations may result in a loss of information (as could be Note that if the aggregation variable indicates that the aggregated data is numerically packed (as determined by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. * Replacing missing values in the fragment's data with values indicated by the aggregation variable as missing. -Note that it is up to the creator of the aggregation variable to ensure that the non-missing fragment data values do not coincide with any of the aggregation variable's missing values. +Note that it is up to the creator of the aggregation variable to ensure that none of the aggregation variable's missing values coincide with non-missing fragment data values. From eb788ac3a78ee69f3b066d3fb0e4cd2ef96dae60 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Wed, 24 Apr 2024 15:26:44 +0100 Subject: [PATCH 11/59] clarify use of URIs --- appl.adoc | 16 +++++++++++----- ch02.adoc | 22 +++++++--------------- conformance.adoc | 4 ++-- 3 files changed, 20 insertions(+), 22 deletions(-) diff --git a/appl.adoc b/appl.adoc index 26400798..849e10ae 100644 --- a/appl.adoc +++ b/appl.adoc @@ -71,8 +71,9 @@ data: 144, _ ; ---- In this example, the `temperature` data variable is an aggregation variable. -Its four-dimensional aggregated data with shape `(12, 1, 73, 144)` is constructed from two non-overlapping fragments, with shapes `(3, 1, 73, 144)` and `(9, 1, 73, 144)`, which span the first 3 and last 9 elements respectively of the `time` aggregated dimension. 
-The fragment files names are taken as being relative to the current directory location of the aggregation file, since they are not fully qualified URIs. +Its four-dimensional aggregated data with shape `(12, 1, 73, 144)` is constructed from two non-overlapping fragments, with data shapes `(3, 1, 73, 144)` and `(9, 1, 73, 144)`, which span the first 3 and last 9 elements respectively of the `time` aggregated dimension. +The fragment file names are relative URIs, and so in this case are assumed to be in the same location as the aggregation file. + The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== @@ -148,9 +149,10 @@ data: 73, _, 144, _ ; ---- -This example is similar to <>, but now the fragment file names are fully qualified URIs, and two versions of the second fragment have been provided. +This example is similar to <>, but now the fragment file names are absolute URIs, and two versions of the second fragment have been provided. The `fragment_file` fragment array variable has the extra trailing dimension `versions` to accommodate the extra fragment version. There is only one version of the first fragment, so its trailing dimension is padded with missing data. + The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== @@ -243,9 +245,9 @@ data: fragment_shape_time = 3, 9 ; ---- This example is similar to <>, but now the fragment file names have been defined using the string substitutions given by the **`substitutions`** attribute of the `fragment_file` fragment array variable `fragment_file`. -The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. - In addition, `time` is now an aggregation coordinate variable, with its aggregated data being derived from the same fragment files as `temperature`. + +The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. 
==== [[example-L.4]] @@ -310,6 +312,7 @@ data: This example is an encoding for the fragment array described in <>. The `temperature` data variable is an aggregation of 6 fragments. The fragment array shape is `(1, 3, 2)`, indicating that two of the three aggregated dimensions are spanned by multiple fragments. The distribution of missing values in the `fragment_shape` fragment array variable indicates that the `level` aggregated dimension is spanned by 1 fragment, the `latitude` aggregated dimension is spanned by 3 fragments, and the `longitude` aggregated dimension is spanned by 2 fragments. + The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== @@ -386,6 +389,7 @@ data: In this example, the `temperature` data variable is an aggregation of 96 fragments. The fragment array shape is `(12, 1, 2, 4)`, indicating that three of the four aggregated dimensions are spanned by multiple fragments. The `pressure` data variable is not an aggregation variable. + The data for the `pressure`, `level`, `latitude` and `longitude` variables, and the `fragment_file` fragment array variable, are omitted for clarity. ==== @@ -479,6 +483,7 @@ In this example, three fragments are aggregated into a collection of DSG timeser The auxiliary coordinate variables `time`, `lon`, and `lat` are also aggregation variables. The time variables in the fragment files all have different netCDF variables names, which differ from the netCDF name of the `time` aggregation variable. The fragments for all aggregation variables come from the same three fragment files, in this case. + No data have been omitted from the CDL. ==== @@ -552,5 +557,6 @@ data: fragment_id = "04821b9-7eb5-4046-937b-0bf06b01588", "056d1ee0-a183-43b3-ae67-1ec6aa1532a" ; ---- This example is similar to <>, but now the **`aggregated_data`** attribute also includes the non-standardized keyword `id`, which has the fragment array variable `fragment_id`. 
+ The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== \ No newline at end of file diff --git a/ch02.adoc b/ch02.adoc index 91725c2b..c9571027 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -389,17 +389,10 @@ There are four standardized and mandatory features, given by the `file`, `format *file* -The string-valued `file` fragment array variable defines how to find each fragment file. -In general its data have the same shape as the fragment array, and its values provide the fragment file names. -Each file name must take one of the following forms: - -* A fully qualified Uniform Resource Identifier (URI <>, i.e. one that starts with `file://`, `http://`, `s3://`, etc.). -If the aggregation file is moved to another location then it will still be able to access the fragment files which haven't moved. - -* A local file path that is relative to the current location of the aggregation file. -If the aggregation file is moved to another location then the fragment files must also be moved to preserve their relative locations. - -Note that the only way to specify an absolute local file path (i.e. one that specifies the file location from the root directory) is with a file URI that starts with `file://`. +The string-valued `file` fragment array variable defines the locations of the fragment files. +In general its dimensions are the fragment array dimensions in the same order as they occur in the fragment array, and its values provide the fragment file names. +Each fragment file name must be a Uniform Resource Identifier (URI) <>. If the fragment file name is a relative URI (i.e. a URI that does __not__ start with a scheme name followed by a colon, such as `file:`, `http:`, `s3:`, etc.) then it is taken as being relative to the current location of the aggregation file, and if the aggregation file is moved to another location then the fragment file may also need be moved for it to remain accessible. +See <> and <>. 
An extra trailing trailing dimension, that is not a fragment array dimension, may be provided for specifying multiple versions of the fragments. Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. @@ -411,7 +404,7 @@ See <>. A fragment file name may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. The use of substitutions can save space in the aggregation file; and in the event that the fragment file names need to be modified it may be possible to achieve this by editing the **`substitutions`** attribute, rather than by changing the actual `file` fragment array variable values. -The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by the string defined by __replacement__, prior to locating the fragment file. +The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by __replacement__, prior to locating the fragment file. The order of elements in the **`substitutions`** attribute is not significant. The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. For instance, a fragment file name of `\file://data/store/file.nc` could also be stored as `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. @@ -426,7 +419,6 @@ The format of a netCDF fragment file must be indicated with the value `nc`. 
Other fragment file formats may be provided, on the understanding that an application program may choose to ignore any values that it does not understand. The `format` fragment array variable may contain a range of different values, i.e. not all fragment files need to have the same format, provided that the `address` fragment array variable can still be used to find each fragment within its fragment file. See <>. - *address* The `address` fragment array variable defines how to find each fragment within its fragment file, i.e. the address of the fragment. @@ -444,12 +436,12 @@ In general, the `shape` fragment array variable is two-dimensional, with the siz Each row provides the sizes of the fragments along that dimension of the fragment array. Rows corresponding to fragment array dimensions that are smaller than the largest fragment array dimension are padded with missing values. When the aggregated data is a scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. -See <>. +See <>. *Non-standardized features* Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. -The fragment array variable for a non-standardized feature must either be a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, optionally omitting the extra trailing dimension for multiple fragment versions, if it exists. +The fragment array variable for a non-standardized feature must be either a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, optionally omitting the extra trailing dimension for multiple fragment versions, if it exists. 
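The dimension rules for the `format`, `address` and non-standardized fragment array variables can be checked with a small helper. The following is a hypothetical sketch, assuming dimensions are given as sequences of names.

The helper infers the extra versions dimension from the `file` variable having one more dimension than there are aggregated dimensions. Following the text of this patch, only non-standardized features may drop that trailing dimension; `format` and `address` may not.

```python
from typing import Sequence


def dims_compatible(
    var_dims: Sequence[str],
    file_dims: Sequence[str],
    n_aggregated_dims: int,
    *,
    may_omit_versions: bool,
) -> bool:
    """Hypothetical check of a fragment array variable's dimensions against `file`."""
    var_dims, file_dims = tuple(var_dims), tuple(file_dims)
    if not var_dims:
        return True  # a scalar value applies to every fragment
    if var_dims == file_dims:
        return True
    # The extra trailing (versions) dimension exists only when `file` has one
    # more dimension than there are aggregated dimensions.
    has_versions = len(file_dims) == n_aggregated_dims + 1
    return may_omit_versions and has_versions and var_dims == file_dims[:-1]
```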
Use cases for non-standardized features include, but are not limited to: diff --git a/conformance.adoc b/conformance.adoc index 19904dee..21a0f410 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -144,9 +144,9 @@ The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - The **`substitutions`** attribute of a `file` variable must be a string whose value is list of blank separated word pairs in the form __substitution: replacement__. The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. - - The `format` variable must have a string data type and either be a scalar, or else have the same dimensions in the same order as the `file` variable. + - The `format` variable must have a string data type and be either a scalar, or else have the same dimensions in the same order as the `file` variable. - - The `address` variable must either be a scalar, or else have the same dimensions in the same order as the `file` variable. + - The `address` variable must be either a scalar, or else have the same dimensions in the same order as the `file` variable. - The `shape` variable must have an integer data type. If there are aggregated dimensions then the `shape` variable must be two dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `file` variable dimensions, excluding its extra trailing dimension if it has one. 
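The `shape` rules can be turned into index ranges with a few lines of Python. This is a minimal sketch; `decode_shape` and `fragment_location` are hypothetical names.

It makes three assumptions:

- `None` stands in for the missing-value padding.
- Padding occurs only at the end of a row.
- The scalar case, with no aggregated dimensions, is out of scope.

The row-sum check reflects a requirement added later in this series: each row must sum to the size of its aggregated dimension.

```python
from itertools import accumulate
from typing import List, Optional, Sequence, Tuple

Range = Tuple[int, int]


def decode_shape(
    rows: Sequence[Sequence[Optional[int]]], dim_sizes: Sequence[int]
) -> Tuple[Tuple[int, ...], List[List[Range]]]:
    """Return the fragment array shape and, for each aggregated dimension,
    the (start, stop) index range of every fragment along it."""
    if len(rows) != len(dim_sizes):
        raise ValueError("need one row per aggregated dimension")
    fragment_array_shape = []
    ranges = []
    for row, size in zip(rows, dim_sizes):
        # Drop the trailing missing-value padding of shorter rows.
        sizes = [n for n in row if n is not None]
        if sum(sizes) != size:
            raise ValueError(f"row {list(row)} sums to {sum(sizes)}, expected {size}")
        stops = list(accumulate(sizes))
        starts = [0] + stops[:-1]
        fragment_array_shape.append(len(sizes))
        ranges.append(list(zip(starts, stops)))
    return tuple(fragment_array_shape), ranges


def fragment_location(index: Sequence[int], ranges: List[List[Range]]) -> Tuple[slice, ...]:
    """Slices locating one fragment, given its fragment array index."""
    return tuple(slice(*ranges[d][i]) for d, i in enumerate(index))
```

Example L.4 quotes row sums of 17, 181 and 360, and its `level` dimension is spanned by a single fragment, so its first row is `17`. Using `fragment_shape` rows `[17, _, _]`, `[91, 45, 45]` and `[180, 180, _]`, `decode_shape` gives a fragment array shape of `(1, 3, 2)`. The fragment at index `(0, 2, 1)` occupies `slice(0, 17)`, `slice(136, 181)` and `slice(180, 360)` of the aggregated data.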
From d75b597ce8b1396de41dc7bbd572743bdd5abc9c Mon Sep 17 00:00:00 2001 From: David Hassell Date: Wed, 24 Apr 2024 18:03:41 +0100 Subject: [PATCH 12/59] cfa --- ch02.adoc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/ch02.adoc b/ch02.adoc index c9571027..0edc632c 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -391,7 +391,9 @@ There are four standardized and mandatory features, given by the `file`, `format The string-valued `file` fragment array variable defines the locations of the fragment files. In general its dimensions are the fragment array dimensions in the same order as they occur in the fragment array, and its values provide the fragment file names. -Each fragment file name must be a Uniform Resource Identifier (URI) <>. If the fragment file name is a relative URI (i.e. a URI that does __not__ start with a scheme name followed by a colon, such as `file:`, `http:`, `s3:`, etc.) then it is taken as being relative to the current location of the aggregation file, and if the aggregation file is moved to another location then the fragment file may also need be moved for it to remain accessible. +Each fragment file name must be a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (one that begins with a scheme component followed by a colon, such as `file:`, `http:`, `s3:`, etc.), or else a __relative-path URI reference__ (typically one that is not an absolute URI and does not begin with a `/` character). +A relative-path URI reference is taken as being relative to the current location of the aggregation file. +If the aggregation file is moved to another location then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved so that the reference resolution still locates the fragment. See <> and <>. 
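Together with the `substitutions` attribute, these rules amount to a two-step lookup:

1. Apply the substitutions.
2. Use an absolute URI as it is, or resolve a relative-path reference against the aggregation file's own URI.

The sketch below assumes the aggregation file's location is known as a URI. It relies on Python's `urllib.parse.urljoin` for the reference resolution. The names `apply_substitutions` and `resolve_fragment` are hypothetical.

```python
import re
from urllib.parse import urljoin, urlsplit


def apply_substitutions(name: str, substitutions: str) -> str:
    """Apply blank-separated "${key}: replacement" elements to a file name."""
    for key, replacement in re.findall(r"(\$\{[^}]*\}):\s+(\S+)", substitutions):
        name = name.replace(key, replacement)
    return name


def resolve_fragment(name: str, aggregation_uri: str, substitutions: str = "") -> str:
    """Return the URI of a fragment file, given the aggregation file's URI."""
    name = apply_substitutions(name, substitutions)
    if urlsplit(name).scheme:
        # Absolute URI: valid wherever the aggregation file is moved to.
        return name
    if name.startswith(("/", "#")):
        raise ValueError(f"not a relative-path URI reference: {name!r}")
    # Relative-path reference: resolve against the aggregation file location.
    return urljoin(aggregation_uri, name)
```

With the substitution from the `file` feature's example, `${local}file.nc` becomes an absolute `file:` URI that does not depend on where the aggregation file is. By contrast, `../file.nc` is resolved against the aggregation file's directory.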
An extra trailing trailing dimension, that is not a fragment array dimension, may be provided for specifying multiple versions of the fragments. From e09485360f6f0d12927916a2e9582e320542df8e Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 25 Apr 2024 10:27:32 +0100 Subject: [PATCH 13/59] clarity --- appl.adoc | 4 +++- ch02.adoc | 27 +++++++-------------------- conformance.adoc | 12 +++++------- 3 files changed, 15 insertions(+), 28 deletions(-) diff --git a/appl.adoc b/appl.adoc index 849e10ae..edd0d4fd 100644 --- a/appl.adoc +++ b/appl.adoc @@ -311,7 +311,9 @@ data: ---- This example is an encoding for the fragment array described in <>. The `temperature` data variable is an aggregation of 6 fragments. -The fragment array shape is `(1, 3, 2)`, indicating that two of the three aggregated dimensions are spanned by multiple fragments. The distribution of missing values in the `fragment_shape` fragment array variable indicates that the `level` aggregated dimension is spanned by 1 fragment, the `latitude` aggregated dimension is spanned by 3 fragments, and the `longitude` aggregated dimension is spanned by 2 fragments. +The distribution of missing values in the `fragment_shape` fragment array variable indicates that the `level` aggregated dimension is spanned by 1 fragment, the `latitude` aggregated dimension is spanned by 3 fragments, and the `longitude` aggregated dimension is spanned by 2 fragments; and +that the shape of the implied fragment array is `(1, 3, 2)`. +The row sums of the `fragment_shape` fragment array variable are `17`, `181`, and `360`, which equal the sizes of the `level`, `latitude`, and `longitude` aggregated dimensions, respectively. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== diff --git a/ch02.adoc b/ch02.adoc index 0edc632c..7c47fcf0 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -283,19 +283,8 @@ An aggregation variable must be a scalar (i.e. 
it has no dimensions) and the val It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. The data type of the aggregated data is the same as the data type of the aggregation variable. -Aggregation variables may be used as: - -* data variables -* ancillary variables -* coordinate variables -* auxiliary coordinate variables -* scalar coordinate variables -* boundary variables -* cell measure variables -* connectivity index variables -* location index set variables - -Any text applying to any of these kinds of variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role, and any reference to a dimension of such a variable applies to the aggregated dimension of an aggregation variable. +Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, grid mapping variable, etc.), and any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role. +Any reference to the data or dimensions of a variable applies to the aggregated data or aggregated dimensions, respectively, of an aggregation variable. For instance: * the dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimensions of the aggregation data variable, @@ -323,8 +312,6 @@ The size of a fragment array dimension is equal to the number of fragments that See <>. The aggregated data are created by concatenating the fragments' data along each fragment array dimension, and in the order in which they appear in the fragment array. 
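The concatenation rule can be sketched in pure Python for fragments already in canonical form and held as flat, row-major lists. The function `aggregate` is hypothetical.

It takes two inputs:

- For each aggregated dimension, the fragment sizes along it, i.e. a row of the `shape` fragment array variable without its padding.
- A mapping from fragment array index to fragment values.

It then writes each fragment into its own contiguous block of the aggregated data.

```python
from itertools import accumulate, product
from math import prod
from typing import Dict, List, Optional, Sequence, Tuple


def aggregate(
    fragment_sizes: Sequence[Sequence[int]],
    fragments: Dict[Tuple[int, ...], Sequence[float]],
) -> Tuple[Tuple[int, ...], List[Optional[float]]]:
    """Concatenate canonical-form fragments into row-major aggregated data."""
    shape = tuple(sum(sizes) for sizes in fragment_sizes)
    # Start index of each fragment along each aggregated dimension.
    starts = [[0, *accumulate(sizes)][:-1] for sizes in fragment_sizes]
    # Row-major strides of the aggregated data.
    strides = [prod(shape[d + 1:]) for d in range(len(shape))]
    out: List[Optional[float]] = [None] * prod(shape)
    # Visit fragments in fragment array order.
    for findex in product(*(range(len(sizes)) for sizes in fragment_sizes)):
        fshape = [fragment_sizes[d][i] for d, i in enumerate(findex)]
        data = fragments[findex]
        if len(data) != prod(fshape):
            raise ValueError(f"fragment {findex} has the wrong number of values")
        # Copy the fragment's values, in row-major order, into its block.
        for k, local in enumerate(product(*(range(n) for n in fshape))):
            flat = sum(
                (starts[d][findex[d]] + local[d]) * strides[d] for d in range(len(shape))
            )
            out[flat] = data[k]
    return shape, out
```

For instance, a `(2, 2)` fragment array with fragment sizes `[1, 1]` and `[2, 1]` yields aggregated data of shape `(2, 3)`. An array library routine such as NumPy's `numpy.block` would normally do this assembly. The explicit index arithmetic is used here only to keep the example dependency-free.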
-Any text applying to the data of a variable in the CF conventions applies in exactly the same way to the aggregated data of an aggregation variable. - [[example-fragment-array]] [caption="Example 2.2. "] @@ -391,9 +378,9 @@ There are four standardized and mandatory features, given by the `file`, `format The string-valued `file` fragment array variable defines the locations of the fragment files. In general its dimensions are the fragment array dimensions in the same order as they occur in the fragment array, and its values provide the fragment file names. -Each fragment file name must be a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (one that begins with a scheme component followed by a colon, such as `file:`, `http:`, `s3:`, etc.), or else a __relative-path URI reference__ (typically one that is not an absolute URI and does not begin with a `/` character). +Each fragment file name must be a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/store/file.nc`, `\https://data/store/file.nc`, and `s3://data/store/file.nc`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, and `data/file.nc`). A relative-path URI reference is taken as being relative to the current location of the aggregation file. -If the aggregation file is moved to another location then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved so that the reference resolution still locates the fragment. 
+When the aggregation file is moved to another location then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved to preserve the relative reference. See <> and <>. An extra trailing trailing dimension, that is not a fragment array dimension, may be provided for specifying multiple versions of the fragments. @@ -435,9 +422,9 @@ See <> and <>. The integer-valued `shape` fragment array variable defines the shape of the data of each fragment in its canonical form (see <>). In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. -Each row provides the sizes of the fragments along that dimension of the fragment array. +Each row provides the sizes of the fragments along that dimension of the fragment array, whose sum must equal the size of the corresponding aggregated dimension. Rows corresponding to fragment array dimensions that are smaller than the largest fragment array dimension are padded with missing values. -When the aggregated data is a scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. +When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. See <>. *Non-standardized features* @@ -469,7 +456,7 @@ The canonical form of a fragment is such that: * The fragment's data have the same units as the aggregation variable. -* The fragment's data are not numerically packed (i.e. stored using a smaller data type than its original data). 
+* The fragment's data are not numerically packed (i.e. not stored using a smaller data type than its original data). * The fragment's data have the same data type as the aggregation variable. diff --git a/conformance.adoc b/conformance.adoc index 21a0f410..28166b1e 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -129,19 +129,17 @@ References can be absolute, relative or with no path, in which case, the variabl *Requirements:* -* An aggregation variable has an **`aggregated_dimensions`** attribute whose string value is a blank separated list of the aggregated dimension names. +* An aggregation variable has an **`aggregated_dimensions`** attribute whose string value is a blank separated list of zero or more aggregated dimension names. Each aggregated dimension must name a dimension in the file. * An aggregation variable must be a scalar. -* An aggregation variable must one of a data variable, an ancillary variable, a coordinate variable, an auxiliary coordinate variable, a scalar coordinate variable, a boundary variable, a cell measure variable, a connectivity index variable, or a location index set variable. - * An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form __feature: variable__. Each __variable__ must be the name of a variable in the file. The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - - The `file` variable must have a string data type and have the same number of dimensions as there are aggregated dimensions, possibly with the inclusion of one extra trailing dimension. + - The `file` variable must have a string data type and either have the same number of dimensions as there are aggregated dimensions, or else optionally also including one extra trailing dimension. - - The **`substitutions`** attribute of a `file` variable must be a string whose value is list of blank separated word pairs in the form __substitution: replacement__. 
+ - The **`substitutions`** attribute of a `file` variable, if it exists, must be a string whose value is list of blank separated word pairs in the form __substitution: replacement__. The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. - The `format` variable must have a string data type and be either a scalar, or else have the same dimensions in the same order as the `file` variable. @@ -149,11 +147,11 @@ The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - The `address` variable must be either a scalar, or else have the same dimensions in the same order as the `file` variable. - The `shape` variable must have an integer data type. - If there are aggregated dimensions then the `shape` variable must be two dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `file` variable dimensions, excluding its extra trailing dimension if it has one. + If there are one or more aggregated dimensions then the `shape` variable must be two dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `file` variable dimensions, excluding its extra trailing dimension if it has one. The rows correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row must equal the size of its corresponding aggregated dimension. If there are no aggregated dimensions then the `shape` variable must be one dimensional, of size one, and contain the value `1`. 
- - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `file` variable, with or without the extra trailing dimension if the `file` variable has one. + - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `file` variable, optionally excluding the extra trailing dimension if it has one. [[section-6]] From 02e49491e392fa68ed739ec4207730920fe8ce92 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 25 Apr 2024 13:44:29 +0100 Subject: [PATCH 14/59] cfa --- appa.adoc | 6 +++--- appl.adoc | 9 ++++----- ch02.adoc | 11 ++++++----- conformance.adoc | 21 +++++++++++++++------ 4 files changed, 28 insertions(+), 19 deletions(-) diff --git a/appa.adoc b/appa.adoc index 48cda684..5554b091 100644 --- a/appa.adoc +++ b/appa.adoc @@ -9,7 +9,7 @@ See <> for the grid mapping attributes, and <> for the distinction between **BI** and **BO**), **A** for an aggregation variable, and **-** for variables with some other purpose. +For variable attributes, the possible values of "Use" are: **C** for variables containing coordinate data, **D** for data variables, **M** for geometry container variables, **Do** for domain variables, **BI** and **BO** for boundary variables (see <> for the distinction between **BI** and **BO**), **A** for an aggregation variable (see <>), and **-** for variables with some other purpose. CF does not prohibit any of these attributes from being attached to variables of different kinds from those listed as their "Use" in this table, but their meanings are not defined by CF if they are used in these other ways. "Links" indicates the location of the attribute"s original definition (first link) and sections where the attribute is discussed in this document (additional links as necessary). 
@@ -42,13 +42,13 @@ In cases where there is a strong constraint on dataset size, it is allowed to pa | S | A | <> -| Records the aggregation instructions that define how to create an aggregation variable's aggregated data. +| Records the aggregation instructions that define how to create the aggregated data of an aggregation variable. | **`aggregated_dimensions`** | S | A | <> -| Identifies the dimensions of an aggregation variable's aggregated data. +| Identifies the dimensions of the aggregated data of an aggregation variable. | **`ancillary_variables`** | S diff --git a/appl.adoc b/appl.adoc index edd0d4fd..9991d071 100644 --- a/appl.adoc +++ b/appl.adoc @@ -309,7 +309,7 @@ data: 91, 45, 45, 180, 180, _ ; ---- -This example is an encoding for the fragment array described in <>. +This example is an encoding for the conceptual fragment array described in <>. The `temperature` data variable is an aggregation of 6 fragments. The distribution of missing values in the `fragment_shape` fragment array variable indicates that the `level` aggregated dimension is spanned by 1 fragment, the `latitude` aggregated dimension is spanned by 3 fragments, and the `longitude` aggregated dimension is spanned by 2 fragments; and that the shape of the implied fragment array is `(1, 3, 2)`. 
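The way the `shape` fragment array variable determines the fragment array, and where each fragment sits in the aggregated data, can be sketched in Python. This is an illustrative helper rather than part of the conventions. The function names are hypothetical, the variable's values are assumed to have already been read into nested lists with `None` in place of missing values, and the scalar case (a `shape` of `[1]`) is not handled. The values come from Example L.1, whose `time` aggregated dimension of size 12 is split into fragments of 3 and 9 elements while the other dimensions are not split.

```python
def fragment_array_shape(shape_rows):
    # Fragments along each aggregated dimension = non-missing values in that row.
    return tuple(sum(v is not None for v in row) for row in shape_rows)


def aggregated_shape(shape_rows):
    # Size of each aggregated dimension = sum of the non-missing fragment sizes.
    return tuple(sum(v for v in row if v is not None) for row in shape_rows)


def fragment_slices(shape_rows, index):
    """Slices of the aggregated data covered by the fragment at `index`
    (one position per fragment array dimension)."""
    slices = []
    for row, i in zip(shape_rows, index):
        sizes = [v for v in row if v is not None]
        start = sum(sizes[:i])  # total size of the preceding fragments
        slices.append(slice(start, start + sizes[i]))
    return tuple(slices)


# Example L.1 shape values, with None standing in for missing values.
rows = [[3, 9], [1, None], [73, None], [144, None]]
print(fragment_array_shape(rows))  # (2, 1, 1, 1)
print(aggregated_shape(rows))      # (12, 1, 73, 144)
# (slice(3, 12, None), slice(0, 1, None), slice(0, 73, None), slice(0, 144, None))
print(fragment_slices(rows, (1, 0, 0, 0)))
```

Counting non-missing values per row is exactly how the missing-value distribution implies the fragment array shape; the cumulative sums then locate each fragment without needing any other metadata.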
@@ -464,7 +464,7 @@ variables: int fragment_shape_latlon(j, i) ; // global attributes: - :featureType = "timeSeries"; + :featureType = "timeSeries" ; data: tas = _ ; @@ -520,7 +520,6 @@ variables: address: fragment_address shape: fragment_shape id: fragment_id" ; // Non-standardized feature - // Coordinate variables double time(time) ; time:standard_name = "time" ; @@ -541,7 +540,7 @@ variables: string fragment_address ; int fragment_shape(j, i) ; string fragment_id(f_time, f_level, f_latitude, f_longitude) ; - fragment_id:long_name = "Fragment file unique identifiers" + fragment_id:long_name = "Fragment file unique identifiers" ; data: temperature = _ ; @@ -556,7 +555,7 @@ data: 1, _, 73, _, 144, _ ; - fragment_id = "04821b9-7eb5-4046-937b-0bf06b01588", "056d1ee0-a183-43b3-ae67-1ec6aa1532a" ; + fragment_id = "04821b9-7eb5-4046-937b-0bf0588", "056d1ee0-a183-43b3-ae67-1ec632a" ; ---- This example is similar to <>, but now the **`aggregated_data`** attribute also includes the non-standardized keyword `id`, which has the fragment array variable `fragment_id`. diff --git a/ch02.adoc b/ch02.adoc index 7c47fcf0..488b551d 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -297,7 +297,7 @@ The details of how to encode and decode aggregation variables are given in this [[aggregated-dimensions, Section 2.8.1, Aggregated Dimensions]] ==== Aggregated Dimensions -The aggregated dimensions must be stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. +The aggregated dimensions of an aggregation variable are stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. The value of the **`aggregated_dimensions`** attribute is a blank separated list of the aggregated dimension names given in the order which matches the dimensions of the aggregated data. 
If the aggregated data is scalar then the **`aggregated_dimensions`** attribute must be an empty string. The aggregated dimensions must exist as dimensions in the aggregation file. @@ -368,7 +368,7 @@ The Z aggregated dimension is spanned by 1 fragment, the Y aggregated dimension See <> for a CDL representation of this fragment array. ==== -The fragment array is defined by the aggregation variable's **`aggregated_data`** attribute. +The fragment array must be defined by an aggregation variable's **`aggregated_data`** attribute. This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ that provides the feature's values for each fragment in the fragment array. The order of elements in the **`aggregated_data`** attribute is not significant. @@ -377,8 +377,9 @@ There are four standardized and mandatory features, given by the `file`, `format *file* The string-valued `file` fragment array variable defines the locations of the fragment files. -In general its dimensions are the fragment array dimensions in the same order as they occur in the fragment array, and its values provide the fragment file names. -Each fragment file name must be a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/store/file.nc`, `\https://data/store/file.nc`, and `s3://data/store/file.nc`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, and `data/file.nc`). +In general its dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. 
+The `file` fragment array variable values provide the fragment file names. +Each fragment file name must be a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://data/file.nc`, and `s3://data/file.nc`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, and `data/file.nc`). A relative-path URI reference is taken as being relative to the current location of the aggregation file. When the aggregation file is moved to another location then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved to preserve the relative reference. See <> and <>. @@ -392,7 +393,7 @@ Where fragments have fewer versions than others, the trailing dimension must be See <>. A fragment file name may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. -The use of substitutions can save space in the aggregation file; and in the event that the fragment file names need to be modified it may be possible to achieve this by editing the **`substitutions`** attribute, rather than by changing the actual `file` fragment array variable values. +The use of substitutions can save space in the aggregation file; and in the event that the fragment file names need to be modified, it may be possible to achieve this by editing the **`substitutions`** attribute, rather than by changing the actual `file` fragment array variable values. 
The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by __replacement__, prior to locating the fragment file. The order of elements in the **`substitutions`** attribute is not significant. The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. diff --git a/conformance.adoc b/conformance.adoc index 28166b1e..7c2f87ac 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -137,21 +137,30 @@ Each aggregated dimension must name a dimension in the file. * An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form __feature: variable__. Each __variable__ must be the name of a variable in the file. The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - - The `file` variable must have a string data type and either have the same number of dimensions as there are aggregated dimensions, or else optionally also including one extra trailing dimension. + - The `file` variable must have a string data type. + + - The `file` variable must either have the same number of dimensions as there are aggregated dimensions, or else optionally also including one extra trailing dimension. - The **`substitutions`** attribute of a `file` variable, if it exists, must be a string whose value is list of blank separated word pairs in the form __substitution: replacement__. The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. - - The `format` variable must have a string data type and be either a scalar, or else have the same dimensions in the same order as the `file` variable. + - The data values of a `file` variable must not start with a `/` or `#` character. 
+ + - The `format` variable must have a string data type.variable. + + - The `format` variable must be either a scalar, or else have the same dimensions in the same order as the `file` variable. - The `address` variable must be either a scalar, or else have the same dimensions in the same order as the `file` variable. - The `shape` variable must have an integer data type. - If there are one or more aggregated dimensions then the `shape` variable must be two dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `file` variable dimensions, excluding its extra trailing dimension if it has one. - The rows correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row must equal the size of its corresponding aggregated dimension. - If there are no aggregated dimensions then the `shape` variable must be one dimensional, of size one, and contain the value `1`. - - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `file` variable, optionally excluding the extra trailing dimension if it has one. + - If there are zero aggregated dimensions then the `shape` variable must be one-dimensional, of size one, and contain the value `1`. + + - If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `file` variable dimensions, excluding the extra trailing dimension if the `file` variable has one. 
+ + - The rows of a two-dimensional `shape` variable correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row must equal the size of its corresponding aggregated dimension. + + - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `file` variable, optionally excluding the extra trailing dimension if the `file` variable has one. [[section-6]] From b7750316300ffbec19222249b836fc1bf034e5f2 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 25 Apr 2024 16:59:45 +0100 Subject: [PATCH 15/59] cfa --- ch02.adoc | 4 ++-- history.adoc | 1 + 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index 488b551d..32ba1108 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -381,12 +381,12 @@ In general its dimensions correspond to, and have the same sizes as, the fragmen The `file` fragment array variable values provide the fragment file names. Each fragment file name must be a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://data/file.nc`, and `s3://data/file.nc`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, and `data/file.nc`). A relative-path URI reference is taken as being relative to the current location of the aggregation file. -When the aggregation file is moved to another location then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved to preserve the relative reference. 
+If the aggregation file is moved to another location, then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved to preserve the relative reference. See <> and <>. An extra trailing trailing dimension, that is not a fragment array dimension, may be provided for specifying multiple versions of the fragments. Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. -This may be be used when it is known that various fragment file locations will be possible, but it is not known in advance which of them might exist at any given time. +This may be used when it is known that multiple fragment file locations are possible, but it is not known in advance which of them might exist at any given time. For instance, providing remotely stored and locally cached versions of the same fragment could allow an application program to only commit to the expense of accessing the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need have the same number of versions. Where fragments have fewer versions than others, the trailing dimension must be padded with missing values. 
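Choosing among fragment versions, and resolving relative-path URI references against the aggregation file, could work along the lines of the following minimal Python sketch. It rests on stated assumptions: `versions` holds one fragment's `file` values along the extra trailing dimension with `None` for missing-value padding, the aggregation file's own location is known as an absolute URI, and `exists` is a hypothetical caller-supplied predicate (for instance a local file check or a remote request). Trying versions in stored order is only one possible policy; the text requires only that any version whose file exists may be used.

```python
from urllib.parse import urljoin


def select_fragment_version(versions, aggregation_file_uri, exists):
    """Return the resolved location of the first accessible version, or None.

    versions: one fragment's values along the extra trailing dimension,
        with None marking missing-value padding.
    aggregation_file_uri: absolute URI of the aggregation file.
    exists: hypothetical predicate reporting whether a location is accessible.
    """
    for location in versions:
        if location is None:
            continue  # padding: this fragment has fewer versions than others
        # urljoin returns an absolute URI unchanged and resolves a
        # relative-path reference against the aggregation file's URI.
        resolved = urljoin(aggregation_file_uri, location)
        if exists(resolved):
            return resolved
    return None


available = {"https://remote.host/data/frag_0.nc"}
print(select_fragment_version(
    ["file:///cache/frag_0.nc", "frag_0.nc"],
    "https://remote.host/data/agg.nc",
    available.__contains__,
))
```

Here the locally cached copy is tried first and, being unavailable, the relative reference is resolved to the remote copy. Note that `urllib.parse.urljoin` only resolves relative references against bases whose scheme it treats as hierarchical (such as `http`, `https` and `file`); an aggregation file at an `s3://` location would need custom resolution.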
diff --git a/history.adoc b/history.adoc index 5c246d51..88e7ab40 100644 --- a/history.adoc +++ b/history.adoc @@ -7,6 +7,7 @@ === Working version (most recent first) +* {issues}508[Issue #508]: Introducing aggregation variables * {issues}511[Issue #511]: Appendix B: New element in XML file header to record the "first published date" * {issues}509[Issue #509]: In exceptional cases allow a standard name to be aliased into two alternatives * {issues}501[Issue #501]: Clarify that data variables and variables containing coordinate data are highly recommended to have **`long_name`** or **`standard_name`** attributes, that **`cf_role`** is used only for discrete sampling geometries and UGRID mesh topologies, and that CF does not prohibit CF attributes from being used in ways that are not defined by CF but that in such cases their meaning is not defined by CF. From a8bc2802c34acfde7e548d0915c04ec0c702969b Mon Sep 17 00:00:00 2001 From: David Hassell Date: Fri, 26 Apr 2024 15:00:22 +0100 Subject: [PATCH 16/59] cfa --- appl.adoc | 2 +- ch02.adoc | 21 ++++++++++++--------- conformance.adoc | 3 +++ 3 files changed, 16 insertions(+), 10 deletions(-) diff --git a/appl.adoc b/appl.adoc index 9991d071..e2205da4 100644 --- a/appl.adoc +++ b/appl.adoc @@ -72,7 +72,7 @@ data: ---- In this example, the `temperature` data variable is an aggregation variable. Its four-dimensional aggregated data with shape `(12, 1, 73, 144)` is constructed from two non-overlapping fragments, with data shapes `(3, 1, 73, 144)` and `(9, 1, 73, 144)`, which span the first 3 and last 9 elements respectively of the `time` aggregated dimension. -The fragment file names are relative URIs, and so in this case are assumed to be in the same location as the aggregation file. +The fragment file names are relative-path URI references, and so in this case are assumed to be in the same location as the aggregation file. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. 
==== diff --git a/ch02.adoc b/ch02.adoc index 32ba1108..b5e21a5e 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -279,11 +279,12 @@ Aggregation provides the utility of being able to view, as a single entity, a da The fragment files may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as CF-compliant view of non-CF datasets. Storing aggregations is useful for data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; and for archive curation, as the aggregation can act as a metadata-rich archive index. -An aggregation variable must be a scalar (i.e. it has no dimensions) and the value of its single element is immaterial. +An aggregation variable must be a scalar (i.e. it has no dimensions). It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. -The data type of the aggregated data is the same as the data type of the aggregation variable. +The data type of the aggregation variable must be the data type of the aggregated data, but the value of the aggregation variable's single element is immaterial. -Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, grid mapping variable, etc.), and any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role. +Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, grid mapping variable, etc.), but it is recommended that container variables whose data are immaterial (such as grid mapping variables) are not encoded as aggregation variables. 
+Any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role. Any reference to the data or dimensions of a variable applies to the aggregated data or aggregated dimensions, respectively, of an aggregation variable. For instance: @@ -379,14 +380,15 @@ There are four standardized and mandatory features, given by the `file`, `format The string-valued `file` fragment array variable defines the locations of the fragment files. In general its dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. The `file` fragment array variable values provide the fragment file names. -Each fragment file name must be a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://data/file.nc`, and `s3://data/file.nc`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, and `data/file.nc`). -A relative-path URI reference is taken as being relative to the current location of the aggregation file. +Each fragment file name must be a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://remote.host/data/file.nc`, `s3://remote.host/data/file.nc`, or `locally_meaningful_protocol://UID`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, or `data/file.nc`). +The location of a fragment file given by a relative-path URI reference is taken as being relative to the location of the aggregation file. 
If the aggregation file is moved to another location, then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved to preserve the relative reference. +Not all fragment file names need be of the same URI type. See <> and <>. An extra trailing trailing dimension, that is not a fragment array dimension, may be provided for specifying multiple versions of the fragments. +This may be useful when it is known that multiple fragment file locations are possible, but it is not known in advance which of them might exist at any given time. Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. -This may be used when it is known that multiple fragment file locations are possible, but it is not known in advance which of them might exist at any given time. For instance, providing remotely stored and locally cached versions of the same fragment could allow an application program to only commit to the expense of accessing the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need have the same number of versions. Where fragments have fewer versions than others, the trailing dimension must be padded with missing values. @@ -423,10 +425,11 @@ See <> and <>. The integer-valued `shape` fragment array variable defines the shape of the data of each fragment in its canonical form (see <>). In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. 
-Each row provides the sizes of the fragments along that dimension of the fragment array, whose sum must equal the size of the corresponding aggregated dimension. -Rows corresponding to fragment array dimensions that are smaller than the largest fragment array dimension are padded with missing values. +The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along that dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. +The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. -See <>. +See <>, which includes the `shape` fragment array variable for the fragment array described by <>. + *Non-standardized features* diff --git a/conformance.adoc b/conformance.adoc index 7c2f87ac..ae00454e 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -162,6 +162,9 @@ The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `file` variable, optionally excluding the extra trailing dimension if the `file` variable has one. +*Recommendations:* + +* The following kinds of variable should not aggregation variables: grid mapping variable, domain variable, mesh topology variable, geometry container variable, interpolation variable. 
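The revised `shape` rules (one row per aggregated dimension, padding with missing values only at the end of a row, row sums equal to the aggregated dimension sizes, and `[1]` for scalar aggregated data) can be checked mechanically. The following Python function is a hedged sketch of such a check, not an official CF validator. Its name is hypothetical, and it assumes the variable's values have been read into nested lists with `None` for missing values.

```python
def check_shape_variable(shape, aggregated_sizes):
    """Return a list of problems found in a `shape` fragment array variable.

    shape: the variable's values as nested lists, with None for missing values.
    aggregated_sizes: the aggregated dimension sizes, in the order given by
        the `aggregated_dimensions` attribute (empty for scalar data).
    """
    if not aggregated_sizes:
        # Scalar aggregated data: one-dimensional, size one, containing 1.
        return [] if shape == [1] else ["scalar aggregation requires shape [1]"]
    if not all(isinstance(row, list) for row in shape):
        return ["shape must be two-dimensional"]
    if len(shape) != len(aggregated_sizes):
        return [f"expected {len(aggregated_sizes)} rows, found {len(shape)}"]

    problems = []
    counts = [sum(v is not None for v in row) for row in shape]
    ncols = max(counts)  # size of the largest fragment array dimension
    for n, (row, count, size) in enumerate(zip(shape, counts, aggregated_sizes)):
        if len(row) != ncols:
            problems.append(f"row {n}: has {len(row)} columns, expected {ncols}")
        if any(v is not None for v in row[count:]):
            problems.append(f"row {n}: missing values may only pad the end")
        total = sum(v for v in row if v is not None)
        if total != size:
            problems.append(f"row {n}: sizes sum to {total}, expected {size}")
    return problems


# Example L.1: two time fragments of sizes 3 and 9.
print(check_shape_variable([[3, 9], [1, None], [73, None], [144, None]], [12, 1, 73, 144]))  # []
```

Returning a list of messages, rather than raising on the first failure, lets one pass report every problem in a row.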
[[section-6]] [[description-of-the-data]] From c5e08c64bb42b9ce81c39c6da76de59d93d5b716 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Sun, 28 Apr 2024 13:36:17 +0100 Subject: [PATCH 17/59] cfa --- ch02.adoc | 25 +++++++++++++------------ conformance.adoc | 6 +++--- 2 files changed, 16 insertions(+), 15 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index b5e21a5e..b7c80902 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -379,13 +379,22 @@ There are four standardized and mandatory features, given by the `file`, `format The string-valued `file` fragment array variable defines the locations of the fragment files. In general its dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. -The `file` fragment array variable values provide the fragment file names. -Each fragment file name must be a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://remote.host/data/file.nc`, `s3://remote.host/data/file.nc`, or `locally_meaningful_protocol://UID`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, or `data/file.nc`). -The location of a fragment file given by a relative-path URI reference is taken as being relative to the location of the aggregation file. +The `file` fragment array variable values provide the fragment file locations. 
+A fragment file is located with a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://remote.host/data/file.nc`, `s3://remote.host/data/file.nc`, or `locally_meaningful_protocol://UID`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, or `data/file.nc`). +The location of a fragment file that is given by a relative-path URI reference is taken as being relative to the location of the aggregation file. If the aggregation file is moved to another location, then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved to preserve the relative reference. -Not all fragment file names need be of the same URI type. +Not all fragment file locations need be of the same URI type. See <> and <>. +A fragment file location may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. +The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a fragment file location which is to be replaced by __replacement__ in order to find the actual fragment file name. +After the replacements have been made, the fragment file location must be an absolute URI or relative-path URI reference. +The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. +For instance, the fragment file name `\file://data/store/file.nc` could be stored as the location `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. 
+The order of elements in the **`substitutions`** attribute is not significant. +The use of substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute, rather than by changing the actual `file` fragment array variable values. +See <>. + An extra trailing trailing dimension, that is not a fragment array dimension, may be provided for specifying multiple versions of the fragments. This may be useful when it is known that multiple fragment file locations are possible, but it is not known in advance which of them might exist at any given time. Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. @@ -394,14 +403,6 @@ Every fragment must have at least one version, but not all fragments need have t Where fragments have fewer versions than others, the trailing dimension must be padded with missing values. See <>. -A fragment file name may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. -The use of substitutions can save space in the aggregation file; and in the event that the fragment file names need to be modified, it may be possible to achieve this by editing the **`substitutions`** attribute, rather than by changing the actual `file` fragment array variable values. -The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a fragment file name which is to be replaced by __replacement__, prior to locating the fragment file. -The order of elements in the **`substitutions`** attribute is not significant. 
-The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. -For instance, a fragment file name of `\file://data/store/file.nc` could also be stored as `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. -See <>. - *format* The string-valued `format` fragment array variable defines the format of the fragment files. diff --git a/conformance.adoc b/conformance.adoc index ae00454e..699966e5 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -144,9 +144,9 @@ The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - The **`substitutions`** attribute of a `file` variable, if it exists, must be a string whose value is list of blank separated word pairs in the form __substitution: replacement__. The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. - - The data values of a `file` variable must not start with a `/` or `#` character. + - A data value of a `file` variable must be either and absolute URI or else a relative-path URI reference, after any string substitutions defined by the **`substitutions`** attribute have been applied. - - The `format` variable must have a string data type.variable. + - The `format` variable must have a string data type. - The `format` variable must be either a scalar, or else have the same dimensions in the same order as the `file` variable. @@ -164,7 +164,7 @@ The __feature__ keywords must include `file`, `format`, `address`, and `shape`. *Recommendations:* -* The following kinds of variable should not aggregation variables: grid mapping variable, domain variable, mesh topology variable, geometry container variable, interpolation variable. +* The following kinds of variable should not be aggregation variables: grid mapping variable, domain variable, mesh topology variable, geometry container variable, interpolation variable. 
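The `substitutions` attribute and the requirement that each `file` value be an absolute URI or a relative-path URI reference *after* substitution can be sketched together in Python. This is a hedged illustration, not a CF reference checker. The function names are hypothetical, the attribute is assumed to have been read into a Python string already, and the URI test inspects only the leading characters rather than validating full RFC 3986 syntax. The `${local}` replacement from the text is written without its leading backslash, which in the AsciiDoc source only suppresses automatic link formatting.

```python
import re

# RFC 3986 scheme: a letter, then letters, digits, "+", "-" or ".".
# "_" is also accepted here (an assumption, beyond RFC 3986) so that the
# `locally_meaningful_protocol://UID` example counts as an absolute URI.
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-_]*:")


def parse_substitutions(attr):
    """Parse a `substitutions` attribute into {"${name}": replacement}."""
    tokens = attr.split()
    if len(tokens) % 2:
        raise ValueError("expected blank-separated 'substitution: replacement' pairs")
    subs = {}
    for key, value in zip(tokens[0::2], tokens[1::2]):
        if not (key.startswith("${") and key.endswith("}:")):
            raise ValueError(f"bad substitution keyword: {key!r}")
        subs[key[:-1]] = value  # drop the trailing ":"
    return subs


def apply_substitutions(location, subs):
    # Assumes no replacement contains another keyword, so the order in
    # which substitutions are applied does not matter.
    for key, value in subs.items():
        location = location.replace(key, value)
    return location


def uri_kind(location):
    """Classify a location by its leading characters only."""
    if _SCHEME.match(location):
        return "absolute"
    if location and location[0] not in "/#":
        return "relative-path"
    return None  # neither form, so not allowed


def file_value_conforms(value, subs):
    location = apply_substitutions(value, subs)
    # "{" and "}" are not valid URI characters, so an unreplaced "${...}"
    # keyword means the value cannot be a conforming URI.
    if "${" in location:
        return False
    return uri_kind(location) is not None


subs = parse_substitutions("${local}: file://data/store/")
print(apply_substitutions("${local}file.nc", subs))  # file://data/store/file.nc
```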
[[section-6]] [[description-of-the-data]] From d31ce2676224979c36f98876b53c31a62683b5f6 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:39:20 +0100 Subject: [PATCH 18/59] reformat appendix A Co-authored-by: Sadie L. Bartholomew --- appa.adoc | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/appa.adoc b/appa.adoc index 5554b091..36e25833 100644 --- a/appa.adoc +++ b/appa.adoc @@ -9,7 +9,14 @@ See <> for the grid mapping attributes, and <> for the distinction between **BI** and **BO**), **A** for an aggregation variable (see <>), and **-** for variables with some other purpose. +For variable attributes, the possible values of "Use" are: +* **C** for variables containing coordinate data, +* **D** for data variables, +* **M** for geometry container variables, +* **Do** for domain variables, +* **BI** and **BO** for boundary variables (see <> for the distinction between **BI** and **BO**), +* **A** for an aggregation variable (see <>), +* and **-** for variables with some other purpose. CF does not prohibit any of these attributes from being attached to variables of different kinds from those listed as their "Use" in this table, but their meanings are not defined by CF if they are used in these other ways. "Links" indicates the location of the attribute"s original definition (first link) and sections where the attribute is discussed in this document (additional links as necessary). From 9efe177fa8afbc6706787253220ec7373f736658 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:40:38 +0100 Subject: [PATCH 19/59] Clarity Co-authored-by: Sadie L. Bartholomew --- ch01.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch01.adoc b/ch01.adoc index 40b63636..b07469dc 100644 --- a/ch01.adoc +++ b/ch01.adoc @@ -61,7 +61,7 @@ aggregated data:: The data of an aggregation variable, after it has been created aggregated dimension:: A dimension of the aggregated data of an aggregation variable. 
-aggregation variable:: A variable whose data is defined by as an aggregation of fragments, rather than containing its own data. +aggregation variable:: A variable whose data is defined as an aggregation of fragments, rather than containing its own data. ancestor group:: A group from which the referring group is descended via direct parent-child relationships From bd58c1ec842685d4b4b5bdf9f4c049b5932ce4d6 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:41:03 +0100 Subject: [PATCH 20/59] Clarity Co-authored-by: Sadie L. Bartholomew --- ch01.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch01.adoc b/ch01.adoc index b07469dc..f830683a 100644 --- a/ch01.adoc +++ b/ch01.adoc @@ -84,7 +84,7 @@ coordinate variable:: We use this term precisely as it is defined in the link:$$https://docs.unidata.ucar.edu/nug/current/best_practices.html#bp_Coordinate-Systems$$[NUG section on coordinate variables]. It is a one-dimensional variable with the same name as its dimension [e.g., **`time(time)`**], and it is defined as a numeric data type with values in strict monotonic order (all values are different, and they are arranged in either consistently increasing or consistently decreasing order). Missing values are not allowed in coordinate variables. -Note that an aggregation coordinate variable is stored as a scalar, and must have the same name its aggregated dimension (see <>). +Note that an aggregation coordinate variable is stored as a scalar, and must have the same name as its aggregated dimension (see <>). fragment:: A constituent part, found in an external file, of the aggregated data of an aggregation variable. From 2a9443de8364dd274a530f12d01dacfb09e2a106 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:41:19 +0100 Subject: [PATCH 21/59] Typo Co-authored-by: Sadie L. 
Bartholomew --- conformance.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conformance.adoc b/conformance.adoc index 699966e5..c641f292 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -129,7 +129,7 @@ References can be absolute, relative or with no path, in which case, the variabl *Requirements:* -* An aggregation variable has an **`aggregated_dimensions`** attribute whose string value is a blank separated list of zero or more aggregated dimension names. +* An aggregation variable has an **`aggregated_dimensions`** attribute whose string value is a blank-separated list of zero or more aggregated dimension names. Each aggregated dimension must name a dimension in the file. * An aggregation variable must be a scalar. From ffa7805c5c5c6cc9fa29defb3266b1a537ca1840 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:41:29 +0100 Subject: [PATCH 22/59] Typo Co-authored-by: Sadie L. Bartholomew --- conformance.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conformance.adoc b/conformance.adoc index c641f292..281fd8ce 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -141,7 +141,7 @@ The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - The `file` variable must either have the same number of dimensions as there are aggregated dimensions, or else optionally also including one extra trailing dimension. - - The **`substitutions`** attribute of a `file` variable, if it exists, must be a string whose value is list of blank separated word pairs in the form __substitution: replacement__. + - The **`substitutions`** attribute of a `file` variable, if it exists, must be a string whose value is list of blank-separated word pairs in the form __substitution: replacement__. The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. 
- A data value of a `file` variable must be either and absolute URI or else a relative-path URI reference, after any string substitutions defined by the **`substitutions`** attribute have been applied. From 27b11ca5a22099433698a4bcf8270085852d4842 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:41:49 +0100 Subject: [PATCH 23/59] Typo Co-authored-by: Sadie L. Bartholomew --- conformance.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conformance.adoc b/conformance.adoc index 281fd8ce..88e33d3f 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -144,7 +144,7 @@ The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - The **`substitutions`** attribute of a `file` variable, if it exists, must be a string whose value is list of blank-separated word pairs in the form __substitution: replacement__. The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. - - A data value of a `file` variable must be either and absolute URI or else a relative-path URI reference, after any string substitutions defined by the **`substitutions`** attribute have been applied. + - A data value of a `file` variable must be either an absolute URI or else a relative-path URI reference, after any string substitutions defined by the **`substitutions`** attribute have been applied. - The `format` variable must have a string data type. From f550369652710d41a4a6a2ab790d0d1e6e08a452 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:42:50 +0100 Subject: [PATCH 24/59] Clarity Co-authored-by: Sadie L. 
Bartholomew --- ch02.adoc | 1 - 1 file changed, 1 deletion(-) diff --git a/ch02.adoc b/ch02.adoc index b7c80902..c3a6c883 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -290,7 +290,6 @@ For instance: * the dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimensions of the aggregation data variable, * an aggregation coordinate variable must have the same name as its aggregated dimension. -* etc. The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. From cef86ee39ce8335e963994ddbfac8689befe0fb4 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:51:17 +0100 Subject: [PATCH 25/59] clarity Co-authored-by: Sadie L. Bartholomew --- ch02.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch02.adoc b/ch02.adoc index c3a6c883..56c0eb8f 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -427,7 +427,7 @@ The integer-valued `shape` fragment array variable defines the shape of the data In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along that dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. -When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. 
+When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional and contain only the value `1` (hence be size one). See <>, which includes the `shape` fragment array variable for the fragment array described by <>. From d8fd7817201ff55e02f509e2c69aacbcf815f9b7 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:51:49 +0100 Subject: [PATCH 26/59] Clarity Co-authored-by: Sadie L. Bartholomew --- ch02.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch02.adoc b/ch02.adoc index 56c0eb8f..fa90af36 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -436,7 +436,7 @@ See <>, which includes the `shape` fragment array variable for the Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. The fragment array variable for a non-standardized feature must be either a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, optionally omitting the extra trailing dimension for multiple fragment versions, if it exists. -Use cases for non-standardized features include, but are not limited to: +Use cases for non-standardized features include, but are not limited to, the following: * To provide extra information that enables the aggregation of fragments stored in a file format for which the `address` fragment array variable alone is insufficient to identify the fragments within the fragment files. 
From e1330102a4d4b487641e72ea83396366aebff8a8 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Mon, 29 Apr 2024 17:53:34 +0100 Subject: [PATCH 27/59] dev --- appl.adoc | 2 +- ch02.adoc | 73 +++++++++++++++++++++++------------------------- conformance.adoc | 12 +++++--- 3 files changed, 44 insertions(+), 43 deletions(-) diff --git a/appl.adoc b/appl.adoc index e2205da4..14203b17 100644 --- a/appl.adoc +++ b/appl.adoc @@ -309,7 +309,7 @@ data: 91, 45, 45, 180, 180, _ ; ---- -This example is an encoding for the conceptual fragment array described in <>. +This example is an encoding for the conceptual fragment array described in example <>. The `temperature` data variable is an aggregation of 6 fragments. The distribution of missing values in the `fragment_shape` fragment array variable indicates that the `level` aggregated dimension is spanned by 1 fragment, the `latitude` aggregated dimension is spanned by 3 fragments, and the `longitude` aggregated dimension is spanned by 2 fragments; and that the shape of the implied fragment array is `(1, 3, 2)`. diff --git a/ch02.adoc b/ch02.adoc index b7c80902..f1f32170 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -120,6 +120,8 @@ If the variable is packed using the **`scale_factor`** and **`add_offset`** attr The elements of **`actual_range`** must be exactly equal to the minimum and the maximum data values which occur in the variable (when unpacked if packing is used), and both must be within the **`valid_range`** if specified. If the data is all missing or invalid, the **`actual_range`** attribute cannot be used. +For aggregation variables (see <>), missing values in the aggregated data occur at locations where the fragments have missing values as determined by the fragment files' own metadata, as opposed to locations defined by the aggregation variables attributes. The aggregation variable's own indications of missing values (given by the attributes defined in <>) are ignored, and it is recommended that they are not provided. 
+ === Attributes This standard describes many attributes (some mandatory, others optional), but a file may also contain non-standard attributes. @@ -283,19 +285,23 @@ An aggregation variable must be a scalar (i.e. it has no dimensions). It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. The data type of the aggregation variable must be the data type of the aggregated data, but the value of the aggregation variable's single element is immaterial. -Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, grid mapping variable, etc.), but it is recommended that container variables whose data are immaterial (such as grid mapping variables) are not encoded as aggregation variables. -Any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role. -Any reference to the data or dimensions of a variable applies to the aggregated data or aggregated dimensions, respectively, of an aggregation variable. +Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, grid mapping variable, etc.), but it is recommended that other container variables whose data are immaterial (such as grid mapping variables) are not encoded as aggregation variables. + +In general, any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role; and any reference to the data or dimensions of a variable applies to the aggregated data or aggregated dimensions, respectively, of an aggregation variable. 
For instance: * the dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimensions of the aggregation data variable, * an aggregation coordinate variable must have the same name as its aggregated dimension. * etc. +The only exception is the definition of missing data in the aggregated data. +Each fragment defines the locations of its missing data based on its own metadata, and the locations of missing data in the aggregated data are then derived solely from the locations of missing data in the fragments, rather than from any of the aggregation variable's attributes for indicating missing values: **`_FillValue`**, **`missing_value`**, **`valid_min`**, **`valid_max`** and **`valid_range`** (see <>). +Since these attributes are ignored on aggregation variables, it is recommended that they are not provided. + The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. -[[aggregated-dimensions, Section 2.8.1, Aggregated Dimensions]] +[[aggregated-dimensions, Section 2.8.1, "Aggregated Dimensions"]] ==== Aggregated Dimensions The aggregated dimensions of an aggregation variable are stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. @@ -304,15 +310,15 @@ If the aggregated data is scalar then the **`aggregated_dimensions`** attribute The aggregated dimensions must exist as dimensions in the aggregation file. -[[aggregated-data, Section 2.8.2, Aggregated Data]] +[[aggregated-data, Section 2.8.2, "Aggregated Data"]] ==== Aggregated Data The fragments are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. Each dimension of the fragment array is called a __fragment array dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. 
The size of a fragment array dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. -See <>. +See the example <>. -The aggregated data are created by concatenating the fragments' data along each fragment array dimension, and in the order in which they appear in the fragment array. +The aggregated data are created by concatenating the canonical forms of the fragments' data (see <>) along each fragment array dimension, and in the order in which they appear in the fragment array. [[example-fragment-array]] [caption="Example 2.2. "] @@ -379,43 +385,42 @@ There are four standardized and mandatory features, given by the `file`, `format The string-valued `file` fragment array variable defines the locations of the fragment files. In general its dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. -The `file` fragment array variable values provide the fragment file locations. -A fragment file is located with a Uniform Resource Identifier (URI) <> that is either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://remote.host/data/file.nc`, `s3://remote.host/data/file.nc`, or `locally_meaningful_protocol://UID`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, or `data/file.nc`). -The location of a fragment file that is given by a relative-path URI reference is taken as being relative to the location of the aggregation file. 
+A fragment file is located with a Uniform Resource Identifier (URI) <> that must be either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://remote.host/data/file.nc`, `s3://remote.host/data/file.nc`, or `locally_meaningful_protocol://UID`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, or `data/file.nc`). +A relative-path URI reference is taken as being relative to the location of the aggregation file. If the aggregation file is moved to another location, then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved to preserve the relative reference. Not all fragment file locations need be of the same URI type. See <> and <>. A fragment file location may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a fragment file location which is to be replaced by __replacement__ in order to find the actual fragment file name. -After the replacements have been made, the fragment file location must be an absolute URI or relative-path URI reference. +After the replacements have been made, the fragment file location must be an absolute URI or a relative-path URI reference. The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. -For instance, the fragment file name `\file://data/store/file.nc` could be stored as the location `${local}file.nc`, in conjunction with `substitutions="${local}: \file://data/store/"`. 
+For instance, the fragment file location `\https://remote.host/data/file.nc` could be stored as `${path}file.nc`, in conjunction with `substitutions="${path}: \https://remote.host/data/"`. The order of elements in the **`substitutions`** attribute is not significant. -The use of substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute, rather than by changing the actual `file` fragment array variable values. +The use of substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute rather than by changing the actual `file` fragment array variable values. See <>. -An extra trailing trailing dimension, that is not a fragment array dimension, may be provided for specifying multiple versions of the fragments. -This may be useful when it is known that multiple fragment file locations are possible, but it is not known in advance which of them might exist at any given time. -Each version is expected to contain equivalent information, so that any version whose file exists may be selected for use in the aggregated data. -For instance, providing remotely stored and locally cached versions of the same fragment could allow an application program to only commit to the expense of accessing the remote version if the local version does not exist. -Every fragment must have at least one version, but not all fragments need have the same number of versions. +The `file` fragment array variable may have an extra trailing dimension that allows multiple versions of a fragment to be specified. +Each version must contain equivalent information, so any version whose file exists may be selected for use in the aggregated data. 
+This could be useful when it is known that multiple fragment file locations are possible, but it is not known in advance which of them might exist at any given time. +For instance, when remotely stored and locally cached versions of the same fragment have been provided, an application program could choose to only retrieve the remote version if the local version does not exist. +Every fragment must have at least one version, but not all fragments need to have the same number of versions. Where fragments have fewer versions than others, the trailing dimension must be padded with missing values. See <>. *format* The string-valued `format` fragment array variable defines the format of the fragment files. -In general it must have the same dimensions in the same order as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. +In general it has the same dimensions in the same order as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. The format of a netCDF fragment file must be indicated with the value `nc`. Other fragment file formats may be provided, on the understanding that an application program may choose to ignore any values that it does not understand. -The `format` fragment array variable may contain a range of different values, i.e. not all fragment files need to have the same format, provided that the `address` fragment array variable can still be used to find each fragment within its fragment file. See <>. +The `format` fragment array variable may contain a range of different values, i.e. not all fragment files need to have the same format. See <>. *address* The `address` fragment array variable defines how to find each fragment within its fragment file, i.e. the address of the fragment. 
-In general it must have the same dimensions in the same order as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. +In general it has the same dimensions in the same order as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. It may have any data type. For a netCDF fragment file, the string-valued address must be the fragment's netCDF variable name. @@ -429,13 +434,12 @@ In general, the `shape` fragment array variable is two-dimensional, with the siz The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along that dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. -See <>, which includes the `shape` fragment array variable for the fragment array described by <>. - +See <>, which shows the `shape` fragment array variable for the fragment array described by the example <>. *Non-standardized features* Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. -The fragment array variable for a non-standardized feature must be either a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, optionally omitting the extra trailing dimension for multiple fragment versions, if it exists. 
+The fragment array variable for a non-standardized feature must be either a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, optionally omitting the extra trailing dimension for multiple fragment versions if there is one. Use cases for non-standardized features include, but are not limited to: @@ -446,18 +450,17 @@ For instance, it may be convenient to store in the aggregation file an attribute See <>. -[[fragment-interpretation, Section 2.8.3, Fragment Interpretation]] +[[fragment-interpretation, Section 2.8.3 "Fragment Interpretation"]] ==== Fragment Interpretation -A fragment stored in a fragment file, of any format, must be converted to its __canonical form__ prior to being inserted into the aggregated data. -The fragment file must contain an array of data with sufficient metadata for it to be convertible to its canonical form by the application program that is creating the aggregated data. -It is up to the creator of the aggregation variable to ensure that it is possible to convert all fragments to their canonical forms. +A fragment stored in a fragment file, of any format, must be converted to its __canonical form__ prior to being inserted into the aggregated data. +The fragment file must contain an array of data with metadata that is sufficient for the fragment to be convertible to its canonical form, and that conversion is the responsibility of the application program which is creating the aggregated data. Any fragment metadata that is not needed for the conversion to canonical form may be ignored by the application program. The canonical form of a fragment is such that: -* The fragment's data, in its entirety, provide the values for a unique, contiguous part of the aggregated data. +* The fragment's data, in its entirety, provide the values for a unique and contiguous part of the aggregated data. 
-* The fragment's data have the same number of dimensions as the aggregated data, and each of those dimensions must uniquely correspond to an aggregated dimension, and be in the same order. +* The fragment's data dimensions correspond to the aggregated dimensions in the same order. * The fragment's data have the same units as the aggregation variable. @@ -465,12 +468,9 @@ The canonical form of a fragment is such that: * The fragment's data have the same data type as the aggregation variable. -* The fragment's data have the same indication of missing values as the aggregation variable. - -The conversion of fragments to their canonical form is the responsibility of the application program which is creating the aggregated data, and it is up to the application program to decide what to do in the event that the conversion is not possible. The application program may need to carry out any combination of the following operations when converting a fragment to its canonical form: -* Inserting omitted size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). +* Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). * Transforming the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). @@ -478,7 +478,4 @@ The application program may need to carry out any combination of the following o Note that some transformations may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. * Unpacking the fragment's data. 
-Note that if the aggregation variable indicates that the aggregated data is numerically packed (as determined by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. - -* Replacing missing values in the fragment's data with values indicated by the aggregation variable as missing. -Note that it is up to the creator of the aggregation variable to ensure that none of the aggregation variable's missing values coincide with non-missing fragment data values. +Note that if the aggregation variable indicates that the aggregated data is numerically packed (as determined by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. It is recommended that the aggregated data is not numerically packed, because of the potential for mistakes and confusion. \ No newline at end of file diff --git a/conformance.adoc b/conformance.adoc index 699966e5..8b3c6643 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -139,12 +139,12 @@ The __feature__ keywords must include `file`, `format`, `address`, and `shape`. - The `file` variable must have a string data type. - - The `file` variable must either have the same number of dimensions as there are aggregated dimensions, or else optionally also including one extra trailing dimension. + - The `file` variable must have the same number of dimensions as there are aggregated dimensions, with the optional addition of one extra trailing dimension. - - The **`substitutions`** attribute of a `file` variable, if it exists, must be a string whose value is list of blank separated word pairs in the form __substitution: replacement__. - The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. + - The `file` variable's **`substitutions`** attribute, if it exists, must be a string whose value is list of blank separated word pairs in the form __substitution: replacement__. 
+ Each __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. - - A data value of a `file` variable must be either and absolute URI or else a relative-path URI reference, after any string substitutions defined by the **`substitutions`** attribute have been applied. + - A data value of a `file` variable, after any string substitutions defined by the **`substitutions`** attribute have been applied, must be either an absolute URI or else a relative-path URI reference. - The `format` variable must have a string data type. @@ -166,6 +166,10 @@ The __feature__ keywords must include `file`, `format`, `address`, and `shape`. * The following kinds of variable should not be aggregation variables: grid mapping variable, domain variable, mesh topology variable, geometry container variable, interpolation variable. +* An aggregation variable should not have the any of the attributes **`_FillValue`**, **`missing_value`**, **`valid_min`**, **`valid_max`**, and **`valid_range`**. + +* An aggregation variable should not have either of the attributes **`scale_factor`** and **`add_offset`**. + [[section-6]] [[description-of-the-data]] === 3 Description of the Data From 2c09afaa4e183cb4a5c9ab4224f65460c91ae11a Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 30 Apr 2024 08:20:01 +0100 Subject: [PATCH 28/59] cfa --- ch02.adoc | 1 - 1 file changed, 1 deletion(-) diff --git a/ch02.adoc b/ch02.adoc index bfc64f1a..d2c48d07 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -120,7 +120,6 @@ If the variable is packed using the **`scale_factor`** and **`add_offset`** attr The elements of **`actual_range`** must be exactly equal to the minimum and the maximum data values which occur in the variable (when unpacked if packing is used), and both must be within the **`valid_range`** if specified. If the data is all missing or invalid, the **`actual_range`** attribute cannot be used. 
-For aggregation variables (see <>), missing values in the aggregated data occur at locations where the fragments have missing values as determined by the fragment files' own metadata, as opposed to locations defined by the aggregation variables attributes. The aggregation variable's own indications of missing values (given by the attributes defined in <>) are ignored, and it is recommended that they are not provided. === Attributes From 03565d6b562d3a03cd70729a8b1940c5ea70c0ef Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 30 Apr 2024 08:25:56 +0100 Subject: [PATCH 29/59] file and format checkpoint --- ch02.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/ch02.adoc b/ch02.adoc index d2c48d07..02832a15 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -269,6 +269,7 @@ They may not be attached to a group, even if all variables within that group use If attributes are present within groups without being attached to a variable, these attributes apply to the group where they are defined, and to that group's descendants, but not to ancestor or sibling groups. If a group attribute is defined in a parent group, and one of the child group redefines the same attribute, the definition within the child group applies for the child and all of its descendants. 
+ [[aggregation-variables, Section 2.8, "Aggregation Variables"]] === Aggregation Variables From 8092e80088fce1d28266e699ba60607e75cd4aae Mon Sep 17 00:00:00 2001 From: David Hassell Date: Wed, 1 May 2024 09:35:48 +0100 Subject: [PATCH 30/59] cfa --- appa.adoc | 4 +- appl.adoc | 199 ++++++++++++++++++++--------------------------- ch01.adoc | 4 +- ch02.adoc | 153 +++++++++++++++++------------------- conformance.adoc | 22 +++--- 5 files changed, 168 insertions(+), 214 deletions(-) diff --git a/appa.adoc b/appa.adoc index e475f32f..900bcbaf 100644 --- a/appa.adoc +++ b/appa.adoc @@ -50,13 +50,13 @@ In cases where there is a strong constraint on dataset size, it is allowed to pa | **`aggregated_data`** | S | A -| <> +| <> | Records the aggregation instructions that define how to create the aggregated data of an aggregation variable. | **`aggregated_dimensions`** | S | A -| <> +| <> | Identifies the dimensions of the aggregated data of an aggregation variable. | **`ancillary_variables`** diff --git a/appl.adoc b/appl.adoc index 14203b17..52025d72 100644 --- a/appl.adoc +++ b/appl.adoc @@ -8,7 +8,7 @@ Details of how to encode and decode aggregation variables may found in <>, but now the fragment file names are absolute URIs, and two versions of the second fragment have been provided. -The `fragment_file` fragment array variable has the extra trailing dimension `versions` to accommodate the extra fragment version. +This example is similar to <>, but now the fragment dataset locations are absolute URIs, and two versions of the second fragment have been provided. +The `fragment_location` variable has the extra trailing dimension `versions` to accommodate the extra fragment version. There is only one version of the first fragment, so its trailing dimension is padded with missing data. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. 
@@ -177,7 +169,7 @@ dimensions: i = 2 ; // Equal to the size of the largest fragment array dimension // Fragment versions dimension versions = 2 ; // The maximum number of versions for a fragment - + variables: // Data variable double temperature ; @@ -185,8 +177,7 @@ variables: temperature:units = "K" ; temperature:cell_methods = "time: mean" ; temperature:aggregated_dimensions = "time level latitude longitude" ; - temperature:aggregated_data = "file: fragment_file - format: fragment_format + temperature:aggregated_data = "location: fragment_location address: fragment_address shape: fragment_shape" ; // Coordinate variables @@ -194,8 +185,7 @@ variables: time:standard_name = "time" ; time:units = "days since 2001-01-01" ; time:aggregated_dimensions = "time" ; - time:aggregated_data = "file: fragment_file - format: fragment_format + time:aggregated_data = "location: fragment_location_time address: fragment_address_time shape: fragment_shape_time" ; double level(level) ; @@ -207,45 +197,42 @@ variables: double longitude(longitude) ; longitude:standard_name = "longitude" ; longitude:units = "degrees_east" ; - // Fragment array variables - string fragment_file(f_time, f_level, f_latitude, f_longitude, versions) ; - fragment_file:substitutions = "${local}: file://local/data/ - ${remote}: https://remote/data/" ; - string fragment_file_time(f_time, versions) ; - fragment_file:substitutions = "${local}: file://local/data/ - ${remote}: https://remote/data/" ; - string fragment_format ; + string fragment_location(f_time, f_level, f_latitude, f_longitude, versions) ; + fragment_location:substitutions = "${local}: file://data/ + ${remote}: https://remote.host/data/" ; + string fragment_location_time(f_time, versions) ; + fragment_location_time:substitutions = "${local}: file://data/ + ${remote}: https://remote.host/data/" ; string fragment_address ; string fragment_address_time ; int fragment_shape(j, i) ; int fragment_shape_time(j_time, i) ; - + data: temperature = _ ; time = _ ;
level = ... ; latitude = ... ; longitude = ... ; - fragment_file = "${local}January-March.nc", - _, - "${local}April-December.nc", - "${remote}April-December.nc" ; - fragment_file_time = "${local}January-March.nc", - _, - "${local}April-December.nc", - "${remote}April-December.nc" ; - fragment_format = "nc" ; + fragment_location = "${local}January-March.nc", + _, + "${local}April-December.nc", + "${remote}April-December.nc" ; + fragment_location_time = "${local}January-March.nc", + _, + "${local}April-December.nc", + "${remote}April-December.nc" ; fragment_address = "temperature" ; fragment_address_time = "time" ; - fragment_shape = 3, 9, - 1, _, - 73, _, + fragment_shape = 3, 9, + 1, _, + 73, _, 144, _ ; fragment_shape_time = 3, 9 ; ---- -This example is similar to <>, but now the fragment file names have been defined using the string substitutions given by the **`substitutions`** attribute of the `fragment_file` fragment array variable `fragment_file`. -In addition, `time` is now an aggregation coordinate variable, with its aggregated data being derived from the same fragment files as `temperature`. +This example is similar to <>, but now the fragment dataset locations have been defined using the string substitutions given by the **`substitutions`** attribute of the `fragment_location` variable. +In addition, `time` is now an aggregation coordinate variable, with its aggregated data being derived from the same fragment datasets as `temperature`. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. 
==== @@ -266,7 +253,7 @@ dimensions: // Fragment shape dimensions j = 3 ; // Equal to the number of aggregated dimensions i = 3 ; // Equal to the size of the largest fragment array dimension - + variables: // Data variable double temperature ; @@ -274,8 +261,7 @@ variables: temperature:units = "K" ; temperature:cell_methods = "time: mean" ; temperature:aggregated_dimensions = "level latitude longitude" ; - temperature:aggregated_data = "file: fragment_file - format: fragment_format + temperature:aggregated_data = "location: fragment_location address: fragment_address shape: fragment_shape" ; // Coordinate variables @@ -288,22 +274,19 @@ variables: double longitude(longitude) ; longitude:standard_name = "longitude" ; longitude:units = "degrees_east" ; - // Fragment array variables - string fragment_file(f_level, f_latitude, f_longitude) ; - string fragment_format ; + string fragment_location(f_level, f_latitude, f_longitude) ; string fragment_address ; int fragment_shape(j, i) ; - + data: temperature = _ ; level = ... ; latitude = ... ; longitude = ... ; - fragment_file = "file_A.nc", "file_B.nc", - "file_C.nc", "file_D.nc", - "file_E.nc", "file_F.nc" ; - fragment_format = "nc" ; + fragment_location = "file_A.nc", "file_B.nc", + "file_C.nc", "file_D.nc", + "file_E.nc", "file_F.nc" ; fragment_address = "temperature" ; fragment_shape = 17, _, _, 91, 45, 45, @@ -311,9 +294,8 @@ data: ---- This example is an encoding for the conceptual fragment array described in example <>. The `temperature` data variable is an aggregation of 6 fragments. -The distribution of missing values in the `fragment_shape` fragment array variable indicates that the `level` aggregated dimension is spanned by 1 fragment, the `latitude` aggregated dimension is spanned by 3 fragments, and the `longitude` aggregated dimension is spanned by 2 fragments; and -that the shape of the implied fragment array is `(1, 3, 2)`. 
-The row sums of the `fragment_shape` fragment array variable are `17`, `181`, and `360`, which equal the sizes of the `level`, `latitude`, and `longitude` aggregated dimensions, respectively. +The distribution of missing values in the `fragment_shape` variable indicates that the `level` aggregated dimension is spanned by 1 fragment, the `latitude` aggregated dimension is spanned by 3 fragments, and the `longitude` aggregated dimension is spanned by 2 fragments; and that the shape of the implied fragment array is `(1, 3, 2)`. +The row sums of the `fragment_shape` variable are `17`, `181`, and `360`, which equal the sizes of the `level`, `latitude`, and `longitude` aggregated dimensions, respectively. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== @@ -336,7 +318,7 @@ dimensions: // Fragment shape dimensions j = 4 ; // Equal to the number of aggregated dimensions i = 12 ; // Equal to the size of the largest fragment array dimension - + variables: // Data variable double temperature ; @@ -344,8 +326,7 @@ variables: temperature:units = "K" ; temperature:cell_methods = "time: mean" ; temperature:aggregated_dimensions = "time level latitude longitude" ; - temperature:aggregated_data = "file: fragment_file - format: fragment_format + temperature:aggregated_data = "location: fragment_location address: fragment_address shape: fragment_shape" ; double pressure(time, level, latitude, longitude) ; @@ -366,13 +347,11 @@ variables: double longitude(longitude) ; longitude:standard_name = "longitude" ; longitude:units = "degrees_east" ; - // Fragment array variables - string fragment_file(f_time, f_level, f_latitude, f_longitude) ; - string fragment_format ; + string fragment_location(f_time, f_level, f_latitude, f_longitude) ; string fragment_address ; int fragment_shape(j, i) ; - + data: temperature = _ ; pressure = ... ; @@ -380,8 +359,7 @@ data: level = ... ; latitude = ... ; longitude = ... ; - fragment_file = ... 
; - fragment_format = "nc" ; + fragment_location = ... ; fragment_address = "temperature" ; fragment_shape = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, _, _, _, _, _, _, _, _, _, _, _, @@ -389,10 +367,10 @@ data: 36, 36, 36, 36, _, _, _, _, _, _, _, _ ; ---- In this example, the `temperature` data variable is an aggregation of 96 fragments. -The fragment array shape is `(12, 1, 2, 4)`, indicating that three of the four aggregated dimensions are spanned by multiple fragments. +The implied fragment array shape is `(12, 1, 2, 4)`, indicating that three of the four aggregated dimensions are spanned by multiple fragments. The `pressure` data variable is not an aggregation variable. -The data for the `pressure`, `level`, `latitude` and `longitude` variables, and the `fragment_file` fragment array variable, are omitted for clarity. +The data for the `pressure`, `level`, `latitude` and `longitude` variables, and the `fragment_location` variable, are omitted for clarity. ==== [[example-L.6]] @@ -416,8 +394,7 @@ variables: tas:units = "K" ; tas:coordinates = "time lat lon alt station_name" ; tas:aggregated_dimensions = "obs" ; - tas:aggregated_data = "file: fragment_file - format: fragment_format + tas:aggregated_data = "location: fragment_location address: fragment_address shape: fragment_shape" ; // DSG count variable @@ -430,8 +407,7 @@ variables: time:standard_name = "time" ; time:units = "days since 1970-01-01" ; time:aggregated_dimensions = "obs" ; - time:aggregated_data = "file: fragment_file - format: fragment_format + time:aggregated_data = "location: fragment_location address: fragment_address_time shape: fragment_shape" ; float lon(station) ; @@ -439,8 +415,7 @@ variables: lon:long_name = "station longitude"; lon:units = "degrees_east"; lon:aggregated_dimensions = "station" ; - lon:aggregated_data = "file: fragment_file - format: fragment_format + lon:aggregated_data = "location: fragment_location address: fragment_address_lon shape: fragment_shape_latlon" ; float 
lat(station) ; @@ -448,14 +423,11 @@ variables: lat:long_name = "station latitude" ; lat:units = "degrees_north" ; lat:aggregated_dimensions = "station" ; - lat:aggregated_data = "file: fragment_file - format: fragment_format + lat:aggregated_data = "location: fragment_location address: fragment_address_lat shape: fragment_shape_latlon" ; - // Fragment array variables - string fragment_file(f_station) ; - string fragment_format ; + string fragment_location(f_station) ; string fragment_address ; string fragment_address_time(f_station) ; string fragment_address_lat ; @@ -465,15 +437,14 @@ variables: // global attributes: :featureType = "timeSeries" ; - + data: - tas = _ ; + tas = _ ; row_size = 5000, 4000, 6000 ; - time = _ ; - lat = _ ; + time = _ ; + lat = _ ; lon = _ ; - fragment_file = "Harwell.nc", "Abingdon.nc", "Lambourne.nc" ; - fragment_format = "nc" ; + fragment_location = "Harwell.nc", "Abingdon.nc", "Lambourne.nc" ; fragment_address = "tas" ; fragment_address_time = "t1", "t2", "t3" ; fragment_address_lat = "lat" ; @@ -483,8 +454,8 @@ data: ---- In this example, three fragments are aggregated into a collection of DSG timeseries feature types with contiguous ragged array representation. The auxiliary coordinate variables `time`, `lon`, and `lat` are also aggregation variables. -The time variables in the fragment files all have different netCDF variables names, which differ from the netCDF name of the `time` aggregation variable. -The fragments for all aggregation variables come from the same three fragment files, in this case. +The time variables in the fragment datasets all have different netCDF variable names, which differ from the netCDF name of the `time` aggregation variable. +The fragments for all aggregation variables come from the same three fragment datasets, in this case. No data have been omitted from the CDL.
==== @@ -507,7 +478,7 @@ dimensions: // Fragment shape dimensions j = 4 ; // Equal to the number of aggregated dimensions i = 2 ; // Equal to the size of the largest fragment array dimension - + variables: // Data variable double temperature ; @@ -515,8 +486,7 @@ variables: temperature:units = "K" ; temperature:cell_methods = "time: mean" ; temperature:aggregated_dimensions = "time level latitude longitude" ; - temperature:aggregated_data = "file: fragment_file - format: fragment_format + temperature:aggregated_data = "location: fragment_location address: fragment_address shape: fragment_shape id: fragment_id" ; // Non-standardized feature @@ -533,31 +503,28 @@ variables: double longitude(longitude) ; longitude:standard_name = "longitude" ; longitude:units = "degrees_east" ; - // Fragment array variables - string fragment_file(f_time, f_level, f_latitude, f_longitude) ; - string fragment_format ; + string fragment_location(f_time, f_level, f_latitude, f_longitude) ; string fragment_address ; int fragment_shape(j, i) ; string fragment_id(f_time, f_level, f_latitude, f_longitude) ; - fragment_id:long_name = "Fragment file unique identifiers" ; - + fragment_id:long_name = "Fragment dataset unique identifiers" ; + data: temperature = _ ; time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; level = ... ; latitude = ... ; longitude = ... ; - fragment_file = "January-March.nc", "April-December.nc" ; - fragment_format = "nc" ; + fragment_location = "January-March.nc", "April-December.nc" ; fragment_address = "temperature" ; - fragment_shape = 3, 9, - 1, _, - 73, _, + fragment_shape = 3, 9, + 1, _, + 73, _, 144, _ ; fragment_id = "04821b9-7eb5-4046-937b-0bf0588", "056d1ee0-a183-43b3-ae67-1ec632a" ; ---- -This example is similar to <>, but now the **`aggregated_data`** attribute also includes the non-standardized keyword `id`, which has the fragment array variable `fragment_id`. 
+This example is similar to <>, but now the **`aggregated_data`** attribute also includes the non-standardized feature keyword `id`, which has the corresponding variable `fragment_id`. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== \ No newline at end of file diff --git a/ch01.adoc b/ch01.adoc index f830683a..8ad94b70 100644 --- a/ch01.adoc +++ b/ch01.adoc @@ -84,9 +84,9 @@ coordinate variable:: We use this term precisely as it is defined in the link:$$https://docs.unidata.ucar.edu/nug/current/best_practices.html#bp_Coordinate-Systems$$[NUG section on coordinate variables]. It is a one-dimensional variable with the same name as its dimension [e.g., **`time(time)`**], and it is defined as a numeric data type with values in strict monotonic order (all values are different, and they are arranged in either consistently increasing or consistently decreasing order). Missing values are not allowed in coordinate variables. -Note that an aggregation coordinate variable is stored as a scalar, and must have the same name as its aggregated dimension (see <>). +Note that an aggregation coordinate variable is stored as a scalar and has the same name as its aggregated dimension (see <>). -fragment:: A constituent part, found in an external file, of the aggregated data of an aggregation variable. +fragment:: A constituent part, found in an external dataset, of the aggregated data of an aggregation variable. grid mapping variable:: A variable used as a container for attributes that define a specific grid mapping. The type of the variable is arbitrary since it contains no data. 
diff --git a/ch02.adoc b/ch02.adoc index 02832a15..5206d220 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -120,7 +120,6 @@ If the variable is packed using the **`scale_factor`** and **`add_offset`** attr The elements of **`actual_range`** must be exactly equal to the minimum and the maximum data values which occur in the variable (when unpacked if packing is used), and both must be within the **`valid_range`** if specified. If the data is all missing or invalid, the **`actual_range`** attribute cannot be used. - === Attributes This standard describes many attributes (some mandatory, others optional), but a file may also contain non-standard attributes. @@ -273,46 +272,42 @@ If a group attribute is defined in a parent group, and one of the child group re [[aggregation-variables, Section 2.8, "Aggregation Variables"]] === Aggregation Variables -An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ stored in __fragment files__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. -A fragment is an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. +An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ stored in __fragment datasets__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. +A fragment dataset contains an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. 
-Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple files, whilst taking up very little space on disk (since the aggregation file contains no copies of the data in the fragments). -The fragment files may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as CF-compliant view of non-CF datasets. -Storing aggregations is useful for data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; and for archive curation, as the aggregation can act as a metadata-rich archive index. +Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple other datasets, whilst taking up very little extra space on disk (since the aggregation file contains no copies of the data in the fragments). +The fragment datasets may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as a CF-compliant view of non-CF datasets. +Uses for storing aggregations include, but are not limited to: data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; archive curation, as the aggregation can act as a metadata-rich archive index; and model simulation, for combining output data that have been written to disk as multiple datasets decomposed in time and space. An aggregation variable must be a scalar (i.e. it has no dimensions). It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. The data type of the aggregation variable must be the data type of the aggregated data, but the value of the aggregation variable's single element is immaterial.
-Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, grid mapping variable, etc.), but it is recommended that other container variables whose data are immaterial (such as grid mapping variables) are not encoded as aggregation variables. +Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, grid mapping variable, etc.), but it is recommended that container variables whose data are immaterial (such as grid mapping variables) are not encoded as aggregation variables. -In general, any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role; and any reference to the data or dimensions of a variable applies to the aggregated data or aggregated dimensions, respectively, of an aggregation variable. +In general, any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role; and any reference to the dimensions or data of a variable applies to the aggregated dimensions or aggregated data, respectively, of an aggregation variable. For instance: * the dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimensions of the aggregation data variable, -* an aggregation coordinate variable must have the same name as its aggregated dimension. +* an aggregation coordinate variable (which is a scalar) must have the same name as its aggregated dimension. The only exception is the definition of missing data in the aggregated data. 
-Each fragment defines the locations of its missing data based on its own metadata, and the locations of missing data in the aggregated data are then derived solely from the locations of missing data in the fragments, rather than from any of the aggregation variable's attributes for indicating missing values: **`_FillValue`**, **`missing_value`**, **`valid_min`**, **`valid_max`** and **`valid_range`** (see <>). +Each fragment defines its missing data based on its own metadata, and missing data in the aggregated data are then derived solely from where there are missing data in the fragments, rather than from any of the aggregation variable's attributes for indicating missing values: **`_FillValue`**, **`missing_value`**, **`valid_min`**, **`valid_max`**, and **`valid_range`** (see <>). Since these attributes are ignored on aggregation variables, it is recommended that they are not provided. The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. -[[aggregated-dimensions, Section 2.8.1, "Aggregated Dimensions"]] -==== Aggregated Dimensions +[[aggregated-dimensions-data, Section 2.8.1, "Aggregated Dimensions and Data"]] +==== Aggregated Dimensions and Data -The aggregated dimensions of an aggregation variable are stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. -The value of the **`aggregated_dimensions`** attribute is a blank separated list of the aggregated dimension names given in the order which matches the dimensions of the aggregated data. +The aggregated dimensions are stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. 
+The value of the **`aggregated_dimensions`** attribute is a blank-separated list of the aggregated dimension names given in the order which matches the dimensions of the aggregated data. If the aggregated data is scalar then the **`aggregated_dimensions`** attribute must be an empty string. The aggregated dimensions must exist as dimensions in the aggregation file. - -[[aggregated-data, Section 2.8.2, "Aggregated Data"]] -==== Aggregated Data - -The fragments are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. +The fragments which provide the aggregated data are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. Each dimension of the fragment array is called a __fragment array dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. The size of a fragment array dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. See the example <>. 
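The relationship between fragment sizes, the fragment array and the aggregated data can be sketched in Python. This is a minimal illustration under stated assumptions, not a reference decoder: the `fragment_shape` list is a hypothetical in-memory copy of a `shape` fragment array variable, with one row per aggregated dimension holding the sizes of the fragments along that dimension and `None` standing in for missing-value padding. The values are those of the six-fragment conceptual example. The fragment array shape is the count of non-missing values in each row, each aggregated dimension size is the corresponding row sum, and `numpy.block` performs the concatenation of the fragments in fragment array order. The helper names `fragment_array_shape`, `aggregated_shape` and `assemble` are illustrative only.

```python
import numpy as np

# Hypothetical copy of a `shape` fragment array variable: one row per
# aggregated dimension (level, latitude, longitude), None = missing padding.
fragment_shape = [
    [17, None, None],
    [91, 45, 45],
    [180, 180, None],
]


def fragment_array_shape(shape_rows):
    """Number of fragments spanning each aggregated dimension."""
    return tuple(sum(v is not None for v in row) for row in shape_rows)


def aggregated_shape(shape_rows):
    """Size of each aggregated dimension: the sum of its fragment sizes."""
    return tuple(sum(v for v in row if v is not None) for row in shape_rows)


def assemble(fragments, shape_rows):
    """Concatenate fragments (keyed by fragment array position) with np.block."""
    farray = fragment_array_shape(shape_rows)

    def nest(prefix):
        # One level of list nesting per fragment array dimension; np.block
        # joins the innermost lists along the last axis, and so on outwards.
        if len(prefix) == len(farray):
            return fragments[prefix]
        return [nest(prefix + (k,)) for k in range(farray[len(prefix)])]

    return np.block(nest(()))


# Usage: fill each fragment with its position number in fragment array order.
fragments = {}
for pos in np.ndindex(*fragment_array_shape(fragment_shape)):
    # Fragment size along dimension d is taken from column pos[d] of row d.
    shape = tuple(fragment_shape[d][k] for d, k in enumerate(pos))
    fragments[pos] = np.full(shape, float(len(fragments)))

data = assemble(fragments, fragment_shape)
print(fragment_array_shape(fragment_shape))  # (1, 3, 2)
print(data.shape)  # (17, 181, 360)
```

Using `numpy.block` keeps the sketch short because its nesting rules match the ordering of fragment array dimensions. A real application program would read each fragment lazily from its fragment dataset rather than holding all of them in memory.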
@@ -327,147 +322,143 @@ The aggregated data are created by concatenating the canonical forms of the frag |=============== | *Fragment array position `[0, 0, 0]`* -Fragment file name `file_A.nc` + -Fragment data shape `(17, 91, 180)` + +Fragment location: `file_A.nc` + +Fragment data shape: `(17, 91, 180)` + `17` vertical levels + `[90, 0]` degrees north + `[0, 180)` degrees east | *Fragment array position `[0, 0, 1]`* -Fragment file name `file_B.nc` + -Fragment data shape `(17, 91, 180)` + +Fragment location: `file_B.nc` + +Fragment data shape: `(17, 91, 180)` + `17` vertical levels + `[90, 0]` degrees north + `[180, 360)` degrees east -| *Fragment array position `[0, 1, 0]`* +| *Fragment array position `[0, 1, 0]`* -Fragment file name `file_C.nc` + -Fragment data shape `(17, 45, 180)` + +Fragment location: `file_C.nc` + +Fragment data shape: `(17, 45, 180)` + `17` vertical levels + `(0, -45]` degrees north + `[0, 180)` degrees east | *Fragment array position `[0, 1, 1]`* -Fragment file name `file_D.nc` + -Fragment data shape `(17, 45, 180)` + +Fragment location: `file_D.nc` + +Fragment data shape: `(17, 45, 180)` + `17` vertical levels + `(0, -45]` degrees north + `[180, 360)` degrees east | *Fragment array position `[0, 2, 0]`* -Fragment file name `file_E.nc` + -Fragment data shape `(17, 45, 180)` + +Fragment location: `file_E.nc` + +Fragment data shape: `(17, 45, 180)` + `17` vertical levels + `(-45, -90]` degrees north + `[0, 180)` degrees east | *Fragment array position `[0, 2, 1]`* -Fragment file name `file_F.nc` + -Fragment data shape `(17, 45, 180)` + +Fragment location: `file_F.nc` + +Fragment data shape: `(17, 45, 180)` + `17` vertical levels + `(-45, -90]` degrees north + `[180, 360)` degrees east |=============== Six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane, which has 1 degree resolution. 
-The fragments combine to create three-dimensional aggregated data that have global `(Z, Y, X)` coverage, with shape `(17, 181, 360)`. +The fragments combine to create three-dimensional aggregated data that have global Z-Y-X coverage, with shape `(17, 181, 360)`. The Z aggregated dimension is spanned by 1 fragment, the Y aggregated dimension is spanned by 3 fragments, and the X aggregated dimension is spanned by 2 fragments. See <> for a CDL representation of this fragment array. ==== The fragment array must be defined by an aggregation variable's **`aggregated_data`** attribute. -This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ that provides the feature's values for each fragment in the fragment array. +This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ that provides values for that feature. The features and their values must unambiguously define the fragment array. The order of elements in the **`aggregated_data`** attribute is not significant. -There are four standardized and mandatory features, given by the `file`, `format`, `address`, and `shape` keywords; and any amount of non-standardized features are also allowed: +The features given by the `location`, `address`, and `shape` keywords are standardized and must be provided, and any number of non-standardized features are also allowed: + +// Turn off section numbering for a bit +:numbered!: -*file* +===== location -The string-valued `file` fragment array variable defines the locations of the fragment files. +The string-valued `location` fragment array variable defines the locations of the fragment datasets.
In general its dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. -A fragment file is located with a Uniform Resource Identifier (URI) <> that must be either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://remote.host/data/file.nc`, `s3://remote.host/data/file.nc`, or `locally_meaningful_protocol://UID`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, or `data/file.nc`). +A fragment dataset is located with a Uniform Resource Identifier (URI) <> that must be either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://remote.host/data/file.nc`, `s3://remote.host/data/file.nc`, or `locally_meaningful_protocol://UID`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, or `data/file.nc`). A relative-path URI reference is taken as being relative to the location of the aggregation file. -If the aggregation file is moved to another location, then a fragment file identified by an absolute URI will still be accessible, whereas a fragment file identified by a relative-path URI reference will also need be moved to preserve the relative reference. -Not all fragment file locations need be of the same URI type. +If the aggregation file is moved to another location, then a fragment dataset identified by an absolute URI will still be accessible, whereas a fragment dataset identified by a relative-path URI reference will also need to be moved to preserve the relative reference. +Not all fragment dataset locations need be of the same URI type. See <> and <>.
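The two permitted kinds of fragment dataset location, and the resolution of a relative-path URI reference against the aggregation file, can be sketched in Python with the standard library. This is a hedged illustration, not normative behaviour. The scheme pattern is an assumption of the sketch: RFC 3986 allows only letters, digits, `+`, `-` and `.` after the first letter of a scheme, and `_` is additionally accepted here so that the `locally_meaningful_protocol://UID` example counts as absolute. Resolution uses `urllib.parse.urljoin`, which follows the RFC 3986 reference resolution rules. The function names are hypothetical, and a local aggregation file path would first need converting to a URI (for instance with `pathlib.Path.as_uri`).

```python
import re
from urllib.parse import urljoin

# Assumption: scheme = letter followed by letters, digits, "+", "-", "." or
# "_" ("_" is not in RFC 3986, but is accepted to match the example above).
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-_]*:")


def is_absolute_uri(location):
    """True if the location begins with a scheme component and a ':'."""
    return bool(_SCHEME.match(location))


def is_relative_path_reference(location):
    """True if not absolute and not beginning with '/' or '#'."""
    return not is_absolute_uri(location) and not location.startswith(("/", "#"))


def resolve_fragment_location(location, aggregation_uri):
    """Return an absolute URI for one fragment dataset location."""
    if is_absolute_uri(location):
        return location
    if is_relative_path_reference(location):
        # Relative to the location of the aggregation file itself.
        return urljoin(aggregation_uri, location)
    raise ValueError(f"not a valid fragment dataset location: {location!r}")


agg = "https://remote.host/data/agg/aggregation.nc"
print(resolve_fragment_location("../file.nc", agg))
print(resolve_fragment_location("s3://remote.host/data/file.nc", agg))
```

Moving the aggregation file changes `aggregation_uri`, and hence the resolved location of every relative-path reference, while absolute URIs are returned unchanged. This is the behaviour the paragraph above describes.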
-A fragment file location may contain any number of string substitutions, each of which is defined by the `file` fragment array variable's **`substitutions`** attribute. -The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a fragment file location which is to be replaced by __replacement__ in order to find the actual fragment file name. -After the replacements have been made, the fragment file location must be an absolute URI or a relative-path URI reference. -The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. -For instance, the fragment file location `\https://remote.host/data/file.nc` could be stored as `${path}file.nc`, in conjunction with `substitutions="${path}: \https://remote.host/data/"`. -The order of elements in the **`substitutions`** attribute is not significant. -The use of substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute rather than by changing the actual `file` fragment array variable values. -See <>. - -The `file` fragment array variable may have an extra trailing dimension that allows multiple versions of a fragment to be specified. -Each version must contain equivalent information, so any version whose file exists may be selected for use in the aggregated data. -This could be useful when it is known that multiple fragment file locations are possible, but it is not known in advance which of them might exist at any given time. +The `location` fragment array variable may have an extra trailing dimension that allows multiple versions of a fragment to be specified. 
+Each version must contain equivalent information, so any version that exists may be selected for use in the aggregated data. +This could be useful when it is known that multiple fragment dataset locations are possible, but it is not known in advance which of them might exist at any given time. For instance, when remotely stored and locally cached versions of the same fragment have been provided, an application program could choose to only retrieve the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need to have the same number of versions. Where fragments have fewer versions than others, the trailing dimension must be padded with missing values. See <>. -*format* - -The string-valued `format` fragment array variable defines the format of the fragment files. -In general it has the same dimensions in the same order as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. -However, if the `format` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. -The format of a netCDF fragment file must be indicated with the value `nc`. -Other fragment file formats may be provided, on the understanding that an application program may choose to ignore any values that it does not understand. -The `format` fragment array variable may contain a range of different values, i.e. not all fragment files need to have the same format. See <>. +A fragment dataset location may contain any number of string substitutions, each of which is defined by the `location` fragment array variable's **`substitutions`** attribute. 
+The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a `location` fragment array variable value which is to be replaced by __replacement__ in order to find the actual fragment dataset location. +After replacements have been made, the fragment dataset location must be an absolute URI or a relative-path URI reference. +The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. +For instance, the fragment dataset location `\https://remote.host/data/file.nc` could be stored as `${path}file.nc`, in conjunction with `substitutions="${path}: \https://remote.host/data/"`. +The order of elements in the **`substitutions`** attribute is not significant. +The use of substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute rather than by changing the actual `location` fragment array variable values. +See <>. -*address* +===== address -The `address` fragment array variable defines how to find each fragment within its fragment file, i.e. the address of the fragment. -In general it has the same dimensions in the same order as the `file` fragment array variable, and must contain a non-missing value corresponding to each fragment version. -However, if the `address` fragment array variable is a scalar, then its single value is assumed to apply to all fragments. +The `address` fragment array variable defines how to find each fragment within its fragment dataset, i.e. the address of the fragment. This is necessary because a fragment dataset may also contain data that is not required by the aggregation. 
+In general it has the same dimensions in the same order as the `location` fragment array variable, and must contain a non-missing value corresponding to each fragment version. +However, if the `address` fragment array variable is a scalar, then its single value applies to all fragment versions. It may have any data type. -For a netCDF fragment file, the string-valued address must be the fragment's netCDF variable name. -Addresses for other fragment file formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. +For a netCDF fragment dataset, the address must be the string-valued netCDF variable name of the fragment. +Addresses for other fragment dataset formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. See <> and <>. -*shape* +===== shape The integer-valued `shape` fragment array variable defines the shape of the data of each fragment in its canonical form (see <>). In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. -The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along that dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. +The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. 
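A minimal sketch of how an application might interpret the `shape` fragment array variable, assuming it has already been read into nested Python lists with `None` standing in for missing values (the function names are hypothetical). It derives the fragment array shape from the number of non-missing values in each row, checks the row sums against the aggregated dimension sizes, and computes the part of the aggregated data that each fragment fills. The scalar case, with its one-dimensional `shape` variable, is not handled.

```python
from itertools import accumulate, product


def fragment_array_shape(shape_rows):
    """Number of fragments along each fragment array dimension."""
    return tuple(sum(v is not None for v in row) for row in shape_rows)


def check_row_sums(shape_rows, aggregated_sizes):
    """Each row's non-missing values must sum to its aggregated dimension size."""
    if len(shape_rows) != len(aggregated_sizes):
        raise ValueError("one row is required per aggregated dimension")
    for dim, (row, size) in enumerate(zip(shape_rows, aggregated_sizes)):
        total = sum(v for v in row if v is not None)
        if total != size:
            raise ValueError(f"row {dim} sums to {total}, expected {size}")


def fragment_slices(shape_rows):
    """Map each fragment array index to its slices of the aggregated data."""
    per_dim = []
    for row in shape_rows:
        sizes = [v for v in row if v is not None]  # drop trailing padding
        stops = list(accumulate(sizes))
        starts = [0] + stops[:-1]
        per_dim.append([slice(a, b) for a, b in zip(starts, stops)])
    return {
        index: tuple(per_dim[d][i] for d, i in enumerate(index))
        for index in product(*(range(len(s)) for s in per_dim))
    }
```

With rows consistent with the six-fragment example (row sums `17`, `181` and `360`), `[[17, None, None], [91, 45, 45], [180, 180, None]]`, the fragment array shape is `(1, 3, 2)` and fragment `(0, 2, 1)` fills `[0:17, 136:181, 180:360]` of the aggregated data.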
When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. See <>, which shows the `shape` fragment array variable for the fragment array described by the example <>. -*Non-standardized features* +===== Non-standardized features Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. -The fragment array variable for a non-standardized feature must be either a scalar, or else have the same dimensions in the same order as the `file` fragment array variable, optionally omitting the extra trailing dimension for multiple fragment versions if there is one. +The fragment array variable for a non-standardized feature must be either a scalar, or else have the same dimensions in the same order as the `location` fragment array variable, optionally omitting the extra trailing dimension for multiple fragment versions if there is one. Use cases for non-standardized features include, but are not limited to, the following: -* To provide extra information that enables the aggregation of fragments stored in a file format for which the `address` fragment array variable alone is insufficient to identify the fragments within the fragment files. +* To provide extra information that enables the aggregation of fragments stored in a dataset format for which the `address` fragment array variable alone is insufficient to identify the fragments within the fragment datasets. * To store extra metadata that relate to the fragments, but which are not necessary for the creation of the aggregated data. -For instance, it may be convenient to store in the aggregation file an attribute from each fragment file, making it available without having to open and inspect the fragment files themselves. 
+For instance, it may be convenient to store in the aggregation file an attribute from each fragment dataset, making it available without having to open and inspect the fragment datasets themselves. See <>. +// Turn section numbering back on +:numbered: -[[fragment-interpretation, Section 2.8.3 "Fragment Interpretation"]] -==== Fragment Interpretation -A fragment stored in a fragment file, of any format, must be converted to its __canonical form__ prior to being inserted into the aggregated data. -The fragment file must contain an array of data with metadata that is sufficient for the fragment to be convertible to its canonical form, and that conversion is the responsibility of the application program which is creating the aggregated data. Any fragment metadata that is not needed for the conversion to canonical form may be ignored by the application program. +[[fragment-interpretation, Section 2.8.2 "Fragment Interpretation"]] +==== Fragment Interpretation -The canonical form of a fragment is such that: +The data of a fragment, stored in a fragment dataset of any format, must be converted to its __canonical form__ prior to being inserted into the aggregated data. The canonical form of a fragment's data is such that: * The fragment's data, in its entirety, provide the values for a unique and contiguous part of the aggregated data. * The fragment's data dimensions correspond to the aggregated dimensions in the same order. -* The fragment's data have the same units as the aggregation variable. +* The fragment's data have the same units as the aggregation variable. -* The fragment's data are not numerically packed (i.e. not stored using a smaller data type than its original data). +* The fragment's data are not packed (i.e. not stored using a smaller data type than its original data). * The fragment's data have the same data type as the aggregation variable. 
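For the packing condition above, a minimal sketch of the unpacking step, assuming the fragment's values and its `scale_factor`, `add_offset` and fill value have already been read (the function name is hypothetical). As for packed data generally, the values are scaled first and the offset is then added. Treating the fill value as a packed value, and using `None` for missing data, are assumptions of this sketch; unit conversion, data type casting and dimension handling are not shown.

```python
def unpack(values, scale_factor=None, add_offset=None, fill_value=None):
    """Unpack a fragment's values: scale first, then add the offset.

    Values equal to fill_value (compared in packed form) become None,
    which stands in for missing data in this sketch.
    """
    scale = 1 if scale_factor is None else scale_factor
    offset = 0 if add_offset is None else add_offset
    unpacked = []
    for v in values:
        if fill_value is not None and v == fill_value:
            unpacked.append(None)  # missing stays missing
        else:
            unpacked.append(v * scale + offset)
    return unpacked
```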
-The application program may need to carry out any combination of the following operations when converting a fragment to its canonical form: +The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. +The application program may ignore any fragment dataset metadata that are not needed for the conversion to the canonical form. +The conversion may require a combination of the following operations: * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). @@ -477,4 +468,4 @@ The application program may need to carry out any combination of the following o Note that some transformations may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. * Unpacking the fragment's data. -Note that if the aggregation variable indicates that the aggregated data is numerically packed (as determined by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. It is recommended that the aggregated data is not numerically packed, because of the potential for mistakes and confusion. \ No newline at end of file +Note that if the aggregation variable indicates that the aggregated data are packed (as determined by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. It is therefore recommended that the aggregated data is not numerically packed, because of the potential for mistakes and confusion. \ No newline at end of file diff --git a/conformance.adoc b/conformance.adoc index e6d677dc..ea7a1d0d 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -135,36 +135,32 @@ Each aggregated dimension must name a dimension in the file. * An aggregation variable must be a scalar. 
* An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form __feature: variable__. Each __variable__ must be the name of a variable in the file. -The __feature__ keywords must include `file`, `format`, `address`, and `shape`. +The __feature__ keywords must include `location`, `address`, and `shape`. - - The `file` variable must have a string data type. + - The `location` variable must have a string data type. - - The `file` variable must have the same number of dimensions as there are aggregated dimensions, with the optional addition of one extra trailing dimension. + - The `location` variable must have the same number of dimensions as there are aggregated dimensions, with the optional addition of one extra trailing dimension. - - The `file` variable's **`substitutions`** attribute, if it exists, must be a string whose value is list of blank-separated word pairs in the form __substitution: replacement__. + - The `location` variable's **`substitutions`** attribute, if it exists, must be a string whose value is a list of blank-separated word pairs in the form __substitution: replacement__. Each __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. - - A data value of a `file` variable, after any string substitutions defined by the **`substitutions`** attribute have been applied, must be either an absolute URI or else a relative-path URI reference.
- - - The `address` variable must be either a scalar, or else have the same dimensions in the same order as the `file` variable. + - The `address` variable must be either a scalar, or else have the same dimensions in the same order as the `location` variable. - The `shape` variable must have an integer data type. - If there are zero aggregated dimensions then the `shape` variable must be one-dimensional, of size one, and contain the value `1`. - - If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `file` variable dimensions, excluding the extra trailing dimension if the `file` variable has one. + - If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `location` variable dimensions, excluding the extra trailing dimension if the `location` variable has one. - The rows of a two-dimensional `shape` variable correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row must equal the size of its corresponding aggregated dimension. - - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `file` variable, optionally excluding the extra trailing dimension if the `file` variable has one. 
+ - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `location` variable, optionally excluding the extra trailing dimension if the `location` variable has one. *Recommendations:* -* The following kinds of variable should not be aggregation variables: grid mapping variable, domain variable, mesh topology variable, geometry container variable, interpolation variable. +* The following kinds of variable should not be aggregation variables: grid mapping variables, domain variables, mesh topology variables, geometry container variables, and interpolation variables. * An aggregation variable should not have the any of the attributes **`_FillValue`**, **`missing_value`**, **`valid_min`**, **`valid_max`**, and **`valid_range`**. From 3159eb3b50cc86d3beaddedcd0c4d39d6198be08 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 2 May 2024 09:04:34 +0100 Subject: [PATCH 31/59] cfa --- appl.adoc | 2 +- ch02.adoc | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/appl.adoc b/appl.adoc index 52025d72..95886f05 100644 --- a/appl.adoc +++ b/appl.adoc @@ -4,7 +4,7 @@ == Aggregation Variable Examples This appendix contains examples of aggregation variables. -Details of how to encode and decode aggregation variables may found in <>. +Details of how to encode and decode aggregation variables are found in <>. [[example-L.1]] [caption=] diff --git a/ch02.adoc b/ch02.adoc index 5206d220..46c66072 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -273,12 +273,12 @@ If a group attribute is defined in a parent group, and one of the child group re === Aggregation Variables An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ stored in __fragment datasets__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. 
-A fragment dataset contains an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. +A fragment is an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple other datasets, whilst taking up very little extra space on disk (since the aggregation file contains no copies of the data in the fragments). The fragment datasets may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as CF-compliant view of non-CF datasets. -Uses for storing aggregations include, but are not limited to: data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; archive curation, as the aggregation can act as a metadata-rich archive index; and model simulation, for combining output data that have been written to disk as multiple datasets decomposed in time and space. +Uses for storing aggregations include, but are not limited to: data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; archive curation, as the aggregation can act as a metadata-rich archive index; and model simulations, for combining output data that have been written to disk as multiple datasets decomposed in time and space. An aggregation variable must be a scalar (i.e. it has no dimensions). It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. 
the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. From 15e130c0ca6bcfc2d16eb39111ad999fd33970bc Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 2 May 2024 10:42:16 +0100 Subject: [PATCH 32/59] cfa --- appl.adoc | 40 ++++++++++++++++++---------------------- ch02.adoc | 20 +++++++++++--------- toc-extra.adoc | 10 +++++++++- 3 files changed, 38 insertions(+), 32 deletions(-) diff --git a/appl.adoc b/appl.adoc index 52025d72..346d9a6c 100644 --- a/appl.adoc +++ b/appl.adoc @@ -7,8 +7,8 @@ This appendix contains examples of aggregation variables. Details of how to encode and decode aggregation variables may found in <>. [[example-L.1]] -[caption=] -.Example L.1 +[caption="Example L.1 "] +.Aggregation variable example 1 ==== ---- dimensions: @@ -75,8 +75,8 @@ The data for the `level`, `latitude` and `longitude` variables are omitted for [[example-L.2]] -[caption=] -.Example L.2 +[caption="Example L.2 "] +.Aggregation variable example 2 ==== ---- dimensions: @@ -149,8 +149,8 @@ The data for the `level`, `latitude` and `longitude` variables are omitted for ==== [[example-L.3]] -[caption=] -.Example L.3 +[caption="Example L.3 "] +.Aggregation variable example 3 ==== ---- dimensions: @@ -215,14 +215,10 @@ data: level = ... ; latitude = ... ; longitude = ... 
; - fragment_location = "${local}January-March.nc", - _, - "${local}April-December.nc", - "${remote}April-December.nc" ; - fragment_location_time = "${local}January-March.nc", - _, - "${local}April-December.nc", - "${remote}April-December.nc" ; + fragment_location = "${local}January-March.nc", _, + "${local}April-December.nc", "${remote}April-December.nc" ; + fragment_location_time = "${local}January-March.nc", _, + "${local}April-December.nc", "${remote}April-December.nc" ; fragment_address = "temperature" ; fragment_address_time = "time" ; fragment_shape = 3, 9, @@ -238,8 +234,8 @@ The data for the `level`, `latitude` and `longitude` variables are omitted for ==== [[example-L.4]] -[caption=] -.Example L.4 +[caption="Example L.4 "] +.Aggregation variable example 4 ==== ---- dimensions: @@ -301,8 +297,8 @@ The data for the `level`, `latitude` and `longitude` variables are omitted for ==== [[example-L.5]] -[caption=] -.Example L.5 +[caption="Example L.5 "] +.Aggregation variable example 5 ==== ---- dimensions: @@ -374,8 +370,8 @@ The data for the `pressure`, `level`, `latitude` and `longitude` variables, and ==== [[example-L.6]] -[caption=] -.Example L.6 +[caption="Example L.6 "] +.Aggregation variable example 6 ==== ---- dimensions: @@ -461,8 +457,8 @@ No data have been omitted from the CDL. ==== [[example-L.7]] -[caption=] -.Example L.7 +[caption="Example L.7 "] +.Aggregation variable example 7 ==== ---- dimensions: diff --git a/ch02.adoc b/ch02.adoc index 5206d220..25b3de09 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -273,7 +273,7 @@ If a group attribute is defined in a parent group, and one of the child group re === Aggregation Variables An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ stored in __fragment datasets__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. 
-A fragment dataset contains an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. +A fragment is an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple other datasets, whilst taking up very little extra space on disk (since the aggregation file contains no copies of the data in the fragments). @@ -310,7 +310,7 @@ The aggregated dimensions must exist as dimensions in the aggregation file. The fragments which provide the aggregated data are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. Each dimension of the fragment array is called a __fragment array dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. The size of a fragment array dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. -See the example <>. +See <>. The aggregated data are created by concatenating the canonical forms of the fragments' data (see <>) along each fragment array dimension, and in the order in which they appear in the fragment array. @@ -389,20 +389,22 @@ Not all fragment dataset locations need be of the same URI type. See <> and <>. The `location` fragment array variable may have an extra trailing dimension that allows multiple versions of a fragment to be specified. -Each version must contain equivalent information, so any version that exists may be selected for use in the aggregated data. 
This could be useful when it is known that multiple fragment dataset locations are possible, but it is not known in advance which of them might exist at any given time. +Each version must contain equivalent information, so any version that exists may be selected for use in the aggregated data. For instance, when remotely stored and locally cached versions of the same fragment have been provided, an application program could choose to only retrieve the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need to have the same number of versions. Where fragments have fewer versions than others, the trailing dimension must be padded with missing values. See <>. -A fragment dataset location may contain any number of string substitutions, each of which is defined by the `location` fragment array variable's **`substitutions`** attribute. -The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines the part of a `location` fragment array variable value which is to be replaced by __replacement__ in order to find the actual fragment dataset location. +A fragment dataset location may be defined with any number of string substitutions, each of which is defined by the `location` fragment array variable's **`substitutions`** attribute. +The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines part of a `location` fragment array variable value which is to be replaced by __replacement__ in order to find the actual fragment dataset location. +A `location` fragment array variable value may contain then zero or more of the substitution keywords. 
After replacements have been made, the fragment dataset location must be an absolute URI or a relative-path URI reference. -The __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. -For instance, the fragment dataset location `\https://remote.host/data/file.nc` could be stored as `${path}file.nc`, in conjunction with `substitutions="${path}: \https://remote.host/data/"`. +The substitution keyword must have the form `${\*}`, where `*` represents any number of any characters. +For instance, the fragment dataset location `\https://remote.host/data/file.nc` could be stored as the string `$\{path}file.nc`, in conjunction with `substitutions="$\{path}: \https://remote.host/data/"`. The order of elements in the **`substitutions`** attribute is not significant. -The use of substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute rather than by changing the actual `location` fragment array variable values. +The string substitutions must be such that applying them in any order will result in the same fragment dataset location. +The use of string substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute rather than by changing the actual `location` fragment array variable values. See <>. ===== address @@ -422,7 +424,7 @@ In general, the `shape` fragment array variable is two-dimensional, with the siz The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. 
The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. -See <>, which shows the `shape` fragment array variable for the fragment array described by the example <>. +See <>, which shows the `shape` fragment array variable for the fragment array described by <>. ===== Non-standardized features diff --git a/toc-extra.adoc b/toc-extra.adoc index eac9bb14..4ecb3302 100644 --- a/toc-extra.adoc +++ b/toc-extra.adoc @@ -37,6 +37,7 @@ J.5. <> [%hardbreaks] 2.1. <> +2.2 <> 3.1. <> 3.2. <> 3.3. <> @@ -119,4 +120,11 @@ H.19. <> H.20. <> H.21. <> H.22. <> -I.1. <> \ No newline at end of file +I.1. <> +L.1 <> +L.2 <> +L.3 <> +L.4 <> +L.5 <> +L.6 <> +L.7 <> From 532533e19d18af42d12124d23c6a2cc73a33670f Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 2 May 2024 12:01:15 +0100 Subject: [PATCH 33/59] multiple non-standard features --- appl.adoc | 6 +++--- ch02.adoc | 32 +++++++++++++++----------------- conformance.adoc | 2 +- 3 files changed, 19 insertions(+), 21 deletions(-) diff --git a/appl.adoc b/appl.adoc index db56daba..a166e17f 100644 --- a/appl.adoc +++ b/appl.adoc @@ -288,9 +288,9 @@ data: 91, 45, 45, 180, 180, _ ; ---- -This example is an encoding for the conceptual fragment array described in example <>. -The `temperature` data variable is an aggregation of 6 fragments. -The distribution of missing values in the `fragment_shape` variable indicates that the `level` aggregated dimension is spanned by 1 fragment, the `latitude` aggregated dimension is spanned by 3 fragments, and the `longitude` aggregated dimension is spanned by 2 fragments; and that the shape of the implied fragment array is `(1, 3, 2)`. +This example is an encoding for the conceptual fragment array described in <>. 
+The `temperature` data variable is an aggregation of six fragments. +The distribution of missing values in the `fragment_shape` variable indicates that the `level` aggregated dimension is spanned by one fragment, the `latitude` aggregated dimension is spanned by three fragments, and the `longitude` aggregated dimension is spanned by two fragments; and that the shape of the implied fragment array is `(1, 3, 2)`. The row sums of the `fragment_shape` variable are `17`, `181`, and `360`, which equal the sizes of the `level`, `latitude`, and `longitude` aggregated dimensions, respectively. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. diff --git a/ch02.adoc b/ch02.adoc index 0ae18b1b..bde8c171 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -362,15 +362,15 @@ Fragment data shape: `(17, 45, 180)` + `(-45, -90]` degrees north + `[180, 360)` degrees east |=============== -Six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. +The six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane, which has 1 degree resolution. The fragments combine to create three-dimensional aggregated data that have global Z-Y-X coverage, with shape `(17, 181, 360)`. -The Z aggregated dimension is spanned by 1 fragment, the Y aggregated dimension is spanned by 3 fragments, and the X aggregated dimension is spanned by 2 fragments. +The Z aggregated dimension is spanned by one fragment, the Y aggregated dimension is spanned by three fragments, and the X aggregated dimension is spanned by two fragments. See <> for a CDL representation of this fragment array. ==== The fragment array must be defined by an aggregation variable's **`aggregated_data`** attribute. 
-This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ that provides values for that feature. The features and their values must unambiguously define the fragment array. +This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ which provides values for that feature. The features and their values must unambiguously define the fragment array. The order of elements in the **`aggregated_data`** attribute is not significant. The features given by the `location`, `address`, and `shape` keywords are standardized and must be provided, and any amount of non-standardized features are also allowed: @@ -389,30 +389,28 @@ Not all fragment dataset locations need be of the same URI type. See <> and <>. The `location` fragment array variable may have an extra trailing dimension that allows multiple versions of a fragment to be specified. -This could be useful when it is known that multiple fragment dataset locations are possible, but it is not known in advance which of them might exist at any given time. +This could be useful when it is known that multiple locations are possible for a given fragment, but it is not known in advance which of them might exist at any given time. Each version must contain equivalent information, so any version that exists may be selected for use in the aggregated data. -For instance, when remotely stored and locally cached versions of the same fragment have been provided, an application program could choose to only retrieve the remote version if the local version does not exist. 
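The version rule above can be sketched as follows, assuming that the versions of one fragment have been read from the trailing dimension of the `location` fragment array variable into a list with `None` for the padding, and that the caller supplies an `exists` test (both hypothetical; the default simply checks the local file system). It returns the first available version, so listing a locally cached copy before a remote one gives the behaviour described in the text.

```python
import os
from typing import Callable, Optional, Sequence


def select_version(versions: Sequence[Optional[str]],
                   exists: Callable[[str], bool] = os.path.exists) -> str:
    """Return the first non-missing fragment version for which exists() is true."""
    candidates = [v for v in versions if v is not None]  # drop padding
    if not candidates:
        raise ValueError("every fragment must have at least one version")
    for location in candidates:
        if exists(location):
            return location
    raise FileNotFoundError(f"no version of this fragment is available: {candidates}")
```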
+For instance, when remotely stored and locally cached versions of the same fragment have been defined, an application program could choose to only retrieve the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need to have the same number of versions. Where fragments have fewer versions than others, the trailing dimension must be padded with missing values. See <>. -A fragment dataset location may be defined with any number of string substitutions, each of which is defined by the `location` fragment array variable's **`substitutions`** attribute. +A fragment dataset location may be defined with any number of string substitutions, each of which is provided by the `location` fragment array variable's **`substitutions`** attribute. The **`substitutions`** attribute takes a string value comprising blank-separated elements of the form "__substitution: replacement__", where __substitution__ is a case-sensitive keyword that defines part of a `location` fragment array variable value which is to be replaced by __replacement__ in order to find the actual fragment dataset location. -A `location` fragment array variable value may contain then zero or more of the substitution keywords. +A `location` fragment array variable value may include any subset of zero or more of the substitution keywords. After replacements have been made, the fragment dataset location must be an absolute URI or a relative-path URI reference. The substitution keyword must have the form `${\*}`, where `*` represents any number of any characters. -For instance, the fragment dataset location `\https://remote.host/data/file.nc` could be stored as the string `$\{path}file.nc`, in conjunction with `substitutions="$\{path}: \https://remote.host/data/"`. -The order of elements in the **`substitutions`** attribute is not significant. 
-The string substitutions must be such that applying them in any order will result in the same fragment dataset location. -The use of string substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute rather than by changing the actual `location` fragment array variable values. +For instance, the fragment dataset location `\https://remote.host/data/file.nc` could be stored as `$\{path}file.nc`, in conjunction with `substitutions="$\{path}: \https://remote.host/data/"`. +The order of elements in the **`substitutions`** attribute is not significant, an the substitutions for a given fragment must be such that applying them in any order will result in the same fragment dataset location. +The use of substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute rather than by changing the actual `location` fragment array variable values. See <>. ===== address -The `address` fragment array variable defines how to find each fragment within its fragment dataset, i.e. the address of the fragment. This is necessary because a fragment dataset may also contain data that is not required by the aggregation. +The `address` fragment array variable, that may have any data type, defines how to find each fragment within its fragment dataset, i.e. the address of the fragment. In general it has the same dimensions in the same order as the `location` fragment array variable, and must contain a non-missing value corresponding to each fragment version. -However, if the `address` fragment array variable is a scalar, then its single value applies to all fragment versions. -It may have any data type. 
+However, if the `address` fragment array variable is a scalar, then its single value applies to all versions of all fragments. For a netCDF fragment dataset, the address must be the string-valued netCDF variable name of the fragment. Addresses for other fragment dataset formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. See <> and <>. @@ -459,8 +457,8 @@ The data of a fragment, stored in a fragment dataset of any format, must be conv * The fragment's data have the same data type as the aggregation variable. The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. -The application program may ignore any fragment dataset metadata that are not needed for the conversion to the canonical form. -The conversion may require a combination of the following operations: +The application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. +Converting a fragment's data to its canonical form may require a combination of the following operations: * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). @@ -470,4 +468,4 @@ The conversion may require a combination of the following operations: Note that some transformations may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. * Unpacking the fragment's data. -Note that if the aggregation variable indicates that the aggregated data are packed (as determined by the attributes defined in <>), then the unpacked fragment data values represent packed values in the aggregated data. 
It is therefore recommended that the aggregated data is not numerically packed, because of the potential for mistakes and confusion. \ No newline at end of file +Note that if the aggregation variable indicates that the aggregated data are packed (as determined by the attributes defined in <>), then the unpacked fragment data values will represent packed values in the aggregated data. It is therefore recommended that the aggregated data is not packed, because of the potential for mistakes and confusion. \ No newline at end of file diff --git a/conformance.adoc b/conformance.adoc index ea7a1d0d..8ffd8eb8 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -156,7 +156,7 @@ The __feature__ keywords must include `location`, `address`, and `shape`. - The rows of a two-dimensional `shape` variable correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row must equal the size of its corresponding aggregated dimension. - - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `location` variable, optionally excluding the extra trailing dimension if the `location` variable has one. + - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `location` variable, excluding the extra trailing dimension if the `location` variable has one. 
*Recommendations:* From 6d5feff7952a2183c2b05499bcd6040d9a547abe Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 2 May 2024 12:02:41 +0100 Subject: [PATCH 34/59] multiple non-standard features checkpoint --- ch02.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/ch02.adoc b/ch02.adoc index bde8c171..dd427c5b 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -415,6 +415,7 @@ For a netCDF fragment dataset, the address must be the string-valued netCDF vari Addresses for other fragment dataset formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. See <> and <>. + ===== shape The integer-valued `shape` fragment array variable defines the shape of the data of each fragment in its canonical form (see <>). From d86bd5bb1bb8f6d2574e7cad526a17abb74c9fc1 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 2 May 2024 14:00:34 +0100 Subject: [PATCH 35/59] value feature keyword first commit --- appl.adoc | 36 +++++++++++++++++++++--------------- ch02.adoc | 42 +++++++++++++++++++----------------------- conformance.adoc | 13 +++++++------ 3 files changed, 47 insertions(+), 44 deletions(-) diff --git a/appl.adoc b/appl.adoc index a166e17f..b9f81efe 100644 --- a/appl.adoc +++ b/appl.adoc @@ -26,7 +26,7 @@ dimensions: i = 2 ; // Equal to the size of the largest fragment array dimension variables: - // Data variable + // Aggregated data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -97,7 +97,7 @@ dimensions: versions = 2 ; // The maximum number of versions for a fragment variables: - // Data variable + // Aggregated data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -171,7 +171,7 @@ dimensions: versions = 2 ; // The maximum number of versions for a fragment variables: - // Data variable + // Aggregated data variable double temperature ; temperature:standard_name = 
"air_temperature" ; temperature:units = "K" ; @@ -180,7 +180,7 @@ variables: temperature:aggregated_data = "location: fragment_location address: fragment_address shape: fragment_shape" ; - // Coordinate variables + // Aggregated coordinate variable double time ; // This is an aggregation coordinate variable time:standard_name = "time" ; time:units = "days since 2001-01-01" ; @@ -188,6 +188,7 @@ variables: time:aggregated_data = "location: fragment_location address: fragment_address_time shape: fragment_shape_time" ; + // Coordinate variables double level(level) ; level:standard_name = "height_above_mean_sea_level" ; level:units = "m" ; @@ -251,7 +252,7 @@ dimensions: i = 3 ; // Equal to the size of the largest fragment array dimension variables: - // Data variable + // Aggregated data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -316,7 +317,7 @@ dimensions: i = 12 ; // Equal to the size of the largest fragment array dimension variables: - // Data variable + // Aggregated data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -384,7 +385,7 @@ dimensions: i = 3 ; // Equal to the size of the largest fragment array dimension variables: - // Data variable + // Aggregated data variable float tas(obs) ; tas:standard_name = "air_temperature" ; tas:units = "K" ; @@ -398,7 +399,7 @@ variables: row_size:long_name = "number of observations per station" ; row_size:sample_dimension = "obs" ; - // Auxiliary coordinate variables + // Aggregated auxiliary coordinate variables float time ; time:standard_name = "time" ; time:units = "days since 1970-01-01" ; @@ -476,16 +477,22 @@ dimensions: i = 2 ; // Equal to the size of the largest fragment array dimension variables: - // Data variable + // Aggregated data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; temperature:cell_methods = "time: mean" ; + 
temperature:ancillary_variables = "uid" ; temperature:aggregated_dimensions = "time level latitude longitude" ; temperature:aggregated_data = "location: fragment_location address: fragment_address - shape: fragment_shape - id: fragment_id" ; // Non-standardized feature + shape: fragment_shape" ; + // Aggregated ancillary variable + string uid() ; + uid:long_name = "Fragment dataset unique identifiers" ; + uid:aggregated_dimensions = "time level latitude longitude" ; + uid:aggregated_data = "value: fragment_value + shape: fragment_shape"; // Coordinate variables double time(time) ; time:standard_name = "time" ; @@ -503,8 +510,7 @@ variables: string fragment_location(f_time, f_level, f_latitude, f_longitude) ; string fragment_address ; int fragment_shape(j, i) ; - string fragment_id(f_time, f_level, f_latitude, f_longitude) ; - fragment_id:long_name = "Fragment dataset unique identifiers" ; + string fragment_value(f_time, f_level, f_latitude, f_longitude) ; data: temperature = _ ; @@ -518,9 +524,9 @@ data: 1, _, 73, _, 144, _ ; - fragment_id = "04821b9-7eb5-4046-937b-0bf0588", "056d1ee0-a183-43b3-ae67-1ec632a" ; + fragment_value = "04821b9-7eb5-4046-937b-0bf0588", "056d1ee0-a183-43b3-ae67-1ec632a" ; ---- -This example is similar to <>, but now the **`aggregated_data`** attribute also includes the non-standardized feature keyword `id`, which has the corresponding variable `fragment_id`. +This example is similar to <>, but now there is the aggregated ancillary variable `uid` which defines its fragments as constant values stored int he `fragment_value` variable,that are intended to be broadcast across its aggregated data. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. 
==== \ No newline at end of file diff --git a/ch02.adoc b/ch02.adoc index dd427c5b..a89b74d5 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -272,7 +272,7 @@ If a group attribute is defined in a parent group, and one of the child group re [[aggregation-variables, Section 2.8, "Aggregation Variables"]] === Aggregation Variables -An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ stored in __fragment datasets__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. +An __aggregation variable__ is a variable which has been formed by combining (i.e. aggregating) multiple __fragments__ that are generally stored in __fragment datasets__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. A fragment is an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. @@ -373,11 +373,20 @@ The fragment array must be defined by an aggregation variable's **`aggregated_da This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ which provides values for that feature. The features and their values must unambiguously define the fragment array. The order of elements in the **`aggregated_data`** attribute is not significant. 
-The features given by the `location`, `address`, and `shape` keywords are standardized and must be provided, and any amount of non-standardized features are also allowed: +The features must comprise either all three of the `shape`, `location`, and `address` keyords, or else both of the `shape` and `value` keywords. No other combination of these keywords is allowed, nor the use of any other feature keywords. The features are defined as follows: // Turn off section numbering for a bit :numbered!: +===== shape + +The integer-valued `shape` fragment array variable defines the shape of the data of each fragment in its canonical form (see <>). +In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. +The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. +The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. +When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be a scalar and contain the value `1`. +See <>, which shows the `shape` fragment array variable for the fragment array described by <>. + ===== location The string-valued `location` fragment array variable defines the locations of the fragment datasets. @@ -415,28 +424,13 @@ For a netCDF fragment dataset, the address must be the string-valued netCDF vari Addresses for other fragment dataset formats are allowed, on the understanding that an application program may choose to ignore any values that it does not understand. 
See <> and <>. +===== value -===== shape - -The integer-valued `shape` fragment array variable defines the shape of the data of each fragment in its canonical form (see <>). -In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. -The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. -The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. -When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be one-dimensional, of size one, and contain the value `1`. -See <>, which shows the `shape` fragment array variable for the fragment array described by <>. - -===== Non-standardized features - -Any number of non-standardized features are allowed, on the understanding that an application program may choose to ignore any such features that it does not understand, or which are irrelevant for its purpose. -The fragment array variable for a non-standardized feature must be either a scalar, or else have the same dimensions in the same order as the `location` fragment array variable, optionally omitting the extra trailing dimension for multiple fragment versions if there is one. - -Use cases for non-standardized features include, but are not limited to, the following: - -* To provide extra information that enables the aggregation of fragments stored in a dataset format for which the `address` fragment array variable alone is insufficient to identify the fragments within the fragment datasets. 
- -* To store extra metadata that relate to the fragments, but which are not necessary for the creation of the aggregated data. -For instance, it may be convenient to store in the aggregation file an attribute from each fragment dataset, making it available without having to open and inspect the fragment datasets themselves. -See <>. +When the data values within a fragment are all the same, for every fragment, the `value` fragment array variable allows each fragment to be represented explicitly by its unique data value, rather than by a fragment dataset. +The `value` fragment array variable dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. +The `value` fragment array variable may have any data type, and each value is the unique value of a fragment's data. +This feature could be used, for instance, to store an attribute from each fragment dataset, making them available without having to inspect the fragment datasets themselves. +See <>, which uses an aggregation ancillary variable to make fragment dataset attributes available to an aggregation data variable. // Turn section numbering back on :numbered: @@ -461,6 +455,8 @@ The conversion of the fragment's data to its canonical form is carried out by th The application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. Converting a fragment's data to its canonical form may require a combination of the following operations: +* When the fragment's data has been provided as its unique value (via the `value` feature keyword of the **`aggregated_data`** attribute), broadcasting that value across the fragment's canonical shape. + * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). 
* Transforming the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). diff --git a/conformance.adoc b/conformance.adoc index 8ffd8eb8..d3e1a655 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -134,8 +134,9 @@ Each aggregated dimension must name a dimension in the file. * An aggregation variable must be a scalar. -* An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form __feature: variable__. Each __variable__ must be the name of a variable in the file. -The __feature__ keywords must include `location`, `address`, and `shape`. +* An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form __feature: variable__. + Each __variable__ must be the name of a variable in the file. + The __feature__ keywords must comprise either all three of the `shape`, `location`, and `address` keyords, or else both of the `shape` and `value` keywords. - The `location` variable must have a string data type. @@ -148,16 +149,16 @@ The __feature__ keywords must include `location`, `address`, and `shape`. - The `address` variable must be either a scalar, or else have the same dimensions in the same order as the `location` variable. + - The `value` variable must have the same number of dimensions as there are aggregated dimensions. + - The `shape` variable must have an integer data type. - - If there are zero aggregated dimensions then the `shape` variable must be one-dimensional, of size one, and contain the value `1`. + - If there are zero aggregated dimensions then the `shape` variable must a be scalar and contain the value `1`. - - If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional, with the size of the slower varying dimension (i.e. 
the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `location` variable dimensions, excluding the extra trailing dimension if the `location` variable has one. + - If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `location` or `value` variable dimensions, excluding the extra trailing dimension if the `location` variable has one. - The rows of a two-dimensional `shape` variable correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row must equal the size of its corresponding aggregated dimension. - - A variable associated with a non-standardized feature keyword must either be a scalar, or else have the same dimensions in the same order as the `location` variable, excluding the extra trailing dimension if the `location` variable has one. - *Recommendations:* * The following kinds of variable should not be aggregation variables: grid mapping variables, domain variables, mesh topology variables, geometry container variables, and interpolation variables. 
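The `shape` rules in the hunks above can be illustrated with a minimal Python sketch. Those rules are: one row per aggregated dimension, fragment sizes padded with trailing missing values, each row summing to the size of its aggregated dimension, and a single `1` when there are no aggregated dimensions. The sketch shows how an application might turn the rows into the index ranges that each fragment occupies in the aggregated data. It is not part of the CF text or of any library. The function name `fragment_slices`, the use of `None` for the netCDF missing value, and the middle latitude fragment size of 91 are assumptions. The value 91 was chosen so that the row sums to 181, as in the `(1, 3, 2)` example.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical stand-in for the netCDF missing value used to pad `shape` rows.
MISSING = None

Range = Tuple[int, int]  # half-open (start, stop) index range


def fragment_slices(
    shape_rows: Sequence[Sequence[Optional[int]]],
    aggregated_shape: Sequence[int],
) -> Dict[Tuple[int, ...], Tuple[Range, ...]]:
    """Map each fragment-array index to its index ranges in the aggregated data.

    `shape_rows` holds one row per aggregated dimension, listing the fragment
    sizes along that dimension, padded at the end with MISSING.
    """
    if len(shape_rows) != len(aggregated_shape):
        raise ValueError("need exactly one row per aggregated dimension")

    ranges_per_dim: List[List[Range]] = []
    for row, dim_size in zip(shape_rows, aggregated_shape):
        sizes = [s for s in row if s is not MISSING]
        # Padding may only appear after the last fragment size in a row.
        if any(s is MISSING for s in row[: len(sizes)]):
            raise ValueError("missing values must only pad the end of a row")
        if sum(sizes) != dim_size:
            raise ValueError(f"row sums to {sum(sizes)}, expected {dim_size}")
        ranges, start = [], 0
        for size in sizes:
            ranges.append((start, start + size))
            start += size
        ranges_per_dim.append(ranges)

    # The fragment array shape is the number of fragments along each dimension;
    # with zero aggregated dimensions, product() yields the single index ().
    return {
        index: tuple(ranges_per_dim[d][i] for d, i in enumerate(index))
        for index in product(*(range(len(r)) for r in ranges_per_dim))
    }


# Illustrative `shape` rows for a (1, 3, 2) fragment array; 91 is assumed.
shape_rows = [
    [17, MISSING, MISSING],
    [45, 91, 45],
    [180, 180, MISSING],
]
slices = fragment_slices(shape_rows, (17, 181, 360))
print(len(slices))        # 6
print(slices[(0, 1, 1)])  # ((0, 17), (45, 136), (180, 360))
```

The sketch returns plain half-open ranges rather than array slices, which keeps it independent of any array library. An application could use each range either to insert a fragment's canonical-form data or to broadcast a `value` fragment's single value over that region.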
From 66080115ffde369b7c5aba28d9b052df9ccac032 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 2 May 2024 18:50:58 +0100 Subject: [PATCH 36/59] tidy --- ch02.adoc | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index a89b74d5..11c6db38 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -373,7 +373,7 @@ The fragment array must be defined by an aggregation variable's **`aggregated_da This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ which provides values for that feature. The features and their values must unambiguously define the fragment array. The order of elements in the **`aggregated_data`** attribute is not significant. -The features must comprise either all three of the `shape`, `location`, and `address` keyords, or else both of the `shape` and `value` keywords. No other combination of these keywords is allowed, nor the use of any other feature keywords. The features are defined as follows: +The features must comprise either all three of the `shape`, `location`, and `address` keywords, or else both of the `shape` and `value` keywords. No other combinations of keywords are allowed. These features are defined as follows: // Turn off section numbering for a bit :numbered!: @@ -429,7 +429,6 @@ See <> and <>. When the data values within a fragment are all the same, for every fragment, the `value` fragment array variable allows each fragment to be represented explicitly by its unique data value, rather than by a fragment dataset. The `value` fragment array variable dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. 
The `value` fragment array variable may have any data type, and each value is the unique value of a fragment's data. -This feature could be used, for instance, to store an attribute from each fragment dataset, making them available without having to inspect the fragment datasets themselves. See <>, which uses an aggregation ancillary variable to make fragment dataset attributes available to an aggregation data variable. // Turn section numbering back on @@ -439,7 +438,7 @@ See <>, which uses an aggregation ancillary variable to make fragme [[fragment-interpretation, Section 2.8.2 "Fragment Interpretation"]] ==== Fragment Interpretation -The data of a fragment, stored in a fragment dataset of any format, must be converted to its __canonical form__ prior to being inserted into the aggregated data. The canonical form of a fragment's data is such that: +The data of a fragment must be converted to its __canonical form__ prior to being inserted into the aggregated data. The canonical form of a fragment's data is such that: * The fragment's data, in its entirety, provide the values for a unique and contiguous part of the aggregated data. @@ -451,11 +450,10 @@ The data of a fragment, stored in a fragment dataset of any format, must be conv * The fragment's data have the same data type as the aggregation variable. -The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. -The application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. +The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. 
For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. Converting a fragment's data to its canonical form may require a combination of the following operations: -* When the fragment's data has been provided as its unique value (via the `value` feature keyword of the **`aggregated_data`** attribute), broadcasting that value across the fragment's canonical shape. +* When the fragment's data has been provided as its unique value (as opposed to being defined from a fragment dataset), broadcasting that value across the fragment's canonical shape. * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). From 2dbf226863c444d575109e2377db39691fbf54b6 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 2 May 2024 19:40:09 +0100 Subject: [PATCH 37/59] tidy --- ch02.adoc | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index 11c6db38..088b0b5e 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -277,8 +277,8 @@ A fragment is an array of data with sufficient metadata for it to be correctly i The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple other datasets, whilst taking up very little extra space on disk (since the aggregation file contains no copies of the data in the fragments). -The fragment datasets may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as CF-compliant view of non-CF datasets. 
-Uses for storing aggregations include, but are not limited to: data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; archive curation, as the aggregation can act as a metadata-rich archive index; and model simulations, for combining output data that have been written to disk as multiple datasets decomposed in time and space. +Fragment datasets may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as CF-compliant view of non-CF datasets. +Use cases for storing aggregations include, but are not limited to: data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; archive curation, as the aggregation can act as a metadata-rich archive index; and model simulations, for combining output data that have been written to disk as multiple datasets decomposed in time and space. An aggregation variable must be a scalar (i.e. it has no dimensions). It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. @@ -362,7 +362,7 @@ Fragment data shape: `(17, 45, 180)` + `(-45, -90]` degrees north + `[180, 360)` degrees east |=============== -The six fragments are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. +The fragments, stored in six fragment datasets, are arranged in a three-dimensional fragment array with shape `(1, 3, 2)`. Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane, which has 1 degree resolution. The fragments combine to create three-dimensional aggregated data that have global Z-Y-X coverage, with shape `(17, 181, 360)`. 
The Z aggregated dimension is spanned by one fragment, the Y aggregated dimension is spanned by three fragments, and the X aggregated dimension is spanned by two fragments. @@ -389,7 +389,7 @@ See <>, which shows the `shape` fragment array variable for the fra ===== location -The string-valued `location` fragment array variable defines the locations of the fragment datasets. +The string-valued `location` fragment array variable defines the locations of fragment datasets. In general its dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. A fragment dataset is located with a Uniform Resource Identifier (URI) <> that must be either an __absolute URI__ (a URI that begins with a scheme component followed by a `:` character, such as `\file://data/file.nc`, `\https://remote.host/data/file.nc`, `s3://remote.host/data/file.nc`, or `locally_meaningful_protocol://UID`), or else a __relative-path URI reference__ (a URI that is not an absolute URI and which does not begin with a `/` or `#` character, such as `file.nc`, `../file.nc`, or `data/file.nc`). A relative-path URI reference is taken as being relative to the location of the aggregation file. @@ -453,7 +453,7 @@ The data of a fragment must be converted to its __canonical form__ prior to bein The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. Converting a fragment's data to its canonical form may require a combination of the following operations: -* When the fragment's data has been provided as its unique value (as opposed to being defined from a fragment dataset), broadcasting that value across the fragment's canonical shape. 
+* When, and only when, the fragment's data has been provided as its unique value (as opposed to being defined from a fragment dataset), broadcasting that value across the fragment's canonical shape. * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). From 3f746d28117d4cbc0787054ac6cd1912f3d1d625 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Fri, 3 May 2024 09:25:30 +0100 Subject: [PATCH 38/59] clarity --- appl.adoc | 71 +++++++++++++++++++++++++++++++++++++++++--------- ch02.adoc | 13 ++++----- toc-extra.adoc | 1 + 3 files changed, 67 insertions(+), 18 deletions(-) diff --git a/appl.adoc b/appl.adoc index b9f81efe..bac54895 100644 --- a/appl.adoc +++ b/appl.adoc @@ -26,7 +26,7 @@ dimensions: i = 2 ; // Equal to the size of the largest fragment array dimension variables: - // Aggregated data variable + // Aggregation data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -80,7 +80,6 @@ The data for the `level`, `latitude` and `longitude` variables are omitted for ==== ---- dimensions: - // Aggregated dimensions time = 12 ; level = 1 ; latitude = 73 ; @@ -97,7 +96,7 @@ dimensions: versions = 2 ; // The maximum number of versions for a fragment variables: - // Aggregated data variable + // Aggregation data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -171,7 +170,7 @@ dimensions: versions = 2 ; // The maximum number of versions for a fragment variables: - // Aggregated data variable + // Aggregation data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -180,7 +179,7 @@ variables: temperature:aggregated_data = "location: fragment_location address: fragment_address shape: fragment_shape" ; - // Aggregated coordinate variable + // Aggregation coordinate variable double time ; // 
This is an aggregation coordinate variable time:standard_name = "time" ; time:units = "days since 2001-01-01" ; @@ -252,7 +251,7 @@ dimensions: i = 3 ; // Equal to the size of the largest fragment array dimension variables: - // Aggregated data variable + // Aggregation data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -317,7 +316,7 @@ dimensions: i = 12 ; // Equal to the size of the largest fragment array dimension variables: - // Aggregated data variable + // Aggregation data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -385,7 +384,7 @@ dimensions: i = 3 ; // Equal to the size of the largest fragment array dimension variables: - // Aggregated data variable + // Aggregation data variable float tas(obs) ; tas:standard_name = "air_temperature" ; tas:units = "K" ; @@ -399,7 +398,7 @@ variables: row_size:long_name = "number of observations per station" ; row_size:sample_dimension = "obs" ; - // Aggregated auxiliary coordinate variables + // Aggregation auxiliary coordinate variables float time ; time:standard_name = "time" ; time:units = "days since 1970-01-01" ; @@ -477,7 +476,7 @@ dimensions: i = 2 ; // Equal to the size of the largest fragment array dimension variables: - // Aggregated data variable + // Aggregation data variable double temperature ; temperature:standard_name = "air_temperature" ; temperature:units = "K" ; @@ -487,7 +486,7 @@ variables: temperature:aggregated_data = "location: fragment_location address: fragment_address shape: fragment_shape" ; - // Aggregated ancillary variable + // Aggregation ancillary variable string uid() ; uid:long_name = "Fragment dataset unique identifiers" ; uid:aggregated_dimensions = "time level latitude longitude" ; @@ -526,7 +525,55 @@ data: 144, _ ; fragment_value = "04821b9-7eb5-4046-937b-0bf0588", "056d1ee0-a183-43b3-ae67-1ec632a" ; ---- -This example is similar to <>, but now there is the 
aggregated ancillary variable `uid` which defines its fragments as constant values stored int he `fragment_value` variable,that are intended to be broadcast across its aggregated data. +This example is similar to <>, but now there is the aggregation ancillary variable `uid` which defines its fragments as constant values stored int he `fragment_value` variable,that are intended to be broadcast across its aggregated data. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. +==== + +[[example-L.8]] +[caption="Example L.8 "] +.Aggregation variable example 8 +==== +---- +dimensions: + +variables: + // Aggregation data variable + double temperature ; + temperature:standard_name = "air_temperature" ; + temperature:units = "K" ; + temperature:cell_methods = "time: mean" ; + temperature:aggregated_dimensions = "" ; + temperature:aggregated_data = "location: fragment_location + address: fragment_address + shape: fragment_shape" ; + // Scalar coordinate variables + double time ; + time:standard_name = "time" ; + time:units = "days since 2001-01-01" ; + double height ; + level:standard_name = "height" ; + level:units = "m" ; + double latitude ; + latitude:standard_name = "latitude" ; + latitude:units = "degrees_north" ; + double longitude ; + longitude:standard_name = "longitude" ; + longitude:units = "degrees_east" ; + // Fragment array variables + string fragment_location ; + string fragment_address ; + int fragment_shape ; + +data: + temperature = _ ; + time = 0 ; + height = 1.5 ; + latitude = 18.53 ; + longitude = 73.81 ; + fragment_location = "file.nc" ; + fragment_address = "tas" ; + fragment_shape = 1 ; +---- +An example of an aggregation variable with scalar aggregated data. 
==== \ No newline at end of file diff --git a/ch02.adoc b/ch02.adoc index 088b0b5e..b6a1aaf3 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -304,7 +304,7 @@ The details of how to encode and decode aggregation variables are given in this The aggregated dimensions are stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. The value of the **`aggregated_dimensions`** attribute is a blank-separated list of the aggregated dimension names given in the order which matches the dimensions of the aggregated data. -If the aggregated data is scalar then the **`aggregated_dimensions`** attribute must be an empty string. +If the aggregated data is scalar then there are no aggregated dimensions and the **`aggregated_dimensions`** attribute must be an empty string. The aggregated dimensions must exist as dimensions in the aggregation file. The fragments which provide the aggregated data are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. @@ -384,8 +384,9 @@ The integer-valued `shape` fragment array variable defines the shape of the data In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. 
-When the aggregated data is scalar there are no aggregated dimensions, and the `shape` fragment array variable must be a scalar and contain the value `1`. See <>, which shows the `shape` fragment array variable for the fragment array described by <>. +If the aggregated data is scalar then the `shape` fragment array variable must be a scalar and contain the value `1`. +See <>. ===== location @@ -428,7 +429,7 @@ See <> and <>. When the data values within a fragment are all the same, for every fragment, the `value` fragment array variable allows each fragment to be represented explicitly by its unique data value, rather than by a fragment dataset. The `value` fragment array variable dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. -The `value` fragment array variable may have any data type, and each value is the unique value of a fragment's data. +The `value` fragment array variable may have any data type, and contains the unique value of each fragment's data. See <>, which uses an aggregation ancillary variable to make fragment dataset attributes available to an aggregation data variable. // Turn section numbering back on @@ -451,16 +452,16 @@ The data of a fragment must be converted to its __canonical form__ prior to bein * The fragment's data have the same data type as the aggregation variable. The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. 
-Converting a fragment's data to its canonical form may require a combination of the following operations: +A combination of the following operations may be required to convert the fragment's data to its canonical form: -* When, and only when, the fragment's data has been provided as its unique value (as opposed to being defined from a fragment dataset), broadcasting that value across the fragment's canonical shape. +* If, and only if, the fragment's data has been explicitly defined as its unique value (as opposed to being defined from a fragment dataset), broadcasting that value across the fragment's canonical shape. * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). * Transforming the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). * Transforming the fragment's data to have the same data type as the aggregated data. -Note that some transformations may result in a loss of information (as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. +Note that some transformations may result in a loss of information (such as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. * Unpacking the fragment's data. Note that if the aggregation variable indicates that the aggregated data are packed (as determined by the attributes defined in <>), then the unpacked fragment data values will represent packed values in the aggregated data. It is therefore recommended that the aggregated data is not packed, because of the potential for mistakes and confusion. 
\ No newline at end of file diff --git a/toc-extra.adoc b/toc-extra.adoc index 4ecb3302..90f62195 100644 --- a/toc-extra.adoc +++ b/toc-extra.adoc @@ -128,3 +128,4 @@ L.4 <> L.5 <> L.6 <> L.7 <> +L.8 <> From 32391fff06095c89f3f971b67c06843a10987450 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Fri, 3 May 2024 11:44:53 +0100 Subject: [PATCH 39/59] dev --- appl.adoc | 2 +- ch02.adoc | 5 +++-- 2 files changed, 4 insertions(+), 3 deletions(-) diff --git a/appl.adoc b/appl.adoc index bac54895..3594e79c 100644 --- a/appl.adoc +++ b/appl.adoc @@ -487,7 +487,7 @@ variables: address: fragment_address shape: fragment_shape" ; // Aggregation ancillary variable - string uid() ; + string uid ; uid:long_name = "Fragment dataset unique identifiers" ; uid:aggregated_dimensions = "time level latitude longitude" ; uid:aggregated_data = "value: fragment_value diff --git a/ch02.adoc b/ch02.adoc index b6a1aaf3..3b9a6388 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -427,9 +427,10 @@ See <> and <>. ===== value -When the data values within a fragment are all the same, for every fragment, the `value` fragment array variable allows each fragment to be represented explicitly by its unique data value, rather than by a fragment dataset. +When the data values within a fragment are all the same, for each fragment, the `value` fragment array variable allows the fragments to be represented explicitly by those unique data values, rather than by reference to fragment datasets. The `value` fragment array variable dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. The `value` fragment array variable may have any data type, and contains the unique value of each fragment's data. +If a fragment contains only missing data then this is represented with a missing value in the `value` fragment array variable. 
See <>, which uses an aggregation ancillary variable to make fragment dataset attributes available to an aggregation data variable. // Turn section numbering back on @@ -454,7 +455,7 @@ The data of a fragment must be converted to its __canonical form__ prior to bein The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. A combination of the following operations may be required to convert the fragment's data to its canonical form: -* If, and only if, the fragment's data has been explicitly defined as its unique value (as opposed to being defined from a fragment dataset), broadcasting that value across the fragment's canonical shape. +* If, and only if, the fragment's data has been explicitly defined by its unique value (as opposed to being defined from a fragment dataset), broadcasting that value across the fragment's canonical shape. * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). From 0a4346bc8774c8e1ef37472225b394a8bd5011bd Mon Sep 17 00:00:00 2001 From: David Hassell Date: Fri, 3 May 2024 17:40:49 +0100 Subject: [PATCH 40/59] dev --- ch02.adoc | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index 3b9a6388..6027c920 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -284,16 +284,16 @@ An aggregation variable must be a scalar (i.e. it has no dimensions). It acts as a container for all of the usual attributes that describe the data, with the addition of two special attributes: one that defines the _aggregated dimensions_, i.e. 
the dimensions of the aggregated data; and one that provides the instructions on how the aggregated data is to be created. The data type of the aggregation variable must be the data type of the aggregated data, but the value of the aggregation variable's single element is immaterial. -Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, grid mapping variable, etc.), but it is recommended that container variables whose data are immaterial (such as grid mapping variables) are not encoded as aggregation variables. +Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, etc.), but it is recommended that container variables whose data are immaterial (such as grid mapping variables) are not encoded as aggregation variables. In general, any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role; and any reference to the dimensions or data of a variable applies to the aggregated dimensions or aggregated data, respectively, of an aggregation variable. For instance: -* the dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimensions of the aggregation data variable, -* an aggregation coordinate variable (which is a scalar) must have the same name as its aggregated dimension. +* The dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimensions of the aggregation data variable. +* An aggregation coordinate variable (which is a scalar) must have the same name as its aggregated dimension. The only exception is the definition of missing data in the aggregated data. 
-Each fragment defines its missing data based on its own metadata, and missing data in the aggregated data are then derived solely from where there are missing data in the fragments, rather than from any of the aggregation variable's attributes for indicating missing values: **`_FillValue`**, **`missing_value`**, **`valid_min`**, **`valid_max`**, and **`valid_range`** (see <>). +Each fragment defines its missing data based on its own metadata, and missing data in the aggregated data are then derived solely from where there are missing data in the fragments, rather than from where the aggregated data has missing values as defined by the aggregation variable's attributes **`_FillValue`**, **`missing_value`**, **`valid_min`**, **`valid_max`**, and **`valid_range`** (see <>). Since these attributes are ignored on aggregation variables, it is recommended that they are not provided. The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. From b5a863fc2c8563e70e879b1df5fe62d4131d7976 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 7 May 2024 10:48:35 +0100 Subject: [PATCH 41/59] missing data --- appa.adoc | 4 ++-- ch02.adoc | 10 ++++------ conformance.adoc | 3 +-- 3 files changed, 7 insertions(+), 10 deletions(-) diff --git a/appa.adoc b/appa.adoc index 900bcbaf..f468d97d 100644 --- a/appa.adoc +++ b/appa.adoc @@ -50,13 +50,13 @@ In cases where there is a strong constraint on dataset size, it is allowed to pa | **`aggregated_data`** | S | A -| <> +| <> | Records the aggregation instructions that define how to create the aggregated data of an aggregation variable. | **`aggregated_dimensions`** | S | A -| <> +| <> | Identifies the dimensions of the aggregated data of an aggregation variable. 
| **`ancillary_variables`** diff --git a/ch02.adoc b/ch02.adoc index 6027c920..ee5d19d9 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -286,15 +286,13 @@ The data type of the aggregation variable must be the data type of the aggregate Aggregation variables may be used as any kind of variable (data variable, coordinate variable, cell measures variable, etc.), but it is recommended that container variables whose data are immaterial (such as grid mapping variables) are not encoded as aggregation variables. -In general, any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role; and any reference to the dimensions or data of a variable applies to the aggregated dimensions or aggregated data, respectively, of an aggregation variable. +Any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role; and any reference to the dimensions or data of a variable applies to the aggregated dimensions or aggregated data, respectively, of an aggregation variable. For instance: * The dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimensions of the aggregation data variable. * An aggregation coordinate variable (which is a scalar) must have the same name as its aggregated dimension. -The only exception is the definition of missing data in the aggregated data. -Each fragment defines its missing data based on its own metadata, and missing data in the aggregated data are then derived solely from where there are missing data in the fragments, rather than from where the aggregated data has missing values as defined by the aggregation variable's attributes **`_FillValue`**, **`missing_value`**, **`valid_min`**, **`valid_max`**, and **`valid_range`** (see <>). -Since these attributes are ignored on aggregation variables, it is recommended that they are not provided. 
+Note that the missing values indicated by the aggregation variable apply to the aggregated data once it has been created, and not to the individual fragments, which may define their own missing data. It is up to the creator of the aggregation variable to ensure that none of the aggregation variable's missing values coincide with non-missing values in the fragments. The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. @@ -399,7 +397,7 @@ Not all fragment dataset locations need be of the same URI type. See <> and <>. The `location` fragment array variable may have an extra trailing dimension that allows multiple versions of a fragment to be specified. -This could be useful when it is known that multiple locations are possible for a given fragment, but it is not known in advance which of them might exist at any given time. +This could be useful when it is known that various locations are possible for a given fragment, but it is not known in advance which of them might exist at any given time. Each version must contain equivalent information, so any version that exists may be selected for use in the aggregated data. For instance, when remotely stored and locally cached versions of the same fragment have been defined, an application program could choose to only retrieve the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need to have the same number of versions. @@ -430,7 +428,7 @@ See <> and <>. When the data values within a fragment are all the same, for each fragment, the `value` fragment array variable allows the fragments to be represented explicitly by those unique data values, rather than by reference to fragment datasets. The `value` fragment array variable dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. 
The `value` fragment array variable may have any data type, and contains the unique value of each fragment's data. -If a fragment contains only missing data then this is represented with a missing value in the `value` fragment array variable. +A fragment that contains wholly missing data is specified with any missing value indicated by the aggregation variable. See <>, which uses an aggregation ancillary variable to make fragment dataset attributes available to an aggregation data variable. // Turn section numbering back on diff --git a/conformance.adoc b/conformance.adoc index d3e1a655..d80efa48 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -163,10 +163,9 @@ Each aggregated dimension must name a dimension in the file. * The following kinds of variable should not be aggregation variables: grid mapping variables, domain variables, mesh topology variables, geometry container variables, and interpolation variables. -* An aggregation variable should not have the any of the attributes **`_FillValue`**, **`missing_value`**, **`valid_min`**, **`valid_max`**, and **`valid_range`**. - * An aggregation variable should not have either of the attributes **`scale_factor`** and **`add_offset`**. 
+ [[section-6]] [[description-of-the-data]] === 3 Description of the Data From 2505b1afd9f323aed4a64120f37f67a6ccfcc1c2 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Wed, 8 May 2024 09:18:51 +0100 Subject: [PATCH 42/59] tidy --- appl.adoc | 11 +++++++---- ch02.adoc | 16 ++++++++-------- conformance.adoc | 4 ++-- 3 files changed, 17 insertions(+), 14 deletions(-) diff --git a/appl.adoc b/appl.adoc index 3594e79c..9ebc6a95 100644 --- a/appl.adoc +++ b/appl.adoc @@ -472,8 +472,9 @@ dimensions: f_latitude = 1 ; f_longitude = 1 ; // Fragment shape dimensions - j = 4 ; // Equal to the number of aggregated dimensions + j = 4 ; // Equal to the number of temperature aggregated dimensions i = 2 ; // Equal to the size of the largest fragment array dimension + j_uid = 1 ; // Equal to the number of uid aggregated dimensions variables: // Aggregation data variable @@ -489,9 +490,9 @@ variables: // Aggregation ancillary variable string uid ; uid:long_name = "Fragment dataset unique identifiers" ; - uid:aggregated_dimensions = "time level latitude longitude" ; + uid:aggregated_dimensions = "time" ; uid:aggregated_data = "value: fragment_value - shape: fragment_shape"; + shape: fragment_shape_uid"; // Coordinate variables double time(time) ; time:standard_name = "time" ; @@ -509,7 +510,8 @@ variables: string fragment_location(f_time, f_level, f_latitude, f_longitude) ; string fragment_address ; int fragment_shape(j, i) ; - string fragment_value(f_time, f_level, f_latitude, f_longitude) ; + string fragment_value(f_time) ; + int fragment_shape_uid(j_uid, i) ; data: temperature = _ ; @@ -524,6 +526,7 @@ data: 73, _, 144, _ ; fragment_value = "04821b9-7eb5-4046-937b-0bf0588", "056d1ee0-a183-43b3-ae67-1ec632a" ; + fragment_shape_uid = 3, 9 ; ---- This example is similar to <>, but now there is the aggregation ancillary variable `uid` which defines its fragments as constant values stored int he `fragment_value` variable,that are intended to be broadcast across its aggregated data. 
diff --git a/ch02.adoc b/ch02.adoc index ee5d19d9..2ee22519 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -277,7 +277,7 @@ A fragment is an array of data with sufficient metadata for it to be correctly i The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple other datasets, whilst taking up very little extra space on disk (since the aggregation file contains no copies of the data in the fragments). -Fragment datasets may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as CF-compliant view of non-CF datasets. +Fragment datasets may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as a CF-compliant view of non-CF datasets. Use cases for storing aggregations include, but are not limited to: data analysis, as it avoids the computational expense of deriving the aggregation at the time of analysis; archive curation, as the aggregation can act as a metadata-rich archive index; and model simulations, for combining output data that have been written to disk as multiple datasets decomposed in time and space. An aggregation variable must be a scalar (i.e. it has no dimensions). @@ -297,7 +297,7 @@ Note that the missing values indicated by the aggregation variable apply to the The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. 
-[[aggregated-dimensions-data, Section 2.8.1, "Aggregated Dimensions and Data"]] +[[aggregated-dimensions-and-data, Section 2.8.1, "Aggregated Dimensions and Data"]] ==== Aggregated Dimensions and Data The aggregated dimensions are stored with the aggregation variable's **`aggregated_dimensions`** attribute, and it is the presence of this attribute that identifies the variable as an aggregation variable. @@ -368,17 +368,17 @@ See <> for a CDL representation of this fragment array. ==== The fragment array must be defined by an aggregation variable's **`aggregated_data`** attribute. -This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ which provides values for that feature. The features and their values must unambiguously define the fragment array. +This attribute takes a string value comprising blank-separated elements of the form "__feature: variable__", where __feature__ is a case-sensitive keyword that identifies a feature of the fragment array, and __variable__ is a __fragment array variable__ which provides values for that feature. The features and their values unambiguously define the fragment array. The order of elements in the **`aggregated_data`** attribute is not significant. -The features must comprise either all three of the `shape`, `location`, and `address` keywords, or else both of the `shape` and `value` keywords. No other combinations of keywords are allowed. These features are defined as follows: +The features must comprise either all three of the `shape`, `location`, and `address` keywords; or else both of the `shape` and `value` keywords. No other combinations of keywords are allowed. 
These features are defined as follows: // Turn off section numbering for a bit :numbered!: ===== shape -The integer-valued `shape` fragment array variable defines the shape of the data of each fragment in its canonical form (see <>). +The integer-valued `shape` fragment array variable defines the shape of each fragment's data in its canonical form (see <>). In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. @@ -396,7 +396,7 @@ If the aggregation file is moved to another location, then a fragment dataset id Not all fragment dataset locations need be of the same URI type. See <> and <>. -The `location` fragment array variable may have an extra trailing dimension that allows multiple versions of a fragment to be specified. +The `location` fragment array variable may have an extra trailing dimension that allows multiple versions of fragments to be specified. This could be useful when it is known that various locations are possible for a given fragment, but it is not known in advance which of them might exist at any given time. Each version must contain equivalent information, so any version that exists may be selected for use in the aggregated data. 
For instance, when remotely stored and locally cached versions of the same fragment have been defined, an application program could choose to only retrieve the remote version if the local version does not exist. @@ -410,7 +410,7 @@ A `location` fragment array variable value may include any subset of zero or mor After replacements have been made, the fragment dataset location must be an absolute URI or a relative-path URI reference. The substitution keyword must have the form `${\*}`, where `*` represents any number of any characters. For instance, the fragment dataset location `\https://remote.host/data/file.nc` could be stored as `$\{path}file.nc`, in conjunction with `substitutions="$\{path}: \https://remote.host/data/"`. -The order of elements in the **`substitutions`** attribute is not significant, an the substitutions for a given fragment must be such that applying them in any order will result in the same fragment dataset location. +The order of elements in the **`substitutions`** attribute is not significant, and the substitutions for a given fragment must be such that applying them in any order will result in the same fragment dataset location. The use of substitutions can save space in the aggregation file; and in the event that the fragment locations need to be updated after the aggregation file has been created, it may be possible to achieve this by modifying the **`substitutions`** attribute rather than by changing the actual `location` fragment array variable values. See <>. @@ -453,7 +453,7 @@ The data of a fragment must be converted to its __canonical form__ prior to bein The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. 
A combination of the following operations may be required to convert the fragment's data to its canonical form: -* If, and only if, the fragment's data has been explicitly defined by its unique value (as opposed to being defined from a fragment dataset), broadcasting that value across the fragment's canonical shape. +* If, and only if, the fragment's data has been explicitly defined by its unique value (as opposed to being defined by a fragment dataset), broadcasting that value across the fragment's canonical shape. * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). diff --git a/conformance.adoc b/conformance.adoc index d80efa48..96be6b5b 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -136,7 +136,7 @@ Each aggregated dimension must name a dimension in the file. * An aggregation variable must have an **`aggregated_data`** attribute whose string value comprises blank-separated elements of the form __feature: variable__. Each __variable__ must be the name of a variable in the file. - The __feature__ keywords must comprise either all three of the `shape`, `location`, and `address` keyords, or else both of the `shape` and `value` keywords. + The __feature__ keywords must comprise either all three of the `shape`, `location`, and `address` keyords; or else both of the `shape` and `value` keywords. - The `location` variable must have a string data type. @@ -157,7 +157,7 @@ Each aggregated dimension must name a dimension in the file. - If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `location` or `value` variable dimensions, excluding the extra trailing dimension if the `location` variable has one. 
- - The rows of a two-dimensional `shape` variable correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row must equal the size of its corresponding aggregated dimension. + - The rows of a two-dimensional `shape` variable correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row's non-missing values must equal the size of its corresponding aggregated dimension. *Recommendations:* From 82bb21ce9016c65992ec0c737bafde2f5181658f Mon Sep 17 00:00:00 2001 From: David Hassell Date: Wed, 8 May 2024 15:03:55 +0100 Subject: [PATCH 43/59] dev --- appl.adoc | 1 + ch02.adoc | 8 ++++---- default-theme-CF-version.yml | 3 ++- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/appl.adoc b/appl.adoc index 9ebc6a95..1e95604b 100644 --- a/appl.adoc +++ b/appl.adoc @@ -490,6 +490,7 @@ variables: // Aggregation ancillary variable string uid ; uid:long_name = "Fragment dataset unique identifiers" ; + uid:missing_value = "N/A" ; uid:aggregated_dimensions = "time" ; uid:aggregated_data = "value: fragment_value shape: fragment_shape_uid"; diff --git a/ch02.adoc b/ch02.adoc index 2ee22519..c4c2042d 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -425,10 +425,10 @@ See <> and <>. ===== value -When the data values within a fragment are all the same, for each fragment, the `value` fragment array variable allows the fragments to be represented explicitly by those unique data values, rather than by reference to fragment datasets. +When the data values within a fragment are all the same, for each fragment, the `value` fragment array variable allows each fragment to be represented explicitly by its unique data value, rather than by reference to a fragment dataset. 
The `value` fragment array variable dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. -The `value` fragment array variable may have any data type, and contains the unique value of each fragment's data. -A fragment that contains wholly missing data is specified with any missing value indicated by the aggregation variable. +The `value` fragment array variable may have any data type, and contains each fragment's unique value. +A fragment that contains wholly missing data is specified by any missing value indicated by the aggregation variable. See <>, which uses an aggregation ancillary variable to make fragment dataset attributes available to an aggregation data variable. // Turn section numbering back on @@ -453,7 +453,7 @@ The data of a fragment must be converted to its __canonical form__ prior to bein The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. A combination of the following operations may be required to convert the fragment's data to its canonical form: -* If, and only if, the fragment's data has been explicitly defined by its unique value (as opposed to being defined by a fragment dataset), broadcasting that value across the fragment's canonical shape. +* If, and only if, the fragment's data has been explicitly defined by its unique value (as opposed to being defined by a fragment dataset), broadcasting that value across the shape of the canonical form of the fragment's data. * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). 
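The `value` feature and the first canonical-form operation described in this hunk, which broadcasts a fragment's unique value across its canonical shape, can be illustrated with a short pure-Python sketch. This sketch is not part of the CF conventions or of any library. The function name `aggregate_from_values`, the use of `None` for the padding in the `shape` fragment array variable, and the dictionary keyed by fragment-array index are all assumptions made for illustration. The sketch handles only fragments defined by `value`. It ignores `location`/`address` fragments and the other canonical-form steps: units, data type, missing values, and unpacking.

```python
from itertools import accumulate, product


def aggregate_from_values(shape_rows, values):
    """Hypothetical sketch: build aggregated data from `value`-only fragments.

    shape_rows: one row per aggregated dimension, giving the fragment sizes
        along that dimension, padded with None (an assumed stand-in for the
        missing values that pad the `shape` fragment array variable).
    values: dict mapping each fragment-array index tuple to that fragment's
        unique value (an assumed stand-in for the `value` variable).
    Returns (aggregated_shape, data), with data flattened in row-major order.
    """
    # Drop the padding: only non-missing entries are fragment sizes.
    sizes = [[s for s in row if s is not None] for row in shape_rows]
    # Each aggregated dimension's size is the sum of its row's sizes.
    agg_shape = tuple(sum(row) for row in sizes)
    # Start offset of each fragment along each aggregated dimension.
    starts = [[0] + list(accumulate(row))[:-1] for row in sizes]

    total = 1
    for n in agg_shape:
        total *= n
    data = [None] * total

    # Visit every position in the fragment array.
    for frag in product(*(range(len(row)) for row in sizes)):
        value = values[frag]
        # Broadcast the unique value across the fragment's canonical shape.
        spans = [range(starts[d][i], starts[d][i] + sizes[d][i])
                 for d, i in enumerate(frag)]
        for pos in product(*spans):
            flat = 0
            for p, n in zip(pos, agg_shape):
                flat = flat * n + p  # row-major (C order) offset
            data[flat] = value
    return agg_shape, data
```

For example, `aggregate_from_values([[2, 1], [3, None]], {(0, 0): 10.0, (1, 0): 20.0})` describes a 2 x 1 fragment array whose second row of `shape` is padded because it has fewer fragments than there are columns. It returns the aggregated shape `(3, 3)`, with the first two rows filled by `10.0` and the last row filled by `20.0`.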
diff --git a/default-theme-CF-version.yml b/default-theme-CF-version.yml index 6983f21b..c5477e5d 100644 --- a/default-theme-CF-version.yml +++ b/default-theme-CF-version.yml @@ -25,7 +25,8 @@ base: text_align: justify font_color: 333333 font_family: Noto Serif - font_size: 10.5 + #DCH font_size: 10.5 + font_size: 10 # line_height_length is really just a vertical spacing variable; it's not actually the height of a line line_height_length: 12 # The Noto font family has a built-in line height of 1.36 From 0a04ba971abdcfcc3d6a8eb46bdf735b974ed7bc Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 14 May 2024 10:09:00 +0100 Subject: [PATCH 44/59] dev --- ch02.adoc | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index c4c2042d..07642706 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -397,8 +397,8 @@ Not all fragment dataset locations need be of the same URI type. See <> and <>. The `location` fragment array variable may have an extra trailing dimension that allows multiple versions of fragments to be specified. -This could be useful when it is known that various locations are possible for a given fragment, but it is not known in advance which of them might exist at any given time. -Each version must contain equivalent information, so any version that exists may be selected for use in the aggregated data. +Each version must contain equivalent information, so that any version that exists may be selected for use in the aggregated data. +This could be useful when it is known that a fragment could be stored in a number of locations, but it is not known which of them might exist at any given time. For instance, when remotely stored and locally cached versions of the same fragment have been defined, an application program could choose to only retrieve the remote version if the local version does not exist. Every fragment must have at least one version, but not all fragments need to have the same number of versions. 
Where fragments have fewer versions than others, the trailing dimension must be padded with missing values. @@ -446,7 +446,7 @@ The data of a fragment must be converted to its __canonical form__ prior to bein * The fragment's data have the same units as the aggregation variable. -* The fragment's data are not packed (i.e. not stored using a smaller data type than its original data). +* The fragment's data are not packed (i.e. not stored using a smaller data type than the original data). * The fragment's data have the same data type as the aggregation variable. From 9d90921e28542503b78a4d21a9ba3e0d18dd7f4e Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 14 May 2024 10:20:52 +0100 Subject: [PATCH 45/59] remove redundant comment Co-authored-by: Sadie L. Bartholomew --- appl.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/appl.adoc b/appl.adoc index 1e95604b..b4f98e0c 100644 --- a/appl.adoc +++ b/appl.adoc @@ -180,7 +180,7 @@ variables: address: fragment_address shape: fragment_shape" ; // Aggregation coordinate variable - double time ; // This is an aggregation coordinate variable + double time ; time:standard_name = "time" ; time:units = "days since 2001-01-01" ; time:aggregated_dimensions = "time" ; From 5ddfca1f496c426befaec977f0862b901e4c0ddb Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 14 May 2024 10:22:01 +0100 Subject: [PATCH 46/59] revert PDF font size Co-authored-by: Sadie L.
Bartholomew --- default-theme-CF-version.yml | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/default-theme-CF-version.yml b/default-theme-CF-version.yml index c5477e5d..6983f21b 100644 --- a/default-theme-CF-version.yml +++ b/default-theme-CF-version.yml @@ -25,8 +25,7 @@ base: text_align: justify font_color: 333333 font_family: Noto Serif - #DCH font_size: 10.5 - font_size: 10 + font_size: 10.5 # line_height_length is really just a vertical spacing variable; it's not actually the height of a line line_height_length: 12 # The Noto font family has a built-in line height of 1.36 From 3bc9b101f532f5416e3a90fd3d79797558f5b535 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 14 May 2024 10:22:45 +0100 Subject: [PATCH 47/59] Clarity Co-authored-by: Sadie L. Bartholomew --- conformance.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/conformance.adoc b/conformance.adoc index 96be6b5b..e882b06a 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -155,7 +155,7 @@ Each aggregated dimension must name a dimension in the file. - If there are zero aggregated dimensions then the `shape` variable must a be scalar and contain the value `1`. - - If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly varying dimension being the size of the largest of the `location` or `value` variable dimensions, excluding the extra trailing dimension if the `location` variable has one. + - If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional, with the size of the slower-varying dimension (i.e. 
the number of rows) being the number of aggregated dimensions, and the size of the more rapidly-varying dimension being the size of the largest of the `location` or `value` variable dimensions, excluding the extra trailing dimension if the `location` variable has one. - The rows of a two-dimensional `shape` variable correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row's non-missing values must equal the size of its corresponding aggregated dimension. From 3eaef450134fa74702bdde8d76b578e10e937278 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 14 May 2024 10:30:21 +0100 Subject: [PATCH 48/59] Clarity Co-authored-by: Sadie L. Bartholomew --- ch02.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch02.adoc b/ch02.adoc index 07642706..bdd6ebfe 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -379,7 +379,7 @@ The features must comprise either all three of the `shape`, `location`, and `add ===== shape The integer-valued `shape` fragment array variable defines the shape of each fragment's data in its canonical form (see <>). -In general, the `shape` fragment array variable is two-dimensional, with the size of the slower varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. +In general, the `shape` fragment array variable is two-dimensional, with the size of the slower-varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly-varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. 
The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. See <>, which shows the `shape` fragment array variable for the fragment array described by <>. From 22606249f200c226b1a4a1d79da3a80b5f854d71 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 14 May 2024 10:30:47 +0100 Subject: [PATCH 49/59] Clarity Co-authored-by: Sadie L. Bartholomew --- history.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/history.adoc b/history.adoc index 88e7ab40..9719c5bd 100644 --- a/history.adoc +++ b/history.adoc @@ -7,7 +7,7 @@ === Working version (most recent first) -* {issues}508[Issue #508]: Introducing aggregation variables +* {issues}508[Issue #508]: Introduce aggregation variables * {issues}511[Issue #511]: Appendix B: New element in XML file header to record the "first published date" * {issues}509[Issue #509]: In exceptional cases allow a standard name to be aliased into two alternatives * {issues}501[Issue #501]: Clarify that data variables and variables containing coordinate data are highly recommended to have **`long_name`** or **`standard_name`** attributes, that **`cf_role`** is used only for discrete sampling geometries and UGRID mesh topologies, and that CF does not prohibit CF attributes from being used in ways that are not defined by CF but that in such cases their meaning is not defined by CF. From 8b0179bbba1598e8d732eabfa9c77343641ac0de Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 14 May 2024 10:31:20 +0100 Subject: [PATCH 50/59] Clarity Co-authored-by: Sadie L. 
Bartholomew --- appa.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/appa.adoc b/appa.adoc index f468d97d..1ae85869 100644 --- a/appa.adoc +++ b/appa.adoc @@ -16,7 +16,7 @@ For variable attributes, the possible values of "Use" are: * **M** for geometry container variables, * **Do** for domain variables, * **BI** and **BO** for boundary variables (see <> for the distinction between **BI** and **BO**), -* **A** for an aggregation variable (see <>), +* **A** for aggregation variables (see <>), * **-** for variables with some other purpose. CF does not prohibit any of these attributes from being attached to variables of different kinds from those listed as their "Use" in this table, but their meanings are not defined by CF if they are used in these other ways. From 2678ff23b2d8dc45a9b9e59bdf3a873e386c546b Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 14 May 2024 20:15:18 +0100 Subject: [PATCH 51/59] dev --- ch02.adoc | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index bdd6ebfe..8e17224a 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -290,9 +290,10 @@ Any text applying to a variable in the CF conventions applies in exactly the sam For instance: * The dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimensions of the aggregation data variable. -* An aggregation coordinate variable (which is a scalar) must have the same name as its aggregated dimension. -Note that the missing values indicated by the aggregation variable apply to the aggregated data once it has been created, and not to the individual fragments, which may define their own missing data. It is up to the creator of the aggregation variable to ensure that none of the aggregation variable's missing values coincide with non-missing values in the fragments. 
+* The name of an aggregation coordinate variable (which is a scalar) must +be the same as the name of its single aggregated dimension (identified by its **`aggregated_dimensions`** attribute), just as the name of a coordinate variable (which is one-dimensional) must be the same as the name of its single +dimension. The details of how to encode and decode aggregation variables are given in this section, with examples provided in <>. @@ -364,7 +365,8 @@ The fragments, stored in six fragment datasets, are arranged in a three-dimensio Each fragment spans the entirety of the Z dimension, but only a part of the Y-X plane, which has 1 degree resolution. The fragments combine to create three-dimensional aggregated data that have global Z-Y-X coverage, with shape `(17, 181, 360)`. The Z aggregated dimension is spanned by one fragment, the Y aggregated dimension is spanned by three fragments, and the X aggregated dimension is spanned by two fragments. -See <> for a CDL representation of this fragment array. +Note that, since this example is a schematic representation, the C or Fortran order of the dimensions is of no consequence. +See <> for a CDL representation of this fragment array. ==== The fragment array must be defined by an aggregation variable's **`aggregated_data`** attribute. @@ -379,7 +381,7 @@ The features must comprise either all three of the `shape`, `location`, and `add ===== shape The integer-valued `shape` fragment array variable defines the shape of each fragment's data in its canonical form (see <>). -In general, the `shape` fragment array variable is two-dimensional, with the size of the slower-varying dimension (i.e. the number of rows) being the number of fragment array dimensions, and the size of the more rapidly-varying dimension (i.e. the number of columns) being the size of the largest fragment array dimension. +In general, the `shape` fragment array variable is two-dimensional, with the size of the slower-varying dimension (i.e. 
the first dimension in CDL order, representing rows) being the number of fragment array dimensions, and the size of the more rapidly-varying dimension (i.e. the second dimension in CDL order, representing columns) being the size of the largest fragment array dimension. The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. See <>, which shows the `shape` fragment array variable for the fragment array described by <>. @@ -446,6 +448,8 @@ The data of a fragment must be converted to its __canonical form__ prior to bein * The fragment's data have the same units as the aggregation variable. +* The fragment's data have missing values as indicated by the aggregation variable. + * The fragment's data are not packed (i.e. not stored using a smaller data type than the original data). * The fragment's data have the same data type as the aggregation variable. @@ -457,10 +461,17 @@ A combination of the following operations may be required to convert the fragmen * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). -* Transforming the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). - * Transforming the fragment's data to have the same data type as the aggregated data. -Note that some transformations may result in a loss of information (such as could be the case when casting floating point numbers to integers), and an application program may choose to disallow these. 
+Note that some transformations may result in a loss of information (such as could be the case when casting floating point numbers to integers), and the application program may choose to not create the aggregation data. + +* Transforming missing values in the fragment's data to a value indicated as missing by the aggregation variable. +Note that it is up to the application program to choose a new missing value, from those provided by the aggregation variable, that does not coincide with any non-missing value from any fragment, and if that is not possible then the application program may choose to not create the aggregation data. + +* Transforming the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). * Unpacking the fragment's data. -Note that if the aggregation variable indicates that the aggregated data are packed (as determined by the attributes defined in <>), then the unpacked fragment data values will represent packed values in the aggregated data. It is therefore recommended that the aggregated data is not packed, because of the potential for mistakes and confusion. \ No newline at end of file +Note that if the aggregation variable indicates that the aggregated data are packed (as determined by the attributes defined in <>), then the unpacked fragment data values will represent packed values in the aggregated data. +It is recommended that the aggregated data is not packed, because of the potential for mistakes and confusion. +It is recommended that the aggregated data is not packed, because of the potential for mistakes and confusion that could arise from + +For instance if a variable has datatype `int` and double precision packing attributes `scale_factor=0.0392156862745098` and add_offset=`275.01960784313724`, then we would expect the upacked data to be in the range `270` to `280`. 
If that variable is an aggregated variable From 4b9d97325054490c087d2dd655116feef7973df7 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 14 May 2024 20:16:26 +0100 Subject: [PATCH 52/59] dev --- ch02.adoc | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index 8e17224a..89325f13 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -454,7 +454,7 @@ The data of a fragment must be converted to its __canonical form__ prior to bein * The fragment's data have the same data type as the aggregation variable. -The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data. For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. +The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data array in memory. For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. A combination of the following operations may be required to convert the fragment's data to its canonical form: * If, and only if, the fragment's data has been explicitly defined by its unique value (as opposed to being defined by a fragment dataset), broadcasting that value across the shape of the canonical form of the fragment's data. @@ -471,7 +471,4 @@ Note that it is up to the application program to choose a new missing value, fro * Unpacking the fragment's data. Note that if the aggregation variable indicates that the aggregated data are packed (as determined by the attributes defined in <>), then the unpacked fragment data values will represent packed values in the aggregated data. 
-It is recommended that the aggregated data is not packed, because of the potential for mistakes and confusion. -It is recommended that the aggregated data is not packed, because of the potential for mistakes and confusion that could arise from - -For instance if a variable has datatype `int` and double precision packing attributes `scale_factor=0.0392156862745098` and add_offset=`275.01960784313724`, then we would expect the upacked data to be in the range `270` to `280`. If that variable is an aggregated variable +It is recommended that the aggregated data is not packed, because of the potential for mistakes and confusion. \ No newline at end of file From a4666995e88eb1ad2b9a9e42f88b85ab222202a6 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Wed, 15 May 2024 11:03:42 +0100 Subject: [PATCH 53/59] dev --- ch01.adoc | 2 +- ch02.adoc | 14 ++++++++------ 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/ch01.adoc b/ch01.adoc index 8ad94b70..c9b12c2b 100644 --- a/ch01.adoc +++ b/ch01.adoc @@ -57,7 +57,7 @@ Therefore CF-netCDF does not use codes, but instead relies on controlled vocabul The terms in this document that refer to components of a netCDF file are defined in the NetCDF User's Guide (NUG) <> NUG. Some of those definitions are repeated below for convenience. -aggregated data:: The data of an aggregation variable, after it has been created by an application program. +aggregated data:: The data of an aggregation variable, after it has been created in memory by an application program. aggregated dimension:: A dimension of the aggregated data of an aggregation variable. diff --git a/ch02.adoc b/ch02.adoc index 89325f13..1bcc00ac 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -274,7 +274,7 @@ If a group attribute is defined in a parent group, and one of the child group re An __aggregation variable__ is a variable which has been formed by combining (i.e. 
aggregating) multiple __fragments__ that are generally stored in __fragment datasets__ that are external to the file containing the aggregation variable, i.e. the __aggregation file__. A fragment is an array of data with sufficient metadata for it to be correctly interpreted in the context of the aggregation, as described by <>. -The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ as an aggregation of the data from each fragment. +The aggregation variable does not contain any actual data, instead it contains instructions on how to create its __aggregated data__ in memory as an aggregation of the data from each fragment. Aggregation provides the utility of being able to view, as a single entity, a dataset that has been partitioned across multiple other datasets, whilst taking up very little extra space on disk (since the aggregation file contains no copies of the data in the fragments). Fragment datasets may be CF-compliant or have any other format, thereby allowing an aggregation variable to act as a CF-compliant view of non-CF datasets. @@ -454,7 +454,7 @@ The data of a fragment must be converted to its __canonical form__ prior to bein * The fragment's data have the same data type as the aggregation variable. -The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data array in memory. For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. +The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data in memory. 
For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. A combination of the following operations may be required to convert the fragment's data to its canonical form: * If, and only if, the fragment's data has been explicitly defined by its unique value (as opposed to being defined by a fragment dataset), broadcasting that value across the shape of the canonical form of the fragment's data. @@ -462,13 +462,15 @@ A combination of the following operations may be required to convert the fragmen * Inserting missing size 1 dimensions into the fragment's data (e.g. as required when aggregating two-dimensional fragments into three-dimensional aggregated data). * Transforming the fragment's data to have the same data type as the aggregated data. -Note that some transformations may result in a loss of information (such as could be the case when casting floating point numbers to integers), and the application program may choose to not create the aggregation data. +Note that some transformations may result in a loss of information, such as could be the case when casting floating point numbers to integers. * Transforming missing values in the fragment's data to a value indicated as missing by the aggregation variable. -Note that it is up to the application program to choose a new missing value, from those provided by the aggregation variable, that does not coincide with any non-missing value from any fragment, and if that is not possible then the application program may choose to not create the aggregation data. +Note that it is the responsibility of the creator of the aggregation file to ensure that all non-missing fragment data values do not coincide with any of the missing values indicated by the aggregation variable. * Transforming the fragment's data to have the aggregation variable's units (e.g. 
as required when aggregating time fragments whose units have different reference date/times). * Unpacking the fragment's data. -Note that if the aggregation variable indicates that the aggregated data are packed (as determined by the attributes defined in <>), then the unpacked fragment data values will represent packed values in the aggregated data. -It is recommended that the aggregated data is not packed, because of the potential for mistakes and confusion. \ No newline at end of file + +Note that if the aggregation variable indicates that the aggregated data values are packed (as determined by the attributes defined in <>), then the canonical fragment data values will represent packed values in the aggregated data. +In this case, the canonical (i.e. unpacked) fragment data values will be further transformed when the aggregation variable's unpacking is applied. +To avoid the potential for mistakes and confusion as to what the canonical fragment data values represent in the aggregated data, it is recommended that the aggregated variable does not include any packing attributes. From 5bd8531856adb537ecbc50639973fc6a0eeaa921 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Fri, 17 May 2024 17:25:54 +0100 Subject: [PATCH 54/59] dev --- ch02.adoc | 5 +---- conformance.adoc | 24 ++++++++++++------------ 2 files changed, 13 insertions(+), 16 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index 1bcc00ac..ea8d96c2 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -470,7 +470,4 @@ Note that it is the responsibility of the creator of the aggregation file to ens * Transforming the fragment's data to have the aggregation variable's units (e.g. as required when aggregating time fragments whose units have different reference date/times). * Unpacking the fragment's data. 
- -Note that if the aggregation variable indicates that the aggregated data values are packed (as determined by the attributes defined in <>), then the canonical fragment data values will represent packed values in the aggregated data. -In this case, the canonical (i.e. unpacked) fragment data values will be further transformed when the aggregation variable's unpacking is applied. -To avoid the potential for mistakes and confusion as to what the canonical fragment data values represent in the aggregated data, it is recommended that the aggregated variable does not include any packing attributes. +Note that if the aggregation variable indicates that the aggregated data values are packed (as determined by the attributes defined in <>), then the canonical fragment data values will represent packed values in the aggregated data, and so will be subject to the aggregation variable's unpacking. \ No newline at end of file diff --git a/conformance.adoc b/conformance.adoc index e882b06a..d44e8c85 100644 --- a/conformance.adoc +++ b/conformance.adoc @@ -138,33 +138,33 @@ Each aggregated dimension must name a dimension in the file. Each __variable__ must be the name of a variable in the file. The __feature__ keywords must comprise either all three of the `shape`, `location`, and `address` keyords; or else both of the `shape` and `value` keywords. - - The `location` variable must have a string data type. + ** The `location` variable must have a string data type. - - The `location` variable must have the same number of dimensions as there are aggregated dimensions, with the optional addition of one extra trailing dimension. + ** The `location` variable must have the same number of dimensions as there are aggregated dimensions, with the optional addition of one extra trailing dimension. - - The `location` variable's **`substitutions`** attribute, if it exists, must be a string whose value is list of blank-separated word pairs in the form __substitution: replacement__. 
+ ** The `location` variable's **`substitutions`** attribute, if it exists, must be a string whose value is a list of blank-separated word pairs in the form __substitution: replacement__. Each __substitution__ keyword must have the form `${\*}`, where `*` represents any number of any characters. - - A data value of a `location` variable, after any string substitutions defined by the **`substitutions`** attribute have been applied, must be either an absolute URI or else a relative-path URI reference. + ** A data value of a `location` variable, after any string substitutions defined by the **`substitutions`** attribute have been applied, must be either an absolute URI or else a relative-path URI reference. - - The `address` variable must be either a scalar, or else have the same dimensions in the same order as the `location` variable. + ** The `address` variable must be either a scalar, or else have the same dimensions in the same order as the `location` variable. - - The `value` variable must have the same number of dimensions as there are aggregated dimensions. + ** The `value` variable must have the same number of dimensions as there are aggregated dimensions. - - The `shape` variable must have an integer data type. + ** The `shape` variable must have an integer data type. - - If there are zero aggregated dimensions then the `shape` variable must a be scalar and contain the value `1`. + ** If there are zero aggregated dimensions then the `shape` variable must be a scalar and contain the value `1`. - - If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional, with the size of the slower-varying dimension (i.e. the number of rows) being the number of aggregated dimensions, and the size of the more rapidly-varying dimension being the size of the largest of the `location` or `value` variable dimensions, excluding the extra trailing dimension if the `location` variable has one.
+ ** If there are one or more aggregated dimensions then the `shape` variable must be two-dimensional. + *** The size of the slower-varying dimension (i.e. the first dimension in CDL order, representing rows) must be the number of aggregated dimensions. + *** The size of the more rapidly-varying dimension (i.e. the second dimension in CDL order) must be either the size of the largest of the `value` variable dimensions, or else the size of the largest of the `location` variable dimensions, excluding the extra trailing dimension if the `location` variable has one. - - The rows of a two-dimensional `shape` variable correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row's non-missing values must equal the size of its corresponding aggregated dimension. + *** The rows correspond to the aggregated dimensions in the order in which they are defined by the **`aggregated_dimensions`** attribute, and the sum of each row's non-missing values must equal the size of its corresponding aggregated dimension. *Recommendations:* * The following kinds of variable should not be aggregation variables: grid mapping variables, domain variables, mesh topology variables, geometry container variables, and interpolation variables. -* An aggregation variable should not have either of the attributes **`scale_factor`** and **`add_offset`**. - [[section-6]] [[description-of-the-data]] From c5340562fe6d451911ea0885e2ea4d1d7879eaad Mon Sep 17 00:00:00 2001 From: David Hassell Date: Thu, 25 Jul 2024 12:07:16 +0100 Subject: [PATCH 55/59] correct 'value' missing data --- ch02.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ch02.adoc b/ch02.adoc index ea8d96c2..caa2850b 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -430,8 +430,8 @@ See <> and <>. 
When the data values within a fragment are all the same, for each fragment, the `value` fragment array variable allows each fragment to be represented explicitly by its unique data value, rather than by reference to a fragment dataset. The `value` fragment array variable dimensions correspond to, and have the same sizes as, the fragment array dimensions in the same order as they appear in the conceptual fragment array. The `value` fragment array variable may have any data type, and contains each fragment's unique value. -A fragment that contains wholly missing data is specified by any missing value indicated by the aggregation variable. -See <>, which uses an aggregation ancillary variable to make fragment dataset attributes available to an aggregation data variable. +A fragment that contains wholly missing data is specified by any missing value indicated by the `value` fragment array variable. +See <>, which uses an aggregation ancillary variable to make fragment dataset global attributes available to an aggregation data variable. // Turn section numbering back on :numbered: From 5efc8f9c65e65bdcc74876e1791e5008d8f242e0 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 30 Jul 2024 14:28:51 +0100 Subject: [PATCH 56/59] re-wording --- ch02.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch02.adoc b/ch02.adoc index b825b0ca..c4d44367 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -289,7 +289,7 @@ Aggregation variables may be used as any kind of variable (data variable, coordi Any text applying to a variable in the CF conventions applies in exactly the same way to an aggregation variable in the same role; and any reference to the dimensions or data of a variable applies to the aggregated dimensions or aggregated data, respectively, of an aggregation variable. For instance: -* The dimension of a coordinate variable of an aggregation data variable must be one of the aggregated dimensions of the aggregation data variable. 
+* The dimension of a coordinate variable of an aggregation data variable must be included as one of the aggregated dimensions of the aggregation data variable. * The name of an aggregation coordinate variable (which is a scalar) must be the same as the name of its single aggregated dimension (identified by its **`aggregated_dimensions`** attribute), just as the name of a coordinate variable (which is one-dimensional) must be the same as the name of its single From 957d53fd841e4b67d24aa91bffbaefbfb1faadc4 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 30 Jul 2024 15:30:26 +0100 Subject: [PATCH 57/59] dev --- appl.adoc | 8 ++++---- ch02.adoc | 6 +++--- toc-extra.adoc | 18 +++++++++--------- 3 files changed, 16 insertions(+), 16 deletions(-) diff --git a/appl.adoc b/appl.adoc index b4f98e0c..8628f7ae 100644 --- a/appl.adoc +++ b/appl.adoc @@ -492,7 +492,7 @@ variables: uid:long_name = "Fragment dataset unique identifiers" ; uid:missing_value = "N/A" ; uid:aggregated_dimensions = "time" ; - uid:aggregated_data = "value: fragment_value + uid:aggregated_data = "value: fragment_value_uid shape: fragment_shape_uid"; // Coordinate variables double time(time) ; @@ -511,7 +511,7 @@ variables: string fragment_location(f_time, f_level, f_latitude, f_longitude) ; string fragment_address ; int fragment_shape(j, i) ; - string fragment_value(f_time) ; + string fragment_value_uid(f_time) ; int fragment_shape_uid(j_uid, i) ; data: @@ -526,10 +526,10 @@ data: 1, _, 73, _, 144, _ ; - fragment_value = "04821b9-7eb5-4046-937b-0bf0588", "056d1ee0-a183-43b3-ae67-1ec632a" ; + fragment_value_uid = "04b9-7eb5-4046-97b-0bf8", "05ee0-a183-43b3-a67-1eca" ; fragment_shape_uid = 3, 9 ; ---- -This example is similar to <>, but now there is the aggregation ancillary variable `uid` which defines its fragments as constant values stored int he `fragment_value` variable,that are intended to be broadcast across its aggregated data. 
+This example is similar to <>, but now there is the aggregation ancillary variable `uid` which defines its fragments from the constant values stored in the `fragment_value_uid` variable, which are intended to be broadcast across the `time` aggregated dimension. The data for the `level`, `latitude` and `longitude` variables are omitted for clarity. ==== diff --git a/ch02.adoc b/ch02.adoc index c4d44367..89eaa1e2 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -309,13 +309,13 @@ The aggregated dimensions must exist as dimensions in the aggregation file. The fragments which provide the aggregated data are conceptually organised into a __fragment array__ that has the same number of dimensions as the aggregated data. Each dimension of the fragment array is called a __fragment array dimension__, and corresponds to the aggregated dimension with the same position in the aggregated data. The size of a fragment array dimension is equal to the number of fragments that are needed to span its corresponding aggregated dimension. -See <>. +See the <>. The aggregated data are created by concatenating the canonical forms of the fragments' data (see <>) along each fragment array dimension, and in the order in which they appear in the fragment array. [[example-fragment-array]] [caption="Example 2.2. "] -.A schematic representation of a fragment array for aggregated data +.Schematic representation of a fragment array for aggregated data ==== [cols="a,a"] |=============== @@ -384,7 +384,7 @@ The integer-valued `shape` fragment array variable defines the shape of each fra In general, the `shape` fragment array variable is two-dimensional, with the size of the slower-varying dimension (i.e. the first dimension in CDL order, representing rows) being the number of fragment array dimensions, and the size of the more rapidly-varying dimension (i.e. the second dimension in CDL order, representing columns) being the size of the largest fragment array dimension.
The rows correspond to the fragment array dimensions in the same order, and each row provides the sizes of the fragments along its corresponding dimension of the fragment array, padded with missing values if there are fewer fragments than the number of columns. The sum of non-missing values in a row must therefore equal the size of the corresponding aggregated dimension. -See <>, which shows the `shape` fragment array variable for the fragment array described by <>. +See <>, which shows the `shape` fragment array variable for the fragment array described by the <>. If the aggregated data is scalar then the `shape` fragment array variable must be a scalar and contain the value `1`. See <>. diff --git a/toc-extra.adoc b/toc-extra.adoc index 90f62195..981aa3e5 100644 --- a/toc-extra.adoc +++ b/toc-extra.adoc @@ -37,7 +37,7 @@ J.5. <> [%hardbreaks] 2.1. <> -2.2 <> +2.2. <> 3.1. <> 3.2. <> 3.3. <> @@ -121,11 +121,11 @@ H.20. <> H.21. <> H.22. <> I.1. <> -L.1 <> -L.2 <> -L.3 <> -L.4 <> -L.5 <> -L.6 <> -L.7 <> -L.8 <> +L.1. <> +L.2. <> +L.3. <> +L.4. <> +L.5. <> +L.6. <> +L.7. <> +L.8. <> From 395332eb21395017abd350434c12431e5bbe86a3 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 20 Aug 2024 08:59:03 +0100 Subject: [PATCH 58/59] clarification on canonical form --- ch02.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ch02.adoc b/ch02.adoc index 89eaa1e2..2f4c94ee 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -455,7 +455,7 @@ The data of a fragment must be converted to its __canonical form__ prior to bein * The fragment's data have the same data type as the aggregation variable. The conversion of the fragment's data to its canonical form is carried out by the application program which is creating the aggregated data in memory. For fragment datasets, the application program may ignore any fragment metadata that are not needed for the conversion to the canonical form, as well as any other variables that might exist in the fragment dataset. 
-A combination of the following operations may be required to convert the fragment's data to its canonical form: +A combination of some of the following operations may be required to convert the fragment's data to its canonical form: * If, and only if, the fragment's data has been explicitly defined by its unique value (as opposed to being defined by a fragment dataset), broadcasting that value across the shape of the canonical form of the fragment's data. From dc2ee606e6de091f6e6751dd0e32b89a9133d1a0 Mon Sep 17 00:00:00 2001 From: David Hassell Date: Tue, 20 Aug 2024 15:56:07 +0100 Subject: [PATCH 59/59] fix CDl example --- appl.adoc | 1 + 1 file changed, 1 insertion(+) diff --git a/appl.adoc b/appl.adoc index 8628f7ae..a2ba94bf 100644 --- a/appl.adoc +++ b/appl.adoc @@ -516,6 +516,7 @@ variables: data: temperature = _ ; + uid = _ ; time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ; level = ... ; latitude = ... ;