-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy path06_chapter06_geospatial.Rmd
3804 lines (3183 loc) · 186 KB
/
06_chapter06_geospatial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
output: html_document
---
# Geographic data and services {#chapter06}
<br>
<center>
{width=25%}
</center>
<br>
## Background
To make geographic information discoverable and to facilitate their dissemination and use, the ISO Technical Committee on Geographic Information/Geomatics (ISO/TC211) created a set of metadata standards to describe geographic **datasets** (ISO 19115), geographic **data structures** (ISO 19115-2 / ISO 19110), and geographic **data services** (ISO 19119). These standards have been "unified" into a common XML specification (ISO 19139). This set of standards, known as the ISO 19100 series, served as the cornerstone of multiple initiatives to improve the documentation and management of geographic information such as the [Open Geospatial Consortium (OGC)](https://www.ogc.org/), the [US Federal Geographic Data Committee (FDGC)](https://www.fgdc.gov/), the [European INSPIRE directive](https://inspire.ec.europa.eu), or more recently the [Research Data Alliance (RDA)](https://rd-alliance.org/), among others.
The ISO 19100 standards have been designed to cover the large scope of geographic information. The level of detail they provide goes beyond the needs of most data curators. What we present in this Guide is a subset of the standards, which focuses on what we consider as the core requirements to describe and catalog geographic datasets and services. References and links to resources where more detailed information can be found are provided in appendix.
## Geographic information metadata standards
Geographic information metadata standards cover three types of resources: i) datasets, ii) data structure definitions, and iii) data services. Each one of these three components is the object of a specific standard. To support their implementation, a common XML specification (ISO 19139) covering the three standards has been developed. The geographic metadata standard is however, by far, the most complex and "specialized" of all schemas described in this Guide. Its use requires expertise not only in data documentation, but also in the use of geospatial data. We provide in this chapter some information that readers who are not familiar with geographic data may find useful to better understand the purpose and use of the geographic metadata standards.
### Documenting geographic datasets - The ISO 19115 standard
**Geographic datasets** "identify and depict geographic locations, boundaries and characteristics of features on the surface of the earth. They include geographic coordinates (e.g., latitude and longitude) and data associated to geographic locations (...)". (Source: https://www.fws.gov/gis/)
The ISO 19115 standard defines the structure and content of the metadata to be used to document geographic datasets. The standard is split into two parts covering:
1. **vector data** (ISO 19115-1), and
2. **raster data** including imagery and gridded data (ISO 19115-2).
*Vector* and *raster* spatial datasets are built with different structures and formats. The following summarizes how these two categories differ and how they can be processed using the R software. The descriptions of vector and raster data provided in this chapter are adapted from:
- https://gisgeography.com/spatial-data-types-vector-raster/
- https://datacarpentry.org/organization-geospatial/02-intro-vector-data/index.html]
**Vector data**
Vector data are comprised of **points**, **lines**, and **polygons** (areas).
A vector **point** is defined by a single x, y coordinate. Generally, vector points are a latitude and longitude with a spatial reference frame. A point can for example represent the location of a building or facility. When multiple dots are connected in a set order, they become a vector **line** with each dot representing a **vertex**. Lines usually represent features that are linear in nature, like roads and rivers. Each bend in the line represents a vertex that has a defined x, y location. When a set of 3 or more vertices is joined in a particular order and closed (i.e. the first and last coordinate pairs are the same), it becomes a **polygon**. Polygons are used to show boundaries. They will typically represent lakes, oceans, countries and their administrative subdivisions (provinces, states, districts), building footprints, or outline of survey plots. Polygons have an area (which will correspond to the square-footage for a building footprint, to the acreage for an agricultural plot, etc.)
Vector data are often provided in one of the following file formats:
- ESRI Shapefile (actually a zip set of files; not standard and limited as it is based on an outdated DBF format, but still widely used);
- ESRI GeoDatabase file (not a standard format, but widely used);
- GML: the Official OGC geospatial standard format, used by standard spatial data services;
- GeoPackage: the OGC recommended standard for handling vector data;
- GeoJSON: another OGC standard, often used when a *service* is associated to the data;
- KML/KMZ: [Keyhole Markup Language](https://en.wikipedia.org/wiki/Keyhole_Markup_Language), an XML notation for expressing geographic annotation and visualization within two-dimensional maps and three-dimensional Earth browsers;
- CSV file: Comma-separated values files, with geometries provided in OGC Well-Known-Text (WKT);
- OSM: An XML-formatted file containing "nodes" (points), "ways" (connections), and "relations" from [OpenStreetMap](https://www.openstreetmap.org) format.
-------------
Some examples
--------------
**EXAMPLE 1**
The figure below provides an example of vector data extracted from [Open Street Map](https://www.openstreetmap.org/node/1376501203#map=18/27.47008/89.63725) for a part of the city of Thimphu, Bhutan (as of 17 May, 2021).
<center>
{width=100%}
</center>
The content of this map can be exported as an OSM file.
<center>
{width=40%}
</center>
Multiple applications will allow users to read and process OSM files, including open source software applications like [QGIS](https://www.qgis.org/en/site/) or the R packages [sf](https://cran.r-project.org/package=sf) and [osmdata](https://cran.r-project.org/web/packages/osmdata/vignettes/osmdata.html)
```{r, indent="", eval=F, echo=T}
# Example of a R script that reads and shows the content of the map.osm file
library(sf)
# List the layers contained in the OSM file
lyrs <- st_layers("map.osm")
# Read the layers as sf objects
points <- st_read("map.osm", layer = "points")
lines <- st_read("map.osm", layer = "lines")
polygons <- st_read("map.osm", layer = "multipolygons")
```
**EXAMPLE 2**
In this second example, we use the R `sf` (Simple Features) package to read a shape (vector) file of refugee camps in Bangladesh, downloaded from the Humanitarian [Data Exchange (HDX)](https://data.humdata.org) website:
```{r, eval=F, echo=T}
# Load the sf package and utilities
library(sf)
library(utils)
# Download and unzip the shape file (published by HDX as a compressed zip format)
setwd("E:/my_data")
url <- "https://data.humdata.org/dataset/1a67eb3b-57d8-4062-b562-049ad62a85fd/resource/ace4b0a6-ef0f-46e4-a50a-8c552cfe7bf3/download/200908_rrc_outline_camp_al1.zip"
download.file(url, destfile = "200908_RRC_Outline_Camp_AL1.zip")
unzip("E:/my_data/200908_RRC_Outline_Camp_AL1.zip")
# Read the file and display core information about its content
al1 <- st_read("./200908_RRC_Outline_Camp_AL1/200908_RRC_Outline_Camp_AL1.shp")
print(al1)
plot(al1)
# ------------------------------
# Output of the 'print' command:
# ------------------------------
# Simple feature collection with 35 features and 14 fields
# geometry type: MULTIPOLYGON
# dimension: XY
# bbox: xmin: 92.12973 ymin: 20.91856 xmax: 92.26863 ymax: 21.22292
# geographic CRS: WGS 84
# First 10 features:
# District Upazila Settlement Union Name_Alias SSID SMSD__Cnam NPM_Name Area_Acres PeriMe_Met
# 1 Cox's Bazar Ukhia Collective site Palong Khali Bagghona-Putibonia CXB-224 Camp 16 Camp 16 (Potibonia) 130.57004 4136.730
# 2 Cox's Bazar Ukhia Collective site Palong Khali <NA> CXB-203 Camp 02E Camp 02E 96.58179 4803.162
# 3 ...
#
# Camp_Name Area_SqM Latitude Longitude geometry
# 1 Camp 16 528946.95881724 21.1563813298438 92.1490685817901 MULTIPOLYGON (((92.15056 21...
# 2 Camp 2E 391267.799744003 21.2078084302778 92.1643360947381 MULTIPOLYGON (((92.16715 21...
# 3 ...
# Output of 'str' command:
# Classes 'sf' and 'data.frame': 35 obs. of 15 variables:
# $ District : chr "Cox's Bazar" "Cox's Bazar" "Cox's Bazar" "Cox's Bazar" ...
# $ Upazila : chr "Ukhia" "Ukhia" "Ukhia" "Ukhia" ...
# $ Settlement: chr "Collective site" "Collective site" "Collective site" "Collective site" ...
# $ Union : chr "Palong Khali" "Palong Khali" "Palong Khali" "Raja Palong" ...
# $ Name_Alias: chr "Bagghona-Putibonia" NA "Jamtoli-Baggona" "Kutupalong RC" ...
# $ SSID : chr "CXB-224" "CXB-203" "CXB-223" "CXB-221" ...
# $ SMSD__Cnam: chr "Camp 16" "Camp 02E" "Camp 15" "Camp KRC" ...
# $ NPM_Name : chr "Camp 16 (Potibonia)" "Camp 02E" "Camp 15 (Jamtoli)" "Kutupalong RC" ...
# $ Area_Acres: num 130.6 96.6 243.3 95.7 160.4 ...
# $ PeriMe_Met: num 4137 4803 4722 3095 4116 ...
# $ Camp_Name : chr "Camp 16" "Camp 2E" "Camp 15" "Kutupalong RC" ...
# $ Area_SqM : chr "528946.95881724" "391267.799744003" "985424.393160958" "387729.666427279" ...
# $ Latitude : chr "21.1563813298438" "21.2078084302778" "21.1606399787906" "21.2120281895357" ...
# $ Longitude : chr "92.1490685817901" "92.1643360947381" "92.1428956454661" "92.1638095873048" ...
# $ geometry :sfc_MULTIPOLYGON of length 35; first list element: List of 1
# This information can be extracted and used to document the data
```
The output of the script shows that the shape file contains 35 features (or "objects"; in this case each object represents a refugee camp) and 14 fields (attributes and variables; including information like the camp name, administrative region, surface area, and more) related to each object.
The *geometry type* (multipolygon) and *dimension* (XY) provide information on the type of object. "All geometries are composed of points. Points are coordinates in a 2-, 3- or 4-dimensional space. All points in a geometry have the same dimensionality. In addition to X and Y coordinates, there are two optional additional dimensions:
- a Z coordinate, denoting the altitude;
- an M coordinate (rarely used), denoting some measure that is associated with the point, rather than with the feature as a whole (in which case it would be a feature attribute); examples could be time of measurement, or measurement error of the coordinates.
The four possible cases then are:
- two-dimensional points refer to x and y, easting and northing, or longitude and latitude, referred to as XY
- three-dimensional points as XYZ
- three-dimensional points as XYM
- four-dimensional points as XYZM (the third axis is Z, the fourth is M)
The following seven simple feature types are the most common:
| Type | Description |
| ------------------ | ------------------------------------------------------------ |
| POINT | zero-dimensional geometry containing a single point |
| LINESTRING | sequence of points connected by straight, non-self intersecting line pieces; one-dimensional geometry |
| POLYGON | geometry with a positive area (two-dimensional); sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring |
| MULTIPOINT | set of points; a MULTIPOINT is simple if no two Points in the MULTIPOINT are equal |
| MULTILINESTRING | set of linestrings |
| MULTIPOLYGON | set of polygons |
| GEOMETRYCOLLECTION | set of geometries of any type except GEOMETRYCOLLECTION |
The remaining ten geometries are rarer : CIRCULARSTRING, COMPOUNDCURVE, CURVEPOLYGON, MULTICURVE, MULTISURFACE, CURVE, SURFACE, POLYHEDRALSURFACE, TIN, TRIANGLE (see https://r-spatial.github.io/sf/articles/sf1.html).
The *geographic CRS* informs us on the coordinate reference system (CRS). Coordinates can only be placed on the Earth's surface when their CRS is known; this may be a spheroid CRS such as WGS 84, a projected, two-dimensional (Cartesian) CRS such as a UTM zone or Web Mercator, or a CRS in three-dimensions, or including time. In our example above, the CRS is the WGS 84 (World Geodetic System 84), a standard for use in cartography, geodesy, and satellite navigation including GPS.
The *bbox* is the bounding box.
Information on a subset (top 10 - only 2 shown above) of the features is displayed in the output of the script, with the list of the 14 available fields.
The `plot(al1)` command in R produces a visualization of the numeric fields in the data file:
<center>
{width=100%}
</center>
All this information represents important components of the metadata, which we will want to capture, enrich, and catalog (together with additional information) using the ISO metadata standard. "Enriching" (or "augmenting") the metadata will consist of providing more contextual information (who produced the data, when, why, etc.) and additional information on the features (e.g., what does the variable 'SMSD__Cnam' represent?).
**Raster data**
**Raster data** are made up of pixels, also referred to as *grid cells*. Satellite imagery and other remote sensing data are raster datasets. Grid cells in raster data are usually (but not necessarily) regularly-spaced and square. Data stored in a raster format is arranged in a grid without storing the coordinates of each cell (pixel). The coordinates of the corner points and the spacing of the grid can be used to calculate (rather than to store) the coordinates of each location in a grid.
Any given pixel in a grid stores one or more values (in one or more bands). For example, each cell (pixel) value in a satellite image has a red, a green, and a blue value. Cells in raster data could represent anything from elevation, temperature, rainfall, land cover, population density, or others. (Source: https://worldbank.github.io/OpenNightLights/tutorials/mod2_1_data_overview.html)
Raster data can be **discrete** or **continuous**. Discrete rasters have distinct themes or categories. For example, one grid cell can represent a land cover class, or a soil type. In a discrete raster, each thematic class can be discretely defined (usually represented by an integer) and distinguished from other classes. In other words, each cell is definable and its value applies to the entire area of the cell. For example, the value 1 for a class might indicate "urban area", value 2 "forest", and value 3 "others". Continuous (or non-discrete) rasters are grid cells with gradual changing values, which could for example represent elevation, temperature, or an aerial photograph.
The difference between vector and raster data, and between different types of vectors, is clearly illustrated in the figure below taken from the World Bank's [Light Every Night GitHub repository](https://worldbank.github.io/OpenNightLights/tutorials/mod2_1_data_overview.html).
<center>
{width=100%}
</center>
In GIS applications, vector and raster data are often combined into multi-layer datasets, as shown in the figure below extracted from the [County of San Bernardino (US) website](http://sbcounty.gov/).
<center>
{width=75%}
</center>
We may occasionally want to convert raster data into vector data. For example, a building footprint layer (vector data, composed of polygons) can be derived from a satellite image (raster data). Such conversions can be implemented in a largely automated manner using machine learning algorithms.
<center>
{width=100%}
</center>
Source: https://blogs.bing.com/maps/2019-09/microsoft-releases-18M-building-footprints-in-uganda-and-tanzania-to-enable-ai-assisted-mapping
Raster data are often provided in one of the following file formats:
- GeoTiFF (standard): Most of the remote sensing data are stored as GeoTIFF files. https://www.ogc.org/standards/geotiff
- NetCDF (standard) https://www.unidata.ucar.edu/software/netcdf/docs/netcdf_introduction.html
- ECW: https://en.wikipedia.org/wiki/ECW_(file_format)
- JPEG 2000: https://fr.wikipedia.org/wiki/JPEG_2000
- MrSid: https://en.wikipedia.org/wiki/MrSID
- ArcGrid (ESRI Grid format)
**GeoTIFF** is a popular file format for raster data. A *Tagged Image File Format* (TIFF or TIF) is a file format designed to store raster-type data. A GeoTIFF file is a TIFF file that contains specific tags to store structured geospatial metadata including:
- Spatial extent: the area coverage of the file
- Coordinate reference system: the projection / coordinate reference system used
- Resolution: the spatial extent of each pixel (spatial resolution)
- Number of layers: number of layers or bands available in the file
TIFF files can be read using (among other options) the R package [*raster*](https://cran.r-project.org/package=raster) or the Python library [*rasterio*](https://pypi.org/project/rasterio/).
GeoTIFF files can also be provided as **Cloud Optimized GeoTIFFS (COGs)**. In COGs, the data are structured in a way that allows them to be shared via web services which allow users to query, visualize, or download a user-defined subset of the content of the file, without having to download the entire file. This option can be a major advantage, as geoTIFF files generated by remote sensing/satellite imagery can be very large. Extracting only the relevant part of a file can save significant time and storage space.
-------------
Some examples
--------------
**EXAMPLE 1**
The first example below shows the spatial distribution of the Ethiopian population in 2020. The data file was downloaded from the [WorldPop](https://www.worldpop.org/) website on 17 May 2021.
<center>
{width=100%}
</center>
```{r, eval=F, echo=T}
# Load the raster R package
library(raster)
# Download a TIF file (spatial distribution of population, Ethiopia, 2020) - 62Mb
setwd("E:/my_data")
url <- "https://data.worldpop.org/GIS/Population/Global_2000_2020_Constrained/2020/maxar_v1/ETH/eth_ppp_2020_constrained.tif"
file_name = basename(url)
download.file(url, destfile = file_name, mode = 'wb')
# Read the file and display core information about its content
my_raster_file <- raster(file_name)
print(my_raster_file)
# ------------------------------
# Output of the 'print' command:
# ------------------------------
# dimensions : 13893, 17983, 249837819 (nrow, ncol, ncell)
# resolution : 0.0008333333, 0.0008333333 (x, y)
# extent : 32.99958, 47.98542, 3.322084, 14.89958 (xmin, xmax, ymin, ymax)
# crs : +proj=longlat +datum=WGS84 +no_defs
# source : E:/my_data/eth_ppp_2020_constrained.tif
# names : eth_ppp_2020_constrained
# values : 1.36248, 847.9389 (min, max)
```
This output shows that the TIF file contains one layer of cells, forming an image of 13,893 by 17,983 cells. It also provides information on the projection system (datum): WGS 84 (World Geodetic System 84). This information (and more) will be part of the ISO-compliant metadata we want to generate to document and catalog a raster dataset.
**EXAMPLE 2**
In the second example, we demonstrate the advantages of Cloud Optimized GeoTIFFS (COGs). We extract information from the World Bank [Light Every Night](https://worldbank.github.io/OpenNightLights/wb-light-every-night-readme.html#) repository.
```{r, eval=F, echo=T}
# Load 'aws.s3' package to access the Amazon Web Services (AWS) Simple Storage Service (s3)
library("aws.s3")
# Load 'raster' package to read the target GeoTiFF
library("raster")
# List files for World Bank bucket 'globalnightlight', setting a max number of items
contents <- get_bucket(bucket = 'globalnightlight', max = 10000)
# Get_bucket_df is similar to 'get_bucket' but returns the list as a dataframe
contents <- get_bucket_df(bucket = 'globalnightlight')
# Access DMSP-OLS data for satellite F12 in 1995
F12_1995 <- get_bucket(bucket = 'globalnightlight',
prefix = "F121995")
# As data.frame, with all objects listed
F12_1995_df <- get_bucket_df(bucket = 'globalnightlight',
prefix = "F121995",
max = Inf)
# Number of objects
nrow(F12_1995_df)
# Save the object
filename <- "F12199501140101.night.OIS.tir.co.tif"
save_object(bucket = 'globalnightlight',
object = "F121995/F12199501140101.night.OIS.tir.co.tif",
file = filename)
# Read it with raster package
rs <- raster(filename)
```
<br>
### Describing data structures - The ISO 19115-2 and ISO 19110 standards
The ISO 19115-2 provides the necessary metadata elements to describe the structure of raster data. The ISO 19115-1 standard does not provide all necessary metadata elements needed to describe the structure of vector datasets. The description of data structures for vector data (also referred to as *feature types*) is therefore often omitted. The ISO 19110 standard solves that issue, by providing the means to document the structure of vector datasets (column names and definitions, codes and value labels, measurement units, etc.), which will contribute to making the data more discoverable and usable.
### Describing data services - The ISO 19119 standard
More and more data are disseminated not in the form of datasets, but as data services via web applications. "Geospatial services provide the technology to create, analyze, maintain, and distribute geospatial data and information." (https://www.fws.gov/gis/) The ISO 19119 standard provides the elements to document such services.
### Unified metadata specification - The ISO/TS 19139 standard
The three metadata standards previously described - ISO 19115 for vector and raster datasets, ISO 19110 for vector data structures, and ISO 19119 for data services, provide a set of concepts and definitions useful to describe the geographic information. To facilitate their practical implementation, a digital specification, which defines how this information is stored and organized in an electronic metadata file, is required. The ISO/TS 19139 standard, an XML specification of the ISO 19115/10110/19119/, was created for that purpose.
The ISO/TS 19139 is a standard used worldwide to describe geographic information. It is the backbone for the implementation of [INSPIRE](https://inspire.ec.europa.eu/) dataset and service metadata in the European Union. It is supported by a wide range of tools, including desktop applications like [Quantum GIS](https://qgis.org/en/site/), [ESRI ArcGIS](https://www.arcgis.com/index.html)), and OGC-compliant metadata catalogs (e.g., [GeoNetwork](https://geonetwork-opensource.org/)) and geographic servers (e.g., [GeoServer](http://geoserver.org/)).
ISO 19139-compliant metadata can be generated and edited using specialized metadata editors such as [CatMDEdit](http://catmdedit.sourceforge.net/) or [QSphere](https://www.fgdc.gov/organization/working-groups-subcommittees/mwg/iso-metadata-editors-registry/qsphere), or using programmatic tools like Java Apache SIS or the R packages [geometa](https://cran.r-project.org/web/packages/geometa/index.html) and [geoflow](https://github.com/eblondel/geoflow), among others.
The ISO 19139 specification is complex. To enable and simplify its use in our NADA cataloguing application, we produced a JSON version of (part of) the standard. We selected the elements we considered most relevant for our purpose, and organized them into the JSON schema described below. For data curators with limited expertise in XML and geographic data documentation, this JSON schema will make the production of metadata compliant with the ISO 19139 standard easier.
## Schema description
Main structure (describe) @@@@
<br>
```json
{
"repositoryid": "string",
"published": 0,
"overwrite": "no",
"metadata_information": {},
"description": {},
"provenance": [],
"tags": [],
"lda_topics": [],
"embeddings": [],
"additional": { }
}
```
<br>
### Introduction to ISO19139
Geographic metadata (for both *datasets* and *services*) should include **core metadata properties**, and metadata **sections** aiming to describe specific aspect of the resource (e.g., resource identification or resource distribution).
The content of some metadata elements is controlled by **codelists** (or **controlled vocabularies**). A codelist is a pre-defined set of values. The content of an element controlled by a codelist should be selected from that list. This may for example apply to the element "language", whose content should be selected from the ISO 639 list and codes codes for language names, instead of being free-text. The ISO 19139 suggests but does not impose codelists. It is highly recommended to make use of the suggested codelists (or of specific codelists that may be promoted by agencies or partnerships).
Some metadata elements (referred to as *common elements*) of the ISO 19139 can be repeated in different parts of a metadata file. For example, a standard set of fields is provided to describe a `contact`, a `citation`, or a `file format`. Such common elements can be used in multiple locations of a metadata file (e.g., to provide information on who the contact person is for information on data quality, on data access, on data documentation, etc.)
In the following sections, we first present the **common elements**, then the elements that form the **core metadata properties** (information on the metadata themselves), followed by the elements from the main **metadata sections** used to describe the data, and finally the **features catalog** elements which are used to document attributes and variables related to vector data (ISO 19110).
### Common sets of elements
Common elements are blocks of metadata fields that can appear in multiple locations of a metadata file. For example, information on `contact` person(s) or organization(s) may have to be provided in the section of the file where we document the production and maintenance of the data, where we document the production and maintenance of the metadata, where we document the distribution and terms of use of the data, etc. Other types of common elements include online and offline resources, file formats, citations, keywords, constraints, and extent. We describe these sets of elements below.
#### Contact / Responsible party
The ISO 19139 specification provides a structured set of metadata elements to describe a **contact**. A contact is the party (person or organization) responsible for a specific task. The following set of elements can be used to describe a contact:
| Element | Description |
| ------------------ | ------------------------------------------------------------ |
| `individualName` | Name of the individual |
| `organisationName` | Name of the organization |
| `positionName` | Position of the individual in the organization |
| `contactInfo` | Contact information. The contact information is divided into 3 sections: `phone`(including either `voice` or `facsimile` numbers; `address`, handling the physical address elements (`deliveryPoint`, `city`, `postalCode`, `country`), contact e-mail (`electronicEmailAddress`), and `onlineResource`, e.g., the URL of the organization website (which includes `linkage`, `name`, `description`, `protocol`, and `function` ; see below) |
| `role` | Role of the person/organization. A recommended controlled vocabulary is provided by ISO 19139, with the following options: `{resourceProvider, custodian, owner, sponsor, user, distributor, originator, pointOfContact, principalInvestigator, processor, publisher, author, coAuthor, collaborator, editor, mediator, rightsHolder, contributor, funder, stakeholder}` |
<br>
```json
"contact": [
{
"individualName": "string",
"organisationName": "string",
"positionName": "string",
"contactInfo": {
"phone": {
"voice": "string",
"facsimile": "string"
},
"address": {
"deliveryPoint": "string",
"city": "string",
"postalCode": "string",
"country": "string",
"electronicMailAddress": "string"
},
"onlineResource": {
"linkage": "string",
"name": "string",
"description": "string",
"protocol": "string",
"function": "string"
}
},
"role": "string"
}
]
```
<br>
#### Online resource
An **online resource** is a common set of elements frequently used in the geographic data/services schema. It can be used for example to provide a link to an organization website, to a data file or to a document, etc. An online resource is described with the following properties:
| Element | Description |
| ------------- | ------------------------------------------------------------ |
| `linkage` | URL of the online resource. In case of a geographic standard data services, only the <u>base URL</u> should be provided, without any service parameter. |
| `name` | Name of the online resource. In case of a geographic standard data services, this should be filled with the identifier of the resource as published in the service. Example, for an OGC Web Map Service (WMS), we will use the layer name. |
| `description` | Description of the online resource |
| `protocol` | Web protocol used to get the resource, e.g., FTP, HTTP. In case of a basic HTTP, the ISO 19139 suggests the value 'WWW:LINK-1.0-http--link'. For geographic standard data services, it is recommended to fill this element with the appropriate protocol identifier. For an OGC Web Map Service (WMS) link for example, use 'OGC:WMS-1.1.0-http-get-map' |
| `function` | Function (purpose) of the online resource. |
<br>
```json
"onlineResource": {
"linkage": "string",
"name": "string",
"description": "string",
"protocol": "string",
"function": "string"
}
```
<br>
#### Offline resource (Medium)
An **offline resource** (medium) is a common set of elements that can be used to describe a physical resource used to distribute a dataset, e.g., a DVD or a CD-ROM. A `medium` is described with the following properties:
| Element | Description |
| ------------- | ------------------------------------------------------------ |
| `name` | Name of the medium, eg. 'dvd'. Recommended code following the [ISO/TS 19139](http://standards.iso.org/iso/19139/resources/gmxCodelists.xmlMD_MediumNameCode) MediumName codelist. Suggested values: {cdRom, dvd, dvdRom, 3halfInchFloppy, 5quarterInchFloppy, 7trackTape, 9trackType, 3480Cartridge, 3490Cartridge, 3580Cartridge, 4mmCartridgeTape, 8mmCartridgeTape, 1quarterInchCartridgeTape, digitalLinearTape, onLine, satellite, telephoneLink, hardcopy} |
| `density` | Density (list of) at which the data is recorded |
| `densityUnit`| Unit(s) of measure for the recording density |
| `volumes` | Number of items in the media identified |
| `mediumFormat` | Method used to write to the medium, e.g. tar . Recommended code following the [ISO/TS 19139](http://standards.iso.org/iso/19139/resources/gmxCodelists.xmlMD_MediumFormatCode) MediumFormat codelist. Suggested values: {cpio, tar, highSierra, iso9660, iso9660RockRidge, iso9660AppleHFS, udf} |
| `mediumNote` | Description of other limitations or requirements for using the medium |
#### File format
The table below lists the ISO 19139 elements used to document a **file format**. A format is defined at a minimum by its `name`. It is also recommended to provide a `version`, and possibly a format `specification`. It is good practice to provide a standardized format name, using the file's *mime type*, e.g., `text/csv`, `image/tiff`. A list of available mime types is available from the [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml) website.
| Element | Description |
| ---------------------------- | ------------------------------------------------- |
| `name` | Format name - *Recommended* |
| `version` | Format version (if applicable) - *Recommended* |
| `amendmentNumber` | Amendment number (if applicable) |
| `specification` | Name of the specification - *Recommended* |
| `fileDecompressionTechnique` | Technique for file decompression (if applicable) |
| `FormatDistributor` | <u>Contact</u>(s) responsible of the distribution |
<br>
```json
"resourceFormat": [
{
"name": "string",
"version": "string",
"amendmentNumber": "string",
"specification": "string",
"fileDecompressionTechnique": "string",
"FormatDistributor": {
"individualName": "string",
"organisationName": "string",
"positionName": "string",
"contactInfo": {},
"role": "string"
}
}
]
```
<br>
#### Citation
The **citation** is another common element that can be used in various parts of a geographic metadata file. Citations are used to provide detailed information on external resources related to the dataset or service being documented. A citation can be defined using the following set of (mostly optional) elements:
| Element | Description |
| ----------------------- | ------------------------------------------------------------ |
| `title` | Title of the resource |
| `alternateTitle` | An alternate title (if applicable) |
| `date` | Date(s) associated to a resource, with sub-elements `date` and `type`. This may include different types of dates. The type of date should be provided, and selected from the controlled vocabulary proposed by the ISO 19139: date of `{creation, publication, revision, expiry, lastUpdate, lastRevision, nextUpdate, unavailable, inForce, adopted, deprecated, superseded, validityBegins, validityExpires, released, distribution}` |
| `edition` | Edition of the resource |
| `editionDate` | Edition date |
| `identifier` | A unique persistent identifier for the metadata. If a DOI is available for the resource, the DOI should be entered here. The same `fileIdentifier` should be used if no other persistent identifier is available. |
| `citedResponsibleParty` | Contact(s)/party(ies) responsible for the resource. |
| `presentationForm` | Form in which the resource is made available. The ISO 19139 recommends the following controlled vocabulary: `{documentDigital, imageDigital, documentHardcopy, imageHardcopy, mapDigital, mapHardcopy, modelDigital, modelHardcopy, profileDigital, profileHardcopy, tableDigital, tableHardcopy, videoDigital, videoHardcopy, audioDigital, audioHardcopy, multimediaDigital, multimediaHardcopy, physicalSample, diagramDigital, diagramHardcopy}`. For a geospatial dataset or web-layer, the value `mapDigital` will be preferred. |
| `series` | A description of the series, in case the resource is part of a series. This include the series `name`, `issueIdentification` and `page` |
| `otherCitationDetails` | Any other citation details to specify |
| `collectiveTitle` | A title in case the resource is part of a broader resource (e.g., data collection) |
| `ISBN` | International Standard Book Number (ISBN); an international standard identification number for uniquely identifying publications that are not intended to continue indefinitely. |
| `ISSN` | International Standard Serial Number (ISSN); an international standard for serial publications. |
<br>
```json
"citation": {
"title": "string",
"alternateTitle": "string",
"date": [
{
"date": "string",
"type": "string"
}
],
"edition": "string",
"editionDate": "string",
"identifier": {
"authority": "string",
"code": null
},
"citedResponsibleParty": [],
"presentationForm": [
"string"
],
"series": {
"name": "string",
"issueIdentification": "string",
"page": "string"
},
"otherCitationDetails": "string",
"collectiveTitle": "string",
"ISBN": "string",
"ISSN": "string"
}
```
<br>
#### Keywords
**Keywords** contribute significantly to making a resource more discoverable. Entering a list of relevant keywords is therefore highly recommended. Keywords can, but do not have to be selected from a controlled vocabulary (thesaurus). Keywords are documented using the following elements:
| Element | Description |
| --------------- | ------------------------------------------------------------ |
| `type` | Keywords type. The ISO 19139 provides a recommended controlled vocabulary with the following options: {`dataCenter`, `discipline`, `place`, `dataResolution`, `stratum`, `temporal`, `theme`, `dataCentre`, `featureType`, `instrument`, `platform`, `process`, `project`, `service`, `product`, `subTopicCategory`} |
| `keyword` | The keyword itself. When possible, existing vocabularies should be preferred to writing free-text keywords. An example of global vocabulary is the [*Global Change Master Directory*](http://vocab.nerc.ac.uk/collection/P04/current/) that could be a valuable source to reference data domains / disciplines, or the [UNESCO Thesaurus](http://vocabularies.unesco.org/browser/thesaurus/en/).
| `thesaurusName` | A reference to a thesaurus (if applicable) from which the keywords are extracted. The thesaurus itself should then be documented as a <u>citation</u>. |
<br>
```json
"keywords": [
{
"type": "string",
"keyword": "string",
"thesaurusName": "string"
}
]
```
<br>
#### Constraints @@@@ not clear. where is the element useLimitations? ... what are the elements used in the schema?
The **constraints** common set of elements will be used to document *legal* and *security* constraints associated with the documented dataset or data service. Both types of constraints have one property in common, `useLimitation`, used to describe the use limitation(s) as free text.
<br>
```json
"resourceConstraints": [
{
"legalConstraints": {
"useLimitation": [
"string"
],
"accessConstraints": [
"string"
],
"useConstraints": [
"string"
],
"otherConstraints": [
"string"
]
},
"securityConstraints": {
"useLimitation": [
"string"
],
"classification": "string",
"userNote": "string",
"classificationSystem": "string",
"handlingDescription": "string"
}
}
]
```
<br>
In addition to the `useLimitation` element, **legal constraints** (`legalConstraints`) can be described using the following three metadata elements:
| Element | Description |
| ------------------- | ------------------------------------------------------------ |
| `accessConstraints` | Access constraints. The ISO 19139 provides a controlled vocabulary with the following options: `{copyright, patent, patentPending, trademark, license, intellectualPropertyRights, restricted, otherRestrictions, unrestricted, licenceUnrestricted, licenceEndUser, licenceDistributor, private, statutory, confidential, SBU, in-confidence}` |
| `useConstraints` | Use constraints. To be entered as free text. Filling this element will depend on the resource that is described. As best practice recommended to fill this element, this is where *terms of use*, *disclaimers*, *preferred citation* or* even *data limitations* can be captured |
| `otherConstraints` | Any other constraints related to the resource. |
In addition to the `useLimitation` element, **security constraints** (`securityConstraints`) - which applies essentially to *classified* resources - can be described using the following four metadata elements:
| Element | Description |
| ---------------------- | ------------------------------------------------------------ |
| `classification` | Classification code. The ISO 19139 provides a controlled vocabulary with the following options: `{unclassified, restricted, confidential, secret, topSecret, SBU, forOfficialUseOnly, protected, limitedDistribution}` |
| `userNote` | Note to users (free text) |
| `classificationSystem` | Information on the system used to classify the information. Organizations may have their own system to `classify` the information. |
| `handlingDescription` | Additional free-text description of the classification |
#### Extent
The **extent** defines the boundaries of the dataset in space (horizontally and vertically) and in time. The ISO 19139 standard defines the extent as follows:
Element | Description |
| ------------- | ------------------------------------------------------------ |
| `geographicElement` | Spatial (horizontal) extent element. This can be defined either with a `geographicBoundingBox` providing the coordinates bounding the limits of the dataset, by means of four properties: `southBoundLongitude`, `westBoundLongitude`, `northBoundLongitude`, `eastBoundLongitude` (recommended); or using `geographicDescription` - free text that defines the area covered. When the dataset covers one or more countries, it is recommended to enter the country names in this element, as it can then be used in data catalogs for filtering by geography.|
| `verticalElement` | Spatial (vertical) extent element, providing two properties: `minimumValue`, `maximumValue` and `verticalCRS` (reference to the vertical coordinate reference system) |
| `temporalElement` | Temporal extent element. Depending on the temporal characteristics of the dataset, this will consist in a `TimePeriod` (made of a `beginPosition` and `endPosition`) or a `TimeInstant` (made of a single `timePosition`) referencing date/time information according to [ISO 8601](https://www.iso.org/iso-8601-date-and-time-format.html) |
<br>
```json
"extent": {
"geographicElement": [
{
"geographicBoundingBox": {
"westBoundLongitude": -180,
"eastBoundLongitude": -180,
"southBoundLatitude": -180,
"northBoundLatitude": -180
},
"geographicDescription": "string"
}
],
"temporalElement": [
{
"extent": null
}
],
"verticalElement": [
{
"minimumValue": 0,
"maximumValue": 0,
"verticalCRS": null
}
]
}
```
<br>
### Core metadata properties
A set of elements is provided in the ISO 19139 to document the core properties of the <u>metadata</u> (not the data). With a few exceptions, these elements apply to the metadata related to datasets and data services. The table below summarizes these elements and their applicability. A description of the elements follows.
| Property | Description | Used in *dataset metadata* | Used in *service metadata* |
| ------------------------- | ------------------------------------------------------------ | ------------------------------- | ------------------------------- |
| `fileIdentifier` | Unique persistent identifier for the resource | Yes | - |
| `language` | Main language used in the metadata description | Yes | Yes |
| `characterSet` | Character set encoding used in the metadata description | Yes | Yes |
| `parentIdentifier` | Unique persistent identifier of the parent resource (if any) | Yes | Yes |
| `hierarchyLevel` | Scope(s) / hierarchy level(s) of the resource. List of pre-defined values suggested by the ISO 19139. See details below. | Yes | Yes |
| `hierarchyLevelName` | Alternative name definitions for hierarchy levels | Yes | Yes |
| `contact` | <u>contact</u>(s) associated to the metadata, i.e. persons/organizations in charge of the metadata create/edition/maintenance. For more details, see section on ***common elements*** | Yes | Yes |
| `dateStamp` | Date and time when the metadata record was created or updated | Yes | Yes |
| `metadataStandardName` | Reference or name of the metadata standard used. | Yes | Yes |
| `metadataStandardVersion` | Version of the metadata standard. For the ISO/TC211, the version corresponds to the creation/revision year. | Yes | Yes |
| `dataSetURI` | Unique persistent link to reference the database | Yes | - |
<br>
```json
"description": {
"idno": "string",
"language": "string",
"characterSet": {
"codeListValue": "string",
"codeList": "string"
},
"parentIdentifier": "string",
"hierarchyLevel": [],
"hierarchyLevelName": [],
"contact": [],
"dateStamp": "string",
"metadataStandardName": "string",
"metadataStandardVersion": "string",
"dataSetURI": "string"
}
```
<br>
#### Resource identifier (`idno`)
The `idno` must provide a unique and persistent identifier for the resource (dataset or service). A common approach consists in building a _semantic identifier_, constructed by concatenating some owner and data characteristics. Although this approach offers the advantages of readability of the identifier, it may not guarantee its global uniqueness and its persistence in time. The use of time periods and/or geographic extents as components of a file identifier is not recommended, as these elements may evolve over time. The use of random identifiers such as the Universally Unique Identifiers (UUID) is sometimes suggested as an alternative, but this approach is also not recommended. The use of [Digital Object Identifiers](https://www.doi.org/) (DOI) as global and unique file identifiers is recommended.
#### Language (`language`)
The metadata language refers to the main language used in the metadata. The recommended practice is to use the [ISO 639-2 Language Code List](http://www.loc.gov/standards/iso639-2/) (also known as the alpha-3 language code), e.g. 'eng' for English or 'fra' for French.
#### Character set (`characterSet`)
The character set encoding of the metadata description. The best practice is to use the `utf8` encoding codelist value (UTF-8 encoding). It is capable of encoding all valid character code points in Unicode, a standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The World Wide Web Consortium recommends UTF-8 as the default encoding in XML and HTML. UTF-8 is the most common encoding for the World Wide Web. Many text editors will provide you with an option to save your metadata (text) files in UTF-8, which will often be the default option (see below the example of Notepad++ and R Studio).
<center>
{width=100%}
</center>
#### Parent Identifier (`parentIdentifier`)
A geographic data resource can be a subset of a larger dataset. For example, an aquatic species distribution map can be part of a data collection covering all species, or the 2010 population census dataset of a country can be part of a dataset that includes all population censuses for that country since 1900. In such case, the _parent identifier_ metadata element can be used to identify this higher-level resource. As for the `fileIdentifier`, the `parentIdentifier` must be a unique identifier persistent in time. In a data catalog, a `parentIdentifier` will allow the user to move from one dataset to another. The `parentIdentifier` is generally applied to *datasets*, although it may in some cases be used in *data services* descriptions.
#### Hierarchy level(s) (`hierarchyLevel`)
<br>
```json
"hierarchyLevel": [
"string"
]
```
<br>
The `hierarchylevel` defines the scope of the resource. It indicates whether the resource is a collection, a dataset, a series, a service, or another type of resource. The ISO 19139 provides a controlled vocabulary for this element. It is recommended but not mandatory to make use of it. The most relevant levels for the purpose of cataloguing geographic data and services are **dataset** (for both raster and vector data), **service** (a capability which a service provider entity makes available to a service user entity through a set of interfaces that define a behavior), and **series**. `Series` will be used when the data represent an ordered succession, in time or in space; this will typically apply to time series, but it can also be used to describe other types of series (e.g., a series of ocean water temperatures collected at a succession of depths).
The recommended controlled vocabulary for `hierarchylevel` includes: `{dataset, series, service, attribute, attributeType, collectionHardware, collectionSession, nonGeographicDataset, dimensionGroup, feature, featureType, propertyType, fieldSession, software, model, tile, initiative, stereomate, sensor, platformSeries, sensorSeries, productionSeries, transferAggregate, otherAggregate}`
#### Hierarchy level name(s) (`hierarchyLevelname`)
<br>
```json
"hierarchyLevelName": [
"string"
]
```
<br>
The `hierarchyLevelName` provides an alternative to describe hierarchy levels, using free text instead of a controlled vocabulary. The use of `hierarchyLevel` is preferred to the use of `hierarchylevelName`.
#### Contact(s) (`contact`)
The `contact` element is a common element described in the ***common elements*** section of this chapter. When associated to the **metadata**, it is used to identify the person(s) or organization(s) in charge of the creation, edition, and maintenance of the <u>metadata</u>. The contact(s) responsible for the metadata are not necessarily the ones who are responsible for the dataset/service creation/edition/maintenance. The latter will be documented in the dataset identification elements of the metadata file.
#### Date stamp (`dateStamp`)
The date stamp associated to the metadata. The metadata date stamp may be automatically filled by metadata editors, and will ideally use the standard ISO 8601 date format: YYYY-MM-DD (possibly with a time).
#### Metadata standard name (`metadataStandardName`)
The name of the geographic metadata standard used to describe the resource. The recommended values are:
* in the case of vector dataset metadata: *ISO 19115 Geographic information - Metadata*
* in the case of grid/imagery dataset metadata: *ISO 19115-2 Geographic Information - Metadata Part 2 Extensions for imagery and gridded data*
* in the case of service metadata: *ISO 19119 Geographic information - Services*
#### Metadata standard version (`metadataStandardVersion`)
The version of the metadata standard being used. It is good practice to enter the standard's inception/revision year. ISO standards are revised with an average periodicity of 10-year. Although the ISO TC211 geographic information metadata standards have been reviewed, it is still accepted to refer to the original version of the standard as many information systems/catalogs still make use of that version.
The recommended values are:
* in the case of vector dataset metadata: *ISO 19115:2003*
* in the case of grid/imagery dataset metadata: *ISO 19115-2:2009*
* in the case of service metadata: *ISO 19119:2005*
#### Dataset URI (`datasetURI`)
A unique resource identifier for the dataset, such as a web link that uniquely identifies the dataset. The use of a [Digital Object Identifier (DOI)](https://www.doi.org/) is recommended.
### Main metadata sections
Geographic data can be diverse and complex. Users need detailed information to discover data and to use them in an informed and responsible manner. The core of the information on data will be provided in various *sections* of the metadata file. This will include information on the type of data, on the coordinate system being used, on the scope and coverage of the data, on the format and location of the data, on possible quality issues that users need to be aware of, and more. The table below summarizes the main metadata *sections*, by order of appearance in the ISO 19139 specification.
<br>
```json
"description": {
"spatialRepresentationInfo": [],
"referenceSystemInfo": [],
"identificationInfo": [],
"contentInfo": [],
"distributionInfo": {},
"dataQualityInfo": [],
"metadataMaintenance": {}
}
```
<br>
| Section | Description | Usability in *dataset metadata* | Usability in *service metadata* |
| --------------------------- | ------------------------------------------------------------ | ------------------------------- | ------------------------------- |
| `spatialRepresentationInfo` | The spatial representation of the dataset. Distinction is made between *vector* and *grid* (raster) spatial representations. | Yes | - |
| `referenceSystemInfo` | The reference systems used in the resource. In practice, this will often be limited to the geographic coordinate system. | Yes | Yes |
| `identificationInfo` | Identifies the resource, including descriptive elements (eg. title, purpose, abstract, keywords) and contact(s) having a role in the resource provision. See details below | Yes | Yes |
| `contentInfo` | The content of a dataset resource, i.e. how the dataset is structured (dimensions, attributes, variables, etc.). In the case of *vector* datasets, this relates to separate metadata files compliant with the ISO 19110 standard (Feature Catalogue). In the case of *raster* / *gridded* data, this is covered by the ISO 19115-2 extension for imagery and gridded data. | Yes | - |
| `distributionInfo` | The mode(s) of distribution of the resource (format, online resources), and by whom it is distributed. | Yes | Yes |
| `dataQualityInfo` | The quality reports on the resource (dataset or service), and in case of *datasets*, on the provenance / lineage information giving the process steps performed to obtain the dataset resource. | Yes | Yes |
| `metadataMaintenanceInfo` | The metadata maintenance cycle operated for the resource. | Yes | Yes |
These sections are described in more detail below.
#### Spatial representation (`spatialRepresentationInfo`)
<br>
```json
"spatialRepresentationInfo": [
{
"vectorSpatialRepresentation": {
"topologyLevel": "string",
"geometricObjects": [
{
"geometricObjectType": "string",
"geometricObjectCount": 0
}
]
},
"gridSpatialRepresentation": {
"numberOfDimensions": 0,
"axisDimensionProperties": [
{
"dimensionName": "string",
"dimensionSize": 0,
"resolution": 0
}
],
"cellGeometry": "string",
"transformationParameterAvailability": true
}
}
]
```
<br>
Information on the spatial representation is critical to properly describe a geospatial dataset. The ISO/TS 19139 distinguishes two types of spatial representations, characterized by different properties.
The **vector spatial representation** describes the topology level and the geometric objects of **vector datasets** using the following two properties:
- **Topology level** (`topologyLevel`) is the type of topology used in the vector spatial dataset. The ISO 19139 provides a controlled vocabulary with the following options: `{geometryOnly, topology1D, planarGraph, fullPlanarGraph, surfaceGraph, fullSurfaceGraph, topology3D, fullTopology3D, abstract}`. In most cases, vector datasets will be described as ``geometryOnly`` which covers common geometry types (points, lines, polygons).
- **Geometric objects** (`geometricObjects`) will define:
- Geometry type (`geometricObjectType`): The type of geometry handled. Possible values are: `{complex, composite, curve, point, solid, surface}`.
- Geometry count (`geometricObjectCount`): The number (count) of geometries in the dataset.
In the case of an homogeneous geometry type, a single `geometricObjects`element can be defined. For complex geometries (mixture of various geometry types), one `geometricObjects` element will be defined for each geometry type.
The **grid spatial representation** describes gridded (raster) data using the following three properties:
- **Number of dimensions** (`numberOfDimensions`) in the grid.
- **Axis dimension properties** (`axisDimensionProperties`): a list of each dimension including, for each dimension:
- The name of the dimension type (`dimensionName`): the ISO 19139 provides a controlled vocabulary with the following options: `{row, column, vertical, track, crossTrack, line, sample, time}`. These options represent the following:
- row: ordinate (y) axis
- column: abscissa (x) axis
- vertical: vertical (z) axis
- track: along the direction of motion of the scan point
- crossTrack: perpendicular to the direction of motion of the scan point
- line: scan line of a sensor
- sample: element along a scan line
- time: duration
In the Ethiopia population density file we used as an example of raster data, the types of dimensions will be row and column as the file is a spatial 2D raster. If we had a data with elevation or time dimensions, we would use respectively "vertical" and "time" dimension as name types.
- The dimension size (`dimensionSize`): the length of the dimension.
- The dimension resolution: a resolution number associated to a unit of measurement. This is the resolution of the grid cell dimension. For example:
- for longitude/latitude dimensions, and a grid at 1deg x 5deg, the 'row' dimension will have a resolution of 1 deg, and the 'column' dimension will have a resolution of 5 deg
- for a "vertical" dimension, this will represent the elevation step. For example, the vertical resolution of the mean Ozone concentration between 40m and 50m altitude at a location of longitude x/ latitude y would be 10 m.
- similar: in case of a spatial-temporal grid, the "time" resolution will represent the time lag (e.g., 1 year, 1 month, 1 week, etc.) between two measures.<br><br>
- **Cell geometry type** (`cellGeometry`): The type of geometry used for grid cells. Possible values are: `{point, area, voxel, stratum}` Most "grids" are commonly area-based, but in principle a grid goes beyond this and the grid cells can target a point, an area, or a volume.
- point: each cell represents a point
- area: each cell represents an area
- voxel: each cell represents a volumetric measurement on a regular grid in a three dimensional space
- stratum: height range for a single point vertical profile
#### Reference system(s) (`referenceSystemInfo`)
The reference system(s) typically (but not necessarily) applies to the **geographic reference system** of the dataset. Multiple reference systems can be listed if a dataset is distributed with different spatial reference systems. This block of elements may also apply to *service metadata*. A spatial web-service may support several map projections / geographic coordinate reference systems.
<br>
```json
"referenceSystemInfo": [
{
"code": "string",
"codeSpace": "string"
}
]
```
<br>
A reference system is defined by two properties:
- the **identifier** of the reference system. The recommended practice is to use to the `Spatial Reference IDentifier` (SRID) number. For example, the SRID of the World Geodetic System (WGS 84) is *4326*.
- the **code space** of the source authority providing the SRID. The best practice is to use the [EPSG](https://epsg.org/home.html) authority code `EPSG` (as most of geographic reference systems are registered in it). Codes from other authorities can be used to define ad-hoc projections, for example:
- ESRI:54012 (Eckert IV equal area projection)
- EPSG:4326 ([World Geodetic System 84 - aka WGS84](https://epsg.org/crs_4326/WGS-84.html)), the system used for GPS
- EPSG:3857 ([Web Mercator / Pseudo-Mercator](https://epsg.org/crs_3857/WGS-84-Pseudo-Mercator.html)) - widely used for map visualization from web map tile providers.
The main reference system registry is [EPSG](https://epsg.org/search/by-name), which provides a "search by name" tool for users who need to find a SRID (global or local/country-specific). Other websites reference geographic systems, but are not authoritative sources including http://epsg.io/ and https://spatialreference.org/ The advantage of these sites is that they go beyond the EPSG registry, and handle other specific registries given by providers like ESRI.
The following ESRI projections could be relevant, in particular those in support of world equal-area projected maps (maps conserving area proportions):
- [ESRI:54012 (Eckert IV)](https://epsg.io/54012)
- [ESRI:54009 (Mollweide)](https://epsg.io/54009)
- [ESRI:54030 (Robinson)](https://epsg.io/54030)
#### Identification (`identificationInfo`)
The **identification information** (`identificationInfo`) is where the citation elements of the resource will be provided. This may include descriptive information like `title`, `abstract`, `purpose`, `keywords`, etc., and identification of the parties/contact(s) associated with the resource, such as the *owner*, *publisher*, *co-authors*, etc. Providing and publishing detailed information in these elements will contribute significantly to improving the discoverability of the data.
<br>
```json
"identificationInfo": [
{
"citation": {},
"abstract": "string",
"purpose": "string",
"credit": "string",
"status": "string",
"pointOfContact": [],
"resourceMaintenance": [],
"graphicOverview": [],
"resourceFormat": [],
"descriptiveKeywords": [],
"resourceConstraints": [],
"resourceSpecificUsage": [],
"aggregationInfo": {},
"extent": {},
"spatialRepresentationType": "string",
"spatialResolution": {},
"language": [],
"characterSet": [],
"topicCategory": [],
"supplementalInformation": "string",
"serviceIdentification": {}
}
]
```
<br>
The identification of a resource includes elements that are common to both *datasets* and *data services*, and others that are specific to the type of resource. The following table summarizes the **identification elements** that can be used for *dataset*, *service*, or both.
**Identification elements applicable to datasets and data services**
The following metadata elements apply to resources of type **dataset** and **service**.
| Element | Description |
| ----------------------- | ------------------------------------------------------------ |
| `citation` | A <u>citation</u> set of elements that will describe the dataset/service from a citation perspective, including `title`, associated contacts, etc. For more details, see section on ***common elements*** |
| `abstract` | An abstract for the dataset/service resource |
| `purpose` | A statement describing the purpose of the dataset/service resource |
| `credit` | Credit information. |
| `status` | Status of the resource, with the following recommended controlled vocabulary: `{completed, historicalArchive, obsolete, onGoing, planned, required, underDevelopment, final, pending, retired, superseded, tentative, valid, accepted, notAccepted, withdrawn, proposed, deprecated}` |
| `pointOfContact` | One ore more points of <u>contacts</u> to associate with the resource. People that can be contacted for information on the dataset/service. For more details, see section `contact` in the ***common elements*** section of the chapter. |
| `resourceMaintenance` | Information on how the resource is maintained, essentially informing on the maintenance and update frequency (`maintenanceAndUpdateFrequency`). This frequency should be chosen among possible values recommended by the ISO 19139 standard: `{continual, daily, weekly, fortnightly, monthly, quarterly, biannually, annually, asNeeded, irregular, notPlanned, unknown}`. |
| `graphicOverview` | One or more graphic overview(s) that provide a visual identification of the dataset/service. e.g., a link to a map overview image. A `graphicOverview` will be defined with 3 properties `fileName` (or URL), `fileDescription`, and optionally a `fileType`. |
| `resourceFormat` | Resource format(s) description. For more details on how to describe a <u>format</u>, see the ***common elements*** section of the chapter. |
| `descriptiveKeywords` | A set of keywords that describe the dataset. Keywords are grouped by keyword type, with the possibility to associate a thesaurus (if applicable). For more details how to describe <u>keywords</u>, see the ***common elements*** section of the chapter.|
| `resourceConstraints` | Legal and/or Security *constraints* associated to the resource. For more details how to describe <u>constraints</u>, see the ***common elements*** section of the chapter|
| `resourceSpecificUsage` | Information about specific usage(s) of the dataset/service, e.g., a research paper, a success story, etc. |
| `aggregationInfo` | Information on an aggregate or parent resource to which the resource belongs, i.e. a collection. |
<br>
Resource maintenance
<br>
```json
"resourceMaintenance": [
{
"maintenanceAndUpdateFrequency": "string"
}
]
```
<br>
Graphic overview
<br>
```json
"graphicOverview": [
{
"fileName": "string",
"fileDescription": "string",