@@ -37,9 +37,13 @@ EBI Ingest (the primary channel for incorporating projects into the DCP/2), an
37
37
adapter for processing analysis (meta)data from Terra workspaces, and adapters
38
38
for high-level matrix data from a range of sources.
39
39
40
- All metadata is in JSON format and complies with the HCA Metadata Schema.
41
- Aside from minor schema changes that were necessary for processing the staging
42
- areas, the evolution of the schema is currently on hold.
40
+ All metadata is in JSON format and complies with the `HCA Metadata Schema`_.
41
+ Changes to that schema are made according to standard `DCP/2 operating
42
+ procedures`_.
43
+
44
+ .. _HCA Metadata Schema: https://github.com/HumanCellAtlas/metadata-schema
45
+
46
+ .. _DCP/2 operating procedures: dcp2_operating_procedures.rst
43
47
44
48
The DCP/2 only contains public (meta)data (not controlled access).
45
49
@@ -309,7 +313,7 @@ follows:
309
313
UUID. Pick the row with the highest version.
310
314
311
315
ii. read the ``inputs``, ``outputs`` and ``protocols`` properties (they're
312
- all lists).
316
+ all lists).
313
317
314
318
For each input, output and protocol, extract the schema type and
315
319
entity ID. Query the TDR table that corresponds to the schema type and
@@ -453,7 +457,7 @@ mapping between the two. Similarly, instead of allocating a random UUIDv4 for
453
457
the descriptor ``file_id`` one could also derive a UUIDv5 from the SHA-1 or
454
458
SHA-256 hashes of the data file's content.
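As an illustration of the content-derived alternative mentioned above, here is a minimal sketch in Python. The namespace UUID and the function name are assumptions made for this example; the text does not specify either.

```python
import hashlib
import uuid

# Hypothetical namespace for the example only; the document does not
# prescribe which namespace a content-derived file_id would use.
HYPOTHETICAL_NAMESPACE = uuid.NAMESPACE_OID


def file_id_from_content(content: bytes, algorithm: str = 'sha256') -> uuid.UUID:
    """
    Derive a deterministic UUIDv5 from the hex digest of the file content.
    `algorithm` may be 'sha1' or 'sha256'.
    """
    digest = hashlib.new(algorithm, content).hexdigest()
    return uuid.uuid5(HYPOTHETICAL_NAMESPACE, digest)
```

Unlike a random UUIDv4, the same content always yields the same identifier under this scheme, as long as the namespace and hash algorithm are held fixed.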
455
459
456
- .. [#]
460
+ .. [#]
457
461
If a file is referenced by multiple bundles using different file names, the
458
462
DSS adapter stages multiple objects with the same content. This case occurs
459
463
in the wild, but is of negligible impact (< 1% in volume, zarr store
@@ -503,7 +507,7 @@ files to an ``analysis_process`` in the ``links`` table (`metadata-schema
503
507
Naming datasets and snapshots
504
508
-----------------------------
505
509
506
- |nn|
510
+ |nn|
507
511
508
512
This section contains specific details that anticipate that the DCP/2
509
513
will soon need to support multiple snapshots of per catalog, at least one per
@@ -528,7 +532,7 @@ snapshots:
528
532
labelling, sorting and filtering are available when listing datasets and
529
533
snapshots using the TDR API. Additionally, IDs are hard to read to the
530
534
human eye, and hard to distinguish visually, so as long as we manually
531
- confer them between teams, names are preferred.
535
+ confer them between teams, names are preferred.
532
536
533
537
|ne|
534
538
@@ -655,7 +659,7 @@ descriptors, one for metadata files and one for ``links.json`` files.
655
659
656
660
where
657
661
658
- ``entity_type``
662
+ ``entity_type``
659
663
is the `HCA schema entity type`_ such as ``cell_suspension``.
660
664
661
665
``entity_id``
@@ -695,7 +699,7 @@ descriptors, one for metadata files and one for ``links.json`` files.
695
699
696
700
where
697
701
698
- ``file_name``
702
+ ``file_name``
699
703
is the ``file_name`` property from the file descriptor object for this
700
704
data file.
701
705
@@ -705,7 +709,7 @@ descriptors, one for metadata files and one for ``links.json`` files.
705
709
706
710
where
707
711
708
- ``links_id``
712
+ ``links_id``
709
713
is a UUID that uniquely identifies the subgraph. The DSS adapter uses the
710
714
bundle UUID.
711
715
@@ -1061,19 +1065,19 @@ EBNF/Regex, starting at the ``strata`` non-terminal::
1061
1065
strata = "" | stratum , { "\n" , stratum }
1062
1066
1063
1067
stratum = point , { ";" , point }
1064
-
1068
+
1065
1069
point = dimension , "=" , values
1066
-
1070
+
1067
1071
dimension = "genusSpecies" | "organ" | "developmentStage" | "libraryConstructionApproach"
1068
-
1072
+
1069
1073
values = value , { "," , value }
1070
-
1074
+
1071
1075
value = [^\n;=,]+
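The grammar above can be exercised with a small parser. The following is a sketch, not the Azul implementation: the function name and the return shape (a list of strata, each a list of ``(dimension, values)`` pairs) are assumptions chosen for the example.

```python
import re
from typing import List, Tuple

DIMENSIONS = {
    'genusSpecies',
    'organ',
    'developmentStage',
    'libraryConstructionApproach',
}

# Mirrors the `value` production: one or more characters other than
# newline, semicolon, equals sign and comma.
VALUE_RE = re.compile(r'[^\n;=,]+')

Point = Tuple[str, List[str]]


def parse_strata(strata: str) -> List[List[Point]]:
    """
    Parse a string matching the `strata` non-terminal. Raises ValueError
    on input that does not conform to the grammar.
    """
    if strata == '':
        return []
    result = []
    for stratum in strata.split('\n'):
        points = []
        for point in stratum.split(';'):
            dimension, sep, values = point.partition('=')
            if sep != '=' or dimension not in DIMENSIONS:
                raise ValueError(f'Invalid point: {point!r}')
            parsed = values.split(',')
            for value in parsed:
                if VALUE_RE.fullmatch(value) is None:
                    raise ValueError(f'Invalid value: {value!r}')
            points.append((dimension, parsed))
        result.append(points)
    return result
```

Splitting on the delimiters first and validating each value afterwards is safe here because the `value` production excludes all four delimiter characters.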
1072
1076
1073
1077
Examples:
1074
1078
1075
- - Not stratified::
1076
-
1079
+ - Not stratified::
1080
+
1077
1081
""
1078
1082
1079
1083
- Stratified::
@@ -1231,8 +1235,8 @@ information about CGMs.
1231
1235
CGM in the deprecated mechanism (`Describing CGMs as supplementary files`_)
1232
1236
1233
1237
the ``analysis_protocol`` contains an optional ``matrix`` module schema
1234
- containing the properties ``data_normalization_methods`` and
1235
- ``derivation_process``
1238
+ containing the properties ``data_normalization_methods`` and
1239
+ ``derivation_process``
1236
1240
1237
1241
Traversing the approximate CGM subgraphs, the Azul indexer infers a
1238
1242
stratification tree of exactly the same structure as the one it derives from
@@ -1242,7 +1246,7 @@ mechanism (`Describing CGMs as supplementary files`_). The Data Browser
1242
1246
exposes that tree in the same manner on the project details page. The inferral
1243
1247
algorithm is identical to the one used for ``DCP/2-generated matrices`` with
1244
1248
the one distinction that the subgraphs in the latter are exact, not
1245
- approximate.
1249
+ approximate.
1246
1250
1247
1251
Additionally, the CGM analysis files are listed on the Files tab of the Data
1248
1252
Browser.