@@ -939,6 +939,103 @@ import tool with be requested manually in #ingest-to-tdr-shared Slack channel
+ Ontologies
+ ==========
+
+ The `HCA Metadata Schema`_ designates certain document properties as
+ ontologized. An *ontologized property* (OP) contains a JSON object referencing a
+ term in an ontology that is hosted externally, outside of the DCP/2. The shape
+ of that JSON object is specified by one of the `ontology modules`_ of the `HCA
+ Metadata Schema`_. All such modules specify at least the following three child
+ properties:
+
+ ``ontology``
+     optional; the stable and unique identifier of an ontology term
+
+ ``ontology_label``
+     optional; a human-readable description of the term referred to by the
+     ``ontology`` child property
+
+ ``text``
+     required; a human-readable description to fall back on should no term exist
+
+ .. _ontology modules: https://github.com/HumanCellAtlas/metadata-schema/tree/master/json_schema/module/ontology
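As a rough illustration of the shape described above, the sketch below models an OP as a plain Python dictionary and checks the three child properties. The helper name ``check_op``, the sample identifier and the sample labels are assumptions made for illustration, as is the choice to treat all three values as strings. Real ontology modules may define further properties and constraints.

```python
# Illustrative sketch only: `check_op` is a hypothetical helper, and the sample
# term identifier and labels below are placeholders, not real ontology terms.

def check_op(op: dict) -> None:
    """Check the three child properties that every ontology module specifies."""
    # `text` is required (assumed here to be a string)
    if not isinstance(op.get('text'), str):
        raise ValueError("OP must have a 'text' child property")
    # `ontology` and `ontology_label` are optional
    for name in ('ontology', 'ontology_label'):
        if name in op and not isinstance(op[name], str):
            raise ValueError(f'{name!r} must be a string when present')


# An OP that references a term (identifier and labels are placeholders)
op_with_term = {
    'ontology': 'EXAMPLE:0000001',
    'ontology_label': 'example term',
    'text': 'example term',
}

# An OP for which no term exists yet
op_without_term = {'text': 'some hypothetical term'}

check_op(op_with_term)
check_op(op_without_term)
```

Both sample objects pass the check, while an object lacking ``text`` would be rejected.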
+
+
+ Rules for producers
+ -------------------
+
+ When setting an OP in a metadata document, producers of metadata should
+ select the most specific ontology term currently available that best describes
+ the experimental facts and satisfies the requirements of the ontology module
+ governing the OP.
+
+ A) If a sufficiently specific match is found, the producer
+
+    - sets the ``ontology`` child property of the OP to the identifier of the
+      selected term and
+
+    - sets the ``ontology_label`` and ``text`` child properties to the label
+      of the selected term.
+
+    The label of an ontology term can change over time. The producer must
+    refresh the ``ontology_label`` and ``text`` child properties whenever the
+    document is updated for any reason, but is not required to update the
+    document solely because the label changed.
+
+ B) If no sufficiently specific term exists, but a more general one does, the
+    producer
+
+    - sets the ``ontology`` child property of the OP to the identifier of the more
+      general term,
+
+    - sets the ``ontology_label`` child property to the label of that term and
+
+    - sets the ``text`` child property of the OP to what they expect the label
+      of a hypothetical exact match would be.
+
+    The producer initiates the process of adding that expected term to the
+    ontology. After that term has been added, the producer updates the
+    document as described under A).
+
+ C) Otherwise, the producer
+
+    - omits the ``ontology`` and ``ontology_label`` child properties of the OP
+      and
+
+    - sets the ``text`` child property of the OP to what they expect the
+      label of a hypothetical term would be if it existed.
+
+    The producer initiates the process of adding that expected term to the
+    ontology. After that term has been added, the producer updates the
+    document as described under A).
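The three scenarios above can be sketched as a single function. This is a minimal sketch and not an official implementation: ``build_op`` and its parameters are hypothetical names, and a term is assumed to be available as an (identifier, label) pair. Requesting that a missing term be added to the ontology is a manual process and is not modelled here.

```python
from typing import Optional, Tuple

# A term is modelled as an (identifier, label) pair; an assumption of this sketch.
Term = Tuple[str, str]


def build_op(exact: Optional[Term],
             general: Optional[Term],
             expected_label: str) -> dict:
    """
    Hypothetical helper that assembles an OP following scenarios A, B and C.

    :param exact: a sufficiently specific term, if one was found
    :param general: a more general term, if one exists
    :param expected_label: the label the producer expects an exact match to have
    """
    if exact is not None:
        # Scenario A: the term's label goes into both label properties
        identifier, label = exact
        return {'ontology': identifier, 'ontology_label': label, 'text': label}
    elif general is not None:
        # Scenario B: general term, with the hypothetical label in `text`
        identifier, label = general
        return {'ontology': identifier, 'ontology_label': label, 'text': expected_label}
    else:
        # Scenario C: no term at all, only the hypothetical label
        return {'text': expected_label}
```

Once the requested term has been added to the ontology, calling the function again with that term as ``exact`` yields the scenario A form of the OP.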
+
+
+ Rules for consumers
+ -------------------
+
+ When reading an ontologized property (OP) in a metadata document, consumers of
+ metadata should read the ``ontology`` child property of the OP, if that child
+ property is present. If a description of the term in English (or any other
+ language supported by the ontology) is needed, the consumer should look that
+ description up in the ontology API referred to by the module governing the OP,
+ using the term identifier in the ``ontology`` child property. If a lookup is not
+ possible for technical reasons, the consumer should read the ``text`` child
+ property if present or the ``ontology_label`` otherwise. If both are absent, the
+ consumer should raise an error.
+
+ If the ``ontology`` child property is absent, the consumer instead reads the
+ ``text`` child property of the OP.
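The consumer rules above can be sketched as follows. This is a minimal sketch under stated assumptions: ``describe_op`` is a hypothetical helper, ``lookup`` stands in for a call to whichever ontology API the governing module refers to, and a lookup that fails for technical reasons is assumed to raise ``OSError``.

```python
from typing import Callable


def describe_op(op: dict, lookup: Callable[[str], str]) -> str:
    """
    Hypothetical helper returning a human-readable description of an OP.

    :param op: the JSON object of the OP, as a dictionary
    :param lookup: stand-in for the ontology API; takes a term identifier and
                   returns its description. Assumed to raise ``OSError`` when
                   the lookup is not possible for technical reasons.
    """
    if 'ontology' in op:
        try:
            # Preferred path: resolve the description from the term identifier
            return lookup(op['ontology'])
        except OSError:
            # Lookup failed: fall back to `text`, then `ontology_label`
            for name in ('text', 'ontology_label'):
                if name in op:
                    return op[name]
            raise ValueError('OP has neither text nor ontology_label')
    else:
        # No term identifier: read `text` instead
        return op['text']
```

Passing the lookup in as a parameter keeps the sketch independent of any particular ontology service and makes the fallback path easy to exercise with a function that always fails.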
+
+ |nn| Under the above rules, if an OP was set under scenario B, consumers will
+ ignore the hypothetical label. This leads to a more consistent user experience:
+ there is no guarantee that different wranglers come up with the same
+ hypothetical term, and we don't want the UX to suffer in that case, considering
+ that there is at least a partial match available. If an OP was set under
+ scenario C, the hypothetical term label is the best we have. In both scenarios
+ the producer must update the document once the term becomes available, so the
+ degraded UX is only temporary. |ne|
+
Project-level matrices
======================