Commit fd79ddc

Specify the production/consumption of ontologized properties (#13)
1 parent f25f94c commit fd79ddc

1 file changed (+97, -0 lines)


docs/dcp2_system_design.rst

@@ -989,6 +989,103 @@ Broad Institute).

Ontologies
==========

The `HCA Metadata Schema`_ designates certain document properties as
ontologized. An *ontologized property* (OP) contains a JSON object referencing a
term in an ontology that is hosted outside of the DCP/2. The shape of that JSON
object is specified by one of the `ontology modules`_ of the `HCA Metadata
Schema`_. All such modules specify at least the following three child
properties:

``ontology``
    optional; the stable and unique identifier of an ontology term

``ontology_label``
    optional; a human-readable description of the term referred to by the
    ``ontology`` child property

``text``
    required; a human-readable description to fall back on should no term exist

.. _ontology modules: https://github.com/HumanCellAtlas/metadata-schema/tree/master/json_schema/module/ontology

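
For illustration, the following sketch shows an OP as it might appear in a
metadata document, along with a minimal check of the three child properties
listed above. The term ``UBERON:0000955`` (brain) is only an example, and
``check_op`` is a hypothetical helper. The authoritative constraints are those
in the JSON schema of the governing ontology module, which this check does not
replace.

.. code-block:: python

   from typing import Any, Mapping


   def check_op(op: Mapping[str, Any]) -> None:
       """Check the child properties shared by all ontology modules.

       Hypothetical helper: it checks only the three properties listed above,
       not the full JSON schema of the module governing the OP.
       """
       # ``text`` is required and must be a non-empty string.
       text = op.get("text")
       if not isinstance(text, str) or not text:
           raise ValueError("OP lacks the required 'text' child property")
       # ``ontology`` and ``ontology_label`` are optional, but must be strings
       # when present.
       for key in ("ontology", "ontology_label"):
           if key in op and not isinstance(op[key], str):
               raise ValueError(f"OP child property {key!r} must be a string")


   # An example OP; the term identifier and label are illustrative.
   example_op = {
       "ontology": "UBERON:0000955",
       "ontology_label": "brain",
       "text": "brain",
   }
   check_op(example_op)  # raises nothing
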


Rules for producers
-------------------

When setting an OP in a metadata document, producers of metadata should
select the most specific ontology term currently available that best describes
the experimental facts and satisfies the requirements of the ontology module
governing the OP.

A) If a sufficiently specific match is found, the producer

   - sets the ``ontology`` child property of the OP to the identifier of the
     selected term and

   - sets the ``ontology_label`` and ``text`` child properties to the label
     of the selected term.

   The label of an ontology term can change over time. Whenever the producer
   updates the document, they must bring the ``ontology_label`` and ``text``
   child properties up to date with the current label. A label change alone
   does not require an update of the document.

B) If no sufficiently specific term exists, but a more general one does, the
   producer

   - sets the ``ontology`` child property of the OP to the identifier of the
     more general term,

   - sets the ``ontology_label`` child property to the label of that term and

   - sets the ``text`` child property of the OP to the label they expect a
     hypothetical exact match to have.

   The producer initiates the process of adding that expected term to the
   ontology. After that term has been added, the producer updates the
   document as described under A).

C) Otherwise, the producer

   - omits the ``ontology`` and ``ontology_label`` child properties of the OP
     and

   - sets the ``text`` child property of the OP to the label they expect the
     missing term to have once it exists.

   The producer initiates the process of adding that expected term to the
   ontology. After that term has been added, the producer updates the
   document as described under A).

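
The three scenarios above amount to a single decision, sketched here under
the assumption that the producer has already searched the ontology and found
at most a sufficiently specific term and/or a more general one. The ``Term``
type and the ``build_op`` function are hypothetical. The term ``CL:0000540``
(neuron), the expected label and the placeholder identifier
``EXAMPLE:0000001`` are examples. The process of requesting a new term is not
shown.

.. code-block:: python

   from dataclasses import dataclass
   from typing import Optional


   @dataclass(frozen=True)
   class Term:
       """An ontology term found by the producer (hypothetical type)."""

       identifier: str
       label: str


   def build_op(
       exact: Optional[Term], general: Optional[Term], expected_label: str
   ) -> dict:
       """Build an OP according to scenarios A, B and C.

       ``expected_label`` is the label the producer expects a sufficiently
       specific term to have; it is only used when no such term exists yet.
       """
       if exact is not None:
           # A) Both labels are copied from the selected term. Calling this
           # again on every document update keeps them current.
           return {
               "ontology": exact.identifier,
               "ontology_label": exact.label,
               "text": exact.label,
           }
       if general is not None:
           # B) Reference the more general term, but record the expected
           # label of the missing term in ``text``.
           return {
               "ontology": general.identifier,
               "ontology_label": general.label,
               "text": expected_label,
           }
       # C) No usable term: only ``text`` is set.
       return {"text": expected_label}


   neuron = Term("CL:0000540", "neuron")
   op_b = build_op(None, neuron, "hypothetical neuron subtype")
   op_c = build_op(None, None, "hypothetical neuron subtype")
   # Once the expected term has been added to the ontology, the producer
   # rebuilds the OP under scenario A (placeholder identifier).
   op_a = build_op(
       Term("EXAMPLE:0000001", "hypothetical neuron subtype"), neuron, ""
   )

In this sketch an exact match always takes precedence, so passing the more
general term alongside it is harmless.
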


Rules for consumers
-------------------

When reading an OP in a metadata document, consumers of metadata should read
the ``ontology`` child property of the OP, if that child property is present.
If a description of the term in English (or any other language supported by
the ontology) is needed, the consumer should look that description up in the
ontology API referred to by the module governing the OP, using the term
identifier in the ``ontology`` child property. If a lookup is not possible for
technical reasons, the consumer should read the ``text`` child property if
present, or the ``ontology_label`` child property otherwise. If both are
absent, the consumer should raise an error.

If the ``ontology`` child property is absent, the consumer instead reads the
``text`` child property of the OP.

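
A minimal sketch of these consumer rules follows. ``describe_op`` is a
hypothetical function, and ``fetch_description`` is a hypothetical callable
standing in for the ontology API referenced by the governing module. It is
assumed to raise ``OSError`` (for example ``ConnectionError``) when a lookup is
not possible for technical reasons. The fallback order follows the rules as
written above.

.. code-block:: python

   from typing import Callable, Mapping


   def describe_op(
       op: Mapping[str, str], fetch_description: Callable[[str], str]
   ) -> str:
       """Return a human-readable description of an OP for display."""
       term_id = op.get("ontology")
       if term_id is not None:
           try:
               # Preferred: look the term up in the ontology itself.
               return fetch_description(term_id)
           except OSError:
               # Lookup impossible for technical reasons: fall back to the
               # labels stored in the document, ``text`` first.
               for key in ("text", "ontology_label"):
                   if key in op:
                       return op[key]
               raise ValueError("OP has neither 'text' nor 'ontology_label'")
       # No ``ontology`` child property: ``text`` is all there is.
       if "text" not in op:
           raise ValueError("OP has no 'text' child property")
       return op["text"]


   def offline(term_id: str) -> str:
       """Hypothetical lookup that always fails, e.g. while the API is down."""
       raise ConnectionError("ontology API unreachable")


   online = {"CL:0000540": "neuron"}.__getitem__  # hypothetical lookup table
   op_b = {  # an OP set under scenario B
       "ontology": "CL:0000540",
       "ontology_label": "neuron",
       "text": "hypothetical neuron subtype",
   }
   describe_op(op_b, online)  # 'neuron': the hypothetical label is ignored
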

|nn| Under the above rules, if an OP was set under scenario B, consumers will
ignore the hypothetical label and use the description of the more general term
instead. This leads to a more consistent user experience. There is no
guarantee that different wranglers come up with the same hypothetical label,
and we don't want the UX to suffer in that case, given that at least a partial
match is available. If an OP was set under scenario C, the hypothetical label
is the best we have. In both scenarios the producer must update the document
once the term becomes available, so the degraded UX is only temporary. |ne|

Project-level matrices
======================
