Commit 9491dd6

Specify the production/consumption of ontologized properties (#13)
1 parent 685ce17 commit 9491dd6

1 file changed

+99
-2
lines changed

docs/dcp2_system_design.rst

@@ -733,7 +733,7 @@ descriptors, one for metadata files and one for subgraphs.
733733
of one entity to coexist in a non-delta staging area. A delta staging area,
734734
on the other hand, must contain at most one object with a given
735735
``entity_id``, and therefore only one version of that entity.
736-
736+
737737

738738
The ``.remove`` suffix is used to request the removal of an entity. It can
739739
only be used in staging areas that have the ``is_delta`` property set to
@@ -1134,7 +1134,7 @@ staging areas may contain updates is for backwards compatibility: The DCP
11341134
already utilized this functionality before this section of the specification was
11351135
written. |ne|
11361136

1137-
|nn| It may be tempting to reuse an existing staging area after it has been
1137+
|nn| It may be tempting to reuse an existing staging area after it has been
11381138
imported so as to avoid having to repopulate a completely new staging area for
11391139
the next import. For non-delta staging areas this can be a good strategy. For
11401140
delta staging areas it usually isn't because delta staging areas can only
@@ -1443,6 +1443,103 @@ row and finally soft-deleting any unmarked rows. |ne|
14431443

14441444

14451445

1446+
Ontologies
1447+
==========
1448+
1449+
The `HCA Metadata Schema`_ designates certain document properties as
1450+
ontologized. An *ontologized property* (OP) contains a JSON object referencing a
1451+
term in an ontology that is hosted externally, outside of the DCP/2. The shape
1452+
of that JSON object is specified by one of the `ontology modules`_ of the `HCA
1453+
Metadata Schema`_. All such modules specify at least the following three child
1454+
properties:
1455+
1456+
``ontology``
1457+
optional; the stable and unique identifier of an ontology term
1458+
1459+
``ontology_label``
1460+
optional; a human-readable description of the term referred to by the
1461+
``ontology`` child property
1462+
1463+
``text``
1464+
required; a human-readable description to fall back on should no term exist
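
As an illustration only, the three child properties listed above could be represented and checked as follows. This is a minimal sketch, not part of the schema: ``check_op`` and ``EXAMPLE_OP`` are hypothetical names, the example values are chosen for illustration, and the sketch assumes that all three child properties hold strings. Because the ontology modules may define further child properties, unknown keys are not rejected.

```python
# Hypothetical example of an ontologized property (OP); values are illustrative.
EXAMPLE_OP = {
    "ontology": "UBERON:0002107",   # optional: identifier of the ontology term
    "ontology_label": "liver",      # optional: label of that term
    "text": "liver",                # required: human-readable fallback
}

KNOWN_CHILD_PROPERTIES = ("ontology", "ontology_label", "text")


def check_op(op: dict) -> None:
    """Raise ValueError if `op` lacks `text` or if a known child
    property is present but not a string (an assumption of this sketch)."""
    if "text" not in op:
        raise ValueError("missing required child property 'text'")
    for name in KNOWN_CHILD_PROPERTIES:
        if name in op and not isinstance(op[name], str):
            raise ValueError(f"child property {name!r} must be a string")


check_op(EXAMPLE_OP)          # passes: all three child properties present
check_op({"text": "liver"})   # passes: only the required one is present
```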
1465+
1466+
.. _ontology modules: https://github.com/HumanCellAtlas/metadata-schema/tree/master/json_schema/module/ontology
1467+
1468+
1469+
Rules for producers
1470+
-------------------
1471+
1472+
When setting an OP in a metadata document, producers of metadata should
1473+
select the most specific ontology term currently available that best describes
1474+
the experimental facts and satisfies the requirements of the ontology module
1475+
governing the OP.
1476+
1477+
A) If a sufficiently specific match is found, the producer
1478+
1479+
- sets the ``ontology`` child property of the OP to the identifier of the
1480+
selected term and
1481+
1482+
- sets the ``ontology_label`` and ``text`` child properties to the label
1483+
of the selected term.
1484+
1485+
The label of an ontology term can change over time. The producer must keep
1486+
the ``ontology_label`` and ``text`` child properties up to date whenever the
1487+
document is updated. There is no requirement to update the document whenever
1488+
the label changes.
1489+
1490+
B) If no sufficiently specific term exists, but a more general one does, the
1491+
producer
1492+
1493+
- sets the ``ontology`` child property of the OP to the identifier of the more
1494+
general term,
1495+
1496+
- sets the ``ontology_label`` child property to the label of that term and
1497+
1498+
- sets the ``text`` child property of the OP to what they expect the label
1499+
of a hypothetical exact match would be.
1500+
1501+
The producer initiates the process of adding that expected term to the
1502+
ontology. After that term has been added, the producer updates the
1503+
document as described under A).
1504+
1505+
C) Otherwise, the producer
1506+
1507+
- omits the ``ontology`` and ``ontology_label`` child properties of the OP
1508+
and
1509+
1510+
- sets the ``text`` child property of the OP to what they expect the
1511+
label of a hypothetical term would be if it existed.
1512+
1513+
The producer initiates the process of adding that expected term to the
1514+
ontology. After that term has been added, the producer updates the
1515+
document as described under A).
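
The three producer scenarios can be sketched in code. This is a hedged illustration, not an implementation used by the DCP/2: ``Term``, ``make_op`` and their parameters are hypothetical, and the sketch assumes the producer has already searched the ontology and knows whether an exact or a more general term exists. Requesting the missing term and later updating the document are outside its scope.

```python
from typing import NamedTuple, Optional


class Term(NamedTuple):
    """Hypothetical stand-in for an ontology term found by the producer."""
    id: str
    label: str


def make_op(expected_label: str,
            exact: Optional[Term] = None,
            general: Optional[Term] = None) -> dict:
    """Build the JSON object for an OP following scenarios A, B and C."""
    if exact is not None:
        # Scenario A: a sufficiently specific term exists; both
        # `ontology_label` and `text` carry the label of that term.
        return {"ontology": exact.id,
                "ontology_label": exact.label,
                "text": exact.label}
    if general is not None:
        # Scenario B: only a more general term exists; `text` carries the
        # label the producer expects a hypothetical exact match to have.
        return {"ontology": general.id,
                "ontology_label": general.label,
                "text": expected_label}
    # Scenario C: no usable term; `ontology` and `ontology_label` are omitted.
    return {"text": expected_label}
```

In scenarios B and C the producer would still need to initiate the addition of the expected term and, once it exists, call the function again with ``exact`` set, which yields the scenario A result.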
1516+
1517+
1518+
Rules for consumers
1519+
-------------------
1520+
1521+
When reading an ontologized property (OP) in a metadata document, consumers of
1522+
metadata should read the ``ontology`` child property of the OP, if that child
1523+
property is present. If a description of the term in English (or any other
1524+
language supported by the ontology) is needed, the consumer should look that
1525+
description up in the ontology API referred to by the module governing the OP,
1526+
using the term identifier in the ``ontology`` child property. If a lookup is not
1527+
possible for technical reasons, the consumer should read the ``text`` child
1528+
property if present or the ``ontology_label`` otherwise. If both are absent, the
1529+
consumer should raise an error.
1530+
1531+
If the ``ontology`` child property is absent, the consumer instead reads the
1532+
``text`` child property of the OP.
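
The consumer rules above can be sketched as a single function. This is an illustration under stated assumptions: ``describe_op``, ``lookup`` and ``LookupUnavailable`` are hypothetical names, and ``lookup`` stands in for whatever client the consumer uses to query the ontology API referred to by the governing module.

```python
from typing import Callable


class LookupUnavailable(Exception):
    """Hypothetical signal that the ontology API cannot be reached."""


def describe_op(op: dict, lookup: Callable[[str], str]) -> str:
    """Return a human-readable description for an ontologized property."""
    term_id = op.get("ontology")
    if term_id is None:
        # No term identifier: read the `text` child property, which the
        # schema requires (a KeyError here means the document is invalid).
        return op["text"]
    try:
        # Preferred path: resolve the identifier through the ontology API.
        return lookup(term_id)
    except LookupUnavailable:
        # Fallback order as stated above: `text` first, then `ontology_label`.
        for name in ("text", "ontology_label"):
            if name in op:
                return op[name]
        raise ValueError("OP has neither 'text' nor 'ontology_label'")
```

Passing the lookup as a parameter keeps the sketch independent of any particular ontology service and makes the fallback path easy to exercise with a stub that raises ``LookupUnavailable``.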
1533+
1534+
|nn| Under the above rules, if an OP was set under scenario B, consumers will
1535+
ignore the hypothetical label. This leads to a more consistent user experience.
1536+
There is no guarantee that different wranglers come up with the same
1537+
hypothetical terms and we don't want the UX to suffer in that case, considering
1538+
that there is at least a partial match available. If an OP was set using
1539+
scenario C, the hypothetical term label is the best we have. In both scenarios
1540+
the producer must update the document once the term becomes available, so the
1541+
degraded UX is only temporary. |ne|
1542+
14461543
Project-level matrices
14471544
======================
14481545
