This was an experiment: we learned some things, but development will not continue in this repo.
This is a test bed for demonstrating how the HCA Metadata Standard and W3C PROV will be used in HuBMAP.
Eventually, this will be renamed to `hubmap-metadata`, and it will become a tool that could be used either in development by Django management scripts, or in production by some API.
- It will take as input Metadata CSVs and a workflow name,
- and produce as intermediate output templated HCA-validated JSON describing entities and RDF relating the entities,
- and as final output flattened JSON ready for Elasticsearch ingest.
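The flow above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the function names (`csv_to_entities`, `flatten`), the CSV columns, and the shape of the entity documents are all assumptions made for the example, and both HCA schema validation and the RDF output are left out.

```python
import csv
import io
import json


def csv_to_entities(csv_text, workflow_name):
    """Turn each CSV row into a nested entity document (hypothetical shape)."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "id": row["id"],
            "workflow": workflow_name,
            # Every column other than "id" becomes entity content.
            "content": {k: v for k, v in row.items() if k != "id"},
        }
        for row in rows
    ]


def flatten(doc, prefix=""):
    """Collapse nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


# Hypothetical input: one metadata row and a made-up workflow name.
csv_text = "id,organ,thickness_um\nsection-1,kidney,10\n"
entities = csv_to_entities(csv_text, "example-workflow")  # intermediate output
flat_docs = [flatten(entity) for entity in entities]      # final output
print(json.dumps(flat_docs[0], indent=2))
```

Flattening to dotted keys is just one plausible way to prepare documents for Elasticsearch; the real templates may pick a different layout.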
In production, I think the two templating steps will actually be separated in time, with the intermediate output JSON being stored in Neo4j. That said, the templating mechanism is similar in both phases, so this continuous flow may be easier to conceptualize.
HuBMAP uses the HCA Metadata Standard's five entity types for its own metadata, but the HCA Standard does not provide a sufficiently flexible way of describing provenance. For that, we are using W3C PROV.
That said, the domain-specific symbology of HCA is easier to understand at a glance, so we will use it in the workflow examples. Here is a demonstration of how the two symbologies correspond:
(All diagrams in this repo are editable with draw.io, either on the web or with their desktop app.)
- We are not using the HCA's `Project` entity type.
- We are using only a fraction of the PROV vocabulary; in particular, for now, we are not using `Agent`.
- The two systems have different arrow direction conventions.
- In PROV, `used` relates both the Protocol and Tissue Section to the Process. Roles are distinguished by `qualifiedUsage`.
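
To make the last point concrete, here is a minimal sketch of how a qualified usage could tell the two `used` relations apart. It models triples as plain Python tuples rather than using an RDF library. The `prov:` terms are from PROV-O; every `ex:` identifier and the helper `roles_used_by` are hypothetical names invented for this example.

```python
# Triples as (subject, predicate, object). All ex: names are hypothetical.
triples = [
    # The plain "used" relation looks the same for both inputs.
    ("ex:process", "prov:used", "ex:protocol"),
    ("ex:process", "prov:used", "ex:tissue_section"),
    # Each qualified usage names the entity and the role it played.
    ("ex:process", "prov:qualifiedUsage", "ex:usage_1"),
    ("ex:usage_1", "prov:entity", "ex:protocol"),
    ("ex:usage_1", "prov:hadRole", "ex:protocol_role"),
    ("ex:process", "prov:qualifiedUsage", "ex:usage_2"),
    ("ex:usage_2", "prov:entity", "ex:tissue_section"),
    ("ex:usage_2", "prov:hadRole", "ex:input_role"),
]


def objects(subject, predicate):
    """Return every object of the triples matching subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


def roles_used_by(activity):
    """Map each entity the activity used to the role recorded for it."""
    roles = {}
    for usage in objects(activity, "prov:qualifiedUsage"):
        for entity in objects(usage, "prov:entity"):
            for role in objects(usage, "prov:hadRole"):
                roles[entity] = role
    return roles


print(roles_used_by("ex:process"))
```

Note that the arrows run from the Process to the things it used, which is the PROV direction convention mentioned above.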