Merge pull request #697 from monarch-initiative/sync-synonym2
Synonym sync without data build
twhetzel authored Nov 20, 2024
2 parents eceb06a + a61e368 commit ed9541e
Showing 42 changed files with 373,478 additions and 43 deletions.
4 changes: 3 additions & 1 deletion .gitignore
@@ -13,6 +13,7 @@ semantic.cache/
*.tmp.owl
*.tmp.json
*.db
!/tests/input/**/*.db
.vscode/*

# - static
@@ -104,7 +105,8 @@ src/scripts/.ipynb_checkpoints/*
src/scripts/mondo_unmapped.tsv

# Test
tests/output/
tests/output/*
!tests/output/.gitkeep
src/scripts/dataframes/*
src/ontology/reports/gard.subclass.added-obsolete.robot.tsv
src/ontology/reports/gard.subclass.added.robot.tsv
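The `tests/output/*` plus `!tests/output/.gitkeep` pair above relies on Git's documented rule that a file cannot be re-included once a parent directory of it is excluded, so a `!` exception only takes effect when the directory's contents, not the directory itself, are ignored. The sketch below is a simplified, hypothetical model of that behavior (last matching pattern wins, `!` negates, a trailing `/` excludes a whole subtree). It uses `fnmatch`, whose `*` also matches `/`, so it is an approximation and not a full gitignore implementation.

```python
from fnmatch import fnmatchcase


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified gitignore check for a repo-relative file path.

    Model (an approximation, not Git's real matcher):
    - patterns are matched from the repo root with fnmatch, where '*' also
      matches '/';
    - a pattern ending in '/' excludes a directory and everything below it,
      and no later '!' rule can re-include files inside it;
    - otherwise the last matching pattern wins, and '!' re-includes.
    """
    parts = path.split("/")
    # Ancestor directories of the file, e.g. "tests", "tests/output".
    ancestors = ["/".join(parts[:i]) for i in range(1, len(parts))]

    # Directory patterns: if one matches an ancestor, Git never descends
    # into that directory, so the file stays ignored regardless of '!'.
    for pat in patterns:
        if not pat.startswith("!") and pat.endswith("/"):
            dir_pat = pat.strip("/")
            if any(fnmatchcase(a, dir_pat) for a in ancestors):
                return True

    ignored = False
    for pat in patterns:
        negate = pat.startswith("!")
        body = pat[1:] if negate else pat
        if body.endswith("/"):
            continue  # directory-only patterns were handled above
        if fnmatchcase(path, body.lstrip("/")):
            ignored = not negate  # a later match overrides earlier ones
    return ignored


new_rules = ["*.db", "!/tests/input/**/*.db", "tests/output/*", "!tests/output/.gitkeep"]
print(is_ignored("tests/output/.gitkeep", new_rules))  # False
print(is_ignored("tests/output/run.tsv", new_rules))   # True
print(is_ignored("tests/output/.gitkeep", ["tests/output/", "!tests/output/.gitkeep"]))  # True
```

With a directory pattern such as `tests/output/`, the `.gitkeep` exception is never reached, which is why the contents-only form `tests/output/*` is the one that pairs with a `!` rule.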
12 changes: 12 additions & 0 deletions docs/developer/workflows.md
@@ -70,6 +70,7 @@ These workflows will help with excluding certain terms from integration into Mon
## Synchronization
These workflows help synchronize Mondo with source ontologies.

### Sub-class of
#### Makefile goals
1. `generate-synchronization-files`: Runs synchronization pipeline.
2. `sync-subclassof`: Runs 'sync-subclassof' part of synchronization pipeline, generating set of outputs for all ontologies.
@@ -80,3 +81,14 @@ These workflows help synchronize Mondo with source ontologies.
7. `reports/%.subclass.direct-in-mondo-only.tsv`: Path to create file for relations for given ontology where direct subclass relation exists only in Mondo and not in the source. Running this also runs / generates `reports/%.subclass.added.robot.tsv`, `reports/%.subclass.added-obsolete.robot.tsv`, and `reports/%.subclass.confirmed.robot.tsv`.
8. `reports/sync-subClassOf.direct-in-mondo-only.tsv`: For each subclass relationship in Mondo, shows which sources do not have it and whether no source has it. Combines all `--outpath-direct-in-mondo-only` outputs for all sources, using them as inputs, and then deletes them.
9. `reports/sync-subClassOf.confirmed.tsv`: For all subclass relationships in Mondo, by source, a ROBOT template showing which relationships in Mondo are confirmed to also exist in the source. Combination of all `--outpath-confirmed` outputs for all sources.

### Synonyms
#### Makefile goals
1. `sync-synonyms`: Runs the 'sync-synonyms' part of the synchronization pipeline, creating outputs for all sources for each of the 4 cases: 'added', 'confirmed', 'updated', and 'deleted'.
2. `reports/sync-synonym/%-synonyms.added.robot.tsv`: ROBOT template TSV containing synonyms on mapped source terms that are not yet integrated into Mondo.
3. `reports/sync-synonym/%-synonyms.confirmed.robot.tsv`: ROBOT template TSV containing synonym confirmations: cases where the same synonym scope predicate and synonym string exist in both the source and Mondo for a given mapping.
4. `reports/sync-synonym/%-synonyms.deleted.robot.tsv`: ROBOT template TSV containing synonym deletions: synonyms that exist in Mondo but not in the source(s) for a given mapping.
5. `reports/sync-synonym/%-synonyms.updated.robot.tsv`: ROBOT template TSV containing synonym scope predicate updates: cases where the synonym exists in Mondo and on the mapped source term, but with a different scope predicate.
6. `reports/sync-synonym/sync-synonyms.added.robot.tsv`: Combination of all 'added' synonym outputs for all sources.
7. `reports/sync-synonym/sync-synonyms.confirmed.robot.tsv`: Combination of all 'confirmed' synonym outputs for all sources.
8. `reports/sync-synonym/sync-synonyms.updated.robot.tsv`: Combination of all 'updated' synonym outputs for all sources.
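The four synonym cases listed above reduce to set comparisons between the synonyms on a Mondo term and those on its mapped source term. The following is a hypothetical sketch of that comparison, not the logic of `sync_synonym.py`: it treats a synonym as a `(scope_predicate, synonym_string)` pair, matches strings exactly (no case or whitespace normalization), and ignores synonym types, xrefs, and the exclusion configuration. The function name `classify_synonyms` is illustrative only.

```python
Synonym = tuple[str, str]  # (scope predicate, synonym string)


def classify_synonyms(
    mondo: set[Synonym], source: set[Synonym]
) -> dict[str, list[Synonym]]:
    """Split the synonyms of one mapped Mondo/source term pair into four cases."""
    mondo_strings = {text for _, text in mondo}
    source_strings = {text for _, text in source}

    # Same scope predicate and same string on both sides.
    confirmed = mondo & source
    confirmed_strings = {text for _, text in confirmed}

    return {
        # On the source term; string absent from the Mondo term.
        "added": sorted(s for s in source if s[1] not in mondo_strings),
        "confirmed": sorted(confirmed),
        # String on both terms but never with the same scope; report the source's scope.
        "updated": sorted(
            s for s in source
            if s[1] in mondo_strings and s[1] not in confirmed_strings
        ),
        # On the Mondo term; string absent from the source term.
        "deleted": sorted(s for s in mondo if s[1] not in source_strings),
    }


mondo = {
    ("oio:hasExactSynonym", "same scope"),
    ("oio:hasRelatedSynonym", "scope changed"),
    ("oio:hasExactSynonym", "only in Mondo"),
}
source = {
    ("oio:hasExactSynonym", "same scope"),
    ("oio:hasExactSynonym", "scope changed"),
    ("oio:hasNarrowSynonym", "only in source"),
}
result = classify_synonyms(mondo, source)
print(result["updated"])  # [('oio:hasExactSynonym', 'scope changed')]
```

Keying on the string first and the scope second is what separates an 'updated' synonym (same string, different scope) from an 'added' or 'deleted' one.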
@@ -2400,3 +2400,12 @@ exclusions:
scope: "oio:hasExactSynonym"
value: "India rubber skin"
reason: "MONDO:CuratorDecision"
# synonym-type-abbreviation: List of synonym strings that our detection logic would otherwise mark as abbreviations. This exclusion rule prevents them from being marked as such.
synonym-type-abbreviation:
- PYOMETRA
- EMPHYSEMA
- FRACTURE
- INFECTION
- HAMARTOMA
- SUBCUTIS
- ULCER
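The `synonym-type-abbreviation` list acts as an override on abbreviation detection. The detection rule itself is not part of this diff, so the sketch below assumes a hypothetical heuristic: a single token that starts with an uppercase letter and contains only uppercase letters, digits, or hyphens counts as an abbreviation. Under that assumption, all-caps words such as `PYOMETRA` or `ULCER` would be misclassified, which is the kind of false positive the list suppresses. The exclusion set is passed in directly rather than read from the YAML file, to keep the example dependency-free.

```python
import re

# Hypothetical detection heuristic, assumed for illustration only: one token
# starting with an uppercase letter, followed by uppercase letters, digits, or hyphens.
ABBREVIATION_RE = re.compile(r"[A-Z][A-Z0-9-]*")


def is_abbreviation(synonym: str, exclusions: set[str]) -> bool:
    """Return True if the synonym should be typed as an abbreviation."""
    if synonym in exclusions:
        # Listed under synonym-type-abbreviation: never mark as an abbreviation.
        return False
    return ABBREVIATION_RE.fullmatch(synonym) is not None


# Strings copied from the synonym-type-abbreviation list above.
EXCLUDED = {"PYOMETRA", "EMPHYSEMA", "FRACTURE", "INFECTION", "HAMARTOMA", "SUBCUTIS", "ULCER"}

print(is_abbreviation("CMT1A", EXCLUDED))  # True
print(is_abbreviation("ULCER", set()))     # True (false positive without the list)
print(is_abbreviation("ULCER", EXCLUDED))  # False
```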
1 change: 1 addition & 0 deletions src/ontology/config/properties.txt
@@ -16,6 +16,7 @@ http://www.geneontology.org/formats/oboInOwl#hasSynonymType
http://www.geneontology.org/formats/oboInOwl#SynonymTypeProperty
http://purl.obolibrary.org/obo/mondo#GENERATED
http://purl.obolibrary.org/obo/mondo#omim_included
http://purl.obolibrary.org/obo/mondo#omim_formerly
http://www.w3.org/2004/02/skos/core#broadMatch
http://www.w3.org/2004/02/skos/core#narrowMatch
http://www.w3.org/2004/02/skos/core#relatedMatch
77 changes: 70 additions & 7 deletions src/ontology/mondo-ingest.Makefile
@@ -369,22 +369,21 @@ deploy-mondo-ingest:
gh release create $(GHVERSION) --notes "TBD." --title "$(GHVERSION)" --draft $(DEPLOY_ASSETS_MONDO_INGEST)


# make function, not target!
# Builds tmp/mondo/ and rebuilds mondo.owl and mondo.sssom.tsv, and stores hash of latest commit of mondo repo main branch in tmp/mondo_repo_built

# make function, not target!
# Builds tmp/mondo/ and rebuilds mondo.owl, mondo-edit.owl and mondo.sssom.tsv, and stores hash of latest commit of mondo repo main branch in tmp/mondo_repo_built
define build_mondo
cd $(TMPDIR) && \
rm -rf ./mondo/ && \
git clone --depth 1 https://github.com/monarch-initiative/mondo && \
cd mondo/src/ontology && \
make mondo.owl mappings -B MIR=false IMP=false MIR=false \
make mondo.owl mappings -B MIR=false IMP=false && \
latest_hash=$$(git rev-parse origin/master) && \
cd ../../../.. && \
echo "$$latest_hash" > $(1)
endef

# Triggers a refresh of tmp/mondo/ and a rebuild of mondo.owl, mondo-edit.owl, and mondo.sssom.tsv, only if the mondo repo main branch has new commits, or if it has never been run before
tmp/mondo_repo_built: .FORCE
@if [ ! -f $@ ]; then \
if [ ! -f $@ ]; then \
$(call build_mondo, $@); \
else \
current_hash=$$(cat $@); \
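`tmp/mondo_repo_built` works as a stamp file: `build_mondo` writes the commit hash it built into it, and per the comment on the rule, a rebuild happens only when that file is missing or the recorded hash is out of date. The rest of the recipe is collapsed in this view, so the sketch below models only that decision. How the latest upstream hash is obtained is left as a parameter (something like `git ls-remote` could supply it), and `build` stands in for the clone-and-`make` steps; `refresh_if_stale` is a hypothetical name.

```python
from pathlib import Path
from typing import Callable


def refresh_if_stale(stamp: Path, latest_hash: str, build: Callable[[], None]) -> bool:
    """Run `build` only if `stamp` is missing or records a different commit hash.

    Returns True when a rebuild happened. Models the decision described in the
    comment on the tmp/mondo_repo_built rule; the real recipe is shell in the Makefile.
    """
    if stamp.is_file() and stamp.read_text().strip() == latest_hash:
        return False  # already built from this commit
    build()
    # Like `echo "$$latest_hash" > $(1)` in build_mondo.
    stamp.write_text(latest_hash + "\n")
    return True
```

Writing the hash only after `build()` returns means a failed build leaves the old stamp in place, so the next run retries.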
@@ -553,8 +552,9 @@ slurp-modifications-ordo: slurp/ordo.tsv tmp/ordo-subsets.tsv
###### Synchronization ######
#############################
.PHONY: sync
sync: sync-subclassof
sync: sync-subclassof sync-synonyms

# Synchronization: SubclassOf
.PHONY: sync-subclassof
sync-subclassof: $(REPORTDIR)/sync-subClassOf.confirmed.tsv $(REPORTDIR)/sync-subClassOf.direct-in-mondo-only.tsv $(TMPDIR)/sync-subClassOf.added.self-parentage.tsv

@@ -585,6 +585,52 @@ $(REPORTDIR)/%.subclass.confirmed.robot.tsv $(REPORTDIR)/%.subclass.added.robot.
--mondo-mappings-path $(TMPDIR)/mondo.sssom.tsv \
--onto-config-path metadata/$*.yml

# Synchronization: Synonyms
SYN_SYNC_DIR=$(REPORTDIR)/sync-synonym
$(SYN_SYNC_DIR):
mkdir -p $@

.PHONY: sync-synonyms
sync-synonyms: $(SYN_SYNC_DIR)/synonym_sync_combined_cases.robot.tsv $(SYN_SYNC_DIR)/sync-synonyms.added.robot.tsv $(SYN_SYNC_DIR)/sync-synonyms.confirmed.robot.tsv $(SYN_SYNC_DIR)/sync-synonyms.updated.robot.tsv

tmp/mondo-synonyms-scope-type-xref.tsv: $(TMPDIR)/mondo.owl
$(ROBOT) query -i tmp/mondo.owl --query ../sparql/synonyms-scope-type-xref.sparql $@

tmp/%-synonyms-scope-type-xref.tsv: $(COMPONENTSDIR)/%.owl
$(ROBOT) query -i $(COMPONENTSDIR)/$*.owl --query ../sparql/synonyms-scope-type-xref.sparql $@

../../tests/input/sync_synonym/%-synonyms-scope-type-xref.tsv:
$(ROBOT) query -i ../../tests/input/sync_synonym/test_$*.owl --query ../sparql/synonyms-scope-type-xref.sparql $@

# todo: output for analysis during development; we may remove it later. When we do, also remove its usages.
INPUT_FILES := $(wildcard tmp/synonym_sync_combined_cases_*.tsv)
$(SYN_SYNC_DIR)/synonym_sync_combined_cases.robot.tsv: $(foreach n,$(ALL_COMPONENT_IDS), $(SYN_SYNC_DIR)/$(n)-synonyms.added.robot.tsv)
@head -n 2 $(firstword $(INPUT_FILES)) > $@
@for file in $(INPUT_FILES); do \
tail -n +3 $$file >> $@; \
done

$(SYN_SYNC_DIR)/sync-synonyms.added.robot.tsv: $(foreach n,$(ALL_COMPONENT_IDS), $(SYN_SYNC_DIR)/$(n)-synonyms.added.robot.tsv)
awk '(NR == 1) || (NR == 2) || (FNR > 2)' $^ > $@

$(SYN_SYNC_DIR)/sync-synonyms.confirmed.robot.tsv: $(foreach n,$(ALL_COMPONENT_IDS), $(SYN_SYNC_DIR)/$(n)-synonyms.confirmed.robot.tsv)
awk '(NR == 1) || (NR == 2) || (FNR > 2)' $^ > $@

$(SYN_SYNC_DIR)/sync-synonyms.updated.robot.tsv: $(foreach n,$(ALL_COMPONENT_IDS), $(SYN_SYNC_DIR)/$(n)-synonyms.updated.robot.tsv)
awk '(NR == 1) || (NR == 2) || (FNR > 2)' $^ > $@

$(SYN_SYNC_DIR)/%-synonyms.added.robot.tsv $(SYN_SYNC_DIR)/%-synonyms.confirmed.robot.tsv $(SYN_SYNC_DIR)/%-synonyms.updated.robot.tsv: $(TMPDIR)/mondo.sssom.tsv $(COMPONENTSDIR)/%.db metadata/%.yml tmp/mondo-synonyms-scope-type-xref.tsv tmp/%-synonyms-scope-type-xref.tsv | $(SYN_SYNC_DIR)
python3 $(SCRIPTSDIR)/sync_synonym.py \
--mondo-mappings-path $(TMPDIR)/mondo.sssom.tsv \
--ontology-db-path $(COMPONENTSDIR)/$*.db \
--mondo-synonyms-path tmp/mondo-synonyms-scope-type-xref.tsv \
--mondo-exclusion-configs config/mondo-exclusion-configs.yml \
--onto-synonym-types-path tmp/$*-synonyms-scope-type-xref.tsv \
--onto-config-path metadata/$*.yml \
--outpath-added $(SYN_SYNC_DIR)/$*-synonyms.added.robot.tsv \
--outpath-confirmed $(SYN_SYNC_DIR)/$*-synonyms.confirmed.robot.tsv \
--outpath-updated $(SYN_SYNC_DIR)/$*-synonyms.updated.robot.tsv

##################################
## Externally managed content ####
##################################
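Each combined `sync-synonyms.*.robot.tsv` target concatenates per-source ROBOT templates with `awk '(NR == 1) || (NR == 2) || (FNR > 2)'`. `NR` counts lines across all inputs while `FNR` restarts for each file, so the first file's two header rows (a ROBOT template's column headers and its template strings) are printed once, and every file contributes only the rows after its own header. A Python sketch of the same merge, assuming every input has both header rows; `merge_robot_templates` is a hypothetical helper, not part of the repository:

```python
from pathlib import Path
from typing import Iterable


def merge_robot_templates(inputs: Iterable[Path], output: Path, header_rows: int = 2) -> None:
    """Concatenate ROBOT template TSVs, keeping the header rows of the first file only."""
    with output.open("w", encoding="utf-8") as out:
        for file_index, path in enumerate(inputs):
            with path.open(encoding="utf-8") as f:
                for line_no, line in enumerate(f, start=1):  # line_no plays the role of FNR
                    if file_index == 0 or line_no > header_rows:
                        # awk terminates every printed record with a newline; do the same.
                        out.write(line if line.endswith("\n") else line + "\n")
```

Passing the inputs explicitly (as Make's `$^` does) keeps the output file itself out of the input list, which a shell glob in the same directory cannot always guarantee.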
@@ -791,6 +837,23 @@ help:
@echo "For each subclass relationship in Mondo, shows which sources do not have it and whether no source has it. Combines all --outpath-direct-in-mondo-only outputs for all sources, using them as inputs, and then deletes them.\n"
@echo "reports/sync-subClassOf.confirmed.tsv"
@echo "For all subclass relationships in Mondo, by source, a ROBOT template showing which relationships in Mondo are confirmed to also exist in the source. Combination of all --outpath-confirmed outputs for all sources.\n"
# - Synchronization: synonyms
@echo "sync-synonyms"
@echo "Runs the 'sync-synonyms' part of the synchronization pipeline, creating outputs for all sources for each of the 4 cases: 'added', 'confirmed', 'updated', and 'deleted'.\n"
@echo "reports/sync-synonym/%-synonyms.added.robot.tsv"
@echo "ROBOT template TSV containing synonyms on mapped source terms that are not yet integrated into Mondo.\n"
@echo "reports/sync-synonym/%-synonyms.confirmed.robot.tsv"
@echo "ROBOT template TSV containing synonym confirmations: cases where the same synonym scope predicate and synonym string exist in both the source and Mondo for a given mapping.\n"
@echo "reports/sync-synonym/%-synonyms.deleted.robot.tsv"
@echo "ROBOT template TSV containing synonym deletions: synonyms that exist in Mondo but not in the source(s) for a given mapping.\n"
@echo "reports/sync-synonym/%-synonyms.updated.robot.tsv"
@echo "ROBOT template TSV containing synonym scope predicate updates: cases where the synonym exists in Mondo and on the mapped source term, but with a different scope predicate.\n"
@echo "reports/sync-synonym/sync-synonyms.added.robot.tsv"
@echo "Combination of all 'added' synonym outputs for all sources.\n"
@echo "reports/sync-synonym/sync-synonyms.confirmed.robot.tsv"
@echo "Combination of all 'confirmed' synonym outputs for all sources.\n"
@echo "reports/sync-synonym/sync-synonyms.updated.robot.tsv"
@echo "Combination of all 'updated' synonym outputs for all sources.\n"
# - Refresh externally managed content
@echo "update-externally-managed-content"
@echo "Downloads and processes all externally managed content like cross references, subsets and labels, including NORD and GARD.\n"