From effd1464c613a28386cb5542ed6fc566dff9db03 Mon Sep 17 00:00:00 2001
From: April Shen
Date: Fri, 19 Jan 2024 12:10:00 +0000
Subject: [PATCH 1/4] manual curation SOP updates

---
 docs/manual-curation/step2-manual-curation.md | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/docs/manual-curation/step2-manual-curation.md b/docs/manual-curation/step2-manual-curation.md
index ebbd7c7e..87454ef6 100644
--- a/docs/manual-curation/step2-manual-curation.md
+++ b/docs/manual-curation/step2-manual-curation.md
@@ -6,7 +6,7 @@ The goals of the manual curation:
 * _Suggested previous mapping_ traits should be checked for any terms that have become obsolete since the last iteration. These will be colored red and likely have a _suggested replacement mapping_ provided in the appropriate column. If no replacement is provided, curate as usual.
 * For the rest of the traits, we curate as many as possible.

-Good mappings must be eyeballed to ensure they are actually good. Alternative mappings for medium or low quality mappings can be searched for using OLS. If a mapping can't be found in EFO, look for a mapping to a HP, ORDO, or MONDO trait name. Most HP/ORDO/MONDO terms will also be in EFO but some are not. These can be imported to EFO using the Webulous submission service.
+Good mappings must be eyeballed to ensure they are actually good. Alternative mappings for medium or low quality mappings can be searched for using OLS. If a mapping can't be found in EFO, look for a mapping to a HP, ORDO, or MONDO trait name. Most HP/MONDO terms will also be in EFO but some are not.

 ## Criteria to manually evaluate mapping quality
 * Exact string for string matches are _good_
@@ -20,7 +20,10 @@ Good mappings must be eyeballed to ensure they are actually good. Alternative ma
 In general, complex traits with modifiers (e.g. "autosomal recessive", "early onset", or "history of") should not be mapped to the more general term (i.e.
without modifiers) because it loses important information. For now the curator should follow the same protocol as for any other term and request to import/create a new term containing the necessary modifiers.

 ## Unmapped trait names
-Trait names that haven't been automatically mapped against any ontology term can also be searched for using OLS. If a mapping can't be found in EFO, look for a mapping to a HP, ORDO, or MONDO trait name. If these are not already in EFO they should be imported to EFO using the Webulous submission service.
+Trait names that haven't been automatically mapped against any ontology term can also be searched for using OLS.
+If a mapping can't be found in EFO, look for a mapping to a HP, ORDO, or MONDO trait name.
+HP and MONDO terms can be imported into EFO if not present.
+ORDO terms cannot be directly imported but can be used as the basis for new EFO terms.

 ## Curation workflow
 Curation should be done by subsequently applying filters to appropriate columns, then making decisions for the traits in the filtered selection.
@@ -43,8 +46,13 @@ Curation should be done by subsequently applying filters to appropriate columns,
 * 3.1. Set the Status column to only include "blank" entries
 * 3.2. Search for suitable mappings using OLS - https://www.ebi.ac.uk/ols4/

+The curator can also leverage any additional mappings provided, which have the format `URL|LABEL|ZOOMA_QUALITY|ZOOMA_SOURCE|EFO_STATUS`.
+* `ZOOMA_QUALITY` indicates the confidence returned by Zooma (high/medium/low), or "not specified" if the term originates from outside Zooma.
+* `ZOOMA_SOURCE` indicates the datasource of the mapping in Zooma (e.g. EVA or ClinVar Xrefs, or a specific ontology), or can indicate the source is a previously-used or replacement mapping.
+* `EFO_STATUS` indicates whether the term is current, obsolete, or not present in EFO.
+
 ### Time-saving options
-The new manual workflow can be shortened if necessary, while the quality of the results will be _at least as good as for the old workflow_ (because we're reusing the results of previous curations):
+The manual workflow can be shortened if necessary, while the quality of the results will be _at least as good as for the old workflow_ (because we're reusing the results of previous curations):
 * Complete all Step 1 instances from the Curation workflow
 * All subsections of Step 2 - they involve review of mappings previously selected by ourselves. The only changes will be those where the previously mapped term has now become obsolete, however a new mapping can be found during step 2.1
@@ -60,11 +68,13 @@ Make sure **not** to use a mixed format, `URL|LABEL|ZOOMA_QUALITY|ZOOMA_SOURCE||
 ### Marking the status of curated terms
 The “Status” column has the following acceptable values:
 * **DONE** — an acceptable trait contained in EFO has been found for the trait
-* **IMPORT** — an acceptable trait has been found from the MONDO/ORDO/HP ontologies which is not contained in EFO and must be imported
+* **IMPORT** — an acceptable trait has been found from the MONDO/HP ontologies which is not contained in EFO and must be imported
 * **NEW** — new term must be created in EFO
 * **SKIP** — trait is going to be skipped in this iteration, due to being too non-specific, or just having a low frequency
 * **UNSURE** — temporary status; traits to be discussed with reviewers/the team

+Note that IMPORT and NEW terms are processed in Step 4, for now you should ignore the `Add EFO disease` spreadsheet and simply mark the status appropriately.
+
 ### Comment field for curation review
 The "Comment" field can be used to enter arbitrary additional information which will be used by reviewers. Precede any text with initials e.g. "BK - example comment". Comments should be ordered chronologically in reverse: most recent ones at the top.
 Any comments will become available in the Notes field within the next iteration.

From 92a3222ed3203f284817690bac544129b7eaa81d Mon Sep 17 00:00:00 2001
From: April Shen
Date: Fri, 19 Jan 2024 12:52:51 +0000
Subject: [PATCH 2/4] updates to replacement term logic

---
 .../create_table_for_manual_curation.py | 5 ++++-
 cmat/trait_mapping/ols.py | 12 +++++++++++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/bin/trait_mapping/create_table_for_manual_curation.py b/bin/trait_mapping/create_table_for_manual_curation.py
index 644c60ec..f06c9a14 100755
--- a/bin/trait_mapping/create_table_for_manual_curation.py
+++ b/bin/trait_mapping/create_table_for_manual_curation.py
@@ -21,12 +21,15 @@ def previous_and_replacement_mappings(trait_name, previous_mappings, ontology):
         yield trait_string, replacement_string


-def find_replacement_mapping(previous_uri, ontology):
+def find_replacement_mapping(previous_uri, ontology, max_depth=1):
     replacement_uri = get_replacement_term(previous_uri, ontology)
     if not replacement_uri:
         return ''
     label = get_ontology_label(replacement_uri)
     trait_status = get_trait_status(replacement_uri, ontology)
+    # If this term is also obsolete, try to find its replacement (at most max_depth times)
+    if 'OBSOLETE' in trait_status and replacement_uri.startswith('http') and max_depth > 0:
+        return find_replacement_mapping(replacement_uri, ontology, max_depth-1)
     trait_string = '|'.join([replacement_uri, label, 'NOT_SPECIFIED', 'replacement', trait_status])
     return trait_string

diff --git a/cmat/trait_mapping/ols.py b/cmat/trait_mapping/ols.py
index f746867b..47a95175 100644
--- a/cmat/trait_mapping/ols.py
+++ b/cmat/trait_mapping/ols.py
@@ -1,4 +1,5 @@
 import os
+import re
 from functools import lru_cache
 import logging
 import requests
@@ -6,6 +7,7 @@

 from retry import retry

+from cmat.trait_mapping.oxo import OntologyUri
 from cmat.trait_mapping.utils import json_request, ServerError

 OLS_SERVER = 'https://www.ebi.ac.uk/ols4'
@@ -120,7 +122,15 @@ def
get_replacement_term(uri: str, ontology: str = 'EFO') -> str:
         return ""
     response_json = response.json()
     if response_json["term_replaced_by"] is not None:
-        return response_json["term_replaced_by"]
+        replacement_uri = response_json["term_replaced_by"]
+        if not replacement_uri.startswith('http'):
+            try:
+                # Attempt to correct the most common weirdness found in this field - MONDO:0020783 or HP_0045074
+                db, iden = re.split(':|_', replacement_uri)
+                replacement_uri = OntologyUri(iden, db.lower()).uri
+            except:
+                logger.warning(f'Could not normalise replacement term: {replacement_uri}')
+        return replacement_uri
     return ""


From 005a1ce0887af06a8b0aa0bc138510c174d624fc Mon Sep 17 00:00:00 2001
From: April Shen
Date: Fri, 19 Jan 2024 12:58:44 +0000
Subject: [PATCH 3/4] extract OntologyURI to fix circular imports

---
 cmat/trait_mapping/ols.py | 2 +-
 cmat/trait_mapping/ontology_uri.py | 20 ++++++++++++++++++++
 cmat/trait_mapping/oxo.py | 22 +---------------------
 3 files changed, 22 insertions(+), 22 deletions(-)
 create mode 100644 cmat/trait_mapping/ontology_uri.py

diff --git a/cmat/trait_mapping/ols.py b/cmat/trait_mapping/ols.py
index 47a95175..6856e359 100644
--- a/cmat/trait_mapping/ols.py
+++ b/cmat/trait_mapping/ols.py
@@ -7,7 +7,7 @@

 from retry import retry

-from cmat.trait_mapping.oxo import OntologyUri
+from cmat.trait_mapping.ontology_uri import OntologyUri
 from cmat.trait_mapping.utils import json_request, ServerError

 OLS_SERVER = 'https://www.ebi.ac.uk/ols4'
diff --git a/cmat/trait_mapping/ontology_uri.py b/cmat/trait_mapping/ontology_uri.py
new file mode 100644
index 00000000..7924dc06
--- /dev/null
+++ b/cmat/trait_mapping/ontology_uri.py
@@ -0,0 +1,20 @@
+
+class OntologyUri:
+    db_to_uri_dict = {
+        "orphanet": "http://www.orpha.net/ORDO/Orphanet_{}",
+        "omim": "http://identifiers.org/omim/{}",
+        "efo": "http://www.ebi.ac.uk/efo/EFO_{}",
+        "mesh": "http://identifiers.org/mesh/{}",
+        "medgen": "http://identifiers.org/medgen/{}",
+        "hp":
"http://purl.obolibrary.org/obo/HP_{}", + "doid": "http://purl.obolibrary.org/obo/DOID_{}", + "mondo": "http://purl.obolibrary.org/obo/MONDO_{}", + } + + def __init__(self, id_, db): + self.id_ = id_ + self.db = db + self.uri = self.db_to_uri_dict[self.db.lower()].format(self.id_) + + def __str__(self): + return self.uri diff --git a/cmat/trait_mapping/oxo.py b/cmat/trait_mapping/oxo.py index d2fa67d2..2d784897 100644 --- a/cmat/trait_mapping/oxo.py +++ b/cmat/trait_mapping/oxo.py @@ -5,33 +5,13 @@ from cmat.trait_mapping.ols import get_ontology_label_from_ols, is_in_ontology from cmat.trait_mapping.ols import is_current_and_in_ontology +from cmat.trait_mapping.ontology_uri import OntologyUri from cmat.trait_mapping.utils import json_request logger = logging.getLogger(__package__) -class OntologyUri: - db_to_uri_dict = { - "orphanet": "http://www.orpha.net/ORDO/Orphanet_{}", - "omim": "http://identifiers.org/omim/{}", - "efo": "http://www.ebi.ac.uk/efo/EFO_{}", - "mesh": "http://identifiers.org/mesh/{}", - "medgen": "http://identifiers.org/medgen/{}", - "hp": "http://purl.obolibrary.org/obo/HP_{}", - "doid": "http://purl.obolibrary.org/obo/DOID_{}", - "mondo": "http://purl.obolibrary.org/obo/MONDO_{}", - } - - def __init__(self, id_, db): - self.id_ = id_ - self.db = db - self.uri = self.db_to_uri_dict[self.db.lower()].format(self.id_) - - def __str__(self): - return self.uri - - @total_ordering class OxOMapping: """ From a551e609ebd88a0cfafce2b76ee5b6a56f3729b1 Mon Sep 17 00:00:00 2001 From: April Shen Date: Wed, 31 Jan 2024 09:28:31 +0000 Subject: [PATCH 4/4] Update docs/manual-curation/step2-manual-curation.md Co-authored-by: M-casado --- docs/manual-curation/step2-manual-curation.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/manual-curation/step2-manual-curation.md b/docs/manual-curation/step2-manual-curation.md index 87454ef6..8bee3955 100644 --- a/docs/manual-curation/step2-manual-curation.md +++ 
b/docs/manual-curation/step2-manual-curation.md
@@ -73,7 +73,7 @@ The “Status” column has the following acceptable values:
 * **SKIP** — trait is going to be skipped in this iteration, due to being too non-specific, or just having a low frequency
 * **UNSURE** — temporary status; traits to be discussed with reviewers/the team

-Note that IMPORT and NEW terms are processed in Step 4, for now you should ignore the `Add EFO disease` spreadsheet and simply mark the status appropriately.
+Note that IMPORT and NEW terms are processed in Step 4, for now you should ignore the `Add EFO disease` tab within the manual curation spreadsheet and simply mark the status appropriately.

 ### Comment field for curation review
 The "Comment" field can be used to enter arbitrary additional information which will be used by reviewers. Precede any text with initials e.g. "BK - example comment". Comments should be ordered chronologically in reverse: most recent ones at the top.
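
Review note on PATCH 2/4: the normalisation added to `get_replacement_term` turns CURIE-like values such as `MONDO:0020783` or `HP_0045074` into full URIs. The following is a minimal standalone sketch of that idea, not the code in the series. The names `URI_TEMPLATES` and `normalise_replacement` are hypothetical, the template table is a subset copied from `OntologyUri.db_to_uri_dict`, and two details are assumptions that differ from the patch: `maxsplit=1` is passed to `re.split`, and only `ValueError` and `KeyError` are caught instead of using a bare `except`.

```python
import re

# Subset of the URI templates from OntologyUri.db_to_uri_dict (copied for illustration).
URI_TEMPLATES = {
    "efo": "http://www.ebi.ac.uk/efo/EFO_{}",
    "hp": "http://purl.obolibrary.org/obo/HP_{}",
    "mondo": "http://purl.obolibrary.org/obo/MONDO_{}",
}


def normalise_replacement(term):
    """Return a full URI for a CURIE-like term, or the input unchanged if that fails.

    Hypothetical helper: it mirrors the intent of the patch, not its exact code.
    """
    if term.startswith("http"):
        # Already a URI, nothing to do.
        return term
    try:
        # Split on the first ':' or '_' only, giving (prefix, identifier).
        db, iden = re.split(r"[:_]", term, maxsplit=1)
        return URI_TEMPLATES[db.lower()].format(iden)
    except (ValueError, KeyError):
        # ValueError: no separator was found, so unpacking failed.
        # KeyError: the prefix is not a known ontology.
        return term


if __name__ == "__main__":
    for example in ["MONDO:0020783", "HP_0045074", "not-an-identifier"]:
        print(example, "->", normalise_replacement(example))
```

Catching specific exceptions keeps unrelated failures (for example a `KeyboardInterrupt`) from being swallowed, while still falling back to the raw value when the term cannot be normalised, which is the behaviour the patch appears to intend.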