Issue 402: Manual curation fixes from October #412

Merged: 4 commits, Jan 31, 2024
5 changes: 4 additions & 1 deletion bin/trait_mapping/create_table_for_manual_curation.py
@@ -21,12 +21,15 @@ def previous_and_replacement_mappings(trait_name, previous_mappings, ontology):
yield trait_string, replacement_string


-def find_replacement_mapping(previous_uri, ontology):
+def find_replacement_mapping(previous_uri, ontology, max_depth=1):
     replacement_uri = get_replacement_term(previous_uri, ontology)
     if not replacement_uri:
         return ''
     label = get_ontology_label(replacement_uri)
     trait_status = get_trait_status(replacement_uri, ontology)
+    # If this term is also obsolete, try to find its replacement (at most max_depth times)
+    if 'OBSOLETE' in trait_status and replacement_uri.startswith('http') and max_depth > 0:
+        return find_replacement_mapping(replacement_uri, ontology, max_depth-1)
     trait_string = '|'.join([replacement_uri, label, 'NOT_SPECIFIED', 'replacement', trait_status])
     return trait_string
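
The bounded recursion in this hunk can be sketched in isolation. The version below is a minimal illustration, not the project's code: `REPLACED_BY`, `OBSOLETE`, `resolve_replacement` and the `example.org` URIs are hypothetical stand-ins for the OLS lookups (`get_replacement_term`, `get_trait_status`) used in the diff.

```python
# Hypothetical stand-in for OLS: each obsolete URI maps to its replacement.
REPLACED_BY = {
    "http://example.org/T_1": "http://example.org/T_2",
    "http://example.org/T_2": "http://example.org/T_3",
    "http://example.org/T_3": "http://example.org/T_4",
}
# In this toy data, every term that has a replacement is itself obsolete.
OBSOLETE = set(REPLACED_BY)


def resolve_replacement(uri, max_depth=1):
    """Return the replacement for uri, following at most max_depth further hops."""
    replacement = REPLACED_BY.get(uri, "")
    if not replacement:
        return ""
    # If the replacement is itself obsolete, look one level deeper,
    # but only while the depth budget allows it.
    if replacement in OBSOLETE and replacement.startswith("http") and max_depth > 0:
        return resolve_replacement(replacement, max_depth - 1)
    return replacement


print(resolve_replacement("http://example.org/T_1"))               # one extra hop
print(resolve_replacement("http://example.org/T_1", max_depth=2))  # two extra hops
```

With the default `max_depth=1`, a chain of obsolete terms is followed for one extra hop only, so the returned term may still be obsolete. As in the diff, if the deeper lookup finds no replacement, the result is an empty string rather than the intermediate obsolete term.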

12 changes: 11 additions & 1 deletion cmat/trait_mapping/ols.py
@@ -1,11 +1,13 @@
 import os
+import re
 from functools import lru_cache
 import logging
 import requests
 import urllib

 from retry import retry

+from cmat.trait_mapping.ontology_uri import OntologyUri
 from cmat.trait_mapping.utils import json_request, ServerError

 OLS_SERVER = 'https://www.ebi.ac.uk/ols4'
@@ -120,7 +122,15 @@ def get_replacement_term(uri: str, ontology: str = 'EFO') -> str:
return ""
     response_json = response.json()
     if response_json["term_replaced_by"] is not None:
-        return response_json["term_replaced_by"]
+        replacement_uri = response_json["term_replaced_by"]
+        if not replacement_uri.startswith('http'):
+            try:
+                # Attempt to correct the most common weirdness found in this field - MONDO:0020783 or HP_0045074
+                db, iden = re.split(':|_', replacement_uri)
+                replacement_uri = OntologyUri(iden, db.lower()).uri
+            except:
+                logger.warning(f'Could not normalise replacement term: {replacement_uri}')
+        return replacement_uri
     return ""
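
To illustrate the normalisation step in this hunk, here is a minimal standalone sketch. It assumes a reduced copy of the URI templates from `OntologyUri.db_to_uri_dict`; the names `URI_TEMPLATES` and `normalise_term` are hypothetical and not part of the PR.

```python
import re

# Reduced copy of the templates in OntologyUri.db_to_uri_dict (illustration only).
URI_TEMPLATES = {
    "efo": "http://www.ebi.ac.uk/efo/EFO_{}",
    "hp": "http://purl.obolibrary.org/obo/HP_{}",
    "mondo": "http://purl.obolibrary.org/obo/MONDO_{}",
}


def normalise_term(term):
    """Turn identifiers such as MONDO:0020783 or HP_0045074 into full URIs."""
    if term.startswith("http"):
        return term  # already a URI
    parts = re.split(":|_", term)
    if len(parts) != 2:
        return term  # unexpected shape, leave unchanged
    db, iden = parts
    template = URI_TEMPLATES.get(db.lower())
    # Unknown prefixes are also returned unchanged.
    return template.format(iden) if template else term


print(normalise_term("MONDO:0020783"))
print(normalise_term("HP_0045074"))
```

This sketch checks the shape of the identifier and the prefix explicitly instead of relying on a bare `except:` as the diff does. Either approach leaves an unrecognised value unchanged, but the explicit checks avoid hiding unrelated errors.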


20 changes: 20 additions & 0 deletions cmat/trait_mapping/ontology_uri.py
@@ -0,0 +1,20 @@

+class OntologyUri:
+    db_to_uri_dict = {
+        "orphanet": "http://www.orpha.net/ORDO/Orphanet_{}",
+        "omim": "http://identifiers.org/omim/{}",
+        "efo": "http://www.ebi.ac.uk/efo/EFO_{}",
+        "mesh": "http://identifiers.org/mesh/{}",
+        "medgen": "http://identifiers.org/medgen/{}",
+        "hp": "http://purl.obolibrary.org/obo/HP_{}",
+        "doid": "http://purl.obolibrary.org/obo/DOID_{}",
+        "mondo": "http://purl.obolibrary.org/obo/MONDO_{}",
+    }
+
+    def __init__(self, id_, db):
+        self.id_ = id_
+        self.db = db
+        self.uri = self.db_to_uri_dict[self.db.lower()].format(self.id_)
+
+    def __str__(self):
+        return self.uri
22 changes: 1 addition & 21 deletions cmat/trait_mapping/oxo.py
@@ -5,33 +5,13 @@

 from cmat.trait_mapping.ols import get_ontology_label_from_ols, is_in_ontology
 from cmat.trait_mapping.ols import is_current_and_in_ontology
+from cmat.trait_mapping.ontology_uri import OntologyUri
 from cmat.trait_mapping.utils import json_request


 logger = logging.getLogger(__package__)


-class OntologyUri:
-    db_to_uri_dict = {
-        "orphanet": "http://www.orpha.net/ORDO/Orphanet_{}",
-        "omim": "http://identifiers.org/omim/{}",
-        "efo": "http://www.ebi.ac.uk/efo/EFO_{}",
-        "mesh": "http://identifiers.org/mesh/{}",
-        "medgen": "http://identifiers.org/medgen/{}",
-        "hp": "http://purl.obolibrary.org/obo/HP_{}",
-        "doid": "http://purl.obolibrary.org/obo/DOID_{}",
-        "mondo": "http://purl.obolibrary.org/obo/MONDO_{}",
-    }
-
-    def __init__(self, id_, db):
-        self.id_ = id_
-        self.db = db
-        self.uri = self.db_to_uri_dict[self.db.lower()].format(self.id_)
-
-    def __str__(self):
-        return self.uri
-
-
@total_ordering
class OxOMapping:
"""
18 changes: 14 additions & 4 deletions docs/manual-curation/step2-manual-curation.md
@@ -6,7 +6,7 @@ The goals of the manual curation:
* _Suggested previous mapping_ traits should be checked for any terms that have become obsolete since the last iteration. These will be colored red and likely have a _suggested replacement mapping_ provided in the appropriate column. If no replacement is provided, curate as usual.
* For the rest of the traits, we curate as many as possible.

-Good mappings must be eyeballed to ensure they are actually good. Alternative mappings for medium or low quality mappings can be searched for using OLS. If a mapping can't be found in EFO, look for a mapping to a HP, ORDO, or MONDO trait name. Most HP/ORDO/MONDO terms will also be in EFO but some are not. These can be imported to EFO using the Webulous submission service.
+Good mappings must be eyeballed to ensure they are actually good. Alternative mappings for medium or low quality mappings can be searched for using OLS. If a mapping can't be found in EFO, look for a mapping to a HP, ORDO, or MONDO trait name. Most HP/MONDO terms will also be in EFO but some are not.

## Criteria to manually evaluate mapping quality
* Exact string for string matches are _good_
@@ -20,7 +20,10 @@ Good mappings must be eyeballed to ensure they are actually good. Alternative ma
In general, complex traits with modifiers (e.g. "autosomal recessive", "early onset", or "history of") should not be mapped to the more general term (i.e. without modifiers) because it loses important information. For now the curator should follow the same protocol as for any other term and request to import/create a new term containing the necessary modifiers.

## Unmapped trait names
-Trait names that haven't been automatically mapped against any ontology term can also be searched for using OLS. If a mapping can't be found in EFO, look for a mapping to a HP, ORDO, or MONDO trait name. If these are not already in EFO they should be imported to EFO using the Webulous submission service.
+Trait names that haven't been automatically mapped against any ontology term can also be searched for using OLS.
+If a mapping can't be found in EFO, look for a mapping to a HP, ORDO, or MONDO trait name.
+HP and MONDO terms can be imported into EFO if not present.
+ORDO terms cannot be directly imported but can be used as the basis for new EFO terms.

## Curation workflow
Curation should be done by subsequently applying filters to appropriate columns, then making decisions for the traits in the filtered selection.
@@ -43,8 +46,13 @@ Curation should be done by subsequently applying filters to appropriate columns,
* 3.1. Set the Status column to only include "blank" entries
* 3.2. Search for suitable mappings using OLS - https://www.ebi.ac.uk/ols4/

+The curator can also leverage any additional mappings provided, which have the format `URL|LABEL|ZOOMA_QUALITY|ZOOMA_SOURCE|EFO_STATUS`.
+* `ZOOMA_QUALITY` indicates the confidence returned by Zooma (high/medium/low), or "not specified" if the term originates from outside Zooma.
+* `ZOOMA_SOURCE` indicates the datasource of the mapping in Zooma (e.g. EVA or ClinVar Xrefs, or a specific ontology), or can indicate the source is a previously-used or replacement mapping.
+* `EFO_STATUS` indicates whether the term is current, obsolete, or not present in EFO.
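
As a rough illustration of the `URL|LABEL|ZOOMA_QUALITY|ZOOMA_SOURCE|EFO_STATUS` format, the sketch below splits one such string into named fields. The names `AdditionalMapping` and `parse_mapping` are hypothetical, and the label and status values in the example string are placeholders rather than real pipeline output.

```python
from typing import NamedTuple


class AdditionalMapping(NamedTuple):
    """One additional mapping, with the five fields in documented order."""
    url: str
    label: str
    zooma_quality: str
    zooma_source: str
    efo_status: str


def parse_mapping(mapping: str) -> AdditionalMapping:
    """Split a pipe-delimited mapping string and reject any other field count."""
    fields = mapping.split("|")
    if len(fields) != 5:
        raise ValueError(f"Expected 5 pipe-separated fields, got {len(fields)}: {mapping!r}")
    return AdditionalMapping(*fields)


# Placeholder values: the label and status strings are illustrative only.
example = "http://purl.obolibrary.org/obo/MONDO_0020783|example label|NOT_SPECIFIED|replacement|EFO_OBSOLETE"
parsed = parse_mapping(example)
print(parsed.zooma_source)              # replacement
print("OBSOLETE" in parsed.efo_status)  # True
```

Rejecting strings without exactly five fields would also catch a mixed or truncated format early, assuming labels never contain a pipe character.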

### Time-saving options
-The new manual workflow can be shortened if necessary, while the quality of the results will be _at least as good as for the old workflow_ (because we're reusing the results of previous curations):
+The manual workflow can be shortened if necessary, while the quality of the results will be _at least as good as for the old workflow_ (because we're reusing the results of previous curations):
* Complete all Step 1 instances from the Curation workflow
* All subsections of Step 2 - they involve review of mappings previously selected by ourselves. The only changes will be those where the previously mapped term has now become obsolete, however a new mapping can be found during step 2.1

@@ -60,11 +68,13 @@ Make sure **not** to use a mixed format, `URL|LABEL|ZOOMA_QUALITY|ZOOMA_SOURCE||
### Marking the status of curated terms
The “Status” column has the following acceptable values:
* **DONE** — an acceptable trait contained in EFO has been found for the trait
-* **IMPORT** — an acceptable trait has been found from the MONDO/ORDO/HP ontologies which is not contained in EFO and must be imported
+* **IMPORT** — an acceptable trait has been found from the MONDO/HP ontologies which is not contained in EFO and must be imported
* **NEW** — new term must be created in EFO
* **SKIP** — trait is going to be skipped in this iteration, due to being too non-specific, or just having a low frequency
* **UNSURE** — temporary status; traits to be discussed with reviewers/the team
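
A small check along the following lines could flag Status cells that fall outside the accepted values before review. It is only a sketch: `VALID_STATUSES` and `invalid_statuses` are hypothetical names, and the input is assumed to be the Status column already read into a list of strings.

```python
# Accepted values for the "Status" column, as listed above.
VALID_STATUSES = {"DONE", "IMPORT", "NEW", "SKIP", "UNSURE"}


def invalid_statuses(statuses):
    """Return (row_number, value) pairs for non-blank statuses outside the accepted set."""
    problems = []
    for row_number, status in enumerate(statuses, start=1):
        value = status.strip()
        # Blank cells are traits that have not been curated yet, so they are not errors.
        if value and value not in VALID_STATUSES:
            problems.append((row_number, value))
    return problems


print(invalid_statuses(["DONE", "", "done", "IMPORT ", "TODO"]))
# [(3, 'done'), (5, 'TODO')]
```

The comparison is case-sensitive here, which is an assumption; relax it if lower-case entries should be accepted.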

+Note that IMPORT and NEW terms are processed in Step 4, for now you should ignore the `Add EFO disease` tab within the manual curation spreadsheet and simply mark the status appropriately.

### Comment field for curation review
The "Comment" field can be used to enter arbitrary additional information which will be used by reviewers. Precede any text with initials e.g. "BK - example comment". Comments should be ordered chronologically in reverse: most recent ones at the top.
Any comments will become available in the Notes field within the next iteration.