Manual curation for 2024.03 release #410

Closed
7 tasks done
apriltuesday opened this issue Jan 15, 2024 · 7 comments
Comments

@apriltuesday
Contributor

apriltuesday commented Jan 15, 2024

Refer to documentation for full description of steps.

Checklist:

  • Step 1 — Process
  • Step 2 — Curate
    • Curation
    • Review 1
    • Review 2
  • Step 3 — Export
  • Step 4 — EFO feedback
@apriltuesday
Contributor Author

Hello @tcezard @M-casado, here is the curation spreadsheet for this round. Some notes on this one:

  • I've implemented some changes mentioned in the last curation round in PR Issue 402: Manual curation fixes from October #412, mostly documentation and improvements to how replacement terms are fetched. If you could have a look particularly at the SOP and leave your feedback, this would be much appreciated 🙏
  • There are quite a number of obsoleted terms this round, I think in large part due to MONDO making some large-scale obsoletion efforts. I don't think any are due to the code changes mentioned above, but as always if there's something suspicious let me know.

@tcezard
Member

tcezard commented Jan 23, 2024

Ready for review.
112 DONE
227 IMPORT
1 NEW
4 UNSURE

Couple of extra points:

  1. A few lines have been duplicated, which means the actual total is lower than these counts.
  2. I focused on the OBSOLETE terms, so there aren't that many new curations.
  3. The UNSURE rows are terms where the ClinVar label points to a group of several specific terms rather than to a single specific term or to the more generic term. For example, in "stickler syndrome, dominant" the "dominant" aspect covers 3 possible Stickler syndrome types (1, 2, IIa6) but not the other types (4, 5). We could use the "stickler syndrome" term, which would be correct but is not quite specific enough.
  4. I quickly settled into a pattern in my curation and wanted to post the "algorithm":
  • Search for the term in EFO looking for a perfect or close match
  • Search for the term in MONDO looking for a perfect or close match
  • Search for the term in HP looking for a perfect or close match
  • Search for the term in MEDGEN looking for a synonym or a definition that I could use to repeat the first three searches (EFO/MONDO/HP)

I think we could have these searches precomputed before the spreadsheet is made. That would make the manual curation much faster.
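As a rough illustration of what precomputing these searches might look like, here is a minimal sketch. It assumes the ontology labels and MedGen synonyms have already been loaded into plain in-memory dictionaries; the function names, the input shapes and the 0.9 similarity cutoff are all hypothetical and not part of the existing pipeline. "Close match" is approximated with `difflib.get_close_matches` from the standard library, which is only a stand-in for whatever matching the curators actually apply by eye.

```python
from difflib import get_close_matches

# Search order taken from the list above; MedGen is only used as a source of synonyms.
ONTOLOGY_ORDER = ("EFO", "MONDO", "HP")


def find_match(term, labels_by_ontology, cutoff=0.9):
    """Return (ontology, label) for the first exact or close match, else None.

    `labels_by_ontology` is an assumed input: {"EFO": [label, ...], ...}.
    """
    query = term.strip().lower()
    for ontology in ONTOLOGY_ORDER:
        # Map lowercased labels back to their original spelling.
        labels = {label.lower(): label for label in labels_by_ontology.get(ontology, [])}
        if query in labels:
            return ontology, labels[query]
        close = get_close_matches(query, list(labels), n=1, cutoff=cutoff)
        if close:
            return ontology, labels[close[0]]
    return None


def precompute_suggestion(term, labels_by_ontology, medgen_synonyms):
    """Try the term itself, then each MedGen synonym, against EFO/MONDO/HP.

    `medgen_synonyms` is an assumed input: {clinvar_label: [synonym, ...]}.
    """
    for candidate in [term, *medgen_synonyms.get(term, [])]:
        match = find_match(candidate, labels_by_ontology)
        if match:
            return match
    return None
```

The result could be written into an extra "suggested term" column when the spreadsheet is generated, leaving the curator to confirm or reject each suggestion.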

@tcezard
Member

tcezard commented Jan 24, 2024

@apriltuesday pointed me to the documentation on multiple mappings, so I changed the UNSURE curations by duplicating the rows and adding terms that cover different portions of the ClinVar description.
In some cases that creates a mixture of IMPORT and DONE terms, which might be confusing.
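To make the row duplication concrete, here is a small sketch of how one ClinVar label could be expanded into several rows, plus a helper that lists labels ending up with a mixture of statuses. The column names (`clinvar_label`, `mapped_term`, `status`) are hypothetical and do not necessarily match the real spreadsheet.

```python
from collections import defaultdict


def expand_multiple_mappings(row, target_terms):
    """Return one copy of `row` per (term, status) pair in `target_terms`."""
    return [
        {**row, "mapped_term": term, "status": status}
        for term, status in target_terms
    ]


def labels_with_mixed_status(rows):
    """Return {clinvar_label: sorted statuses} for labels with more than one status."""
    statuses = defaultdict(set)
    for row in rows:
        statuses[row["clinvar_label"]].add(row["status"])
    return {label: sorted(found) for label, found in statuses.items() if len(found) > 1}
```

Running `labels_with_mixed_status` over the expanded rows would give reviewers a short list of the labels where IMPORT and DONE are combined.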

@apriltuesday
Contributor Author

apriltuesday commented Jan 24, 2024

café-au-lait macules with pulmonary stenosis is confusing, but based on that Medgen page I changed it to Watson syndrome... let me know if you disagree.

For the others I'm not really sure whether we should recreate in EFO the intermediate grouping that Mondo obsoleted (so make these NEW terms), or map to the multiple terms as you've done... Mondo seems to have a lot of exclusion rules, and I don't know whether they apply to us (or EFO, or Open Targets...). But there might be something problematic about the intermediate grouping that I'm not seeing... Maybe we should wait for @M-casado's input.

@M-casado
Collaborator

M-casado commented Jan 30, 2024

UNSURE ClinVar labels

  • stickler syndrome, dominant - I read your comment @tcezard regarding the duplicated rows to cover types 1, 2 and IIa6. Nevertheless, I also found that there are other types of the disease that are dominant (see highlighted text). Some of these are present already in Mondo and EFO, some only in EFO, and some in other ontologies.
    • I wonder whether we would be mapping from a "parent term" (i.e. "dominant disease") to "subtype terms" (types 1, 2 and IIa6) unless we also add the other dominant types to the ontologies. I noticed most, but not all, fall under the parent term Stickler syndrome (MONDO:0019354).
    • Also, I found other types imported into EFO from other sources, like Orphanet (Orphanet:166100). The hierarchies clash and don't fully match, so I wonder what the best approach would be to import the other types, if we decide to.
    • We should review the mappings you duplicated, since I noticed some should be EFO_CURRENT, but are not (e.g. http://purl.obolibrary.org/obo/MONDO_0011493|Stickler syndrome type 2|||)
  • antenatal bartter syndrome - A similar case, where a parent term was made obsolete and we are mapping to subtypes.
    • Regarding which subtypes of the disease have antenatal onset, @tcezard mentioned it was 1 and 2, but I also found that Type 5 (ORDO:570371) is defined as antenatal in ORDO.
    • Furthermore, I got even more confused when I found that types 1, 2 and 4 (but not 5) are defined as antenatal in MedlinePlus.
    • This time I noticed the opposite of the previous EFO_CURRENT issue: MONDO:0009424 is not in EFO, but has EFO_CURRENT in its mapping.
  • autosomal recessive stickler syndrome - Continuation of the first UNSURE, but with a different type of inheritance.
    • At this point I'm not even sure if the types are related to the mode of inheritance, given that the previous source I cited mentions type 3 as dominant, but I can also find cases like this one, where it's autosomal recessive.
    • I'm also worried that we are taking for granted that "those that are not autosomal dominant must be autosomal recessive", which may not be the case. Either that, or I'm missing a source that explicitly says which ones are autosomal recessive.
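The two EFO_CURRENT inconsistencies mentioned above (a term present in EFO but not flagged, and a term flagged but absent from EFO) could in principle be caught automatically. The sketch below is only an illustration: it assumes the pipe-separated mapping string starts with the URI and label, as in the example quoted above, that the EFO_CURRENT flag would appear in one of the trailing fields, and that a set of URIs known to be in EFO is available. The last two points are assumptions, not confirmed details of the spreadsheet format.

```python
def check_efo_current(mapping, uris_in_efo):
    """Return None if the flag is consistent, else a short description of the mismatch.

    `mapping` is a pipe-separated string such as "URI|label|||".
    `uris_in_efo` is an assumed input: a set of term URIs present in EFO.
    """
    fields = mapping.split("|")
    uri = fields[0]
    # Assumption: the flag, when present, sits in one of the fields after the label.
    flagged = "EFO_CURRENT" in fields[2:]
    in_efo = uri in uris_in_efo
    if in_efo and not flagged:
        return "in EFO but missing EFO_CURRENT"
    if flagged and not in_efo:
        return "EFO_CURRENT but not in EFO"
    return None
```

A check like this would only be as good as the list of EFO terms it is given, so it is a reviewing aid rather than a replacement for looking at the term.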

We could use the "stickler syndrome" term which would be correct but is not quite specific enough

I think that, based on our rubric, we can either: (1) import/create a new parent term that distinguishes the inheritance mode; (2) import/create all subtypes and map to them exhaustively; or (3) resort to mapping a subtype to a parent term. My experience tells me we ought to do the last one. This is my fear regarding mappings from parent terms to subtypes: unless we are exhaustive with the duplicated mappings, we would be creating an incorrect and incomplete association.

I also changed severe myoclonic epilepsy in infancy to UNSURE, given that the associated term to be imported (MONDO_0014960) was not related, as far as I could tell. I added a comment on which EFO term we could map it to, although it's a parent term and has a note regarding possible obsoletion.

@tcezard
Member

tcezard commented Jan 31, 2024

All done resolving the UNSURE terms. Thank you @M-casado @apriltuesday
I've made a copy of the spreadsheet that we can use during the upcoming KT session without risking modifying the one we use for submission.

@apriltuesday
Contributor Author

Thanks all, export done and EFO issue created.


3 participants