Genome Nexus sometimes annotates SNV as DNP [issue #32] #519

Closed
ao508 opened this issue Mar 18, 2021 · 8 comments
Labels
GENIE Genome Nexus help wanted Extra attention is needed

Comments

ao508 commented Mar 18, 2021

genome-nexus/annotation-tools#32

ao508 added cmo-ops-scrum CMO Operations Scrum GENIE Genome Nexus help wanted Extra attention is needed labels Mar 18, 2021
ao508 assigned inodb and unassigned inodb Mar 24, 2021
sheridancbio removed the cmo-ops-scrum CMO Operations Scrum label Mar 24, 2021

averyniceday commented Mar 25, 2021

I will take notes in this card for now. These notes can be moved to a Google Doc or similar after the problem is fixed.

Questions to Answer

  • Which Genome Nexus does Tom point to (and which version of the server code does it run)?

Should be the GENIE Genome Nexus, which runs this image: genomenexus/gn-spring-boot:v1.1.21-ignore-myvariantinfo-clinvar. For reference, it looks like the server code is up to 1.1.24. This is another avenue to look into (migrate the hack to the newer version and see if this is fixed). The best test for this is to annotate with our own Genome Nexus and see if the annotation is correct.

Talked with Rob about this and it looks like the code fixing the ClinVar bug was merged into master, so it might be good to switch to 1.1.24 regardless.

Note for retrospective or discussion: Let's deploy version changes all at once. Right now we have to consider how the version changes (.21 -> .24) might affect the output. It would be nice not to have to worry about it.

  • Is the problem after running the whole wrapper or just at a specific step?

According to Tom, this problem of mis-annotating an SNV as DNP/ONP occurs in two places: after the pre-processing (vcf2maf?) and after Genome Nexus annotation (maybe server code?). If this is the case, I will focus on fixing the Genome Nexus server code first and revisit vcf2maf (Python equivalent) if necessary.

  • For the annotator, what are the values in application.properties that Tom uses? Is this pre-built and provided?

Tom's application.properties file is added below. We might want to check this in (or maybe it already is, but I wasn't sure where to find it).

  • Is this problem specific to server, annotator, vcf2maf?

Seems specific to the server: switching Genome Nexus servers changes results from DNP to SNP.

  • How is the request being sent in to Genome Nexus?

  • If this is specific to GENIE Genome Nexus because we're on a different version, is this something we want to propagate?

averyniceday commented Mar 25, 2021

Setting Up a Genome Nexus Server
Relatively easy following the steps on the genome-nexus server page. The biggest challenge here is making sure you deploy the relevant server code (e.g., the hack for GENIE Genome Nexus vs. master).

Setting Up Annotation Wrapper
We need to build an annotator jar for this, and we need a copy of the application.properties file that Tom uses (or the default one).

Testing Setup
For now I am building a Genome Nexus server image locally and pushing it to Docker. Then I replace the image inside the current production GENIE Genome Nexus and run tests. If we want to do this locally we'll need to build a VEP cache, which takes a couple of hours. Seeing that Tom is the only one using GENIE Genome Nexus and it doesn't work, I think this is okay for now (discussed with Rob).

I HIGHLY recommend altering the Dockerfile to just copy the pre-built war file instead of rebuilding it as part of the Dockerfile.

averyniceday commented Mar 25, 2021

Tom's property file

spring.batch.job.enabled=false
spring.jmx.enabled=false
chunk=100
#genomenexus.enrichment_fields=annotation_summary,my_variant_info
genomenexus.enrichment_fields=annotation_summary,sift,polyphen,my_variant_info
genomenexus.isoform_query_parameter=isoformOverrideSource
genomenexus.base=https://genie.genomenexus.org/

averyniceday commented Mar 25, 2021

Beginning to think that this will be fixed with latest genome nexus server code.

jk no dice

averyniceday commented Mar 25, 2021

Confirmed that there's a difference in behavior in VariantTypeResolver (see here) depending on whether we are running a local VEP-backed or an Ensembl VEP-backed Genome Nexus.

Beginning to trace how incoming requests to Genome Nexus are converted...

averyniceday commented Mar 26, 2021

Need to change this function if we want to fix it:
NotationConverter.genomicToEnsemblRestRegion

jk false alarm

averyniceday commented Mar 26, 2021

Example query (based on the codebase these should be equivalent; this happens around here):

HGVSg: 11:g.69514130_69514131delinsTT
http://grch37.rest.ensembl.org/vep/human/hgvs/11:g.69514130_69514131delinsTT?

Region: 11:69514130-69514131:1/TT
http://grch37.rest.ensembl.org/vep/human/region/11:69514130-69514131:1/TT?

Look at the allele_string field of the two responses. The HGVS response has C/T, while the region response has TC/TT. That gets fed into the resolvers listed below.
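To make the two queries easier to reproduce side by side, here is a minimal Python sketch that builds both notations for a delins-style change and pulls allele_string out of each response. The helper names are hypothetical and are not the server's NotationConverter; the content-type query parameter and the list-of-records response shape are my assumptions about the Ensembl REST API and should be checked against its documentation.

```python
import json
import urllib.request

# Base URL taken from the example queries above (GRCh37 Ensembl REST).
BASE = "http://grch37.rest.ensembl.org/vep/human"


def hgvs_query(chrom, start, end, var):
    """Genomic HGVS notation for a delins, e.g. 11:g.69514130_69514131delinsTT."""
    return f"{chrom}:g.{start}_{end}delins{var}"


def region_query(chrom, start, end, var, strand=1):
    """Region notation for the same change, e.g. 11:69514130-69514131:1/TT."""
    return f"{chrom}:{start}-{end}:{strand}/{var}"


def vep_url(kind, query):
    """kind is "hgvs" or "region"; asks for JSON (assumed query parameter)."""
    return f"{BASE}/{kind}/{query}?content-type=application/json"


def fetch_allele_strings(url):
    """Return the allele_string of every record in the response (needs network)."""
    with urllib.request.urlopen(url) as resp:
        return [record.get("allele_string") for record in json.load(resp)]
```

Calling fetch_allele_strings(vep_url("hgvs", hgvs_query("11", 69514130, 69514131, "TT"))) and the region equivalent should show the two different allele_string values discussed here, assuming the endpoints still behave as they did when this was written.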

Resolvers

VariantTypeResolver.resolveVariantType() here

This uses ref/var allele length to decide (does not look at prefix)
For HGVS, output would be: Ref C, Var T
For Region, output would be: Ref TC, Var TT

GenomicLocationResolver.resolveReferenceAllele()/resolveVariantAllele() here
This does a simple split of allele_string to determine ref/var alleles (which are then used by VariantTypeResolver). Also does not check for prefix.
For HGVS: looks at change C -> T, sees str length 1 and assigns SNP
For Region: looks at change TC -> TT, sees str length 2 and assigns DNP

Not sure if we want to add something similar to the processing in the NotationConverter here
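A rough sketch of the length-based behavior described above, and of what trimming a shared prefix first would change. This is Python rather than the server's Java, the function names are hypothetical, and it is not the actual VariantTypeResolver or GenomicLocationResolver logic, just the idea as I understand it.

```python
# MAF-style labels for equal-length substitutions; longer ones fall back to ONP.
SUBSTITUTION_TYPES = {1: "SNP", 2: "DNP", 3: "TNP"}


def split_allele_string(allele_string):
    """Simple split of a VEP allele_string such as "TC/TT" into (ref, var)."""
    ref, var = allele_string.split("/", 1)
    return ref, var


def trim_shared_prefix(ref, var):
    """Drop leading bases common to both alleles; also return how many were dropped."""
    n = 0
    while n < min(len(ref), len(var)) and ref[n] == var[n]:
        n += 1
    return ref[n:], var[n:], n


def variant_type_by_length(ref, var):
    """Classify using allele lengths only, without looking at shared bases."""
    ref = ref.replace("-", "")
    var = var.replace("-", "")
    if not ref and not var:
        return None  # nothing left to classify
    if len(ref) == len(var):
        return SUBSTITUTION_TYPES.get(len(ref), "ONP")
    return "INS" if len(ref) < len(var) else "DEL"


# Length only: the region-style allele_string is called a DNP.
print(variant_type_by_length(*split_allele_string("TC/TT")))

# With prefix trimming: TC/TT becomes C/T and the start shifts by one base.
ref, var, shift = trim_shared_prefix(*split_allele_string("TC/TT"))
print(ref, var, shift, variant_type_by_length(ref, var))
```

If something like this were adopted, the start position would also need to move by the number of trimmed bases (69514130 to 69514131 in the example), which is part of why it is unclear where in the pipeline the fix belongs.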

@averyniceday

this is the command I'm running to test:

sh annotation_suite_wrapper.sh -i=/<placeholder>/annotation-tools/tmp/ -o=/<placeholder>/annotation-tools/outputmp/ -m=/<placeholder>/annotation-tools/outputmp/randomfile.txt -c=randostr -s=anotherrandostring -p=/<placeholder>/annotation-tools/
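To check the wrapper output for the symptom without eyeballing it, something like the following Python sketch could flag rows called DNP/TNP/ONP whose reference and variant alleles still share a leading or trailing base. It assumes the output is a tab-delimited MAF with the standard Variant_Type, Reference_Allele and Tumor_Seq_Allele2 columns and optional "#" comment lines; the function name is made up for illustration.

```python
import csv


def suspicious_multinucleotide_rows(maf_lines):
    """Return MAF rows typed DNP/TNP/ONP whose alleles share a first or last base."""
    data_lines = (line for line in maf_lines if not line.startswith("#"))
    flagged = []
    for row in csv.DictReader(data_lines, delimiter="\t"):
        ref = row["Reference_Allele"]
        alt = row["Tumor_Seq_Allele2"]
        if row["Variant_Type"] not in {"DNP", "TNP", "ONP"} or not ref or not alt:
            continue
        # A shared flanking base suggests the call could be a shorter variant.
        if ref[0] == alt[0] or ref[-1] == alt[-1]:
            flagged.append(row)
    return flagged
```

Usage would be along the lines of opening the annotated MAF in the output directory and passing the file handle to suspicious_multinucleotide_rows, then printing Chromosome and Start_Position for each flagged row.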
