Genome Nexus sometimes annotates SNV as DNP #32
Comments
At first glance, by looking at the
It looks like something is going on with the reference alleles: they change between input.txt and input.txt.temp.annotated.txt. The variant type annotation seems to be based on the reference allele in input.txt (the old reference) rather than the one in input.txt.temp.annotated.txt (the new reference).
@thomasyu888 could I get some additional details on this issue? Are there specific records inside the uploaded files that I can use as an example to trace the problem?
Thanks for looking into this @averyniceday . If you download As stated in this comment, there seems to be a couple things going on:
Due to these two steps, my hypothesis is as follows:
So in this scenario, our collaborator is saying these variants should be annotated with
I have looked over input.txt and queried every variant position in the UCSC Genome Browser to extract the latest/final hg19 nucleotide sequence at each position. In every case, the provided Reference_Allele column disagrees with the hg19 sequence extracted from the UCSC browser. Most cases follow one of these two patterns:
Ref_allele=CC, UCSC_hg19=TC, Tumor_seq_allele2=TT (34 cases)
Ref_allele=GG, UCSC_hg19=AG, Tumor_seq_allele2=AA (19 cases)
In all cases, comparing the tumor_seq_allele to the UCSC_hg19 allele gives a SNP, while comparing it to the provided Ref_allele gives a DNP (or, in one case, a 6-nt replacement). (full details below)
We are confirming that all of these cases should have been returned with a "failure to annotate" message, possibly with a note that the provided reference_allele was not consistent with the reference genome used by our installation of VEP (which seems to use the latest/final version of the hg19/GRCh37 assembly). Instead, the queries to VEP were sent in a form that did not provide the reference sequence explicitly; they relied on a format that specifies only the genomic position range being deleted and the nucleotides that replace it.
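To make the two patterns above concrete, here is a minimal Python sketch of how the same tumor allele can be classified as a DNP or a SNP depending on which reference allele it is compared against. This is not the Genome Nexus or VEP implementation; the function names (`trim_shared_bases`, `variant_type`, `annotate_or_fail`) and the `"NONE"` label are hypothetical and exist only for illustration.

```python
from typing import Tuple


def trim_shared_bases(ref: str, alt: str) -> Tuple[str, str]:
    """Strip bases that both alleles share at the start and at the end."""
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    return ref, alt


def variant_type(ref: str, alt: str) -> str:
    """Classify a variant using MAF-style labels after trimming shared bases."""
    ref, alt = trim_shared_bases(ref, alt)
    if len(ref) != len(alt):
        return "INS" if len(ref) < len(alt) else "DEL"
    # "NONE" is a hypothetical label for identical alleles.
    return {0: "NONE", 1: "SNP", 2: "DNP", 3: "TNP"}.get(len(ref), "ONP")


def annotate_or_fail(provided_ref: str, genome_ref: str, alt: str) -> str:
    """Hypothetical guard: refuse to annotate when the provided reference
    allele disagrees with the reference genome at that position."""
    if provided_ref != genome_ref:
        return (
            "failure to annotate: provided reference allele "
            f"{provided_ref} does not match genome reference {genome_ref}"
        )
    return variant_type(provided_ref, alt)


# Pattern 1 from the comment above: Ref_allele=CC, UCSC_hg19=TC, allele2=TT
print(variant_type("CC", "TT"))            # against the provided reference
print(variant_type("TC", "TT"))            # against the hg19 reference
print(annotate_or_fail("CC", "TC", "TT"))  # mismatch is reported, not annotated
```

Against the provided reference (CC) the change spans two bases, while against hg19 (TC) the leading T is shared and only one base differs, which matches the DNP versus SNP discrepancy described here.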
input: input.txt
Intermediate files:
annotation-tools
intermediate files. I must add .txt at the end or GitHub won't allow me to upload these. My understanding is that input.txt.temp.annotated.txt is the output from Genome Nexus. But because annotation-tools allows us to include a directory with a list of MAFs or VCFs, it annotates each of those files separately. processed.txt is all of these merged.
input.txt.temp.annotated.txt
input.txt.temp.txt
Processed:
processed.txt