issue with creating database #57
Comments
looks like |
in the vcf header it is described as: Thanks for getting back to me so quickly and resolving my issue.. Andrew |
hmm. did you run |
i ran vt decompose -s on the vcf before loading... The only difference with this vcf was it had been put through the GATK refinement workflow i.e. https://gatkforums.broadinstitute.org/gatk/discussion/4723/genotype-refinement-workflow. I wonder if that affected something? Andrew |
I also had that issue; now I always do
|
I ran into the same issue as well:
The VCF had been processed by 'vt decompose'
|
I believe the problem is the |
Thanks @mmoisse. The error is fixed by removing the second value of these fields. The root of the problem is that there are duplicate records in the gnomAD genome VCF:
vcfanno concatenated the allele numbers (ANs) from rs114481025 and rs34163425 with ops=["self"]. The error is gone after I set ops=["max"]. |
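The `max` workaround above amounts to collapsing a multi-valued annotation into a single scalar before it reaches the database. Here is a minimal Python sketch of that idea, assuming the annotated value arrives as a comma-separated string. `collapse_max` is a hypothetical helper for illustration, not part of vcfanno or vcf2db, and the numbers are made up:

```python
def collapse_max(raw):
    """Reduce a comma-separated numeric INFO value to its largest element.

    Hypothetical helper for illustration; returns None for missing values.
    """
    # Drop empty entries and the VCF missing-value marker ".".
    parts = [p for p in raw.split(",") if p not in ("", ".")]
    if not parts:
        return None
    return max(float(p) for p in parts)


# Two AN values concatenated from duplicate records (made-up numbers).
print(collapse_max("30000,31000"))  # 31000.0
print(collapse_max("."))            # None
```

Whether the maximum is the right summary depends on the field; for allele numbers from duplicate records it at least yields one value per column.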
Hello, I am experiencing a similar issue. We have multiple exomes annotated with VEP, from which we create a multisample VCF using bcftools merge. After the merge, this multisample VCF is decomposed with vt decompose -s and passed to vcf2db.py to create a GEMINI db. Some sites that were multiallelic before this processing generate the error discussed in this issue. I can't figure out what is wrong. Your help is very much appreciated. I attach the vcf.gz, vcf.gz.tbi and ped of 4 samples, containing only the two lines that prevent loading, all in a zip file. If it were possible to identify which field impairs the loading and how, we would take care of it before using vcf2db.py. Many thanks in advance! Update: |
Hello...
I used gemini and vcf2db previously with great success, but I'm having issues with a new set of VCFs I've just received.
I annotated with snpEff in my usual way but received the following error message:
Traceback (most recent call last):
  File "/home/atimms/programs/vcf2db/vcf2db.py", line 923, in <module>
    impacts_extras=a.impacts_field, aok=a.a_ok)
  File "/home/atimms/programs/vcf2db/vcf2db.py", line 233, in __init__
    self.load()
  File "/home/atimms/programs/vcf2db/vcf2db.py", line 318, in load
    i = self._load(self.cache, create=True, start=1)
  File "/home/atimms/programs/vcf2db/vcf2db.py", line 311, in _load
    self.insert(variants, expanded, keys, i, create=create)
  File "/home/atimms/programs/vcf2db/vcf2db.py", line 373, in insert
    vilengths, variant_impacts)
  File "/home/atimms/programs/vcf2db/vcf2db.py", line 401, in _insert
    self.__insert(v_objs, self.metadata.tables['variants'].insert())
  File "/home/atimms/programs/vcf2db/vcf2db.py", line 443, in __insert
    trans.execute(stmt, o)
  File "/home/atimms/miniconda2/envs/hg38_genomes/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 980, in execute
    return meth(self, multiparams, params)
  File "/home/atimms/miniconda2/envs/hg38_genomes/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 273, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/home/atimms/miniconda2/envs/hg38_genomes/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1099, in _execute_clauseelement
    distilled_params,
  File "/home/atimms/miniconda2/envs/hg38_genomes/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1240, in _execute_context
    e, statement, parameters, cursor, context
  File "/home/atimms/miniconda2/envs/hg38_genomes/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1458, in _handle_dbapi_exception
    util.raise_from_cause(sqlalchemy_exception, exc_info)
  File "/home/atimms/miniconda2/envs/hg38_genomes/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 296, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "/home/atimms/miniconda2/envs/hg38_genomes/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1236, in _execute_context
    cursor, statement, parameters, context
  File "/home/atimms/miniconda2/envs/hg38_genomes/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 536, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.InterfaceError: (sqlite3.InterfaceError) Error binding parameter 48 - probably unsupported type. [SQL: u'INSERT INTO variants (variant_id, chrom, start, "end", vcf_id, ref, alt, qual, filter, type, sub_type, call_rate, num_hom_ref, num_het, num_hom_alt, num_unknown, aaf, gene, ensembl_gene_id, transcript, is_exonic, is_coding, is_lof, is_splicing, is_canonical, exon, codon_change, aa_change, aa_length, biotype, impact, impact_so, impact_severity, polyphen_pred, polyphen_score, sift_pred, sift_score, an, baseqranksum, clippingranksum, db, dp, ds, excesshet, fs, mq, mqranksum, negative_train_site, pg, positive_train_site, qd, raw_mq, readposranksum, sor, vqslod, culprit, loconfdenovo, old_multiallelic, old_variant, lof, consequence, symbol, feature_type, feature, intron, hgvsc, hgvsp, cdna_position, cds_position, protein_position, amino_acids, codons, existing_variation, distance, strand, flags, variant_class, symbol_source, hgnc_id, canonical, sift, hgvs_offset, hgvsg, amino_acid_change, transcript_biotype, gene_coding, transcript_id, exon_rank, genotype, gts, gt_types, gt_phases, gt_depths, gt_ref_depths, gt_alt_depths, gt_quals, gt_alt_freqs) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)'] [parameters: (1, u'chr1', 10143, 10150, None, u'TAACCCC', u'T', 120.08000183105469, None, 'indel', 'del', 1.0, 1, 2, 0, 0, 0.3333333333333333, u'DDX11L1', None, u'ENST00000456328', 0, 0, 0, 0, 0, u'', u'1724', u'', None, u'processed_transcript', 'upstream_gene_variant', 'upstream_gene_variant', 'LOW', None, None, None, None, 6, -0.550000011920929, -0.550000011920929, 0, 75, 0, 3.9793999195098877, 0.0, 22.270000457763672, 0.9369999766349792, 0, (0, 0, 0), 0, 17.149999618530273, 17356.0, 0.9369999766349792, 
0.36800000071525574, 3.0899999141693115, u'FS', None, None, u'None', u'None', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', u'', u'processed_transcript', u'NON_CODING', u'ENST00000456328', u'', u'T', <read-only buffer for 0x7fffdfe884f8, size -1, offset 0 at 0x7fffdef27270>, <read-only buffer for 0x7fffdfeed7a0, size -1, offset 0 at 0x7fffdef272b0>, <read-only buffer for 0x7fffdfe91120, size -1, offset 0 at 0x7fffdef272f0>, <read-only buffer for 0x7fffdfeed7d8, size -1, offset 0 at 0x7fffdef27330>, <read-only buffer for 0x7fffdfeed810, size -1, offset 0 at 0x7fffdef27370>, <read-only buffer for 0x7fffdfeed848, size -1, offset 0 at 0x7fffdef273b0>, <read-only buffer for 0x7fffdfeed880, size -1, offset 0 at 0x7fffdef273f0>, <read-only buffer for 0x7fffdfef8b30, size -1, offset 0 at 0x7fffdef27430>)] (Background on this error at: http://sqlalche.me/e/rvf5)
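Counting from zero through the column list of the INSERT statement above, parameter 48 is `pg`, and the value bound to it in the parameter list is the tuple `(0, 0, 0)`, a type sqlite3 cannot bind. Here is a minimal sketch of how such a value might be located, using only the Python standard library. The shortened column list, the in-memory table, and `find_unbindable` are assumptions for illustration, not vcf2db code:

```python
import sqlite3

# Types the sqlite3 module binds without a custom adapter.
BINDABLE = (type(None), int, float, str, bytes)


def find_unbindable(columns, params):
    """Return (index, column, value) for every value sqlite3 cannot bind."""
    return [
        (i, col, val)
        for i, (col, val) in enumerate(zip(columns, params))
        if not isinstance(val, BINDABLE)
    ]


# Shortened, hypothetical column list; `pg` holds a tuple as in the traceback.
columns = ["variant_id", "chrom", "dp", "pg"]
params = (1, "chr1", 75, (0, 0, 0))

print(find_unbindable(columns, params))

# Reproduce the binding failure against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (variant_id, chrom, dp, pg)")
try:
    conn.execute("INSERT INTO variants VALUES (?, ?, ?, ?)", params)
except sqlite3.Error as exc:
    # The exact exception class and message vary across Python versions.
    print(type(exc).__name__, exc)
```

Pairing each parameter with its column name makes the offending INFO field visible without counting placeholders by hand.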
The VCF I received does have some strange fields in the genotypes (generated by GATK); here's an example line:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT F03-00008 F03-00006 F03-00007
chr1 10144 . TAACCCC T 120.08 PASS AC=2;AF=0.333;AN=6;BaseQRankSum=-0.55;ClippingRankSum=-0.55;DP=75;ExcessHet=3.9794;FS=0;MLEAC=1;MLEAF=0.25;MQ=22.27;MQRankSum=0.937;PG=0,0,0;QD=17.15;RAW_MQ=17356;ReadPosRankSum=0.937;SOR=0.368;VQSLOD=3.09;culprit=FS;EFF=MOTIFMA0341.1:Egr1,MOTIFMA0366.1:Egr1,UPSTREAM(MODIFIER||1724|||DDX11L1|processed_transcript|NON_CODING|ENST00000456328||T),UPSTREAM(MODIFIER||1865|||DDX11L1|transcribed_unprocessed_pseudogene|NON_CODING|ENST00000450305||T),DOWNSTREAM(MODIFIER||4259|||WASH7P|unprocessed_pseudogene|NON_CODING|ENST00000488147||T),INTERGENIC(MODIFIER||||||||||T) GT:AD:DP:FT:GQ:JL:JP:PL:PP 0/1:40,0:40:lowGQ:2:-1:-1:0,0,545:2,0,547 0/1:3,4:7:PASS:50:-1:-1:126,0,46:127,0,50 0/0:23,0:23:lowGQ:0:.:.:0,0,0:0,0,0
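In the example line, `PG=0,0,0` is an INFO entry carrying several comma-separated values. Here is a rough sketch of how one might scan an INFO column for such entries before loading. `multi_valued_info_keys` is a hypothetical helper, and the INFO string is abbreviated from the line above with the EFF annotation left out:

```python
def multi_valued_info_keys(info):
    """Return INFO keys whose values contain a comma (more than one value)."""
    keys = []
    for entry in info.split(";"):
        # Flag-type entries have no "=", so sep is empty and they are skipped.
        key, sep, value = entry.partition("=")
        if sep and "," in value:
            keys.append(key)
    return keys


# INFO column abbreviated from the example line above (EFF omitted).
info = "AC=2;AF=0.333;AN=6;DP=75;FS=0;MQ=22.27;PG=0,0,0;QD=17.15;culprit=FS"
print(multi_valued_info_keys(info))  # ['PG']
```

Annotation fields such as EFF also contain commas by design, so a real check would probably need to exclude them.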
Any help would be greatly appreciated.
Andrew