Description of the bug
I've mentioned this before, but I don't think vcf-to-tab from VCFtools was designed to work with Graphtyper-formatted vcf files.
It works ok when dealing with haploid organisms, where you are just filtering for homozygous SNPs and have haploid calls, but in the conversion to tabular format the output tables don't have any diploid calls (see screenshots below).
As a result, it also collapses any heterozygous SNPs.
I was reminded of this issue after Camilo used this tool with VCF files produced using GATK, and interestingly, all diploid calls were retained, as were heterozygous SNPs.
In the conversion to multi-fasta alignment, he then saw the expected ambiguity codes.
Let's discuss this more, but I think we may need to select another tool to convert the vcf to tabular format before creating the SNP-multifasta alignment files that are inputs into poppr and are used when making the SNP tree.
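To make the discussion concrete, here is a minimal sketch of what a ploidy-aware conversion could look like. It is not how vcf-to-tab or the pipeline works. It assumes a standard tab-separated VCF data line with a GT entry in the FORMAT column, and all function names are hypothetical. It keeps every allele in the call (so `0/1` becomes `C/T`) and then maps SNP calls to IUPAC ambiguity codes for the multi-fasta step.

```python
# Hypothetical helpers for illustration; not part of VCFtools, Graphtyper, or the pipeline.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def gt_to_call(gt, alleles):
    """Translate a GT string such as '0/1', '1|1' or '1' into bases, e.g. 'C/T'."""
    indices = gt.replace("|", "/").split("/")
    bases = ["." if i == "." else alleles[int(i)] for i in indices]
    return "/".join(bases)


def vcf_line_to_tab(line):
    """Convert one VCF data line into [CHROM, POS, REF, call_1, call_2, ...]."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, alt, fmt = fields[0], fields[1], fields[3], fields[4], fields[8]
    alleles = [ref] + alt.split(",")
    # Assumption: the FORMAT column lists GT; its position can vary between callers.
    gt_index = fmt.split(":").index("GT")
    calls = [gt_to_call(sample.split(":")[gt_index], alleles) for sample in fields[9:]]
    return [chrom, pos, ref] + calls


def call_to_iupac(call):
    """Collapse a SNP call such as 'C/T' into one IUPAC code; 'N' if missing or not a SNP."""
    bases = call.split("/")
    if any(len(b) != 1 or b not in "ACGT" for b in bases):
        return "N"
    return IUPAC.get(frozenset(bases), "N")


# Made-up example line with three diploid samples (het, hom-alt, missing).
example = "chr1\t100\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:20\t1/1:18\t./.:0"
row = vcf_line_to_tab(example)
codes = [call_to_iupac(call) for call in row[3:]]
```

With the made-up line above, the heterozygous sample stays `C/T` in the table and becomes `Y` in the alignment, which is the behaviour we would want for diploid organisms.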
Finally, as I look at the VCF outputs:
In the current iteration of the pipeline, during variant filtering, there is a line where only homozygous SNPs are retained regardless of ploidy. We should discuss the implications of this.
I think another flag needs to be added to retain only biallelic SNPs (if that is the goal).
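As a rough illustration of the two filtering points above, the sketch below separates "is this site a biallelic SNP" from "is this genotype homozygous". The function names are hypothetical and this is not the pipeline's filtering code. In practice an existing tool would probably handle this (for example, `bcftools view -m2 -M2 -v snps` restricts a VCF to biallelic SNPs), but the logic is easier to discuss when written out.

```python
def is_biallelic_snp(ref, alt):
    """True when there is exactly one ALT allele and both REF and ALT are single bases."""
    alts = alt.split(",")
    return (
        len(alts) == 1
        and len(ref) == 1
        and len(alts[0]) == 1
        and ref in "ACGT"
        and alts[0] in "ACGT"
    )


def is_homozygous(gt):
    """True when all called alleles in the GT string are identical and none are missing."""
    indices = gt.replace("|", "/").split("/")
    return "." not in indices and len(set(indices)) == 1


# Made-up (REF, ALT, GT) records to show what each check keeps or drops.
records = [
    ("C", "T", "1/1"),    # biallelic SNP, homozygous
    ("C", "T", "0/1"),    # biallelic SNP, heterozygous
    ("C", "T,G", "1/2"),  # multiallelic site
    ("GA", "G", "1/1"),   # indel, not a SNP
    ("A", "G", "1"),      # haploid call
]
kept_homozygous_only = [r for r in records if is_biallelic_snp(r[0], r[1]) and is_homozygous(r[2])]
kept_any_genotype = [r for r in records if is_biallelic_snp(r[0], r[1])]
```

The homozygous-only list drops the `0/1` record, which is the implication worth discussing for diploid organisms: a heterozygous site is removed before it ever reaches the tabular conversion.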
Command used and terminal output
No response
Relevant files
Here is a snapshot of a bacterial output file generated using Graphtyper -> vcf-to-tab as part of a pipeline run
#CHROM POS REF GCF_002251695_1_ralstonia_solanacearum1 GCF_002251695_1_ralstonia_solanacearum2
NZ_NCTK01000001.1 28751 T G/ G/
NZ_NCTK01000001.1 28755 C T/ T/
NZ_NCTK01000001.1 28757 G A/ A/
NZ_NCTK01000001.1 28763 C G/ G/
NZ_NCTK01000001.1 28776 A G/ G/
NZ_NCTK01000001.1 28791 A G/ G/
NZ_NCTK01000001.1 28793 G T/ T/
NZ_NCTK01000001.1 28817 C C/ C/
NZ_NCTK01000001.1 28830 A A/ A/
NZ_NCTK01000001.1 28835 T T/ C/
NZ_NCTK01000001.1 28842 A G/ A/
NZ_NCTK01000001.1 28847 C A/ A/
NZ_NCTK01000001.1 28877 C T/ T/
NZ_NCTK01000001.1 28892 T C/ C/
NZ_NCTK01000001.1 28904 GA G/ GA/
NZ_NCTK01000001.1 28983 A G/ G/
NZ_NCTK01000001.1 29026 T C/ C/
NZ_NCTK01000001.1 29049 G G/ C/
Here is an oomycete output file generated using Graphtyper -> vcf-to-tab (as part of the same run)
#CHROM POS REF GCA_001466705_2_phytophthora_palmivora2
LATX02000001.1 1873814 G A/
LATX02000001.1 2111531 C T/
LATX02000007.1 1838 T C/
LATX02000012.1 104622 C ./
LATX02000013.1 24767 C T/
LATX02000013.1 665680 T G/
LATX02000017.1 2844 G A/
LATX02000017.1 55049 G A/
LATX02000020.1 52018 A G/
LATX02000020.1 52031 T C/
LATX02000024.1 498059 T A/
LATX02000046.1 457792 A ./
LATX02000046.1 457859 G T/
LATX02000046.1 457862 T C/
LATX02000046.1 457906 A C/
LATX02000046.1 457936 C ./
LATX02000046.1 457998 T C/
LATX02000057.1 373757 G A/
LATX02000057.1 406057 G C/
LATX02000057.1 442436 T C/
LATX02000078.1 731901 T C/
System information
No response
masudermann changed the title from "We need to review step where we convert Graphtyper VCF outputs files to tabular format using vcf-to-tab" to "We need to review the step where we convert Graphtyper VCF outputs files to tabular format using vcf-to-tab" on May 31, 2024