Optional first phasing symbol introduced in VCF 4.4 #263

Open
timothymillar opened this issue Jul 1, 2024 · 4 comments

Comments

@timothymillar

The VCF 4.4 spec now allows an optional initial symbol indicating the phasing of the first allele. For example, /0/1 is a valid genotype. At present, vcf2zarr raises an error on this input: "Couldn't read GT data: value not a number or '.' ...".
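
To make the syntax concrete, here is a minimal sketch of tokenising a GT string in which the first allele may carry its own phasing symbol. The helper name parse_gt is hypothetical and is not part of vcf2zarr, cyvcf2, or htslib. It only records whether a leading symbol was present (None when omitted) and does not try to infer the implied phasing.

```python
import re

# Hypothetical helper for illustration only (not a vcf2zarr/cyvcf2/htslib API).
# Each token is an optional phasing symbol followed by an allele index or '.'.
_GT_TOKEN = re.compile(r"([/|]?)(\.|\d+)")


def parse_gt(gt):
    """Return (alleles, phasing) for a GT string such as '0/1' or '/0/1'.

    alleles: list of int, with None for a missing allele ('.').
    phasing: list of '/' or '|' per allele; the first entry is None
             when the optional leading symbol is omitted.
    """
    alleles = []
    phasing = []
    pos = 0
    while pos < len(gt):
        m = _GT_TOKEN.match(gt, pos)
        # Every allele after the first must be preceded by a separator.
        if m is None or (pos > 0 and m.group(1) == ""):
            raise ValueError(f"invalid GT string: {gt!r}")
        sep, allele = m.groups()
        phasing.append(sep or None)
        alleles.append(None if allele == "." else int(allele))
        pos = m.end()
    if not alleles:
        raise ValueError("empty GT string")
    return alleles, phasing
```

With this sketch, parse_gt("/0/1") and parse_gt("0/1") yield the same alleles and differ only in whether the first phasing entry is "/" or None.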

@timothymillar
Author

Related issue around supporting partial phasing in the VCF-Zarr spec: sgkit-dev/vcf-zarr-spec#24

@jeromekelleher
Contributor

I think we'll need to wait on htslib and cyvcf2 support for this - presumably it'll be a while coming through the pipeline. I had a quick scan of the htslib issue tracker but didn't find anything.

What does bcftools view give for this VCF @timothymillar?

@timothymillar
Author

Good point, I don't think we can do anything for now. With the VCF:

##fileformat=VCFv4.4
##FILTER=<ID=PASS,Description="All filters passed">
##fileDate=20201009
##source=.
##reference=./simple.fasta
##contig=<ID=CHR1,length=60>
##contig=<ID=CHR2,length=60>
##contig=<ID=CHR3,length=60>
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE1 SAMPLE2 SAMPLE3
CHR1    2       .       A       T       60      PASS    NS=3;AC=3       GT      /1/1    /0/0    /0/0
CHR1    7       .       A       C       60      PASS    NS=3;AC=4       GT      /0/0    /0/1    /0/1

bcftools view (version 1.20) omits all of the records (nothing after #CHROM ...).

bcftools view (version 1.10.2) inserts an additional reference allele:

...
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SAMPLE1 SAMPLE2 SAMPLE3
CHR1    2       .       A       T       60      PASS    NS=3;AC=3       GT      0/1/1   0/0/0   0/0/0
CHR1    7       .       A       C       60      PASS    NS=3;AC=4       GT      0/0/0   0/0/1   0/0/1
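
Because neither bcftools version reports an error for these records, it may help to check a file for leading phasing symbols before running it through any tool. The following is a rough sketch only: records_with_leading_phasing is a hypothetical name, it reads plain uncompressed VCF text lines, and it assumes tab-separated columns (the example above is displayed with spaces).

```python
def records_with_leading_phasing(lines):
    """Yield (chrom, pos, sample_index) for each GT value that starts
    with '/' or '|'.

    Hypothetical pre-check for illustration; assumes tab-separated,
    uncompressed VCF text lines.
    """
    for line in lines:
        # Skip meta-information lines, the #CHROM header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT or sample columns on this record
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt_idx = keys.index("GT")
        for i, sample in enumerate(fields[9:]):
            parts = sample.split(":")
            # Trailing FORMAT fields may be dropped, so check the length.
            if gt_idx < len(parts) and parts[gt_idx][:1] in ("/", "|"):
                yield fields[0], int(fields[1]), i
```

For example, passing an open file handle, as in list(records_with_leading_phasing(open(path))), would list every affected sample, where path is whatever VCF file is being checked.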

@jeromekelleher
Contributor

Hmm - that's not a great sign. I don't think this feature is going to get used much for a while.
