See: the total number of variants for import: 3,632
This number is too small; parallel=6L does not help at all.
I guess parallel=6L might trigger a bug when merging the data files when the number of variants is too small.
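A minimal sketch of the workaround this implies: rerun the import in a single process, so that no temporary files are written and merged. The wrapper name `import_serial` is hypothetical; `seqVCF2GDS` and its `parallel` argument come from SeqArray. Whether this avoids the crash on this particular installation is an assumption and was not confirmed in the thread.

```r
# Hypothetical helper: import a VCF into a SeqArray GDS file without parallel workers.
# With parallel = FALSE the output file is written directly, so the step that
# merges per-worker temporary files is not run.
import_serial <- function(vcf_fn, gds_fn) {
  SeqArray::seqVCF2GDS(vcf_fn, gds_fn, parallel = FALSE)
  invisible(gds_fn)
}

# Usage with the names from the original post (requires SeqArray and the VCF file):
# import_serial(high_mod_vcf, "r4_chr1_high_mod.gds")
```

With only 3,632 variants, a serial import should take roughly the same few seconds as the parallel run shown in the log.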
I've now tried this with a VCF with 200k+ variants. I have successfully converted this VCF to a GDS file using SNPRelate. However, I am using other software that specifically needs the GDS file in SeqArray format, not SNPRelate format. I am still getting the same error: sample.idError: segfault from C stack overflow.
Your R (3.6.0) and gdsfmt (1.22.0) versions are old.
The recent update was made with a focus on R (>= v4.0).
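To compare an installation against that requirement, something like the following could be run first. It is only a sketch: the 4.0.0 threshold is taken from the comment above, and the variable name `r_ok` is illustrative.

```r
# Check whether the running R meets the R (>= 4.0) requirement mentioned above.
r_ok <- getRversion() >= "4.0.0"
cat("R", as.character(getRversion()),
    if (r_ok) "meets R >= 4.0" else "is older than R 4.0", "\n")

# Report the installed versions of the two packages involved, if present.
for (pkg in c("SeqArray", "gdsfmt")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat(pkg, as.character(utils::packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "is not installed\n")
  }
}
```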
I suggest using SeqArray GDS format instead of SNPRelate GDS.
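If a SNPRelate-format GDS file already exists, SeqArray's `seqSNP2GDS` can convert it to the SeqArray format. The sketch below assumes hypothetical file names and a hypothetical wrapper `snp_to_seq`; this route was not tried in the thread and may fail in the same way on an old installation.

```r
# Hypothetical helper: convert a SNPRelate GDS file into a SeqArray GDS file.
# seqSNP2GDS takes the input file name first and the output file name second.
snp_to_seq <- function(snp_gds_fn, seq_gds_fn) {
  SeqArray::seqSNP2GDS(snp_gds_fn, seq_gds_fn)
  invisible(seq_gds_fn)
}

# Usage (file names are placeholders; requires SeqArray and an existing input file):
# snp_to_seq("chr1_snprelate.gds", "chr1_seqarray.gds")
```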
Hello, I am trying to use seqVCF2GDS and am getting the following error:
library(SeqArray)
library(data.table)
seqVCF2GDS(high_mod_vcf, "r4_chr1_high_mod.gds", parallel=6L)
Mon Nov 6 16:09:06 2023
Variant Call Format (VCF) Import:
file(s):
r4_PASS_chr1_updated_varID_dups_drop_updated_IDs_nhw_hwe6_noNHWrelateds_high_mod_impact.vcf (198.8M)
file format: VCFv4.2
the number of sets of chromosomes (ploidy): 2
the number of samples: 14,306
genotype storage: bit2
compression method: LZMA_RA
# of samples: 14306
calculating the total number of variants ...
the total number of variants for import: 3,632
Writing to 6 files:
r4_chr1_high_mod_tmp01_ad336f56fc72 [1..606]
r4_chr1_high_mod_tmp02_ad3315e862b7 [607..1,212]
r4_chr1_high_mod_tmp03_ad33613818b1 [1,213..1,818]
r4_chr1_high_mod_tmp04_ad33473817c6 [1,819..2,424]
r4_chr1_high_mod_tmp05_ad334e0fea8c [2,425..3,030]
r4_chr1_high_mod_tmp06_ad33607634f8 [3,031..3,632]
Done (Mon Nov 6 16:09:10 2023).
Output:
r4_chr1_high_mod.gds
Merging:
opening 'r4_chr1_high_mod_tmp01_ad336f56fc72' ... [done]
opening 'r4_chr1_high_mod_tmp02_ad3315e862b7' ... [done]
opening 'r4_chr1_high_mod_tmp03_ad33613818b1' ... [done]
opening 'r4_chr1_high_mod_tmp04_ad33473817c6' ... [done]
opening 'r4_chr1_high_mod_tmp05_ad334e0fea8c' ... [done]
opening 'r4_chr1_high_mod_tmp06_ad33607634f8' ... [done]
Digests:
sample.idError: segfault from C stack overflow
Do the sample IDs need to be in a particular format? I created my VCF with PLINK using the double-id option. The IDs are in the format A-[Cohort]-[A#####]. A .gds file is output, but I don't know whether it is incorrect because of the segfault.
gds <- seqOpen("r4_chr1_high_mod.gds")
gds
Object of class "SeqVarGDSClass"
File: r4_chr1_high_mod.gds (294.4K)
|--+ description [ ] *
|--+ sample.id { Str8 14306 LZMA_ra(2.94%), 12.6K }
|--+ variant.id { Int32 3632 LZMA_ra(12.7%), 1.8K }
|--+ position { Int32 3632 LZMA_ra(62.3%), 8.8K }
|--+ chromosome { Str8 3632 LZMA_ra(1.62%), 125B }
|--+ allele { Str8 3632 LZMA_ra(24.4%), 4.0K }
|--+ genotype [ ] *
| |--+ data { Bit2 2x14306x3632 LZMA_ra(0.95%), 242.2K }
| |--+ extra.index { Int32 3x0 LZMA_ra, 18B } *
| \--+ extra { Int16 0 LZMA_ra, 18B }
|--+ phase [ ]
| |--+ data { Bit1 14306x3632 LZMA_ra(0.02%), 1.3K }
| |--+ extra.index { Int32 3x0 LZMA_ra, 18B } *
| \--+ extra { Bit1 0 LZMA_ra, 18B }
|--+ annotation [ ]
| |--+ id { Str8 3632 LZMA_ra(28.1%), 16.0K }
| |--+ qual { Float32 3632 LZMA_ra(0.92%), 141B }
| |--+ filter { Int32 3632 LZMA_ra(0.92%), 141B }
| |--+ info [ ]
| | \--+ PR { Bit1 3632 LZMA_ra(18.9%), 93B } *
| \--+ format [ ]
\--+ sample.annotation [ ]
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /cvmfs/priv.accre.vanderbilt.edu/mirror/optimized/sandy_bridge/easybuild/software/MPI/intel/2019.1.144/impi/2018.4.274/R/3.6.0/lib64/R/lib/libR.so
LAPACK: /cvmfs/priv.accre.vanderbilt.edu/mirror/optimized/sandy_bridge/easybuild/software/MPI/intel/2019.1.144/impi/2018.4.274/R/3.6.0/lib64/R/modules/lapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.8 SeqArray_1.26.2 gdsfmt_1.22.0
loaded via a namespace (and not attached):
[1] zlibbioc_1.32.0 compiler_3.6.0 IRanges_2.20.2
[4] XVector_0.26.0 parallel_3.6.0 GenomicRanges_1.38.0
[7] GenomeInfoDbData_1.2.2 RCurl_1.95-4.12 Biostrings_2.54.0
[10] S4Vectors_0.24.4 BiocGenerics_0.32.0 GenomeInfoDb_1.22.1
[13] bitops_1.0-6 stats4_3.6.0
Thank you,
Alexis