GNU General Public License, GPLv3
This package provides functions for format conversion from bgen files to SeqArray GDS files.
v0.9.3
Dr. Xiuwen Zheng ([email protected])
Requires R (≥ v3.5.0), gdsfmt (≥ v1.20.0), SeqArray (≥ v1.24.0)
- Installation from Github:
library("devtools")
install_github("zhengxwen/gds2bgen")
The install_github()
approach requires that you build from source, i.e. make
and compilers must be installed on your system -- see the R FAQ for your operating system; you may also need to install dependencies manually.
Or manually intall the package
git clone https://github.com/zhengxwen/gds2bgen
cd gds2bgen/src
unzip bgen_v1.1.8.zip
cd bgen_v1.1.8
python2 ./waf configure
python2 ./waf
cp build/libbgen.a ..
cp build/3rd_party/zstd-1.1.0/libzstd.a ..
rm -rf build
sleep 1; touch ../libbgen.a
cd ../../..
R CMD INSTALL gds2bgen
This package includes the sources of the bgen library (https://enkre.net/cgi-bin/code/bgen/dir?ci=trunk), Boost (the C++ libraries, https://www.boost.org) and Zstandard (https://zstd.net).
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.
Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.
library(gds2bgen)
seqBGEN_Info() # bgen library version
## "bgen_lib_v1.1.8"
bgen_fn <- system.file("extdata", "example.8bits.bgen", package="gds2bgen")
# or bgen_fn <- "your_bgen_file.bgen"
seqBGEN_Info(bgen_fn)
## File: gds2bgen/extdata/example.8bits.bgen
## # of samples: 500
## # of variants: 199
## Compression method: zlib
## Layout version: v1.2
## Unphased: TRUE
## # of bits: 8
## Ploidy: 2
## sample id: sample_001, sample_002, sample_003, sample_004, ...
# example.8bits.bgen ==> example.gds, using 4 cores
seqBGEN2GDS(bgen_fn, "example.gds",
storage.option="LZMA_RA", # compression option, e.g., ZIP_RA for zlib or LZ4_RA for LZ4
float.type="packed8", # 8-bit packed real numbers
geno=FALSE, # 2-bit integer genotypes, stored in 'genotype/data'
dosage=TRUE, # numeric alternative allele dosages, stored in 'annotation/format/DS'
prob=FALSE, # numeric genotype probabilities, stored in 'annotation/format/GP'
parallel=4 # the number of cores
)
# show file structure
library(SeqArray)
(f <- seqOpen("example.gds"))
seqClose(f)
## File: example.gds (137.7K)
## + [ ] *
## |--+ description [ ] *
## |--+ sample.id { Str8 500 LZMA_ra(7.02%), 393B } *
## |--+ variant.id { Int32 199 LZMA_ra(33.9%), 277B } *
## |--+ position { Int32 199 LZMA_ra(60.6%), 489B } *
## |--+ chromosome { Str8 199 LZMA_ra(15.7%), 101B } *
## |--+ allele { Str8 199 LZMA_ra(11.8%), 101B } *
## |--+ genotype [ ] *
## |--+ phase [ ]
## |--+ annotation [ ]
## | |--+ id { Str8 199 LZMA_ra(18.6%), 321B } *
## | |--+ qual { Float32 199 LZMA_ra(11.8%), 101B } *
## | |--+ filter { Int32 199 LZMA_ra(11.3%), 97B } *
## | |--+ info [ ]
## | \--+ format [ ]
## | |--+ DS [ ] *
## | | \--+ data { PackedReal8U 500x199 LZMA_ra(55.6%), 54.0K } *
## \--+ sample.annotation [ ]
seqVCF2GDS() in the SeqArray package, conversion from VCF files to GDS files.
seqBED2GDS() in the SeqArray package, conversion from PLINK BED files to GDS files.