Finding non-biallelic SNPs

SNPs are used as markers for various purposes in population genetics. Non-biallelic SNPs (SNPs with more than two alleles) are of particular interest for identification: each such marker distinguishes more genotypes than a biallelic one, so fewer non-biallelic SNPs are needed to reach the same discriminative power.
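The claim about discriminative power can be illustrated with a small calculation. The sketch below is not part of this tool; it assumes Hardy-Weinberg equilibrium, independent markers, and equal allele frequencies at every marker, which is an idealisation. The function names `match_probability` and `markers_needed` are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def match_probability(freqs):
    """Probability that two random individuals share a genotype at one SNP.

    Assumes Hardy-Weinberg equilibrium; `freqs` are allele frequencies
    that sum to 1.
    """
    total = 0.0
    for i, j in combinations_with_replacement(range(len(freqs)), 2):
        # Homozygote frequency is p^2, heterozygote frequency is 2pq.
        genotype = freqs[i] ** 2 if i == j else 2 * freqs[i] * freqs[j]
        total += genotype ** 2
    return total


def markers_needed(freqs, target):
    """Number of independent, identical markers needed so that the
    combined match probability drops to `target` or below."""
    return math.ceil(math.log(target) / math.log(match_probability(freqs)))


biallelic = [0.5, 0.5]
triallelic = [1 / 3, 1 / 3, 1 / 3]

print(match_probability(biallelic))     # 3/8
print(match_probability(triallelic))    # 5/27
print(markers_needed(biallelic, 1e-9))
print(markers_needed(triallelic, 1e-9))
```

Under these assumptions, a combined match probability of one in a billion takes 22 biallelic markers but only 13 triallelic ones. Real allele frequencies are rarely equal, so the numbers for actual marker sets will differ.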

In this project, we present a tool that uses the NCBI public database dbSNP to identify non-biallelic SNPs.

This work was published in FSI Genetics in 2009.

Installation

Install the expat development files:

apt-get install libexpat1-dev

Retrieve the source code and compile the program:

git clone https://github.com/jfjlaros/snp.git
cd snp/src
make

Usage

The program requires a dump of the database in XML format. These files are typically found in the subfolder named genotype of any of the builds hosted on the download site of the NCBI.

For a file named gt_chrXX.xml.gz, use the following command to find the SNP candidates:

  zcat gt_chrXX.xml.gz | ./snp <threshold> > output.txt

The threshold parameter specifies the minimum allele frequency as a percentage. If it is omitted, the threshold defaults to 0. Increasing the threshold can greatly reduce the amount of output; a value of 1 or higher is recommended.
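To make the effect of the threshold concrete, here is a minimal sketch of one plausible filtering rule. It is an assumption for illustration only, not the logic of the `snp` program itself: a SNP is kept when more than two of its alleles reach the minimum frequency. The function name `is_non_biallelic` and the input format (a mapping from allele to frequency in percent) are hypothetical.

```python
def is_non_biallelic(allele_freqs, threshold=0.0):
    """Return True if more than two alleles reach the minimum frequency.

    `allele_freqs` maps each allele to its frequency in percent.
    The rule itself is an assumption, not taken from the snp source code.
    """
    frequent = [a for a, f in allele_freqs.items() if f >= threshold]
    return len(frequent) > 2


# A third allele seen at 0.5% passes with the default threshold of 0 ...
rare_third = {"A": 60.0, "C": 39.5, "G": 0.5}
print(is_non_biallelic(rare_third))

# ... but is filtered out once the threshold is raised to 1%.
print(is_non_biallelic(rare_third, threshold=1.0))

# Three well-represented alleles pass either way.
print(is_non_biallelic({"A": 50.0, "C": 30.0, "G": 20.0}, threshold=1.0))
```

Under this assumed rule, a threshold of 0 would report any SNP with a third allele observed at all, including very rare ones, which would explain why raising the threshold shrinks the output.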

Related work

Some notes on allele frequencies on the X and Y chromosomes.

Inspired by this research, we looked into the degradation of methylated nucleotides for similar purposes.