ngs_RMDUP

Stephen Fisher edited this page Aug 7, 2014 · 3 revisions

Module: RMDUP

This module will remove duplicate reads.

Usage:
	ngs.sh rmdup [-i inputDir] [-se] sampleID
Input:
	sampleID/inputDir/unaligned_1.fq
	sampleID/inputDir/unaligned_2.fq (paired-end reads)
Output:
	sampleID/rmdup/unaligned_1.fq
	sampleID/rmdup/unaligned_2.fq (paired-end reads)
	sampleID/rmdup/sampleID.rmdup.stats.txt
Requires:
	removeDuplicates.py
Options:
	-i inputDir - location of source files (default: init).
	-se - single-end reads (default: paired-end).

Removes duplicate reads. Two reads are considered duplicates only if they match exactly. For paired-end reads, both mates must match exactly for the pair to be considered a duplicate. This step is very RAM intensive and can require up to three times the total input file size (e.g. if your fastq files total 20GB, then up to 60GB of RAM may be used when removing duplicates).
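The exact-match rule described above can be sketched in a few lines of Python. This is only an illustration and not the code in removeDuplicates.py: the function name `remove_duplicates`, the record layout (header, sequence, separator, quality) and the choice to compare reads by sequence alone are all assumptions made for the example.

```python
def remove_duplicates(reads_1, reads_2=None):
    """Keep the first occurrence of each exactly matching read (or read pair).

    reads_1 / reads_2 are lists of FASTQ records given as
    (header, sequence, separator, quality) tuples. This layout is hypothetical.
    Assumption: "exactly match" is taken to mean identical sequences.
    Returns (kept_1, kept_2, stats); kept_2 is None for single-end input.
    """
    paired = reads_2 is not None
    if paired and len(reads_1) != len(reads_2):
        raise ValueError("paired-end inputs must contain the same number of reads")

    seen = set()                       # every distinct sequence (or pair) seen so far
    kept_1 = []
    kept_2 = [] if paired else None

    for i, record in enumerate(reads_1):
        # For paired-end data the key covers both mates, so a pair is only
        # a duplicate when BOTH sequences match an earlier pair.
        key = (record[1], reads_2[i][1]) if paired else record[1]
        if key in seen:
            continue
        seen.add(key)
        kept_1.append(record)
        if paired:
            kept_2.append(reads_2[i])

    total = len(reads_1)
    stats = {"total": total, "unique": len(kept_1), "duplicates": total - len(kept_1)}
    return kept_1, kept_2, stats


# Small usage example with made-up reads.
r1 = [("@a", "ACGT", "+", "IIII"), ("@b", "ACGT", "+", "IIII"),
      ("@c", "ACGT", "+", "IIII"), ("@d", "TTTT", "+", "IIII")]
r2 = [("@a", "GGGG", "+", "IIII"), ("@b", "GGGG", "+", "IIII"),
      ("@c", "CCCC", "+", "IIII"), ("@d", "AAAA", "+", "IIII")]

paired_1, paired_2, paired_stats = remove_duplicates(r1, r2)
single_1, _, single_stats = remove_duplicates(r1)
```

In this sketch read `@c` survives in paired-end mode because its mate differs, but is dropped in single-end mode. The `seen` set holds every distinct sequence at once, which gives a sense of why an approach like this needs memory on the order of the input size.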
