txje/lrc_eval: Long read error correction evaluation scripts

A few tools for evaluating error correction of long reads (PacBio, nanopore), largely mirroring the Error Correction Evaluation Toolkit (ECET).

From the ECET paper:

We use the following measures for each program: number of erroneous bases identified and successfully corrected (true positives, TP), correct bases wrongly identified as errors and changed (false positives, FP), and erroneous bases that were either uncorrected or falsely corrected (false negatives, FN). We report sensitivity and specificity for each program. Then, we combine these into the gain metric [21], defined by gain = (TP - FP) / (TP + FN), which is the percentage of errors removed from the data set by the error-correction program. A negative gain value indicates that more errors have been introduced due to false corrections, which is not captured by measures such as sensitivity and specificity.
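Once TP, FP and FN have been tallied, the metrics are simple arithmetic. The minimal sketch below assumes the counts are already available. Specificity also needs the number of correct bases left untouched (true negatives), which the quoted definition does not name, so `tn` is an assumed extra input. The class and function names are illustrative and are not taken from this repository's scripts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectionCounts:
    tp: int  # erroneous bases successfully corrected
    fp: int  # correct bases wrongly changed (errors introduced)
    fn: int  # erroneous bases left uncorrected or falsely corrected
    tn: int  # correct bases left untouched (assumed input, needed for specificity)


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else float("nan")


def sensitivity(c: CorrectionCounts) -> float:
    # Fraction of the original errors that were corrected.
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: CorrectionCounts) -> float:
    # Fraction of originally correct bases that were left alone.
    return _ratio(c.tn, c.tn + c.fp)


def gain(c: CorrectionCounts) -> float:
    # gain = (TP - FP) / (TP + FN); negative when more errors are added than removed.
    return _ratio(c.tp - c.fp, c.tp + c.fn)
```

For example, a corrector that fixes 80 of 100 errors (TP = 80, FN = 20) while introducing 10 new ones (FP = 10) has sensitivity 0.8 and gain (80 - 10) / 100 = 0.7, i.e. it removed 70% of the errors on net.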

Utilities:

maf2tef.py
- converts MAF to TEF format
sam2tef.py
- converts SAM to TEF format
m52tef.py
- converts BLASR -m5 format to TEF format
remap_m5.py
- rewrites the read names in a FASTA file according to the renaming schemes of several long read error correction methods
- with consistent names, it is easier to compare corrected sequences to their uncorrected originals
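Converting an alignment to TEF amounts to listing, per read, where it disagrees with the reference. As a minimal sketch of that step for SAM input, the code below walks a CIGAR string and compares bases directly against the reference, so it works with plain `M` operations as well as `=`/`X`. The tuple layout `(type, ref_pos, read_pos, ref_base, read_base)` is a hypothetical intermediate form, not the actual TEF record layout or the output of `sam2tef.py`, and the function name is illustrative.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_error_events(cigar: str, pos: int, read: str, ref: str) -> list:
    """List mismatches, insertions and deletions for one SAM alignment.

    pos is the 1-based SAM POS field; ref is the full reference sequence.
    Returns (type, ref_pos0, read_pos0, ref_base, read_base) tuples with
    0-based coordinates; "-" marks the gapped side of an indel.
    """
    events = []
    r = pos - 1  # 0-based offset into the reference
    q = 0        # 0-based offset into the read (soft clips included)
    for n_str, op in CIGAR_OP.findall(cigar):
        n = int(n_str)
        if op in "M=X":
            # Aligned columns: compare bases to find substitutions.
            for i in range(n):
                if read[q + i].upper() != ref[r + i].upper():
                    events.append(("X", r + i, q + i, ref[r + i], read[q + i]))
            r += n
            q += n
        elif op == "I":
            # Extra read bases inserted before reference position r.
            for i in range(n):
                events.append(("I", r, q + i, "-", read[q + i]))
            q += n
        elif op == "D":
            # Reference bases missing from the read.
            for i in range(n):
                events.append(("D", r + i, q, ref[r + i], "-"))
            r += n
        elif op == "N":
            r += n  # skipped reference region, not an error
        elif op == "S":
            q += n  # soft-clipped read bases, not an error
        # H and P consume neither read nor reference.
    return events
```

For instance, `cigar_error_events("3M1I2M1D2M", 1, "ATGCTAGT", "ACGTACGT")` reports one substitution, one insertion and one deletion.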

Plumbing:

fasta.py
- A very simple FASTA file API
aln_formats.py
- Provides a common API to parse and iterate through alignment formats, including MAF, m4, and m5
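As a rough illustration of what parsing the m5 format involves, the sketch below reads one record per line. It assumes the 19 whitespace-separated columns commonly documented for BLASR `-m 5` output, in this order: qName, qLength, qStart, qEnd, qStrand, tName, tLength, tStart, tEnd, tStrand, score, numMatch, numMismatch, numIns, numDel, mapQV, qAlignedSeq, matchPattern, tAlignedSeq. The `M5Record` class and both functions are hypothetical and are not the API of `aln_formats.py`; coordinates are kept exactly as written, without interpretation.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class M5Record:
    q_name: str
    q_length: int
    q_start: int
    q_end: int
    q_strand: str
    t_name: str
    t_length: int
    t_start: int
    t_end: int
    t_strand: str
    score: int
    num_match: int
    num_mismatch: int
    num_ins: int
    num_del: int
    map_qv: int
    q_aligned: str      # query row of the alignment, with "-" for gaps
    match_pattern: str  # per-column match markers
    t_aligned: str      # target row of the alignment, with "-" for gaps


def parse_m5_line(line: str) -> M5Record:
    """Parse one m5 line, assuming the 19-column layout described above."""
    f = line.split()
    if len(f) != 19:
        raise ValueError(f"expected 19 m5 columns, got {len(f)}")
    return M5Record(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]), f[9],
        int(f[10]), int(f[11]), int(f[12]), int(f[13]), int(f[14]), int(f[15]),
        f[16], f[17], f[18],
    )


def iter_m5(path: str) -> Iterator[M5Record]:
    """Yield records from an m5 file, skipping blank lines."""
    with open(path) as fh:
        for line in fh:
            if line.strip():
                yield parse_m5_line(line)
```

Keeping the aligned rows on each record means downstream statistics can locate every error column without re-aligning.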

Statistics can be computed directly from several alignment formats, with slightly different capabilities:

tef_stats.py
- Computes error correction statistics given uncorrected and corrected TEF files, in line with the original ECET
maf_stats.py
m5_stats.py
- This is the recommended method, and the one used in the FMLRC paper
- Statistics are computed directly from BLASR -m5 output, so BLASR results can be used as-is and error loci can be compared relative to the reference sequence
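One way to compare loci relative to the reference is to align both the uncorrected and the corrected read to the same reference, collect the error positions of each alignment in reference coordinates, and compare the two sets. The sketch below does this under stated assumptions: both alignments are forward-strand against the same target, `t_start` is the 0-based reference start, and comparison is limited to an optional shared window `[lo, hi)`. It is an illustrative reconstruction of the idea, not the actual logic of `m5_stats.py`, and all names are hypothetical.

```python
def error_loci(t_start: int, q_aligned: str, t_aligned: str) -> set:
    """Return reference-anchored error loci for one alignment.

    Each locus is (ref_pos, k): k == 0 marks a mismatch or deletion at
    ref_pos; k >= 1 marks the k-th read base inserted before ref_pos.
    """
    if len(q_aligned) != len(t_aligned):
        raise ValueError("aligned rows must have equal length")
    loci = set()
    ref_pos, ins = t_start, 0
    for q, t in zip(q_aligned.upper(), t_aligned.upper()):
        if t == "-":
            ins += 1                      # insertion in the read
            loci.add((ref_pos, ins))
        else:
            if q != t:                    # mismatch, or deletion if q == "-"
                loci.add((ref_pos, 0))
            ref_pos += 1
            ins = 0
    return loci


def correction_counts(before: set, after: set, lo=None, hi=None) -> tuple:
    """Return (TP, FP, FN) from error loci before and after correction.

    lo/hi optionally restrict the comparison to reference positions both
    alignments cover, so unaligned regions are not counted as corrected.
    """
    def keep(locus):
        pos = locus[0]
        return (lo is None or pos >= lo) and (hi is None or pos < hi)

    before = {x for x in before if keep(x)}
    after = {x for x in after if keep(x)}
    tp = len(before - after)  # errors that disappeared after correction
    fn = len(before & after)  # errors still present (uncorrected or miscorrected)
    fp = len(after - before)  # new errors at previously correct loci
    return tp, fp, fn
```

Keying loci by reference position means an error that is "fixed" to a different wrong base still counts as FN, which matches the quoted definition of false negatives.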
