MultiFastaDemultiplexer

CDS or protein FASTA files produced by genome annotation often contain multiple gene model versions for the same locus. Many applications need only a single gene model per locus. MultiFastaReduceR subsets an input FASTA file and keeps only the longest annotated sequence for each locus.
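The selection described above can be illustrated with a minimal shell sketch. This is not the code of MultiFastaReduceR (which is written in R); it only shows the idea under stated assumptions: the locus is taken to be the first word of the header with a trailing ".<digits>" removed, ties are resolved in favour of the first record seen, and the function name `longest_per_locus` is hypothetical.

```shell
#!/bin/sh
# Sketch only, not the tool's implementation.
# Reads FASTA on stdin and prints the longest record per locus.
# Assumption: locus = first header word minus a trailing ".<digits>".
# Assumption: on equal length, the first record seen is kept.
longest_per_locus() {
  awk '
    # Compare the record just read against the best one stored for its locus.
    function flush_record() {
      if (header == "") return
      if (!(locus in best_len)) order[++n] = locus   # remember first-seen order
      if (!(locus in best_len) || length(seq) > best_len[locus]) {
        best_len[locus] = length(seq)
        best_hdr[locus] = header
        best_seq[locus] = seq
      }
    }
    /^>/ {
      flush_record()
      header = $0
      locus = substr($1, 2)            # drop the leading ">"
      sub(/\.[0-9]+$/, "", locus)      # drop the version suffix, e.g. ".2"
      seq = ""
      next
    }
    { seq = seq $0 }                   # join wrapped sequence lines
    END {
      flush_record()
      for (i = 1; i <= n; i++) {
        print best_hdr[order[i]]
        print best_seq[order[i]]       # sequence is written on one line
      }
    }
  '
}
```

For a compressed file the sketch could be fed with `gzip -dc OverAnnotatedFASTA.fa.gz | longest_per_locus`. Note that it writes each sequence unwrapped on a single line, which may differ from what the actual tool produces.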

Usage

Dependencies

# In R
install.packages("tidyverse")
install.packages("seqinr")

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("Biostrings")

Download

git clone https://github.com/mwylerCH/MultiFastaDemultiplexer
chmod +x MultiFastaDemultiplexer/MultiFastaReduceR
echo 'export PATH="$HOME/MultiFastaDemultiplexer:$PATH"' >> ~/.bashrc

Subset fasta

MultiFastaReduceR [MOLECULE] [OverAnnotatedFASTA.fa]

MOLECULE is a string: DNA for nucleotide sequences or AA for protein sequences.
OverAnnotatedFASTA.fa is the original multi-FASTA file; it may be gzip-compressed or uncompressed. FASTA headers may contain gene descriptions or other information, but the versions of each gene model must be named as follows: "NC_000019.1", "NC_000019.2", "NC_000019.3", ...
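Before running the tool it can help to check that every header follows the naming scheme above. The following is a small sketch, assuming that the identifier is the first whitespace-delimited word of the header; the function name `headers_without_version` is hypothetical and not part of MultiFastaReduceR.

```shell
#!/bin/sh
# Sketch only: read FASTA on stdin and print every header whose
# identifier (assumed to be the first word) does not end in ".<digits>".
headers_without_version() {
  awk '/^>/ && $1 !~ /\.[0-9]+$/ { print }'
}
```

It could be used as `gzip -dc OverAnnotatedFASTA.fa.gz | headers_without_version` for a compressed file, or `headers_without_version < OverAnnotatedFASTA.fa` otherwise. Empty output means all headers carry a version suffix.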

Technical

Developed with R version 3.5.1, Biostrings 2.50.2, seqinr 3.6-1 and tidyverse 1.3.0 on an Ubuntu 16.06 LTS machine.
22nd August 2020, Giubiasco, Switzerland.
