Python GFF sequence loader

Setup

This Python script depends on two external libraries, GFFutils and pyfasta.

Both packages can be installed with easy_install:

$ easy_install GFFutils
$ easy_install pyfasta

Usage

Just locate the GFF file and the FASTA file, and then run the script:

$ ./gff_loader.py athaliana.gff athaliana.fa

By default, this extracts the mRNA IDs, pulls out all subfeatures (CDS) that share the same parent, and concatenates their sequences, reverse-complementing if needed. Sometimes, however, the GFF does not use the standard feature names; in that case, name the parent and child feature types explicitly:

$ ./gff_loader.py athaliana.gff athaliana.fa --parents Gene --children exon

Use this form when the GFF does describe coding sequences but labels them with a different term.
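
The extraction described under Usage (group child features by parent, concatenate their sequences, reverse-complement on the minus strand) can be sketched in plain Python. This is a minimal illustration and not the code in gff_loader.py: it works on in-memory tuples and a dict instead of GFFutils and pyfasta, and the names reverse_complement, spliced_sequences, parent_type and child_type are hypothetical.

```python
from collections import defaultdict

# Hypothetical helper names; none of this is taken from gff_loader.py.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def spliced_sequences(features, genome, parent_type="mRNA", child_type="CDS"):
    """Map each parent ID to the concatenated sequence of its children.

    features: iterable of (seqid, type, start, end, strand, id, parent_id),
              using 1-based inclusive coordinates as in GFF.
    genome:   dict mapping seqid to its sequence string.
    """
    parents = {}
    children = defaultdict(list)
    for seqid, ftype, start, end, strand, fid, parent in features:
        if ftype == parent_type:
            parents[fid] = (seqid, strand)
        elif ftype == child_type:
            children[parent].append((start, end))

    result = {}
    for pid, (seqid, strand) in parents.items():
        # Join children in ascending genomic order, then flip once
        # if the parent lies on the minus strand.
        pieces = [genome[seqid][s - 1:e] for s, e in sorted(children[pid])]
        seq = "".join(pieces)
        result[pid] = reverse_complement(seq) if strand == "-" else seq
    return result


# Toy data, made up for illustration only.
genome = {"chr1": "AAACCCGGGTTTACGT"}
features = [
    ("chr1", "mRNA", 1, 12, "+", "m1", None),
    ("chr1", "CDS", 1, 3, "+", "c1", "m1"),
    ("chr1", "CDS", 7, 9, "+", "c2", "m1"),
    ("chr1", "mRNA", 4, 16, "-", "m2", None),
    ("chr1", "CDS", 13, 16, "-", "c3", "m2"),
    ("chr1", "CDS", 4, 6, "-", "c4", "m2"),
]
print(spliced_sequences(features, genome))
# prints {'m1': 'AAAGGG', 'm2': 'ACGTGGG'}
```

Passing parent_type="Gene" and child_type="exon" to this sketch plays the same role as the --parents and --children options: only the feature type names change, while the grouping and concatenation stay the same.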