Running FICLE

Structure

FICLE consists of a single executable Python 3 script, ficle.py. The src folder contains all the auxiliary scripts and functions it requires.

Dependencies

Python libraries:

gtfparse (v1.2.1)
pandas (v1.1.5)
NumPy (v1.19.5)

External:

SQANTI3 (> v5.0)
CPAT (v3.0.2)
gtfToGenePred and genePredToBed (UCSC utilities)
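Before installing anything, it can help to see which of these dependencies are already visible in your environment. The sketch below is not part of FICLE; it assumes the PyPI distribution names gtfparse, pandas and numpy, and the executable names cpat.py, gtfToGenePred and genePredToBed. SQANTI3 is left out because how it is installed varies between setups.

```python
"""Sketch: report which FICLE dependencies are visible in the current environment.

Assumption: the package and executable names below are taken from the dependency
list above; this helper is illustrative and is not shipped with FICLE.
"""
import shutil
from importlib import metadata

# Distribution names as published on PyPI (assumed to match the list above).
PYTHON_PACKAGES = ["gtfparse", "pandas", "numpy"]
# Command-line tools that must be on PATH; SQANTI3 is not checked here.
EXECUTABLES = ["cpat.py", "gtfToGenePred", "genePredToBed"]


def installed_versions(packages):
    """Map each package name to its installed version string, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing_executables(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    for pkg, version in installed_versions(PYTHON_PACKAGES).items():
        print(f"{pkg}: {version or 'NOT INSTALLED'}")
    for tool in missing_executables(EXECUTABLES):
        print(f"not on PATH: {tool}")
```

Compare the reported versions with the pinned ones above; the conda environment described below is the intended way to get matching versions.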

Installation

Install Anaconda or Miniconda.
Clone the git repository into a folder of your choice:
git clone https://github.com/SziKayLeung/FICLE.git
Create a conda environment from the ficle.condaEnv.yml file in the main FICLE folder:

cd FICLE
conda env create -f ficle.condaEnv.yml
source activate ficle

Pre-requisites

1. Download the reference genome annotation of interest in GTF format, available from GENCODE or CHESS.
2. Run SQANTI3 QC and filtering to generate a filtered classification file. See the SQANTI3 GitHub repository for details.
3. Run CPAT on the long-read-derived FASTA file (preferably the one generated by SQANTI3, obtained by following the long-read processing pipeline) to generate the ORF_prob.best.tsv file. CPAT's -o option sets the prefix of its output files.

cpat.py -x Mouse_Hexamer.tsv -d Mouse_logitModel.RData -g <path/to/longRead.fasta> --min-orf=50 --top-orf=50 -o <path/to/output/prefix>

4. Generate a sorted BED12 file from the long-read-derived GTF:

gtfToGenePred <path/to/longRead.gtf> longRead.genePred
genePredToBed longRead.genePred longRead.bed12
sort -k1,1 -k2,2n longRead.bed12 > longRead_sorted.bed12
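To confirm that the BED12 file is ordered the way `sort -k1,1 -k2,2n` orders it (chromosome name, then numeric start), a small stdlib-only check such as the one below can be run before calling FICLE. The function name is hypothetical, and it assumes chromosome names compare as plain strings, which matches `sort` only under a byte-order locale such as LC_ALL=C.

```python
"""Sketch: check that a BED file is sorted by (chrom, chromStart).

Assumption: chromosome names are compared as plain strings, which matches
`sort -k1,1 -k2,2n` only under a byte-order locale such as LC_ALL=C.
"""


def first_unsorted_line(lines):
    """Return the 1-based number of the first out-of-order line, or None if sorted."""
    previous = None
    for number, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.split("\t")
        key = (fields[0], int(fields[1]))  # (chrom, chromStart)
        if previous is not None and key < previous:
            return number
        previous = key
    return None


# Example usage (file name from the commands above):
# with open("longRead_sorted.bed12") as bed:
#     print(first_unsorted_line(bed))
```

A return value of None means no out-of-order record was found; any number points at the first line that breaks the ordering.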

Getting ready

Before running FICLE, you will need to:

Activate the ficle conda environment:

-bash-4.2$ source activate ficle
(ficle)-bash-4.2$

Add the FICLE scripts to your PATH:

-bash-4.2$ FICLE_ROOT=<path/to/cloned/github/FICLE/>
-bash-4.2$ export PATH=$PATH:${FICLE_ROOT}
-bash-4.2$ export PATH=$PATH:${FICLE_ROOT}/reference

FICLE arguments and usage

FICLE accepts the following arguments:

usage: ficle.py [-h] [-n GENENAME] [-r REFERENCE] [-b INPUT_BED]
                [-g INPUT_GTF] [-c INPUT_CLASS] [--cpat CPAT] [-o OUTPUT_DIR]
                [-v]

Full Isoform Characterisation from (Targeted) Long-read Experiments

optional arguments:
  -h, --help            show this help message and exit
  -n GENENAME, --genename GENENAME
                        Target gene symbol
  -r REFERENCE, --reference REFERENCE
                        Gene reference annotation (<gene>_gencode.gtf)
  -b INPUT_BED, --input_bed INPUT_BED
                        Input bed file of all the final transcripts in long-
                        read derived transcriptome.
  -g INPUT_GTF, --input_gtf INPUT_GTF
                        Input gtf file of all the final transcripts in long-
                        read derived transcriptome.
  -c INPUT_CLASS, --input_class INPUT_CLASS
                        SQANTI classification file
  --cpat CPAT           ORF_prob.best.tsv file generated from CPAT
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Output path for the annotation and associated files
  -v, --version         Display program version number.
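The --reference help text refers to a gene-level file (<gene>_gencode.gtf). If you are starting from a genome-wide GENCODE GTF, one way to produce such a file is to keep only the records whose gene_name attribute equals the target symbol. The sketch below assumes GENCODE-style attributes (`gene_name "Trem2";` in column 9); the function name and the file names in the usage comment are hypothetical.

```python
"""Sketch: extract the records for one gene from a GENCODE-style GTF.

Assumption: column 9 carries a `gene_name "<symbol>";` attribute, as in GENCODE.
The helper name and example file names are hypothetical, not part of FICLE.
"""
import re

GENE_NAME = re.compile(r'gene_name "([^"]+)"')


def subset_gtf_by_gene(lines, gene_name):
    """Yield GTF lines whose gene_name attribute equals gene_name exactly."""
    for line in lines:
        if line.startswith("#"):
            continue  # drop header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_NAME.search(fields[8])
        if match and match.group(1) == gene_name:
            yield line


# Example usage (hypothetical file names):
# with open("gencode_reference.gtf") as src, open("Trem2_gencode.gtf", "w") as out:
#     out.writelines(subset_gtf_by_gene(src, "Trem2"))
```

The comparison is exact and case-sensitive, so Trem2b or TREM2 records are not picked up when asking for Trem2.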

Mandatory arguments

--genename : the target gene symbol of interest (e.g. App or APP), which must match the spelling and case used in the associated_gene column of the SQANTI classification file
--reference : target gene reference gtf (see Pre-requisite 1)
--input_bed : long-read transcriptome sorted bed file (see Pre-requisite 4)
--input_gtf : long-read gtf (from SQANTI3 filtering)
--input_class : SQANTI filtering classification file (see Pre-requisite 2)
--output_dir : path to the output directory
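Because --genename has to match the associated_gene column exactly, a quick pre-flight check is to count how many isoforms in the classification file carry that symbol. The sketch below assumes a tab-separated SQANTI3 classification file with a header row containing associated_gene; the function name is hypothetical and the check is case-sensitive, so App and APP are treated as different symbols.

```python
"""Sketch: count isoforms assigned to a gene symbol in a SQANTI3 classification file.

Assumption: the file is tab-separated with a header row that includes an
`associated_gene` column. Matching is exact and case-sensitive.
"""
import csv


def count_isoforms_for_gene(lines, gene_name, column="associated_gene"):
    """Return how many rows have `column` equal to gene_name."""
    reader = csv.DictReader(lines, delimiter="\t")
    if column not in (reader.fieldnames or []):
        raise ValueError(f"column {column!r} not found in classification header")
    return sum(1 for row in reader if row[column] == gene_name)


# Example usage (placeholder path):
# with open("SQANTI_classification.txt", newline="") as handle:
#     print(count_isoforms_for_gene(handle, "Trem2"))
```

A count of zero usually means the symbol differs in case or spelling from what SQANTI3 reported.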

Optional arguments

--cpat : CPAT output file (see Pre-requisite 3)
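If you want to inspect the CPAT file before passing it to --cpat, the coding probabilities can be loaded into a dictionary with the standard library. This is a sketch only: the column names "ID" and "Coding_prob" are assumptions about the CPAT output header and are exposed as parameters so they can be changed to whatever your CPAT version writes. The function name is hypothetical.

```python
"""Sketch: load per-ORF coding probabilities from a CPAT ORF_prob.best.tsv file.

Assumption: the file is tab-separated with a header row; the column names
"ID" and "Coding_prob" are assumed and can be overridden via parameters.
"""
import csv


def coding_probabilities(lines, id_col="ID", prob_col="Coding_prob"):
    """Map each ID to its coding probability as a float."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[id_col]: float(row[prob_col]) for row in reader}


# Example usage (placeholder path):
# with open("cpat.ORF_prob.best.tsv", newline="") as handle:
#     probs = coding_probabilities(handle)
```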

Usage example

To characterise Trem2 using FICLE:

ficle.py --genename=Trem2 \
    --reference=<path/to/gencode_reference.gtf> \
    --input_bed=<path/to/longRead_sorted.bed12> \
    --input_gtf=<path/to/longRead.gtf> \
    --input_class=<path/to/SQANTI_classification.txt> \
    --cpat=<path/to/cpat_ORF_prob.best.tsv> \
    --output_dir=<path/to/output/directory>
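To characterise several genes, the same command can be built once per gene. The sketch below assumes ficle.py is on PATH (see "Getting ready") and uses only the arguments listed above. The function names are hypothetical, and the optional {gene} placeholder in the reference path is an illustrative convention for gene-specific reference files, not a FICLE feature.

```python
"""Sketch: run FICLE for several target genes, one invocation per gene.

Assumptions: ficle.py is on PATH; all paths are placeholders. The `{gene}`
placeholder in `reference` is a hypothetical convention for per-gene GTFs.
"""
import subprocess


def ficle_command(gene, reference, bed, gtf, classification, output_dir, cpat=None):
    """Return the ficle.py argument list for one gene; --cpat is added only if given."""
    cmd = [
        "ficle.py",
        f"--genename={gene}",
        # `reference` may contain "{gene}", e.g. "refs/{gene}_gencode.gtf"
        f"--reference={reference.format(gene=gene)}",
        f"--input_bed={bed}",
        f"--input_gtf={gtf}",
        f"--input_class={classification}",
        f"--output_dir={output_dir}",
    ]
    if cpat is not None:
        cmd.append(f"--cpat={cpat}")
    return cmd


def run_genes(genes, **paths):
    """Run FICLE once per gene, raising CalledProcessError on the first failure."""
    for gene in genes:
        subprocess.run(ficle_command(gene, **paths), check=True)
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with paths that contain spaces.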
