Running FICLE

SziKayLeung edited this page Oct 18, 2023 · 4 revisions


Structure

FICLE is run from a single executable Python 3 script, ficle.py. The src folder contains the auxiliary scripts and functions it requires.

Dependencies

Python-related libraries

  • gtfparse (v1.2.1)
  • pandas (v1.1.5)
  • NumPy (v1.19.5)

External:

  • SQANTI3 (> v5.0)
  • CPAT (v3.0.2)
  • gtfToGenePred and genePredToBed
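
As a quick sanity check, the dependency list above can be compared against the active environment. The following is a minimal sketch rather than part of FICLE: the function names are hypothetical, the executable names looked up on the PATH are assumptions about how each tool was installed, and SQANTI3 is left out because its entry point depends on the installation.

```python
from importlib import metadata
import shutil

# Versions copied from the Dependencies list above.
PYTHON_LIBS = {"gtfparse": "1.2.1", "pandas": "1.1.5", "numpy": "1.19.5"}
# Assumed executable names; adjust to match your installation.
EXTERNAL_TOOLS = ["cpat.py", "gtfToGenePred", "genePredToBed"]


def check_python_libs(expected):
    """Return {package: (installed_version_or_None, listed_version)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (found, wanted)
    return report


def check_tools(names):
    """Return {tool: full_path_or_None} based on the current PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, (found, wanted) in check_python_libs(PYTHON_LIBS).items():
        print(f"{name}: installed={found}, listed={wanted}")
    for name, path in check_tools(EXTERNAL_TOOLS).items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A differing version is reported rather than treated as an error, since this page does not state which other versions are compatible.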

Installation

  1. Install Anaconda or Miniconda.
  2. Clone the git repository into a folder of your choice:
    git clone https://github.com/SziKayLeung/FICLE.git
  3. Create a conda environment from the environment YAML file in the main FICLE folder, then activate it:
cd FICLE
conda env create -f ficle.condaEnv.yml
source activate ficle

Pre-requisites

  1. Download the reference genome annotation of interest in GTF format, available from GENCODE or CHESS.

  2. Run SQANTI3 QC and filtering to generate a filtered classification file. See the SQANTI3 Git repository for more details.

  3. Run CPAT on the long-read-derived FASTA (preferably the SQANTI3 output, obtained by following the long-read processing pipeline) to generate the ORF_prob.best.tsv file.

cpat.py -x Mouse_Hexamer.tsv -d Mouse_logitModel.RData -g <path/to/longRead.fasta> --min-orf=50 --top-orf=50 -o <path/to/output/directory>
  4. Generate a sorted bed file from the long-read-derived GTF.
gtfToGenePred <path/to/longRead.gtf> longRead.genePred
genePredToBed longRead.genePred longRead.bed12
sort -k1,1 -k2,2n longRead.bed12 > longRead_sorted.bed12
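
Before passing the sorted file to FICLE, it can be worth confirming that every record has 12 columns and that records are ordered by chromosome and then start position, as the sort command above intends. The following is a minimal sketch, not part of FICLE; the function name is hypothetical, and chromosome names are compared as plain strings, which matches sort only under the C locale.

```python
def check_bed12_sorted(lines):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    previous = None
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 12:
            problems.append(
                f"line {number}: expected 12 columns, found {len(fields)}"
            )
            continue
        # Sort key used above: column 1 as text, column 2 as a number.
        key = (fields[0], int(fields[1]))
        if previous is not None and key < previous:
            problems.append(
                f"line {number}: {fields[0]}:{fields[1]} sorts before the previous record"
            )
        previous = key
    return problems
```

For example, opening longRead_sorted.bed12 and passing the file handle to check_bed12_sorted should return an empty list if the three commands above ran as expected.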

Getting ready

Before running FICLE, you will need to:

  1. Activate the ficle conda environment:
-bash-4.2$ source activate ficle
(ficle)-bash-4.2$
  2. Add the FICLE scripts to your PATH:
-bash-4.2$ FICLE_ROOT=<path/to/cloned/github/FICLE/>
-bash-4.2$ export PATH=$PATH:${FICLE_ROOT}
-bash-4.2$ export PATH=$PATH:${FICLE_ROOT}/reference

FICLE arguments and usage

FICLE accepts the following arguments:

usage: ficle.py [-h] [-n GENENAME] [-r REFERENCE] [-b INPUT_BED]
                [-g INPUT_GTF] [-c INPUT_CLASS] [--cpat CPAT] [-o OUTPUT_DIR]
                [-v]

Full Isoform Characterisation from (Targeted) Long-read Experiments

optional arguments:
  -h, --help            show this help message and exit
  -n GENENAME, --genename GENENAME
                        Target gene symbol
  -r REFERENCE, --reference REFERENCE
                        Gene reference annotation (<gene>_gencode.gtf)
  -b INPUT_BED, --input_bed INPUT_BED
                        Input bed file of all the final transcripts in long-
                        read derived transcriptome.
  -g INPUT_GTF, --input_gtf INPUT_GTF
                        Input gtf file of all the final transcripts in long-
                        read derived transcriptome.
  -c INPUT_CLASS, --input_class INPUT_CLASS
                        SQANTI classification file
  --cpat CPAT           ORF_prob.best.tsv file generated from CPAT
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Output path for the annotation and associated files
  -v, --version         Display program version number.

Mandatory arguments
  1. --genename : the target gene symbol of interest (e.g. App/APP), written exactly as it appears in the associated_gene column of the SQANTI classification file
  2. --reference : target gene reference gtf (see Pre-requisite 1)
  3. --input_bed : long-read transcriptome sorted bed file (see Pre-requisite 4)
  4. --input_gtf : long-read gtf (from SQANTI3 filtering)
  5. --input_class : SQANTI filtering classification file (see Pre-requisite 2)
  6. --output_dir : path to output directory
Optional arguments
  1. --cpat : CPAT output file (see Pre-requisite 3)
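
Because --genename has to match the associated_gene column, a quick lookup before running FICLE can catch a mismatch in capitalisation (App versus APP). The following is a minimal sketch, not part of FICLE; the function name is hypothetical, and it assumes a tab-delimited classification file with a header row and an exact, case-sensitive comparison.

```python
import csv


def gene_in_classification(path, genename, column="associated_gene"):
    """Return True if genename appears verbatim in the given column."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {path}")
        # Exact, case-sensitive comparison against every row.
        return any(row[column] == genename for row in reader)
```

If the lookup returns False, check how the gene symbol is written in the classification file and pass that spelling to --genename.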

Usage example

To characterise Trem2 using FICLE:

ficle.py --genename=Trem2 \
    --reference=<path/to/gencode_reference.gtf> \
    --input_bed=<path/to/longRead_sorted.bed12> \
    --input_gtf=<path/to/longRead.gtf> \
    --input_class=<path/to/SQANTI_classification.txt> \
    --cpat=<path/to/cpat_ORF_prob.best.tsv> \
    --output_dir=<path/to/output/directory>
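
When FICLE is run for several genes, the command above can be assembled programmatically. The following is a minimal sketch, not part of FICLE: the helper names are hypothetical, the flags are taken from the help text above, and it assumes ficle.py is on the PATH as described under "Getting ready".

```python
import os
import subprocess


def build_ficle_command(genename, reference, input_bed, input_gtf,
                        input_class, output_dir, cpat=None):
    """Assemble the ficle.py argument list from the documented flags."""
    cmd = [
        "ficle.py",
        "--genename", genename,
        "--reference", reference,
        "--input_bed", input_bed,
        "--input_gtf", input_gtf,
        "--input_class", input_class,
        "--output_dir", output_dir,
    ]
    if cpat is not None:  # --cpat is the only optional input
        cmd += ["--cpat", cpat]
    return cmd


def missing_inputs(paths):
    """Return the paths that do not point to an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


def run_ficle(**kwargs):
    """Check the input files, then run ficle.py and raise on failure."""
    inputs = [kwargs[key] for key in
              ("reference", "input_bed", "input_gtf", "input_class")]
    if kwargs.get("cpat"):
        inputs.append(kwargs["cpat"])
    absent = missing_inputs(inputs)
    if absent:
        raise FileNotFoundError(f"missing input files: {absent}")
    subprocess.run(build_ficle_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, and check=True surfaces a non-zero exit status as an exception.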