Running FICLE
SziKayLeung edited this page Oct 18, 2023 · 4 revisions
- Structure
- Dependencies
- Installation
- Pre-requisites
- Getting ready
- FICLE arguments and usage
- Usage example
FICLE is based on one executable Python 3 script, ficle.py. The src folder contains all the auxiliary scripts and functions required.
Python-related libraries
- gtfparse (v1.2.1)
- pandas (v1.1.5)
- NumPy (v1.19.5)
External:
- Install Anaconda or Miniconda.
- Clone the git repository into a folder of your choice:
https://github.com/SziKayLeung/FICLE.git
- Create a conda environment using the ficle.condaEnv.yml file available in the main FICLE folder:
cd FICLE
conda env create -f ficle.condaEnv.yml
source activate ficle
- Download the reference genome annotation of interest in GTF format, which can be found in GENCODE or CHESS.
- Run SQANTI3 QC and filtering to generate a filtered classification file. See the SQANTI3 Git repository for more details.
- Run CPAT on the long-read-derived fasta (preferably from SQANTI3), which can be obtained by following the long-read processing pipeline, to generate the ORF_prob.best.tsv file.
cpat.py -x Mouse_Hexamer.tsv -d Mouse_logitModel.RData -g <path/to/longRead.fasta> --min-orf=50 --top-orf=50 -o <path/to/output/directory>
- Generate a bed file from the long-read-derived GTF.
gtfToGenePred <path/to/longRead.gtf> longRead.genePred
genePredToBed longRead.genePred longRead.bed12
sort -k1,1 -k2,2n longRead.bed12 > longRead_sorted.bed12
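Before passing the sorted bed file to FICLE, it can help to confirm that every record has the 12 columns of a BED12 file and that records are ordered by chromosome and start position. The following is a minimal sketch, not part of FICLE: the function name check_bed12 is hypothetical, and the ordering test uses Python's plain string comparison for chromosome names, which is assumed to correspond to sort run under the C locale and may disagree with sort under other locales.

```python
def check_bed12(lines):
    """Return (n_records, is_sorted) for an iterable of BED12 lines.

    Raises ValueError if a record does not have exactly 12 tab-separated columns.
    """
    previous = None
    n_records = 0
    is_sorted = True
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and optional header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError(
                f"line {line_number}: expected 12 columns, found {len(fields)}"
            )
        # Same keys as `sort -k1,1 -k2,2n`: chromosome, then numeric start.
        key = (fields[0], int(fields[1]))
        if previous is not None and key < previous:
            is_sorted = False
        previous = key
        n_records += 1
    return n_records, is_sorted


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_bed12.py longRead_sorted.bed12
    with open(sys.argv[1]) as handle:
        n_records, is_sorted = check_bed12(handle)
    print(f"{n_records} records, sorted: {is_sorted}")
```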
Before running FICLE, you will need to:
- Activate the ficle conda environment:
-bash-4.2$ source activate ficle
(ficle)-bash-4.2$
- Add scripts to path:
-bash-4.2$ FICLE_ROOT=<path/to/cloned/github/FICLE/>
-bash-4.2$ export PATH=$PATH:${FICLE_ROOT}
-bash-4.2$ export PATH=$PATH:${FICLE_ROOT}/reference
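To confirm that the PATH changes above took effect, one option is to ask Python where it would find ficle.py. This is a small sketch using only the standard library; the helper name find_script is hypothetical. Note that shutil.which only reports files that are executable, so the script is assumed to have its executable bit set.

```python
import shutil


def find_script(name="ficle.py", search_path=None):
    """Return the full path to `name` if it is executable on the search path, else None.

    With search_path=None, the current PATH environment variable is used.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    location = find_script()
    if location is None:
        print("ficle.py not found on PATH; check FICLE_ROOT and the export lines")
    else:
        print(f"ficle.py found at {location}")
```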
FICLE accepts the following arguments:
usage: ficle.py [-h] [-n GENENAME] [-r REFERENCE] [-b INPUT_BED]
[-g INPUT_GTF] [-c INPUT_CLASS] [--cpat CPAT] [-o OUTPUT_DIR]
[-v]
Full Isoform Characterisation from (Targeted) Long-read Experiments
optional arguments:
-h, --help show this help message and exit
-n GENENAME, --genename GENENAME
Target gene symbol
-r REFERENCE, --reference REFERENCE
Gene reference annotation (<gene>_gencode.gtf)
-b INPUT_BED, --input_bed INPUT_BED
Input bed file of all the final transcripts in long-
read derived transcriptome.
-g INPUT_GTF, --input_gtf INPUT_GTF
Input gtf file of all the final transcripts in long-
read derived transcriptome.
-c INPUT_CLASS, --input_class INPUT_CLASS
SQANTI classification file
--cpat CPAT ORF_prob.best.tsv file generated from CPAT
-o OUTPUT_DIR, --output_dir OUTPUT_DIR
Output path for the annotation and associated files
-v, --version Display program version number.
- --genename: the target gene symbol of interest (e.g. App/APP); the symbol should match the form used in the associated_gene column of the SQANTI classification file
- --reference: target gene reference gtf (see Pre-requisite 1)
- --input_bed: sorted bed file of the long-read transcriptome (see Pre-requisite 4)
- --input_gtf: long-read gtf (from SQANTI3 filtering)
- --input_class: SQANTI filtering classification file (see Pre-requisite 2)
- --output_dir: path to output directory
- --cpat: CPAT output file (see Pre-requisite 3)
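Because the gene symbol is expected to match the associated_gene column, a quick check of the classification file before running FICLE can catch a capitalisation mismatch such as App versus APP. The sketch below is an illustration only: the function name count_gene_isoforms is hypothetical, and it assumes the classification file is tab-delimited with a header row containing associated_gene.

```python
import csv


def count_gene_isoforms(classification_lines, genename):
    """Count rows whose associated_gene equals `genename` exactly.

    Returns (exact_count, case_variants), where case_variants lists symbols
    that differ from `genename` only in capitalisation.
    """
    reader = csv.DictReader(classification_lines, delimiter="\t")
    if "associated_gene" not in (reader.fieldnames or []):
        raise KeyError("no 'associated_gene' column found in the header")
    exact_count = 0
    case_variants = set()
    for row in reader:
        gene = row["associated_gene"]
        if gene == genename:
            exact_count += 1
        elif gene.lower() == genename.lower():
            case_variants.add(gene)
    return exact_count, sorted(case_variants)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python check_gene.py <classification.txt> Trem2
    with open(sys.argv[1], newline="") as handle:
        exact, variants = count_gene_isoforms(handle, sys.argv[2])
    print(f"{exact} isoforms match exactly; case variants seen: {variants}")
```

If the exact count is zero but a case variant is reported, passing that variant to --genename is probably what is wanted.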
To characterise Trem2 using FICLE:
ficle.py --genename=Trem2 \
--reference=<path/to/gencode_reference.gtf> \
--input_bed=<path/to/longRead_sorted.bed12> \
--input_gtf=<path/to/longRead.gtf> \
--input_class=<path/to/SQANTI_classification.txt> \
--cpat=<path/to/cpat_ORF_prob.best.tsv> \
--output_dir=<path/to/output/directory>
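When FICLE is run for several genes or from a pipeline, the same call can be assembled in Python. The sketch below is one possible wrapper, not part of FICLE: the helper names build_ficle_command and missing_inputs are hypothetical, the option names are taken from the usage message above, and the paths in the example are placeholders to be replaced.

```python
import subprocess
from pathlib import Path


def build_ficle_command(genename, reference, input_bed, input_gtf,
                        input_class, cpat, output_dir):
    """Return the ficle.py call as an argument list for subprocess.run."""
    return [
        "ficle.py",
        f"--genename={genename}",
        f"--reference={reference}",
        f"--input_bed={input_bed}",
        f"--input_gtf={input_gtf}",
        f"--input_class={input_class}",
        f"--cpat={cpat}",
        f"--output_dir={output_dir}",
    ]


def missing_inputs(paths):
    """Return the paths (as strings) that are not existing files."""
    return [str(path) for path in paths if not Path(path).is_file()]


if __name__ == "__main__":
    # Placeholder paths: replace with real files before running.
    inputs = {
        "reference": "path/to/gencode_reference.gtf",
        "input_bed": "path/to/longRead_sorted.bed12",
        "input_gtf": "path/to/longRead.gtf",
        "input_class": "path/to/SQANTI_classification.txt",
        "cpat": "path/to/cpat_ORF_prob.best.tsv",
    }
    absent = missing_inputs(inputs.values())
    if absent:
        print("Missing input files:", ", ".join(absent))
    else:
        command = build_ficle_command("Trem2", output_dir="path/to/output/directory", **inputs)
        # Requires ficle.py on PATH and the ficle conda environment to be active.
        subprocess.run(command, check=True)
```

Checking the inputs first means a mistyped path is reported before FICLE starts, rather than partway through a run.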