FICLE output
SziKayLeung edited this page Oct 18, 2023 · 3 revisions
Running FICLE creates a series of output files in the specified directory (set with the --output_dir or -o flag in the FICLE script) (Figure 1).
Figure 1: Tree view of the output files generated by FICLE to characterise one gene of interest.
- _final_transcript_classifications: file documenting the number of AS events for each associated transcript.
  - isoform: the isoform ID from the gtf file; it is dataset dependent.
  - IR_Exon1Only, IR_LastExonOnly: IR events noted in the first and last exon, respectively.
  - NE_All: total number of novel exons across the NE classifications (NE_1st, NE_Int, NE_Last, NE_FirstLast).
(ficle) [sl693@mrc-comp053 Stats]$ head *_final_transcript_classifications.csv
isoform,Matching,SomeMatch,A5A3,AF,AP,AT,ES,IR,IR_Exon1Only,IR_LastExonOnly,NE_1st,NE_Int,NE_Last,NE_FirstLast,NE_All
PB.20818.16,0,1,0,0,1,0,1,0,0,0,2,0,0,0,2
PB.20818.31,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
PB.20818.32,0,1,0,0,1,0,1,0,0,0,1,0,0,0,1
PB.20818.53,0,0,2,0,0,0,3,0,0,0,0,0,0,0,0
PB.20818.57,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
PB.20818.58,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
PB.20818.62,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
PB.20818.67,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1
PB.20818.70,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0
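As a quick sanity check on this file, the relationship between NE_All and the four NE columns can be verified per isoform. The following is a minimal sketch using only the Python standard library; the function name `check_ne_all` and the inline `SAMPLE` string (two rows copied from the example output above) are illustrative and not part of FICLE.

```python
import csv
import io

# Two rows copied from the example output above. In practice, open the
# *_final_transcript_classifications.csv file instead of this string.
SAMPLE = """isoform,Matching,SomeMatch,A5A3,AF,AP,AT,ES,IR,IR_Exon1Only,IR_LastExonOnly,NE_1st,NE_Int,NE_Last,NE_FirstLast,NE_All
PB.20818.16,0,1,0,0,1,0,1,0,0,0,2,0,0,0,2
PB.20818.67,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1
"""

NE_COLUMNS = ["NE_1st", "NE_Int", "NE_Last", "NE_FirstLast"]


def check_ne_all(handle):
    """Return {isoform: bool}: does NE_All equal the sum of the NE columns?"""
    results = {}
    for row in csv.DictReader(handle):
        ne_sum = sum(int(row[col]) for col in NE_COLUMNS)
        results[row["isoform"]] = ne_sum == int(row["NE_All"])
    return results


ne_check = check_ne_all(io.StringIO(SAMPLE))
print(ne_check)
```

Any isoform reported as False would be worth inspecting by hand before it is used downstream.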
- _exon_tab: file documenting the presence of each exon in each associated transcript; useful in downstream analysis for detecting which transcripts contain which parts of the gene.
  - isoform: the isoform ID from the gtf file; it is dataset dependent.
  - Column names (i.e. Gencode_1, Gencode_2...) refer to the newly annotated Gencode known exons after flattening. Refer to _flattened_gencode for the coordinates of these newly annotated exons and the original exon annotation.
  - Number codes:
    - 0 = exon is absent
    - 1 = exon is present
    - 2 = exon is present as an IR event
    - 1001 = exon is not registered. This occurs if the exon is downstream of the last known exon.
(ficle) [sl693@mrc-comp053 Stats]$ head *_exon_tab.csv
isoform,Gencode_1,Gencode_2,Gencode_3,Gencode_4,Gencode_5,Gencode_6
PB.20818.16,1,1,0,1,1,1
PB.20818.31,0,0,0,1,1,1
PB.20818.32,1,1,0,1,1,1
PB.20818.53,1,1,0,0,0,1
PB.20818.57,1,1,0,1,1,1
PB.20818.58,1,1,0,1,1,1
PB.20818.62,1,1,0,1,1,1
PB.20818.67,0,0,0,0,0,1
PB.20818.70,1,1,0,0,1,1
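The number codes can be decoded to list which flattened exons each transcript contains. This is a minimal sketch, assuming the column layout shown in the example above; `CODE_MEANING` and `exons_by_status` are hypothetical names introduced here for illustration.

```python
import csv
import io

# Two rows copied from the example output above. In practice, open the
# *_exon_tab.csv file instead of this string.
SAMPLE = """isoform,Gencode_1,Gencode_2,Gencode_3,Gencode_4,Gencode_5,Gencode_6
PB.20818.16,1,1,0,1,1,1
PB.20818.67,0,0,0,0,0,1
"""

# Number codes as documented for _exon_tab.
CODE_MEANING = {
    "0": "absent",
    "1": "present",
    "2": "present as IR event",
    "1001": "not registered",
}


def exons_by_status(handle, status="present"):
    """Return {isoform: [exon columns whose code decodes to `status`]}."""
    selected = {}
    for row in csv.DictReader(handle):
        isoform = row.pop("isoform")
        selected[isoform] = [
            exon for exon, code in row.items()
            if CODE_MEANING.get(code) == status
        ]
    return selected


present_exons = exons_by_status(io.StringIO(SAMPLE))
print(present_exons)
```

Passing `status="present as IR event"` would instead select the exons coded as 2.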
- _flattened_gencode: file documenting the start and end coordinates of each newly annotated known exon (referred to in the output as gencodeExon) after flattening gencode exons across multiple transcripts.
  - transcript_id = known transcript ID associated with the gene
  - exon_number = original exon number of the transcript_id
  - updated_exon_number = newly annotated exon number after "flattening"
  - altPromoters = <True/False> for alternative promoter (first exon)
  - altTerminators = <True/False> for alternative terminator (last exon)
(ficle) [sl693@mrc-comp053 Stats]$ head Trem2_flattened_gencode.csv
seqname,start,end,exon_number,transcript_id,start_end,updated_exon_number,altPromoters,altTerminators
chr17,48346401,48346595,1,ENSMUST00000024791.14,4834640148346595,1,False,False
chr17,48346440,48348807,1,ENSMUST00000132340.1,4834644048348807,2,False,False
chr17,48346488,48346595,1,ENSMUST00000113237.3,4834648848346595,1,False,False
chr17,48346489,48346595,1,ENSMUST00000148545.1,4834648948346595,1,False,False
chr17,48348457,48348807,2,ENSMUST00000024791.14,4834845748348807,2,False,False
chr17,48350491,48350558,3,ENSMUST00000148545.1,4835049148350558,3,False,True
chr17,48351101,48351191,3,ENSMUST00000024791.14,4835110148351191,4,False,False
chr17,48351691,48351930,4,ENSMUST00000113237.3,4835169148351930,5,False,False
chr17,48351746,48351930,4,ENSMUST00000024791.14,4835174648351930,5,False,False
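To translate between the original exon numbering of a known transcript and the flattened numbering used in the other output files, a lookup can be built from this table. The sketch below assumes the column names shown in the example; `build_exon_lookup` is a hypothetical helper, not a FICLE function.

```python
import csv
import io

# Rows copied from the example output above. In practice, open the
# *_flattened_gencode.csv file instead of this string.
SAMPLE = """seqname,start,end,exon_number,transcript_id,start_end,updated_exon_number,altPromoters,altTerminators
chr17,48346401,48346595,1,ENSMUST00000024791.14,4834640148346595,1,False,False
chr17,48348457,48348807,2,ENSMUST00000024791.14,4834845748348807,2,False,False
chr17,48350491,48350558,3,ENSMUST00000148545.1,4835049148350558,3,False,True
chr17,48351101,48351191,3,ENSMUST00000024791.14,4835110148351191,4,False,False
"""


def build_exon_lookup(handle):
    """Map (transcript_id, original exon number) -> flattened exon number."""
    lookup = {}
    for row in csv.DictReader(handle):
        key = (row["transcript_id"], int(row["exon_number"]))
        lookup[key] = int(row["updated_exon_number"])
    return lookup


exon_lookup = build_exon_lookup(io.StringIO(SAMPLE))
# Original exon 3 of ENSMUST00000024791.14 corresponds to flattened exon 4.
print(exon_lookup[("ENSMUST00000024791.14", 3)])
```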
- _parsed_transcripts: file documenting the output from parsing each exon of each associated transcript against the flattened reference annotations. Each row is structured as <transcript_id>;<transcript_exon>;<classification>_<gencode_exon>;<difference_bp>;0. For example:
  - PB.20818.16;Transcript_Exon3;Match_Gencode_1;7;0 = exon 3 of PB.20818.16 is classified as a match to the 1st known exon (Gencode_1), with 7 in the difference field
  - PB.20818.31;Transcript_Exon1;ExtendedA5_Gencode_4;5015;0 = exon 1 of PB.20818.31 has a matching 3' splice site with the 4th known exon, but is extended by 5015 bp at the 5' splice site
(ficle) [sl693@mrc-comp053 Stats]$ head *parsed_transcripts.txt
PB.20818.16;Transcript_Exon3;Match_Gencode_1;7;0
PB.20818.16;Transcript_Exon4;Match_Gencode_2;0;0
PB.20818.16;Transcript_Exon5;Match_Gencode_4;0;0
PB.20818.16;Transcript_Exon6;Match_Gencode_5;0;0
PB.20818.16;Transcript_Exon7;Match_Gencode_6;5;0
PB.20818.31;Transcript_Exon1;ExtendedA5_Gencode_4;5015;0
PB.20818.31;Transcript_Exon2;Match_Gencode_5;0;0
PB.20818.31;Transcript_Exon3;Match_Gencode_6;6;0
PB.20818.32;Transcript_Exon2;Match_Gencode_1;7;0
PB.20818.32;Transcript_Exon3;Match_Gencode_2;0;0
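Each semicolon-delimited row can be split into named fields for downstream filtering. This is a minimal sketch that assumes every row follows the five-field layout seen in the example output; the `ParsedExon` type and `parse_row` function are hypothetical, and the final field is kept under the neutral name `last_field` because its meaning is not described here.

```python
from typing import NamedTuple


class ParsedExon(NamedTuple):
    transcript_id: str
    transcript_exon: str
    classification: str
    gencode_exon: str
    difference_bp: int
    last_field: int  # final column of the row; kept as-is


def parse_row(line):
    """Split one *_parsed_transcripts.txt row into its components."""
    transcript_id, transcript_exon, label, diff, last = line.strip().split(";")
    # The label joins the classification and the flattened exon,
    # e.g. "ExtendedA5_Gencode_4" -> ("ExtendedA5", "Gencode_4").
    classification, _, gencode_exon = label.partition("_")
    return ParsedExon(transcript_id, transcript_exon, classification,
                      gencode_exon, int(diff), int(last))


parsed = parse_row("PB.20818.31;Transcript_Exon1;ExtendedA5_Gencode_4;5015;0")
print(parsed)
```

Rows with a different number of fields would raise a ValueError at the unpacking step, which makes unexpected formats easy to spot.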
- A5A3_events_counts: total number of A5A3 classified events across associated transcripts
- A5A3_transcript_counts: total number of A5A3 events per associated transcript (all A5A3 classifications)
- A5A3_transcript_level: detailed information documenting the type of A5A3 event and the associated flattened gencode exon for each transcript
Example of output
(ficle) [sl693@mrc-comp053 Stats]$ head *A5A3*
==> Trem2_A5A3_events_counts.csv <==
category,numTranscripts
ExtendedA3,11
ExtendedA5,73
TruncatedA3,31
TruncatedA5,32
TruncatedBothA3A5,5
==> Trem2_A5A3_transcript_counts.csv <==
transcriptID,numEvents
PB.20818.1036,2
PB.20818.1074,1
PB.20818.1077,1
PB.20818.109,2
PB.20818.1096,2
PB.20818.112,3
PB.20818.1204,2
PB.20818.129,2
PB.20818.1291,2
==> Trem2_A5A3_transcript_level.csv <==
transcriptID,category,gencodeExon
PB.20818.31,ExtendedA5,Gencode_4
PB.20818.53,TruncatedA3,Gencode_2
PB.20818.53,ExtendedA5,Gencode_1
PB.20818.57,ExtendedA5,Gencode_1
PB.20818.58,ExtendedA5,Gencode_1
PB.20818.62,ExtendedA5,Gencode_1
PB.20818.67,TruncatedA3,Gencode_6
PB.20818.70,TruncatedA3,Gencode_2
PB.20818.70,ExtendedA5,Gencode_1
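The transcript-level file can be summarised by category, for example to count how many distinct transcripts carry each type of A5A3 event. The sketch below uses the nine rows from the example output above; `transcripts_per_category` is a hypothetical helper, and the counts it gives for this truncated sample will not match the full Trem2_A5A3_events_counts.csv.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the example output above (head of the file only).
SAMPLE = """transcriptID,category,gencodeExon
PB.20818.31,ExtendedA5,Gencode_4
PB.20818.53,TruncatedA3,Gencode_2
PB.20818.53,ExtendedA5,Gencode_1
PB.20818.57,ExtendedA5,Gencode_1
PB.20818.58,ExtendedA5,Gencode_1
PB.20818.62,ExtendedA5,Gencode_1
PB.20818.67,TruncatedA3,Gencode_6
PB.20818.70,TruncatedA3,Gencode_2
PB.20818.70,ExtendedA5,Gencode_1
"""


def transcripts_per_category(handle):
    """Return {category: number of distinct transcripts with that event}."""
    members = defaultdict(set)
    for row in csv.DictReader(handle):
        members[row["category"]].add(row["transcriptID"])
    return {category: len(ids) for category, ids in members.items()}


a5a3_summary = transcripts_per_category(io.StringIO(SAMPLE))
print(a5a3_summary)
```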
- ES_events_counts: total number of ES events per associated transcript
- ES_exon_counts: total number of transcripts in which each flattened gencode exon is skipped
- ES_transcript_level: detailed information documenting the flattened gencode exons skipped in each transcript
Example of output
(ficle) [sl693@mrc-comp053 Stats]$ head *ES*
==> Trem2_ES_events_counts.csv <==
transcriptID,numEvents
PB.20818.16,1
PB.20818.32,1
PB.20818.53,3
PB.20818.57,1
PB.20818.58,1
PB.20818.62,1
PB.20818.70,2
PB.20818.80,1
PB.20818.88,4
==> Trem2_ES_exon_counts.csv <==
gencodeExon,numTranscripts
Gencode_1,0
Gencode_2,6
Gencode_3,88
Gencode_4,18
Gencode_5,12
Gencode_6,5
==> Trem2_ES_transcript_level.csv <==
transcriptID,ES
PB.20818.16,Gencode_3
PB.20818.32,Gencode_3
PB.20818.53,Gencode_3
PB.20818.53,Gencode_4
PB.20818.53,Gencode_5
PB.20818.57,Gencode_3
PB.20818.58,Gencode_3
PB.20818.62,Gencode_3
PB.20818.70,Gencode_3
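The ES files can be cross-checked against one another: the number of rows a transcript has in ES_transcript_level would be expected to equal its numEvents in ES_events_counts. The following sketch tests that assumption on the transcripts that appear in full in the example output; `find_count_mismatches` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Rows copied from the example output above, restricted to transcripts
# whose transcript-level rows are fully visible in the head output.
EVENTS_COUNTS = """transcriptID,numEvents
PB.20818.16,1
PB.20818.32,1
PB.20818.53,3
"""

TRANSCRIPT_LEVEL = """transcriptID,ES
PB.20818.16,Gencode_3
PB.20818.32,Gencode_3
PB.20818.53,Gencode_3
PB.20818.53,Gencode_4
PB.20818.53,Gencode_5
"""


def find_count_mismatches(counts_handle, level_handle):
    """Return {transcriptID: (reported, observed)} where the counts differ."""
    observed = Counter(
        row["transcriptID"] for row in csv.DictReader(level_handle)
    )
    mismatches = {}
    for row in csv.DictReader(counts_handle):
        reported = int(row["numEvents"])
        seen = observed.get(row["transcriptID"], 0)
        if reported != seen:
            mismatches[row["transcriptID"]] = (reported, seen)
    return mismatches


es_mismatches = find_count_mismatches(
    io.StringIO(EVENTS_COUNTS), io.StringIO(TRANSCRIPT_LEVEL)
)
print(es_mismatches)
```

An empty dictionary means the two files agree for the transcripts checked.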
- IR_events_counts: total number of IR classified events per associated transcript
- IR_numExonOverlap: number of exons overlapped by IR events per transcript (note there may be more than one IR event)
- IR_transcript_level: detailed information documenting the flattened gencode exons associated with IR events in each transcript
Example of output
(ficle) [sl693@mrc-comp053 Stats]$ head *IR*
==> Trem2_IR_events_counts.csv <==
transcriptID,numEvents
PB.20818.141,1
PB.20818.645,1
PB.20818.1077,1
PB.20818.2153,1
PB.20818.5309,1
==> Trem2_IR_numExonOverlap.csv <==
transcriptID,numExonsOverlaps
PB.20818.141,2
PB.20818.645,2
PB.20818.1077,2
PB.20818.2153,2
PB.20818.5309,2
==> Trem2_IR_transcript_level.csv <==
transcriptID,category,gencodeExon
PB.20818.141,IRMatch,Gencode_5
PB.20818.141,IRMatch,Gencode_6
PB.20818.645,IRMatch,Gencode_5
PB.20818.645,IRMatch,Gencode_6
PB.20818.1077,IRMatch,Gencode_5
PB.20818.1077,IRMatch,Gencode_6
PB.20818.2153,IRMatch,Gencode_5
PB.20818.2153,IRMatch,Gencode_6
PB.20818.5309,IRMatch,Gencode_5
FICLE can identify novel exons in your dataset that do not match the reference annotation. The following output files are generated:
- NE_coordinates: genome coordinates of the unique novel exons
- NE_transcript_counts: number of novel exons per associated transcript
- NE_type_counts: total number of transcripts classified by novel exon type
  - internal: novel exon within the known gene body (longest representative known transcript)
  - beyond_first: novel exon upstream of the first known exon
  - beyond_last: novel exon downstream of the last known exon
Example of output
(ficle) [sl693@mrc-comp053 Stats]$ head *NE*
==> Trem2_NE_coordinates.csv <==
chr17 48342254 48342350
chr17 48345568 48345667
chr17 48346099 48346222
chr17 48346324 48346581
chr17 48347100 48347146
chr17 48347100 48347209
chr17 48352176 48352264
chr17 48346412 48346631
chr17 48347100 48347209
chr17 48347100 48347146
==> Trem2_NE_transcript_counts.csv <==
transcriptID,numNovelExons
PB.20818.1074,2
PB.20818.1096,1
PB.20818.16,2
PB.20818.192,1
PB.20818.319,1
PB.20818.32,1
PB.20818.362,1
PB.20818.446,1
PB.20818.4633,1
==> Trem2_NE_type_counts.csv <==
typeNovelExon,numTranscripts
Beyond_First,3
First,1
Internal_NovelExon,14
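The head of NE_coordinates shown above contains repeated intervals, so it can be useful to collapse the file to distinct coordinates before further analysis. This is a minimal sketch assuming three whitespace-separated columns (chromosome, start, end) and no header, as in the example; `unique_novel_exons` is a hypothetical helper.

```python
# Rows copied from the example output above. In practice, read the
# lines of the *_NE_coordinates.csv file instead of this string.
SAMPLE = """chr17 48342254 48342350
chr17 48345568 48345667
chr17 48346099 48346222
chr17 48346324 48346581
chr17 48347100 48347146
chr17 48347100 48347209
chr17 48352176 48352264
chr17 48346412 48346631
chr17 48347100 48347209
chr17 48347100 48347146
"""


def unique_novel_exons(lines):
    """Return the sorted, de-duplicated (chrom, start, end) tuples."""
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        chrom, start, end = fields
        seen.add((chrom, int(start), int(end)))
    return sorted(seen)


novel_exons = unique_novel_exons(SAMPLE.splitlines())
print(len(novel_exons))
```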
The Tracks folder contains bed12 files subdivided by AS classification.
(ficle) [sl693@mrc-comp053 Tracks]$ ls
Trem2_A5A3_sorted_coloured.bed12 Trem2_ES_NeInt_Both_sorted_coloured.bed12 Trem2_Locator_Bedfiles.txt Trem2_NE_First_sorted_coloured.bed12
Trem2_ESOnly_sorted_coloured.bed12 Trem2_IR_ES_Both_sorted_coloured.bed12 Trem2_NEIntOnly_sorted_coloured.bed12 Trem2_SomeMatch_sorted_coloured.bed12
A transcript can be characterised by multiple event types. Subsetting and visualisation are therefore prioritised in the following order of event types:
- Matching
- AF
- novel exon first
- novel exon last
- novel exon first and last
- IR and ES and Novel exon internal
- IR and ES
- ES and Novel exon internal
- Novel exon internal
- IR
- ES
- A5, A3
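The priority order above can be read as a first-match rule: a transcript is placed in the first group whose events it contains. The sketch below illustrates that reading using the column names from _final_transcript_classifications. The mapping from each list item to specific columns is an assumption made for illustration, and `PRIORITY` and `top_priority_group` are hypothetical names rather than FICLE's own implementation.

```python
# Each entry pairs a group label (from the list above) with the
# classification columns that must all be non-zero. The column mapping
# is an assumption made for this illustration.
PRIORITY = [
    ("Matching", ("Matching",)),
    ("AF", ("AF",)),
    ("novel exon first", ("NE_1st",)),
    ("novel exon last", ("NE_Last",)),
    ("novel exon first and last", ("NE_FirstLast",)),
    ("IR and ES and Novel exon internal", ("IR", "ES", "NE_Int")),
    ("IR and ES", ("IR", "ES")),
    ("ES and Novel exon internal", ("ES", "NE_Int")),
    ("Novel exon internal", ("NE_Int",)),
    ("IR", ("IR",)),
    ("ES", ("ES",)),
    ("A5, A3", ("A5A3",)),
]


def top_priority_group(counts):
    """Return the first group whose columns are all non-zero, else None."""
    for label, columns in PRIORITY:
        if all(counts.get(column, 0) > 0 for column in columns):
            return label
    return None


# Non-zero event counts taken from the example classification rows.
group_16 = top_priority_group({"SomeMatch": 1, "AP": 1, "ES": 1, "NE_1st": 2, "NE_All": 2})
group_53 = top_priority_group({"A5A3": 2, "ES": 3})
group_31 = top_priority_group({"A5A3": 1})
print(group_16, group_53, group_31, sep=" | ")
```

Under this reading, a transcript with both ES and A5A3 events, such as PB.20818.53, is grouped with ES because ES sits higher in the list.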