
FICLE output

SziKayLeung edited this page Oct 18, 2023 · 3 revisions

Running FICLE creates a series of output files in the specified directory (set with the --output_dir or -o flag of the FICLE script) (Figure 1).

FICLE_output

Figure 1: Tree view of the output files generated by FICLE to characterise one gene of interest.

Stats

General

  • _final_transcript_classifications: file documenting the number of alternative splicing (AS) events for each associated transcript.
    • isoform: the isoform ID from the GTF file; it is dataset dependent.
    • IR_Exon1Only, IR_LastExonOnly: IR events noted in the first and last exon, respectively
    • NE_All: total number of novel exons across the NE classifications (NE_1st, NE_Int, NE_Last, NE_FirstLast)
(ficle) [sl693@mrc-comp053 Stats]$ head *_final_transcript_classifications.csv
isoform,Matching,SomeMatch,A5A3,AF,AP,AT,ES,IR,IR_Exon1Only,IR_LastExonOnly,NE_1st,NE_Int,NE_Last,NE_FirstLast,NE_All
PB.20818.16,0,1,0,0,1,0,1,0,0,0,2,0,0,0,2
PB.20818.31,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
PB.20818.32,0,1,0,0,1,0,1,0,0,0,1,0,0,0,1
PB.20818.53,0,0,2,0,0,0,3,0,0,0,0,0,0,0,0
PB.20818.57,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
PB.20818.58,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
PB.20818.62,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
PB.20818.67,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1
PB.20818.70,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0
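
As a quick sanity check on this table, the sketch below reads a few of the rows shown above with Python's standard csv module and confirms that NE_All equals the sum of the four NE columns. The rows are pasted inline so the snippet runs on its own; for real data you would open the *_final_transcript_classifications.csv file instead. The helper name ne_total is illustrative and not part of FICLE.

```python
import csv
import io

# Rows copied from the example output above; swap in open(path) for real data.
SAMPLE = """isoform,Matching,SomeMatch,A5A3,AF,AP,AT,ES,IR,IR_Exon1Only,IR_LastExonOnly,NE_1st,NE_Int,NE_Last,NE_FirstLast,NE_All
PB.20818.16,0,1,0,0,1,0,1,0,0,0,2,0,0,0,2
PB.20818.53,0,0,2,0,0,0,3,0,0,0,0,0,0,0,0
PB.20818.67,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1
"""

NE_COLUMNS = ["NE_1st", "NE_Int", "NE_Last", "NE_FirstLast"]


def ne_total(row):
    """Sum the per-type novel exon counts for one transcript (hypothetical helper)."""
    return sum(int(row[col]) for col in NE_COLUMNS)


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Collect any isoform whose NE_All disagrees with the sum of its NE columns.
mismatches = [r["isoform"] for r in rows if ne_total(r) != int(r["NE_All"])]
print("rows checked:", len(rows), "mismatches:", mismatches)
```
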
  • _exon_tab: file documenting the presence of each exon in each associated transcript. It is useful for downstream analysis to detect which transcripts contain which parts of the gene.
    • isoform: the isoform ID from the GTF file; it is dataset dependent.
    • column names (e.g. Gencode_1, Gencode_2...) refer to the newly annotated Gencode known exons after flattening. Refer to _flattened_gencode for the coordinates of these newly annotated exons and the original exon annotation.
    • Number codes:
      • 0 = exon is absent
      • 1 = exon is present
      • 2 = exon is present as IR event
      • 1001 = exon is not registered. This occurs if the exon is downstream of the last known exon.
(ficle) [sl693@mrc-comp053 Stats]$ head *_exon_tab.csv
isoform,Gencode_1,Gencode_2,Gencode_3,Gencode_4,Gencode_5,Gencode_6
PB.20818.16,1,1,0,1,1,1
PB.20818.31,0,0,0,1,1,1
PB.20818.32,1,1,0,1,1,1
PB.20818.53,1,1,0,0,0,1
PB.20818.57,1,1,0,1,1,1
PB.20818.58,1,1,0,1,1,1
PB.20818.62,1,1,0,1,1,1
PB.20818.67,0,0,0,0,0,1
PB.20818.70,1,1,0,0,1,1
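
The number codes can be decoded when loading the table. The following is a minimal sketch, assuming the CSV layout shown above; the status labels and the function exons_with_status are names chosen for illustration, not FICLE identifiers.

```python
import csv
import io

# Rows copied from the example output above; swap in open(path) for real data.
SAMPLE = """isoform,Gencode_1,Gencode_2,Gencode_3,Gencode_4,Gencode_5,Gencode_6
PB.20818.16,1,1,0,1,1,1
PB.20818.31,0,0,0,1,1,1
PB.20818.53,1,1,0,0,0,1
PB.20818.67,0,0,0,0,0,1
"""

# Meaning of each number code as listed above (labels are illustrative).
CODES = {"0": "absent", "1": "present", "2": "present_IR", "1001": "not_registered"}


def exons_with_status(row, status):
    """Return the Gencode_N columns of one transcript that carry the given status."""
    return [
        col for col, val in row.items()
        if col != "isoform" and CODES.get(val) == status
    ]


table = {r["isoform"]: r for r in csv.DictReader(io.StringIO(SAMPLE))}

# Exons present in each transcript, and the transcripts that contain Gencode_4.
present = {iso: exons_with_status(r, "present") for iso, r in table.items()}
with_gencode_4 = [iso for iso, exons in present.items() if "Gencode_4" in exons]
print(with_gencode_4)
```
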
  • _flattened_gencode: file documenting the start and end coordinates of each newly annotated known exon (referred to in the output as gencodeExon) after flattening Gencode exons across multiple transcripts
    • transcript_id = known transcript ID associated with gene
    • exon_number = original exon number of the transcript_id
    • updated_exon_number = newly annotated exon number after "flattening"
    • altPromoters = <True/False> for alternative promoter (first exon)
    • altTerminators = <True/False> for alternative terminator (last exon)
(ficle) [sl693@mrc-comp053 Stats]$ head Trem2_flattened_gencode.csv
seqname,start,end,exon_number,transcript_id,start_end,updated_exon_number,altPromoters,altTerminators
chr17,48346401,48346595,1,ENSMUST00000024791.14,4834640148346595,1,False,False
chr17,48346440,48348807,1,ENSMUST00000132340.1,4834644048348807,2,False,False
chr17,48346488,48346595,1,ENSMUST00000113237.3,4834648848346595,1,False,False
chr17,48346489,48346595,1,ENSMUST00000148545.1,4834648948346595,1,False,False
chr17,48348457,48348807,2,ENSMUST00000024791.14,4834845748348807,2,False,False
chr17,48350491,48350558,3,ENSMUST00000148545.1,4835049148350558,3,False,True
chr17,48351101,48351191,3,ENSMUST00000024791.14,4835110148351191,4,False,False
chr17,48351691,48351930,4,ENSMUST00000113237.3,4835169148351930,5,False,False
chr17,48351746,48351930,4,ENSMUST00000024791.14,4835174648351930,5,False,False
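
To translate an original exon number into its flattened exon, the table can be turned into a lookup keyed by transcript and exon number. This is a sketch only: it assumes that updated_exon_number N corresponds to the Gencode_N label used in the other output files, and the variable name lookup is illustrative.

```python
import csv
import io

# Rows copied from the example output above; swap in open(path) for real data.
SAMPLE = """seqname,start,end,exon_number,transcript_id,start_end,updated_exon_number,altPromoters,altTerminators
chr17,48346401,48346595,1,ENSMUST00000024791.14,4834640148346595,1,False,False
chr17,48346440,48348807,1,ENSMUST00000132340.1,4834644048348807,2,False,False
chr17,48348457,48348807,2,ENSMUST00000024791.14,4834845748348807,2,False,False
chr17,48351101,48351191,3,ENSMUST00000024791.14,4835110148351191,4,False,False
chr17,48351746,48351930,4,ENSMUST00000024791.14,4835174648351930,5,False,False
"""

# (transcript_id, original exon number) -> flattened exon label and coordinates.
lookup = {}
for row in csv.DictReader(io.StringIO(SAMPLE)):
    key = (row["transcript_id"], int(row["exon_number"]))
    lookup[key] = {
        # Assumption: Gencode_<updated_exon_number> is the label used elsewhere.
        "gencode_exon": "Gencode_" + row["updated_exon_number"],
        "start": int(row["start"]),
        "end": int(row["end"]),
    }

print(lookup[("ENSMUST00000024791.14", 3)])
```

In this sample, original exon 3 of ENSMUST00000024791.14 maps to flattened exon 4, because another known transcript contributes a flattened exon in between.
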

  • _parsed_transcripts: file documenting the output from parsing each exon of each associated transcript against the flattened reference annotations. Each row is structured as:
    • <transcript_id>;<transcript_exon>;<classification>_<gencode_exon>;<difference_bp>;0
      PB.20818.16;Transcript_Exon4;Match_Gencode_2;0;0 = exon 4 of PB.20818.16 matches the 2nd known exon exactly, with a 0 bp difference
      PB.20818.31;Transcript_Exon1;ExtendedA5_Gencode_4;5015;0 = exon 1 of PB.20818.31 has a matching 3' splice site with the 4th known exon, but is extended by 5015 bp at the 5' splice site
(ficle) [sl693@mrc-comp053 Stats]$ head *parsed_transcripts.txt
PB.20818.16;Transcript_Exon3;Match_Gencode_1;7;0
PB.20818.16;Transcript_Exon4;Match_Gencode_2;0;0
PB.20818.16;Transcript_Exon5;Match_Gencode_4;0;0
PB.20818.16;Transcript_Exon6;Match_Gencode_5;0;0
PB.20818.16;Transcript_Exon7;Match_Gencode_6;5;0
PB.20818.31;Transcript_Exon1;ExtendedA5_Gencode_4;5015;0
PB.20818.31;Transcript_Exon2;Match_Gencode_5;0;0
PB.20818.31;Transcript_Exon3;Match_Gencode_6;6;0
PB.20818.32;Transcript_Exon2;Match_Gencode_1;7;0
PB.20818.32;Transcript_Exon3;Match_Gencode_2;0;0
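
Each row can be split into its fields for downstream filtering. The parser below is a rough sketch, assuming five semicolon-separated fields with the fourth holding the difference in bp, as the row template suggests. The ParsedExon type and parse_line function are hypothetical names, and labels without a _Gencode_ part (if any exist) are kept whole with no exon assigned.

```python
from typing import NamedTuple, Optional

# Rows copied from the example output above.
SAMPLE = """PB.20818.16;Transcript_Exon4;Match_Gencode_2;0;0
PB.20818.31;Transcript_Exon1;ExtendedA5_Gencode_4;5015;0
PB.20818.31;Transcript_Exon2;Match_Gencode_5;0;0
"""


class ParsedExon(NamedTuple):
    transcript_id: str
    transcript_exon: int
    classification: str
    gencode_exon: Optional[str]
    difference_bp: int


def parse_line(line):
    """Split one row into typed fields (hypothetical helper, not a FICLE function)."""
    transcript_id, exon_label, label, diff, _last = line.strip().split(";")
    # "ExtendedA5_Gencode_4" -> ("ExtendedA5", "_Gencode_", "4")
    classification, sep, exon_index = label.partition("_Gencode_")
    return ParsedExon(
        transcript_id=transcript_id,
        transcript_exon=int(exon_label.replace("Transcript_Exon", "")),
        classification=classification,
        gencode_exon="Gencode_" + exon_index if sep else None,
        difference_bp=int(diff),
    )


records = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
extended = [r for r in records if r.classification.startswith("Extended")]
print(extended)
```
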

A5A3

  • A5A3_events_counts: total number of A5A3 classified events across associated transcripts
  • A5A3_transcript_counts: total number of A5A3 events per associated transcript (all A5A3 classifications)
  • A5A3_transcript_level: detailed information documenting the type of A5A3 event and the associated flattened gencode-exon for each transcript
Example of output
(ficle) [sl693@mrc-comp053 Stats]$ head *A5A3*
==> Trem2_A5A3_events_counts.csv <==
category,numTranscripts
ExtendedA3,11
ExtendedA5,73
TruncatedA3,31
TruncatedA5,32
TruncatedBothA3A5,5

==> Trem2_A5A3_transcript_counts.csv <==
transcriptID,numEvents
PB.20818.1036,2
PB.20818.1074,1
PB.20818.1077,1
PB.20818.109,2
PB.20818.1096,2
PB.20818.112,3
PB.20818.1204,2
PB.20818.129,2
PB.20818.1291,2

==> Trem2_A5A3_transcript_level.csv <==
transcriptID,category,gencodeExon
PB.20818.31,ExtendedA5,Gencode_4
PB.20818.53,TruncatedA3,Gencode_2
PB.20818.53,ExtendedA5,Gencode_1
PB.20818.57,ExtendedA5,Gencode_1
PB.20818.58,ExtendedA5,Gencode_1
PB.20818.62,ExtendedA5,Gencode_1
PB.20818.67,TruncatedA3,Gencode_6
PB.20818.70,TruncatedA3,Gencode_2
PB.20818.70,ExtendedA5,Gencode_1
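
The two summary files can be related to the transcript-level file by simple tallies. The sketch below counts rows per transcript and per category from a few of the transcript-level rows above. It illustrates that relationship under the assumption that each row is one event, and does not claim to reproduce how FICLE computes the counts.

```python
import csv
import io
from collections import Counter

# Rows copied from the Trem2_A5A3_transcript_level.csv example above.
SAMPLE = """transcriptID,category,gencodeExon
PB.20818.31,ExtendedA5,Gencode_4
PB.20818.53,TruncatedA3,Gencode_2
PB.20818.53,ExtendedA5,Gencode_1
PB.20818.57,ExtendedA5,Gencode_1
PB.20818.67,TruncatedA3,Gencode_6
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# One row per event, so counting rows gives events per transcript and per category.
events_per_transcript = Counter(r["transcriptID"] for r in rows)
events_per_category = Counter(r["category"] for r in rows)

print(events_per_transcript.most_common())
print(events_per_category.most_common())
```

For PB.20818.53 the tally gives 2 events, which agrees with its A5A3 value in the _final_transcript_classifications example.
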

ES - Exon skipping

  • ES_events_counts: total number of ES events across associated transcripts
  • ES_exon_counts: total number of transcripts with that exon skipped
  • ES_transcript_level: detailed information documenting the associated flattened gencode-exon skipped for each transcript
Example of output
(ficle) [sl693@mrc-comp053 Stats]$ head *ES*
==> Trem2_ES_events_counts.csv <==
transcriptID,numEvents
PB.20818.16,1
PB.20818.32,1
PB.20818.53,3
PB.20818.57,1
PB.20818.58,1
PB.20818.62,1
PB.20818.70,2
PB.20818.80,1
PB.20818.88,4

==> Trem2_ES_exon_counts.csv <==
gencodeExon,numTranscripts
Gencode_1,0
Gencode_2,6
Gencode_3,88
Gencode_4,18
Gencode_5,12
Gencode_6,5

==> Trem2_ES_transcript_level.csv <==
transcriptID,ES
PB.20818.16,Gencode_3
PB.20818.32,Gencode_3
PB.20818.53,Gencode_3
PB.20818.53,Gencode_4
PB.20818.53,Gencode_5
PB.20818.57,Gencode_3
PB.20818.58,Gencode_3
PB.20818.62,Gencode_3
PB.20818.70,Gencode_3
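
The transcript-level file can be grouped in both directions: skipped exons per transcript, and transcripts per skipped exon. A minimal sketch using a few of the rows above follows; the variable names are illustrative, and the counts cover only these sample rows.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the Trem2_ES_transcript_level.csv example above.
SAMPLE = """transcriptID,ES
PB.20818.16,Gencode_3
PB.20818.32,Gencode_3
PB.20818.53,Gencode_3
PB.20818.53,Gencode_4
PB.20818.53,Gencode_5
PB.20818.57,Gencode_3
"""

skipped_by_transcript = defaultdict(list)  # transcript -> skipped exons
transcripts_by_exon = defaultdict(set)     # exon -> transcripts skipping it

for row in csv.DictReader(io.StringIO(SAMPLE)):
    skipped_by_transcript[row["transcriptID"]].append(row["ES"])
    transcripts_by_exon[row["ES"]].add(row["transcriptID"])

num_events = {t: len(exons) for t, exons in skipped_by_transcript.items()}
num_transcripts = {e: len(ts) for e, ts in transcripts_by_exon.items()}
print(num_events)
print(num_transcripts)
```
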

IR - Intron retention

  • IR_events_counts: total number of IR classified events across associated transcripts
  • IR_numExonOverlap: number of exons overlapped by IR events per transcript (note there may be more than one IR event)
  • IR_transcript_level: detailed information documenting the associated flattened gencode-exon with IR events for each transcript
Example of output
(ficle) [sl693@mrc-comp053 Stats]$ head *IR*
==> Trem2_IR_events_counts.csv <==
transcriptID,numEvents
PB.20818.141,1
PB.20818.645,1
PB.20818.1077,1
PB.20818.2153,1
PB.20818.5309,1

==> Trem2_IR_numExonOverlap.csv <==
transcriptID,numExonsOverlaps
PB.20818.141,2
PB.20818.645,2
PB.20818.1077,2
PB.20818.2153,2
PB.20818.5309,2

==> Trem2_IR_transcript_level.csv <==
transcriptID,category,gencodeExon
PB.20818.141,IRMatch,Gencode_5
PB.20818.141,IRMatch,Gencode_6
PB.20818.645,IRMatch,Gencode_5
PB.20818.645,IRMatch,Gencode_6
PB.20818.1077,IRMatch,Gencode_5
PB.20818.1077,IRMatch,Gencode_6
PB.20818.2153,IRMatch,Gencode_5
PB.20818.2153,IRMatch,Gencode_6
PB.20818.5309,IRMatch,Gencode_5
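
The number of exons overlapped by IR events can be recovered from the transcript-level file by collecting the distinct exons listed for each transcript. This is a sketch over a few of the rows above, assuming one row per overlapped exon; it is not taken from the FICLE source.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the Trem2_IR_transcript_level.csv example above.
SAMPLE = """transcriptID,category,gencodeExon
PB.20818.141,IRMatch,Gencode_5
PB.20818.141,IRMatch,Gencode_6
PB.20818.645,IRMatch,Gencode_5
PB.20818.645,IRMatch,Gencode_6
"""

# transcript -> set of flattened exons overlapped by its IR events
exons_by_transcript = defaultdict(set)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    exons_by_transcript[row["transcriptID"]].add(row["gencodeExon"])

num_exon_overlaps = {t: len(exons) for t, exons in exons_by_transcript.items()}
print(num_exon_overlaps)
```
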

Novel exons

FICLE can identify novel exons in your dataset that do not match known exons in the reference annotation. The following output files are generated:

  • NE_coordinates: genome coordinates of the unique novel exons
  • NE_transcript_counts: number of novel exons per associated transcript
  • NE_type_counts: total number of transcripts classified by novel exon type
    • internal: novel exon within the known gene body (longest representative known transcript)
    • beyond_first: novel exon upstream of the first known exon
    • beyond_last: novel exon downstream of the last known exon
Example of output
(ficle) [sl693@mrc-comp053 Stats]$ head *NE*
==> Trem2_NE_coordinates.csv <==
chr17 48342254 48342350
chr17 48345568 48345667
chr17 48346099 48346222
chr17 48346324 48346581
chr17 48347100 48347146
chr17 48347100 48347209
chr17 48352176 48352264
chr17 48346412 48346631
chr17 48347100 48347209
chr17 48347100 48347146

==> Trem2_NE_transcript_counts.csv <==
transcriptID,numNovelExons
PB.20818.1074,2
PB.20818.1096,1
PB.20818.16,2
PB.20818.192,1
PB.20818.319,1
PB.20818.32,1
PB.20818.362,1
PB.20818.446,1
PB.20818.4633,1

==> Trem2_NE_type_counts.csv <==
typeNovelExon,numTranscripts
Beyond_First,3
First,1
Internal_NovelExon,14
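
The first lines of the NE_coordinates example contain repeated coordinates, so it can be worth collapsing the list before counting novel exons. The sketch below parses the rows shown above and removes duplicates. It assumes three fields per row (chromosome, start, end) separated by whitespace or commas; parse_coordinates is an illustrative name.

```python
# Rows copied from the Trem2_NE_coordinates.csv example above.
SAMPLE = """chr17 48342254 48342350
chr17 48345568 48345667
chr17 48346099 48346222
chr17 48346324 48346581
chr17 48347100 48347146
chr17 48347100 48347209
chr17 48352176 48352264
chr17 48346412 48346631
chr17 48347100 48347209
chr17 48347100 48347146
"""


def parse_coordinates(text):
    """Parse rows into (chrom, start, end) tuples (hypothetical helper)."""
    exons = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Accept either whitespace- or comma-separated fields.
        chrom, start, end = line.replace(",", " ").split()
        exons.append((chrom, int(start), int(end)))
    return exons


all_exons = parse_coordinates(SAMPLE)
unique_exons = sorted(set(all_exons))  # sorted by chromosome, then start, then end
print(len(all_exons), "rows,", len(unique_exons), "unique novel exons")
```
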

Tracks

This folder contains bed12 files subdivided by AS classifications.

(ficle) [sl693@mrc-comp053 Tracks]$ ls
Trem2_A5A3_sorted_coloured.bed12    Trem2_ES_NeInt_Both_sorted_coloured.bed12  Trem2_Locator_Bedfiles.txt             Trem2_NE_First_sorted_coloured.bed12
Trem2_ESOnly_sorted_coloured.bed12  Trem2_IR_ES_Both_sorted_coloured.bed12     Trem2_NEIntOnly_sorted_coloured.bed12  Trem2_SomeMatch_sorted_coloured.bed12

A transcript can be characterised by multiple event types. Subsetting and visualisation are therefore prioritised by the following event types:

  1. Matching
  2. AF
  3. novel exon first
  4. novel exon last
  5. novel exon first and last
  6. IR and ES and Novel exon internal
  7. IR and ES
  8. ES and Novel exon internal
  9. Novel exon internal
  10. IR
  11. ES
  12. A5, A3
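
The priority order above can be read as a first-match rule: a transcript is assigned to the first event type in the list whose conditions it satisfies. The sketch below applies that rule to rows from the _final_transcript_classifications example. The mapping from each event type to column names (for example "novel exon first" to NE_1st) is an assumption made for illustration, and assign_event_type is not a FICLE function.

```python
import csv
import io

# Rows copied from the _final_transcript_classifications example.
SAMPLE = """isoform,Matching,SomeMatch,A5A3,AF,AP,AT,ES,IR,IR_Exon1Only,IR_LastExonOnly,NE_1st,NE_Int,NE_Last,NE_FirstLast,NE_All
PB.20818.16,0,1,0,0,1,0,1,0,0,0,2,0,0,0,2
PB.20818.31,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
PB.20818.53,0,0,2,0,0,0,3,0,0,0,0,0,0,0,0
PB.20818.67,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1
"""

# Event types in priority order; each needs all listed columns to be non-zero.
# The column mapping is assumed, not taken from the FICLE source.
PRIORITY = [
    ("Matching", ("Matching",)),
    ("AF", ("AF",)),
    ("novel exon first", ("NE_1st",)),
    ("novel exon last", ("NE_Last",)),
    ("novel exon first and last", ("NE_FirstLast",)),
    ("IR and ES and Novel exon internal", ("IR", "ES", "NE_Int")),
    ("IR and ES", ("IR", "ES")),
    ("ES and Novel exon internal", ("ES", "NE_Int")),
    ("Novel exon internal", ("NE_Int",)),
    ("IR", ("IR",)),
    ("ES", ("ES",)),
    ("A5, A3", ("A5A3",)),
]


def assign_event_type(row):
    """Return the highest-priority event type for one transcript, or None."""
    for label, columns in PRIORITY:
        if all(int(row[col]) > 0 for col in columns):
            return label
    return None


assigned = {
    r["isoform"]: assign_event_type(r)
    for r in csv.DictReader(io.StringIO(SAMPLE))
}
print(assigned)
```

Under this reading, a transcript with both ES and A5A3 events, such as PB.20818.53, is grouped under ES because ES ranks above A5, A3.
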