Algorithm

The algorithm to construct the SuperTranscript came be conceptualized with the figure below:

Breaking this down into steps: 1) Input a list of contigs/trancsripts and their sequence in a fasta file and a text file with the clustering information for which gene/cluster each transcript belongs to.

For each gene (or a defined cluster):
2) Using BLAT (https://genome.ucsc.edu/FAQ/FAQblat.html) pairwise align each transcript in the cluster to find the regions which overlap.
3) Construct a directed graph, where each node is a base in one of the transcripts and the directed edge retains the ordering of the bases in each transcript. Using the pairwise alignments of all clusters merge shared bases (nodes) together.
4) Simplify graph and remove all cycles (see below) in order to create a Directed Acyclic Graph (DAG), necessary for sorting. 5) Topologically sort the nodes ( each node know is a string of bases from the original unsimplified graph) using Khan's algorithm, which will give a non-unique sorting of the bases.
6) Extract the annotations (both SuperBlock style and transcript style)

Example of a cycle break:

Overview

Home

Algorithm

Annotation and Visualisation

Installation

Usage Documentation

Getting Help

Tutorials

Example: Differential Transcript Usage on a non model organism

Example: Long Read superTranscriptome Construction and Visualisation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Algorithm

Overview

Tutorials

Clone this wiki locally