A Python program to assemble a fragmented sequence using the de Bruijn graph approach.
from deBruign_graph_assembler import *
db=deBruijnGraph(list) #e.g. list=['atgc', 'atgc', 'tcga', 'tgca', 'atcg']
# OR
db=deBruijnGraph()
db.load_seq(sequence, k) #k is an integer for length of k-mers
db.assemble()
The DNA class takes a sequence and breaks it down into k-mers. It can also return the unique k-mers.
dna=DNA(sequence)
all_kmers=dna.all_kmer(k)
#k is an integer for the length of k-mers
unique_kmers=dna.unique_kmers() #all_kmer(k) must be called first
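
The k-mer splitting described above can be sketched in a few lines. This is a minimal standalone illustration, not the DNA class's actual code: the function names `split_kmers` and `dedupe_kmers` are hypothetical, and the sketch assumes k-mers are taken with a sliding window of step 1 and that unique k-mers keep their first-seen order.

```python
def split_kmers(sequence, k):
    """Return every k-mer of `sequence`, left to right, duplicates included."""
    if k <= 0 or k > len(sequence):
        return []
    # Slide a window of width k across the sequence one base at a time.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def dedupe_kmers(kmers):
    """Drop repeated k-mers while keeping first-seen order (an assumption)."""
    return list(dict.fromkeys(kmers))


kmers = split_kmers("ATGCATGC", 4)
print(kmers)                # ['ATGC', 'TGCA', 'GCAT', 'CATG', 'ATGC']
print(dedupe_kmers(kmers))  # ['ATGC', 'TGCA', 'GCAT', 'CATG']
```

A sequence of length n yields n - k + 1 k-mers, so the 8-base example above produces five 4-mers, one of which is a repeat.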
The ShortestCommonSuperstring class builds a directed graph whose edge weights are the overlaps between k-mers. It recursively and greedily merges the pair of k-mers with the maximum overlap, reducing the list until either the shortest common superstring (SCS) is found or no overlaps remain.
scs=ShortestCommonSuperstring(list) #e.g. list=['ATGC', 'TGCC', 'GCCA']
scs is a list object that contains the shortest common superstring. It may contain multiple strings if the program cannot resolve them into one.
obj1=ShortestCommonSuperstring()
Creating the object without k-mers prints the message: Warning! No kmers provided. You can load sequences using load_seq() function.
obj1.load_seq(sequence, k) #k is an integer for length of k-mers
Finding the SCS:
scs=obj1.scs()
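
The greedy merging described for this class can be illustrated as follows. This is a rough sketch under stated assumptions, not the class's actual implementation: `overlap` and `greedy_scs` are hypothetical names, the overlap is taken as the longest suffix of one string that is a prefix of another, and ties are broken by whichever pair is seen first. It uses a loop rather than recursion.

```python
from itertools import permutations


def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_scs(reads):
    """Repeatedly merge the pair with the largest overlap.

    Returns a list: one string if everything merged, several if the
    remaining strings share no overlap.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    while len(reads) > 1:
        best_len, best_a, best_b = 0, None, None
        for a, b in permutations(reads, 2):
            n = overlap(a, b)
            if n > best_len:
                best_len, best_a, best_b = n, a, b
        if best_len == 0:
            break  # no overlaps left, cannot merge further
        reads.remove(best_a)
        reads.remove(best_b)
        reads.append(best_a + best_b[best_len:])
    return reads


print(greedy_scs(["ATGC", "TGCC", "GCCA"]))  # ['ATGCCA']
```

Greedy merging is a heuristic: it always returns a common superstring of the merged reads, but not necessarily the shortest one.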
The deBruijnGraph class builds a de Bruijn graph whose edge weights come from the overlaps between (k-1)-mers. It recursively traverses an Eulerian walk with maximum overlap, reducing the list until either the assembly is found or no overlaps remain.
db=deBruijnGraph(list) #e.g. list=['ATGC', 'TGCC', 'GCCA']
db.kmers is a list object that contains the final assembly. It may contain multiple strings if the program cannot resolve them into one.
obj1=deBruijnGraph()
Creating the object without k-mers prints the message: Warning! No kmers provided. You can load sequences using load_seq() function.
obj1.load_seq(sequence, k) #k is an integer for length of k-mers
Finding the assembly:
assembly=obj1.assemble()
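
The de Bruijn graph idea behind this class can be sketched as below. This is a simplified illustration and not the class's actual code: `build_graph`, `eulerian_path` and `assemble_kmers` are hypothetical names. It assumes each k-mer contributes one edge from its (k-1)-mer prefix to its (k-1)-mer suffix, keeps repeated k-mers as parallel edges instead of edge weights, and walks the edges with Hierholzer's algorithm. It does not check that an Eulerian walk actually exists, so a disconnected graph would yield only a partial assembly.

```python
from collections import defaultdict


def build_graph(kmers):
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it points to."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Return the nodes of a walk that uses each edge once (Hierholzer's algorithm)."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for target in targets:
            in_deg[target] += 1
    # Start where out-degree exceeds in-degree by one, else at any node with edges.
    start = next(iter(graph))
    for node, targets in graph.items():
        if len(targets) - in_deg[node] == 1:
            start = node
            break
    edges = {node: list(targets) for node, targets in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if edges.get(node):
            stack.append(edges[node].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the walk
    return path[::-1]


def assemble_kmers(kmers):
    """Spell the sequence: first node, then the last base of each later node."""
    if not kmers:
        return ""
    path = eulerian_path(build_graph(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble_kmers(["ATGC", "TGCC", "GCCA"]))  # ATGCCA
```

For the example list, the nodes are ATG, TGC, GCC and CCA, and the single walk through all three edges spells ATGCCA.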