a bioinformatic toolkit to align large sets of closely related genomes into a graph data structure
Warning
Pangraph is currently undergoing a major migration from v0 to v1. During this short transition period, links and documentation may be inconsistent.
pangraph provides a command line interface to find homology amongst large collections of closely related genomes. The core of the algorithm partitions each genome into blocks, each representing a sequence interval related by vertical descent. Each genome is then an ordered walk along blocks, and the collection of all genomes forms a graph that captures all observed structural diversity. pangraph is a standalone tool useful to parsimoniously infer horizontal gene transfer events within a community; to perform comparative studies of genome gain, loss, and rearrangement dynamics; or simply to compress many related genomes.
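As a purely illustrative sketch of this representation (not pangraph's actual data model or file format), the toy example below stores each genome as an ordered walk over shared blocks and derives the graph edges from neighbouring blocks. The genome names, block names, and the `walk_edges` helper are hypothetical. Treating "present in every genome" as the definition of a core block is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical toy data: each genome is an ordered walk over block IDs.
# "+" or "-" marks the strand on which the block is traversed.
paths = {
    "genome_1": [("A", "+"), ("B", "+"), ("C", "+")],
    "genome_2": [("A", "+"), ("C", "+")],              # block B is absent
    "genome_3": [("A", "+"), ("B", "-"), ("C", "+")],  # block B is inverted
}


def walk_edges(walk):
    """Return the pairs of adjacent oriented blocks along a linear walk."""
    return list(zip(walk, walk[1:]))


# The graph is the union of the edges observed across all walks;
# each edge remembers which genomes support it.
edges = defaultdict(set)
for genome, walk in paths.items():
    for edge in walk_edges(walk):
        edges[edge].add(genome)

# Simplifying assumption: a block is "core" if every genome contains it.
block_sets = [{block for block, _ in walk} for walk in paths.values()]
core_blocks = set.intersection(*block_sets)
accessory_blocks = set.union(*block_sets) - core_blocks

print(sorted(core_blocks))       # ['A', 'C']
print(sorted(accessory_blocks))  # ['B']
```

In this toy, the deletion in `genome_2` and the inversion in `genome_3` each show up as extra edges around block `B`. This is the sense in which the graph records structural diversity.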
The original version of pangraph (v0) was implemented in Julia and is described in the publication Noll, Molari, Shaw and Neher, 2023. The current version (v1) is a reimplementation of the original algorithm in Rust by Ivan Aksamentov and Marco Molari. The new implementation should be much easier to install and is faster in many use cases.
Pangraph is available:
- as a standalone binary
- as a docker container
For more detailed installation instructions, please refer to the documentation.
Installing the standalone binary is the recommended way to install Pangraph. You can download the latest release for your operating system from here.
PanGraph is available as a Docker container:
docker pull neherlab/pangraph:latest
See the documentation for extended instructions on its usage.
Please refer to the tutorials within the documentation for an in-depth usage guide. For a quick reference, see below.
Align a multi-FASTA file sequences.fa into a graph:
pangraph build sequences.fa -o graph.json
Extract the core-genome alignment from the graph, with blocks appearing in the order of the reference genome NC_010468:
pangraph export core-genome graph.json \
--guide-strain NC_010468 \
-o core_genome_aln.fa
Export the graph in GFA format for visualization:
pangraph export gfa graph.json -o graph.gfa
Reconstruct input sequences from the graph:
pangraph reconstruct graph.json -o sequences.fa
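To script the quick-reference steps above, one option is a small Python wrapper around the command line. The sketch below uses only the subcommands and flags shown above and assumes `pangraph` is on the `PATH`. The helpers `quick_reference_commands` and `run_all` are hypothetical names. The reconstructed sequences are written to `reconstructed.fa` (an assumed file name) so that the input file is not overwritten.

```python
import shlex
import subprocess


def quick_reference_commands(fasta, guide_strain, prefix="graph"):
    """Build the argument lists for the four quick-reference steps (hypothetical helper)."""
    graph = f"{prefix}.json"
    return [
        ["pangraph", "build", fasta, "-o", graph],
        ["pangraph", "export", "core-genome", graph,
         "--guide-strain", guide_strain, "-o", "core_genome_aln.fa"],
        ["pangraph", "export", "gfa", graph, "-o", f"{prefix}.gfa"],
        # Assumed output name, chosen to avoid overwriting the input FASTA.
        ["pangraph", "reconstruct", graph, "-o", "reconstructed.fa"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Print the commands for inspection; nothing is executed here.
commands = quick_reference_commands("sequences.fa", "NC_010468")
for argv in commands:
    print(shlex.join(argv))
```

Calling `run_all(commands)` would then execute the steps in sequence. Passing argument lists rather than a shell string avoids quoting problems with unusual file names.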
PyPangraph is a Python package with convenient utilities to load and explore the graph data structure. See the documentation for installation instructions and more examples.
import pypangraph as pp
graph = pp.Pangraph.load_graph("graph.json")
print(graph)
# pangraph object with 15 paths, 137 blocks and 1042 nodes
If you use PanGraph in scientific publications, please cite the original paper presenting the algorithm:
PanGraph: scalable bacterial pan-genome graph construction. Nicholas Noll, Marco Molari, Liam P. Shaw, Richard A. Neher. Microbial Genomics, 9(6), 2023; doi: 10.1099/mgen.0.001034