PHI (Pangenome-based Haplotype Inference)

Getting Started

Prerequisites

Before using PHI, make sure that Miniforge is installed (see the Miniforge Installation Guide). Miniforge is used to install several dependencies, such as VG and samtools. PHI also requires a Gurobi license; a free academic license is available here. Download the gurobi.lic file and save it in your home directory.
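Because a missing license or missing tools only show up once PHI runs, a quick preflight check can save time. The sketch below is a hypothetical helper, not part of PHI. It assumes the license is either named by the `GRB_LICENSE_FILE` environment variable (which Gurobi uses to locate a license file) or saved as `~/gurobi.lic` as recommended above. The tool check is only meaningful after `./Installdeps` has run and `extra/bin` is on `PATH`.

```python
import os
import shutil
from pathlib import Path


def find_gurobi_license(home=None, env=None):
    """Return the path of a Gurobi license file, or None if none is found.

    Assumption: the license is named by GRB_LICENSE_FILE or stored as
    gurobi.lic in the home directory.
    """
    env = os.environ if env is None else env
    candidates = []
    if env.get("GRB_LICENSE_FILE"):
        candidates.append(Path(env["GRB_LICENSE_FILE"]))
    candidates.append(Path(home or Path.home()) / "gurobi.lic")
    for path in candidates:
        if path.is_file():
            return path
    return None


def missing_tools(tools=("vg", "samtools")):
    """Return the command-line tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    lic = find_gurobi_license()
    print("Gurobi license:", lic if lic else "NOT FOUND")
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")
```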

Get PHI

git clone https://github.com/at-cg/PHI
cd PHI
# Install dependencies (Miniforge is required)
./Installdeps
export PATH="$(pwd)/extra/bin:$PATH"
export LD_LIBRARY_PATH="$(pwd)/extra/lib:$LD_LIBRARY_PATH"
make

# test run 
./PHI -t32 -g test/MHC_4.gfa.gz -r test/CHM13_reads.fq.gz -o CHM13.fa

# test run with VCF file as input
./vcf2gfa.py -v test/MHC_4.vcf.gz -r test/MHC-CHM13.0.fa.gz | bgzip > test/MHC_4_vcf.gfa.gz
./PHI -t32 -g test/MHC_4_vcf.gfa.gz -r test/CHM13_reads.fq.gz -o CHM13.fa
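The repository's vcf2gfa.py performs the VCF-to-GFA conversion used above; its internals are not described here. As a hedged illustration of the general idea only, the toy function below (the name `vcf_records_to_gfa` and its input layout are hypothetical) turns a reference plus already-parsed, phased, non-overlapping variants into GFA v1.1 lines: shared reference stretches become single segments, each variant becomes a bubble with one segment per allele, and each haplotype becomes a P line through its alleles. The real script's output may differ in layout and details.

```python
def vcf_records_to_gfa(ref, variants, haplotypes):
    """Build GFA v1.1 lines from a reference and phased, non-overlapping variants.

    ref:        reference sequence (str)
    variants:   sorted list of (pos, ref_allele, alt_alleles), pos 0-based
    haplotypes: dict mapping a haplotype name to one allele index per variant
                (0 = reference allele, 1.. = alternative alleles)
    """
    lines = ["H\tVN:Z:1.1"]
    segments = []
    columns = []  # (is_variant, [segment ids]) in reference order

    def add_segment(seq):
        segments.append(seq)
        lines.append(f"S\t{len(segments)}\t{seq}")
        return len(segments)

    cursor = 0
    for pos, ref_allele, alts in variants:
        if pos < cursor or ref[pos:pos + len(ref_allele)] != ref_allele:
            raise ValueError(f"overlapping or mismatched variant at {pos}")
        if pos > cursor:  # shared reference sequence before the variant
            columns.append((False, [add_segment(ref[cursor:pos])]))
        # One segment per allele forms a bubble.
        columns.append((True, [add_segment(a) for a in [ref_allele, *alts]]))
        cursor = pos + len(ref_allele)
    if cursor < len(ref):
        columns.append((False, [add_segment(ref[cursor:])]))

    # Link every segment of a column to every segment of the next column.
    for (_, left), (_, right) in zip(columns, columns[1:]):
        for a in left:
            for b in right:
                lines.append(f"L\t{a}\t+\t{b}\t+\t0M")

    # One P line per haplotype, taking its allele in each bubble.
    for name, alleles in haplotypes.items():
        chosen = iter(alleles)
        steps = [ids[next(chosen)] if is_var else ids[0] for is_var, ids in columns]
        path = ",".join(f"{s}+" for s in steps)
        lines.append(f"P\t{name}\t{path}\t*")
    return lines
```

For example, `print("\n".join(vcf_records_to_gfa("ACGTACGT", [(2, "G", ["T"])], {"h1": [0], "h2": [1]})))` yields four segments, four links, and two paths, one through each allele of the SNP.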

Adding Binary and Library Paths to `.bashrc`

To load the extra/bin and extra/lib directories automatically in every new terminal session, append them to your ~/.bashrc. Run the following commands from the PHI directory so that the recorded paths point to your PHI installation.

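# Add extra/bin and extra/lib to .bashrc (run these lines from the PHI directory).
# Double quotes expand $(pwd) now, so .bashrc records the absolute PHI path;
# the escaped \$PATH and \$LD_LIBRARY_PATH are expanded later, when .bashrc runs.
echo "export PATH=\"$(pwd)/extra/bin:\$PATH\"" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=\"$(pwd)/extra/lib:\$LD_LIBRARY_PATH\"" >> ~/.bashrc
source ~/.bashrc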

Introduction

PHI is a pangenome-based genotyping method. It estimates the complete haplotype sequence of a haploid genome from low-coverage sequencing data (short or long reads). Users must provide the pangenome reference in one of two formats:

Graphical Fragment Assembly format (GFA v1.1): a sequence-graph representation of the pangenome. The graph should be acyclic.
Variant Call Format (VCF): a set of phased, multi-sample, multi-allelic variants, together with the reference genome.
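Since a GFA input should be acyclic, it can help to check a graph before running PHI. The sketch below is a hypothetical helper, not part of PHI. It assumes "acyclic" means no directed cycle among oriented segments: each L line `a oa b ob` is added as an edge from `(a, oa)` to `(b, ob)` plus its reverse-strand complement, and Kahn's topological sort then reports whether every oriented segment can be ordered. PHI's exact requirement may differ.

```python
from collections import defaultdict, deque

FLIP = {"+": "-", "-": "+"}


def gfa_is_acyclic(gfa_lines):
    """Return True if the oriented-segment graph of a GFA has no directed cycle."""
    segments = set()
    edges = defaultdict(set)
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments.add(fields[1])
        elif fields[0] == "L":
            a, oa, b, ob = fields[1:5]
            edges[(a, oa)].add((b, ob))
            # The same link can also be traversed on the reverse strand.
            edges[(b, FLIP[ob])].add((a, FLIP[oa]))

    nodes = {(s, o) for s in segments for o in "+-"}
    for src, dsts in edges.items():
        nodes.add(src)
        nodes.update(dsts)

    # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    indegree = {n: 0 for n in nodes}
    for dsts in edges.values():
        for d in dsts:
            indegree[d] += 1
    queue = deque(n for n, deg in indegree.items() if deg == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for d in edges.get(n, ()):
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    # Any node left unvisited lies on (or behind) a cycle.
    return visited == len(nodes)
```

For a gzipped graph such as test/MHC_4.gfa.gz, the lines can be passed directly from `gzip.open(path, "rt")`.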

The output of PHI is the haplotype sequence (in FASTA format) corresponding to the optimal path inferred from the graph. PHI identifies a path in the pangenome graph that maximizes the matches between the path's k-mers and the read k-mers while minimizing recombination events (haplotype switches) along the path. We formulate this problem as an integer program and solve it to optimality with the Gurobi optimizer. Details of the formulations are described in our paper.
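The trade-off between k-mer matches and haplotype switches can be illustrated with a toy dynamic program. This is not PHI's integer program; it is a simplified, hypothetical stand-in that assumes every haplotype is cut into the same number of aligned segments, scores a segment by how many of its distinct k-mers occur in the reads (ignoring k-mers that span segment boundaries), and combines the two goals as a weighted difference with a fixed `switch_penalty` per haplotype switch. The names `kmers`, `best_mosaic`, and `switch_penalty` are illustrative.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_mosaic(haplotypes, reads, k, switch_penalty):
    """Return (score, haplotype chosen per segment) maximizing
    k-mer matches minus switch_penalty * number of haplotype switches.

    haplotypes: dict name -> list of segment sequences (same length for all)
    """
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    names = sorted(haplotypes)
    n_segments = len(haplotypes[names[0]])

    def gain(h, j):
        # Number of distinct k-mers of segment j of haplotype h seen in the reads.
        return len(kmers(haplotypes[h][j], k) & read_kmers)

    score = {h: gain(h, 0) for h in names}
    backpointers = []
    for j in range(1, n_segments):
        new_score, pointer = {}, {}
        for h in names:
            def arrive(g):
                # Staying on h is free; switching from another haplotype costs.
                return score[g] - (0 if g == h else switch_penalty)
            prev = max(names, key=arrive)
            new_score[h] = arrive(prev) + gain(h, j)
            pointer[h] = prev
        score = new_score
        backpointers.append(pointer)

    # Trace the best path back from the highest-scoring final haplotype.
    last = max(names, key=score.get)
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return score[last], path
```

With a small penalty the toy switches haplotypes to pick up extra read support; with a large penalty it stays on a single haplotype even if that loses some matches.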

Results

We benchmarked PHI (v1.0) on short-read datasets sampled from the MHC sequences of five haplotypes (APD, DBB, MANN, QBL, and SSTO), generated by Houwaart et al. (2022). These datasets were downsampled to coverages ranging from 0.1x to 10x. We built a pangenome graph comprising 49 complete MHC sequences using Minigraph-Cactus. To assess the accuracy of PHI, we computed the edit distance between each inferred haplotype sequence and the corresponding MHC sequence from Houwaart et al., which had been determined by de novo assembly and curation.

Edit distance between ground-truth haplotype sequences and the sequences estimated by different tools (PHI, VG, and PanGenie). Lower edit distance implies higher accuracy. PHI shows an advantage over existing methods on low-coverage inputs.
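Edit distance here is the minimum number of single-base substitutions, insertions, and deletions needed to turn one sequence into the other. The README does not say which tool computed these distances, so the sketch below only illustrates the metric with the textbook Levenshtein dynamic program (the function name `edit_distance` is illustrative). Its running time is quadratic in sequence length, so megabase-scale MHC haplotypes would need a specialized aligner in practice.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]
```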

In PHI, we have implemented two integer programs (referred to as ILP and IQP). Both solve the same problem but differ in runtime and memory usage: IQP is generally faster but requires more memory. Users can select between them with a command-line argument (see ./PHI -h).

Performance comparison between ILP and IQP.

The scripts to reproduce the results are available here.

Future Work

Add support for diploid genome estimation.
Scale to pangenome graphs containing a larger number of genomes.

Publications

Ghanshyam Chandra, Md Helal Hossen, Stephan Scholz, Alexander T Dilthey, Daniel Gibney and Chirag Jain. "Integer programming framework for pangenome-based genome inference". RECOMB 2025.
