Pangenome Heritability Tool

A Python tool for processing pangenome structural variants and generating PLINK format files. This tool helps analyze structural variants in pangenomes by performing sequence alignment, k-mer window analysis, and converting results to VCF format.

Features

Process VCF files containing structural variants
Align variant sequences using MUSCLE
K-mer based variant quantification
Generate VCF format files (ped/map/bfile)
Parallel processing support
Progress tracking and logging

Quick Start

Installation with Conda (Recommended)

# Create environment
conda create -n panherit python=3.8
conda activate panherit

# Install panherit
git clone https://github.com/PeixiongYuan/pangenome_heritability.git
cd pangenome_heritability
pip install .

# Install MUSCLE
mkdir -p ~/local/bin
cd ~/local/bin
wget https://github.com/rcedgar/muscle/releases/download/v5.3/muscle-linux-x86.v5.3


chmod +x muscle-linux-x86.v5.3
mv muscle-linux-x86.v5.3 muscle

echo 'export PATH="$HOME/local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

# Install MAFFT
conda install conda-forge::mafft

Usage Guide

The Pangenome Heritability Tool provides several commands to process variants, perform alignments, and generate PLINK files. Each step can be run independently or as part of a pipeline.

Command Overview

The tool provides four main commands:

process-vcf: Process VCF file and group overlapping variants
run-alignments: Perform sequence alignments using MUSCLE
process-kmers: Generate and analyze k-mer windows
convert-to-vcf: Convert results to VCF format
run-all: Run the entire workflow in one command

Note

Important: The VCF and reference FASTA files must use numeric chromosome identifiers (e.g., 1, 2, 3 for chromosomes) without additional prefixes or suffixes. Ensure your files adhere to this convention to avoid processing errors.

Example of a VCF File Header:

##source=YourTool
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       12345   rs123   A       T       50      PASS    .
2       67890   rs456   G       C       99      PASS    .

Example of a FASTA File:

>1
AGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG
>2
TGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATG

Quickly Usage

panherit run-all \
    --vcf input.vcf \
    --ref reference.fasta \
    --out output_directory \
    --window-size 4 \
    --threads 4

Options:

--vcf: Input VCF file containing structural variants
--ref: Reference genome FASTA file
--out: Output directory for processed variants and FASTA files
--window-size: Size of k-mer windows (default: 4)
--threads: Number of parallel threads (default: 1)

Detailed Usage

Step 1: Process VCF File

# Process VCF and generate FASTA sequences
panherit process-vcf \
    --vcf input.vcf \
    --ref reference.fasta \
    --out output_directory

Options:

--vcf: Input VCF file containing structural variants
--ref: Reference genome FASTA file
--out: Output directory for processed variants and FASTA files

Step 2: Run Alignments

# Perform sequence alignments
panherit run-alignments \
    --grouped-variants output_directory/variants.fasta \
    --ref reference.fasta \
    --out alignments_directory \
    --threads 4

Options:

--grouped-variants: FASTA file from previous step
--ref: Reference genome FASTA file
--out: Output directory for alignments
--threads: Number of parallel threads (default: 1)

Step 3: Process K-mer Windows

# Process k-mer windows
panherit process-kmers \
    --alignments temp_alignments \
    --grouped-variants output_directory/variants.fasta \
    --window-size 4 \
    --out kmers_directory

Options:

--alignments: Directory containing alignment results
--grouped-variants: FASTA file from previous step
--window-size: Size of k-mer windows (default: 4)
--out: Output directory for k-mer results

Step 4: Convert to VCF

# Generate VCF files
panherit convert-to-vcf \
    --csv-file kmers_directory/output_final_results.csv \
    --vcf-file kmers_directory/test.vcf.gz \
    --grouped-variants output_directory/variants.fasta \ 
    --out vcf_files

Options:

--csv-file: CSV file from previous step
--vcf-file: VCF file from previous step
--grouped-variants: FASTA file from previous step
--out: Output directory for VCF files

Output Files

Each step produces specific output files:

Process VCF

output_directory/
├── variants.fasta      # Grouped variant sequences
└── variants.log       # Processing log file

Alignments

/path/to/output/
├── temp_alignments/
│   ├── Group_2_59_input.fasta
│   ├── Group_2_59_aligned.fasta
├── error_logs/
│   ├── Group_2_59_input_error.log

K-mer Windows

kmers_directory/
├── windows.csv       # K-mer window analysis
└── comparison.log    # Processing log file

VCF Files

vcfiles/
├── pangenome.rsv.vcf
└──  pangenome.sv.vcf.gz

Notes

Each command will create its output directory if it doesn't exist
Log files are generated for each step
Use --help with any command for detailed options
For large datasets, adjust thread count based on available resources

Error Handling

If any step fails, the tool will:

Display an error message
Log the error details
Exit with a non-zero status code

Example error checking:

panherit process-vcf --vcf input.vcf --ref ref.fa --out output || {
    echo "VCF processing failed"
    exit 1
}

Documentation

Requirements

Python 3.8+
MUSCLE 5
MAFFT V7.526
External dependencies:
- pandas
- numpy
- biopython
- click
- tqdm

Performance Tips

Adjust thread count based on available CPU cores
Ensure sufficient disk space for temporary files

Troubleshooting

Common issues and solutions are documented in our troubleshooting guide.

Contributing

Contributions are welcome! Please read our contributing guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this tool in your research, please cite:

[Citation information to be added]

Contact

For questions and support:

Open an issue on GitHub
Email: [email protected]

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
docs		docs
pangenome_heritability		pangenome_heritability
test		test
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Pangenome Heritability Tool

Features

Quick Start

Installation with Conda (Recommended)

Usage Guide

Command Overview

Note

Quickly Usage

Detailed Usage

Step 1: Process VCF File

Step 2: Run Alignments

Step 3: Process K-mer Windows

Step 4: Convert to VCF

Output Files

Process VCF

Alignments

K-mer Windows

VCF Files

Notes

Error Handling

Documentation

Requirements

Performance Tips

Troubleshooting

Contributing

License

Citation

Contact

About

Releases

Packages

Languages

PeixiongYuan/pangenome_heritability

Folders and files

Latest commit

History

Repository files navigation

Pangenome Heritability Tool

Features

Quick Start

Installation with Conda (Recommended)

Usage Guide

Command Overview

Note

Quickly Usage

Detailed Usage

Step 1: Process VCF File

Step 2: Run Alignments

Step 3: Process K-mer Windows

Step 4: Convert to VCF

Output Files

Process VCF

Alignments

K-mer Windows

VCF Files

Notes

Error Handling

Documentation

Requirements

Performance Tips

Troubleshooting

Contributing

License

Citation

Contact

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages