panherit --vcf input.vcf --ref reference.fasta --out output_dir
--vcf: Input VCF file containing structural variants
--ref: Reference genome FASTA file
--out: Output directory
--threads: Number of parallel processing threads (default: 1)
--window-size: k-mer window size (default: 4)
output_dir/
├── temp_alignments/ # MUSCLE alignment temporary files
├── variants.bed # PLINK binary genotype file
├── variants.bim # PLINK variant information file
├── variants.fam # PLINK sample information file
└── pipeline.log # Runtime log
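As a quick sanity check on the three PLINK files, the sketch below compares the size of variants.bed with the line counts of variants.bim and variants.fam. It relies only on the standard PLINK 1 binary layout (a 3-byte header, then 2 bits per sample for each variant, padded to whole bytes) and assumes panherit writes the default variant-major form. The function names are illustrative and not part of panherit.

```python
from pathlib import Path

# Header of a variant-major PLINK 1 .bed file.
PLINK_BED_MAGIC = bytes([0x6C, 0x1B, 0x01])


def count_lines(path):
    """Count non-empty lines in a text file."""
    with open(path) as handle:
        return sum(1 for line in handle if line.strip())


def check_plink_fileset(prefix):
    """Return (n_variants, n_samples) if .bed, .bim and .fam are consistent.

    `prefix` is the path without extension, e.g. "results/variants".
    Raises ValueError when the .bed header or size does not match.
    """
    n_variants = count_lines(f"{prefix}.bim")  # one line per variant
    n_samples = count_lines(f"{prefix}.fam")   # one line per sample

    bed = Path(f"{prefix}.bed")
    with open(bed, "rb") as handle:
        magic = handle.read(3)
    if magic != PLINK_BED_MAGIC:
        raise ValueError(f"{bed} does not start with the PLINK .bed header")

    # Each variant stores 2 bits per sample, rounded up to whole bytes.
    bytes_per_variant = (n_samples + 3) // 4
    expected = 3 + n_variants * bytes_per_variant
    actual = bed.stat().st_size
    if actual != expected:
        raise ValueError(f"{bed} is {actual} bytes, expected {expected}")
    return n_variants, n_samples
```

For the layout above, the call would be check_plink_fileset("output_dir/variants").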
Run with sample data:
panherit \
--vcf examples/sample.vcf.gz \
--ref examples/reference.fa \
--out results \
--threads 4
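The same invocation can be driven from a Python script. The sketch below only assembles the options documented above and hands them to subprocess; the helper names are hypothetical, and it assumes panherit exits with a non-zero status on failure, which this README does not state.

```python
import subprocess


def build_panherit_command(vcf, ref, out, threads=1, window_size=4):
    """Assemble the panherit argument list from the documented options."""
    return [
        "panherit",
        "--vcf", str(vcf),
        "--ref", str(ref),
        "--out", str(out),
        "--threads", str(threads),
        "--window-size", str(window_size),
    ]


def run_panherit(vcf, ref, out, threads=1, window_size=4):
    """Run panherit; raises CalledProcessError on a non-zero exit status."""
    command = build_panherit_command(vcf, ref, out, threads, window_size)
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.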
- Input File Requirements:
  - VCF file must contain structural variant information
  - VCF file should be bgzipped and indexed
  - Reference genome must match VCF coordinate system
- Memory Usage:
  - Memory consumption scales with input data size and window size
  - For large datasets, consider increasing available memory
- Temporary Files:
  - Temporary files are generated in the output directory during processing
  - They are cleaned up automatically after successful completion
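One part of the "reference must match the VCF coordinate system" requirement can be checked before a run: every contig named in the VCF should also appear in the reference FASTA. The sketch below is a minimal standard-library check with illustrative function names; it compares names only and does not verify sequence lengths or genome build.

```python
import gzip


def open_text(path):
    """Open a plain or gzip/bgzip-compressed text file for reading."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_contigs(fasta_path):
    """Collect sequence names (first word after '>') from a FASTA file."""
    names = set()
    with open_text(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def vcf_contigs(vcf_path):
    """Collect the CHROM values used by the records of a VCF file."""
    names = set()
    with open_text(vcf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_contigs(vcf_path, fasta_path):
    """Return VCF contigs that are absent from the reference, sorted."""
    return sorted(vcf_contigs(vcf_path) - fasta_contigs(fasta_path))
```

A non-empty result, for example a VCF that uses "chr1" against a reference that uses "1", usually points to a naming or build mismatch.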
Error message:
Command 'muscle5' not found
Solution:
- Verify that MUSCLE is installed
- Ensure the muscle5 executable is on the system PATH
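To confirm the PATH lookup from the same environment that runs the pipeline, a short check with the standard library is enough. This sketch uses shutil.which; the helper name is illustrative.

```python
import shutil


def require_executable(name="muscle5"):
    """Return the full path of `name` on PATH, or raise if it is missing."""
    location = shutil.which(name)
    if location is None:
        raise RuntimeError(
            f"'{name}' was not found on PATH; install it or add its "
            "directory to PATH before running the pipeline"
        )
    return location
```

Calling require_executable() returns the resolved path when muscle5 is visible, which also shows which installation would be picked up if several exist.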
Error message:
MemoryError: Unable to allocate array
Solution:
- Reduce number of parallel threads
- Increase system available memory
- Consider batch processing for large datasets
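One possible form of batch processing is to split the input by contig and run each piece separately. The sketch below writes one uncompressed VCF per contig with the original header. It is an assumption that panherit can process contigs independently and that the per-contig PLINK outputs can be merged afterwards; this README does not describe either, and the function name is hypothetical.

```python
import gzip
from pathlib import Path


def split_vcf_by_contig(vcf_path, out_dir):
    """Write one uncompressed VCF per contig; return {contig: output path}.

    Assumes contig names are valid file names. One output file stays open
    per contig, so inputs with thousands of contigs may hit OS limits.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    opener = gzip.open if str(vcf_path).endswith(".gz") else open

    header = []   # all '#' lines, copied into every output file
    handles = {}  # contig -> open output file
    paths = {}    # contig -> output path
    try:
        with opener(vcf_path, "rt") as src:
            for line in src:
                if line.startswith("#"):
                    header.append(line)
                    continue
                if not line.strip():
                    continue
                contig = line.split("\t", 1)[0]
                if contig not in handles:
                    paths[contig] = out_dir / f"{contig}.vcf"
                    handles[contig] = open(paths[contig], "w")
                    handles[contig].writelines(header)
                handles[contig].write(line)
    finally:
        for handle in handles.values():
            handle.close()
    return paths
```

Each output would still need to be bgzipped and indexed to meet the input requirements before being passed to --vcf.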
Error message:
InputError: Failed to open VCF file
Solution:
- Verify that the file is in valid VCF format
- Check that the file is bgzip-compressed and has an index
- Validate VCF file integrity with bcftools
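The compression and indexing checks can be scripted without external tools. The sketch below inspects the first bytes of the file for the BGZF header that bgzip writes and looks for a .tbi or .csi index next to it. The function names are illustrative, and the check assumes the BGZF "BC" subfield is the first gzip extra field, which is how bgzip lays it out.

```python
import os


def is_bgzf(path):
    """Return True if the file starts with a BGZF (bgzip) block header."""
    with open(path, "rb") as handle:
        head = handle.read(14)
    return (
        len(head) == 14
        and head[:4] == b"\x1f\x8b\x08\x04"  # gzip magic, deflate, FEXTRA set
        and head[12:14] == b"BC"             # BGZF extra subfield identifier
    )


def find_index(path):
    """Return the path of a tabix (.tbi) or CSI (.csi) index, or None."""
    path = os.fspath(path)
    for suffix in (".tbi", ".csi"):
        candidate = path + suffix
        if os.path.exists(candidate):
            return candidate
    return None
```

A file compressed with plain gzip fails is_bgzf even though it ends in .gz, which is a common cause of indexing errors.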
from pangenome_heritability.config import Config
from pangenome_heritability.variant_processing import process_variants
from pangenome_heritability.alignment import run_alignments
from pangenome_heritability.kmer import process_windows
from pangenome_heritability.genotype import convert_to_plink
# Configure parameters
config = Config(
    vcf_file="input.vcf",
    ref_fasta="reference.fa",
    output_dir="results",
    threads=4,
    window_size=4,
)
# Run pipeline
variants = process_variants(config)
alignments = run_alignments(config, variants)
windows = process_windows(config, alignments)
bfiles = convert_to_plink(config, windows)
- Parallel Processing:
  - Increase the thread count (--threads) to improve processing speed
  - Do not exceed the number of CPU cores
- Memory Management:
  - Window size affects memory usage; adjust it as needed
  - Consider batch processing for large datasets
- Disk Space:
  - Ensure sufficient space in the output directory
  - Temporary files may require significant space
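The thread and disk-space recommendations can be checked before launching a run. This sketch caps a requested thread count at the number of CPUs reported by the operating system and reports free space for a directory; the helper names are illustrative, and how much space a given dataset needs is not something this README specifies.

```python
import os
import shutil


def pick_thread_count(requested):
    """Clamp `requested` to the range 1..number of CPUs reported by the OS."""
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available))


def free_gib(path="."):
    """Return the free disk space, in GiB, on the volume holding `path`."""
    return shutil.disk_usage(path).free / 1024**3
```

The value from pick_thread_count can be passed to --threads, and free_gib should be called on an existing directory such as the parent of the planned output directory.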
If you use this tool in your research, please cite:
[Citation information to be added]
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please read our contributing guidelines before submitting pull requests.
For bug reports and feature requests, please use the GitHub issue tracker.
For questions and discussions:
- Open an issue on GitHub
- Contact: [contact information]