VNtyper 2.0 is an advanced pipeline designed to genotype MUC1 coding Variable Number Tandem Repeats (VNTR) in Autosomal Dominant Tubulointerstitial Kidney Disease (ADTKD-MUC1) using Short-Read Sequencing (SRS) data. This version integrates enhanced variant calling algorithms, robust logging mechanisms, and streamlined installation processes to provide researchers with a powerful tool for VNTR analysis.
We provide a web server that offers free access to VNtyper and runs the pipeline in the background for ease of use.
Access it through the following link: vntyper-online
- Features
- Installation
- Usage
- Pipeline Overview
- Dependencies
- Linting and Code Formatting
- Pipeline Logic Diagram
- Results
- Notes
- Citations
- Contributing
- License
- Contact
- Variant Calling Algorithms:
  - Kestrel: Mapping-free genotyping using k-mer frequencies.
  - code-adVNTR (optional): Profile-HMM-based method for VNTR genotyping.
  - SHARK (optional, FASTQ-only): Rapid filtering and read extraction for MUC1 region in exome/whole-genome data.
- Comprehensive Logging:
  - Logs both to the console and a dedicated log file.
  - Generates MD5 checksums for all downloaded and processed files.
- Flexible Installation:
  - Supports installation via pip using setup.py.
  - Provides Conda environment setup for easy dependency management.
- Subcommands:
  - install-references
  - pipeline
  - fastq
  - bam
  - kestrel
  - report
  - cohort
  - online
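The logging and checksum features listed above can be illustrated with a minimal Python sketch. This is not VNtyper's own implementation: the logger name `vntyper_sketch` and the function names `setup_logging` and `md5_checksum` are hypothetical, chosen only to show the general pattern of logging to two destinations and hashing a file in chunks.

```python
import hashlib
import logging
import sys


def setup_logging(log_file: str) -> logging.Logger:
    """Return a logger that writes to both the console and log_file."""
    logger = logging.getLogger("vntyper_sketch")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (logging.StreamHandler(sys.stderr), logging.FileHandler(log_file)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large BAM/FASTQ files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use flat regardless of file size, which matters for sequencing files that can reach many gigabytes.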
VNtyper 2.0 can be installed using either pip with setup.py or via Conda environments for streamlined dependency management.
- Clone the Repository:
  mkdir vntyper
  git clone https://github.com/hassansaei/vntyper.git
  cd vntyper
  pip install .
VNtyper 2.0 offers multiple subcommands that can be used depending on your input data and requirements. Below are the main subcommands available:
To run the entire pipeline using a BAM file:
vntyper --config-path /path/to/config.json pipeline \
--bam /path/to/sample.bam \
--output-dir /path/to/output/dir \
--threads 4 --fast-mode
Alternatively, using paired-end FASTQ files:
vntyper --config-path /path/to/config.json pipeline \
--fastq1 /path/to/sample_R1.fastq.gz \
--fastq2 /path/to/sample_R2.fastq.gz \
--output-dir /path/to/output/dir \
--threads 4 --fast-mode
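When the pipeline is launched from a script rather than typed by hand, the two invocations above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this README; the function name `build_pipeline_command` is hypothetical, and the rule that a BAM file and a FASTQ pair are mutually exclusive is an assumption drawn from the way the two examples are presented as alternatives.

```python
from typing import List, Optional


def build_pipeline_command(
    config_path: str,
    output_dir: str,
    bam: Optional[str] = None,
    fastq1: Optional[str] = None,
    fastq2: Optional[str] = None,
    threads: int = 4,
    fast_mode: bool = False,
) -> List[str]:
    """Assemble the argument list for `vntyper pipeline` (illustrative only)."""
    fastq_given = [f for f in (fastq1, fastq2) if f is not None]
    # Assumption: BAM input and FASTQ input are alternatives, never combined.
    if bam is not None and fastq_given:
        raise ValueError("use either --bam or --fastq1/--fastq2, not both")
    if bam is None and len(fastq_given) != 2:
        raise ValueError("provide --bam or both --fastq1 and --fastq2")

    cmd = ["vntyper", "--config-path", config_path, "pipeline"]
    if bam is not None:
        cmd += ["--bam", bam]
    else:
        cmd += ["--fastq1", fastq1, "--fastq2", fastq2]
    cmd += ["--output-dir", output_dir, "--threads", str(threads)]
    if fast_mode:
        cmd.append("--fast-mode")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a single string avoids shell-quoting problems with paths that contain spaces.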
The adVNTR genotyping is optional and skipped by default. To enable adVNTR genotyping, use the --extra-modules advntr option.
New: To enable SHARK filtering on FASTQ reads before the usual QC and alignment (for improved MUC1 detection), add shark to the --extra-modules flag:
vntyper --config-path /path/to/config.json pipeline \
--fastq1 /path/to/sample_R1.fastq.gz \
--fastq2 /path/to/sample_R2.fastq.gz \
--extra-modules shark \
--threads 4 \
--output-dir /path/to/output/dir
- SHARK will run first on the raw FASTQ files to extract and filter reads covering the MUC1 VNTR region.
- Important: SHARK is only supported in FASTQ mode. If you try to use --extra-modules shark together with --bam or --cram, VNtyper will exit gracefully with a warning.
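The FASTQ-only restriction described above amounts to a simple argument check before any work starts. The sketch below shows one way such a check could look; it is not taken from the VNtyper source, and the function name `check_shark_compatibility` and its parameters are hypothetical.

```python
from typing import Iterable, Optional


def check_shark_compatibility(
    extra_modules: Iterable[str],
    bam: Optional[str] = None,
    cram: Optional[str] = None,
) -> bool:
    """Return False when SHARK is requested together with BAM or CRAM input."""
    shark_requested = "shark" in set(extra_modules)
    alignment_input = bam is not None or cram is not None
    return not (shark_requested and alignment_input)
```

A caller could log a warning and stop with `sys.exit(0)` when the function returns `False`, which would match the "exit gracefully with a warning" behaviour described above; the actual exit code VNtyper uses is not stated in this README.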
A Docker image for VNtyper 2.0 is provided and can be pulled and used as follows:
# pull the docker image
docker pull saei/vntyper:main
# run the pipeline using the docker image
docker run -w /opt/vntyper --rm \
-v /local/input/folder/:/opt/vntyper/input \
-v /local/output/folder/:/opt/vntyper/output \
saei/vntyper:main \
vntyper pipeline \
--bam /opt/vntyper/input/filename.bam \
-o /opt/vntyper/output/filename/
Important Host Volume Permissions Note:
When mounting host directories into the container (using the -v flag), please ensure that the host directories (e.g., /local/input/folder/ and /local/output/folder/) have the appropriate permissions so that they are writable by the container's non-root user.
Why Non-Root?
VNtyper runs as a non-root user for enhanced security and to avoid file ownership issues on your host. Running as root may create files owned by root, leading to permission problems later.
There are two ways to ensure proper permissions:
- Adjust Host Directory Permissions: Change the ownership/permissions on the host directories so that the UID and GID match those expected by VNtyper in the container.
- Use the --user Flag: Run the container with the --user flag to specify your current user's UID and GID. For example:
docker run --user $(id -u):$(id -g) -w /opt/vntyper --rm \
-v /local/input/folder/:/opt/vntyper/input \
-v /local/output/folder/:/opt/vntyper/output \
saei/vntyper:main \
vntyper pipeline \
--bam /opt/vntyper/input/filename.bam \
-o /opt/vntyper/output/filename/
Using either method ensures VNtyper can write its log files (e.g., pipeline.log) and other outputs without encountering permission errors.
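Before starting the container, it can help to confirm on the host that the mounted directories are usable by the user passed to --user. The following is a small sketch for POSIX hosts; the helper names `docker_user_args` and `unwritable_dirs` are hypothetical and are not part of VNtyper or Docker.

```python
import os
from typing import List


def docker_user_args() -> List[str]:
    """Return the `--user UID:GID` arguments for the current host user (POSIX only)."""
    return ["--user", f"{os.getuid()}:{os.getgid()}"]


def unwritable_dirs(paths: List[str]) -> List[str]:
    """Return the paths that are missing or not writable by the current user."""
    return [
        p for p in paths
        # A directory needs both write and execute permission to create files in it.
        if not (os.path.isdir(p) and os.access(p, os.W_OK | os.X_OK))
    ]
```

Calling `unwritable_dirs(["/local/output/folder/"])` before `docker run` surfaces a permission problem early, instead of partway through a pipeline run. Note that `os.access` reports the permissions of the user running the check, so it is only meaningful when the same UID and GID are passed to the container.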
An Apptainer image can be generated from the Docker image as follows:
# create the apptainer sif image
apptainer pull docker://saei/vntyper:main
# run the pipeline using the apptainer image
apptainer run --pwd /opt/vntyper \
-B /local/input/folder/:/opt/vntyper/input \
-B /local/output/folder/:/opt/vntyper/output \
vntyper_main.sif vntyper pipeline \
--bam /opt/vntyper/input/filename.bam \
-o /opt/vntyper/output/filename/
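To install the reference files with the install-references subcommand:
vntyper --config-path /path/to/config.json install-references \
--output-dir /path/to/reference/install \
--skip-indexing # Optional: skip BWA indexing if needed
To generate the summary report for an existing output directory with the report subcommand:
vntyper --config-path /path/to/config.json report \
--output-dir /path/to/output/dir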
VNtyper 2.0 integrates multiple steps into a streamlined pipeline. The following is an overview of the steps involved:
- FASTQ Quality Control: Raw FASTQ files are checked for quality.
- (Optional) SHARK Filtering: If shark is specified in --extra-modules, raw FASTQ reads are first filtered to extract MUC1-specific reads (especially relevant for exome or large WGS datasets).
- Alignment: Reads are aligned using BWA (if FASTQ files are provided).
- Kestrel Genotyping: Mapping-free genotyping of VNTRs.
- (Optional) adVNTR Genotyping: Profile-HMM-based method for VNTR genotyping (requires additional setup).
- Summary Report Generation: A final HTML report is generated to summarize the results.
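To give an intuition for what "mapping-free genotyping" with k-mer frequencies means, the toy sketch below counts k-mers in reads and flags reference k-mers that the reads do not support. It is deliberately simplified and is not Kestrel's algorithm, which additionally reconstructs haplotypes from the k-mer counts; the function names and the `min_count` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in the reads; no alignment is involved."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def unsupported_reference_kmers(
    reference: str, read_counts: Counter, k: int, min_count: int = 1
) -> List[int]:
    """Return start positions of reference k-mers seen fewer than min_count times."""
    return [
        i for i in range(len(reference) - k + 1)
        if read_counts[reference[i:i + k]] < min_count
    ]


# Toy example: the read carries a one-base insertion relative to the reference.
reference = "ACGTTGCA"
reads = ["ACGTCTGCA"]
gaps = unsupported_reference_kmers(reference, count_kmers(reads, 4), 4)
print(gaps)  # [1, 2, 3]
```

The run of unsupported positions brackets the site where the reads diverge from the reference, which is the kind of signal a k-mer-based caller can then investigate further without ever aligning reads.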
VNtyper 2.0 relies on several tools and Python libraries. Ensure that the following dependencies are available in your environment:
- Python >= 3.9
- BWA
- Samtools
- Fastp
- Pandas
- Numpy
- Biopython
- Pysam
- Jinja2
- Matplotlib
- Seaborn
- IGV-Reports
You can easily set up these dependencies via the provided Conda environment file.
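If the dependencies are installed by hand rather than through Conda, a quick check like the following can confirm that they are visible from the current environment. This is an illustrative sketch: the executable names and importable module names below are assumptions based on each project's usual naming (for example, Biopython is imported as `Bio` and IGV-Reports as `igv_reports`), and no version checks are performed beyond the Python interpreter itself.

```python
import importlib.util
import shutil
import sys
from typing import Iterable, List, Tuple

# Assumed executable and module names; adjust if your installation differs.
TOOLS = ["bwa", "samtools", "fastp"]
MODULES = ["pandas", "numpy", "Bio", "pysam", "jinja2",
           "matplotlib", "seaborn", "igv_reports"]


def missing_tools(tools: Iterable[str] = TOOLS) -> List[str]:
    """Return the executables that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


def missing_modules(modules: Iterable[str] = MODULES) -> List[str]:
    """Return the top-level Python modules that cannot be located."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def python_ok(minimum: Tuple[int, int] = (3, 9)) -> bool:
    """Check that the running interpreter meets the minimum version."""
    return sys.version_info >= minimum
```

Printing `missing_tools()` and `missing_modules()` gives a short list of what still needs to be installed before running the pipeline.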
VNtyper adheres to PEP8 style guidelines to ensure clean, readable, and maintainable code. We recommend the following tools:
flake8 is used to check for style violations. Note that flake8 only reports issues—it does not automatically fix them.
- Install flake8:
  You can install it as part of the development extras:
  pip install -e .[dev]
  Or install it directly:
  pip install flake8
- Run flake8:
  To check your code, run the following command from the project root:
  flake8 .
  This command will recursively scan your project and report any PEP8 issues.
For automatic formatting, we use Black, which is already included in the development extras.
- Run Black:
  Simply execute the following command in the project root:
  black .
  Black will automatically reformat your code according to its opinionated style, which is also compliant with PEP8.
Below is a logical overview of the VNtyper pipeline:
graph TD
A[Input: FASTQ/BAM] -->|Quality Control| B[Alignment BWA]
B -->|Genotyping| C[Kestrel]
C --> D[Optional: adVNTR]
D --> E[Generate Summary Report]
E --> F[Output: VCF, Summary HTML]
Once the pipeline completes, you will have:
- BAM or FASTQ slices containing MUC1-specific reads.
- VCF files or TSV files with genotyping results (for Kestrel and optional adVNTR).
- HTML summary report detailing coverage stats, genotyping calls, and relevant logs.
- This tool is for research use only.
- Ensure high-coverage WES/WGS or targeted data is used to genotype MUC1 VNTR accurately.
- For questions or issues, refer to the GitHub repository for support.
If you use VNtyper 2.0 in your research, please cite the following:
- Saei H, Morinière V, Heidet L, et al. VNtyper enables accurate alignment-free genotyping of MUC1 coding VNTR using short-read sequencing data. iScience. 2023.
- Audano PA, Ravishankar S, et al. Mapping-free variant calling using haplotype reconstruction from k-mer frequencies. Bioinformatics. 2018.
- Park J, Bakhtiari M, et al. Detecting tandem repeat variants in coding regions using code-adVNTR. iScience. 2022.
We welcome contributions to VNtyper. Please refer to the CONTRIBUTING.md file for guidelines.
VNtyper is licensed under the BSD 3-Clause License. See the LICENSE file for more details.
For questions or issues, please open an issue on GitHub or email the corresponding authors listed in the manuscript.