buttermap

Mapping and VCF calling pipeline for butterfly genomes from paired-end reads

Pipeline information

Pipeline steps

Read mapping with minimap2
BAM file generation with samtools
Duplicate marking with sambamba
Splitting VCF into regions
Variant calling per region with freebayes
Merging chromosome VCFs with bcftools

Dependencies

Full list of dependencies are provided in environment.yaml. If you need to install dependencies manually (e.g. via homebrew for testing on Mac), please use:

Software/package	Version
Python	3.12.0
ruffus	2.8.4
joblib	1.3.2
minimap2	2.26
samtools	1.18
sambamba	1.0
freebayes	1.3.7
bcftools	1.18

Additionally, pytest is required to run the unit tests, which is an optional installation step.

Input files

Buttermap requires the following input files, which must follow the naming rules below:

File	File name
Reference FASTA	[REFERENCE_SPECIES].reference.fasta
Reads	[SAMPLE_ID].[N].fastq.gz

User guide

Installation

1. Clone GitHub repo:

git clone https://github.com/simonharnqvist/buttermap.git

2. Install using mamba/conda:

(mamba is recommended, but conda may work)

cd buttermap # important; see below
mamba env create --file environment.yaml
mamba activate buttermap

Important: You must run this command from within the repo directory as it relies on a relative path to the buttermap wheel.

This should install buttermap itself, but if that doesn't work (or if you installed dependencies manually), try pip install . in the top level directory.

Note that if you wish to edit the pipeline, you will need to make an editable install with pip install -e ..

Running `buttermap`

You should now be able to run buttermap from the command line. Before doing so, I would recommend creating a new directory, containing a subdirectory with copies of the input files. Some tasks in buttermap will change files in-place, and although the original input files should be untouched, there are no guarantees.

cp my_original_input buttermap_copies

Parameters are available by running buttermap:

(buttermap) computer:dir buttermap --help

usage: buttermap [-h] [--reference-fasta REFERENCE_FASTA] [--reads-dir READS_DIR] [--temp-dir TEMP_DIR] [--threads THREADS] [--region-size REGION_SIZE]

Mapping and calling pipeline for butterfly genomes

options:
  -h, --help            show this help message and exit
  --reference-fasta REFERENCE_FASTA
                        Reference genome in FASTA format; filename must conform to [SPECIES].[...].fasta (default: None)
  --reads-dir READS_DIR
                        Directory containing FASTQ read files, each matching '[SAMPLE].[N].fastq.gz' format (default: None)
  --temp-dir TEMP_DIR   Directory for temporary files; automatically deleted after successful run (default: buttermap_temp)
  --threads THREADS     Number of CPU threads to use (default: 1)
  --region-size REGION_SIZE
                        Region size for parallelising variant calling (default: 100000)

Output

buttermap will output a series of intermediary files in the same directory as the provided FASTQ files. It is a good idea to copy the FASTQ files to a new directory before running buttermap.

The final output is a gzipped VCF [REFERENCE_SPECIES].vcf.gz, which is written to the current working directory unless piped elsewhere. Please ensure that this file does not already exist in your current working directory otherwise the pipeline will not run.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
buttermap		buttermap
LICENSE		LICENSE
README.md		README.md
environment.yaml		environment.yaml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

buttermap

Pipeline information

Pipeline steps

Dependencies

Input files

User guide

Installation

1. Clone GitHub repo:

2. Install using mamba/conda:

Running `buttermap`

Output

About

Releases

Packages

Languages

License

simonharnqvist/buttermap

Folders and files

Latest commit

History

Repository files navigation

buttermap

Pipeline information

Pipeline steps

Dependencies

Input files

User guide

Installation

1. Clone GitHub repo:

2. Install using mamba/conda:

Running buttermap

Output

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Running `buttermap`

Packages