Structural genome evolution in E. coli ST131

This repository contains a Snakemake pipeline for the analysis of structural genome evolution in E. coli ST131 presented in our paper.

The dataset consists of complete E. coli ST131 genomes available on RefSeq. Accession numbers and metadata for the considered strains can be found in the datasets folder.

In short, the pipeline uses pangraph to build a pangenome graph representation of the chromosomes of all considered strains. It then extracts all regions of structural variation, assigns mobile genetic elements (MGEs) and defense systems to each of these regions, and detects events that can be parsimoniously interpreted as simple gains or losses of sequence. See this note for an overview of the pipeline.
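The idea of a pattern being "parsimoniously interpreted as a simple gain or loss" can be illustrated with Fitch's small-parsimony algorithm applied to a binary presence/absence pattern on a strain tree. This is a hypothetical sketch, not the pipeline's own implementation: the nested-tuple tree encoding, the function names `fitch` and `classify`, and the rule that the majority state is ancestral are all assumptions made here.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical tree encoding: a leaf is a strain name (str); an internal
# node is a 2-tuple (left_subtree, right_subtree).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, int]) -> Tuple[Set[int], int]:
    """Fitch small parsimony: return (candidate states, minimum number of changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint child sets require one extra change below this node.
    return left_set | right_set, left_cost + right_cost + 1


def classify(tree: Tree, states: Dict[str, int]) -> str:
    """Label a presence (1) / absence (0) pattern of one variable region."""
    _, cost = fitch(tree, states)
    if cost == 0:
        return "no change"
    if cost > 1:
        return "complex"
    # Assumption: the majority state is ancestral, so a single change toward
    # presence is a gain and a single change toward absence is a loss.
    n_present = sum(states.values())
    n_absent = len(states) - n_present
    if n_present < n_absent:
        return "gain"
    if n_present > n_absent:
        return "loss"
    return "ambiguous"
```

For example, on the toy tree `(("A", "B"), ("C", "D"))` a region present only in `A` needs one change and is labelled a gain, while a region present in `A` and `C` only needs two changes and is left as complex. A tie between present and absent strains is reported as ambiguous, because a single change at the root split cannot be polarised without an outgroup.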

The pipeline produces two output folders: a results folder, containing processed data such as the pangenome graph and the junction graphs, and a figs folder, containing, among other things, the main figures of the paper.
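One way to inspect the pangenome graph in the results folder is to load it as JSON and list the blocks shared by all strains. This is a minimal sketch assuming pangraph's JSON export layout (top-level `paths` and `blocks` lists; each path has a `name` and an ordered `blocks` list whose entries carry a block `id`). Field names may differ between pangraph versions, the helper names are hypothetical, and defining "core" as "present exactly once in every path" is a choice made here.

```python
import json
from collections import Counter
from typing import Dict, Set


def load_pangraph(path: str) -> dict:
    """Read a pangraph JSON export from disk."""
    with open(path) as fh:
        return json.load(fh)


def block_occurrences(graph: dict) -> Dict[str, Counter]:
    """Map each path (strain) name to a Counter of the block ids it traverses."""
    return {
        p["name"]: Counter(node["id"] for node in p["blocks"])
        for p in graph["paths"]
    }


def core_blocks(graph: dict) -> Set[str]:
    """Blocks that occur exactly once in every path (assumed core definition)."""
    occurrences = block_occurrences(graph)
    all_ids = {b["id"] for b in graph["blocks"]}
    return {
        bid for bid in all_ids
        if all(counts[bid] == 1 for counts in occurrences.values())
    }
```

Usage: `core_blocks(load_pangraph("path/to/pangraph.json"))`, where the file path is a placeholder for wherever the graph is written inside results. Blocks outside this set are the accessory material in which structural variation is found.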

setup

Execution requires a valid installation of conda, mamba and snakemake (v7.32.4).
For pangenome graph creation, the pangraph command must be available in your PATH; see the pangraph documentation for installation instructions.
Optionally, to facilitate downloading GenBank records from NCBI, you can save your personal API key in config/ncbi_api_key.txt. It will be used automatically when downloading the data.
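Before launching, it can help to confirm that the required executables are on PATH and to see how an NCBI API key is typically attached to a request. The sketch below is illustrative only: `check_tools`, `read_api_key` and `efetch_genbank_url` are hypothetical helpers, not part of the pipeline, which may download records differently. It targets NCBI's E-utilities `efetch` endpoint, where the `api_key` parameter raises the permitted request rate. Separately, `snakemake --version` should report 7.32.4.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable, Optional
from urllib.parse import urlencode

# Executables named in the setup instructions.
REQUIRED_TOOLS = ("conda", "mamba", "snakemake", "pangraph")
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def check_tools(names: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each executable to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def read_api_key(path: str = "config/ncbi_api_key.txt") -> Optional[str]:
    """Return the stored key, or None when the optional file is missing or empty."""
    key_file = Path(path)
    if not key_file.is_file():
        return None
    return key_file.read_text().strip() or None


def efetch_genbank_url(accession: str, api_key: Optional[str] = None) -> str:
    """Build an efetch URL requesting a GenBank flat-file record from nuccore."""
    params = {"db": "nuccore", "id": accession,
              "rettype": "gbwithparts", "retmode": "text"}
    if api_key:
        params["api_key"] = api_key
    return f"{EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    missing = [name for name, path in check_tools().items() if path is None]
    print("missing executables:", missing or "none")
    # "ACCESSION" is a placeholder; real accessions are listed in the datasets folder.
    print(efetch_genbank_url("ACCESSION", read_api_key()))
```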

execution

To execute the pipeline locally, it is sufficient to run:

snakemake --use-conda --cores 1 all

You can replace 1 with the desired number of cores.
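If you drive the pipeline from Python, for example from a notebook, the local invocation can be assembled with the core count taken from the machine. A minimal sketch: `local_command` and `run_local` are hypothetical helpers, and defaulting to every CPU reported by `os.cpu_count()` is a choice made here, not a pipeline setting.

```python
import os
import subprocess
from typing import List, Optional


def local_command(cores: Optional[int] = None, target: str = "all") -> List[str]:
    """Assemble the local invocation; default to every CPU the OS reports."""
    n = cores if cores is not None else (os.cpu_count() or 1)
    return ["snakemake", "--use-conda", "--cores", str(n), target]


def run_local(cores: Optional[int] = None) -> int:
    """Run Snakemake and return its exit code (call from the repository root)."""
    return subprocess.run(local_command(cores)).returncode
```

Building the argument list separately from running it keeps the command easy to inspect or log before a long run starts. Call `run_local` from the repository root, where the Snakefile lives.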

Given the high number of jobs and their memory and time requirements, we advise executing the pipeline on a cluster. Execution with the SLURM workload manager is already set up, and the pipeline can be run with:

snakemake --profile cluster all
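Because the cluster run spawns many jobs, a dry run (Snakemake's `--dry-run`, which lists the jobs it would schedule without executing them) is a cheap first check. The sketch below builds both commands; `has_profile` and `cluster_command` are hypothetical helpers, and the check relies on Snakemake's convention that a profile is a directory containing a `config.yaml`, here looked up relative to the working directory.

```python
from pathlib import Path
from typing import List


def has_profile(profile: str = "cluster") -> bool:
    """True if `profile` is a directory (relative to cwd) holding a config.yaml."""
    return (Path(profile) / "config.yaml").is_file()


def cluster_command(profile: str = "cluster", target: str = "all",
                    dry_run: bool = False) -> List[str]:
    """Assemble the profile-based invocation, optionally as a dry run."""
    cmd = ["snakemake", "--profile", profile]
    if dry_run:
        # List the scheduled jobs without submitting them.
        cmd.append("--dry-run")
    cmd.append(target)
    return cmd
```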

citation

Evolutionary dynamics of genome structure and content among closely related bacteria
Marco Molari, Liam P. Shaw and Richard A. Neher, bioRxiv (2024)
doi: https://doi.org/10.1101/2024.07.08.602537
