deep_indel_mutagenesis

Welcome to the GitHub repository for: Deep indel mutagenesis reveals the impact of insertions and deletions on protein stability and function. https://www.biorxiv.org/content/10.1101/2023.10.06.561180v1

Required Software

To run the deep_indel_mutagenesis pipeline you will need the following software and associated packages:

R (dplyr, modeest, stringr, strex, data.table, bio3d, seqinr, ggplot2, ggridges, GGally, gridExtra, viridis, writexl, matrixStats, vecsets, grpreg, glmnet, glinternet)
Functions: download the "Functions" folder, which contains the custom functions used to process the data in the deep_indel_mutagenesis pipeline.
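
Before running the pipeline, it can help to confirm that every package in the list above is available. The snippet below is a minimal sketch using only base R; the helper name `find_missing_pkgs` is made up for illustration and is not part of the repository.

```r
# Packages listed under "Required Software".
required_pkgs <- c("dplyr", "modeest", "stringr", "strex", "data.table", "bio3d",
                   "seqinr", "ggplot2", "ggridges", "GGally", "gridExtra", "viridis",
                   "writexl", "matrixStats", "vecsets", "grpreg", "glmnet", "glinternet")

# Hypothetical helper: return the packages in `pkgs` that cannot be loaded.
find_missing_pkgs <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

missing_pkgs <- find_missing_pkgs(required_pkgs)
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported as missing can then be installed with conda (see Installation Instructions) or with `install.packages()`.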

Required Data

i) Predictions from CADD, DDMut, ESM1v, PROVEAN, and ESM1b, ii) STRIDE and rSASA information, iii) the INDELi model predictions for the Tsuboyama data, and iv) additional data frames necessary to run the pipeline should be downloaded as "additional_dfs.rds" from here.
DiMSum files necessary for running the deep_indel_mutagenesis pipeline should be downloaded from here. If you want to re-run DiMSum, you will also find the necessary scripts ("VariantIdentity", "ExperimentalDesign", etc.) in the same folder.
Tsuboyama et al. 2023 raw data ("Tsuboyama2023_Dataset2_Dataset3_20230416.csv") and the PDB files ("AlphaFold_model_PDBs") should be downloaded from here.
Pre-processed data for reproducing the figures can also be downloaded from here.
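
Once the downloads are in place, a quick check that the files are where you expect can save a failed run later. This is a sketch only: the `data` directory and the helper `check_downloads` are assumptions for illustration, and the internal structure of "additional_dfs.rds" is not documented here, so the code just prints its top-level layout.

```r
# Hypothetical download location; adjust to your own setup.
data_dir <- "data"

# Hypothetical helper: report which expected files or folders exist in `dir`.
check_downloads <- function(dir, files) {
  present <- file.exists(file.path(dir, files))
  setNames(present, files)
}

expected <- c("additional_dfs.rds",
              "Tsuboyama2023_Dataset2_Dataset3_20230416.csv",
              "AlphaFold_model_PDBs")
print(check_downloads(data_dir, expected))

# Load the additional data frames only if the file is present.
rds_path <- file.path(data_dir, "additional_dfs.rds")
if (file.exists(rds_path)) {
  additional_dfs <- readRDS(rds_path)
  str(additional_dfs, max.level = 1)  # inspect the top-level structure
}
```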

Installation Instructions

Make sure you have git and conda installed and then run (expected install time <5min):

# Install dependencies (preferably in a fresh conda environment)
conda install -c conda-forge r-base r-dplyr r-modeest r-stringr r-strex r-data.table r-bio3d r-seqinr r-ggplot2 r-ggridges r-GGally r-gridExtra r-viridis r-writexl r-matrixStats r-vecsets r-grpreg r-glmnet r-glinternet

Usage

1. To reproduce the figures:

Download and unzip additional_files.zip, DiMSum.zip, pre_processed_data.zip and indel_prediction_models.zip from here. Also download the Functions folder, which is necessary to execute the scripts.
000_load_functions In stage 000 of the pipeline, we set the folder locations for the required data (downloaded above) and load the required functions from the Functions folder.
001_split_data In stage 001 of the pipeline, we process the raw DiMSum files and call the indel and substitution variants. We also process the Tsuboyama et al. 2023 data set for further analysis. In this script you can either process the data yourself (PART1, PART2 and PART3) or directly load the processed data frames (skip to PART4 and download the pre-processed data).
002_figure1_main Reproduce Fig. 1
003_figure2_main Reproduce Fig. 2
004_figure1_extended Reproduce Extended Fig. 1
005_figure3_main Reproduce Fig. 3
006_figure2_extended Reproduce Extended Fig. 2
007_figure4_main Reproduce Fig. 4
008_figure5_main Reproduce Fig. 5
009_figure3_extended Reproduce Extended Fig. 3
010_figure6_main Reproduce Fig. 6
011_figure7_main Reproduce Fig. 7
012_figure4_extended Reproduce Extended Fig. 4
013_figure8_main Reproduce Fig. 8
014_figure5_extended Reproduce Extended Fig. 5
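
The choice in the split-data stage between processing the raw data and loading pre-processed data frames follows a common "load if available, otherwise compute" pattern. The sketch below illustrates that pattern in base R; `load_or_process` and the file name in the usage note are hypothetical, and the script itself handles this by letting you skip to PART4 manually.

```r
# Hypothetical helper: return the cached object at `cache_path` if it exists;
# otherwise run `process_fn`, save its result to `cache_path`, and return it.
load_or_process <- function(cache_path, process_fn) {
  if (file.exists(cache_path)) {
    return(readRDS(cache_path))
  }
  result <- process_fn()
  saveRDS(result, cache_path)
  result
}
```

For example, `load_or_process("processed_variants.rds", function() my_processing_steps())` would run the (hypothetical) `my_processing_steps()` only the first time and read the saved file afterwards.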

000_load_functions and 001_split_data must be run first, before any of the figure scripts.
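
The run order can be sketched as a small driver. This assumes the scripts can be sourced non-interactively from the repository root once the folder locations in 000_load_functions.R have been edited; that is an assumption, and the helpers `list_pipeline_scripts` and `run_pipeline` are illustrative names, not part of the repository.

```r
# List the numbered pipeline scripts in `dir` in numeric-prefix order.
list_pipeline_scripts <- function(dir = ".") {
  sort(list.files(dir, pattern = "^[0-9]{3}_.*\\.R$"))
}

# Source the two setup scripts first, then the requested figure scripts.
# source() evaluates in the global environment by default, so objects
# created by the setup scripts remain available to the figure scripts.
run_pipeline <- function(dir = ".", figures = character(0)) {
  setup <- c("000_load_functions.R", "001_split_data.R")
  for (script in c(setup, figures)) {
    message("Running ", script)
    source(file.path(dir, script))
  }
  invisible(c(setup, figures))
}

# Example (not run): reproduce Fig. 1 only.
# run_pipeline(figures = "002_figure1_main.R")
```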

2. To run the genome-wide predictions for the human proteome:

We provide the README instructions and scripts used to run the genome-wide prediction using INDELi-E in the folder genome_wide_prediction_INDELi_E.

3. To use the INDELi-E model for single proteins:

Alternatively, we provide the code to run the pre-trained INDELi-E model to predict the stability effects of 1 aa deletions and insertions in your protein of interest. It is available in the folder single_protein_prediction.
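
To make the variant space concrete, the sketch below enumerates all 1 aa deletions and 1 aa insertions of a short sequence in base R. It does not run INDELi-E and makes no claim about the input format or the set of inserted residues the model expects; the function names, the toy sequence and the default inserted residue are all illustrative assumptions.

```r
# All sequences obtained by deleting exactly one residue from `seq`.
single_aa_deletions <- function(seq) {
  n <- nchar(seq)
  vapply(seq_len(n), function(i) {
    paste0(substr(seq, 1, i - 1), substr(seq, i + 1, n))
  }, character(1))
}

# All sequences obtained by inserting residue `aa` after position 0..n of `seq`
# (position 0 means before the first residue).
single_aa_insertions <- function(seq, aa = "A") {
  n <- nchar(seq)
  vapply(0:n, function(i) {
    paste0(substr(seq, 1, i), aa, substr(seq, i + 1, n))
  }, character(1))
}

toy_seq <- "MKT"  # toy sequence, for illustration only
print(single_aa_deletions(toy_seq))        # "KT" "MT" "MK"
print(single_aa_insertions(toy_seq, "A"))  # "AMKT" "MAKT" "MKAT" "MKTA"
```

A protein of length n has n single-residue deletions and n + 1 insertion sites; deleting either of two adjacent identical residues yields the same sequence, so the enumerated variants are not always unique.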
