lehner-lab/deep_indel_mutagenesis

deep_indel_mutagenesis

Welcome to the GitHub repository for: Deep indel mutagenesis reveals the impact of insertions and deletions on protein stability and function. https://www.biorxiv.org/content/10.1101/2023.10.06.561180v1

Required Software

To run the deep_indel_mutagenesis pipeline you will need the following software and associated packages:

  • R (dplyr, modeest, stringr, strex, data.table, bio3d, seqinr, ggplot2, ggridges, GGally, gridExtra, viridis, writexl, matrixStats, vecsets, grpreg, glmnet, glinternet)
  • Functions: to run the pipeline you will need to download the "Functions" folder, which contains the custom functions written to process the data for the deep_indel_mutagenesis pipeline

Required Data

  • The following should be downloaded as "additional_dfs.rds" from here: i) predictions from CADD, DDMut, ESM1v, PROVEAN, and ESM1b, ii) STRIDE and rSASA information, iii) the INDELi model predictions for the Tsuboyama data, and iv) additional data frames (dfs) necessary to run the pipeline

  • DiMSum files necessary for running the deep_indel_mutagenesis pipeline should be downloaded from here. If you want to re-run DiMSum, you will also find the necessary scripts ("VariantIdentity", "ExperimentalDesign" etc.) in the same folder.

  • Tsuboyama et al. 2023 raw data ("Tsuboyama2023_Dataset2_Dataset3_20230416.csv") and the pdb files ("AlphaFold_model_PDBs") should be downloaded from here

  • Pre-processed data for reproducing the figures can also be downloaded from here
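Before running the pipeline it can help to confirm that the downloads listed above are in place. The sketch below is not part of the repository: `check_inputs` is a hypothetical helper, and it assumes that all downloads sit directly under a single data directory, which may not match your layout. Only the file and folder names quoted in this README are checked; the DiMSum files are not, because their names are not listed here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository).
# ASSUMPTION: every download was saved directly under one data directory.
check_inputs() {
  data_dir="${1:-.}"   # directory to inspect, defaults to the current one
  missing=0
  # Files named in "Required Data"
  for f in "additional_dfs.rds" "Tsuboyama2023_Dataset2_Dataset3_20230416.csv"; do
    if [ ! -f "$data_dir/$f" ]; then
      echo "missing file: $f"
      missing=1
    fi
  done
  # Folder holding the AlphaFold pdb files
  if [ ! -d "$data_dir/AlphaFold_model_PDBs" ]; then
    echo "missing folder: AlphaFold_model_PDBs"
    missing=1
  fi
  # Returns 0 when everything was found, 1 otherwise
  return "$missing"
}
```

For example, `check_inputs ~/deep_indel_data && echo "inputs look complete"` prints the confirmation only when nothing is missing (the path is a placeholder).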

Installation Instructions

Make sure you have git and conda installed and then run (expected install time <5min):

# Install dependencies (preferably in a fresh conda environment)
conda install -c conda-forge r-base r-dplyr r-modeest r-stringr r-strex r-data.table r-bio3d r-seqinr r-ggplot2 r-ggridges r-GGally r-gridExtra r-viridis r-writexl r-matrixStats r-vecsets r-grpreg r-glmnet r-glinternet 
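After installation, one way to confirm that every R package listed under Required Software is visible to R is sketched below. `missing_r_packages` is a hypothetical helper, not part of the repository; it reads installed package names on standard input and prints any required package that is absent. The sketch assumes a POSIX sh or bash shell.

```shell
# Package list copied from "Required Software".
REQUIRED_R_PKGS="dplyr modeest stringr strex data.table bio3d seqinr ggplot2 ggridges GGally gridExtra viridis writexl matrixStats vecsets grpreg glmnet glinternet"

# Hypothetical helper (not part of the repository): reads installed package
# names, one per line, on stdin and prints each required package not found.
missing_r_packages() {
  installed="$(cat)"
  for pkg in $REQUIRED_R_PKGS; do
    # -x matches whole lines, -F treats the name literally (e.g. "data.table")
    printf '%s\n' "$installed" | grep -qxF "$pkg" || echo "$pkg"
  done
}

# Usage (needs Rscript on PATH, e.g. inside the activated conda environment):
#   Rscript -e 'cat(rownames(installed.packages()), sep = "\n")' | missing_r_packages
```

Empty output means all listed packages were found; any names printed still need to be installed.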

Usage

1. To reproduce the figures:

000_load_functions and 01_split_data should be run first.

2. To run the genome-wide predictions for the human proteome:

We provide the README instructions and scripts used to run the genome-wide prediction using INDELi-E in the folder genome_wide_prediction_INDELi_E.

3. To use the INDELi-E model for single proteins:

Alternatively, we provide the code to run the pre-trained INDELi-E model to predict the stability effects of 1aa deletions and insertions in your protein of interest. It is available in the folder single_protein_prediction.
