MoCHI

Welcome to the GitHub repository for MoCHI: Neural networks to fit interpretable models and quantify energies, energetic couplings, epistasis, and allostery from deep mutational scanning data.

Table Of Contents

  1. Installation
  2. Usage
    1. Option A: MoCHI command line tool
    2. Option B: Custom Python script
    3. Demo
  3. Manual
  4. Bugs and feedback
  5. Citing MoCHI

Installation

The easiest way to install MoCHI is by using the bioconda package:

conda install -c bioconda pymochi

See the full Installation Instructions for further details and alternative installation options.

Usage

You can run a standard MoCHI workflow using the command line tool, or run a custom analysis by importing the "pymochi" package in your own Python script.

MoCHI requires a plain text model design file containing a table describing the measured phenotypes and how they relate to the underlying additive (biophysical) traits. The table should have the following 4 tab-separated columns (see example here):

  • trait: One or more additive trait names
  • transformation: The shape of the global epistatic trend (Linear/ReLU/SiLU/Sigmoid/SumOfSigmoids/TwoStateFractionFolded/ThreeStateFractionBound)
  • phenotype: A unique phenotype name e.g. Abundance, Binding or Kinase Activity
  • file: Path to DiMSum output (.RData) or plain text file with variant fitness and error estimates for the corresponding phenotype(s) (nucleotide sequence example here, amino acid sequence example here)
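As an illustration of the table described above, here is a minimal sketch that writes a model design file using only the Python standard library. The column names and transformation names come from the list above. The file names, the header row, and the comma-separated form for multiple traits ("Folding,Binding") are assumptions, so check them against the linked example before relying on them.

```python
import csv
from pathlib import Path

# Column names and transformation names are taken from the list above.
COLUMNS = ["trait", "transformation", "phenotype", "file"]
TRANSFORMATIONS = {
    "Linear", "ReLU", "SiLU", "Sigmoid", "SumOfSigmoids",
    "TwoStateFractionFolded", "ThreeStateFractionBound",
}

def write_model_design(rows, out_path):
    """Validate rows (dicts keyed by COLUMNS) and write them as a tab-separated table."""
    phenotypes = [row["phenotype"] for row in rows]
    if len(set(phenotypes)) != len(phenotypes):
        raise ValueError("phenotype names must be unique")
    for row in rows:
        if row["transformation"] not in TRANSFORMATIONS:
            raise ValueError(f"unknown transformation: {row['transformation']}")
    out_path = Path(out_path)
    with out_path.open("w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return out_path

# Hypothetical file names; replace them with the paths to your own fitness files.
# ASSUMPTION: multiple traits for one phenotype are written comma-separated.
rows = [
    {"trait": "Folding", "transformation": "TwoStateFractionFolded",
     "phenotype": "Abundance", "file": "fitness_abundance.txt"},
    {"trait": "Folding,Binding", "transformation": "ThreeStateFractionBound",
     "phenotype": "Binding", "file": "fitness_binding.txt"},
]
design_path = write_model_design(rows, "model_design.txt")
print(design_path.read_text())
```

The validation step only checks what the list above states (known transformation names, unique phenotype names); it does not check that the fitness files exist or are well formed.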

Option A: MoCHI command line tool

Replace MY_MODEL with the path to your model design file (see example here).

run_mochi.py --model_design MY_MODEL

Get help with additional command line parameters:

run_mochi.py -h
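If you want to launch the command line tool from a Python script (for example, to loop over several model design files), a wrapper along the following lines may help. This is a sketch, not part of MoCHI: the helper names are hypothetical, it assumes run_mochi.py is on your PATH, and the only option it knows about is --model_design. Any extra arguments are passed through unchanged.

```python
import shlex
import subprocess

def build_mochi_command(model_design, extra_args=()):
    """Assemble the argument list for run_mochi.py (hypothetical helper).

    Only --model_design is taken from this README; extra_args is passed
    through as-is, so consult `run_mochi.py -h` for valid options.
    """
    return ["run_mochi.py", "--model_design", str(model_design), *extra_args]

def run_mochi(model_design, extra_args=()):
    """Run the command line tool and raise CalledProcessError if it fails."""
    cmd = build_mochi_command(model_design, extra_args)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)

# Building the command does not execute anything:
print(build_mochi_command("model_design.txt"))
```

Calling run_mochi("model_design.txt") would then execute the same command as shown above.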

Option B: Custom Python script

Below is an example of a custom MoCHI workflow (written in Python) to infer the underlying free energies of folding and binding from doubledeepPCA data.

#Imports
import pymochi
from pymochi.data import MochiData
from pymochi.models import MochiTask
from pymochi.report import MochiReport
import pandas as pd
from pathlib import Path

#####################
# Step 1: Create a *MochiTask* object with one-hot encoded variant sequences, interaction terms and 10 cross-validation groups
#####################

#Globals
k_folds = 10
abundance_path = str(Path(pymochi.__file__).parent / "data/fitness_abundance.txt") #MoCHI demo data
binding_path = str(Path(pymochi.__file__).parent / "data/fitness_binding.txt") #MoCHI demo data

#Define model
my_model_design = pd.DataFrame({
   'phenotype': ['Abundance', 'Binding'],
   'transformation': ['TwoStateFractionFolded', 'ThreeStateFractionBound'],
   'trait': [['Folding'], ['Folding', 'Binding']],
   'file': [abundance_path, binding_path]})

#Create Task
mochi_task = MochiTask(
   directory = 'my_task',
   data = MochiData(
      model_design = my_model_design,
      k_folds = k_folds))

#####################
# Step 2: Hyperparameter tuning and model fitting
#####################

#Perform grid search over hyperparameters
mochi_task.grid_search() 

#Fit model using optimal hyperparameters
for i in range(k_folds):
   mochi_task.fit_best(fold = i+1)

#####################
# Step 3: Generate report, phenotype predictions, inferred additive trait summaries and save task
#####################

temperature_celsius = 30
RT = (273+temperature_celsius)*0.001987 #Gas constant R = 0.001987 kcal/(mol*K)

mochi_report = MochiReport(
   task = mochi_task,
   RT = RT)

energies = mochi_task.get_additive_trait_weights(
   RT = RT)
 
mochi_task.save()

Report plots, predictions and additive trait summaries will be saved to the my_task/report, my_task/predictions and my_task/weights subfolders.
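The RT value passed in Step 3 is the gas constant multiplied by the absolute temperature. The sketch below reproduces that arithmetic with a small helper (the function and constant names are hypothetical). It uses the same constants as the script above: 0.001987 kcal/(mol*K) for the gas constant and 273 for the Celsius-to-Kelvin offset. It assumes the result is meant to be in kcal/mol.

```python
GAS_CONSTANT_KCAL = 0.001987  # kcal/(mol*K), value used in the script above
KELVIN_OFFSET = 273           # the script above uses 273 rather than 273.15

def rt_kcal_per_mol(temperature_celsius):
    """Return RT for a temperature given in degrees Celsius."""
    return (KELVIN_OFFSET + temperature_celsius) * GAS_CONSTANT_KCAL

rt_30 = rt_kcal_per_mol(30)
print(f"RT at 30 C: {rt_30:.4f}")  # prints "RT at 30 C: 0.6021"
```

If your experiment was run at a different temperature, pass the same value to both MochiReport and get_additive_trait_weights so that the report and the inferred energies stay consistent.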

Demo MoCHI

Run the demo to check that you have a working MoCHI installation (expected run time: under 10 minutes):

demo_mochi.py

Manual

Comprehensive documentation is coming soon. In the meantime, you can get more information about specific classes and methods from within Python, e.g.:

help(MochiData)
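To get a quick overview of what a class offers before reading its full help text, you can list its public methods with the standard library inspect module. The helper below is a generic sketch (its name is hypothetical) and is demonstrated on a built-in type so that it runs without MoCHI installed.

```python
import inspect

def public_methods(cls):
    """Return the sorted names of public callables available on cls."""
    return sorted(
        name
        for name, member in inspect.getmembers(cls, callable)
        if not name.startswith("_")
    )

# Demonstrated on a built-in type; with MoCHI installed you could instead
# pass a class imported in the script above, e.g. public_methods(MochiTask).
print(public_methods(dict))
```

Each name it returns can then be passed to help(), e.g. help(MochiTask.grid_search).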

Bugs and feedback

You may submit a bug report as an issue here on GitHub, or send an email to [email protected].

Citing MoCHI

Please cite the following publication if you use MoCHI:

Faure, A. J. & Lehner, B. MoCHI: neural networks to fit interpretable models and quantify energies, energetic couplings, epistasis, and allostery from deep mutational scanning data. Genome Biol 25, 303 (2024). 10.1186/s13059-024-03444-y

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.6.

(Vector illustration credit: Vecteezy!)