
ProteinMPNN

This repo contains the Kuhlman Lab fork of ProteinMPNN. It includes all the functionality of the original ProteinMPNN repo, plus the following additions:

  • Improved input parsing for custom design runs
  • Multi-state design support
  • Additional utilities to provide integration with EvoPro

Read the ProteinMPNN paper for background on the method.

Installation:

git clone git@github.com:Kuhlman-Lab/proteinmpnn.git
cd proteinmpnn
mamba env create -f setup/proteinmpnn.yml

Usage Guidelines:

General Usage

The different input arguments available for each script can be viewed by adding -h to your python call (e.g., python generate_json.py -h).

ProteinMPNN accepts PDB files as input and produces FASTA files as output.
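Because the designed sequences come out as FASTA, downstream analysis usually starts by reading those records back in. The sketch below is a minimal, dependency-free FASTA reader; it deliberately makes no assumptions about what this fork writes into the header lines (scores, temperatures, and so on), so inspect your own output files for that. The function name `read_fasta` is hypothetical and not part of this repo.

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream.

    Headers are returned without the leading '>', and sequences that span
    several lines are joined into a single string.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Usage: `with open("designs.fa") as fh: records = list(read_fasta(fh))`, where `designs.fa` stands in for whichever FASTA file your run produced.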

Unlike the original repo, our version of ProteinMPNN organizes its input options (arguments) into .flags files:

  • json.flags is used to specify design constraints, like fixed residues and symmetry.
  • proteinmpnn.flags is used to specify prediction settings, like which sampling temperature and model variant to use.
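This README does not spell out how the scripts read their .flags files. One common pattern, and only an assumption about this repo, is argparse's `fromfile_prefix_chars`, which expands an `@file` argument into the arguments listed in that file. The sketch below overrides `convert_arg_line_to_args` so that a line such as `--sampling_temp 0.1` is split into two tokens and `#` comments are ignored. The parser class and its two options are illustrative, not the repo's actual argument definitions.

```python
import argparse


class FlagFileParser(argparse.ArgumentParser):
    """ArgumentParser that expands '@file' arguments, one or more tokens per line."""

    def convert_arg_line_to_args(self, arg_line):
        # Drop everything after '#' (comments), then split "--flag value"
        # into separate tokens. Blank or comment-only lines yield nothing.
        return arg_line.split("#", 1)[0].split()


def build_parser() -> FlagFileParser:
    # Hypothetical options mirroring the flags described in this README.
    parser = FlagFileParser(fromfile_prefix_chars="@")
    parser.add_argument("--model_name", default="v_48_002")
    parser.add_argument("--sampling_temp", type=float, default=0.1)
    return parser
```

With a parser like this, `python some_script.py @proteinmpnn.flags` behaves as if each flag in the file had been typed on the command line. Flags given after the `@` argument override values from the file, because argparse keeps the last occurrence of a repeated option.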

In general, there are two steps to running ProteinMPNN:

  1. Run the generate_json.py script and pass it the json.flags file.
    • This creates a new file called proteinmpnn_res_specs.json containing the parsed design information.
  2. Run the run_protein_mpnn.py script and pass it proteinmpnn.flags and proteinmpnn_res_specs.json to obtain the actual ProteinMPNN predictions.

Useful Flags

Used in json.flags:

--default_design_setting: an optional filter that allows or disallows certain residue types during design. By default it is set to all, which allows all 20 amino acids. Possible settings include:

  • all-hydphob: exclude hydrophobic residues (leaving CDEHKNPQRSTX)
  • all-hydphil: exclude hydrophilic residues (leaving ACFGILMPVWYX)
  • all-CLD: exclude specific amino acids (in this case, Cys, Leu, and Asp)
  • L+polar: mix and match amino acids and categories (in this case, allow all polar amino acids plus Leu)
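To make the setting grammar concrete, here is a small sketch of how strings like all-hydphob or L+polar could be turned into an allowed-residue set, reading '+' as union and '-' as removal, left to right. The categories are inferred from the examples above (hydphob is what all-hydphob removes, hydphil is what all-hydphil removes), and treating polar as the same set as hydphil is purely an assumption; the repo's own parsing may define these differently. The names `parse_design_setting` and `CATEGORIES` are hypothetical.

```python
import re

# One-letter codes for the 20 canonical amino acids, plus X, which the
# README's examples list among the allowed letters.
ALL_LETTERS = frozenset("ACDEFGHIKLMNPQRSTVWYX")

# Categories inferred from the README examples (assumption, see lead-in).
CATEGORIES = {
    "all": ALL_LETTERS,
    "hydphob": frozenset("AFGILMVWY"),  # removed by all-hydphob
    "hydphil": frozenset("DEHKNQRST"),  # removed by all-hydphil
    "polar": frozenset("DEHKNQRST"),    # ASSUMPTION: polar == hydphil
}


def _resolve(token: str) -> set:
    """Map one token to a residue set: a category name or a run of letters."""
    if token in CATEGORIES:
        return set(CATEGORIES[token])
    letters = set(token)
    if token and letters <= ALL_LETTERS:
        return letters
    raise ValueError(f"unknown residue token: {token!r}")


def parse_design_setting(setting: str) -> set:
    """Evaluate e.g. 'all-CLD' or 'L+polar' left to right.

    '+' adds residues to the allowed set, '-' removes them.
    """
    # The capturing group keeps the operators: 'all-CLD' -> ['all', '-', 'CLD']
    parts = re.split(r"([+-])", setting)
    allowed = _resolve(parts[0])
    for op, token in zip(parts[1::2], parts[2::2]):
        residues = _resolve(token)
        if op == "+":
            allowed |= residues
        else:
            allowed -= residues
    return allowed
```

As a sanity check, `parse_design_setting("all-hydphob")` returns exactly the twelve letters listed for that setting above.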

Used in proteinmpnn.flags:

--model_name: specifies which ProteinMPNN model checkpoint to use. Possible options include:

  • v_48_002: vanilla (default) model with k=48 neighbors and 0.02 Å noise
  • s_48_010: soluble protein model with k=48 neighbors and 0.10 Å noise
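The checkpoint names appear to follow a family_k_noise pattern: a prefix for the model family, the number of neighbors k, and the training noise in hundredths of an Å (002 → 0.02 Å, 010 → 0.10 Å). The helper below decodes names under that assumption, which is inferred only from the two examples above and the family prefixes listed under Code organization; `describe_model` is a hypothetical helper, not a repo function.

```python
from typing import NamedTuple

# Family prefixes as described in this README (ca = CA-only models).
FAMILIES = {"v": "vanilla", "s": "soluble", "ca": "CA-only"}


class ModelSpec(NamedTuple):
    family: str
    neighbors: int
    noise_angstrom: float


def describe_model(name: str) -> ModelSpec:
    """Decode a checkpoint name such as 'v_48_002' (assumed naming scheme)."""
    prefix, k, noise = name.split("_")  # raises ValueError if not 3 fields
    if prefix not in FAMILIES:
        raise ValueError(f"unknown model family in {name!r}")
    # Assumption: the last field is the noise level in units of 0.01 Å.
    return ModelSpec(FAMILIES[prefix], int(k), int(noise) / 100)
```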

--sampling_temp: specifies the sampling temperature, which controls how diverse the generated sequences will be. Ranges from 0 to 1, inclusive. A temperature of 0 returns the "best" (most likely) prediction every time, with zero diversity, while a temperature of 1 samples directly from the model's predicted distribution, giving much more diverse sequences. The recommended range is roughly 0.0 to 0.3.
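For intuition about what the temperature does: ProteinMPNN divides the per-position logits by the temperature before the softmax, so low temperatures concentrate probability on the top residue and a temperature of 1 leaves the model's distribution unchanged. The sketch below shows that rule in plain Python over a toy logit vector. Treating T = 0 as greedy argmax is an assumption about how a zero temperature is handled, and `temperature_probs` and `sample_index` are hypothetical names.

```python
import math
import random
from typing import List, Sequence


def temperature_probs(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax of logits / temperature; temperature 0 is treated as argmax."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumption: T = 0 means greedy decoding (all mass on the top logit).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Draw one residue index from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Printing `temperature_probs([2.0, 1.0, 0.5, -1.0], t)` for t in 0.1, 0.3, and 1.0 shows the distribution flattening as t grows, which is why higher temperatures yield more diverse designs.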

Example Cases

Example input and expected output files, as well as jobscripts and flag files, for many different design tasks are included in examples/. For a summary and explanation of each example, see examples/EXAMPLES.md. Currently supported protocols include:

  1. Monomer Design (with user-friendly parsing of designable residues)
  2. Binder Design
  3. Oligomer Design (with support for arbitrary symmetries in homooligomers)
  4. Multi-state Design (with support for multiple complex design constraints)
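For homooligomers, symmetry is typically enforced by tying equivalent positions across chains so that they always receive the same residue. The sketch below illustrates that idea in simplified form: it groups the same residue number across all chains and gives the whole group one residue, chosen greedily from the averaged per-position logits. Averaging logits is just one simple way to combine tied positions, chosen here for illustration; the helper names and data layout are hypothetical, so see examples/EXAMPLES.md for how this fork actually specifies symmetry.

```python
from typing import Dict, List, Sequence, Tuple

Position = Tuple[str, int]  # (chain ID, residue number)


def tie_equivalent_positions(chains: Sequence[str], length: int) -> List[List[Position]]:
    """Group residue i of every chain, e.g. [('A', 1), ('B', 1), ('C', 1)]."""
    return [[(chain, i) for chain in chains] for i in range(1, length + 1)]


def assign_tied(groups: List[List[Position]],
                logits: Dict[Position, Sequence[float]],
                alphabet: str) -> Dict[Position, str]:
    """Give every member of a tied group the residue with the best mean logit."""
    design: Dict[Position, str] = {}
    n = len(alphabet)
    for group in groups:
        # Average each residue type's logit over all tied positions.
        mean = [sum(logits[pos][j] for pos in group) / len(group) for j in range(n)]
        best = alphabet[max(range(n), key=lambda j: mean[j])]
        for pos in group:
            design[pos] = best  # identical residue at every tied position
    return design
```

A real sampler would draw from the temperature-scaled softmax of the combined logits instead of taking the argmax; the greedy choice just keeps the example deterministic.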

Unit Testing

TODO

Code organization:

  • run/run_protein_mpnn.py - the main script to initialize and run the model.
  • run/generate_json.py - script that automatically generates a JSON file of design constraints.
  • run/helper_scripts/ - helper functions to parse PDBs, assign which chains to design, fix residues, add amino acid biases, tie residues, etc.
  • examples/ - simple example inputs/outputs and runscripts for different tasks.
  • model_weights/ - trained ProteinMPNN model weights.
    • v_48_... - vanilla ProteinMPNN models trained at different noise levels.
    • s_48_... - SolubleMPNN models trained at different noise levels.
    • ca_48_... - Cα-only models trained at different noise levels.

License

ProteinMPNN is distributed under the MIT license, which can be found at proteinmpnn/LICENSE. See the license file for more details.
