To run example scripts

First, cd to the proteingym folder and run pip install -e .. After that, running scripts/run_model.sh or run_ensemble.sh should work to run the two example scripts.

Library Structure/Adding a New Model

The scoring (and eventually statistics) scripts just rely on each model subclassing the SequenceFitnessModel class in proteingym/wrappers/generic_models.py and then implementing the predict_fitnesses function. So far I've added ESM2, Tranception (without retrieval), a site-independent model and MSA transformer as example models. Ideally, we eventually write all models into this format, and then have a single script (currently at scripts/run_model.py) that will score any model on any of the ProteinGym datasets. There's also a class for an example ensemble (SequenceFitnessModelEnsemble in proteingym/wrappers/model_ensembles.py) to show that, with a uniform interface, we can combine a whole bunch of models together and write really simple ensembling abstractions.

The configurations for each model are defined by json config files, see the example_configs folder, that specify specific parameters for each model. Those parameters are determined by the model class itself, e.g. the TranceptionModel class, and also the relevant parent classes (ProteinLanguageModel and SequenceFitnessProbabilityModel for Tranception). Writing a more detailed configuration description is on my to-do list, but the ones in example_configs are illustrative.

Documentation

So far almost all functions and some classes have docstrings in the Google docstring format. I'm going to go add the rest sometime soon. If all the classes and functions have uniform docstrings in that format, we can autogenerate documentation using Sphinx and host it somewhere, if that would be useful.

Outstanding Things to Add (likely an incomplete list)

Structure models, e.g. ProteinMPNN
The remaining zero-shot models
Code for fine-tuning and supervised models
Logging and writing scores to output files (can be same as current ProteinGym)
statistics script for computing spearmans and other metrics from score files (can also be the same as the current ProteinGym one if the output file formats are the same)
Additional missing documentation for any functions

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
configs		configs
example_data		example_data
proteingym		proteingym
scripts		scripts
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
__init__.py		__init__.py
conf.py		conf.py
index.rst		index.rst
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

To run example scripts

Library Structure/Adding a New Model

Documentation

Outstanding Things to Add (likely an incomplete list)

About

Releases

Packages

Languages

danieldritter/ProteinGym_Dev

Folders and files

Latest commit

History

Repository files navigation

To run example scripts

Library Structure/Adding a New Model

Documentation

Outstanding Things to Add (likely an incomplete list)

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages