Precise Generation of Conformational Ensembles for Intrinsically Disordered Proteins Using Fine-tuned Diffusion Models

Overview

We developed IDPFold, a generative deep learning model that predicts conformational ensembles of intrinsically disordered proteins (IDPs) directly from their sequences using fine-tuned diffusion models. IDPFold requires neither multiple sequence alignments (MSAs) nor experimental data, and it accurately predicts ensemble properties across numerous IDPs.

IDPFold is pretrained on the PDB and fine-tuned on conformational ensembles provided by IDRome. It samples IDP ensembles more precisely than state-of-the-art deep learning models and MD simulations.

The IDPFold codebase is mainly inspired by Str2Str. We thank Jiarui Lu for his valuable suggestions.

Installation

git clone https://github.com/Junjie-Zhu/IDPFold.git
cd IDPFold

# Create a new conda environment
conda env create -f environment.yml
conda activate idpfold

# Install ESM for sequence embedding extraction
pip install fair-esm

# Install IDPFold as a package
pip install -e .
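
As a quick sanity check after the steps above, the following minimal sketch reports which modules cannot be found in the active environment. The module list is an assumption: "esm" is the import name of the fair-esm package, and "torch" is assumed to be installed by environment.yml. The helper name missing_modules is hypothetical and not part of IDPFold.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "esm" is the import name of fair-esm; "torch" is an assumed dependency.
    missing = missing_modules(["esm", "torch"])
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

If a module is reported as missing, check that the idpfold conda environment is active before re-running the install commands.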

After installation, update the .env file, which contains the paths to the datasets. We provide a script for initializing the .env file; run the following command:

python initialize.py
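
Once the .env file exists, it may be worth confirming that the paths in it point to real locations. The sketch below assumes the file uses plain KEY=VALUE lines (the usual .env convention) and treats every value as a path; the actual keys written by initialize.py are not documented here, so the parser is generic and the helper names are hypothetical.

```python
from pathlib import Path


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    entries = {}
    for raw in Path(env_path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        entries[key.strip()] = value.strip().strip("'\"")
    return entries


def missing_paths(entries):
    """Return the keys whose values are empty or do not exist on disk."""
    return [k for k, v in entries.items() if not v or not Path(v).exists()]


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.is_file():
        for key in missing_paths(read_env_file(env_file)):
            print(f"{key} is empty or points to a path that does not exist")
    else:
        print(".env not found; run python initialize.py first")
```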

Inference

To generate conformational ensembles for given sequences:

1. Prepare a FASTA file containing one or more sequences. An example with 3 IDP sequences is provided in data/example.fasta.
2. Obtain a checkpoint file. Our pretrained model checkpoints are available on Google Drive.
3. Run the following commands:

# Extract sequence embeddings
python src/read_seqs.py pred_dir='./data/example.fasta'

# Inference
python src/eval.py ckpt_path='/path/to/ckpt'
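
Before running the commands above on your own sequences, it can help to check the input FASTA file. The following is a minimal sketch, not part of IDPFold: it parses a FASTA file and lists characters outside the 20 standard amino acids. Whether IDPFold accepts non-standard residue codes is not stated here, so treat any flagged character as something to review rather than a definite error. The function names are hypothetical.

```python
from pathlib import Path

# One-letter codes of the 20 standard amino acids.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(path):
    """Return a list of (header, sequence) pairs from a FASTA file."""
    records = []
    header, chunks = None, []
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def nonstandard_residues(sequence):
    """Return the sorted characters that are not standard residue codes."""
    return sorted(set(sequence) - STANDARD_RESIDUES)


if __name__ == "__main__":
    fasta = Path("data/example.fasta")
    if fasta.is_file():
        for name, seq in read_fasta(fasta):
            print(name, len(seq), nonstandard_residues(seq))
```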

Training

To be updated ...

This is a test version of IDPFold. If you have any questions, please either create an issue or contact [email protected] directly.
