Note: We no longer recommend the use of DeepH3 for antibody modeling. Instead, we encourage you to try the new DeepAb.
DeepH3 is a deep residual network architecture that predicts probability distributions of inter-residue distances and angles for CDR H3 loops in antibodies. This work is licensed under CC BY-NC 3.0 (https://creativecommons.org/licenses/by-nc/3.0/). Please cite:
- Ruffolo JA, Guerra C, Mahajan SP, Sulam J, & Gray JJ, "Geometric Potentials from Deep Learning Improve Prediction of CDR H3 Loop Structures," bioRxiv 2020. doi:10.1101/2020.02.09.940254
The ResNet part of the code is re-implemented from https://github.com/KaimingHe/resnet-1k-layers, which was based on https://github.com/facebook/fb.resnet.torch. The network architecture is based on that of Wang et al. (RaptorX-Contact), and the geometric descriptors are based on those of Yang et al. (trRosetta); see the references below.
A model trained on ~1400 antibodies from the SAbDab database is available in deeph3/models/.
Requires torch, tensorboard (2.1 or higher), and biopython (see requirements.txt for the complete list). Install with:
pip3 install -r requirements.txt [--user]
Make sure that your PYTHONPATH environment variable includes the deepH3-distances-orientations/ directory. On Linux, use the following command:
export PYTHONPATH="$PYTHONPATH:/absolute/path/to/deepH3-distances-orientations"
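As a quick sanity check, the sketch below tests whether a directory is listed in PYTHONPATH. The helper name `pythonpath_contains` is hypothetical and is not part of DeepH3; the path in the example is the placeholder from the command above.

```python
import os


def pythonpath_contains(directory, env=None):
    """Return True if `directory` is one of the entries in PYTHONPATH.

    `env` defaults to the current process environment; passing a dict
    makes the function easy to test. (Hypothetical helper, not part of
    the DeepH3 code base.)
    """
    env = os.environ if env is None else env
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    target = os.path.normpath(directory)
    # Skip empty entries, which appear when PYTHONPATH is unset or
    # starts or ends with the path separator.
    return any(os.path.normpath(entry) == target for entry in entries if entry)


if __name__ == "__main__":
    repo = "/absolute/path/to/deepH3-distances-orientations"  # placeholder path
    print("on PYTHONPATH:", pythonpath_contains(repo))
```

If this prints `False` in a new shell, the `export` line was probably not added to your shell startup file.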
To train a model using a non-redundant set of bound and unbound antibodies downloaded from SAbDab, run:
cd deeph3
python3 train.py
By default, structures are selected from SAbDab with paired VH/VL chains, a resolution of 3 Å or better, and at most 99% sequence identity (i.e., the set used in our original preprint).
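To illustrate the first two selection criteria, here is a minimal sketch that filters in-memory records by chain pairing and resolution. The `Entry` class, its field names, and the example records are all hypothetical; this is not the filtering code used by train.py, and the 99% sequence identity step is deliberately left out because it requires sequence clustering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """Hypothetical record describing one antibody structure."""
    pdb_id: str
    heavy_chain: Optional[str]  # chain ID of the VH chain, or None
    light_chain: Optional[str]  # chain ID of the VL chain, or None
    resolution: float           # in angstroms; lower is better


def passes_default_filters(entry, max_resolution=3.0):
    """Keep entries with paired VH/VL chains and resolution <= max_resolution."""
    paired = bool(entry.heavy_chain) and bool(entry.light_chain)
    return paired and entry.resolution <= max_resolution


# Made-up records for illustration only.
examples = [
    Entry("aaaa", "H", "L", 2.1),   # paired, good resolution: kept
    Entry("bbbb", "H", None, 1.8),  # heavy chain only: dropped
    Entry("cccc", "H", "L", 3.4),   # resolution worse than 3 angstroms: dropped
]
kept = [entry.pdb_id for entry in examples if passes_default_filters(entry)]
print(kept)
```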
Other arguments can be listed using the --help or -h option.
Note that you can skip this step, since the model described in our paper is available in this archive (see deeph3/models/).
To predict the binned distance and angle matrices for a given antibody sequence (in a fasta file), run:
cd deeph3
python3 predict.py [--fasta_file [fasta file path] --model [model file path]]
The fasta file must have the following format:
>[PDB ID]:H [heavy chain sequence length]
[heavy chain sequence]
>[PDB ID]:L [light chain sequence length]
[light chain sequence]
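The format above can be produced programmatically. The following is a minimal sketch, assuming that the sequence length in each header is simply the number of residues in the sequence on the next line; the function name `format_deeph3_fasta` and the short example sequences are hypothetical placeholders.

```python
def format_deeph3_fasta(pdb_id, heavy_seq, light_seq):
    """Return FASTA text with one heavy-chain and one light-chain record.

    Each header has the form ">[PDB ID]:[H or L] [sequence length]".
    (Hypothetical helper, not part of the DeepH3 code base.)
    """
    return (
        f">{pdb_id}:H {len(heavy_seq)}\n{heavy_seq}\n"
        f">{pdb_id}:L {len(light_seq)}\n{light_seq}\n"
    )


if __name__ == "__main__":
    # Placeholder ID and truncated sequences, for illustration only.
    print(format_deeph3_fasta("1abc", "EVQLVES", "DIQMTQ"), end="")
```

Writing the returned string to a file gives an input that matches the layout described above.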
Output is in the form of a pickle file ([fasta_file_basename].p) containing the predicted distance and orientation distributions.
See deeph3/data/antibody_dataset/fastas_testrun for an example.
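The pickle output can be read back with the standard library. This sketch makes two assumptions that are not confirmed by the text above: that the basename is the fasta file name with its extension removed, and that you know the directory the file was written to (the `out_dir` parameter is hypothetical). It makes no assumption about the structure of the unpickled object, so inspect it after loading.

```python
import os
import pickle


def prediction_path(fasta_file, out_dir="."):
    """Build the expected path of the pickle file for a given fasta file.

    Assumes "[fasta_file_basename]" means the file name without extension.
    """
    base = os.path.splitext(os.path.basename(fasta_file))[0]
    return os.path.join(out_dir, base + ".p")


def load_predictions(path):
    """Unpickle and return the predictions stored at `path`.

    Only unpickle files you created yourself or otherwise trust, because
    unpickling can execute arbitrary code.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    path = prediction_path("1a0q.fasta")  # hypothetical input file name
    if os.path.exists(path):
        predictions = load_predictions(path)
        print(type(predictions))
```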
Other arguments can be listed using the --help or -h option.
To generate constraint files to use in Rosetta, run:
cd deeph3
python3 generate_constraints.py [--fasta_file [fasta file path] --model [model file path]]
The fasta file must have the following format:
>[PDB ID]:H [heavy chain sequence length]
[heavy chain sequence]
>[PDB ID]:L [light chain sequence length]
[light chain sequence]
By default, the program will run the 1a0q example from the fasta files in data/. Output will go to output_dir/ as a file (for example, 1a0q.constraints) to use in Rosetta as -constraint_file deeph3/output_dir/1a0q.constraints. In turn, that file references a set of data files with spline parameters in output_dir/1a0q.histograms/.
Other arguments can be listed using the --help or -h option.
- Carlos Guerra - cguerramain
- Sai Pooja Mahajan - heiidii
- Jeff Ruffolo - jeffreyruffolo
- K. He, X. Zhang, S. Ren, and J. Sun, "Identity Mappings in Deep Residual Networks," ECCV, 2016. arXiv:1603.05027
- S. Wang, S. Sun, Z. Li, R. Zhang, and J. Xu, "Accurate De Novo Prediction of Protein Contact Map by Ultra-Deep Learning Model," PLOS Computational Biology, vol. 13, no. 1, p. e1005324, 2017. doi:10.1371/journal.pcbi.1005324
- K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. arXiv:1512.03385
- J. Yang, I. Anishchenko, H. Park, Z. Peng, S. Ovchinnikov, and D. Baker, "Improved protein structure prediction using predicted interresidue orientations," Proceedings of the National Academy of Sciences, 2020. PNAS
- B. D. Weitzner, D. Kuroda, N. Marze, J. Xu, and J. J. Gray, "Blind prediction performance of RosettaAntibody 3.0: grafting, relaxation, kinematic loop modeling, and full CDR optimization," Proteins: Structure, Function, and Bioinformatics, vol. 82, no. 8, pp. 1611–1623, 2014. Wiley