A PaddlePaddle and Paddle Graph Learning (PGL) implementation of the structure-based protein function prediction method proposed by Gligorijević et al. [1].
- Python==3.7
- PaddlePaddle==2.2.1
- pgl==2.2.2
- scikit-learn==1.0.1
- tqdm==4.62.3
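To confirm that an environment matches the versions listed above, a small check like the following may help. It is only a sketch and is not part of this repository: the helper names (`installed_version`, `find_mismatches`) are hypothetical, and it assumes the packages are installed under the PyPI distribution names `paddlepaddle`, `pgl`, `scikit-learn`, and `tqdm` (a GPU build of PaddlePaddle is distributed under a different name and would be reported as missing).

```python
# Pinned versions copied from the requirements list above.
# The distribution names are an assumption (CPU build of PaddlePaddle).
PINNED = {
    "paddlepaddle": "2.2.1",
    "pgl": "2.2.2",
    "scikit-learn": "1.0.1",
    "tqdm": "4.62.3",
}


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        from importlib import metadata  # standard library on Python 3.8+
    except ImportError:
        # Python 3.7 lacks importlib.metadata, so fall back to
        # pkg_resources, which ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose installed version differs from its pin
    to an (expected, found) tuple; `found` is None if not installed."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

Run as a script, it prints one line per package that is missing or installed at a different version, and prints nothing when the environment matches the pins.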
The protein structures come from the Protein Data Bank (PDB). The pre-processing and transformation of proteins into graphs can be found here. After pre-processing, copy the data to the ./data folder. The dataset splits (i.e., training, validation, and test) proposed by [1] can be downloaded here or from the authors' repository. After extraction, copy them to the ./data folder as well.
python train.py [params]
Here, params are keyword arguments. See train.py for the list of arguments and their default values.
python test.py --model_name <path-to-saved-model> --label_data_path <path-to-protein-with-their-labels> [more params]
model_name and label_data_path are required arguments; further optional parameters can be added. See test.py for the full list of accepted arguments.
[1] Gligorijević, V., Renfrew, P.D., Kosciolek, T. et al. Structure-based protein function prediction using graph convolutional networks. Nat Commun 12, 3168 (2021).