The PDB dataset can be downloaded by listing the proteins in the file proteins_all.txt and running the Python script download_pdb.py:
python download_pdb.py [--save_dir SAVE_DIR] [--prot_list PROT_LIST]
The arguments are:
--save_dir Where to save downloaded PDB files (default=./PDB_files).
--prot_list Text file with the list of proteins to be downloaded (default=./proteins_all.txt).
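The download step can be sketched as follows. This is a minimal illustration, not the code of download_pdb.py: it assumes that the protein list holds one identifier per line whose first four characters are the PDB ID, and that files are fetched from the public RCSB file server. The names read_protein_list, pdb_url and download_all are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: RCSB's public file server; download_pdb.py may use another source.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def read_protein_list(path):
    """Return the non-empty, stripped lines of a protein list file."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def pdb_url(entry):
    """Build a download URL from the first four characters (assumed PDB ID)."""
    return RCSB_URL.format(pdb_id=entry[:4].upper())


def download_all(prot_list, save_dir):
    """Download every listed structure into save_dir, skipping existing files."""
    out = Path(save_dir)
    out.mkdir(parents=True, exist_ok=True)
    for entry in read_protein_list(prot_list):
        target = out / f"{entry[:4].upper()}.pdb"
        if not target.exists():
            urlretrieve(pdb_url(entry), str(target))
```

Calling download_all("./proteins_all.txt", "./PDB_files") would mirror the default arguments listed above.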
Chain sequences and coordinates are then retrieved from the PDB files by running the script pdb_to_seq_coords.py:
python pdb_to_seq_coords.py [-h] [--pdb_dir PDB_DIR] [--save_dir SAVE_DIR]
The arguments are:
--pdb_dir Protein PDB files directory.
--save_dir Where to save retrieved chain sequences and coordinates.
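To illustrate what retrieving chain sequences and coordinates involves, the sketch below reads the fixed-column ATOM records of a PDB file and keeps one C-alpha coordinate per residue. It is only an assumption that pdb_to_seq_coords.py works from C-alpha atoms; the function name chains_from_pdb_lines and the choice to map non-standard residues to "X" are hypothetical.

```python
from collections import defaultdict

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chains_from_pdb_lines(lines):
    """Collect per-chain sequences and C-alpha coordinates from ATOM records."""
    seqs = defaultdict(str)
    coords = defaultdict(list)
    for line in lines:
        if line.startswith("ENDMDL"):
            break  # keep only the first model of multi-model files
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if line[16] not in (" ", "A"):
            continue  # skip alternate locations other than the first
        chain = line[21]
        # Assumption: unknown residue names are encoded as "X".
        seqs[chain] += THREE_TO_ONE.get(line[17:20], "X")
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        coords[chain].append(xyz)
    return dict(seqs), dict(coords)
```

Passing an open file handle, for example chains_from_pdb_lines(open(path)), yields one sequence string and one coordinate list per chain ID.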
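Protein chain graphs are generated from the retrieved sequences and coordinates by running the script create_chain_graphs.py:
python create_chain_graphs.py [-h] [--cmap_thresh CMAP_THRESH] [--save_dir SAVE_DIR] [--input_dir INPUT_DIR]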
The arguments are:
--cmap_thresh Threshold for contact map.
--save_dir Where to save generated protein chain graphs.
--input_dir Directory containing protein chain sequences and coordinates.
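The role of the contact map threshold can be illustrated with a short sketch: two residues are connected when the distance between their coordinates is below the threshold. The value 10.0 (in angstroms, the unit of PDB coordinates) is only an illustrative choice, not the script's documented default, and the names contact_map and edge_list are hypothetical.

```python
import math


def contact_map(coords, cmap_thresh=10.0):
    """Return a symmetric 0/1 matrix: 1 where two residues lie within cmap_thresh."""
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if math.dist(coords[i], coords[j]) < cmap_thresh:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def edge_list(cmap):
    """List the residue pairs (i, j), i < j, that are in contact."""
    n = len(cmap)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if cmap[i][j]]
```

A smaller threshold gives a sparser graph and a larger one a denser graph, so the value directly controls the number of edges per chain.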
The labels are created from the annotation file by running the script create_labels_npz.py:
python create_labels_npz.py [--annot_file ANNOT_FILE] [--save_dir SAVE_DIR]
The arguments are:
--annot_file Annotation file (default=./nrPDB-GO_2019.06.18_annot.tsv, introduced in [1]).
--save_dir Where to save the generated labels.
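One common way to turn per-chain annotations into labels is a multi-hot encoding, sketched below. This is an assumption about the general approach, not a description of create_labels_npz.py: the sketch takes an already parsed mapping from chain ID to annotation terms and does not reproduce the layout of the annotation file. The name build_label_matrix and the example identifiers are hypothetical.

```python
def build_label_matrix(annotations):
    """Encode {chain: set of terms} as multi-hot rows over a sorted vocabulary."""
    terms = sorted({t for chain_terms in annotations.values() for t in chain_terms})
    index = {term: k for k, term in enumerate(terms)}
    chains = sorted(annotations)
    labels = []
    for chain in chains:
        row = [0] * len(terms)
        for term in annotations[chain]:
            row[index[term]] = 1  # mark every term annotated for this chain
        labels.append(row)
    return chains, terms, labels


# Hypothetical example input: two chains sharing one term.
example = {
    "1ABC-A": {"GO:0003677", "GO:0005524"},
    "2XYZ-B": {"GO:0005524"},
}
chains, terms, labels = build_label_matrix(example)
```

Rows built this way could then be converted to arrays and written with numpy.savez to obtain .npz files.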
[1] Gligorijević, V., Renfrew, P.D., Kosciolek, T. et al. Structure-based protein function prediction using graph convolutional networks. Nat Commun 12, 3168 (2021).