GraphSite Classifier Dataset

The original dataset has 51,677 pockets grouped into 1,301 clusters.

| | small (1-29) | middle (30-199) | large (200-999) | super-large (1000-) |
|---|---|---|---|---|
| number of classes | 1060 | 193 | 42 | 6 |
| number of pockets | 6951 | 11457 | 18580 | 11457 |
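
As a quick sanity check on the table above, the following minimal sketch totals each row and reports the average number of pockets per class in every size bin. The variable names are illustrative, not part of the GraphSite code. The class row is compared against the cluster total quoted above; the pocket row is summed as listed, without assuming it matches the overall pocket count.

```python
# Values copied from the size table above; names are hypothetical.
size_bins = ["small (1-29)", "middle (30-199)", "large (200-999)", "super-large (1000-)"]
classes_per_bin = [1060, 193, 42, 6]
pockets_per_bin = [6951, 11457, 18580, 11457]

total_classes = sum(classes_per_bin)
total_pockets_listed = sum(pockets_per_bin)

print("total classes:", total_classes)
print("total pockets as listed:", total_pockets_listed)

# The average pockets per class should fall inside each bin's size range.
for name, n_classes, n_pockets in zip(size_bins, classes_per_bin, pockets_per_bin):
    print(f"{name}: {n_pockets / n_classes:.1f} pockets per class on average")
```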

Per-graph statistics of the dataset (the mean diameter is inf because some graphs are disconnected):

| | number of nodes | number of edges | density | diameter | average degree |
|---|---|---|---|---|---|
| mean | 142.77 | 765.83 | 0.082 | inf | 10.64 |
| median | 137 | 730 | 0.078 | 10 | 10.64 |
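
To illustrate why the mean diameter is inf while the median stays finite, here is a minimal sketch. It assumes the convention implied above, that a disconnected graph has infinite diameter. The `diameter` helper and the toy graphs are hypothetical and are not taken from the GraphSite code.

```python
import math
from collections import deque
from statistics import median


def diameter(adj):
    """Longest shortest-path length of an undirected graph; inf if disconnected."""
    best = 0
    for src in adj:
        # Breadth-first search from src gives hop distances to reachable nodes.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(adj):
            # Some node is unreachable from src, so the graph is disconnected.
            return math.inf
        best = max(best, max(dist.values()))
    return best


# Toy adjacency lists (assumed examples, not real pockets).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # connected path
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}    # connected triangle
split = {0: [1], 1: [0], 2: [3], 3: [2]}        # two separate components

diameters = [diameter(g) for g in (path, triangle, split)]
mean_diameter = sum(diameters) / len(diameters)  # one inf makes the mean inf
median_diameter = median(diameters)              # the median is unaffected
print(diameters, mean_diameter, median_diameter)
```

A single disconnected graph is enough to push the mean to inf, which is why the median is the more informative summary for the diameter column.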

We select a subset of the original dataset (the largest 30 clusters) to form a new dataset that contains 21,125 binding pockets:

| cluster | description |
|---|---|
| 0 | ATP and related ligands like ADP and ANP |
| 1 | glycol and ether groups that are also structurally closely related |
| 2 | heme |
| 3 | glucopyranose and fructose (carbohydrate-type ligands) |
| 4 | benzene-ring-containing ligands such as benzaldehyde, benzoic acid, phenoxyphenylboronic acids, etc. |
| 5 | dihydroxyethyl ether, glycol |
| 6 | chlorophyll |
| 7 | lipid-containing ligands such as phosphocholine, bromododecanol, tetradecylpropanedioic acids, etc. |
| 8 | glucopyranose (carbohydrate-type ligands) |
| 9 | UMP and thymidine monophosphate, which are ATP-related ligands |
| 10 | essential amino acids like norvaline, lysine, arginine, etc. |
| 11 | ether and glycol |
| 12 | NAD, which is a metabolite of ATP |
| 13 | carbohydrates like alpha-D-galactopyranose and mannopyranose |
| 14 | glucopyranose, pentanediol, trifluoroacetic acid, and phenyllactate, a combination of alcohol, carbohydrate, and acid groups. The predominant ligands are pentanediol and propane-1,3-diols, which are similar to glycols |
| 15 | S-adenosyl-L-homocysteine |
| 16 | citric acid and its derivatives |
| 17 | coenzymeA |
| 18 | pyridoxal phosphate group |
| 19 | lipid and fatty acid ligands like oleic acid, palmitic acid, and hexaenoic acid |
| 20 | methylbenzamide, pentanamide, etc., which are derivatives of benzoic acid (benzene-ring-containing group) |
| 21 | (2R)-2,3-dihydroxypropyl (9Z)-octadec-9-enoate, oleic acid, etc.; these are fatty acids and lipids |
| 22 | hexaethylene glycol, tetraethylene glycol, etc., which belong to the glycols |
| 23 | 2-(2,3-DIHYDROXY-BENZOYLAMINO)-3-HYDROXY-PROPIONIC ACID, benzoic acid, etc., which contain a benzene ring |
| 24 | flavin mononucleotide (FMN). FMN usually takes part in electron transport mechanisms, like coenzymeA and ATP |
| 25 | adenosine, ADP, azamethionine-5'-deoxyadenosine, beta-D-erythrofuranosyl adenosine, etc.; all are ATP-related ligands |
| 26 | 2-(N-morpholino)ethanesulfonic acid group, which contains a morpholine ring |
| 27 | glucopyranose |
| 28 | tartaric acid, tetraglycine phosphinate, 1,3-dihydroxyacetone phosphate |
| 29 | glycerol-1-phosphate, dihydroxyacetone phosphate, glyceraldehyde 3-phosphate, etc. |

Clusters from this subset are then grouped into 14 classes; the glycol-related clusters 1, 5, 11, 14, and 22 do not appear in any class:

| class | clusters | label |
|---|---|---|
| 0 | 0, 9, 12, 25 | ATP |
| 1 | 2 | heme |
| 2 | 3, 8, 13, 27 | carbohydrate |
| 3 | 4 | benzene ring |
| 4 | 6 | chlorophyll |
| 5 | 7, 19, 21 | lipid |
| 6 | 10, 16, 28 | essential amino/citric acids/ tartaric acid |
| 7 | 15 | S-adenosyl-L-homocysteine |
| 8 | 17 | coenzymeA |
| 9 | 18 | pyridoxal phosphate |
| 10 | 20, 23 | benzoic acid |
| 11 | 24 | flavin mononucleotide |
| 12 | 26 | morpholine ring |
| 13 | 29 | phosphate |
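
The class table above can be expressed as a lookup from cluster index to class index. The following is a minimal sketch, not the repository's own code; the names `CLASS_TO_CLUSTERS`, `CLUSTER_TO_CLASS`, and `unassigned` are hypothetical. It also lists the cluster indices in 0-29 that the table does not assign to any class.

```python
# Class index -> cluster indices, copied from the class table above.
CLASS_TO_CLUSTERS = {
    0: [0, 9, 12, 25],   # ATP
    1: [2],              # heme
    2: [3, 8, 13, 27],   # carbohydrate
    3: [4],              # benzene ring
    4: [6],              # chlorophyll
    5: [7, 19, 21],      # lipid
    6: [10, 16, 28],     # essential amino/citric acids/ tartaric acid
    7: [15],             # S-adenosyl-L-homocysteine
    8: [17],             # coenzymeA
    9: [18],             # pyridoxal phosphate
    10: [20, 23],        # benzoic acid
    11: [24],            # flavin mononucleotide
    12: [26],            # morpholine ring
    13: [29],            # phosphate
}

# Invert the table so each cluster index maps to its class index.
CLUSTER_TO_CLASS = {
    cluster: cls
    for cls, clusters in CLASS_TO_CLUSTERS.items()
    for cluster in clusters
}

# Cluster indices from the 30-cluster subset that have no class in the table.
unassigned = sorted(set(range(30)) - set(CLUSTER_TO_CLASS))

print("classes:", len(CLASS_TO_CLUSTERS))
print("assigned clusters:", len(CLUSTER_TO_CLASS))
print("unassigned clusters:", unassigned)
```

For example, `CLUSTER_TO_CLASS[12]` gives the ATP class for the NAD cluster, while looking up an unassigned cluster raises a `KeyError`, which makes accidental use of a dropped cluster easy to catch.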