The original dataset has 51677 pockets clustered into 1301 clusters.
small (1-29) | middle (30-199) | large (200-999) | super-large (1000-) | |
---|---|---|---|---|
number of classes | 1060 | 193 | 42 | 6 |
number of pockets | 6951 | 11457 | 18580 | 11457 |
The statistics of the dataset(mean diameter is inf because there are unconnected graphs):
number of nodes | number of edges | density | diameter | average degree | |
---|---|---|---|---|---|
mean | 142.77 | 765.83 | 0.082 | inf | 10.64 |
median | 137 | 730 | 0.078 | 10 | 10.64 |
We select a subset of the original dataset (the largest 30 clusters) to form a new dataset that contains 21,125 binding pockets:
cluster | description |
---|---|
0 | ATP and its related ligand like ADP, ANP |
1 | glycol and ether groups who are also structurally closely related |
2 | heme |
3 | glucopyranose and fructose ( carbohydrate types of ligand) |
4 | benzene ring containing ligand group such as benzaldehyde, benzoic acid, phenoxyphenylboronic acids etc |
5 | dihydroxyethyl ether, glycol |
6 | chlorophyll |
7 | lipid containing ligands such as phosphocholine, bromododecanol, tetradecylpropanedioic acids etc |
8 | glucopyranose ( carbohydrate types of ligand) |
9 | UMP, thymidine monophosphate which are ATP related ligands |
10 | essential amino acids like Norvaline, lysine, arginine etc |
11 | ether and glycol |
12 | NAD which is the metabolites of ATP |
13 | carbohydrates like alpha-D galactopyranose, manopyranose |
14 | glucopyranose, pentadiol, trifluroactic acid, phenyllactate, which is a combination of alcohol, carbohydrates and acid groups. The predominant group is pentanediol, propane 1,3 diols which are similar to glycols |
15 | S-adenosyl-L-homocysteine |
16 | citric acid and its derivatives |
17 | coenzymeA |
18 | pyridoxal phosphate group |
19 | lipid and fatty acids group of ligands like oleic acid, palmitic acid, hexaenoic acid |
20 | methylbenzamide, pentanamide etc which are derivative of benzoic acid (benzene ring containing group) |
21 | (2R)-2,3-dihydroxy propyl (9Z)-octadec-9-enoate, oleic acid etc . They are fatty acids and lipid groups |
22 | hexaethylene glycol, tetraethylene glycol etc. which are the group of glycols |
23 | 2-(2,3-DIHYDROXY-BENZOYLAMINO)-3-HYDROXY-PROPIONIC ACID, Benzoic acid etc which are the benzene ring containing group |
24 | Flavin Mononucleotide (FMN). FMN usually take part in electron transport mechanism like coenzymeA and ATP. |
25 | Adenosine, ADP, Azamethionine-5'-deoxyadenosine, Beta-D-erythrofuranosyl Adenosine, etc. All are ATP related ligands |
26 | group of 2-N-morpholino ethanesulfonic acid , which contains a morpholine ring |
27 | glucopyranose |
28 | Tartaric acid, tetraglycine phosphinate, 1,3 dihydroxyacetone phosphate |
29 | glycerol-1-phosphate, dihydroxyacetone phosphate, glycerolaldehyde 3 phosphate etc. |
The 30 clusters are then grouped into 14 classes:
class | clusters | label |
---|---|---|
0 | 0, 9, 12, 25 | ATP |
1 | 2 | heme |
2 | 3, 8, 13, 27 | carbohydrate |
3 | 4 | benzene ring |
4 | 6 | chlorophyll |
5 | 7, 19, 21 | lipid |
6 | 10, 16, 28 | essential amino/citric acids/ tartaric acid |
7 | 15 | S-adenosyl-L-homocysteine |
8 | 17 | coenzymeA |
9 | 18 | pyridoxal phosphate |
10 | 20, 23 | benzoic acid |
11 | 24 | flavin mononucleotide |
12 | 26 | morpholine ring |
13 | 29 | phosphate |