Hi, @gordicaleksa. Thank you for your implementation of GAT.
I'm new to GNNs, so I'm not sure whether I understood your code correctly, but I think there is a bug in the feature aggregation in your GATLayer: the direction of aggregation appears to be target -> source.
In your implementation 1, attention scores are calculated as follows:

pytorch-GAT/models/definitions/GAT.py, lines 464 to 470 in 32bd714

The three dimensions of all_attention_coefficients mean (head, src, tgt), and you apply softmax on dim=-1, i.e. dim=2, making the scores sum up to 1 for each attention head and each source node. And then in the aggregation:

pytorch-GAT/models/definitions/GAT.py, lines 476 to 477 in 32bd714
Ignoring the head dimension, this computes: out_nodes_features[i, :] = sum_over_j(all_attention_coefficients[i, j] * nodes_features_proj[j, :]).
all_attention_coefficients is laid out as (head, src, tgt) and nodes_features_proj as (node, feat), where the "node" dim corresponds to the "tgt" dim, so the two dims of out_nodes_features must mean (src, feat).
In other words, all of the code above calculates an attention score for each node as the source of an edge, and aggregates the features of all of its neighboring target nodes.
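To make the shape argument concrete, here is a minimal, self-contained sketch of that pipeline with random stand-in tensors (the names follow the issue, not the repo's actual code, and any masking of non-edges before the softmax is omitted):

```python
import torch

NH, N, F = 8, 4, 16                            # heads, nodes, features per head
all_scores = torch.randn(NH, N, N)             # (head, src, tgt) raw attention logits
nodes_features_proj = torch.randn(NH, N, F)    # (head, node, feat) projected features

# softmax on dim=-1 normalizes across tgt: scores sum to 1 per (head, src) pair
all_attention_coefficients = torch.softmax(all_scores, dim=-1)
print(all_attention_coefficients.sum(dim=-1))  # all ones, shape (NH, N)

# bmm contracts the tgt dim of the coefficients with the node dim of the features:
# out[h, i, :] = sum_j coeff[h, i, j] * feats[h, j, :]
out_nodes_features = torch.bmm(all_attention_coefficients, nodes_features_proj)
print(out_nodes_features.shape)                # (NH, N, F), i.e. dims (head, src, feat)
```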
However, based on my understanding, the feature aggregation in GAT should go in the opposite direction: collecting source nodes into each target.
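For contrast, a sketch of the direction being proposed, where each target aggregates its source (in-) neighbors: normalize over the src dim and contract over src (again stand-in tensors, not a patch against the repo):

```python
import torch

NH, N, F = 8, 4, 16
all_scores = torch.randn(NH, N, N)             # (head, src, tgt)
nodes_features_proj = torch.randn(NH, N, F)    # (head, node, feat)

# Normalize over src so the scores sum to 1 over each target's in-neighbors,
# then contract over src: out[h, j, :] = sum_i coeff[h, i, j] * feats[h, i, :]
coeff = torch.softmax(all_scores, dim=1)                     # (head, src, tgt)
out = torch.bmm(coeff.transpose(1, 2), nodes_features_proj)  # (head, tgt, feat)
```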
Implementation 2 comes with the same problem. I'm still working to understand implementation 3, so I don't know whether the bug persists there.
> feature aggregation in GAT should be in the opposite direction: collecting source nodes into each target

I don't think this is the case in GNNs.

[Equation from the GRL book]

If we think in terms of the adjacency-matrix equation, source-node aggregation driven by the row vectors of A is effectively aggregating the target nodes for each source node.
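A tiny worked example of that row-vector reading, with a hypothetical 3-node directed graph and one-hot features so it's easy to see which nodes get aggregated:

```python
import torch

# A[i, j] = 1 iff there is an edge i -> j: here edges 0->1, 0->2, 1->2
A = torch.tensor([[0., 1., 1.],
                  [0., 0., 1.],
                  [0., 0., 0.]])
H = torch.eye(3)  # one-hot features

print(A @ H)    # row 0 gathers nodes 1 and 2: each source row aggregates its targets
print(A.T @ H)  # rows gather in-neighbors instead: each target aggregates its sources
# For an undirected graph A is symmetric, so the two readings coincide.
```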