Query regarding visualization of attention #6

Open
Sowmya-R-Krishnan opened this issue Feb 23, 2021 · 0 comments


Sowmya-R-Krishnan commented Feb 23, 2021

Thank you @gordicaleksa for the fantastic code and detailed documentation! It has helped me a lot in understanding the details of GAT.
While looking at the visualization functions in the code, I understand that entropy is used because the softmax applied over the attention coefficients brings them into the range [0, 1], so that each node's incoming attention resembles a probability distribution. To obtain the attention coefficients from the GAT layer, the code uses:

def visualize_entropy_histograms(model_name=r'gat_PPI_000000.pth', dataset_name=DatasetType.PPI.name):
    # Fetch the data we'll need to create visualizations
    all_nodes_unnormalized_scores, edge_index, node_labels, gat = gat_forward_pass(model_name, dataset_name)

all_nodes_unnormalized_scores comes from the GAT forward pass:

out_nodes_features = self.skip_concat_bias(attentions_per_edge, in_nodes_features, out_nodes_features)
return (out_nodes_features, edge_index)

In the GAT paper (Petar Veličković et al.), the attention coefficients obtained after the softmax are used to compute the final output node features of the GAT layer. In this GAT implementation:

attentions_per_edge = self.neighborhood_aware_softmax(scores_per_edge, edge_index[self.trg_nodes_dim], num_of_nodes)

the above function gives the attention coefficients in the [0, 1] range. The subsequent functions (self.aggregate_neighbors and self.skip_concat_bias) then produce the final node features of the GAT layer. So is the "all_nodes_unnormalized_scores" variable used in the entropy histogram visualization function still in the range [0, 1]? Or does the entropy histogram visualize the output node features rather than the softmax-normalized attention coefficients?
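To make my understanding of the neighborhood-aware softmax concrete, here is a small numpy sketch of what I believe that step does (this is my own illustration, not the repo's actual implementation): each edge's raw score is exponentiated and divided by the sum of exp-scores over all edges pointing at the same target node, so the coefficients within each neighborhood sum to 1.

```python
import numpy as np

def neighborhood_softmax(scores, trg_index, num_nodes):
    """Softmax over the incoming edges of each target node.
    Illustrative numpy sketch only (the repo uses PyTorch scatter ops)."""
    scores = scores - scores.max()      # subtract max for numerical stability
    exp_scores = np.exp(scores)
    denom = np.zeros(num_nodes)
    np.add.at(denom, trg_index, exp_scores)  # sum exp-scores per neighborhood
    return exp_scores / denom[trg_index]     # normalize each edge by its target's sum

# toy graph: edges 0,1 point at node 0; edges 2,3 point at node 1
scores = np.array([1.0, 2.0, 0.5, 0.5])
trg = np.array([0, 0, 1, 1])
alpha = neighborhood_softmax(scores, trg, num_nodes=2)
# alpha is in [0, 1] and each neighborhood's coefficients sum to 1
```

If this matches what neighborhood_aware_softmax does, then the [0, 1] property holds per neighborhood, which is what makes the entropy interpretation valid.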

I also came across the entropy visualization in a DGL tutorial on GAT (https://docs.dgl.ai/en/0.4.x/tutorials/models/1_gnn/9_gat.html), and there they use the attention coefficients after softmax normalization for the visualization. Sorry if the question is very naive - I'm trying to apply this visualization to one of my projects involving inductive learning. Let me know if I have misunderstood the information being extracted from the GAT layer. Thanks in advance!
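For reference, this is the kind of computation I have in mind when I say "entropy visualization" (again a hypothetical numpy sketch in the spirit of the DGL tutorial, not code from either repo): compute the Shannon entropy of each target node's attention distribution, then histogram those entropies. A uniform neighborhood gives the maximum entropy log(deg), while attention concentrated on one neighbor gives entropy near 0.

```python
import numpy as np

def neighborhood_entropies(alpha, trg_index, num_nodes):
    """Shannon entropy of each target node's attention distribution.
    alpha: softmax-normalized coefficients, one per edge (sketch only)."""
    ent = np.zeros(num_nodes)
    # accumulate -p * log(p) per target node; small eps guards log(0)
    np.add.at(ent, trg_index, -alpha * np.log(alpha + 1e-16))
    return ent

# node 0 attends uniformly over 2 neighbors; node 1 puts all weight on one edge
alpha = np.array([0.5, 0.5, 1.0, 0.0])
trg = np.array([0, 0, 1, 1])
ent = neighborhood_entropies(alpha, trg, num_nodes=2)
# ent[0] is log(2) (uniform attention), ent[1] is ~0 (fully concentrated)
```

If the variable being histogrammed were the output node features instead of these coefficients, the entropy interpretation would no longer apply, which is exactly what I'm trying to confirm.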
