
How Attentive Are GATs? (GAT v2)

Read the paper

Key Ideas

Introduction

Illustration showing that static attention assigns the same ranking of neighbors regardless of the query node.

Preliminaries

Generic GNN layer:

$h_i' = f_\theta\!\left(h_i, \text{AGG}\left(\{h_j : j \in N(i)\}\right)\right)$ — aggregate the neighbor representations, then apply a learned transformation.
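The generic layer can be sketched as follows — a minimal numpy illustration, assuming a hypothetical mean aggregator and a ReLU non-linearity (the specific aggregator and transform vary by architecture):

```python
import numpy as np

def gnn_layer(h, adj, W):
    """Generic message-passing layer (hypothetical mean aggregator):
    aggregate neighbor representations, then apply a learned transform.
    h: (n, d) node features; adj: (n, n) adjacency; W: (d_out, d) weights."""
    n = h.shape[0]
    h_new = np.zeros((n, W.shape[0]))
    for i in range(n):
        neighbors = np.nonzero(adj[i])[0]
        # Aggregate: mean of neighbor representations (zeros if isolated)
        agg = h[neighbors].mean(axis=0) if len(neighbors) else np.zeros(h.shape[1])
        # Transform: linear map followed by ReLU
        h_new[i] = np.maximum(W @ agg, 0.0)
    return h_new
```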

GAT attention scoring:

GAT attention scoring function: $e_{ij} = \text{LeakyReLU}(\vec{a}^T [Wh_i \| Wh_j])$
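The scoring function above translates directly into code — a sketch in numpy, with `leaky_relu` written out as a helper since numpy has no built-in:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # LeakyReLU with GAT's default negative slope of 0.2
    return np.where(x > 0, x, slope * x)

def gat_score(a, W, h_i, h_j):
    """GATv1 score: e_ij = LeakyReLU(a^T [W h_i || W h_j]),
    where || denotes concatenation."""
    z = np.concatenate([W @ h_i, W @ h_j])
    return leaky_relu(a @ z)
```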

GAT attention function:

$\alpha_{ij} = \text{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in N(i)} \exp(e_{ik})}$ — attention coefficients via softmax normalization over the neighborhood.
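The softmax normalization over a node's neighborhood is a few lines of numpy — a sketch with the standard max-subtraction trick for numerical stability:

```python
import numpy as np

def neighborhood_softmax(e):
    """Normalize raw scores e = [e_i1, ..., e_ik] over node i's neighborhood:
    alpha_ij = exp(e_ij) / sum_k exp(e_ik)."""
    e = e - e.max()          # shift for numerical stability; softmax is shift-invariant
    exp_e = np.exp(e)
    return exp_e / exp_e.sum()
```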

GAT layer:

Full GAT layer: $h_i' = \sigma(\sum_{j \in N(i)} \alpha_{ij} W h_j)$
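Putting the three pieces together, a single-head GAT layer can be sketched as below — a simplified numpy version (real implementations are vectorized, multi-head, and use sparse operations; $\sigma$ is assumed to be ReLU here):

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, adj, W, a):
    """Single-head GAT layer: h_i' = sigma(sum_{j in N(i)} alpha_ij W h_j).
    h: (n, d) features; adj: (n, n) adjacency; W: (d_out, d); a: (2*d_out,)."""
    n = h.shape[0]
    Wh = h @ W.T                                   # transform all nodes once
    out = np.zeros_like(Wh)
    for i in range(n):
        nbrs = np.nonzero(adj[i])[0]
        # Raw scores e_ij = LeakyReLU(a^T [W h_i || W h_j]) for each neighbor j
        e = leaky_relu(np.array([a @ np.concatenate([Wh[i], Wh[j]]) for j in nbrs]))
        # Softmax over the neighborhood
        e = e - e.max()
        alpha = np.exp(e) / np.exp(e).sum()
        # Weighted aggregation, then non-linearity (sigma = ReLU here)
        out[i] = np.maximum((alpha[:, None] * Wh[nbrs]).sum(axis=0), 0.0)
    return out
```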

Static vs Dynamic & Limited Expressivity of GAT

Static Attention

A family of scoring functions computes static attention if, for any choice of parameters, there is a single key that scores highest for every query: all query nodes produce the same ranking of the keys.

Dynamic Attention

A family of scoring functions computes dynamic attention if, for any mapping of queries to keys, there exist parameters under which each query assigns its highest score to its mapped key — different queries can attend most strongly to different keys.

Need for a New Scoring Function

Analysis: writing $\vec{a} = [\vec{a}_1 \| \vec{a}_2]$, the GATv1 score decomposes as $e_{ij} = \text{LeakyReLU}(\vec{a}_1^T Wh_i + \vec{a}_2^T Wh_j)$. Since LeakyReLU is monotonically increasing, there exists a single $j_{\max}$ for which $\vec{a}_2^T Wh_{j_{\max}}$ is maximal over all $j \in V$, so every query node ranks that key highest — the attention is static.
GATv2 scoring function: $e_{ij} = \vec{a}^T \text{LeakyReLU}(W_l h_i + W_r h_j)$ — applying attention after the non-linearity enables dynamic attention.
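The GATv2 fix is a one-line change in code — a numpy sketch of the modified scoring function, where the attention vector is applied after the non-linearity:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gatv2_score(a, W_l, W_r, h_i, h_j):
    """GATv2 score: e_ij = a^T LeakyReLU(W_l h_i + W_r h_j).
    Because a is applied after the non-linearity, the score can no longer
    be split into a query-only term plus a key-only term, which is what
    enables dynamic attention."""
    return a @ leaky_relu(W_l @ h_i + W_r @ h_j)
```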

Evaluation

Example graph problem — a synthetic lookup task in which each query node must attend to a different key node — that cannot be solved by GATv1's static attention (every query selects the same key) but is solved by GATv2's dynamic attention.
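The static/dynamic gap can be demonstrated numerically — a sketch with hypothetical random weights and a fully connected attention pattern, showing that GATv1's top-ranked key is provably identical for every query, while GATv2's can differ:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 8                                        # hypothetical sizes for illustration
H = rng.normal(size=(n, d))                        # node features; each node attends to all nodes
W = rng.normal(size=(d, d))
a1, a2 = rng.normal(size=d), rng.normal(size=d)    # a = [a1 || a2] split

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

Wh = H @ W.T

# GATv1: e_ij = LeakyReLU(a1^T W h_i + a2^T W h_j). LeakyReLU is monotone,
# so argmax_j e_ij = argmax_j a2^T W h_j -- identical for every query i.
e_v1 = leaky_relu((Wh @ a1)[:, None] + (Wh @ a2)[None, :])
top_v1 = e_v1.argmax(axis=1)                       # the same j_max for all queries

# GATv2: e_ij = a^T LeakyReLU(W_l h_i + W_r h_j). The query enters before
# the non-linearity, so different queries can rank the keys differently.
W_l, W_r = rng.normal(size=(d, d)), rng.normal(size=(d, d))
a = rng.normal(size=d)
e_v2 = leaky_relu((H @ W_l.T)[:, None, :] + (H @ W_r.T)[None, :, :]) @ a
top_v2 = e_v2.argmax(axis=1)

print("GATv1 top key per query:", top_v1)          # constant vector
print("GATv2 top key per query:", top_v2)          # typically varies across queries
```

The constant `top_v1` holds for any parameter setting, which is why no GATv1 model can fit a task where each query must retrieve a different key.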