A Generalization of Transformer Networks to Graphs

Paper

Key ideas

Architecture

On Graph Sparsity

On Positional Encodings

Laplacian positional encodings: factorize the graph Laplacian matrix into its eigenvectors and use the k smallest non-trivial eigenvectors as positional features for each node.
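A minimal sketch of this idea, assuming a dense NumPy adjacency matrix and the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}; the function name and interface are illustrative, not from the paper's code:

```python
import numpy as np

def laplacian_positional_encoding(adj, k):
    """Sketch: eigenvectors of the normalized graph Laplacian as node PEs.

    adj : (n, n) symmetric adjacency matrix
    k   : number of positional-encoding dimensions per node
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for a symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(lap)
    # Skip the trivial constant eigenvector; keep the k next-smallest.
    return eigvecs[:, 1:k + 1]
```

Note that eigenvector signs are arbitrary, which is why the paper randomly flips PE signs during training.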

Graph Transformer Architecture

Diagram of the Graph Transformer architecture, similar to the original Transformer.
Node update equations for the Graph Transformer.
A feed-forward network with residual connections is applied to the attention outputs.
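The node update described above can be sketched as follows. This is a simplified single-head layer assuming dense NumPy arrays; the normalization layers and edge features the paper also uses are omitted, and all function and weight names are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_transformer_layer(h, adj, Wq, Wk, Wv, W1, W2):
    """Sketch of one Graph Transformer node update (single head, no norm):
    attention restricted to graph neighbors, then a residual FFN.

    h   : (n, d) node features
    adj : (n, n) adjacency matrix defining which nodes may attend to which
    """
    n, d = h.shape
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    attn_out = np.zeros_like(h)
    for i in range(n):
        nbrs = np.nonzero(adj[i])[0]  # graph sparsity: attend to neighbors only
        if nbrs.size:
            scores = (q[i] @ k[nbrs].T) / np.sqrt(d)
            attn_out[i] = softmax(scores) @ v[nbrs]
    h = h + attn_out                        # residual around attention
    h = h + np.maximum(h @ W1, 0.0) @ W2    # residual around the ReLU FFN
    return h
```

Restricting the score computation to each node's neighborhood is exactly the "graph sparsity" point above: attention costs scale with the number of edges rather than n².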

Discussion

References

Battaglia et al. (2018), Relational inductive biases, deep learning, and graph networks