GNNs usually require task-specific labeled data, which is arduous to obtain.
Idea: pre-train an expressive GNN on unlabeled data with self-supervision, then transfer the learned model to downstream tasks with only a few labels.
The likelihood of graph generation is decomposed into two components: attribute generation and edge generation.
Introduction
GNNs are used for semi-supervised node classification, recommendation systems, and knowledge graph inference.
Input: a graph with node attributes; convolutional filters generate node-level representations layer by layer.
Analogous to the BERT pre-trained model: train on an unlabeled corpus, then transfer the model to downstream tasks with few labels.
Existing neural graph-generation techniques don't work for GNN pre-training because they generate graph structure without attributes; they also scale poorly to large graphs.
Diagram showing the GPT-GNN framework: attribute generation and edge generation as a joint optimization problem.
Jointly optimizing attribute generation and edge generation is equivalent to maximizing the likelihood of the whole attributed graph.
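This joint objective can be written as an autoregressive factorization of the attributed-graph likelihood. A hedged sketch following the GPT-GNN formulation (node ordering and notation assumed):

```latex
% Likelihood of the attributed graph, factorized over a node ordering,
% where X_i are node i's attributes, E_i its edges, and \theta the GNN parameters:
p_\theta(X, E) \;=\; \prod_{i} p_\theta\big(X_i, E_i \mid X_{<i}, E_{<i}\big)
```

Each conditional then splits into the two pre-training components: attribute generation, $p_\theta(X_i \mid X_{<i}, E_{<i})$, and edge generation given the already-generated attributes.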
Pre-training was successfully run on OAG (Open Academic Graph), with 179M nodes and 2B edges.
Preliminaries and Related Work
Let $H_t^{(l)}$ be the representation of node $t$ at the $l$-th GNN layer.
$N(t)$ is the set of source nodes of node $t$; $E(s,t)$ is the set of edges from $s$ to $t$.
GNN layer update, combining Aggregate and Extract over the neighborhood:
$$H_t^{(l)} = \underset{\forall s \in N(t),\, \forall e \in E(s,t)}{\text{Aggregate}}\Big(\text{Extract}\big(H_s^{(l-1)};\, H_t^{(l-1)},\, e\big)\Big)$$
Extract: pulls useful information out of each source node's representation, conditioned on the target node and the edge between them. Aggregate: combines the extracted neighborhood messages (e.g., mean, max, sum).
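The Extract/Aggregate update above can be sketched concretely. This is a minimal illustration, not the paper's implementation: Extract is assumed to be a shared linear map on the source representation (ignoring edge features), Aggregate is a mean, and the adjacency format and ReLU nonlinearity are illustrative choices.

```python
import numpy as np

def gnn_layer(H, neighbors, W):
    """One GNN layer: for each target node t, Extract messages from its
    source nodes s in N(t), then Aggregate them by mean.

    H         : (num_nodes, d_in) representations from layer l-1
    neighbors : dict mapping target node t -> list of source nodes s
    W         : (d_in, d_out) weights of the (assumed linear) Extract step
    """
    H_next = np.zeros((H.shape[0], W.shape[1]))
    for t in range(H.shape[0]):
        srcs = neighbors.get(t, [])
        if not srcs:
            H_next[t] = H[t] @ W          # isolated node: self message only
            continue
        messages = H[srcs] @ W            # Extract: transform each source state
        H_next[t] = messages.mean(axis=0) # Aggregate: mean over the neighborhood
    return np.maximum(H_next, 0.0)        # elementwise ReLU nonlinearity

# Usage: 4 nodes with 3-d features, projected to 2-d representations.
rng = np.random.default_rng(0)
H0 = rng.normal(size=(4, 3))
W = rng.normal(size=(3, 2))
nbrs = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
H1 = gnn_layer(H0, nbrs, W)
print(H1.shape)  # (4, 2)
```

Stacking such layers gives the layer-by-layer node representations described above; real GNNs would also condition Extract on the target state $H_t^{(l-1)}$ and the edge $e$.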