GNNs are inherently difficult to train in parallel because of the edge dependencies.
Alternatives are storing the whole graph in memory (very expensive) or using third-party storage (very slow).
Instead: generate for each node a K-hop neighborhood, an information-complete subgraph.
Merge values from in-edge neighbors and propagate them to out-edge neighbors via MapReduce.
Trains a 2-layer GNN on a graph with 2B nodes and 100B edges in 14h.
Introduction
Built for the heterogeneous graph at Ant Financial, based on message passing: merging and propagation.
Each node's K-hop embedding is computed via message passing:
merging information from in-edge neighbors and propagating the merged information to neighbors along out-edges.
Each K-hop neighborhood is a subgraph.
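A minimal single-process sketch of this K-hop generation loop (GraphFlat runs the same idea as K distributed MapReduce rounds; the function and variable names here are illustrative, not AGL's API):

```python
from collections import defaultdict

def khop_neighborhoods(edges, node_feats, k):
    """Sketch: after k rounds, state[v] holds the features and edges of
    v's K-hop in-neighborhood (an information-complete subgraph).
    edges: list of directed (src, dst) pairs; node_feats: {node: feature}."""
    # each node starts knowing only its own feature and no edges
    state = {v: ({v: feat}, set()) for v, feat in node_feats.items()}

    for _ in range(k):
        inbox = defaultdict(list)
        # "map": every node sends its current neighborhood along out-edges
        for src, dst in edges:
            inbox[dst].append((src, state[src]))
        # "reduce": every node merges what its in-edge neighbors sent
        new_state = {}
        for v, (feats, sub_edges) in state.items():
            feats, sub_edges = dict(feats), set(sub_edges)
            for src, (src_feats, src_edges) in inbox.get(v, []):
                feats.update(src_feats)
                sub_edges |= src_edges | {(src, v)}
            new_state[v] = (feats, sub_edges)
        state = new_state
    return state
```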
System
AGL system architecture diagram showing GraphFlat, GraphTrainer, and GraphInfer components.
GraphFlat is an efficient, distributed generator that builds K-hop neighborhoods via message passing.
These neighborhoods are flattened into protobuf strings.
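A sketch of the flattening step for one target node; AGL's actual protobuf schema is not public, so JSON stands in here purely for illustration:

```python
import json

def flatten_neighborhood(target_node, feats, sub_edges):
    """Serialize one K-hop neighborhood into a single string keyed by its
    target node (AGL stores one protobuf string per target node)."""
    record = {
        "target": target_node,
        "nodes": [{"id": n, "feature": f} for n, f in feats.items()],
        "edges": [{"src": s, "dst": d} for s, d in sorted(sub_edges)],
    }
    return json.dumps(record)

# one row per target node, e.g. written to a key-value table:
# rows = {v: flatten_neighborhood(v, *state[v]) for v in state}
```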
GraphTrainer leverages techniques such as pipelining, pruning, and edge partitioning to reduce I/O overhead and optimize floating-point computation.
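A rough sketch of the pipelining idea, assuming a reader that yields parsed batches of serialized neighborhoods; the queue-based design and names are illustrative, not GraphTrainer's interface:

```python
import queue
import threading

def prefetch(batch_iter, capacity=4):
    """Overlap I/O with compute: a background thread reads and parses the
    next batches while the trainer consumes the current one."""
    q = queue.Queue(maxsize=capacity)
    sentinel = object()

    def producer():
        for batch in batch_iter:
            q.put(batch)
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = q.get()
        if batch is sentinel:
            break
        yield batch

# for batch in prefetch(read_and_parse("khop_table")):  # hypothetical reader
#     loss = model(batch); loss.backward(); optimizer.step()
```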
GraphInfer is a distributed inference module that splits a K-layer GNN model into K slices and applies message passing K times via MapReduce.
GraphInfer maximally reuses each node's embeddings: every intermediate embedding at the k-th layer is propagated to the next round of message passing, which significantly speeds up inference.
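A sketch of this layer-wise inference, assuming each slice is a callable that combines a node's embedding with its in-edge messages; the interface is hypothetical:

```python
from collections import defaultdict

def graph_infer(edges, h0, layers):
    """Split a K-layer GNN into K slices and run one MapReduce-style round
    per slice, so each node's k-th layer embedding is computed once and
    propagated to the next round instead of being recomputed per subgraph.
    h0: {node: initial embedding}; layers: list of K callables
    layer(self_emb, neighbor_embs) -> new_emb (illustrative signature)."""
    h = dict(h0)
    for layer in layers:
        inbox = defaultdict(list)
        # map: send the current embedding along out-edges
        for src, dst in edges:
            inbox[dst].append(h[src])
        # reduce: aggregate in-edge messages and apply this layer's slice
        h = {v: layer(emb, inbox.get(v, [])) for v, emb in h.items()}
    return h  # final K-layer embeddings for every node
```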
We can divide an industrial-scale graph into a massive number of tiny K-hop neighborhoods w.r.t. their target nodes in advance, and load one or a batch of them, rather than the entire graph, into memory during training.
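One possible way to batch such neighborhoods is to relabel each subgraph into one disjoint (block-diagonal) batch graph; this is a common mini-batching trick for independent subgraphs, not necessarily AGL's exact format:

```python
def merge_batch(subgraphs):
    """Merge a list of (features, edges) K-hop neighborhoods into a single
    disjoint batch graph so only the batch, not the full graph, is in memory."""
    batch_feats, batch_edges, offsets = [], [], []
    offset = 0
    for feats, edges in subgraphs:
        ids = {n: offset + i for i, n in enumerate(feats)}  # local relabelling
        batch_feats.extend(feats[n] for n in feats)
        batch_edges.extend((ids[s], ids[d]) for s, d in edges)
        offsets.append(offset)
        offset += len(feats)
    return batch_feats, batch_edges, offsets
```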
GraphFlat message passing and K-hop neighborhood generation diagram.
GraphTrainer pipeline and optimization diagram.
GraphInfer distributed inference with MapReduce diagram.