DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs
Key Ideas
- Based on the DGL (Deep Graph Library).
- Synchronous training; mini-batches are formed from sampled ego-networks (a target vertex plus its sampled multi-hop neighborhood).
- Min-cut partitioning algorithm.
Introduction
- Training GNNs is challenging because of the inherent dependencies between nodes (vertices): a node's neighbors may reside on a completely different machine.
- Each mini-batch must incorporate the samples a target depends on (1-hop, 2-hop, etc. neighbors), so the sampling algorithm is important.
- As opposed to regular distributed NN training, where the majority of network traffic comes from exchanging gradients, distributed GNN traffic comes from fetching the features of hundreds of neighbor vertices per mini-batch.
Background: Mini-Batch Training
- Sample a set of $N$ vertices, uniformly at random from the training set.
- Pick at most $K$ neighbor vertices (fan-out) for each target.
- Compute the target vertex representations by gathering messages.
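The three steps above can be sketched in plain Python. This is an illustrative stand-in for DGL's neighbor-sampling dataloader, not DistDGL's actual implementation; `adj`, `fanouts`, and the dict-of-lists graph representation are assumptions made for the sketch.

```python
import random

def sample_minibatch(adj, train_nodes, batch_size, fanouts, seed=0):
    """Sample a mini-batch: N target vertices drawn uniformly from the
    training set, plus at most K neighbors per vertex at each hop
    (fanouts[i] is the fan-out K for hop i).

    adj: dict mapping vertex id -> list of neighbor ids (assumed layout).
    Returns the targets and one {vertex: sampled neighbors} dict per hop.
    """
    rng = random.Random(seed)
    targets = rng.sample(train_nodes, batch_size)  # N targets, uniform
    frontier = list(targets)
    blocks = []
    for k in fanouts:
        hop = {}
        for v in frontier:
            nbrs = adj.get(v, [])
            # pick at most K neighbors (the fan-out cap)
            hop[v] = list(nbrs) if len(nbrs) <= k else rng.sample(nbrs, k)
        blocks.append(hop)
        # the sampled neighbors become the next hop's frontier
        frontier = sorted({u for ns in hop.values() for u in ns})
    return targets, blocks
```

Message gathering (the actual GNN aggregation) then runs only over the vertices collected in `blocks`, which is what keeps the per-batch working set small.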
System Design: DistDGL
Samplers
Sample the mini-batch graph structures from the input graph.
KVStore
Distributed store for all vertex and edge data, from which trainers fetch the features needed by each mini-batch.
- A custom KVStore instead of Redis or similar, because node/edge features must be co-located with the graph partitions.
- Efficient updates for sparse embeddings.
- Flexible partition policies to map data to different machines.
- Supports sparse embedding for training transductive models with learnable vertex embeddings.
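The co-location idea can be sketched as a partition-aware pull: resolve each vertex id to its owning partition, read locally when possible, and batch the rest into one request per remote machine. The class and the `remote_fetch` hook are illustrative names for this note, not DistDGL's API.

```python
class PartitionedKVStore:
    """Sketch of a partition-aware feature store: each machine holds the
    features of its own partition's vertices; pull() only goes over the
    network for vertices owned by other partitions."""

    def __init__(self, part_of, local_part, local_feats, remote_fetch):
        self.part_of = part_of            # vertex id -> partition id (the partition policy)
        self.local_part = local_part      # partition hosted on this machine
        self.local_feats = local_feats    # vertex id -> feature, local shard only
        self.remote_fetch = remote_fetch  # callable(part, ids) -> {id: feature}

    def pull(self, vertex_ids):
        out, remote = {}, {}
        for v in vertex_ids:
            p = self.part_of[v]
            if p == self.local_part:
                out[v] = self.local_feats[v]        # fast local read, no network
            else:
                remote.setdefault(p, []).append(v)  # group by owning machine
        for p, ids in remote.items():
            out.update(self.remote_fetch(p, ids))   # one batched request per machine
        return out
```

The better the partitioning co-locates a mini-batch's vertices with the trainer, the smaller the `remote` dict gets, which is exactly why a generic store like Redis (which hashes keys without regard to the graph partitioning) is a poor fit here.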
Trainers
- Fetch the mini-batch graphs from the samplers and corresponding vertex/edge features from KVStore.
- Run forward/backward in parallel.
- Dense model parameters are updated synchronously: gradients are averaged across trainers before each update.
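Synchronous dense updates amount to an all-reduce that averages gradients so every trainer applies the identical step. A pure-Python stand-in (in practice this would be something like `torch.distributed.all_reduce` over a process group, which is an assumption about the stack, not stated in these notes):

```python
def allreduce_mean(grads_per_trainer):
    """Average dense-model gradients element-wise across trainers.
    grads_per_trainer: list of equal-length gradient vectors, one per trainer.
    Returns the averaged gradient every trainer would apply."""
    n = len(grads_per_trainer)
    dim = len(grads_per_trainer[0])
    return [sum(g[i] for g in grads_per_trainer) / n for i in range(dim)]
```

Sparse (embedding) gradients are the exception: only the rows touched by the mini-batch change, so they are pushed to the KVStore instead of all-reduced, which is why the KVStore needs efficient sparse updates.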
Graph Partitioning
Uses METIS:
- Densely connected vertices are assigned to the same partition to reduce the number of edge cuts.
- Minimizing the edge cut improves GNN training performance because fewer neighbor lookups have to go to remote KVStore servers.
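The objective METIS approximately minimizes is easy to state: the fraction of edges whose endpoints land in different partitions. A small helper to measure it for a given assignment (edge list and partition map are assumed representations for this sketch):

```python
def edge_cut(edges, part_of):
    """Fraction of edges crossing partition boundaries.
    edges: iterable of (u, v) pairs; part_of: vertex id -> partition id.
    Every cut edge forces a remote fetch during neighborhood sampling."""
    edges = list(edges)
    cut = sum(1 for u, v in edges if part_of[u] != part_of[v])
    return cut / len(edges)
```

Comparing this number across candidate partitionings gives a quick proxy for how much cross-machine traffic mini-batch sampling will generate.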