Gluon: A Communication-Optimizing Substrate for Distributed Heterogeneous Graph Analytics

Published in ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2018

Recommended citation: Roshan Dathathri, Gurbinder Gill, Loc Hoang, Hoang-Vu Dang, Alex Brooks, Nikoli Dryden, Marc Snir, Keshav Pingali, “Gluon: A Communication-Optimizing Substrate for Distributed Heterogeneous Graph Analytics,” Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2018. https://dl.acm.org/authorize?N668550

(Download publication here) (Download slides here) (Watch presentation here) (Download source code here)

This work was also presented at a mini-symposium in SIAM CSE 2019 (Download slides here).

Supplementary material: the patch used to analyze and compare against Gemini can be downloaded here.

Abstract

This paper introduces a new approach to building distributed-memory graph analytics systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies, and programming models. The key to this approach is Gluon, a communication-optimizing substrate.

Programmers write applications in a shared-memory programming system of their choice and interface these applications with Gluon using a lightweight API. Gluon enables these programs to run on heterogeneous clusters and optimizes communication in a novel way by exploiting structural and temporal invariants of graph partitioning policies.

To demonstrate Gluon’s ability to support different programming models, we interfaced Gluon with the Galois and Ligra shared-memory graph analytics systems to produce distributed-memory versions of these systems named D-Galois and D-Ligra, respectively. To demonstrate Gluon’s ability to support heterogeneous processors, we interfaced Gluon with IrGL, a state-of-the-art single-GPU system for graph analytics, to produce D-IrGL, the first multi-GPU distributed-memory graph analytics system.

Our experiments were done on CPU clusters with up to 256 hosts and roughly 70,000 threads and on multi-GPU clusters with up to 64 GPUs. The communication optimizations in Gluon improve end-to-end application execution time by ∼2.6x on the average. D-Galois and D-IrGL scale well and are faster than Gemini, the state-of-the-art distributed CPU graph analytics system, by factors of ∼3.9x and ∼4.9x, respectively, on the average.