CuSP: A Customizable Streaming Edge Partitioner for Distributed Graph Analytics

Published in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019

Recommended citation: Loc Hoang, Roshan Dathathri, Gurbinder Gill, Keshav Pingali, “CuSP: A Customizable Streaming Edge Partitioner for Distributed Graph Analytics,” Proceedings of the 33rd IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2019.

(Download publication here) (Download slides here) (Download source code here)

Abstract

Graph analytics systems must analyze graphs with billions of vertices and edges which require several terabytes of storage. Distributed-memory clusters are often used for analyzing such large graphs since the main memory of a single machine is usually restricted to a few hundreds of gigabytes. This requires partitioning the graph among the machines in the cluster. Existing graph analytics systems usually come with a built-in partitioner that incorporates a particular partitioning policy, but the best partitioning policy is dependent on the algorithm, input graph, and platform. Therefore, built-in partitioners are not sufficiently flexible. Stand-alone graph partitioners are available, but they too implement only a small number of partitioning policies.

This paper presents CuSP, a fast streaming edge partitioning framework which permits users to specify the desired partitioning policy at a high level of abstraction and generates high-quality graph partitions fast. For example, it can partition wdc12, the largest publicly available web-crawl graph, with 4 billion vertices and 129 billion edges, in under 2 minutes for clusters with 128 machines. Our experiments show that it can produce quality partitions 6x faster on average than the state-of-the-art stand-alone partitioner in the literature while supporting a wider range of partitioning policies.