Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining

Published in ACM International Conference on Supercomputing (ICS), 2021

Recommended citation: Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Loc Hoang, and Keshav Pingali, "Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining", Proceedings of the ACM International Conference on Supercomputing (ICS), June 2021. https://dl.acm.org/doi/10.1145/3447818.3460359

(Download publication here) (Download slides here) (Download source code here)

Abstract

Graph pattern mining (GPM) is a key building block in diverse applications, including bioinformatics, chemical engineering, social network analysis, recommender systems and security. Existing GPM frameworks either provide high-level interfaces for productivity at the cost of expressiveness or provide expressive low-level interfaces at the cost of increased programming complexity. They also lack the flexibility to explore combinations of optimizations to achieve performance competitive with hand-optimized applications.

We present Sandslash, an in-memory graph pattern mining framework that uses a novel programming interface to support productive, expressive, and efficient GPM on large graphs. Sandslash provides a high-level API that only needs a specification of the GPM problem from the user, and the system implements fast subgraph enumeration, provides efficient data structures, and applies high-level optimizations automatically. To achieve performance competitive with expert-optimized implementations, Sandslash also provides a low-level API that allows users to express algorithm-specific optimizations. This enables Sandslash to support both high-productivity and high-efficiency without losing expressiveness. We evaluate Sandslash using five GPM applications and a wide range of real-world graphs. Experimental results demonstrate that applications written using Sandslash’s high-level or low-level API outperform those in state-of-the-art GPM systems AutoMine, Pangolin, and Peregrine on average by 13.8x, 7.9x, and 5.4x, respectively. We also show that these Sandslash applications outperform expert-optimized GPM implementations by 2.3x on average with less programming effort.