Graph Systems and Algorithms

Hardware-Software Co-Design for Efficient Graph Application Computations on Emerging Architectures

Graph databases and applications have attracted much attention in the past few years due to the efficiency with which they can represent big data, connecting different layers of data structures and allowing analysis while preserving contextual relationships. This has resulted in a fast-growing community that has been developing various database and algorithmic innovations in this area, many of which will be gathering together in this conference. We joined this field as computer architecture researchers and are currently building a complete hardware-software design, called DECADES, that aims to accelerate the execution of these algorithms. From a computer architecture point of view, applications involving dense matrix operations such as neural networks have garnered much attention for their acceleration through specialized hardware such as GPUs and TPUs, while graph applications remain difficult to improve even with modern specialized accelerator designs. The reason for this is the characteristic pointer-based data structures of graph applications and the resulting irregular memory accesses performed by many of these workloads. Such irregular memory accesses result in memory latency bottlenecks that dominate the total execution time. In this talk, as part of the DECADES infrastructure, we present an elegant hardware-software codesign solution, named FAST-LLAMAs, to overcome these memory-bottlenecks, and thus, accelerate graph and sparse applications in an energy efficient way.
Graph databases and applications have attracted much attention in the past few years due to the efficiency with which they can represent big data, connecting different layers of data structures and allowing analysis while preserving contextual relationships. This has resulted in a fast-growing community that has been developing various database and algorithmic innovations in this area, many of which will be gathering together in this conference. We joined this field as computer architecture researchers and are currently building a complete hardware-software design, called DECADES, that aims to accelerate the execution of these algorithms. From a computer architecture point of view, applications involving dense matrix operations such as neural networks have garnered much attention for their acceleration through specialized hardware such as GPUs and TPUs, while graph applications remain difficult to improve even with modern specialized accelerator designs. The reason for this is the characteristic pointer-based data structures of graph applications and the resulting irregular memory accesses performed by many of these workloads. Such irregular memory accesses result in memory latency bottlenecks that dominate the total execution time. In this talk, as part of the DECADES infrastructure, we present an elegant hardware-software codesign solution, named FAST-LLAMAs, to overcome these memory-bottlenecks, and thus, accelerate graph and sparse applications in an energy efficient way. We propose a 40 minute talk which includes a rigorous characterization of the problem, and an in-depth analysis of our software-hardware co-design solution, FAST LLAMAs. We will present results based on a simulated model of our system which show significant performance improvements (up to 8x), as well as energy improvements (up to 20x) on a set of fundamental graph algorithms and important real-world datasets. Our system is completely open-source, and includes a compiler and cycle-accurate simulator. Our proposed system is compatible and easily extendable to many of the open-source graph analytic and database frameworks and we are excited to engage with the open-source community of this increasingly important domain. The work is part of a large collaboration from three academic groups: Margaret Martonosi (PI Princeton), David Wentzlaff (PI Princeton), Luca Carloni (PI Columbia) with students/researchers: Juan L. Aragón (U. of Murcia, Spain), Jonathan Balkind, Ting-Jung Chang, Fei Gao, Davide Giri, Paul J. Jackson, Aninda Manocha, Opeoluwa Matthews, Tyler Sorensen, Esin Türeci, Georgios Tziantzioulis, and Marcelo Orenes Vera. In addition to the submission author, portions of the talk may be offered by others in the collaboration.

Additional information

Type devroom

More sessions

2/1/20
Graph Systems and Algorithms
Vincent Cave
AW1.121
Python has proven to be a popular choice for data scientists in the domain of graph analytics. The multitude of freely available frameworks and python packages allow to develop applications quickly through ease of expressibility and reuse of code. With petabytes of data generated everyday and an ever evolving landscape of hardware solutions, we observe a graph processing framework should expose the following characteristics: ease of use, scalability, interoperability across data formats, and ...
2/1/20
Graph Systems and Algorithms
Sylvain Baubeau
AW1.121
Graffiti is the graph engine of Skydive - an open source networking analysis tool. Graffiti was created from scratch to provide the features required by Skydive : distributed, replicated, store the whole history of the graph, allow subcribing to events on the graph using WebSocket and visualization.
2/1/20
Graph Systems and Algorithms
Max Kießling
AW1.121
Graph algorithms play an increasingly important role in real-world applications. The Neo4j Graph Algorithms library contains a set of ~50 graph algorithms covering a lot of different problem domains. In our talk, we’ll present the architecture of the library and demonstrate the different execution phases using a real world example.
2/1/20
Graph Systems and Algorithms
Muhammad Osama
AW1.121
Gunrock is a CUDA library for graph-processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on vertex or edge frontiers. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies, particularly in the area of fine-grained load balancing, with a high-level programming model that allows programmers to quickly develop new graph ...
2/1/20
Graph Systems and Algorithms
Stijn Eyerman
AW1.121
Large scale graph analytics is essential to analyze relationships in big data sets. Thereto, the DARPA HIVE program targets a leap in power efficient graph analytics. In response to this program, Intel proposes the Programmable Unified Memory Architecture (PUMA). Based on graph workload analysis insights, PUMA consists of many multi-threaded cores, fine-grained memory and network accesses, a globally shared address space and powerful offload engines. In this talk, we will describe the PUMA ...
2/1/20
Graph Systems and Algorithms
AW1.121
In this talk we will introduce enhancements to the Cypher graph query language, enabling queries spanning multiple graphs, intended for use in sharding and federation scenarios. We will also present our experience with sharding the LDBC Social Network Benchmark dataset.
2/1/20
Graph Systems and Algorithms
Ben Steer
AW1.121
Temporal graphs capture the development of relationships within data throughout time. This model fits naturally within a streaming architecture, where new events can be inserted directly into the graph upon arrival from a data source, being compared to related entities or historical state. However, the vast majority of graph processing systems only consider traditional graph analysis on static data, with some outliers supporting batched updating and temporal analysis across graph snapshots. This ...