Graph Systems and Algorithms

AMENDMENT Raphtory: Streaming analysis of distributed temporal graphs

AW1.121
Ben Steer
Temporal graphs capture how relationships within data develop over time. This model fits naturally within a streaming architecture, where new events can be inserted directly into the graph as they arrive from a data source and compared with related entities or historical state. However, the vast majority of graph processing systems only consider traditional graph analysis on static data, with a few outliers supporting batched updates and temporal analysis across graph snapshots. This talk will cover recent work on defining a temporal graph model that can be updated via event streams, and on investigating the challenges of distribution and graph maintenance.
Notable challenges include partitioning a graph built from a stream, with the additional complexity of managing trade-offs between structural locality (proximity to neighbours) and temporal locality (proximity to an entity's history); synchronising graph state across the cluster and handling out-of-order updates without a central ground truth that would limit scalability; and managing memory constraints and performing analysis in parallel with ongoing update ingestion.
To address these challenges, we introduce Raphtory, a system which maintains temporal graphs over a distributed set of partitions, ingesting and processing parallel updates in near real-time. Raphtory's core components are Graph Routers and Graph Partition Managers. Graph Routers attach to a given input stream and convert raw data into graph updates, forwarding each update to the Graph Partition Manager handling the affected entity. Each Graph Partition Manager holds a partition of the overall graph and inserts updates into the histories of affected entities at the correct chronological position. This removes the need for centralised synchronisation, as commands may be executed in any arrival order whilst resulting in the same history.
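The router and partition manager roles described above can be illustrated with a minimal Python sketch. This is not Raphtory's actual code or API: the class names, the modulo-based partitioning and the `(vertex_id, timestamp, alive)` update shape are all assumptions made for illustration. The sketch only shows why inserting each update at its chronological position makes the resulting history independent of arrival order.

```python
import bisect
import random
from collections import defaultdict


class EntityHistory:
    """Time-ordered record of when an entity was added (True) or removed (False)."""

    def __init__(self):
        self.events = []  # kept sorted as (timestamp, alive) tuples

    def apply(self, timestamp, alive):
        event = (timestamp, alive)
        idx = bisect.bisect_left(self.events, event)
        # Ignore exact duplicates so that a re-delivered update is harmless.
        if idx < len(self.events) and self.events[idx] == event:
            return
        # Insert at the chronological position, whatever the arrival order.
        self.events.insert(idx, event)


class PartitionManager:
    """Hypothetical stand-in: holds the histories of the vertices it owns."""

    def __init__(self):
        self.vertices = defaultdict(EntityHistory)

    def apply(self, vertex_id, timestamp, alive):
        self.vertices[vertex_id].apply(timestamp, alive)


class Router:
    """Hypothetical stand-in: forwards each update to the owning partition."""

    def __init__(self, partitions):
        self.partitions = partitions

    def route(self, vertex_id, timestamp, alive):
        # Assumption: integer vertex ids, partitioned by simple modulo.
        target = self.partitions[vertex_id % len(self.partitions)]
        target.apply(vertex_id, timestamp, alive)


def ingest(updates, n_partitions=3):
    """Feed updates through a router and return each partition's histories."""
    partitions = [PartitionManager() for _ in range(n_partitions)]
    router = Router(partitions)
    for vertex_id, timestamp, alive in updates:
        router.route(vertex_id, timestamp, alive)
    return [{v: h.events for v, h in p.vertices.items()} for p in partitions]


updates = [(1, 10, True), (2, 12, True), (1, 15, False), (1, 20, True), (2, 30, False)]
shuffled = updates[:]
random.Random(0).shuffle(shuffled)

# In-order and out-of-order delivery produce identical histories.
print(ingest(updates) == ingest(shuffled))
```

Because each update carries its own timestamp and histories are kept sorted, no partition needs to coordinate with another before applying an update, which is the property the abstract relies on to avoid a central ground truth.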
To deal with memory constraints, Partition Managers both compress older history and set an absolute threshold for memory usage. If this threshold is met, a cut-off point is established and all updates prior to this time are transferred to offline storage. Once the cluster is established and ingesting the selected input, analysis on the graph is permitted via Analysis Managers. These connect to the cluster and broadcast requests to all Partition Managers, which execute the algorithm. Analysis may be run on the live graph (the most up-to-date version), at any point back through its history, or as a temporal query over a range of time. Additionally, multiple Analysis Managers may operate concurrently on the graph, with previously unseen algorithms compiled at run-time, allowing ongoing analysis to be modified without re-ingesting the data. Raphtory is an ongoing project, but it is open source and available for use now. It is fully containerised for ease of installation and deployment, and much work has gone into making it simple for users to ingest their own data sources, create custom routers and perform their desired analysis. The proposed talk will discuss the benefits of viewing data as a temporal graph, the current version of Raphtory and how to get involved with the project. We shall also touch on several areas of possible expansion at the end, for discussion with those interested.
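The three kinds of analysis mentioned above (live graph, a past point in time, and a time range) can be sketched in Python as follows. All names here (`alive_at`, `view_at`, `view_range`, `view_live`, `broadcast`) and the sample data are hypothetical and are not part of Raphtory's API. The sketch assumes each partition stores, per vertex, a sorted list of `(timestamp, alive)` events, and that a vertex is alive at time t if its latest event at or before t is an addition.

```python
import bisect

# Hypothetical partitions: vertex id -> sorted (timestamp, alive) events.
partitions = [
    {"a": [(1, True)], "b": [(2, True), (5, False)]},
    {"c": [(4, True), (6, False), (9, True)]},
]


def alive_at(events, time):
    """State set by the latest event at or before `time` (False if none)."""
    idx = bisect.bisect_right(events, (time, True))
    return idx > 0 and events[idx - 1][1]


def view_at(partition, time):
    """Vertices of one partition that are alive at a single point in time."""
    return {v for v, ev in partition.items() if alive_at(ev, time)}


def view_range(partition, start, end):
    """Vertices of one partition alive at any moment in [start, end]."""
    return {
        v
        for v, ev in partition.items()
        if alive_at(ev, start) or any(alive and start < t <= end for t, alive in ev)
    }


def broadcast(partitions, query, *args):
    """Run the same query on every partition and merge the partial results."""
    result = set()
    for partition in partitions:
        result |= query(partition, *args)
    return result


def view_live(partitions):
    """Live view: the point-in-time view at the newest timestamp seen anywhere."""
    latest = max((t for p in partitions for ev in p.values() for t, _ in ev), default=0)
    return broadcast(partitions, view_at, latest)


print(sorted(broadcast(partitions, view_at, 3)))        # point in the past
print(sorted(broadcast(partitions, view_range, 3, 5)))  # range of time
print(sorted(view_live(partitions)))                    # most up-to-date version
```

In this sketch each query only reads the histories, so several such queries could run side by side against the same partitions, which mirrors the abstract's point about multiple Analysis Managers operating concurrently.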
The intended audience for this talk is a mixture of data scientists and graph engineers. It will be fairly high level, but will introduce some interesting ideas about how to view data through the lens of a temporal graph, as well as novel systems solutions for distribution, maintenance and processing.

Additional information

Type devroom

More sessions

2/1/20
Graph Systems and Algorithms
Vincent Cave
AW1.121
Python has proven to be a popular choice for data scientists in the domain of graph analytics. The multitude of freely available frameworks and Python packages allows applications to be developed quickly through ease of expressibility and reuse of code. With petabytes of data generated every day and an ever-evolving landscape of hardware solutions, we observe that a graph processing framework should expose the following characteristics: ease of use, scalability, interoperability across data formats, and ...
2/1/20
Graph Systems and Algorithms
Sylvain Baubeau
AW1.121
Graffiti is the graph engine of Skydive, an open source networking analysis tool. Graffiti was created from scratch to provide the features required by Skydive: distribution, replication, storage of the whole history of the graph, subscription to graph events over WebSocket, and visualization.
2/1/20
Graph Systems and Algorithms
Max Kießling
AW1.121
Graph algorithms play an increasingly important role in real-world applications. The Neo4j Graph Algorithms library contains a set of ~50 graph algorithms covering a lot of different problem domains. In our talk, we’ll present the architecture of the library and demonstrate the different execution phases using a real world example.
2/1/20
Graph Systems and Algorithms
Muhammad Osama
AW1.121
Gunrock is a CUDA library for graph-processing designed specifically for the GPU. It uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on vertex or edge frontiers. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies, particularly in the area of fine-grained load balancing, with a high-level programming model that allows programmers to quickly develop new graph ...
2/1/20
Graph Systems and Algorithms
AW1.121
Graph databases and applications have attracted much attention in the past few years due to the efficiency with which they can represent big data, connecting different layers of data structures and allowing analysis while preserving contextual relationships. This has resulted in a fast-growing community that has been developing various database and algorithmic innovations in this area, many of which will be gathering together in this conference. We joined this field as computer architecture ...
2/1/20
Graph Systems and Algorithms
Stijn Eyerman
AW1.121
Large-scale graph analytics is essential for analyzing relationships in big data sets. To that end, the DARPA HIVE program targets a leap in power-efficient graph analytics. In response to this program, Intel proposes the Programmable Unified Memory Architecture (PUMA). Based on graph workload analysis insights, PUMA consists of many multi-threaded cores, fine-grained memory and network accesses, a globally shared address space and powerful offload engines. In this talk, we will describe the PUMA ...
2/1/20
Graph Systems and Algorithms
AW1.121
In this talk we will introduce enhancements to the Cypher graph query language, enabling queries spanning multiple graphs, intended for use in sharding and federation scenarios. We will also present our experience with sharding the LDBC Social Network Benchmark dataset.