Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/4416
Title: Workload-sensitive approaches to improving graph data partitioning online
Authors: Firth, Hugo Edward Boswell
Issue Date: 2018
Publisher: Newcastle University
Abstract: Many modern applications, from social networks to network security tools, rely upon the graph data model, using it as part of an offline analytics pipeline or, increasingly, for storing and querying data online, e.g. in a graph database management system (GDBMS). Unfortunately, effective horizontal scaling of this graph data reduces to the NP-Hard problem of “k-way balanced graph partitioning”. Owing to the problem’s importance, several practical approaches exist, producing quality graph partitionings. However, these existing systems are unsuitable for partitioning online graphs, either introducing unnecessary network latency during query processing, being unable to efficiently adapt to changing data and query workloads, or both. In this thesis we propose partitioning techniques which are efficient and sensitive to given query workloads, suitable for application to online graphs and query workloads. To incrementally adapt partitionings in response to workload change, we propose TAPER: a graph repartitioner. TAPER uses novel data structures to compute the probability of expensive inter-partition traversals (ipt) from each vertex, given the current workload of path queries. Subsequently, it iteratively adjusts an initial partitioning by swapping selected vertices amongst partitions, heuristically maintaining low ipt and high partition quality with respect to that workload. Iterations are inexpensive thanks to time and space optimisations in the underlying data structures. To incrementally create partitionings in response to graph growth, we propose Loom: a streaming graph partitioner. Loom uses another novel data structure to detect common patterns of edge traversals when executing a given workload of pattern matching queries. Subsequently, it employs a probabilistic graph isomorphism method to incrementally and efficiently compare sub-graphs in the stream of graph updates to these common patterns. Matches are assigned within individual partitions if possible, thereby also reducing ipt and increasing partitioning quality with respect to the given workload. Both partitioner and repartitioner are extensively evaluated with real and synthetic graph datasets and query workloads. The headline results include that TAPER can reduce ipt by up to 80% over a naive existing partitioning and can maintain this reduction in the event of workload change, through additional iterations. Meanwhile, Loom reduces ipt by up to 40% over a state-of-the-art streaming graph partitioner.
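As a rough illustration of the quantity the abstract refers to, the sketch below counts inter-partition traversals (ipt) for a toy workload and checks partition balance. It is not TAPER's or Loom's algorithm, only a minimal example of the metric under assumed inputs; the function names, the vertex labels and the traversal list are all hypothetical.

```python
from collections import Counter


def inter_partition_traversals(assignment, traversals):
    """Count edge traversals whose endpoints lie in different partitions."""
    return sum(1 for u, v in traversals if assignment[u] != assignment[v])


def imbalance(assignment, k):
    """Largest partition size divided by the ideal size n / k (1.0 is perfectly balanced)."""
    sizes = Counter(assignment.values())
    return max(sizes.values()) / (len(assignment) / k)


# Hypothetical workload: each pair is one edge traversal made by some query.
traversals = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "d")]

# Two balanced 2-way partitionings of the same four vertices.
before = {"a": 0, "b": 1, "c": 0, "d": 1}
after = {"a": 0, "b": 0, "c": 1, "d": 1}

ipt_before = inter_partition_traversals(before, traversals)
ipt_after = inter_partition_traversals(after, traversals)
```

Both assignments are equally balanced, but the second keeps the frequently traversed edge between `a` and `b` inside one partition, so fewer traversals cross a partition boundary.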
Description: PhD Thesis
URI: http://theses.ncl.ac.uk/jspui/handle/10443/4416
Appears in Collections: School of Computing Science

Files in This Item:
File                 Description   Size        Format
Firth H 2018.pdf     Thesis        1.52 MB     Adobe PDF
dspacelicence.pdf    Licence       43.82 kB    Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.