On optimising rescaling operations of streaming application

Omoregbee, Paul Osagie

Please use this identifier to cite or link to this item: http://theses.ncl.ac.uk/jspui/handle/10443/6198

Title:	On optimising rescaling operations of streaming application
Authors:	Omoregbee, Paul Osagie
Issue Date:	2023
Publisher:	Newcastle University
Abstract:	The primary objective of an auto-scaler is to allocate resources to meet demand while adapting to varying workloads in distributed streaming systems. This is done in order to achieve low latency and high throughput. However, achieving optimal performance in the auto-scaling of stream processing applications can be challenging due to factors such as workload patterns and application state sizes. Most auto-scaling systems assume that application state sizes and offered load remain unchanged during scaling intervals. In rapidly changing workload environments, however, long scaling durations may exacerbate suboptimal parallelism decisions, because additional state may accrue during a rescaling interval and trigger multiple rescalings. This repeated rescaling may negatively impact the system’s performance. Similarly, accurately measuring processing capacity is crucial for optimal performance in streaming applications. It helps ensure that the system can handle the application’s data volume and processing requirements without introducing bottlenecks or increasing latency. Furthermore, relying on conventional techniques, such as using offered load as a proxy for application state size, can be misleading, especially in window-based applications where the two measures may not align. The stateful nature of individual window configurations creates a trade-off between memory usage and processing throughput. This can result in a false positive, causing a premature scaling decision and reduced throughput. We address these challenges by empirically evaluating the interplay between application state size, end-to-end checkpoint duration, and the duration of scaling procedures.
Long checkpointing intervals can lead to longer recovery durations due to the accumulation of more state, while short intervals can lead to high processing overhead due to checkpoint frequency and potential checkpoint overlap (a delay in a preceding checkpoint, influenced by state size). Based on our findings, we develop predictive models to provide future auto-scalers with intelligence to inform scaling decisions. Next, we conduct empirical evaluations to assess the relationship between state size and operator throughput in a streaming application. We explore the impact of window selectivity, where the length of the window and the sliding period can affect the effectiveness and efficiency of streaming applications. In stateful operations, offered load is accumulated in a buffer until the end of the window is reached, at which point the buffer is processed. Given that these buffers are retained in memory, a surge in offered load or larger windows may result in a rapid expansion of buffer size, thereby causing a spike in memory consumption. We therefore demonstrate, first, how growing application state sizes can spuriously decrease operator throughput and trigger premature scale-up or scale-down decisions and, second, the impact of windowing on instantaneous state size.
Description:	PhD Thesis
URI:	http://hdl.handle.net/10443/6198
Appears in Collections:	School of Computing

Files in This Item:

File	Description	Size	Format
Omoregbee P O 2023.pdf		2.73 MB	Adobe PDF
dspacelicence.pdf		43.82 kB	Adobe PDF
