Performance Modeling and Predictive Scheduling for Distributed Stream Data Processing
Abstract: In a distributed stream data processing system, an application is usually modeled as a directed graph in which each vertex corresponds to a data source or a processing unit, and edges indicate data flow. In this paper, we propose a novel predictive scheduling framework to enable fast and distributed stream data processing, which features topology-aware modeling for performance prediction and predictive scheduling. For prediction, we present a topology-aware method to accurately predict the average tuple processing time of an application for a given scheduling solution, according to the topology of the application graph and runtime statistics. For scheduling, we present an effective algorithm to assign threads to machines under the guidance of prediction results. To validate and evaluate the proposed framework, we implemented it based on a highly regarded distributed stream data processing platform, Storm, and tested it with three representative applications: word count (stream version), log stream processing, and continuous query. Extensive experimental results show that 1) the topology-aware prediction method offers an average accuracy of 84.2 percent, and 2) the predictive scheduling framework reduces the average tuple processing time by 24.9 percent on average, compared to Storm’s default scheduler.
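To illustrate the general idea of prediction-guided scheduling described in the abstract, the following is a minimal sketch, not the paper's method. It assumes a toy cost model in which a thread slows down in proportion to the number of threads sharing its machine, and each edge adds a fixed local or remote transfer cost. The names `predict_cost` and `greedy_schedule`, the cost constants, and the sample topology are all hypothetical. The paper's predictor is topology-aware and driven by runtime statistics, which this stand-in does not attempt to reproduce.

```python
def predict_cost(edges, proc_ms, assignment, remote_ms=1.0, local_ms=0.1):
    """Toy predictor (an assumption, not the paper's model).

    Estimates a cost for a possibly partial assignment of threads to machines:
    per-thread processing time scaled by machine load, plus a transfer cost
    for every edge whose two endpoints are both already placed.
    """
    # Load of a machine = number of threads currently placed on it.
    load = {}
    for machine in assignment.values():
        load[machine] = load.get(machine, 0) + 1

    # Processing cost: co-located threads slow each other down linearly.
    cost = sum(proc_ms[t] * load[m] for t, m in assignment.items())

    # Transfer cost: cheap within a machine, expensive across machines.
    for src, dst in edges:
        if src in assignment and dst in assignment:
            same = assignment[src] == assignment[dst]
            cost += local_ms if same else remote_ms
    return cost


def greedy_schedule(threads, edges, proc_ms, machines):
    """Place threads one at a time on the machine with the lowest predicted cost."""
    assignment = {}
    for t in threads:
        best = min(
            machines,
            key=lambda m: predict_cost(edges, proc_ms, {**assignment, t: m}),
        )
        assignment[t] = best
    return assignment


# Hypothetical three-stage topology in the spirit of a streaming word count.
threads = ["spout", "split", "count"]
edges = [("spout", "split"), ("split", "count")]
proc_ms = {"spout": 0.2, "split": 0.5, "count": 0.8}
machines = ["m1", "m2"]

assignment = greedy_schedule(threads, edges, proc_ms, machines)
print(assignment)
```

A greedy pass like this only evaluates one thread at a time, so it can miss better global placements. It is meant to show how a predictor can steer placement decisions, not to match the scheduling algorithm evaluated in the paper.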