Cost-Efficient Tasks and Data Co-Scheduling with AffordHadoop
Abstract: With today’s massive jobs spanning thousands of tasks each, cost-optimality has become more important than ever. Modern distributed data processing paradigms can be significantly more sensitive to cost than to makespan, especially for long jobs deployed in commercial clouds. This paper posits that dollar costs cannot be minimized unless data and tasks are scheduled simultaneously. We introduce the problem of cost-efficient co-scheduling for highly data-intensive jobs in the cloud, such as MapReduce jobs. We show that while the problem can be solved in polynomial time in some cases, the general problem is NP-hard. We propose to tackle the problem by using integer programming techniques coupled with heuristic reduction and optimization to enable a near-real-time solution. AffordHadoop, a pluggable co-scheduler for Hadoop, is implemented as an example of such a co-scheduler. AffordHadoop can save up to 48% of the overall dollar costs when compared to existing schedulers, and it provides significant flexibility in fine-tuning the cost-performance tradeoff.