Towards Long-View Computing Load Balancing in Cluster Storage Systems
Abstract– In large-scale computing clusters, when the server storing a task’s requested data does not have sufficient computing
capacity for the task, current job schedulers either schedule the task to the closest server and transmit to it the requested data, or
let the task wait until the server has sufficient computing capacity. The former solution generates network load while the latter solution
increases task delay. To handle this problem, load balancing methods are needed to reduce the number of servers overloaded by computing workloads. However, current load balancing methods do not aim to balance the computing load over the long term. Through trace analysis, we demonstrate the diversity of computing workloads across tasks and the necessity of balancing the computing workloads among servers. We then propose a cost-efficient Computing load Aware and Long-View load balancing approach (CALV). CALV is novel in that it achieves long-term computing load balance by migrating out of an overloaded server the data blocks that, across the epochs of a time period, contribute more computing workload when the server is more overloaded and less computing workload when the server is more underloaded. Based on the task schedules, we further propose a task reassignment algorithm that, before CALV is conducted, reassigns tasks from an overloaded server to other data servers of the tasks to make it non-overloaded. The above methods apply to tasks whose submission times and execution latencies can be predicted. To handle unexpected tasks or insufficiently accurate predictions, we propose a dynamic load balancing method in which an overloaded server, in order to become non-overloaded, dynamically redirects tasks to other data servers of the tasks, or replicates the tasks' requested data to other servers and redirects the tasks to those servers. Finally, we propose a proximity-aware, tree-based distributed load balancing method to reduce the reallocation cost and improve the scalability of CALV. Trace-driven experiments in simulation and on a real computing cluster show that CALV outperforms other methods in terms of computing workload balance and cost efficiency.
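The long-view selection idea can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the scoring rule (each block's per-epoch load weighted by the server's signed distance from capacity), the greedy loop, and the names `long_view_score` and `select_blocks_to_migrate` are all assumptions made for illustration.

```python
from typing import Dict, List


def long_view_score(block_load: List[float],
                    server_load: List[float],
                    capacity: float) -> float:
    """Hypothetical score: a block's per-epoch load weighted by how far the
    server is above (+) or below (-) capacity in that epoch."""
    return sum(b * (s - capacity) for b, s in zip(block_load, server_load))


def select_blocks_to_migrate(blocks: Dict[str, List[float]],
                             capacity: float) -> List[str]:
    """Greedily pick blocks to migrate out until no epoch is overloaded.

    `blocks` maps a block id to the computing load it contributes in each
    epoch (assumed predictable from the task schedule).
    """
    if not blocks:
        return []
    n_epochs = len(next(iter(blocks.values())))
    # Server load per epoch is the sum of its blocks' loads in that epoch.
    server_load = [sum(load[e] for load in blocks.values())
                   for e in range(n_epochs)]
    remaining = dict(blocks)
    chosen: List[str] = []
    while remaining and max(server_load) > capacity:
        # Prefer the block whose load is concentrated in overloaded epochs.
        best = max(remaining,
                   key=lambda name: long_view_score(remaining[name],
                                                    server_load, capacity))
        chosen.append(best)
        load = remaining.pop(best)
        server_load = [s - b for s, b in zip(server_load, load)]
    return chosen


# Two epochs, capacity 6: the server is overloaded in epoch 0 (load 9)
# and underloaded in epoch 1 (load 5).
example = {"x": [4.0, 0.0], "y": [2.0, 4.0], "z": [3.0, 1.0]}
print(select_blocks_to_migrate(example, capacity=6.0))
```

In this example, block `x` is selected because all of its load falls in the overloaded epoch, whereas block `y` carries more total load but mostly in the epoch where the server has spare capacity.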