Repair Tree: Fast Repair for Single Failure in Erasure-coded Distr ibuted Storage Systems
Abstract– In order to guarantee data reliability, erasure codes have been used in distributed storage systems. Nevertheless, this mechanism suffers from the repair problem that excess data are needed to repair a single failure, causing both high bandwidth consuming for the network and heavy computing load on the replacement node. To reduce repair traffic, researchers pointed out the tradeoff between storage and repair traffic and proposed regenerating codes by combining network coding. However, the combination only focuses on the storage terminal and the construction of the codes is quite complicated. Therefore, this paper further combines network coding with network structure and proposes a repair tree model based on general erasure codes to simplify the repair procedure. By decomposing repair computing and distributing it among the tree nodes, our model can mitigate the computing tension. The performance of repair tree is analyzed and evaluated by preliminary emulation. The result shows it can make about 3 times faster computing than conventional measure and the repair throughput is doubled if there are network bottlenecks. For proper topology, it can significantly reduce the repair traffic. We present algorithms to generate trees across the network topology. At last, we present the idea of extending repair tree to repair multiple failures.
sales on Site11,021