Reliability of Large Scale Disk Arrays
Abstract
Performance and reliability are major concerns in the design of large disk arrays. Hellerstein et al. pioneered the study of erasure-resilient codes, which allow data to be reconstructed without loss in the presence of disk failures. Chee, Colbourn, and Ling used the close connection between erasure-resilient codes and certain combinatorial designs to establish much improved asymptotic and exact existence results for these codes. This design-theoretic approach provided the scientific basis for the project. In the subsequent sections, we first provide the relevant background on the design of erasure codes for RAID, contrasting them with the more extensively studied erasure codes for digital communications. We then summarize highlights of the research in the now-completed ARO project. Our research on codes for disk arrays revealed an unexpected means of optimizing I/O performance through appropriate orderings of codewords. Indeed, our simulation results show a marked improvement in performance when codewords are ordered so that consecutive sets of codewords exhibit maximum overlap. We undertook an investigation of optimal orderings for triple erasure codes and obtained substantial results on orderings for double erasure codes.
Document Details
- Document Type: Technical Report
- Publication Date: May 29, 2001
- Accession Number: ADA391340
Entities
People
- Charles J. Colbourn
Organizations
- University of Vermont