Sawmill: A Logging File System for a High-Performance RAID Disk Array

Abstract

The widening disparity between processor speeds and disk performance is causing an increasing I/O performance gap. One method of increasing disk bandwidth is through arrays of multiple disks (RAIDs). In addition, to prevent the file server from limiting disk performance, new controller architectures connect the disks directly to the network so that data movement bypasses the file server. These developments raise two questions for file systems: how to get the best performance from a RAID, and how to use such a controller architecture. This thesis describes the implementation of a high-bandwidth log-structured file system called "Sawmill" that uses a RAID disk array. Sawmill runs on the RAID-II storage system; this architecture provides a fast data path that moves data rapidly among the disks, high-speed controller memory, and the network. By using a log-structured file system, Sawmill avoids the high cost of small writes to a RAID. Small writes through Sawmill are a factor of three faster than writes to the underlying RAID. Sawmill also uses new techniques to obtain better bandwidth from a log-structured file system. The thesis also examines how a file system can take advantage of the data path and controller memory of a storage system such as RAID-II. Sawmill uses a stream-based approach instead of a block cache to permit large, efficient transfers. Sawmill can read at up to 21 MB/s and write at up to 15 MB/s while running on a fairly slow (19 SPECmarks) Sun-4 workstation. In comparison, existing file systems provide less than 1 MB/s on the RAID-II architecture because they perform inefficient small operations and don't take advantage of the data path of RAID-II. In many cases, Sawmill performance is limited by the relatively slow server CPU, suggesting that the system would be able to handle larger and faster disk arrays simply by using a faster processor.
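The cost of small writes mentioned in the abstract can be illustrated with a minimal sketch. It assumes a parity-protected stripe in the style of RAID level 5, which the abstract does not spell out, and all names (`small_write`, `full_stripe_write`, the 4-byte blocks) are hypothetical rather than taken from Sawmill. Updating one block in place needs a read-modify-write cycle of four disk accesses, while a log-structured file system can batch new data into a full stripe and compute parity in memory. The I/O counts are illustrative only and are not meant to reproduce the factor-of-three measurement reported above.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(blocks):
    """Parity block of a stripe: XOR of all its data blocks."""
    return reduce(xor_blocks, blocks)


def small_write(stripe, parity, index, new_data):
    """Update one block in place (read-modify-write).

    Returns (new_parity, disk_accesses). Assumed model: 4 accesses.
    """
    old_data = stripe[index]      # access 1: read old data
    old_parity = parity           # access 2: read old parity
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    stripe[index] = new_data      # access 3: write new data
    return new_parity, 4          # access 4: write new parity


def full_stripe_write(new_blocks):
    """Write a whole stripe at once, as a log would.

    Parity is computed in memory, so no reads are needed.
    Returns (parity, disk_accesses).
    """
    return parity_of(new_blocks), len(new_blocks) + 1


# Four data blocks of 4 bytes each, plus one parity block.
stripe = [bytes([i]) * 4 for i in range(1, 5)]
parity = parity_of(stripe)

# In-place update of a single block.
parity, small_ios = small_write(stripe, parity, 2, b"\xff" * 4)
assert parity == parity_of(stripe)  # parity stays consistent

# Same data written as one full stripe.
_, full_ios = full_stripe_write(stripe)

print("accesses per block, small write:", small_ios)
print("accesses per block, full stripe:", full_ios / len(stripe))
```

In this model the in-place update costs four accesses for one block, whereas the full-stripe write costs five accesses for four blocks.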

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 1995
Accession Number
ADA637038

Entities

People

  • Kenneth W. Shirriff

Organizations

  • University of California, Berkeley

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Application Protocols
  • Application Software
  • Bandwidth
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Data Storage Systems
  • Databases
  • Device Drivers
  • Local Area Networks
  • Mass Storage
  • Network Protocols
  • Operating Systems
  • Servers (Computer Hardware)
  • Transport Protocols

Fields of Study

  • Computer science

Readers

  • Computer Networking
  • Parallel and Distributed Computing