Understanding Customer Dissatisfaction with Underutilized Distributed File Servers

Abstract

Modern distributed file systems very successfully cache file data on client machines. While this ensures that average response time is low, it also ensures large variance in response time, because operations that must contact remote servers are much slower. Direct measurement of these remote servers shows that their overall utilization can be quite low (3% in our data), while users are simultaneously dissatisfied enough with performance to pay for a faster server. This study shows that the faster server is in fact needed because, although 97% idle overall, these file servers can be intensely overloaded during bursts of activity, leading to periods of poor response time long enough to disgruntle users. In addition to focusing our attention on burst server loads, our analysis shows that the distribution of operation types during bursts differs from the overall distribution. Servers should be optimized for workloads with much more data transfer than the overall distribution suggests. These results confirm our intuition that network-attached storage, if it can re-route most data transfer directly to storage devices, has the potential to reduce customer response time in two ways: (1) it avoids the copying steps at the server; and (2) it off-loads the work of data transfer from the server, reducing the chance of a burst of overutilization. Our future work, then, is to evaluate client performance on such network-attached storage architectures and demonstrate the implications for distributed file system design.
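
The gap between overall and burst utilization described in the abstract can be illustrated with a minimal Python sketch. The trace below is synthetic and purely illustrative, not the report's data; the function name `windowed_utilization`, the 10-second window, and the 0.9 saturation threshold are assumptions chosen for the example.

```python
def windowed_utilization(busy_fraction, window):
    """Average a per-second busy fraction over consecutive fixed-size windows."""
    result = []
    for start in range(0, len(busy_fraction), window):
        chunk = busy_fraction[start:start + window]
        result.append(sum(chunk) / len(chunk))
    return result


# Synthetic trace (assumption): 1000 one-second samples of server busy fraction.
# The server is idle except for a single 30-second burst where it is fully busy.
trace = [0.0] * 1000
for second in range(500, 530):
    trace[second] = 1.0

# Overall utilization averages the burst away: 30 busy seconds out of 1000.
overall = sum(trace) / len(trace)

# Utilization over 10-second windows exposes the burst.
per_window = windowed_utilization(trace, window=10)
peak = max(per_window)

# Count windows where the server is effectively saturated (threshold is an assumption).
saturated = sum(1 for u in per_window if u >= 0.9)

print(f"overall utilization: {overall:.0%}")
print(f"peak 10-second utilization: {peak:.0%}")
print(f"saturated windows: {saturated} of {len(per_window)}")
```

In this toy trace the long-run average looks nearly idle, yet every request arriving during the burst meets a fully busy server, which is the kind of effect the abstract attributes to real file servers.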

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 1996
Accession Number
ADA461115

Entities

People

  • Erik Riedel
  • Garth Gibson

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Computer Programming
  • Computer Science
  • Computers
  • Computing System Architectures
  • Data Analysis
  • Data Transmission
  • Device Drivers
  • Directories
  • Information Science
  • Mass Storage
  • Measurement
  • Network Protocols
  • Network Science
  • Operating Systems
  • Servers (Computer Hardware)
  • Software Development
  • Workload

Fields of Study

  • Computer science

Readers

  • Computer Networking
  • Parallel and Distributed Computing