Work stealing for interactive services to meet target latency
Abstract
Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas the optimization of parallel programs and distributed server systems has historically focused on average latency and throughput, the primary metric for interactive applications is consistent responsiveness, i.e., minimizing the number of requests that miss a target latency. This paper is the first to show how to generalize work-stealing, which is traditionally used to minimize the makespan of a single parallel job, to optimize for a target latency in interactive services with multiple parallel requests.
Document Details
- Document Type: Pub Defense Publication
- Publication Date: Feb 27, 2016
- Source ID: 10.1145/3016078.2851151
Entities
People
- Chenyang Lu
- I-Ting Angelina Lee
- Kathryn S. McKinley
- Kunal Agrawal
- Li Jing
- Sameh Elnikety
- Yuxiong He
Organizations
- Microsoft
- National Science Foundation
- Office of Naval Research
- Washington University in St. Louis