Beyond Fail-Stop: Wait-Free Serializability and Resiliency in the Presence of Slow-Down Failures

Abstract

Historically, database researchers have dealt with two kinds of process failures: fail-stop failures and malicious failures. Under the fail-stop assumption, processes fail by halting. Such failures are easily detectable. Under the malicious (or Byzantine) failure assumption, processes fail by behaving unpredictably, perhaps as adversaries. Such failures are not necessarily detectable. When system designers discuss fault tolerance, they typically restrict themselves to the problem of handling fail-stop failures only. This paper proposes an intermediate failure model and presents a practical algorithm for handling transactions under this model. The new failure model allows processes to fail by either slowing down or stopping: slow processes may later speed up, continue to proceed slowly, or, (eventually) stop. We call such failures slow-down failures. The model does not assume the ability to distinguish among these possibilities, say, by using a timeout mechanism, nor does it assume that it is possible to kill a slow process. Our algorithm, instead allows for a new process to be dispatched to do the job that had been assigned to a slow process. The problem is that several processes may end up doing the same task and interfere with one another. Our algorithm controls such interference while guaranteeing both serializability and resiliency.

Open PDF

Document Details

Document Type: Technical Report
Publication Date: May 25, 1994
Accession Number: AD1020249

Entities

People

Dennis Shasha
John Turek

Organizations

New York University

Beyond Fail-Stop: Wait-Free Serializability and Resiliency in the Presence of Slow-Down Failures

Abstract

Document Details

Entities

People

Organizations

Tags

Communities of Interest

DTIC Thesaurus Topics

Fields of Study

Readers