Health Maintenance System: An Application of Recovery Oriented Computing for HPEC Systems
Abstract
Until recently, the single most critical aspect of HPEC systems has been "performance," in terms of processor speeds and I/O throughput. As processor speeds and I/O throughput have continued to increase, and as the capability to build ever larger systems has improved, the need for raw performance is becoming less critical. The ability to achieve a high level of application availability is now becoming as critical as performance. In this paper, the author presents a CORBA-based framework upon which highly available applications can be constructed. This framework, known as the Health Maintenance System, provides the application, system managers, and management tools with the ability to "manage" all resources within a system such that the "health" of the system can be maintained. The management of these resources involves the ability to "sense" the state of the resource, to control the resource, and to run tests on the resource to proactively detect any latent problems. The primary facet of the framework is the "resource manager." The resource manager provides local management support for all system resources. In addition, the resource manager provides management access to clients (e.g., the application). This access is provided via a set of "client interface" modules that offer a wide variety of interfaces (e.g., APIs and agents). It is this combination of resource managers and client interface modules that allows the framework to be easily configured for a specific HPEC system. Ten briefing charts summarize the presentation.
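The division of labor described in the abstract (resources that can be sensed, controlled, and tested; a resource manager that handles them locally; a client interface module that exposes them to the application) can be illustrated with a minimal sketch. This is not the report's implementation: it is plain in-process Python rather than CORBA, and every name in it (`Resource`, `SimulatedLink`, `ResourceManager`, `ClientApi`, the `reset` and `disable` commands, the 1% error threshold) is a hypothetical chosen for illustration.

```python
from abc import ABC, abstractmethod
from enum import Enum


class Health(Enum):
    OK = "ok"
    DEGRADED = "degraded"
    FAILED = "failed"


class Resource(ABC):
    """Hypothetical managed resource: sense, control, and test."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def sense(self) -> Health: ...

    @abstractmethod
    def control(self, command: str) -> None: ...

    @abstractmethod
    def run_test(self) -> bool: ...


class SimulatedLink(Resource):
    """Stand-in for a real resource; the 1% threshold is an assumption."""

    def __init__(self, name: str, error_rate: float = 0.0) -> None:
        super().__init__(name)
        self.error_rate = error_rate
        self.enabled = True

    def sense(self) -> Health:
        if not self.enabled:
            return Health.FAILED
        return Health.DEGRADED if self.error_rate > 0.01 else Health.OK

    def control(self, command: str) -> None:
        if command == "reset":
            self.error_rate = 0.0
            self.enabled = True
        elif command == "disable":
            self.enabled = False
        else:
            raise ValueError(f"unknown command: {command}")

    def run_test(self) -> bool:
        # Proactive check: passes only if the link is up and clean.
        return self.enabled and self.error_rate <= 0.01


class ResourceManager:
    """Local management of all registered resources."""

    def __init__(self) -> None:
        self._resources: dict[str, Resource] = {}

    def register(self, resource: Resource) -> None:
        self._resources[resource.name] = resource

    def status(self) -> dict[str, Health]:
        return {name: r.sense() for name, r in self._resources.items()}

    def control(self, name: str, command: str) -> None:
        self._resources[name].control(command)

    def test_all(self) -> list[str]:
        """Return the names of resources whose self-test failed."""
        return [n for n, r in self._resources.items() if not r.run_test()]


class ClientApi:
    """One possible client interface module (an API-style facade)."""

    def __init__(self, manager: ResourceManager) -> None:
        self._manager = manager

    def unhealthy(self) -> list[str]:
        status = self._manager.status()
        return sorted(n for n, h in status.items() if h is not Health.OK)

    def recover(self, name: str) -> None:
        self._manager.control(name, "reset")
```

Under these assumptions, an application would register its resources with a `ResourceManager`, poll `ClientApi.unhealthy()` or `ResourceManager.test_all()` to find latent problems, and call `ClientApi.recover()` to act on them. Other client interface modules (for example, an agent) could wrap the same manager without changing the resources.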
Document Details
- Document Type: Technical Report
- Publication Date: Aug 20, 2004
- Accession Number: ADA428761
Entities
People
- Gerry Pocock