Failure-resistant, Elastic, Platform-Independent Execution Environments
Abstract
Research Problem and Objectives: As computing hardware advances, devices come in varied sizes and capabilities to suit specific computing needs: powerful GPUs for machine learning tasks, laptops for personal use, smartphones for lightweight applications, and embedded devices with very limited computing power. It is very common for one person to own several devices. Such device diversity also opens up opportunities for more reliable and robust software execution. This project aims to leverage that diversity and develop software and hardware foundations that enable software applications to adapt themselves to resource-limited environments in the presence of arbitrary failures. When an application fails to complete a task (e.g., due to a device or power failure), execution can resume from a previously checkpointed state on a different device, possibly one with fewer resources.
Technical Approaches: To allow programs to run on different devices, we propose to leverage WebAssembly (WASM). WASM is a language with rigorous semantics and strong memory safety guarantees. It is gaining popularity and can provide a lightweight and portable execution environment. To achieve elastic execution and allow unnecessary components to be turned off in a principled way, we propose to architect applications as micro-services. Declarative policies that specify service dependencies and resource constraints are used as orchestration plans when migrating to a new device. To reuse previously computed results, we will use checkpointing techniques to efficiently restore executions interrupted by failure (either a power failure or forceful termination due to other errors). While starting afresh is always an option, reusing partial results can save further time and computing resources. Combined with advances in intermittent computing on batteryless devices, this project has the potential to push computation beyond traditional devices (e.g., desktops, laptops, tablets) to tiny embedded devices. Finally, to seamlessly port unfinished computation to a different device, we plan to investigate how to efficiently store multiple copies of the checkpointed data, both in physical storage and in the cloud, for fault tolerance.
Expected Outcome: If successful, the proposed research will develop methodologies and tools that enable software applications to adapt themselves to computing environments with different resource constraints. The project will also produce applications in the fields of object detection and software-defined radio that can migrate and adapt to different execution environments.
Relevance to Navy: The proposed project would enable Navy engineers to develop safety-critical applications that can execute across devices with different available computing capabilities. This is useful in scenarios where devices are compromised or experiencing failures, forcing programs to run on different (and possibly more resource-limited) devices.
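The two mechanisms named under Technical Approaches, declarative policies that drive orchestration and checkpointing that lets interrupted work resume, can be illustrated with a small sketch. It is written in plain Python rather than WASM, and it is not the project's design: the service names, the `POLICY` layout, the memory figures, and the helpers `plan_services`, `save_checkpoint`, `load_checkpoint`, and `run` are all hypothetical, chosen only to make the ideas concrete.

```python
import json
import os
import tempfile

# Hypothetical declarative policy: each service declares its dependencies,
# a memory cost, and whether the application can run without it.
POLICY = {
    "capture":  {"needs": [],          "mem_mb": 64,  "essential": True},
    "detect":   {"needs": ["capture"], "mem_mb": 512, "essential": True},
    "annotate": {"needs": ["detect"],  "mem_mb": 256, "essential": False},
    "upload":   {"needs": ["detect"],  "mem_mb": 128, "essential": False},
}


def plan_services(policy, budget_mb):
    """Choose services for a device: essential ones first, then optional ones
    that fit the remaining budget and whose dependencies were chosen.
    Assumes the policy lists each service after its dependencies."""
    chosen, used = [], 0
    ordered = sorted(policy, key=lambda name: not policy[name]["essential"])
    for name in ordered:
        spec = policy[name]
        deps_ok = all(dep in chosen for dep in spec["needs"])
        fits = used + spec["mem_mb"] <= budget_mb
        if deps_ok and fits:
            chosen.append(name)
            used += spec["mem_mb"]
        elif spec["essential"]:
            raise RuntimeError(f"device cannot host essential service {name!r}")
    return chosen


def save_checkpoint(path, state):
    """Write to a temporary file, then rename it over the old checkpoint,
    so a failure mid-write leaves the previous checkpoint intact."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_item": 0, "total": 0}
    with open(path) as handle:
        return json.load(handle)


def run(items, path, fail_at=None):
    """Sum the items, checkpointing after each one.
    `fail_at` simulates a device failure just before that index."""
    state = load_checkpoint(path)
    for index in range(state["next_item"], len(items)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError("simulated device failure")
        state["total"] += items[index]
        state["next_item"] = index + 1
        save_checkpoint(path, state)
    return state["total"]
```

In this sketch, a second call to `run` with the same checkpoint path continues from the last saved index instead of starting afresh, and `plan_services` drops optional services when the memory budget shrinks. Copying the checkpoint file to a second location would stand in for the replicated storage the abstract mentions.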
Document Details
- Document Type: DoD Grant Award
- Publication Date: May 15, 2024
- Source ID: N000142412297
Entities
People
- Limin Jia
Organizations
- Carnegie Mellon University
- Office of Naval Research
- United States Navy