Resource Allocation in Massively Heterogeneous Computer Systems: A Distributed Approach

Abstract

Major Goals: The goal of this project is to develop distributed placement and scheduling algorithms for heterogeneous computing jobs run across a network of heterogeneous computing devices. We characterize heterogeneity as follows. Computing jobs may arrive in the system at different times and are characterized by their resource requirements, which may span multiple resource types, e.g., both compute power and memory. Devices, in turn, are characterized by their heterogeneous resource availability, e.g., different amounts of CPU or GPU resources, memory, etc. These devices may even follow different computing paradigms, e.g., CPUs versus GPUs, and the amount of each resource they have available will vary over time. Our algorithms for matching jobs to devices over time should account for the heterogeneity of both devices and jobs, and are designed to scale to the potentially massive number of jobs and devices present. While centralized matching algorithms allow users to easily coordinate their assignment of jobs to devices, they may not scale well to massive numbers of jobs and devices. Thus, we focus on distributed algorithms that empower users and devices to find a mutually satisfying matching that meets job needs within device resource constraints. In particular, our framework is based on distributed pricing algorithms, in which devices announce virtual prices for their resources and users attempt to allocate their jobs to resources so as to incur the lowest cost. These prices indicate the capacity limitations of each device relative to users' demands for them, and thus serve as a means for users to indirectly coordinate their job scheduling and placement. This approach therefore requires little exchange of knowledge between devices and users: devices set prices based on resource availability, and users react based on their job requirements.
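
The price-based coordination described in the abstract can be illustrated with a minimal sketch. This is not the report's algorithm. It assumes a simple rule in which each device raises a resource's virtual price when demand exceeds capacity and lowers it (never below zero) otherwise, and each user picks the device with the lowest total cost at the announced prices. The names `Device`, `job_cost`, `choose_devices`, `update_prices`, and `run`, along with the `step` and `rounds` parameters, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    capacity: dict                                # resource type -> available amount
    prices: dict = field(default_factory=dict)    # resource type -> virtual price

    def __post_init__(self):
        # Assumption: every resource starts with a virtual price of zero.
        for resource in self.capacity:
            self.prices.setdefault(resource, 0.0)


def job_cost(demand, device):
    """Cost of one job on one device at the device's announced prices."""
    for resource, amount in demand.items():
        # A device that lacks a needed resource, or has too little, is unusable.
        if device.capacity.get(resource, 0.0) < amount:
            return float("inf")
    return sum(device.prices[r] * amount for r, amount in demand.items())


def choose_devices(jobs, devices):
    """User side: each job independently picks its cheapest usable device."""
    choice = {}
    for job_name, demand in jobs.items():
        best = min(devices, key=lambda d: job_cost(demand, d))
        if job_cost(demand, best) != float("inf"):
            choice[job_name] = best.name
    return choice


def update_prices(jobs, devices, choice, step):
    """Device side: adjust each price by step * (requested load - capacity)."""
    for device in devices:
        for resource, cap in device.capacity.items():
            load = sum(
                jobs[job].get(resource, 0.0)
                for job, chosen in choice.items()
                if chosen == device.name
            )
            device.prices[resource] = max(
                0.0, device.prices[resource] + step * (load - cap)
            )


def run(jobs, devices, rounds=50, step=0.1):
    """Alternate user choices and device price updates for a fixed number of rounds."""
    choice = {}
    for _ in range(rounds):
        choice = choose_devices(jobs, devices)
        update_prices(jobs, devices, choice, step)
    return choice
```

In this sketch the only information exchanged is the price vector each device announces and the jobs each device receives, which mirrors the limited knowledge exchange the abstract describes. The toy rule is not guaranteed to settle: when all users react to the same prices at once, they can move together from one device to another, so a practical scheme would need some form of damping or tie-breaking that this sketch omits.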

Document Details

Document Type
Technical Report
Publication Date
Jun 06, 2022
Accession Number
AD1210623

Entities

People

  • Carlee Joe-Wong

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • 5G Wireless Networks
  • Abstracts
  • Algorithms
  • Availability
  • Classification
  • Computer Communications
  • Computers
  • Computing Devices
  • Contracts
  • Data Processing
  • Distance Learning
  • Distributed Computing
  • Edge Computing
  • Heterogeneity
  • Information Operations
  • Information Processing
  • Information Science
  • Instructions
  • Machine Learning
  • Military Research
  • Monitoring
  • Networks
  • Reinforcement Learning
  • Scheduling (Production)
  • Security
  • Simulations
  • Simulators
  • Standards
  • Students
  • Universities

Fields of Study

  • Computer science

Readers

  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing
  • Economics
  • Parallel and Distributed Computing