Data-aware and System-aware Algorithms for Distributed Machine Learning

Abstract

APPROVED FOR PUBLIC RELEASE

OVERVIEW. In many modern information processing applications, edge devices such as cell phones, sensors, and drones collect vast amounts of data. These data can be used to train machine learning models or perform inference on existing models. The state-of-the-art approach to training models on edge data is to transfer this data to the cloud. However, since edge devices typically have high-latency and low-bandwidth links to the cloud, sending raw training data to the cloud can be prohibitively expensive. Thus, instead of transferring edge data to the cloud, we need to bring the machine learning (ML) training and inference process to the edge devices. Existing data-center-based ML algorithms cannot be directly applied to this edge-based learning setting because of heterogeneous local datasets and disparate computational capabilities of the devices. The proposed research seeks to bridge this gap by designing distributed training and inference algorithms that are data-aware (can handle correlations, heterogeneity, scarcity, and privacy/security of the data collected by the edge devices) and system-aware (robust and adaptable to communication and computation variabilities).

INTELLECTUAL MERIT. Most state-of-the-art distributed ML algorithms are system-agnostic (they focus on the error-versus-iterations convergence without accounting for variations in the wallclock time spent per iteration) and data-agnostic (they do not account for correlations and redundancy across devices and over time). The proposed research aims to address this limitation by laying the foundation of data-aware and system-aware distributed algorithms. It will enable a network of resource-limited edge devices to collaboratively train and perform inference on machine learning model(s), while retaining control of their data. In terms of analytical tools and techniques, the proposed research is a unique confluence of optimization and learning theory, with techniques from information theory and applied probability.

ONR RELEVANCE. Data collected by edge devices such as drones, ROVs, and sensors can provide valuable intelligence to aid decision-making. Current and future edge devices used in naval applications are also increasingly capable of intensive onboard computation. Thus, instead of sending data collected by these devices to a fusion center, they can locally train machine learning models with limited guidance from the fusion center. The proposed project will enable distributed training and address challenges such as data and computational heterogeneity unique to edge-based training. The proposed research will advance compression methods to obtain low-dimensional representations of the data and fusion algorithms that leverage spatial and temporal correlations to accurately estimate the mean and other aggregate statistics from the data. These statistics are not only of interest during decision-making but also form essential building blocks of the training process. We will also facilitate fast inference on edge data, which is critical for time-sensitive operational decisions.

Document Details

Document Type: DoD Grant Award
Publication Date: Jan 12, 2023
Source ID: N000142312149

Entities

People

Gauri Joshi

Organizations

Carnegie Mellon University
Office of Naval Research
United States Navy
