Distributed Statistical Machine Learning via Concurrency Control

Abstract

One of the grand challenges in modern computing is the design of data-analysis systems that scale to extremely large collections of data while simultaneously providing control over statistical error rates and over computational resources such as runtime. Achieving bounds that are not merely correct but also useful in practice, particularly for problems at extreme scales, requires exploring parallel and distributed computing architectures. We tackle this challenge by taking concurrency-control ideas from the database community as a point of departure and adapting the concurrency-control paradigm to the needs of large-scale statistical inference. We focus on problems involving clustering and other combinatorial tasks, given the heterogeneity present in large-scale data sets and the combinatorial nature of distributed computing architectures. We propose two main threads of research: the first involves the highly scalable paradigm of correlation clustering, and the second involves hierarchical Bayesian nonparametric models. Most existing work in these areas involves sequential algorithms that run on a single machine. We will develop parallel and distributed computational models for these tasks, implement them in a distributed computing environment, and analyze the models both theoretically and empirically.

Document Details

Document Type: DoD Grant Award
Publication Date: Aug 12, 2016
Source ID: N000141512670

Entities

People

Michael I. Jordan

Organizations

Office of Naval Research
United States Navy
University of California Regents
