Parallel and Distributed Systems for Probabilistic Reasoning

Abstract

Scalable probabilistic reasoning is the key to unlocking the full potential of the age of big data. From untangling the biological processes that govern cancer to effectively targeting products and advertisements, probabilistic reasoning is how we make sense of noisy data and turn information into understanding and action. Unfortunately, the algorithms and tools for sophisticated structured probabilistic reasoning were developed for the sequential von Neumann architecture and have therefore been unable to scale with big data. In this thesis, we propose a simple set of design principles to guide the development of new parallel and distributed algorithms and systems for scalable probabilistic reasoning. We then apply these design principles to develop a series of new algorithms for inference in probabilistic graphical models and derive theoretical tools to characterize the parallel properties of statistical inference. We implement the new inference algorithms and assess their efficiency and scalability in the multicore and distributed settings, demonstrating the substantial gains from applying the thesis methodology to real-world probabilistic reasoning. Based on the lessons learned in statistical inference, we introduce the GraphLab parallel abstraction, which generalizes the thesis methodology and enables the rapid development of new efficient and scalable parallel and distributed algorithms for probabilistic reasoning. We demonstrate how the GraphLab abstraction can be used to rapidly develop new scalable algorithms for probabilistic reasoning and assess their performance on real-world problems in both the multicore and distributed settings. Finally, we identify a unique challenge associated with the underlying graphical structure in a wide range of probabilistic reasoning tasks. To address this challenge, we introduce PowerGraph, which refines the GraphLab abstraction and achieves orders of magnitude improvements in performance relative to existing systems.

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 2012
Accession Number
ADA584084

Entities

People

  • Joseph M Gonzalez

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Energy and Power Technologies
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Compressed Sensing
  • Computational Science
  • Computer Languages
  • Data Mining
  • Databases
  • Information Processing
  • Information Science
  • Machine Learning
  • Monte Carlo Method
  • Natural Language Processing
  • Network Science
  • Parallel Computing
  • Random Variables
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Computational Modeling and Simulation
  • Parallel and Distributed Computing

Technology Areas

  • AI & ML