Next-Generation Variational Methods: Active Inference, Streaming Inference, and Assessing Model Fitness
Abstract
David M. Blei, Columbia University

1 Summary

Analyzing, exploring, and predicting from data have become critical to science, industry, the military, government, and society. Consider the following problems about data.

(1) We have a social network of 250M people; we want to identify the communities in this network and summarize their demographic characteristics.
(2) An unending stream of intelligence is monitored by a team of analysts. We want to organize the information intuitively in a navigator and deliver important information to the right people.
(3) We have recorded the locations and times of detonated explosives in a war-torn city. We want to predict where and when the next device will be detonated.

Experts solve these problems by following a data analysis pipeline.

(a) Form assumptions about the data: How do different parts of the data relate to each other, and what hidden structures might exist in the observations?
(b) Analyze the data (or multiple data sets) under those assumptions.
(c) Use the analysis to form predictions, answer questions, make hypotheses, and explore the data.

Probabilistic modeling provides an elegant framework for executing this pipeline. It gives a formalism for describing assumptions about data, generic algorithms for analyzing data under those assumptions, and meaningful calculations for making predictions and exploring hidden structure. Probabilistic models promise to let domain experts quickly develop and use sophisticated models without specialized machine learning expertise.

But probabilistic modeling cannot yet fulfill this promise.

(I) We need general-purpose algorithms that scale to massive data, and we need to develop the theory and practice of applying probabilistic models to data streams. This is crucial for embedding probabilistic models in larger systems that continually collect, analyze, and act on data.
(II) We need new methods for understanding how well models work: methods for assessing model fitness and for model diagnostics. As building, fitting, and revising models become mainstream technological activities, assessing model fitness and diagnosing misfit must become equally mainstream.
(III) We need to stretch probabilistic modeling into new scientific and technical applications, allowing real-world problems to drive our methodological innovations.
(IV) In addition to developing and demonstrating algorithms and theory, we need to build usable tools for implementing and deploying probabilistic models; we will implement our methods in modern probabilistic programming systems.
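The sketch below illustrates two of the needs listed above in the simplest possible setting. It shows a streaming posterior update (I) and a check of model fitness (II). It assumes a Beta-Bernoulli model, chosen here only for illustration and not taken from the proposal. For this conjugate model the streaming update is exact: each batch's posterior becomes the prior for the next. Streaming variational methods target models where no such closed-form update exists. The fitness check is a posterior predictive check, one standard way to assess fit, offered here as an assumption about what such an assessment could look like. All names (`BetaPosterior`, `num_switches`, `posterior_predictive_pvalue`) and the choice of discrepancy statistic are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical illustration: a Beta-Bernoulli model, not a model from the proposal.


@dataclass
class BetaPosterior:
    """Beta(a, b) belief over the success probability of an i.i.d. Bernoulli model."""
    a: float = 1.0  # prior pseudo-count of successes (assumed uniform prior)
    b: float = 1.0  # prior pseudo-count of failures

    def update(self, batch):
        """Streaming step: fold one mini-batch of 0/1 observations into the posterior.

        Because the model is conjugate, this update is exact, so the final
        posterior does not depend on how the stream is split into batches.
        """
        successes = sum(batch)
        return BetaPosterior(self.a + successes, self.b + len(batch) - successes)

    def mean(self):
        return self.a / (self.a + self.b)


def num_switches(xs):
    """Hypothetical discrepancy statistic: how often consecutive values differ."""
    return sum(1 for x, y in zip(xs, xs[1:]) if x != y)


def posterior_predictive_pvalue(data, post, stat, n_reps=1000, seed=0):
    """Posterior predictive check: estimate P(stat(replicate) <= stat(observed)).

    Each replicate draws theta from the posterior, then simulates a data set of
    the same size under the i.i.d. Bernoulli assumption. A value near 0 or 1
    means the fitted model rarely reproduces this feature of the data.
    """
    rng = random.Random(seed)
    observed = stat(data)
    hits = 0
    for _ in range(n_reps):
        theta = rng.betavariate(post.a, post.b)
        rep = [1 if rng.random() < theta else 0 for _ in range(len(data))]
        if stat(rep) <= observed:
            hits += 1
    return hits / n_reps


# Usage: a "sticky" sequence that violates the i.i.d. assumption.
data = [1] * 50 + [0] * 50
post = BetaPosterior()
for i in range(0, len(data), 10):  # stream the data in mini-batches of 10
    post = post.update(data[i:i + 10])
print(post.a, post.b, post.mean())  # 51.0 51.0 0.5

p = posterior_predictive_pvalue(data, post, num_switches)
print(p)  # near 0: replicates switch far more often than the observed data
```

The posterior mean alone (0.5) gives no hint that anything is wrong. The predictive check exposes the misfit, because the i.i.d. model cannot reproduce long runs of identical values.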
Document Details
- Document Type: DoD Grant Award
- Publication Date: Aug 12, 2016
- Source ID: N000141512209
Entities
People
- David M. Blei
Organizations
- Office of Naval Research
- Trustees of Columbia University in the City of New York
- United States Navy