Reliability and robustness for fast Bayesian inference of complex data
Abstract
In theory, Bayesian analysis provides a coherent accounting of uncertainty, and the modularity of Bayesian generative models allows the practitioner to synthesize any number of data modalities (such as real-valued data, images, and text) and to capture arbitrarily complex data relationships. Moreover, exact Bayesian inference is a theoretical framework that can be precisely adapted to combining information across an arbitrary number of distinct, and potentially streaming, data sources. In practice, however, there are a number of points in the analysis where subjective practitioner choices or computational approximations must be made in order to reach a decision in a practical amount of time. To guarantee the reliability of the final decision, we need to examine reliability at each step in the analysis: in the choice of representation to describe the data, in the approximation used to extract point estimates and uncertainties, and in the combination of information across data sources and as more data accrues. A recent trend in speeding up inference is to discard quality guarantees in exchange for speed. By contrast, we propose to provide fast inference together with precise theoretical guarantees on how our approximations affect the quality of inference after a finite amount of computation, along with fast, automatic, user-friendly quantification of the robustness of our outputs to representation choices. In particular, any analysis starts by specifying a model, which expresses any existing domain knowledge, memory, or other expertise of a practitioner before collecting and analyzing data. This knowledge may be difficult to express precisely and fully in limited time, and it may reasonably vary somewhat even between similar practitioners. One challenge is to quantify the extent of this variation among models and how it affects the final output.
In particular, we plan to use perturbation ideas from statistical mechanics to provide this quantification as a practical and automated part of a standard Bayesian analysis, without requiring significant additional effort on the part of the data analyst. A related challenge is to ensure that the model is correctly specified, i.e., that it does not rule out important and realistic possibilities in the analysis. We need to ensure that our model frameworks are capable of capturing desired behaviors in the data. To accomplish this, we will build on recent advances in the theory of exchangeability and combinatorial stochastic processes. Finally, once we have a model in hand, we can proceed with extracting information from the model and the data together. But any model complex enough to be of modern interest requires an approximation when undertaking a Bayesian analysis. Each approximation, perhaps distributed across a number of actors, individually introduces some error. These errors will then typically compound further with each communication of information. This compounding is especially problematic for the frequent communication expected in massively distributed systems and streaming data contexts. We propose basic research, building on the practical and theoretical development of "coresets" in the theoretical computer science literature, to better understand the nature and extent of this compounding.
Document Details
- Document Type: DoD Grant Award
- Publication Date: Sep 11, 2018
- Source ID: W911NF1810063
Entities
People
- Tamara Broderick
Organizations
- Army Contracting Command
- Massachusetts Institute of Technology
- United States Army