Efficient Bayesian Computation for Massive Data Sets - Theory and Methods
Abstract
Statement of Work: The purpose of the two long-term projects outlined by the PI is to develop a class of computationally efficient methods for performing Bayesian inference involving massive data sets and/or intractable likelihoods. Markov chain Monte Carlo (MCMC) methods are indispensable for Bayesian inference. However, with advances in technology and the increasing complexity of the physical models that scientists and engineers wish to fit, naive implementations of MCMC methods result in prohibitively slow algorithms. Two distinct themes are woven together in this proposal. The first theme is to develop fast, scalable MCMC algorithms using local approximations in models where the likelihood function is difficult to compute. Intractable likelihoods are ubiquitous in a wide range of applications that are of significant interest to the Navy. These include inference from satellite images and applications in oceanography, such as data assimilation from velocity measurements of ocean currents and spatio-temporal prediction of sea surface temperatures using partial differential equation models. The second theme is to develop rigorous methods for parallelizing MCMC algorithms efficiently. The unifying perspective that binds these two themes is quantifying the trade-off between computational and statistical efficiency in these methods.

Objective: The PI's long-term research goal is to develop statistically and computationally efficient Bayesian procedures for high-dimensional models motivated by real scientific applications, and to rigorously establish their optimality properties. The PI will place a strong emphasis on developing efficient statistical estimation for stochastic processes that model various physical and biological phenomena. This will require advancing current statistical practice by developing new theory and fast but accurate sampling algorithms for situations where the parameter space is much larger than the sample size. A key step entails quantifying the interplay between statistical efficiency and computational efficiency.

Approach: In addition to building on our previous work on optimal scaling of MCMC algorithms, we propose new ideas for parallelizing MCMC algorithms. It is easy to "naively" parallelize most MCMC algorithms: simply run independent copies of the Markov chain on every available core and combine the samples. However, given modern computing infrastructure and the complexity of the problems to which MCMC is now applied, it has become clear that this "naive" parallelization can and should be improved upon. An important and timely problem is thus to develop new parallelization methods and to find conditions under which they outperform naive parallelization. This problem has received a great deal of attention in recent years. We propose a novel collection of methods for parallelizing MCMC by partitioning the underlying state space. The main intuition behind all state-space partitioning methods is to replace a single Markov chain targeting a highly multimodal distribution with several Markov chains, each targeting a distinct unimodal distribution. Since Markov chains tend to mix much more quickly on unimodal distributions than on multimodal ones, we expect each of the new chains to be much more efficient than the original chain, increasing computational efficiency. Our key idea is to find the partitions using spectral clustering.

Overall Merit and ONR Mission/Relevance: Statistical models involving intractable likelihoods are ubiquitous in a wide range of applications that are of significant interest to the Navy. These include inference from satellite images and applications in oceanography, such as data assimilation from velocity measurements of ocean currents and spatio-temporal prediction of sea surface temperatures using partial differential equation models. It is highly desirable to characterize the uncertainty of estimation in such models and to facilitate efficient implementations that enable real-time inference. This pr
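The "naive" parallelization described in the Approach (independent chains, samples combined afterwards) can be sketched as follows. This is a minimal illustration under stated assumptions, not the PI's implementation: the one-dimensional bimodal target (an equal mixture of N(-4, 1) and N(4, 1)), the random-walk Metropolis sampler, and the helper names `log_target`, `rw_metropolis`, and `naive_parallel` are all hypothetical choices made for the demo. The chains run sequentially here; in practice each would run on its own core.

```python
import numpy as np


def log_target(x):
    """Unnormalized log-density of a hypothetical bimodal target:
    an equal mixture of N(-4, 1) and N(4, 1)."""
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)


def rw_metropolis(log_p, x0, n_steps, step, rng):
    """Random-walk Metropolis chain with Gaussian proposals of scale `step`."""
    x, lp = x0, log_p(x0)
    out = np.empty(n_steps)
    for t in range(n_steps):
        prop = x + step * rng.standard_normal()
        lp_prop = log_p(prop)
        # Accept with probability min(1, p(prop) / p(x)).
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop
        out[t] = x
    return out


def naive_parallel(log_p, n_chains, n_steps, burn_in, step=1.0, seed=0):
    """Naive parallel MCMC: independent chains, each with its own random
    stream and starting point; post-burn-in draws are pooled by concatenation."""
    streams = np.random.SeedSequence(seed).spawn(n_chains)
    draws = []
    for s in streams:  # one chain per core in a real deployment
        rng = np.random.default_rng(s)
        x0 = rng.normal(0.0, 5.0)  # over-dispersed start (assumption)
        chain = rw_metropolis(log_p, x0, n_steps, step, rng)
        draws.append(chain[burn_in:])
    return np.concatenate(draws)


samples = naive_parallel(log_target, n_chains=4, n_steps=2000, burn_in=500)
print(samples.shape)  # (6000,)
```

Because the two modes are far apart relative to the proposal scale, each chain rarely crosses between them, so the pooled draws tend to reflect where the chains happened to start rather than the true equal mode weights. This illustrates why the proposal argues that naive parallelization can be improved upon.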
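The state-space partitioning idea from the Approach can be sketched in one dimension. The abstract does not specify the details, so everything below is an assumption made for illustration: pilot draws come from short chains started at spread-out points; the pilot draws are grouped with Ng-Jordan-Weiss spectral clustering (Gaussian affinity, normalized affinity matrix, k-means on the row-normalized leading eigenvectors); the state space is split into nearest-centroid cells; and each cell gets its own random-walk Metropolis chain that rejects proposals leaving the cell, so that it targets the distribution restricted to that cell. The target, the bandwidth, and names such as `spectral_clustering` and `cell_of` are hypothetical.

```python
import numpy as np


def log_target(x):
    # Hypothetical bimodal target: equal mixture of N(-4, 1) and N(4, 1).
    return np.logaddexp(-0.5 * (x + 4.0) ** 2, -0.5 * (x - 4.0) ** 2)


def rw_metropolis(log_p, x0, n_steps, step, rng, in_cell=lambda x: True):
    """Random-walk Metropolis; proposals outside the cell are rejected, so the
    chain targets the density restricted to (and renormalized on) that cell."""
    x, lp = x0, log_p(x0)
    out = np.empty(n_steps)
    for t in range(n_steps):
        prop = x + step * rng.standard_normal()
        if in_cell(prop):
            lp_prop = log_p(prop)
            if np.log(rng.uniform()) < lp_prop - lp:
                x, lp = prop, lp_prop
        out[t] = x
    return out


def spectral_clustering(points, k, bandwidth=1.0, n_iter=50):
    """Ng-Jordan-Weiss spectral clustering of 1-D pilot draws."""
    W = np.exp(-((points[:, None] - points[None, :]) ** 2) / (2.0 * bandwidth**2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    A = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]  # D^{-1/2} W D^{-1/2}
    _, vecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    U = vecs[:, -k:]  # k leading eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize
    # Lloyd's k-means on the spectral embedding, farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, k):
        d2 = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None]) ** 2).sum(axis=2), axis=1)
        centers = np.array([U[labels == j].mean(axis=0) for j in range(k)])
    return labels


rng = np.random.default_rng(1)

# 1. Pilot draws from short chains started at spread-out points (assumption),
#    after burn-in and thinning by 5.
pilot = np.concatenate([
    rw_metropolis(log_target, x0, 300, 1.0, rng)[100::5]
    for x0 in np.linspace(-8.0, 8.0, 8)
])

# 2. Cluster the pilot draws, then partition the state space into
#    nearest-centroid cells (a hypothetical choice of partition rule).
k = 2
labels = spectral_clustering(pilot, k)
centroids = np.sort([pilot[labels == j].mean() for j in range(k)])


def cell_of(x):
    """Index of the nearest centroid for a scalar or an array of states."""
    return np.argmin(np.abs(np.subtract.outer(x, centroids)), axis=-1)


# 3. One restricted chain per cell; in practice each would run on its own core.
cell_draws = [
    rw_metropolis(log_target, c, 2000, 1.0, rng,
                  in_cell=lambda x, j=j: cell_of(x) == j)[500:]
    for j, c in enumerate(centroids)
]
print(np.round(centroids, 1))
```

Each restricted chain only has to explore a single mode. Recombining the per-cell draws into draws from the full distribution additionally requires estimates of each cell's probability mass, which this sketch does not attempt.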
Document Details
- Document Type: DoD Grant Award
- Publication Date: Aug 12, 2016
- Source ID: N000141612663
Entities
People
- Natesh Pillai
Organizations
- Office of Naval Research
- President and Fellows of Harvard College
- United States Navy