Probabilistic Programming with Missing Data

Abstract

This report summarizes work performed to develop a computer algorithm capable of handling missing data fields in multivariate data sets. The results presented here build on prior work that examined the applicability of inverse covariance matrices, or precision matrices, to representing missing data as zero eigenvalues in the precision matrices. The prior work used maximum a posteriori (MAP) estimates for a combination of normally distributed multivariate data with normally distributed multivariate measurement errors and assumed that the prior probability distributions for means and precision matrices were uniform. This work extends the previous technique to one that uses normal-Wishart distributions to describe the prior probability distribution for a normal data distribution and to estimate posterior parameters for these distributions for multivariate data with missing fields. While the integrals to estimate posterior probabilities from likelihood and prior probability distributions may in fact be analytically solvable, the authors were unable to discover such a solution. Instead, analytic integral solutions were used for individual probability measurements and a probabilistic programming language was used to perform numerical integration on the remaining integrals. The chosen solution leverages the strengths of each integration method to address the weakness of the complementary method: analytic integration is performed where numerical integration cannot be performed, and numerical integration is performed where analytic solutions are currently unknown. Most of the recent work was performed by an undergraduate intern for Group 104 at MIT Lincoln Laboratory during the summer of 2018. A model for the problem is defined and analyzed mathematically. A discussion of the probabilistic programming languages and programs is also provided, along with results for a number of simulations.
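To make the "analytic integration for individual measurements" idea concrete, the following is a minimal sketch and not the report's own code. It relies only on the standard fact that integrating the missing coordinates out of a multivariate normal leaves a normal over the observed coordinates, with the matching sub-vector of the mean and sub-block of the covariance. The function name `observed_log_likelihood` and the use of NaN to mark missing fields are assumptions made for illustration.

```python
import numpy as np

def observed_log_likelihood(x, mean, cov):
    """Log density of the observed entries of x under N(mean, cov).

    Hypothetical helper: missing entries are marked with NaN and are
    integrated out analytically by keeping only the observed sub-vector
    of the mean and the observed sub-block of the covariance.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)

    obs = ~np.isnan(x)           # boolean mask of observed fields
    k = int(obs.sum())
    if k == 0:
        return 0.0               # nothing observed: the marginal integrates to 1

    diff = x[obs] - mean[obs]
    sub_cov = cov[np.ix_(obs, obs)]
    _, logdet = np.linalg.slogdet(sub_cov)
    quad = diff @ np.linalg.solve(sub_cov, diff)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)

# Example: a two-field record whose second field is missing.
mean = np.zeros(2)
cov = np.array([[2.0, 0.5],
                [0.5, 1.0]])
record = [1.0, np.nan]
print(observed_log_likelihood(record, mean, cov))
```

In a fuller treatment, per-record terms like this one would be summed over the data set and combined with a prior over the mean and precision matrix; the remaining integrals over those parameters are the part the abstract says was handled numerically.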

Document Details

Document Type
Technical Report
Publication Date
Feb 26, 2019
Accession Number
AD1069713

Entities

People

  • D Suen
  • K L Nahabedian
  • M J Yee
  • M B Hurley

Organizations

  • Massachusetts Institute of Technology

Tags

Communities of Interest

  • Biomedical
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Computational Science
  • Computer Programming
  • Data Analysis
  • Data Sets
  • Distribution Functions
  • Information Science
  • Language
  • Monte Carlo Method
  • Numerical Integration
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Programming Languages
  • Random Variables
  • Three Dimensional
  • Two Dimensional

Fields of Study

  • Mathematics

Readers

  • Calculus or Mathematical Analysis
  • Computational Linguistics
  • Statistical Inference

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms