Mixture of Gaussian Models for Classification and Hypothesis Testing under Differential Privacy

Abstract

Gaussian mixture models are an important tool in Bayesian decision theory. In this study, we focus on building such models over statistical databases protected under differential privacy. Our approach involves querying the necessary statistics from a database and using the noise-added responses, generated according to differential privacy, in classification and hypothesis testing. We first formally analyze the sensitivity of our query set. Since a statistic can be queried in multiple ways, either directly or indirectly, we analyze the sensitivities of the different querying methods. We find that adding Laplace noise can be problematic. For example, the variance-covariance matrix after noise addition may no longer be positive definite. We propose a heuristic algorithm to repair the noise-added variance-covariance matrix. We then examine the Bayes error under differential privacy through experiments with both simulated and real-life data, and demonstrate under which conditions the impact of the added noise can be reduced. We compute the type I and type II errors under differential privacy for the one-sample z test, the one-sample t test, and the two-sample t test with equal variances, and show when a hypothesis test becomes unreliable under a differential privacy mechanism.
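
The positive-definiteness problem mentioned in the abstract can be illustrated with a minimal sketch. It is not the report's heuristic, which the abstract does not describe; it swaps in a common generic repair, eigenvalue clipping. The function names, the `scale` parameter (standing in for sensitivity divided by the privacy budget), and the `floor` value are all assumptions made for illustration.

```python
import numpy as np


def laplace_noisy_covariance(cov, scale, rng):
    """Add symmetric Laplace noise to a covariance matrix.

    `scale` is an assumed stand-in for sensitivity / epsilon; the report's
    actual sensitivity analysis is not reproduced here.
    """
    d = cov.shape[0]
    noise = rng.laplace(loc=0.0, scale=scale, size=(d, d))
    # Keep the upper triangle (with diagonal) and mirror it below,
    # so the perturbed matrix stays symmetric.
    sym_noise = np.triu(noise) + np.triu(noise, k=1).T
    return cov + sym_noise


def is_positive_definite(mat):
    """Return True if the symmetrized matrix has only positive eigenvalues."""
    return bool(np.all(np.linalg.eigvalsh((mat + mat.T) / 2.0) > 0))


def repair_by_eigenvalue_clipping(mat, floor=1e-6):
    """Generic repair (an assumption, not the report's algorithm):
    raise every eigenvalue below `floor` up to `floor`."""
    sym = (mat + mat.T) / 2.0
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.maximum(eigvals, floor)
    # Rebuild V * diag(clipped) * V^T.
    return (eigvecs * clipped) @ eigvecs.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cov = np.array([[1.0, 0.9], [0.9, 1.0]])
    noisy = laplace_noisy_covariance(cov, scale=0.5, rng=rng)
    print("noisy matrix positive definite:", is_positive_definite(noisy))
    repaired = repair_by_eigenvalue_clipping(noisy)
    print("repaired matrix positive definite:", is_positive_definite(repaired))
```

Clipping changes only the offending eigenvalues, so a matrix that is already positive definite is returned essentially unchanged; whether this preserves classification accuracy as well as the report's heuristic is not something this sketch can show.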

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2012
Accession Number
AD1191475

Entities

People

  • Ali Inan
  • Bowei Xi
  • Murat Kantarcıoğlu
  • Xiaosu Tong

Organizations

  • University of Texas at Dallas

Tags

Communities of Interest

  • Biomedical
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Computational Science
  • Computations
  • Computer Science
  • Data Analysis
  • Data Mining
  • Data Sets
  • Databases
  • Distribution Functions
  • Eigenvalues
  • Gaussian Distributions
  • Information Science
  • Machine Learning
  • Network Science
  • Probability
  • Statistics
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Database Systems and Applications
  • Regression Analysis.

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference
  • AI & ML - Machine Learning Algorithms