Bayesian inference under cluster sampling with probability proportional to size

Abstract

Cluster sampling is common in survey practice, and the corresponding inference has been predominantly design based. We develop a Bayesian framework for cluster sampling and account for the design effect in the outcome modeling. We consider a two‐stage cluster sampling design in which clusters are first selected with probability proportional to cluster size, and units are then randomly sampled within the selected clusters. Challenges arise when the sizes of the nonsampled clusters are unknown. We propose nonparametric and parametric Bayesian approaches for predicting the unknown cluster sizes; this prediction is performed simultaneously with fitting the model for the survey outcome, and computation is carried out in the open‐source Bayesian inference engine Stan. Simulation studies show that the integrated Bayesian approach outperforms classical methods with efficiency gains, especially under informative cluster sampling designs with a small number of selected clusters. We apply the method to the Fragile Families and Child Wellbeing Study as an illustration of inference for complex health surveys.
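
The two‐stage design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name two_stage_pps_sample, the cluster sizes, and the sample sizes are all hypothetical, and the first stage is assumed to draw clusters with replacement, which is only one of several ways to implement probability‐proportional‐to‐size selection. The sketch covers the sampling step only, not the Bayesian model or the prediction of unknown cluster sizes.

```python
import random


def two_stage_pps_sample(cluster_sizes, n_clusters, m_units, seed=0):
    """Draw a two-stage sample (illustrative sketch, hypothetical helper).

    Stage 1: draw clusters with probability proportional to size,
             with replacement (an assumption made for simplicity).
    Stage 2: simple random sample of units, without replacement,
             inside each drawn cluster.
    """
    rng = random.Random(seed)
    cluster_ids = list(range(len(cluster_sizes)))

    # Stage 1: selection weights are the cluster sizes themselves.
    drawn = rng.choices(cluster_ids, weights=cluster_sizes, k=n_clusters)

    sample = []
    for j in drawn:
        # Stage 2: cap the within-cluster sample at the cluster size.
        k = min(m_units, cluster_sizes[j])
        units = rng.sample(range(cluster_sizes[j]), k)
        sample.append((j, units))
    return sample


# Hypothetical population of five clusters with known sizes.
sizes = [50, 200, 120, 30, 400]
sample = two_stage_pps_sample(sizes, n_clusters=3, m_units=5, seed=1)
for j, units in sample:
    print(f"cluster {j} (size {sizes[j]}): units {units}")
```

Because larger clusters are more likely to be drawn, the selection is informative whenever the outcome is related to cluster size, which is the setting the abstract highlights.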

Document Details

Document Type
Pub Defense Publication
Publication Date
Jul 04, 2018
Source ID
10.1002/sim.7892

Entities

People

  • Andrew Gelman
  • Susanna Makela
  • Yajuan Si

Organizations

  • Alfred P. Sloan Foundation
  • Columbia University
  • Institute of Education Sciences
  • National Institutes of Health
  • National Science Foundation
  • Office of Naval Research
  • University of Michigan

Tags

Fields of Study

  • Mathematics

Readers

  • Mental Health of Military Veterans with Posttraumatic Stress Disorder (PTSD): Risk Factors, Prevalence, Symptoms, and Treatment.
  • Quantum Chemistry
  • Statistical inference.

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference