A fair comparison of tree‐based and parametric methods in multiple imputation by chained equations

Abstract

Multiple imputation by chained equations (MICE) has emerged as a leading strategy for imputing missing epidemiological data because of its ease of implementation and its ability to maintain unbiased effect estimates and valid inference. Within the MICE algorithm, imputation can be performed using a variety of parametric or nonparametric methods. Previous studies have suggested that nonparametric tree‐based imputation methods outperform parametric methods in terms of bias and coverage when there are interactions or other nonlinear effects among the variables. However, these studies fail to provide a fair comparison because they do not follow the well‐established recommendation that any effects in the final analysis model (including interactions) should be included in the parametric imputation model. We show via simulation that properly incorporating interactions in the parametric imputation model leads to much better performance than omitting them. In fact, correctly specified parametric imputation and tree‐based random forest imputation perform similarly when estimating the interaction effect. Parametric imputation leads to slightly higher coverage for the interaction effect, but it has wider confidence intervals than random forest imputation and requires correct specification of the imputation model. Epidemiologists should take care in specifying MICE imputation models, and this paper assists in that task by providing a fair comparison of parametric and tree‐based imputation in MICE.
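
The recommendation that the imputation model should contain every effect in the analysis model can be illustrated with a small sketch. The Python code below is not the authors' simulation and makes several simplifying assumptions of its own: only the outcome y is missing (so no chaining across variables is needed), missingness is completely at random, imputations are drawn from a fitted linear regression plus residual noise without drawing the regression parameters, and only the pooled point estimate is computed. All names, sample sizes, and coefficient values are hypothetical.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def design(x1, x2, interaction):
    """Design row: intercept, x1, x2, and optionally the x1*x2 product."""
    row = [1.0, x1, x2]
    if interaction:
        row.append(x1 * x2)
    return row

def simulate(n, rng):
    """Hypothetical data: y depends on x1, x2, and their interaction (true value 1.0)."""
    data = []
    for _ in range(n):
        x1 = float(rng.random() < 0.5)
        x2 = rng.gauss(0, 1)
        y = 1.0 + 0.5 * x1 + 0.5 * x2 + 1.0 * x1 * x2 + rng.gauss(0, 1)
        data.append((x1, x2, y))
    return data

def pooled_interaction(data, missing, with_interaction, m, rng):
    """Impute missing y m times, fit the analysis model, average the interaction estimate."""
    obs = [d for d, mis in zip(data, missing) if not mis]
    X_obs = [design(x1, x2, with_interaction) for x1, x2, _ in obs]
    y_obs = [y for _, _, y in obs]
    beta = ols(X_obs, y_obs)
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X_obs, y_obs)]
    sigma = (sum(e * e for e in resid) / (len(obs) - len(beta))) ** 0.5
    estimates = []
    for _ in range(m):
        X_full, y_full = [], []
        for (x1, x2, y), mis in zip(data, missing):
            if mis:
                # Simplification: stochastic regression draw, parameters held fixed
                row = design(x1, x2, with_interaction)
                y = sum(b * v for b, v in zip(beta, row)) + rng.gauss(0, sigma)
            X_full.append(design(x1, x2, True))  # analysis model always has the interaction
            y_full.append(y)
        estimates.append(ols(X_full, y_full)[3])
    return sum(estimates) / m

rng = random.Random(2020)
data = simulate(2000, rng)
missing = [rng.random() < 0.4 for _ in data]  # 40% of outcomes missing at random
est_without = pooled_interaction(data, missing, False, 10, rng)
est_with = pooled_interaction(data, missing, True, 10, rng)
print(f"interaction estimate, imputation model without interaction: {est_without:.2f}")
print(f"interaction estimate, imputation model with interaction:    {est_with:.2f}")
```

Under these assumptions, the imputation model that omits the product term generates imputed outcomes with no interaction, so the pooled interaction estimate is pulled toward zero, while the model that includes it stays close to the true value of 1.0. A full MICE analysis would also draw the imputation parameters, cycle over every incomplete variable, and pool variances with Rubin's rules.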

Document Details

Document Type
Pub Defense Publication
Publication Date
Jan 29, 2020
Source ID
10.1002/sim.8468

Entities

People

  • Emily Slade
  • Melissa G. Naylor

Organizations

  • AbbVie
  • Alzheimer's Association
  • Alzheimer's Disease Neuroimaging Initiative
  • Alzheimer's Drug Discovery Foundation
  • BioClinica
  • Biogen
  • Bristol-Myers Squibb
  • Canadian Institutes of Health Research
  • Chiron Corporation
  • Eli Lilly and Company
  • Foundation for the National Institutes of Health
  • GE HealthCare
  • Hoffmann-La Roche
  • Laboratoires Servier
  • Lundbeck
  • Merck & Co.
  • National Institute of Biomedical Imaging and Bioengineering
  • National Institute on Aging
  • National Institutes of Health
  • Norman Cousins Center for Psychoneuroimmunology
  • Northern California Institute for Research and Education
  • Pfizer
  • Roche (United States)
  • Takeda Pharmaceutical Company
  • United States Department of Defense
  • University of Kentucky

Tags

Fields of Study

  • Mathematics

Technology Areas

  • AI & ML
  • AI & ML - Bayesian Inference