A Hypothesis-Testing Approach to Discriminant Analysis with Mixed Categorical and Continuous Variables When Data are Missing.

Abstract

This report considers discriminant analysis with mixed discrete (categorical) and continuous variables when data are missing at random. We use a hypothesis-testing approach based on the generalized likelihood ratio, as proposed by Baek et al., and use bootstrapping to determine critical values so that the Type I error rate is controlled. We present three algorithms, each assuming a different model for the data. The INDICATOR algorithm replaces categorical variables with indicator variables and treats these as if they were continuous. The FULL algorithm assumes a multinomial distribution for the discrete part and, conditional on the discrete part, a multivariate normal distribution for the continuous part whose mean and covariance both depend on the discrete part. The COMMON algorithm makes the same assumptions except that only the means depend on the discrete part; that is, a common covariance matrix is assumed across all multinomial cells. The performance of these algorithms is compared in a simulation study. The INDICATOR algorithm appears to have the highest power, but it also tends to show a higher Type I error rate than desired. The FULL and COMMON algorithms have very similar power, but the COMMON algorithm appears to control the Type I error rate most effectively and is least susceptible to problems that arise when some multinomial cells are sparsely represented.
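As a rough illustration of the COMMON model and of bootstrapped critical values, the sketch below fits cell probabilities, cell-specific means, and one pooled covariance matrix, then uses a parametric bootstrap to find a rejection threshold. It is not the report's procedure: the test statistic here is a stand-in (the negative log-density of a new observation under the fitted model) rather than the generalized likelihood ratio of Baek et al., the data are assumed complete (no missing values), and the function names, the 0.5 smoothing of cell counts, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def fit_common(cells, X, n_cells):
    """Multinomial cells, cell-specific means, one pooled covariance."""
    n = X.shape[0]
    counts = np.bincount(cells, minlength=n_cells)
    # Assumption: add 0.5 to each count so sparse or empty cells keep p > 0.
    probs = (counts + 0.5) / (n + 0.5 * n_cells)
    means = np.tile(X.mean(axis=0), (n_cells, 1))  # fallback for empty cells
    resid = np.empty_like(X)
    for c in range(n_cells):
        mask = cells == c
        if mask.any():
            means[c] = X[mask].mean(axis=0)
        resid[mask] = X[mask] - means[c]
    cov = resid.T @ resid / n  # maximum-likelihood common covariance
    return probs, means, cov

def neg_log_density(cell, x, model):
    """Stand-in statistic: large values mean the observation fits poorly."""
    probs, means, cov = model
    diff = x - means[cell]
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    log_f = np.log(probs[cell]) - 0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)
    return -log_f

def simulate(model, n, rng):
    """Draw n observations (cell, continuous vector) from a fitted model."""
    probs, means, cov = model
    cells = rng.choice(len(probs), size=n, p=probs)
    noise = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n)
    return cells, means[cells] + noise

def bootstrap_critical_value(cells, X, n_cells, alpha=0.05, B=200, seed=0):
    """Parametric bootstrap: (1 - alpha) quantile of the statistic under H0."""
    rng = np.random.default_rng(seed)
    model = fit_common(cells, X, n_cells)
    n = len(cells)
    stats = np.empty(B)
    for b in range(B):
        c_sim, x_sim = simulate(model, n + 1, rng)
        boot_model = fit_common(c_sim[:n], x_sim[:n], n_cells)  # refit on n
        stats[b] = neg_log_density(c_sim[n], x_sim[n], boot_model)  # score the held-out draw
    return np.quantile(stats, 1 - alpha)

# Hypothetical data: 3 multinomial cells, 2 continuous variables.
rng = np.random.default_rng(1)
cells = rng.integers(0, 3, size=60)
X = rng.normal(size=(60, 2)) + cells[:, None]
model = fit_common(cells, X, n_cells=3)
crit = bootstrap_critical_value(cells, X, n_cells=3)
# Reject "belongs to this population" when the statistic exceeds crit.
reject = neg_log_density(0, np.array([50.0, 50.0]), model) > crit
```

Refitting the model inside each bootstrap replicate lets the threshold reflect estimation error in the cell means and the covariance, which is the reason for calibrating by resampling rather than relying on an asymptotic reference distribution.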

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 1994
Accession Number
ADA293714

Entities

People

  • G. D. McCartor
  • H. L. Gray
  • J. W. Miller
  • W. A. Woodward

Organizations

  • Southern Methodist University

Tags

Communities of Interest

  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Algorithms
  • Atmospheric Sciences
  • Covariance
  • Data Science
  • Discriminant Analysis
  • Earth Sciences
  • Geography
  • Geophysics
  • Indicators
  • Information Science
  • Maximum Likelihood Estimation
  • Planetary Sciences
  • Probability
  • Random Variables
  • Simulations
  • Statistics
  • Universities

Fields of Study

  • Mathematics

Readers

  • Approximation Theory
  • Computer Vision
  • Operations Research