A Hypothesis-Testing Approach to Discriminant Analysis with Mixed Categorical and Continuous Variables When Data Are Missing
Abstract
In this report we consider the problem of discriminant analysis with discrete (categorical) and continuous variables when data are missing at random. We use a hypothesis-testing approach based on the generalized likelihood ratio, as proposed by Baek et al., and we use bootstrapping to determine critical values in order to control the Type 1 error rate. We present three algorithms for this case, each assuming a different model for the data. The INDICATOR algorithm replaces categorical variables with indicator variables and treats these as if they were continuous. The FULL algorithm assumes a multinomial distribution for the discrete part and, as the conditional distribution of the continuous part given the discrete part, a multivariate normal distribution whose mean and covariance both depend on the discrete part. The COMMON algorithm assumes a multinomial distribution for the discrete part but lets only the means of the conditional multivariate normal distribution depend on the discrete part; that is, a common covariance matrix is assumed across all multinomial cells. The performance of these algorithms is compared through a simulation study. While the INDICATOR algorithm seems to have the highest power, it also tends to display a higher Type 1 error rate than desired. The FULL and COMMON algorithms have very similar power, but the COMMON algorithm appears to control the Type 1 error rate most effectively and is least susceptible to problems that occur when some multinomial cells are sparsely represented.
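The idea of using bootstrapping to determine a critical value can be illustrated with a minimal sketch. It is not the report's procedure: the resampling scheme, the function names `bootstrap_critical_value` and `squared_z`, and the toy univariate statistic are all assumptions made for illustration, standing in for the generalized likelihood ratio statistic and the mixed-variable models described in the abstract.

```python
import random


def bootstrap_critical_value(sample, statistic, alpha=0.05, n_boot=2000, seed=0):
    """Approximate the (1 - alpha) critical value of `statistic` under the null.

    Assumed scheme (hypothetical, not taken from the report): each replicate
    draws a bootstrap training set and one extra observation from the same
    sample, so the extra observation follows the null hypothesis by construction.
    """
    rng = random.Random(seed)
    n = len(sample)
    values = []
    for _ in range(n_boot):
        train = [rng.choice(sample) for _ in range(n)]
        new_obs = rng.choice(sample)
        values.append(statistic(train, new_obs))
    values.sort()
    # Index of the empirical (1 - alpha) quantile, clamped to the last element.
    k = min(n_boot - 1, int((1 - alpha) * n_boot))
    return values[k]


def squared_z(train, x):
    """Toy univariate stand-in for a likelihood-ratio statistic (large = unusual)."""
    mean = sum(train) / len(train)
    var = sum((t - mean) ** 2 for t in train) / len(train)
    if var == 0:
        return 0.0 if x == mean else float("inf")
    return (x - mean) ** 2 / var


# Usage: simulated training data, then a critical value at the 5% level.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
crit = bootstrap_critical_value(sample, squared_z, alpha=0.05)
```

Under this sketch, a new observation would be rejected as not belonging to the training population when its statistic exceeds `crit`. Taking the critical value from the resampled distribution rather than from an asymptotic approximation is what is meant to keep the Type 1 error rate near the nominal level.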
Document Details
- Document Type: Technical Report
- Publication Date: Jul 01, 1994
- Accession Number: ADA293714
Entities
People
- G. D. McCartor
- H. L. Gray
- J. W. Miller
- W. A. Woodward
Organizations
- Southern Methodist University