THE PARAMETERS OF CROSS-VALIDATION

Abstract

Cross-validation is the validation of predictor weights derived in one sample by computing, in new samples, the correlation of the weighted sum of the predictors with the criterion. The technique applies to any method of calculating the predictor weights. In this study three prediction methods were compared by cross-validation: multiple regression on the predictors, on the principal components of the predictors, and on the principal predictors. Prediction from the principal predictors is possible only when there are several criterion variables. To discover which parameters of the multivariate distribution affect the choice of prediction method and the number of principal components or principal predictors to include in the regression, a large number of distributions were simulated on a computer and samples were generated from these distributions. The population distributions varied in the following parameters: n, the number of predictors; m, the number of criteria; rho sq, the squared multiple correlation in the case of one criterion, or the average squared multiple correlation of the m criteria when m > 1; and pi sq, the average predictor variance related to the criteria. A typical calculation consisted of the following steps: generation of a population distribution for a set of values of the parameters; generation of two samples of size N from this population; calculation, in one sample, of the predictor weights for one or more prediction methods; and validation of these weights in the second sample. A large number of populations were generated, varying in the values of the parameters.
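
The steps of a typical calculation can be sketched in Python with NumPy. This is a minimal illustration, not the report's program. It assumes a single criterion (m = 1), independent standard-normal predictors, and a randomly chosen population weight vector scaled so that the squared multiple correlation equals rho_sq. None of those choices come from the abstract. All function names (make_population, draw_sample, regression_weights, pc_regression_weights, cross_validity, run_trial) are hypothetical. The parameter pi sq and the principal-predictor method are not modeled.

```python
import numpy as np


def make_population(n, rho_sq, rng):
    """Assumed population: n independent standard-normal predictors and one
    criterion whose squared multiple correlation with them is rho_sq."""
    beta = rng.standard_normal(n)
    beta *= np.sqrt(rho_sq) / np.linalg.norm(beta)  # now beta'beta = rho_sq
    return beta


def draw_sample(beta, rho_sq, N, rng):
    """Draw N observations; the criterion has unit population variance."""
    X = rng.standard_normal((N, beta.size))
    noise = rng.standard_normal(N) * np.sqrt(1.0 - rho_sq)
    return X, X @ beta + noise


def regression_weights(X, y):
    """Least-squares weights from multiple regression on the predictors."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return w


def pc_regression_weights(X, y, k):
    """Weights from regression on the first k principal components,
    expressed back in terms of the original predictors."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                      # (n, k) component loadings
    g, *_ = np.linalg.lstsq(Xc @ V, yc, rcond=None)
    return V @ g


def cross_validity(w, X, y):
    """Correlation of the weighted sum of the predictors with the criterion."""
    return np.corrcoef(X @ w, y)[0, 1]


def run_trial(n=10, rho_sq=0.5, N=50, k=5, seed=0):
    rng = np.random.default_rng(seed)
    beta = make_population(n, rho_sq, rng)       # step 1: population
    X1, y1 = draw_sample(beta, rho_sq, N, rng)   # step 2: two samples
    X2, y2 = draw_sample(beta, rho_sq, N, rng)
    w_ols = regression_weights(X1, y1)           # step 3: weights in sample 1
    w_pcr = pc_regression_weights(X1, y1, k)
    return {                                     # step 4: validate in sample 2
        "ols_fit": cross_validity(w_ols, X1, y1),
        "ols_cross": cross_validity(w_ols, X2, y2),
        "pcr_cross": cross_validity(w_pcr, X2, y2),
    }


if __name__ == "__main__":
    print(run_trial())
```

Repeating run_trial over many seeds and over a grid of n, rho_sq, N, and k would give averages comparable in spirit to the simulation the abstract describes, under the assumptions stated above.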

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 1967
Accession Number
AD0667467

Entities

People

  • Paul A. Herzberg

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Computational Science
  • Computer Programs
  • Computers
  • Data Science
  • Data Sets
  • Demography
  • Factor Analysis
  • Information Science
  • Mathematical Models
  • New York
  • Normal Distribution
  • Psychological Tests
  • Random Variables
  • Regression Analysis
  • Simulations
  • Statistical Algorithms
  • Statistics

Readers

  • Regression Analysis