The Matching Methodology: Some Statistical Properties.

Abstract

Matching is the methodology of merging micro-data files to create larger data files. It is often done to extract statistical information that cannot be obtained from the individual, incomplete files. Current federal statistical practice involving multivariate file-merging techniques is typically not based on a formal statistical theory. In view of this situation, a survey of matching is given. All known models for matching are presented under a unified framework, which consists of three situations involving the same or similar individuals. The properties of a maximum likelihood strategy for matching files of data on the same individuals are derived via ranks and order statistics from bivariate populations. The properties of this strategy are also examined with respect to a more reasonable criterion called epsilon-correct matching. Asymptotic results for this situation are derived, including the Poisson approximation for the distribution of the number of correct matches and convergence in probability of the average number of epsilon-correct matches. Small-sample properties, such as the monotone behavior of the expected number of matches with respect to the dependence parameters of the underlying models, are proved. Two matching strategies due to Kadane (1978) and one strategy due to Sims (1978) for merging files of data on similar individuals are discussed. These strategies are evaluated via a Monte Carlo study of matching models involving trivariate normal distributions.

Keywords: Monte Carlo method; trivariate normal distributions.
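
As a rough illustration of matching two files on the same individuals, the following Python sketch pairs records by rank and counts correct and epsilon-correct matches. It is not taken from the report. It assumes a standard bivariate normal population with nonnegative correlation rho, for which pairing the i-th smallest x with the i-th smallest y is the usual form of a rank-based matching rule, and it assumes that a match is epsilon-correct when the matched y value lies within eps of the true one. All function names and parameters are hypothetical.

```python
import random


def simulate_pairs(n, rho, rng):
    """Draw n pairs (x, y) from a standard bivariate normal with correlation rho."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        y = rho * x + (1.0 - rho * rho) ** 0.5 * z
        pairs.append((x, y))
    return pairs


def rank_match(xs, ys):
    """Pair the i-th smallest x with the i-th smallest y.

    Returns a list `match` where match[i] is the index in ys assigned to xs[i].
    """
    x_order = sorted(range(len(xs)), key=lambda i: xs[i])
    y_order = sorted(range(len(ys)), key=lambda j: ys[j])
    match = [None] * len(xs)
    for i, j in zip(x_order, y_order):
        match[i] = j
    return match


def count_correct(match, truth):
    """Number of records assigned to their true partner."""
    return sum(1 for m, t in zip(match, truth) if m == t)


def count_eps_correct(match, truth, ys, eps):
    """Number of records whose assigned y is within eps of the true y (assumed definition)."""
    return sum(1 for m, t in zip(match, truth) if abs(ys[m] - ys[t]) <= eps)


def demo(n, rho, eps, seed):
    """Simulate one broken file pair and return (correct, eps_correct) counts."""
    rng = random.Random(seed)
    pairs = simulate_pairs(n, rho, rng)
    xs = [p[0] for p in pairs]
    # Shuffle the second file so the true pairing is hidden from the matcher.
    perm = list(range(n))
    rng.shuffle(perm)
    ys = [pairs[k][1] for k in perm]
    # truth[i] is the position in ys of the y that belongs to xs[i].
    truth = [None] * n
    for pos, k in enumerate(perm):
        truth[k] = pos
    match = rank_match(xs, ys)
    return count_correct(match, truth), count_eps_correct(match, truth, ys, eps)


if __name__ == "__main__":
    for rho in (0.3, 0.6, 0.9):
        correct, eps_correct = demo(n=200, rho=rho, eps=0.1, seed=1)
        print(rho, correct, eps_correct)
```

Running the script for several values of rho gives a quick feel for how both counts respond to the strength of dependence; a single run is only one Monte Carlo replicate, so averages over many seeds would be needed for anything resembling the results summarized in the abstract.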

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1986
Accession Number
ADA194801

Entities

People

  • Prem K. Goel
  • T. Ramalingam

Organizations

  • Ohio State University

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Computational Science
  • Data Science
  • Distribution Functions
  • Estimators
  • Information Science
  • Maximum Likelihood Estimation
  • Normal Distribution
  • Order Statistics
  • Probability
  • Random Variables
  • Statistical Algorithms
  • Statistical Analysis
  • Statistical Inference
  • Statistical Samples
  • Statistics
  • Surveys
  • Theorems

Fields of Study

  • Mathematics

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design.
  • Database Systems and Applications
  • Statistical inference.