Principles for Dealing with Large Programs and Large Data Files in Policy Studies
Abstract
The formal training of most analysts covers the theory of statistical and quantitative modeling in great detail. Because of time constraints, however, even application-oriented courses provide experience with only small data sets. My experience at RAND indicates that many real-life problems require analyses of large data sets. Although the theoretical concerns apply equally to small and large data sets, some practical concerns (such as data cleaning, an analyst's understanding of the data sets, and writing computer code to transform variables) can be considerably more difficult for large data sets. The purpose of this paper is to briefly explain principles for dealing with these practical concerns. The paper is written for technically competent analysts who already know data analysis and how to write computer programs but wish to improve their effectiveness by approaching the task of analyzing large data sets systematically. Given the benefits of these principles in dealing with real-life problems, I advocate making their study a requirement in any graduate program for statisticians, economists, operations researchers, and other quantitative analysts.
Document Details
- Document Type: Technical Report
- Publication Date: Feb 01, 1988
- Accession Number: ADA216591
Entities
People
- R. Y. Arguden
Organizations
- RAND Corporation