Principles for Dealing with Large Programs and Large Data Files in Policy Studies

Abstract

The formal training of most analysts covers the theory of statistical and quantitative modeling in great detail. Because of time constraints, however, even application-oriented courses provide experience with only small data sets. My experience at RAND indicates that many real-life problems require analyses of large data sets. Although the theoretical concerns apply equally to small and large data sets, some practical concerns (such as data cleaning, an analyst's understanding of the data sets, and writing computer code to transform variables) can be considerably more difficult for large data sets. The purpose of this paper is to briefly explain principles for dealing with the large programs and large data files that such analyses involve. The paper is written for technically competent analysts who already know data analysis and how to write computer programs but wish to improve their effectiveness by approaching the task of analyzing large data sets systematically. Given the benefits of these principles in dealing with real-life problems, I advocate making their study a requirement in any graduate program for statisticians, economists, operations researchers, and other quantitative analysts.

Document Details

Document Type
Technical Report
Publication Date
Feb 01, 1988
Accession Number
ADA216591

Entities

People

  • R. Y. Arguden

Organizations

  • RAND Corporation

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Audiovisual Aids
  • Computer Programming
  • Computer Programs
  • Computers
  • Data Analysis
  • Data Sets
  • Databases
  • Efficiency
  • Information Science
  • Language
  • Observation
  • Personnel Management
  • Programming Languages
  • Regression Analysis
  • Reliability
  • Structured Programming

Fields of Study

  • Computer science

Readers

  • Computer Science
  • Distributed Systems and Data Platform Development
  • Systems Analysis and Design