Improving massive experiments with threshold blocking

Abstract

Inferences from randomized experiments can be improved by blocking: assigning treatment in fixed proportions within groups of similar units. However, the use of the method is limited by the difficulty in deriving these groups. Current blocking methods are restricted to special cases or run in exponential time; are not sensitive to clustering of data points; and are often heuristic, providing an unsatisfactory solution in many common instances. We present an algorithm that implements a widely applicable class of blocking—threshold blocking—that solves these problems. Given a minimum required group size and a distance metric, we study the blocking problem of minimizing the maximum distance between any two units within the same group. We prove this is a nondeterministic polynomial-time hard problem and derive an approximation algorithm that yields a blocking where the maximum distance is guaranteed to be, at most, four times the optimal value. This algorithm runs in O ( n log n ) time with O ( n ) space complexity. This makes it, to our knowledge, the first blocking method with an ensured level of performance that works in massive experiments. Whereas many commonly used algorithms form pairs of units, our algorithm constructs the groups flexibly for any chosen minimum size. This facilitates complex experiments with several treatment arms and clustered data. A simulation study demonstrates the efficiency and efficacy of the algorithm; tens of millions of units can be blocked using a desktop computer in a few minutes.

Document Details

Document Type
Pub Defense Publication
Publication Date
Jul 05, 2016
Source ID
10.1073/pnas.1510504113

Entities

People

  • Fredrik Sävje
  • Jasjeet S. Sekhon
  • Michael J. Higgins

Organizations

  • Kansas State University
  • Office of Naval Research
  • Statistics New Zealand
  • Uppsala University

Tags

Fields of Study

  • Computer science

Readers

  • Applied Combinatorial Optimization and Logic Circuit Design.
  • Distributed Systems and Data Platform Development
  • Mathematical Modeling and Probability Theory.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • Space