Bias reduction in population size estimation of big data

Abstract

We have focused on the problem of reducing bias in population size estimation for big data sets. We review some of the most popular general population size estimation techniques and methods that are more specific to graphs and networks, briefly discuss sampling techniques for graphs and networks, and outline some of the drawbacks of these different methods. Four population size estimators were proposed as alternatives to two existing methods, by [24] and [39]. They are built around estimating the reciprocal of the population mean of a data set, a quantity that the two existing methods do not estimate well. Simulations and an analysis of nine real network data sets (social networks and communications networks) suggest that the proposed estimators overcome some of the problems of the existing methods, such as the need for a minimum sample size, and show some evidence of success, especially when used with small samples from large data. In particular, for small samples they produce smaller bias and more accurate population size estimates, and in a limited number of cases they outperform the existing estimators even for larger sample sizes. For the majority of cases tested, however, the existing estimators still appear to have an advantage when a few samples of larger size are used. The code used for the real data analysis has been written as an R package that computes population size estimates, the bias of each estimator, and plots of the bias of all size estimators.
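The abstract does not spell out the proposed estimators, so the following is only a minimal sketch of why the reciprocal of the population mean is a natural target. It is not the authors' method and not their R package. It assumes a hypothetical setting in which items (for example, network nodes) are sampled with replacement with probability proportional to a value x_i, such as node degree under a random walk, and in which the population total T = Σ x_i is known. Under that design the plain sample mean is biased upward, while the average of 1/x_i is unbiased for 1/μ, so N = T/μ can be estimated as T times that average. All function names and the simulated population are hypothetical.

```python
import random


def size_biased_sample(values, n, rng):
    """Hypothetical sampler: draw n items with replacement, each chosen
    with probability proportional to its value (e.g. degree under a
    random walk)."""
    return rng.choices(values, weights=values, k=n)


def reciprocal_mean_estimate(sample):
    """Estimate 1 / (population mean) as the average of 1/x over a
    size-biased sample. Under size-biased sampling,
    E[1/X] = sum_i (x_i / T) * (1 / x_i) = N / T = 1 / mean."""
    return sum(1.0 / x for x in sample) / len(sample)


def population_size_estimate(sample, total):
    """Assumes the population total T is known, so that
    N = T / mean = T * (1 / mean)."""
    return total * reciprocal_mean_estimate(sample)


if __name__ == "__main__":
    rng = random.Random(2018)
    # Simulated (hypothetical) population of 10,000 values between 1 and 50.
    population = [rng.randint(1, 50) for _ in range(10_000)]
    total = sum(population)
    sample = size_biased_sample(population, 5000, rng)
    # Naive plug-in: the size-biased sample mean overestimates the mean,
    # so dividing the total by it underestimates N.
    naive_n = total / (sum(sample) / len(sample))
    print("true N:", len(population))
    print("naive N (total / sample mean):", round(naive_n))
    print("reciprocal-mean N:", round(population_size_estimate(sample, total)))
```

The design choice here is to estimate 1/μ directly rather than μ: inverting a size-biased sample mean inherits its upward bias, whereas averaging 1/x_i cancels the sampling weights exactly.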

Document Details

Document Type
DoD Grant Award
Publication Date
Oct 30, 2018
Source ID
W911NF1710048

Entities

People

  • Saralees Nadarajah

Organizations

  • Army Contracting Command
  • United States Army
  • University of Manchester

Tags

Fields of Study

  • Computer science
  • Mathematics

Readers

  • Statistical Inference
  • Systems Analysis and Design