Bias reduction in population size estimation of big data
Abstract
We have focused on the problem of bias reduction in the population size estimation of big data sets. We provide a review of some of the most popular general population size estimation techniques and of methods that are more specific to graphs and networks. We also give a short discussion of various sampling techniques for graphs and networks, and of the drawbacks of these different methods. Four population size estimators are proposed as alternatives to two existing methods, those of [24] and [39]. These estimators are built around the idea of estimating the reciprocal of the population mean of a data set, a quantity that the two existing methods do not estimate well. We ran simulations and analysed nine real network data sets, covering social networks and communication networks. The results suggest that the proposed estimators overcome some of the problems of the existing methods, such as the need for a minimum sample size, and show some evidence of success, especially when small samples are drawn from large data sets. For small samples in particular, our estimators produce smaller bias and more accurate population size estimates, and in a limited number of cases they outperform the existing estimators even for larger sample sizes. However, in the majority of cases tested, the existing estimators still appear to hold an advantage when a few samples of larger size are used. The code used for the real data analysis has been written into an R package that computes population size estimates, the estimator bias, and plots of the bias of all the size estimators.
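The idea of estimating the reciprocal of the population mean can be made concrete under one assumed sampling scheme. The sketch below is an illustration only. It is not one of the four proposed estimators, nor the method of [24] or [39]. Assume units (for example nodes) are drawn with replacement with probability proportional to a positive attribute x_i, such as degree. Let S be the population total, N the population size and μ = S/N the population mean. Each draw then satisfies E[1/X] = Σ_i (x_i/S)(1/x_i) = N/S = 1/μ, so the sample mean of 1/x is unbiased for 1/μ. Combining the sums of x and 1/x with the number C of pairs of draws that hit the same unit gives a simple collision-based size estimate N̂ = (Σx)(Σ1/x)/(2C). The estimate is undefined when no collisions occur, which illustrates why collision-based methods can require a minimum sample size. The function names, the synthetic degree data and the sample sizes are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Sequence


def size_biased_sample(values: Sequence[float], r: int, rng: random.Random) -> list[int]:
    """Draw r unit indices with replacement, with probability proportional to values[i]."""
    return rng.choices(range(len(values)), weights=values, k=r)


def reciprocal_mean_estimate(values: Sequence[float], idx: Sequence[int]) -> float:
    """Sample mean of 1/x; under size-biased sampling its expectation is 1/mu."""
    return sum(1.0 / values[i] for i in idx) / len(idx)


def exact_reciprocal_expectation(values: Sequence[float]) -> float:
    """E[1/X] for one size-biased draw: sum_i (x_i/S) * (1/x_i), which equals N/S."""
    total = sum(values)
    return sum((x / total) * (1.0 / x) for x in values)


def collision_size_estimate(values: Sequence[float], idx: Sequence[int]) -> float:
    """Hypothetical collision-based estimate N_hat = (sum x)(sum 1/x) / (2C)."""
    psi_pos = sum(values[i] for i in idx)        # sum of x over the sample
    psi_neg = sum(1.0 / values[i] for i in idx)  # sum of 1/x over the sample
    # C = number of pairs of draws that landed on the same unit
    collisions = sum(c * (c - 1) // 2 for c in Counter(idx).values())
    if collisions == 0:
        return math.inf  # no repeated units yet: sample too small to estimate N
    return psi_pos * psi_neg / (2 * collisions)


# Synthetic example: N = 10,000 units with hypothetical "degrees" in 1..50.
rng = random.Random(1)
degrees = [rng.randint(1, 50) for _ in range(10_000)]
mu = sum(degrees) / len(degrees)
idx = size_biased_sample(degrees, 2_000, rng)

print(exact_reciprocal_expectation(degrees), 1 / mu)  # equal up to rounding
print(reciprocal_mean_estimate(degrees, idx))         # close to 1 / mu
print(collision_size_estimate(degrees, idx))          # expected to be near 10,000
```

With these settings the reciprocal-mean estimate should track 1/μ closely. With much smaller samples, repeated draws become rare and the collision estimate becomes unstable or infinite.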
Document Details
- Document Type: DoD Grant Award
- Publication Date: Oct 30, 2018
- Source ID: W911NF1710487
Entities
People
- Saralees Nadarajah
Organizations
- Army Contracting Command
- United States Army
- University of Manchester