Recursive Fact-finding: A Streaming Approach to Truth Estimation in Crowdsourcing Applications

Abstract

This paper presents a streaming approach to solve the truth estimation problem in crowdsourcing applications. We consider a category of crowdsourcing applications where a group of individuals volunteer (or are recruited) to share certain observations or measurements about the physical world. Examples include reporting locations of gas stations that remain operational after a natural disaster or reporting locations of potholes on city streets. We call such applications social sensing. Ascertaining the correctness of reported observations is a key challenge in such applications, referred to as the truth estimation problem. This problem is made difficult by the fact that the reliability of individual sources is usually unknown a priori, since any concerned citizen may, in principle, participate. Moreover, the timescales of crowdsourcing campaigns of interest can be as short as a few hours or days, which does not offer enough history for a reputation system to converge. Instead, recent prior work, including our own, developed fact-finding algorithms to solve this problem by iteratively assessing the credibility of sources and their claims in the absence of reputation scores. Such algorithms, however, operate on the entire dataset of reported observations in a batch fashion, which makes them less suited to applications where new observations arrive continually. In this paper, we describe a streaming fact-finder that recursively updates previous estimates based on new data. The recursive algorithm solves an expectation maximization (EM) problem to determine the odds of correctness of different observations. We compare the performance of our recursive EM algorithm to a batch EM algorithm, as well as to several state-of-the-art fact-finders, through extensive simulations. We also demonstrate convergence of the recursive algorithm to the results of the batch version through a real social sensing experiment. Our evaluation shows that the proposed approach can process data streams.
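
The idea of updating earlier estimates as new observations arrive can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the report: the binary claim model, the parameters a[i] (probability that source i reports a claim given that it is true), b[i] (the same given that it is false) and d (prior probability that a claim is true), the class name StreamingFactFinder, and the simple accumulation of sufficient statistics across batches. The report's own recursive estimator may differ substantially.

```python
import math


class StreamingFactFinder:
    """Hypothetical sketch: EM truth estimation with statistics carried across batches."""

    def __init__(self, n_sources, eps=1e-3):
        self.n = n_sources
        self.eps = eps
        # Initial guesses (assumed). Starting with a > b encodes the belief that
        # sources are more likely to report true claims than false ones.
        self.a = [0.6] * n_sources  # P(source reports claim | claim is true)
        self.b = [0.4] * n_sources  # P(source reports claim | claim is false)
        self.d = 0.5                # prior P(claim is true)
        # Sufficient statistics accumulated over all claims seen so far.
        self.num_a = [0.0] * n_sources
        self.num_b = [0.0] * n_sources
        self.z_true = 0.0
        self.z_false = 0.0

    def _clip(self, p):
        # Keep probabilities away from 0 and 1 so the logarithms stay finite.
        return min(1.0 - self.eps, max(self.eps, p))

    def _e_step(self, claims):
        # Posterior probability that each claim is true under current parameters.
        z = []
        for reports in claims:
            log_odds = math.log(self.d / (1.0 - self.d))
            for i, r in enumerate(reports):
                if r:
                    log_odds += math.log(self.a[i] / self.b[i])
                else:
                    log_odds += math.log((1.0 - self.a[i]) / (1.0 - self.b[i]))
            log_odds = max(-700.0, min(700.0, log_odds))  # avoid exp overflow
            z.append(1.0 / (1.0 + math.exp(-log_odds)))
        return z

    def update(self, claims, inner_iters=20):
        """Process a batch; claims[j][i] is 1 if source i reported claim j, else 0."""
        if not claims:
            return []
        z = []
        for _ in range(inner_iters):
            z = self._e_step(claims)
            # M-step: combine statistics from earlier batches with this batch.
            zt = self.z_true + sum(z)
            zf = self.z_false + sum(1.0 - zj for zj in z)
            for i in range(self.n):
                na = self.num_a[i] + sum(c[i] * zj for c, zj in zip(claims, z))
                nb = self.num_b[i] + sum(c[i] * (1.0 - zj) for c, zj in zip(claims, z))
                self.a[i] = self._clip(na / max(zt, self.eps))
                self.b[i] = self._clip(nb / max(zf, self.eps))
            self.d = self._clip(zt / (zt + zf))
        # Commit this batch's contribution so later batches build on it.
        for i in range(self.n):
            self.num_a[i] += sum(c[i] * zj for c, zj in zip(claims, z))
            self.num_b[i] += sum(c[i] * (1.0 - zj) for c, zj in zip(claims, z))
        self.z_true += sum(z)
        self.z_false += sum(1.0 - zj for zj in z)
        return z


# Usage: four sources; each inner list holds one claim's reports by source.
finder = StreamingFactFinder(n_sources=4)
first = finder.update([[1, 1, 1, 0], [0, 0, 0, 1]])   # first batch of claims
second = finder.update([[1, 1, 0, 0], [0, 0, 0, 1]])  # later batch reuses estimates
```

Calling update once with all claims corresponds to a batch run of this sketch, while calling it once per arriving batch reuses the source estimates learned so far instead of reprocessing the full history.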

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2013
Accession Number
ADA582792

Entities

People

  • Charu Aggarwal
  • Dong Wang
  • Lance Kaplan
  • Tarek Abdelzaher

Organizations

  • United States Army Research Laboratory

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Artificial Intelligence
  • Computational Complexity
  • Computer Science
  • Crowdsourcing
  • Data Mining
  • Data Sets
  • Machine Learning
  • Maximum Likelihood Estimation
  • Military Research
  • Mobile Devices
  • Mobile Phones
  • Simulations
  • Social Networks
  • Test And Evaluation
  • Time Intervals

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Educational Psychology
  • Statistical inference