Using a Bayesian Model to Combine LDA Features with Crowdsourced Responses

Abstract

This paper describes a crowdsourcing system that integrates machine learning techniques with human classifiers, and shows how a Bayesian approach to classifier combination can be applied to the problem of crowdsourcing document topic labels. First, we use a number of NLP techniques to extract informative document features. Next, we screen and select workers on Amazon Mechanical Turk to label a chosen subset of documents. We then apply Independent Bayesian Classifier Combination (IBCC) to classify the complete set of documents in a semi-supervised manner, taking into account the unreliability of crowdsourced labels. Further documents are then selected intelligently for labelling by the crowd. We demonstrate that IBCC outperforms a two-stage classifier and achieves strong performance when only 16% of the documents are labelled by the crowd.
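The core idea behind combining unreliable crowd labels can be illustrated with a minimal sketch. It is not the report's implementation: it assumes each worker's confusion matrix and the class proportions are already known and fixed, whereas IBCC places priors on these quantities and infers them from the data. The function name `combine_labels`, the worker names, and all numbers are hypothetical and chosen only for illustration.

```python
from math import prod


def combine_labels(prior, confusions, responses):
    """Posterior over the true class, assuming worker responses are
    conditionally independent given the true class.

    prior:      list of class proportions, one entry per class.
    confusions: dict worker_id -> matrix, where confusions[w][t][c] is the
                probability that worker w answers c when the true class is t.
    responses:  dict worker_id -> observed answer (a class index).
    """
    # Unnormalised posterior: prior times the likelihood of every response.
    unnormalised = [
        prior[t] * prod(confusions[w][t][c] for w, c in responses.items())
        for t in range(len(prior))
    ]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical two-class example: one reliable and one noisy worker disagree.
prior = [0.5, 0.5]
confusions = {
    "reliable": [[0.9, 0.1], [0.1, 0.9]],
    "noisy": [[0.6, 0.4], [0.4, 0.6]],
}
posterior = combine_labels(prior, confusions, {"reliable": 0, "noisy": 1})
print(posterior)
```

In this example the posterior favours class 0, the reliable worker's answer, because the noisy worker's disagreement carries less weight. Weighting each response by the worker's estimated reliability is what lets a combination method of this kind tolerate unreliable crowdsourced labels.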

Document Details

Document Type
Technical Report
Publication Date
Feb 05, 2013
Accession Number
ADA580905

Entities

People

  • Antonio Penta
  • Edwin Simpson
  • Sarvapali Ramchurn
  • Steven Reece

Organizations

  • University of Oxford

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes
  • Sensors

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence Software
  • Bayesian Networks
  • Computer Science
  • Crowdsourcing
  • Data Sets
  • Feature Extraction
  • Human-Machine Interfaces
  • Human-Machine Systems
  • Information Science
  • Machine Learning
  • Models
  • Natural Language Processing
  • Probability
  • Standards
  • Training
  • User Interface

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval