Using a Bayesian Model to Combine LDA Features with Crowdsourced Responses

Abstract

This paper describes a crowdsourcing system that integrates machine learning techniques with human classifiers, showing how to apply a Bayesian approach to classifier combination to the challenge of crowdsourcing document topic labels. First, we use a number of NLP techniques to extract informative document features. We then screen and select workers using Amazon Mechanical Turk to label selected documents. Next, we apply Independent Bayesian Classifier Combination (IBCC) to classify the complete set of documents in a semi-supervised manner, taking into account the unreliability of crowdsourced labels. More documents are then selected intelligently for labelling by the crowd. We demonstrate superior results using IBCC compared to a two-stage classifier, and strong performance with only 16% of the documents labelled by the crowd.

Document Details

Document Type: Technical Report
Publication Date: Feb 05, 2013
Accession Number: ADA580905

Entities

People

Antonio Penta
Edwin Simpson
Sarvapali Ramchurn
Steven Reece

Organizations

University of Oxford
