Using a Bayesian Model to Combine LDA Features with Crowdsourced Responses
Abstract
This paper describes a crowdsourcing system that integrates machine learning techniques with human classifiers, and shows how to apply a Bayesian approach to classifier combination to the problem of crowdsourcing document topic labels. First, we use a number of NLP techniques to extract informative document features. Next, we screen and select workers on Amazon Mechanical Turk to label selected documents. We then apply Independent Bayesian Classifier Combination (IBCC) to classify the complete set of documents in a semi-supervised manner, taking into account the unreliability of crowdsourced labels. Further documents are then selected intelligently for labelling by the crowd. We demonstrate superior results using IBCC compared to a two-stage classifier, and strong performance with only 16% of documents labelled by the crowd.
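The combination step can be illustrated with a much-simplified sketch. It is not the report's implementation: IBCC treats each worker's confusion matrix and the class proportions as unknowns with priors and infers them together with the document labels, whereas the sketch below assumes they are already known and fixed. The function name `combine_labels`, the worker names, and all numbers are hypothetical, chosen only to show how an unreliable worker's answer carries less weight than a reliable one's.

```python
from math import exp, log


def combine_labels(prior, confusions, responses):
    """Posterior over the true class of one document.

    prior:      list of class probabilities, length J
    confusions: dict worker -> J x L matrix, where confusions[k][j][l]
                is P(worker k answers l | true class is j); entries are
                assumed strictly positive
    responses:  dict worker -> observed answer index l; workers who did
                not label this document are simply absent
    """
    # Work in log space so many workers do not cause underflow.
    log_post = [log(p) for p in prior]
    for worker, answer in responses.items():
        for j in range(len(prior)):
            log_post[j] += log(confusions[worker][j][answer])
    # Normalise with the max subtracted for numerical stability.
    m = max(log_post)
    weights = [exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-class example: one fairly reliable worker and one
# whose answers carry no information about the true class.
prior = [0.5, 0.5]
confusions = {
    "reliable": [[0.9, 0.1], [0.2, 0.8]],
    "random": [[0.5, 0.5], [0.5, 0.5]],
}
responses = {"reliable": 1, "random": 0}
posterior = combine_labels(prior, confusions, responses)
```

Under these assumed numbers the uninformative worker's answer cancels out of the posterior, so the result is driven entirely by the reliable worker's response.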
Document Details
- Document Type: Technical Report
- Publication Date: Feb 05, 2013
- Accession Number: ADA580905
Entities
People
- Antonio Penta
- Edwin Simpson
- Sarvapali Ramchurn
- Steven Reece
Organizations
- University of Oxford