Learning for Microblogs with Distant Supervision: Political Forecasting with Twitter
Abstract
Microblogging websites such as Twitter offer a wealth of insight into a population's current mood. Automated approaches to identifying general sentiment toward a particular topic often perform two steps: Topic Identification and Sentiment Analysis. Topic Identification identifies tweets that are relevant to a desired topic (e.g., a politician or event), and Sentiment Analysis extracts each tweet's attitude toward the topic. Many techniques for Topic Identification simply involve selecting tweets using a keyword search. Here we present an approach that uses distant supervision to train a classifier on the tweets returned by the search. We show that distant supervision leads to improved performance in the Topic Identification task as well as in the downstream Sentiment Analysis task. We then use a system that incorporates distant supervision into both stages to analyze sentiments toward President Obama as expressed in a dataset of tweets. That is, we apply our approach to the problem of predicting Presidential Job Approval polls from Twitter data. Our results show better correlation with Gallup's Presidential Job Approval polls than previous work. We also present a novel baseline that performs remarkably well without using Topic Identification.
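The distant supervision idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that tweets matching a seed keyword serve as noisy positive examples, that all other tweets serve as negatives, and that the keyword itself is masked from the features so the classifier has to learn from surrounding context. The keyword set, the function names, and the choice of a small Naive Bayes model are all hypothetical.

```python
import math
import re
from collections import Counter

KEYWORDS = {"obama"}  # hypothetical seed keyword, not taken from the report


def tokenize(text):
    """Lowercase a tweet and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def distant_labels(tweets, keywords=KEYWORDS):
    """Noisy labels: 1 if a tweet contains a seed keyword, else 0."""
    return [int(any(tok in keywords for tok in tokenize(t))) for t in tweets]


def features(text, keywords=KEYWORDS):
    """Mask the seed keywords so the model learns from context words."""
    return [tok for tok in tokenize(text) if tok not in keywords]


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (an assumption)."""

    def fit(self, docs, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.doc_totals = Counter(labels)
        for toks, y in zip(docs, labels):
            self.counts[y].update(toks)
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def log_odds(self, toks):
        """Log-odds that a token list is on-topic (class 1) vs. off-topic."""
        n = sum(self.doc_totals.values())
        score = math.log((self.doc_totals[1] + 1) / (n + 2))
        score -= math.log((self.doc_totals[0] + 1) / (n + 2))
        v = len(self.vocab)
        t1 = sum(self.counts[1].values())
        t0 = sum(self.counts[0].values())
        for tok in toks:
            if tok not in self.vocab:
                continue  # ignore words never seen during training
            score += math.log((self.counts[1][tok] + 1) / (t1 + v))
            score -= math.log((self.counts[0][tok] + 1) / (t0 + v))
        return score


def train_topic_classifier(tweets, keywords=KEYWORDS):
    """Train on keyword-derived labels instead of hand annotations."""
    labels = distant_labels(tweets, keywords)
    docs = [features(t, keywords) for t in tweets]
    return NaiveBayes().fit(docs, labels)


def is_on_topic(model, tweet, keywords=KEYWORDS):
    """Classify a tweet, including ones that never mention the keyword."""
    return model.log_odds(features(tweet, keywords)) > 0
```

A model trained this way can flag a tweet as on-topic even when the seed keyword is absent, which a plain keyword search cannot do. A real system would need far more training data and a held-out evaluation set.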
Document Details
- Document Type: Technical Report
- Publication Date: Apr 01, 2012
- Accession Number: ADA589957
Entities
People
- Micol Marchetti-Bowick
- Nathanael Chambers