Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

Abstract

We address the problem of part-of-speech tagging for English data from the popular microblogging service Twitter. We develop a tagset, annotate data, develop features, and report tagging results nearing 90% accuracy. The data and tools have been made available to the research community with the goal of enabling richer text analysis of Twitter and related social media data sets.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2010
Accession Number
ADA547371

Entities

People

  • Brendan T O'Connor
  • Dani Yogatama
  • Daniel Mills
  • Dipanjan Das
  • Jacob Eisenstein
  • Jeffrey Flanigan
  • Kevin Gimpel
  • Michael Heilman
  • Nathan Schneider
  • Noah A. Smith

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Accuracy
  • Computational Science
  • Computer Languages
  • Computer Science
  • Data Sets
  • Information Science
  • Language
  • Machine Learning
  • Microblogging Services
  • Natural Language Processing
  • Online Communications
  • Probabilistic Models
  • Social Media
  • Speech
  • Standards
  • Supervised Machine Learning
  • Test Sets

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Computational Linguistics