The Ubuntu Chat Corpus for Multiparticipant Chat Analysis

Abstract

We present the Ubuntu Chat Corpus as a data source for multiparticipant chat analysis. This addresses the lack of a large, publicly available corpus for research in this medium. The advantages of using this corpus for research are its large number of chat messages, its multiple languages, its technical nature, and the fact that all of the original chat messages are in the public domain.

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 2013
Accession Number: ADA602658

Entities

People

  • David C. Uthus
  • David W. Aha

Organizations

  • United States Naval Research Laboratory

Tags

Communities of Interest

  • C4I

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automated Text Summarization
  • Command And Control
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Computer Science
  • Computers
  • Data Mining
  • English Language
  • Intelligent Agents
  • Language
  • Linguistics
  • Machine Translation
  • Operating Systems
  • Social Media

Fields of Study

  • Computer science

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments.