A Study of Topic and Topic Change in Conversational Threads

Abstract

This thesis applies Latent Dirichlet Allocation (LDA) to the problem of identifying topic and topic change in conversational e-mail threads. We demonstrate that LDA can successfully assign raw e-mail messages to the threads to which they belong, and compare the results with those for processed threads, from which quoted and reply text has been removed. Classification of raw threads performs better, but processed threads show promise. We then present two new unsupervised techniques for identifying topic change in e-mail. The first is a keyword clustering approach that uses LDA and DBSCAN to identify clusters of topics and the transition points between them. The second is a sliding window technique that assesses the current topic for every window in order to identify transition points. The keyword clustering approach performs better than the sliding window approach. Both can serve as a baseline for future work.
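
The sliding window idea in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the window size, the topic model settings, or the rule for declaring a transition, so all of those are assumptions here. In particular, the hypothetical TOPIC_KEYWORDS lexicon is a simple keyword-count stand-in for inference with a trained LDA model, and the names dominant_topic and find_transitions are illustrative only.

```python
from collections import Counter

# Hypothetical lexicon standing in for a trained LDA model (assumption:
# the thesis infers topics with LDA; this keyword count is only a stand-in).
TOPIC_KEYWORDS = {
    "budget": {"budget", "cost", "funding", "invoice"},
    "schedule": {"meeting", "deadline", "schedule", "calendar"},
}


def dominant_topic(tokens):
    """Return the topic with the most keyword hits in tokens, or None."""
    counts = Counter()
    for token in tokens:
        for topic, words in TOPIC_KEYWORDS.items():
            if token in words:
                counts[topic] += 1
    if not counts:
        return None
    # Iterate in sorted order so ties resolve deterministically.
    return max(sorted(counts), key=lambda topic: counts[topic])


def find_transitions(messages, window=2):
    """Slide a fixed-size window over messages and return the index of the
    newest message in each window whose dominant topic differs from the
    last known dominant topic."""
    transitions = []
    previous = None
    for start in range(len(messages) - window + 1):
        tokens = [
            token
            for message in messages[start:start + window]
            for token in message.lower().split()
        ]
        current = dominant_topic(tokens)
        if current is None:
            continue  # no evidence in this window; keep the previous topic
        if previous is not None and current != previous:
            transitions.append(start + window - 1)
        previous = current
    return transitions


thread = [
    "the budget and cost",
    "funding invoice budget",
    "meeting deadline schedule calendar",
    "calendar meeting deadline",
]
print(find_transitions(thread, window=2))
```

In this sketch a transition is attributed to the newest message in the window. A larger window would smooth out brief digressions at the cost of detecting changes later, which is one of the trade-offs any sliding window method has to settle.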

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2009
Accession Number
ADA508982

Entities

People

  • Jessy Cowan-Sharp

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Autonomy
  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Accuracy
  • Cognitive Science
  • Computational Science
  • Data Mining
  • Data Processing
  • Electronic Mail
  • Information Processing
  • Information Science
  • Machine Learning
  • Natural Language Processing
  • Natural Languages
  • Network Science
  • Online Communications
  • Probabilistic Models
  • Probability
  • Probability Distributions
  • Supervised Machine Learning
