Unsupervised Topic Discovery by Anomaly Detection

Abstract

With the vast amount of information and public comment available online, it is of increasing interest to understand what is being said and what topics are trending online. Government agencies, for example, want to know what policies concern the public without having to look through thousands of comments manually. Topic detection provides automatic identification of topics in documents based on the information content and enhances many natural language processing tasks, including text summarization and information retrieval. Unsupervised topic detection, however, has always been a difficult task. Methods such as Latent Dirichlet Allocation (LDA) convert documents from word space into document space (weighted sums over topic space), but do not perform any form of classification, nor do they address the relation of generated topics with actual human level topics. In this thesis we attempt a novel way of unsupervised topic detection and classification by performing LDA and then clustering. We propose variations to the popular K-Mean Clustering algorithm to optimize the choice of centroids, and we perform experiments using Facebook data and the New York Times (NYT) corpus. Although the results were poor for the Facebook data, our method performed acceptably with the NYT data. The new clustering algorithms also performed slightly and consistently better than the normal K-Means algorithm.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Sep 01, 2013
Accession Number
ADA589398

Entities

People

  • Leon Cheng

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Air Platforms
  • Autonomy
  • Cyber

DTIC Thesaurus Topics

  • Anomaly Detection
  • Artificial Intelligence Software
  • Automata Theory
  • Automated Text Summarization
  • Change Detection
  • Computational Science
  • Computer Languages
  • Computer Science
  • Data Mining
  • Generative Models
  • Information Retrieval
  • Information Science
  • Kernel Functions
  • Machine Learning
  • Natural Language Processing
  • Network Science
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • Space
  • Space - Space Objects