Topic Similarity Networks: Visual Analytics for Large Document Sets

Abstract

We investigate ways to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to tease out non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14-year period and 2) the entire English portion of Wikipedia.
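
To make the definition of a topic similarity network concrete, the following is a minimal sketch rather than the report's method. The abstract does not say which similarity measure or link criterion the authors use, so cosine similarity between topic-word distributions and a fixed threshold of 0.5 are assumptions made here for illustration. The function names and the toy distributions are likewise hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def build_topic_similarity_network(topic_word, threshold=0.5):
    """Build an undirected graph whose nodes are topics.

    topic_word: list of topic-word probability vectors, one per topic.
    threshold:  assumed cutoff; two topics are linked when their
                similarity is at least this value.
    Returns an adjacency dict {topic: {neighbor: similarity}}.
    """
    network = {i: {} for i in range(len(topic_word))}
    for i, j in combinations(range(len(topic_word)), 2):
        sim = cosine_similarity(topic_word[i], topic_word[j])
        if sim >= threshold:
            network[i][j] = sim
            network[j][i] = sim
    return network


# Toy topic-word distributions over a 4-word vocabulary (made-up numbers,
# standing in for the output of an LDA model).
toy_topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.70],
]
network = build_topic_similarity_network(toy_topics, threshold=0.5)
```

In this toy input, topics 0 and 1 place most of their probability on the same words and end up linked, while topic 2 stays isolated. In practice the threshold controls how dense the network is, so it would need tuning for a given collection.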

Document Details

Document Type: Technical Report
Publication Date: Jul 14, 2014
Accession Number: AD1123718

Entities

People

  • Arun S. Maiya
  • Robert M. Rolfe

Organizations

  • Institute for Defense Analyses

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Big Data
  • Case Studies
  • Computational Science
  • Computer Languages
  • Computer Science
  • Computers
  • Data Mining
  • Extraction
  • Fluid Dynamics
  • Fluid Mechanics
  • Information Processing
  • Information Science
  • Language
  • Linguistics
  • Machine Learning
  • Natural Language Processing
  • New York
  • Probability
  • Probability Distributions
  • Visualizations

Fields of Study

  • Computer science

Readers

  • Neural Network Machine Learning
  • Systems Analysis and Design
  • Technical Research and Report Writing