Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies

Abstract

In an increasingly data-rich environment, actionable information must be extracted, filtered, and correlated from massive amounts of disparate, often free-text sources. The usefulness of the retrieved information depends on how these steps are accomplished and how the most relevant information is presented to the analyst. One method for extracting information from free text is Latent Dirichlet Allocation (LDA), a document categorization technique that groups documents into cohesive topics. Although LDA accounts for some implicit relationships such as synonymy (same meaning), it often ignores other semantic relationships such as polysemy (different meanings), hyponymy (subordinate), meronymy (part of), and troponymy (manner). To compensate for this deficiency, we incorporate explicit word ontologies, such as WordNet, into the LDA algorithm to account for various semantic relationships. Experiments over the 20 Newsgroups, NIPS, OHSUMED, and IED document collections demonstrate that incorporating such knowledge improves the perplexity measure over LDA alone for the given parameters.
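
The abstract evaluates models by perplexity, where lower values indicate a better fit to the text. As a minimal sketch of how that measure is commonly computed for a topic model (not the report's own code), the Python below assumes hypothetical inputs: `theta` holds per-document topic proportions and `phi` holds per-topic word probabilities. The function name and the toy data are illustrative assumptions, not values from the report.

```python
import math


def perplexity(docs, theta, phi):
    """Per-token perplexity of a topic model on a set of documents.

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][k] = proportion of topic k in document d (assumed input)
    phi   : phi[k][w]   = probability of word w under topic k (assumed input)
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # Word probability is a mixture over topics.
            p = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p)
            n_tokens += 1
    # Perplexity is the exponential of the negative mean log-likelihood.
    return math.exp(-log_likelihood / n_tokens)


# Toy data (hypothetical): 2 documents, 2 topics, 4-word vocabulary.
docs = [[0, 1], [2, 3]]

# A model that spreads probability uniformly over all words.
uniform_theta = [[0.5, 0.5], [0.5, 0.5]]
uniform_phi = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]

# A model whose topics each concentrate on the words one document uses.
sharp_theta = [[1.0, 0.0], [0.0, 1.0]]
sharp_phi = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]

uniform_score = perplexity(docs, uniform_theta, uniform_phi)
sharp_score = perplexity(docs, sharp_theta, sharp_phi)
```

On this toy data the model with sharper topics scores a lower perplexity than the uniform one, which is the sense in which the abstract reports an improvement over LDA alone.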

Document Details

Document Type
Technical Report
Publication Date
Mar 01, 2010
Accession Number
ADA517229

Entities

People

  • Laura A. Isaly

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Counter IED
  • Ground and Sea Platforms
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Air Force
  • Bayesian Networks
  • Computer Languages
  • Computer Science
  • Data Mining
  • Department Of Defense
  • Generative Models
  • Information Processing
  • Information Retrieval
  • Information Science
  • Information Systems
  • Language
  • Natural Language Processing
  • Network Science
  • Ontologies
  • Probabilistic Models
  • Probability

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Distributed Systems and Data Platform Development
  • Image Processing and Computer Vision