The Application of Text Mining and Data Visualization Techniques to Textual Corpus Exploration

Abstract

Unstructured data in the digital universe is growing rapidly and shows no sign of slowing. As the volume of digital data generated and stored on the World Wide Web accelerates, information overload is a far greater risk now than it has been in the past. As a preemptive analytic measure, organizations across many industries have begun applying text mining techniques to analyze such large sources of unstructured data. Using text mining techniques such as n-gram analysis, document and term frequency analysis, correlation analysis, and topic modeling, this research seeks to develop a tool that allows analysts to navigate effectively and efficiently through large corpora of potentially unknown textual data. Additionally, this research explores two notional data exploration scenarios through a large corpus of text data, each exhibiting a distinct navigation method an analyst may elect to take. The research concludes with the validation of the inferential results obtained through each corpus exploration scenario.
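To make two of the techniques named in the abstract concrete, the following is a minimal sketch of n-gram extraction with term and document frequency counts. It is not the tool described in the report: the choice of Python, the function names (tokenize, ngrams, term_and_document_frequency), the simple regex tokenizer, and the three sample sentences are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_and_document_frequency(docs, n=1):
    """Count n-grams across a corpus.

    tf[g] is the total number of times n-gram g occurs in the corpus.
    df[g] is the number of documents that contain g at least once.
    """
    tf = Counter()
    df = Counter()
    for doc in docs:
        grams = ngrams(tokenize(doc), n)
        tf.update(grams)       # every occurrence counts toward term frequency
        df.update(set(grams))  # each document counts at most once per n-gram
    return tf, df


# Hypothetical three-document corpus, invented for this example.
docs = [
    "Text mining finds patterns in text data.",
    "Topic modeling groups text data by theme.",
    "Data visualization helps analysts explore a corpus.",
]

unigram_tf, unigram_df = term_and_document_frequency(docs, n=1)
bigram_tf, bigram_df = term_and_document_frequency(docs, n=2)

print(unigram_tf.most_common(3))
print(bigram_tf.most_common(3))
```

Comparing term frequency with document frequency is one way an analyst might separate terms that are concentrated in a few documents from terms that are spread evenly across the corpus. Correlation analysis and topic modeling would build on counts like these but are not shown here.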

Document Details

Document Type
Technical Report
Publication Date
Mar 23, 2018
Accession Number
AD1056425

Entities

People

  • Jeffrey R. Smith

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Biomedical
  • Space
  • Weapons Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence Software
  • Big Data
  • Computational Science
  • Computer Languages
  • Correlation Analysis
  • Data Mining
  • Data Visualization
  • Databases
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Markov Models
  • Monte Carlo Method
  • Named Entity Recognition
  • Natural Language Processing
  • Network Science
  • Text Mining

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Distributed Systems and Data Platform Development