Science and Technology Text Mining: Origins of Database Tomography and Multi-Word Phrase Clustering

Abstract

This report first describes the motivations for co-word analysis in support of research policy formulation and research implementation evaluation. It compares co-word analysis with other co-occurrence techniques such as co-citation and co-nomination analyses. It then traces the origins of co-word analysis in computational linguistics, describes in detail the development of co-word analysis for research evaluation, and concludes by presenting a new approach to co-word analysis for research evaluation (Database Tomography). The report shows that this new approach, which requires no index or key words but works on the text directly, is a useful tool for scanning large bodies of text. It can identify pervasive thrust areas and their interrelationships, and it serves as a starting point for further in-depth analysis of the text. Its value increases as the volume of text grows and as the breadth of topical areas covered by the text extends beyond the expertise of a moderate number of expert panels. A single-link clustering example is presented; it represents the first use of multi-word technical phrases in modern clustering. (75 refs.)
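To make the abstract's core ideas concrete, the sketch below shows generic co-word analysis on multi-word phrases followed by single-link clustering. It is not the report's Database Tomography procedure. Several details are assumptions made only for illustration: two-word phrases (bigrams) stand in for multi-word technical phrases, `STOPWORDS` is a tiny hypothetical list, the pair similarity is the equivalence index c_ab² / (c_a · c_b) familiar from the co-word literature, and the threshold of 0.5 is arbitrary. At a single threshold, single-link clustering reduces to finding connected components of the graph that links every phrase pair whose similarity meets the threshold. The sketch computes those components with union-find.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stopword list for this sketch; a real system would use a larger one.
STOPWORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to", "with"}


def extract_phrases(text: str, n: int = 2) -> set[str]:
    """Return the set of n-word phrases in text, skipping any that contain a stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    grams = (words[i:i + n] for i in range(len(words) - n + 1))
    return {" ".join(g) for g in grams if not STOPWORDS.intersection(g)}


def cooccurrence(docs: list[str], min_freq: int = 2):
    """Count document frequencies of phrases and how often kept phrases share a document."""
    doc_phrases = [extract_phrases(d) for d in docs]
    freq = Counter(p for ps in doc_phrases for p in ps)
    kept = {p for p, c in freq.items() if c >= min_freq}
    pairs = Counter()
    for ps in doc_phrases:
        for a, b in combinations(sorted(ps & kept), 2):
            pairs[(a, b)] += 1
    return {p: freq[p] for p in kept}, pairs


def single_link_clusters(freq, pairs, threshold: float) -> list[list[str]]:
    """Single-link clusters at one threshold: connected components of the graph
    whose edges are phrase pairs with similarity >= threshold."""
    parent = {p: p for p in freq}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), c_ab in pairs.items():
        # Equivalence index (assumed similarity measure for this sketch).
        similarity = c_ab ** 2 / (freq[a] * freq[b])
        if similarity >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for p in freq:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())


# Usage on a toy corpus (illustrative data, not from the report).
docs = [
    "fuel cell membranes improve power density",
    "power density of fuel cell stacks",
    "composite materials for turbine blades",
    "turbine blades from composite materials",
]
freq, pairs = cooccurrence(docs)
print(single_link_clusters(freq, pairs, threshold=0.5))
# [['composite materials', 'turbine blades'], ['fuel cell', 'power density']]
```

Raising the threshold splits clusters and lowering it merges them. Sweeping the threshold therefore yields the single-link dendrogram.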


Document Details

Document Type
Technical Report
Publication Date
Aug 15, 2003
Accession Number
ADA416268

Entities

People

  • Ronald Neil Kostoff

Organizations

  • Office of Naval Research

Tags

Communities of Interest

  • Air Platforms
  • Energy and Power Technologies
  • Ground and Sea Platforms
  • Sensors
  • Space
  • Weapons Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Composite Materials
  • Computational Linguistics
  • Computational Science
  • Computer Science
  • Data Mining
  • Databases
  • Detectors
  • Information Processing
  • Information Science
  • Information Systems
  • Linguistics
  • Natural Language Processing
  • Polymer Chemistry
  • Text Mining
  • Turbines
  • Two Dimensional

Readers

  • Business Analytics
  • Computational Linguistics