Science and Technology Text Mining Basic Concepts

Abstract

This literature review surveys a broad array of techniques that are becoming available for mining textual data. It first presents a text mining architecture built on three functions (data collection, data warehousing, and data exploitation) that are carried out through a six-step process (source selection, text retrieval, information extraction, data storage, data mining, and presentation). It then presents some of the most widely used data and text mining techniques: clustering and classification methods, such as nearest neighbor, relational learning models, and genetic algorithms; and dependency models, including graph-theoretic link analysis, linear regression, decision trees, nonlinear regression, and neural networks. The review illustrates the potential of these techniques by describing the Office of Naval Research (ONR) text mining pilot program. In the first year of that program, existing metadata from commercial bibliographic databases were used. There is presently an unacceptably long delay between the development of key component technologies for textual data mining and the deployment of the integrated tools that science and technology (S&T) sponsors need. The first year of the ONR text mining pilot program represents an initial attempt to bridge that gap. Important lessons have been learned about the use of text mining for the management of science and technology research, but much remains to be done. (37 refs.)
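
The six-step process named in the abstract can be pictured as a chain of small functions. The sketch below is only an illustration and is not taken from the report: the function names, the in-memory `SOURCES` data, and the grouping of the six steps into the three functions (two steps each) are all assumptions, and simple term counting stands in for the data mining step.

```python
from collections import Counter

# Hypothetical in-memory "sources"; a real system would query bibliographic databases.
SOURCES = {
    "db_a": ["Neural networks for text retrieval", "Genetic algorithms in optimization"],
    "db_b": ["Text mining for research management", "Cooking with cast iron"],
}

# --- Data collection (assumed to cover steps 1 and 2) ---
def select_sources(sources, wanted):
    """Step 1, source selection: keep only the named sources."""
    return {name: docs for name, docs in sources.items() if name in wanted}

def retrieve_text(selected, query):
    """Step 2, text retrieval: keep documents that contain the query string."""
    q = query.lower()
    return [doc for docs in selected.values() for doc in docs if q in doc.lower()]

# --- Data warehousing (assumed to cover steps 3 and 4) ---
def extract_information(docs):
    """Step 3, information extraction: turn raw text into simple records."""
    return [{"title": d, "terms": d.lower().split()} for d in docs]

def store(records, warehouse):
    """Step 4, data storage: append records to a list acting as the warehouse."""
    warehouse.extend(records)
    return warehouse

# --- Data exploitation (assumed to cover steps 5 and 6) ---
def mine(warehouse):
    """Step 5, data mining: count term frequencies across stored records."""
    return Counter(t for r in warehouse for t in r["terms"])

def present(counts, top_n=3):
    """Step 6, presentation: format the most frequent terms as strings."""
    return [f"{term}: {n}" for term, n in counts.most_common(top_n)]

def run_pipeline(query):
    """Run all six steps in order for one query."""
    warehouse = []
    selected = select_sources(SOURCES, {"db_a", "db_b"})
    docs = retrieve_text(selected, query)
    store(extract_information(docs), warehouse)
    return present(mine(warehouse))

if __name__ == "__main__":
    for line in run_pipeline("text"):
        print(line)
```

Keeping each step as a separate function mirrors the abstract's separation of collection, warehousing, and exploitation, so that any one step (for example, replacing term counting with clustering) could be swapped without touching the others.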

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2003
Accession Number
ADA415886

Entities

People

  • Ronald N. Kostoff
  • Douglas W. Oard
  • Paul Losiewicz

Organizations

  • Air Force Research Laboratory

Tags

Communities of Interest

  • Air Platforms
  • Autonomy
  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computational Science
  • Computer Languages
  • Computer Science
  • Computers
  • Data Mining
  • Data Storage Systems
  • Genetic Algorithms
  • Information Retrieval
  • Information Science
  • Machine Learning
  • Military Research
  • Network Science
  • Neural Networks
  • Statistical Analysis

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Neural Network Machine Learning
  • Technical Research and Report Writing

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • Biotechnology