Factor Analysis Optimization: Applied on Natural Language Knowledge Discovery

Abstract

The Technology Opportunities Analysis of Scientific Information System (Tech OASIS) automates the identification and visualization of relationships inherent in large sets (hundreds or thousands) of literature abstracts. An automated Tech OASIS algorithm applies principal components analysis (PCA), multidimensional scaling (MDS) and a path-erasing algorithm to elicit and display clusters of related concepts. However, cluster groupings and visual representations are not unique for a given set of literature abstracts: the user's selection of the items to be clustered and of the number of factors to be considered generates alternative cluster solutions and relationship displays. Our current research, documented herein, seeks to identify and automate the selection of a "best" PCA factor analysis solution for a set of literature abstracts. How, then, can a "best" solution be identified? Research on quality measures of factor/cluster groups indicates that term/factor selections based on entropy, F-measure and cohesiveness appear promising. Our approach applies a composite metric, which strives to minimize the factor grouping entropy and F-measure and maximize each group's cohesiveness, while also considering set coverage. We apply this approach to automatically map conceptual (term) relationships for 1202 abstracts concerning "natural language knowledge discovery."
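The selection step described in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration rather than the report's method: cohesiveness is approximated by the mean pairwise Jaccard overlap of term occurrences within a group, grouping entropy by the normalized entropy of each abstract's term hits across groups, and coverage by the fraction of abstracts containing at least one grouped term. The F-measure component is omitted because it needs a reference classification that the abstract does not specify. The equal weighting and the function names (`composite`, `best_solution`) are hypothetical, and the candidate groupings are assumed to come from PCA runs with different numbers of factors.

```python
import math
from itertools import combinations

def jaccard(a, b):
    """Jaccard overlap of two document-index sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cohesiveness(groups, occ):
    """Mean, over multi-term groups, of the mean pairwise Jaccard overlap."""
    per_group = []
    for g in groups:
        pairs = list(combinations(sorted(g), 2))
        if pairs:  # single-term groups have no pairs to compare
            per_group.append(sum(jaccard(occ[a], occ[b]) for a, b in pairs) / len(pairs))
    return sum(per_group) / len(per_group) if per_group else 0.0

def coverage(groups, docs):
    """Fraction of documents containing at least one grouped term."""
    grouped = set().union(*groups) if groups else set()
    return sum(1 for d in docs if d & grouped) / len(docs)

def grouping_entropy(groups, docs):
    """Mean normalized entropy of each covered document's hits across groups."""
    if len(groups) < 2:
        return 0.0
    total, n = 0.0, 0
    for d in docs:
        hits = [len(d & g) for g in groups]
        s = sum(hits)
        if s == 0:
            continue  # document not covered by any group
        h = -sum((c / s) * math.log(c / s) for c in hits if c)
        total += h / math.log(len(groups))
        n += 1
    return total / n if n else 0.0

def composite(groups, docs):
    """Assumed equal-weight score: reward cohesion and coverage, penalize entropy."""
    terms = set().union(*groups)
    occ = {t: {i for i, d in enumerate(docs) if t in d} for t in terms}
    return cohesiveness(groups, occ) + coverage(groups, docs) - grouping_entropy(groups, docs)

def best_solution(candidates, docs):
    """Return the factor count whose grouping has the highest composite score."""
    return max(candidates, key=lambda k: composite(candidates[k], docs))

# Toy data: each abstract is reduced to a set of terms (illustrative only).
docs = [
    {"parsing", "grammar"},
    {"parsing", "grammar", "syntax"},
    {"clustering", "mining"},
    {"clustering", "mining", "text"},
    {"grammar", "syntax"},
    {"mining", "text"},
]
# Hypothetical term groupings for 1, 2 and 3 factors.
candidates = {
    1: [{"parsing", "grammar", "syntax", "clustering", "mining", "text"}],
    2: [{"parsing", "grammar", "syntax"}, {"clustering", "mining", "text"}],
    3: [{"parsing", "grammar"}, {"syntax", "clustering"}, {"mining", "text"}],
}
print(best_solution(candidates, docs))  # prints 2
```

On this toy data the two-factor grouping wins: its groups are internally cohesive, every abstract is covered, and no abstract straddles two groups. A real application would replace the hand-written candidates with groupings derived from PCA loadings and would likely tune the weights.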

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2002
Accession Number: ADA484817

Entities

People

  • Alan L. Porter
  • Donghua Zhu
  • Robert J. Watts

Organizations

  • Tank-automotive and Armaments Command

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Computational Linguistics
  • Computational Science
  • Data Mining
  • Factor Analysis
  • Information Science
  • Information Systems
  • Language
  • Natural Language Processing
  • Natural Languages
  • Optimization
  • Square Roots
  • Statistical Analysis
  • Text Mining
  • Two Dimensional
  • Word Processors

Readers

  • Artificial Intelligence
  • Distributed Systems and Data Platform Development
  • Instructional Design and Training Evaluation