Investigation into Text Classification With Kernel Based Schemes

Abstract

The development of the Internet has resulted in a rapid explosion of information available on the Web. In addition, the speed and anonymity of Internet media "publishing" make this medium ideal for rapid dissemination of various contents. As a result, there is a strong need for automated text analysis and mining tools that can identify the main topics of texts, chat room discussions, Web postings, etc. This thesis investigates whether a nonlinear kernel-based feature vector selection approach is beneficial for categorizing unstructured text documents. Results obtained with the nonlinear kernel-based classifier are compared to those obtained with the Latent Semantic Analysis (LSA) approach commonly used in text categorization applications. The nonlinear kernel-based scheme considered in this work applies the feature vector selection (FVS) approach followed by Linear Discriminant Analysis (LDA). Titles and abstracts from IEEE journal articles published between 1990 and 1999 with specific key terms were used to construct the data set for classification. Overall, taking into account both classification performance and computation time, results showed FVS-LDA with a polynomial kernel of degree 1 and an added constant of 1 to be the best classifier for the database considered.
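To make the kernel setting named in the abstract concrete, the sketch below computes a polynomial kernel in its common textbook form, k(x, y) = (x · y + c)^d, with d = 1 and c = 1. This is an illustration under stated assumptions, not the thesis implementation: the function names, the exact kernel formulation, and the toy term-count vectors are all hypothetical and are not taken from the thesis or its IEEE data set.

```python
def polynomial_kernel(x, y, degree=1, constant=1.0):
    """Polynomial kernel in the common form (x . y + constant) ** degree.

    Assumption: the thesis may define its kernel slightly differently.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + constant) ** degree


def gram_matrix(vectors, degree=1, constant=1.0):
    """Pairwise kernel values between all document vectors."""
    return [
        [polynomial_kernel(a, b, degree, constant) for b in vectors]
        for a in vectors
    ]


# Hypothetical term-count vectors for three short documents.
docs = [
    [2, 0, 1],
    [0, 3, 1],
    [1, 1, 0],
]

K = gram_matrix(docs, degree=1, constant=1.0)
for row in K:
    print(row)
```

With degree 1, each kernel value is just the inner product of the two documents plus the constant, so the kernel matrix costs little more to compute than plain inner products. That is consistent with the abstract's remark that computation time was part of the comparison.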

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 2010
Accession Number: ADA518358

Entities

People

  • Steven M. Benveniste

Organizations

  • Naval Postgraduate School

Tags

Communities of Interest

  • Autonomy
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Computer Languages
  • Computer Science
  • Data Mining
  • Data Sets
  • Databases
  • Dimensionality Reduction
  • Discriminant Analysis
  • Electrical Engineering
  • Information Science
  • Kernel Functions
  • Machine Learning
  • Natural Language Processing
  • Network Science
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Calculus or Mathematical Analysis
  • Computational Linguistics
  • Library and Information Science

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Machine Learning Algorithms