Investigation into Text Classification With Kernel Based Schemes
Abstract
The growth of the Internet has produced a rapid explosion of information available on the Web. In addition, the speed and anonymity of Internet "publishing" make the medium ideal for rapid dissemination of content of all kinds. As a result, there is a strong need for automated text analysis and mining tools that can identify the main topics of texts, chat room discussions, Web postings, etc. This thesis investigates whether a nonlinear kernel-based feature vector selection approach is beneficial for categorizing unstructured text documents. Results obtained with nonlinear kernel-based classification are compared to results obtained with the Latent Semantic Analysis (LSA) approach commonly used in text categorization applications. The nonlinear kernel-based scheme considered in this work applies feature vector selection (FVS) followed by Linear Discriminant Analysis (LDA). The data set for classification was constructed from the titles and abstracts of IEEE journal articles published between 1990 and 1999 with specific key terms. Overall, taking into account both classification performance and timing, the results showed FVS-LDA with a polynomial kernel of degree 1 and an added constant of 1 to be the best classifier for the database considered.
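To make the kernel named in the abstract concrete, the following is a minimal sketch of a polynomial kernel, assuming the common parameterization k(x, y) = (x · y + c)^d. The abstract does not give the exact form used in the thesis, so this parameterization, the function names `polynomial_kernel` and `gram_matrix`, and the toy term-count vectors are all illustrative assumptions, not the author's implementation.

```python
def polynomial_kernel(x, y, degree=1, constant=1.0):
    """Assumed form: k(x, y) = (<x, y> + constant) ** degree."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + constant) ** degree


def gram_matrix(docs, degree=1, constant=1.0):
    """Pairwise kernel values between all document vectors."""
    return [
        [polynomial_kernel(a, b, degree, constant) for b in docs]
        for a in docs
    ]


# Hypothetical term-count vectors for three toy documents
# (not data from the report).
docs = [[1, 0, 2], [0, 1, 1], [1, 1, 0]]

# Degree 1 with an added constant of 1, as named in the abstract.
K = gram_matrix(docs, degree=1, constant=1.0)
for row in K:
    print(row)
```

Under this assumed form, a degree of 1 makes the kernel the ordinary dot product shifted by a constant, so it behaves as a linear kernel in the original term space.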
Document Details
- Document Type: Technical Report
- Publication Date: Mar 01, 2010
- Accession Number: ADA518358
Entities
People
- Steven M. Benveniste
Organizations
- Naval Postgraduate School