Semantic Lexicon Construction: Learning from Unlabeled Data via Spectral Analysis

Abstract

This paper considers the task of automatically collecting words with their entity class labels, starting from a small number of labeled examples (seed words). We show that spectral analysis is useful for compensating for the paucity of labeled examples by learning from unlabeled data. The proposed method significantly outperforms a number of methods that employ techniques such as EM and co-training. Furthermore, when trained with 300 labeled examples and unlabeled data, it rivals Naive Bayes classifiers trained with 7500 labeled examples.

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2004
Accession Number: ADA460254

Entities

People

  • Rie K. Ando

Organizations

  • IBM Thomas J. Watson Research Center

Tags

Communities of Interest

  • Biomedical

DTIC Thesaurus Topics

  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Computer Languages
  • Construction
  • Data Analysis
  • Data Mining
  • Data Sets
  • Frequency
  • Information Science
  • Knowledge Management
  • Learning
  • Machine Learning
  • Natural Language Processing
  • Numerical Analysis
  • Statistics
  • Supervised Machine Learning

Readers

  • Neural Network Machine Learning