Semantic Lexicon Construction: Learning from Unlabeled Data via Spectral Analysis

Abstract

This paper considers the task of automatically collecting words with their entity class labels, starting from a small number of labeled examples (seed words). We show that spectral analysis is useful for compensating for the paucity of labeled examples by learning from unlabeled data. The proposed method significantly outperforms a number of methods that employ techniques such as EM and co-training. Furthermore, when trained with 300 labeled examples and unlabeled data, it rivals Naive Bayes classifiers trained with 7500 labeled examples.

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2004
Accession Number: ADA460254

Entities

People

Rie K. Ando

Organizations

IBM Thomas J. Watson Research Center
