Semantic Lexicon Construction: Learning from Unlabeled Data via Spectral Analysis
Abstract
This paper considers the task of automatically collecting words with their entity class labels, starting from a small number of labeled examples (seed words). We show that spectral analysis is useful for compensating for the paucity of labeled examples by learning from unlabeled data. The proposed method significantly outperforms a number of methods that employ techniques such as EM and co-training. Furthermore, when trained with 300 labeled examples and unlabeled data, it rivals Naive Bayes classifiers trained with 7500 labeled examples.
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2004
- Accession Number: ADA460254
Entities
People
- Rie K. Ando
Organizations
- IBM Thomas J. Watson Research Center