Collective Segmentation and Labeling of Distant Entities in Information Extraction
Abstract
In information extraction, we often wish to identify all mentions of an entity such as a person or organization. Traditionally, a group of words is labeled as an entity based only on local information. But information from throughout a document can be useful; for example, if the same word is used multiple times, it is likely to have the same label each time. We present a conditional random field (CRF) that explicitly represents dependencies between the labels of pairs of similar words in a document. On a standard information extraction data set, we show that learning these dependencies leads to a 13.7% reduction in error on the field that had caused the most repetition errors.
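The idea of linking the labels of similar words can be illustrated with a small graph-construction sketch. It is an illustration only, not the report's implementation: the similarity rule used here (two tokens are "similar" when they are identical and capitalized) is an assumption, and the names `chain_edges` and `skip_edges` are hypothetical. The sketch builds only the edge structure that such a model would place over a token sequence; it does not perform training or inference.

```python
from itertools import combinations


def chain_edges(tokens):
    """Edges between adjacent tokens, as in an ordinary linear-chain model."""
    return [(i, i + 1) for i in range(len(tokens) - 1)]


def skip_edges(tokens):
    """Long-range edges between pairs of 'similar' tokens.

    Assumption for illustration: two tokens are similar when they are
    the same string and start with an uppercase letter.
    """
    positions = {}
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            positions.setdefault(tok, []).append(i)
    edges = []
    for idxs in positions.values():
        # Connect every pair of occurrences of the same capitalized token.
        edges.extend(combinations(idxs, 2))
    return sorted(edges)


tokens = "Speaker : John Smith . Professor Smith will talk at 3 pm".split()
print(chain_edges(tokens)[:3])  # local dependencies between neighbors
print(skip_edges(tokens))       # dependency between the two "Smith" tokens
```

In this toy sentence, the second "Smith" has weak local evidence of being part of a speaker name, while the first appears right after "Speaker :". An edge between the two positions gives a model a way to share that evidence, which is the kind of document-level dependency the abstract describes.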
Document Details
- Document Type: Technical Report
- Publication Date: Jul 01, 2004
- Accession Number: ADA439444
Entities
People
- Andrew McCallum
- Charles Sutton
Organizations
- University of Massachusetts Amherst