Collective Segmentation and Labeling of Distant Entities in Information Extraction

Abstract

In information extraction, we often wish to identify all mentions of an entity such as a person or organization. Traditionally, a group of words is labeled as an entity based only on local information. But information from throughout a document can be useful; for example, if the same word is used multiple times, it is likely to have the same label each time. We present a conditional random field (CRF) that explicitly represents dependencies between the labels of pairs of similar words in a document. On a standard information extraction data set, we show that learning these dependencies leads to a 13.7% reduction in error on the field that had caused the most repetition errors.
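The idea of linking distant, similar words can be illustrated with a toy scoring sketch. It is not the report's model. It assumes that "similar" means an exact match on a capitalized token. It also assumes that the local per-token scores (`local`) and the label-agreement reward (`agree_weight`) are hypothetical fixed numbers rather than learned CRF weights. Transition factors between neighboring labels are omitted for brevity. The sketch finds the best labeling by brute-force enumeration, which only works for tiny inputs.

```python
from itertools import combinations, product

def skip_edges(tokens):
    """Index pairs (i, j), i < j, linking repeated capitalized tokens.

    Assumption: two words are "similar" if they are identical strings
    and start with an uppercase letter.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(tokens)), 2)
        if tokens[i] == tokens[j] and tokens[i][:1].isupper()
    ]

def log_score(labels, local, skips, agree_weight):
    """Unnormalized log-score of one labeling.

    local[i][y] is a hypothetical local score for giving token i label y.
    Each skip edge whose two ends share a label adds agree_weight.
    """
    score = sum(local[i][y] for i, y in enumerate(labels))
    score += sum(agree_weight for i, j in skips if labels[i] == labels[j])
    return score

def best_labeling(tokens, local, label_set, agree_weight):
    """Highest-scoring labeling by exhaustive search (toy inputs only)."""
    skips = skip_edges(tokens)
    return max(
        product(label_set, repeat=len(tokens)),
        key=lambda ys: log_score(ys, local, skips, agree_weight),
    )

tokens = ["Smith", "said", "Smith"]
local = [
    {"PER": 2.0, "O": 0.0},   # strong local evidence for a person name
    {"PER": -1.0, "O": 1.0},
    {"PER": 0.0, "O": 0.5},   # ambiguous mention, leaning "O" locally
]
print(best_labeling(tokens, local, ("PER", "O"), agree_weight=0.0))
print(best_labeling(tokens, local, ("PER", "O"), agree_weight=1.0))
```

With `agree_weight=0.0`, each token is labeled from local evidence alone, so the second "Smith" gets "O". With `agree_weight=1.0`, the skip edge lets the confident first mention pull the ambiguous second mention to "PER". Adding such long-range edges to a linear chain creates cycles in the graph, so practical models cannot rely on the exact chain algorithms and typically need approximate inference instead of enumeration.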


Document Details

Document Type: Technical Report
Publication Date: Jul 01, 2004
Accession Number: ADA439444

Entities

People

Andrew McCallum
Charles Sutton

Organizations

University of Massachusetts Amherst
