Labeling Documents with Timestamps: Learning from their Time Expressions
Abstract
Temporal reasoners for document understanding typically assume that a document's creation date is known. Algorithms to ground relative time expressions and order events often rely on this timestamp to assist the learner. Unfortunately, the timestamp is not always known, particularly on the Web. This paper addresses the task of automatic document timestamping, presenting two new models that incorporate rich linguistic features about time. The first is a discriminative classifier with new features extracted from the text's time expressions (e.g., 'since 1999'). This model alone improves on previous generative models by 77%. The second model learns probabilistic constraints between time expressions and the unknown document time. Imposing these learned constraints on the discriminative model further improves its accuracy. Finally, we present a new experiment design that facilitates easier comparison by future work.
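As a rough illustration of what features extracted from time expressions such as 'since 1999' might look like, the following Python sketch counts year mentions together with any preposition that introduces them. It is not the report's feature set: the regular expression, the preposition list, the feature names, and the function `time_expression_features` are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical pattern (an assumption, not taken from the report):
# an optional preposition followed by a four-digit year in 1900-2099.
YEAR_MENTION = re.compile(
    r"\b(?:(since|in|until|before|after|by)\s+)?((?:19|20)\d{2})\b",
    re.IGNORECASE,
)


def time_expression_features(text):
    """Count year mentions and (preposition, year) pairs found in text."""
    features = Counter()
    for match in YEAR_MENTION.finditer(text):
        # Use "none" when the year is not introduced by a listed preposition.
        prep = (match.group(1) or "none").lower()
        year = match.group(2)
        features[f"year={year}"] += 1
        features[f"prep={prep}|year={year}"] += 1
    return features


feats = time_expression_features("Sales have grown since 1999 and peaked in 2004.")
```

Counts of this kind could serve as input to any standard discriminative classifier whose labels are candidate document years. Pairing the preposition with the year reflects the intuition that 'since 1999' and 'in 1999' suggest different relationships between the mentioned year and the document's creation date.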
Document Details
- Document Type: Technical Report
- Publication Date: Jul 01, 2012
- Accession Number: ADA586366
Entities
People
- Nathanael Chambers