Labeling Documents with Timestamps: Learning from their Time Expressions

Abstract

Temporal reasoners for document understanding typically assume that a document's creation date is known. Algorithms that ground relative time expressions and order events often rely on this timestamp to assist the learner. Unfortunately, the timestamp is not always known, particularly on the Web. This paper addresses the task of automatic document timestamping, presenting two new models that incorporate rich linguistic features about time. The first is a discriminative classifier with new features extracted from the text's time expressions (e.g., 'since 1999'). This model alone improves on previous generative models by 77%. The second model learns probabilistic constraints between time expressions and the unknown document time. Imposing these learned constraints on the discriminative model further improves its accuracy. Finally, we present a new experiment design that makes comparison with future work easier.
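To make the two ideas in the abstract concrete, here is a minimal Python sketch, assuming that documents are labeled by year and that a separate classifier already produces a probability for each candidate year. The helper names (`time_expression_features`, `apply_since_constraints`) and the regular expression are hypothetical illustrations, not the report's implementation. The first function turns year mentions such as 'since 1999' into sparse features for a discriminative classifier; the second imposes one simple constraint ("since YEAR" implies the document was written no earlier than YEAR). Note that this is a swapped-in simplification: the report learns probabilistic constraints, whereas this sketch applies a single hard filter.

```python
import re
from collections import Counter

# Hypothetical pattern: an optional preposition followed by a four-digit year
# in the range 1900-2099 (an assumption made for illustration only).
YEAR_RE = re.compile(r"\b(since|before|after|in)?\s*((?:19|20)\d{2})\b", re.IGNORECASE)


def time_expression_features(text):
    """Hypothetical feature extractor: count each year mention, plus a
    preposition-specific feature when a preposition precedes the year."""
    feats = Counter()
    for m in YEAR_RE.finditer(text):
        prep = (m.group(1) or "").lower()
        year = m.group(2)
        feats[f"year={year}"] += 1
        if prep:
            feats[f"{prep}_year={year}"] += 1
    return feats


def apply_since_constraints(text, year_probs):
    """Drop candidate years earlier than any 'since YEAR' mention, then
    renormalize. A hard-filter simplification of learned constraints."""
    lower_bounds = [
        int(m.group(2))
        for m in YEAR_RE.finditer(text)
        if (m.group(1) or "").lower() == "since"
    ]
    floor = max(lower_bounds, default=None)
    filtered = {y: p for y, p in year_probs.items() if floor is None or y >= floor}
    total = sum(filtered.values())
    if total == 0:
        # Fall back to the classifier's distribution if every year is ruled out.
        return dict(year_probs)
    return {y: p / total for y, p in filtered.items()}


if __name__ == "__main__":
    doc = "Prices have risen since 1999, and the 2001 report confirmed it."
    print(time_expression_features(doc))
    # Hypothetical classifier output over candidate document years.
    print(apply_since_constraints(doc, {1998: 0.5, 2000: 0.3, 2002: 0.2}))
```

In this toy run the 1998 candidate is removed because it precedes 'since 1999', and the remaining probability mass is renormalized over 2000 and 2002. Keeping the constraint step separate from the feature extractor mirrors the abstract's framing of a discriminative model with constraints imposed on top of it.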

Document Details

Document Type
Technical Report
Publication Date
Jul 01, 2012
Accession Number
ADA586366

Entities

People

  • Nathanael Chambers

Tags

Communities of Interest

  • Autonomy
  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Computational Linguistics
  • Computer Science
  • Generative Models
  • Information Retrieval
  • Knowledge Management
  • Language
  • Learning
  • Linguistics
  • Machine Learning
  • Natural Language Processing
  • Probability
  • Test Sets
  • Training
  • United States
  • United States Naval Academy

Fields of Study

  • Computer science

Readers

  • Computational Modeling and Simulation
  • Database Systems and Applications
  • Neural Network Machine Learning

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval
  • AI & ML - Neural Networks