Building a Discourse-Tagged Corpus in the Framework of Rhetorical Structure Theory

Abstract

We describe our experience in developing a discourse-annotated corpus for community-wide use. Working in the framework of Rhetorical Structure Theory, we were able to create a large annotated resource with very high consistency, using a well-defined methodology and protocol. This resource is made publicly available through the Linguistic Data Consortium to enable researchers to develop empirically grounded, discourse-specific applications.

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2001
Accession Number
ADA460581

Entities

People

  • Daniel Marcu
  • Lynn Carlson
  • Mary E. Okurowski

Organizations

  • United States Department of Defense

Tags

Communities of Interest

  • Materials and Manufacturing Processes
  • Space

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automated Text Summarization
  • Chinese Language
  • Computational Linguistics
  • Computational Science
  • Computer Languages
  • Hard Copy
  • Information Retrieval
  • Information Science
  • Language
  • Linguistics
  • Machine Translation
  • Natural Language Processing
  • Natural Languages
  • Standards
  • Test And Evaluation

Readers

  • Computational Linguistics
  • Defense Technology Research and Development