Linguistic Resources for Speech Parsing

Abstract

We report on the success of a two-pass approach to annotating metadata, speech effects, and syntactic structure in English conversational speech. Transcribed speech was annotated separately for structural metadata, also called structural events (fillers; speech repairs, or edit dysfluencies; and SUs, or syntactic/semantic units), and for syntactic structure (treebanking of constituent structure and shallow argument structure). The two annotations were then combined into a single representation. Certain alignment issues between the two types of annotation led to the discovery and correction of annotation errors in each, resulting in a more accurate and useful resource. The development of this corpus was motivated by the need to have both metadata and syntactic structure annotated in order to support synergistic work on speech parsing and structural event detection. Automatic detection of these speech phenomena would simultaneously improve parsing accuracy and provide a mechanism for cleaning up transcriptions for downstream text processing. Similarly, constraints imposed by text processing systems such as parsers can be used to help improve identification of dysfluencies and sentence boundaries. This paper reports on our efforts to develop a linguistic resource providing both spoken metadata and syntactic structure information, and describes the resulting corpus of English conversational speech.

Document Details

Document Type: Technical Report
Publication Date: Jan 01, 2006
Accession Number: ADA456754

Entities

People

  • Ann Bies
  • Haejoong Lee
  • Kazuaki Maeda
  • Mary Harper
  • Matthew Lease
  • Seth Kulick
  • Stephanie Strassel
  • Yang Liu

Tags

Communities of Interest

  • Autonomy

DTIC Thesaurus Topics

  • Accuracy
  • Automated Speech Recognition
  • Automatic
  • Boundaries
  • Case Studies
  • Cognition
  • Consortiums
  • Data Sets
  • Detection
  • Errors
  • Event Detection
  • Language
  • Materials
  • Natural Languages
  • Recognition
  • Specifications
  • Text Processing

Readers

  • Computational Linguistics