Parsing Conversational Speech Using Enhanced Segmentation
Abstract
The lack of sentence boundaries and the presence of disfluencies pose difficulties for parsing conversational speech. This work investigates the effects of automatically detecting these phenomena on a probabilistic parser's performance. We demonstrate that a state-of-the-art segmenter, relative to a pause-based segmenter, achieves more than 45% of the possible reduction in parser error, and that presenting interruption points to the parser improves performance over using sentence boundaries alone. Parsing speech can be useful for a number of tasks, including information extraction and question answering from audio transcripts. However, parsing conversational speech presents a different set of challenges from parsing text: sentence boundaries are not well defined, punctuation is absent, and disfluencies (edits and restarts) disrupt the structure of the language.
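The "45% of the possible error reduction" figure is most naturally read as a ratio: the gain of the automatic segmenter over the pause-based baseline, divided by the largest gain available. The sketch below illustrates that reading only. It assumes the upper bound is the parser's score under reference (hand-marked) segmentation, which the abstract does not state; the function name and the scores in the usage line are hypothetical and are not taken from the report.

```python
def fraction_of_possible_error_reduction(baseline_score, system_score, oracle_score):
    """Return the share of the baseline-to-oracle gap closed by the system.

    All scores are assumed to be "higher is better" parser metrics
    (for example, a bracketing F-measure). The oracle score is an
    assumed upper bound, e.g. parsing with reference segmentation.
    """
    possible_gain = oracle_score - baseline_score
    if possible_gain <= 0:
        raise ValueError("oracle score must exceed the baseline score")
    return (system_score - baseline_score) / possible_gain


# Hypothetical scores for illustration only (not results from the report).
share = fraction_of_possible_error_reduction(
    baseline_score=60.0, system_score=65.0, oracle_score=70.0
)
print(f"{share:.0%}")  # prints "50%"
```

Under this reading, a value above 0.45 means the automatic segmenter recovers more than 45% of the gap between pause-based and reference segmentation.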
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2004
- Accession Number: ADA457886
Entities
People
- Ciprian Chelba
- Jeremy G. Kahn
- Mari Ostendorf
Organizations
- University of Washington