Comparison of Evaluation Metrics for Sentence Boundary Detection
Abstract
Automatic detection of sentences in speech is useful for enriching speech recognition output and easing the work of subsequent language processing modules. In the recent NIST evaluations for this task, an error rate was used to evaluate system performance. A variety of other metrics, such as F-measure and ROC or DET curves, have also been explored in other studies. This paper takes a closer look at the evaluation issue for sentence boundary detection. We employ different metrics (NIST error rate, classification error rate per word boundary, precision and recall, ROC curve, DET curve, precision-recall curve, and the area under the curves) to compare the outputs of different systems. In addition, we use two different corpora in order to evaluate the impact of different degrees of imbalance in the data set. We show that it is helpful to use curves as well as a single performance metric, and that different curves show different advantages in visualization. Furthermore, the data skewness also has an impact on the metrics.
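To make the single-valued metrics named in the abstract concrete, the following is a minimal sketch, not the report's own code. It assumes the NIST error rate is the number of insertions plus deletions divided by the number of reference sentence boundaries, and that the classification error rate divides the same error count by the number of word boundaries. The function name `boundary_metrics`, the 0/1 label encoding, and the toy label sequences are hypothetical and chosen only for illustration.

```python
def boundary_metrics(reference, hypothesis):
    """Compute single-valued metrics for sentence boundary detection.

    Both arguments are equal-length sequences with one 0/1 label per
    word boundary (1 = sentence boundary). This encoding is an
    assumption made for the sketch.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == 1 and h == 1)  # correct boundaries
    fp = sum(1 for r, h in pairs if r == 0 and h == 1)  # insertions
    fn = sum(1 for r, h in pairs if r == 1 and h == 0)  # deletions

    n_ref = tp + fn       # number of reference sentence boundaries
    n_words = len(pairs)  # number of word boundaries
    if n_ref == 0:
        raise ValueError("reference contains no sentence boundaries")

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / n_ref
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)

    return {
        # Assumed definition: errors normalized by reference boundaries.
        "nist_error_rate": (fp + fn) / n_ref,
        # Assumed definition: errors normalized by all word boundaries.
        "classification_error_rate": (fp + fn) / n_words,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy data (made up): 10 word boundaries, 3 of them sentence boundaries.
reference = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
hypothesis = [0, 1, 1, 0, 0, 0, 0, 0, 0, 1]
for name, value in boundary_metrics(reference, hypothesis).items():
    print(f"{name}: {value:.3f}")
```

Because the two error rates share a numerator and differ only in the denominator, the gap between them widens as sentence boundaries become rarer relative to word boundaries, which is one way data skew can affect the reported numbers.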
Document Details
- Document Type: Technical Report
- Publication Date: Apr 01, 2007
- Accession Number: ADA534445
Entities
People
- Elizabeth Shriberg
- Yang Liu
Organizations
- SRI International