Comparison of Evaluation Metrics for Sentence Boundary Detection

Abstract

Automatic detection of sentence boundaries in speech is useful for enriching speech recognition output and easing subsequent language processing. In the recent NIST evaluations for this task, an error rate was used to evaluate system performance. A variety of other metrics, such as F-measure and ROC or DET curves, have also been explored in other studies. This paper takes a closer look at the evaluation issue for sentence boundary detection. We employ several metrics (NIST error rate, classification error rate per word boundary, precision and recall, ROC curve, DET curve, precision-recall curve, and the areas under the curves) to compare different system outputs. In addition, we use two different corpora to evaluate the impact of differing degrees of class imbalance in the data. We show that it is helpful to use curves as well as a single performance metric, and that different curves offer different advantages in visualization. Furthermore, the skewness of the data also has an impact on the metrics.
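
To make the single-number metrics named in the abstract concrete, the following is a minimal sketch in Python. It assumes the commonly used definitions: the NIST error rate divides insertions plus deletions by the number of reference sentence boundaries, while the classification error rate divides the same errors by the total number of word boundaries. The names `BoundaryCounts`, `count_outcomes`, and `boundary_metrics` are hypothetical and are not taken from the report, and the report's actual scoring procedure may differ in detail (for example, in how system output is aligned to the reference).

```python
from dataclasses import dataclass


@dataclass
class BoundaryCounts:
    tp: int  # reference boundary correctly detected
    fp: int  # insertion: system boundary where the reference has none
    fn: int  # deletion: reference boundary the system missed
    tn: int  # non-boundary correctly left unmarked


def count_outcomes(reference, hypothesis):
    """Tally outcomes over word boundaries (1 = sentence boundary, 0 = none)."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same word boundaries")
    tp = fp = fn = tn = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref and hyp:
            tp += 1
        elif hyp:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    return BoundaryCounts(tp, fp, fn, tn)


def boundary_metrics(c):
    """Single-number metrics under the assumed definitions described above."""
    n_ref = c.tp + c.fn                 # reference sentence boundaries
    n_sys = c.tp + c.fp                 # boundaries proposed by the system
    n_all = c.tp + c.fp + c.fn + c.tn   # all word boundaries
    if n_ref == 0 or n_all == 0:
        raise ValueError("need at least one reference sentence boundary")
    errors = c.fp + c.fn                # insertions + deletions
    precision = c.tp / n_sys if n_sys else 0.0
    recall = c.tp / n_ref
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "nist_error_rate": errors / n_ref,
        "classification_error_rate": errors / n_all,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy data (invented for illustration): 10 word boundaries, 3 true sentence ends.
reference = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
hypothesis = [0, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = boundary_metrics(count_outcomes(reference, hypothesis))
```

On this toy input there is one insertion and one deletion, so the NIST error rate is 2/3 while the classification error rate is 2/10. The two rates differ only in their denominators, which is one way to see why the proportion of sentence boundaries in a corpus affects how the numbers read.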

Document Details

Document Type: Technical Report
Publication Date: Apr 01, 2007
Accession Number: ADA534445

Entities

People

Elizabeth Shriberg
Yang Liu

Organizations

SRI International
