Comparison of Evaluation Metrics for Sentence Boundary Detection

Abstract

Automatic detection of sentences in speech is useful for enriching speech recognition output and easing subsequent language processing. In the recent NIST evaluations for this task, an error rate was used to evaluate system performance. A variety of other metrics, such as the F-measure and ROC or DET curves, have also been explored in other studies. This paper takes a closer look at the evaluation issue for sentence boundary detection. We employ several metrics (NIST error rate, classification error rate per word boundary, precision and recall, ROC curve, DET curve, precision-recall curve, and the area under the curves) to compare the output of different systems. In addition, we use two different corpora in order to evaluate the impact of different degrees of imbalance in the data set. We show that it is helpful to use curves as well as a single performance metric, and that different curves offer different advantages for visualization. Furthermore, the skewness of the data also has an impact on the metrics.
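
The single-value metrics named in the abstract can be illustrated with a minimal sketch. It is not taken from the report. It assumes each word boundary carries a binary label (1 = sentence boundary, 0 = no boundary), that the NIST-style error rate is the number of misses plus false alarms divided by the number of reference sentence boundaries, and that the classification error rate divides the same error count by the total number of word boundaries. The function name `boundary_metrics` and the toy labels are hypothetical.

```python
def boundary_metrics(reference, hypothesis):
    """Compute single-value metrics for sentence boundary detection.

    reference, hypothesis: equal-length sequences of 0/1 labels,
    one label per word boundary (1 = sentence boundary).
    Metric definitions are assumptions stated in the surrounding text.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    if not reference:
        raise ValueError("at least one word boundary is required")

    pairs = list(zip(reference, hypothesis))
    tp = sum(1 for r, h in pairs if r == 1 and h == 1)  # correct detections
    fp = sum(1 for r, h in pairs if r == 0 and h == 1)  # false alarms
    fn = sum(1 for r, h in pairs if r == 1 and h == 0)  # misses

    errors = fp + fn
    n_ref = tp + fn  # number of reference sentence boundaries
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / n_ref if n_ref else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)

    return {
        # Errors normalized by reference boundaries (can exceed 1.0).
        "nist_error_rate": errors / n_ref if n_ref else float("nan"),
        # Errors normalized by all word boundaries.
        "classification_error_rate": errors / len(reference),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Toy example: 10 word boundaries, 3 of them true sentence boundaries.
reference = [0, 0, 1, 0, 0, 0, 1, 0, 0, 1]
hypothesis = [0, 0, 1, 0, 1, 0, 0, 0, 0, 1]
metrics = boundary_metrics(reference, hypothesis)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy case there is one miss and one false alarm, so the NIST-style rate is 2/3 while the per-boundary classification error rate is only 2/10. The gap widens as sentence boundaries become rarer, which is one way the data skew mentioned in the abstract can affect the metrics.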

Document Details

Document Type
Technical Report
Publication Date
Apr 01, 2007
Accession Number
ADA534445

Entities

People

  • Elizabeth Shriberg
  • Yang Liu

Organizations

  • SRI International

Tags

Communities of Interest

  • Autonomy
  • Human Systems

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automated Speech Recognition
  • Boundaries
  • Classification
  • Computer Languages
  • Computer Science
  • Data Sets
  • Detection
  • False Alarms
  • Language
  • Machine Learning
  • Precision
  • Recognition
  • Test And Evaluation

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Computational Modeling and Simulation
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval