A Study in Machine Learning from Imbalanced Data for Sentence Boundary Detection in Speech

Abstract

Enriching speech recognition output with sentence boundaries improves its human readability and enables further processing by downstream language processing modules. We have constructed a hidden Markov model (HMM) system to detect sentence boundaries that uses both prosodic and textual information.

Document Details

Document Type
Technical Report
Publication Date
Oct 01, 2006
Accession Number
AD1002399

Entities

People

  • Andreas Stolcke
  • Elizabeth Shriberg
  • Mary P. Harper
  • Nitesh Chawla
  • Yang Liu

Organizations

  • SRI International

Tags

Communities of Interest

  • Autonomy
  • Human Systems

DTIC Thesaurus Topics

  • Artificial Intelligence Software
  • Automata Theory
  • Automated Speech Recognition
  • Computational Science
  • Computer Languages
  • Data Mining
  • Data Sets
  • Hidden Markov Models
  • Information Science
  • Language
  • Machine Learning
  • Markov Models
  • Natural Language Processing
  • Pilot Studies
  • Probability
  • Signal Processing
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Speech Processing/Speech Recognition
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • AI & ML - Machine Translation
  • AI & ML - Neural Networks