Leveraging Multimodal Redundancy for Dynamic Learning, with SHACER - a Speech and HAndwriting reCognizER

Abstract

New language constantly emerges from complex, collaborative human-human interactions such as meetings, for example when a presenter handwrites a new term on a whiteboard while also saying it. Fixed-vocabulary recognizers fail on such new terms, which are often critical to dialogue understanding. This dissertation presents SHACER, our Speech and HAndwriting reCognizER (pronounced "shaker"). SHACER learns out-of-vocabulary terms dynamically by integrating information from instances of redundant handwriting and speaking. SHACER can automatically populate an MS Project™ Gantt chart by observing a whiteboard scheduling meeting. To document the occurrence and importance of such multimodal redundancy, we examine (1) whiteboard presentations, (2) a spontaneous brainstorming meeting, and (3) informal annotation discussions about travel photographs. Averaged across these three contexts, 96.5% of handwritten words were also spoken redundantly. We also find that redundantly presented terms are (a) highly topic specific and thus likely to be out-of-vocabulary, (b) more memorable, and (c) significantly better query terms for later search and retrieval. To combine information, SHACER normalizes handwriting and speech recognizer outputs by applying letter-to-sound and sound-to-letter transformations. SHACER then uses an articulatory-feature-based distance metric to align handwriting to redundant speech. Phone sequence information from that aligned segment then constrains a second-pass phone recognition over cached speech features. The resulting refined pronunciation serves as a measure against which the integration of all orthographic and pronunciation hypotheses is scored. High-scoring integrations are enrolled in the system's dictionaries and reinforcement tables. When a presenter subsequently says a newly enrolled term, it is more easily recognized.
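
The alignment step described in the abstract can be illustrated with a small sketch. This is not SHACER's implementation: the feature table, the gap cost, the window sizes, and the function names (`phone_distance`, `align_cost`, `best_segment`) are all assumptions made for illustration. It assumes the handwriting has already been turned into a phone sequence by letter-to-sound rules, scores phone pairs by how many hypothetical articulatory features they share, and searches the speech phone sequence for the segment that aligns most cheaply.

```python
# Hypothetical articulatory features per phone: (place, manner, voiced).
# This small table is an illustrative assumption, not SHACER's feature set.
FEATURES = {
    "k": ("velar", "stop", False),
    "g": ("velar", "stop", True),
    "s": ("alveolar", "fricative", False),
    "z": ("alveolar", "fricative", True),
    "sh": ("postalveolar", "fricative", False),
    "ey": ("front", "vowel", True),
    "iy": ("front", "vowel", True),
    "er": ("central", "vowel", True),
    "ah": ("central", "vowel", True),
}

GAP_COST = 1.0  # assumed cost of inserting or deleting one phone


def phone_distance(a, b):
    """Fraction of features on which two phones differ (0.0 to 1.0)."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def align_cost(hw_phones, sp_phones):
    """Weighted edit distance between two phone sequences."""
    n, m = len(hw_phones), len(sp_phones)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP_COST
    for j in range(1, m + 1):
        cost[0][j] = j * GAP_COST
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + phone_distance(
                hw_phones[i - 1], sp_phones[j - 1]
            )
            delete = cost[i - 1][j] + GAP_COST
            insert = cost[i][j - 1] + GAP_COST
            cost[i][j] = min(substitute, delete, insert)
    return cost[n][m]


def best_segment(hw_phones, utterance_phones):
    """Return (start, end, cost) of the speech segment closest to the handwriting.

    Tries every window whose length is within one phone of the handwriting's
    length; returns None if the utterance is too short for any window.
    """
    n, m = len(hw_phones), len(utterance_phones)
    best = None
    for length in range(max(1, n - 1), n + 2):
        for start in range(0, m - length + 1):
            end = start + length
            c = align_cost(hw_phones, utterance_phones[start:end])
            if best is None or c < best[2]:
                best = (start, end, c)
    return best


# Handwritten "shaker" after an assumed letter-to-sound step.
handwriting = ["sh", "ey", "k", "er"]
# Assumed noisy phone recognizer output for the surrounding speech.
speech = ["ah", "s", "ey", "g", "er", "iy", "z"]
print(best_segment(handwriting, speech))
```

In this toy example the best match is the segment `s ey g er` (indices 1 to 5), because `sh`/`s` and `k`/`g` each differ in only one of the three assumed features. A feature-based metric of this kind lets near-miss phones from a noisy recognizer still align, which a plain exact-match edit distance would penalize as full substitutions.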

Document Details

Document Type
Technical Report
Publication Date
Apr 01, 2007
Accession Number
ADA573719

Entities

People

  • Edward C. Kaiser

Organizations

  • Oregon Health & Science University

Tags

Communities of Interest

  • Autonomy
  • Cyber
  • Energy and Power Technologies

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Automata Theory
  • Automated Speech Recognition
  • Computational Science
  • Computer Languages
  • Computer Programming
  • Computer Programs
  • Computer Vision
  • Computers
  • Engineering
  • Gantt Charts
  • Language
  • Machine Learning
  • Natural Language Processing
  • Ontologies
  • Operating Systems

Readers

  • Agent-Based Social Robotics and Mobile-Assisted Learning in Virtual Environments
  • Educational Psychology
  • Speech Processing/Speech Recognition