Transducing Markov sequences

Abstract

A Markov sequence is a basic statistical model representing uncertain sequential data, and it is used within a plethora of applications, including speech recognition, image processing, computational biology, radio-frequency identification (RFID), and information extraction. The problem of querying a Markov sequence is studied under the conventional semantics of querying a probabilistic database, where queries are formulated as finite-state transducers. Specifically, the complexity of two main problems is analyzed. The first problem is that of computing the confidence (probability) of an answer. The second is the enumeration of the answers in the order of decreasing confidence (with the generation of the top- k answers as a special case), or in an approximate order thereof. In particular, it is shown that enumeration in any subexponential-approximate order is generally intractable (even for some fixed transducers), and a matching upper bound is obtained through a proposed heuristic. Due to this hardness, a special consideration is given to restricted (yet common) classes of transducers that extract matches of a regular expression (subject to prefix and suffix constraints), and it is shown that these classes are, indeed, significantly more tractable.

Document Details

Document Type
Pub Defense Publication
Publication Date
Sep 08, 2014
Source ID
10.1145/2630065

Entities

People

  • Benny Kimelfeld
  • Christopher RĂ©

Organizations

  • Alfred P. Sloan Foundation
  • Division of Computing and Communication Foundations
  • Division of Information and Intelligent Systems
  • Office of Naval Research
  • Stanford University

Tags

Fields of Study

  • Mathematics

Readers

  • Adaptive Control and Estimation with Uncertainty in Dynamic Systems.
  • Computational Linguistics

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms