Transducing Markov sequences

Abstract

A Markov sequence is a basic statistical model representing uncertain sequential data, and it is used within a plethora of applications, including speech recognition, image processing, computational biology, radio-frequency identification (RFID), and information extraction. The problem of querying a Markov sequence is studied under the conventional semantics of querying a probabilistic database, where queries are formulated as finite-state transducers. Specifically, the complexity of two main problems is analyzed. The first problem is that of computing the confidence (probability) of an answer. The second is the enumeration of the answers in the order of decreasing confidence (with the generation of the top- k answers as a special case), or in an approximate order thereof. In particular, it is shown that enumeration in any subexponential-approximate order is generally intractable (even for some fixed transducers), and a matching upper bound is obtained through a proposed heuristic. Due to this hardness, a special consideration is given to restricted (yet common) classes of transducers that extract matches of a regular expression (subject to prefix and suffix constraints), and it is shown that these classes are, indeed, significantly more tractable.

Document Details

Document Type: Pub Defense Publication
Publication Date: Sep 08, 2014
Source ID: 10.1145/2630065

Entities

People

Benny Kimelfeld
Christopher Ré

Organizations

Alfred P. Sloan Foundation
Division of Computing and Communication Foundations
Division of Information and Intelligent Systems
Office of Naval Research
Stanford University

Transducing Markov sequences

Abstract

Document Details

Entities

People

Organizations

Tags

Fields of Study

Readers

Technology Areas