Attribute and object selection queries on objects with probabilistic attributes

Abstract

Modern data processing techniques such as entity resolution, data cleaning, information extraction, and automated tagging often produce results consisting of objects whose attributes may contain uncertainty. This uncertainty is frequently captured as a set of mutually exclusive value choices for each uncertain attribute, each with an associated probability. However, lay end-users, as well as some end-applications, might not be able to interpret results output in such a form. Thus, the question is how to present such results to the user in practice, for example, to support the attribute-value selection and object selection queries the user might be interested in. Specifically, in this article we study the problem of maximizing the quality of these selection queries on top of such a probabilistic representation. Quality is measured using standard, commonly used set-based quality metrics. We formalize the problem and then develop efficient approaches that provide high-quality answers for these queries. A comprehensive empirical evaluation over three different domains demonstrates the advantage of our approach over existing techniques.

Document Details

Document Type
Pub Defense Publication
Publication Date
Feb 01, 2012
Source ID
10.1145/2109196.2109199

Entities

People

  • Dmitri V. Kalashnikov
  • Rabia Nuray-Turan
  • Sharad Mehrotra
  • Yaming Yu

Organizations

  • Defense Advanced Research Projects Agency
  • Division of Computer and Network Systems
  • University of California, Irvine

Tags

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Regression Analysis
  • Software Engineering

Technology Areas

  • AI & ML
  • AI & ML - Information Retrieval