Content Based Retrieval Database Management System with Support for Similarity Searching and Query Refinement

Abstract

With the emergence of many application domains that require imprecise, similarity-based access to information, techniques to support such a retrieval paradigm over database systems have become a critical area of research. This thesis explores how to enhance database systems with content-based search over arbitrary abstract data types in a similarity-based framework with query refinement. This scope opens a number of challenges not previously faced by databases, among them:

  • Extension of abstract data types to support arbitrary similarity functions and query refinement. (Intra-type similarity and feedback)
  • Extension of the query refinement models already developed under the MARS system to a general multi-table relational model. (Inter-type similarity and feedback)
  • Extension of query processing from a set-based model, in which tuples either satisfy the query predicate or do not, to one in which the degree to which a tuple satisfies a predicate is represented by its similarity value. (Similarity predicates)
  • Returning only the best k matches based on the similarity values. This implies sorting on the similarity values, and there is ample room for optimization, such as lazy evaluation that computes only the answers the user will see. (Ranked retrieval)
  • Optimization of query execution under similarity conditions, which requires access to specialized indices. Building on earlier work in the MARS project, composite predicates can be merged efficiently by computing the similarity value for a predicate from independent streams rather than using the value directly. (Incremental top-k merging)

We are building a prototype system that implements the proposed functionality efficiently, and we evaluate the quality of the answers returned to the user.
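To make the ideas of similarity predicates and ranked retrieval concrete, the following is a minimal Python sketch and not the thesis's implementation. The function names (`numeric_similarity`, `top_k`), the linear distance-to-similarity mapping, the per-attribute `scales`, and the weighted-average combination are all assumptions chosen for illustration. Each tuple receives a similarity score in [0, 1] instead of a true/false answer, and only the k best-scoring tuples are returned.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def numeric_similarity(a: float, b: float, scale: float) -> float:
    """Map absolute distance to a similarity in [0, 1]; 1.0 means identical.

    The linear falloff over `scale` is an illustrative assumption.
    """
    return max(0.0, 1.0 - abs(a - b) / scale)


def top_k(
    rows: Sequence[Row],
    query: Row,
    weights: Dict[str, float],
    scales: Dict[str, float],
    k: int,
) -> List[Tuple[float, int]]:
    """Return the k best (score, row_index) pairs, highest score first.

    The score is a weighted average of per-attribute similarities, so a
    row is ranked by how well it matches rather than filtered in or out.
    """
    total_weight = sum(weights.values())

    def score(row: Row) -> float:
        combined = sum(
            w * numeric_similarity(row[attr], query[attr], scales[attr])
            for attr, w in weights.items()
        )
        return combined / total_weight

    # heapq.nlargest keeps only k candidates in memory while scanning.
    scored = ((score(row), index) for index, row in enumerate(rows))
    return heapq.nlargest(k, scored)
```

This sketch still scores every row once; it only avoids sorting and storing the full result. Avoiding the full scan, as the abstract's points on lazy evaluation and specialized indices suggest, would require index support that is outside the scope of this example.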

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2002
Accession Number
ADA466965

Entities

People

  • Michael Ortega-Binderberger

Organizations

  • University of Illinois Urbana–Champaign

Tags

Communities of Interest

  • Energy and Power Technologies
  • Materials and Manufacturing Processes
  • Sensors

DTIC Thesaurus Topics

  • Computers
  • Data Analysis
  • Database Management Systems
  • Databases
  • Dielectric Gases
  • Domain Specific Programming Languages
  • Electronic Commerce
  • Gray Scale
  • Information Retrieval
  • Information Science
  • Information Systems
  • Probabilistic Models
  • Relational Database Management Systems
  • Relational Databases
  • Trees (Data Structures)
  • Two Dimensional
  • User Interface

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Neural Network Machine Learning
  • Regression Analysis