Learning Effective and Robust Knowledge for Semantic Query Optimization

Abstract

Optimizing queries to heterogeneous, distributed multidatabases is an important problem. Because of the complexity of the queries and the heterogeneity of the databases, conventional optimization approaches have difficulty solving it satisfactorily. Semantic Query Optimization (SQO) can complement conventional approaches to overcome the heterogeneity and considerably reduce redundant data transmission. An SQO optimizer uses rules about data regularities to yield significant cost reduction. However, hand-coding useful rules for SQO is impracticable. This dissertation presents a machine learning approach to this knowledge bottleneck problem. Unlike search control rules or classification rules, which have been studied extensively in machine learning, high-utility rules for SQO must be learned so as to maximize two roughly correlated measures. The first measure is effectiveness: effective rules must be applicable to many different queries and yield high cost reduction. The second measure is robustness against database changes: robust rules must remain valid regardless of database changes. This dissertation presents a new inductive learning approach to learning effective and robust rules. To learn effective rules, the approach considers both applicability and cost reduction during rule induction. To learn robust rules, the learner estimates the probabilities of database changes and uses these estimates to guide the learning. To evaluate the utility of the learning approach, this dissertation also describes an extended SQO approach for query plans that retrieve data from heterogeneous multidatabases. The experimental results show that the learned rules produce significant savings while being robust against database changes.

Document Details

Document Type
Technical Report
Publication Date
Dec 01, 1996
Accession Number
ADA327515

Entities

People

  • Chun-Nan Hsu

Organizations

  • University of Southern California

Tags

Communities of Interest

  • Autonomy
  • Biomedical
  • C4I
  • Ground and Sea Platforms

DTIC Thesaurus Topics

  • Artificial Intelligence
  • Artificial Intelligence Software
  • Bayesian Networks
  • Computer Programming
  • Computer Science
  • Computers
  • Cost Reductions
  • Costs
  • Data Mining
  • Data Transmission
  • Database Management Systems
  • Databases
  • Information Science
  • Information Systems
  • Machine Learning
  • Probability
  • Theses

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Neural Network Machine Learning
  • Regression Analysis

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms
  • AI & ML - Neural Networks