Update Propagation Strategies for Improving the Quality of Data on the Web

Abstract

Dynamically generated web pages are ubiquitous today, but their high demand for resources creates a serious scalability problem at the servers. Traditional web caching cannot solve this problem because it provides no guarantees about the freshness of the cached data. A robust solution is web materialization, where pages are cached at the web server and continuously refreshed in the background, so that cache hits return fresh data. In this work, we define Quality of Data (QoD) metrics to evaluate how fresh the data served to users is. We then focus on the update scheduling problem: given a set of materialized views and a continuous stream of updates, find the order in which to refresh the views so that the overall QoD is maximized. We present a QoD-aware Update Scheduling algorithm that is adaptive and tolerant to surges in the incoming update stream. Extensive experiments using both real and synthetic traces show that our algorithm consistently outperforms FIFO scheduling, by up to two orders of magnitude.

Prepared through collaborative participation in the Advanced Telecommunications/Information Distribution Research Program (ATIRP) Consortium sponsored by the U.S. Army Research Laboratory under the Federated Laboratory Program, Cooperative Agreement DAAL01-96-2-0002. An abridged version of this work appears in the Proceedings of the 27th VLDB Conference, Roma, Italy, 2001.
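
The update scheduling problem described in the abstract can be illustrated with a toy sketch. This is not the report's QoD-aware algorithm, and the freshness measure below is only one plausible stand-in for the report's QoD metrics. It assumes that each refresh takes one time step, that every view starts with one pending update, and that accesses arrive at fixed per-view rates. The names `average_freshness`, `access_rate`, `fifo_order`, and `popular_first` are hypothetical.

```python
def average_freshness(refresh_order, access_rate):
    """Fraction of accesses served fresh data while the refresh queue drains.

    Assumptions (illustrative only): every view in `refresh_order` starts
    stale, each refresh takes one time step, and a view becomes fresh only
    once its refresh has completed.
    """
    stale = set(refresh_order)
    total_rate = sum(access_rate.values())
    fresh_accesses = 0.0
    for view in refresh_order:
        # During this step, only views no longer in `stale` serve fresh data.
        fresh_accesses += sum(
            rate for v, rate in access_rate.items() if v not in stale
        )
        stale.discard(view)  # this view's refresh completes at the end of the step
    return fresh_accesses / (total_rate * len(refresh_order))


# Hypothetical workload: view "c" receives most of the accesses.
access_rate = {"a": 1, "b": 2, "c": 7}

# FIFO refreshes views in update-arrival order.
fifo_order = ["a", "b", "c"]

# A simple access-aware alternative: refresh the most-accessed views first.
popular_first = sorted(fifo_order, key=lambda v: access_rate[v], reverse=True)

fifo_score = average_freshness(fifo_order, access_rate)
popular_score = average_freshness(popular_first, access_rate)
print(f"FIFO: {fifo_score:.3f}  popular-first: {popular_score:.3f}")
```

In this toy workload, refreshing the most-accessed view first serves more fresh accesses than FIFO, because the popular view spends less time stale. The report's algorithm additionally adapts to surges in the update stream, which this sketch does not model.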

Document Details

Document Type
Technical Report
Publication Date
Jun 28, 2001
Accession Number
ADA440812

Entities

People

  • Alexandros Labrinidis
  • Nick Roussopoulos

Organizations

  • University of Maryland

Tags

DTIC Thesaurus Topics

  • Algorithms
  • Commerce
  • Computations
  • Computer Science
  • Database Management Systems
  • Department Of Defense
  • Military Research
  • New York
  • Operating Systems
  • Probability
  • Scheduling (Production)
  • Simulations
  • Simulators
  • Software Development
  • Universities
  • Websites
  • World Wide Web

Fields of Study

  • Computer science

Readers

  • Computer Networking
  • Geospatial Intelligence and Artificial Intelligence Analytics
  • Parallel and Distributed Computing