Data Mining on an OLTP System (Nearly) for Free

Abstract

This paper proposes a scheme for scheduling disk requests that takes advantage of the ability of high-level functions to operate directly at individual disk drives. We show that such a scheme makes it possible to support a Data Mining workload on an OLTP system almost for free: there is only a small impact on the throughput and response time of the existing workload. Specifically, we show that an OLTP system has the disk resources to provide a consistent one third of its sequential bandwidth to a background Data Mining task with close to zero impact on OLTP throughput and response time at high transaction loads. At low transaction loads, we show much lower impact than observed in previous work. This means that a production OLTP system can be used for Data Mining tasks without the expense of a second dedicated system. Our scheme takes advantage of close interaction with the on-disk scheduler by reading blocks for the Data Mining workload as the disk head passes over them while satisfying demand blocks from the OLTP request stream. We show that this scheme provides a consistent level of throughput for the background workload even at very high foreground loads. Such a scheme is of most benefit in combination with an Active Disk environment that allows the background Data Mining application to also take advantage of the processing power and memory available directly on the disk drives.

Open PDF

Document Details

Document Type
Technical Report
Publication Date
Jun 01, 1999
Accession Number
ADA457583

Entities

People

  • Christos Faloutsos
  • David Nagle
  • Erik Riedel
  • Greg Ganger

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Energy and Power Technologies
  • Engineered Resilient Systems

DTIC Thesaurus Topics

  • Abstracts
  • Algorithms
  • Backup Systems
  • Bandwidth
  • Computations
  • Computer Science
  • Computers
  • Data Mining
  • Databases
  • Decision Support Systems
  • Load Monitoring
  • Multiprogramming
  • Scheduling (Production)
  • Simulations
  • Simulators
  • Throughput
  • Workload

Fields of Study

  • Computer science

Readers

  • Computer Science/Computer Engineering/Data Science/Digital Signal Processing.
  • Database Systems and Applications
  • Neural Network Machine Learning.

Technology Areas

  • AI & ML
  • AI & ML - Machine Learning Algorithms