Learning Classification Trees from Distributed Horizontally and Vertically Fragmented Data Sets

Abstract

Recent advances in data storage and acquisition technologies have made it possible to produce increasingly large data repositories. Most of these data sources are physically distributed, and assembling them at a central site is both insecure and expensive in network bandwidth. Hence there is a need for learning algorithms that can learn from distributed data without collecting it in a central location. We present provably exact algorithms for learning decision trees from distributed data sets. We prove that the results obtained in the distributed case are the same as those that would be obtained if the data were stored at a central location. We also analyze the time, space, and communication costs. We conclude with a discussion of a general technique for adapting some existing learning algorithms to learn from distributed data sets.
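
As an illustration of how a distributed result can match the centralized one exactly, the sketch below handles the horizontally fragmented case, where each site holds a subset of the rows. It is not taken from the report: it assumes an entropy-based split criterion (information gain) and assumes that each site ships only its counts of (attribute value, class) pairs to a coordinator. The function names and the row layout (dictionaries with a "label" key) are hypothetical.

```python
from collections import Counter
from math import log2


def local_counts(rows, attr, label="label"):
    """Count (attribute value, class) pairs for the rows held at one site."""
    return Counter((row[attr], row[label]) for row in rows)


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum(n / total * log2(n / total) for n in class_counts.values() if n)


def information_gain(counts):
    """Information gain of a split, computed from (value, class) counts only."""
    by_class = Counter()
    by_value = {}
    for (value, cls), n in counts.items():
        by_class[cls] += n
        by_value.setdefault(value, Counter())[cls] += n
    total = sum(by_class.values())
    remainder = sum(
        sum(c.values()) / total * entropy(c) for c in by_value.values()
    )
    return entropy(by_class) - remainder


def distributed_gain(sites, attr):
    """Sum the per-site counts, then score the split at the coordinator.

    Only the count tables cross the network, never the raw rows.
    """
    merged = Counter()
    for rows in sites:
        merged.update(local_counts(rows, attr))
    return information_gain(merged)


def centralized_gain(sites, attr):
    """Reference computation on the pooled rows, for comparison only."""
    pooled = [row for rows in sites for row in rows]
    return information_gain(local_counts(pooled, attr))
```

Because the gain depends on the data only through these counts, and counts from disjoint row subsets add up to the counts of the pooled data, the coordinator scores every candidate split on the same count table it would have with all rows in one place. The vertically fragmented case, where sites hold different attributes of the same rows, needs a different exchange and is not covered by this sketch.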

Document Details

Document Type
Technical Report
Publication Date
Jan 01, 2000
Accession Number
ADA638206

Entities

People

  • Adrian Silvescu
  • Tarkeshwari Sharma
  • Vasant Honavar

Organizations

  • Iowa State University

Tags

DTIC Thesaurus Topics

  • Abstracts
  • Acquisition
  • Algorithms
  • Artificial Intelligence
  • Boundaries
  • Classification
  • Cost Analysis
  • Data Sets
  • Data Storage Systems
  • Extraction
  • Information Operations
  • Information Processing
  • Learning
  • Standards
  • Storage

Fields of Study

  • Computer science

Readers

  • Distributed Systems and Data Platform Development
  • Operations Research

Technology Areas

  • Space