Learning Classification Trees from Distributed Horizontally and Vertically Fragmented Data Sets
Abstract
Recent advances in data storage and acquisition technologies have made it possible to produce increasingly large data repositories. Most of these data sources are physically distributed, and assembling them at a central site is both insecure and expensive in terms of network bandwidth. Hence there is a need for learning algorithms that can learn from distributed data without collecting it in a central location. We present provably exact algorithms for learning decision trees from distributed data sets: we prove that the results they produce are the same as those that would be obtained if the data were stored at a central location. We also give an analysis of time, space, and communication costs. We conclude with a discussion of a general technique for adapting some existing learning algorithms to learn from distributed data sets.
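The exactness claim can be illustrated for the horizontally fragmented case with a minimal sketch. It is not taken from the report: it assumes each site sends only (attribute value, class) counts to a coordinator, and that the split criterion is information gain, which depends on the data only through those counts. The function names, the two sites, and the toy records are all hypothetical.

```python
from collections import Counter
from math import log2

# Illustrative sketch only; names and data are hypothetical, not from the report.

def local_counts(rows, attr, label):
    """Count (attribute value, class) pairs at one site; only counts leave the site."""
    return Counter((r[attr], r[label]) for r in rows)

def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count table."""
    total = sum(class_counts.values())
    return -sum(c / total * log2(c / total) for c in class_counts.values() if c)

def info_gain(counts):
    """Information gain of an attribute, computed from (value, class) counts alone."""
    class_totals = Counter()
    by_value = {}
    for (value, cls), n in counts.items():
        class_totals[cls] += n
        by_value.setdefault(value, Counter())[cls] += n
    total = sum(class_totals.values())
    remainder = sum(sum(cc.values()) / total * entropy(cc) for cc in by_value.values())
    return entropy(class_totals) - remainder

# Two horizontal fragments: same attributes, disjoint sets of records.
site_a = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "yes"}]
site_b = [{"outlook": "sunny", "play": "no"}, {"outlook": "rain", "play": "no"},
          {"outlook": "overcast", "play": "yes"}]

# The coordinator adds the per-site counts; no raw records are exchanged.
merged = local_counts(site_a, "outlook", "play") + local_counts(site_b, "outlook", "play")

# Reference computation on the pooled data, for comparison only.
central = local_counts(site_a + site_b, "outlook", "play")

distributed_gain = info_gain(merged)
central_gain = info_gain(central)
```

Because counts over disjoint sets of records add, the merged table equals the centralized one, so any split chosen from it matches the centralized choice. Vertically fragmented data, where each site holds different attributes of the same records, needs a different exchange and is not covered by this sketch.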
Document Details
- Document Type: Technical Report
- Publication Date: Jan 01, 2000
- Accession Number: ADA638206
Entities
People
- Adrian Silvescu
- Tarkeshwari Sharma
- Vasant Honavar
Organizations
- Iowa State University