A Parallel, Multithreaded Decision Tree Builder
Abstract
Parallelization has become a popular mechanism to speed up data classification tasks that deal with large amounts of data. This paper describes a high level, fine grained parallel formulation of a decision tree based classifier for memory resident datasets on SMPs. We exploit two levels of divide and conquer parallelism in the tree builder: at the outer level across the tree nodes, and at the inner level within each tree node. Lightweight Pthreads are used to express this highly irregular and dynamic parallelism in a natural manner. The task of scheduling the threads and balancing the load is left to a space efficient Pthreads scheduler. Experimental results on large datasets indicate that the space and time performance of the tree builder scales well with both the data size and number of processors.
Document Details
- Document Type
- Technical Report
- Publication Date
- Dec 01, 1998
- Accession Number
- ADA363531
Entities
People
- Girija J. Narlikar
Organizations
- Carnegie Mellon University