A Parallel, Multithreaded Decision Tree Builder

Abstract

Parallelization has become a popular mechanism to speed up data classification tasks that deal with large amounts of data. This paper describes a high-level, fine-grained parallel formulation of a decision-tree-based classifier for memory-resident datasets on SMPs. We exploit two levels of divide-and-conquer parallelism in the tree builder: at the outer level across the tree nodes, and at the inner level within each tree node. Lightweight Pthreads are used to express this highly irregular and dynamic parallelism in a natural manner. The task of scheduling the threads and balancing the load is left to a space-efficient Pthreads scheduler. Experimental results on large datasets indicate that the space and time performance of the tree builder scales well with both the data size and the number of processors.
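
The two levels of divide-and-conquer parallelism described in the abstract can be illustrated with a minimal C sketch using POSIX threads. It is not the report's implementation. The record layout, the GRAIN cutoff, the median split rule, and the names build_tree and predict are all assumptions made for illustration. The outer level forks a thread per child subtree; the inner level counts class labels within a node by a forked recursive reduction.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical record: one numeric feature (input sorted by x) and a 0/1 label. */
typedef struct { double x; int label; } Record;

typedef struct Node {
    int is_leaf, label;          /* label = majority class at this node */
    double threshold;            /* x < threshold goes left */
    struct Node *left, *right;
} Node;

/* One unit of work: records [lo, hi); results come back in ones / node. */
typedef struct { const Record *r; int lo, hi, ones; Node *node; } Task;

enum { GRAIN = 4 };              /* assumed cutoff: smaller ranges run serially */

/* Run f(a) in a new thread and f(b) in this one, then wait for both. */
static void fork_join(void *(*f)(void *), Task *a, Task *b) {
    pthread_t th;
    int forked = pthread_create(&th, NULL, f, a) == 0;
    if (!forked) f(a);           /* no thread available: fall back to serial */
    f(b);
    if (forked) pthread_join(th, NULL);
}

/* Inner level: divide-and-conquer count of label-1 records within one node. */
static void *count_ones(void *arg) {
    Task *t = arg;
    if (t->hi - t->lo <= GRAIN) {
        t->ones = 0;
        for (int i = t->lo; i < t->hi; i++) t->ones += t->r[i].label;
        return NULL;
    }
    int mid = t->lo + (t->hi - t->lo) / 2;
    Task a = { t->r, t->lo, mid, 0, NULL }, b = { t->r, mid, t->hi, 0, NULL };
    fork_join(count_ones, &a, &b);
    t->ones = a.ones + b.ones;
    return NULL;
}

/* Outer level: build the subtree for [lo, hi), forking one thread per child. */
static void *build(void *arg) {
    Task *t = arg;
    int n = t->hi - t->lo;
    Node *nd = calloc(1, sizeof *nd);
    if (!nd) abort();
    t->node = nd;
    count_ones(t);
    nd->label = 2 * t->ones >= n;
    if (t->ones == 0 || t->ones == n) { nd->is_leaf = 1; return NULL; }
    /* Placeholder split rule (assumption): cut at the median record.
       A real builder would pick the split by an impurity measure. */
    int mid = t->lo + n / 2;
    nd->threshold = t->r[mid].x;
    Task l = { t->r, t->lo, mid, 0, NULL }, r = { t->r, mid, t->hi, 0, NULL };
    fork_join(build, &l, &r);
    nd->left = l.node;
    nd->right = r.node;
    return NULL;
}

/* Build a tree over n records sorted by x; the caller owns the nodes. */
Node *build_tree(const Record *r, int n) {
    assert(r != NULL && n > 0);
    Task t = { r, 0, n, 0, NULL };
    build(&t);
    return t.node;
}

/* Walk from the root to a leaf and return its class label. */
int predict(const Node *nd, double x) {
    while (!nd->is_leaf) nd = x < nd->threshold ? nd->left : nd->right;
    return nd->label;
}
```

Compile with the -pthread flag. Calling build_tree on an array sorted by x returns a root that predict can walk. The sketch forks a kernel thread at every split above the cutoff and does no load balancing of its own, which is the job the abstract assigns to a space-efficient Pthreads scheduler.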

Document Details

Document Type: Technical Report
Publication Date: Dec 01, 1998
Accession Number: ADA363531

Entities

People

  • Girija J. Narlikar

Organizations

  • Carnegie Mellon University

Tags

Communities of Interest

  • Human Systems

DTIC Thesaurus Topics

  • Accuracy
  • Algorithms
  • Automated Speech Recognition
  • Classification
  • Computations
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Data Mining
  • Hash Tables
  • Lightweight
  • Multithreading
  • Parallel Computing
  • Programming Languages
  • Scheduling (Production)
  • Training

Fields of Study

  • Computer science

Readers

  • Database Systems and Applications
  • Neural Network Machine Learning
  • Parallel and Distributed Computing

Technology Areas

  • Space