Parallel Data Mining with the Message Passing Interface Standard on Clusters of Personal Computers.
Abstract
Piles of personal computers (PoPCs) have begun to challenge the performance of the traditional Massively Parallel Processors (MPPs) and the less traditional networks of workstations (NOWs) as platforms for parallel computing. Large clusters of PCs have reached and at times exceeded the performance of modern MPPs at a fraction of the cost. Built with commodity components, these clusters can be constructed for about half the cost of a comparable NOW. The primary competing operating systems (OSs) in use on PoPCs are Linux and Windows NT. This thesis investigation compares the performance of an NT cluster with that of a Linux cluster, a NOW, and an MPP. It also compares the MPI tools available for NT. These comparisons are made using the Pallas benchmark suite for MPI and a parallel data mining algorithm. This data mining technique, known as the Genetic Rule and Classifier Construction Environment (GRaCCE), uses a genetic algorithm to mine decision rules from data. Experimentation and statistical analysis produced three important conclusions. First, NT clusters are viable, cost-effective alternatives to Linux clusters, NOWs, and MPPs for parallel computing. Second, the two primary communication libraries currently available for NT (PaTENT MPI and MPI/Pro) are statistically equivalent in performance. Third, the parallel GRaCCE algorithm is capable of relatively good speedup and efficiency, even for significantly unbalanced processor workloads, if the effects of first loop iteration caching are ignored.
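The speedup and efficiency mentioned in the third conclusion are usually defined as S = T1 / Tp and E = S / p, where T1 is the serial run time, Tp the run time on p processors. The abstract does not state which definitions the thesis uses, so the following is only a minimal sketch under that assumption. The function names and the timings are hypothetical and are not taken from the report.

```python
def speedup(t_serial, t_parallel):
    """Conventional speedup: serial run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, num_procs):
    """Conventional efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / num_procs


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only for illustration;
    # they are NOT measurements from the report.
    t1 = 120.0   # assumed run time on one processor
    tp = 20.0    # assumed run time on p processors
    p = 8        # assumed processor count

    print("speedup:", speedup(t1, tp))
    print("efficiency:", efficiency(t1, tp, p))
```

Under these definitions an efficiency near 1 means the processors are almost fully used, while unbalanced workloads typically pull the value down because some processors sit idle waiting for the slowest one.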
Document Details
- Document Type: Technical Report
- Publication Date: Mar 01, 1999
- Accession Number: ADA361637
Entities
People
- Lonnie P. Hammack
Organizations
- Air Force Institute of Technology