Parallel Data Mining with the Message Passing Interface Standard on Clusters of Personal Computers

Abstract

Piles of personal computers (PoPCs) have begun to challenge the performance of the traditional Massively Parallel Processors (MPPs) and the less traditional networks of workstations (NOWs) as platforms for parallel computing. Large clusters of PCs have reached and at times exceeded the performance of modern MPPs at a fraction of the cost. Built with commodity components, these clusters can be constructed for about half the cost of a comparable NOW. The primary competing operating systems (OSs) in use on PoPCs are Linux and Windows NT. This thesis investigation compares the performance of an NT cluster with that of a Linux cluster, a NOW, and an MPP. It also compares the MPI tools available for NT. These comparisons are made using the Pallas benchmark suite for MPI and a parallel data mining algorithm. This data mining technique, known as the Genetic Rule and Classifier Construction Environment (GRaCCE), uses a genetic algorithm to mine decision rules from data. Experimentation and statistical analysis produced three important conclusions. First, NT clusters are viable, cost-effective alternatives to Linux clusters, NOWs, and MPPs for parallel computing. Second, the two primary communication libraries currently available for NT, PaTENT MPI and MPI/Pro, are statistically equivalent in performance. Third, the parallel GRaCCE algorithm is capable of relatively good speedup and efficiency, even for significantly unbalanced processor workloads, if the effects of first loop iteration caching are ignored.

Document Details

Document Type: Technical Report
Publication Date: Mar 01, 1999
Accession Number: ADA361637

Entities

People

  • Lonnie P. Hammack

Organizations

  • Air Force Institute of Technology

Tags

Communities of Interest

  • Energy and Power Technologies
  • Human Systems

DTIC Thesaurus Topics

  • Air Force
  • Algorithms
  • Computer Programming
  • Computer Programs
  • Computer Science
  • Computers
  • Data Analysis
  • Data Mining
  • Genetic Algorithms
  • Graphical User Interface
  • Information Science
  • Machine Learning
  • Network Science
  • Operating Systems
  • Parallel Computing
  • Parallel Processing
  • Two Dimensional

Fields of Study

  • Computer science

Readers

  • Data Mining and Knowledge Discovery
  • Parallel and Distributed Computing
  • Systems Analysis and Design

Technology Areas

  • AI & ML
  • Biotechnology