A Parallel Data Mining Toolbox Using MatlabMPI

Abstract

The ready availability of vast quantities of data has driven the need for data mining algorithms that can discover patterns, correlations and changes in the data. The amount and high dimensionality of the data make data mining an important application for high performance computing (Joshi, 2002). The mathematical and interactive nature of many data mining algorithms makes it natural to use a language like MATLAB both to design algorithms and to post-process the results. Recently, Kepner (2002) developed a system called MatlabMPI, which implements the six basic functions of the Message Passing Interface (MPI) standard in MATLAB and thus allows any MATLAB program to exploit multiple processors. This has motivated us to develop a parallel data mining toolbox based on MatlabMPI. Implementations of a parallel clustering algorithm and a parallel classification algorithm have been completed, and other functions are currently under development.

Document Details

Document Type
Technical Report
Publication Date
Aug 20, 2004
Accession Number
ADA428768

Entities

People

  • Ashok K. Krishnamurthy
  • John W. Nehrbass
  • Juan C. Chaves
  • Parna Khot
  • Stanley C. Ahalt

Organizations

  • Ohio State University

Tags

Communities of Interest

  • Materials and Manufacturing Processes

DTIC Thesaurus Topics

  • Algorithms
  • Classification
  • Clustering
  • Computations
  • Convergence
  • Data Mining
  • Data Sets
  • Electrical Engineering
  • Engineering
  • High Performance Computing
  • Hybrid Systems
  • Machine Learning
  • Parallel Computing
  • Parallel Processing
  • Pattern Recognition
  • Standards
  • Supervised Machine Learning

Fields of Study

  • Computer science

Readers

  • Computational Linguistics
  • Distributed Systems and Data Platform Development

Technology Areas

  • AI & ML