A Parallel Data Mining Toolbox Using MatlabMPI
Abstract
The ready availability of vast quantities of data has driven the need for data mining algorithms that can discover patterns, correlations, and changes in the data. The volume and high dimensionality of the data make data mining an important application for high performance computing (Joshi, 2002). The mathematical and interactive nature of many data mining algorithms makes it natural to use a language like MATLAB both to design algorithms and to post-process the results. Recently, Kepner (2002) developed MatlabMPI, a system that implements the six basic functions of the Message Passing Interface (MPI) standard in MATLAB and thus allows any MATLAB program to exploit multiple processors. This motivated us to develop a parallel data mining toolbox based on MatlabMPI. Implementations of a parallel clustering algorithm and a parallel classification algorithm have been completed, and other functions are currently under development.
Document Details
- Document Type
- Technical Report
- Publication Date
- Aug 20, 2004
- Accession Number
- ADA428768
Entities
People
- Ashok K. Krishnamurthy
- John W. Nehrbass
- Juan C. Chaves
- Parna Khot
- Stanley C. Ahalt
Organizations
- Ohio State University