Model-Based Clustering and Data Transformations for Gene Expression Data
Abstract
Clustering is a useful exploratory technique for the analysis of gene expression data, and many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. Model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. This Gaussian mixture model has been shown to be a powerful tool for many applications. In addition, the issues of selecting a "good" clustering method and determining the "correct" number of clusters are reduced to model selection problems in the probability framework. We benchmarked the performance of model-based clustering on several synthetic and real gene expression data sets for which external evaluation criteria were available. The model-based approach has superior performance on our synthetic data sets, consistently selecting the correct model and the right number of clusters.
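The idea of treating the number of clusters as a model selection problem can be illustrated with a minimal sketch. It is not the report's implementation: it assumes one-dimensional data for brevity (the report works with multivariate normal mixtures), fits each candidate mixture with a plain EM loop, and scores it with the Bayesian Information Criterion (BIC) in the "larger is better" form 2 log L - p log n. The function names `normal_pdf`, `fit_mixture`, and `bic`, the quantile-based initialisation, the variance floor, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_mixture(xs, k, iters=200, var_floor=1e-2):
    """Fit a k-component 1-D Gaussian mixture by EM and return its log-likelihood.

    Hypothetical helper: initial means are spread over data quantiles and a
    variance floor guards against components collapsing onto single points.
    """
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    variances = [sum((x - grand_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k

    for _ in range(iters):
        # E-step: posterior probability that each point belongs to each component.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances from those posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)

    return sum(
        math.log(sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)))
        for x in xs
    )


def bic(loglik, k, n):
    """BIC in the 'larger is better' form: 2 * log-likelihood - (#parameters) * log(n)."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return 2.0 * loglik - n_params * math.log(n)


# Synthetic data (an assumption for this sketch): two well-separated groups.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(8.0, 1.0) for _ in range(150)]

# Score mixtures with 1 to 4 components and keep the one with the highest BIC.
scores = {k: bic(fit_mixture(data, k), k, len(data)) for k in range(1, 5)}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The same comparison extends to choosing among covariance structures in the multivariate case: each candidate (model, number of clusters) pair is fitted and the pair with the best criterion value is kept.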
Document Details
- Document Type: Technical Report
- Publication Date: Apr 30, 2001
- Accession Number: ADA458752
Entities
People
- Adrian Raftery
- Alejandro Murua
- Chris Fraley
- Ka Y. Yeung
- Walter L. Ruzzo
Organizations
- George Washington University