Model-Based Clustering and Data Transformations for Gene Expression Data

Abstract

Clustering is a useful exploratory technique for the analysis of gene expression data, and many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. Model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. This Gaussian mixture model has been shown to be a powerful tool for many applications. In addition, the issues of selecting a "good" clustering method and determining the "correct" number of clusters are reduced to model selection problems in the probability framework. We benchmarked the performance of model-based clustering on several synthetic and real gene expression data sets for which external evaluation criteria were available. The model-based approach has superior performance on our synthetic data sets, consistently selecting the correct model and the right number of clusters.

Document Details

Document Type: Technical Report
Publication Date: Apr 30, 2001
Accession Number: ADA458752

Entities

People

Adrian Raftery
Alejandro Murua
Chris Fraley
Ka Y. Yeung
Walter L. Ruzzo

Organizations

George Washington University
