Generative Model Based Statistical Analysis of Gene Expression Patterns in Breast Cancer
Abstract
Microarray analysis provides an efficient, unbiased strategy for identifying differentially expressed genes in breast cancer. The statistical analysis of large-scale gene expression studies, however, poses several serious and novel challenges. Most importantly, gene expression arrays have a well-defined internal data structure dictated by the genetic network of the living cell. We showed that ignoring this data structure leads to errors of several orders of magnitude in the statistical analysis. As a consequence, the analysis either produces false leads for experimenters or eliminates truly important leads in cancer research. We have introduced two methods to overcome this problem. The first is based on generative models that produce random data sets while retaining the overall level of gene co-regulation, as reflected in the distribution of pairwise co-regulation measures. The second is an information-theoretic approach based on the theory of R×C contingency tables. This latter method also deals with the lack of replicates often encountered in cancer genomics. Both methods determine the probability that a given feature, such as a cluster or separator, will appear by chance in a gene expression array.
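The idea of asking how often a feature appears by chance can be illustrated with a minimal sketch. It assumes that the expression values of two genes have already been discretized into a few levels (for example down, unchanged, up), builds their R×C contingency table, and uses mutual information as the co-regulation measure. The function names, the choice of mutual information, and the simple label-shuffling null model are assumptions made for illustration; in particular, shuffling is a stand-in and not the report's generative model, which retains the overall level of co-regulation in the random data sets.

```python
import math
import random
from collections import Counter


def contingency_table(x, y):
    """Count co-occurrences of the discrete levels in x and y (an R x C table)."""
    rows = sorted(set(x))
    cols = sorted(set(y))
    counts = Counter(zip(x, y))
    return [[counts[(r, c)] for c in cols] for r in rows]


def mutual_information(table):
    """Mutual information, in bits, between the row and column variables."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # empty cells contribute nothing
                mi += (n_ij / n) * math.log2(n_ij * n / (row_tot[i] * col_tot[j]))
    return mi


def chance_probability(x, y, n_random=1000, seed=0):
    """Estimate how often a shuffled data set scores at least as high as the
    observed one. Assumption: shuffling y is a simple null model that ignores
    co-regulation among genes, unlike the generative models in the report."""
    rng = random.Random(seed)
    observed = mutual_information(contingency_table(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_random):
        rng.shuffle(shuffled)
        score = mutual_information(contingency_table(x, shuffled))
        if score >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_random + 1)
```

Calling `chance_probability(gene_a, gene_b)` on two equally long lists of discretized levels returns an estimated probability between 0 and 1. Because this null model treats genes as independent, it is exactly the kind of baseline that the abstract warns can misjudge significance when the internal structure of the data is ignored.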
Document Details
- Document Type: Technical Report
- Publication Date: Aug 01, 2002
- Accession Number: ADA412180
Entities
People
- Zoltán Szállási
Organizations
- Henry M. Jackson Foundation for the Advancement of Military Medicine