Collaborative Research: Bayesian ANOVA for Microarrays

Information

  • NSF Award
  • 0405675
Owner
  • Award Id
    0405675
  • Award Effective Date
    8/15/2004
  • Award Expiration Date
    7/31/2008
  • Award Amount
    $ 118,120.00
  • Award Instrument
    Standard Grant

DNA microarrays can provide insight into genetic changes occurring during stagewise progression of diseases like cancer. Accurate identification of these changes has significant therapeutic and diagnostic implications. Statistical analysis of such data is challenging, however, because of the sheer volume of information. With new microarray technology it is possible to measure expression for nearly 60,000 transcripts in each sample of tissue analyzed. To properly understand the evolution of a progressive disease, expression values are collected over all possible biological stages, so the number of parameters in such problems can be in the hundreds of thousands, or even millions. This high dimensionality presents theoretical problems for standard ANOVA-based extensions of two-sample Z-tests, a popular method for detecting differentially expressed genes in two groups. Additionally, standard approaches that focus on controlling false detection rates apply primarily to simpler experimental designs; moreover, these approaches tend to be conservative and are expected to perform worse in multigroup settings.

This work introduces a new methodology called Bayesian ANOVA for Microarrays (BAM) for reliably detecting differentially expressed genes in complex experimental settings. The method rests on a high-dimensional variable selection method that exploits a rescaled spike and slab hierarchical model. BAM is shown to be risk optimal in terms of the total number of misclassified genes. The exact mechanisms for this risk optimality are theoretically delineated as a selective shrinkage effect. Theory guides development of graphical devices for adaptive optimal gene selection. A large multistage colon cancer microarray repository collected at the Ireland Cancer Center of Case Western Reserve University serves as a testbed for the methods. In parallel, Java-based software for implementing BAM is being developed. The software uses a menu-driven GUI and requires a minimal number of user-specified tuning parameters, making it user friendly for other molecular biology laboratories.

DNA microarrays allow for high-throughput analysis of potential genetic determinants of diseases like cancer. It is now typical to have expression values for nearly 60,000 transcripts in each sample of tissue analyzed. These data can potentially reveal which genes are involved in the stagewise development of cancer and indicate novel therapeutic and diagnostic targets. However, statistical inference to identify interesting genes is challenging because of the large number of statistical tests that are run. Standard approaches employ ANOVA test statistics and are prone to a high number of false detections. False detection rate control methods tend to be overly conservative and do not extend naturally to more complex multistage experimental designs. This work introduces a new methodology called Bayesian ANOVA for Microarrays (BAM), which reliably detects differentially expressed genes in multigroup experimental design settings. The method employs a special hierarchical model that imparts an oracle-like behaviour for gene selection: ultimately, only genes that are truly differentially expressed are selected. The reasons for this behaviour are theoretically delineated in this research, and the theory guides the development of novel graphical devices for adaptively optimal gene selection in real microarray datasets.

A large multistage colon cancer microarray repository collected at the Ireland Cancer Center of Case Western Reserve University serves as a testbed for the methods and also provides a tremendous opportunity to understand the colon cancer disease process, a topic of great medical importance. While colon cancer has a well-defined evolution characterized by clinical stage, very little is known about its molecular evolution. In parallel, Java-based software is being developed with a menu-driven GUI and a minimal number of user-specified tuning parameters, making it feasible to port the software to molecular biology laboratories for active use in the analysis of other disease processes and potentially other high-throughput sources of data.

  • Program Officer
    Gabor J. Szekely
  • Min Amd Letter Date
    8/2/2004
  • Max Amd Letter Date
    7/30/2007
  • ARRA Amount

Institutions

  • Name
    Cleveland Clinic Foundation
  • City
    Cleveland
  • State
    OH
  • Country
    United States
  • Address
    9500 Euclid Avenue
  • Postal Code
    441950001
  • Phone Number
    2164456440

Investigators

  • First Name
    Hemant
  • Last Name
    Ishwaran
  • Email Address
    hemant.ishwaran@gmail.com
  • Start Date
    8/2/2004

FOA Information

  • Name
    Other Applications NEC
  • Code
    99