Spike and Slab Models: Theory and Applications

Information

  • NSF Award
  • 0705037
Owner
  • Award Id
    0705037
  • Award Effective Date
    8/1/2007
  • Award Expiration Date
    7/31/2011
  • Award Amount
    $ 159,995.00
  • Award Instrument
    Standard Grant

The investigator seeks to expand the theory and application of rescaled spike and slab models, a class of Bayesian models, to address the general problem of variable selection and prediction. This will be accomplished through three distinct aims: (1) Developing theory as well as fast computational algorithms for non-orthogonal designs, making use of spike and slab orthogonalization. The resulting predictor, a bagged ensemble derived using generalized ridge regression, will be shown to possess state-of-the-art predictive performance when one factors in interpretability over black-box prediction. Theory, in the form of finite sample arguments, will show that this is due to selective shrinkage, a property whereby only truly zero coefficients are shrunk towards zero; (2) Developing general methodology for hard thresholding estimated regression coefficients; (3) Extending the rescaled spike and slab framework to include non-linear models such as generalized linear models and non-proportional survival regression models with time-dependent predictors.

Intellectually, this research will enhance our understanding of model building and outcome prediction, especially in ill-determined settings where the sample size is on the order of, or dominated by, the number of predictors (variables). This type of setting is becoming all too common in scientific work. Among the applications considered will be colon cancer genomics; colon cancer is an important public health problem. Currently, colorectal cancer is the second leading cause of cancer mortality in the adult American population, accounting for 140,000 new cases and 60,000 deaths annually. Although widely used, the current classification scheme is known to be highly imperfect in reflecting the actual underlying molecular determinants of colon cancer behavior. For instance, upwards of 20% of patients whose cancers metastasize to the liver are not given life-saving adjuvant chemotherapy based on the current clinical staging system. Thus, there is an important need for a molecular signature that will identify tumors that metastasize. Another area of application will be long-term prediction models for outcomes following coronary artery bypass surgery, a widely used surgical modality for patients with obstructive coronary artery disease. Current long-term prediction models have serious limitations, which have hindered our understanding. Yet another application will be in understanding the survival behavior of heart and lung transplant recipients and the role viruses play in potential dysfunction of the transplanted organs. Methodology will be complemented by the development of software for fast computational solutions in high-dimensional settings.
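
As a rough illustration of two ingredients named in the abstract, generalized ridge regression and hard thresholding of estimated coefficients, the following is a minimal sketch and not the investigator's algorithm. The function names, the hand-picked per-coefficient penalties, the cutoff, and the simulated data are all assumptions made for this example; in a spike and slab model the penalties would presumably be driven by the model's posterior rather than fixed by hand.

```python
import numpy as np


def generalized_ridge(X, y, penalties):
    """Solve (X'X + diag(penalties)) beta = X'y for beta.

    Unlike ordinary ridge regression, each coefficient has its own
    penalty, so some coefficients can be shrunk far more than others.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    penalties = np.asarray(penalties, dtype=float)
    gram = X.T @ X + np.diag(penalties)
    return np.linalg.solve(gram, X.T @ y)


def hard_threshold(beta, cutoff):
    """Keep coefficients with |beta| >= cutoff; set the rest to exactly zero."""
    beta = np.asarray(beta, dtype=float)
    return np.where(np.abs(beta) >= cutoff, beta, 0.0)


# Simulated data (hypothetical): two nonzero and three zero coefficients.
rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ true_beta + 0.1 * rng.normal(size=n)

# Hand-picked penalties (an assumption): heavy shrinkage on two of the
# truly zero coefficients, almost none elsewhere.
penalties = np.array([0.01, 100.0, 0.01, 100.0, 0.01])

beta_hat = generalized_ridge(X, y, penalties)
beta_sparse = hard_threshold(beta_hat, cutoff=0.5)
print("ridge estimate:", np.round(beta_hat, 3))
print("after thresholding:", np.round(beta_sparse, 3))
```

Penalizing only the coefficients believed to be zero mimics, very loosely, the selective shrinkage property described above; the thresholding step then turns the shrunken estimate into an explicit variable selection.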

  • Program Officer
    Gabor J. Szekely
  • Min Amd Letter Date
    6/7/2007
  • Max Amd Letter Date
    12/11/2007
  • ARRA Amount

Institutions

  • Name
    Cleveland Clinic Foundation
  • City
    Cleveland
  • State
    OH
  • Country
    United States
  • Address
    9500 Euclid Avenue
  • Postal Code
    441950001
  • Phone Number
    2164456440

Investigators

  • First Name
    Hemant
  • Last Name
    Ishwaran
  • Email Address
    hemant.ishwaran@gmail.com
  • Start Date
    6/7/2007 12:00:00 AM

FOA Information

  • Name
    Other Applications NEC
  • Code
    99