Tree Ensemble Regression and Classification Methods

Information

  • Research Project
  • ApplicationId
    6832086
  • Core Project Number
    R43CA105724
  • Full Project Number
    1R43CA105724-01A1
  • Serial Number
    105724
  • FOA Number
  • Sub Project Id
  • Project Start Date
    8/1/2004
  • Project End Date
    11/30/2005
  • Program Officer Name
    GEZMU, MISRAK
  • Budget Start Date
    8/1/2004
  • Budget End Date
    11/30/2005
  • Fiscal Year
    2004
  • Support Year
    1
  • Suffix
    A1
  • Award Notice Date
    7/9/2004
Organizations

Tree Ensemble Regression and Classification Methods

DESCRIPTION (provided by applicant): This SBIR aims to produce next-generation classification and regression software based upon ensembles of decision trees: bagging, random forests, and boosting. The prediction accuracy of these methods has caused much excitement in the machine learning community, and both challenges and complements the data modeling culture prevalent among biostatisticians. Recent research extends the methodology to likelihood-based methods used in biostatistics, leading to models for survival data and generalized forest models. Generalized forest models extend regression forests in the same way that generalized linear models extend linear models.

This software would apply broadly, including to medical diagnosis, prognostic modeling, and cancer detection, and to modeling patient characteristics such as blood pressure, discrete responses in clinical trials, and count data.

Phase I work will prototype software for survival data and investigate the performance of ensemble methods on simulated and real data. For survival applications, we will assess out-of-bag estimates of performance, and investigate measures of variable importance and graphics that help clinicians understand the results. Experience writing prototypes and using them on data will lead to a preliminary software design that serves as the foundation of Phase II work.

Phase II will expand upon this work to create commercial software. We will research and implement algorithms for a wider range of applications, including generalized forest models, classification, and least squares regression. We will also implement robust loss criteria that enable good performance on noisy data, and make adaptations to handle large data sets.

This proposed software will enable medical researchers to obtain high prediction accuracy, and will complement traditional tools like discriminant analysis, linear and logistic regression models, and the Cox model.
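
The abstract names bagging and out-of-bag performance estimates without illustrating them. As a rough illustration only, the sketch below bags one-split regression trees ("stumps") on a single numeric predictor and scores each observation using only the trees whose bootstrap sample excluded it. It is not the software proposed in this project: the function names (`fit_stump`, `predict_stump`, `bagged_stumps`, `oob_mse`), the use of stumps in place of full trees, and the least squares split criterion are all assumptions made for the example.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump by least squares (hypothetical helper).

    Returns (threshold, left_mean, right_mean).
    """
    distinct = sorted(set(xs))
    if len(distinct) < 2:
        # No split possible: fall back to a constant prediction.
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    best = None
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml = sum(left) / len(left)
        mr = sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1:]


def predict_stump(stump, x):
    """Predict with a fitted stump."""
    t, ml, mr = stump
    return ml if x <= t else mr


def bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Bagging: fit each stump on a bootstrap sample drawn with replacement.

    Also records, per stump, the indices left out of its sample (out-of-bag).
    """
    rng = random.Random(seed)
    n = len(xs)
    stumps, oob_sets = [], []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
        oob_sets.append(set(range(n)) - set(idx))
    return stumps, oob_sets


def oob_mse(xs, ys, stumps, oob_sets):
    """Out-of-bag mean squared error.

    Each observation is predicted by averaging only the stumps that did not
    see it during fitting; observations with no such stump are skipped.
    """
    errors = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        preds = [predict_stump(s, x) for s, oob in zip(stumps, oob_sets) if i in oob]
        if preds:
            errors.append((sum(preds) / len(preds) - y) ** 2)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Toy data (made up for illustration): a step at x = 20.
    xs = list(range(40))
    ys = [0.0 if x < 20 else 10.0 for x in xs]
    stumps, oob_sets = bagged_stumps(xs, ys)
    print("OOB MSE:", oob_mse(xs, ys, stumps, oob_sets))
```

Because each bootstrap sample omits roughly a third of the observations, the out-of-bag error gives an estimate of prediction error without a separate hold-out set. The same bookkeeping would carry over to survival or generalized responses only with a different split criterion and error measure, which this sketch does not attempt.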

IC Name
NATIONAL CANCER INSTITUTE
  • Activity
    R43
  • Administering IC
    CA
  • Application Type
    1
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    99937
  • Sub Project Total Cost
  • ARRA Funded
  • CFDA Code
    394
  • Ed Inst. Type
  • Funding ICs
    NCI:99937
  • Funding Mechanism
  • Study Section
    ZRG1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    INSIGHTFUL CORPORATION
  • Organization Department
  • Organization DUNS
    150683779
  • Organization City
    SEATTLE
  • Organization State
    WA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    98109
  • Organization District
    UNITED STATES