Clustering Software for Biomedical Applications

Information

  • Research Project
  • ApplicationId
    7294271
  • Core Project Number
    R44RR016386
  • Full Project Number
    5R44RR016386-03
  • Serial Number
    16386
  • FOA Number
  • Sub Project Id
  • Project Start Date
    7/1/2001
  • Project End Date
    9/29/2008
  • Program Officer Name
    CHOUDHRY, JAWAHAR
  • Budget Start Date
    9/30/2007
  • Budget End Date
    9/29/2008
  • Fiscal Year
    2007
  • Support Year
    3
  • Suffix
  • Award Notice Date
    9/5/2007

DESCRIPTION (provided by applicant): We propose to provide clustering software for very large databases and for categorical data. Investigators in virtually all areas of research seek to discover patterns and relationships in data. Computer-intensive exploratory analysis, or data mining, is having a huge impact in science and industry (e.g. Berkhin 2002, Maitra 2002). However, the availability of software for obtaining partitions and visualizing them lags far behind the proliferation of proposed methods and the growth in size of available databases. We believe that implementing new algorithms for clustering large datasets that may include non-numeric attributes, together with tools for visualizing cluster properties, will open new opportunities for data analysis.

In Phase I, we developed scalable implementations of clustering methods, including k-means and its extensions to categorical and mixed-mode data, and demonstrated that a combination of clustering and visualization could reveal things about data that neither alone could provide. Our ultimate goal in Phases II and III is to develop a modular addition to the S-PLUS language, called S+CLUSTER, that provides the following key features:

- A suite of clustering algorithms suitable for large and possibly high-dimensional datasets that may include categorical attributes;
- Extensive capabilities for visual exploration of clustering results; and
- Tools for validation and diagnostics that facilitate objective assessment of clustering results.

We intend to create software that is flexible and easy to use and that enables the analysis and understanding of data from a wide range of applications. Clustering, or unsupervised classification, has been used in genetics research, protein classification, psychiatric research, analysis of biomedical signals, segmentation of medical images, and other areas.
The software will be part of an integrated environment for data analysis and will permit customization of the clustering process, extending the ability of biomedical researchers to understand complex data. New insights into microarray data, epidemiological data, and protein databases may have high potential in drug discovery, disease diagnosis, and treatment.
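The abstract names k-means and its extensions to categorical and mixed-mode data but does not say which extension S+CLUSTER implements. As an illustration only, the sketch below implements one well-known extension, Huang's k-prototypes, in Python rather than S-PLUS: numeric attributes are compared by squared Euclidean distance and summarized by their mean, categorical attributes are compared by a mismatch count and summarized by their mode, and a weight `gamma` balances the two parts. The function names, the `gamma` default, and the (numeric, categorical) record layout are hypothetical, and the code makes no attempt at the scalability to very large databases that the project describes.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical record layout: (numeric attributes, categorical attributes)
Record = Tuple[Tuple[float, ...], Tuple[str, ...]]


def dissimilarity(x: Record, proto: Record, gamma: float) -> float:
    """Squared Euclidean distance on numeric attributes plus gamma times
    the number of mismatched categorical attributes."""
    num = sum((a - b) ** 2 for a, b in zip(x[0], proto[0]))
    cat = sum(a != b for a, b in zip(x[1], proto[1]))
    return num + gamma * cat


def update_prototype(members: List[Record]) -> Record:
    """Mean of each numeric attribute, mode of each categorical attribute."""
    n = len(members)
    num = tuple(sum(m[0][j] for m in members) / n
                for j in range(len(members[0][0])))
    cat = tuple(Counter(m[1][j] for m in members).most_common(1)[0][0]
                for j in range(len(members[0][1])))
    return (num, cat)


def k_prototypes(data: Sequence[Record], k: int, gamma: float = 1.0,
                 max_iter: int = 100, seed: int = 0
                 ) -> Tuple[List[int], List[Record]]:
    """Alternate between assigning records to the nearest prototype and
    recomputing prototypes, until assignments stop changing."""
    rng = random.Random(seed)
    protos = rng.sample(list(data), k)  # initial prototypes: k random records
    labels = [-1] * len(data)
    for _ in range(max_iter):
        new_labels = [min(range(k),
                          key=lambda c: dissimilarity(x, protos[c], gamma))
                      for x in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the old prototype if a cluster becomes empty
                protos[c] = update_prototype(members)
    return labels, protos


if __name__ == "__main__":
    # Toy mixed-mode data: two numeric attributes and one categorical attribute
    data = [
        ((1.0, 1.1), ("A",)),
        ((0.9, 1.0), ("A",)),
        ((5.0, 5.2), ("B",)),
        ((5.1, 4.9), ("B",)),
    ]
    labels, protos = k_prototypes(data, k=2, gamma=0.5)
    print(labels)
    print(protos)
```

On this toy data the first two records end up in one cluster and the last two in the other, and each cluster's prototype carries the majority category of its members. Choosing `gamma` is the main design decision in this approach: a large value lets categorical mismatches dominate, while a small value makes the method behave almost like plain k-means on the numeric attributes.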

  • IC Name
    NATIONAL CENTER FOR RESEARCH RESOURCES
  • Activity
    R44
  • Administering IC
    RR
  • Application Type
    5
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    364606
  • Sub Project Total Cost
  • ARRA Funded
  • CFDA Code
    389
  • Ed Inst. Type
  • Funding ICs
    NCRR:364606
  • Funding Mechanism
  • Study Section
    BCHI
  • Study Section Name
    Biomedical Computing and Health Informatics Study Section
  • Organization Name
    INSIGHTFUL CORPORATION
  • Organization Department
  • Organization DUNS
    150683779
  • Organization City
    SEATTLE
  • Organization State
    WA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    98109
  • Organization District
    UNITED STATES