Association Analysis Software for Mining Clinical Next-Gen Sequencing Data

Information

  • Research Project
  • ApplicationId
    8236680
  • Core Project Number
    R44HG006578
  • Full Project Number
    1R44HG006578-01
  • Serial Number
    006578
  • FOA Number
    RFA-HG-10-019
  • Sub Project Id
  • Project Start Date
    9/25/2012
  • Project End Date
    6/24/2013
  • Program Officer Name
    SOFIA, HEIDI J
  • Budget Start Date
    9/25/2012
  • Budget End Date
    6/24/2013
  • Fiscal Year
    2012
  • Support Year
    01
  • Suffix
  • Award Notice Date
    9/24/2012
Organizations

DESCRIPTION (provided by applicant): Remarkable improvements in throughput, accuracy and cost-effectiveness of next-generation sequencing (next-gen) technologies are ushering in a new era of clinical medicine. Genome-wide association studies (GWAS) in particular have begun to leverage these advances to determine the complete catalog of common and rare variants for each member of a cohort. The resolving power of this approach has the potential to greatly accelerate our understanding, diagnosis and treatment of human disease. Unfortunately, analysis of these massive data sets requires that several disparate pieces of software be cobbled together, including a large capacity next-gen sequencing assembler, variation detection modules, mapping and comparison tools for tens to hundreds of variant reports, statistical analysis packages, reporting tools, and so on. Combining and using these tools typically requires extensive bioinformatic expertise, as the software is rarely well documented or supported and often depends on having elaborate hardware. These hurdles make next-gen based GWAS inaccessible to the vast majority of the crucial user base, the physician researchers. The goal of this proposal is to assemble the essential next-gen based GWAS software components into a single coherent pipeline that is fully equipped to meet the needs of the medical research community. Consistent with DNASTAR's 28-year tradition, the software will be easy to use, run on a reasonably priced (<$3000) desktop computer, and will be fully documented and supported. The pipeline will consist of two modules already available through DNASTAR, SeqMan NGen 3.0 (SM NGen 3.0) and ArrayStar. SM NGen 3.0, our recently released human genome scale assembly and analysis package, forms the front end of the pipeline. Reference-guided assemblies of whole human genome or exome next-gen data sets produce variation reports including impact on gene features and associations with the dbSNP database. Putative variations can be verified by direct inspection of the alignment through the SeqMan Pro component of the package. Variation reports from each member of a GWAS cohort will then be fed into our multi-sample comparison and analysis program, ArrayStar, at the back end of the pipeline. ArrayStar has the infrastructure for multi-sample management and processing, which can be easily adapted to GWAS analysis. These adaptations and their documentation are a central focus of this application. Critical to the successful development of this software is our collaboration with Dr. Douglas McNeel (Dept. of Oncology, UW-Madison). The exomes from a panel of prostate cancer vaccine recipients, including responders and non-responders, from the McNeel lab will be sequenced as input from which to build the pipeline using iterative cycles of development followed by evaluation by the McNeel group. This relationship offers an ideal opportunity to build the analysis and reporting software needed by physician researchers to form, test and validate GWAS-generated hypotheses.

PUBLIC HEALTH RELEVANCE: The easy-to-use tools to be developed and integrated in this project will dramatically enhance the efficiency of clinical and diagnostic research for a wide range of life scientists and medical professionals using next-generation DNA sequencing technologies, allowing new treatments to be brought to market sooner, enhancing scientists' understanding of treatment efficacy, and supporting the tailoring of different treatments to specific groups of individuals based on their genetic composition. These tools will be flexible enough to support critical analysis of large populations for clinical research and easy enough to use for all life scientists and medical professionals to feel comfortable with them.

IC Name
NATIONAL HUMAN GENOME RESEARCH INSTITUTE
  • Activity
    R44
  • Administering IC
    HG
  • Application Type
    1
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    150000
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    172
  • Ed Inst. Type
  • Funding ICs
    NHGRI:150000
  • Funding Mechanism
    SBIR-STTR RPGs
  • Study Section
    ZHG1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    DNASTAR, INC.
  • Organization Department
  • Organization DUNS
    130194947
  • Organization City
    MADISON
  • Organization State
    WI
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    537055202
  • Organization District
    UNITED STATES