Semantic Data Lake for Biomedical Research

Information

  • Research Project
  • ApplicationId
    9443736
  • Core Project Number
    R44CA206782
  • Full Project Number
    3R44CA206782-01A1S1
  • Serial Number
    206782
  • FOA Number
    PA-14-154
  • Sub Project Id
  • Project Start Date
    9/15/2016
  • Project End Date
    5/31/2017
  • Program Officer Name
    EVANS, GREGORY
  • Budget Start Date
    9/15/2016
  • Budget End Date
    5/31/2017
  • Fiscal Year
    2017
  • Support Year
    01
  • Suffix
    A1S1
  • Award Notice Date
    3/13/2017

Capitalizing on the transformative opportunities afforded by the large and ever-growing volume, velocity, and variety of continuously produced biomedical data is a major challenge. The development and increasingly widespread adoption of several new technologies, including next-generation genetic sequencing, electronic health records and clinical trials systems, and research data warehouses, mean that we are in the midst of an explosion in data production. This in turn shifts the bottleneck in scientific productivity to data management and interpretation: tools are urgently needed to assist cancer researchers in the assembly, integration, transformation, and analysis of these Big Data sets. In this project, we propose to develop the Semantic Data Lake for Biomedical Research (SDL-BR) system, a cluster-computing software environment that enables rapid data ingestion, multifaceted data modeling, logical and semantic querying and data transformation, and intelligent resource discovery. SDL-BR is based on the idea of a data lake: a distributed store that makes no assumptions about the structure of incoming data and delays modeling decisions until the data is to be used. This project extends the data lake paradigm with methods for semantic data modeling, integration, and querying, and for resource discovery based on learned relationships between users and data resources.
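
The data lake idea described in the abstract (store incoming data without assuming its structure, and defer modeling until the data is used) is often called schema-on-read. The following is a minimal single-process sketch of that idea only, not the SDL-BR implementation: the class `TinyDataLake`, the `model` mapping format, and the sample patient records are all hypothetical and invented for illustration.

```python
import csv
import io
import json


class TinyDataLake:
    """Hypothetical in-memory stand-in for a data lake (illustration only)."""

    def __init__(self):
        self._objects = []  # list of (format, raw_text) pairs, stored untouched

    def ingest(self, fmt, raw_text):
        # No parsing or validation at ingest time: structure is not assumed.
        self._objects.append((fmt, raw_text))

    def read(self, model):
        # Schema-on-read: payloads are parsed only now, then projected onto
        # the caller's model, a dict of target_field -> (source_field, converter).
        for fmt, raw in self._objects:
            if fmt == "json":  # one JSON object per line
                rows = [json.loads(line) for line in raw.splitlines() if line.strip()]
            elif fmt == "csv":
                rows = list(csv.DictReader(io.StringIO(raw)))
            else:
                continue  # unknown formats simply stay in the lake
            for row in rows:
                yield {
                    target: convert(row[source])
                    for target, (source, convert) in model.items()
                    if source in row
                }


lake = TinyDataLake()
lake.ingest("json", '{"patient": "p1", "age": 61}\n{"patient": "p2", "age": 47}')
lake.ingest("csv", "patient,age,gene\np3,55,TP53\n")

# The modeling decision is made here, at read time, not at ingest time.
age_model = {"patient_id": ("patient", str), "age_years": ("age", int)}
records = list(lake.read(age_model))
```

Because the raw payloads are never altered, a different model (for example, one that also reads the `gene` column) can be applied later to the same stored data without re-ingesting it.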

IC Name
NATIONAL CANCER INSTITUTE
  • Activity
    R44
  • Administering IC
    CA
  • Application Type
    3
  • Direct Cost Amount
  • Indirect Cost Amount
  • Total Cost
    50000
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    393
  • Ed Inst. Type
  • Funding ICs
    NCI:50000
  • Funding Mechanism
    SBIR-STTR RPGs
  • Study Section
    ZRG1
  • Study Section Name
    Special Emphasis Panel
  • Organization Name
    INFOTECH SOFT, INC.
  • Organization Department
  • Organization DUNS
    035354070
  • Organization City
    MIAMI
  • Organization State
    FL
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    331313207
  • Organization District
    UNITED STATES