Methods for Evolutionary Genomics Analysis

Information

  • Research Project
  • ApplicationId
    10405153
  • Core Project Number
    R35GM139540
  • Full Project Number
    3R35GM139540-01S1
  • Serial Number
    139540
  • FOA Number
    PA-20-272
  • Sub Project Id
  • Project Start Date
    2/1/2021
  • Project End Date
    1/31/2026
  • Program Officer Name
    JANES, DANIEL E
  • Budget Start Date
    7/1/2021
  • Budget End Date
    1/31/2022
  • Fiscal Year
    2021
  • Support Year
    01
  • Suffix
    S1
  • Award Notice Date
    8/31/2021

PROJECT SUMMARY

This administrative supplement request aims to develop a cloud-enabled, highly scalable version of the computational core of the Molecular Evolutionary Genetics Analysis software (MEGA-CC: www.megasoftware.net). The development of MEGA-CC is a significant component of the NIH-funded research project to develop machine learning methods and tools for comparative analysis of molecular sequences. With major advances in genome sequencing, researchers are assembling datasets containing large numbers of species, strains, genes, and genomic segments. Phylogenomic analyses of these data are essential to understanding the dynamics of evolutionary change in pathogens, humans, and species across the tree of life. Machine learning methods and software tools for phylogenomics are now necessary because the growing size of phylogenomic datasets limits the practical utility of currently available methods and tools, which demand excessive computational time and memory. One component of the funded grant is implementing our new machine learning methods in the MEGA software suite (www.megasoftware.net), an extremely popular bioinformatics package (>20,000 peer-reviewed citations and 350,000 software downloads in the year 2020 alone). MEGA includes a large repertoire of tools for assembling sequence alignments, inferring evolutionary trees, estimating genetic distances and diversities, inferring ancestral sequences, computing timetrees, and testing for selection. These analyses are now required in all research investigations and fields in which multiple DNA or RNA sequences are used. However, MEGA and its computational core (MEGA-CC) are not optimized for distribution and execution on cloud infrastructure and high-performance computing clusters.
This supplement to the funded grant will enable us to make MEGA cloud-ready (MEGA-CR), harnessing the scalability, elastic computing power, and easy software upgrades and maintenance that cloud infrastructure provides. It will also make MEGA interoperable with existing and future cloud infrastructure. Additionally, this supplement will make it practical to apply the new machine learning methods in MEGA to big genomic data, addressing an imminent and fast-growing need among the increasingly large community of researchers using MEGA. MEGA-CR will increase the usability of MEGA for researchers analyzing very large datasets, for whom the greater accessibility, cost-efficiency, and scalability of cloud-ready software are becoming crucial.

IC Name
NATIONAL INSTITUTE OF GENERAL MEDICAL SCIENCES
  • Activity
    R35
  • Administering IC
    GM
  • Application Type
    3
  • Direct Cost Amount
    87480
  • Indirect Cost Amount
    51176
  • Total Cost
    138656
  • Sub Project Total Cost
  • ARRA Funded
    False
  • CFDA Code
    310
  • Ed Inst. Type
    SCHOOLS OF ARTS AND SCIENCES
  • Funding ICs
    OD:138656
  • Funding Mechanism
    Non-SBIR/STTR RPGs
  • Study Section
  • Study Section Name
  • Organization Name
    TEMPLE UNIV OF THE COMMONWEALTH
  • Organization Department
    BIOLOGY
  • Organization DUNS
    057123192
  • Organization City
    PHILADELPHIA
  • Organization State
    PA
  • Organization Country
    UNITED STATES
  • Organization Zip Code
    191226003
  • Organization District
    UNITED STATES