Objective and reliable methods for inference from modern omics data

Information

  • NSF Award
    2413294
  • Award Id
    2413294
  • Award Effective Date
    9/1/2024
  • Award Expiration Date
    8/31/2027
  • Award Amount
    $150,000.00
  • Award Instrument
    Standard Grant

Modern “omics” (e.g., transcriptomics or proteomics) studies often generate data using single-cell or spatially resolved sequencing technologies. These technologies enable researchers to study, for example, the spatial variation of gene expression across cells or tissues, offering a high-resolution perspective on complex biological dynamics. This perspective allows researchers to better understand disease mechanisms and can lead to the development of novel treatments. However, the data generated by these technologies are high-dimensional and dependent, which can complicate statistical inference. Existing inferential methods are often subjective or unreliable: they either require user input that may bias or invalidate results, or rely on rigid model assumptions that are frequently violated in practice. This project will address these issues by developing statistical methods that do not rely on user input and that work reliably in more general settings than existing methods. The new methods will be theoretically justified and equipped with fast computational algorithms. Software implementing these methods will be made publicly available, enabling their wide use in academia and industry. The project will also provide training opportunities for both graduate and undergraduate students.

This project develops new statistical methods for inference with high-dimensional dependent data, motivated by challenges in analyzing single-cell and spatially resolved sequencing data. Specific challenges include the failure of traditional inferential methods when the parameter lies at or near the boundary of the parameter space; the need to both generate and test hypotheses from the same data without inflating Type I error rates; and insufficient model flexibility and scalability. The investigator will address each of these issues directly by (i) developing a new test procedure that resolves the well-known difficulty of constructing confidence regions for variance components (or functions thereof) near zero; (ii) providing a unified approach to valid post-clustering inference with high-dimensional data from a broad class of distributions; and (iii) developing a general class of penalized mixture models that accommodates multiple latent sources of heterogeneity. The methodological developments in this project lay the groundwork for more general methods addressing broader challenges in inference near the boundary of the parameter space, post-selection inference, and the modeling of heterogeneous high-dimensional data.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Jun Zhu, jzhu@nsf.gov, 703-292-4551
  • Min Amd Letter Date
    7/19/2024
  • Max Amd Letter Date
    7/19/2024
  • ARRA Amount

Institutions

  • Name
    University of Minnesota-Twin Cities
  • City
    MINNEAPOLIS
  • State
    MN
  • Country
    United States
  • Address
    200 OAK ST SE
  • Postal Code
    55455-2009
  • Phone Number
    612-624-5599

Investigators

  • First Name
    Aaron
  • Last Name
    Molstad
  • Email Address
    amolstad@umn.edu
  • Start Date
    7/19/2024 12:00:00 AM

Program Element

  • Text
    STATISTICS
  • Code
    126900