Collaborative Research: RI: Medium: Programmatic Foundation Models for Visual Analysis on a Planetary Scale

Information

  • Award Id
    2403016
  • Award Effective Date
    8/15/2024
  • Award Expiration Date
    7/31/2028
  • Award Amount
    $95,677.00
  • Award Instrument
    Continuing Grant

Imagery from around the world, from satellites to drones and social media photographs, provides vital information about our planet. There is a unique opportunity in the fields of artificial intelligence and computer vision to understand global and local phenomena from these images, providing insight into climate change, public health, and agriculture. However, state-of-the-art computer vision methods are not designed for these applications, where decision-making is complex and accuracy, robustness, and interpretability are required. Existing large-scale AI models, such as ChatGPT, only process individual images from the internet and cannot synthesize conclusions from planet-scale image collections. Even on single images, these models cannot reliably perform sophisticated logical reasoning, and building models that perform such reasoning reliably requires infeasibly large datasets. Creating such large models and datasets is a significant barrier to scientific and societal applications of computer vision, particularly for organizations that lack the computational resources of large corporations. This project will create a new class of machine learning models, called programmatic foundation models, with the capability and efficiency to scale to planetary-scale image and video datasets. Experts can query these models using natural language, enabling scientists and domain experts without machine learning expertise to benefit from AI-related visual discovery across the vast amount of visual information available in satellite imagery. The proposed research has applications across public health, climate change, agriculture, security, and the economy.

The research objective of this project is to tightly integrate visual representations with program synthesis, delivering an accurate, interpretable, and robust machine learning framework for answering questions about what is visible in image collections. Across two research thrusts, the project will drive the creation of these new programmatic foundation models. The first thrust proposes new techniques for building open-world recognition primitives across multiple sensing modalities based on vision-language models, but without any language annotations. It introduces new cross-modal contrastive learning techniques, as well as approaches for reasoning about temporal change. The second thrust proposes new techniques for learning to synthesize programs, incorporating uncertainty, learning from feedback, and adaptive computation. Given a query, the proposed framework learns to synthesize a customized program that breaks the task down into constituent steps and control flow that can be executed directly to solve the vision task. To execute each step, the project proposes new methods for training open-world classification, detection, and segmentation models for satellite, aerial, and ground imagery. Unlike prior foundation models, this integrated approach offers potential benefits in interpretability, logical soundness, modularity, compositionality, efficiency, and generality across tasks. Taken together, the two thrusts combine program synthesis with open-world recognition models for analyzing satellite, drone, and ground imagery around the world.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    Jie Yang, jyang@nsf.gov, 703-292-4768
  • Min Amd Letter Date
    8/15/2024
  • Max Amd Letter Date
    8/15/2024

Institutions

  • Name
    Columbia University
  • City
    NEW YORK
  • State
    NY
  • Country
    United States
  • Address
    615 W 131ST ST
  • Postal Code
    10027-7922
  • Phone Number
    212-854-6851

Investigators

  • First Name
    Carl
  • Last Name
    Vondrick
  • Email Address
    cv2428@columbia.edu
  • Start Date
    8/15/2024

Program Element

  • Text
    Robust Intelligence
  • Code
    749500

Program Reference

  • Text
    ROBUST INTELLIGENCE
  • Code
    7495
  • Text
    MEDIUM PROJECT
  • Code
    7924