A Statistical Foundation of In-Context Learning and Chain-of-Thought Prompting with Large Language Models

Information

  • Award Id
    2413243
  • Award Effective Date
    8/1/2024
  • Award Expiration Date
    7/31/2027
  • Award Amount
    $80,882.00
  • Award Instrument
    Continuing Grant

Large Language Models (LLMs) such as GPT-4 have transformed natural language processing and related fields by demonstrating unprecedented capabilities in interpreting human instructions and completing complex reasoning tasks. Representing a paradigm shift in statistical machine learning, these models are trained on extensive text corpora and can perform novel tasks without any modification to their parameters. This project seeks to develop a comprehensive theoretical framework for understanding mainstream methods used with deployed LLMs through a statistical lens. The anticipated broader impacts of this research include enriching educational curricula at participating institutions and providing significant training opportunities for both graduate and undergraduate students. Moreover, the project's outcomes are expected to enhance high-impact applications in sectors such as robotics and transportation systems, thereby improving the practical deployment of LLMs in complex decision-making scenarios.

Specifically, this research explores the statistical foundations of prompting methods used with LLMs, including in-context learning (ICL) and chain-of-thought (CoT) prompting. The study is organized around three main thrusts. The first deciphers how LLMs perform ICL and CoT as forms of implicit Bayesian inference and how the attention mechanisms of transformer architectures approximately encode the corresponding Bayesian estimators. The second develops algorithms to analyze the statistical errors, incurred during both the pre-training and prompting stages, that are associated with these prompting-based estimators. The third applies this theoretical framework to real-world applications such as robotic control and autonomous driving, formulating principled methods that use pre-trained LLMs for complex decision-making.
By establishing a robust statistical foundation for prompting-based methodologies, this research aims to advance the field of prompt engineering and contribute to the development of principled methods for using LLMs.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
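To give a concrete sense of the Bayesian view of in-context learning described above, the following is a minimal, hypothetical sketch (not taken from the project itself): a "task" is a candidate slope in a noisy linear model, the prompt's demonstrations induce a posterior over tasks, and the prediction for a query input is the posterior-weighted average of each task's prediction. All names and parameters here are illustrative assumptions.

```python
import math

def icl_bayes_predict(examples, x_query, tasks, sigma=0.5):
    """Posterior-mean prediction at x_query, treating ICL as
    implicit Bayesian inference over candidate tasks (slopes)."""
    # Gaussian log-likelihood of the prompt demonstrations under
    # each candidate task w, with a uniform prior over tasks.
    log_post = [
        sum(-((y - w * x) ** 2) / (2 * sigma ** 2) for x, y in examples)
        for w in tasks
    ]
    # Normalize in log space for numerical stability.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    z = sum(weights)
    posterior = [wt / z for wt in weights]
    # Bayesian model averaging: weight each task's prediction
    # by its posterior probability.
    return sum(p * w * x_query for p, w in zip(posterior, tasks))

# Prompt demonstrations generated by the task y = 2x.
demos = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
pred = icl_bayes_predict(demos, 4.0, tasks=[0.5, 1.0, 2.0, 3.0])  # ≈ 8.0
```

With three consistent demonstrations, the posterior concentrates almost entirely on the slope 2.0, so the prediction at x = 4 is close to 8. The theoretical program sketched in the abstract asks, among other things, how a transformer's attention layers can approximately implement such a posterior computation.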

  • Program Officer
    Tapabrata Maiti (tmaiti@nsf.gov, 703-292-5307)
  • Min Amd Letter Date
    7/24/2024
  • Max Amd Letter Date
    7/24/2024
  • ARRA Amount

Institutions

  • Name
    Yale University
  • City
    NEW HAVEN
  • State
    CT
  • Country
    United States
  • Address
    150 MUNSON ST
  • Postal Code
    06511-3572
  • Phone Number
    2037854689

Investigators

  • First Name
    Zhuoran
  • Last Name
    Yang
  • Email Address
    zhuoran.yang@yale.edu
  • Start Date
    7/24/2024

Program Element

  • Text
    STATISTICS
  • Code
    126900

Program Reference

  • Text
    STATISTICS
  • Code
    1269