RAISE: Chip-to-chip photonic connectivity in multi-accelerator servers for ML

Information

  • NSF Award
  • 2444537
Owner
  • Award Id
    2444537
  • Award Effective Date
    10/1/2024
  • Award Expiration Date
    9/30/2027
  • Award Amount
    $970,000.00
  • Award Instrument
    Continuing Grant

This RAISE project will develop new methods to connect multiple chips within computers using light instead of electrical wires. Transferring data between chips optically can be faster and more energy efficient, which is crucial for handling the large, complex datasets behind societal applications such as artificial intelligence, climate modeling, and biomedical research. The project will work closely with industry partners to facilitate the adoption of the research in practice. This collaboration will also help train a new generation of scientists and engineers with interdisciplinary expertise. The skills and insights gained through this project will prepare them to tackle future challenges at the intersection of multiple scientific fields, in line with the NSF's mission to advance the frontiers of knowledge and innovation.

The project proposes to optically interconnect accelerators within compute servers using newly viable reconfigurable chip-to-chip optical interconnects. Today, by contrast, the commercial multi-accelerator compute servers that are the workhorses of machine learning use electrical interconnects to network the accelerator chips in a server. However, recent trends point to an interconnect bandwidth wall: accelerators are scaling an order of magnitude faster than the bandwidth of the interconnect between accelerators in the same server. This has led to under-utilization and idling of Graphics Processing Unit (GPU) resources in cloud datacenters. It is therefore important to scale interconnect bandwidth in multi-accelerator servers to keep power-hungry and expensive accelerators adequately fed with data and parameters. This project will use novel silicon photonics to create optical interconnections between accelerators within a server to meet this need. This research will benefit the complementary efforts of hyper-scale cloud providers by unlocking customized multi-accelerator topologies that achieve bandwidth-optimal collective communication between accelerators during distributed machine learning and minimize the blast radius of accelerator failures.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

  • Program Officer
    SUDHARMAN KANKANAMGE JAYAWEERA, sjayawee@nsf.gov, 7032922828
  • Min Amd Letter Date
    8/21/2024
  • Max Amd Letter Date
    8/21/2024
  • ARRA Amount

Institutions

  • Name
    Cornell University
  • City
    ITHACA
  • State
    NY
  • Country
    United States
  • Address
    341 PINE TREE RD
  • Postal Code
    148502820
  • Phone Number
    6072555014

Investigators

  • First Name
    Rachee
  • Last Name
    Singh
  • Email Address
    rs2293@cornell.edu
  • Start Date
    8/21/2024 12:00:00 AM

Program Element

  • Text
    TIP-CHIPS KTA-6 Communications
  • Text
    Networking Technology and Syst
  • Code
    736300

Program Reference

  • Text
    RAISE-Research Advanced by Interdiscipli
  • Text
    Wireless comm & sig processing
  • Text
    SMALL PROJECT
  • Code
    7923