Memory module with optical interconnect that enables scalable high-bandwidth memory access

Abstract
One embodiment of the present invention provides a system that facilitates scalable high-bandwidth memory access using a memory module with optical interconnect. This system includes an optical channel, a memory buffer, and a random-access memory module. The memory buffer is configured to receive a request from a memory controller via the optical channel. The memory buffer handles the received request by performing operations on the random-access memory module and then sending a response to the memory controller via the optical channel. Hence, the memory module with optical interconnect provides a high-speed serial link to the random-access memory module without consuming a large number of pins per channel on the memory controller.
Description

BRIEF DESCRIPTION OF THE FIGURES


FIG. 1 illustrates the difficulty of improving system performance by only increasing processor performance without increasing memory access speeds in accordance with an embodiment of the present invention.



FIG. 2 illustrates “throughput computing” in accordance with an embodiment of the present invention.



FIG. 3A illustrates a memory module with optical interconnect in accordance with an embodiment of the present invention.



FIG. 3B illustrates a memory controller that accesses a set of memory modules with optical interconnect in accordance with an embodiment of the present invention.



FIG. 4 is a flow chart illustrating the process of handling a memory request to a memory module with optical interconnect in accordance with an embodiment of the present invention.



FIG. 5 illustrates a memory module with optical interconnect that separates and handles one wavelength from an optical channel supporting wavelength-division-multiplexing in accordance with an embodiment of the present invention.





DETAILED DESCRIPTION

The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.


Memory Latency


FIG. 1 illustrates the difficulty of improving system performance purely by improving processor performance without improving memory access speed. The top graph in FIG. 1 shows the initial compute performance 100 of the system based on compute cycles 104 and the memory latency 106 between compute cycles. Dramatically improving the speed of the processor may yield improved compute performance 102, but produces only a relatively small overall time savings 108. For instance, doubling the speed of the processor may halve the compute time, but improves the total application running time by only a small percentage, because the memory latency that limits single-thread performance is unchanged. Many data-center workloads are simply unable to take advantage of hard-won advances in processors because of such memory latency problems.
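
The effect can be made concrete with an Amdahl's-law-style calculation. The sketch below is a hypothetical model, not taken from FIG. 1: it assumes a workload that spends 20% of its time computing and 80% stalled on memory, and that a faster processor shrinks only the compute portion.

```python
def total_time(compute: float, memory_stall: float, cpu_speedup: float = 1.0) -> float:
    """Run time when only the compute portion benefits from a faster processor."""
    return compute / cpu_speedup + memory_stall


def overall_speedup(compute: float, memory_stall: float, cpu_speedup: float) -> float:
    """Ratio of the original run time to the run time with a faster processor."""
    return total_time(compute, memory_stall) / total_time(compute, memory_stall, cpu_speedup)


# Hypothetical workload: 20% of the time computing, 80% waiting on memory.
doubled = overall_speedup(compute=0.2, memory_stall=0.8, cpu_speedup=2.0)
limit = overall_speedup(compute=0.2, memory_stall=0.8, cpu_speedup=float("inf"))
print(f"2x faster CPU: {doubled:.2f}x overall")        # 2x faster CPU: 1.11x overall
print(f"infinitely fast CPU: {limit:.2f}x overall")    # infinitely fast CPU: 1.25x overall
```

Under these assumed numbers, halving the compute time buys only about 11%, and no processor improvement on its own can exceed 1.25x.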



FIG. 2 illustrates “throughput computing,” a technique that mitigates the negative effects of memory latency. Launching multiple threads in parallel hides memory latency, because the processor can execute one thread while others wait for memory; this uses processor cycles more efficiently and thereby improves overall application performance.
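
A simple occupancy model, offered as an illustrative assumption rather than the analysis behind FIG. 2, shows both the benefit and its cost. Each thread is assumed to alternate a compute burst of `compute_cycles` with a memory stall of `stall_cycles`, and thread switching is assumed to be free. Utilization then grows with the thread count until the core saturates, and the memory request rate grows with it.

```python
def core_utilization(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Fraction of cycles spent computing, assuming zero-cost thread switching."""
    per_thread = compute_cycles / (compute_cycles + stall_cycles)
    return min(1.0, threads * per_thread)


def requests_per_cycle(threads: int, compute_cycles: float, stall_cycles: float) -> float:
    """Memory requests issued per cycle: one request per completed compute burst."""
    return core_utilization(threads, compute_cycles, stall_cycles) / compute_cycles


for n in (1, 2, 4, 8):
    u = core_utilization(n, compute_cycles=100, stall_cycles=300)
    r = requests_per_cycle(n, compute_cycles=100, stall_cycles=300)
    print(f"{n} threads: utilization {u:.2f}, {r:.4f} requests/cycle")
```

With these hypothetical numbers, four threads keep the core fully busy, and doing so quadruples the rate of memory requests, which is why throughput computing raises the demand for memory bandwidth.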


Multi-threading provides an effective way to combat memory latency, but demands memory modules with higher communication bandwidth. While the capacity of dual inline memory modules (DIMMs) has increased due to improved dynamic random access memory (DRAM) density, the total bandwidth per channel has typically stayed flat. Adding more parallel channel interfaces could increase the effective memory bandwidth, but each channel requires a large number of pins, so the total may exceed the number of pins available in a given semiconductor package. Because memory-subsystem bandwidth is so difficult to increase cost-effectively at the pace of processor improvements, the memory subsystem typically becomes the limiting system resource.


Fully-Buffered Memory

Fully-buffered memory (also referred to as FB-DIMM) can be used to increase memory capacity and to keep pace with both processor and input/output (I/O) improvements by replacing parallel memory channels with a high-speed serial interface. FB-DIMM technology splits the signaling interface between the memory controller and DRAM chips into two independent signaling interfaces with a buffer between them. The interface between the buffer and DRAM chips remains substantially similar to existing DRAM interfaces, for instance supporting existing memory interface standards such as DDR2 and DDR3 double data rate SDRAM. However, the interface between the memory controller and the buffer changes from a shared parallel interface to a point-to-point serial interface, with the buffer (also referred to as an advanced memory buffer (AMB)) operating in response to memory controller commands. Upon receiving a command containing a DRAM request over the FB-DIMM interface, the AMB presents the corresponding request to the DRAM chips.


FB-DIMM modules improve scalability and throughput. For instance, one embodiment of FB-DIMM technology offers a capacity of up to 192 gigabytes and 6.7 gigabytes per second of sustained data throughput per channel when using six channels with eight DIMMs per channel, two ranks per DIMM, and 1-gigabit DRAMs.
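
The quoted capacity can be checked with simple arithmetic. The sketch below assumes that each rank is built from sixteen x4 data devices (ECC devices excluded), an organization not stated in the text; under that assumption the 192-gigabyte total follows from 1-gigabit DRAM devices. The aggregate throughput line further assumes that all six channels sustain 6.7 gigabytes per second at the same time.

```python
# Assumed organization (not stated in the text): each rank uses sixteen x4 data
# devices on a 64-bit data path; ECC devices are excluded from the capacity.
CHANNELS = 6
DIMMS_PER_CHANNEL = 8
RANKS_PER_DIMM = 2
DATA_DEVICES_PER_RANK = 16
DEVICE_GIGABITS = 1            # 1-gigabit DRAM devices
PER_CHANNEL_GB_PER_S = 6.7     # sustained throughput per channel, from the text

ranks = CHANNELS * DIMMS_PER_CHANNEL * RANKS_PER_DIMM                  # 96 ranks
capacity_gigabytes = ranks * DATA_DEVICES_PER_RANK * DEVICE_GIGABITS / 8
aggregate_gb_per_s = CHANNELS * PER_CHANNEL_GB_PER_S                   # all channels busy at once

print(capacity_gigabytes)            # 192.0
print(round(aggregate_gb_per_s, 1))  # 40.2
```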


FB-DIMM interfaces typically use serial differential signaling and can support backward compatibility of memory devices, but they carry signals over electrical wiring. The power consumed by clock and data recovery (CDR) circuits in electrical FB-DIMMs increases with the distance traversed, which limits the maximum distance between the FB-DIMM and the memory controller. Electrical FB-DIMMs also typically exhibit significant lane-to-lane skew, and the de-skewing this requires tends to increase per-DIMM access latency. For instance, multiplexing the individual bit lanes (for the DRAM) together for serial transmission, transporting them to the FB-DIMM, and then de-multiplexing the transmission back into individual bit lanes can introduce skew during clock and data recovery of the individual lanes.
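
De-skewing can be pictured as delaying every lane until the slowest one has arrived, so the skew adds directly to latency. The following sketch illustrates that idea with hypothetical per-lane arrival times; it is not a model of any particular FB-DIMM receiver.

```python
def deskew(arrival_ps: list[int]) -> tuple[list[int], int]:
    """Return the delay to insert on each lane so that all lanes line up with the
    latest one, plus the extra latency this imposes relative to the earliest lane."""
    latest = max(arrival_ps)
    delays = [latest - t for t in arrival_ps]
    return delays, latest - min(arrival_ps)


# Hypothetical arrival times of four bit lanes, in picoseconds.
delays, added_latency = deskew([120, 95, 140, 110])
print(delays)         # [20, 45, 0, 30]
print(added_latency)  # 45
```

A received word cannot be used until its last lane arrives, which is why large lane-to-lane skew raises per-DIMM access latency.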


Note that each FB-DIMM channel also uses a separate serial connection. While each additional channel consumes fewer pins, and therefore less package area, than in previous parallel designs, the number of connections still scales in proportion to the number of desired channels.


One embodiment of the present invention provides a memory module with optical interconnect that provides scalable high-speed memory access and overcomes the distance limitations of high-speed electrical signaling. This embodiment uses optical FB-DIMMs with on-module electrical-to-optical transceivers to achieve high aggregate transmission capacity and low latency for memory accesses. Optics help to reduce power consumption and can reduce, if not eliminate, the distance dependence of electrical FB-DIMMs. Optics, particularly wavelength-division-multiplexed optics, can also help to reduce or eliminate bit-lane skew, and can be used to increase the number of DIMMs per FB-DIMM channel.


Optically Connected Fully-Buffered Memory

Optical transmission techniques play an important role in supporting long-distance communication for global, inter-state, metro, campus, and even intra-building or central-office applications. However, whether optical transmission techniques can be used within individual computer systems depends on improvements in bandwidth density, that is, the I/O bandwidth achievable per unit area or volume.


Electrical VLSI circuits are expected to remain the means of processing information for the foreseeable future. Because any optical-interconnect system involves optical-to-electrical and electrical-to-optical conversion, using optical components to break electrical bottlenecks requires tightly integrated photonics and electronics that efficiently deliver data to the desired electrical components. As mentioned previously, an important electrical bottleneck occurs between the DIMMs and the memory controller chip.


One embodiment of the present invention uses an FB-DIMM-based memory subsystem with optical links to overcome the distance, connection, and throughput limitations of high-speed serial electrical links. Such optically enabled FB-DIMMs achieve very high I/O bandwidth per unit area and allow capacity and bandwidth to scale, thereby enabling memory to keep pace with processor and I/O improvements. Furthermore, such modules extend the reach of FB-DIMM technology by allowing a less-constrained physical architecture to be deployed.



FIG. 3A illustrates an optical FB-DIMM that includes an optical advanced memory buffer (OAMB) 300, a DRAM 306, and several optical channels. In one embodiment of the present invention, the optical FB-DIMM uses high-speed, unidirectional, point-to-point optical signals for the memory channels. Traffic on the eastbound optical channel 302 travels from the memory controller to the optical FB-DIMM and includes commands and data to be written to the memory of the optical FB-DIMM. Traffic on the westbound optical channel 304 includes data read from the DRAM 306 and other responses to the memory controller. In one embodiment of the present invention, the optical channels carry data at six times the data rate of a non-optical FB-DIMM.


OAMB 300 presents the FB-DIMM memory requests to the local DRAM 306. OAMB 300 also provides intelligent eastbound and westbound channel initialization to align high-speed serial clocks, locate frame boundaries, and verify channel connectivity.
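
Locating frame boundaries during initialization amounts to finding a known training pattern in the incoming bit stream. The sketch below assumes a hypothetical 8-bit synchronization pattern; the actual training sequences an OAMB would use are not specified here. Failing to find the pattern at all can serve as a simple connectivity check.

```python
SYNC = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical training pattern


def find_frame_boundary(bits: list[int], sync: list[int] = SYNC) -> int:
    """Return the index of the first bit after the first occurrence of `sync`,
    or -1 if the pattern never appears (e.g. the channel is not connected)."""
    n = len(sync)
    for i in range(len(bits) - n + 1):
        if bits[i:i + n] == sync:
            return i + n
    return -1


stream = [0, 0, 1] + SYNC + [1, 1, 1, 0]
print(find_frame_boundary(stream))    # 11
print(find_frame_boundary([0] * 16))  # -1
```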



FIG. 4 is a flow chart illustrating the process of handling a memory request to a memory module with optical interconnect. First, the memory module receives a request from a memory controller via the optical channel (step 400). The memory module services this request by performing operations on the random-access memory module (step 402). The memory module then returns the output of these operations to the memory controller by sending a response to the memory controller via the optical channel (step 404).
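
The three steps of FIG. 4 can be sketched as a request handler. This is a minimal illustration, assuming hypothetical `Request` and `Response` types and a dictionary in place of the DRAM; optical transport, framing, and timing are left out, and unwritten addresses are assumed to read as zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    op: str                      # "read" or "write"
    address: int
    data: Optional[int] = None   # payload for writes


@dataclass
class Response:
    address: int
    data: Optional[int]
    ok: bool


class MemoryBuffer:
    """Toy memory buffer; a dictionary stands in for the random-access memory module."""

    def __init__(self) -> None:
        self.dram: dict[int, int] = {}

    def handle(self, request: Request) -> Response:
        # Step 400: `request` has arrived from the memory controller over the optical channel.
        # Step 402: service it by operating on the random-access memory.
        if request.op == "write" and request.data is not None:
            self.dram[request.address] = request.data
            response = Response(request.address, None, True)
        elif request.op == "read":
            response = Response(request.address, self.dram.get(request.address, 0), True)
        else:
            response = Response(request.address, None, False)  # unsupported request
        # Step 404: the caller sends `response` back to the controller over the optical channel.
        return response


buffer = MemoryBuffer()
buffer.handle(Request("write", 0x40, 7))
print(buffer.handle(Request("read", 0x40)).data)  # 7
```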


In one embodiment of the present invention, OAMB 300 includes pass-through logic 308 on the eastbound optical channel 302 and pass-through and merging logic 310 on the westbound optical channel 304. This logic allows OAMB 300 to, for instance, selectively de-serialize and decode optical signals, or allow such signals to pass through to other optical FB-DIMMs coupled in series. In this embodiment, when the memory controller sends a frame on the eastbound optical channel 302 to the optical FB-DIMMs, the first optical FB-DIMM's OAMB 300 checks whether the request applies to its local DRAM. If not, the frame is passed through, or repeated, to the next optical FB-DIMM on the eastbound channel. OAMB 300 similarly repeats or passes through westbound frames to the memory controller or to an adjacent westbound optical FB-DIMM.
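
The pass-through decision can be sketched as follows. The frame format, the `target` module ID, and the helper names are hypothetical; westbound merging is omitted, and each module simply either claims a frame or repeats it to the next module in the chain.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Frame:
    target: int      # hypothetical ID of the module the request is for
    payload: str


@dataclass
class OpticalModule:
    module_id: int
    received: list[str] = field(default_factory=list)

    def on_eastbound(self, frame: Frame) -> Optional[Frame]:
        """Decode frames addressed to this module; pass all others through unchanged."""
        if frame.target == self.module_id:
            self.received.append(frame.payload)  # de-serialize and hand to local DRAM logic
            return None
        return frame                              # repeat to the next module


def send_eastbound(chain: list[OpticalModule], frame: Frame) -> int:
    """Propagate a frame along the series-coupled modules.
    Returns the number of modules it visited, or -1 if no module claimed it."""
    for hops, module in enumerate(chain, start=1):
        # A passed-through frame is unchanged, so it is simply offered to the next module.
        if module.on_eastbound(frame) is None:
            return hops
    return -1


chain = [OpticalModule(i) for i in range(3)]
print(send_eastbound(chain, Frame(target=2, payload="write A")))  # 3
print(chain[2].received, chain[0].received)                       # ['write A'] []
```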


OAMB 300 also provides control and interface signals for the DRAM(s) 306 on the given FB-DIMM. OAMB 300 converts eastbound write data destined for its module into standard DRAM signals consisting of DRAM addresses and commands 312. Conversely, OAMB 300 serializes data read from the DRAM 314 in response to a request and then sends the data to the memory controller via the westbound optical channel 304. Note that the optical FB-DIMM buffers the DRAM signals from the memory controller within OAMB 300. Note also that in one embodiment, the optical channels carry separate eastbound and westbound unidirectional signals, thereby allowing simultaneous data reads and writes.


Note that such an arrangement may result in non-uniform memory latency if several memory modules share the optical channels in series, because the last memory module in the chain experiences longer latencies on both the eastbound and westbound optical channels. This non-uniformity could become appreciable as the length of the optical channels increases.



FIG. 3B illustrates an embodiment of the present invention in which traffic is directed eastbound for both of the optical channels. In FIG. 3B, a memory controller 316 sends requests to three optical FB-DIMMs 318. In this embodiment, the system uses a second eastbound optical channel 320 instead of a westbound channel. Connecting the memory controller to the first and second optical channels at opposite ends of the chain of memory modules reduces the optical latency disparity, because the memory modules closer to the memory controller on the first optical channel are farther away on the second channel, and vice-versa.
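
The latency disparity can be illustrated by counting optical hops, where a hop is one link between adjacent participants. This idealized sketch assumes every hop has the same delay and ignores processing time inside each OAMB; module positions are numbered from 1 (nearest the controller on the first channel) to N. Under these assumptions, the eastbound/westbound chain gives round-trip hop counts that grow with position, while the arrangement of FIG. 3B gives every module the same count of N + 1 hops.

```python
def east_west_hops(position: int) -> int:
    """FIG. 3A-style chain: the request travels `position` hops east and the
    response travels the same `position` hops back west."""
    return 2 * position


def loop_hops(position: int, modules: int) -> int:
    """FIG. 3B-style loop: the request travels `position` hops on the first
    eastbound channel; the response continues east on the second channel through
    the remaining modules to the controller attached at the far end."""
    return position + (modules - position + 1)


N = 3  # as in FIG. 3B
print([east_west_hops(p) for p in range(1, N + 1)])  # [2, 4, 6]
print([loop_hops(p, N) for p in range(1, N + 1)])    # [4, 4, 4]
```

With unequal hop delays, the loop arrangement narrows the spread rather than removing it, which matches the more modest claim that it reduces the disparity.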


Using optical signals between the memory controller and the optical FB-DIMMs allows the architecture to be extended across longer distances and to more memory modules than previous approaches. Sophisticated protocols may be used for discovery of, and communication among, the memory controller, multiple memory modules, and other potential participants. For longer distances, an OAMB can be designed to act as a simple buffered repeater. Optionally, flow control can be added to the signal lanes within the optical channels. Such lanes can be implemented as optical signals traveling on optical fiber. Note also that the eastbound and westbound optical channels may include different numbers of signal lanes and/or optical fibers.
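
As one example of such a protocol, discovery could use a broadcast frame that every module annotates and passes on. The sketch below is hypothetical: the broadcast ID, frame layout, and class names are illustrative assumptions, not a defined FB-DIMM or OAMB protocol. Re-running discovery after modules are added or removed would let the controller track a changing population of modules.

```python
from dataclasses import dataclass, field

BROADCAST = -1  # hypothetical target ID meaning "all modules"


@dataclass
class DiscoveryFrame:
    target: int = BROADCAST
    responders: list[int] = field(default_factory=list)


@dataclass
class Module:
    module_id: int

    def on_frame(self, frame: DiscoveryFrame) -> DiscoveryFrame:
        if frame.target == BROADCAST:
            frame.responders.append(self.module_id)  # announce presence
        return frame                                  # always pass the frame on


def discover(chain: list[Module]) -> list[int]:
    """Send one broadcast frame down the chain and return the IDs that answered,
    in the order the modules sit on the optical channel."""
    frame = DiscoveryFrame()
    for module in chain:
        frame = module.on_frame(frame)
    return frame.responders


print(discover([Module(7), Module(3), Module(5)]))  # [7, 3, 5]
```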


In one embodiment of the present invention, optical signal lanes may be implemented as different wavelengths on the same optical fiber via wavelength-division multiplexing (WDM). Note that in this embodiment, not every wavelength needs to be converted at every module. For instance, each OAMB may be assigned to monitor and respond to signals on one wavelength and pass through all other wavelengths on an optical fiber, as shown in FIG. 5, in which OAMB 300 separates out, and then receives data from or adds data to, only wavelength λN. Alternatively, the system can use multiple optical fibers to provide additional bandwidth. In an embodiment involving multiple fibers, each OAMB may be assigned to monitor and respond to signals on only one fiber and pass through signals on all other optical fibers.
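
The per-wavelength behavior shown in FIG. 5 resembles an optical add-drop multiplexer. In the sketch below, a dictionary keyed by wavelength label stands in for the fiber, and the function and labels are hypothetical; it shows only that the OAMB converts its assigned wavelength while every other wavelength passes through untouched.

```python
from typing import Optional


def add_drop(fiber: dict[str, str], wavelength: str,
             reply: Optional[str] = None) -> tuple[Optional[str], dict[str, str]]:
    """Drop the signal on `wavelength`, optionally add `reply` on the same
    wavelength, and pass every other wavelength through unconverted."""
    passed = dict(fiber)                     # copy; the input fiber is left untouched
    dropped = passed.pop(wavelength, None)   # only this wavelength is converted
    if reply is not None:
        passed[wavelength] = reply
    return dropped, passed


fiber = {"λ1": "request for module 1", "λ2": "request for module 2", "λ3": "request for module 3"}
dropped, out = add_drop(fiber, "λ2", reply="data from module 2")
print(dropped)    # request for module 2
print(out["λ2"])  # data from module 2
print(out["λ1"])  # request for module 1
```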


In one embodiment of the present invention, a system including an FB-DIMM with optical interconnect can use WDM to place all of the individual bit-lane channels onto a single fiber without temporally multiplexing the channels. Alternatively, the system may use temporal multiplexing, or a mixture of the two techniques. Using WDM allows the system to eliminate the need for de-skewing and potentially to eliminate or reduce the need for clock and data recovery.
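
The contrast between the two approaches can be sketched with bit lists. Under temporal multiplexing, the lanes are interleaved into one stream and must be re-aligned at the receiver; under WDM, each lane rides its own wavelength and keeps its identity. The helper names and the representation of a wavelength as an integer index are illustrative assumptions, and the sketch ignores the physical layer entirely.

```python
def tdm_serialize(lanes: list[list[int]]) -> list[int]:
    """Temporal multiplexing: interleave one bit from each lane per time slot."""
    return [bit for slot in zip(*lanes) for bit in slot]


def tdm_deserialize(stream: list[int], lane_count: int) -> list[list[int]]:
    """Recover each lane by taking every `lane_count`-th bit. This only works once
    the receiver knows which time slot belongs to lane 0."""
    return [stream[i::lane_count] for i in range(lane_count)]


def wdm_assign(lanes: list[list[int]]) -> dict[int, list[int]]:
    """WDM: each lane keeps its own wavelength (here just an index), so lanes
    travel side by side and no interleaving or slot alignment is needed."""
    return {wavelength: lane for wavelength, lane in enumerate(lanes)}


lanes = [[1, 1, 1], [0, 0, 0], [1, 0, 1]]
stream = tdm_serialize(lanes)
print(stream)                                   # [1, 0, 1, 1, 0, 0, 1, 0, 1]
print(tdm_deserialize(stream, 3) == lanes)      # True
print(tdm_deserialize(stream[1:], 3) == lanes)  # False: a one-slot offset scrambles the lanes
print(wdm_assign(lanes)[2])                     # [1, 0, 1]
```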


By tightly integrating an optical interface with an FB-DIMM module to create an optical FB-DIMM module, the present invention:

    • Provides seamless, scalable electrical-to-optical communication capacity over extended distances;
    • Reduces power dissipation by using low-power optical links;
    • Allows channel count and capacity to increase scalably without increasing the number of optical fibers when using wavelength-division multiplexing;
    • Avoids fundamental bottlenecks for high channel data rates (e.g., above 20 GHz);
    • Supports the ability to find and communicate with optical FB-DIMMs by using an optical broadcast and discovery capability;
    • Reduces electromagnetic interference by reducing off-chip electrical bandwidth; and
    • Reduces module weight.


In addition, the optical FB-DIMM concept retains many of the useful benefits of FB-DIMM technology, including: compatibility of FB-DIMMs across several DIMM generations; the ability to field-swap DIMMs; the ability to repurpose a system for compute-intensive, data-intensive, or I/O-intensive applications; and high-reliability memory interfaces that include cyclic redundancy check (CRC) protection on address lines.


In summary, the present invention provides a memory module with optical interconnect that provides scalable high-speed memory access. By tightly integrating an optical interface with an FB-DIMM module, the present invention increases memory bandwidth, reduces memory latency, and overcomes the distance limitations of electrical signaling.


The foregoing descriptions of embodiments of the present invention have been presented only for purposes of illustration and description. They are not intended to be exhaustive or to limit the present invention to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the present invention is defined by the appended claims.

Claims
  • 1. A memory module with optical interconnect, comprising: an optical channel; a memory buffer; and a random-access memory module; wherein the memory buffer is configured to receive a request from a memory controller via the optical channel; wherein the memory buffer is configured to handle the request by performing operations on the random-access memory module; and wherein the memory buffer is configured to send a response to the memory controller via the optical channel; whereby the memory module provides a high-speed serial link to the random-access memory module without consuming a large number of pins per channel on the memory controller.
  • 2. The memory module of claim 1, wherein using an optical channel allows the distance between the memory module and the memory controller to be increased.
  • 3. The memory module of claim 1, wherein the memory buffer includes pass-through and merging logic that allows the optical channel to be shared between multiple memory modules that are coupled in series.
  • 4. The memory module of claim 3, wherein the memory module is configured to use wavelength-division multiplexing to increase the capacity of the optical channel without increasing the number of optical fibers needed; and wherein multiple memory modules with optical interconnect are configured to share the optical channel using wavelength-division multiplexing.
  • 5. The memory module of claim 4, wherein the optical channel includes: a first optical channel that carries requests from the memory controller to the memory buffer; and a second optical channel that carries responses from the memory buffer to the memory controller; wherein the first optical channel and the second optical channel are separate high-speed, uni-directional optical channels.
  • 6. The memory module of claim 5, wherein the first optical channel comprises one or more optical fibers; and wherein the bandwidth of the first optical channel and the bandwidth of the second optical channel are asymmetric.
  • 7. The memory module of claim 3, wherein the memory controller includes a discovery mechanism that detects the memory modules present on the optical channel.
  • 8. The memory module of claim 7, wherein the discovery mechanism includes a broadcast mechanism.
  • 9. The memory module of claim 3, wherein the memory module is configured to allow the number of memory modules sharing the optical channel to be changed based on system memory needs.
  • 10. The memory module of claim 1, wherein the random-access memory module is a fully-buffered dual inline memory module with dynamic random-access memory.
  • 11. A method for handling a memory request to a memory module with optical interconnect, where the memory module includes an optical channel, a memory buffer, and a random-access memory module, comprising: receiving a request from a memory controller via the optical channel; servicing the request by performing operations on the random-access memory module; and sending a response to the memory controller via the optical channel; whereby the memory module with optical interconnect provides a high-speed serial link to the random-access memory module without consuming a large number of pins per channel on the memory controller.
  • 12. The method of claim 11, wherein using an optical channel allows the distance between the memory module and the memory controller to be increased.
  • 13. The method of claim 11, wherein receiving the request and sending the response involve using pass-through and merging logic included in the memory buffer that allows the optical channel to be shared between multiple memory modules with optical interconnect that are coupled in series.
  • 14. The method of claim 13, wherein receiving the request and sending the response involve: using wavelength-division multiplexing to increase the capacity of the optical channel without increasing the number of optical fibers needed; sharing the optical channel across multiple memory modules using wavelength-division multiplexing; and wherein the number of memory modules sharing the optical channel scales based on system memory need.
  • 15. The method of claim 14, wherein the optical channel includes a first optical channel that carries requests from the memory controller to the memory buffer and a second optical channel that carries responses from the memory buffer to the memory controller; wherein the first optical channel and the second optical channel are separate high-speed, uni-directional optical channels; and wherein the bandwidth of the first optical channel and the bandwidth of the second optical channel are asymmetric.
  • 16. A computer system that includes a memory module with optical interconnect, comprising: a processor that includes a memory controller; and a memory module with optical interconnect; wherein the memory module includes an optical channel that communicates with the processor, a memory buffer, and a random-access memory module; wherein the memory buffer is configured to receive a request from the memory controller via the optical channel; wherein the memory buffer is configured to handle the request by performing operations on the random-access memory module; and wherein the memory buffer is configured to send a response to the memory controller via the optical channel; whereby the memory module with optical interconnect provides a high-speed serial link to the random-access memory module without consuming a large number of pins per channel on the memory controller.
  • 17. The computer system of claim 16, wherein using an optical channel allows the distance between the memory module and the memory controller to be increased.
  • 18. The computer system of claim 16, wherein the memory buffer includes pass-through and merging logic that allows the optical channel to be shared between multiple memory modules that are coupled in series.
  • 19. The computer system of claim 18, wherein the memory module is configured to use wavelength-division multiplexing to increase the capacity of the optical channel without increasing the number of optical fibers needed; wherein multiple memory modules with optical interconnect are configured to share the optical channel using wavelength-division multiplexing; and wherein the multiple memory modules are configured to allow the number of memory modules sharing the optical channel to be changed based on system memory need.
  • 20. The computer system of claim 19, wherein the optical channel includes: a first optical channel that carries requests from the memory controller to the memory buffer; and a second optical channel that carries responses from the memory buffer to the memory controller; and wherein the first optical channel and the second optical channel are separate high-speed, uni-directional optical channels.
  • 21. The computer system of claim 20, wherein the first optical channel comprises one or more optical fibers; and wherein the bandwidth of the first optical channel and the bandwidth of the second optical channel are asymmetric.