The present invention generally relates to digital circuits with components synchronized by a clock signal. More particularly, this invention relates to a memory module configured with an onboard phase lock loop circuit to reduce input clock skew among components of the module.
The computer industry has moved to higher speed grades not only in processor technology but also in all peripheral devices, including system memory. System memory has become the main bottleneck in overall system performance: as clock rates increase, the central processors are starved for data, and more and more cycles are wasted idling for lack of data and instructions to process.
Memory clock and data frequencies are limited primarily by two factors: first, the core and input/output (I/O) design of the memory integrated circuit (IC) itself, and second, the interface with the rest of the system logic. With respect to the latter, one critical factor is the dilution of the clock signals among the target devices. This dilution also leads to what is termed “clock skew,” in which the components of a digital circuit receive the clock signal from a clock generator with slightly different time shifts because the components are not equidistant from the generator. Within memory subsystems, this phenomenon is exacerbated by the ability to add and remove memory modules, which alters the capacitance and impedance of the system, both of which contribute directly to clock skew. Currently employed strategies to address this problem include turning off the clock signals to unused memory slots (sockets), both to reduce electromagnetic interference and to avoid wasting clock drive on the load presented by unpopulated sockets.
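The skew mechanism described above can be illustrated with a short calculation. The following is a minimal sketch, assuming a uniform propagation delay of about 6.7 ps/mm (a common ballpark for stripline on FR-4 laminate) and hypothetical trace lengths; the constant and the function names are illustrative assumptions, not values from this disclosure. It models only skew due to unequal path lengths and ignores loading effects.

```python
# Minimal sketch: clock skew caused by unequal trace lengths from one clock source.
# ASSUMPTION: uniform propagation delay of ~6.7 ps/mm (typical FR-4 stripline
# ballpark); real boards differ. All trace lengths are hypothetical.

PROP_DELAY_PS_PER_MM = 6.7  # assumed value, not from the source


def arrival_times_ps(trace_lengths_mm):
    """Return the clock arrival time (ps) at each component."""
    return [length * PROP_DELAY_PS_PER_MM for length in trace_lengths_mm]


def clock_skew_ps(trace_lengths_mm):
    """Skew = latest arrival minus earliest arrival across all components."""
    times = arrival_times_ps(trace_lengths_mm)
    return max(times) - min(times)


if __name__ == "__main__":
    # Hypothetical distances (mm) from a motherboard clock generator to
    # memory components spread across several sockets.
    lengths = [40.0, 55.0, 70.0, 85.0]
    print(f"skew = {clock_skew_ps(lengths):.1f} ps")
```

Under these assumptions, a 45 mm spread in path length already corresponds to roughly 0.3 ns of skew, which is a meaningful fraction of the 2.5 ns bit time at a 400 MHz data rate.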
Despite all attempts to maximize the efficacy of the clock signals relative to the effective load on the clock and, consequently, to reduce clock skew, attainable memory frequencies remain inversely correlated with the total amount of memory or, by extension, with the total number of devices that must be driven. High-density system memory configurations (such as those for servers) have worked around the load issue by using what are generally known as “registered” memory modules, which are typically dual in-line memory modules (DIMMs). As known in the art, a registered memory module reduces electrical loading on a memory bus by routing the address and command lines through a register on the memory module. The register is interposed in the command and address bus to capture the commands and addresses, which it then amplifies and distributes to the memory components on the next rising clock edge. Registered memory modules also include a phase lock loop (PLL) circuit that locks onto the frequency of the system clock input and generates its own, stronger clock output that is sent to the individual devices. The main advantage of a registered memory module is that the chipset and the system clock each see only one component, namely the register and the PLL, respectively, and need not drive the entire clock, address, and command tree.
Although the register amplifies the command and address signals, which are timed by the clock signal generated by the PLL, a notable disadvantage is that one additional latency cycle is required for address and command translation after the chip select signal has been issued and before a row activate command can be given. Because the register delays all information transferred to the module by one clock cycle, latencies increase, primarily on random accesses. In non-streaming applications, where idle periods are inserted between data transfers, the same latency increase is incurred on every access, even a page hit.
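The cost of the extra register cycle can be expressed in nanoseconds. The following is a minimal sketch, assuming double-data-rate signalling (two transfers per clock, so a 400 MHz data rate implies a 200 MHz clock) and hypothetical activate-to-read (tRCD) and CAS latency (CL) values; the function names and cycle counts are illustrative assumptions, not figures from this disclosure.

```python
# Minimal sketch: added latency from a one-cycle register on the
# command/address path. ASSUMPTIONS: DDR signalling (2 transfers per clock);
# tRCD and CL cycle counts are hypothetical examples.


def clock_period_ns(data_rate_mhz, transfers_per_clock=2):
    """Clock period (ns) for a given data rate and transfers per clock."""
    clock_mhz = data_rate_mhz / transfers_per_clock
    return 1000.0 / clock_mhz


def random_access_latency_ns(data_rate_mhz, trcd_cycles, cl_cycles, registered):
    """Clocks from row activate to first data, converted to ns.

    A register delays every command by one clock. The activate and the read
    are both shifted by that same clock, so the net penalty is one cycle.
    """
    cycles = trcd_cycles + cl_cycles + (1 if registered else 0)
    return cycles * clock_period_ns(data_rate_mhz)


if __name__ == "__main__":
    for registered in (False, True):
        t = random_access_latency_ns(400, trcd_cycles=3, cl_cycles=3,
                                     registered=registered)
        label = "registered" if registered else "unbuffered"
        print(f"{label}: {t:.1f} ns")
```

With these placeholder timings, the register adds one 5 ns clock to every access that cannot be hidden by streaming, which is the penalty the unbuffered, PLL-clocked module is intended to avoid.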
The present invention provides a method and device capable of reducing input clock skew on memory modules, with particular benefit to high-frequency memory modules, e.g., modules operating at data rates of 400 MHz and beyond. More particularly, the present invention provides an unbuffered memory module comprising a substrate, multiple memory components mounted to the substrate, address and command connectors that transmit digital information to and from the memory components without routing the information through a register, and a phase lock loop (PLL) circuit mounted on the substrate and electrically interconnecting a clock-in connector to the memory components for generating and transmitting a module clock signal to the memory components. In this manner, the phase lock loop circuit provides the memory module with an onboard clock generator that synchronizes the memory components of the module.
In view of the above, the present invention can optimize the clock input to each memory component of the memory module, reducing clock skew-related errors without the latency increase associated with registered memory modules. Because the system clock signal sees only the PLL circuit, the load on the system clock is reduced, as are stray clock signals and noise. The invention further makes it possible to optimize a memory module, including its clock tree to the memory components, so that the module clock is not susceptible to variations in motherboard trace width or length that could otherwise introduce variations in signal propagation delay.
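The load reduction on the system clock can be compared with a simple lumped-capacitance count. The following is a minimal sketch, assuming that every clock input presents the same fixed capacitance and that each module carries one PLL; the per-input capacitance, the component counts, and the function name are hypothetical, and trace capacitance is ignored.

```python
# Minimal sketch: capacitive load seen by the system clock generator.
# ASSUMPTIONS: each clock input is a fixed lumped capacitance; one PLL per
# module; trace capacitance ignored. All values are hypothetical.

INPUT_CAP_PF = 3.0  # assumed lumped input capacitance per clock input


def system_clock_load_pf(modules, components_per_module, onboard_pll):
    """Total capacitance the system clock must drive across all modules.

    Without an onboard PLL, every memory component's clock input is loaded
    onto the system clock. With an onboard PLL, the system clock sees only
    the PLL input, and the PLL drives the module's own clock tree.
    """
    loads_per_module = 1 if onboard_pll else components_per_module
    return modules * loads_per_module * INPUT_CAP_PF


if __name__ == "__main__":
    # Hypothetical subsystem: two modules with eight memory components each.
    print(system_clock_load_pf(2, 8, onboard_pll=False))
    print(system_clock_load_pf(2, 8, onboard_pll=True))
```

In this simplified model the load on the system clock scales with the number of modules rather than the number of memory components, so adding components to a module does not increase what the system clock must drive.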
Other objects and advantages of this invention will be better appreciated from the following detailed description.
As represented in
Because of the close proximity of the PLL 12 to each memory component 14 on the module 10 (as compared to the PLL associated with the system clock signal), and the absence of socketed interfaces that can cause signal reflections and other unwanted noise, the clock signal to each component 14 has a level of integrity (e.g., strength and precision) that is unattainable with conventional technology. High clock signal integrity with minimized skew between any of the memory components 14 or, by extension, the I/O pins along the edge connector 20, results in a wider “data eye” or “data valid” window. A longer “data valid” window with minimized skew across the bus, in turn, makes it possible to operate the same memory components 14 at higher frequencies with better reliability and lower error rates.
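The relationship between skew, the data valid window, and attainable frequency can be illustrated with an additive timing budget. The following is a minimal sketch, assuming that the valid window is the bit time minus the sum of skew, jitter, setup, and hold, and treating a 400 MHz data rate as 400 MT/s; the function names and all timing numbers are hypothetical placeholders, not device specifications.

```python
# Minimal sketch: how clock skew erodes the data valid window.
# ASSUMPTIONS: simple additive timing budget; jitter, setup, hold, and skew
# values are hypothetical placeholders, not device specifications.


def unit_interval_ns(data_rate_mt_s):
    """Duration of one bit on a data line at the given transfer rate."""
    return 1000.0 / data_rate_mt_s


def data_valid_window_ns(data_rate_mt_s, skew_ns, jitter_ns, setup_ns, hold_ns):
    """Remaining window = bit time minus the sum of timing uncertainties."""
    budget = skew_ns + jitter_ns + setup_ns + hold_ns
    return unit_interval_ns(data_rate_mt_s) - budget


def max_data_rate_mt_s(min_window_ns, skew_ns, jitter_ns, setup_ns, hold_ns):
    """Highest transfer rate that still leaves min_window_ns of valid data."""
    return 1000.0 / (min_window_ns + skew_ns + jitter_ns + setup_ns + hold_ns)


if __name__ == "__main__":
    for skew in (0.30, 0.05):  # hypothetical: motherboard-clocked vs onboard PLL
        window = data_valid_window_ns(400, skew, 0.1, 0.4, 0.4)
        rate = max_data_rate_mt_s(0.5, skew, 0.1, 0.4, 0.4)
        print(f"skew {skew:.2f} ns: window {window:.2f} ns, "
              f"max rate {rate:.0f} MT/s")
```

Under these placeholder numbers, cutting skew from 0.30 ns to 0.05 ns widens the window at 400 MT/s from 1.30 ns to 1.55 ns, or alternatively raises the rate at which a 0.5 ns minimum window is preserved from about 588 MT/s to about 690 MT/s.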
As is conventional for prior art memory modules, the module 10 is preferably manufactured such that none of its physical structures, including the PLL 12, memory components 14, signal line 24, and clock tree 26, is intended to be modified or replaced on the substrate 18. As such, the module 10 can be designed to optimize the synchronization, among the memory components 14, of the module clock signal generated by the PLL 12. Notable design factors include the layout of the clock tree 26 and the quality of the memory components 14. A further advantage is that, because the PLL 12 generates the clock signal for the module 10, removing a memory module from another memory slot of the memory subsystem will not alter the clock signal received by the memory components 14 of the memory module 10.
As evident from
In view of the above, the present invention fundamentally differs from previous uses of dedicated onboard PLLs in that the present invention is applicable to personal computers, whereas onboard PLLs have previously been limited exclusively to registered memory modules of servers and other high-density system memory configurations. The use of an unbuffered command and address bus in combination with a dedicated PLL 12 per rank of memory components 14 in a memory subsystem allows the module designer to optimize the layout of the substrate 18 according to the exact specifications of the PLL 12, ensuring optimally synchronized clock input to all memory components 14 without incurring the access penalties associated with registers. As such, the present invention allows the tightest control over the internal clocks of all memory components 14 and their optimal synchronization with the input/output clock supplied in the form of an I/O strobe. This is of particular importance where multiple ranks of memory populate the memory subsystem of a motherboard, which dilutes the clock signal and can produce electromagnetic interference that introduces additional noise on the timing signals.
While the invention has been described in terms of a preferred embodiment, it is apparent that other forms could be adopted by one skilled in the art. For example, the physical configuration of an unbuffered memory module incorporating a PLL could differ from that shown. Therefore, the scope of the invention is to be limited only by the following claims.
This application claims the benefit of U.S. Provisional Application No. 60/522,169, filed Aug. 25, 2004.
Number | Date | Country
---|---|---
60522169 | Aug 2004 | US