This invention relates to integrated circuit memory devices. More particularly, this invention relates to a memory hub system wherein a memory controller communicates with several daisy-chained hubs across two different clock domains.
When a memory hub responds to a memory controller request with either read request data or a write request response, that information is typically transmitted to the controller over a high-speed bus. Different elements of the memory hub, however, may operate in different clock domains (e.g., the memory hub logic may operate at a slower speed than the memory hub transmitters and receivers that interface with the high-speed bus). The data transfer between the memory hubs and the memory controller should be done such that the low-speed memory hub logic can transmit data onto the high-speed bus without stalling the memory hub or the transmission of data over the high-speed bus. Known systems and methods are often less than adequate at performing such data transfers.
In view of the foregoing, it would be desirable to provide a system and method for data storage and transfer between two clock domains that will not stall or interrupt logic operations in one domain or data movement in another domain.
It is an object of this invention to provide data storage and transfer between two clock domains that will not stall logic in a low-speed clock domain while not interrupting data movement in a high-speed clock domain.
In accordance with the invention, a clock domain interface (CDI) provides data storage and transfer between two clock domains.
In a memory hub system, several daisy-chained memory hubs are connected to a single memory controller via a high-speed bus. Memory requests sent by the controller are passed from memory hub to memory hub. Similarly, responses to the memory requests generated by the appropriate memory hub are passed from memory hub to memory hub back to the memory controller. While the memory hub transmitters and receivers operate in a high-speed clock domain to receive and transmit data along the high-speed bus, the logic cores of the memory hubs typically operate in a low-speed clock domain. The CDI controls data storage and transfer between the two clock domains by multiplexing and de-multiplexing data such that substantially the same data rate is provided in both clock domains. That is, larger amounts of data may be provided per clock cycle in the low-speed clock domain, while smaller amounts of data may be accessed at faster speeds in the high-speed clock domain. For example, in the low-speed clock domain, data may be transmitted in 256 bit words at a low-speed clock rate of 400 MHz, while in the high-speed clock domain that data may be transmitted in 64 bit words at a high-speed clock rate of 1.6 GHz. Both of these data transmission schemes have the same overall data rate.
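By way of illustration only, the equality of the two data rates in the example above may be checked with a short Python sketch. The constant and function names are hypothetical, and the sketch assumes that one word is transferred per clock cycle in each domain.

```python
# Figures taken from the illustrative example; names are hypothetical.
LOW_SPEED_HZ = 400_000_000       # low-speed clock: 400 MHz
HIGH_SPEED_HZ = 1_600_000_000    # high-speed clock: 1.6 GHz
LOW_WORD_BITS = 256              # word width in the low-speed domain
HIGH_WORD_BITS = 64              # word width in the high-speed domain


def data_rate_bps(word_bits: int, clock_hz: int) -> int:
    """Bits per second, assuming one word is transferred per clock cycle."""
    return word_bits * clock_hz


low_rate = data_rate_bps(LOW_WORD_BITS, LOW_SPEED_HZ)
high_rate = data_rate_bps(HIGH_WORD_BITS, HIGH_SPEED_HZ)

# Both domains move 102.4 gigabits per second in this example.
print(low_rate, high_rate)
```

The word width scales inversely with the clock rate, so a fourfold difference in clock speed is offset by a fourfold difference in word width.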
The CDI advantageously allows data storage and transfer between two clock domains without stalling the logic in the low-speed clock domain and without interrupting the data movement in the high-speed clock domain.
The above and other objects and advantages of the invention will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
The invention provides a system and method for data storage and transfer between two clock domains, preferably of a memory hub system. A memory controller sends memory requests to several daisy-chained hubs across a high-speed bus and receives corresponding responses from those memory hubs across the same or another high-speed bus. A clock domain interface (CDI) transfers data to high-speed transmitters from low-speed core memory logic without stalling or interrupting data movement across the high-speed bus.
When memory controller 110 sends a memory request to memory hubs 120, the request is received by each memory hub 120 using downstream receiver 122 and is transmitted to the next memory hub 120 using downstream transmitter 124. Memory request responses are similarly passed back to memory controller 110 by memory hubs 120 using upstream receiver 128, transmitter logic 134, and upstream transmitter 126.
Within each memory hub 120, the memory request is also received by core logic 130. Core logic 130, if it has the portion of memory specified by a memory request, forwards the memory request to its local memory 132. Local memory 132 and core logic 130 respond to the memory request with the data requested by a data read command or with a response to a data write command. The response is transferred by transmitter logic 134 to the upstream transmitter 126 for transmission over high-speed bus 150 back to memory controller 110. The data transfer from memory hubs 120 to upstream high-speed bus 150 is advantageously done such that (1) core logic 130 is not stalled, (2) handshaking across clock domains does not stall data movement, and (3) other data transmitted over high-speed bus 150 is not interrupted.
For illustrative purposes, assume that the high-speed clock domain operates at a speed of 1.6 GHz and the low-speed clock domain operates at 400 MHz. The high-speed clock domain is thus four times the speed of the low-speed clock domain. (Note that in the present embodiment the high-speed and low-speed clock domains may operate at other speeds, as long as the ratio between the clock domains is maintained.)
Also assume for illustrative purposes that high-speed buses 140 and 150 each transmit a clock source signal received by memory hub receivers 122 and 128 that may be used to generate local low-speed and high-speed clocks within each memory hub 120. More particularly, in the downstream direction, each downstream receiver 122 may generate a local high-speed clock from the clock source signal to facilitate receipt of data from high-speed bus 140. In the upstream direction, each upstream receiver 128 may generate a local high-speed clock to facilitate receipt of data from high-speed bus 150. Each upstream receiver 128 may also generate a low-speed clock for core logic 130.
After generating a high-speed clock, each downstream receiver 122 passes the generated clock to its counterpart downstream transmitter 124 of the same memory hub 120. Thus, the downstream receiver 122 and transmitter 124 in each hub 120 use the same clock signal and are synchronous.
Downstream transmitter 124 uses the passed clock signal to move the data received over high-speed bus 140 farther downstream to the next memory hub 120. The clock source signal is also sent by downstream transmitter 124 to the next downstream receiver 122 over high-speed bus 140.
In the upstream direction, the upstream transmitter 126 in the memory hub 120 farthest downstream sends a clock source signal to the next upstream hub 120. The upstream receiver 128 of the second hub 120 generates a high-speed clock from the clock source to capture the incoming data. The generated high-speed clock is sent to counterpart upstream transmitter 126. Upstream receiver 128 also preferably generates a low-speed clock which is sent to core logic 130. A clock source signal is also sent by upstream transmitter 126 to the next upstream receiver 128 over high-speed bus 150. In the upstream direction, clock signals for upstream receiver 128, transmitter 126, and core logic 130 are all derived from the same clock source signal and are thus all synchronous.
Note that in this embodiment, even though all receivers and transmitters in memory hubs 120 operate in the high-speed clock domain, downstream receivers 122 and transmitters 124 are not synchronous with upstream receivers 128 and transmitters 126. Their clock signals are derived from different clock source signals.
In an example of this embodiment, high-speed buses 140 and 150 each carry data over 16 data lines at a rate of 6.4 gigabits per second per line. When the data stream of high-speed bus 140 is received by downstream receiver 122, the data are preferably multiplexed onto a wider set of data lines to avoid interruptions in the downstream data flow. For example, the 6.4 gigabit per second data stream transmitted over each of the 16 lines in high-speed bus 140 may be multiplexed over 64 data lines by downstream receiver 122. With 64 data lines operating at 1.6 GHz, downstream receiver 122 can output data at the same rate as high-speed bus 140.
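The widening performed by the receiver in this example may be sketched in Python as follows. This is a software illustration only, not the receiver circuit: the function name is hypothetical, and the ordering (earliest bus sample placed in the least significant bits) is an assumption not specified in the description.

```python
from typing import List

BUS_LINES = 16                               # data lines on the high-speed bus
WIDE_LINES = 64                              # data lines at the receiver output
SAMPLES_PER_WORD = WIDE_LINES // BUS_LINES   # four bus samples per 64 bit word


def widen(samples: List[int]) -> List[int]:
    """Pack consecutive 16 bit bus samples into 64 bit words.

    Assumed ordering: the earliest sample occupies the lowest 16 bits.
    """
    if len(samples) % SAMPLES_PER_WORD:
        raise ValueError("sample count must be a multiple of 4")
    words = []
    for start in range(0, len(samples), SAMPLES_PER_WORD):
        word = 0
        for offset, sample in enumerate(samples[start:start + SAMPLES_PER_WORD]):
            word |= (sample & 0xFFFF) << (offset * BUS_LINES)
        words.append(word)
    return words


wide = widen([0x1111, 0x2222, 0x3333, 0x4444])
print([hex(w) for w in wide])
```

Four samples arriving at 6.4 gigabits per second per line are thus presented as one 64 bit word at 1.6 GHz, which preserves the aggregate rate of 16 x 6.4 = 64 x 1.6 = 102.4 gigabits per second.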
Each memory request is sent to core logic 130 after it is received by downstream receiver 122. The memory request cannot be sent directly from downstream receiver 122 to core logic 130, because downstream receiver 122 and core logic 130 operate at different clock frequencies and are not synchronous. Thus, clock domain changer 305, which may be, for example, a buffer, a first-in-first-out (FIFO) queue, other queue, or other suitable temporary storage device, is used to receive data from downstream receiver 122 and to forward that data to core logic 130 independent of the two non-synchronous clocks. Furthermore, if, for example, downstream receiver 122 operates at a 1.6 GHz clock speed and core logic 130 operates at a 400 MHz clock speed, clock domain changer 305 may further multiplex the data signal to maintain the same data rate in the low-speed clock domain. For example, clock domain changer 305 may multiplex the data signal from 64 data lines to 256 data lines to maintain the same data rate in the low-speed clock domain. Clock domain changer 310 may be used similarly to send data from core logic 130 back onto the downstream path.
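A FIFO-based clock domain changer of the kind mentioned above may be modelled, purely for illustration, by the following Python sketch. The class and method names are hypothetical. The model ignores the synchronization circuitry a hardware FIFO would need and shows only the decoupling: the high-speed side writes one 64 bit word per call, and the low-speed side reads four words at a time.

```python
from collections import deque
from typing import Deque, Optional, Tuple

RATIO = 4  # high-speed cycles per low-speed cycle (1.6 GHz / 400 MHz)


class ClockDomainChanger:
    """Toy FIFO written in the high-speed domain and read in the low-speed domain."""

    def __init__(self) -> None:
        self._fifo: Deque[int] = deque()

    def write_fast(self, word64: int) -> None:
        """Called once per high-speed cycle by the receiver side."""
        self._fifo.append(word64)

    def read_slow(self) -> Optional[Tuple[int, ...]]:
        """Called once per low-speed cycle; returns four 64 bit words
        (one 256 bit transfer), or None if fewer than four are queued."""
        if len(self._fifo) < RATIO:
            return None
        return tuple(self._fifo.popleft() for _ in range(RATIO))


changer = ClockDomainChanger()
for word in range(8):          # eight high-speed cycles of incoming data
    changer.write_fast(word)

first = changer.read_slow()    # first low-speed cycle
second = changer.read_slow()   # second low-speed cycle
third = changer.read_slow()    # nothing left to read
print(first, second, third)
```

Neither side waits on the other's clock in this model; each side only inspects the queue, which is the property the description attributes to the clock domain changer.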
Memory core logic 130 passes all relevant memory requests to local memory 132. Memory core logic 130 and local memory 132 are preferably synchronous; however, local memory may be operated at a different clock speed. Accordingly, memory core logic may multiplex, de-multiplex, or use other suitable techniques to transmit and receive data to and from local memory 132.
When a memory read request is received by memory core logic 130, a read response is formulated along with the read data for transmission to memory controller 110. When a memory write request is received by memory core logic 130, a write response is formulated for transmission to memory controller 110 via upstream transmitter 126.
Upstream transmitter 126 takes data from several sources and merges it onto upstream high-speed bus 150. In particular, upstream transmitter 126 merges the data to be transmitted from memory core logic 130 with the incoming data on upstream high-speed bus 150.
In order to perform this task, bus switch 340, controlled by breakpoint logic 315, switches between bypass bus 313, temporary storage buffers 325 and 330, and clock domain interface (CDI) 335, which converts the low-speed clock domain output of core logic 130 to the high-speed clock domain.
Upstream receiver 128 receives data over high-speed bus 150. When memory core logic 130 has no data to be transmitted upstream to memory controller 110, the received data is sent directly to upstream transmitter 126 via bypass bus 313. As in the downstream direction, the data may be sent directly because upstream receiver 128 and transmitter 126 are synchronous.
However, whenever data from core logic 130 is transmitted by upstream transmitter 126, the upstream data provided by receiver 128 is placed in a temporary storage buffer for the duration of the core logic data transmission. In this embodiment, two temporary storage buffers are used: link temporary storage 330 and core temporary storage 325. Link temporary storage 330 stores and re-transmits data at the same clock rate as upstream receiver 128 and upstream transmitter 126 (e.g., 1.6 GHz). Link temporary storage 330 is suited to short-term storage because it operates in the high-speed clock domain, although it can store only a small amount of data. For temporary storage of larger amounts of data, core temporary storage 325 can be used. Core temporary storage 325 stores data at the low speed of core logic 130 (e.g., 400 MHz). At this slower speed, more data can be stored.
Clock domain interface (CDI) 335 is the control and handshaking interface between the low-speed and high-speed clock domains. Core logic 130 sends data to be transmitted to memory controller 110 to CDI 335 via data buses 304 and 306. When CDI 335 is empty, data can be sent directly from core logic 130 via data bus 304. When CDI 335 is full, data from core logic 130 is stored in temporary storage until space is available and then sent to CDI 335 via data bus 311. Breakpoint logic 315 controls the operation of CDI 335 and coordinates the transmittal of CDI data through bus switch 340 into upstream transmitter 126.
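The bypass and buffering behaviour described above may be illustrated with the following Python sketch. It is a simplified model under stated assumptions: the function name is hypothetical, a single queue stands in for both temporary storage buffers, and the arbitration rule (core logic data always takes the transmitter when present) is assumed for the example rather than taken from the description.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional, Tuple

# One entry per high-speed cycle: (word from upstream receiver, word from core logic).
Cycle = Tuple[Optional[str], Optional[str]]


def merge_upstream(cycles: Iterable[Cycle]) -> List[Tuple[str, str]]:
    """Return (source, word) pairs in the order the transmitter sends them."""
    held: Deque[str] = deque()   # stands in for temporary storage
    sent: List[Tuple[str, str]] = []
    for upstream_word, core_word in cycles:
        if core_word is not None:
            # Core logic is transmitting: hold any incoming upstream word.
            if upstream_word is not None:
                held.append(upstream_word)
            sent.append(("core", core_word))
        elif held:
            # Drain held words first so upstream data stays in order.
            if upstream_word is not None:
                held.append(upstream_word)
            sent.append(("storage", held.popleft()))
        elif upstream_word is not None:
            # Nothing pending: pass the word straight through.
            sent.append(("bypass", upstream_word))
    sent.extend(("storage", word) for word in held)  # flush what remains
    return sent


order = merge_upstream([
    ("u0", None),   # bypass
    ("u1", "c0"),   # core transmits, u1 is held
    (None, "c1"),   # core transmits
    ("u2", None),   # u1 leaves storage, u2 is held behind it
    (None, None),   # u2 leaves storage
])
print(order)
```

In this model no upstream word is dropped or reordered while core logic data is being sent, which is the purpose the description gives for the temporary storage buffers.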
When core logic 130 has data to be transmitted upstream to memory controller 110, it activates a core load signal on signal line 304. Data is loaded into any empty registers 405, one at a time, with any remaining data loaded into temporary storage 320. If temporary storage 320 is filled up, CDI 435 may signal core logic 130 to stop sending data until there is more space available in registers 405 and temporary storage 320.
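The loading sequence just described (registers first, then temporary storage, then a stop indication) may be sketched in Python as follows. The class name, the number of registers, and the storage capacity are hypothetical values chosen for the example. The sketch also assumes that new data queues behind any data already in temporary storage so that ordering is preserved.

```python
from collections import deque
from typing import Deque, List, Optional


class CdiInput:
    """Toy model of the low-speed (input) side of the CDI."""

    def __init__(self, num_registers: int = 4, temp_capacity: int = 8) -> None:
        self.registers: List[Optional[int]] = [None] * num_registers
        self.temp_storage: Deque[int] = deque()
        self.temp_capacity = temp_capacity

    def load(self, word256: int) -> bool:
        """Accept one 256 bit word from core logic.

        Returns False when registers and temporary storage are both full,
        i.e. when core logic would be signalled to stop sending.
        """
        if None in self.registers and not self.temp_storage:
            self.registers[self.registers.index(None)] = word256
            return True
        if len(self.temp_storage) < self.temp_capacity:
            self.temp_storage.append(word256)
            return True
        return False


cdi_in = CdiInput(num_registers=2, temp_capacity=1)
results = [cdi_in.load(word) for word in (10, 11, 12, 13)]
print(results, cdi_in.registers, list(cdi_in.temp_storage))
```

Here the first two words fill the registers, the third overflows into temporary storage, and the fourth is refused until space becomes available.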
In one example according to the invention, data from core logic 130 is sent in a 256 data line bus. Registers 405 each receive and store a single 256 bit word. Registers 405 can output each 256 bit word as four separate 64 bit words. As indicated by illustrative boundary line 450, data is received by registers 405 in the low-speed clock domain, but output from registers 405 in the high-speed clock domain.
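The division of one 256 bit register into four 64 bit output words may be illustrated by the following Python sketch. The function name is hypothetical, and the output order (least significant 64 bits first) is an assumption, since the description does not specify it.

```python
from typing import List

WORD_BITS = 64
WORDS_PER_REGISTER = 4
MASK64 = (1 << WORD_BITS) - 1


def split_register(word256: int) -> List[int]:
    """Return the four 64 bit words held in one 256 bit register.

    Assumed order: least significant 64 bits first.
    """
    if not 0 <= word256 < (1 << (WORD_BITS * WORDS_PER_REGISTER)):
        raise ValueError("expected a 256 bit value")
    return [(word256 >> (i * WORD_BITS)) & MASK64 for i in range(WORDS_PER_REGISTER)]


register_value = (4 << 192) | (3 << 128) | (2 << 64) | 1
parts = split_register(register_value)
print(parts)
```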
When breakpoint logic 315 selects CDI 435 to output data to upstream transmitter 126, it sends control signals to output control 430. Output control 430 uses counters and control signals to select one 64 bit word from registers 405 for each high-speed clock cycle. Beginning at first register 405, four 64 bit words are removed, emptying the 256 bit register. As subsequent registers 405 are emptied, the empty registers 405 are refilled with any remaining data from temporary storage 320 and core logic 130.
Output control 430 coordinates filling and emptying of registers 405 by controlling output multiplexer 420 to cycle through the outputs of registers 405. Output control 430 also sends a control signal to input control 415 to coordinate the filling of empty registers 405. Input control 415 controls input multiplexer 410 to select temporary storage 320 or core logic 130 as the data source and to select the register 405 in which to load the data.
CDI 435 allows any number of 64 bit words to be removed from registers 405, which are filled in the low-speed clock domain. Upstream transmitter 126 may receive a certain number of data words from core logic 130, then switch to another data source (e.g., received upstream data), and return to data from core logic 130. All of this may be done in the high-speed clock domain and the two data sources may be seamlessly interleaved.
The low-speed clock domain is not affected by or dependent on the high-speed clock domain. As long as registers 405 and temporary storage 320 are not full, core logic 130 provides data to CDI 435 in the low-speed clock domain, and CDI 435 outputs that data in the high-speed clock domain. Thus, the task of supplying data from the low-speed domain and extracting it to the high-speed domain is advantageously accomplished with little high-speed logic. The data stream of core logic 130 will not be interrupted and the data supplied to the high-speed clock domain will not have to stall because of handshaking between the two clock domains. Further, less required logic in the high-speed clock domain advantageously permits larger temporary storage in the low-speed clock domain and has less stringent timing requirements.
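The interplay of the output and input controls may be illustrated by the following Python sketch. It is a software model under stated assumptions, not the described circuit: the class and attribute names are hypothetical, each register is modelled as a list of four 64 bit words, a single queue stands in for both temporary storage and core logic as data sources, and registers are filled and emptied in round-robin order.

```python
from collections import deque
from typing import Deque, List, Optional

WORDS_PER_REGISTER = 4


class CdiOutput:
    """Toy model of register sequencing on the high-speed (output) side."""

    def __init__(self, num_registers: int = 2) -> None:
        self.registers: List[Optional[List[int]]] = [None] * num_registers
        self.pending: Deque[List[int]] = deque()  # data waiting for a register
        self.fill_index = 0   # next register the input side will fill
        self.reg_index = 0    # register currently selected for output
        self.word_index = 0   # 64 bit word selected within that register

    def refill(self) -> None:
        """Input-control step: load pending data into empty registers in order."""
        while self.pending and self.registers[self.fill_index] is None:
            self.registers[self.fill_index] = self.pending.popleft()
            self.fill_index = (self.fill_index + 1) % len(self.registers)

    def next_word(self) -> Optional[int]:
        """One high-speed cycle: return the next 64 bit word, or None if empty."""
        current = self.registers[self.reg_index]
        if current is None:
            return None
        word = current[self.word_index]
        self.word_index += 1
        if self.word_index == WORDS_PER_REGISTER:
            # Register emptied: free it, advance, and request a refill.
            self.registers[self.reg_index] = None
            self.word_index = 0
            self.reg_index = (self.reg_index + 1) % len(self.registers)
            self.refill()
        return word


cdi_out = CdiOutput(num_registers=2)
cdi_out.pending.extend([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
cdi_out.refill()
words = [cdi_out.next_word() for _ in range(13)]
print(words)
```

Because `next_word` may be called any number of times before the caller turns to another data source, the model also reflects the ability to remove an arbitrary number of 64 bit words at a time. In hardware the refill would occur in the low-speed clock domain; calling it from `next_word` is a simplification of this sketch.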
Returning to
Thus it is seen that systems and methods for storage and transfer of data between two clock domains are provided that do not stall or interrupt logic or data movement in either respective clock domain. One skilled in the art will appreciate that the invention can be practiced by other than the described embodiments, which are presented for purposes of illustration and not of limitation, and the present invention is limited only by the claims which follow. For example, although the memory hub system is primarily depicted and described herein as having a high-speed clock domain that is four times the speed of the low-speed clock domain, it will be understood that the design of the memory hub system may be modified to have a different ratio between the two clock domains (e.g., the high-speed clock domain may be twice to ten times the speed of the low-speed clock domain).