The present invention relates generally to the data processing field, and more particularly, relates to a method and apparatus for implementing redundant memory access using multiple controllers for a memory system, and a design structure on which the subject circuit resides.
A related United States patent application assigned to the present assignee is being filed on the same day as the present patent application including:
U.S. patent application Ser. No. ______, by Gerald Keith Bartley, and entitled “IMPLEMENTING CACHE COHERENCY AND REDUCED LATENCY USING MULTIPLE CONTROLLERS FOR MEMORY SYSTEM”.
In today's server systems, the loss of data due to a component or power failure can be devastating to a business's operations. The ability to fail over components of the server system and applications is critical to the successful implementation of multi-processor systems.
Conventional processor-to-memory architectures utilize data coherency models that require each processor to have a single access point to either its own dedicated memory, or a bank of memory shared among many processors.
Typically cache coherence requirements prohibit simply connecting another processor to a bank of memory. For example, in a simple case such as a multiprocessor system, if one processor has requested a block of data for an operation, another processor cannot use the same data until the first one has completed its operation and returned the data to the memory bank, or invalidated the data in the memory. This requirement can be avoided by allowing each controller to independently maintain its own segregated memory bank, such as illustrated in the prior art memory system of
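The coherence rule described above can be sketched, purely for illustration, in C; the state names, structure, and functions below are hypothetical and are not part of any prior art or claimed design, but show why a second processor must wait until the first has returned or invalidated the block.

```c
#include <stdbool.h>

enum block_state { BLOCK_IDLE, BLOCK_OWNED, BLOCK_INVALID };

struct mem_block {
    enum block_state state;
    int owner;              /* id of the controller currently holding the block */
};

/* Returns true only if 'requester' may use the block now. */
static bool may_access(const struct mem_block *blk, int requester)
{
    if (blk->state != BLOCK_OWNED)
        return true;                    /* no outstanding owner */
    return blk->owner == requester;     /* otherwise only the owner may reuse it */
}

/* The owner releases the block by returning it to the bank or invalidating it. */
static void release_block(struct mem_block *blk)
{
    blk->state = BLOCK_IDLE;
    blk->owner = -1;
}
```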
In the case where each processor is given a dedicated memory space, a failure of the processor can lead to the loss of data, both in the on-chip caches, and in the mainstore memory.
U.S. patent application Ser. No. 11/758,732 filed Jun. 6, 2007, and assigned to the present assignee, discloses a method and apparatus for implementing redundant memory access using multiple controllers on the same bank of memory. A first memory controller uses the memory as its primary address space, for storage and fetches. A second redundant controller is also connected to the same memory. System control logic is used to notify the redundant controller of the need to take over the memory interface. The redundant controller initializes if required and takes control of the memory. The memory only needs to be initialized if the system has to be brought down and restarted in the redundant mode.
While the above-identified patent application provides improvements over the prior art arrangements, there is no simultaneous access of the memory by more than one controller. When a primary controller fails, the redundant controller assumes full control and access to the memory, providing an alternate access path to the memory.
It is highly desirable to be able to allow multiple controllers to quickly and efficiently gain access to memory, and provide enhanced fail-over performance. A need exists for an effective mechanism that enables implementing redundant memory access using multiple controllers for a memory system.
Principal aspects of the present invention are to provide a method and apparatus for implementing redundant memory access using multiple controllers on first and second daisy chains of memory for a memory system, and a design structure on which the subject circuit resides. Other important aspects of the present invention are to provide such method and apparatus for implementing redundant memory access using multiple controllers on first and second daisy chains of memory substantially without negative effect and that overcome many of the disadvantages of prior art arrangements.
In brief, a method and apparatus for implementing redundant memory access using multiple controllers for a memory system, and a design structure on which the subject circuit resides are provided. A first memory and a second memory are connected to multiple memory controllers. A first memory controller uses the first memory as its primary address space, for storage and fetches. A second memory controller is also connected to the first memory. The second memory controller uses the second memory as its primary address space, for storage and fetches. The first memory controller is also connected to the second memory. The first memory controller and the second memory controller, for example, are connected together by a processor communications bus. When one of the first memory controller or the second memory controller fails, then the other memory controller is notified. The redundant memory controller takes control of the memory for the failed controller, using the direct connection to that memory. The redundant memory controller maintains coherence of both the first memory and the second memory.
The present invention together with the above and other objects and advantages may best be understood from the following detailed description of the preferred embodiments of the invention illustrated in the drawings, wherein:
In accordance with features of the invention, a method and apparatus enable implementing redundant memory access using multiple controllers for a memory system, while maintaining current conventional cache coherence schemes.
Having reference now to the drawings, in
Memory system 200 is a dynamic random access memory (DRAM) system 200. DRAM system 200 includes a first processor or memory controller (MC 1) 204 and a second processor or memory controller (MC 2) 206. The first memory controller MC 1, 204 and the second redundant memory controller MC2, 206 each include, for example, an integrated microprocessor and memory controller, such as a processor system in a package (SIP).
Each of the two controllers MC1, 204 and MC2, 206 includes dedicated memory. The first processor or memory controller MC1, 204 includes a data path 1 and a primary memory control path 1 to a chain of memory chips or modules 208, such as dynamic random access memory (DRAM) chips or modules 208. The second processor or memory controller MC2, 206 includes a data path 2 and a primary memory control path 2 to a separate chain of memory chips or modules 210, such as dynamic random access memory (DRAM) chips or modules 210. The memory controllers MC1, 204 and MC2, 206 are connected together by a processor communications bus 212.
In accordance with features of the invention, in addition to the connection of each controller MC1, 204; MC2, 206 to its own bank of memory 208, 210, an additional through bus connection is made to the other controller MC1, 204; MC2, 206. The data path 1 and the primary memory control path 1 to the chain of memory 208 extend to the other controller MC2, 206. The data path 2 and the primary memory control path 2 to the chain of memory 210 extend to the other controller MC1, 204. This through bus is a full-width data interface, just like the interface to the primary controller.
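Purely as an illustrative sketch, and not as a description of the claimed circuit itself, the cross-connected arrangement described above can be modeled in C as follows; the structure and function names are hypothetical.

```c
struct memory_chain {
    const char *label;               /* e.g. "DRAM chain 208" or "DRAM chain 210" */
};

struct memory_controller {
    int id;
    struct memory_chain *primary;    /* dedicated chain: 208 for MC1, 210 for MC2 */
    struct memory_chain *secondary;  /* the other controller's chain               */
    struct memory_controller *peer;  /* reached over processor communications bus 212 */
};

/* Cross-connect MC1 and MC2 so that each controller sees both chains directly. */
static void connect_controllers(struct memory_controller *mc1,
                                struct memory_controller *mc2,
                                struct memory_chain *chain1,
                                struct memory_chain *chain2)
{
    mc1->primary = chain1;  mc1->secondary = chain2;  mc1->peer = mc2;
    mc2->primary = chain2;  mc2->secondary = chain1;  mc2->peer = mc1;
}
```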
Referring also to
Memory system 300 is a dynamic random access memory (DRAM) system 300. DRAM system 300 includes a control logic circuit 302 connected to each of a first processor or memory controller (MC 1) 304 and a second processor or memory controller (MC 2) 306. Optionally the memory controllers MC1, 304 and MC2, 306 are connected together by a processor communications bus.
Each of the memory controllers MC 1, MC 2, 304, 306 optionally can be physically included with a respective processor within a processor package or system in a package (SIP).
For example, the first memory controller MC 1, 304 includes dedicated memory chips or modules 308, and the second memory controller MC 2, 306 includes dedicated memory chips or modules 310. The control logic circuit 302 is provided to notify the other memory controller of a failed memory controller, and to send requests between the memory controllers MC 1, MC 2, 304, 306 and to notify them of changed data, in order to maintain cache coherency rules.
Each of the memory controllers MC 1, MC 2, 304, 306 is connected to a memory buffer 312 via northbound (NB) and southbound (SB) lanes. Memory buffer 312 is coupled to the plurality of DRAMs 308, 310, arranged, for example, as dual inline memory module (DIMM) circuit cards. Memory system 300 is a fully-buffered DIMM (FBDIMM) memory system.
Exemplary operation of the memory system 200 and the memory system 300 is illustrated and described with respect to the exemplary steps shown in the flow chart of
During normal system operation, memory system 200 and memory system 300 have the ability to receive data directly from the memory of another controller. The request and send sequence of the method of the invention sends the data directly to the requesting memory controller and eliminates the need to re-route data back through the responding controller, improving the latency of the data transfer. By avoiding the transfer through the responding controller, bandwidth through the responding controller advantageously is saved for other transfers, further improving and optimizing performance. In a more complicated sequence, the responding controller advantageously determines which path is lower latency, either routing back through the primary controller, or moving the data upstream directly to the requesting controller. Each memory controller maintains coherence of its dedicated memory, according to current conventional methods.
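The path-selection decision described above can be sketched, for illustration only, as a simple comparison of measured latencies; the names and the nanosecond units below are assumptions, not part of the design.

```c
enum return_path { VIA_PRIMARY_CONTROLLER, DIRECT_TO_REQUESTER };

/* Pick the lower-latency return path for a remote fetch; the latency
 * inputs would come from the responding controller's own measurements. */
static enum return_path choose_return_path(unsigned latency_via_primary_ns,
                                           unsigned latency_direct_ns)
{
    return (latency_direct_ns <= latency_via_primary_ns)
               ? DIRECT_TO_REQUESTER
               : VIA_PRIMARY_CONTROLLER;
}
```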
In accordance with features of the invention, memory system 200 and memory system 300 provide the ability for each of the memory controllers MC 1, MC 2 to take control of the memory for a failed controller, using the direct connection to that memory, and to maintain coherence of both the first memory and the second memory. When one of the first memory controller or the second memory controller fails, then the other memory controller is notified, for example, by the failing memory controller or by control logic coupled to the memory controllers MC 1, MC 2.
Referring now to
As indicated at a block 404, the second memory controller MC 2 routes the request to the second memory to send data to the first memory controller MC1. The MC 2 routes the request to the second memory, such as memory 210 in
As indicated at a block 406, the second memory sends the data directly to the first memory controller MC1. The first memory controller MC 1 notifies the second controller MC 2 of any change to the data for cache coherence requirements as indicated at a block 408.
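An illustrative sketch of the request-and-send sequence of blocks 404 through 408 follows; all structure and function names are hypothetical, and the sketch models only the data movement, not the underlying bus hardware.

```c
#include <stdbool.h>

struct cache_line { unsigned char bytes[64]; };
struct dram_chain { struct cache_line *lines; unsigned long nlines; };

struct controller {
    struct dram_chain *own;      /* primary memory of this controller          */
    struct controller *peer;     /* the other controller (bus 212 / logic 302) */
};

/* Blocks 404-406: the peer routes the request to its memory, and the line
 * travels over the direct connection into the requester's buffer. */
static bool fetch_from_peer(struct controller *requester, unsigned long idx,
                            struct cache_line *out)
{
    struct dram_chain *remote = requester->peer->own;
    if (idx >= remote->nlines)
        return false;
    *out = remote->lines[idx];
    return true;
}

/* Block 408: after modifying the line, the requester returns it and notifies
 * the peer so the peer can update or invalidate any cached copy. */
static void writeback_to_peer(struct controller *requester, unsigned long idx,
                              const struct cache_line *line)
{
    requester->peer->own->lines[idx] = *line;
    /* A real design would also send an explicit coherence message here. */
}
```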
As indicated at a block 410, one of the memory controllers MC 1 or MC 2 fails or is not able to access memory, for example, when excessive time-outs or re-reads occur. The failed memory controller MC 1 or MC 2, or control logic 302, notifies the other memory controller MC 2 to take control of the first memory, or MC 1 to take control of the second memory, as indicated at a block 412. Then the other controller MC 2 takes control of the first memory using the direct connection to the first memory, providing enhanced fail-over performance as compared to prior art redundancy arrangements as indicated at a block 414. Alternatively the other controller MC 1 takes control of the second memory using the direct connection to the second memory, providing enhanced fail-over performance at block 414, when the second memory controller MC 2 failed at block 410. Then at block 414, the memory controller MC 2, or memory controller MC 1, controls both the first memory and the second memory, and maintains coherence of both the first memory and the second memory, according to current conventional methods.
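The fail-over sequence of blocks 410 through 414 can likewise be sketched for illustration; the state flags and function below are hypothetical stand-ins for the controller and control logic behavior described above.

```c
#include <stdbool.h>

struct mc_state {
    bool failed;
    bool controls_first_memory;
    bool controls_second_memory;
};

/* Block 412: the failing controller, or control logic 302, notifies the
 * surviving controller.  Block 414: the survivor takes control of the failed
 * controller's memory over its direct connection and from then on maintains
 * coherence of both memories. */
static void fail_over(struct mc_state *failed, struct mc_state *survivor)
{
    failed->failed = true;
    failed->controls_first_memory = false;
    failed->controls_second_memory = false;

    survivor->controls_first_memory = true;    /* via direct connection */
    survivor->controls_second_memory = true;   /* via direct connection */
}
```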
Design process 504 may include using a variety of inputs; for example, inputs from library elements 508 which may house a set of commonly used elements, circuits, and devices, including models, layouts, and symbolic representations, for a given manufacturing technology, such as different technology nodes, 32 nm, 45 nm, 90 nm, and the like, design specifications 510, characterization data 512, verification data 514, design rules 516, and test data files 518, which may include test patterns and other testing information. Design process 504 may further include, for example, standard circuit design processes such as timing analysis, verification, design rule checking, place and route operations, and the like. One of ordinary skill in the art of integrated circuit design can appreciate the extent of possible electronic design automation tools and applications used in design process 504 without deviating from the scope and spirit of the invention. The design structure of the invention is not limited to any specific design flow.
Design process 504 preferably translates an embodiment of the invention as shown in
While the present invention has been described with reference to the details of the embodiments of the invention shown in the drawing, these details are not intended to limit the scope of the invention as claimed in the appended claims.