The present invention is related to the co-pending application entitled, Multi Port Memory Controller Queuing, attorney docket number ROC920070593US1.
The present invention generally relates to a memory controller, and more particularly, to a method, apparatus, and program product for improved queuing in a memory controller wherein at least one of the memory ports is not utilized (i.e., there is no DIMM or other type of memory module associated with, or installed at, the at least one memory port).
Since the dawn of the computer age, computer systems have evolved into extremely sophisticated devices that may be found in many different settings. Computer systems typically include a combination of hardware (e.g., semiconductors, circuit boards, etc.) and software (e.g., computer programs). One key component in any computer system is memory.
Modern computer systems typically include dynamic random-access memory (DRAM). DRAM is different from static RAM in that its contents must be continually refreshed to avoid losing data. A static RAM, in contrast, maintains its contents as long as power is present, without the need to refresh the memory. This maintenance of memory in a static RAM comes at the expense of additional transistors for each memory cell that are not required in a DRAM cell. For this reason, DRAMs typically have densities significantly greater than static RAMs, thereby providing a much greater amount of memory at a lower cost than is possible using static RAM.
It is increasingly common in modern computer systems to utilize a chipset with multiple memory controller (MC) ports, each memory port being associated (i.e., contained in, connected to, etc.) with the necessary queue structures for memory read and write commands. During the high-level architecture/design process, queuing analysis is typically performed to determine the queue structure sizes necessary for the expected memory traffic. In this analysis, it is also determined at which point a full indication must be given to stall the command traffic to avoid a queue structure overflow condition. This is accomplished by determining the maximum number of commands that the queue structure must accept even after the queue structure asserts that it is full. Herein, queue structures (i.e., registers, queue systems, queue mechanisms, etc.) are referred to as queues.
As the number of commands that a queue must sink during a given clock cycle increases, the number of commands the queue must sink after asserting that it is nearly full also increases. For example, if a queue sinks only 1 command per cycle and the pipeline feeding the queue is 3 clock cycles, then the queue needs to be able to sink up to 3 possible commands in the pipeline after asserting that it is nearly full. If the queue sinks up to 4 commands per cycle and the pipeline feeding the queue is 3 clock cycles, then the queue needs to be able to sink up to 12 possible commands after asserting that it is nearly full. Without sufficient queue depth, the full assertion will stall command traffic much more frequently, resulting in adverse system performance effects.
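The reserve calculation described above can be sketched as follows; this is an illustrative model, not part of the invention, and the function names and the 16-entry queue depth in the example are assumptions for illustration only.

```python
def reserved_entries(max_commands_per_cycle: int, pipeline_depth_cycles: int) -> int:
    """Entries a queue must keep free when asserting 'nearly full':
    every stage of the feeding pipeline may already hold a full
    burst of in-flight commands that must still be sunk."""
    return max_commands_per_cycle * pipeline_depth_cycles

def nearly_full_threshold(queue_depth: int, max_commands_per_cycle: int,
                          pipeline_depth_cycles: int) -> int:
    """Occupancy at which the nearly-full signal must assert so that
    no in-flight command overflows the queue."""
    return queue_depth - reserved_entries(max_commands_per_cycle,
                                          pipeline_depth_cycles)

# The two cases from the text, each with a 3-cycle pipeline feeding the queue:
print(reserved_entries(1, 3))   # up to 3 commands still in flight
print(reserved_entries(4, 3))   # up to 12 commands still in flight

# With a hypothetical 16-entry queue sinking 4 commands per cycle,
# nearly-full must assert at only 4 occupied entries:
print(nearly_full_threshold(16, 4, 3))
```

Note how a higher sink rate forces the nearly-full assertion at a much lower occupancy, which is why the unbalanced configuration discussed below degrades performance without additional queue depth.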
In a computer system having at least two memory ports, system performance is optimized when the pair of memory ports is populated in a balanced configuration. This results in at least two queues being utilized and the memory accesses being distributed relatively evenly across the pair of queues. If one or more of the available memory ports are not populated, the populated port's queue(s) must handle the additional load. This may result in the populated port's queues having to sink additional commands per clock cycle. Sinking more commands per cycle results in having to assert the nearly full condition when the queue is less full. This is done to leave room for more commands that may be in flight to the memory controller (i.e., mainline flow, etc.).
To realize sufficient system performance in a non-balanced configuration, queue size may be increased to minimize the frequency of queue full conditions. These additional queue entries may not be required in a balanced configuration. The additional queue entries may result, for example, in increased chip area, increased complexity for selecting commands from the queue, increased capacitive loading, increased wiring congestion and wire lengths, etc. These factors can make it difficult to perform all necessary functions in the desired period of time, which may ultimately result in adding additional clock cycles to the memory latency, which will adversely affect system performance.
The present invention is generally directed to a method, system, and program product wherein at least two memory ports are contained within a memory controller, the memory controller being capable of being arranged in an unbalanced memory configuration (i.e., one populated memory module adjacent to an absent memory module, etc.). In an embodiment of the invention a command is transferred between the two memory ports. In other embodiments a command is transferred from a first memory port to a second memory port. In certain embodiments this may effectively expand the functional queue sizes in unbalanced memory configurations.
In a particular embodiment, a first memory port may become unable to sink commands (i.e., if the queue in the first memory port becomes full) and a second memory port may have availability (i.e., excess capacity, capacity to accept a new command, etc.) to sink commands. In a particular embodiment the second memory port may accept excess commands (i.e., commands otherwise accepted by the first memory port if the first memory port was available, etc.). In another embodiment, when the first memory port has availability after a period of non-availability, and there are excess commands in the second memory port, the excess commands are transferred to the first memory port. In another embodiment, when the first memory port has availability after a period of non-availability, and there are no excess commands in the second memory port, the first memory port may accept commands, for example, from the mainline command flow. In certain embodiments, the transferring of excess commands effectively enlarges the first memory port's queue depth, allowing for an improved system performance effect.
So that the manner in which the above recited features of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings.
It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
The present invention relates to a memory controller for processing data in a computer system. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the embodiments and the generic principles and features described herein will be readily apparent to those skilled in the art. Thus, the present invention is not intended to be limited to the embodiments shown but is to be accorded the widest scope consistent with the principles and features described herein.
The memory controller 104 may be coupled to a local memory (e.g., one or more DRAMs, DIMMs, or any such alternate memory module) 214. The memory controller 104 may include a plurality of memory ports (i.e., first memory port 131, second memory port 132, third memory port 133, and fourth memory port 134) for coupling to the local memory 214. For example, each memory port 131-134 may couple to a respective memory module (e.g., DRAM, DIMM, or any such memory module) 120-126 included in the local memory 214. In other words, memory modules may be populated into computer system 100. Although the memory controller 104 includes four memory ports, a larger or smaller number of memory ports may be employed. The memory controller 104 is adapted to receive requests for memory access and service such requests. While servicing a request, the memory controller 104 may access one or more memory ports 131-134. In alternate embodiments, the memory controller 104 may include any suitable combination of logic, registers, memory or the like, and in at least one embodiment may comprise an application specific integrated circuit (ASIC).
Memory controller 104 comprises logic and control (i.e., 106, 107, and 108), a first memory port 131, and a second memory port 132, herein referred to as memory port 131 and memory port 132 respectively. A queue 110 is associated (i.e., contained in, connected to, linked to, etc.) with memory port 131, and queue 111 is associated with memory port 132. Memory controller 104 receives commands from processors 204 and writes those commands to local memory 214. In a particular embodiment these commands may be altered (e.g., reformatted to a correct command format to allow the command to sink) in memory controller 104, resulting in related commands, rather than the actual commands from processors 204, being written to memory module 120.
In a particular embodiment, there are numerous memory ports within memory controller 104, though only two are shown in
Upon memory controller 104 receiving commands from at least one processor, the commands are routed, processed, or otherwise controlled by logic and control 106. Logic and control 106 is an element that controls to which memory port command(s) shall be routed. Logic and control 107 is an element that controls which command enters a queue. Logic and control 108 is an element that controls the routing of a command exiting a queue. Though only one of each logic and control 107 and 108 is shown, in other embodiments multiple logic and controls 107 and 108 may be utilized. In still other embodiments logic and control 106, 107, and 108 may be combined or otherwise organized.
Memory module 120 may be utilized, receiving commands from memory controller 104. In contrast, memory module 122 is unutilized and is not receiving commands from memory controller 104. This configuration is an example of an unbalanced memory configuration. In prior designs, because memory module 122 was unutilized, memory port 132 did not accept commands.
In a particular embodiment, after some time of operation, each queue entry 1101-110n is full, is giving a nearly full signal, is slowing in accepting new commands, or is not accepting new commands. In many instances, one or more commands are directed to queue 110 when queue 110 is full/nearly full. These one or more commands are herein referred to as excess commands, and this situation is referred to as an excess situation. In previous designs these excess commands were not routed through the memory port until the queue 110 had sunk a command, or had otherwise gained capacity to accept an excess command.
In accordance with the present invention, instead of waiting for queue 110 to sink a command (i.e., for queue 110 to no longer be full), the excess commands are written to queue 111 and subsequently transferred to queue 110. The excess commands are written to queue 111 until queue 111 is itself full or until queue 110 is no longer full. Upon queue 110 no longer being full, the one or more excess commands are transferred from queue 111 to queue 110. In a particular embodiment, if both queue 110 and queue 111 are full, no other new commands can be sunk by the queues 110 and 111. In another embodiment, command prioritization may be utilized to affect how the commands are routed through the multiple memory ports.
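The spill-and-transfer behavior described above may be sketched in software as follows. This is a minimal illustrative model only, not the invention's hardware implementation; the class and function names, the FIFO ordering, and the queue depths in the usage example are all assumptions for illustration.

```python
from collections import deque

class PortQueue:
    """Toy model of a memory port's command queue (e.g., queue 110 or 111)."""
    def __init__(self, depth: int):
        self.depth = depth
        self.entries = deque()

    def full(self) -> bool:
        return len(self.entries) >= self.depth

    def sink(self, cmd) -> None:
        assert not self.full()
        self.entries.append(cmd)

    def drain(self):
        return self.entries.popleft()

def route(cmd, q110: PortQueue, q111: PortQueue) -> str:
    """Route a mainline command destined for the populated port (queue 110).
    When queue 110 is full, spill the excess command into the idle port's
    queue 111; when both are full, command traffic must stall."""
    if not q110.full():
        q110.sink(cmd)
        return "q110"
    if not q111.full():
        q111.sink(cmd)      # excess command held in queue 111
        return "q111"
    return "stall"          # both queues full: no new commands can be sunk

def on_q110_capacity(q110: PortQueue, q111: PortQueue) -> None:
    """When queue 110 gains capacity, transfer excess commands back
    from queue 111 before accepting new mainline commands."""
    while not q110.full() and q111.entries:
        q110.sink(q111.drain())
```

For example, with hypothetical 2-entry queues, a third command spills to queue 111 ("q111"), and once queue 110 drains an entry, the excess command transfers back, leaving queue 111 empty.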
Queue-to-queue interface 150 logically connects queue 110 and queue 111. Queue-to-queue interface 150 is a subsystem (i.e., a bus, a wide bus, etc.) that transfers data stored in one queue to another queue. In a particular embodiment multiple queue-to-queue interfaces 150 are utilized to connect queues 110 and 111. When queue 110 is no longer full, the excess command(s) (if present in queue 111) are transferred from queue 111 to an empty queue entry/entries 1101-1105. In the embodiment shown in
The accompanying figures and this description depict and describe embodiments of the present invention, and features and components thereof. Those skilled in the art will appreciate that any particular program nomenclature used in this description is merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature. Thus, for example, the routines executed to implement the embodiments of the invention, whether implemented as part of an operating system or a specific application, component, program, module, object, or sequence of instructions, could have been referred to as a “program”, “application”, “server”, or other meaningful nomenclature. Therefore, it is desired that the embodiments described herein be considered in all respects as illustrative, not restrictive, and that reference be made to the appended claims for determining the scope of the invention.