1. Field of the Invention
This invention relates to computer system memory and, more particularly, to prefetching data in a serial memory subsystem topology.
2. Description of the Related Art
Many computer systems employ a main system memory that may be configured dependent upon the needs of an end user. In such systems, a motherboard or system board may include a number of memory expansion sockets. One or more small circuit boards, referred to as memory modules, may be inserted into the sockets as needed to increase the memory capacity of the computer system. Each of the memory modules typically includes multiple memory devices that provide a given amount of memory capacity. The memory devices are usually implemented using some type of dynamic random access memory (DRAM). Some examples of DRAM types include synchronous DRAM (SDRAM) as well as the various types of double data rate SDRAM (DDR SDRAM).
In conventional computer systems, the memory modules are connected to a memory/DRAM controller via a memory bus that includes address, control and data signals. In some computer systems, the address, control and data signals may be multiplexed and thus share the same sets of wires. In other computer systems, the address, control and data signals may use separate wires. In either case, each of the address and control signals is routed to each expansion socket such that the memory modules, when inserted, are connected in parallel to the memory/DRAM controller. In some systems the memory/DRAM controller may reside on the same integrated circuit (IC) chip as the system processor, while in other systems the memory/DRAM controller may reside in one IC (e.g., a Northbridge) of a chipset.
Although the operating speed of computer system processors continues to increase, the relative performance of the main system memory has not increased at the same rate. This may be due, at least in part, to the incremental improvement in the bandwidth of the system memory architectures described above.
Various embodiments of a prefetch mechanism for use in a system including a host coupled serially to a plurality of memory modules are disclosed. In one embodiment, the system includes a host coupled to a serially connected chain of memory modules. The host includes a memory controller that may be configured to issue a memory read request for data stored within the memory modules. The memory controller may further request that data be prefetched from the memory modules by encoding prefetch information within the memory read request.
In one specific implementation, each of the memory modules may include a memory control hub that may control access to a plurality of memory chips on the memory module. In addition, each of the memory modules may include a DRAM controller configured to generate a memory read cycle to the memory chips in response to receiving a memory read command having a memory address that matches a memory address associated with the memory control hub.
In another specific implementation, the prefetch information may include prefetch hint information that indicates whether or not to prefetch data and prefetch stride information that indicates a number of addresses skipped between accesses to the memory modules.
In another specific implementation, each memory module includes storage for prefetch data returned from the memory chips located on that memory module.
In still another specific implementation, the memory controller may be configured to issue a memory write request to write data to the memory modules and to selectively request that one or more pages of memory within a given one of the memory modules remain open by encoding the prefetch information within the memory write request.
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. Note, the headings are for organizational purposes only and are not meant to be used to limit or interpret the description or claims. Furthermore, note that the word “may” is used throughout this application in a permissive sense (i.e., having the potential to, being able to), not a mandatory sense (i.e., must). The term “include” and derivations thereof mean “including, but not limited to.” The term “connected” means “directly or indirectly connected,” and the term “coupled” means “directly or indirectly coupled.”
Turning now to
In the illustrated embodiment, memory module 150A includes a memory control hub 160A, which is coupled to a plurality of memory devices that are designated memory chip 171A through 171N, where N may be any number, as desired. In one embodiment, memory control hub 160A may be coupled to the memory chips via any type of memory interconnect. For example, in one embodiment, the memory interconnect may be a typical address, control and data bus configuration.
Similarly, memory module 150B includes a memory control hub 160B, which is coupled to a plurality of memory devices that are designated memory chip 181A through 181N, where N may be any number, as desired. In one embodiment, memory control hub 160B may be coupled to the memory chips via any type of memory interconnect as described above. It is noted that each of memory chips 171A through 171N and 181A through 181N may be any type of memory device such as a memory device in the DRAM family of memory devices, for example.
In the illustrated embodiment, memory links 110A-110C form a memory interconnect. In one embodiment, each of memory links 110A-110C forms a point-to-point memory interconnect that is implemented as two sets of unidirectional lines. One set of unidirectional lines is referred to as a downlink and is configured to convey transactions away from host 100 in a downstream direction. The other set of unidirectional lines is referred to as an uplink and is configured to convey transactions toward host 100 in an upstream direction. In addition, in one embodiment, each set of unidirectional lines may be implemented using a plurality of differential signal pairs. In one embodiment, each memory link 110 includes an 18-bit downlink and a 16-bit uplink, where each bit is a differential signal pair. As will be described in greater detail below in conjunction with the descriptions of
Generally speaking, all transactions from host 100 flow downstream through all memory modules 150 on the downlink and all response transactions flow upstream from the responding memory module 150 through each upstream memory module 150 on the uplink. More particularly, in one embodiment, host 100 may request to retrieve or store data within system memory 125. In response to host 100 making a request, memory controller 105 initiates a corresponding transaction such as a memory read transaction or a memory write transaction, for example. Memory controller 105 transmits the transaction to system memory 125 via memory link 110A. In the illustrated embodiment, the transaction is received by memory control hub 160A of memory module 150A.
In response to receiving the transaction, memory control hub 160A is configured to transmit the received transaction to memory module 150B via memory link 110B without decoding the transaction. This is referred to as forwarding the transaction downstream. Thus, each transaction received on a downlink by a given memory control hub 160 of a given memory module 150 is forwarded to the next memory module 150 in the chain that is coupled to the downlink without decoding the transaction. In one embodiment, decoding of the transaction may occur in parallel with the forwarding of the transaction. In other embodiments, the decoding of the transaction may occur after the transaction has been forwarded. A more detailed description of the downstream forwarding function may be found below in the description of
Likewise, if memory controller 105 initiates a read request transaction, for example, the memory module 150 having the memory location corresponding to the address in the request will respond with the requested data. The response will be transmitted on the memory module's uplink toward host 100. If there are any intervening memory modules between the responding memory module and host 100, each intervening memory module will forward the response transaction on its uplink to either host 100 or the next memory module in the chain in an upstream direction. In addition, when the responding memory module is ready to send the response, it may inject the response into a sequence of transactions that are being forwarded upstream on the uplink. A more detailed description of the upstream forwarding function may be found below in the description of
In one embodiment, memory controller 105 may be configured to make requests to system memory 125 without knowledge of which of memory modules 150A and 150B is associated with a particular address. For example, each of memory modules 150 may be assigned a range of memory addresses during a system configuration sequence. Each memory control hub 160 may include logic (not shown in
In addition, in one embodiment, memory controller 105 may initiate a subsequent memory access request prior to receiving a response to a previous memory access request. In such an embodiment, memory controller 105 may keep track of outstanding requests and may thus process the responses in a different order than they were sent.
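By way of illustration, the following C sketch shows one way a host-side controller might track outstanding requests so that responses can be matched to requests even when they return out of order. The tag field, table depth, and function names are assumptions for illustration; the text does not specify how requests are tracked.

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_OUTSTANDING 16  /* illustrative depth, not specified in the text */

struct outstanding_req {
    bool     valid;
    uint64_t address;   /* 40-bit DRAM address of the request */
    uint8_t  tag;       /* tag echoed back in the response */
};

static struct outstanding_req pending[MAX_OUTSTANDING];

/* Record a request before issuing it; returns the tag to embed in the packet. */
int issue_request(uint64_t address)
{
    for (int i = 0; i < MAX_OUTSTANDING; i++) {
        if (!pending[i].valid) {
            pending[i] = (struct outstanding_req){ true, address, (uint8_t)i };
            return i;
        }
    }
    return -1; /* all slots busy: stall until a response retires an entry */
}

/* Match a response to its request, whatever order responses arrive in. */
bool retire_response(uint8_t tag, uint64_t *address_out)
{
    if (tag < MAX_OUTSTANDING && pending[tag].valid) {
        *address_out = pending[tag].address;
        pending[tag].valid = false;
        return true;
    }
    return false; /* unexpected tag */
}
```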
Further, in the illustrated embodiment, memory controller 105 includes a prefetch unit 107 configured to provide prefetch information for prefetching data from addresses that correspond to a current memory read request. As will be described in greater detail below, prefetch unit 107 may predict which address may be fetched next. The prefetch unit 107 may calculate the prefetch address or addresses and may provide prefetch hints that may be encoded and embedded within the memory request packets. Prefetch unit 107 may also provide the stride (i.e., the number of cache lines that will be skipped between memory accesses) for inclusion within the memory request packets. In addition, a prefetch buffer (not shown in
The Memory Interconnect
The memory interconnect includes one or more high-speed point-to-point memory links such as memory links 110A-110C, each including an uplink such as uplink 111A and a downlink such as downlink 112A, for example. As noted above, in one embodiment downlinks may be 18-bit links while uplinks may be 16-bit links. As such, an 18-bit downlink may include 16 control, address and data (CAD) signals, a busy signal and a control (CTL) signal. A given uplink may include 16 control, address and data (CAD) signals. It is contemplated, however, that in an alternative embodiment, an uplink such as uplink 211A may also include a CTL signal.
In addition to the high-speed links, other signals may be provided to each memory module 150. For example, in one embodiment, a reset signal, a power OK signal and a reference clock may be provided to each memory module 150 from host 100. Further, other signals may be provided between each memory module. For example, as described above, a next memory module present signal may be provided between memory modules.
Generally speaking, the types of transactions conveyed on memory links 110 may be categorized into configuration and control transactions and memory transactions. In one embodiment, configuration and control transactions may be used to configure memory control hub 160. For example, configuration and control transactions may be used to access configuration registers, assign a memory address range to a memory module or to assign a hub address to a memory control hub. Memory transactions may be used to access the memory locations within the memory chips (e.g., 171A-171N . . . 181A-181N).
Accordingly, in one embodiment, there are two types of addressing supported: hub addressing and memory addressing. Using hub addressing, eight hub bits identify the specific memory control hub being accessed. In one embodiment, a hub address of FFh may be indicative of a broadcast to all memory control hubs. Using memory addressing, each hub decodes the upper portion of the address bits to determine which hub should accept the request and the lower portion to determine the memory location to be accessed. In one embodiment, there are 40 address bits, although it is contemplated that other numbers of address bits may be used as desired.
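As a rough sketch of the two addressing modes, assuming the 8-bit hub field and 40-bit memory address described above (the split between the "upper" and "lower" address portions is not specified, so the memory-address check below simply uses a base and size assigned at configuration time):

```c
#include <stdint.h>
#include <stdbool.h>

#define HUB_BROADCAST 0xFFu   /* a hub address of FFh broadcasts to all hubs */

/* Hub addressing: eight hub bits identify the memory control hub accessed. */
bool hub_accepts(uint8_t hub_field, uint8_t my_hub_addr)
{
    return hub_field == HUB_BROADCAST || hub_field == my_hub_addr;
}

/* Memory addressing: each hub checks whether the 40-bit address falls in the
 * range assigned to its module; the lower portion gives the local location. */
bool memory_accepts(uint64_t addr, uint64_t module_base, uint64_t module_size,
                    uint64_t *local_offset)
{
    if (addr >= module_base && addr < module_base + module_size) {
        *local_offset = addr - module_base;  /* location within this module */
        return true;                          /* this hub accepts the request */
    }
    return false;                             /* forward downstream instead */
}
```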
In one embodiment, each of the memory links is configured to convey the transactions using one or more packets. The packets include control and configuration packets and memory access packets, each of which may include a data payload depending on the type of command the packet carries. As such, the sets of wires that make up memory links 110 may be used to convey control, address and data.
The packets may be generally characterized by the following: Each packet includes a number of bit positions, each of which conveys a single bit of information. Each packet is divided into several bit times, and during a given bit time, all of the bit positions of the packet are sampled. As such, the control information and data share the same wires of a given link (e.g., CAD wires). As will be described in greater detail below, in one embodiment, packets are multiples of bit pairs and the first bit time of every packet is sampled at an even bit time. Packets begin with a control header that may be either one or two bit pairs in length. In one embodiment, the first five bits of the control header are the command code. Table 1 below illustrates the various types of packets and their associated command codes. It is noted, however, that the actual codes shown in column one are for illustrative purposes and that other codes may be used for each given command.
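For example, assuming the command code occupies the low five bit positions of the header's first bit time (the exact positions are an assumption, since Table 1 is not reproduced here), extracting it might look like:

```c
#include <stdint.h>

/* One bit time samples all 16 CAD bit positions at once. */
typedef uint16_t bit_time_t;

#define CMD_CODE_MASK 0x1Fu   /* first five bits of the control header */

uint8_t header_command_code(bit_time_t first_bit_time)
{
    return (uint8_t)(first_bit_time & CMD_CODE_MASK);
}
```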
Further, in one embodiment, packets (except NOP packets) may be transmitted with an error detecting code (EDC). It is noted that in one implementation, the EDC is a 32-bit cyclic redundancy code (CRC), although other implementations may employ other EDCs as desired. Additionally, addresses are sent most significant bit time first to speed decode within memory control hub 160, while data is sent least significant byte first. It is noted, however, that other embodiments are contemplated in which the addresses may be sent least significant bit time first and data may be sent most significant byte first. Packets may carry a payload of byte enables and/or data. Packets with no payload are referred to as header-only packets. In one embodiment, the size of the data for short reads may be up to one half of a programmed cache line size. In addition, the size of the data for long reads and block writes may be up to the programmed cache line size. Further, the size of the data for byte writes may be a maximum of 64 bytes regardless of the cache line size setting.
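The text identifies the EDC only as a 32-bit CRC. As a hedged illustration, a bitwise CRC-32 using the common IEEE 802.3 reflected polynomial, which may or may not match the polynomial actually used, could be computed over a packet's bytes as follows:

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32 over a packet's bytes. The polynomial (reflected IEEE
 * 802.3) and the init/final conventions here are assumptions; the text
 * specifies only that the EDC is a 32-bit CRC. */
uint32_t packet_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1u));
    }
    return ~crc;
}
```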
In addition to the control header and command code information included within a packet, the CTL signal may be used to convey information about each packet. As illustrated in Table 2 below, some exemplary CTL encodings are shown.
Different values of CTL for the header and payload portions of a packet may provide enough information to allow header-only packets to be inserted within the payload of another packet. This may be useful for reducing the latency of read commands by allowing them to issue while a write packet is still being sent on the link. Table 3 illustrates an exemplary packet including a payload in tabular format. The packet in Table 3 also shows a header-only packet inserted in the payload during bit times 4-7.
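A receiver might use the per-bit-time CTL value to separate an inserted header-only packet from the surrounding payload. The sketch below assumes just two CTL classifications (header vs. payload); the actual encodings are in Table 2, which is not reproduced here.

```c
#include <stdint.h>

/* Assumed CTL classifications; the real encodings are given in Table 2. */
enum ctl_kind { CTL_HEADER, CTL_PAYLOAD };

struct pkt_accum {
    uint16_t bit_times[64];   /* accumulated 16-bit CAD samples */
    int      count;
};

/* Steer each received bit time: header-valued bit times arriving in the
 * middle of another packet's payload belong to an inserted header-only
 * packet and are accumulated separately. */
void receive_bit_time(enum ctl_kind ctl, uint16_t cad,
                      struct pkt_accum *payload_pkt,
                      struct pkt_accum *inserted_pkt)
{
    struct pkt_accum *dst =
        (ctl == CTL_HEADER && payload_pkt->count > 0) ? inserted_pkt
                                                      : payload_pkt;
    dst->bit_times[dst->count++] = cad;
}
```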
Referring now to
In the illustrated embodiment, uplink control unit 241 may be configured to receive packets from a memory module downstream and to forward them upstream. The receiving and forwarding of the upstream packets creates an upstream transaction sequence. In addition, uplink control unit 241 may be configured to inject packets that originate within memory module 150 into the transaction stream.
In the illustrated embodiment, downlink control unit 242 may be configured to receive packets that originate at the host and if a memory module is connected downstream, to forward those packets to the downstream memory module. In addition, downlink control unit 242 may be configured to copy and decode the packets. In one embodiment, if the packets include an address that is within the range of addresses assigned to memory module 150 and the packet is a memory access request, downlink control unit 242 may pass the command associated with the packet to DRAM controller 250. However, if the packet is not a memory request, but is instead a configuration packet, downlink control unit 242 may pass the configuration command associated with the packet to the core logic of control unit 240 (not shown) for processing. It is noted that in one embodiment, if the packet does not include an address that is within the range of addresses assigned to memory module 150, memory control hub 160 may drop or discard the packet if memory module 150 is the last memory module in the chain.
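The forward-then-decode behavior described above might be organized as follows. The function names and packet fields are illustrative stubs, not part of the disclosed interface; the point is that forwarding is not gated on decode, and a memory request that matches neither the module's address range nor a configuration command is dropped only by the last module in the chain.

```c
#include <stdbool.h>
#include <stdint.h>

struct packet { uint64_t addr; bool is_memory_request; /* ... */ };

/* Stubs standing in for the hub's internal interfaces. */
void forward_downstream(const struct packet *p);
void dram_controller_enqueue(const struct packet *p);
void core_logic_configure(const struct packet *p);

void downlink_receive(const struct packet *pkt, bool downstream_present,
                      uint64_t range_base, uint64_t range_size)
{
    if (downstream_present)
        forward_downstream(pkt);     /* forward without waiting on decode */

    /* Decode a copy (conceptually in parallel with forwarding). */
    if (!pkt->is_memory_request) {
        core_logic_configure(pkt);   /* configuration/control packet */
    } else if (pkt->addr >= range_base &&
               pkt->addr < range_base + range_size) {
        dram_controller_enqueue(pkt);   /* address is assigned to this module */
    }
    /* else: not ours; already forwarded downstream, or silently dropped
     * if this is the last module in the chain */
}
```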
In one embodiment, memory control hub 160 is configured to receive a module present signal (not shown), which when activated by a downstream memory module, indicates to an upstream memory module that there is a downstream memory module present. In such an embodiment, if memory control hub 160 receives a transaction and no downstream memory module is determined to be present, memory control hub 160 may drop the transaction.
As mentioned above, in one implementation, prefetch unit 107 may predict which addresses may be needed and may encode hint information and stride information into a memory request packet. In an alternative implementation, the prediction information may be generated by other hardware and/or software and provided to prefetch unit 107. For example, software executing a memory-streaming algorithm may provide explicit stride information, which may be passed to prefetch unit 107. Prefetch unit 107 may then generate hint and stride information for inclusion within a given memory request.
The hint information may indicate to DRAM controller 250 the type of addresses to prefetch (if any). Table 4 below illustrates an exemplary set of hint values. It is noted that in other embodiments, other values having other meanings are possible.
The stride information indicates how many cache lines will be skipped between accesses. Table 5 below illustrates an exemplary set of stride values. It is noted that in other embodiments, other values having other meanings are possible.
In one implementation, DRAM controller 250 is configured to initiate memory cycles to memory chips 261A-261N in response to memory commands from memory control hub 160. Thus, in response to receiving a memory request for addresses assigned to that memory module, downlink control 242 may pass the memory request command, including the hint and stride information, to DRAM controller 250. DRAM controller 250 generates the memory cycles corresponding to the request. For example, if a read request is received for the data at a given address, DRAM controller 250 generates the read cycles for the data at that address. In addition, DRAM controller 250 decodes the hint and stride information and prefetches the cache lines of data indicated by the hint and stride information.
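As a sketch of how the decoded hint and stride might drive prefetching: the actual hint and stride encodings are in Tables 4 and 5, which are not reproduced here, and the cache line size and the interpretation of the hint as a prefetch count are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64u   /* programmed cache line size (assumed value) */

/* Expand a request into the prefetch addresses the DRAM controller would
 * read after servicing the explicit access at `addr`. `stride_lines` is the
 * number of cache lines skipped between accesses, so each prefetch lands
 * (stride + 1) lines beyond the previous access; a stride of 0 prefetches
 * sequential lines. */
size_t expand_prefetch(uint64_t addr, unsigned prefetch_count,
                       unsigned stride_lines, uint64_t out[], size_t max)
{
    uint64_t step = (uint64_t)(stride_lines + 1u) * CACHE_LINE_SIZE;
    size_t n = 0;
    for (; n < prefetch_count && n < max; n++)
        out[n] = addr + (uint64_t)(n + 1) * step;
    return n;
}
```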
As will be described in greater detail below in conjunction with the description of
In another embodiment, all of the requested data (including any prefetched data) returned by DRAM controller 250 may be stored within a read buffer (not shown) within host 100. However, depending on the size of the read buffer, the prefetched data may be discarded to make room for explicitly requested read data before the prefetched data can be used. Thus, in another implementation, host 100 may include a separate prefetch buffer for storing prefetched data separate from read data returned as a result of explicit read requests.
Turning to
The operation of memory module 150 of
As mentioned above, the data retrieved in response to an explicit memory read request may be sent back to host 100 and not stored within prefetch buffer 375. However, prefetch buffer 375 may store data that is prefetched by DRAM controller 250 in response to receiving hint and stride information with an explicit memory request. When a subsequent memory request is received by DRAM controller 250, prefetch buffer 375 may be checked for the requested data. If the data is stored within prefetch buffer 375, that data is returned to host 100, thereby possibly saving time and reducing the latency associated with accessing memory chips 261A-N.
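A minimal sketch of this lookup path, assuming a small fully associative buffer with one tag per cache line (the text leaves the organization open, and the sizes below are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define LINE_SIZE    64   /* assumed programmed cache line size */
#define PF_ENTRIES   16   /* illustrative buffer capacity */

struct pf_entry {
    bool     valid;
    uint64_t tag;                /* cache-line-aligned address */
    uint8_t  data[LINE_SIZE];
};

static struct pf_entry pf_buf[PF_ENTRIES];

/* Check prefetch buffer 375 before generating DRAM read cycles; a hit
 * returns the buffered line and avoids the memory-chip access latency. */
bool prefetch_lookup(uint64_t line_addr, uint8_t data_out[LINE_SIZE])
{
    for (int i = 0; i < PF_ENTRIES; i++) {
        if (pf_buf[i].valid && pf_buf[i].tag == line_addr) {
            memcpy(data_out, pf_buf[i].data, LINE_SIZE);
            return true;    /* hit: return data toward host 100 */
        }
    }
    return false;           /* miss: read memory chips 261A-N instead */
}
```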
In one implementation, when an explicit memory request packet such as a read request packet is received, the command and address information corresponding to the request packet is provided to DRAM controller 250. In addition, the hint and stride information is provided to DRAM controller 250. DRAM controller 250 is configured to generate memory read cycles corresponding to the explicit read command. In addition, DRAM controller 250 is configured to decode the prefetch hint and stride information and to generate memory read cycles corresponding to the addresses indicated by the hint and stride information. The read data returned from memory chips 261A-N as a result of the explicit read command may be packetized and injected into the upstream flow of packets on uplink 211A by uplink control 241. The read data returned from memory chips 261A-N as a result of the prefetch read commands is stored within prefetch buffer 375.
Similar to a memory read request, when a memory write request packet is received by a memory module, the command and address information corresponding to the write request packet is also provided to DRAM controller 250. If the write packet includes hint and stride information, in one implementation, DRAM controller 250 may use that information not to perform prefetch operations (as when a read request is received), but instead to keep the page within memory chips 261A-N corresponding to the hint and stride information open for any subsequent accesses.
In one embodiment, prefetch buffer 375 may be implemented using memory devices in the random access memory (RAM) family that have faster access times than, for example, the access times of memory chips 261A-N. Any suitable RAM device may be used, such as static RAM (SRAM) or fast SRAM (FSRAM).
In addition, prefetch buffer 375 may be implemented using a variety of suitable structures. For example, depending on the size, prefetch buffer 375 may be implemented as a fully associative, set associative, or direct mapped structure that may include tags to support lookup functions. The tags may also be used to invalidate entries in prefetch buffer 375 in response to DRAM controller 250 receiving a command to write data to an address that is stored within prefetch buffer 375, for example.
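Continuing the buffer sketch above, tag-based invalidation on a write might look like the following (again using the illustrative pf_buf structure from the lookup sketch, not a disclosed interface):

```c
/* Invalidate any buffered copy of a line that is about to be overwritten,
 * keeping prefetch buffer 375 coherent with memory chips 261A-N. Uses the
 * pf_buf structure from the lookup sketch above. */
void prefetch_invalidate(uint64_t line_addr)
{
    for (int i = 0; i < PF_ENTRIES; i++) {
        if (pf_buf[i].valid && pf_buf[i].tag == line_addr)
            pf_buf[i].valid = false;
    }
}
```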
During bit time one, the length of the data that should be returned is conveyed in bit positions 0-5. In one embodiment, a value of 00h indicates no data, a value of 01h indicates two bit pairs of data, a value of 02h indicates four bit pairs of data, and so on. A zero-length read results in an acknowledge packet (Ack) being returned to the requestor. In one embodiment, a read of a half cache line or less may result in a short RdResp, and a read of more than a half cache line may result in either a single long RdResp or two short RdResps. The cache line size may be programmed by software into the configuration registers of host 100 and each memory control hub 160. A prefetch stride prediction value is encoded and conveyed in bit positions 6-7. Exemplary prefetch stride prediction values are discussed and shown in Table 5 above. Address bits 39-32 of the requested location in DRAM are conveyed in bit positions 8-15.
During bit time two, address bits 31-16 of the requested location in DRAM are conveyed in bit positions 0-15, and during bit time three, address bits 15-3 of the requested location in DRAM are conveyed in bit positions 3-15. Also during bit time three, the packet priority is conveyed in bit positions 0-1. In one embodiment, the priority may be indicative of the priority of the packet relative to other requests. For example, one priority may be to delay all requests with lower priority, even if they are already in progress, and to execute this request ahead of them. Bit position 2 is reserved. During bit times four and five, bits 0-15 and 16-31, respectively, of a CRC are conveyed in bit positions 0-15.
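Putting the field positions for bit times one through three together (bit time zero, which carries the command code, is omitted because its full layout is not given here), the header of a read request could be packed as follows; the struct and function names are illustrative:

```c
#include <stdint.h>

struct read_req_header {
    uint16_t bt1;   /* bit time one */
    uint16_t bt2;   /* bit time two */
    uint16_t bt3;   /* bit time three */
};

/* Pack a 40-bit address, 6-bit length, 2-bit stride hint and 2-bit priority
 * into bit times one through three, per the field positions above. */
struct read_req_header pack_read_request(uint64_t addr40, unsigned len6,
                                         unsigned stride2, unsigned prio2)
{
    struct read_req_header h;
    /* bit time one: length in 0-5, stride in 6-7, address bits 39-32 in 8-15 */
    h.bt1 = (uint16_t)((len6 & 0x3Fu) | ((stride2 & 0x3u) << 6)
                       | (((addr40 >> 32) & 0xFFu) << 8));
    /* bit time two: address bits 31-16 in positions 0-15 */
    h.bt2 = (uint16_t)((addr40 >> 16) & 0xFFFFu);
    /* bit time three: priority in 0-1, position 2 reserved (zero),
     * address bits 15-3 in positions 3-15 */
    h.bt3 = (uint16_t)((prio2 & 0x3u) | (((addr40 >> 3) & 0x1FFFu) << 3));
    return h;
}
```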
Referring to
During bit time one, the length of the data being conveyed in the data payload is conveyed in bit positions 0-5. In one embodiment, a value of 00h indicates no data, a value of 01h indicates two bit pairs of data, a value of 02h indicates four bit pairs of data, and so on. A prefetch stride prediction value is encoded and conveyed in bit positions 6-7. Address bits 39-32 of the location in DRAM being written are conveyed in bit positions 8-15.
During bit time two, address bits 31-16 of the location in DRAM being written are conveyed in bit positions 0-15, and during bit time three, address bits 15-3 of the location in DRAM being written are conveyed in bit positions 3-15. Also during bit time three, the packet priority is conveyed in bit positions 0-1. Bit position 2 is reserved.
During bit times four and five, bits 0-15 and 16-31 of a first bit pair of the data payload are conveyed in bit positions 0-15. If more data is being written, subsequent bit pairs may convey bits 0-15 and 16-31 of subsequent data payload. During bit times 4+2N and 5+2N, bits 0-15 and 16-31, respectively, of a CRC are conveyed in bit positions 0-15.
It is noted that although only two types of packets were shown, other types of packets, which may correspond to the command codes listed in Table 1, are contemplated. It is further noted that although the various fields of the exemplary packets are shown having a particular number of bits, it is contemplated that in other embodiments, the various fields of each packet may include other numbers of bits as desired.
In the illustrated embodiment, each link of coherent packet interface 615 is implemented as sets of unidirectional lines (e.g., lines 615A are used to transmit packets from processing node 612A to processing node 612B and lines 615B are used to transmit packets from processing node 612B to processing node 612A). Other sets of lines 615C-D are used to transmit packets between other processing nodes as illustrated in
One example of a packet interface such as non-coherent packet interface 650 may be compatible with HyperTransport™ technology. Peripheral buses 625A and 625B are illustrative of a common peripheral bus such as a peripheral component interconnect (PCI) bus. It is understood, however, that other types of buses may be used.
It is further noted that other computer system configurations are possible and contemplated. For example, it is contemplated that the system memory configuration described above in
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
This application claims the benefit of U.S. Provisional Application No. 60/470,078 filed May 13, 2003.