This application relates to data retrieval within a data processing system.
In multiple processor computing systems, various components, such as processing modules and memory storage units, are interconnected by one or more busses. In such systems, a given processing module may be coupled to one or more memory storage units, and a given memory storage unit may be coupled to one or more processing modules. In many instances, a processing module will include a processor and a system controller, while a memory storage unit will include a memory controller and one or more memory units or modules.
A processing module may, over the course of time, need to read or write data for processing within the system. For example, when a processor within a processing module needs to read data, it may first check to see if such data is available from its local cache. If the data is not available in its cache, the processor may request that the processing module retrieve the data from a memory storage unit that contains it. In this case, the system controller sends, in a request transaction, a read request to the memory controller of the memory storage unit that contains the data. Upon receipt of the read request, the memory controller obtains the requested data from an appropriate memory unit, and provides this data, in a response transaction, back to the requesting system controller.
Once the requesting system controller receives the data, it typically must arbitrate to gain control of the system bus that couples the system controller with the processor. Arbitration can be time consuming. In many instances, arbitration and subsequent phases of the bus may require multiple bus cycles before the response data can be driven by the system controller onto the bus, during which time the system controller may need to buffer the data in a temporary storage space. In general, memory read-access latency, which relates to the amount of time required to access data from memory within a memory storage unit, can be a contributor to overall latency and system performance degradation.
In general, the invention is directed to a data processing system that reduces read latency of requested memory data, thereby resulting in improved system performance. The system incorporates at least one memory storage unit having a memory controller that, upon receiving a request for data from a system controller, is capable of sending two responses back to the system controller at different points in time. The first response is an “early response,” and the second, subsequent response is a data response that contains the requested data. The early response is an early indicator to the system controller that the requested data is present within the memory storage unit and will be arriving at an approximately fixed later time by a subsequent data response. The system controller processes this early response and uses the time the early response was received as a basis for determining timing as to when to initiate arbitration of the processor bus and also subsequent phases on the bus in anticipation of the requested data arriving at a later time. When the requested data finally arrives, the system controller and the bus are then already in a state in which the system controller can stream the received data directly onto the bus without having to wait for arbitration and bus transaction cycles to complete. As a result, a positive predictable indication of forthcoming response data (early response) may be implemented, in conjunction with a programmable timer in certain cases, to effectively hide processor bus cycles and realize latency reduction, thus improving system performance.
In one embodiment, a method includes sending a request for data from a controller, such as a system controller, to a memory storage unit (the controller being associated with a processor), receiving, by the controller, an early response from the memory storage unit indicating that the controller will later receive the requested data, and upon receipt of the early response indicator, starting a timer with the controller to wait a period of time. The method further includes, after expiration of the timer but prior to receipt of the requested data, sending an arbitration request from the controller to initiate a transaction on a bus to communicate the requested data from the controller to the processor when the requested data is later received by the controller.
In one embodiment, a data processing system includes a bus, a processor, and a controller, such as a system controller, that is associated with the processor. The controller is configured to send a request for data to a memory storage unit. The controller is configured to receive, from the memory storage unit, an early response indicating that the controller will later receive the requested data, and upon receipt of the early response indicator, start a timer to wait a period of time. The controller is further configured to, after expiration of the timer but prior to receipt of the requested data, send an arbitration request to initiate a transaction on the bus to communicate the requested data from the controller to the processor when the requested data is later received by the controller.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
During execution, in system 100A, data flows between multiple processing modules 102 and multiple memory storage units 104 via one or more busses and/or interfaces, generally represented as system interconnect 106 in
In operation, a processing module 102 sends requests to memory storage units 104 to manipulate or use data. For example, a processing module 102 may issue read requests to retrieve data from memory storage units 104, and may also issue write requests to write data into a memory storage unit. Data movements and other communications between processing modules 102 and memory storage unit 104 may be referred to herein as “transactions.” Any number of processing modules 102 and memory storage units 104 may be included within the system 100A.
The data processing system 100A shown in
The system controller of the processing module 102 may use the early response as a basis for determining timing as to when to initiate arbitration of the processor bus and subsequent phases on the bus in anticipation of the requested data arriving at a later time. When the requested data finally arrives from the memory controller of the memory storage unit 104, the system controller and the bus are then already in a state in which the system controller can stream the received data directly onto the bus without having to wait for arbitration and bus transaction cycles to complete. As a result, a positive predictable indication of forthcoming response data (such as the early response) may be implemented, in conjunction with a programmable timer in certain cases, to effectively hide processor bus cycles and realize latency reduction, thus improving system performance of the system 100A.
In the example of
However, if the memory storage unit 104 determines that the snooped node 102B has gained control of the requested data (i.e., may have a more up-to-date copy of the data), it will send a snoop request, or command, to the snooped node 102B. In this case, the snooped node 102B will check its local storage area, such as its local cache, to determine if it may have a more current, or updated, version of the data than that contained by the memory storage unit 104. If it does, it may, in one embodiment, directly provide this data (snoop response) to the processing module 102A. In one embodiment, the snooped node 102B returns the snoop response to the processing module 102A. In one embodiment, the memory storage unit 104 will also return the read data back to the processing module 102A, in case the snooped node 102B may not have the current copy of the data.
In one embodiment, a memory controller of the memory storage unit 104, as described earlier, is capable of sending an early response back to a system controller of the requesting processing module 102, such as the module 102A shown in
The processor 204 is also coupled to a processor cache 206. The cache 206 provides one or more high-speed storage areas to store commands and data (e.g., an instruction cache and a data cache) for use by the processor 204. In certain instances, the processor 204 is capable of obtaining needed data directly from the cache 206. In these instances, the processor 204 need not issue requests to the system controller 200 to read data from an external memory storage unit 104.
As shown in
When the processor 204 needs data from an external memory storage unit 104, it sends a read request to the system controller 200 via the bus 202, according to one embodiment. The read request handler 224 handles this request from the processor. This request is a transaction, according to one embodiment. In this embodiment, every message, or command, that is sent by one entity to another comprises a transaction. For example, the system 100A may process the following types of transactions: read requests, read responses, write requests, write responses, and others. Each transaction may, in one embodiment, comprise a multi-bit message that includes one or more of the following fields: a header (indicating whether the transaction includes control information or data information), an operational code (opcode), an identifier, an address, and data. In one embodiment, the opcode of the transaction specifies whether the transaction is, for example, a read request, a write request, a read response, or a write response. In one embodiment, in which early response transactions are used, the opcode may specify that the transaction is an early response (such as one delivered from a memory storage unit 104 or a snooped node 102B).
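By way of illustration only, the transaction fields listed above could be modeled as in the following sketch. The field names, the opcode values, and the helper `is_early_response` are assumptions made for this example; the description does not specify bit widths or encodings.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Opcode(Enum):
    # Hypothetical encodings; the description names the types but not their values.
    READ_REQUEST = 1
    READ_RESPONSE = 2
    WRITE_REQUEST = 3
    WRITE_RESPONSE = 4
    EARLY_RESPONSE = 5


@dataclass(frozen=True)
class Transaction:
    is_control: bool               # header: control information vs. data information
    opcode: Opcode                 # kind of transaction
    identifier: int                # transaction ID used to match responses to requests
    address: Optional[int] = None  # target address, when applicable
    data: Optional[bytes] = None   # payload, when applicable


def is_early_response(txn: Transaction) -> bool:
    """Return True when the opcode marks the transaction as an early response."""
    return txn.opcode is Opcode.EARLY_RESPONSE


read = Transaction(is_control=True, opcode=Opcode.READ_REQUEST, identifier=7, address=0x1000)
early = Transaction(is_control=True, opcode=Opcode.EARLY_RESPONSE, identifier=7)
```

In this sketch the early response carries the same identifier as the read request, so that a receiver can pair the two.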
Each transaction may have a unique identifier that is specified in the identifier field. When the read request handler 224 receives a read request transaction from the processor 204, it may save the identifier of the transaction in the transaction ID storage area 222 for later use. When the system controller 200 later provides the requested data back to the processor 204 in a subsequent transaction, it can then retrieve the corresponding identifier from the storage area 222 and include it within the transaction, so that the processor 204 can match the response with its earlier request.
The read request handler 224 is also capable of storing within the storage area 222 a transaction ID of the new transaction that it sends to the memory storage unit 104, and further associating this transaction ID with the transaction ID of the request it received from the processor 204. By doing so, the early response handlers 208 and data response handlers 216 may access the storage area 222 when processing incoming transactions. Upon receipt of an incoming transaction, the handlers 208 or 216 may extract the transaction ID and cross-reference it with the IDs stored in the storage area 222. In the case of incoming data, the data response handlers 216 may use the ID of the incoming data transaction to identify the ID of the original read request from the processor 204, which had been previously extracted and stored in the storage area 222. The data response handlers 216 can then include the ID of the original read request within the data response transaction that is provided back to the processor 204.
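The cross-referencing of transaction IDs described above may be sketched, under stated assumptions, as a simple lookup table. The class name `TransactionIdStore` and its methods are hypothetical, and a dictionary stands in for the storage area 222.

```python
from typing import Dict


class TransactionIdStore:
    """Hypothetical model of storage area 222: outbound ID -> processor request ID."""

    def __init__(self) -> None:
        self._by_outbound_id: Dict[int, int] = {}

    def record(self, processor_request_id: int, outbound_id: int) -> None:
        # Associate the ID sent to the memory storage unit with the processor's request ID.
        self._by_outbound_id[outbound_id] = processor_request_id

    def lookup(self, incoming_id: int) -> int:
        # Cross-reference an incoming response; keep the entry for later responses.
        return self._by_outbound_id[incoming_id]

    def complete(self, incoming_id: int) -> int:
        # On the data response, return the original ID and retire the entry.
        return self._by_outbound_id.pop(incoming_id)


store = TransactionIdStore()
store.record(processor_request_id=7, outbound_id=42)
early_match = store.lookup(42)    # the early response arrives first
data_match = store.complete(42)   # the data response arrives later
```

Retaining the entry after the early response is a design choice of this sketch, made so that the later data response can still be matched to the original request.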
Returning to discussion of the incoming read request, the read request handler 224 is further responsible for sending a read request to the appropriate memory storage unit 104 after it has received the request from the processor 204. The read request handler 224 is capable of identifying the appropriate memory storage unit 104 based upon the information in the address field that is provided within the read request transaction sent by the processor 204.
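The description does not state how the address field is mapped to a memory storage unit. As one possible approach, assumed here purely for illustration, the read request handler could consult a table of address ranges. The table contents and the function name `select_memory_storage_unit` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical address map: (base, limit, memory storage unit index). Values are illustrative.
ADDRESS_MAP: List[Tuple[int, int, int]] = [
    (0x0000_0000, 0x3FFF_FFFF, 0),
    (0x4000_0000, 0x7FFF_FFFF, 1),
]


def select_memory_storage_unit(address: int) -> int:
    """Return the index of the unit whose address range contains `address`."""
    for base, limit, unit in ADDRESS_MAP:
        if base <= address <= limit:
            return unit
    raise ValueError(f"address {address:#x} is not mapped to any memory storage unit")
```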
As will be described in more detail below, the memory storage unit 104 that has received the read request from the system controller 200 is capable of, according to one embodiment, sending an early response indicator back to the system controller 200. Such an early response indicates to the system controller 200 that the memory storage unit 104 is processing the read request and has determined that it will be providing the requested data at a relatively fixed later point in time.
Early responses received by the system controller 200 are processed by the main early response handler 210. As will be described in more detail below, the main early response handler 210 waits a period of time after receiving the early response indicator from the memory storage unit 104. After waiting this period of time, the main early response handler 210 initiates an arbitration request to the bus 202 in anticipation of later receiving the data pertaining to the request from the memory storage unit 104. In one embodiment, the arbitration request is initiated when there are no outstanding snoop commands, as described in more detail below. The main early response handler 210 may set a timer to wait for a period of time. In one embodiment, timers 214 are programmable timers whose predetermined values (to provide corresponding predetermined wait periods) are dependent on one or more configuration parameters or considerations of the system. For example, the value of one programmable timer for a predetermined wait period may be based, at least in part, upon predetermined knowledge of latency of data retrieval from the memory storage unit 104. The latency may relate to an amount of time that is needed to process the request for data within the memory storage unit 104 and retrieve the requested data from memory. In one embodiment, the timers 214 are hardware timers having values stored in memory-mapped registers that are accessible to the system controller 200 and programmed by the processor 204. In one embodiment, the processor 204 may evaluate the speed of various interfaces and the number of memory storage units 104 (and associated memory modules) when programming the values of timers. Examples of timer values will be provided in more detail below.
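The wait-then-arbitrate behavior of the main early response handler 210 may be sketched as a cycle-stepped model. This is a minimal illustration, not the actual logic: the class name, the `tick` method, and the use of a single counter in place of the timers 214 are assumptions.

```python
from typing import Optional


class MainEarlyResponseHandler:
    """Hypothetical cycle-stepped model of handler 210 with one programmable timer."""

    def __init__(self, wait_cycles: int) -> None:
        self.wait_cycles = wait_cycles         # programmed value (stands in for a timers 214 register)
        self.remaining: Optional[int] = None   # None means no timer is running
        self.outstanding_snoops = 0
        self.arbitration_requested = False

    def on_early_response(self) -> None:
        # Receipt of the early response starts the wait period.
        self.remaining = self.wait_cycles

    def tick(self) -> None:
        # Advance one bus cycle; request arbitration once the timer has expired
        # and no snoop commands remain outstanding.
        if self.remaining is None or self.arbitration_requested:
            return
        if self.remaining > 0:
            self.remaining -= 1
        if self.remaining == 0 and self.outstanding_snoops == 0:
            self.arbitration_requested = True


handler = MainEarlyResponseHandler(wait_cycles=3)
handler.on_early_response()
states = []
for _ in range(4):
    handler.tick()
    states.append(handler.arbitration_requested)
```

With a programmed wait of three cycles, the model requests arbitration on the third tick after the early response. If a snoop command were still outstanding at that point, the request would be held until the count returned to zero.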
As described in reference to
The system controller 200 of a snooped node processing module 102B may receive a snoop command from a memory storage unit 104 that has received a read request from a separate, requesting processing module 102A. In this scenario, the memory storage unit 104 has determined that the processing module 102B may have a newer version of the requested data. Therefore, the system controller 200 shall, in one embodiment, process such incoming snoop commands with its snoop command handler 226. Upon receipt of a snoop command, the snoop command handler 226 will issue an early response directly to the system controller 200 of the requesting processing module 102A if the processing module 102B determines that it does have a local copy of the requested data. The snoop command handler 226 then retrieves the requested data from a local storage area of the snooped node 102B, such as from a local cache 206. Upon retrieval of the requested data, the snoop command handler 226 sends the data via a data response transaction to the system controller 200 of the requesting processing module 102A.
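The ordering of messages produced by the snoop command handler 226 may be illustrated as follows. In this sketch a dictionary stands in for the local cache 206 and messages are represented as tuples; both representations, and the function name, are hypothetical. The case in which the snooped node holds no local copy returns no messages here, because the passage above does not describe that case.

```python
from typing import Dict, List, Tuple


def handle_snoop_command(local_cache: Dict[int, bytes], address: int) -> List[Tuple[str, object]]:
    """Return the messages a snooped node would send to the requester, in order.

    Hypothetical model: the cache is a dict and messages are (kind, payload) tuples.
    """
    messages: List[Tuple[str, object]] = []
    if address in local_cache:
        # Announce the forthcoming data first, then retrieve it and send it.
        messages.append(("snoop_early_response", address))
        messages.append(("snoop_data_response", local_cache[address]))
    return messages
```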
As shown in
As is shown in
In the embodiment shown in
The read request handler 303 handles incoming read requests from a system controller 200 of a requesting processing module 102. In certain cases, the read request handler 303 may process the requests immediately, as they arrive. However, because the memory controller 300 may be coupled to various different processing modules 102, it may receive too many read requests to process simultaneously. As a result, the read request handler 303 may need to store requests within the storage area 304 for processing. The storage area 304 shown in
In one embodiment, the read request handler 303 uses the address of the read request to determine which memory 302 contains the requested data. After identifying the appropriate memory 302 (which may comprise, in one embodiment, dynamic random access memory (DRAM)), the read request handler 303 sends a read command to the memory 302. In certain cases, when a data processing system 100B includes a snooped node, such as the module 102B in
When the read request handler 303 sends the read command to the memory 302, the early response handler 310 may send an early response back to the requesting processing module 102 as a positive indication that memory controller 300 will provide the data at a future point in time. In one embodiment, the early response handler 310 sends the early response back to the system controller 200 of requesting processing module 102 at substantially the same time that the read request handler 303 sends the read command to memory 302. In one embodiment, the early response handler 310 sends the early response back to the system controller 200 of requesting processing module 102 after the read request handler 303 sends the read command to memory 302. In this embodiment, the early response handler 310 may place the early response in the buffer 312 for later processing, as is described in more detail below. Various examples using such early responses in different scenarios are described in more detail below with reference to the corresponding flow diagrams. An early response provides the requesting processing module with an early indicator that data will be forthcoming at a later point in time. If the snoop command handler 314 has sent one or more snoop commands to snooped nodes 102, the early response handler 310 includes information within the early response specifying the number of snoop commands that were issued.
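The content of an early response, as described above, may be sketched as a small record. The type `EarlyResponse`, its field names, and the function `build_early_response` are assumptions for illustration. The flag indicating that a read command was sent to memory reflects information that the description elsewhere says may be included in the response.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EarlyResponse:
    request_id: int
    read_command_sent: bool   # True when a read command went to memory 302
    snoop_count: int          # number of snoop commands issued for this request


def build_early_response(request_id: int, read_from_memory: bool,
                         snooped_nodes: List[str]) -> EarlyResponse:
    """Summarize, for the requester, which commands were issued for a read request."""
    return EarlyResponse(request_id, read_from_memory, len(snooped_nodes))
```

The requester can use `snoop_count` to decide how many snoop responses to await before proceeding.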
It should be noted that, according to one embodiment, the early response handler 310 may not send an early response to the requesting processing module 102 under certain conditions. Typically, early responses are issued at substantially the same time as, or shortly after, issuance of read commands or snoop commands. However, because a given memory controller 300 may need to process requests from multiple different processing modules 102, the early response handler 310 may need to produce multiple data responses that will delay the pending early responses. These multiple early responses are temporarily queued within a storage area 312, which is shown in
In one embodiment, the early response handler 310 may utilize a programmable, early response timer to determine whether to process or discard early responses stored in the buffer 312. The memory controller 300 may program the timer based upon predetermined knowledge of memory access time, latencies, priority processing of transactions, or other criteria. The early response handler 310 starts the timer for a given early response once it places the response in the buffer 312. If the timer expires, according to one embodiment, the early response handler 310 will discard the early response and remove it from the buffer 312 (such that the early response is not sent to the processing module 102). This discarding of the early response occurs because it has remained in buffer 312 for a defined period, during which time the actual data response may have already been processed. If, however, the early response obtains priority out of buffer 312 before the early response timer expires, the early response is sent to the processing module 102. In one embodiment, the response manager 301 shown in
As noted, the data handler 306 of the memory controller 300 is responsible for sending data responses to the requesting processing module 102. When the data handler 306 receives data from memory 302, it then forwards the data in a data response to the requesting processing module 102.
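The queuing and discarding of early responses in the buffer 312, governed by the early response timer described above, may be sketched as follows. First-in, first-out ordering, the class name `EarlyResponseBuffer`, and the cycle-count interface are assumptions of this example; the description states only that a response is sent if it obtains priority before its timer expires.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class EarlyResponseBuffer:
    """Hypothetical model of buffer 312 with a per-entry early response timer."""

    def __init__(self, timeout_cycles: int) -> None:
        self.timeout_cycles = timeout_cycles
        self._entries: Deque[Tuple[int, int]] = deque()  # (request_id, cycle enqueued)

    def enqueue(self, request_id: int, now: int) -> None:
        # The timer for an entry starts when the entry is placed in the buffer.
        self._entries.append((request_id, now))

    def expire(self, now: int) -> List[int]:
        """Discard entries whose timer has expired; return their request IDs."""
        dropped = [rid for rid, t in self._entries if now - t >= self.timeout_cycles]
        self._entries = deque((rid, t) for rid, t in self._entries
                              if now - t < self.timeout_cycles)
        return dropped

    def pop_ready(self, now: int) -> Optional[int]:
        """Return the oldest unexpired request ID to send, or None if none remains."""
        self.expire(now)
        return self._entries.popleft()[0] if self._entries else None


buf = EarlyResponseBuffer(timeout_cycles=4)
buf.enqueue(request_id=1, now=0)
buf.enqueue(request_id=2, now=3)
sent = buf.pop_ready(now=5)   # request 1 has waited 5 cycles and is discarded
```

Here the early response for request 1 is dropped because it has outlived its timer, while the early response for request 2 is still useful and is sent.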
After the processor 204 within a processing module 102 determines a need to read data from memory, it issues a memory read request transaction to the system controller 200 via the bus 202. The system controller 200 receives the read request from the bus 202. As shown in the various flow diagrams, messages, such as requests and responses, are sent from one entity to another. In general, these messages may be referred to as transactions. Each transaction may comprise a multi-bit packet of information, as described previously, with a pre-defined format, according to one embodiment. The sending entity populates the transaction packet with information, and the receiving entity processes the transaction by reading data from the packet.
The system controller 200 analyzes the received request (such as a transaction packet) to determine which memory storage unit 104 contains the requested data. It may do so by, in one embodiment, analyzing the data address that is specified in the read request. The system controller 200 then sends the memory read request to the memory controller 300 of the appropriate memory storage unit 104. Through this process, the processor 204 effectively sends a read request to the memory controller 300 via the bus 202 and the system controller 200.
Upon receipt of the read request, the memory controller 300 will then, in one embodiment, place the read request in a queue for processing, such as the queue 304 shown in
When processing a read request, the memory controller 300 may access a directory, such as the directory 308 shown in
Typically, there is a well known, or fixed, memory read access latency when retrieving data from the memory 302, due to access and interface timing. For example, when the memory 302 comprises DRAM, and when a 2.5 nanosecond clock is being utilized, it may take approximately thirty cycles to access data from the memory 302. This memory read access latency is represented by the bold vertical line (for the memory 302) shown in
In one embodiment, the memory controller 300 may perform a directory lookup and determine that the most up-to-date version of the requested data is within memory 302. In this embodiment, the memory controller 300 sends a read command to the memory 302 after the read request transaction has gained priority within the memory controller 300. However, in addition to sending the read command to the memory 302, the memory controller 300 also sends the early response indicator (transaction) back to the system controller 200 so as to provide a positive indication that the location of the data has been identified and that the data will be forthcoming at a later, or subsequent, point in time. The memory controller 300 sends the early response substantially concurrently with sending the read command to the memory 302, according to one embodiment. The system controller 200 can utilize the early response as a reference point in time from which to initiate bus arbitration prior to receiving the actual data.
As noted earlier, there typically is a fixed latency for memory read access from the memory 302, due to access and interface timing. This fixed latency determines, in one embodiment, the relative delay between the early response and the data response being received by the system controller 200. This provides the system controller 200 with a positive, predictable mechanism to trigger the logic to arbitrate for the processor bus 202.
In one embodiment, the system controller 200 uses the receipt of the early response to initiate the arbitration of the bus 202. The optimum time for this early arbitration may be a determined number of bus cycles before the data arrives from the memory controller 300 and is to be transmitted onto the bus 202. However, the time between the receipt of the early response by the system controller 200 and receipt of the data response, determined by the relatively fixed latency of the memory access of the memory 302, is typically greater than this determined number of bus cycles for arbitration of the bus 202. If arbitration for the bus 202 is performed too early, the system controller 200 would have ownership of the bus 202 but may potentially need to invoke a data stall on the bus 202, as it would not yet have received the data response. To address this issue, a programmable timer may be implemented and utilized by the system controller 200, as described in some detail earlier, that will delay the initiation of arbitration until a determined number of bus cycles before the data response is expected. This timer is initiated when the system controller 200 receives the early response from the memory controller 300, and when the timer expires, the system controller 200 triggers arbitration of the bus 202. After the arbitration and subsequent phases on the bus 202, the system controller 200 can route the data to the bus 202 at the appropriate bus cycle without further delay. The overall result, in one embodiment, is that the data latency due to the memory access effectively hides the arbitration and required cycle delay on the bus 202.
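The timing relationship described above may be worked through numerically. The following sketch assumes that the gap between the early response and the data response equals the thirty-cycle DRAM access example given earlier (75 nanoseconds at 2.5 nanoseconds per cycle), that all quantities are counted in the same clock domain, and that arbitration plus subsequent bus phases take eight cycles. The eight-cycle figure and all function names are hypothetical.

```python
def arbitration_delay_cycles(data_latency_cycles: int, bus_lead_cycles: int) -> int:
    """Cycles to wait after the early response before requesting arbitration.

    data_latency_cycles: expected gap between early response and data response.
    bus_lead_cycles: cycles needed for arbitration and subsequent bus phases.
    Both are expressed in the same clock domain (an assumption of this sketch).
    """
    return max(0, data_latency_cycles - bus_lead_cycles)


def stall_cycles(delay: int, data_latency_cycles: int, bus_lead_cycles: int) -> int:
    """Cycles the bus is owned but idle because the data has not yet arrived."""
    bus_ready_at = delay + bus_lead_cycles
    return max(0, data_latency_cycles - bus_ready_at)


def buffering_cycles(delay: int, data_latency_cycles: int, bus_lead_cycles: int) -> int:
    """Cycles the data waits in the controller because the bus is not yet ready."""
    bus_ready_at = delay + bus_lead_cycles
    return max(0, bus_ready_at - data_latency_cycles)


LATENCY = 30   # cycles, from the DRAM example given earlier
BUS_LEAD = 8   # hypothetical figure; the description gives no number

delay = arbitration_delay_cycles(LATENCY, BUS_LEAD)
```

Under these assumptions the timer would be programmed to 22 cycles. Arbitrating immediately upon the early response would leave the bus stalled for 22 cycles, while deferring arbitration until the data arrives would require buffering the data for 8 cycles; the programmed delay avoids both.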
In one embodiment, the timer used by the system controller 200 is a programmable timer, as was discussed previously. The system controller 200 may obtain the timer value from the storage area 214, shown in
Referring first to
Within the response, the memory controller 300 includes information indicating that it has sent a snoop command to a snooped system controller 200B. (If the memory controller 300 determines that multiple snooped nodes 102B may have copies of the requested data, it may send snoop commands to each of these snooped nodes 102B. In this case, the memory controller 300 includes information in the response to specify the number of different snoop commands that it has issued.) In one embodiment, the response may further indicate that no data will be arriving from the memory 302 or the memory controller 300, but that such requested data will be arriving from the snooped system controller 200B. In one embodiment, the requesting system controller 200A, upon receipt of the response, will parse the response to identify the number of snoop commands that had been sent out by the memory controller 300, and will wait for a period of time until it has received a corresponding number of snoop responses from the associated snooped system controllers 200B. In one embodiment, the memory controller 300 sends only one snoop command to a snooped system controller 200B after it has determined that the snooped system controller 200B is associated with a snooped node 102B that has a modified version of the data.
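The bookkeeping by which the requesting system controller 200A waits for a corresponding number of snoop responses may be sketched as a simple counter. The class name `SnoopResponseTracker` and its interface are hypothetical.

```python
class SnoopResponseTracker:
    """Hypothetical per-request counter kept by the requesting system controller."""

    def __init__(self, expected: int) -> None:
        self.expected = expected   # number of snoop commands reported in the response
        self.received = 0

    def on_snoop_response(self) -> None:
        # Count one snoop response from a snooped system controller.
        self.received += 1

    @property
    def all_received(self) -> bool:
        # True once every reported snoop command has been answered.
        return self.received >= self.expected


tracker = SnoopResponseTracker(expected=2)
tracker.on_snoop_response()
waiting = not tracker.all_received   # one response is still outstanding
tracker.on_snoop_response()
```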
The snooped system controller 200B returns a snoop early response back to the requesting system controller 200A after the snooped system controller 200B finds modified data on its processor bus, such as in a local storage area (e.g., cache). There is an inherent amount of latency in the bus protocol that delays the data being returned to the requesting system controller 200A. This fixed latency determines the relative delay between the early response and the data response being received by the requesting system controller 200A from the snooped system controller 200B. This provides the requesting system controller 200A with a positive, predictable mechanism to trigger the logic that will arbitrate for the bus and return the data to the processor via the bus 202. The data latency, however, on the bus for the snooped node 102B (with the snooped system controller 200B) is typically much shorter than the data latency from memory access on a memory storage unit 104. As a result, the requesting system controller 200A typically does not need to implement an additional timer after it has received the snoop early response. Instead, the requesting system controller 200A may initiate the bus arbitration request to the bus 202 after it has received the snoop early response from the snooped system controller 200B. Once the bus has processed the arbitration request and subsequent phases for the data transaction, the requesting system controller 200A will most likely have received the snoop data response from the snooped system controller 200B. As such, the requesting system controller 200A can then send the data to the bus 202 without further delay, and without having to temporarily store the data in a buffer while waiting for the bus.
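The difference in handling between an early response from a memory storage unit and a snoop early response may be summarized in a short sketch. The source labels `"memory"` and `"snoop"` and the function name are hypothetical.

```python
def wait_before_arbitration(early_response_source: str, programmed_wait_cycles: int) -> int:
    """Cycles to wait between an early response and the arbitration request.

    early_response_source is "memory" or "snoop"; both labels are hypothetical.
    """
    if early_response_source == "memory":
        return programmed_wait_cycles   # memory access latency exceeds the bus lead time
    if early_response_source == "snoop":
        return 0                        # snoop data follows closely; arbitrate at once
    raise ValueError(f"unknown early response source: {early_response_source!r}")
```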
Referring to
Within the response message (transaction), the memory controller 300 includes information indicating that it has sent both a read command to the memory 302 and a snoop command to the snooped system controller 200B. When the requesting system controller 200A receives and parses the response, it determines that the memory controller 300 has sent a read command to the memory 302, and therefore starts the timer. In one embodiment, the requesting system controller 200A starts and uses the timer when the memory controller 300 has sent a read command to the memory 302, due to the memory read access latency of the memory retrieval process.
As shown in the example of
When the requesting system controller 200A receives the snoop early response, it parses the response to determine that it will later be receiving data from the snooped system controller 200B. It then sends the bus arbitration request to the bus 202. At a later point, the requesting system controller 200A will receive a data response from the memory controller 300. Because, however, the snoop early response indicated that modified data will be arriving from the snooped node 102B, the requesting system controller 200A may ignore, or discard, the data response from the memory controller 300. Once it receives the snoop data response from the snooped system controller 200B, it may send the snoop data to the bus 202. In one embodiment, it may immediately send this data to the bus 202 without needing to buffer the data while waiting for the bus. In one embodiment, after the requesting system controller 200A has received the snoop data response from the snooped system controller 200B, it may then send a copy of the snoop data to update the memory controller 300.
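The selection between the data response from the memory controller 300 and the snoop data response may be sketched as follows. The message kinds, the function name `resolve_read`, and the list-of-tuples representation of arriving messages are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def resolve_read(events: List[Tuple[str, Optional[bytes]]]) -> Tuple[Optional[bytes], bool]:
    """Return (data driven onto the bus, whether memory should be updated).

    `events` is the arrival order of hypothetical message kinds:
    "snoop_early_response", "memory_data", and "snoop_data".
    """
    modified_elsewhere = False
    to_bus: Optional[bytes] = None
    for kind, payload in events:
        if kind == "snoop_early_response":
            modified_elsewhere = True   # the snooped node will supply the data
        elif kind == "memory_data" and not modified_elsewhere:
            to_bus = payload            # no snoop hit announced: the memory copy is used
        elif kind == "snoop_data":
            to_bus = payload            # the modified copy supersedes the memory copy
    return to_bus, modified_elsewhere


snoop_hit = resolve_read([
    ("snoop_early_response", None),
    ("memory_data", b"old"),
    ("snoop_data", b"new"),
])
```

In the example, the memory data is discarded because the snoop early response arrived first, the snoop data is what reaches the bus, and the returned flag indicates that a copy may be sent to update the memory controller.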
Once the timer expires, the requesting system controller 200A sends the bus arbitration request to the bus 202, to initiate the bus arbitration and data transaction phases of the bus. When the requesting system controller 200A receives the data response from the memory controller 300, it sends the data to the bus 202 without delay, according to one embodiment.
Various embodiments of the invention have been described. These and other embodiments are within the scope of the following claims.