The present invention relates generally to processors, and more particularly to methods and apparatus for processing a command.
A processor may transmit commands (e.g., read and write requests) to and receive response data from a memory controller over a bus. Because different requests sent by the processor to the memory controller may take different amounts of time to execute, response data may be returned on the bus out of order with respect to the sequential order of the requests. Thus, in some instances, a response to a request may be deferred while the memory controller attempts to retrieve the data associated with the request, which may introduce latency. Due to such latency, phases of communication between the processor and the memory controller over the bus may be stalled, slowed or otherwise delayed. Consequently, methods and apparatus for reducing such latency and thereby increasing processing efficiency would be desirable.
In a first aspect of the invention, a first method is provided for processing a command issued by a processor over a bus. The first method includes the operations of (1) transmitting the command to a remote node to obtain access to data required to complete the command; (2) receiving from the remote node a response packet including a header and a header CRC; (3) validating the response packet based on the header CRC; and (4) before receiving the data required to complete the command, arranging to return the data to the processor over the bus.
In a second aspect of the invention, a second method is provided for processing a command issued by a processor. The second method includes the operations of (1) receiving the command from a requesting node over a communication link; (2) incorporating a header and a header CRC in a response packet; and (3) transmitting the response packet including the header and header CRC before all of the data required to complete the command has been obtained.
In a third aspect of the invention, a first apparatus is provided which includes (1) at least one processor; (2) a memory controller coupled to and adapted to receive commands from one of the at least one processor via a bus, and coupled to one or more remote nodes via a communication link. The memory controller is adapted to: transmit a command issued by the at least one processor to a remote node over the communication link to obtain access to data required to complete the command; receive from the remote node a response packet including a header and a header CRC; validate the response packet based on the header CRC; and before receiving the data required to complete the command, arrange to return the data to the processor over the bus.
In a fourth aspect of the invention, a second apparatus is provided which includes (1) an interface adapted to receive a command from one or more requesting nodes over a communication link; (2) local memory including data required to complete the command; and (3) a memory controller coupled to the interface and to the local memory, the memory controller being adapted to access data in the local memory and to construct a response packet including a header and a header CRC. The memory controller is adapted to transmit the response packet including the header and header CRC before all of the data required to complete the command has been obtained from the local memory.
Other features and aspects of the present invention will become more fully apparent from the following detailed description, the appended claims and the accompanying drawings.
An embodiment of the present invention may provide methods and apparatus for processing a command. A computer system may include one or more processors that may initiate commands, including read and write requests. The data required to fulfill a request may be found in local memory (e.g., dynamic random access memory (DRAM)) resources or alternatively, may be found in memory located on a remote computer system (‘remote node’) which may be coupled to the initiating processor (e.g., over a network). Compared with local access, remote data access typically takes more time, introducing a latency period between the issuance of a request from the requesting node and the return of the data from the remote node.
According to the methods and apparatus of embodiments of the present invention, the latency period that may occur between a data request being sent from a requesting node and the return of the requested data from the remote node may be reduced by incorporating a cyclic redundancy check (CRC) that covers a response packet header (‘header CRC’). Incorporation of the header CRC may allow validation of a data response header in advance of a full data transfer and a final CRC check over an entire data response packet, enabling an early initiation of a deferred reply before all of the remote data is returned by the remote node. In addition, methods and apparatus of embodiments of the present invention may provide for insertion of a variable data gap in a response by a remote node to further advance the early indication. Methods and apparatus of embodiments of the present invention also may provide a recovery mechanism in the event that the data contains one or more errors (i.e., the data CRC does not check out).
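The header-CRC scheme described above may be sketched as follows. The packet layout, the one-byte gap-cycle field, and the use of CRC-32 are illustrative assumptions rather than the actual wire format: the point is that the header carries its own CRC, so a receiver can validate the header as soon as its first bytes arrive, before the data or the data CRC is present.

```python
import struct
import zlib

def build_response(header_fields: bytes, data: bytes, gap_cycles: int) -> bytes:
    """Build a hypothetical response packet: header (with gap-cycle count),
    header CRC, idle gap cycles, data, and a CRC over the data."""
    header = struct.pack(">B", gap_cycles) + header_fields
    header_crc = struct.pack(">I", zlib.crc32(header))
    data_crc = struct.pack(">I", zlib.crc32(data))
    gap = b"\x00" * gap_cycles  # idle filler between header and data
    return header + header_crc + gap + data + data_crc

def validate_header(packet: bytes, header_len: int) -> bool:
    """Validate only the header CRC -- possible as soon as the first
    cycles of the packet arrive, before any data has been received."""
    header = packet[:header_len]
    (received_crc,) = struct.unpack(">I", packet[header_len:header_len + 4])
    return zlib.crc32(header) == received_crc
```

Once `validate_header` succeeds, the requesting node can begin the deferred reply on its bus while the gap cycles and data are still in flight.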
Each of the plurality of processors 102, 104, 106, 108 may issue a command (or one or more portions of a command) onto the bus 110 for processing. To provide for the servicing of commands issued by the processors 102, 104, 106, 108 and access to memory resources, the apparatus 100 may include a memory controller (e.g., chipset) 112 which may be coupled to the bus 110. The apparatus 100 may further include local memory 114 coupled to the memory controller 112, which may include one or more memory units 116, 118 (e.g., DRAMs, cache, or the like).
A command (e.g., a read request) issued by a processor 102, 104, 106, 108 may include a header, command and address information. The address information included in the command may indicate a memory location where data requested to be read may reside. The memory controller 112 may be adapted to schedule and route commands received from the processors 102, 104, 106, 108 over the bus 110 to memory locations specified in the commands, which may be situated in either local memory 114 or in memory external to the apparatus 100.
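As a rough model of such a command and its routing, the sketch below pairs a header and opcode with an address, and uses the address to decide between local and remote memory. The field set and the local-memory boundary are hypothetical stand-ins for a real directory lookup.

```python
from dataclasses import dataclass

LOCAL_MEMORY_LIMIT = 0x4000_0000  # hypothetical top of local memory 114

@dataclass
class Command:
    header: int   # e.g., transaction id and source identifier
    opcode: str   # "READ" or "WRITE"
    address: int  # memory location where the requested data may reside

def route(cmd: Command) -> str:
    """Route a command to local memory or a remote node based on its
    address (a simplified stand-in for the controller's routing logic)."""
    return "local" if cmd.address < LOCAL_MEMORY_LIMIT else "remote"
```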
The memory controller 112 may include several sub-components adapted to perform the tasks of processing commands and providing memory access. In one or more embodiments, the memory controller 112 may include a bus interface 120 which may be adapted to receive commands from the processors 102, 104, 106, 108 via the bus 110 and to regulate communication with the processors via a bus protocol, whereby commands received from the processors 102, 104, 106, 108 may be executed in various discrete transaction stages determined by the relevant bus protocol. Exemplary command transaction stages that may be employed in the context of an embodiment of the present invention are described further below.
A coherency unit 122 may be coupled to and receive command transactions from the bus interface 120. The coherency unit 122 may be adapted to (1) store pending commands (e.g., in a queue or similar storage area); (2) identify pending commands that are accessing or need to access a memory address and that should complete before a new command requiring access to the same memory address may proceed; and/or (3) identify a new command received in the memory controller 112 as colliding with (e.g., requiring access to the same memory address as) a pending command previously received in the memory controller 112 that should complete before a second phase of processing is performed on the new command.
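The pending-command storage and collision check can be condensed into a minimal sketch. The queue-of-tuples representation is an assumption; the behavior shown is simply that a new command collides when any pending command targets the same address.

```python
from collections import deque

class CoherencyUnit:
    """Minimal sketch of a pending-command queue with collision detection."""

    def __init__(self):
        self.pending = deque()  # (command_id, address) of in-flight commands

    def collides(self, address: int) -> bool:
        """A new command collides if any pending command targets the same
        address; it must wait for that command to complete."""
        return any(addr == address for _, addr in self.pending)

    def log(self, command_id: int, address: int) -> None:
        """Record a new command in the pending queue."""
        self.pending.append((command_id, address))

    def retire(self, command_id: int) -> None:
        """Remove a completed command, unblocking any colliding successors."""
        self.pending = deque(p for p in self.pending if p[0] != command_id)
```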
The coherency unit 122 may further be adapted to manage a lifetime of the transactions associated with the execution of a command. For example, if a read request issued by a processor 102, 104, 106, 108 is to be deferred for a period before the requested data is returned, the coherency unit 122 may perform tasks such as (i) providing an early indication that the request is being deferred while remote data is being accessed, (ii) checking whether data accessed contains errors, and (iii) indicating when data has been returned from the remote node.
The coherency unit 122 may further include a local memory interface 124, a scalability port interface 125 and an I/O interface 126 (e.g., within the memory controller 120). The local memory interface 124 may enable data communication between the coherency unit 122 and the local memory system 114, and in particular, enable the coherency unit to access data stored in the local memory system 114 within apparatus 100. The scalability port interface 125 may enable communication between the coherency unit 122 and one or more remote nodes coupled to the scalability port interface 125. The I/O interface 126 may enable communication between the coherency unit 122 and one or more peripheral devices (not shown).
In a first stage 302 of command processing, referred to as the request phase, a processor (e.g., 102) may issue a command on the bus 110 such that the command may be observed by components coupled to the bus 110, such as the remaining processors 104, 106, 108 and/or the memory controller 112. For example, in stage 302, the coherency unit 122 may initiate a collision check to determine whether there is a conflict with other pending requests with respect to the address associated with the command. The coherency unit 122 may also initiate a directory lookup to determine whether the data requested may be found in local memory resources 114 or on one or more remote nodes 200, and then may log the new command in a pending request queue. In stages 303, 304, referred to collectively as a snoop phase, results of the processes initiated in stage 302 may be presented. For example, in stage 303, any collisions with other pending requests may be determined, and in stage 304, directory results may be ascertained.
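The request and snoop phases can be condensed into one small function. The directory is modeled here as a simple address-to-owner map with unlisted addresses treated as local — an assumption for illustration, not the actual directory structure.

```python
def request_phase(directory: dict, pending: set, address: int):
    """Sketch of stage 302: collision check against pending requests,
    directory lookup for the data's location, then logging the request.
    Returns the snoop-phase results (stages 303 and 304)."""
    collision = address in pending                # stage 303: collision result
    location = directory.get(address, "local")    # stage 304: directory result
    pending.add(address)                          # log in the pending queue
    return collision, location
```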
In a third phase (e.g., response phase) of command processing, the coherency unit 122 may indicate whether a command is to be retried (e.g., reissued) or if data requested by the command will be provided. For example, in one or more embodiments, the response phase may include a first stage 306, in which a read request may be transmitted from the requesting node 100 to the remote node 200. In stage 307, which may be performed simultaneously with stage 306, the coherency unit 122 may deliver an early bus data return indication which may provide a notification to the processor bus interface 120 to reserve capacity on the bus 110 for the return of the requested data, which may shorten arbitration on the bus 110 when the data is returned, for example.
Upon receipt of the command request from the requesting node 100, the remote node 200 may initiate collision detection and directory lookup procedures in order to identify the memory location (e.g., in local memory 214) from which data is to be accessed to process the request. The request may flow through pending request queues in the memory controller 212 of apparatus 200 and may be driven onto the memory interface 216. At this point, the number of cycles before data is to be returned from the local memory 214 may be readily determinable (e.g., based on the DRAM timings tRCD and tCL). Determination of the number of cycles may allow an early indicator to be provided to the scalability port 220, indicating that the data response is coming in N cycles.
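The cycle count N follows directly from the DRAM timings once the read is driven onto the memory interface. A minimal sketch, assuming N is simply the sum of tRCD (RAS-to-CAS delay), tCL (CAS latency), and a fixed interface overhead (the overhead term is an assumption):

```python
def cycles_until_data(t_rcd: int, t_cl: int, interface_overhead: int = 0) -> int:
    """Cycles from driving a read onto the memory interface until data
    returns: RAS-to-CAS delay plus CAS latency plus any fixed overhead.
    This N is what the early indicator reports to the scalability port."""
    return t_rcd + t_cl + interface_overhead
```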
The scalability port 220 may, upon receiving the early indication of the data response, begin to construct a data response header and, in parallel, a header CRC. According to an embodiment of the present invention, the remote node 200 may construct a header CRC and send the header CRC along with the header in the first cycle of a response before all of the data has been retrieved.
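Building the header CRC in parallel with the header can be modeled with an incremental CRC update, so both are complete and ready to transmit in the first cycle of the response. The word-per-cycle framing and CRC-32 are illustrative assumptions.

```python
import zlib

def build_header_with_crc(header_words):
    """Construct the data response header while accumulating its CRC in
    parallel, so the header and header CRC can be sent in the first cycle
    of the response, before any data has been retrieved."""
    crc = 0
    header = bytearray()
    for word in header_words:          # one header word per cycle
        header += word
        crc = zlib.crc32(word, crc)    # CRC updated alongside construction
    return bytes(header), crc
```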
Referring again to
In addition, the scalability port 220 on the remote node 200 may monitor link utilization to determine how soon to send the early data response header and header CRC. If the SP link(s) 218, 219 have spare capacity, the scalability port 220 may encode a number of empty cycles or ‘gap cycles’ between the header and data response as part of the data response header, thus expanding the total response packet to prevent fragmentation of the packet.
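One plausible way to pick the gap count from link utilization is sketched below: spare link capacity lets the header go out early, with the gap covering the remaining wait until the data is ready, while a fully busy link leaves no room for a gap. The field width (`max_gap`) is an assumption.

```python
def choose_gap_cycles(cycles_until_data: int, link_busy_cycles: int,
                      max_gap: int = 15) -> int:
    """Number of idle 'gap cycles' to encode in the response header.
    Sending the header early on an idle link means waiting out the
    remaining memory latency as gap cycles, capped by the header field
    width; a saturated link gets no gap."""
    spare = max(0, cycles_until_data - link_busy_cycles)
    return min(spare, max_gap)
```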
At the requesting node 100, once a header CRC has been validated, the header of the response packet, followed by any gap cycles, may be forwarded to the bus interface 120 to begin a deferred reply on the bus 110. The bus interface 120 may use the number of gap cycles and current bus utilization information to determine when to schedule the deferred reply. In one or more embodiments, the bus interface 120 may load a timer to track the time between the receipt of the early header and the completion of the response packet.
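The scheduling decision at the bus interface 120 can be reduced to a small calculation: the deferred reply cannot start before the gap elapses (data arrival) nor before the bus is free. Expressing both quantities in bus cycles is an assumption for illustration.

```python
def schedule_deferred_reply(now: int, gap_cycles: int, bus_free_at: int) -> int:
    """After the header CRC validates at cycle `now`, the data arrives
    `gap_cycles` later; the deferred reply is scheduled at the later of
    data arrival and the first cycle the bus 110 is available."""
    data_ready_at = now + gap_cycles
    return max(data_ready_at, bus_free_at)
```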
Stage 314 marks the receipt of the requested data within the response packet at the requesting node 100. In stage 316, the data may be returned to the processor 102 over the bus 110, having previously arbitrated for use of the bus 110 for this purpose after validation of the header CRC. In stages 318 and 319, the coherency unit 122 may update the directory and perform a cache write to create a copy of the data received. In stage 320, the entry for the command in the pending queue of the coherency unit 122 is retired.
During processing of the command, there may be an error in the transmission of the data response, and the CRC performed on the received data may not check out. In this case, an embodiment of the present invention provides methods for error correction that allow the deferred reply to begin early despite the error.
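One plausible recovery path (the patent's specific mechanism is detailed with reference to the figure) is sketched below: if the CRC over the received data fails, the already-started deferred reply cannot complete with good data, so the request is re-issued rather than returning corrupt data. The retry policy shown is an assumption.

```python
import zlib

def deliver_data(data: bytes, received_crc: int, retries_left: int):
    """Check the data CRC on arrival; complete the deferred reply on
    success, otherwise re-request the data (or surface an error once
    retries are exhausted)."""
    if zlib.crc32(data) == received_crc:
        return ("complete", data)
    if retries_left > 0:
        return ("retry", None)   # re-issue the request to the remote node
    return ("error", None)       # unrecoverable after all retries
```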
In the process shown in
The foregoing description discloses only exemplary embodiments of the invention. Modifications of the above disclosed apparatus and methods which fall within the scope of the invention will be readily apparent to those of ordinary skill in the art.
Accordingly, while the present invention has been disclosed in connection with exemplary embodiments thereof, it should be understood that other embodiments may fall within the spirit and scope of the invention, as defined by the following claims.
The present application is related to U.S. patent application Ser. No. ______, filed ______ and titled “EARLY HEADER CRC IN DATA RESPONSE PACKETS WITH VARIABLE GAP COUNT” (Attorney Docket No. ROC920070353US1), and to U.S. patent application Ser. No. ______, filed ______ and titled “EARLY HEADER CRC IN DATA RESPONSE PACKETS WITH VARIABLE GAP COUNT” (Attorney Docket No. ROC920070354US1), both of which are hereby incorporated by reference herein in their entirety.