1. Field of the Invention
The present invention generally relates to processing commands in a command queue. More specifically, the invention relates to processing commands getting address translation cache hits under an outstanding address translation cache miss.
2. Description of the Related Art
Computing systems generally include one or more central processing units (CPUs) communicably coupled to memory and input/output (IO) devices. The memory may be random access memory (RAM) containing one or more programs and data necessary for the computations performed by the computer. For example, the memory may contain a program for encrypting data along with the data to be encrypted. The IO devices may include video cards, sound cards, graphics processing units, and the like configured to issue commands and receive responses from the CPU.
The CPU(s) may interpret and execute one or more commands received from the memory or IO devices. For example, the system may receive a request to add two numbers. The CPU may execute a sequence of commands of a program (in memory) containing the logic to add two numbers. The CPU may also receive user input from an input device entering the two numbers to be added. At the end of the computation, the CPU may display the result on an output device, such as a display screen.
Because sending the next command from a device after processing a previous command may take a long time, during which a CPU may have to remain idle, multiple commands from a device may be queued in a command queue at the CPU. The CPU therefore has fast access to the next command after processing a previous command. The CPU may be required to execute the commands in a given order because of dependencies between the commands. Accordingly, the commands may be placed in the queue and processed in a first in first out (FIFO) order to ensure that dependent commands are executed in the proper order. For example, if a read operation at a memory location follows a write operation to that memory location, the write operation must be performed first to ensure that the correct data is read during the read operation. Thus, commands originating from the same IO device may be processed by the CPU in the order in which they were received, while commands from different devices may be processed out of order.
The commands received by the CPU may be broadly classified as (a) commands requiring address translation and (b) commands without addresses. Commands without addresses may include interrupts and synchronization instructions such as the PowerPC eieio (Enforce In-order Execution of Input/Output) instructions. An interrupt command may be a command from a device to the CPU requesting the CPU to set aside what it is doing to do something else. An eieio operation may be issued to prevent subsequent commands from being processed until all commands preceding the eieio command have been processed. Because there are no addresses associated with these commands, they may not require address translation.
Commands requiring address translation include read commands and write commands. A read command may include an address of the location of the data to be read. Similarly, a write command may include an address for the location where data is to be written. Because the address provided in the command may be a virtual address, the address may require translation to an actual physical location in memory before performing the read or write.
Address translation may require looking up a segment table and a page table to match a virtual address with a physical address. For recently targeted addresses, the page table and segment table entries may be retained in a cache for fast and efficient access. However, even with fast and efficient access through caches, subsequent commands may be stalled in the pipeline during address translation. One solution to this problem is to process subsequent commands in the command queue during address translation. However, command order must still be retained for commands from the same IO device.
If, during translation, no table entry translating a virtual address to a physical address is found in the cache, the entry may have to be fetched from memory. Fetching entries when there are translation cache misses may result in a substantial latency. When a translation cache miss occurs for a command, address translation for subsequent commands may still continue. However, only one translation cache miss may be allowed by the system. Therefore, only those subsequent commands that have translation cache hits (hits under miss), or commands that do not require address translation may be processed while a translation cache miss is being handled.
One problem with this solution is that commands getting cache hits under an outstanding address translation cache miss may be dependent on the command getting the outstanding address translation cache miss. For example, the dependent commands may be issued by the same device, and in the same virtual channel, thereby requiring that the commands be executed in order. As a result of the dependency, the subsequent dependent commands may be processed again after the translation results for the command getting the miss are retrieved. Therefore, the addresses of the dependent subsequent commands may need to be retranslated after the outstanding miss has been handled.
One solution to this problem is to handle only one command at a time. However, as described above, this may cause a serious degradation in performance because commands may be stalled in the pipeline during address translation. Another solution may be to reissue the subsequent dependent commands for translation after address translation entries for the command getting a miss have been retrieved from memory. However, this solution is inefficient because of the redundant address translation. Yet another solution may be to include software preload of translation cache wherein the software ensures no misses. However, this solution creates undesired software overhead.
Therefore, what is needed are systems and methods for efficiently processing commands getting hits under a miss.
The present invention generally provides methods and systems for processing commands in a command queue.
One embodiment of the invention provides a method for processing commands in a command queue having stored therein a sequence of commands received from one or more input/output devices. The method generally comprises sending an address targeted by a first command in the command queue to address translation logic to be translated, and in response to determining no address translation entry exists in an address translation table of the translation logic containing virtual to real translation of the address targeted by the first command in the command queue, initiating retrieval of the address translation entry from memory. The method further comprises processing one or more commands received subsequent to the first command while retrieving the entry for the first command, wherein the processing includes sending an address targeted by a second command in the command queue to the address translation logic to be translated. The method further includes, in response to determining that the one or more commands received subsequent to the first command were sent by the same device that sent the first command, preserving the one or more commands and the address translation of the second command until the address translation for the first command is completed.
Another embodiment of the invention provides a system for processing commands in a command queue generally comprising one or more input/output devices, and a processor. The processor generally comprises (i) a command queue configured to store a sequence of commands received from the one or more input/output devices, (ii) an input controller configured to process commands from the command queue in a pipelined manner, (iii) address translation logic configured to translate addresses targeted by commands processed by the input controller using address translation tables with entries containing virtual to real address translations, and (iv) control logic configured to, in response to determining that a second command is sent by the same device that sent the first command for which an address translation entry is not found in cache, preserve the address translation for the second command until the address translation entry for a first command is retrieved.
Yet another embodiment of the invention provides a microprocessor for processing commands in a command queue. The microprocessor generally comprises (i) a command queue configured to store a sequence of commands from an input/output device, (ii) an input controller configured to process the commands in the command queue in a pipelined manner, (iii) address translation logic configured to translate virtual addresses to physical addresses utilizing cached address translation entries in an address translation table, and if for a command the address translation entry is not found in the cache, retrieve a corresponding address translation entry from memory, and (iv) an output controller configured to, in response to determining that a second command is sent by the same device that sent a first command for which an address translation entry is not found in cache, preserve the address translation for the second command until the address translation entry for the first command is retrieved.
So that the manner in which the above recited features, advantages and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments thereof which are illustrated in the appended drawings.
It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
Embodiments of the present invention provide methods and systems for maintaining command order while processing commands in a command queue. Commands may be queued in an input command queue at the CPU. During address translation for a command, subsequent commands may be processed to increase efficiency. Processed commands may be placed in an output queue and sent to the CPU in order. If address translation entries for a command are not found, the translation entries may be retrieved from memory. Address translations for subsequent commands depending from the command getting the miss may be preserved until the address translation entry is retrieved from memory. Therefore, retranslation of addresses for subsequent commands is avoided.
In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to specific described embodiments. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, in various embodiments the invention provides numerous advantages over the prior art. However, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).
An Exemplary System
IO device 120 may also be configured to receive responses 132 from CPU 110. Responses 132, for example, may include the results of computation by CPU 110 that may be displayed to the user. Responses 132 may also include write operations performed on a memory device, such as the DRAM device described above. While one IO device 120 is illustrated in
Memory 140 is preferably a random access memory such as a dynamic random access memory (DRAM). Memory 140 may be sufficiently large to hold one or more programs and/or data structures being processed by the CPU. While the memory 140 is shown as a single entity, it should be understood that the memory 140 may in fact comprise a plurality of modules, and that the memory 140 may exist at multiple levels from high speed caches to lower speed but larger DRAM chips.
CPU 110 may include a command processor 111, translate logic 112, an embedded processor 113 and cache 114. Command processor 111 may receive one or more commands 131 from IO device 120 and process the commands. Each of commands 131 may be broadly classified as commands requiring address translation and commands without addresses. Therefore, processing a command may include determining whether the command requires address translation. If the command requires address translation, the command processor may dispatch the command to translate logic 112 for address translation. After those of commands 131 requiring translation have been translated, the command processor may place ordered commands 133 on the on-chip bus 117 to be processed by the embedded processor 113 on the memory controller 118.
Translate logic 112 may receive one or more commands requiring address translation from command processor 111. Commands requiring address translation, for example, may include read and write commands. A read command may include an address for the location of the data that is to be read. Similarly, a write operation may include an address for the location where data is to be written.
The address included in commands requiring translation may be a virtual address. A virtual address may refer to virtual memory allocated to a particular program. Virtual memory may be a contiguous memory space assigned to the program, which maps to different, non-contiguous physical memory locations within memory 140 and/or secondary storage. Therefore, when a virtual memory address is used, the virtual address must be translated to an actual physical address to perform operations on that location.
Address translation may involve looking up a segment table and a page table. The segment table and/or the page table may match virtual addresses with physical addresses. These pre-translated table entries may reside in main memory. Address translations for recently accessed data may be retained in a segment table 116 and page table 115 in cache 114 to reduce translation time for subsequent accesses to previously accessed addresses. If an address translation is found in cache 114, a translation cache hit occurs and the translation may be retrieved from the page and segment table entry in cache. If an address translation is not found in cache 114, a translation cache miss occurs and the translations may be brought into the cache from memory or other storage, when necessary.
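By way of illustration only, the hit and miss behavior described above may be sketched as follows. The class name `TranslationCache`, the dictionary-backed tables, and the 4096-byte page size are assumptions made for this sketch and are not taken from the described embodiments.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class TranslationCache:
    """Toy model: cached translations backed by a slower table in memory."""

    def __init__(self, memory_table):
        # memory_table maps virtual page number -> physical page number
        # and stands in for the pre-translated entries in main memory.
        self.memory_table = memory_table
        self.cached = {}  # translations retained for recently used pages

    def lookup(self, virtual_address):
        """Return (physical_address, hit); hit is False on a cache miss."""
        vpage, offset = divmod(virtual_address, PAGE_SIZE)
        if vpage in self.cached:
            # Translation cache hit: use the retained entry.
            return self.cached[vpage] * PAGE_SIZE + offset, True
        # Translation cache miss: bring the entry in from memory and
        # retain it for subsequent accesses to the same page.
        ppage = self.memory_table[vpage]
        self.cached[vpage] = ppage
        return ppage * PAGE_SIZE + offset, False
```

In this sketch, the first access to a page reports a miss and a later access to the same page reports a hit, mirroring the reduced translation time for previously accessed addresses.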
Segment table 116 may indicate whether the virtual address is within a segment of memory allocated to a particular program. Segments may be variable sized blocks in virtual memory, each block being assigned to a particular program or process. Therefore, the segment table may be accessed first. If the virtual address addresses an area outside the bounds of a segment for a program, a segmentation fault may occur.
Each segment may be further divided into fixed size blocks called pages. The virtual address may address one or more of the pages contained within the segment. A page table 115 may map the virtual address to pages in memory. If a page is not found in memory, the page may be retrieved from secondary storage where the desired page may reside.
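For illustration, the two-stage lookup described above (segment bounds check first, then page mapping) may be sketched as follows. The function name `translate`, the `(base, limit)` representation of a segment, the `SegmentationFault` exception, and the page size are all hypothetical choices for this sketch.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class SegmentationFault(Exception):
    """Raised when an address falls outside the program's segment."""


def translate(virtual_address, segment, page_table):
    """Translate a virtual address, or return None if the page is not resident.

    segment    -- (base, limit) pair bounding the program's virtual segment
    page_table -- dict mapping virtual page number -> physical page number
    """
    base, limit = segment
    # The segment table is consulted first: reject out-of-bounds addresses.
    if not (base <= virtual_address < base + limit):
        raise SegmentationFault(hex(virtual_address))
    vpage, offset = divmod(virtual_address, PAGE_SIZE)
    if vpage not in page_table:
        # Page not found in memory; a caller would retrieve it from
        # secondary storage before retrying.
        return None
    return page_table[vpage] * PAGE_SIZE + offset
```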
Command Processing
The translate interface input control (TIIC) 202 may monitor and manage the input command FIFO 201. The TIIC may maintain a read pointer 210 and a write pointer 211. The read pointer 210 may point to the next available command for processing in the input command FIFO. The write pointer 211 may indicate the next available location for writing a newly received command in the input command FIFO. As each command is retrieved from the input command FIFO for processing, the read pointer may be incremented. Similarly, as each command is received from the IO device, the write pointer may also be incremented. If the read or write pointers reach the end of the input command FIFO, the pointer may be reset to point to the beginning of the input command FIFO at the next increment.
TIIC 202 may be configured to ensure that the input command FIFO does not overflow by preventing the write pointer from increasing past the read pointer. For example, if the write pointer is increased and points to the same location as the read pointer, the buffer may be full of unprocessed commands. If any further commands are received, the TIIC may send an error message indicating that the command could not be latched in the CPU.
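The pointer management and overflow protection described above may be sketched, for illustration only, as a circular buffer. The class name `InputCommandFifo` and the separate `count` field (used here to distinguish a full buffer from an empty one when the two pointers coincide) are assumptions of this sketch.

```python
class InputCommandFifo:
    """Toy circular buffer with a read pointer and a write pointer."""

    def __init__(self, depth):
        self.slots = [None] * depth
        self.read_ptr = 0   # next command available for processing
        self.write_ptr = 0  # next free location for a received command
        self.count = 0      # assumed extra state: number of unprocessed commands

    def push(self, command):
        """Latch a command; return False if the FIFO is full."""
        if self.count == len(self.slots):
            return False  # caller would report that the command was not latched
        self.slots[self.write_ptr] = command
        # Wrap to the beginning when the end of the buffer is reached.
        self.write_ptr = (self.write_ptr + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self):
        """Return (fifo_address, command) for the next command, or None if empty."""
        if self.count == 0:
            return None
        address = self.read_ptr
        command = self.slots[address]
        self.read_ptr = (self.read_ptr + 1) % len(self.slots)
        self.count -= 1
        return address, command
```

Returning the FIFO address together with the command reflects the fact that the input command FIFO address travels down the pipeline with each command.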
TIIC 202 may also determine whether a command received in the input command FIFO 201 is a command requiring address translation. If a command requiring translation is received the command may be directed to translate logic 112 for processing. If, however, the command does not require address translation, the command may be passed down the pipeline.
The operations in the TIIC begin in step 301 by receiving a command from the input command FIFO. For example, the TIIC may read the command pointed to by the read pointer. After the command is read, the read pointer may be incremented to point to the next command. In step 302, the TIIC may determine whether the retrieved command requires address translation. If it is determined that the command requires address translation, the command may be sent to translate logic 112 for address translation in step 303. In step 304, the input command FIFO address of the command sent to the translate logic may be sent down the pipeline. In step 302, if it is determined that the command does not require address translation, the command and the input command FIFO address of the command may be sent down the pipeline in step 305.
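Steps 302 through 305 may be sketched as follows, for illustration only. The function name `tiic_issue`, the dictionary representation of a command, and the use of plain lists for the translate logic input and the pipeline are hypothetical.

```python
# Assumed set of operations that carry an address needing translation.
TRANSLATED_OPS = ("read", "write")


def tiic_issue(fifo_address, command, translate_queue, pipeline):
    """Route one command read from the input command FIFO."""
    if command["op"] in TRANSLATED_OPS:                  # step 302
        # Step 303: send the command to the translate logic.
        translate_queue.append((fifo_address, command))
        # Step 304: only the FIFO address goes down the pipeline.
        pipeline.append((fifo_address, None))
    else:
        # Step 305: the command and its FIFO address go down the pipeline.
        pipeline.append((fifo_address, command))
```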
Referring back to
If no miss occurs during address translation, the translate logic may provide translation results to the Translate Interface Output Control (TIOC) 203, as illustrated in
If, however, the page and segment table entries are not found in the segment and page table caches, a notification of the translation miss for the command address may be sent to the TIOC in step 405. The translate logic may initiate miss handling procedures in step 406. For example, miss handling may include sending a request to memory or secondary storage device for the corresponding page or segment table entries.
It is important to note that, for some embodiments, the translate logic may handle only one translation cache miss at a time. If a second miss occurs while an outstanding miss is being handled, a miss notification may be sent to the TIOC. The handling of a second miss while an outstanding miss is being processed is discussed in greater detail below. Furthermore, as an outstanding miss is being handled, subsequent commands requiring address translation may continue to be processed. Because retrieving page and segment table entries from memory or secondary storage may take a relatively long time, stalling subsequent commands may substantially degrade performance. Therefore, subsequent commands with translation cache hits may be processed while a miss is being handled.
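The single-outstanding-miss behavior may be sketched as follows, for illustration only. The class name `TranslateLogic`, the string result codes, and the `complete_miss` method are assumptions of this sketch rather than elements of the described embodiments.

```python
class TranslateLogic:
    """Toy translate logic that tolerates hits under one outstanding miss."""

    def __init__(self, cached):
        self.cached = dict(cached)    # virtual page -> physical page
        self.outstanding_miss = None  # (tag, vpage) of the miss being handled

    def translate(self, tag, vpage):
        """Return ('hit', ppage), ('miss', None) or ('second_miss', None)."""
        if vpage in self.cached:
            # A hit may be serviced even while a miss is outstanding.
            return "hit", self.cached[vpage]
        if self.outstanding_miss is None:
            # First miss: begin miss handling (fetch from memory).
            self.outstanding_miss = (tag, vpage)
            return "miss", None
        # A miss under a miss cannot be handled concurrently.
        return "second_miss", None

    def complete_miss(self, ppage):
        """Install the fetched entry and clear the outstanding miss."""
        tag, vpage = self.outstanding_miss
        self.cached[vpage] = ppage
        self.outstanding_miss = None
        return tag, ppage
```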
Processing Commands Under Misses
Referring back to
On the other hand, if a command received by the TIOC depends on a command that may not have been processed, the command complete signal for the command may not be asserted. For example, a first command in the input command FIFO may require address translation and may be transferred to the translate logic for address translation. While the first command is being translated, a subsequent second command that depends on the first command but does not require address translation may be passed to the TIOC sooner than the first command. Similarly, while the first command is being translated, a subsequent third command that depends on the first command may get a translation cache hit and be passed to the TIOC.
Each command may include an IO identifier (IOID) and virtual channel number associated with the command. The IOID, for example, may identify the IO device from which the command was received. The TIOC may identify dependencies between commands by comparing the IOID and virtual channel of the commands getting address translation hits to the IOID and virtual channel of the command for which an address translation entry is being retrieved from memory.
As used herein, the term virtual channel generally refers to a data path that carries request and/or response information between components, for example, an IO device and a processor. Each virtual channel typically utilizes a different buffer within the device, with a virtual channel number indicating which buffer a packet transferred on that virtual channel will use. Virtual channels are referred to as virtual because, while multiple virtual channels may utilize a single common physical interface (e.g., a bus), they appear and act as separate channels.
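The dependency test based on IOID and virtual channel, described above, may be sketched as follows. The `Command` tuple and the function name `depends_on_miss` are hypothetical and used for illustration only.

```python
from collections import namedtuple

# Assumed minimal command representation for this sketch.
Command = namedtuple("Command", ["ioid", "virtual_channel", "op"])


def depends_on_miss(command, missed_command):
    """Return True if command must stay ordered behind the missed command."""
    if missed_command is None:
        return False  # no outstanding miss, so no ordering hazard
    return (command.ioid == missed_command.ioid
            and command.virtual_channel == missed_command.virtual_channel)
```

Under this test, a command from a different IO device, or from the same device on a different virtual channel, is treated as independent of the outstanding miss.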
If a dependent command getting a hit under a miss in the address translation cache is encountered, the command, along with the translated address may be stored in a hit collision FIFO 205 by the TIOC. The hit collision FIFO 205 may be a buffer large enough to hold a predetermined number of commands. The TIOC may not assert a command complete signal for commands stored in the hit collision FIFO 205 until the pending address translation miss has been handled.
After the translation results for the command getting the miss have been retrieved, the TIOC may assert the command complete signal for the command getting the miss. The command complete signal may also be asserted for commands in the hit collision FIFO. While issuing commands in the hit collision FIFO, the previously translated results stored in the command queue may be used. Therefore, the retranslation of addresses for commands in the hit collision FIFO is avoided.
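For illustration only, holding dependent hits and releasing them with their saved translations may be sketched as follows. The class name `Tioc`, its method names, and the `(name, ioid, virtual_channel)` command tuples are assumptions of this sketch; the `completed` list stands in for asserting the command complete signal.

```python
from collections import deque


class Tioc:
    """Toy output control that defers dependent hits under a miss."""

    def __init__(self):
        self.missed = None            # command with the outstanding miss
        self.hit_collision = deque()  # (command, translated address) held back
        self.completed = []           # commands whose completion was signaled

    def on_miss(self, command):
        self.missed = command

    def on_hit(self, command, physical_address):
        # Commands are (name, ioid, virtual_channel) tuples in this sketch.
        if self.missed is not None and command[1:] == self.missed[1:]:
            # Dependent hit under a miss: keep the command and its translation.
            self.hit_collision.append((command, physical_address))
        else:
            self.completed.append((command, physical_address))

    def on_miss_resolved(self, physical_address):
        # The command that missed completes first, preserving order.
        self.completed.append((self.missed, physical_address))
        self.missed = None
        # Held commands complete next, reusing their saved translations,
        # so no address is translated a second time.
        while self.hit_collision:
            self.completed.append(self.hit_collision.popleft())
```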
Because the latency for retrieving address translation entries for a command getting a miss may be large, the hit collision FIFO 205 may fill up, leaving no room for additional dependent commands receiving hits in the address translation cache. If the hit collision FIFO becomes full, a hit collision FIFO full signal 212 may be sent to the TIIC, as illustrated in
On the other hand, if the IOID and virtual channel number of the command match the IOID and virtual channel number of a command getting a miss, the TIOC may store the command and translation results for the command in hit collision FIFO 505. In step 504 a determination is made as to whether the hit collision FIFO is now full. If the hit collision FIFO is now full, a hit collision FIFO full signal may be sent to the TIIC in step 506.
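The store-then-check-full sequence described above may be sketched as follows, for illustration only. The function name `store_dependent_hit` and the depth of four entries are hypothetical; the description states only that the hit collision FIFO holds a predetermined number of commands.

```python
from collections import deque

HIT_COLLISION_DEPTH = 4  # assumed capacity, for illustration only


def store_dependent_hit(hit_collision_fifo, command, translation,
                        depth=HIT_COLLISION_DEPTH):
    """Store a dependent hit; return True if the full signal should be sent."""
    # Keep the command together with its translation results.
    hit_collision_fifo.append((command, translation))
    # Determine whether the hit collision FIFO is now full.
    return len(hit_collision_fifo) >= depth
```

A caller modeling the TIOC would forward a `True` return value to the TIIC as the hit collision FIFO full signal.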
The TIOC may also monitor the number of misses occurring in the translate logic to identify a miss under a miss. As described above, each time a miss occurs in the translate logic, a notification may be sent to the TIOC identifying the command getting the miss. If a second miss occurs while a first miss is being handled, the TIOC may stall the pipeline until the first miss has been handled; only then can processing of the command causing the second miss resume.
In response to receiving the stall notification from the TIOC, the TIIC may stall the pipeline by not issuing commands until further notice from the TIOC. The pipeline may be stalled until the first miss has been handled and the translation results are received by the TIOC. The TIIC may also reset the read pointer to point to the command causing the second miss in the input command FIFO. Therefore, the command causing the second miss and subsequent commands may be reissued after the first miss has been handled.
The pipeline may be drained before reissuing a command causing a second miss and subsequent commands.
Thereafter, in step 704, processing of the command causing the second miss and subsequent commands may be resumed. One simple way for resuming processing of the command causing the second miss and subsequent commands may be to reissue the commands. For example, the TIIC may receive the second command causing the miss and subsequent commands from the input command FIFO and process the commands as described above. Therefore, command ordering is maintained.
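The stall, read pointer reset, and reissue sequence described above may be sketched as follows, for illustration only. The class name `Tiic` and its method names are hypothetical, and a plain list stands in for the input command FIFO.

```python
class Tiic:
    """Toy input control that reissues commands after a miss under a miss."""

    def __init__(self, commands):
        self.commands = list(commands)  # contents of the input command FIFO
        self.read_ptr = 0
        self.stalled = False

    def issue(self):
        """Return (fifo_address, command), or None if stalled or empty."""
        if self.stalled or self.read_ptr >= len(self.commands):
            return None
        address = self.read_ptr
        self.read_ptr += 1
        return address, self.commands[address]

    def on_stall(self, second_miss_address):
        # Stop issuing and point back at the command causing the second miss.
        self.stalled = True
        self.read_ptr = second_miss_address

    def on_first_miss_handled(self):
        # Resume: the second-miss command and later commands are reissued.
        self.stalled = False
```

Because issue resumes from the command that caused the second miss, that command and every later command pass through the pipeline again in their original order.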
Conclusion
By allowing processing of subsequent commands during address translation for a given command, overall performance may be greatly improved. Furthermore, subsequent commands depending on the given command and their address translations may be preserved until the address translation for the given command is retrieved, thereby avoiding the need to retranslate addresses for the dependent commands.
While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.
This application is related to U.S. patent application Ser. No. _______, Attorney Docket No. ROC920050456US1, entitled METHOD FOR COMPLETING IO COMMANDS AFTER AN IO TRANSLATION MISS, filed Feb. __, 2006, by John D. Irish et al. and U.S. patent application Ser. No. ______, Attorney Docket No. ROC920050463US1, entitled METHOD FOR COMMAND LIST ORDERING AFTER MULTIPLE CACHE MISSES, filed Feb. __, 2006, by John D. Irish et al. The related patent applications are herein incorporated by reference in their entirety.