The present invention relates generally to microprocessors, and in particular to a computer utilizing a zero overhead loop strategy for an arbitrary number of nested loops.
Many different processor architectures are known in the art. Known processors typically read instructions and data, perform operations on the data according to the instructions, and forward results from the operations to other stages.
According to the example shown in
Jumps, conditional jumps, and loops are exceptional events in an instruction stream and cause the instruction stream to stall. As a consequence, processing units run idle unless additional effort is made to fill the pipes. This phenomenon occurs when, e.g., counters are compared with a value or conditions are evaluated in the execute stage 63. The decode logic 59 is then idle, and even the fetch of the next instruction from the program memory 55 cannot be performed until the condition is evaluated or the comparison by a program count control unit 60 is complete.
The program count is the address of the instruction which is read from the program memory 55. The program count is stored in a program count register 51 and is modified by a program count control logic 53 which can handle jumps, conditional jumps, and even loops.
Usually, two kinds of loops are used: loops that are bound to a condition, and loops that are bound to a counter. Loops that are bound to a condition work similarly to conditional jumps: if a condition evaluates to true, the program count jumps back to an instruction of the instruction sequence before the current instruction. Loops that are bound to a counter repeat as long as a counter is not equal to zero, with the counter decremented at the end of each loop iteration.
One technique to avoid idle stages and stalling of the instruction stream in case of loops is a zero overhead loop approach. Several implementations of zero overhead loops are available that allow a logic circuit to determine whether the loop has to be repeated or not in either the decode stage or the fetch stage. The main idea of zero overhead loops is that the loop control is located in the fetch stage (or alternatively in the decode stage) and not in the execution stage.
Nested loops traditionally require additional complex logic to implement. Available approaches limit the number of nested loops or use a high number of logic elements such as comparators or use a high number of registers.
However, even for single-instruction multiple data (SIMD) architectures, loop control is of high importance as a multitude of physical units (PUs) work in parallel. The PUs in SIMD architectures normally are controlled by a central control unit. Idle execute stages in such architectures would mean all execute stages of all PUs are running idle thus leading to a higher loss of processing power.
However, even with various techniques applied, there is still considerable room for improvement. Therefore, what is needed is a high-performance implementation of zero overhead loops which provides the loop depth, i.e., the loop level, and provides an optimal and simple circuit to control nested loops.
A method and apparatus to control execution of nested loops are disclosed. The method and apparatus store the loop level of the current loop in execution and use this loop level to select the correct data set provided for each loop. This data set for each loop includes a start address, an end address, and a loop counter or a loop flag, respectively. The method and apparatus can use just one comparator and make use of a loop level control logic and a loop control logic. Example embodiments for such a loop level control logic and a loop control logic are provided. The method and apparatus allow arbitrary nested loops to be controlled without increasing the complexity of the circuit and allow additional loop control. The only precondition is that the loop end addresses are different.
In an exemplary embodiment, the present invention is an electronic circuit to implement zero overhead loops for N nested loops in a processor. The circuit includes a program count register configured to store a program count where the program count is an address of an instruction to be fetched, a plurality of loop start registers configured to store loop start addresses of the N nested loops where the loop start addresses are addresses of a first of a plurality of instructions of the nested loops, and a plurality of loop end registers configured to store loop end addresses of the N nested loops where the loop end addresses are addresses of a last of the plurality of instructions of the nested loops. The circuit also includes a loop level control logic configured to control and set a loop level, where the loop level control logic includes a loop level register configured to store a loop level.
In another exemplary embodiment, the present invention is a method of controlling N nested loops including storing a program count where the program count is an address of an instruction to be fetched next, storing a set of N loop start addresses, where the loop start addresses are the addresses of a first of the instructions of the N nested loops, storing a set of N loop end addresses where the loop end addresses are the addresses of a last of the instructions of the N nested loops, and storing a loop level where the loop level is a number of a current loop with the current loop being a most inner loop containing an instruction in execution. The method also includes determining a current loop start address out of the set of N loop start addresses using the loop level, determining a current loop end address out of the set of N loop end addresses using the loop level, generating a next address by incrementing the program count, selecting a next value for the program count from a set of possible program count values, comparing the program count with the current loop end address, controlling and setting the loop level, and controlling and setting the program count multiplexer.
Typical computer programs make use of nested loops. Each loop in a set of nested loops has a loop level. Imagine a number of N nested loops, where each loop except for the most outer one is contained in another loop. The loop level (LL) of the most outer loop is 1 and the LL of the most inner loop is N. Therefore, loop N is contained in loop N−1 which is contained in loop N−2 and so on. Hence, all loops are contained in loop 1. Each loop has a start address and an end address which are the bounds of a loop. Hence, every instruction contained in loop N is within the bounds of all other loops as well.
An analysis of available programs shows that, in most programs, nested loops have different end addresses. Given, for example, three nested loops, in most applications the end address of loop 1 is higher than the end address of loop 2 which is higher than the end address of loop 3.
As used herein, the property of nested loops whereby the end address of every loop is higher than the end address of its nested inner loops is termed the characteristic of the disclosure. The present invention exploits this characteristic and supports nested loops which are arranged in such a way.
Hence, programs that make use of nested loops which have the same end address have to be rearranged for execution by the disclosed method and apparatus. The only criterion those loops have to meet is the characteristic of the disclosure. Because loops that have exactly the same end address can be easily rearranged by the programmer to meet the required characteristic, the present invention can be applied to all available programs.
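The characteristic can be illustrated with a short check. This is only a sketch for illustration: the function name `satisfies_characteristic` and the list layout (index 0 holds the end address of loop 1, the most outer loop) are assumptions and do not appear in the disclosure.

```python
def satisfies_characteristic(end_addresses):
    """Return True if every loop ends at a higher address than the loop nested in it.

    end_addresses[0] belongs to loop 1 (most outer loop) and the last
    entry belongs to loop N (most inner loop). This layout is an assumption.
    """
    # Compare each loop's end address with that of the loop nested directly inside it.
    pairs = zip(end_addresses, end_addresses[1:])
    return all(outer_end > inner_end for outer_end, inner_end in pairs)


# Three nested loops with distinct, decreasing end addresses meet the characteristic.
print(satisfies_characteristic([30, 20, 10]))
# Two loops sharing an end address do not and would have to be rearranged.
print(satisfies_characteristic([30, 20, 20]))
```

A program whose loops fail this check would need to be rearranged, for example by placing an additional instruction after the end of the inner loop, before the loops meet the required characteristic.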
The characteristic leads to a significant reduction of the complexity of a zero overhead loop circuit. One advantage of the present invention is that it can be used for an arbitrary number of nested loops without increasing the complexity of the circuit. For example, registers may be used to store the loop start addresses, loop end addresses, and loop counts that control a loop. Any associated logic can be kept very small and does not depend on the number of nested loops to be supported. To achieve this design, the present invention stores and provides the loop level (LL) of the loop that is currently being executed. The loop level can be used for control purposes as well. Controlling the loop level enables additional loop control. For example, skipping inner loops without requiring any changes to the program is explained in detail herein. In the disclosure which follows, the loop level of the current loop will be referred to simply as the loop level.
The exemplary embodiment of
A loop level (LL) register 301 stores the LL of the loop which will be repeated next. The loop which has the LL that is stored in the LL register 301 is called a current loop. As an example, imagine two nested loops: an outer loop (LL=1) and an inner loop (LL=2). When the loops are entered the first time, the LL register 301 is set to 2 as the inner loop (LL=2) is the loop that is repeated first. The LL register 301 holds the value 2 until all loop iterations of the inner loop have been performed. The LL register 301 is then set to the LL of the next outer loop, which is 1 in this example. When the end address of the outer loop (LL=1) is reached, the next loop iteration of the outer loop is performed, the LL register 301 is set back to the maximum LL, which is 2 again, and the process is repeated until all outer loop iterations are performed.
The LL register 301 is set and controlled by a loop level control logic 230. The LL is used to select the bounds of its loop by means of a start multiplexer 204 and an end multiplexer 214. The start multiplexer 204 uses the value of the LL register 301 to select the current loop start address from the loop start addresses stored in the set of start registers 202. The end multiplexer 214 uses the value of the LL register 301 to select the current loop end address from the loop end addresses stored in the set of end registers 212.
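The selection performed by the start multiplexer 204 and the end multiplexer 214 can be sketched in software as an indexed lookup. This is an illustrative model only: the function name `select_bounds`, the use of Python lists for the register sets 202 and 212, and the mapping of LL to list index LL-1 are assumptions.

```python
def select_bounds(loop_level, start_registers, end_registers):
    """Model the start and end multiplexers: pick the bounds of the current loop.

    start_registers and end_registers stand in for the register sets 202 and 212.
    Entry 0 holds the values for LL=1 (an assumed layout).
    """
    index = loop_level - 1  # LL counts from 1, list positions from 0
    current_start = start_registers[index]  # output of the start multiplexer
    current_end = end_registers[index]      # output of the end multiplexer
    return current_start, current_end


# Hypothetical addresses for three nested loops.
starts = [0, 2, 4]
ends = [9, 7, 5]
print(select_bounds(3, starts, ends))  # bounds of the most inner loop
print(select_bounds(1, starts, ends))  # bounds of the most outer loop
```

Because only the selected pair of bounds is forwarded, a single comparator downstream is sufficient regardless of how many loops are nested.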
A comparator 217 signals a loop control logic 240 when the current loop end address and the PC are equal. The loop control logic 240 is responsible for deciding and signaling whether the current loop has to be repeated or not. Reasons to repeat a loop can be that a certain loop condition is true or that a certain number of loop iterations has not yet been reached. If the current loop has to be repeated, the loop control logic 240 resets the PC register 51 to the start address of the current loop. The loop control logic 240 uses a PC multiplexer 209 to load the PC register 51 either with the next address calculated by an incrementer 207 or with the current loop start address received from the start multiplexer 204.
If the loop control logic 240 decides that a loop must not be repeated, the loop control logic 240 signals the loop level control logic 230 that the loop level has to be decremented. As previously mentioned, the loop level control logic 230 controls and sets the LL register 301. The loop level control logic 230 can be implemented in different ways. Embodiments of the present invention can use the LL register 301 to avoid the execution of loops. Other embodiments of the present invention can use the LL register 301 to explicitly control which loops have to be performed.
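The decision made around the comparator 217 and the PC multiplexer 209 can be sketched as a single function. This is a simplified model under stated assumptions: the function name `select_next_pc` and the boolean `repeat_requested` (standing in for whatever loop condition or counter test the loop control logic 240 applies) are hypothetical, and instruction addresses are assumed to advance by 1.

```python
def select_next_pc(pc, current_start, current_end, repeat_requested):
    """Return (next_pc, decrement_level) for one fetch.

    repeat_requested is a stand-in for the loop control logic's decision
    that the current loop has to be repeated (an assumption of this sketch).
    """
    end_reached = (pc == current_end)  # role of the comparator
    take_loop = end_reached and repeat_requested
    # Role of the PC multiplexer: loop start address or incremented PC.
    next_pc = current_start if take_loop else pc + 1
    # Signal to the loop level control logic that the loop is finished.
    decrement_level = end_reached and not repeat_requested
    return next_pc, decrement_level


print(select_next_pc(5, 4, 5, True))   # end reached, loop repeats
print(select_next_pc(5, 4, 5, False))  # end reached, loop is left
print(select_next_pc(4, 4, 5, True))   # inside the loop, PC just increments
```

The sketch leaves the update of the LL register itself to the caller, since the disclosure notes that the loop level control logic 230 can be implemented in different ways.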
In alternative embodiments, the LL register 301 can even be read and written by the execute stage. However, the execute stage operates on instructions which have been fetched several cycles before. Therefore, from the execute stage point of view, the LL value which is stored in the LL register 301 contains the LL of the instruction which will be executed in one of the next cycles. Other embodiments of the present invention can use additional registers in the stages between the fetch stage and the execute stage to avoid such a misalignment.
In
As an example of loop execution,
The first loop state diagram shown in
After the second iteration of the inner loop, the PC is again at the end of the first loop as shown in the third loop state diagram. The count LC3 is 1, and another decrement of the inner loop counter LC3 would result in zero. Therefore, the counter LC3 is reset with the loop count start value of LC3 (its loop count start value is 2) and the value in the LL register 301 is decremented to 2, which is illustrated by the arrow up. No further iteration of the inner loop is initiated.
When the PC reaches the end of the middle loop as shown in the fourth loop state diagram, the middle loop counter LC2 is decremented and the LL register 301 is set to the maximum LL, illustrated by the arrow down to the bottom. The maximum LL is 3 which is the number of nested loops that are processed. The loop control logic 240 initiates a second iteration of the middle loop.
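The counter and loop level behavior described in this walkthrough can be approximated by a small software simulation. This is a sketch under several assumptions that are not taken from the disclosure: the function name `run_nested_loops`, the example addresses, an address increment of 1, the LL register starting at the maximum LL on entry, and LL=0 meaning that no loop is active.

```python
def run_nested_loops(starts, ends, counts, first_pc, last_pc):
    """Simulate the fetch sequence; return a list of (pc, loop_level) pairs.

    Index 0 of starts, ends, and counts belongs to LL=1 (assumed layout).
    """
    max_level = len(starts)
    counters = list(counts)   # loop count registers, preloaded with start values
    level = max_level         # LL register set to the maximum LL on entry
    pc = first_pc
    trace = []
    while True:
        trace.append((pc, level))
        # Single comparison against the end address selected by the loop level.
        at_end = level > 0 and pc == ends[level - 1]
        if at_end and counters[level - 1] > 1:
            counters[level - 1] -= 1          # one more iteration remains
            pc = starts[level - 1]            # jump back to the loop start
            level = max_level                 # inner loops become active again
            continue
        if at_end:
            counters[level - 1] = counts[level - 1]  # reload the start value
            level -= 1                               # next outer loop is current
        if pc == last_pc:
            return trace
        pc += 1


# Hypothetical program: three nested loops, each with a loop count of 2.
trace = run_nested_loops([0, 2, 4], [9, 7, 5], [2, 2, 2], first_pc=0, last_pc=10)
fetched = [pc for pc, _ in trace]
print(fetched.count(4), fetched.count(2), fetched.count(0))
```

With these example values, the first instruction of the inner loop is fetched 8 times, that of the middle loop 4 times, and that of the outer loop 2 times, which matches 2 x 2 x 2 nested iterations using only one end-address comparison per fetch.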
Continuing, the ninth loop state diagram shown in
The next loop state diagram shown in
The eleventh loop state diagram shown in
The last loop state diagram shown in
As shown in
The comparator 217 signals the loop control logic 240 when the current loop end address and the PC are equal. This signal is used by the fifth multiplexer 327 to forward the correct loop count to the input multiplexers 313 of the loop count registers 311. If the PC and the current loop end address are equal, the correct loop count is the value determined by the fourth multiplexer 325; otherwise, it is the loop count LCx.
The loop level control logic 230 controls and sets the LL register 301. The example implementation for the loop level control logic 230 shown in
In the foregoing specification, the present invention has been described with reference to specific embodiments thereof. It will, however, be evident to a skilled artisan that various modifications and changes can be made thereto without departing from the broader spirit and scope of the present invention as set forth in the appended claims. For example, various embodiments described utilize registers, multiplexers, and other electronic components. A skilled artisan will recognize that other components or combinations thereof may serve similar functions and thus may be substituted for the various embodiments described herein. These and various other embodiments are all within a scope of the present invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
This application claims priority from U.S. Provisional Patent Application Ser. No. 60/862,776 entitled “Digital Processor with Control Means for the Execution of Nested Loops” filed Oct. 25, 2006 which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
60862776 | Oct 2006 | US