The field of this invention relates to an integrated circuit device and a method for calculating a predicate value.
In the field of computer architecture design, it is known for branch predication to be used for mitigating the processing costs that are typically associated with conditional branches of software code, in particular branches to short sections of code. Branch predication is achieved by allowing each instruction to conditionally either perform an operation or do nothing. With branch predication, each instruction is associated with a predicate and will only be executed if the predicate is ‘true’. In this manner all possible branch paths may be followed within the processing pipeline, with the ‘correct’ path being ‘kept’ (executed) and all other paths being discarded based on the predicate values. The main purpose of predication is to avoid jumps over small sections of program code, thereby increasing the effectiveness of pipelined execution and avoiding problems with caching. In addition, functions that are traditionally computed using simple arithmetic and bitwise operations may be quicker to compute using predicated instructions. Furthermore, predicated instructions with different predicates can be combined with unconditional code, thereby allowing better instruction scheduling and, thus, better performance. Further still, predication enables elimination of unnecessary branch instructions, thereby making the execution of the remaining necessary branches, such as those that make up loops, faster by lessening the load on branch prediction mechanisms.
Predicates are also used to control conditional execution of instructions, and as such their respective conditions are required to be calculated in order for the predicate values to be set. Complex conditions require calculating complex Boolean functions. Typically, such calculations are performed over several cycles. For example, performing such complex calculations may comprise moving predicate registers to general purpose registers and using the CPU's (Central Processing Unit's) execution units to perform the calculations before moving the results back to the predicate registers. Alternatively, the CPU may provide a predetermined set of Boolean operations that may be performed directly on the contents of the predicate registers.
Applications are becoming increasingly more demanding in their requirements for the efficiency of processing devices, such as CPUs and the like, to execute application program code. Accordingly, there is an increasing need for CPUs and the like to minimise the number of cycles required to execute instructions, including in the case of branch predication minimizing the number of cycles required to perform conditional calculations and/or the number of predicates used, which can accelerate control code performance.
The present invention provides an integrated circuit device and a method for calculating a predicate value as described in the accompanying claims.
Specific embodiments of the invention are set forth in the dependent claims.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
Further details, aspects and embodiments of the invention will be described, by way of example only, with reference to the drawings. In the drawings, like reference numbers are used to identify like or functionally similar elements. Elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale.
Examples of the present invention will now be described with reference to an example of an instruction processing architecture, such as a central processing unit (CPU) architecture. However, in other examples, the present invention may not be limited to the specific instruction processing architecture herein described with reference to the accompanying drawings, and may equally be applied to alternative architectures. For the illustrated example, an instruction processing architecture is provided that comprises separate data and address registers. However, in some examples, separate address registers need not be provided, as data registers may be used to provide address storage. Furthermore, for the illustrated examples, the instruction processing architecture is shown as comprising four data execution units. Some examples of the present invention may equally be implemented within an instruction processing architecture comprising any number of data execution units, and not necessarily four. Additionally, because the illustrated example embodiments of the present invention may, for the most part, be implemented using electronic components and circuits known to those skilled in the art, details will not be explained in any greater extent than that considered necessary as illustrated below, for the understanding and appreciation of the underlying concepts of the present invention and in order not to obfuscate or distract from the teachings of the present invention.
Referring first to
As previously mentioned, applications are becoming increasingly more demanding in their requirements for the efficiency of processing devices such as CPUs and the like to execute application program code. Accordingly, there is an increasing need for CPUs and the like to minimise the number of cycles required to execute instructions, including in the case of branch predication minimizing the number of cycles required to perform condition calculation and/or the number of predicates used, which can accelerate control code performance.
Thus, in accordance with an example embodiment of the present invention, the instruction processing module 100 may be arranged to perform branch predication, and comprises at least one predicate calculation module arranged to receive, as inputs, at least one result vector for a predicate function and at least one conditional parameter value therefor. The instruction processing module 100 may also be arranged to output a predicate value from the result vector based at least partly on the at least one received conditional parameter value. For the illustrated example, the at least one predicate calculation module forms an integral part of an execution module 120, and is illustrated generally at 150. As such, it is contemplated that one or more of the execution modules 120 may comprise such a predicate calculation module. However, in other examples, one or more predicate calculation modules may additionally and/or alternatively be provided as discrete (stand alone) functional units within the instruction processing module 100.
Predicates are used to control conditional execution of instructions, and as such their respective conditions are required to be calculated in order for the predicate values to be set. Complex conditions require calculating complex Boolean functions. An example of such a Boolean function may comprise:
(p0 || !((p1 ^ p2) && p3)) → p4 [1]
where p0, p1, p2 and p3 represent predicate registers containing conditional parameter values within the function, and p4 represents a predicate register into which the result of the function is to be loaded.
Conventionally, such calculations are performed over several cycles. For example, performing such calculations may comprise moving the conditional parameter values from predicate registers to general purpose registers and using one or more execution modules to perform the calculations over several execution cycles, before moving the results back to the predicate registers. Alternatively, the execution modules may comprise a predetermined set of Boolean operations that may be performed directly on the contents of the predicate registers. In either case, the above function is required to be calculated as follows:
XOR p1, p2, p2
AND p2, p3, p2
NOT p2, p2
OR p2, p0, p4
Thus, for such conventional implementations, four execution cycles are required to calculate a result for the above function.
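By way of illustration only, the conventional four-step sequence above may be modelled in software as follows. This is a minimal sketch and not part of the described hardware. The function name `conventional_predicate` is hypothetical, and it is assumed that the last operand of each listed instruction is the destination register and that each operation costs one execution cycle.

```python
from typing import Tuple


def conventional_predicate(p0: int, p1: int, p2: int, p3: int) -> Tuple[int, int]:
    """Evaluate function [1] as four sequential one-bit operations.

    Returns (p4, cycles). Assumptions: the last operand of each listed
    instruction is its destination, and each operation takes one cycle.
    """
    cycles = 0

    p2 = p1 ^ p2      # XOR p1, p2, p2
    cycles += 1
    p2 = p2 & p3      # AND p2, p3, p2
    cycles += 1
    p2 = p2 ^ 1       # NOT p2, p2 (one-bit complement)
    cycles += 1
    p4 = p2 | p0      # OR  p2, p0, p4
    cycles += 1

    return p4, cycles
```

Each step depends on the result of the previous one, which is why the sequence in this model cannot be collapsed into fewer than four dependent operations.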
Conversely, for the illustrated example of the present invention, by encoding the possible results for such a function within the at least one result vector 220, and outputting the appropriate predicate value 240 from the at least one result vector 220 in accordance with the one or more conditional parameter values 230, a predicate value may be advantageously calculated in a single execution cycle.
For example, and as illustrated in
Thus, by providing such a result vector 220 to the predicate calculation module 150 along with a current permutation of the at least one conditional parameter value(s) 230, the at least one conditional parameter value(s) 230 may be used as selectors within the predicate calculation module 150, for selecting the appropriate predicate result value 240 from the at least one result vector 220 for that permutation of the at least one conditional parameter value(s) 230.
Typically, each permutation of the at least one conditional parameter value(s) 230 may be represented within the truth table 320 as a string of n bits, and thus interpreted as a binary number of n bits. In this manner, each permutation of the at least one conditional parameter value(s) 230 may be interpreted as comprising a unique binary number of n bits. By ordering the permutations of the at least one conditional parameter value(s) 230 in, say, ascending values within the truth table 320, a systematic and predictable ordering of all 2^n possible permutations of conditional parameter values 230, and thereby of all 2^n possible result values 240 within the result vector 220 may be achieved. As a result, the predicate calculation module 150 may be configured to systematically and predictably select the appropriate predicate result value 240 from the result vector 220 provided thereto, based on the received at least one conditional parameter value(s) 230.
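The ordering and selection described above may be sketched in software as follows, purely as an illustrative model. The names `predicate_function`, `build_result_vector` and `select_predicate` are hypothetical. It is further assumed, since the bit significance is not fixed by the text, that the first conditional parameter (p0) is the most significant bit of the n-bit number.

```python
from itertools import product


def predicate_function(p0, p1, p2, p3):
    """Function [1]: p0 OR NOT((p1 XOR p2) AND p3), on one-bit values."""
    return p0 | (1 ^ ((p1 ^ p2) & p3))


def build_result_vector(func, n):
    """Return the 2**n result values in ascending order of the parameters.

    product() yields (0,...,0), (0,...,1), ... so entry i holds the result
    for the permutation whose n-bit binary value is i (first parameter is
    assumed to be the most significant bit).
    """
    return [func(*bits) for bits in product((0, 1), repeat=n)]


def select_predicate(result_vector, params):
    """Use the parameter values as a selector into the result vector."""
    index = 0
    for bit in params:
        index = (index << 1) | bit
    return result_vector[index]


# Result vector for function [1] with n = 4 parameters (16 entries).
RESULT_VECTOR = build_result_vector(predicate_function, 4)
```

Under these assumptions, `select_predicate(RESULT_VECTOR, (0, 0, 1, 1))` selects entry 3 of the vector. The vector is computed once, ahead of time, so the selection itself involves no Boolean evaluation of the original function.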
In accordance with some example embodiments of the present invention, the at least one result vector 220, or at least an indication thereof, together with the one or more conditional parameter values 230, or at least an indication thereof, may be encoded within a single predicate calculation instruction, such as illustrated at 200 in
Conversely, in other examples, the single predicate calculation instruction 200 may additionally and/or alternatively comprise, say, one or more register identifiers for identifying one or more registers containing the at least one result vector 220 and/or one or more conditional parameter value(s) 230. Such registers may comprise general purpose data registers, such as illustrated at 140 in
Accordingly, such a predicate calculation instruction 200 may only comprise a result vector encoded therein (or a register identifier therefor), and the execution module 120 may be arranged, upon receipt of such a predicate calculation instruction 200, to extract the encoded at least one result vector 220 from the received predicate calculation instruction 200 (or retrieve the at least one result vector 220 from the register identified therein), and to retrieve the one or more conditional parameter value(s) 230 from the defined registers.
For the illustrated example, the predicate calculation instruction 200 further comprises an indication 245 of a storage location at which the predicate value is to be stored. For example, such a storage location may comprise a general purpose data register, such as illustrated at 140 in
Substantially any Boolean function is capable of being represented by way of a truth table. Accordingly, a predicate calculation module 150, as hereinbefore described with reference to the accompanying drawings, is capable of calculating a predicate result value 240 for substantially any Boolean function for which it is provided an appropriate result vector 220, with the only substantial limitation being the number n of the one or more conditional parameter value(s) 230 and the size (2^n) of the at least one result vector 220 that the predicate calculation module 150 is capable of receiving. Furthermore, only a single predicate calculation instruction 200 and a single execution cycle are required for calculating the predicate result value 240, irrespective of the complexity of the underlying Boolean function. Accordingly, such a predicate calculation module 150 may enable substantially any predicate result value 240 to be calculated (for Boolean functions comprising up to n conditional parameters) using only a single predicate calculation instruction 200, and requiring only a single execution cycle, irrespective of the complexity of the underlying Boolean function.
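To illustrate that the same selection step serves different Boolean functions, the following sketch packs a result vector into a single integer of 2^n bits and reads one bit back out. It is an assumption-laden model only: the helper names `pack_result_vector`, `lookup` and `majority` are hypothetical, bit i of the packed word is assumed to hold the result for parameter index i, and the first parameter is assumed to be the most significant bit of that index.

```python
def pack_result_vector(func, n):
    """Pack the 2**n results of func into one integer.

    Assumed layout: bit i of the word holds the result for the parameter
    permutation whose n-bit value is i (first parameter = most significant).
    """
    word = 0
    for index in range(1 << n):
        bits = [(index >> (n - 1 - k)) & 1 for k in range(n)]
        if func(*bits):
            word |= 1 << index
    return word


def lookup(word, params):
    """Select one result bit; the cost does not depend on the function."""
    index = 0
    for bit in params:
        index = (index << 1) | (bit & 1)
    return (word >> index) & 1


def majority(a, b, c):
    """Example three-parameter function: 1 when at least two inputs are 1."""
    return int(a + b + c >= 2)


def function_1(p0, p1, p2, p3):
    """Function [1] from the description, on one-bit values."""
    return p0 | (1 ^ ((p1 ^ p2) & p3))
```

In this model a three-parameter function needs an 8-bit word and a four-parameter function a 16-bit word, and `lookup` performs the same shift-and-mask in both cases, which mirrors the point that only n and the vector size 2^n limit what can be calculated.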
In some examples, a predicate calculation module 150 may be implemented by way of multiplexing circuitry, as generally illustrated in
Referring now to
In the foregoing specification, the invention has been described with reference to specific examples of embodiments of the invention. It will, however, be evident that various modifications and changes may be made therein without departing from the broader spirit and scope of the invention as set forth in the appended claims.
The connections as discussed herein may be any type of connection suitable to transfer signals from or to the respective nodes, units or devices, for example via intermediate devices. Accordingly, unless implied or stated otherwise, the connections may for example be direct connections or indirect connections. The connections may be illustrated or described in reference to being a single connection, a plurality of connections, unidirectional connections, or bidirectional connections. However, different embodiments may vary the implementation of the connections. For example, separate unidirectional connections may be used rather than bidirectional connections and vice versa. Also, a plurality of connections may be replaced with a single connection that transfers multiple signals serially or in a time multiplexed manner. Likewise, single connections carrying multiple signals may be separated out into various different connections carrying subsets of these signals. Therefore, many options exist for transferring signals.
Each signal described herein may be designed as positive or negative logic. In the case of a negative logic signal, the signal is active low where the logically true state corresponds to a logic level zero. In the case of a positive logic signal, the signal is active high where the logically true state corresponds to a logic level one. Note that any of the signals described herein can be designed as either negative or positive logic signals. Therefore, in alternate embodiments, those signals described as positive logic signals may be implemented as negative logic signals, and those signals described as negative logic signals may be implemented as positive logic signals.
Those skilled in the art will recognize that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements. Thus, it is to be understood that the architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. For example, and as previously mentioned, for the illustrated examples the predicate calculation module 150 has been illustrated and described as comprising an integral part of an execution module 120. However, it is contemplated that a predicate calculation module adapted in accordance with the present invention may equally be implemented as a generally discrete, stand alone functional element, or integrated within an alternative functional module.
Any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermediary components. Likewise, any two components so associated can also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality.
Furthermore, those skilled in the art will recognize that boundaries between the above described operations are merely illustrative. The multiple operations may be combined into a single operation, a single operation may be distributed in additional operations and operations may be executed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.
However, other modifications, variations and alternatives are also possible. The specifications and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim. Furthermore, the terms “a” or “an”, as used herein, are defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an”. The same holds true for the use of definite articles. Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements. The mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to advantage.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB11/50275 | 1/21/2011 | WO | 00 | 6/28/2013 |