The field of this invention relates to integrated circuit devices and methods for scheduling and executing a restricted load operation.
In the field of central processing unit (CPU) architectures and the like, and in particular for ‘in order’ pipelined CPU architectures, instruction scheduling is typically a compiler optimisation routine/process used to improve instruction level parallelism, which improves the performance of instruction processing architectures comprising instruction pipelines. Typically, instruction scheduling attempts to avoid pipeline stalls by re-arranging the order of instructions, and attempts to avoid illegal or semantically ambiguous operations (typically involving subtle instruction pipeline timing issues or non-interlocked resources), without changing the meaning of the application program code that is being compiled.
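As a simple illustrative sketch (with hypothetical names, and pictured at the source level although the scheduler actually operates on machine instructions), independent work may be placed between a load and its first use so that the load-to-use latency is hidden rather than stalling the pipeline:

```c
/* Purely illustrative: re-arranging independent operations to hide latency. */
int sum_and_scale(const int *p, int a, int b)
{
    int loaded = *p;        /* load issued early                           */
    int t = a * b;          /* independent work scheduled into the         */
    int u = a + b;          /* load-to-use 'shadow' of the load            */
    return loaded + t + u;  /* first use of the loaded value               */
}
```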
For conventional CPU architectures, compilers are typically restricted from cross block scheduling optimisations (i.e. scheduling optimisations between basic blocks of code within a program), in order to avoid violating un-optimised code exception behaviour. For example,
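consider the following purely illustrative C fragment (the function and pointer names are hypothetical and are not taken from the claims or drawings), in which a load is guarded by a preceding conditional branch:

```c
/* Purely illustrative sketch: names are hypothetical.                       */
int guarded_read(const int *p)
{
    if (p == 0) {       /* basic block boundary: the guard                 */
        return 0;       /* un-optimised code never dereferences p here     */
    }
    return *p;          /* the load a scheduler might wish to hoist        */
}
/*
 * Hoisting the load of *p above the 'p == 0' test would hide its latency,
 * but when p is NULL it introduces a memory fault that the un-optimised
 * program could never raise, so cross-block hoisting is normally forbidden.
 */
```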
Furthermore, in conventional CPU architectures, compilers are also typically restricted from re-ordering read and write operations due to pointer ambiguity (e.g. in case of a write operation prematurely modifying a read area). For example,
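consider the following purely illustrative C fragment (again with hypothetical names), in which a read follows a write through a pointer that may alias it:

```c
/* Purely illustrative sketch: names are hypothetical.                       */
int store_then_load(int *dst, const int *src, int value)
{
    *dst = value;       /* write operation                                  */
    return *src;        /* read operation: if src == dst it must see value  */
}
/*
 * Moving the read of *src above the write to *dst is only legal when the
 * compiler can prove that dst and src never alias; with an ambiguous
 * pointer it cannot, so the read/write order must be preserved.
 */
```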
Such restrictions in the ability to schedule the execution of instructions can have a significant detrimental effect on the efficiency with which the code may be executed by a CPU, and specifically can result in sub-optimal usage of the parallel processing capabilities of the CPU architecture.
The present invention provides integrated circuit devices, a method for executing a restricted load operation and a method for scheduling a restricted load operation as described in the accompanying claims.
Specific embodiments of the invention are set forth in the dependent claims.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.
Further details, aspects and embodiments of the invention will be described, by way of example only, with reference to the drawings. In the drawings, like reference numbers are used to identify like or functionally similar elements. Elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale.
Examples of the present invention will now be described with reference to an example of an instruction processing architecture, such as a central processing unit (CPU) architecture. However, it will be appreciated that the present invention is not limited to the specific instruction processing architecture herein described with reference to the accompanying drawings, and may equally be applied to alternative architectures. For the illustrated example, an instruction processing architecture is provided comprising separate data and address registers. However, it is contemplated in some examples that separate address registers need not be provided, with data registers being used to provide address storage. Furthermore, for the illustrated examples, the instruction processing architecture is shown as comprising four data execution units. Some examples of the present invention may equally be implemented within an instruction processing architecture comprising any number of data execution units. Additionally, because the illustrated example embodiments of the present invention may, for the most part, be implemented using electronic components and circuits known to those skilled in the art, details will not be explained to any greater extent than that considered necessary, as illustrated below, for the understanding and appreciation of the underlying concepts of the present invention and in order not to obfuscate or distract from the teachings of the present invention.
Referring first to
As previously mentioned, scheduling restrictions can significantly limit the optimisation that may be achieved for the execution of instructions within an instruction processing module such as that illustrated in
In this manner, data held within the target register 340 may be validated by comparing it to the validation data to determine whether or not the previously loaded data is still valid (e.g. has not been overwritten). As a result, a load operation for which a scheduling restriction exists (hereinafter referred to as a ‘restricted load’ operation) may be scheduled ahead of the scheduling restriction, whereby target data is scheduled to be loaded into the target register 340 ahead of the scheduling restriction within the instruction sequence. The load validation instruction may then be scheduled after the scheduling restriction (but before the target data is used) to validate the data within the target register 340 in order to determine whether, following the scheduling restriction, the data is still valid. If the stored data within the target register 340 is still valid (for example, if the stored data within the target register 340 matches the validation data), then the instruction processing module 300 may proceed with executing the next sequential instruction, for example in which the stored data is used. Thus, a more optimised scheduling of such restricted load operations may be performed, thereby enabling a more efficient execution of a respective instruction sequence. Furthermore, as will be appreciated by a skilled artisan, the use of such a load validation instruction in this manner substantially alleviates the need for complex validation mechanisms to be provided, and the need for speculative load operation data etc. to be maintained, within the instruction processing module 300.
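By way of a non-limiting sketch, the behaviour described above may be modelled in C as follows; the variable and function names are illustrative assumptions only, with 'target_reg' standing in for the target register 340 and the fresh read standing in for the retrieval of validation data from the system memory 350.

```c
#include <stdbool.h>

/* Software model of the load-validation behaviour described above.         */
static int target_reg;                          /* models target register 340 */

static void speculative_load(const int *addr)   /* scheduled early, ahead of  */
{                                               /* the scheduling restriction */
    target_reg = *addr;
}

static bool validate_load(const int *addr)      /* scheduled after the        */
{                                               /* scheduling restriction     */
    int validation_data = *addr;                /* fresh read from memory     */
    if (target_reg == validation_data) {
        return true;                            /* data still valid           */
    }
    target_reg = validation_data;               /* repair the stale value     */
    return false;                               /* caller must recover        */
}
```

In hardware this compare-and-repair step would be performed by the single load validation instruction rather than by explicit software; the model merely captures its essential semantics.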
Conversely, for an example instruction sequence 405 scheduled in accordance with some example embodiments of the present invention, the restricted load operation may be initially implemented by way of an initial load instruction 410 that is scheduled ahead of the conditional branch 110 responsible for the scheduling restriction 160. In this manner, the operation of loading target data required for use after the scheduling restriction 160 is initiated in advance, in order to enable the data to be available for use without a need for introducing a stall 170 into the instruction pipeline. Additionally, a load validation instruction 420, as described above, is scheduled after the scheduling restriction 160 to validate the data stored within the target register 340. Assuming the target data loaded by the initial load instruction 410 has not been overwritten, or the data in the target register 340 is not otherwise invalid, and is thereby validated by the load validation instruction 420, the execution of the instruction sequence 405 proceeds on to the next sequential instruction 450, which for the illustrated example uses the target data within the target register 340. Significantly, and as illustrated in
A risk of loading data ahead of the scheduling restriction 160 in this manner is that, in the case of such a scheduling restriction 160 being in the form of a conditional branch, an MMU (Memory Management Unit) may decide not to provide the data in response to the initial load instruction 410. As such, the data in the target register will subsequently not be valid; hence the provision of the load validation instruction 420. In such a case, where the data in the target register 340 is invalid, for example as a result of an MMU (not shown) not providing the data in response to the initial load instruction 410, the load validation instruction 420 may be arranged to cause the validation data to be written to the target register 340, as illustrated at 440. In this manner, the data in the target register 340 may be updated to comprise the correct data. Since the load validation instruction 420 will be required to retrieve the validation data from the system memory 350, it will experience a ‘load to use’ penalty of, in this example, three execution cycles. As a result, any subsequent instructions within the instruction pipeline may have already accessed the invalid data before the data has been (in)validated. In the case where the stored data within the target register 340 is valid, execution of the subsequent sequential instructions within the instruction sequence 405 may be allowed to continue. However, in the case where the stored data within the target register 340 is invalid, the load validation instruction 420 may be further arranged to cause the instruction pipeline to be ‘flushed’, and for the execution flow to restart from, say, the next sequential instruction 450 within the instruction sequence 405 following the load validation instruction 420.
In this manner, corrupt execution of subsequent instructions based on the invalid data may be purged from the instruction pipeline. Although such a flushing of the instruction pipeline will result in a stall whilst subsequent instructions propagate through the instruction pipeline, as illustrated at 470, such a stall 470 is comparable to the stall 170 within the conventional instruction sequence 400. However, as illustrated in
For some example embodiments of the present invention, the initial load instruction 410 may be arranged to cause, for the illustrated example, the instruction processing module 300 to disregard memory management error indications. In some examples, the instruction processing module 300 may disregard a memory management error by blocking the data from reaching the core/target register 340. For example, MMUs (memory management units) are responsible for memory protection and translation services for the CPU. Typically, memory errors arise predominantly from a memory access to an area for which the running task does not have a translation, or to an area to which an operating system (OS) has defined such a task as not being allowed access. In the context of software speculation, such as hereinbefore described, a speculated memory load (e.g. the initial load initiated by initial load instruction 410) can be from a non-initialized pointer with an undefined value. As a result, it is likely to generate a memory error.
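Bringing the conditional-branch example together, the re-ordered instruction sequence 405 may be pictured at the C level by the following sketch; the names are hypothetical, and the reference numerals in the comments simply map each line back to the description above.

```c
/* Illustrative C-level picture of the re-ordered sequence 405.              */
int scheduled_sequence(const int *data_ptr, int take_branch)
{
    int speculative = *data_ptr;    /* initial load 410, hoisted early       */

    if (take_branch) {              /* conditional branch 110 forming        */
        return 0;                   /* scheduling restriction 160            */
    }

    if (speculative != *data_ptr) { /* load validation 420                   */
        speculative = *data_ptr;    /* stale: repair as at 440 (and, in      */
    }                               /* hardware, flush the pipeline)         */

    return speculative + 1;         /* next sequential instruction 450       */
}
```

Only when the speculative value proves stale is a penalty comparable to the original stall incurred; in the common, valid case the load-to-use latency is hidden behind the branch.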
Conversely, for an example instruction sequence 505 scheduled in accordance with some example embodiments of the present invention, the restricted load operation may once again be initially implemented by way of an initial load instruction 410 that is scheduled ahead of the store (write) operation 210 responsible for the scheduling restriction 260. In this manner, the operation of loading target data required for use after the scheduling restriction 260 is initiated in advance in order to enable the data to be available for use without a need for introducing a stall 270 into the instruction pipeline. Additionally, a load validation instruction 420 is scheduled after the scheduling restriction 260 to validate the data stored within the target register 340. As for the example illustrated in
For the examples illustrated in
However, if the data in the target register 340 is invalid, for example as a result of, say, an MMU (not shown) not providing the data in response to the initial load instruction 410, the load validation instruction 420 may be arranged to cause the validation data to be written to the target register 340, as illustrated at 640. In this manner, the data in the target register 340 may be updated to comprise the correct data. As previously mentioned, since the load validation instruction 420 will be required to retrieve the validation data from the system memory 350, it will experience a ‘load to use’ penalty of, in this example, three execution cycles 670. As a result, any subsequent instructions within the instruction pipeline may have already accessed the invalid data before the data has been (in)validated. Thus, in the case where the stored data within the target register 340 is invalid, the load validation instruction 420 may be further arranged to cause the instruction pipeline to be ‘flushed’.
As will be appreciated, the previously executed usage instruction 650, which may have used the invalid data, will be required to be re-executed following the instruction pipeline being flushed. Accordingly, in one example, the load validation instruction 420 may be arranged, following the instruction pipeline being flushed, to cause a re-execution of the speculatively scheduled usage instruction 650, as illustrated at 685. Such an operation may be performed prior to the execution flow re-starting from, say, the next sequential instruction 450 within the instruction sequence 405 following the load validation instruction 420. Thus, for the example illustrated in
Referring now to
Conversely, if, at 730, it is determined that the data within the target register does not match the read validation data, the method moves on to 740, where the validation data is loaded into the target register, over-writing the previous (invalid) data stored therein. The instruction execution core pipeline is then flushed, at 745, in order to purge corrupt execution of subsequent instructions based on the invalid data from the instruction pipeline. The method may then move on to 735 with the continued execution of the next sequential instruction, before ending at 770. However, as previously mentioned, following a speculative usage of the data within the target register ahead of the scheduling restriction 780, such as data usage 717, a conditional jump instruction 732 may (optionally) be received following (or in parallel with) the load validation instruction. Accordingly, following the instruction execution core pipeline being flushed at 745, the method may return to the conditional jump instruction 732. In such a case, the load validation instruction 720 may cause the conditional bit to be set such that the conditional jump instruction is executed, resulting in a change of flow within the execution of the instruction sequence to a ‘fix-up’ code snippet 750, which may cause a re-execution of the speculatively scheduled usage 717. The method may then return to the execution of the next sequential instruction at 735, and end at 770.
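The method just described, including the speculative usage and the fix-up path, may be summarised by the following C sketch; the function names, and the modelling of the pipeline flush and fix-up code 750 as a simple software re-execution, are illustrative assumptions only.

```c
/* Illustrative model of the method: speculative load and usage, validation, */
/* and a 'fix-up' re-execution when the speculation fails.                   */
static int use_data(int x)              /* models speculative usage 717       */
{
    return x * 2;
}

int execute_with_validation(const int *addr)
{
    int target_reg = *addr;             /* speculative load into target reg   */
    int result = use_data(target_reg);  /* speculative usage ahead of the     */
                                        /* scheduling restriction 780         */

    int validation_data = *addr;        /* load validation: fresh read        */
    if (target_reg != validation_data) {
        target_reg = validation_data;   /* step 740: repair target register   */
        /* steps 745/750: pipeline flush and fix-up code re-execute the usage */
        result = use_data(target_reg);
    }

    return result;                      /* continue with next instruction     */
}
```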
Referring now to
In the foregoing specification, the invention has been described with reference to specific examples of embodiments of the invention. It will, however, be evident that various modifications and changes may be made therein without departing from the broader spirit and scope of the invention as set forth in the appended claims.
The connections as discussed herein may be any type of connection suitable to transfer signals from or to the respective nodes, units or devices, for example via intermediate devices. Accordingly, unless implied or stated otherwise, the connections may for example be direct connections or indirect connections. The connections may be illustrated or described in reference to being a single connection, a plurality of connections, unidirectional connections, or bidirectional connections. However, different embodiments may vary the implementation of the connections. For example, separate unidirectional connections may be used rather than bidirectional connections and vice versa. Also, a plurality of connections may be replaced with a single connection that transfers multiple signals serially or in a time multiplexed manner. Likewise, single connections carrying multiple signals may be separated out into various different connections carrying subsets of these signals. Therefore, many options exist for transferring signals.
Although specific conductivity types or polarity of potentials have been described in the examples, it will be appreciated that conductivity types and polarities of potentials may be reversed.
Each signal described herein may be designed as positive or negative logic. In the case of a negative logic signal, the signal is active low where the logically true state corresponds to a logic level zero. In the case of a positive logic signal, the signal is active high where the logically true state corresponds to a logic level one. Note that any of the signals described herein can be designed as either negative or positive logic signals. Therefore, in alternate embodiments, those signals described as positive logic signals may be implemented as negative logic signals, and those signals described as negative logic signals may be implemented as positive logic signals.
Those skilled in the art will recognize that the boundaries between logic blocks are merely illustrative and that alternative embodiments may merge logic blocks or circuit elements or impose an alternate decomposition of functionality upon various logic blocks or circuit elements. Thus, it is to be understood that the architectures depicted herein are merely exemplary, and that in fact many other architectures can be implemented which achieve the same functionality. Specifically, the present invention is not limited to the particular instruction processing architecture illustrated in
Any arrangement of components to achieve the same functionality is effectively “associated” such that the desired functionality is achieved. Hence, any two components herein combined to achieve a particular functionality can be seen as “associated with” each other such that the desired functionality is achieved, irrespective of architectures or intermediary components. Likewise, any two components so associated can also be viewed as being “operably connected,” or “operably coupled,” to each other to achieve the desired functionality.
Furthermore, those skilled in the art will recognize that the boundaries between the above-described operations are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed over additional operations, and operations may be executed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments.
However, other modifications, variations and alternatives are also possible. The specifications and drawings are, accordingly, to be regarded in an illustrative rather than in a restrictive sense.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word ‘comprising’ does not exclude the presence of other elements or steps than those listed in a claim. Furthermore, the terms “a” or “an”, as used herein, are defined as one or more than one. Also, the use of introductory phrases such as “at least one” and “one or more” in the claims should not be construed to imply that the introduction of another claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an”. The same holds true for the use of definite articles. Unless stated otherwise, terms such as “first” and “second” are used to arbitrarily distinguish between the elements such terms describe. Thus, these terms are not necessarily intended to indicate temporal or other prioritization of such elements. The mere fact that certain measures are recited in mutually different claims does not indicate that a combination of these measures cannot be used to advantage.