The technology of the disclosure relates generally to performing loop buffering (i.e., loop detection and replay) for loops in computer software instructions processed in a processor.
Microprocessors, also known as “processors,” perform computational tasks for a wide variety of applications. A conventional microprocessor includes a central processing unit (CPU) that includes one or more processor cores, also known as “CPU cores,” that execute software instructions. The software instructions instruct a CPU to perform operations based on data. The CPU performs an operation according to the instructions to generate a result, which is a produced value. Processors employ instruction pipelining as a processing technique whereby the throughput of instructions being executed by a processor may be increased by splitting the handling of each instruction into a series of steps. These steps are executed in one or more instruction pipelines each composed of multiple stages in an instruction processing circuit. In this regard, an instruction processing circuit in a processor includes an instruction fetch circuit that is configured to fetch instructions to be executed from an instruction memory (e.g., system memory or an instruction cache memory). The fetched instructions are decoded in a decoding stage and inserted into an instruction pipeline to be pre-processed before reaching an execution circuit to be executed.
Many modern high-performance processors deploy a loop buffer for further pipeline optimization and power savings. A loop is defined as any sequence of instructions in the pipeline whose processing is repeated sequentially in back-to-back operations. For example, loops can occur based on programming software loop constructs that are then compiled into instructions that, according to their processing, will cause a loop operation.
A conventional loop buffer in a processor may also be designed to ignore or not otherwise identify short loops (i.e., loops with a small number of instructions) and/or loops with multiple exit points. This is because the power savings benefit of identifying and replaying such loops may be outweighed by the power cost and complexity associated with identifying and replaying such loops. For example, the processor may wait until a pre-defined number of iterations of a loop are detected before the loop is considered detected for replay. Further, it may be difficult to track or otherwise predict the number of iterations that a loop will iterate for loops that contain multiple exit points. Loop buffering of small loops and/or loops with multiple exit points could actually reduce processor performance and increase power consumption.
Exemplary aspects disclosed herein include loop buffering employing loop characteristic prediction in a processor for optimizing loop buffer performance. The processor includes an instruction processing circuit configured to fetch computer program instructions (“instructions”) into an instruction stream in an instruction pipeline(s) to be processed and executed. Loops can be contained in the instruction stream. A loop is a sequence of instructions in the instruction stream that repeat sequentially in a back-to-back arrangement. The instruction processing circuit includes a loop buffer circuit that is configured to detect loops. In response to a detected loop, the loop buffer circuit is configured to capture (i.e., loop buffer) instructions in the detected loop and insert (i.e., replay) the captured loop instructions in the instruction pipeline for iterations of the loop. In this manner, the instructions in the loop do not have to be re-fetched and re-processed, for example, for the subsequent iterations of the loop. Thus, loop buffering can conserve power by not having to re-fetch and re-process instructions in the loop for subsequent iterations of the loop. In exemplary aspects, the loop buffer circuit is configured to predict the number of iterations that a detected loop in the instruction stream will be executed before the loop is exited, as a loop iteration prediction. The loop iteration prediction is a type of loop characteristic prediction. This is to reduce or avoid under- or over-iterating the loop replay. The loop iteration prediction is used to control the number of iterative replays of the loop in the instruction pipeline. For example, a design that chooses a fixed iteration assumption for controlling replay may more often under- or over-iterate loop replay. As another example, a design that chooses to indefinitely replay a loop until a detected exit will over-iterate loop replay.
Under-iterating a loop replay results in instructions in the loop being re-fetched and re-processed in the instruction pipeline that otherwise could have been replayed, thus consuming additional power unnecessarily. Over-iterating a loop replay results in additional replay of iterations of the loop in the instruction pipeline that reduces processor performance by such additional iterations being processed unnecessarily.
A replayed loop in the instruction pipeline of the processor may exit without a full iteration. In other words, the last iteration of a loop may be a partial iteration where the loop is exited before all instructions in the loop are fully replayed. In this regard, in other exemplary aspects, the loop buffer circuit can also be configured to predict the loop exit branch of the detected loop as a loop exit branch prediction. The loop exit branch prediction is a type of loop characteristic prediction. The loop exit branch prediction can be used to assist the loop buffer circuit in predicting the exact number of full iterations of the loop to be replayed and what instructions to replay for the last partial iteration of the loop. Predicting the number of loop iterations and the loop exit branch allows a more accurate prediction of the number of full iterations of the loop to be replayed in the instruction pipeline to further reduce or avoid under- or over-iterating of the loop replay. Providing a more accurate prediction of the loop iterations to be replayed before the loop is exited can reduce the overhead penalty that would be associated with inaccurately predicting loop iteration for replay of shorter-length, detected loops. Providing a more accurate prediction of the loop iterations to be replayed before the loop is exited can also allow the loop buffer circuit to more accurately instruct the instruction fetch circuit when to resume the fetching and processing of new instructions following a detected loop. This can reduce or avoid instruction bubbles in the instruction pipeline. In this regard, the loop buffer circuit can be configured to instruct the instruction fetch circuit to resume fetching of new instructions following the loop exit based on the predicted loop exit branch of the loop.
The loop buffer circuit can be configured to instruct the instruction fetch circuit to halt fetching and processing of new instructions while a detected loop is being replayed to conserve power. However, the replayed loop may have multiple exit points that could be taken during the last partial iteration of the replayed loop. The next address from which to fetch instructions following a loop exit is not necessarily the next sequential instruction after the loop. In this regard, in other exemplary aspects, the loop buffer circuit can also be configured to predict the exit target address of the loop as a loop exit target prediction. The loop exit target prediction is a type of loop characteristic prediction. The loop buffer circuit can use the exit target address of the loop exit target prediction to instruct the instruction processing circuit as to the starting address to fetch new instructions following the loop exit when instruction fetching is resumed. The loop buffer circuit could be configured to instruct the immediate resumption of instruction fetching during loop replay without having to wait until the loop is exited in replay. Otherwise, without a loop exit target prediction, resuming instruction fetching before the loop is exited may make it more likely that the instruction pipeline will have to be flushed, due to fetching of instructions that do not follow the correct next address following the loop exit. The loop buffer circuit can also be configured to instruct resumption of instruction fetching following a detected loop at a defined period of time before the loop is exited, determined from the predicted number of loop iterations and the loop exit branch, as a further optimization. Predicting the loop exit target of a replayed loop may make it more feasible for a loop buffer design to detect and replay shorter loops (as opposed to only replaying longer loops).
This is because the instruction fetch circuit can more accurately restart the fetching of next instructions that follow the actual exit of the replayed loop based on the exit target prediction. In the absence of a loop exit target prediction, the cost associated with restarting, after a short-running loop, the fetching of next instructions that may not follow the actual loop exit may outweigh the benefits of replaying the loop from the loop buffer. Therefore, only longer-running loops may be profitable from a benefit versus cost standpoint in the absence of loop exit target prediction. In the presence of loop exit target prediction, detection and replay of even short-running loops may yield a benefit.
In another exemplary aspect, if the predicted number of loop iterations and the loop exit branch are hard to predict, such as their predictions having a low confidence indicator, for example, the loop buffer circuit can alternatively replay the detected loop indefinitely as discussed above. However, if the loop buffer circuit also has a prediction of the exit target address of the loop, the loop buffer circuit can be configured to perform a selective partial pipeline flush of the instruction pipeline in response to the loop exit as a further optimization. This is because only the instructions in the pipeline older than the next instruction at the exit target address of the loop exit target prediction in the instruction pipeline have to be flushed.
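As a purely illustrative aid, the selective partial pipeline flush described above can be sketched in software. This is a minimal model under stated assumptions, not the disclosed circuit: the pipeline is modeled as a list ordered oldest first, each entry is a hypothetical `(pc, tag)` tuple, and the function name `selective_partial_flush` and its parameters are invented for this sketch. It also assumes the exit target address lies outside the loop body.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (program counter, descriptive tag); hypothetical model


def selective_partial_flush(
    pipeline: List[Entry], exit_branch_pos: int, exit_target_pc: int
) -> Tuple[List[Entry], List[Entry]]:
    """Return (kept, flushed) after a loop exit is resolved.

    `pipeline` is ordered oldest first. Entries up to and including the
    exit branch are kept. Younger entries are flushed only until the first
    entry fetched from the predicted exit target address is found; that
    entry and everything younger than it are kept.
    """
    kept = pipeline[: exit_branch_pos + 1]
    younger = pipeline[exit_branch_pos + 1:]
    for i, (pc, _tag) in enumerate(younger):
        if pc == exit_target_pc:
            # Over-replayed loop instructions sit before the exit target.
            return kept + younger[i:], younger[:i]
    # No instruction from the exit target is in flight: flush all younger.
    return kept, younger


pipeline = [
    (0x10, "loop"), (0x14, "exit_branch"),
    (0x10, "over_replay"), (0x14, "over_replay"),
    (0x40, "next"), (0x44, "next"),
]
kept, flushed = selective_partial_flush(pipeline, 1, 0x40)
```

In this model only the two over-replayed entries are discarded, while the instructions already fetched from the predicted exit target survive the flush.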
In this regard, in one exemplary aspect a processor is provided. The processor includes an instruction processing circuit, comprising a loop buffer circuit. The loop buffer circuit is configured to detect a loop among a plurality of instructions in an instruction stream in an instruction pipeline to be executed. In response to detection of the loop in the instruction stream, the loop buffer circuit is also configured to predict a number of full iterations of the detected loop to be executed in the instruction pipeline as a loop iteration prediction, predict a loop exit branch of an instruction of the detected loop that will result in the detected loop being exited in the instruction pipeline as a loop exit branch prediction, and fully replay the detected loop in the instruction pipeline for the number of full iterations indicated by the loop iteration prediction. In response to a last full iteration of the detected loop being fully replayed in the instruction pipeline, the loop buffer circuit is also configured to partially replay the plurality of instructions in the detected loop to the instruction at the loop exit branch indicated by the loop exit branch prediction.
In another exemplary aspect, a method of replaying a loop in an instruction pipeline in a processor is provided. The method includes detecting a loop among a plurality of instructions in an instruction stream in an instruction pipeline to be executed. In response to detection of the loop in the instruction stream, the method also includes predicting a number of full iterations of the detected loop to be executed in the instruction pipeline as a loop iteration prediction, predicting a loop exit branch of an instruction of the detected loop that will result in the detected loop being exited in the instruction pipeline as a loop exit branch prediction, fully replaying the detected loop in the instruction pipeline for the number of full iterations indicated by the loop iteration prediction, and partially replaying the plurality of instructions in the detected loop to the instruction at the loop exit branch indicated by the loop exit branch prediction, in response to a last full iteration of the detected loop being fully replayed in the instruction pipeline.
In another exemplary aspect, a processor is provided. The processor includes an instruction processing circuit comprising an instruction fetch circuit configured to fetch a plurality of instructions into an instruction pipeline as an instruction stream to be executed, and an execution circuit configured to execute the plurality of instructions in the instruction stream. The processor also includes a loop buffer circuit. The loop buffer circuit is configured to detect a loop among the plurality of instructions in the instruction stream in the instruction pipeline to be executed in the execution circuit, and replay the detected loop in the instruction pipeline. In response to replay of the detected loop in the instruction pipeline, the loop buffer circuit is also configured to instruct the instruction fetch circuit to halt fetching next instructions into the instruction pipeline, and predict an exit target address of the next instruction to be executed following exit of the detected loop in the instruction pipeline as a loop exit target prediction. The loop buffer circuit is also configured to instruct the instruction fetch circuit to start fetching next instructions into the instruction pipeline starting at the exit target address of the loop exit target prediction.
In another exemplary aspect, a method of fetching next instructions following a detected loop replayed in an instruction pipeline in a processor is provided. The method includes fetching a plurality of instructions into an instruction pipeline as an instruction stream to be executed. The method also includes detecting a loop among the plurality of instructions in the instruction stream in the instruction pipeline to be executed. The method also includes replaying the detected loop in the instruction pipeline. In response to replaying the detected loop in the instruction pipeline, the method also includes instructing an instruction fetch circuit to halt fetching next instructions into the instruction pipeline, and predicting an exit target address of a next instruction to be executed following exit of the detected loop in the instruction pipeline as a loop exit target prediction. The method also includes instructing the instruction fetch circuit to start fetching next instructions into the instruction pipeline starting at the exit target address of the loop exit target prediction.
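For illustration only, the halt-and-resume interaction between loop replay and instruction fetching can be modeled as a small state machine. The class `FetchController`, its method names, and the fixed 4-byte instruction size are all assumptions made for this sketch; they are not taken from the disclosure.

```python
from typing import Optional


class FetchController:
    """Toy model of an instruction fetch circuit that can be halted."""

    INSTR_BYTES = 4  # assumption: fixed-width instructions

    def __init__(self, start_pc: int) -> None:
        self.next_fetch_pc = start_pc
        self.halted = False

    def on_loop_replay_start(self) -> None:
        # While the loop is replayed from the buffer, fetching is halted.
        self.halted = True

    def on_fetch_resume(self, predicted_exit_target: int) -> None:
        # Fetching restarts at the predicted exit target, which need not
        # be the next sequential address after the loop.
        self.next_fetch_pc = predicted_exit_target
        self.halted = False

    def fetch(self) -> Optional[int]:
        """Return the PC fetched this cycle, or None while halted."""
        if self.halted:
            return None
        pc = self.next_fetch_pc
        self.next_fetch_pc += self.INSTR_BYTES
        return pc


fc = FetchController(start_pc=0x100)
first = fc.fetch()
fc.on_loop_replay_start()
during_replay = fc.fetch()
fc.on_fetch_resume(predicted_exit_target=0x200)
after_resume = [fc.fetch(), fc.fetch()]
```

The sketch only shows where fetching restarts; when the resume signal is issued relative to the loop exit is a separate design choice.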
Those skilled in the art will appreciate the scope of the present disclosure and realize additional aspects thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.
The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the disclosure, and together with the description serve to explain the principles of the disclosure.
Exemplary aspects disclosed herein include loop buffering employing loop characteristic prediction in a processor for optimizing loop buffer performance. The processor includes an instruction processing circuit configured to fetch computer program instructions (“instructions”) into an instruction stream in an instruction pipeline(s) to be processed and executed. Loops can be contained in the instruction stream. A loop is a sequence of instructions in the instruction stream that repeat sequentially in a back-to-back arrangement. The instruction processing circuit includes a loop buffer circuit that is configured to detect loops. In response to a detected loop, the loop buffer circuit is configured to capture (i.e., loop buffer) instructions in the detected loop and insert (i.e., replay) the captured loop instructions in the instruction pipeline for iterations of the loop. In this manner, the instructions in the loop do not have to be re-fetched and re-processed, for example, for the subsequent iterations of the loop. Thus, loop buffering can conserve power by not having to re-fetch and re-process instructions in the loop for subsequent iterations of the loop. In exemplary aspects, the loop buffer circuit is configured to predict the number of iterations that a detected loop in the instruction stream will be executed before the loop is exited, as a loop iteration prediction. The loop iteration prediction is a type of loop characteristic prediction. This is to reduce or avoid under- or over-iterating the loop replay. The loop iteration prediction is used to control the number of iterative replays of the loop in the instruction pipeline. For example, a design that chooses a fixed iteration assumption for controlling replay may more often under- or over-iterate loop replay. As another example, a design that chooses to indefinitely replay a loop until a detected exit will over-iterate loop replay.
Under-iterating a loop replay results in instructions in the loop being re-fetched and re-processed in the instruction pipeline that otherwise could have been replayed, thus consuming additional power unnecessarily. Over-iterating a loop replay results in additional replay of iterations of the loop in the instruction pipeline that reduces processor performance by such additional iterations being processed unnecessarily.
A replayed loop in the instruction pipeline of the processor may exit without a full iteration. In other words, the last iteration of a loop may be a partial iteration where the loop is exited before all instructions in the loop are fully replayed. In this regard, in other exemplary aspects, the loop buffer circuit can also be configured to predict the loop exit branch of the detected loop as a loop exit branch prediction. The loop exit branch prediction is a type of loop characteristic prediction. The loop exit branch prediction can be used to assist the loop buffer circuit in predicting the exact number of full iterations of the loop to be replayed and what instructions to replay for the last partial iteration of the loop. Predicting the number of loop iterations and the loop exit branch allows a more accurate prediction of the number of full iterations of the loop to be replayed in the instruction pipeline to further reduce or avoid under- or over-iterating of the loop replay. Providing a more accurate prediction of the loop iterations to be replayed before the loop is exited can reduce the overhead penalty that would be associated with inaccurately predicting loop iteration for replay of detected shorter loops. Providing a more accurate prediction of the loop iterations to be replayed before the loop is exited can also allow the loop buffer circuit to more accurately instruct the instruction fetch circuit when to resume the fetching and processing of new instructions following a detected loop. This can reduce or avoid instruction bubbles in the instruction pipeline. In this regard, the loop buffer circuit can be configured to instruct the instruction fetch circuit to resume fetching of new instructions following the loop exit based on the predicted loop exit branch of the loop.
In this regard,
The instructions 208 in the instruction stream 214 may contain loops. A loop is a sequence of instructions 208 in the instruction stream 214 that repeat sequentially in a back-to-back arrangement. A loop can be present in the instruction stream 214 as a result of a programmed software construct that is compiled into a loop among the instructions 208. A loop can also be present in the instruction stream 214 even if not part of a higher-level, programmed, software construct. If the instructions 208 that are part of a loop could be detected when such instructions 208 are processed in an instruction pipeline I0-IN, these instructions 208 could be captured and replayed into the instruction stream 214 without having to re-fetch and/or re-decode such instructions 208, for example, for the subsequent iterations of the loop.
In this regard, the instruction processing circuit 204 in this example includes a loop buffer circuit 220 to perform loop buffering. As discussed in more detail below, the loop buffer circuit 220 is configured to detect a loop in instructions 208 fetched into an instruction pipeline I0-IN as an instruction stream 214 to be processed and executed. The loop buffer circuit 220 is configured to detect loops among the instructions 208 in the instruction stream 214. In response to a detected loop, the loop buffer circuit 220 is configured to capture (i.e., loop buffer) the instructions 208 in the detected loop to be replayed to avoid or reduce the need to re-fetch the instructions in the detected loop, since the processing of these instructions 208 is repeated in the instruction pipeline I0-IN. In this regard, the loop buffer circuit 220 is configured to insert (i.e., replay) the captured loop instructions 208 in an instruction pipeline I0-IN for iterations of the loop. In this manner, the instructions 208 in the loop do not have to be re-fetched and/or re-decoded, for example, for the subsequent iterations of the loop. Thus, loop buffering can conserve power by the instruction fetch circuit 206 not having to re-fetch the instructions 208 in a detected loop for subsequent iterations of the loop. Loop buffering can also conserve power by the instruction decode circuit 219 not having to re-decode the instructions 208 in a detected loop for subsequent iterations of the loop.
In exemplary aspects, as discussed in more detail below, the loop buffer circuit 220 is configured to predict the number of iterations that a detected loop in the instruction stream 214 will be executed before the loop is exited, as a loop iteration prediction. The loop iteration prediction is a type of loop characteristic prediction. This is to reduce or avoid under- or over-iterating the loop replay. The loop iteration prediction is used to control the number of iterative replays of the loop in the instruction pipeline I0-IN. For example, a design that chooses a fixed iteration assumption for controlling replay may more often under- or over-iterate loop replay. As another example, a design that chooses to indefinitely replay a loop until a detected exit will over-iterate loop replay. Under-iterating a loop replay results in instructions 208 in the loop having to be re-fetched and/or re-decoded in the instruction pipeline I0-IN that otherwise could have been replayed, thus consuming additional power unnecessarily. Over-iterating a loop replay results in additional replay of iterations of the loop in the instruction pipeline I0-IN that reduces processor performance by such additional iterations being processed unnecessarily.
A replayed loop in the instruction pipeline I0-IN of the processor 200 may exit without a full iteration. In other words, the last iteration of a loop may be a partial iteration where the loop is exited before all instructions 208 in the loop are fully replayed. In this regard, in other exemplary aspects, as discussed in more detail below, the loop buffer circuit 220 can also be configured to predict the loop exit branch of the detected loop as a loop exit branch prediction. The loop exit branch prediction is a type of loop characteristic prediction. The loop exit branch prediction can be used to assist the loop buffer circuit 220 in predicting the exact number of full iterations of the loop to replay and what instructions 208 in the loop to replay for a last partial iteration of the loop. Thus, predicting the number of loop iterations and the loop exit branch in combination allows a more accurate prediction of the number of full iterations and instructions 208 in the loop for a last partial iteration of the loop to be replayed in the instruction pipeline I0-IN to further reduce or avoid under- or over-iterating of the loop replay. Providing a more accurate prediction of the full and partial loop iterations of a loop to be replayed in the instruction pipeline I0-IN before the loop is exited from the instruction pipeline I0-IN can reduce the overhead penalty that would be associated with inaccurately predicting loop iteration for replay of shorter length, detected loops as an example.
Before discussing more exemplary details of the loop buffer circuit 220 using a loop iteration prediction and loop exit branch prediction of a detected loop processed in the instruction processing circuit 204 in
With continuing reference to
As discussed above, the loop buffer circuit 220 is configured to predict the number of iterations that a detected loop in the instruction stream 214 will be executed before the loop is exited, as a loop iteration prediction, which is a type of loop characteristic prediction. As also discussed above, the loop buffer circuit 220 can also be configured to predict the loop exit branch of the detected loop as a loop exit branch prediction, which is another type of loop characteristic prediction. The loop buffer circuit 220 can use the loop iteration prediction in combination with the loop exit branch prediction to more accurately and precisely control the replay of a detected loop in the instruction stream 214. The loop iteration prediction can be used by the loop buffer circuit 220 to control the number of full iterations of the loop replayed in the instruction stream 214. The loop exit branch prediction may be used by the loop buffer circuit 220 to control what instructions 208 in the loop to replay for a last partial iteration of the loop in the instruction stream 214. Thus, predicting the number of loop iterations and the loop exit branch in combination allows a more accurate prediction of the number of full iterations and instructions 208 in the loop for a last partial iteration of the loop to be replayed in the instruction pipeline I0-IN to further reduce or avoid under- or over-iterating of the loop replay. Providing a more accurate prediction of the full and partial loop iterations of a loop to be replayed in the instruction pipeline I0-IN before the loop is exited from the instruction pipeline I0-IN can reduce the overhead penalty that would be associated with inaccurately predicting loop iteration for replay of shorter length, detected loops as an example.
In this regard, as shown in
After the loop has been replayed for the number of full iterations indicated by the loop iteration prediction, the loop replay circuit 238 is then configured to partially replay the instructions 208D in the detected loop to the instruction at the loop exit branch indicated by the loop exit branch prediction. The loop exit branch of a detected loop is the location of the branch instruction 208D in the loop that results in an exit of the loop in the instruction pipeline I0-IN when executed. In this example, since the exit branch of the loop may not be absolutely known before the loop is fully processed, the loop replay circuit 238 is configured to make a prediction of the loop exit branch as the loop exit branch prediction. For example, the detected loop may have multiple exits. The loop replay circuit 238 is configured to insert instructions 208D from the detected loop into the instruction pipeline I0-IN to be placed up until and including the instruction 208 at the predicted loop exit branch according to the loop exit branch prediction for the last partial iteration of the loop. Controlling the replay of the detected loop according to the combination of the loop iteration prediction and the loop exit branch prediction allows a more accurate prediction of the number of full iterations and instructions 208D in the loop for a last partial iteration of the loop to be replayed in the instruction pipeline I0-IN to further reduce or avoid under- or over-iterating of the loop replay. Providing a more accurate prediction of the full and partial loop iterations of a loop to be replayed in the instruction pipeline I0-IN before the loop is exited from the instruction pipeline I0-IN can reduce the overhead penalty that would be associated with inaccurately predicting loop iteration for replay of shorter length, detected loops as an example.
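The combination of full replays followed by a last partial replay can be illustrated with a short software sketch. It is a simplified model only: the function name `build_replay_sequence`, the representation of instructions as strings, and the use of a zero-based index for the predicted loop exit branch are assumptions introduced here for illustration.

```python
from typing import List


def build_replay_sequence(
    loop_body: List[str],
    predicted_full_iterations: int,
    predicted_exit_index: int,
) -> List[str]:
    """Return the instructions inserted into the pipeline for one replay.

    The loop body is replayed in full `predicted_full_iterations` times,
    then replayed partially up to and including the instruction at
    `predicted_exit_index` (the predicted loop exit branch).
    """
    if predicted_full_iterations < 0:
        raise ValueError("iteration prediction must be non-negative")
    if not 0 <= predicted_exit_index < len(loop_body):
        raise ValueError("exit branch must be inside the loop body")

    replayed: List[str] = []
    for _ in range(predicted_full_iterations):
        replayed.extend(loop_body)  # full iteration
    # Last partial iteration stops at the predicted exit branch.
    replayed.extend(loop_body[: predicted_exit_index + 1])
    return replayed


body = ["load", "add", "branch_exit_a", "store", "branch_top"]
sequence = build_replay_sequence(body, 2, 2)
```

With a five-instruction body, two predicted full iterations, and the exit branch predicted at index 2, the model inserts thirteen instructions and ends on the predicted exit branch.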
In this regard, as shown in
Thus, the loop buffer circuit 220 in the instruction processing circuit 204 in
The loop prediction circuit 400 is configured to provide the loop iteration prediction 402 and/or a loop exit branch prediction 404 to a loop instruction replay circuit 412. The loop instruction replay circuit 412 uses the loop iteration prediction 402 and/or a loop exit branch prediction 404 to control the replay of the detected loop. In this example, as discussed above, the loop instruction replay circuit 412 uses the loop iteration prediction 402 to determine the number of full iterations of the loop to be replayed in the instruction pipeline I0-IN. Also in this example, as discussed above, the loop instruction replay circuit 412 uses the loop exit branch prediction 404 to determine the instructions 208D to replay in the instruction pipeline I0-IN in a last partial replay of the loop. In this example, the loop instruction replay circuit 412 is configured to issue a fetch halt indicator 414 instructing the instruction fetch circuit 206 in
As discussed above, the loop replay circuit 238 in
In one example, the loop iteration context information 508 is based on a program counter (PC) of at least one instruction 208D of one or more previously detected loops. The loop iteration context information 508 is stored in a loop history register 509. The loop iteration context information 508 is also based on a PC of at least one instruction 208D in at least one previously detected and replayed loop. The loop iteration context information 508 may be appended or hashed with the PC of at least one instruction 208D in the current detected loop. In this manner, the loop iteration context information 508 is based on context information from the current detected loop and one or more previously detected and replayed loops. The loop prediction circuit 400 can be configured to edit the loop history register 509 based on the loop iteration context information 508 for detected loops when detected. When a loop is currently detected, the loop replay circuit 238 can also be configured to edit the loop history register 509 based on the loop iteration context information 508 for the current detected loop. The loop iteration context information 508 in the loop history register 509 can be used to index the loop iteration context prediction circuit 506 to access a prediction entry 510(0)-510(X) therein that has a loop iteration prediction stored therein. The loop prediction circuit 400 can set the loop iteration prediction 402 to the loop iteration prediction entry in the indexed and accessed prediction entry 510(0)-510(X) in the loop iteration context prediction circuit 506.
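A context-indexed prediction table of this kind can be sketched as follows. The sketch is only one possible reading: the text says the context information may be appended or hashed with the PC, and XOR hashing is chosen here as an assumption. The class name `LoopIterationPredictor`, the table size, the history width, and the shift amounts are likewise hypothetical.

```python
from typing import List, Optional


class LoopIterationPredictor:
    """Toy table of iteration counts indexed by loop history and PC."""

    def __init__(self, num_entries: int = 64, history_bits: int = 16) -> None:
        self.num_entries = num_entries
        self.history_mask = (1 << history_bits) - 1
        self.history = 0  # models a loop history register
        self.table: List[Optional[int]] = [None] * num_entries

    def _index(self, loop_pc: int) -> int:
        # Assumed hash: XOR the history with the word-aligned loop PC.
        return (self.history ^ (loop_pc >> 2)) % self.num_entries

    def predict(self, loop_pc: int) -> Optional[int]:
        """Return the stored iteration prediction, or None if untrained."""
        return self.table[self._index(loop_pc)]

    def train(self, loop_pc: int, actual_iterations: int) -> None:
        """Record the observed iteration count under the current context."""
        self.table[self._index(loop_pc)] = actual_iterations

    def push_history(self, loop_pc: int) -> None:
        """Fold a completed loop's PC into the history register."""
        self.history = ((self.history << 4) ^ (loop_pc >> 2)) & self.history_mask


predictor = LoopIterationPredictor()
cold = predictor.predict(0x1004)        # nothing learned yet
predictor.train(0x1004, 7)              # loop ran 7 iterations
warm = predictor.predict(0x1004)        # same context, same entry
predictor.push_history(0x1004)          # context now includes this loop
other_context = predictor.predict(0x1004)
```

Because the index depends on the history as well as the PC, the same loop can map to different entries depending on which loops preceded it, which is the point of using context information.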
Similarly, as discussed above, the loop replay circuit 238 in
In one example, the loop exit branch context information 608 can be based on a loop path history of one or more previously detected loops. The loop exit branch context information 608 can also be based on a loop exit branch position history, i.e., the positions of the exit branches in previously detected loops. The loop exit branch context information 608 can also be based on the loop exit PCs of previously detected loops. The loop exit branch context information 608 is stored in a loop history register 609. The loop exit branch context information 608 may be appended or hashed with the loop path history for the current detected loop. In this manner, the loop exit branch context information 608 is based on context information from the current detected loop and one or more previously detected and replayed loops. The loop prediction circuit 400 can be configured to edit the loop history register 609 based on the loop exit branch context information 608 for detected loops when detected. When a loop is currently detected, the loop replay circuit 238 can also be configured to edit the loop history register 609 based on the loop exit branch context information 608 for the current detected loop. The loop exit branch context information 608 in the loop history register 609 can be used to index the loop exit branch context prediction circuit 606 to access a prediction entry 610(0)-610(X) therein that has a loop exit branch prediction stored therein. The loop prediction circuit 400 can set the loop exit branch prediction 404 to the loop exit branch prediction entry in the indexed and accessed prediction entry 610(0)-610(X) in the loop exit branch context prediction circuit 606.
As discussed above, the loop buffer circuit 220 in
In this regard, in other exemplary aspects, the loop buffer circuit 220 in
In response to the replaying of the detected loop in the instruction pipeline I0-IN (block 708 in
As discussed above, the loop buffer circuit 220, and its loop replay circuit 238 for example, can be configured to issue the fetch resumption indicator 244 to cause the instruction fetch circuit 206 to resume fetching of next instructions 208. The instruction fetch circuit 206 may be instructed to resume the fetching of next instructions 208 immediately after a loop is detected, a determined lead time before the loop exits, or after the replayed loop is exited, as examples. In the event that the instruction fetch circuit 206 is instructed to fetch next instructions 208 before the replayed loop is actually exited, the instruction fetch circuit 206 could also be instructed to hold any fetched next instructions 208F from being processed unnecessarily until the exit of the loop is actually detected in the instruction pipeline I0-IN. Once the exit of the replayed loop is detected, the next fetched instructions 208F in the instruction pipeline I0-IN could then be released to be processed. In this manner, fetched next instructions 208F are not unnecessarily processed, and power is not consumed in doing so, when these fetched instructions 208F cannot be executed until after the replayed loop is exited. In one example, the next fetched instructions 208F in the instruction pipeline I0-IN could be held in the instruction fetch circuit 206 or at this stage in the instruction pipeline I0-IN. In another example, the next fetched instructions 208F in the instruction pipeline I0-IN could be held in the instruction decode circuit 219 or at this stage in the instruction pipeline I0-IN.
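As a non-limiting illustration, the hold-and-release behavior described above can be modeled in software as a stage that buffers instructions fetched early and releases them in order once the loop exit is detected. The names `FetchHoldStage` and `should_resume_fetch`, and the expression of the lead time as a number of remaining loop iterations, are assumptions of this sketch and are not taken from the disclosure.

```python
from collections import deque


def should_resume_fetch(predicted_iterations, current_iteration, lead_iterations):
    """Return True once the predicted loop exit is within the lead window."""
    remaining = predicted_iterations - current_iteration
    return remaining <= lead_iterations


class FetchHoldStage:
    """Models a pipeline stage that can hold early-fetched next instructions."""

    def __init__(self):
        self.holding = False
        self.held = deque()    # instructions fetched before the loop exit
        self.released = []     # instructions allowed to proceed down the pipeline

    def resume_fetch(self, loop_exited):
        # Models receipt of a fetch resumption indicator. If the replayed loop
        # has not exited yet, newly fetched instructions are held.
        self.holding = not loop_exited

    def accept(self, instruction):
        if self.holding:
            self.held.append(instruction)
        else:
            self.released.append(instruction)

    def on_loop_exit(self):
        # The loop exit was detected: release held instructions in fetch order.
        self.holding = False
        while self.held:
            self.released.append(self.held.popleft())


stage = FetchHoldStage()
if should_resume_fetch(predicted_iterations=10, current_iteration=7, lead_iterations=3):
    stage.resume_fetch(loop_exited=False)
stage.accept("next_instr_0")
stage.accept("next_instr_1")
stage.on_loop_exit()
print(stage.released)
```

In this sketch, instructions fetched during the lead window consume no downstream processing until `on_loop_exit` is called, which mirrors the stated goal of not processing fetched next instructions before the replayed loop is exited.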
As discussed above, the loop replay circuit 238 in
In this regard,
In one example, the loop exit target context information 808 may be appended or hashed with loop exit target context information 808 for the current detected loop, which may be based on the loop exit target prediction 802 as an example.
In this manner, the loop exit target context information 808 is based on loop exit target context information 808 from the current detected loop and one or more previously detected and replayed loops. The loop prediction circuit 400 can be configured to update the loop history register 509 based on the loop exit target context information 808 for loops as they are detected. When a loop is currently detected, the loop replay circuit 238 can also be configured to update the loop history register 509 based on the loop exit target context information 808 for the current detected loop. The loop exit target context information 808 in the loop history register 509 can be used to index the loop exit target context prediction circuit 806 to access a prediction entry 810(0)-810(X) therein that has a loop exit target prediction stored therein. The loop prediction circuit 400 can set the loop exit target prediction 802 to the loop exit target prediction stored in the indexed and accessed prediction entry 810(0)-810(X) in the loop exit target context prediction circuit 806.
In another exemplary aspect, if the predicted number of loop iterations and the loop exit branch of a detected loop are hard to predict, such as their predictions having a low confidence indicator, for example, the loop buffer circuit 220 in
In this regard, the loop buffer circuit 220 in
The loop buffer circuit 220 in
The processor-based system 900 may be a circuit or circuits included in an electronic board card, such as a printed circuit board (PCB), a server, a personal computer, a desktop computer, a laptop computer, a personal digital assistant (PDA), a computing pad, a mobile device, or any other device, and may represent, for example, a server or a user's computer. In this example, the processor-based system 900 includes the processor 902. The processor 902 represents one or more processing circuits, such as a microprocessor, central processing unit, or the like. The processor 902 is configured to execute processing logic in instructions for performing the operations and steps discussed herein. Fetched or prefetched instructions from a memory, such as from a system memory 910 over a system bus 912, are stored in an instruction cache 908. The instruction processing circuit 904 is configured to process the instructions fetched into the instruction cache 908 for execution. The instructions fetched from the instruction cache 908 to be processed can include loops that are detected by the loop buffer circuit 906 for replay based on predictions of one or more loop characteristics, referred to as loop characteristic predictions.
The processor 902 and the system memory 910 are coupled to the system bus 912 and can intercouple peripheral devices included in the processor-based system 900. As is well known, the processor 902 communicates with these other devices by exchanging address, control, and data information over the system bus 912. For example, the processor 902 can communicate bus transaction requests to a memory controller 914 in the system memory 910 as an example of a slave device. Although not illustrated in
Other devices can be connected to the system bus 912. As illustrated in
The processor-based system 900 in
While the non-transitory computer-readable medium 932 is shown in an exemplary embodiment to be a single medium, the term “computer-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The term “computer-readable medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the processing device and that causes the processing device to perform any one or more of the methodologies of the embodiments disclosed herein. The term “computer-readable medium” shall accordingly be taken to include, but not be limited to, solid-state memories, optical media, and magnetic media.
The embodiments disclosed herein include various steps. The steps of the embodiments disclosed herein may be performed by hardware components or may be embodied in machine-executable instructions, which may be used to cause a general-purpose or special-purpose processor programmed with the instructions to perform the steps. Alternatively, the steps may be performed by a combination of hardware and software.
The embodiments disclosed herein may be provided as a computer program product, or software, that may include a machine-readable medium (or computer-readable medium) having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the embodiments disclosed herein. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes: a machine-readable storage medium (e.g., ROM, random access memory (“RAM”), a magnetic disk storage medium, an optical storage medium, flash memory devices, etc.); and the like.
Unless specifically stated otherwise and as apparent from the previous discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing,” “computing,” “determining,” “displaying,” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data and memories represented as physical (electronic) quantities within the computer system's registers into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.
The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatuses to perform the required method steps. The required structure for a variety of these systems will appear from the description above. In addition, the embodiments described herein are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the embodiments as described herein.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithms described in connection with the embodiments disclosed herein may be implemented as electronic hardware, instructions stored in memory or in another computer-readable medium and executed by a processor or other processing device, or combinations of both. The components described herein may be employed in any circuit, hardware component, integrated circuit (IC), or IC chip, as examples. Memory disclosed herein may be any type and size of memory and may be configured to store any type of information desired. To clearly illustrate this interchangeability, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. How such functionality is implemented depends on the particular application, design choices, and/or design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic device, a discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. Furthermore, a controller may be a processor. A processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
The embodiments disclosed herein may be embodied in hardware and in instructions that are stored in hardware, and may reside, for example, in RAM, flash memory, ROM, Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a remote station. In the alternative, the processor and the storage medium may reside as discrete components in a remote station, base station, or server.
It is also noted that the operational steps described in any of the exemplary embodiments herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary embodiments may be combined. Those of skill in the art will also understand that information and signals may be represented using any of a variety of technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips, that may be referenced throughout the above description, may be represented by voltages, currents, electromagnetic waves, magnetic fields, or particles, optical fields or particles, or any combination thereof.
Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps, or it is not otherwise specifically stated in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that any particular order be inferred.
It will be apparent to those skilled in the art that various modifications and variations can be made without departing from the spirit or scope of the invention. Since modifications, combinations, sub-combinations and variations of the disclosed embodiments incorporating the spirit and substance of the invention may occur to persons skilled in the art, the invention should be construed to include everything within the scope of the appended claims and their equivalents.