The technology of the disclosure relates generally to pipeline optimizations for processor-based systems, and, in particular, to providing early pipeline optimization of conditional instructions.
“Conditional instructions,” as used herein, refer to computer-executable instructions that are executed only if a specified condition is met. A conditional instruction may be a conditional branch instruction (which allows program control within an executing computer program to be transferred in response to an asserted condition evaluating as true), or may be a conditional non-branch instruction (the execution of which may vary based on whether a specified condition associated with the instruction evaluates to true). In some computer architectures, such as the Arm® architecture, the outcome of a conditional instruction may be determined by examining a state of condition flags that are maintained by a processor, and that may be set based on the results of previously executed instructions. For example, in the Arm® architecture, four condition flags are represented by bits stored in the Application Program Status Register (APSR), and are referred to as an N (negative) condition flag, a Z (zero) condition flag, a C (carry or unsigned overflow) condition flag, and a V (signed overflow) condition flag.
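By way of illustration only, the relationship between the four condition flags and the outcome of a condition may be sketched in software as follows. The sketch models the standard Arm® condition mnemonics as predicates over the N, Z, C, and V flags, and derives the flags that a compare of two values would produce. The names `Flags`, `condition_passes`, and `flags_after_cmp` are hypothetical and appear nowhere in the disclosure; the sketch is a software model for explanation, not a description of any hardware implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    """Hypothetical software model of the N, Z, C, and V condition flags."""
    n: bool  # negative
    z: bool  # zero
    c: bool  # carry (unsigned overflow)
    v: bool  # signed overflow


# Standard Arm condition mnemonics expressed as predicates over the flags.
CONDITIONS = {
    "EQ": lambda f: f.z,
    "NE": lambda f: not f.z,
    "CS": lambda f: f.c,
    "CC": lambda f: not f.c,
    "MI": lambda f: f.n,
    "PL": lambda f: not f.n,
    "VS": lambda f: f.v,
    "VC": lambda f: not f.v,
    "HI": lambda f: f.c and not f.z,
    "LS": lambda f: (not f.c) or f.z,
    "GE": lambda f: f.n == f.v,
    "LT": lambda f: f.n != f.v,
    "GT": lambda f: (not f.z) and f.n == f.v,
    "LE": lambda f: f.z or f.n != f.v,
    "AL": lambda f: True,
}


def condition_passes(cond: str, flags: Flags) -> bool:
    """Return True if the named condition holds for the given flags."""
    return CONDITIONS[cond](flags)


def flags_after_cmp(a: int, b: int, bits: int = 32) -> Flags:
    """Model the flags produced by comparing a with b (computing a - b)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    n = bool(result & sign)
    z = result == 0
    c = a >= b  # for subtraction, carry set means no borrow occurred
    v = bool((a ^ b) & (a ^ result) & sign)  # operand signs differ, result sign flipped
    return Flags(n, z, c, v)
```

For instance, under this model comparing 3 with 5 clears Z and C and sets N, so the LT and LS conditions hold while EQ and HI do not.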
To improve processor performance, the outcome of a condition associated with a conditional instruction may be predicted by the processor, and subsequent instructions may be speculatively fetched based on the predicted outcome. For instance, the next instruction following a conditional branch instruction may be predicted and speculatively fetched based on the predicted outcome of a condition associated with the conditional branch instruction. Similarly, a conditional non-branch instruction may be speculatively executed (or speculatively not executed) based on a predicted outcome of the conditional non-branch instruction's specified condition.
However, whether a predicted outcome is correct is not known until the conditional instruction is actually executed by an execution stage, which may be one of the later stages of a conventional instruction pipeline. In particular, a misprediction of a conditional branch instruction that is dependent on the condition flags may require a flush of the instruction pipeline to remove instructions that were wrongly fetched based on the misprediction, followed by a fetch of instructions based on the actual outcome of the conditional branch instruction. Such a pipeline flush also results in a loss of the condition flags, which otherwise could be useful for optimizing the execution of instructions fetched following the pipeline flush (e.g., by performing an early determination of subsequently fetched conditional instructions). Consequently, any subsequently fetched conditional instructions remain subject to the latency incurred in correcting the mispredicted branch.
Aspects disclosed in the detailed description include providing early pipeline optimization of conditional instructions in processor-based systems. In this regard, in one aspect, a processor-based system provides an instruction pipeline that comprises, among other stages, one or more instruction fetch stages, an instruction decode stage, one or more execution stages, and a register writeback stage. Upon detecting a mispredicted branch within the instruction pipeline (i.e., following a misprediction of a condition associated with a speculatively executed conditional branch instruction that is dependent on one or more condition flags), a current state of one or more condition flags is recorded as a condition flags snapshot, which is provided to the one or more instruction fetch stages of the instruction pipeline. After a pipeline flush is initiated and a corrected fetch path is restarted, the instruction decode stage of the instruction pipeline uses the condition flags snapshot to apply an optimization to conditional instructions encountered within the corrected fetch path. For example, in some aspects, the condition flags snapshot may be used to determine, definitively and non-speculatively, whether a conditional branch instruction will be taken. If so, a non-speculative fetch address for the target instruction of the conditional branch instruction is provided to the one or more instruction fetch stages, and the conditional branch instruction is replaced with a NOP (no operation) instruction. Similarly, the condition flags snapshot may be used to non-speculatively determine whether and/or how a conditional non-branch instruction will be executed, and/or may be used to apply other optimizations to the conditional non-branch instruction. According to some aspects, the condition flags snapshot is invalidated upon encountering a condition-flag-writing instruction within the corrected fetch path. 
Processing then continues in conventional fashion until a next mispredicted branch is detected.
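As a non-limiting illustration, the conditional branch optimization summarized above may be modeled in software as follows. The sketch assumes a hypothetical `Snapshot` record holding the recorded flags and a validity bit, and a hypothetical `resolve_branch` function standing in for the decision made by the instruction decode stage. Only the taken case is described above; leaving a not-taken or unresolvable branch unchanged is an assumption of this sketch, as are all names used.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical model of a condition flags snapshot."""
    n: bool
    z: bool
    c: bool
    v: bool
    valid: bool = True


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction: opcode, condition, branch target."""
    opcode: str
    cond: str = "AL"
    target: Optional[int] = None


NOP = Instr("NOP")


def cond_true(cond: str, s: Snapshot) -> bool:
    """Evaluate a small subset of Arm conditions against the snapshot."""
    table = {
        "EQ": s.z, "NE": not s.z,
        "CS": s.c, "CC": not s.c,
        "MI": s.n, "PL": not s.n,
        "VS": s.v, "VC": not s.v,
    }
    return table[cond]


def resolve_branch(instr: Instr, snap: Snapshot) -> Tuple[Instr, Optional[int]]:
    """Return (instruction to pass down the pipeline, fetch address or None).

    A fetch address is returned only when the snapshot shows, without
    speculation, that the conditional branch is taken.
    """
    if instr.opcode != "B" or instr.cond == "AL" or not snap.valid:
        return instr, None  # nothing this sketch can resolve early
    if cond_true(instr.cond, snap):
        # Taken: redirect fetch to the target and replace the branch with a NOP.
        return NOP, instr.target
    # Assumption of this sketch: a not-taken branch is passed through unchanged.
    return instr, None
```

With a snapshot in which Z is set, a branch-if-equal to address 0x2000 resolves to a NOP plus a fetch address of 0x2000, whereas a branch-if-not-equal yields no fetch address in this model.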
In another aspect, a processor-based system for providing early pipeline optimization of conditional instructions is provided. The processor-based system comprises an instruction pipeline comprising an instruction fetch stage, an instruction decode stage, an execution stage, and a register writeback stage. The execution stage of the instruction pipeline is configured to detect a mispredicted branch within an original fetch path. Responsive to the mispredicted branch, the execution stage initiates a pipeline flush to begin a corrected fetch path. The register writeback stage of the instruction pipeline is configured to, responsive to the mispredicted branch, provide a condition flags snapshot comprising a current state of one or more condition flags to the instruction fetch stage of the instruction pipeline. The instruction decode stage of the instruction pipeline is configured to detect a conditional instruction within the corrected fetch path, and apply an optimization to the conditional instruction based on the condition flags snapshot.
In another aspect, a processor-based system for providing early pipeline optimization of conditional instructions is provided. The processor-based system comprises a means for detecting a mispredicted branch within an original fetch path of an instruction pipeline of the processor-based system. The processor-based system further comprises a means for initiating a pipeline flush to begin a corrected fetch path, responsive to the mispredicted branch. The processor-based system also comprises a means for providing a condition flags snapshot comprising a current state of one or more condition flags to an instruction fetch stage of the instruction pipeline. The processor-based system additionally comprises a means for detecting a conditional instruction within the corrected fetch path. The processor-based system further comprises a means for applying an optimization to the conditional instruction based on the condition flags snapshot.
In another aspect, a method for providing early pipeline optimization of conditional instructions is provided. The method comprises detecting, by an execution stage of an instruction pipeline, a mispredicted branch within an original fetch path. The method further comprises, responsive to the mispredicted branch, initiating, by the execution stage, a pipeline flush to begin a corrected fetch path. The method also comprises providing, by a register writeback stage of the instruction pipeline, a condition flags snapshot comprising a current state of one or more condition flags to an instruction fetch stage of the instruction pipeline. The method additionally comprises detecting, by an instruction decode stage of the instruction pipeline, a conditional instruction within the corrected fetch path. The method further comprises applying, by the instruction decode stage, an optimization to the conditional instruction based on the condition flags snapshot.
In another aspect, a non-transitory computer-readable medium is provided. The non-transitory computer-readable medium stores thereon computer-readable instructions to cause a processor to detect a mispredicted branch within an original fetch path of an instruction pipeline of the processor. The computer-readable instructions further cause the processor to, responsive to the mispredicted branch, initiate a pipeline flush to begin a corrected fetch path. The computer-readable instructions also cause the processor to provide a condition flags snapshot comprising a current state of one or more condition flags to an instruction fetch stage of the instruction pipeline. The computer-readable instructions additionally cause the processor to detect a conditional instruction within the corrected fetch path. The computer-readable instructions further cause the processor to apply an optimization to the conditional instruction based on the condition flags snapshot.
With reference now to the drawing figures, several exemplary aspects of the present disclosure are described. The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
Aspects disclosed in the detailed description include early pipeline optimization of conditional instructions. Accordingly, in this regard,
In the example of
The term “back-end instruction pipeline 116” as used herein refers collectively to subsequent pipeline stages of the instruction pipeline 104 for issuing instructions for execution, for carrying out the actual execution of instructions, and/or for loading and/or storing data required by or produced by instruction execution. In the example of
The processor 102 additionally includes a register file 128, which provides physical storage for a plurality of registers 130(0)-130(X) and which may be accessed via one or more read ports 132(0)-132(P). In some aspects, the registers 130(0)-130(X) may comprise one or more general purpose registers (GPRs), a program counter, and/or a link register. In the example of
In exemplary operation, the one or more instruction fetch stages 117 of the front-end instruction pipeline 114 of the instruction pipeline 104 fetch program instructions (not shown) from the instruction cache 110. Program instructions may be further decoded by the instruction decode stage 118 of the front-end instruction pipeline 114, and passed to the one or more instruction queue stages 120 pending issuance to the back-end instruction pipeline 116. After the program instructions are issued to the back-end instruction pipeline 116, the execution stage(s) 124 of the back-end instruction pipeline 116 execute the issued program instructions and retire the executed program instructions, and the register writeback stage 126 stores results of the executed instructions.
In some aspects, the one or more instruction fetch stages 117 of the front-end instruction pipeline 114 of the instruction pipeline 104 may fetch instructions based on a branch prediction provided by the branch predictor 122 for a conditional branch instruction. However, any mispredicted branches generated by the branch predictor 122 may not be detected until the conditional branch instruction is executed by the one or more execution stages 124 of the back-end instruction pipeline 116 of the instruction pipeline 104. By that point, additional subsequent instructions may have been erroneously fetched, and may have progressed to various stages within the instruction pipeline 104. For this reason, when a mispredicted branch is detected, the one or more execution stages 124 initiate a pipeline flush to clear the instruction pipeline 104 of previously fetched instructions, and the one or more instruction fetch stages 117 re-fetch the correct instructions following the conditional branch instruction. Such a pipeline flush results in a loss of the condition flags 136(0)-136(C), which otherwise could be useful for optimizing the execution of instructions fetched following the pipeline flush (e.g., by performing an early determination of subsequently fetched conditional instructions). As a result, any subsequently fetched conditional instructions remain subject to the latency incurred in correcting the mispredicted branch.
In this regard, the instruction pipeline 104 of the processor 102 of
As the conditional branch instruction 202 moves through the instruction pipeline 104 of
After the instruction pipeline 104 is flushed following the detection of the mispredicted branch 204, a corrected fetch path 215, including the subsequent instructions to which the conditional branch instruction 202 actually branched, is begun. In the example of
The condition flags snapshot 212 may continue to be used for optimization of additional conditional instructions within the corrected fetch path 215 until such time as the condition flags 136(0)-136(C) are modified by an instruction within the corrected fetch path 215 (at which point the condition flags snapshot 212 may no longer accurately represent the contents of the condition flags 136(0)-136(C)). Accordingly, the instruction decode stage 118 monitors the corrected fetch path 215 to detect the fetching of a condition-flag-writing instruction 219. Upon detecting the condition-flag-writing instruction 219 within the corrected fetch path 215, the instruction decode stage 118 invalidates the condition flags snapshot 212, and processing of fetched instructions resumes in conventional fashion until another mispredicted branch 204 is detected.
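The lifetime of the condition flags snapshot described above may be illustrated, purely as a software sketch, by walking a corrected fetch path in program order. The names `FlagsSnapshot`, `Instr`, and `scan_corrected_path` are hypothetical, and the example instruction mnemonics are chosen only for illustration. The sketch assumes that a conditional instruction that also writes the flags is evaluated against the snapshot before the snapshot is invalidated, and that invalidation is applied conservatively whenever a flag-writing instruction is seen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlagsSnapshot:
    """Hypothetical snapshot: recorded flag values plus a validity bit."""
    flags: dict
    valid: bool = True


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction with the two properties of interest."""
    name: str
    conditional: bool = False
    writes_flags: bool = False


def scan_corrected_path(path: List[Instr], snapshot: FlagsSnapshot) -> List[str]:
    """Return the names of conditional instructions eligible for optimization.

    A conditional instruction is eligible only while the snapshot is valid.
    The first condition-flag-writing instruction invalidates the snapshot,
    after which later conditional instructions are handled conventionally.
    """
    eligible = []
    for instr in path:
        if instr.conditional and snapshot.valid:
            eligible.append(instr.name)
        if instr.writes_flags:
            # The snapshot may no longer match the architectural flags.
            snapshot.valid = False
    return eligible
```

For a path consisting of a conditional move, an add that does not write flags, a compare, and a conditional branch, only the conditional move is eligible in this model, because the compare invalidates the snapshot before the branch is decoded.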
In
In some aspects, the instruction decode stage 118 employs the condition flags snapshot 212 to perform an optimization on a conditional non-branch instruction to limit a number of the one or more read ports 132(0)-132(P) consumed by the conditional non-branch instruction. In this regard,
The instruction decode stage 118 according to some aspects may also employ the condition flags snapshot 212 to non-speculatively determine whether or not a conditional non-branch instruction will be executed at all. In this regard, a pre-optimization corrected fetch path 318, such as the corrected fetch path 215 of
To illustrate exemplary operations for providing early pipeline optimization of conditional instructions in processor-based systems,
The register writeback stage 126 of the instruction pipeline 104 then provides a condition flags snapshot 212 to an instruction fetch stage, such as the one or more instruction fetch stages 117, of the instruction pipeline 104 (block 406). The register writeback stage 126 thus may be referred to herein as “a means for providing a condition flags snapshot comprising a current state of one or more condition flags to an instruction fetch stage of the instruction pipeline.” The instruction decode stage 118 of the instruction pipeline 104 then determines whether a conditional instruction 216 is detected within the corrected fetch path 215 (block 408). In this regard, the instruction decode stage 118 may be referred to herein as “a means for detecting a conditional instruction within the corrected fetch path.” If no conditional instruction 216 is detected, processing of the corrected fetch path 215 continues (block 410). However, in some aspects, if the instruction decode stage 118 detects a conditional instruction 216 within the corrected fetch path 215 at decision block 408, the instruction decode stage 118 may next determine whether the condition flags snapshot 212 is valid (block 412). If the condition flags snapshot 212 is not valid, processing of the corrected fetch path 215 continues (block 410). If the condition flags snapshot 212 is valid, the instruction decode stage 118 applies an optimization to the conditional instruction 216 based on the condition flags snapshot 212 (block 414). Accordingly, the instruction decode stage 118 may be referred to herein as “a means for applying an optimization to the conditional instruction based on the condition flags snapshot.” Processing in some aspects then continues at block 416 of
Referring now to
In
To illustrate exemplary operations for applying optimizations to conditional non-branch instructions according to some aspects,
If the instruction decode stage 118 determines at decision block 600 that the conditional non-branch instruction 312, 320 will be executed, the instruction decode stage 118 next determines, based on the condition flags snapshot 212, whether one or more registers 130(0)-130(X) indicated by the conditional non-branch instruction 312, 320 will not be read by the conditional non-branch instruction 312, 320 (block 606). If so, the instruction decode stage 118 marks the conditional non-branch instruction 312, 320 to avoid consumption of one or more read ports 132(0)-132(P) corresponding to the one or more registers 130(0)-130(X) (block 608). Processing of the corrected fetch path 215 then continues (block 604).
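The decisions described above for conditional non-branch instructions may be sketched in software as follows. The sketch is illustrative only and rests on several assumptions: the names `Snapshot`, `CondInstr`, `Decision`, and `optimize_non_branch` are hypothetical; a conditional select (in the style of the AArch64 CSEL instruction, which takes its first source when the condition holds and its second otherwise) is used as the example of an instruction that always executes but reads only one of two source registers; and an instruction whose condition fails is simply reported as not executing, without modeling what the pipeline does with it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical model of the recorded condition flags."""
    n: bool
    z: bool
    c: bool
    v: bool


def cond_true(cond: str, s: Snapshot) -> bool:
    """Evaluate a small subset of Arm conditions against the snapshot."""
    table = {
        "EQ": s.z, "NE": not s.z,
        "CS": s.c, "CC": not s.c,
        "MI": s.n, "PL": not s.n,
    }
    return table[cond]


@dataclass(frozen=True)
class CondInstr:
    """Hypothetical conditional non-branch instruction."""
    opcode: str
    cond: str
    sources: Tuple[str, ...]
    is_select: bool = False  # True: always executes, reads sources[0] or sources[1]


@dataclass(frozen=True)
class Decision:
    executes: bool                  # will the instruction execute?
    skipped_reads: Tuple[str, ...]  # source registers that need no read port


def optimize_non_branch(instr: CondInstr, snap: Snapshot) -> Decision:
    """Decide execution and unused register reads from the snapshot."""
    passed = cond_true(instr.cond, snap)
    if instr.is_select:
        # Exactly one of the two sources is consumed; mark the other unread.
        unused = instr.sources[1] if passed else instr.sources[0]
        return Decision(True, (unused,))
    if not passed:
        # Assumption of this sketch: a non-executing instruction reads nothing.
        return Decision(False, instr.sources)
    return Decision(True, ())
```

With Z set in the snapshot, a select on EQ between registers R1 and R2 is reported as executing with R2 marked unread, so one read port is freed in this model.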
Providing early pipeline optimization of conditional instructions in processor-based systems according to aspects disclosed herein may be provided in or integrated into any processor-based device. Examples, without limitation, include a set top box, an entertainment unit, a navigation device, a communications device, a fixed location data unit, a mobile location data unit, a global positioning system (GPS) device, a mobile phone, a cellular phone, a smart phone, a session initiation protocol (SIP) phone, a tablet, a phablet, a server, a computer, a portable computer, a mobile computing device, a wearable computing device (e.g., a smart watch, a health or fitness tracker, eyewear, etc.), a desktop computer, a personal digital assistant (PDA), a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a digital video player, a video player, a digital video disc (DVD) player, a portable digital video player, an automobile, a vehicle component, avionics systems, a drone, and a multicopter.
In this regard,
Other master and slave devices can be connected to the system bus 708. As illustrated in
The CPU(s) 702 may also be configured to access the display controller(s) 720 over the system bus 708 to control information sent to one or more displays 726. The display controller(s) 720 send information to the display(s) 726 to be displayed via one or more video processors 728, which process the information to be displayed into a format suitable for the display(s) 726. The display(s) 726 can include any type of display, including, but not limited to, a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, etc.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithms described in connection with the aspects disclosed herein may be implemented as electronic hardware, instructions stored in memory or in another computer readable medium and executed by a processor or other processing device, or combinations of both. The master devices and slave devices described herein may be employed in any circuit, hardware component, integrated circuit (IC), or IC chip, as examples. Memory disclosed herein may be any type and size of memory and may be configured to store any type of information desired. To clearly illustrate this interchangeability, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. How such functionality is implemented depends upon the particular application, design choices, and/or design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
The aspects disclosed herein may be embodied in hardware and in instructions that are stored in hardware, and may reside, for example, in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of computer readable medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a remote station. In the alternative, the processor and the storage medium may reside as discrete components in a remote station, base station, or server.
It is also noted that the operational steps described in any of the exemplary aspects herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary aspects may be combined. It is to be understood that the operational steps illustrated in the flowchart diagrams may be subject to numerous different modifications as will be readily apparent to one of skill in the art. Those of skill in the art will also understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.