1. Field of the Invention
The present invention relates to a data processing device adopting a branch prediction mechanism (branch history, etc.) in order to execute an instruction stream including branches at high speed, and in particular, relates to a method of canceling the registration of an entry that adversely affects performance.
2. Description of the Related Art
The performance of a data processing device adopting an advanced pipeline processing method has been improved by speculatively processing subsequent instructions without waiting for the termination of the current instruction. If it is not determined whether a branch instruction will branch control flow or to which address it will branch control flow, then the subsequent instruction cannot be fetched before the branch instruction has completed. In order to solve this problem, a branch prediction mechanism is introduced and by predicting the branch direction of the branch instruction or the branch destination instruction address, performance has been further improved. For example, in Japanese Patent Laid-open Publication No. 6-89173, improved performance has been obtained by providing a branch prediction mechanism (branch history) independent from cache memory.
However, as the scale of a branch history increases, performance often degrades depending on its content.
In particular, since a branch history is provided independently of cache memory, a TLB (Translation Lookaside Buffer) and the like, updated information is usually not reflected in the branch history, or the reflection cannot keep up with all updates, even when the state of an instruction area is changed by updating an instruction string. As a result, branches are predicted for instructions other than branch instructions for the following reasons:
Another instruction is loaded into an address where there was a branch instruction
Another program is dispatched to a logical address by modifying the TLB
Such an entry existing in a branch history is called a phantom entry.
A conventional branch history does not necessarily erase a phantom entry explicitly, although a phantom entry also disappears when an old entry is erased by a replacement operation accompanying new entry registration.
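The way a phantom entry arises can be illustrated with a minimal sketch. It assumes a toy branch history modeled as a dictionary keyed by instruction address; the names `memory`, `branch_history` and `predict` are hypothetical and are not part of the described device.

```python
# Toy model: instruction memory and a branch history that are kept independently.
memory = {0x100: "branch", 0x104: "add"}   # address -> instruction kind (assumed encoding)
branch_history = {0x100: 0x200}            # address -> predicted branch destination

def predict(address):
    """Return the predicted branch destination for an address, or None on a miss."""
    return branch_history.get(address)

# Another instruction is loaded into the address where the branch used to be.
# The branch history is not informed, so its entry for 0x100 is now a phantom entry.
memory[0x100] = "load"

predicted_target = predict(0x100)
is_phantom = predicted_target is not None and memory[0x100] != "branch"
```

In this sketch the stale entry still produces a prediction for 0x100 even though the instruction there is no longer a branch.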
However, as shown in
When, in instruction execution control, a branch is predicted in this way although the instruction is not a branch instruction, a process for correcting the mistake is needed and costs increase. Therefore, if such a phantom entry is not erased as soon as it is detected, the branch history that was developed to improve performance actually degrades it. In particular, when the entry capacity of the branch history is small, the time needed for a phantom entry to be erased by a replacement operation and the like is short; as the capacity and the degree of associativity increase, however, many phantom entries are left unprocessed, which is a problem.
It is an object of the present invention to provide a device efficiently erasing phantom entries in order to solve the problem described above and to improve the speed of a data processing device.
The first data processing device of the present invention has a branch prediction mechanism. The data processing device comprises a judgment unit judging whether a target instruction is a branch instruction; and a phantom erasure unit erasing a branch prediction entry corresponding to an instruction to be stored in the branch prediction mechanism if it is judged that the target instruction is not a branch instruction.
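The relationship between the judgment unit and the phantom erasure unit can be sketched as follows. This is only an illustration under stated assumptions: the branch history is modeled as a dictionary, and the mnemonic set `BRANCH_MNEMONICS` and the function names are hypothetical.

```python
# Assumed, illustrative set of branch mnemonics; not taken from the described device.
BRANCH_MNEMONICS = {"beq", "bne", "jmp"}

def is_branch(mnemonic):
    """Judgment unit (sketch): decide whether the target instruction is a branch."""
    return mnemonic in BRANCH_MNEMONICS

def erase_if_phantom(branch_history, address, mnemonic):
    """Phantom erasure unit (sketch): erase the entry if a non-branch has one."""
    if address in branch_history and not is_branch(mnemonic):
        del branch_history[address]
        return True
    return False

history = {0x100: 0x200, 0x140: 0x300}
erased_phantom = erase_if_phantom(history, 0x100, "add")  # non-branch with an entry
erased_branch = erase_if_phantom(history, 0x140, "beq")   # genuine branch is kept
```

Only the entry whose instruction turned out not to be a branch is removed; the entry for the genuine branch stays available for later predictions.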
The second data processing device of the present invention has a branch prediction mechanism. The data processing device comprises a queue unit extracting an instruction and storing it for execution; a detection unit judging whether an address where a branch has been predicted is on the boundary of the instruction word stored in the queue unit when the branch has been predicted for the instruction stored in the queue unit; and a misalignment erasure unit erasing branch prediction entries to be stored in a branch prediction mechanism on which the branch prediction is based, if it is judged that the address where a branch has been predicted is not on the boundary of the instruction word.
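The boundary check performed by the detection unit can be sketched as follows, assuming instructions are laid out back to back from a known start address and the branch history is a dictionary keyed by the predicted address. The function names and the example addresses are hypothetical.

```python
def instruction_boundaries(start_address, lengths_in_bytes):
    """Return the start address of each instruction laid out back to back."""
    boundaries = []
    address = start_address
    for length in lengths_in_bytes:
        boundaries.append(address)
        address += length
    return boundaries

def erase_if_misaligned(branch_history, predicted_address, boundaries):
    """Misalignment erasure (sketch): erase an entry predicted off a boundary."""
    if predicted_address in branch_history and predicted_address not in boundaries:
        del branch_history[predicted_address]
        return True
    return False

# Three queued instructions of six, four and two bytes starting at 0x100.
queue_boundaries = instruction_boundaries(0x100, [6, 4, 2])
history = {0x102: 0x400, 0x106: 0x500}
erased_misaligned = erase_if_misaligned(history, 0x102, queue_boundaries)
erased_aligned = erase_if_misaligned(history, 0x106, queue_boundaries)
```

The prediction at 0x102 falls inside the six-byte instruction and is erased, while the prediction at 0x106 sits on an instruction boundary and is left alone.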
The third data processing device of the present invention has a branch prediction mechanism. The data processing device comprises a phantom target instruction detection unit detecting a branch instruction that is not executed at high speed or a non-branch instruction that branches control flow; and a phantom entry generation unit creating a branch prediction entry to be stored in a branch prediction mechanism, based on an entry corresponding to the instruction detected by the phantom target instruction detection unit, and adding it to the branch history. The data processing device improves processing speed by performing instruction pre-fetching using the branch prediction entry.
According to the present invention, phantom entries, which are extra entries in a branch history to be stored in a branch prediction mechanism, can be completely erased, and even when time division control is applied to an application and a data processing device executes the application, incorrect branch prediction can be avoided. Therefore, time needed to correct incorrect branch prediction can be saved and accordingly, the performance of the data processing device can be improved.
Execution speed can also be improved by intentionally registering an instruction whose processing takes much time in a branch history as a phantom entry and by pre-fetching the instruction, and accordingly, the performance of the data processing device can also be improved.
Branch prediction is closely related to the execution control of branch instructions. A branch control unit knows, as a result of a branch process, whether the branch prediction was accurate, and has a data update control unit for updating a branch history. This configuration has been put into practical use (see Japanese Patent Laid-open Publication No. 2000-282710).
A device that reports the accuracy of branch prediction to a branch prediction unit (branch history) by creating, in the branch control unit, an entry corresponding to an instruction for which a branch has been predicted although the instruction is not a branch instruction is disclosed in Japanese Patent Laid-open Publication No. 2000-282710. Therefore, this device is used in the present invention.
Normal branch history update is disclosed, for example, in Japanese Patent Laid-open Publication No. 2000-172503. Therefore, this is also used in the present invention.
Some devices adopt an instruction set whose instruction lengths are not constant but variable (that is, the set has a plurality of instruction lengths). In the case of a micro-architecture adopting a branch history in such an instruction set, as shown in
In the normal branch prediction shown in
In this case, sometimes a phantom entry in the corresponding branch history cannot be erased unless information accurately reproducing the predicted address, such as offset information sent from an instruction boundary, is stored.
There are also instructions that branch or interrupt control flow like a branch instruction, such as an instruction that raises an exception (software trap instruction). Such an instruction modifies the processor state at the same time as it modifies the instruction address. Therefore, in this case, a branch instruction control unit alone sometimes cannot process such an instruction at high speed.
If such a special instruction can also be registered in a branch history, predicted branch destination can be fetched using the information obtained by retrieving data from the branch history. In this way, an instruction to be executed in an instruction cache area can be read in advance and cache miss penalty can be reduced.
As described above, by using a phantom entry erasure method according to the preferred embodiment of the present invention, instructions that the branch execution control unit does not execute can be consistently executed without interfering with other operations, including the prediction of another branch instruction.
The data processing device of this preferred embodiment is of the superscalar type and can process three instructions simultaneously. For that purpose, it is assumed that an instruction fetching unit sets at most three instructions in IWR (Instruction Word Register) 0 through IWR2. It is also assumed that there are three instruction word lengths of two, four and six bytes. However, it is assumed that instructions six bytes long are set only in IWR0 (an instruction with a word length other than 2, 4 and 6 bytes is divided into at least two groups, part of which is set in subsequent cycles). Lengths are sometimes expressed in units of half-words (two bytes); the three instruction lengths are therefore one, two and three half-words.
In this example, the branch instruction queue of a branch process is assumed to be RSBR. Each queue of the RSBR holds the address PC of the branch instruction, a branch destination address TPC, BRHIS Hit tag information, which is branch prediction information, and Hit-Way tag information. This configuration is the same as that of Japanese Patent Laid-open Publication No. 2000-172503. This preferred embodiment further holds Hit-Offset, which indicates the offset from the instruction address PC to the position where a branch has been predicted. Therefore, if a branch is normally predicted for a branch instruction, the Hit-Offset indicates 0.
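The role of Hit-Offset can be illustrated with a short sketch. It assumes that the offset is counted in half-words of two bytes from the instruction address PC; that unit, and the function names, are assumptions made for illustration only.

```python
HALF_WORD_BYTES = 2  # assumed unit of the offset

def hit_offset(instruction_pc, predicted_address):
    """Offset, in half-words, from the instruction address to the predicted position."""
    return (predicted_address - instruction_pc) // HALF_WORD_BYTES

def predicted_address_from(instruction_pc, offset):
    """Reproduce the predicted address from the instruction address and the offset."""
    return instruction_pc + offset * HALF_WORD_BYTES

aligned = hit_offset(0x100, 0x100)      # branch predicted on the instruction itself
misaligned = hit_offset(0x100, 0x104)   # branch predicted inside a six-byte instruction
```

An offset of 0 corresponds to a normal prediction, and any other value lets the predicted address be reproduced from the instruction address when the entry has to be located later.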
However, in a specific type of RISC instruction set, all instruction words have a constant length, for example, four bytes, and it is guaranteed that all instructions fall on instruction word boundaries, which is different from the preferred embodiment of the present invention. In such an instruction set, a branch prediction position always falls on an instruction word boundary (although the branch prediction position could be set to an address not on an instruction word boundary, there is no reason to do so). Therefore, a device for realizing such an instruction set does not require Hit-Offset, and the application of the preferred embodiment to such an instruction set should be modified accordingly by a person having ordinary skill in the art.
In
The instruction cache 12 extracts an instruction to be executed from the input address and inputs the instruction to the instruction input control unit 13. The instruction input control unit 13 transfers the input instruction to IWR, that is, an instruction reading unit 14 together with information about whether a branch has been predicted and instructs how to read the instruction. After the instruction reading unit 14 has read the instruction, it is transferred to a corresponding instruction processing unit. However, if it is a branch instruction, the instruction is input to an RSBR generation control unit 15 controlling the generation of branch instruction queues RSBR. A branch instruction queue RSBR is generated in a branch processing unit 16 and a branch instruction process is performed in order.
The result of the branch instruction process in the branch processing unit 16 is transferred to a branch completion control unit 17. The branch completion control unit 17 judges whether the branch prediction was accurate and transfers the branch information to a BRHIS update control unit 18. The BRHIS update control unit 18 updates the branch history of the branch prediction unit 11, based on the obtained branch information.
When an instruction is set in IWR, simultaneously the branch prediction result is analyzed and sent for each instruction. Then, Hit-Offset is transferred to RSBR together with the branch prediction information, including Hit-Way related to the branch prediction.
In
Even when a branch has not been predicted on an instruction boundary, the fact that branch prediction has not been conducted is judged by detecting the Hit of the corresponding instruction (SET_IWRx_HIT) and simultaneously by sending a signal SET_IERx_MISALIGN_HW_y.
Specifically, if in a circuit “for IWR0” shown at the top in
As described above, when signals shown in
In the case of a circuit “for IWR1”, the obtained information is as follows:
Furthermore, in the case of a circuit “for IWR2”, the following information is obtained.
Such information is transferred to RSBR together with the other branch prediction information tags. A configuration for transferring such information to RSBR together with other branch prediction information tags is already known.
The RSBR comprises a valid flag indicating the validity of an entry in a queue RSBR, a Phantom-Valid flag indicating whether the entry is a phantom entry, branch control information describing a conditional branch address, branch conditions and the like, the address IAR of branch prediction instruction, a branch destination instruction address TIAR, a section Hit for storing the SET_IWRy_HIT (in this case, y is an integer for identifying IWR), a section Way indicating the WAY of a branch history and a section Misalign-HW storing signals indicating the misalignment shown in
The flag Phantom-Valid of the RSBR is set using a technology disclosed in Japanese Patent Laid-open Publication No. 2000-181710 described earlier.
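The fields of an RSBR entry listed above can be pictured as a simple record. The following is a minimal sketch, not the actual hardware layout: the field types, and the representation of Misalign-HW as a half-word offset, are assumptions.

```python
from dataclasses import dataclass

@dataclass
class RsbrEntry:
    valid: bool           # validity of the entry in the queue RSBR
    phantom_valid: bool   # whether the entry is a phantom entry
    branch_control: str   # branch control information (assumed free-form here)
    iar: int              # address of the instruction whose branch was predicted
    tiar: int             # branch destination instruction address
    hit: bool             # stored SET_IWRy_HIT
    way: int              # WAY of the branch history
    misalign_hw: int      # assumed: half-word offset of a misaligned prediction, 0 if aligned

entry = RsbrEntry(valid=True, phantom_valid=True, branch_control="none",
                  iar=0x100, tiar=0x200, hit=True, way=1, misalign_hw=1)
needs_erasure = entry.valid and entry.phantom_valid and entry.hit
```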
When a branch process or a phantom entry process is completed in the RSBR, the completion is reported to the branch history.
When a phantom entry process is completed, a branch completion control circuit sends the address BR_COMP_IAR<0:31> of the completed instruction, the WAY position BR_COMP_HIT_WAY<1:0> where the BRHIS Hit was detected, BR_COMP_MISALIGN_HW_y indicating that the instruction is misaligned, and other control flags as required to the BRHIS update control unit, together with BR_COMP_AS_PHANTOM indicating that the relevant instruction is a phantom entry.
In
If a misaligned instruction happens to be a branch instruction, BR_COMP_AS_TAKEN (when control flow branches) or BR_COMP_AS_NOT_TAKEN (when control flow does not branch) is sent and an aligned branch process is performed. In this case, the update can be applied to the address to which the misalignment information is added. Except for adding the misalignment information, the prior art is used.
When either normal erasure conditions or BR_COMP_AS_PHANTOM indicating that the instruction is a phantom entry is input, the circuit shown at the bottom of
In this way, each phantom entry to be erased in the branch history is specified and an erase request signal is prepared for it. This erase request signal is handled like a conventional branch history entry erase request, and the phantom entry is erased using the entry erasure means of the conventional branch history.
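How an erase request might be derived from the completion signals can be sketched as follows. The signal names mirror those in the text, but the record type, the encoding of the misalignment information as a half-word offset, and the function `make_erase_request` are assumptions made for illustration.

```python
from dataclasses import dataclass

HALF_WORD_BYTES = 2  # assumed unit of the misalignment information

@dataclass
class BranchCompletion:
    br_comp_iar: int          # address of the completed instruction
    br_comp_hit_way: int      # WAY position where the BRHIS Hit was detected
    br_comp_misalign_hw: int  # assumed: half-word offset of the prediction, 0 if aligned
    br_comp_as_phantom: bool  # True when the instruction is a phantom entry

def make_erase_request(completion):
    """Return the (address, way) to erase, or None if no phantom erasure is needed."""
    if not completion.br_comp_as_phantom:
        return None
    address = completion.br_comp_iar + completion.br_comp_misalign_hw * HALF_WORD_BYTES
    return (address, completion.br_comp_hit_way)

phantom_request = make_erase_request(BranchCompletion(0x100, 1, 1, True))
no_request = make_erase_request(BranchCompletion(0x100, 1, 0, False))
```

The resulting pair would then be passed to whatever erasure path the branch history already provides, which is outside the scope of this sketch.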
So far, a preferred embodiment that can completely erase phantom entries has been described. Conversely, a preferred embodiment that realizes an instruction pre-fetch effect by intentionally generating a phantom entry is described below.
If an instruction is found to be a complex instruction that is micro code or emulated by firmware (branch instruction that is not executed at high speed) or non-branch instruction that is processed by the RSBR and branches control flow (such as an instruction that requires exception handling or an instruction to directly rewrite the program counter; in
In
On receipt of a notice BR_COMP_AS_PHANTOM with the tag, the BRHIS update control unit 18 does not erase the entry and updates aligned branch prediction information. Specifically, if there is the entry (BRHIS Hit), the BRHIS update control unit 18 updates the entry as requested. If there is no entry (Not hit), the unit 18 creates a new entry. The prior art is used for the other control, such as using BR_COMP_TIAR sent from the RSBR as a branch destination address to create/update an entry.
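The update-or-create behavior on a tagged notice can be sketched as follows, assuming the branch history is a dictionary from instruction address to branch destination address. The function name and the returned labels are hypothetical.

```python
def update_on_tagged_phantom(branch_history, iar, tiar):
    """On a tagged BR_COMP_AS_PHANTOM notice (sketch): keep the entry instead of erasing it.

    Updates the entry on a BRHIS Hit and creates a new one otherwise, using the
    destination address sent from the RSBR.
    """
    hit = iar in branch_history
    branch_history[iar] = tiar
    return "updated" if hit else "created"

history = {}
first = update_on_tagged_phantom(history, 0x100, 0x800)   # no entry yet
second = update_on_tagged_phantom(history, 0x100, 0x900)  # entry already present
```

After the first notice an entry exists for the instruction address, so a later fetch of that address can read it and pre-fetch the destination.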
In
By doing so, the next time there is an instruction fetch request corresponding to the instruction address, the entry is read and the instruction at the predicted branch destination is fetched. Even when an execution unit cannot promptly use the entry, instruction pre-fetching is available. In this way, since an operation equivalent to a pre-fetch request is made to the cache, performance can be improved.
As described above, according to this method, a phantom entry can be completely erased and the performance degradation of a branch history can be avoided. By actively using this function, control that brings about an instruction pre-fetching effect can be exercised even over a complex control transfer instruction, and performance can be improved accordingly.
Number | Date | Country | Kind |
---|---|---|---|
2002-191433 | Jun 2002 | JP | national |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 10349930 | Jan 2003 | US |
Child | 11330191 | Jan 2006 | US |