The present disclosure relates to a computer system. More particularly, the present disclosure relates to a pipeline computer system having a branch prediction mechanism and an instruction processing method thereof.
An instruction pipeline is able to increase the number of instructions executed in a single interval. In order to improve the efficiency of processing instructions, a branch prediction mechanism is utilized to predict an execution result of a branch instruction (e.g., a jump instruction, a return instruction, etc.), in order to move up the processing of a subsequent instruction. However, if the prediction result of the branch instruction is branch-untaken, the current branch prediction mechanism is not able to remove bubbles (i.e., pipeline stalls) in the instruction processing progress.
In some aspects, a pipeline computer system includes a processor circuit and a memory circuit. The processor circuit is configured to obtain a first target address of a first branch instruction and a second address of a first prediction instruction according to a first address of the first branch instruction before the first branch instruction is executed, and sequentially prefetch a first instruction corresponding to the first target address and the first prediction instruction when a prediction result of the first branch instruction is branch-taken, in which an execution of the first instruction is followed by an execution of the first prediction instruction. The memory circuit is configured to store the first instruction and the first prediction instruction.
In some aspects, an instruction processing method includes the following operations: obtaining a first target address of a first branch instruction and a second address of a first prediction instruction according to a first address of the first branch instruction before the first branch instruction is executed; and sequentially prefetching a first instruction corresponding to the first target address and the first prediction instruction when a prediction result of the first branch instruction is branch-taken, in which an execution of the first instruction is followed by an execution of the first prediction instruction.
These and other objectives of the present disclosure will be described in preferred embodiments with various figures and drawings.
The terms used in this specification generally have their ordinary meanings in the art and in the specific context where each term is used. The use of examples in this specification, including examples of any terms discussed herein, is illustrative only, and in no way limits the scope and meaning of the disclosure or of any exemplified term. Likewise, the present disclosure is not limited to various embodiments given in this specification.
In this document, the term “coupled” may also be termed as “electrically coupled,” and the term “connected” may be termed as “electrically connected.” “Coupled” and “connected” may mean “directly coupled” and “directly connected” respectively, or “indirectly coupled” and “indirectly connected” respectively. “Coupled” and “connected” may also be used to indicate that two or more elements cooperate or interact with each other. In this document, the term “circuitry” may indicate a system formed with at least one circuit, and the term “circuit” may indicate an object, which is formed with one or more transistors and/or one or more active/passive elements based on a specific arrangement, for processing signals.
As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Although the terms “first,” “second,” etc., may be used herein to describe various elements, these elements should not be limited by these terms. These terms are used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the embodiments. For ease of understanding, like elements in various figures are designated with the same reference number.
In some embodiments, the processor circuit 110 may be a pipeline processor circuit, which may allow overlapping execution of multiple instructions. For example, the processor circuit 110 may include a program counter circuit (not shown), an instruction memory (not shown), at least one multiplexer circuit (not shown), at least one register (not shown), and at least one data memory circuit (not shown), which form data paths for processing multiple instructions in parallel. The arrangements of the data paths in the processor circuit 110 are given for illustrative purposes, and the present disclosure is not limited thereto.
In some embodiments, a core of the processor circuit 110 includes an instruction fetch circuit 112, and the processor circuit 110 may further include a memory circuit 114. The instruction fetch circuit 112 may be configured to determine whether a prediction result of a branch instruction is branch-taken or branch-untaken, and to prefetch a corresponding instruction from the main memory 120 (or the memory circuit 114) according to the prediction result. In some embodiments, the instruction fetch circuit 112 includes a branch prediction mechanism (not shown), which is configured to determine the prediction result and store a lookup table (e.g., table 1 and table 2 discussed below). In some embodiments, the branch prediction mechanism may determine the prediction result of a current branch instruction according to a history of executions of previous instructions. In some embodiments, the branch prediction mechanism may perform a global-sharing (g-share) algorithm or a tagged geometric history length branch prediction (TAGE) algorithm, in order to determine the prediction result of the branch instruction. The types of the algorithms are given for illustrative purposes, and the present disclosure is not limited thereto. Various algorithms able to execute branch prediction are within the contemplated scope of the present disclosure. Operations of the branch prediction and the instruction prefetching will be described in the following paragraphs.
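By way of illustration only, the g-share algorithm mentioned above may be sketched as follows: a global history register is XORed with the branch address to index a table of 2-bit saturating counters. The table size, counter values, and address are illustrative assumptions and do not limit the branch prediction mechanism of the present disclosure.

```python
# Minimal g-share predictor sketch (illustrative only): the global
# branch history is XORed with the branch address to index a table
# of 2-bit saturating counters.

TABLE_BITS = 10
SIZE = 1 << TABLE_BITS

class GsharePredictor:
    def __init__(self):
        self.counters = [1] * SIZE   # 2-bit counters, start weakly not-taken
        self.history = 0             # global branch history register

    def _index(self, addr):
        return (addr ^ self.history) & (SIZE - 1)

    def predict(self, addr):
        # counter values 2..3 predict branch-taken, 0..1 branch-untaken
        return self.counters[self._index(addr)] >= 2

    def update(self, addr, taken):
        i = self._index(addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & (SIZE - 1)

predictor = GsharePredictor()
for _ in range(12):                 # train on an always-taken branch
    predictor.update(0x40B0, True)
assert predictor.predict(0x40B0) is True
```

Once trained on a repeatedly taken branch, the corresponding counter saturates and the predictor reports branch-taken for that address and history.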
In some embodiments, the memory circuit 114 may be a register, which is configured to store instruction(s) and/or data prefetched by the instruction fetch circuit 112. In some embodiments, the memory circuit 114 may be a cache memory, which may include one or more cache memory levels. For example, the memory circuit 114 may only include an L1 cache memory, or may include an L1 cache memory and an L2 cache memory, or may include an L1 cache memory, an L2 cache memory, and an L3 cache memory. The types of the memory circuit 114 are given for illustrative purposes, and the present disclosure is not limited thereto.
In operation S210, before a first branch instruction (e.g., branch instruction B) is executed, a first target address (e.g., an address ADDR3 in table 1) of the first branch instruction and a second address (e.g., an address ADDRC in table 1) of a first prediction instruction (e.g., branch instruction C) are obtained according to a first address (e.g., an address ADDRB in table 1) of the first branch instruction. In operation S220, a first instruction corresponding to the first target address and the first prediction instruction are sequentially prefetched when a prediction result of the first branch instruction is branch-taken, in which an execution of the first instruction is followed by an execution of the first prediction instruction.
The above description of the instruction processing method 200 includes exemplary operations, but the operations are not necessarily performed in the order described above. Operations of the instruction processing method 200 may be added, replaced, reordered, and/or eliminated as appropriate, or the operations may be executed simultaneously or partially simultaneously as appropriate, in accordance with the spirit and scope of various embodiments of the present disclosure.
In order to further illustrate the instruction processing method 200, reference is now made to
As shown in
As described above, the processor circuit 110 stores a lookup table. In some embodiments, the lookup table is configured to store a corresponding relation among the first address, the first target address, and the second address. For example, the lookup table may be expressed as the following table 1:
In table 1, the address (i.e., the first address) of the branch instruction indicates a memory address of the main memory 120 (or the memory circuit 114) where the branch instruction is stored. The target address (i.e., the first target address) of the branch instruction indicates the memory address storing the instruction that is to be executed when the prediction result of the branch instruction is branch-taken. The execution of the instruction corresponding to the target address is followed by the execution of the next prediction instruction. For example, the instruction 2 corresponds to the target address ADDR2, and the next prediction instruction is the instruction B that is executed after the execution of the instruction 2. As a result, when the processor circuit 110 executes the branch instruction A, the instruction fetch circuit 112 may search the lookup table according to the memory address ADDRA of the branch instruction A, in order to obtain the target address ADDR2 and the address ADDRB of the next prediction instruction (i.e., the branch instruction B). In other words, the address of the branch instruction is considered as a tag of the lookup table. If the tag of the lookup table is hit, it indicates that the processor circuit 110 is executing the branch instruction corresponding to the tag, and the processor circuit 110 may obtain the corresponding target address and the memory address (i.e., the second address) of the next prediction instruction. As shown in
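The tag-hit behavior of the lookup table described above can be sketched as a simple mapping keyed by the branch instruction's address. The concrete numeric addresses below are made up for illustration and merely mirror the roles of ADDRA, ADDR2, and ADDRB in table 1.

```python
# Sketch of the table-1 lookup: each entry maps the address of a branch
# instruction (the tag) to its target address and to the address of the
# next prediction instruction. Numeric addresses are illustrative.

ADDRA, ADDR2, ADDRB = 0x1000, 0x2000, 0x2010

lookup_table = {
    # tag (branch address): (target address, next prediction instruction address)
    ADDRA: (ADDR2, ADDRB),
}

def fetch_lookup(branch_addr):
    """Return (target, next prediction address) on a tag hit, else None."""
    return lookup_table.get(branch_addr)

assert fetch_lookup(ADDRA) == (ADDR2, ADDRB)   # tag hit: both addresses at once
assert fetch_lookup(0x3000) is None            # tag miss
```

A single hit thus yields both the target address and the address where the next prediction should begin, which is what allows the fetch circuit to keep the pipeline fed.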
In different embodiments, the address of the next prediction instruction in table 1 may be an offset value or an absolute address. If the address of the next prediction instruction is the offset value, the processor circuit 110 may sum up the corresponding target address and the corresponding offset value to determine the actual memory address of the next prediction instruction.
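The offset case described above amounts to one addition; a short sketch with illustrative values:

```python
# Resolving the next-prediction address when table 1 stores an offset
# rather than an absolute address (values are illustrative).
target_addr = 0x2000   # target address of the branch (e.g., ADDR2)
offset = 0x10          # stored offset of the next prediction instruction
next_prediction_addr = target_addr + offset
assert next_prediction_addr == 0x2010
```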
In some embodiments, as shown in
In greater detail, during an interval T, the processor circuit 110 starts processing the instruction 1. During an interval T+1, the processor circuit 110 starts processing the branch instruction A, and the instruction fetch circuit 112 starts determining the prediction result of the branch instruction A. Meanwhile, the instruction fetch circuit 112 reads the lookup table according to the address ADDRA, in order to obtain the target address ADDR2 and the address ADDRB of the next prediction instruction (i.e., operation S210 in
During an interval T+2, as the determination of whether the branch instruction A is branch-taken is not completed, the processor circuit 110 starts processing a next instruction of the branch instruction A (e.g., instruction A′ in
During the interval T+3, the instruction fetch circuit 112 determines that the prediction result of the branch instruction A is branch-taken (labeled as 3_IB/direct2). In response to this prediction result, the processor circuit 110 may prefetch the instruction 2 according to the target address ADDR2 (i.e., operation S220). Meanwhile, if the next prediction instruction (i.e., instruction B) corresponding to the address ADDRB is a branch instruction, the instruction fetch circuit 112 may start determining the prediction result of the branch instruction B, and read the lookup table according to the address ADDRB of the branch instruction B, in order to obtain the address ADDR3 and the address ADDRC of the next prediction instruction (i.e., branch instruction C) (i.e., operations S210 in
During an interval T+4, the processor circuit 110 starts processing the branch instruction B (i.e., operation S220 in
During an interval T+5, the instruction fetch circuit 112 determines that the prediction result of the branch instruction B is branch-taken (labeled as 3_IB/direct3). In response to the prediction result, the processor circuit 110 may start processing (i.e., prefetching) the instruction 3 according to the address ADDR3 (i.e., operation S220 in
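The interval-by-interval walkthrough above can be condensed into a small fetch-loop sketch: whenever a fetched address hits the lookup table and the prediction is branch-taken, the fetch unit already knows where to fetch next, so no bubble is inserted. The addresses and program layout below are illustrative assumptions (the next prediction instruction B is placed right after the target instruction 2), not the actual arrangement of the disclosure.

```python
# Fetch-sequence sketch of the walkthrough above. All addresses are
# illustrative; ADDRB is placed immediately after ADDR2 so that fetch
# reaches branch B right after the target instruction 2.

ADDRA, ADDR2, ADDRB, ADDR3 = 0x100, 0x200, 0x204, 0x300

lookup_table = {
    ADDRA: (ADDR2, ADDRB),   # branch A -> instruction 2, then branch B
    ADDRB: (ADDR3, 0x304),   # branch B -> instruction 3, then branch C
}

def fetch_stream(start, predict_taken, steps):
    """Return the sequence of fetched addresses; a lookup-table hit with
    a taken prediction redirects fetch to the target with no bubble."""
    pc, stream = start, []
    for _ in range(steps):
        stream.append(pc)
        if pc in lookup_table and predict_taken(pc):
            pc = lookup_table[pc][0]   # jump straight to the target
        else:
            pc += 4                    # fall through sequentially
    return stream

# branch A taken -> instruction 2 -> branch B taken -> instruction 3
assert fetch_stream(ADDRA, lambda pc: True, 4) == [ADDRA, ADDR2, ADDRB, ADDR3]
```

The resulting stream contains no placeholder fetches between the branch, its target, and the following prediction instruction, matching the bubble-free sequence of intervals T+1 through T+5.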
In some related approaches, a branch prediction mechanism only prefetches the instruction at the target address, according to the address of the branch instruction, when the prediction result is branch-taken. In those approaches, even if the prediction result of the branch instruction is branch-taken, one bubble is caused before the instruction corresponding to the target address is executed. Compared with the above approaches, with the arrangement shown in table 1, most bubbles in the instruction processing progress can be removed. As a result, the instruction processing efficiency of the processor circuit 110 is improved.
Reference is made to
In this example, operations of processing the instruction 1, the branch instruction A, the instruction 2, the branch instruction B, and the instruction 3 are the same as those in
In the above related approaches, if the prediction result of the branch instruction is branch-untaken, at least one bubble is caused. In some other approaches, the branch prediction mechanism obtains a target address of a next branch instruction according to a target address of a branch instruction (if the prediction result is branch-taken). In those approaches, if the prediction result is branch-untaken, multiple (e.g., four) bubbles are caused. Compared to those approaches, with the arrangements in table 1, when the prediction result of the branch instruction is branch-untaken, the processor circuit 110 is able to execute multiple instructions without causing bubble(s).
Reference is made to
In examples of
In other words, in this example, the lookup table (i.e., table 2) is further configured to store a corresponding relation among the address of the branch instruction, the target address of the branch instruction, the address of the next prediction instruction (if the prediction result is branch-taken), and the address of the next prediction instruction (if the prediction result is branch-untaken).
For example, before the processor circuit 110 starts processing the branch instruction A, the instruction fetch circuit 112 may start determining the prediction result of the branch instruction A according to the address ADDRA of the branch instruction A, and obtain, from table 2, the corresponding target address ADDR2, the address ADDRB of the next prediction instruction B (if the prediction result is branch-taken), and the address ADDRA′ of the next prediction instruction A′ (if the prediction result is branch-untaken). With this analogy, if the prediction result of the branch instruction A is branch-untaken, the processor circuit 110 (and the instruction fetch circuit 112) may obtain a target address ADDR2′ of the branch instruction A′, an address (not shown) of a next prediction instruction (if the prediction result is branch-taken), and an address (not shown) of a next prediction instruction (if the prediction result is branch-untaken) according to the address ADDRA′. As a result, if the prediction result is branch-untaken, the processor circuit 110 (and the instruction fetch circuit 112) may start processing (i.e., prefetching) a corresponding next prediction instruction, in order to remove more bubbles.
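The extended table-2 entry described above can be sketched by carrying two next-prediction addresses per tag, one per prediction outcome, so that fetch can continue either way. The addresses below are illustrative assumptions; in this layout the fall-through instruction A′ is itself the next prediction instruction for the untaken case, mirroring the example in the text.

```python
# Sketch of the extended lookup table (table 2): each entry stores a
# next-prediction address for both outcomes, so the fetch unit can keep
# prefetching whether the branch is predicted taken or untaken.
# All addresses are illustrative.

ADDRA, ADDR2, ADDRB, ADDRA_PRIME = 0x100, 0x200, 0x204, 0x104

table2 = {
    # tag: (target, next prediction if taken, next prediction if untaken)
    ADDRA: (ADDR2, ADDRB, ADDRA_PRIME),
}

def next_fetch(branch_addr, taken):
    """Return (next fetch address, next address to run prediction on)."""
    target, nxt_taken, nxt_untaken = table2[branch_addr]
    if taken:
        return target, nxt_taken
    # untaken: fall through; here A' (the fall-through instruction) is
    # itself the next prediction instruction, so both addresses coincide
    return nxt_untaken, nxt_untaken

assert next_fetch(ADDRA, True) == (ADDR2, ADDRB)
assert next_fetch(ADDRA, False) == (ADDRA_PRIME, ADDRA_PRIME)
```

Either prediction outcome thus yields a concrete address to prefetch and a concrete address at which to start the next prediction, which is what removes the remaining bubbles in the untaken case.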
As described above, with the pipeline computer system and the instruction processing method in some embodiments, bubbles in the instruction processing progress can be removed, in order to improve overall efficiency of processing instructions.
Various functional components or blocks have been described herein. As will be appreciated by persons skilled in the art, in some embodiments, the functional blocks will preferably be implemented through circuits (either dedicated circuits, or general purpose circuits, which operate under the control of one or more processors and coded instructions), which will typically comprise transistors or other circuit elements that are configured in such a way as to control the operation of the circuitry in accordance with the functions and operations described herein. As will be further appreciated, the specific structure or interconnections of the circuit elements will typically be determined by a compiler, such as a register transfer language (RTL) compiler. RTL compilers operate upon scripts that closely resemble assembly language code, to compile the script into a form that is used for the layout or fabrication of the ultimate circuitry. Indeed, RTL is well known for its role and use in the facilitation of the design process of electronic and digital systems.
The aforementioned descriptions represent merely some embodiments of the present disclosure, without any intention to limit the scope of the present disclosure thereto. Various equivalent changes, alterations, or modifications based on the claims of present disclosure are all consequently viewed as being embraced by the scope of the present disclosure.
Number | Date | Country | Kind
---|---|---|---
109140343 | Nov 2020 | TW | national