This application claims priority of China Application Serial Number 201910530579.5, filed on Jun. 19, 2019, the entirety of which is herein incorporated by reference.
The present invention relates to an instruction execution mechanism and in particular to an instruction execution method and an instruction execution device.
In general, when the execution unit of a processor processes instructions, if a latter instruction in the code requires data from a former instruction (that is, the source operand of the latter instruction is the same as the destination operand of the former instruction), there is a data dependency between the former instruction and the latter instruction. In addition, when the former instruction and the latter instruction are dispatched to the same execution unit for execution, the two instructions use the same hardware resource, so there is a structural dependency between the former instruction and the latter instruction.
When there is a structural dependency between instructions, although there is a sequence between the two instructions in the code, the execution unit may execute the latter instruction in the code first, and then execute the former instruction since the execution unit of the processor executes instructions out-of-order. The traditional method is to use the reorder buffer (ROB) to reorder the execution results of each instruction and then ensure that the instructions are retired in order.
However, since this method needs to reorder the execution results of each instruction, and then ensure that the instructions are sequentially retired, it takes a long time and may cause a delay for the execution unit to execute multiple instructions which require a long operation time (such as floating point operation instructions) when there is structural dependency between these instructions.
Therefore, how to effectively extend the processing performance of the processor based on the architecture of the current processor, and reduce the delay time required for the execution unit to execute the instructions when there is a structural dependency between the instructions has become a problem to be solved in the field.
An embodiment of the invention introduces an instruction execution method. The instruction execution method is suitable for being executed by a processor. The processor comprises a register alias table (RAT) and a reservation station. The instruction execution method includes: the register alias table receives a first micro-instruction and a second micro-instruction and issues the first micro-instruction and the second micro-instruction to the reservation station; the reservation station assigns one of a plurality of execution units to execute the first micro-instruction according to a first specific message of the first micro-instruction; and the reservation station assigns one of the plurality of execution units to execute the second micro-instruction according to a second specific message of the second micro-instruction. When the reservation station determines that the execution units assigned for the first micro-instruction and the second micro-instruction are the same, the reservation station indicates that the second micro-instruction depends on the first micro-instruction.
An embodiment of the invention introduces an instruction execution device. The instruction execution device includes a reservation station and a register alias table. The register alias table is configured to receive a first micro-instruction and a second micro-instruction, and to issue the first micro-instruction and the second micro-instruction to the reservation station. The reservation station assigns one of a plurality of execution units to execute the first micro-instruction according to a first specific message of the first micro-instruction, and it assigns one of the plurality of execution units to execute the second micro-instruction according to a second specific message of the second micro-instruction. When the reservation station determines that the execution units assigned for the first micro-instruction and the second micro-instruction are the same, the reservation station indicates that the second micro-instruction depends on the first micro-instruction.
With the instruction execution method and the instruction execution device of the present invention, the reservation station indicates, for the succeeding micro-instruction, a structural dependency on the preceding micro-instruction, so that the succeeding micro-instruction waits for the execution of the preceding micro-instruction to complete. After the execution of the preceding micro-instruction is complete, the succeeding micro-instruction is dispatched by the reservation station for execution. When multiple micro-instructions that have structural dependencies and are of instruction types needing a long operation time are executed, this avoids the delay that arises when the succeeding micro-instruction is executed before the preceding micro-instruction but can only be retired after the preceding micro-instruction and other preceding instructions have retired in order. The instruction execution method and the instruction execution device of the present invention greatly reduce the overall execution time.
The present invention can be fully understood by reading the subsequent detailed description and examples with references made to the accompanying drawings, wherein:
The following description is of the best-contemplated mode of carrying out the invention. This description is made for the purpose of illustrating the general principles of the invention and should not be taken in a limiting sense. The scope of the invention is best determined by reference to the appended claims.
The present invention will be described with respect to particular embodiments and with reference to certain drawings, but the invention is not limited thereto and is only limited by the claims. It will be further understood that the terms “comprises,” “comprising,” “includes” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having the same name (but for use of the ordinal term) to distinguish the claim elements.
In one embodiment, please refer to
In one embodiment, the instruction execution device 100 can be implemented by, for example, a microcontroller, a microprocessor, a digital signal processor, an application specific integrated circuit (ASIC), or a logic circuit.
In one embodiment, as shown in
In one embodiment, the register alias table 106 issues the renamed micro-instruction to the reorder buffer (ROB) 110 via the instruction path 107. The reorder buffer 110 stores an entry for each micro-instruction transmitted from the register alias table 106 in accordance with the original order of the micro-instructions in the program. These entries are also called reorder buffer entries, and are ordered by a reorder buffer index (ROB index).
In one embodiment, the reservation station 108 indicates the dependency of each micro-instruction. The dependency identifies the micro-instruction whose destination operand the source operand of a given micro-instruction depends on or relates to. For example, the source operand of a subsequent arithmetic logic unit (ALU) instruction may depend on or be related to the destination operand of the preceding load instruction.
In one embodiment, the reservation station 108 dispatches instructions to the appropriate one of the plurality of execution units 112 for execution.
In one embodiment, the reservation station 108 includes at least one reservation station queue 122 (RS queue). The reservation station queue is also called a reservation-station matrix (RS matrix). When an instruction is ready to be executed (all source operands of the instruction are ready and all dependencies are resolved), the reservation station queue 122 schedules and dispatches the instruction to the corresponding execution unit 112. The execution units 112 provide their execution results to the reorder buffer 110, which ensures that the instructions are retired in order.
In one embodiment, the reservation station queue 122 corresponds to an integer execution unit 114, a first floating point execution unit (FPU) 116, and a second floating point execution unit 118.
In one embodiment, all types of execution units 114-118 share a common reservation station queue. In another embodiment, multiple execution units of the same type share a reservation station queue. For example, the first floating point execution unit 116 shares a reservation station queue 122 with the second floating point execution unit 118. The invention is not limited thereto. In addition, the execution unit 112 may include other types of execution units, for example, a memory order buffer (MOB) for executing a load/store instruction. The memory order buffer is not shown in
It should be noted that the register alias table 106 is the last stage in which micro-instructions are processed sequentially (i.e., the instructions are processed according to the program order). Both the subsequent reservation station 108 and the execution unit 112 operate out-of-order: the micro-instructions in the reservation station 108 whose operands are all ready are dispatched first to the execution unit 112 for execution (when there are multiple ready micro-instructions, the oldest one in the reservation station 108 is selected). Therefore, at these stages, the original order of the micro-instructions in the program is broken, and the reorder buffer 110 ensures that the micro-instructions are sequentially retired in the program order after execution.
In one embodiment, when the first micro-instruction (e.g., the micro-instruction whose position is preceding in the code) and the second micro-instruction (e.g., the micro-instruction whose position is succeeding in the code) use the same execution unit (for example, the first floating point execution unit 116), the two instructions use the same hardware resource, so there is a structural dependency between the first micro-instruction and the second micro-instruction. Since the execution units 112 of the processor all execute out-of-order, the first floating point execution unit 116 may execute the second micro-instruction first and then execute the first micro-instruction. This may cause the first floating point execution unit 116 to incur a delay when executing a plurality of floating point operation instructions that require a long operation time.
For example, please refer to
As can be seen from the upper line in the
As can be seen from the lower line in the
Therefore, if the first micro-instruction μop1 and the second micro-instruction μop2 are assigned to be executed by the same execution unit (for example, the first floating point execution unit 116), the reservation station 108 can mark or indicate the first micro-instruction μop1 and the second micro-instruction μop2 as having a structural dependency and schedule them accordingly. The delay time required for the first floating point execution unit 116 to execute such instructions may thereby be reduced.
Refer to
In step 310, the register alias table 106 receives a first micro-instruction μop1 and a second micro-instruction μop2, respectively, and the register alias table 106 respectively issues the first micro-instruction μop1 and the second micro-instruction μop2 to the reservation station 108 (i.e., the reservation station 400 of
In one embodiment, as shown in
In step 320, the reservation station 400 assigns one of the plurality of execution units 1 to 9 (for example, the floating point execution unit 8) to the first micro-instruction μop1 according to a specific message of the first micro-instruction μop1 to execute the first micro-instruction. Moreover, the reservation station 400 assigns one of the plurality of execution units 1 to 9 (for example, the floating point execution unit 8) to the second micro-instruction μop2 according to a specific message of the second micro-instruction μop2 to execute the second micro-instruction.
In one embodiment, the specific message can be an instruction type. For example, the first micro-instruction μop1, the second micro-instruction μop2, and the third micro-instruction μop3 are all of the instruction types of floating point operations. The fourth micro-instruction μop4 is of the instruction type of integer operation.
In the example of
In one embodiment, if a certain micro-instruction can be executed by multiple execution units (for example, the micro-instruction can be executed both by the execution units 8 and 9 if the execution units 8 and 9 are both the floating point execution units) according to the foregoing specific message, the dispatch port designator 410 can assign an execution unit to execute the micro-instruction in a round robin manner, for example, multiple micro-instructions of the same type are assigned to the execution units 8 and 9 in a polling manner.
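The round robin assignment described above can be illustrated with a minimal Python sketch. The class name `DispatchPortDesignator`, the method `assign`, and the type labels `"fp"` and `"int"` are hypothetical names chosen for illustration. The mapping of floating point micro-instructions to ports 8 and 9 and of integer micro-instructions to port 2 follows the examples in this description; everything else is an assumption and not the actual hardware logic.

```python
from itertools import cycle


class DispatchPortDesignator:
    """Hypothetical software model of round robin port assignment."""

    def __init__(self, ports_by_type):
        # One independent rotating iterator per instruction type.
        self._rotors = {t: cycle(ports) for t, ports in ports_by_type.items()}

    def assign(self, instr_type):
        """Return the next dispatch port for a micro-instruction of this type."""
        return next(self._rotors[instr_type])


# Assumed mapping: floating point -> ports 8 and 9, integer -> port 2.
designator = DispatchPortDesignator({"fp": [8, 9], "int": [2]})

# Three floating point micro-instructions in program order, then one integer one.
fp_ports = [designator.assign("fp") for _ in range(3)]
int_port = designator.assign("int")
print(fp_ports, int_port)
```

In this sketch the first and third floating point micro-instructions land on port 8 and the second on port 9, which is the situation in which a structural dependency between the first and third micro-instructions would be indicated.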
In step 330, when the reservation station 400 determines that the first micro-instruction μop1 and the second micro-instruction μop2 are assigned to the same execution unit (for example, both the first micro-instruction μop1 and the second micro-instruction μop2 are assigned to the floating point execution unit 8), the reservation station 400 indicates that the second micro-instruction μop2 depends on the first micro-instruction μop1.
In one embodiment, the reservation station 400 includes a plurality of dispatch ports 1 to 9 for dispatching micro-instructions to execution units 1 to 9, respectively. The reservation station 400 determines whether the execution units assigned by the reservation station 400 for the first micro-instruction μop1 and the second micro-instruction μop2 are the same (for example, both are the floating point execution unit 8) according to whether the dispatch port (for example, the dispatch port 8) corresponding to the execution unit assigned by the reservation station 400 for the second micro-instruction μop2 includes a message corresponding to the first micro-instruction μop1.
More specifically, in one embodiment, the dispatch ports 1 to 9 each includes a scoreboard. Only the scoreboard 412 and the scoreboard 413 are shown in
More specifically, if the first micro-instruction μop1 is assigned to be dispatched by the dispatch port 8 to the execution unit 8, the reservation station 400 records the message corresponding to the first micro-instruction μop1 on the scoreboard 412 corresponding to the dispatch port 8. If the third micro-instruction μop3 is also assigned to be dispatched by the dispatch port 8 to the execution unit 8, the reservation station 400 also records the message corresponding to the third micro-instruction μop3 on the scoreboard 412. When the reservation station 400 sends the related message of the third micro-instruction μop3 to the scoreboard 412, and if the message corresponding to the first micro-instruction μop1 has been recorded on the scoreboard 412, the third micro-instruction μop3 is marked as depending on the first micro-instruction μop1. This means that the third micro-instruction μop3 will use the same resource (i.e., the execution unit 8) as the first micro-instruction μop1 for execution. Therefore, the reservation station 400 holds the third micro-instruction μop3 according to the dependency indicator until execution of the first micro-instruction μop1 is completed. How to mark the dependency and cancel the dependency indicator will be detailed later in
For example, the first micro-instruction μop1, the second micro-instruction μop2, and the third micro-instruction μop3 are all of the same instruction type (floating point operation micro-instruction). The reservation station 400 assigns the first and second floating point execution units 8 and 9 for the first micro-instruction μop1, the second micro-instruction μop2, and the third micro-instruction μop3 in a round robin manner. When the reservation station 400 assigns the first micro-instruction μop1 and the third micro-instruction μop3 to be executed by the first floating point execution unit 8, and assigns the second micro-instruction μop2 to be executed by the second floating point execution unit 9, the scoreboard 412 of the dispatch port 8 records messages corresponding to the first micro-instruction μop1 and the third micro-instruction μop3 to indicate that both the first micro-instruction μop1 and the third micro-instruction μop3 are executed by the floating point execution unit 8. Then, the reservation station 400 marks the succeeding third micro-instruction μop3 as depending on the preceding first micro-instruction μop1. The scoreboard 413 of the dispatch port 9 records the message corresponding to the second micro-instruction μop2. The reservation station 400 does not need to mark any dependency for the second micro-instruction μop2.
In another embodiment, when the dispatch port designator 410 assigns the first micro-instruction μop1 to be dispatched by the dispatch port 8 to the execution unit 8 for execution, the reservation station 400 records the message of the first micro-instruction μop1 on the scoreboard 412 of the dispatch port 8. When the reservation station 400 assigns the second micro-instruction μop2 to be dispatched by the dispatch port 8 to the execution unit 8 for execution, the reservation station 400 queries whether the scoreboard 412 contains messages corresponding to other micro-instructions. In this example, the reservation station 400 can find the message corresponding to the first micro-instruction μop1 on the scoreboard 412. When determining that the scoreboard 412 contains a message corresponding to the first micro-instruction μop1, the reservation station 400 marks or indicates that the second micro-instruction μop2 structurally depends on the first micro-instruction μop1. In this example, when the reservation station 400 assigns the third micro-instruction μop3 to be dispatched by the dispatch port 9 to the execution unit 9 for execution, the reservation station 400 queries whether the scoreboard 413 of the dispatch port 9 contains messages corresponding to other micro-instructions. When the reservation station 400 determines that there are no messages corresponding to other micro-instructions on the scoreboard 413, it is not necessary to indicate any structural dependency for the third micro-instruction μop3.
In other words, when the reservation station 400 assigns multiple micro-instructions to be executed by the same execution unit, these micro-instructions are considered as having a dependency.
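The scoreboard query described in the preceding embodiments can be sketched in Python as follows. This is a simplified software model under stated assumptions, not the hardware implementation: the class `Scoreboards`, the method `record`, and the string labels for the micro-instructions are hypothetical. The sketch also assumes that a newly recorded micro-instruction is reported as depending on every earlier micro-instruction still recorded on the same scoreboard, and it does not model the removal of completed micro-instructions.

```python
from collections import defaultdict


class Scoreboards:
    """Hypothetical model: one scoreboard (a list) per dispatch port."""

    def __init__(self):
        # dispatch port number -> micro-instructions recorded on its scoreboard
        self._recorded = defaultdict(list)

    def record(self, port, uop):
        """Record uop on the port's scoreboard.

        Returns the micro-instructions that were already recorded there,
        i.e. the ones uop would be marked as structurally depending on.
        """
        earlier = list(self._recorded[port])
        self._recorded[port].append(uop)
        return earlier


boards = Scoreboards()
deps_uop1 = boards.record(8, "uop1")  # first on port 8: no dependency
deps_uop2 = boards.record(9, "uop2")  # first on port 9: no dependency
deps_uop3 = boards.record(8, "uop3")  # shares port 8 with uop1
print(deps_uop1, deps_uop2, deps_uop3)
```

Under these assumptions, only the third micro-instruction is reported as structurally dependent, and only on the first micro-instruction, matching the round robin example above.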
In one embodiment, since the instruction type of the fourth micro-instruction μop4 is non-floating point operation (for example, an integer operation), the dispatch port designator 410 assigns the fourth micro-instruction μop4 to be stored in the temporary register 411. Moreover, the dispatch port designator 410 assigns the fourth micro-instruction μop4 to be dispatched by the dispatch port 2 to the integer execution unit (e.g., the execution unit 2) for execution.
The steps and/or methods of performing the various elements in
All dependencies are represented by the index of the reservation-station matrix 420 of the associated instruction. The reservation station 400 in
Next, how the reservation station 400 indicates the aforementioned structural dependency is described in detail in the following embodiment. For example, if the instruction type of the third micro-instruction μop3 is floating point micro-instruction (assigned to be executed by execution unit 8), the reservation station 400 queries the scoreboard 412 when processing the third micro-instruction μop3. If the reservation station 400 finds that the scoreboard 412 includes a message for the first micro-instruction μop1, which is a floating point micro-instruction and was recorded earlier on the scoreboard 412, the first dependency indicator value Src4 dependency [63:0] representing the structural dependency of the third micro-instruction μop3 is updated according to the index value of the first micro-instruction μop1 in the reservation-station matrix 420. For example, if the first micro-instruction μop1 was stored earlier in the entry R19 of the reservation-station matrix 420, the bit [19] of the 64 bits included in the first dependency indicator value Src4 dependency [63:0] of the third micro-instruction μop3 is set (for example, set to 1). In addition, the micro-instructions assigned to the execution unit 9 also establish structural dependencies in the same manner.
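As a minimal illustration of setting a bit of the 64-bit indicator value by reservation-station index, the following Python sketch models Src4 dependency [63:0] as an integer. The function name `set_dependency_bit` and the constant `RS_ENTRIES` are hypothetical; the entry index 19 is taken from the example above.

```python
RS_ENTRIES = 64  # width of the dependency indicator value, bits [63:0]


def set_dependency_bit(vector, rs_index):
    """Return vector with the bit for reservation-station entry rs_index set."""
    if not 0 <= rs_index < RS_ENTRIES:
        raise ValueError("reservation-station index out of range")
    return vector | (1 << rs_index)


# The earlier micro-instruction is stored in entry R19, so bit [19] is set.
src4_dependency = set_dependency_bit(0, 19)
print(f"{src4_dependency:#018x}")
```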
The following describes how to indicate the data dependency. The comparators 414, 415 and 416 in
The reservation station 400 sends the source operands Src1 to Src3 to the comparators 414 to 416, respectively. The reservation station 400 compares the source operands Src1 to Src3 with the destination operand DT of all previous micro-instructions already stored in the reservation-station matrix 420. If some of them are the same, the reservation station 400 sets the value of the corresponding position of the second dependency indicator value Src1 dependency [63:0] (for example, sets to 1). For example, if the source operand Src1 of the second micro-instruction μop2 is the same as the PRF index (or ROB index) of the destination operand of the micro-instruction stored in the entry R21, the reservation station 400 sets the bit [21] of the second dependency indicator value Src1 dependency [63:0] of the second micro-instruction μop2 to 1 to indicate that the source operand Src1 of the second micro-instruction μop2 depends on the micro-instruction stored in entry R21. Comparators 415 and 416 operate similarly to comparator 414 for respectively setting a third dependency indicator value Src2 dependency [63:0] to indicate the data dependency of source operand Src2 of second micro-instruction μop2 and the fourth dependency indicator value Src3 dependency [63:0] to indicate the data dependency of the source operand Src3 of the second micro-instruction μop2. The details are not described herein again.
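The comparison performed for one source operand can be sketched as follows. This is an illustrative Python model only: the function `data_dependency_vector`, the dictionary `dest_tags`, and the tag values 40 and 57 are assumptions made for the example, and a hardware comparator would perform the comparisons in parallel instead of in a loop.

```python
def data_dependency_vector(src_tag, dest_tags):
    """Build a 64-bit dependency value for one source operand.

    src_tag:   PRF (or ROB) index of the source operand, or None if unused.
    dest_tags: reservation-station entry index -> PRF (or ROB) index of the
               destination operand of the earlier micro-instruction stored there.
    """
    vector = 0
    for rs_index, dest_tag in dest_tags.items():
        if src_tag is not None and src_tag == dest_tag:
            vector |= 1 << rs_index  # source matches this entry's destination
    return vector


# Hypothetical contents: entry R19 writes tag 40, entry R21 writes tag 57.
dest_tags = {19: 40, 21: 57}
src1_dependency = data_dependency_vector(57, dest_tags)    # matches R21
src2_dependency = data_dependency_vector(3, dest_tags)     # matches nothing
src3_dependency = data_dependency_vector(None, dest_tags)  # operand unused
print(bin(src1_dependency), src2_dependency, src3_dependency)
```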
The reservation station 400 performs an “OR” operation on all dependency indicator values Src1 dependency [63:0] to Src4 dependency [63:0] (i.e., they are input to the OR logic gate 418) to obtain the dependency indicator value dependency [63:0] representing all the dependency relations of the third micro-instruction μop3. This dependency indicator value dependency [63:0] is recorded in the dependency domain DD of the entry in which the third micro-instruction μop3 is stored in the reservation-station matrix 420.
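The combination of the four indicator values can be sketched in Python as a bitwise OR over integers. The function name `combine_dependencies` and the sample values are hypothetical; the sample assumes a micro-instruction with a data dependency on entry R21 and a structural dependency on entry R19.

```python
from functools import reduce
from operator import or_

MASK64 = (1 << 64) - 1  # keep the result within bits [63:0]


def combine_dependencies(*vectors):
    """Bitwise OR of all per-source dependency indicator values."""
    return reduce(or_, vectors, 0) & MASK64


src1 = 1 << 21  # data dependency on the entry R21 (assumed)
src2 = 0
src3 = 0
src4 = 1 << 19  # structural dependency on the entry R19 (assumed)
dependency = combine_dependencies(src1, src2, src3, src4)
print(f"{dependency:#x}")
```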
The following describes how to perform the dependency clearance using the value dependency [63:0] in the dependency domain DD.
The reservation-station matrix 420 has a plurality of dispatch ports, for example, dispatch port 8 (port 8) and dispatch port 9 (port 9), for dispatching micro-instructions to the corresponding execution units 1 to 9. For dispatch port 8 (port 8), the reservation station 400 selects the micro-instruction whose dependencies have been resolved (that is, the selected micro-instruction is ready for execution) and which is the oldest in the program order (the age of the micro-instructions is given by the age domain AD). The reservation station 400 dispatches the selected micro-instruction to the execution unit 8 for execution (assuming that the dispatch port 8 corresponds to the execution unit 8).
Regarding dependency clearance, if the second micro-instruction μop2 depends on the first micro-instruction μop1, the dependency of the second micro-instruction μop2 may be resolved after the execution of the first micro-instruction μop1. Moreover, the dependency of the second micro-instruction μop2 may also be resolved when the first micro-instruction μop1 writes back after execution (when the first micro-instruction μop1 writes back, 0 is sent to the AND logic gate 419, indicating the end of execution). The timing depends on the execution time of the first micro-instruction μop1 (that is, it is related to the instruction type of the first micro-instruction μop1). In one embodiment, the timing is such that, after the dependency of the second micro-instruction μop2 is resolved and the second micro-instruction μop2 is executed, the result of the first micro-instruction μop1 has just become available (the most time-intensive case), or the first micro-instruction μop1 has just vacated the resources of the execution unit it used, before the execution unit performs the calculation. Specifically, after the first micro-instruction μop1 is completely executed, the position corresponding to the first micro-instruction μop1 in the value dependency [63:0] in the dependency domain DD of the second micro-instruction μop2 is cleared. The position corresponding to the first micro-instruction μop1 in the dependency [63:0] is determined by the index value of the first micro-instruction μop1 in the reservation-station matrix 420. For example, suppose the first micro-instruction μop1 was stored earlier in the entry R19 of the reservation-station matrix 420. After the first micro-instruction μop1 is completely executed, the bit [19] of the dependency [63:0] of the second micro-instruction μop2 is cleared (for example, set to 0).
When the reservation station 400 determines that all the positions of the dependency [63:0] of the second micro-instruction μop2 are cleared, all the micro-instructions that the second micro-instruction μop2 depends on have been executed, i.e., all the dependencies of the second micro-instruction μop2 are released. The reservation station 400 then further determines, according to the value of the age domain AD and the value of the port domain PD corresponding to the second micro-instruction μop2, whether the second micro-instruction μop2 is the oldest of all the micro-instructions assigned to be executed by the assigned execution unit (for example, execution unit 8). When the second micro-instruction μop2 is the oldest of all the micro-instructions assigned to be executed by the assigned execution unit 8, the second micro-instruction μop2 is dispatched by the reservation station 400 to the assigned execution unit 8 for execution.
In summary, with the instruction execution method and the instruction execution device of the present invention, the reservation station indicates, for the succeeding micro-instruction, a structural dependency on the preceding micro-instruction, so that the succeeding micro-instruction waits for the execution of the preceding micro-instruction to complete. After the execution of the preceding micro-instruction is complete, the succeeding micro-instruction is dispatched by the reservation station for execution. When multiple micro-instructions that have structural dependencies and are of instruction types needing a long operation time are executed, this avoids the delay that arises when the succeeding micro-instruction is executed before the preceding micro-instruction but can only be retired after the preceding micro-instruction and other preceding instructions have retired in order. The instruction execution method and the instruction execution device of the present invention greatly reduce the overall execution time.
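The clearing of one position of the dependency [63:0] value can be sketched in Python as an AND with a mask whose bit for the completed micro-instruction is 0. This is an assumed software analogue of the AND logic gate 419 and not a description of the circuit; the function name `clear_dependency` and the starting value are hypothetical.

```python
MASK64 = (1 << 64) - 1  # the dependency value occupies bits [63:0]


def clear_dependency(dependency, rs_index):
    """Clear the bit of the entry whose micro-instruction finished executing."""
    mask = MASK64 & ~(1 << rs_index)  # all ones except a 0 at rs_index
    return dependency & mask


# Assumed starting value: dependencies on the entries R19 and R21.
dependency = (1 << 21) | (1 << 19)
after_r19 = clear_dependency(dependency, 19)  # entry R19 completes
after_both = clear_dependency(after_r19, 21)  # entry R21 completes
print(f"{after_r19:#x}", after_both)
```

Once the value reaches 0, every dependency of the waiting micro-instruction has been released.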
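The selection of a micro-instruction for a dispatch port can be sketched as follows. This Python model is a simplification under stated assumptions: the `Entry` fields stand in for the port domain PD, the age domain AD, and the dependency domain DD; a smaller `age` value is assumed to mean an older micro-instruction; and the function `select_for_port` picks the oldest ready entry for the port, which is one reading of the selection rule described above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    port: int        # stands in for the port domain PD
    age: int         # stands in for the age domain AD (smaller = older, assumed)
    dependency: int  # stands in for the dependency domain DD, bits [63:0]


def select_for_port(entries, port):
    """Return the oldest entry for this port whose dependencies are all cleared."""
    ready = [e for e in entries if e.port == port and e.dependency == 0]
    return min(ready, key=lambda e: e.age, default=None)


entries = [
    Entry("uop1", port=8, age=0, dependency=0),
    Entry("uop2", port=9, age=1, dependency=0),
    Entry("uop3", port=8, age=2, dependency=1 << 19),  # waits on the entry R19
]

first_pick = select_for_port(entries, 8)  # uop3 is blocked, so uop1 is chosen
other_pick = select_for_port(entries, 9)

# uop1 completes: it leaves the matrix and bit [19] of uop3 is cleared.
entries.remove(first_pick)
entries[-1].dependency &= ~(1 << 19)
next_pick = select_for_port(entries, 8)
print(first_pick.name, other_pick.name, next_pick.name)
```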
While the invention has been described by way of example and in terms of the preferred embodiments, it should be understood that the invention is not limited to the disclosed embodiments. On the contrary, it is intended to cover various modifications and similar arrangements (as would be apparent to those skilled in the art). Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Number | Date | Country | Kind |
---|---|---|---|
201910530579.5 | Jun 2019 | CN | national |