This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2009-154374, filed on Jun. 29, 2009, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are directed to a priority circuit, a processor, and a processing method.
Generally, a processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) is an arithmetic processing unit that is included in an information processor, a portable telephone, or the like and executes various types of arithmetic processing. In recent years, as processors have become smaller and their processing performance has improved, their fields of application have diversified.
The configuration of a processor and the pipelining of the processor will now be explained with reference to
For example, as illustrated in
For example, RS includes RSA (Reservation Station for Address) for load and store instructions, RSE (Reservation Station for Execution) for fixed point arithmetic instructions (integer arithmetic instructions), RSF (Reservation Station for Floating point) for floating point arithmetic instructions, RSBR (Reservation Station for BRanch) for branch instructions, and the like.
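The assignment of instruction types to reservation stations listed above can be sketched as a small lookup. This is only an illustration: the instruction class strings and the `station_for` helper are hypothetical names, not part of the described processor.

```python
from enum import Enum


class ReservationStation(Enum):
    """Reservation stations named in the text, with the instructions they hold."""
    RSA = "load and store instructions"
    RSE = "fixed point (integer) arithmetic instructions"
    RSF = "floating point arithmetic instructions"
    RSBR = "branch instructions"


# Hypothetical instruction classes, chosen only for this illustration.
STATION_FOR_CLASS = {
    "load": ReservationStation.RSA,
    "store": ReservationStation.RSA,
    "fixed_point": ReservationStation.RSE,
    "floating_point": ReservationStation.RSF,
    "branch": ReservationStation.RSBR,
}


def station_for(instruction_class: str) -> ReservationStation:
    """Return the reservation station in which a decoded instruction is registered."""
    return STATION_FOR_CLASS[instruction_class]
```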
The processor registers all instructions decoded by the instruction decoder in the CSE (Commit Stack Entry), which manages all the instructions, and also registers them in the reservation stations, which perform out-of-order execution control such that an executable instruction is executed first regardless of program order. Next, the processor selects an instruction that can be executed in a priority (P) cycle (see
After that, the processor reads out a register in a buffer (B) cycle (see
Then, the processor receives reports such as execution completion of arithmetic processing in CSE, completion of data transfer process from a primary data cache, and completion of branch determination process from a branch control mechanism, and performs a commit process in order. Next, the processor writes information from the updating buffer to the register in a register writing (W) cycle (see
However, as a technique for improving a resource utilization ratio in a cache, a pipeline, an arithmetic unit, and the like that are required for instruction execution performed by the processor and drawing out the performance of the processor, there is a technology called a super scalar method as illustrated in
For example, as illustrated in
In
Next, an RS that has a plurality of asymmetric fixed point arithmetic units will be explained with reference to
For example, as illustrated in
Next, the dispatch to an arithmetic unit will be explained with reference to
For example, as illustrated in
While the older instructions registered in each queue are moved toward the higher queue numbers (for example, from 4 to 9) every cycle, i.e., a bubble up is performed, a priority circuit selects the oldest ready instruction from each of RSEA and RSEB. Next, when “+INH_EXA(B)_P_TGR”, the dispatch inhibition flag for each of the arithmetic units EXA and EXB, is not on, the selected instruction is dispatched because “EXA(B)_VALID”, the priority selection signal of the arithmetic unit, becomes valid.
In the configuration illustrated in
Next, the functional block of the priority circuit of RSEA will be explained with reference to
For example, as illustrated in
A READY condition of each entry includes, for example, “Condition 1: the entry is valid and has not been dispatched”, “Condition 2: the source 1 register has no dependency relation or can be bypassed”, and “Condition 3: the source 2 register has no dependency relation or can be bypassed”.
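The three READY conditions can be sketched as a single predicate over one entry. This is a minimal software model, not the circuit itself; the `Entry` class and its field names are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical model of one reservation station entry."""
    valid: bool             # the entry holds an instruction
    dispatched: bool        # the instruction has already been dispatched
    src1_dependent: bool    # source 1 register depends on an earlier instruction
    src1_bypassable: bool   # source 1 operand can be obtained by bypass
    src2_dependent: bool    # source 2 register depends on an earlier instruction
    src2_bypassable: bool   # source 2 operand can be obtained by bypass


def is_ready(e: Entry) -> bool:
    """Return True when Condition 1 to Condition 3 are all satisfied."""
    cond1 = e.valid and not e.dispatched
    cond2 = (not e.src1_dependent) or e.src1_bypassable
    cond3 = (not e.src2_dependent) or e.src2_bypassable
    return cond1 and cond2 and cond3
```

For example, an entry whose source 2 register has an unresolved dependency that cannot be bypassed is not READY, even if it is valid and not yet dispatched.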
Next, the flow performed by the priority circuit of RSEA will be explained with reference to
For example, as illustrated in
Next, when all the three conditions are satisfied (Step S2: YES), the priority circuit selects the queue “9” “+P_EXA_SEL[9]=1” as an instruction to be dispatched to the arithmetic unit EXA (Step S3). In this case, the priority circuit in relation to the input of three conditions determines a queue from an old queue, i.e., the queue “9” among “0 to 9” when Condition 1 to Condition 3 are satisfied as illustrated in
Moreover, when the three conditions are not satisfied (Step S2: NO), the priority circuit does not select the queue “9” “+P_EXA_SEL[9]=0” (Step S4). Then, the priority circuit determines whether Condition 1 to Condition 3 are satisfied as the READY condition of “P_RSEA—8” (Step S5). Next, when all the three conditions are satisfied (Step S5: YES), the priority circuit selects the queue “8” “+P_EXA_SEL[8]=1” as an instruction to be dispatched to the arithmetic unit EXA (Step S6).
Moreover, when the three conditions are not satisfied (Step S5: NO), the priority circuit does not select the queue “8” “+P_EXA_SEL[8]=0” (Step S7). Then, the priority circuit determines whether Condition 1 to Condition 3 are satisfied as the READY condition of “P_RSEA—7” (Step S8).
Next, when all the three conditions are satisfied (Step S8: YES), the priority circuit selects the queue “7” “+P_EXA_SEL[7]=1” as an instruction to be dispatched to the arithmetic unit EXA (Step S9). On the other hand, when the three conditions are not satisfied (Step S8: NO), the priority circuit performs the same process on the queues “0 to 6”.
According to the flow, as illustrated in
Moreover, in the case of “−P_RSEA—9_READY”, “−P_RSEA—8_READY”, and “+P_RSEA—7_READY”, the priority circuit selects “7”.
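The flow above amounts to selecting the oldest READY queue, starting from queue “9” and working down to queue “0”. The following is a minimal sketch of that selection; the function name and the list-based one-hot result are assumptions made for illustration and do not describe the actual gate-level circuit.

```python
def select_oldest_ready(ready: list) -> list:
    """Return a one-hot selection list modeling P_EXA_SEL.

    ready[q] is the READY result of queue q. The highest queue number
    (9 in the text) holds the oldest instruction, so the scan starts
    there and stops at the first READY queue.
    """
    sel = [0] * len(ready)
    for q in range(len(ready) - 1, -1, -1):
        if ready[q]:
            sel[q] = 1
            break
    return sel


# Queues 9 and 8 are not READY and queue 7 is READY, so queue 7 is selected.
ready = [True] * 8 + [False, False]
sel = select_oldest_ready(ready)
```

If no queue is READY, every element of the result stays 0 and nothing is dispatched in that cycle.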
However, as described above, when each of the fixed point arithmetic units has its own RSE, the mounting area of the circuit increases. In a processor that executes floating point arithmetic programs, such as those for HPC (High Performance Computing), which impose a higher load than fixed point arithmetic programs, all the entries of the RSE are rarely filled up. It is therefore preferable that such a processor utilize the RSE resource effectively so that the mounting area becomes smaller.
Therefore, in recent years, as illustrated in
However, the technology for sharing RSE described above has a problem in that an operation according to an instruction that can be executed by only a predetermined arithmetic unit produces errors. Specifically, because the technology does not take into account that different executable instructions are assigned to the arithmetic units, an instruction may be dispatched to an arithmetic unit that cannot execute it.
According to an aspect of an embodiment of the invention, a priority circuit is connected to a reservation station and to a plurality of arithmetic units. The reservation station holds and supplies an instruction decoded by an instruction decoder, together with executable information outputted from the instruction decoder indicating whether the decoded instruction can be executed only by a specific arithmetic unit. The arithmetic units each have an instruction queue storing the decoded instruction supplied from the reservation station and each execute a different operation based on the decoded instruction stored in the instruction queue. The priority circuit includes a dispatching unit that, on the basis of the executable information input from the reservation station, dispatches the decoded instruction to an arithmetic unit that is different from the specific arithmetic unit and whose instruction queue is vacant when the executable information indicates that the decoded instruction can be executed by arithmetic units other than the specific arithmetic unit.
The object and advantages of the embodiment will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the embodiment, as claimed.
Preferred embodiments of the present invention will be explained with reference to accompanying drawings. The present invention is not limited to the embodiments explained below.
First, registration in and dispatch from RSE according to the first embodiment will be explained with reference to
For example, as illustrated in
On the other hand, the priority circuit 120 checks the flag “EXA_ONLY” and, according to the priority logic of dispatch, does not dispatch to the EXB 140 an entry whose flag “EXA_ONLY” is on.
Moreover, when the dispatch inhibition flag “+INH_EXA_P_TGR”, which indicates that the dispatch to the EXA 130 cannot be performed, is on because an instruction that uses the EXA 130 is being executed, the priority circuit 120 dispatches the oldest instruction to the EXB 140 on condition that it is not “EXA_ONLY”. Then, the EXA 130 and the EXB 140 execute arithmetic processing according to the instructions that are dispatched in this way.
The dispatch to an arithmetic unit under each condition of the vacancy of RSE, the dispatch inhibition flag, the dependency relation between instructions, and the flag EXA_ONLY will be explained with reference to
First, the dispatch to an arithmetic unit when the queue of RSEB is empty will be explained with reference to
For example, as illustrated in
Next, the dispatch to the arithmetic unit when a dispatch inhibition flag is on will be explained with reference to
For example, as illustrated in
Next, the dispatch to the arithmetic unit when there is a dependency relation between Instruction 1 and Instruction 2 will be explained with reference to
For example, as illustrated in
Next, the dispatch to the arithmetic unit when there is a dependency relation among Instruction 1, Instruction 2, and Instruction 3 will be explained with reference to
For example, as illustrated in
Next, the dispatch to the arithmetic unit when Instruction 1 and Instruction 2 are EXA_ONLY will be explained with reference to
For example, as illustrated in
In brief, the priority circuit according to the first embodiment shares RSE and dispatches an instruction to an arithmetic unit on the basis of the dependency relation of the instructions to be dispatched, the flag EXA_ONLY, or the dispatch inhibition flag. In other words, because the priority circuit dispatches an instruction to an arithmetic unit on the basis of the dependency relation of instructions and various types of flags, the efficiency of arithmetic processing can be improved without causing errors in arithmetic processing.
Next, the functional block of the priority circuit of RSE will be explained with reference to
For example, as illustrated in
In brief, the priority circuit depicted in
Next, the flow of EXA_SEL performed by the priority circuit will be explained with reference to
For example, as illustrated in
Next, when all the three conditions are satisfied (Step S102: YES), the priority circuit selects the queue “9” “+P_EXA_SEL[9]=1” as an instruction to be dispatched to the arithmetic unit EXA (Step S103).
On the other hand, when the three conditions are not satisfied (Step S102: NO), the priority circuit does not select the queue “9” “+P_EXA_SEL[9]=0” (Step S104). Then, the priority circuit determines whether Condition 1 to Condition 3 are satisfied as the READY condition of “P_EXA—8” (Step S105). Next, when all the three conditions are satisfied (Step S105: YES), the priority circuit selects the queue “8” “+P_EXA_SEL[8]=1” as an instruction to be dispatched to the arithmetic unit EXA (Step S106).
On the other hand, when the three conditions are not satisfied (Step S105: NO), the priority circuit does not select the queue “8” “+P_EXA_SEL[8]=0” (Step S107). Then, the priority circuit determines whether Condition 1 to Condition 3 are satisfied as the READY condition of “P_EXA—7” (Step S108).
Next, when all the three conditions are satisfied (Step S108: YES), the priority circuit selects the queue “7” “+P_EXA_SEL[7]=1” as an instruction to be dispatched to the arithmetic unit EXA (Step S109). On the other hand, when the three conditions are not satisfied (Step S108: NO), the priority circuit performs the same process on the queues “0 to 6”.
Next, the flow of EXB_SEL[9] performed by the priority circuit will be explained with reference to
For example, as illustrated in
Next, when all the three conditions are satisfied (Step S202: YES), the priority circuit determines whether the flag EXA_ONLY is on (Step S203). After that, when the flag EXA_ONLY is not on, in other words, it is not “+P_RSE—9_EXA_ONLY=1” (Step S203: NO), the priority circuit determines whether a dispatch inhibition flag is on (Step S204).
Then, when the dispatch inhibition flag is on, in other words, it is “+INH_EXA_P_TGR=1” (Step S204: YES), the priority circuit selects the queue 9 “+P_EXB_SEL[9]=1” as an instruction to be dispatched to the arithmetic unit EXB (Step S205).
When the three conditions are not satisfied (Step S202: NO), when it is “+P_RSE—9_EXA_ONLY=1” (Step S203: YES), or when it is not “+INH_EXA_P_TGR=1” (Step S204: NO), the priority circuit does not select the queue 9 as an instruction to be dispatched to the arithmetic unit EXB “+P_EXB_SEL[9]=0” (Step S206).
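The decision sequence of Steps S202 to S206 can be summarized as a small function. This is a sketch of the flow as described in the text only; the function and parameter names are assumptions made for illustration.

```python
def exb_sel_9(ready_9: bool, exa_only_9: bool, inh_exa: bool) -> int:
    """Model of +P_EXB_SEL[9] following Steps S202 to S206.

    ready_9    -- Condition 1 to Condition 3 hold for queue 9
    exa_only_9 -- flag EXA_ONLY of queue 9 is on
    inh_exa    -- dispatch inhibition flag +INH_EXA_P_TGR is on
    """
    if not ready_9:      # Step S202: NO
        return 0         # Step S206
    if exa_only_9:       # Step S203: YES
        return 0         # Step S206
    if not inh_exa:      # Step S204: NO
        return 0         # Step S206
    return 1             # Step S205: queue 9 is dispatched to EXB
```

In this model the oldest entry goes to EXB only when it is READY, is not restricted to EXA, and EXA is currently inhibited; otherwise it is left for the EXA selection.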
Next, the flow of EXB_SEL[8] performed by the priority circuit will be explained with reference to
For example, as illustrated in
Next, when all the three conditions are satisfied (Step S302: YES), the priority circuit determines whether the flag EXA_ONLY is on (Step S303). After that, when the flag EXA_ONLY is not on, in other words, it is not “+P_RSE—8_EXA_ONLY=1” (Step S303: NO), the priority circuit determines whether the dispatch inhibition flag is on (Step S304).
Then, when the dispatch inhibition flag is not on, in other words, it is not “+INH_EXA_P_TGR=1” (Step S304: NO), the priority circuit determines whether Condition 1 to Condition 3 are satisfied as the READY condition of “P_EXA—9” (Step S305).
Next, when all the three conditions are satisfied, in other words, it is “+P_EXA—9_READY=1” (Step S305: YES), the priority circuit selects the queue 8 “+P_EXB_SEL[8]=1” as an instruction to be dispatched to the arithmetic unit EXB (Step S306).
On the other hand, when the dispatch inhibition flag is on, in other words, it is “+INH_EXA_P_TGR=1” (Step S304: YES), the priority circuit determines whether Condition 1 to Condition 3 are satisfied as the READY condition of “P_EXA—9” (Step S307). Then, when all the three conditions are satisfied, in other words, it is “+P_EXA—9_READY=1” (Step S307: YES), the priority circuit does not select the queue 8 as an instruction to be dispatched to the arithmetic unit EXB “+P_EXB_SEL[8]=0” (Step S309). On the other hand, at Step S307, when the three conditions are not satisfied, in other words, it is not “+P_EXA—9_READY=1” (Step S307: NO), the priority circuit selects the queue 8 “+P_EXB_SEL[8]=1” as an instruction to be dispatched to the arithmetic unit EXB (Step S306).
Moreover, when the three conditions are not satisfied at Step S302 (Step S302: NO), when it is “+P_RSE—8_EXA_ONLY=1” at Step S303 (Step S303: YES), or when it is not “+P_EXA—9_READY=1” at Step S305 (Step S305: NO), the priority circuit does not select the queue 8 as an instruction to be dispatched to the arithmetic unit EXB “+P_EXB_SEL[8]=0” (Step S308).
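The flow of Steps S302 to S309 can likewise be written as a function of four inputs. The sketch below follows the steps exactly as described in the text; the function and parameter names are assumptions made for illustration.

```python
def exb_sel_8(ready_8: bool, exa_only_8: bool, inh_exa: bool, ready_9: bool) -> int:
    """Model of +P_EXB_SEL[8] following Steps S302 to S309.

    ready_8    -- Condition 1 to Condition 3 hold for queue 8
    exa_only_8 -- flag EXA_ONLY of queue 8 is on
    inh_exa    -- dispatch inhibition flag +INH_EXA_P_TGR is on
    ready_9    -- Condition 1 to Condition 3 hold for queue 9
    """
    if not ready_8:                   # Step S302: NO
        return 0                      # Step S308
    if exa_only_8:                    # Step S303: YES
        return 0                      # Step S308
    if not inh_exa:                   # Step S304: NO
        # Step S305: queue 8 goes to EXB only if queue 9 is READY.
        return 1 if ready_9 else 0    # Step S306 / Step S308
    # Step S304: YES, so EXA is inhibited.
    # Step S307: queue 8 goes to EXB only if queue 9 is not READY.
    return 0 if ready_9 else 1        # Step S309 / Step S306
```

One way to read this result is that queue 8 is sent to EXB exactly when the older queue 9 does not claim EXB itself: with EXA available, a READY queue 9 takes EXA and leaves EXB to queue 8, while with EXA inhibited, a READY queue 9 is the candidate for EXB.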
In brief, entries after “+P_EXB_SEL[8]” are selected in accordance with the READY conditions of all the preceding entries. It should be noted that the flows to EXA and EXB in
Moreover, as illustrated in
Next, the EXA selection logic unit of the priority circuit of RSE will be explained with reference to
For example, as illustrated in
Moreover, for example, the EXA selection logic unit selects “7” in the case of “−P_EXA—9_READY”, “−P_EXA—8_READY”, and “+P_EXA—7_READY”. Likewise, in the case of “−P_EXA—9_READY” to “−P_EXA—1_READY”, the EXA selection logic unit selects “0” if it is “+P_EXA—0_READY”.
Next, the EXB selection logic unit of the priority circuit of RSE will be explained with reference to
For example, as illustrated in
As described above, the instruction decoder inputs EXA_ONLY, which indicates that an instruction can be executed by only an arithmetic unit A, into one reservation station. When the priority circuit determines from the EXA_ONLY input from the reservation station that EXA_ONLY is not on, it dispatches the instruction to an arithmetic unit B if the queue of the arithmetic unit B is vacant. Therefore, it is possible to improve the efficiency of arithmetic processing performed by an arithmetic unit without errors.
Moreover, when the oldest instruction is being executed by the arithmetic unit A over multiple cycles, the priority circuit dispatches an instruction to the arithmetic unit B if the dispatch inhibition flag is on and the queue of the arithmetic unit B is vacant. Therefore, it is possible to improve the efficiency of arithmetic processing performed by an arithmetic unit.
The embodiment of the priority circuit disclosed in the present application has been explained above. However, the present invention may be implemented in various configurations other than the embodiment described above. Therefore, different embodiments will be explained in terms of (1) an arithmetic processing unit, (2) a floating point arithmetic unit, and (3) the configuration of a priority circuit.
(1) Arithmetic Processing Unit
In the first embodiment, it has been mainly explained about the preferred embodiment of the priority circuit. However, the present invention can be realized by an arithmetic processing unit (processor) that includes the priority circuit. For example, the priority circuit may be applied to the processor illustrated in
(2) Floating Point Arithmetic Unit
Moreover, the first embodiment has been explained for the case where the arithmetic unit is a fixed point arithmetic unit. However, the arithmetic unit may be a floating point arithmetic unit.
(3) Configuration of Priority Circuit
Moreover, processing procedures, control procedures, specific names, and information including various types of data and parameters (for example, the specific name “dispatch inhibition flag” and the like), which are described in the document and the drawings, can be arbitrarily changed unless otherwise specified.
Moreover, each component of each circuit illustrated in the drawings is a functional concept. Therefore, these components are not necessarily constituted physically as illustrated in the drawings. In other words, the specific configuration of distribution or integration of each circuit is not limited to the illustrated configuration. Therefore, all or a part of each circuit can be distributed or integrated functionally or physically in arbitrary units in accordance with various types of loads or operating conditions.
As described above, according to an aspect of the present invention, it is possible to improve the efficiency of arithmetic processing performed by an arithmetic unit without errors.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2009-154374 | Jun 2009 | JP | national |