1. Field of the Invention
The present invention generally relates to processors, and, more particularly, to a parallel processor that executes a plurality of basic instructions in parallel.
2. Description of the Related Art
Generally, in a conventional computer system, a plurality of basic instructions are executed in parallel by pipeline processing, thereby improving its performance. Conventionally, a plurality of basic instructions constitute a fixed-length instruction word, and a very-long instruction word (VLIW) technique is employed as a method for executing a plurality of basic instructions contained in one instruction word in parallel. Also, a super scalar technique may be employed. In accordance with the super scalar technique, basic instructions are executed in parallel depending on the number of basic instructions contained in each instruction word.
The instruction fetch unit 1 fetches an instruction word from the memory 7, and supplies the instruction word to the instruction issue unit 3. The instruction issue unit 3 issues the basic instructions contained in the supplied instruction word to the instruction execution units EU0 to EUn. If the instruction execution units EU0 to EUn are still executing previous basic instructions at this point, the instruction issue unit 3 waits for the end of the execution. When the execution ends, the instruction issue unit 3 supplies the basic instructions to the instruction execution units EU0 to EUn.
The instruction execution units EU0 to EUn execute the basic instructions, and notify the instruction issue unit 3 of the end of the execution. The register unit 5 supplies data to the instruction execution units EU0 to EUn, if necessary, and holds the execution results of the instruction execution units EU0 to EUn. The externally connected memory 7 stores an instruction word string to be executed in the parallel processor 10. The memory 7 also stores the data that the instruction execution units EU0 to EUn need to execute instructions, as well as the data obtained as execution results.
In the conventional parallel processing method of executing a plurality of basic instructions by the VLIW technique, each instruction word has a fixed length. Therefore, if the number of basic instructions to be executed in parallel is smaller than a predetermined number, do-nothing instructions are added to pad the instruction word to the predetermined length. Consequently, in a program whose instruction words contain only a few basic instructions each, do-nothing instructions make up a large proportion of the code, and the amount of instruction code increases accordingly. This results in problems such as poor memory usage efficiency, a lower cache hit ratio, and an increased load on the instruction fetch mechanism.
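The cost of padding can be made concrete with a small calculation. The following sketch assumes a hypothetical fixed word width of four slots and a hypothetical program, counts one slot per basic instruction or do-nothing instruction, and ignores the 1-bit delimiting fields that a variable-length encoding would add.

```python
# Sketch: instruction-code size with fixed-length VLIW words versus
# variable-length words. Sizes are counted in "slots" (one slot per basic
# instruction or NOP); delimiting fields are ignored in this count.

WIDTH = 4  # hypothetical number of slots in a fixed-length VLIW word


def fixed_length_slots(groups: list[int]) -> tuple[int, int]:
    """Return (total slots, NOP slots) when every word is padded to WIDTH."""
    total = nops = 0
    for n in groups:
        if not 1 <= n <= WIDTH:
            raise ValueError(f"{n} instructions do not fit in a {WIDTH}-slot word")
        total += WIDTH
        nops += WIDTH - n  # do-nothing instructions added as padding
    return total, nops


def variable_length_slots(groups: list[int]) -> int:
    """Return total slots when each word holds only its real basic instructions."""
    return sum(groups)


# Hypothetical program: number of basic instructions issued together at each step.
program = [1, 2, 1, 1, 3, 1]

print(*fixed_length_slots(program))    # prints: 24 15
print(variable_length_slots(program))  # prints: 9
```

In this hypothetical program, 15 of the 24 fixed-length slots are do-nothing instructions, which is the overhead the variable-length scheme avoids.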
The super scalar technique, on the other hand, requires a large-scale circuit in order to increase the number of instructions executed in parallel.
A general object of the present invention is to provide parallel processors in which the above disadvantages are eliminated.
A more specific object of the present invention is to provide a parallel processor that is capable of performing highly efficient parallel processing.
The above objects of the present invention are achieved by a parallel processor that performs parallel processing of one or more basic instructions contained in each of instruction words delimited by instruction delimiting information, the parallel processor comprising:
a plurality of instruction execution units that perform processes corresponding to the supplied basic instructions in parallel;
an instruction fetch unit that fetches the instruction words one by one in accordance with the instruction delimiting information; and
an instruction issue unit that selectively issues each of the basic instructions supplied from the instruction fetch unit to one of the instruction execution units to execute the basic instruction.
With the parallel processor having the above structure, the instruction fetch unit fetches the instruction words one by one in accordance with the instruction delimiting information, so that the length of each instruction word can be made variable. Also, the instruction execution units can execute the instruction words efficiently, because each of the basic instructions is selectively issued to a corresponding one of the instruction execution units.
The above and other objects and features of the present invention will become more apparent from the following description taken in conjunction with the accompanying drawings.
The following is a description of embodiments of the present invention, with reference to the accompanying drawings.
It should be noted that, in the following description, the maximum number of basic instructions contained in one instruction word is 2. However, the parallel processor in accordance with the first embodiment operates in the same manner in a case where the maximum number of basic instructions in one instruction word is 3 or greater.
The FPC 300 is connected to the memory 12 and the instruction execution units EU0 and EU1. The adder 324 is connected to the FPC 300. The instruction buffer 308 is connected to the memory 12, and the cutting unit 316 is connected to the instruction buffer 308. The adder 325 is connected to the cutting unit 316, and the EPC 339 is connected to the adder 325 and the register unit 98. The FPC 300 receives a fetch address contained in an instruction word from the memory 12, and the instruction buffer 308 receives fetch data contained in the instruction word from the memory 12. The FPC 300 further receives a branch destination address corresponding to a branch instruction from the instruction execution units EU0 and EU1.
On the other hand, the instruction issue unit 72 comprises an instruction register 347, selectors 355 and 356, a control unit 370, and an AND gate 378. Here, the instruction register 347 is connected to the cutting unit 316. The selectors 355 and 356 are both connected to the instruction register 347. The selector 355 is connected to the instruction execution unit EU0, while the selector 356 is connected to the instruction execution unit EU1. The control unit 370 is connected to the AND gate 378 and the selectors 355 and 356. The AND gate 378 is connected to the instruction execution units EU0 and EU1. In this structure, the instruction execution units EU0 and EU1 transmit execution complete signals EUc0 and EUc1, respectively, to the AND gate 378.
The above instruction words are stored in the memory 12 in advance. The adder 324 in the instruction fetch unit 46 of the parallel processor 20 increments the address by a fixed length DISP, so that the instruction words can be fetched from the memory 12 in order. When the cutting unit 316 in the instruction fetch unit 46 fetches the instruction word of the upper row of
Based on the instruction word delimiting fields 0 and 1 contained in the instruction words supplied from the cutting unit 316, the instruction issue unit 72 recognizes each basic instruction EI and selectively issues it to one of the instruction execution units EU0 and EU1 via the selectors 355 and 356. For example, if the basic instruction EI following an instruction word delimiting field 0 is issued to the instruction execution unit EU0, the basic instruction EI following the instruction word delimiting field 1 is issued to the instruction execution unit EU1. The selectors 355 and 356 are controlled by the control unit 370. When the execution of one instruction word is completed, the basic instructions EI of the next instruction word are supplied to the instruction execution units EU0 and EU1 via the selectors 355 and 356.
Likewise, when the instruction fetch unit 46 fetches an instruction word containing only one basic instruction and supplies it to the instruction buffer 308, the cutting unit 316 cuts the basic instruction EI that follows the instruction word delimiting field 1 from the rest of the instruction word. The basic instruction EI is then issued from the instruction register 347 to one of the instruction execution units EU0 and EU1.
The instruction word delimiting fields 0 and 1 are each represented by one bit, but any data may be written in those fields as long as it serves to delimit the instruction words. In this example, two instruction execution units EU0 and EU1 having the same structure are employed, but three or more instruction execution units may also be employed.
As described so far, in the parallel processor 20 of this example, the instruction fetch unit 46 fetches instruction words one by one in accordance with the instruction word delimiting fields 0 and 1, so that the length of each of the instruction words can be made variable. The instruction issue unit 72 then issues each basic instruction EI to a corresponding one of the instruction execution units EU0 and EU1. Accordingly, no do-nothing instructions NOP need to be included in any instruction word, and the basic instructions EI can be packed efficiently into each instruction word. As a result, the parallel processing performance of the parallel processor can be improved.
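The fetch-and-issue behavior of Example 1 can be sketched behaviorally as follows. The sketch assumes that each basic instruction in memory is paired with a 1-bit delimiting field and that a field value of 1 marks the last basic instruction of an instruction word, which is how the description above reads (a one-instruction word follows field 1; in a two-instruction word, the first follows field 0 and the second follows field 1). The instruction mnemonics and the rule that the k-th basic instruction goes to EUk are illustrative assumptions.

```python
from collections.abc import Iterable, Iterator

NUM_UNITS = 2  # EU0 and EU1, as in Example 1


def fetch_words(stream: Iterable[tuple[int, str]]) -> Iterator[list[str]]:
    """Group (delimiting_bit, basic_instruction) pairs into instruction words.

    Assumption: a delimiting bit of 1 marks the last basic instruction of a word.
    """
    word: list[str] = []
    for delim, ei in stream:
        word.append(ei)
        if delim == 1:
            yield word
            word = []
    if word:
        raise ValueError("stream ended in the middle of an instruction word")


def issue(word: list[str]) -> dict[str, str]:
    """Issue the k-th basic instruction of a word to execution unit EUk."""
    if len(word) > NUM_UNITS:
        raise ValueError("instruction word is wider than the number of units")
    return {f"EU{k}": ei for k, ei in enumerate(word)}


# Hypothetical memory contents: no do-nothing instructions are stored.
memory = [(0, "add r1,r2,r3"), (1, "sub r4,r5,r6"),  # two-instruction word
          (1, "ld r7,0(r8)"),                          # one-instruction word
          (0, "mul r1,r1,r1"), (1, "or r2,r2,r9")]     # two-instruction word

for w in fetch_words(memory):
    print(issue(w))
```

The one-instruction word occupies a single entry in memory and leaves EU1 idle for that step, instead of being padded with a do-nothing instruction.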
The judgment unit 103 compares a destination register number (write register number) defined in a basic instruction EI in execution with a source register number (read register number) defined in a basic instruction EI to be issued to one of the instruction execution units EU0 and EU1. If the destination register number coincides with the source register number, it is confirmed that there is data dependence between the two basic instructions EI. If the destination register number does not coincide with the source register number, it is confirmed that there is no data dependence between the two basic instructions EI, and the operation can proceed.
The judgment unit 103 also determines whether or not the basic instruction EI in execution contains a branch instruction, and whether or not it may start an irregular process such as a division by zero. If the basic instruction EI in execution contains a branch instruction or may start an irregular process, there is control dependence between the basic instruction EI in execution and the basic instruction EI to be issued to the instruction execution unit EU0 or EU1. If the basic instruction EI in execution neither contains a branch instruction nor may start an irregular process, it is confirmed that there is no control dependence between the two basic instructions EI.
Based on the contents of each basic instruction EI, the judgment unit 103 also compares the resource (the instruction execution units EU0 and EU1, for instance) required by the basic instruction EI in execution with the resource required by the basic instruction EI to be issued. If the resource required by the basic instruction EI in execution is the same as the resource required by the basic instruction EI to be issued, there is resource sharing between the two basic instructions EI. If the resources are different, it is confirmed that there is no resource sharing between the two basic instructions EI.
If the basic instruction EI to be issued has neither data dependency nor control dependency, and causes no resource sharing with the basic instruction EI being executed by the instruction execution units EU0 and EU1, the instruction issue unit 73 issues the basic instruction EI to a corresponding one of the instruction execution units EU0 and EU1 before the end of the execution. Here, the instruction issuance by the instruction issue unit 73 and the instruction execution by the instruction execution units EU0 and EU1 are processed by time-sharing parallel processing.
On the other hand, if the basic instruction EI to be issued has data dependency and/or control dependency, and/or causes resource sharing with the basic instruction EI being executed by the instruction execution units EU0 and EU1, the basic instruction EI is issued to a corresponding one of the instruction execution units EU0 and EU1 after the end of the execution.
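The three checks performed by the judgment unit 103 and the resulting issue decision can be summarized in a short sketch. The fields of the hypothetical `BasicInstruction` record are assumptions about what the judgment unit reads from each basic instruction EI; only the comparisons described above are modeled (a destination register of the running instruction read as a source by the new one, a running branch or possibly-trapping instruction, and a shared execution unit).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BasicInstruction:
    """Hypothetical fields the judgment unit is assumed to read from an EI."""
    unit: str                     # resource required, e.g. "EU0" or "EU1"
    dest: Optional[int] = None    # destination (write) register number
    srcs: tuple[int, ...] = ()    # source (read) register numbers
    is_branch: bool = False
    may_trap: bool = False        # may start an irregular process, e.g. division by 0


def data_dependent(running: BasicInstruction, new: BasicInstruction) -> bool:
    # The new instruction reads a register that the running one writes.
    return running.dest is not None and running.dest in new.srcs


def control_dependent(running: BasicInstruction) -> bool:
    # A running branch or possibly-trapping instruction blocks early issue.
    return running.is_branch or running.may_trap


def resource_shared(running: BasicInstruction, new: BasicInstruction) -> bool:
    return running.unit == new.unit


def can_issue_early(running: list[BasicInstruction], new: BasicInstruction) -> bool:
    """True if `new` may be issued before the running instructions finish."""
    return not any(
        data_dependent(r, new) or control_dependent(r) or resource_shared(r, new)
        for r in running
    )


running = [BasicInstruction("EU0", dest=1, srcs=(2, 3))]
print(can_issue_early(running, BasicInstruction("EU1", dest=4, srcs=(5, 6))))  # True
print(can_issue_early(running, BasicInstruction("EU1", dest=4, srcs=(1, 6))))  # False: reads r1
print(can_issue_early(running, BasicInstruction("EU0", dest=4, srcs=(5, 6))))  # False: same unit
```

When `can_issue_early` returns False, the instruction is held until the running instructions complete, which corresponds to issuing it after the end of the execution.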
Although the two instruction execution units EU0 and EU1 having the same structure are employed in this example, it is also possible to employ three or more instruction execution units.
As described so far, the parallel processor 21 of this example can have the same effects as the parallel processor 20 of Example 1, and efficiently and accurately performs the parallel processing of the basic instructions EI. Thus, more reliable operations can be achieved.
The instruction execution unit LU0 is a load store instruction execution unit that executes a load instruction and a store instruction. After the execution of these instructions, the instruction execution unit LU0 notifies the instruction issue unit (74 to 79) of the end of the execution. The instruction execution units IU0 and IU1 are integer arithmetic instruction execution units that execute integer arithmetic instructions. When the execution of the integer arithmetic instructions is completed, the instruction execution units IU0 and IU1 notify the instruction issue unit (74 to 79) of the end of the execution.
The instruction execution units FU0 and FU1 are floating-point arithmetic instruction execution units that execute floating-point arithmetic instructions. When the execution of the floating-point arithmetic instructions is completed, the instruction execution units FU0 and FU1 notify the instruction issue unit (74 to 79) of the end of the execution. The instruction execution unit BU0 is a branch instruction execution unit that executes a branch instruction. When the execution of the branch instruction is completed, the instruction execution unit BU0 notifies the instruction issue unit (74 to 79) of the end of the execution.
In the following examples, the maximum number of basic instructions contained in one instruction word is 2, but the same effects can be expected in a case where the maximum number is 3 or greater.
More specifically, the parallel processor of the present invention is embodied on a printed board or in an LSI circuit, in which the components are arranged on a two-dimensional surface and connected by wires. Some of these wires may cross each other. However, a printed board and an LSI circuit have a plurality of wiring layers, so that any two wires that would cross each other can be placed on two different wiring layers. Logically, therefore, wires can be placed in any desired arrangement. In view of the operation speed of the circuit, however, such alternate wiring (placing wires on different wiring layers) requires longer wires, which decreases the operation speed. It is therefore preferable to minimize alternate wiring. Shorter wires facilitate the issuance of basic instructions by the instruction issue unit 74 and increase the operation speed.
For simplification of the drawing, only two instruction passages from an instruction register 348 to the two instruction execution units LU0 and IU0 are shown in
The parallel processor 22 of this example operates in the following manner. First, the cutting unit 317 of the instruction fetch unit 48 fetches instruction words one by one. The formats 13 of the instruction words to be supplied to the instruction fetch unit 48 are shown in
An interface 15 for the instruction execution units LU0, IU0, IU1, FU0, FU1, and BU0 includes effective bits V, information II required for executing an integer arithmetic instruction, information FI required for executing a floating-point arithmetic instruction, information LI required for executing a load store instruction, and information BI required for executing a branch instruction. The interface 15 supplies the effective bit V and the information LI from the instruction issue unit 74 to the instruction execution unit LU0, the effective bit V and the information II to the instruction execution units IU0 and IU1, the effective bit V and the information FI to the instruction execution units FU0 and FU1, and the effective bit V and the information BI to the instruction execution unit BU0.
When the effective bit V is 0, no basic instruction is issued, and when the effective bit V is 1, a basic instruction is issued. Each effective bit V is coupled with the information II, FI, LI, or BI, and is then allocated to the corresponding instruction execution unit.
As shown in
As a result, the instruction execution unit FU0 executes the floating-point arithmetic instruction FI, and the instruction execution unit BU0 executes the branch instruction BI. In this case, no basic instructions are executed by the other instruction execution units LU0, IU0, IU1, and FU1.
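The net effect of the rearrangement on interface 15 can be sketched behaviorally: each instruction execution unit receives an (effective bit V, information) pair, and the k-th basic instruction of a given class in a word goes to the k-th unit of that class. This mapping rule is an assumption inferred from the walkthroughs in this section; the sketch models behavior only and does not reproduce the detector, AND, exclusive OR, and selector circuitry.

```python
from typing import Optional

# Units in the order listed for interface 15.
ALL_UNITS = ["LU0", "IU0", "IU1", "FU0", "FU1", "BU0"]

# Which units can execute each basic-instruction class.
UNITS_BY_CLASS = {
    "LI": ["LU0"],
    "II": ["IU0", "IU1"],
    "FI": ["FU0", "FU1"],
    "BI": ["BU0"],
}


def convert(word: list[tuple[str, str]]) -> dict[str, tuple[int, Optional[str]]]:
    """Return {unit: (V, info)}; unused units get (0, None)."""
    out: dict[str, tuple[int, Optional[str]]] = {u: (0, None) for u in ALL_UNITS}
    used = dict.fromkeys(UNITS_BY_CLASS, 0)
    for cls, info in word:
        units = UNITS_BY_CLASS[cls]
        k = used[cls]
        if k >= len(units):
            raise ValueError(f"more {cls} instructions than {cls} units")
        out[units[k]] = (1, info)  # k-th instruction of a class -> k-th unit
        used[cls] = k + 1
    return out


# The word described above: a branch instruction followed by a floating-point one.
issued = convert([("BI", "branch"), ("FI", "fadd")])
for unit in ALL_UNITS:
    print(unit, issued[unit])
```

For this word only FU0 and BU0 receive V = 1, matching the result described above; the other four units receive V = 0.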
The transmission line L1 transmits the first basic instruction contained in each instruction word, and the transmission line L2 transmits the second basic instruction contained in each instruction word. The BI detector BD1 is connected to the transmission line L1, and the BI detector BD2 is connected to the transmission line L2. The buffer 155 is connected to the BI detector BD1, and the AND gate 163 is connected to the BI detectors BD1 and BD2. The selector 209 is connected to the transmission lines L1 and L2, the buffer 155, and the AND gate 163. The OR gate 199 is connected to the buffer 155 and the AND gate 163.
The FI detector FD1 is connected to the transmission line L1, and the FI detector FD2 is connected to the transmission line L2. The buffer 156 is connected to the FI detector FD1, and the AND gate 164 is connected to the FI detectors FD1 and FD2. The two input terminals of the exclusive OR gate 187 are connected to the input node and the output node, respectively, of the buffer 156. The two input terminals of the exclusive OR gate 188 are connected to the output node of the AND gate 164 and the FI detector FD2, respectively. The AND gate 185 is connected to the two exclusive OR gates 187 and 188. The selector 210 is connected to the transmission lines L1 and L2, the buffer 156, and the AND gate 164. The OR gate 200 is connected to the buffer 156 and the AND gate 164.
The II detector ID1 is connected to the transmission line L1, and the II detector ID2 is connected to the transmission line L2. The buffer 157 is connected to the II detector ID1, and the AND gate 165 is connected to the II detectors ID1 and ID2. The two input terminals of the exclusive OR gate 189 are connected to the input node and the output node, respectively, of the buffer 157. The two input terminals of the exclusive OR gate 190 are connected to the output node of the AND gate 165 and the II detector ID2, respectively. The AND gate 186 is connected to the two exclusive OR gates 189 and 190. The selector 211 is connected to the transmission lines L1 and L2, the buffer 157, and the AND gate 165. The OR gate 201 is connected to the buffer 157 and the AND gate 165.
The LI detector LD1 is connected to the transmission line L1, and the LI detector LD2 is connected to the transmission line L2. The buffer 158 is connected to the LI detector LD1, and the AND gate 166 is connected to the LI detectors LD1 and LD2. The selector 212 is connected to the transmission lines L1 and L2, the buffer 158, and the AND gate 166. The OR gate 202 is connected to the buffer 158 and the AND gate 166.
The two BI detectors BD1 and BD2 constitute a BI detector block 147. The two FI detectors FD1 and FD2 constitute an FI detector block 149. The two II detectors ID1 and ID2 constitute an II detector block 151. The two LI detectors LD1 and LD2 constitute an LI detector block 153.
In the following, an operation of the conversion unit 115 having the above structure will be described by way of an example case where the instruction word including the basic instructions BI and FI on the uppermost row of the instruction word formats 13 shown in
Next, the second basic instruction FI in the instruction word is transmitted through the transmission line L2. As in the case of the first basic instruction BI, the FI detector FD2 detects the second basic instruction FI and supplies a detection signal of logic 1 to the AND gate 164. The AND gate 164 in turn outputs a logic 1 signal. In accordance with the logic 1 signal supplied from the AND gate 164, the selector 210 selects the second basic instruction FI and outputs the second basic instruction FI as an instruction to be executed by the instruction execution unit FU0. At the same time as the output of the basic instruction FI, the OR gate 200 outputs the effective bit V of logic 1 in accordance with the detection signal supplied from the AND gate 164.
As the second basic instruction FI is detected, the BI detector BD2, the II detector ID2, and the LI detector LD2 output non-detection signals of logic 0. Accordingly, the selectors 209, 211, and 212 do not select the second basic instruction transmitted through the transmission line L2. Since neither the first nor the second basic instruction is to be executed by the instruction execution units LU0, IU0, IU1, or FU1, the effective bit V of logic 0 is output from each of the OR gates 201 and 202 and the AND gates 185 and 186.
In the above described manner, the conversion unit 115 converts the instruction word formats 13 into the instruction word formats 17, as shown in
The conversion unit 115 further includes buffers 159 to 162, AND gates 167 to 184, exclusive OR gates 191 to 198, OR gates 203 to 208, and selectors 213 to 218. The four BI detectors BD1 to BD4 constitute a BI detector block 148. The four FI detectors FD1 to FD4 constitute an FI detector block 150. The four II detectors ID1 to ID4 constitute an II detector block 152. The four LI detectors LD1 to LD4 constitute an LI detector block 154.
The conversion unit 115 having the above structure operates in the same manner as the conversion unit 115 shown in
Next, the second basic instruction FI is transmitted on the transmission line L2. The FI detector FD2 then detects the second basic instruction FI and supplies a detection signal of logic 1 to the AND gate 170. The AND gate 170 in turn outputs a logic 1 signal. In accordance with the logic 1 signal supplied from the AND gate 170, the selector 214 selects the second basic instruction FI and outputs the second basic instruction FI as an instruction to be executed by the instruction execution unit FU0. At the same time as the output of the second basic instruction FI, the OR gate 204 outputs the effective bit V of logic 1 in accordance with the detection signal supplied from the AND gate 170.
As the second basic instruction FI is detected, the BI detector BD2, the II detector ID2, and the LI detector LD2 each output a non-detection signal of logic 0. Accordingly, the selectors 213, 216, and 218 do not select the second basic instruction FI transmitted through the transmission line L2.
Next, the third basic instruction FI is transmitted through the transmission line L3. The FI detector FD3 then detects the third basic instruction FI and supplies a detection signal of logic level 1 to the AND gate 171. Since the AND gate 171 has already received a detection signal of logic 1 from the FI detector FD2 at this point, the output of the AND gate 171 is a logic 0 signal. Because of that, the exclusive OR gate 193 outputs a logic 1 signal, and the AND gate 174 also outputs a logic 1 signal. In accordance with the logic 1 signal supplied from the AND gate 174, the selector 215 selects the third basic instruction FI and outputs the third basic instruction FI as an instruction to be executed by the instruction execution unit FU1. At the same time as the output of the third basic instruction FI, the OR gate 205 outputs the effective bit V of logic 1 in accordance with the signal supplied from the AND gate 174.
As the third basic instruction FI is detected, the BI detector BD3, the II detector ID3, and the LI detector LD3 each output a non-detection signal of logic 0. Accordingly, the selectors 213, 216, and 218 do not select the third basic instruction FI transmitted through the transmission line L3.
Next, the fourth basic instruction II of the instruction word is transmitted through the transmission line L4. The II detector ID4 then detects the fourth basic instruction II and supplies a detection signal of logic 1 to the AND gate 178. The AND gate 178 in turn outputs a logic 1 signal. In accordance with the logic 1 signal supplied from the AND gate 178, the selector 216 selects the fourth basic instruction II and outputs the fourth basic instruction II as an instruction to be executed by the instruction execution unit IU0. At the same time as the output of the fourth basic instruction II, the OR gate 206 outputs the effective bit V of logic 1 in accordance with the signal supplied from the AND gate 178.
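The selection logic in this four-line walkthrough can be expressed as two per-line select signals for each instruction class: a line drives the first unit of a class (FU0, IU0) only if no earlier line detected that class, and drives the second unit (FU1, IU1) only if exactly one earlier line did. This is how the walkthrough reads (the AND gate 171 outputs logic 0 because the FI detector FD2 has already fired, so the third FI goes to FU1), but the sketch is a behavioral equivalent and not the exact gate arrangement of the conversion unit 115. The first basic instruction of the word is not identified in the surviving text, so the sketch assumes it is neither FI nor II.

```python
from typing import Optional


def select_signals(detect: list[bool]) -> tuple[list[bool], list[bool]]:
    """Per-line select signals for the first and second unit of one class.

    detect[j] is the detector output for transmission line L(j+1).
    """
    first: list[bool] = []
    second: list[bool] = []
    earlier = 0  # number of earlier lines whose detector fired
    for d in detect:
        first.append(d and earlier == 0)   # first occurrence of the class
        second.append(d and earlier == 1)  # second occurrence of the class
        earlier += int(d)
    return first, second


def selector(lines: list[str], select: list[bool]) -> tuple[int, Optional[str]]:
    """Pass the line whose select signal is 1, together with effective bit V."""
    for ei, s in zip(lines, select):
        if s:
            return 1, ei
    return 0, None


lines = ["EI1", "FI-a", "FI-b", "II-a"]  # EI1: assumed neither FI nor II
fi_first, fi_second = select_signals([False, True, True, False])
ii_first, _ = select_signals([False, False, False, True])

print("FU0", selector(lines, fi_first))   # FU0 (1, 'FI-a')
print("FU1", selector(lines, fi_second))  # FU1 (1, 'FI-b')
print("IU0", selector(lines, ii_first))   # IU0 (1, 'II-a')
```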
As described above, in the parallel processor of this example, basic instructions contained in each instruction word supplied to the instruction fetch unit 48 are rearranged in accordance with the arrangement of the instruction execution units, so that the instruction issue unit 74 can smoothly issue the basic instructions to the respective instruction execution units. Thus, the entire operation speed can be increased.
In this example, the instruction fetch unit 48 can also fetch instruction words whose basic instructions have already been arranged in advance in accordance with the arrangement of the instruction execution units. In such a case, because the basic instructions are arranged in advance, the circuit required for rearranging the basic instructions in the instruction fetch unit 48 can be made smaller.
More specifically, when there are two instructions for the same function, only one of the two instructions is employed. For instance, the instruction word on the uppermost row and the instruction word on the fourth row from the top of the formats 13 in
As described so far, the circuit size of the parallel processor 22 can be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 48.
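Why the restriction helps can be illustrated by counting instruction word formats. The sketch below enumerates every word of one or two basic instructions that fits the six execution units (one LU, two IU, two FU, one BU), then keeps only one ordering for each combination of instruction classes, using a hypothetical canonical order. The counts apply only to these assumptions and are not taken from the formats 13.

```python
from itertools import product

# Units available per class: LU0; IU0, IU1; FU0, FU1; BU0.
CAPACITY = {"LI": 1, "II": 2, "FI": 2, "BI": 1}
ORDER = ["LI", "II", "FI", "BI"]  # hypothetical canonical order of classes
MAX_WIDTH = 2                     # at most two basic instructions per word


def fits(word: tuple[str, ...]) -> bool:
    """A word fits if no class needs more units than exist."""
    return all(word.count(c) <= CAPACITY[c] for c in CAPACITY)


def all_formats():
    for n in range(1, MAX_WIDTH + 1):
        for word in product(ORDER, repeat=n):
            if fits(word):
                yield word


unrestricted = set(all_formats())
# Keep one ordering per combination: sort each word into the canonical order.
restricted = {tuple(sorted(w, key=ORDER.index)) for w in unrestricted}
print(len(unrestricted), len(restricted))  # prints: 18 12
```

Under these assumptions, allowing only one ordering per combination reduces the formats to be recognized from 18 to 12, and every remaining word already lists its instructions in unit order.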
With the parallel processor of this example, basic instructions contained in each instruction word supplied from the instruction register 349 are rearranged by the conversion unit 116 in accordance with the arrangement of the instruction execution units. The rearranged basic instructions are then issued to the corresponding instruction execution units. Thus, the wires can be shortened as a whole, and the operation speed can be increased.
Also, the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 49 can be restricted in advance in the same manner as in Example 1. Thus, the circuit size of the parallel processor 23 can be reduced.
The first conversion unit 117 performs “preprocessing” of the rearrangement of basic instructions. The second conversion unit 118 performs “postprocessing” of the rearrangement of basic instructions.
In an actual circuit, the processes performed by the instruction fetch unit 50 and the instruction issue unit 76 are pipelined so as to improve the performance of the parallel processor. For this reason, the difference in processing time between the instruction fetch unit 50 and the instruction issue unit 76 should be as small as possible to maximize the benefit of pipelining. Therefore, the rearrangement process is divided into “preprocessing” and “postprocessing”, so that the difference in processing time between the instruction fetch unit 50 and the instruction issue unit 76 is small.
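The benefit of splitting the rearrangement can be seen from the rule that a pipeline's clock period is set by its slowest stage. The delays below are hypothetical values in arbitrary units, and the even split into preprocessing and postprocessing is an assumption chosen for illustration.

```python
def cycle_time(stage_delays: list[int]) -> int:
    """A pipeline's clock period is set by its slowest stage."""
    return max(stage_delays)


# Hypothetical delays in arbitrary time units.
FETCH, ISSUE = 5, 5   # fetch and issue work, excluding rearrangement
REARRANGE = 4         # full rearrangement of basic instructions
PRE, POST = 2, 2      # assumed even split into "preprocessing" and "postprocessing"

print(cycle_time([FETCH + REARRANGE, ISSUE]))   # prints 9 (all in the fetch stage)
print(cycle_time([FETCH, ISSUE + REARRANGE]))   # prints 9 (all in the issue stage)
print(cycle_time([FETCH + PRE, ISSUE + POST]))  # prints 7 (split across both stages)
```

With the work split, both stages take the same time, so neither stage waits on the other and the clock period is shorter.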
More specifically, the first conversion unit 117 includes circuits that are the counterparts of the BI detector block 147 or 148, the FI detector block 149 or 150, the II detector block 151 or 152, and the LI detector block 153 or 154 shown in
With the parallel processor 24 having the above structure, the wires can be shortened as a whole, and the operation speed can be increased.
Also, as in Examples 1 and 2, the circuit size of the parallel processor 24 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 50.
For simplification of the drawing, only the instruction passages from an instruction register 351 to the two instruction execution units LU0 and IU0 are shown, and the other instruction passages to the instruction execution units IU1, FU0, FU1, and BU0 are omitted in
The structure and operation of the conversion unit 119 are substantially the same as the structure and operation of the conversion unit 115 shown in
The parallel processor of this example having the above structure achieves the same effects as the parallel processor of Example 2 of the first embodiment and the parallel processor of Example 1 of the second embodiment. In the parallel processor of this example, the instruction issue unit 77, which includes the judgment unit 104, enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the parallel processor. Also, the instruction fetch unit 51, which includes the conversion unit 119, facilitates the basic instruction issuance to the instruction execution units by the instruction issue unit 77, thereby increasing the operation speed.
As in the foregoing examples, the circuit size of the parallel processor 25 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 51.
For simplification of the drawing, only the instruction passages from the instruction register 352 to the two instruction execution units LU0 and IU0 are shown, and the instruction passages to the other instruction execution units are omitted in
The structure and operation of the conversion unit 120 are the same as the structure and operation of the conversion unit 115 shown in
The parallel processor of this example having the above structure achieves the same effects as the parallel processor of Example 4. The instruction issue unit 78 including the judgment unit 105 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. Also, the instruction issue unit 78, which further includes the conversion unit 120, facilitates the issuance of basic instructions to the instruction execution units.
Additionally, the circuit size of the parallel processor 26 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 52, as in the foregoing examples.
For simplification of the drawing, only the instruction passages from the instruction register 353 to the two instruction execution units LU0 and IU0 are shown, and the instruction passages to the other instruction execution units IU1, FU0, FU1, and BU0 are omitted in
The structures and operations of a first conversion unit 121 and a second conversion unit 122 are the same as the structures and operations of the first conversion unit 117 and the second conversion unit 118. The structure and operation of the judgment unit 106 are the same as the structure and operation of the judgment unit 103 shown in
The parallel processor 27 of this example having the above structure can achieve both effects of the parallel processor of Example 2 of the first embodiment and the parallel processor of Example 3 of the second embodiment. More specifically, the instruction issue unit 79 including the judgment unit 106 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. Also, the instruction fetch unit 53 including the first conversion unit 121 and the instruction issue unit 79 including the second conversion unit 122 facilitate the issuance of basic instructions from the instruction issue unit 79 to the instruction execution units.
Additionally, the circuit size of the parallel processor 27 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 53, as in the foregoing examples.
As shown in
In the following, the parallel processors in accordance with the third embodiment of the present invention will be described by way of a case where the maximum basic instruction word length contained in one instruction word is 2. It should be understood that the same effects can be obtained in a case where the maximum basic instruction word length contained in one instruction word is 3 or more.
The parallel processor 28 having the above structure can achieve the same effects as the parallel processor 22 of Example 1 of the second embodiment. In other words, the issuance of basic instructions from the instruction issue unit 80 to the instruction execution units can be facilitated, and the operation speed can be increased.
Additionally, the circuit size of the parallel processor 28 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction execution units, as in the foregoing examples.
In the parallel processor 29 of this example, the instruction issue unit 81 issues each basic instruction to the corresponding one of the instruction execution units, only after the conversion unit 124 rearranges the basic instructions, which are contained in each instruction word supplied from the instruction fetch unit 55, in accordance with the arrangement of the instruction execution units. Thus, wires can be shortened as a whole, and the operation speed can be increased.
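The rearrangement performed by a conversion unit can be sketched as routing each basic instruction to a lane that matches the physical order of the execution units. This is a minimal sketch under stated assumptions. The lane order uses the unit names from the description. The instruction classes, the first-free-unit policy, and the `rearrange` helper are hypothetical.

```python
# Hypothetical lane order (physical order of the execution units) and unit classes.
UNIT_ORDER = ["LU0", "IU0", "IU1", "FU0", "FU1", "BU0"]
UNIT_CLASS = {"LU0": "load", "IU0": "int", "IU1": "int",
              "FU0": "float", "FU1": "float", "BU0": "branch"}


def rearrange(word):
    """Reorder basic instructions so that position i holds the instruction for
    UNIT_ORDER[i], or None if that unit receives nothing in this cycle.

    word: list of (class, text) pairs in instruction-word order.
    """
    lanes = [None] * len(UNIT_ORDER)
    for cls, text in word:
        for i, unit in enumerate(UNIT_ORDER):
            if UNIT_CLASS[unit] == cls and lanes[i] is None:
                lanes[i] = text  # first idle unit of the matching class
                break
        else:
            raise ValueError(f"no free execution unit for {cls!r} instruction {text!r}")
    return lanes


print(rearrange([("float", "fadd"), ("int", "add"), ("int", "sub")]))
# [None, 'add', 'sub', 'fadd', None, None]
```

Once the word is laid out in unit order, lane i connects only to the unit at position i. This is why the wiring between the issue stage and the execution units can be kept short.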
Additionally, the circuit size of the parallel processor 29 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 55, as in the foregoing examples.
The first conversion unit 125 performs “preprocessing” of the rearrangement of basic instructions, and the second conversion unit 126 performs “postprocessing” of the rearrangement of the basic instructions.
In an actual circuit, the processes in the instruction fetch unit 56 and the instruction issue unit 82 are pipelined in order to improve the performance of the parallel processor. Because the slower of the two pipelined stages determines the cycle time, the difference in processing time between the instruction fetch unit 56 and the instruction issue unit 82 should be as small as possible to optimize the pipeline effects. Therefore, the rearrangement process is divided into the “preprocessing” and the “postprocessing”, so that the difference in processing time between the instruction fetch unit 56 and the instruction issue unit 82 is small.
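The balancing argument can be made concrete by choosing how many steps of the rearrangement to place in the fetch stage. The sketch below picks the split that minimizes the slower stage, which is what sets the pipeline period. The delay values are hypothetical, in arbitrary time units. Modeling the rearrangement as a sequence of steps with fixed delays is an assumption made for illustration.

```python
def best_split(fetch_delay, issue_delay, step_delays):
    """Return (k, period): put the first k rearrangement steps in the fetch stage
    and the rest in the issue stage so that the slower stage, which sets the
    pipeline clock period, is as fast as possible."""
    best = None
    for k in range(len(step_delays) + 1):
        fetch_stage = fetch_delay + sum(step_delays[:k])
        issue_stage = issue_delay + sum(step_delays[k:])
        period = max(fetch_stage, issue_stage)
        if best is None or period < best[1]:
            best = (k, period)
    return best


# Hypothetical delays: fetch 4, issue 6, three rearrangement steps of 2 each.
print(best_split(4, 6, [2, 2, 2]))  # (2, 8): both stages take 8
```

In this example, placing the whole rearrangement in the issue stage gives a period of 12, and placing it all in the fetch stage gives 10. Splitting it into preprocessing (two steps) and postprocessing (one step) equalizes the stages at 8.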
By the parallel processor of this example having the above structure, wires can be shortened as a whole, and the operation speed can be increased.
Additionally, the circuit size of the parallel processor 30 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 56, as in the foregoing examples.
The structure and operation of the conversion unit 127 are the same as the structure and operation of the conversion unit 115 shown in
By the parallel processor of this example having the above structure, the same effects as the parallel processor of Example 4 of the second embodiment can be obtained. More specifically, the instruction issue unit 83 including the judgment unit 107 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. The instruction fetch unit 57 including the conversion unit 127 facilitates the issuance of basic instructions to the instruction execution units, thereby increasing the operation speed.
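This passage does not state the exact rule the judgment unit applies. As a minimal sketch, assume the judgment unit checks whether every basic instruction in an instruction word can be given its own idle execution unit of the matching kind before the word is issued. Under that assumption, the check might look like the following. The class names, the `can_issue` helper, and the interpretation of LU/IU/FU/BU are hypothetical.

```python
from collections import Counter

# Hypothetical reading of the unit names in the description:
# LU = load unit, IU = integer unit, FU = floating-point unit, BU = branch unit.
UNITS_BY_CLASS = {
    "load": ["LU0"],
    "int": ["IU0", "IU1"],
    "float": ["FU0", "FU1"],
    "branch": ["BU0"],
}


def can_issue(word, busy):
    """Judge whether all basic instructions in one instruction word can be issued together.

    word: list of instruction-class names, one per basic instruction.
    busy: set of execution-unit names still executing earlier instructions.
    Returns True only if each class has at least as many idle units as instructions.
    """
    for cls, count in Counter(word).items():
        idle = [u for u in UNITS_BY_CLASS.get(cls, []) if u not in busy]
        if len(idle) < count:
            return False
    return True


print(can_issue(["int", "int"], busy=set()))    # True: IU0 and IU1 are both idle
print(can_issue(["int", "int"], busy={"IU1"}))  # False: only IU0 is idle
print(can_issue(["load", "load"], busy=set()))  # False: there is only one load unit
```

A word that fails the check would be held back, in the same way that the instruction issue unit waits for busy execution units to finish.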
Additionally, the circuit size of the parallel processor 31 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 57, as in the foregoing examples.
The structure and operation of the conversion unit 128 are the same as the structure and operation of the conversion unit 115 shown in
By the parallel processor of this example having the above structure, the same effects as obtained by the parallel processor 26 of Example 5 of the second embodiment can be obtained. More specifically, the instruction issue unit 84 including the judgment unit 108 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. The instruction issue unit 84 further including the conversion unit 128 facilitates the issuance of basic instructions to the instruction execution units, thereby increasing the operation speed.
Additionally, the circuit size of the parallel processor 32 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 58, as in the foregoing examples.
The structures and operations of a first conversion unit 129 and a second conversion unit 130 are the same as the structures and operations of the first conversion unit 117 and the second conversion unit 118 shown in
By the parallel processor of this example having the above structure, the same effects as obtained by the parallel processor 27 of Example 6 of the second embodiment can be obtained. More specifically, the instruction issue unit 85 including the judgment unit 109 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. The instruction fetch unit 59 including the first conversion unit 129 and the instruction issue unit 85 including the second conversion unit 130 facilitate the issuance of basic instructions from the instruction issue unit 85 to the instruction execution units, thereby increasing the operation speed.
Additionally, the circuit size of the parallel processor 33 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 59, as in the foregoing examples.
As shown in
In the following, the parallel processor in accordance with the fourth embodiment of the present invention will be described by way of examples in which the maximum basic instruction word length contained in each instruction word is 4. In
By the parallel processor 34 having the above structure, the same effects as obtained by the parallel processor 22 of Example 1 of the second embodiment can also be obtained. More specifically, the issuance of basic instructions from the instruction issue unit 86 to the instruction execution units can be facilitated, and the operation speed can be increased accordingly.
Additionally, the circuit size of the parallel processor 34 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 60, as in the foregoing embodiments.
In the parallel processor 35 of this example, the instruction issue unit 87 rearranges basic instructions contained in each instruction word supplied from the instruction fetch unit 61, in accordance with the arrangement of the instruction execution units, and then supplies the rearranged basic instructions to the instruction execution units. Thus, wires can be shortened as a whole, and the operation speed can be increased.
Additionally, the circuit size of the parallel processor 35 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 61, as in the foregoing examples.
The first conversion unit 133 performs “preprocessing” of the rearrangement of basic instructions, and the second conversion unit 134 performs “postprocessing” of the rearrangement of the basic instructions.
To improve the performance of the parallel processor in an actual circuit, the processes in the instruction fetch unit 62 and the instruction issue unit 88 are pipelined. Because of that, the difference in processing time between the instruction fetch unit 62 and the instruction issue unit 88 should be as small as possible to optimize the pipeline effects. Therefore, the rearrangement process is divided into the “preprocessing” and the “postprocessing”, so that the difference in processing time between the instruction fetch unit 62 and the instruction issue unit 88 is small.
By the parallel processor 36 of this example having the above structure, wires can be shortened as a whole, and the operation speed can be increased.
Additionally, the circuit size of the parallel processor 36 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 62, as in the foregoing examples.
The structure and operation of the conversion unit 135 are the same as the structure and operation of the conversion unit 115 shown in
By the parallel processor 37 of this example having the above structure, the same effects as obtained by the parallel processor 25 of Example 4 of the second embodiment can be obtained. More specifically, the instruction issue unit 89 including the judgment unit 110 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. The instruction fetch unit 63 including the conversion unit 135 facilitates the issuance of basic instructions from the instruction issue unit 89 to the instruction execution units, thereby increasing the operation speed.
Additionally, the circuit size of the parallel processor 37 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 63, as in the foregoing examples.
The structure and operation of the conversion unit 136 are the same as the structure and operation of the conversion unit 115 shown in
By the parallel processor of this example having the above structure, the same effects as obtained by the parallel processor 26 of Example 5 of the second embodiment can be obtained. More specifically, the instruction issue unit 90 including the judgment unit 111 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. The instruction issue unit 90 further including the conversion unit 136 facilitates the issuance of basic instructions to the instruction execution units, thereby increasing the operation speed.
Additionally, the circuit size of the parallel processor 38 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 64, as in the foregoing examples.
The structures and operations of a first conversion unit 137 and a second conversion unit 138 are the same as the structures and operations of the first conversion unit 117 and the second conversion unit 118 shown in
By the parallel processor 39 of this example having the above structure, the same effects as obtained by the parallel processor of Example 6 of the second embodiment can be obtained. More specifically, the instruction issue unit 91 including the judgment unit 112 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. The instruction fetch unit 65 including the first conversion unit 137 and the instruction issue unit 91 further including the second conversion unit 138 facilitate the issuance of basic instructions from the instruction issue unit 91 to the instruction execution units, thereby increasing the operation speed.
Additionally, the circuit size of the parallel processor 39 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 65, as in the foregoing examples.
As shown in
In the following, the parallel processor in accordance with the fifth embodiment of the present invention will be described by way of examples in which the maximum basic instruction word length contained in each instruction word is 4. In
It should be understood that the maximum basic instruction word length is not limited to 4 in this embodiment.
By the parallel processor 40 having the above structure, the same effects as obtained by the parallel processor 22 of Example 1 of the second embodiment can be obtained. More specifically, the issuance of basic instructions from the instruction issue unit 92 to the instruction execution units can be facilitated, and the operation speed can be increased accordingly.
Additionally, the circuit size of the parallel processor 40 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 66, as in the foregoing embodiments.
In the parallel processor 41 of this example, the instruction issue unit 93 rearranges basic instructions contained in each instruction word supplied from the instruction fetch unit 67, and then supplies the rearranged basic instructions to the instruction execution units. Thus, wires can be shortened as a whole, and the operation speed can be increased.
Additionally, the circuit size of the parallel processor 41 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 67, as in the foregoing examples.
The first conversion unit 141 performs “preprocessing” of the rearrangement of basic instructions, and the second conversion unit 142 performs “postprocessing” of the rearrangement of the basic instructions.
In order to improve the performance of the parallel processor in an actual circuit, the processes in the instruction fetch unit 68 and the instruction issue unit 94 are pipelined. Because of that, the difference in processing time between the instruction fetch unit 68 and the instruction issue unit 94 should be as small as possible to optimize the pipeline effects. Therefore, the rearrangement process is divided into the “preprocessing” and the “postprocessing”, so that the difference in processing time between the instruction fetch unit 68 and the instruction issue unit 94 can be small.
By the parallel processor 42 of this example having the above structure, wires can be shortened as a whole, and the operation speed can be increased.
Additionally, the circuit size of the parallel processor 42 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 68, as in the foregoing examples.
The structure and operation of the conversion unit 143 are the same as the structure and operation of the conversion unit 115 shown in
By the parallel processor 43 of this example having the above structure, the same effects as obtained by the parallel processor 25 of Example 4 of the second embodiment can be obtained. More specifically, the instruction issue unit 95 including the judgment unit 113 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. The instruction fetch unit 69 including the conversion unit 143 facilitates the issuance of basic instructions from the instruction issue unit 95 to the instruction execution units, thereby increasing the operation speed.
Additionally, the circuit size of the parallel processor 43 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 69, as in the foregoing examples.
The structure and operation of the conversion unit 144 are the same as the structure and operation of the conversion unit 115 shown in
By the parallel processor 44 of this example having the above structure, the same effects as obtained by the parallel processor 26 of Example 5 of the second embodiment can be obtained. More specifically, the instruction issue unit 96 including the judgment unit 114 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. The instruction issue unit 96 further including the conversion unit 144 facilitates the issuance of basic instructions to the instruction execution units, thereby increasing the operation speed.
Additionally, the circuit size of the parallel processor 44 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 70, as in the foregoing examples.
The structures and operations of the first conversion unit 145 and the second conversion unit 146 are the same as the structures and operations of the first conversion unit 117 and the second conversion unit 118 shown in
By the parallel processor 45 of this example having the above structure, the same effects as obtained by the parallel processor 27 of Example 6 of the second embodiment can be obtained. More specifically, the instruction issue unit 97 including the judgment unit 219 enables accurate and efficient parallel processing of basic instructions, thereby increasing the reliability of the operation. The instruction fetch unit 71 including the first conversion unit 145 and the instruction issue unit 97 including the second conversion unit 146 facilitate the issuance of basic instructions from the instruction issue unit 97 to the instruction execution units, thereby increasing the operation speed.
Additionally, the circuit size of the parallel processor 45 may be reduced by restricting in advance the arrangement of basic instructions contained in each instruction word to be supplied to the instruction fetch unit 71, as in the foregoing examples.
The present invention is not limited to the specifically disclosed embodiments, but variations and modifications may be made without departing from the scope of the present invention.
The present application is based on Japanese priority application No. 11-281957, filed on Oct. 1, 1999, the entire contents of which are hereby incorporated by reference.
Number | Date | Country | Kind |
---|---|---|---|
11-281957 | Oct 1999 | JP | national |