This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2004-193578, filed on Jun. 30, 2004, the entire contents of which are incorporated herein by reference.
1) Field of the Invention
The present invention relates to a reconfiguration-type processor that performs a reconfiguration control over an arithmetic and logic unit (ALU) module.
2) Description of the Related Art
A conventional technology that focuses on hardware to increase a computer's efficiency and speed is reconfigurable technology. Reconfigurable technology allows part of the hardware to be reconfigured to flexibly support an application (software program).
Such a hardware-reconfiguring technology using a field programmable gate array (FPGA) is disclosed (see, for example, Japanese National Phase PCT Laid-Open Publication No. 7-503804). A technology in which the performance of an application is measured and a module is dynamically reconfigured according to the measurement results is also disclosed (see, for example, Japanese Patent Laid-Open Publication No. 2002-163150).
Furthermore, a method is disclosed in which arrangement information (configuration information) of a reconfigurable portion is previously generated, and with a plurality of read-only-memories (ROMs) having stored therein the configuration information being provided, the configuration information is read according to a process to be performed for reconfiguring a module (see, for example, Japanese Patent Laid-Open Publication No. 5-108347).
When such a reconfigurable technique is applied to a hardware architecture of a cluster structure including configuration information, an arithmetic and logic unit (ALU) (unit performing an arithmetic process such as four arithmetic operations and a logical operation) module of a reconfigurable type has to be equipped in a cluster. In that case, the configuration information is also disposed in the same cluster, and is sequentially read according to the process results of the ALU. The cluster is structured by an ALU block formed of a reconfigurable ALU module, a network, a memory, a counter, etc., and a sequencer (SQE) for controlling configuration definitions of these ALU module, network, memory, and counter.
However, to execute various applications, a highly flexible ALU module of a reconfigurable type has to be equipped. When an ALU with highly flexible circuitry is equipped, the circuit area is increased and resource efficiency is decreased. Such an ALU module is a multifunctional ALU having many equipped functions, for example, one structured by arithmetic gates, such as those for AND, OR, addition and subtraction, an absolute-value operation, a normalizing process, multiplication, and zero decision, and a cumulative-sum operation circuit or the like for performing a cumulative-sum operation on the results of these arithmetic gates.
Also, to improve the process performance of the entire cluster, the internal structure of the sequencer is desired to be able to quickly reconfigure the ALU block in a simplified manner. That is, how the process of the sequencer responsible for controlling the configuration information required for reconfiguration is made efficient has an influence on the process performance of the cluster.
It is an object of the present invention to solve at least the above problems in the conventional technology.
A processor according to one aspect of the present invention executes a predetermined operation process by switching a connection structure between a plurality of arithmetic and logic unit modules. Each of the arithmetic and logic unit modules includes a plurality of arithmetic and logic units. The arithmetic and logic unit modules include a first arithmetic and logic unit module that includes a plurality of arithmetic and logic units that execute various operation processes; and a second arithmetic and logic unit module that includes a plurality of arithmetic and logic units of which executable operation processes are limited compared with the first arithmetic and logic unit module.
A processor according to another aspect of the present invention executes a predetermined arithmetic process by switching a connection structure between a plurality of arithmetic and logic unit modules under a control of a sequencer. Each of the arithmetic and logic unit modules has a plurality of arithmetic and logic units. The sequencer reconfigures the connection structure at an occasion of writing to a memory provided in the arithmetic and logic unit modules.
The other objects, features, and advantages of the present invention are specifically set forth in or will become apparent from the following detailed description of the invention when read in conjunction with the accompanying drawings.
Exemplary embodiments of a processor according to the present invention are explained in detail with reference to the accompanying drawings. A cluster is configured by two units, an ALU block and a sequencer unit.
The ALU block 101 includes a plurality of ALU modules 103 structured by various arithmetic elements, a plurality of memories 104 that read data to be processed and store processed data, a plurality of counters 105 that generate an address of each of the memories 104, a single comparator 106 that compares two input signals (condition decision), a bus bridge 107 connected to a reduced instruction set computing (RISC) bus 121, and a network 108. The counter 105 may generate an address to any of the memories 104 according to the arithmetic results of the ALU modules 103. The comparator 106 outputs a decision result (result of comparison) to the sequencer unit 102. Each memory outputs Write Ack to the sequencer unit 102.
The network 108 is supplied with a plurality of signals (Inputs A to n), and the arithmetic results from the ALU modules 103 and others are output as a plurality of signals (Outputs A to n). This network 108 includes each of the ALU modules 103, the comparator 106, registers 109 respectively provided to input units of the signals to the memories 104, and selectors 110.
Then, based on the configuration information output from the sequencer unit 102 according to the arithmetic details and the like, a connection pattern among a combination (selection) of the ALU modules 103, the memories 104, and the comparator 106 is reconfigurable. A change in this connection pattern can be made by the selectors 110 provided to the network 108.
The ALU modules 103 provided in the ALU block 101 include high-performance ALU modules and simplified ALU modules. For example, of the 17 bits of the bus used as input data to the ALU modules, 16 bits are data bits and the remaining one bit is a bit indicative of validity or invalidity (hereinafter referred to as a “Token bit”). Here, the network 108 with this 17-bit bus switches the connections among the ALU modules 103, the comparator 106, and the memories 104.
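The 17-bit bus word described above can be illustrated with a minimal Python sketch. The text only states that 16 bits carry data and one bit is the Token bit; placing the Token bit at bit position 16 and the names `pack_word` and `unpack_word` are assumptions made for illustration.

```python
from typing import Tuple

DATA_BITS = 16
DATA_MASK = (1 << DATA_BITS) - 1   # lower 16 bits carry the data
TOKEN_BIT = 1 << DATA_BITS         # assumption: the Token bit is bit 16


def pack_word(data: int, valid: bool) -> int:
    """Combine 16 data bits and a Token bit into one 17-bit bus word."""
    return (data & DATA_MASK) | (TOKEN_BIT if valid else 0)


def unpack_word(word: int) -> Tuple[int, bool]:
    """Split a 17-bit bus word into (data, valid)."""
    return word & DATA_MASK, bool(word & TOKEN_BIT)
```

For example, `pack_word(0x1234, True)` sets bit 16 on top of the data value, and `unpack_word` recovers both parts.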
An AND-OR arithmetic circuit 210 including the ALU_C 202 and the ALU_D 203 is a circuit for cumulative sum of the arithmetic results at the ALU_A 201 and others, and can be applied to an AND-OR operation often used in a media-related process, such as Fourier transformation.
Each arithmetic gate includes an AND gate 301 that performs an AND operation on two pieces of input data (Input_A, _B), an OR gate 302 that performs an OR operation, an ADD/SUB gate 303 that performs addition or subtraction under the control of the config decoder 308, an ABS gate 304 that performs an absolute-value operation, a primary encoder (Pri_Encoder) 305 that performs a normalizing process, a MUL gate 306 that performs multiplication, and a Zero gate 307 that performs zero decision. A selector (SEL) 309 selects any one of outputs from these arithmetic gates 301 through 306 under the control of the config decoder 308. When supplied with only either one of two pieces of data (Input_A, _B), the ALU_A 201 can pass this data.
The ALU_202 shown in
Based on the configuration information, the ALU_A 201, the ALU_C 202, and the ALU_D 203 can each be set to perform an operation on input data with or without code. In addition, application of a saturation operation can also be set with the configuration information.
In the simplified ALU module, multifunctional functions included in the high-performance ALU module 200 are simplified to reduce the circuit size.
The simplified ALU module 400 is not provided with the AND-OR arithmetic circuit 210 included in the high-performance ALU module 200 (see
The subtracter 501 in the comparator 106 outputs Carry indicative of under-flow and Zero_flag indicating that the subtraction result is zero. Carry and Zero_flag output from the comparator 106 are equivalent to the decision result (result of comparison, see
The ALUs (201, 202, 203, and 401) provided inside the ALU modules 200 and 400 and the comparator 106 each receive, with every input, a token bit indicative of validity or invalidity of the relevant input. While performing an operation on the input data and outputting the operation result, the ALU also has to indicate validity or invalidity of the operation result. Therefore, the ALU generates and adds a token bit. The logic for generating a token bit is any one of the following schemes (1) to (3).
(1) When both of two inputs have a valid token, a valid token is added to each of their operation results for output.
(2) When either one of two inputs has a valid token, a valid token is added to its operation result for output.
(3) Either of the two inputs in the above (1) or (2) is to be fixedly monitored. Such fixation can be set at the time of designing and kept as it is, or can be changed by configuration setting. Based on the data with the token bit added in the above manner, data writing to the memories 104 is controlled.
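The three token-generation schemes can be sketched as follows. This is one reading of the text, not a definitive implementation: scheme (3) is interpreted here as monitoring the token of a single fixed input, and the names `TokenMode` and `output_token` are hypothetical.

```python
from enum import Enum


class TokenMode(Enum):
    BOTH = 1      # scheme (1): both inputs must carry a valid token
    EITHER = 2    # scheme (2): one valid input token is enough
    FIXED_A = 3   # scheme (3), assumed reading: only Input_A is monitored
    FIXED_B = 4   # scheme (3), assumed reading: only Input_B is monitored


def output_token(token_a: bool, token_b: bool, mode: TokenMode) -> bool:
    """Return the token bit attached to the operation result."""
    if mode is TokenMode.BOTH:
        return token_a and token_b
    if mode is TokenMode.EITHER:
        return token_a or token_b
    if mode is TokenMode.FIXED_A:
        return token_a
    if mode is TokenMode.FIXED_B:
        return token_b
    raise ValueError("unknown token mode")
```

A memory write would then be permitted only when the result's token is valid, which matches the statement that data writing to the memories 104 is controlled by the token bit.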
Here, as for the token bit, when the data to be processed is stored in any one of the memories 104, the counter 105 that generates a read address for that memory 104 adds a token bit to the address information. In the memory 104, only an address with a valid token bit is read, and a valid token bit is then added to the read data. Also, in the case of the structure where the data to be processed is passed between the clusters 100, when data is externally supplied to one cluster 100 from another cluster 100, a token bit is added by the other cluster 100 for input.
Each of the ALU modules (the ALU_A 201, the ALU_B 401, the ALU_C 202, the ALU_D 203, and the subtracter 501) described above can change its internal structure and functions based on the configuration information from the sequencer unit 102. With this configuration information, it is possible in each module to perform designation of an operation with code, designation of a saturation operation (designation of a halt in arithmetic process), designation of an arithmetic process in the ALU_A 201, the ALU_B 401, the ALU_C 202, and the ALU_D 203, and designation of a subtraction direction (A-B or B-A) for the subtracter 501. It is also possible in each of the selectors 206, 309, and 402 to perform designation of output selection.
In the internal structure of the ALU block 101 according to the first embodiment described by using
Particularly, with the single comparator 106, the decision result at the comparator 106 is reported to the sequencer unit 102, and the time of reporting can be taken as an occasion for switching the configuration. At the time of a condition-branch process (for example, an IF statement in the C language) often used in various applications (computer programs), the sequencer unit 102 reconfigures the connection structure of the ALU modules 103, the memories 104, and the comparator 106 inside the ALU block 101 according to the decision result obtained by using the comparator 106. At this time, the ALU modules 103 can perform an arithmetic operation mostly with the simplified ALU modules 400, without requiring that all ten modules exemplified above be high-performance ALU modules 200 having a cumulative-sum function. With this, even without using the high-performance ALU modules 200, the ALU connection structure can be changed according to an arithmetic operation required for the relevant application, thereby performing an efficient arithmetic process.
According to the first embodiment, the high-performance ALU modules, the simplified ALU modules, and the comparator are disposed inside the ALU block, and in combination of these, reconfiguration can be achieved. With this, a cluster structure capable of flexibly supporting various applications and improving resource efficiency can be obtained. Also, the ALU modules are configured not only solely by the high-performance ALU modules, but also partially by the simplified ALU modules. Thus, with an arithmetic process being made more efficient, improvement in area efficiency, power saving, and low cost can be achieved. Also, the arithmetic processing speed itself can be improved.
The timing (occasion) of reconfiguring the processor executed by the sequencer unit 102 described in the first embodiment (see
The sequencer unit 102 includes a configuration memory 601 storing a plurality of pieces of configuration (structure of the ALU block 101) information (Configuration #0 through n), a launch register 602 that controls a launch from an external CPU (not shown), a start-address generator 603 that designates a first piece of configuration information (any one of Configuration #0 through n) for the cluster, a configuration controller 604 that determines the next configuration information based on the state and designates the next address (Next Address) subsequent to the relevant configuration information stored in the configuration memory 601, and a bus bridge 605 provided with respect to the CPU.
The configuration memory 601 includes an A port with respect to the bus bridge 605 and a B port with respect to the start-address generator 603 and the configuration controller 604. The start-address generator 603 designates via the B port a start address to be read. From the B port to the ALU block 101 and the configuration controller 604, configuration information for hardware configuration (ALU-block hardware configuration 610, which will be described further below) is output. The configuration controller 604 manages the address read from the configuration memory 601 and, at the time of reconfiguration, designates the next address subsequent to that of the configuration information via the B port of the memory 601.
The start-address generator 603 is supplied with a start address and a launch trigger. The configuration controller 604 is supplied with Write Ack from the relevant memory 104 and the decision result (Compare Result (Carry and Zero_flag)) from the comparator 106. The configuration controller 604 outputs an interrupt (Interrupt) to the CPU.
There are two occasions for reconfiguring the function of the ALU block 101, that is, 1. when a sequential process is completed and the procedure goes to the next process, and 2. when the next process is changed according to the decision result obtained through condition decision. In the latter case, reconfiguration is performed according to the decision result (true or false) of condition decision.
The case is described where the occasion is taken as 1. “when a sequential process is completed and the procedure goes to the next process”. The process in the ALU block 101 is supposed to be performed such that the data to be processed is read from the relevant memory 104 and the process result at the ALU block 101 is stored to the memory 104. Based on this supposition, a process is completed upon writing in the memory. At this occasion, the structure of the processor is changed.
The case is described where the occasion is taken as 2. “the next process is changed according to the decision result obtained through condition decision”. In this case, a change is made correspondingly to the decision result of condition decision. This decision is made by the comparator 106 described above. The comparator 106 includes the subtracter 501 that performs a subtracting process on the two input signals A and B (A-B or B-A) (see
Therefore, after the sequencer unit 102 defines an arbitrary configuration, the following two events are controlled as occasions for the next configuration. One is 1. when the last processed data under the current configuration of the ALU block 101 is written in any memory 104. The other is 2. when the comparator 106 outputs the decision result (Carry and Zero_flag) of condition decision.
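The decision result used for the second occasion can be illustrated with a minimal sketch of the subtracter in the comparator 106. It assumes unsigned operands and that each flag is 1 when asserted; the function name `compare` and the string form of the subtraction direction are hypothetical.

```python
from typing import Tuple


def compare(a: int, b: int, direction: str = "A-B") -> Tuple[int, int]:
    """Return (Carry, Zero_flag) for the configured subtraction direction.

    Assumptions: operands are unsigned, Carry is 1 on under-flow, and
    Zero_flag is 1 when the subtraction result is zero.
    """
    if direction == "A-B":
        diff = a - b
    elif direction == "B-A":
        diff = b - a
    else:
        raise ValueError("direction must be 'A-B' or 'B-A'")
    carry = 1 if diff < 0 else 0        # under-flow of the subtraction
    zero_flag = 1 if diff == 0 else 0   # result is exactly zero
    return carry, zero_flag
```

Under these assumptions, A>B with direction A-B yields Carry=0 and Zero_flag=0, and the sequencer would treat the arrival of this pair as the reconfiguration occasion.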
The process of the cluster 100 is launched by the launch register 602. By the external CPU, a start address 602b of the first configuration information (for example, Configuration #0) is designated. The launch register 602 sets a launch bit 602a. At this occasion, the first configuration information stored in the configuration memory 601 is read to the memory 104. The first configuration information is set in the ALU block 101. Furthermore, according to the operation code in the configuration information, which will be described below, conditions for the next configuration (reconfiguration of the processor) are defined.
The cluster 100 can be launched through a scheme other than the above. For example, the structure can be such that the start address and the start event occasion are received from the outside of the cluster 100. This start event occasion can be used as the setting of the launch bit 602a of the launch register 602.
The item called operation code (Operation) 601a is composed of two bits for defining the state of transition from the current configuration to the next configuration.
The items called jump addresses (JumpADRS #0, 1) 601b and 601c are jump addresses according to the decision result of condition decision made by the comparator 106. Each of these is to designate an address to be read from the configuration memory 601 subsequently to the current configuration, and is used at the time of reconfiguration based on the decision result. Designation of the jump addresses 601b and 601c is such that either one of the jump addresses, 601b, for example, designates an address corresponding to a result of true from the comparator 106, while the other jump address 601c designates an address corresponding to a result of false from the comparator 106.
The item called Write Address Mask (WAM) 601d is used, when reconfiguration is performed based on a memory write (Write) event from the ALU block 101, for designating a memory 104 inside the ALU block 101 so that a memory write event therefrom is to be monitored.
The item called reconfiguration condition decision information (Next Info) 601e is used, when reconfiguration is performed based on the decision result of condition decision made by the comparator 106 provided to the ALU block 101, for designating an operation according to the decision result.
The item called ALU block hardware configuration 610 includes the item called ALU module 601f that defines the structure of the ALU module 103, the item called selector 601g that defines the connection structure of the selector 110, and the item called definition counter 601h that defines the structure of the counter 105.
Of the configuration information described above, each item other than the ALU block hardware configuration 610 (601a through 601e) is sent to the configuration controller 604 in the sequencer unit 102, and is used as information for determining the next configuration address.
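The layout of one piece of configuration information can be sketched as a pair of records. Only the two-bit width of the operation code is stated in the text; every other field type, and the class and method names, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AluBlockHardwareConfig:
    """ALU block hardware configuration 610 (field encodings assumed)."""
    alu_module: int   # item 601f: structure of the ALU modules 103
    selector: int     # item 601g: connection structure of the selectors 110
    counter: int      # item 601h: structure of the counters 105


@dataclass(frozen=True)
class ConfigurationWord:
    operation: int                        # item 601a: two-bit operation code
    jump_adrs0: int                       # item 601b: JumpADRS #0
    jump_adrs1: int                       # item 601c: JumpADRS #1
    wam: int                              # item 601d: Write Address Mask
    next_info: Tuple[int, int, int, int]  # item 601e: Entry 0 through 3 (assumed)
    hardware: AluBlockHardwareConfig      # item 610: sent to the ALU block 101

    def __post_init__(self) -> None:
        if not 0 <= self.operation <= 0b11:
            raise ValueError("operation code is two bits")

    def controller_fields(self) -> tuple:
        """Items 601a through 601e, which go to the configuration controller."""
        return (self.operation, self.jump_adrs0, self.jump_adrs1,
                self.wam, self.next_info)
```

Splitting the record this way mirrors the routing described above: `hardware` goes to the ALU block 101, while `controller_fields()` is what the configuration controller 604 uses to determine the next configuration address.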
The condition for transition from the current configuration to the next configuration is designated by the operation code 601a contained in the configuration information. The operation set in the operation code 601a is defined as the following (1) to (4).
(1) When the Operation Code=00
A No operation (NOP) process is performed. In this case, without changing the state at the ALU block 101 or waiting for the event occasion, the procedure goes to the address of the next configuration information (Configuration #0 through n) in the relevant configuration memory 601 in the next clock cycle, and then follows the setting details of the newly-read operation code 601a.
(2) When the Operation Code=01
In this case, a sequential process is performed. After the current configuration information is transferred to the ALU block 101 side, the procedure makes a transition to the next address of the configuration memory 601 in the next clock cycle at the occasion of having performed a process of writing in any memory 104 provided in the ALU block 101. Which of the plurality of memories 104 has its Write Ack taken as the occasion is designated by the configuration information.
(3) When the Operation Code=10
In this case, a complete instruction process is performed. The current configuration information is transferred to the ALU block 101 side, and then an interrupt indicating the process end is reported to the CPU at the occasion of a write process in the relevant memory 104 of the ALU block 101. With this, the process at the cluster 100 side temporarily ends. The memory 104 whose Write Ack is taken as the occasion is designated by the configuration information. This case is used when part of the entire process required for executing the application is performed by using the cluster 100.
(4) When the Operation Code=11
In this case, a condition-branch instructing process is performed. The current configuration information is transferred to the ALU block 101 side, and then the procedure waits for an input of the decision result (Compare result) of condition decision made by the comparator 106 of the ALU block 101. By taking the input of this decision result as the occasion, configuration information corresponding to a different branch destination for each decision result is selected for reconfiguration.
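The four operation codes (1) to (4) above can be summarized in a short sketch that maps each code to the event the sequencer waits for. The enum member names and the event labels are hypothetical; the two-bit values follow the text.

```python
from enum import IntEnum
from typing import Optional


class Operation(IntEnum):
    NOP = 0b00         # go to the next address in the next clock cycle
    SEQUENTIAL = 0b01  # go to the next address after a memory write
    COMPLETE = 0b10    # report an interrupt to the CPU after a memory write
    BRANCH = 0b11      # select a branch destination from the compare result


def awaited_event(op: Operation) -> Optional[str]:
    """Return the event that triggers the transition, or None for NOP."""
    return {
        Operation.NOP: None,
        Operation.SEQUENTIAL: "write_ack",
        Operation.COMPLETE: "write_ack",
        Operation.BRANCH: "compare_result",
    }[op]
```

Decoding a raw two-bit field is then `Operation(0b11)`, which yields `Operation.BRANCH`.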
The configuration controller 604 performs centralized control over reconfiguration in the ALU block 101.
The masking unit 701 is set with a mask value indicated by the item 601d of the write address mask (WAM) contained as the item of the configuration information. Of Write Ack input from the memories (taken as memories #0 to #n) provided to the ALU block 101, Write Ack from the memory 104 coinciding with the item 601d of the WAM is accepted for output to the adder (Add) 702.
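The behavior of the masking unit 701 can be sketched as a bitwise filter. This assumes that the WAM holds one bit per memory (bit i for memory #i) and that a Write Ack from any one monitored memory is accepted; the function name is hypothetical.

```python
def masked_write_ack(write_ack_bits: int, wam: int) -> bool:
    """Return True when a memory selected by the WAM has returned Write Ack.

    Assumption: bit i of both arguments corresponds to memory #i.
    """
    return (write_ack_bits & wam) != 0
```

With `wam = 0b0010`, only memory #1 is monitored, so a Write Ack from memory #2 alone (`0b0100`) is ignored.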
The item value of the operation code (Operation) 601a contained in the configuration information is output to the adder (Add) 702 and the selector 703. The adder 702 refers to the details of the operation code 601a to increment (add 1 to) the current address for each clock cycle when the value allows addition, that is, “00, 01, 10”, and then outputs the result to the selector 703. When a start address is input from the start-address generator 603, this adder 702 starts addition from the start address. Also, when the operation code 601a indicates “10”, an interrupt (Interrupt) is output to the external CPU.
The selector 703 changes a switch not shown to be connected to the adder 702 when the input operation code 601a indicates “00, 01, 10”. With this, a route looping between the adder 702 and the selector 703 is set. With the address incremented by the adder 702 being taken as Next Address, a read address of the relevant configuration memory 601 is designated. This selector 703 changes the switch not shown to the decision register 704 side when the input operation code 601a indicates “11”. With this, a read address of the relevant configuration memory 601 is designated by taking the address indicated by the decision result of the decision register 704 as Next Address.
The decision register 704 is set with a plurality of entries (Entry 0 through 3) indicated by the Next Info 601e contained in the configuration information. Each of the entries 0 through 3 has a two-bit pattern for comparison. Then, when the decision result of condition decision output from the comparator 106 (result of comparison: Carry of one bit and Zero_flag of one bit) is input, the entries set in the decision register 704 are searched, as a table, for the matching two-bit combination, and the procedure then jumps to the jump destination of the next address set for that entry. The jump destination of the next address is a jump address (JumpADRS #0 or JumpADRS #1, see
The next address (Next Address) is designated by the configuration controller 604 according to the operation code in the following four manners from (1) to (4).
(1) When the Operation Code=00
During a period in which the operation code=00 continues, a process of taking a value obtained by adding 1 to the current address (or the start address) as Next Address continues.
(2) When the Operation Code=01
In this case, because of sequential execution, a value obtained by adding 1 to the current address (or the start address) is taken as Next Address at the time when a return of Write Ack from the memory 104 designated by the WAM 601d is confirmed.
(3) When the Operation Code=10
In this case, a normal completion interrupt (Interrupt) is reported to the CPU at the time when a return of Write Ack from the memory 104 designated by the WAM 601d is confirmed.
(4) When the Operation Code=11
In this case, based on the decision result of condition decision from the comparator 106, the decision register 704 is referred to. Then, a jump address defined as the configuration according to the result of referring to the decision register 704 (either one of JumpADRS #0 and JumpADRS #1, see
Also, for the entries (Entry 0 through 3) indicated by Next Info 601e, it is assumed, for example, that an entry 801 is set to be true where A>B and false in other cases. In this case, Compare Result (Carry and Zero_flag) output from the comparator 106 becomes 0, 0 where A>B, which indicates true (Entry=00, see
Similarly, it is assumed that another entry 802 is set to be true only where A=B and false in other cases. In this case, Compare Result (Carry and Zero_flag) output from the comparator 106 becomes 0, 0 only where A=B, which indicates true (Entry=00, see
As such, the decision register 704 has a function of a look-up table (LUT). When the operation code indicates 11, the decision register 704 is referred to, thereby easily obtaining the next address (Next Address) according to the decision result of the comparator 106.
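The next-address selection, including the look-up in the decision register 704, can be sketched as below. Several points are assumptions rather than statements of the text: the table is indexed by the two-bit value formed from (Carry, Zero_flag), a true entry selects JumpADRS #0 and a false entry selects JumpADRS #1, and a result of `None` means the controller holds the current configuration. The names `Step` and `next_step` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Step(NamedTuple):
    next_address: Optional[int]  # None: hold (waiting, or process ended)
    interrupt: bool              # True: completion is reported to the CPU


def next_step(op: int, current: int, jump_adrs: Tuple[int, int],
              decision: Sequence[bool], write_ack: bool = False,
              compare: Optional[Tuple[int, int]] = None) -> Step:
    """Select Next Address from the two-bit operation code."""
    if op == 0b00:                       # NOP: always increment
        return Step(current + 1, False)
    if op == 0b01:                       # sequential: increment on Write Ack
        return Step(current + 1 if write_ack else None, False)
    if op == 0b10:                       # complete: interrupt on Write Ack
        return Step(None, write_ack)
    if op == 0b11:                       # branch: wait for the compare result
        if compare is None:
            return Step(None, False)
        carry, zero_flag = compare
        hit = decision[(carry << 1) | zero_flag]   # assumed table indexing
        return Step(jump_adrs[0] if hit else jump_adrs[1], False)
    raise ValueError("operation code is two bits")
```

For an entry that is true only where A>B, the table would be `(True, False, False, False)`, since A>B gives Carry=0 and Zero_flag=0 under this indexing; any other compare result then selects JumpADRS #1.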
According to the second embodiment, a transition from the state of the current configuration to the next configuration can be appropriately performed. Particularly, since the switching occasion of the hardware of the ALU block to be reconfigured can be quickly and easily detected, the process performance can be improved. Also, since the hardware structure can be switched according to the decision result of condition decision using a comparator, condition decision does not have to be made by a plurality of ALU modules, thereby improving area efficiency on hardware and achieving space saving and power saving.
According to the present invention, it is possible to achieve a cluster structure flexibly supporting various applications and capable of improving resource efficiency. With this, an effect of providing hardware excellent in area efficiency, power saving, cost, and operation speed can be attained.
Although the invention has been described with respect to a specific embodiment for a complete and clear disclosure, the appended claims are not to be thus limited but are to be construed as embodying all modifications and alternative constructions that may occur to one skilled in the art which fairly fall within the basic teaching herein set forth.
Number | Date | Country | Kind |
---|---|---|---|
2004-193578 | Jun 2004 | JP | national |