This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2013-180056 filed on Aug. 30, 2013, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a semiconductor integrated circuit.
A reconfigurable semiconductor integrated circuit such as a field-programmable gate array (FPGA) may implement a desired function by programming circuit configuration information to the semiconductor integrated circuit.
A reconfigurable semiconductor integrated circuit has a configurable logic block (CLB) including a plurality of basic logic elements, a switch block (SB), a connection block (CB), and wires that mutually connect these blocks. The semiconductor integrated circuit having these elements may connect the CLB in a programmable manner and thereby has longer wire lengths than an application-specific integrated circuit (ASIC), resulting in larger wire capacitance. Therefore, the wires in the semiconductor integrated circuit consume more power.
As a power saving technology applied to the ASIC field, a charge recycling technique is available.
When, in a logic circuit, the value of a signal changes from 1 to 0, all charge stored in a wire capacitor is released. If the wire capacitance and the power supply voltage are respectively denoted C and VDD, the electric power consumed is represented as C·VDD²/2. With a charge recycling technique, when stored charge is released, part of it is stored in another capacitor. The charge in the other capacitor is reused during the next change of the signal from 0 to 1 to reduce power consumption.
A clock resonance technique is available as a type of charge recycling technique. In the clock resonance technique, an inductor is added to a clock wire net; an LC resonant circuit is created according to the inductance of the inductor and the capacitance of a clock wire to produce resonance. Thus, charge is reused between the inductor and the clock wire capacitor, reducing electric power consumed in the clock wire net.
If the above conventional charge recycling technique is applied to a semiconductor integrated circuit, the circuit area becomes large. For example, with a technique in which part of the charge released during a discharge is stored in another capacitor, the addition of the other capacitor increases the circuit area accordingly. The clock resonance technique is also problematic in that the addition of the inductor increases the circuit area accordingly.
According to an aspect of the invention, a semiconductor integrated circuit includes: a first wire through which a signal is transmitted; a second wire that is not used for signal transmission; a switch that creates or breaks an electric connection between the first wire and the second wire; and a control circuit that controls the switch according to a potential of the signal, which is transmitted through the first wire, so that part of charge stored in the first wire capacitor moves to a second wire capacitor and is stored in the second wire capacitor and the charge stored in the second wire capacitor is drawn to the first wire capacitor to charge the first wire capacitor.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Embodiments of the present disclosure will be described below with reference to the drawings.
A semiconductor integrated circuit 1, which is reconfigurable as with an FPGA, has a plurality of wires, L1 and L2. In the example in
The semiconductor integrated circuit 1 further has a switch 2 that creates or breaks an electrical connection between the wire L1 and the non-used wire L2 and a charge recycling control circuit 3 that controls the switch 2.
The charge recycling control circuit 3 controls the switch 2 according to the potential of the signal VIN, which is transmitted through the wire L1, so that part of charge stored in a wire capacitor C1 of the wire L1 moves to a wire capacitor C2 of the non-used wire L2 and is stored therein. When charging the wire capacitor C1, the charge recycling control circuit 3 draws the stored charge to the wire capacitor C1 to reuse it in charging the wire capacitor C1. The charge recycling control circuit 3 further has a buffer circuit function; the charge recycling control circuit 3 stores the signal VIN and outputs it. The wire capacitors C1 and C2 are each a parasitic capacitor, so they are indicated by the dashed lines in
When, for example, the signal VIN changes from 1 (the potential level is high (H)) to 0 (the potential level is low (L)), the charge recycling control circuit 3 turns on the switch 2 for a certain period. Thus, part of the charge stored in the wire capacitor C1 of the wire L1 moves to the wire capacitor C2 of the non-used wire L2 and is stored therein as indicated by the arrow a1 in
Also when the signal VIN changes from 0 to 1, the charge recycling control circuit 3 turns on the switch 2 for a certain period. Thus, part of the charge stored in the wire capacitor C2 of the free wire L2 is drawn to the wire capacitor C1 of the wire L1 and is stored therein as indicated by the arrow a2 in
An example of the operation of the semiconductor integrated circuit 1 in the first embodiment will be described below.
In
One terminal of the switch SW1 receives the power supply voltage VDD, and the other terminal is connected to one terminal of the switch SW2. The other terminal of the switch SW2 is grounded. One terminal of the wire capacitor C1 and the one terminal of the switch 2 are connected to a node placed between the switch SW1 and the switch SW2. The other terminal of the switch 2 is connected to one terminal of the wire capacitor C2. The other terminals of the wire capacitor C1 and capacitor C2 are grounded.
When the signal VIN changes from 1 to 0, the voltage VOUT of the wire L1 starts to drop, starting from time t1. At this time, the switch 2 is turned on under control of the charge recycling control circuit 3, shifting to the state illustrated in
When the signal VIN changes from 1 to 0 as described above, part of the charge in the wire capacitor C1 is stored in the wire capacitor C2, reducing the amount of charge that is released when the wire capacitor C1 is discharged and thereby reducing power consumption.
When the signal VIN changes from 0 to 1, the semiconductor integrated circuit 1 changes from the state in
Then, the semiconductor integrated circuit 1 enters the state in
The voltage VOUT of the wire L1 starts to rise, starting from time t3. At this time, the switch 2 is turned on under control of the charge recycling control circuit 3, shifting to the state illustrated in
When the signal VIN changes from 0 to 1 as described above, part of the charge in the wire capacitor C2 moves to the wire capacitor C1, reducing the amount of charge that has to be supplied when the wire capacitor C1 is recharged and thereby reducing power consumption.
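The switching sequence described above can be illustrated with a small numerical sketch. The model below is an idealization made here for illustration only: it assumes ideal switches, lossless charge sharing that runs until the two voltages equalize, and a free wire that starts with no charge. The function names `share` and `simulate_cycles` are hypothetical.

```python
def share(c1, v1, c2, v2):
    """Voltage after two capacitors are connected and their charge equalizes."""
    return (c1 * v1 + c2 * v2) / (c1 + c2)


def simulate_cycles(c1, c2, vdd, cycles):
    """Return the sharing voltages (falling edge, rising edge) after `cycles`.

    Idealized model: the switch 2 stays on until both voltages are equal.
    """
    v2 = 0.0  # the free wire L2 is assumed to start empty
    v_high = v_low = 0.0
    for _ in range(cycles):
        # VIN 1 -> 0: C1 starts at VDD and shares charge with C2 ...
        v_high = share(c1, vdd, c2, v2)
        v2 = v_high  # C2 keeps this voltage once the switch 2 opens
        # ... after which SW2 discharges the rest of C1 to ground.
        # VIN 0 -> 1: C1 starts at 0 V and draws charge back from C2 ...
        v_low = share(c1, 0.0, c2, v2)
        v2 = v_low
        # ... after which SW1 charges C1 the rest of the way to VDD.
    return v_high, v_low


if __name__ == "__main__":
    vh, vl = simulate_cycles(c1=1.0, c2=1.0, vdd=1.0, cycles=30)
    print(f"sharing voltage on a falling edge: {vh:.4f} * VDD")
    print(f"sharing voltage on a rising edge:  {vl:.4f} * VDD")
```

Under these assumptions and with equal capacitances, the two sharing voltages settle at about 2/3 and 1/3 of VDD after a few cycles.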
Equations (1) and (2) below hold due to relationships between voltages and charge in a state before the input changes and a state of charge movement between the wire capacitors C1 and C2 (a state in which charge is stored when the signal VIN changes from 1 to 0 or a state in which charge is reused when the signal VIN changes from 0 to 1).
$$C_L V_{DD} + C_R V_{INT\_LOW} = C_L V_{INT\_HIGH} + C_R V_{INT\_HIGH} \quad (1)$$
$$C_R V_{INT\_HIGH} = C_L V_{INT\_LOW} + C_R V_{INT\_LOW} \quad (2)$$
In equations (1) and (2), CL indicates the capacitance of the wire capacitor C1 and CR indicates the capacitance of the wire capacitor C2.
From equations (1) and (2), the voltage VINT
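As a worked sketch of this step, equations (1) and (2) can be solved simultaneously for the two intermediate voltages, written here as $V_{INT\_HIGH}$ and $V_{INT\_LOW}$. This is a reconstruction from the two charge-conservation relations, not a copy of the original typeset equations:

```latex
\begin{align*}
V_{INT\_LOW} &= \frac{C_R}{C_L + C_R}\, V_{INT\_HIGH}
  && \text{from (2)} \\
C_L V_{DD} &= \left( C_L + C_R - \frac{C_R^2}{C_L + C_R} \right) V_{INT\_HIGH}
  && \text{substituting into (1)} \\
V_{INT\_HIGH} &= \frac{C_L + C_R}{C_L + 2 C_R}\, V_{DD}, \qquad
V_{INT\_LOW} = \frac{C_R}{C_L + 2 C_R}\, V_{DD}
\end{align*}
```

For equal capacitances these expressions evaluate to 2/3 and 1/3 of the power supply voltage.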
Energy E consumed when charge is not reused is represented as in equation (5) below. Energy E consumed when charge is reused and the signal VIN changes from 1 to 0 is represented as in equation (6) below. Energy E consumed when charge is reused and the signal VIN changes from 0 to 1 is represented as in equation (7) below.
Energy consumed when charge is not reused
Energy consumed when charge is reused and the signal VIN changes from 1 to 0
Energy consumed when charge is reused and the signal VIN changes from 0 to 1
Since VDD, VINT
Since, as illustrated in
The semiconductor integrated circuit 10 includes CLBs 11-1, 11-2, 11-3, and 11-4, an SB 12, and CBs 13-1, 13-2, 13-3, and 13-4.
The CLBs 11-1, 11-2, 11-3, and 11-4 are each a logic circuit block having a plurality of basic logic elements. The SB 12 switches wires connected vertically and horizontally. The CBs 13-1 to 13-4 connect the CLBs 11-1 to 11-4 to the vertical or horizontal wires.
Example of Basic Logic Elements in the CLBs 11-1 to 11-4
The LUT 21 is implemented by, for example, a static random access memory (SRAM). The LUT 21 accepts a four-bit address from its four input terminals IN[0], IN[1], IN[2], and IN[3] and outputs one-bit data stored therein according to the accepted address.
The DFF 22 receives a clock, which is input from a clock terminal CLK, at its clock terminal CK, fetches an output from the LUT 21 at a time synchronized with the clock, and outputs the fetched output from a terminal Q.
The MUX 23 selects one of outputs from the LUT 21 and DFF 22 and outputs the selected output from an output terminal OUT.
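The behavior of one basic logic element can be sketched in software. The model below is illustrative only: the class name `BasicLogicElement`, the bit ordering of the address (the first input taken as the least significant bit), and the `use_ff` select flag are assumptions that the text does not specify.

```python
class BasicLogicElement:
    """Behavioral sketch of a 4-input LUT, a D flip-flop, and an output MUX."""

    def __init__(self, lut_bits, use_ff):
        if len(lut_bits) != 16:
            raise ValueError("a 4-input LUT stores 16 one-bit entries")
        self.lut_bits = list(lut_bits)  # contents of the LUT memory
        self.use_ff = use_ff            # MUX select: True picks the DFF output
        self.q = 0                      # value currently held by the DFF

    def lut(self, inputs):
        """Look up the stored bit; inputs[0] is assumed to be the LSB."""
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.lut_bits[address]

    def clock(self, inputs):
        """Clock edge: the DFF captures the current LUT output."""
        self.q = self.lut(inputs)

    def out(self, inputs):
        """MUX: select the registered or the combinational LUT output."""
        return self.q if self.use_ff else self.lut(inputs)


if __name__ == "__main__":
    # LUT programmed as a 4-input AND: only address 15 stores a 1.
    and4 = BasicLogicElement([0] * 15 + [1], use_ff=False)
    print(and4.out([1, 1, 1, 1]), and4.out([1, 0, 1, 1]))
```

With `use_ff=True`, the output changes only after `clock` is called, which mirrors the registered path through the DFF 22.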
Example of the SB 12
Each of the MUXes 30-1 to 30-4 selects any one of four input signals IN1, IN2, IN3, and IN4 and outputs the selected input signal as a signal VIN1, VIN2, VIN3 or VIN4. Control signals to the MUXes 30-1 to 30-4 are supplied from, for example, a configuration SRAM (not illustrated).
The charge recycling buffer circuits 31-1 to 31-4 have part of the functions of the charge recycling control circuit 3, illustrated in
The charge recycling circuits 32-1 to 32-4 have the function of the switch 2 in
In an example of the SB 12 in
Although, in the above example, adjacent wires are used to store charge, this is not a limitation. If, for example, the charge recycling circuit 32-1 is connected between the wires L10 and L12, the wire L12 is used to store charge of the wire L10. It is also possible to reuse charge among the wires of a plurality of dies by using a three-dimensional mounting technology.
An example of the charge recycling buffer circuits 31-1 to 31-4 of the SB 12 will be described below.
Example of the Charge Recycling Buffer Circuits 31-1 to 31-4
The charge recycling buffer circuit 31-1 has an inverter circuit 40, a delay circuit 41, and transistors M1, M2, M3, M4, M5, and M6.
The inverter circuit 40 receives the signal VIN1 and outputs a signal /VIN1, which is created by inverting the logic level of the signal VIN1. The delay circuit 41 receives the signal VIN1 and outputs a signal VIN
Of the transistors M1 to M6, the transistors M1, M4, and M6 are n-channel metal-oxide semiconductor field effect transistors (MOSFETs) and the transistors M2, M3, and M5 are p-channel MOSFETs.
The transistors M5 and M6 function as an inverter circuit. The transistors M1 to M4 function as a gating circuit that controls the inverter circuit.
The gates of the transistors M1 and M2 receive the signal VIN
In the transistors M5 and M6, which function as an inverter circuit, the power supply voltage VDD is applied to the source of the transistor M5 and its drain is connected to the drain of the transistor M6. The source of the transistor M6 is grounded. The potential of a node placed between the drain of the transistor M5 and the drain of the transistor M6 is output as a voltage VOUT1.
In a state in which charge is stored (referred to below as the charge recovery state) and a state in which charge is reused (referred to below as the charge recycling state), the gating circuit formed with the transistors M1 to M4 operates in combination with the delay circuit 41 to turn off the transistors M5 and M6 while charge moves, so that the wire L10 is disconnected from the power supply and GND.
Next, an example of the delay circuit 41 of the charge recycling buffer circuit 31-1 will be described.
Example of the Delay Circuit 41
The delay circuit 41 has transistors MD1, MD2, MD3, MD4, MD5, MD6, MD7, and MD8. Of the transistors MD1 to MD8, the transistors MD1, MD2, MD3, and MD7 are each a p-channel MOSFET and the transistors MD4, MD5, MD6, and MD8 each are an n-channel MOSFET.
A bias voltage VPB is applied to the gate of the transistor MD1. A signal CR is input to the gate of the transistor MD2. The power supply voltage VDD is applied to the sources of the transistors MD1 and MD2. The drains of the transistors MD1 and MD2 are connected to the source of the transistor MD3. The drain of the transistor MD3 is connected to the drain of the transistor MD4 and to the gates of the transistors MD7 and MD8. The signal VIN1 is input to the gates of the transistors MD3 and MD4. The source of the transistor MD4 is connected to the drains of the transistors MD5 and MD6. A bias voltage VNB is applied to the gate of the transistor MD5. A signal /CR, which is created by inverting the logic level of the signal CR, is input to the gate of the transistor MD6. The sources of the transistors MD5 and MD6 are grounded.
The transistors MD7 and MD8 function as an inverter circuit; the power supply voltage VDD is applied to the source of the transistor MD7 and its drain is connected to the drain of the transistor MD8. The source of the transistor MD8 is grounded. The potential of a node placed between the drain of the transistor MD7 and the drain of the transistor MD8 is output as the signal VIN
This delay circuit 41 may adjust a delay by adjusting the value of a clamp current, so the delay circuit 41 may adjust the delay time to match the time described above during which charge moves. The delay time is preferably a time from when the falling edge of the signal VIN1 starts to change until the signal VOUT1 reaches the voltage VINT
In the delay circuit 41, whether to introduce a delay may be set by using the signal CR. When the value of the signal CR used to enable the reuse of charge is set to 1, the transistors MD2 and MD6 are turned off. At this time, the transistors MD1 and MD5 function as a clamp current source. The bias voltage VPB applied to the gate of the transistor MD1 limits the current flowing in the transistor MD1 and the bias voltage VNB applied to the gate of the transistor MD5 limits the current flowing in the transistor MD5. Thus, a delay is generated.
While the signal CR is 0, the transistors MD2 and MD6 are turned on, suppressing a delay from being generated by the clamp current source, which would otherwise be implemented by the transistors MD1 and MD5. As a result, an operation to reuse charge is not performed.
As described above, the delay circuit 41 enables or disables the delay function depending on the value of the setting signal (signal CR), so the delay circuit 41 may selectively set whether to enable or disable the reuse of charge.
If, for example, a signal with a high operation ratio is transmitted through a wire, the wire consumes much power, so the reuse of charge is enabled to reduce the power consumption of the wire. If a wire carries a signal that would fail to meet a timing restriction when a long delay is introduced, the reuse of charge is disabled for that wire.
The signal CR is output from, for example, a configuration SRAM cell (not illustrated). The configuration SRAM cell is set according to, for example, a setting that enables or disables the reuse of charge (see
The bias voltages VPB and VNB, which are obtained, for example, outside the semiconductor integrated circuit 10 in consideration of the above delay time, may be respectively applied to the gates of the transistors MD1 and MD5 through special power supply pins. Alternatively, the bias voltages VPB and VNB may be generated from a power supply included in the semiconductor integrated circuit 10.
Example of the Charge Recycling Circuit 32-1
In the charge recovery state and charge recycling state, the charge recycling circuit 32-1 functions as a switch that mutually connects the wires L10 and L11.
The charge recycling circuit 32-1 has transistors M7, M8, M9, M10, M11, and M12. Of the transistors M7 to M12, the transistors M7, M9, and M11 are p-channel MOSFETs and the transistors M8, M10, and M12 are n-channel MOSFETs.
The gates of the transistors M7 and M10 receive the signal VIN
The transistors M11 and M12 each function as a pass transistor. A capacitor CR1 is connected to the drain of the transistor M11 and the source of the transistor M12. The capacitor CR1 is a wire capacitor of the wire L11. The signal VOUT1, which is transmitted through the wire L10, is input to the source of the transistor M11 and the drain of the transistor M12. These pass transistors M11 and M12 function as the switch 2 in
An example of the operation of the semiconductor integrated circuit 10 in the second embodiment will be described below.
Example of the Operation of the Semiconductor Integrated Circuit 10
The operations of the charge recycling buffer circuit 31-1 and charge recycling circuit 32-1 will be mainly described below, assuming that a signal is transmitted through the wire L10 in
In
In the charge recycling buffer circuit 31-1 in
In the charge recycling circuit 32-1, when the signal VIN1 and VIN
When the signal VIN1 goes low at time t11, the signal VIN
Thus, the wire L10 is disconnected from the power supply and GND.
In the charge recycling circuit 32-1, the signal VIN
When the signal VIN
When the signal VIN
Then, when the signal VIN1 goes high at time t14, since the signal VIN
Thus, the wire L10 is disconnected from the power supply and GND.
In the charge recycling circuit 32-1, the signal VIN
When the signal VIN
When the signal VIN
When the signal VIN1 changes from high to low (from 1 to 0) as illustrated in
When the signal VIN1 changes from low to high (from 0 to 1), part of the charge in the wire capacitor of the wire L11 is stored in the wire capacitor of the wire L10 during a period from time t14 to time t15. Therefore, a voltage change during charging of the wire L10 causes a change from the voltage VINT
If, for example, the wires L10 and L11 have the same length, CL and CR in equations (3) and (4) above become equal.
Therefore, equations (3) and (4) may be respectively rewritten as equations (8) and (9).
Power consumption may be represented as in equations (6) and (7), so it becomes 2·CL·VDD²/9 both when the signal VIN1 changes from 1 to 0 and when the signal VIN1 changes from 0 to 1. If charge is not reused (the voltage during discharging or charging changes between 0 V and the power supply voltage VDD), power consumption is represented as CL·VDD²/2, so the reuse of charge lowers power consumption to 4/9 of that value, a reduction of about 56%.
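The figures for wires of equal length can be checked with exact arithmetic. The sketch below rests on two assumptions: the intermediate voltages are the closed-form solution of equations (1) and (2), and the energy per transition is CL·V²/2 for the part V of the voltage swing that is not covered by reused charge. The function names are hypothetical, and capacitances are given as integers in relative units.

```python
from fractions import Fraction


def intermediate_voltages(c_l, c_r, vdd):
    """Steady-state sharing voltages implied by equations (1) and (2).

    c_l and c_r are integers (or Fractions) in relative units.
    """
    v_high = Fraction(c_l + c_r, c_l + 2 * c_r) * vdd
    v_low = Fraction(c_r, c_l + 2 * c_r) * vdd
    return v_high, v_low


def energies(c_l, c_r, vdd):
    """Return (no reuse, reuse on 1->0, reuse on 0->1) under the assumed forms."""
    v_high, v_low = intermediate_voltages(c_l, c_r, vdd)
    e_plain = Fraction(c_l) * vdd**2 / 2
    e_fall = Fraction(c_l) * v_high**2 / 2          # discharge from V_INT_HIGH to 0
    e_rise = Fraction(c_l) * (vdd - v_low)**2 / 2   # charge from V_INT_LOW to VDD
    return e_plain, e_fall, e_rise


if __name__ == "__main__":
    e_plain, e_fall, e_rise = energies(c_l=1, c_r=1, vdd=1)
    # Values are in units of CL * VDD**2; the last one is the ratio to no reuse.
    print(e_fall, e_rise, e_fall / e_plain)
```

For `c_l = c_r` the two reuse energies both come out as 2/9 and the ratio to the case without reuse is 4/9, in line with the values stated above.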
In the semiconductor integrated circuit 10 in the second embodiment, the wire capacitor of a free wire (the wire L11 in the above example) is used to reuse charge, so power consumption may be suppressed with a circuit having a small area.
Although a case in which a wire used to store charge remains unchanged as illustrated in
Example of Another SB
An SB 12a includes a selecting unit 35-1 that selects a wire that uses a charge recycling circuit 32a and also has a selecting unit 35-2 that selects a wire that uses a charge recycling circuit 32b. The SB 12a may use the charge recycling circuits 32a and 32b in any combination of wires. The charge recycling circuits 32a and 32b are similar to the charge recycling circuit 32-1 in
The selecting unit 35-1 includes switches (SWs) 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, and 65.
The switches 50 to 61 each select a signal to be input to the charge recycling circuit 32a from the signals VIN1 to VIN4, VIN
The selecting unit 35-2 includes switches 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, and 85.
The switches 70 to 81 each select a signal to be input to the charge recycling circuit 32b from the signals VIN1 to VIN4, VIN
The switches 50 to 65 and 70 to 85 are controlled so as to be turned on and off according to a setting stored in, for example, a configuration SRAM (not illustrated).
In
In this placement, when the signal VIN1 changes from 1 to 0, part of the charge in the wire capacitor of the wire L10 is stored in the wire capacitor of the wire L11 through the switch 51, charge recycling circuit 32a, and switch 63. At this time, the signal VOUT1 drops to VINT
When the signal VIN1 changes from 0 to 1, part of the charge in the wire capacitor of the wire L11 is stored in the wire capacitor of the wire L10 through the switch 63, charge recycling circuit 32a, and switch 51. At this time, the signal VOUT1 is raised to VINT
When the signal VIN3 changes from 1 to 0, part of the charge in the wire capacitor of the wire L12 is stored in the wire capacitor of the wire L13 through the switch 77, charge recycling circuit 32b, and switch 85. At this time, the signal VOUT3 drops to VINT
When the signal VIN3 changes from 0 to 1, part of the charge in the wire capacitor of the wire L13 is stored in the wire capacitor of the wire L12 through the switch 85, charge recycling circuit 32b, and switch 77. At this time, the signal VOUT3 is raised to VINT
In
In this placement, when the signal VIN2 changes from 1 to 0, part of the charge in the wire capacitor of the wire L11 is stored in the wire capacitors of the wires L10, L12, and L13 through the switch 54, charge recycling circuit 32a, and switches 62, 64, and 65.
At this time, the signal VOUT2 drops to VINT
When the signal VIN2 changes from 0 to 1, part of the charge in the wire capacitors of the wires L10, L12, and L13 is stored in the wire capacitor of the wire L11 through the switches 62, 64, and 65, charge recycling circuit 32a, and switch 54.
At this time, the signal VOUT2 is raised to VINT
When a plurality of free wires are used to store charge present in a single wire, the capacitance of the wire capacitors used to store charge becomes larger than when a single free wire is used. That is, the capacitance CR in equations (3) and (4) above becomes larger than when a single free wire is used. If, for example, the capacitance CR is assumed to be infinite in consideration of a case in which the ratio of free wires to a wire through which a signal is transmitted is very large, then the voltage VINT
That is, it is found that the larger the capacitance of the wire capacitor of a free wire is, the closer to VDD/2 the voltages VINT
When the voltage VINT
When the voltage VINT
If the capacitance CR is infinite, in which case energy consumption is represented as in equations (6) and (7), energy consumption becomes CL·VDD²/8 both when the signal VIN1 changes from 1 to 0 and when the signal VIN1 changes from 0 to 1. If charge is not reused (the voltage during discharging or charging changes between 0 V and the power supply voltage VDD), power consumption is represented as CL·VDD²/2, so the reuse of charge lowers power consumption to 1/4 of that value, a reduction of 75%.
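The limiting case can be written out as a short worked sketch. It assumes the closed-form intermediate voltages that follow from equations (1) and (2), and an energy of $C_L V_{INT\_HIGH}^2/2$ on a falling edge and $C_L (V_{DD} - V_{INT\_LOW})^2/2$ on a rising edge. These forms are reconstructions chosen to agree with the values stated in the text, not quotations of the original equations:

```latex
\begin{align*}
\lim_{C_R \to \infty} V_{INT\_HIGH}
  &= \lim_{C_R \to \infty} \frac{C_L + C_R}{C_L + 2 C_R}\, V_{DD} = \frac{V_{DD}}{2}, \\
\lim_{C_R \to \infty} V_{INT\_LOW}
  &= \lim_{C_R \to \infty} \frac{C_R}{C_L + 2 C_R}\, V_{DD} = \frac{V_{DD}}{2}, \\
E &= \frac{C_L}{2} \left( \frac{V_{DD}}{2} \right)^2 = \frac{C_L V_{DD}^2}{8}
\end{align*}
```

Both transitions therefore cost the same energy in this limit, one quarter of the energy consumed without reuse.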
In
Since the selecting units 35-1 and 35-2 are provided as illustrated in
Next, a method of designing the semiconductor integrated circuit 10, which enables the reuse of charge as described above, will be described.
Method of Designing the Semiconductor Integrated Circuit 10
First, register transfer level (RTL) design data D1 is converted to a net list through logic synthesis (step S10). The RTL design data D1 is written in Verilog-Hardware Description Language (HDL), Very High-speed Integrated Circuit HDL (VHDL), or another hardware description language. In processing in step S10, high-level synthesis may be performed for a description in SystemC, C, or C++.
Technology mapping processing and clustering processing are performed next (steps S11 and S12). In technology mapping, combinational circuits are mapped to the LUT in a basic logic element (see
Then, placement processing is performed (step S13). In placement processing, the position of the CLB on the FPGA is determined.
Upon completion of placement processing, routing processing with the reuse of charge taken into consideration is performed (step S14) and charge recycling mode selection processing is performed in which a wire for which to enable the reuse of charge is determined in consideration of a timing restriction (step S15). If the reuse of charge is enabled, a longer delay is generated than when the reuse of charge is disabled. Therefore, if the reuse of charge is enabled for as many wires as possible within the range in which the timing restriction is met, power consumption may be reduced further.
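One possible way to carry out the mode selection in step S15 is sketched below. The text does not specify a selection algorithm, so everything here is an assumption made for illustration: the greedy ordering by signal activity, the single fixed `extra_delay` per enabled wire, the slack bookkeeping, and the function name `select_recycling_wires`.

```python
def select_recycling_wires(wires, path_slack, extra_delay):
    """Greedily enable charge reuse while every timing path still has slack.

    wires: list of (name, activity, paths), where `paths` lists the timing
           paths that pass through the wire.
    path_slack: dict mapping a path to its remaining timing slack.
    extra_delay: delay added to a wire when charge reuse is enabled
                 (same time unit as the slack values).
    """
    slack = dict(path_slack)  # work on a copy so the input is not modified
    enabled = []
    # Assumed heuristic: try the most active wires first, because they
    # account for the most switching power.
    for name, _activity, paths in sorted(wires, key=lambda w: -w[1]):
        if all(slack[p] >= extra_delay for p in paths):
            for p in paths:
                slack[p] -= extra_delay
            enabled.append(name)
    return enabled


if __name__ == "__main__":
    wires = [
        ("w1", 0.50, ["pA"]),
        ("w2", 0.30, ["pA", "pB"]),
        ("w3", 0.10, ["pB"]),
    ]
    print(select_recycling_wires(wires, {"pA": 1.0, "pB": 2.5}, extra_delay=1.0))
```

In this hypothetical example, enabling reuse on `w1` uses up the slack of path `pA`, so `w2` is skipped and `w3` is enabled instead.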
Then, bit stream generation processing is performed (step S16). In bit stream generation processing, bit stream (binary data) D2 used to configure the FPGA is generated.
The above design method is executed in, for example, a design apparatus as described below.
Example of a Design Apparatus
A design apparatus 100 is, for example, a computer as illustrated in
The RAM 102 is used as a main storage unit of the design apparatus 100. The RAM 102 temporarily stores at least part of an operating system (OS) and application programs, the OS and application programs being executed by the processor 101. The RAM 102 also stores various types of data used in processing executed by the processor 101.
Peripherals connected to the bus 109 include a hard disk drive (HDD) 103, a graphic processing unit 104, an input interface 105, an optic drive unit 106, a unit connection interface 107, and a network interface 108.
The HDD 103 magnetically writes and reads data to and from a built-in disk. The HDD 103 is used as an auxiliary storage unit of the design apparatus 100. The HDD 103 stores the OS, application programs, and various types of data. A flash memory or another semiconductor storage unit may also be used as the auxiliary storage unit.
A monitor 104a is connected to the graphic processing unit 104. The graphic processing unit 104 displays an image on the screen of the monitor 104a in response to a command from the processor 101. Examples of the monitor 104a include a display unit that uses a cathode ray tube (CRT) and a liquid crystal display unit.
A keyboard 105a and a mouse 105b are connected to the input interface 105. The input interface 105 receives signals from the keyboard 105a and mouse 105b and transmits the received signals to the processor 101. The mouse 105b is only an example of a pointing device; another pointing device may also be used. Other pointing devices include a touch panel, a tablet, a touch pad, and a trackball.
The optic drive unit 106 uses, for example, laser light to read data recorded on an optic disk 106a. The optic disk 106a is a portable recording medium on which data has been recorded in such a way that the data may be read through light reflection. Examples of the optic disk 106a include a digital versatile disc (DVD), a DVD-RAM, a compact disc read-only memory (CD-ROM), and a CD recordable/rewritable (CD-R/RW).
The unit connection interface 107 is a communication interface through which peripheral units are connected to the design apparatus 100. A memory unit 107a and a memory reader/writer 107b, for example, may be connected to the unit connection interface 107. The memory unit 107a has a recording medium on which a function to communicate with the unit connection interface 107 is mounted. The memory reader/writer 107b is a unit that writes and reads data to and from a memory card 107c. The memory card 107c is a card-type recording medium.
The network interface 108 is connected to a network 108a. The network interface 108 transmits and receives data to and from another computer or a communication unit through the network 108a.
With the hardware structure described above, the design method illustrated in
The design apparatus 100 implements the above design method by executing a program recorded on, for example, a computer-readable recording medium. The program in which processing to be executed by the design apparatus 100 is coded may be recorded in advance in any of various recording media. For example, the program to be executed by the design apparatus 100 may be stored in advance in the HDD 103. To have the design apparatus 100 execute the program in the HDD 103, the processor 101 loads at least part of the program from the HDD 103 into the RAM 102. It is also possible to record the program to be executed by the design apparatus 100 in advance in the optic disk 106a, the memory unit 107a, the memory card 107c, or another portable recording medium. The program stored in the portable recording medium is installed in the HDD 103 under control of, for example, the processor 101, making the program executable. It is also possible for the processor 101 to read the program directly from the portable recording medium and execute the program.
An example of routing processing in step S14 in
Example of Routing Processing
The functional block in
The routing processing unit 110 and storage unit 111 are implemented by, for example, the processor 101, RAM 102, and HDD 103 in the design apparatus 100 in
During routing processing, the routing processing unit 110 represents the internal wire structure of the FPGA as a directed graph.
In the wire structure in
In
In the directed graph in
In this directed graph, a delay in each of the wires w1 to w12 is represented as a weight of a side that mutually connects vertexes. When a path between a source and a sink that minimizes a cost obtained by a cost function described later is obtained, an optimum assignment to signal wires may be determined.
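A minimum-cost path search on such a directed graph can be sketched as follows. The graph in the example is a hypothetical fragment, not the wire structure of the figures, and the cost used here is simply the sum of the edge delays (Dijkstra's algorithm), which stands in for the cost function of the text. The function name `min_delay_path` is an assumption.

```python
import heapq


def min_delay_path(graph, source, sink):
    """Dijkstra search on a directed graph with non-negative edge delays.

    graph: dict mapping a node to a list of (neighbor, delay) pairs.
    Returns (total_delay, [source, ..., sink]) or (None, []) if unreachable.
    """
    best = {source: 0.0}
    parent = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == sink:
            # Walk the parent links back from the sink to the source.
            path = [sink]
            while path[-1] != source:
                path.append(parent[path[-1]])
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, delay in graph.get(node, []):
            new_cost = cost + delay
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parent[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    return None, []


if __name__ == "__main__":
    # Hypothetical fragment: a source pin, three wires, and a sink pin.
    graph = {
        "src": [("wa", 1.0), ("wb", 2.0)],
        "wa": [("wc", 3.0)],
        "wb": [("wc", 1.0), ("sink", 4.0)],
        "wc": [("sink", 1.0)],
    }
    print(min_delay_path(graph, "src", "sink"))
```

In this fragment the cheapest route runs through `wb` and `wc` with a total delay of 4.0, even though the first hop to `wa` is cheaper.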
As a preparation, the routing processing unit 110 allows a plurality of signals to be assigned to a single wire and performs wire processing so as to minimize a delay.
First, the routing processing unit 110 determines whether there is a wire shared by a plurality of signals (step S20). If the routing processing unit 110 determines that there is a wire shared by a plurality of signals, the routing processing unit 110 performs processing in S21 and later. If the routing processing unit 110 determines that there is no wire shared by a plurality of signals, the routing processing unit 110 terminates the wire processing.
In processing in step S21, the routing processing unit 110 initializes a wire tree RTi, which holds a path through which a signal i is transmitted. Upon completion of the initialization of the wire tree RTi, the routing processing unit 110 adds the source si of the path through which the signal i is transmitted to the wire tree RTi (step S22).
The routing processing unit 110 then uses a breadth-first search method to search for a sink by a procedure described below.
The routing processing unit 110 initializes a prioritized queue PQ and adds a leaf node (sink) of the wire tree RTi to the prioritized queue PQ, assuming that the cost is 0 (step S23). The routing processing unit 110 then retrieves a node m that minimizes the cost from the prioritized queue PQ (step S24) and determines whether the node m is a sink tij at which the signal i has not arrived (step S25).
If the routing processing unit 110 determines that the node m is not the sink tij at which the signal i has not arrived, the routing processing unit 110 calculates a cost about a node n adjacent to the node m and adds the calculated cost to the prioritized queue PQ (step S26). An example of processing in S26 will be described later. Upon completion of processing in step S26, the routing processing unit 110 repeatedly executes processing in step S24 and later.
If the routing processing unit 110 determines that the node m is the sink tij at which the signal i has not arrived, the routing processing unit 110 selects one node n included in the path in a reverse order, that is, from the sink tij to the source si, and updates the cost (step S27), after which the routing processing unit 110 adds the selected node n to the wire tree RTi (step S28).
The routing processing unit 110 determines whether processing has been completed for all nodes n in the path from the sink tij to the source si (step S29). If the routing processing unit 110 determines that processing has not been completed for all nodes n, the routing processing unit 110 repeatedly executes processing in step S27 and later.
If the routing processing unit 110 determines that processing has been completed for all nodes n in the path from the sink tij to the source si, this indicates that a path from the source si to a certain sink tij has been obtained. The routing processing unit 110 then determines whether all sinks tij of the signal i have been searched for (step S30). If there are a plurality of sinks tij and at least one of the plurality of sinks tij has not been searched for, the routing processing unit 110 repeatedly executes processing in step S23 and later.
If the routing processing unit 110 determines that all sinks tij have been searched for, the routing processing unit 110 determines whether all signals have been assigned to wires (step S31). Since the wire tree RTi only temporarily stores the path for the signal i, the wire tree RTi is initialized each time a wire has been assigned to one signal. Therefore, if the routing processing unit 110 determines that not all signals have been assigned to wires, the routing processing unit 110 repeatedly executes processing in step S21 and later.
If the routing processing unit 110 determines that all signals have been assigned to wires, the routing processing unit 110 makes a determination in step S20 again. If there is a wire shared by a plurality of signals, the routing processing unit 110 executes routing processing again for all signals. The coefficient of the cost function changes according to the result of the previous routing processing, so in the next routing processing, a signal passes through a path different from the previous path. When assignment of all signals to wires is repeatedly executed, eventually no wire is shared by a plurality of signals. This completes routing processing.
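As an illustrative sketch only, steps S21 to S30 for a single signal i may be written in Python as follows. The names route_signal, adj, node_cost, and the example graph are hypothetical and are introduced only for this sketch. The sketch assumes non-negative node costs and seeds the queue with every node already held in the wire tree, which is one possible reading of step S23.

```python
import heapq

def route_signal(adj, cost, source, sinks):
    """Return the wire tree (list of nodes) for one signal, steps S21 to S30."""
    tree = [source]                      # S21, S22: initialise RTi, add the source
    remaining = set(sinks)               # sinks the signal has not reached yet
    while remaining:                     # S30: repeat until all sinks are found
        # S23 (assumed reading): every node already in the tree starts at cost 0.
        pq = [(0.0, node) for node in tree]
        heapq.heapify(pq)
        best = {node: 0.0 for node in tree}
        parent = {}
        while pq:
            path_cost, m = heapq.heappop(pq)       # S24: node with minimum cost
            if path_cost > best.get(m, float("inf")):
                continue                           # outdated queue entry
            if m in remaining:                     # S25: unreached sink found
                break
            for n in adj.get(m, ()):               # S26: cost of adjacent nodes
                new_cost = path_cost + cost(n)
                if new_cost < best.get(n, float("inf")):
                    best[n] = new_cost
                    parent[n] = m
                    heapq.heappush(pq, (new_cost, n))
        else:
            raise ValueError("a sink cannot be reached from the wire tree")
        node = m
        while node not in tree:                    # S27 to S29: sink back to tree
            tree.append(node)                      # S28: add node to RTi
            node = parent[node]
        remaining.discard(m)
    return tree

# Hypothetical routing graph: source "s", sinks "t1" and "t2".
adj = {"s": ["a", "b"], "a": ["t1"], "b": ["t1", "t2"], "t1": [], "t2": []}
node_cost = {"s": 0.0, "a": 1.0, "b": 3.0, "t1": 1.0, "t2": 1.0}
tree = route_signal(adj, node_cost.__getitem__, "s", ["t1", "t2"])
```

In this sketch the cost callable stands in for the cost functions used in step S26, and the outer iteration over all signals (steps S20 and S31) is omitted.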
Next, an example of processing in step S26 will be described for a case in which a wire used to store charge is not changed (see
When a Wire Used to Store Charge is Fixed:
The routing processing unit 110 selects one node n adjacent to the node m that is a candidate to which the signal i is to be assigned (step S40), after which the routing processing unit 110 determines whether the wire used to store charge of the node n has been already used as a candidate to which another signal j is to be assigned (step S41).
If the wire used to store charge of the node n has not been used, the routing processing unit 110 uses a cost function A to calculate a cost represented by Costn+Pim and adds the calculated cost to the prioritized queue PQ (step S42). Costn is the cost of the node n and Pim is a cost taken from the source si to the node m.
In processing in step S42, the routing processing unit 110 uses the Pathfinder method (see Document 4) to calculate Costn as the cost function A according to equation (12) below.
Cost(n)=(1−Crit(i))·cong_cost(n)+Crit(i)·delay_cost(n) (12)
In equation (12), Crit(i), which is represented as in equation (13) below, represents the slack ratio of the signal i.
In equation (13), delayi indicates a delay of the signal i and delaymax indicates the maximum delay among all signals. Therefore, Crit(i) is greater than 0 but is at most 1. The closer to the maximum delay the delay of the signal i is (the smaller a margin in timing is), the closer to 1 Crit(i) is; the further away from the maximum delay the delay of the signal i is (the larger a margin in timing is), the closer to 0 Crit(i) is.
In equation (12), cong_cost(n) indicates the degree of congestion at the node n; the more signals for which the node n has been assigned as a candidate, the larger the value of cong_cost(n) is. delay_cost(n) indicates the delay of the wire n.
As described above, in processing in step S24 in
The larger the margin in timing is (the closer Crit(i) is to 0), the more dominant cong_cost(n) becomes. In that case, the lower the degree of congestion at the node n is, the lower the cost is. That is, as the degree of congestion at the node n becomes lower, the node n is more likely to be selected in processing in step S24.
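The behavior of the cost function A may be illustrated by the following Python sketch. The function names crit and cost_a are hypothetical. The sketch also assumes that Crit(i) is the ratio of delayi to delaymax, which is consistent with the description of equation (13) above but is an assumption of this sketch.

```python
def crit(delay_i, delay_max):
    """ASSUMPTION: Crit(i) taken as delay_i / delay_max, so 0 < Crit(i) <= 1."""
    return delay_i / delay_max

def cost_a(crit_i, cong_cost_n, delay_cost_n):
    """Cost function A, equation (12)."""
    return (1.0 - crit_i) * cong_cost_n + crit_i * delay_cost_n

# Signal with a large timing margin: congestion dominates the cost.
relaxed = cost_a(crit(1.0, 8.0), cong_cost_n=4.0, delay_cost_n=2.0)
# Signal on the critical path: only the wire delay contributes.
critical = cost_a(crit(8.0, 8.0), cong_cost_n=4.0, delay_cost_n=2.0)
```

With these hypothetical values, the relaxed signal sees a cost close to cong_cost(n), and the critical signal sees a cost equal to delay_cost(n).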
If the wire used to store charge of the node n has been already used, the routing processing unit 110 uses a cost function B to calculate a cost represented by Costn+Pim and adds the calculated cost to the prioritized queue PQ (step S43).
In processing in step S43, the routing processing unit 110 calculates Costn as the cost function B according to equation (14) below.
Cost(n)=(1−Crit(i))·[cong_cost(n)+(1−αi)·res_cost(n)+αi·PF]+Crit(i)·delay_cost(n) (14)
In equation (14), αi is the operation ratio of the signal i, PF is an adjustment constant, and res_cost(n) is represented as in equation (15) below.
res_cost(n)=(1−Crit(j))·αj (15)
In equation (15), j indicates the index of a signal that has been assigned to a wire used to store charge of the node n, which is a candidate to which the signal i is to be assigned.
When the timing of the signal i is critical (Crit(i) is close to 1), delay_cost(n) becomes dominant in the cost given by equation (14). Therefore, the longer the delay of the node n is, the higher the cost is, so the node n is less likely to be selected as a candidate of a path for the signal i in processing in step S24. As described above, a wire used to reuse charge causes a large delay. Therefore, if the original delay (wire delay) of the node n is long, this processing makes the node n hard to select as a candidate of a path for the signal i. This suppresses the reuse of charge at a node n having a long delay, so a further reduction of the margin in timing may be suppressed.
By contrast, if the delay of the node n is short, the node n is likely to be selected as a candidate of the path for the signal i in processing in step S24. If charge is reused for the node n, therefore, power consumption may be reduced.
If there is a margin in the timing of the signal i (Crit(i) is close to 0), the larger the operation ratio αi of the signal i is, the more dominant the values of cong_cost(n) and αi·PF are. If cong_cost(n) is unchanged and the value of αi·PF is dominant, this indicates that PF has been appropriately adjusted so that the cost is reduced. In processing in step S24, therefore, the node n is likely to be selected as a candidate of the path for the signal i. That is, the node n is likely to be selected as a candidate of the path for the signal i and charge is reused for the node n, so power consumption may be reduced.
If there is a margin in the timing of the signal i and the operation ratio of the signal i is low, the values of cong_cost(n) and res_cost(n) are dominant. If cong_cost(n) is unchanged and there is a margin in the timing of the signal j (Crit(j) is close to 0), when the operation ratio of the signal j already assigned to a wire used to store charge is high, the cost becomes high.
In processing in step S24, therefore, the node n is less likely to be selected as a candidate of the path for the signal i. Then, a wire, at the node n, used to store charge is likely to be selected as part of a path through which the signal j is transmitted, enabling the reuse of charge to be easily performed for a wire through which the signal j is transmitted. When the reuse of charge is performed, as described above, for a wire through which the signal j with a high operation ratio is transmitted, the effect of reducing power consumption is increased.
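The tendencies described above for the cost function B may be checked with a short Python sketch of equations (14) and (15). The function names res_cost and cost_b and all numeric values, including the value chosen for the adjustment constant PF, are hypothetical and serve only as an illustration.

```python
def res_cost(crit_j, alpha_j):
    """Equation (15): cost due to the signal j that already uses the storage wire."""
    return (1.0 - crit_j) * alpha_j

def cost_b(crit_i, alpha_i, cong_cost_n, res_cost_n, pf, delay_cost_n):
    """Cost function B, equation (14)."""
    shared = cong_cost_n + (1.0 - alpha_i) * res_cost_n + alpha_i * pf
    return (1.0 - crit_i) * shared + crit_i * delay_cost_n

# Signal i with a timing margin (Crit(i) = 0) and a low operation ratio (0).
# The storage wire is already used by a signal j that also has a timing margin.
busy_j = cost_b(0.0, 0.0, 2.0, res_cost(0.0, 0.75), pf=1.0, delay_cost_n=9.0)
idle_j = cost_b(0.0, 0.0, 2.0, res_cost(0.0, 0.25), pf=1.0, delay_cost_n=9.0)

# Timing-critical signal i (Crit(i) = 1): only the wire delay remains.
critical_i = cost_b(1.0, 0.5, 2.0, res_cost(0.0, 0.75), pf=1.0, delay_cost_n=9.0)
```

In this sketch, a higher operation ratio of the signal j yields a higher cost for the signal i, so the node n tends to be left to the signal j.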
Upon completion of processing in steps S42 and S43, the routing processing unit 110 determines whether processing has been completed for all nodes n adjacent to the node m (step S44).
If the routing processing unit 110 determines that processing has not been completed for all nodes n adjacent to the node m, the routing processing unit 110 repeatedly executes processing in step S40 and later. If the routing processing unit 110 determines that processing has been completed for all nodes n adjacent to the node m, the routing processing unit 110 terminates cost calculation concerning the node n adjacent to the node m and addition processing to add the calculated cost to the prioritized queue PQ.
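Steps S40 to S44 may be sketched in Python as follows. The function expand_fixed and its parameters are hypothetical names. The two cost functions are passed in as callables so that the sketch shows only the selection between the cost function A and the cost function B.

```python
import heapq

def expand_fixed(pq, path_cost_m, neighbors, storage_used, cost_a, cost_b):
    """Push Costn + Pim for every node n adjacent to the node m (S40 to S44)."""
    for n in neighbors:                      # S40 and S44: visit each adjacent node
        if storage_used.get(n, False):       # S41: storage wire already claimed?
            cost_n = cost_b(n)               # S43: cost function B
        else:
            cost_n = cost_a(n)               # S42: cost function A
        heapq.heappush(pq, (path_cost_m + cost_n, n))
    return pq

# Hypothetical example: the storage wire of "n2" is already used by another signal.
pq = expand_fixed([], 2.0, ["n1", "n2"], {"n2": True},
                  cost_a=lambda n: 1.0, cost_b=lambda n: 4.0)
cheapest = pq[0]   # the entry that step S24 would retrieve next
```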
When a Wire Used to Store Charge May be Arbitrarily Changed:
The routing processing unit 110 selects one node n adjacent to the node m that is a candidate to which the signal i is to be assigned (step S50), after which the routing processing unit 110 selects one non-processed wire from wires used to store charge of the node n (step S51).
The routing processing unit 110 then determines whether the wire used to store charge of the node n has been already used as a candidate to which another signal j is to be assigned or as a candidate of a wire used to store charge used by a wire through which the other signal j is transmitted (step S52).
If the wire used to store charge of the node n has not been used, the routing processing unit 110 uses the cost function A represented as in equations (12) and (13) above to calculate the cost represented by Costn+Pim (step S53).
If the wire used to store charge of the node n has been already used, the routing processing unit 110 uses the cost function B represented as in equations (14) and (15) above to calculate the cost represented by Costn+Pim (step S54).
In equation (15) in processing in
When the timing of the signal i is critical or when there is a margin for the timing of the signal i and the operation ratio αi of the signal i is large, the same effect as in the case, described above, in which a wire used to store charge is not changed may be obtained.
If there is a margin in the timing of the signal i and the operation ratio αi of the signal i is low, the values of cong_cost(n) and res_cost(n) are dominant. At this time, if the operation ratio αj of the signal j described above is large, the cost becomes high. Therefore, the node n is less likely to be selected as a candidate of a path for the signal i in processing in step S24. Then, a wire, at the node n, used to store charge is likely to be selected as part of a path through which the signal j is transmitted or as a wire used to store charge of the wire through which the signal j is transmitted. When the reuse of charge is easily applied to a wire through which the signal j with a high operation ratio is transmitted as described above, the effect of reducing power consumption is increased.
Upon completion of processing in steps S53 and S54, the routing processing unit 110 determines whether processing in steps S51 to S54 has been completed for all wires, at the node n, used to store charge (step S55).
If the routing processing unit 110 determines that processing in steps S51 to S54 has not been completed for all wires, at the node n, used to store charge, the routing processing unit 110 repeatedly executes processing in step S51 and later.
If the routing processing unit 110 determines that processing in steps S51 to S54 has been completed for all wires, at the node n, used to store charge, the routing processing unit 110 selects the wire that minimizes the cost from all wires and adds the determined cost to the prioritized queue PQ (step S56).
Then, the routing processing unit 110 determines whether processing in steps S50 to S56 has been completed for all nodes n adjacent to the node m (step S57).
If the routing processing unit 110 determines that processing in steps S50 to S56 has not been completed for all nodes n adjacent to the node m, the routing processing unit 110 repeatedly executes processing in step S50 and later.
If the routing processing unit 110 determines that processing in steps S50 to S56 has been completed for all nodes n adjacent to the node m, the routing processing unit 110 terminates cost calculation concerning the node n adjacent to the node m and addition processing to add the calculated cost to the prioritized queue PQ.
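Steps S50 to S57 may be sketched in Python as follows. The function expand_selectable, the dictionary storage_wires, and the set in_use are hypothetical names. The sketch assumes that each adjacent node has at least one candidate wire used to store charge and skips a node that has none.

```python
import heapq

def expand_selectable(pq, path_cost_m, storage_wires, in_use, cost_a, cost_b):
    """For each adjacent node n, keep the cheapest storage wire (S50 to S57)."""
    for n, wires in storage_wires.items():       # S50 and S57: each adjacent node
        best = None
        for w in wires:                          # S51 and S55: each storage wire
            if w in in_use:                      # S52: wire already claimed?
                cost_n = cost_b(n, w)            # S54: cost function B
            else:
                cost_n = cost_a(n)               # S53: cost function A
            if best is None or cost_n < best[0]:
                best = (cost_n, w)
        if best is not None:                     # S56: push the minimum cost
            heapq.heappush(pq, (path_cost_m + best[0], n, best[1]))
    return pq

# Hypothetical example: "n1" may choose between two storage wires.
pq = expand_selectable([], 2.0, {"n1": ["w1", "w2"], "n2": ["w3"]},
                       in_use={"w1", "w3"},
                       cost_a=lambda n: 1.0, cost_b=lambda n, w: 4.0)
```

Here the node n1 is queued with the unused wire w2, whereas the node n2 has only a wire that is already in use and is therefore queued with the higher cost.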
An example of charge recycling mode selection processing S15 in
Example of Charge Recycling Mode Selection Processing
In addition to the storage unit 111 in
The timing graph creating unit 130 creates a timing graph D9 according to the net list D5, placement information D6, and wire information D8, which are stored in the storage unit 111, and stores the created timing graph D9 in the storage unit 111.
The objective function creating unit 131 creates an objective function D10 according to the signal operation ratio information D7 and timing graph D9, which are stored in the storage unit 111, and stores the created objective function D10 in the storage unit 111.
The restrictive condition creating unit 132 creates a restrictive condition D12 for each node according to the timing graph D9 and timing restriction D11, which are stored in the storage unit 111, and stores the created restrictive condition D12 in the storage unit 111.
The charge reuse enabling/disabling unit 133 creates a charge reuse enabling/disabling setting D13 according to the objective function D10 and restrictive condition D12, which are stored in the storage unit 111, and stores the created charge reuse enabling/disabling setting D13 in the storage unit 111. The charge reuse enabling/disabling unit 133 in the second embodiment is a mixed integer linear programming (MILP) solver, which uses a MILP method to determine which wire is eligible for the reuse of charge.
The timing graph creating unit 130, objective function creating unit 131, restrictive condition creating unit 132, charge reuse enabling/disabling unit 133, and storage unit 111 may be implemented by, for example, the processor 101, RAM 102, HDD 103, and other components in the design apparatus 100 illustrated in
An example of determining a wire for which to enable the reuse of charge will be described below.
The AND circuit 140 in
The timing graph creating unit 130 creates a timing graph as described below from a logic circuit as illustrated in
In the timing graph G(V, E) in
Then, a delay Dv of a certain wire (vertex) v in a set V of vertexes is represented as in equation (16) below.
Dv=DIntrinsicv+γv·δv (16)
In equation (16), DIntrinsicv represents a delay in a case in which charge is not reused, γv represents a binary (0 or 1) variable, which indicates whether a wire v uses the charge recycling mode (whether the reuse of charge is enabled or disabled), and δv represents a delay that is added when the reuse of charge is enabled.
The worst-case delay Arrv up to a certain vertex v is represented as in equation (17) below.
∀(u,v)∈E: Arrv≥Arru+Dv (17)
To determine the delay in the worst case, the delays of all inputs to the vertex v may be combined by using a max function. However, since a mixed integer programming method is used in the second embodiment, the delay in the worst case is represented by the inequalities in equation (17) above. If the vertex v has only one input, the inequality sign in equation (17) is replaced with an equality sign.
If a subset of the set V that is an end of a combinational circuit (such as the output terminal of an entire circuit or input terminals of flip flops) is denoted CO and a user-defined timing restriction is denoted T, the restrictive condition creating unit 132 generates a restrictive condition represented as in equation (18) below.
∀v∈CO: Arrv≤T (18)
If the wires (including free wires) to which the reuse of charge is applicable, these wires being a subset of the set V, are denoted CR, the objective function creating unit 131 creates an objective function represented as in equation (19) below.
In equation (19), αi indicates the operation ratio of a vertex i, the first term represents the number of wires to which the reuse of charge is applicable, each wire being weighted by the operation ratio, and the next term represents the sum of delays.
When the charge reuse enabling/disabling unit 133 uses the conditions in equations (16) to (18) as well as equation (19) to determine γ that maximizes Φ by the mixed integer programming method, the charge reuse enabling/disabling unit 133 may enable the reuse of charge for the largest number of wires.
If, in the timing graph G(V, E) in
DPI1=DPI2=0
DA=1+γA·0.5
DB=1+γB·0.5
DC=1+γC·0.5
DF=1+γF·0.5
DI1=DI2=0
DO=2
DPO=0 (20)
ArrA=ArrPI1+DA
ArrB=ArrPI2+DB
ArrC=ArrB+DC
ArrI1=ArrA+DI1
ArrI2=ArrC+DI2
ArrO≥ArrI1+DO
ArrO≥ArrI2+DO
ArrF=ArrO+DF
ArrPO=ArrF+DPO (21)
If the timing restriction T is 5 ns, then equation (22) may be obtained from equation (18) as a restrictive condition.
ArrPO≤5 (22)
The objective function is represented as in equation (23) below.
Φ=γA+γB+γC+γF−ArrO (23)
The charge reuse enabling/disabling unit 133 determines a solution that enables the reuse of charge for the largest number of wires by using the mixed integer programming method according to the restrictive condition in equation (22) and the objective function in equation (23).
In the solution in the above example, γA is 1, γB is 0, γC is 0, and γF is 0. If the reuse of charge is enabled for the wire A and is disabled for the wires B, C and F, the result is that the reuse of charge may be enabled for the largest number of wires without violating the timing restriction.
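Because the example has only four binary variables, the solution may also be confirmed by exhaustive enumeration. The following Python sketch is such a check and is not the mixed integer programming method itself. It assumes that the delay up to each primary input is 0 and that ArrO takes the smallest value allowed by the two inequalities for the wire O. The names arrival_times and best_setting are hypothetical.

```python
from itertools import product

T = 5.0      # timing restriction in ns
EXTRA = 0.5  # delay added to a wire when the reuse of charge is enabled

def arrival_times(g_a, g_b, g_c, g_f):
    """Return (ArrO, ArrPO) for one choice of the binary variables."""
    d_a = 1.0 + g_a * EXTRA
    d_b = 1.0 + g_b * EXTRA
    d_c = 1.0 + g_c * EXTRA
    d_f = 1.0 + g_f * EXTRA
    arr_a = 0.0 + d_a                 # ASSUMPTION: delay up to PI1 is 0
    arr_b = 0.0 + d_b                 # ASSUMPTION: delay up to PI2 is 0
    arr_c = arr_b + d_c
    arr_o = max(arr_a, arr_c) + 2.0   # I1 and I2 add no delay; DO is 2
    arr_po = arr_o + d_f              # DPO is 0
    return arr_o, arr_po

def best_setting():
    """Enumerate all 16 settings and keep the feasible one with the largest value."""
    best = None
    for gammas in product((0, 1), repeat=4):
        arr_o, arr_po = arrival_times(*gammas)
        if arr_po > T:                # timing restriction violated
            continue
        phi = sum(gammas) - arr_o     # objective function of the example
        if best is None or phi > best[0]:
            best = (phi, gammas)
    return best

phi, (gamma_a, gamma_b, gamma_c, gamma_f) = best_setting()
```

Under these assumptions, the path through the wires B, C, O, and F already takes 5 ns with the reuse of charge disabled, so only the wire A has a margin in timing that allows the additional 0.5 ns.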
Thus, power consumption may be further reduced without violating the timing conditions.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2013-180056 | Aug 2013 | JP | national |