The present invention relates to the field of circuit design, synthesis and verification. In particular it relates to very large integrated circuit design synthesis and verification, and even more particularly it relates to a system, method and computer program product for synthesis and verification of clock gating on system on chip integrated circuits.
Power usage of a system on chip (SOC) integrated circuit (IC) is a major concern during its design. The increasing integration of functions into the SOC, with higher and higher speeds of operation, has created a need to find methods for reducing power consumption. One such method is to selectively switch off unused portions or functional blocks of the SOC during operation. Such power reduction in SOCs during operation is achieved by performing clock gating, where the clock signals (clocks) provided to the flip-flops (flops) within the functional block are switched off, thereby disabling the section of the circuit that is not being used.
Typical clock gating can be split into two classes, combinational clock gating and sequential clock gating. Combinational clock gating is the process of computing an explicit enable for a flop and using this enable to gate the clock to the flop. This requires only a combinational analysis of the design, leading to synthesis of a gating circuit for the clock to the flop. Synthesis tools easily perform any required combinational clock gating functionality to reduce the power of an SOC during synthesis.
On the other hand sequential clock gating is the process of computing an implicit enable for a flop. Since this requires a sequential analysis of the design, synthesis tools are not usually equipped to generate and implement such sequential clock gating circuits effectively.
Several specialized techniques to perform sequential clock gating have been published. The most common published techniques derive the observability don't care (ODC) condition, that is, the condition for which a flop is not observable, and the stability (STC) condition, that is, the condition for which the input value of the flop does not change. It is necessary to identify the ODC and STC conditions and use these conditions as an implicit enable to gate the flop.
In the above case, existing methods derive an enable which is a delay of (EN1 || EN2 || EN3). To find this enable, existing methods traverse the fan-in of FF4 114 until reaching the three flops FF1 111, FF2 112 and FF3 113, and extract the STC condition of these flops, that is, the states of enables EN1 101, EN2 102 and EN3 103. They then perform an OR of these enables EN1 101, EN2 102 and EN3 103 and delay the result by a clock cycle to compute the final STC condition, that is, the enable of FF4 114.
The resultant circuit 200 diagram is shown in
However, the current methods of computing STC suffer from several limitations. They cannot identify STC conditions in all cases, they do not take into account the activity of the net, and none of the prior art methods provides a solution for synthesis of clock gating in an existing gated pipeline design. These limitations of the current STC computation are detailed below using the
The power dissipation of this pipeline design is a factor of the enable EN4 301, which is used to enable the clock 350 of the pipeline stages. If this enable is set to active, or a value of <1>, for a long period of time, that is, active for a large number of clock cycles at a time, the efficiency of clock gating using EN4 301 is minimal and such clock gating will not decrease the active power of FF4-1 310. Since the clock gating has to be sequentially delayed for the pipeline stages FF4-1 310 to FF4-4 313, the probability of gating based on the pipeline flops is limited, and the current methods of deriving STC are not sufficient to compute the STC of the pipeline stage. Further, in order to compute the STC of flop FF4-1 310, the fan-in of the first flop FF4-1 310 of the pipeline has to be traversed. The fan-in traversal within the circuit will encounter a primary reset input 304, a flop FF3 303 which is a flop without an enable, and two flops FF1 111 and FF2 112 with enables. Though the condition of the flops FF1 111 and FF2 112 has been covered in the prior art, the other two conditions, namely having a primary input (PI) and having a flop without enable in the fan-in traversal path, are not covered by the prior art STC computation methods. Due to these limitations the STC of flop FF4-1 310 cannot be computed using the prior art methods.
It is hence necessary and useful to find a solution that can provide full clock gating synthesis and verification coverage for a gated design including gated pipeline designs.
A computation, design synthesis method implemented on a computing system is provided. The method begins by identifying a first selected flip flop (flop) in the design for clock gating and then traversing a fan-in path of the flop to a termination in a component that is one of a primary input, a flop with enable and a flop without enable. Next, a stability condition (STC condition) of the first selected flip-flop (flop) in the design is computed for each of the terminations reached using the XOR based computation, and computed STC conditions are combined to generate a consolidated STC condition for the first flop. An implementation for the consolidated STC condition is generated such that the consolidated STC condition in semiconductor design generates the necessary clock gating signal for the identified first flop.
The STC condition for the fan-in path of the first selected flop ending in the component that is the primary input to the semiconductor design is generated by first delaying the primary input by a clock cycle to generate a delayed primary input and then generating an XOR function of the primary input with the delayed primary input. The STC condition for the fan-in path of the first selected flop ending in the components that are flops with enables is generated by first generating delayed enables, where each of the enables is delayed by a clock cycle, and then generating an OR function of all the delayed enables. The STC condition for the fan-in path of the first selected flop ending in the component that is the flop without enable is generated by first generating an XOR function of the input of the flop with the output of the flop and then delaying the XOR output by a clock cycle. The STC condition for clock gating of the first selected flop may be generated by computing an OR function of the STC conditions of the individual terminating components of the fan-in paths of the first selected flop.
A circuit implementation is provided for generating a pipeline clock gating (pipeline gating) using a stability condition (STC condition) for a pipeline in a semiconductor design with an active enable. The implementation comprises computing a first STC condition of a first flip-flop (flop) of the pipeline in the semiconductor design with the enable in an enabled state; generating a second STC condition, called the New_STC condition, for the first flop of the pipeline in the semiconductor design by: generating an OR function of the first STC condition and an inversion of the New_STC condition; delaying the result produced by a clock cycle using a second flop; and computing the New_STC condition by generating an AND function of the delayed output of the second flop with the active enable.
The circuit design process requires ways to reduce the power consumption of large integrated circuits (ICs) and system on chip designs. This is typically done by introducing a process of clock gating thereby enabling or disabling flip-flops (flops) associated with specific functional blocks within the IC. However, such changes in the circuit require synthesis and verification to ensure correctness of design and operation as sequential clock gating changes the state function dynamically. It is therefore necessary to define synthesis methods adapted to such dynamic changes in the design. According to an embodiment a sequential clock gating method uses an exclusive OR (XOR) technique to overcome the deficiencies of the prior art methods.
According to an embodiment a method described herein overcomes these limitations of the prior art by using a XOR technique. Exemplary and non-limiting
The STC of a PI is derived by delaying the PI by one cycle and generating the XOR of the delayed PI with the original PI. Hence STC (PI)=[{delay (PI)} XOR PI]. In Verilog STC (PI)=[{delay (PI)} XOR PI] can be written as:
reg delay_PI;                            // PI delayed by one clock cycle
always @(posedge clk) delay_PI <= PI;
wire stc_PI = delay_PI ^ PI;             // 1 when PI differs from its previous value
The implementation of the STC of PI is shown in
The STC of a flop without enable is computed by generating the XOR of the input of the flop with the output of the flop. The Verilog implementation of this STC condition is:
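A minimal sketch of such an implementation is given below. It includes the one-clock-cycle delay of the XOR output described for this termination type. The module name, the port names and the simulation-only initial register value are assumptions made for illustration and are not taken from the drawings.

```verilog
// Sketch only: STC for a fan-in flop that has no enable.
// Names are hypothetical; they do not come from the drawings.
module stc_flop_no_enable (
    input  wire clk,
    input  wire flop_d,   // D input of the fan-in flop
    input  wire flop_q,   // Q output of the same flop
    output wire stc       // 1 in the cycle after d and q differed
);
    // One-cycle delay of (d XOR q). The initial value is a
    // simulation convenience, not a statement about reset behavior.
    reg xor_delayed = 1'b0;

    always @(posedge clk)
        xor_delayed <= flop_d ^ flop_q;

    assign stc = xor_delayed;
endmodule
```

When d and q differ before a clock edge, the flop output takes a new value at that edge, so the delayed XOR is high exactly in the cycle in which the downstream logic sees the changed value.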
In
Considering the pipeline design 300 of
STC_FF4-1 = delay (EN1 OR EN2
OR (FF3_d XOR FF3_q))
OR (reset XOR delay_reset)
That design, when written in Verilog-style notation in which delay( ) denotes a one-clock-cycle register, is:
STC_FF4_1 = delay(EN1 || EN2 || (FF3_d ^ FF3_q)) || (reset ^ delay_reset);
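For illustration, the same expression can be spelled out with explicit registers in place of the delay operator. The following is a sketch under stated assumptions rather than the disclosed circuit: the module name, the port names and the zero initial register values are hypothetical, and the hyphenated signal name STC_FF4-1 is written with underscores because hyphens are not legal in Verilog identifiers.

```verilog
// Sketch only: consolidated STC condition for FF4-1.
// Module, port and register names are hypothetical.
module stc_ff4_1_gen (
    input  wire clk,
    input  wire en1,      // enable of FF1 (EN1)
    input  wire en2,      // enable of FF2 (EN2)
    input  wire ff3_d,    // D input of FF3, a flop without enable
    input  wire ff3_q,    // Q output of FF3
    input  wire reset,    // primary input reached by the fan-in traversal
    output wire stc_ff4_1 // STC_FF4-1 in the text
);
    // Initial values are simulation conveniences (assumption).
    reg flop_terms_delayed = 1'b0;  // delay(EN1 OR EN2 OR (FF3_d XOR FF3_q))
    reg delay_reset        = 1'b0;  // reset delayed by one cycle

    always @(posedge clk) begin
        flop_terms_delayed <= en1 | en2 | (ff3_d ^ ff3_q);
        delay_reset        <= reset;
    end

    // OR of the flop-based terms with the primary-input term.
    assign stc_ff4_1 = flop_terms_delayed | (reset ^ delay_reset);
endmodule
```

The primary-input term is not registered a second time, because the XOR of the input with its own delayed copy already reflects a change in the current cycle.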
In the case of the pipeline design 300 of
If the enable EN4 301 is an active enable for the gated flops FF4-1 310 to FF4-4 313 of the pipeline, then the STC_FF4-1 generated cannot be directly applied as gated clock 450 to FF4-1 310, as it will change the functionality of the circuit. This condition is shown in exemplary and non-limiting
The effect of the enable or gating is introduced into the STC_FF4-1 as follows to generate New_STC_FF4-1:
Delay1 <= (INVERT (EN4 AND Delay1))
OR STC_FF4-1;
New_STC_FF4-1 = Delay1 AND EN4
This can be generated in Verilog as:
reg Delay1;                                  // see the note on its initial state
always @(posedge clk)
  Delay1 <= !(EN4 && Delay1) || STC_FF4_1;   // STC_FF4-1 in the text
wire New_STC_FF4_1 = Delay1 && EN4;          // New_STC_FF4-1 in the text
(Note: here the initial state of the delay1=<1>)
The signal wire New_STC_FF4-1 is now the enable for FF4-1, called New_enable_FF4-1 550.
This modification is shown in
It should be understood that, as shown in
Power savings from any STC based clock gating will be limited where the inputs change often, that is, where the activity level is high due to input changes or enable changes. Hence, before establishing a clock gating scheme, the activity level of the STC condition has to be evaluated to determine whether the clock gating is worthwhile. An activity level above a threshold, for example activity over 50%, negates the power saving effect of clock gating.
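One way such an activity level could be estimated during simulation is sketched below. The sketch assumes that activity is measured as the fraction of clock cycles in which the monitored STC signal is high, which is an interpretation and not a definition given in this disclosure. The module, its parameter and its ports are hypothetical, and the counters simply wrap if they overflow.

```verilog
// Sketch only: simulation-time activity monitor for an STC signal.
// Activity is assumed to mean "fraction of cycles with stc high".
module stc_activity_monitor #(parameter WIDTH = 32) (
    input  wire             clk,
    input  wire             stc,        // STC condition being evaluated
    output reg  [WIDTH-1:0] cycles,     // clock cycles observed
    output reg  [WIDTH-1:0] active,     // cycles in which stc was high
    output wire             above_half  // 1 when activity exceeds 50%
);
    initial begin
        cycles = 0;
        active = 0;
    end

    always @(posedge clk) begin
        cycles <= cycles + 1'b1;
        if (stc)
            active <= active + 1'b1;
    end

    // Compare 2*active with cycles using one extra bit so the
    // doubling itself cannot overflow.
    assign above_half = ({active, 1'b0} > {1'b0, cycles});
endmodule
```

The 50% comparison mirrors the example threshold given above; a different threshold would need a different comparison.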
Within the IC choose a flop, say flop 1, with no enable or one with an enable but with an activity probability that is low, typically 25%. This is because of the fact that if the activity levels are high, the clock gating will not provide any major power saving advantage as the associated circuit of the selected flop is used often and the clock gating has to be enabled often with no power saving. In this case any additional clock gating circuits added will tend to increase the power and area usage of the IC. (S601)
Traverse the fan-in circuit of the selected flop 1 till another flop, flop 2, is reached or a PI to the circuit is reached. (S602)
If a PI is reached during traversal of fan-in of flop1, then the activity of the PI is checked to see if it is below a necessary threshold. That is check for PI activity threshold, typically of less than 50%. (S604)
If the activity level of the PI is higher than the 50% threshold level then no power saving is possible using STC based clock gating generated from the PI for flop 1 and hence the STC condition is not generated for the PI and the activity is stopped. (S605)
If the activity level of the PI is lower than the threshold level of 50%, then an STC condition to gate the clock to flop 1 is generated from the STC of the PI. STC of PI=PI XOR (delay PI). (S606)
If during traversal of the fan-in of flop 1 the element reached is a flop, flop 2, then it is checked to see if it has an enable. (S607)
If flop 2 has an enable then an STC condition has to be derived and if it does not have an enable an activity check is done on the flop 2. (S608)
For the flop, flop 2, with an enable, the STC condition is generated for that flop 2 by delaying the enable. STC of flop with enable=delay (enable). (S609)
For the flop, flop 2, without enable, the activity of flop 2 is checked to see if it is below a necessary threshold. That is, check for a flop 2 activity threshold, typically less than 50%. (S610)
If activity level of flop 2 that has no enable is found to be greater than the threshold level of 50% then no power saving is achieved by using STC based clock gating generated from flop 2 for the original flop 1. The STC condition for flop 2 is not generated and the activity is stopped. (S611)
If the activity level of the flop 2, that has no enable, is found to be less than 50% then an STC condition for the flop is generated by delaying an XORed output of the input of flop 2 with the output of the flop 2. STC of flop_no enable=delay (flop_d XOR flop_q) (S612)
All the generated STC conditions from the PI and other flops, such as flop 2, are combined using OR gates to generate the final STC condition for the flop 1. (S613)
A differential power check is done for the original flop, flop 1, with the clock gating generated using the combined STC condition, to see if there is sufficient improvement in power saving to warrant retaining the generated clock gating. (S614)
The result of the differential power check is evaluated. (S615)
If the power saving is not sufficient then the clock gating using the STC condition generated is not implemented for the flop 1 and the operation is stopped. (S616)
If sufficient power saving is achieved, then the clock gating of the original flop, flop 1, is implemented as part of the circuit design, thereby completing the generation of an STC condition to provide clock gating for the original flop, flop 1. (S617)
It should be noted that such effort for generating the STC condition and implementation of clock gating is continued for all the flops in the design to achieve power saving.
Once the STC condition has been generated and circuit modification has been introduced to generate the New_STC_FF4-1 550, or New_enable_FF4-1, to provide an equivalent clock gating to enable the flop FF4-1 310, further improvements can be made to the clock gating to optimize the power dissipation of the circuit of the rest of the pipeline flops FF4-2 311 to FF4-4 313. As is evident from the circuit, each flop of the pipeline operates in a sequential fashion. Hence using the STC condition generated for the first flop FF4-1 310 of the pipeline design 300 for the rest of the flops FF4-2 311 to FF4-4 313 is not optimum.
The gated clock, enable 750 for FF4-2 311 can be written in Verilog as:
reg Delay2;
always @(posedge clk)
  if (EN4)
    Delay2 <= New_enable_FF4_1;              // New_enable_FF4-1 in the text
wire New_enable_FF4_2 = Delay2 && EN4;
Similarly the gated clock, enable 751 for FF4-3, can be written in Verilog as:
reg Delay3;
always @(posedge clk)
  if (EN4)
    Delay3 <= New_enable_FF4_2;
wire New_enable_FF4_3 = Delay3 && EN4;
And for FF4-4 313 the enable will be:
reg Delay4;
always @(posedge clk)
  if (EN4)
    Delay4 <= New_enable_FF4_3;
wire New_enable_FF4_4 = Delay4 && EN4;
The implementation of these functions is shown in exemplary and non-limiting
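The per-stage enables for the later pipeline flops all follow the same pattern, so they could also be expressed as one parameterized chain. The sketch below is an illustrative generalization, not part of the disclosed circuit: the module name, the STAGES parameter (assumed to be at least 2), the port names and the zero initial register values are all assumptions.

```verilog
// Sketch only: enable chain for an N-stage gated pipeline.
// Names and the STAGES parameter are hypothetical; STAGES >= 2 is assumed.
module pipeline_enable_chain #(parameter STAGES = 4) (
    input  wire              clk,
    input  wire              en4,              // pipeline enable EN4
    input  wire              new_enable_first, // New_enable_FF4-1
    output wire [STAGES-1:0] new_enable        // bit 0 = first stage
);
    // The first stage uses the enable derived from the STC condition.
    assign new_enable[0] = new_enable_first;

    genvar i;
    generate
        for (i = 1; i < STAGES; i = i + 1) begin : stage
            // Initial value is a simulation convenience (assumption).
            reg delay_q = 1'b0;

            // Capture the previous stage's enable only while EN4 is active.
            always @(posedge clk)
                if (en4)
                    delay_q <= new_enable[i-1];

            // Each stage's enable is its delayed predecessor ANDed with EN4.
            assign new_enable[i] = delay_q & en4;
        end
    endgenerate
endmodule
```

With STAGES set to 4, bits 1 to 3 of new_enable correspond to the enables for FF4-2, FF4-3 and FF4-4, each following its predecessor by one enabled clock cycle.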
A new technique called the XOR technique has been described that enables the computation and synthesis of the STC condition for any flop in an IC design, taking into account all the different logic conditions in the fan-in of that flop. This generated STC condition is used to modify the enable of the specific flop, and of the other flops in a pipeline if the flop considered is the first of the pipeline stage, without impacting the functionality of the design. The STC condition generated using the XOR technique enables synthesis and verification of clock gating of the flops of a design, for power reduction, while retaining the functionality of the design through all fan-in conditions. The computation and synthesis of the STC condition using the XOR technique lends itself to implementation using a computer system having at least a processor and at least sufficient storage capability for a suitable operating system, the design software and the synthesis software.
The invention may be implemented as part of an integrated circuit design, system on chip design, processor design, FPGA design and other semiconductor designs, including a combination of the above. The invention may also be implemented as a synthesis and verification program to generate, implement and verify the necessary design modifications of a basic design to achieve power reduction in operation by clock gating. The invention may be implemented as a software program stored in a non-tangible memory module, the instructions of which are to be executed on a processor, as a combination of integrated software and hardware, or as emulation on hardware, including but not limited to a computer aided design (CAD) system.