This invention relates to designing clock logics in integrated circuits or chips, and particularly to optimizing clock logics during the design phase by minimizing clock uncertainty.
Integrated circuits (ICs) comprise a large number of circuit elements, such as transistors, interconnected by a large number of wires. Some elements (“drivers”) drive other elements (“driven elements”). Fanout of a given driver is the number of driven elements coupled to the output of the driver.
The “ramptime” of a driven element is the time required to drive a driven element to operation. Ramptime depends on the amount of capacitance and resistance “seen” by the driver, which in turn depends on the number of driven elements connected to the output of the driver and the length of the wires that interconnect the driver with its driven elements. If a driver's load exceeds a design threshold, the ramptime for the driven elements will also exceed a threshold.
It is common to selectively insert buffers, in the form of additional drivers, between the driver and the driven elements to reduce the number of driven elements for a given driver, thereby minimizing capacitance and resistance “seen” by that driver and minimizing timing violations. However, each added buffer increases power consumption of the integrated circuit. Consequently, it is desirable to minimize the number of buffers. Moreover, because each buffer introduces a delay in signal propagation, it is also desirable to minimize the number of levels of buffers and to minimize the overall interconnect length.
In the hierarchical design flow of digital systems, interconnect information is available only at lower levels of the design process. For example, coupling capacitance information is available only after detailed routing is completed, and not at the higher logic synthesis, placement and global routing stages. While lower levels of the design process provide more detailed interconnect information, the circuit design is usually so advanced at the lower levels that only minimal changes to the circuit structure can be performed to improve performance.
If a clock network is implemented after detailed routing, it is difficult to implement clock logic changes without changing the placement and the routing of data logics. It is also difficult to place the buffers and route the clock nets simultaneously in order to take into account the coupling and other detailed information of the chip fabrication and materials (“silicon information”).
To achieve the overall optimal results from the design specification to implementation, it is crucial to estimate the interconnect information at higher levels of the design process, such as during the placement stage and before routing, where there exists more freedom to restructure the design. Clock logics are very important and also sensitive to the timing closure of a design. A mis-estimation of clock delays may cause thousands or more violated timing paths, and attempts to correct a poorly routed clock net may inadvertently cause other timing violations. Therefore, good delay estimations for the clock logics are important at early stages of the design process. It is also important to implement the clock logics so that they are robust with respect to the interconnect implementations in fabrication of the chip.
A calculated clock delay will unavoidably have estimation errors. To compensate for this estimation error, a “clock uncertainty” factor is employed in the estimation of clock delays. To make sure that the circuit under design will operate satisfactorily when implemented into a chip, the value of clock uncertainty is usually set conservatively. However, a conservative clock uncertainty value leads to other problems, such as adding unnecessary buffers to fix timing violations.
An embodiment of the present invention is directed to a technique for an early estimation of clock delay, and for reduction of estimation errors. The technique is useful in design optimization tools, and because delay changes dynamically during the optimization process, the developed technique is efficient in computation and memory usage.
In one embodiment of the invention, clock uncertainty between a receiving cell and a launching cell of a net is estimated by back-tracing a first path from the receiving cell toward the clock source. Each cell in the first path having a predetermined characteristic (e.g., in a critical path) is marked. A second path from the launching cell is back-traced toward the clock source to one of the marked cells having the predetermined characteristic (e.g., first marked cell). Clock uncertainty is calculated based on a delay associated with the first path between the marked cell and the receiving cell.
In preferred embodiments, there are a plurality of data launching cells capable of launching data to a data receiving cell. The second path is back-traced from each launching cell and clock uncertainty is calculated for each data path between the plurality of launching cells and the receiving cell. The maximum value of clock uncertainty is selected as a clock uncertainty for the receiving cell.
In some embodiments, a first clock delay between the clock source and the launching cell is calculated, and a second clock delay between the clock source and the receiving cell is identified. A data delay between the launching cell and the receiving cell is calculated, and a slack is calculated based on the first and second clock delays and the data delay. Clock uncertainty is calculated if the slack does not exceed a predetermined value.
In some embodiments, buffer placement to the clock net is optimized by forcing a buffer to the center of gravity of a plurality of inserted buffers driving respective clock nets without timing violations. The path between the root and the forced buffer defines a common path of maximum length to the leaves so that the non-common paths between the inserted buffer and the leaves are minimized, thereby minimizing clock uncertainty.
In other embodiments a computer having a computer useable medium has a computer readable program containing code that causes the computer to perform the process.
An embodiment of the present invention is directed to a process for optimizing a clock net in the form of a tree having a root defined by a driver pin and a plurality of leaves defined by driven pins. The process includes forcing a first buffer to a center of gravity of the plurality of leaves, inserting a set of second buffers so each leaf is driven by an inserted buffer without timing violations, and moving the first buffer to a center of gravity of the set of second buffers.
The value of uncertainty represents the maximal clock delay estimation error. (As mentioned above, the clock delay estimation at the placement stage cannot be accurate because no routing information is available.) Larger timing violations may occur where the value of uncertainty is greater; timing violations are minimized if the value of uncertainty is small.
The value of uncertainty can be quite large if the clock network delay is large. For example, if the clock network delay is 4 ns and, in the worst case, the estimation error is 15% of the clock network delay, the uncertainty value can be as high as 0.15*4=0.6 ns. Considering the clock cycle (T) is only 2.5 ns for a 400 MHz frequency, the uncertainty value is 24% of the clock cycle. Thus, the uncertainty value plays an important role in the timing closure of the design process.
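The arithmetic in this example can be checked with a short sketch. The function names below are illustrative and are not part of the described process; only the numbers (4 ns, 15%, 400 MHz) come from the text.

```python
def worst_case_uncertainty(clock_delay_ns: float, coef: float) -> float:
    """Worst-case uncertainty: a fixed fraction of the whole clock network delay."""
    return coef * clock_delay_ns


def clock_period_ns(frequency_mhz: float) -> float:
    """Clock cycle T in nanoseconds for a frequency given in MHz."""
    return 1000.0 / frequency_mhz


uncertainty = worst_case_uncertainty(4.0, 0.15)  # 4 ns delay, 15% estimation error
period = clock_period_ns(400.0)                  # 400 MHz clock
fraction = uncertainty / period                  # share of the clock cycle

print(f"uncertainty = {uncertainty:.2f} ns, T = {period:.2f} ns, share = {fraction:.0%}")
```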
An embodiment of the present invention provides an analysis approach for reducing the uncertainty value based on the clock network topology, rather than applying the worst case percentage. A robust clock network can be implemented to further reduce the uncertainty value.
This indicates that Dcommon (i.e., the common part of clock delays in both clock paths to pins CP1 and CP2) does not have any impact on the timing violation. So when uncertainty is being estimated, Dcommon can be ignored. Consequently, a larger Dcommon will provide a smaller uncertainty.
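One way to see why the common part drops out is to split each clock delay into a shared part and a private part. The primed symbols below are notation introduced here for illustration only; they denote the non-common portions of the two clock paths.

```latex
\begin{align*}
D_{clk1} &= D_{common} + D'_{clk1}, \qquad D_{clk2} = D_{common} + D'_{clk2},\\
D_{clk1} - D_{clk2} &= D'_{clk1} - D'_{clk2}.
\end{align*}
```

Under this assumption, any estimation error in the shared portion affects both clock pins equally and cancels in the difference, so only the non-common portions contribute to the uncertainty.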
In
Clock logic 40 supplies clock signals from clock source 38 to pin CPr of receiving cell 30, and n clock logics 42, . . . , 44 supply clock signals from clock source 38 to pins CPL1, . . . , CPLn of launching cells 32, . . . , 34. Clock logics 40, 42 and 44 may have common elements like buffer 24 in
It is time-consuming, and therefore impractical, to analyze and update each Dcommon-i on a path-by-path basis with an optimization tool. But it is also unnecessary to extract every path-based uncertainty because most paths are not timing-critical (in other words, they are not likely to become timing-violated paths).
To understand the calculation of uncertainty according to an embodiment of the present invention, the parameters slack, margin and coef are defined.
Slack is a measure of a potential timing violation for a given data path, and is defined as the clock cycle period, T, less the sum of the data path delay, Ddata, the difference in clock delay, Dclk1−Dclk2, setup and uncertainty:
slack=T−{Ddata+(Dclk1−Dclk2)+setup+uncertainty}.
A timing violation might occur if the sum of the data path delay, Ddata, the difference in clock path delay, Dclk1−Dclk2, setup and uncertainty exceeds the clock cycle period, T, that is, if slack<0. Thus in
Margin is a pre-determined value based on whether the time analysis is for setup time or hold time. For example, if the time analysis is for setup time, margin might be 2 ns, whereas if the time analysis is for hold time, margin might be 1 ns.
Coef is a user-specified parameter, which indicates the percentage-wise possible delay estimation errors at the placement stage. For example, if coef=0.15 (15%) and the clock delay is 3 ns, the worst case uncertainty=0.15×3=0.45 ns.
Duncertainty-i is the calculated clock uncertainty value from the i-th launching cell to one path ending point in the receiving cell under analysis.
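The slack definition above can be written as a small function. This is a minimal sketch: the function name and the sample delay values are assumptions chosen for illustration, and all quantities are taken to be in nanoseconds.

```python
def slack(T, d_data, d_clk1, d_clk2, setup, uncertainty):
    """slack = T - {Ddata + (Dclk1 - Dclk2) + setup + uncertainty}."""
    return T - (d_data + (d_clk1 - d_clk2) + setup + uncertainty)


# Hypothetical path: both clock pins see a 3 ns clock delay, coef = 0.15.
coef = 0.15
d_clk2 = 3.0
worst_case = coef * d_clk2  # worst-case uncertainty for a 3 ns clock delay
s = slack(T=2.5, d_data=1.2, d_clk1=3.0, d_clk2=d_clk2, setup=0.1,
          uncertainty=worst_case)

# A negative slack would indicate a potential timing violation.
print("potential violation" if s < 0 else "no violation", round(s, 3))
```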
At step 104, the delay, Dclk2, from clock source 38 to the clock pin CPr of receiving cell 30 is identified. At step 106 the clock path is back traced through clock logic 40 to clock source 38 and each intermediate cell in the clock logic that is in a “critical path” to pin CPr is marked. An intermediate cell in the clock logic is in the critical path if the arrival time of a signal from clock source 38 to the intermediate cell, plus the time required to propagate a signal from the intermediate cell to pin CPr of the receiving cell is equal to clock delay Dclk2.
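The marking at step 106 can be sketched as follows, assuming the clock logic is available as a fan-in table. The table `FANIN`, its cell names, and its delays are hypothetical. The sketch follows only those fan-in edges that determine a cell's arrival time, which is equivalent to the critical-path condition stated above. For simplicity it also marks the clock source and the receiving pin themselves.

```python
from functools import lru_cache

# Hypothetical clock logic: cell -> list of (driving cell, delay to this cell in ns).
FANIN = {
    "CLK": [],
    "buf_a": [("CLK", 1.0)],
    "buf_b": [("CLK", 0.6)],
    "mux": [("buf_a", 0.5), ("buf_b", 0.5)],
    "CPr": [("mux", 0.4)],
}


def arrival_times(fanin):
    """Latest arrival time of the clock at every cell, measured from the source."""
    @lru_cache(maxsize=None)
    def arrive(cell):
        return max((arrive(drv) + d for drv, d in fanin[cell]), default=0.0)
    return {cell: arrive(cell) for cell in fanin}


def mark_critical(fanin, receiver, tol=1e-9):
    """Back-trace from the receiver and mark every cell on a critical path."""
    arrival = arrival_times(fanin)
    marked, stack = set(), [receiver]
    while stack:
        cell = stack.pop()
        if cell in marked:
            continue
        marked.add(cell)
        for drv, d in fanin[cell]:
            # The driver is critical if it determines this cell's arrival time.
            if abs(arrival[drv] + d - arrival[cell]) <= tol:
                stack.append(drv)
    return marked, arrival[receiver]


marked, d_clk2 = mark_critical(FANIN, "CPr")
print(sorted(marked), d_clk2)
```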
At step 108, the clock delay, Dclk1-i, from the clock source 38 to the clock pin of the selected launching cell 32 is calculated. Also, the data logic delay Ddata-i from the selected launching cell 32 to receiving cell 30 end point is calculated. As will become evident, the clock delay, Dclk1-i, and data logic delay, Ddata-i, are calculated for each launching cell i to the receiving cell.
The slack, Slacki, of the data path from the respective i-th launching cell to the receiving cell is calculated as
Slacki=T−Dclk1-i−Ddata-i−setup+(1−coef)×Dclk2.
If, at step 110, Slacki>margin, the launching cell (e.g., cell 32) can be ignored at step 112, that is, Duncertainty-i=0, and the next launching cell (e.g., cell 34) will be selected at step 114.
If at step 110 Slacki≤margin, then at step 116 the clock circuit is back traced from the clock pin CPLi of the i-th launching cell (such as cell 32) through the respective clock logic (such as logic 42) to clock source 38. Upon reaching the first marked cell, namely the cell that was marked at step 106 and first encountered in the back tracing of step 116, a clock delay, Dcommon-i, is calculated from clock source 38 to that marked cell. The selected marked cell is the cell that is electrically closest to the launching cell, and hence represents the marked cell of the longest common clock path to both launching cell i and the receiving cell. At step 118, a clock uncertainty for launching cell i is calculated as Duncertainty-i=coef×(Dclk2−Dcommon-i).
At step 120, if not all of the launching cells i in the set identified at step 102 have been considered, then the process loops to step 114 to select the next launching cell for the receiving cell being considered. The process thus iterates to calculate Duncertainty-i for each launching cell capable of launching data to the receiving cell under consideration. When the last launching cell has been considered at step 120, the value of uncertainty is selected at step 122 as the maximum value of Duncertainty-i for all launching cells i to the receiving cell, thus representing the uncertainty for the path ending point under analysis:
uncertainty=MAX(Duncertainty-i | i∈(1, 2, . . . , N)),
where 1, 2, . . . , N are the launching cells.
The value of uncertainty is applied to Equation 1 for the timing analysis for the path end point.
To complete analysis of the entire integrated circuit design, at step 124 if the receiving cell under consideration is not the last receiving cell, the process advances to step 126 to select the next receiving cell and repeat the process. The process ends when, at step 124, the last receiving cell has been considered.
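Steps 104 through 122 for a single receiving cell can be sketched as follows. This is a minimal illustration, not the implementation: it assumes the clock network is a tree described by a `parent` table (so every cell on the receiver's clock path is critical), and the cell names, delays, and margin value are hypothetical.

```python
def path_to_source(parent, cell):
    """Yield cells from `cell` back to the clock source (the root of the tree)."""
    while cell is not None:
        yield cell
        cell = parent[cell]


def estimate_uncertainty(parent, arrival, receiver_pin, launchers,
                         T, setup, coef, margin):
    """Return the clock uncertainty for one receiving cell (times in ns)."""
    d_clk2 = arrival[receiver_pin]                      # step 104
    marked = set(path_to_source(parent, receiver_pin))  # step 106
    uncertainty = 0.0
    for launch_pin, d_data in launchers:                # steps 108-120
        d_clk1 = arrival[launch_pin]
        slack_i = T - d_clk1 - d_data - setup + (1 - coef) * d_clk2
        if slack_i > margin:                            # step 112: Duncertainty-i = 0
            continue
        # Step 116: first marked cell met while back-tracing from the launcher.
        first_marked = next(c for c in path_to_source(parent, launch_pin)
                            if c in marked)
        d_common = arrival[first_marked]
        # Steps 118 and 122: keep the maximum over all launching cells.
        uncertainty = max(uncertainty, coef * (d_clk2 - d_common))
    return uncertainty


# Hypothetical clock tree: parent of each cell and clock arrival time at each cell.
parent = {"CLK": None, "root_buf": "CLK", "buf_l": "root_buf", "buf_r": "root_buf",
          "CPL1": "buf_l", "CPL2": "buf_r", "CPr": "buf_r"}
arrival = {"CLK": 0.0, "root_buf": 1.0, "buf_l": 2.0, "buf_r": 2.0,
           "CPL1": 3.0, "CPL2": 3.0, "CPr": 3.0}
launchers = [("CPL1", 1.5), ("CPL2", 0.3)]  # (clock pin, data delay to receiver)

result = estimate_uncertainty(parent, arrival, "CPr", launchers,
                              T=2.5, setup=0.1, coef=0.15, margin=0.5)
print(round(result, 3))
```

In this hypothetical tree only the path from CPL1 is timing-critical, and it shares 1 ns of clock delay with the receiver, so the estimate is based on the remaining 2 ns rather than on the full 3 ns clock delay.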
The value of uncertainty is used in Equation 1 for timing analysis for each path end point of the integrated circuit. The process is a dynamic process, used to update the clock uncertainty during the structuring and restructuring of the clock net. As shown in
After the cells of the clock network are placed, critical paths of the clock logic are identified and optimized at phase 156. The processes of steps 152 and 154 are again executed during the restructure of the clock logic at phase 156. Similarly, the processes of steps 152 and 154 are executed during the third phase 158 when the clock logic is optimized for timing violated paths. Hence, the process is performed during the cell placement and wire routing phase 150, during the phase 156 of optimizing critical paths, and during the phase 158 of minimizing timing violation paths. After each phase of the synthesis, clock uncertainty will be analyzed and updated based on the current clock network topology and the over-all delay (clock logic delay and data logic delay) information.
As indicated by Equation 2, different clock structures will have quite different clock uncertainties. Thus, the clock structure of
At step 200 the coordinates of each tree leaf are input to the process as (xi, yi), where i∈(1, 2, . . . , M). At step 202, the center of gravity (x, y) of the leaves is calculated as
At step 204, a buffer is forced into a free space location close to (x, y), namely a location near the center of gravity of the leaves where there is sufficient free space for the buffer. “Forcing a buffer” means that no timing information or ramptime information will be considered. The forced buffer is arranged to drive all tree leaves.
At step 206, a set of buffers is inserted to drive all tree leaves. The set of buffers is inserted so that the new nets introduced by the inserted buffers do not have any ramptime violations. At step 208, a set of leaves within the bounding box of one of the inserted buffers is selected. The selected set of leaves consists of all those leaves that are driven by the selected inserted buffer. A subset of the set is selected based on the drive capability of the inserted buffer, namely the maximum load that the inserted buffer can drive without causing a ramptime violation. Preferably, priority is given to the inclusion within the subset of leaves between which there are timing paths. At step 210, the inserted buffer is then connected to drive the selected subset of leaves.
At step 212, if additional inserted buffers exist for which steps 208-210 have not been performed, the process loops back and iteratively performs steps 208 and 210 for each inserted buffer. When the last inserted buffer has been processed, as identified at step 212, then at step 214 the center of gravity of the inserted buffers is calculated.
For example, suppose there are K newly inserted buffers, with the k-th buffer inserted at respective coordinates (xk, yk). The center of gravity of the K buffers is calculated as
At step 216 the forced buffer inserted at step 204 is moved to this new center of gravity.
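The center-of-gravity calculations at steps 202 and 214 can be sketched as follows, assuming the center of gravity is the unweighted arithmetic mean of the coordinates. That assumption, the function name, and the sample coordinates are illustrative only.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def center_of_gravity(points: Iterable[Point]) -> Point:
    """Unweighted mean of the x and y coordinates of the given points."""
    pts = list(points)
    if not pts:
        raise ValueError("at least one point is required")
    n = len(pts)
    return (sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n)


# Hypothetical buffer locations at the corners of a 4 x 2 rectangle.
buffers = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
cog = center_of_gravity(buffers)
print(cog)
```

The same function applies to the M leaves at step 202 and to the K inserted buffers at step 214; only the input point set differs.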
At step 218 another set of buffers is inserted to drive those buffers currently driven by the forced buffer such that all new nets driven by inserted buffers do not have a ramptime violation. At step 220 the net is tested to identify if the tree has any ramptime violations. If ramptime violations exist, the process loops back to step 214 to repeat steps 214-218 until no ramptime violations remain. The process then ends at step 220 with an implemented net having placed cells and routed wires.
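The overall loop of steps 202 through 220 can be illustrated with a heavily simplified sketch. It makes several assumptions that are not in the description: a fixed `max_fanout` stands in for the ramptime check, leaves are grouped greedily by distance rather than by bounding box and timing paths, each inserted buffer is placed at the center of gravity of what it drives, and free-space legalization is omitted. All names are hypothetical.

```python
import math


def centroid(points):
    """Unweighted center of gravity of a list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def cluster(points, max_fanout):
    """Greedy grouping: seed with the first remaining point, add its nearest neighbors."""
    remaining = list(points)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        remaining.sort(key=lambda p: math.dist(seed, p))
        groups.append([seed] + remaining[: max_fanout - 1])
        remaining = remaining[max_fanout - 1:]
    return groups


def optimize_clock_net(leaves, max_fanout):
    """Return the forced-buffer location and the inserted buffer levels."""
    if max_fanout < 2:
        raise ValueError("max_fanout must be at least 2")
    forced = centroid(leaves)        # steps 202-204: force a buffer near the leaves
    driven = list(leaves)            # nodes the forced buffer currently drives
    levels = []
    while len(driven) > max_fanout:  # stand-in for the ramptime test of step 220
        # Steps 206-212 (first pass) or step 218 (later passes): insert buffers.
        driven = [centroid(g) for g in cluster(driven, max_fanout)]
        levels.append(driven)
        forced = centroid(driven)    # steps 214-216: move the forced buffer
    return forced, levels


leaves = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
forced, levels = optimize_clock_net(leaves, max_fanout=2)
print(forced, levels)
```

In this toy case the forced buffer ends at the center of gravity of the two inserted buffers, so the path from the root to the forced buffer is shared by every leaf and only the short final segments differ.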
The process of
The process is preferably carried out in a computer, with a memory medium, such as a recording disk of a disk drive, having a computer readable program therein containing computer readable program code that causes the computer to calculate the uncertainty parameter and carry out the processes of an embodiment of the invention. In preferred embodiments, the process is carried out in a computer in conjunction with an optimizing tool used during synthesis of the integrated circuit design.
Although the present invention has been described with reference to preferred embodiments, workers skilled in the art will recognize that changes may be made in form and detail without departing from the spirit and scope of the invention.
The present application is a division of and claims priority from U.S. patent application Ser. No. 10/616,623, filed Jul. 10, 2003, now U.S. Pat. No. 7,096,442, the content of which is hereby incorporated by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
20060190886 A1 | Aug 2006 | US |
Number | Date | Country | |
---|---|---|---|
Parent | 10616623 | Jul 2003 | US |
Child | 11402146 | US |