This invention relates generally to circuit design and more particularly to layout-driven, area-constrained design optimization.
With the advent of deep submicron (DSM) technologies, interconnect loads and delays and layout-driven synthesis have become significant. However, because of tight layout constraints, e.g., area availability and congestion, only layout-friendly logic transforms such as net buffering and gate resizing are effective.
According to the present invention, disadvantages and problems associated with circuit design may be reduced or eliminated.
In one embodiment, a method for layout-driven, area-constrained design optimization includes accessing a design and a layout of the design. The design includes one or more gates and one or more nets coupling the gates to each other. The layout includes blocks that partition a chip area of the design. Each block includes one or more of the gates. The layout also includes a global routing of the nets. The method also includes performing a first timing analysis of the design and the layout and updating the design and the layout as follows. Updating the design and the layout includes identifying all critical extended nets in the design. An extended net includes one net or two or more nets coupled to each other through buffers. Updating the design and the layout also includes ordering the critical extended nets. Updating the design and the layout also includes, for each of the critical extended nets, in the order, assigning polarities to sinks on the critical extended net, removing all buffers from the critical extended net, and updating the blocks in the layout spanned by the critical extended net according to the removal of buffers from the critical extended net. Updating the design and the layout also includes, if each block spanned by the critical extended net has no available area or includes only one net node for insertion of a buffer, applying to the critical extended net a modified van Ginneken algorithm for timing optimization or a modified global area algorithm for area minimization, and updating the blocks in the layout spanned by the critical extended net according to the application of the modified van Ginneken algorithm or the modified global area algorithm.
Updating the design and the layout also includes, if the critical extended net includes fewer than a first threshold number of net nodes and one or more blocks spanned by the critical extended net have at least some available area and include more than one net node, modifying the critical extended net so that each block spanned by the critical extended net has no available area or includes only one net node for insertion of a buffer, applying to the critical extended net the modified van Ginneken algorithm for timing optimization or the modified global area algorithm for area minimization, and updating the blocks in the layout spanned by the critical extended net according to the application of the modified van Ginneken algorithm or the modified global area algorithm. Updating the design and the layout also includes, if the critical extended net includes more than a second threshold number of net nodes and fewer than a third threshold number of net nodes, applying an exact Murgai buffering algorithm to the critical extended net and updating the blocks in the layout spanned by the critical extended net according to the application of the exact Murgai buffering algorithm. Updating the design and the layout also includes otherwise applying to the critical extended net a van Ginneken algorithm or a global area algorithm for timing optimization or a global area algorithm for area minimization that does not use illegal solutions at a root of the extended net, and updating the blocks in the layout spanned by the critical extended net according to the application of the van Ginneken algorithm or the global area algorithm. The method also includes performing a second timing analysis of the design and the layout. The second timing analysis takes into account the updates to the design and the layout.
The method also includes, if one or more results of the second timing analysis indicate that the design meets all predetermined design goals or indicate no progress toward one or more of the design goals relative to one or more results of the first timing analysis, communicating the design and the layout for analysis. The method also includes, if one or more results of the second timing analysis indicate that the design does not meet one or more predetermined design goals and indicate at least a predetermined amount of progress toward one or more of the design goals relative to the one or more results of the first timing analysis, further updating the design and the layout.
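The stopping rule described above can be sketched as a small decision function. This is only an illustrative sketch: the use of worst slack as the single measured result, the function name `next_action`, and the parameter `min_progress` are assumptions introduced here, and improvement smaller than `min_progress` is treated as no progress.

```python
def next_action(prev_worst_slack: float,
                new_worst_slack: float,
                min_progress: float) -> str:
    """Decide whether to stop or to run another update pass.

    Assumption: the design goal is a non-negative worst slack, and
    progress is the slack improvement between the two timing analyses.
    """
    if new_worst_slack >= 0.0:
        # All (assumed) design goals are met.
        return "communicate"
    if new_worst_slack - prev_worst_slack >= min_progress:
        # Goals not met, but enough progress was made: update again.
        return "update_again"
    # Goals not met and no meaningful progress: stop iterating.
    return "communicate"
```

In an iterative flow, the result of the second timing analysis would become the first result of the next pass.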
Particular embodiments of the present invention may provide one or more technical advantages. As an example, particular embodiments address the problem of minimizing the delay of a mapped, roughly-placed and globally-routed design by buffer insertion and/or deletion without violating the local area constraints imposed by the layout. Particular embodiments make previous algorithms for timing optimization more practical by improving runtime without substantially sacrificing quality. Particular embodiments may provide speedups of 12.5 times the runtimes typically associated with previous algorithms for timing optimization.
Particular embodiments may provide all, some, or none of the technical advantages described above. Particular embodiments may provide one or more other technical advantages, one or more of which may be apparent, from the FIGURES, descriptions, and claims herein, to a person having ordinary skill in the art.
To provide a more complete understanding of the present invention and the features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying drawings, in which:
Given a mapped, block-placed, and globally-routed design, particular embodiments of the present invention address the problem of minimizing its delay by buffer insertion and/or deletion subject to the area constraint. The area constraint stipulates that the area available in each block should not be less than the net area increase due to the buffers inserted in and deleted from the block by buffer optimization. If the area constraint for some block is not met, the modified design may be unrealizable. Satisfying the area constraint becomes more important when the design process is closer to the final layout.
Previous algorithms addressing the area-constrained net buffering problem work relatively well on a single net, but a drawback of such algorithms is their worst-case exponential runtime, which makes them impractical for large nets. Particular embodiments of the present invention employ techniques making such algorithms more practical by improving runtime without sacrificing quality, e.g., final delay and area. Particular embodiments of the present invention make use of a one-node-per-roomy-block (ONPRB) condition, which, if satisfied by the net, improves the worst-case runtime complexity to quadratic, without causing a substantial loss in optimality. To further reduce runtime, particular embodiments of the present invention employ a technique that converts a net into one that satisfies this condition.
In a physical synthesis design flow, a design is typically partitioned into possibly nonuniform blocks. A block is represented by (x, y) coordinates. The total area of a block P is denoted by T(P). T(P) is determined by a floor planning and placement tool, taking into account, among other things, the total chip area and the number of blocks. The area occupied by the cells and nets in the block P is the used area of the block U(P). The remaining area of P is the available area A(P), which may be used for inserting new cells or replacing existing ones with larger ones. A(P)=T(P)−U(P). The area constraint for block P mandates that the net area increase in P due to buffers inserted into or deleted from P must not exceed A(P).
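As a minimal sketch of the bookkeeping just defined, the following computes A(P) from T(P) and U(P) and checks the area constraint for one block. The class name `Block`, its field names, and the function `satisfies_area_constraint` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    total_area: float  # T(P)
    used_area: float   # U(P)

    @property
    def available_area(self) -> float:
        # A(P) = T(P) - U(P)
        return self.total_area - self.used_area


def satisfies_area_constraint(block: Block,
                              inserted: list,
                              deleted: list) -> bool:
    """Return True if the net area increase in the block fits in A(P).

    `inserted` and `deleted` hold the areas of buffers inserted into
    and deleted from the block.
    """
    net_increase = sum(inserted) - sum(deleted)
    return net_increase <= block.available_area
```

Deleting an existing buffer frees area, so an insertion that would not fit on its own may still satisfy the constraint when paired with a deletion.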
A net segment is a horizontal or vertical piece of wire. A simple net is a set of net segments connecting cell pins and circuit pads to each other. A net has a driver (or root) and sink pins/pads. In a net, all net segment end-points that are not pins or pads are called Steiner nodes. All pins, pads, and Steiner nodes on a net are called net nodes. Since a net is a rooted tree, it makes sense to talk about children, parents, and ancestors of a net node.
There are usually two main components of the circuit delay: pin-to-pin delay through the cells and interconnect delay. For an input pin $i$ and an output pin $o$ of a cell $M$, delay $d(i, o)$ from $i$ to $o$ through $M$ is given by $d(i, o) = \alpha_{io} + \beta_{io} c_o$. Here, $c_o$ is the load capacitance at $o$, $\alpha_{io}$ is the intrinsic delay from $i$ to $o$, and $\beta_{io}$ is the corresponding load coefficient. These parameters are specified for all input to output arcs of each cell in the cell library. An Elmore delay model may be used for the interconnect.
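The two delay components can be illustrated with a short sketch. The cell delay follows the linear model given above. The wire delay uses the common Elmore expression for a single uniform wire segment driving a lumped load, which is an assumption here, since the text does not spell out the interconnect formula. Function and parameter names are hypothetical.

```python
def cell_delay(alpha_io: float, beta_io: float, c_o: float) -> float:
    """Pin-to-pin delay: intrinsic delay plus load coefficient times load."""
    return alpha_io + beta_io * c_o


def wire_elmore_delay(r_wire: float, c_wire: float, c_load: float) -> float:
    """Elmore delay of one uniform wire segment (assumed model).

    The wire capacitance is distributed, so half of it is seen through
    the full wire resistance; the downstream load sees all of it.
    """
    return r_wire * (c_wire / 2.0 + c_load)
```

For example, `cell_delay(0.5, 2.0, 0.25)` evaluates to 1.0 and `wire_elmore_delay(2.0, 1.0, 0.5)` evaluates to 2.0, in whatever consistent units the library uses.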
Previous techniques for timing optimization target timing optimization without regard to the area constraint, so they may insert buffers in blocks or locations where no area is available. Some of these techniques suggest associating a cost such as the number of buffers (or their areas) with each buffering solution, but do not address the problem of inserting buffers subject to the area availability in each local part of the design. Other previous techniques address the problem of buffer insertion in the presence of obstacles, but these techniques do not address the buffering problem in the presence of local area constraints. In contrast, particular embodiments of the present invention automatically handle obstacles since the reduced areas available in the related blocks due to obstacles are used in such embodiments.
Given a net $N$ with $n$ nodes and a library with $l$ buffers, there are $(l+1)^n$ buffering solutions for $N$. Previous techniques for timing optimization include an exact polynomial-time dynamic programming algorithm for optimizing timing on $N$ by buffering. Given required times at each net sink of $N$ and a buffer library, the algorithm generates the optimum choice and locations of buffers that maximize the required time at the root $r$. It traverses net nodes of $N$ bottom-up: starting from sinks and proceeding towards the root. At an intermediate node (such as a Steiner node) $v$, there are $(l+1)$ possibilities corresponding to inserting any of the $l$ buffers and not inserting any buffer at $v$. The algorithm constructs a set of solutions $S(v)$ at $v$ to capture all these possibilities. A solution at node $v$ is a pair $(c, q)$, where $c$ is the capacitance of the tree $T_v$ rooted at $v$ and $q$ is the required time at $v$. $S(v)$ is constructed from the solution sets of $v$'s children and the buffering possibilities at $v$ by appropriately incorporating capacitances of and delays through the wires and the buffer at $v$, as illustrated in
In particular embodiments of the present invention, a goal of area-constrained net buffering for timing optimization is to maximize the required time at the root of the net while satisfying the area constraint. Given an extended net $N$ with its buffers already deleted, updated available areas $A(P)$ for each block $P$ of the design, and node $v$ of $N$, $\sigma \in S(v)$ is a legal solution if after inserting buffers in $T_v$ as recommended by $\sigma$ and updating the affected block areas, $A(P) \ge 0$ for all $P$. Otherwise, $\sigma$ is illegal. Informally, a legal solution corresponds to a buffering choice that does not violate block area constraints.
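The legality test can be sketched as follows, under the assumption that a solution is represented by the list of buffer placements it recommends, each a (block name, buffer area) pair. That representation and the function name `is_legal` are illustrative choices, not part of the described algorithm.

```python
def is_legal(available: dict, placements: list) -> bool:
    """Check that a buffering choice leaves A(P) >= 0 in every block.

    available:  block name -> available area A(P) after the net's
                original buffers have been deleted.
    placements: (block name, buffer area) pairs recommended by the
                solution (assumed representation).
    """
    remaining = dict(available)  # work on a copy; do not mutate the input
    for block, area in placements:
        remaining[block] = remaining.get(block, 0.0) - area
    return all(a >= 0 for a in remaining.values())
```

Two buffers that exactly fill a block are legal, while a single buffer placed in a block with no available area is not.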
Previous techniques for timing optimization include an exact algorithm for the area-constrained buffering problem for a single net. Reference to an exact Murgai buffering algorithm encompasses such an algorithm, where appropriate. Such an algorithm traverses the net nodes in a topological order, from sinks to the root. At a net node, the algorithm first generates all legal solutions (but no illegal solution) and then throws away inferior solutions. At the root, the algorithm picks the optimum legal solution. A net node v, which is in block P, is processed, as illustrated in
For $\sigma_v = (c_v, q_v, H_v)$, the table $H_v$ is generated as follows. If $v$ is a sink node, $H_v$ is empty. For an intermediate node $v$, there are two possibilities:
After generating $H_v$, it is checked whether, for each $Q \in \gamma(v)$, $H_v(Q) \le A(Q)$. If so, $\sigma_v$ is legal. Otherwise, it is illegal and rejected. From the definition of the block area usage table, the following criterion of solution suboptimality or inferiority may be used:
The following definition relates to solution suboptimality:
Particular embodiments use a condition for the net, which, if satisfied, results in a polynomial-time exact algorithm for the timing-optimization problem. A reason for the exponential blow-up of algorithms used in previous timing-optimization techniques is the following: With each solution at a node $v$, the algorithm needs to remember the area used in each block in the span of the sub-tree $T_v$ rooted at $v$. This is to handle the case when there is a net node $w$, which is an ancestor node of $v$ and is in a block $P \in \gamma(v)$.
In the above description, $w$, $v$'s ancestor node, was required to be in a block $P$, where $P$ is in the span of the sub-tree $T_v$ rooted at $v$. If a net $N$ is such that each block $Q$ spanned by $N$ contains exactly one node of $N$, $Q$ would be visited exactly once during the course of the algorithm: when the solutions for the net node $v$ contained in $Q$ are generated. No other net node of $N$ has a contention for the area resources of $Q$. Hence, the block area usages need not be stored with each solution. The following modified van Ginneken algorithm may be used, which is similar to the original van Ginneken algorithm, with each solution having only two components $c$ and $q$. The only modification needed is for the buffered solution generation. When trying to insert a buffer $b$ at the node $v$ in block $Q$, the algorithm performs an additional check: is it legal to insert buffer $b$, i.e., is the area $A(Q)$ available in $Q$ at least as much as the area of the buffer $b$? If so, buffered solutions may be generated at $v$ for all the corresponding unbuffered solutions with buffer $b$. Thus, the legality check may be made for a solution at a node without using area usage information from any other node. In contrast, in the general case, the legality of a solution at a node $v$ depends on the area used by other nodes that share $v$'s block.
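The buffered-solution step with the additional area check might look like the sketch below. It assumes a linear buffer delay of the form alpha + beta * load, keeps one best buffered solution per buffer, and prunes with the usual dominance rule for (c, q) pairs (lower capacitance and higher required time are better). The `Buffer` fields and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    c_in: float   # input capacitance of the buffer
    alpha: float  # intrinsic delay
    beta: float   # load coefficient
    area: float


def prune(solutions):
    """Drop dominated (c, q) pairs: keep only lower-c or higher-q ones."""
    kept = []
    for c, q in sorted(solutions, key=lambda s: (s[0], -s[1])):
        if not kept or q > kept[-1][1]:
            kept.append((c, q))
    return kept


def solutions_with_buffers(unbuffered, library, available_area):
    """Add buffered solutions at one node, skipping buffers that do not fit.

    available_area is A(Q) for the block Q containing the node; under
    the one-node-per-block assumption no other node competes for it.
    """
    result = list(unbuffered)
    for b in library:
        if b.area > available_area:
            continue  # additional legality check: buffer does not fit in Q
        best_q = max(q - b.alpha - b.beta * c for c, q in unbuffered)
        result.append((b.c_in, best_q))
    return prune(result)
```

Because the check uses only A(Q) and the buffer's own area, each solution stays a plain (c, q) pair and no per-block usage table has to be carried up the tree.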
For instance, in
From the above description, the following proposition holds:
The run-time complexity is the same as that of the van Ginneken algorithm.
Note that the ONPB condition may be strengthened by allowing N to go over a block P with more than one net node, as long as P has no available area. The block area used here is the updated block area, i.e., after all the buffers originally present on N have been deleted in a pre-processing step.
The following proposition follows:
To further improve runtime, critical nets that do not satisfy the ONPRB condition may be modified so that they do. The modification may be done in one of the following two ways:
Particular embodiments of the present invention use the second scheme, above. The main problem is to select one node per block where potential buffer insertion may be done. The nodes on a net may be one of the following types: pin and pad (source and sinks), branching Steiner node, intermediate Steiner node (on long net segments), and Steiner node immediately after a branching Steiner node (on one of the branches, e.g., node w in
If a net does not satisfy the ONPRB condition, has a large node count, and spans several blocks, its solution generation may be restricted to improve runtime and memory usage. Solutions are generated using either only c and q, or c, q, and global area a components. The global area a represents the total area used by all the buffers in the sub-tree rooted at the current net node. Reference to a global area algorithm encompasses a buffering algorithm in which a solution contains c, q, and a components. After solutions have been generated at all net nodes, illegal solutions at the net root are discarded. The best legal solution is picked. Although this scheme introduces suboptimality, it can handle nets that could not be optimized earlier.
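The final selection at the net root can be sketched as a filter followed by a maximum over the required time. The `RootSolution` record, its `placements` field, and the caller-supplied `is_legal` predicate are assumptions made for illustration; the text does not specify how legality is evaluated at the root.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class RootSolution(NamedTuple):
    c: float           # capacitance seen at the root
    q: float           # required time at the root
    a: float           # global area: total area of buffers in the tree
    placements: tuple  # ((block name, buffer area), ...), assumed bookkeeping


def pick_best_legal(solutions: Sequence[RootSolution],
                    is_legal: Callable[[RootSolution], bool]
                    ) -> Optional[RootSolution]:
    """Discard illegal root solutions, then return the one with the best q.

    Returns None when no legal solution survives.
    """
    legal = [s for s in solutions if is_legal(s)]
    return max(legal, key=lambda s: s.q) if legal else None
```

Since legality is tested only at the root, a faster but illegal solution may have displaced a legal one lower in the tree, which is the source of the suboptimality noted above.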
As an example and not by way of limitation, the following is a complete improved buffering algorithm for area-constrained timing optimization of a single net N:
For the design goal of area minimization, a similar methodology is used. However, instead of the van Ginneken algorithm and the modified van Ginneken algorithm, the global area algorithm and the modified global area algorithm are used, respectively. The modified global area algorithm is the same as the global area algorithm, except for one modification. The modification is in the buffered solution generation. When trying to insert a buffer b at a node v in a block Q, the algorithm performs an additional check: is it legal to insert buffer b, i.e., is the area A(Q) available in Q at least as much as the area of the buffer b? If so, buffered solutions may be generated at v for all the corresponding unbuffered solutions with buffer b. As an example and not by way of limitation, the following is a complete improved buffering algorithm for area minimization of a single net N:
At step 114, if one or more blocks each include more than one gate, the method proceeds to step 118. At step 118, the min-cut tool further partitions the chip area of the design into smaller blocks using horizontal and vertical cuts. At step 120, the placement tool assigns each smaller block a fixed area limiting gates in the smaller block by number and type. At step 122, the placement tool places each gate in the design into a smaller block. At step 124, the global routing tool generates net segments between the smaller blocks to determine a topology and a global routing of each net in the design. At step 126, the timing analysis tool performs a timing analysis on the design. At step 128, the optimization tool applies one or more optimization algorithms to each net in the design. As an example and not by way of limitation, the optimization tool may apply a net buffering algorithm to each net in the layout of the design to minimize delay in the design without violating local area constraints imposed by the layout. At step 130, if each smaller block includes no more than one gate, the method proceeds to step 116. At step 130, if one or more smaller blocks each include more than one gate, the method returns to step 118.
Although particular steps of the method illustrated in
Particular embodiments have been used to describe the present invention, and a person having skill in the art may comprehend one or more changes, substitutions, variations, alterations, or modifications to the particular embodiments used to describe the present invention. The present invention encompasses all such changes, substitutions, variations, alterations, and modifications within the scope of the appended claims.
This application claims the benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Application No. 60/636,319, filed Dec. 14, 2004.
Number | Date | Country | |
---|---|---|---|
20060129960 A1 | Jun 2006 | US |
Number | Date | Country | |
---|---|---|---|
60636319 | Dec 2004 | US |