The present invention relates generally to the design and evaluation of integrated circuits (“ICs”).
Early planning of buffer and wiring resources is a critical aspect of every modern high-performance very large scale integration (“VLSI”) implementation methodology. Today, such planning is needed to evaluate the quality of register transfer (“RT”) level partitioning and soft (pre-synthesis) block placement/shaping, system-level timing constraints, and pin definition and buffered routing of global interconnects.
While the requirements for global wire planning as an adjunct to floorplan definition (i.e., the floorplan definition must take into account congestion, wire length, and timing, among other things) and the need for simultaneous pin assignment and global routing have not changed very much in the past ten to twenty years, it is well-understood that today's context for floorplan definition and global wire planning has evolved. Channel-less multilayer area routing has replaced channel/switchbox routing; interconnect delays are more balanced with appropriately sized gate delays, and no longer dominated by gate delays; layer RC constants vary by factors of up to 100, so that layer assignment must be planned; global interconnects are buffered; and floorplanning is at the RT-level (instead of physical floorplanning) with soft blocks having uncertain area/delay envelopes. At the same time, the underlying problem formulations and algorithmic technologies have separately advanced in at least three important ways: “buffer block” methodology, optimizations for individual global nets, and provably good global routing (i.e., global routing that reflects near-optimal solutions, or solutions with a proven approximation ratio, to problem formulations).
The “buffer block” methodology, along with the associated planning problem (i.e., solving for locations and capacities of buffer blocks), has been proposed and further elucidated. While the buffer block methodology has been used recently in hierarchical structured-custom (high-end microprocessor) methodologies, it may be less relevant to flat or application-specific integrated circuit (“ASIC”)-like regimes due to issues of separate power distribution, congestion, etc. To alleviate congestion problems associated with the use of buffer blocks, a “buffer site” methodology has been proposed which more uniformly distributes buffers across the chip wherever possible. In the buffer site methodology, block designers leave “holes” in their designs that can be used to insert buffers during the routing of global wires. The percentage of the block area left unused depends on the criticality of the block, ranging from 0% for high performance blocks, such as caches, up to a few percent for lower performance blocks.
The increased impact of interconnects on system performance in deep-submicron technologies has led to a large amount of literature on performance-driven optimizations for individual global nets. Such optimizations include buffer insertion and sizing, wire sizing, and topology synthesis.
Provably good global routing has been developed based on the primal dual framework, starting with “column-generating” analogies, then continuing with the exploitation of recent fast approximations for multi-commodity flows. More recently such provable approximations have been applied to the problem of global routing with a prescribed buffer block plan, taking into account signal parity, delay upper/lower bounds, and other practical considerations.
The present invention includes a system and method for evaluating a floorplan and for defining a global buffered routing for an integrated circuit. A method embodiment of the invention includes constructing a graphical representation of the integrated circuit floorplan, including wire capacity and buffer capacity; formulating an integer linear program from said graphical representation; and finding a solution to said integer linear program.
The present invention includes a method and system for evaluating IC wire routing and buffer resources and for constructing IC global buffered routings. The present invention may be used for IC floorplan evaluation and for construction of global routing and buffer insertion for ICs. A method embodiment of the invention includes constructing a graphical representation of the integrated circuit floorplan, including wire capacity and buffer capacity; formulating an integer linear program from said graphical representation; and finding a solution to said integer linear program. The present invention allows floorplan evaluation, global routing, and buffer insertion for ICs that effectively and simultaneously take into account buffer and wire congestion, buffer and wire sizing, multiple global nets, pin assignments, and timing constraints. Such floorplan evaluation, global routing, and buffer insertion are desirable in order to reduce the design time and improve the performance of complex, large-scale ICs.
A feasible buffered solution to the floorplan evaluation problem formulated from the tile graph 20 seeks for each net Ni an si-ti path Pi buffered using the available buffer sites 26 such that the source 28 and the buffers drive at most U units of wire, where U is a given upper-bound. In the tile graph 20 of
The set of all feasible routings (Pi, Bi) for net Ni is denoted by
and the relative wire congestion is
The buffered paths (Pi, Bi), i=1, . . . , k, are simultaneously routable if and only if both μ≦1 and v≦1. To leave resources available for subsequent optimization of critical nets and engineering change order (“ECO”) routing, simultaneous buffered routings with buffer and wire congestion bounded away from 1 are generally sought.
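The routability test can be illustrated with a small sketch. Because the congestion formulas are not reproduced in the text above, the sketch assumes a common reading: the relative buffer congestion μ is the largest ratio, over tiles, of buffers inserted to available buffer sites, and the relative wire congestion v is the largest ratio, over tile-graph edges, of routed wires to wire capacity. The function names and the data layout are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def relative_congestion(routings, b, w):
    """Return (mu, nu) for a list of buffered paths.

    routings: list of (path, buffers); path is a list of tiles and buffers
              is a list of tiles holding an inserted buffer (assumed layout).
    b: dict tile -> number of buffer sites
    w: dict frozenset({u, v}) -> wire capacity of tile-graph edge (u, v)
    Assumed definitions: mu = max_v used(v)/b(v), nu = max_e used(e)/w(e).
    """
    buf_use, wire_use = Counter(), Counter()
    for path, buffers in routings:
        buf_use.update(buffers)
        wire_use.update(frozenset(e) for e in zip(path, path[1:]))

    def ratio(used, cap):
        # A used resource with zero capacity can never be routed.
        return Fraction(used, cap) if cap > 0 else float("inf")

    mu = max((ratio(n, b.get(t, 0)) for t, n in buf_use.items()), default=Fraction(0))
    nu = max((ratio(n, w.get(e, 0)) for e, n in wire_use.items()), default=Fraction(0))
    return mu, nu

def simultaneously_routable(routings, b, w):
    """True if and only if both relative congestions are at most 1."""
    mu, nu = relative_congestion(routings, b, w)
    return mu <= 1 and nu <= 1
```

Exact fractions are used so that the comparison against 1 is not affected by floating-point rounding.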
Using the total wire and buffer area as a measure of floorplan quality, the tile graph yields this floorplan evaluation problem:
Given:
tile graph G=(V, E, b, w), with buffer and wire capacities b: V→
set
wireload, buffer congestion, and wire congestion upper-bounds U>0, μ0≦1, and v0≦1.
Find: feasible buffered routing (Pi, Bi) for each net Ni with relative buffer congestion μ≦μ0 and relative wire congestion v≦v0, minimizing the total wire and buffer area, i.e.,
where α, β≧0 are given constants.
The gadget graph 32 H has U+1 vertex copies v0, v1, . . . , vU for each tile v∈V(G). Four exemplary vertex copies 34 are indicated in
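Formally, the gadget graph 32 H has vertex set
$$V(H)=\{s_i, t_i \mid 1\le i\le k\}\cup\{v_j \mid v\in V(G),\ 0\le j\le U\}$$
and arc set
$$E(H)=E_{src}\cup E_{sink}\cup\Big(\bigcup_{(u,v)\in E(G)}E_{u,v}\Big)\cup\Big(\bigcup_{v\in V(G)}E_v\Big)$$
where
$$E_{src}=\{(s_i, v_U)\mid v\in S_i,\ 1\le i\le k\}$$
$$E_{sink}=\{(v_j, t_i)\mid v\in T_i,\ 0\le j\le U,\ 1\le i\le k\}$$
$$E_{u,v}=\{(u_j, v_{j-1}),\ (v_j, u_{j-1})\mid 1\le j\le U\}$$
$$E_v=\{(v_j, v_U)\mid 0\le j\le U-1\}.$$
Here the copy index j is the wireload still available to the current driver: each wire arc lowers the index by one, and each buffer arc resets it to U.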
Each directed path in the gadget graph 32 H corresponds to a buffered routing in the tile graph 20 G, obtained by ignoring copy indices for the vertex copies 34 and replacing each directed arc 36 (vj, vU) with a buffer inserted in the tile 22 v. The construction ensures that the wireload of each buffer is at most U since a directed path in gadget graph 32 H can visit at most U vertex copies 34 before following a directed buffer arc 36.
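The construction can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation of the invention: it assumes that the copy index j is the wireload budget remaining for the current driver (so wire arcs go from copy j to copy j−1 and buffer arcs go from copy j<U back to copy U), it uses a hypothetical edge-list tile graph for a single net, it ignores buffer and wire capacities, and it uses breadth-first search as a stand-in for a weighted path computation.

```python
from collections import deque

def build_gadget(tiles, edges, U, sources, sinks):
    """Arcs of the gadget graph for one net as {vertex: [(next_vertex, label)]}.

    Vertices are (tile, j) copies plus the terminals "s" and "t".
    """
    arcs = {}

    def add(a, b, label):
        arcs.setdefault(a, []).append((b, label))

    for v in sources:                 # E_src: the source starts with full budget U
        add("s", (v, U), "src")
    for v in sinks:                   # E_sink: any copy of a sink tile may end the path
        for j in range(U + 1):
            add((v, j), "t", "sink")
    for u, v in edges:                # E_{u,v}: one unit of wire uses one unit of budget
        for j in range(1, U + 1):
            add((u, j), (v, j - 1), "wire")
            add((v, j), (u, j - 1), "wire")
    for v in tiles:                   # E_v: a buffer resets the budget to U
        for j in range(U):
            add((v, j), (v, U), "buffer")
    return arcs

def buffered_route(arcs):
    """Find an s-t path by BFS; return (tile path, buffer tiles) or None."""
    prev = {"s": None}
    queue = deque(["s"])
    while queue:
        a = queue.popleft()
        if a == "t":
            break
        for b, label in arcs.get(a, []):
            if b not in prev:
                prev[b] = (a, label)
                queue.append(b)
    if "t" not in prev:
        return None
    path, buffers, node = [], [], "t"
    while prev[node] is not None:
        a, label = prev[node]
        if label == "buffer":
            buffers.append(a[0])      # arc (v_j, v_U) becomes a buffer in tile v
        elif label in ("wire", "src"):
            path.append(node[0])      # ignore the copy index
        node = a
    return path[::-1], buffers[::-1]
```

For a chain of four tiles with the source in the first and the sink in the last, a bound of U=3 needs no buffer, while U=1 forces a buffer in each intermediate tile.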
In
There is a one-to-one correspondence between the feasible buffered routings for net Ni in the tile graph 20 G and the si-ti paths in gadget graph 32 H (lemma 1).
In an embodiment of the invention, the correspondence established in lemma 1 is used to give an integer linear program (“ILP”) formulation for the floorplan evaluation problem. Let
subject to
Solving the ILP is NP-hard (where “NP” means “nondeterministic polynomial time”). A preferred embodiment of the invention solves, exactly or approximately, a fractional relaxation of the ILP (obtained by replacing the constraints xp∈{0,1} with xp≧0) and then obtains near-optimal integer solutions by randomized rounding.
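The rounding step can be illustrated as follows. This is a minimal sketch that assumes the fractional solution is available as explicit per-net path flows summing to 1 for each net; the function name and the dictionary layout are hypothetical.

```python
import random

def randomized_rounding(fractional, rng=None):
    """Pick one path per net, choosing path p with probability x_p.

    fractional: dict net -> dict path -> x_p, where the x_p of each net are
    non-negative and sum to 1 (assumed layout; a path is any hashable value).
    """
    rng = rng or random.Random()
    chosen = {}
    for net, flows in fractional.items():
        paths = list(flows)
        weights = [flows[p] for p in paths]
        chosen[net] = rng.choices(paths, weights=weights, k=1)[0]
    return chosen
```

Each net is rounded independently, so the expected load on every tile and edge equals its fractional load.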
An embodiment of the invention uses an efficient approximation for solving the fractional relaxation of the ILP. An upper bound D is introduced on the total wire and buffer area and the following linear program (LP) is considered:
min λ
subject to
Let λ* be the optimum objective value for the LP. Solving the fractional relaxation of the ILP is equivalent to finding the minimum D for which λ*≦1. This can be done by a binary search that requires solving the LP for each probed value of D. A lower bound on the optimal value of D can be derived by ignoring all buffer and wire capacity constraints, i.e., by computing for each net Ni buffered paths p∈
A trivial upper bound is the total routing area available, i.e.,
Infeasibility in the fractional relaxation of the ILP is equivalent to λ* being greater than 1 when D=Dmax, and can therefore be detected using the algorithm described below.
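The search over D can be sketched as below. The LP solver is represented by a hypothetical callable `solve_lp` that returns λ* for a given area bound D, and the sketch assumes that λ* does not increase as D grows; the tolerance parameter is also an assumption.

```python
def min_feasible_area(solve_lp, d_min, d_max, tol=1e-3):
    """Smallest D in [d_min, d_max] with solve_lp(D) <= 1, up to tol.

    solve_lp: hypothetical oracle returning lambda* for the area bound D.
    Returns None when even D = d_max gives lambda* > 1 (infeasible instance).
    """
    if solve_lp(d_max) > 1:
        return None
    if solve_lp(d_min) <= 1:
        return d_min
    lo, hi = d_min, d_max             # invariant: lo is infeasible, hi is feasible
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if solve_lp(mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi
```

The first check corresponds to the infeasibility test at D=Dmax; the second uses the lower bound on D as the starting point of the search.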
subject to
The algorithm of
Minimum-weight paths are computed in line 11 of the algorithm of
The algorithm of
minimum-weight path computations, using
where n is the number of tiles 22 or vertices 23 and m is the number of edges 24 of tile graph 20 G, respectively, and
ε′:=ε(1+ε)(1+εγ).
In an embodiment of the invention, after the LP is solved using the algorithm of
A direct implementation of randomized rounding requires storing explicitly all paths with non-zero flow. However, this may be infeasible in the case of limited memory capacity. An alternative to storing explicitly all paths with non-zero flow is to compute edge flows instead of path flows with the algorithm of
An embodiment of the invention makes use of another implementation that requires storing only a single path per net, in which randomized rounding is interleaved with computation of the fractional flows xp. The path selected for each net is continuously updated as follows. In the first phase, the single path routed for each net becomes the net's choice with probability 1. In iteration r>1, the path routed for net i replaces the previous selection of net i with a probability of 1/r. The path selected after t phases is therefore equally likely, with probability 1/t, to be the path routed in any of the phases r=1, . . . , t, i.e., the probability that a path p is the final selection is equal to the fractional flow xp computed by the algorithm of
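The single-path update is a form of reservoir sampling and can be sketched as follows. The sketch assumes that the path routed in phase r replaces the stored path with probability 1/r, which is the choice that makes every phase equally likely to supply the final selection; the routing itself is not modeled, and the function name is hypothetical.

```python
import random

def interleaved_selection(paths_by_phase, rng=None):
    """Keep one path while the phases stream by.

    paths_by_phase: iterable yielding the path routed for one net in
    phases 1, 2, ..., t (hypothetical input).
    """
    rng = rng or random.Random()
    selected = None
    for r, path in enumerate(paths_by_phase, start=1):
        # Phase 1 always takes the path; phase r > 1 replaces it with probability 1/r.
        if rng.random() < 1.0 / r:
            selected = path
    return selected
```

Only the currently selected path is stored, so memory use per net is independent of the number of phases.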
In an embodiment of the invention, the paths routed for each net in the last K=5 phases of the algorithm of
In an embodiment of the invention, dependence on λ* of the algorithm of
Using known ideas, it can be shown that the algorithm of
In an embodiment of the invention, line 2 of the algorithm of
To evaluate a floorplan at an early stage of the design process, it is useful not only to find the minimum routing area needed for given bounds μ0 and v0 on the relative buffer and wire congestion, but also to find how the total routing area increases if a smaller congestion is enforced. A floorplan is better if a smaller area increase is needed for the same decrease in congestion. Let the minimum routing area needed for a fractional solution with relative buffer and wire congestion not more than μ and v, respectively, be denoted by Λ(μ, v). In the following, a vector x denotes the fractional solution xp, p∈
Another lemma shows that in certain cases a value Λ(μ, v) can be derived from an optimal solution of the LP, so the binary search described above can be avoided: letting x be an optimal solution for the LP for a given D, μ0, and v0, if there exists a solution x′ with
then Λ(μ(x), v(x))=A(x) (lemma 3).
In an embodiment of the invention, the full area-versus-congestion tradeoff curve is computed as follows. The feasible region (which is also convex) for μ and v is computed by ignoring the constraint on the area. The LP is then solved for certain values of D, μ0 and v0. If the solution is on the boundary of the feasible region, D is decreased such that μ and v increase; otherwise, a new point for the area and congestion tradeoff curve has been found.
Embodiments of the invention using the algorithm of
There is a considerable degree of flexibility available for pin assignment at the early stage of floorplan design. In an embodiment of the invention, consideration of pin assignment requires only two small changes in the construction of the gadget graph 32 H. First, source vertices si must now be connected by directed arcs to the U-th copies of all vertices 23 representing enclosing tiles 22. Second, copies 0, . . . , U of all nodes representing enclosing tiles 22 must be connected by directed arcs into the sink vertices ti. Pin assignments are read from the paths selected by randomized rounding by assigning to each source an arbitrary pin in the tile 22 visited first, and to each sink an arbitrary pin in the tile 22 visited last, by the selected path for the net. This embodiment of the invention does not distinguish between multiple pin assignments within a tile, since the within-tile pin assignment has no effect on tile-level congestion and routing area estimates. The gadget graph in this embodiment is virtually the same size as the basic gadget graph 32 H. For k nets, only O(k) edges are added to the gadget graph 32 H under the realistic assumption that each pin can be assigned to at most O(1) tiles. Therefore, the time required to find minimum-weight paths, and hence the overall runtime of the algorithm of
The present invention also permits consideration of given sink delay constraints. For simplicity, an embodiment of the invention that deals with sink delay constraints assuming only a single buffer type and a single wire size are available is first discussed, with no intention of limiting the invention in any way. Then, an embodiment of the invention that may simultaneously handle buffer and wire sizing is discussed.
Assume an upper-bound of di on the source 22-to-sink 24 delay of net Ni. The delay of a wire segment connecting the source 22 or buffer u to the sink 24 or buffer v is the sum of the gate delay
$$\mathrm{intrinsic\_delay}_u + r_u\cdot\left(c_w l_{u,v} + C_{in}(v)\right)$$
and the wire delay
$$r_w l_{u,v}\cdot\left(c_w l_{u,v}/2 + C_{in}(v)\right)$$
where ru is the output resistance of the driving buffer/terminal u and Cin(v) is the input capacitance of the driven buffer/terminal v; rw and cw are the resistance and capacitance, respectively, of a tile-long wire; and lu, v is the wire length in tiles between u and v. Here, the term “gate” encompasses sources, sinks, and buffers, and the term “terminal” encompasses sources 22 and sinks 24.
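The two delay terms can be evaluated directly, as in the following sketch. The parameter names are hypothetical, units are left to the caller, and `path_delay` additionally assumes that all drivers on the path share the same intrinsic delay and output resistance and that all loads share the same input capacitance.

```python
def segment_delay(length, intrinsic_delay, r_out, c_in, r_wire, c_wire):
    """Gate delay plus wire delay of one routing segment.

    length: wire length in tiles between driver u and load v
    intrinsic_delay, r_out: intrinsic delay and output resistance of driver u
    c_in: input capacitance of load v
    r_wire, c_wire: resistance and capacitance of a tile-long wire
    """
    gate = intrinsic_delay + r_out * (c_wire * length + c_in)
    wire = r_wire * length * (c_wire * length / 2 + c_in)
    return gate + wire

def path_delay(segment_lengths, **params):
    """Delay of a buffered path as the sum of its segment delays."""
    return sum(segment_delay(length, **params) for length in segment_lengths)
```

Because the wire term grows quadratically with length, splitting a long segment with a buffer can reduce the total delay even though it adds a gate delay.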
To further simplify the description of this embodiment of the invention, again without limiting the scope of the invention, it is assumed that the intrinsic delay and output resistance of sources are equal to the corresponding parameters of a buffer. (Non-uniform parameters are discussed below.) Under this assumption the total delay (i.e., gate delay plus wire delay) of each routing segment depends only on the segment's length, l, and the input capacitance of the driven buffer/sink. Every routing segment ending in tile 22 v corresponds in the gadget graph 32 H to a path whose last directed arc is either the directed arc 36 (v1, v0), if the segment drives a buffer, or the directed arc (vl, ti), if the segment drives the i-th sink. Since these directed arcs fully identify both the segment length and the input capacitance of the driven buffer/sink, we can assign them pre-computed segment delays and obtain this lemma (lemma 4): the one-to-one correspondence between feasible buffered routings of net Ni in tile graph 20 G and the si-ti paths in gadget graph 32 H preserves the delay.
If given an upper-bound of di on the source 22-to-sink 24 delay of net Ni, computation of the solution by use of the algorithm of
An embodiment of the invention uses modifications of the gadget graph 32 H described above to handle sink delay constraints. This embodiment, in general, applies to any delay model, such as the Elmore delay model, for which (1) the delay of a buffered path is the sum of the delays of the path segments separated by the buffers, and (2) the delay of each segment depends only on segment length and buffer parameters. (This embodiment does not take into account the slope at the input of the driving buffer, but this is not a significant problem in the context of early floorplan evaluation.) For efficiency, this embodiment requires that segment delays be rounded to relatively coarse units.
The resulting gadget graph 40 H in this embodiment of the invention is acyclic, so minimum-weight paths in the approximation algorithm of
An embodiment of the invention also takes into account buffer and wire sizing during timing-driven IC floorplan evaluation. Looking first at the case of using a given buffer library
In another embodiment of the invention, wire sizing may be taken into account. To reduce the complexity of the problem, fixed wire widths are required along any source-to-buffer wire segment, any buffer-to-buffer wire segment and any buffer-to-sink wire segment, a requirement that may increase propagation delays by at most 5% compared to the optimum delay achieved by wire tapering. Given a library of wires of different widths
An embodiment of the invention incorporates buffer and wire sizing through modifications of the gadget graph 32 H. A gadget graph 48 H for buffer sizing is shown in
An embodiment of the invention handles wire sizing (and a coarse form of layer assignment) by a different modification of the gadget graph 32 H. Assuming that per unit capacitances of the thinner wire widths are rounded to integer multiples of the “standard” per unit capacitance, the gadget graph 54, shown in
Inverting buffers are often preferred to non-inverting buffers since they occupy a smaller area for the same driving strength. The use of inverting buffers introduces additional polarity constraints, which can require a larger number of buffers to be inserted, but inverting buffers may still lead to better overall resource utilization. Algorithms for bounded capacitive load inverting and non-inverting buffer insertion are known; the prior art focuses on single net buffering with arbitrary positions for the buffers. But in the floorplan evaluation problem, the goal is to minimize the overall number of buffers required by the nets, and buffers can be inserted only in the available sites.
In an embodiment of the invention, consideration of polarity constraints is achieved by modifying a gadget graph 32 H as shown in
In the modified gadget graph 62, each vertex copy 64 of the basic gadget graph 60 H is replaced by an “even” vertex copy 67 and an “odd” vertex copy 69, i.e., vi is propagated into vieven and viodd. Exemplary even vertex copies 67 and exemplary odd vertex copies 69 are indicated in
The gadget graph 62 H also allows two inverting buffers to be inserted in the same tile for the purpose of meeting polarity constraints. This is achieved by providing bidirectional arcs 74 connecting the U-th even and odd copies of a tile v, i.e., (vUeven, vUodd) and (vUodd, vUeven). Finally, source vertices si (not shown) are connected by directed arcs (not shown) to the even U-th vertex copy 66 representing an enclosing tile 22, and only vertex copies 67 or 69 of the desired polarity have directed arcs (not shown) going into sink vertices ti (not shown).
An embodiment of the invention provides for multipin nets: nets including more than one sink ti. For multipin nets, a buffered tree, rather than a buffered path, is sought in which the wireload of each buffer is at most U. The algorithm of
Under the assumption that the driving strength of the source terminals is identical to the driving strength of a buffer, and the input capacitance of the sink terminals is identical to the input capacitance of a buffer, a feasible solution to a floorplan evaluation problem will satisfy load capacitance constraints regardless of which source terminal is driving the net. Thus, an embodiment of the invention may be used with respect to instances that contain multi-driven nets such as buses. However, application of an embodiment of the invention to multi-driven nets seems feasible only for the case in which buffers are non-inverting (i.e., there are no polarity constraints). Further, an embodiment of the invention capable of handling multipin nets cannot handle multi-driven nets with simultaneous upper-bounds on delays for paths involving more than one source.
An embodiment of the invention decreases the tile size to increase accuracy. However, this results in significant increases in running time. Furthermore, when the tile size decreases beyond a certain point, the channel widths and the number of buffer sites per tile may become so small that the accuracy of the randomized rounding is greatly reduced. Ideally, the channel widths and buffer sites per tile should be approximately the same for all tiles. If a tile is too crowded, potential congestion violations can be missed, and if a tile is too sparse, then the solution of the linear program relaxation cannot be rounded accurately. This embodiment of the invention uses uneven tile sizes to achieve evenly populated tiles, implemented by using appropriate target values for channel width and buffer sites per tile, and, starting with a coarse grid, recursively partitioning the overpopulated tiles into four equal sub-tiles until the target tile occupancy is reached.
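The recursive partitioning can be sketched as follows. The tile representation, the `occupancy` callback, and the `min_size` stopping rule are assumptions made for illustration; the text above specifies only that overpopulated tiles are split into four equal sub-tiles until the target occupancy is reached.

```python
def partition_tiles(tile, occupancy, target, min_size=1):
    """Recursively split overpopulated tiles into four equal sub-tiles.

    tile: (x, y, size) square with lower-left corner (x, y)
    occupancy: hypothetical callback returning the number of buffer sites
               (or the channel width) found inside a tile
    target: target tile occupancy; tiles at or below it are kept whole
    min_size: assumed lower limit on tile size, to guarantee termination
    """
    x, y, size = tile
    if occupancy(tile) <= target or size <= min_size:
        return [tile]
    half = size / 2
    leaves = []
    for dx in (0, half):
        for dy in (0, half):
            leaves += partition_tiles((x + dx, y + dy, half), occupancy, target, min_size)
    return leaves
```

Starting from each tile of the coarse grid, the function returns the uneven tiles that would become the vertices of the tile graph.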
An embodiment of the invention handles not only constraints on the number of buffer sites in each tile, but also additional constraints on the total number of buffers in a set of tiles, i.e., in a window. For instance, these additional constraints may explicitly bound the total number of buffers in a given block.
The algorithm of
While various embodiments of the present invention have been shown and described, it should be understood that other modifications, substitutions, and alternatives are apparent to one of ordinary skill in the art. Such modifications, substitutions, and alternatives can be made without departing from the spirit and scope of the invention, which should be determined from the appended claims.
Various features of the present invention are set forth in the appended claims.
This application is related to now abandoned provisional application Ser. No. 60/413,096, filed on Sep. 24, 2002, and claims priority from that provisional application under 35 U.S.C. § 119. Provisional application Ser. No. 60/413,096 is incorporated herein by reference in its entirety.
Number | Date | Country | |
---|---|---|---|
20040117753 A1 | Jun 2004 | US |
Number | Date | Country | |
---|---|---|---|
60413096 | Sep 2002 | US |