This patent application makes reference to, claims priority to and claims benefit from U.S. Provisional Patent Application Ser. No. 61/025,548 filed on Feb. 1, 2008. The above stated provisional application is hereby incorporated herein by reference in its entirety.
The present invention relates to a system and method for use in electronic design software to perform minimum area retiming, which relocates registers or flip-flops in a circuit design through a sequential transformation that preserves circuit functionality, and more particularly to such a retiming system and method that is optimal and efficient and that minimizes memory usage.
Retiming is one of the most powerful sequential transformations; it relocates the registers or flip-flops (FFs) in a circuit while preserving the circuit's functionality. Because relocating the FFs can balance the longest combinational paths and reduce the circuit states, retiming optimizations can reduce both the clock period and the FF area, or number of FFs, in a circuit. Minimum clock period retiming minimizes the clock period and thus might significantly increase the FF area. Minimum area retiming, by contrast, minimizes the FF area under a given clock period and thus can be used to minimize the FF area even at the minimum clock period. The min-area retiming problem is therefore more important for sequential circuit design, but it is also of higher complexity. See, e.g., N. Shenoy and R. Rudell, “Efficient implementation of retiming,” ICCAD, pages 226-233 (1994).
All known approaches to min-area retiming with provable optimality follow the basic ideas of C. E. Leiserson and J. B. Saxe, “Retiming synchronous circuitry,” Algorithmica 6(1):5-35 (1991).

Given a circuit represented as a graph of n vertices and m edges, two quantities are first computed for every pair of vertices: the minimum number of FFs between them, and the maximum delay over the paths with that minimum number of FFs. Two kinds of constraints are then generated:
- one constraint for each edge, requiring the FF number to be nonnegative;
- one constraint for each pair of vertices whose computed path delay is larger than the given clock period (i.e., each timing critical path), requiring at least one FF between them.

Minimizing the FF area under those constraints is formulated as the dual of a min-cost network flow problem. Since each constraint forms an arc in the flow network, the number of arcs in the network is usually Θ(n²). Min-cost network flow is polynomially solvable; see, e.g., R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, “Network Flows: Theory, Algorithms, and Applications,” Prentice Hall (1993). Even so, computing it over such a dense graph is usually expensive for large problems.
N. Shenoy and R. Rudell, in “Efficient implementation of retiming,” ICCAD, pages 226-233 (1994), were among the first to consider a practical implementation of the min-area retiming algorithm. They found two bottlenecks: the storage required to compute the timing critical paths, and the number of constraints. They proposed techniques to reduce memory usage and to prune some redundant constraints.

Minaret, proposed by N. Maheshwari and S. S. Sapatnekar in “Efficient retiming of large circuits,” IEEE TVLSI, 6(1):74-83 (March 1998), pruned further redundant constraints to reduce the size of the flow network. It did so by exploiting the equivalence of retiming and clock skew optimization, as proposed in ASTRA. See S. S. Sapatnekar and R. B. Deokar, “Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits,” IEEE TCAD, 15(10):1237-1248 (October 1996).

However, experimental results indicate that even with these pruning techniques, the flow networks can still be very dense compared to the original circuit graphs. In experiments on a circuit with more than 180K gates, Minaret had to formulate and solve a minimum cost network flow problem with more than 122M arcs, which used more than 2 GB of virtual memory.
H. Zhou, in “Deriving a new efficient algorithm for min-period retiming,” Asia and South Pacific Design Automation Conference, Shanghai, China (January 2005), proposed an efficient incremental algorithm for minimum period retiming. It iteratively moves FFs to decrease the clock period and finds the optimal solution in a short time.

To overcome the expense of existing approaches to minimum area retiming, D. P. Singh, V. Manohararajah, and S. D. Brown, in “Incremental retiming for FPGA physical synthesis,” DAC, pages 433-438, Anaheim, Calif. (June 2005), also proposed moving FFs incrementally in the circuit. However, these approaches allow only moves that improve the cost and are feasible in timing. They are therefore heuristic and may end up with a suboptimal solution. An efficient incremental algorithm for minimum area retiming with a provably optimal solution has remained elusive.
In accordance with the present invention, the disadvantages of prior retiming methods have been overcome. The method of retiming registers or flip-flops in a circuit design in accordance with the present invention performs minimum area retiming to relocate the registers/flip-flops in a circuit design in an efficient and optimal manner.
More particularly, the method of the present invention dynamically generates constraints; maintains the constraints as a regular forest; and incrementally relocates registers in the circuit to reduce the register area in the circuit and/or the number of registers in the circuit.
In accordance with one embodiment of the present invention, the method includes determining whether there is a cluster of gates in the circuit in which each gate has more fanouts than fanins, and relocating the flip-flops by iteratively moving a flip-flop over the cluster.
In accordance with another embodiment of the present invention, a constraint is generated only if it is directed to a path in the circuit that can lead to a reduction in the register area and/or the number of registers in the circuit.
These and other advantages and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and the drawings.
a-2h are illustrations of the iterative retiming of a circuit represented as a graph of vertices and edges in accordance with the routine of
A synchronous sequential circuit is modeled by a directed graph G=(V,E). Its vertices V represent combinational gates, and its edges E represent signals between vertices. Nonnegative gate delays are given as vertex weights d: V→R⁺, and the nonnegative numbers of FFs on the signals are given as edge weights w: E→N. Given such a graph, the min-area retiming problem asks for an FF relocation w′: E→N that minimizes the total FF area in the circuit while the circuit works under a given clock period φ.
Conventionally, to guarantee that w′ is a relocation of w, a retiming is given by a vertex labeling r: V→Z representing the number of FFs moved backward over each gate from fanouts to fanins. Given r, the FF number on the edge (u, v) after retiming is wr(u,v)=w(u,v)+r(v)−r(u). A retiming r is valid if the FF number of every edge is nonnegative,
$$P_0(r):\quad \forall (u,v)\in E:\ w_r(u,v)\ge 0$$
For a circuit to work under a given clock period φ, the maximum combinational path delay in the circuit must be at most φ. To compute the maximum path delay, a vertex labeling t: V→R is used to represent the arrival time at the output of each gate. A valid retiming r is feasible for φ if the following condition holds for some arrival times t:
$$P_1(r,\varphi):\quad \bigl(\forall (u,v)\in E:\ (w_r(u,v)>0)\lor\bigl(t(v)\ge d(v)+t(u)\bigr)\bigr)\land\bigl(\forall v\in V:\ d(v)\le t(v)\le\varphi\bigr).$$
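Both conditions can be checked directly for a candidate retiming. The sketch below is a minimal illustration, not the patented method. It assumes the graph is given as a vertex list, a delay dict `d`, and a list of `(u, v, w)` edge triples; all function names are hypothetical. It computes the smallest arrival times allowed by P1 with a longest-path pass, in topological order, over the edges whose retimed FF count is zero. It then compares those times with φ.

```python
from collections import defaultdict, deque


def retimed_weights(edges, r):
    """Yield (u, v, w_r(u, v)) with w_r(u, v) = w(u, v) + r(v) - r(u)."""
    for u, v, w in edges:
        yield u, v, w + r[v] - r[u]


def is_valid(edges, r):
    """P0(r): every edge carries a nonnegative number of FFs after retiming."""
    return all(wr >= 0 for _, _, wr in retimed_weights(edges, r))


def arrival_times(vertices, d, edges, r):
    """Smallest t satisfying the arrival-time part of P1(r, phi).

    Delay propagates only along edges with w_r(u, v) == 0, so t(v) is d(v)
    plus the largest t(u) over such fanins u. Returns None if the zero-FF
    edges contain a cycle (a combinational loop)."""
    succ = defaultdict(list)
    indeg = {v: 0 for v in vertices}
    for u, v, wr in retimed_weights(edges, r):
        if wr == 0:
            succ[u].append(v)
            indeg[v] += 1
    t = {v: d[v] for v in vertices}  # t(v) >= d(v)
    ready = deque(v for v in vertices if indeg[v] == 0)
    visited = 0
    while ready:  # Kahn-style topological order over zero-FF edges
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            t[v] = max(t[v], t[u] + d[v])  # t(v) >= d(v) + t(u)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return t if visited == len(vertices) else None


def is_feasible(vertices, d, edges, r, phi):
    """P0(r) and P1(r, phi): r is valid and every arrival time is <= phi."""
    if not is_valid(edges, r):
        return False
    t = arrival_times(vertices, d, edges, r)
    return t is not None and all(t[v] <= phi for v in vertices)
```

Consider a ring a→b→c→a of unit-delay gates with w(a,b)=w(b,c)=0 and w(c,a)=2. Without retiming, the circuit is feasible for φ=3 but not for φ=2. Setting r(c)=1 moves one FF backward over c, from (c,a) onto (b,c), and makes φ=2 feasible.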
The total FF number is $\sum_{e\in E} w_r(e)$. For any vertex $v\in V$, let FI(v) and FO(v) be the sets of the fanins and the fanouts of v, respectively. Because $\sum_{e\in E} w_r(e)=\sum_{e\in E} w(e)-\sum_{v\in V}(|FO(v)|-|FI(v)|)\,r(v)$, minimizing the total FF number is equivalent to maximizing the quantity $\sum_{v\in V}(|FO(v)|-|FI(v)|)\,r(v)$.

More generally, $b: V\to\mathbb{R}$ is defined as the labeling that represents the reduction in FF area when one FF is moved from the fanouts of a vertex to its fanins. The FF area reduction for the retiming r is then $\sum_{v\in V} b(v)\,r(v)$. With these notations, the min-area retiming problem can be formally stated as follows: maximize $\sum_{v\in V} b(v)\,r(v)$ subject to $P_0(r)\land P_1(r,\varphi)$.
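The identity between the total FF count and the objective can be checked numerically. Below is a minimal sketch assuming unit FF cost per edge (no FF sharing), so that b(v)=|FO(v)|−|FI(v)|, and edges given as `(u, v, w)` triples. The function names are hypothetical. Fanins and fanouts are counted as edges, so the sketch also assumes no parallel edges.

```python
from collections import Counter


def fanout_minus_fanin(vertices, edges):
    """b(v) = |FO(v)| - |FI(v)|: FFs saved by moving one FF backward over v,
    assuming every edge has unit FF cost and there are no parallel edges."""
    fo = Counter(u for u, _, _ in edges)
    fi = Counter(v for _, v, _ in edges)
    return {v: fo[v] - fi[v] for v in vertices}


def total_ffs(edges, r):
    """Sum over all edges of w_r(u, v) = w(u, v) + r(v) - r(u)."""
    return sum(w + r[v] - r[u] for u, v, w in edges)


def area_reduction(b, r):
    """Objective of the min-area problem: sum over v of b(v) * r(v)."""
    return sum(b[v] * r[v] for v in b)
```

Take a gate g with one fanin and three fanouts that each hold one FF, and set r(g)=1. This replaces the three fanout FFs with a single FF on the fanin. The total falls from 3 to 1, which is exactly b(g)·r(g)=2.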
For ease of presentation, b is extended to any graph $X=(V_X,E_X)$ with $V_X\subseteq V$ by defining $b(X)=\sum_{v\in V_X} b(v)$. It is likewise extended to any $I\subseteq V$ by defining $b(I)=\sum_{v\in I} b(v)$.
More complicated retiming problems can be solved with the same problem formulation stated above. One example is the sharing of the FFs at the fanouts of a gate. As proposed by C. E. Leiserson and J. B. Saxe in “Retiming synchronous circuitry,” Algorithmica, 6(1):5-35 (1991), this scenario is handled by including additional constraints in P0(r) and setting the labeling b accordingly. Let wmax(u)=max_{(u,v)∈E} w(u,v). Assume that all the fanout edges of u have the same breadth B(u), which is the cost of adding an FF along each edge. For each vertex u where the FFs at the fanouts of u should be shared, a dummy vertex um is introduced. For each fanout v of u, the breadth of the edge (u, v) is changed to
and one constraint is added to P0(r) by introducing the edge (v,um) to G with w(v,um)=wmax(u)−w(u, v) and the breadth
In Leiserson and Saxe's approach to minimum area retiming, two n×n matrices W and D are first computed to capture the critical timing constraints. Based on them, the dual of a min-cost network flow problem is formulated and solved. For any vertex pair (u, v):
- W(u, v) is the minimum number of FFs along any path from u to v;
- D(u, v) is the maximum delay of the paths from u to v with W(u, v) FFs.

If D(u,v)>φ, then there is a timing critical path from u to v, and a critical timing constraint requiring at least one FF on the path must be generated. The dual of the minimum cost network flow problem maximizes the FF area reduction subject to the nonnegative FF number requirement and all the critical timing constraints. Because W and D are usually much denser than the circuit graph, the flow network becomes dense when the given clock period is tight.

Many efforts have been made to reduce the storage requirement for computing the critical timing constraints and to prune the redundant constraints. Examples are N. Shenoy and R. Rudell, “Efficient implementation of retiming,” ICCAD, pages 226-233 (1994), and N. Maheshwari and S. S. Sapatnekar, “Efficient retiming of large circuits,” IEEE TVLSI, 6(1):74-83 (March 1998). Despite these efforts, the large number of constraints is still the bottleneck for solving min-area retiming problems.
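The W and D matrices can be computed with the classical Leiserson and Saxe construction. Each edge (u, v) is weighted with the pair (w(u,v), −d(u)), and all-pairs shortest paths are solved under lexicographic ordering. Then W(u,v) is the first component and D(u,v) is d(v) minus the second. The Python sketch below uses Floyd-Warshall and assumes every cycle carries at least one FF; function names are hypothetical. It makes the bottleneck concrete: O(n³) time and one entry per reachable vertex pair. Each pair with D(u,v)>φ yields a timing arc r(u)−r(v) ≤ W(u,v)−1.

```python
import math


def compute_w_d(vertices, d, edges):
    """Leiserson-Saxe W and D via Floyd-Warshall on lexicographic edge
    weights (w(u, v), -d(u)); assumes every cycle has at least one FF."""
    inf = (math.inf, 0)
    dist = {u: {v: inf for v in vertices} for u in vertices}
    for v in vertices:
        dist[v][v] = (0, 0)  # empty path: no FFs, delay d(v)
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], (w, -d[u]))
    for k in vertices:
        for i in vertices:
            a = dist[i][k]
            if a[0] == math.inf:
                continue
            for j in vertices:
                c = dist[k][j]
                if c[0] == math.inf:
                    continue
                cand = (a[0] + c[0], a[1] + c[1])
                if cand < dist[i][j]:  # tuple comparison is lexicographic
                    dist[i][j] = cand
    W, D = {}, {}
    for u in vertices:
        for v in vertices:
            x, y = dist[u][v]
            if x != math.inf:
                W[u, v] = x
                D[u, v] = d[v] - y
    return W, D


def critical_pairs(W, D, phi):
    """Vertex pairs whose timing constraint W_r(u, v) >= 1 must be generated."""
    return sorted((u, v) for (u, v), delay in D.items() if delay > phi)
```

Take the three-gate ring a→b→c→a with unit delays and w(c,a)=2. All nine vertex pairs receive W and D entries, even though the circuit has only three edges. For φ=2, the critical pairs are (a,c), (b,a) and (c,b).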
To avoid this bottleneck entirely, the method of the present invention does not compute the matrices W or D at all. The feasibility of clock period φ is checked by dynamically updating the gate arrival times and comparing them with φ. This follows C. E. Leiserson and J. B. Saxe, “Retiming synchronous circuitry,” Algorithmica, 6(1):5-35 (1991), and H. Zhou, “Deriving a new efficient algorithm for min-period retiming,” Asia and South Pacific Design Automation Conference, Shanghai, China (January 2005).

The objective in the retiming problem stated above indicates that, to improve a given solution, some vertices with b>0 must have their r increased. However, vertices may not be independent: if wr(u,v)=0, then increasing r(u) requires that r(v) be increased at the same time. Such a relation is not hard to maintain.

A more involved case arises when increasing r over a path makes that path longer than φ. Incremental arrival time updating is used to identify such a situation. The relation between the source u and the sink v of the violating path is then maintained as a constraint. Notably, D(u,v)>φ and r(v)+W(u,v)−r(u)=1 for such u and v. In other words, the method dynamically identifies timing arcs in Leiserson and Saxe's flow network, but only those that are currently tight and “lie on the road to improvement.”

The relations thus identified on normal circuit edges and on tight timing arcs are called active constraints. They force vertices with b>0 to be bundled with vertices with b<0. While there is still a bundle I with b(I)>0, the objective can be improved by increasing r on I; otherwise, the current retiming is already optimal.
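The bundling on normal circuit edges can be illustrated in isolation. In the minimal Python sketch below, a seed set is closed under the rule "if u is in I and wr(u,v)=0, then v is in I", and r is incremented on the bundle only when b(I)>0. This is a deliberate simplification: timing arcs and the regular-forest bookkeeping are omitted, so it is not the full method. The function names are hypothetical. Closure guarantees that every edge leaving I has wr≥1 before the increment, so P0 is preserved.

```python
from collections import deque


def close_under_zero_edges(seed, edges, r):
    """Smallest I containing `seed` with no edge (u, v), u in I, v not in I,
    and w_r(u, v) == 0; incrementing r on I then keeps P0(r)."""
    succ = {}
    for u, v, w in edges:
        if w + r[v] - r[u] == 0:
            succ.setdefault(u, []).append(v)
    closed, queue = set(seed), deque(seed)
    while queue:
        u = queue.popleft()
        for v in succ.get(u, []):
            if v not in closed:
                closed.add(v)
                queue.append(v)
    return closed


def try_improve(seed, edges, r, b):
    """Increment r on the closed bundle if b(I) > 0 (timing is NOT checked)."""
    bundle = close_under_zero_edges(seed, edges, r)
    gain = sum(b[v] for v in bundle)
    if gain > 0:
        for v in bundle:
            r[v] += 1
    return bundle, gain
```

For a gate g (b=2) with three fanout FFs to vertices with b=−1, the first call moves g alone. The second call finds g bundled with its three fanouts, which now hold no FF. That bundle has b(I)=−1, so r is left unchanged.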
More specifically, as shown in
Keeping every identified active constraint in A is not efficient, since it might make |A| very large. On the other hand, removing active constraints from A carelessly may prevent the algorithm from converging, because active constraints could cycle in and out of A. The method of the present invention overcomes this difficulty by maintaining A as a regular forest, a forest with the properties discussed below. As a result, |A| is at most n−1 and termination of the algorithm is guaranteed. The method incrementally handles dynamically generated constraints in a regular forest, which no existing algorithm does. It is therefore much more efficient when it is expensive to generate all of the constraints.
With regard to regular forests, a forest F with vertices V consists of rooted trees. For any vertex v∈V, let Tv be the subtree rooted at v. For any non-root vertex v∈V, let pv be its parent. A labeling B: V→R is maintained such that B(v)=b(Tv).

For any non-root vertex v∈V, a direction is assigned to the edge {pv, v} so that an active constraint can be derived from the edge. A labeling U(v) maintains this direction: if U(v)=true, then (v, pv) is the active constraint, and if U(v)=false, then (pv, v) is the active constraint. Let A(F) be the set of the active constraints derived from the edges of F. A tree T is defined to be regular if, for every vertex v of T that is not the root of T, the following conditions hold, which are illustrated in
1. if $b(T)>0$, then $(U(v)\land B(v)>0)\lor(\lnot U(v)\land B(v)\le 0)$;
2. if $b(T)=0$, then $(U(v)\land B(v)>0)\lor(\lnot U(v)\land B(v)<0)$;
3. if $b(T)<0$, then $(U(v)\land B(v)\ge 0)\lor(\lnot U(v)\land B(v)<0)$.
A tree is defined to be almost regular if the inequalities B(v)<0 and B(v)>0 in the above conditions are replaced with B(v)≦0 and B(v)≧0, respectively. Further, the forest F is defined to be regular if every tree T in the forest is regular.
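The regularity definition can be made concrete with a small checker. The Python sketch below computes B(v)=b(Tv) by post-order accumulation over a parent-pointer forest and tests each non-root vertex against the three conditions. It reads the logical connectives of conditions 1 to 3 as written in the `condition_holds` docstring; that reading, the data layout (`parent[v]` is `None` for roots, `U[v]` a boolean), and all names are assumptions made for illustration. The recursion is fine for small examples only.

```python
def subtree_sums(parent, b):
    """B(v) = b(T_v): sum of b over the subtree rooted at v.
    parent[v] is None when v is a root."""
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    B = {}

    def visit(v):
        # Post-order: own b plus the sums of all child subtrees.
        B[v] = b[v] + sum(visit(c) for c in children[v])
        return B[v]

    for v, p in parent.items():
        if p is None:
            visit(v)
    return B


def root_of(v, parent):
    while parent[v] is not None:
        v = parent[v]
    return v


def condition_holds(bT, Uv, Bv, almost=False):
    """Condition for a non-root v in a tree T with b(T) = bT, read as
    1. b(T) > 0: (U and B > 0) or (not U and B <= 0)
    2. b(T) = 0: (U and B > 0) or (not U and B < 0)
    3. b(T) < 0: (U and B >= 0) or (not U and B < 0)
    almost=True relaxes B > 0 to B >= 0 and B < 0 to B <= 0."""
    pos = Bv >= 0 if almost else Bv > 0
    neg = Bv <= 0 if almost else Bv < 0
    if bT > 0:
        return (Uv and pos) or (not Uv and Bv <= 0)
    if bT == 0:
        return (Uv and pos) or (not Uv and neg)
    return (Uv and Bv >= 0) or (not Uv and neg)


def is_regular_forest(parent, U, b, almost=False):
    """True if every tree is regular (or almost regular if almost=True)."""
    B = subtree_sums(parent, b)
    for v, p in parent.items():
        if p is not None:
            bT = B[root_of(v, parent)]
            if not condition_holds(bT, U[v], B[v], almost):
                return False
    return True
```

For example, a gate g (b=2) whose three fanouts each have b=−1 forms a negative tree. That tree is regular when every edge carries the constraint (g, fanout), i.e. U=false. It stops being regular if any edge is flipped.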
A tree T is positive if b(T)>0, zero if b(T)=0, and negative if b(T)<0. Let P(F), Z(F), and N(F) be the sets of all positive, zero, and negative trees in F, respectively, and let VP(F) be the set of vertices in P(F). If P(F)≠∅, then I=VP(F) is positive and closed under A(F).
Lemmas 1 and 2 are as follows.
The forest F is stored in an adjacency list data structure using O(n) storage. Two operations are assumed, each of which can be completed in O(n) time and space:
- CreateTree(F, v) removes the edge {pv, v} from the forest if v is not a root, and leaves F unchanged if v is a root.
- MergeTree(F, u, v) assumes that v is the root of a tree not containing u, and makes u the parent of v.

The subroutine ChangeRoot(F, v) as shown in
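A parent-pointer representation is enough to illustrate these operations. The Python sketch below implements CreateTree and MergeTree as described above. Its `change_root` is only an assumed behavior, since the ChangeRoot subroutine is not spelled out in this text. It reroots a tree by reversing the parent pointers from v up to the old root, flipping U along the way so that the set of active constraints A(F) is unchanged. The class, method names, and `child_to_parent` parameter are hypothetical.

```python
class Forest:
    """Hypothetical parent-pointer forest over a vertex set.
    For a non-root v, U[v] encodes the active constraint on edge
    {parent[v], v}: True -> (v, parent[v]); False -> (parent[v], v)."""

    def __init__(self, vertices):
        # F0: a forest with no edges, so every vertex is its own tree.
        self.parent = {v: None for v in vertices}
        self.U = {v: None for v in vertices}

    def root(self, v):
        while self.parent[v] is not None:
            v = self.parent[v]
        return v

    def create_tree(self, v):
        """CreateTree: cut edge {parent[v], v}; a no-op if v is a root."""
        self.parent[v] = None
        self.U[v] = None

    def merge_tree(self, u, v, child_to_parent):
        """MergeTree: make u the parent of root v (v's tree must not hold u)."""
        if self.parent[v] is not None or self.root(u) == v:
            raise ValueError("v must be the root of a tree not containing u")
        self.parent[v] = u
        self.U[v] = child_to_parent

    def change_root(self, v):
        """Assumed ChangeRoot: reverse the path from v to the old root.
        When c's old parent p becomes c's child, the constraint on {p, c}
        must stay the same, so the new U[p] is the negation of the old U[c]."""
        prev, prev_old_u, cur = None, None, v
        while cur is not None:
            old_parent, old_u = self.parent[cur], self.U[cur]
            self.parent[cur] = prev
            self.U[cur] = None if prev is None else (not prev_old_u)
            prev, prev_old_u, cur = cur, old_u, old_parent

    def active_constraints(self):
        """A(F): the directed active constraints encoded by the tree edges."""
        return {(v, p) if self.U[v] else (p, v)
                for v, p in self.parent.items() if p is not None}
```

Storing U on the child rather than on the edge keeps every operation a matter of updating parent pointers. This is why rerooting only needs a single pass along one root path.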
A forest with no edges is designated F0; that is, every vertex is its own tree in F0. The method of
Note that b(P(F))≧0 always holds, and P(F)=∅ is equivalent to b(P(F))=0. Intuitively, there are two ways to make progress. A positive tree can be combined with a negative tree to reduce b(P(F)). Alternatively, a positive tree can be combined with a zero tree to expand P(F), so that b(P(F)) can be reduced later. Such progress is captured by a potential tuple,
$$\Psi(F)=\bigl(b(P(F)),\ n-|V_{P(F)}|\bigr),$$
with the lexicographic ordering, i.e., for Ψ(F)=(x,y) and Ψ(F′)=(x′,y′), Ψ(F)<Ψ(F′) if x<x′ or (x=x′ and y<y′). Assuming that the additional active constraint is (u, v), satisfying u∈VP(F) and v∉VP(F), the UpdateForest subroutine as shown in
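Python tuples already compare lexicographically, so the potential can be evaluated directly. In the sketch below, a forest is summarized as a list of `(b(T), |T|)` pairs, one per tree. This input format and the function name are assumptions for illustration. Both kinds of progress described above strictly decrease Ψ.

```python
def potential(trees, n):
    """Psi(F) = (b(P(F)), n - |V_P(F)|) for a forest summarized as a list of
    (b(T), |T|) pairs, one per tree; only positive trees contribute."""
    positive = [(bT, size) for bT, size in trees if bT > 0]
    return (sum(bT for bT, _ in positive),
            n - sum(size for _, size in positive))
```

Take a forest on n=4 vertices with trees of b-values 2, −3 and 0. Merging the positive tree with the zero tree lowers the second component. Merging the result with the negative tree empties P(F), which lowers the first component.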
The ZeroCut subroutine as shown in
The method of
The preconditions for lines 9 and 10 are established by the following lemma.
When optimality is not determined at line 6, one of two things happens, according to Lemma 5:
- on line 12, the FF area of r is strictly decreased by b(I)>0 for some I⊆V, while Ψ(F) remains the same; or
- on lines 8 and 10, Ψ(F) is strictly decreased, while the FF area of r remains the same.

The problem is bounded, the number of subsets of V is finite, and the number of regular forests with vertices V is finite. The method therefore terminates, and at termination r is an optimal solution of the minimum area retiming problem. Together with Lemmas 2 and 6, we have the following theorem.
The method of
It is noted that it is not necessary to generate I on line 4 of
It is also not efficient, every time the algorithm reaches lines 7 and 9, to check every fanout edge of I, to compute the labelings t and q in rI, and to check every vertex. Instead, the constraints should be checked incrementally: constraints already known to hold are excluded from checking, and the labelings t and q are updated incrementally.

Two vertex queues J and K are maintained for this purpose, with the following invariants:
- For any vertex u∉J with u∈I, every edge (u, v) satisfies either v∈I or wr(u,v)>0.
- For any vertex u∉K and any vertex v in the combinational fanin cone of u (including u) in rI, t(v) and q(v) are up to date and t(v)≦φ.

On line 7, vertices u are repeatedly removed from J until an edge (u,v) leaving I with wr(u,v)=0 is found or J is empty. On line 9 of
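Incremental arrival time maintenance can be sketched with a work queue, in the spirit of the queue K above. In the Python sketch below, only vertices whose zero-FF fanins changed are re-evaluated, and a vertex's fanouts are re-queued only when its arrival time actually changes. The search stops at the first vertex whose arrival time exceeds φ. The labeling q is not modeled, and all names are hypothetical. The sketch assumes `pred0`/`succ0` list fanins/fanouts over edges with wr=0, that these edges are acyclic, and that `t` is consistent outside the dirty set.

```python
from collections import deque


def update_arrival_times(t, d, pred0, succ0, dirty, phi):
    """Restore t(v) = d(v) + max(t(u) over zero-FF fanins u) after a change.

    Starts from the `dirty` vertices and re-queues fanouts only when t(v)
    changes. Returns the first vertex found with t(v) > phi (t is then only
    partially updated), or None once all affected times are consistent."""
    queue, queued = deque(dirty), set(dirty)
    while queue:
        v = queue.popleft()
        queued.discard(v)
        new_t = d[v] + max((t[u] for u in pred0.get(v, ())), default=0)
        if new_t == t[v]:
            continue  # unchanged: the fanout cone needs no re-evaluation
        t[v] = new_t
        if new_t > phi:
            return v  # a violating path ends at v
        for w in succ0.get(v, ()):
            if w not in queued:
                queued.add(w)
                queue.append(w)
    return None
```

Removing an FF from edge (b,c) in a unit-delay chain a→b→c makes the dirty vertex c inherit b's arrival time. With φ=2, the update reports c as the sink of a violating path.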
If the sharing of the FFs at the fanouts of gates is considered, redundant constraints can be introduced into P0. Let u be any vertex with the dummy vertex um, and let v be a fanout of u. In P0(r), we have w(u,v)+r(v)−r(u)≧0 and wmax(u)−w(u,v)+r(um)−r(v)≧0. Thus, wmax(u)+r(um)−r(u)≧0. This redundant constraint is inserted into P0 and is checked first on line 7 after u is removed from the vertex queue J.

The effect is that when both (u, v) and (v, um) are active constraints, (u, um) is directly identified as an active constraint. Thus u and um are included in one regular tree without requiring a detour through v. As b(u)>0 and b(u)+b(um)=0 for most u, b(P(F)) is reduced more frequently without the need to expand P(F) first, and the method runs faster.
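Written out, the redundant constraint is simply the sum of the two P0 inequalities it is derived from; the r(v) and w(u,v) terms cancel:

```latex
\begin{aligned}
&\bigl[w(u,v) + r(v) - r(u)\bigr] + \bigl[w_{\max}(u) - w(u,v) + r(u_m) - r(v)\bigr] \\
&\qquad = w_{\max}(u) + r(u_m) - r(u) \;\ge\; 0 + 0 = 0 .
\end{aligned}
```

For example, with w(u,v)=1 and wmax(u)=3, the two edge constraints read r(v) ≥ r(u) − 1 and r(um) ≥ r(v) − 2. Together they give r(um) ≥ r(u) − 3, which is exactly the inserted constraint.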
The retiming method of the present invention has been found to be substantially faster than prior methods and it uses considerably less memory. Moreover, because the method is incremental, it can be stopped at any time that the circuit designer is satisfied with the register/FF area or number of registers/FFs.
Many further modifications and variations of the present invention are possible in light of the above teachings. Thus, it is to be understood that, within the scope of the appended claims, the invention may be practiced otherwise than as specifically described hereinabove.
This work was supported by the National Science Foundation (NSF) Grant No. CCR-0238484.
Entry |
---|
Sapatnekar et al.; “Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits”; IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 10, pp. 1237-1248; 1996. |
Wang et al.; “An efficient incremental algorithm for min-area retiming”; 45th ACM/IEEE Design Automation Conference (DAC 2008), pp. 528-533; 2008. |
Lalgudi et al.; “Retiming edge-triggered circuits under general delay models”; IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 16, no. 12, pp. 1393-1408; 1997. |
Baumgartner et al.; “Min-area retiming on flexible circuit structures”; IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2001), pp. 176-182; 2001. |
Sundararajan et al.; “Marsh: min-area retiming with setup and hold constraints”; IEEE/ACM International Conference on Computer-Aided Design (ICCAD 1999), Digest of Technical Papers, pp. 2-6; 1999. |
Hurst et al.; “Scalable min-register retiming under timing and initializability constraints”; 45th ACM/IEEE Design Automation Conference (DAC 2008), pp. 534-539; 2008. |
Maheshwari et al.; “Minimum area retiming with equivalent initial states”; IEEE/ACM International Conference on Computer-Aided Design (ICCAD 1997), Digest of Technical Papers, pp. 216-219; 1997. |
Lin et al.; “Optimal wire retiming without binary search”; IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 9, pp. 1577-1588; 2006. |
International Search Report and the Written Opinion of the International Searching Authority in International Application No. PCT/US2009/032810. |
Number | Date | Country | |
---|---|---|---|
20090199146 A1 | Aug 2009 | US |
Number | Date | Country | |
---|---|---|---|
61025548 | Feb 2008 | US |