In Electronic Computer Aided Design (ECAD) software systems, an integrated circuit design specification and implementation data must be stored as a set of database records, and these records have some finite maximum size based on the virtual memory capacity of the computer on which the software is running. In addition, the execution time of the ECAD software normally increases with the size of the design. The data to represent a very large integrated circuit design may be too large to fit in a computer's memory, or the execution time required to design or simulate the entire design may be prohibitive. This is particularly true where the number of components (i.e. gates) and attendant connections within an integrated circuit are in the 10s or 100s of millions or more.
Hierarchical decomposition or “partitioning” is a technique which may be used to reduce the complexity of a large integrated circuit design specification so that the memory and/or execution time required to complete the design remains manageable. Instead of representing the design as a single flat database, the design is partitioned into pieces, often called “blocks”, which can be designed and verified independently. With a given single level of hierarchy, the design specification consists of a set of blocks and the top-level interconnections between those blocks. With multiple levels of hierarchy the blocks may themselves consist of smaller sub-blocks and their interconnections.
Hierarchical decomposition may also be used simply as an organizational tool by a design team as a method for partitioning a design project among several designers. However, this logical hierarchy created by the design team in the design specification does not need to be the same as the physical hierarchy used to partition the design for implementation. Often the logical hierarchy is much deeper than the physical hierarchy. A process of block flattening may be used to transform the logical hierarchy into an appropriate physical hierarchy.
A conventional hierarchical design project typically proceeds in two major steps: a top-down block planning step followed by a bottom-up verification step. If the blocks themselves are implemented during the top-down phase (i.e. each block is implemented before its children) the flow is referred to as a top-down flow. Conversely, if the blocks are implemented during the bottom-up phase (i.e. each block is implemented after all of its children have been completed) the flow is referred to as a bottom-up flow. The top-down and bottom-up flows each have their advantages and disadvantages. Without loss of generality, a top-down flow is used as an example in the remainder of this document. A bottom-up flow could be implemented using identical techniques.
At this point in a top-down flow, after the top-level block has been planned, the process is prepared to implement the block. All leaf-cells (standard cells and macros) owned by the block are assigned a placement, and all nets owned by the block are routed (step 140). If any of the nets were routed over the sub-blocks (so-called “feedthrough nets”) these wires are pushed down into the sub-blocks that they overlap, and new pins are created on the sub-block where the wires cross the sub-block boundaries (step 145). Then, recursively implement the sub-blocks according to the same process (step 150). This involves recursively performing steps 110 to 170 while treating each sub-block as the top-level block.
For the above process to complete successfully the shapes, pin locations, and timing budgets assigned to each block (steps 115 through 135) must represent achievable constraints. Otherwise the system may not be able to complete the implementation of some blocks according to their specifications. In such a case the specifications may need to be refined and the top-down process may need to be repeated before a correct implementation can be realized. Such an iterative refinement is time-consuming and should be avoided. Thus, methods for achieving high-quality results in these steps are of critical importance.
When the recursive top-down planning and implementation step is complete the bottom-up verification process can commence. Proceeding from the lowest-level blocks toward the top-level, each block is independently analyzed for logical correctness, as well as its timing and electrical performance, and compared against its specification (step 155). After all sub-blocks of a block have been independently verified the block itself can be analyzed (step 170), under the assumption that the sub-blocks are correct.
To work on an individual module, a designer or software tool requires a representation of the environment in which that module must operate. This includes the physical shape of the space in which the module is placed, the location of its input and output pins, power and other important signal routing information, the operating conditions (temperature and voltage), the expected details of the process used to fabricate the module, and the timing characteristics of the interface between the module and its environment. The focus of this method is to provide a representation of the timing characteristics. The problem is complicated by the fact that this representation must be generated before other modules or the top-level netlist has been completed. The result is that the timing characteristics used for design must be an approximation of the timing characteristics of the final product.
This set of timing characteristics is called the “timing budget” of a module. A good timing budget must have three characteristics: completeness, balance, and achievability.
Completeness describes the characteristics of a budget at the block boundary. A complete timing budget describes the entire relevant context of a module. It should include signal arrival time constraints for all input pins (including bidirectional pins) and signal required time constraints for all output pins (also including bidirectional pins). It should include descriptions of all clocks that are applied as inputs of the design, generated within the design, or used as a reference for the timing constraints applied at the module outputs. It should also include any other special constraints that must be satisfied inside the module, such as global limits on signal transition times (i.e. slew limits). When the timing budget for a module is incomplete, the module cannot be fully designed without its context and the final design is likely to contain errors associated with violated constraints that were omitted from the budget. This is the minimal requirement for a timing budget.
Next, for successful integration of the top-level design, a set of timing budgets must be balanced. Balance describes the relationship between a budget, the top-level timing, and other budgets in the design. Balanced timing budgets guarantee that if all modules' timing constraints are satisfied, the top-level timing constraints will also be satisfied. When timing budgets are unbalanced, designers are forced to rework the final design to resolve problems that appear during integration of the top level. This rework often occurs very late in the design process and may require drastic and painful changes. Failure to generate balanced timing budgets may be seen as a lack of design discipline that has delayed timing closure in design methodologies.
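By way of illustration only, the balance property can be checked mechanically once each cross-boundary path is broken into budgeted segments. The sketch below assumes a simple model that is not taken from the text: each path is a list of named segments, each segment has a delay budget in picoseconds, and a set of budgets is treated as balanced when no path's segment budgets sum to more than the clock period. The function name `unbalanced_paths` and all segment names are hypothetical.

```python
def unbalanced_paths(paths, segment_budget, clock_period):
    """Return the names of paths whose segment budgets exceed the clock period."""
    return [name for name, segments in paths.items()
            if sum(segment_budget[s] for s in segments) > clock_period]

# Two register-to-register paths that cross block boundaries (hypothetical names).
paths = {
    "p1": ["blkA.launch", "top.net1", "blkB.capture"],
    "p2": ["blkA.launch2", "top.net2", "blkC.capture"],
}
segment_budget = {
    "blkA.launch": 400, "top.net1": 200, "blkB.capture": 300,
    "blkA.launch2": 500, "top.net2": 300, "blkC.capture": 300,
}
violations = unbalanced_paths(paths, segment_budget, clock_period=1000)
```

In this example `p1` sums to 900 ps and fits within the period, while `p2` sums to 1100 ps, so meeting every module budget on `p2` would still leave a top-level violation.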
The requirements of completeness and balance make achievability the most challenging aspect of the time budgeting problem. Achievability relates to the relationship between a timing budget and the block to which it applies. The difficulty is to create budgets that are achievable while maintaining balance and completeness. To achieve rapid design closure, it is crucial for the designer or design tool to have the ability to meet the timing constraints that are specified for each module. When the timing budget for a module is unachievable, designers are forced into a difficult cycle of iterative implementation and renegotiation of budgets. Each iteration attempts to resolve the conflicts between the timing requirements of the top-level netlist and other modules in a design and the difficulties found in implementing the module being designed. The inability to measure achievability is the biggest problem faced by design teams today and is the largest contributor to the failure to achieve timing closure that is experienced in contemporary design methodologies.
There is a need to provide a timing budgeting solution that better conforms to the completeness, balance, and achievability necessary but that does so in a resource-utilization friendly manner.
What is disclosed is a method for budgeting timing used in producing an integrated circuit design. The circuit design has register cells and combinational logic cells, and has a representation that is hierarchically decomposed into a top-level and a plurality of blocks. At least some of the plurality of blocks are capable of being further hierarchically decomposed. The register cells and combinational logic cells have at least one cell pin. The blocks have boundaries, and these block boundaries are represented by at least one block pin. The method includes:
These and other objects, features and advantages of the present invention are better understood by reading the following detailed description, taken in conjunction with the accompanying drawings, in which:
One way of implementing the top-down hierarchical design process is the hierarchical design flow shown and described in FIG. 2. The design flow shown in
During the top-down budgeting step one objective is to analyze the combinational logic paths (combinational logic gates between registers (latches and/or flip-flops)) that cross one or more hierarchical boundaries, and determine what fraction of the clock cycle should be budgeted for each segment of the path.
During the top-down block implementation step, a block is placed and routed before its sub-blocks have been implemented. In most cases the placement and routing is fairly decoupled across hierarchical boundaries. However, many modern manufacturing processes require the routing wires to obey a set of rules called “antenna rules” that require detailed knowledge of the routing wires present on both sides of a hierarchical boundary.
During the bottom-up verification process there is also a need to analyze the combinational logic paths that cross the hierarchical boundaries. When analyzing a block that contains sub-blocks, it would be desirable to take advantage of the fact that the sub-blocks have been pre-verified, avoiding the need to re-analyze the sub-blocks while analyzing their parents.
To address this, some embodiments of the invention disclose the use of a reduced model, referred to as a block “abstraction”, that captures the structure and behavior of the block in sufficient detail that the interface with its parent block and its sibling blocks may be correctly analyzed. The goal of the abstraction is to reduce the amount of memory required to represent a block to its ancestors in the hierarchy, and to reduce the amount of execution time required to analyze each instance of the block in the context of its parents and sibling blocks.
As mentioned above, in this regard, the hierarchical design flow of
One key difference between a top-down block implementation flow and a bottom-up block implementation flow is that, in the former, a block is implemented before its children, while in the latter a block is implemented after its children. The hierarchical implementation flow in
This design process is further detailed in a co-pending patent application entitled “Representing the Design of a Sub-module in a Hierarchical Integrated Circuit Design and Analysis System,” filed on Jun. 10, 2002 (Attorney's reference number 054355-0293259). One critical step in the overall design process is time budgeting (as outlined in step 235).
Time budgeting, in one embodiment of the invention, proceeds as follows. First, according to step 310, optimize paths between register cells of the top-level and register cells of the blocks and/or abstractions of the blocks. Next, according to step 312, optionally partition the blocks into clusters. Also optionally, whether or not step 312 is performed, according to step 314, perform a placement of the clusters (if any) or the cells in the design. Next, in step 316, optionally perform a routing between the placed cells. This routing is often referred to as global routing. Then, according to block 320, optionally buffer long nets between blocks. Next, a timing analysis of the top-level and then the blocks (and/or abstractions) is performed (block 330), resulting in arrival times. One key aspect of the invention is that the timing analysis is based upon gains of cells. Finally, time budgets can be derived (block 340) by allocating delays (using gains) to achieve zero slack and examining the arrival times at pins on the block boundaries.
The time budgeting method above can be implemented by the integration of several components into a common platform. These include:
For inputs, the time budgeting process of
In accordance with one aspect of the invention, the next step is to process the library to create “supercells” (block 425). A “supercell” refers to a family of gates with common pins and function. This family would ideally include a wide range of device sizes with different input capacitances and output drive strengths. The delay of a supercell is characterized as a function of its gain and if available, the input transition time of the supercell. For ease of analysis, delays are characterized as a function of a scaled gain that allows considering a gain of 1.0 as a “good” gain. This unit gain is loosely related to the gain of an inverter driving a “typical” fan-out of approximately 4. Delay varies roughly linearly with gain and increases as gain is increased and falls as gain is reduced.
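By way of illustration only, a supercell's delay characterization might be sketched as a linear function of scaled gain. The class name `Supercell`, the coefficients `intrinsic_ps` and `slope_ps`, and all numeric values below are assumptions chosen for the example, not values from any characterized library; the dependence on input transition time is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Supercell:
    """Hypothetical supercell: a family of gates with common pins and function."""
    name: str
    intrinsic_ps: float  # assumed gain-independent part of the delay
    slope_ps: float      # assumed delay added per unit of scaled gain

    def delay(self, gain: float) -> float:
        """Linear approximation: delay rises with gain and falls as gain is reduced."""
        if gain <= 0:
            raise ValueError("gain must be positive")
        return self.intrinsic_ps + self.slope_ps * gain

nand2 = Supercell("NAND2", intrinsic_ps=20.0, slope_ps=30.0)
nominal = nand2.delay(1.0)  # delay at the "good" unit gain
fast = nand2.delay(0.5)     # lower gain gives less delay
slow = nand2.delay(2.0)     # higher gain gives more delay
```

Because the model depends only on gain, no load capacitance is needed to evaluate a delay, which is what allows timing to be analyzed before actual gate sizes are chosen.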
If any one or more of clustering, placement or routing are desired/required (checked at step 430), then these optional routines are performed (step 435). Clustering reduces the number of objects being placed, which can improve the performance of the global placement step. Placement gives initial locations for all cells. If a cell was placed as part of a cluster, its location is taken from the cluster location. This global placement should be done “virtually flat”, ignoring hierarchy boundaries. Global routing increases the accuracy of the wire models.
Next, gains are adjusted until top-level timing constraints are satisfied (step 440). Gains can be adjusted using an enhanced zero-slack trimming algorithm, which is discussed below with respect to FIG. 5. Also, in accordance with some embodiments of the invention, these gains can be checked to measure achievability (step 445). In implementing an embodiment of the invention, as one condition, no gain can be less than 0.2, although in many situations tighter bounds may be more appropriate. In general, achievability is measured as a function of the gain profile of the cells in a design. Design experiments indicate that as the percentage of cells with gains less than 1.0 increases above 2.5%, it becomes increasingly difficult to achieve design convergence, and that when all cells have gains greater than 1.0, design success is virtually guaranteed.
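By way of illustration only, the achievability check on a gain profile might be sketched as follows. The thresholds 0.2, 1.0 and 2.5% come from the description above; the function name `achievability_report`, the returned field names, and the sample gain profile are assumptions made for the example.

```python
def achievability_report(gains, min_gain=0.2, low_gain=1.0, warn_fraction=0.025):
    """Summarize a gain profile against the thresholds quoted in the description."""
    if not gains:
        raise ValueError("no gains supplied")
    low_fraction = sum(1 for g in gains if g < low_gain) / len(gains)
    return {
        # True if any cell falls below the hard lower bound on gain.
        "violates_floor": any(g < min_gain for g in gains),
        # Fraction of cells whose gain is below the unit gain.
        "low_gain_fraction": low_fraction,
        # True once the low-gain fraction passes the level where convergence gets hard.
        "at_risk": low_fraction > warn_fraction,
    }

# 100 cells, four of which needed a gain below 1.0 to meet timing.
report = achievability_report([1.0] * 96 + [0.8, 0.6, 0.5, 0.3])
```

Here 4% of the cells have gains below 1.0, so the profile is flagged as at risk even though no cell falls below the 0.2 floor.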
Zero-slack Based Analysis
The most preferred timing analysis results for generating a budget are those in which all slacks in a design are zero. Slack measures how closely a timing constraint is satisfied. Positive slack indicates that a constraint is satisfied with a safety margin equal to the slack value. Circuits with positive slack are usually considered to be overdesigned, since the slack indicates that the circuit could either be operated at a higher speed or redesigned to operate at the same speed using less area or power. Negative slack indicates that a constraint is unsatisfied and cannot be satisfied unless delays in the circuit are modified by the amount of the slack. Ideally, zero slack indicates that constraints are exactly satisfied with no margin for error and no unnecessarily wasted resources. However, it is rare to find a circuit for which all timing constraints have slack of exactly zero. Even when the most critical paths in a design have zero slack, most of the remaining paths have slacks that are positive by a large margin.
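By way of illustration only, slack can be computed on a small gate network by propagating arrival times forward and required times backward. The sketch below assumes a late-mode analysis on an acyclic graph; the function name `compute_slacks`, the node names, and the delay values are hypothetical.

```python
from graphlib import TopologicalSorter

def compute_slacks(fanin, delay, input_arrival, output_required):
    """Return {node: slack}; fanin maps each node to its predecessor nodes."""
    order = list(TopologicalSorter(fanin).static_order())
    fanout = {n: [] for n in order}
    for n in order:
        for p in fanin.get(n, []):
            fanout[p].append(n)

    # Forward pass: latest arrival time at the output of each node.
    arrival = {}
    for n in order:
        start = max((arrival[p] for p in fanin.get(n, [])),
                    default=input_arrival.get(n, 0.0))
        arrival[n] = start + delay[n]

    # Backward pass: latest time each node's output may switch.
    required = {}
    for n in reversed(order):
        required[n] = min((required[s] - delay[s] for s in fanout[n]),
                          default=output_required.get(n, float("inf")))

    return {n: required[n] - arrival[n] for n in order}

# Two paths from a to d: a-b-d (short) and a-c-d (long).
fanin = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
delay = {"a": 1.0, "b": 2.0, "c": 5.0, "d": 1.0}
slacks = compute_slacks(fanin, delay, {"a": 0.0}, {"d": 8.0})
```

In this example the long path through `c` has a slack of 1.0 while gate `b`, which lies only on the short path, has a slack of 4.0, mirroring the observation that non-critical paths usually carry a large positive margin.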
The invention in various embodiments utilizes a novel zero-slack algorithm to apportion slack along a path. Zero-slack algorithms typically work by increasing or reducing delay budgets for individual gates and wires until the slack of the circuit based on the budgeted delays is zero. While many different conventional zero-slack algorithms are in vogue, they are often ad hoc heuristic algorithms that rely on trial-and-error by the algorithm designer to obtain the best method to modify gate delays. Details of a novel zero-slack algorithm are presented with reference to FIG. 5 and described below.
The trimming algorithm works on networks of supercells. Each supercell represents the delay of a family of gates (such as 2-input AND gates). The delay of a supercell is a function of its gain, which determines the ratio of output capacitance to gate input capacitance. Increasing the gain of a supercell increases its delay, and reducing gain reduces delay.
The trimming algorithm adjusts the drive strengths of gates until all slacks in the circuit are zero or all gates to be changed are at their maximum or minimum allowable gain. It begins with all supercells initialized to a nominal gain value (block 510). Then, static timing analysis is used to compute the worst timing slack on each gate (block 520). The gain of each gate is then adjusted by an amount that depends on this slack value if this slack value is non-zero (blocks 525 and 530). Gates with negative slack have their gains reduced (to make the gates faster) and gates with positive slack have their gains increased (since these gates can be made slower). The size of each gain adjustment is chosen to make the adjustment process converge smoothly. The amount of each change is related to the magnitude of the slack; larger gain reductions are made for gates with large negative slack than for gates with small negative slack. Similarly, larger gain increases are made for gates with large positive slack than for gates with small positive slack. The amount of each change is also related to the length of the critical path through each gate. Smaller changes are made to gates that lie on long paths (paths with a large number of gates) than to gates that lie on shorter paths. After these changes are made, a new static timing analysis is performed, and a new set of gain adjustments is made. The process stops when no more changes can be made, either because all slacks are zero (and no changes are necessary) or because all the gates to be changed are already set to their largest or smallest possible gain.
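By way of illustration only, the trimming loop might be sketched as below. This is a simplified reading of the description, not the disclosed implementation: it assumes a linear delay model (`intrinsic + slope * gain`), a single clock period as the required time at every endpoint, and a step-size constant `rate` that converts slack into a gain change. The function name `trim_gains`, the bounds, and all numeric values are assumptions.

```python
from graphlib import TopologicalSorter

def trim_gains(fanin, intrinsic, slope, period, rate,
               g_min=0.2, g_max=4.0, tol=1e-6, max_iters=1000):
    """Adjust gains until slacks are zero or gains reach their bounds.

    fanin maps each gate to its predecessor gates (an acyclic network).
    Returns the gains and the slacks from the last timing pass.
    """
    order = list(TopologicalSorter(fanin).static_order())
    fanout = {n: [] for n in order}
    for n in order:
        for p in fanin.get(n, []):
            fanout[p].append(n)

    # Maximum number of gates on any path through each gate.
    depth_in, depth_out = {}, {}
    for n in order:
        depth_in[n] = 1 + max((depth_in[p] for p in fanin.get(n, [])), default=0)
    for n in reversed(order):
        depth_out[n] = 1 + max((depth_out[s] for s in fanout[n]), default=0)
    path_len = {n: depth_in[n] + depth_out[n] - 1 for n in order}

    gain = {n: 1.0 for n in order}  # start every supercell at nominal gain
    slack = {}
    for _ in range(max_iters):
        # Static timing analysis with the current gains.
        delay = {n: intrinsic[n] + slope[n] * gain[n] for n in order}
        arrival, required = {}, {}
        for n in order:
            arrival[n] = delay[n] + max((arrival[p] for p in fanin.get(n, [])), default=0.0)
        for n in reversed(order):
            required[n] = min((required[s] - delay[s] for s in fanout[n]), default=period)
        slack = {n: required[n] - arrival[n] for n in order}

        # Gain adjustment: negative slack lowers gain, positive slack raises it.
        changed = False
        for n in order:
            if abs(slack[n]) <= tol:
                continue
            new_gain = min(g_max, max(g_min, gain[n] + rate * slack[n] / path_len[n]))
            if new_gain != gain[n]:
                gain[n], changed = new_gain, True
        if not changed:
            break  # all slacks zero, or every gate to be changed is at a bound
    return gain, slack

# A four-gate chain with 40 time units of positive slack at nominal gain.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"]}
gains, slacks = trim_gains(chain,
                           intrinsic={n: 10.0 for n in chain},
                           slope={n: 20.0 for n in chain},
                           period=160.0, rate=0.025)
```

With these numbers each pass removes about half of the remaining slack, so the four gains rise together from 1.0 toward 1.5, at which point the chain delay equals the 160-unit period. The value of `rate` was picked for this example; a poorly chosen step size could oscillate instead of converging smoothly.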
Although the trimming algorithm specifically adjusts delays of cells, it can also include the effects of wire delays. Such an inclusion is possible when performing initial placement and routing in the disclosed budgeting procedure. A placement allows modeling wire delays as a function of the distance between cells. If a routing is also done, the timing analysis can model wire delays even more accurately by following the globally routed path of each wire. It is even possible to interleave global placement and routing between steps of the trimming algorithm so that the placement and sizing converges simultaneously.
The timing of this circuit is dependent on what happens inside the block as well as what happens outside the block. However, some paths inside and outside block 650 are unrelated to the timing at the boundary of the top-level and block 650. During budgeting, these paths can be completely ignored. For example, the path through combinational logic block 625 is contained entirely within the block. With the exception of the clock input, the timing of this path is unaffected by all signals at the block boundary. And as long as the clock period is held constant, the timing of this path is unaffected by the specific time at which clock signals enter the block.
Those gates/elements which do not contribute to the timing at the boundary can be discarded according to the abstraction process discussed above. This abstraction retains the parasitic information needed while discarding what is irrelevant. For the example in
The initial placement provides useful estimates of the delays of wires in a design. When there are too many cells in the netlists to place individually, one can use a clustering algorithm to merge them together into groups which are then placed based on the connectivity between groups. The location of individual cells is then derived from the locations of the placed clusters.
With an initial placement, one can use Manhattan-based wire models to compute estimated resistance, capacitance, and delays of individual wires in the design. Although the cell delay models presented herein do not depend on capacitance, the wire delay estimates are extremely useful for improving the zero-slack trimming algorithm that seeks to optimize a design by adjusting the gain of each cell.
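By way of illustration only, a Manhattan-based wire estimate might be sketched as follows. The description does not specify a delay formula; the sketch uses the Elmore approximation for a distributed RC line driving a lumped load, which is one common choice. The function name `manhattan_wire_estimate`, the per-micrometer resistance and capacitance, and the load capacitance are all assumed values.

```python
def manhattan_wire_estimate(src, dst, r_per_um=0.1, c_per_um=0.2e-15, load_cap=2e-15):
    """Estimate wire length, R, C and delay from two placed pin locations (in um)."""
    # Manhattan distance: wires are assumed to run only horizontally and vertically.
    length = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    resistance = r_per_um * length   # ohms
    capacitance = c_per_um * length  # farads
    # Elmore delay of a distributed RC line driving a lumped load (seconds).
    delay = resistance * (capacitance / 2.0 + load_cap)
    return length, resistance, capacitance, delay

length, resistance, capacitance, delay = manhattan_wire_estimate((0, 0), (300, 400))
```

For the two pins above the estimated length is 700 um, giving roughly 70 ohms, 140 fF and a delay of about 5 ps under the assumed per-unit values.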
The trimming algorithm iteratively interleaves a global timing analysis with a gain-adjustment step that can potentially adjust the gain for every cell in the design. In each gain-adjustment step, the gain at each gate is smoothly increased or decreased by an amount proportional to the local slack at the gate and inversely proportional to the maximum topological path length through the gate. This ensures that the gains along critical paths move smoothly and simultaneously toward their final trimmed value and that the trimmed gains for each cell along a critical path are equal.
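By way of illustration only, the gain-adjustment step described above can be written compactly. Here $g_i^{(k)}$ is the gain of cell $i$ after pass $k$, $s_i^{(k)}$ is its local slack, $L_i$ is the maximum topological path length through the cell, and $g_{\min}$ and $g_{\max}$ are the allowed gain bounds. The step-size constant $\alpha > 0$ is an assumption introduced for this sketch; the description states only the proportionality.

```latex
g_i^{(k+1)} = \min\!\Bigl(g_{\max},\ \max\!\Bigl(g_{\min},\ g_i^{(k)} + \alpha\,\frac{s_i^{(k)}}{L_i}\Bigr)\Bigr)
```

If every cell on a critical path sees the same worst slack and the same path length, each receives the same increment in every pass. Starting from a common nominal gain, the gains on that path therefore remain equal until a bound is reached.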
Trimming is not the only tool available for timing optimization. As a workaround for situations in which trimming produces dangerously low gains, optional steps can be performed to fix structural problems in the netlist. Such steps include:
Shell abstractions can be used to cut out parts of the modules that are not visible at the interface. These parts should be excluded from the top-level timing analysis and any trimming or structural optimization that is done at the top level. Similarly, the top-level netlist can be pruned to remove parts of the netlist that are not visible to specific blocks for which budgets are needed.
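By way of illustration only, the pruning performed by a shell abstraction might be sketched as a graph traversal that keeps only cells reachable from a block pin without passing through a register. The sketch ignores clock pins and parasitics; the function name `boundary_visible_cells` and the cell names are hypothetical.

```python
def boundary_visible_cells(fanout, registers, input_pins, output_pins):
    """Return the cells that lie between a block pin and the first register."""
    fanin = {}
    for src, dsts in fanout.items():
        for dst in dsts:
            fanin.setdefault(dst, []).append(src)

    def sweep(starts, neighbors):
        seen, stack = set(), list(starts)
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in registers:
                continue  # keep the interface register but do not look past it
            stack.extend(neighbors.get(node, []))
        return seen

    # Forward from inputs and backward from outputs.
    return sweep(input_pins, fanout) | sweep(output_pins, fanin)

# in1 -> g1 -> r1 -> g2 -> r2 -> g3 -> out1, with r1 and r2 as registers.
fanout = {"in1": ["g1"], "g1": ["r1"], "r1": ["g2"],
          "g2": ["r2"], "r2": ["g3"], "g3": ["out1"]}
kept = boundary_visible_cells(fanout, {"r1", "r2"}, ["in1"], ["out1"])
```

Gate `g2` sits on a register-to-register path entirely inside the block, so it is dropped; the interface logic `g1` and `g3` and the registers that bound it are kept.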
Creating Timing Constraints for Modules
Budgets may be generated by converting properties of the top-level model into constraints for lower level modules (step 450). When a zero-slack timing solution has been obtained, the arrival and required times at each node in the circuit are equal. At module boundaries, these times may be interpreted as budgeted values. For module inputs, they represent arrival times, the latest or earliest times that signal transitions are presented to the block boundary from the environment. For module outputs, these times represent required times, the latest or earliest times that signal transitions at module outputs can occur without causing a timing failure in another part of the circuit.
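By way of illustration only, reading budgets off the block boundary might be sketched as follows. The sketch assumes a zero-slack solution is already available as two dictionaries of arrival and required times keyed by pin name. The function name `derive_block_budget`, the pin names, and the numeric values are hypothetical.

```python
def derive_block_budget(arrival, required, block_inputs, block_outputs, tol=1e-9):
    """Turn zero-slack times at block pins into input and output constraints."""
    for pin in list(block_inputs) + list(block_outputs):
        if abs(required[pin] - arrival[pin]) > tol:
            raise ValueError(f"pin {pin} does not have zero slack")
    return {
        # Times at which the environment presents signals to the block.
        "input_arrival": {pin: arrival[pin] for pin in block_inputs},
        # Times by which the block must present signals to the environment.
        "output_required": {pin: required[pin] for pin in block_outputs},
    }

arrival = {"blk/in_a": 1.2, "blk/in_b": 0.8, "blk/out_y": 3.5}
required = {"blk/in_a": 1.2, "blk/in_b": 0.8, "blk/out_y": 3.5}
budget = derive_block_budget(arrival, required,
                             block_inputs=["blk/in_a", "blk/in_b"],
                             block_outputs=["blk/out_y"])
```

The zero-slack check makes the dependence explicit: if arrival and required times differed at a pin, the choice between them would decide which side of the boundary received the spare margin.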
Input arrival and output required times may be relative to one or more reference clocks, and it is also possible for clocks to enter a module to control internal storage elements. As a result, it is also desirable to include definitions of these clocks in the timing budget. These clock definitions are represented by special constraints that describe clocks in the timing model for the budgeted module. The difference between the ideal time of each clock and the actual time at which the clock arrives at the input is represented with another special constraint that specifies the external latency of a clock.
A few other miscellaneous timing constraints in the top-level timing model are copied down into the timing budget for a module. User-specified constants indicate that certain nets are always at constant logic values. This indicates that the timing of signal transitions on these nets can be ignored and also allows other constants to be derived by combining user-specified constants with the logical function of the gates in the design. All user-specified or derived constants that affect a module are included in its budget. Also, any constraints that represent limits and margins are also copied down into the timing budget for the module.
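By way of illustration only, deriving constants from user-specified constants and gate functions might be sketched as below. The sketch assumes a netlist given as a topologically ordered list of simple AND, OR and NOT gates; the function name `propagate_constants` and the net names are hypothetical.

```python
def propagate_constants(gates, user_constants):
    """Return all known constant nets; gates are (output, kind, inputs) tuples
    listed in topological order."""
    known = dict(user_constants)
    for out, kind, inputs in gates:
        values = [known.get(net) for net in inputs]  # None means not constant
        if kind == "AND":
            if 0 in values:
                known[out] = 0  # a controlling 0 fixes the output
            elif all(v == 1 for v in values):
                known[out] = 1
        elif kind == "OR":
            if 1 in values:
                known[out] = 1  # a controlling 1 fixes the output
            elif all(v == 0 for v in values):
                known[out] = 0
        elif kind == "NOT":
            if values[0] is not None:
                known[out] = 1 - values[0]
        else:
            raise ValueError(f"unsupported gate kind: {kind}")
    return known

gates = [
    ("n1", "AND", ["a", "en"]),
    ("n2", "NOT", ["n1"]),
    ("n3", "OR", ["n2", "b"]),
    ("n4", "AND", ["b", "c"]),
]
constants = propagate_constants(gates, {"en": 0})
```

Tying `en` to 0 makes `n1`, `n2` and `n3` constant as well, so signal transitions on those nets can be ignored, while `n4` still switches and must be timed.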
There is one remaining type of constraint which must be represented in timing budgets; these constraints are called path exceptions. Path exceptions describe exceptions to the normal rules of timing analysis and are applied to user-specified paths in a circuit. They identify false paths, multicycle paths, and paths that are constrained to have either a minimum or maximum specified delay. All path exceptions that affect a module should be included in its budget. However, many path exceptions may refer to pins outside the module itself. This requires us to rewrite these exceptions into a form suitable for timing analysis of the module.
To explain this rewriting, consider one way in which path exceptions may be supported in a static timing analyzer. To properly apply a path exception, it is necessary to partition the signal arrival times at a pin into two groups, a group which is affected by the exception and a group which is not. To enable this partitioning, associate a special symbol, called a “mark”, with each pin which is named in a path exception. Each pin may have a unique mark, or to reduce resource usage, marks may be assigned to groups of pins. These marks are then associated with the signal arrival time at each marked pin and are propagated to all arrival times that are dependent on the arrival times at marked pins. Any pin in a design may have a number of different arrival times associated with it, each arrival time being identified with a different combination of marks. This can occur because a number of different paths may exist to any pin in a design, and some of those paths may be affected by path exceptions while others are not. Because each marked arrival time may be affected by a different path exception, each marked arrival time associated with a pin may have a different required time and corresponding slack value.
For example, a multicycle path constraint specified from a pin named “A” and through a pin named “B” would create a mark for all arrival times that result from paths through pin A and a second mark for all arrival times that result from paths through pin B. At any endpoint, only arrival times with both marks A and B would be affected by the exception.
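By way of illustration only, mark propagation might be sketched as below. The sketch assumes one unique mark per named pin, a late-mode analysis on an acyclic pin graph, and a multicycle exception that multiplies the required time by a cycle count. The function names `propagate_marked_arrivals` and `endpoint_slacks`, the pin names, and the delays are hypothetical.

```python
from graphlib import TopologicalSorter

def propagate_marked_arrivals(fanin, delay, marked_pins):
    """Return {pin: {frozenset_of_marks: latest_arrival}} for an acyclic pin graph."""
    arrivals = {}
    for pin in TopologicalSorter(fanin).static_order():
        preds = fanin.get(pin, [])
        incoming = {} if preds else {frozenset(): 0.0}
        for p in preds:
            for marks, t in arrivals[p].items():
                # Keep the latest arrival separately for each combination of marks.
                incoming[marks] = max(incoming.get(marks, float("-inf")), t)
        own = frozenset({pin}) if pin in marked_pins else frozenset()
        out = {}
        for marks, t in incoming.items():
            key = marks | own
            out[key] = max(out.get(key, float("-inf")), t + delay[pin])
        arrivals[pin] = out
    return arrivals

def endpoint_slacks(arrivals_at_pin, period, exception_marks, cycles):
    """Apply a multicycle exception only to arrivals carrying all of its marks."""
    slacks = {}
    for marks, t in arrivals_at_pin.items():
        required = period * cycles if exception_marks <= marks else period
        slacks[marks] = required - t
    return slacks

# Paths into endpoint E: A-B-E, C-B-E and A-D-E. Pins A and B are named in the exception.
fanin = {"A": [], "C": [], "B": ["A", "C"], "D": ["A"], "E": ["B", "D"]}
delay = {"A": 1.0, "C": 2.0, "B": 3.0, "D": 1.0, "E": 1.0}
arrivals = propagate_marked_arrivals(fanin, delay, marked_pins={"A", "B"})
slacks = endpoint_slacks(arrivals["E"], period=5.0,
                         exception_marks=frozenset({"A", "B"}), cycles=2)
```

Endpoint E ends up with three arrival times. Only the one marked with both A and B receives the two-cycle required time; the arrival through C and B carries only mark B and is checked against a single cycle, which in this example leaves it with negative slack.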
When an exception refers to pins outside a module, it is modified by replacing each reference to an external pin with the name of the associated mark. The mark is essentially an alias for the external pin; it is included explicitly because one cannot directly refer to the external pin.
To support path exceptions that cross module inputs, make the arrival time constraints for module inputs specifically associate marks with the arrival times that they present to the module inputs. Each module input may thus have a number of arrival times, which allows us to associate different arrival times with different paths through the block inputs.
To support path exceptions that cross module outputs, make the required time constraints at module outputs specifically associate marks with the required times being enforced at the module outputs. Each marked required time will only apply to an arrival time with the exact same marking. This allows associating different required times with different paths through block outputs. If a path exception is wholly contained inside a module, there is no need for it to be modified, since all of the associated pins are available inside the module.
One of ordinary skill in the art may program computer system 710 to perform the task of budgeting through zero-slack trimming algorithms and static timing analysis as set forth in various embodiments of the invention. Such program code may be executed using a processor 712 such as CPU (Central Processing Unit) and a memory 711, such as RAM (Random Access Memory), which is used to store/load instructions, addresses and result data as needed. The application(s) used to perform the functions of time budgeting and timing analysis may derive from an executable compiled from source code written in a language such as C++. The executable may be loaded into memory 711 and its instructions executed by processor 712. The instructions of that executable file, which correspond with instructions necessary to perform time budgeting and timing analysis, may be stored to a disk 718, such as a floppy drive, hard drive or optical drive 717, or memory 711. The various inputs such as the netlist(s), constraints, delays, capacitances, wire models, cell library and other such information may be written to/accessed from disk 718, optical drive 717 or even via network 700 in the form of databases and/or flat files.
Computer system 710 has a system bus 713 which facilitates information transfer to/from the processor 712 and memory 711 and a bridge 714 which couples to an I/O bus 715. I/O bus 715 connects various I/O devices such as a network interface card (NIC) 716, disk 718 and optical drive 717 to the system memory 711 and processor 712. Many such combinations of I/O devices, buses and bridges can be utilized with the invention and the combination shown is merely illustrative of one such possible combination.
The present invention has been described above in connection with a preferred embodiment thereof; however, this has been done for purposes of illustration only, and the invention is not so limited. Indeed, variations of the invention will be readily apparent to those skilled in the art and also fall within the scope of the invention.
This application claims priority from a provisional patent application entitled “Method for Generating Design Constraints for Modules in a Hierarchical Integrated Circuit Design System”, filed on Jun. 8, 2001, and bearing Ser. No. 60/296,792.
Number | Date | Country |
---|---|---|
20030009734 A1 | Jan 2003 | US |
Number | Date | Country |
---|---|---|
60296792 | Jun 2001 | US |