1. Technical Field
The present invention generally relates to integrated circuit design tools and in particular to integrated circuit design tools that optimize area performance and signal integrity in integrated circuits.
2. Description of the Related Art
Existing methods have sought to improve the placement of negative-slack gates of a circuit in a physical synthesis flow, but these existing solutions have several drawbacks. One major drawback is that existing solutions consider only the placement of a single movable gate within an integrated circuit design. In addition, existing physical synthesis optimization methods treat the gates that are adjacent to the single movable gate (i.e., clocked repeaters and unclocked repeaters, such as buffers and inverters) as unmovable, which can over-constrain gate placement optimization efforts.
Disclosed are a method, system, and computer program product for optimizing the placement of movable gates of a circuit in a physical synthesis flow. A Rip Up and Move Boxes Linear Evaluation (RUMBLE) utility optimizes a timing state of an original subcircuit by determining a new optimized placement(s) of movable gate(s) while accounting for future interconnect optimizations. The RUMBLE utility: (a) identifies and selects movable gate(s) for timing-driven placement optimization; (b) isolates an original subcircuit associated with the movable gate(s); (c) builds an unbuffered RUMBLE model of the original subcircuit; (d) yields a new optimized placement(s) of movable gate(s) using a RUMBLE mathematical program to optimize timing state of the original subcircuit while accounting for the future interconnect optimization (i.e., unclocked repeater insertions, gate re-sizing); (e) creates a RUMBLE tree cache for each non-repeater gate output pin of the original subcircuit; (f) disconnects all tree cache end points from the original subcircuit; (g) creates a new subcircuit by connecting new unoptimized nets to corresponding tree cache end points; (h) evaluates whether a timing degradation exists in the new subcircuit; (i) restores the original subcircuit if a timing degradation exists in the new subcircuit; and (j) retains the new subcircuit if there is no timing degradation in the new subcircuit. According to one embodiment, the RUMBLE utility removes at least one buffer tree before yielding a new optimized placement(s) of the movable gate(s).
The above, as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.
The invention itself, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
The illustrative embodiments provide a method, system, and computer program product for optimizing the placement of logic gates of a subcircuit in a physical synthesis flow, in accordance with one embodiment of the invention. Physical synthesis is the process of creating a specification for a physical integrated circuit (IC) given a logic circuit specification. As utilized herein, a logic gate is a computer circuit with several inputs but only one output that can be activated by particular combinations of inputs. Moreover, combinations of logic gates are used to store information in sequential logic systems, forming a latch. In order to improve the overall circuit timing of a subcircuit, one or more movable logic gates are placed on a timing-driven basis by directly maximizing a source-to-sink timing arc.
In the following detailed description of exemplary embodiments of the invention, specific exemplary embodiments in which the invention may be practiced are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that logical, architectural, programmatic, mechanical, electrical and other changes may be made without departing from the spirit or scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the present invention is defined only by the appended claims.
It is understood that the use of specific component, device and/or parameter names are for example only and not meant to imply any limitations on the invention. The invention may thus be implemented with different nomenclature/terminology utilized to describe the components/devices/parameters herein, without limitation. Each term utilized herein is to be given its broadest interpretation given the context in which that term is utilized.
With reference now to
Notably, in addition to the above described hardware components of DPS 100, various features of the invention are completed via software (or firmware) code or logic stored within system memory 115 or other storage (e.g., storage 117) and executed by CPU 105. In one embodiment, data/instructions/code from storage 117 populates the system memory 115, which is also coupled to system bus 110. System memory 115 is defined as a lowest level of volatile memory (not shown), including, but not limited to, cache memory, registers, and buffers. Thus, illustrated within system memory 115 are a number of software/firmware components, including operating system (OS) 130 (e.g., Microsoft Windows®, a trademark of Microsoft Corp; or GNU®/Linux®, registered trademarks of the Free Software Foundation and The Linux Mark Institute; or Advanced Interactive eXecutive—AIX—, registered trademark of International Business Machines—IBM), applications (APP) 135, and Rip Up and Move Boxes with Linear Evaluation (RUMBLE) utility 145. In actual implementation, components or code of OS 130 may be combined with those of RUMBLE utility 145, collectively providing the various functional features of the invention when the corresponding code is executed by the CPU 105. For simplicity, RUMBLE utility 145 is illustrated and described as a stand alone or separate software/firmware component, which is stored in system memory 115 to provide/support the specific novel functions described herein.
CPU 105 executes RUMBLE utility 145 as well as OS 130, which supports the user interface (UI) features of RUMBLE utility 145. In the illustrative embodiment, RUMBLE utility 145 optimizes a timing state of an original subcircuit by determining a new optimized placement(s) of movable gate(s) while accounting for future interconnect optimizations (e.g., unclocked repeater insertions, gate re-sizing, and the like). Among the software code/instructions provided by RUMBLE utility 145, and which are specific to the invention, are: (a) code for identifying and selecting movable gate(s) for timing-driven placement optimization; (b) code for isolating an original subcircuit corresponding to the movable gate(s); (c) code for building an unbuffered RUMBLE model of the original subcircuit; (d) code for determining new optimized placement(s) of movable gate(s) using a RUMBLE mathematical program to optimize timing state of the original subcircuit while accounting for future interconnect optimization; (e) code for creating a tree cache for each non-repeater gate output pin of the original subcircuit; (f) code for disconnecting all tree cache end points from the original subcircuit; and (g) code for creating a new subcircuit by connecting new unoptimized nets to corresponding tree cache end points. For simplicity of the description, the collective body of code that enables these various features is referred to herein as RUMBLE utility 145. According to the illustrative embodiment, when CPU 105 executes RUMBLE utility 145, DPS 100 initiates a series of functional processes that enable the above functional features as well as additional features/functionality, which are described below within the description of
Those of ordinary skill in the art will appreciate that the hardware and basic configuration depicted in
Within the descriptions of the figures, similar elements are provided similar names and reference numerals as those of the previous figure(s). Where a later figure utilizes the element in a different context or with different functionality, the element is provided a different leading numeral representative of the figure number (e.g., 1xx for FIGS. 1 and 2xx for
With collective reference now to
The process of
As used herein:
a “timing point” is a vertex in a timing graph; conventionally, all gate pins (input or output) in a circuit have an associated timing point;
a “slack” at a timing point is defined as the difference between the required arrival time (RAT) at the timing point and the actual arrival time (AAT) at the timing point. A negative slack value indicates that the signal that is sent to the input of the timing point is actually arriving beyond its required arrival time. A positive slack value indicates that the signal is arriving before its required arrival time.
a “critical gate” is a gate that is characterized as having a negative slack value;
a “critical path” is a sequence of connected gates, which are all characterized as having a negative slack value; and
a “slack differential” is defined as the difference between the smallest slack value of an output timing point and the largest slack value of an input timing point; or vice versa. A large slack differential, especially when either the input timing point or the output timing point has a negative slack value, indicates that the latch timing can likely be improved by moving the movable gate/latch.
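By way of illustration only, the definitions above can be sketched in Python. The class and function names (TimingPoint, is_critical, slack_differential) are hypothetical and do not appear in the described embodiment. Two interpretations are assumed: a gate is treated as critical when any of its timing points has negative slack, and the "or vice versa" clause is read as taking the larger of the two input/output gaps.

```python
from dataclasses import dataclass


@dataclass
class TimingPoint:
    """Hypothetical record for one vertex of the timing graph."""
    name: str
    rat: float  # required arrival time (RAT)
    aat: float  # actual arrival time (AAT)

    @property
    def slack(self) -> float:
        # Slack is RAT minus AAT; a negative value means a late signal.
        return self.rat - self.aat


def is_critical(points):
    """Assumption: a gate is critical if any of its timing points has negative slack."""
    return any(p.slack < 0 for p in points)


def slack_differential(inputs, outputs):
    """Assumed reading of the definition: the larger of the two gaps
    (smallest output slack vs. largest input slack, and vice versa)."""
    in_slacks = [p.slack for p in inputs]
    out_slacks = [p.slack for p in outputs]
    gap_a = abs(min(out_slacks) - max(in_slacks))
    gap_b = abs(min(in_slacks) - max(out_slacks))
    return max(gap_a, gap_b)


# Example: a latch whose input arrives early and whose output arrives late.
latch_inputs = [TimingPoint("D", rat=10.0, aat=8.0)]    # slack = 2.0
latch_outputs = [TimingPoint("Q", rat=10.0, aat=13.0)]  # slack = -3.0
print(is_critical(latch_inputs + latch_outputs))
print(slack_differential(latch_inputs, latch_outputs))
```

In this example the input has slack to spare while the output is late, which is the situation the text identifies as a likely candidate for moving the gate.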
Once a movable gate(s) is/are selected for placement optimization, the RUMBLE utility 145 isolates an original subcircuit 200 (
Referring now to block 325, the original subcircuit's placement data and timing state are passed to a solver, which creates an unbuffered RUMBLE model (not shown) of an original subcircuit 200. The RUMBLE model of the original subcircuit 200 is represented by a hypergraph (not shown) which contains a vertex for each gate (fixed or movable) in the original subcircuit. Note that unclocked repeaters are not included in the hypergraph because the original subcircuit is modeled within the RUMBLE model as if the buffers 209 and the inverters 211 have been removed. The RUMBLE model also contains a 2-pin edge connecting the source of each net to the sink(s) of that net, again modeling the results based upon a hypothetical removal of any intermediate unclocked repeaters. In addition, the RUMBLE model also contains information as to the identification of movable gate(s) 207 and those gates which are not movable (i.e., source gate 201, sink gates 203 and 205), which are collectively referred to in the RUMBLE model as clock boundaries. Finally, the RUMBLE model will contain for each fixed gate (201, 203, and 205), a RAT if the fixed gate(s) 201, 203, 205 is/are an output of the subcircuit and an AAT if the fixed gate(s) 201, 203, 205 is/are an input of the original subcircuit 200.
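As a minimal sketch of the modeling step just described, the following Python builds an unbuffered model from a small netlist by skipping over unclocked repeaters and emitting one 2-pin edge per source/sink pair. The data layout (a dictionary of gate kinds and a list of driver/sink nets) and the function name build_unbuffered_model are assumptions made for illustration; signal polarity through inverters is ignored.

```python
def build_unbuffered_model(gates, nets):
    """Return (vertices, edges) with repeaters hypothetically removed.

    gates: dict mapping gate name -> "fixed", "movable", or "repeater"
    nets:  list of (driver, [sinks]) pairs
    """
    fanout = {}
    for driver, sinks in nets:
        fanout.setdefault(driver, []).extend(sinks)

    def real_sinks(gate, seen):
        # Follow the net through any chain of repeaters to non-repeater sinks.
        found = []
        for sink in fanout.get(gate, []):
            if sink in seen:
                continue
            seen.add(sink)
            if gates[sink] == "repeater":
                found.extend(real_sinks(sink, seen))
            else:
                found.append(sink)
        return found

    # Only fixed and movable gates become vertices of the model.
    vertices = {g: kind for g, kind in gates.items() if kind != "repeater"}
    edges = [(g, s) for g in vertices for s in real_sinks(g, set())]
    return vertices, edges


# Example: fixed source -> buffer -> movable latch -> (inverter -> sink 1, sink 2)
gates = {
    "src": "fixed", "buf1": "repeater", "latch": "movable",
    "inv1": "repeater", "snk1": "fixed", "snk2": "fixed",
}
nets = [
    ("src", ["buf1"]), ("buf1", ["latch"]),
    ("latch", ["inv1", "snk2"]), ("inv1", ["snk1"]),
]
vertices, edges = build_unbuffered_model(gates, nets)
print(edges)
```

A full model would additionally attach an AAT to each fixed input gate and a RAT to each fixed output gate, as the paragraph above describes.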
A RUMBLE mathematical program is derived from the RUMBLE model of the subcircuit, as shown in block 330. Once the RUMBLE model has been created, a solver optimizes the RUMBLE mathematical program. Specifically, the RUMBLE mathematical program is a set of expressions describing an optimization problem. Given an assignment of variables, the RUMBLE mathematical program yields new optimized placements of the movable gate(s) 207, and any other simultaneously optimized values, such as gate sizes or wire sizes. Notably, the RUMBLE mathematical program also accounts for future interconnect optimizations. In the current exemplary embodiment, the RUMBLE mathematical program accounts for downstream applications of buffer insertion by assuming a wire delay that is linearly proportional to the wire's length inside the RUMBLE mathematical program. Such an assumption is valid only if buffer re-insertion is allowed.
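For illustration only, one way a linear-delay placement program of this kind could be written is shown below. This is a sketch under stated assumptions rather than the formulation of the described embodiment: the symbols are introduced here, with \tau an assumed delay per unit of Manhattan wire length, (x_i, y_i) the location of gate i, a_i the arrival time at gate i, d_j an assumed fixed delay of gate j, and S the worst slack over the subcircuit outputs.

```latex
\begin{aligned}
\max_{x,\,y,\,a,\,S}\quad & S \\
\text{s.t.}\quad
 & a_j \ge a_i + \tau\,\bigl(|x_i - x_j| + |y_i - y_j|\bigr) + d_j
   && \text{for each 2-pin edge } (i, j), \\
 & a_i = \mathrm{AAT}_i
   && \text{for each fixed input gate } i, \\
 & S \le \mathrm{RAT}_k - a_k
   && \text{for each fixed output gate } k, \\
 & (x_m, y_m)\ \text{free}
   && \text{for each movable gate } m .
\end{aligned}
```

Because the assumed wire delay is linear in length, each absolute value can be replaced by an auxiliary variable with two linear inequalities, so the sketch reduces to a linear program.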
Before the movable gate(s) is/are moved to the new optimized placement(s), the RUMBLE utility records the original placement(s) of the movable gate(s), as depicted in block 335. Then, the movable gate(s) is/are moved to the new optimized placement(s) of an in-memory representation of a physical intermediate circuit, as depicted in block 340 and illustrated in the first intermediate subcircuit 210 (
However, before any interconnect optimizations can be performed on any particular net, a RUMBLE Tree Cache for each net corresponding to a non-repeater gate output pin of the original subcircuit is created, as depicted in block 345. A RUMBLE Tree Cache is a facility for storing several possible physical implementations of a particular logical net, each of which has different timing properties. Inside the RUMBLE utility, each non-repeater gate output pin, or opin 213 (
With further reference to the creation of the RUMBLE Tree Cache depicted in block 345, each of the unclocked repeater trees is stored by caching the placements of all the unclocked repeaters (i.e., buffers 209 (
With reference now to
After the movable gate 207 (
It is important to note that although the RUMBLE mathematical program theoretically solves for optimal movable gate placement locations under the RUMBLE model, the timing state of the new subcircuit 230 may nonetheless be degraded after interconnect optimization. In this regard, the RUMBLE mathematical program described in this embodiment is an abstraction of the new subcircuit timing that models the interconnect optimizations (e.g., virtual buffering) by setting a wire delay constant that reflects an estimate of what the timing state will be after interconnect optimizations are actually performed. The RUMBLE model could therefore be overly optimistic, resulting in timing degradation of the new subcircuit 230. For example, the new optimized nets may be placed in congested regions or at blockage sites of the new subcircuit 230 where there is no space for unclocked repeater insertions. Thus, the creation of the RUMBLE tree cache in block 345 allows the circuit designer to store the timing state of the original subcircuit 200 before any physical changes are made to the actual circuit model. The circuit designer may perform future interconnect optimizations with the safety of being able to restore the original subcircuit 200 if the future interconnect optimizations result in a timing degradation of the new subcircuit 230.
After the new subcircuit 230 has undergone interconnect optimization, the slack at each timing point of the new subcircuit 230 is measured and the timing state of the new subcircuit 230 is recorded, as depicted respectively in blocks 365 and 370. For exemplary purposes only,
However, if timing degradation is present in the new subcircuit 230, the RUMBLE Tree Cache structure is recalled to restore (i) the original subcircuit 200 and (ii) the original subcircuit's timing state. According to the described embodiment, this restoration begins by disconnecting all new tree caches at the tree cache end points of the new subcircuit 230, as depicted in block 385. The tree cache end points of the original subcircuit 200 are then reconnected to their former output source pins and input sink pins, as depicted in block 390. The movable gate(s) is/are returned to the original placement(s), as depicted in block 395. The process terminates at block 396.
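The save, evaluate, and restore sequence described above can be sketched as follows. This is a simplified illustration, not the described implementation: the TreeCache class, the try_move function, the dictionary-based circuit layout, and the use of worst slack as the degradation test are all assumptions, and the reoptimize and worst_slack callables stand in for the interconnect optimizer and timer.

```python
import copy


class TreeCache:
    """Hypothetical cache of the physical implementation (repeater tree) of each net."""

    def __init__(self):
        self._trees = {}

    def save(self, net, tree):
        # Deep-copy so later edits to the circuit cannot alter the cached tree.
        self._trees[net] = copy.deepcopy(tree)

    def restore_all(self):
        return copy.deepcopy(self._trees)


def try_move(circuit, new_positions, reoptimize, worst_slack):
    """Move gates, re-optimize nets, and roll back if the worst slack degrades.

    circuit: dict with "placement" (gate -> location) and "nets" (net -> tree)
    Returns True if the new subcircuit is retained, False if it was restored.
    """
    cache = TreeCache()
    for net, tree in circuit["nets"].items():
        cache.save(net, tree)
    old_positions = {g: circuit["placement"][g] for g in new_positions}
    slack_before = worst_slack(circuit)

    circuit["placement"].update(new_positions)  # move the gate(s)
    reoptimize(circuit)                         # interconnect optimization

    if worst_slack(circuit) < slack_before:
        # Timing degraded: reconnect cached trees and undo the move.
        circuit["nets"] = cache.restore_all()
        circuit["placement"].update(old_positions)
        return False
    return True
```

Caching deep copies keeps the rollback independent of whatever the optimizer does to the live net data, at the cost of extra memory per cached net.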
In the flow chart above (
As will be further appreciated, the processes in embodiments of the present invention may be implemented using any combination of software, firmware, or hardware. As a preparatory step to practicing the invention in software, the programming code (whether software or firmware) will typically be stored in one or more machine readable storage mediums such as fixed (hard) drives, diskettes, optical disks, magnetic tape, semiconductor memories such as ROMs, PROMs, and the like, thereby making an article of manufacture in accordance with the invention. The article of manufacture containing the programming code is used by either executing the code directly from the storage device, by copying the code from the storage device into another storage device such as a hard disk, RAM, and the like, or by transmitting the code for remote execution using transmission type media such as digital and analog communication links. The methods of the invention may be practiced by combining one or more machine-readable storage devices containing the code according to the present invention with appropriate processing hardware to execute the code contained therein. An apparatus for practicing the invention could be one or more processing devices and storage systems containing or having network access to program(s) coded in accordance with the invention.
Thus, it is important to note that while an illustrative embodiment of the present invention is described in the context of a fully functional computer (server) system with installed (or executed) software, those skilled in the art will appreciate that the software aspects of an illustrative embodiment of the present invention are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the present invention applies equally regardless of the particular type of media used to actually carry out the distribution. By way of example, a non-exclusive list of types of media includes recordable type (tangible) media such as floppy disks, thumb drives, hard disk drives, CD ROMs, DVD ROMs, and transmission type media such as digital and analog communication links.
While the invention has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular system, device or component thereof to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims. Moreover, the use of the terms first, second, etc. does not denote any order or importance; rather, the terms first, second, etc. are used to distinguish one element from another.