1. Field of the Invention
The present invention generally relates to the design of integrated circuits, and more particularly to a method for physical synthesis of an integrated circuit design using optimization, simulation and analysis tools.
2. Description of the Related Art
Integrated circuits are used for a wide variety of electronic applications, from simple devices such as wristwatches, to the most complex computer systems. A microelectronic integrated circuit (IC) chip can generally be thought of as a collection of logic cells with electrical interconnections between the cells, formed on a semiconductor substrate (e.g., silicon). An IC may include a very large number of cells and require complicated connections between the cells. A cell is a group of one or more circuit elements such as transistors, capacitors, resistors, inductors, and other basic circuit elements grouped to perform a logic function. Cell types include, for example, core cells, scan cells and input/output (I/O) cells. Each of the cells of an IC may have one or more pins, each of which in turn may be connected to one or more other pins of the IC by wires. The wires connecting the pins of the IC are also formed on the surface of the chip. For more complex designs, there are typically at least four distinct layers of conducting media available for routing, such as a polysilicon layer and three metal layers (metal-1, metal-2, and metal-3). The polysilicon layer, metal-1, metal-2, and metal-3 are all used for vertical and/or horizontal routing.
An IC chip is fabricated by first conceiving the logical circuit description, and then converting that logical description into a physical description, or geometric layout. This process is usually carried out using a “netlist,” which is a record of all of the nets, or interconnections, between the cell pins, including information about the various components such as transistors, resistors and capacitors. A layout typically consists of a set of planar geometric shapes in several layers. The layout is then checked to ensure that it meets all of the design requirements, particularly timing requirements. The result is a set of design files known as an intermediate form that describes the layout. The design files are then run through a dataprep process that is used to produce patterns called masks by an optical or electron beam pattern generator. During fabrication, these masks are used to etch or deposit features in a silicon wafer in a sequence of photolithographic steps using a complex lens system that shrinks the mask image. The process of converting the specifications of an electrical circuit into such a layout is called the physical design.
Cell placement in semiconductor fabrication involves a determination of where particular cells should optimally (or near-optimally) be located on the surface of an integrated circuit device. Due to the large number of components and the details required by the fabrication process for very large scale integrated (VLSI) devices, physical design is not practical without the aid of computers. As a result, most phases of physical design extensively use computer-aided design (CAD) tools, and many phases have already been partially or fully automated. Automation of the physical design process has increased the level of integration, reduced turnaround time and enhanced chip performance. Several different programming languages have been created for electronic design automation (EDA), including Verilog, VHDL and TDML. A typical EDA system receives one or more high level behavioral descriptions of an IC device, and translates this high level design language description into netlists of various levels of abstraction.
Physical synthesis is prominent in the automated design of integrated circuits such as high performance processors and application specific integrated circuits (ASICs). Physical synthesis is the process of concurrently optimizing placement, timing, power consumption, crosstalk effects and the like in an integrated circuit design. This comprehensive approach helps to eliminate iterations between circuit analysis and place-and-route. A generalized physical synthesis flow is shown in
A conventional flow for the optimization step 3 is further illustrated in
Physical synthesis has the ability to repower gates, insert repeaters, clone gates or other combinational logic, etc., so the area of logic in the design remains fluid. However, physical synthesis can take days to complete, and the computational requirements are increasing as designs are ever larger and more gates need to be placed. There are also more chances for bad placements due to limited area resources. As process technology scales to the deep-submicron regime (65 nm and smaller), it becomes particularly difficult to achieve timing targets with efficient use of the chip area for model design closure. Area efficiency is important at different hierarchical levels, e.g., the top level for a large ASIC or at the macro design level, but timing requirements must still be satisfied. Area has traditionally been treated as a constraint (like 80% chip density) and not a design target. Such constraints are generally taken into consideration at the late design stages, for example by performing additional area recovery at the end of a regular physical synthesis flow. This approach has two serious flaws. First, since area is not a target, the final chip design never has the lowest possible achievable area. The increased area may lead to congestion problems, or if the chip area is too large the die size will have to be adjusted, introducing additional expense. Inefficiencies in area also lead to excess power usage. Second, if area is only considered near the end of the design flow, the process can be stalled in the early optimization stage if there are insufficient space tolerances for buffers or repowering. Even if some space is available it is unlikely that the optimization result will be the actual optimal solution. Thus either timing may not be closed, or the flow will require a much longer runtime on iterations and further optimization refinement.
In light of the foregoing, it would be desirable to devise an improved method for physical synthesis flow which could be more area efficient without sacrificing timing and other requirements. It would be further advantageous if the method were achievable with reasonable runtime overhead.
It is therefore one object of the present invention to provide an improved method for physical synthesis of an integrated circuit design.
It is another object of the present invention to provide such a method which is particularly area-efficient while satisfying timing requirements.
It is yet another object of the present invention to provide such a method which does not excessively increase the computational requirements in achieving the area efficiency.
The foregoing objects are achieved in a computer-implemented method for optimizing a physical design of an integrated circuit having a plurality of nets, by receiving a layout for a physical placement of the nets, receiving an initial slew constraint, first inserting one or more buffers in the layout such that slew for each of the nets is less than the initial slew constraint, calculating a new slew constraint which is less than the initial slew constraint, and then inserting one or more additional buffers in the layout such that slew for at least one of the nets is less than the new slew constraint. Insertion of additional buffers is iteratively repeated using incrementally decreasing slew constraints until either the current slew constraint is less than or equal to a predetermined minimum slew constraint or none of the nets have negative slack. In the illustrative implementation the next slew constraint is 20%-50% less than the current slew constraint. Any nets having positive slack from the previous iteration are skipped, and that slack information is cached for future timing analysis.
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.
The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.
The use of the same reference symbols in different drawings indicates similar or identical items.
With reference now to the figures, and in particular with reference to
MC/HB 16 also has an interface to peripheral component interconnect (PCI) Express links 20a, 20b, 20c. Each PCI Express (PCIe) link 20a, 20b is connected to a respective PCIe adaptor 22a, 22b, and each PCIe adaptor 22a, 22b is connected to a respective input/output (I/O) device 24a, 24b. MC/HB 16 may additionally have an interface to an I/O bus 26 which is connected to a switch (I/O fabric) 28. Switch 28 provides a fan-out for the I/O bus to a plurality of PCI links 20d, 20e, 20f. These PCI links are connected to more PCIe adaptors 22c, 22d, 22e which in turn support more I/O devices 24c, 24d, 24e. The I/O devices may include, without limitation, a keyboard, a graphical pointing device (mouse), a microphone, a display device, speakers, a permanent storage device (hard disk drive) or an array of such storage devices, an optical disk drive, and a network card. Each PCIe adaptor provides an interface between the PCI link and the respective I/O device. MC/HB 16 provides a low latency path through which processors 12a, 12b may access PCI devices mapped anywhere within bus memory or I/O address spaces. MC/HB 16 further provides a high bandwidth path to allow the PCI devices to access memory 18. Switch 28 may provide peer-to-peer communications between different endpoints and this data traffic does not need to be forwarded to MC/HB 16 if it does not involve cache-coherent memory transfers. Switch 28 is shown as a separate logical component but it could be integrated into MC/HB 16.
In this embodiment, PCI link 20c connects MC/HB 16 to a service processor interface 30 to allow communications between I/O device 24a and a service processor 32. Service processor 32 is connected to processors 12a, 12b via a JTAG interface 34, and uses an attention line 36 which interrupts the operation of processors 12a, 12b. Service processor 32 may have its own local memory 38, and is connected to read-only memory (ROM) 40 which stores various program instructions for system startup. Service processor 32 may also have access to a hardware operator panel 42 to provide system status and diagnostic information.
In alternative embodiments computer system 10 may include modifications of these hardware components or their interconnections, or additional components, so the depicted example should not be construed as implying any architectural limitations with respect to the present invention.
When computer system 10 is initially powered up, service processor 32 uses JTAG interface 34 to interrogate the system (host) processors 12a, 12b and MC/HB 16. After completing the interrogation, service processor 32 acquires an inventory and topology for computer system 10. Service processor 32 then executes various tests such as built-in-self-tests (BISTs), basic assurance tests (BATs), and memory tests on the components of computer system 10. Any error information for failures detected during the testing is reported by service processor 32 to operator panel 42. If a valid configuration of system resources is still possible after taking out any components found to be faulty during the testing then computer system 10 is allowed to proceed. Executable code is loaded into memory 18 and service processor 32 releases host processors 12a, 12b for execution of the program code, e.g., an operating system (OS) which is used to launch applications and in particular the circuit design application of the present invention, results of which may be stored in a hard disk drive of the system (an I/O device 24). While host processors 12a, 12b are executing program code, service processor 32 may enter a mode of monitoring and reporting any operating parameters or errors, such as the cooling fan speed and operation, thermal sensors, power supply regulators, and recoverable and non-recoverable errors reported by any of processors 12a, 12b, memory 18, and MC/HB 16. Service processor 32 may take further action based on the type of errors or defined thresholds.
While the illustrative implementation provides program instructions embodying the present invention on disk drive 36, those skilled in the art will appreciate that the invention can be embodied in a program product utilizing other computer-readable media. The program instructions may be written in the C++ programming language for an AIX environment. Computer system 10 carries out program instructions for a physical synthesis process that uses a novel optimization technique to increase area efficiency of an integrated circuit design structure. Accordingly, a program embodying the invention may include conventional aspects of various physical synthesis tools, and these details will become apparent to those skilled in the art upon reference to this disclosure.
In accordance with one implementation of the present invention, computer system 10 performs a circuit optimization which includes fast timerless buffering and repowering, but iteratively repeats this procedure with a changing slew target. Slew (or slew rate) refers to the rise time or fall time of a switching digital signal. Different definitions can be used to quantify slew, the most common being the 10/90 slew which is the time it takes for a waveform to cross from the 10% signal level to the 90% signal level. Other definitions such as 20/80 slew or 30/70 slew are often used when the waveform has a slowly rising or falling tail. Since higher interconnect resistivity causes signal integrity to degrade more quickly with each advancing technology, buffers need to be inserted on long interconnects to meet slew constraints.
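The slew definitions given above can be illustrated with a minimal sketch that measures the 10/90 (or 20/80, 30/70) slew of a sampled rising waveform. The function names `crossing_time` and `slew`, the sampled-waveform representation, and the use of linear interpolation between samples are assumptions made for illustration only and are not part of the disclosed implementation.

```python
from typing import Sequence


def crossing_time(times: Sequence[float], values: Sequence[float], level: float) -> float:
    """Return the first time a rising waveform crosses `level`.

    Linear interpolation between adjacent samples is an assumption of this sketch.
    """
    for i in range(1, len(values)):
        v0, v1 = values[i - 1], values[i]
        if v0 < level <= v1:
            t0, t1 = times[i - 1], times[i]
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def slew(times: Sequence[float], values: Sequence[float],
         low: float = 0.1, high: float = 0.9) -> float:
    """Time for a rising waveform to go from the `low` to the `high` fraction
    of its full swing (10/90 by default; pass 0.2, 0.8 for 20/80 slew)."""
    v_start, v_end = values[0], values[-1]
    swing = v_end - v_start
    t_low = crossing_time(times, values, v_start + low * swing)
    t_high = crossing_time(times, values, v_start + high * swing)
    return t_high - t_low
```

For an ideal linear ramp that rises over 100 picoseconds, `slew([0.0, 100.0], [0.0, 1.0])` gives a 10/90 slew of about 80 picoseconds, and the 20/80 variant gives about 60 picoseconds.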
After each iteration of fast timerless buffering and repowering, a determination is made as to whether any nets of the circuit have negative slack (52). Slack is the time difference between actual arrival time of a signal and the required arrival time (RAT) according to the circuit design parameters, and is calculated from the input timing characteristics of the net components (e.g., buffers and wire length). Negative slack means the net does not meet the timing requirements, while positive slack indicates timing requirements are met. If computer system 10 determines that all of the nets have positive slack, the timerless buffering and repowering stage is complete, and the process continues with critical path optimization (54) and histogram optimization (56).
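The slack check described above can be sketched as follows. The sign convention (required arrival time minus actual arrival time, so that negative values indicate a timing violation) follows the description, while the names `slack` and `nets_with_negative_slack` and the dictionary layout mapping each net to an (actual arrival, RAT) pair are hypothetical choices for this illustration.

```python
from typing import Dict, List, Tuple


def slack(required_arrival_ps: float, actual_arrival_ps: float) -> float:
    """Slack is positive when the signal arrives before its required arrival time."""
    return required_arrival_ps - actual_arrival_ps


def nets_with_negative_slack(arrivals: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of nets that fail timing.

    `arrivals` maps a net name to (actual_arrival_ps, required_arrival_ps);
    this layout is an assumption of the sketch.
    """
    return [name for name, (actual, required) in arrivals.items()
            if slack(required, actual) < 0]
```

If the returned list is empty, the timerless buffering and repowering stage would be treated as complete in this sketch.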
If nets with negative slack remain after the current iteration of timerless buffering and repowering, the slew constraint will be incrementally decreased for the next iteration unless it has already been reduced to a minimum (predetermined) slew target for this optimization stage. The value for the minimum slew target should be fairly aggressive, and depends on the particular semiconductor technology, but is preferably the slew for a long optimal buffered line, e.g., 30 picoseconds for 45 nanometer technology for a regular Vt threshold buffer on M2 layers. The program instructions running on computer system 10 accordingly compare the current slew constraint to the minimum slew target (57). This comparison could be performed before checking the slack of the nets. If the slew is already at or below the minimum, the timerless buffering and repowering stage is again complete, and the process continues with critical path optimization (54) and histogram optimization (56). If the current slew is not yet at the minimum value, computer system 10 calculates a new slew target (58). However, any nets having positive slack from the previous iteration are skipped to avoid over-optimizing the design structure, and that slack information is cached for future timing analysis. In the illustrative implementation the slew target is incrementally reduced by roughly 20%-50%. The new target may be calculated as a fraction of the current target or by reference to a table of incremental targets. The process then repeats iteratively at box 50 to carry out fast timerless buffering and repowering with the next slew target. Iteratively repeating the fast timerless buffering and repowering while gradually decreasing the slew constraint in this manner results in a design which retains high quality of results with significantly smaller area and wire length.
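The iterative procedure just described may be outlined in a minimal sketch, under the assumption that the buffering/repowering step and the timing query are supplied as callbacks. The names `iterative_slew_buffering`, `next_slew_target`, `buffer_and_repower` and `slack_of` are hypothetical, and the sketch computes each new target as a fraction of the current one, clamped at the minimum; a table of incremental targets could be substituted.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def next_slew_target(current: float, minimum: float, reduction: float) -> float:
    """Lower the target by `reduction` (e.g. 0.2 to 0.5), never below the minimum."""
    return max(minimum, current * (1.0 - reduction))


def iterative_slew_buffering(
    nets: Sequence[str],
    initial_slew: float,
    min_slew: float,
    buffer_and_repower: Callable[[str, float], None],  # hypothetical optimizer hook
    slack_of: Callable[[str], float],                   # hypothetical timing query
    reduction: float = 0.4,
) -> Tuple[List[float], Dict[str, float], List[str]]:
    """Repeat slew-driven buffering with a decreasing slew target.

    Returns (targets used, cached slack of passing nets, nets still failing).
    """
    slew_target = initial_slew
    active = list(nets)               # nets still being optimized
    cached_slack: Dict[str, float] = {}
    history: List[float] = []
    while True:
        for net in active:
            buffer_and_repower(net, slew_target)
        history.append(slew_target)
        failing = []
        for net in active:
            s = slack_of(net)
            if s < 0:
                failing.append(net)
            else:
                cached_slack[net] = s  # skipped in later rounds, slack cached
        active = failing
        # Stop when timing is met or the minimum slew target has been reached.
        if not active or slew_target <= min_slew:
            break
        slew_target = next_slew_target(slew_target, min_slew, reduction)
    return history, cached_slack, active
```

With an initial target of 100 picoseconds, a minimum of 30 picoseconds and a 50% reduction, this sketch visits targets of 100, 50 and 30 picoseconds, provided some net still has negative slack after each of the first two rounds.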
The entire process of
The computation costs associated with the present invention are relatively small. The invention will incur a runtime penalty for the additional iterations with changing slew target, but experiments indicate that significant efficiencies can be obtained with only two to three iterations, in which case the runtime overhead is around 5% to 10%. The technique is still much faster than using a fully timer-based optimization methodology.
The present invention may be further understood by reference to an exemplary integrated circuit design 60 as shown in
After this initial round of slew-driven buffering and repowering, the slew target is lowered from 100 picoseconds to 50 picoseconds, and a second round is then performed which generates the second optimized design structure 60-3 for the integrated circuit as seen in
After this second round of slew-driven buffering and repowering the slew target is again lowered, this time from 50 picoseconds to 30 picoseconds, and a third round is then performed which generates the third optimized design structure 60-4 for the integrated circuit as seen in
While the example of
Although the invention has been described with reference to specific embodiments, this description is not meant to be construed in a limiting sense. Various modifications of the disclosed embodiments, as well as alternative embodiments of the invention, will become apparent to persons skilled in the art upon reference to the description of the invention. It is therefore contemplated that such modifications can be made without departing from the spirit or scope of the present invention as defined in the appended claims.