The invention relates to the design and manufacture of integrated circuits, and more particularly, to systems and methods for performing parallel processing of electronic design automation (EDA) tools.
The electronic design process for an integrated circuit (IC) involves describing the behavioral, architectural, functional, and structural attributes of an IC or electronic system. Design teams often begin with very abstract behavioral models of the intended product and end with a physical description of the numerous structures, devices, and interconnections on an IC chip. Semiconductor foundries use the physical description to create the masks and test programs needed to manufacture the ICs. EDA tools are extensively used by designers throughout the process of designing and verifying electronic designs.
A Physical Verification (PV) tool is a common example of an EDA tool used by electronics designers. PV is one of the final steps performed before releasing an IC design to manufacturing. Physical verification ensures that the design abides by all of the detailed rules and parameters that the foundry specifies for its manufacturing process. Violating a single foundry rule can result in a silicon product that does not work for its intended purpose. Therefore, it is critical that thorough PV processing is performed before finalizing an IC design. PV tools may be used frequently and at many stages of the IC design process: during design and at tape-out to ensure compliance with physical and electrical constraints imposed by the manufacturing process, and after tape-out to verify and ensure manufacturability of the design and its constituent elements.
PV tools read and manipulate a design database which stores information about device geometries and connectivity. Because compliance with design rules generally constitutes the gating factor between one stage of the design and the next, PV tools are typically executed multiple times during the evolution of the design and contribute significantly to the project's critical path. Therefore, reducing PV tool execution time makes a major contribution to the reduction of overall design cycle times.
As the quantity of data in modern IC designs becomes larger over time, the execution time required for EDA tools to process these designs also becomes greater. For example, the goal of reducing PV tool execution time is in sharp tension with the constantly increasing complexity and transistor counts of modern IC designs. The more transistors and other structures on an IC design, the greater the amount of time normally needed to perform PV processing. This problem is exacerbated for all EDA tools by constantly improving IC manufacturing technologies that can create IC chips at ever-smaller feature sizes, which allows increasingly greater quantities of transistors to be placed within the same chip area, as well as resulting in more complex physical and lithographic effects during manufacture.
To improve the processing of EDA tools, the present invention provides an improved method and system for processing the tasks performed by an EDA tool in parallel. In some embodiments of the invention, the IC layout is divided into a plurality of layout windows, and one or more of the layout windows are processed in parallel. Methods are also described, for some embodiments, for sampling one or more windows to provide dynamic performance estimation.
Further details of aspects, objects, and advantages of the invention are described below in the detailed description, drawings, and claims. Both the foregoing general description and the following detailed description are exemplary and explanatory, and are not intended to be limiting as to the scope of the invention.
The accompanying drawings are included to provide a further understanding of the invention and, together with the Detailed Description, serve to explain the principles of the invention.
Disclosed is an improved method and system for implementing parallelism for execution of electronic design automation tools. An example of an EDA tool is a physical verification (PV) tool. Embodiments of the present invention may be illustrated below relative to a description of parallelism for PV tools. It is noted, however, that the present invention is not limited to PV tools, and may also be applied to other types of EDA tools.
A layout window 104 may be implemented as a rectangular area of a design layout. The window 104 may itself be a hierarchical layout with multiple layers. Shapes that touch the window boundary are cut into pieces along the window boundary, and the pieces inside the boundary remain within the window layout. In alternative embodiments, the window itself may be non-rectangular, comprising one or more non-rectangular shapes.
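By way of illustration only, the following sketch shows how an axis-aligned rectangular shape may be cut along a rectangular window boundary; the Rect type and the ClipToWindow name are hypothetical conveniences (production layouts involve arbitrary polygons and hierarchy), not part of the invention:

#include <algorithm>

struct Rect { double x0, y0, x1, y1; };  // hypothetical axis-aligned rectangle

// Cuts a shape along the window boundary, keeping the piece inside the window.
// Returns false if the shape lies entirely outside the window.
bool ClipToWindow(const Rect& shape, const Rect& window, Rect& piece) {
  piece.x0 = std::max(shape.x0, window.x0);
  piece.y0 = std::max(shape.y0, window.y0);
  piece.x1 = std::min(shape.x1, window.x1);
  piece.y1 = std::min(shape.y1, window.y1);
  return piece.x0 < piece.x1 && piece.y0 < piece.y1;
}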
A design hierarchy has cell masters and cell instances (linear transformations of the master). When a cell master is outside a window, but the window includes instances of this cell master, a new master inside the window is generated that completes the hierarchy of the window's layout. In some embodiments, two approaches are used to deal with cells and instances that intersect the window boundary. In the first approach, all shapes of the intersecting cell/instance are “promoted” to the top-level of the hierarchy, i.e., the instance disappears and shapes inside the window are “flattened”. In the second approach, a new cell (a “variant”, i.e., a modified copy of the original instance) is created and stored in the design hierarchy instead of the original cell/instance. In yet another approach, the layout is partially flattened, in which only a portion of the hierarchy is promoted to a higher level of the hierarchy or only a portion of the hierarchy is flattened.
This approach can be used to implement “output” partitioning, in which the intended output of some sort of processing (e.g., for an IC design layout to be verified) is partitioned into multiple portions or sections that can be individually operated upon by different processing entities. This is in contrast to “input” partitioning, in which partitioning is performed based solely upon the input data.
The size, composition, and location of the windows can be selected to meet desired performance expectations. For example, the user may determine a desired timeframe for completing execution of the EDA workload, and the layout windows may then be configured to meet that timeframe.
For example, consider a PV tool operation to verify an IC design layout. The IC layout may include many millions of transistors. On a conventional non-parallel PV tool, this verification workload may take at least an overnight run to complete, and may even take over a day to finish processing. The user may determine that the desired timeframe for completing the verification task is actually several hours, instead of overnight. This desired performance expectation may be taken into account when calculating the windowing and parallelism parameters for the workload, e.g., by dividing the layout into enough windows of the correct configuration such that parallel processing of the windows will result in the intended performance timeframe. In an alternate embodiment, the expected processing timeframe is not provided by the user; instead, the EDA system calculates optimal windowing and parallelism parameters based upon system scheduling requirements, system parameters, heuristics, and/or other non-user-supplied factors.
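By way of a minimal sketch, and assuming perfect load balance and negligible overhead (both idealizations), the number of windows can be derived from a measured or estimated serial runtime; all names here are illustrative:

#include <cmath>

// Idealized sizing: if processing the whole layout serially takes serialHours,
// then, with perfect load balance, N windows processed on N processing
// entities take roughly serialHours / N hours, so N must be at least
// serialHours / desiredHours.
int WindowsNeeded(double serialHours, double desiredHours) {
  return static_cast<int>(std::ceil(serialHours / desiredHours));
}

For instance, a 16-hour overnight run with a 2-hour target would call for at least 8 windows (and 8 processing entities) under these assumptions.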
Historical data and past processing of similar/same IC designs may be taken into account and analyzed to configure the layout windows. In many cases, the IC design presently being processed includes only incremental changes over a prior version of the IC design. Therefore, run-time data from processing the earlier version of the IC design can be used to create configurations for the layout windows that will accurately match the desired performance expectations.
In some embodiments, the windows configured for a given layout may have different sizes. In alternate embodiments, some or all of the windows may be configured to have the same size.
At 304, interactions between different windows are addressed. Certain operations are local in nature to a portion of a layout, while other operations will necessarily involve data from other portions of a layout. This action identifies and addresses situations in which processing one layout window necessarily involves data from other layout windows.
To perform this action, various classifications can be made for operations or rules that are intended to be performed upon a layout.
A first type of operation (Type I) is a local computation that can be performed without requiring any interaction with other windows. An example of this type of operation is a Boolean operation performed upon shapes in the layout window. To illustrate, consider layout window 410 in the accompanying figure.
A second type of operation (Type II) involves situations where data from a neighboring window must be accessed to perform the operation. This typically involves a limited interaction distance between one window and another.
To illustrate, consider the layout windows 420 and 422 in the accompanying figure.
As another example, consider an optical proximity correction (OPC) operation that is to be performed upon a shape in a window. Adding a scattering bar to a layout is a common OPC operation performed by EDA tools, as shown in the illustrative example of the accompanying figure.
A third type of operation (Type III) involves operations that relate to a global data exchange on output. For example, when calculating the total area of shapes on a given layer, one can calculate the total area of shapes on this layer in all windows, in parallel. Then, in a second step, the final global area is calculated by adding local areas in one global communication operation. Note that the global communication operations required for windowed PV are very similar to global data exchanges necessary when performing linear algebra algorithms on distributed memory machines.
The fourth type of operation (Type IV) is one that can be represented by a sequence of operations of Types I through III.
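By way of illustration of the Type III pattern described above, the following sketch computes per-window areas in parallel and then combines them in one global reduction step; the Window type and the LocalArea kernel are hypothetical placeholders, not part of the invention:

#include <future>
#include <numeric>
#include <vector>

struct Window { std::vector<double> shapeAreas; };  // hypothetical per-window data

// Type I work: the local area of one window, computable in isolation.
double LocalArea(const Window& w) {
  return std::accumulate(w.shapeAreas.begin(), w.shapeAreas.end(), 0.0);
}

// Type III: run the local computations in parallel, then perform a single
// global communication operation that adds the local areas together.
double TotalArea(const std::vector<Window>& windows) {
  std::vector<std::future<double>> parts;
  for (const Window& w : windows)
    parts.push_back(std::async(std::launch::async, LocalArea, std::cref(w)));
  double total = 0.0;
  for (auto& p : parts) total += p.get();  // the global reduction step
  return total;
}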
One way to address interactions between windows is to configure a “halo” around each window that interacts with a neighboring window. This means that operations performed for a given window will not just consider shapes within the boundaries of the window, but also any additional layout objects that exist within the expanded halo distance even if the layout objects appear outside of the window.
In some embodiments, the halo distance is established to address interaction distances for the specific operations or DRC rules that are to be performed for a given window. For example, consider an OPC operation involving placement of scattering bars. Assume that the maximum distance that needs to be considered to place a scattering bar is 20 nanometers from an edge of an object. If so, then the minimum interaction distance from one window to another to address scattering bars is at least 21 nanometers. The largest interaction distance for all operations to be performed for the window is identified, and that largest interaction distance becomes the minimum value of the halo spacing for the window. If the largest interaction distance for all operations for a given window is based upon placing scattering bars, then the halo spacing distance will be set at 21 nanometers for that window.
In some embodiments, each window may potentially be associated with a different halo spacing distance, based upon the type of operations to be performed for a given window. In alternate embodiments, a common halo spacing distance is shared by some or all of the windows.
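Under the assumptions above, deriving a window's halo reduces to taking the maximum interaction distance over the window's operations and expanding the window boundary by that amount. The following is an illustrative sketch only; the Rect type and function names are hypothetical:

#include <algorithm>
#include <vector>

// The halo is the largest interaction distance over all operations to be
// performed for the window (e.g., 21 nanometers when scattering-bar
// placement is the dominant interaction).
double HaloDistance(const std::vector<double>& interactionDistances) {
  return interactionDistances.empty() ? 0.0
       : *std::max_element(interactionDistances.begin(), interactionDistances.end());
}

struct Rect { double x0, y0, x1, y1; };  // hypothetical axis-aligned window

// Operations for this window consider all layout objects within the
// halo-expanded boundary, even objects that lie outside the window itself.
Rect ExpandByHalo(const Rect& w, double halo) {
  return Rect{w.x0 - halo, w.y0 - halo, w.x1 + halo, w.y1 + halo};
}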
Returning to the flow described above, the layout windows can be executed in parallel using, for example, either the distributed-memory parallel approach or the shared-memory parallel approach. The distributed-memory parallel approach involves software that can make efficient use of multiple processing devices, such as CPUs, where each CPU accesses its own memory. With respect to implementation, message-passing primitives (such as UNIX sockets, MPI, PVM, etc.) are typically employed to coordinate execution of program components running on different CPUs. The shared-memory parallel approach involves software that makes use of multiple processing devices, e.g., CPUs, that can address common physical memory. With respect to implementation, shared memory can be allocated, read, and written by all program components being executed on different CPUs. Coordination is accomplished via atomic memory accesses, e.g., semaphores.
In some embodiments, the parallel processing is performed using distributed-memory parallelization. However, if the product's memory consumption is efficient, a distributed-memory parallel program can be ported to a shared-memory machine by emulating a distributed computer network on a shared-memory computer. Due to increased spatial locality, in some cases a distributed parallel program ported back to a shared-memory machine runs faster than a similar program developed from the beginning using the shared-memory parallel programming paradigm.
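As a minimal distributed-memory sketch using MPI (one of the message-passing primitives mentioned above), windows can be assigned to ranks round-robin and the per-rank results combined with one collective operation; ProcessWindow and the window count are hypothetical placeholders:

#include <mpi.h>

// Hypothetical per-window workload; returns some per-window result
// (e.g., a local area or a violation count expressed as a double).
double ProcessWindow(int w) { return static_cast<double>(w % 2); }

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  const int numWindows = 64;                     // illustrative
  double local = 0.0;
  for (int w = rank; w < numWindows; w += size)  // round-robin window assignment
    local += ProcessWindow(w);
  double global = 0.0;                           // one global communication step
  MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
  MPI_Finalize();
  return 0;
}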
In addition, the type and/or quantity of certain structures within the window may affect the performance of processing for that window. The identification of structure types or quantities that will affect performance is highly dependent upon the specific EDA tool operation being performed. For example, certain types of processing, such as certain kinds of DRC rule checking, are dependent upon the density of structures within a given layout area. Therefore, all else being equal, windows having greater instance densities will be slower to process for these types of DRC verification than windows having smaller instance densities. Similarly, for DRC rules that relate specifically to pattern density, windows having greater pattern densities will be slower to process than windows having smaller pattern densities.
The next action is to check or predict the expected performance of the processing system based upon the set of layout windows that have been identified (404). As described below, “sampling” can be used to provide performance estimation. If the expected performance meets the desired performance level (406), then the processing system continues with parallel execution of the identified layout windows (410).
If the expected performance does not meet desired performance levels, then one or more of the layout windows are reconfigured (408) and the process returns to 404. Examples of window parameters that may be reconfigured include the location, size, shape, and/or number of windows.
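A minimal sketch of this check-and-reconfigure loop (404-408) follows; the Window fields, the assumption that splitting a window halves its work, and all names are hypothetical, and the prediction simply tracks the slowest window, which bounds the parallel runtime:

#include <algorithm>
#include <vector>

struct Window { double predictedHours; };  // hypothetical per-window estimate

// 404: the parallel runtime is bounded by the slowest window.
double PredictRuntime(const std::vector<Window>& ws) {
  double worst = 0.0;
  for (const Window& w : ws) worst = std::max(worst, w.predictedHours);
  return worst;
}

// 408: reconfigure by splitting the slowest window
// (idealized: a split halves its work).
void SplitSlowestWindow(std::vector<Window>& ws) {
  auto it = std::max_element(ws.begin(), ws.end(),
      [](const Window& a, const Window& b) { return a.predictedHours < b.predictedHours; });
  double half = it->predictedHours / 2.0;
  it->predictedHours = half;
  ws.push_back(Window{half});
}

// 406: loop until the expected performance meets the desired level
// (idealized: assumes desiredHours > 0 and that splitting always helps).
void ConfigureWindows(std::vector<Window>& ws, double desiredHours) {
  while (!ws.empty() && PredictRuntime(ws) > desiredHours)
    SplitSlowestWindow(ws);
}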
Sampling
Layout “sampling” can be used to provide dynamic performance prediction of the parallelized processing system.
The collected run-time data can also be used to optimize the process of forming windows and executing the workload (706). For example, the run-time data can be used to adjust the final size of the layout windows. If the actual computational performance of the EDA tool against the window is too slow to achieve the desired performance timeframe, then the size of the window can be adjusted to be smaller. If the actual computational performance of the EDA tool against the window is faster than expected, then the size of the window can be adjusted to be larger or placed in a different location.
Sampling the layout and generating a trace for it takes time and introduces overhead in the overall verification run, and this overhead should be taken into account when determining the configuration to be used for the parallel processing. The amount of overhead devoted to determining the window parameters and checking sampled performance should be small enough that, when added to the actual processing of the workload, the total still fits within the desired performance expectations.
A layout sampling example is shown in the accompanying figure.
Layout sampling can also be performed by identifying one or more parameters that are predicted to affect the run-time performance of a given window. To explain this, consider that the overall execution time of the parallelized system is related to the slowest workload and/or processing entity that is handling work in the system.
Therefore, one way to configure the windows in some embodiments, to ensure that the system can process the workload within the expected performance requirements, is to make sure that the window expected to be the slowest to process will meet those requirements.
Consider a PV tool for which processing time is highly dependent upon instance density. For this type of PV tool, it is preferable that the window having the greatest instance density is selected as the sample window. For purposes of this example, instance density would refer to the density of instances throughout the different levels of the IC design that exist within the geometric boundaries of the selected window.
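For this density-dominated case, selecting the sample window reduces to finding the maximum-density window. A minimal sketch follows; the Window type and density field are hypothetical, and at least one window is assumed to exist:

#include <algorithm>
#include <vector>

struct Window { double instanceDensity; };  // hypothetical density metric

// Choose the window expected to be slowest, i.e., the one with the
// greatest instance density, as the sample window.
const Window& SampleWindow(const std::vector<Window>& ws) {
  return *std::max_element(ws.begin(), ws.end(),
      [](const Window& a, const Window& b) { return a.instanceDensity < b.instanceDensity; });
}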
Multiple windows can be sampled according to some embodiments of the invention. Given a particular sampling factor, multiple smaller windows can be chosen rather than a single larger window.
This type of information relating to factors that affect processing time can be used to configure the size, shape, and location of layout windows. For example, consider again a PV tool for which processing time is highly dependent upon instance density. The different layout windows for the IC design can be configured to have different sizes and/or shapes based upon instance densities in the IC design, as illustrated in the accompanying figure.
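One simple way to express this balancing, offered only as a hedged sketch, is to size each window inversely to its local instance density so that every window carries a roughly equal instance budget; both names below are illustrative:

// Dense regions receive smaller windows; sparse regions receive larger ones.
// Assumes instancesPerUnitArea > 0; the per-window budget is a tuning knob.
double TargetWindowArea(double instancesPerUnitArea, double instanceBudgetPerWindow) {
  return instanceBudgetPerWindow / instancesPerUnitArea;
}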
A determination can be made whether the sampling accuracy is high enough. The following correlation function can be used to decide whether the sampling is sufficiently accurate; the body shown here is a sketch reconstructed from the description below, assuming the standard <vector>, <algorithm>, and <numeric> headers:
using Trace = std::vector<double>;  // n'th element = execution time of the n'th operation in the trace
using Percentage = double;
Percentage Correlation(const Trace& T1, const Trace& T2, Percentage x) {
  std::vector<size_t> idx(T1.size()); std::iota(idx.begin(), idx.end(), 0);
  std::sort(idx.begin(), idx.end(), [&](size_t a, size_t b) { return T1[a] > T1[b]; });
  size_t k = static_cast<size_t>(x / 100.0 * idx.size());  // the x% most expensive operations of T1
  double top = 0.0, total = std::accumulate(T2.begin(), T2.end(), 0.0);
  for (size_t i = 0; i < k; ++i) top += T2[idx[i]];
  return total > 0.0 ? 100.0 * top / total : 0.0;  // share of T2's time in those operations, in %
}
This function can be used to compute correlations between a full and a sampled trace, between two sampled traces, and for the computation of the auto-correlation of a trace. For a given trace T and number x of most expensive operations, the auto-correlation Correlation(T,T,x) computes the performance improvement that can be gained when these x operations are eliminated. Since the correlation function is a monotonically increasing function, its integral (from x=0 to 100%) can also be used to automatically predict performance accuracy.
Windowing and Other Types of Parallelism
Window-based parallelism can also be used in conjunction with other types of parallelism in the EDA processing system. For example, a PV tool can make use of parallelism at different levels of the tool's execution. A rule deck operates on multiple layers, and many of its rules can be processed independently of one another. Therefore, some rules can be executed in parallel; this is referred to as rule-based parallelism.
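A minimal sketch of rule-based parallelism follows, with RunRule standing in as a hypothetical placeholder for one independent rule check:

#include <future>
#include <string>
#include <vector>

// Hypothetical single-rule check; returns the number of violations found.
int RunRule(const std::string& rule) { return 0; }

// Launch independent rules of the deck concurrently and gather the results.
int RunDeck(const std::vector<std::string>& independentRules) {
  std::vector<std::future<int>> jobs;
  for (const std::string& r : independentRules)
    jobs.push_back(std::async(std::launch::async, RunRule, r));
  int violations = 0;
  for (auto& j : jobs) violations += j.get();
  return violations;
}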
Based on its topology, a design database exhibits multiple forms of parallelism that can be used for domain decomposition and PV parallelization. A 2D layout can be decomposed into 2D segments; if interaction distances are small, such decomposition allows for efficient parallelization of DRC operations. As noted above, windows are created by cutting the layout into 2D windows. In addition, parallelism can be implemented by processing different cells of the design hierarchy in parallel. This is referred to herein as cell-based parallelism.
Window-based parallelism can be implemented as a simultaneous extension of, and constraint upon, cell-based parallelism. Windows can be represented by new cells introduced at the top level of the design hierarchy. Parallelization is then limited only to this hierarchy level.
The design database also includes data-structures representing connectivity and passive devices. These data-structures are used during NVN or parasitic extraction (RCx). Since the overall chip circuitry can be decomposed into sub-circuits and nets, this is an additional source of parallelism, the so-called net-based parallelism. Windows can also be constructed on top of, or in conjunction with net-based parallelism.
Devices are represented by multiple shapes. Several devices form a gate that typically forms a leaf node in the design hierarchy. Statistically, in a design database, a minority of transistor and gate types (for example, inverters and NAND gates) are dominant. Therefore, patterns that are replicated many times can be identified, in particular on lower layers. A pattern is an assembly of one or more polygons. If a statistically dominant transistor can be represented by such a pattern on a given layer, the same pattern can be found many times (linearly transformed) at many places in the layer. This repetition can be used to extract parallelism for rules with an interaction distance smaller than pattern dimensions. Windows can also be constructed on top of, or in conjunction with pattern-based parallelism.
Recognition of geometric layout patterns can be used to improve the performance of most geometric PV operations and also RET operations. There are also other applications, such as layout compaction, cell projection for direct e-beam writing tools, etc. Ideally, in the long term, a design environment that generates a limited “vocabulary” of patterns is desirable, such that pattern detection becomes unnecessary during verification and RET (patterns and their names can be identified via a new hierarchy representation).
The windowing approach of the present invention can also be used to perform OPC operations. Portions of a layout can be configured into layout windows, and separate processing entities used to handle OPC processing for some or all of the windows in parallel.
Yield analysis is another type of analysis that can be performed in conjunction with windowing. In particular, the layout is partitioned into windows as described above. Each window is then analyzed to determine yield projections based upon the configuration of shapes within that window. The overall yield determination for the IC design can then be made by aggregating the analysis results for all of the windows.
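As an illustrative sketch of the aggregation step only, and assuming (as a simplifying model, not necessarily the model actually employed) that window yields are independent and combine multiplicatively:

#include <vector>

// Chip-level yield as the product of per-window yield projections.
double ChipYield(const std::vector<double>& windowYields) {
  double yield = 1.0;
  for (double w : windowYields) yield *= w;  // e.g., {0.99, 0.98, ...}
  return yield;
}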
System Architecture Overview
According to one embodiment of the invention, computer system 1400 performs specific operations by processor 1407 executing one or more sequences of one or more instructions contained in system memory 1408. Such instructions may be read into system memory 1408 from another computer readable/usable medium, such as static storage device 1409 or disk drive 1410. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and/or software. In one embodiment, the term “logic” shall mean any combination of software or hardware that is used to implement all or part of the invention.
The term “computer readable medium” or “computer usable medium” as used herein refers to any medium that participates in providing instructions to processor 1407 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as disk drive 1410. Volatile media includes dynamic memory, such as system memory 1408. Transmission media includes coaxial cables, copper wire, and fiber optics, including wires that comprise bus 1406. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications.
Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, carrier wave, or any other medium from which a computer can read.
In an embodiment of the invention, execution of the sequences of instructions to practice the invention is performed by a single computer system 1400. According to other embodiments of the invention, two or more computer systems 1400 coupled by communication link 1415 (e.g., LAN, PSTN, or wireless network) may perform the sequence of instructions required to practice the invention in coordination with one another.
Computer system 1400 may transmit and receive messages, data, and instructions, including program code, i.e., application code, through communication link 1415 and communication interface 1414. Received program code may be executed by processor 1407 as it is received, and/or stored in disk drive 1410 or other non-volatile storage for later execution.
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. For example, the above-described process flows are described with reference to a particular ordering of process actions. However, the ordering of many of the described process actions may be changed without affecting the scope or operation of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.