METHOD AND APPARATUS FOR PARALLEL DATA PREPARATION AND PROCESSING OF INTEGRATED CIRCUIT GRAPHICAL DESIGN DATA

Information

  • Patent Application
  • Publication Number
    20080077891
  • Date Filed
    September 27, 2006
  • Date Published
    March 27, 2008
Abstract
A method for implementing an ORC process to facilitate physical verification of an integrated circuit (IC) graphical design. The method includes partitioning the IC graphical design data into files by a host machine such that the files correspond to regions of interest or partitions with defined margins; dispersing the partitioned data files to available cpus within the network; processing each job by the cpu receiving the file, wherein artifacts arising from bisection of partitioning margins during the partitioning, including cut-induced false errors, are detected and removed, and the shape-altering effects of such artifact errors are minimized; and transmitting the results of processing at each cpu to the host machine for aggregate processing.
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:



FIG. 1 is a flow block diagram, which broadly defines a conventional master process for partitioning IC graphical designs for distributed processing;



FIG. 2 is a flow diagram of one embodiment of the inventive parallel data preparation and distributed processing;



FIG. 3 is a system-level design of an IC graphical design verification system which allows processing in accordance with the inventive concepts herein; and



FIG. 4 shows an IC arranged to implement the inventive parallel data processing of this invention.





DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT OF THE INVENTION

The inventive methods, software and apparatus set forth herein automatically and intelligently partition IC graphical design data into portions that are readily processed by available network resources in a distributed processing network, without the deleterious effects that arbitrary data partitioning could have on aggregate processing time for a large-scale rule check or design verification. At the processing level (at the cpu processing a particular design piece or job), the cpu executes the slated operations on the received data file (with the partitioned data) while implementing an inventive splinter or margin-line removal process. The sliver removal process, in its simplest implementation, removes or eliminates windowed splinters or other artifacts that could otherwise cause ORC-like processing operations to generate bottlenecks or communication overhead.


Before an individual cpu in a distributed processing environment can carry out ORC processes on the design, the graphical design must first be characterized or separated into at least three data levels: a target level, an OPC output level and a simulated wafer level. These are three different representations of the same image, but the shapes on each level are not necessarily coincident. Because of the way the OPC output and the simulated wafer level are generated, splinters or other artifacts may result from partitioning in one or more of the three levels (but not necessarily on all of them). The mismatch can create millions of false errors. Regardless of the number of errors, however, splinters or artifacts generated by the design partitioning, or shape-operation errors, must be detected and corrected if what is printed is ever to match the target shape. Put another way, the invention not only addresses sliver removal; the inventive sliver-removal processes also support content shape retention, simplifying checking, which is reflected in aggregate check-time improvements.
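
By way of illustration only, the three-level representation described above can be held side by side as in the following minimal Python sketch. The class, field names, and rectangle convention are hypothetical, not part of the disclosure.

```python
# Illustrative only: three representations of the same image held together.
# Shapes are axis-aligned rectangles (x1, y1, x2, y2); nothing here is
# mandated by the patent text.
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[int, int, int, int]

@dataclass
class PartitionedDesign:
    target: List[Rect] = field(default_factory=list)           # design intent
    opc_output: List[Rect] = field(default_factory=list)       # post-OPC shapes
    simulated_wafer: List[Rect] = field(default_factory=list)  # print simulation

    def levels(self):
        # Shapes on each level are not necessarily coincident, so a
        # partitioning artifact may appear on one level but not the others.
        return {"target": self.target,
                "opc_output": self.opc_output,
                "simulated_wafer": self.simulated_wafer}
```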


That is, slivers or partitioning-related errors may cause, or cause to be generated, more than one OPC shape associated with a design shape. For that matter, splinters associated with a design shape that is no longer present may nevertheless remain as part of the design data. The splinters present are likely to generate associated print simulation errors, including false shorts, opens, or related errors extending into the region of interest. This naturally increases communication overhead in distributed processing systems and applications. By overcoming partition-generated errors at the cpu level, smaller jobs may be more readily distributed, processed concurrently, and returned in a way that takes full advantage of available distributed processing network resources to shorten and simplify OPC-like processing with improved scalability. This obviates the need for larger server-class machines to carry out the verification processes.


To implement the invention, existing or known code and processes for carrying out conventional DRC-like and resolution enhancement-like operations in a distributed processing environment need only be slightly modified. That is, the modification must accommodate and recognize the inventive “windowing” approach to data partitioning, including the splinter removal. It is the code used to implement the ORC and OPC operations that essentially allows for smaller jobs. But only with error-free partitioning can improved times and improved scalability of the processes to available resources be realized. Code lacking the inventive processes is likely to suffer increased communication overhead, including implementing tedious, operation-intensive processes to respond to errors reported at positions within the wafer volume that abut or span the partition boundaries. To that end, FIG. 2 is a schematic flow diagram for the inventive process, which could be “called” or implemented within a master process (such as the FIG. 1 AGP process), or called as a stand-alone process. For example, the FIG. 2 process could be called by the FIG. 1 process from the step of block 110 therein, or the step of block 120 therein. The inventive process as depicted in the FIG. 2 example, however, may readily operate or be implemented as a remote process, independent of a master process such as that depicted in FIG. 1.


The exemplary process begins with a start step, such as represented by block 200, and a working directory for the files generated by the process is established by the step represented by block 210. The input file is retrieved by the step represented by block 220, and specific DRC-like or resolution enhancement-type operations are performed, as indicated by block 230. A post-processing step, represented by block 240, provides the post-process results, which are returned (as the case may be) in a step represented by block 250. The inventive post-processing code (as modified by the inventive concepts taught hereby) concurrently filters or removes cut-induced (margin-boundary) errors at each machine or cpu designated for the task, and includes summarizing the results across all independent tasks. If the transfer of results is successful, the step represented by block 260 either completes the task (block 270), or performs a clean-up step, represented by block 280.
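
By way of illustration only, the following Python sketch mirrors the per-cpu flow of blocks 200 through 280. The function names and callables are hypothetical stand-ins, not part of the disclosure.

```python
import shutil
import tempfile

def run_job(job_id, fetch_input, run_operations, post_process, return_results):
    """Illustrative per-cpu flow for FIG. 2; callables are hypothetical."""
    workdir = tempfile.mkdtemp(prefix=f"job_{job_id}_")   # block 210: working directory
    try:
        data = fetch_input(job_id, workdir)               # block 220: retrieve input file
        raw = run_operations(data)                        # block 230: DRC-like / RET operations
        results = post_process(raw)                       # block 240: filter cut-induced errors
        ok = return_results(job_id, results)              # block 250: transmit results
        if not ok:                                        # block 260: transfer check
            raise RuntimeError("result transfer failed")  # resubmitted upstream
    finally:
        shutil.rmtree(workdir, ignore_errors=True)        # block 280: clean up
```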


So by implementing this approach, the original full-chip layout data are first partitioned into individual files, each containing sections of data. The partitioning algorithm attempts to partition the data along macro boundaries, only resorting to geometric partitioning if the pieces are larger than a preset cut point. Each piece is arranged to include a frame region (ROI; see FIG. 4) of sufficient size to render operations in one piece independent from those in another piece, with one job per file. A load leveler process distributes the jobs across multiple systems or cpus, matching job requirements (memory, number of processors, etc.) to systems with the requested characteristics.
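
By way of illustration only, the macro-first partitioning with geometric fallback might be sketched as follows. The piece structure, field names, and halving rule are assumptions for the sketch, not the disclosed algorithm.

```python
def partition(pieces, max_size, frame):
    # Illustrative recursion: prefer macro boundaries, fall back to a
    # geometric cut when a piece exceeds the size set point.
    jobs = []
    for piece in pieces:
        x1, y1, x2, y2 = piece["bbox"]
        if (x2 - x1) * (y2 - y1) <= max_size or (x2 - x1 <= 1 and y2 - y1 <= 1):
            # Grow the window by the frame (ROI margin) so operations in this
            # piece are independent of neighboring pieces; one job per file.
            jobs.append((x1 - frame, y1 - frame, x2 + frame, y2 + frame))
        elif piece.get("macros"):
            # Partition along macro boundaries where possible.
            jobs += partition(piece["macros"], max_size, frame)
        else:
            # Geometric fallback: bisect along the longer dimension.
            if x2 - x1 >= y2 - y1:
                m = (x1 + x2) // 2
                halves = ((x1, y1, m, y2), (m, y1, x2, y2))
            else:
                m = (y1 + y2) // 2
                halves = ((x1, y1, x2, m), (x1, m, x2, y2))
            jobs += partition([{"bbox": b} for b in halves], max_size, frame)
    return jobs
```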


By keeping the size of the partitions consistent with the desired run-time goal for a relatively small number of processors, conventional vendor applications and platforms, operated as modified by the inventive processes, show good or improved scalability. When an individual job is assigned to a particular machine, system or cpu comprising the distributed processing network, the process first creates a temporary working directory (for example, on local DASD). It then retrieves its assigned data section via FTP communication with the system, machine or cpu where the data resides. When processing (local at the cpu or system) is complete, the resulting files (data log, etc.) are transmitted back to the original system via FTP and placed in a named subdirectory. The progress of jobs is preferably monitored so that when all jobs are completed, the final aggregation of data and results occurs. Data files are merged and a final summary is produced. Resubmission of unsuccessful jobs is automatic.
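
By way of illustration only, the host-side monitoring and automatic-resubmission behavior could look like the following sketch. The `submit` and `poll` callables are hypothetical stand-ins for the load-leveler and FTP plumbing.

```python
import time

def monitor_jobs(jobs, submit, poll, max_retries=3):
    # Illustrative host-side loop: track each job, resubmit failures
    # automatically, then hand the results back for merging.
    retries = {job: 0 for job in jobs}
    pending, done = set(jobs), {}
    for job in jobs:
        submit(job)                        # e.g., hand off to the load leveler
    while pending:
        for job in list(pending):
            status, result = poll(job)     # e.g., check the named subdirectory
            if status == "ok":
                done[job] = result
                pending.discard(job)
            elif status == "failed":
                retries[job] += 1          # resubmission of unsuccessful jobs
                if retries[job] > max_retries:
                    raise RuntimeError(f"job {job!r} failed {retries[job]} times")
                submit(job)
        time.sleep(5)                      # poll at a modest interval
    return done                            # merged into the final summary upstream
```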



FIG. 3 is a system-level schematic representation of one implementation of an integrated system 300 of the invention, including a user workstation 301 and a data host machine, i.e., Regatta. The inventive processing implemented in the FIG. 3 construction partitions a region of interest to facilitate improved scalability through sliver removal and content shape retention. Workstation 301 operates in unison with a host machine 302 to partition the graphical design data, preferably based on heuristically determined regions of interest, and includes available memory for storing the design and the partitioned data. Section 304 of FIG. 3 idealizes how the partitioned data are distributed to the available cpus 303 (compute resources) in the network.


Ideally, each partition contains a frame region of sufficient size to render operations in the piece or file partition independent of the other pieces or file partitions. This implies one task or thread for each partitioned file. The individual tasks are dispatched to a task scheduling system, such as LoadLeveler or LSF (not shown in detail in FIG. 3), to distribute the tasks across multiple systems or machines by matching task requirements (e.g., memory, number of processors or cpus, etc.) to the cpus allocated in the network (sometimes referred to as the LoadLeveler pool or compute resources). By keeping the size of the partitions consistent with desired runtime goals for a relatively small number of processors, conventional processing tools or applications are operated in such a way that they effectively scale. However, where conditions are not ideal and partitioning is not always “clean,” the processes disclosed hereby distribute the tasks to particular machines suited for each task. The cpu or machine processes out errors generated by partitioning, thereby making effective and efficient use of the network resources and realizing shorter run times.
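
By way of illustration only, the requirement matching described above (in the spirit of a LoadLeveler/LSF class requirement) might be sketched as follows; the field names are assumptions for the sketch.

```python
def assign_task(task, machines):
    # Illustrative requirement matching: pick a machine whose free memory
    # and free cpus satisfy the task's stated needs.
    suitable = [m for m in machines
                if m["free_memory_mb"] >= task["memory_mb"]
                and m["free_cpus"] >= task["cpus"]]
    if not suitable:
        return None  # task waits in the pool until a matching machine frees up
    # Prefer the least-loaded suitable machine to keep worst-case task time low.
    return min(suitable, key=lambda m: m["load"])
```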


This is particularly important when partitioning to efficiently and concurrently perform intensive data processing applications such as design rule checking (DRC), optical rule checking (ORC), and optical proximity correction (OPC), where the proximity of one shape to another is an important design factor, and when accommodating potential errors (partition-related artifact errors) arising during concurrent processing. The overall process takes as long as the worst-case machine time.



FIG. 4 shows a portion of an IC, which is a physical region of interest corresponding to a partitioning, by which the sliver removal and shape retention may be readily understood. That is, the inventive processes preserve the proximity of the layout of a device such as IC or wafer portion 405 by analyzing the design shapes comprising the ROI with respect to four (4) regions. A first region 410 is referred to as the region of interest (ROI). A second or far region 420 is the outermost region of the partition, which includes the region of interest 410 and a third region 430 immediately surrounding the region of interest 410. In the far region, optical effects are still linked to the region of interest, and shapes there may affect, or be affected by, optical effects linked to the region of interest. A fourth region is defined as the thin ring 440, which is relatively small and lies on the inside of the far margin 420, and is used for partition artifact or splinter removal by the inventive process. Put another way, thin ring 440 may be thought of as occupying the outer portion or ring of the far margin.
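
By way of illustration only, the nesting of the four FIG. 4 regions might be expressed as follows. The widths and helper names are assumed parameters for the sketch, not values or code from the patent.

```python
def shrink(box, d):
    # Shrink an axis-aligned box (x1, y1, x2, y2) inward by d on every side.
    x1, y1, x2, y2 = box
    return (x1 + d, y1 + d, x2 - d, y2 - d)

def fig4_regions(partition_bbox, roi_margin, near_width, ring_width):
    # Illustrative nesting (assumes roi_margin > near_width > ring_width).
    # Annuli are expressed as (outer_box, inner_box) pairs.
    far_420 = partition_bbox                      # outermost region of the partition
    roi_410 = shrink(partition_bbox, roi_margin)  # region of interest
    near_430 = (shrink(partition_bbox, roi_margin - near_width), roi_410)
    ring_440 = (far_420, shrink(partition_bbox, ring_width))  # hugs the far edge
    return {"roi_410": roi_410, "near_430": near_430,
            "far_420": far_420, "ring_440": ring_440}
```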


Inventive shapes-handling code is included in the ORC process to define the regions and the very thin outer ring of the far margin. The code required to process each ORC-like job by each cpu in the distributed environment identifies and removes any “windowed” splinters or artifacts in the partitioned data generated by margin bisection, and attempts to accommodate any other associated cut-induced false errors (see “cut line” of FIG. 3). If not removed, artifact errors arising from bisected ROI or partition margins confuse the applied processes (e.g., ORC) implemented at each cpu. By detecting margin-induced errors, the inventions avoid wasting processing time attempting to process or make sense of detected phantom shapes caused by splinters. The inventive processes further remove, or obviate, the deleterious effects caused by splinters that are present and suggest the presence of shapes that are not really present in the data (failure to identify shape pairs). Any artifacts arising from partitioning and parallel processing must be eliminated, and are eliminated by the inventive processes and systems, to obviate generating associated print simulation errors of false shorts, opens, or related errors extending into the region of interest.


The shape-handling code may be called by a main ORC process at the cpu. The added code preferably includes functionality whereby all other cut-induced false errors are easily removed from the area outside the region of interest. The skilled artisan will note, however, that in some cases there may be same-net connectivity interdependencies, or layers that contain complex, large-vertex-count polygons, that require more complex handling. The complex handling typically includes processing in all four of the above-defined regions. While the above-described inventive processing works effectively and efficiently where the integrated circuit is merely geometrically partitioned into m×n pieces or parts, the resulting output is typically larger due to hierarchical flattening of the data. As is readily understood by the skilled artisan, the best-case run time to be realized by implementing the inventive process equals the time for partitioning the data, plus the time of the longest-running individual piece or task. That is why scaling and effective load distribution are so important. When the size of the partitioned data is well suited to the system resources assigned (partition, task, region, etc.), processing may be conducted with minimal communication overhead. The inventions carry out the partitioning, distribution and processing of tasks to fully utilize the target processors (great scalability) to operate on the partitioned data with minimal partition-related error and to reduce overall run time.
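
By way of illustration only, the best-case run-time relationship stated above reduces to a one-line estimate; the function name is illustrative.

```python
def best_case_runtime(partition_time, task_times):
    # Best case per the text: partitioning cost plus the longest-running
    # individual piece or task; load balance determines the max term.
    return partition_time + max(task_times)
```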


Such an approach to the problems solved hereby essentially pipelines OPC and ORC job steps by running them in sequence on individual pieces, rather than waiting for the OPC to complete for all the pieces or partitions and reassembling the chip to submit for ORC. Where the technology is smaller, or more advanced, ORC errors are frequent, so the inventive techniques permit quicker determination. That is, data that are not suitable for mask build are identified more quickly than with processes and platforms implementing conventional OPC and ORC in sequence on full-chip data sets.
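
By way of illustration only, the per-piece pipelining could be sketched as below, where each piece flows straight from OPC into ORC while pieces run concurrently; `run_opc` and `run_orc` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def pipelined_check(pieces, run_opc, run_orc, workers=8):
    # Illustrative pipelining: no barrier waiting for OPC on every piece,
    # and no full-chip reassembly before ORC.
    def opc_then_orc(piece):
        return run_orc(run_opc(piece))  # per-piece sequence
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(opc_then_orc, pieces))
```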


For that matter, scalability of the allocated machines or cpus in a network programmed to operate in accord with the inventions herein may reach upwards of 95% or 98% with large numbers of cpus. The skilled artisan will note that cases may arise in which there are same-net connectivity interdependencies, or layers that contain complex, large-vertex-count polygons, requiring more complex region handling than required by the example set forth, where all four of the so-defined regions must be processed to realize the desired outcome. In this variation on the above-described process, or system operation, all of the splinters for each layer are identified, and those splinters that touch the thin ring are selected in order that the physical verification process avoids missing errors in the region of interest. The splinters are expanded by an amount sufficient to cover variations between the three levels. Thereafter, the selected, expanded splinters are subtracted from all three levels.
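
By way of illustration only, the identify-select-expand-subtract variant might look like the following sketch. The rectangle helpers, the thinness test, and the crude subtraction are assumptions; a production flow would use true boolean geometry operations.

```python
def remove_splinters(levels, thin_ring, max_width, grow):
    # Illustrative variant flow: find splinters on every level, keep those
    # touching the thin ring, expand them to cover level-to-level variation,
    # and subtract them from all three levels.
    def is_splinter(r):
        x1, y1, x2, y2 = r
        return min(x2 - x1, y2 - y1) <= max_width   # thin in one dimension
    def touches(a, b):
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2
    def expand(r, d):
        x1, y1, x2, y2 = r
        return (x1 - d, y1 - d, x2 + d, y2 + d)

    selected = [expand(s, grow)                     # cover inter-level variation
                for shapes in levels.values() for s in shapes
                if is_splinter(s) and touches(s, thin_ring)]
    # Crude "subtraction" for the sketch: drop any shape a selected,
    # expanded splinter overlaps.
    return {name: [s for s in shapes
                   if not any(touches(s, sp) for sp in selected)]
            for name, shapes in levels.items()}
```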


In the foregoing specification, the invention has been described with reference to specific embodiments. It will, however, be evident that various modifications and changes can be made thereto without departing from the broader scope and spirit of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims
  • 1. A method for distributed processing of IC graphical design data to verify or check the IC physical design, which partitions the design data into pieces that support scaling in view of available network resources for processing the pieces, comprising the steps of: partitioning the IC graphical design data into files by a host machine, wherein the files correspond to regions of interest within the graphical design data; dispersing the partitioned data files to available cpus within the network; processing each job by the cpu receiving the file, wherein artifacts arising from bisection of partitioning margins during the partitioning, which could generate cut-induced false errors if not removed from the data file, are detected and removed, and the shape-altering effects of such artifact errors are minimized; and transmitting the results of processing at each cpu to the host machine for aggregate processing, wherein the smaller jobs support improved scalability and shorter aggregate rule check times.
  • 2. The process as set forth in claim 1, wherein the step of processing includes pipelining OPC and ORC task steps in sequence to quickly determine suitability for a mask build.
  • 3. The process as set forth in claim 1, wherein the step of ORC processing includes using shape-handling code to define a very thin outer ring of a shape's far margin, so that the outer ring may be utilized in the process to identify and remove artifacts, and/or artifact-induced error.
  • 4. The process as set forth in claim 3, wherein the step of processing requires processing all regions to identify artifacts in layers comprising the IC, selecting artifacts that contact the thin outer ring, expanding the artifacts in amounts sufficient to cover variations between layers, and subtracting the artifacts from all three levels.
  • 5. The process as set forth in claim 4, wherein artifacts may include slivers, splinters, cut-margins, mis-shapes and phantom shapes arising out of partitioning.
  • 6. The process as set forth in claim 1, wherein the step of processing includes implementing ORC task steps in sequence to quickly determine suitability for a mask build.
  • 7. The process as set forth in claim 1, wherein the step of processing includes a parallel implementation of resolution enhancement techniques (RET), pipelined with the DRC-like operations.
  • 8. The process as set forth in claim 1, wherein the step of processing includes parallel implementation of optical proximity correction (OPC), optical rules checking (ORC) and resolution enhancement techniques (RET).
  • 9. A computer-readable medium comprising a set of computer-readable instructions that, upon execution by a processor, implement a distributed processing method for checking or verifying an integrated circuit (IC) graphical design that improves scalability of verification tasks to effectively utilize available network compute resources or cpus, the method comprising the steps of: partitioning the IC graphical design data by a host processor to generate files corresponding to each partition or region of interest (ROI) to be processed by separate cpus; dispersing the data files to the available cpus for processing; and processing each file by the cpu to remove artifacts generated by bisection of partition margins, and related cut-induced false errors, to improve job scalability in view of available cpus and improve aggregate rule check processing run times.
  • 10. A distributed processing network for physical verification of an integrated circuit (IC) graphical design, which distributed processing network automatically and intelligently partitions the IC design into data files for distributed processing by processors comprising the network, disperses the data files to the allocated processors, and individually processes each separate file to minimize aggregate verification processing times and maximize scalability of the aggregate task across the available in-network compute resources, comprising: a plurality of processors; a user workstation in communication with the plurality of processors; and a data host machine in communication with the user workstation and processors, wherein user instructions submitted to the host machine partition the IC graphical design into data partitions with fixed margins, and wherein data partition size is calculated to allow for effective processing of each data partition by a particular processor suited and available for the task; dispersing the data partitions as individual processing tasks for processing by the particular processors allocated for such tasks; and processing each task at each processor to remove artifacts arising from bisection of partitioning margins during the partitioning, including cut-induced false errors, resulting in improved network processor use and run times for aggregate processing.