METHOD AND SYSTEM FOR ANALYZING PERFORMANCE METRICS OF ARRAY TYPE CIRCUITS UNDER PROCESS VARIABILITY

Information

  • Patent Application
  • Publication Number
    20100250187
  • Date Filed
    March 24, 2010
  • Date Published
    September 30, 2010
Abstract
A method is disclosed for analyzing a performance metric of an array type electronic circuit under process variability effects. The electronic circuit has an array with a plurality of array elements, and an access path is a model of the array type electronic circuit. The model includes building blocks having all hardware to access one array element in the array. Each building block has at least one basic element. In one aspect, the method includes deriving statistics of the access path due to variations in the building blocks under process variability of the basic elements, and deriving statistics of the full array type electronic circuit by combining the results of the statistics of the access path under awareness of the array architecture.
Description
BACKGROUND OF THE INVENTION

1. Field of the Invention


The invention relates to a method for analyzing a performance metric of an electronic circuit, in particular an array type of electronic circuit, such as for example a memory or an image sensor, under process variability effects. In particular embodiments, the present invention relates to analyzing performance metrics of memories. The present invention also relates to a corresponding system and to software for carrying out the method.


2. Description of the Related Technology


With decreasing physical dimensions, such as 65 nm and below, dopant atoms have become countable, and no process can place, for example, a small number of dopant atoms at exactly the same positions in every device, e.g. in the channel of every transistor. This leads to purely random variations between transistors. This may be problematic for electronic circuits in general, and is particularly problematic for array types of circuits such as memories, because their high density requirements imply a high count of small, variability-sensitive devices.


Unfortunately, for array types of electronic circuits, such as memories, virtually no commercially available solutions to predict the effect of process variations on yield exist today, and the designer has to resort either to additional silicon runs, at the expense of development time and cost, or to overly pessimistic margins, at the expense of product quality. Several issues make array types of circuits, e.g. memories, especially challenging.


Engineers reduce the nominal simulation of a full array, e.g. a full memory, to the critical path, assuming that every other path behaves the same. However, this approach fails in particular under local process variability, where device-to-device uncorrelated variations cause access operations to every cell in the array, e.g. to every bitcell in a memory, to behave differently. Since an array such as a memory is only as good as its worst path, the array statistics are the distribution of the worst of all paths. As a result, simulating the critical-path netlist under variability does not model the full array statistics correctly.
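
The effect can be illustrated with a minimal Monte Carlo sketch. It assumes, purely for illustration, that every path delay is an independent Gaussian with mean 1.0 and standard deviation 0.05 (arbitrary units); the distribution, the path count and the function names are hypothetical. Under these assumptions the array delay, i.e. the worst of all paths, is centered well above the single-path delay that a critical-path simulation would report.

```python
import random
import statistics

def sample_path_delay(rng: random.Random) -> float:
    """One path delay under local variability (hypothetical Gaussian model)."""
    return rng.gauss(1.0, 0.05)

def sample_array_delay(rng: random.Random, n_paths: int) -> float:
    """Array delay: the delay of the slowest of n_paths independent paths."""
    return max(sample_path_delay(rng) for _ in range(n_paths))

rng = random.Random(42)
single_path = [sample_path_delay(rng) for _ in range(500)]
full_array = [sample_array_delay(rng, n_paths=1024) for _ in range(500)]

print(f"critical path only:  mean = {statistics.mean(single_path):.3f}")
print(f"worst of 1024 paths: mean = {statistics.mean(full_array):.3f}")
```

Replacing the Gaussian by another path-delay distribution changes the numbers but not the direction of the shift: for independent, identically distributed paths, the probability that the worst path stays below a given delay is the single-path probability raised to the power of the path count.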


Works considering the array cell, e.g. the bitcell, alone without its periphery effectively reduce the sample sizes and transistor counts, but also entail an incomplete analysis. Several stability criteria depend not only on the cell but also on all other parts of the array. For instance, in the case of memories, the read operation of the cell is affected not only by the cell's capability to discharge the bitline but also by variations in the sense-amplifier offset, the timing circuit that controls its activation, and the row decoder that enables the wordline activation. Accounting for the bitcell only would lead to optimistic estimations of the read-voltage variability.


On top of that, designers must pay attention to architectural correlations between different parts of the array, in the case of memories for example between bitcells, sense amplifiers and other memory parts. A worst-case cell instance is not necessarily on the same path as the worst-case sense amplifier or the worst-case row driver logic, so that blindly combining the worst case of each of these effects would lead to over-pessimistic results, e.g. to pessimistic estimations of the read margin.
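
A small sketch makes the difference concrete. It assumes, for illustration only, that a path delay is the sum of the deviations of the row driver, the bitcell and the sense amplifier it traverses, each drawn from a hypothetical standard Gaussian; the additive model and all names are assumptions. Because every path combines only the instances it actually uses, the architecture-aware worst path can never exceed the blind sum of per-block worst cases.

```python
import itertools
import random

def random_memory(rng: random.Random, rows: int, cols: int):
    """Hypothetical per-instance delay deviations (arbitrary units)."""
    row_drv = [rng.gauss(0.0, 1.0) for _ in range(rows)]
    sense_amp = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    cell = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    return row_drv, sense_amp, cell

def aware_worst(row_drv, sense_amp, cell):
    """Worst path: each path (r, c) combines only the instances it uses."""
    return max(
        row_drv[r] + cell[r][c] + sense_amp[c]
        for r, c in itertools.product(range(len(row_drv)), range(len(sense_amp)))
    )

def blind_worst(row_drv, sense_amp, cell):
    """Blind worst case: worst instance of every block, even if not on one path."""
    return max(row_drv) + max(max(row) for row in cell) + max(sense_amp)

rng = random.Random(7)
mem = random_memory(rng, rows=64, cols=32)
print(f"path-aware worst case: {aware_worst(*mem):.2f}")
print(f"blind worst case:      {blind_worst(*mem):.2f}")
```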


Hence, today's variation-aware design flows for array types of circuits mainly require two inputs: 1) a critical-path transistor-level netlist of the array, including an analysis of its performance metrics under nominal process conditions, and 2) information about the variability of the basic devices used in the underlying technology, such as transistors, photodiodes and interconnects. This information is usually available separately for local and global process variations, and can take the form of statistical distributions of certain basic device parameters. The netlist of 1) is often produced by an array compiler, and it typically contains only one or at most a few of the array cells to save simulation time. A testbench is typically supplied along with the netlist. It stimulates the critical-path netlist by applying appropriate signal combinations to address the present cell(s) and to write data to and read data from them. It also extracts from the circuit response the performance metrics of interest to the user, such as the access time (the time between applying a clock signal and a change of the data at the output), the current consumed, or the read voltage.


A closer look at the nature of a critical path netlist model (FIG. 1, illustrated for the particular case of an SRAM array 10) reveals a difficulty of the apparently straightforward strategy described above. Because of the large number of transistors in the array, simulating the full circuit model is computationally infeasible, so repeated instances are pruned from the netlist. Under the rational premise that, for a given parameter, a memory performs only as well as the worst instance of any of its building blocks, it is clear that purging some of these instances for the sake of simulation speed destroys the statistics of that parameter. For instance, while it is in general safe to assume that the nominal delay of a memory corresponds to that of its critical path, because none of the pruned building blocks would exhibit a different delay, this is not true for the statistical delay of the memory, where all possible instances need to be verified in order to guarantee the worst-case timing behavior of the memory itself.


Correcting the statistics of critical path netlist parts has been proposed in the past. In “Worst-Case Design and Margin for Embedded SRAM” (Aitken R. and Idgunji S., Design, Automation & Test in Europe Conference and Exhibition, DATE ’07, April 2007), Aitken and Idgunji applied a branch of the extreme value theory originally developed by Gumbel to derive estimates for the variability-related yield of SRAMs. The method is fast, as the authors work with analytic expressions of the transformation instead of multiple iterations of statistical sampling and simulation. However, this is in turn also its limitation, since assumptions about the Gaussian nature of the metrics must be made. In US 2008/0005707, a product-convolution technique is presented to percolate timing and power distributions from IP-block level to SoC level. This technique was developed for digital blocks and also covers the shift of statistics when multiple parallel instances are present. However, it requires a separation of different object types through a synchronous boundary, which a memory critical path netlist generally does not provide. Reference is also made to a DATE 2010 paper which comes very close (Loop unfolding . . . ).


On the industrial side, designers prefer to work with process corners: an assumed worst-case condition drives the attempt to simulate worst-case memory behavior. Apart from the problem described above, namely that this can only capture the statistics of the critical path netlist and not the statistics of the full memory, further questions arise. The most important one is: “How are these corners defined?” Many circuit and digital designers distrust statistical methods and prefer corners, often without realizing that the corners themselves are also derived statistically. Secondly, which combination of corners for the different transistor types (nmos, pmos, low-/high-Vth, specialized cell transistor types) actually triggers the worst possible performance of the memory? This is a non-trivial question in general, especially when considering local random variations, which affect every transistor individually, and it can often only be answered by combinatorial experiments. Will a good corner for one parameter, such as timing, also be good for another parameter, such as power? In general, the answer is no. In the end, the designer (or her management) is interested in the system yield loss due to parametric spread and in the most likely causes of functional yield loss. Even if the above-mentioned questions are answered, the corner approach cannot deliver these figures.


SUMMARY OF CERTAIN INVENTIVE ASPECTS

A first aspect of the present invention relates to a method for analyzing performance metrics, under process variations, of array types of electronic systems comprising a large number of building blocks, such as semiconductor memories. Performance metrics may include, without being limited thereto, power consumption, voltage and/or current levels, timing parameters and yield. In general, the method according to embodiments of the present invention applies to any type of system whose simulation model was previously reduced, by removing some repeated instantiations of subsystems, to speed up simulation. This removal has no influence on nominal-case analysis, but it destroys the statistics. With a dedicated simulation approach, the statistics can be reconstructed with excellent accuracy.


A method according to embodiments of the present invention is applicable to array types of electronic circuits, the array type of electronic circuit comprising an array with a plurality of array elements. An access path is a model of the array type of electronic circuit, comprising building blocks containing all hardware to access one array element in the array. Each building block comprises at least one basic element.


One inventive aspect relates to a method for analyzing a performance metric of an array type of electronic circuit under process variability effects. According to an embodiment of the present invention, the method comprises a first process of deriving statistics of the access path due to variations in the building blocks under process variability of the basic elements, and a second process of deriving statistics of the full array type of electronic circuit by combining the results of the statistics of the access path under awareness of the array architecture.


Hence, a method according to embodiments of the present invention comprises identifying the building blocks of the design and separately analyzing the variability of the design due to the variability in each building block, for example by using specific statistical simulation techniques. The method further comprises re-combining this sub-variability information to provide the overall variability, under awareness of any architecture based on regular building blocks such as (but not limited to) memory cells, row decoders, bitline sense amplifiers, etc. The method is useful for system designers deploying a specific system and, more specifically, for array (e.g. memory) designers and their management, to estimate the parametric yield loss due to the specifications on these metrics guaranteed to customers. Examples include, but are not limited to, maximal cycle time, maximal access time, maximal power consumption (static/dynamic) or maximal tolerated noise. As a second use, the method allows tracking down the reasons for functional yield loss, and the relative likelihood of such reasons. This helps the array (e.g. memory) designer to avoid the most likely causes of failure already at design time.


In a method according to embodiments of the present invention, combining the results of the statistics of the access path under awareness of the array architecture may include taking into account a specification of the instance count and the connectivity of the building blocks. This may include taking into account the multiplicity of the building blocks, i.e. the number of instantiations of the building blocks within the electronic circuit.


Deriving statistics of the access path may comprise injecting variability that can occur under process variations into the basic elements of a building block, and simulating the thus modified access path. Variability may be injected into the basic elements of one building block at a time, the other building blocks of the access path remaining at their nominal case. Deriving statistics of the access path due to variations in the building blocks may comprise a statistical sampling technique, such as, for example but not limited thereto, enhanced Monte Carlo picking.


A method according to embodiments of the present invention may furthermore comprise recording resulting sensitivity populations of the access path.
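
The first process can be sketched as follows, under loudly stated assumptions: a real flow would run the existing access-path netlist and testbench in a circuit simulator, whereas here a hypothetical function `simulate_access_path` with a made-up delay model stands in for it, and plain Monte Carlo sampling replaces enhanced Monte Carlo picking. The block names, transistor counts and Vt mismatch sigma are illustrative. Variability is injected into one building block at a time while all other blocks stay nominal, and the resulting delay deviations are recorded as that block's sensitivity population.

```python
import random

# Hypothetical access path: building block -> number of transistors in it.
ACCESS_PATH = {"row_decoder": 4, "bitcell": 6, "sense_amp": 5, "timing": 8}
SIGMA_VT = 0.03  # assumed local threshold-voltage mismatch (V), illustrative

def nominal_shifts():
    """Vt shift of every transistor in every block, all zero (nominal case)."""
    return {block: [0.0] * n for block, n in ACCESS_PATH.items()}

def simulate_access_path(delta_vt):
    """Stand-in for a circuit simulation of the access-path testbench.

    Returns a made-up, mildly nonlinear 'delay' for illustration only.
    """
    delay = 1.0
    for shifts in delta_vt.values():
        s = sum(shifts)
        delay += 0.5 * s + 2.0 * s * s
    return delay

def sensitivity_population(block, n_samples, rng):
    """Inject variability into one block only and record delay deviations."""
    nominal = simulate_access_path(nominal_shifts())
    population = []
    for _ in range(n_samples):
        shifts = nominal_shifts()  # every other block stays nominal
        shifts[block] = [rng.gauss(0.0, SIGMA_VT) for _ in range(ACCESS_PATH[block])]
        population.append(simulate_access_path(shifts) - nominal)
    return population

rng = random.Random(1)
populations = {block: sensitivity_population(block, 1000, rng) for block in ACCESS_PATH}
```

Storing each population as deviations from the nominal access path keeps the populations of different blocks independent of each other, so that they can later be recombined per path.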


In a method according to embodiments of the present invention, the process of deriving statistics of the full array type of electronic circuit may comprise any statistical sampling loop, such as for example, but not limited thereto, a Monte Carlo loop.


Deriving statistics of the access path due to variations in the building blocks may comprise combining building block sensitivities.


In a method according to embodiments of the present invention, deriving statistics of the full array type of electronic circuit may comprise generating a template of the array type of electronic circuit including all paths through the circuit, for example by listing the full coordinate space, creating a random observation of the electronic circuit following this template, and repeating at least once the process of creating a random observation of the electronic circuit with different random sequences to generate an electronic circuit population. In such embodiments, creating a random observation of the electronic circuit may comprise for each building block of the electronic circuit selecting one random sample from the obtained sensitivity data, combining the thus-obtained samples, and deriving a corresponding path performance metric for every path in the electronic circuit. A method according to embodiments of the present invention may furthermore comprise evaluating a path performance metric for every path in the electronic circuit, and selecting the combination of building blocks corresponding to the worst-case value of this path performance metric.
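
The second process can be sketched in the same spirit. The sketch assumes a memory with MT timing circuits, MR rows and MC columns, uses placeholder Gaussian sensitivity populations in place of real output of the first process, and combines building block sensitivities by simple addition to the nominal path delay; the additive combination, the numbers and all names are assumptions for illustration. Each random observation draws one sample per building block instance, collapses every path of the template, and applies the MAX scaling rule; repeating it with different random sequences yields the memory population.

```python
import itertools
import random

NOMINAL_DELAY = 1.0  # nominal access-path delay (arbitrary units, illustrative)

def make_template(n_timing, n_rows, n_cols):
    """Coordinate space: one path coordinate (t, r, c) per array element."""
    return list(itertools.product(range(n_timing), range(n_rows), range(n_cols)))

def random_observation(template, sens, n_timing, n_rows, n_cols, rng):
    """Build one random memory; return (worst path delay, worst path coordinate)."""
    # One random sensitivity sample per instance of each building block.
    timing = [rng.choice(sens["timing"]) for _ in range(n_timing)]
    row_drv = [rng.choice(sens["row_driver"]) for _ in range(n_rows)]
    sense_amp = [rng.choice(sens["sense_amp"]) for _ in range(n_cols)]
    bitcell = {coord: rng.choice(sens["bitcell"]) for coord in template}
    # Collapse: combine the sensitivities along every path (additive assumption).
    path_delays = [
        (NOMINAL_DELAY + timing[t] + row_drv[r] + sense_amp[c] + bitcell[(t, r, c)], (t, r, c))
        for t, r, c in template
    ]
    # Scaling rule for delay: the memory is as slow as its slowest path.
    return max(path_delays)

def memory_population(n_timing, n_rows, n_cols, sens, n_memories, seed=0):
    """Repeat the random observation with different random sequences."""
    rng = random.Random(seed)
    template = make_template(n_timing, n_rows, n_cols)
    return [random_observation(template, sens, n_timing, n_rows, n_cols, rng)
            for _ in range(n_memories)]

# Placeholder sensitivity populations standing in for the output of process 1.
_rng = random.Random(3)
SIGMAS = {"timing": 0.02, "row_driver": 0.01, "sense_amp": 0.03, "bitcell": 0.04}
sens = {block: [_rng.gauss(0.0, s) for _ in range(1000)] for block, s in SIGMAS.items()}

observations = memory_population(1, 16, 8, sens, n_memories=200)
delays = [delay for delay, _ in observations]
```

The worst path coordinate returned with each observation identifies the combination of building block instances that limits that particular memory, which can help rank the likely causes of yield loss.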


In a method according to embodiments of the present invention, deriving statistics of the full array type of electronic circuit may furthermore comprise scaling the path performance metrics into an observation of the electronic circuit performance, using any of MAX operator, for example for delays, a MIN operator, for example for read margins, an AVG operator, for example for dynamic energy, a SUM operator, for example for leakage values, an AND operator, for example for yield, or an OR operator, for example for yield-loss.
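
Such scaling rules can be sketched as a table of operators that map a list of per-path metrics to one array-level observation. The dictionary and the `scale` helper are hypothetical names; the inputs for AND are assumed to be per-path pass flags and those for OR per-path fail flags.

```python
import statistics

# Scaling rules: per-path metrics -> one array-level observation.
SCALING_RULES = {
    "MAX": max,              # delays: the array is as slow as its slowest path
    "MIN": min,              # read margins: the array is as robust as its weakest path
    "AVG": statistics.mean,  # dynamic energy, averaged over accesses
    "SUM": sum,              # leakage, summed over all paths
    "AND": all,              # yield: the array works only if every path passes
    "OR": any,               # yield loss: one failing path is enough
}

def scale(rule, path_metrics):
    """Apply the named scaling rule to a list of per-path metrics."""
    return SCALING_RULES[rule](path_metrics)

print(scale("MAX", [1.02, 1.10, 0.97]))    # 1.1
print(scale("AND", [True, True, False]))   # False
```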


In a method according to embodiments of the present invention, generating a template of the array type of electronic circuit may comprise including redundant paths in the template. Such methods may furthermore comprise, after evaluating the path performance metric for every path in the electronic circuit, replacing the path corresponding to the worst-case value of this path performance metric by a redundant path if the path performance metric of this redundant path is better than this worst-case path performance metric.
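
The redundancy step can be sketched for a delay metric, where lower is better and the MAX scaling rule applies. The sketch assumes, for simplicity, that each spare can replace any single regular path and that spares are assigned greedily to the current worst path; in a real memory a spare row or column replaces a whole group of paths, and the function name is hypothetical.

```python
def apply_redundancy(path_delays, spare_delays):
    """Repair an array with redundant paths (delay metric: lower is better).

    Assumption for illustration: each spare path can replace any one regular
    path, and spares are used greedily on the current worst path.
    """
    paths = list(path_delays)
    for spare in sorted(spare_delays):     # best (fastest) spare first
        worst_idx = max(range(len(paths)), key=paths.__getitem__)
        if spare < paths[worst_idx]:
            paths[worst_idx] = spare       # activate the redundant path
        else:
            break                          # no remaining spare beats the worst path
    return max(paths)                      # MAX scaling rule for delay

print(apply_redundancy([1.0, 1.3, 1.1], [1.05]))  # 1.1
```

Because the spares are sorted, the loop can stop at the first spare that does not improve the worst path: every remaining spare is at least as slow.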


A second inventive aspect relates to a computer program product which, when executed on a computer, performs any of the method embodiments of the first aspect of the present invention.


One inventive aspect relates to a machine readable data storage, also called carrier medium, storing the computer program product according to the second aspect of the present invention. The terms “carrier medium” and “machine readable data storage” as used herein refer to any medium that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, a floppy disk, a flexible disk, a hard disk, a storage device which is part of mass storage, a magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, a punch card, a paper tape, any other physical medium with patterns of holes, a PROM, an EPROM, a FLASH-EPROM, and any other memory chip or cartridge. Volatile media include dynamic memory such as a RAM. A carrier wave as described hereafter, or any other medium from which a computer can read, may also serve as a carrier medium.


Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor of a computer system for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer can load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to the computer system can receive the data on the telephone line and use an infrared transmitter to convert the data to an infrared signal. An infrared detector coupled to a bus can receive the data carried in the infra-red signal and place the data on the bus. The bus carries data to main memory, from which the processor retrieves and executes the instructions. The instructions received by main memory may optionally be stored on a storage device either before or after execution by the processor. The instructions can also be transmitted via a carrier wave in a network, such as a LAN, a WAN or the internet. Transmission media can take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications. Transmission media include coaxial cables, copper wire and fibre optics, including the wires that form a bus within a computer. One inventive aspect relates to making a computer program product available for downloading. One inventive aspect also relates to transmission of signals representing the computer program product of the second aspect of the present invention over a local or wide area telecommunications network.


A third inventive aspect relates to a system for analyzing a performance metric of an array type of electronic circuit under process variability effects. The array type of electronic circuit comprises an array with a plurality of array elements. An access path is defined as a model of the array type of electronic circuit, the model comprising building blocks containing all hardware to access one array element in the array. Each building block comprises at least one basic element. The system comprises first calculation means arranged for deriving statistics of the access path due to variations in the building blocks under process variability of the basic elements, and second calculation means arranged for deriving statistics of the full array type of electronic circuit by combining the results of the statistics of the access path under awareness of the array architecture. In embodiments of the present invention, the first and second calculation means may be embodied in a single processor.


Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Features from the dependent claims may be combined with features of the independent claims and with features of other dependent claims as appropriate and not merely as explicitly set out in the claims.


For purposes of summarizing certain inventive aspects and the advantages achieved over the prior art, certain objects and advantages have been described herein above. Of course, it is to be understood that not necessarily all such objects or advantages may be achieved in accordance with any particular embodiment of the invention. Thus, for example, those skilled in the art will recognize that the invention may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other objects or advantages as may be taught or suggested herein.





BRIEF DESCRIPTION OF THE DRAWINGS

Presently preferred embodiments are described below in conjunction with the appended drawing figures, wherein like reference numerals refer to like elements in the various figures, and wherein:



FIG. 1 shows the circuitry of a full memory.



FIG. 2 shows schematically a method according to embodiments of the present invention.



FIG. 3 shows a sample of a memory as an example of an array-type of circuit, and the architectural topology and its potential impact on statistical correlations.



FIG. 4 shows schematically an overview of the first process of the overall method, applied to a memory architecture.



FIG. 5 illustrates variability on basic elements, in the particular embodiment illustrated on transistors, due to process variations.



FIG. 6 illustrates in more detail a first process of embodiments of the present invention, where a netlist of a memory architecture is provided together with a transistor variability plot, and where each transistor in a particular building block of the memory architecture is replaced by a model taken from the transistor variability plot (also called injecting or vaccinating), thus generating populations of netlists under variability. (Note: injecting variability into the building blocks separately is part of one embodiment; the use of Delta-Vt and Delta-beta circuits is not part of the invention.)



FIG. 7 shows that the simulation of all netlists delivers one observation per netlist, these observations being represented in a graph.



FIG. 8 shows a sample memory.



FIG. 9 shows a coordinate space serving as memory template for the sample memory of FIG. 8.



FIG. 10 illustrates random picking of all elements to build one random full memory array.



FIG. 11 shows an example of how the parameter variation due to process variations in a specific memory block is accessed for one memory path.



FIG. 12 illustrates collapsing, which calculates the performance metric for a path, and which is done for all paths in one memory.



FIG. 13 illustrates application of a scaling rule (MAX in the case illustrated), which selects the worst path-performance of one random memory.



FIG. 14 shows an example, applying different scaling rules.



FIG. 15 illustrates that repeating the random picking of elements to build one full memory array, i.e. populating every path (a row in the coordinate space) with different random sequences of building block instances and applying a scaling rule, populates the memory statistics.



FIG. 16 shows examples of flexibility of the method according to embodiments of the present invention.



FIG. 17 illustrates probability distributions of MOSFET parameters.



FIG. 18 illustrates the effect of local random variability on the cycle time of a memory.



FIG. 19 illustrates the outcome of process 1 of a method according to embodiments of the present invention.



FIG. 20 illustrates the outcome of process 2 of a method according to embodiments of the present invention.



FIG. 21 shows the output of process 1 of a method according to embodiments of the present invention preserving the correlation relation between any two parameters (cycle time and read margin in the example).



FIG. 22 shows the output of process 2 of a method according to embodiments of the present invention preserving the correlation relation between any two parameters (cycle time and read margin in the example) for local random (called random in the Figure) and global random variations (denoted c2c in the Figure).



FIG. 23 shows the output of process 2 of a method according to embodiments of the present invention preserving the correlation relation between any two parameters (cycle time and read margin in the example) for total variations. It also shows that Digital Corners (“Corners”) are inappropriate to estimate the spread.



FIG. 24 shows a sample output of the implementation of the redundancy model according to embodiments of the present invention.





Any reference signs in the claims shall not be construed as limiting the scope.


DETAILED DESCRIPTION OF CERTAIN ILLUSTRATIVE EMBODIMENTS

The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto. The drawings described are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes.


Furthermore, the terms first, second, third and the like in the description, are used for distinguishing between similar elements and not necessarily for describing a sequential or chronological order. The terms are interchangeable under appropriate circumstances and the embodiments of the invention can operate in other sequences than described or illustrated herein.


Moreover, the terms top, bottom, over, under and the like in the description are used for descriptive purposes and not necessarily for describing relative positions. The terms so used are interchangeable under appropriate circumstances and the embodiments of the invention described herein can operate in other orientations than described or illustrated herein.


The term “comprising” should not be interpreted as being restricted to the means listed thereafter; it does not exclude other elements or steps. It needs to be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Thus, the scope of the expression “a device comprising means A and B” should not be limited to devices consisting of only components A and B. It means that with respect to the present invention, the only relevant components of the device are A and B.


With respect to this description, the following definitions are used.


An access path netlist is a model of the array type of electronic circuit, for example a memory or an imaging device, that contains all hardware to access one element in the array.


A path through an array comprises electrical hardware that is activated to access an array element. There are several building blocks along this path. Typically, there is only one path in an access path netlist.


A building block, also called island, is a unique set of basic elements, for example transistors, in an access path netlist. A building block differs from other building blocks in the repetition count of its hardware in the full array and in its connectivity to other building blocks.


A path coordinate is a set of integer numbers (c1, c2, . . . , cn) that addresses one particular combination of building blocks and uniquely identifies a path.


A coordinate space is a list of all possible path coordinates. Its cardinality equals the number of array elements in the array.


A primary building block has a given multiplicity: for any combination of primary building blocks a path exists.


A dependent building block has a multiplicity which depends on the multiplicities of primary building blocks. A path does not necessarily exist for every combination of dependent building blocks. For example, the top-left array element is neither in the bottom row nor in the right-hand column of the array.


Redundancy provides at least one extra path in the array. Such extra paths are activated after manufacturing if and only if original paths fail, i.e. are non-functional or do not provide a pre-determined required performance. This activation changes the statistics.


A coordinate space extension is a list of all extra paths defined by redundancy.


A scaling rule is a mathematical operator applied over the metrics of all paths to obtain the array metric. For example, a MAX operator may be used for timing (a memory is as fast as its slowest bit).
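
The definitions above can be illustrated with a minimal sketch for a hypothetical memory with one timing circuit, four rows and four columns as primary building blocks. The bitcell, a dependent building block, is identified from the primary coordinates through a mixed-radix index, which is an illustrative numbering convention rather than one taken from this description; a spare row is modeled as a coordinate space extension.

```python
import itertools

def coordinate_space(multiplicities):
    """All path coordinates (c1, ..., cn) over the primary building blocks."""
    return list(itertools.product(*(range(m) for m in multiplicities.values())))

def bitcell_index(coord, multiplicities):
    """Dependent building block: its instance follows from the primary
    coordinates (mixed-radix index, an illustrative numbering convention)."""
    index = 0
    for c, m in zip(coord, multiplicities.values()):
        index = index * m + c
    return index

primary = {"timing": 1, "row": 4, "column": 4}  # hypothetical organization
space = coordinate_space(primary)               # cardinality = 16 array elements

# Coordinate space extension: one spare row (row index 4) adds 4 extra paths.
extension = [(0, 4, c) for c in range(primary["column"])]
print(len(space), len(extension))  # 16 4
```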


Certain embodiments of the method according to the present invention analyze at least one performance metric, under process variations, of array types of electronic systems comprising a large number (more than could be simulated in a reasonable amount of time) of regular building blocks, such as semiconductor memories or imaging devices. A regular building block is defined as a subcircuit of an array type of electronic system which, when periodically instantiated in a circuit netlist, can reproduce the whole system. The method comprises identifying the building blocks of the system and separately analyzing the variability of the system due to the variability in each building block, using specific statistical simulation techniques in accordance with embodiments of the present invention as set out below. As mentioned in the background, variation-aware design mainly requires two inputs: an access path basic-element-level netlist of the array type of system, e.g. a memory or imaging device, and information about the variability of the basic elements, e.g. transistors, photodiodes, interconnects, resistors, capacitors etc., used in the underlying technology. A method according to embodiments of the present invention further comprises re-combining sub-variability information, i.e. information about the variability of the basic elements, to provide the overall variability under awareness of the architecture.


Certain embodiments of the present invention relate to a method of accurately predicting array, e.g. memory, metrics under basic element, e.g. transistor, variability. This method comprises two major processes, as shown schematically in the illustration in FIG. 2:

    • 1) Deriving the statistics of the access path (AP) netlist and of certain building blocks thereof (process 20). The access path netlist is a model of the array type of electronic circuit (for example a memory or a visual sensor device) that contains all hardware to access one element in the array, and the access path is considered representative of the properties of all paths from input to output of the array under consideration.
    • 2) Deriving the statistics of the full array, e.g. memory, by combining the results of 1) under awareness of the array architecture, i.e. its organization and possibly its redundancy mechanisms (process 21).


It is to be noted that the method according to embodiments of the present invention may be separated so as to underline that in the first process, the existing netlist with its testbench and the existing simulation tool is re-used in a specific way, while the second process may be implemented as a standalone process.


The goal of this section is to point out the need to derive the statistics not only of the access path netlist of the array type of circuit, e.g. memory, but also of its sensitivity to process variations in certain building blocks thereof (called substatistics), as in process 20 of the method according to embodiments of the present invention. In accordance with embodiments of the present invention, the resulting substatistics are then combined into the array, e.g. memory, statistics in the second process 21. An access path netlist needs to contain only those elements of an array which are required to accurately simulate the operation of one cell of the array. Parts for simulating other cells are simply missing. This is justified since the other cells are not activated in the testbench, and the designer assumes that they do not influence the characteristics of the single activated cell. Parts which contribute passively, such as, in the case of memories, the other bitcells on the same bitline or wordline, are often modeled as capacitive loads equivalent to the capacitive load of the activated bitcell. Sometimes two (or four) cells are included in order to capture the systematic variability along the edges (corners). Such methods of deriving local systematic variations are orthogonal to the present method, and these variations can simply be overlaid onto the random variations, for example by interpolating the edge (corner) performances depending on the physical position of the array cell.
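
The overlay by interpolation of the corner performances can be sketched as follows, assuming bilinear interpolation between the performances simulated for the four corner cells; the interpolation scheme, the corner values and the function names are assumptions for illustration. The systematic, position-dependent shift is simply added to a random sample drawn for the local variation.

```python
import random

def systematic_offset(row, col, n_rows, n_cols, corners):
    """Bilinear interpolation of corner performances at cell (row, col).

    corners = (top_left, top_right, bottom_left, bottom_right), e.g. delay
    shifts obtained by simulating cells at the four array corners.
    """
    top_left, top_right, bottom_left, bottom_right = corners
    y = row / (n_rows - 1) if n_rows > 1 else 0.0
    x = col / (n_cols - 1) if n_cols > 1 else 0.0
    top = top_left + (top_right - top_left) * x
    bottom = bottom_left + (bottom_right - bottom_left) * x
    return top + (bottom - top) * y

def cell_delay_shift(row, col, n_rows, n_cols, corners, rng, sigma_random):
    """Overlay: systematic (position-dependent) plus local random variation."""
    return systematic_offset(row, col, n_rows, n_cols, corners) + rng.gauss(0.0, sigma_random)

corners = (0.00, 0.02, 0.01, 0.04)  # hypothetical corner delay shifts
rng = random.Random(0)
shift = cell_delay_shift(10, 20, 64, 32, corners, rng, sigma_random=0.03)
```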


It is important to note that different types of array building blocks differ in the number of times they are instantiated in an array. In case the array is a memory, building blocks such as bitcells, sense amplifiers, or word line drivers differ in their number of instantiations in the memory. The term multiplicity Mi is introduced for the number of instantiations of a building block i in an array. The multiplicity depends on the organization of the array. For example, in an MBC-bit memory with MR rows, the bitcells have a multiplicity of MBC, and the word lines and the word line drivers have a multiplicity of MR. In this example, see FIG. 3, MBC=16 and MR=4.


In general, a single representative of every building block of the array is needed. As an example, in the case of a memory as illustrated in FIG. 3, representatives for only one word line, one bitcell, one bitline pair (it is to be noted that the term bitline may be used later to denote a differential bitline pair), and one sense amplifier with output buffer are needed. Together with the timing circuit (multiplicity one), it is possible to simulate a memory read access to the single bitcell. In other words, and more generally, in an array access path netlist, different numbers of object instantiations of different types have been pruned in order to get a working netlist for one cell. As pointed out earlier, the statistics of the entire array depend on the statistics of its worst object instances. It follows that the statistics of the array must depend on the multiplicity Mi of each building block i. Building blocks can further be classified, depending on their multiplicity, into primary building blocks and dependent building blocks. Primary building blocks have a multiplicity independent of the multiplicity of other building blocks. Dependent building blocks are defined such that their multiplicities are a product of multiplicities of pre-determined primary building blocks. For example, in the case of a memory, the bitcell building block multiplicity has the highest dependency and depends on all primary building blocks. For instance, in a simple memory array with MT=1 timing circuit, MR rows and MC columns, the bitcell multiplicity is MBC=MT*MR*MC.
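The multiplicity bookkeeping above can be sketched in a few lines of Python. This is a minimal illustration, not part of the described method: the dictionary layout and the block names (`timing`, `rows`, `columns`, `bitcell`) are hypothetical, and the numbers follow the FIG. 3 example (MT=1, MR=4 rows, MC=4 columns).

```python
from math import prod

# Primary building blocks of the FIG. 3 memory and their multiplicities:
# timing circuit (MT), rows (MR) and columns (MC).
PRIMARY = {"timing": 1, "rows": 4, "columns": 4}

# Dependent building blocks list the primary blocks they depend on.
DEPENDENT = {"bitcell": ["timing", "rows", "columns"]}

def multiplicity(block: str) -> int:
    """Return Mi: given directly for primary blocks, and the product of
    the relevant primary multiplicities for dependent blocks."""
    if block in PRIMARY:
        return PRIMARY[block]
    return prod(PRIMARY[p] for p in DEPENDENT[block])

print(multiplicity("bitcell"))  # 16, i.e. MBC = MT * MR * MC
```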


Knowing the multiplicity of every building block is necessary but not sufficient for determining the statistics of the array: it is also required to know how these building blocks connect. In general array architectures, not every combination of building block instances can occur in a path from input to output. For instance, in a memory architecture, the first bit of every word always shares the same sense amplifier. It would therefore be too pessimistic to simply combine the worst of all MSA sense amplifier statistics PSA with the worst of all MBC bitcell statistics PBC in order to get the full memory statistics Pmem. In this example, the additional information is required that the first bits of the words in any row share the first sense amplifier and can never be switched to any other sense amplifier. These types of correlations must be considered, cf. FIG. 3. This drawing illustrates a memory 30 with four wordlines 31, each wordline 31 having a wordline driver 32. The memory 30 furthermore comprises 16 bitcells 33, arranged in four columns, and two sense amplifiers 34. The bitcell 33 colored in black is not in the same path as the wordline driver 32 colored in black, and the bitcell 33 colored in black is not in the same path as the sense amplifier 34 colored in black.
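The pessimism of ignoring connectivity can be illustrated with a toy Monte Carlo sketch. The per-instance delays, the assumption that the delays of the blocks on one path simply add, and the column-to-sense-amplifier mapping (`col % N_SA`, i.e. the same bit position of every word shares one amplifier) are all illustrative assumptions, not data from FIG. 3. By construction, the naive combination can never be smaller than the path-aware worst case.

```python
import random

rng = random.Random(1)
ROWS, COLS, N_SA = 4, 4, 2

def sa_of(col: int) -> int:
    """Assumed column-to-sense-amplifier mapping (0-based): the same bit
    position of every word is always served by the same amplifier."""
    return col % N_SA

# Toy delay contributions (arbitrary units) for one random array instance.
wl = [rng.gauss(10.0, 1.0) for _ in range(ROWS)]
bc = [[rng.gauss(20.0, 2.0) for _ in range(COLS)] for _ in range(ROWS)]
sa = [rng.gauss(5.0, 0.5) for _ in range(N_SA)]

# Path-aware worst case: only block combinations that lie on one read path.
path_aware_worst = max(wl[r] + bc[r][c] + sa[sa_of(c)]
                       for r in range(ROWS) for c in range(COLS))

# Naive worst case: worst of each block type, ignoring how blocks connect.
naive_worst = max(wl) + max(max(row) for row in bc) + max(sa)

print(f"path-aware: {path_aware_worst:.2f}  naive: {naive_worst:.2f}")
```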


In order to formalize this problem, the “cell coordinate” is introduced. A cell coordinate vector, e.g. in the case of a memory the bitcell coordinate vector, (b1, . . . , bj) is an instance of a discrete j-dimensional coordinate space (C1, C2, . . . Cj), where Ci={1 . . . Mi}, i=1 . . . j. Each coordinate component corresponds to a primary building block and can assume an integer number between one and the multiplicity of the corresponding primary building block. For example, for the memory array illustrated in FIG. 3, the multiplicities of the building blocks are as follows: MR=4 rows, MC=4 columns, MBC=16 bitcells. The bitcell coordinate vector looks as follows:

$$
\begin{bmatrix} b_{1} \\ b_{2} \\ b_{3} \\ b_{4} \\ b_{5} \\ b_{6} \\ b_{7} \\ b_{8} \\ b_{9} \\ b_{10} \\ b_{11} \\ b_{12} \\ b_{13} \\ b_{14} \\ b_{15} \\ b_{16} \end{bmatrix}
=
\begin{bmatrix} R_1 C_1 \\ R_1 C_2 \\ R_1 C_3 \\ R_1 C_4 \\ R_2 C_1 \\ R_2 C_2 \\ R_2 C_3 \\ R_2 C_4 \\ R_3 C_1 \\ R_3 C_2 \\ R_3 C_3 \\ R_3 C_4 \\ R_4 C_1 \\ R_4 C_2 \\ R_4 C_3 \\ R_4 C_4 \end{bmatrix}
$$





Further properties of this concept are:

    • The number of dimensions j and the coordinates' domains (1 . . . Mi) define the array organization.
    • The number of cells MBC is always the product of the coordinate maximum values Mi.
    • Two different cells' coordinate vectors differ in at least one coordinate.
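Enumerating the coordinate space is a Cartesian product over the domains Ci={1 . . . Mi}. The sketch below assumes the FIG. 3 organization (four rows R, four columns C), reproduces the ordering of the bitcell coordinate vector above and checks the listed properties; the variable names are illustrative only.

```python
from itertools import product
from math import prod

# Primary coordinates of the FIG. 3 example: rows R and columns C.
MULTIPLICITIES = {"R": 4, "C": 4}

names = list(MULTIPLICITIES)
# Cartesian product C1 x ... x Cj with Ci = {1..Mi}; the last coordinate varies fastest.
coords = list(product(*(range(1, m + 1) for m in MULTIPLICITIES.values())))

for k, vec in enumerate(coords[:5], start=1):
    print(f"b{k} =", " ".join(f"{n}{v}" for n, v in zip(names, vec)))

# Properties listed above.
assert len(coords) == prod(MULTIPLICITIES.values())   # MBC = product of the Mi
assert len(set(coords)) == len(coords)                # vectors are pairwise distinct
```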


As pointed out before, the cell building block is always the most dependent one. However, there can be other dependent building blocks. Following the concept above, a general “dependent building block coordinate” may be introduced, with vector components corresponding to those primary coordinates the building block depends on. Finally, the possibility of an empty building block is introduced (if there is very little hardware in this building block, and/or the building block's influence on the overall statistics is known or assumed to be very small). It still has a multiplicity (so as to describe the topology connecting dependent building blocks) but no hardware attached to it.


For example, consider the column multiplexers and the sense amplifiers of a memory that has Mbpw bits per word and Mwpr words per row. Then Mbpw bundles of Mwpr columns (differential bitline pairs) are multiplexed to Mbpw sense amplifiers. Thus, a column multiplexer slice, which appears on every bitline pair, is a building block whose multiplicity depends on the multiplicity of the sense amplifier building block and on that of an empty building block (which would actually be one of the Mwpr slices of the logic in the building block that generates the column multiplexer signals), i.e. MC=Mbpw*Mwpr. At this point it is assumed that the user supplies a description of the memory architecture with the following information:


<primary island_name 1>      <basic elements>    <multiplicity>
<primary island_name 2>      <basic elements>    <multiplicity>
. . .
<primary island_name j>      <basic elements>    <multiplicity>
<dependent island_name 1>    <basic elements>    <list of primary island names>
<dependent island_name 2>    <basic elements>    <list of primary island names>
. . .
<dependent island_name k>    <basic elements>    <list of primary island names>
This table lists all building blocks (island_name i), the list of basic elements in the netlist model that pertain to each building block (e.g. transistors, photodiodes, interconnects, resistors, capacitors), and the building block multiplicity. For dependent building blocks, the multiplicity is implicitly defined by the multiplicities of the primary building blocks they depend on. In certain circumstances, the list of basic elements can be empty. In an implementation of the method according to embodiments of the present invention, a regular expression is supplied in the <basic elements> field to conveniently match all basic elements that belong to the respective building block, either by instance names or by the names of the instantiating subcircuits.
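A possible in-memory form of such an architecture description is sketched below. The `BuildingBlock` class, its field names and the netlist instance names (`XWL.MN1`, `XBC.MPD_R`, and so on) are hypothetical; only the idea of matching basic elements to building blocks with a regular expression on instance names comes from the text. A `None` pattern stands for an empty building block, which has a multiplicity but no hardware.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class BuildingBlock:
    """One entry ("island") of the architecture description."""
    name: str
    pattern: Optional[str]                 # regex for <basic elements>; None = empty block
    multiplicity: Optional[int] = None     # given for primary blocks only
    depends_on: List[str] = field(default_factory=list)  # dependent blocks only

ARCHITECTURE = [
    BuildingBlock("periphery", r"^XTIM\.", multiplicity=1),
    BuildingBlock("wordline", r"^XWL\.", multiplicity=4),
    BuildingBlock("word_select", None, multiplicity=2),      # empty block
    BuildingBlock("sense_amp", r"^XSA\.", multiplicity=2),
    BuildingBlock("bitcell", r"^XBC\.",
                  depends_on=["periphery", "wordline", "word_select", "sense_amp"]),
]

def assign_elements(instances: List[str], blocks: List[BuildingBlock]) -> Dict[str, str]:
    """Map every netlist instance to the first building block whose regex matches it."""
    owner = {}
    for inst in instances:
        for block in blocks:
            if block.pattern is not None and re.search(block.pattern, inst):
                owner[inst] = block.name
                break
    return owner

NETLIST = ["XTIM.MN5", "XWL.MP1", "XWL.MN1", "XSA.MN3", "XBC.MPU_L", "XBC.MPD_R"]
owner = assign_elements(NETLIST, ARCHITECTURE)
print(owner["XBC.MPD_R"])  # bitcell
```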


From the previous section it is concluded that it is necessary to derive the statistical spread of a performance metric Pi of the array, e.g. memory, due to variations in each building block i. FIG. 4 shows a schematic overview of a first process of a method according to embodiments of the present invention. The distributions of the performance metric Pi are obtained by generating (process 40) and simulating (process 41) access path netlist variants from a given access path netlist. These variants differ in that those basic elements, e.g. transistors, photodiodes, interconnects, resistors, capacitors, pertaining to a pre-determined building block are replaced by basic element variants that can occur under process variations. A correlation plot of possible variants of basic elements, in particular for the example of transistors, is illustrated in FIG. 5. Such replacement is sometimes called injection. In accordance with embodiments of the present invention, basic elements outside the pre-determined building block are kept invariant, i.e. at nominal specifications. There are several ways of doing the injection, for example by using a statistical transistor-level simulator or a statistical extension to a transistor-level simulator, by inserting additional active or passive circuit elements that model the process variation, or by changing the entire basic element model, e.g. transistor model, for every basic element. The actual way of doing it is an orthogonal problem, i.e. it has little bearing on the overall process, as long as it supports the concerted replacement described above, i.e. injection restricted to particular building blocks only. Possible ways of injection are described in “Device mismatch and tradeoffs in the design of analog circuits”, Peter R. Kinget, IEEE Journal of Solid-State Circuits, vol. 40, No. 6, June 2005; and in “Monte Carlo Simulation of Device Variations and Mismatch in Analog Integrated circuits”, Hector Hung et al., Proceedings of the National Conference on Undergraduate Research (NCUR) 2006, Apr. 6-8, 2006, both of which are incorporated herein by reference.


The injection process is illustrated for one embodiment in FIG. 6: each nominal basic element, for example a transistor 60, in a pre-determined building block (e.g. the bottom left building block in the example illustrated) is replaced by an instance randomly selected from the set of possible variants 50 illustrated in FIG. 5. This way, the basic elements are given a different behavior, according to the statistical distribution of the basic elements.


By repeating this injection and simulation process a sufficient number of times Ni for every building block i, the statistical spread of an array parameter Pi due to process variations in block i is obtained. Increasing the number of iterations Ni increases the confidence in the array parameter Pi; compared to plain Monte Carlo, the number of iterations needed can be optimized by using statistical enhancement techniques. Different access path populations are obtained, each reflecting the consequences of variability of the basic elements in one building block.


At this point it is important to note one fundamental difference between local and global process variations, which requires a separate treatment of the two. Local variations are defined as purely random after cancelling out all known systematic effects. They also comprise unpredicted or unpredictable but known effects, and they depend, among other things, on transistor area. Global variations are caused by drift of equipment and environment and are sometimes sub-classified into chip-to-chip, wafer-to-wafer and batch-to-batch variations. As a result, under local random variations the basic element parameters fluctuate randomly when compared to any other basic element in the circuit, whereas under global variations all basic elements of the same type on a chip are subject to the same shift in parameters. Thus, the described procedure of partitioning the netlist into building blocks, injecting and simulating applies only to local variations. Since every path shows the same effect under global variations, it is necessary and sufficient to simulate the entire netlist under global variations to get Pglob. This may be done by injecting global variability into the netlist (i.e. by replacing all basic elements of one type by the same model selected from the possible variants, as in the plot 50), thereby writing a new netlist (process 42), and simulating the new netlist (process 43). Local and global variations may be combined into the total variations as part of the second process of the method according to embodiments of the present invention.


Together with the nominal (invariable) circuit output Pinv, i.e. the parameter under nominal process conditions, the sensitivities of the access path netlist can be derived, due to local variations in building block i, ΔPi:=Pi−Pinv and due to global variations ΔPglob:=Pglob−Pinv where this subtraction is to be understood such that the distributions Pi and Pglob are shifted by the scalar invariable value Pinv to get the distribution of the difference to nominal conditions.
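The injection loop of processes 40 to 43 and the sensitivities ΔPi and ΔPglob can be imitated with a toy model. In this sketch a closed-form `simulate` function stands in for the transistor-level simulator, each basic element is reduced to a single threshold voltage, and all numeric values (VDD, nominal Vth, sigmas, run count) are illustrative assumptions. Plain Monte Carlo is used; no statistical enhancement is applied.

```python
import random
import statistics

VDD, VTH_NOM = 0.9, 0.35   # illustrative supply and nominal Vth (V)
SIGMA_LOC = 0.03           # illustrative local (mismatch) sigma of Vth (V)
SIGMA_GLOB = 0.02          # illustrative global (chip-to-chip) sigma of Vth (V)
N_RUNS = 2000              # Ni = Nglob here; plain Monte Carlo

# Hypothetical access-path netlist: instance name -> owning building block.
NETLIST = {"XTIM.MN5": "periphery", "XWL.MN1": "wordline", "XSA.MN3": "sense_amp",
           "XBC.MPD_L": "bitcell", "XBC.MPD_R": "bitcell"}

def simulate(vth):
    """Toy stand-in for a transistor-level simulation: a delay (ps) to which
    every device adds a term that grows as its overdrive VDD - Vth shrinks."""
    return sum(10.0 / (VDD - v) for v in vth.values())

def inject_local(block, rng):
    """Local injection: only elements of `block` get independent random Vth
    variants; all other elements keep their nominal value."""
    return {inst: rng.gauss(VTH_NOM, SIGMA_LOC) if owner == block else VTH_NOM
            for inst, owner in NETLIST.items()}

def inject_global(rng):
    """Global injection: one common shift for all elements of a type
    (all devices are assumed to be of one type here)."""
    shift = rng.gauss(0.0, SIGMA_GLOB)
    return {inst: VTH_NOM + shift for inst in NETLIST}

rng = random.Random(0)
p_inv = simulate({inst: VTH_NOM for inst in NETLIST})
# Sensitivity populations: delta_P_i = P_i - P_inv and delta_P_glob = P_glob - P_inv.
delta_p = {b: [simulate(inject_local(b, rng)) - p_inv for _ in range(N_RUNS)]
           for b in sorted(set(NETLIST.values()))}
delta_p_glob = [simulate(inject_global(rng)) - p_inv for _ in range(N_RUNS)]

for block, deltas in delta_p.items():
    print(f"{block:10s} stdev(delta_P) = {statistics.stdev(deltas):.3f} ps")
print(f"{'global':10s} stdev(delta_P) = {statistics.stdev(delta_p_glob):.3f} ps")
```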


Optionally, it is also proposed to inject and simulate Nloc<=min(Ni) variants of the entire netlist under local variations without partitioning it into building blocks, yielding another access path population ΔPloc:=Ploc−Pinv (processes 44 and 45). This is mainly done to calibrate the collapsing process: during collapsing, the path performances are built under the assumption that they are additively separable along building blocks. With this data the assumption can be verified, and the linear collapsing can even be corrected. It only needs to be ensured that the same basic element variants are used as in the separate experiments. A second purpose is the comparison between access path and memory statistics.
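The calibration idea can be sketched as follows: one set of basic element variants is drawn per run and reused both for the per-building-block injections and for the full-netlist injection, and the difference between ΔPloc and the sum of the ΔPi is recorded. The `simulate` function is a hypothetical, deliberately non-separable toy model (the wordline drive and the bitcell drive multiply), so the residual is small but not zero; all numeric values are illustrative.

```python
import random

VDD, VTH_NOM, SIGMA = 0.9, 0.35, 0.03   # illustrative values (V)
BLOCKS = {"wordline": ["XWL.MN1"], "bitcell": ["XBC.MPD_L", "XBC.MPD_R"]}
ALL_ELEMENTS = [inst for insts in BLOCKS.values() for inst in insts]

def simulate(vth):
    """Hypothetical non-separable toy delay: the bitcell drive is scaled by
    the wordline drive, so variations in the two blocks interact."""
    wl_drive = VDD - vth["XWL.MN1"]
    cell_drive = min(VDD - vth["XBC.MPD_L"], VDD - vth["XBC.MPD_R"])
    return 10.0 / (wl_drive * cell_drive)

rng = random.Random(42)
nominal = {inst: VTH_NOM for inst in ALL_ELEMENTS}
p_inv = simulate(nominal)

residuals = []
for _ in range(500):                                   # Nloc runs
    # Draw ONE set of variants and reuse it in the per-block and full runs.
    variant = {inst: rng.gauss(VTH_NOM, SIGMA) for inst in ALL_ELEMENTS}
    summed = 0.0
    for insts in BLOCKS.values():
        v = dict(nominal)
        v.update({inst: variant[inst] for inst in insts})
        summed += simulate(v) - p_inv                  # delta_P_i of this block
    full = simulate(variant) - p_inv                   # delta_P_loc
    residuals.append(full - summed)                    # error of linear collapsing

print(f"p_inv = {p_inv:.2f}, max |residual| = {max(map(abs, residuals)):.3f}")
```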


It is to be noted that the notation P (or ΔP) actually describes a meta-parameter, i.e. P is in general multi-valued. Its components contain all circuit responses under consideration (for example cycle time, power, etc.), and the results of the experiments described above are recorded in tables that store the distributions of the components of ΔPi, ΔPglob and ΔPloc (or of Pi, Pglob and Ploc, since it does not matter whether Pinv is subtracted now or later). These tables have Ni, Nglob and Nloc entries (rows) respectively, one per Monte Carlo run, and the different components of ΔPi, ΔPglob and ΔPloc in the columns. This way, the correlation between the performance metrics is preserved through the flow. This first process usually requires statistical enhancement: especially the bitcell building block can have a huge multiplicity, which cannot be captured with classical Monte Carlo. We used EMC (yield 3), but it could be any other type of variance reduction technique (such as Importance Sampling, Latin Hypercube, Stratified Sampling, Quasi-Monte-Carlo, or fitting a density function to the output to extrapolate the density function).


As a final note, failing parameter measurements occurring under process variations may be considered as well. This happens in the testbench if measurement conditions are not fulfilled. For example, suppose a measurement is the time between the transitions of two signals. The measurement will fail if an extreme threshold voltage variation on an active transistor prevents a signal from ever transitioning, because then the condition is never fulfilled. In this case, P does not assume a value but a special flag indicating the failure. For every continuous parameter P, an additional binary parameter PGO may be introduced that is set to a first value, e.g. one, if the measurement of P succeeded, and to a second value, e.g. zero, if it failed.
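A testbench measurement that reports a PGO flag alongside P might look like the following sketch. The function name, the waveform representation (parallel lists of time points and voltages) and the crossing criterion are assumptions for illustration only.

```python
from typing import Optional, Sequence, Tuple

def measure_delay(t: Sequence[float], v: Sequence[float],
                  t_start: float, threshold: float) -> Tuple[Optional[float], int]:
    """Time from t_start until waveform v first reaches `threshold`.

    Returns (P, P_GO): P_GO = 1 if the crossing was found; otherwise the
    measurement failed, P is None and P_GO = 0."""
    for ti, vi in zip(t, v):
        if ti >= t_start and vi >= threshold:
            return ti - t_start, 1
    return None, 0   # the signal never transitioned

print(measure_delay([0, 1, 2, 3], [0.0, 0.2, 0.6, 0.9], 0.0, 0.45))  # (2.0, 1)
print(measure_delay([0, 1, 2, 3], [0.0, 0.1, 0.1, 0.1], 0.0, 0.45))  # (None, 0)
```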


The simulation of all netlists delivers one observation per netlist, as illustrated in FIG. 7, where the results of the modified access path netlists are shown in a graph. These intermediate outputs constitute a sensitivity analysis of how the metrics react to variation in individual building blocks; in the example given, to variation in the periphery building block and in the bitcell building block of a memory array, respectively. The bottom left part 70 of FIG. 7 illustrates an access path netlist population due to variability injected into the periphery building block. The corresponding variability is illustrated in the graph by the region 71. The bottom right part 72 illustrates an access path netlist population due to variability injected into the bitcell building block. The corresponding variability is illustrated in the graph by the region 73.


The second process of a method according to embodiments of the present invention is scaling the access path results to the full array. This happens under awareness of the array architecture. This transformation is referred to as Architecture Aware Scaling (AAS). From the previous process the following were obtained:

    • The description of the array architecture.
    • The Monte Carlo tables of the circuit metrics of interest, see FIG. 7.


The idea behind AAS is to imitate the process of building arrays by following the natural laws of randomness. For local variations, this means that any array contains random variants for all instances of all building blocks. For global variations, all building blocks drift in the same direction, since all basic elements drift in the same direction.


A general array template is built by listing the entire primary coordinate space. This is a list of all parts needed for building the array, which forms the layout of the array. This layout is invariant throughout the method according to embodiments of the present invention. Building the array template means that all possible combinations of primary building blocks are formally written down. In the next process, all dependent block combinations are also listed, adjacent to the primary columns. Some building blocks of the template, namely those with lower multiplicity, may be shared among paths (e.g. the periphery in a memory may be shared among all paths, and a column decoder may be shared by a plurality of paths), while other building blocks may be unique to a path (e.g. each bitcell in a memory is unique to a path). In the end, a matrix of integer numbers with MBC rows and j+k columns has been defined. Its entries enumerate the different instances of building blocks that are required to define an array. This structure is an abstracted view of the array organization.


As an example, the simple memory illustrated in FIG. 8 is considered. There are four primary coordinates: C0=periphery, M0=1; C1=word lines, M1=4; C2=words, M2=2; C3=bit position in word, M3=2. There are two dependent coordinates: C4=column multiplexer slice, which depends on the word being addressed and the bit position within a word, so M4=M2*M3=4; and C5=bitcell, M5=M0*M1*M2*M3=16. Since there are j=4 primary coordinates and k=2 dependent coordinates, and since there are 16 bitcells, a memory template as in FIG. 9 is created, consisting of four plus two columns and 16 rows.
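The template of FIG. 9 can be generated programmatically. In this sketch the enumeration order is an assumption chosen to match Table 1 (C1 varies fastest, then C2, then C3), and dependent instances are numbered in mixed radix over the primary coordinates they depend on. Under this assumption, the 15th row reproduces the highlighted path p=(1,3,2,2,4,15) of FIG. 11.

```python
from itertools import product

# FIG. 8 example: primary coordinates and their multiplicities.
PRIMARY = {"C0": 1, "C1": 4, "C2": 2, "C3": 2}   # periphery, word lines, words, bit in word
# Dependent coordinates and the primary coordinates they depend on.
DEPENDENT = {"C4": ["C2", "C3"],                 # column multiplexer slice
             "C5": ["C0", "C1", "C2", "C3"]}     # bitcell

def instance_number(coord, deps):
    """1-based mixed-radix number of a dependent block instance;
    the first listed dependency varies fastest (assumed convention)."""
    index, radix = 0, 1
    for d in deps:
        index += (coord[d] - 1) * radix
        radix *= PRIMARY[d]
    return index + 1

def build_template():
    """Return the MBC x (j+k) integer matrix: one row per path (bitcell)."""
    order = list(reversed(PRIMARY))   # enumerate C3 slowest ... C0 fastest
    template = []
    for values in product(*(range(1, PRIMARY[n] + 1) for n in order)):
        coord = dict(zip(order, values))
        row = [coord[n] for n in PRIMARY]
        row += [instance_number(coord, deps) for deps in DEPENDENT.values()]
        template.append(row)
    return template

template = build_template()
print(len(template), len(template[0]))   # 16 6
print(template[14])                      # [1, 3, 2, 2, 4, 15]
```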


In order to build a sample of this template, random instances are now selected for all building blocks that appear. These instances are picked from the (enhanced) Monte Carlo tables received from the previous process, as depicted in FIG. 10 for two building blocks. This is done by picking as many row indices in a given table as there are instantiations of the corresponding building block. The actual statistical method for this can be Monte Carlo in the simplest case, i.e. a random number generator that picks among the Ni entries with equal chance (or, where the entries carry different weights, i.e. probabilities of occurrence, as a result of the enhanced Monte Carlo in the previous process, with a chance proportional to those weights), or a statistically enhanced version. FIG. 10 shows an example. Let p be an instance of the listed coordinate space. Further, let zi, i=1 . . . j+k, denote j+k vectors, where the cardinality of zi is the multiplicity Mi of the building block coordinate i. These vectors contain the picked indices for coordinate i. Looking up the Monte Carlo tables at the derived indices zi yields a possible value for the influence on a certain path metric P=ΔP+Pinv due to local variations in the corresponding block i. The picking of the building blocks is illustrated in FIG. 11. Note the above-described index vector zi, pointing into the corresponding (enhanced) Monte Carlo population table number i as generated by injection for the building blocks during phase 1 of a method according to embodiments of the present invention.
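Drawing the index vectors zi amounts to sampling Mi row indices, with replacement, from table i, optionally weighted by the per-entry probabilities of an enhanced Monte Carlo run. The sketch below uses Python's `random.choices`; the function name `pick_indices` and the table sizes Ni are hypothetical, and the multiplicities are those of FIG. 8.

```python
import random
from typing import Dict, List, Optional

def pick_indices(multiplicities: Dict[str, int], table_sizes: Dict[str, int],
                 weights: Optional[Dict[str, List[float]]] = None,
                 seed: Optional[int] = None) -> Dict[str, List[int]]:
    """Draw the index vector z_i (Mi 0-based row indices into table i) for
    every building block coordinate i, with replacement. weights[i], if given,
    holds per-entry probabilities from an enhanced Monte Carlo run."""
    rng = random.Random(seed)
    z = {}
    for i, m in multiplicities.items():
        w = None if weights is None else weights.get(i)
        z[i] = rng.choices(range(table_sizes[i]), weights=w, k=m)
    return z

# Multiplicities of FIG. 8; the table sizes Ni are hypothetical.
M = {"C0": 1, "C1": 4, "C2": 2, "C3": 2, "C4": 4, "C5": 16}
N = {"C0": 500, "C1": 500, "C2": 500, "C3": 500, "C4": 500, "C5": 5000}
z = pick_indices(M, N, seed=7)
print({i: len(v) for i, v in z.items()})
```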


Now, for a particular array instance (in the example illustrated, a memory instance), a set of metric values P is picked for every building block. The concept is pictured in FIG. 11 with a highlighted path p and pointers into the corresponding tables that contain the enumeration, the z vectors, and the looked-up values Pi(zi(p(i))), i.e. the value of the metric P resulting from variability in building block i on path p (in this case V_WRITE and W_MARGIN, meaning, respectively, the bitline voltage at write time and the time between the moment the cell has flipped and the wordline closing time). This means that the path coordinate at position i selects one of the Mi random table indices in zi, at which the one observation of P is looked up. ΔPi(zi(p(i))) is built from Pi(zi(p(i))) by subtracting the nominal value Pinv. By combining these ΔPi(zi(p(i))) for all i=1 . . . j+k, the total effect for a particular path p along the building block instances can be generated. This combining is referred to as collapsing. The values may be combined into the path performance in a particular way. Experiments show that a straightforward addition of the sensitivities of the parameter to local variability in the building blocks (hence the assumption that these sensitivities are additively separable),

$$
P(p) = P_{\mathrm{inv}} + \sum_{i=1}^{j+k} \Delta P_i\bigl(z_i(p(i))\bigr)
$$


yields an excellent approximation for the combined effect, as simulations of the entire access path netlist with the same basic element variants used in the building blocks show. For the sake of simplicity the ΣΔ method (method of additively separating sensitivities) is used for the time being.
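The ΣΔ collapsing rule can be written as a one-line function. This is a sketch assuming that paths, index vectors and tables are stored in dictionaries keyed by building block; the function name `collapse` is hypothetical, and the tiny tables in the example are made up to show the double indexing (instance numbers 1-based as in the text, table indices 0-based).

```python
from typing import Dict, List

def collapse(path: Dict[str, int], z: Dict[str, List[int]],
             delta_tables: Dict[str, List[float]], p_inv: float) -> float:
    """Sum-of-sensitivities collapsing for one path:
    P(p) = P_inv + sum_i delta_P_i(z_i(p(i))).
    path[i]: 1-based instance number of block i on this path
    z[i]: picked 0-based row indices into table i (one per instance)
    delta_tables[i]: Monte Carlo table of delta_P_i = P_i - P_inv"""
    return p_inv + sum(delta_tables[i][z[i][path[i] - 1]] for i in path)

# Made-up tables for two blocks, just to show the double indexing.
delta_tables = {"WL": [0.0, 1.5, -0.5], "BC": [2.0, -1.0]}
z = {"WL": [2, 0], "BC": [1, 1, 0, 0]}           # M_WL = 2, M_BC = 4
print(collapse({"WL": 1, "BC": 3}, z, delta_tables, p_inv=100.0))  # 101.5
```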


In the example of FIG. 11, the following operation will be performed for the highlighted instance of a path p=(1,3,2,2,4,15), where P=V_WRITE:

$$
P(p) = P_{\mathrm{inv}} + \Delta P_{C0\#1} + \Delta P_{C1\#3} + \Delta P_{C2\#2} + \Delta P_{C3\#2} + \Delta P_{C4\#4} + \Delta P_{C5\#15}
$$


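So, along this very path p, with each ΔP term written as the looked-up value minus Pinv, the parameter assumes a value of:

$$
P(p) = (143.1 + 140.2 + 143.1 + 144.4 + 143.1 + 142.6 + 163.8 - 6 \times 143.1)\ \mathrm{mV} = 161.7\ \mathrm{mV}
$$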
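The arithmetic can be checked directly. The sketch assumes that the six looked-up values appear in the same order as the terms ΔP C0#1 to ΔP C5#15 of the preceding equation, with Pinv=143.1 mV; each ΔPi is the looked-up value minus Pinv.

```python
# Looked-up values (mV) for path p = (1, 3, 2, 2, 4, 15), assumed to be in
# the order of the terms C0#1 ... C5#15 of the equation above.
P_INV = 143.1
P_LOOKED_UP = {"C0#1": 140.2, "C1#3": 143.1, "C2#2": 144.4,
               "C3#2": 143.1, "C4#4": 142.6, "C5#15": 163.8}

# delta_P_i = P_i - P_inv; then P(p) = P_inv + sum of the delta_P_i.
deltas = {k: v - P_INV for k, v in P_LOOKED_UP.items()}
p_path = P_INV + sum(deltas.values())
print(f"P(p) = {p_path:.1f} mV")  # P(p) = 161.7 mV
```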


Corresponding steps are taken for W_MARGIN or any further components of P. This provides the performance parameters of one path.


Performance parameters P of all paths of an array are illustrated in FIGS. 13 and 14, for the example illustrated with respect to FIG. 6 to FIG. 11.


There is one exception to this method of additively separating sensitivities: binary parameters, for which summing does not make sense. In this case, the binary parameter of the path instance is defined as the AND-combination (or product) of the corresponding building block binary parameters PGO,i. This requires the functional metric to be fulfilled by all building block instances in order for the requirement to be fulfilled by the current path.
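For binary parameters, the collapsing step replaces the sum by an AND. A minimal sketch, assuming that the path, the picked index vectors zi and the PGO,i tables are stored in dictionaries keyed by building block (instance numbers 1-based, table indices 0-based); the function name and the example data are made up.

```python
from typing import Dict, List

def collapse_go(path: Dict[str, int], z: Dict[str, List[int]],
                go_tables: Dict[str, List[int]]) -> int:
    """Binary collapsing: the path passes (1) only if every building block
    instance on it passed (P_GO,i = 1); equivalent to the product of the flags."""
    return int(all(go_tables[i][z[i][path[i] - 1]] for i in path))

# Made-up P_GO tables: entry 1 of the bitcell table is a failed measurement.
go_tables = {"WL": [1, 1, 1], "BC": [1, 0]}
z = {"WL": [2, 0], "BC": [1, 1, 0, 0]}
print(collapse_go({"WL": 1, "BC": 1}, z, go_tables))  # 0
print(collapse_go({"WL": 1, "BC": 3}, z, go_tables))  # 1
```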


An array, for example a memory, is always only as good as its worst feasible combination of building block instances, i.e. its worst path. Therefore, after collapsing the performance metrics for all MBC possible coordinate combinations (paths), a search is performed for the worst such combination in order to find the worst-case value of this performance metric for this particular array instance. What counts as worst case depends on the nature of the parameter:

$$
P_{\mathrm{array}} = \underset{p \in \{\text{all paths}\}}{R}\bigl(P(p)\bigr)
$$

Here R is the scaling rule, which translates the path values P(p) of a parameter into its array value Parray. In general, the following scaling rules can occur:

    • MAX: The maximum of all values is the most commonly used, such as for example when seeking the slowest possible access time among all paths. Other uses include, but are not limited to: setup time, hold time, write voltage.
    • MIN: It can be important to know the minimum of certain values, for example the amount of voltage that is built up on the differential bitline pair at read time. Even the minimum of the access time can be critical, for example when considering hold time requirements of circuitry downstream of the memory. Other uses include, but are not limited to: write margin, i.e. the amount of time between the moment the cell has flipped after a write operation and the wordline closing time (also the sense amplifier closing time and the precharge starting time).
    • AVG: Average. This may be used for example for leakage power and dynamic energy consumption if these are modeled in the netlist model such that they predict the corresponding array values by incorporating the right multiplicities.
    • SUM: Building the sum can be useful if the access path netlist does not implicitly contain the scaling rule for power, as opposed to the AVG operator.
    • Binary operators: For example an AND operator can be used to require some metric to be fulfilled by all paths in order to have the requirement fulfilled by the array. As another example an OR operator could be used to scale failing blocks, indicated by 1 for fail or 0 for no-fail.
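The scaling rules map naturally onto reductions over the list of path values. The sketch below is illustrative: the rule names follow the list above, while the metric names, the per-metric rule table and the sample values are hypothetical.

```python
import statistics
from typing import Callable, Dict, List

# Scaling rules R: each maps the list of path values P(p) to P_array.
SCALING_RULES: Dict[str, Callable[[List[float]], float]] = {
    "MAX": max,
    "MIN": min,
    "AVG": statistics.fmean,
    "SUM": sum,
    "AND": lambda vals: int(all(vals)),   # requirement must hold on all paths
    "OR": lambda vals: int(any(vals)),    # e.g. any failing block (1 = fail)
}

# Hypothetical user-supplied rule table and per-path values for one array sample.
RULES = {"access_time": "MAX", "w_margin": "MIN", "leakage": "SUM", "go": "AND"}
path_values = {"access_time": [1.2, 1.5, 1.1], "w_margin": [30.0, 22.0, 41.0],
               "leakage": [0.1, 0.2, 0.1], "go": [1, 1, 0]}

p_array = {m: SCALING_RULES[RULES[m]](vals) for m, vals in path_values.items()}
print(p_array)
```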


By applying the appropriate rule R, supplied by the user, for example in a table, to every parameter component P, the performance metrics Parray are built for a particular array instance. This way, one random array observation has been found, and it may be put in an overall graph, see FIG. 13. In this case, the scaling rule applied is MAX, which selects the worst performance of one random memory.


An example wherein different scaling rules are used, is shown in FIG. 14.


By iterating the array imitation described with respect to FIG. 10 to FIG. 13 several times, with different random indices into the building block tables, variants of Parray are generated, as illustrated in FIG. 15. This was the overall objective. It is to be noted that the components of Parray are still correlated. This means that the user can impose multiple constraints on multiple parameters P and compute the overall parametric yield, as opposed to only partial yields, which is all that could be obtained if the correlation were not preserved.
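The complete second process, and the reason for keeping the metric components together, can be illustrated end to end with a toy model. In this sketch the phase-1 tables are replaced by synthetic draws in which the write-margin delta is built from the same random number as the access-time delta (so the two are correlated), the array has only wordline and bitcell blocks, and the limits T_MAX and M_MIN are arbitrary. The joint yield counts arrays that meet both limits simultaneously and can never exceed a single-metric yield.

```python
import random
from itertools import product

rng = random.Random(3)
ROWS, COLS, N_TABLE, N_ARRAYS = 4, 4, 1000, 2000
P_INV = (100.0, 50.0)   # nominal access time and write margin (ps), illustrative

def toy_table(sigma):
    """Synthetic stand-in for a phase-1 table: each row holds
    (delta_access_time, delta_w_margin) from the SAME run, so the
    negative correlation between the two metrics is preserved."""
    rows = []
    for _ in range(N_TABLE):
        x = rng.gauss(0.0, sigma)
        rows.append((x, -0.8 * x + rng.gauss(0.0, 0.2 * sigma)))
    return rows

TABLES = {"WL": toy_table(1.0), "BC": toy_table(3.0)}

def sample_array():
    """One random array: pick table rows per instance, collapse every path
    (sum of deltas), then scale with MAX (access time) and MIN (margin)."""
    z_wl = [rng.randrange(N_TABLE) for _ in range(ROWS)]
    z_bc = [rng.randrange(N_TABLE) for _ in range(ROWS * COLS)]
    times, margins = [], []
    for r, c in product(range(ROWS), range(COLS)):
        wl, bc = TABLES["WL"][z_wl[r]], TABLES["BC"][z_bc[r * COLS + c]]
        times.append(P_INV[0] + wl[0] + bc[0])
        margins.append(P_INV[1] + wl[1] + bc[1])
    return max(times), min(margins)

arrays = [sample_array() for _ in range(N_ARRAYS)]
T_MAX, M_MIN = 106.0, 45.0   # arbitrary limits
yield_time = sum(t <= T_MAX for t, _ in arrays) / N_ARRAYS
yield_joint = sum(t <= T_MAX and m >= M_MIN for t, m in arrays) / N_ARRAYS
print(f"timing-only yield: {yield_time:.3f}, joint yield: {yield_joint:.3f}")
```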


The example above can be considered as the simplest description of a memory as a particular type of array. FIG. 16 shows more examples of memories, with more hierarchy in the memory, thus more primary building blocks and also more dependent ones.


Including redundancy (e.g., spare rows and/or columns) and/or Error Correction Codes in an array architecture is today's main approach to yield enhancement and cost reduction. The method according to embodiments of the present invention is capable of including various types of redundancy approaches used in array design, for example memory design, and of correctly characterizing their impact on the resulting array performance metrics. Array redundancy, for example memory redundancy, may be implemented by a set of redundant cells, e.g. bitcells, forming redundant cell row(s) or cell column(s), possibly with other redundant array parts (e.g. redundant word line drivers, redundant sense amplifiers, . . . ). In such array architectures two types of data paths can be distinguished: the original data paths created by the original (non-redundant) array parts, and data paths created completely or partially by redundant parts (by redundant cells together with other redundant array parts such as redundant line drivers, redundant sense amplifiers and so on). According to embodiments of the present invention, the redundant data paths may be characterized under process variability in the same way as the original ones.


A method according to embodiments of the present invention handles array redundancy in two processes:

    • 1. Describing redundant data paths by means of AAS transformation. Redundant data paths of an array form a so called redundant coordinate space, next to the main array coordinate space.
    • 2. Replacing nonfunctional array data paths with redundant ones (possibly functional). In terms of the AAS transformation, this process represents merging the original coordinate space with the redundant coordinate space, and it has to take place just before combining (scaling) the access path statistics towards array statistics.


      The redundancy coordinate space is derived from the main array coordinate space by extending one or more original coordinates. The original coordinates under extension are called the key redundancy coordinates. Based on the nature of the key redundancy coordinates (primary or dependent) the redundancy coordinate space can be classified as complete or incomplete. AAS descriptions—redundancy coordinate spaces—will be derived hereinbelow for commonly seen redundancy implementations and also the process of merging redundancy and main coordinate space will be described.


The redundancy approach is analyzed, as an example only, based on row redundancy per memory bank that could be described by the complete redundancy space. The invention, however, is not limited thereto. Other types of redundancy can be dealt with. Furthermore, the redundancy approach in accordance with embodiments of the present invention does not only hold for memories but also for other array types of circuits.


In case of row redundancy per memory bank, the coordinate under extension is the primary coordinate expressing the number of rows (word lines, word line drivers, slice of row address decoder) per memory bank and the extension range is defined by the number of redundant rows available.


Considering the simplified example of a memory with 4 rows, 2 words per row and 2 bits per word as illustrated in FIG. 8, the coordinate space (C1, C2, C3) is formed by the following three primary bitcell coordinates:


C1—number of rows with the range (1 . . . 4)


C2—number of words per row with the range (1 . . . 2)


C3—number of bits per word with the range (1 . . . 2)


Supposing that the memory also contains one redundant bitcell row (next to the existing ones), the corresponding redundant coordinate space (Cr1, C2, C3) is formed by the bitcell coordinates where only Cr1 differs from the original coordinate set:


Cr1—number of rows with the range (5)


C1 is called the key redundancy coordinate and Cr1 represents its extension. The redundant coordinate space (Cr1, C2, C3) is called complete because it contains all coordinates (or their extensions) from the original coordinate space (C1, C2, C3). Table 1 shows the enumeration of the original and redundancy coordinate space. The columns denoted Per. and BC represent coordinates that are in any memory architecture by default:


Per.—the primary coordinate related to the memory periphery


BC—the dependent coordinate related to the memory bitcell building block.


TABLE 1

Per.   C3   C2   C1   BC

Main coordinate space
1      1    1    1     1
1      1    1    2     2
1      1    1    3     3
1      1    1    4     4
1      1    2    1     5
1      1    2    2     6
1      1    2    3     7
1      1    2    4     8
1      2    1    1     9
1      2    1    2    10
1      2    1    3    11
1      2    1    4    12
1      2    2    1    13
1      2    2    2    14
1      2    2    3    15
1      2    2    4    16

Redundancy coordinate subspace
1      1    1    5    17
1      1    2    5    18
1      2    1    5    19
1      2    2    5    20

Other examples of main and complete redundancy coordinate spaces for commonly used redundancy approaches are listed in Table 2.


TABLE 2

                     Main coordinate space    Redundancy coordinate space

n redundant rows
    Rows             C1 (1 . . . M1)          Cr1 (M1 + 1 . . . M1 + n)
    Words            C2 (1 . . . M2)          C2 (1 . . . M2)
    Bits             C3 (1 . . . M3)          C3 (1 . . . M3)

IO and row redundancy combined (n redundant rows and m redundant bits)
    Rows             C1 (1 . . . M1)          Cr1 (M1 + 1 . . . M1 + n)
    Words            C2 (1 . . . M2)          C2 (1 . . . M2)
    Bits             C3 (1 . . . M3)          Cr3 (M3 + 1 . . . M3 + m)

n redundant rows per bank
    Rows             C1 (1 . . . M1)          Cr1 (M1 + 1 . . . M1 + n)
    Words            C2 (1 . . . M2)          C2 (1 . . . M2)
    Bits             C3 (1 . . . M3)          C3 (1 . . . M3)
    Banks            C4 (1 . . . M4)          C4 (1 . . . M4)
Each line in the main and in the redundant coordinate space represents a unique data path of a memory.


The first process of the method with redundancy, deriving the statistics of the access path, is the same as set out above for the method without redundancy.


During the second process of a method according to embodiments of the present invention, scaling the access path statistics towards array statistics, each data path of each generated array instance has to be evaluated with respect to all observed array performance metrics. The performance metrics are real-valued or binary parameters. Based on user-supplied limits applied to these parameters (PMIN < P < PMAX), or on the resulting values of binary parameters PGO, a decision can be taken on whether a particular data path is functional or not. If the array contains nonfunctional data paths originating from the main coordinate space and also contains some type of redundancy, expressed by the redundant coordinate space, an attempt can be made to replace the nonfunctional data paths by functional redundant data paths. This corresponds to merging the main and redundant coordinate spaces with respect to the key redundancy coordinates.
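The functional/nonfunctional decision for a single data path can be sketched as follows. This is a minimal illustration, assuming that each real-valued metric carries a (PMIN, PMAX) pair with infinite bounds for one-sided limits, and that binary metrics are given as booleans; the function name is_functional and the metric names are hypothetical.

```python
import math

def is_functional(values, limits, go_flags=None):
    """Decide whether one data path is functional.

    values   -- dict: metric name -> real-valued result P of this data path
    limits   -- dict: metric name -> (P_MIN, P_MAX); use -math.inf or math.inf
                for a one-sided limit
    go_flags -- optional dict: metric name -> boolean result of a binary metric (P_GO)
    """
    for name, (p_min, p_max) in limits.items():
        if not p_min < values[name] < p_max:  # limits applied as P_MIN < P < P_MAX
            return False
    # every binary metric must report "go"
    return all((go_flags or {}).values())

# Hypothetical metric names and values for one data path
limits = {"t_access": (-math.inf, 16e-9), "read_margin": (45e-3, math.inf)}
print(is_functional({"t_access": 12e-9, "read_margin": 60e-3}, limits))  # True
print(is_functional({"t_access": 17e-9, "read_margin": 60e-3}, limits))  # False
```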


As an example only, the performance metric may be a timing parameter P with a maximum limit value Pmax. After an array instance has been evaluated, any data path with a value of P higher than Pmax is classified as nonfunctional. Note that a redundancy replacement concerns not only the nonfunctional data path itself but also all other data paths with the same value of the key redundancy coordinate. For ease of explanation, data paths sharing a given value of the key redundancy coordinate are called adjacent data paths. If the redundant coordinate space contains a block of adjacent functional data paths (a block identified by a given value of the extended key redundancy coordinate), this block can be used to replace the block containing one or more nonfunctional data paths in the main coordinate space. This reflects the situation in a real array, where a redundancy replacement covers not only the particular defective cell but the whole row, column, and so on, depending on the type of redundancy approach implemented.


This process is now illustrated on the simple memory example described above, for which both the main and the redundant coordinate spaces have already been derived (Table 1). Table 3 illustrates a possible situation.









TABLE 3


















If the performance metric, the timing parameter P (e.g. the access time of the memory), has the maximum limit value Pmax = 16 ns, then for the data path on the seventh line, defined by the bitcell coordinates (1, 1, 2, 3, 7), the value of P exceeds Pmax and the data path has to be classified as nonfunctional. The key redundancy coordinate C1 of this data path has the value 3, which means that all adjacent data paths (those with C1 equal to 3) have to be replaced. In this case all data paths of the redundant coordinate space are evaluated as functional and can be used to replace the nonfunctional paths of the main coordinate space. Because this example has only one redundant row, the block of redundant data paths used for the replacement corresponds to the whole redundant coordinate space. Note also that if nonfunctional data paths appear with more than one value of C1, corresponding to errors located in different rows, the redundancy approach of this example cannot repair the array. Finally, because the redundant coordinate space is complete, i.e. all coordinates of the main space are also present in the redundant space, all redundant data paths have already been fully evaluated with respect to the observed performance metrics, and a corrected memory instance does not require any additional evaluation before the next process of the method (scaling the access path statistics towards memory statistics) is applied.
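The single-row repair of this example can be sketched in Python. The sketch assumes hypothetical path values (10 ns everywhere, 17 ns on the seventh line of Table 1, 12 ns on the redundant row) and keys the data paths by (C3, C2, C1); the function name repair_with_one_spare_row is likewise hypothetical. It replaces every adjacent data path of the failing row, and reports an unrepairable instance when more than one row fails.

```python
from itertools import product

M1, M2, M3 = 4, 2, 2  # rows, words per row, bits per word (example of FIG. 8)
P_MAX = 16e-9         # Pmax = 16 ns

def repair_with_one_spare_row(main, spare, p_max):
    """Apply single-row redundancy to one evaluated array instance.

    main  -- dict (C3, C2, C1) -> P for every data path of the main coordinate space
    spare -- dict (C3, C2) -> P for the data paths of the redundant row Cr1 = M1 + 1
    Returns the repaired dict, or None if the instance cannot be repaired.
    """
    failing_rows = {c1 for (c3, c2, c1), p in main.items() if p > p_max}
    if not failing_rows:
        return dict(main)  # nothing to repair
    if len(failing_rows) > 1 or any(p > p_max for p in spare.values()):
        return None        # one spare row cannot fix this instance
    (bad_row,) = failing_rows
    repaired = dict(main)
    for c3, c2 in product(range(1, M3 + 1), range(1, M2 + 1)):
        # all adjacent data paths (same key coordinate C1) are replaced
        repaired[(c3, c2, bad_row)] = spare[(c3, c2)]
    return repaired

# Hypothetical values: every data path at 10 ns except line 7 of Table 1
main = {key: 10e-9 for key in product(range(1, M3 + 1), range(1, M2 + 1), range(1, M1 + 1))}
main[(1, 2, 3)] = 17e-9  # (Per., C3, C2, C1, BC) = (1, 1, 2, 3, 7)
spare = {key: 12e-9 for key in product(range(1, M3 + 1), range(1, M2 + 1))}

repaired = repair_with_one_spare_row(main, spare, P_MAX)
print(max(repaired.values()))  # 1.2e-08: row C1 = 3 now uses the redundant row
```

Because the redundant space is complete, the spare values are used as they are, without any re-evaluation.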


A redundancy coordinate space based on an extension of dependent coordinates of the main coordinate space is called an incomplete redundancy coordinate space. The term incomplete reflects the fact that the values of the primary coordinates forming the dependent key coordinates are not defined in such a redundancy space. A typical example of a redundancy approach leading to an incomplete redundancy space is the so-called shift column redundancy. In this case the array contains only redundant cells forming one or more columns, without any other redundant circuitry such as redundant sense amplifiers, multiplexers and so on. When a redundancy repair takes place, the redundant bitlines have to be reconnected to the existing multiplexer and sense amplifier circuitry, and the concrete coordinates of these non-redundant, reused array parts are not known in advance. Hence the redundancy coordinate space is not defined with respect to these reused array parts and has to be left empty, i.e. incomplete.


Returning to the simplified example used to demonstrate the complete redundancy space, but now with the shift column redundancy approach implemented, a new dependent coordinate C4←C2*C3 expressing the number of columns has to be defined. The cardinality M4 of the new dependent coordinate is equal to the product M2M3 of the cardinalities of the primary coordinates, here M4 = 2 × 2 = 4. Thus the main coordinate space (C1, C2, C3, C4) is formed by the following three primary bitcell coordinates and one dependent bitcell coordinate:


C1—number of rows with the range (1 . . . 4)


C2—number of words per row with the range (1 . . . 2)


C3—number of bits per word with the range (1 . . . 2)


C4—number of bitline columns with the range (1 . . . 4)


The corresponding redundancy coordinate space (C1, C2, C3, Cr4) extends the dependent coordinate C4 to the redundant coordinate Cr4. Moreover, the values of the primary coordinates C2 and C3, which form C4, remain undefined until the redundancy repair takes place. Table 4 shows the main and redundancy coordinate spaces for the described situation.









TABLE 4


















Other examples of redundancy techniques resulting in an incomplete redundancy space are listed in Table 5.











TABLE 5

                  Main coordinate space          Redundancy coordinate subspace

n shift columns
Rows              C1 (1 . . . M1)                C1 (1 . . . M1)
Words             C2 (1 . . . M2)                C2 (x)
Bits              C3 (1 . . . M3)                C3 (x)
Columns           C4←C2*C3 (1 . . . M2*M3)       Cr4 (M2*M3 + 1 . . . M2*M3 + n)

n rows per M4 banks
Rows              C1 (1 . . . M1)                C1 (x)
Words             C2 (1 . . . M2)                C2 (1 . . . M2)
Bits              C3 (1 . . . M3)                C3 (1 . . . M3)
Bank              C4 (1 . . . M4)                C4 (x)
Total rows        C5←C1*C4 (1 . . . M1*M4)       Cr5 (M1*M4 + 1 . . . M1*M4 + n)

(x): value not defined before the redundancy repair takes place









After the main and redundant coordinate spaces have been set up, the method according to embodiments of the present invention continues in the way already described for a complete redundant coordinate space: the observed array performance metrics are evaluated for all existing data paths and, if redundancy is applied, both coordinate spaces are merged accordingly. However, because of the undefined primary coordinates in an incomplete redundancy space, the array performance metrics of the redundant data paths cannot be fully evaluated. This means that the variability fluctuations of the array building blocks described by those undefined primary coordinates are not included in the evaluated performance metrics of the redundant space.


Hence, when a redundancy correction takes place, the main and redundant coordinate spaces are merged, driven by the key redundancy coordinates. After the nonfunctional data paths of the main coordinate space have been replaced by redundant ones, the originally undefined primary coordinates take their values from the replaced adjacent data paths. The newly formed data paths are then fully defined with respect to their cell coordinates, and all performance metrics need to be re-evaluated for these data paths for the given array instance.
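The merge of an incomplete redundancy space can be sketched for the shift column example (M1 = 4, M2 = 2, M3 = 2, so M4 = 4). Several assumptions are made: the mapping of (C2, C3) onto C4 as (C2 − 1)·M3 + C3 is hypothetical, the column shift is simplified to a direct replacement of one column by the spare column Cr4 = M4 + 1, and evaluate stands in for whatever re-simulation or re-combination of the metric is used. The point illustrated is that the spare data paths start with C2 and C3 undefined, inherit them from the replaced paths, and only then receive a performance value.

```python
M1, M2, M3 = 4, 2, 2
M4 = M2 * M3  # cardinality of the dependent coordinate C4 <- C2*C3

def column_of(c2, c3):
    """Hypothetical mapping of the primary coordinates (C2, C3) to the column C4."""
    return (c2 - 1) * M3 + c3

def merge_spare_column(bad_c4, evaluate):
    """Replace main-space column bad_c4 by the spare column Cr4 = M4 + 1.

    Before the merge the spare data paths have C2 = C3 = None (incomplete space).
    After the merge they inherit C2, C3 from the replaced adjacent data paths and
    their performance metric is re-evaluated with the caller-supplied 'evaluate'.
    """
    c2, c3 = next((a, b) for a in range(1, M2 + 1) for b in range(1, M3 + 1)
                  if column_of(a, b) == bad_c4)
    merged = []
    for c1 in range(1, M1 + 1):
        spare_path = {"C1": c1, "C2": None, "C3": None, "Cr4": M4 + 1}
        spare_path.update(C2=c2, C3=c3)         # coordinates now fully defined
        spare_path["P"] = evaluate(spare_path)  # metrics must be re-evaluated
        merged.append(spare_path)
    return merged

# Hypothetical evaluation: the reused periphery selected by C3 adds its own delay
def evaluate(path):
    return 10e-9 + 1e-9 * path["C3"]

merged = merge_spare_column(3, evaluate)
for path in merged:
    print(path)
```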


Variants of the method are presented hereinafter.


Global Variations as Primary Coordinate with M=1


During process two, a method according to embodiments of the present invention re-combines statistics of the individual building blocks under local process variations. The result is the array statistics under local process variations. If the user is interested in total (local and global) process variation effects, process 2 can be used in a specific way.


Assume that information on ΔPglob, defined earlier, is available. Global variations affect all basic elements of a particular type (e.g. transistors, photodiodes, etc.) on a die in the same manner; more precisely, every basic element of a given type receives the same parameter shift. Another primary coordinate capturing the global variations can therefore be introduced before starting process 2. Setting its multiplicity to one ensures that the number of cells does not change and that, within the same array, every basic element of the same type receives the same variation.
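The effect of a global-variation coordinate with multiplicity one can be sketched as follows: one ΔPglob sample is drawn per array instance and added to every data path, while the local ΔPi samples vary from path to path. The function name and all table values are hypothetical, and the sketch draws local samples independently per path, ignoring the sharing of building blocks between paths that the full method takes into account.

```python
import random

def sample_array_instance(p_inv, local_tables, dp_glob_table, n_paths, rng):
    """Return the path metric of every data path of one array instance.

    local_tables  -- list of ΔP_i sample tables, one per building block (process 1 output)
    dp_glob_table -- samples of ΔP_glob; handled as an extra primary coordinate with
                     multiplicity M = 1, so one value is drawn per array instance
    Simplification: local samples are drawn independently per path, ignoring the
    sharing of building blocks between paths.
    """
    dp_glob = rng.choice(dp_glob_table)  # drawn once: M = 1
    paths = []
    for _ in range(n_paths):
        dp_local = sum(rng.choice(table) for table in local_tables)
        paths.append(p_inv + dp_local + dp_glob)  # same global shift on every path
    return paths

rng = random.Random(7)
paths = sample_array_instance(1.0, [[-0.02, 0.0, 0.03], [0.01, -0.01]], [0.1, -0.05], 8, rng)
print(min(paths), max(paths))
```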


Combining Techniques

Up to now, the effects of basic element parameter fluctuations on the performance metrics in the individual blocks, ΔPi, have been combined by simple linear addition. This is correct if the effects in the individual blocks are completely independent. In addition, linear addition of the effects of ΔPglob is exact only if global and local variations are orthogonal. Both requirements are close to reality, and linear combination is accurate to within a few percent. This subsection points out that this error can be reduced by selecting different methods of combining ΔPi, ΔPglob and Pinv in order to obtain a more accurate estimate of Ppath. In general, a combining function f is sought such that






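$$P_{\text{path,loc}} = f(\Delta P_i,\, P_{\text{inv}},\, c_{\text{loc}})$$

$$P_{\text{path,world}} = f(\Delta P_i,\, \Delta P_{\text{glob}},\, P_{\text{inv}},\, c_{\text{world}})$$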


with a vector of coefficients cloc or cworld, depending on whether a more accurate function is sought for local or for total (world) variations. Note that the path indices on the right-hand side of the equations have been dropped for simplicity. To determine c, the data Ploc is used, previously defined as a characterization of variants of the entire access path netlist under the same process variations as used in the individual blocks, applied to all blocks at the same time; or Pworld, which contains similar simulation data under local and global variations. For example, going from simple addition of the deltas to linear regression, i.e. adding a scalar constant k to the sum, reduced the typical error by 50%, yielding relative errors of less than one percent for the most commonly used performance parameters. Formally,






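$$P_{\text{path,loc}} = P_{\text{inv}} + \sum_i \Delta P_i + k_{\text{loc}}$$

$$P_{\text{path,world}} = P_{\text{inv}} + \sum_i \Delta P_i + \Delta P_{\text{glob}} + k_{\text{world}}$$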


The constants k have to be derived such that they minimize an adequate error metric comparing the samples of the simulated reference distribution Ploc (or Pworld) with the samples created by the above equations from the same basic element parameter fluctuations. In the example, k was calibrated by minimizing the root mean square error (RMSE) of the individual samples. Other error metrics can be used as well.
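For the RMSE criterion with a single additive constant, the calibration has a closed form: the k that minimizes the RMSE is the mean of the residuals between the reference samples and Pinv + ΣΔPi. The sketch below shows this; the function names and the three-sample data set are hypothetical, and for kworld the ΔPglob sample would simply be appended to each list of deltas.

```python
from statistics import fmean

def calibrate_k(p_inv, block_deltas, p_ref):
    """Constant k minimizing the RMSE between p_ref[j] and p_inv + sum_i ΔP_i[j] + k.

    For a single additive constant, the RMSE minimizer is the mean residual.
    block_deltas -- per sample j, the list of ΔP_i (append ΔP_glob when calibrating k_world)
    p_ref        -- reference samples P_loc (or P_world) simulated with the same fluctuations
    """
    residuals = [ref - (p_inv + sum(d)) for d, ref in zip(block_deltas, p_ref)]
    return fmean(residuals)

def rmse(p_inv, block_deltas, p_ref, k):
    """Root mean square error of the linear combining model with constant k."""
    sq = [(p_inv + sum(d) + k - ref) ** 2 for d, ref in zip(block_deltas, p_ref)]
    return fmean(sq) ** 0.5

# Hypothetical data: the full-netlist reference is offset by +0.05 from the plain sum
p_inv = 1.0
deltas = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.0]]
p_ref = [1.35, 0.95, 1.35]
k_loc = calibrate_k(p_inv, deltas, p_ref)
print(round(k_loc, 6), rmse(p_inv, deltas, p_ref, 0.0) > rmse(p_inv, deltas, p_ref, k_loc))
```

Other error metrics generally require a numerical minimization instead of this closed form.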


More complex combining techniques may also be used, for example including nonlinear influences of ΔP and/or interactions between different ΔP. It is also possible to include the basic element parameter fluctuations themselves in the calibration, which enters the field of response surface modeling. If the user wishes to include several global variation types, such as chip-to-chip or wafer-to-wafer variations, several corresponding terms can equally be included in f.


Considering Other Variation Types

Up to now only basic element parameter fluctuations have been considered. In accordance with embodiments of the present invention, a noise source may additionally be injected, for example on top of the bitline voltage, for advanced analysis of read failures caused by insufficient read margins. In modern processes not only the devices but also the interconnect are subject to manufacturing variations. According to embodiments of the present invention, the basic element parameter fluctuations may therefore also include resistance and capacitance variations of the interconnect, handled in the same way by the method according to embodiments of the present invention, as soon as these variations become significant.


In addition, interconnect delay may cause a significant systematic and predictable variation. The physical distance between the outermost wordlines and bitlines across an array, for example a memory, can be so large that an additional RC delay t emerges between them. As a result, the distributions of cells that are far apart shift with respect to each other by t. Up to now the cells were assumed to have no offset with respect to each other. In fact, industrial netlists sometimes contain not one but two or more cells in order to evaluate P at two or more physical edges or corners of the array. In accordance with embodiments of the present invention, this information may be used when combining the parameter distributions, by linear interpolation. For instance, suppose a parameter P is available for the leftmost and for the rightmost column, and the multiplicity of the column is M. When combining the parameter for the column block, one might then use:






$$P(s) = P_{\text{inv,left}} + \frac{s\,(P_{\text{inv,right}} - P_{\text{inv,left}})}{M} + \Delta P_{\text{column}}$$


where s is the column number, and ΔPcolumn can be either ΔPcolumn,left or ΔPcolumn,right which should be equivalent. This principle can be easily adapted to two- or more-dimensional interpolation, and to interpolation along more coordinate directions.
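Applied to a whole column block, the interpolation formula can be evaluated for s = 1 … M as in the following sketch. The function name column_block and the numbers are hypothetical; one ΔPcolumn sample is assumed per column, and M is taken as the number of samples supplied.

```python
def column_block(p_inv_left, p_inv_right, dp_columns):
    """P(s) for s = 1 ... M, interpolating the invariable value between the edge cells.

    Implements P(s) = P_inv,left + s * (P_inv,right - P_inv,left) / M + ΔP_column,
    with one ΔP_column sample per column (M = len(dp_columns)).
    """
    m = len(dp_columns)
    return [p_inv_left + s * (p_inv_right - p_inv_left) / m + dp
            for s, dp in enumerate(dp_columns, start=1)]

# Hypothetical: 4 columns, 1.00 ns at the left edge, 1.08 ns at the right edge
print(column_block(1.00e-9, 1.08e-9, [0.01e-9, -0.02e-9, 0.0, 0.03e-9]))
```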


As an alternative to interpolation, one can simply use the worse of the two (or more) parameters, accepting a little pessimism in exchange for reduced complexity. Yet another option is to assign 50% of all paths to one edge and 50% to the other edge. This reduces the pessimism while keeping the risk of optimism extremely low.
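The two simpler alternatives can be sketched as follows. The function names are hypothetical; larger_is_worse is an assumed switch so that the same helper covers both a delay (larger is worse) and a read margin (smaller is worse).

```python
def worst_edge(p_left, p_right, m, larger_is_worse=True):
    """Assign the worse of the two edge values to all M columns (slightly pessimistic)."""
    worst = max(p_left, p_right) if larger_is_worse else min(p_left, p_right)
    return [worst] * m

def split_edges(p_left, p_right, m):
    """Assign half of the M columns to the left edge value and the rest to the right."""
    half = m // 2
    return [p_left] * half + [p_right] * (m - half)

print(worst_edge(1.00e-9, 1.08e-9, 4))
print(split_edges(1.00e-9, 1.08e-9, 4))
```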


Re-Sampling

Up to now it has been assumed that process one of the method produces a table representing ΔP, with N entries or variants. Clearly, the larger N, the more confidence is gained in the distribution and thus in the final result. Naturally, the price for a larger N is CPU time, as every variant costs a simulation run. In accordance with embodiments of the present invention, N may be increased by fitting a representative distribution function to the samples and then drawing from this distribution function rather than from the small table.
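A minimal sketch of this re-sampling, assuming that a normal distribution is an adequate fit to the ΔP table (other distribution functions could be fitted instead), can use the standard-library statistics.NormalDist. The table values and the function name enlarge_table are hypothetical.

```python
from statistics import NormalDist, fmean, stdev

def enlarge_table(samples, n_new, seed=None):
    """Fit a normal distribution to the N simulated ΔP samples and draw n_new values.

    The Gaussian form is an assumption; any representative distribution could be fitted.
    """
    dist = NormalDist.from_samples(samples)
    return dist.samples(n_new, seed=seed)

# Hypothetical ΔP table from process 1 (N = 8 variants)
table = [0.8, -1.1, 0.3, 1.9, -0.4, 0.0, -1.6, 0.7]
big = enlarge_table(table, 10_000, seed=1)
print(len(big), round(fmean(big), 2), round(stdev(big), 2))
```

The fitted distribution only adds samples; it cannot add information beyond the N simulated variants, so the fit quality in the tails remains the limiting factor.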


Operating Modes

Exploring different operating modes (voltage, temperature) usually requires re-simulation for accurate results. Yet array samples for different operating modes should be correlated. Therefore, the transistor parameter variations may be kept locked during the simulation of the different modes in process 1. Alternatively, the generated SPICE netlists can be kept and re-simulated with different mode settings. For process two to produce correlated arrays across the modes, a fixed random seed may be set so that the random number generator produces the same sequence of random numbers used for indexing the parameter tables. This ensures that the arrays for the different modes are built from the same building blocks.


Alternatively, the corresponding (enhanced) Monte Carlo tables for the different modes could be merged horizontally (putting all tables for the same building block side by side to form a wider table) before starting process 2. When redundancy is activated for a certain mode, it is locked and applied to every mode. Since a mode parameter change applies to all basic elements (as opposed to a local parameter fluctuation of a single basic element), a response surface model may be built on the invariable netlist and applied to all variants in order to save simulation time.
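Both ways of keeping the arrays of different operating modes correlated can be sketched in a few lines: re-seeding the generator with the same fixed seed for every mode, or merging the per-mode tables into one wider table so that a single index selects the same variant for every mode. The function names, mode labels and table values are hypothetical, and all per-mode tables are assumed to have the same length N.

```python
import random

def build_instances(tables_by_mode, n_blocks, seed):
    """Build one array instance per operating mode from that mode's Monte Carlo table.

    A fresh generator with the same fixed seed is used for every mode, so the
    same table indices (i.e. the same building-block variants) are picked in
    each mode. All tables are assumed to have the same length N.
    """
    instances = {}
    for mode, table in tables_by_mode.items():
        rng = random.Random(seed)
        idx = [rng.randrange(len(table)) for _ in range(n_blocks)]
        instances[mode] = (idx, [table[i] for i in idx])
    return instances

def merge_horizontally(tables_by_mode):
    """Alternative: one wider table whose row j holds variant j for every mode."""
    modes = list(tables_by_mode)
    return modes, list(zip(*(tables_by_mode[m] for m in modes)))

# Hypothetical cycle-time tables (ns) for one building block at two supply voltages
tables = {"0.9V": [1.00, 1.10, 1.20, 1.30], "1.1V": [0.80, 0.90, 1.00, 1.10]}
inst = build_instances(tables, n_blocks=5, seed=42)
print(inst["0.9V"][0] == inst["1.1V"][0])  # True: same variants in both modes
modes, rows = merge_horizontally(tables)
print(modes, rows[2])                      # ['0.9V', '1.1V'] (1.2, 1.0)
```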


Using Exponent Monte Carlo

Exponent Monte Carlo is an accelerated Monte Carlo technique that decreases the number of simulations required, especially in the CPU-intensive first part of the method. It can easily be deployed in the method according to embodiments of the present invention by extending the Monte Carlo tables described above with a probability (or weight) parameter that assigns a relative weight to every sample produced. This weight depends on the weights of the basic elements in process one and on the statistical enhancement technique used. The generation of the index into the extended Monte Carlo tables must then also take the weights into account.
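One possible way to take the weights into account when indexing the extended tables is weighted random selection, sketched below with the standard random.Random.choices. This is an assumption about how the weights are used, not a description of Exponent Monte Carlo itself; the function name draw_indices and the weights are hypothetical.

```python
import random

def draw_indices(weights, k, rng):
    """Draw k row indices into an extended Monte Carlo table, respecting the sample weights."""
    return rng.choices(range(len(weights)), weights=weights, k=k)

# Hypothetical weights attached to four table rows
rng = random.Random(0)
print(draw_indices([0.1, 0.4, 0.4, 0.1], 10, rng))
```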


Circuit Classes

The method according to embodiments of the present invention was exercised on semiconductor memories. These include embedded and discrete memories; static and dynamic memories; volatile and non-volatile memories; (ternary) (range-checking) content-addressable memories; and read-only memories. More generally, the concept is equally applicable to any circuit that is modeled with repeated instantiations of building blocks omitted in order to increase simulation speed. This includes all circuits with an array or vector structure, e.g. pixel arrays in sensor chips, mixed-signal circuits such as A/D or D/A converters with many parallel paths, networks on chip (NoC), routers, switch arrays, decoders, FPGAs, processor fabrics, or arithmetic logic.


To assess the usefulness of the method, results are presented for an industrial memory example. The technology is an industrial 45 nm technology with process variations supplied by a leading semiconductor company, shown in FIG. 17. The memory used carries the label hs-spsram_uhd 4096×64 m8, which means it stores 4096 words of 64 bits, with 4 words in a row. No redundancy mechanisms were assumed at first. FIG. 18 and FIG. 19 show the outcome of process 1. Of the more than one hundred available parameters P, the cycle time of the memory was selected. The figures show the invariable value Pinv and the probability density functions (PDF) of the access path netlist cycle time sensitivity due to variability in the building blocks, Pi, and for the total ΔP netlist, Ploc and Pglobl, displayed in seconds (s). After process 2 has also been run, the distribution of P is obtained for the entire memory, as shown in FIG. 20. The shift of the cycle time caused by local process variations can be clearly seen.



FIG. 21 to FIG. 23 show that the correlation between any two parameters (in the example, again the cycle time, correlated with the read margin) is preserved in both process one and process two. This time, results for a high-speed 512×64 m4 memory are shown. For the read margin, the MIN operator was applied, since the smaller the read margin, the less safe the activation of the sense amplifier. In addition, FIG. 22 shows a comparison with a corner simulation (see Background section) as done in industry today. The selected corners appear safe for the access path netlist but are too optimistic for the scaled memory results; in particular, the read margin can be much lower than that of the corner with the lowest read margin. It is now assumed that the memory has two redundant rows and that a tester constraint of 0.82 ns cycle time applies. In addition, a minimum limit of 45 mV is set on the read margin. FIG. 24 shows a log file of process two for the first 200 memory samples with the redundancy mechanism activated.
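The scaling of path metrics to memory metrics with the MAX and MIN operators, followed by the check against the 0.82 ns and 45 mV limits, can be sketched as follows. The path values and function names are hypothetical, and the sketch covers one memory sample without the redundancy step.

```python
def scale_sample(paths):
    """Scale path metrics of one memory sample to memory metrics.

    paths -- list of (cycle_time_s, read_margin_V), one entry per data path
    """
    cycle_time = max(c for c, _ in paths)   # MAX: the slowest path limits the memory
    read_margin = min(r for _, r in paths)  # MIN: the smallest margin limits the memory
    return cycle_time, read_margin

def passes(cycle_time, read_margin, t_max=0.82e-9, rm_min=45e-3):
    """Check the assumed tester constraints (0.82 ns cycle time, 45 mV read margin)."""
    return cycle_time <= t_max and read_margin >= rm_min

# Hypothetical path values for one memory sample
sample = [(0.75e-9, 0.060), (0.80e-9, 0.050), (0.78e-9, 0.071)]
print(scale_sample(sample), passes(*scale_sample(sample)))
```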


The foregoing description details certain embodiments of the invention. It will be appreciated, however, that no matter how detailed the foregoing appears in text, the invention may be practiced in many ways. It should be noted that the use of particular terminology when describing certain features or aspects of the invention should not be taken to imply that the terminology is being re-defined herein to be restricted to including any specific characteristics of the features or aspects of the invention with which that terminology is associated.


While the above detailed description has shown, described, and pointed out novel features of the invention as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the device or process illustrated may be made by those skilled in the technology without departing from the spirit of the invention. The scope of the invention is indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims
  • 1. A method of analyzing a performance metric of an array type electronic circuit under process variability effects, the electronic circuit comprising an array with a plurality of array elements, an access path being a model of the electronic circuit, the model comprising building blocks comprising all hardware to access one array element in the array, each building block comprising at least one basic element, the method comprising: deriving statistics of the access path due to variations in the building blocks under process variability of the basic elements; and deriving statistics of the full electronic circuit by combining the results of the statistics of the access path under awareness of the array architecture.
  • 2. The method according to claim 1, wherein combining the results of the statistics of the access path under awareness of an architecture of the array comprises taking into account a specification of an instance count and the connectivity of the building blocks.
  • 3. The method according to claim 1, wherein deriving statistics of the access path comprises injecting into the basic elements of a building block variability that can occur under process variations, and simulating the thus modified access path.
  • 4. The method according to claim 3, wherein variability is injected into the basic elements of one building block at a time, the other building blocks of the access path remaining invariant with respect to their nominal case.
  • 5. The method according to claim 3, wherein deriving statistics of the access path due to variations in the building blocks comprises any statistical sampling technique.
  • 6. The method according to claim 1, further comprising recording resulting sensitivity populations of the access path.
  • 7. The method according to claim 1, wherein deriving statistics of the full electronic circuit comprises any statistical sampling loop.
  • 8. The method according to claim 1, wherein deriving statistics of the access path to variations in the building blocks comprises combining the building block sensitivities.
  • 9. The method according to claim 1, wherein deriving statistics of the full electronic circuit comprises: generating a template of the electronic circuit comprising all paths through the circuit; creating a random observation of the electronic circuit following this template; and repeating at least once the process of creating a random observation of the electronic circuit with different random sequences to generate an electronic circuit population.
  • 10. The method according to claim 9, wherein generating a template of the electronic circuit comprises including redundant paths in the template.
  • 11. The method according to claim 9, wherein creating a random observation of the electronic circuit comprises: for each building block of the electronic circuit, selecting one random sample from the obtained sensitivity data; combining the thus-obtained samples; and deriving a corresponding path performance metric for every path in the electronic circuit.
  • 12. The method according to claim 11, further comprising: evaluating a path performance metric for every path in the electronic circuit; and selecting the combination of building blocks corresponding to the worst-case value of this path performance metric.
  • 13. The method according to claim 11, wherein deriving statistics of the full electronic circuit further comprises scaling the path performance metrics into an observation of the electronic circuit performance, using any of MAX, MIN, AVG, SUM, AND, OR operators.
  • 14. The method according to claim 1, wherein the method is performed by one or more computing devices.
  • 15. A computer-readable medium having stored thereon a program which, when executed on a computer, performs the method according to claim 1.
  • 16. A system for analyzing a performance metric of an array type electronic circuit under process variability effects, the electronic circuit comprising an array with a plurality of array elements, an access path being a model of the electronic circuit, the model comprising building blocks containing all hardware to access one array element in the array, each building block comprising at least one basic element, the system comprising: first calculation means arranged for deriving statistics of the access path due to variations in the building blocks under process variability of the basic elements; and second calculation means arranged for deriving statistics of the full electronic circuit by combining the results of the statistics of the access path under awareness of the array architecture.
  • 17. A system for analyzing a performance metric of an array type electronic circuit under process variability effects, the electronic circuit comprising an array with a plurality of array elements, an access path being a model of the electronic circuit, the model comprising building blocks containing all hardware to access one array element in the array, each building block comprising at least one basic element, the system comprising: a first calculation module configured to derive statistics of the access path due to variations in the building blocks under process variability of the basic elements; and a second calculation module configured to derive statistics of the full electronic circuit by combining the results of the statistics of the access path under awareness of the array architecture.
  • 18. The system according to claim 17, wherein combining the results of the statistics of the access path under awareness of an architecture of the array comprises taking into account a specification of an instance count and the connectivity of the building blocks.
  • 19. The system according to claim 17, wherein the first calculation module is configured to inject into the basic elements of a building block variability that can occur under process variations, and to simulate the thus modified access path.
  • 20. The system according to claim 17, further comprising at least one computing device configured to execute at least one of the first calculation module and the second calculation module.
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority under 35 U.S.C. §119(e) to U.S. provisional patent application 61/163,390 filed on Mar. 25, 2009, which application is hereby incorporated by reference in its entirety.

Provisional Applications (1)
Number Date Country
61163390 Mar 2009 US