Circuit reliability under statistical process variation is an area of growing concern. As transistor sizes become smaller, small imperfections during manufacturing result in large percentage variations in circuit performance. Hence, statistical analysis of circuits, given probability distributions of the circuit parameters, has become indispensable. Performing such analysis usually entails estimating some metric, such as parametric yield or failure probability. Designs that add excess safety margin, or rely on simplistic assumptions about “worst case” corners, no longer suffice. Worse, for critical circuits such as SRAMs and flip flops, replicated across 10K-10M instances on a large design, there is the new problem that statistically rare events are magnified by the sheer number of these elements. In such scenarios, an exceedingly rare event for one circuit may induce a not-so-rare failure for the entire system. Existing techniques perform poorly when tasked to generate both efficient sampling and sound statistics for these rare events: such techniques are literally seeking events in the 1-in-a-million regime, and beyond. Statistical metrics such as parametric yield and failure probability can be represented as high-dimensional integrals and are often evaluated using Monte Carlo simulation.
Monte Carlo analysis remains the gold standard for the required statistical modeling. Standard Monte Carlo techniques are, by construction, most efficient at sampling the statistically likely cases. However, when used for simulating statistically unlikely or rare events, these techniques are extremely slow. For example, to simulate a 5σ event, on the order of 100 million circuit simulations would be required, on average.
There is another application domain characterized by many of the same technical challenges faced with semiconductors. That domain is computational finance. Indeed, the parallels are striking. There are celebrated analytical results, for example, the Nobel Prize winning Black-Scholes model for option pricing (see Background reference [2]). But there is also the reality that, as financial instruments have become ever more complex and subtle, analytical models have given way to Monte Carlo as the only practical analysis method (see Background reference [2]). The problems are not only very nonlinear, they can also be quite large: pricing a portfolio of options or securities over a several year horizon can create problems with 1000+ statistical variables (see Background reference [3]). Accuracy is often required to the level of one basis point (a relative accuracy of 10⁻⁴) under impressively short time constraints (minutes, in the case of real-time arbitrage).
The natural question becomes: Can any of these methods be redeployed, moving them from finance to flip flops? In particular, can recent Monte Carlo methods developed for quickly pricing complex financial instruments be retargeted to the problem of estimating statistical quantities of interest in deeply scaled circuits? To be concrete: Does the deep statistical structure of pricing a 30-year mortgage backed security resemble, in any practical and exploitable way, the structure of random dopant fluctuations in an SRAM column? As it turns out, the answer is “yes,” as is discussed later in the specification.
Consequently, there exists a need to develop Monte Carlo-type strategies that sample and interpret systems data (whether semiconductors or computational finance systems) much more rapidly and efficiently while maintaining meaningful results.
The following is a list of related art that is referred to in and/or forms some of the basis of other sections of this specification.
The invention provides a means to efficiently and effectively detect and/or predict relatively rare failures or events in a wide range of industrial circuits and systems. The approach of the invention involves the representation of circuit metrics as large multi-dimensional integrals. The invention estimates such statistical circuit metric integrals by sampling the statistical variable space using a so-called “low-discrepancy sequence.” This is similar to the Monte Carlo method, the main difference being the method of sampling the variable space.
Compared with standard Monte Carlo simulation, this technique, the “Quasi-Monte Carlo” method, gives similarly reliable estimates of the result, but requires many fewer samples of the circuit or system being evaluated. In practice, speedups of 2× to 50× across a range of practical examples are observed.
This embodiment involves the application of one of the most celebrated methods developed in computational finance in the last decade: the Quasi Monte Carlo (QMC) method to statistical circuit analysis, using a computing device programmed to use QMC as it evaluates circuit metric data. As with all Monte Carlo methods, the goal is to converge to the required accuracy as rapidly as possible, with as few sample simulations as possible. Although the underpinnings of QMC are not new (see Background reference [4]), recent improvements in both theory and implementation complexity, along with the empirical discovery that these methods are unexpectedly efficient at high-dimensional statistical integral evaluation, propelled these techniques onto center stage in the computational finance world (see Background reference [5]).
Unfortunately, like all complex mathematical methods, correct application requires adapting the strengths of the methods to the specifics of the problem. In other words, one cannot apply these ideas blindly and expect to extract maximum (or, perhaps, any) benefit. This embodiment reviews the convergence theory for both standard Monte Carlo and QMC methods, and shows how to correctly apply these ideas to a range of statistical circuit analysis problems.
Monte Carlo methods are typically used to approximate some integral of the following standard form:
I(ƒ) = ∫_{C^s} ƒ(x) dx, (Equation 1)
where C^s=[0, 1)^s is the s-dimensional unit cube, and ƒ is some integrable function.
The Monte Carlo approximation of this integral is given by
Q(ƒ) = (1/n)·Σ_{i=1}^{n} ƒ(x_i), (Equation 2)
where x_1, . . . , x_n are independent samples drawn uniformly from C^s.
Problems with different variable ranges, arbitrary statistical distributions, arbitrary nonlinearity, etc., can always be transformed into this canonical integral form; i.e., these can always be included in our function ƒ, without any loss of generality. Thus, the problems we discuss are all defined over the s-dimensional unit cube. Parametric yield computation for circuits also follows the form in Equation 1. Given this, let us look at the convergence properties of standard Monte Carlo.
If ƒ has finite variance
σ²(ƒ) = ∫_{C^s} (ƒ(x) − I(ƒ))² dx, (Equation 3)
then the mean square error of the Monte Carlo integral approximation is given as
E[(Q(ƒ)−I(ƒ))²] = σ²(ƒ)/n. (Equation 4)
Hence, the expected Monte Carlo error is O(n^{−1/2}). The advantage of standard Monte Carlo is that this error does not depend on the dimensionality s.
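For illustration only, the following short Python sketch applies the plain Monte Carlo estimator of Equation 2 to a simple two-dimensional integrand whose exact integral is known; the integrand and sample counts are assumptions chosen for the example, and the printed errors shrink roughly as O(n^{−1/2}).

    import numpy as np

    # Illustrative sketch: Q(f) of Equation 2 for f(x1, x2) = x1 * x2 over the
    # unit square C^2, whose exact integral I(f) is 1/4.
    rng = np.random.default_rng(0)
    exact = 0.25
    for n in (10**3, 10**4, 10**5, 10**6):
        x = rng.random((n, 2))            # n uniform samples in [0, 1)^2
        q = np.mean(x[:, 0] * x[:, 1])    # Q(f) = (1/n) * sum of f(x_i)
        print(n, abs(q - exact))          # error falls off roughly as n**-0.5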
There is another way to look at the error, using the concept of discrepancy.
There are several definitions of discrepancy (see Background reference [6]), the simplest being the Star Discrepancy, or the L∞-discrepancy:
D*_n = sup_J |A(J; n)/n − Vol(J)|, (Equation 5)
where the supremum is taken over all origin-anchored hyper-rectangles J in C^s, and A(J; n) is the number of the n sample points that fall inside J.
Geometrically speaking, the star discrepancy measures how well the (relative) volume of any origin-anchored hyper-rectangle in the unit cube is approximated by the fraction of sample points that lie in that volume. Surprisingly enough, samples from the standard uniform distribution x_i ~ U[0, 1)^s may show extremely large discrepancy.
The Koksma-Hlawka theorem (see Background reference [7]) quantifies this effect. If ƒ has a suitably bounded variation V(ƒ), then the absolute integration error is itself bounded by the star discrepancy, as:
|Q(ƒ)−I(ƒ)| ≤ V(ƒ)·D*_n. (Equation 6)
(V(ƒ) itself has a rather technical definition; see Background reference [8].) The larger implication is that sample points with lower discrepancy can produce integral estimates with lower errors.
The first obvious question is: What is the discrepancy for standard Monte Carlo? Chung (see Background reference [9]) showed that, for uniform points x_i ~ U[0, 1)^s,
D*_n = O((log log n / n)^{1/2}). (Equation 7)
Thus, there is an echo of the familiar convergence behavior. But the real question is this: Are there sampling sequences that guarantee a better, lower discrepancy? The answer is “yes”.
Sequences with asymptotically superior discrepancy exist and are known as Low Discrepancy Sequences (LDSs). Such sequences achieve
D*_n = O((log n)^s/n), (Equation 8)
and they possess the surprising attribute that they are generated deterministically, in contrast to the standard pseudo-random sampling of classical Monte Carlo. Monte Carlo performed using samples generated deterministically from a low discrepancy sequence is known as Quasi-Monte Carlo (QMC). LDSs are also known as Quasi-Random Sequences. The overall idea is conceptually simple: rather than randomly sampling the space, the aim is to fill the space with samples that are spread as homogeneously and evenly as possible.
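As a hedged illustration of this difference in uniformity (not part of the specification), the following Python sketch compares the L2-star discrepancy, a computable relative of the L∞ star discrepancy of Equation 5, for pseudo-random points and Sobol' points; it assumes SciPy's scipy.stats.qmc module is available, and the point count and dimensionality are arbitrary choices.

    import numpy as np
    from scipy.stats import qmc

    # Compare a computable discrepancy measure for pseudo-random vs. Sobol' points.
    rng = np.random.default_rng(0)
    n, s = 1024, 5
    pseudo = rng.random((n, s))                       # classical Monte Carlo points
    sobol = qmc.Sobol(d=s, scramble=False).random(n)  # low-discrepancy points
    print("pseudo-random:", qmc.discrepancy(pseudo, method='L2-star'))
    print("Sobol' (LDS): ", qmc.discrepancy(sobol, method='L2-star'))

The Sobol' points typically report a markedly smaller discrepancy at equal n, which is the property that Quasi-Monte Carlo exploits.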
Comparing the bounds of Equations 7 and 8 gives some sense of the possible advantages, and challenges, of the method. Comparing denominators, there is the tantalizing possibility of nearly O(1/n) convergence for QMC. But comparing numerators, the advantages of QMC may, for larger problems (large dimensionality s), only make themselves apparent after a huge number of sample points n. Luckily, in many empirical situations, this turns out not to be the case; this is discussed later in this embodiment.
The first construction of an LDS for all problem dimensions was given by Halton in 1960 (see Background reference [4]). Other constructions have been introduced by Sobol' (see Background reference [10]), Faure (see Background reference [11]), Niederreiter (see Background reference [12]) and Niederreiter and Xing (NX) (see Background reference [13]). Space does not permit any detailed survey of the different strategies here; see Background reference [2] for a survey. Niederreiter showed a general construction principle for one large and popular class of LDSs called (t,s)-sequences (see Background reference [6]). One particularly successful class of (t,s)-sequences, called Sobol' points, was used for the experiments.
Sobol's construction, introduced in Background reference [10], is one of the most popular in current use. Sobol' points perform significantly better than the original Halton points in terms of discrepancy. Also, empirical results (see Background references [2] and [14]) suggest that Sobol' points perform better than Faure points—at least, for modern computational finance applications. The NX points promise to have significantly better discrepancy (see Background reference [13]). However, their implementation is significantly more complex and, currently, not flexible enough for an arbitrary problem dimension s, requiring the solution of a set of thorny number theoretic problems for each dimension. For all these reasons, the Sobol' points were chosen as the representative LDS.
The following is offered to briefly describe the construction of the Sobol' points. The implementations in Background references [15] and [16] are used. First, suppose that only one dimension is being used; i.e., s=1. One primitive polynomial (see Background reference [17]) is chosen in the field Z2 (coefficients from {0,1}):
P ≡ x^d + a_1·x^{d−1} + . . . + a_{d−1}·x + 1 (Equation 9)
Also, odd integers m_1, . . . , m_d are chosen, such that 0 < m_j < 2^j. Direction numbers are defined as
v_j = m_j/2^j, j ≤ d (Equation 10)
and their recurrence relation (in Boolean operations) is
v_j = a_1·v_{j−1} ⊕ . . . ⊕ a_{d−1}·v_{j−d+1} ⊕ v_{j−d} ⊕ (v_{j−d}/2^d), j > d (Equation 11)
This results in a set of direction numbers v_j for j > 0. To compute the n-th Sobol' value x_n, the following equation is used:
x_n = n_1·v_1 ⊕ n_2·v_2 ⊕ . . . , (Equation 12)
where . . . n_3 n_2 n_1 is the Gray code representation of n.
Using the Gray code representation is much faster than using the binary representation, since only one bit changes in the Gray code from n to n+1, making the operation in Equation 12 incremental (only one XOR). This reshuffling does not affect the asymptotic discrepancy.
For a general problem with s>1 dimensions, s different primitive polynomials are chosen and sequences for each coordinate are generated, using the above method. The polynomials are chosen sequentially with non-decreasing degree d, for increasing dimension.
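The following Python sketch illustrates, for a single dimension, how Equations 9-12 can be implemented with integer bit operations; the primitive polynomial x^3 + x + 1 (so a_1 = 0, a_2 = 1) and the initial odd integers m = (1, 3, 7) are assumptions chosen for the example, not values prescribed by this specification, and production implementations use carefully selected initial values (see Equation 14) for each dimension.

    # A minimal, illustrative sketch (not production code) of the one-dimensional
    # Sobol' construction of Equations 9-12.
    BITS = 32  # direction numbers are stored scaled by 2**BITS

    def sobol_1d(n_points, d=3, a=(0, 1), m_init=(1, 3, 7)):
        # Direction numbers v_j = m_j / 2^j (Equation 10), stored as integers
        # V_j = v_j * 2**BITS.
        V = [0] * (BITS + 1)
        for j in range(1, d + 1):
            V[j] = m_init[j - 1] << (BITS - j)
        # Recurrence of Equation 11, in scaled-integer form.
        for j in range(d + 1, BITS + 1):
            V[j] = V[j - d] ^ (V[j - d] >> d)
            for k in range(1, d):
                if a[k - 1]:
                    V[j] ^= V[j - k]
        # Gray-code (Antonov-Saleev) update of Equation 12: one XOR per point.
        x_int, points = 0, []
        for n in range(n_points):
            points.append(x_int / 2.0 ** BITS)
            c = 1
            while n & (1 << (c - 1)):     # position of the rightmost zero bit of n
                c += 1
            x_int ^= V[c]
        return points

    print(sobol_1d(8))   # the 8 points stratify [0, 1) into eighths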
One additional problem is how to choose the initial values for each dimension i; these are denoted m_{i,j}. Also, renaming the direction numbers as v_{i,j}, where i is the dimension, 1 ≤ i ≤ s, v_{i,j,1} is defined as the first bit after the binary point of v_{i,j}. Set
V_d = [v_{i,j,1}], where 1 ≤ i ≤ d and 1 ≤ j ≤ d (Equation 13)
Then, according to Sobol's development in Background reference [10], the condition
det(V_d) = 1 (mod 2) (Equation 14)
gives better uniformity. Hence, the m_{i,j} are chosen to satisfy Equation 14 (see Background reference [16]).
A generator for Sobol' points is relatively straightforward to implement, requiring mainly bit-level Boolean operations, and relatively little of the number-theoretic difficulty of some of the other LDS strategies. However, all LDS schemes suffer from some idiosyncrasies when applied to higher dimensional problems, requiring additional finesse in the way statistical integration problems are mapped into a viable QMC formulation.
Looking only at the asymptotics, the O((log n)^s/n) error bound of QMC should show no runtime improvements over the O(n^{−1/2}) bound of conventional Monte Carlo for very large s and feasibly large n. However, QMC has been seen to outperform Monte Carlo even for problems with very large s; e.g., IBM's 1439-dimensional derivative-pricing experiments of Background reference [3]. This anomalous, empirical success has been largely explained using the concept of effective dimension (see Background reference [8]). The concept is reviewed here because it strongly impacts the manner in which the circuit problems are mapped into a successful QMC form.
Reviewing first the concept of the Analysis of Variance (ANOVA) decomposition: the decomposition expresses a function ƒ(x) as a sum of simpler functions ƒ_u(x), each depending on a subset of the inputs x=(x_1, . . . , x_s). For any subset u ⊆ {1, . . . , s}, let −u be its complementary set {1, . . . , s}−u, let x_u = {x_i}, i ∈ u, be the sub-vector of the coordinates of x corresponding to u, and let C^u denote the unit cube in the dimensions that belong to u. Then, for any square integrable function ƒ, the ANOVA decomposition is
ƒ(x) = Σ_{u ⊆ {1, . . . , s}} ƒ_u(x_u),
where the ANOVA terms follow the recursion
ƒ_u(x_u) = ∫_{C^{−u}} ƒ(x) dx_{−u} − Σ_{v ⊂ u, v ≠ u} ƒ_v(x_v), with ƒ_∅ = I(ƒ),
and are mutually orthogonal. Hence, the variance of ƒ can be written as
σ²(ƒ) = Σ_{u ≠ ∅} σ_u²(ƒ), where σ_u²(ƒ) = ∫_{C^u} ƒ_u(x_u)² dx_u.
Definition 1. The effective dimension of ƒ, in the superposition sense, is the smallest integer s_S such that the ANOVA terms ƒ_u with |u| ≤ s_S account for at least a specified proportion p (e.g., 99%) of the total variance: Σ_{|u| ≤ s_S} σ_u²(ƒ) ≥ p·σ²(ƒ).
Definition 2. The effective dimension of ƒ, in the truncation sense, is the smallest integer s_T such that the ANOVA terms ƒ_u with u ⊆ {1, . . . , s_T} account for at least the same proportion p of the total variance: Σ_{u ⊆ {1, . . . , s_T}} σ_u²(ƒ) ≥ p·σ²(ƒ).
Hence, s_T is the number of leading dimensions, in a fixed ordering, that account for most of the variance in the function, while s_S is an indicator of whether only low-dimensional interactions dominate the variation in ƒ. For example, ƒ(x)=x_1+x_2+x_3 has truncation dimension 3, but superposition dimension 1.
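To make the variance decomposition concrete, the following hedged Python sketch numerically estimates the single-variable ANOVA contributions σ_u² for u = {i} of a toy function, using the classic pick-freeze Monte Carlo estimator; the toy function, sample counts, and estimator choice are assumptions for illustration and are not taken from the testcases.

    import numpy as np

    def f(x):
        # toy metric: dominated by x1 and x2, with a weak nonlinear x3 term
        return 4.0 * x[:, 0] + 2.0 * x[:, 1] + 0.1 * np.sin(2 * np.pi * x[:, 2])

    rng = np.random.default_rng(0)
    n, s = 100_000, 5
    A = rng.random((n, s))
    B = rng.random((n, s))                 # an independent copy of the inputs
    fA = f(A)
    total_var = fA.var()
    for i in range(s):
        ABi = B.copy()
        ABi[:, i] = A[:, i]                # "freeze" coordinate i, resample the rest
        fABi = f(ABi)
        var_i = np.mean(fA * fABi) - np.mean(fA) * np.mean(fABi)  # estimates sigma^2 for u = {i}
        print(f"sigma^2 for u = {{x{i+1}}}: {var_i:.4f} "
              f"({var_i / total_var:.1%} of total variance)")

For this additive-dominated toy function, the first two variables account for nearly all of the variance, so both its superposition and its truncation dimension are small.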
Effective dimension is relevant for two important reasons. First, it is widely invoked to help explain why QMC has been so strikingly efficient (e.g., a 150× speedup (see Background reference [3])) on large financial problems. These tasks seem to have low effective dimension: for example, in a pricing task with a long time horizon, money today is much more valuable than money tomorrow, which reduces the impact of many dimensions of the problem. It is an open question whether this behavior obtains in circuit analysis. Second, effective dimension is essential to optimally map problems into QMC form, which is discussed next.
Ideally, it should not matter how problem variables are assigned to elements in our LDS points x=(x_1, . . . , x_s). Suppose, for example, there are 100 random threshold voltages to sample. It should not matter if any particular voltage is mapped to x_1, or x_37, or x_99. Unfortunately, this is not the case. All LDSs are imperfect, and usually show degraded uniformity as dimension increases. This takes the form of pattern dependencies (see Background reference [8]): projections of the sample points onto certain pairs of the higher dimensions show visible patterns rather than uniform coverage.
This problem can be finessed by trying to assign the most “important” statistical variables to the lower, less pattern-sensitive coordinates of x. More formally, in the language of ANOVA, the Koksma-Hlawka bound can be decomposed over the ANOVA terms as
|Q(ƒ)−I(ƒ)| ≤ Σ_{u ≠ ∅} V(ƒ_u)·D*_{n,u},
where D*_{n,u} is the star discrepancy of the sample points projected onto the coordinates in u. This suggests that if ƒ has low s_S, then, because of the lower D*_{n,u} of the low-dimensional projections, QMC will perform better than Monte Carlo. But to also deal with the pattern effects in the higher dimensions, the most important variables should be mapped to the earliest, most uniform coordinates.
For problems with a time-series random-walk structure, there are good techniques for such mapping (see Background reference [19]), but these are not applicable in the case of circuit yield analysis. Principal Components Analysis (PCA) is obviously useful, but even here, the problem must still be mapped well to a QMC form after PCA has completed. Two strategies are suggested:
The latter method is concentrated on here. The measure of sensitivity that we use is the absolute value of Spearman's Rank Correlation Coefficient (see Background reference [20]). This is similar to Pearson's Correlation, but more robust in the presence of non-linear relationships. Suppose R_i and S_i are the ranks of corresponding values of a parameter and a metric; then their rank correlation is given as
ρ = 1 − 6·Σ_i (R_i − S_i)² / (n·(n²−1)),
where n is the number of sample pairs.
This approach has a two-fold advantage. First, it helps reduce the truncation dimension, since all the important dimensions become the first few. Second, the first few dimensions of the Sobol' points are more uniform, even for small sample counts (see Background references [2] and [19]), and this approach helps map the important subset of variables (large σ_u) to the dimensions with good uniformity (small D*_{n,u}). The rank correlation can be computed by first running a smaller Monte Carlo run. For multiple metrics, the sum of the rank correlation values across all the metrics is used.
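A hedged sketch of this variable-ordering step is shown below; the pilot-run size, the toy data, and the helper name order_variables are illustrative assumptions, and scipy.stats.spearmanr is used for the rank correlation.

    import numpy as np
    from scipy.stats import spearmanr

    def order_variables(pilot_params, pilot_metric):
        # pilot_params: (n_pilot, s) sampled parameter values from a small MC run
        # pilot_metric: (n_pilot,) simulated metric values for those samples
        s = pilot_params.shape[1]
        scores = [abs(spearmanr(pilot_params[:, i], pilot_metric)[0])
                  for i in range(s)]      # |Spearman rank correlation| per parameter
        # For multiple metrics, the scores would be summed across metrics here.
        return np.argsort(scores)[::-1]   # most sensitive parameter first

    # toy pilot data: the metric depends strongly on parameter 3, weakly on parameter 0
    rng = np.random.default_rng(1)
    params = rng.random((500, 4))
    metric = 5.0 * params[:, 3] + 0.5 * params[:, 0] + 0.1 * rng.standard_normal(500)
    print(order_variables(params, metric))   # expected to start with index 3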
One final problem is now confronted: The error bound of Equation 6 for QMC is very difficult to compute. Also, it is only an upper bound on the error: It does not provide a practical way to measure the actual error if the exact solution is unknown. In a standard Monte Carlo scenario, several different pseudo-random samplings would simply be run and compared. But QMC generates deterministic samples: Each run yields the same samples. To address this, Owen (see Background reference [21]) introduced Randomized QMC (RQMC) to estimate the variance, using so-called scrambled versions of the same LDS. Let {x_0, x_1, . . . } and {y_0, y_1, . . . } denote the original LDS and a randomly scrambled version, respectively. Let x_{ni} = 0.x_{ni1}x_{ni2} . . . be the base-b digit expansion of the i-th coordinate of x_n. Then,
y_{ni1} = π_i(x_{ni1}), and y_{nik} = π_{i,x_{ni1} . . . x_{ni(k−1)}}(x_{nik}) for k > 1,
where the π(·) are random permutations of {0, 1, . . . , b−1}, chosen uniformly and mutually independently.
Hence, this method scrambles the digits of the original LDS. Other methods have also been introduced (see Background reference [22]). All these randomized sequences maintain the uniformity properties of the original LDS.
Owen's original scrambling uses a large amount of memory. Hence, a more scalable, but less powerful, version is used, described in Background reference [23].
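For illustration, the following Python sketch shows the RQMC error-estimation idea using SciPy's scrambled Sobol' generator as the randomization (an implementation assumption; the scrambling of Background reference [23] is not reproduced here), with a stand-in integrand in place of a circuit metric.

    import numpy as np
    from scipy.stats import qmc

    def toy_metric(x):                      # placeholder for f(Phi(x)) in Equation 1
        return np.exp(-np.sum(x, axis=1))

    n, s, n_scrambles = 4096, 10, 8
    estimates = []
    for seed in range(n_scrambles):
        pts = qmc.Sobol(d=s, scramble=True, seed=seed).random(n)
        estimates.append(np.mean(toy_metric(pts)))
    print("RQMC estimate:", np.mean(estimates))
    print("std. dev. across scrambles:", np.std(estimates, ddof=1))

Because each scrambled sequence retains the uniformity of the original LDS, the spread of the estimates across scrambles provides a practical error measure for QMC.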
In this discussion, the performance of the scrambled Sobol' points is compared against the performance of standard Monte Carlo, on three different testcases. First, some observations can be made about the Monte Carlo and RQMC implementations:
Now, the testcases and the experiments will be discussed. All samples were evaluated using detailed circuit simulation in Cadence Spectre. Results for all testcases will be analyzed together later in this embodiment.
The first testcase is a commonly seen Master-Slave Flip-Flop with scan chain (MSFF), shown in the accompanying drawings. The threshold voltage variation of each transistor, due to random dopant fluctuation (RDF), is modeled with standard deviation
σ(V_t) = 0.0135·V_t0/√(WL), where W, L are in μm, (Equation 21)
and V_t0 is the nominal threshold voltage. This results in a 30% standard deviation for a minimum-sized transistor. The t_ox standard deviation is taken as 2%. The metric being measured is the clock-to-output delay, τ_cq. The integral being estimated is the parametric yield, with a maximum acceptable delay of τ_max=200 ps. If we define
ƒ_t(x) = 1 if τ_cq(x) ≤ t, and 0 otherwise,
then yield can be expressed in the form of (Equation 1) as follows:
Y_t(ƒ, Φ) = ∫_{C^s} ƒ_t(Φ(x)) dx,
where Φ is the transformation that maps points in the unit cube C^s to values of the statistical parameters having their specified joint distribution. There are a total of 31 statistical variables in this problem. For the MSFF, yield will be given as Y_{τmax}(τ_cq, Φ).
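The following hedged Python sketch mirrors this yield formulation, but with an analytical stand-in for the simulated delay (an assumption for illustration; the actual testcases use detailed SPICE-level simulation) and with the statistical parameters assumed to be standard normal after the transformation Φ.

    import numpy as np
    from scipy.stats import norm, qmc

    s, n, tau_max = 31, 8192, 200e-12
    u = qmc.Sobol(d=s, scramble=True, seed=0).random(n)   # points in [0, 1)^s
    u = np.clip(u, 1e-12, 1 - 1e-12)      # guard the endpoints for the inverse CDF
    z = norm.ppf(u)                       # Phi: unit cube -> standard normal parameters
    # stand-in "delay": 180 ps nominal plus a linear sensitivity to each parameter
    delay = 180e-12 + 4e-12 * (z @ np.linspace(1.0, 0.1, s))
    yield_estimate = np.mean(delay <= tau_max)   # average of the indicator f_t
    print("estimated parametric yield:", yield_estimate)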
As an illustrative example, consider how the rank correlation-based variable-dimension mapping works for this testcase.
The second testcase is a 64-bit SRAM column, with a non-restoring write driver and column multiplexor, shown in the accompanying drawings. Yield analysis of SRAMs is unavoidable, given the large capacity of SRAMs and the large variation due to RDF. The threshold voltage variation due to RDF is modeled with standard deviation
σ(V_t) = 5 mV/√(WL), where W, L are in μm. (Equation 24)
This variation is too large for the 90 nm process, but is in the expected range for more scaled technologies. σ(t_ox) is taken to be 2%.
The metric being measured is the write time, τ_w: the time between the wordline going high and the non-driven cell node (node 2) transitioning. Here, “going high” and “transitioning” imply crossing 50% of the full voltage change. The write time is measured as a multiple of the fanout-4 delay of an inverter (FO4). The value being estimated is the 90-th percentile of the write time. If we write
ƒ_t(x) = 1 if τ_w(x) ≤ t, and 0 otherwise,
then any p-th percentile can be expressed using the form of (Equation 1) as the value π_p(ƒ, Φ) satisfying
∫_{C^s} ƒ_{π_p}(Φ(x)) dx = p/100.
Then, the 90-th percentile in this case will be π_90(τ_w, Φ).
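As an illustration of the percentile computation, the following hedged Python sketch estimates the 90-th percentile as the empirical quantile of QMC-sampled metric values; the lognormal stand-in for the write time is an assumption replacing circuit simulation.

    import numpy as np
    from scipy.stats import norm, qmc

    n, s, p = 8192, 20, 90
    u = qmc.Sobol(d=s, scramble=True, seed=3).random(n)
    z = norm.ppf(np.clip(u, 1e-12, 1 - 1e-12))
    write_time = 1.5 * np.exp(0.1 * z.sum(axis=1))    # toy write time, in FO4 units
    print("estimated 90th percentile:", np.percentile(write_time, p), "FO4")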
Ten Monte Carlo runs of 20,000 pseudo-random points each were run. One QMC run of 20,000 Sobol' points and 9 QMC runs of 20,000 scrambled Sobol' points each, were also run. The results are discussed later in this embodiment.
The third testcase is a low-voltage CMOS bandgap reference.
In this case, three metrics are measured: 1) the output voltage (V_ref), 2) the settling time (τ_s), and 3) the dropout voltage (V_do). V_do is the difference between the supply voltage and V_ref when V_ref falls by 1% of its nominal value (0.6 V): a lower V_do implies a more robust circuit. The circuit performance is deemed acceptable only if V_ref is within 10% of 0.6 V, τ_s ≤ 200 ns, and V_do ≤ 0.9 V. The yield integral can be written in the form of (Equation 1), similar to what was done for the MSFF discussed earlier. Ten Monte Carlo runs of 10,000 pseudo-random points each were run. One QMC run of 10,000 Sobol' points and nine QMC runs of 10,000 scrambled Sobol' points each were also run.
Using these fits, the number of Monte Carlo or QMC samples needed for the result to lie within a given interval at a given confidence level can be estimated. Using the Central Limit Theorem (see Background reference [30]), for a confidence level of 95.45%, this interval is [μ−2σ, μ+2σ]. Hence, for the estimates to lie within 1% deviation from the exact value, with a confidence of 95.45%, the value of σ should be no greater than 0.5% of the exact value. Table 1 compares the number of points needed for Monte Carlo and QMC, for maximum errors of 1% and 0.1%, at the same confidence level. The exact value is approximated by the best available estimate.
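The arithmetic behind such a sample-count estimate can be sketched as follows, assuming the estimator's standard deviation has been fitted to a power law σ(n) = c·n^(−α) (α near 0.5 for Monte Carlo, and typically larger for QMC); the constants used below are assumptions, not measured data from the testcases.

    def samples_needed(c, alpha, exact_value, max_rel_error=0.01):
        # the 2-sigma half-width (95.45% confidence) must not exceed max_rel_error
        target_sigma = (max_rel_error / 2.0) * exact_value
        return (c / target_sigma) ** (1.0 / alpha)

    print(samples_needed(c=0.5, alpha=0.5, exact_value=0.95))   # Monte Carlo-like fit
    print(samples_needed(c=0.5, alpha=0.8, exact_value=0.95))   # QMC-like fit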
Moderate-to-large speedups (2× to 50×) were observed, showing the effectiveness of QMC as a variance reduction method. These speedups improve as the required accuracy increases. Here, it was assumed that the value computed using the 10 runs is exact. This is not true in reality, but, since the same assumption is being used for the Monte Carlo and QMC cases, the relative trends seen here can be believed. It should also be possible to apply other Monte Carlo variance reduction techniques (see Background reference [2]), independently, on top of QMC, to further improve accuracy.
Computational finance problems share a number of features with statistical circuit analysis problems. It has been demonstrated that one of the most celebrated techniques in the finance world, Quasi-Monte Carlo analysis, can be successfully applied to statistical circuit yield problems, with attractive runtime speedups. However, one must be quite careful in mapping these problems onto a QMC form, using appropriate sensitivity information. To the best of the inventor's knowledge, this is the largest and most rigorous experimental comparison of Monte Carlo versus QMC ideas ever undertaken in the context of industrially relevant scaled CMOS technologies and circuits.
This embodiment involves a method used with respect to a manufacturing process for a circuit, with the manufacturing process being susceptible to simulation of quality, and the manufacturing process having a number of statistical parameters defined as “d”. The method is comprised of the steps of: generating a point from a low-discrepancy sequence in a d-dimensional cube of side length one, the point having coordinates within the cube; transforming the coordinates of the point, such that the distribution changes from a uniform unit cube to that specified for the statistical parameters; creating an instance of the circuit or system, in a form suitable for detailed numerical simulation, with the values of the statistical parameters as given by the generated point; simulating the circuit using a circuit simulator, yielding measured circuit performances; combining the measured circuit performances to arrive at a current estimate of the quality; repeating the generating, transforming, creating, simulating, and combining steps until the estimate of the quality has been obtained to a desired accuracy; and communicating the estimate to a human user. This method would normally be performed by way of a programmed computing device that yields an output to a human-readable display or printout.
This method can be further extended wherein the circuit simulator is Spectre.
This method can be further extended wherein the circuit simulator is HSPICE.
This method can be further extended wherein the low-discrepancy sequence is a sequence of Sobol' points.
This method can be further extended wherein the simulation of quality comprises a simulation of reliability.
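A minimal Python sketch of the overall method of this embodiment is given below. The function run_circuit_simulation is a hypothetical placeholder standing in for the invocation of a detailed circuit simulator such as Spectre or HSPICE on a generated circuit instance (it is not an actual API of either tool), and the Gaussian parameter transformation and pass/fail quality metric are assumptions chosen for illustration.

    import numpy as np
    from scipy.stats import norm, qmc

    def run_circuit_simulation(parameter_values):
        # Hypothetical placeholder: in practice, build a netlist instance with
        # these statistical parameter values, run the circuit simulator, and
        # return the measured performances. A toy delay is returned here.
        return 180e-12 + 2e-12 * parameter_values.sum()

    def estimate_quality(d, n_samples, tau_max=200e-12, seed=0):
        sobol = qmc.Sobol(d=d, scramble=True, seed=seed)   # low-discrepancy sequence
        passes = 0
        for _ in range(n_samples):
            point = sobol.random(1)[0]                     # point in the d-dimensional unit cube
            # transform the coordinates to the distribution specified for the
            # statistical parameters (assumed standard normal in this sketch)
            params = norm.ppf(np.clip(point, 1e-12, 1 - 1e-12))
            delay = run_circuit_simulation(params)         # simulate the instance
            passes += (delay <= tau_max)                   # combine measured performances
        return passes / n_samples                          # current estimate of the quality

    print("estimated yield:", estimate_quality(d=31, n_samples=2000))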
Those skilled in the art will have no difficulty devising myriad obvious variations and improvements to the invention, all of which are intended to be encompassed within the scope of the claims which follow.