This application relates generally to determination of root cause(s) for semiconductor wafer excursions, and more particularly, to identifying particular sequences of processing that are linked to production of off-quality wafers.
The determination of a root cause for a semiconductor production problem is a well-known but difficult issue. Systems for classification and anomaly detection typically rely upon analysis of extensive data obtained from production runs to evaluate data excursions from expected values. It would be desirable to have effective tools for narrowing the scope of possible causes and thereby simplify classification schemes.
In this disclosure, the transitions from one step to another (or one piece of equipment to another) in a fabrication facility are evaluated to identify those transitions, and in particular, pairs of transitions, that are critical for distinguishing classes of wafers, most simply, good wafers or bad wafers.
This disclosure describes an approach that is useful in determining root cause(s) for semiconductor wafer production quality issues. The approach models semiconductor processing equipment history for wafer or lot production as an event sequence. Probabilities are then computed for each transition between steps or states of a particular semiconductor recipe as the wafer/lot moves from equipment to equipment or chamber to chamber, namely, is this transition likely to lead to good wafers or bad wafers? The computed probabilities for the complete wafer processing path are aggregated and cross-validated to confirm the accuracy of the model.
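The first step of the approach, modeling equipment history as an event sequence and tallying transitions by outcome, can be sketched in Python. The equipment names, paths, and labels below are hypothetical, chosen only to illustrate the data structure:

```python
from collections import defaultdict

# Hypothetical equipment histories: each wafer's path is a sequence of
# equipment/chamber identifiers, paired with its final quality label.
histories = [
    (["Litho-A", "Etch-1", "Metal-1"], "good"),
    (["Litho-A", "Etch-2", "Metal-1"], "bad"),
    (["Litho-B", "Etch-1", "Metal-1"], "good"),
]

def count_transitions(histories):
    """Tally (from_state, to_state) transitions separately for each class."""
    counts = {"good": defaultdict(int), "bad": defaultdict(int)}
    for path, label in histories:
        for i, j in zip(path, path[1:]):
            counts[label][(i, j)] += 1
    return counts

counts = count_transitions(histories)
```

These per-class counts are the raw material from which the transition probabilities described below are computed.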
Because it is common to provide multiple processing paths for selected steps of a process, such as providing multiple lithography chambers that feed multiple etching chambers, it has been recognized that such paired combinations can produce differing quality results. Thus, the objective is to identify a particular sequence of processing steps that accounts for more bad or off-quality wafers/lots than another sequence. This objective can be achieved by returning to the individual probabilities to find and evaluate anomalous transitions. This information will narrow the field of possible root causes, and for that reason, will be an important input for determining root cause.
Of course, a typical semiconductor process may have hundreds of steps to form the desired circuit features, including deposition, diffusion, ion implantation, lithography, etch, metallization, etc., and upon completion of the device, post-fabrication testing. In addition, as noted above, it is common to provide multiple parallel processing paths for selected steps of the recipe. However, having multiple processing paths creates the opportunity for differing quality results, which will be evaluated as shown here.
Referring now to the figure, an example recipe provides two processing paths: path A1, in which wafers pass from lithography chamber Litho-A to etch chamber Etch-1, and path A2, in which wafers pass from Litho-A to etch chamber Etch-2.
In this example, assume that final results from a current production run of this recipe show that 90% of the wafers processed along path A1 are of acceptable quality while 10% turn out to be off-quality; conversely, only 10% of the wafers processed along path A2 are of acceptable quality, while 90% turn out to be off-quality. It is thus clear that most of the off-quality wafers come from path A2, while most of the good-quality wafers come from path A1. This conclusion indicates that some interaction between Litho-A and Etch-2 is problematic and should be identified and corrected to improve yield. For example, there may be a slight misalignment of the mask in the Litho-A operation that does not severely impact the quality of the wafer after processing in Etch-1. However, the misalignment in Litho-A may be propagated and further compounded by an additional misalignment in the etching step, and the combination of misalignments in Litho-A and Etch-2 causes the wafer to fail quality testing. Identifying path A2 as the culprit narrows the list of possible causes for off-quality wafers to lithography-related issues in Litho-A, etch-related issues in Etch-2, and the transport of wafers from Litho-A to Etch-2.
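The 90/10 split in this example translates into a log-odds of roughly ±2.2 per path, which is the quantity the prediction described later in this disclosure aggregates. A short Python check, using only the fractions stated above:

```python
import math

# Fractions from the example: 90% of path-A1 wafers are good and 10% bad;
# for path A2 the numbers are reversed.
p_good_given_A1, p_bad_given_A1 = 0.90, 0.10
p_good_given_A2, p_bad_given_A2 = 0.10, 0.90

# Log-odds of each path producing a good wafer (positive favors "good").
log_odds_A1 = math.log(p_good_given_A1 / p_bad_given_A1)  # ~ +2.197
log_odds_A2 = math.log(p_good_given_A2 / p_bad_given_A2)  # ~ -2.197
```

The symmetry of the two values reflects the mirrored quality fractions of the two paths.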
Thus, a model can be created to evaluate the probabilities at each transition from one step of the process to another step for a particular process path. The transition probabilities are then aggregated to check the performance of the model. If the model produces results that match the production results, then the individual probabilities of each process step can be reviewed to identify the process paths (as event sequences) that lead to anomalous results.
The model can be based on known classification and anomaly detection models for event sequences, including but not limited to a Naïve Bayes classifier, a Markov chain (MC), a hidden Markov model (HMM), and a recurrent neural network (RNN), and is trained on production data from that process path.
As a high-level example, a machine learning model is configured based on a Markov chain stochastic model to evaluate transitions from one state to the next state, the state transition representing a step in the wafer processing path from one piece of equipment (or chamber) to the next piece of equipment (or chamber) in the processing recipe.
The model can be generalized by an example using a portion of a processing path that proceeds through state i to state j and then onward to some final state k. For example, equation (1) below computes a fraction T_goodlot of normal/good wafers that pass to state j from state i, as measured by metrology and other common statistical indicators. The fraction T_goodlot is equal to the count of normal-quality wafers that pass from state i to state j, divided by the sum of counts of normal-quality wafers that pass from state i to each reachable state k:

T_goodlot(i, j) = CNT_good(i → j) / Σ_k CNT_good(i → k)  (1)
Similarly, equation (2) below computes a fraction T_badlot of off-quality/bad wafers that pass to state j from state i. The fraction T_badlot is equal to the count of off-quality wafers that pass from state i to state j, divided by the sum of counts of off-quality wafers that pass from state i to each reachable state k:

T_badlot(i, j) = CNT_bad(i → j) / Σ_k CNT_bad(i → k)  (2)
The final prediction W for each wafer lot is the sum of the log-odds of the transitions along the path, as shown in equation (3) below:

W = Σ_(i→j) log( T_goodlot(i, j) / T_badlot(i, j) )  (3)

The more positive the final prediction W, the more likely it is that the entirety of the processing path leads to normal-quality wafers. The more negative the final prediction W, the more likely it is that the processing path leads to off-quality wafers.
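A minimal Python sketch of equations (1) through (3), assuming hypothetical transition counts that match the two-path example (function and variable names are illustrative, not part of the disclosure):

```python
import math
from collections import defaultdict

def transition_fractions(counts):
    """Equations (1)/(2): fraction of wafers of one class that move i -> j,
    normalized over all next states reachable from i."""
    totals = defaultdict(int)
    for (i, j), c in counts.items():
        totals[i] += c
    return {(i, j): c / totals[i] for (i, j), c in counts.items()}

def predict_w(path, t_good, t_bad):
    """Equation (3): sum of log-odds over the transitions in a path.
    Positive W suggests normal wafers; negative suggests off-quality."""
    w = 0.0
    for i, j in zip(path, path[1:]):
        w += math.log(t_good[(i, j)] / t_bad[(i, j)])
    return w

# Hypothetical counts matching the two-path example above.
good_counts = {("Litho-A", "Etch-1"): 90, ("Litho-A", "Etch-2"): 10}
bad_counts = {("Litho-A", "Etch-1"): 10, ("Litho-A", "Etch-2"): 90}
t_good = transition_fractions(good_counts)
t_bad = transition_fractions(bad_counts)

w_a2 = predict_w(["Litho-A", "Etch-2"], t_good, t_bad)  # negative: bad path
```

Because path A2 carries 90% of the bad wafers, its single transition contributes a strongly negative log-odds, driving W below zero.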
For processing paths that produce more negative results, equation (2) provides a quantification of off-quality at each step along the path, and can be reviewed and analyzed to identify significant fractions T_badlot that require investigation for corrective action. As noted above, identifying anomalous transitions narrows the list of possible causes, often to the point where the likely root cause can be determined directly. Further, the results of computations from each transition can be provided as inputs to a hierarchical model configured for determining root cause.
Example data for a prediction W on a training set is shown in the figure.
This is a promising result for identifying the problematic equipment chains. For example, the model detects that when the wafer goes through a particular sequence of equipment processing, such as equipment A to equipment B to equipment C in that order, the wafer has a statistically significant increase in probability that it will be a bad wafer. This knowledge of sequence and transition probabilities will help the customer identify the likely root cause for the bad wafer. Outputs from the model can be provided as inputs to a second model configured for root cause determination, where the equipment-history based inputs can provide significant predictive ability for root cause determinations. Outputs from the model can also be provided to a third model configured to improve the predictability of the first model, for example, through feature engineering and selection to limit inputs to those having significant predictive ability.
The model needs to handle cases where there is uncertainty in the transition probability due to a small sample size. To do so, transitions that are not statistically significant are removed from the average transition probabilities. Further, a prior (pseudo-count) can be added to all transition counts T. For example, the initial transition counts can be set as follows:
CNT_good(state i → state j) = 1
CNT_bad(state i → state j) = a

where a is the ratio of bad lots to good lots. The transition probability can be recomputed when the model detects a significant change.
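This pseudo-count initialization can be sketched as follows; the lot totals and transition names are hypothetical, used only to show the mechanics:

```python
# Sketch of the pseudo-count ("prior") initialization described above.
n_good_lots, n_bad_lots = 900, 90
a = n_bad_lots / n_good_lots  # ratio of bad lots to good lots (0.1 here)

transitions = [("Litho-A", "Etch-1"), ("Litho-A", "Etch-2")]

# Every transition starts with a small pseudo-count so that rarely seen
# transitions do not yield extreme (0 or 1) probabilities.
cnt_good = {t: 1.0 for t in transitions}
cnt_bad = {t: a for t in transitions}

# Observed counts are then accumulated on top of the pseudo-counts.
cnt_good[("Litho-A", "Etch-1")] += 90
cnt_bad[("Litho-A", "Etch-1")] += 10
```

Scaling the bad-lot pseudo-count by a keeps the prior consistent with the overall class imbalance, so an unobserved transition defaults toward the population-level odds rather than toward 50/50.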
A hidden Markov model assumes a Markov model with unobservable hidden states. For this application, the internal hidden states can represent wafer quality, which produces observables such as intermediate Wafer Acceptance Test (WAT) or Process Control Monitoring (PCM) data, i.e., test measurements from test structures built in scribe lines and collected during manufacturing steps. Other possible inputs include, but are not limited to, defect data, metrology data, and FDC indicators. The transition probabilities for this hidden Markov model could be configured to depend on different processing path scenarios, for example, on the current equipment in use, the current manufacturing process step, pairs of equipment (what was used previously and what will be used next), process step pairs, etc.
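A toy forward-algorithm sketch illustrates how such a hidden Markov model scores a sequence of observables. The state names, probabilities, and binary WAT pass/fail observable here are all illustrative assumptions, not measured values:

```python
# Minimal forward-algorithm sketch for a two-state HMM, assuming hidden
# quality states {"in-spec", "drifting"} and a binary WAT observable.
states = ["in-spec", "drifting"]
start = {"in-spec": 0.8, "drifting": 0.2}
trans = {
    "in-spec": {"in-spec": 0.9, "drifting": 0.1},
    "drifting": {"in-spec": 0.2, "drifting": 0.8},
}
emit = {
    "in-spec": {"pass": 0.95, "fail": 0.05},
    "drifting": {"pass": 0.40, "fail": 0.60},
}

def forward(observations):
    """Total probability of an observation sequence under the HMM."""
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {
            j: sum(alpha[i] * trans[i][j] for i in states) * emit[j][obs]
            for j in states
        }
    return sum(alpha.values())

p = forward(["pass", "fail", "fail"])  # likelihood of this WAT sequence
```

A low likelihood for an observed WAT sequence would flag the associated processing path for closer review.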
The modeling of transition probabilities is facilitated by the emergence of parallel processing architectures and the advancement of Machine Learning algorithms, which allow users to model problems, gain insights, and make predictions using massive amounts of data at speeds that make such approaches relevant and realistic. Machine Learning is a branch of artificial intelligence that involves the construction and study of systems that can learn from data. These types of algorithms, along with parallel processing capabilities, allow much larger datasets to be processed and are much better suited to multivariate analysis in particular.
The creation and use of processor-based models for implementing classification and anomaly detection methods, including computing transition probabilities as described herein, can be desktop-based, i.e., standalone, or part of a networked system; but given the heavy loads of information to be processed and displayed with some interactivity, processor capabilities (CPU, RAM, etc.) should be current state-of-the-art to maximize effectiveness. Additionally, these computations are highly parallelizable in a map-reduce manner, i.e., the computations can run easily in Big Data ecosystems. In the semiconductor foundry environment, the Exensio® analytics platform is a useful choice for building interactive GUI templates. In one embodiment, coding of the processing routines may be done using Spotfire® analytics software version 7.11 or above, which is compatible with the Python object-oriented programming language, used primarily for coding machine learning models.
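The transition-counting step parallelizes naturally in the map-reduce style noted above; a toy sketch in plain Python, with hypothetical shard contents, shows the shape of the computation:

```python
from collections import Counter
from functools import reduce

# Map-reduce style: each worker counts transitions over one shard of the
# equipment-history log, then the partial counts are merged (reduced).
def map_count(shard):
    c = Counter()
    for path in shard:
        c.update(zip(path, path[1:]))
    return c

# Hypothetical shards of wafer paths distributed across workers.
shards = [
    [["Litho-A", "Etch-1"], ["Litho-A", "Etch-2"]],
    [["Litho-B", "Etch-1"]],
]
partials = [map_count(s) for s in shards]     # map phase
total = reduce(lambda a, b: a + b, partials)  # reduce phase
```

Because Counter addition is associative, the reduce phase can be applied in any grouping across a cluster without changing the result.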
The foregoing description has been presented for the purpose of illustration only—it is not intended to be exhaustive or to limit the disclosure to the precise form described. Many modifications and variations are possible in light of the above teachings.
This application claims priority from U.S. Provisional Application No. 63/071,981 entitled Event Sequence Driven Approach to Determine Quality of Wafer Path for Semiconductor Applications, filed Aug. 28, 2020, and incorporated herein by reference in its entirety.