This application claims priority under 35 U.S.C. § 119 to patent application no. DE 10 2023 207 670.5, filed on Aug. 10, 2023 in Germany, the disclosure of which is incorporated herein by reference in its entirety.
The disclosure relates to a method for teaching a machine learning system to predict failure probabilities of chips on a wafer, and to a method for predicting the failure probabilities of the chips with the taught machine learning system.
It is generally known that machine learning (ML) methods can be suitable for predicting defects in components during production.
Currently, predicting the failure probability or defect probability of an individual semiconductor chip during final testing based on wafer-level test measurements is not possible due to the loss of traceability between processes. In particular, the link between a single chip for which wafer-level measurements have been performed and the associated final test result is lost after the wafer has been cut, making the training of supervised models impossible.
It is therefore an object of the disclosure to provide a prediction of the probability of an individual chip being defective based on test measurements at wafer level.
By predicting the failure probability, chips with a high failure probability can be removed early in the manufacturing process or, depending on the failure probabilities of all chips in a batch, an intelligent combination of chips can be determined so that the end product has a low failure probability.
As a result, the disclosure can reduce waste, rework and the additional costs associated with manufacturing and shipping defective products. By detecting and eliminating defects at an early stage, manufacturers can improve product quality and reduce the reject rate.
In a first aspect, the disclosure relates to a method for teaching a machine learning system to predict failure probabilities of the chips on a wafer during final testing or, in particular, later during operation of the chips.
The method begins by providing a training data set comprising wafer-level test measurements and an associated final test yield of a plurality of wafers. The final test yield can be understood as the quotient of the number of chips on the wafer that have passed the final test and the total number of chips on the wafer tested in the final test.
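As a minimal illustration of this quotient (an assumption-laden sketch, not part of the claimed method), the final test yield of a wafer could be computed from hypothetical per-chip pass/fail results as follows:

```python
def final_test_yield(pass_flags):
    """Final test yield: chips that passed the final test divided by the
    total number of chips tested in the final test.

    pass_flags: iterable of booleans, one entry per tested chip
    (hypothetical input format, for illustration only).
    """
    flags = list(pass_flags)
    return sum(flags) / len(flags)

# Example: 94 of 100 tested chips pass the final test -> yield = 0.94
print(final_test_yield([True] * 94 + [False] * 6))
```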
The following steps are then repeated several times until a predefined termination criterion is met: wafer-level test measurements of the wafers of a lot are drawn from the training data set and processed as input by the machine learning system in order to obtain a failure probability for each chip; the failure probabilities are aggregated into a single lot yield; and the aggregated lot yield is compared with the associated final test yield in order to adjust the parameters of the machine learning system.
What is surprising here is that this simplified training task of yield prediction is sufficient for the machine learning system to learn during training which chips will have a high or low failure probability, and that the predicted failure probabilities essentially correspond to the actual failure probabilities on a validation data set.
The disclosure of the first aspect can be used to mark defective chips at an early stage of the manufacturing process, so that the chip is not further processed or is removed after the dicing process (i.e. cutting the individual chips out of the wafer). Another application is using the failure probability to perform smart sorting, in which chips with a high failure probability are packaged together. This reduces the risk of multiple good chips being packaged together with a single bad chip, which would result in the entire package being discarded.
It is also conceivable within the scope of the disclosure that the assignment of the components to the functional units comprises sorting them into at least two or three different classes according to the probability of a failure or defect occurring.
Furthermore, it is advantageous within the scope of the disclosure if the assignment combines the components into the functional units depending on their class, so that preferably each of the functional units contains only components of the same class. This increases the likelihood that a functional unit which is sorted out and, in particular, disposed of comprises more than one defective component. Alternatively or additionally, the assignment of the components to the functional units can be carried out in such a way that the components are combined into the functional units depending on their failure probability, so that preferably the probability is maximized that, in the case of a failed functional unit, more than one defective component in the functional unit is responsible for this failure.
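A minimal sketch of such a class-based assignment, assuming per-chip failure probabilities are already available; the class thresholds, class names and the fixed number of components per functional unit are illustrative assumptions, not taken from the disclosure:

```python
def assign_to_functional_units(failure_probs, unit_size=4, thresholds=(0.05, 0.2)):
    """Sort components into classes by failure probability and combine
    components of the same class into functional units.

    failure_probs: per-component failure probabilities (list of floats)
    unit_size: components per functional unit (illustrative assumption)
    thresholds: class boundaries for low/medium/high (illustrative assumption)
    """
    low, high = thresholds
    classes = {"low": [], "medium": [], "high": []}
    for idx, p in enumerate(failure_probs):
        if p < low:
            classes["low"].append(idx)
        elif p < high:
            classes["medium"].append(idx)
        else:
            classes["high"].append(idx)

    units = []
    for label, members in classes.items():
        # Only components of the same class end up in the same functional unit.
        for i in range(0, len(members), unit_size):
            units.append((label, members[i:i + unit_size]))
    return units
```

Grouping by class in this way concentrates high-risk components in a few functional units, so that a failed unit is more likely to contain more than one defective component.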
Another object of the disclosure is a computer program, in particular a computer program product, comprising instructions which, when the computer program is executed by a computer, cause the computer to carry out the method according to the disclosure. The computer program according to the disclosure thus brings with it the same advantages as have been described in detail with reference to a method according to the disclosure.
The disclosure also relates to a device for data processing which is configured to carry out the method according to the disclosure. The device can be a computer, for example, that executes the computer program according to the disclosure. The computer can comprise at least one processor for executing the computer program. A non-volatile data memory can be provided as well, in which the computer program can be stored and from which the computer program can be read by the processor for execution.
An object of the disclosure can also be a computer-readable storage medium comprising the computer program according to the disclosure. The storage medium is configured as a data memory such as a hard drive and/or a non-volatile memory and/or a memory card, for example. The storage medium can, for example, be integrated into the computer.
In addition, the method according to the disclosure can also be designed as a computer-implemented method.
Further advantages, features and details of the disclosure will emerge from the following description, in which embodiment examples of the disclosure are described in detail with reference to the drawings. The features mentioned in the claims and in the description can each be essential to the disclosure individually or in any combination. Shown is:
The FIGURE is a schematic visualization of a method, a device, a storage medium and a computer program according to embodiment examples of the disclosure.
In the manufacture of semiconductor wafers, the wafers are tested after various production steps. Before the wafer is broken down into individual chips and the chips are packaged into the end product, the “finished” wafers are tested in what is known as wafer-level testing or EWS (Electrical Wafer Sorting). In this step, the position of the individual chips on the wafer is still known.
After dicing and packaging, the chips are tested again in the so-called final test. Since the position of the chips is lost after the dicing process, it is no longer possible to trace the chips back to their original coordinates on the wafer, which makes the training of supervised machine learning models impossible. Even though the individual position of each chip is lost, the information as to which wafers as a whole are part of a test lot is still available.
In the following, a method is proposed that enables the prediction of the failure probability of individual chips in the final test based on wafer-level measurements.
For this purpose, a neural network architecture is used that receives a series of wafer-level measurements of a wafer as input and predicts a failure probability for each chip on the wafer (e.g. by using a sigmoid activation function in the last layer). For each matching set of wafers and final test lots, the average of the chip predictions can be calculated to obtain a predicted yield of the chips on the wafer.
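A minimal sketch of such an architecture, assuming PyTorch; the feature dimension, layer sizes and the use of a single hidden layer are illustrative assumptions, not taken from the disclosure:

```python
import torch
import torch.nn as nn

class ChipFailureNet(nn.Module):
    """Maps per-chip wafer-level measurements to a per-chip failure probability."""

    def __init__(self, num_features: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # sigmoid in the last layer -> probability in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_chips, num_features) wafer-level measurements of one wafer
        return self.net(x).squeeze(-1)  # (num_chips,) failure probabilities

model = ChipFailureNet(num_features=16)
measurements = torch.randn(500, 16)      # dummy measurements: 500 chips, 16 features
p_fail = model(measurements)             # one failure probability per chip
predicted_yield = (1.0 - p_fail).mean()  # predicted yield as mean pass probability
```

Here the predicted yield is taken as the mean of the per-chip pass probabilities (1 minus the failure probability); whether the network outputs pass or failure probabilities is an implementation choice.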
For this purpose, the neural network is trained so that it can predict a yield of chips after the final test. The training is carried out as shown in the FIGURE.
According to embodiment examples of the disclosure, the FIGURE illustrates a method 100 for teaching a machine learning system, in particular a neural network, to predict failure probabilities of chips on a wafer. Also shown is a computer program 20, a storage medium 40 and a device 10 for data processing according to embodiment variants of the disclosure.
The method 100 begins by providing 101 a training data set comprising wafer-level test measurements at the chip level and an associated final test yield of a plurality of wafers.
The subsequent steps are then carried out several times until a predefined termination criterion is met:
Wafer-level test measurements of the wafers of a lot are drawn from the training data set. The drawn wafer-level measurements are processed as input by the machine learning system, in particular propagated through the neural network, to obtain a failure probability for each chip.
All failure probabilities are then aggregated into a single lot yield (e.g. by using the average). The aggregated lot yield is compared with the associated final test yield from the training data set, and the parameters of the machine learning system are adjusted accordingly.
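A minimal training-loop sketch for these repeated steps, reusing the hypothetical ChipFailureNet from the sketch above; the mean-squared-error loss, the Adam optimizer and a fixed number of epochs as termination criterion are illustrative assumptions:

```python
import torch
import torch.nn as nn

def train(model: nn.Module, training_lots, epochs: int = 10, lr: float = 1e-3):
    """training_lots: list of (measurements, final_test_yield) pairs, where
    measurements is a (num_chips, num_features) tensor for the wafers of one
    lot and final_test_yield is the observed yield of that lot in [0, 1]."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):  # termination criterion: fixed epoch count (assumption)
        for measurements, final_test_yield in training_lots:
            p_fail = model(measurements)             # failure probability per chip
            predicted_yield = (1.0 - p_fail).mean()  # aggregate to a single lot yield
            target = torch.tensor(final_test_yield)  # yield observed in the final test
            loss = loss_fn(predicted_yield, target)  # compare predicted and observed yield
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```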
After training, the neural network can be used to predict a failure probability for each chip in the final test.
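For example, continuing the sketches above with dummy data in place of real wafer-level measurements (the 0.5 decision threshold is an illustrative assumption):

```python
model = train(ChipFailureNet(num_features=16),
              training_lots=[(torch.randn(500, 16), 0.94)])  # one dummy lot
new_wafer = torch.randn(500, 16)   # wafer-level measurements of a new wafer (dummy)
with torch.no_grad():
    failure_probs = model(new_wafer)               # one failure probability per chip
risky = (failure_probs > 0.5).nonzero().flatten()  # candidate chips to mark or sort out
```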
Number | Date | Country | Kind
---|---|---|---
10 2023 207 670.5 | Aug 2023 | DE | national