The present technology relates to a polishing apparatus, an information processing system, a polishing method, and a computer-readable storage medium.
Polishing apparatuses that polish substrates (for example, wafers) are known. For example, Patent Document 1 discloses a polishing apparatus that includes a polishing table provided with a polishing member and configured to be rotatable, and a polishing head facing the polishing table and configured to be rotatable, wherein a substrate is attachable to the surface of the polishing head facing the polishing table.
In a polishing apparatus, the polishing condition may deteriorate. For example, a consumable member of the polishing apparatus (for example, a polishing pad, which is an example of the polishing member) may wear, resulting in deterioration of the table condition. When the polishing condition deteriorates in this way, the profile of the film thickness of the substrate after polishing (also referred to as a residual film) deteriorates (for example, the variation in film thickness becomes large). In such a case, in order to check whether or not the product is defective, a film thickness measuring device measures the film thickness or the film thickness profile after polishing for all polished substrates, which requires many man-hours. In particular, when only one film thickness measuring device is provided for a plurality of polishing apparatuses, measuring all the polished substrates makes the measurement time of the film thickness measuring device a bottleneck, and the throughput decreases. It is also common practice to sample only some substrates and measure their film thicknesses, or to reduce the measurement time of the film thickness measuring device (ITM) by reducing the number of measurement points per substrate. However, both methods may miss defective products and affect the yield of the product, and thus are not preferable.
The present technology has been made in view of the above problems, and it is desired to provide a polishing apparatus, an information processing system, and a program capable of improving the throughput or yield without missing defective products.
A polishing apparatus according to one embodiment, the polishing apparatus that is capable of referring to a storage in which a machine learning model is stored, the machine learning model being learned using learning data in which a feature amount of a signal regarding a frictional force between a polishing member and a substrate in polishing or a feature amount of a temperature of the polishing member or the substrate in polishing is input, and data regarding a film thickness of the polished substrate or a parameter related to yield of a product included in the polished substrate is output, the device comprising:
a polishing table provided with the polishing member and configured to be rotatable;
a polishing head facing the polishing table and configured to be rotatable, wherein the substrate is attachable to a surface facing the polishing table;
a control unit configured to perform control to polish the substrate by pressing the substrate against the polishing member while rotating the polishing table and the polishing head having the substrate attached thereto; and
a processor configured to generate the feature amount from the signal regarding the frictional force between the polishing member and the substrate in polishing or the feature amount from the temperature of the polishing member or a target substrate in polishing, and output the data regarding the film thickness of the polished substrate or any of the parameters related to the yield of the product included in the polished substrate, as an estimated value, by inputting the generated feature amount to the learned machine learning model.
An information processing system according to one embodiment, the information processing system that is capable of referring to a storage in which a machine learning model is stored, the machine learning model being learned using learning data in which a feature amount of a signal regarding a frictional force between a polishing member and a substrate in polishing or a feature amount of a temperature of the polishing member or the substrate in polishing is input, and data regarding a film thickness of the polished substrate or a parameter related to yield of a product included in the polished substrate is output, the system comprising:
a generation unit configured to generate the feature amount from the signal regarding the frictional force between the polishing member and the substrate in polishing or the feature amount from the temperature of the polishing member or a target substrate in polishing; and
an estimation unit configured to output the data regarding the film thickness of the polished substrate or any of the parameters related to the yield of the product included in the polished substrate, as an estimated value, by inputting the generated feature amount to the learned machine learning model.
A polishing method according to one embodiment, the polishing method for polishing a substrate by a polishing apparatus that is capable of referring to a storage in which a machine learning model is stored, the machine learning model being learned using learning data in which a feature amount of a signal regarding a frictional force between a polishing member and a substrate in polishing or a feature amount of a temperature of the polishing member or the substrate in polishing is input, and data regarding a film thickness of the polished substrate or a parameter related to yield of a product included in the polished substrate is output, the method comprising:
polishing the substrate by pressing the substrate against the polishing member while rotating the polishing table and the polishing head having the substrate attached thereto;
generating the feature amount by measuring the signal regarding the frictional force between the polishing member and the substrate in polishing or the temperature of the polishing member or a target substrate in polishing;
inputting the generated feature amount to the learned machine learning model; and
outputting the data regarding the film thickness of the polished substrate or any of the parameters related to the yield of the product included in the polished substrate, as an estimated value.
A computer-readable storage medium according to one embodiment, the computer-readable storage medium storing a program causing a computer that is capable of referring to a storage in which a machine learning model is stored, the machine learning model being learned using learning data in which a feature amount of a signal regarding a frictional force between a polishing member and a substrate in polishing or a feature amount of a temperature of the polishing member or the substrate in polishing is input, and data regarding a film thickness of the polished substrate or a parameter related to yield of a product included in the polished substrate is output, to function as:
a generation unit configured to generate the feature amount from the signal regarding the frictional force between the polishing member and the substrate in polishing or the feature amount from the temperature of the polishing member or a target substrate in polishing; and
an estimation unit configured to output the data regarding the film thickness of the polished substrate or any of the parameters related to the yield of the product included in the polished substrate, as an estimated value, by inputting the generated feature amount to the learned machine learning model.
Hereinafter, embodiments will be described with reference to the drawings. More detailed description than necessary may be omitted. For example, detailed description of already well-known matters and repetitive description for substantially the identical configuration may be omitted. This is to avoid unnecessary redundancy of the following description and to facilitate the understanding of those skilled in the art.
A polishing apparatus according to a 1st aspect of one embodiment, the polishing apparatus that is capable of referring to a storage in which a machine learning model is stored, the machine learning model being learned using learning data in which a feature amount of a signal regarding a frictional force between a polishing member and a substrate in polishing or a feature amount of a temperature of the polishing member or the substrate in polishing is input, and data regarding a film thickness of the polished substrate or a parameter related to yield of a product included in the polished substrate is output, the device comprises: a polishing table provided with the polishing member and configured to be rotatable; a polishing head facing the polishing table and configured to be rotatable, wherein the substrate is attachable to a surface facing the polishing table; a control unit configured to perform control to polish the substrate by pressing the substrate against the polishing member while rotating the polishing table and the polishing head having the substrate attached thereto; and a processor configured to generate the feature amount from the signal regarding the frictional force between the polishing member and the substrate in polishing or the feature amount from the temperature of the polishing member or a target substrate in polishing, and output the data regarding the film thickness of the polished substrate or any of the parameters related to the yield of the product included in the polished substrate, as an estimated value, by inputting the generated feature amount to the learned machine learning model.
According to this configuration, an estimated value of data regarding the film thickness of the polished substrate or an estimated value of a parameter related to the yield of a product included in the polished substrate is obtained during polishing by the polishing apparatus. Thus, it is possible to recognize the state of the substrate after polishing without measuring the film thickness, and to reduce the number of film thickness measurements. Accordingly, it is possible to improve the throughput without missing a defective product; specifically, the film thickness measurement can be omitted when polishing is normal. In addition, it is possible to detect or predict defects by estimating the parameter related to the yield. Further, it is possible to improve the yield by updating a polishing parameter in accordance with the parameter related to the yield.
A polishing apparatus according to a 2nd aspect of one embodiment, the polishing apparatus according to the 1st aspect, wherein the processor stops processing for a subsequent substrate when the output estimated value satisfies a predetermined polishing deterioration condition.
According to this configuration, when a polishing state deteriorates, processing for the subsequent substrate is stopped, so that it is possible to perform the maintenance such as replacement of the polishing member. Thus, it is possible to prevent the polishing state from further deterioration.
A polishing apparatus according to a 3rd aspect of one embodiment, the polishing apparatus according to the 1st aspect, further comprising a film thickness measuring device configured to measure the film thickness of the substrate, wherein the processor controls the film thickness measuring device to measure a film thickness of the polished target substrate when the output estimated value satisfies a predetermined polishing deterioration condition, and controls the film thickness measuring device not to measure the film thickness of the polished target substrate when the output estimated value does not satisfy the predetermined polishing deterioration condition.
According to this configuration, when the polishing state deteriorates, the film thickness of the substrate is measured, so that it is possible to determine whether or not the polishing is successful. In addition, when the polishing state does not deteriorate, the film thickness of the substrate is set not to be measured, and thus it is possible to improve the throughput.
A polishing apparatus according to a 4th aspect of one embodiment, the polishing apparatus according to the 1st aspect, wherein the processor outputs a maintenance timing by using a tendency of the estimated value output for the polished substrate at a plurality of different times.
According to this configuration, it is possible to predict a timing at which the polishing state deteriorates, and perform the maintenance such as replacement of the polishing member at this timing. Thus, it is possible to prevent the polishing state from further deterioration.
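One possible reading of "a tendency of the estimated value" is a linear trend fitted to the estimates for successive substrates and extrapolated to a limit. The following is a minimal sketch under that assumption; the function name `wafers_until_limit`, the use of a least-squares line, and the assumption that the estimate (for example, the film thickness range) grows as the polishing state deteriorates are all illustrative and are not taken from the embodiments.

```python
from typing import Optional, Sequence


def wafers_until_limit(estimates: Sequence[float], limit: float) -> Optional[float]:
    """Extrapolate a least-squares line through the estimates (x = wafer index)
    and return how many more wafers remain until the line reaches `limit`.

    Returns None when fewer than two estimates are given or when the trend is
    flat or decreasing (hypothetical convention: larger value = worse state).
    """
    n = len(estimates)
    if n < 2:
        return None
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(estimates) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, estimates))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_index = (limit - intercept) / slope
    # Count from the most recent wafer; never report a negative remainder.
    return max(0.0, crossing_index - (n - 1))
```

For example, `wafers_until_limit([1.0, 2.0, 3.0, 4.0], 10.0)` extrapolates a slope of 1 per wafer and reports 6 remaining wafers, which could then be presented as the maintenance timing.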
A polishing apparatus according to a 5th aspect of one embodiment, the polishing apparatus according to the first aspect, wherein the processor performs control to issue a warning for urging a maintenance, when the output estimated value satisfies a predetermined polishing deterioration condition.
According to this configuration, when the polishing state deteriorates, it is possible to perform the maintenance such as replacement of the polishing member, and thus to prevent the polishing state from further deterioration.
A polishing apparatus according to a 6th aspect of one embodiment, the polishing apparatus according to the 1st aspect, wherein the processor adjusts a polishing condition for a subsequent substrate in accordance with the output estimated value so that data regarding a desired film thickness of a polished substrate or a parameter related to desired yield of a product included in the polished substrate is obtained.
According to this configuration, it is possible to change the polishing condition for the subsequent substrate so that the polishing state is improved. Thus, it is possible to maintain the favorable polishing state for a longer time.
A polishing apparatus according to a 7th aspect of one embodiment, the polishing apparatus according to the first aspect, wherein the processor learns the machine learning model again using the feature amount during an operation of the polishing apparatus.
According to this configuration, it is possible to improve the estimation accuracy.
An information processing system according to an 8th aspect of one embodiment, the information processing system that is capable of referring to a storage in which a machine learning model is stored, the machine learning model being learned using learning data in which a feature amount of a signal regarding a frictional force between a polishing member and a substrate in polishing or a feature amount of a temperature of the polishing member or the substrate in polishing is input, and data regarding a film thickness of the polished substrate or a parameter related to yield of a product included in the polished substrate is output, the system comprises: a generation unit configured to generate the feature amount from the signal regarding the frictional force between the polishing member and the substrate in polishing or the feature amount from the temperature of the polishing member or a target substrate in polishing; and an estimation unit configured to output the data regarding the film thickness of the polished substrate or any of the parameters related to the yield of the product included in the polished substrate, as an estimated value, by inputting the generated feature amount to the learned machine learning model.
According to this configuration, an estimated value of data regarding the film thickness of the polished substrate or an estimated value of a parameter related to the yield of a product included in the polished substrate is obtained during polishing by the polishing apparatus. Thus, it is possible to recognize the state of the substrate after polishing without measuring the film thickness, and to reduce the number of film thickness measurements. Accordingly, it is possible to improve the throughput without missing a defective product.
A polishing method according to a 9th aspect of one embodiment, the polishing method for polishing a substrate by a polishing apparatus that is capable of referring to a storage in which a machine learning model is stored, the machine learning model being learned using learning data in which a feature amount of a signal regarding a frictional force between a polishing member and a substrate in polishing or a feature amount of a temperature of the polishing member or the substrate in polishing is input, and data regarding a film thickness of the polished substrate or a parameter related to yield of a product included in the polished substrate is output, the method comprising: polishing the substrate by pressing the substrate against the polishing member while rotating the polishing table and the polishing head having the substrate attached thereto; generating the feature amount by measuring the signal regarding the frictional force between the polishing member and the substrate in polishing or the temperature of the polishing member or a target substrate in polishing; inputting the generated feature amount to the learned machine learning model; and outputting the data regarding the film thickness of the polished substrate or any of the parameters related to the yield of the product included in the polished substrate, as an estimated value.
According to this configuration, an estimated value of data regarding the film thickness of the polished substrate or an estimated value of a parameter related to the yield of a product included in the polished substrate is obtained during polishing by the polishing apparatus. Thus, it is possible to recognize the state of the substrate after polishing without measuring the film thickness, and to reduce the number of film thickness measurements. Accordingly, it is possible to improve the throughput without missing a defective product.
A computer-readable storage medium according to a 10th aspect of one embodiment, the computer-readable storage medium stores a program causing a computer that is capable of referring to a storage in which a machine learning model is stored, the machine learning model being learned using learning data in which a feature amount of a signal regarding a frictional force between a polishing member and a substrate in polishing or a feature amount of a temperature of the polishing member or the substrate in polishing is input, and data regarding a film thickness of the polished substrate or a parameter related to yield of a product included in the polished substrate is output, to function as: a generation unit configured to generate the feature amount from the signal regarding the frictional force between the polishing member and the substrate in polishing or the feature amount from the temperature of the polishing member or a target substrate in polishing; and an estimation unit configured to output the data regarding the film thickness of the polished substrate or any of the parameters related to the yield of the product included in the polished substrate, as an estimated value, by inputting the generated feature amount to the learned machine learning model.
In addition to the above-described problems, there is also a problem that it takes time to determine the deterioration of the polishing condition (for example, table condition).
In the embodiments, the polishing state and the film thickness after polishing (also referred to as a residual film thickness), the statistical value (mean, maximum, minimum, and the like) of the film thickness, or the profile of the film thickness (also referred to as film thickness distribution) are determined from the change in a monitoring waveform during polishing. This makes it possible to estimate and manage favorable/defective polishing and the polishing condition (for example, table condition) in a timely manner. Therefore, when a defect occurs, the table condition can be adjusted before the next polishing is performed. Thus, it is possible to reduce the number of defectively polished samples. In the embodiments, a wafer will be described as an example of the substrate.
First, a first embodiment will be described.
The load/unload unit 2 includes two or more (four in this embodiment) front load units 20 on which a wafer cassette that stocks multiple wafers is mounted. An open cassette, a standard manufacturing interface (SMIF) pod, or a front opening unified pod (FOUP) can be mounted on the front load unit 20.
Here, the SMIF and the FOUP are airtight containers that can secure an environment independent from the external space by storing the wafer cassette therein and covering the wafer cassette with a partition wall. Here, description will be made on the assumption that a FOUP 21 is mounted on one of the front load units 20 as an example. The wafer is transferred from the load/unload unit 2 to the polishing apparatus 10 by a transport robot 22 (see Patent Document 1).
The film thickness measuring device 6 measures the film thickness of a substrate (here, wafer) or the profile of the film thickness (also referred to as the film thickness distribution). The film thickness measuring device 6 is, for example, an optical film thickness measuring device (also referred to as an ITM).
The polishing apparatus 10 includes an AI unit 4. The AI unit 4 outputs any of data regarding the film thickness of the polished substrate, the statistical value of the profile of the film thickness of the polished substrate, or a parameter (for example, yield rate) related to the yield of a product included in the polished substrate, as an estimated value. For example, when the estimated value is out of a predetermined normal polishing condition or satisfies a predetermined polishing deterioration condition, the AI unit 4 causes the film thickness measuring device 6 to measure the film thickness of the polished target substrate. For example, when the estimated value for a wafer W1 in
When the normal polishing condition is used, the AI unit 4 may be made to learn the normal data. When the polishing deterioration condition is used, the AI unit 4 may be made to learn the defective data. The ratio between the normal data and the defective data may be determined, and then the AI unit 4 may be made to perform learning.
The output of the AI unit 4 may be divided into three types of being normal, defective, and a defective candidate. When the output means a defective candidate, the film thickness is measured by the film thickness measuring device 6.
In addition to or in place of this, when a polishing time is longer than the normal range, the AI unit 4 may determine that the substrate is a defective candidate and measure the film thickness.
The output of the AI unit 4 may be divided into two types of being normal and a defective candidate.
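The three-way output described above (normal, defective, defective candidate), together with the polishing-time rule, can be sketched as a small routing function. This is only an illustration: the two thresholds `normal_upper` and `defective_lower`, the time limit `normal_time_upper`, and the rule that a larger estimate is worse are hypothetical and are not specified in the embodiments.

```python
from enum import Enum
from typing import Optional


class Judgement(Enum):
    NORMAL = "normal"
    DEFECTIVE = "defective"
    DEFECTIVE_CANDIDATE = "defective candidate"


def judge(
    estimate: float,
    normal_upper: float,
    defective_lower: float,
    polishing_time: Optional[float] = None,
    normal_time_upper: Optional[float] = None,
) -> Judgement:
    """Classify one polished substrate from its estimated value.

    All thresholds are hypothetical placeholders for this sketch.
    """
    # A polishing time longer than the normal range marks a defective candidate.
    if (
        polishing_time is not None
        and normal_time_upper is not None
        and polishing_time > normal_time_upper
    ):
        return Judgement.DEFECTIVE_CANDIDATE
    if estimate <= normal_upper:
        return Judgement.NORMAL
    if estimate >= defective_lower:
        return Judgement.DEFECTIVE
    # Anything between the two thresholds is uncertain and is sent for measurement.
    return Judgement.DEFECTIVE_CANDIDATE


def needs_measurement(judgement: Judgement) -> bool:
    """Only defective candidates are routed to the film thickness measuring device."""
    return judgement is Judgement.DEFECTIVE_CANDIDATE
```

The two-type variant corresponds to setting `defective_lower` to infinity, so that every substrate outside the normal range becomes a defective candidate.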
A polishing-liquid supply nozzle 60 is installed above the polishing table 100. A polishing liquid (polishing slurry) Q is supplied from the polishing-liquid supply nozzle 60 onto the polishing pad 101 on the polishing table 100.
The polishing head 1 is basically configured by a top ring body 2 and a retainer ring 3 as a retainer member. The top ring body 2 presses the semiconductor wafer W against the polishing surface 101a. The retainer ring 3 holds the outer peripheral edge of the semiconductor wafer W to prevent the semiconductor wafer W from popping out from the polishing head 1. The polishing head 1 is connected to a top ring shaft 111. The top ring shaft 111 moves up and down with respect to the top ring head 110 by an up-down movement mechanism 124. Positioning of the polishing head 1 in an up-down direction is performed by moving the top ring shaft 111 up and down to move the entirety of the polishing head 1 up and down with respect to the top ring head 110. A rotary joint 26 is attached to the upper end of the top ring shaft 111.
The up-down movement mechanism 124 that moves the top ring shaft 111 and the polishing head 1 up and down includes a bridge 128 that rotatably supports the top ring shaft 111 via a bearing 126, a ball screw 132 attached to the bridge 128, a support base 129 supported by a support column 130, and a servomotor 138 provided on the support base 129. The support base 129 that supports the servomotor 138 is fixed to the top ring head 110 via the support column 130.
The ball screw 132 includes a screw shaft 132a joined to the servomotor 138 and a nut 132b into which the screw shaft 132a is screwed. When the servomotor 138 is driven, the bridge 128 moves up and down via the ball screw 132, and thus the top ring shaft 111 and the polishing head 1 that move up and down integrally with the bridge 128 move up and down.
The top ring head 110 is supported by a top ring head shaft 117 that is rotatably supported by a frame (not shown). The polishing apparatus 10 includes a control unit 500 that is connected via control lines to the devices in the apparatus, including the top-ring rotation motor 114, the servomotor 138, and the table rotation motor 102, and controls these devices. The control unit 500 performs control to polish the substrate by pressing the substrate against the polishing member (here, polishing pad 101) while rotating the polishing table 100 and the polishing head 1 to which the substrate is attached.
The input of the machine learning model described later includes table rotation, head rotation, and rotation of a motor (not shown) for swinging the top ring head 110. One or more sensor detection values (for example, motor current value) or the calculated value of the torque calculated from the sensor detection value may be used as the input.
The polishing apparatus 10 includes the AI unit 4 connected to the control unit 500 via a wiring.
The storage 41 stores a machine learning model. The machine learning model is learned using learning data in which the feature amount of a signal regarding a frictional force between the polishing member (here, polishing pad 101) and the substrate in polishing is input, and data regarding the film thickness of the polished substrate, the statistical value of the profile of the film thickness of the polished substrate, or the parameter related to the yield of a product included in the polished substrate is output. The storage 41 stores a program to be read and executed by the processor 45.
Here, the signal regarding the frictional force between the polishing member and the substrate is, for example, a signal of the current value (also referred to as a table current monitor (TCM)) for calculating the torque of the table rotation motor 102 in polishing.
Here, the signal regarding the frictional force between the polishing member and the substrate may be the calculated value of the torque converted from the current value of the motor. The signal regarding the frictional force between the polishing member and the substrate may be a signal of the drive current value of the top-ring rotation motor 114 that rotates the polishing head 1, or a signal of the drive current value of the motor (not shown) that rotates the top ring head 110 (that is, top ring head shaft 117).
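As an illustration of converting a motor current value into a calculated torque value, the sketch below assumes the common linear approximation in which torque is proportional to current. The torque constant, the optional gear ratio, and the function name are assumptions for this example; the embodiments do not specify the conversion.

```python
from typing import List, Sequence


def torque_from_current(
    current_a: float, torque_constant_nm_per_a: float, gear_ratio: float = 1.0
) -> float:
    """Approximate torque [N*m] from motor current [A].

    Assumes a linear relation (torque = constant * current), scaled by an
    optional gear ratio. Both parameters are hypothetical here.
    """
    return torque_constant_nm_per_a * current_a * gear_ratio


def torque_series(
    currents_a: Sequence[float], torque_constant_nm_per_a: float
) -> List[float]:
    """Convert a sampled current waveform (for example, a TCM signal) to torque."""
    return [torque_from_current(i, torque_constant_nm_per_a) for i in currents_a]
```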
The polishing apparatus 10 may include a load cell that measures the frictional force between the polishing member and the substrate. In this case, the signal regarding the frictional force between the polishing member and the substrate may be a signal of the load cell. The polishing apparatus 10 may include a strain sensor that measures the strain of the substrate. In this case, the signal regarding the frictional force between the polishing member and the substrate may be a signal of the strain sensor.
The memory 42 is a medium that temporarily stores information.
The input unit 43 receives the information from the control unit 500 and outputs the received information to the processor 45.
The output unit 44 receives information from the processor 45 and outputs the received information to the control unit 500.
The processor 45 reads and executes the program from the storage 41 to function as a generation unit 451, an estimation unit 452, and a determination unit 453.
The generation unit 451 generates the feature amount from the signal regarding the frictional force between the polishing member and the substrate in polishing. Here, the term “in polishing” means, for example, a period during which the substrate is polished by pressing the substrate against the polishing member while rotating the polishing table 100 and the polishing head 1 having the substrate attached thereto. The details of this processing will be described later.
The estimation unit 452 outputs, as an estimated value, any of the data regarding the film thickness of the polished substrate or the parameter related to the yield of a product included in the polished substrate, by inputting the feature amount generated by the generation unit 451 to the learned machine learning model. The details of this processing will be described later. Here, the data regarding the film thickness of the polished substrate is, for example, any of the film thickness itself of the polished substrate, the statistical value (for example, mean value, maximum value, minimum value, variation width, and standard deviation of the film thickness distribution) of the film thickness profile of the polished substrate, and the film thickness profile of the polished substrate. Here, the film thickness profile is a film thickness data group (combination of XY coordinates and film thickness) in which a plurality of points are measured at different positions in the wafer.
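The statistical values listed above can be computed directly from a film thickness profile given as a group of (X, Y, film thickness) points. The following is a minimal sketch; the function name and the choice of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, Sequence, Tuple

# One measurement point: (x coordinate, y coordinate, film thickness).
Point = Tuple[float, float, float]


def profile_statistics(profile: Sequence[Point]) -> Dict[str, float]:
    """Summarize a film thickness profile measured at several wafer positions."""
    thickness = [p[2] for p in profile]
    return {
        "mean": mean(thickness),
        "max": max(thickness),
        "min": min(thickness),
        "range": max(thickness) - min(thickness),  # variation width
        "std": pstdev(thickness),  # population standard deviation (assumption)
    }
```

Any of these values, or the profile itself, could serve as the training target of the machine learning model.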
There are a plurality of chips in the wafer. A defect determination is performed for each chip, and the parameter related to the chip yield in the wafer can be calculated. The yield of the product included in the polished substrate described above is, for example, the chip yield in the wafer.
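As a simple illustration, the chip yield in a wafer can be taken as the fraction of chips that pass the defect determination. The function below is a sketch under that assumption; how each chip is judged is outside its scope.

```python
from typing import Sequence


def chip_yield(chip_is_good: Sequence[bool]) -> float:
    """Return the fraction of chips in one wafer judged non-defective."""
    if not chip_is_good:
        raise ValueError("at least one chip result is required")
    return sum(chip_is_good) / len(chip_is_good)
```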
As the lengths of arrows A12 and A13 are shorter than the lengths of arrows A11 and A14 in
The inventor of the present application has found that, when a film is polished non-uniformly, the timing at which an underlayer is exposed varies within the wafer surface, and thus a TCM signal (waveform of a descending curve as an example in
A cutout of a part of a signal used for calculating the feature amount from the TCM signal will be described with reference to
An example of the feature amount will be described with reference to
The feature amount T1_min-d_r10 is the minimum value of the derivative of the moving average of 10 pieces of TCM data in the predetermined period T1 in the period of the graph G2 in
T2_sum is the total value of the TCM in a period T2 after the period T1 in the period of the graph G2 in
The number of types of parameters to be used may be determined (for example, the top 10 parameters), and that number of parameters may be selected in descending order of correlation coefficient. Alternatively, an application condition may be determined.
The application condition may be, for example, a condition that a parameter having a correlation coefficient equal to or more than the mean value of the correlation coefficients is used, or a condition that a parameter having a correlation coefficient equal to or more than a value obtained by adding the standard deviation σ to the mean value of the correlation coefficients is used.
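Both selection rules can be sketched as follows, assuming the correlation coefficient between each feature amount and the target has already been computed and is supplied as a dictionary. The function names, the `n_sigma` parameter, and the use of the population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List


def select_top_k(correlations: Dict[str, float], k: int) -> List[str]:
    """Pick the k feature names with the highest correlation coefficient."""
    return sorted(correlations, key=correlations.get, reverse=True)[:k]


def select_by_condition(correlations: Dict[str, float], n_sigma: float = 0.0) -> List[str]:
    """Pick features whose coefficient is at least mean + n_sigma * sigma.

    n_sigma=0 gives the "mean value or more" condition;
    n_sigma=1 gives the "mean value plus standard deviation or more" condition.
    """
    values = list(correlations.values())
    threshold = mean(values) + n_sigma * pstdev(values)
    return [name for name, c in correlations.items() if c >= threshold]
```

For instance, with coefficients 0.9, 0.7, and 0.2 the mean is 0.6, so the first condition keeps two features while the mean-plus-σ condition keeps only the strongest one.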
All_range-d_r25 is the range of the derivative of the moving average of 25 pieces of TCM data in the entire period of the graph G2 in
T1_var-d_r25 is the variance of the derivative of the moving average of 25 pieces of TCM data in the predetermined period T1 of the period of the graph G2 in
T1_sum is the total TCM in the predetermined period T1 in the period of the graph G2 in
T1_mean-d_r10 is the mean of the derivative of the moving average of 10 pieces of TCM data in the predetermined period T1 in the period of the graph G2 in
T1_range-d_r10 is the range of the derivative of the moving average of 10 pieces of TCM data in the predetermined period T1 in the period of the graph G2 in
T2_mean-d_r5 is the mean of the derivative of the moving average of 5 pieces of TCM data in the period T2 after the period T1 in the period of the graph G2 in
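The feature amounts named above all follow one pattern: a statistic (minimum, mean, range, variance, or sum) of either the raw TCM data or the derivative of its moving average, taken over the period T1, the period T2, or the entire period. A minimal sketch of that pattern follows. The index boundaries of T1 and T2, the sample spacing `dt`, the choice to cut out the period before smoothing, and the use of the population variance are assumptions, since the text does not fix them.

```python
from statistics import mean, pvariance


def moving_average(signal, window):
    """Moving average over `window` consecutive samples (full windows only)."""
    return [mean(signal[i:i + window]) for i in range(len(signal) - window + 1)]


def derivative(signal, dt=1.0):
    """Finite-difference derivative between neighbouring samples."""
    return [(b - a) / dt for a, b in zip(signal, signal[1:])]


def smoothed_derivative(signal, window, dt=1.0):
    """Derivative of the moving average of `window` pieces of data."""
    return derivative(moving_average(signal, window), dt)


def tcm_features(tcm, t1, t2, dt=1.0):
    """Compute the named feature amounts from one TCM trace.

    t1 and t2 are hypothetical (start, stop) sample-index pairs for the
    periods T1 and T2; each period must be longer than the largest
    window applied to it.
    """
    seg1, seg2 = tcm[t1[0]:t1[1]], tcm[t2[0]:t2[1]]
    d_all_r25 = smoothed_derivative(tcm, 25, dt)
    d1_r10 = smoothed_derivative(seg1, 10, dt)
    d1_r25 = smoothed_derivative(seg1, 25, dt)
    d2_r5 = smoothed_derivative(seg2, 5, dt)
    return {
        "T1_min-d_r10": min(d1_r10),
        "T1_mean-d_r10": mean(d1_r10),
        "T1_range-d_r10": max(d1_r10) - min(d1_r10),
        "T1_var-d_r25": pvariance(d1_r25),
        "T1_sum": sum(seg1),
        "T2_sum": sum(seg2),
        "T2_mean-d_r5": mean(d2_r5),
        "All_range-d_r25": max(d_all_r25) - min(d_all_r25),
    }
```

Smoothing before differentiating keeps sample-to-sample noise from dominating the derivative, which is presumably why the features are defined on moving averages of 5, 10, or 25 samples rather than on the raw signal.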
The AI (artificial intelligence) model used by the estimation unit 452 of the AI unit 4 may be, for example, Light Gradient Boosting Machine (LightGBM) disclosed in Non-Patent Document 1. LightGBM is a gradient boosting machine learning model based on decision trees.
In the present embodiment, the data obtained by previous polishing is divided into learning data and test data, the AI is trained using only the learning data, estimation is then performed for all pieces of data by the AI, and the estimated values are compared with the actually measured values. The comparison results for the maximum film thickness value, the average film thickness value, and the film thickness range will be described below.
Test indicates the test data. In both cases, the estimation error is within a predetermined range.
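The evaluation procedure described above (divide the data, train on the learning data only, estimate for all data, compare with measured values) can be sketched as follows. To keep the example self-contained, a one-feature least-squares line stands in for the gradient boosting model. This stand-in, the 80/20 split ratio, and the function names are assumptions for illustration and are not part of the embodiment.

```python
import random
from statistics import mean


def split(records, test_ratio=0.2, seed=0):
    """Randomly divide (feature, measured_value) records into learning and test data."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Stand-in model: least-squares line y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    return slope, my - slope * mx


def max_abs_error(model, pairs):
    """Largest gap between the estimated and the measured value."""
    slope, intercept = model
    return max(abs(slope * x + intercept - y) for x, y in pairs)


def evaluate(records, allowed_error):
    """Train on the learning data only, then check both subsets."""
    learn, test = split(records)
    model = fit_line(learn)
    return {
        "Train": max_abs_error(model, learn) <= allowed_error,
        "Test": max_abs_error(model, test) <= allowed_error,
    }
```

Checking the test subset separately matters because it contains wafers the model never saw during training, so an error within the predetermined range there indicates that the model generalizes rather than memorizes.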
For example, the determination unit 453 may perform determination of measuring the film thickness when the AI-estimated value of the maximum film thickness standardized by the maximum allowable film thickness exceeds the first threshold value. Thus, when the AI-estimated value exceeds the first threshold value, the control of measuring the film thickness may be performed because there is a possibility that a wafer having uncut parts exceeding the maximum allowable film thickness value is included. Thus, it is possible to set the condition for measuring the film thickness without missing the defective product, by using the AI-estimated value as the determination value. In this example, by using the AI-estimated value, it was found that only about 25% of the substrates need to be measured. Specifically, the determination unit 453 may control one or more robots (for example, transporter 7, transport robot 22, and transport robot 53, see Patent Document 1) to move the wafer to the film thickness measuring device 6 after polishing.
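The determination described above can be written as a small predicate. The sketch below assumes that "standardized by the maximum allowable film thickness" means division by that value; the function names and the threshold value used in the usage note are hypothetical.

```python
def needs_measurement(estimated_max_thickness, max_allowable_thickness, first_threshold):
    """Return True when the wafer should be sent to the film thickness measuring device.

    The AI-estimated maximum film thickness is normalized by the maximum
    allowable film thickness and compared with the first threshold value.
    """
    if max_allowable_thickness <= 0:
        raise ValueError("max_allowable_thickness must be positive")
    normalized = estimated_max_thickness / max_allowable_thickness
    return normalized > first_threshold


def measured_ratio(estimates, max_allowable_thickness, first_threshold):
    """Fraction of wafers that would be measured under this rule."""
    flagged = [
        needs_measurement(e, max_allowable_thickness, first_threshold)
        for e in estimates
    ]
    return sum(flagged) / len(flagged)
```

For instance, with a hypothetical first threshold of 0.9, a wafer whose estimated maximum is 95% of the allowable value is measured and one at 80% is not. Setting the threshold below 1.0 leaves a margin for estimation error, which is how the rule avoids missing defective products.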
Next, an example of processing of stopping processing for the subsequent substrate when the estimated value output from the estimation unit 452 satisfies the predetermined polishing deterioration condition will be described with reference to
Here, the determination unit 453 determines whether or not the estimated value output by the estimation unit 452 satisfies the predetermined polishing deterioration condition. As an example, when the polishing deterioration condition is a condition that “the estimated value is out of the set range”, the determination unit 453 determines whether or not the estimated value output by the estimation unit 452 is out of the set range. In
(Step S110) Firstly, the processor 45 acquires a TCM signal when a wafer is polished.
(Step S120) Then, the generation unit 451 calculates the feature amount from the acquired TCM signal.
(Step S130) Then, the estimation unit 452 inputs the feature amount to the learned machine learning model stored in the storage 41, and outputs, for example, the estimated value of the standard deviation of the film thickness profile. Here, the learned machine learning model is, for example, a model trained on learning data in which the feature amount of the TCM signal is the input and the standard deviation of the film thickness profile is the output.
(Step S140) Then, the determination unit 453 determines whether or not the estimated value of the standard deviation of the film thickness profile is equal to or more than the set threshold value. When the standard deviation of the film thickness profile is not equal to or more than the set threshold value (that is, when the standard deviation of the profile is less than the set threshold value), the process returns to Step S110 and the subsequent processes are repeated.
(Step S150) On the other hand, when it is determined in Step S140 that the estimated value of the standard deviation of the film thickness profile is equal to or more than the set threshold value, the determination unit 453 controls the control unit 500 to stop the processing for the subsequent wafer. Thus, the control unit 500 performs control to stop the processing for the subsequent wafer.
As described above, when the estimated value output from the estimation unit 452 satisfies the predetermined polishing deterioration condition, the processor 45 may stop the processing for the subsequent substrate. Thus, when the polishing state deteriorates, processing for the subsequent substrate is stopped, so that maintenance such as replacement of the polishing member can be performed. Thus, it is possible to prevent the polishing state from deteriorating further.
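Steps S110 to S150 form a loop that continues wafer by wafer until the estimated value reaches the set threshold value. A minimal sketch is shown below. The callables `make_features` and `estimate_std` are hypothetical stand-ins for the generation unit 451 and the estimation unit 452, and the returned tuple is an illustrative way of reporting the stop.

```python
def run_until_deteriorated(tcm_signals, make_features, estimate_std, threshold):
    """Process wafers in order and stop before the wafer after a deteriorated one.

    Returns (index_of_last_polished_wafer, estimated_value) when the
    stop condition is met, or None when every wafer passed.
    """
    for index, tcm in enumerate(tcm_signals):      # S110: acquire TCM signal
        features = make_features(tcm)              # S120: calculate feature amount
        estimated = estimate_std(features)         # S130: estimate std of profile
        if estimated >= threshold:                 # S140: compare with threshold
            return index, estimated                # S150: stop subsequent wafers
    return None
```

Because the function returns as soon as the condition is met, later signals in `tcm_signals` are never consumed, which mirrors stopping the processing for the subsequent wafer.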
Next, processing in which the film thickness measuring device inside or outside the apparatus measures the film thickness when the estimated value satisfies the predetermined polishing deterioration condition will be described.
(Step S210) Firstly, the processor 45 acquires a TCM signal when a wafer is polished.
(Step S220) Then, the generation unit 451 calculates the feature amount from the acquired TCM signal.
(Step S230) Then, the estimation unit 452 inputs the feature amount to the learned machine learning model stored in the storage 41, and outputs, for example, the estimated value of the standard deviation of the film thickness profile. Here, the learned machine learning model is, for example, a model trained on learning data in which the feature amount of the TCM signal is the input and the standard deviation of the film thickness profile is the output.
(Step S240) Then, the determination unit 453 determines whether or not the estimated value of the standard deviation of the film thickness profile is equal to or more than the set threshold value.
(Step S250) When it is determined in Step S240 that the estimated value of the standard deviation of the film thickness profile is equal to or more than the set threshold value, the processor 45 controls one or more robots (for example, transporter 7, transport robot 22, and transport robot 53, see Patent Document 1) so that the film thickness measuring device 6 measures the film thickness of the polished wafer.
(Step S260) When the estimated value of the standard deviation of the film thickness profile is not equal to or more than the set threshold in Step S240 (that is, the standard deviation of the profile is less than the set threshold), the processor 45 controls one or more robots (for example, transporter 7, transport robot 22, and transport robot 53, see Patent Document 1) so that the wafer is brought back to the FOUP without measurement of the film thickness measuring device 6.
As described above, when the estimated value output by the estimation unit 452 satisfies the predetermined polishing deterioration condition, the processor 45 performs control so that the film thickness measuring device 6 measures the film thickness of the polished target substrate. When the estimated value output by the estimation unit 452 does not satisfy the predetermined polishing deterioration condition, the processor 45 performs control so that the film thickness measuring device 6 does not measure the film thickness of the polished target substrate. Thus, when the polishing state deteriorates, the film thickness of the substrate is measured, so that it is possible to determine whether or not the polishing is successful. In addition, when the polishing state has not deteriorated, the film thickness of the substrate is not measured, and thus it is possible to improve the throughput.
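The branch in Steps S240 to S260 amounts to choosing a destination for each polished wafer. The sketch below models only that decision; the `Route` enum and the function names are hypothetical, and actual robot control is outside its scope.

```python
from enum import Enum


class Route(Enum):
    MEASURE = "film thickness measuring device 6"
    FOUP = "FOUP"


def route_wafer(estimated_std, threshold):
    """S240: choose S250 (measure) or S260 (return to the FOUP unmeasured)."""
    return Route.MEASURE if estimated_std >= threshold else Route.FOUP


def measured_fraction(estimates, threshold):
    """Share of wafers routed to the measuring device for a batch of estimates."""
    routes = [route_wafer(e, threshold) for e in estimates]
    return routes.count(Route.MEASURE) / len(routes)
```

The throughput benefit follows directly from `measured_fraction`: only the wafers whose estimate reaches the threshold occupy the measuring device, while the rest bypass it.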
Next, processing of issuing a warning for urging the maintenance when the estimated value satisfies the predetermined polishing deterioration condition will be described.
(Step S310) Firstly, the processor 45 acquires a TCM signal when a wafer is polished.
(Step S320) Then, the generation unit 451 calculates the feature amount from the acquired TCM signal.
(Step S330) Then, the estimation unit 452 inputs the feature amount to the learned machine learning model stored in the storage 41, and outputs, for example, the estimated value of the standard deviation of the film thickness profile. Here, the learned machine learning model is, for example, a model trained on learning data in which the feature amount of the TCM signal is the input and the standard deviation of the film thickness profile is the output.
(Step S340) Then, the determination unit 453 determines whether or not, for example, the estimated value of the standard deviation of the film thickness profile is equal to or more than the set threshold value. When the estimated value of the standard deviation of the film thickness profile is not equal to or more than the set threshold value (that is, when the standard deviation of the profile is less than the set threshold value), the process returns to Step S310 and the subsequent processes are repeated.
(Step S350) On the other hand, when it is determined in Step S340 that the estimated value of the standard deviation of the film thickness profile is equal to or more than the set threshold value, the processor 45 performs control to output a warning for urging the maintenance. The warning may be voice. The warning may be displayed on a display device. When a light source (for example, PATLITE (registered trademark)) of a plurality of colors (for example, three colors of red, yellow, and green) is provided, PATLITE of a specific color (for example, yellow) may be turned on (or blink). Vibration may be generated. A user may be notified of the warning by transmitting an e-mail to the user so that the user of the polishing apparatus 10 can be automatically contacted. Two or more of the above methods may be combined.
As described above, when the estimated value output by the estimation unit 452 satisfies the predetermined polishing deterioration condition, the processor 45 performs the control to issue the warning for urging the maintenance. Thus, when the polishing state deteriorates, it is possible to perform the maintenance such as replacement of the polishing member, and thus to prevent the polishing state from deteriorating further.
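Since Step S350 allows two or more warning methods to be combined, the warning can be modeled as one message fanned out to several channels. In the sketch below each channel is a plain callable that accepts the message; the channel names and the message wording are hypothetical, and a real system would wrap the display, light source, or e-mail interface behind such callables.

```python
def make_warning_dispatcher(channels):
    """Build a function that warns through every configured channel.

    `channels` maps a channel name to a callable taking the message text.
    """
    def dispatch(estimated_std, threshold):
        if estimated_std < threshold:      # S340: condition not satisfied
            return []
        message = (
            f"Maintenance recommended: estimated standard deviation "
            f"{estimated_std:.3f} >= threshold {threshold:.3f}"
        )
        fired = []
        for name, send in channels.items():   # S350: issue the warning
            send(message)
            fired.append(name)
        return fired

    return dispatch
```

For example, `make_warning_dispatcher({"display": print})` yields a dispatcher that prints the warning, and further entries can be added to the dictionary to combine methods.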
Next, processing of outputting the maintenance timing by using the tendency of the estimated values output at a plurality of different times will be described with reference to
(Step S410) Firstly, the processor 45 acquires a TCM signal when a wafer is polished.
(Step S420) Then, the generation unit 451 calculates the feature amount from the acquired TCM signal.
(Step S430) Then, the estimation unit 452 inputs the feature amount to the learned machine learning model stored in the storage 41, and outputs, for example, the estimated value of the standard deviation of the film thickness profile. Here, the learned machine learning model is, for example, a model trained on learning data in which the feature amount of the TCM signal is the input and the standard deviation of the film thickness profile is the output.
(Step S440) Then, the estimation unit 452 stores the estimated value in the storage 41.
(Step S450) Then, the determination unit 453 determines whether or not a predetermined number of estimated values are accumulated. When the predetermined number of estimated values are not accumulated, the process returns to Step S410 and the subsequent processes are repeated.
(Step S460) On the other hand, when it is determined in Step S450 that the predetermined number of estimated values are accumulated, the processor 45 refers to the estimated values output for the polished substrates at a plurality of different times, which are stored in the storage 41, and outputs the maintenance timing by using the tendency of those estimated values. The maintenance timing may be output, for example, in the form "recommending maintenance after N hours", where N is the predicted number of hours. The processor 45 may notify the user of the polishing apparatus of the maintenance timing. Thus, the notification of the maintenance timing is automatically performed. In this notification method, the maintenance timing may be displayed on a web screen or an application, or an e-mail may be transmitted to the user.
Alternatively, the processor 45 may notify the user of the polishing apparatus when the time reaches the maintenance timing. Thus, the notification of the maintenance timing is automatically performed. In this notification method, a message indicating that it is time to perform the maintenance may be displayed on a WEB screen or an application, or an e-mail may be transmitted to the user.
Specifically, for example, the processor 45 may store the estimated value of the standard deviation of the film thickness profile at set time intervals, calculate the variation of the estimated value per unit time by dividing the difference between successive estimated values by the set time interval, and output, as the maintenance timing, the timing at which the estimated value is predicted to become equal to or more than the set threshold value. Thus, it is possible to predict a timing at which the polishing state deteriorates, and perform the maintenance such as replacement of the polishing member at this timing. Thus, it is possible to prevent the polishing state from deteriorating further.
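One way to turn the variation per unit time into a maintenance timing is linear extrapolation from the two most recent estimated values. The text does not specify the extrapolation method, so the linear form, the use of only the last two samples, and the function name below are assumptions of this sketch.

```python
def predict_maintenance_time(times, estimates, threshold):
    """Extrapolate when the estimated value will reach the threshold.

    times and estimates are equally long sequences sampled at the set
    time interval. Returns the predicted time, the latest time if the
    threshold is already reached, or None if the value is not rising.
    """
    if len(times) < 2 or len(times) != len(estimates):
        raise ValueError("need at least two paired samples")
    if estimates[-1] >= threshold:
        return times[-1]
    # Variation per unit time: difference of estimates / time interval.
    rate = (estimates[-1] - estimates[-2]) / (times[-1] - times[-2])
    if rate <= 0:
        return None  # no upward tendency, so no crossing can be predicted
    return times[-1] + (threshold - estimates[-1]) / rate
```

With samples at times 0, 1, and 2 hours and estimates 1.0, 1.5, and 2.0, the rate is 0.5 per hour, so a threshold of 3.0 is predicted to be reached at 4 hours, that is, 2 hours after the latest sample. Using more than two samples (for example, a fitted trend line) would be less sensitive to noise in a single estimate.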
In addition, the processor 45 may adjust the polishing condition for the subsequent substrate in accordance with the estimated value output by the estimation unit 452 so that data regarding a desired film thickness of a polished substrate or a parameter related to desired yield of a product included in the polished substrate is obtained. Thus, it is possible to change the polishing condition for the subsequent substrate so that the polishing state is improved. Thus, it is possible to maintain the favorable polishing state for a longer time.
The processor 45 may retrain the machine learning model by using the feature amount obtained during an operation of the polishing apparatus. Thus, it is possible to improve the estimation accuracy.
As described above, the polishing apparatus 10 according to the first embodiment is capable of referring to the storage 41 that stores the machine learning model learned using the learning data in which the feature amount of the signal regarding the frictional force between the polishing member and the substrate in polishing is input, and data regarding the film thickness of the polished substrate or the parameter related to the yield of a product included in the polished substrate is output. The polishing apparatus 10 includes the polishing table 100 provided with the polishing member and configured to be rotatable, the polishing head 1 facing the polishing table 100 and configured to be rotatable, wherein the substrate is attachable to the surface facing the polishing table 100, and the control unit 500 that performs control to polish the substrate by pressing the substrate against the polishing member while rotating the polishing table and the polishing head having the substrate attached thereto.
The polishing apparatus 10 includes the processor 45. The processor 45 generates the feature amount from the signal regarding the frictional force between the polishing member and the substrate in polishing or from the temperature of the polishing member or the target substrate in polishing, and outputs, as the estimated value, any of the data regarding the film thickness of the polished substrate or the parameter related to the yield of a product included in the polished substrate, by inputting the generated feature amount to the learned machine learning model.
With this configuration, during polishing of a polishing apparatus, an estimated value of data regarding a film thickness of the polished substrate or an estimated value of a parameter related to the yield of a product included in the polished substrate is obtained. Thus, it is possible to predict the state of the substrate after polishing without measuring the film thickness. Thus, it is possible to recognize the state of the substrate after polishing without measuring the film thickness and to reduce the number of times of measuring the film thickness. Accordingly, it is possible to improve the throughput without missing a defective product. It is possible to improve the throughput by omitting the film thickness measurement in a case of normal polishing as described above. In addition, it is possible to detect or predict defects by estimating the parameter related to the yield. Further, it is possible to improve the yield by updating a polishing parameter in accordance with the parameter related to the yield.
The AI unit 4 may be mounted on a gateway in a factory, to which the polishing apparatus is connected via a network line. The gateway is preferably in the vicinity of the polishing apparatus. When high-speed processing is required (for example, when the sampling interval is 100 ms or less), the AI unit 4 in the polishing apparatus or the AI unit 4 mounted on the gateway may execute the processing by edge computing. The AI unit 4 in the polishing apparatus may be mounted on a PC or a controller for the apparatus.
Next, a second embodiment will be described. The second embodiment differs from the first embodiment in that, whereas the polishing apparatus 10 includes the AI unit 4 in the first embodiment, the AI unit 4 in the second embodiment is provided in a factory management room, a clean room, or the like in a factory instead of in the polishing apparatus.
When the AI unit 4 is provided in the polishing apparatus or in the gateway, high-speed processing can be performed by executing the learned machine learning model by edge computing. For example, on-time (real-time) high-speed processing can be performed.
Further, when the AI unit 4 is mounted on a server or fog (computing) in the factory, the machine learning model may be updated by collecting data of a plurality of polishing apparatuses in the factory. In addition, the data of a plurality of polishing apparatuses in the factory may be collected and analyzed, and the analysis result may be applied in the polishing parameter setting.
Next, a third embodiment will be described. The third embodiment differs from the first embodiment in that, whereas the polishing apparatus 10 includes the AI unit 4 in the first embodiment, the AI unit 4 in the third embodiment is provided in a cloud instead of in the polishing apparatus.
By providing the AI unit 4 in the cloud physically separated from the polishing apparatus as described above, it is possible to share the AI unit 4 among a plurality of factories, and the maintainability of the AI unit 4 is improved. Furthermore, by retraining the machine learning model with a large amount of data collected during polishing in a plurality of factories, it is possible to improve the estimation accuracy more quickly.
In addition, the machine learning model may be updated by collecting data (for example, large amount of data) of a plurality of polishing apparatuses over a plurality of factories. The data (for example, a large amount of data) of a plurality of polishing apparatuses over a plurality of factories may be collected and analyzed, and the analysis result may be applied in the polishing parameter setting.
The AI unit 4 may be provided in an analysis center that concentrates analysis, instead of the cloud.
The mounting location of the AI unit 4 may be (1) in the polishing apparatus, (2) a gateway near the polishing apparatus, and/or (3) a computer (PC, server, fog (computing), and the like) in a factory (for example, factory management room).
The mounting location of the AI unit 4 may be (1) in the polishing apparatus, (2) the gateway near the polishing apparatus, and/or (4) a computer in a cloud (or analysis center).
The mounting location of the AI unit 4 may be (1) in the polishing apparatus, (2) the gateway near the polishing apparatus, (3) the computer in the factory (for example, in the factory management room), and/or (4) the computer in the cloud (or analysis center).
In addition, the components of the AI unit 4 may be arranged to be distributed into (1) in the polishing apparatus, (2) the gateway near the polishing apparatus, (3) the computer (PC, server, fog (computing), and the like) in a factory (for example, factory management room), and/or (4) the computer in the cloud (or analysis center).
In the embodiments, the feature amount of the signal regarding the frictional force between the polishing member and the substrate in polishing is input to the machine learning model, but the input is not limited to this. The feature amount of the temperature of the polishing member (here, polishing pad 101) or the substrate in polishing may be input to the machine learning model. The reason is as follows. When the frictional force between the polishing member and the substrate during polishing increases, the amount of heat generated at the polishing member or the substrate increases accordingly, and the temperature of the polishing member or the substrate rises. Thus, the temperature of the polishing member or the substrate has a positive correlation with the frictional force between the polishing member and the substrate in polishing.
That is, the storage 41 may store the machine learning model learned using learning data in which the feature amount of the temperature of the polishing member or the substrate in polishing is input, and data regarding the film thickness of the polished substrate, the statistical value of the profile of the film thickness of the polished substrate, or the parameter related to the yield of a product included in the polished substrate is output.
In this case, the generation unit 451 may generate the feature amount from the signal regarding the frictional force between the polishing member and the substrate or the temperature of the polishing member or the substrate, during a period when the substrate is polished by pressing the target substrate against the polishing member while rotating the polishing head 1 having the target substrate attached thereto and the polishing table 100.
At least a part of the AI unit 4 described in the above-described embodiments may be configured by hardware or software. When a part of the AI unit 4 is configured by software, a program that realizes at least the part of the function of the AI unit 4 may be stored in a recording medium such as a flexible disk or a CD-ROM, and read and executed by a computer. The recording medium is not limited to a removable one such as a magnetic disk or an optical disk, and may be a fixed recording medium such as a hard disk device or a memory.
The program that realizes at least a part of the function of the AI unit 4 may be distributed via a communication line (including wireless communication) such as the Internet. Further, the identical program may be distributed via a wired line such as the Internet or a wireless line or may be stored and distributed in a recording medium, in a state of being encrypted, modulated, or compressed.
Further, one or a plurality of information processing devices may function as the AI unit 4. When a plurality of information processing devices are used, one of the information processing devices may be a computer, and the computer executes a predetermined program, and thereby the function as at least one means of the AI unit 4 may be realized.
In addition, in the invention of the method, all the processes (steps) may be realized by automatic control by a computer. The progress control between the processes may be manually performed while the computer is used to perform each process. Further, at least a part of the entire process may be performed manually.
The present technology is not limited to the above-described embodiments as they are, and can be embodied at the implementation stage by modifying the components within a range that does not depart from the gist of the present technology. In addition, various inventions can be formed by an appropriate combination of a plurality of components disclosed in the above-described embodiments. For example, some components may be removed from all the components shown in the embodiments. Furthermore, components of different embodiments may be combined as appropriate.
Number | Date | Country | Kind |
---|---|---|---|
2020-048666 | Mar 2020 | JP | national |