This application claims priority to Chinese patent application No. 202311002718.X, filed on Aug. 10, 2023, the entire contents of which are incorporated herein by reference.
The present invention relates to the technical field of semiconductors, and in particular, to an endpoint detection method and device for wafer film grinding.
With the rapid development of the semiconductor industry, the feature sizes of integrated circuits continue to shrink; semiconductor chips continue to develop towards small size, high circuit density, high speed, and low power consumption; and integrated circuits have reached the ULSI sub-micron technology level. As the diameter of silicon wafers gradually increases, the width of scribed lines in components gradually decreases, and the number of metal layers increases. Therefore, high-quality polishing of the surface of a semiconductor film has an important impact on the performance, cost, and yield of devices, leading to increasingly stringent requirements for the surface flatness of silicon wafers.
Chemical Mechanical Planarization (CMP) is a global surface planarization technology used to reduce the impact of substrate thickness changes and surface topography during a semiconductor manufacturing process. Since CMP can accurately and uniformly planarize a substrate to the required thickness and flatness, it has become the most widely used surface planarization technology in the semiconductor manufacturing process. The current mainstream technology is online endpoint detection, which can better control wafer film thickness changes, reduce repeated operations, and realize automated CMP operations, thereby increasing the utilization rate and output of the CMP device and reducing the density distribution defects of an IC device, reducing non-uniformity, and ultimately improving the stability and reliability of semiconductor devices. The principle of online endpoint detection technology is mainly based on detection through optical, electrical, acoustic or vibrational, thermal, frictional, chemical, or electrochemical means. In the monitoring process of a spectral endpoint detection method, it is first necessary to establish reference spectra under different film thicknesses within a defined wavelength range, and then match the spectrum measured in situ during a CMP process with a reference spectrum library through a set matching method to find the best matching reference spectrum. The film thickness corresponding to the best matching reference spectrum is used as the film thickness during the CMP process.
The measurement environment for in-situ online measurement during an actual CMP process is relatively harsh and is affected by various factors, for example, systematic errors inevitably introduced by the spectra and random errors caused by measurement conditions such as ambient light and probe distance jitter. Especially during CMP of multi-layer films, external factors cause compression, elevation, and parallel displacement of the measured reflectance curve, and may also cause loss of information in some wavelength ranges. As a result, the measured reflectance curve cannot match the theoretical curve well, which reduces the accuracy of the final measured value of the film thickness.
According to a first aspect of the embodiments of the present application, an online thickness detection method for wafer film grinding is provided, including the following steps:
Optionally, the step of determining a plurality of feature points in the measured spectral curve includes: determining points with abnormal fluctuations based on reflectance of each point in the measured spectral curve and adjacent points thereof; and taking a preset number of points from points in the measured spectral curve except for the points with abnormal fluctuations as the feature points.
Optionally, the step of determining a plurality of feature points in the measured spectral curve includes: determining at least one critical point based on a change rate of each point in the measured spectral curve so that the measured spectral curve is divided into at least two bands; and selecting a corresponding preset number of points in each of the bands as the feature points, where the preset number of points corresponding to a band with a higher change rate is greater than the preset number of points corresponding to a band with a lower change rate.
Optionally, the reference spectral curve is a theoretical spectral curve of a correspondence between theoretical reflectance and wavelength computed based on a refractive index of the wafer film and a pre-detected film thickness range.
Optionally, in the step of computing a cosine similarity between the measured spectral curve and each of the reference spectral curves, the cosine similarity between the measured spectral curve and the reference spectral curves is computed as follows:
Optionally, the step of computing a cosine similarity between the measured spectral curve and each of the reference spectral curves includes: processing the plurality of feature points and the plurality of reference points; and computing the cosine similarity based on the processed plurality of feature points and the processed plurality of reference points to obtain a computed result in an interval of [−1,1].
Optionally, the step of processing the plurality of feature points and the plurality of reference points includes: computing mean measured reflectance of the plurality of feature points and mean reference reflectance of the plurality of reference points; and subtracting the mean measured reflectance from reflectance of each of the feature points, and the mean reference reflectance from reflectance of each of the reference points.
Optionally, the cosine similarity is computed as follows:
Optionally, the wafer film is a multi-layer film.
Optionally, the feature points include all data points in the measured spectral curve.
Optionally, the number of feature points is a preset number.
Optionally, the number of feature points is a number determined in real time according to characteristics of the measured spectral curve.
Optionally, the plurality of feature points are evenly distributed on the measured spectral curve; and the step of determining a plurality of feature points in the measured spectral curve includes: uniformly selecting the plurality of feature points based on a wavelength range of the measured spectral curve and a preset number of feature points.
Optionally, the plurality of feature points are located in a band with relatively rich feature information in the measured spectral curve.
Optionally, the reference spectral curve matching the measured spectral curve refers to a reference spectral curve with the cosine similarity closest to 1.
Optionally, the reference spectral curves are collected in advance, and parameters used during collection of the reference spectral curves are the same as those used during collection of the measured spectral curve, where the parameters include a wavelength range, a wavelength resolution, and a reflectance resolution.
Optionally, each of the reference spectral curves is a reflectance-wavelength reference spectral curve over a defined wavelength range, and the reference spectral curves are established in advance for several different thicknesses of the wafer film based on the refractive index of the wafer film and the pre-detected film thickness range.
Optionally, before the step of determining a plurality of feature points in the measured spectral curve, the method further includes: filtering the measured spectral curve to filter out noise caused by a grinding environment during the grinding process, where a method for the filtering is any one of wavelet filtering, Fourier filtering, and sliding window average filtering.
According to a second aspect of the embodiments of the present application, an endpoint detection method for wafer film grinding is further provided, including: using the online thickness detection method described above to monitor in real time whether a thickness of a wafer film reaches a target thickness; and stopping grinding when the thickness of the wafer film reaches the target thickness.
According to a third aspect of the embodiments of the present application, an online thickness detection device for wafer film grinding is further provided, including: a processor and a memory connected to the processor, where the memory stores instructions that can be executed by the processor, and the instructions are executed by the processor to cause the processor to perform the online thickness detection method for wafer film grinding described above.
According to a fourth aspect of the embodiments of the present application, an endpoint detection device for wafer film grinding is further provided, including: a processor and a memory connected to the processor, where the memory stores instructions that can be executed by the processor, and the instructions are executed by the processor to cause the processor to perform the endpoint detection method for wafer film grinding described above.
According to the online thickness detection method and device for wafer film grinding provided by the present invention, the current thickness of the wafer film is determined by computing the cosine similarity between the measured spectral curve and the reference spectral curve, which can better reduce problems such as compression and elevation of the measured curve caused by multi-layer films and a decrease in the matching degree caused by parallel displacement differences; moreover, the time required to match the in-situ measurement with the reference curve is short, thereby achieving real-time in-situ measurement. The present invention selects the cosine similarity as a scoring function since the evaluation method of the cosine similarity is sensitive to waveforms, and the cosine similarity of two parallel curves in a two-dimensional space is 1, which is suitable for an actual measurement situation herein. Therefore, the selection of the cosine similarity as the scoring function to evaluate a similarity in the case of multi-layer dielectrics improves the accuracy of online thickness detection of a wafer film.
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly describes the accompanying drawings needed for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art may further derive other accompanying drawings from these accompanying drawings without making creative efforts.
An explicit and complete description of the technical solutions in the present invention will be given below in conjunction with the accompanying drawings. Apparently, the described embodiments are part, but not all, of the embodiments of the present invention. Based on the embodiments in the present invention, all other embodiments obtained by those of ordinary skill in the art without making creative efforts shall fall within the protection scope of the present invention.
In addition, the technical features involved in different embodiments of the present invention described below can be combined with each other as long as they do not conflict with each other.
As shown in
S101: obtaining a measured spectral curve of a correspondence between reflectance and wavelength of the silicon-based silicon dioxide wafer film during the grinding process of the silicon-based silicon dioxide wafer film. The reflectance of the silicon dioxide wafer in a defined wavelength range during the grinding process is measured in real time through a spectrometer, and during real-time measurement, the dielectrics that the incident light of the spectrometer encounters are air, glass, PU, a slurry, silicon dioxide, and silicon in sequence.
The spectrometer outputs a reflectance-wavelength curve in the defined wavelength range under a current thickness of the wafer film in real time. In an embodiment, the defined wavelength range is 400-1,000 nm; the wavelength resolution is 0.8 nm; the reflectance resolution is 1 × 10⁻⁴; and the spectral curve acquisition time interval is 100 ms.
As shown in
S102: determining a plurality of feature points in the measured spectral curve. These feature points may be automatically selected according to a certain rule (such as local extreme points, or jump points), or manually specified. The number of feature points may be fixed or determined in real time according to characteristics of a current curve. In some embodiments, these feature points are evenly distributed on the curve. When the operation of selecting feature points is performed, the feature points can be selected evenly from the curve according to a wavelength range of the curve and a preset number of feature points.
Alternatively, the feature points may be all data points in the measured spectral curve, or some data points automatically selected according to a certain rule (such as local extreme points, or jump points), or some data points specified manually. The number of feature points may be fixed (a preset number), or determined in real time according to characteristics of a current curve. In some embodiments, these feature points are evenly distributed on the curve. When the operation of selecting feature points is performed, the feature points can be selected evenly from the curve according to a wavelength range of the curve and a preset number of feature points.
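The even selection described above can be sketched as follows. This is a minimal illustration, assuming the measured curve is stored as an array of samples ordered by wavelength; the function name `select_uniform_indices` is hypothetical and does not come from the embodiments.

```python
import numpy as np

def select_uniform_indices(n_points: int, n_features: int) -> np.ndarray:
    """Return n_features sample indices spread evenly over n_points samples.

    Hypothetical helper: the first and last samples of the curve are always
    included when n_features >= 2.
    """
    if not 1 <= n_features <= n_points:
        raise ValueError("n_features must be between 1 and n_points")
    # linspace spans the whole index range; rounding snaps to real samples
    positions = np.linspace(0, n_points - 1, n_features)
    return np.round(positions).astype(int)

# Example: 3 feature points from a curve with 11 samples
idx = select_uniform_indices(11, 3)
```

Because the reference spectral curves share the same wavelength grid, the same indices can be used to read the corresponding reference points.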
Regarding the selection of feature points, two aspects matter: on the one hand, the number of feature points should be large enough, for example, hundreds; on the other hand, the feature points should be suitably positioned on the curve, for example, in a band with relatively rich feature information. Enough feature points are selected within a suitable wavelength range so that subsequently computed data can accurately reflect the characteristics of the spectral curve.
It can be seen from the example shown in
In order to solve the above possible problems, in an embodiment, after obtaining the measured spectral curve, points with abnormal fluctuations are determined first based on reflectance of each point and adjacent points thereof, and then the points with abnormal fluctuations are removed during the selection of feature points, and a preset number of points are taken from other normal data points as the feature points.
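One possible way to flag points with abnormal fluctuations is sketched below. The embodiment does not specify the rule, so the criterion used here (deviation of a point from the median of itself and its two adjacent points) and the `threshold` value are assumptions made only for illustration; the function names are hypothetical.

```python
import numpy as np

def find_abnormal_points(reflectance, threshold=0.05):
    """Flag interior points that deviate strongly from their neighbourhood.

    Assumed rule: a point is abnormal if it differs from the median of
    (left neighbour, itself, right neighbour) by more than `threshold`.
    """
    r = np.asarray(reflectance, dtype=float)
    abnormal = np.zeros(r.shape, dtype=bool)
    local_median = np.median(np.stack([r[:-2], r[1:-1], r[2:]]), axis=0)
    abnormal[1:-1] = np.abs(r[1:-1] - local_median) > threshold
    return abnormal

def select_normal_features(reflectance, n_features, threshold=0.05):
    """Take up to n_features evenly spread points, skipping abnormal ones."""
    normal_idx = np.flatnonzero(~find_abnormal_points(reflectance, threshold))
    n_features = min(n_features, normal_idx.size)
    pick = np.round(np.linspace(0, normal_idx.size - 1, n_features)).astype(int)
    return normal_idx[pick]

# Hypothetical curve with one spike at index 3
curve = [0.30, 0.31, 0.32, 0.60, 0.34, 0.35, 0.36]
flags = find_abnormal_points(curve)
features = select_normal_features(curve, 3)
```

Using the median rather than the mean of the neighbours keeps the two points next to a spike from being flagged along with it.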
It can be found from the measured spectral curves at various thicknesses that feature information of a short-wave band is richer than that of a long-wave band. Taking
These critical points cause the measured spectral curve to be divided into at least two bands, i.e. the curve is divided into two bands in the case of one critical point, three bands in the case of two critical points, and so on. A corresponding number of feature points selected for each band is set in advance, and the preset number corresponding to a band with a higher change rate is greater than the preset number corresponding to a band with a lower change rate. For example, as shown in
After obtaining the critical points and bands, a corresponding preset number of points are selected from each band as the feature points so that the selection density of feature points in a band with richer feature information is greater, thereby improving the accuracy of subsequently computed data, and thus improving thickness detection accuracy.
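The band-wise selection can be sketched as follows, assuming the critical points have already been determined and are given as wavelengths. The function name `select_by_band`, the example critical wavelength, and the per-band counts are hypothetical values chosen only to illustrate a denser selection in the band with the higher change rate.

```python
import numpy as np

def select_by_band(wavelengths, critical_wavelengths, counts):
    """Select counts[k] evenly spread sample indices from the k-th band.

    wavelengths must be ascending; critical_wavelengths split the curve
    into len(critical_wavelengths) + 1 bands.
    """
    w = np.asarray(wavelengths, dtype=float)
    if len(counts) != len(critical_wavelengths) + 1:
        raise ValueError("one count is required per band")
    # Band edges expressed as sample indices
    edges = [0, *np.searchsorted(w, critical_wavelengths), w.size]
    selected = []
    for start, stop, count in zip(edges[:-1], edges[1:], counts):
        count = min(count, stop - start)
        if count <= 0:
            continue
        pick = np.round(np.linspace(start, stop - 1, count)).astype(int)
        selected.append(pick)
    return np.concatenate(selected)

# Hypothetical example: denser selection below an assumed critical point at 700 nm
grid = np.linspace(400.0, 1000.0, 61)
idx = select_by_band(grid, [700.0], (20, 5))
```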
It should be noted that the above two methods of selecting feature points can be used simultaneously. For example, the points with abnormal fluctuations are excluded first, following which the above critical points are determined, and then different numbers of feature points are selected from different bands.
S103: determining corresponding reference points in reference spectral curves based on wavelength values of the plurality of feature points. The reference spectral curves are collected in advance, and parameters used during collection of the reference spectral curves are the same as those used during collection of the measured spectral curve. That is to say, the same wavelength range, wavelength resolution, and reflectance resolution are used.
Specifically, a reference curve library of reflectance-wavelength reference spectral curves over a defined wavelength range under several different thicknesses of the wafer film can be established in advance based on the refractive index n of the wafer film and the pre-detected film thickness range d. In an embodiment, the defined wavelength range is 400-1,000 nm; the wavelength resolution is 0.8 nm; the reflectance resolution is 1 × 10⁻⁴ (the same parameters used to collect the measured spectral curve); the pre-detected film thickness range d is 200-800 nm; and the film thickness resolution is 0.1 nm, that is, a corresponding reflectance-wavelength spectral curve is established every 0.1 nm.
In this way, the spectral curve of a wafer film with a known thickness can be obtained. For example, for a wafer film with a known thickness of d1, the spectral curve of a correspondence between reflectance and wavelength thereof is obtained . . . for a wafer film with a known thickness of dn, the spectral curve of a correspondence between reflectance and wavelength thereof is obtained, and the resulting spectral curves are called reference spectral curves. A collection of these reference spectral curves is called a reference curve library.
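As an illustration of how such a library could be computed, the sketch below uses the standard normal-incidence reflectance of a single transparent film on a substrate. It is a deliberately simplified model, not the model of the embodiment: it assumes one film layer, no absorption, and wavelength-independent refractive indices, whereas the actual optical path passes through several dielectrics. The index values and function names are assumptions chosen only for the example.

```python
import numpy as np

def film_reflectance(wavelength_nm, thickness_nm,
                     n_ambient=1.33, n_film=1.46, n_substrate=3.9):
    """Reflectance of a single lossless film at normal incidence.

    All refractive indices are assumed constants for illustration only.
    """
    lam = np.asarray(wavelength_nm, dtype=float)
    # Fresnel amplitude coefficients at the two interfaces
    r01 = (n_ambient - n_film) / (n_ambient + n_film)
    r12 = (n_film - n_substrate) / (n_film + n_substrate)
    # Phase thickness of the film
    beta = 2.0 * np.pi * n_film * thickness_nm / lam
    phase = np.exp(-2j * beta)
    r = (r01 + r12 * phase) / (1.0 + r01 * r12 * phase)
    return np.abs(r) ** 2

def build_reference_library(wavelengths_nm, thicknesses_nm, **indices):
    """Stack one reference curve per thickness: shape (n_thickness, n_wavelength)."""
    return np.stack([film_reflectance(wavelengths_nm, d, **indices)
                     for d in thicknesses_nm])
```

Each row of the returned array plays the role of one reference spectral curve, and the row index maps back to the known thickness used to compute it.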
As shown in
S104: computing a cosine similarity between the measured spectral curve and each of the reference spectral curves based on the plurality of feature points and a plurality of reference points. The feature points (or reference points) contain two pieces of information: wavelength and reflectance. In the embodiment shown in
S105: determining a reference spectral curve matching the measured spectral curve based on the cosine similarity. Specifically, a reference spectral curve with the cosine similarity closest to 1 can be taken as the curve matching the measured spectral curve.
S106: determining a current thickness of the wafer film based on a thickness corresponding to the matched reference spectral curve. For example, if the thickness corresponding to the matched reference spectral curve is di, it can be determined that di is the current thickness of the wafer film.
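Steps S104 to S106 can be sketched as follows, assuming the reference curve library is stored as a two-dimensional array with one row per known thickness and that the measured curve shares its wavelength grid. The function name `match_thickness` and the small example arrays are hypothetical.

```python
import numpy as np

def match_thickness(measured, library, thicknesses, feature_idx):
    """Return (thickness, score) of the reference curve that best matches.

    measured:    reflectance of the measured curve, shape (n_wavelength,)
    library:     reference reflectances, shape (n_thickness, n_wavelength)
    thicknesses: known thickness of each library row
    feature_idx: indices of the selected feature points
    """
    m = np.asarray(measured, dtype=float)[feature_idx]
    refs = np.asarray(library, dtype=float)[:, feature_idx]
    # Cosine similarity of the measured vector against every reference row
    scores = refs @ m / (np.linalg.norm(refs, axis=1) * np.linalg.norm(m))
    # Cosine similarity never exceeds 1, so the largest score is closest to 1
    best = int(np.argmax(scores))
    return thicknesses[best], float(scores[best])

# Hypothetical three-curve library and a scaled copy of its last curve
library = np.array([[0.1, 0.2, 0.3, 0.4],
                    [0.4, 0.3, 0.2, 0.1],
                    [0.2, 0.1, 0.4, 0.3]])
thicknesses = np.array([300.0, 300.1, 300.2])
measured = 0.9 * library[2]
d, score = match_thickness(measured, library, thicknesses, np.arange(4))
```

Computing all scores with one matrix-vector product keeps the matching time short, which matters when a new curve arrives at every acquisition interval.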
In this embodiment, the current thickness of the wafer film is determined by computing the cosine similarity between the measured spectral curve and the reference spectral curve, which can better reduce problems such as compression and elevation of the measured curve caused by multi-layer films and a harsh grinding environment, and a decrease in the matching degree caused by parallel displacement differences; moreover, the time required to match the in-situ measurement with the reference curve is short, thereby achieving real-time in-situ measurement. It only takes 400 ms from real-time collection to obtaining the final film thickness. This embodiment selects the cosine similarity as a scoring function since the evaluation method of the cosine similarity is sensitive to waveforms, and the cosine similarity of two parallel curves in a two-dimensional space is 1, which is relatively consistent with an actual measurement situation. Therefore, the selection of the cosine similarity as the scoring function to evaluate a similarity in the case of multi-layer dielectrics improves the accuracy of online thickness detection of a wafer film.
Before the step S102, the measured spectral curve can be filtered to filter out noise caused by a grinding environment during the grinding process. Methods for the filtering include, but are not limited to, wavelet filtering, Fourier filtering, and sliding window average filtering.
This embodiment improves the accuracy of the measured spectral curve by filtering the noise caused by the grinding environment during the grinding process, thereby improving the accuracy of the thickness of the wafer film.
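Of the filtering methods listed, sliding window average filtering is the simplest to illustrate. The sketch below is one possible form; the window length and the edge handling (repeating the first and last samples) are assumptions, and the function name `moving_average` is hypothetical.

```python
import numpy as np

def moving_average(reflectance, window=5):
    """Smooth a curve with a centered sliding window average.

    The curve is padded by repeating its end values so that the output
    has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    r = np.asarray(reflectance, dtype=float)
    half = window // 2
    padded = np.pad(r, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical flat curve with a single noise spike
noisy = np.full(10, 0.3)
noisy[5] = 0.8
smoothed = moving_average(noisy, window=5)
```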
In a preferred embodiment, the step of S102: determining a plurality of feature points in the measured spectral curve includes:
This embodiment selects the preset number of feature points within the same wavelength range as the reference curve to compute the cosine similarity of the feature points within the same wavelength range, thereby improving the accuracy of computation and also improving the accuracy of the thickness of the wafer film.
In an embodiment, the cosine similarity is computed as follows in the step S104:
Among the 800 selected feature points and reference points, the reflectance of some points is shown in the following table:
The cosine similarity between the two curves is computed as follows:
According to the above computational formula, the cosine similarity between the measured spectral curve and each reference spectral curve is obtained, and a reference spectral curve with the cosine similarity closest to 1 is taken as the curve matching the measured spectral curve.
This embodiment only uses reflectance when computing the cosine similarity, thereby reducing the computation burden and the matching computation time, further enhancing the real-time performance of film thickness measurement.
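For reference, the standard definition of cosine similarity applied to reflectance values alone can be written as below. The symbols are assumptions introduced for this illustration: N is the number of feature points, R^m_i is the measured reflectance at the i-th feature point, and R^r_i is the reflectance at the corresponding reference point.

```latex
\cos\theta =
\frac{\sum_{i=1}^{N} R^{m}_{i}\, R^{r}_{i}}
     {\sqrt{\sum_{i=1}^{N} \left(R^{m}_{i}\right)^{2}}\;
      \sqrt{\sum_{i=1}^{N} \left(R^{r}_{i}\right)^{2}}}
```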
To further improve accuracy, in an embodiment, the cosine similarity is computed as follows in the step S104:
Since reflectance values are all non-negative, the vectors of the reflectance curves are all represented in the first quadrant of a plane coordinate system. No matter how different the reflectance curve is from the theoretical value, the value range of the cosine similarity is therefore [0,1], and the value range [−1,0] of the cosine similarity is not utilized. The operation of mean subtraction is performed on both the measured values and the theoretical value library so that the components of the measured value and theoretical value curves may be positive or negative. At this time, the evaluation value range of the cosine similarity is fully utilized, which expands the difference in cosine similarity scores between two curves that are actually quite different.
Here is a more extreme example: if there are two curves with the line segment at each feature point close to 90°, the waveform features are shown in
If the data is not subjected to mean subtraction, the cosine similarity is as follows:
It is not a value close to −1 as expected since the line segment at each feature point is not mapped to the origin. Therefore, the cosine similarity cannot reflect the angle therebetween.
The corresponding mean is subtracted from the feature point and the reference point respectively in the above processing manner:
Then, the cosine similarity is as follows:
The cosine similarity computed thereby is negative, which can better reflect an extremely low similarity between the two curves using a larger value range [−1,1], and can also better reflect a similarity between the two curves, thereby improving the accuracy of matching between the measured spectral curve and the reference spectral curve.
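The effect of mean subtraction can be illustrated with a small sketch. The two curves below are hypothetical values chosen so that one rises where the other falls; they are not the data of the embodiment.

```python
import numpy as np

def cosine_similarity(a, b):
    """Plain cosine similarity between two reflectance vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def centered_cosine_similarity(a, b):
    """Cosine similarity after subtracting each vector's own mean.

    Undefined for a perfectly flat curve, whose centered norm is zero.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return cosine_similarity(a - a.mean(), b - b.mean())

# Hypothetical curves with opposite trends
rising = np.array([0.2, 0.4, 0.6, 0.8])
falling = rising[::-1]

raw = cosine_similarity(rising, falling)                 # about 0.667
centered = centered_cosine_similarity(rising, falling)   # about -1.0
```

Without mean subtraction the two opposite curves still score about 0.667 because all components are positive; after mean subtraction the score drops to about −1, which separates them clearly from a good match.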
As shown in
As shown in
In order to further highlight the beneficial effects brought by the present invention, this embodiment provides a comparative example using a mean squared error (MSE) as the matching algorithm evaluation function (this method mainly uses a distance between two curves as the evaluation criterion, and the smaller the distance, the more similar the two curves are). The specific computational formula is as follows:
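A minimal sketch of the comparative evaluation function is given below, assuming the standard definition of the mean squared error over the reflectance values at the feature points and reference points. The function name and the example values are hypothetical.

```python
import numpy as np

def mean_squared_error(measured, reference):
    """Mean of the squared reflectance differences between two curves."""
    m = np.asarray(measured, dtype=float)
    r = np.asarray(reference, dtype=float)
    return float(np.mean((m - r) ** 2))

# Hypothetical reference curve and a copy shifted upward by 0.05
reference = np.array([0.2, 0.4, 0.6, 0.8])
shifted = reference + 0.05
mse = mean_squared_error(shifted, reference)   # about 0.0025
```

With this evaluation function, the matched reference curve is the one with the smallest value, so a parallel displacement of the measured curve directly increases the score even when the waveform is unchanged.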
Except for S103-S105, the other steps are the same. At a certain time during grinding, real-time film thickness measurement was performed according to the above process, and the final matching graph is shown in
As shown in
S107: determining whether the thickness of the wafer film reaches a target thickness, and when the thickness of the wafer film reaches the target thickness, performing a step S108; otherwise, continuing grinding and returning to the step S101; and
S108: stopping grinding.
In this embodiment, since the reference spectral curve matched by cosine similarity is highly accurate, the obtained real-time thickness of the wafer film is accurate, and grinding is stopped timely when a current thickness reaches the target thickness, thereby improving the accuracy of wafer grinding.
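The control flow of steps S101 to S108 can be sketched as a simple loop. The callables `measure_thickness` and `stop_grinding` are hypothetical stand-ins for the online thickness detection and the machine control, and the sketch assumes that the thickness decreases during grinding, so that the target is reached when the measured thickness is less than or equal to it.

```python
from typing import Callable

def grind_to_target(measure_thickness: Callable[[], float],
                    stop_grinding: Callable[[], None],
                    target_nm: float,
                    max_iterations: int = 100_000) -> float:
    """Repeat thickness detection until the target is reached, then stop."""
    for _ in range(max_iterations):
        thickness = measure_thickness()   # S101 to S106
        if thickness <= target_nm:        # S107
            stop_grinding()               # S108
            return thickness
    # Safety stop if the target is never reported (an added assumption)
    stop_grinding()
    raise RuntimeError("target thickness not reached")

# Hypothetical sequence of detected thicknesses in nm
readings = iter([500.0, 450.0, 400.0, 350.0])
stops = []
final = grind_to_target(lambda: next(readings),
                        lambda: stops.append(True),
                        target_nm=400.0)
```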
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may be in the form of a hardware-only embodiment, a software-only embodiment, or an embodiment with a combination of software and hardware. Moreover, the present invention may be in the form of a computer program product that is implemented on one or more computer-usable storage media (including but not limited to a disk memory, a CD-ROM, an optical memory, and the like) that include computer-usable program code.
The present invention is described with reference to the flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that a computer program instruction may be used to implement each process and/or each block in the flowcharts and/or the block diagrams and a combination of a process and/or a block in the flowcharts and/or the block diagrams. These computer program instructions may be provided for a general-purpose computer, a dedicated computer, an embedded processor, or a processor of another programmable data processing device to generate a machine, so that the instructions executed by a computer or a processor of another programmable data processing device generate an apparatus for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
These computer program instructions may be stored in a computer readable memory that can instruct the computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer readable memory generate an artifact that includes an instruction apparatus. The instruction apparatus implements a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or another programmable device to generate computer-implemented processing. Therefore, the instructions executed on the computer or another programmable device are used to provide steps for implementing a specific function in one or more processes in the flowcharts and/or in one or more blocks in the block diagrams.
Obviously, the above embodiments are merely examples for clear description and are not intended to limit the implementation. For those of ordinary skill in the art, other changes or modifications in different forms can be made on the basis of the above description. It is neither necessary nor possible to exhaust all the embodiments herein. However, any obvious changes or modifications derived therefrom still fall within the protection scope of the present invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202311002718.X | Aug 2023 | CN | national |
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2023/126417 | Oct 2023 | WO |
| Child | 19060272 | | US |