The present disclosure relates generally to monitoring and control of multiple substrates during chemical mechanical polishing.
An integrated circuit is typically formed on a substrate by the sequential deposition of conductive, semiconductive, or insulative layers on a silicon wafer. One fabrication step involves depositing a filler layer over a non-planar surface and planarizing the filler layer. For certain applications, the filler layer is planarized until the top surface of a patterned layer is exposed. A conductive filler layer, for example, can be deposited on a patterned insulative layer to fill the trenches or holes in the insulative layer. After planarization, the portions of the conductive layer remaining between the raised pattern of the insulative layer form vias, plugs, and lines that provide conductive paths between thin film circuits on the substrate. For other applications, such as oxide polishing, the filler layer is planarized until a predetermined thickness is left over the non-planar surface. In addition, planarization of the substrate surface is usually required for photolithography.
Chemical mechanical polishing (CMP) is one accepted method of planarization. This planarization method typically requires that the substrate be mounted on a carrier head. The exposed surface of the substrate is typically placed against a rotating polishing pad with a durable roughened surface. The carrier head provides a controllable load on the substrate to push it against the polishing pad. A polishing liquid, such as a slurry with abrasive particles, is typically supplied to the surface of the polishing pad.
One problem in CMP is using an appropriate polishing rate to achieve a desirable profile, e.g., a substrate layer that has been planarized to a desired flatness or thickness, or a desired amount of material has been removed. Variations in the initial thickness of a substrate layer, the slurry composition, the polishing pad condition, the relative speed between the polishing pad and a substrate, and the load on a substrate can cause variations in the material removal rate across a substrate, and from substrate to substrate. These variations cause variations in the time needed to reach the polishing endpoint and the amount removed. Therefore, determining the polishing endpoint merely as a function of the polishing time may lead to overpolishing or underpolishing, and it may not be possible to achieve a desired profile merely by applying a constant pressure.
In some systems, a substrate is optically monitored in-situ during polishing, e.g., through a window in the polishing pad. Some optical monitoring systems detect a “polishing endpoint”, after which they continue polishing for a preset overpolishing time. For example, in copper polishing, the optical monitoring system can detect exposure of the underlying layer, and overpolishing can be used to ensure complete removal of any copper residue. However, existing overpolishing and optical monitoring techniques may not satisfy increasing demands of semiconductor device manufacturers.
In one aspect, a polishing method includes simultaneously polishing a first substrate and a second substrate on the same polishing pad, storing a default overpolishing time, monitoring the first substrate and the second substrate during polishing with an in-situ monitoring system, determining a first polishing endpoint time of the first substrate with the in-situ monitoring system, determining a second polishing endpoint time of the second substrate with the in-situ monitoring system, determining a difference between the first polishing endpoint time and the second polishing endpoint time, and determining whether the difference exceeds a threshold. If the difference is less than the threshold, then an overpolishing stop time is calculated and polishing of the first substrate and the second substrate is halted simultaneously at the overpolishing stop time. If the difference is greater than the threshold, then a first overpolishing stop time that equals the first polishing endpoint time plus the default overpolishing time is calculated and a second overpolishing stop time that equals the second polishing endpoint time plus the default overpolishing time is calculated, and polishing of the first substrate is halted at the first overpolishing stop time and polishing of the second substrate is halted at the second overpolishing stop time.
Implementations can include one or more of the following features. Calculating the overpolishing stop time may include calculating an average of the first polishing endpoint time and the second polishing endpoint time. Calculating the overpolishing stop time may include adding the default overpolishing time to the average. The default overpolishing time may be between five and twenty seconds. The default overpolishing time may be between ten and fifteen seconds. The threshold may be between two and six seconds.
Determining the first polishing endpoint time may include storing a first target value for the first substrate, generating a first sequence of values for the first substrate with the in-situ monitoring system, fitting a first function to the first sequence of values, and determining the first polishing endpoint time by calculating a projected time at which the first substrate will reach the target value based on the first function. Determining the second polishing endpoint time may include storing a second target value for the second substrate, generating a second sequence of values for the second substrate with the in-situ monitoring system, fitting a second function to the second sequence of values, and determining the second polishing endpoint time by calculating a projected time at which the second substrate will reach the target value based on the second function. The first function and the second function may be linear functions.
The in-situ monitoring system may include a spectrometric optical monitoring system. Generating the first sequence of values may include measuring a first sequence of spectra from the first substrate during polishing with the optical monitoring system, for each measured spectrum in the first sequence of spectra for the first substrate, determining a best matching reference spectrum from one or more libraries of reference spectra, and for each best matching reference spectrum for the first substrate, determining an index value to generate a sequence of first index values. Generating the second sequence of values may include measuring a second sequence of spectra from the second substrate during polishing with the optical monitoring system, for each measured spectrum in the second sequence of spectra for the second substrate, determining a best matching reference spectrum from the one or more libraries of reference spectra, and for each best matching reference spectrum for the second substrate, determining an index value to generate a sequence of second index values.
The in-situ monitoring system may include an eddy current monitoring system. The first sequence of values and the second sequence of values may be eddy current signal values. Determining the first polishing endpoint time may include detecting clearance of a first overlying layer from a first underlying layer on the first substrate. Detecting clearance of a first overlying layer may include detecting a sudden change in a signal from the in-situ monitoring system. The first substrate and the second substrate may be removed from the polishing pad simultaneously. The polishing pad may be rinsed after removing the first substrate and the second substrate. The default overpolishing time may include a first default overpolishing time for the first substrate and a second default overpolishing time for the second substrate.
In other aspects, polishing systems and computer-program products tangibly embodied on a computer readable medium are provided to carry out these methods.
Certain implementations may have one or more of the following advantages. A good balance can be struck between avoiding defects and having the substrates be uniformly polished. By having the substrates on the same platen endpoint at approximately the same time, defects can be avoided, such as scratches caused by rinsing a substrate with water too early or corrosion caused by failing to rinse a substrate in a timely manner. Equalizing polishing times across multiple substrates can also improve throughput. On the other hand, by permitting substrates to be polished for different amounts of time if the potential difference exceeds a threshold, significant variations in polishing can be avoided and wafer-to-wafer polishing uniformity can be increased.
The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims.
Like reference numbers and designations in the various drawings indicate like elements.
Where multiple substrates are being polished simultaneously, e.g., on the same polishing pad, polishing rate variations between the substrates can lead to the substrates reaching their target thickness at different times. By determining a polishing rate for each substrate from in-situ measurements, a projected endpoint time for a target thickness or a projected thickness for a target endpoint time can be determined for each substrate, and the polishing rate for at least one substrate can be adjusted so that the substrates achieve closer endpoint conditions. By “closer endpoint conditions,” it is meant that the substrates would reach their target thickness closer to the same time than without such adjustment, or if the substrates halt polishing at the same time, that the substrates would have closer to the same thickness than without such adjustment.
Nevertheless, even if the polishing rate for one of the substrates is adjusted based on in-situ measurements, variations in when the substrates reach their target thickness can still occur. On the one hand, if polishing is halted simultaneously for the substrates, then some will not be at the desired thickness. On the other hand, if polishing for the substrates is stopped at different times, then some substrates may have defects and the polishing apparatus may be operating at lower throughput.
A technique for controlling overpolishing is to determine whether the difference between the respective times that the substrates will reach their polishing endpoints exceeds a threshold. If the time difference is below the threshold, then the polishing of the substrates can be halted simultaneously. On the other hand, if the time difference is above the threshold, then the polishing of each substrate can be halted at a different time that depends on the time that a polishing endpoint condition is detected.
The polishing apparatus 100 includes a rotatable disk-shaped platen 120 on which a polishing pad 110 is situated. The platen is operable to rotate about an axis 125. For example, a motor 121 can turn a drive shaft 124 to rotate the platen 120. The polishing pad 110 can be detachably secured to the platen 120, for example, by a layer of adhesive. The polishing pad 110 can be a two-layer polishing pad with an outer polishing layer 112 and a softer backing layer 114. The polishing apparatus 100 can include a combined slurry/rinse arm 130. During polishing, the arm 130 is operable to dispense a polishing liquid 132, such as a slurry, onto the polishing pad 110. While only one slurry/rinse arm 130 is shown, additional nozzles, such as one or more dedicated slurry arms per carrier head, can be used. The polishing apparatus can also include a polishing pad conditioner to abrade the polishing pad 110 to maintain the polishing pad 110 in a consistent abrasive state.
In this embodiment, the polishing apparatus 100 includes two (or two or more) carrier heads 140. Each carrier head 140 is operable to hold a substrate 10 (e.g., a first substrate 10a at a first carrier head 140a and a second substrate 10b at a second carrier head 140b) against the polishing pad 110, i.e., the same polishing pad. Each carrier head 140 can have independent control of the polishing parameters, for example pressure, associated with each respective substrate.
In particular, each carrier head 140 can include a retaining ring 142 to retain the substrate 10 below a flexible membrane 144. Each carrier head 140 also includes a plurality of independently controllable pressurizable chambers defined by the membrane, e.g., 3 chambers 146a-146c, which can apply independently controllable pressures to associated zones 148a-148c on the flexible membrane 144 and thus on the substrate 10 (see
Returning to
While only two carrier heads 140 are shown, more carrier heads can be provided to hold additional substrates so that the surface area of polishing pad 110 may be used efficiently. Thus, the number of carrier head assemblies adapted to hold substrates for a simultaneous polishing process can be based, at least in part, on the surface area of the polishing pad 110.
The polishing apparatus also includes an in-situ monitoring system 160, which can be used to detect a polishing endpoint, or to determine whether to adjust a polishing rate or an adjustment for the polishing rate, as discussed below. For each substrate, the in-situ monitoring system generates a time-varying sequence of values that depends on the thickness of a layer on that substrate.
For example, the in-situ-monitoring system 160 can be an optical monitoring system. In particular, the in-situ-monitoring system 160 can be an optical monitoring system that measures a sequence of spectra of light reflected from a substrate during polishing. One monitoring technique is, for each measured spectrum, to identify a matching reference spectrum from a library of reference spectra. Each reference spectrum in the library can have an associated characterizing value, e.g., a thickness value or an index value indicating the time or number of platen rotations at which the reference spectrum is expected to occur. By determining the associated characterizing value for each matching reference spectrum, a time-varying sequence of characterizing values can be generated. This technique is described in U.S. Patent Publication No. 2010-0217430, which is incorporated by reference. Another monitoring technique is to track a characteristic of a spectral feature from the measured spectra, e.g., a wavelength or width of a peak or valley in the measured spectra. The wavelength or width values of the feature from the measured spectra provide the time-varying sequence of values. This technique is described in U.S. Patent Publication No. 2011-0256805, which is incorporated by reference. Another monitoring technique is to fit an optical model to each measured spectrum from the sequence of measured spectra. In particular, a parameter of the optical model is optimized to provide the best fit of the model to the measured spectrum. The parameter value generated for each measured spectrum generates a time-varying sequence of parameter values. This technique is described in U.S. Patent Application No. 61/608,284, filed Mar. 8, 2012, which is incorporated by reference. Another monitoring technique is to perform a Fourier transform of each measured spectrum to generate a sequence of transformed spectra. A position of one of the peaks from the transformed spectrum is measured. 
The position value generated for each measured spectrum generates a time-varying sequence of position values. This technique is described in U.S. patent application Ser. No. 13/454,002, filed Apr. 23, 2012, which is incorporated by reference.
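By way of illustration only, the library-matching technique described above can be sketched as follows. The sketch assumes that the best match is the reference spectrum with the smallest sum of squared differences from the measured spectrum; the matching criterion, the function names, and the sample data are hypothetical and are not taken from the incorporated publications.

```python
def best_match_index(measured, library):
    """Return the index value of the closest reference spectrum.

    `library` is a list of (index_value, reference_spectrum) pairs.
    Closeness is the sum of squared differences (an assumed metric).
    """
    best_index, best_error = None, float("inf")
    for index_value, reference in library:
        error = sum((m - r) ** 2 for m, r in zip(measured, reference))
        if error < best_error:
            best_index, best_error = index_value, error
    return best_index


def index_sequence(measured_spectra, library):
    """Map a sequence of measured spectra to a sequence of index values."""
    return [best_match_index(s, library) for s in measured_spectra]


# Hypothetical three-point spectra, one reference per platen rotation.
library = [
    (10, [0.2, 0.4, 0.6]),
    (11, [0.3, 0.5, 0.7]),
    (12, [0.4, 0.6, 0.8]),
]
measured_spectra = [[0.21, 0.39, 0.61], [0.31, 0.49, 0.72]]
indices = index_sequence(measured_spectra, library)
```

Each measured spectrum yields one index value, so successive sweeps of the sensor produce the time-varying sequence of values for a substrate.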
Other examples of the in-situ-monitoring system 160 include eddy current monitoring systems, capacitive measurement systems, and slurry chemistry monitoring systems. Eddy current monitoring systems are described in U.S. Pat. No. 6,924,641 and U.S. Pat. No. 7,112,960, each of which is incorporated by reference.
The in-situ monitoring system 160 includes a sensor 162 that is supported by and rotates with the platen 120. In this case, the motion of the platen will cause the sensor 162 to scan across each substrate.
As shown by in
Thus, for any given rotation of the platen, based on timing, motor encoder and/or platen position sensor information, the controller 190 can determine which substrate, e.g., substrate 10a or 10b, is the source of the signal. Over multiple rotations of the platen, for each substrate, a sequence of values can be obtained over time.
Referring to
Referring to
As shown in
Referring to
In some implementations, one substrate is selected as a reference substrate, and a projected endpoint time TE at which the reference substrate will reach a target value V is determined. For example, as shown in
In order to determine the projected time at which the reference substrate will reach the target value, the intersection of the line of the reference substrate, e.g., line 214, with the target value, V, can be calculated. Assuming that the polishing rate does not deviate from the expected polishing rate through the remainder of the polishing process, the sequence of values should retain a substantially linear progression. Thus, the expected endpoint time TE can be calculated as a simple linear extrapolation of the line to the target value V, e.g., V=S·(TE−T).
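By way of illustration only, the projection of the fitted line to the target value can be sketched as follows. The sketch assumes an ordinary least-squares fit, which is one possible way of fitting a linear function to the sequence of values; the function names and sample data are hypothetical.

```python
def fit_line(times, values):
    """Least-squares fit of values = slope * time + intercept."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (time, value) pairs")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all measurements share the same time")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def projected_endpoint_time(times, values, target_value):
    """Time at which the fitted line reaches the target value V."""
    slope, intercept = fit_line(times, values)
    if slope == 0:
        raise ValueError("flat trace never reaches the target")
    return (target_value - intercept) / slope


# Hypothetical trace: the value rises by 2 per second from 0.
times = [0.0, 1.0, 2.0, 3.0]
values = [0.0, 2.0, 4.0, 6.0]
te = projected_endpoint_time(times, values, target_value=100.0)
```

With these sample values the fitted slope is 2 per second, so the projected endpoint time is 50 seconds.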
The substrates other than the reference substrate can be defined as adjustable substrates. The point where the line for an adjustable substrate meets the expected endpoint time TE defines a projected endpoint for the adjustable substrate. The linear function of each adjustable substrate, e.g., line 224 in
As shown in
If, as shown in
Thus, in the example of
The reference substrate can be, for example, a predetermined substrate, or a substrate having the earliest or latest projected endpoint time of the substrates. The earliest time is equivalent to the substrate with the thinnest layer if polishing is halted at the same time. Likewise, the latest time is equivalent to the substrate with the thickest layer if polishing is halted at the same time.
For each of the adjustable substrates, a desired slope for the trace can be calculated such that the adjustable substrate reaches the target value at the same time as the reference substrate. For example, the desired slope SD can be calculated from (V−I)=SD*(TE−T0), where I is the value (calculated from the linear function fit to the sequence of values) at the time T0 at which the polishing parameter is to be changed, V is the target value, and TE is the calculated expected endpoint time.
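By way of illustration only, solving the relation (V−I)=SD*(TE−T0) for the desired slope can be sketched as follows. The function name and the numerical values are hypothetical.

```python
def desired_slope(target_value, current_value, endpoint_time, change_time):
    """Solve (V - I) = SD * (TE - T0) for the desired slope SD."""
    remaining = endpoint_time - change_time
    if remaining <= 0:
        raise ValueError("endpoint time must be later than the change time")
    return (target_value - current_value) / remaining


# Hypothetical values: V = 100, I = 40 at T0 = 20 s, TE = 50 s.
sd = desired_slope(target_value=100.0, current_value=40.0,
                   endpoint_time=50.0, change_time=20.0)
```

Here the adjustable substrate must cover 60 units in the remaining 30 seconds, giving a desired slope of 2 units per second.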
In some implementations, there is no reference substrate. For example, the expected endpoint time TE′ can be a predetermined time, e.g., set by the user prior to the polishing process, or can be calculated from an average or other combination of the expected endpoint times of two or more substrates (as calculated by projecting the lines for various substrates to the target value). In this implementation, the desired slopes are calculated substantially as discussed above (using the expected endpoint time TE′ rather than TE), although the desired slope for the first substrate must also be calculated, e.g., the desired slope SD can be calculated from (V−I)=SD*(TE′−T0).
In some implementations, (which can also be combined with the implementation shown in
For any of the methods described above, the polishing rate is adjusted to bring the slope of a trace closer to the desired slope. The polishing rate can be adjusted by, for example, increasing or decreasing the pressure in a corresponding chamber of a carrier head. The change in polishing rate can be assumed to be directly proportional to the change in pressure, e.g., a simple Prestonian model. For example, for each substrate, where the substrate was polished with a pressure Pold prior to the time T0, a new pressure Pnew to apply after time T0 can be calculated as Pnew=Pold*(SD/S), where S is the slope of the line prior to time T0 and SD is the desired slope.
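By way of illustration only, the pressure update Pnew=Pold*(SD/S) can be sketched as follows. The function name and the numerical values are hypothetical, and the sketch relies on the simple proportional (Prestonian) assumption stated above.

```python
def adjusted_pressure(p_old, slope, slope_desired):
    """Scale the chamber pressure by the ratio of desired to current slope.

    Assumes the polishing rate is directly proportional to pressure.
    """
    if slope == 0:
        raise ValueError("current slope must be non-zero")
    return p_old * (slope_desired / slope)


# Hypothetical values: current slope 2.0, desired slope 2.5, Pold = 2.0.
p_new = adjusted_pressure(p_old=2.0, slope=2.0, slope_desired=2.5)
```

A desired slope 25% steeper than the current slope raises the pressure by the same 25%, here from 2.0 to 2.5 in whatever pressure units the controller uses.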
The process of determining projected times that the substrates will reach the target thickness, and adjusting the polishing rates, can be performed just once during the polishing process, e.g., at a specified time, e.g., 40 to 60% through the expected polishing time, or performed multiple times during the polishing process, e.g., every thirty to sixty seconds. At a subsequent time during the polishing process, the rates can again be adjusted, if appropriate. During the polishing process, changes in the polishing rates can be made only a few times, such as four, three, two or only one time. The adjustment can be made near the beginning, at the middle, or toward the end of the polishing process.
Polishing continues after the polishing rates have been adjusted, e.g., after time T0, and the optical monitoring system continues to collect spectra and determine values for each substrate.
Referring to
The time T1 that the first line 214′ equals the target value V can be calculated, and similarly the time T2 that the second line 224′ equals the target value V can be calculated. A time difference ΔT is calculated as |T1−T2|.
In some implementations, e.g., for metal polishing, e.g., copper polishing, after detection of the endpoint for a substrate, the substrate is immediately subjected to an overpolishing process, e.g., to remove metal residue, e.g., copper residue. Although theoretically the polishing process can be stopped as soon as an underlying layer, e.g., a dielectric material, is exposed, in practice stopping the polishing immediately may result in metal residue (e.g., in the form of spots or islands) over the underlying layer. Overpolishing the metal (e.g., copper, in this example) ensures removal of such residues and reduces undesired short circuits. The overpolishing process can be at a uniform pressure for all zones of the substrate, e.g., 1 to 1.5 psi. The overpolishing process typically has a duration of 10 to 15 seconds.
During bulk polishing of a metal such as copper, pressure can be used as a control variable to polish dual (or multiple) substrates on the same platen to substantially the same thickness in a target time. In the case of overpolishing, however, pressure is typically not used as a control variable since pressure variations may result in poor topography. In such cases, the overpolishing time may be suitably adjusted to achieve substantially equal polishing time for multiple substrates. This avoids defects caused by unequal polishing times while achieving good topography by maintaining substantially the same pressure during the overpolishing.
The controller 190 can store a threshold time difference TTD. The threshold time difference TTD can be set by the user or the manufacturer of the equipment. The threshold time difference TTD can be, e.g., 2 to 6 seconds. The controller 190 can also store a default overpolishing time TOP. The default overpolishing time TOP can be set by the user or the manufacturer of the equipment. The default overpolishing time TOP can be, e.g., 5 to 20 seconds.
If the time difference ΔT is less than the threshold time difference TTD, then the controller 190 can halt polishing of the substrates 10a, 10b simultaneously. In this case, the overpolishing time for at least one of the substrates is calculated, but all of the substrates halt polishing at the same time. In some implementations, an overpolishing time can be calculated for each substrate.
For example, the overpolishing time for a substrate (say substrate i) can be calculated as:
$$T_{OPi} = T_{OP} + T_{AVG} - T_i$$
wherein Ti denotes the endpoint time for the substrate, e.g., T1 for the first substrate 10a and T2 for the second substrate 10b, and TAVG denotes the average endpoint time across all substrates being polished on the same platen, e.g., (T1+T2)/2.
For example, if two substrates have endpoint times T1=57.4 seconds and T2=58 seconds, then TAVG=57.7 seconds. Therefore, assuming the default overpolishing time TOP is 15 seconds, then using the above equation, the respective overpolishing time TOP1 for the first substrate is calculated as 15.3 seconds. Polishing is halted for both substrates at the time T1+TOP1. Thus, the entire polishing process for both substrates ends at 72.7 seconds. The pressure is kept substantially the same throughout the overpolishing in order to ensure good topography on both substrates. Alternatively, overpolishing times could be calculated for all of the substrates. Alternatively, an overpolishing stop time can be calculated by adding the default overpolishing time to the average endpoint time.
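By way of illustration only, the worked example above can be reproduced as follows. The sketch is arranged so that a substrate reaching its endpoint earlier than the average receives a correspondingly longer overpolishing time, which is what the figures above (15.3 seconds for the first substrate and a common stop at 72.7 seconds) imply. The function name is hypothetical.

```python
def overpolish_times(endpoint_times, default_overpolish):
    """Per-substrate overpolishing times that yield a common stop time.

    Each substrate i receives TOP + TAVG - Ti, so that Ti plus its
    overpolishing time equals TAVG + TOP for every substrate.
    """
    t_avg = sum(endpoint_times) / len(endpoint_times)
    return [default_overpolish + t_avg - t for t in endpoint_times]


endpoint_times = [57.4, 58.0]          # T1 and T2, in seconds
t_op = overpolish_times(endpoint_times, default_overpolish=15.0)
stop_times = [t + op for t, op in zip(endpoint_times, t_op)]

for t_i, op_i, stop_i in zip(endpoint_times, t_op, stop_times):
    print(f"endpoint {t_i:.1f} s, overpolish {op_i:.1f} s, stop {stop_i:.1f} s")
```

Both substrates stop at the average endpoint time plus the default overpolishing time, which is the alternative calculation mentioned at the end of the paragraph above.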
On the other hand, if the time difference ΔT is greater than the threshold time difference TTD, then the controller 190 can use the default overpolishing time TOP for each substrate, so that polishing halts for the different substrates at different times.
For example, polishing of the first substrate 10a can be halted at a first time T1+TOP and polishing of the second substrate 10b can be halted at a second time T2+TOP.
This approach strikes a balance between avoiding defects and having the substrates be uniformly polished. On the one hand, by halting polishing of substrates simultaneously on the same platen, the substrates can be lifted from the polishing pad simultaneously, and defects can be avoided, such as scratches caused by rinsing a substrate with water too early or corrosion caused by failing to rinse a substrate in a timely manner. On the other hand, by permitting substrates to be polished for different amounts of time if the potential difference exceeds a threshold, significant variations in polishing can be avoided and wafer-to-wafer polishing uniformity can be increased.
As seen from the above equation, the adjusted overpolishing times for various substrates are functions of the predetermined parameter TOP. The parameter TOP can be chosen in various ways. For example, the parameter TOP can be selected based on the material being polished. In some cases, the parameter TOP may be adjusted based on observed results. For example, if it is observed that polishing for an adjusted overpolishing time fails to remove all residues, then the parameter TOP may be increased to achieve better removal of the residues. Even though the example in
After overpolishing has been completed for all substrates, rinsing of the polishing pad commences. In addition, all of the carrier heads can lift the substrates off the polishing pad simultaneously.
Referring to
Referring to
A first substrate and a second substrate are monitored to determine polishing endpoint times (step 704) (this can be performed by step 614 above). The monitoring can be done in various ways, including, for example using a spectrometric optical monitoring system, a laser based monitoring system or an eddy current monitoring system. Even though the flowchart 700 describes only a first substrate and a second substrate, additional substrates can be polished on the same platen.
The time difference between the endpoint times is calculated (step 706), and an overpolishing time is calculated for each substrate on the platen (step 708). Calculating the overpolishing time includes comparing the time difference to a threshold and using a different overpolishing calculation if the difference is above the threshold than below the threshold (step 710). If the time difference is less than the threshold, then an overpolishing time can be calculated for a substrate, and the controller can halt polishing of all substrates based on when the overpolishing time elapses such that the overall polishing process (including the polishing and overpolishing) ends at the same time for all substrates (step 712). Thus, the overpolishing stop time is the same for all of the substrates. On the other hand, if the time difference is greater than the threshold, then the overpolishing times can be equal to the default overpolishing time for each substrate, so that the substrates halt polishing at different times (step 714).
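By way of illustration only, the decision in steps 706 to 714 can be sketched as follows. The sketch assumes that, for more than two substrates, the time difference is taken as the spread between the earliest and latest endpoint times, and that a difference exactly equal to the threshold is handled like a difference above it; neither case is specified above. The function name is hypothetical.

```python
def overpolish_stop_times(endpoint_times, default_overpolish, threshold):
    """Return one overpolishing stop time per substrate.

    Below the threshold: a common stop time (average endpoint time plus
    the default overpolishing time). Otherwise: each substrate stops at
    its own endpoint time plus the default overpolishing time.
    """
    # Assumption: for two substrates this is |T1 - T2|.
    difference = max(endpoint_times) - min(endpoint_times)
    if difference < threshold:
        t_avg = sum(endpoint_times) / len(endpoint_times)
        return [t_avg + default_overpolish] * len(endpoint_times)
    return [t + default_overpolish for t in endpoint_times]


# Endpoints 0.6 s apart with a 4 s threshold: simultaneous halt.
stops_close = overpolish_stop_times([57.4, 58.0], 15.0, 4.0)

# Endpoints 7 s apart with the same threshold: separate halts.
stops_far = overpolish_stop_times([55.0, 62.0], 15.0, 4.0)
```

In the first case both substrates stop at about 72.7 seconds; in the second they stop at 70 and 77 seconds respectively.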
Although the examples above discuss calculating the time difference from projections of functions that are fit to a sequence of values, the time difference can be calculated as the difference between times of detection of clearance of an overlying layer. In general, for some monitoring systems, when an underlying layer is exposed, there is a sudden change in the signal from the sensor. This sudden change can be detected and the time of the sudden change can be used as the endpoint time.
The monitoring systems can be of various types, e.g., a spectrographic monitoring system, a laser monitoring system or an eddy current monitoring system. For example, in the case of a laser monitoring system used to monitor polishing of metal, e.g., copper, the intensity of the reflected light beam, and thus the signal from the in-situ monitoring system, drops as the underlying dielectric layer is exposed. For example, in the case of an eddy current monitoring system used to monitor polishing of metal, e.g., copper, the signal strength from the in-situ monitoring system can be generally proportional to the metal layer thickness.
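By way of illustration only, detection of a sudden change in the sensor signal can be sketched as follows. The sketch assumes that a sudden change is a sample-to-sample step whose magnitude meets a chosen minimum; this criterion, the function name, and the sample trace are hypothetical, and a practical system would likely filter the signal first.

```python
def detect_clearance_time(times, signal, min_step):
    """Return the time of the first step of at least `min_step`, or None."""
    for i in range(1, len(signal)):
        if abs(signal[i] - signal[i - 1]) >= min_step:
            return times[i]
    return None


# Hypothetical reflected-intensity trace that drops between 3 s and 4 s.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
signal = [1.00, 0.98, 0.97, 0.95, 0.60, 0.58]
endpoint_time = detect_clearance_time(times, signal, min_step=0.2)
```

Applying the same detection to each substrate gives the endpoint times whose difference is compared against the threshold.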
The controller 190 can include a central processing unit (CPU) 192, a memory 194, and support circuits 196, e.g., input/output circuitry, power supplies, clock circuits, cache, and the like. In addition to receiving signals from the in-situ monitoring system 160 (and any other endpoint detection system 180), the controller 190 can be connected to the polishing apparatus 100 to control the polishing parameters, e.g., the various rotational rates of the platen(s) and carrier head(s) and pressure(s) applied by the carrier head. The memory is connected to the CPU 192. The memory, or computer-readable medium, can be one or more readily available memories such as random access memory (RAM), read only memory (ROM), floppy disk, hard disk, or other form of digital storage. In addition, although illustrated as a single computer, the controller 190 could be a distributed system, e.g., including multiple independently operating processors and memories.
Embodiments of the invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and structural equivalents thereof, or in combinations of them. Embodiments of the invention can be implemented as one or more computer program products, i.e., one or more computer programs tangibly embodied in a machine-readable storage medium, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple processors or computers. A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file. A program can be stored in a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
The above described polishing apparatus and methods can be applied in a variety of polishing systems. Either the polishing pad, or the carrier heads, or both can move to provide relative motion between the polishing surface and the substrate. For example, the platen may orbit rather than rotate. The polishing pad can be a circular (or some other shape) pad secured to the platen. Some aspects of the endpoint detection system may be applicable to linear polishing systems, e.g., where the polishing pad is a continuous or a reel-to-reel belt that moves linearly. The polishing layer can be a standard (for example, polyurethane with or without fillers) polishing material, a soft material, or a fixed-abrasive material. Terms of relative positioning are used; it should be understood that the polishing surface and substrate can be held in a vertical orientation or some other orientation.
Particular embodiments of the invention have been described. Other embodiments are within the scope of the following claims.