LITHOGRAPHIC APPARATUS AND DEVICE MANUFACTURING METHOD

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority of EP application 15182697.1 which was filed on 27 Aug. 2015 and which is incorporated herein in its entirety by reference.

BACKGROUND
Field of the Invention

The present invention relates to an initialization method for a sensor, a metrology sensor such as an alignment sensor, an overlay sensor or a level sensor, a metrology measurement method such as an alignment method, a lithographic apparatus, and a method for manufacturing a device.

Description of the Related Art

A lithographic apparatus is a machine that applies a desired pattern onto a substrate, usually onto a target portion of the substrate. A lithographic apparatus can be used, for example, in the manufacture of integrated circuits (ICs). In such a case, a patterning device, which is alternatively referred to as a mask or a reticle, may be used to generate a circuit pattern to be formed on an individual layer of the IC. This pattern can be transferred onto a target portion (e.g. including part of, one, or several dies) on a substrate (e.g. a silicon wafer). Transfer of the pattern is typically via imaging onto a layer of radiation-sensitive material (resist) provided on the substrate. In general, a single substrate will contain a network of adjacent target portions that are successively patterned.

Conventional lithographic apparatus include so-called steppers, in which each target portion is irradiated by exposing an entire pattern onto the target portion at once, and so-called scanners, in which each target portion is irradiated by scanning the pattern through a radiation beam in a given direction (the “scanning”-direction) while synchronously scanning the substrate parallel or anti-parallel to this direction. It is also possible to transfer the pattern from the patterning device to the substrate by imprinting the pattern onto the substrate.

Typically, the integrated circuits as manufactured include a plurality of layers containing different patterns, each layer being generated using an exposure process as described above. In order to ensure proper operation of the integrated circuit that is manufactured, the layers as consecutively exposed need to be properly aligned to each other. In order to realize this, substrates are typically provided with a plurality of so-called alignment marks (also referred to as alignment targets), whereby a position of the alignment marks is used to determine or estimate a position of a previously exposed pattern. As such, prior to the exposure of a subsequent layer, the position of alignment marks is determined and used to determine a position of the pattern that was previously exposed. Typically, in order to determine the positions of such alignment marks, an alignment sensor is applied which may e.g. be configured to project a radiation beam onto an alignment mark or target and determine, based on a reflected radiation beam, a position of the alignment mark.

Ideally, the measured position of the alignment mark would correspond to the actual position of the mark. However, various causes may result in a deviation between the measured position and the actual position of the alignment mark. In particular, a deformation of the alignment mark may result in the mentioned deviation. Such a deformation may e.g. be caused by the processing of the substrate outside the lithographic apparatus, such processing e.g. including etching and chemical mechanical polishing.

As a result, the subsequent layer may be projected or exposed on a position which is not in line, i.e. not aligned, with the previously exposed pattern, resulting in a so-called overlay error.

Note that other sensors as typically applied in a lithographic apparatus or as applied to assess lithographic processes as performed by a lithographic apparatus, e.g. metrology sensors, may suffer from similar problems. Examples of such sensors include overlay sensors and level sensors.

SUMMARY

It is desirable to provide an improved measurement method for measuring a property of an object, in particular a substrate or a patterning device. Such a measurement method is e.g. performed by a metrology sensor such as an alignment sensor, an overlay sensor or a level sensor.

In an embodiment, a measurement method for measuring a position of alignment marks on a substrate enabling a more accurate determination of an actual position of an alignment mark can be mentioned. In a first aspect of the present invention, an initialization method for a sensor, in particular a metrology sensor, is provided, the sensor being configured to perform a plurality of measurements of a property of an object using a respective plurality of different measurement parameters, different ones of the plurality of measurements using different measurement parameters, the method comprising:

- estimating a characteristic of the property based on the plurality of measurements, the characteristic comprising a combination of respective outcomes of respective ones of the plurality of measurements weighted by a respective weighting coefficient;
- using a plurality of models of the object, each respective one of the models being configured to enable a respective simulation of the performing of the plurality of measurements;
- performing, for each respective one of the plurality of models, a respective simulation, the respective simulation including simulating the plurality of measurements under control of a respective plurality of different simulation parameters to obtain a respective plurality of simulated characteristics of the property, the plurality of different simulation parameters being indicative of the plurality of different measurement parameters;
- determining, for each respective one of the plurality of models, a respective bias representative of a respective difference between a respective theoretical characteristic of the property in accordance with the respective model and a respective further combination of the simulated characteristics of the property in the respective model; the respective further combination of the simulated characteristics comprising the plurality of weight coefficients, each particular one of the plurality of weight coefficients being associated with a particular one of the plurality of different simulation parameters;
- using a cost function configured to optimize a correspondence between the simulated characteristic of the property and the theoretical characteristic of the property; the cost function being a function of the respective biases of the plurality of models;
- optimizing the cost function, thereby deriving the plurality of the weight coefficients from the cost function;
- using the weight coefficients and the associated simulation parameters in a controller associated with the sensor.
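
The estimate, the per-model bias and the cost function named in the steps above can be written compactly as follows. The notation is an assumption introduced here for illustration only: N is the number of measurement parameters, T the number of models, w_i the weight coefficients, y(λ_i) a measurement outcome for parameter λ_i, and the superscripts "th" and "sim" mark theoretical and simulated characteristics. The quadratic form of the cost function is one possible choice and is not prescribed by the text.

```latex
\hat{y} = \sum_{i=1}^{N} w_i \, y(\lambda_i), \qquad
b_t(\mathbf{w}) = y_t^{\mathrm{th}} - \sum_{i=1}^{N} w_i \, y_t^{\mathrm{sim}}(\lambda_i), \qquad
J(\mathbf{w}) = \sum_{t=1}^{T} b_t(\mathbf{w})^{2}, \qquad
\mathbf{w}^{*} = \arg\min_{\mathbf{w}} J(\mathbf{w})
```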

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying schematic drawings in which corresponding reference symbols indicate corresponding parts, and in which:

FIG. 1 depicts a lithographic apparatus according to an embodiment of the invention;

FIG. 2 depicts several possible alignment measurement results when applying different measurement parameters;

FIG. 3 depicts a cross-section of an alignment mark and possible alignment mark deformations;

FIGS. 4a and 4b depict a simulation model of part of a stack of a substrate;

FIG. 5 schematically depicts the simulated alignment mark positions as obtained for a set of T samples;

FIG. 6 depicts an alignment system enabling asymmetrical measurements.

DETAILED DESCRIPTION

FIG. 1 schematically depicts a lithographic apparatus according to one embodiment of the invention. The apparatus includes an illumination system (illuminator) IL configured to condition a radiation beam B (e.g. UV radiation or any other suitable radiation), a mask support structure (e.g. a mask table) MT constructed to support a patterning device (e.g. a mask) MA and connected to a first positioning device PM configured to accurately position the patterning device in accordance with certain parameters. The apparatus also includes a substrate table (e.g. a wafer table) WT or “substrate support” constructed to hold a substrate (e.g. a resist-coated wafer) W and connected to a second positioning device PW configured to accurately position the substrate in accordance with certain parameters. The apparatus further includes a projection system (e.g. a refractive projection lens system) PS configured to project a pattern imparted to the radiation beam B by patterning device MA onto a target portion C (e.g. including one or more dies) of the substrate W.

The illumination system may include various types of optical components, such as refractive, reflective, magnetic, electromagnetic, electrostatic or other types of optical components, or any combination thereof, for directing, shaping, or controlling radiation.

The mask support structure supports, i.e. bears the weight of, the patterning device. It holds the patterning device in a manner that depends on the orientation of the patterning device, the design of the lithographic apparatus, and other conditions, such as for example whether or not the patterning device is held in a vacuum environment. The mask support structure can use mechanical, vacuum, electrostatic or other clamping techniques to hold the patterning device. The mask support structure may be a frame or a table, for example, which may be fixed or movable as required. The mask support structure may ensure that the patterning device is at a desired position, for example with respect to the projection system. Any use of the terms “reticle” or “mask” herein may be considered synonymous with the more general term “patterning device.”

The term “patterning device” used herein should be broadly interpreted as referring to any device that can be used to impart a radiation beam with a pattern in its cross-section so as to create a pattern in a target portion of the substrate. It should be noted that the pattern imparted to the radiation beam may not exactly correspond to the desired pattern in the target portion of the substrate, for example if the pattern includes phase-shifting features or so called assist features. Generally, the pattern imparted to the radiation beam will correspond to a particular functional layer in a device being created in the target portion, such as an integrated circuit.

The patterning device may be transmissive or reflective. Examples of patterning devices include masks, programmable mirror arrays, and programmable LCD panels. Masks are well known in lithography, and include mask types such as binary, alternating phase-shift, and attenuated phase-shift, as well as various hybrid mask types. An example of a programmable mirror array employs a matrix arrangement of small mirrors, each of which can be individually tilted so as to reflect an incoming radiation beam in different directions. The tilted mirrors impart a pattern in a radiation beam which is reflected by the mirror matrix.

The term “projection system” used herein should be broadly interpreted as encompassing any type of projection system, including refractive, reflective, catadioptric, magnetic, electromagnetic and electrostatic optical systems, or any combination thereof, as appropriate for the exposure radiation being used, or for other factors such as the use of an immersion liquid or the use of a vacuum. Any use of the term “projection lens” herein may be considered as synonymous with the more general term “projection system”.

As here depicted, the apparatus is of a transmissive type (e.g. employing a transmissive mask). Alternatively, the apparatus may be of a reflective type (e.g. employing a programmable mirror array of a type as referred to above, or employing a reflective mask).

The lithographic apparatus may be of a type having two (dual stage) or more substrate tables or “substrate supports” (and/or two or more mask tables or “mask supports”). In such “multiple stage” machines the additional tables or supports may be used in parallel, or preparatory steps may be carried out on one or more tables or supports while one or more other tables or supports are being used for exposure.

The lithographic apparatus may also be of a type wherein at least a portion of the substrate may be covered by a liquid having a relatively high refractive index, e.g. water, so as to fill a space between the projection system and the substrate. An immersion liquid may also be applied to other spaces in the lithographic apparatus, for example, between the mask and the projection system. Immersion techniques can be used to increase the numerical aperture of projection systems. The term “immersion” as used herein does not mean that a structure, such as a substrate, must be submerged in liquid, but rather only means that a liquid is located between the projection system and the substrate during exposure.

Referring to FIG. 1, the illuminator IL receives a radiation beam from a radiation source SO. The source and the lithographic apparatus may be separate entities, for example when the source is an excimer laser. In such cases, the source is not considered to form part of the lithographic apparatus and the radiation beam is passed from the source SO to the illuminator IL with the aid of a beam delivery system BD including, for example, suitable directing mirrors and/or a beam expander. In other cases the source may be an integral part of the lithographic apparatus, for example when the source is a mercury lamp. The source SO and the illuminator IL, together with the beam delivery system BD if required, may be referred to as a radiation system.

The illuminator IL may include an adjuster AD configured to adjust the angular intensity distribution of the radiation beam. Generally, at least the outer and/or inner radial extent (commonly referred to as σ-outer and σ-inner, respectively) of the intensity distribution in a pupil plane of the illuminator can be adjusted. In addition, the illuminator IL may include various other components, such as an integrator IN and a condenser CO. The illuminator may be used to condition the radiation beam, to have a desired uniformity and intensity distribution in its cross-section.

The radiation beam B is incident on the patterning device (e.g., mask MA), which is held on the mask support structure (e.g., mask table MT), and is patterned by the patterning device. Having traversed the mask MA, the radiation beam B passes through the projection system PS, which focuses the beam onto a target portion C of the substrate W. With the aid of the second positioning device PW and position sensor IF (e.g. an interferometric device, linear encoder or capacitive sensor), the substrate table WT can be moved accurately, e.g. so as to position different target portions C in the path of the radiation beam B. Similarly, the first positioning device PM and another position sensor (which is not explicitly depicted in FIG. 1) can be used to accurately position the mask MA with respect to the path of the radiation beam B, e.g. after mechanical retrieval from a mask library, or during a scan. In general, movement of the mask table MT may be realized with the aid of a long-stroke module (coarse positioning) and a short-stroke module (fine positioning), which form part of the first positioning device PM. Similarly, movement of the substrate table WT or “substrate support” may be realized using a long-stroke module and a short-stroke module, which form part of the second positioner PW. In the case of a stepper (as opposed to a scanner) the mask table MT may be connected to a short-stroke actuator only, or may be fixed. Mask MA and substrate W may be aligned using mask alignment marks M1, M2 and substrate alignment marks P1, P2. Although the substrate alignment marks as illustrated occupy dedicated target portions, they may be located in spaces between target portions (these are known as scribe-lane alignment marks). Similarly, in situations in which more than one die is provided on the mask MA, the mask alignment marks may be located between the dies.

The depicted apparatus could be used in at least one of the following modes:

1. In step mode, the mask table MT or “mask support” and the substrate table WT or “substrate support” are kept essentially stationary, while an entire pattern imparted to the radiation beam is projected onto a target portion C at one time (i.e. a single static exposure). The substrate table WT or “substrate support” is then shifted in the X and/or Y direction so that a different target portion C can be exposed. In step mode, the maximum size of the exposure field limits the size of the target portion C imaged in a single static exposure.
2. In scan mode, the mask table MT or “mask support” and the substrate table WT or “substrate support” are scanned synchronously while a pattern imparted to the radiation beam is projected onto a target portion C (i.e. a single dynamic exposure). The velocity and direction of the substrate table WT or “substrate support” relative to the mask table MT or “mask support” may be determined by the (de-)magnification and image reversal characteristics of the projection system PS. In scan mode, the maximum size of the exposure field limits the width (in the non-scanning direction) of the target portion in a single dynamic exposure, whereas the length of the scanning motion determines the height (in the scanning direction) of the target portion.
3. In another mode, the mask table MT or “mask support” is kept essentially stationary holding a programmable patterning device, and the substrate table WT or “substrate support” is moved or scanned while a pattern imparted to the radiation beam is projected onto a target portion C. In this mode, generally a pulsed radiation source is employed and the programmable patterning device is updated as required after each movement of the substrate table WT or “substrate support” or in between successive radiation pulses during a scan. This mode of operation can be readily applied to maskless lithography that utilizes a programmable patterning device, such as a programmable mirror array of a type as referred to above.

Combinations and/or variations on the above described modes of use or entirely different modes of use may also be employed.

In order to facilitate the exposure process, the lithographic apparatus typically comprises one or more sensors, also referred to as metrology sensors, that are used to measure certain properties of an object, e.g. a substrate, prior to the object being used in a lithographic process, e.g. exposed. Examples of such sensors include a level sensor, an overlay sensor and an alignment sensor. Typically, these sensors enable a particular property of the object to be characterized by means of measurements. A result of such a measurement of a particular property is referred to, within the meaning of the present invention, as a characteristic of the property. Such a characteristic may e.g. be a particular value of the property, e.g. a height level at a particular position on a substrate; it may, however, also be a vector or tensor representing the property. As an example, an overlay error at a particular position on a substrate may e.g. be characterized as a vector, e.g. both the amplitude and direction of the overlay error at the particular position.
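
As a minimal illustration of a characteristic that is a vector rather than a single value, the sketch below converts the X and Y components of an overlay error into an amplitude and a direction. The function name, the choice of nanometres as unit and the example numbers are assumptions made for illustration; they do not appear in the description.

```python
import math


def overlay_characteristic(dx_nm, dy_nm):
    """Return (amplitude, direction in degrees) of an overlay error vector.

    dx_nm and dy_nm are hypothetical overlay error components along the
    X-direction and Y-direction, expressed in nanometres.
    """
    amplitude = math.hypot(dx_nm, dy_nm)
    # atan2 keeps the correct quadrant for negative components.
    direction_deg = math.degrees(math.atan2(dy_nm, dx_nm))
    return amplitude, direction_deg


# Illustrative overlay error at one position on a substrate.
amplitude, direction = overlay_characteristic(3.0, 4.0)
print(amplitude, direction)
```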

The present invention provides methods to enhance the performance of metrology sensors. Such sensors have been found to be sensitive to a phenomenon referred to in the present invention as ‘measurement parameter dependency’, referring to the fact that measurement results obtained using these sensors may vary depending on e.g. a measurement parameter used. Examples of such measurement parameters include, but are not limited to, the use of different wavelengths or polarizations in measurement beams and the use of different measurement angles. Also, variations that may e.g. be attributed to manufacturing tolerances or sensor imperfections, e.g. drift, may be considered examples of such measurement parameter dependency. The present invention advantageously makes use of the observation that such variations in measurements obtained when using different measurement parameters may be caused by undesired phenomena such as process variations, mark deformations, sensor asymmetries and imperfections, and that the sensitivity for such variations has been found to also vary depending on the applied measurement parameter.

In an embodiment, the present invention provides an initialization method or calibration method for such sensors. Such initialization or calibration method involves determining an optimal way of combining a plurality of measurements as performed by the sensor to arrive at a property of the substrate (e.g. an alignment mark position or a height level), thereby taking proper account of the phenomenon of ‘measurement parameter dependency’, as will be explained in more detail below.

Embodiments of the initialization method according to the present invention may be performed off line or inline.

The methods according to the present invention that enable the improved performance of the sensors are explained below for an alignment sensor. It can however be noted that the methods as described may readily be applied to the measurements as performed by other metrology sensors such as level sensors or overlay sensors as well.

Further, it can be noted that the initialization method according to the present invention may make use of simulations or measurements or a combination thereof. As will be explained in more detail below, the initialization method according to the present invention makes use of a plurality of so-called sampled characteristics, e.g. scalar values or vectors of a particular property that is measured by the sensor. Within the meaning of the present invention, a sample may either refer to a model such as a mathematical model that is used to simulate a plurality of measurements, using a plurality of different measurement parameters, of the particular property, or it may refer to a substrate onto which a plurality of measurements, using a plurality of different measurement parameters, of the particular property are performed. In order to distinguish both, a reference to a simulation sample refers to a mathematical model used to simulate a particular measurement, e.g. an alignment measurement, whereas a reference to a measurement sample refers to the physical item, i.e. the substrate, in general the object, that is used to perform the measurement on. As such, a more generalized formulation of the initialization method may have the following form:

An initialization method for a sensor, the sensor being configured to perform a plurality of measurements of a property of an object such as a substrate using a respective plurality of different measurement parameters, the method comprising:

- estimating a characteristic of the property based on the plurality of measurements, the characteristic comprising a combination of respective outcomes of respective ones of the plurality of measurements weighted by a respective weighting coefficient;
- obtaining, for each of a plurality of samples, a plurality of sample characteristics, the sample characteristics representing measurements of the property by means of a respective plurality of different sample parameters;
- determining, for each of the samples, a bias as the difference between a theoretical value of the property in accordance with the respective sample and a combination of the sample values of the property of the respective sample weighted by a respective weighting coefficient; the combination of the sample values of the property comprising the plurality of weight coefficients, each weight coefficient of the plurality of weight coefficients being associated with a respective sample parameter of the plurality of different sample parameters; whereby the weighted combination of the sample values of the property of each sample comprises the same plurality of weight coefficients;
- using a cost function configured to optimize a correspondence between the sample characteristic of the property and the theoretical characteristic of the property; the cost function being a function of the respective biases of the plurality of samples, e.g. comprising a sum of the biases;
- optimizing the cost function, thereby deriving the plurality of the weight coefficients from the cost function;
- using the weight coefficients and the associated sample parameters in a controller associated with the sensor.
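
The steps above can be sketched in code. The following is a minimal sketch under stated assumptions, not the method as claimed: the cost function is taken to be the sum of squared biases over all samples (the text only requires a function of the biases), the weights are obtained by solving the resulting normal equations, and all function names, variable names and numbers are hypothetical. The synthetic samples assume that each measured value equals the theoretical value plus a deformation term scaled by a sensitivity that depends on the sample parameter.

```python
def solve_linear(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def sample_biases(sample_values, theoretical, weights):
    """Bias per sample: theoretical value minus the weighted combination."""
    return [y - sum(w * v for w, v in zip(weights, row))
            for row, y in zip(sample_values, theoretical)]


def derive_weights(sample_values, theoretical):
    """Weights minimising the sum of squared biases (an assumed cost function)."""
    n = len(sample_values[0])
    ata = [[sum(row[i] * row[j] for row in sample_values) for j in range(n)]
           for i in range(n)]
    atb = [sum(row[i] * y for row, y in zip(sample_values, theoretical))
           for i in range(n)]
    return solve_linear(ata, atb)


# Synthetic samples (illustrative numbers only): four samples, two parameters.
theoretical = [10.0, 12.0, 9.0, 11.0]   # theoretical value per sample
deformation = [1.0, -2.0, 0.5, 3.0]     # hypothetical deformation per sample
sensitivity = [1.0, -0.5]               # hypothetical sensitivity per parameter
sample_values = [[y + s * d for s in sensitivity]
                 for y, d in zip(theoretical, deformation)]

weights = derive_weights(sample_values, theoretical)
residual_biases = sample_biases(sample_values, theoretical, weights)
print(weights, residual_biases)
```

In this synthetic case the same set of weights is applied to every sample, and the derived weights combine the two parameters so that their opposite sensitivities to the deformation cancel. With real simulated or measured samples the biases would generally not vanish, and the cost function would only be minimised.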

In case the samples represent actual objects, e.g. substrates, the sample characteristics may e.g. be values derived from measurements on a plurality of such substrates. The method in general results in a set of weight coefficients for weighting a plurality of measurements, e.g. a plurality of alignment measurements using different parameters. The method, however, also enables different sensors to be matched, e.g. sensors used in different lithographic apparatuses.

In the embodiment below, the initialization method according to the present invention is illustrated for an alignment sensor, whereby a plurality of mathematical models is used as samples.

In accordance with an embodiment of the present invention, the lithographic apparatus further comprises an alignment system AS configured to determine a position of one or more alignment marks that are present on a substrate. The alignment system AS according to the present invention may be calibrated by the calibration method according to the present invention and may be configured to perform the alignment method according to the present invention. As such, the alignment system AS enables an actual position of a plurality of alignment marks that are provided on a substrate to be obtained in a more accurate manner and, as a result, provides an improved way of performing an alignment between a substrate (e.g. provided with a pattern during an exposure process) and a patterning device. In particular, an embodiment of the present invention provides a method of obtaining a more accurate alignment by taking deformations of alignment marks, e.g. particular asymmetries, into account. It has been observed by the inventors that such alignment mark deformations may cause errors in the alignment measurement process. In particular, alignment mark deformations may cause discrepancies between the positions of the alignment marks as measured and the actual positions.

In accordance with the present invention, the alignment system as applied is configured to perform a plurality of different alignment measurements, thereby obtaining a plurality of measured alignment mark positions for the alignment mark that is considered. Within the meaning of the present invention, performing different alignment measurements for a particular alignment mark means performing an alignment measurement using different measurement parameters or characteristics; the different measurement parameters or characteristics as applied are, within the meaning of the present invention, denoted as different values of a parameter λ. Such different measurement parameters or characteristics λ₁, λ₂, λ₃, . . . λ_i may e.g. include using different optical properties to perform the alignment measurement. As an example, the alignment system as applied in the lithographic apparatus according to the present invention may include an alignment projection system configured to project a plurality of alignment beams having different characteristics or parameters onto alignment mark positions on the substrate and a detection system configured to determine an alignment position based on a reflected beam off of the substrate. As an example, the alignment system as applied in the lithographic apparatus according to the present invention may include an alignment projection system configured to project one or more alignment beams having different characteristics or parameters λ₁, λ₂, λ₃, . . . λ_i onto alignment mark positions on the substrate and a detection system configured to determine an alignment position based on one or more reflected beams off of the substrate.

In an embodiment, the alignment projection system may be configured to sequentially project different alignment beams (i.e. beams having different characteristics or parameters λ₁, λ₂, λ₃, . . . λ_i) onto a particular position on a substrate to determine an alignment mark position.

In another embodiment, a plurality of different alignment beams may be combined into one alignment beam having different characteristics or parameters λ₁, λ₂, λ₃, . . . λ_i that is projected onto the substrate to determine the alignment mark position. In such embodiment, it may be advantageous to arrange for the reflected beams off of the substrate to arrive at the detection system at different instances. In order to realize this, use can e.g. be made of a dispersive fiber as e.g. described in U.S. Pat. No. 9,046,385, incorporated herein by reference. Alternatively, the reflected alignment beam, including a plurality of different reflected alignment beams off of the substrate, may be provided to one or more filters to separate the different reflected alignment beams and assess the alignment mark position.

Within the meaning of the present invention, different measurement parameters or characteristics λ₁, λ₂, λ₃, . . . λ_i as applied by the alignment system include at least a difference in polarization of the alignment beam or beams used, a difference in frequency or frequency content of an alignment beam or the alignment beams used, a difference in diffraction orders used to assess the position of the alignment mark, or a difference in illumination angle.

The alignment system according to the present invention may thus determine, using the different measurement parameters or characteristics λ₁, λ₂, λ₃, . . . λ_i (e.g. using alignment beams having a different color, i.e. frequency or frequency content), a position of an alignment mark. Note that, within the meaning of the present invention, “color” should not be construed as being limited to visible light, but may e.g. also encompass UV or IR radiation, i.e. radiation outside the visible light spectrum.

In an embodiment, the alignment system AS may be configured to perform the position measurements based on one or more diffractions of one or more measurement beams incident on the substrate.

In general, the object of such alignment mark measurements as performed by the alignment system is to determine or estimate a position of the target portions (such as target portions C as shown in FIG. 1) of a next exposure process.

In order to determine these target portion positions, positions of alignment marks that are e.g. provided in scribe-lanes surrounding the target portions are measured. In general, the alignment marks as applied may also include so-called in-die marks or in-product marks, i.e. alignment marks that are located inside the exposed pattern. When the alignment mark positions as measured deviate from nominal or expected positions, one can assume that the target portions where the next exposure should take place also have deviating positions. Using the measured positions of the alignment marks, one may determine or estimate the actual positions of the target portions, thus ensuring that the next exposure can be performed at the appropriate position, thus aligning the next exposure to the target portion.

Note that, in case the patterns of two consecutive layers would not be properly aligned, this could cause a malfunction in the circuit that is manufactured. Such a positional deviation or positional offset between two consecutive layers is often referred to as overlay. Such an overlay may be determined by off line measurements performed once two consecutive layers have been created by exposure processes. Ideally, the alignment process, i.e. the process to determine a position of a previously created layer of patterns based on a position measurement of alignment marks, provides an accurate determination of the actual position of the alignment marks, based upon which, by appropriate modelling, an accurate determination of the actual position of a previously exposed pattern can be made. This modelling involves determining, using the determined position of the alignment marks that are e.g. disposed in scribe lanes, the position of the previously exposed patterns. This position of a previously exposed pattern, i.e. the position of the previously exposed layer of the integrated circuits that are manufactured, may then be used as a target position for a next exposure process, i.e. the process of exposing a subsequent layer of the integrated circuit.

Such modelling may involve various mathematical techniques such as approximating or interpolating the alignment mark positions by means of higher order two-dimensional polynomials or other functions. Within the meaning of the present invention, it is assumed that this modelling does not introduce any further deviations or errors. Phrased differently, any errors introduced due to the processing of the alignment mark positions to arrive at the positions of the target portions, which errors would introduce a further overlay, are disregarded or assumed to be non-existent. The same holds for the actual exposure process, which is assumed to project a subsequent pattern accurately onto the target portion.

One of the main reasons for performing an alignment measurement between two consecutive exposures is to take into account any deformations of the substrate that may have occurred after a previous exposure. In general, a substrate will undergo a plurality of processing steps between the creation of two consecutive patterns, these processing steps potentially causing deformations of the substrate and thus displacements of the alignment marks. These displacements of the alignment marks may be characterized as positional deviations of the alignment marks, i.e. deviations between a measured position of an alignment mark and a nominal or expected position of the alignment mark.

Similar to the modelling as described above, when a plurality of measured alignment mark positions are available, and positional deviations, i.e. deviations of the expected alignment mark positions, are determined, these deviations may e.g. be fitted to a mathematical function so as to describe the deformation of the substrate. This may e.g. be a two-dimensional function describing a deviation Δ(x,y) as a function of an (x,y) position, the x-coordinate and y-coordinate determining the position in a plane spanned by the X-direction and Y-direction. Using such a function, one may then determine or estimate an actual position of a target portion where a next layer or pattern needs to be projected.
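By way of illustration only, the fitting of positional deviations to a two-dimensional function Δ(x,y) may be sketched as follows. The sketch assumes a second-order polynomial and an ordinary least-squares fit; the function names, the polynomial order and the numerical values are hypothetical and are not prescribed by the present description.

```python
import numpy as np

def design_matrix(x, y, order=2):
    """Columns x**i * y**j for all i + j <= order."""
    cols = [x**i * y**j for i in range(order + 1) for j in range(order + 1 - i)]
    return np.stack(cols, axis=1)

def fit_deviation(x, y, delta, order=2):
    """Least-squares fit of deviations delta measured at mark positions (x, y)."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(x, y, order), delta, rcond=None)
    return coeffs

def evaluate(coeffs, x, y, order=2):
    """Evaluate the fitted deviation function at new positions (x, y)."""
    return design_matrix(np.atleast_1d(x), np.atleast_1d(y), order) @ coeffs

# Hypothetical mark positions (mm) and deviations (nm), generated from a known
# function so that the fit can be checked against it.
rng = np.random.default_rng(0)
mark_x = rng.uniform(-150, 150, 40)
mark_y = rng.uniform(-150, 150, 40)
true_delta = 2.0 + 0.01 * mark_x - 0.02 * mark_y + 1e-4 * mark_x * mark_y

coeffs = fit_deviation(mark_x, mark_y, true_delta)
# Estimated deviation at a target portion located at (10 mm, -20 mm).
target_dev = evaluate(coeffs, 10.0, -20.0)[0]
```

In this sketch the deviation at a target portion is obtained by evaluating the fitted function at the target portion position rather than at a mark position.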

In general, one would expect that a measured alignment mark position would not depend on the measurement characteristic that is used, e.g. the type of alignment beam that is applied.

However, it has been recognized by the inventors that an alignment position measurement as performed by an alignment system may be disturbed by a deformation or asymmetry of the alignment mark itself. Phrased differently, due to a deformation of an alignment mark, a deviating alignment mark position measurement can be obtained, compared to a situation whereby the alignment mark is not deformed. In case no measures are taken, such deviating alignment mark position measurement could result in an erroneous determination of the alignment mark position.

It has further been observed that this type of deviation, i.e. a deviating position measurement caused by an alignment mark deformation, depends on the measurement characteristic as applied.

As an example, when an alignment mark position is measured using different measurement characteristics, e.g. using alignment beams having a different frequency, this may lead to different results, i.e. different measured positions for the alignment marks. This phenomenon, whereby different measured positions are obtained when different measurement characteristics or parameters are applied (e.g. whereby measurement beams having a different frequency or frequency content are applied), is referred to as ‘measurement parameter dependency’. In case the measurements refer to position measurements of an alignment mark, the occurrence of such ‘measurement parameter dependency’ may indicate that the alignment mark that is being measured has been deformed or has some asymmetry, e.g. caused by processes performed in preparation of the exposure of a pattern onto the substrate.

As such, when a position of an alignment mark is measured using a plurality of different measurement characteristics λ₁, λ₂, λ₃, . . . λ_i, e.g. using alignment beams having a different frequency or a single alignment beam comprising beams having different frequencies, different results are obtained, e.g. a plurality of different alignment mark positions may be obtained based on the measurements.

As will be clear from the above, the outcome of the alignment measurement procedure should be an assessment of the actual substrate deformation, i.e. an assessment of the actual positions of the alignment marks, which may then be used to determine an actual position of the target portions for a subsequent exposure.

In view of the effects described, in particular the effects of the alignment mark deformations, the measured alignment mark positions, i.e. the alignment mark positions as derived from the different measurements (i.e. using different measurement characteristics) are both affected by the actual (unknown) substrate deformation and by occurring (unknown) mark deformations causing deviating alignment position measurements. Both effects may be interpreted as a deviation between an expected alignment mark position and a measured alignment mark position. As such, when a position deviation is observed, it may either be caused by an actual substrate deformation or by an alignment mark deformation or by a combination thereof.

FIG. 2 schematically depicts some possible scenarios, assuming that three measurements M1, M2, M3 are performed to determine a position of an alignment mark X. FIG. 2(a) schematically shows the nominal or expected position E of the alignment mark and the measured positions M1, M2, M3. FIG. 2(a) further shows the actual position A of the alignment mark. As can be seen, none of the measurements performed provides an accurate representation of the actual position deviation (E-A), i.e. the difference between the expected position E and the actual position A.

The scenario as depicted in FIG. 2(a) thus involves an actual displacement of an alignment mark (the actual alignment mark position A differs from the expected position E) combined with a mark deformation causing deviating measurements.

FIG. 2(b) shows an alternative scenario whereby differences are observed in the measurements (M1, M2, M3), the measured positions differing from the expected position E, while the actual position A is assumed to coincide with the expected position E. In this scenario, the measurements would imply that there is a positional deviation of the alignment mark, whereas, in reality, there is none, i.e. the position of the alignment mark is not affected by a substrate deformation. FIG. 2(c) schematically shows a third scenario whereby all three measurements M1, M2, M3 coincide and coincide with the actual position A. Such a scenario may occur when there is no alignment mark deformation affecting the measurements.

With respect to the occurring substrate deformations and mark asymmetries or mark deformations, the following should be noted: As already indicated above, in between two consecutive exposure steps, i.e. the consecutive application of particular patterns onto target portions such as target portions C as shown in FIG. 1, a substrate undergoes various processes outside the lithographic apparatus. These processes may cause the aforementioned substrate deformations and mark deformations or mark asymmetries.

Two types of process equipment are generally used for the processing of substrates outside a lithographic apparatus, affecting the substrates in a different manner.

A first type of equipment can be characterized as surface modifying equipment, such equipment or process tool processing the exposed surface of the substrate. Examples of such tools include tools for etching a substrate or tools for rendering the top surface substantially flat, such as CMP (Chemical Mechanical Planarization) tools.

A second type of equipment can be characterized as processing the substrate as a whole, or the bulk of the substrate. Such processing e.g. includes thermal processing of the substrate or mechanical handling of a substrate. Typically, these bulk modifying tools may introduce mechanical stresses in the substrate resulting in strain, i.e. a deformation of the substrate.

It has been observed by the inventors that the first type of equipment typically results in deformations of the alignment marks themselves, and e.g. introduces mark asymmetries. The second type of equipment has been found to result in actual deformations of the substrate as a whole, thus resulting in actual displacements of the alignment marks with respect to their expected or nominal position. As such, in general, when a substrate is brought into a lithographic apparatus after being processed, both mark deformations and substrate deformations may have been introduced due to the processing.

As such, when subsequently the position of a plurality of alignment marks on the substrate is determined using an alignment system AS according to the present invention, the position measurements may be affected by both mark and substrate deformations, e.g. resulting in different position measurements when using different measurement parameters or characteristics λ.

The present invention provides, in an embodiment, a manner to determine an optimal combination of such a set of different position measurements of an alignment mark. In particular, in the embodiment as described below, the present invention makes use of simulations of alignment measurements to arrive at a set of weight coefficients which can be applied in an alignment sensor or system AS. Note that, as indicated above, either use can be made of a plurality of simulation samples, i.e. mathematical models, or use can be made of measurement samples, i.e. substrates onto which measurements are performed, to arrive at the input data required to determine the weight coefficients. Note that a combination of both, i.e. a combination of simulated data and measurement data, may be considered as well. Below, a more detailed embodiment that makes use of simulated data is discussed:

The purpose of the simulations and of the processing of these simulations is to arrive, in the described embodiment, at a so-called ‘estimator’, in particular an ‘alignment position estimator’.

Within the meaning of the present invention, estimator is used to indicate a function that is used to estimate or characterize the property that is examined, e.g. an alignment position or height level.

As such, the use of an estimator may also be referred to, in an embodiment, as estimating a characteristic of the property based on a plurality of measurements, whereby the characteristic, e.g. the value of the property that is looked for, includes a combination of outcomes of the plurality of measurements, whereby the outcomes, i.e. the measured characteristics, are weighted by respective weighting coefficients.

In the embodiment described, such an alignment position estimator thus includes a set of weight coefficients, associated with a set of simulated measurement parameters. In the present invention, a simulated measurement parameter, or in short, a simulated parameter refers to a measurement parameter or characteristic of the alignment sensor as used in the simulations. Once the set of weight coefficients is determined, these weight coefficients can be applied during an actual alignment measurement process, to calculate a weighted combination of a set of position measurements of an alignment mark (using the set of weight coefficients) as the estimated or expected alignment mark position.

In an embodiment, referred to as the predictor form or predictor part of the alignment position estimator, the following form or format for the alignment position estimator is applied:

$\begin{matrix} y = {\underline{w}}^{†} \cdot \underline{x} & (1) \end{matrix}$

Where:

- the dagger denotes the transpose operator;
- y denotes the estimated alignment position;
- x denotes a Mx1 vector comprising a set of M alignment position measurements as performed by an alignment sensor, e.g. an alignment sensor or system AS;
- M ∈ {1, 2, 3, . . . } denotes the total number of alignment measurements that are performed to measure the position of a particular alignment mark. M thus denotes the number of distinct measurement characteristics or parameters λ₁, λ₂, λ₃, . . . λ_i as used to measure the position of the particular alignment mark. As such, M corresponds to the number of different wavelengths, polarizations, illumination angles and/or diffraction order combinations that are used in the plurality of alignment mark position measurements as performed by the alignment sensor.

- $\underline{w} \in \{ \underline{w} \in ℝ^{M \times 1} : \sum_{m = 1}^{M} {(\underline{w})}_{m} = 1 \}$ denotes the weight coefficients or weights of the predictor that are applied to the alignment position measurements x to arrive at the estimated alignment position. In order to maintain the proper unit of the estimated alignment position, the sum of the weights w can be set equal to one, i.e.

$\sum_{m = 1}^{M} {(\underline{w})}_{m} = 1.$

As an example, an alignment system AS may be equipped to perform alignment measurements using 4 different alignment beams, denoted by λ₁, λ₂, λ₃, λ₄. Vector x may thus consist of 4 measurements:

$\begin{matrix} \underline{x} = [\begin{matrix} x (λ_{1}) \\ x (λ_{2}) \\ x (λ_{3}) \\ x (λ_{4}) \end{matrix}] & (2) \end{matrix}$

Where:

x(λ₁) represents the alignment measurement result obtained using parameter λ₁;

x(λ₂) represents the alignment measurement result obtained using parameter λ₂;

x(λ₃) represents the alignment measurement result obtained using parameter λ₃;

x(λ₄) represents the alignment measurement result obtained using parameter λ₄.

For this example, vector w may thus consist of 4 weight coefficients:

$\begin{matrix} \underline{w} = [\begin{matrix} w (λ_{1}) \\ w (λ_{2}) \\ w (λ_{3}) \\ w (λ_{4}) \end{matrix}] & (3) \end{matrix}$

Whereby:

w (λ₁) represents the weight coefficient applied to the alignment measurement performed by using parameter λ₁;

w (λ₂) represents the weight coefficient applied to the alignment measurement performed by using parameter λ₂;

w (λ₃) represents the weight coefficient applied to the alignment measurement performed by using parameter λ₃;

w (λ₄) represents the weight coefficient applied to the alignment measurement performed by using parameter λ₄;

Once the weight coefficients w are known, an alignment system AS, as e.g. applied in a lithographic apparatus according to the present invention, may perform, for each of a plurality of alignment marks, a set of M alignment measurements (using the measurement parameters λ₁, λ₂, λ₃, λ₄) and apply the weight coefficients w to arrive at an estimated position of the alignment mark.
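As a minimal sketch of the predictor of equation (1) for the four-parameter example, the weighted combination may be computed as shown below. The function name and all numerical values are hypothetical and serve as illustration only.

```python
import numpy as np

def estimate_position(w, x):
    """Equation (1): y = w^T x, with the weights required to sum to one."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    if w.shape != x.shape:
        raise ValueError("need exactly one weight per measurement")
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to one")
    return float(w @ x)

# Hypothetical measured positions (nm) of one mark, one value per measurement
# parameter, and an illustrative weight vector (negative weights are allowed).
x = [12.0, 10.0, 9.0, 11.0]
w = [0.5, 0.25, 0.5, -0.25]
y = estimate_position(w, x)
```

Because the weights sum to one, a displacement that is common to all four measurements is passed unchanged to the estimated position, whereas contributions that differ between the measurement parameters are combined according to the weights.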

The present invention provides various methods to arrive at the weight coefficients w. In particular, in order to determine the weight coefficients vector w, the present invention makes use of simulations. In case of an alignment sensor, the simulations involve using models for simulating the measurement of the position of an alignment mark, by means of a plurality of different measurement parameters or characteristics. In particular, a portion of a stack of a substrate, the portion of the stack including an alignment mark, is modelled and used to simulate a set of alignment measurements, using different alignment measurement parameters or characteristics.

In order to determine the weight coefficients vector w, the embodiment of the present invention makes use of a plurality of such models, whereby the plurality of models differ from one another in that the geometry and/or physical properties of the modelled stack portion are different and/or the operational properties of the modelled sensor are different. As an example, alignment mark deformations as may be caused by processing of a substrate may be modelled and the effects of the deformations on alignment measurements may be simulated.

FIG. 3 schematically shows some possible alignment mark deformations.

FIG. 3(a) schematically shows the alignment mark 400 without any deformations and/or asymmetries, i.e. having substantially vertical side walls 410 and a substantially horizontal bottom portion 420. FIG. 3(b) schematically shows the alignment mark 400 having slanted side walls 430. Such slanted side walls may be considered a mark deformation and may be characterized by an angle α. FIG. 3(c) schematically shows the alignment mark 400 having a tilted bottom portion 440. Such a tilted bottom portion may also be considered a mark deformation and may be characterized by a tilt angle β of the alignment mark. FIGS. 3(b) and 3(c) thus illustrate two possible mark deformations which may have an effect on a mark position measurement as performed by an alignment sensor or system.

Within the meaning of the present invention, the plurality of models or simulation models that is used to determine the weight coefficients w are also referred to as training samples. FIG. 4 schematically shows such a model 500 of a portion of a stack of a substrate, including an alignment mark. Within the meaning of the present invention, stack refers to the set of layers 510 that is applied on a substrate, said layers having different optical or electromagnetic properties, e.g. due to the use of different materials. The stack of layers 510, or in short the stack 510, as schematically shown in FIG. 4 further comprises an alignment mark 520. In the model 500 as shown, the alignment mark 520 may be considered to be in a nominal position (along the X-axis) and having a nominal or expected geometry. The model 500 as schematically shown may be used in simulations, whereby an alignment measurement is simulated. Such a simulation may e.g. involve simulating the response of the model to an alignment beam, schematically indicated by the arrows 530, that is incident onto the stack of layers 510. Based on this response, one may determine the position, in particular the Y-position, of the alignment mark 520 as modelled.

In order to address the issue of alignment mark deformations, a plurality of models such as model 500 are applied for simulating the response of an alignment beam.

Such a plurality of models may e.g. be constructed by varying a position of the corners 520.1-520.4 of the alignment mark 520.

FIG. 4b schematically shows a close-up of portion 540 of the model 500.

FIG. 4b schematically shows a portion of the alignment mark 520 as modelled, including one side wall 522 and part of the bottom 524 of the alignment mark 520. Corner 520.1 as shown is in a nominal position. In order to model a deformation of alignment mark 520, the corner 520.1 may e.g. be displaced to any location inside the area 545, e.g. a rectangular area representing a possible deformation of the alignment mark. In particular, when the corner 520.1 is e.g. displaced to the position indicated by 520.5, the geometry of the side wall 522 and the bottom 524 are changed. The dotted lines 526 and 528 indicate the positions of the side wall and bottom of the alignment mark 520 when the corner 520.1 is moved to the position indicated by 520.5.

In an embodiment, the deformation of the alignment mark as applied in a particular model can be determined randomly. Alternatively, a two-dimensional grid can be applied inside the area 545, thereby generating n × m different positions for the corner 520.1 of the alignment mark 520, n being the number of positions considered along the X-axis, m being the number of positions considered along the Z-axis.
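The two ways of selecting corner positions inside a rectangular area such as area 545 may be sketched as follows. The extent of the area, the helper names and the requirement of at least two grid positions per axis are assumptions made for this illustration only.

```python
import itertools
import random

def grid_positions(x_min, x_max, z_min, z_max, n, m):
    """n x m corner positions on a regular grid covering the rectangle.

    Assumes n >= 2 and m >= 2 so that both edges of the area are included.
    """
    xs = [x_min + i * (x_max - x_min) / (n - 1) for i in range(n)]
    zs = [z_min + j * (z_max - z_min) / (m - 1) for j in range(m)]
    return list(itertools.product(xs, zs))

def random_positions(x_min, x_max, z_min, z_max, count, seed=0):
    """count corner positions drawn uniformly at random inside the rectangle."""
    rng = random.Random(seed)
    return [(rng.uniform(x_min, x_max), rng.uniform(z_min, z_max))
            for _ in range(count)]

# Hypothetical area of 10 nm (X) by 4 nm (Z) centred on the nominal corner.
corners = grid_positions(-5.0, 5.0, -2.0, 2.0, n=5, m=3)
random_corners = random_positions(-5.0, 5.0, -2.0, 2.0, count=20)
```

Each returned (X, Z) pair would then define one modified geometry of the modelled alignment mark, i.e. one training sample.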

In a similar manner, the position of the corner 520.2 may be modified. Further, corners 520.3 and 520.4 of the alignment mark 520 may also be displaced, e.g. along the X-axis. As an example, corner 520.4 as shown in FIG. 4b may be displaced along the X-axis within the range indicated by arrow 546.

The generation of the plurality of models may also involve changing other parameters of the stack 510, apart from the geometry of the alignment mark. In particular, the generation of the plurality of simulation models or training samples may also involve changing the geometry of one or more layers of the modelled stack. As an example, a thickness T of a layer 510.1 as shown in FIG. 4b, may e.g. be varied within a range of +10% to -10% of a nominal thickness. In a similar manner, physical properties of the layers as applied in the model, such as optical parameters, may be varied as well within certain ranges.

Summarizing, the plurality of models used for the measurement simulations may be obtained by applying one or more of the following modifications to a model having the nominal or expected geometry and physical properties:

- modifying a shape or size of the alignment mark as modelled;
- modifying any other geometrical parameter of the model such as a thickness of a layer of the stack;
- modifying a physical property as applied in the model, e.g. a physical property of one of the layers or the alignment mark as modelled;
- modifying properties of the sensing process that is simulated, thus modelling sensor variations occurring in real life.

As mentioned, these variations may be applied in a random manner or may be applied in a more structured manner to arrive at the plurality of models.

Typically, the number of models that is considered to take into account possible deformations of the alignment mark and other possible deviations may be comparatively high, e.g. in a range from 100 to 100000.

In accordance with the present invention, the different models or samples may be used to simulate alignment measurements, whereby alignment measurements are simulated for a plurality of alignment measurement characteristics or parameters λ. Within the meaning of the present invention, reference is made to a simulated measurement parameter λ, or in short, a simulated parameter λ, to indicate a corresponding measurement parameter or characteristic λ of the alignment sensor. Note however that, in general, the set of simulation parameters λ used in the simulations need not be the same as the set of measurement parameters λ as applied by the alignment sensor during an alignment measurement procedure. As an example, it may be advantageous to simulate the alignment measurements with a set of S simulation parameters, a first subset of the set of simulation parameters corresponding to the alignment measurement parameters as applied by a first alignment sensor and a second subset of the set of simulation parameters corresponding to the alignment measurement parameters as applied by a second alignment sensor. However, as will be apparent from equation (1), the weight coefficients as derived from the simulations (e.g. based on the set of simulation parameters λ used in the simulations) should correspond to, or be associated with, measurement parameters or characteristics λ that are actually applied by the alignment sensor.

Further on, it is assumed that there is a one-to-one correspondence between the parameters as applied in the simulations (i.e. the simulated measurement parameters λ, or in short, the simulated parameters λ) and the parameters that are applied as measurement parameters or characteristics λ of the alignment sensor.

In case T samples or models are available and each model is subjected to M alignment measurement simulations (i.e. using M different alignment measurement characteristics or parameters λ), a set of T×M simulation results is obtained.

For each sample, a set of M simulations is thus available, representing the alignment measurement simulation using M different measurement characteristics, e.g. the same characteristics as are available to perform the actual alignment measurements using an alignment system AS. For each of the models used, the simulation results (i.e. the positions of the alignment mark, e.g. alignment mark 520, as derived from the simulations) may be compared to a theoretical position of the alignment mark in the model. The results of the simulations may be expressed in a similar manner as in equation (1). In order to make a distinction between actual measurements and simulations, a subscript t is applied for the simulated data, t ∈ {1, 2, 3, . . . , T}, whereby T ∈ {1, 2, 3, . . . } denotes the total number of training samples or models that is used.

For each of the samples t, the theoretical aligned position of the mark (e.g. mark 520 of FIG. 4a) is further denoted as y_training,t. Note that this theoretical aligned position may also be user defined, i.e. application specific.

FIG. 5 schematically shows, for a plurality of samples T, simulated values x_t for the alignment mark position, given a set of 4 simulation parameters λ₁, λ₂, λ₃, λ₄.

As can be seen, for training sample t=1, none of the simulations using the simulation parameters λ₁, λ₂, λ₃, λ₄ results in a simulated position x_t equal to the theoretical aligned position of the mark y_training,t in the sample t=1. In training sample t=3, one can observe that the simulated position using parameter λ₃, indicated as x_t(λ₃), substantially equals the theoretical aligned position of the mark y_training,t in the sample t=3.

In the present invention, several methods are described to derive a value of the weight coefficients w.

In a first embodiment, use is made of a cost function, or optimization function, that is optimized, e.g. minimized, to arrive at the weight coefficients w.

A first example of such a cost function includes the sum, over all samples T, of the squared biases of the simulations.

Within the meaning of the present invention, bias is used to denote the difference between the theoretical aligned position of a simulated mark y_training,t of training sample t and the weighted combination as derived from the simulated measurements x_t. The sum of the squared biases of all the training samples T may be expressed as:

$\begin{matrix} \sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} = \sum_{t = 1}^{T} {({\underline{w}}^{†} \cdot {\underline{x}}_{t} - y_{training, t})}^{2} & (4) \end{matrix}$
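As an illustration, the cost function of equation (4) may be evaluated for a given weight vector as sketched below, assuming that the T×M simulation results are stored row-wise in an array X (a storage layout chosen for this sketch). All numerical values are hypothetical.

```python
import numpy as np

def sum_squared_biases(w, X, y_training):
    """Equation (4). Row t of X holds the M simulated positions x_t."""
    biases = X @ w - y_training      # y_t - y_training,t for every sample t
    return float(biases @ biases)

# Hypothetical data: T = 3 training samples, M = 2 simulated parameters.
X = np.array([[1.0, 3.0],
              [2.0, 2.0],
              [0.0, 4.0]])
y_training = np.array([2.0, 2.0, 1.0])
w = np.array([0.5, 0.5])
cost = sum_squared_biases(w, X, y_training)
```

With these values the weighted combinations equal 2, 2 and 2, so that only the third sample contributes a bias, of magnitude 1, to the cost.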

When considering the sum of all squared biases as the cost function, optimal values of the weight coefficients w can be found by setting the derivative with respect to the variables w equal to zero:

$\begin{matrix} \begin{matrix} \frac{\partial}{\partial \underline{w}} \cdot (\sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2}) = \frac{\partial}{\partial \underline{w}} \cdot \sum_{t = 1}^{T} {({\underline{w}}^{†} \cdot {\underline{x}}_{t} - y_{training, t})}^{2} \\ = 2 \cdot \sum_{t = 1}^{T} ({\underline{w}}^{†} \cdot {\underline{x}}_{t} - y_{training, t}) \cdot {\underline{x}}_{t} = \underline{0} . \end{matrix} & (5) \end{matrix}$

Solving equation (5), together with the constraint that

$\sum_{m = 1}^{M} {(\underline{w})}_{m} = 1$

makes it possible to derive a value for the weight coefficients w.
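It may be noted that equation (5) already supplies M conditions for the M weights, so that adding the constraint generally over-determines the system unless an additional unknown is introduced. One possible way to combine the two, sketched here under the assumption that the constraint is handled by means of a Lagrange multiplier (denoted ν, a symbol introduced for this illustration only), is to require:

$\begin{matrix} \sum_{t = 1}^{T} ({\underline{w}}^{†} \cdot {\underline{x}}_{t} - y_{training, t}) \cdot {\underline{x}}_{t} - ν \cdot \underline{1} = \underline{0} , & \sum_{m = 1}^{M} {(\underline{w})}_{m} = 1 \end{matrix}$

These are M+1 linear equations in the M+1 unknowns w and ν, obtained by setting to zero the derivatives of $\sum_{t = 1}^{T} {({\underline{w}}^{†} \cdot {\underline{x}}_{t} - y_{training, t})}^{2} - 2 \cdot ν \cdot ({\underline{1}}^{†} \cdot \underline{w} - 1)$ with respect to w and ν.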

A second example of the cost function includes, in addition to the sum of all squared biases, an estimate of the reproducibility of the alignment measurements. As will be apparent to the skilled person, when performing alignment measurements, these measurements may be affected by noise and/or uncertainties and/or systematic errors with respect to the measurement parameters that are actually applied. Phrased differently, minor deviations in the applied measurement parameters λ (which will in general be unknown when the measurements are performed) may affect the actual measurement. The degree to which these deviations affect the actual measurement may be different for the different measurement parameters. Certain measurement parameters may be more robust to such deviations than others.

It has been recognized by the inventors that it may be advantageous to take this reproducibility of the measurements into account when determining the optimal weight coefficients.

It is therefore proposed, in the second example of the cost function, to include, as a measure for the reproducibility or robustness of the measurements, the variance of y, i.e. the estimated alignment position.

The variance of y can be expressed as a function of the covariance matrix of x, denoted by C_x, as follows:

$\begin{matrix} \begin{matrix} σ_{y}^{2} = \frac{\partial}{\partial \underline{x}} \cdot ({\underline{w}}^{†} \cdot \underline{x}) \cdot {\underset{\underline{_}}{C}}_{\underline{x}} \cdot {(\frac{\partial}{\partial \underline{x}} \cdot ({\underline{w}}^{†} \cdot \underline{x}))}^{†} \\ = {\underline{w}}^{†} \cdot {\underset{\underline{_}}{C}}_{\underline{x}} \cdot \underline{w} . \end{matrix} & (6) \end{matrix}$
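A minimal numerical sketch of equation (6) is given below. It assumes two measurement parameters with uncorrelated noise; the covariance values and weight vectors are hypothetical.

```python
import numpy as np

def estimator_variance(w, C_x):
    """Equation (6): variance of y = w^T x for measurement covariance C_x."""
    return float(w @ C_x @ w)

# Hypothetical covariance: standard deviations of 1 nm and 3 nm, uncorrelated.
C_x = np.diag([1.0**2, 3.0**2])

equal = estimator_variance(np.array([0.5, 0.5]), C_x)    # equal weighting
robust = estimator_variance(np.array([0.9, 0.1]), C_x)   # favours the quieter parameter
```

In this example, shifting weight towards the less noisy measurement parameter lowers the variance of the estimated position from 2.5 nm² to 0.9 nm², which is the kind of trade-off the variance term makes visible to the cost function.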

In a similar manner, the variance may be determined for each of the training samples T and a sum of the variances of all the training samples may be included in the cost function. The sum of the sample variances can be expressed as (subscript t referring to the use of training data or data obtained from the simulations):

$\begin{matrix} \begin{matrix} \sum_{t = 1}^{T} σ_{y, t}^{2} = \sum_{t = 1}^{T} {\underline{w}}^{†} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot \underline{w} \\ = {\underline{w}}^{†} \cdot (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) \cdot \underline{w} . \end{matrix} & (7) \end{matrix}$

Combining equations (7) and (4) as the cost function results in:

$\begin{matrix} \sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} + σ_{y, t}^{2} & (8) \end{matrix}$

Note that in equation (8), the summation over t is assumed to apply for both the squared biases and the variances.

The derivative of the sum of all training sample variances with respect to the variables w equals

$\begin{matrix} \begin{matrix} \frac{\partial}{\partial \underline{w}} \cdot (\sum_{t = 1}^{T} σ_{y, t}^{2}) = \frac{\partial}{\partial \underline{w}} \cdot ({\underline{w}}^{†} \cdot (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) \cdot \underline{w}) \\ = \frac{\partial {\underline{w}}^{†}}{\partial \underline{w}} \cdot (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}})  \underline{w} + {\underline{w}}^{†} \cdot (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) \cdot \frac{\partial \underline{w}}{\partial \underline{w}} \\ = 2 \cdot (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) \cdot \underline{w} . \end{matrix} & (9) \end{matrix}$

Using the derivatives as shown in equations (5) and (9), the condition to find optimal values for the weight coefficients w becomes:

$\begin{matrix} \frac{\partial}{\partial \underline{w}} \cdot (\sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} + σ_{y, t}^{2}) = \underline{0} & (10) \\ 2 \cdot (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) \cdot \underline{w} + 2 \cdot \sum_{t^{'} = 1}^{T} ({\underline{w}}^{†} \cdot {\underline{x}}_{t^{'}} - y_{training, t^{'}}) \cdot {\underline{x}}_{t^{'}} = \underline{0} \\ (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) \cdot \underline{w} + \sum_{t^{'} = 1}^{T} ({\underline{w}}^{†} \cdot {\underline{x}}_{t^{'}}) \cdot {\underline{x}}_{t^{'}} = \sum_{t^{″} = 1}^{T} y_{training, t^{″}} \cdot {\underline{x}}_{t^{″}} \\ (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) \cdot \underline{w} + \sum_{t^{'} = 1}^{T} {\underline{x}}_{t^{'}} \cdot {\underline{x}}_{t^{'}}^{†} \cdot \underline{w} = \sum_{t^{″} = 1}^{T} y_{training, t^{″}} \cdot {\underline{x}}_{t^{″}} \\ ((\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) + \sum_{t^{'} = 1}^{T} {\underline{x}}_{t^{'}} \cdot {\underline{x}}_{t^{'}}^{†}) \cdot \underline{w} = \sum_{t^{″} = 1}^{T} y_{training, t^{″}} \cdot {\underline{x}}_{t^{″}} . \end{matrix}$

Taking the constraint

$\sum_{m = 1}^{M} {(\underline{w})}_{m} = 1$

into account, the following Karush-Kuhn-Tucker system of equations can be solved:

$\begin{matrix} [\begin{matrix} (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) + \sum_{t^{'} = 1}^{T} {\underline{x}}_{t^{'}} \cdot {\underline{x}}_{t^{'}}^{†} & \underline{1} \\ {\underline{1}}^{†} & 0 \end{matrix}] \cdot [\begin{matrix} \underline{w} \\ - λ \end{matrix}] = [\begin{matrix} \sum_{t^{″} = 1}^{T} y_{training, t^{″}} \cdot {\underline{x}}_{t^{″}} \\ 1 \end{matrix}] & (11) \end{matrix}$

Where λ denotes the Lagrange multiplier of the constraint

$\sum_{m = 1}^{M} {(\underline{w})}_{m} = 1.$

The optimal values of the variables w can be computed by solving this linear system of equations.

It can further be noted that the matrix

$[\begin{matrix} (\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) + \sum_{t^{'} = 1}^{T} {\underline{x}}_{t^{'}} \cdot {\underline{x}}_{t^{'}}^{†} & \underline{1} \\ {\underline{1}}^{†} & 0 \end{matrix}]$

is a symmetrical matrix, which may be beneficial when computing a solution for this linear system of equations.
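As an illustration only, the Karush-Kuhn-Tucker system of equation (11) can be assembled and solved numerically as sketched below. The function name `solve_weights`, the array layout and the synthetic training data are assumptions made for this sketch and are not part of the described method.

```python
import numpy as np

def solve_weights(X, y_train, C):
    """Solve the KKT system of equation (11).

    X       : (T, M) simulated measurements x_t, one row per training sample
    y_train : (T,)   training positions y_training,t
    C       : (T, M, M) covariance matrices C_{x_t}
    Returns the weights w and the Lagrange multiplier lambda.
    """
    T, M = X.shape
    A = C.sum(axis=0) + X.T @ X      # sum_t C_{x_t} + sum_t x_t x_t^T
    b = X.T @ y_train                # sum_t y_training,t * x_t
    K = np.zeros((M + 1, M + 1))     # symmetric block matrix of equation (11)
    K[:M, :M] = A
    K[:M, M] = 1.0
    K[M, :M] = 1.0
    sol = np.linalg.solve(K, np.append(b, 1.0))
    return sol[:M], -sol[M]          # the unknown vector is [w; -lambda]

# Synthetic data (assumed): each measurement equals the true position plus a small error.
rng = np.random.default_rng(0)
T, M = 40, 4
y_train = rng.normal(size=T)
X = y_train[:, None] + 0.05 * rng.normal(size=(T, M))
C = np.tile(0.01 * np.eye(M), (T, 1, 1))
w, lam = solve_weights(X, y_train, C)
```

Because the block matrix is symmetric, a solver that exploits symmetry could be substituted for the general-purpose `np.linalg.solve` used here.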

Note that computing a solution to the Karush-Kuhn-Tucker system of equations (11) is equivalent to solving the following quadratic program:

$\min (\begin{matrix} \frac{1}{2} \cdot {\underline{w}}^{†} \cdot ((\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) + \sum_{t^{'} = 1}^{T} {\underline{x}}_{t^{'}} \cdot {\underline{x}}_{t^{'}}^{†}) \cdot \underline{w} \\ - {\underline{w}}^{†} \cdot (\sum_{t^{″} = 1}^{T} y_{training, t^{″}} \cdot {\underline{x}}_{t^{″}}) \end{matrix}),$

subject to 1^†·w=1.

Note that in order to reduce the amplitude of the resulting weights (which can be beneficial for the propagation of sensor errors), one can add additional constraints on the weights and/or include an additional regularization term, as follows (here written in quadratic program form for convenience):

$\min (\begin{matrix} \frac{1}{2} \cdot {\underline{w}}^{†} \cdot ((\sum_{t = 1}^{T} {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}}) + \sum_{t^{'} = 1}^{T} {\underline{x}}_{t^{'}} \cdot {\underline{x}}_{t^{'}}^{†} + μ \cdot \underset{\underline{_}}{I}) \cdot \underline{w} \\ - {\underline{w}}^{†} \cdot (\sum_{t^{″} = 1}^{T} y_{training, t^{″}} \cdot {\underline{x}}_{t^{″}}) \end{matrix}),$

subject to 1^†·w=1 and w_lb·1≤w≤w_ub·1.

Where:

- μ∈{μ∈ℝ: μ≥0} denotes the regularization parameter on the weights w;
- w_lb∈{w_lb∈ℝ: w_lb≤w_ub} and w_ub∈{w_ub∈ℝ: w_lb≤w_ub} denote the lower and upper bound on the weights w.
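The effect of the regularization term can be sketched as follows. This sketch handles only the equality constraint and the term μ·I; the bounds w_lb and w_ub are merely checked afterwards, since enforcing them would require a quadratic programming solver. The function names and the numerical values are assumptions chosen for illustration.

```python
import numpy as np

def solve_weights_regularized(A, b, mu=0.0):
    """Minimize 1/2 w^T (A + mu*I) w - w^T b subject to sum(w) = 1.

    A : (M, M) matrix sum_t C_{x_t} + sum_t x_t x_t^T
    b : (M,)   vector sum_t y_training,t * x_t
    """
    M = A.shape[0]
    K = np.zeros((M + 1, M + 1))
    K[:M, :M] = A + mu * np.eye(M)
    K[:M, M] = 1.0
    K[M, :M] = 1.0
    sol = np.linalg.solve(K, np.append(b, 1.0))
    return sol[:M]

def within_bounds(w, w_lb, w_ub):
    """Check (not enforce) the bound constraints w_lb <= w <= w_ub."""
    return bool(np.all(w >= w_lb) and np.all(w <= w_ub))

# Nearly collinear measurements (assumed values) lead to large weights when mu = 0.
A = np.array([[1.0, 0.99], [0.99, 1.0]])
b = np.array([1.0, 0.0])
w_plain = solve_weights_regularized(A, b, mu=0.0)
w_reg = solve_weights_regularized(A, b, mu=1.0)
```

In this example, the regularized weights have a smaller norm than the unregularized ones, while both sets of weights still sum to one.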

In order to apply the cost function according to the second example, the simulations as performed on the T samples thus need to include the calculation, an approximation or an estimation of the covariance matrices of the samples, in addition to the simulation of the alignment measurements, using the different simulation parameters λ.

It can further be noted that, in the cost functions as described, all samples or models are considered equally important. This can be seen by considering that the variances of the different samples are summed and, similarly, that the biases of the different samples are summed. As such, in the examples given, the likelihood that a particular model actually occurs in practice is not considered. In case such a likelihood can be determined or estimated, it could be taken into account by weighing the biases or variances of the least likely models down and/or weighing the biases or variances of the most likely models up.

Further, when considering the cost function (8), it can be noted that the sum of the variances is given the same weight or importance as the sum of the biases. As an alternative, one could e.g. scale either one of the sum of the variances or the sum of the biases in order to give more weight or importance to either the variances or the biases.
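Both alternatives, i.e. weighing individual samples according to their likelihood and scaling the variances relative to the biases, can be sketched as a modification of the matrix and right-hand side that enter the system of equations (11). The per-sample likelihood weights `p` and the scale factor `alpha` are hypothetical names introduced for this sketch only.

```python
import numpy as np

def weighted_normal_equations(X, y_train, C, p=None, alpha=1.0):
    """Build A and b for the cost sum_t p_t * ((w^T x_t - y_t)^2 + alpha * w^T C_t w).

    p     : (T,) assumed likelihood weight per training sample (default: all ones)
    alpha : assumed scale factor of the variances relative to the squared biases
    """
    T, M = X.shape
    p = np.ones(T) if p is None else np.asarray(p, dtype=float)
    A = alpha * np.einsum("t,tij->ij", p, C) + (X * p[:, None]).T @ X
    b = X.T @ (p * y_train)
    return A, b

# Synthetic data (assumed) to exercise the function.
rng = np.random.default_rng(2)
T, M = 20, 3
X = rng.normal(size=(T, M))
y_train = rng.normal(size=T)
C = np.tile(0.02 * np.eye(M), (T, 1, 1))
A0, b0 = weighted_normal_equations(X, y_train, C)                  # equal importance
A2, b2 = weighted_normal_equations(X, y_train, C, p=2 * np.ones(T))
A_bias_only, _ = weighted_normal_equations(X, y_train, C, alpha=0.0)
```

With all `p` equal to one and `alpha` equal to one, the matrix and vector reduce to those of equation (11).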

In a second embodiment of the alignment position estimator, the alignment position estimator includes an additional term referred to as the corrector or corrector part.

In the most generic form, the combination of the predictor and the corrector or corrector part can be formulated as:

$y = {\underline{w}}^{†} \cdot \underline{x} + f (\underline{x} - {\underline{w}}^{†} \cdot \underline{x})$

Where f is an unknown function.

Below, a specific form of the combination of the predictor and corrector of the alignment position estimator is developed:

$\begin{matrix} \begin{matrix} y = {\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} (\underline{x} - {\underline{w}}^{†} \cdot \underline{x}) \\ = {\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot \underline{x}) . \end{matrix} & (12) \end{matrix}$

Where ω is a B×1 vector denoting the weights of the corrector.

Where φ is a B×1 vector denoting a set of basis-functions.

Where B ∈{1, 2, 3, . . . } denotes the total number of basis-functions.

Where we have introduced the following shorthand notation:

$\underset{\underline{_}}{D} = (\underset{\underline{_}}{I} - [\begin{matrix} {\underline{w}}^{†} \\ {\underline{w}}^{†} \\ ⋮ \\ {\underline{w}}^{†} \end{matrix}]) .$
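The shorthand matrix D can be illustrated with a few lines of code. The sketch below, using assumed example values, builds D from the weights w and confirms that D·x equals the vector of differences x − w^†·x. When the weights sum to one, a shift that is common to all measurements is removed by D.

```python
import numpy as np

def residual_operator(w):
    """Return D = I - [w^T; w^T; ...; w^T] for a weight vector w of length M."""
    w = np.asarray(w, dtype=float)
    M = w.size
    return np.eye(M) - np.outer(np.ones(M), w)   # every row of the subtracted matrix is w^T

# Assumed example values.
w = np.array([0.5, 0.3, 0.2])      # weights summing to one
x = np.array([1.0, 1.2, 0.9])      # measurements
D = residual_operator(w)
residual = D @ x                   # differences between each measurement and w^T x
```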

Compared to the alignment position estimator of the first embodiment, the alignment position estimator of the second embodiment additionally includes a weighted combination (using weight coefficients ω) of a function of the remaining differences between the weighted (using weights w) combination of the measurements x and the estimated alignment position y. These remaining differences are further on also referred to as measurement biases. From a practical point of view, this can be seen as the corrector or corrector part of the alignment position estimator focusing on the correction of the mark deformation that may have occurred. In the second embodiment of the alignment position estimator, values for the weight coefficients w and the weight coefficients ω are again determined based on simulations performed on a set of T models as described above.

In the following, a method is described to determine the further weight coefficients or weight coefficients ω based on simulated data as described above. In the described method, it is assumed that the weights or weight coefficients w have already been determined, using any of the methods described above. However, the possibility of solving equation (12) simultaneously for both the weight coefficients w and the weight coefficients ω should not be excluded.

From a simplicity point of view, it may however be preferred to solve for the weight coefficients w and the weight coefficients ω sequentially, i.e. first solving a linear set of equations to obtain the weight coefficients w, and subsequently, as will be explained below, solving a second linear set of equations to obtain the weight coefficients ω.

As in the first embodiment, the weight coefficients ω are again determined by minimizing a cost function or optimization function.

The cost function as will be applied in the second embodiment is the mean sum (taken over all T samples or training samples) of the squared biases and the variances, similar to the second example of the first embodiment. Note that, similar to equation (6), the variance of y can be expressed as a function of the covariance matrix of x, as follows:

$\begin{matrix} \begin{matrix} σ_{y}^{2} = \frac{\partial}{\partial \underline{x}} \cdot ({\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} (\underline{x} - {\underline{w}}^{†} \cdot \underline{x})) \cdot {\underset{\underline{_}}{C}}_{\underline{x}} \cdot \\ {(\frac{\partial}{\partial \underline{x}} \cdot ({\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} (\underline{x} - {\underline{w}}^{†} \cdot \underline{x})))}^{†} \\ = ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underline{x} - {\underline{w}}^{†} \cdot \underline{x})}{\partial \underline{x}}) \cdot {\underset{\underline{_}}{C}}_{\underline{x}} \cdot \\ {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underline{x} - {\underline{w}}^{†} \cdot \underline{x})}{\partial \underline{x}})}^{†} \\ = ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot \underline{x})}{\partial \underline{x}}) \cdot {\underset{\underline{_}}{C}}_{\underline{x}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot \underline{x})}{\partial \underline{x}})}^{†} . \end{matrix} & (13) \end{matrix}$

In the cost function used to find the weight coefficients ω, the sum of the variances of all training samples T can be expressed as:

$\begin{matrix} \sum_{t = 1}^{T} σ_{y, t}^{2} = \sum_{t = 1}^{T} ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}}) \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} . & (14) \end{matrix}$

The sum of the squared biases of all the training samples T can be expressed as:

$\begin{matrix} \sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} = \sum_{t = 1}^{T} {({\underline{w}}^{†} \cdot {\underline{x}}_{t} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t}) - y_{training, t})}^{2} . & (15) \end{matrix}$

The derivative of the sum of the variances of all training samples T with respect to the variables ω then equals:

$\begin{matrix} \frac{\partial}{\partial \underline{ω}} \cdot (\sum_{t = 1}^{T} σ_{y, t}^{2}) = \frac{\partial}{\partial \underline{ω}} \cdot (\sum_{t = 1}^{T} ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}}) \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†}) = \sum_{t = 1}^{T} \frac{\partial}{\partial \underline{ω}} \cdot ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}}) \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} + \sum_{t^{'} = 1}^{T} ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}})}{\partial \underline{x}}) \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t^{'}}} \cdot \frac{\partial}{\partial \underline{ω}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}})}{\partial \underline{x}})}^{†} = \sum_{t = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} + {(\sum_{t^{'} = 1}^{T} ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot 
{\underline{x}}_{t^{'}})}{\partial \underline{x}}) \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t^{'}}} \cdot {(\frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}})}{\partial \underline{x}})}^{†})}^{†} = 2 \cdot \sum_{t = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} = 2 \cdot \sum_{t = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot (\underline{w} + {(\frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} \cdot \underline{ω}) = 2 \cdot \sum_{t = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {(\frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} \cdot \underline{ω} + 2 \cdot \sum_{t^{'} = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t^{'}}} \cdot \underline{w} . & (16) \end{matrix}$

In the above derivation, use has been made of the fact that the matrix C_x is a symmetrical matrix.

The derivative of the sum of the squared biases with respect to the variables ω equals:

$\begin{matrix} \frac{\partial}{\partial \underline{ω}} \cdot (\sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2}) = \frac{\partial}{\partial \underline{ω}} \cdot \sum_{t = 1}^{T} {({\underline{w}}^{†} \cdot {\underline{x}}_{t} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t}) - y_{training, t})}^{2} = \sum_{t = 1}^{T} \frac{\partial}{\partial \underline{ω}} \cdot ({({\underline{w}}^{†} \cdot {\underline{x}}_{t} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t}) - y_{training, t})}^{2}) = \sum_{t = 1}^{T} 2 \cdot ({\underline{w}}^{†} \cdot {\underline{x}}_{t} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t}) - y_{training, t}) \cdot \frac{\partial}{\partial \underline{ω}} \cdot ({\underline{w}}^{†} \cdot {\underline{x}}_{t} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t}) - y_{training, t}) = \sum_{t = 1}^{T} 2 \cdot ({\underline{w}}^{†} \cdot {\underline{x}}_{t} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t}) - y_{training, t}) \cdot \frac{\partial}{\partial \underline{ω}} \cdot ({\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})) = 2 \cdot \sum_{t = 1}^{T} ({\underline{w}}^{†} \cdot {\underline{x}}_{t} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t}) - y_{training, t}) \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t}) . & (17) \end{matrix}$

Similar to the first embodiment, one can observe that the cost function, i.e.

$\sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} + σ_{y, t}^{2},$

is a quadratic function of the variables ω. Hence a sufficient condition of optimality equals

$\frac{\partial}{\partial \underline{ω}} \cdot (\sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} + σ_{y, t}^{2}) = \underline{0} .$

Applying this condition of optimality, using the derivatives as derived in equations (16) and (17), results in:

$\begin{matrix} \frac{\partial}{\partial \underline{ω}} \cdot (\sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} + σ_{y, t}^{2}) = \underline{0} & (18) \\ 2 \cdot \sum_{t = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {(\frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} \cdot \underline{ω} + 2 \cdot \sum_{t^{'} = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t^{'}}} \cdot \underline{w} + 2 \cdot \sum_{t^{″} = 1}^{T} ({\underline{w}}^{†} \cdot {\underline{x}}_{t^{″}} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{″}}) - y_{training, t^{″}}) \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{″}}) = \underline{0} \\ \sum_{t = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {(\frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} \cdot \underline{ω} + \sum_{t^{'} = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t^{'}}} \cdot \underline{w} + \sum_{t^{″} = 1}^{T} ({\underline{w}}^{†} \cdot {\underline{x}}_{t^{″}} + {\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{″}}) - y_{training, t^{″}}) \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{″}}) = \underline{0} \\ \sum_{t = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {(\frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} \cdot \underline{ω} + \sum_{t^{'} = 1}^{T} ({\underline{ω}}^{†} \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}})) \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}}) = \sum_{t^{″} = 1}^{T} (y_{training, t^{″}} - {\underline{w}}^{†} \cdot {\underline{x}}_{t^{″}}) \cdot \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{″}}) - \sum_{t^{′′′} = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{′′′}})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t^{′′′}}} \cdot \underline{w} \\ (\sum_{t = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t}} \cdot {(\frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t})}{\partial \underline{x}})}^{†} + \sum_{t^{'} = 1}^{T} \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}}) \cdot {\underline{φ}}^{†} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{'}})) \cdot \underline{ω} = \sum_{t^{″} = 1}^{T} \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{″}}) \cdot (y_{training, t^{″}} - {\underline{w}}^{†} \cdot {\underline{x}}_{t^{″}}) - \sum_{t^{′′′} = 1}^{T} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot {\underline{x}}_{t^{′′′}})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{C}}_{{\underline{x}}_{t^{′′′}}} \cdot \underline{w} . \end{matrix}$

The optimal values of the variables ω can now be computed by solving this linear system of equations (18).
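A minimal numerical sketch of this step is given below, assuming that the basis-function values and their derivatives have already been evaluated for every training sample. The function names, the array layout and the random placeholder data are assumptions made for illustration; the placeholder arrays `Phi` and `J` are not actual radial basis function values.

```python
import numpy as np

def solve_corrector_weights(Phi, J, C, X, y_train, w):
    """Solve the linear system in the last row of equation (18) for omega.

    Phi : (T, B)    basis-function values phi(D x_t)
    J   : (T, B, M) derivatives d phi(D x_t) / d x
    C   : (T, M, M) covariance matrices C_{x_t}
    X   : (T, M)    measurements x_t;  y_train : (T,);  w : (M,) predictor weights
    """
    JC = np.einsum("tbm,tmn->tbn", J, C)                   # J_t C_t
    G = np.einsum("tbn,tcn->bc", JC, J) + Phi.T @ Phi      # sum J C J^T + sum phi phi^T
    h = Phi.T @ (y_train - X @ w) - np.einsum("tbn,n->b", JC, w)
    return np.linalg.solve(G, h)

def cost(omega, Phi, J, C, X, y_train, w):
    """Sum of squared biases (15) plus sum of variances (14)."""
    bias = X @ w + Phi @ omega - y_train
    g = w + np.einsum("tbm,b->tm", J, omega)               # w + J_t^T omega, per sample
    var = np.einsum("tm,tmn,tn->t", g, C, g)
    return float(np.sum(bias**2) + np.sum(var))

# Random placeholder data (assumed), only to exercise the linear algebra.
rng = np.random.default_rng(1)
T, M, B = 30, 4, 3
X = rng.normal(size=(T, M))
y_train = rng.normal(size=T)
w = np.full(M, 0.25)
Phi = rng.normal(size=(T, B))
J = rng.normal(size=(T, B, M))
C = np.tile(0.01 * np.eye(M), (T, 1, 1))
omega = solve_corrector_weights(Phi, J, C, X, y_train, w)
```

Since the cost function is quadratic in ω, any small perturbation of the computed ω should not decrease the value returned by `cost`, which offers a simple sanity check on an implementation.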

As an example of the basis-functions as applied in above equations, the use of radial basis functions can be mentioned. Referring to equation (12), the following radial basis functions can be used in the corrector:

$\begin{matrix} \underline{φ} (\underset{\underline{_}}{D} \cdot \underline{x}) = [\begin{matrix} \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}) \|}_{2}^{2}) \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}) \|}_{2}^{2}) \\ ⋮ \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}) \|}_{2}^{2}) \end{matrix}] & (19) \end{matrix}$

Where S ∈ {1, 2, 3, . . . } denotes the number of support vectors that is used.

Where ξ_{s=1 . . . S} denote the S support vectors, the support vectors defining the position at which the radial basis function is positioned.

Where:

γ∈{γ∈ℝ^M×M: (γ)_r,c≥0, for all r and c}

denotes the radial basis function radius scaling (non-negative) matrix.

With respect to the scaling, it can be mentioned that the scaling may be defined in such manner that the model results in a good approximation of the optimal response surface. The scaling as applied may e.g. be predetermined or may be optimized using an outer optimization loop.

The radial basis functions as expressed by equation (19) are Gaussian or exponential radial basis functions.

In case the radial basis functions are used, the required derivatives can be calculated as follows:

$\begin{matrix} \begin{matrix} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot \underline{x})}{\partial \underline{x}} = \frac{\partial}{\partial \underline{x}} \cdot [\begin{matrix} \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}) \|}_{2}^{2}) \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}) \|}_{2}^{2}) \\ ⋮ \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}) \|}_{2}^{2}) \end{matrix}] \\ = [\begin{matrix} {(\frac{\partial}{\partial \underline{x}} \cdot \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}) \|}_{2}^{2}))}^{†} \\ {(\frac{\partial}{\partial \underline{x}} \cdot \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}) \|}_{2}^{2}))}^{†} \\ ⋮ \\ {(\frac{\partial}{\partial \underline{x}} \cdot \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}) \|}_{2}^{2}))}^{†} \end{matrix}] \\ = [\begin{matrix} \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}) \|}_{2}^{2}) \cdot {(\frac{\partial}{\partial \underline{x}} \cdot (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}) \|}_{2}^{2}))}^{†} \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}) \|}_{2}^{2}) \cdot {(\frac{\partial}{\partial \underline{x}} \cdot (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}) \|}_{2}^{2}))}^{†} \\ ⋮ \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}) \|}_{2}^{2}) \cdot {(\frac{\partial}{\partial \underline{x}} \cdot (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}) \|}_{2}^{2}))}^{†} \end{matrix}] \end{matrix} & (20) \\ Whereby : \\ \frac{\partial}{\partial \underline{x}} \cdot (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - \underline{ξ}) \|}_{2}^{2}) = - 1 \cdot \frac{\partial}{\partial \underline{x}} \cdot ({(\underline{x} - \underline{ξ})}^{†} \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - \underline{ξ})) = - 1 \cdot (\frac{\partial ({(\underline{x} - \underline{ξ})}^{†})}{\partial \underline{x}} \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - \underline{ξ}) + {(\underline{x} - \underline{ξ})}^{†} \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot \frac{\partial (\underline{x} - \underline{ξ})}{\partial \underline{x}}) = - 2 \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - \underline{ξ}) . & (21) \end{matrix}$

Inserting equation (21) in equation (20) results in:

$\begin{matrix} \begin{matrix} \frac{\partial \underline{φ} (\underset{\underline{_}}{D} \cdot \underline{x})}{\partial \underline{x}} = [\begin{matrix} \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}) \|}_{2}^{2}) \cdot {(- 2 \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}))}^{†} \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}) \|}_{2}^{2}) \cdot {(- 2 \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}))}^{†} \\ ⋮ \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}) \|}_{2}^{2}) \cdot {(- 2 \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}))}^{†} \end{matrix}] \\ = - 2 \cdot [\begin{matrix} \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}) \|}_{2}^{2}) \cdot {({\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}))}^{†} \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}) \|}_{2}^{2}) \cdot {({\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}))}^{†} \\ ⋮ \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}) \|}_{2}^{2}) \cdot {({\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}))}^{†} \end{matrix}] \\ = - 2 \cdot [\begin{matrix} \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}) \|}_{2}^{2}) \cdot {(\underline{x} - {\underline{ξ}}_{s = 1})}^{†} \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}) \|}_{2}^{2}) \cdot {(\underline{x} - {\underline{ξ}}_{s = 2})}^{†} \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \\ ⋮ \\ \exp (- {\| \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \cdot (\underline{x} - {\underline{ξ}}_{s = S}) \|}_{2}^{2}) \cdot {(\underline{x} - {\underline{ξ}}_{s = S})}^{†} \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} \end{matrix}] \\ = - 2 \cdot diag (\underline{φ} (\underset{\underline{_}}{D} \cdot \underline{x})) \cdot [\begin{matrix} {(\underline{x} - {\underline{ξ}}_{s = 1})}^{†} \\ {(\underline{x} - {\underline{ξ}}_{s = 2})}^{†} \\ ⋮ \\ {(\underline{x} - {\underline{ξ}}_{s = S})}^{†} \end{matrix}] \cdot {\underset{\underline{_}}{D}}^{†} \cdot {\underset{\underline{_}}{γ}}^{†} \cdot \underset{\underline{_}}{γ} \cdot \underset{\underline{_}}{D} . \end{matrix} & (22) \end{matrix}$
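The Gaussian radial basis functions of equation (19) and the closed-form derivative in the last row of equation (22) can be sketched as follows, together with a finite-difference comparison. The function names and all numerical values are assumptions chosen for illustration; as in the derivation, the matrix D is treated as a constant with respect to x.

```python
import numpy as np

def rbf(x, xi, D, gamma):
    """phi_s = exp(-||gamma D (x - xi_s)||_2^2) for support vectors xi of shape (S, M)."""
    Z = (x - xi) @ (gamma @ D).T            # row s holds gamma D (x - xi_s)
    return np.exp(-np.sum(Z**2, axis=1))

def rbf_jacobian(x, xi, D, gamma):
    """Closed form of equation (22): -2 diag(phi) [(x - xi_s)^T] D^T gamma^T gamma D."""
    phi = rbf(x, xi, D, gamma)
    Q = D.T @ gamma.T @ gamma @ D
    return -2.0 * phi[:, None] * ((x - xi) @ Q)     # shape (S, M)

# Assumed example values.
w = np.array([0.5, 0.3, 0.2])
D = np.eye(3) - np.outer(np.ones(3), w)
gamma = np.diag([0.5, 1.0, 2.0])                    # non-negative scaling matrix
xi = np.array([[1.0, 1.0, 1.0], [1.1, 1.3, 0.8]])   # S = 2 support vectors
x = np.array([1.0, 1.2, 0.9])

J_closed = rbf_jacobian(x, xi, D, gamma)

# Central finite differences as an independent check of the closed form.
h = 1e-6
J_fd = np.column_stack([
    (rbf(x + h * e, xi, D, gamma) - rbf(x - h * e, xi, D, gamma)) / (2 * h)
    for e in np.eye(3)
])
```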

In an embodiment, the corrector part may make use of any further or additional information available with respect to the simulated measurements or the measurements that are used as samples. As an example, in a third embodiment, the corrector part of the alignment position estimator includes so-called pupil intensity information.

In such an embodiment, the combination of the predictor and corrector of the alignment position estimator can be set equal to

$\begin{matrix} \begin{matrix} y = {\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} ([\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}]) \\ = {\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}]) . \end{matrix} & (23) \end{matrix}$

Whereby the exponential radial basis function is redefined to equal:

$\begin{matrix} \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}]) = [\begin{matrix} \exp (- {\| [\begin{matrix} \underline{\underline{γ_{x}}} \cdot \underline{\underline{D}} \cdot (\underline{x} - {\underline{ξ}}_{s = 1}) \\ \underline{\underline{γ_{I}}} \cdot (\underline{I} - {\underline{ϑ}}_{s = 1}) \end{matrix}] \|}_{2}^{2}) \\ \exp (- {\| [\begin{matrix} \underline{\underline{γ_{x}}} \cdot \underline{\underline{D}} \cdot (\underline{x} - {\underline{ξ}}_{s = 2}) \\ \underline{\underline{γ_{I}}} \cdot (\underline{I} - {\underline{ϑ}}_{s = 2}) \end{matrix}] \|}_{2}^{2}) \\ ⋮ \\ \exp (- {\| [\begin{matrix} \underline{\underline{γ_{x}}} \cdot \underline{\underline{D}} \cdot (\underline{x} - {\underline{ξ}}_{s = S}) \\ \underline{\underline{γ_{I}}} \cdot (\underline{I} - {\underline{ϑ}}_{s = S}) \end{matrix}] \|}_{2}^{2}) \end{matrix}] & (24) \end{matrix}$

Where:

- I is a N×1 vector denoting the pupil intensity information measured by the alignment sensor, for all wavelength, polarization, diffraction order and/or illumination angle combinations;
- N ∈{1, 2, 3, . . . } denotes the total number of wavelength, polarization, diffraction order and/or illumination angle combinations, of the pupil intensity information measured by the alignment sensor;
- γ_x∈{γ∈ℝ^M×M:(γ)_r,c≥0, for all r and c} denotes the radial basis function radius scaling (non-negative) matrix, for the position part;
- γ_I∈{γ∈ℝ^M×M:(γ)_r,c≥0, for all r and c} denotes the radial basis function radius scaling (non-negative) matrix, for the intensity part.

As can be seen from equation (23), the corrector part of the alignment position estimator also includes the corrector part of the second embodiment. It should however be noted that it is not required to combine both. The pupil intensity information may also be used separately, i.e. without the corrector portion of the second embodiment.

When doing so, the alignment position estimator would become:

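$\begin{matrix} y = {\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} (\underline{I}) & (25) \end{matrix}$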

Equation (25) can be considered a fourth embodiment of the alignment position estimator.
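By way of illustration, the estimator of equation (25) may be evaluated as sketched below. The sketch assumes that φ(I) takes the form of the intensity part of equation (24), i.e. Gaussian radial basis functions centered at intensity support vectors ϑ_s; this form, the function name and the numerical values are assumptions and are not prescribed by the description.

```python
import numpy as np

def estimate_position(x, I, w, omega, theta, gamma_I):
    """Evaluate y = w^T x + omega^T phi(I) with assumed Gaussian basis functions.

    x       : (M,)   alignment measurements
    I       : (N,)   pupil intensity information
    w       : (M,)   predictor weights;  omega : (S,) corrector weights
    theta   : (S, N) assumed intensity support vectors
    gamma_I : (N, N) non-negative scaling matrix for the intensity part
    """
    Z = (I - theta) @ gamma_I.T                 # row s holds gamma_I (I - theta_s)
    phi = np.exp(-np.sum(Z**2, axis=1))
    return float(w @ x + omega @ phi)

# Assumed example values: a single intensity asymmetry value (N = 1), two support vectors.
x = np.array([1.0, 1.2, 0.9])
w = np.array([0.5, 0.3, 0.2])
I = np.array([0.1])
theta = np.array([[0.1], [5.0]])
gamma_I = np.array([[1.0]])
omega = np.array([0.02, -0.01])

y = estimate_position(x, I, w, omega, theta, gamma_I)
y_predictor_only = estimate_position(x, I, w, np.zeros(2), theta, gamma_I)
```

In this example the measured intensity coincides with the first support vector, so the correction is dominated by the first corrector weight.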

With respect to the pupil intensity information referred to above, the following can be mentioned: in case an alignment sensor is configured such that intensities of certain diffraction orders can be measured in a pupil plane of the sensor, this information can be used to provide an improved estimate of the aligned position.

FIG. 6 schematically illustrates an alignment sensor whereby pupil plane intensity measurements can be performed. FIG. 6 schematically shows an alignment system 600 configured to determine a position of an alignment mark 610, by projecting an alignment beam 620 onto the alignment mark 610. The reflected beam or beams 630 are subsequently provided, via a lens system 640, to a detector 650, e.g. via a grating 660 or the like. Based on the intensity as detected by the detector 650, a relative position of the alignment mark 610 and the grating 660 or detector 650 of the alignment system 600 can be determined.

FIG. 6 further schematically shows a pupil plane 670 of the lens system 640 and two locations 680 at which an intensity of the reflected beam or beams 630 can be measured. In an embodiment, the locations may be selected to enable measurement of the −1 and +1 order of the reflected beam 630. Alternatively or in addition, higher order components of the reflected measurement beams may be measured as well.

It has been observed by the inventors that, in case the alignment mark 610 as measured is deformed, e.g. comprising a deformation as shown in FIGS. 4(b) and 4(c), an asymmetry may be observed between the intensity as measured at the different locations, e.g. locations 680, in the pupil plane of the alignment sensor. Such an asymmetry measurement, e.g. providing a difference between an observed intensity of a +1 reflected order and a −1 reflected order, provides additional information regarding the occurring mark deformations.
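By way of example only, the asymmetry between the +1 and −1 order intensities may be computed per measurement parameter λ as sketched below. The function name and the intensity values are hypothetical, and the optional normalisation by the summed intensity is an assumption that is not taken from the description.

```python
import numpy as np

def order_asymmetry(i_plus, i_minus, normalize=False):
    """Difference between +1 and -1 order pupil intensities, one entry per lambda."""
    i_plus = np.asarray(i_plus, dtype=float)
    i_minus = np.asarray(i_minus, dtype=float)
    diff = i_plus - i_minus
    if normalize:
        # Assumption: dividing by the total intensity removes overall signal strength.
        return diff / (i_plus + i_minus)
    return diff

# Hypothetical intensities for three measurement parameters (e.g. three wavelengths).
I_asym = order_asymmetry([1.00, 0.80, 0.65], [1.00, 0.70, 0.75])
```

A symmetric mark would be expected to yield values near zero, whereas non-zero entries may indicate a mark deformation.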

This pupil intensity information can be applied in a similar manner as described above with respect to the simulated alignment measurements.

In particular, simulations of alignment measurements can be applied, as described above, to provide an estimate of an aligned position when actual alignment measurements are made, using the same or nearly the same measurement parameters or characteristics as applied in the simulations.

Similarly, in case pupil plane intensities can be measured by an alignment sensor, such pupil plane intensities may be simulated as well, using a plurality of simulated parameters λ, and weight coefficients can be sought such that a weighted combination of the pupil plane intensities, as measured using the same plurality of characteristics or parameters λ during an actual alignment process, can be used as an improved (or corrected) estimate of the aligned position.

As such, assuming that a particular set of pupil intensity information can be obtained by an alignment sensor, the same intensity information may be retrieved by simulations.

In particular, the same set of T samples may be used and the pupil intensity information (e.g. the difference between an observed intensity of a +1 reflected order and a −1 reflected order in a pupil plane) may be obtained from these models, for a plurality of simulation parameters λ.

When this information is available, a cost function can be defined that is to be optimized.

In order to obtain the weight coefficients ω as applied in the alignment position estimator of equation (23), a similar cost function is applied as the second example cost function as applied in the first embodiment of the alignment position estimator, i.e. a cost function including the sum of all training sample variances and the sum of all training sample squared biases.

Note that, similar to the equations (6) and (13), the variance of y (as expressed in equation (23)) can be expressed as a function of the covariance matrix of x, as follows:

$\begin{matrix} σ_{y}^{2} = \frac{\partial}{\partial \underline{x}} \cdot ({\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} ([\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}])) \cdot {\underline{\underline{C}}}_{\underline{x}} \cdot {(\frac{\partial}{\partial \underline{x}} \cdot ({\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} ([\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}])))}^{†} + \frac{\partial}{\partial \underline{I}} \cdot ({\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} ([\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}])) \cdot {\underline{\underline{C}}}_{\underline{I}} \cdot {(\frac{\partial}{\partial \underline{I}} \cdot ({\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} ([\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}])))}^{†} = ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{x}}) \cdot {\underline{\underline{C}}}_{\underline{x}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{x}})}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{I}} \cdot {\underline{\underline{C}}}_{\underline{I}} \cdot {({\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{I}})}^{†} = ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{x}}) \cdot {\underline{\underline{C}}}_{\underline{x}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{x}})}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{I}} \cdot {\underline{\underline{C}}}_{\underline{I}} \cdot {({\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{I}})}^{†} . & (26) \end{matrix}$

For this derivation, it has been assumed here that x and I are statistically uncorrelated. Applying equation (26) to derive the sum of all training sample variances results in:

$\begin{matrix} \sum_{t = 1}^{T} σ_{y, t}^{2} = \sum_{t = 1}^{T} ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}}) \cdot {\underline{\underline{C}}}_{{\underline{x}}_{t}} \cdot {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}})}^{†} + \sum_{t^{'} = 1}^{T} {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{'}} \\ {\underline{I}}_{t^{'}} \end{matrix}])}{\partial \underline{I}} \cdot {\underline{\underline{C}}}_{\underline{I_{t^{'}}}} \cdot {({\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{'}} \\ {\underline{I}}_{t^{'}} \end{matrix}])}{\partial \underline{I}})}^{†} . & (27) \end{matrix}$

Further, the sum of all training sample squared biases for the alignment position estimator of equation (23) equals:

$\begin{matrix} \sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} = \sum_{t = 1}^{T} {({\underline{w}}^{†} \cdot {\underline{x}}_{t} + {\underline{ω}}^{†} \cdot \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}]) - y_{training, t})}^{2} & (28) \end{matrix}$
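The two sums of equations (27) and (28) may, purely as an illustrative sketch, be evaluated numerically as shown below. It is assumed here that the basis function values φ_t and their derivatives ∂φ/∂x (an S×M array) and ∂φ/∂I (an S×K array) have already been computed per training sample; the function name, argument names and numerical values are hypothetical.

```python
import numpy as np

def cost_terms(w, omega, xs, ys_train, phis, Jxs, JIs, Cxs, CIs):
    """Return (sum of variances, sum of squared biases) over all training samples."""
    var_sum = 0.0
    bias_sum = 0.0
    for x, y_tr, phi, Jx, JI, Cx, CI in zip(xs, ys_train, phis, Jxs, JIs, Cxs, CIs):
        gx = w + Jx.T @ omega        # transpose of (w^† + omega^† dphi/dx), length M
        gI = JI.T @ omega            # transpose of (omega^† dphi/dI), length K
        var_sum += gx @ Cx @ gx + gI @ CI @ gI             # summand of equation (27)
        bias_sum += (w @ x + omega @ phi - y_tr) ** 2      # summand of equation (28)
    return var_sum, bias_sum

# Hypothetical single training sample with M = 2, K = 1 and S = 1.
w = np.array([0.5, 0.5])
omega = np.array([2.0])
var_sum, bias_sum = cost_terms(
    w, omega,
    xs=[np.array([1.0, 3.0])], ys_train=[2.5],
    phis=[np.array([0.5])],
    Jxs=[np.array([[0.1, -0.1]])], JIs=[np.array([[0.2]])],
    Cxs=[0.01 * np.eye(2)], CIs=[np.array([[0.04]])],
)
```

The cost function referred to above is then the sum of both returned values.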

In a similar manner as above, the optimum of the cost function is found by equating the sum of the derivatives of the sum of all training sample squared biases and of the sum of all training sample variances with respect to the variables ω to zero.

The derivative of the sum of all training sample variances with respect to the variables ω equals:

$\begin{matrix} \frac{\partial}{\partial \underline{ω}} \cdot (\sum_{t = 1}^{T} σ_{y, t}^{2}) = \frac{\partial}{\partial \underline{ω}} \cdot (\begin{matrix} \sum_{t = 1}^{T} ({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}}) \cdot {\underline{\underline{C}}}_{{\underline{x}}_{t}} \cdot \\ {({\underline{w}}^{†} + {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}})}^{†} + \\ \sum_{t^{'} = 1}^{T} {\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{'}} \\ {\underline{I}}_{t^{'}} \end{matrix}])}{\partial \underline{I}} \cdot {\underline{\underline{C}}}_{\underline{I_{t^{'}}}} \cdot {({\underline{ω}}^{†} \cdot \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{'}} \\ {\underline{I}}_{t^{'}} \end{matrix}])}{\partial \underline{I}})}^{†} \end{matrix}) = 2 \cdot \sum_{t = 1}^{T} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}} \cdot {\underline{\underline{C}}}_{{\underline{x}}_{t}} \cdot {(\frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}})}^{†} \cdot \underline{ω} + 2 \cdot \sum_{t^{'} = 1}^{T} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{'}} \\ {\underline{I}}_{t^{'}} \end{matrix}])}{\partial \underline{x}} \cdot {\underline{\underline{C}}}_{{\underline{x}}_{t^{'}}} \cdot \underline{w} + 2 \cdot \sum_{t^{″} = 1}^{T} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{″}} \\ {\underline{I}}_{t^{″}} \end{matrix}])}{\partial \underline{I}} \cdot {\underline{\underline{C}}}_{{\underline{I}}_{t^{″}}} \cdot {(\frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{″}} \\ {\underline{I}}_{t^{″}} \end{matrix}])}{\partial \underline{I}})}^{†} \cdot \underline{ω} . & (29) \end{matrix}$

Note that we have made use of the fact that the covariance matrix C_x_t is a symmetric matrix.

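The derivative of the sum of all training sample squared biases with respect to the variables ω equals:

$\begin{matrix} \frac{\partial}{\partial \underline{ω}} \cdot (\sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2}) = 2 \cdot \sum_{t = 1}^{T} ({\underline{w}}^{†} \cdot {\underline{x}}_{t} + {\underline{ω}}^{†} \cdot \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}]) - y_{training, t}) \cdot \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}]) . & (30) \end{matrix}$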

Using the derivatives as computed in equations (29) and (30), the condition of optimality for the cost function becomes:

$\begin{matrix} \frac{\partial}{\partial \underline{ω}} (\sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} + σ_{y, t}^{2}) = \underline{0} \Rightarrow (\begin{matrix} 2 \cdot \sum_{t = 1}^{T} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}} \cdot {\underline{\underline{C}}}_{{\underline{x}}_{t}} \cdot {(\frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}})}^{†} \cdot \underline{ω} + \\ 2 \cdot \sum_{t^{'} = 1}^{T} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{'}} \\ {\underline{I}}_{t^{'}} \end{matrix}])}{\partial \underline{x}} \cdot {\underline{\underline{C}}}_{{\underline{x}}_{t^{'}}} \cdot \underline{w} + \\ 2 \cdot \sum_{t^{″} = 1}^{T} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{″}} \\ {\underline{I}}_{t^{″}} \end{matrix}])}{\partial \underline{I}} \cdot {\underline{\underline{C}}}_{{\underline{I}}_{t^{″}}} \cdot {(\frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{″}} \\ {\underline{I}}_{t^{″}} \end{matrix}])}{\partial \underline{I}})}^{†} \cdot \underline{ω} + \\ 2 \cdot \sum_{t^{″} = 1}^{T} ({\underline{w}}^{†} \cdot {\underline{x}}_{t^{″}} + {\underline{ω}}^{†} \cdot \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{″}} \\ {\underline{I}}_{t^{″}} \end{matrix}]) - y_{training, t^{″}}) \cdot \\ \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{″}} \\ {\underline{I}}_{t^{″}} \end{matrix}]) \end{matrix}) = \underline{0} \Rightarrow (\begin{matrix} \sum_{t = 1}^{T} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}} \cdot {\underline{\underline{C}}}_{{\underline{x}}_{t}} \cdot {(\frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}])}{\partial \underline{x}})}^{†} + \\ \sum_{t^{'} = 1}^{T} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{'}} \\ {\underline{I}}_{t^{'}} \end{matrix}])}{\partial \underline{I}} \cdot {\underline{\underline{C}}}_{{\underline{I}}_{t^{'}}} \cdot {(\frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{'}} \\ {\underline{I}}_{t^{'}} \end{matrix}])}{\partial \underline{I}})}^{†} + \\ \sum_{t^{″} = 1}^{T} \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{″}} \\ {\underline{I}}_{t^{″}} \end{matrix}]) \cdot {\underline{φ}}^{†} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{″}} \\ {\underline{I}}_{t^{″}} \end{matrix}]) \end{matrix}) \cdot \underline{ω} = (\begin{matrix} \sum_{t = 1}^{T} \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t} \\ {\underline{I}}_{t} \end{matrix}]) \cdot (y_{training, t} - {\underline{w}}^{†} \cdot {\underline{x}}_{t}) - \\ \sum_{t^{'} = 1}^{T} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot {\underline{x}}_{t^{'}} \\ {\underline{I}}_{t^{'}} \end{matrix}])}{\partial \underline{x}} \cdot {\underline{\underline{C}}}_{{\underline{x}}_{t^{'}}} \cdot \underline{w} \end{matrix}) . & (31) \end{matrix}$
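The final form of the optimality condition is a linear system in ω, which may be solved with a standard linear solver. The following is a minimal sketch under the assumption that, for each training sample, the basis function values and their derivatives with respect to x (S×M) and I (S×K) are available as arrays; the function name and the randomly generated data are hypothetical and do not represent simulation results.

```python
import numpy as np

def solve_omega(w, xs, ys_train, phis, Jxs, JIs, Cxs, CIs):
    """Solve A @ omega = b, with A and b assembled as in the optimality condition."""
    S = phis[0].shape[0]
    A = np.zeros((S, S))
    b = np.zeros(S)
    for x, y_tr, phi, Jx, JI, Cx, CI in zip(xs, ys_train, phis, Jxs, JIs, Cxs, CIs):
        # Left-hand side: position variance, intensity variance and bias contributions.
        A += Jx @ Cx @ Jx.T + JI @ CI @ JI.T + np.outer(phi, phi)
        # Right-hand side: residual of the linear estimator minus the cross term with w.
        b += phi * (y_tr - w @ x) - Jx @ Cx @ w
    return np.linalg.solve(A, b)

# Hypothetical random training data: T samples, M positions, K intensities, S basis functions.
rng = np.random.default_rng(0)
T, M, K, S = 20, 3, 2, 3
w = np.full(M, 1.0 / M)
xs = rng.normal(size=(T, M))
ys_train = rng.normal(size=T)
phis = rng.uniform(0.1, 1.0, size=(T, S))
Jxs = rng.normal(size=(T, S, M))
JIs = rng.normal(size=(T, S, K))
Cxs = [0.01 * np.eye(M)] * T
CIs = [0.02 * np.eye(K)] * T

omega_opt = solve_omega(w, xs, ys_train, phis, Jxs, JIs, Cxs, CIs)
```

Because the cost is quadratic in ω when the basis functions are kept fixed, the solution of this system is the minimiser, provided the assembled matrix is invertible.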

As can be seen, equation (31) now includes the pupil intensity information I. The following derivatives, whereby exponential radial basis functions are applied as the basis functions, are however straightforward to derive, in a similar manner as in equation (22). Therefore, only the result is presented here:

$\begin{matrix} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{x}} = - 2 \cdot diag (\underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}])) \cdot [\begin{matrix} {(\underline{x} - {\underline{ξ}}_{s = 1})}^{†} \\ {(\underline{x} - {\underline{ξ}}_{s = 2})}^{†} \\ \vdots \\ {(\underline{x} - {\underline{ξ}}_{s = S})}^{†} \end{matrix}] \cdot {\underline{\underline{D}}}^{†} \cdot {\underline{\underline{γ}}}_{x}^{†} \cdot {\underline{\underline{γ}}}_{x} \cdot \underline{\underline{D}} & (31) \end{matrix}$

Similarly, the derivative with respect to the pupil intensity information I can be found as:

$\begin{matrix} \frac{\partial \underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}])}{\partial \underline{I}} = - 2 \cdot diag (\underline{φ} ([\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}])) \cdot [\begin{matrix} {(\underline{I} - {\underline{ϑ}}_{s = 1})}^{†} \\ {(\underline{I} - {\underline{ϑ}}_{s = 2})}^{†} \\ \vdots \\ {(\underline{I} - {\underline{ϑ}}_{s = S})}^{†} \end{matrix}] \cdot {\underline{\underline{γ}}}_{I}^{†} \cdot {\underline{\underline{γ}}}_{I} & (32) \end{matrix}$
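The two derivative expressions may be checked numerically, as sketched below. The sketch assumes basis functions of the form φ_s = exp(−‖γ_x·D·(x − ξ_s)‖² − ‖γ_I·(I − ϑ_s)‖²), which is the form consistent with the derivatives given above, and assumes D = 𝟙 − 1·w†, so that D·x equals x minus the scalar w†·x in every element. All names and numerical values are hypothetical.

```python
import numpy as np

def phi(x, I, D, xi, theta, gamma_x, gamma_I):
    """Assumed exponential radial basis functions; xi is (S, M), theta is (S, K)."""
    ux = (x - xi) @ (gamma_x @ D).T      # row s holds gamma_x D (x - xi_s)
    uI = (I - theta) @ gamma_I.T         # row s holds gamma_I (I - theta_s)
    return np.exp(-np.sum(ux**2, axis=1) - np.sum(uI**2, axis=1))

def dphi_dx(x, I, D, xi, theta, gamma_x, gamma_I):
    """-2 diag(phi) [(x - xi_s)^†] D^† gamma_x^† gamma_x D, shape (S, M)."""
    p = phi(x, I, D, xi, theta, gamma_x, gamma_I)
    G = gamma_x @ D
    return -2.0 * p[:, None] * ((x - xi) @ G.T @ G)

def dphi_dI(x, I, D, xi, theta, gamma_x, gamma_I):
    """-2 diag(phi) [(I - theta_s)^†] gamma_I^† gamma_I, shape (S, K)."""
    p = phi(x, I, D, xi, theta, gamma_x, gamma_I)
    return -2.0 * p[:, None] * ((I - theta) @ gamma_I.T @ gamma_I)

def numerical_jacobian(f, v, eps=1e-6):
    """Central finite differences of a vector-valued function f at v."""
    J = np.zeros((f(v).shape[0], v.shape[0]))
    for j in range(v.shape[0]):
        step = np.zeros_like(v)
        step[j] = eps
        J[:, j] = (f(v + step) - f(v - step)) / (2.0 * eps)
    return J

rng = np.random.default_rng(1)
M, K, S = 3, 2, 4
w = np.array([0.5, 0.3, 0.2])
D = np.eye(M) - np.outer(np.ones(M), w)          # assumption: D x = x - (w^† x) 1
xi, theta = rng.normal(size=(S, M)), rng.normal(size=(S, K))
gamma_x = rng.uniform(0.0, 1.0, size=(M, M))     # non-negative scaling matrices
gamma_I = rng.uniform(0.0, 1.0, size=(K, K))
x, I = rng.normal(size=M), rng.normal(size=K)
args = (D, xi, theta, gamma_x, gamma_I)

Jx, JI = dphi_dx(x, I, *args), dphi_dI(x, I, *args)
Jx_num = numerical_jacobian(lambda v: phi(v, I, *args), x)
JI_num = numerical_jacobian(lambda v: phi(x, v, *args), I)
```

Agreement between the analytical and the finite-difference Jacobians gives some confidence that the derivative expressions are implemented consistently with the chosen basis functions.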

Using any of the embodiments as described above, optimal weight coefficients w or a combination of optimal weight coefficients w and optimal weight coefficients ω can be derived.

Once derived, these weight coefficients may subsequently be used online, i.e. during operation of a lithographic apparatus, to determine a position of an alignment mark as a weighted combination of measurements performed on that alignment mark, whereby the measurements include position measurements (using a plurality of alignment measurement parameters or characteristics λ) and optionally include pupil plane intensity measurements.

The following overview summarizes the forms of the alignment position estimators as described in the various embodiments:

$\begin{matrix} y = {\underline{w}}^{†} \cdot \underline{x} & (1) \\ \begin{matrix} y = {\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} (\underline{x} - {\underline{w}}^{†} \cdot \underline{x}) \\ = {\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} (\underline{\underline{D}} \cdot \underline{x}) . \end{matrix} & (12) \\ \begin{matrix} y = {\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} (\begin{matrix} \underline{x} - {\underline{w}}^{†} \cdot \underline{x} \\ \underline{I} \end{matrix}) \\ = {\underline{w}}^{†} \cdot \underline{x} + {\underline{ω}}^{†} \cdot \underline{φ} (\begin{matrix} \underline{\underline{D}} \cdot \underline{x} \\ \underline{I} \end{matrix}) . \end{matrix} & (23) \\ y = {\underline{w}}^{t} \cdot \underline{x} + {\underline{ω}}^{t} \cdot \underline{φ} (\underline{I}) & (25) \end{matrix}$

In order to determine the weight coefficients as used in the alignment position estimators, use is made of a cost function that is optimized, whereby the cost function is a function based on simulated data of the alignment process, using a plurality of models or training samples T. In the above, two examples have been given, i.e.

- a cost function including the sum, over all samples T, of the squared biases of the simulations; and
- a cost function including the sum, over all samples T, of the squared biases and the sum, over all samples T, of the variances, of the simulations.

Both cost functions may be applied in any of the alignment position estimators to determine either the weight coefficients w, the weight coefficients ω or both. With respect to the latter cost function, it is worth mentioning that either the sum, over all samples T, of the squared biases or the sum, over all samples T, of the variances, or both may be weighted as well.
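As an illustration only, such a weighted cost function may be written as follows, whereby the scalar weights α ≥ 0 and β ≥ 0 are hypothetical symbols that are not used elsewhere in this description:

$\begin{matrix} J = α \cdot \sum_{t = 1}^{T} {(y_{t} - y_{training, t})}^{2} + β \cdot \sum_{t = 1}^{T} σ_{y, t}^{2} \end{matrix}$

Selecting α = 1 and β = 0 yields the first example cost function, whereas α = β = 1 yields the second example cost function.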

The aligned position estimators in equations (12), (23) and (25) make use of basis functions. As an example, the use of exponential radial basis functions is described. However, it should be noted that other functions such as polynomials or spline functions may be considered as well in the corrector part of the alignment position estimator.

In an embodiment, the present invention provides an initialisation process of a sensor that senses a property of a substrate as used in a lithographic process. Examples of such sensors include alignment sensors, level sensors and overlay sensors. In a level sensor, a height map of a substrate is generated by projecting a measurement beam at a slanted angle towards the substrate surface and determining a position of a reflected beam off of the surface. An overlay sensor is typically used as an off-line tool to assess the alignment of two consecutive layers of patterns on a substrate.

Each of these sensors can be configured to perform a plurality of measurements in order to arrive at a value for the property that is measured, i.e. the alignment mark position, the height level or the overlay value. Using the above described techniques, weight coefficients may be derived to weigh such a plurality of measurements, thereby arriving at an optimised value for the measurement property.

As such, the present invention may also be embodied as an alignment sensor, a level sensor or an overlay sensor, said sensors comprising, e.g. in a controller or control unit associated with the sensor, the weight coefficients as derived by the initialisation process.

Although specific reference may be made in this text to the use of lithographic apparatus in the manufacture of ICs, it should be understood that the lithographic apparatus described herein may have other applications, such as the manufacture of integrated optical systems, guidance and detection patterns for magnetic domain memories, flat-panel displays, liquid-crystal displays (LCDs), thin-film magnetic heads, etc. The skilled artisan will appreciate that, in the context of such alternative applications, any use of the terms “wafer” or “die” herein may be considered as synonymous with the more general terms “substrate” or “target portion”, respectively. The substrate referred to herein may be processed, before or after exposure, in for example a track (a tool that typically applies a layer of resist to a substrate and develops the exposed resist), a metrology tool and/or an inspection tool. Where applicable, the disclosure herein may be applied to such and other substrate processing tools. Further, the substrate may be processed more than once, for example in order to create a multi-layer IC, so that the term substrate used herein may also refer to a substrate that already contains multiple processed layers.

Although specific reference may have been made above to the use of embodiments of the invention in the context of optical lithography, it will be appreciated that the invention may be used in other applications, for example imprint lithography, and where the context allows, is not limited to optical lithography. In imprint lithography a topography in a patterning device defines the pattern created on a substrate. The topography of the patterning device may be pressed into a layer of resist supplied to the substrate whereupon the resist is cured by applying electromagnetic radiation, heat, pressure or a combination thereof. The patterning device is moved out of the resist leaving a pattern in it after the resist is cured.

The terms “radiation” and “beam” used herein encompass all types of electromagnetic radiation, including ultraviolet (UV) radiation (e.g. having a wavelength of or about 365, 248, 193, 157 or 126 nm) and extreme ultra-violet (EUV) radiation (e.g. having a wavelength in the range of 5-20 nm), as well as particle beams, such as ion beams or electron beams.

The term “lens”, where the context allows, may refer to any one or combination of various types of optical components, including refractive, reflective, magnetic, electromagnetic and electrostatic optical components.

While specific embodiments of the invention have been described above, it will be appreciated that the invention may be practiced otherwise than as described. For example, the invention may take the form of a computer program containing one or more sequences of machine-readable instructions describing a method as disclosed above, or a data storage medium (e.g. semiconductor memory, magnetic or optical disk) having such a computer program stored therein.

The descriptions above are intended to be illustrative, not limiting. Thus, it will be apparent to one skilled in the art that modifications may be made to the invention as described without departing from the scope of the claims set out below.
