This application claims priority to and the benefit of Korean Patent Application No. 10-2024-0001795 filed at the Korean Intellectual Property Office on Jan. 4, 2024, the entire contents of which are incorporated herein by reference.
The present disclosure relates to a method, apparatus, and system with domain adaptation-based classification.
Test-time adaptation is a technique for improving the classification performance of an already trained artificial intelligence classifier on a domain-shifted image, that is, an image whose domain differs from the domain of the images used to train the classifier. In general, an adaptation procedure for adapting the trained artificial intelligence classifier to the domain-shifted image may be performed, so that parameters of an encoder of the artificial intelligence classifier are updated and classification of the domain-shifted image is then performed.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In one general aspect, a processor-implemented method includes: generating an attention mask based on a feature of an image; updating an image classification model by performing domain adaptation on the image based on the attention mask; and determining a class of the image using the updated image classification model.
The domain of the input image may be different from a source domain with which the image classification model was trained, wherein the determined class may be indicative of whether a defect is detected in the input image with respect to a semiconductor manufacturing process, and the image may be captured by an inspection equipment in an in-fabrication environment.
The generating of the attention mask may include: generating a spatial feature by embedding, using an encoder, the image into a latent space; and generating the attention mask based on the spatial feature.
The encoder may be an encoder of the image classification model.
The updating of the image classification model may include: calculating a loss by masking, using the attention mask, a difference between the image and a reconstructed image generated based on a decoding of the spatial feature; and updating the image classification model based on the loss.
The encoder may be an encoder of the image classification model, which may comprise the encoder and a classifier, and wherein the updating of the image classification model based on the loss may comprise one of: updating the image classification model by updating the encoder based on the loss without updating the classifier; and updating the image classification model by performing the updating of the encoder, and updating the classifier based on the loss or dependent on another image of the domain of the input image.
The generating of the attention mask based on the spatial feature may include: generating an attention map set comprising a plurality of attention maps generated based on the spatial feature; merging the plurality of attention maps; and generating the attention mask by performing sampling on a result of the merging.
The encoder may be an encoder of the image classification model, which may comprise the encoder and a classifier, and wherein the plurality of attention maps may be respective maps output by different layers of the classifier.
The merging of the plurality of attention maps may include merging the plurality of attention maps into a merged attention map, as the result of the merging, by performing a rollout operation or a layer average operation on the plurality of attention maps.
The performing of the sampling on the result of the merging may include performing a Bernoulli sampling or a threshold-based sampling on each patch of the result of the merging.
The generating of the attention mask based on the spatial feature may include: generating an attention vector based on the spatial feature; and generating the attention mask by performing sampling on a plurality of elements included in the attention vector.
Each of the plurality of elements may correspond to a respective patch belonging to the spatial feature, and each of the plurality of elements may be determined based on a respective similarity between a reference feature of the spatial feature and a respective feature corresponding to the respective patch.
For each of the plurality of elements, the respective similarity may be a cosine similarity between the reference feature and the respective feature corresponding to the respective patch.
The generating of the spatial feature by embedding the image into the latent space may include: transforming the image; and generating the spatial feature by embedding the transformed image into the latent space.
In another general aspect, an apparatus includes: one or more processors; and a memory storing instructions, which when executed by the one or more processors, configure the one or more processors to: generate an attention mask based on a feature of an image; update an artificial intelligence (AI) model through a domain adaptation, dependent on the image and the attention mask, of the AI model to a domain of the image; and determine a class of the image using the updated AI model.
The AI model may be trained based on images of a source domain, and the domain of the image may be different from the source domain.
For the generating of the attention mask, the instructions may be configured to cause the one or more processors to: generate a spatial feature through an embedding of the image into a latent space using an encoder of the AI model; and generate the attention mask based on the spatial feature.
For the domain adaptation on the AI model, the instructions may be configured to cause the one or more processors to: calculate a loss through a masking, using the attention mask, of a difference between the image and a reconstructed image generated based on a decoding of the spatial feature; and update the AI model based on the loss.
For the generating of the attention mask based on the spatial feature, the instructions may be configured to cause the one or more processors to: generate an attention map set comprising a plurality of attention maps generated based on the spatial feature; merge the plurality of attention maps; and generate the attention mask by performing sampling on a result of the merging.
For the generating of the attention mask based on the spatial feature, the instructions may be configured to cause the one or more processors to: generate an attention vector based on the spatial feature; and generate the attention mask by performing sampling on a plurality of elements included in the attention vector.
For the generating of the spatial feature through the embedding of the image into the latent space, the instructions may be configured to cause the one or more processors to: transform the image; and generate the spatial feature through an embedding of the transformed image into the latent space.
In another general aspect, a defect inspection system for a semiconductor manufacturing process includes: one or more processors; and one or more memories storing an artificial intelligence (AI) model, trained based on source images of a source domain to classify a class of an input image provided to the AI model, and instructions that when executed by the one or more processors configure the one or more processors to: generate an attention mask based on the input image acquired from an inspection equipment in an in-fabrication environment; update the AI model through domain adaptation using the attention mask; and determine the class of the input image using the updated AI model.
For the updating of the AI model, the instructions may be configured to cause the one or more processors to: reconstruct the input image through a decoding of a spatial feature, of the input image, that is generated by the AI model; and update the AI model through a calculation of a loss using the input image, the reconstructed image, and the attention mask.
For the calculating of the loss, the instructions may be configured to cause the one or more processors to calculate the loss through a masking, using the attention mask, of a difference between the input image and the reconstructed image.
Other features and aspects will be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described or provided, the same or like drawing reference numerals will be understood to refer to the same or like elements, features, and structures. The drawings may not be to scale, and the relative size, proportions, and depiction of elements in the drawings may be exaggerated for clarity, illustration, and convenience.
The following detailed description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be apparent after an understanding of the disclosure of this application. For example, the sequences within and/or of operations described herein are merely examples, and are not limited to those set forth herein, but may be changed as will be apparent after an understanding of the disclosure of this application, except for sequences within and/or of operations necessarily occurring in a certain order. As another example, the sequences of and/or within operations may be performed in parallel, except for at least a portion of sequences of and/or within operations necessarily occurring in an order, e.g., a certain order. Also, descriptions of features that are known after an understanding of the disclosure of this application may be omitted for increased clarity and conciseness.
The features described herein may be embodied in different forms, and are not to be construed as being limited to the examples described herein. Rather, the examples described herein have been provided merely to illustrate some of the many possible ways of implementing the methods, apparatuses, and/or systems described herein that will be apparent after an understanding of the disclosure of this application. The use of the term “may” herein with respect to an example or embodiment (e.g., as to what an example or embodiment may include or implement) means that at least one example or embodiment exists where such a feature is included or implemented, while all examples are not limited thereto. The use of the terms “example” or “embodiment” herein have a same meaning (e.g., the phrasing “in one example” has a same meaning as “in one embodiment”, and “one or more examples” has a same meaning as “in one or more embodiments”).
The terminology used herein is for describing various examples only and is not to be used to limit the disclosure. The articles “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. As non-limiting examples, terms “comprise” or “comprises,” “include” or “includes,” and “have” or “has” specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, but do not preclude the presence or addition of one or more other features, numbers, operations, members, elements, and/or combinations thereof, or the alternate presence of alternatives of the stated features, numbers, operations, members, elements, and/or combinations thereof. Additionally, while one embodiment may set forth such terms “comprise” or “comprises,” “include” or “includes,” and “have” or “has” to specify the presence of stated features, numbers, operations, members, elements, and/or combinations thereof, other embodiments may exist where one or more of the stated features, numbers, operations, members, elements, and/or combinations thereof are not present.
Throughout the specification, when a component or element is described as being “on”, “connected to,” “coupled to,” or “joined to” another component, element, or layer it may be directly (e.g., in contact with the other component, element, or layer) “on”, “connected to,” “coupled to,” or “joined to” the other component, element, or layer or there may reasonably be one or more other components, elements, layers intervening therebetween. When a component, element, or layer is described as being “directly on”, “directly connected to,” “directly coupled to,” or “directly joined” to another component, element, or layer there can be no other components, elements, or layers intervening therebetween. Likewise, expressions, for example, “between” and “immediately between” and “adjacent to” and “immediately adjacent to” may also be construed as described in the foregoing.
Although terms such as “first,” “second,” and “third”, or A, B, (a), (b), and the like may be used herein to describe various members, components, regions, layers, or sections, these members, components, regions, layers, or sections are not to be limited by these terms. Each of these terminologies is not used to define an essence, order, or sequence of corresponding members, components, regions, layers, or sections, for example, but used merely to distinguish the corresponding members, components, regions, layers, or sections from other members, components, regions, layers, or sections. Thus, a first member, component, region, layer, or section referred to in the examples described herein may also be referred to as a second member, component, region, layer, or section without departing from the teachings of the examples.
As used herein, the term “and/or” includes any one and any combination of any two or more of the associated listed items. The phrases “at least one of A, B, and C”, “at least one of A, B, or C”, and the like are intended to have disjunctive meanings, and these phrases “at least one of A, B, and C”, “at least one of A, B, or C”, and the like also include examples where there may be one or more of each of A, B, and/or C (e.g., any combination of one or more of each of A, B, and C), unless the corresponding description and embodiment necessitates such listings (e.g., “at least one of A, B, and C”) to be interpreted to have a conjunctive meaning.
Unless otherwise defined, all terms, including technical and scientific terms, used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains and specifically in the context of an understanding of the disclosure of the present application. Terms, such as those defined in commonly used dictionaries, are to be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and specifically in the context of the disclosure of the present application, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As non-limiting examples, artificial intelligence (AI) models of the present disclosure may be machine learning models that have been trained for, or learn, at least one task. The AI models may be implemented through the execution of instructions by one or more processors, which may be configured to execute such, as well as other, instructions and which may be configured to perform operations of the AI models through the execution of the instructions. The task that the AI model learns may refer to a task to be solved through machine learning and/or a task to be executed through machine learning, though examples are not limited thereto. In examples, the AI models may be downloaded through a network, or may be sold as a product, by themselves or included within a product, such as in one or more memories of the product, where the product can be an electronic device or system, as non-limiting examples. Alternatively, all or portions of the AI model may be networked with or from a variety of electronic devices or systems. Further, while some below descriptions may refer to computing or electronic device examples, it is noted that such examples are also applicable to electronic systems. Below, AI models that are configured to perform at least a classification task will be referred to as classification models, noting that embodiments are not limited thereto.
If a domain of an image is shifted (or changed), a characteristic of the image may be modified, and an image classification model trained with images of different domain(s) may show low classification performance for the domain-shifted image.
For example, when an image is input, such as from an inspection equipment as a non-limiting example, an image classification model may be used to classify the image, objects, or features, aspects, or characteristics of the objects in the input image. However, there may be a difference between a domain (e.g., a ‘target domain’) of the input image and one or more source domains of the source images, objects, or features, aspects, or characteristics of the objects in the source images, that were used to train the image classification model.
In one or more embodiments, an attention-based domain adaptation may be performed to update the image classification model so that it is applicable or accurate with respect to the target domain of the input image. An effect of the domain adaptation may be reduced when there is a large difference between the target domain of the input image and the source domain(s) of the source images, for example, that were used to train the image classification model. Accordingly, a domain adaptation model according to one or more embodiments may update the image classification model by performing the attention-based domain adaptation on a region (i.e., one or more regions) that is robust to domain changes, using attention information generated based on the input image. For example, the attention information may apply attention to a region whose use may improve the accuracy or performance of a classification of the image performed using the updated classification model (e.g., generated through an update of the classification model based on the attention information), compared to performing the classification of the input image using the original image classification model.
In one or more embodiments, a computing system 100 may perform attention-based domain adaptation for an input image, may update an image classification model based on the result of the attention-based domain adaptation, and may classify the input image into one class, for example, using the updated image classification model, such as the updated image classification model 210 of
In one or more embodiments, before the attention-based domain adaptation is performed, an image classification model may be trained based on images of a source domain. The trained image classification model 110 may be updated (or adapted) for a new target domain different from the source domain through the attention-based domain adaptation using image(s) in the target domain, and then the computing system 100 may implement the updated image classification model to determine the class of the input image through an inference operation of the updated image classification model (e.g., based on the input image being input or provided to the updated image classification model). Through the attention-based domain adaptation, the updated image classification model can perform the classification task well not only for the images in the source domain but also for the images in the target domain.
Referring to
As respective non-limiting examples, herein, each of the encoder 111, the classifier 112, the image classification model 110, the decoder 130, the mask generator 140, the domain adapter 150, and the image transformer 120 may be respective, or any combination thereof, operations performed by one or more processors represented by the computing system 100. As non-limiting examples, the computing system 100 represents one or more processors, which may be the processor 1310 of the electronic device 1300 of
In one or more embodiments, the image classification model 110 may be pre-trained with images in the source domain. When an image is input to the computing system 100 (or captured by an image sensor or camera of an inspection equipment of the computing system 100, such as an image sensor or camera represented by the inspection equipment of
On the other hand, the computing system 100 may further include a domain determiner (not shown) for determining whether the domain of the input image is the same as a training domain (or a source domain) of the image classification model 110. The domain determiner may determine whether the domain adaptation needs to be performed based on the determination of whether the domains are the same.
For example, when the domain determiner determines that the input image belongs to the source domain, the image classification model 110 may classify the class of the input image, i.e., without the domain adaptation being performed or without using an updated image classification model that has resulted from performance of the domain adaptation based on a previous image of another domain different from the source domain. On the other hand, when the domain determiner determines that the input image belongs to the other domain different from the source domain, the computing system 100 may use the updated image classification model that resulted from the previous performance of the domain adaptation to classify the input image, or may perform the domain adaptation on the input image to update the image classification model and then perform class classification on the input image using the updated image classification model. Thus, the computing system 100 may selectively use the image classification model 110, use the updated image classification model previously generated, or generate the updated image classification model, in which case the newly generated updated image classification model may be used to classify the input image.
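As a minimal illustration of this selective flow, the following Python sketch shows one way such a dispatch could be organized. All function and variable names (classify_with_optional_adaptation, is_source_domain, adapt_model, and so on) are hypothetical and are assumptions for illustration, not the disclosed implementation.

```python
# Hypothetical sketch of the selective adaptation flow described above.
# The domain determiner, adaptation, and classification operations are
# passed in as callables; their internals are implementation details.

def classify_with_optional_adaptation(image, model, adapted_model,
                                      is_source_domain, adapt_model):
    """Classify `image`, adapting the model only when its domain differs."""
    if is_source_domain(image):
        # Source-domain image: classify with the originally trained model.
        return model(image), adapted_model
    if adapted_model is None:
        # First image of a new target domain: perform attention-based
        # domain adaptation to obtain an updated model.
        adapted_model = adapt_model(model, image)
    # Target-domain image: classify with the (previously or newly) updated model.
    return adapted_model(image), adapted_model
```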
Below, the computing system 100 may perform the attention-based domain adaptation on the input image and may determine the class of the input image using the updated domain-adapted image classification model. Because the computing system 100 performs the attention-based domain adaptation on the input image, parameter(s) of the image classification model 110 may be updated and the updated image classification model may be implemented and thereby infer the class of the input image.
Referring to
When the input image is unknown or of a different domain than the source domain, the encoder 111 may embed a transformed image x′ that is output by the image transformer 120, which was provided the input image x, to generate a spatial feature v′ from the transformed input image x′ (S110). The spatial feature v′ generated by the encoder 111 may be transferred to the classifier 112, the decoder 130, and the mask generator 140. The classifier 112 of the image classification model 110 may receive the spatial feature v′ from the encoder 111 and may generate an attention map set A based on the received spatial feature v′ (S120). For example, in one or more embodiments, the attention map set may include the respective attention maps ai that were generated by one or more or each hidden layer of the classifier 112, and may include the classification information generated by the output layer of the classifier 112, such as in non-limiting examples where the classifier is a neural network. When there are n layers included in the classifier 112, the attention map set A may be expressed as Equation 1 below.
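Consistent with the description above, the attention map set may be understood as the collection of per-layer attention maps, e.g., A = {a1, . . . , an}. The Python sketch below illustrates one hedged way such a set could be collected; the use of scaled dot-product attention and the per-layer query/key projection matrices are assumptions for illustration only and are not the disclosed classifier architecture.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: form the attention map set A = {a_1, ..., a_n} by
# computing one patch-to-patch attention map per classifier layer.
# For simplicity, each layer's map is computed from the same spatial feature.
def collect_attention_maps(query_projs, key_projs, spatial_feature):
    attention_maps = []
    for w_q, w_k in zip(query_projs, key_projs):
        q = spatial_feature @ w_q                                   # queries, shape (N, d)
        k = spatial_feature @ w_k                                   # keys, shape (N, d)
        a_i = F.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)       # (N, N) attention map
        attention_maps.append(a_i)
    return attention_maps                                            # the attention map set A
```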
The image transformer 120 may transform the input image x using a predetermined method. For example, the image transformer 120 may perform image augmentation on the input image x or may perform a masking operation on the input image x. Alternatively, the image transformer 120 may transfer the input image x as it is to the image classification model 110 without transforming it (i.e., as an identity function). Referring to
The decoder 130 may decode the spatial feature v′ (or the spatial feature v when the image transformer 120 transfers the input image x as it is) received from the encoder 111 to reconstruct the input image x (S130). In other words, the decoder 130 may reconstruct the input image x from the spatial feature v′ to output the reconstructed image x̂.
The mask generator 140 may generate a mask m providing an attention using the attention map set A received from the classifier 112 (S140). The mask m providing the attention may be generated based on the input image. The mask m may be used to select a patch that is robust to a shift (or a change) of the domain.
Referring to
As respective non-limiting examples, herein, each of the attention map merger 141 and the sampler 142 may be respective, or any combination thereof, operations performed by one or more processors represented by the mask generator 140. In addition, as respective non-limiting examples, each of, any respective combination of, or all of the attention map merger 141 and the sampler 142 may be representative of one or more processors, one or more non-transitory recording media or other memories storing respective instructions to cause one or more of the represented processors when executed by the one or more of the represented processors, and/or AI models (e.g., machine learning models, such as neural networks, as non-limiting examples) themselves or as respectively saved in/on such one or more non-transitory recording media or other memories. Further, a reference herein to one or more processors being configured to perform all or any combination of a merging and/or use of the attention map merger 141 and a sampling and/or use of the sampler 142 is a reference to respective such operations or uses being performed by any one or any combination of the one or more processors represented by the mask generator 140, or any respective one or more processors that are correspondingly represented by the attention map merger 141 and the sampler 142.
In one or more embodiments, the attention map merger 141 may merge the plurality of attention maps of the attention map set A into a merged attention map a*, for example, by performing a rollout operation or a layer average operation on the plurality of attention maps.
The sampler 142 may generate the mask m by performing sampling on the merged attention map a* (S142). The sampler 142 may perform sampling based on a score of each patch of the merged attention map a*. For example, the sampler 142 may generate the mask by performing Bernoulli sampling on each patch of the merged attention map or performing thresholding on each patch of the merged attention map according to a predetermined value.
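A hedged PyTorch-style sketch of the merging and sampling operations is shown below. It assumes each attention map is an (N × N) patch-to-patch matrix, that the per-patch score is taken from the first (reference) row of the merged map, and that scores are min-max rescaled before Bernoulli sampling; these shape and normalization choices are assumptions, not the disclosed implementation.

```python
import torch

def merge_attention_maps(attention_maps, method="rollout"):
    """Merge per-layer attention maps a_1..a_n into a single merged map a*."""
    if method == "layer_average":
        return torch.stack(attention_maps).mean(dim=0)
    # Attention rollout: multiply identity-augmented, row-normalized maps across layers.
    merged = torch.eye(attention_maps[0].shape[-1])
    for attn in attention_maps:
        attn = attn + torch.eye(attn.shape[-1])            # account for residual connections
        attn = attn / attn.sum(dim=-1, keepdim=True)       # re-normalize rows
        merged = attn @ merged
    return merged

def sample_mask(merged_map, mode="bernoulli", threshold=0.5):
    """Sample a binary per-patch mask m from the merged attention map a*."""
    scores = merged_map[0]                                  # assumed per-patch scores
    scores = (scores - scores.min()) / (scores.max() - scores.min() + 1e-8)
    if mode == "bernoulli":
        return torch.bernoulli(scores)                      # Bernoulli sampling per patch
    return (scores > threshold).float()                     # threshold-based sampling
```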
The domain adapter 150 may calculate a loss using the input image x, the reconstructed image x̂, and the mask m, for example, according to a loss function such as Equation 2 below.
In Equation 2, N represents the total number of patches and i represents an index of each patch. In Equation 2, the loss function may be calculated by multiplying an L2 norm value between an i-th patch of the reconstructed image x̂ and an i-th patch of the input image x by a value of an i-th patch (mi) of the mask m. As a non-limiting example, mi may be 0 or 1 in one or more embodiments. According to Equation 2, after the difference between the input image and the reconstructed image is masked by the mask, the masked differences may all be summed so that the summed value is output as the result of the loss function. In other words, the domain adapter 150 may perform the domain adaptation on a region selected by the mask in the input image.
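Based on the description of Equation 2, the loss is roughly L = Σ_i m_i · ||x̂_i − x_i||_2 over the N patches. A minimal PyTorch sketch of this masked reconstruction loss follows; the patchification helper, the patch size, and the use of a non-squared L2 norm are assumptions for illustration.

```python
import torch

def masked_reconstruction_loss(x, x_hat, mask, patch_size=16):
    """Sketch of Equation 2: sum over patches of m_i * ||x_hat_i - x_i||_2.

    x, x_hat: image tensors of shape (C, H, W); mask: per-patch values m_i (e.g., 0 or 1).
    The patch size and the non-squared L2 norm are assumptions.
    """
    def to_patches(img):
        # Split into non-overlapping patches, flattened to shape (N, C*patch*patch).
        c, h, w = img.shape
        patches = img.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
        return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch_size * patch_size)

    diff = to_patches(x_hat) - to_patches(x)           # per-patch difference
    per_patch_l2 = torch.linalg.norm(diff, dim=-1)     # ||x_hat_i - x_i||_2
    return (mask * per_patch_l2).sum()                 # mask and sum over all patches
```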
Thereafter, the computing system 100 may update the image classification model 110 based on the result of the loss function determined based on the input image, the reconstructed image, and the mask (S160).
Because the image classification model 110 is updated based on the result of the loss function determined based on the input image, the reconstructed image, and the mask, the attention-based domain adaptation may be completed, and the updated image classification model may then classify the input image used for the domain adaptation. That is, the attention may correspond to focusing on the differences between the input image and the reconstructed image, and may be achieved through the mask within the AI learning scheme.
In one or more embodiments, the image classification model may be updated by the computing system 100 updating parameters (e.g., weights or weighted connections) of the encoder 111 of the image classification model 110 through the attention-based domain adaptation, without updating parameters (e.g., weights or weighted connections) of the classifier 112. In one or more embodiments, the image classification model may be updated by the computing system 100 updating the parameters of the classifier 112 by performing few shot adaptation using a different image of the same target domain as that of the input image.
The computing system 100 according to one or more embodiments may update the encoder 111 of the image classification model 110 by performing the attention-based domain adaptation (a first domain adaptation) on the input image and may update the classifier 112 of the image classification model 110 by performing the few shot adaptation (a second domain adaptation) using the different image of the same domain as that of the input image. Thereafter, the computing system 100 may classify the class of the input image using the updated image classification model.
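A hedged sketch of how these two update stages might be arranged in PyTorch is shown below: the attention-based adaptation updates only encoder parameters, and a separate few-shot step updates only classifier parameters. The optimizer choice, learning rate, loss helpers, and single-gradient-step structure are assumptions for illustration, not the disclosed procedure.

```python
import torch
import torch.nn.functional as F

def adapt_encoder(encoder, decoder, x, mask, masked_loss, lr=1e-4):
    """First adaptation: update the encoder with the attention-masked loss."""
    optimizer = torch.optim.SGD(encoder.parameters(), lr=lr)   # classifier is untouched
    v = encoder(x)                        # spatial feature of the target-domain image
    x_hat = decoder(v)                    # reconstructed image
    loss = masked_loss(x, x_hat, mask)    # e.g., the Equation 2 sketch above
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def adapt_classifier_few_shot(encoder, classifier, labeled_x, labels, lr=1e-4):
    """Second adaptation: few-shot update of the classifier using labeled image(s)
    of the same target domain; encoder parameters are frozen here."""
    optimizer = torch.optim.SGD(classifier.parameters(), lr=lr)
    with torch.no_grad():
        v = encoder(labeled_x)            # features for the labeled target-domain image(s)
    logits = classifier(v)
    loss = F.cross_entropy(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```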
In one or more embodiments, in the attention-based domain adaptation for the input image, the computing system 100 may generate the mask using the attention map set generated by the encoder 111 and the classifier 112 and may update parameters of the encoder 111 based on the loss function calculated by using the generated mask.
In addition, in the few shot adaptation for the input image, the computing system 100 may generate a feature of a labeled image of the same domain as that of the input image, may perform classification based on the generated feature, and may update the classifier 112 based on the loss function calculated by using the classification result and a label of the input image.
As respective non-limiting examples, herein, each of the encoder 211 and the classifier 212 may be respective, or any combination thereof, operations performed by one or more processors represented by the updated image classification model 210. As non-limiting examples, the updated image classification model 210 may be further included in and implemented by the computing system 100 of
Referring to
In one or more embodiments, an encoder 211 may or may not have been updated (e.g., updated from the encoder 111 of
For example, the classifier 212 may be the same as the classifier 112 of
In one or more embodiments, the computing system 100 of
As described above, the computing system 100 may perform the attention-based domain adaptation on the input image using the mask generated from the attention map of the input image, so that the class of the input image is successfully classified even if a domain of the input image is different from the source domain or the domain of the input image is unknown.
Referring to
As respective non-limiting examples, herein, each of the encoder 311, the classifier 312, the image classification model 310, the decoder 330, the mask generator 340, the domain adapter 350, and the image transformer 320 may be respective, or any combination thereof, operations performed by one or more processors represented by the computing system 300, or by the one or more processors represented by the computing system 100 of
The computing system 300 may perform attention-based domain adaptation on an input image to update an image classification model (e.g., the updated image classification model 210 of
Referring to
In one or more embodiments, the image transformer 320 may transform the input image using a predetermined method. For example, the image transformer 320 may perform image augmentation on the input image, or may perform a masking operation on the input image. Alternatively, the image transformer 320 may transfer the input image as it is to the image classification model 310 without transforming the input image.
In an embodiment, the decoder 330 may reconstruct the input image x by decoding the spatial feature v′ received from the encoder 311 (S220). In other words, the decoder 330 may reconstruct the input image x from the spatial feature v′ to output the reconstructed image x̂.
In an embodiment, the mask generator 340 may generate a mask (or an attention mask) m that may provide an attention based on the spatial feature v′ received from the encoder 311 (S230). The mask m may be used to select a patch that is robust to a shift (or a change) of the domain.
Referring to
As respective non-limiting examples, herein, each of the attention vector generator 341 and the sampler 342 may be respective, or any combination thereof, operations performed by one or more processors represented by the mask generator 340. In addition, as respective non-limiting examples, each of, any respective combination of, or all of the attention vector generator 341 and the sampler 342 may be representative of one or more processors, one or more non-transitory recording media or other memories storing respective instructions to cause one or more of the represented processors when executed by the one or more of the represented processors, and/or AI models (e.g., machine learning models, such as neural networks, as non-limiting examples) themselves or as respectively saved in/on such one or more non-transitory recording media or other memories. Further, a reference herein to one or more processors being configured to perform all or any combination of an attention vector generation and/or use of the attention vector generator 341 and a sampling and/or use of the sampler 342 is a reference to respective such operations or uses being performed by any one or any combination of the one or more processors represented by the mask generator 340, or any respective one or more processors that are correspondingly represented by the attention vector generator 341 and the sampler 342.
The attention vector generator 341 may generate an attention vector a based on the spatial feature v′ received from the encoder 311. For example, the attention vector a may be expressed as Equation 3 below.
Referring to Equation 3, the attention vector a may include a plurality of elements (sk) (wherein k is an integer between 1 and n), and each of the plurality of elements included in the attention vector may correspond to a respective patch belonging to the spatial feature. Each element of the attention vector may be determined based on a similarity between a reference feature of the spatial feature v′ and a feature corresponding to the respective patch belonging to the spatial feature v′.
In an embodiment, the encoder 311 or the mask generator 340 may determine the reference feature of the spatial feature v′ using various methods. For example, the reference feature of the spatial feature v′ may be determined as an average value of features of each patch of the spatial feature v′, and a method for determining the reference feature of the spatial feature v′ is not limited thereto.
In an embodiment, the similarity (sk) between a k-th feature (fk) corresponding to a k-th patch of the spatial feature v′ and the reference feature (fcls) may be determined as in Equation 4 below.
In Equation 4, sk may represent cosine similarity between the k-th feature (fk) and the reference feature (fcls). However, a method for calculating the similarity between the feature corresponding to the patch of the spatial feature and the reference feature is not limited thereto.
In an embodiment, the sampler 342 may generate the mask m by performing sampling on the attention vector a (S232). The sampler 342 may perform sampling based on a size of each element of the attention vector a. For example, the sampler 342 may generate the mask by performing Bernoulli sampling on each element of the attention vector or performing thresholding on each element of the attention vector according to a predetermined value. In an embodiment, a binary mask may be generated by assigning 1 to a sampled element of the attention vector and assigning 0 to a non-sampled element.
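Consistent with the description of Equations 3 and 4 above (an attention vector a = [s1, . . . , sn] of per-patch similarities sk = cos(fk, fcls)), a hedged PyTorch sketch follows. The use of the mean patch feature as the reference feature and the rescaling of cosine scores to [0, 1] before Bernoulli sampling are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def attention_vector(spatial_feature):
    """Sketch of Equations 3 and 4: per-patch cosine similarity to a reference feature.

    spatial_feature: tensor of shape (N, D), one D-dimensional feature per patch.
    The reference feature is assumed here to be the mean of the patch features.
    """
    f_cls = spatial_feature.mean(dim=0, keepdim=True)             # reference feature
    return F.cosine_similarity(spatial_feature, f_cls, dim=-1)    # s_k for k = 1..N

def sample_mask_from_vector(attn_vector, mode="bernoulli", threshold=0.0):
    """Sample a binary mask m from the attention vector a."""
    if mode == "bernoulli":
        probs = (attn_vector + 1.0) / 2.0         # map cosine scores from [-1, 1] to [0, 1]
        return torch.bernoulli(probs)             # 1 for sampled elements, 0 otherwise
    return (attn_vector > threshold).float()      # threshold-based sampling
```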
The domain adapter 350 may calculate a loss using the input image x, the reconstructed image x̂, and the mask m, for example, by masking, using the mask m, a difference between the input image and the reconstructed image.
Thereafter, the computing system 300 may update the image classification model 310 based on the loss function determined based on the input image, the reconstructed image, and the mask (S250). Based on the loss function calculated according to the input image, the reconstructed image, and the mask, the encoder 311 of the image classification model 310 may be updated to complete the domain adaptation based on an attention, and then the updated image classification model 310 may classify the input image used for the domain adaptation.
In an embodiment, when the attention-based domain adaptation is performed on the input image, the classifier 312 of the image classification model 310 may not be updated. Even if the mask is generated based on the spatial feature generated by the encoder 311 and the encoder 311 of the image classification model 310 is updated based on a result of the domain adaptation performed on a portion (e.g., the patch of the image) selected by the mask, the classifier 312 of the image classification model 310 may not be updated.
In this case, in order to improve a classification performance for the input image even if the input image does not belong to a source domain of the image classification model 310, the computing system 300 may update the classifier 312 of the image classification model 310 by performing few shot adaptation using at least one image of the same domain as that of the input image. The image classification model 310 that includes the classifier 312 updated through the few shot adaptation and the encoder 311 updated through the attention-based domain adaptation may successfully classify the input image that does not belong to the source domain.
As described above, the computing system 300 may perform the attention-based domain adaptation on the input image using the mask generated from the attention vector of the input image, so that the class of the input image is successfully classified even if the domain of the input image is changed.
In one or more embodiments, in the in-fabrication environment of the semiconductor manufacturing process, a domain adaptation model 400 may update the image classification model 500 by performing test-time adaptation on an image x transferred from an inspection equipment. The test-time adaptation may be performed in real time when an input image is of a manufactured product that has been generated in real time, and thus an image classification model may be adapted at test time, though the timing of the adaptation herein is not limited thereto. The image classification model 500 updated through the test-time adaptation may infer a class of the image x. The domain adaptation model 400 may perform the attention-based domain adaptation, which is a test-time adaptation mechanism, for the input image.
The domain adaptation model 400 may perform the test-time adaptation when the image is transferred from the inspection equipment of a semiconductor manufacturing process, and may allow the image classification model 500 to be adapted to the domain of the transferred image even if the domain of the image is not always the same or the domain of the image is unknown.
When the image classification model 500 is trained based on an image of a source domain, the image input from the inspection equipment may be the image of the source domain or an image of a target domain that is different from the source domain. The image classification model 500 may accurately classify a class of the input image through the test-time adaptation during the semiconductor manufacturing process even if images of any domain are input.
A change of a domain of the input image (e.g., a change from the source domain to the target domain) may occur due to a change of a generation of a product, an addition of a manufacturing step of the product, or the like. Alternatively, the change of the domain may occur due to a change of a design of the semiconductor, addition/change of the inspection equipment, addition/change of an inspection method, or the like. If the change of the domain of the input image occurs during the manufacturing process of the semiconductor, the domain adaptation model 400 according to one or more embodiments may update the image classification model 500 through the attention-based domain adaptation, and the updated image classification model 500 may accurately classify the class of the input image.
Referring to
A mask generator of the domain adaptation model 400 may generate an attention mask based on the input image. The mask generated from an attention for the input image by the mask generator may allow the domain adaptation to be selectively performed on the region robust to the change of the domain.
A classifier of the image classification model 500 trained based on the image of the source domain may generate an attention map set from the input image, and the attention map set generated by the classifier may be used as the attention for the input image. The attention map may represent the region robust to the change of the domain, and may represent a region that may improve a classification performance for the image. The mask generator may generate a mask that may provide an attention by merging a plurality of attention maps within the set generated by the classifier and performing sampling on the merged attention map.
Alternatively, an encoder of the image classification model 500 trained based on the image of the source domain may generate a spatial feature from the input image, and the spatial feature generated by the encoder may be used to generate the attention for the input image. The mask generator may generate an attention vector from the spatial features generated by the encoder, and may generate the attention mask by performing sampling on an element of the attention vector.
In one or more embodiments, a domain adapter of the domain adaptation model 400 may perform the domain adaptation on a region selected by the mask in the input image.
Referring to
As respective non-limiting examples, herein, each of the image classification model 500, the domain adaptation model 400, the decoder, the mask generator, the domain adapter, and the image transformer of
Thereafter, the image classification model 500 updated through the attention-based domain adaptation may determine the class y of the input image x (S320). That is, in a step of inferring the class of the input image, configurations (e.g., the image transformer, the decoder, the mask generator, and the domain adaptor) of the domain adaptation model 400 used for the attention-based domain adaptation may not be used.
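Putting the foregoing together, a hedged end-to-end sketch of the test-time flow is shown below: the domain adaptation components are used once to update the model, and only the updated encoder and classifier are used at inference. The component interfaces and the single-gradient-step structure are assumptions carried over from the sketches above.

```python
import torch

def test_time_adapt_and_classify(x, encoder, classifier, decoder,
                                 make_mask, masked_loss, lr=1e-4):
    """Sketch: adapt the classification model to the image x, then infer its class."""
    # 1) Attention-based test-time adaptation (updates the encoder only).
    optimizer = torch.optim.SGD(encoder.parameters(), lr=lr)
    v = encoder(x)                        # spatial feature
    mask = make_mask(v)                   # attention mask (map-based or vector-based)
    x_hat = decoder(v)                    # reconstructed image
    loss = masked_loss(x, x_hat, mask)    # masked difference between x and x_hat
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # 2) Inference with the updated model; decoder and mask generator are not used here.
    with torch.no_grad():
        return classifier(encoder(x)).argmax(dim=-1)
```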
As described above, in the fab environment of the manufacturing process, the domain adaptation model 400 may update an artificial intelligence (AI) model through the attention-based domain adaptation without additional training for a classification model, and the updated AI model may classify the class of the input image.
Referring to
The input layer 1210 may include a set of input nodes x1 to xi, and the number of input nodes x1 to xi may correspond to the number of independent input variables. For training of the neural network 1200, a training set may be input to the input layer 1210 and a training result (e.g., a class) may be output from the output layer 1230, a loss may be calculated based on the training result, and the neural network 1200 may be trained by adjusting some or all of the weights of any one, combination, or all layers of the neural network 1200 (e.g. of the hidden layer 1220 and the output layer 1230), until the training result has a predetermined accuracy and/or predetermined inaccuracy. Once the neural network is trained, if a test dataset is input to the input layer 1210 of the trained neural network 1200, an inference result (e.g., a class) may be output from the output layer 1230 of the trained artificial neural network 1200. In one or more embodiments, the input layer 1210 may have a structure suitable for processing large-scale inputs.
The hidden layer 1220 may be disposed between the input layer 1210 and the output layer 1230, and may include at least one hidden layer or hidden layers 1220-1 to 1220-n, as noted above. The output layer 1230 may include at least one output node or output nodes y1 to yj. Activation functions may be used in the hidden layer(s) 1220 and the output layer 1230.
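As a non-limiting, hedged illustration of the structure just described (an input layer, one or more hidden layers with activation functions, and an output layer of class nodes), a minimal PyTorch module is sketched below; the layer widths and the choice of ReLU are assumptions, not the disclosed network.

```python
import torch.nn as nn

# Minimal sketch: input nodes x1..xi, hidden layers 1220-1..1220-n with
# activation functions, and output nodes y1..yj.
class SimpleClassifier(nn.Module):
    def __init__(self, num_inputs=64, hidden_width=128, num_hidden=2, num_classes=10):
        super().__init__()
        layers, width = [], num_inputs
        for _ in range(num_hidden):
            layers += [nn.Linear(width, hidden_width), nn.ReLU()]   # hidden layer + activation
            width = hidden_width
        layers += [nn.Linear(width, num_classes)]                   # output layer y1..yj
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)  # class scores used for a training loss or inference
```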
Any one or any combination of the computing systems and/or operations respectively described above may be implemented by an electronic device and/or a non-transitory computer-readable medium. Referring to
The processor 1310 may be configured to implement any one, combination, or all functions, processes, and/or methods described herein.
The memory 1320 may represent one or more memories which may be disposed inside and/or outside the processor 1310, and the memory 1320 may be connected to the processor 1310 through a bus, for example. The memory 1320 may be a volatile or nonvolatile storage medium of various forms, and for example, the memory 1320 may include a read-only memory (ROM) or a random access memory (RAM).
The memory 1320 may store instructions (e.g., programs, source code, executable instructions, machine code, etc.) that when executed by the processor 1310 configure (e.g., cause) the processor 1310 to perform any one, combination, or all functions, processes, and/or methods described above with respect to
The computing systems, apparatuses, the electronic devices, the processors, the memories, the encoders, classifiers, decoders, mask generators, domain adaptors, image transformers, domain determiners, attention vector generators, samplers, inspection equipment, cameras, and other apparatuses, devices, and components described herein with respect to
The methods illustrated in, and discussed with respect to,
Instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above may be written as computer programs, code segments, instructions or any combination thereof, for individually or collectively instructing or configuring the one or more processors or computers to operate as a machine or special-purpose computer to perform the operations that are performed by the hardware components and the methods as described above. In one example, the instructions or software include machine code that is directly executed by the one or more processors or computers, such as machine code produced by a compiler. In another example, the instructions or software includes higher-level code that is executed by the one or more processors or computer using an interpreter. The instructions or software may be written using any programming language based on the block diagrams and the flow charts illustrated in the drawings and the corresponding descriptions herein, which disclose algorithms for performing the operations that are performed by the hardware components and the methods as described above.
The instructions or software to control computing hardware, for example, one or more processors or computers, to implement the hardware components and perform the methods as described above, and any associated data, data files, and data structures, may be recorded, stored, or fixed in or on one or more non-transitory computer-readable storage media, and thus, not a signal per se. As described above, or in addition to the descriptions above, examples of a non-transitory computer-readable storage medium include one or more of any of read-only memory (ROM), random-access programmable read only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, non-volatile memory, CD-ROMs, CD-Rs, CD+Rs, CD-RWs, CD+RWs, DVD-ROMs, DVD-Rs, DVD+Rs, DVD-RWs, DVD+RWs, DVD-RAMs, BD-ROMs, BD-Rs, BD-R LTHs, BD-REs, blue-ray or optical disk storage, hard disk drive (HDD), solid state drive (SSD), flash memory, a card type memory such as multimedia card micro or a card (for example, secure digital (SD) or extreme digital (XD)), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and/or any other device that is configured to store the instructions or software and any associated data, data files, and data structures in a non-transitory manner and provide the instructions or software and any associated data, data files, and data structures to one or more processors or computers so that the one or more processors or computers can execute the instructions. In one example, the instructions or software and any associated data, data files, and data structures are distributed over network-coupled computer systems so that the instructions and software and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by the one or more processors or computers.
While this disclosure includes specific examples, it will be apparent after an understanding of the disclosure of this application that various changes in form and details may be made in these examples without departing from the spirit and scope of the claims and their equivalents. The examples described herein are to be considered in a descriptive sense only, and not for purposes of limitation. Descriptions of features or aspects in each example are to be considered as being applicable to similar features or aspects in other examples. Suitable results may be achieved if the described techniques are performed in a different order, and/or if components in a described system, architecture, device, or circuit are combined in a different manner, and/or replaced or supplemented by other components or their equivalents.
Therefore, in addition to the above and all drawing disclosures, the scope of the disclosure is also inclusive of the claims and their equivalents, i.e., all variations within the scope of the claims and their equivalents are to be construed as being included in the disclosure.
| Number | Date | Country | Kind |
|---|---|---|---|
| 10-2024-0001795 | Jan 2024 | KR | national |