One or more aspects of embodiments according to the present disclosure relate to manufacturing processes, and more particularly to a system and method for defect detection, e.g., in a manufacturing process.
In manufacturing processes, defect detection by machine learning-based systems may be challenging. For example, in circumstances in which defects are rare, the paucity of samples with defects may be an obstacle to the assembling of a labeled training set for performing supervised training. Moreover, to the extent defective samples, or images of defective articles, are available, it may be more advantageous to reserve them for verification than to use them for training.
It is with respect to this general technical environment that aspects of the present disclosure are related.
According to an embodiment of the present disclosure, there is provided a method, including: training a first neural network with a first set of images, wherein: the first neural network includes: a first student neural network, and a first teacher neural network; the training of the first neural network with the first set of images includes: introducing defects into a first subset of the first set of images, and training the first student neural network with the first set of images; the training of the first student neural network includes training the first student neural network with a first cost function that: for an image of the first set and not of the first subset, rewards similarity between a feature map of the first student neural network and a feature map of the first teacher neural network, and for an image of the first subset, rewards dissimilarity between a feature map of the first student neural network and a feature map of the first teacher neural network.
In some embodiments, the method further includes training the first teacher neural network with a second set of images and a second cost function, wherein: the second set of images includes images each labeled with a classification label; and the second cost function rewards, for each image, similarity between a classification generated by the first teacher neural network and the classification label of the image.
In some embodiments, the first neural network further includes: a second student neural network, and a second teacher neural network.
In some embodiments, the method further includes training the second teacher neural network with a third set of images and a third cost function, wherein: the third set of images includes masked generic images; and the third cost function rewards, for each masked generic image, similarity between an output image generated by the second teacher neural network and an original generic image corresponding to the masked generic image.
In some embodiments, the method further includes training the second teacher neural network with a third set of images and a third cost function, wherein: the third set of images includes reduced-resolution generic images; and the third cost function rewards, for each reduced-resolution generic image, similarity between an output image generated by the second teacher neural network and a full-resolution image corresponding to the reduced-resolution generic image.
In some embodiments, the first neural network further includes: a third student neural network, and a third teacher neural network.
In some embodiments, the method further includes training the third teacher neural network with a fourth set of images and a fourth cost function, wherein: the fourth set of images includes masked generic images; and the fourth cost function rewards, for each masked generic image, similarity between an output image generated by the third teacher neural network and an original generic image corresponding to the masked generic image.
In some embodiments, the method further includes training the second student neural network with the first set of images and the first cost function.
In some embodiments, the method further includes training the third student neural network with the first set of images and the first cost function.
In some embodiments, a first image of the first subset includes a first portion, processed by a reconstruction neural network.
In some embodiments, the method further includes generating the first portion, the generating of the first portion including: masking out a portion of a normal image to form a masked image; and feeding the masked image to the reconstruction neural network, to form the first portion as an output of the reconstruction neural network.
In some embodiments, the first image further includes a second portion, processed by a super-resolution neural network.
In some embodiments, the method further includes generating the second portion, the generating of the second portion including: adding noise to a portion of a normal image to form a noisy image; and feeding the noisy image to the super-resolution neural network, to form the second portion as an output of the super-resolution neural network.
In some embodiments: the first image further includes: a third portion, processed by a super-resolution neural network, and a fourth portion, processed by a reconstruction neural network; the third portion is diagonally opposed to the second portion; and the fourth portion is diagonally opposed to the first portion.
In some embodiments, the method further includes: classifying, by the first neural network, a product image of an article in a manufacturing process as including a defect; and removing the article from the manufacturing process.
In some embodiments, the classifying of the product image as including a defect includes: feeding the product image to the first student neural network and to the first teacher neural network; and determining that a measure of the difference between a latent feature vector of the first student neural network and a corresponding latent feature vector of the first teacher neural network exceeds a threshold.
In some embodiments, the measure of the difference is an L2 norm of the difference.
In some embodiments, the product image is an image of a display panel in a manufacturing flow.
According to an embodiment of the present disclosure, there is provided a system including: a processing circuit configured to train a first neural network with a first set of images, wherein: the first neural network includes: a first student neural network, and a first teacher neural network; the training of the first neural network with the first set of images includes: introducing defects into a first subset of the first set of images, and training the first student neural network with the first set of images; the training of the first student neural network includes training the first student neural network with a first cost function that: for an image of the first set and not of the first subset, rewards similarity between a feature map of the first student neural network and a feature map of the first teacher neural network, and for an image of the first subset, rewards dissimilarity between a feature map of the first student neural network and a feature map of the first teacher neural network.
According to an embodiment of the present disclosure, there is provided a system including: means for processing configured to train a first neural network with a first set of images, wherein: the first neural network includes: a first student neural network, and a first teacher neural network; the training of the first neural network with the first set of images includes: introducing defects into a first subset of the first set of images, and training the first student neural network with the first set of images; the training of the first student neural network includes training the first student neural network with a first cost function that: for an image of the first set and not of the first subset, rewards similarity between a feature map of the first student neural network and a feature map of the first teacher neural network, and for an image of the first subset, rewards dissimilarity between a feature map of the first student neural network and a feature map of the first teacher neural network.
These and other features and advantages of the present disclosure will be appreciated and understood with reference to the specification, claims, and appended drawings, wherein:
The detailed description set forth below in connection with the appended drawings is intended as a description of exemplary embodiments of a system and method for defect detection provided in accordance with the present disclosure and is not intended to represent the only forms in which the present disclosure may be constructed or utilized. The description sets forth the features of the present disclosure in connection with the illustrated embodiments. It is to be understood, however, that the same or equivalent functions and structures may be accomplished by different embodiments that are also intended to be encompassed within the scope of the disclosure. As denoted elsewhere herein, like element numbers are intended to indicate like elements or features.
In manufacturing processes, defect detection by machine learning-based systems may be challenging; for example, in circumstances in which defects are rare, the paucity of defective samples may be an obstacle to the assembling of a labeled training set for performing supervised training. In some embodiments, training of a machine learning system is performed without the use of samples based on defective products, as discussed in further detail herein.
In some embodiments, a method for defect detection in an image of a product (or "product image", e.g., a photograph of a board or a display panel) may include feeding the product image to one or more student teacher neural networks (discussed in further detail below), and classifying the image as (i) including a defect or (ii) not including any defects. When the product image is an image of an article in a manufacturing process, the determination that the product image includes a defect may result in the article's being removed from the manufacturing process (e.g., to be scrapped or reworked). This removal from the manufacturing process may be performed in an autonomous manner (e.g., without the participation of a human operator), e.g., by a processing circuit (discussed in further detail below). Referring to FIG. 1, a student teacher neural network may include a student neural network 105 and a teacher neural network 110.
The training of the student teacher neural network of FIG. 1 may be performed in two phases: in a first phase, the teacher neural network 110 may be trained, and in a second phase, the student neural network 105 may be trained. The teacher neural network 110 may be trained, for example, with a set of labeled images, using a cost function that rewards, for each image, similarity between the classification generated by the teacher neural network 110 and the classification label of the image.
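By way of illustration only, this first training phase might be sketched in PyTorch as follows; the network architecture, the optimizer settings, and the `labeled_loader` are hypothetical assumptions of the sketch, not features of the disclosure:

```python
import torch
import torch.nn as nn

# Hypothetical minimal teacher: a small CNN ending in a classification head.
teacher = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),                    # 10 classes, chosen for illustration
)
optimizer = torch.optim.Adam(teacher.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()         # rewards agreement with the classification label

for images, labels in labeled_loader:     # hypothetical loader over the labeled image set
    optimizer.zero_grad()
    loss = criterion(teacher(images), labels)  # the classification cost function
    loss.backward()
    optimizer.step()
```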
Once the teacher neural network 110 has been trained, the student neural network 105 may be trained by feeding a set of training images to the student neural network 105 and to the teacher neural network 110, each of the training images being either a "normal" image (an image of a product believed to be free of defects), or an image including simulated defects (discussed in further detail below). For example, each training image may be an image of a respective board (as explained in the example discussed below); some of the boards in the images may include defects, and some may be defect-free. The cost function used to train the student neural network 105 in this second phase of the training of the student teacher neural network may be a cost function that (i) when the training image is a normal image, rewards similarity between latent variables of the student neural network and corresponding latent variables of the teacher neural network, and (ii) when the training image is an image including simulated defects, rewards dissimilarity between latent variables of the student neural network and the corresponding latent variables of the teacher neural network. The student neural network 105 may be trained using a suitable algorithm (e.g., backpropagation) to adjust the weights so as to minimize the total (or average) of the cost function when the entire set of images (normal images and images including simulated defects) is processed. This similarity or dissimilarity may be measured, for each of the training images, using, for example, an L2 norm of the difference between (i) the latent feature vector of the student neural network 105 for the training image and (ii) the latent feature vector of the teacher neural network 110 for the training image. Each latent feature vector may be a vector of output values of internal, or "hidden", layers of the neural network. For example, for any pixel of the input image, an n-element latent feature vector may be formed from the output values, in each of n internal layers of the neural network, of the neuron corresponding to the pixel.
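One possible form of this cost function is sketched below in PyTorch. The margin formulation of the dissimilarity term is an assumption made for the sketch (the disclosure requires only that dissimilarity be rewarded), and all names are illustrative:

```python
import torch
import torch.nn.functional as F

def student_cost(student_feats, teacher_feats, is_defect, margin=1.0):
    """student_feats / teacher_feats: lists of (B, C, H, W) feature maps taken
    from corresponding internal layers; is_defect: (B,) bool tensor, True for
    images into which simulated defects were introduced."""
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        # Per-image squared L2 distance between latent feature maps; the
        # teacher is frozen in this phase, hence the detach().
        d = (s - t.detach()).pow(2).mean(dim=(1, 2, 3))
        sim = d                    # normal image: reward similarity (small distance)
        dis = F.relu(margin - d)   # defect image: reward dissimilarity (distance past the margin)
        total = total + torch.where(is_defect, dis, sim).mean()
    return total / len(student_feats)
```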
When used for inference, the student teacher neural network may be fed the product image, and each pixel of the image may be assigned, by the student teacher neural network, a likelihood value, the likelihood value being a measure of the likelihood that the pixel corresponds to the location of a defect in the product. The likelihood value may be calculated, for example, as a norm (e.g., the L2 norm) of the differences, per layer, between (i) the latent variable or variables at the output of each layer of the teacher neural network 110 and (ii) the latent variable or variables at the output of the corresponding layer of the student neural network 105. The circuit or software for performing such calculations, based on the latent variables and output variables, may be considered to be part of the student teacher neural network.
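A minimal sketch of this inference computation, assuming the per-layer feature maps have already been upsampled to a common spatial size (an assumption of the sketch), might read:

```python
import torch

def anomaly_map(student_feats, teacher_feats, threshold):
    """Per-pixel defect likelihood: the L2 norm, across layers and channels,
    of the teacher-student latent differences."""
    diffs = [(s - t).pow(2).sum(dim=1) for s, t in zip(student_feats, teacher_feats)]
    score = torch.stack(diffs).sum(dim=0).sqrt()     # (B, H, W) likelihood values
    has_defect = score.amax(dim=(1, 2)) > threshold  # image-level decision
    return score, has_defect
```

An article whose product image yields a `has_defect` value of true may then be removed from the manufacturing process, e.g., to be scrapped or reworked, as described above.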
Referring to FIG. 2, in some embodiments, the first neural network may include a plurality of student teacher neural networks, e.g., a classification student teacher neural network 205, a super-resolution student teacher neural network 210, and a reconstruction student teacher neural network 215.
The teacher neural network of the classification student teacher neural network 205 may be a neural network as illustrated in FIG. 3, trained, as described above, with a set of labeled images and a classification cost function.
The teacher neural network (and the student neural network) of the reconstruction student teacher neural network 215 may each have the structure 305 illustrated in the middle of FIG. 3. The teacher neural network of the reconstruction student teacher neural network 215 may be trained with masked generic images, using a cost function that rewards, for each masked generic image, similarity between the output image generated by the teacher neural network and the original generic image corresponding to the masked generic image.
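Such a masked-reconstruction objective might be sketched as follows; the `recon_teacher` network, the multiplicative mask, and the mean-squared-error form of the penalty are assumptions of the sketch:

```python
import torch.nn.functional as F

def reconstruction_loss(recon_teacher, generic_images, mask):
    """Mask out part of each generic image, reconstruct it, and reward
    similarity between the output image and the original generic image.
    mask: tensor of ones, with zeros over the masked-out region."""
    masked = generic_images * mask             # form the masked generic images
    output = recon_teacher(masked)             # reconstructed full images
    return F.mse_loss(output, generic_images)  # similarity to the originals
```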
The teacher neural network of the super-resolution student teacher neural network 210 may be a neural network as illustrated in FIG. 3, trained with reduced-resolution generic images, using a cost function that rewards, for each reduced-resolution generic image, similarity between the output image generated by the teacher neural network and the full-resolution image corresponding to the reduced-resolution generic image.
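A corresponding sketch for the super-resolution pretraining task (the bilinear downsampling, the scale factor, and the `sr_teacher` network are assumptions of the sketch) might be:

```python
import torch.nn.functional as F

def super_resolution_loss(sr_teacher, full_res_images, scale=4):
    """Downsample each generic image, restore it with the super-resolution
    teacher, and reward similarity to the full-resolution original."""
    low_res = F.interpolate(full_res_images, scale_factor=1.0 / scale, mode="bilinear")
    output = sr_teacher(low_res)                # assumed to output full resolution
    return F.mse_loss(output, full_res_images)  # similarity to the original image
```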
As mentioned above, during the training of each student neural network from a respective teacher neural network, a combination of normal images and images containing simulated defects may be employed as the training data set. Each normal image may be an image of the product (e.g., of the article being manufactured) that is thought to be defect-free, and each of the images containing simulated defects may be a modified version of a respective normal image, the normal image having been modified by the introduction of one or more defects into the image. As used herein, terms such as “introduction of defects” are used to describe any intentional degradation of the quality of an image, notwithstanding the fact that the defects introduced in such a process may not be discrete or countable.
The generation of images containing simulated defects may proceed as illustrated in FIG. 4. For example, a portion of a normal image may be masked out to form a masked image, and the masked image may be fed to a reconstruction neural network, the output of which may form a first portion of the image containing simulated defects. Similarly, noise may be added to another portion of the normal image to form a noisy image, and the noisy image may be fed to a super-resolution neural network, the output of which may form a second portion of the image containing simulated defects. Further portions, diagonally opposed to the first and second portions, may be processed in an analogous manner, as in the sketch following this paragraph.
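The sketch below illustrates one possible composition, following the diagonally opposed quadrant arrangement described above; the quadrant sizes, the noise level, and the assumption that `recon_net` and `sr_net` each return an output with the same spatial size as their input are hypothetical details of the sketch:

```python
import torch

def simulate_defects(normal, recon_net, sr_net, noise_std=0.1):
    """Compose an image containing simulated defects from a normal image."""
    out = normal.clone()
    _, _, H, W = normal.shape
    h, w = H // 2, W // 2
    # First portion (top left): mask out the region and let the
    # reconstruction network fill it in.
    masked = normal.clone()
    masked[:, :, :h, :w] = 0.0
    out[:, :, :h, :w] = recon_net(masked)[:, :, :h, :w]
    # Second portion (top right): add noise, then pass the noisy portion
    # through the super-resolution network.
    noisy = normal[:, :, :h, w:] + noise_std * torch.randn_like(normal[:, :, :h, w:])
    out[:, :, :h, w:] = sr_net(noisy)
    # Third portion (bottom left, diagonally opposed to the second):
    # super-resolution processing, as for the second portion.
    noisy = normal[:, :, h:, :w] + noise_std * torch.randn_like(normal[:, :, h:, :w])
    out[:, :, h:, :w] = sr_net(noisy)
    # Fourth portion (bottom right, diagonally opposed to the first):
    # reconstruction processing, as for the first portion.
    masked = normal.clone()
    masked[:, :, h:, w:] = 0.0
    out[:, :, h:, w:] = recon_net(masked)[:, :, h:, w:]
    return out
```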
As used herein, “a portion of” something means “at least some of” the thing, and as such may mean less than all of, or all of, the thing. As such, “a portion of” a thing includes the entire thing as a special case, i.e., the entire thing is an example of a portion of the thing. As used herein, the term “or” should be interpreted as “and/or”, such that, for example, “A or B” means any one of “A” or “B” or “A and B”. As used herein, the term “rectangle” includes a square as a special case, i.e., a square is an example of a rectangle, and the term “rectangular” encompasses the adjective “square”. As used herein, determining that a measure of difference between two quantities exceeds (or is less than) a threshold encompasses, as an equivalent operation, determining that a measure of similarity between the two quantities is less than (or exceeds) a threshold.
Each of the neural networks described herein may be implemented in a respective processing circuit or in a respective means for processing (or more than one neural network, or all of the neural networks described herein may be implemented together in a single processing circuit or in a single means for processing, or a single neural network may be implemented across a plurality of processing circuits or means for processing). Each of the terms “processing circuit” and “means for processing” is used herein to mean any combination of hardware, firmware, and software, employed to process data or digital signals. Processing circuit hardware may include, for example, application specific integrated circuits (ASICs), general purpose or special purpose central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), and programmable logic devices such as field programmable gate arrays (FPGAs). In a processing circuit, as used herein, each function is performed either by hardware configured, i.e., hard-wired, to perform that function, or by more general-purpose hardware, such as a CPU, configured to execute instructions stored in a non-transitory storage medium. A processing circuit may be fabricated on a single printed circuit board (PCB) or distributed over several interconnected PCBs. A processing circuit may contain other processing circuits; for example, a processing circuit may include two processing circuits, an FPGA and a CPU, interconnected on a PCB.
As used herein, the term “array” refers to an ordered set of numbers regardless of how stored (e.g., whether stored in consecutive memory locations, or in a linked list).
As used herein, when a method (e.g., an adjustment) or a first quantity (e.g., a first variable) is referred to as being “based on” a second quantity (e.g., a second variable) it means that the second quantity is an input to the method or influences the first quantity, e.g., the second quantity may be an input (e.g., the only input, or one of several inputs) to a function that calculates the first quantity, or the first quantity may be equal to the second quantity, or the first quantity may be the same as (e.g., stored at the same location or locations in memory as) the second quantity.
It will be understood that, although the terms “first”, “second”, “third”, etc., may be used herein to describe various elements, components, regions, layers and/or sections, these elements, components, regions, layers and/or sections should not be limited by these terms. These terms are only used to distinguish one element, component, region, layer or section from another element, component, region, layer or section. Thus, a first element, component, region, layer or section discussed herein could be termed a second element, component, region, layer or section, without departing from the spirit and scope of the inventive concept.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the inventive concept. As used herein, the terms “substantially,” “about,” and similar terms are used as terms of approximation and not as terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those of ordinary skill in the art.
As used herein, the singular forms “a” and “an” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items. Expressions such as “at least one of,” when preceding a list of elements, modify the entire list of elements and do not modify the individual elements of the list. Further, the use of “may” when describing embodiments of the inventive concept refers to “one or more embodiments of the present disclosure”. Also, the term “exemplary” is intended to refer to an example or illustration. As used herein, the terms “use,” “using,” and “used” may be considered synonymous with the terms “utilize,” “utilizing,” and “utilized,” respectively.
It will be understood that when an element or layer is referred to as being “on”, “connected to”, “coupled to”, or “adjacent to” another element or layer, it may be directly on, connected to, coupled to, or adjacent to the other element or layer, or one or more intervening elements or layers may be present. In contrast, when an element or layer is referred to as being “directly on”, “directly connected to”, “directly coupled to”, or “immediately adjacent to” another element or layer, there are no intervening elements or layers present.
Any numerical range recited herein is intended to include all sub-ranges of the same numerical precision subsumed within the recited range. For example, a range of “1.0 to 10.0” or “between 1.0 and 10.0” is intended to include all subranges between (and including) the recited minimum value of 1.0 and the recited maximum value of 10.0, that is, having a minimum value equal to or greater than 1.0 and a maximum value equal to or less than 10.0, such as, for example, 2.4 to 7.6. Similarly, a range described as “within 35% of 10” is intended to include all subranges between (and including) the recited minimum value of 6.5 (i.e., (1−35/100) times 10) and the recited maximum value of 13.5 (i.e., (1+35/100) times 10), that is, having a minimum value equal to or greater than 6.5 and a maximum value equal to or less than 13.5, such as, for example, 7.4 to 10.6. Any maximum numerical limitation recited herein is intended to include all lower numerical limitations subsumed therein and any minimum numerical limitation recited in this specification is intended to include all higher numerical limitations subsumed therein.
Although exemplary embodiments of a system and method for defect detection have been specifically described and illustrated herein, many modifications and variations will be apparent to those skilled in the art. Accordingly, it is to be understood that a system and method for defect detection constructed according to principles of this disclosure may be embodied other than as specifically described herein. The invention is also defined in the following claims, and equivalents thereof.
The present application claims priority to and the benefit of U.S. Provisional Application No. 63/310,000, filed Feb. 14, 2022, entitled “MULTI-TASK KNOWLEDGE DISTILLATION APPROACH FOR INDUSTRIAL ANOMALY DETECTION”, the entire content of which is incorporated herein by reference.