The disclosure relates to a detection technique, and more particularly, to a method and a system for vision-based defect detection.
A speaker is a transducer that converts electrical signals into sound, and is widely used in devices including audio electronics, earphones and the like; its performance affects the use of these devices. Traditionally, rub and buzz of a speaker is detected by experienced listeners at the end of the production line. This type of detection requires log-swept sine chirps to be applied to the speaker and relies on human auditory perception to analyze whether a response signal is normal. However, a result detected by the human ear may vary with subjective factors such as the listener's age, mood changes, and hearing fatigue, and the task is likely to cause occupational injury to the listener.
The disclosure provides a method and a system for vision-based defect detection, which can detect whether a device-under-test (DUT) has an unacceptable defect with respect to a predefined auditory standard through computer vision from a spectrogram.
In an embodiment of the disclosure, the method includes the following steps. A test audio signal is outputted to the DUT, and a response signal of the DUT with respect to the test audio signal is received to generate a received audio signal. Signal processing is performed on the received audio signal to generate the spectrogram, and whether the DUT has an unacceptable defect with respect to a predefined auditory standard is determined through computer vision according to the spectrogram.
In an embodiment of the disclosure, the system includes a signal outputting device, a microphone, an analog-to-digital converter and a processing device. The signal outputting device is configured to output a test audio signal to the DUT. The microphone is configured to receive a response signal of the DUT with respect to the test audio signal. The analog-to-digital converter is configured to convert the response signal to a received audio signal. The processing device is configured to perform signal processing on the received audio signal to generate a spectrogram and determine whether the DUT has an unacceptable defect with respect to a predefined auditory standard through computer vision according to the spectrogram.
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
Reference will now be made in detail to the present preferred embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
Some embodiments of the disclosure are described in detail below with reference to the accompanying drawings. As for the reference numbers cited in the following description, the same reference numbers in different drawings refer to the same or like parts. The embodiments are merely a part of the disclosure and do not disclose all possible embodiments of the disclosure. More specifically, these embodiments are simply examples of the method and the system recited in the claims of the disclosure.
Referring to
The signal outputting device 110 is configured to output a test audio signal to the DUT T, and may be, for example, an electronic device having a digital audio output interface. The test audio signal is outputted to the DUT T in a wireless or wired manner. The microphone 120 is configured to receive a response of the DUT T with respect to the test audio signal, and may be disposed in the vicinity of the DUT T or at an optimal receiving position with respect to the DUT T. The analog-to-digital converter 130 is connected to the microphone 120, and is configured to convert an analog sound received by the microphone 120 to a digital sound signal.
The processing device 140 is connected to the analog-to-digital converter 130, and is configured to process the digital sound signal received from the analog-to-digital converter 130 so as to detect whether the DUT T has a defect. The processing device 140 includes a memory and a processor. The memory may be, for example, a fixed or movable device in any possible forms, including a random access memory (RAM), a read-only memory (ROM), a flash memory, a hard drive or other similar devices, integrated circuits or a combination of the above-mentioned devices. The processor may be, for example, a central processing unit (CPU), an application processor (AP) or other programmable microprocessors for general purpose or special purpose, a digital signal processor (DSP), or other similar devices, integrated circuits and a combination of the above.
It should be noted that in an embodiment, the signal outputting device 110, the microphone 120, the analog-to-digital converter 130 and the processing device 140 may respectively belong to four separate devices. In an embodiment, the signal outputting device 110 and the processing device 140 may be integrated into the same device, and the processing device 140 can control the output of the signal outputting device 110. In an embodiment, the signal outputting device 110, the microphone 120, the analog-to-digital converter 130 and the processing device 140 may also be integrated into an all-in-one computer system. The disclosure does not impose any limitation on the integration of the signal outputting device 110, the microphone 120, the analog-to-digital converter 130 and the processing device 140. Any system including such devices falls within the scope of the defect detection system 100.
Embodiments are provided below to describe detailed steps in a defect detection method used by the defect detection system 100 for the DUT T. The following embodiment is described by using an electronic device having a speaker as an example of the DUT T, and the defect to be detected by the defect detection system 100 is rub and buzz of the DUT T.
Referring to
The processing device 140 performs signal processing on the received audio signal to generate a spectrogram (step S208), and determines whether the DUT T has the defect through computer vision according to the spectrogram (step S210). The processing device 140 may perform a Fast Fourier Transform (FFT) on the received audio signal to generate the spectrogram. Here, the received audio signal is converted to the spectrogram because rub and buzz does not have a significant feature in the received audio signal, yet it exhibits time continuity when resonating with the test audio signal. Therefore, once the time-domain signal is converted to the spectrogram, the feature of the rub and buzz exhibits time continuity and energy clustering in the spectrogram, which can be used to achieve the defect detection on the DUT through computer vision.
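As an illustrative sketch only (not a definitive implementation of step S208), the FFT-based spectrogram generation described above could be realized by applying an FFT to overlapping windowed frames of the received audio signal. The frame length, hop size, and 48 kHz sample rate below are assumptions for illustration:

```python
import numpy as np

def spectrogram(signal, frame_len=1024, hop=256, sample_rate=48000):
    """Compute a magnitude spectrogram by applying an FFT to
    overlapping, Hann-windowed frames of the time-domain signal."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # One-sided FFT: rows of the result are frequency bins, columns are time frames.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    return mag.T, freqs

# A 1-second synthetic 1 kHz tone stands in for the received audio signal.
t = np.linspace(0, 1, 48000, endpoint=False)
spec, freqs = spectrogram(np.sin(2 * np.pi * 1000 * t))
```

In the resulting array, a time-continuous resonance such as rub and buzz appears as a horizontal streak of clustered energy, which is what the computer-vision stage inspects.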
In an example of
In the following embodiment, a classifier is used to perform image recognition. Accordingly, before the processing device 140 detects whether the DUT T has the defect, a trained classifier is obtained. Here, the classifier may be trained by the processing device 140 itself, or may be a classifier trained by another processing device, which is not particularly limited in the disclosure.
Referring to
Then, the training system converts the training data to a spectrogram 404. In order to reduce the computational complexity and to avoid low-frequency and high-frequency noise, the training system selects a preset frequency range of, for example, 3 kHz to 15 kHz as an inspection region. In the example of
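As a minimal sketch of the inspection-region selection described above (the bin layout of a 48 kHz signal is an assumption for illustration), the rows of the spectrogram whose bin frequency falls inside the preset 3 kHz to 15 kHz range could be cropped as follows:

```python
import numpy as np

def crop_inspection_region(spec, freqs, low_hz=3000, high_hz=15000):
    """Keep only the spectrogram rows whose bin frequency lies inside the
    preset inspection range, discarding low/high-frequency noise bins."""
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    return spec[mask, :]

freqs = np.linspace(0, 24000, 513)                     # bins of a 48 kHz signal
spec = np.random.default_rng(0).random((513, 200))     # toy spectrogram
region = crop_inspection_region(spec, freqs)
```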
Then, the training system obtains feature values corresponding to the different regions in each of the defective inspection region images and each of the non-defective inspection region images, and obtains texture correlation 406 of each of the defective inspection region images and each of the non-defective inspection region images with respect to a reference model 408 as a spatial feature 410 to train a training classifier 412 and thereby generate a classifier 414 for detecting whether the DUT T has the defect.
Here, the training system performs image segmentation on all of the defective inspection region images and the non-defective inspection region images to generate a plurality of sub-blocks of a same size (e.g., a pixel size of 40×200). In this embodiment, if the size of the sub-block is too large, a proportion of the feature of the rub and buzz will be reduced; and if the size of the sub-block is too small, the feature of the rub and buzz will not be covered and a subsequent recognition result will be affected. Therefore, the training system may obtain the spatial feature of each of the defective inspection region images and the non-defective inspection region images according to
Referring to
Next, the training system performs a feature extraction FE on each of the training sub-blocks segmented from each of the non-defective inspection region images and the defective inspection region images with different scales. In this embodiment, the training system can compute at least one of a standard deviation σ and a kurtosis k of the pixel values of each of the training sub-blocks as the feature value of each of the training sub-blocks, but the disclosure is not limited thereto. In addition, in order to improve the differentiation between non-defective and defective samples, the training system can generate a reference model associated with non-defective samples according to the N1 non-defective inspection region images. For instance, the training system can obtain the reference model by averaging the pixel values of the N1 non-defective inspection region images of the same scale. In this way, each scale can have its own corresponding reference model. In this embodiment, the training system generates a reference model R1 corresponding to the image T1 and a reference model R0 corresponding to the image T0. Here, because the reference model R1 and the image T1 have the same scale, the training sub-block in the image T1 can locate the corresponding sub-block (hereinafter referred to as "a reference sub-block") in the reference model R1. Similarly, because the image T0 and the reference model R0 have the same scale, the training sub-block in the image T0 can locate the corresponding reference sub-block in the reference model R0.
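The block segmentation, the σ/k feature extraction, and the pixel-averaged reference model described above can be sketched as follows. This is an illustrative sketch only; the 80×400 image size is an assumption, while the 40×200 sub-block size comes from the embodiment:

```python
import numpy as np

def block_features(block):
    """Standard deviation sigma and (population) kurtosis k of a
    sub-block's pixel values, used here as its feature values."""
    x = block.ravel().astype(float)
    sigma = x.std()
    k = np.mean((x - x.mean()) ** 4) / (sigma ** 4)
    return sigma, k

def segment(image, bh=40, bw=200):
    """Cut an inspection region image into equal sub-blocks
    (40x200 pixels, per the embodiment)."""
    h, w = image.shape
    return [image[r:r + bh, c:c + bw]
            for r in range(0, h - bh + 1, bh)
            for c in range(0, w - bw + 1, bw)]

rng = np.random.default_rng(1)
images = [rng.random((80, 400)) for _ in range(4)]  # N1 non-defective samples
reference_model = np.mean(images, axis=0)           # pixel-wise average
blocks = segment(images[0])
features = [block_features(b) for b in blocks]
```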
Next, the training system computes the texture correlation between each of the sub-blocks of each scale and the reference sub-blocks in the corresponding reference model. Specifically, the training system computes the texture correlation between the training sub-block T11 and the reference sub-block R11 and computes the texture correlation between the training sub-block T01 and the reference sub-block R01. Here, the texture correlation may be a correlation coefficient coeff of a local binary pattern (LBP) between the sub-block and the reference sub-block.
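A minimal sketch of the LBP-based texture correlation described above follows. The 8-neighbour 3×3 LBP encoding is one common variant and is an assumption here; the correlation coefficient is computed between the flattened LBP codes of a sub-block and its reference sub-block:

```python
import numpy as np

def lbp(image):
    """Basic 8-neighbour local binary pattern: each interior pixel is
    encoded by thresholding its 3x3 neighbourhood against the centre."""
    c = image[1:-1, 1:-1]
    codes = np.zeros_like(c, dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dr, dc) in enumerate(offsets):
        neighbour = image[1 + dr : image.shape[0] - 1 + dr,
                          1 + dc : image.shape[1] - 1 + dc]
        codes |= ((neighbour >= c).astype(np.uint8) << bit)
    return codes

def texture_correlation(block, ref_block):
    """Correlation coefficient of the LBP codes of a training sub-block
    and its corresponding reference sub-block."""
    a = lbp(block).ravel().astype(float)
    b = lbp(ref_block).ravel().astype(float)
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(2)
block = rng.random((40, 200))
self_corr = texture_correlation(block, block)  # identical textures
```

A sub-block whose texture matches the non-defective reference model yields a coefficient near 1, while rub-and-buzz streaks lower it.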
Here, each sub-block has a feature vector f={σ, k, coeff} of its own, and each image has an image feature vector F={f1, f2, . . . , fn} of its own, wherein n is the number of the sub-blocks. Taking
After all the feature vectors corresponding to the N1+N2 training data are inputted to the classifier, the training system starts to train the classifier M. Here, the classifier may be a support vector machine (SVM) classifier. Accordingly, the training system computes an optimal separating hyperplane of the SVM classifier as a basis for distinguishing whether the DUT T has the defect.
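As an illustrative sketch of the SVM training step (the synthetic feature clusters below are assumptions standing in for real f = {σ, k, coeff} vectors, not measured data), a linear-kernel SVM could be trained with scikit-learn as follows:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Toy stand-ins for per-sub-block feature vectors {sigma, k, coeff}:
# non-defective samples cluster near high LBP correlation,
# defective samples near low correlation with larger spread.
good = np.column_stack([rng.normal(0.2, 0.05, 50),   # sigma
                        rng.normal(1.8, 0.1, 50),    # kurtosis
                        rng.normal(0.9, 0.03, 50)])  # LBP coeff
bad = np.column_stack([rng.normal(0.5, 0.05, 50),
                       rng.normal(4.0, 0.3, 50),
                       rng.normal(0.3, 0.05, 50)])
X = np.vstack([good, bad])
y = np.array([0] * 50 + [1] * 50)  # 1 = defective

clf = SVC(kernel="linear")  # separating hyperplane as decision basis
clf.fit(X, y)
pred = clf.predict([[0.21, 1.85, 0.88], [0.52, 4.1, 0.28]])
```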
Referring to
Next, the processing device 140 obtains a plurality of sub-blocks associated with the spectrogram, and obtains a spatial feature 610 therefrom to be inputted to a classifier 612. In this embodiment, the processing device 140 also uses, for example, the preset frequency range of 3K to 15K Hz as the inspection region to generate an inspection region image. In an embodiment, the processing device 140 may directly segment the inspection region image to directly generate a plurality of sub-blocks of a same size. In another embodiment, the processing device 140 may perform image pyramid processing on the inspection region image to generate a plurality of inspection region images with different scales. Next, the processing device 140 segments the inspection region images with different scales to generate a plurality of sub-blocks of a same size.
Then, the processing device 140 obtains a feature value of each of the sub-blocks and obtains a texture correlation 606 of each of the sub-blocks with respect to a reference model 608. Here, the feature value is, for example, at least one of a standard deviation and a kurtosis of a plurality of pixel values of the sub-block, but needs to meet the input requirement of the pre-stored classifier. Here, the texture correlation may be a correlation coefficient of a local binary pattern between the sub-block and the reference sub-block corresponding to the reference model. Next, the processing device 140 inputs the feature value and the texture correlation corresponding to each of the sub-blocks to the classifier 612 to generate an output result. This output result indicates whether the DUT T has the defect.
In this embodiment, in order to achieve a more rigorous detection and to avoid a defective DUT T being mistaken as non-defective, when the output result indicates that the DUT T does not have the defect, the processing device 140 may conduct a further confirmation according to the reliability of the output result. In detail, taking the SVM classifier as an example, the processing device 140 can obtain a confidence level of the output result, and determine whether the confidence level is greater than a preset confidence threshold 614, wherein the preset confidence threshold may be, for example, 0.75. If so, the processing device 140 determines that the DUT T does not have the defect; otherwise, the processing device 140 determines that the DUT T has the defect.
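The fail-safe confirmation logic above can be sketched as a short decision function (the function name is illustrative; the 0.75 threshold comes from the embodiment):

```python
def final_decision(predicted_defective, confidence, threshold=0.75):
    """Conservative post-check: a 'non-defective' verdict is accepted
    only when the classifier's confidence exceeds the threshold;
    otherwise the DUT is treated as defective."""
    if predicted_defective:
        return "defective"
    return "non-defective" if confidence > threshold else "defective"

results = [final_decision(False, 0.9),    # confident pass
           final_decision(False, 0.6),    # low confidence -> fail-safe
           final_decision(True, 0.95)]    # defective stays defective
```

The asymmetry is deliberate: a borderline "pass" is demoted to "defective" so that marginal units are not shipped.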
In this embodiment, the defect detected by the defect detection system 100 is the rub and buzz of the DUT T. Since different types of rub and buzz generate resonance harmonics when a specific audio signal is played, the processing device 140 can further utilize the frequency and the harmonic frequency range of the rub and buzz in the spectrogram to identify the component in the DUT that causes the rub and buzz. From another perspective, the processing device 140 identifies the component in the DUT that causes the rub and buzz according to a specific region of the spectrogram.
For instance,
In practice, once a DUT is identified as defective due to rub and buzz, the testing operator would further determine whether the rub and buzz is acceptable or unacceptable based on its loudness, so as to avoid overkill. If the rub and buzz is acceptable (e.g., barely or not noticeable by a human listener), the DUT would be considered an "OK" DUT. If the rub and buzz is unacceptable, the DUT would be considered an "NG" DUT. Visually speaking,
Referring to
Referring to
Accordingly, before the processing device 140 determines whether the DUT T has an unacceptable defect, another trained classifier is constructed. The classifier may be trained by the processing device 940, or may be a classifier trained by another processing device. The disclosure is not limited in this regard. In the following embodiment, the construction of the classifier is performed by a device similar to the processing device 940 (hereinafter referred to as "a training system"). First, the training system collects a plurality of pieces of training data. The training data may be a plurality of training objects labeled as "acceptable defective" with respect to the predefined auditory standard. According to the temporal and spatial features presented in a spectrogram of a DUT having rub and buzz, the training system performs projection transformation and feature quantification on the spectrogram corresponding to each training audio sample.
Referring to
In terms of the projection curves CR1-CR3, where the horizontal and vertical axes respectively represent time and energy, the projection values tend to be relatively high for a sub-spectrogram having rub and buzz features. If the projection values remain high continuously over time, it is highly possible that severe rub and buzz occurs. In addition, the rub and buzz features are further classified into unacceptable (severe) and acceptable rub and buzz features with respect to the predefined auditory standard. Assume that the predefined auditory standard is set based on the range of human auditory perception; the human ear is more sensitive to some frequencies than to others. For example, if features of rub and buzz only appear in the sub-spectrogram R1 (where the frequencies are all above approximately 10 kHz), such rub and buzz may possibly be acceptable. However, if features of rub and buzz appear in all the sub-spectrograms R1-R3, such rub and buzz may possibly be unacceptable. In other words, unacceptable (severe) rub and buzz possesses the following features: (1) larger projection energy, (2) longer continuous duration, and (3) broader frequency range coverage. Next, the training system proceeds to feature quantification.
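A minimal sketch of the projection transformation, assuming each sub-spectrogram is a frequency-by-time array: summing the energy of all frequency bins in each time frame yields a time-versus-energy projection curve like CR1-CR3, in which a time-continuous rub-and-buzz band shows up as a sustained plateau:

```python
import numpy as np

def time_projection(sub_spectrogram):
    """Project a sub-spectrogram onto the time axis by summing the
    energy of all frequency bins in each time frame."""
    return sub_spectrogram.sum(axis=0)

# Synthetic sub-spectrogram: low background noise plus a time-continuous
# band of energy imitating a rub-and-buzz resonance harmonic.
rng = np.random.default_rng(4)
sub = rng.random((64, 100)) * 0.1
sub[20, 30:70] += 5.0            # persistent energy over frames 30..69
curve = time_projection(sub)     # projection stays high over that span
```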
In detail, to make local features prominent, each projection curve is further divided into a plurality of segments with respect to different time intervals (i.e. vertical division). For example, the curve CR1 in
Herein, Hμ denotes the average of the values in the segment xij that are not less than the overall average μ=mean(xij), and kσ denotes k times the standard deviation of the corresponding segment xij. HHmean and HLmean respectively denote the averages of HH and HL, and HHsize and HLsize respectively denote the numbers of elements in HH and HL. Note that HH and HL together form the set H of values not less than the average μ=mean(xij) of the segment xij: HH contains the values of H greater than Hμ+kσ, and HL contains the remaining values of H. length(xij) denotes the number of data points in the segment xij, wij denotes the weight of the corresponding sub-spectrogram, and L denotes a coefficient of each sub-spectrogram. Note that the lower the frequencies in a sub-spectrogram, the lower the coefficient L, and the greater the importance of rub and buzz in the corresponding interval.
To be more comprehensible, suppose that xij={0.5, 0.9, 0.1, 0.6, 0.2, 0.7}; then μ=0.5 and H={0.5, 0.9, 0.6, 0.7}. Suppose that kσ=0; then Hμ=0.675, so HH={0.9, 0.7} and HL={0.5, 0.6}, which gives HHmean=0.8, HLmean=0.55, HHsize/length(xij)=2/6≈0.33, and HLsize/length(xij)=2/6≈0.33.
Suppose that the weight of the sub-spectrogram with low-frequencies L=−1, then wij=exp{−1×(1−0.8)}=0.818. The feature quantification result of the segment xij is expressed as vij={0.8,0.55,0.33,0.33,0.818}.
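The worked example above can be checked with a short sketch of the feature quantification (the function name is illustrative; membership of μ itself in H follows the worked numbers, which include 0.5):

```python
import math

def quantify_segment(x, L=-1.0, k_sigma=0.0):
    """Split the above-average values H of a segment around the
    threshold H_mu + k*sigma into HH and HL, then derive the five
    features {HHmean, HLmean, HHsize/len, HLsize/len, w}."""
    mu = sum(x) / len(x)
    H = [v for v in x if v >= mu]            # values not less than mu
    H_mu = sum(H) / len(H)
    HH = [v for v in H if v > H_mu + k_sigma]
    HL = [v for v in H if v <= H_mu + k_sigma]
    HH_mean = sum(HH) / len(HH)
    HL_mean = sum(HL) / len(HL)
    w = math.exp(L * (1.0 - HH_mean))        # sub-spectrogram weight
    return [HH_mean, HL_mean,
            len(HH) / len(x), len(HL) / len(x), w]

v = quantify_segment([0.5, 0.9, 0.1, 0.6, 0.2, 0.7])
# reproduces vij = {0.8, 0.55, 0.33, 0.33, 0.818} from the text
```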
Once the training system computes the feature quantification result V={v1, v2, v3}, vi={vij, j=0, . . . , n} for each sub-spectrogram of the training objects with acceptable defects, a one-class SVM (OCSVM) classifier for identifying acceptable rub and buzz is constructed and trained based on machine learning/deep learning models as known per se. The classifier is then able to distinguish between unacceptable and acceptable rub and buzz.
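Because the training data contains only the "acceptable defective" class, a one-class formulation fits naturally. As an illustrative sketch (the synthetic feature cluster is an assumption, not real quantification data), scikit-learn's OneClassSVM could be trained so that vectors far from the acceptable cluster are flagged as unacceptable:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(5)
# Toy stand-ins for feature-quantification vectors of training objects
# labeled "acceptable defective": one tight cluster, no negative class.
acceptable = rng.normal(loc=[0.8, 0.55, 0.3, 0.3, 0.8],
                        scale=0.02, size=(200, 5))

ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
ocsvm.fit(acceptable)

# +1 = inside the learned acceptable region, -1 = outlier (unacceptable).
inlier = ocsvm.predict([[0.8, 0.55, 0.3, 0.3, 0.8]])[0]
outlier = ocsvm.predict([[2.0, 2.0, 1.0, 1.0, 0.1]])[0]
```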
Revisiting
For example,
Table 1 summarizes the results of experiments conducted using a method for vision-based defect detection without (e.g.
In summary, the method and the system for vision-based defect detection proposed by the disclosure can detect whether a DUT has an unacceptable defect with respect to a predefined auditory standard through computer vision according to the spectrogram. In this way, the disclosure can provide more accurate defect detection than subjective determination by the human ear, and thereby reduce related occupational injuries.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present disclosure without departing from the scope or spirit of the disclosure. In view of the foregoing, it is intended that the present disclosure cover modifications and variations of this disclosure provided they fall within the scope of the following claims and their equivalents.
| Number | Date | Country | Kind |
|---|---|---|---|
| 108137945 | Oct 2019 | TW | national |
This application is a continuation application of U.S. application Ser. No. 17/088,591, filed on Nov. 4, 2020, which is a continuation-in-part application of and claims the priority benefit of U.S. application Ser. No. 16/706,817, filed on Dec. 8, 2019, now allowed. The prior U.S. application Ser. No. 16/706,817 is based on and claims the priority benefit of Taiwan application serial no. 108137945, filed on Oct. 21, 2019. The entirety of each of the above-mentioned patent applications is hereby incorporated by reference herein and made a part of this specification.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 17088591 | Nov 2020 | US |
| Child | 17328775 | | US |
| | Number | Date | Country |
|---|---|---|---|
| Parent | 16706817 | Dec 2019 | US |
| Child | 17088591 | | US |