This application claims under 35 U.S.C. § 119(a) the benefit of Korean Patent Application No. 10-2022-0069865 filed on Jun. 9, 2022, the disclosure of which is incorporated herein by reference in its entirety.
The present invention relates to a material classification apparatus and method based on a multi-spectral NIR band. The present invention is supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A4A4079705).
In the case of an object having the same color or an object in a low-illumination environment, it is very difficult to identify the material. As illustrated in
An object of the present invention is to provide a material classification apparatus and method based on a multi-spectral NIR band.
Another object of the present invention is to provide a material classification apparatus and method based on a multi-spectral NIR band capable of classifying materials in consideration of both the spatial information and the multi-spectral information of a near-infrared image.
In addition, the present invention is applicable to the field of face recognition anti-spoofing. Based on material discrimination, the present invention makes it possible to accurately determine whether a face is real or an imitation, and can be extended to a technology for determining whether various other objects are real or imitations. This is a key technology for vision cameras in mobility fields such as autonomous driving and robotics.
According to an aspect of the present invention, there is provided a material classification apparatus based on a multi-spectral NIR band.
According to an embodiment of the present invention, a material classification apparatus based on a multi-spectral NIR band may include: an input unit configured to acquire a multi-band NIR image of a target; an attention module configured to generate a spatio-spectral correlation map considering spatial information on the multi-band NIR image and a correlation between each band; and a classification model unit configured to analyze the spatio-spectral correlation map and output a material classification label for the target.
The input unit may acquire the multi-band NIR image of the target by dividing a near-infrared wavelength band into n pieces (where the n is a natural number).
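The division of the near-infrared wavelength band into n pieces can be sketched as follows. This is a minimal illustration only: the function name `split_nir_band`, the equal-width split, and the 700 nm to 1000 nm range are assumptions made for the example and are not specified in this description.

```python
from typing import List, Tuple


def split_nir_band(start_nm: float, end_nm: float, n: int) -> List[Tuple[float, float]]:
    """Divide [start_nm, end_nm] into n contiguous sub-bands.

    Equal-width sub-bands are an assumption of this sketch; the description
    only requires that the band be divided into n pieces (n a natural number).
    """
    if n < 1:
        raise ValueError("n must be a natural number")
    if end_nm <= start_nm:
        raise ValueError("end_nm must be greater than start_nm")
    width = (end_nm - start_nm) / n
    return [(start_nm + i * width, start_nm + (i + 1) * width) for i in range(n)]


# Hypothetical wavelength range and band count, chosen only for illustration.
bands = split_nir_band(700.0, 1000.0, 6)
```

Each returned pair gives the lower and upper wavelength of one band; one band image would be acquired per pair.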
The attention module may be a 3D convolution-based model in which the temporal axis of the 3D convolution is assigned to the multi-spectral axis, so that the generated spatio-spectral correlation map includes spatial information of each band image and a correlation along the multi-spectral axis.
The attention module may further receive a visible light image of the target and use the received visible light image to generate the spatio-spectral correlation map.
According to another aspect of the present invention, there is provided a material classification method based on a multi-spectral NIR band.
According to another embodiment of the present invention, a material classification method based on a multi-spectral NIR band may include: acquiring a multi-band NIR image of a target; generating a spatio-spectral correlation map considering spatial information on the multi-band NIR image and a correlation between each band by applying the multi-band NIR image to a trained 3D convolution-based attention module; and outputting a material classification label for the target by applying the spatio-spectral correlation map to a trained classification model.
According to an embodiment of the present invention, by providing a material classification apparatus and method based on a multi-spectral NIR band, it is possible to classify materials with high accuracy in consideration of not only spatial information but also multi-spectral information for near-infrared images.
In the present specification, singular forms include plural forms unless the context clearly indicates otherwise. In the specification, the terms “comprising,” “including,” and the like should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be omitted, or additional components or steps may be further included. In addition, the terms “ . . . unit,” “module,” and the like, described in the specification refer to a processing unit of at least one function or operation and may be implemented by hardware or software or a combination of hardware and software.
Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Referring to
The input unit 110 is a means for acquiring a multi-band NIR image of a target.
Referring to
That is, as illustrated in
The attention module 120 is a means for generating a spatio-spectral correlation map considering spatial information of the multi-band NIR image and the correlation between each band.
This will be described in more detail with reference to
According to an embodiment of the present invention, the attention module 120 may be a 3D convolution-based module. Accordingly, the attention module 120 may replace temporal information of a 3D convolution model with spectral information.
In this way, the attention module 120 may generate the spatio-spectral correlation map that simultaneously considers the spatial information of the multi-band NIR image and the correlation between each band through the 3D convolution model.
Since the 3D convolution model itself is a well-known technology, a separate description of the function and operation of the 3D convolution model will be omitted.
However, according to an embodiment of the present invention, the 3D convolution-based attention module 120 does not utilize temporal information, unlike the conventional 3D convolution model, and may replace the temporal information with spectral information. Thus, whereas the conventional 3D convolution model uses a spatio-temporal feature of an image, the attention module 120 according to an embodiment of the present invention may use a spatio-spectral correlation feature of the multi-band NIR image.
As already described above with reference to
Therefore, in an embodiment of the present invention, the near-infrared images acquired by dividing the near-infrared wavelength band into n bands may be applied to the 3D convolution-based attention module 120, which uses the multi-band axis in place of the temporal axis. The module thereby derives the spatio-spectral correlation map from the spatial features of each near-infrared image and the correlation between bands along the multi-spectral axis.
In this way, the spatio-spectral correlation map may be derived by simultaneously considering the spatial information of the near-infrared image and the multi-spectral aspect information (i.e., correlation between bands) through the 3D convolution-based attention module 120, and used for material classification, thereby improving classification accuracy.
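The idea of running a 3D convolution over the band axis instead of a time axis can be sketched as follows. This is a hedged illustration, not the trained attention module 120: the names `conv3d_same` and `spatio_spectral_map`, the single averaging kernel, the zero padding, and the sigmoid squashing are all assumptions made for the example.

```python
import math
from typing import List

Cube = List[List[List[float]]]  # indexed as [band][row][col]


def conv3d_same(cube: Cube, kernel: Cube) -> Cube:
    """Naive 3D cross-correlation with zero padding (output size = input size).

    The first axis, which a video model would treat as time, is the spectral
    (band) axis here, so every output value mixes neighbouring pixels and
    neighbouring bands at once.
    """
    nb, nh, nw = len(cube), len(cube[0]), len(cube[0][0])
    kb, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    ob, oh, ow = kb // 2, kh // 2, kw // 2
    out = [[[0.0] * nw for _ in range(nh)] for _ in range(nb)]
    for b in range(nb):
        for i in range(nh):
            for j in range(nw):
                acc = 0.0
                for db in range(kb):
                    for di in range(kh):
                        for dj in range(kw):
                            bb, ii, jj = b + db - ob, i + di - oh, j + dj - ow
                            if 0 <= bb < nb and 0 <= ii < nh and 0 <= jj < nw:
                                acc += kernel[db][di][dj] * cube[bb][ii][jj]
                out[b][i][j] = acc
    return out


def spatio_spectral_map(cube: Cube, kernel: Cube) -> Cube:
    """Squash the 3D convolution response into (0, 1) with a sigmoid.

    The sigmoid is an assumption of this sketch, not taken from the description.
    """
    scores = conv3d_same(cube, kernel)
    return [[[1.0 / (1.0 + math.exp(-v)) for v in row] for row in band] for band in scores]


# Toy input: 3 bands of 2x2 pixels, and a 3x3x3 averaging kernel (hypothetical weights).
cube = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(3)]
kernel = [[[1.0 / 27.0] * 3 for _ in range(3)] for _ in range(3)]
corr_map = spatio_spectral_map(cube, kernel)
```

In a trained module the kernel weights would be learned and several layers would typically be stacked; the sketch only shows how a single kernel spans the spatial and spectral axes together.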
It can be seen that, when the channel attention and the spatial attention are applied separately, features are not well extracted in the front band region of the near-infrared image, and the shape appears dark overall. On the other hand, toward the back bands, features that cannot be extracted from the original image are extracted, and detail restoration is improved.
In this way, it can be seen that the correlation between multiple bands in the near-infrared image exists in a specific band, which is different for each material.
Therefore, when the spatial attention and the spectral attention are considered together as in an embodiment of the present invention, it can be seen that the feature map for the shape of the object is well extracted from the front band, and the intensity value is increased overall. In addition, it can be seen that the back bands show a significant feature that is opposite in scale compared to the front bands.
Therefore, it can be seen that the performance of the material classification is improved by additionally considering multi-spectral information as well as spatial information when classifying materials of a surface of an object using a near-infrared image.
In this way, it can be seen that, when the spatio-spectral correlation map for the multi-band NIR image is extracted through the 3D convolution-based attention module 120 and used for the material classification as in an embodiment of the present invention, the material classification accuracy is higher than when the spatial information or the spectral information is used separately.
The classification model unit 130 is a means for classifying materials using the spatio-spectral correlation map generated by the attention module 120. It is assumed that the classification model unit 130 is pre-trained on spatio-spectral correlation maps paired with their material labels. The classification model unit 130 may classify a material of an object using EfficientNet, which is a classification network.
The classification model unit 130 may be trained using a cross-entropy loss function. The cross-entropy loss may be calculated using Equation 1 below.
Here, p denotes actual data (correct answer label), and q denotes the material classification result (label) generated through the trained classification model. In addition, x denotes a classification label index.
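Equation 1 itself is not reproduced in this text. Assuming it is the standard cross-entropy, H(p, q) = -Σ_x p(x) log q(x), which matches the symbols p, q, and x defined above, the loss can be sketched as follows. The function name `cross_entropy`, the clamping constant `eps`, and the example values are assumptions made for illustration.

```python
import math
from typing import Sequence


def cross_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """Cross-entropy between the correct-answer distribution p and the prediction q.

    p and q are indexed by the classification label index x. The prediction is
    clamped to eps (an assumption of this sketch) to avoid log(0).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of labels")
    return -sum(px * math.log(max(qx, eps)) for px, qx in zip(p, q))


# One-hot correct-answer label (index 1) and a hypothetical model output.
loss = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.7, 0.2])
```

With a one-hot p, the sum reduces to the negative log of the probability assigned to the correct label, so the loss falls toward zero as that probability approaches 1.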
In addition, for quantitative evaluation of the material classification of the object, the accuracy was calculated as shown in Equation 2.
Here, TP denotes true positive, TN denotes true negative, FN denotes false negative, and FP denotes false positive.
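Equation 2 is likewise not reproduced in this text. Assuming it is the usual accuracy definition, (TP + TN) / (TP + TN + FP + FN), the calculation can be sketched as follows. The function name `accuracy` and the counts in the example are hypothetical.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct decisions among all decisions.

    Assumes the standard definition (TP + TN) / (TP + TN + FP + FN).
    """
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    return (tp + tn) / total


# Hypothetical confusion counts, for illustration only.
acc = accuracy(tp=45, tn=40, fp=5, fn=10)
```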
In addition, according to an embodiment of the present invention, instead of using only the multi-band NIR image, the spatio-spectral correlation map may be generated using the RGB image (visible light image) together with the multi-band NIR image and used for the material classification. A network structure for material classification of an object including the attention module 120 and the classification model unit 130 is illustrated in detail in
The memory 140 is a means for storing instructions for performing a material classification method using a multi-band NIR image according to an embodiment of the present invention.
The processor 150 is a means for controlling internal components (e.g., the input unit 110, the attention module 120, the classification model unit 130, the memory 140, etc.) of the material classification apparatus 100 based on a multi-spectral NIR band according to an embodiment of the present invention.
In step 810, the material classification apparatus 100 acquires a multi-band NIR image of a target. Of course, the material classification apparatus 100 may acquire an RGB image of a target together with a multi-band NIR image.
In step 815, the material classification apparatus 100 applies the multi-band NIR image to the trained 3D convolution-based attention module to generate the spatio-spectral correlation map. As already described above, the 3D convolution-based attention module is a 3D convolution model in which the temporal information is replaced with spectral information.
Therefore, the 3D convolution-based attention module may simultaneously consider the spatial attention and the spectral attention after receiving the multi-band NIR image to generate the spatio-spectral correlation map.
In addition, the material classification apparatus 100 may apply both the multi-band NIR image and the RGB image to the 3D convolution-based attention module to generate the spatio-spectral correlation map.
In step 820, the material classification apparatus 100 applies the spatio-spectral correlation map to the trained classification model to classify the material. It is assumed that the 3D convolution-based attention module and the classification model are pre-trained based on training data.
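Steps 810 to 820 can be sketched as a small pipeline. This is only an outline under stated assumptions: the function name `classify_material`, the stand-in attention and classifier callables, and the labels "metal" and "fabric" are hypothetical, and the trained 3D convolution-based attention module and classification model are represented by plain functions.

```python
from typing import Callable, List, Sequence

Cube = List[List[List[float]]]  # indexed as [band][row][col]


def classify_material(
    nir_cube: Cube,
    attention: Callable[[Cube], Cube],
    classifier: Callable[[Cube], Sequence[float]],
    labels: Sequence[str],
) -> str:
    """Run the acquired multi-band NIR image through attention, then classification."""
    if not nir_cube:
        raise ValueError("a multi-band NIR image is required")  # input of step 810
    corr_map = attention(nir_cube)       # step 815: spatio-spectral correlation map
    scores = classifier(corr_map)        # step 820: one score per material label
    best = max(range(len(scores)), key=lambda x: scores[x])
    return labels[best]


def identity_attention(cube: Cube) -> Cube:
    """Stand-in for the trained attention module (returns its input unchanged)."""
    return cube


def mean_intensity_classifier(cube: Cube) -> List[float]:
    """Stand-in for the trained classification model (scores by mean intensity)."""
    values = [v for band in cube for row in band for v in row]
    mean = sum(values) / len(values)
    return [mean, 1.0 - mean]


toy_cube = [[[0.8, 0.8], [0.8, 0.8]] for _ in range(3)]
label = classify_material(toy_cube, identity_attention, mean_intensity_classifier, ["metal", "fabric"])
```

Keeping the attention module and the classification model as separate callables mirrors the separation between the attention module 120 and the classification model unit 130, so either can be swapped for a trained component.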
The apparatus and the method according to the embodiment of the present invention may be implemented in the form of program commands that may be executed through various computer means and may be recorded in a computer-readable medium. The computer-readable medium may include a program command, a data file, a data structure, or the like, alone or in combination. The program commands recorded in the computer-readable medium may be specially designed and configured for the present invention or may be known to those skilled in the field of computer software. Examples of the computer-readable medium may include magnetic media such as a hard disk, a floppy disk, and a magnetic tape, optical media such as a compact disk read only memory (CD-ROM) or a digital versatile disk (DVD), magneto-optical media such as a floptical disk, and a hardware device specially configured to store and execute program commands, such as a ROM, a random access memory (RAM), a flash memory, or the like. Examples of the program commands include high-level language code capable of being executed by a computer using an interpreter or the like, as well as machine language code produced by a compiler.
The above-mentioned hardware device may be constituted to be operated as one or more software modules in order to perform an operation according to the present invention, and vice versa.
Hereinabove, the present invention has been described with reference to exemplary embodiments thereof. It will be understood by those skilled in the art to which the present invention pertains that the present invention may be implemented in a modified form without departing from essential characteristics of the present invention. Therefore, the exemplary embodiments disclosed herein should be considered in an illustrative aspect rather than a restrictive aspect. The scope of the present invention is shown in the claims rather than the above-mentioned description, and all differences within the scope equivalent to the claims will be interpreted to fall within the present invention.
Number | Date | Country | Kind |
---|---|---|---|
10-2022-0069865 | Jun 2022 | KR | national |