This application claims the priority benefit of China application serial no. 202011402819.2, filed on Dec. 2, 2020. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The invention relates to a method of generating an image recognition model and an electronic device using the method.
In the fields of image recognition (e.g., face recognition) and machine learning, transfer learning has become one of the important methods of training an image recognition model. A standard transfer learning process may include pre-training the model and fine-tuning the model. Pre-training the model includes the following. Source data containing a large amount of data is used to pre-train the model; appropriate feature data is identified to establish a preliminary image recognition model; and specific target data is then used to fine-tune the model. When the appropriate feature data cannot be identified during pre-training, the model is unable to yield a good result even if it is fine-tuned with the specific target data. Obtaining the appropriate feature data is particularly important in face recognition technology, especially when face information is incomplete (for example, covered by bangs, glasses, or a mask). Therefore, developing a method of obtaining the appropriate feature data is an important issue in the field.
The information disclosed in this Background section is only for enhancement of understanding of the background of the described technology and therefore it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art. Further, the information disclosed in the Background section does not mean that one or more problems to be resolved by one or more embodiments of the invention were acknowledged by a person of ordinary skill in the art.
The invention provides a method of generating an image recognition model and an electronic device using the method. The method may generate feature data adapted for pre-training and fine-tuning.
An aspect of the invention provides a method of generating an image recognition model. The method includes the following. A source image is obtained; a first image is cut out of a first region of the source image to generate a cut source image; a preliminary image recognition model is pre-trained according to first feature data and first label data to generate a pre-trained preliminary image recognition model, in which the first feature data is associated with the cut source image, and the first label data is associated with the first image; and the pre-trained preliminary image recognition model is fine-tuned to generate the image recognition model.
Another aspect of the invention provides an electronic device adapted for generating an image recognition model. The electronic device includes a transceiver and a processor. The transceiver obtains a source image. The processor is coupled to the transceiver, and is configured to: cut a first image out of a first region of the source image to generate a cut source image; pre-train a preliminary image recognition model according to first feature data and first label data to generate a pre-trained preliminary image recognition model, in which the first feature data is associated with the cut source image, and the first label data is associated with the first image; and fine-tune the pre-trained preliminary image recognition model to generate the image recognition model.
Based on the above, according to the embodiments of the invention, the source image or the target image is cut, and the cut source image or the cut target image is adapted for pre-training and fine-tuning the image recognition model.
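For illustration only, the following is a minimal Python sketch of the flow summarized above. The helper names pretrain() and finetune() are hypothetical placeholders for the pre-training and fine-tuning steps described in the detailed description; cut_region() is sketched concretely later in this description. This is not a definitive implementation of the embodiments.

```python
# A minimal, illustrative sketch of the claimed flow. pretrain() and
# finetune() are hypothetical placeholders; cut_region() is sketched
# further below.

def generate_image_recognition_model(source_image, first_region,
                                     target_image, second_region):
    # Cut a first image out of a first region of the source image;
    # the remainder is the cut source image.
    first_image, cut_source_image = cut_region(source_image, first_region)

    # Pre-train: first feature data from the cut source image,
    # first label data from the image that was cut out.
    pretrained = pretrain(features=cut_source_image, labels=first_image)

    # Fine-tune with target-image data prepared in the same way.
    second_image, cut_target_image = cut_region(target_image, second_region)
    return finetune(pretrained, features=cut_target_image,
                    labels=second_image)
```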
Other objectives, features and advantages of the present invention will be further understood from the further technological features disclosed by the embodiments of the present invention wherein there are shown and described preferred embodiments of this invention, simply by way of illustration of modes best suited to carry out the invention.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention. Also, it is to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. Unless limited otherwise, the terms “connected,” “coupled,” and “mounted,” and variations thereof herein are used broadly and encompass direct and indirect connections, couplings, and mountings.
The processor 110 is, for example, a central processing unit (CPU), or a programmable general-purpose or special-purpose micro control unit (MCU), microprocessor, digital signal processor (DSP), programmable controller, application specific integrated circuit (ASIC), graphics processing unit (GPU), image signal processor (ISP), image processing unit (IPU), arithmetic logic unit (ALU), complex programmable logic device (CPLD), field programmable gate array (FPGA), or a similar component, or a combination of the components mentioned above. The processor 110 may be coupled to the storage medium 120 and the transceiver 130, and may access and execute multiple modules and various applications stored in the storage medium 120, such as a module for generating an image recognition model. The processor 110 may, for example, read each step (or a computing layer) or process of a module or an application from the storage medium 120 and perform calculation, and then output a calculation result to the corresponding module or application (or computing layer) in the storage medium 120.
The storage medium 120 is, for example, any type of fixed or movable random access memory (RAM), read-only memory (ROM), flash memory, hard disk drive (HDD), solid state drive (SSD) or a similar component or a combination of the components mentioned above, and is adapted for storing the modules or various applications that may be executed by the processor 110 to implement a method of generating the image recognition model of the invention. The storage medium 120 may include, for example, a preliminary image recognition model 300 (or a pre-trained preliminary image recognition model 310 and an image recognition model 320) and a loss function calculation model 400.
The transceiver 130 transmits and receives signals in a wireless or a wired manner. The transceiver 130 may further perform operations such as low noise amplification, impedance matching, frequency mixing, frequency up-conversion or down-conversion, filtering, and amplification.
Then, the processor 110 may pre-train the preliminary image recognition model 300 according to first feature data A1 and first label data B1 to generate the pre-trained preliminary image recognition model 310. The first feature data A1 may be associated with the cut source image 210, and the first label data B1 may be associated with the first image 220. The pre-trained preliminary image recognition model 310 is fine-tuned to generate the image recognition model 320. In an embodiment, the processor 110 may encode the cut source image 210 to configure the cut source image 210 as the first feature data A1, and the processor 110 may encode the first image 220 to configure the first image 220 as the first label data B1.
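The following is a minimal numpy sketch of the cutting step, assuming for illustration that the image is a 2-D array, that the first region is an axis-aligned rectangle given as (top, left, height, width), and that the cut region is blanked with zeros; none of these choices is prescribed by the embodiments.

```python
import numpy as np

def cut_region(image: np.ndarray, region: tuple):
    """Cut a rectangular region out of an image.

    Returns the patch that was cut out (a basis for the label data) and a
    copy of the image with that region blanked (a basis for the feature
    data). The (top, left, height, width) rectangle and the zero fill are
    illustrative assumptions.
    """
    top, left, h, w = region
    patch = image[top:top + h, left:left + w].copy()
    cut_image = image.copy()
    cut_image[top:top + h, left:left + w] = 0  # blank the first region
    return patch, cut_image

# Example: cut a 32x32 first image 220 out of a 112x112 source image.
source_image = np.random.rand(112, 112)
first_image, cut_source_image = cut_region(source_image, (40, 40, 32, 32))
```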
Specifically, the preliminary image recognition model 300 is, for example, a neural network model having the architecture shown in the accompanying drawings.
On the other hand, the feature extraction layer 301 may further encode at least one region of the first image 220 according to the feature extraction algorithm to generate at least one embedding matrix of the first image, as shown in matrix (2), in which m is greater than or equal to 1 and n is greater than or equal to 1.
In an embodiment, the feature extraction algorithm may include an autoencoder, scale-invariant feature transform (SIFT), and/or a histogram of oriented gradients (HOG), but the invention is not limited thereto.
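As one concrete illustration of the feature extraction step, the sketch below encodes an image region into an embedding vector using the HOG implementation in scikit-image; an autoencoder or SIFT could be substituted, and the cell and block sizes are illustrative defaults rather than values prescribed by the embodiments.

```python
import numpy as np
from skimage.feature import hog  # scikit-image

def encode_region(region: np.ndarray) -> np.ndarray:
    """Encode one image region into an embedding vector using HOG, one of
    the feature extraction algorithms named above. Cell and block sizes
    are illustrative defaults."""
    return hog(region, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

# Stacking the vectors of m regions row-wise yields an m x n embedding
# matrix such as matrix (1) or matrix (2), with m >= 1 and n >= 1.
regions = [np.random.rand(32, 32) for _ in range(4)]
embedding_matrix = np.stack([encode_region(r) for r in regions])  # (4, 324)
```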
That is, the first embedding matrix of the source image, the second embedding matrix of the source image, and the third embedding matrix of the source image may be configured as the first feature data A1 adapted for pre-training the preliminary image recognition model 300, and the first embedding matrix of the first image and the second embedding matrix of the first image may be configured as the first label data B1 adapted for pre-training the preliminary image recognition model 300. Specifically, the layer 302 of the preliminary image recognition model 300 may be connected to the feature extraction layer 301 and may include two sub-layers, namely a normalization layer and a multi-head attention layer. After the first feature data A1 is generated, the normalization layer connected to the feature extraction layer 301 may normalize the first feature data A1 (for example, by normalizing the first embedding matrix of the source image, the second embedding matrix of the source image, and the third embedding matrix of the source image, respectively). The multi-head attention layer may apply an attention function to the normalized first feature data A1 to generate a correlation matrix A2 carrying information on the correlation between each pair of elements in the first feature data A1. After the correlation matrix A2 is generated, an adder 311 of the preliminary image recognition model 300 may add the correlation matrix A2 and the positionally encoded first feature data A1 to generate a matrix A3.
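A minimal PyTorch sketch of layer 302 together with adder 311 follows: normalization, then multi-head attention, then a residual addition with the positionally encoded first feature data A1. The model width and head count are illustrative assumptions, not parameters fixed by the embodiments.

```python
import torch
import torch.nn as nn

class AttentionSublayer(nn.Module):
    """Sketch of layer 302 plus adder 311. Dimensions are illustrative."""

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, a1: torch.Tensor) -> torch.Tensor:
        # a1: positionally encoded first feature data A1, with shape
        # (batch, sequence, d_model).
        x = self.norm(a1)            # normalization layer
        a2, _ = self.attn(x, x, x)   # correlation matrix A2
        return a1 + a2               # adder 311 outputs matrix A3
```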
The layer 303 of the preliminary image recognition model 300 may be connected to the layer 302 through the adder 311, and may include two sub-layers, namely a normalization layer and a feed forward layer. The normalization layer connected to the adder 311 may normalize the matrix A3. The normalized matrix A3 passes through the feed forward layer to generate a matrix A4. The feed forward layer may also include a fully connected layer adapted for outputting the matrix A4. An adder 312 of the preliminary image recognition model 300 may add the matrix A3 and the matrix A4 together to generate a matrix A5. The matrix A5 may be input to the softmax layer 304 to normalize the matrix A5 and generate an output image 306.
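In the same vein, the sketch below covers layer 303, adder 312, and the softmax layer 304: normalization, a feed-forward network ending in a fully connected layer, a residual addition, and a softmax normalization. The hidden width is an illustrative assumption.

```python
import torch
import torch.nn as nn

class FeedForwardSublayer(nn.Module):
    """Sketch of layer 303, adder 312, and softmax layer 304."""

    def __init__(self, d_model: int = 256, d_hidden: int = 1024):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_model),  # fully connected output layer
        )

    def forward(self, a3: torch.Tensor) -> torch.Tensor:
        a4 = self.ff(self.norm(a3))       # feed forward layer outputs A4
        a5 = a3 + a4                      # adder 312 outputs matrix A5
        return torch.softmax(a5, dim=-1)  # softmax layer 304 -> output 306
```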
After generating the output image 306, the processor 110 may input the output image 306 and the first label data B1 to the loss function calculation model 400 of the storage medium 120 to calculate a loss function LF, so as to pre-train the preliminary image recognition model 300 by using the loss function. For example, the loss function calculation model 400 may encode an image of the first region 22 corresponding to the output image 306 to generate an embedding matrix of an output label, calculate the loss function LF between the embedding matrix of the output label and the first label data B1, and feed the calculated loss function LF back to the preliminary image recognition model 300 for a pre-training adjustment.
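A minimal sketch of the loss calculation follows. The embodiments do not fix a particular loss function; mean squared error between the embedding matrix of the output label and the first label data B1 is used here purely as an illustrative choice.

```python
import torch
import torch.nn.functional as F

def pretraining_loss(output_label_embedding: torch.Tensor,
                     first_label_data_b1: torch.Tensor) -> torch.Tensor:
    """Sketch of loss function calculation model 400; MSE is an
    illustrative choice, not the prescribed loss."""
    return F.mse_loss(output_label_embedding, first_label_data_b1)

# Backpropagating this loss through the preliminary image recognition
# model 300 realizes the pre-training adjustment described above.
```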
In order to diversify the feature data and improve the noise processing ability of the image recognition model, the processor 110 may add noise to the feature data.
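The sketch below adds noise to the feature data using Gaussian perturbation and random blanking; both are illustrative choices, since the embodiments do not prescribe a particular noise type or strength.

```python
import numpy as np

def add_noise(feature_data: np.ndarray, sigma: float = 0.1,
              mask_ratio: float = 0.1, seed=None) -> np.ndarray:
    """Diversify the feature data by adding noise. Gaussian perturbation
    and random blanking are illustrative choices."""
    rng = np.random.default_rng(seed)
    noisy = feature_data + rng.normal(0.0, sigma, feature_data.shape)
    mask = rng.random(feature_data.shape) < mask_ratio
    noisy[mask] = 0.0  # randomly blank a fraction of the entries
    return noisy
```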
After the pre-training of the preliminary image recognition model 300 is completed, the processor 110 may fine-tune the pre-trained preliminary image recognition model 310 to generate the image recognition model 320. Specifically, the processor 110 may obtain a target image and cut a second image 520 out of a region of the target image to generate a cut target image 510, in a manner similar to the cutting of the source image.
Then, the processor 110 may fine-tune the pre-trained preliminary image recognition model 310 according to second feature data C1 and second label data D1 to generate the image recognition model 320. The second feature data C1 may be associated with the cut target image 510, and the second label data D1 may be associated with the second image 520. In an embodiment, the processor 110 may encode at least one region of the cut target image 510 to generate at least one embedding matrix of the target image, configure the at least one embedding matrix of the target image as the second feature data C1, and encode the second image 520 to generate at least one embedding matrix of the second image. The second label data D1 includes the at least one embedding matrix of the second image. It is to be noted that the preliminary image recognition model 300, the pre-trained preliminary image recognition model 310, and the image recognition model 320 have the same model architecture. The difference among them lies in the functions, feature data, weights, or parameters in each layer, which are not particularly limited in the disclosure.
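A minimal sketch of the fine-tuning step follows: the same architecture and procedure as pre-training, driven by the target-image data C1 and D1. The optimizer, learning rate, epoch count, and MSE loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def finetune(pretrained_model, second_feature_c1, second_label_d1,
             epochs: int = 10, lr: float = 1e-4):
    """Sketch of fine-tuning; hyperparameters and loss are illustrative."""
    optimizer = torch.optim.Adam(pretrained_model.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        output = pretrained_model(second_feature_c1)
        loss = F.mse_loss(output, second_label_d1)
        loss.backward()
        optimizer.step()
    return pretrained_model  # now serves as the image recognition model 320
```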
The method of fine-tuning the pre-trained preliminary image recognition model 310 to generate the image recognition model 320 is similar to the method of pre-training the preliminary image recognition model 300 to generate the pre-trained preliminary image recognition model 310, and details in this regard will not be repeated herein. It is to be noted that the pre-trained preliminary image recognition model 310 may replace the preliminary image recognition model 300 by updating, and the image recognition model 320 may replace the pre-trained preliminary image recognition model 310 by updating, but the invention is not limited thereto. In other embodiments, after pre-training and fine-tuning, the pre-trained preliminary image recognition model 310 and the image recognition model 320 may each be saved separately.
In order to diversify the second feature data and improve the noise processing ability of the image recognition model, the processor 110 may add noise to the second feature data C1.
In the method of generating the image recognition model of the invention, during pre-training and fine-tuning, the cut image or the noise-containing image is input intentionally, and the loss function is calculated with respect to the image of the region that was originally cut out. Therefore, the image recognition model of the invention may remove the interfered region or the noise in the input image and restore the input image to generate the output image.
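As an illustrative end-to-end use, the snippet below reuses cut_region() and add_noise() from the sketches above; image_recognition_model stands in for the trained image recognition model 320 and is assumed, per the behavior just described, to map a cut, noisy input image to a restored, noise-free output image. It is not defined by this sketch.

```python
import numpy as np

# image_recognition_model is a hypothetical stand-in for model 320.
face = np.random.rand(112, 112)
_, occluded = cut_region(face, (40, 40, 32, 32))  # simulate a covered region
noisy_input = add_noise(occluded)                 # simulate interference
restored = image_recognition_model(noisy_input)   # restored output image
```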
In summary, in the embodiments of the invention, the source image may be cut to generate the feature data and the label data adapted for pre-training, and the target image may be cut to generate the feature data and the label data adapted for fine-tuning. In addition, in the invention, the source image or the target image may be filled with noise to diversify the feature data. The image recognition model generated according to the embodiments of the invention may correctly restore the input image to generate the complete and noise-free output image when there is noise or deficiency in the input image.
The foregoing description of the preferred embodiments of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form or to the exemplary embodiments disclosed. Accordingly, the foregoing description should be regarded as illustrative rather than restrictive. Obviously, many modifications and variations will be apparent to practitioners skilled in this art. The embodiments are chosen and described in order to best explain the principles of the invention and its best mode of practical application, thereby enabling persons skilled in the art to understand the invention in various embodiments and with various modifications as are suited to the particular use or implementation contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents, in which all terms are meant in their broadest reasonable sense unless otherwise indicated. Therefore, the terms “the invention”, “the present invention” or the like do not necessarily limit the claim scope to a specific embodiment, and the reference to particularly preferred exemplary embodiments of the invention does not imply a limitation on the invention, and no such limitation is to be inferred. The invention is limited only by the spirit and scope of the appended claims. Moreover, these claims may use terms such as “first”, “second”, etc., followed by a noun or element. Such terms should be understood as a nomenclature and should not be construed as limiting the number of the elements modified by such nomenclature unless a specific number has been given. The abstract is provided to comply with the rules requiring an abstract, which will allow a searcher to quickly ascertain the subject matter of the technical disclosure of any patent issued from this application. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Any advantages and benefits described may not apply to all embodiments of the invention. It should be appreciated that variations may be made in the embodiments described by persons skilled in the art without departing from the scope of the present invention as defined by the following claims. Moreover, no element or component in the present disclosure is intended to be dedicated to the public regardless of whether the element or component is explicitly recited in the following claims.
Number | Date | Country | Kind
---|---|---|---
202011402819.2 | Dec. 2, 2020 | CN | national