The present disclosure claims priority to Chinese patent application No. 202010956031.X, filed with the China National Intellectual Property Administration (CNIPA) on Sep. 11, 2020 and titled “METHOD, SYSTEM AND APPARATUS FOR TRAINING OBJECT RECOGNITION MODEL”, which is incorporated herein by reference in its entirety.
The present disclosure relates to the field of model training and, more particularly, to a training method, system and device for an object recognition model.
With the rapid development of deep learning models in the field of computer vision, face recognition technology has progressed significantly, and model precision has essentially reached the level of human recognition; face recognition has therefore been widely applied in scenarios such as access control and attendance tracking.
In the training process of an existing face recognition model, the commonly adopted method is as follows: a face picture is input into a deep learning model, and the deep learning model outputs a feature vector representing feature information of the input picture; this feature vector is then multiplied by a parameter matrix (representing the respective feature information of a plurality of identities) whose size is linear in the total number of identities; a loss function is then calculated; and finally, a gradient is back-propagated to update all parameters of the parameter matrix and the deep learning model.
However, the size of the parameter matrix grows linearly with the total number of identities. If each identity is represented by a 128-dimensional vector, then with one billion identities in total the parameter matrix occupies about 0.5 TB of memory (10^9×128×4 B≈0.5 TB), and a Graphics Processing Unit (GPU) used for model training cannot store all of the parameter matrix data.
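The memory figure quoted above is simple arithmetic, which the following sketch reproduces (pure Python, using only the numbers stated in the text):

```python
# Memory footprint of the full parameter matrix for one billion identities,
# each represented by a 128-dimensional float32 (4-byte) vector.
num_identities = 10**9
emb_size = 128
bytes_per_value = 4  # float32

total_bytes = num_identities * emb_size * bytes_per_value
total_tb = total_bytes / 1024**4  # binary terabytes

print(round(total_tb, 2))  # about 0.47, i.e. roughly 0.5 TB
```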
At present, when a billion-scale face recognition model is trained, a model-parallel method is generally adopted: the complete parameter matrix is split across a plurality of GPUs, and after each GPU finishes its computation the results are reduced (aggregated). However, even with model parallelism, the problem that the GPUs cannot store the parameter matrix due to its excessive data volume is not effectively solved; moreover, the computation load on each GPU remains large, which slows down the model training process.
Therefore, how to solve the above technical problems is a problem to be addressed by those skilled in the art.
The objective of the present disclosure is to provide a training method, system and device for an object recognition model. The parameter matrix used for calculation during model training is a sub-matrix extracted from the original parameter matrix, and the extracted sub-matrix has a smaller data volume, so the computation load is reduced and the model training process is accelerated; moreover, the original parameter matrix is stored in an internal memory having larger storage space, which effectively solves the problem that the parameter matrix cannot be stored due to its excessive data volume.
In order to solve the above-mentioned technical problems, the present disclosure provides a training method for an object recognition model, including:
In an embodiment, the process of pre-storing the parameter matrix composed of the plurality of feature vectors for representing object feature information into the internal memory includes:
In an embodiment, the training method for the object recognition model further includes:
In an embodiment, the process of this round of training of the deep learning model includes:
In an embodiment, the training method for the object recognition model further includes:
In an embodiment, the deep learning model is a convolutional neural network model.
In order to solve the above-mentioned technical problems, the present disclosure further provides a training system for an object recognition model, including:
In an embodiment, the matrix storage module is configured to:
In an embodiment, the training system for the object recognition model further includes:
In order to solve the above-mentioned technical problems, the present disclosure further provides a training device for an object recognition model, including:
The present disclosure provides a training method for an object recognition model, including: pre-storing a parameter matrix into an internal memory; during model training, inputting sample pictures into a deep learning model to obtain sample feature vectors; extracting the feature vectors corresponding to the sample pictures from the parameter matrix, randomly extracting a certain number of feature vectors from the remaining parameter matrix, and reconstructing all extracted feature vectors into a new parameter matrix; multiplying the sample feature vectors by the new parameter matrix to obtain a similarity between each sample feature vector and each feature vector in the new parameter matrix; and calculating a loss function according to the similarity, performing back propagation of a gradient on the basis of the loss function, updating parameters of the new parameter matrix and the deep learning model, and updating the total parameter matrix in the internal memory on the basis of the updated new parameter matrix to complete this round of training of the deep learning model. It is thus clear that the parameter matrix used for calculation during model training is a sub-matrix extracted from the original parameter matrix, and the extracted sub-matrix has a smaller data volume, so the computation load is reduced and the model training process is accelerated; moreover, the original parameter matrix is stored in the internal memory having larger storage space, which effectively solves the problem that the parameter matrix cannot be stored due to its excessive data volume.
The present disclosure further provides a training system and device for an object recognition model, which have the same beneficial effects as the above-mentioned training method.
In order to explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely embodiments of the present disclosure, and a person skilled in the art may derive other drawings from the provided drawings without creative effort.
The core of the present disclosure is to provide a training method, system and device for an object recognition model. The parameter matrix used for calculation during model training is a sub-matrix extracted from the original parameter matrix, and the extracted sub-matrix has a smaller data volume, so the computation load is reduced and the model training process is accelerated; moreover, the original parameter matrix is stored in an internal memory having larger storage space, which effectively solves the problem that the parameter matrix cannot be stored due to its excessive data volume.
In order to make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the present disclosure.
With reference to
The training method for the object recognition model includes:
In some embodiments of the present disclosure, in view of the fact that the storage space of the internal memory is much larger than that of GPU equipment, the complete parameter matrix composed of the plurality of feature vectors representing object feature information is stored into the internal memory in advance, so the problem that the parameter matrix cannot be stored due to its excessive data volume is effectively solved.
It may be understood that each feature vector in the parameter matrix stored in the internal memory represents the feature information of one picture; the complete parameter matrix thus corresponds to a plurality of pictures, on the order of a billion, and the sample pictures required for subsequently training the deep learning model for object recognition are selected from these pictures.
Step S2: sample pictures are input into a deep learning model for object recognition during model training to obtain sample feature vectors for representing feature information of the sample pictures.
In some embodiments of the present disclosure, during training of the deep learning model, the sample pictures required for this round of training are first acquired from the plurality of pictures corresponding to the parameter matrix stored in the internal memory; the sample pictures are then input into the deep learning model, which outputs the sample feature vectors representing feature information of the sample pictures for use in subsequent calculation.
Step S3: the feature vectors corresponding to the sample pictures are extracted from the parameter matrix, a certain number of feature vectors are randomly extracted from a remaining parameter matrix, and all extracted feature vectors are reconstructed to be a new parameter matrix.
In some embodiments of the present disclosure, in view of the fact that the parameter matrix participating in calculation in the prior art is the complete parameter matrix stored in the internal memory and requires a great deal of computation, a new parameter matrix with a relatively small data volume is reconstructed in the present disclosure, so the computation load is reduced and the model training process is accelerated.
In some embodiments of the present disclosure, the process of reconstructing the new parameter matrix includes: on one hand, the feature vectors corresponding to the sample pictures (referred to as first feature vectors) are extracted from the complete parameter matrix stored in the internal memory; on the other hand, a certain number of feature vectors (referred to as second feature vectors) are randomly extracted from the remaining parameter matrix (the matrix formed by the feature vectors in the complete parameter matrix other than those corresponding to the sample pictures); all the extracted feature vectors are then reconstructed into the new parameter matrix (the first feature vectors plus the second feature vectors) for use in subsequent calculation.
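This reconstruction step can be sketched minimally with NumPy; the variable names (`full_matrix`, `batch_ids`, `num_random`) are illustrative and do not appear in the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

cls_size, emb_size = 10_000, 128          # toy scale; billion-level in practice
full_matrix = rng.standard_normal((cls_size, emb_size)).astype(np.float32)

batch_ids = np.array([3, 17, 42, 99])     # identities of this batch's sample pictures
num_random = 12                           # number of extra vectors to sample

# First feature vectors: rows matching the sample pictures.
first = full_matrix[batch_ids]

# Second feature vectors: rows drawn at random from the remaining matrix.
remaining_ids = np.setdiff1d(np.arange(cls_size), batch_ids)
random_ids = rng.choice(remaining_ids, size=num_random, replace=False)
second = full_matrix[random_ids]

# New, much smaller parameter matrix used for this round of training.
new_matrix = np.concatenate([first, second], axis=0)
print(new_matrix.shape)  # (16, 128)
```

Only the 16 sampled rows, rather than all 10,000, participate in the subsequent multiplication, which is the source of the reduced computation load.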
Step S4: the sample feature vectors and the new parameter matrix are multiplied to obtain a similarity between each of the sample feature vectors and each feature vector in the new parameter matrix.
In some embodiments of the present disclosure, after the sample feature vectors output by the deep learning model and the reconstructed new parameter matrix are obtained, the sample feature vectors are multiplied by the new parameter matrix to obtain the similarity between each sample feature vector and each feature vector in the new parameter matrix.
Step S5: a loss function is calculated according to the similarity, back propagation of a gradient is performed on the basis of the loss function, parameters of the new parameter matrix and the deep learning model are updated, and a total parameter matrix in the internal memory is updated on the basis of the updated new parameter matrix to complete this round of training of the deep learning model.
In some embodiments of the present disclosure, the loss function may be calculated according to the similarity between each sample feature vector and each feature vector in the new parameter matrix; back propagation of the gradient is performed on the basis of the loss function to update the new parameter matrix, and the total parameter matrix in the internal memory is updated on the basis of the updated new parameter matrix; back propagation of the gradient is further performed to update the parameters of the deep learning model, and this round of training of the deep learning model is then ended.
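Steps S2 to S5 can be sketched end to end as follows. This is a toy NumPy sketch rather than the disclosed implementation: the deep learning model is replaced by pre-computed sample feature vectors, the loss is assumed to be ordinary softmax cross-entropy over the reduced similarity matrix, and all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
emb_size, cls_size, batch, num_random, lr = 8, 100, 4, 6, 0.1

full_matrix = rng.standard_normal((cls_size, emb_size)).astype(np.float32)

# Step S2 (stand-in): sample feature vectors as output by the model.
sample_feats = rng.standard_normal((batch, emb_size)).astype(np.float32)
batch_ids = np.array([5, 20, 33, 70])     # identities of the sample pictures

# Step S3: reconstruct the reduced parameter matrix.
remaining = np.setdiff1d(np.arange(cls_size), batch_ids)
rand_ids = rng.choice(remaining, size=num_random, replace=False)
sub_ids = np.concatenate([batch_ids, rand_ids])
new_matrix = full_matrix[sub_ids]

# Step S4: similarity of every sample vector to every sub-matrix vector.
sims = sample_feats @ new_matrix.T        # shape (batch, batch + num_random)

# Step S5: softmax cross-entropy; sample i's identity sits at column i.
logits = sims - sims.max(axis=1, keepdims=True)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = np.arange(batch)
loss = -np.log(probs[labels, labels]).mean()

# Gradient w.r.t. the reduced matrix, then one SGD step.
grad_logits = probs.copy()
grad_logits[labels, labels] -= 1.0
grad_new = grad_logits.T @ sample_feats / batch
new_matrix = new_matrix - lr * grad_new

# Write the updated rows back into the full matrix kept in main memory.
full_matrix[sub_ids] = new_matrix
```

In the real system the backward pass would also flow into the deep learning model's parameters; here only the parameter-matrix side of the update is shown.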
It should be noted that, as shown in
The present disclosure provides a training method for an object recognition model, including: a parameter matrix is pre-stored into an internal memory; sample pictures are input into a deep learning model during model training to obtain sample feature vectors; the feature vectors corresponding to the sample pictures are extracted from the parameter matrix, a certain number of feature vectors are randomly extracted from the remaining parameter matrix, and a new parameter matrix is reconstructed from all the extracted feature vectors; the sample feature vectors are multiplied by the new parameter matrix to obtain a similarity between each sample feature vector and each feature vector in the new parameter matrix; and a loss function is calculated according to the similarity, back propagation of a gradient is performed on the basis of the loss function, parameters of the new parameter matrix and the deep learning model are updated, and the total parameter matrix in the internal memory is updated on the basis of the updated new parameter matrix to complete this round of training of the deep learning model. It is thus clear that the parameter matrix used for calculation during model training is a sub-matrix extracted from the original parameter matrix, and the extracted sub-matrix has a smaller data volume, so the computation load is reduced and the model training process is accelerated; moreover, the original parameter matrix is stored in the internal memory having larger storage space, which effectively solves the problem that the parameter matrix cannot be stored due to its excessive data volume.
On the basis of the above-mentioned embodiment:
In some embodiments of the present disclosure, the size of the complete parameter matrix kept in the internal memory is emb_size×cls_size, wherein emb_size is the size of one feature vector and cls_size is the total quantity of feature vectors included in the complete parameter matrix. The initial value of the parameter matrix is randomly generated; since one feature vector represents the feature information of one sample picture, the complete parameter matrix corresponds to cls_size pictures.
Based on this, the data volume of the new parameter matrix reconstructed in the present disclosure is m×emb_size×4B, wherein m is the total quantity of the feature vectors included in the new parameter matrix, and m is much smaller than cls_size.
In an embodiment of the present disclosure, the training method for the object recognition model further includes:
In an embodiment, the plurality of sample pictures corresponding to the complete parameter matrix stored in the internal memory may be stored into a dataset in advance, and sample identifiers (IDs) are configured for the plurality of sample pictures one by one, which is equivalent to configuring a label for each sample picture, thereby facilitating the subsequent acquisition of the required sample pictures.
Based on this, the process of acquiring the sample feature vectors used for subsequent calculation includes: a batch of sample IDs (referred to as target sample IDs) is randomly acquired from all the sample IDs, and the corresponding sample pictures (referred to as target sample pictures), namely the sample pictures required for this round of training of the deep learning model, are acquired from the dataset on the basis of the target sample IDs; the target sample pictures are then input into the deep learning model to obtain sample feature vectors representing feature information of the target sample pictures.
The process of acquiring the new parameter matrix used for subsequent calculation includes: on one hand, a batch of sample IDs (the target sample IDs) is randomly acquired from all the sample IDs; on the other hand, a certain number of sample IDs (referred to as random sample IDs) are randomly acquired from the remaining sample IDs (the sample IDs other than the target sample IDs); the feature vectors corresponding to the target sample IDs and the random sample IDs are then extracted from the complete parameter matrix stored in the internal memory, and all the extracted feature vectors are reconstructed into a new parameter matrix.
In an embodiment of the present disclosure, the process of this round of training of the deep learning model includes: different sample pictures are pre-allocated for different Graphics Processing Units (GPUs);
In some embodiments of the present disclosure, a plurality of GPUs participate in the training of the deep learning model. The training process includes: different sample pictures are pre-allocated to different GPUs (for example, with two GPUs participating in model training, sample pictures 1 and 2 are allocated to GPU 1, and sample pictures 3 and 4 are allocated to GPU 2); the sample pictures corresponding to any one of the GPUs (referred to as a target GPU) are input into the deep learning model to obtain target sample feature vectors representing feature information of those sample pictures; the feature vectors corresponding to all the sample pictures allocated to all the GPUs (for example, sample pictures 1, 2, 3 and 4 corresponding to feature vectors 1, 2, 3 and 4) are extracted from the complete parameter matrix stored in the internal memory, a certain number of feature vectors (such as feature vectors 5, 6, 7 and 8) are randomly extracted from the remaining parameter matrix, all the extracted feature vectors (feature vectors 1 to 8 in this example) are reconstructed into the new parameter matrix, and the new parameter matrix is transmitted to the target GPU; the target GPU multiplies the target sample feature vectors by the new parameter matrix to obtain a target similarity between each target sample feature vector and each feature vector in the new parameter matrix; a target loss function is calculated according to the target similarity, and back propagation of a gradient is performed on the basis of the target loss function to obtain the gradients of the to-be-updated parameter values of the new parameter matrix and the deep learning model; and the average of the gradients of the to-be-updated parameter values over all the GPUs is computed, the parameters of the new parameter matrix and the deep learning model are updated according to this average gradient, the total parameter matrix in the internal memory is updated on the basis of the updated new parameter matrix, and this round of training of the deep learning model is thereby ended.
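The gradient-averaging step across GPUs is, arithmetically, an element-wise mean of the per-GPU gradients. The following CPU-only sketch simulates it with plain arrays; a real system would use an all-reduce collective (e.g. via NCCL), and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
num_gpus, param_shape, lr = 2, (10, 8), 0.5

# Each simulated GPU computes its own gradient for the shared parameters
# from its own allocated sample pictures.
per_gpu_grads = [rng.standard_normal(param_shape) for _ in range(num_gpus)]

# All-reduce (mean): every GPU ends up with the same averaged gradient.
avg_grad = np.mean(per_gpu_grads, axis=0)

# The identical SGD update is then applied on every GPU.
params = np.zeros(param_shape)
params -= lr * avg_grad
```

Because every GPU applies the same averaged gradient, the replicated parameters stay synchronized after the update without any further communication.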
In an embodiment of the present disclosure, the training method for the object recognition model further includes:
In an embodiment, after the previous round of training of the deep learning model is completed, it may further be determined whether the deep learning model satisfies the model precision requirement for object recognition. If it does, the deep learning model no longer needs to be trained and may be put into use directly, so the training of the deep learning model is ended. If it does not, the deep learning model needs further training and may not be put into use directly; new sample pictures are therefore re-input into the deep learning model for the next round of training, until the deep learning model satisfies the precision requirement, at which point the training is ended.
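This stopping criterion amounts to a loop over rounds of training; in the sketch below, `train_one_round` and `evaluate_precision` are hypothetical placeholders for the round of training and the precision check described in the text:

```python
def train_until_precise(train_one_round, evaluate_precision,
                        target_precision, max_rounds=1000):
    """Repeat rounds of training until the model meets the precision
    requirement (or a safety cap on rounds is reached)."""
    for round_idx in range(1, max_rounds + 1):
        train_one_round()
        if evaluate_precision() >= target_precision:
            return round_idx  # training ended: requirement satisfied
    raise RuntimeError("precision requirement not reached")

# Toy usage: precision improves by 0.2 per round, so 0.9 is reached in round 5.
state = {"precision": 0.0}
rounds = train_until_precise(
    lambda: state.update(precision=state["precision"] + 0.2),
    lambda: state["precision"],
    target_precision=0.9,
)
print(rounds)  # 5
```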
In an embodiment of the present disclosure, the deep learning model is a convolutional neural network model.
In some embodiments of the present disclosure, the deep learning model may be, but is not limited to, a convolutional neural network model (such as a ResNet model or a SqueezeNet model), which is not limited herein.
With reference to
The training system for the object recognition model includes:
In an embodiment of the present disclosure, the matrix storage module 1 is configured to:
In an embodiment of the present disclosure, the training system for the object recognition model further includes:
For details of the training system provided in the present disclosure, reference may be made to the embodiments of the training method described above; the description is not repeated here.
The present disclosure further provides a training device for an object recognition model, including: a memory configured to store a computer program; and
For details of the training device provided in the present disclosure, reference may be made to the embodiments of the training method described above; the description is not repeated here.
For example,
It should also be noted that relational terms such as “first” and “second” in the present specification are used solely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between such entities or operations. Furthermore, the terms “comprise”, “include”, or any other variations thereof are intended to indicate a non-exclusive inclusion, such that a process, method, article or apparatus that includes a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article or apparatus. An element defined by the phrase “includes a . . . ” does not, without further constraints, preclude the existence of additional identical elements in the process, method, article or apparatus that includes the element.
The above description of the disclosed embodiments enables a person skilled in the art to implement or use the present disclosure. Various modifications to these embodiments will be obvious to a person skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments shown in this specification, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Number | Date | Country | Kind |
---|---|---|---|
202010956031.X | Sep 2020 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/109199 | 7/29/2021 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2022/052656 | 3/17/2022 | WO | A |
Number | Date | Country | |
---|---|---|---|
20230267710 A1 | Aug 2023 | US |