This application claims priority to Chinese Patent Application No. 202210424667.9, filed on Apr. 22, 2022 in China National Intellectual Property Administration and entitled “Person Re-identification Method, Apparatus, and Device and Storage Medium”, which is hereby incorporated by reference in its entirety.
The present application relates to a person Re-identification (Re-ID) method, apparatus, and device and a storage medium.
Person Re-ID is an important image identification technology widely used in public security systems, traffic supervision, and other fields. Person Re-ID determines whether persons captured by cameras distributed at various locations are the same person across different camera views. The inventors have realized that in some person Re-ID scenes there are many persons, and the massive data formed by person images needs to be labeled individually, which results in a significant workload and may even be unrealizable. Therefore, reducing the workload required for person Re-ID is currently an urgent problem to be addressed by those skilled in the art.
According to various embodiments disclosed in the present application, a person Re-ID method, apparatus, and device and a storage medium are provided.
A person Re-ID method includes:
A person Re-ID apparatus includes:
A person Re-ID device includes:
A computer-readable storage medium stores thereon computer-readable instructions that, when executed by a processor, implement any step of the person Re-ID method as described above.
The details of one or more embodiments of the present application are outlined in the drawings and the description below, and other features and advantages of the present application will be apparent from the specification, drawings, and claims.
To explain the embodiments of the present application or the technical solutions in the prior art more clearly, a brief introduction will be made to the drawings used in the embodiments or the description of the prior art. It is obvious that the drawings in the description below are only some embodiments of the present application, and those ordinarily skilled in the art can obtain other drawings according to these drawings without creative work.
The technical solutions in the embodiments of the present application will be described clearly and completely below in combination with the drawings in the embodiments of the present application. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those ordinarily skilled in the art based on the embodiments in the present application without creative work shall fall within the scope of protection of the present application. Referring to
S11: Acquire a data set, where pieces of data in the data set are unlabeled person images.
Unsupervised learning uses a large amount of unlabeled data for pattern identification; therefore, applying unsupervised learning to person Re-ID might both ensure the identification accuracy of persons and greatly reduce the workload.
The embodiments of the present application acquire an unlabeled data set represented as N; all the pieces of data in N are unlabeled person images; N_i represents the ith piece of data in N, where i∈[1,T] and T is the total number of pieces of data in N.
S12: Perform block processing on each piece of data in the data set, perform random ordering on each piece of blocked data to obtain out-of-order data corresponding to each piece of data, and generate negative sample data corresponding to each piece of data based on each piece of data and the corresponding out-of-order data.
Each piece of data in the data set is subjected to block processing, and the blocked parts are then randomly ordered to obtain the out-of-order data of that piece of data. The original data and the corresponding out-of-order data constitute a pair of positive sample data. The piece of data and its out-of-order data are further mixed to generate corresponding negative sample data. The embodiments of the present application may take each piece of data in the data set and the corresponding out-of-order data as positive sample data to enable unsupervised learning based on each piece of positive sample data and the corresponding negative sample data.
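The blocking-and-shuffling step above might be sketched as follows; the helper name and the use of NumPy are illustrative assumptions, not part of the original disclosure:

```python
import numpy as np

def make_out_of_order(image, num_blocks=3, rng=None):
    """Split an image into horizontal blocks and randomly reorder them.

    The original image and its out-of-order counterpart form a
    positive sample pair for contrastive learning.
    """
    rng = rng or np.random.default_rng()
    blocks = np.array_split(image, num_blocks, axis=0)  # split along height
    order = rng.permutation(num_blocks)
    shuffled = np.concatenate([blocks[i] for i in order], axis=0)
    return shuffled, order

# a toy "person image": height 10, width 4, 3 channels
img = np.arange(10 * 4 * 3, dtype=np.float32).reshape(10, 4, 3)
ooo, order = make_out_of_order(img, num_blocks=3)
# the out-of-order image keeps the same shape and pixel content,
# only the vertical block order changes
```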
S13: Perform unsupervised learning based on each piece of data in the data set, the out-of-order data of each piece of data, and the negative sample data of each piece of data to obtain a corresponding ID network, and perform person Re-ID based on the ID network.
A structure diagram of an ID network in an embodiment of the present application might be shown in
Taking
In the embodiments of the present application, after acquiring a data set containing unlabeled person images, each piece of data in the data set is subjected to block processing and random ordering to obtain out-of-order data corresponding to each piece of data; corresponding negative sample data is generated based on each piece of data in the data set and the corresponding out-of-order data; unsupervised learning is performed based on the corresponding negative sample data and the positive sample data composed of each piece of data in the data set and the corresponding out-of-order data, to obtain the ID network, realizing person Re-ID based on the ID network. It might be seen that the embodiments of the present application might automatically generate corresponding out-of-order data and negative sample data based on the unlabeled person images, and then perform unsupervised learning based on the unlabeled person images, the out-of-order data, and the negative sample data to obtain the ID network, realizing person Re-ID using the ID network without labeling massive data, thereby ensuring the accuracy of person Re-ID while effectively reducing the workload of person Re-ID and improving the efficiency of person Re-ID.
For the person Re-ID method provided by the embodiments of the present application, the performing block processing on each piece of data in the data set may include: performing block processing on each piece of data in the data set along the height dimension of the person in a preset proportion, whereby the head, upper limb, and lower limb of the person in the corresponding data are located in different blocks.
Before the performing block processing on each piece of data in the data set, the method further includes: performing data augmentation processing on each piece of data in the data set.
In the embodiments of the present application, during unsupervised training, one batch of data might be extracted from the data set N at every iteration; corresponding out-of-order data and negative sample data are generated based on the extracted data; and the current iterative training is then implemented based on the extracted data, the corresponding out-of-order data, and the corresponding negative sample data. The specific value of the batch size might be set according to actual needs, for example, extracting 4 pieces of data to constitute a batch. After each extraction of one batch of data, the currently extracted data is subjected to data augmentation processing. Methods for data augmentation processing include but are not limited to adding noise, rotation, blurring, and deduction. After the currently extracted data is subjected to data augmentation processing, the augmented data might be subjected to block processing in proportion along the height dimension of the person. In the embodiments of the present application, the blocking proportion might be 2:3:5 with 3 blocks in total, whereby the blocked parts of a single piece of data respectively include the head, the upper limb, and the lower limb of the person in the data, for example, as shown in
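The 2:3:5 height-wise blocking described above might be sketched as follows; the helper name and the exact rounding behavior are assumptions for illustration:

```python
import numpy as np

def split_by_height(image, proportion=(2, 3, 5)):
    """Split an image along the height dimension in a preset proportion,
    so the head, upper limb, and lower limb of the person fall into
    separate blocks. The 2:3:5 ratio follows the example in the text."""
    h = image.shape[0]
    total = sum(proportion)
    bounds, start = [], 0
    for p in proportion[:-1]:
        start += round(h * p / total)
        bounds.append(start)
    return np.split(image, bounds, axis=0)

img = np.zeros((100, 40, 3), dtype=np.float32)  # a 100-pixel-tall person image
head, upper, lower = split_by_height(img)
# heights: 20, 30, 50 for a 2:3:5 split of height 100
```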
For the person Re-ID method provided by the embodiments of the present application, the generating negative sample data corresponding to each piece of data based on each piece of data and the corresponding out-of-order data may include:
In the embodiments of the present application, the features of the negative sample data are obtained by multi-feature fusion: they are derived from the features of the original data, the features of the out-of-order data, and the central sample features, and these features are weighted to obtain the negative sample data. It should be noted that, in the embodiments of the present application, α, β, and η belong to the model weights, whose values are not fixed but may change as the model training progresses. In the early stage of training, the weight values of the neural network (NN) model are initialized randomly, which results in the positive sample data and the negative sample data being in a disordered state in the eigenspace; simply speaking, the feature distance between positive sample pairs is not necessarily close, and the feature distance between negative sample pairs is not necessarily far. This disordered state makes it difficult for the model to converge at the beginning of training. The embodiments of the present application add a central sample feature and a corresponding weight η to the negative sample data, where the central sample feature is obtained by averaging the K pieces of negative sample data participating in the calculation, and the weight is maximal at the first iteration and decreases as the number of iterations increases. This is because, at the beginning of training, setting a larger weight for the central sample feature might ensure that the central sample feature plays a dominant role in the negative sample data, further effectively reducing the disorder of negative sample data in the eigenspace at the beginning of training and accelerating model convergence. With the training iterations, the network model extracts more and more accurate features. To avoid the influence of the central sample features on the accuracy of the network model, the proportion of the central sample features in the negative sample data should be reduced.
In other words, the weights of the central sample features should decrease as the number of iterations increases. That is, the embodiments of the present application provide a central sample exit mechanism, where the formula for decreasing the weights of the central sample features is η=cos(iter/sum_iter). Through weight control, the mechanism might ensure that the value of the negative sample data is related to the number of training iterations and the central sample: the calculation of the negative sample feature mainly originates from the central sample features in the early stage of training, while the feature pressed into the negative sample queue mainly originates from the negative sample features of each sample in the late stage of training as the number of iterations increases, thereby effectively improving the iteration rate in the early stage of model training and suppressing the influence of the central sample features on the accuracy of the model in the late stage of training. Of course, based on the same idea, a similar exit mechanism might also be set for the positive samples, that is, the weights of the positive sample features decrease as the number of iterations increases. The weight decrease might be achieved by exponential decay or cosine decay, which will not be described in detail herein. Briefly, the negative samples in the embodiments of the present application are composed of multi-structure samples (different from the existing solutions), and all the multi-structure samples might be provided with an exit mechanism, with the corresponding weight gradually decreasing as the number of iterations increases.
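The weighted fusion and the central sample exit mechanism might be sketched as follows; α and β are fixed here only for illustration (in the text they are model weights that change during training):

```python
import numpy as np

def central_weight(iter_num, sum_iter):
    """Central-sample exit mechanism: eta = cos(iter/sum_iter),
    maximal at the first iteration and decreasing thereafter."""
    return np.cos(iter_num / sum_iter)

def fuse_negative(f_orig, f_shuffled, f_center, alpha, beta, iter_num, sum_iter):
    """Weighted fusion of the original-data feature, the out-of-order
    feature, and the central sample feature into one negative sample
    feature, per the description in the text."""
    eta = central_weight(iter_num, sum_iter)
    return alpha * f_orig + beta * f_shuffled + eta * f_center

f_o = np.ones(4)
f_s = np.full(4, 2.0)
f_c = np.full(4, 3.0)
early = fuse_negative(f_o, f_s, f_c, 0.3, 0.3, iter_num=0, sum_iter=100)
late = fuse_negative(f_o, f_s, f_c, 0.3, 0.3, iter_num=100, sum_iter=100)
# the central feature's contribution shrinks as training progresses
```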
After the generating negative sample data corresponding to each piece of data based on each piece of data and the corresponding out-of-order data, the method further includes: adding the newly generated negative sample data to a contrastive sample queue, the contrastive sample queue being a First-In-First-Out (FIFO) stack with a length of K.
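The FIFO behavior of a length-K contrastive sample queue might be sketched with a bounded deque (K=5 here is an arbitrary illustrative value):

```python
from collections import deque
import numpy as np

K = 5  # queue length; the text leaves K configurable
queue = deque(maxlen=K)  # a deque with maxlen acts as a FIFO: once full,
                         # appending a new feature evicts the oldest one

for step in range(8):
    queue.append(np.full(4, float(step)))  # push each new negative feature

# after 8 pushes into a length-5 queue, only features 3..7 remain
```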
One batch of data might be extracted from the data set N every iteration.
The embodiments of the present application successively input each piece of data in the data set and the corresponding out-of-order data into the ID network for training (as shown in
The embodiments of the present application use unsupervised learning; the positive sample data and all the negative sample data in the contrastive sample queue are used to compute a contrastive loss. Since the weights of the ID network are initially randomly initialized, the features of the positive sample data are not necessarily close, and the features of the negative sample data are not necessarily far, resulting in the positive sample data and the negative sample data being in a disordered state; therefore, fc, which might be referred to as the central sample feature, is added to the calculation of the negative sample data. In the early stage of training, the central sample features occupy a large weight, and with the training iterations, the network extracts more and more accurate features, so the weight of the fc feature will gradually decrease. The specific calculation formula of fc is as follows:
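Per the textual description (the exact formula appears in the referenced figure), fc might be computed as the average of the negative sample features currently participating in the calculation; a minimal sketch:

```python
import numpy as np

def central_sample_feature(neg_queue):
    """f_c: the average of the K negative sample features currently in
    the contrastive sample queue, following the textual description."""
    return np.mean(np.stack(list(neg_queue)), axis=0)

queue = [np.array([1.0, 2.0]), np.array([3.0, 4.0])]
fc = central_sample_feature(queue)
# fc is the element-wise mean of the queued features: [2.0, 3.0]
```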
For the person Re-ID method provided by the embodiments of the present application, the performing unsupervised learning based on each piece of data in the data set, the out-of-order data of each piece of data, and the negative sample data of each piece of data may include:
In the embodiments of the present application, after the construction of the contrastive sample queue is completed, real network training is started; the loss function may be calculated in the following formula:
The embodiments of the present application learn all the unlabeled data through the above loss function until all the data have been iterated; in addition, each time the loss is calculated, the weights in the ID network will be updated through loss back-propagation, whereby the model accuracy of the ID network is increasingly improved.
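The exact loss formula is given in the referenced figure; an InfoNCE-style contrastive loss consistent with the description (the positive pair pulled close, the queued negatives pushed away) might look like the sketch below. The temperature tau is an assumed hyperparameter, not from the original:

```python
import numpy as np

def contrastive_loss(f_query, f_positive, neg_feats, tau=0.07):
    """InfoNCE-style contrastive loss sketch: the loss is small when the
    query is similar to its out-of-order positive and dissimilar to the
    negative features drawn from the contrastive sample queue."""
    pos = np.exp(np.dot(f_query, f_positive) / tau)
    negs = np.exp(np.array([np.dot(f_query, n) for n in neg_feats]) / tau)
    return -np.log(pos / (pos + negs.sum()))

q = np.array([1.0, 0.0])
p = np.array([1.0, 0.0])    # identical positive -> low loss
n = [np.array([0.0, 1.0])]  # orthogonal negative
loss_good = contrastive_loss(q, p, n)
loss_bad = contrastive_loss(q, n[0], [p])  # positive far, negative close
```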
The person Re-ID method provided by the embodiments of the present application, after the obtaining a corresponding ID network, may further include:
The determining an extracted probability of each piece of data in the data set based on results of the classification may include:
It might be understood that, although unsupervised learning usually uses massive data for training, the training difficulty of each piece of data in the training set is different, and the distribution of data with different training difficulties in the training set is also different, which easily makes it difficult for the model to be effectively trained on data with different training difficulties. A general training set usually contains mostly common data that is easy to train and a small amount of difficult data that is hard to train. Since the amount of difficult data is small, the training effect of the ID network on this part of the data is poor, and it is difficult to achieve a good effect when recognizing such difficult data. Therefore, the difficult data in the training set should be selected, and the ID network should then be trained with the difficult data to improve the identification effect of the model on the difficult data. Based on this, the embodiments of the present application provide a sample selector for screening difficult data. The sample selector might increase the training opportunities of the difficult data, making the ID network contact more difficult data, further promoting the convergence of the ID network and improving the network performance. Furthermore, the total amount of training data might be reduced through data screening, the training time might be greatly reduced, and better results might be achieved within the same training time, which has great advantages for unsupervised training on massive data.
Of course, before describing the sample selector, it should be noted that the selector should be used in the later stage of training of the ID network model. In other words, the embodiments of the present application may perform multi-stage training on the ID network. In some embodiments, in the first stage, the ID network is trained with the full data to ensure that the model covers most of the easily recognizable data in the training set. When the identification effect of the network on the training set is relatively accurate, the second stage might be entered, namely, first using the sample selector provided in the embodiments of the present application to select difficult samples and then using the difficult samples to train again. The accuracy detection of the ID network in the embodiments of the present application is performed according to the loss values generated by the network in the iterative training process; namely, the embodiments of the present application acquire the loss values generated by the ID network in the latest preset number of iterations, calculate the average value of these loss values, and finally determine that the accuracy of the ID network meets the requirements when the average value is less than a preset threshold. For example, if the ID network has been trained a total of 100 times in the first stage, and the preset number is 10, then all the loss values generated in the 91st to 100th iterations of the network are averaged to determine the accuracy of the ID network. It should be noted that the embodiments of the present application do not define the specific numerical values of the preset number and the preset threshold, which might be set according to practical application requirements.
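The stage-switch check described above might be sketched as follows; the default values of both parameters are illustrative only, since the text leaves them application-dependent:

```python
def should_enter_stage_two(loss_history, preset_number=10, preset_threshold=0.5):
    """Average the loss values of the latest `preset_number` iterations
    and compare the average against a preset threshold; when the average
    is below the threshold, the accuracy requirement is met and the
    sample-selector stage may begin."""
    if len(loss_history) < preset_number:
        return False
    recent = loss_history[-preset_number:]  # e.g. iterations 91..100 of 100
    return sum(recent) / preset_number < preset_threshold

losses = [1.0] * 90 + [0.3] * 10  # 100 iterations; last 10 average to 0.3
```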
In some embodiments, the data screening method of the sample selector may include:
(C) All data categories are subjected to classification according to each cluster center to obtain data of a plurality of categories; the classification method uses nearest neighbor clustering, that is, judging which cluster center each piece of data is closest to; the data then belongs to the category of that cluster center.
After realizing the above clustering, the extracted probability of each piece of data at the next training might be calculated, and the specific implementation steps might include: traversing the data of all the categories to obtain the farthest distance of each category (as shown in
It is worth noting that the probability is proportional to the distance between the data and the cluster center, that is, the larger the distance, the higher the probability. This is because the distance reflects the ability of the ID network to extract features of each piece of data, and when the distance is greater, it indicates that the network has a weaker ability to extract features of corresponding data, that is, it indicates that the data is difficult data for the network and should be trained by the network; on the contrary, it indicates that the data belongs to common data, and the chance of being trained by the network should be reduced. Thus, for data with greater distance, a greater probability should be set to increase the probability that the data will be trained by the network.
After the extracted probability of each piece of data in the data set is calculated, each piece of data is extracted according to its probability before continuing to train the ID network. For example, in the second round of training, ½ of all the data will be extracted. Due to the advantages of the algorithm, data far from the center will be extracted with a high probability, while samples near the center will be extracted with a low probability.
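The nearest-neighbor assignment and the distance-proportional extraction probability might be sketched as follows. The exact probability formula in the original appears in a referenced figure; proportionality to the distance from the assigned cluster center is the stated behavior:

```python
import numpy as np

def extraction_probabilities(features, centers):
    """Assign each feature to its nearest cluster center, then make each
    sample's extraction probability proportional to its distance from
    that center, so far-from-center ("difficult") samples are drawn
    more often in the next training round."""
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    nearest = dists.min(axis=1)     # distance to the assigned center
    return nearest / nearest.sum()  # normalize into a distribution

feats = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]])
ctrs = np.array([[0.0, 0.0]])
p = extraction_probabilities(feats, ctrs)

# draw half of the data without replacement, weighted by p
rng = np.random.default_rng(0)
half = rng.choice(len(feats), size=len(feats) // 2, replace=False, p=p)
# the far sample (index 2) carries most of the probability mass
```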
The embodiments of the present application might make the proportion of data far from the cluster center greater in this round of training, thereby increasing the training difficulties, further improving the network accuracy, reducing the amount of data for training, and reducing the total training time.
In a specific implementation, the person Re-ID method provided by the embodiments of the present application may in some embodiments include the following steps:
In the present application, the unlabeled data is first processed, and the network is trained using the processed data; at the same time, more effective data might be screened in the training process to improve the network training efficiency. Therefore, applying unsupervised learning to person Re-ID might not only ensure the accuracy of person identification but also greatly reduce the workload.
The embodiments of the present application further provide a person Re-ID apparatus, as shown in
In one or more embodiments, the training module 13 is configured to take each piece of data in the data set and out-of-order data of each piece of data as positive sample data; and perform the unsupervised learning according to the positive sample data and the negative sample data of each piece of data.
In one or more embodiments, the training module 13 is further configured to save weights obtained by the unsupervised learning; and load the saved weights in response to performing person Re-ID using the ID network.
In one or more embodiments, the processing module 12 is configured to perform weighted fusion on features of each piece of data, features of corresponding out-of-order data of each piece of data, and central sample features to obtain the negative sample data corresponding to each piece of data.
In one or more embodiments, the processing module 12 may include a negative sample acquisition module; the negative sample acquisition module is configured to generate negative sample data corresponding to each piece of data according to the following formula:
In one or more embodiments, the training module 13 may include a loss function calculation module; the loss function calculation module is configured to calculate a loss function loss in an unsupervised learning process according to the following formula:
In one or more embodiments, the training module 13 is further configured to update, in response to the loss function being calculated, weights in the ID network by back-propagation of the loss function.
In one or more embodiments, the apparatus further includes a storage module; the storage module is configured to add the newly generated negative sample data to the contrastive sample queue after generating negative sample data corresponding to each piece of data based on each piece of data and the corresponding out-of-order data, the contrastive sample queue being a FIFO stack with a length of K.
In one or more embodiments, the storage module is configured to extract, in response to extracting one piece of data from one batch of data of the current iterative training, corresponding negative sample data from the contrastive sample queue, and delete the extracted negative sample data from the contrastive sample queue.
In one or more embodiments, the processing module 12 may include a blocking module; the blocking module is configured to perform block processing on each piece of data in the data set according to the height dimension of a person according to a preset proportion, whereby the head, upper limb, and lower limb of the person in the corresponding data are located at different blocks.
In one or more embodiments, the person Re-ID apparatus may further include a data augmentation module; the data augmentation module is configured to perform data augmentation processing on each piece of data in the data set before performing block processing on each piece of data in the data set.
In one or more embodiments, the person Re-ID apparatus may further include a screening module. The screening module is configured to: acquire loss values corresponding to the ID network in the latest preset number of training iterations, and calculate an average value of the loss values; extract, in response to determining that the average value is less than a preset threshold, data features of each piece of data in the data set using the ID network; perform clustering on the data features of each piece of data in the data set using mean shift clustering, and perform classification on the data in the data set based on results of the clustering; and determine an extracted probability of each piece of data in the data set based on results of the classification, and extract data from the data set based on the extracted probability to continue training the ID network.
In one or more embodiments, the screening module is configured to: determine the radius of a sliding window before starting sliding; calculate, in response to sliding to a new region, a mean within the sliding window as a central point, the number of points within the sliding window being a density within the sliding window; slide the sliding window until the density within the sliding window no longer increases; and retain, in response to a plurality of sliding windows overlapping, the sliding window containing the most data features, and perform clustering according to the sliding window to which the data features belong.
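The sliding-window steps above might be sketched as a minimal mean shift implementation; the radius, tolerance, and merge rule are illustrative assumptions:

```python
import numpy as np

def mean_shift(points, radius=1.0, max_iter=50, tol=1e-4):
    """Minimal mean shift sketch following the steps in the text: each
    window repeatedly moves to the mean of the points within its radius
    until membership stabilizes; overlapping windows that converge to
    nearly the same location are merged into one cluster center."""
    centers = points.copy()
    for _ in range(max_iter):
        new_centers = []
        for c in centers:
            in_window = points[np.linalg.norm(points - c, axis=1) <= radius]
            new_centers.append(in_window.mean(axis=0))
        new_centers = np.array(new_centers)
        if np.abs(new_centers - centers).max() < tol:
            break
        centers = new_centers
    # merge windows that converged to nearly the same point
    merged = []
    for c in centers:
        if not any(np.linalg.norm(c - m) < radius / 2 for m in merged):
            merged.append(c)
    return np.array(merged)

pts = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]])
centers = mean_shift(pts, radius=1.0)
# two well-separated groups -> two cluster centers
```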
In one or more embodiments, the screening module may include a probability calculation module; the probability calculation module is configured to: calculate an extracted probability of each piece of data in the data set according to the following formula:
Referring to
Referring to
It is to be noted that, for the description of the relevant parts of the person Re-ID apparatus and device and the storage medium provided by the embodiments of the present application, reference might be made to the detailed description of the corresponding parts of the person Re-ID method provided by the embodiments of the present application, which will not be repeated here. In addition, the parts of the above technical solutions provided by the embodiments of the present application that are consistent with the implementation principles of corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
It will be appreciated by those of ordinary skill in the art that implementing all or part of the flow of the methods of the embodiments described above may be accomplished by instructing the associated hardware via computer-readable instructions, which may be stored on one or more non-volatile computer-readable storage media; the computer-readable instructions, when executed, may include the flow of the embodiments of the above methods. Any references to memory, storage, databases, or other media used in the embodiments provided by the present application may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may include random-access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus DRAM (RDRAM), and direct Rambus DRAM (DRDRAM).
The previous description of the disclosed embodiments is provided to enable those skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein but is to be accorded the broadest scope consistent with the principles and novel features disclosed herein.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202210424667.9 | Apr 2022 | CN | national |
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/CN2022/111350 | 8/10/2022 | WO | |