This patent application claims the benefit and priority of Chinese Patent Application No. 202110732492.3, filed on Jun. 29, 2021, the disclosure of which is incorporated by reference herein in its entirety as part of the present application.
The present disclosure relates to the technical field of image processing, and in particular to a multi-task deep Hash learning-based retrieval method for massive logistics product images.
In recent years, with the rapid development of Internet and electronics technology, information on the Internet has grown explosively. As a result, massive multimedia data such as texts, images, and audio are uploaded almost every second. This poses great challenges to many areas requiring efficient nearest neighbor search, especially the retrieval of massive images. When the image database is small, the simplest and most direct approach is exhaustive search: calculate the Euclidean distance between each point in the database and the query point, and finally sort the points by distance. The time complexity is linear, O(dn), where d and n denote the dimension and the sample size of the data, respectively. However, when the database is large, such as millions to hundreds of millions of images, linear search is no longer applicable. In addition, there is a trend in computer vision toward using high-dimensional or structured data to express the image information of an object more accurately, and toward calculating the distance between images of the object with complex similarity formulas. In these cases, exhaustive search has serious limitations, which makes it impossible to complete the nearest neighbor search efficiently.
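The exhaustive search described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the database is a list of d-dimensional points held in memory, and `exhaustive_search` is a hypothetical helper name. Computing all n distances costs O(dn); the final sort adds O(n log n) on top of that.

```python
import math

def exhaustive_search(database, query, k):
    """Return the indices of the k database points closest to `query` (Euclidean)."""
    # One distance per database point: O(d * n) work in total.
    scored = [(math.dist(point, query), idx) for idx, point in enumerate(database)]
    # Sort by distance (ties broken by index); this adds O(n log n).
    scored.sort()
    return [idx for _, idx in scored[:k]]

database = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.2]]
print(exhaustive_search(database, [0.4, 0.1], k=2))  # [3, 0]
```

Every query touches every stored point, which is exactly what stops scaling once the database reaches millions of images.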
Therefore, approximate nearest neighbor search has recently been adopted to find effective solutions quickly. Hashing is an extensively studied approximate nearest neighbor search technique that converts documents, images, videos, and other multimedia information into compact binary codes while retaining the similarity between the original data. The distance between binary codes (also known as Hash codes) is measured by the Hamming distance, which can be computed quickly with hardware exclusive-OR (XOR) operations. Hash algorithms therefore have great advantages in storage and efficiency, making them among the most popular approximate nearest neighbor search algorithms. The present disclosure is oriented towards massive logistics product images in the logistics industry, where quickly and effectively searching a database for the required pictures has become a key problem to be solved. Owing to these advantages, Hash learning for approximate nearest neighbor search has become a powerful tool for searching massive data in recent years.
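The XOR-based Hamming distance can be illustrated as follows. This is a minimal sketch assuming each Hash code is packed into a Python integer; `pack_bits` and `hamming_distance` are hypothetical helper names. On hardware, the same computation typically maps to an XOR instruction followed by a population count.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 bits into one int (first bit = most significant)."""
    code = 0
    for b in bits:
        code = (code << 1) | b
    return code

def hamming_distance(code_a, code_b):
    """Number of differing bits: XOR the codes, then count the 1 bits."""
    return bin(code_a ^ code_b).count("1")

a = pack_bits([1, 0, 1, 1, 0, 0, 1, 0])
b = pack_bits([1, 1, 1, 0, 0, 0, 1, 1])
print(hamming_distance(a, b))  # 3
```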
In most Hash methods, a fixed length (e.g., 16, 32, or 48 bits) is first predetermined for the Hash code to be retrieved. The model is then trained to learn the Hash code as a high-level image representation, which is used to retrieve massive multimedia data quickly and effectively. Because the length is predefined, once the demand changes and a Hash code of another length is required for representation and retrieval, the model must be retrained to learn the new Hash code, which wastes hardware resources and increases time cost. Secondly, a Hash code is well known to be a compact representation of the original sample, and one sample can be represented by Hash codes of different lengths. Intuitively, Hash codes of different lengths representing the same sample reflect different types of specific information about the original sample. If they are treated as different views of the original sample, there should be both differences and connections among the views. When only Hash codes of a single length are considered, the potential relationships among them are ignored, resulting in loss of interactive information, reduced representational capacity, and low retrieval accuracy. Moreover, in most linear non-deep Hash algorithms, feature extraction and Hash function learning are carried out separately. Designing the Hash function is a complex task, and finding a method to optimize the model is even more difficult.
To overcome disadvantages of the above technologies, the present disclosure provides a multi-task deep Hash learning-based retrieval method for massive logistics product images, so as to improve the performance of Hashing retrieval.
The technical solution used in the present disclosure to resolve the technical problem thereof is as follows:
where s_ij denotes the similarity between the i-th image and the j-th image, s_ij ∈ {1, 0}; s_ij = 1 indicates that the i-th image is similar to the j-th image, and s_ij = 0 indicates that the i-th image is not similar to the j-th image; b_i denotes the binary Hash code of the i-th image, b_j denotes the binary Hash code of the j-th image, and T denotes transposition;
where B_k denotes the Hash code output from the k-th branch, k ∈ {0, …, N−1}, B_{k+1} denotes the Hash code output from the (k+1)-th branch, W_k denotes a mapping matrix that maps the Hash code output from the k-th branch to the Hash code output from the (k+1)-th branch, γ_k denotes a regularization parameter, ‖·‖_1 denotes the L1 norm, and a_k denotes an optimization parameter;
Preferably, there are five convolution layers in Step b); each convolution layer is connected to a pooling layer and adopts a convolution kernel with a size of 3×3, each pooling layer adopts a pooling kernel with a size of 2×2, and both the convolution layers and the pooling layers apply a ReLU activation function.
Preferably, the multi-branch network in Step c) is composed of N branches of the same structure, and each branch is composed of three fully connected layers connected in series.
Preferably, N in Step c) is a positive integer.
Preferably, M in Step f) is 5000.
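A minimal sketch of the preferred architecture follows, written in plain Python so that it runs without a deep learning framework. Much of it is assumption rather than disclosure: the padding and stride values, the 224×224 input size, the 64-dimensional image feature (a random vector standing in for the backbone output), the hidden width of 32, the random untrained weights, the sign() binarization, and the choice of one branch per code length (16, 32 and 48 bits, the example lengths mentioned in the background) are all hypothetical. `backbone_output_size` only tracks the spatial size through the five conv/pool stages; it does not perform the convolutions.

```python
import random

def conv_out(size, kernel, stride=1, padding=0):
    # Standard output-size formula for a convolution or pooling window
    return (size + 2 * padding - kernel) // stride + 1

def backbone_output_size(input_size, num_blocks=5):
    """Spatial size after num_blocks of [3x3 conv (stride 1, padding 1) -> 2x2 pool (stride 2)]."""
    size = input_size
    for _ in range(num_blocks):
        size = conv_out(size, kernel=3, stride=1, padding=1)  # padding 1 keeps the size
        size = conv_out(size, kernel=2, stride=2)             # 2x2 pooling halves it
    return size

def relu(v):
    return [max(0.0, x) for x in v]

def dense(v, weights):
    # weights: one row of input weights per output unit (no bias, for brevity)
    return [sum(w * x for w, x in zip(row, v)) for row in weights]

def random_layer(n_in, n_out, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]

def make_branch(feat_dim, hidden, code_len, rng):
    """One branch: three fully connected layers in series."""
    return [random_layer(feat_dim, hidden, rng),
            random_layer(hidden, hidden, rng),
            random_layer(hidden, code_len, rng)]

def branch_forward(feature, branch):
    h = relu(dense(feature, branch[0]))
    h = relu(dense(h, branch[1]))
    out = dense(h, branch[2])
    # Binarize the last layer with sign() to obtain a {-1, +1} Hash code
    return [1 if x >= 0 else -1 for x in out]

rng = random.Random(0)
feat_dim, hidden = 64, 32
code_lengths = [16, 32, 48]   # hypothetical: one branch per code length, so N = 3 here
branches = [make_branch(feat_dim, hidden, n_bits, rng) for n_bits in code_lengths]
feature = [rng.random() for _ in range(feat_dim)]   # stand-in for the backbone feature
codes = [branch_forward(feature, b) for b in branches]
print(backbone_output_size(224))   # 7
print([len(c) for c in codes])     # [16, 32, 48]
```

Because all branches read the same shared feature, one forward pass yields Hash codes of every length at once, which is the point of the multi-task design.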
The present disclosure has the following advantages. Following the idea of multi-tasking, Hash codes of a plurality of lengths are learned simultaneously as high-level image representations. Compared with single-task learning in the prior art, the method avoids the waste of hardware resources and the high time cost caused by retraining the model under single-task learning. Compared with the traditional approach of learning a single Hash code as the image representation for retrieval, the present disclosure mines the information association among Hash codes of a plurality of lengths and designs a mutual information loss to improve their representational capacity, which addresses the poor representational capacity of a single Hash code and thus improves the retrieval performance of Hash codes. Meanwhile, the model is trained end to end, that is, image feature extraction and Hash code learning are carried out simultaneously. Compared with traditional linear Hash methods, the model has an intuitive structure and is easy to migrate and deploy. The multi-task deep Hash learning-based image retrieval method therefore extends well to the retrieval of massive images and has broad application prospects in image retrieval for massive numbers of objects in the logistics industry.
The present disclosure is further described with reference to
A multi-task deep Hash learning-based retrieval method for massive logistics product images, including the following steps: a) Conduct image preprocessing on an input logistics product image x_i, and construct a similarity matrix S among logistics product images according to the label of the image x_i.
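Step a) does not spell out how labels define similarity. A common convention for multi-label data such as NUS-WIDE, assumed here, treats two images as similar when they share at least one label. The helper `similarity_matrix` and the example labels are hypothetical.

```python
def similarity_matrix(labels):
    """s_ij = 1 if images i and j share at least one label, else 0 (assumed rule)."""
    label_sets = [set(image_labels) for image_labels in labels]
    n = len(label_sets)
    return [[1 if label_sets[i] & label_sets[j] else 0 for j in range(n)]
            for i in range(n)]

labels = [["box"], ["box", "pallet"], ["envelope"]]
for row in similarity_matrix(labels):
    print(row)
# [1, 1, 0]
# [1, 1, 0]
# [0, 0, 1]
```

For single-label data the same rule reduces to "same class", so one helper covers both cases.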
where s_ij denotes the similarity between the i-th image and the j-th image, s_ij ∈ {1, 0}; s_ij = 1 indicates that the i-th image is similar to the j-th image, and s_ij = 0 indicates that the i-th image is not similar to the j-th image; b_i denotes the binary Hash code of the i-th image, b_j denotes the binary Hash code of the j-th image, and T denotes transposition. This formula mainly establishes a relationship between the Hash codes and the similarity of the original samples: if two original samples are similar, their Hash codes should be as similar as possible, and if they are not similar, their Hash codes should be dissimilar.
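The formula itself is not reproduced in this text, so the sketch below is an assumption, not the claimed loss. It shows two things. First, for codes in {−1, +1}^K the inner product b_i^T b_j fixes the Hamming distance through dist_H = (K − b_i^T b_j)/2, which is why raising the inner product of a similar pair lowers its Hamming distance. Second, it computes a pairwise negative log-likelihood of the kind used in earlier deep supervised hashing work (DPSH), with θ_ij = ½ b_i^T b_j. All function names are hypothetical.

```python
import math

def inner(b_i, b_j):
    # b_i^T b_j for codes stored as lists of -1/+1
    return sum(x * y for x, y in zip(b_i, b_j))

def hamming_from_inner(b_i, b_j):
    # For codes in {-1, +1}^K: dist_H = (K - b_i^T b_j) / 2
    return (len(b_i) - inner(b_i, b_j)) // 2

def softplus(t):
    # Numerically stable log(1 + exp(t))
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))

def pairwise_nll(codes, S):
    """Assumed DPSH-style loss: -sum_ij [s_ij * theta_ij - log(1 + exp(theta_ij))]."""
    loss = 0.0
    for i, b_i in enumerate(codes):
        for j, b_j in enumerate(codes):
            theta = 0.5 * inner(b_i, b_j)   # theta_ij = b_i^T b_j / 2
            loss -= S[i][j] * theta - softplus(theta)
    return loss

b1, b2 = [1, 1, -1, 1], [1, -1, -1, 1]
print(hamming_from_inner(b1, b2))  # 1
```

With s_ij = 1 the loss falls as θ_ij grows, and with s_ij = 0 it falls as θ_ij shrinks, matching the intent stated above: similar samples get similar codes, dissimilar samples get dissimilar codes.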
where B_k denotes the Hash code output from the k-th branch, k ∈ {0, …, N−1}, B_{k+1} denotes the Hash code output from the (k+1)-th branch, W_k denotes a mapping matrix that maps the Hash code output from the k-th branch to the Hash code output from the (k+1)-th branch, γ_k denotes a regularization parameter, ‖·‖_1 denotes the L1 norm, and a_k denotes an optimization parameter. Generally speaking, the length of a Hash code is positively correlated with its representational capacity. Minimizing the mutual information loss MILoss draws the representational capacity of a shorter Hash code closer to that of a longer Hash code and further strengthens the correlation among the Hash codes of a plurality of lengths, so that the learned Hash codes have good representational capacity and Hash code retrieval performance improves.
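The exact MILoss formula is not reproduced in this text. The sketch below assumes a plausible form built from the symbols defined above, MILoss = Σ_k (‖B_k W_k − B_{k+1}‖_F² + γ_k ‖W_k‖_1), with each B_k stored as an n × L_k matrix of ±1 codes (one row per image). The role of the optimization parameter a_k is not described here, so the sketch omits it; it also indexes the N branches 0 … N−1 and therefore uses N−1 mappings. `mi_loss` and `matmul` are hypothetical helpers.

```python
def matmul(A, B):
    # (n x p) @ (p x q) for nested lists
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mi_loss(codes, mappings, gammas):
    """Hypothetical MILoss: sum_k ||B_k W_k - B_{k+1}||_F^2 + gamma_k * ||W_k||_1."""
    total = 0.0
    for k, (W, gamma) in enumerate(zip(mappings, gammas)):
        pred = matmul(codes[k], W)          # map the k-th code toward the (k+1)-th code
        target = codes[k + 1]
        residual = sum((p - t) ** 2
                       for p_row, t_row in zip(pred, target)
                       for p, t in zip(p_row, t_row))
        l1 = sum(abs(w) for row in W for w in row)   # L1 norm of the mapping matrix
        total += residual + gamma * l1
    return total

B0 = [[1, -1], [-1, 1]]                   # 2 images, 2-bit codes
B1 = [[1, -1, 1, -1], [-1, 1, -1, 1]]     # same images, 4-bit codes
W0 = [[1, 0, 1, 0], [0, 1, 0, 1]]         # maps B0 exactly onto B1
print(mi_loss([B0, B1], [W0], [0.1]))     # 0.4
```

A small residual means the shorter code already carries enough information to reconstruct the longer one; the L1 term keeps each mapping sparse.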
Calculate a Hamming distance Dist_Hamming by the formula Dist_Hamming = ‖B_query ⊕ B_database‖, where ⊕ denotes bitwise exclusive OR and ‖·‖ counts the resulting 1 bits; then, based on Dist_Hamming, return the mean Average Precision (MAP) over the query set of all images to be retrieved to complete similarity retrieval.
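The retrieval and evaluation step can be sketched as follows. The sketch assumes codes packed into integers and a 0/1 relevance table derived from the similarity matrix. Ties in Hamming distance are broken by database order, because Python's sort is stable. AP is normalized by the number of relevant items retrieved within the cutoff, which is one common convention for MAP@top-M. Both choices are assumptions. The optional `top_m` cutoff is hypothetical, and the text does not say whether M in Step f) plays this role.

```python
def average_precision(ranked_relevance, top_m=None):
    """AP of one query from the 0/1 relevance of its ranked results."""
    if top_m is not None:
        ranked_relevance = ranked_relevance[:top_m]
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank     # precision at this relevant position
    return precision_sum / hits if hits else 0.0

def mean_average_precision(query_codes, db_codes, relevance, top_m=None):
    """Rank the database by Hamming distance (XOR + bit count) per query, then average AP."""
    aps = []
    for q, code in enumerate(query_codes):
        dists = [bin(code ^ d).count("1") for d in db_codes]
        order = sorted(range(len(db_codes)), key=lambda i: dists[i])
        aps.append(average_precision([relevance[q][i] for i in order], top_m))
    return sum(aps) / len(aps)

db = [0b0000, 0b1111, 0b0001, 0b1110]
rel = [[1, 0, 1, 0], [1, 0, 1, 0]]
print(mean_average_precision([0b0000, 0b1111], db, rel))  # ≈ 0.708 (= 17/24)
```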
In the multi-task deep Hash learning-based retrieval method for massive logistics product images, the theory of multi-view learning is adopted to mine the potential relevance of Hash codes of different lengths. Hash codes of a plurality of lengths are essentially different feature representations of the original data in Hamming space. Associative learning of these Hash codes exploits the complementarity and correlation of the features, and the process can also be regarded as multi-level feature fusion for the same samples. Related theories of multi-feature fusion and multi-view learning provide a theoretical and technical basis for the feasibility of this method, which further improves the performance of Hashing retrieval.
Table 1 provides the results of a first simulation experiment of the method of the present disclosure, measured by MAP. Test results on the NUS-WIDE data set show that multi-task learning outperforms single Hash code learning, which verifies the rationality of the multi-tasking idea.
Table 2 provides the results of a second simulation experiment of the method of the present disclosure, also measured by MAP. The NUS-WIDE data set is further used to study how the number of Hash code lengths learned simultaneously influences the performance of a Hash code of any given length, and the results verify that learning more Hash codes at the same time also improves the retrieval performance of a Hash code of any given length (taking 24 bits as an example).
Finally, it should be noted that the above descriptions are only preferred embodiments of the present disclosure and are not intended to limit the present disclosure. Although the present disclosure is described in detail with reference to the foregoing embodiments, a person skilled in the art can still make modifications to the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features therein. Any modifications, equivalent substitutions, improvements, and the like made within the spirit and principle of the present disclosure should be included within the protection scope of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
202110732492.3 | Jun 2021 | CN | national |
Number | Date | Country |
---|---|---|
20220414144 A1 | Dec 2022 | US |