This application claims priority to Chinese Patent Application No. 202111259001.4, filed on 28 Oct. 2021, the contents of which are incorporated herein by reference.
The present invention relates to the technical field of image retrieval, and in particular to a method and system for rapid retrieval of target images based on artificial intelligence.
Information of the related art part is merely disclosed to increase the understanding of the overall background of the present invention, but is not necessarily regarded as acknowledging or suggesting, in any form, that the information constitutes the prior art known to a person of ordinary skill in the art.
Article retrieval aims to use computers or robots to process, analyze and understand images captured by cameras, so as to identify targets and objects in a variety of different patterns; it is an important research topic in the field of computer vision.
Nowadays, robots can be used to collect images of real environments. For simple images, it is easy to learn a suitable feature representation that distinguishes them from samples with different semantics; in complex scenarios, however, the images require more attention to obtain an appropriate feature representation. For example, in multi-label learning (where an image carries multiple labels), the similarity among images is not transitive: an image A may be similar to an image B (A and B share one or more labels) and similar to an image C, while B is not similar to C (B and C share no labels). Popular article retrieval methods nevertheless treat all samples equally, which leads to relatively poor generalization performance in complex scenarios.
(1) Article retrieval in complex scenarios involves a large number of confusable entities that generally have similar feature representations, and popular article retrieval methods cannot distinguish such entities (they do not account for how easily the entities are confused).
(2) Article retrieval in complex scenarios requires more accurate image similarity, so that the true similarity relationships of images can be mined to guide the generation of image features; yet existing article retrieval methods do not consider mining these similarity relationships.
(3) Article retrieval in complex scenarios needs to give more attention to complex samples and to apportion that attention reasonably, but existing article retrieval methods treat all samples equally.
To overcome the shortcomings in the prior art, the present invention provides a method and system for rapid retrieval of target images based on artificial intelligence.
In a first aspect, the present invention provides a method for rapid retrieval of target images based on artificial intelligence.
The method for rapid retrieval of target images based on artificial intelligence comprises:
obtaining a template image and a plurality of known labels corresponding to the template image;
extracting an image to be detected from a target image database;
inputting both the image to be detected and the template image into a trained convolutional neural network, and outputting a hash code of the image to be detected and a hash code of the template image; and
obtaining a similarity between the image to be detected and the template image based on a Hamming distance between the hash code of the image to be detected and the hash code of the template image, wherein a smaller Hamming distance indicates a higher similarity; and selecting one or more images to be detected whose similarity is higher than a set threshold as a retrieval result to be output.
In a second aspect, the present invention provides a system for rapid retrieval of target images based on artificial intelligence.
The system for rapid retrieval of target images based on artificial intelligence comprises: an acquisition module configured to obtain a template image and a plurality of known labels corresponding to the template image; an extraction module configured to extract an image to be detected from a target image database; a conversion module configured to input both the image to be detected and the template image into a trained convolutional neural network and to output a hash code of the image to be detected and a hash code of the template image; and an output module configured to obtain a similarity between the image to be detected and the template image based on a Hamming distance between the two hash codes, wherein a smaller Hamming distance indicates a higher similarity, and to select one or more images to be detected whose similarity is higher than a set threshold as a retrieval result to be output.
Compared with the prior art, the beneficial effects of the present invention are as follows:
According to the present invention, artificial intelligence technology is used to extract, based on a convolutional neural network, the image features of image samples collected by the robot vision platform in complex scenarios, and a hashing method is used to generate the image features; by distinguishing confusable entities, optimizing the similarity relationships and differentiating the attention given to samples, the retrieval of items in complex scenarios is better handled.
Advantages of additional aspects of the present invention will be set forth in part in the description that follows, or will be learned by practice of the present invention.
The accompanying drawings constituting a part of the present invention are used to provide a further understanding of the present invention. The exemplary examples of the present invention and descriptions thereof are used to explain the present invention, and do not constitute an improper limitation of the present invention.
It should be pointed out that the following detailed descriptions are all illustrative and are intended to provide further descriptions of the present invention. Unless otherwise specified, all technologies and scientific terms used in the present invention have the same meanings as those usually understood by a person of ordinary skill in the art to which the present invention belongs.
All data used in the present example are obtained and applied in compliance with laws and regulations and with the consent of the users.
The present example provides a method for rapid retrieval of target images based on artificial intelligence.
As shown in the accompanying drawing, the method for rapid retrieval of target images based on artificial intelligence comprises:
S101: obtaining a template image and a plurality of known labels corresponding to the template image;
S102: extracting an image to be detected from a target image database;
S103: inputting both the image to be detected and the template image into a trained convolutional neural network, and outputting a hash code of the image to be detected and a hash code of the template image; and
S104: obtaining a similarity between the image to be detected and the template image based on a Hamming distance between the hash code of the image to be detected and the hash code of the template image, wherein a smaller Hamming distance indicates a higher similarity; and selecting one or more images to be detected whose similarity is higher than a set threshold (e.g., 90%) as a retrieval result to be output.
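By way of non-limiting illustration, steps S103 and S104 may be sketched in Python (PyTorch) as follows; the function and variable names, the mapping from Hamming distance to a percentage similarity, and the threshold handling are assumptions of this sketch rather than the claimed implementation.

```python
import torch

def hamming_distance(code_a: torch.Tensor, code_b: torch.Tensor) -> torch.Tensor:
    # Hash codes are +/-1 vectors of length K; the Hamming distance counts differing bits.
    return (code_a != code_b).sum(dim=-1)

def retrieve(trained_cnn, template_img, candidate_imgs, threshold=0.9):
    """Select images to be detected whose similarity to the template image
    exceeds the set threshold (e.g., 90%), as in steps S103 and S104."""
    with torch.no_grad():
        h_template = trained_cnn(template_img.unsqueeze(0))  # template_img: [C, H, W]
        h_candidates = trained_cnn(candidate_imgs)           # candidate_imgs: [N, C, H, W]
    b_template = torch.sign(h_template)                      # hash code of the template image
    b_candidates = torch.sign(h_candidates)                  # hash codes of the images to be detected
    k = b_template.shape[-1]
    dist = hamming_distance(b_candidates, b_template)        # smaller distance => higher similarity
    similarity = 1.0 - dist.float() / k                      # map distance to a [0, 1] similarity
    keep = similarity > threshold
    return candidate_imgs[keep], similarity[keep]
```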
Exemplarily, the template image is a known image, and the known labels corresponding to the template image include, for example, mountain, water, tree, flower, animal, pedestrian, road, vehicle, and the like.
Exemplarily, in S102, the image to be detected is extracted from the target image database, wherein the extraction rule is extraction without replacement (each image is extracted at most once).
Further, the convolutional neural network is an improved CNN-F network (CNN-F being the "fast" convolutional neural network architecture originally introduced in work comparing convolutional networks with the improved Fisher vector encoding).
Wherein, the network structure of the improved convolutional neural network CNN-F comprises five convolutional layers and three fully connected layers, followed by a Tanh activation function layer as described below. The output dimension of the third fully connected layer is set to K dimensions.
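A minimal sketch of such a network head is given below for illustration; the torchvision AlexNet backbone (whose five-convolutional/three-fully-connected layout resembles CNN-F), the default code length, and all names are assumptions of this sketch, not the patented architecture.

```python
import torch
import torch.nn as nn
from torchvision.models import alexnet

class HashNet(nn.Module):
    """Backbone with five conv and three FC layers; the third FC layer outputs K
    dimensions and is followed by a Tanh layer bounding each dimension to [-1, 1]."""
    def __init__(self, k: int = 48):  # k: hash code length K (illustrative value)
        super().__init__()
        backbone = alexnet(weights=None)             # stand-in for the CNN-F backbone
        backbone.classifier[6] = nn.Linear(4096, k)  # third FC layer resized to K dims
        self.backbone = backbone
        self.tanh = nn.Tanh()                        # activation layer described below

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Relaxed output h_I in [-1, 1]; the binary code is b_I = torch.sign(h_I).
        return self.tanh(self.backbone(x))
```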
Further, the trained convolutional neural network is trained as follows: different loss functions are used for training in different situations.
Further, obtaining the trained convolutional neural network by training with different loss functions in different situations specifically comprises the following.
Further, the unified loss function Lu based on the similarity of the hash codes is expressed by the following formula:
It should be understood that the image to be detected I, obtained by using the robot vision platform, is first sent into the convolutional neural network (CNN) to obtain the image features F ∈ R^(C×H×W), wherein C, H and W are respectively the number of channels, the height and the width of the image feature F.
The present invention adopts a deep supervised hash learning method, and the loss function uses a Circle Loss triplet loss. The Circle Loss provides a simple and intuitive idea for article retrieval in complex scenarios in the form of a triplet. The triplet loss involves a prediction score Spi between an anchor point x and each of its positive samples xi (samples sharing at least one class with x), and a prediction score Snj between the anchor point x and each of its negative samples xj (samples sharing no class with x).
In the retrieval task, whether two images are similar is determined by whether they contain objects of the same class; therefore, when an image pair contains multi-entity complex scene images, the actual similarity of the image pair differs from the similarity used at retrieval. Specifically, for a given anchor point, different positive samples may share different numbers of similar objects with the anchor point (i.e., different numbers of intersecting category labels), which means the actual similarity between a positive sample and the anchor point varies (obviously, the more category labels intersect, the higher the actual similarity should be). Although an image pair has only two kinds of labels, similar and dissimilar, treating the similarity of the image pair as only these two kinds during training causes certain problems; therefore, the degrees of similarity between image pairs should be distinguished.
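As a concrete illustration of the graded, label-based similarity described above, one assumed choice (intersection over union of the label sets; the exact definition is not specified in this text) can be computed as follows:

```python
import torch

def label_similarity(labels_a: torch.Tensor, labels_b: torch.Tensor) -> torch.Tensor:
    """Graded similarity between multi-hot label matrices of shapes [N, L] and [M, L]:
    0 when two images share no category label, higher the more labels intersect."""
    la, lb = labels_a.float(), labels_b.float()
    inter = la @ lb.t()                                           # number of shared labels
    union = (la.sum(1, keepdim=True) + lb.sum(1) - inter).clamp(min=1.0)
    return inter / union                                          # values in [0, 1]
```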
The present invention adds a new Tanh function layer after the last layer of the existing CNN-F network, and this Tanh function layer serves as an activation function layer that limits each dimension of the network output to the interval [−1, 1]. The input image I is processed by the CNN-F network to obtain the image feature F, which is then processed by the Tanh activation layer to obtain an output hI, so that the hash code bI = sign(hI) can be obtained, wherein sign(·) is the sign function.
In order to facilitate the optimization of the objective function, the present method uses hI directly in the objective function instead of bI. Thus, the predicted similarity score between the hash codes of samples xi and xj is defined as follows:
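One common form of this prediction score in deep supervised hashing, given here as an illustrative sketch (the exact patented formula may differ), is the inner product of the relaxed outputs scaled by the code length K:

```python
def prediction_score(h_i, h_j):
    # Assumed form: inner product of relaxed outputs h in [-1, 1]^K, scaled by K,
    # so the predicted similarity score lies in [-1, 1]. Using h instead of
    # b = sign(h) keeps the objective differentiable, as noted above.
    k = h_i.shape[-1]
    return (h_i * h_j).sum(dim=-1) / k
```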
Further, the loss function Lsu that weights the interval of the similarity prediction scores of positive and negative samples is expressed as:
It should be understood that minimizing Eq. (1) makes the prediction scores of all negative samples as low as possible and the prediction scores of all positive samples as high as possible.
However, Eq. (1) does not consider the degree of similarity between each positive sample and the anchor; that similarity should be allowed to weight the interval between the similarity prediction scores of the positive and negative samples.
Obviously, the loss function (Eq. (2)) focuses on positive samples with low similarity prediction scores, and such positive samples are generally complex scene images, so the retrieval ability for complex scene images is improved. In addition, the greater the similarity between a positive sample and the anchor point, the greater the corresponding interval set in the equation; thus, among positive samples with the same similarity prediction score, the loss function gives greater weight (i.e., attention) to those with higher actual similarity, further optimizing the distribution of the hash codes.
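The following sketch shows one plausible form of this behavior: the base expression follows the unified loss of Circle Loss (Sun et al., 2020), and scaling the margin by the label-based similarity of each positive sample is this sketch's reading of the interval weighting, not necessarily the formula of Eq. (2).

```python
import torch

def unified_loss_weighted(s_p, s_n, sim_p, gamma=32.0, m=0.5):
    """s_p: predicted scores of positive pairs, shape [P];
    s_n: predicted scores of negative pairs, shape [N];
    sim_p: actual (label-based) similarity of each positive to the anchor, shape [P].
    With sim_p = 1 everywhere this reduces to a plain unified loss as in Eq. (1)."""
    # Pairwise differences s_n[j] - s_p[i] with a per-positive margin m * sim_p[i]:
    # the more similar a positive is to the anchor, the wider the required interval.
    diff = s_n.unsqueeze(0) - s_p.unsqueeze(1) + m * sim_p.unsqueeze(1)  # shape [P, N]
    # log(1 + sum(exp(gamma * diff))) == softplus(logsumexp(gamma * diff))
    return torch.nn.functional.softplus(torch.logsumexp(gamma * diff, dim=(0, 1)))
```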
Further, the loss function Lc is expressed as:
If the range of values of the similarity prediction score used is [0, 1], the loss function is as shown in Eq. (3).
At this time, the similarity prediction scores of negative samples are centered at −1; introducing the interval weighting then yields a new loss function, Eq. (4).
Further, the loss function Lsc is expressed as:
The three variables in the triplet are referred to as the anchor point, the positive sample and the negative sample; the positive sample is similar to the anchor point, and the negative sample is not. The greater the similarity between the positive sample and the anchor point, the smaller the radius of the circle loss function, i.e., the closer the similarity prediction scores of the positive and negative samples are required to be to 1 or −1, respectively; conversely, the smaller the similarity, the weaker the constraint on the distance.
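As a hedged sketch of this behavior, the code below follows the standard Circle Loss formulation with scores assumed rescaled to [0, 1]; shrinking the radius as the actual similarity grows, via a per-positive margin m·(1 − sim_p), is this sketch's reading of the description above and is an assumption, not the formula of Eq. (4).

```python
import torch

def circle_loss_weighted(s_p, s_n, sim_p, gamma=32.0, m=0.25):
    """Circle loss whose radius shrinks as the label-based similarity sim_p grows:
    for sim_p near 1 the margin vanishes, forcing s_p toward 1 and s_n toward 0;
    for small sim_p the constraint on the distance is relaxed."""
    m_i = m * (1.0 - sim_p)                                   # per-positive radius
    alpha_p = torch.clamp(1.0 + m_i - s_p.detach(), min=0.0)  # self-paced positive weights
    alpha_n = torch.clamp(s_n.detach() + m, min=0.0)          # self-paced negative weights
    logit_p = -gamma * alpha_p * (s_p - (1.0 - m_i))          # pushes s_p toward 1
    logit_n = gamma * alpha_n * (s_n - m)                     # pushes s_n toward 0
    # log(1 + sum_i exp(logit_p_i) * sum_j exp(logit_n_j))
    return torch.nn.functional.softplus(
        torch.logsumexp(logit_p, dim=0) + torch.logsumexp(logit_n, dim=0))
```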
Further, the loss function Lh is expressed as:
Further, the unified loss function Lsus weighted based on the interval of the optimized similarity matrix is expressed as:
Further, the circular loss function Lscs weighted based on the interval of the optimized similarity matrix is expressed as:
Further, the circular loss function Lhs based on the similarity matrix combined with the characteristics of the hash retrieval task is expressed as:
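Purely as a non-limiting illustration of how a label-derived similarity matrix could supply the similar/dissimilar split and the interval weights described above, a training step combining the earlier sketches might look as follows; every name and choice here is an assumption of this sketch rather than the patented procedure.

```python
import torch

def training_step(model, images, labels, optimizer, gamma=32.0, m=0.5):
    """One illustrative step: each image in the batch serves as an anchor, and the
    similarity matrix built from the multi-hot labels supplies both the
    similar/dissimilar split and the per-positive interval weights."""
    h = model(images)                           # relaxed outputs in [-1, 1], shape [B, K]
    scores = h @ h.t() / h.shape[1]             # pairwise predicted similarity scores
    sim = label_similarity(labels, labels)      # graded similarity matrix in [0, 1]
    losses = []
    for a in range(images.shape[0]):
        pos, neg = sim[a] > 0, sim[a] == 0      # shared labels => positive pair
        pos[a] = False                          # exclude the anchor itself
        if pos.any() and neg.any():
            losses.append(unified_loss_weighted(
                scores[a][pos], scores[a][neg], sim[a][pos], gamma, m))
    if not losses:
        return 0.0
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```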
The data of the present invention come from pictures of objects collected by the robot vision platform in real environments, such as digital devices, underwater fish, wild land animals, landmark buildings and various other objects. The pre-processing includes the previously mentioned weakly supervised background removal, random erasure, normalization, random rotation, and the like.
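For illustration, the mentioned pre-processing steps (apart from the weakly supervised background removal, which is a separate learned step not shown) might be composed as follows; the parameter values are assumptions of this sketch.

```python
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomRotation(degrees=15),             # random rotation
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # normalization (ImageNet stats)
                         std=[0.229, 0.224, 0.225]),
    transforms.RandomErasing(p=0.5),                   # random erasure (tensor-level)
])
```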
According to the present invention, the image retrieval problem in complex scenarios can be better handled: the image features are generated by using hashing methods, easily confused entities are distinguished in the loss functions, and more accurate image similarity relationships are obtained, while more attention is given to complex samples. At the same time, the structure of the model is intuitive and easy to migrate and deploy. The evaluation metric uses mAP (mean Average Precision), by which the present invention is more accurate than the compared methods, with significantly superior performance especially on the two multi-label datasets NUS-WIDE and MS-COCO.
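For reference, mAP for Hamming-ranked retrieval is commonly computed as below; this is a generic sketch of the metric, not the evaluation code of the present invention.

```python
import numpy as np

def mean_average_precision(dist, relevant, top_k=None):
    """dist: [Q, N] Hamming distances between Q query codes and N database codes;
    relevant: boolean [Q, N] ground truth (query and database image share a label).
    Returns the mean over queries of the average precision of the ranked list."""
    aps = []
    for q in range(dist.shape[0]):
        order = np.argsort(dist[q])              # ascending distance = best first
        rel = relevant[q][order]
        if top_k is not None:
            rel = rel[:top_k]
        if not rel.any():
            continue                             # no relevant item retrieved
        hits = np.cumsum(rel)                    # relevant count up to each rank
        precision_at_hits = hits[rel] / (np.flatnonzero(rel) + 1)
        aps.append(precision_at_hits.mean())
    return float(np.mean(aps)) if aps else 0.0
```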
The present example provides a system for rapid retrieval of target images based on artificial intelligence.
The system for rapid retrieval of target images based on artificial intelligence comprises: an acquisition module configured to obtain a template image and a plurality of known labels corresponding to the template image; an extraction module configured to extract an image to be detected from a target image database; a conversion module configured to input both the image to be detected and the template image into a trained convolutional neural network and to output a hash code of the image to be detected and a hash code of the template image; and an output module configured to obtain a similarity between the image to be detected and the template image based on a Hamming distance between the two hash codes and to select one or more images to be detected whose similarity is higher than a set threshold as a retrieval result to be output.
It should be noted here that the acquisition module, the extraction module, the conversion module and the output module mentioned above correspond to steps S101 to S104 in Example 1, and the above-mentioned modules share the same examples and application scenarios as the corresponding steps but are not limited to the contents disclosed in Example 1. It is to be noted that the above-mentioned modules can be executed, as a set of computer-executable instructions, in a computer system as part of the system.
The foregoing descriptions are merely preferred embodiments of the present invention, but not intended to limit the present invention. A person skilled in the art may make various alterations and variations to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.