The present application relates generally to correlating images and, more specifically, to correlating images using a wireless communication device.
Mobile visual search and Augmented Reality (AR) applications have been gaining popularity recently, with important business value for a variety of players in the mobile computing and communication fields. The key technology enabling these applications is a compact local image descriptor that is robust to image recapturing variations and efficient for indexing and query transmission over the air. However, there is a need for increased robustness to image capturing variations and increased efficiency for indexing and query transmission over the air.
This disclosure provides a method and system for executing an image query using a wireless communication device.
In a first embodiment, a wireless communication device includes a processor configured to execute an image query. The image query utilizes a cluster selection criterion for a cluster-aggregation based vectorization of a set of local features based on a quantity of top local features having the highest posteriori probability values. The cluster selection criterion is measured as the summation of the posteriori probability values of the top local features. The quantity of top local features is determined by a predetermined integer value greater than one.
In a second embodiment, a method of executing an image query using a wireless communication device includes utilizing a cluster selection criterion for a cluster-aggregation based vectorization of a set of local features. The cluster selection criterion is based on a quantity of top local features having the highest posteriori probability values. The method also includes measuring the summation of the posteriori probability values of the top local features. The quantity of top local features is determined by a predetermined integer value greater than one.
In a third embodiment, a wireless communication device includes a processor configured to execute an image query. The image query utilizes a cluster selection criterion for a cluster-aggregation based vectorization of a set of local features based on a quantity of top local features. The top local features have the highest posteriori probability values. The cluster selection criterion is measured as the summation of the posteriori probability values of the top local features. The quantity of top local features is determined by a quantity of local features that have a posteriori probability value greater than a posteriori probability value threshold.
In a fourth embodiment, a method of executing an image query using a wireless communication device includes utilizing a cluster selection criterion for a cluster-aggregation based vectorization of a set of local features. The cluster selection criterion is based on a quantity of top local features. The top local features have the highest posteriori probability values. The method also includes measuring the summation of the posteriori probability values of the top local features. The quantity of top local features is determined by a quantity of local features that have a posteriori probability value greater than a posteriori probability value threshold.
Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or,” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation, such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document, those of ordinary skill in the art should understand that in many, if not most instances, such definitions apply to prior, as well as future uses of such defined words and phrases.
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:
Aspects, features, and advantages of the disclosure are readily apparent from the following detailed description, simply by illustrating a number of particular embodiments and implementations, including the best mode contemplated for carrying out the disclosure. The disclosure is also capable of other and different embodiments, and its several details can be modified in various obvious respects, all without departing from the spirit and scope of the disclosure.
Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive. The disclosure is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. In this disclosure, a limited number and types of base stations, a limited number of mobile stations, a limited number of service flows, a limited number of connections, a limited number of routes, or limited use cases are used as examples for illustration. However, the embodiments disclosed in this disclosure are also applicable to an arbitrary number and types of base stations, an arbitrary number of mobile stations, an arbitrary number of service flows, an arbitrary number of connections, and other related use cases. Embodiments described here are not limited to communications between a base station (BS) and a User Equipment (UE) (BS-UE), but are also applicable to BS-BS and UE-UE communications.
Mobile visual search and Augmented Reality (AR) applications can utilize compact descriptors that are robust to image recapturing variations and efficient for indexing and query transmission over the air. This is part of the on-going MPEG standardization effort known as Compact Descriptors for Visual Search (CDVS). The typical query processing with CDVS is illustrated in the exemplary embodiment of
Several different types of global descriptors are described in the computer vision literature, such as GIST, Vector of Locally Aggregated Descriptors (VLAD), Compressed Fisher Vector (CFV), Residual Enhanced Visual Vectors (REVV), or the like. In an embodiment, one such global descriptor can be the Scalable Compressed Fisher Vector (SCFV).
The SCFV is a compact, discriminative global descriptor that is constructed by aggregating the local feature descriptors of an image, which enables fast and efficient search. The SCFV is based on the CFV global descriptor. The SCFV can be constructed in essentially two stages: the Offline Stage, where a Gaussian Mixture Model (GMM) is trained using SIFT descriptors of the MIRFLICKR dataset, and the Online Stage, where a scalable Fisher vector aggregation method is applied.
In the Offline Stage, a GMM model is trained using a training set of SIFT features. The GMM training results in a set of GMM parameters $\lambda=\{w_i, \mu_i, \sigma_i, i=1 \ldots 128\}$, where $w_i$, $\mu_i$ and $\sigma_i$ denote the mixture weight, mean vector and variance of the i-th Gaussian cluster. In a subsequent online stage, the GMM model can be employed to generate the Fisher Vector for each selected local feature from the stage of keypoint selection in query/reference images.
In the Online Stage, an SCFV aggregation method occurs. However, before discussing SCFV, a Compressed Fisher Vector (CFV) aggregation method will be discussed so that a CFV aggregation method can be compared with an SCFV aggregation. For the CFV method, let $X=\{x_t, t=1 \ldots T\}$ denote the set of local feature descriptors in an image, and let the offline trained GMM model consist of K Gaussian functions. Then the image likelihood can be represented as $L(X|\lambda)=\log p(X|\lambda)=\sum_{t=1}^{T}\log p(x_t|\lambda)$, the likelihood of each feature descriptor $x_t$ being $p(x_t|\lambda)=\sum_{i=1}^{K} w_i\, p_i(x_t|\lambda)$, where $p_i$ refers to the i-th Gaussian function.
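Given the local descriptor $x_t$, the GMM mode assignment probability $\gamma_t(i)$ (that is, the probability of $x_t$ being generated by the i-th Gaussian function) is given by
$$\gamma_t(i) = \frac{w_i\, p_i(x_t|\lambda)}{\sum_{j=1}^{K} w_j\, p_j(x_t|\lambda)} \quad (1)$$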
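The assignment probability described above can be illustrated with a minimal Python sketch. It assumes a GMM with diagonal covariance, which this text does not state explicitly, and the function name `gmm_posteriors` and the list-of-lists layout are hypothetical choices made only for illustration.

```python
import math

def gmm_posteriors(X, w, mu, var):
    """Return gamma[t][i], the probability that x_t came from Gaussian i.

    X: T descriptors of dimension D; w: K mixture weights;
    mu, var: K mean vectors and K per-dimension variance vectors
    (diagonal covariance is an assumption of this sketch).
    """
    gamma = []
    for x in X:
        # log(w_i * p_i(x | lambda)) for every Gaussian function i
        log_joint = []
        for w_i, mu_i, var_i in zip(w, mu, var):
            log_pdf = -0.5 * sum(
                (x_d - m_d) ** 2 / v_d + math.log(2.0 * math.pi * v_d)
                for x_d, m_d, v_d in zip(x, mu_i, var_i)
            )
            log_joint.append(math.log(w_i) + log_pdf)
        # subtract the maximum before exponentiating for numerical stability
        peak = max(log_joint)
        joint = [math.exp(v - peak) for v in log_joint]
        total = sum(joint)
        gamma.append([j / total for j in joint])
    return gamma
```

Each row of the result sums to one, so a descriptor lying close to the mean of one Gaussian function receives a value near one for that function and values near zero for the others.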
In the CFV aggregation method, first the gradient vector of $p(x_t|\lambda)$ is calculated for each local descriptor, with respect to each Gaussian function $p_i$. Then the gradient vectors (partial derivatives) of $p(x_t|\lambda)$ are accumulated for all the selected keypoints' local descriptors in the image, with respect to each Gaussian function $p_i$, in the analytical form as below:
Finally, by concatenating the accumulated gradient vectors $g_i^X$ of all Gaussian functions, the aggregated CFV can be generated. For the convenience of subsequent explanation, $g_i^X$ is referred to henceforth as the Fisher Vector (FV) sub-vector.
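The accumulation and concatenation steps can be sketched as follows. Because the analytical form is not reproduced in this text, the sketch assumes the commonly used Fisher vector gradient with respect to the mean, $g_i = \frac{1}{T\sqrt{w_i}}\sum_t \gamma_t(i)(x_t-\mu_i)/\sigma_i$, with `sigma` taken as a per-dimension standard deviation. The function names and that particular gradient form are assumptions, not the form confirmed by this disclosure.

```python
import math

def fv_subvectors(X, gamma, w, mu, sigma):
    """Accumulate one sub-vector per Gaussian function over all descriptors.

    Assumed form: g_i = 1/(T*sqrt(w_i)) * sum_t gamma[t][i] * (x_t - mu_i) / sigma_i
    X: T descriptors; gamma: T rows of K posteriors; w: K weights;
    mu, sigma: K mean vectors and K per-dimension standard deviations.
    """
    T = len(X)
    subvectors = []
    for i, (w_i, mu_i, sigma_i) in enumerate(zip(w, mu, sigma)):
        g_i = [0.0] * len(mu_i)
        for x, gamma_t in zip(X, gamma):
            for d, (x_d, m_d, s_d) in enumerate(zip(x, mu_i, sigma_i)):
                g_i[d] += gamma_t[i] * (x_d - m_d) / s_d
        scale = 1.0 / (T * math.sqrt(w_i))
        subvectors.append([scale * v for v in g_i])
    return subvectors

def concatenate_cfv(subvectors):
    """Concatenate the sub-vectors of all K Gaussian functions (CFV style)."""
    return [v for g_i in subvectors for v in g_i]
```

Keeping the sub-vectors separate until the final concatenation makes it straightforward to drop the sub-vectors of unselected Gaussian functions later.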
In the CFV aggregation method, the final global descriptor includes concatenated FV sub-vectors from all the K Gaussian functions or clusters. Conversely, the SCFV aggregation method does not include all the K FV sub-vectors in the final aggregation. Instead, the SCFV aggregation method filters out contributions from some Gaussian functions based on the property of rich sparseness inherent to the Fisher Vector aggregation method.
Lower sparseness values indicate that the corresponding FV sub-vectors are less useful. In certain embodiments, in order to construct a discriminative and compact global descriptor, the sparseness values can be thresholded to select a few informative Gaussian functions. Using the selected few thresholded sparseness values, the corresponding FV sub-vectors of the Gaussian functions can be determined and concatenated to form the Scalable Compressed Fisher Vector (SCFV). This is known as the Gaussian cluster selection criterion. The SCFV aggregation based global descriptor may use distinct sets of Gaussian functions to represent different images. However, this is taken into account at the time of pair-wise matching between the SCFV descriptors where only the Gaussian functions that are common to both the SCFVs are used in computing the global match score.
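The pair-wise matching over common Gaussian functions can be sketched in Python as follows. Each descriptor is represented here as a dictionary from the index of a selected Gaussian function to its FV sub-vector. That representation, the function name `global_match_score`, and the cosine-style score are assumptions for illustration only, since this text does not specify the scoring formula.

```python
import math

def global_match_score(scfv_a, scfv_b):
    """Score two descriptors using only Gaussian functions selected in both.

    scfv_a, scfv_b: dicts {cluster_index: sub_vector}. The cosine-style
    score is an illustrative assumption, not a formula given in the text.
    """
    common = sorted(set(scfv_a) & set(scfv_b))
    a = [v for i in common for v in scfv_a[i]]
    b = [v for i in common for v in scfv_b[i]]
    norm = math.sqrt(sum(v * v for v in a)) * math.sqrt(sum(v * v for v in b))
    if norm == 0.0:
        # no common Gaussian functions, or an all-zero sub-vector
        return 0.0
    return sum(p * q for p, q in zip(a, b)) / norm
```

Two descriptors that share no selected Gaussian function receive a score of zero under this sketch, which reflects that only the common functions contribute.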
In the previous implementation of the SCFV, the sparseness value for the i-th Gaussian function is computed as the maximum probability $\max_{1 \le t \le T} \gamma_t(i)$ of the selected local features in an image. For those Gaussian functions that pass the sparseness criterion, their FV sub-vectors are concatenated to form the SCFV. In a formal way, the sparseness thresholding works as follows:
However, there are some drawbacks in the Gaussian function selection criterion in this previous SCFV aggregation method. It is understood that the cluster selection criterion is an important factor in determining which Gaussian functions contribute to the final SCFV. The number of Gaussian functions selected by the selection criterion, and specifically which Gaussian functions are selected, has a direct impact on the size of the SCFV global descriptor as well as on its discriminative power. Therefore, it is essential that the selection criterion picks “good-quality” Gaussian functions that increase the discriminative ability of the descriptor rather than noisy Gaussian functions, which reduce the discriminative power as well as add to the size of the SCFV descriptor.
In the previous SCFV implementation, the i-th Gaussian function is selected if the maximum probability of a local descriptor being generated from the i-th Gaussian function exceeds the threshold τ. Formally, the criterion is $\max_{1 \le t \le T} \gamma_t(i) > \tau$, evaluated over the set of local feature descriptors of the image. This criterion has the disadvantage that it depends on only one local feature, the one that is nearest to the mean of the i-th Gaussian function in the feature space. If that local feature is close enough to the mean of the Gaussian function, then that function is included in the SCFV aggregation. The drawback here is that just one local feature determines the importance of a Gaussian function. There may be some spurious Gaussian functions that have only one local feature close to their means while the other local features are far away. Such Gaussian functions would erroneously be preferred over other Gaussian functions that have a higher probability of generating the local features but whose means are farther away from the nearest local features.
In the previous SCFV implementation only one local feature determines the importance of a Gaussian function. For example,
To overcome the limitations of the previous cluster selection criterion, the cluster selection criterion can be generalized to consider not only the local feature with the maximum posteriori probability $\gamma_t(i)$ but also the top n local features that have the highest $\gamma_t(i)$ values. The previous criterion can be expressed as
$$\gamma_{[1]}(i) > \tau \quad (4)$$
where $\gamma_{[1]}(i)$ represents the first order statistic and is equal to $\max_{1 \le t \le T} \gamma_t(i)$. An improved criterion can be expressed as
$$\sum_{j=1}^{n} \gamma_{[j]}(i) > \tau' \quad (5)$$
where $\gamma_{[1]}(i) \ge \gamma_{[2]}(i) \ge \gamma_{[3]}(i) \ge \ldots$. Here, n can take different integer values. For example, the top 5, 10 or 20 local features may be considered. The modified criterion ensures that a Gaussian function is selected based on multiple local features that are closest to the Gaussian function mean. Therefore, a Gaussian function with more local features as its members will be preferred during the selection stage.
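Criteria (4) and (5) can be compared with a minimal Python sketch. The input is assumed to be a table `gamma[t][i]` of posteriori probabilities with one row per local feature and one column per Gaussian function. The function names and the example values are hypothetical and chosen only to illustrate the difference between the two criteria.

```python
def select_by_max(gamma, tau):
    """Criterion (4): keep Gaussian i when its largest posterior exceeds tau."""
    K = len(gamma[0])
    return [i for i in range(K) if max(row[i] for row in gamma) > tau]

def select_by_top_n_sum(gamma, n, tau_prime):
    """Criterion (5): keep Gaussian i when the sum of its n largest
    posteriors exceeds tau_prime."""
    K = len(gamma[0])
    selected = []
    for i in range(K):
        # order statistics gamma_[1](i) >= gamma_[2](i) >= ...
        ordered = sorted((row[i] for row in gamma), reverse=True)
        if sum(ordered[:n]) > tau_prime:
            selected.append(i)
    return selected

# Illustrative values: Gaussian 0 has a single close feature,
# Gaussian 1 has three moderately close features.
gamma = [[0.9, 0.1], [0.3, 0.7], [0.3, 0.7], [0.3, 0.7]]
chosen_by_max = select_by_max(gamma, tau=0.8)
chosen_by_sum = select_by_top_n_sum(gamma, n=3, tau_prime=2.0)
```

With these illustrative values, the maximum-based criterion keeps only Gaussian 0, whose support comes from a single feature, whereas the top-3 sum keeps only Gaussian 1, which is supported by three features. Setting n to one makes the second function behave like the first.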
In certain embodiments, a Gaussian function can be selected based on a count of the number of local features that have a high probability of being generated from that Gaussian function. For the i-th Gaussian, the number of local features whose posterior probability is greater than a threshold $\tau''$ is given by:
$$n_i = \sum_{t=1}^{T} \mathbb{I}(\gamma_t(i) > \tau'') \quad (6)$$
where $\mathbb{I}(\cdot)$ is an indicator function. The Gaussian functions can be sorted in descending order of $n_i$ and certain top Gaussian functions can be selected for inclusion in the SCFV descriptor.
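The count-based selection of equation (6) can be sketched in Python as follows. The function name `select_by_count`, the parameter `num_selected` (how many top Gaussian functions to keep), and the tie-breaking by lower index are assumptions of this sketch. The text states only that certain top Gaussian functions are selected.

```python
def select_by_count(gamma, tau_count, num_selected):
    """Rank Gaussian functions by n_i, the number of local features whose
    posterior gamma[t][i] exceeds tau_count, and keep the top ones."""
    K = len(gamma[0])
    # n_i from equation (6): the indicator is summed over all local features
    counts = [sum(1 for row in gamma if row[i] > tau_count) for i in range(K)]
    # descending order of n_i; sorted() is stable, so ties keep index order
    ranked = sorted(range(K), key=lambda i: -counts[i])
    return ranked[:num_selected], counts
```

Returning the counts alongside the selection makes it possible to inspect how strongly each Gaussian function is supported by the local features.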
At step 805, an image can be obtained by a wireless communication device. The image can be obtained by the wireless communication device by downloading the image via a wireless connection or wired connection (such as from another electronic device). The image can also be obtained by capturing an image of an environment via camera 352 (as shown in
At step 810, the wireless communication device can extract salient information from the image. Salient information can include local features or global features of the image. At step 815, the wireless communication device can search through one or more storage mediums (such as a server in communication with a wireless network) and identify one or more images to be queried. At step 820, the wireless communication device can execute, for each identified image, an image query utilizing a cluster selection criterion based on a number of top local features having posteriori probability values greater than a predetermined threshold. In another embodiment, the cluster selection criterion is based on the sum of the posteriori probability values of a predetermined number of local features having the highest posteriori probability values.
At step 825, the global descriptor generated using the cluster selection criterion is sent to a remote server along with the local descriptors and other information such as keypoint location coordinates. At step 830, the remote server matches one or more identified images with the images from a database using predetermined criteria involving local and global descriptors and transmits the matched images and/or any related information to the wireless device. In certain embodiments, the wireless communication device can present the matched one or more identified images on a display screen. It should be apparent that the proposed cluster selection criterion may also be used while extracting the global descriptors from the images in the image database associated with the remote server.
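The information sent at step 825 can be pictured with a minimal Python sketch. The container `ImageQuery`, its field names, and the helper `build_query` are hypothetical and are not defined by this disclosure. The sketch only shows the global descriptor being restricted to the selected Gaussian functions before it is bundled with the local descriptors and keypoint coordinates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ImageQuery:
    """Hypothetical container for what the device sends to the remote server."""
    global_descriptor: Dict[int, List[float]]   # selected Gaussian -> FV sub-vector
    local_descriptors: List[List[float]]
    keypoints: List[Tuple[float, float]] = field(default_factory=list)

def build_query(sub_vectors, selected, local_descriptors, keypoints):
    """Keep only the sub-vectors of the selected Gaussian functions and
    bundle them with the local descriptors and keypoint coordinates."""
    kept = {int(i): [float(v) for v in sub_vectors[int(i)]] for i in selected}
    return ImageQuery(
        global_descriptor=kept,
        local_descriptors=[[float(v) for v in d] for d in local_descriptors],
        keypoints=[(float(x), float(y)) for x, y in keypoints],
    )
```

Storing the selected indices as dictionary keys lets the server identify which Gaussian functions are common to the query and reference descriptors.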
Mobile visual search and augmented reality (AR) applications are gaining momentum and the underlying technology research is attracting major players across the industry spectrum. The on-going MPEG standardization effort on Compact Descriptors for Visual Search (CDVS) is the main venue for visual search and AR technology enabler research. The technical benefits of this disclosure provide more compact and more discriminative global descriptors for image matching and image retrieval simulations. The embodiments of this disclosure are configured to improve the performance of the Test Model in Compact Descriptors for Visual Search (CDVS).
In certain embodiments, various functions described above are implemented or supported by one or more computer programs, each of which is formed from computer readable program code and embodied in a computer readable medium. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive, a compact disc (CD), a digital video disc (DVD), or any other type of memory. A “non-transitory” computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable memory device.
While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.
This application claims priority under 35 U.S.C. §119(e) to U.S. Provisional Patent Application Ser. No. 61/752,334, filed Jan. 14, 2013, entitled “NOVEL CRITERIA FOR GAUSSIAN MIXTURE MODEL CLUSTER SELECTION IN SCALABLE COMPRESSED FISHER VECTOR (SCFV) GLOBAL DESCRIPTOR”. The content of the above-identified application is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
61752334 | Jan 2013 | US