The present disclosure relates generally to image matching during processing of visual search requests and, more specifically, to reducing computational complexity and communication overhead associated with a visual search request submitted over a wireless communications system.
Mobile visual search and Augmented Reality (AR) applications are gaining popularity, with important business value for a variety of players in the mobile computing and communication fields. However, some approaches to defining search indices, such as the use of Fisher vectors, are susceptible to noise, and the distance between two Fisher vector indices is easily dominated by noisy clusters associated with the indices. In addition, heuristic thresholding for search index definition without a proper problem formulation offers at best sub-optimal solutions.
There is, therefore, a need in the art for effective selection of indices used for visual search request processing.
Global descriptors for images within an image repository accessible to a visual search server are compared based on order statistics processing including sorting (which is a non-linear transform) and heat kernel-based transformation. Affinity scores are computed for Hamming distances between Fisher vector components corresponding to different clusters of global descriptors from a pair of images and normalized to [0, 1], with zero affinity scores assigned to non-active cluster pairs. Linear Discriminant Analysis is employed to determine a sorted vector of affinity scores to obtain a new global descriptor. The resulting global descriptors produce significantly more accurate matching.
Before undertaking the DETAILED DESCRIPTION below, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms “include” and “comprise,” as well as derivatives thereof, mean inclusion without limitation; the term “or,” is inclusive, meaning and/or; the phrases “associated with” and “associated therewith,” as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term “controller” means any device, system or part thereof that controls at least one operation, where such a device, system or part may be implemented in hardware that is programmable by firmware or software. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior as well as future uses of such defined words and phrases.
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which like reference numerals represent like parts:
The following documents and standards descriptions are hereby incorporated into the present disclosure as if fully set forth herein:
Mobile visual search using Content Based Image Recognition (CBIR) and Augmented Reality (AR) applications are gaining popularity, with important business value for a variety of players in the mobile computing and communication fields. One key technology enabling such applications is a compact image descriptor that is robust to image recapturing variations and efficient for indexing and query transmission over the air. As part of ongoing Moving Picture Experts Group (MPEG) standardization efforts, definitions for Compact Descriptors for Visual Search (CDVS) are being promulgated (see [REF1] and [REF2]).
Visual search server 102 includes one or more processor(s) 110 coupled to a network connection 111 over which signals corresponding to visual search requests may be received and signals corresponding to visual search results may be selectively transmitted. The visual search server 102 also includes memory 112 containing an instruction sequence for processing visual search requests in the manner described below, and data used in the processing of visual search requests. The memory 112 in the example shown includes a communications interface for connection to image database 101.
User device 105 is a mobile phone and includes an optical sensor (not visible in the view of
In the exemplary embodiment, the image content within mobile device 105 is processed by processor 121 to generate visual search query image descriptor(s). Thus, for example, a user may capture an image of a landmark (such as a building) and cause the mobile device 105 to generate a visual search relating to the image. The visual search is then transmitted over the network 100 to the visual search server 102.
In a CDVS system, visual queries (VQ) typically consist of two parts: a global descriptor (GD) and a local descriptor (LD) with its associated coordinates. Local descriptors consist of a selection of SIFT [REF7] based local key point descriptors, compressed through a multi-stage visual query scheme, and the global descriptor is derived from quantizing the Fisher Vector computed from up to 300 SIFT points, which basically captures the distribution of SIFT points in SIFT space. The local descriptor contributes to the accuracy of the image matching, while the global descriptor offers the crucial function of indexing efficiency and is used to compute a short list of potential matches from an image repository (a coarse granularity operation) for the local descriptor-based image verification of the short-listed images.
In the CDVS Test Model (TM), the global descriptor is computed from a quantized Fisher Vector of a pre-trained 128 cluster Gaussian mixture model (GMM) in the SIFT space, reduced by Principal Component Analysis (PCA) to 32 dimensions. As a result, 128×32 bits represent the Fisher Vectors from SIFT points in images. The distance between two global descriptors is computed based on the Hamming distance of common clusters, and a set of thresholds is applied for accepting or rejecting a match, according to the sum of active clusters in both images. As discussed above, however, such an approach is susceptible to noisy clusters in the global descriptor domain, and the distance is easily dominated by those noisy clusters. In addition, the heuristic thresholding without a proper problem formulation offers a sub-optimal solution.
To address those shortcomings, the visual query processing system described herein employs a novel order statistics based learning approach to find the optimal matching function and threshold, producing a significant improvement over the current state of the art in the CDVS Test Model, as demonstrated by simulation results.
The global descriptors in the CDVS Test Model may represent each image in an image repository by a 32×128 binary matrix representing the Fisher Vectors for the SIFT points associated with an image. A 128-bit flag may also be included to indicate which GMM clusters are active in the global descriptor. The Hamming distance between two images may thus be computed with the following logic: Let two global descriptors $X_1$ and $X_2$ each be 128 32-bit vectors, $X_1=[x_1^1, x_2^1, \ldots, x_{128}^1]$ and $X_2=[x_1^2, x_2^2, \ldots, x_{128}^2]$, with the respective associated flags $F_1=[f_1^1, f_2^1, \ldots, f_{128}^1]$ and $F_2=[f_1^2, f_2^2, \ldots, f_{128}^2]$. The Hamming distance vector D between $X_1$ and $X_2$ is:
where ⊕ indicates the exclusive OR (XOR) operation. The Hamming distances for an example of 100 matching and non-matching image pairs are illustrated in the
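The per-cluster comparison described above can be sketched as follows. This is a minimal illustration, not the Test Model's code: it assumes each cluster's 32 bits are packed into one integer, that a distance is defined only when a cluster is active in both descriptors, and that `None` marks the remaining (non-active) pairs. The function names are hypothetical.

```python
from typing import List, Optional, Sequence


def popcount(value: int) -> int:
    """Count the set bits of a non-negative integer."""
    return bin(value).count("1")


def hamming_distance_vector(
    x1: Sequence[int],
    x2: Sequence[int],
    f1: Sequence[bool],
    f2: Sequence[bool],
) -> List[Optional[int]]:
    """Per-cluster Hamming distances between two global descriptors.

    x1, x2: one 32-bit word per GMM cluster (packed as Python ints).
    f1, f2: activity flags, True where the cluster is active.
    Assumption: clusters not active in both images get None.
    """
    if not (len(x1) == len(x2) == len(f1) == len(f2)):
        raise ValueError("descriptors and flags must have equal length")
    distances: List[Optional[int]] = []
    for a, b, active_a, active_b in zip(x1, x2, f1, f2):
        if active_a and active_b:
            # XOR leaves a 1 in every differing bit position.
            distances.append(popcount((a ^ b) & 0xFFFFFFFF))
        else:
            distances.append(None)
    return distances
```

Because only the common active clusters contribute, the number of usable distances varies from one image pair to the next.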
Order statistics is a known process in statistical data analysis. Accordingly, a sorting (which is a non-linear transformation) and heat kernel-based transformation may be introduced to operate on the Hamming distance features. First, the Hamming distance $d_i$ computed for each cluster is sorted to obtain $d_{(1)}, d_{(2)}, \ldots, d_{(k)}$. Then an affinity score $r_i$ is computed as:
$$r_i = e^{-a\,d_{(i)}} \qquad (2)$$
This normalizes the affinity per cluster in the global descriptors to [0, 1], assigns zero affinity to non-active cluster pairs, and resolves the irregular dimension size problem. Examples of the 32-dimensional affinity feature obtained from sorted Hamming distances, with kernel size a=0.1, are plotted in
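A minimal sketch of the sorting and heat kernel step is given below. It assumes that non-active cluster pairs are marked with `None`, that distances are sorted in ascending order, and that the feature is padded with zeros (or truncated) to a fixed length of 32; the padding rule and the function name are illustrative assumptions, since the text states only that non-active pairs receive zero affinity.

```python
import math
from typing import List, Optional, Sequence


def sorted_affinity(
    distances: Sequence[Optional[int]],
    a: float = 0.1,
    length: int = 32,
) -> List[float]:
    """Fixed-length affinity feature from per-cluster Hamming distances.

    distances: Hamming distance per cluster, None for non-active pairs.
    a: heat kernel size (0.1 is the value quoted in the text).
    length: output dimension (assumed fixed; zero-padded or truncated).
    """
    # Order statistics: keep active pairs only, smallest distance first.
    active = sorted(d for d in distances if d is not None)
    # Heat kernel maps a distance of 0 to affinity 1 and decays toward 0.
    scores = [math.exp(-a * d) for d in active[:length]]
    # Non-active pairs contribute zero affinity, which fixes the dimension.
    return scores + [0.0] * (length - len(scores))
```

Since an active pair always yields a strictly positive affinity, a zero entry unambiguously marks a missing cluster pair in this sketch.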
where $w^T$ is the transpose of $w$, $S_B$ is the between-class covariance matrix, and $S_w$ is the within-class covariance matrix. To solve equation (3), an eigenvalue problem is solved. The optimal weights obtained from the Linear Discriminant Analysis are plotted in
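For reference, the textbook form of the Linear Discriminant Analysis (Fisher) criterion that uses these symbols is shown below. It is assumed, not confirmed by the text, that equation (3) expresses this objective; the class means $m_1$ and $m_2$ (here, the mean affinity vectors of matching and non-matching pairs) are notation introduced for illustration.

```latex
% Fisher criterion: maximize between-class scatter relative to within-class scatter
w^{*} = \arg\max_{w} \frac{w^{T} S_B\, w}{w^{T} S_w\, w}

% Stationary points satisfy a generalized eigenvalue problem
S_B\, w = \lambda\, S_w\, w

% Two-class special case (closed form, up to scale)
w^{*} \propto S_w^{-1} (m_1 - m_2)
```

In the two-class case the leading eigenvector reduces to the closed form in the last line, so a full eigendecomposition is not strictly required.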
In exploiting the improved precision-recall performance discussed above, the algorithm 700 operates as follows: First, local descriptors are determined for a query image utilizing known techniques. The global descriptor is then obtained using the affinity scores and Linear Discriminant Analysis as described above, and is transmitted along with the local descriptors (and possibly certain additional information) to the visual search server 102 as part of the visual search query (step 701). The global descriptor from the query is then compared to global descriptors for images within the image repository 101 (step 702). The resulting short list of images from the image repository, selected based on matching of the global descriptor from the query to the global descriptors for images within the image repository, is then verified by comparing the local descriptors from the query with the local descriptors for the short-listed images (step 703). Correct matches are expected to increase and false positives are expected to decrease using this process.
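The two-stage flow of steps 702 and 703 can be sketched as below. This is a hedged outline only: the scoring callables `global_score` and `local_score`, the repository layout, and the `shortlist_size` and `accept_threshold` parameters are all hypothetical stand-ins, since the text does not fix them.

```python
from typing import Any, Callable, Dict, List, Tuple


def two_stage_search(
    query_gd: Any,
    query_ld: Any,
    repository: Dict[str, Tuple[Any, Any]],
    global_score: Callable[[Any, Any], float],
    local_score: Callable[[Any, Any], float],
    shortlist_size: int = 10,
    accept_threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Coarse global-descriptor ranking followed by local verification.

    repository maps an image id to (global descriptor, local descriptors).
    Higher scores mean a better match for both scoring callables.
    """
    # Step 702: rank every repository image by global descriptor score.
    ranked = sorted(
        repository.items(),
        key=lambda item: global_score(query_gd, item[1][0]),
        reverse=True,
    )
    shortlist = ranked[:shortlist_size]

    # Step 703: verify only the short-listed images with local descriptors.
    matches: List[Tuple[str, float]] = []
    for image_id, (_, local_descriptors) in shortlist:
        score = local_score(query_ld, local_descriptors)
        if score >= accept_threshold:
            matches.append((image_id, score))
    matches.sort(key=lambda match: match[1], reverse=True)
    return matches
```

The costly local comparison runs on at most `shortlist_size` images, which is why the quality of the global descriptor ranking determines whether a true match can be found at all.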
The technical benefits of the more sophisticated learning algorithm described above include significantly improved matching accuracy.
Although the present disclosure has been described with an exemplary embodiment, various changes and modifications may be suggested to one skilled in the art. It is intended that the present disclosure encompass such changes and modifications as fall within the scope of the appended claims.
This application claims priority to and hereby incorporates by reference U.S. Provisional Patent Application No. 61/753,292, filed Jan. 16, 2013, entitled “VISUAL SEARCH ACCURACY WITH HAMMING DISTANCE ORDER STATISTICS LEARNING.”
Number | Date | Country
---|---|---
61753292 | Jan 2013 | US