1. Field of the Invention
The present invention relates to a feature matching method for recognizing an object in two-dimensional or three-dimensional image data.
2. Description of the Related Art
U.S. Pat. No. 7,016,532 B2 discloses a technique of recognizing an object by carrying out a plurality of processing operations (such as generation of a bounding box, geometry normalization, wavelet decomposition, color cube decomposition, shape decomposition, and generation of a grayscale image with a low resolution) with respect to one target region.
According to one aspect of the present invention, there is provided a feature matching method for recognizing an object in two-dimensional or three-dimensional image data, the method comprising:
detecting features in each of which a predetermined attribute in the two-dimensional or three-dimensional image data takes a local maximum and/or minimum;
excluding features existing along edges and line contours from the detected features;
allocating the remaining features to a plane;
selecting some features from the allocated features by using local information; and
performing feature matching for the selected features.
Advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. Advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention, and together with the general description given above and the detailed description of the embodiments given below, serve to explain the principles of the invention.
Hereinafter, a feature matching method according to the present invention will be described with reference to the accompanying drawings.
A feature matching method according to a first embodiment of the present invention is also referred to as a PBR (Point Based Recognition). As shown in
The feature detection 10 detects spatially stable features, which do not depend on scale or layout, from inputted object data, for example, an image. The feature adoption 12 adopts, from the features detected by the feature detection 10, a robust and stable portion suitable for robust recognition. The feature recognition 14 uses the features extracted by the feature adoption 12 and additional constraints to locate, index, and recognize objects pre-analyzed and stored in a database 16.
Now, a detailed description will be given with respect to each one of these feature detection 10, feature adoption 12, and feature recognition 14.
First, a description will be given with respect to the feature detection 10.
Robust recognition depends on both the properties of the selected features and the methods used to match them. Good features should make the matcher work well and robustly. Therefore, the integrated design of appropriate feature types and matching methods should exhibit reliability and stability. In general, large-scale features such as lines, blobs, or regions are easier to match, because they provide more global information for the temporal matching computation. However, large-scale features are also prone to significant imaging distortions that arise from variations of view, geometry, and illumination. Therefore, matching them requires strong conditions and assumptions to compensate for these distortions. Unfortunately, the geometry needed to model these conditions is usually unknown, so large-scale features often recover only approximate image geometry.
For image recognition, there is a need to recover accurate 2D correspondences in an image space, and matching small-scale features such as points has the advantage that the corresponding measurements are possible to at least the accuracy of the pixel resolution. Furthermore, a point feature has advantages over large-scale features (such as lines and faces) in distinctiveness, robustness to occlusions (when part of the features is hidden), and good invariance to affine transformation. The related disadvantages of point features are that often only a sparse set of points and measurements is available, and matching them is also difficult because only local information is available. However, if many point features are detected reliably, then a potentially large number of image correspondence measurements should be recoverable, without the degradation of measurement quality introduced by the various assumptions and constraints required by other feature types. In fact, observations from many methods that use large-scale features or recover a full affine field show that the most reliable measurements often occur near feature points. Considering these factors, points (feature points) are employed as the features for recognition.
General feature detection is a non-trivial problem. For image matching or recognition, the detected features should demonstrate good reliability and stability with the recognizing method, even when they do not have any physical correspondence to structure in the real world. In other words, feature detection methods should be able to detect as many features as possible that are reliable, distinctive, and repeatable under various affine imaging conditions. This ensures that enough features can be allocated for further image matching and parameter recovery even if most of the features are occluded.
The feature detection 10 in the present embodiment uses a method for finding point features in texture-rich regions. In this method, three filters are used. First, a high-pass filter is used to detect the points having local maximum responses. Let R be a 3×3 window centered at point P, and let F(P) be the output of applying a high-pass filter F to this point. If
$F(P) = \max\{F(P_i) : P_i \in R\} > \mathrm{Threshold}$ (1)
then point P is a feature candidate and is saved for further examination. This filter may also be used to extract local minimum responses.
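The first filter can be illustrated with a minimal sketch. It assumes the filter output is already available as a 2D list of numbers; the function name `local_maxima`, the strict comparison against the eight neighbours, and the skipping of border points are assumptions made for illustration and are not taken from the specification.

```python
def local_maxima(response, threshold):
    """Return (row, col) of interior points whose filter response exceeds
    `threshold` and is the strict maximum of its 3x3 window."""
    peaks = []
    rows, cols = len(response), len(response[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            value = response[r][c]
            if value <= threshold:
                continue  # too weak to be a feature candidate
            neighbours = [response[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks
```

Local minimum responses could be extracted with the same routine by negating the response values before the call.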
The second filter is a distinctive feature filter. As is known, points lying along edges or linear contours are not stable for matching. This is the so-called matching arbitrary effect (an effect in which matching appears to have succeeded), and these points must be removed for reliable matching. In addition, it is known that the covariance matrix of image derivatives is a good indicator of the distribution of image structure over a small patch. Summarizing the relationship between the matrix and image structure, small eigenvalues of the matrix correspond to a relatively constant intensity within a region. A pair of large and small eigenvalues corresponds to a high texture pattern, and two large eigenvalues can represent linear features, salt-and-pepper textures, or other patterns. Therefore, it is possible to design the filter to remove those linear feature points.
Let M be a 2×2 matrix computed from image derivatives,
and let λ1 and λ2 be the eigenvalues of M. The measure of a linear edge response is
$R = \det(M) - k\,(\mathrm{trace}(M))^2$ (3)
where $\det(M) = \lambda_1 \lambda_2$ and $\mathrm{trace}(M) = \lambda_1 + \lambda_2$.
So, if the edge response
R(P)>Threshold (4)
then point P is treated as a linear edge point and removed from the feature candidate list.
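The response of formula (3) can be computed without extracting the eigenvalues explicitly. The sketch below is illustrative only: it assumes M is accumulated as the sum of gradient outer products over a small patch (the exact definition of M is not reproduced in this text), and both the function name `edge_response` and the default value of k are hypothetical.

```python
def edge_response(patch_gradients, k=0.04):
    """Evaluate R = det(M) - k * trace(M)^2 for one patch.

    patch_gradients: iterable of (gx, gy) derivative pairs in the patch.
    M is ASSUMED to be [[sum gx*gx, sum gx*gy], [sum gx*gy, sum gy*gy]].
    """
    a = b = c = 0.0
    for gx, gy in patch_gradients:
        a += gx * gx
        b += gx * gy
        c += gy * gy
    det = a * c - b * b      # equals lambda1 * lambda2
    trace = a + c            # equals lambda1 + lambda2
    return det - k * trace * trace
```

The returned value would then be compared against the threshold of formula (4).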
The third filter is an interpolation filter that iteratively refines the detected points to sub-pixel accuracy. An affine plane is first fitted to the local points to reconstruct a continuous super-plane. The filter then iteratively refines the points on the reconstructed plane until the fit converges to an optimal solution, and the final fit is used to update the points to sub-pixel accuracy.
A novel aspect of the present embodiment is that scale invariance is improved by employing a multi-resolution technique, thereby extracting features from each of a plurality of images having various resolutions.
To achieve affine scale invariance, a multi-resolution strategy is employed in the above feature detection processing. Unlike traditional pyramid usage, in which the main goal is to accelerate the processing (i.e., coarse-to-fine search), the goal here is to detect all possible features across different scales to achieve an effective affine scale invariance. Therefore, the features in each level of the pyramid are processed independently.
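A minimal sketch of detecting features at every pyramid level follows. The 2×2 block averaging used to build each coarser level is an assumption made to keep the example short (a Gaussian-filtered pyramid would be the more usual choice), and `downsample`, `detect_across_scales`, and the `detect` callback are hypothetical names.

```python
def downsample(image):
    """Halve each dimension by averaging 2x2 blocks (assumed reduction step)."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [[(image[2 * r][2 * c] + image[2 * r][2 * c + 1]
              + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(cols)]
            for r in range(rows)]


def detect_across_scales(image, detect, levels=3):
    """Run `detect` on every pyramid level and return (level, feature) pairs,
    so that features found at coarse levels are kept rather than only used
    to guide a finer search."""
    features = []
    for level in range(levels):
        features.extend((level, f) for f in detect(image))
        if len(image) < 2 or len(image[0]) < 2:
            break  # cannot reduce the image any further
        image = downsample(image)
    return features
```

Any per-image detector can be passed as `detect`; each returned feature carries the level at which it was found.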
Now, a description will be given with respect to the feature adoption 12.
Once features have been detected in the above feature detection 10, the detected features have to be adopted in a robust and stable representation for robust recognition. As described above, the disadvantages of using point features as matching primitives are that often only a sparse set of points and only local information are available, which makes the matching difficult. An appropriate feature adoption strategy is very important for dealing with variations of viewpoint, geometry, and illumination.
In this approach, the feature adoption 12 in the present embodiment adopts each feature point using its local region information, called an affine region. Three constraints are used to qualify the local region, i.e., intensity, scale, and orientation. The intensity constraint is the image gradient value G(x, y) calculated over the pixels inside the region, which indicates the texture-ness of the feature.
$G(x, y) = \sqrt{\nabla_x^2 + \nabla_y^2}$ (5)
In the situation of a small baseline between two matched images, the intensity adoption is sufficient to match the images under small linear displacements. A simple correlation matching strategy could be used. Furthermore, if the matched images have larger imaging distortion, affine warping matching is effective to compensate for the distortion.
However, in the situation of a large image baseline, in which the matched images have serious geometric deformation including scaling and 2D and 3D rotations, the simple intensity adoption is not sufficient. It is well known that simple intensity correlation is not scale and rotation invariant. In this situation, all possible constraints should be considered in order to adopt the matching points in a robust and stable multi-quality representation. The scale and local orientation constraints are embedded into the adoption and matching processing. First, the continuous orientation space is quantized into a discrete space.
$\{O_{discrete}(x_n, y_n) : n = 1, 2, \ldots, N\} = \mathrm{Quant}\{O_{continue}(x, y) : x, y \in [0, 2\pi]\}$ (6)
$O_{continue}(x, y) = \arctan(\nabla_y / \nabla_x)$ (7)
These quantized orientations form the bases spanning the orientation space. By applying the image decomposition model, every local orientation of a feature can be assigned to the discrete base space. In this way, the features can be built into a compact representation in terms of their local orientations. To form a consistent representation for all the considered qualities (intensity, scale, and orientation), the intensity and scale values are used to vote the contributions of every local orientation to the matching feature. Furthermore, to reduce the quantization effect (error), a Gaussian smoothing function (Gaussian smoothing processing) is also used to weight the voting contributions.
A novel aspect of the present embodiment is that features of the orientations normalized from the peripheral regions of the features are provided in the form as shown in formula (8) below.
Let R be a voting range whose size is defined by the Gaussian filter used for generating a scale pyramid. For any point P(xi, yi) within the voting range, its contribution to a quantized orientation is represented by formula (8) below:
where, G(xi, yi) is a gradient computed with formula (5) above, and Weight(xi, yi) is a Gaussian weighting function centered at the processed point (x, y), as shown in formula (9) below:
$\mathrm{Weight}(x_i, y_i) = \exp\left(-\left((x_i - x)^2 + (y_i - y)^2\right)/\sigma^2\right)$ (9)
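The orientation voting can be sketched as follows. Because formula (8) is not reproduced in this text, the sketch assumes that each point contributes its gradient magnitude from formula (5) multiplied by the Gaussian weight from formula (9) to the bin of its quantized orientation. The use of `atan2` to cover the full circle, the bin count, and the name `orientation_histogram` are likewise assumptions for illustration.

```python
import math


def orientation_histogram(gradients, center, sigma, n_bins=8):
    """Accumulate weighted gradient magnitudes into quantized orientation bins.

    gradients: dict mapping (x, y) -> (gx, gy) for points in the voting range.
    center:    the processed point (x, y) at which the Gaussian is centered.
    """
    cx, cy = center
    hist = [0.0] * n_bins
    bin_width = 2.0 * math.pi / n_bins
    for (x, y), (gx, gy) in gradients.items():
        magnitude = math.hypot(gx, gy)                  # formula (5)
        angle = math.atan2(gy, gx) % (2.0 * math.pi)    # formula (7), full circle
        weight = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / sigma ** 2)  # formula (9)
        # ASSUMED contribution: magnitude times Gaussian weight
        hist[int(angle / bin_width) % n_bins] += magnitude * weight
    return hist
```

The resulting list is one possible form of the compact per-feature vector; a full descriptor would also need the rotation normalization and interpolation described in the text.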
The above adoption strategy is effective for handling image scaling and out-of-plane rotation, but it is still sensitive to in-plane orientation. To compensate for this variance, an affine region is normalized to a coincident direction during the voting computation. Again, to cancel the quantization effect of the coincident rotation, bi-linear interpolation and Gaussian smoothing processing are applied within a coincident window. Also, to increase the robustness with respect to variation in lighting conditions, the input image is normalized.
The final output of the feature adoption 12 is a compact vector representation for each matching point and associated region that embeds all the constraints, achieving affine geometry and illumination invariance.
Now, a description will be given with respect to the feature recognition 14.
The features detected by the feature detection 10 and adopted by the feature adoption 12 establish good characteristics for geometry invariance. The matching is performed based on the adopted feature representations. The SSD (Sum of Squared Differences) is used for the similarity matching, i.e., for each feature P, a similarity value Similarity(P) is computed against the matched image, and the SSD search is performed to find the best matched point with maximal similarity. If the following relationship is established,
Similarity (P)={P, Pi}>Threshold (10)
it indicates that Pi is the matched point of P.
It is effective to use a pair evaluation technique based on RANSAC (Random Sample Consensus) to evaluate the reliability of image recognition. In particular, when only a small number of matched points exist, a posture at the time of image recognition can be calculated from the affine transformation matrix obtained by this technique, and the reliability of the image recognition can be evaluated based on the calculated posture.
The experimental results show that the above multi-constraint feature representation establishes good characteristics for image matching. For heavily cluttered scenes, however, mismatching (i.e., outliers) may happen, especially for features located in the background. To remove those matching outliers, a RANSAC-based approach is used to search for pairs that fulfill the fundamental geometrical constraint. It is well known that matched image features corresponding to the same object will fulfill a 2D parametric transformation (a homography). To accelerate the computation, the feature recognition 14 uses the 2D affine constraint to approximate the homography for outlier removal, which requires only 3 points to estimate the parametric transformation. First, the RANSAC iteration is applied using 3 randomly selected features to estimate an initial transformation Minit.
The estimated parametric transform is then refined iteratively using all the matched features. Matching points that have large fitting residuals are indicated as matching outliers (mismatches).
where xit is the point obtained by warping xi toward xis with the estimated affine transformation, i.e.
The final output of the feature matching is a list of matching points with outlier indicators and the estimated 2D parametric transformation (affine parameters).
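The outlier removal can be sketched with a small RANSAC loop that fits a 2D affine transformation to 3 randomly selected correspondences and counts how many matches agree with it. This is a simplified illustration: the iterative refinement over all matched features is omitted, the residual is assumed to be the squared Euclidean distance between the warped point and its match, and the names `affine_from_3`, `apply_affine`, and `ransac_affine` as well as the tolerance and iteration count are hypothetical.

```python
import random


def affine_from_3(src, dst):
    """Solve x' = a*x + b*y + c and y' = d*x + e*y + f from 3 correspondences
    using Cramer's rule. Returns [(a, b, c), (d, e, f)] or None if degenerate."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        return None  # collinear sample: no unique affine transformation
    rows = []
    for k in (0, 1):  # k = 0 solves the x' row, k = 1 the y' row
        u1, u2, u3 = dst[0][k], dst[1][k], dst[2][k]
        a = (u1 * (y2 - y3) - y1 * (u2 - u3) + (u2 * y3 - u3 * y2)) / det
        b = (x1 * (u2 - u3) - u1 * (x2 - x3) + (x2 * u3 - x3 * u2)) / det
        c = (x1 * (y2 * u3 - y3 * u2) - y1 * (x2 * u3 - x3 * u2)
             + u1 * (x2 * y3 - x3 * y2)) / det
        rows.append((a, b, c))
    return rows


def apply_affine(params, p):
    (a, b, c), (d, e, f) = params
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)


def ransac_affine(src, dst, tol=1.0, iterations=200, seed=0):
    """Return (params, inlier_indices) for the sample with the most inliers."""
    rng = random.Random(seed)
    best_params, best_inliers = None, []
    for _ in range(iterations):
        idx = rng.sample(range(len(src)), 3)
        params = affine_from_3([src[i] for i in idx], [dst[i] for i in idx])
        if params is None:
            continue
        inliers = []
        for i, (p, q) in enumerate(zip(src, dst)):
            wx, wy = apply_affine(params, p)
            if (wx - q[0]) ** 2 + (wy - q[1]) ** 2 <= tol ** 2:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_params, best_inliers = params, inliers
    return best_params, best_inliers
```

Matches whose indices are absent from the returned inlier list correspond to the outlier indicators in the output described above.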
The present embodiment describes a fast matching search for achieving further speed in the foregoing feature recognition 14.
This fast matching search is referred to as a Data Base Tree (dBTree). The dBTree is an effective image matching search technology that can rapidly recover possible matches in a high-dimensional database 16 from which PBR feature points as described in the foregoing first embodiment have been extracted. Technically, the problem is a typical NP data query problem, i.e., given database points in an N-dimensional space and a query point q, the closest matches (nearest neighbors) of q in the database are to be found. The fast matching search according to the present embodiment is a tree-structured matching approach that forms a hierarchical representation of the PBR features to achieve effective data representation, matching, and indexing of high-dimensional feature spaces.
Technically, the dBTree matcher, as shown in
Before describing in detail a dBTree approach in the present embodiment, a description will be given with respect to a problem to be solved in the match search.
The goal of the match search is to rapidly recover possible matches in a high-dimensional database. Although the present embodiment focuses on the specific case of PBR feature matching, this dBTree search structure is generic and suitable for any data search application.
Given two sets of points: P={pi, i=1, 2, . . . , N} and Q={qj, j=1, 2, . . . , M}, where pi and qj are k-dimensional vectors, for example, 128-D vector for PBR feature, the goal is to find all possible matches between the two point sets P and Q, i.e. Matches={pi<=>qj} under certain matching similarity.
Since the PBR features establish good invariant characteristics for feature matching, a Euclidean distance for the invariant features is used for the similarity matching, i.e. for each feature pi, a similarity value Similarity(pi) is computed against the matched features qj, and the matching search is performed to find the best matched point with minimal Euclidean distance.
Obviously, the matching performance and speed depend heavily on the sizes N and M of the two point sets.
To match the points of two datasets, the first intuition would probably be a Brute-Force exhaustive search method. As shown in
Now, a detailed description will be given with respect to a dBTree approach in the present embodiment.
First, a description will be given with respect to the dBTree construction 18.
A central data structure in the dBTree matcher is a tree structure that forms an effective hierarchical representation of the feature distribution. Unlike the scan-line feature representation (i.e., every feature is represented in a grid structure) used in a Brute-Force search, the dBTree matcher represents the k-dimensional data in a balanced binary tree by hierarchically decomposing the whole space into several subspaces according to the splitting value of each tree node. The root node of this tree represents the entire matching space, and the branch nodes represent rectangular subspaces, each containing the features that fall within its enclosed space. Since each subspace is small compared to the original space and therefore contains a small number of input features, the tree representation provides a fast way to access any input feature by the feature's position. By traversing down the hierarchy until the subspace containing the input feature is found, the matching points can be identified merely by scanning through a few nodes in that subspace.
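A balanced binary space-partitioning tree of this kind can be sketched as below. The specification does not state how the splitting value is chosen, so the sketch assumes a median split along a dimension that cycles with depth (as in a conventional k-d tree); `build_tree` and the dictionary node layout are hypothetical.

```python
def build_tree(points, depth=0):
    """Build a balanced binary tree over k-dimensional points.

    Each node stores the splitting point and axis; points with a smaller
    coordinate on that axis go left, the others go right.
    """
    if not points:
        return None
    axis = depth % len(points[0])           # ASSUMED: cycle through dimensions
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                 # median split keeps the tree balanced
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_tree(ordered[:mid], depth + 1),
        "right": build_tree(ordered[mid + 1:], depth + 1),
    }
```

For 128-dimensional PBR features the same routine applies unchanged, since the axis is derived from the length of the input vectors.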
Now, a description will be given with respect to the dBTree search 20.
There are two steps in searching for a query point over the tree: a search for the closest subspace 26 and a search for the closest node within the subspace 26. First, the tree is traversed to find the subspace 26 containing the query point. Since the number of subspaces 26 is relatively small, it is possible to rapidly locate the closest subspace 26 with only log(N) comparisons, and that subspace has a high probability of containing the matched points. Once the subspace 26 is located, node-level traversal is performed through all the nodes in the subspace 26 to identify the possible matching points. The process is repeated until the node closest to the query point is found.
The above search strategy has been tested, and it does show a certain speed improvement when matching low-dimensional datasets. Surprisingly, however, it proved extremely ineffective for large-scale datasets, even slower than the Brute-Force search approach. Analysis shows that there are two reasons. First, the efficiency of traditional tree searching is based on the fact that many tree branches can be pruned if their distance to the query point is too large, which greatly reduces unnecessary searching time. This is typically true for low-dimensional datasets, but for higher dimensions there are too many branches adjacent to the central one, all of which have to be examined. A lot of calculations are still carried out trying to prune the branches and looking for the best searching paths, so the search becomes a tree-type exhaustive search. Second, node-level traversal within the subspace 26 is also exhaustive through every contained node and depends entirely on the number of contained nodes. For a high-dimensional dataset, each subspace 26 still contains too many nodes that need to be exhaustively traversed.
In the present embodiment, two strategies (methods) are employed to overcome those problems and to achieve effective matching for high-dimensional datasets. First, a tree-pruning filter (branch cutting filter) is used to cut (reduce) the number of branches that need to be examined. After a specific number of nearest branches (i.e., search steps) has been explored, the branch search is forcibly stopped. Distance filtering could also be used for this purpose, but extensive experiments have shown that search-step filtering gives better performance in terms of correct matches and computation cost. Although this strategy yields approximate solutions, experiments show that the mismatching rate increased by less than 2%.
The second strategy (method) is to improve the node search by introducing a node-distance filter. This is based on the matching consistency constraint that, for most real-world scenes, the correct matches are mostly clustered. Therefore, instead of searching exhaustively through every feature node, a distance threshold is used to limit the node search range. The node search is performed in a circular pattern so that nodes closer to the target are searched first. Once the search boundary is reached, the search is forcibly stopped and the nearest neighbors (NNs) are outputted.
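The two strategies can be combined in one approximate search, sketched below under several assumptions: the tree is a median-split tree stored as tuples, branches are visited nearest-first through a priority queue, `max_steps` plays the role of the search-step filter, and `max_dist` plays the role of the distance threshold. The names `build` and `bounded_search` are hypothetical, and the actual dBTree traversal may differ.

```python
import heapq
import math


def build(points, depth=0):
    """Median-split tree; each node is (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))


def bounded_search(root, query, max_steps, max_dist=math.inf):
    """Approximate nearest neighbour: examine at most `max_steps` nodes,
    nearest branches first, and ignore candidates farther than `max_dist`.
    Returns (point, distance); point is None if nothing is within range."""
    best, best_d = None, max_dist
    heap = [(0.0, 0, root)]   # (lower bound on distance, tie-breaker, node)
    counter, steps = 1, 0
    while heap and steps < max_steps:
        bound, _, node = heapq.heappop(heap)
        if node is None or bound >= best_d:
            continue          # empty branch, or branch cannot beat current best
        steps += 1
        point, axis, left, right = node
        d = math.dist(point, query)
        if d < best_d:
            best, best_d = point, d
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        heapq.heappush(heap, (bound, counter, near))
        counter += 1
        heapq.heappush(heap, (max(bound, abs(diff)), counter, far))
        counter += 1
    return best, best_d
```

With a large `max_steps` the search is exact; lowering it trades accuracy for speed, which corresponds to the approximate solutions mentioned in the text.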
Now, a description will be given with respect to the index matching 22.
Once the nearest neighbors are detected, the next step is to decide whether the NNs are accepted as correct matches. As in the original PBR point matcher, a relative matching cost threshold is used for selecting correct matches, i.e., if the similarity difference between the highest NN and the second-highest NN (the distance to the highest NN divided by the distance to the second-highest NN) is less than a pre-defined threshold, the point is accepted as a correct match.
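The acceptance rule can be sketched as a distance-ratio test. In this sketch the candidates are ranked by Euclidean distance to the query, and the best one is accepted only if its distance divided by the second-best distance is below a threshold. The function name `ratio_test_match` and the default threshold of 0.8 are assumptions, not values from the specification.

```python
import math


def ratio_test_match(query, candidates, max_ratio=0.8):
    """Return the index of the accepted match in `candidates`, or None."""
    if len(candidates) < 2:
        return None  # the ratio needs a second-nearest neighbour
    ranked = sorted(range(len(candidates)),
                    key=lambda i: math.dist(query, candidates[i]))
    d1 = math.dist(query, candidates[ranked[0]])
    d2 = math.dist(query, candidates[ranked[1]])
    if d2 == 0.0:
        return None  # two identical candidates: the match is ambiguous
    return ranked[0] if d1 / d2 < max_ratio else None
```

In practice `candidates` would be the nearest neighbors returned by the tree search rather than the whole database.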
The difference in similarity between the highest NN and the second-highest NN is obtained as a parameter that expresses preciseness in identity judgment of the similarity of that point. In addition, the number per se of matching points in the image is also obtained as a parameter that expresses preciseness in identity judgment of the image. Further, a differential total sum (residual difference) in affine transformation of matching points in the image expressed by formula (13) above is also obtained as a parameter that expresses preciseness in identity judgment of the image. Only some of these parameters may be utilized. Alternatively, a transform formula taking each of these parameters as a variable may be defined, and the value of this formula may be used as the preciseness of identity judgment in matching.
In addition, by utilizing a value of the preciseness, it becomes possible to output a plurality of images as a matched result in a predetermined sequence. For example, the number of matching points is utilized as preciseness, and then, the matching results are displayed in descending order of the number of matching points, whereby images are outputted in sequential order from the most reliable image.
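As a small illustration of the ordering described above, the following sketch ranks candidate images by their number of matching points. The dictionary input and the name `rank_by_match_count` are hypothetical.

```python
def rank_by_match_count(match_counts):
    """Return image ids ordered from most to fewest matching points.

    match_counts: dict mapping image id -> number of matching points.
    """
    return sorted(match_counts,
                  key=lambda image_id: match_counts[image_id],
                  reverse=True)
```

Another preciseness value, such as the affine residual, could be substituted as the sort key in the same way.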
Applications utilizing the feature matching method described above will be described herebelow.
[First Application]
The information retrieval system is configured to include an information presentation apparatus 100, a storage unit 102, a dataset server 104, and an information server 106. The information presentation apparatus 100 is configured by platform hardware. The storage unit 102 is provided in the platform hardware. The dataset server 104 and the information server 106 are configured in sites accessible by the platform hardware.
The information presentation apparatus 100 is configured to include an image acquisition unit 108, a recognition and identification unit 110, an information specification unit 112, a presentation image generation unit 114, and an image display unit 116. The recognition and identification unit 110, the information specification unit 112, and the presentation image generation unit 114 are realized by application software of the information presentation unit installed in the platform hardware.
Depending on the case, the image acquisition unit 108 and the image display unit 116 are provided as physical configurations in the platform hardware, or are connected to outside. Thus, the recognition and identification unit 110, the information specification unit 112, and the presentation image generation unit 114 could be referred to as an information presentation apparatus. However, in the present application, the information presentation apparatus is defined to perform processes from the process of imaging or image capture to the process of final image presentation, such that the combination of the image acquisition unit 108, the recognition and identification unit 110, the information specification unit 112, the presentation image generation unit 114, and the image display unit 116 is herein referred to as the information presentation apparatus.
The image acquisition unit 108 is a camera or the like having a predetermined image acquisition range. The recognition and identification unit 110 recognizes and identifies respective objects within the image acquisition range from an image acquired by the image acquisition unit 108. The information specification unit 112 obtains predetermined information (display contents) from the information server 106 in accordance with information of the respective objects identified by the recognition and identification unit 110. The information specification unit 112 then specifies the predetermined information as relevant information. The presentation image generation unit 114 generates a presentation image formed by correlation between the relevant information, which has been specified by the information specification unit 112, and the image acquired by the image acquisition unit 108. The image display unit 116 is, for example, a liquid crystal display that displays the presentation image generated by the presentation image generation unit 114.
The storage unit 102 located in the platform contains a dataset 118 stored by the dataset server 104 via a communication unit or storage medium (not shown). Admission (downloading or media replacement) and storing of the dataset 118 is possible regardless of pre-activation or post-activation of the information presentation apparatus 100.
The information presentation apparatus 100 configured as described above performs operation as follows. First, as shown in
As shown in
Although not shown in
The first application using a camera mobile phone as a platform will be described herebelow. Basically, mobile phones are devices that are used by individuals. In recent years, most models of mobile phones allow admission (that is, installation by downloading) of application software from an Internet site accessible from the mobile phones (which hereinbelow will be simply referred to as a “mobile-phone accessible site”). The information presentation apparatus 100 is, basically, also assumed as a prerequisite to be a mobile phone of the aforementioned type. Application software of the information presentation apparatus 100 is installed into the storage unit 102 of the mobile phone. The dataset 118 is appropriately stored into the storage unit 102 of the mobile phone through communication from the dataset server 104 connected to a specific mobile-phone accessible site (not shown).
By way of example, a utilization range of the information presentation apparatus 100 in mobile phones includes the utilization method described hereinbelow. For example, a case is assumed in which photographs existing in publications, such as magazines or newspapers, are preliminarily specified, and data sets relevant thereto are preliminarily prepared. In this case, a mobile phone of a user acquires an image of an object from the paper space of any of the publications and then reads information relevant to the object from a mobile-phone accessible site. In such a case, it is impossible to retain all photographs, icons, illustrations, and like items contained in all publications as features. Thus, it is practical to restrict the range to, for example, a specific use range, and to provide features for that range. For instance, the data can be provided to a user in a summarized form, such as “a data set for referencing, as objects, photographs contained in an n-th month issue” of a specific magazine. With such an arrangement, usability for users is improved; reference images, if 100 to several hundred pieces in one dataset, can be sufficiently stored into the storage unit 102 of the mobile phone; and, in addition, the recognition and identification processing time can be within several seconds. Further, neither special contrivance nor special processing is necessary for, for example, the photographs and illustrations on the printed matter used with the information presentation apparatus 100.
According to the first application described above, the user can admit multiple items of data in a use range into the information presentation apparatus 100 in a batch, the dataset supply side can easily prepare the data, and services that are easy to provide commercially can be realized.
In the configuration further including the function of calculating the position and orientation, information obtained from the information server 106 becomes displayable with an appropriate position and orientation over an original image. Consequently, the configuration leads to enhancement of user information obtainment effects.
[Second Application]
A second application will be described herebelow.
However, in the case that the information presentation apparatus 100 becomes pervasive and data sets also are supplied in wide variety from many businesses, the following arrangements are preferably made. Data enjoying high utilization frequency (hereinbelow referred to as “basic data” 122) is not supplied as a separate dataset 118, but is preferably made usable whichever type of dataset 118 is selected. For instance, it is useful that objects associated with index information of the dataset 118 itself, or the most frequently used objects and the like, are excluded from the dataset 118, and that only a certain number of their features are stored so as to reside in the application software in the information presentation apparatus 100. More specifically, in the second application, the dataset 118 is composed as a set corresponding to the utilization purpose of a user, or to a publication or object correlated thereto, and is supplied as a resource separate from the application software. However, features or the like relevant to an object with an especially high utilization frequency or necessity are stored to reside, or are retained, as the basic data 122 in the application software itself.
Description will again be made with reference to the case in which a camera mobile phone is the platform. For example, it is most practical to download an ordinary dataset 118 through communication from a mobile-phone accessible site. In this case, however, it is convenient for a user of the mobile phone if guiding and retrieval can be performed in an index site (a page in the mobile-phone accessible site) of the dataset 118. Even for access to the site itself, control is performed such that the information presentation apparatus 100 acquires an image of an object dedicated therefor, and a URL for the site is passed to accessing software so that the site becomes accessible, so that special preparation of the dataset 118 is not necessary. As such, features corresponding to the object are stored to reside as the basic data 122 in the application software. In this case, a specific illustration or logo can be set as the object, or a freely available plain rectangle can be set as the object.
Alternatively, in lieu of the arrangement in which the basic data 122 is stored to reside or is retained in the application software itself, the configuration can be such that, as shown in
More specifically, as described above, when actually operating the information presentation apparatus 100, the user admits an arbitrary dataset 118. At least one item of the basic data 122 is included in any of the datasets 118, so that it is always addressable for an object either with high utilization frequency or high necessity. For example, a case is contemplated in which, as shown in
In addition, as shown in
This configuration provides a method for admitting the basic data 122 useful in a configuration mode in which the dataset 118 is supplied as a separate resource, and especially, is downloaded through a network from the dataset server 104. More specifically, in the configuration shown in
Thereby, the user is able to always use the basic data 122 with the information presentation apparatus 100 without the need of giving special considerations.
For example, in recent years, camera mobile phones capable of using application software are generally pervasive. A case is now contemplated in which a camera mobile phone of this type is used as a platform, and application software having functions, except those of the image acquisition unit 108 and image display unit 116 of the information presentation apparatus 100, is installed on the platform. With reference to
If the basic data 122 does not exist in the mobile phone, it is determined that the update is necessary. Even when the basic data 122 already exists in the storage unit 102 of the mobile phone, if its version is older than the version of the basic data 122 to be supplied from the dataset server 104, it is determined that the update is necessary.
Subsequently, similarly to the case of the dataset 118, the basic data 122 is downloaded (step S116). The basic data 122 thus downloaded is stored into the storage unit 102 of the mobile phone (step S118). In addition, the dataset 118 downloaded is stored into the storage unit 102 of the mobile phone (step S120).
Thus, in the event that the basic data 122 already exists in the storage unit 102 of the mobile phone, the necessity of the update is determined through the version comparison, and then the basic data 122 is downloaded and stored.
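The update decision described above can be illustrated with a minimal Python sketch. It is not the implementation of the present application; the function name and the representation of a version as an integer are assumptions made only for illustration.

```python
from typing import Optional


def needs_basic_data_update(local_version: Optional[int], server_version: int) -> bool:
    """Decide whether the basic data should be downloaded.

    local_version is None when no basic data is stored on the phone.
    Integer version numbers are an assumption of this sketch.
    """
    # No basic data on the phone yet: the update is necessary.
    if local_version is None:
        return True
    # Basic data exists: update only if the stored copy is older
    # than the copy the server intends to supply.
    return local_version < server_version
```

When the function returns True, the download and storage of the basic data (steps S116 and S118) would follow; otherwise only the dataset would be stored.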
As described above, only a dataset 118 corresponding to the needs of the user is stored into the mobile phone, whereby both the speed of the object-identification process and the needs of the user are satisfied.
The utilization range of the information presentation apparatus 100 includes, for example, access from the mobile phone to information relevant or attributed to a design of a photograph or illustration of a publication, such as a newspaper or magazine, as an object, and improvement of information presentation by superimposing the aforementioned information over an image acquired by the camera. Further, not only such printouts, but also any of, for example, physical objects and signboards existing in a town can be registered as an object into the features. In this case, such a physical object or signboard is recognized as an object by the mobile phone, thereby making it possible to obtain additional information or the latest information.
As another utilization mode using the mobile phone, in the case of a product having a package, such as a CD or DVD, the design of the jacket varies from product to product, and thus the respective jacket designs can be used as objects. For example, it is now assumed that datasets regarding such jackets are distributed to users from a store or separately from a record company. In this case, the respective jackets can be recognized as objects by the mobile phone in, for example, a CD and/or DVD store or rental store. As such, for example, a URL is correlated to the object, and audio distribution of, for example, a selected part of music can be implemented to the mobile phone as information correlated to the object through the URL. Further, as this correlated information, an annotation (respective annotation of a photograph of the jacket) corresponding to the surface of the jacket can be appropriately added.
Thus, as a utilization mode using the mobile phone, in the case of using a jacket design of a product having a package, such as a CD or DVD, as the object, the arrangement can be made as follows. First, (1) at least a part of an exterior image of a recording medium containing music fixed thereto, or of a package thereof, is preliminarily distributed to the mobile phone as object data. Then, (2) predetermined music information (such as audio data and annotation information) relevant to the fixed music is distributed to the mobile phone that has accessed an address to which it was guided by the object.
The arrangement thus made is effective for promotion on the side of the record company, and produces an advantage in that, for example, time and labor can be reduced for preparation for viewing and listening on the side of the store.
As described above in each application, the recognition and identification unit, the information specification unit, the presentation image generation unit, and the position and orientation calculation unit are each implemented by a CPU, which is incorporated in the information presentation apparatus, and a program that operates on the CPU. However, this can be in another mode in which, for example, leased lines are provided.
As a mode for realizing the storage unit in the platform, an external data pack and a detachable storage medium (flash memory, for example) are usable, without being limited thereto.
Also in the second application, similarly as in the first application, the configuration can be formed to include the position and orientation calculation unit 120 so that relevant information is presented in accordance with calculated position and orientation.
In addition, as shown by the broken line in
[Third Application]
The configuration of the information retrieval system of the first application shown in
[Fourth Application]
A fourth application will be described herebelow.
The product recognition system includes a barcode scanner 126 serving as a reader for recognizing products each having a barcode, a weight scale 128 for measuring the weights of respective products, and in addition, a camera 130 for acquiring images of products. A control unit/cash storage box 132 for storing cash performs recognition of a product in accordance with a database 134 having registered product features for recognition, and displays the type, unit price, and total price of the recognized products on a monitor 136. A view field 138 of the camera 130 matches with the range of the weight scale 128.
Thus, according to the product recognition system, a system provider preliminarily acquires an image of an object that would need to be recognized, and registers a feature point extracted therefrom into the database 134. For example, for use in a supermarket, vegetables and the like such as tomato, apple, and green pepper are photographed, and feature points 140 thereof are extracted and stored, with identification indexes such as respectively corresponding recognition IDs and names, into the database 134 as shown in
A purchaser of a product carries the product (object) and places it within the view field 138 of the camera 130 installed to a cash register, whereby an image of the product is acquired (step S122). Image data of the product is transferred from the camera 130 to the control unit/cash storage box 132 (step S124). In the control unit/cash storage box 132, features are extracted, and the product is recognized with reference to the database 134 (step S126).
After the product has been recognized, the control unit/cash storage box 132 calls or retrieves a specified price of the recognized product from the database 134 (step S128), causes the price to be displayed on the monitor 136, and carries out the settlement (step S130).
In the event that a purchaser purchases two items, a green pepper and a tomato, an image of the tomato is first acquired by the camera 130. Then, in the control unit/cash storage box 132, features in the image data are extracted, and matching with the database 134 is carried out. After matching, in the event that one object product is designated, a coefficient corresponding to the price thereof, or the weight thereof if a weight-based system is used, is read from the database 134 and is output to the monitor 136. Then, similarly, also for the green pepper, product identification and price display are carried out. Finally, a total price of the products is calculated and output to the monitor 136, thereby carrying out the settlement.
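The price calculation for the two recognized items can be sketched as follows in Python. This is only an illustrative sketch: the `PriceEntry` record, its field names, and the numeric prices are hypothetical, and the split into a per-item price and a per-gram price is one assumed reading of the unit-price and weight-based systems mentioned above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class PriceEntry:
    """Hypothetical price record read from the product database."""
    name: str
    unit_price: Optional[float] = None      # price per item
    price_per_gram: Optional[float] = None  # used in a weight-based system


def item_price(entry: PriceEntry, weight_grams: Optional[float] = None) -> float:
    """Price of one recognized product."""
    if entry.price_per_gram is not None:
        # Weight-based pricing needs the reading from the weight scale.
        if weight_grams is None:
            raise ValueError(f"weight required for {entry.name}")
        return entry.price_per_gram * weight_grams
    if entry.unit_price is None:
        raise ValueError(f"no price registered for {entry.name}")
    return entry.unit_price


def total_price(items: Iterable[Tuple[PriceEntry, Optional[float]]]) -> float:
    """Sum the prices of all recognized products for the settlement."""
    return sum(item_price(entry, weight) for entry, weight in items)


# Example with made-up prices: a tomato sold per item, a green pepper by weight.
tomato = PriceEntry("tomato", unit_price=120.0)
green_pepper = PriceEntry("green pepper", price_per_gram=0.5)
basket_total = total_price([(tomato, None), (green_pepper, 80.0)])
```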
In the event that a plurality of object candidates exceeding a threshold value of similarity are output after matching, one of the following methods is applied: (1) the candidates are displayed on the monitor 136 to be selected; or (2) an image of the object is re-acquired. Thereby, object establishment is carried out.
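The handling of matching results against the similarity threshold can be sketched as follows in Python. The function name, the status strings, and the representation of similarities as a dictionary are assumptions for illustration; how the similarity itself is computed is left open here.

```python
from typing import Dict, List, Tuple


def resolve_candidates(similarities: Dict[str, float],
                       threshold: float) -> Tuple[str, List[str]]:
    """Classify a matching result by how many candidates exceed the threshold.

    Returns a status and the candidate IDs sorted by descending similarity:
    - "identified": exactly one candidate, so the product is established;
    - "ambiguous":  several candidates, so they are shown for selection
                    or the image is re-acquired;
    - "none":       no candidate exceeds the threshold.
    """
    above = sorted(
        (pid for pid, score in similarities.items() if score > threshold),
        key=lambda pid: similarities[pid],
        reverse=True,
    )
    if len(above) == 1:
        return "identified", above
    if len(above) > 1:
        return "ambiguous", above
    return "none", above
```

In the "ambiguous" case, the caller would choose between method (1) and method (2) above; the sketch deliberately does not make that choice.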
In the above, although the example is shown in which an image of each product is acquired one by one by the camera 130, an image including a plurality of object products can be acquired at one time for matching.
When purchasers carry out the processes, an automatic cash register can be realized.
A plurality of features is extracted from an image (product image data) input from the camera 130 (step S132). Then, preliminarily registered features of object are read as comparison data from the database 134 (step S134). Then, as shown in
Alternatively, if the object is determined to be identical (step S140), the object currently in comparison and the product in the input image are determined to be identical to one another (step S144).
As described above, according to the product recognition system of the fourth application, product recognition can be accomplished without affixing a recognition index such as barcode or RF tag to the product. Especially, this is useful as automatic recognition is possible in recognizing agricultural products, such as vegetables, and other products, such as meat and fish, for which significant time and labor are necessary to affix recognition indexes, unlike those such as industrial products to which recognition indexes can easily be affixed by printing and the like.
Further, objects to which such recognition indexes are less affixable include minerals, such that the system can be adapted for industrial use, such as automatic separation thereof.
[Fifth Application]
A fifth application will be described herebelow.
For example, the storage 148 is a memory detachable from or built in the digital camera 146. The printer 150 prints out image data stored in the memory, i.e., the storage 148, in accordance with a printout instruction received from the digital camera 146. Alternatively, the storage 148 is connected to the digital camera 146 through connection terminals, a cable, or a wireless/wired network, or can be a device to which a memory detached from the digital camera 146 is mounted and which is capable of transferring image data. In this case, the printer 150 can be of the type that is connected to or is integrally configured with the storage 148 and that executes a printout operation in accordance with a printout instruction received from the digital camera 146.
The storage 148 further includes functionality of a database from which image data is retrievable in accordance with the feature value. Specifically, the storage 148 configures a feature database (DB) containing feature sets created from digital data of original images.
The retrieval system thus configured performs operation as follows.
(1) First, the digital camera 146 acquires an image of a photographic subject including a retrieval source printout 152 once printed out by the printer 150. Then, a region corresponding to the image of the retrieval source printout 152 is extracted from the acquired image data, and features of the extracted region are extracted.
(2) Then, the digital camera 146 executes matching (process) of the extracted features with the feature sets stored in the storage 148.
(3) As a consequence, the digital camera 146 reads image data corresponding to matched features from the storage 148 as original image data of the retrieval source printout 152.
(4) Thereby, the digital camera 146 is able to again print out the read original image data with the printer 150.
The retrieval source printout 152 can use not only a printout having been output in units of one page, but also an index print having been output to collectively include a plurality of demagnified images. This is because it is more advantageous in cost and usability to select necessary images from the index print and to copy them.
The retrieval source printout 152 can be a printout output from a printer (not shown) external of the system as long as it is an image of which original image data exists in the feature DB.
The retrieval system of the fifth application will be described in more detail with reference to a block diagram of configuration shown in
After having set the mode to the retrieval mode, a user operates an image acquisition unit 154 of the digital camera 146 to acquire an image of a retrieval source printout 152 desired to be printed out again in the state where it is pasted onto, for example, a table or a wall face (step S146).
Then, features are extracted by a feature extraction unit 156 (step S148). The features can be any one of the following types: one type uses feature points in the image data; another type uses relative densities of split areas in the image data divided in accordance with a predetermined rule, that is, small regions allocated with a predetermined grating; and another type uses Fourier transform values corresponding to respective split areas. Preferably, information contained in such feature points includes point distribution information.
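As an illustration of the second feature type, the following Python sketch splits a grayscale image into a regular grid and computes one value per cell. Interpreting "relative density" as the mean intensity of a cell divided by the mean intensity of the whole image is an assumption of this sketch, as are the function name and the list-of-rows image representation.

```python
from typing import List, Sequence


def grid_relative_densities(image: Sequence[Sequence[float]],
                            rows: int, cols: int) -> List[float]:
    """Mean intensity of each grid cell relative to the whole-image mean.

    image is a non-empty list of equally long pixel rows; cells are
    returned in row-major order.
    """
    height, width = len(image), len(image[0])
    if rows > height or cols > width:
        raise ValueError("grid is finer than the image")
    total_mean = sum(sum(row) for row in image) / (height * width)
    features: List[float] = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            cell = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cell_mean = sum(cell) / len(cell)
            # A uniformly black image has no meaningful relative density.
            features.append(cell_mean / total_mean if total_mean else 0.0)
    return features


# A 4x4 test image split into a 2x2 grid.
sample = [
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]
sample_features = grid_relative_densities(sample, 2, 2)
```

Dividing by the whole-image mean makes the values less sensitive to overall brightness, which may differ between the original image data and a photographed printout.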
Subsequently, a matching unit 158 performs a DB-matching process in the manner that the features extracted by the feature extraction unit 156 are compared to the feature DB (feature sets) of already-acquired image data composed in the storage 148, and data with a relatively high similarity is sequentially extracted (step S150).
More specifically, as shown in
Thereafter, image data of the selected original image candidates are read from the storage 148 and are displayed on a display unit 160 as image candidates to be extracted (step S158), thereby to receive a selection from the user (step S160).
In the event that the arrow key, which corresponds to the “PREVIOUS” or “NEXT” icon 164 (step S162), is depressed, the process returns to step S158, at which the image candidate 162 is displayed. In the event that the enter key, which corresponds to the “DETERMINE” icon 166, is depressed (step S162), the matching unit 158 sends to the connected printer 150 original image data that corresponds to the image candidate 162 stored in the storage 148, and the image data is again printed out (step S164). When the storage 148 is not connected to the printer 150 through a wired/wireless network, the process of performing predetermined marking, such as additionally writing a flag, is carried out on the original image data corresponding to the image candidate 162 stored in the storage 148. Thereby, the data can be printed out by the printer 150 capable of accessing the storage 148.
In step S158 of displaying the image candidate, a plurality of candidates can be displayed at one time. In this case, the display unit 160 ordinarily mounted to the digital camera 146 is, of course, of a small size of several inches, such that displaying of four or nine items is appropriate for use.
The feature DB of the already-acquired image data composed in the storage 148 as comparative objects used in step S150 has to be preliminarily created from original image data stored in the storage 148. The storage 148 can be either a memory attached to the digital camera 146 or a database accessible through a communication unit 170 as shown by a broken line in
Various methods are considered for creation of the feature DB.
One example is a method that carries out calculation of features and database registration when storing acquired image data in the original-image acquiring event into a memory area of the digital camera 146. More specifically, as shown in
Another method is such that, when original image data stored in the storage 148 is printed out by the printer 150, printing-out is specified, and concurrently, feature extraction process is carried out, and the extracted features are stored in the database, therefore producing high processing efficiency. More specifically, as shown in
Further, of course batch processing can be performed. More specifically, as shown in
Further, the data can be discretely processed in accordance with the input of a user specification. More specifically, as shown in
Conventionally, in many cases, when again printing out image data, which was previously printed out, a user retrieves the data with reference to supplementary information (such as file name and image acquired date/time) of the image data. However, according to the retrieval system of the present application, only by acquiring the image of the desired retrieval source printout 152 by using the digital camera 146, a file (image data) of the original image can be accessed, therefore making it possible to provide a retrieval method intuitive and with high usability for users.
Further, not only the original image data itself, but also image data similar in image configuration can be retrieved, thereby making it possible to provide novel secondary adaptabilities. More specifically, an image of a signboard or poster on the street, for example, is acquired in a so-called retrieval mode such as described above. In this case, image data similar or identical to the acquired image data can easily be retrieved from image data and features thereof existing in the storage 148, such as database, accessible through, for example, the memory attached to the digital camera 146 and communication.
Further, suppose that, as shown in
Further, an example case is assumed in which an image of the Tokyo Tower is acquired. In this case, images existing in the storage 148, such as database, accessible through, for example, the memory attached to the digital camera 146 and communication are retrieved, whereby photographs of not only the Tokyo Tower, but also photographs of tower-like buildings in various corners of the world can be retrieved and extracted. Further, in accordance with the position information provided as additional information of respective photographs thus retrieved and extracted, the locations of the respective towers can be informed, or as shown in
In the event of superimposed display of a photograph over a map, a case can occur in which many images are overlapped and less visible depending on factors, such as the map scale, the photograph size, the number of photographs relevant to the location. In such a case, as shown in
In the above, although it has been described that the process of steps S148 to S162 is carried out within the digital camera 146, the process can be carried out in a different way as follows. In the case where the storage 148 is provided as a separate resource independent of the digital camera 146, the process described above can be actually operated by being activated in the form of software in the storage 148 or by being separated into the digital camera 146 and the storage 148.
[Sixth Application]
An outline of a retrieval system of a sixth application will be described herebelow with reference to
The retrieval system includes a digital camera 146, a storage 148, a printer 150, and a personal computer (PC) 172. The storage 148 is a storage device built in the PC 172 or accessible by the PC 172 through communication. The PC 172 is connected to the digital camera 146 by wire or wirelessly, or alternatively is configured to permit a memory detached from the digital camera 146 to be attached, thereby being able to read image data stored in the memory of the digital camera 146.
The retrieval system thus configured performs operation as follows.
(1) First, the digital camera 146 acquires an image of a photographic subject including a retrieval source printout 152 once printed out by the printer 150.
(5) The PC 172 extracts a region corresponding to the image of the retrieval source printout 152 from the image data acquired, and then extracts features of the extracted region.
(6) Then, the PC 172 executes matching process of the extracted features with the features stored in the storage 148.
(7) As a consequence, the PC 172 reads image data corresponding to matched features as original image data of the retrieval source printout 152 from the storage 148.
(8) Thereby, the PC 172 is able to again print out the read original image data by the printer 150.
The retrieval system of the sixth application will be described in more detail with reference to a block diagram of configuration shown in
The present application contemplates a case where image data acquired by the digital camera 146 is stored into the storage 148 built in or connected to the PC 172 designated by a user, and a process shown on the PC side in
With the application software having thus started the operation, an image acquisition process for acquiring an image of a printout is executed on the side of the digital camera 146 (step S146). More specifically, as shown in
Then, in the PC 172, a feature extraction unit 176 realized by application software performs the process of extracting features from the transferred acquired image data (step S148). The feature extraction process can be performed on the digital camera 146 side. Thereby, the amount of communication from the digital camera 146 to the PC 172 can be reduced.
Subsequently, a matching unit 178 realized by application software performs a DB-matching process such that the extracted features are compared to the feature DB of already-acquired image data composed in the storage 148, and those with relatively high similarities are sequentially extracted (step S150). More specifically, in accordance with the calculated features, the matching unit 178 on the PC 172 side performs comparison with the features stored in correlation to respective items of image data in the storage 148 (or comprehensively stored in the form of a database), and the most similar one is selected. For usability, it is also effective to configure the process such that a plurality of the most similar feature candidates is selected. The features include specification information of the original image data from which the features have been calculated, and candidate images are called in accordance with the specification information.
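The DB-matching process described above can be sketched in Python as a ranking of stored feature sets by similarity to the extracted features. Cosine similarity over fixed-length feature vectors is an assumption of this sketch, since the similarity measure is not specified here; the dictionary keys stand in for the specification information of the original image data.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_candidates(query: Sequence[float],
                   feature_db: Dict[str, Sequence[float]],
                   k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k (image ID, similarity) pairs, most similar first."""
    scored = [(image_id, cosine_similarity(query, features))
              for image_id, features in feature_db.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical feature DB keyed by image ID.
example_db = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
example_result = top_candidates([1.0, 0.0], example_db, k=2)
```

Setting `k=1` corresponds to selecting only the most similar item, while a larger `k` corresponds to presenting a plurality of candidates for the user to choose from.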
Thereafter, image data of the selected original image candidates (or candidate images) are read from the storage 148 and are displayed on a display unit 180 serving as a display of the PC 172 as image candidates to be extracted (step S158), whereby to receive a selection from the user. In this case, the processing may be such that the selected original image candidates (or the candidate images) are transferred as they are or in appropriately compressed states from the PC 172 to the digital camera 146, and are displayed on the display unit 160 of the digital camera 146 (step S206).
Then, in response to a selection performed through the operation of a mouse or the like, original image data corresponding to the image candidate stored in the storage 148 is sent to the connected printer 150 and is printed thereby (step S164). More specifically, the displayed original image candidate is determined through determination of the user and is passed to the printing process, thereby to enable the user to easily perform the preliminarily desired reprinting of already-printed image data. In this event, not only printing is simply done, but also the plurality of selected candidate images result in a state that “although different from the desired original image, similar images have been collected”, depending on the user's determination, thereby realizing the function of batch retrieval of similar image data.
In the present application, the feature DB can be created in the event of transfer of the acquired image data from the digital camera 146 to the storage 148 through the PC 172. More specifically, with reference to
Thus, according to the sixth application, similarly to the fifth application, only by acquiring the image of the desired retrieval source printout 152 by using the digital camera 146, a file (image data) of the original image can be accessed, thereby making it possible to provide a retrieval method intuitive and with high usability for users.
Further, not only the original image data itself, but also image data similar in image configuration can be retrieved, thereby making it possible to provide novel secondary adaptabilities. More specifically, an image of a signboard or poster on the street, for example, is acquired in a so-called retrieval mode such as described above. In this case, image data similar or identical to the acquired image data can easily be retrieved from image data and features thereof existing in the storage 148, such as an external database, accessible through, for example, the memory attached to the digital camera 146 and a communication unit 182 shown by the broken line in
Although description has been given with reference to the case where the digital camera 146 is used, the present application is not limited thereto, and a scanner can be used.
Further, while an image of the retrieval source printout 152, which has actually been printed out, is acquired by the digital camera 146, an image of a display displaying the acquired image of the retrieval source printout 152, for example, can be acquired by the digital camera 146.
[Seventh Application]
A retrieval system of a seventh application will be described herebelow. The present application is an example of adaptation to application software 188 of a mobile phone 184 with a camera 186, as shown in
Mobile phone application software is at present usable with most mobile phones, and a large number of items of image data are storable in a memory such as an internal memory or an external memory card. Further, in specific mobile phone sites (mobile phone dedicated Internet sites), storage services for, for example, user-specified image files are provided. In these environments, a very large number of items of image data can be stored, making it possible to use them for recording the user's own various activities and for jobs. On the other hand, however, retrieval of desired image data is complicated and burdensome on mobile phone hardware, whose interface offers a relatively low degree of freedom. In most cases, actual retrieval is carried out from a list of texts representing, for example, the titles or dates and times of image data. As such, it must be said that, in the case of a large number of items of image data, the retrieval is complicated and burdensome; and even when keying in text, it is inconvenient to input a plurality of words or a long title, for example.
With the present retrieval system installed, the system operates as an application of the camera mobile phone and carries out activation of the “image input function”, “segmentation of a region of interest”, and “feature calculation.” The features are transmitted to a corresponding server via a mobile phone line. The corresponding server can be provided in a one-to-one or one-to-many relation with respect to the camera or cameras. The features sent to the server are subjected to matching by a “matching function” provided in the server against the features read from a database required by the server. Thereby, image data with high similarity is extracted. The image data thus extracted is returned from the server to the calling mobile phone, whereby the image data can be output from the mobile phone by an unspecified printer. In the case that various types of information relevant to the image data are further added to the image data extracted by the server, an extended function in which “the information is returned to the mobile phone” can be implemented. Further, the extracted image data is highly compressed and returned to the mobile phone, and after a user verifies that the data is the desired image data, the data is stored in the memory area of the mobile phone or is displayed on a display 190 of the mobile phone. Even from this fact alone, it can of course be said that the system is useful.
[Eighth Application]
A retrieval system of an eighth application will be described herebelow.
The present application has a configuration including a digital camera 146 with a communication function and a server connected through communication, in which a function for image retrieval is shared between the digital camera 146 and the server. The digital camera 146 with the communication function serves as a communication device equipped with an image acquiring function, and of course includes a camera mobile phone.
In this case, similarly as in the fifth application, the digital camera 146 includes the image acquiring function and a calculation function for calculating the features from the image data. In any one of the fifth to seventh applications, the features (or the feature DB) to be compared and referred are originally created based on images acquired and printed out by users or the digital camera 146. This is attributed to the fact that the initial purpose is to image printouts of already-acquired image data and to carry out retrieval. In comparison, the present application is configured by extending the purpose and is significantly different in that features calculated based on images of, for example, on-the-street sign boards, posters, printouts, and publications are also stored into the database formed in the storage 148 of the server.
Of course, not only printing out, but also extraction from images contained in the database can be accomplished.
Further, features extracted from an acquired image can be added to the database.
In the event of registration, position information relevant to the image is recognized manually, by a sensor such as a GPS, or by the above-described character recognition, and then is registered. In this manner, when an image is next acquired at a similar location, a similar image is extracted by retrieval from the database, whereby the position information to be added to the acquired image can be extracted.
In the present application, an image of a poster such as a product advertisement present on the street is acquired by the digital camera 146, for example (step S146). Then, a feature extraction process is executed by the digital camera 146 from the acquired image data (step S148). The extracted features are sent to a predetermined server by the communication unit 170 built in or attached to the digital camera 146.
In the server, the feature DB formed in the storage 148 accessible by the server is looked up (accessed), and features sent from the digital camera 146 are compared thereto (step S150), thereby extracting similar image candidates having similar features (step S216). Image data of the extracted similar image candidates are, as necessary, subjected to a predetermined compression process to reduce the amount of communication, and then are sent to the digital camera 146, whereby the candidates can be simply displayed on the display unit 160 of the digital camera 146 (step S218). Thereby, user selection can be performed similarly as in the fifth application.
Then, image data of an image candidate extracted (and selected) is sent and output to the digital camera 146; or alternatively, a next operation is carried out in accordance with specified information correlated to the features of the extracted (and selected) image candidate (step S220). In the case of the product advertisement, the next operation can be, for example, description of the product or connection to a mail-order site or returning of a screen of the site, as image data, to the digital camera 146. Further, in the event that an image of an on-the-street signboard has been acquired, also peripheral information of the signboard is retrieved as features. Further, for example, data of the location of a wireless communication base station during communication is compared, thereby to make it possible to present identifications of, for example, the location and address, as information to the user.
[Ninth Application]
A retrieval system of a ninth application will be described herebelow.
The present application retrieves multiple items of image data from a storage 148 by matching using first features in accordance with an acquired image of a retrieval source printout 152. In addition, from the multiple items of image data obtained as a result of that retrieval, the application retrieves a single or multiple items of image data by feature matching using second features, which cover a region narrower than or identical to that of the first features and are high in resolution.
The retrieval system of the present application has a configuration similar to that of the fifth application. Particularly, in the present application, the storage 148 is configured to include a total feature DB containing general features registered as first features, and a detail feature DB containing detail features registered as second features.
As shown in
Similarly as in the fifth application, in the present application, first, an image acquisition unit 154 of a digital camera 146 set in a retrieval mode acquires an image of a retrieval source printout 152 desired to be printed out again in the state where it is pasted onto, for example, a table or a wall face so that at least no omission of the retrieval source printout 152 occurs (step S146).
Then, a total feature extraction process for extracting features from the totality of the image data acquired by the image acquisition unit 154 is performed by a feature extraction unit 156 (step S222). Then, a matching process with the total feature DB, which compares the extracted total features to the total feature DB composed in the storage 148 and containing registered general features and sequentially extracts data with a relatively high similarity, is executed by a matching unit 158 (step S224).
Thereafter, in the feature extraction unit 156, a detail retrieval object region, namely image data of the central region portion of the region of interest in the present example, is further extracted as detail retrieval object image data from the acquired image data of the total region of interest (step S226).
Then, a detail feature extraction process for extracting features from the extracted detail retrieval object image data is performed by the feature extraction unit 156 (step S228). Subsequently, in the matching unit 158, a matching process with the detail feature DB, which compares the extracted detail features to the detail feature DB formed in the storage 148 and having registered detail features and sequentially extracts data with higher similarity, is executed (step S230). In this case, however, feature matching is not performed with all detail features registered in the detail feature DB; it is executed only for the detail features corresponding to the multiple items of image data extracted by the matching process with the total feature DB in step S224. Therefore, although the matching process with the detail features is inherently time-consuming because the resolution is high, the process can be accomplished within a minimum necessary time. As a criterion for the extraction in the matching process with the total feature DB in step S224, a method is employed that either applies a threshold value to the similarity or selects a fixed number of top-ranked items, for example the top 500 items.
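By way of illustration only, the two-stage narrowing of steps S224 and S230 can be sketched as follows. The `Entry` record, the `similarity` measure, and the parameter names `top_n` and `top_k` are assumptions made for this sketch and do not appear in the specification; only the order of the two stages and the fixed selection of top-ranked items follow the description above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    image_id: str
    total_feature: list   # general (first) feature of the whole image
    detail_feature: list  # detail (second) feature of the central region


def similarity(a, b):
    """Hypothetical similarity in (0, 1]; 1.0 means identical vectors."""
    distance = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + distance)


def two_stage_retrieval(query_total, query_detail, db, top_n=500, top_k=3):
    # Stage 1 (step S224): rank all entries by total feature, keep the top_n.
    coarse = sorted(
        db,
        key=lambda e: similarity(query_total, e.total_feature),
        reverse=True,
    )[:top_n]
    # Stage 2 (step S230): detail matching only against the stage-1 survivors.
    fine = sorted(
        coarse,
        key=lambda e: similarity(query_detail, e.detail_feature),
        reverse=True,
    )
    return [e.image_id for e in fine[:top_k]]
```

Because the second stage never touches entries discarded by the first, the cost of the high-resolution comparison is bounded by `top_n` rather than by the size of the database.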
After the image data with high similarity are extracted as original image candidates by the matching process with the detail feature DB, the candidates are displayed on the display unit 160 as image candidates for extraction (step S158), thereby to receive a selection from the user. If an image desired by the user is determined (step S162), then the matching unit 158 sends original image data corresponding to the image candidate stored in the storage 148 to the connected printer 150; and the data is again printed out (step S164).
According to the present application, quality (satisfaction level) of the retrieval result of the original image data and an appropriate retrieval time period are compatible with one another.
Further, the retrieval result incorporating the consideration of the attention region for the photographer can be obtained. More specifically, ordinarily, the photographer acquires an image of a main photographic subject by capturing it in the center of the imaging area. Therefore, as shown in
Further, in retrieval from an original image population for which keyword classification and the like are difficult, the effectiveness as means for performing high speed determination of small differences is high. That is, the retrieval result can be narrowed down in a stepwise manner with respect to a large population.
Also in the present application, the general features and the detail features have to be preliminarily created and registered into the database for one item of original image data. The registration can be performed as described in the fifth application. However, both the features do not necessarily have to be created at the same time. For example, the method can be such that the detail features are created when necessary in execution of secondary retrieval.
Further, the features are not limited to that as shown in, for example,
For example, as shown in
Further, as shown in
Further, as shown in
Thereby, in the event of feature matching with the detail features, a partial region thereof, that is, the region as shown in each of
Although the present application has thus been described in correspondence to the fifth application, the application is, of course, similarly adaptable to the sixth to eighth applications.
[Tenth Application]
A retrieval system of a tenth application will be described herebelow.
The retrieval system of the present application is an example using a digital camera 146 including a communication function. The application is adapted in the case where a preliminarily registered image is acquired to thereby recognize the image, and a predetermined operation (for example, activation of an audio output or predetermined program, or displaying of a predetermined URL) is executed in accordance with the recognition result. Of course, the digital camera 146 with the communication function functions as an imaging-function mounted communication device, and includes a camera mobile phone.
When an image is to be recognized against image data registered as a reference database (so-called dictionary data), it is more efficient and practical to compare the features of images than to compare the images as they are; a feature value database (DB) of features extracted from images is therefore used. The database can be of a built-in type or of a type residing in the server and accessed through communication.
In the present application, an arrangement relationship of feature points of an image is calculated as a combination of vector quantities, and a multigroup thereof is defined to be the feature. In this event, the feature is different in accuracy depending on the number of feature points, such that as the fineness of original image data is higher, a proportionally larger number of feature points are detectable. As such, for the original image data, the feature is calculated under a condition of a highest-possible fineness. In this event, when the feature is calculated for the same image element in accordance with image data with a reduced fineness, the number of feature points is relatively small, such that the feature itself has a small capacity. In the case of a small capacity, while the matching accuracy is low, advantages are produced in that, for example, the matching speed is high, and the communication speed is high.
In the present application, attention is focused on the trade-off described above. More specifically, in the event of registration of image data as reference data (features), when one image element is registered, the features are calculated at a plurality of different finenesses, thereby to configure databases specialized for the respective finenesses.
Corresponding matching servers are connected to the respective databases and arranged to be capable of providing parallel operation. More specifically, as shown in
With the matching process system thus prepared, as shown in
In this event, suppose that the camera resolution is about two million pixels. In this case, also when performing retrieval in the matching server through communication, if matching is performed by using data from a feature DB having a resolution of about two million pixels, an erroneous-recognition ratio is low.
However, matching in a concurrently operating feature DB with a low resolution (VGA class resolution, for example) responds at high speed, and thus the result is transmitted earlier to the digital camera 146. It is advantageous in speed and recognition accuracy to thus arrange the matching servers in parallel corresponding to the resolutions. However, a case can occur in which a response (result) from the high-resolution matching server, which responds later, is different from an already-output result of the low-resolution matching server. In such a case, displaying in accordance with the earlier result is first carried out, and then it is updated to a display in accordance with the later result. In the event of recognition of, for example, a banknote, although the result in the low resolution matching is at a level of “$100 note”, a more detailed or proper result, such as “$100 note with the number HD85866756A”, can be obtained in the high resolution matching owing to the higher fineness. In addition, a displaying manner is also effective in which a plurality of candidates are obtained from the low resolution result, and the candidates are narrowed down to the accurate one as the high resolution result arrives.
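By way of illustration only, the display behavior described above (show the earlier result, then update it when a higher-resolution result arrives) can be sketched as follows. The function names, the numeric `level` used to rank resolutions, and the simulated delays are assumptions of this sketch; real matching servers would be reached through the communication unit rather than through local threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def match_on_server(name, level, delay_s, result):
    """Stand-in for a remote matching server; delay_s simulates response time."""
    time.sleep(delay_s)
    return name, level, result


def recognize(servers):
    """Query all servers in parallel and return the display history."""
    shown = []        # successive displays as (server name, result)
    best_level = -1   # highest resolution level displayed so far
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        futures = [pool.submit(match_on_server, *s) for s in servers]
        for future in as_completed(futures):
            name, level, result = future.result()
            # Only a higher-resolution response may replace the current display.
            if level > best_level:
                best_level = level
                if not shown or shown[-1][1] != result:
                    shown.append((name, result))
    return shown


history = recognize([
    ("VGA", 0, 0.05, "$100 note"),
    ("2M pixels", 1, 0.30, "$100 note with the number HD85866756A"),
])
```

In this sketch the display is never downgraded: a low-resolution response that happens to arrive after a high-resolution one is ignored.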
In addition, as described above, the capacity of the feature itself is large in the high resolution matching server. A feature in an XGA class increases to about 40 kB; however, the capacity is reduced to about 10 kB by preliminary low resolution matching.
Further, in the second or higher matching server and database, when only a difference from a lower low resolution database is retained, a smaller database configuration is realized. This leads to an increase in the speed of the recognition process. It has been verified that, when extraction with feature (method in which area allocation is carried out, and respective density values are compared) is advanced for features, the feature is generally 10 kB or lower, and also multidimensional features obtained by combining the two methods appropriately are useful to improve the recognition accuracy.
As described above, the method in which the resolution of some or entirety of the acquired image surface is divided into multiple resolutions to thereby realize substantial matching hierarchization is effective in both recognition speed and recognition accuracy in comparison with the case in which a plurality of matching servers are simply distributed in a clustered manner.
Especially, the above-described method is a method effective in the case that the number of images preliminarily registered into a database is very large (1000 or larger), and is effective in the case that images with high similarity are included therein.
[Eleventh Application]
A retrieval system of an eleventh application will be described herebelow.
As shown in
The server 198 further includes a feature management database (DB) 202 that contains multiple items of registered features and that performs the hierarchical management thereof. Features to be registered into the feature management DB 202 are created by a feature creation unit 204 from an object image 206 arranged on a paper space 208 by using a desktop publishing (DTP) 210.
That is, in the retrieval system of the present application, the object image 206 is preliminarily printed by the DTP 210 on the paper space 208, and the features of the object image 206 are created by the feature creation unit 204. Then, the created features are preliminarily registered into the feature management DB 202 of the server 198. When a large number of object images 206 to be registered exist, the above-described creation and registration of features are repeatedly performed.
When a user desiring retrieval acquires the object image 206 from the paper space 208 by using the camera 186 of the mobile phone 184, the application software 188 performs feature extraction of an image from the input image. The application software 188 sends the extracted features to the matching process unit 200 of the server 198. Then, the matching process unit 200 performs matching with the features registered in the feature management DB 202. If a matching result is obtained, then the matching process unit 200 sends information of the matching result to the application software 188 of the mobile phone 184 with the camera 186. The application software 188 displays the result information on the display 190.
As described above, in the eleventh application, a plurality of features are extracted from the input image, and a feature set consisting of the features is comparatively matched (subjected to the matching process) with the feature set in units of the preliminarily registered object. Thereby, identification of the identical object is carried out.
The feature point in the image in this case refers to a point having a difference greater than a predetermined level from another pixel in terms of, for example, contrast in brightness, color, distribution of peripheral pixels, differentiation component value, and inter-feature point arrangement. In the eleventh application, the features are extracted and are then registered in units of the object. Then, in the event of actual identification, features are extracted by searching the interior of an input image and are compared to the preliminarily registered data.
Referring to
Then, it is determined whether the comparison with all the recognition elements is finished (step S250). If step S250 is branched to “NO”, the features in the feature set of the next recognition element are input to the matching process unit 200 as comparison data (step S252), and the process returns to step S242.
If step S250 is branched to “YES”, it is determined whether the number of the matching features is greater than or equal to a predetermined value (Y (pieces), in the present example) (step S254). If step S254 is branched to “YES”, then a determination is made that the input object is identical to the object Z, and the result is displayed on the display 190 to notify the user (step S256). Alternatively, if step S254 is branched to “NO”, then a determination is made that the input object and the object Z are not identical to one another (step S258).
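By way of illustration only, the counting decision of steps S250 to S258 can be sketched as follows. The similarity measure, the argument names, and the representation of a feature as a tuple of components are assumptions of this sketch; the specification states only that matched features are counted across all recognition elements and compared with a predetermined value Y.

```python
def feature_similarity(a, b):
    """Hypothetical similarity in (0, 1]; 1.0 when all components are equal."""
    diff = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 / (1.0 + diff)


def is_identical(input_features, registered_elements, threshold, min_matches):
    """Return True when at least min_matches registered features are matched.

    registered_elements holds one feature set per recognition element of the
    registered object; min_matches plays the role of the value Y.
    """
    matched = 0
    for element in registered_elements:      # loop closed by step S250
        for registered in element:
            if any(feature_similarity(f, registered) > threshold
                   for f in input_features):
                matched += 1
    return matched >= min_matches             # decision of step S254
```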
In the event of actual identification, when a numeric value representing the similarity (degree) (difference between respective components of features) exceeds a preset threshold value, the feature is determined to be a similar feature. Further, an object having a plurality of matched features is determined to be identical to the object of the input image. More specifically, features in an input image and a preliminarily registered feature set are compared with one another as described herebelow.
First, the interior of an object is split into a plurality of elements, and the elements are registered. Thereby, in the event of comparative matching between objects, a determination logic is applied for recognition to determine such that the object is not recognized unless a plurality of elements (three elements, for example) are recognized.
Second, suppose that similar objects are shown in an image for object recognition, as in a case where, for example, an S company uses an object OBJ1 (features: A, B, and C) as its logo, and an M company uses an object OBJ2 (features: E, F, and G) as its logo. In addition, the S company and the M company are assumed to be companies competitive with one another. In this case, every effort should be made to prevent confusion between the logos of the two companies. Taking these circumstances into account, according to the eleventh application, in the event that the features A and E are detected at the same time from the same screen, neither of the objects is recognized. That is, the recognition determination is made strict.
Third, conventionally, whatever the number of features is recognized, textual expression for informing the user of the recognition result is the same. As such, in the event that, for example, only some of features have been recognized, and more specifically, in the event that the identity level between the input image and the comparative image includes uncertainty, the actual state cannot be reported to the user. However, according to the eleventh application, when the number of recognition elements is small, the result displaying method (expression method) is altered to provide an expression inclusive of uncertainty such as described above.
With the respective technical measures described above, the following respective effects can be obtained.
First, the probability of causing erroneous recognition due to the identity of only part of the object can be reduced.
Second, a determination reference to be applied particularly when erroneous recognition is desired to be prevented can be specified to be strict.
Third, even when accuracy in the identity determination of the object is lower than a predetermined value, attention is directed to the user, and then the identity determination result can be reported to the user.
In the cases of the object OBJ1 (features: A, B, and C) and the object OBJ2 (features: E, F, and G), in which the features in objects are separately registered, recognition is carried out in accordance with the determination logic described herebelow.
First, unless “A and B and C” is satisfied, recognition of the object OBJ1 is not determined to be successful.
More specifically, in the event of the recognition of the object OBJ1, which consists of the recognition elements or features A, B, and C, when only one or two of A, B, and C are recognized, it is not determined that the recognition of the object OBJ1 is successful.
By way of a modified example of the above, features A, B, and C, respectively, are weighted by allocating weights as evaluation scores. For example, the features are weighted as 1.0, 0.5, and 0.3, respectively. In this case, suppose that recognition is carried out when the total evaluation score is 1.5 or more. When the features A and B are detected as recognition elements, the total evaluation score is 1.5, and hence the object OBJ1 is recognized.
When the features B and C are detected, the object OBJ1 is not recognized.
The evaluation scores of the recognition elements are manageable together with the features of the recognition elements.
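By way of illustration only, the weighted determination can be sketched as follows, using the weights 1.0, 0.5, and 0.3 given above. The names `WEIGHTS`, `SCORE_THRESHOLD`, `total_score`, and `obj1_recognized` are assumptions of this sketch. Because the object OBJ1 is recognized when the features A and B give a total of exactly 1.5, the comparison is written as greater-than-or-equal.

```python
# Evaluation scores of the recognition elements of the object OBJ1.
WEIGHTS = {"A": 1.0, "B": 0.5, "C": 0.3}
SCORE_THRESHOLD = 1.5


def total_score(detected):
    """Sum the evaluation scores of the detected recognition elements."""
    return sum(WEIGHTS[f] for f in set(detected) if f in WEIGHTS)


def obj1_recognized(detected):
    """OBJ1 is recognized when the total evaluation score reaches the threshold."""
    return total_score(detected) >= SCORE_THRESHOLD
```

With these values, detecting A and B yields 1.5 and the object is recognized, whereas detecting B and C yields 0.8 and it is not.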
Further, as logical expressions, the priority of the respective elements can be altered, whereby not only “A and B and C,” but also a combination, such as “A and (B or C)” or “A or (B and C)”, is possible. In “A and B and C” and “A and (B or C)”, the feature A is essential to achieve successful recognition, whereas “A or (B and C)” is also satisfied by the features B and C together.
The above-described examples of the evaluation scores and logical expressions can be used by being combined. More specifically, the priorities of the respective logical expressions and weights of the respective elements can be used by being combined.
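By way of illustration only, the logical expressions can be held as predicates over the set of detected features. The table `RULES` and the function `recognized` are hypothetical names introduced for this sketch.

```python
# Each rule maps the set of detected features to a recognition decision.
RULES = {
    "A and B and C": lambda d: "A" in d and "B" in d and "C" in d,
    "A and (B or C)": lambda d: "A" in d and ("B" in d or "C" in d),
    "A or (B and C)": lambda d: "A" in d or ("B" in d and "C" in d),
}


def recognized(detected, rule):
    """Evaluate the named logical expression against the detected features."""
    return RULES[rule](set(detected))


# Features A and C satisfy "A and (B or C)" but not "A and B and C".
example = (recognized(["A", "C"], "A and (B or C)"),
           recognized(["A", "C"], "A and B and C"))
```

Such a predicate could be combined with a weighted score by requiring both the expression and a minimum total score to hold, although the specification does not fix how the two are combined.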
Second, when “E and A” are extracted, neither the object OBJ1 nor the object OBJ2 is recognized.
For example, reference is again made to the case where the S company using the object OBJ1 as its logo and the M company using the object OBJ2 as its logo are in the competitive relation, and every effort should be made to prevent confusion between the two logos. In this case, when the object OBJ1 used as the logo of the S company and the object OBJ2 used as the logo of the M company are both displayed on the same screen, neither of the logos is recognized. In this case, the system provides the user with a display saying to the effect that the recognition is impossible not because the object images are not detected, but because the recognition elements are detected from both (A, B, and C) and (E, F, and G).
Thus, according to the eleventh application, logos of, for example, companies in the competitive relation are identified in the following manner. The logo is recognized only when just one of the object OBJ1 used as the logo of the S company and the object OBJ2 used as the logo of the M company is displayed on the acquired image. More specifically, when features from only (A, B, and C) or from only (E, F, and G) are detected within one image, the object OBJ1 or the object OBJ2, respectively, is recognized. In other words, when any one of (A, B, and C) and any one of (E, F, and G) are both detected within one image, neither the object OBJ1 nor the object OBJ2 is recognized.
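By way of illustration only, the exclusion rule for competing logos can be sketched as follows. The function name `recognize_logo` and the wording of the returned messages are assumptions of this sketch; the rule itself (no recognition when features of both objects appear in one image) follows the description above.

```python
OBJ1_FEATURES = {"A", "B", "C"}  # logo of the S company
OBJ2_FEATURES = {"E", "F", "G"}  # logo of the M company


def recognize_logo(detected):
    """Return (object name or None, message for the user)."""
    d = set(detected)
    hit1 = bool(d & OBJ1_FEATURES)
    hit2 = bool(d & OBJ2_FEATURES)
    if hit1 and hit2:
        # Elements of both logos were found, so neither is recognized.
        return None, "Recognition is not possible: elements of both OBJ1 and OBJ2 were detected."
    if hit1:
        return "OBJ1", "The object OBJ1 has been recognized."
    if hit2:
        return "OBJ2", "The object OBJ2 has been recognized."
    return None, "No registered object was detected."
```

Returning a distinct message for the conflict case lets the user see that recognition was withheld deliberately rather than because nothing was detected.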
Third, when only partial ones, such as “A and B,” are extracted, the result presentation method is altered (expression is made to include uncertainty).
For example, in recognition of the object OBJ1, when all the recognition elements of the features A, B, and C have been recognizable, the recognition result is presented to the user in a high-tone expression, such as “The object OBJ1 has been recognized”.
Alternatively, when two recognition elements, such as the features A and B, B and C, or A and C, have been recognizable, the recognition result is presented to the user in a low-tone expression reducing the conviction, such as “The object is considered to be the object OBJ1.” Still alternatively, when the number of recognizable elements has been one, the recognition result is presented to the user in an expression including uncertainty, such as “The object OBJ1 may have been recognized.”
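By way of illustration only, the graded presentation can be sketched as a mapping from the number of recognized elements to one of the three expressions quoted above. The function name `result_message` and its handling of the case where no element is recognized are assumptions of this sketch.

```python
def result_message(detected, elements=("A", "B", "C")):
    """Choose the expression for OBJ1 from the number of recognized elements."""
    count = len(set(detected) & set(elements))
    if count == len(elements):
        return "The object OBJ1 has been recognized."
    if count == 2:
        return "The object is considered to be the object OBJ1."
    if count == 1:
        return "The object OBJ1 may have been recognized."
    return None  # nothing recognized; no result is presented
```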
As a modified example of the eleventh application, in the case where the weighting evaluation scores described above are employed, technical measures for the expression method, such as described above, for the presentation of the recognition result in accordance with the total evaluation score to the user can be contemplated. Of course, the technical measures for the expression method, such as described above, for the presentation of the recognition result to the user are adaptable in various cases. For example, the technical measures are also adaptable to recognition of a desired single recognition element. Further, the expression method as described above is adaptable to a case where the recognition result is presented to the user in accordance with, for example, the number of matched features in a recognition element and the level of identity between extracted features and already-registered features.
In the eleventh application, the feature creation unit 204 can be operated in the server 198. The paper space 208 refers to a display surface, and need not necessarily be paper. For example, it can be any one of metal, plastic, and like materials, or can even be an image display apparatus, such as a liquid crystal monitor or plasma television. Of course, information displayed on such surfaces corresponds to information that is displayed in regions of light visible to human beings. However, the information can be invisible to human beings as long as the information is inputtable into the camera 186. Further, since anything acquirable as an image can be an object, the objects may be images such as X-ray images and thermographic images.
In
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details, and illustrated examples shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.
This is a Continuation Application of PCT Application No. PCT/US2007/003653, filed Feb. 13, 2007, which was published under PCT Article 21(2) in English.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/US2007/003653 | Feb 2007 | US |
| Child | 12539786 | | US |