This is the first application filed for the present invention.
The present invention relates to machine vision, and in particular to a global method for object recognition using local moments.
Techniques of optical object recognition have applications in a variety of fields, including automated manufacturing, biomedical engineering, security, and document analysis.
Most object recognition methods measure the similarity between two objects based on a set of features extracted from each of the objects. These methods can be classified into three types—global feature methods, structural feature methods, and relational graph methods—as explained in “Model-Based Recognition in Robot Vision” (R. T. Chin and C. R. Dyer, Computing Surveys, Vol. 18, No. 1, March 1986).
In global feature methods, the object is represented by a feature vector containing n global features of the object. Thus, each object can be viewed as a point in an n-dimensional feature space, and two objects can be compared using a function of their corresponding points in the feature space, such as a measure of distance.
An application usually associated with global methods is object identification, which consists of recognizing an isolated object (target object) contained in an image as one in a bank of predefined model objects. Typically, the object identification process comprises two main phases: model definition and object recognition. During the model definition phase, each of the model objects is represented by a feature vector according to a chosen feature extraction scheme. During the object recognition phase, an image is acquired (e.g. using a gray-scale digital camera, etc.) and pre-processed to isolate individual objects. For one of these isolated objects, the target object, a feature vector is extracted according to the chosen feature extraction scheme. This feature vector is then compared to the feature vectors of the model objects; each target-model pair is assigned a similarity score.
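By way of illustration only, the following Python sketch outlines this two-phase flow. The names (define_models, identify) and the extract_features and score parameters are hypothetical placeholders for whichever feature extraction scheme and similarity measure are chosen; they are not taken from the present text.

    def define_models(model_images, extract_features):
        # Model definition phase: represent each model object by a
        # feature vector under the chosen feature extraction scheme.
        return {name: extract_features(img) for name, img in model_images.items()}

    def identify(target_image, model_vectors, extract_features, score):
        # Object recognition phase: extract the target's feature vector,
        # attribute a similarity score to each target-model pair, and
        # rank the model objects by score.
        v_target = extract_features(target_image)
        scores = {name: score(v_target, v) for name, v in model_vectors.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)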
In object identification applications, the model object as it occurs in the target image may have undergone an affine transformation (e.g. translation, rotation, scale variation, aspect-ratio variation, skew, etc.) and degradation of various types (e.g. erosion and dilation, addition or subtraction of pieces, etc.). For gray-scale images, the model object may have been affected by a change in image contrast or mean gray level.
Therefore, object identification must support the set of affine transformations and be robust to the types of degradation expected for the specific application. Support for the transformations can be achieved by including in the pre-processing stage one or more standardization steps during which the object is roughly centered in the sub-image and normalized to a standard size and orientation; and/or by selecting features that are invariant under the transformations; and/or by choosing a similarity measure that detects the correlation between the vectors of two objects related through an affine transformation.
In global feature methods, the process of feature extraction consists of extracting from the object a set of features adequate for identification purposes, namely a set of features that vary across the model objects and thus enable their differentiation. Various types of features are used in object recognition, as described in “Feature Extraction Methods for Character Recognition—A Survey” (Ø. D. Trier, A. K. Jain and T. Taxt, Pattern Recognition, Vol. 29, No. 4, pp. 641-662, 1996).
A particular class of features used for object recognition are those derived from geometric moments of the object, as described in Moment Functions in Image Analysis (R. Mukundan and K. R. Ramakrishnan, World Scientific Publishing, 1998). Geometric moments of different orders of the object provide different spatial characteristics of the intensity distribution within the object (and of the mass distribution of the object for binary images); for example, moments of order zero and one together provide elementary object descriptors such as the total intensity and the intensity centroid of the object (total mass and center of mass for binary images).
Geometric moments have many desirable characteristics for use as features in object recognition. Central moments that are calculated by shifting the origin of the reference system to the intensity centroid are invariant to translation. Furthermore, geometric moments can be combined to obtain moment features that are invariant to other transformations such as uniform scale variation, aspect-ratio variation and rotation, called geometric moment invariants.
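As a minimal illustration, assuming the intensity distribution is stored as a 2-D NumPy array, geometric and central moments can be computed as follows (the function names are chosen here for exposition):

    import numpy as np

    def raw_moment(img, p, q):
        # Geometric moment m_pq: sum over all pixels of x^p * y^q * intensity.
        y, x = np.mgrid[:img.shape[0], :img.shape[1]]
        return float((x**p * y**q * img).sum())

    def centroid(img):
        # Moments of order zero and one give the total intensity and the
        # intensity centroid (total mass and center of mass for binary images).
        m00 = raw_moment(img, 0, 0)
        return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00

    def central_moment(img, p, q):
        # Central moment mu_pq: the origin is shifted to the intensity
        # centroid, which makes the moment invariant to translation.
        cx, cy = centroid(img)
        y, x = np.mgrid[:img.shape[0], :img.shape[1]]
        return float(((x - cx)**p * (y - cy)**q * img).sum())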
In typical moment-based methods of object recognition, an object is represented by a set of features derived from global moments of the object of various orders. Increasing the order of the moments used provides a larger set of features to represent the object, but also results in a greater sensitivity to noise.
An alternative feature set that does not require the use of higher order moments uses local moments of the object, as opposed to global moments. In a method known as “Zoning” applied to binary images, an n×m (uniform rectangular) grid is superimposed on the object image, and the masses (zero-order moments) of the object in each of the n×m regions are used as features. However, as the outer boundary of the grid is fixed by the outermost points of the object, the addition or subtraction of a piece, even of negligible mass relative to the object, can significantly alter the grid and therefore the extracted features.
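A minimal sketch of this zoning scheme, assuming the image is a 2-D NumPy array with background pixels at zero; note that the grid is anchored to the bounding box of the object's outermost points, which is precisely what makes the features fragile:

    import numpy as np

    def zoning_features(img, n=4, m=4):
        # Bounding box of the object fixes the outer boundary of the grid
        # (assumes a non-empty object).
        ys, xs = np.nonzero(img)
        box = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        # Superimpose an n x m grid and take the mass (zero-order moment)
        # of the object in each cell as a feature.
        rows = np.array_split(np.arange(box.shape[0]), n)
        cols = np.array_split(np.arange(box.shape[1]), m)
        return np.array([box[np.ix_(r, c)].sum() for r in rows for c in cols],
                        dtype=float)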
Another method based on local moments is described in “Scene Classification by Fuzzy Local Moments” (H. D. Cheng and R. Desai, International Journal of Pattern Recognition and Artificial Intelligence, Vol. 12, No. 7, pp. 921-938, 1998). The method locates the intensity centroid of the object, on which it centers an n×m radial grid; the outer boundary of the grid, the nth concentric circle, is fixed by the object point at the greatest radial distance from the center of mass. The features are the first-order radial moments calculated in each of the n×m regions. This method suffers from the same disadvantage as zoning, namely that the size and position of the grid are fixed by the outer boundary of the object, which is vulnerable to degradation. Also, using a radial system of axes is not computationally efficient.
There is therefore a need in the industry for an object recognition method that will overcome the above-identified drawbacks.
Accordingly, it is an object of the invention to provide a method used in object recognition that offers the benefits of local moment-based techniques but with an improved robustness to noise.
According to an embodiment of the invention, there is provided a method for determining a similarity score of a target object with respect to a model object. The target object is in a plane and the model object is represented by a model feature vector. The method comprises generating regions of the plane according to a first mass distribution of the target object and a second mass distribution of a part of the target object. Each of the generated regions has a corresponding mass distribution indicator. The method further comprises calculating a target feature vector for the target object according to at least one of the corresponding mass distribution indicators. Finally, the method computes the similarity score using the target feature vector and the model feature vector.
The detailed description of the invention will refer to the appended drawings, in which:
FIGS. a-d are schematics sequentially illustrating successive steps in the process of generating regions in accordance with an embodiment of the present invention; and
FIGS. a-b are schematics illustrating the processes of generating regions and extracting a feature vector in accordance with an embodiment of the present invention.
The method of the present invention comprises two main phases, namely model definition and object recognition. The model definition phase involves representing the model object by a feature vector, according to a chosen feature extraction scheme. The object recognition phase involves extracting from a target object contained in an image a feature vector according to the chosen feature extraction scheme, and computing a similarity score between this feature vector and the feature vector of the model object.
Model Definition
In the present invention, the model definition phase comprises two main steps: regions of a plane containing the model object are generated according to the mass distribution of the object and of parts of the object (at 200), and a feature vector is extracted from the generated regions (at 202).
The model object can be a suitable image of the object (i.e. an image in which the physical object is neither degraded nor occluded by other objects and taken under suitable conditions of lighting, contrast, noise level, etc.) acquired by an image acquisition device (e.g. gray-scale digital camera, electromagnetic or ultrasonic imaging system, etc.). The acquired image is often converted to another representation, such as a binary representation, or an edge, skeleton or crest representation. Alternatively, the model object can take the form of a so-called “synthetic” description of the object, such as a drawing produced using a software program or a set of mathematical equations describing the mass distribution (or intensity distribution) of the object. In any case, the model object is defined as a set of weighted points in a plane with an associated plane coordinate system.
Referring to the appended drawings, an exemplary model object 23 is shown in a plane with an associated coordinate system; for the purpose of this example, the model object consists of a set of points of equal weight.
The partitioning of the plane for the preferred embodiment is described with reference to the appended drawings. First (at 300), the center of mass of the object is located, together with the center of mass axes, namely the axes parallel to the coordinate axes and passing through the center of mass; these global center of mass axes define a first partition of the plane into four quadrants. A second partition (at 302) is then performed according to the local mass distribution of the part of the object contained in each quadrant, and a set of regions is selected from the disjoint parts resulting from the partition.
In another preferred embodiment of the invention, the second partition (at 302) is performed by selecting each of the quadrants bordered by the center of mass axes, and for each of these quadrants, locating the axes parallel to the center of mass axes passing through the center of mass of the quadrant. Again, the global and local center of mass axes define a partition of the plane into disjoint parts. In this embodiment, the set of regions selected consists of all the disjoint parts resulting from the partition.
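The following Python sketch implements one reading of this region-generation step, in which the local center of mass axes of each quadrant subdivide that quadrant, yielding sixteen regions; the handling of an empty quadrant is an assumption added here, as the text does not address that case:

    import numpy as np

    def center_of_mass(img, mask):
        # Center of mass of the object's intensity restricted to a region mask.
        y, x = np.mgrid[:img.shape[0], :img.shape[1]]
        w = img * mask
        m00 = w.sum()
        return (x * w).sum() / m00, (y * w).sum() / m00

    def generate_regions(img):
        # First partition: the global center of mass axes split the plane
        # into four quadrants. Second partition: the local center of mass
        # axes of each quadrant split it again, giving 4 x 4 = 16 regions.
        y, x = np.mgrid[:img.shape[0], :img.shape[1]]
        cx, cy = center_of_mass(img, np.ones(img.shape, dtype=bool))
        regions = []
        for q in [(x < cx) & (y < cy), (x >= cx) & (y < cy),
                  (x < cx) & (y >= cy), (x >= cx) & (y >= cy)]:
            if (img * q).any():
                lx, ly = center_of_mass(img, q)
            else:
                lx, ly = cx, cy  # empty quadrant: fall back to the global axes
            regions += [q & (x < lx) & (y < ly), q & (x >= lx) & (y < ly),
                        q & (x < lx) & (y >= ly), q & (x >= lx) & (y >= ly)]
        return regions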
Other indicators of the distribution of mass of the object may be used to partition the object, such as other indicators derived from geometric moments, or alternatively the median.
Referring to the appended drawings, the second step of the model definition phase consists of extracting a feature vector from the generated regions (at 202), each feature being derived from the local mass distribution of the object in one of the regions.
In a preferred embodiment, each feature extracted from a region is a coordinate (x or y) of the center of mass of the object in this region. Extraction of the feature vector for the preferred embodiment proceeds region by region: denoting by $(x_i^M, y_i^M)$ the center of mass of the model object in the i-th of the n generated regions, the model feature vector is

$$V_M = (x_1^M, y_1^M, x_2^M, y_2^M, \ldots, x_n^M, y_n^M)$$
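Continuing the sketch above, a feature extractor in this spirit takes the center of mass of the object in each region; expressing the coordinates relative to the global center of mass is an assumption made here to attach the coordinate system to the object, in line with the translation-invariance remark later in this text:

    import numpy as np

    def feature_vector(img):
        # Interleaved (x1, y1, ..., xn, yn) vector of local centers of mass,
        # taken relative to the global center of mass so the features do not
        # change under translation of the object in the image.
        cx, cy = center_of_mass(img, np.ones(img.shape, dtype=bool))
        feats = []
        for region in generate_regions(img):
            if (img * region).any():
                lx, ly = center_of_mass(img, region)
                feats += [lx - cx, ly - cy]
            else:
                feats += [0.0, 0.0]  # empty region: assumed zero feature
        return np.array(feats)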
Object Recognition
During the object recognition phase, a feature vector is extracted from a target object contained in an image according to the same feature extraction scheme as used during the model definition phase, and a similarity score is then computed between this feature vector and the feature vector of the model object.
Preliminary steps of the object recognition phase include acquiring and pre-processing an image to obtain the target object. First, an image of a scene including a target object for comparison with a model object is acquired using an image acquisition device, such as any of those listed above for obtaining the model object. The model object as it occurs in the target image may have undergone certain affine transformations (e.g. translation, rotation, scale variation, aspect-ratio variation) and degradation of various types (e.g. erosion and dilation, addition or subtraction of pieces, etc.). Second, the image is pre-processed. Objects in the image are isolated using any method known in the art (e.g. blob analysis, etc.). One or more standardization steps may be applied to the image for the method to support some of the expected transformations; for example, the image may be scaled or rotated for the image objects to appear in a standard size and orientation. Also, the image may be converted to another representation, such as a binary representation, or an edge, skeleton or crest representation. In any case, each isolated object is defined, like the model object, as a set of weighted points; these isolated objects appear in a plane with an associated coordinate system. In a final pre-processing step, a target object is selected among the isolated objects.
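As an illustration of the isolation step only, the following sketch binarizes an image and labels connected components, a simple stand-in for the blob analysis mentioned above; the threshold value is arbitrary:

    import numpy as np
    from scipy import ndimage

    def isolate_objects(image, threshold=128):
        # Binarize, then label connected components; each component is
        # returned as a masked copy of the full plane, so every isolated
        # object keeps the image coordinate system.
        binary = image > threshold
        labels, count = ndimage.label(binary)
        return [np.where(labels == k, image, 0) for k in range(1, count + 1)]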
An exemplary target object 33 and another isolated object 36 are shown in a plane 34 with an associated coordinate system 35. For the purpose of this example, the isolated objects consist of sets of points of equal weight. The target object 33 is an occurrence of the model object 23 that has undergone translation, a variation in scale, and degradation, in particular the addition of a long piece 37.
Regions are generated (at 800) in exactly the same way as during the model definition phase (at 200), the partition now being determined by the mass distribution of the target object itself.
The feature vector is extracted from the generated regions (at 802) in precisely the same way as during the model definition phase (at 202). Therefore, in a preferred embodiment, each feature is a coordinate (x or y) of the center of mass of the target object in one of the generated regions, and the target feature vector is

$$V_T = (x_1^T, y_1^T, x_2^T, y_2^T, \ldots, x_n^T, y_n^T)$$
In the proposed feature extraction scheme, both the method of partitioning and the features themselves present advantages.
First, the method of partitioning a plane containing the object using successive partitions based on local mass distributions of the object is more robust to degradation of the boundary of the object than other methods known in the art. In the present method, a piece added to or subtracted from the object affects the partition proportionally to its mass relative to the object. Thus, a piece of relatively small mass (e.g. 37) will not significantly affect the overall partition or the extracted features as a whole; its effect will be largely restricted to the regions containing the addition or subtraction of mass. Second, the extracted features are derived from first and zero order moments, which are less sensitive to noise than moments of higher order. The local moments are represented in a coordinate system attached to the object and therefore are invariant to translation of the object in the image. Also, using orthogonal coordinate axes parallel to the natural axes of the image is more computationally efficient than using a radial system of axes, for example.
The final step of the object recognition phase consists of computing a similarity score between the target and model object feature vectors (at 804). This similarity score is computed using a set of weights assigned to the features. In a preferred embodiment of the invention, all the weights are equal to one, which is equivalent to not using weights.
A commonly used measure of the correspondence between two moment-based feature vectors is the Euclidean distance. The Euclidean distance D between the target and model object feature vectors $V_T$ and $V_M$ is given by:

$$D = |V_T - V_M| = \sqrt{\sum_{i=1}^{2n} \left(v_i^T - v_i^M\right)^2}$$

where $v_i^T$ and $v_i^M$ denote the components of $V_T$ and $V_M$ respectively.
This measure is simple to compute, but for the method to support a scale difference between the target and model objects, one of the objects or feature vectors must be scaled beforehand.
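In code, this measure is a one-liner (NumPy assumed):

    import numpy as np

    def euclidean_distance(v_target, v_model):
        # D = |V_T - V_M|; any scale difference between the two objects
        # must be normalized away beforehand for D to be meaningful.
        return float(np.linalg.norm(v_target - v_model))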
An alternative measure that is scale invariant is related to the angle θ between the target and model object feature vectors. The scale invariant similarity score S is given by:

$$S = \cos\theta = \frac{V_T \cdot V_M}{|V_T|\,|V_M|}$$
However, this score does not support independent scale variations in the x and y directions or, equivalently, transformations affecting the aspect ratio of the object. A score invariant to aspect-ratio variations can be computed as follows. First (at 1000), each feature vector is separated into an x feature vector and a y feature vector:

$$V_x^T = (x_1^T, x_2^T, \ldots, x_n^T), \qquad V_y^T = (y_1^T, y_2^T, \ldots, y_n^T)$$

$$V_x^M = (x_1^M, x_2^M, \ldots, x_n^M), \qquad V_y^M = (y_1^M, y_2^M, \ldots, y_n^M)$$
Second (at 1002), two independent similarity scores $S_x$ and $S_y$ are computed for the x and y feature vectors respectively:

$$S_x = \frac{V_x^T \cdot V_x^M}{|V_x^T|\,|V_x^M|}, \qquad S_y = \frac{V_y^T \cdot V_y^M}{|V_y^T|\,|V_y^M|}$$
Finally (at 1004), the global similarity score between the target and model objects is computed as the average of the two independent similarity scores:

$$S = \frac{S_x + S_y}{2}$$
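A sketch of this aspect-ratio invariant score, assuming the interleaved (x1, y1, x2, y2, ...) feature layout used above and the cosine form of the scale invariant score:

    import numpy as np

    def cosine_score(a, b):
        # S = cos(theta) between two feature vectors; invariant to a
        # uniform scaling of either vector.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def aspect_ratio_invariant_score(v_target, v_model):
        # Separate the x features (even indices) from the y features (odd
        # indices), score each pair independently, and average the scores.
        sx = cosine_score(v_target[0::2], v_model[0::2])
        sy = cosine_score(v_target[1::2], v_model[1::2])
        return 0.5 * (sx + sy)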
In another embodiment of the invention, non-trivial weights may be used. During a preliminary stage, each pair of corresponding features of the target and model feature vectors is assigned a weight, and the weights are collected in a weight matrix W.
The weight matrix can be used with any of the similarity scores described previously. The weighted Euclidean distance $D_W$ and the weighted scale invariant score $S_W$ are given by:

$$D_W = |V_T - V_M|_W, \qquad S_W = \frac{(V_T \cdot V_M)_W}{|V_T|_W\,|V_M|_W}$$

where

$$|A|_W = \sqrt{A'WA}, \qquad (A \cdot B)_W = A'WB$$

and where $A'$ denotes the transpose of $A$.
Similarly, a weighted aspect-ratio invariant score can be obtained by averaging the weighted similarity scores $S_x^{W_x}$ and $S_y^{W_y}$ for the x and y feature vectors, given by:

$$S_x^{W_x} = \frac{(V_x^T \cdot V_x^M)_{W_x}}{|V_x^T|_{W_x}\,|V_x^M|_{W_x}}, \qquad S_y^{W_y} = \frac{(V_y^T \cdot V_y^M)_{W_y}}{|V_y^T|_{W_y}\,|V_y^M|_{W_y}}$$

where $W_x$ and $W_y$ are the weight matrices associated with the x and y features respectively.
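A sketch of the weighted measures, transcribing the definitions above; W is assumed to be a symmetric positive-definite NumPy matrix, such as a diagonal matrix of per-feature weights:

    import numpy as np

    def weighted_norm(a, W):
        # |A|_W = sqrt(A' W A)
        return float(np.sqrt(a @ W @ a))

    def weighted_euclidean(v_target, v_model, W):
        # D_W = |V_T - V_M|_W
        return weighted_norm(v_target - v_model, W)

    def weighted_cosine(v_target, v_model, W):
        # S_W = (V_T . V_M)_W / (|V_T|_W |V_M|_W), with (A . B)_W = A' W B
        dot_w = float(v_target @ W @ v_model)
        return dot_w / (weighted_norm(v_target, W) * weighted_norm(v_model, W))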
An application of the present invention is object identification, which consists of matching a target object contained in an image to one or more model objects contained in a bank of model objects. In this case, similarity scores are computed between the target object and each of the model objects, and these scores are then used to select one or more model objects based on certain criteria.
Given a specific bank of model objects, some of the extracted features vary more over the set of model objects than others and therefore contribute more to differentiating the similarity scores computed between a target object and each of the model objects. In order to accentuate the differences between the similarity scores, larger weights can be assigned to features associated with greater model object variability. In a preferred embodiment, a common weight matrix is assigned to the bank of model objects during a training stage.
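One plausible training rule, not prescribed by the present text, is to weight each feature by its variance across the bank of model objects:

    import numpy as np

    def variability_weights(model_vectors):
        # Features that vary more over the model objects contribute more to
        # differentiating the similarity scores, so they receive larger
        # weights; the variance criterion itself is an assumption.
        V = np.stack(list(model_vectors))  # one row per model feature vector
        w = V.var(axis=0)
        total = w.sum()
        return np.diag(w / total if total > 0 else w)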
The embodiments of the invention described above are intended to be exemplary only. The scope of the invention is therefore intended to be limited solely by the scope of the appended claims.