The field of the invention is that of the detection of points of interest, also called salient points, in a digital image. More specifically, the invention relates to a technique for the detection of points of interest implementing a wavelet-type approach.
A point of interest may be considered to be the representative of a spatial region of the image conveying a substantial portion of information.
Historically, the notion of the salient point has been proposed in the field of computer vision, where one of the major problems consists of the detection of the corners of the objects (whence the term “salient” used here below as a synonym for the term “of interest”). This term was subsequently broadened to include other characteristics of images such as contours, junctions etc.
In image processing, the detection of the salient points corresponding to the corners of the objects is of little interest. Indeed, the corners are generally isolated points, representing only a small part of the information contained in the image. Furthermore, their detection generates clusters of salient points in the textured or noisy regions.
Various other techniques have been proposed, relating especially to the salient points corresponding to the high frequency zones, namely to the contours of the objects. The invention can be applied more specifically to this type of technique.
A more detailed description is given here below of the different techniques for the detection of salient points.
The detection of salient points (also called points of interest) in images is a problem that has given rise to much research for many years. This section presents the main approaches classically used in the literature. Reference may be made to the document [5] (the documents referred to are listed together in the appendix B) for a more detailed review of the prior art.
One of the first methods was proposed by Harris and Stephens [7] for the detection of corners. Points of this type were deemed then to convey a major quantity of information and were applied in the field of computer vision.
To define this detector, the following quantity is defined at each point p(x,y) of the image I:
$R_{x,y} = \mathrm{Det}(M_{x,y}) - k\,\mathrm{Tr}(M_{x,y})^2$
where $M_{x,y}$ is a matrix defined by:
where:
The salient points are then defined by the positive local extrema of the quantity $R_{x,y}$.
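The response $R_{x,y}$ can be illustrated with a minimal sketch. Since the definition of the matrix is not reproduced here, the sketch assumes the usual form (products of first derivatives summed over a local window); the 3×3 box window, the central differences, the value k = 0.04 and the name `harris_response` are all illustrative assumptions, not taken from [7].

```python
def harris_response(img, k=0.04):
    """Return R = Det(M) - k * Tr(M)**2 at each pixel of a 2D list of floats.

    Assumptions (not taken from the text): M is the 2x2 matrix of derivative
    products summed over a 3x3 window, derivatives are central differences,
    and k = 0.04. Border pixels keep a response of zero.
    """
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy[y][x] = (img[y + 1][x] - img[y - 1][x]) / 2.0
    resp = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = b = c = 0.0  # window sums of Ix^2, Iy^2 and Ix*Iy
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    a += gx * gx
                    b += gy * gy
                    c += gx * gy
            det, tr = a * b - c * c, a + b
            resp[y][x] = det - k * tr * tr
    return resp
```

Under these assumptions the response is positive near a corner, negative along a straight edge and zero in a flat region, which is why only the positive local extrema are retained.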
In [5], the authors also propose a more precise version of the Harris and Stephens detector. This version replaces the computation of the derivatives of the image I by a precise computation of the Gaussian kernel.
The Harris and Stephens detector presented here above has been extended to the case of color pictures in [6]. To do this, the authors extend the definition of the matrix $M_{x,y}$, which then becomes:
where:
In [10], the authors consider the salient points to be the points of the image showing high contrast. To build a detector of this kind, the authors use a multiple-resolution approach based on the construction of a Gaussian pyramid.
Let it be assumed that the image I has a size $2^N \times 2^N$. We can define a pyramid with N levels where the level 0 corresponds to the original image and the level N−1 corresponds to a one-pixel image.
At the level k of the pyramid, the contrast of the point P is defined by:
where $G_k(P)$ defines the local luminance at the point P and at the level k, and $B_k(P)$ defines the luminance of the local background at the point P and at the level k.
These two variables are computed at each point and for each level of the pyramid. They can therefore be represented by two pyramids called a luminance pyramid and a background pyramid and defined by:
where:
In this approach, a salient point is a point characterized by a high value of the local contrast. In order to take account of the non-symmetry of the variable $C_k$, the authors introduce a new variable in order to obtain a zero value for a situation of non-contrast and a value greater than 0 everywhere else.
This new variable is defined by:
With this new variable, the salient points are defined by the local maximum values of $C^*_k$ greater than a fixed threshold.
The salient points detector initially presented in [11] is doubtless the closest to the present invention since it is also based on the use of the theory of wavelets. Indeed, it is the view of the authors that the points conveying a major part of the information are localized in the regions of the image having high frequencies.
By using wavelets with compact support, the authors are able to determine the set of points of the signal f (assumed for the time being to be one-dimensional) that were used to compute any wavelet coefficient whatsoever D2
On the basis of this observation, the hierarchy of wavelet coefficients is built. For each resolution level and for each wavelet coefficient D2
C(D2
where p denotes the regularity of the wavelet base used (i.e. the size of the wavelet filter) and N denotes the length of the original signal f.
Thus, each wavelet coefficient D2
This coefficient therefore needs to be considered at this level of resolution. By applying this process recursively, a coefficient D2
To extend this approach to the 2D signals constituted by the images, the authors apply the same approach to each of the three subbands D2
This technique has been used especially in image indexation in [9].
As shown in the previous section, many methods have been proposed in the literature for the detection of salient points.
The major difference between these approaches lies in the very definition of a salient point. Historically, researchers in the field of computer vision have devoted attention to the corners of objects. It is thus that the Harris and Stephens detector [7] was proposed. This detector has recently been extended to color in [6]. The corners of objects do not, however, represent relevant information in the field of image processing. Indeed, in the case of weakly textured images, these points will be scattered in space and will not give any satisfactory representation of the image. In the case of textured or noisy images, the points will all be concentrated in the textures and will give only a local and non-comprehensive representation of the image.
The definition of contrast-based salience [10] is appreciably more interesting for image processing. Unfortunately, this approach suffers from the same defect as the previous one in the case of textured or noisy regions.
It is therefore a particular aim of the invention to overcome the different drawbacks of the prior art.
More specifically, it is an aim of the invention to provide a technique for the detection of salient points corresponding to a high frequency, and giving preference to no particular direction in the image.
It is another aim of the invention to provide such a technique that calls for a reduced number of operations as compared with prior art techniques.
In particular, it is a goal of the invention to provide a technique of this kind enabling the use of wavelet bases with a large-sized support.
These goals, as well as others that shall appear more clearly here below, are achieved by means of a method for the detection of points of interest in a source digital image, said method implementing a wavelet transformation associating a sub-sampled image, called a scale image, with a source image, and wavelet coefficients corresponding to at least one detail image, for at least one level of decomposition, a point of interest being a point associated with a region of the image showing high frequencies.
According to the invention, this method comprises the following steps:
In the present document, for the sake of simplification, the term “source image” is applied to an original image or an image having undergone pre-processing (gradient computations, change of colorimetrical space etc.).
Advantageously, for each level of decomposition, at least two detail images, respectively corresponding to at least two directions predetermined by said wavelet transformation, are determined.
This wavelet transformation may use especially first-generation or second-generation (mesh-based) wavelets.
In particular, the detail images may comprise:
Advantageously, the method of the invention comprises a step for merging the coefficients of said detail images so as not to give preference to any direction of said source image.
Advantageously, said step for the construction of a tree structure relies on a zerotree type of approach.
Thus, preferably, each point of the scale image having minimum resolution is the root of a tree with which is associated at least one offspring node respectively formed by each of the wavelet coefficients of each of said detail image or images localized at the same position, and then recursively, four offspring nodes are associated with each offspring node of a given level of resolution, these four associated offspring nodes being formed by the wavelet coefficients of the detail image that is of a same type and at the previous resolution level, associated with the corresponding region of the source image.
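The parent and offspring relationship described above can be sketched as follows. The indexing (2x+u, 2y+v) is the usual zerotree convention and is an assumption here, as are the function names; the text only states that the four offspring cover the corresponding region at the next finer resolution.

```python
def offspring(x, y):
    """Four offspring positions of the coefficient at (x, y), one level finer.

    Assumed convention: the offspring of (x, y) are (2x+u, 2y+v), u, v in {0, 1}.
    """
    return [(2 * x + u, 2 * y + v) for v in (0, 1) for u in (0, 1)]


def tree_positions(x, y, depth):
    """Positions of all nodes below one root coefficient, level by level."""
    levels = [[(x, y)]]
    for _ in range(depth - 1):
        levels.append([c for (px, py) in levels[-1] for c in offspring(px, py)])
    return levels
```

For a decomposition over four resolution levels, a root coefficient of one detail image thus heads a tree of 1, 4, 16 and 64 coefficients at successively finer levels.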
According to an advantageous aspect of the invention, said selection step implements a step for the construction of at least one salience map, assigning each of said wavelet coefficients a salience value representing its interest. Preferably, a salience map is built for each of said resolution levels.
Advantageously, for each of said salience maps, for each salience value, a merging is performed of the pieces of information associated with the three wavelet coefficients corresponding to the three detail images so as not to give preference to any direction in the image.
According to a preferred aspect of the invention, a salience value of a given wavelet coefficient having a given level of resolution takes account of the salience value or values of the descendant wavelet coefficients of said given wavelet coefficient in said tree structure.
Preferably, a salience value is a linear relationship of the associated wavelet coefficients.
In a particular embodiment of the invention, the salience value of a given wavelet coefficient is computed from the following equations:
In these equations, the parameter $\alpha_k$ may for example have a value $-1/r$ for all the values of k.
According to another preferred aspect of the invention, said selection step comprises a step for building a tree structure of said salience values, the step advantageously relying on a zerotree type approach.
In this case, said selection step advantageously comprises the steps of:
According to a preferred aspect of the invention, said step for the selection of the branch having the highest salience value implements a corresponding scan of the tree starting from its root and a selection, at each level of the tree, of the offspring node having the highest salience value.
As already mentioned, the invention enables the use of numerous wavelet transformations. One particular embodiment implements the Haar base.
One particular embodiment chooses a minimum level of resolution $2^{-4}$.
The method of the invention may furthermore include a step for the computation of an image signature, from a predetermined number of points of interest of said image.
Said signature may thus be used especially to index images by their content.
More generally, the invention can be applied in many fields, and for example for:
The invention also relates to devices for the detection of points of interest in a source digital image implementing the method as described here above.
The invention also relates to computer programs comprising program code instructions for the execution of the steps of the method for the detection of points of interest described here above, and the carriers of digital data that can be used by a computer carrying such a program.
Other characteristics and advantages of the invention shall appear from the following description of a preferred embodiment, given by way of a simple illustrative and non-exhaustive example and from the appended drawings, of which:
FIGS. 6a and 6b illustrate experimental results of the method of the invention,
5.0 General Principles
One aim of the invention therefore is the detection of the salient points of an image I. These points correspond to the pixels of I belonging to high-frequency regions. This detection is based on wavelet theory [1] [2] [3]. Appendix A briefly presents this theory.
The wavelet transform is a multi-resolution representation of the image enabling the image to be expressed at the different resolutions ½, ¼, etc. Thus, at each level of resolution $2^j$ ($j \le -1$), the wavelet transform represents the image I, sized $n \times m = 2^k \times 2^l$ ($k, l \in Z$), in the form:
Each of these images is sized $2^{k+j} \times 2^{l+j}$.
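As a small worked example of these sizes, the following sketch lists the dimensions of the images at each resolution level. The function name and the sample values of k, l and r are illustrative assumptions.

```python
def subband_shapes(k, l, r):
    """Sizes of the coarse and detail images for resolutions 2**-1 down to 2**r.

    The source image is assumed to have 2**k lines and 2**l columns, r <= -1.
    """
    return {j: (2 ** (k + j), 2 ** (l + j)) for j in range(-1, r - 1, -1)}


shapes = subband_shapes(k=8, l=9, r=-4)
for j, (rows, cols) in shapes.items():
    print(f"resolution 2^{j}: {rows} x {cols}")
```

For a 256×512 image and r = −4, the images at the minimum resolution are sized 16×32, i.e. they contain $2^{k+l+2r}$ = 512 points.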
Each of these three images is obtained from A2
The invention therefore consists of choosing first of all a wavelet base and a minimum level of resolution 2r(r≦−1). Once the wavelet transformation has been effected, it is proposed to scan each of the three detail images D2
Thus a coefficient having significant salience corresponds to a region of I having high frequencies. Indeed, a wavelet coefficient having a high-value modulus at the resolution 2r(r≦−1) corresponds to a contour of the image A2
From the built-up salience map, the invention proposes a method for choosing, from among the $2^{-r} \times 2^{-r}$ pixels of I, the pixel that most represents this zone.
In terms of potential applications, the detection of salient points in the images may be used non-exhaustively for the following operations:
The technique of the invention differs from that proposed by E. Loupias and N. Sebe [11]. The main differences are:
The Loupias and Sebe method considers the subbands independently of each other thus leading them to the detection, by priority, of the maximum gradient points in every direction (i.e. the corners). For our part, we merge the information contained in the different subbands so that no preference is given to any particular direction.
5.1 Wavelet Transformation
Wavelet transformation is a powerful mathematical tool for the multi-resolution analysis of a function [1] [2] [3]. Appendix A provides a quick overview of this tool.
In the invention, the functions considered are digital images, i.e. discrete 2D functions. Without loss of generality, we assume here that the processed images are sampled on a discrete grid of n lines and m columns, with values in a sampled luminance space containing 256 values. Furthermore, it is assumed that $n = 2^k$ ($k \in Z$) and that $m = 2^l$ ($l \in Z$).
If the original image is referenced I, we then have:
As mentioned in section 4, the wavelet transformation of I enables a multi-resolution representation of I. At each level of resolution 2j(j≦−1), the representation of I is given by a coarse image A2
Wavelet transformation necessitates the choice of a scale function Φ(x) as well as the choice of a wavelet function Ψ(x). From these two functions, a scale filter H and a wavelet filter G are derived, their respective pulse responses h and g being defined by:
h(n)=<φ2
g(n)=<Ψ2
Let $\tilde{H}$ and $\tilde{G}$ respectively denote the mirror filters of H and G (i.e. $\tilde{h}(n) = h(-n)$ and $\tilde{g}(n) = g(-n)$).
It can then be shown [1] (cf.
5.2 Construction of the Tree Structure with Wavelet Coefficients
Once the wavelet transformation has been made up to the resolution $2^r$ ($r \le -1$), we have available:
A tree structure of wavelet coefficients is then built by the zerotree technique [4]. The trees are built as follows (cf. FIG. 3):
Recursively, the tree structure is constructed wherein each wavelet coefficient α2
Once the tree structure is constructed, each wavelet coefficient α2
5.3 Construction of the Salience Maps
Starting from the tree structure obtained by the preceding step, we propose to build a set of −r salience maps (i.e. one salience map per level of resolution). Each salience map S2
It must be noted that each wavelet coefficient gives preference to one direction (horizontal, vertical or oblique) depending on the detail image to which it belongs. However, we have chosen to favor no particular direction and have therefore merged the information contained in the three wavelet coefficients α2
Furthermore, the salience of each coefficient with the resolution $2^j$ must take account of the salience of its offspring in the tree structure of the coefficients.
In order to take account of all these properties, the salience of a coefficient localized at (x,y) with the resolution $2^j$ is given by the following recursive relationship:
Where:
As can be seen in the Equation 1, the salience of a coefficient is a linear relationship of the wavelet coefficients. Indeed, as mentioned in section 4, we consider the salient points to be pixels of the image belonging to high-frequency regions. Now, a high wavelet coefficient α2
Thus, the formulation of the salience of a given image in the Equation 1 is warranted.
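The properties stated in this section (merging of the three detail images, linearity in the moduli of the wavelet coefficients, and account taken of the offspring) can be illustrated by the sketch below. Equation 1 is not reproduced here, so the exact weighting is an assumption: each modulus is normalised by the maximum of its detail image, the three are averaged, and the result is combined in equal parts with the mean salience of the four offspring. The function name and data layout are hypothetical.

```python
def salience_maps(details, alphas):
    """Build one salience map per resolution level, finest level first.

    details[i] = (D1, D2, D3) at resolution 2**-(i+1), each a list of rows.
    alphas[i] is the weight given to that level.
    ASSUMED form (not the text's Equation 1): normalised moduli of the three
    detail images are averaged, then mixed 50/50 with the mean salience of
    the four offspring at positions (2x+u, 2y+v).
    """
    maps = [None] * len(details)
    for i, subbands in enumerate(details):
        h, w = len(subbands[0]), len(subbands[0][0])
        # Maximum modulus of each detail image (1.0 avoids dividing by zero).
        maxima = [max(abs(v) for row in D for v in row) or 1.0 for D in subbands]
        S = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                merged = sum(abs(D[y][x]) / m for D, m in zip(subbands, maxima)) / 3.0
                own = alphas[i] * merged
                if i == 0:
                    S[y][x] = own  # finest level: no offspring
                else:
                    finer = maps[i - 1]
                    mean_offspring = sum(
                        finer[2 * y + v][2 * x + u] for u in (0, 1) for v in (0, 1)
                    ) / 4.0
                    S[y][x] = 0.5 * (own + mean_offspring)
        maps[i] = S
    return maps
```

With this form, a coefficient is salient either because its own modulus is high in any of the three directions or because its offspring are salient, so that no direction is given preference.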
5.4 Choice of the Salient Points
Once the construction of the salience maps is completed, we propose a method in order to choose the most salient points in the original image.
To do this, we build a tree structure of the salience values from the −r built-up salience maps. In a manner similar to the building of the tree structure of the wavelet coefficients, we can build $2^{k+l+2r}$ trees of salience coefficients, each having a coefficient of S2
In order to localize the most salient points in I, we carry out:
In order to select this branch, it is proposed to scan the tree from the root. During this scan a selection is made, at each level of the tree, of the offspring node having the greatest salience value (cf.
$\mathrm{SalientBranch} = \{ s_{2^r}(x_1, y_1),\ s_{2^{r+1}}(x_2, y_2),\ \ldots,\ s_{2^{-1}}(x_{-r}, y_{-r}) \}$
with
(xk,yk)=Arg Max{s2
From the most salient branch of each tree, the pixel of I chosen as being the most representative pixel of the branch is localized at $(2x_{-r}, 2y_{-r})$. In practice, only a subset of the $2^{k+l+2r}$ trees is scanned. Indeed, for many applications, a search is made for a fixed number n of salient points. In this case, it is appropriate to scan only the n trees having the most salient roots.
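The scan described in this section can be sketched as follows. The data layout (a list of salience maps ordered from the minimum resolution to the finest), the offspring indexing (2x+u, 2y+v) and the function names are assumptions made for the illustration.

```python
def salient_point(maps, root_x, root_y):
    """Follow the most salient branch from a root down to the finest level.

    maps[0] is the salience map at the minimum resolution 2**r and maps[-1]
    the map at resolution 2**-1; each map is a list of rows indexed [y][x].
    Returns the pixel of the original image chosen to represent the branch.
    """
    x, y = root_x, root_y
    for level in maps[1:]:
        candidates = [(2 * x + u, 2 * y + v) for v in (0, 1) for u in (0, 1)]
        # Keep the offspring node with the greatest salience value.
        x, y = max(candidates, key=lambda p: level[p[1]][p[0]])
    return 2 * x, 2 * y


def top_salient_points(maps, n):
    """Scan only the n trees having the most salient roots."""
    roots = [(x, y) for y in range(len(maps[0])) for x in range(len(maps[0][0]))]
    roots.sort(key=lambda p: maps[0][p[1]][p[0]], reverse=True)
    return [salient_point(maps, x, y) for x, y in roots[:n]]
```

Each tree is scanned along a single branch, so the cost of locating one salient point grows with the number of resolution levels rather than with the number of pixels covered by the root.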
In this section, we use the technical elements presented in the previous section for which we set the necessary parameters in order to describe a particular embodiment.
6.1 Choice of Wavelet Transformation
As mentioned in section 5.1, we must first of all choose a wavelet base and a minimum resolution level $2^r$ ($r \le -1$).
For this particular embodiment, we propose to use the Haar base and r=−4.
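The Haar base is defined by:
$\Phi(x) = \begin{cases} 1 & \text{if } 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$
for the scale function, and by:
$\Psi(x) = \begin{cases} 1 & \text{if } 0 \le x < 1/2 \\ -1 & \text{if } 1/2 \le x < 1 \\ 0 & \text{otherwise} \end{cases}$
for the wavelet function.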
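A separable 2D decomposition with the Haar base can be sketched as below. The normalisation (plain averages and differences divided by four), the naming of the three detail images and the function names are illustrative assumptions; an orthonormal implementation would scale the coefficients differently.

```python
def haar_step(image):
    """One level of a separable 2D Haar decomposition.

    Returns (A, D1, D2, D3): the coarse image and three detail images, each
    half the size of the input (a list of rows with even dimensions).
    """
    h, w = len(image) // 2, len(image[0]) // 2
    A = [[0.0] * w for _ in range(h)]
    D1 = [[0.0] * w for _ in range(h)]
    D2 = [[0.0] * w for _ in range(h)]
    D3 = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            a, b = image[2 * y][2 * x], image[2 * y][2 * x + 1]
            c, d = image[2 * y + 1][2 * x], image[2 * y + 1][2 * x + 1]
            A[y][x] = (a + b + c + d) / 4.0
            D1[y][x] = (a + b - c - d) / 4.0  # low-pass along x, high-pass along y
            D2[y][x] = (a - b + c - d) / 4.0  # high-pass along x, low-pass along y
            D3[y][x] = (a - b - c + d) / 4.0  # high-pass along both directions
    return A, D1, D2, D3


def haar_decompose(image, levels):
    """Apply haar_step repeatedly.

    Returns the coarsest image and a list of (D1, D2, D3) tuples, where
    details[0] is at resolution 2**-1 and details[-1] at 2**-levels.
    """
    details = []
    coarse = image
    for _ in range(levels):
        coarse, D1, D2, D3 = haar_step(coarse)
        details.append((D1, D2, D3))
    return coarse, details
```

With r = −4, `haar_decompose(image, 4)` on a 16×16 image yields detail images sized 8×8, 4×4, 2×2 and 1×1.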
6.2 Construction of the Tree Structure of the Wavelet Coefficients
In this step, no parameter whatsoever is required. The process is therefore compliant with what is described in section 5.2.
6.3 Construction of the Salience Maps
In this step, we must choose the parameters $\alpha_k$ ($r \le k \le -1$) used to adjust the importance given to the salience coefficients according to the level of resolution to which they belong.
In this particular embodiment, we propose to use $\alpha_k = -1/r$ for all the values of k.
6.4 Choice of the Salient Points
This step requires no parameter. The process is therefore compliant with what is described in section 5.4.
6.5 Experimental Results
The results obtained on natural images by using the parameters proposed in this particular embodiment are illustrated in FIGS. 6a and 6b.
6.6 Example of Application
Among the potential applications listed in section 4, this section presents the use of salient points for the indexing of still images by content.
6.6.1 Purpose of Image Indexing
Image indexing by content enables the retrieval, from an image database, of a set of images visually similar to a given image called a request image. To do this, visual characteristics (also called descriptors) are extracted from the images and form the signature of the image.
The signatures of the images belonging to the database are computed off-line and are stored in the database. When the user subsequently submits a request image to the indexing engine, the engine computes the signature of the request image and cross-checks this signature with the pre-computed signatures of the database.
This cross-checking is made by computing the distance between the signature of the request image and the signatures of the database. The images most similar to the request image are then those whose signature minimizes the computed distance.
The difficulty of image indexing then lies entirely in determining robust descriptors and distances.
6.6.2 Descriptors Based on the Salient Points of an Image
In this section, we propose to compute the signature of an image from a fixed number of salient points. This approach draws inspiration from [9].
A colorimetrical descriptor and texture descriptor are extracted at the vicinity of each of the salient points. The colorimetrical descriptor is constituted by the 0 order (mean), 1st order (variance) and 2nd order moments in a neighborhood sized 3×3 around each salient point. The texture descriptor is constituted by the Gabor moments in a neighborhood sized 9×9.
Once the signature of the request image R has been computed, the distance $D(R, I_j)$ between this signature and the signature of the jth image $I_j$ in the database is defined by:
where N denotes the number of images in the database and $S_j(f_i)$ is defined by:
$S_j(f_i) = (x_i - q_i)^T (x_i - q_i)$
where $x_i$ and $q_i$ respectively designate the ith descriptor (for example i=1 for the colorimetrical descriptor and i=2 for the texture descriptor) of the jth image of the base and of the request R. The weights $W_i$ make it possible to modulate the importance of the descriptors relative to each other.
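The cross-checking of signatures can be sketched as follows. The term $S_j(f_i)$ is the squared Euclidean distance given above; the way the terms are combined into a distance (a sum weighted by the $W_i$) is an assumption, as are the function names and the representation of a signature as a list of descriptor vectors.

```python
def descriptor_term(x, q):
    """S_j(f_i) = (x_i - q_i)^T (x_i - q_i): squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, q))


def signature_distance(image_descriptors, request_descriptors, weights):
    """ASSUMED combination: weighted sum of the per-descriptor terms."""
    return sum(
        w * descriptor_term(x, q)
        for w, x, q in zip(weights, image_descriptors, request_descriptors)
    )


def rank_images(database, request, weights):
    """Indices of the database images, from most to least similar."""
    return sorted(
        range(len(database)),
        key=lambda j: signature_distance(database[j], request, weights),
    )
```

For instance, with a colorimetrical descriptor and a texture descriptor per image, `rank_images(database, request, [1.0, 0.5])` would give the texture descriptor half the influence of the colorimetrical one.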
A.1 Introduction
Wavelet theory [1] [2] [3] enables the approximation of a function (a curve, surface, etc.) at different resolution levels. Thus, this theory enables a function to be described in the form of a coarse approximation and of a series of details enabling the perfect reconstruction of the original function.
Such a multi-resolution representation [1] of a function therefore enables the hierarchical interpretation of the information contained in the function. To do this, the information is reorganized into a set of details appearing at different resolution levels. Starting from a sequence of resolution levels in ascending order $(r_j)_{j \in Z}$, the details of a function at the resolution level $r_j$ are defined as the difference of information between its approximation at the resolution $r_j$ and its approximation at the resolution $r_{j+1}$.
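The idea of a coarse approximation plus details allowing perfect reconstruction can be illustrated on a discrete 1D signal. The sketch below uses Haar-style averages and half-differences; this choice of base, the normalisation and the function names are assumptions made for the example.

```python
def haar_analysis(signal):
    """Split a signal of even length into a coarse approximation and details."""
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) / 2.0 for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / 2.0 for i in range(half)]
    return approx, detail


def haar_synthesis(approx, detail):
    """Rebuild the finer signal from its approximation and its details."""
    signal = []
    for a, d in zip(approx, detail):
        signal.extend((a + d, a - d))
    return signal
```

The detail values are exactly the information lost when going from one resolution to the next lower one: adding them back to the approximation restores the original samples.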
A.2 Notation
Before presenting the bases of multi-resolution analysis in greater detail, in this section we shall present the notation that will be used in the document.
)du.
A.2 Properties of Multi-Resolution Analysis
This section intuitively presents the desired properties of the operator enabling the multi-resolution analysis of a function. These properties come from [1].
Let A2
The following are the properties expected from A2
1. A2
2. Among all the possible approximations of ƒ(x) with the resolution 2j, A2
3. The approximation of a function at the resolution $2^{j+1}$ contains all the information necessary to compute the same function at the lower resolution $2^j$. This property of causality induces the following relationship:
$\forall j \in Z,\ V_{2^j} \subset V_{2^{j+1}}$
4. The operation of approximation is the same at all values of resolution. The spaces of the approximation function may be derived from one another by a change of scale corresponding to the difference of resolution.
$\forall j \in Z,\ f(x) \in V_{2^j} \Leftrightarrow f(2x) \in V_{2^{j+1}}$
5. When an approximation of ƒ(x) at the resolution $2^j$ is computed, a part of the information contained in ƒ(x) is lost. However, when the resolution tends toward infinity, the approximate function must converge on the original function ƒ(x). In the same way, when the resolution tends toward zero, the approximate function contains less information and must converge on zero.
Any vector space (V2
A.3 Multi-Resolution Analysis of a 1D Function
A.3.1 Search for a Base of $V_{2^j}$
We have seen in section A.2 that the approximation operator A2
V2
One of the main theorems of the theory of wavelets stipulates that there is a single function $\Phi(x) \in L^2(R)$, called a scale function, from which it is possible to define $2^j$ base functions $\Phi_i^j(x)$ of $V_{2^j}$:
$\Phi_i^j(x) = \Phi(2^j x - i), \quad i = 0, \ldots, 2^j - 1.$
Approximating a function $f(x) \in L^2(R)$ at the resolution $2^j$ therefore amounts to making an orthogonal projection of $f(x)$ on the $2^j$ basic functions $\Phi_i^j(x)$. This operation consists in computing the scalar product of $f(x)$ with each of the $2^j$ basic functions $\Phi_i^j(x)$:
It can be shown [1] that A2
A2
Since Φ(x) is a low-pass filter, A2
A.3.2 Construction of the Multi-Resolution Analysis
In practice, the functions ƒ to be approximated (signal, image, etc.) are discrete. Let it be assumed that the original function ƒ(x) is defined on $n = 2^k$ ($k \in Z$) samples. The maximum resolution of ƒ(x) is then n.
Let Anƒ be the discrete approximation of ƒ(x) at the resolution level n. According to the property of causality, it is claimed (cf. section A.2) that A2
Indeed, in computing the projection of the 2j basic functions Φij (x) of V2
A.3.3 The Detail Function
As mentioned in the property (5) of section A.3, the operation which consists in approximating a function ƒ(x) at the resolution $2^j$ on the basis of an approximation at the resolution $2^{j+1}$ causes a loss of information.
This loss of information is contained in a function called a detail function at resolution level 2j and referenced D2
The detail function at the resolution level 2j is obtained by projecting the original function ƒ(x) orthogonally on the orthogonal complement of V2
To calculate this projection numerically, we need to find an orthonormal base of W2
$\Psi_i^j(x) = \Psi(2^j x - i), \quad i = 0, \ldots, 2^j - 1.$
In the same way as for the construction of the approximation A2
D2
A.4.5 Extension to the Multi-Resolution Analysis of 2D Functions
This section presents the manner of extending multi-resolution analysis by wavelets to the functions of $L^2(R^2)$ such as images.
This is done by using the same theorems as the ones used earlier. Thus if V2
$\Phi_{ij}(x, y) = \Phi(2^j x - i,\ 2^j y - j), \quad (i, j) \in Z^2.$
In the particular case of the separable approximations of $L^2(R^2)$, we have $\Phi(x, y) = \Phi(x)\Phi(y)$ where $\Phi(x)$ is a scale function of $L^2(R)$. In this case, the multi-resolution analysis of a function of $L^2(R^2)$ is done by the sequential and separable processing of each of the dimensions x and y.
As in the 1D case, the detail function at the resolution 2j is obtained by an orthogonal projection of ƒ(x,y) on the complement of V2
Ψ1(x,y)=Φ(x)Ψ(y)
Ψ2(x,y)=Ψ(x)Φ(y)
Ψ3(x,y)=Ψ(x)Ψ(y)
are wavelet functions of L2 (R2 ). Expanding and translating these three wavelet functions gives an orthonormal base of W2
$\Psi_j^1(x, y) = \Phi\Psi(2^j x - k,\ 2^j y - l)$
$\Psi_j^2(x, y) = \Psi\Phi(2^j x - k,\ 2^j y - l)$
$\Psi_j^3(x, y) = \Psi\Psi(2^j x - k,\ 2^j y - l).$
The projection of f(x,y) on these three base functions of the base of W2
D2
D2
D2
| Number | Date | Country | Kind |
|---|---|---|---|
| 02/16929 | Dec 2002 | FR | national |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/FR03/00834 | 3/14/2003 | WO | | 6/23/2006 |