The present invention relates generally to image processing and, in particular, to estimating an affine relation between images.
Watermarks are often used to embed information into an image imperceptibly. However, many of the watermarks in general use are destroyed by distortions, such as scaling, rotation, shearing, and anamorphic scaling. Some watermark detection methods are even sensitive to translations.
Various methods have been proposed for detecting watermarks, with some of those methods said to be invariant to affine distortions. However, most methods that are invariant to affine distortions require, at some stage, an extensive search through a space of parameters defining the affine distortions. Such searches may cover many dimensions and typically consume considerable computing capacity.
Recently some methods have been proposed that reduce the search space by performing transformations which first remove the translation, then convert scaling and rotation into further translation effects in two orthogonal directions. These methods are known as RST (Rotation, Scale, and Translation) invariant methods. Typically such techniques require complementarity of the embedding and detection procedures.
Other techniques rely on embedding patterns with special symmetry properties, such as rotational symmetry, and then detecting those patterns by extensive search over the non-symmetric distortion parameter(s).
As none of the aforementioned methods provides full affine distortion invariance, an extensive search through one or more of the distortion parameter spaces is still required to ensure full invariance. Accordingly, there is a need for a method of estimating the parameters of the affine distortion without requiring an extensive search through a parameter space.
It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
Disclosed are arrangements which seek to address the above problems. The arrangements estimate parameters of affine distortions applied to an image by identifying correspondences between intersection points of lines embedded into the image and those of a distorted version thereof.
According to an aspect of the present invention, there is provided a method for estimating an affine relation between first and second images, said images each having at least 4 non-parallel lines therein, said method comprising the steps of:
identifying a first set of intersection points of said lines appearing in said first image;
identifying a second set of intersection points of said lines appearing in said second image;
determining whether a relation between intersection points from said first and second sets of intersection points exists, wherein if said relation exists then said first and second images are affine related.
According to yet another aspect of the present invention, there is provided a method of detecting an auxiliary pattern in a second image, said second image being an affine distorted version of a first image having said auxiliary pattern and at least 4 non-parallel lines therein, said lines forming a first set of intersection points at predetermined positions, said method comprising the steps of:
identifying a second set of intersection points of said lines appearing in said second image;
identifying a relation between intersection points from said first and second sets of intersection points;
estimating said affine distortion parameters using said first and second sets of intersection points;
applying said affine distortions to said auxiliary pattern to form a distorted auxiliary pattern, said affine distortions being defined by said affine distortion parameters; and
detecting said distorted auxiliary pattern in said second image.
According to yet another aspect of the present invention, there is provided an apparatus for implementing any one of the aforementioned methods.
According to another aspect of the present invention there is provided a computer program product including a computer readable medium having recorded thereon a computer program for implementing any one of the methods described above.
Other aspects of the invention are also disclosed.
One or more embodiments of the present invention will now be described with reference to the drawings, in which:
Where reference is made in any one or more of the accompanying drawings to steps and/or features, which have the same reference numerals, those steps and/or features have for the purposes of this description the same function(s) or operation(s), unless the contrary intention appears.
Consider a straight-line n in an image, an example of which is illustrated in
The straight-line n is at an angle βn with the vertical Cartesian axis (Y-axis) and a distance rn from the Cartesian origin in a direction perpendicular to the straight-line n. The straight-line n is uniquely described by the parameter pair {rn,βn}.
Referring again to
Now consider a general affine distortion applied to the image including the straight-line n. The affine distortion may include one or more of rotation, scaling, shearing, reflection, translation, and anamorphic scaling. Let ({tilde over (x)},{tilde over (y)}) be the transformed coordinates in the image of the original point (x,y) after the affine distortion, then point ({tilde over (x)},{tilde over (y)}) may be written as:
wherein aij are parameters defining rotation, scaling, shearing, reflection, and anamorphic scaling, and (x0,y0) defines a translation.
It is assumed that
Furthermore, it is noted that
indicates that a reflection has occurred.
The total distortion applied by the parameters aij may be decomposed into convenient combinations of the prime distortions, namely rotation, anamorphic scaling and shear as follows:
wherein rotation by an angle ω is given by matrix:
anamorphic scaling along the X- and Y-axes is sometimes called aspect ratio change and has the form:
shear in the x direction has the form:
and the transposition, which is applied to place the rotation distortion in the correct quadrant so that |ω|≦45°, is selected from one of the four options:
Affine distortions have the property that any straight line is preserved during affine distortion, along with the parallelism of lines, while lengths and angles change. Accordingly, the straight-line n in the image is transformed to a straight-line in the deformed image, but the straight-line in the deformed image is defined by parameter pair {{tilde over (r)}n,{tilde over (β)}n}, which is typically different from the parameter pair {rn,βn} of the straight-line n before the affine distortions.
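The invariance properties stated above can be checked numerically. The following is a minimal sketch only; the matrix `A`, the translation `t` and the sample points are hypothetical values chosen for illustration and do not come from the described arrangements.

```python
import numpy as np

def affine(points, A, t):
    """Apply x~ = A x + t to an (N, 2) array of points."""
    return points @ A.T + t

def cross2(u, v):
    """z-component of the 2-D cross product; zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

# Hypothetical distortion combining rotation, anamorphic scaling and shear.
A = np.array([[1.2, 0.4], [-0.3, 0.8]])
t = np.array([5.0, -2.0])

# Three collinear points, plus a second segment parallel to the first.
p = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 6.0]])
q = np.array([[4.0, 1.0], [5.0, 3.0]])

pt, qt = affine(p, A, t), affine(q, A, t)

# Collinearity and parallelism survive the distortion ...
collinear_after = abs(cross2(pt[1] - pt[0], pt[2] - pt[0])) < 1e-9
parallel_after = abs(cross2(pt[1] - pt[0], qt[1] - qt[0])) < 1e-9

# ... while lengths generally do not.
length_before = np.linalg.norm(p[1] - p[0])
length_after = np.linalg.norm(pt[1] - pt[0])
```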
From Equations (1) and (2), and using the abbreviations cn=cos βn,sn=sin βn, the following relations exist:
Equations (8) and (9) include 6 unknown parameters {α11, α12, α21, α22, x0, y0} defining the affine distortion. These parameters may be calculated by placing 3 straight-lines in the image, each straight-line n being defined by the parameter pair {rn,βn}, by detecting the 3 straight-lines in the affine distorted image, by determining the parameter pair {{tilde over (r)}n,{tilde over (β)}n} for each affine distorted line, and by solving Equations (8) and (9). In order to distinguish reflection, a left-right asymmetry is also required when placing the straight-lines into the image.
However, from Equations (8) and (9) it can be seen that, when an affine distortion is applied to a straight-line, then the parameter pair {{tilde over (r)}n,{tilde over (β)}n} is generally transformed in a rather complicated and ambiguous manner. The ambiguity occurs because a translation or scaling in the direction parallel to the straight-line n has no effect on the parameter pair {rn,βn}.
Also, identifying the distorted straight-line corresponding to the straight-line n in the image before affine distortion is non-obvious. To illustrate this,
A number of techniques may be used to uniquely identify the correspondence between each straight-line before and after affine distortion. One technique is to embed lines where the lines themselves are unique, and where the uniqueness of the lines is not destroyed by affine distortions. For example, lines having different styles, colors, etc. may be used to uniquely identify each straight-line.
Another useful property of affine distortions is that the ratio of two segments of a straight-line is invariant with respect to affine distortions. In the preferred implementation it is this property that is used to uniquely identify the correspondence between each straight-line before and after affine distortion.
Consider N straight-lines in an image, each of the straight-lines again being uniquely described by the parameter pair {rn,βn}, n=1→N, with angles βn≠βm when n≠m. Hence the straight-lines are non-parallel.
Because the straight-lines are non-parallel, each straight-line intersects each other straight-line along its length. The maximum number of intersection points for N (non-parallel) straight-lines is the triangular number N(N−1)/2. The point (xkm,ykm) where straight-line m intersects straight-line k is:
wherein λkm is the distance along straight-line k where straight-line m intersects. Similarly, distance λmk is the distance along straight-line m where straight-line k intersects.
Solving the simultaneous equations in Equation (10), the distance λkm is given by:
Consider the case where the number of straight-lines N=4. Each line k would have 3 intersection points (xkm,ykm) with the other 3 lines in. At this stage it is useful to order the distances λkm along straight-line k to the respective intersection points (xkm,ykm) by size as follows:
{λkm}max>{λkm}mid>{λkm}min, m=1→4, m≠k (12)
With segment lengths ξk1 and ξk2 being defined as follows:
ξk1={λkm}max−{λkm}mid; and
ξk2={λkm}mid−{λkm}min (14)
a length ratio Rk for line k is then defined as:
With the parameter pairs {rn,βn} of the straight-lines suitably chosen, the ratio Rk of each line k is distinct from that of every other line m. Because the ratio Rk is invariant with respect to affine distortions, the correspondence between each straight-line before and after affine distortion can be uniquely determined.
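The invariance of the length ratio can be illustrated with a short numerical sketch. Several assumptions are made here and are not taken from the description: each line is taken to satisfy x cos β + y sin β = r (the form suggested by the argument of the basis function in Equation (36)), the ratio is taken as the shorter segment divided by the longer one so that it does not depend on the direction of travel along the line, and the line parameters and distortion values are hypothetical.

```python
import numpy as np

def intersection(line_k, line_m):
    """Intersection of two lines (r, beta), assuming x*cos(beta) + y*sin(beta) = r."""
    (r_k, b_k), (r_m, b_m) = line_k, line_m
    normals = np.array([[np.cos(b_k), np.sin(b_k)],
                        [np.cos(b_m), np.sin(b_m)]])
    return np.linalg.solve(normals, np.array([r_k, r_m]))

def points_on_each_line(lines):
    """For each line k, the intersection points with every other line m."""
    n = len(lines)
    return [np.array([intersection(lines[k], lines[m])
                      for m in range(n) if m != k])
            for k in range(n)]

def length_ratios(points_per_line):
    """Shorter/longer segment ratio of the 3 ordered points on each line (assumed definition)."""
    ratios = []
    for pts in points_per_line:
        direction = pts[1] - pts[0]        # any two distinct points give the line direction
        lam = np.sort(pts @ direction)     # positions along the line, up to a common scale
        xi1, xi2 = lam[2] - lam[1], lam[1] - lam[0]
        ratios.append(min(xi1, xi2) / max(xi1, xi2))
    return np.array(ratios)

# Hypothetical line parameters (r, beta) and a hypothetical affine distortion.
lines = [(0.1, np.radians(20)), (0.3, np.radians(65)),
         (-0.2, np.radians(110)), (0.25, np.radians(155))]
A = np.array([[1.3, 0.4], [-0.2, 0.7]])
t = np.array([0.5, -1.0])

original = points_on_each_line(lines)
distorted = [pts @ A.T + t for pts in original]

ratios_before = length_ratios(original)
ratios_after = length_ratios(distorted)
```

Under these assumptions `ratios_before` and `ratios_after` agree to numerical precision, which is what allows each distorted line to be paired with its original.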
Once the corresponding straight-lines have been identified, then the distances λkm along different straight-lines may be compared before and after affine distortion to estimate the scaling factors along each direction represented by the straight-line k. Also, the change in orientation of each straight-line may be used for estimation of a rotation component of the affine distortion.
Consider a particular case of a rotation followed by an anamorphic scaling applied to the original image, and as is defined in Equations (4) and (5) respectively. The anamorphic scaling has a scaling of factor A in a direction parallel to the horizontal Cartesian axis (X-axis), and a scaling of factor B in a direction perpendicular thereto, with the angle ω being the angle of rotation. In general, a line segment of length ln at an angle βn will be mapped to a line segment of length {tilde over (l)}n at an angle {tilde over (β)}n, where:
Equations (16) and (17) contain 3 unknowns, namely scaling factors A and B, and angle ω. Having 4 lines embedded in the original image, and using the distance between the outer intersection points of each line as the length ln as follows:
lk={λkm}max−{λkm}min, k=1→4 (18)
4 pairs of equations are obtained. Such an over-constrained system can be advantageously solved using a least squares method.
It is noted that Equations (16) and (17) are translation invariant, and do not make provision for translation (x0,y0). Also, for all the parameters of the general affine distortion defined in Equation (2) to be calculated, the matrices of Equations (4) and (5) are combined, as in Equation (3), but with the shear factor R being 0.
Alternatively, from the correspondence between each straight-line before and after affine distortion, the correspondence between the parameter pairs {rn,βn} and {{tilde over (r)}n,{tilde over (β)}n} is uniquely determined. Substituting Equation (11) into Equation (10), the intersection point (xkm,ykm) is given by:
Because the straight-lines are non-parallel, sin(βk−βm)≠0. In practical situations respective angles βk and βm are chosen such that sin²({tilde over (β)}k−{tilde over (β)}m)≧0.25 holds. Equivalently, the angle at which lines k and m intersect satisfies 30°≦|{tilde over (β)}k−{tilde over (β)}m|≦150°.
Similarly, the transformed point ({tilde over (x)}km,{tilde over (y)}km) corresponding to the intersection point (xkm,ykm) is given by:
By substituting the parameter pairs {rn,βn} and {{tilde over (r)}n,{tilde over (β)}n} into Equations (19) and (20) respectively, and then solving Equation (2) using the corresponding intersection points (xkm,ykm) and ({tilde over (x)}km,{tilde over (y)}km), the affine distortion parameters {α11, α12, α21, α22, x0,y0} may be obtained.
The preferred method of affine parameter estimation uses least squares fitting (LSF) between the intersection points (xkm,ykm) of the lines and the intersection points ({tilde over (x)}km,{tilde over (y)}km) after affine distortion. This LSF is greatly eased because the correspondence between the intersection points (xkm,ykm) and ({tilde over (x)}km,{tilde over (y)}km) is known. Without the unique identification of the correspondence between intersection points (xkm,ykm) and ({tilde over (x)}km,{tilde over (y)}km) it is necessary to evaluate the LSF for all possible permutations, in this case 6!=720.
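A least squares fit between corresponding point sets may be sketched as follows. This sketch uses the general-purpose solver `numpy.linalg.lstsq` rather than the closed-form solution developed in this description, and it takes the error energy to be the sum of squared point residuals, which is an assumption. The function name `fit_affine` and all numeric values are hypothetical.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares estimate of A (2x2) and t (2,) such that dst ~ src @ A.T + t.

    src, dst: (N, 2) arrays of corresponding points, N >= 3 and not all collinear.
    Returns (A, t, energy) where energy is the sum of squared residuals.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])        # rows [x, y, 1]
    params, *_ = np.linalg.lstsq(design, dst, rcond=None)    # shape (3, 2)
    A = params[:2].T                                         # [[a11, a12], [a21, a22]]
    t = params[2]                                            # (x0, y0)
    residual = dst - (src @ A.T + t)
    energy = float(np.sum(residual ** 2))
    return A, t, energy

# Six hypothetical intersection points and a hypothetical distortion.
src = np.array([[0, 0], [1, 0], [0, 1], [2, 1], [1, 3], [3, 2]], dtype=float)
A_true = np.array([[0.9, -0.3], [0.4, 1.1]])
t_true = np.array([12.0, -7.0])
dst = src @ A_true.T + t_true

A_est, t_est, energy = fit_affine(src, dst)
```

With noise-free correspondences the six parameters are recovered to numerical precision and the residual energy is essentially zero; with detected (noisy) points the energy gives a measure of how well an affine model explains the match.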
Let the detected and (uniquely) ordered intersection points be denoted by ({circumflex over (x)}km,ŷkm). As before the original (undistorted) intersection points are (xkm,ykm), while the distorted intersection points are ({tilde over (x)}km,{tilde over (y)}km). An error energy is defined as the Euclidean norm measure E:
Error energy minimization with respect to a transformation parameter p gives six equations:
with the transformation parameter p being one of the parameters α11, α12, α21, α22, x0 and y0 (Equation (2)) defining the transformation.
Fortunately the 6 equations represented in Equation (22) simplify greatly because of the partial derivatives with respect to p=αij:
The six equations represented in Equation (22) are then essentially equivalent to matching the following summations:
Filling in the full transform parameters from Equations (22) and (23) gives a pair of 3×3 matrix equations with the same symmetric matrix multiplier M:
The inverse of matrix M is another symmetric matrix M−1, where
Inverting Equations (25) and (26), the final solution for the least squares estimate of all six distortion parameters is explicitly:
The minimum error energy E may be calculated using Equation (21), and may be compared with an allowable limit. If the error energy E is greater than the allowable limit, then it is assumed that an affine match has not been found. It is also possible to calculate maximum point deviations by comparing intersection points (xkm,ykm) and ({tilde over (x)}km,{tilde over (y)}km), which is useful for establishing a reliability measure of the estimated distortion. Another useful parameter, especially with regard to reliability estimation, is the normalized energy; namely the error energy E divided by the centred mean square point distribution G, defined as follows for any set of points n=1→M:
where the centroid of the measured point distribution is defined as (
Accordingly,
The methods 200 and 300 are preferably practiced using a general-purpose computer system 100, such as that shown in
The computer system 100 is formed by a computer module 101, input devices such as a keyboard 102, mouse 103 and an imaging device 122, output devices including a printer 115 and a display device 114. The imaging device 122 may be a scanner or digital camera used for obtaining a digital image. A Modulator-Demodulator (Modem) transceiver device 116 is used by the computer module 101 for communicating to and from a communications network 120, for example connectable via a telephone line 121 or other functional medium.
The computer module 101 typically includes at least one processor unit 105, and a memory unit 106, for example formed from semiconductor random access memory (RAM) and read only memory (ROM). The module 101 also includes a number of input/output (I/O) interfaces including a video interface 107 that couples to the video display 114, an I/O interface 113 for the keyboard 102, mouse 103 and imaging device 122, and an interface 108 for the modem 116 and printer 115. A storage device 109 is provided and typically includes a hard disk drive 110 and a floppy disk drive 111. A CD-ROM drive 112 is typically provided as a non-volatile source of data. The components 105 to 113 of the computer module 101 typically communicate via an interconnected bus 104 and in a manner which results in a conventional mode of operation of the computer system 100 known to those in the relevant art.
Typically, the software is resident on the hard disk drive 110 and read and controlled in its execution by the processor 105. In some instances, the software may be supplied to the user encoded on a CD-ROM or floppy disk and read via the corresponding drive 112 or 111, or alternatively may be read by the user from the network 120 via the modem device 116. Still further, the software can also be loaded into the computer system 100 from other computer readable media.
The method 200 of detecting parameters of an affine distortion from an image comprising at least 4 non-parallel straight-lines therein and the method 300 of embedding the straight-lines using watermarks may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub functions thereof. Such dedicated hardware may include graphic processors, digital signal processors, or one or more microprocessors and associated memories.
Referring first to
When choosing the parameter pair {rn,βn} of the lines to be embedded, one option is to choose 4 lines having angles βn that are uniformly distributed over the full angular range, i.e. at 0°, 45°, 90°, 135°.
A number of arrangements of the four straight-lines give highly symmetric results. The arrangement shown is less symmetric. It contains six intersection points (xkm,ykm) that can be distinguished from a reflected and distorted version of itself. Fewer intersections are possible in degenerate arrangements where more than two lines meet at a point, but are not considered here.
Also shown on
In the preferred implementation (e=0.31) 4 straight-lines are embedded having parameters as follows:
where p is set to the image width. Note that the negative values of rn are equivalent to their positive counterparts, provided the corresponding angle βn is rotated by 180°.
The preferred choice of orientation angles βn avoids any coincidence between any one straight-line n and image boundaries by the largest amount.
When the straight-lines are embedded as watermarks, each watermark has to have the property that it can be detected even after the image into which it is embedded has been affine distorted. According to the preferred implementation each pattern has variation in one direction only, that direction being perpendicular to the straight-line n. The pattern also has an axis, which may be an axis of (odd or even) symmetry or a nominated axis, which coincides with the line. The patterns are typically generated from a one-dimensional basis function applied at the direction of variation and repeated parallel to the axis.
In a preferred implementation, the basis function is a complex homogeneous function of the form:
g(v)=|v|^{p+iα}=|v|^{p} exp(iα log|v|) (34)
where v is a one-dimensional coordinate, which is not necessarily positive, while α and p are constants. The basis function g(v) is preferably attenuated in areas where the basis function has a frequency above the Nyquist frequency of the pattern image. Equation (34) may be considered as an amplitude function, amplitude modulating a phase function, with the phase function having a logarithmic phase. When such a complex homogeneous function g(v) is scaled, say by a factor a, the scaling only introduces a complex constant factor as follows:
g(av)=a^{p+iα} g(v) (35)
The advantage of the complex homogeneous function is that the auto-correlation of the complex homogeneous function is directly proportional to the cross-correlation of the complex homogeneous function with a scaled version of the complex homogeneous function. This ‘scale-invariant’ property allows a watermark to be detected in an image even after a scale transformation by correlating the image with the basis function.
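The scaling property of the complex homogeneous function can be confirmed numerically. In the sketch below the constants `p` and `alpha`, the positive scale factor (called `a` here) and the sample coordinates are hypothetical, and the point v=0 is avoided because the logarithm is undefined there.

```python
import numpy as np

def g(v, p, alpha):
    """Complex homogeneous basis function |v|^(p + i*alpha) for non-zero v."""
    mag = np.abs(np.asarray(v, dtype=float))
    return mag ** p * np.exp(1j * alpha * np.log(mag))

p, alpha = -0.5, 20.0   # hypothetical constants
a = 1.7                 # hypothetical positive scale factor

# Sample coordinates on both sides of the axis, excluding v = 0.
v = np.concatenate([-np.linspace(0.1, 2.0, 25), np.linspace(0.1, 2.0, 25)])

scaled = g(a * v, p, alpha)
constant = a ** (p + 1j * alpha)          # complex factor introduced by scaling
scale_invariant = np.allclose(scaled, constant * g(v, p, alpha))
```

Because the scaled function differs from the original only by the constant `constant`, whose modulus is `a ** p`, a correlation against the unscaled basis function still produces a peak, changed only in magnitude and phase.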
g(x,y)=Re{g(x cos βn+y sin βn−rn)}(36)
Note that masking has been applied at the area adjoining the axis of symmetry to remove values with a frequency above the Nyquist frequency of the pattern image. Also illustrated is the relationship of the pattern with the straight-line it embeds. It would be understood that the straight-line itself does not form part of the pattern, and would not be added to the image in the steps that follow. In illustrating the example pattern, pattern values have been mapped to values in the range of [0, 255], with a value of 0 being represented as the colour black and a value of 255 being represented as the colour white.
Referring again to
In order to embed patterns that are imperceptible to a human observer, the pattern image is retrieved from the memory 106 and is perceptually masked in step 320 by the processor 105 in order to greatly reduce the levels of the patterns corresponding to regions of the image having low intensity variation, and reduce by a lesser amount the levels of the patterns corresponding to regions of the image having high intensity variation. An example measure of intensity variation is the local gradient magnitude of the luminance in the image. Other measures include second partial derivatives of the luminance; local estimates of the “energy” or frequency content, local variance, and more sophisticated estimates of human visual system masking.
The perceptually masked pattern image, which may be called a watermark, is added to the image, in step 330. If the image is a colour image, then the watermark is preferably added to the luminance part of a colour image. This allows the watermark to survive when the watermarked image is converted from colour to a greyscale representation. Alternatively, the watermark may be added to one or more of the R, G, B, H, V, S, u, v etc channels of the colour image, or any combination thereof. Apart from simple algebraic addition, addition of the watermark to the image also includes dithering and half-toning. The real and imaginary parts of a complex basis function may be added independently to two or more channels of the colour image.
The image with the embedded watermark may be stored on storage device 109 (
With the straight-lines embedded into the image, and the image having been affine distorted, the straight-lines and their parameter pair {{tilde over (r)}n,{tilde over (β)}n} have to be detected. With the straight-lines embedded using watermarks in the manner described with reference to
Step 205 follows where the processor 105 undoes the perceptual masking by first forming a perceptual mask from the image, and then emphasising the image with the perceptual mask by dividing the values of the image by the corresponding values of the perceptual mask. It is noted that an approximate perceptual mask is adequate.
A projective transform is then applied to the resulting image in step 210. The projective transform accumulates energy by summing values along straight lines in the image. The Radon (or equivalently Hough) transform is one such projective transform that may be used in step 210 and is defined as:
In order to derive a convenient implementation of the Radon transform for a discrete dataset, a useful correspondence between the projection of the image function h(x,y) and the slices of the function's Fourier transform is used, that correspondence being known as the “projection-slice theorem”.
The projection-slice theorem states that the one-dimensional Fourier transform of a projection of a two dimensional function is equal to a radial slice of the two-dimensional Fourier transform of that function. Note that
wherein H(u,v) is the 2-D Fourier transform of image h(x,y). In the quasi polar space the angles are in the range (−π/2,π/2], while distance is in the range (−∞,∞). By defining quasi-polar coordinates (q,φ) in the Fourier domain, the coordinate transform is u=q cos φ, v=q sin φ, and one form of the projection-slice theorem is obtained for the Fourier polar angle corresponding to the Radon projection angle φ=θ as follows:
Equation (39) is useful because it allows estimation of (the Fourier transform of) a Radon projection as a radial slice of the 2-D-FFT of a discrete image. This suggests that a discrete Radon transform may be evaluated by first performing a 2-D FFT followed by a Cartesian to polar remapping, using a suitable interpolation—such as bicubic, or chirp-z—to perform the resampling.
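For a discrete image the projection-slice relation can be verified directly in the axis-aligned case, where no polar resampling is needed. The sketch below is illustrative only: it uses a random test image as a stand-in for h(x,y) and checks a single projection angle; other angles would require the interpolation mentioned above.

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.standard_normal((32, 48))        # hypothetical image, indexed h[y, x]

# Projection: sum along y, leaving a 1-D function of x.
projection = h.sum(axis=0)

# 1-D FFT of the projection ...
projection_spectrum = np.fft.fft(projection)

# ... equals the zero-vertical-frequency slice of the 2-D FFT.
slice_from_2d = np.fft.fft2(h)[0, :]

matches = np.allclose(projection_spectrum, slice_from_2d)
```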
The image h(x,y) having embedded therein N patterns based on one-dimensional basis function g(v) having an orientation angle {tilde over (β)}n and perpendicular displacement {tilde over (r)}n, and ignoring the pixel values of the image itself, may be written as:
When the projection transform is applied to a pattern having a variation in only one direction with that direction being at an orientation angle {tilde over (β)}n, then the values of the projection are significantly higher when the angle θ is equal to the orientation angle {tilde over (β)}n compared to all other values of angle θ.
Hence, by applying the Radon transforms on the functions in Equation (40), it can be shown that the Radon transform of such a function is constrained to a line where θ={tilde over (β)}n as shown in the following:
Having concentrated the image function h(x,y) having the embedded patterns g(v) onto N single lines in quasi-polar space by the use of the Radon transform, it is further possible to concentrate the energy contained in each of the lines into a single point (or into a small region near a point) by using 1-D quasi-radial correlation detection (in coordinate r) for all values of the polar angle θ.
Accordingly, step 220 follows where the processor 105 performs 1-D correlations between the projection and the basis function g(v) in the quasi-radial coordinate r for all possible values of the polar angle θ. The term “correlation” also includes phase correlation and phase correlation scaled by energy. The resulting correlations have peaks at quasi-polar coordinates ({circumflex over (r)}n,{circumflex over (β)}n).
In step 230 the processor 105 finds the absolute peaks of the correlation. The orientation angle {circumflex over (β)}n and perpendicular displacement {circumflex over (r)}n of each distorted embedded pattern is directly available from the quasi-polar coordinates ({circumflex over (r)}n,{circumflex over (β)}n) of the peaks. Also, the parameter pairs {rn,βn} of the embedded straight-lines are available.
The pre-processing stage of the detector finds a number of candidate lines, typically about 64, based on the highest correlation magnitude peaks. From this number, all combinations of four lines are scrutinized. The number of possible combinations is 64!/(60!4!)=635376. If the angles of the four lines do not all differ by at least 15°, then that combination is dismissed and does not pass on to the next stage. Assuming a combination of lines passes this stage, then the six intersection points are calculated using Equation (19) and the four sets of line segment ratios {circumflex over (R)}n, evaluated. The ratios {circumflex over (R)}n are then placed in order of increasing size. A merit function for the closeness of fit to the expected line ratios (also placed in order of increasing size) is then evaluated as:
All combinations with a merit function below a certain threshold (preferably 0.1) are labelled as “candidate combinations” for further processing.
Note that other merit functions could be used, for example the above merit function could be weighted to reflect the line strength (in terms of the line correlation magnitude) and thus reduce contributions from weakly detected lines.
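The size of the search over line quartets, and the angular pre-filter applied to each quartet, may be sketched as follows. The helper names are hypothetical, and treating line orientations as equivalent modulo 180° when measuring the separation is an assumption of this sketch. The merit function itself is not reproduced.

```python
import math
from itertools import combinations

def angular_separation(b1, b2):
    """Smallest angle in degrees between two line orientations (assumed modulo 180)."""
    d = abs(b1 - b2) % 180.0
    return min(d, 180.0 - d)

def well_separated(angles, min_sep=15.0):
    """True if every pair of orientations differs by at least min_sep degrees."""
    return all(angular_separation(a, b) >= min_sep
               for a, b in combinations(angles, 2))

# Number of ways of choosing 4 lines from 64 candidates (requires Python 3.8+).
n_combinations = math.comb(64, 4)

keep = well_separated([0.0, 45.0, 90.0, 135.0])      # passes the filter
reject = well_separated([0.0, 10.0, 90.0, 135.0])    # 0 and 10 degrees are too close
reject_wrap = well_separated([2.0, 178.0, 60.0, 120.0])  # 2 and 178 degrees differ by 4
```

Applying the inexpensive angle test first means that intersection points and segment ratios need only be computed for the quartets that survive it.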
The following processing is then carried out for each candidate combination.
In step 240 the processor 105 uniquely identifies the correspondence between each straight-line n before and after affine distortion. In the preferred implementation the length ratios {circumflex over (R)}n of the lines after distortion are matched with the length ratios Rn before distortion, in the manner described with reference to Equation (15), and the matched ratios are used to uniquely identify the correspondence between each straight-line before and after affine distortion.
Another technique that may be used to uniquely identify the correspondence between each straight-line before and after affine distortion is to use basis functions having different parameters to embed each line. In such an implementation steps 220 and 230 are repeated with each basis function.
In step 250 the processor 105 next calculates the intersection points ({circumflex over (x)}km,ŷkm) using Equation (20). Finally, in step 260, the affine distortion parameters {α11,α12,α21,α22,x0,y0} are estimated using the LSF described above with reference to Equations (21) to (32).
The final affine distortion parameters are chosen to be those producing the lowest minimum error energy E (Equation (21)) over all candidate combinations.
Once the affine distortion parameters {α11, α12, α21, α22, x0, y0} are estimated, the affine distortions may be inverted, sometimes called rectification or registration.
Once the affine distortions are inverted, additional patterns, named auxiliary patterns, that existed in the image prior to adding the patterns in step 330 (
However, the process of un-distorting and aligning the image involves interpolation and resampling, which are typically computationally expensive. Therefore, it is proposed to avoid rectifying the image by using a complementarily distorted detection template when detecting the auxiliary patterns.
The major advantage of this approach is that the template may be defined in the Fourier domain. Affine distortions in the spatial domain lead to corresponding affine distortions in the Fourier domain, along with linear phase factors related to the translation distortion. The Fourier affine correspondence is well documented in the literature. Essentially the template has the Fourier version of Equation (2) applied. Defining the continuous Fourier transformation as follows, and noting that exactly corresponding relations apply for discretely sampled images and the discrete Fourier transform (DFT) and its more efficient implementation, the fast Fourier transform (FFT):
J(u,v)=∫∫j(x,y)exp(−2πi[ux+vy])dxdy (43)
The Fourier transform of a distorted template function is
κJ(ũ,{tilde over (v)})=∫∫j({tilde over (x)},{tilde over (y)})exp(−2πi[ux+vy])dxdy (44)
The affine distortion of Equation (2) in the Fourier domain is given by
wherein the factor κ contains a real normalization constant and a linear phase factor, but otherwise does not affect the distorted Fourier template J(ũ,{tilde over (v)}), so that the distorted Fourier template J(ũ,{tilde over (v)}) is used directly in the Fourier domain implementation of correlation detection. Hence, certain templates j(x,y) may be distorted using Equation (45) without the need for interpolation or resampling, if the Fourier template J(ũ,{tilde over (v)}) is defined by a deterministic, analytic functional descriptor. Only the Fourier coordinates u and v need to be distorted prior to the template matching.
The auxiliary patterns may be used to encode some data or a pointer to that data, for example using a URL or pointer to that URL. Also, using the method 200 of detecting the parameters of the affine distortion from the image, the so-called image metadata, which is data about, or referring to some property of that image, is thus bound to the image in that the metadata can be retrieved from the image even if the image is distorted. The distortions that the metadata can resist include the projective transforms above, but also include: printing, photocopying/copying, scanning, colour removal, gamma correction, gamma change, JPEG compression/general compression, format conversion (e.g. BMP to GIF), noise addition and removal, filtering, such as low-pass filtering, cropping, and almost any editing operations which maintain the basic recognizability of the image.
Certain applications do not require the straight lines to be embedded imperceptibly. Also, the straight lines may be embedded onto any planar surface by engraving or etching. Such planar surfaces include silicon wafers. The distorted straight lines may then be detected by a system which may be optical (such as a camera), electromagnetic, or proximity based. The detected straight lines may then be utilized to determine the surface's position and orientation.
The foregoing describes embodiments based upon the embedding of N straight lines. A revised version of the method 200 may also be applied to images having straight lines therein associated with features in the image. For such inherent lines, it is not possible to constrain, a priori, the four intersection ratios of each possible quartet of lines. It is however possible to compare the extant line ratios in a first image with those in a second image. For each quartet of matching ratios (within some predefined tolerance range) the corresponding affine transform is calculated using steps 240 to 260 of the method 200. Steps 240 to 260 are then repeated for matching ratio quartets. If the corresponding affine transforms are consistent (i.e. the affine parameters show clustering), then it is probable that the two images are related to each other via an affine transformation. Hence it is possible to compare two images and estimate whether the second image is related to the first image via an affine transformation.
Let a first image I1(x,y) be related to a second image I2(x,y) by an affine transformation or spatial distortion (x,y)→(x̃,ỹ), with the affine transformation being that defined in Equation (2). The relation between the first image I1(x,y) and the second image I2(x,y) may be written as follows:
I2(x,y)=μI1(x̃,ỹ)+n(x,y) (46)
wherein μ is an image intensity multiplying factor (or gain) and n(x,y) is a function which takes account of the difference between the first and second images.
It is known that conventional methods of image matching (using correlation, for example) are not easy to implement for more general affine distortions, because a search over a six-dimensional space (of the six affine parameters) is required in general. Therefore, and in accordance with the teachings herein, a more effective way to compare the first image I1(x,y) and the second image I2(x,y) is by comparing the intersection ratios of naturally occurring line structures within the images themselves. Many images contain some line structures or partial line structures, such as straight edges. Preferably such line structures are enhanced, for example by using the modulus of the gradient of pixel intensities. Other operators, such as the Laplacian, may also be used.
The method 400 starts in step 402 where a projective transform, such as the Radon transform, is applied to each of the images I1(x,y) and I2(x,y), in the manner described in relation to step 210 of the method 200.
In practice it is found that enhancement after the Radon transformation is preferable. Accordingly, step 404 follows where gradient enhancement is performed giving the 2-D distribution function Ωj(r,θ) for each image:
Ωj(r,θ)=|grad[Rθ{Ij(x,y)}]| (47)
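Steps 402 and 404 might be sketched as follows. This is an illustrative approximation only: the Radon transform is computed by crude nearest-bin accumulation of each pixel at r = x cos θ + y sin θ, and the gradient in Equation (47) is assumed to be taken over both r and θ in index units. The function names `radon_nearest` and `gradient_enhance` are hypothetical.

```python
import numpy as np

def radon_nearest(image, n_angles=180):
    """Crude discrete Radon transform: bin each pixel by r = x cos(t) + y sin(t)."""
    h, w = image.shape
    y, x = np.mgrid[0:h, 0:w]
    x = x - (w - 1) / 2.0                      # centre the coordinate origin
    y = y - (h - 1) / 2.0
    r_max = int(np.ceil(np.hypot(h, w) / 2.0))
    n_r = 2 * r_max + 1
    thetas = np.linspace(0.0, np.pi, n_angles, endpoint=False)
    sinogram = np.zeros((n_r, n_angles))
    weights = image.ravel().astype(float)
    for k, t in enumerate(thetas):
        r = x * np.cos(t) + y * np.sin(t)
        idx = np.rint(r).astype(int).ravel() + r_max   # nearest radial bin
        sinogram[:, k] = np.bincount(idx, weights=weights, minlength=n_r)
    return sinogram, thetas

def gradient_enhance(sinogram):
    """Modulus of the gradient over (r, theta), in the spirit of Equation (47)."""
    d_r, d_theta = np.gradient(sinogram)
    return np.hypot(d_r, d_theta)

image = np.zeros((32, 32))
image[:, 16:] = 1.0                            # a single straight vertical edge
sinogram, thetas = radon_nearest(image, n_angles=90)
omega = gradient_enhance(sinogram)
```

A production implementation would normally use an interpolating Radon transform; nearest-bin accumulation is used here only to keep the sketch dependency-free.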
In step 406 peaks are detected in each of the distribution functions Ωj(r,θ). The objective is to find corresponding quartets of line structures. So by finding the 32 highest peaks within each of the distribution functions Ωj(r,θ), representing the 32 most significant line structures within each of the images I1(x,y) and I2(x,y), the number of possible combinations of 4 lines is 35960.
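The peak selection and the count of line quartets can be sketched as below. The local-maximum test over a 3×3 neighbourhood is an assumption made for illustration (the specification does not prescribe a particular peak detector), wrap-around in θ is ignored, and the name `top_peaks` is hypothetical.

```python
import numpy as np
from itertools import combinations
from math import comb

def top_peaks(omega, n_peaks=32):
    """Return (row, col) indices of the n_peaks strongest 3x3 local maxima."""
    omega = np.asarray(omega, dtype=float)
    rows, cols = omega.shape
    padded = np.pad(omega, 1, mode="constant", constant_values=-np.inf)
    is_max = np.ones(omega.shape, dtype=bool)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            neighbour = padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
            is_max &= omega >= neighbour       # not smaller than any neighbour
    candidates = np.argwhere(is_max)           # row-major order
    strengths = omega[is_max]                  # same row-major order
    order = np.argsort(strengths)[::-1][:n_peaks]
    return [tuple(int(i) for i in candidates[k]) for k in order]

# Every choice of 4 peaks out of 32 is a candidate line quartet.
n_quartets = comb(32, 4)
quartets = combinations(range(32), 4)          # lazily enumerates index quartets
print(n_quartets)                              # 35960
```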
Step 408 then follows where the processor 105 calculates for each combination of 4 lines the intersection ratios, as is described in detail with reference to the method 200, from the (r,θ) values of the 4 peaks. The 35960 combinations of ratio quartets are placed in ordered sequences in step 410. As before, if any ratio is greater than 1 it is inverted. The sequence is ordered in terms of increasing value. In particular, all combinations of ratio quartets are placed in a four dimensional table for the first image I1(x,y) and a separate table for the second image I2(x,y).
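The ordering in step 410 can be illustrated with a short sketch. It assumes the four intersection ratios of each quartet have already been computed as positive numbers by the procedure of the method 200, which is not reproduced here; the function names are hypothetical.

```python
import numpy as np

def ordered_ratio_sequence(ratios):
    """Invert any ratio greater than 1, then sort in increasing order."""
    r = np.asarray(ratios, dtype=float)        # assumed strictly positive
    r = np.where(r > 1.0, 1.0 / r, r)
    return np.sort(r)

def build_ratio_table(ratio_quartets):
    """Stack the ordered sequences of all quartets into an (n, 4) table."""
    return np.array([ordered_ratio_sequence(q) for q in ratio_quartets])

table = build_ratio_table([[2.0, 0.5, 4.0, 0.8],
                           [1.0, 0.1, 10.0, 3.0]])
print(table[0])                                # [0.25 0.5  0.5  0.8 ]
```

One such table would be built for each of the two images.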
The ratio sequences for the first image I1(x,y) and the second image I2(x,y) are next compared in step 412. In particular, the correspondence of each ordered sequence in the first image I1(x,y) table with those in the second image I2(x,y) table is evaluated, based on Euclidean distance or the merit function shown in Equation (42). Only correspondences within a predefined distance are allowed. The calculation of the merit function is rather computationally intensive, and its cost scales with the square of the number of points (sequences) tested. A more efficient approach is to use a coarse binning of the 4-D ratio table (for example, bin widths of 0.2 give 5^4=625 bins) and to evaluate the merit function only for sequences in corresponding or adjacent bins.
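The coarse binning can be sketched as follows, assuming Euclidean distance as the comparison measure (the merit function of Equation (42) is not reproduced here). The names `bin_key` and `match_sequences`, and the threshold `max_dist`, are hypothetical.

```python
import numpy as np
from collections import defaultdict
from itertools import product

def bin_key(seq, bin_width=0.2):
    """Map an ordered ratio sequence in [0, 1]^4 to a coarse 4-D bin index."""
    n_bins = int(round(1.0 / bin_width))       # 5 bins per axis for width 0.2
    idx = np.minimum((np.asarray(seq) / bin_width).astype(int), n_bins - 1)
    return tuple(int(i) for i in idx)

def match_sequences(table1, table2, max_dist=0.02, bin_width=0.2):
    """Compare each sequence of table1 only with table2 entries in the same
    or an adjacent bin; return (i, j, distance) for pairs within max_dist."""
    buckets = defaultdict(list)
    for j, seq in enumerate(table2):
        buckets[bin_key(seq, bin_width)].append(j)
    offsets = list(product((-1, 0, 1), repeat=4))   # the bin and its neighbours
    matches = []
    for i, seq in enumerate(table1):
        key = bin_key(seq, bin_width)
        for off in offsets:
            neighbour = tuple(k + o for k, o in zip(key, off))
            for j in buckets.get(neighbour, ()):
                d = float(np.linalg.norm(np.asarray(seq) - np.asarray(table2[j])))
                if d <= max_dist:
                    matches.append((i, j, d))
    return matches

table1 = [[0.1, 0.3, 0.5, 0.9], [0.2, 0.2, 0.4, 0.6]]
table2 = [[0.5, 0.6, 0.7, 0.8], [0.1, 0.3, 0.5, 0.905]]
print(match_sequences(table1, table2))         # one match: sequence 0 with sequence 1
```

Checking adjacent bins as well as the corresponding bin avoids missing pairs that lie close together but on opposite sides of a bin boundary.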
If the merit function is less than the predefined amount, then the line quartet is assumed to match and the consequent affine transformation parameters are computed in step 414 from the intersection points using Equations (21) to (32). Step 414 is repeated for all matched sequences satisfying the merit conditions and the 6 affine parameters are tabulated.
Step 416 follows where the likelihood of an affine relation between the first image I1(x,y) and the second image I2(x,y) is estimated. For example, if there is a significant clustering of affine parameter entries in the table, then an affine relation or match between the first image I1(x,y) and the second image I2(x,y) is likely. A more quantitative estimate of the likelihood of affine matching can be derived from the clustering statistics.
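One simple way to quantify such clustering is sketched below. It is an assumption for illustration, not the estimator of this specification: each tabulated parameter vector votes for all vectors lying within a per-parameter tolerance of it, and the fraction supporting the best vector is reported together with the mean of its supporters. The name `clustering_score` and the tolerance value are hypothetical.

```python
import numpy as np

def clustering_score(params, tol):
    """Return (fraction of entries in the densest cluster, mean of that cluster).

    params: (n_matches, 6) table of affine parameters.
    tol:    scalar or length-6 tolerance defining 'the same' parameters.
    """
    p = np.asarray(params, dtype=float)
    if len(p) == 0:
        return 0.0, None
    scaled = p / np.asarray(tol, dtype=float)
    diff = scaled[:, None, :] - scaled[None, :, :]      # all pairwise differences
    close = np.all(np.abs(diff) <= 1.0, axis=2)         # within tolerance on all 6
    support = close.sum(axis=1)
    best = int(np.argmax(support))
    members = p[close[best]]
    return float(support[best] / len(p)), members.mean(axis=0)

table = [[1, 0, 0, 1, 5, 5]] * 5 + [[2, 1, 0, 3, 0, 0],
                                    [0.5, 0.5, -1, 1, 9, 2],
                                    [3, 3, 3, 3, 3, 3]]
score, estimate = clustering_score(table, tol=0.05)
print(score)                                   # 0.625
```

A score near 1 would suggest that most matched quartets agree on a single affine transformation; the decision threshold is left to the application.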
If the match between the first image I1(x,y) and the second image I2(x,y) is considered not likely in step 416, then the method 400 ends in step 417, for example by indicating on a user interface that no match between the images I1(x,y) and I2(x,y) exists. However, if the match between the first image I1(x,y) and the second image I2(x,y) is considered likely, then the first image I1(x,y) is inverse affine distorted in step 418 so that it matches the second image I2(x,y).
Finally, in step 420, the quality of the match is determined by normalised correlation. A value of 0.9 would indicate a good match. Other measures (e.g. visual) may give a better indication of how well the images match and how well the affine distortion parameters have been estimated.
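A minimal sketch of the correlation measure in step 420 follows. It assumes the zero-mean form of normalised correlation, which makes the score insensitive to a gain factor and a constant offset between the two images; the specification does not state which variant is intended, and the function name is hypothetical.

```python
import numpy as np

def normalised_correlation(a, b):
    """Zero-mean normalised correlation of two equally sized images, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    if denom == 0.0:                           # at least one image is constant
        return 0.0
    return float(np.sum(a * b) / denom)

rng = np.random.default_rng(1)
image = rng.random((8, 8))
score = normalised_correlation(image, 3.0 * image + 2.0)   # gain and offset only
```

Here the second image would be I2(x,y) and the first the inverse affine distorted I1(x,y) from step 418.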
The foregoing describes only some embodiments of the present invention, and modifications and/or changes can be made thereto without departing from the scope and spirit of the invention, the embodiments being illustrative and not restrictive.
In the context of this specification, the word “comprising” means “including principally but not necessarily solely” or “having” or “including”, and not “consisting only of”. Variations of the word “comprising”, such as “comprise” and “comprises” have correspondingly varied meanings.
Number | Date | Country | Kind |
---|---|---|---|
2003906082 | Nov 2003 | AU | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/AU04/01532 | 11/4/2004 | WO | | 10/13/2006 |