1. Field of the Invention
The present invention relates to a method and apparatus for representing an image, and, in addition, a method and apparatus for comparing or matching images, for example, for the purposes of searching or validation.
2. Description of the Background Art
This invention relates to improvements upon the image identification technique described in co-pending European patent application EP 06255239.3. The contents of EP 06255239.3 are incorporated herein by reference. Details of the invention and embodiments in EP 06255239.3 apply analogously to the present invention and embodiments.
The image identification method and apparatus described in EP 06255239.3, which extracts a short binary descriptor from an image (see
However, in practical applications higher detection rates are desirable. In particular, it would be desirable to increase the average detection rate to above 98%, and also to significantly improve robustness to noise and histogram equalisation modifications.
According to a first aspect, the present invention provides a method of deriving a representation of an image as defined in accompanying claim 1.
Further aspects of the present invention include use of a representation of an image derived using a method according to the first aspect of the present invention, an apparatus for performing the method according to the first aspect of the present invention, and a computer-readable storage medium comprising instructions which, when executed, perform the method according to the first aspect of the present invention.
Preferred and optional features of embodiments of the present invention are set out in the dependent claims.
The present invention concerns a new method of extracting visual identification features from the Trace transform of an image (or an equivalent two-dimensional function of the image). The method may be used to create a multi-resolution representation of an image by performing region-based processing on the Trace transform of the image, prior to extraction of the identifier e.g. by means of the magnitude of the Fourier Transform.
In the present application, the term “functional” has its normal mathematical meaning. In particular, a functional is a real-valued function on a vector space V, usually of functions. In the case of the Trace transform, functionals are applied over lines in the image.
In the method described in co-pending patent application EP 06255239.3, the Trace transform is computed by tracing an image with straight lines along which a certain functional T of the image intensity or colour function is calculated. Different functionals T are used to produce different Trace transforms from a single input image. Since in the 2D plane a line is characterised by two parameters, distance d and angle θ, a Trace transform of the image is a 2D function of the parameters of each tracing line. Next, the “circus function” is computed by applying a diametrical functional P along the columns of the Trace transform. A frequency representation of the circus function is obtained (e.g. a Fourier transform), a function is defined on the frequency amplitude components, and its sign is taken as a binary descriptor.
A method according to embodiments of the present invention may use similar techniques to derive a representation of an image. However, a reduced resolution function of the image is derived, such as a reduced resolution Trace transform, prior to performing further steps to derive the representation of the image (e.g. binary descriptor). The reduction in resolution should preserve the essential elements that are unique to the image (i.e. its visual identification features), whilst reducing the quantity of data for processing. Typically, the derived reduced resolution function of the image incorporates, by processing, representative values for selected or sampled parts of the image, as will be apparent from the description below.
According to one embodiment of the present invention, the reduced resolution function of the image is derived by tracing the image with sets of lines, where the parameters of these lines are of a predetermined interval Δd and/or Δθ, and deriving a Trace transform (or equivalent) using all of the sets of lines (instead of all lines across the image). The lines may correspond to strips (as illustrated in
According to another embodiment of the present invention, the Trace transform (or equivalent) is first derived in the conventional manner, by tracing all lines across the image. The Trace transform of the image is then traced with strips at different values of the angle parameter θ, and resolution reduction is performed over intervals of the distance parameter d (as illustrated in
Advantageously, the method of this embodiment of the present invention can be implemented very efficiently by implicitly computing the Trace transform values along strips and/or cones in the Trace transform domain, as explained in further detail below.
As in the method disclosed in co-pending patent application EP 06255239.3, a method according to an embodiment of the present invention combines selected fragments from a ‘family’ of identifiers obtained by using different functionals. In addition, in some embodiments, identifiers obtained with strips and/or double cones are combined into a single descriptor. In addition, strips of different width and/or cones of different opening angle are used, in some embodiments, to obtain a multi-resolution representation.
Embodiments of the invention will be described with reference to the accompanying drawings, of which:
a shows an image;
b shows a reduced version of the image of
c shows a rotated version of the image of
d shows a blurred version of the image of
a-c illustrate functions derived from different versions of an image;
Various embodiments for deriving a representation of an image, specifically an image identifier, and using such a representation/identifier for the purposes of, for example, identification, matching or validation of an image or images, will be described below. The present invention is especially useful for, but is not restricted to, identifying an image. In the described embodiments, an “image identifier” (sometimes simply “identifier”) is an example of a representation of an image and the term is used merely to denote a representation of an image, or descriptor.
The skilled person will appreciate that the specific design of an image identification apparatus and method, according to an embodiment of the present invention, and the derivation of an image identifier for use in image identification, is determined by design requirements. Such design requirements relate to the type of image modifications that the image identifier should be robust to, the size of the identifier, extraction and matching complexity, target false-alarm rate, etc.
The following embodiment illustrates a generic design that results in an identifier that is robust to the following modifications to an image (this is not an exhaustive list):
Colour reduction,
Blurring,
Brightness Change,
Flip (left-right & top-bottom),
Greyscale Conversion,
Histogram Equalisation,
JPEG Compression,
Noise,
Rotation and
It has been found that this generic design typically can achieve a very low false-alarm rate of 1 part per million (ppm) on a broad class of images.
An embodiment of the invention derives a representation of an image, and more specifically, an image identifier, by processing signals corresponding to the image.
In the initial stage of extraction, the image is pre-processed by resizing (step 110) and optionally filtering (step 120). The resizing step 110 is used to normalise the images before processing. The step 120 can comprise filtering to remove effects such as aliasing caused by any processing performed on the image, and/or region selection rather than using the full original image. In a preferred embodiment of the method, a circular region is extracted from the centre of the image for further processing.
In step 130, a Trace transform T(d, θ) is performed. The trace transform projects all possible lines over an image and applies one or more functionals over these lines. As previously stated, a functional is a real-valued function on a vector space V, usually of functions. In the case of the Trace transform a functional is applied over lines in the image. As shown in
In one particular example of the method, the Trace transform T(d, θ) of an image is extracted with the trace functional T
∫ξ(t)dt, (1)
and the circus function is obtained by applying the diametrical functional P
max(ξ(t)). (2)
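The combination of equations (1) and (2) can be sketched in code as follows. This is a minimal illustration only, not the implementation of EP 06255239.3: it assumes a square greyscale image held as a list of rows, nearest-neighbour sampling with a unit step along each line, and lines parametrised by the direction of their normal. The names trace_transform and circus_function, and the parameters n_angles and n_dists, are hypothetical.

```python
import math


def trace_transform(image, n_angles=36, n_dists=None):
    """Trace transform with the integral functional of equation (1).

    image is a square list of rows of floats. The result is indexed as
    trace[i][j], with i the distance sample (d) and j the angle sample (theta).
    """
    size = len(image)
    n_dists = n_dists or size
    centre = (size - 1) / 2.0
    radius = size / 2.0
    trace = [[0.0] * n_angles for _ in range(n_dists)]
    for j in range(n_angles):
        theta = 2.0 * math.pi * j / n_angles
        nx, ny = math.cos(theta), math.sin(theta)  # unit normal of the line
        tx, ty = -ny, nx                           # unit direction of the line
        for i in range(n_dists):
            d = -radius + (i + 0.5) * (2.0 * radius / n_dists)
            total = 0.0
            for s in range(size):
                t = -radius + (s + 0.5) * (2.0 * radius / size)
                x = int(round(centre + d * nx + t * tx))
                y = int(round(centre + d * ny + t * ty))
                if 0 <= x < size and 0 <= y < size:
                    total += image[y][x]  # nearest-neighbour sample, step of 1
            trace[i][j] = total
    return trace


def circus_function(trace):
    """Apply the diametrical functional max (equation (2)) down each column."""
    n_angles = len(trace[0])
    return [max(row[j] for row in trace) for j in range(n_angles)]
```

Calling circus_function(trace_transform(image)) yields one value per angle sample. A practical implementation would normally use interpolation and a finer sampling of d and θ than this sketch.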
Examples of how the circus function is affected by different image processing operations can be seen in
It can be shown that for the majority of image modification operations listed above, and with a suitable choice of functionals T, P, the circus function f(a) of image a is only ever a shifted or scaled (in amplitude) version of the circus function f(a′) of the modified image a′ (see Section 3 of reference [1] infra).
f(a′)=κf(a−θ). (3)
According to the method described in co-pending European patent application EP 06255239.3, frequency components of a frequency representation of the circus function may be used to derive an image identifier. It will be appreciated that other techniques for deriving an image descriptor are possible, and may be used in conjunction with the present invention. In one example, the image identifier may be derived from a Fourier Transform (or equally a Haar Transform) of the circus function.
Thus, taking the Fourier transform of equation (3) gives:
Then taking the magnitude of equation (6) gives
|F(Φ)|=|κF[f(a)]|. (7)
From equation (7) it can be seen that the modified image and original image are now equivalent except for the scaling factor κ.
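The intermediate steps between equation (3) and equation (7) are not reproduced in this text. A plausible reconstruction, assuming only the linearity of the Fourier transform and the standard shift theorem, and assuming that these steps correspond to the equation numbers (4) to (6) referred to above, is:

```latex
\begin{align}
F(\Phi) &= F\left[\kappa f(a-\theta)\right] \tag{4}\\
        &= \kappa\, F\left[f(a-\theta)\right] \tag{5}\\
        &= \kappa\, e^{-j\theta\Phi}\, F\left[f(a)\right] \tag{6}
\end{align}
```

Since the phase factor has unit magnitude, taking the magnitude of the last line removes the dependence on the shift θ and leaves equation (7).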
According to the example, a function c(ω) is now defined on the magnitude coefficients of a plurality of Fourier transform coefficients. One illustration of this function is taking the difference between each coefficient and its neighbouring coefficient
c(ω)=|F(ω)|−|F(ω+1)| (8)
A binary string can be extracted by applying a threshold to the resulting vector (equation 8) such that
The image identifier is then made up of these values B={b0, . . . , bn}.
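The extraction of the binary string from the circus function can be sketched as follows. The thresholding rule is not reproduced in this text, so the sketch assumes that a bit is 1 when c(ω) is non-negative and 0 otherwise; that choice, the naive discrete Fourier transform, and the function names dft_magnitudes and binary_identifier are assumptions made for illustration.

```python
import cmath
import math


def dft_magnitudes(values):
    """Magnitudes |F(w)| of a naive discrete Fourier transform."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                for i, v in enumerate(values)))
        for k in range(n)
    ]


def binary_identifier(circus, n_bits):
    """Binary string from neighbouring magnitude differences (equation (8))."""
    if n_bits >= len(circus):
        raise ValueError("n_bits must be smaller than the circus function length")
    mags = dft_magnitudes(circus)
    diffs = [mags[w] - mags[w + 1] for w in range(n_bits)]  # c(w), equation (8)
    # Assumed threshold: bit is 1 when c(w) >= 0, otherwise 0.
    return [1 if c >= 0 else 0 for c in diffs]
```

Because a circular shift of the circus function changes only the phase of its Fourier coefficients, and a positive amplitude scaling multiplies every c(ω) by the same factor, the string obtained in this way is expected to be unchanged by the modifications covered by equation (3).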
To perform identifier matching between two different identifiers B1 and B2, both of length N, the normalised Hamming distance is taken
H(B1, B2) = (1/N) Σ B1 ⊗ B2, (10)
where ⊗ is the exclusive OR (XOR) operator and the sum runs over the N bit positions. Other methods of comparing identifiers or representations can be used.
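A minimal sketch of the normalised Hamming distance between two identifiers is given below, assuming each identifier is held as a list of 0/1 integers. The function name normalised_hamming is hypothetical.

```python
def normalised_hamming(b1, b2):
    """Fraction of bit positions at which two equal-length identifiers differ."""
    if len(b1) != len(b2) or not b1:
        raise ValueError("identifiers must be non-empty and of equal length")
    # XOR of two bits is 1 exactly when the bits differ.
    return sum(x ^ y for x, y in zip(b1, b2)) / len(b1)
```

A distance of 0 indicates identical identifiers, and a distance of 1 indicates that every bit differs.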
The performance may be further improved by selection of certain bits in the identifier. The bits corresponding to the lower frequencies are generally more robust and the bits corresponding to the higher frequencies are more discriminating. In one particular embodiment of the invention the first bit is ignored and then the identifier is made up of the next 64 bits.
In accordance with one embodiment of the present invention, step 140 of decomposing the two dimensional function of the image, resulting from the Trace transform (or equivalent) involves reducing the resolution thereof. The reduced resolution may be achieved by processing in either of its two dimensions, d or θ, or in both dimensions.
Thus, the resolution may be reduced in the distance dimension in the “Trace-domain” by sub-sampling the d-parameter e.g. by summing or integrating over intervals for d along the columns (corresponding to values for θ), as in
Alternatively, or additionally, the resolution may be reduced in the angle dimension in the “Trace domain” by sub-sampling the θ parameter e.g. by summing or integrating over intervals for θ along the rows (corresponding to values for d), as in
In accordance with another embodiment of the present invention, the step 140 of decomposing could be performed in the “image domain” i.e. after step 120 and typically in combination with step 130 of
As the skilled person will appreciate, other techniques for decomposing in the image domain are possible.
An example of an apparatus according to an embodiment of the invention for carrying out the above methods is shown in
The basic identifier described previously can be improved by using multiple reduced resolution Trace transforms to derive respective identifiers and combining bits from the separate identifiers as shown in
Good results may be obtained in this way by using the Trace functional T in equation (1) supra with the diametrical functional P given by equation (2) supra for one binary string and then Trace functional (1) with the diametrical functional (11)
∫|ξ(t)′|dt, (11)
to obtain the second string. The first bit of each binary string is skipped and then the subsequent 64 bits from both strings are concatenated to obtain a 128 bit identifier.
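The combination of the two binary strings can be sketched as follows, assuming each string is a list of bits of length at least 65. The function name combine_identifiers and its parameters skip and keep are hypothetical.

```python
def combine_identifiers(string_a, string_b, skip=1, keep=64):
    """Skip the leading bit(s) of each string and concatenate the next `keep` bits."""
    for bits in (string_a, string_b):
        if len(bits) < skip + keep:
            raise ValueError("each binary string needs at least skip + keep bits")
    return string_a[skip:skip + keep] + string_b[skip:skip + keep]
```

With the default parameters the result has 2 × 64 = 128 bits.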
Significant performance improvements may be obtained by using a multi-resolution representation of the Trace transform, in accordance with the present invention. In particular, decomposition may be performed in one or two dimensions. The diametrical functional can then be applied and the binary string extracted as previously. Typical results show that using the decomposition improves the detection rate at a false-alarm rate of 1 part per million from around 80% to 98%.
This multi-resolution Trace transform may be created by sub-sampling an original Trace transform, to reduce its resolution, in either of its two dimensions, d or θ, or in both dimensions, as described above. In the “Trace-domain” sub-sampling the d-parameter is performed by e.g. integrating over intervals along the columns, as in
Multiple basic identifiers can be extracted from one Trace transform by using a multi-resolution decomposition, where sub-sampling takes place over a range of different interval widths to generate the multi-resolution representation composed of the multiple basic identifiers. Ideally, the multi-resolution representation uses multiple identifiers derived using a range of interval widths. For instance, each interval width may be at least a factor of two different from other interval widths. Good results were typically obtained by using a system in which the output of the Trace transform is of size 600×384, the d-parameter is sub-sampled by integrating using bands of widths 8, 16, 32, 64 and 128, and the θ-parameter is similarly sub-sampled, e.g. by integrating using bands of widths 3, 6, 12 and 24.
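The band-wise sub-sampling can be sketched as follows. The sketch assumes that the Trace transform is stored with 384 rows indexed by d and 600 columns indexed by θ; this reading of the 600×384 size is an inference from the band widths, which divide 384 and 600 respectively, and is not stated in the text. Summation stands in for integration, and the function names are hypothetical.

```python
def subsample_d(trace, width):
    """Sum consecutive groups of `width` rows (reduces the d dimension)."""
    n_bands = len(trace) // width
    n_cols = len(trace[0])
    return [
        [sum(trace[r][c] for r in range(b * width, (b + 1) * width))
         for c in range(n_cols)]
        for b in range(n_bands)
    ]


def subsample_theta(trace, width):
    """Sum consecutive groups of `width` columns (reduces the theta dimension)."""
    n_bands = len(trace[0]) // width
    return [
        [sum(row[b * width:(b + 1) * width]) for b in range(n_bands)]
        for row in trace
    ]


def multi_resolution(trace, d_widths=(8, 16, 32, 64, 128),
                     theta_widths=(3, 6, 12, 24)):
    """Return one reduced-resolution Trace transform per band width."""
    levels = {}
    for w in d_widths:
        levels[("d", w)] = subsample_d(trace, w)
    for w in theta_widths:
        levels[("theta", w)] = subsample_theta(trace, w)
    return levels
```

Each reduced-resolution transform can then be passed through the diametrical functional and the binary string extraction to give one basic identifier per band width.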
One application of the identifier is as an image search engine. A database is constructed by extracting and storing the binary identifier along with associated information such as the filename, the image, photographer, date and time of capture, and any other useful information. Then, given a query image aq, the binary identifier is extracted and compared with all identifiers in the database B0 . . . Bm. All images with a Hamming distance to the query image below a threshold are returned.
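A minimal sketch of such a search is given below. It assumes the database is held in memory as a list of (filename, identifier) pairs and that identifiers are lists of bits of equal length; the function names and the example threshold are hypothetical.

```python
def normalised_hamming(b1, b2):
    """Fraction of bit positions at which two equal-length identifiers differ."""
    return sum(x ^ y for x, y in zip(b1, b2)) / len(b1)


def search(query_bits, database, threshold):
    """Return (distance, filename) pairs below the threshold, closest first.

    database is a list of (filename, bits) pairs.
    """
    matches = []
    for filename, bits in database:
        distance = normalised_hamming(query_bits, bits)
        if distance < threshold:
            matches.append((distance, filename))
    return sorted(matches, key=lambda match: match[0])
```

For example, search(query, database, threshold=0.1) would return every stored image whose identifier differs from the query in fewer than 10% of its bits. The value 0.1 is illustrative only; a suitable threshold depends on the target false-alarm rate.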
A range of different Trace and diametrical functionals can be used, for example (a non-exhaustive list):
Two or more identifiers can be combined to better characterise an image. The combination is preferably carried out by concatenation of the multiple identifiers.
For geometric transformations of higher order than rotation, translation and scaling, the version of the identifier described above is not appropriate, because the relationship in equation (3) does not hold. The robustness of the identifier can be extended to affine transformations using a normalisation process, full details of which can be found in reference [2] infra. Two steps are introduced to normalise the circus function: the first involves finding the so-called associated circus, and the second involves finding the normalised associated circus function. Following this normalisation, it is shown that the relationship in equation (3) is true. The identifier extraction process can then continue as before.
Some suitable Trace functionals for use with the normalisation process are given below in (G1) & (G2), a suitable choice for the diametrical functional is given in (G3).
where r≡t−c, c≡median({tk}k,{|g(tk)|}k). The weighted median of a sequence y1, y2, . . . , yn with nonnegative weights w1, w2, . . . , wn is defined by identifying the maximal index m for which
assuming that the sequence is sorted in ascending order according to the weights. If the inequality (12) is strict the median is ym. However, if the inequality is an equality then the median is (ym+ym-1)/2.
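The weighted median can be sketched as follows. Inequality (12) is not reproduced in this text, so the sketch assumes the condition that the sum of the weights preceding index m is at most half of the total weight. It also sorts the sequence by value, which is the usual convention for a weighted median; both points are assumptions, and the function name weighted_median is hypothetical.

```python
def weighted_median(values, weights):
    """Weighted median under the assumed form of inequality (12)."""
    pairs = sorted(zip(values, weights))  # assumption: ascending order by value
    ys = [y for y, _ in pairs]
    ws = [w for _, w in pairs]
    half = sum(ws) / 2.0
    best, best_sum = 0, 0.0
    prefix = 0.0  # sum of the weights strictly before the current index
    for i in range(len(ys)):
        if prefix <= half:  # assumed inequality (12)
            best, best_sum = i, prefix
        prefix += ws[i]
    if best_sum < half or best == 0:
        return ys[best]                      # strict inequality: median is y_m
    return (ys[best] + ys[best - 1]) / 2.0   # equality: average with y_(m-1)
```

With equal weights this reduces to the ordinary median, including the averaging of the two middle values for a sequence of even length.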
Rather than constructing the identifier from a contiguous block of bits, the selection can be carried out by experimentation. One example of how to do this is to have two sets of data: (i) independent images and (ii) original and modified images. The performance of the identifier can be measured by comparing the false acceptance rate for the independent data and the false rejection rate for the original and modified images. Points of interest are the equal error rate or the false rejection rate at a false acceptance rate of 1×10^−6. The optimisation starts with no bits selected. Each bit is examined one at a time to see which gives the best performance (say, in terms of the equal error rate or some similar measure), and the bit that gives the best result is selected. Then all the remaining bits are tested to find which gives the best performance in combination with the first bit. Again, the bit with the lowest error rate is selected. This procedure is repeated until all bits are selected. In this way, the bit combination that results in the overall best performance can be determined.
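The selection procedure described above is a greedy forward selection and can be sketched as follows. The callback error_rate is hypothetical: it is assumed to take a list of bit indices and return a measure such as the equal error rate evaluated on the two data sets, with lower values being better.

```python
def greedy_bit_selection(n_bits, error_rate):
    """Add bits one at a time, always choosing the bit that minimises the error.

    Returns the best subset found, its error, and the full selection history.
    """
    selected = []
    remaining = list(range(n_bits))
    history = []
    while remaining:
        # Test every remaining bit in combination with those already selected.
        best_bit = min(remaining, key=lambda b: error_rate(selected + [b]))
        selected.append(best_bit)
        remaining.remove(best_bit)
        history.append((list(selected), error_rate(selected)))
    # After all bits are selected, keep the combination with the lowest error.
    best_subset, best_error = min(history, key=lambda entry: entry[1])
    return best_subset, best_error, history
```

The procedure evaluates on the order of n_bits squared combinations rather than all subsets, so the result is the best combination along the greedy path and is not guaranteed to be the global optimum.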
A multi-resolution decomposition of the trace transform can be formed as described above by summing or integrating over intervals of the parameter (either d or θ). As indicated above, any statistical technique can be used to achieve decomposition or resolution reduction and other possibilities include calculating statistics such as the mean, max, min etc. Other functionals may also be applied over these intervals.
Moreover, a structure could be applied to the identifier to improve search performance. For example, a two-pass search could be implemented, in which half of the bits are used for an initial search and then only those with a given level of accuracy are accepted for the second pass of the search.
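A two-pass search of this kind can be sketched as follows. The sketch assumes that the first half of the bits is used for the initial pass and that each pass applies its own Hamming distance threshold; the function names, the (filename, bits) database layout and the two threshold parameters are hypothetical.

```python
def normalised_hamming(b1, b2):
    """Fraction of bit positions at which two equal-length identifiers differ."""
    return sum(x ^ y for x, y in zip(b1, b2)) / len(b1)


def two_pass_search(query_bits, database, coarse_threshold, fine_threshold):
    """First pass on half of the bits, second pass on the full identifier."""
    half = len(query_bits) // 2
    # Pass 1: cheap comparison on the first half of the bits only.
    candidates = [
        (filename, bits) for filename, bits in database
        if normalised_hamming(query_bits[:half], bits[:half]) < coarse_threshold
    ]
    # Pass 2: full comparison, restricted to the surviving candidates.
    return [
        filename for filename, bits in candidates
        if normalised_hamming(query_bits, bits) < fine_threshold
    ]
```

The first pass rejects most of the database using half the comparison cost per entry, so the full-length comparison is carried out on a small candidate set only.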
The identifier can be compressed to further reduce its size using a method such as a Reed-Muller decoder or a Wyner-Ziv decoder.
The identifier can also be used to index the frames in a video sequence. Given a new sequence, identifiers can be extracted from the frames and then searching can be performed to find the same sequence. This could be useful for copyright detection and sequence identification.
Multiple broadcasters often transmit the same content, for example advertisements or stock news footage. The identifier can be used to form links between the content for navigation between broadcasters.
Image identifiers provide the opportunity to link content through images. If a user is interested in a particular image on a web page then there is no effective way of finding other pages with the same image. The identifier could be used to provide a navigation route between images.
The identifier can be used to detect adverts in broadcast feeds. This can be used to provide automated monitoring for advertisers to track their campaigns.
There are many image databases in existence, from large commercial sets to small collections on a personal computer. Unless the databases are tightly controlled there will usually be duplicates of images in the sets, which requires unnecessary extra storage. The identifier can be used as a tool for removing or linking duplicate images in these datasets.
In this specification, the term “image” is used to describe an image unit, including after processing, such as filtering, changing resolution, upsampling, downsampling, but the term also applies to other similar terminology such as frame, field, picture, or sub-units or regions of an image, frame etc. In the specification, the term image means a whole image or a region of an image, except where apparent from the context. Similarly, a region of an image can mean the whole image. An image includes a frame or a field, and relates to a still image or an image in a sequence of images such as a film or video, or in a related group of images. The image may be a greyscale or colour image, or another type of multi-spectral image, for example, IR, UV or other electromagnetic image, or an acoustic image etc.
In the embodiments, a frequency representation is derived using a Fourier transform, but a frequency representation can also be derived using other techniques such as a Haar transform. In the claims, the term Fourier transform is intended to cover variants such as DFT and FFT.
The invention is preferably implemented by processing electrical signals using a suitable apparatus.
The invention can be implemented for example in a computer system, with suitable software and/or hardware modifications. For example, the invention can be implemented using a computer or similar having control or processing means such as a processor or control device, data storage means, including image storage means, such as memory, magnetic storage, CD, DVD etc, data output means such as a display or monitor or printer, data input means such as a keyboard, and image input means such as a scanner, or any combination of such components together with additional components. Aspects of the invention can be provided in software and/or hardware form, or in an application-specific apparatus or application-specific modules can be provided, such as chips. Components of a system in an apparatus according to an embodiment of the invention may be provided remotely from other components, for example, over the internet.
As the skilled person will appreciate, many variations and modifications can be made to the described embodiments. For example, the present invention can be implemented in embodiments combining implementations of the existing and related techniques known to the skilled person. It is intended to include all such variations, modifications and equivalents to the described embodiments that fall within the scope of the present invention, as defined in the accompanying claims.
Number | Date | Country | Kind |
---|---|---|---|
0700468.2 | Jan 2007 | GB | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/GB2007/004676 | 12/6/2007 | WO | 00 | 8/6/2009 |