The invention relates to a method and arrangement for detecting a watermark in a signal, the method comprising the steps of computing a correlation between a sequence of signal samples and a predetermined watermark, and detecting whether said correlation exceeds a given threshold.
Watermarks are imperceptible messages embedded in the content of information signals such as audio or video. Watermarks support a variety of applications such as monitoring and copy control. A watermark is generally embedded in a signal by modifying samples of the signal according to respective samples of the watermark. The term “samples” refers to signal values in the domain in which the watermark is embedded.
A prior art watermark embedding and detection system for audio is disclosed in Jaap Haitsma, Michiel van der Veen, Ton Kalker and Fons Bruekers: “Audio Watermarking for Monitoring and Copy Protection”, ACM Multimedia Conference, Oct. 30-Nov. 4, 2002, pp. 119-122. The audio signal is segmented into frames and transformed to the frequency domain. A watermark sequence is embedded in the magnitudes of the Fourier coefficients of each frame. The detector receives the time-domain version of the watermarked audio signal. The received signal is segmented into frames and transformed to the frequency domain. The magnitudes of the Fourier coefficients are cross-correlated with the watermark sequence. If the correlation exceeds a given threshold, the watermark is said to be present. The expression “sequence of signal samples” defined in the opening paragraph refers to the magnitudes of the Fourier coefficients of an audio frame in this case.
A prior-art watermark embedding and detection system for video is disclosed in Ton Kalker, Geert Depovere, Jaap Haitsma and Maurice Maes: “A Video watermarking System for Broadcast Monitoring”, Proceedings of SPIE, Vol. 3657, January 1999, pp. 103-112. In this system, the watermark is embedded in the pixel domain. The watermark sequence is a 128×128 watermark pattern, which is tiled over an image. The watermark detector correlates 128×128 image blocks with the watermark pattern. If the correlation is sufficiently large, the watermark is said to be present. The expression “sequence of signal samples” defined in the opening paragraph refers to image blocks of 128×128 pixels in this case.
Watermark detection algorithms can be sensitive to attacks or to specific signal conditions, such as a strong single tone present in or added to an audio signal, a strong logo present at a fixed position in every video frame, or white subtitle letters at the bottom of every frame.
It is an object of the invention to improve the performance of the prior-art watermark detection method.
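To this end, the method according to the invention is characterized in that the method includes pre-processing of said sequence of signal samples, said pre-processing comprising the steps of:
dividing the sequence of signal samples into sub-sequences;
subjecting each sub-sequence to a weighting function such that the distribution of the signal samples over the whole sequence is substantially flat while the original signal variations within each sub-sequence are retained; and
concatenating the weighted sub-sequences to obtain the sequence to be correlated with the watermark.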
The method according to the invention effectively suppresses large signal peaks while maintaining the small signal variations representing the watermark. This is achieved without knowing or detecting the location of the disturbing component in the signal.
The invention is particularly effective if the watermark detection method includes accumulation of plural signal sequences. Such an accumulation normally improves the detection reliability (the watermark sequences add up whereas the signal is averaged), but this is no longer the case if the signal includes the same disturbing component in substantially all accumulated sequences. In a preferred embodiment of the method according to the invention, the pre-processing is applied to said accumulated sequences. It is thereby achieved that the disturbing component is effectively removed from the accumulated sequences.
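The effect of accumulation can be illustrated with a simplified additive model. The model and the symbols s_i(k), w(k), d(k) and σ_s are assumptions introduced here for illustration only and are not taken from the embodiments: each sequence is taken to be the sum of a host term s_i(k) that is uncorrelated from sequence to sequence with standard deviation σ_s, a watermark term w(k), and a disturbing component d(k) that is identical in every sequence.

$$Y(k) = \sum_{i=1}^{N} \big( s_i(k) + w(k) + d(k) \big) = \sum_{i=1}^{N} s_i(k) + N\,w(k) + N\,d(k)$$

$$\frac{N\,|w(k)|}{\sqrt{N}\,\sigma_s} = \sqrt{N}\,\frac{|w(k)|}{\sigma_s}, \qquad \frac{N\,|w(k)|}{N\,|d(k)|} = \frac{|w(k)|}{|d(k)|}$$

Under these assumptions the watermark gains a factor of √N over the uncorrelated host term, but gains nothing over the fixed disturbing component, which is why the latter has to be removed by other means.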
In an advantageous embodiment of the method according to the invention, the sequence of signal samples is divided into overlapping, preferably windowed, sub-sequences. A suitable window is the well-known Hanning window, or the square root of the Hanning window. An overlap of 50% has been found to give good results. The concatenated sequence to be correlated with the watermark is obtained by adding the weighted sub-sequences.
Advantageously, the step of weighting comprises Fourier transforming the sub-sequence of signal samples, normalizing the magnitudes of the Fourier coefficients, and back-transforming the normalized coefficients. Alternatively, the step of weighting comprises dividing all signal samples of a sub-sequence by the largest signal sample of said sub-sequence. The second option, i.e. scaling, has a lower arithmetic complexity than the first option where weighting is obtained by normalizing the magnitudes in the frequency domain. In both embodiments, the sequence is adaptively weighted, based on properties of the signal.
These and other aspects of the invention are apparent from and will be elucidated with reference to the accompanying drawings, in which:
The invention will now be described with reference to the detection of a watermark embedded in an audio signal. An embedding arrangement will first be described to provide background information.
$$W_i(k) = W_s(k)\,X_i(k)$$
where i indicates the frame or sequence number, Xi(k) the spectral representation of a frame xi(n), Ws(k) the cyclically shifted version of W(k), and Wi(k) the resulting frequency domain watermark. An inverse Fourier transform 106 is used to obtain the time domain watermark representation w(n).
In a segmentation unit 11 of the accumulation stage, the arrangement segments the suspect audio signal y(n) into frames or sequences yi(n) of 2048 audio samples. Each sequence is Fourier transformed (12) and the magnitudes of the Fourier coefficients Yi(k) are computed (13). The magnitudes of the Fourier coefficients of frame i constitute a sequence |Y|i(k) of 1024 real numbers in which the watermark information has been embedded. In the preferred embodiment of the arrangement, a plurality of such sequences |Y|i(k) is accumulated by an accumulator 14 to obtain an accumulated sequence Y(k). The number of sequences being accumulated is chosen to represent a period of, say, 2 seconds of the audio signal.
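The accumulation stage described above may be sketched as follows. This is a minimal illustration, not the arrangement itself: the function name `accumulate_magnitudes`, the use of non-overlapping frames, the choice of bins 0 to 1023 as the 1024 magnitudes, and the dropping of a trailing partial frame are all assumptions.

```python
import numpy as np

FRAME_LEN = 2048  # audio samples per frame, as stated in the text


def accumulate_magnitudes(y, frame_len=FRAME_LEN):
    """Segment y into frames, take the FFT magnitudes of each frame, and sum them.

    Returns the accumulated sequence Y(k) with frame_len // 2 entries.
    """
    y = np.asarray(y, dtype=float)
    n_frames = len(y) // frame_len  # assumption: a trailing partial frame is dropped
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    # Assumption: frames do not overlap.
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectra = np.fft.fft(frames, axis=1)
    # Assumption: the 1024 magnitudes are those of bins 0..1023.
    magnitudes = np.abs(spectra[:, : frame_len // 2])
    return magnitudes.sum(axis=0)
```

At a sampling rate of 44.1 kHz (an assumed value), 2 seconds of audio would correspond to about 43 frames of 2048 samples.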
The correlation stage 3 will now briefly be described. For a detailed description of watermark detection using correlation, reference is made to International Patent Application WO 99/45707. The correlation stage calculates a correlation C between an accumulated sequence of signal samples (note that “signal samples” in this example refers to magnitudes of Fourier coefficients) and every possible shifted version of the watermark sequence W(k). The correlation stage receives a sequence Z(k). It will initially be assumed that the correlation stage receives the accumulated sequence directly from the accumulation stage 1, i.e. Z(k)=Y(k).
The cross-correlation for every possible shifted version of W(k) is calculated most efficiently using the Fourier transform. The traditional cross-correlation may be written as:
$$C = F^{-1}\big(F(Z(k)) \times F^{*}(W(k))\big)$$
where F(.) denotes the Fourier transform, F*(.) the Fourier transform including conjugation of the complex Fourier coefficients, and F−1(.) the inverse Fourier transform. The respective transforms are carried out by Fourier transform circuits 31, 32 and 33 in
The detection performance is enhanced by Symmetrical Phase Only Matched Filtering (SPOMF). In this cross-correlation procedure, only phase information of the signals F(Z(k)) and F*(W(k)) is used. The phase-only operation is defined as:
$$P(X(k)) = \frac{X(k)}{|X(k)|}$$
and is carried out by respective phase extraction circuits 35 and 36 in
A peak detector 4 determines whether the cross-correlation function C exhibits a peak value ρ which is larger than a given detection threshold (for example, 5σ, where σ is the standard deviation of the correlation function). In that case, the watermark W(k) is said to be present. The peak detector also retrieves the position of said peak value, which corresponds to the amount of shift being applied to the watermark W(k), and thus represents the 10-bit payload d. However, this aspect is not relevant to the invention.
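The correlation stage and the peak detector may be sketched as follows. This is an illustration under stated assumptions rather than the circuit itself: the function names, the `eps` guard against division by zero, and the use of the standard deviation of the whole correlation function (peak included) as σ are assumptions.

```python
import numpy as np


def phase_only(x, eps=1e-12):
    """Keep only the phase of each complex coefficient, i.e. x / |x|."""
    return x / np.maximum(np.abs(x), eps)  # eps is an assumed guard for zero bins


def spomf_correlate(z, w):
    """Phase-only circular cross-correlation of z with every cyclic shift of w."""
    Z = np.fft.fft(z)
    W_conj = np.conj(np.fft.fft(w))
    return np.real(np.fft.ifft(phase_only(Z) * phase_only(W_conj)))


def detect_peak(c, n_sigma=5.0):
    """Return (present, shift): whether the peak exceeds n_sigma * std(c), and where."""
    shift = int(np.argmax(c))
    present = bool(c[shift] > n_sigma * np.std(c))
    return present, shift
```

With this convention the index of the peak equals the cyclic shift applied to the watermark, which is the quantity that carries the payload.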
A possible solution to overcome the problem is to ignore parts of the signals, for example: parts of video frames or parts of the audio spectrum, where the disturbing components are present. For example, the location of a logo in a video signal may be known in advance, so that the corresponding pixels can be ignored. Or, if an audio watermark detector is observing an FM radio station, the frequencies close to the carrier wave can be ignored. Ignoring parts of a signal can be seen as applying a more or less abrupt weighting function to the signal. However, the location of disturbing components is generally unknown. Some kind of mechanism is desired to adapt the weighting function to the signal.
To this end, the arrangement for detecting the watermark in accordance with the invention includes a pre-processing stage 2 between accumulation stage 1 and correlation stage 3 (cf.
The sub-segmentation unit 21 divides the accumulated sequence Y(k) into a plurality of possibly overlapping and windowed sub-sequences A(k). For audio signals, where the sequence Y(k) comprises 1024 signal samples, a sub-sequence length of 16 samples has been found to be a good choice.
The weighting circuit 22 subjects each individual sub-sequence to a weighting function. The weighting function is chosen such that the distribution of the signal samples over the whole sequence is substantially flat while the original variations of signal samples within each sub-sequence are retained. The expression “substantially flat” may mean, for example, that the mean value of the signal samples of a sub-sequence is the same for all the sub-sequences.
In one embodiment, this is achieved by normalizing the magnitudes of each sub-sequence in the frequency domain. To this end, the weighting circuit performs the following operation:
$$B(k) = F^{-1}\big(P(F(A(k)))\big) \qquad (1)$$
where F(.) denotes the Fourier transform, P(.) denotes the phase only operation as defined above, and F−1(.) denotes the inverse Fourier transform.
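Operation (1) may be sketched for a single sub-sequence as follows. The function name `weight_phase_only` and the `eps` guard are assumptions made for this illustration.

```python
import numpy as np


def weight_phase_only(a, eps=1e-12):
    """Compute B = F^-1(P(F(A))) for one real-valued sub-sequence A."""
    A = np.fft.fft(np.asarray(a, dtype=float))
    # Phase-only operation: every Fourier coefficient is scaled to unit magnitude.
    P = A / np.maximum(np.abs(A), eps)  # eps is an assumed guard for zero bins
    # For real input, P is conjugate-symmetric, so the inverse transform is real.
    return np.real(np.fft.ifft(P))
```

Because every coefficient is scaled to unit magnitude, a sub-sequence that is 1000 times stronger than its neighbours yields the same output as its unscaled version, which is how a dominant peak loses its influence.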
In another embodiment, the weighting is carried out by the following scaling operation:
$$B_k = \frac{A_k}{|A_k|_{\max}}$$
where Ak and Bk denote samples of the original sub-sequence A(k) and the weighted sub-sequence B(k), respectively, and |Ak|max is the largest absolute value of the signal samples of sub-sequence A(k).
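The scaling operation may be sketched as follows. The function name `weight_by_max` and the handling of an all-zero sub-sequence are assumptions made for this illustration.

```python
import numpy as np


def weight_by_max(a):
    """Divide all samples of a sub-sequence by its largest absolute sample value."""
    a = np.asarray(a, dtype=float)
    peak = np.max(np.abs(a))
    if peak == 0.0:
        # Assumption: an all-zero sub-sequence is returned unchanged.
        return a.copy()
    return a / peak
```

For example, the sub-sequence (2, -8, 4, 1) is mapped to (0.25, -1, 0.5, 0.125): the relative variations are retained while the largest magnitude becomes 1.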
The weighted sub-sequences B(k) are subsequently concatenated by the concatenation circuit 23 to obtain the pre-processed sequence Z(k). If the sub-sequences overlap each other, suitable windows (e.g. Hanning windows) are preferably applied to B(k). It is the pre-processed sequence Z(k) that is input to the correlation stage 3.
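The complete pre-processing stage, with 50% overlapping sub-sequences of 16 samples, may be sketched as follows. This is one possible reading, not the circuit itself: the function names, the use of a periodic Hanning window applied after weighting, the default scaling-type weighting, and the treatment of the first and last half sub-sequence (covered by one window only) are assumptions.

```python
import numpy as np


def scale_by_max(a):
    """Default weighting (assumed): divide by the largest absolute sample."""
    return a / max(float(np.max(np.abs(a))), 1e-12)


def preprocess(y, sub_len=16, weight=scale_by_max):
    """Divide y into 50% overlapping sub-sequences, weight each, and overlap-add."""
    y = np.asarray(y, dtype=float)
    hop = sub_len // 2
    # Periodic Hanning window: copies shifted by half the length sum to 1.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(sub_len) / sub_len)
    z = np.zeros_like(y)
    for start in range(0, len(y) - sub_len + 1, hop):
        a = y[start:start + sub_len]          # sub-sequence A(k)
        b = weight(a)                         # weighted sub-sequence B(k)
        z[start:start + sub_len] += window * b  # windowed concatenation
    return z
```

A usage sketch: for an accumulated sequence in which four adjacent samples are raised by 1000, the output stays within the range -1 to 1, so the peak no longer dominates the subsequent correlation.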
The improvement achieved with the watermark detection method according to the invention is shown in
In the embodiments described above, the watermark is represented by slight modifications of the magnitudes of Fourier coefficients, i.e. in the frequency domain. However, it will be appreciated that the invention is equally applicable to the detection of a watermark embedded in the temporal or spatial (video) domain.
A watermark detection method is disclosed which is based on computing the cross-correlation between a suspect signal and a watermark. In order to be more robust against prolonged dominant signal components that adversely affect the correlation, the sequence of signal samples (61) to be correlated with the watermark is divided into sub-sequences (A(k)). The sub-sequences are processed, by a weighting function, to obtain modified sub-sequences (B(k)) that individually exhibit the original signal variations, but collectively (62) exhibit a flatter distribution of sample values. Dominant peaks in the signal are thereby substantially reduced.
Number | Date | Country | Kind |
---|---|---|---|
02077982.3 | Jul 2002 | EP | regional |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/IB03/03095 | 7/7/2003 | WO | | 1/18/2005