This application claims the benefit, under 35 U.S.C. §365, of International Application PCT/EP2006/065973, filed Sep. 4, 2006, which was published in accordance with PCT Article 21(2) on Mar. 22, 2007 in English, and which claims the benefit of European patent application No. 05090261.8, filed Sep. 16, 2005.
The invention relates to a method and to an apparatus for transmitting or regaining watermark data embedded in an audio signal by using modifications of the phase of said audio signal.
Watermarking of audio signals intends to manipulate the audio signal in a way that the changes in the audio content cannot be recognised by the human auditory system. Most audio watermarking technologies add to the original audio signal a spread spectrum signal covering the whole frequency spectrum of the audio signal, or insert into the original audio signal one or more carriers which are modulated with a spread spectrum signal. There are many possibilities of watermarking to a more or less audible degree, and in a more or less robust way. The currently most prominent technology uses a psycho-acoustically shaped spread spectrum, see for instance WO-A-97/33391 and U.S. Pat. No. 6,061,793. This technology offers a good compromise between audibility and robustness, although its robustness is not optimum.
In another technology the encoded data, i.e. the watermark, is hidden in the phase of the original audio signal by phase coding: W. Bender, D. Gruhl, N. Morimoto, A. Lu, “Techniques for Data Hiding”, IBM Systems Journal 35, Nos. 3&4, 1996, pp. 313-336.
A further technology is phase modulation:
S. S. Kuo, J. D. Johnston, W. Turin, S. R. Quackenbush, “Covert Audio Watermarking using Perceptually Tuned Signal Independent Multiband Phase Modulation”, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2002, vol. 2, IEEE Press, pp. 1753-1756.
However, for some types of audio signals it is not possible to retrieve and decode the spread spectrum at decoder side. If carriers modulated with spread spectrum sequences are used, it is possible to easily remove the carriers by applying notch filters.
A disadvantage of the above phase coding technique is that it is neither robust against cropping nor achieves an acceptable data rate, and both phase related techniques need the original audio signal for decoding and therefore the detector works in a non-blind manner.
The problem to be solved by the invention is to increase the watermark detection reliability at decoder side and to improve the robustness of the watermark signal, thereby still allowing blind detector operation in the decoder. This problem is solved by the methods disclosed in claims 1 and 3. Apparatuses that utilise these methods are disclosed in claims 2 and 4.
The invention uses phase modification of the audio signal for embedding the watermark signal data. A blind detection at decoder side is feasible, i.e. the original audio signal is not required for decoding the watermark signal. In the spectral domain, the phase of the audio signal can be manipulated by the phase of a reference phase sequence (e.g. a spread spectrum sequence or an m-sequence or a pseudo-random distribution of phase values between and including ‘−π’ and ‘+π’). This may include splitting the audio signal into overlapping blocks, transforming these blocks with the Fourier or any other time-to-frequency domain transform, changing the original phase based on pseudo-random numbers of a reference phase sequence and a model of the human auditory system, inversely (Fourier) transforming the phase-changed spectrum back into the time domain, and carrying out an overlap/add on the blocks. The resulting changed audio signal sounds like the original one.
Because a change of the audio signal phase over the whole frequency range can be audible, a strong (e.g. −π/+π) phase manipulation is carried out only within one or more small frequency ranges which are located in the higher frequencies and/or in noisy audio signal sections, the corresponding frequency ranges being determined according to psycho-acoustic principles.
In a further embodiment, in the remaining frequency ranges the phase values can be changed, too, the allowable extent of the phase changes being controlled according to psycho-acoustic principles. In addition, the amplitude of (less audible) spectral bins can be changed according to psycho-acoustic principles in order to allow even greater (non-audible) phase changes.
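As a rough illustration of confining the strong phase manipulation to a small high-frequency range, the following Python sketch builds a per-bin limit on the allowed phase change. The function name `phase_limits`, the 48 kHz sampling rate and the 16 kHz to 18 kHz band in the usage lines are assumptions made for illustration only; in the described processing the ranges follow from psycho-acoustic calculations, which are not modelled here.

```python
import numpy as np

def phase_limits(n_bins, sample_rate, strong_band_hz, weak_limit=0.0):
    """Per-bin maximum absolute phase change in radians.

    Bins inside strong_band_hz may be changed by up to pi; all other bins
    are restricted to weak_limit (0.0 leaves them untouched). The band is
    a fixed, hypothetical stand-in for a psycho-acoustic model.
    """
    # Centre frequencies of the bins of a real FFT, from 0 Hz to Nyquist.
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    limit = np.full(n_bins, float(weak_limit))
    lo, hi = strong_band_hz
    limit[(freqs >= lo) & (freqs <= hi)] = np.pi
    return limit

# Hypothetical usage: 513 bins, 48 kHz audio, strong changes at 16-18 kHz.
limit = phase_limits(513, 48000, (16000.0, 18000.0))
```

Setting `weak_limit` to a small positive value would correspond to the further embodiment in which the remaining frequency ranges are also changed, but to a lesser extent.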
The watermarked audio signal is decoded at decoder side by correlating the received audio signal with the corresponding inversely (Fourier) transformed candidate reference phase sequences, one of which had been used in the encoding, or by using a matched filter instead of correlation.
The invention achieves a good compromise between robustness and audibility, achieves a high data rate, facilitates real-time processing and is suitable for embedded systems.
In principle, the inventive method is suited for watermarking data embedded in an audio signal by using modifications of the phase of said audio signal, said method including the steps:
In principle the inventive apparatus is suited for watermarking data embedded in an audio signal by using modifications of the phase of said audio signal, said apparatus including:
In principle the inventive watermark decoding is suited for regaining watermark data that were embedded in an audio signal by using modifications of the phase of said audio signal, wherein the value of a current bit of said watermark data was controlled by the selection or the generation of a corresponding reference data sequence and, according to said corresponding reference data sequence, phase values in a current time-to-frequency domain converted block of said audio signal were modified, whereby within said current block the allowable frequency range or ranges for said phase value modification by a pre-determined maximum amount was determined by psycho-acoustic related calculations, and the modified version of said current block of said audio signal was frequency-to-time domain converted so as to form a corresponding section of the watermarked audio signal, said method including the steps:
In principle the inventive watermark decoding apparatus is suited for regaining watermark data that were embedded in an audio signal by using modifications of the phase of said audio signal, wherein the value of a current bit of said watermark data was controlled by the selection or the generation of a corresponding reference data sequence and, according to said corresponding reference data sequence, phase values in a current time-to-frequency domain converted block of said audio signal were modified, whereby within said current block the allowable frequency range or ranges for said phase value modification by a pre-determined maximum amount was determined by psycho-acoustic related calculations, and the modified version of said current block of said audio signal was frequency-to-time domain converted so as to form a corresponding section of the watermarked audio signal, said apparatus including:
Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
The current frequency range or ranges which are used for the phase changes depend on the current audio signal AUI and are dynamically determined by the psycho-acoustic model. The phase manipulation can be carried out at different frequency ranges in order to prevent a cut-off of these areas. It is also possible to additionally add a ‘normal’ spread spectrum watermark signal to the amplitude of the audio signal in the time or frequency domain.
The phase change module PHCHM outputs a corresponding watermarked audio signal WMAU.
At decoder side, the watermarked audio signal WMAU passes (framewise or blockwise) through a correlator CORR in which its phase is correlated with one or more frequency-to-time domain converted versions of the candidate decoder spreading sequences or pseudo-noise sequences (one of which was used in the encoder) stored or generated in a decoder spreading sequence stage DSPRSEQ. The correlator provides a bit value of the corresponding watermark output signal WMO.
Advantageously, the correlation output at decoder side always contains a meaningful peak (corresponding to a watermark information bit), which is often not the case if a (shaped) spreading sequence was added to the audio signal amplitude. It is not possible to remove this kind of watermarking from the audio signal without drastically degrading the quality of the audio signal. The robustness of the watermarking is therefore increased.
Instead of modifying the phase only in a specific frequency range or ranges and/or only at specific time instants, under certain conditions the whole frequency range can be subject to the phase modifications.
An example implementation of this embodiment is as follows. Two different phase vectors p_0 and p_1 are created, each one comprising 513 pseudo-random numbers between −π and π (in practice, the first and the last values are never used, but for the sake of simplicity this fact is omitted here).
If a ‘zero’ payload (i.e. watermark) data PD bit is to be transmitted, a vector p (phase only) is generated in a reference phase section stage RPHS with p=p_0; if a watermark data bit ‘one’ is to be transmitted, the vector p is generated with p=p_1.
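The creation and selection of the two reference phase vectors can be sketched in Python as follows. This is a minimal sketch: the seeds, the function names and the use of NumPy's generator are assumptions, as is the block length of 1024 samples, which is merely one length whose real FFT has 513 bins.

```python
import numpy as np

N_BINS = 513  # bins of a real FFT over an assumed 1024-sample block

def make_phase_vector(seed, n_bins=N_BINS):
    """Pseudo-random phase values drawn uniformly from [-pi, pi)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-np.pi, np.pi, size=n_bins)

# Hypothetical seeds that encoder and decoder would have to share.
p_0 = make_phase_vector(seed=0)
p_1 = make_phase_vector(seed=1)

def reference_phase(bit):
    """Return the reference phase vector p for one payload bit."""
    return p_1 if bit else p_0
```

Because the vectors are reproducible from their seeds, a decoder can regenerate them without access to the original audio signal.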
A new vector d is calculated in a phase modification stage PHCH by d=p−phase(s), and for each bin j of vector d a normalisation step is carried out:
Next the psycho-acoustical limits that were checked in stage PHLC are taken into account in stage PHCH by calculating for each bin i:
In the next step a modified audio signal y is calculated in an inverse Fourier transform stage IFTR as
y = IFFT(|s| e^{i(phase(s)+d)}),
where i denotes the imaginary unit. This modified audio signal sounds like the original signal, but contains a watermarking data bit.
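The per-block embedding steps can be sketched in Python as shown here. Two points are assumptions rather than statements of the described method: the normalisation is taken to mean wrapping each difference into [−π, π), and the psycho-acoustic limits are taken to be a symmetric per-bin bound `limit` on the absolute phase change. The function name `embed_block` is hypothetical.

```python
import numpy as np

def embed_block(x, p, limit):
    """Embed one bit into a real block x by moving its phase towards p.

    x:     time-domain block (even length n)
    p:     reference phase vector with n // 2 + 1 entries
    limit: assumed per-bin maximum absolute phase change in radians
    """
    s = np.fft.rfft(x)
    phase_s = np.angle(s)
    d = p - phase_s
    # Assumed normalisation: wrap the difference into [-pi, pi).
    d = (d + np.pi) % (2.0 * np.pi) - np.pi
    # Assumed limiting: clamp each bin to its permitted phase change.
    d = np.clip(d, -limit, limit)
    # The first and last bins (DC, Nyquist) must stay real, so skip them.
    d[0] = 0.0
    d[-1] = 0.0
    # y = IFFT(|s| e^{i(phase(s)+d)})
    return np.fft.irfft(np.abs(s) * np.exp(1j * (phase_s + d)), n=len(x))
```

With `limit` equal to π in every bin the phase of the block is replaced by p outright, while the magnitude spectrum stays unchanged; with `limit` equal to zero the block is returned unmodified.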
Blocking artefacts can be reduced in an overlap-and-add stage OADD by overlapping blocks for example with a well-known sine window.
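A minimal overlap-and-add sketch in Python is given below. It assumes 50% overlap and applies the sine window both before and after the block processing, so that the squared windows of neighbouring blocks sum to one; the text does not specify these details, and `process_overlap_add` and `process_block` are hypothetical names.

```python
import numpy as np

def sine_window(n):
    """Sine window; its squares sum to one at 50% overlap."""
    return np.sin(np.pi * (np.arange(n) + 0.5) / n)

def process_overlap_add(x, block_len, process_block):
    """Window, process and overlap-add blocks of x with 50% overlap."""
    hop = block_len // 2
    w = sine_window(block_len)
    y = np.zeros(len(x))
    for start in range(0, len(x) - block_len + 1, hop):
        block = w * x[start:start + block_len]        # analysis window
        block = process_block(block)                  # e.g. phase change
        y[start:start + block_len] += w * block       # synthesis window
    return y
```

If `process_block` returns its input unchanged, every sample covered by two blocks is reconstructed exactly, which is a convenient check before plugging in the phase modification.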
w_0 = IFFT(e^{i p_0}), w_1 = IFFT(e^{i p_1}).
These two vectors or pseudo-noise sequences w_0 and w_1 are correlated in the time domain in correlator CORR with the shaped watermarked audio signal.
A correlation of a watermarked audio signal with the sequence w_0 or w_1 that is based on the same phase vector as the embedded watermark data bit will show a peak PK in the correlation result, whereas a correlation of that watermarked audio signal with the other sequence w_1 or w_0, respectively, shows only noise in the correlation result. The correlator assigns the corresponding bit values and provides the resulting watermark output signal WMO.
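The blind detection by correlation can be sketched in Python as follows. The sketch assumes that the decoder knows the block boundaries and both phase vectors, takes the larger of the two correlation maxima as the decision, and zeroes the first and last bins of each reference spectrum; the function names are hypothetical.

```python
import numpy as np

def reference_sequence(p, block_len):
    """Time-domain sequence w = IFFT(e^{ip}) for a phase vector p."""
    spec = np.exp(1j * p)
    spec[0] = 0.0   # first and last bins are not used
    spec[-1] = 0.0
    return np.fft.irfft(spec, n=block_len)

def correlation_peak(y, w):
    """Largest value of the time-domain cross-correlation of y and w."""
    return np.max(np.correlate(y, w, mode="full"))

def detect_bit(y, p_0, p_1):
    """Decide which phase vector was embedded in the received block y."""
    n = len(y)
    peak_0 = correlation_peak(y, reference_sequence(p_0, n))
    peak_1 = correlation_peak(y, reference_sequence(p_1, n))
    return int(peak_1 > peak_0)
```

Only the received block and the candidate phase vectors enter the decision, so the original audio signal is not needed at the decoder.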
Theoretically it is sufficient to use only a single phase vector for the transmission of one watermark data bit, e.g. by using the original vector for transmitting a ‘one’ and the same vector shifted by ‘−π’ for transmitting a ‘zero’. But experiments have shown that the processing is much more robust if two different phase vectors are used.
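The single-vector variant can be illustrated numerically: shifting every phase value by −π multiplies each spectral coefficient by −1, so the time-domain reference sequence is simply negated, and the two bit values would presumably be distinguished by the sign of the correlation peak. The Python lines below are a sketch under that reading; the seed and the block length of 1024 samples are assumptions.

```python
import numpy as np

def reference_sequence(p, block_len=1024):
    """Time-domain sequence IFFT(e^{ip}), first and last bins unused."""
    spec = np.exp(1j * p)
    spec[0] = 0.0
    spec[-1] = 0.0
    return np.fft.irfft(spec, n=block_len)

rng = np.random.default_rng(0)  # hypothetical seed
p = rng.uniform(-np.pi, np.pi, 513)

w_one = reference_sequence(p)           # vector used for a 'one'
w_zero = reference_sequence(p - np.pi)  # same vector shifted by -pi

# The shifted vector yields the negated sequence.
sequences_are_negated = np.allclose(w_zero, -w_one)
```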
It is possible to transmit several watermark data bits per audio signal block in case several different random phase vectors per block are used and each value is mapped to one phase vector.
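One way to read this is that a group of k bits selects one of 2^k phase vectors, and the decoder picks the candidate with the strongest correlation. The Python sketch below follows that reading; the codebook construction, the seed and the function names are assumptions, and the circular cross-correlation is computed via the FFT for brevity.

```python
import numpy as np

def make_codebook(bits_per_block, n_bins=513, seed=0):
    """One pseudo-random phase vector per possible k-bit value."""
    rng = np.random.default_rng(seed)
    return rng.uniform(-np.pi, np.pi, size=(2 ** bits_per_block, n_bins))

def detect_symbol(y, codebook):
    """Return the index of the phase vector that correlates best with y."""
    n = len(y)
    spec_y = np.fft.rfft(y)
    peaks = []
    for p in codebook:
        ref = np.exp(1j * p)
        ref[0] = 0.0   # first and last bins are not used
        ref[-1] = 0.0
        # Circular cross-correlation of y with IFFT(ref).
        corr = np.fft.irfft(spec_y * np.conj(ref), n=n)
        peaks.append(corr.max())
    return int(np.argmax(peaks))
```

The number of correlations grows as 2^k per block, so the achievable number of bits per block is bounded by the processing power available at the decoder as well as by detection reliability.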
The basic technology of the inventive processing can be combined with features known from spread spectrum watermarking:
A further improvement can be achieved by considering not only the phase but also the amplitude of the audio signal. For example, in the described implementation, the psycho-acoustic module PSYA or PHLC determines that at a certain frequency bin a phase shift of 10 degrees is not audible. An improved psycho-acoustic module will determine that the 10 degree phase shift is inaudible only at the given current amplitude, but that a 15 degree phase shift would still be inaudible if the current amplitude were halved. In this case the amplitude value or values of the original spectrum would be halved and their corresponding phase values would be changed by 15 degrees.
Number | Date | Country | Kind |
---|---|---|---|
05090261 | Sep 2005 | EP | regional |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP2006/065973 | 9/4/2006 | WO | 00 | 3/14/2008 |

Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2007/031423 | 3/22/2007 | WO | A |

Number | Name | Date | Kind |
---|---|---|---|
6061793 | Tewfik | May 2000 | A |
6996521 | Iliev et al. | Feb 2006 | B2 |
7131007 | Johnston et al. | Oct 2006 | B1 |
20040170381 | Srinivasan | Sep 2004 | A1 |
20050033579 | Bocko et al. | Feb 2005 | A1 |
20050043830 | Lee et al. | Feb 2005 | A1 |
20060147048 | Breebaart et al. | Jul 2006 | A1 |
20070014428 | Kountchev et al. | Jan 2007 | A1 |
20080027729 | Herre et al. | Jan 2008 | A1 |

Number | Date | Country |
---|---|---|
9733391 | Sep 1997 | WO |

Number | Date | Country |
---|---|---|
20090076826 A1 | Mar 2009 | US |