The invention relates to a method and to an apparatus for detecting a watermark symbol in a section of a received version of a watermarked audio signal, wherein the received version of the watermarked audio signal can include noise and/or echoes.
Audio watermarking modifies an audio signal or track by embedding hidden information. If watermark embedding happens in the frequency domain, the frequency range for embedding is typically limited e.g. from 300 Hz to 10 kHz in view of perceptual transparency and for robustness against audio compression employing low-pass filtering. For audio signals sampled at 48 kHz or 44.1 kHz, downsampling by a factor of two decreases complexity without reducing robustness against common signal processing steps.
In EP 2175444 A1 and in WO 2011/141292 A1 statistical detectors are disclosed which improve the robustness of audio watermarking over an acoustic path, e.g. loudspeaker→microphone, enabling successful deployment of audio watermarking systems for e.g. second-screen applications. These statistical detectors use correlation peak amount values between a watermarked signal and a reference signal, and calculate corresponding false positive probabilities for watermark symbol detection.
For efficient implementation, the EP 2175444 A1 statistical detector uses circular correlation instead of normal correlation. The efficiency of the circular correlation is based on the Fast Fourier Transform (FFT) and the Inverse Fast Fourier Transform (IFFT). FFTs are carried out for the received watermarked signal and for the reference signals. After multiplication of one spectrum with the complex conjugate of the other spectrum, an IFFT is performed to obtain the circular correlation of the two signals. Even so, carrying out such correlation is computationally demanding.
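The FFT-based computation described above can be illustrated by the following minimal sketch. It is not the implementation of EP 2175444 A1: the function names, the plain radix-2 FFT and the short ±1 test sequence are assumptions chosen for illustration, and the signal length is assumed to be a power of two.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def circular_correlation_fft(received, reference):
    """Circular correlation of two real sequences via FFT, product, IFFT."""
    n = len(received)
    spec_rx = fft(received)
    spec_ref = fft(reference)
    # Multiply one spectrum with the complex conjugate of the other.
    cross = [a * b.conjugate() for a, b in zip(spec_rx, spec_ref)]
    # The inverse transform above is unnormalised, hence the division by n.
    return [v.real / n for v in fft(cross, inverse=True)]


def circular_correlation_direct(received, reference):
    """Reference implementation with O(n^2) operations, for comparison."""
    n = len(received)
    return [
        sum(received[(m + lag) % n] * reference[m] for m in range(n))
        for lag in range(n)
    ]


# Hypothetical reference sequence and a received copy delayed by 3 samples.
reference = [1, -1, 1, 1, -1, -1, 1, -1]
received = [reference[(i - 3) % len(reference)] for i in range(len(reference))]
correlation = circular_correlation_fft(received, reference)
peak_lag = max(range(len(correlation)), key=lambda k: correlation[k])
```

In this sketch the correlation peak appears at the lag that equals the circular delay of the received sequence, and the FFT-based result agrees with the direct summation up to rounding.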
In the watermark decoder processing in
A known statistical detector in conjunction with downsampling is illustrated in a simplified manner in
In
However, for watermarked audio signals or tracks transmitted over an acoustic path it was found that, without downsampling, the detection rate is considerably higher than the detection rate when including downsampling of the input signals. I.e., there is a trade-off between calculation complexity and detection robustness.
A problem to be solved by the invention is to achieve a detection robustness similar to that of a statistical detector that does not downsample prior to correlation, while achieving the reduced calculation complexity of a statistical detector that uses downsampling. This problem is solved by the method disclosed in claim 1. An apparatus that utilises this method is disclosed in claim 2.
According to the invention, in order to approximate the detection robustness of circular correlation of non-downsampled input signals, a temporal interpolation step is inserted between the circular correlation and the statistical detector. The downsampling reduces the number of correlation result peaks, but the temporal interpolation increases that number again, and thereby an improved watermark detection reliability is achieved. If the interpolation is implemented e.g. as a short-length FIR filter, the calculation complexity of the modified detector is still much lower than that of the detector operating without downsampling of the input values. The invention provides a better detection robustness/computational effort trade-off than a state-of-the-art detector without or with downsampling.
In principle, the inventive method is suited for detecting a watermark symbol in a section of a received version of a watermarked audio signal, wherein said received version of said watermarked audio signal can include noise and/or echoes and wherein watermark symbols were embedded in said audio signal by modifying sections of said audio signal in relation to at least two different reference data sequences, said method including the steps:
In principle the inventive apparatus is suited for detecting a watermark symbol in a section of a received version of a watermarked audio signal, wherein said received version of said watermarked audio signal can include noise and/or echoes and wherein watermark symbols were embedded in said audio signal by modifying sections of said audio signal in relation to at least two different reference data sequences, said apparatus including:
Advantageous additional embodiments of the invention are disclosed in the respective dependent claims.
Exemplary embodiments of the invention are described with reference to the accompanying drawings, which show in:
As mentioned above, the frequency range for embedding can be limited. In turn, only this frequency range is relevant for watermark detection. Consequently, during the multiplication step in the circular correlation calculation, multiplication is only necessary for the relevant frequency range, and thereby the output signal after circular correlation is also limited to the relevant frequency range.
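The restriction of the multiplication step to the relevant frequency range can be sketched as follows. The function names are hypothetical, the 300 Hz to 10 kHz band is the example range mentioned earlier in this description, and the mirrored bins are filled from the conjugate symmetry that holds for spectra of real-valued signals.

```python
def band_bins(n_fft, sample_rate_hz, f_lo_hz, f_hi_hz):
    """Indices k in 0..n_fft//2 whose bin frequency lies inside the band."""
    return [
        k
        for k in range(n_fft // 2 + 1)
        if f_lo_hz <= k * sample_rate_hz / n_fft <= f_hi_hz
    ]


def band_limited_cross_spectrum(spec_received, spec_reference, bins):
    """Cross spectrum computed only for the given bins; all others stay zero."""
    n = len(spec_received)
    cross = [0j] * n
    for k in bins:
        cross[k] = spec_received[k] * spec_reference[k].conjugate()
        # For real-valued signals bin n-k is the complex conjugate of bin k,
        # so the mirrored value costs no additional multiplication.
        if 0 < k < n - k:
            cross[n - k] = cross[k].conjugate()
    return cross


# Toy sizes for illustration: 16-point spectra at 48 kHz (3 kHz bin spacing).
bins = band_bins(16, 48000, 300, 10000)
flat = [1 + 0j] * 16
cross = band_limited_cross_spectrum(flat, flat, bins)
```

With these toy values only three complex multiplications are carried out instead of sixteen. An inverse transform of `cross` then yields a correlation output that contains only the relevant frequency range.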
Circular correlation values which are not available due to the temporal downsampling can at least partly be reconstructed by means of temporal interpolation, provided the downsampling does not introduce aliasing in the relevant frequency range. For example, if the received signals RWAS and the reference signals REFP are sampled at 48 kHz and the relevant frequency range is limited to 10 kHz, a downsampling factor of ‘2’ yields a sampling rate of 24 kHz with a Nyquist frequency of 12 kHz, which lies above 10 kHz, so that no spectral aliasing occurs in the output signal following circular correlation.
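The condition can be checked for other sampling rates and downsampling factors with a small helper. The function name is hypothetical, and the check assumes an ideally band-limited signal, i.e. it ignores the transition band of any practical filter.

```python
def downsampling_is_alias_free(sample_rate_hz, factor, f_max_hz):
    """True if content up to f_max_hz stays below the new Nyquist frequency."""
    nyquist_after_hz = sample_rate_hz / factor / 2
    return f_max_hz < nyquist_after_hz


# Example from the text: 48 kHz input, relevant range limited to 10 kHz.
ok_factor_2 = downsampling_is_alias_free(48000, 2, 10000)   # 12 kHz Nyquist
ok_factor_3 = downsampling_is_alias_free(48000, 3, 10000)   # 8 kHz Nyquist
ok_44k1 = downsampling_is_alias_free(44100, 2, 10000)       # 11.025 kHz Nyquist
```

Under this assumption a factor of two is admissible for both 48 kHz and 44.1 kHz input, whereas a factor of three at 48 kHz would fold part of the 10 kHz range back into the band.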
The passband of the frequency response of a corresponding temporal interpolator covers the frequency range used for embedding the watermark symbols, and a type of interpolation is used which recovers additional peak values temporally between the correlation result values.
Such a type of temporal interpolation is described in F. M. Gardner, “Interpolation in Digital Modems—Part I: Fundamentals”, IEEE Trans. of Commun., vol. 41, no. 3, March 1993, pp. 501-507, and in L. Erup, F. M. Gardner, R. A. Harris, “Interpolation in Digital Modems—Part II: Implementation and Performance”, IEEE Trans. of Commun., vol. 41, no. 6, June 1993, pp. 998-1008.
Therefore, according to the invention and as shown in
Such a 6-tap Lagrange interpolator is described in J. J. Wang, “Timing Recovery Techniques for Digital Recording Systems”, PhD thesis, National University of Singapore, 2002, pp. 139-140.
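A 6-tap Lagrange interpolator of this kind can be sketched as follows. The coefficients are not taken from the cited thesis but are derived here from the textbook Lagrange formula. It is further assumed that one new value is estimated halfway between each pair of neighbouring correlation values, which corresponds to a downsampling factor of two, and that indices wrap around because the circular correlation output is periodic. All function names are hypothetical.

```python
import math
from fractions import Fraction


def lagrange_taps(num_taps=6, mu=Fraction(1, 2)):
    """Lagrange weights for a point mu samples after the centre-left node."""
    first = -(num_taps // 2 - 1)              # nodes -2..3 for six taps
    nodes = range(first, first + num_taps)
    taps = []
    for i in nodes:
        weight = Fraction(1)
        for j in nodes:
            if j != i:
                weight *= (mu - j) / Fraction(i - j)
        taps.append(weight)
    return first, taps


def interpolate_circular(values, num_taps=6):
    """Insert one midpoint estimate after every value, doubling the length."""
    n = len(values)
    first, taps = lagrange_taps(num_taps)
    taps = [float(t) for t in taps]
    out = []
    for k in range(n):
        # Circular indexing: the correlation output is periodic in the lag.
        mid = sum(t * values[(k + first + i) % n] for i, t in enumerate(taps))
        out.extend([values[k], mid])
    return out


# Demo: one period of a cosine sampled at 32 points, then doubled to 64.
samples = [math.cos(2 * math.pi * k / 32) for k in range(32)]
doubled = interpolate_circular(samples)
```

For the midpoint the six weights evaluate to 3/256, -25/256, 150/256, 150/256, -25/256 and 3/256. The original values are passed through unchanged, so only the additional values require multiplications.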
On the one hand, because only correlation result value peaks are used in the statistical detector 45, interpolation in step/stage 44 may be necessary only for signal portions near peak amount values in the output signal of the circular correlation step/stage 43. This will further reduce the computational complexity.
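Restricting the interpolation to the neighbourhood of peaks can be sketched as follows. The selection rule (the strongest magnitudes and a fixed radius around them), the function name and the Gaussian-shaped test signal are assumptions for illustration. The weights are the midpoint weights of a 6-tap Lagrange interpolator.

```python
import math

# Midpoint weights of a 6-tap Lagrange interpolator (nodes -2..3, point 0.5).
MIDPOINT_TAPS = [3 / 256, -25 / 256, 150 / 256, 150 / 256, -25 / 256, 3 / 256]


def interpolate_near_peaks(corr, num_peaks=3, radius=1):
    """Estimate midpoints only next to the strongest correlation magnitudes.

    Returns a dict mapping index m to the estimate halfway between
    corr[m] and corr[m + 1], with circular indexing.
    """
    n = len(corr)
    strongest = sorted(range(n), key=lambda k: abs(corr[k]), reverse=True)
    wanted = set()
    for peak in strongest[:num_peaks]:
        for offset in range(-radius, radius):
            wanted.add((peak + offset) % n)
    return {
        m: sum(t * corr[(m - 2 + i) % n] for i, t in enumerate(MIDPOINT_TAPS))
        for m in sorted(wanted)
    }


# Demo: a smooth peak whose true maximum lies between the samples 5 and 6.
corr = [math.exp(-((k - 5.4) ** 2) / 8) for k in range(16)]
midpoints = interpolate_near_peaks(corr, num_peaks=1)
```

In the demo only two additional values are computed instead of sixteen, and the estimate between samples 5 and 6 is larger than the largest sampled value, i.e. closer to the true peak.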
On the other hand, the detection robustness can be further improved by applying temporal interpolation successively, because this increases the number of correlation result peak values. Although each additional interpolation increases the computational complexity, circular correlation of downsampled input signals followed by e.g. two successive interpolations can still require less computation in total than circular correlation of non-downsampled input signals. This offers the possibility to further adjust the detection robustness/computational complexity trade-off based on the available computational power.
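The trade-off can be made tangible with a crude operation count. The cost model below is an assumption and not a measurement: it charges n·log2(n) units per transform of length n, n units for the spectral multiplication and six units per interpolated value. It ignores constant factors as well as the savings from band-limited multiplication and from interpolating only near peaks.

```python
import math


def correlation_cost(n):
    """Two FFTs, one IFFT and one spectral product of length n (model units)."""
    return 3 * n * math.log2(n) + n


def interpolation_cost(n_in, taps=6):
    """One factor-two stage: 'taps' multiplications per newly created value."""
    return taps * n_in


def detector_front_end_cost(n, downsample=1, stages=0, taps=6):
    """Correlation on n/downsample values plus 'stages' interpolation stages."""
    length = n // downsample
    total = correlation_cost(length)
    for _ in range(stages):
        total += interpolation_cost(length, taps)
        length *= 2                      # each stage doubles the value count
    return total


# Hypothetical section length of 4096 samples.
cost_full = detector_front_end_cost(4096)
cost_down = detector_front_end_cost(4096, downsample=2)
cost_down_1 = detector_front_end_cost(4096, downsample=2, stages=1)
cost_down_2 = detector_front_end_cost(4096, downsample=2, stages=2)
```

Under this model the cost grows with every interpolation stage but stays below that of the correlation without downsampling, which is the ordering stated above. The actual figures depend on the implementation.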
Instead of watermarked audio input signals, the invention can be used in a corresponding manner for watermarked video input signals.
After a current section of the input signal is checked, the processing described is continued with the following section of the input signal.
The invention may be applied to any correlation-based watermark detection if input signal downsampling is applied.
The inventive processing can be carried out by a single processor or electronic circuit, or by several processors or electronic circuits operating in parallel and/or operating on different parts of the inventive processing.
Number | Date | Country | Kind
---|---|---|---
13306138.2 | Aug 2013 | EP | regional

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2014/066063 | 7/25/2014 | WO | 00