This invention relates to a method for embedding and detecting a watermark in a digital audio signal.
It is state of the art to use watermarks in digital rights management for digital media such as video or audio. A watermark is digital information that is hidden in the media or host data such that it is ideally imperceptible but not removable. Hence, it can be used to attach information about the origin, owner, and status of the media. This information can then be used, e.g., to trace back the origin of an illegal copy.
The most commonly used technique for embedding a watermark into a signal is based on an idea from spread-spectrum radio communications. Here, the embedded watermark is created by adding a pseudorandom noise sequence with low amplitude to the original signal. This added sequence can then be detected at a later stage with, e.g., a correlation receiver or a matched filter. If the parameters of the added sequence, such as the amplitude or the sequence length, are chosen appropriately, the probability of detection is very high. If several such watermarks are embedded consecutively, several bits of information can be conveyed. In general, the higher the number of samples used to embed one bit and the higher the amplitude of the added sequence, the more robust the watermark is against attacks. On the other hand, the watermark becomes audible when the amplitude is too high, and the amount of embedded information is reduced when the number of samples per bit increases. Hence, there exists a trade-off between robustness, watermark data-rate, and quality.
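By way of illustration only, the prior-art spread-spectrum principle described above may be sketched as follows. The function names, the use of a seed as a shared key, the encoding of a bit as the sign of the added sequence, and the amplitude value are assumptions made for this sketch and are not taken from any particular prior-art system.

```python
import random


def pn_sequence(length, seed):
    """Pseudorandom +/-1 sequence; the seed plays the role of a shared key (assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed_bit(host, bit, seed, amplitude=0.05):
    """Add (bit 1) or subtract (bit 0) a low-amplitude pseudorandom sequence."""
    sign = 1.0 if bit else -1.0
    pn = pn_sequence(len(host), seed)
    return [h + sign * amplitude * p for h, p in zip(host, pn)]


def detect_bit(received, seed):
    """Correlation receiver: the sign of the correlation value gives the bit."""
    pn = pn_sequence(len(received), seed)
    corr = sum(r * p for r, p in zip(received, pn)) / len(received)
    return (1 if corr > 0.0 else 0), corr
```

In this sketch, a larger `amplitude` or a longer host block raises the correlation value relative to the interference from the host signal, which mirrors the trade-off between robustness, data-rate, and quality stated above. The detector regenerates the sequence from the first sample of the block, so it presupposes exact alignment with the embedder.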
Watermarking techniques based on the spread-spectrum approach require rather strict synchronization. If such synchronization is not maintained, the embedded information can no longer be detected. Therefore, synchronization is often considered a prerequisite in prior art solutions.
But exactly this weakness is exploited by so-called synchronization attacks, which attempt to break the correlation and make the recovery of the watermark impossible or infeasible. Such attacks can be geometric manipulations, such as zoom, rotation, shearing, cropping, and re-sampling. For audio, known manipulations are the insertion or deletion of single audio samples (e.g. a jitter attack), sample rate conversion (e.g. linear time-scaling), the extension or shortening of speech pauses, and pitch-shifting. Since a typical watermark detector has to know the exact position of the embedded data, these attacks are very effective and thus a major problem in the practical application of watermarks in audio signals.
It is therefore an object of the present invention to overcome the above-mentioned problems and to provide a method for embedding a watermark in a digital audio signal, where the digital audio signal includes several pitch periods and is divided into groups of N samples, the method comprising the steps of: selecting from one of the groups of N samples an input-segment with an input-length; dividing the input-segment into at least two sub-segments, each sub-segment having a length of at least one pitch period; and creating a modified-segment with an output-length, wherein at least one of the sub-segments is time-shifted such that in an overlapping zone a correlation value of the two sub-segments is a maximum, and wherein the signal in the overlapping zone is a weighted average of the two sub-segments in said overlapping zone.
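By way of illustration only, the embedding steps may be sketched as follows for the case of two sub-segments, where the second sub-segment is shifted back over the end of the first. The split at the midpoint, the search window around an assumed known pitch period, the linear cross-fade weights, and all names are assumptions made for this sketch; the method as such does not prescribe them.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def embed_by_overlap(segment, pitch, search=4):
    """Shorten `segment` by roughly one pitch period (illustrative sketch).

    The input-segment is split into two sub-segments. The second one is
    time-shifted by k samples so that it overlaps the end of the first one.
    k is searched near `pitch` so that the correlation in the overlapping
    zone is a maximum; the zone itself is a weighted average of both parts.
    """
    split = len(segment) // 2               # assumed split point
    sub1, sub2 = segment[:split], segment[split:]
    if min(len(sub1), len(sub2)) < pitch + search:
        raise ValueError("each sub-segment must span at least one pitch period")

    candidates = range(max(1, pitch - search), pitch + search + 1)
    best_k = max(candidates,
                 key=lambda k: normalized_correlation(sub1[-k:], sub2[:k]))

    overlap = []
    for i in range(best_k):
        w = (i + 1) / (best_k + 1)          # weight moves from sub1 to sub2
        overlap.append((1.0 - w) * sub1[split - best_k + i] + w * sub2[i])

    modified = sub1[:split - best_k] + overlap + sub2[best_k:]
    return modified, best_k
```

For a strictly periodic input with a period of 40 samples, this sketch selects a shift of 40 samples, so that the output-length is the input-length minus one pitch period and the modified-segment continues smoothly into the following signal.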
Further, there is provided a method for detecting a watermark in a received digital audio signal, where the received digital audio signal may include at least one modified-segment, which is modified according to the above embedding method, the method comprising the steps of: receiving for said at least one modified-segment a-priori information about the input-segment, the modified-segment, extension-segments, and a start point of that modified-segment; generating a first template-signal, which is the input-segment with the extension-segments before and after the input-segment; generating a second template-signal, which is the modified-segment with the extension-segments before and after the modified-segment; creating a first and a second correlation value by comparing the first and the second template-signal, respectively, with the received digital audio signal; and assuming that a watermark is included if the second correlation value is higher than the first correlation value.
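By way of illustration only, the detection steps may be sketched as follows. The use of a normalized cross-correlation as the correlation value, the alignment of both templates at the start of the preceding extension-segment, and all names are assumptions made for this sketch.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation over the common length of a and b."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def detect_watermark(received, start, input_seg, modified_seg,
                     ext_before, ext_after):
    """Compare two template-signals with the received signal.

    `start` is the known start point of the (possibly) modified-segment in
    `received`; the templates begin len(ext_before) samples earlier.
    """
    window_start = start - len(ext_before)
    if window_start < 0:
        raise ValueError("start point lies too close to the signal start")

    template_1 = ext_before + input_seg + ext_after      # unmarked hypothesis
    template_2 = ext_before + modified_seg + ext_after   # marked hypothesis

    c1 = normalized_correlation(
        template_1, received[window_start:window_start + len(template_1)])
    c2 = normalized_correlation(
        template_2, received[window_start:window_start + len(template_2)])
    return c2 > c1, c1, c2
```

Since only the samples between the start of the preceding extension-segment and the end of the following extension-segment enter the two correlation values, changes of the received signal outside that window do not alter the decision in this sketch.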
With this method, an embedded watermark is more resistant against synchronization attacks, because the watermark is generated in the same manner as such an attack. A synchronization attack of any kind that is applied before or after the extension-segments does not degrade the performance of the proposed detection method. Although any known method for detecting a watermark benefits from a-priori knowledge of the original signal, the proposed method turns this prerequisite directly into an advantage, namely a higher robustness against synchronization attacks.
If the time-shift of said at least one of the sub-segments is equal to a pitch period, the transition between the modified-segment and the neighboring signal-segments is smooth and thus the embedded watermark is less audible.
A further time-shift of said at least one of the sub-segments, which is equal to a multiple of the pitch period, causes a larger difference between the input-length of the input-segment and the output-length of the modified-segment. Thus the subsequent detection of the embedded watermark in a digital audio signal becomes easier, because the input-segment and the modified-segment are more distinguishable.
If the input-segment is selected from one of the groups of N samples in which consecutive pitch periods are similar, the embedding is less audible. In that case, the resulting signal in the overlapping zone, which is a weighted average of the overlapping sub-segments, differs only slightly from the pitch periods before and after the overlapping zone, so that the modification is less audible.
Selecting the input-segment from the middle of one of the groups of N samples, or depending on a pre-defined secret key, means that the start point of the modified-segment is known, which simplifies the subsequent detection method.
If the principle of the present embedding method is repeated for several input-segments, where the output-length of each of the respective modified-segments is different, a higher modulation level can be achieved and thus more information can be included in the modified digital audio signal. In that case, a number of different template-signals corresponding to the number of different modified-segments has to be generated for the detection method.
If the length of the extension-segments is in the range from 10 ms to 40 ms, the audio signal can be assumed to be approximately stationary within that range. Hence, the template-signals remain distinguishable and the detection is sufficiently robust.
Further features and advantages of the present invention will be apparent to those skilled in the art from further dependent claims and the following detailed description, taken together with the accompanying figures, where:
In the time domain, digital audio signals are divided into groups of N samples. This is already known to those skilled in the art and thus not described in more detail. The embedding and detecting method according to the present invention applies to parts of such groups of N samples.
The input-segment s_in(t), with a length L_in, is divided into two sub-segments s_sub,1(t) and s_sub,2(t), with lengths L_sub,1 and L_sub,2 respectively. Each of the sub-segments, s_sub,1(t) and s_sub,2(t), includes at least one complete pitch period Pi. In the shown embodiment, the sub-segment s_sub,2(t) directly follows the sub-segment s_sub,1(t). As shown in
Now, with reference to
The main aim of the present invention, which has been described above on the basis of different embodiments, is to achieve a watermarking method that has a higher resistance against synchronization attacks. Moreover, the proposed method also tolerates added noise and other signal processing operations, such as filtering, which do not affect the synchronization. At least the same robustness as for spread-spectrum watermarks is expected. Furthermore, compression techniques should not be problematic either. This increased robustness is possible because all these attacks usually do not change the number of pitch periods in the digital audio signal in which the proposed watermark is embedded.
Furthermore, a simple jitter attack that inserts or deletes single samples is not expected to be problematic. Even a slight shift still yields a high cross-correlation between the two waveforms, as long as the number of inserted or deleted samples is not too high. If that number is too high, the proposed detection method can be repeated using different lengths of the modified-segment.
Considering pitch-shifting attacks, which are usually the most problematic attacks for watermarks, any scaling and shifting that is applied outside the template region should not affect the detection performance. If the input-segment is positioned at t0 and no modifications are made to any samples within the range (t0 − ΔL−) < t < (t0 + ΔL+ + L_OUT), then the detection performance will not be affected. Only if an attack performs an additional pitch-shift within the template region may the correlation detector be misled and fail to detect the watermark correctly. However, if the lengths ΔL− and ΔL+ of the extension-segments ΔS−(t) and ΔS+(t) can be kept reasonably short, e.g. corresponding to 40 ms each, then a pitch-shifting attack has to be applied every 80 ms, i.e. every ΔL− + ΔL+, to remove the watermark with a high probability.
Hence, the scheme can be designed to embed one watermark bit every N samples and to provide robustness as long as additional pitch-shifts are inserted less frequently than every (ΔL−)+(ΔL+) samples. Assuming that (ΔL−)+(ΔL+) << N, the scheme can be designed such that the embedding is imperceptible but the attempt to remove the watermark results in audible distortions.
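By way of illustration only, the template region and the resulting attack interval may be computed as follows. The sampling rate of 44100 Hz, the position t0 = 10000, the output-length of 400 samples, and all names are assumptions made for this worked example; only the extension length of 40 ms is taken from the description above.

```python
def ms_to_samples(duration_ms, sample_rate_hz):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(duration_ms * sample_rate_hz / 1000)


def template_region(t0, ext_before, ext_after, out_len):
    """Bounds of the range (t0 - dL-) < t < (t0 + dL+ + L_OUT), in samples."""
    return t0 - ext_before, t0 + ext_after + out_len


SAMPLE_RATE_HZ = 44100                       # assumed sampling rate
ext = ms_to_samples(40, SAMPLE_RATE_HZ)      # 40 ms extension-segment
lower, upper = template_region(t0=10000, ext_before=ext,
                               ext_after=ext, out_len=400)
attack_interval = ext + ext                  # dL- + dL+, i.e. 80 ms in samples

print(ext, lower, upper, attack_interval)    # 1764 8236 12164 3528
```

Under these assumed values, the two extension-segments together span 3528 samples, which is small compared with a group of, e.g., several tens of thousands of samples and is thus consistent with the assumption (ΔL−)+(ΔL+) << N.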
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP03/01778 | 2/21/2003 | WO | | 12/18/2006 |