1. Field of the Invention
The present invention relates to an audio watermarking apparatus and method.
2. Description of the Prior Art
The Digital Cinema Initiative (DCI) is a known project which aims to provide an open standard for digital cinema. The standard covers many aspects of digital cinema including implementing security measures to hinder unauthorised copying, editing and playback of cinematic content.
One of the security requirements used in the DCI is the insertion of a watermark in the audio data of the content during projection. The audio watermark includes a time stamp and other data, for example information indicating the identity of the system on which the cinematic content is being reproduced. In the same way that a visually obvious watermark inserted into the video data is undesirable, an audio watermark which is audible is also undesirable. Therefore the DCI standard sets out strict requirements for the audio watermark, one of which is that the audio watermark must be inaudible in critical listening A/B tests.
Some adaptive watermarking systems can struggle to successfully mask the presence of a watermark in an audio signal if the audio signal contains prominent frequency components over a narrow range of frequencies. This is caused by inevitable signal spreading within the system due to non-ideal filtering. Such watermarking systems may not meet the requirements set out in the DCI standard for the audibility of audio watermarks. Increasing the number and resolution of the audio filters present within the watermarking system could potentially address this problem. However, this would increase the cost and complexity and may in itself introduce unwanted filter artefacts into the embedded watermark. This problem is addressed by embodiments of the invention.
According to the present invention there is provided an apparatus for embedding a watermark in an audio signal, the apparatus comprising an input operable to receive the audio signal; a watermark adapting unit operable to receive the watermark from a watermark generating unit and adapt the profile of the frequency spectrum of the watermark to correspond to the profile of the frequency spectrum of the input audio signal, and watermark embedding means operable to embed the adapted watermark in the audio signal, the watermark embedding means including a watermark gain amplifier operable to apply a gain to the watermark before the watermark is embedded in the audio signal in accordance with a gain signal generated by a watermark gain value generator, wherein the watermark gain value generator is operable to adjust the gain applied to the watermark, the gain being determined in accordance with the presence of component of at least one peak having an amplitude above a threshold.
The present invention identifies problematic parts of the audio signal which are likely to cause signal spreading outside of the masking limits of the human auditory system, and thus increase the audibility of the watermark, and in response adjusts the watermark gain for the duration of the problematic parts. Thus, in parts of the audio signal where a conventional watermarking system would struggle to mask an embedded watermark, the apparatus and method according to the present invention reduce the watermark's audibility. As a further advantage, the nature of cinematic audio content is such that the occurrence of prominent frequency components over a narrow range of frequencies is usually quite rare. Therefore any reduction in watermarking robustness due to the low level of the watermark is minimised, as the reduction in the watermark level is only temporary.
The frequency range of the or each peak may be such that the peak would cause spreading in the input audio signal such that the watermark in the watermark embedded audio signal is audible to the human ear and if such a peak or peaks are detected, the watermark gain value generator may be operable to modify the gain signal such that the gain applied to the watermark by the watermark gain amplifier is reduced.
The apparatus may further comprise a plurality of envelope filters, each filter being operable to receive the input audio signal and to output an envelope signal corresponding to the distribution of energy across a subset of the frequency spectrum of the input audio signal, each subset being different for each filter.
The gain signal may be determined by a predetermined gain curve, the gain curve defining the gain signal in dependence of the frequency at which the amplitude of the component peak is largest.
The transition from a first value of gain signal to a second value of gain signal may be made incrementally, each increment being of a predetermined value and a predetermined length of time in duration.
The increments may be one of either a stepped increment or a gradational increment.
The watermark gain value generator may further be operable to determine the gain in accordance with a comparison between the energy contained in the peak or peaks above the threshold and the energy in the input audio signal.
According to a further aspect, there is provided a digital cinema projector comprising a decoder for decoding audio data from a data source; a watermarking apparatus according to any embodiment of the invention for inserting a watermark into the audio data; and a unit for outputting the watermarked audio data.
According to another aspect, there is provided a method of embedding a watermark in an audio signal, the method comprising: receiving the audio signal; receiving the watermark from a watermark generating unit and adapting the profile of the frequency spectrum of the watermark to correspond to the profile of the frequency spectrum of the input audio signal, and embedding the adapted watermark in the audio signal, wherein, before the watermark is embedded in the audio signal, a gain is applied to the watermark in accordance with a gain signal, wherein the gain is determined in accordance with the presence of component of at least one peak having an amplitude above a threshold.
The frequency range of the or each peak may be such that the peak would cause spreading in the input audio signal such that the watermark in the watermark embedded audio signal is audible to the human ear and if such a peak or peaks are detected, the gain signal is modified such that the gain applied to the watermark is reduced.
A plurality of envelope filters may be provided, each filter being operable to receive the input audio signal and to output an envelope signal corresponding to the distribution of energy across a subset of the frequency spectrum of the input audio signal, each subset being different for each filter.
The gain signal may be determined by a predetermined gain curve, the gain curve defining the gain signal in dependence of the frequency at which the amplitude of the component peak is largest.
The transition from a first value of gain signal to a second value of gain signal may be made incrementally, each increment being of a predetermined value and a predetermined length of time in duration.
The increments may be one of either a stepped increment or a gradational increment.
The gain may be determined in accordance with a comparison between the energy contained in the peak or peaks above the threshold and the energy in the input audio signal.
Various further aspects and features of the invention are defined in the appended claims.
The above and other features and advantages of the invention will be apparent from the following detailed description of illustrative embodiments which is to be read in connection with the accompanying drawings and in which:
In the watermarking unit shown in
A watermark generator 26 generates a watermark signal in the frequency domain, which is then transformed into the time domain by an inverse FFT unit 216 and input to a second band filter 27. In an illustrative example the watermark is a pseudo-random Gaussian stream created in the fast Fourier transform (FFT) domain with a block size of 2048 at quarter sampling rate (i.e. a quarter of the rate at which the audio is sampled), and is noise-like in sound. In one embodiment, the watermark generator receives an FFT of the audio input block; the FFT of the audio input block provides the phase values, the watermark provides the magnitude values, and the combination is input into the inverse FFT unit 216. The result can then be added to the input audio block in the time domain, thus reducing any potential loss in quality of the audio caused by putting the audio input through a forward FFT and then an inverse FFT.
The second band filter 27 operates in a similar way to the first band filter 21: it divides the watermark signal into a number of band blocks and outputs a corresponding number of band divided watermark blocks. The frequency bands into which the watermark signal is divided correspond to the frequency bands into which the input audio block is divided. Next, a number of multipliers 28, 29, 210, 211 multiply the output from each envelope follower filter 22, 23, 24, 25 with the corresponding band divided part of the watermark signal output from the second band filter 27. The outputs of the multipliers 28, 29, 210, 211 are then added together by a first combiner 212, which thus forms the complete adapted watermark. The output of the first combiner 212 is then scaled by a gain amplifier 215 and combined with the input audio block of the original audio data by a second combiner 213.
Typically, all the operations occur in the time domain. Thus the watermarked version of the original audio data unit is formed.
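By way of illustration only, the multiply, sum, gain and combine stages described above can be sketched as follows. This is a minimal sketch, assuming that the band divided watermark blocks and the envelope follower outputs are already available as equal-length lists of time-domain samples; the function name `adapt_and_embed` and its parameters are hypothetical and do not appear in the embodiment.

```python
def adapt_and_embed(audio, watermark_bands, envelopes, gain):
    """Shape a band divided watermark by per-band envelopes and add it to the audio.

    audio:           time-domain samples of the input audio block
    watermark_bands: one list of samples per band (assumed output of the band filter)
    envelopes:       one list of samples per band (assumed envelope follower output)
    gain:            scalar watermark gain, assumed to lie between 0 and 1
    """
    if len(watermark_bands) != len(envelopes):
        raise ValueError("need exactly one envelope per watermark band")
    n = len(audio)
    adapted = [0.0] * n
    for band, env in zip(watermark_bands, envelopes):
        if len(band) != n or len(env) != n:
            raise ValueError("all blocks must have the same length as the audio")
        # Multiplier stage: scale each watermark band by its envelope,
        # then accumulate (the role of the first combiner).
        for i in range(n):
            adapted[i] += band[i] * env[i]
    # Gain stage followed by the second combiner.
    return [a + gain * w for a, w in zip(audio, adapted)]
```

In this sketch a gain of zero leaves the audio block unchanged, which corresponds to the watermark being fully suppressed for that block.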
The multiplication of each band divided block of the watermark signal with the output of the corresponding envelope filtered band of the input audio block has the effect of reducing the perceptibility of the watermark when it is combined with the original audio data. This is illustrated in
The adaptation of the watermark works well for most audio signals, particularly audio signals comprising part of a cinematic audio track. However, the system shown in
This problem could be addressed by using a greater number of narrower envelope follower filters to mitigate the spreading. However, this would require more processor intensive filtering and could also introduce unwanted filter artefacts into the output of the envelope follower filters. Instead, in accordance with embodiments of the present invention, a problematic stimulus, such as a high-level, narrow-band signal, is detected and the overall gain applied to the watermark is subsequently reduced for the duration of that stimulus to a level whereby the watermark is imperceptible.
The following describes the analysis which is performed by the gain value generator 51 on the input audio block currently being watermarked.
The first step in the process is to acquire the information from the FFT version of the input audio block to determine if the source data is likely to produce unwanted spreading in the envelope follower filter. The gain value generator 51 includes a gate which is used to remove all but the main peaks in the FFT block. This concept is illustrated in
In one embodiment the audio signal comprises a 2048-sample block of FFT data at a sampling rate of a quarter of that at which the audio signal is sampled, and the gate reduces to zero any frequency with an amplitude of less than five times the mean of the whole FFT block. In addition, a lower limit (for example approximately −40 dB) is applied to the mean, whereby if the mean drops below this value the entire block is reduced to zero to avoid gain reduction caused by, for example, alias components introduced during the down sampling. After the gating, all the significant narrow band frequency components of the audio signal are revealed as discernible peaks. The peaks of the gated spectrum 63 are then analysed. The analysis includes the collection of the following values:
From this data the energy of the two largest peaks present in the audio data can be calculated along with their centre locations. In some embodiments, if the peak energy of the largest peak is more than 9 dB greater than the peak energy of the second largest peak, then the second largest peak is reduced to zero. After this the remaining spectral energy can be calculated as the sum of peak energy values in the analysis data minus the two largest peaks (after the second largest peak has been adjusted as described above).
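The gating and peak analysis described above might be sketched as follows. This is an illustrative sketch under several assumptions that are not stated in the embodiment: the spectrum is given as a list of non-negative magnitudes, the −40 dB limit is taken relative to a full-scale amplitude of 1.0, a peak is taken to be a contiguous run of non-zero bins in the gated spectrum, peak energy is the sum of squared magnitudes in that run, and the 9 dB comparison is made on the energy ratio. The function names are hypothetical.

```python
import math

def gate_spectrum(magnitudes, factor=5.0, floor_db=-40.0):
    """Zero every bin below factor * mean; zero the whole block if the mean is too low."""
    if not magnitudes:
        return []
    mean = sum(magnitudes) / len(magnitudes)
    # Assumption: the lower limit is an amplitude relative to full scale 1.0.
    if mean < 10.0 ** (floor_db / 20.0):
        return [0.0] * len(magnitudes)
    threshold = factor * mean
    return [m if m >= threshold else 0.0 for m in magnitudes]

def find_peaks(gated):
    """Return (energy, centre_bin) for each contiguous run of non-zero bins."""
    peaks, start = [], None
    for i, m in enumerate(gated + [0.0]):  # trailing zero closes a run at the end
        if m > 0.0 and start is None:
            start = i
        elif m == 0.0 and start is not None:
            run = gated[start:i]
            energy = sum(x * x for x in run)
            centre = start + run.index(max(run))  # bin holding the largest magnitude
            peaks.append((energy, centre))
            start = None
    return peaks

def analyse_peaks(peaks, drop_db=9.0):
    """Return (largest, second, remaining_energy) after applying the 9 dB rule."""
    ordered = sorted(peaks, reverse=True)
    largest = ordered[0] if ordered else (0.0, None)
    second = ordered[1] if len(ordered) > 1 else (0.0, None)
    # Reduce the second peak to zero if the largest exceeds it by more than drop_db.
    if second[0] > 0.0 and 10.0 * math.log10(largest[0] / second[0]) > drop_db:
        second = (0.0, second[1])
    # One reading of the text: subtract the largest peak and the adjusted second peak.
    remaining = sum(e for e, _ in peaks) - largest[0] - second[0]
    return largest, second, remaining
```

Under this reading, when the second largest peak is reduced to zero its original energy is counted as part of the remaining spectral energy.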
To determine whether the gain value generator 51 is to apply a gain reduction to the watermark, the peak data is analysed to determine if it satisfies further criteria. For example if one or more of the following conditions are met, a gain reduction is applied to the watermark:
In other words, it is possible to analyse the energy distribution of the peaks above the threshold and compare this value with the energy of the input audio signal. As a result of this comparison, the gain of the watermark is adjusted.
If none of the aforementioned criteria have been met, in other words it is determined that there is no need to reduce the level of the watermark, then the gain value generator 51 sets the gain value to unity. However, the gain value may not instantly be set to unity; rather, it is increased as per a maximum transition rate discussed below.
Assuming the previously mentioned test criteria have determined a gain reduction is necessary, the next step is to determine the amount by which the watermark will be reduced by the gain amplifier 215. The gain reduction is calculated based on a predetermined gain reduction curve. As will be understood, the human auditory system (HAS) is able to detect certain frequencies better than others. Therefore the gain reduction curve may be derived empirically, for example by conducting listening tests to determine the threshold of watermark audibility at a number of fixed frequencies. The gain reduction for frequencies between the fixed frequencies can be identified using linear interpolation.
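The linear interpolation between fixed frequencies can be illustrated with a short sketch. The curve is assumed here to be a list of (frequency, gain) pairs sorted by increasing frequency; the function name `gain_from_curve`, the clamping to the end values outside the measured range, and any curve values used with it are assumptions for illustration, not values from listening tests.

```python
def gain_from_curve(freq_hz, curve):
    """Linearly interpolate a gain value from a curve of (frequency_hz, gain) points.

    curve must hold at least one point and be sorted by increasing frequency.
    Outside the measured range the nearest end value is used (an assumption).
    """
    if freq_hz <= curve[0][0]:
        return curve[0][1]
    for (f0, g0), (f1, g1) in zip(curve, curve[1:]):
        if freq_hz <= f1:
            # Fraction of the way from f0 to f1.
            t = (freq_hz - f0) / (f1 - f0)
            return g0 + t * (g1 - g0)
    return curve[-1][1]
```

The frequency passed in would be that of the dominant peak found during the analysis, so that the gain follows the audibility threshold measured nearest to that frequency.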
The gain value is calculated once per FFT block. In some embodiments a maximum transition rate can be set which limits the change of the gain on a block by block basis. For example, a maximum gain transition rate of 0.11 per block may be set (the gain value produced by the gain value generator ranging from 0 to 1). As will be appreciated, it may take multiple blocks to reach the new gain value. In addition, the gain value calculated for the latest block will override any gain value established for a previous block.
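The block by block rate limiting can be sketched as below. This is a minimal sketch, assuming the gain is a single scalar updated once per block; the function name `step_gain` is hypothetical, and the default step of 0.11 is the example value given above.

```python
def step_gain(current, target, max_step=0.11):
    """Move the gain towards target by at most max_step for one block."""
    delta = target - current
    if delta > max_step:
        delta = max_step
    elif delta < -max_step:
        delta = -max_step
    return current + delta
```

Calling `step_gain` once per block with the most recently calculated target reproduces the override behaviour: a new target simply replaces the old one on the next call. With the default step, moving from a gain of 1.0 to a target of 0.5 takes five blocks.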
As the gain value output by the gain value generator 51 is calculated on a block by block basis, this means that the change in gain may comprise a series of discrete stepped values. This is shown in
The smoothing shown in
As explained above, in order to realise the smoothing interpolation patterns in
Various modifications may be made to the embodiments hereinbefore described. Although embodiments of the invention have been described in terms of a watermarking unit and a pipeline architecture, other implementations are also envisaged. For example the watermarking process could be executed on a computer. The computer could be arranged to implement the present invention by being programmed by a computer program stored on a storage medium, the storage medium containing instructions for carrying out the invention on the computer.
Furthermore, the present invention is not necessarily restricted to use within the context of digital cinema. The invention could be used in any suitable application in which there is a requirement to insert a watermark in audio content.
Number | Date | Country | Kind |
---|---|---|---|
0815889.1 | Sep 2008 | GB | national |