Preferred embodiments of the present invention will be detailed subsequently referring to the appended drawings, in which:
Subsequently, a scheme for embedding a watermark into an audio signal will be described referring to
Embedding the watermark according to the scheme of
Internally, the embedder 10 includes windowing means 18 and a first filter bank 20 which are connected in series after the input 12 and are responsible for transferring the audio signal at the input 12 from the time domain 22 to the time/frequency domain 24 by a block-by-block processing. What follows after the output of the filter bank 20 is magnitude/phase detection means 26 to divide the time/frequency domain representation of the audio signal into magnitude and phase. A second filter bank 28 is connected to the detection means 26 to obtain the magnitude portion of the time/frequency domain representation, and transfers the magnitude portion into the frequency/modulation frequency domain 30 to generate a frequency/modulation frequency representation of the audio signal 12 in this manner. Blocks 18, 20, 26, 28 thus represent an analysis part of the embedder 10 achieving a transfer of the audio signal to the frequency/modulation frequency representation.
Watermark embedding means 32 is connected to the second filter bank 28 to receive the frequency/modulation frequency representation of the audio signal 12 from it. Another input of the watermark embedding means 32 is connected to the input 14 of the embedder 10. The watermark embedding means 32 generates a modified frequency/modulation frequency representation.
An output of the watermark embedding means 32 is connected to an input of a filter bank 34 inverse to the second filter bank 28, which is responsible for re-transfer to the time/frequency domain 24. Phase processing means 36 is connected to the detection means 26 to obtain the phase portion of the time/frequency domain representation 24 of the audio signal and to pass it on in a manipulated form, as will be described below, to recombining means 38 which is additionally connected to an output of the inverse filter bank 34 to obtain the modified magnitude portion of the time/frequency representation of the audio signal. The recombining means 38 unites the phase portion modified by the phase processing 36 and the magnitude portion of the time/frequency domain representation of the audio signal modified by the watermark and outputs the result, i.e. the time/frequency representation of the audio signal provided with a watermark, to a filter bank 40 inverse to the first filter bank 20. Windowing means 42 is connected between the output of the inverse filter bank 40 and the output 16. The part of the components 34, 38, 40, 42 may be considered to be the synthesis part of the embedder 10 since it is responsible for generating the audio signal provided with a watermark in the time representation from the modified frequency/modulation frequency representation.
The setup of the embedder 10 having been described above, its mode of functioning will be described below.
Embedding starts with the transfer of the audio signal at the input 12 from the time representation to the time/frequency representation by the means 18 and 20, wherein it is assumed that the audio input signal at the input 12 is present in a form sampled at a predetermined sampling frequency, i.e. as a sequence of samples or audio values. If the audio signal is not yet in such a sampled form, a corresponding A/D converter may be used here as sampling means.
The windowing means 18 receives the audio signal and extracts from it a sequence of blocks of audio values. For this, the windowing means 18 groups a predetermined number of successive audio values of the audio signal at the input 12 into time blocks, each representing a time window of the audio signal 12, and multiplies, or windows, each time block by a window or weighting function, such as, for example, a sine window, a KBD window or the like. This process is referred to as windowing and is exemplarily performed such that the individual time blocks refer to time sections of the audio signal that overlap one another, such as, for example, by one half, so that each audio value is allocated to two time blocks.
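By way of illustration only, the block formation and windowing described above might be sketched as follows. The sketch assumes a sine window and an overlap of one half; the function names `sine_window` and `windowed_blocks`, the block length of 16 and the test tone are hypothetical choices and are not taken from the embodiment.

```python
import math

def sine_window(block_len):
    """Sine window w[n] = sin(pi * (n + 0.5) / block_len)."""
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]

def windowed_blocks(samples, block_len):
    """Cut `samples` into time blocks of `block_len` audio values that
    overlap by one half, and weight every block with the sine window."""
    hop = block_len // 2  # 50% overlap: each block starts half a block later
    window = sine_window(block_len)
    blocks = []
    for start in range(0, len(samples) - block_len + 1, hop):
        block = samples[start:start + block_len]
        blocks.append([x * w for x, w in zip(block, window)])
    return blocks

# Hypothetical input: 64 samples of a sine tone.
audio = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
blocks = windowed_blocks(audio, block_len=16)
```

With these values, every audio value except those in the first and last half block is allocated to exactly two time blocks. For the sine window, the squared weights of the two overlapping blocks add up to one at every position.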
The process of windowing by the means 18 is exemplarily illustrated in greater detail in
The filter bank 20 receives the time blocks or blocks of windowed audio values, as is indicated in
The block-by-block transfer is indicated in
Since the filter bank 20 generates one block 60 of spectral values 62 per time block, several sequences of spectral values 62 result over time, namely one per spectral component k or subband k. In
As can be seen, over a certain number of successive time blocks, here exemplarily 8, the spectral values 62 form a matrix 68 which represents a time/frequency domain representation 24 of the audio signal over the duration of these time blocks.
The time/frequency transform 56 performed block by block on the time blocks by the filter bank 20 is, for example, a DFT, DCT, MDCT or the like. Depending on the transform, the individual spectral values within a block 60 are divided into certain subbands. For each subband, each block 60 may comprise more than one spectral value 62. All in all, the result, over the sequence of time blocks, is a sequence of spectral values representing the time form of the respective subband and in
The filter bank 20 passes on the blocks 60 of spectral values 62 to the magnitude/phase detection means 26 block by block. The latter processes the complex spectral values and will only pass on the magnitudes thereof to the filter bank 28. However, it passes on the phases of the spectral values 62 to the phase processing means 36.
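The division into magnitude and phase might be sketched as follows, assuming purely for illustration that the transform 56 is a DFT, which is one of the options named above. The helper names `dft`, `split_magnitude_phase` and `recombine` are hypothetical, and the naive DFT is written for readability rather than speed.

```python
import cmath

def dft(block):
    """Naive DFT of one time block, yielding one complex spectral value per bin."""
    n_len = len(block)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * n / n_len)
                for n, x in enumerate(block))
            for k in range(n_len)]

def split_magnitude_phase(spectral_block):
    """Divide complex spectral values into a magnitude and a phase portion."""
    magnitudes = [abs(c) for c in spectral_block]
    phases = [cmath.phase(c) for c in spectral_block]
    return magnitudes, phases

def recombine(magnitudes, phases):
    """Unite magnitude and phase portions into complex spectral values again."""
    return [cmath.rect(m, p) for m, p in zip(magnitudes, phases)]

# Hypothetical windowed time block of 8 audio values.
time_block = [0.0, 0.7, 1.0, 0.7, 0.0, -0.7, -1.0, -0.7]
spectrum = dft(time_block)
magnitudes, phases = split_magnitude_phase(spectrum)
rebuilt = recombine(magnitudes, phases)
```

In this sketch, `magnitudes` corresponds to what is passed on towards the second filter bank and `phases` to what is passed on to the phase processing. If neither portion is modified, recombining them reproduces the original spectral values.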
The filter bank 28 processes the sequences 70 of magnitudes of spectral values 62 per subband similarly to the filter bank 20, namely by transforming these sequences block by block to the spectral representation or the modulation frequency representation, again preferably using windowed and overlapping blocks, wherein the blocks of all subbands are preferably aligned identically in time. Put differently, the filter bank 28 will process N spectral blocks 60 of spectral value magnitudes each at the same time or together. The N spectral blocks 60 of spectral value magnitudes form a matrix 68 of spectral value magnitudes. If there are, for example, M subbands, the filter bank 28 will process the spectral value magnitudes in matrices of N*M spectral value magnitudes each.
After receiving the magnitude portion of N successive spectral blocks, i.e. the matrix 68, the filter bank 28 will transform, separately for each subband, the blocks of spectral value magnitudes of the respective subbands, i.e. the lines in the matrix 68, from the time domain 66 to a frequency representation, wherein, as has already been mentioned, the spectral value magnitudes may be windowed to avoid aliasing effects. Put differently, the filter bank 28 will transfer each of these spectral value magnitude blocks from the sequences 70 representing the time form of a respective subband to a spectral representation and thus generate one block of modulation values per subband, which in
As has already been mentioned, for avoiding artifacts the filter bank 28 or the means 26 may comprise internal window means (not shown) subjecting, per subband, the transform blocks, i.e. the lines of the matrix 68, of spectral values to windowing by a window function 82 before the respective time/modulation frequency transform 80 by the filter bank 28 to the modulation frequency domain 30 to obtain the blocks 74.
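The second transform stage might be sketched as follows, again assuming a DFT purely for illustration. Each line of the magnitude matrix, i.e. the magnitude sequence of one subband over N time blocks, is optionally windowed and then transformed on its own. The name `modulation_matrix` and the two example envelopes are hypothetical.

```python
import cmath
import math

def dft(values):
    """Naive DFT along one line of the magnitude matrix."""
    n_len = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * q * m / n_len)
                for m, v in enumerate(values))
            for q in range(n_len)]

def modulation_matrix(magnitude_matrix, window=None):
    """magnitude_matrix[k][m] is the magnitude of subband k in time block m.
    Returns one block of modulation values per subband."""
    rows = []
    for row in magnitude_matrix:
        if window is not None:
            # Optional windowing of the line before the transform.
            row = [v * w for v, w in zip(row, window)]
        rows.append(dft(row))
    return rows

N = 8
steady = [1.0] * N  # subband with a constant envelope
tremolo = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * m / N) for m in range(N)]
mod = modulation_matrix([steady, tremolo])
```

In this example the constant envelope contributes only to modulation bin 0, whereas the envelope that fluctuates twice per N time blocks additionally shows up in modulation bin 2 and its mirror bin 6.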
Again, it is pointed out explicitly that a sequence of matrices 80, which in the 50% overlap windowing exemplarily mentioned before overlap in time by 50% is processed in the manner described above. Put differently, the filter bank 28 forms the matrix 80 for successive N time blocks such that the matrices 80 each refer to N time blocks which overlap by one half, as is exemplarily to be indicated in
The modulation values of the frequency/modulation frequency domain representation 30, as are output by the filter bank 28, reach the watermark embedding means 32. The watermark embedding means 32 then modifies the modulation matrix 80 or individual or several ones of the modulation values of the modulation matrices 80 of the audio signal 12. The modification performed by the means 32 may, for example, take place by a multiplicative weighting of individual modulation frequency/frequency segments of the modulation subband spectrum or of the frequency/modulation frequency domain representation, i.e. by a weighting of the modulation values within a certain region of the frequency/modulation frequency space spanned by the axes 76 and 78. Also, the modification might include setting individual segments or modulation values to certain values.
The multiplicative weighting or the certain values would depend on the watermark obtained at the input 14 in a predetermined manner. Thus, setting individual modulation values or segments of modulation values to certain values would take place in a signal-adaptive manner, i.e. additionally depending on the audio signal 12 itself.
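A multiplicative weighting of one segment of the modulation matrix might be sketched as follows. The mapping of a watermark bit to a gain of 1 + strength or 1 - strength, the function name `embed_bit` and all parameter values are assumptions made for illustration only; the embodiment merely requires that the weighting depend on the watermark in a predetermined manner.

```python
def embed_bit(modulation_matrix, bit, subbands, mod_bins, strength=0.1):
    """Scale the modulation values of a rectangular segment.

    Hypothetical rule: gain = 1 + strength for bit 1, 1 - strength for bit 0.
    `mod_bins` should list bins from the lower half only (0 .. N/2); the
    mirror bin N - q is scaled as well, so that a DFT-based inverse
    transform of a real magnitude sequence stays real.
    """
    gain = 1.0 + strength if bit else 1.0 - strength
    marked = [list(row) for row in modulation_matrix]  # leave the input untouched
    for k in subbands:
        n_bins = len(marked[k])
        for q in mod_bins:
            # A set avoids scaling bin 0 or bin N/2 twice.
            for idx in {q % n_bins, (-q) % n_bins}:
                marked[k][idx] *= gain
    return marked

# Hypothetical modulation matrix: 3 subbands, 8 modulation bins each.
matrix = [[1 + 0j] * 8 for _ in range(3)]
marked_one = embed_bit(matrix, 1, subbands=[1], mod_bins=[2])
marked_zero = embed_bit(matrix, 0, subbands=[1], mod_bins=[2])
```

Only the values inside the selected segment change; all other modulation values are passed on unmodified.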
The individual segments of the 2-dimensional modulation subband spectrum can, on the one hand, be obtained by subdividing the acoustic frequency axis 78 into frequency groups, on the other hand further segmentation may be performed by subdividing the modulation frequency axis 76 into modulation frequency groups. In
After the means 32 has modified the modulation matrix 80, it will send the modified modulation values of the modulation matrix 80 to the inverse filter bank 34. By means of a transform which is inverse to that of the filter bank 28, i.e., for example, an IDFT, IFFT, IDCT, IMDCT or the like, the inverse filter bank 34 re-transfers the modulation matrix 80 to the time/frequency domain representation 24 block 74 by block 74, i.e. separately for each subband, along the modulation frequency axis 76, to obtain modified magnitude portion spectral values in this way. Put differently, the inverse filter bank 34 transforms each block 74 of modified modulation values belonging to a certain subband, by a transform inverse to the transform 86, to a sequence of magnitude portion spectral values per subband, the result, according to the above embodiment, being a matrix of N×M magnitude portion spectral values.
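The re-transfer per subband might be sketched as follows, assuming a DFT/IDFT pair purely for illustration. The helper names and the example envelope are hypothetical, and the scaling of bins 2 and 6 merely stands in for a watermark-dependent modification.

```python
import cmath

def dft(values):
    n_len = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * q * m / n_len)
                for m, v in enumerate(values))
            for q in range(n_len)]

def idft(values):
    n_len = len(values)
    return [sum(v * cmath.exp(2j * cmath.pi * q * m / n_len)
                for q, v in enumerate(values)) / n_len
            for m in range(n_len)]

def magnitudes_from_modulation(modulation_matrix):
    """Inverse-transform each block of modulation values (one per subband)
    back to a sequence of magnitude portion spectral values."""
    return [[c.real for c in idft(row)] for row in modulation_matrix]

# Hypothetical magnitude sequence of one subband over N = 8 time blocks.
envelope = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0]
modulation = dft(envelope)
modulation[2] *= 1.1  # stand-in for the modification by the embedding means
modulation[6] *= 1.1  # mirror bin, scaled equally so the result stays real
modified = magnitudes_from_modulation([modulation])[0]
untouched = magnitudes_from_modulation([dft(envelope)])[0]
```

Because modulation bin 0 is left unchanged here, the sum of the magnitudes over the N time blocks is preserved. The sketch does not check that the modified magnitudes remain non-negative, which a practical implementation would presumably have to ensure.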
The magnitude portion spectral values from the inverse filter bank 34 will consequently always relate to two-dimensional blocks or matrices from the stream of sequences of spectral values, of course in a form modified by the watermark. According to the exemplary embodiment, these blocks overlap by 50%. Means (not shown) exemplarily provided in the means 34 then compensates the windowing in this exemplary 50% overlapping case by adding the overlapping recombined spectral values of successive matrices of spectral values obtained by retransforming successive modulation matrices. Here, streams or sequences of modified spectral values form again from the individual matrices of modified spectral values, namely one per subband. These sequences correspond only to the magnitude portion of the unmodified sequences 70 of spectral values, as have been output by means 20.
The recombining means 38 combines the magnitude portion spectral values of the inverse filter bank 34 united to form subband streams with the phase portions of the spectral values 62, as have been isolated by the detection means 26 directly after the transform 56 by the first filter bank 20, but in a form modified by the phase processing 36. The phase processing means 36 modifies the phase portions in a manner separated from watermark embedding by the means 32 but maybe depending on this embedding such that the detectability of the watermark in the detector or decoder system, which will be explained later referring to
In this manner, the means 38 thus generates sequences of spectral values per subband like that having been obtained directly after the filter bank 20 from the unchanged audio signal, namely the sequences 70, but in a form altered by the watermark, so that the spectral values recombined and output by the means 38 and modified with regard to the magnitude portion represent a time/frequency representation of the audio signal provided with a watermark.
The inverse filter bank 40 thus again obtains sequences of modified spectral values, namely one per subband. Put differently, the inverse filter bank 40 obtains one block of modified spectral values per cycle, i.e. one frequency representation of the audio signal provided with a watermark relating to one time section. Correspondingly, the filter bank 40 performs a transform inverse to the transform 56 of the filter bank 20 at each such block of spectral values, i.e. spectral values arranged along the frequency axis 70, to obtain as a result modified windowed time blocks or time blocks of windowed modified audio values. The subsequent windowing means 42 compensates windowing, as has been introduced by the windowing means 18, by adding audio values corresponding to one another within the overlapping regions, the result of which is the output signal provided with a watermark in the time domain representation 22 at the output 16.
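The compensation of the windowing by adding overlapping audio values might be sketched as follows. The sketch assumes, as one possible design, that the same sine window is applied once on the analysis side and once more on the synthesis side, so that the squared weights of two overlapping blocks add up to one. The function names and parameter values are hypothetical, and the transform stages in between are omitted.

```python
import math

def sine_window(block_len):
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]

def analyse(samples, block_len):
    """Windowed, half-overlapping time blocks (analysis side)."""
    hop = block_len // 2
    window = sine_window(block_len)
    return [[x * w for x, w in zip(samples[s:s + block_len], window)]
            for s in range(0, len(samples) - block_len + 1, hop)]

def overlap_add(blocks, block_len):
    """Window each block again and add the overlapping audio values."""
    hop = block_len // 2
    window = sine_window(block_len)
    out = [0.0] * (hop * (len(blocks) - 1) + block_len)
    for i, block in enumerate(blocks):
        for n, value in enumerate(block):
            out[i * hop + n] += value * window[n]
    return out

# Hypothetical signal; the blocks are passed through unmodified here.
audio = [math.sin(2 * math.pi * 3 * n / 64) + 0.25 * n / 64 for n in range(64)]
rebuilt = overlap_add(analyse(audio, 16), 16)
```

In this example every audio value that lies in two time blocks is reconstructed exactly. The first and last half block are covered by only one time block and are therefore not fully restored.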
The embedding of a watermark according to the embodiment of
The watermark decoder of
Watermark decoding means 132 connected to the filter bank 128 for obtaining the frequency/modulation domain representation of the input signal provided with a watermark or the modulation matrices is provided to extract the watermark originally introduced by the embedder 10 from this representation and output same at the output 114. The extraction is performed at predetermined locations of the modulation matrices corresponding to those having been used by the embedder 10 for embedding. Matching selection of the locations is, for example, ensured by a corresponding standardization.
Deviations of the modulation matrices fed to the watermark decoding means 132 from the modulation matrices as generated in the embedder 10 by the means 32 may also be caused by the input signal provided with a watermark being degraded somewhere between its output at the output 16 and its reception at the input 112 of the detector 100, such as, for example, by a coarser quantization of the audio values or the like.
Before another embodiment of a scheme of embedding a watermark into an audio signal will be described referring to
On the one hand, the embodiment for embedding a watermark in an audio signal described above may be used to prove authorship of an audio signal. The original audio signal arriving at the input 12 exemplarily is a piece of music. While producing pieces of music, author information in the form of a watermark can be introduced into the audio signal by the embedder 10, the result being an audio signal provided with a watermark at the output 16. Should a third person claim to be the author of the corresponding piece of music or music title, the proof of the actual authorship can be done using the watermark which can be extracted again by means of the detector 100 from the audio signal provided with a watermark and otherwise is inaudible in normal playing.
Another possible usage of the watermark embedding illustrated above is to use watermarks for logging the broadcast program of TV and radio stations. Broadcast programs are often divided into different portions, such as, for example, individual music titles, radio plays, commercials or the like. The author of an audio signal, or at least the person who is entitled to and wishes to earn money from a certain music title or commercial, can provide his or her audio signal with a watermark by the embedder 10 and make the audio signal provided with a watermark available to the broadcasting operator. In this manner, music titles or commercials can each be provided with a respective unambiguous watermark. For logging the broadcast program, a computer checking the broadcast signal for a watermark and logging the watermarks found may exemplarily be used. Using the list of the watermarks discovered, a broadcast list for the corresponding broadcasting station may be generated easily, which makes accounting and charging easier.
Another field of application is using watermarks for determining illegal copies. In this manner, using watermarks is particularly worthwhile for distributing music over the Internet. If a customer purchases a music title, an unambiguous customer number is embedded into the data using a watermark while transmitting the music data to the customer. The result is music titles into which the watermark is embedded inaudibly. If at a later point in time a music title is found on the Internet at a site not approved, such as, for example, an exchange site, this piece can be checked for the watermark by means of a decoder according to
Further applications for watermarks are, for example, described in the publication Chr. Neubauer, J. Herre, “Advanced Watermarking and its Applications”, 109th Audio Engineering Society Convention, Los Angeles, September 2000, Preprint 5176.
Subsequently, an embedder and a watermark decoder will be described referring to an embodiment of an embedding scheme where, compared to the embodiment of
The embedder of
The above explanation has only referred to individual blocks 60 of spectral values. However, it becomes obvious from the above explanation that a linear phase increase may also be detected for spectral values resulting with successive time blocks for one and the same subband, i.e. a phase increase along the lines in
The carrier frequency determining means 214 thus fits a plane into the unwrapped phases of the spectral values 62 of the matrix 68, i.e. the phases subjected to phase unwrapping, by suitable methods, such as, for example, a least-squares algorithm, and deduces from it the phase increase going back to the phase offset of the time blocks which occurs in the sequences 70 of spectral values for the individual subbands within the matrix 68. All in all, the result, per subband, is a deduced phase increase corresponding to the modulation carrier component sought. The means 214 passes this on to the mixer 212 in order for the respective sequence 70 of spectral values to be multiplied by the mixer 212 by the complex conjugate thereof, i.e. multiplied by $e^{-j(w \cdot m + \varphi)}$, w representing the certain carrier, m being the index for the spectral values and φ a phase offset of the certain carrier at the time section of the N time blocks considered. Of course, the carrier frequency determining means 214 may also perform one-dimensional fits of a straight line into the phase curves of the individual sequences 70 of spectral values 62 within the matrices 68 to obtain the individual phase increases going back to the phase offset of the time blocks. After the demodulation by the mixer 212, the phase portion of the spectral values of the matrix 68 is thus “leveled out” and only varies on average around the phase zero due to the shape of the audio signal itself.
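The one-dimensional variant, i.e. fitting a straight line into the unwrapped phases of one subband sequence and demodulating with the complex conjugate carrier, might be sketched as follows. The function names, the simple unwrapping rule and the synthetic test sequence are assumptions made for illustration only.

```python
import cmath
import math

def unwrap(phases):
    """Remove jumps of multiples of 2*pi between successive phase values."""
    out = [phases[0]]
    for p in phases[1:]:
        delta = p - out[-1]
        delta -= 2 * math.pi * round(delta / (2 * math.pi))
        out.append(out[-1] + delta)
    return out

def fit_line(values):
    """Least-squares fit values[m] ~ w * m + phi; returns (w, phi)."""
    n = len(values)
    mean_m = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((m - mean_m) * (v - mean_v) for m, v in enumerate(values))
    den = sum((m - mean_m) ** 2 for m in range(n))
    w = num / den
    return w, mean_v - w * mean_m

def demodulate(sequence):
    """Estimate carrier w and offset phi of one subband sequence and
    multiply the sequence by exp(-j * (w * m + phi))."""
    phases = unwrap([cmath.phase(c) for c in sequence])
    w, phi = fit_line(phases)
    demod = [c * cmath.exp(-1j * (w * m + phi)) for m, c in enumerate(sequence)]
    return demod, w, phi

# Hypothetical subband sequence: slowly growing magnitude, linear phase.
sequence = [(1.0 + 0.1 * m) * cmath.exp(1j * (0.9 * m + 0.3)) for m in range(8)]
demod, w, phi = demodulate(sequence)
```

For this synthetic sequence the fit recovers the phase increase of 0.9 per time block and the offset of 0.3, and the demodulated spectral values have a phase of essentially zero while their magnitudes are unchanged.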
The mixer 212 passes on the spectral values 62 modified in this way to the filter bank 28 which transfers same matrix by matrix (matrix 68 in
The successive modulation matrices generated in this way are passed on to watermark embedding means 216 which receives the watermark 14 at another input. The watermark embedding means 216 exemplarily operates in a similar manner as does the embedding means 32 of the embedder 10 of
The altered modulation values or the altered or modified modulation matrices are passed on to the inverse filter bank 34, which is how matrices of modified spectral values form from the modified modulation matrices. With these modified spectral values, the phase correction which has been caused by the demodulation by means of the mixer 212 still has to be reversed. This is why the blocks of modified spectral values output by the inverse filter bank 34 per subband are mixed or multiplied by means of a mixer 218 by a carrier component which is the complex conjugate of that having been used by the mixer 212 for demodulating this subband before the transfer to the frequency/modulation frequency domain, i.e. by performing a multiplication of these blocks by $e^{j(w \cdot m + \varphi)}$, wherein w in turn indicates the certain carrier for the respective subband, m is the index for the modified spectral values and φ is a phase offset of the certain carrier at the time section of the N time blocks for the respective subband considered. In this way, the demodulation which has been applied per subband block after block division by the means 212, 214 is inverted again before the subsequent block merging.
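The reversal of the demodulation might be sketched as follows. The carrier values, the function names and the uniform gain of 1.05, which merely stands in for a watermark-dependent modification made between demodulation and remodulation, are hypothetical.

```python
import cmath

def demodulate(sequence, w, phi):
    """Multiply by exp(-j * (w * m + phi)), as the first mixer would."""
    return [c * cmath.exp(-1j * (w * m + phi)) for m, c in enumerate(sequence)]

def remodulate(sequence, w, phi):
    """Multiply by the complex conjugate carrier exp(+j * (w * m + phi))."""
    return [c * cmath.exp(1j * (w * m + phi)) for m, c in enumerate(sequence)]

# Hypothetical carrier parameters and subband sequence.
w, phi = 0.9, 0.3
original = [(1.0 + 0.1 * m) * cmath.exp(1j * (w * m + phi)) for m in range(8)]

leveled = demodulate(original, w, phi)
modified = [1.05 * c for c in leveled]  # stand-in for the embedding step
restored = remodulate(modified, w, phi)
```

After remodulation, the spectral values carry the original phase increase again, while the modification made in the demodulated domain is retained.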
The spectral values obtained in this way still exist in the form of blocks, namely one block of modified spectral value blocks each per subband, and are, if necessary, subjected to OLA or merging for reversing windowing, such as, for example, in the manner described referring to 34 of
An advantage of the procedure according to
A watermark decoder suitable for processing the audio signal provided with a watermark as is output by the embedder 210 to extract the watermark therefrom is shown in
The above embodiments have consequently related to a connection of the subject areas “subband modulation spectral analysis” and “digital watermark” not known in the past to form an overall system for introducing watermarks with an embedder system on the one side and a detector system on the other side. The embedder system serves for introducing the watermark. It consists of a subband modulation spectral analysis, an embedder stage performing modification of the signal representation achieved by the analysis, and synthesis of the signal of the modified representation. The detector system in contrast serves for recognizing a watermark present in an audio signal provided with a watermark. It consists of a subband modulation spectral analysis and a detection stage which recognizes and evaluates the watermark using the signal representation obtained by the analysis.
With regard to the selection of those locations in the frequency/modulation frequency domain or those modulation values in the frequency/modulation frequency domain used for embedding the watermark or extracting the watermark, it is to be pointed out that this selection should be made as to psycho-acoustic factors to ensure that the watermark is inaudible when playing the audio signal provided with a watermark. Masking effects in the modulation spectral range might be made use of for a suitable selection. Here, reference is, for example, made to T. Houtgast: “Frequency Selectivity in Amplitude Modulation Detection”, J. Acoust. Soc. Am., vol. 85, No. 4, April 1989, which is incorporated herein with regard to selecting inaudibly modifiable modulation values in the frequency/modulation frequency domain.
For a better understanding of the modulation spectral analysis in general, reference is made to the following publications which relate to audio coding using a modulation transform, wherein the signal is divided into frequency bands by a transform, subsequently a division into magnitude and phase is performed and then, while the phase is not processed further, the magnitudes of each subband are transformed again in a second transform over a number of transform blocks. The result is a frequency division of the time envelope of the respective subband into “modulation coefficients”. These further documents include the article M. Vinton and L. Atlas, “A Scalable and Progressive Audio Codec”, in Proceedings of the 2001 IEEE ICASSP, May 7-11, 2001, Salt Lake City; US 2002/0176353A1 by Atlas and others having the title “Scalable And Perceptually Ranked Signal Coding and Decoding”; the article J. Thompson and L. Atlas, “A Non-uniform Modulation Transform for Audio Coding with Increased Time Resolution”, in Proceedings of the 2003 IEEE ICASSP, April 6-10, Hong Kong, 2003; and the article L. Atlas, “Joint Acoustic And Modulation Frequency”, EURASIP Journal on Applied Signal Processing 7, pp. 668-675, 2003.
The above embodiments only represent exemplary ways of being able to provide audio recordings with inaudible additional information robust against manipulation and thus introducing the watermark in the so-called subband modulation spectral range and performing detection in the subband modulation spectral range. However, different variations may be made to these embodiments. The windowing means mentioned above might only serve for block formation, i.e. multiplication or weighting by the window functions might be omitted. In addition, window functions other than the magnitudes of trigonometric functions mentioned before might be used. Also, the 50% block overlapping might be omitted or be performed differently. Correspondingly, the block overlapping on the side of the synthesis might include operations other than a pure addition of matching audio values in successive time blocks. In addition, windowing operations in the second transform stage might also be varied correspondingly.
Additionally, it is pointed out that the audio signal introduction need not necessarily be made from the time domain to the frequency/modulation frequency domain representation and from there be reversed again—after modification—to the time domain representation. Additionally, it would also be possible to modify the two embodiments mentioned before in that the values as are output by the recombining means 38 or the mixer 218 are united to form an audio signal provided with a watermark in a bitstream to be present in a time/frequency domain.
In addition, the demodulation used in the second embodiment might also be designed to be different, such as, for example, by alteration of the phase forms of the spectral value blocks within the matrices 68 by measures other than by pure multiplication by a fixed complex carrier.
With regard to the above embodiments for possible decoders, as have been discussed referring to
It is also to be pointed out that the above embodiments have exclusively related to watermark embedding with regard to audio signals but that the present watermark embedding scheme may also be applied to different information signals, such as, for example, to control signals, measuring signals, video signals or the like, to check same, for example, as to their authenticity. In all these cases, it is possible by the presently suggested scheme to embed information such that this does not impede the normal usage of the information signal in the form provided with a watermark, such as, for example, analysis of the measurement result or the optical impression of the video or the like, which is why in these cases, too, the additional data to be embedded are referred to as a watermark.
In particular, it is pointed out that, depending on the circumstances, the inventive scheme may also be implemented in software. The implementation may be on a digital storage medium, in particular on a disc or a CD having electronically readable control signals which can cooperate with a programmable computer system such that the corresponding method will be executed. Generally, the invention thus also consists in a computer program product having a program code stored on a machine-readable carrier for performing the inventive method when the computer program product runs on a computer. Put differently, the invention may thus also be realized as a computer program having a program code for performing the method when the computer program runs on a computer.
While this invention has been described in terms of several preferred embodiments, there are alterations, permutations, and equivalents which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and compositions of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
102004021404.2 | Apr 2004 | DE | national |
This application is a continuation of copending International Application No. PCT/EP2005/002636, filed Mar. 11, 2005, which designated the United States and was not published in English, and is incorporated herein by reference in its entirety.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP05/02636 | Mar 2005 | US |
Child | 11554492 | | US |