The present disclosure generally relates to audio processing and, more particularly, to watermark insertion during audio processing.
A watermark, which is a type of digital marker, often is embedded into audio data to identify the owner or source of the audio data for copyright protection purposes or to transfer other non-audio information. Typically, a watermark is added to audio data prior to encoding or after encoding. However, this approach renders the watermark relatively easy to detect and modify, and thus susceptible to tampering or removal by unauthorized entities.
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art, by referencing the accompanying drawings.
In some embodiments, watermark data is embedded in the sets of frequency coefficients by modifying at least a subset of frequency coefficients of a set based on a corresponding bit of the watermark data. This modification can include, for example, a linear add of one value if the corresponding bit value is a 0, and a linear add of a different value if the corresponding bit value is a 1. Each frequency coefficient of the set may be so modified, or only a subset of the frequency coefficients of the set may be modified. By modifying the frequency coefficients on a set-by-set basis in this manner, the watermark may be embedded in the audio data in a manner that permits detection of the presence of the watermark using, for example, an average detector or maximum-likelihood detector as known in the art, while also being more resilient to unauthorized tampering than conventional time-domain watermarking techniques.
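The set-by-set embedding and an average detector can be sketched as follows. This is a minimal Python sketch, not the disclosed implementation: the offset pair (-1, +1), the helper names `embed_bit` and `average_detect`, and the assumption that the detector has access to the unmarked coefficients (non-blind detection) are all illustrative choices.

```python
from typing import Dict, List, Sequence

# Illustrative offset pair: linear add of -1 for a 0 bit, +1 for a 1 bit.
OFFSET_FOR_BIT: Dict[int, float] = {0: -1.0, 1: 1.0}


def embed_bit(coeffs: Sequence[float], bit: int, indices: Sequence[int]) -> List[float]:
    """Return a copy of one set of frequency coefficients with `bit` embedded
    by a linear add on the coefficients at `indices` (all or a subset)."""
    offset = OFFSET_FOR_BIT[bit]
    marked = list(coeffs)
    for i in indices:
        marked[i] += offset
    return marked


def average_detect(marked: Sequence[float], original: Sequence[float],
                   indices: Sequence[int]) -> int:
    """Non-blind average detector (assumption): average the per-coefficient
    change over the modified positions and compare it with the midpoint of
    the two offsets (assumes the offset for a 1 bit exceeds that for a 0 bit,
    so asymmetric pairs such as (-3, +6) also work)."""
    mean_change = sum(marked[i] - original[i] for i in indices) / len(indices)
    threshold = (OFFSET_FOR_BIT[0] + OFFSET_FOR_BIT[1]) / 2
    return 1 if mean_change > threshold else 0


original = [10.0, -4.0, 2.5, 0.75]
marked = embed_bit(original, 1, indices=[0, 1, 2])
print(marked)                                         # [11.0, -3.0, 3.5, 0.75]
print(average_detect(marked, original, [0, 1, 2]))    # 1
```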
In the depicted example, the audio processing device 100 includes an input buffer 102, an initial processing module 104, a watermarking module 106, a final processing module 108, and an output buffer 110. The initial processing module 104, watermarking module 106, and final processing module 108 each may be implemented entirely in hard-coded logic (that is, hardware), as a combination of software 112 stored in a non-transitory computer readable storage medium (e.g., a memory 114) and one or more processors 116 to access and execute the software, or as a combination of hard-coded logic and software-executed functionality. To illustrate, in one embodiment, the audio processing device 100 is implemented as a system on a chip (SOC) whereby portions of the modules 104, 106, and 108 are implemented as hardware logic, and other portions are implemented via firmware (one embodiment of the software 112) stored at the SOC and executed by a processor 116 of the SOC.
The hardware of the audio processing device 100 can be implemented using a single processor 116 or a plurality of processors 116. Such processors 116 can include a central processing unit (CPU), a graphics processing unit (GPU), a microcontroller, a digital signal processor, a field programmable gate array, a programmable logic device, a state machine, logic circuitry, analog circuitry, digital circuitry, or any device that manipulates signals (analog and/or digital) based on operational instructions that are stored in the memory 114 or other non-transitory computer readable storage medium. The memory 114 may be a single memory device or a plurality of memory devices. Such memory devices can include a hard disk drive or other disk drive, read-only memory, random access memory, volatile memory, non-volatile memory, static memory, dynamic memory, flash memory, cache memory, and/or any device that stores digital information. Note that when the processing module implements one or more of its functions via a state machine, analog circuitry, digital circuitry, and/or logic circuitry, the memory storing the corresponding operational instructions may be embedded within, or external to, the circuitry comprising the state machine, analog circuitry, digital circuitry, and/or logic circuitry.
As a general operational overview, the audio processing device 100 receives input audio data 120 from an audio source (e.g., a live recording, pulse code modulated audio data from a CD or DVD, etc.) and buffers the input audio data 120 as it is received at the input buffer 102. The initial processing module 104 then processes the buffered input audio data 120 to generate sets of frequency coefficients that represent a time-to-frequency transform of at least a portion of the audio data 120. This output of sets of frequency coefficients is referred to herein as a stream 122 of frequency coefficients. The watermarking module 106 then embeds watermark data 124 by modifying some or all of the frequency coefficients of some or all sets of the stream 122 to generate modified sets of frequency coefficients (referred to herein as “modified stream 126 of frequency coefficients”). The modified stream 126 of frequency coefficients then is used by the final processing module 108 to generate output audio data 128, which may be buffered in the output buffer 110 before being transmitted to an intermediary or final destination.
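The data flow just described (stream 122, then modified stream 126, then output audio data 128) can be pictured as three chained stages. The following Python sketch is illustrative only: `run_pipeline` and the stand-in stages are hypothetical names, and the stand-ins perform no real transform or coding; they only show where each module plugs in.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

CoeffSet = List[float]


def run_pipeline(audio: Sequence[float],
                 initial: Callable[[Sequence[float]], Iterable[CoeffSet]],
                 watermark: Callable[[Iterable[CoeffSet]], Iterable[CoeffSet]],
                 final: Callable[[Iterable[CoeffSet]], List[float]]) -> List[float]:
    """Chain the three stages; generators let each set flow through
    watermarking and final processing as soon as it is produced."""
    stream = initial(audio)        # stream 122 of frequency coefficients
    modified = watermark(stream)   # modified stream 126
    return final(modified)         # output audio data 128


# Hypothetical stand-in stages for illustration only (no real transform or codec).
def split_into_sets(samples: Sequence[float], size: int = 4) -> Iterator[CoeffSet]:
    for start in range(0, len(samples), size):
        yield list(samples[start:start + size])


def add_plus_one_to_first(sets: Iterable[CoeffSet]) -> Iterator[CoeffSet]:
    for s in sets:
        yield [s[0] + 1.0] + s[1:]


def flatten(sets: Iterable[CoeffSet]) -> List[float]:
    return [c for s in sets for c in s]


print(run_pipeline([0.0] * 8, split_into_sets, add_plus_one_to_first, flatten))
# [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
```

Swapping the stand-ins for an encoder's transform and entropy coder, or for a decoder's bitstream parser and inverse transform, leaves the watermarking stage unchanged, which mirrors how the same watermarking module 106 serves the encoding, decoding, and transcoding contexts.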
In some embodiments, this processing is performed in the context of the audio processing device 100 as an encoding system such that the input audio data 120 is unencoded audio data (e.g., pulse code modulation (PCM) data representative of the original analog audio waveform) and the output audio data 128 is encoded audio data, such as audio data encoded in accordance with one or more of a version of the Advanced Audio Coding (AAC) standard, a version of the Moving Picture Experts Group (MPEG)-2 Audio Layer III (MP3) standard, and the like. In this implementation, the initial processing module 104 comprises a frequency domain transform module 134 that performs a time-to-frequency domain transform of the input audio data 120 to generate the stream 122 of frequency coefficients. The frequency domain transform module 134 thus can apply, for example, a Discrete Cosine Transform (DCT)-based transform, such as a Modified DCT (MDCT), a Fourier-based transform, such as a Fast Fourier Transform (FFT), and the like. Further, for an encoding-based implementation, the final processing module 108 comprises a final encoding module 138 to generate an encoded audio stream as the output audio data 128 from the modified stream 126 of frequency coefficients using any of a variety of audio encoding techniques that employ time-to-frequency domain transforms, such as the aforementioned AAC and MP3 standards.
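As one concrete instance of the time-to-frequency transform applied by the frequency domain transform module 134, the following Python sketch computes a windowed MDCT over 50%-overlapping frames. The sine window, the zero padding, and the function names `sine_window`, `mdct`, and `analyze` are illustrative assumptions; each returned list corresponds to one set of frequency coefficients in the stream 122.

```python
import math
from typing import List, Sequence


def sine_window(length: int) -> List[float]:
    """Sine window w[n] = sin(pi * (n + 0.5) / length) for a 2N-sample frame."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame: Sequence[float]) -> List[float]:
    """Direct O(N^2) MDCT of one 2N-sample frame, yielding N coefficients:
    X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
    n_coeffs = len(frame) // 2
    n0 = 0.5 + n_coeffs / 2
    return [sum(x * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
                for n, x in enumerate(frame))
            for k in range(n_coeffs)]


def analyze(samples: Sequence[float], n_coeffs: int) -> List[List[float]]:
    """Turn PCM samples into a stream of coefficient sets: zero-pad, cut
    50%-overlapping 2N-sample frames (hop N), window each, and MDCT it."""
    window = sine_window(2 * n_coeffs)
    pad_end = (-len(samples)) % n_coeffs + n_coeffs
    padded = [0.0] * n_coeffs + list(samples) + [0.0] * pad_end
    sets = []
    for start in range(0, len(padded) - n_coeffs, n_coeffs):
        frame = padded[start:start + 2 * n_coeffs]
        sets.append(mdct([w * x for w, x in zip(window, frame)]))
    return sets


stream = analyze([1.0, 0.5, -0.25, 0.0, 0.75, -1.0, 0.25, 0.5], n_coeffs=4)
print(len(stream), [len(s) for s in stream])   # 3 [4, 4, 4]
```

The direct double sum is chosen for readability; production encoders compute the MDCT with FFT-based fast algorithms and much larger frames.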
In other embodiments, the processing of the audio processing device 100 is directed to a decoding context such that the input audio data 120 is encoded audio data, such as AAC-encoded or MP3-encoded audio data, and the output audio data 128 is decoded audio data (e.g., PCM audio data). In a decoding implementation, the input audio data 120 already includes the frequency coefficients, albeit in some coded form, and thus the initial processing module 104 comprises an initial decoding module 144 to perform initial decoding sufficient to extract the stream 122 of frequency coefficients from the encoded input audio data 120. The decoding necessary to obtain these frequency coefficients depends on the manner in which the input audio data 120 was encoded. Further, the final processing module 108 includes a final decoding module 148 to perform the final decoding process using the modified stream 126 of frequency coefficients to generate the decoded output audio data 128 in accordance with the encoding standard employed to encode the input audio data 120.
In still other embodiments, the processing of the audio processing device 100 is directed to a transcoding context such that both the input audio data 120 and the output audio data 128 are encoded audio data, whereby the audio processing device 100 modifies the resolution, bitrate, or format of the input audio data 120 to generate the output audio data 128. Because such transcoding involves at least partial decoding followed by at least partial re-encoding, the digital watermarking process may be employed during the decoding process, the encoding process, or both, as described in greater detail below.
Next, at block 208, a frequency coefficient of the set is selected and the watermarking module 106 performs a linear add using the selected frequency coefficient and one of a first value or a second value that is selected depending on whether the bit value of the watermark data 124 selected at block 202 is a “0” or a “1”. To illustrate, if the bit value of the watermark data 124 is a “0” the linear add operation can add a “−1” to the frequency coefficient, and if the bit value of the watermark data 124 is a “1” the linear add operation can add a “+1” to the frequency coefficient. Any other pair of values may be used in the linear add operation in place of “−1, +1”, such as, for example, “−10, +10” or “−3, +6”. The resulting modified frequency coefficient is output as part of the modified stream 126 of frequency coefficients. In some embodiments, each frequency coefficient of the set is modified in this manner. In other embodiments, only a subset of the frequency coefficients is modified. For example, the watermarking module 106 may be configured to modify only one-quarter or one-half of the frequency coefficients of the set. Those frequency coefficients not selected for modification are output without modification as part of the modified stream 126 of frequency coefficients. Accordingly, at block 210 the watermarking module 106 determines whether it has modified all of the frequency coefficients of the audio block that are to be modified. If not, the method flow returns to block 208 for the selection of the next frequency coefficient of the set that is to be modified. If watermarking of the set of frequency coefficients of the audio block has completed, the method 200 returns to block 202 to repeat the watermarking process for the next audio block with the next bit value of the watermark data 124.
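Blocks 202 through 210 amount to an outer loop over audio blocks and an inner loop over the coefficients selected for modification. The Python sketch below is hypothetical: the name `watermark_blocks`, choosing the lowest-frequency coefficients as the subset, and passing sets through unmodified once the watermark bits run out are assumptions that the description above does not specify.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple


def watermark_blocks(coefficient_sets: Iterable[Sequence[float]],
                     bits: Sequence[int],
                     fraction: float = 1.0,
                     offsets: Tuple[float, float] = (-1.0, 1.0)) -> Iterator[List[float]]:
    """Embed one watermark bit per coefficient set (one set per audio block).

    fraction: share of each set to modify (1.0 = all, 0.5 = half,
    0.25 = a quarter); at least one coefficient, lowest frequencies first
    (assumption).
    offsets: (value added for a 0 bit, value added for a 1 bit).
    """
    bit_iter = iter(bits)
    for coeffs in coefficient_sets:
        bit = next(bit_iter, None)            # block 202: select the next bit
        if bit is None:                        # assumption: once bits run out,
            yield list(coeffs)                 # pass remaining sets unmodified
            continue
        n_modify = max(1, int(len(coeffs) * fraction))
        offset = offsets[bit]
        # Blocks 208-210: linear add on each selected coefficient; the others
        # pass through unchanged into the modified stream.
        yield [c + offset if i < n_modify else c for i, c in enumerate(coeffs)]


sets = [[0.0] * 4, [0.0] * 4, [0.0] * 4]
for out in watermark_blocks(sets, bits=[1, 0], fraction=0.5):
    print(out)
# [1.0, 1.0, 0.0, 0.0]
# [-1.0, -1.0, 0.0, 0.0]
# [0.0, 0.0, 0.0, 0.0]
```

Because the function is a generator, each modified set is available for the concurrent encoding of block 212 as soon as it is produced.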
Concurrently, at block 212 the final encoding module 138 completes the encoding of the audio block using the modified set of frequency coefficients in the modified stream 126, rather than the original set of frequency coefficients generated from the audio block. This encoding can include any of a variety of well-known encoding processes in accordance with the audio encoding standard being applied, such as quantization of the modified set of frequency coefficients using a psychoacoustic model, redundancy-elimination coding of the resulting quantized frequency coefficients, error correction coding, and the like. The resulting encoded audio data for the audio block is buffered at the output buffer 110 and then included as part of the output audio data 128 transmitted to a destination device for storage or subsequent decoding.
Next, at block 308, a frequency coefficient of the set is selected and the watermarking module 106 performs a linear add using the selected frequency coefficient and one of a first value or a second value (e.g., “−1” or “+1”) that is selected depending on whether the bit value of the watermark data 124 selected at block 302 is a “0” or a “1”. The resulting modified frequency coefficient is output as part of the modified stream 126 of frequency coefficients. As noted above, this modification process may be applied to each frequency coefficient in the set or to only a selected subset. Those frequency coefficients not selected for modification are output without modification as part of the modified stream 126 of frequency coefficients. Accordingly, at block 310 the watermarking module 106 determines whether it has modified all of the frequency coefficients of the set that are to be modified. If not, the method flow returns to block 308 for the selection of the next frequency coefficient of the set that is to be modified. If watermarking of the set of frequency coefficients has completed, the method 300 returns to block 302 to repeat the watermarking process for the next audio data set with the next bit value of the watermark data 124.
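In the decoding path the watermark is applied after the coefficients have been recovered from the encoded data but before the final decoding steps. A minimal Python sketch, assuming a hypothetical uniform quantizer with one step size per set; `dequantize` and `decode_and_watermark` are illustrative names, and bitstream parsing is omitted.

```python
from typing import List, Sequence, Tuple


def dequantize(indices: Sequence[int], step: float) -> List[float]:
    """Hypothetical initial decoding: uniform scalar dequantization.
    Real codecs use their own, more elaborate inverse quantization."""
    return [q * step for q in indices]


def decode_and_watermark(indices: Sequence[int], step: float, bit: int,
                         n_modify: int,
                         offsets: Tuple[float, float] = (-1.0, 1.0)) -> List[float]:
    """Recover one set of frequency coefficients (stream 122), then apply the
    bit-dependent linear add to its first `n_modify` coefficients."""
    coeffs = dequantize(indices, step)
    offset = offsets[bit]
    return [c + offset if i < n_modify else c for i, c in enumerate(coeffs)]


print(decode_and_watermark([2, -1, 0, 3], step=0.5, bit=0, n_modify=3))
# [0.0, -1.5, -1.0, 1.5]
```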
Concurrently, at block 312 the final decoding module 148 completes the decoding of the audio data set using the modified set of frequency coefficients in the modified stream 126, rather than the original set of frequency coefficients extracted from the audio data set. This decoding can include any of a variety of well-known decoding processes in accordance with the audio decoding standard being applied, such as a frequency-to-time domain transform process, error correction, and the like. The resulting unencoded audio data for the audio data set is buffered at the output buffer 110 and then output as an unencoded audio block of the output audio data 128 transmitted to a destination device for storage or playback.
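For the frequency-to-time step of the final decoding module 148 in an MDCT-based codec, the following Python sketch applies an inverse MDCT followed by sine-windowed overlap-add. The MDCT and sine-window pairing, the 2/N scaling, the N-sample leading pad assumed to have been added before analysis, and the names `imdct` and `synthesize` are assumptions for illustration; an actual decoder follows the transform and windowing rules of its codec standard.

```python
import math
from typing import List, Sequence


def imdct(coeffs: Sequence[float]) -> List[float]:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples, scaled by 2/N
    so that sine-windowed overlap-add cancels the aliasing exactly."""
    n = len(coeffs)
    n0 = 0.5 + n / 2
    return [2.0 / n * sum(c * math.cos(math.pi / n * (t + n0) * (k + 0.5))
                          for k, c in enumerate(coeffs))
            for t in range(2 * n)]


def synthesize(coefficient_sets: Sequence[Sequence[float]], n_samples: int) -> List[float]:
    """Window each IMDCT frame with the sine window, overlap-add at hop N,
    and drop the N leading padding samples assumed to precede the audio."""
    n = len(coefficient_sets[0])
    window = [math.sin(math.pi * (t + 0.5) / (2 * n)) for t in range(2 * n)]
    out = [0.0] * (n * (len(coefficient_sets) + 1))
    for j, coeffs in enumerate(coefficient_sets):
        for t, y in enumerate(imdct(coeffs)):
            out[j * n + t] += window[t] * y
    return out[n:n + n_samples]
```

Because the inverse transform is linear, an offset added to one coefficient before synthesis shows up in the PCM output as that offset times a scaled, windowed cosine spread over the 2N samples of the corresponding frame.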
Thus, in the illustrated example, for a first audio block 401 (also denoted “Block A”), the frequency domain transform module 134 performs a time-to-frequency domain transform to generate a set of four frequency coefficients 411, 412, 413, and 414 (in practice, substantially more than four frequency coefficients typically are generated, but this example is limited to four for ease of illustration). In this example, the frequency coefficients of the lowest three frequency bands (that is, frequency coefficients 411, 412, and 413) are selected for modification, and thus linear add operations 421, 422, and 423 are performed using the frequency coefficients 411, 412, and 413, respectively, to generate modified frequency coefficients 431, 432, and 433. The first bit value of the watermark data 124 is to be embedded in the frequency coefficients generated from the audio block 401. Because this first bit value is a “1”, the linear add operations 421, 422, and 423 add a value of “+1” to the values of the frequency coefficients 411, 412, and 413, respectively. These modified frequency coefficients 431, 432, and 433, together with the unmodified frequency coefficient 414, are then passed on as a modified frequency coefficient set 434 of the modified stream 126.
For a second audio block 402 (also denoted “Block B”), the frequency domain transform module 134 performs a time-to-frequency domain transform to generate a set of four frequency coefficients 441, 442, 443, and 444. As with the processing of the first audio block 401, the frequency coefficients of the lowest three frequency bands (that is, frequency coefficients 441, 442, and 443) are selected for modification, and thus linear add operations 451, 452, and 453 are performed using the frequency coefficients 441, 442, and 443, respectively, to generate modified frequency coefficients 461, 462, and 463. The second bit value of the watermark data 124 is to be embedded in the frequency coefficients generated from the audio block 402. Because this second bit value is a “0”, the linear add operations 451, 452, and 453 add a value of “−1” to the values of the frequency coefficients 441, 442, and 443, respectively. These modified frequency coefficients 461, 462, and 463, together with the unmodified frequency coefficient 444, are then passed on as a modified frequency coefficient set 464 of the modified stream 126.
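The two-block example can be reproduced numerically. The coefficient values below are invented for illustration (the description gives none); only the bit values, the choice of the lowest three of four bands, and the ±1 offsets come from the example. The helper name `mark` is hypothetical.

```python
from typing import Dict, List, Sequence

OFFSETS: Dict[int, float] = {0: -1.0, 1: 1.0}   # linear-add value per bit
N_MODIFIED = 3                                    # lowest three of four bands


def mark(coeffs: Sequence[float], bit: int) -> List[float]:
    """Add the bit's offset to the lowest N_MODIFIED coefficients only."""
    offset = OFFSETS[bit]
    return [c + offset if i < N_MODIFIED else c for i, c in enumerate(coeffs)]


block_a = [5.0, 3.0, -2.0, 0.5]     # hypothetical values for 411, 412, 413, 414
block_b = [4.0, -1.0, 2.0, -0.25]   # hypothetical values for 441, 442, 443, 444

set_434 = mark(block_a, 1)          # first watermark bit is "1"
set_464 = mark(block_b, 0)          # second watermark bit is "0"
print(set_434)                      # [6.0, 4.0, -1.0, 0.5]
print(set_464)                      # [3.0, -2.0, 1.0, -0.25]
```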
In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software comprises one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM), or other volatile or non-volatile memory device or devices, and the like. The executable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.
In this document, relational terms such as “first” and “second”, and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual relationship or order between such entities or actions or any actual relationship or order between such entities and claimed elements. The term “another”, as used herein, is defined as at least a second or more. The terms “including”, “having”, or any variation thereof, as used herein, are defined as comprising.
Other embodiments, uses, and advantages of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. The specification and drawings should be considered as examples only, and the scope of the disclosure is accordingly intended to be limited only by the following claims and equivalents thereof.
Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed.
Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.
Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims.
Number | Date | Country |
---|---|---|
20150154972 A1 | Jun 2015 | US |