Digital audio watermarking is a technique used to assist with enforcement of copyrights. It uses data hiding technology to embed messages within digital audio content that can later be recovered, but which ideally cannot be heard by humans when listening to the audio. However, hackers and pirates are aware of the use of watermarking and so may attempt to tamper with a watermark in a digital audio file, such as by attempting to over-write it with a different watermark or by copying the recording in a manner that erases or degrades the watermark. One such method is playing the audio through a speaker and recording the played audio into a different digital file. If a watermark is rendered unrecoverable, the intended authentication value for copyright enforcement may be reduced or lost.
Traditional methods of watermarking have multiple shortcomings: For example, multiple watermarks placed within the same segment of audio will interfere with each other, possibly rendering one of the watermarks unrecoverable (damaging the authentication value), and common techniques such as inserting bit sequences, often using lesser-significance bits, result in easily-damaged watermarks. The common trade-off with traditional methods of watermarking is that increasing robustness of authentication decreases transparency to the user, rendering the watermark potentially audible to humans and thereby degrading the user's listening experience.
The disclosed examples are described in detail below with reference to the accompanying drawing figures listed below. The following summary is provided to illustrate some examples disclosed herein. It is not meant, however, to limit all examples to any particular configuration or sequence of operations.
Solutions for authenticating digital audio include: receiving a digital audio file; generating a first watermark using a first key, wherein the first watermark is band-limited to a first bandwidth; generating a second watermark using a second key, wherein the second watermark is band-limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; embedding the first watermark into a segment of the digital audio file; and embedding the second watermark into the segment of the digital audio file.
Solutions for authenticating digital audio include: receiving a digital audio file; determining a first watermark score of a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band-limited to a first bandwidth; determining a second watermark score of the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band-limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; based on at least the first watermark score and the second watermark score, determining a probability that the digital audio file is watermarked; and based on at least determining the probability that the digital audio file is watermarked, generating a report indicating whether the digital audio file is watermarked. In some examples, solutions for authenticating digital audio may also embed and decode messages.
Corresponding reference characters indicate corresponding parts throughout the drawings.
The various examples will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made throughout this disclosure relating to specific examples and implementations are provided solely for illustrative purposes but, unless indicated to the contrary, are not meant to limit all examples.
Solutions for authenticating digital audio include: generating a first band-limited watermark using a first key, generating a second band-limited watermark using a second key, wherein the bandwidth of the second watermark does not overlap with the bandwidth of the first watermark; and embedding the first watermark and the second watermark into a segment of the digital audio file. Solutions also include determining a first watermark score of a segment of the digital audio file for the first watermark using the first key; determining a second watermark score of the segment of the digital audio file for the second watermark using the second key; based on at least the first watermark score and the second watermark score, determining a probability that the digital audio file is watermarked; and generating a report indicating whether the digital audio file is watermarked. In some examples, solutions for authenticating digital audio may also embed and decode messages.
Aspects of the disclosure operate in an unconventional manner by embedding multiple (different) watermarks within the same segment of a digital audio file, placing the watermarks into their own limited bandwidths within the segment. This technique permits the watermarks to co-exist without interference, thereby improving robustness, such as resistance to tampering. Aspects of the disclosure operate in an unconventional manner by detecting the multiple watermarks within the different bands of the same segment of the digital audio file. This technique improves the reliability of detecting the watermarks, thereby also improving robustness of the detection process, in the event that tampering had occurred.
A disclosed solution for watermark embedding and detection employs a watermark embedding module and a watermark detection module. Watermark keys are employed to synchronize parameters and to provide extra security. In some examples, a machine learning (ML) component using neural networks (NNs) is leveraged to enhance the robustness. By limiting the bandwidth of watermarks, multiple watermarks may be embedded into the same segment of digital audio without interference. The use of multiple different watermarking schemes within the same segment of digital audio improves the likelihood of detecting at least one of the watermarks, despite natural noise and distortion and even deliberate attacks (e.g., improves robustness). An example is disclosed that uses 6 kilohertz (KHz) to 8 KHz as the bandwidth for one watermark, and 3 KHz to 4 KHz as the bandwidth for a second watermark.
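As a rough illustration of how two band-limited watermarks can share one segment, the following sketch maps each band to transform-domain coefficient indices and checks that the two index sets are disjoint. The 48 KHz sampling rate, the 1024-sample frame length, and the helper name `band_to_bins` are assumptions made for illustration and do not come from the disclosure. The sketch relies on the standard relation that bin k of an N-point DCT-II lies near k·fs/(2N) Hz.

```python
def band_to_bins(lo_hz: int, hi_hz: int, fs_hz: int, n: int) -> range:
    """Return DCT-II bin indices whose frequency lies in [lo_hz, hi_hz].

    Bin k of an n-point DCT-II sits near k * fs_hz / (2 * n) Hz.
    """
    first = -(-lo_hz * 2 * n // fs_hz)  # ceiling division on integers
    last = hi_hz * 2 * n // fs_hz       # floor division on integers
    return range(max(first, 0), min(last, n - 1) + 1)


FS_HZ = 48_000     # assumed sampling rate
FRAME_LEN = 1024   # assumed frame length

# Lower band for the self-correlated watermark, upper band for spread spectrum.
sc_bins = band_to_bins(3_000, 4_000, FS_HZ, FRAME_LEN)
ss_bins = band_to_bins(6_000, 8_000, FS_HZ, FRAME_LEN)

# Non-overlapping bands mean the two watermarks modify different coefficients.
bands_are_disjoint = set(sc_bins).isdisjoint(ss_bins)
```

Because the two watermarks touch different coefficients under these assumptions, embedding one does not alter the coefficients carrying the other.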
Solutions may be used for audio books, music, and other classes of digital audio recordings in which imperceptibility (perceptual transparency) is important to users, such as for high quality audio. Versions have been tested and produced a mean opinion score (MOS) gap of less than 0.02 and a comparative mean opinion score (CMOS) gap of less than 0.05. Other advantages include low computational cost and low latency for real-time applications, and the flexibility to adjust to various sampling rates and quantization resolutions. Watermarks may be embedded into multiple digital audio formats, such as with sampling rates from 8 KHz to 48 KHz, quantization from 8 bits to 48 bits, and storage in WAV, PCM, OGG, MP3, OPUS, SILK, Siren, and other formats, including formats using lossy compression by codec.
Security is provided to be resistant to brute force cracking. For example, the use of two 96-bit keys is described, each providing a key space of 2^96 possible values. Robustness preserves performance against distortion or damage through transmission, replay and re-recording, noise, and even deliberate attacks. Versions have been tested successfully using noise levels ranging from −10 decibels (dB) up through 30 dB. Deliberate attacks that may be defeated by various examples of the disclosure include synchronization attacks, which adjust time sequential properties of the audio, such as making the time sequence faster or slower, swapping the order of some audio segments, or inserting other audio segments; signal processing attacks, such as low-pass filtering or high-pass filtering; and digital watermark attacks, which add new watermarks to attempt masking the original watermark(s). Robustness has been demonstrated to exceed 95% correct detections (combined precision and recall measurements) in real-world scenarios.
In some examples, a watermark message 110 is inserted into one of the watermarks for embedding into digital audio file 102 by watermark embedding module 300 and later extracted by watermark detection module 700. Watermark embedding module 300 is described in further detail in relation to
In general, there are three requirements for the performance of digital audio watermarking. The first is imperceptibility, also known as perceptual transparency, which is a requirement to ensure that the watermark is not heard by human ears. The second is robustness, which measures the stability of the watermark against distortion or damage during transmission. The third is security, which refers to the complexity of brute-force cracking the digital watermark. In general, the longer the key length, the higher the complexity, and the more secure the watermark.
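The relationship between key length and brute-force complexity can be made concrete with a short calculation. The sketch below is illustrative only: the attacker rate of one billion trials per second is a hypothetical figure, and the function names are not from the disclosure.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def key_space(bits: int) -> int:
    """Number of distinct keys of the given length."""
    return 1 << bits


def years_to_exhaust(bits: int, trials_per_second: float) -> float:
    """Time to try every key at a fixed (hypothetical) trial rate."""
    return key_space(bits) / trials_per_second / SECONDS_PER_YEAR


ASSUMED_RATE = 1e9  # hypothetical attacker speed, trials per second

years_96 = years_to_exhaust(96, ASSUMED_RATE)
years_64 = years_to_exhaust(64, ASSUMED_RATE)
```

Each additional key bit doubles the search space, so under the same assumed rate a 96-bit key takes 2^32 times longer to exhaust than a 64-bit key.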
Multiple watermarking schemes exist, such as a spread spectrum method, which spreads a pseudo-random sequence spectrum and then embeds it into the audio; a patchwork method that embeds a watermark into two dual channels of a data block; a quantization index modulation (QIM) method; a perceptual method; and a self-correlated method. The perceptual method improves the imperceptibility of the watermark by calculating a psychoacoustic model, while enhancing the robustness. The self-correlated method divides the audio into several data blocks of equal length. For example, two blocks are used for embedding different watermark vectors that are mutually orthogonal in a discrete cosine transform (DCT) domain. In the detection procedure, the existence of the watermark is estimated by calculating the self-correlation of the (watermarked) audio signal. The higher the correlation, the higher the probability of the self-correlated watermark being present.
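The spread spectrum idea can be sketched generically as follows: a key-seeded pseudo-random ±1 sequence is added to a vector of coefficients, and detection correlates the received coefficients with the same sequence. This is a textbook-style sketch, not the disclosed embedding; the function names, the Gaussian stand-in for host audio coefficients, and the deliberately exaggerated strength of 0.5 are all assumptions chosen so that the effect is easy to see.

```python
import random


def pn_sequence(seed: int, length: int) -> list[int]:
    """Key-seeded pseudo-random sequence of +1/-1 values."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(coeffs: list[float], seed: int, strength: float) -> list[float]:
    """Add the scaled pseudo-random sequence to the host coefficients."""
    pn = pn_sequence(seed, len(coeffs))
    return [c + strength * p for c, p in zip(coeffs, pn)]


def correlate(coeffs: list[float], seed: int) -> float:
    """Normalized correlation between the coefficients and the keyed sequence."""
    pn = pn_sequence(seed, len(coeffs))
    return sum(c * p for c, p in zip(coeffs, pn)) / len(coeffs)


# Stand-in for host coefficients (assumption: unit-variance Gaussian values).
host_rng = random.Random(0)
host = [host_rng.gauss(0.0, 1.0) for _ in range(4096)]

marked = embed(host, seed=1234, strength=0.5)

score_right_key = correlate(marked, seed=1234)  # near the embedding strength
score_wrong_key = correlate(marked, seed=9999)  # near zero
score_unmarked = correlate(host, seed=1234)     # near zero
```

With the correct key the correlation sits near the embedding strength, while a wrong key or unmarked audio yields a value near zero, which is what makes a keyed correlation usable as a watermark score.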
The self-correlated (SC) method is adopted in the lower frequency band (3-4 KHz) and is robust for reverberation scenes. The spread spectrum (SS) method is adopted in the higher frequency band (6-8 KHz) and is robust for additive noise scenes. The combination provides superior robustness over either used alone. In low frequencies, higher robustness may be achieved at the expense of imperceptibility, whereas in high frequencies, higher imperceptibility may be achieved at the expense of robustness. The self-correlated method is able to enhance imperceptibility at low frequencies. The spread spectrum method is able to enhance robustness at high frequencies. Spread spectrum watermark 410 is described in further detail in relation to
The excitation signal from LPC analysis component 302 is transformed by a DCT component 304. A self-correlated embedding 340 generates self-correlated watermark 510, as shown in
The strength of the watermarks is controlled by a psychoacoustic strength control 308, which determines the strength of the audio power in any segment of digital audio file 102 for which a watermark is to be embedded. The strength is controlled based on a psychoacoustic model that models the human auditory system. The strength is a multiplication factor for the watermark to ensure that the watermark energy remains beneath the threshold of human hearing. A masking curve is calculated from the input audio according to the psychoacoustic model, and a strength factor is determined to control the strength of watermark to ensure the energy of watermark is below the masking curve.
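One simple way to picture the strength factor is as the largest multiplier that keeps the watermark power in every frequency bin beneath the masking curve. The sketch below assumes a single global factor, a hypothetical safety margin, and made-up masking values; it does not reproduce the psychoacoustic model itself, which would supply `mask_power` from the input audio.

```python
import math


def strength_factor(watermark: list[float], mask_power: list[float],
                    margin: float = 0.9) -> float:
    """Largest global scale (times a safety margin) such that the scaled
    watermark power stays at or below the masking power in every bin."""
    limits = [math.sqrt(m) / abs(w)
              for w, m in zip(watermark, mask_power) if w != 0]
    return margin * min(limits) if limits else 0.0


# Illustrative values only: unit-amplitude watermark and an assumed masking curve.
watermark = [1.0, -1.0, 1.0, -1.0]
mask_power = [0.04, 0.01, 0.09, 0.16]

alpha = strength_factor(watermark, mask_power)
scaled = [alpha * w for w in watermark]
```

The bin with the lowest masking threshold determines the factor here; a per-bin or per-band factor would be an alternative design that uses more of the available masking headroom.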
An LPC synthesis component 312 completes the process to permit embedding self-correlated watermark 510 and spread spectrum watermark 410 into digital audio file 102 to produce watermarked digital audio file 104.
This process may be represented as:
where
This process may be represented as:
where
Operation 606 includes generating self-correlated watermark 510 (a second watermark) using watermark key 502 (a second key), wherein self-correlated watermark 510 is band-limited to bandwidth 201 (a second bandwidth). In some examples, self-correlated watermark 510 holds watermark message 110 (or another watermark message). In some examples, bandwidth 201 extends from 3 KHz to 4 KHz. Operation 608 includes embedding spread spectrum watermark 410 into digital audio file segment 200. Operation 610 includes embedding self-correlated watermark 510 into digital audio file segment 200. In some examples, the first bandwidth has a lower frequency limit above 5 KHz and the second bandwidth has an upper frequency limit below 5 KHz, so that the second bandwidth does not overlap with the first bandwidth.
In some examples, the first and second watermarks comprise different watermarking schemes, each selected from the list consisting of: a spread spectrum watermark, a self-correlated watermark, and a patchwork watermark. In some examples, the first watermark comprises a spread spectrum watermark and is band-limited to 6 KHz to 8 KHz. In some examples, the second watermark comprises a self-correlated watermark and is band-limited to 3 KHz to 4 KHz. In some examples, watermark key 402 comprises a first set of at least 96 bits. In some examples, watermark key 502 comprises a second set of at least 96 bits. In some examples, watermark key 502 has a different value than watermark key 402. In some examples, a key for a spread spectrum watermark comprises three 32-bit portions: a first portion functions as a PN generator seed, a second portion provides permutation information, and a third portion provides sign information. In some examples, a key for a self-correlated watermark comprises three 32-bit portions: a first portion functions as a position array, a second portion provides eigenvector information, and a third portion provides sign information.
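The three-portion key layout can be sketched as splitting a 96-bit integer into three 32-bit fields. The ordering of the fields within the key (most significant portion first) and the function names are assumptions; the disclosure states only that each key has three 32-bit portions with the listed roles.

```python
import secrets

MASK32 = 0xFFFFFFFF


def split_key(key: int) -> tuple[int, int, int]:
    """Split a 96-bit key into three 32-bit portions (assumed order: high to low)."""
    if not 0 <= key < (1 << 96):
        raise ValueError("key must fit in 96 bits")
    return (key >> 64) & MASK32, (key >> 32) & MASK32, key & MASK32


def new_key() -> int:
    """Generate a random 96-bit key from a cryptographically strong source."""
    return secrets.randbits(96)


# For a spread spectrum key, the portions would serve as the PN generator seed,
# the permutation information, and the sign information, respectively.
pn_seed, permutation, sign = split_key(0x11111111_22222222_33333333)
```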
In some examples, a third watermark (or more) may also be added into watermarked digital audio file segment 220. For example, a patchwork watermark may be used as the third watermark. Thus, in examples using a third watermark, operation 612 includes generating the third watermark using a third key. In some examples, the third watermark is band-limited to a third bandwidth. In some examples, the third bandwidth does not overlap with the first bandwidth or the second bandwidth. Operation 614 includes embedding the third watermark into digital audio file segment 200. Operation 616 includes distributing watermarked digital audio file 104.
The excitation signal from LPC analysis component 702 is transformed by a DCT component 704. A self-correlated watermark search 740 generates self-correlated watermark score 714, as shown in
In some examples, if watermark decision component 718 detects a watermark in watermarked digital audio file 104, ML component 1000 and a message decoder 720 output a recovered watermark message 110.
This scoring process may be represented as:
where pn is the correlation and BER denotes the bit error rate. BER varies from 0 (zero), if a watermark is detected without errors, to 50%, if there is no trace of a watermark (assuming an equal likelihood of a random bit giving a correct or incorrect result). It is possible to calculate the BER because the encoded watermark sequence is known. The closer the BER is to 0, the higher the probability of the watermark's presence. The closer the BER is to 50%, the lower the probability of the watermark's presence.
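The bit error rate itself is straightforward to compute once the encoded sequence is known: it is the fraction of decoded bits that differ from the expected bits. The following sketch uses made-up bit sequences and a hypothetical function name.

```python
def bit_error_rate(expected: list[int], decoded: list[int]) -> float:
    """Fraction of positions where the decoded bit differs from the expected bit."""
    if not expected or len(expected) != len(decoded):
        raise ValueError("sequences must be non-empty and of equal length")
    errors = sum(e != d for e, d in zip(expected, decoded))
    return errors / len(expected)


expected = [1, 0, 1, 1, 0, 0, 1, 0]   # known encoded watermark bits (illustrative)
clean = list(expected)                # watermark recovered without errors
garbled = [1, 1, 1, 0, 0, 1, 1, 1]    # half of the bits disagree

ber_clean = bit_error_rate(expected, clean)
ber_garbled = bit_error_rate(expected, garbled)
```

A result near 0 indicates that the expected watermark is present, while a result near 0.5 is what random, unwatermarked bits would produce on average.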
This scoring process may be represented as:
where C is a scalar constant.
According to equations (5) and (6), if no watermark is present, the self-correlation will remain at a low level. However, if a watermark is present, the self-correlation will include a constant value contributed by the watermark. This enables determination of whether a watermark is present.
In examples using a third watermark, operation 1108 includes determining the watermark score for digital audio file segment 220 for a third watermark using a third watermark key. Operation 1110 includes determining, using ML component 1000, ML watermark score 1010 (a third watermark score) of digital audio file segment 220. In some examples, ML component 1000 comprises feature extraction network 1002 and classification network 1006. In some examples, ML component 1000 further comprises decoder network 1020.
Operation 1112 includes, based on at least spread spectrum watermark score 716 and self-correlated watermark score 714, determining a probability that watermarked digital audio file 104 is watermarked. In some examples, determining the probability that watermarked digital audio file 104 is watermarked comprises, based on at least spread spectrum watermark score 716, self-correlated watermark score 714, and the watermark score for the third watermark, determining the probability that watermarked digital audio file 104 is watermarked. In some examples, determining the probability that watermarked digital audio file 104 is watermarked comprises, based on at least spread spectrum watermark score 716, self-correlated watermark score 714, and ML watermark score 1010, determining the probability that watermarked digital audio file 104 is watermarked.
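The text does not specify how the individual scores are combined into a probability. One common choice, shown here purely as an assumption, is a weighted sum passed through a logistic function. The weights, the bias, and the example score values are hypothetical and would need to be fitted to labeled data in practice.

```python
import math


def watermark_probability(scores: list[float], weights: list[float],
                          bias: float) -> float:
    """Logistic fusion of per-watermark scores into one probability (assumed model)."""
    if len(scores) != len(weights):
        raise ValueError("one weight is required per score")
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for the spread spectrum, self-correlated, and ML scores.
WEIGHTS = [4.0, 4.0, 2.0]
BIAS = -5.0

p_strong = watermark_probability([0.9, 0.8, 0.95], WEIGHTS, BIAS)
p_weak = watermark_probability([0.1, 0.05, 0.2], WEIGHTS, BIAS)
```

A fusion of this kind lets a strong score in one band compensate for a degraded score in another, which matches the motivation for using several watermarks in the same segment.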
Decision operation 1114 determines whether to report the received digital audio file as watermark found or watermark not found. If not found, watermark report 108 indicates that no watermark was found, in operation 1116. Otherwise, operation 1118 includes, based on at least determining the probability that watermarked digital audio file 104 is watermarked, generating watermark report 108 indicating that digital audio file 102 is watermarked. In some examples, a hard decision (decision operation 1114) may not be used, and operation 1118 merely reports the probability. Together, operations 1116 and 1118 include generating watermark report 108 indicating whether digital audio file 102 is watermarked. If a watermark is detected, operation 1120 includes determining, using ML component 1000, the decoded watermark message 110.
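The reporting step, with and without a hard decision, can be sketched as follows. The `WatermarkReport` structure, the field names, and the 0.5 threshold are assumptions for illustration; the disclosure does not prescribe a report format or a threshold value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WatermarkReport:
    probability: float
    watermarked: Optional[bool]  # None when no hard decision is made


def make_report(probability: float, threshold: float = 0.5,
                hard_decision: bool = True) -> WatermarkReport:
    """Build a report, optionally thresholding the probability into found/not found."""
    if not hard_decision:
        return WatermarkReport(probability, None)
    return WatermarkReport(probability, probability >= threshold)


found = make_report(0.97)
not_found = make_report(0.03)
soft = make_report(0.62, hard_decision=False)
```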
An example method of authenticating digital audio comprises: receiving a digital audio file; generating a first watermark using a first key, wherein the first watermark is band-limited to a first bandwidth; generating a second watermark using a second key, wherein the second watermark is band-limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; embedding the first watermark into a segment of the digital audio file; and embedding the second watermark into the segment of the digital audio file.
An example system for authenticating digital audio comprises: a processor; and a computer-readable medium storing instructions that are operative upon execution by the processor to: receive a digital audio file; generate a first watermark using a first key, wherein the first watermark is band-limited to a first bandwidth; generate a second watermark using a second key, wherein the second watermark is band-limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; embed the first watermark into a segment of the digital audio file; and embed the second watermark into the segment of the digital audio file.
One or more example computer storage devices have computer-executable instructions stored thereon, which, on execution by a computer, cause the computer to perform operations comprising: receiving a digital audio file; generating a first watermark using a first key, wherein the first watermark is band-limited to a first bandwidth; generating a second watermark using a second key, wherein the second watermark is band-limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; embedding the first watermark into a segment of the digital audio file; and embedding the second watermark into the segment of the digital audio file.
An example method of authenticating digital audio comprises: receiving a digital audio file; determining a first watermark score of a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band-limited to a first bandwidth; determining a second watermark score of the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band-limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; based on at least the first watermark score and the second watermark score, determining a probability that the digital audio file is watermarked; and based on at least determining the probability that the digital audio file is watermarked, generating a report indicating whether the digital audio file is watermarked.
An example system for authenticating digital audio comprises: a processor; and a computer-readable medium storing instructions that are operative upon execution by the processor to: receive a digital audio file; determine a first watermark score of a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band-limited to a first bandwidth; determine a second watermark score of the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band-limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; based on at least the first watermark score and the second watermark score, determine a probability that the digital audio file is watermarked; and based on at least determining the probability that the digital audio file is watermarked, generate a report indicating whether the digital audio file is watermarked.
One or more example computer storage devices have computer-executable instructions stored thereon, which, on execution by a computer, cause the computer to perform operations comprising: receiving a digital audio file; determining a first watermark score of a segment of the digital audio file for a first watermark using a first key, wherein the first watermark is band-limited to a first bandwidth; determining a second watermark score of the segment of the digital audio file for a second watermark using a second key, wherein the second watermark is band-limited to a second bandwidth, and wherein the second bandwidth does not overlap with the first bandwidth; based on at least the first watermark score and the second watermark score, determining a probability that the digital audio file is watermarked; and based on at least determining the probability that the digital audio file is watermarked, generating a report indicating whether the digital audio file is watermarked.
Alternatively, or in addition to the other examples described herein, examples include any combination of the following:
While the aspects of the disclosure have been described in terms of various examples with their associated operations, a person skilled in the art would appreciate that a combination of operations from any number of different examples is also within scope of the aspects of the disclosure.
Computing device 1400 includes a bus 1410 that directly or indirectly couples the following devices: computer-storage memory 1412, one or more processors 1414, one or more presentation components 1416, I/O ports 1418, I/O components 1420, a power supply 1422, and a network component 1424. While computing device 1400 is depicted as a seemingly single device, multiple computing devices 1400 may work together and share the depicted device resources. For example, memory 1412 may be distributed across multiple devices, and processor(s) 1414 may be housed with different devices.
Bus 1410 represents what may be one or more busses (such as an address bus, data bus, or a combination thereof). Although the various blocks of
In some examples, memory 1412 includes computer-storage media in the form of volatile and/or nonvolatile memory, removable or non-removable memory, data disks in virtual environments, or a combination thereof. Memory 1412 may include any quantity of memory associated with or accessible by the computing device 1400. Memory 1412 may be internal to the computing device 1400 (as shown in
Processor(s) 1414 may include any quantity of processing units that read data from various entities, such as memory 1412 or I/O components 1420. Specifically, processor(s) 1414 are programmed to execute computer-executable instructions for implementing aspects of the disclosure. The instructions may be performed by the processor, by multiple processors within the computing device 1400, or by a processor external to the client computing device 1400. In some examples, the processor(s) 1414 are programmed to execute instructions such as those illustrated in the flow charts discussed below and depicted in the accompanying drawings. Moreover, in some examples, the processor(s) 1414 represent an implementation of analog techniques to perform the operations described herein. For example, the operations may be performed by an analog client computing device 1400 and/or a digital client computing device 1400. Presentation component(s) 1416 present data indications to a user or other device. Exemplary presentation components include a display device, speaker, printing component, vibrating component, etc. One skilled in the art will understand and appreciate that computer data may be presented in a number of ways, such as visually in a graphical user interface (GUI), audibly through speakers, wirelessly between computing devices 1400, across a wired connection, or in other ways. I/O ports 1418 allow computing device 1400 to be logically coupled to other devices including I/O components 1420, some of which may be built in. Example I/O components 1420 include, for example but without limitation, a microphone, joystick, game pad, satellite dish, scanner, printer, wireless device, etc.
The computing device 1400 may operate in a networked environment via the network component 1424 using logical connections to one or more remote computers. In some examples, the network component 1424 includes a network interface card and/or computer-executable instructions (e.g., a driver) for operating the network interface card. Communication between the computing device 1400 and other devices may occur using any protocol or mechanism over any wired or wireless connection. In some examples, network component 1424 is operable to communicate data over public, private, or hybrid (public and private) networks using a transfer protocol, between devices wirelessly using short range communication technologies (e.g., near-field communication (NFC), Bluetooth™ branded communications, or the like), or a combination thereof. Network component 1424 communicates over wireless communication link 1426 and/or a wired communication link 1426a to a cloud resource 1428 across network 1430. Various different examples of communication links 1426 and 1426a include a wireless connection, a wired connection, and/or a dedicated link, and in some examples, at least a portion is routed through the internet.
Although described in connection with an example computing device 1400, examples of the disclosure are capable of implementation with numerous other general-purpose or special-purpose computing system environments, configurations, or devices. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with aspects of the disclosure include, but are not limited to, smart phones, mobile tablets, mobile computing devices, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, gaming consoles, microprocessor-based systems, set top boxes, programmable consumer electronics, mobile telephones, mobile computing and/or communication devices in wearable or accessory form factors (e.g., watches, glasses, headsets, or carphones), network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, virtual reality (VR) devices, augmented reality (AR) devices, mixed reality (MR) devices, holographic device, and the like. Such systems or devices may accept input from the user in any way, including from input devices such as a keyboard or pointing device, via gesture input, proximity input (such as by hovering), and/or via voice input.
Examples of the disclosure may be described in the general context of computer-executable instructions, such as program modules, executed by one or more computers or other devices in software, firmware, hardware, or a combination thereof. The computer-executable instructions may be organized into one or more computer-executable components or modules. Generally, program modules include, but are not limited to, routines, programs, objects, components, and data structures that perform particular tasks or implement particular abstract data types. Aspects of the disclosure may be implemented with any number and organization of such components or modules. For example, aspects of the disclosure are not limited to the specific computer-executable instructions or the specific components or modules illustrated in the figures and described herein. Other examples of the disclosure may include different computer-executable instructions or components having more or less functionality than illustrated and described herein. In examples involving a general-purpose computer, aspects of the disclosure transform the general-purpose computer into a special-purpose computing device when configured to execute the instructions described herein.
By way of example and not limitation, computer readable media comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable memory implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or the like. Computer storage media are tangible and mutually exclusive to communication media. Computer storage media are implemented in hardware and exclude carrier waves and propagated signals. Computer storage media for purposes of this disclosure are not signals per se. Exemplary computer storage media include hard disks, flash drives, solid-state memory, phase change random-access memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that may be used to store information for access by a computing device. In contrast, communication media typically embody computer readable instructions, data structures, program modules, or the like in a modulated data signal such as a carrier wave or other transport mechanism and include any information delivery media.
The order of execution or performance of the operations in examples of the disclosure illustrated and described herein is not essential, and the operations may be performed in different sequential manners in various examples. For example, it is contemplated that executing or performing a particular operation before, contemporaneously with, or after another operation is within the scope of aspects of the disclosure. When introducing elements of aspects of the disclosure or the examples thereof, the articles “a,” “an,” “the,” and “said” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. The term “exemplary” is intended to mean “an example of.” The phrase “one or more of the following: A, B, and C” means “at least one of A and/or at least one of B and/or at least one of C.”
Having described aspects of the disclosure in detail, it will be apparent that modifications and variations are possible without departing from the scope of aspects of the disclosure as defined in the appended claims. As various changes could be made in the above constructions, products, and methods without departing from the scope of aspects of the disclosure, it is intended that all matter contained in the above description and shown in the accompanying drawings shall be interpreted as illustrative and not in a limiting sense.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/092281 | 5/8/2021 | WO | |