This disclosure relates to watermarks for digital files, and in particular, digital audio files.
According to the International Federation of the Phonographic Industry (IFPI), in 2015 digital music sales became the leading revenue stream, generating around US $6.7 billion globally, with a projection of US $20 billion by 2020. This growth is a result of Internet advances in the distribution of digital content, including multimedia. Unfortunately, this progress also creates an unprecedented challenge for authenticating the resulting several billion instances of licensed audio content, mainly distributed via the Internet. One associated business scenario considered in this disclosure is the tracking of audio copies broadcast on web radio, with a requirement to identify both the audio master title and the owner of the particular audio copy being played.
Digital watermarking is a well-known solution for audio tracking and authentication. It includes embedding hidden inaudible data into host audio. Several algorithms have been proposed in the literature and some of these algorithms are in current use in commercial services such as NexGuard, MusicTrace, and the like. However, such existing techniques rely on embedding a unique watermark payload in every distributed audio copy. With several billion copies of audio content to be tracked, the resulting number of bits required to encode all potential unique watermarks is very large. Such large payloads increase the risk that audible distortion will result from the watermark having been embedded in the copy. This problem has stimulated strong research interest around “high payload audio watermarking.”
As a result, there is a long-felt need for improved watermarking technology which lowers the risk of problems such as audible distortion.
This disclosure presents a new watermarking concept that exploits audio fingerprinting in order to reuse the same watermark payloads between audio copies originating from different audio masters. This is achieved by using the fingerprints of an audio master to derive unique watermarking zones for its associated copies, thereby obviating the need to add overhead synchronization bits to locate watermark positions. A shorter watermark payload enables a higher repetition rate of the watermark within the host media, and simulations have shown the present methods to be robust against typical audio attacks such as MP3 compression, cropping, jittering, and zero insertion.
For a fuller understanding of the nature and objects of the disclosure, reference should be made to the following detailed description taken in conjunction with the accompanying drawings, in which:
In computer science, fingerprinting is a procedure that maps an arbitrarily large data item (such as a computer file) to a much shorter bit string, its fingerprint, that uniquely identifies the original data. For an audio signal, such as an audio file, an acoustic fingerprint is a condensed digital summary, deterministically generated from the audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database.
A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as an audio signal. “Watermarking” is the process of hiding digital information in a carrier signal. Digital watermarks may be used to verify the authenticity or integrity of the carrier signal or to show the identity of its owners.
For the purposes of the present disclosure, an audio master is an audio file (e.g., song or any other audio sample) in its original format, without any watermark. An audio copy is a copy of an audio master, where the copy includes an embedded watermark. Two different copies will have the same carrier signal (e.g., song) but different watermarks. A clone is an exact copy of an audio file, including any embedded signal. Two clones are identical and do not differ in any aspect from a signal point of view.
Although claimed subject matter will be described in terms of certain embodiments, other embodiments, including embodiments that do not provide all of the benefits and features set forth herein, are also within the scope of this disclosure. Various structural, logical, process step, and electronic changes may be made without departing from the scope of the disclosure.
Fingerprinting may include extracting features and/or patterns from a known audio signal and storing the features and/or patterns, associated with the known audio signal, in a database. The database may then be queried to identify an unknown audio signal by matching the fingerprints of this unknown signal with those already stored in the database. Fingerprinting cannot distinguish audio copies of the same audio master because the audio copies will have similar fingerprints. However, fingerprinting is advantageous in that information about the audio signal can be retrieved without the need to embed data into the signal—i.e., an empty watermark payload.
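By way of illustration only, the database query described above can be sketched as a simple voting scheme. The sketch assumes that each fingerprint has already been reduced to an integer hash; the names `build_index`, `identify`, and `min_matches`, and the hash values themselves, are hypothetical and are not taken from this disclosure.

```python
from collections import Counter

def build_index(master_fingerprints):
    """Map each fingerprint hash to the set of master IDs that contain it."""
    index = {}
    for master_id, hashes in master_fingerprints.items():
        for h in hashes:
            index.setdefault(h, set()).add(master_id)
    return index

def identify(index, unknown_hashes, min_matches=2):
    """Return the master ID sharing the most hashes with the unknown audio,
    or None when fewer than min_matches hashes agree (assumed threshold)."""
    votes = Counter()
    for h in set(unknown_hashes):
        for master_id in index.get(h, ()):
            votes[master_id] += 1
    if not votes:
        return None
    master_id, count = votes.most_common(1)[0]
    return master_id if count >= min_matches else None

# Hypothetical fingerprint hashes for two audio masters.
masters = {
    "m_ID1": [0x1A2B, 0x3C4D, 0x5E6F, 0x7081],
    "m_ID2": [0x9999, 0x3C4D, 0xABCD, 0xEF01],
}
index = build_index(masters)
```

Because every copy of a given master yields essentially the same hashes, such a query returns the same master ID for all of them, which is why fingerprinting alone cannot tell copies apart.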
In the presently-disclosed approach, non-unique watermarks are used in conjunction with fingerprinting to reduce the number of bits necessary to encode the watermark payload. A shorter watermark yields two main advantages. First, the risk of audibility is lower (the risk that the embedded watermark will be noticeable to a listener). Second, the watermark may be more frequently repeated within the audio signal to improve the watermark extraction robustness by aggregating the watermark signal across several frames. Practically speaking, the present solution collects fingerprints of audio masters and uses these fingerprints to derive unique zones for the corresponding audio master, where the zones are used for placing watermarks in related copies. Additionally, by positioning watermarks based on fingerprints, there is no need to include overhead synchronization bits to locate watermark positions.
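The payload saving can be made concrete with a short calculation. The figures below (four billion copies overall, at most 1,000 copies per master, and room for 320 embedded bits in a given audio file) are assumptions chosen only for illustration, and the helper `bits_needed` is hypothetical.

```python
def bits_needed(n_values):
    """Smallest number of bits that can index n_values distinct payloads."""
    return max(1, (n_values - 1).bit_length())

# Assumed figures, for illustration only.
total_copies = 4_000_000_000   # every copy gets a globally unique payload
copies_per_master = 1_000      # payload only has to be unique per master
capacity_bits = 320            # bits that fit in the host audio

unique_bits = bits_needed(total_copies)
reused_bits = bits_needed(copies_per_master)

# How many times each payload can be repeated within the same capacity.
unique_repeats = capacity_bits // unique_bits
reused_repeats = capacity_bits // reused_bits
```

Under these assumptions a globally unique payload needs 32 bits and can be repeated 10 times, whereas a payload that is unique only within its master needs 10 bits and can be repeated 32 times.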
The presently-disclosed methods are advantageous in various respects, including:
It should be noted that the above-described information may be housed in a single database file or in more than one database file (in which case, the database comprises multiple databases). For example, the database of information may be embodied in three separate databases: the Audio Master Fingerprints database, the Audio Master Metadata database, and the Audio Copy Metadata database. For convenience, the remainder of this disclosure will refer to this exemplary embodiment having three separate databases, but the scope should not be limited to only this embodiment.
Taking as input the ith audio master signal, ami(t), the role of the fingerprint encoder is to provide both the master ID (m_IDi) and the vector of its fingerprints (
The role of the watermark encoder is to create the audio copy signal aci,k(t), denoting the kth audio copy of the ith master. This copy includes an embedded watermark payload,
A watermark payload is created 112 based on the master ID and using copy metadata retrieved from a database. For example, by using the master ID m_IDi of the ith audio master, the number of existing copies, denoted by nci, can be retrieved 140 from the Audio Copy Metadata Database. A watermark payload
Watermarking positions (i.e., zones, represented as vector
For the ith audio master, the watermarking zones form the vector $[(t_{i,1}, f_{i,1}), (t_{i,2}, f_{i,2}), \ldots, (t_{i,N_z}, f_{i,N_z})]$,
where the value Nz is the number of watermarking zones and represents the targeted repetition rate of the watermark payload within the audio copy to be generated.
It is also noted that by seeding the pseudorandom number sequence with m_IDi, a deterministic (i.e., reproducible) random sequence can be generated for that particular audio master. Thus, during the watermark extraction operation, once the audio master associated with the unknown copy under analysis has been recognized, it is possible to reconstruct this sequence of original watermarking positions exactly.
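The reproducibility argument can be sketched as follows. This is a minimal illustration, assuming that the fingerprints are available as (time, frequency) pairs and that zones are chosen by sampling without replacement; the function name `select_zones`, the sampling rule, and the example values are assumptions rather than the disclosed procedure.

```python
import random

def select_zones(fingerprints, master_id, n_zones):
    """Pick n_zones (time, frequency) pairs from the master's fingerprints
    using a pseudorandom generator seeded with the master ID."""
    rng = random.Random(master_id)      # same seed gives the same sequence
    chosen = rng.sample(fingerprints, n_zones)
    return sorted(chosen)               # order zones by time for convenience

# Hypothetical fingerprints: (time in seconds, frequency in Hz).
fingerprints = [(0.5 * n, 500.0 + 125.0 * (n % 8)) for n in range(40)]

zones_at_embedding = select_zones(fingerprints, "m_ID1", n_zones=6)
zones_at_detection = select_zones(fingerprints, "m_ID1", n_zones=6)
```

Both calls return the same six zones, so the detector needs only the recognized master ID and the stored fingerprints to find where the watermark was placed.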
The created 112 watermark payload is then embedded 115 in the audio signal according to the generated 109 watermark zones. In this way, a watermarked audio copy of the audio signal is created.
From the set of time-frequency watermarking positions,
where:
The generated hybrid FH/TH carrier pi(t) is modulated 153 by a pseudo-noise sequence to yield a spread spectrum hybrid FH/TH carrier qi(t). The latter is then modulated 156 by a watermark baseband signal wk(t) to yield a radio frequency (RF) watermark signal. The kth audio copy of the ith master is obtained by (adding the RF watermark to the audio signal 159):
$ac_{i,k}(t) = am_i(t) + w_k(t) \cdot q_i(t)$ (3)
By spreading the spectrum of the watermark payload signal, the latter is hidden in the host audio signal (i.e., is made imperceptible). Furthermore, this spreading process will enable the recovery of the watermark payload signal from the audio copy signal during the watermark detection process explained below.
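The spreading and recovery principle can be illustrated with a simplified baseband sketch. It omits the hybrid FH/TH carrier entirely, uses a synthetic random host in place of real audio, and adds a `strength` gain that does not appear in equation (3); all function names and parameter values are assumptions made for the sake of a runnable example.

```python
import random

def pn_sequence(length, seed):
    """Pseudo-noise chips of value +1 or -1, reproducible from the seed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(host, bits, pn, chips_per_bit, strength=0.2):
    """Add the spread watermark to the host: host + strength * w * pn."""
    out = list(host)
    for n in range(len(bits) * chips_per_bit):
        w = 1.0 if bits[n // chips_per_bit] else -1.0
        out[n] += strength * w * pn[n]
    return out

def extract(signal, pn, n_bits, chips_per_bit):
    """Despread by correlating each bit interval with the same PN chips."""
    bits = []
    for b in range(n_bits):
        start = b * chips_per_bit
        corr = sum(signal[n] * pn[n]
                   for n in range(start, start + chips_per_bit))
        bits.append(1 if corr > 0 else 0)
    return bits

payload = [1, 0, 1, 1, 0, 0, 1, 0]
chips = 1024
host_rng = random.Random(7)
host = [host_rng.uniform(-1.0, 1.0) for _ in range(len(payload) * chips)]
pn = pn_sequence(len(host), seed=42)
copy = embed(host, payload, pn, chips)
recovered = extract(copy, pn, len(payload), chips)
```

Each watermark sample is several times weaker than the host, yet correlating over 1,024 chips makes the watermark term add coherently while the host term averages out, which is the property relied upon during detection.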
Consider an unknown audio signal, denoted by ua(t), that has to be verified. This audio may result from a previously generated audio copy embedding a fingerprint-based watermark. It may also have been modified during distribution by one or more audio attacks such as MP3 compression, cropping, jittering, zero insertion, additive white Gaussian noise (AWGN), and so on. An exemplary process flow for detecting a possible embedded fingerprint-based watermark is shown in
Its main purpose is to identify which audio master is associated with the unknown audio. Therefore, the fingerprints of the latter, denoted by
An exemplary watermark decoder process 200 involves three main steps described below.
Reconstruct original watermarking zones. This operation is similar to generating 209 watermarking zones (see above) during the process of embedding a watermark. Using m_IDj to initialize the seed state, a pseudorandom number sequence is generated 220 and then used 223 to select a subset of
For the recognized jth audio master, the reconstructed watermarking zones form the vector $[(t_{j,1}, f_{j,1}), (t_{j,2}, f_{j,2}), \ldots, (t_{j,N_z}, f_{j,N_z})]$
Watermark Extraction. The watermark extraction 212 operation is presented in
Using the index of the first watermark zone, ns, and the index of the last watermark zone, nf, the vector of useful watermarking positions is represented by:
$[(t_{j,n_s}, f_{j,n_s}), \ldots, (t_{j,n_f}, f_{j,n_f})]$
Thus, the resulting hybrid FH/TH carrier is given by the following expression
Note that taking the time delay r into account in the carrier expression can be interpreted as a coarse synchronization between the carrier and the unknown audio.
Next, the reconstructed carrier is modulated 233 by the same pseudo-noise sequence that was used for generating copies, in order to get the spread spectrum FH/TH carrier q′j(t). The latter is used to fine-tune the synchronization 236 between the carrier and the unknown audio by cross-correlating both signals. The synchronized unknown audio is then demodulated 239 using this spread spectrum FH/TH carrier q′j(t) to get a baseband watermark signal w(t). The signal is then fed into a set of time-domain filters 242, the number of which is equal to the number of watermark positions found in the unknown audio. Each filter is defined by the time position of each watermarking position in
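The fine synchronization by cross-correlation can be sketched as a search for the lag that maximizes the correlation between the unknown audio and the reference spread spectrum carrier. The sketch below works on a synthetic signal; the function name `best_lag`, the search range, and all numeric values are illustrative assumptions.

```python
import random

def best_lag(signal, reference, max_lag):
    """Return the lag in [0, max_lag] at which the reference correlates
    most strongly with the signal."""
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * signal[lag + n] for n, r in enumerate(reference))
        if score > best_score:
            best, best_score = lag, score
    return best

rng = random.Random(3)
reference = [rng.choice((-1.0, 1.0)) for _ in range(512)]  # stand-in carrier
true_delay = 37
max_lag = 100

# Synthetic unknown audio: random host plus the delayed, attenuated carrier.
signal = [rng.uniform(-1.0, 1.0) for _ in range(len(reference) + max_lag)]
for n, r in enumerate(reference):
    signal[true_delay + n] += 0.5 * r

estimated_delay = best_lag(signal, reference, max_lag)
```

A coarse estimate of the delay keeps `max_lag` small, so the exhaustive search shown here stays cheap; for long signals an FFT-based correlation would be a natural alternative.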
Finally, the different watermark payloads extracted from different frames may be aggregated 245 to get the maximum likelihood watermark payload
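One simple way to aggregate repeated payloads is a bit-wise majority vote, which is the maximum likelihood decision when bit errors are assumed independent, equally likely in every repetition, and less probable than one half. The aggregation rule is not specified in the text above, so the function `aggregate` and its tie-breaking choice are assumptions.

```python
def aggregate(payloads):
    """Bit-wise majority vote over equally long extracted payloads.
    Ties (possible with an even number of payloads) resolve to 0."""
    n_bits = len(payloads[0])
    result = []
    for i in range(n_bits):
        ones = sum(p[i] for p in payloads)
        result.append(1 if 2 * ones > len(payloads) else 0)
    return result

# Three noisy extractions of the same 4-bit payload (hypothetical values).
extracted = [
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
estimate = aggregate(extracted)
```

Each of the three extractions contains at most one wrong bit, and the vote recovers the payload as long as a majority of repetitions agree on every bit position.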
Parse Copy Information. Using the recognized master ID, m_IDj, together with the copy number, information about the identified audio copy, such as the master title and the copy owner, is obtained from the Audio Master Metadata and Audio Copy Metadata databases.
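The final lookup can be sketched with two in-memory tables standing in for the metadata databases. The record layout, field names, and example entries are hypothetical; the point is only that the pair (master ID, copy number) serves as the key, so the same copy number can be reused under different masters.

```python
# Stand-ins for the Audio Master Metadata and Audio Copy Metadata databases.
audio_master_metadata = {
    "m_ID7": {"title": "Example Title A"},
    "m_ID8": {"title": "Example Title B"},
}
audio_copy_metadata = {
    ("m_ID7", 3): {"owner": "Example Owner X"},
    ("m_ID8", 3): {"owner": "Example Owner Y"},  # same copy number, other master
}

def parse_copy_information(master_id, copy_number):
    """Combine the recognized master ID and the extracted copy number
    into a single record, or return None if either lookup fails."""
    master = audio_master_metadata.get(master_id)
    copy = audio_copy_metadata.get((master_id, copy_number))
    if master is None or copy is None:
        return None
    return {"master_title": master["title"], "copy_owner": copy["owner"]}

info = parse_copy_information("m_ID7", 3)
```

Copy number 3 resolves to a different owner under each master, which is how a non-unique payload still identifies a unique copy once the master has been recognized.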
Although the present disclosure has been described with respect to one or more particular embodiments, it will be understood that other embodiments of the present disclosure may be made without departing from the spirit and scope of the present disclosure. Hence, the present disclosure is deemed limited only by the appended claims and the reasonable interpretation thereof.
This application claims priority to U.S. Provisional Application No. 62/508,727, filed on May 19, 2017, now pending, the disclosure of which is incorporated herein by reference.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/IB2018/000644 | 5/21/2018 | WO | 00

Number | Date | Country
---|---|---
62508727 | May 2017 | US