The present invention relates generally to systems and methods for processing data, and specifically to encoding information into data content.
Media content transmitted over communication networks is generally subject to copyright. The copyright imposes strict legal limits on the ways in which subscribers are allowed to use the content that they receive from the network. For example, in addition to viewing video programs while they are broadcast, subscribers may be allowed to record the programs for their own use, but they are typically not permitted to distribute recorded copies of the programs. Notwithstanding these legal limitations, however, unauthorized copying and distribution of media content remains a major problem.
In order to identify unauthorized copies and possibly to detect their source, content distributors sometimes embed a digital watermark in each copy of encoded media data that they distribute. Such a watermark typically comprises encoded data that are added to digital content (such as audio, images, or video) in a manner that is difficult for unauthorized parties to detect or remove, but can readily be read out by an authorized party with the appropriate tools.
Various digital watermarking techniques are known in the art. For example, PCT International Publication WO 2010/143026 describes a method and system for embedding a watermark in block-encrypted content. The method includes encoding a bit string of n bits, denoted b0-bn-1, by translating each bit into a block of data, according to the following rule: If bi=1, then translate bi into a block of data of a first type, and if bi=0, then translate bi into a block of data of a second type, thereby translating the n bits into n blocks of data corresponding to each bit b0-bn-1. A composite block of data is arranged to include the n blocks of data and at least one additional block indicating the presence of the n blocks of data. The composite block of data is inserted into a content item as a watermark, wherein the watermarked content item is encrypted using an electronic code-book (ECB) mode of encryption.
Embodiments of the present invention that are described hereinbelow provide improved methods, apparatus and software for digital watermarking.
There is therefore provided, in accordance with an embodiment of the present invention, a method for processing data, which includes encoding a string of symbols, each having a respective symbol value, as a sequence of vectors. Each vector includes a respective number of repetitions of a sub-vector of a predefined length, such that the respective number of the repetitions in each vector in the sequence is indicative of the respective symbol value of a corresponding symbol in the string. A watermark is applied to an item of content including digital data by inserting the sequence of the vectors into the data.
The length of the sub-vector may be chosen to be an integer divisor of a block size of a block cipher that is to be applied to the item of content after application of the watermark thereto.
In some embodiments, inserting the sequence of the vectors includes interleaving the vectors with gaps of known lengths containing arbitrary data. At least two of the gaps may have different, respective lengths. Additionally or alternatively, at least two of the vectors include different, respective sub-vectors of the predefined length.
In a disclosed embodiment, the symbols include bits, and each of the vectors corresponding to a zero bit includes a first number of the repetitions, while each of the vectors corresponding to a one bit includes a second number of the repetitions, which is different from the first number.
In one embodiment, inserting the sequence includes inserting into the data, prior to the sequence of the vectors, a marker including a concatenation of a predetermined number of copies of a marker vector.
There is also provided, in accordance with an embodiment of the present invention, a method for processing data, which includes receiving ciphertext generated by applying a block cipher to an item of content including digital data to which a watermark has been applied by encoding a string of symbols, each having a respective symbol value, as a sequence of vectors, each vector including a respective number of repetitions of a sub-vector of a predefined length, such that the respective number of the repetitions in each vector in the sequence is indicative of the respective symbol value of a corresponding symbol in the string, and inserting the sequence of the vectors into the digital data. The ciphertext is analyzed to extract the watermark.
In a disclosed embodiment, analyzing the ciphertext includes identifying and counting recurrences of patterns occurring in the ciphertext, and decoding each of the symbols in the string based on a respective count of the recurrences in the ciphertext.
There is additionally provided, in accordance with an embodiment of the present invention, apparatus for processing data, including a memory, configured to hold a string of symbols, each having a respective symbol value. A processor is configured to encode the string of the symbols as a sequence of vectors. Each vector includes a respective number of repetitions of a sub-vector of a predefined length, such that the respective number of the repetitions in each vector in the sequence is indicative of the respective symbol value of a corresponding symbol in the string. The processor is configured to apply a watermark to an item of content including digital data by inserting the sequence of the vectors into the data.
There is further provided, in accordance with an embodiment of the present invention, apparatus for processing data, including an interface, which is coupled to receive ciphertext generated by applying a block cipher to an item of content including digital data to which a watermark has been applied by encoding a string of symbols, each having a respective symbol value, as a sequence of vectors, each vector including a respective number of repetitions of a sub-vector of a predefined length, such that the respective number of the repetitions in each vector in the sequence is indicative of the respective symbol value of a corresponding symbol in the string, and inserting the sequence of the vectors into the digital data. A processor is configured to analyze the ciphertext in order to extract the watermark.
There is moreover provided, in accordance with an embodiment of the present invention, a computer software product, including a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to encode a string of symbols, each having a respective symbol value, as a sequence of vectors, each vector including a respective number of repetitions of a sub-vector of a predefined length, such that the respective number of the repetitions in each vector in the sequence is indicative of the respective symbol value of a corresponding symbol in the string, and cause the computer to apply a watermark to an item of content including digital data by inserting the sequence of the vectors into the data.
There is furthermore provided, in accordance with an embodiment of the present invention, a computer software product, including a computer-readable medium in which program instructions are stored, which instructions, when read by a computer, cause the computer to receive ciphertext generated by applying a block cipher to an item of content including digital data to which a watermark has been applied by encoding a string of symbols, each having a respective symbol value, as a sequence of vectors, each vector including a respective number of repetitions of a sub-vector of a predefined length, such that the respective number of the repetitions in each vector in the sequence is indicative of the respective symbol value of a corresponding symbol in the string, and inserting the sequence of the vectors into the digital data, and to analyze the ciphertext in order to extract the watermark.
The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:
After applying watermarks to items of digital content, a content distributor may subsequently wish to inspect communication traffic in order to detect these watermarks, so as to identify unauthorized copies of content items and the source of these copies. Frequently, however, communication traffic is encrypted, and it becomes difficult or impossible to detect the watermarks without first decrypting the traffic. The content distributor, however, may not have access to the necessary decryption key.
Embodiments of the present invention that are described hereinbelow can be used to address this sort of difficulty by providing a digital watermark that can be detected even in encrypted data, particularly when certain types of block ciphers are used for encryption. For example, when an electronic codebook (ECB) mode of block cipher encryption is used, each plaintext block is encrypted as a corresponding ciphertext block of the same size. In the embodiments described below, the watermark is encoded using repeating patterns of sub-vectors, whose length is chosen to be an integer divisor of the block size (and may be equal to the block size). These patterns are inserted into the plaintext, but their presence can then be detected in the blocks of the ciphertext, as well. The patterns are defined so that the watermark can be detected and decoded even when the vectors have been shifted relative to the block boundaries in the course of one or more rounds of encryption.
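By way of illustration only, the following minimal sketch shows why repeated sub-vectors remain visible after ECB-style encryption, even when they are shifted relative to the block boundaries. The function `toy_ecb_encrypt`, the key, the 16-byte block size, and the sample data are all assumptions made for this example. The keyed hash is merely a stand-in that reproduces the block-by-block determinism of ECB and is not an actual cipher.

```python
import hashlib

BLOCK = 16  # assumed cipher block size in bytes (e.g. the AES block size)


def toy_ecb_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Hypothetical stand-in for ECB encryption.

    Each 16-byte block is mapped independently through a keyed hash. This is
    NOT a real (invertible) cipher; it only reproduces the ECB property that
    equal plaintext blocks yield equal ciphertext blocks.
    """
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a whole number of blocks")
    out = bytearray()
    for i in range(0, len(plaintext), BLOCK):
        out += hashlib.sha256(key + plaintext[i:i + BLOCK]).digest()[:BLOCK]
    return bytes(out)


def split_blocks(data: bytes) -> list:
    """Split data into consecutive BLOCK-sized pieces."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]


# A vector of four repetitions of one 16-byte sub-vector, shifted 5 bytes
# off the block boundary by the data that precedes it.
sub_vector = bytes(range(16))
plaintext = b"\xaa" * 5 + sub_vector * 4 + b"\xbb" * 11  # 80 bytes = 5 blocks
cipher_blocks = split_blocks(toy_ecb_encrypt(plaintext, b"demo-key"))

# Blocks 1, 2 and 3 each hold the same rotation of the sub-vector.
print(cipher_blocks[1] == cipher_blocks[2] == cipher_blocks[3])
```

In this example the four plaintext repetitions appear as three identical ciphertext blocks, because the shift splits one repetition across the first and last block boundaries of the vector.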
In the disclosed embodiments, a watermark comprises a string of symbols, which are encoded as a sequence of vectors. Each vector comprises a certain number of repetitions of a sub-vector of the appropriate, predefined length. The number of these repetitions in each vector in the sequence is indicative of the respective symbol value of the corresponding symbol in the string. In other words, a higher symbol value (such as 1, assuming each symbol is a single bit) may be mapped to a larger number of repetitions, while a lower symbol value (such as 0) is mapped to a smaller number, or vice versa. The sub-vectors may be chosen arbitrarily, with different sub-vectors repeated in different vectors, in order to make the watermark harder for unauthorized parties to detect (and consequently harder to remove or tamper with).
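The mapping of symbols to vectors described above may be sketched as follows. This is an illustrative sketch only: the names `encode_symbol` and `encode_bits`, the sub-vector length of 16 bytes, and the repetition counts of 4 and 8 are assumptions chosen for the example and are not taken from the specification.

```python
import os

SUB_LEN = 16  # assumed sub-vector length (an integer divisor of the block size)


def encode_symbol(bit: int, sub_vector: bytes,
                  reps_zero: int = 4, reps_one: int = 8) -> bytes:
    """Encode one bit as repetitions of sub_vector.

    The repetition counts are hypothetical; the only requirement taken from
    the text is that the counts for 0 and 1 differ.
    """
    if len(sub_vector) != SUB_LEN:
        raise ValueError("sub-vector has the wrong length")
    return sub_vector * (reps_one if bit else reps_zero)


def encode_bits(bits, reps_zero: int = 4, reps_one: int = 8) -> list:
    """Encode a bit string as a sequence of vectors.

    A fresh, arbitrary sub-vector is drawn for every vector, so that
    different vectors repeat different sub-vectors.
    """
    return [encode_symbol(bit, os.urandom(SUB_LEN), reps_zero, reps_one)
            for bit in bits]


vectors = encode_bits([1, 0, 1])
print([len(v) // SUB_LEN for v in vectors])  # repetition count per vector
```

Drawing a new random sub-vector for each vector follows the option, noted above, of repeating different sub-vectors in different vectors in order to make the watermark harder to detect.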
The encoded watermark is applied to an item of content by inserting the sequence of the vectors into the digital data of the content item. Typically, to make detection of the watermark still more difficult, the sequence of the vectors representing the watermark is interleaved with gaps of known lengths containing arbitrary data. These gaps may have different lengths, as long as the lengths are known to authorized watermark detectors. To facilitate authorized detection of the watermark, the sequence of the vectors may be preceded in the data by a marker generated by concatenating a predetermined number of copies of a marker vector.
To extract the watermark from ciphertext, the watermark detector identifies and counts recurring patterns that occur in the ciphertext. The number of recurrences of each pattern corresponds to the number of repetitions of a sub-vector in the plaintext watermark, even when the sub-vectors have been shifted relative to the block boundaries in the encryption process. Thus, each of the symbols in the string can be deciphered based on the respective count of the recurrences of the corresponding pattern in the ciphertext.
The term “encode” is used throughout the present specification and claims, in all of its grammatical forms, to refer to any type of conversion of data from one form to another that preserves the information content of the data. Such encoding may take the form, for example, of converting a string of symbols into a sequence of vectors, as described above. Other forms of encoding include data stream encoding, such as (but not limited to) MPEG-2 encoding, H.264 encoding, VC-1 encoding, and synthetic encodings such as Scalable Vector Graphics (SVG) and LASeR (ISO/IEC 14496-20), and so forth. Any recipient of encoded data who is cognizant of the encoding scheme, whether or not the recipient of the encoded data is the intended recipient, is, at least in potential, able to read the encoded data. It is appreciated that encoding may be performed in several stages and may include a number of different processes, including, but not necessarily limited to: compressing the data; transforming the data into other forms; and making the data more robust (for instance replicating the data or using error correction mechanisms).
Similarly, the term “decode” is used throughout the present specification and claims, in all its grammatical forms, to refer to the reverse of “encoding.”
The terms “cipher” and “encrypt,” in all of their grammatical forms, are used interchangeably throughout the present specification and claims to refer to any appropriate method for encoding data in such a way as to make it unintelligible except to intended recipients. Well-known types of ciphering or encrypting include, for example, block and stream ciphers, as well as methods such as DES, 3DES, and AES. Similarly, the terms “decipher” and “decrypt” are used throughout the present specification and claims, in all their grammatical forms, to refer to the reverse of “ciphering” and “encrypting.”
Each subscriber receives the encoded content in a decoding device 38, such as a television set-top box (STB), which decodes the video content in order to output a series of video image frames to a television monitor 28. Alternatively, decoding devices 38 may comprise any suitable sort of video decoder and may be implemented either as freestanding units, as shown in the figure, or in the form of embedded processing circuitry within a display device, such as a computer, entertainment console, or mobile media player.
Video decoding devices 38 in system 20 generally output standard digital video signals, which may be input to any sort of standard display device (such as monitors 28) or to a video recorder. Once the content has been recorded in this manner, it may be difficult or impossible, despite legal constraints, to prevent subscribers from distributing digital copies of the content. For example, the user of one of devices 38 may forward unauthorized copies via a public network 40, such as the Internet, to other computers 42. (Although networks 26 and 40 are shown in
To enable this sort of unauthorized copying to be tracked, head-end 24 adds a watermark 34 to content 22. The watermark typically comprises a string of symbols, such as a word of eight bits or more, which identifies a subscriber or group of subscribers that are to receive the content. The watermarking operation, which is described in detail hereinbelow, is typically carried out by a general-purpose computer, comprising a programmable processor 30, with a memory 32 for holding watermark data and an interface 36 to network 26. The processor is programmed in software to carry out the functions that are described herein (and may also carry out other functions in the general context of operation of head-end 24). This software may be downloaded to processor 30 in electronic form, over a network, for example. Alternatively or additionally, the software may be stored in tangible media, such as optical, magnetic, or electronic memory media, possibly in memory 32. Further alternatively or additionally, at least some of the watermarking functions of processor 30 may be implemented in dedicated or programmable hardware logic.
A copy detector 44 is coupled to network 40 and analyzes data content transmitted over the network in order to capture and identify unauthorized copies of content 22. Detector 44 may, for example, comprise a general-purpose computer, with an interface 46 to network 40 and a processor 48 (with a memory 50) for extracting encoded watermark data from intercepted content items, as described in greater detail hereinbelow. Processor 48 typically runs under the control of suitable software, which may be downloaded and/or stored as described above, possibly with hardware processing support, as well. After analyzing an item of content, processor 48 outputs a watermark identifier 52, which may then be associated with watermark 34 in order to identify the source of an unauthorized copy.
Embedder 62 typically inserts the encoded watermark in a location in the signal that will be ignored by standard parsers of the carrying signal, as in decoding device 38. For example, when the signal complies with some standard that has hooks for proprietary extensions (such as the user data section in MPEG streams), the encoded watermark can be placed in the proprietary extension to avoid parsing failure by standard decoders.
Typically, for data security, an encryptor 64 encrypts the carrying signal before transmission, using any suitable method of encryption that is known in the art. Head-end 24 thus outputs watermarked, encoded, encrypted content 66 to distribution network 26. Encryption is optional, however, in embodiments of the present invention, and the head-end may thus output watermarked, encoded content without encryption. On the other hand, encoded content may subsequently be encrypted and re-transmitted in encrypted form by users (possibly in violation of the content owner's copyright). The watermark inserted in the content by embedder 62 can be detected even in such encrypted content, as described hereinbelow, without requiring that the content be decrypted.
Decoding device 38 receives content 66, and applies a suitable decryptor 68 to recover the watermarked, encoded carrying signal. To render the content as a stream of video images, a video decoder 70 in device 38, such as a suitable MPEG decoder, converts the signal into image frames for output to monitor 28. As noted earlier, decoder 70 will typically ignore the watermark (in the user data section or elsewhere in the carrying signal). If the user of device 38 makes a copy of the decrypted carrying signal, however, the watermark will be preserved in this copy. If the user then distributes such a copy, the watermark will be detectable by detector 44.
Encoded watermark 82 may optionally begin with a marker 88 to aid authorized detectors 44 in locating the embedded watermark. The marker may be generated as follows:
Following marker 88, encoded watermark 82 comprises gaps 90, containing arbitrary data Fi, interleaved with vectors Yi 92 that encode respective bits bi in the string of watermark 34. For every bit bi, from b0 to bn, the watermark is constructed as follows:
The above process is repeated for all the bits of watermark 34. The resulting encoded watermark 82 thus comprises marker 88 followed by an interleaved sequence of gaps 90 and vectors 92: <Ym, F0, Y0, F1, Y1, F2, Y2, . . . >.
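The assembly of the interleaved sequence <Ym, F0, Y0, F1, Y1, . . . > may be sketched as shown below. This is a non-authoritative illustration: the function name `build_encoded_watermark`, its parameters, and the default repetition counts and sub-vector length are hypothetical, and random bytes are used here as one possible choice of arbitrary gap data and sub-vectors.

```python
import os


def build_encoded_watermark(bits, gap_lengths, marker_vector: bytes,
                            marker_copies: int, reps_zero: int = 4,
                            reps_one: int = 8, sub_len: int = 16) -> bytes:
    """Return marker Ym followed by interleaved gaps Fi and vectors Yi.

    gap_lengths[i] is the known length of gap Fi; all defaults are
    assumptions made for this example.
    """
    if len(gap_lengths) != len(bits):
        raise ValueError("one gap length is needed per bit")
    out = bytearray(marker_vector * marker_copies)       # marker Ym
    for bit, gap_len in zip(bits, gap_lengths):
        out += os.urandom(gap_len)                       # gap Fi: arbitrary data
        sub_vector = os.urandom(sub_len)                 # arbitrary sub-vector
        out += sub_vector * (reps_one if bit else reps_zero)  # vector Yi
    return bytes(out)


marker = bytes([0x5A]) * 16
watermark = build_encoded_watermark([1, 0], [7, 20], marker, marker_copies=3)
print(len(watermark))  # 3*16 + 7 + 8*16 + 20 + 4*16 = 267
```

The gap lengths differ in this example (7 and 20 bytes), which is permissible as long as the lengths are known to the authorized detector.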
Processor 48 in detector 44 analyzes ciphertext of an intercepted digital signal in order to identify and locate possible repetitions of eight- and sixteen-byte vectors, in a ciphertext analysis step 100. Because each vector 92 in encoded watermark 82 contains multiple consecutive repetitions of identical sub-vectors, these repetitions will appear in the ciphertext even when vectors 92 have been shifted relative to the cipher block boundaries in the encryption process. Such repetitions occur by accident only rarely in the intercepted signals, and can thus reliably serve as indications of an embedded watermark.
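One possible way of locating such repetitions is to scan the ciphertext for maximal runs of identical consecutive blocks, as in the following sketch. The function name `find_repetition_runs`, the 16-byte block size, and the `min_run` parameter are assumptions made for this example, which also assumes that the ciphertext is aligned to the cipher block boundaries.

```python
def find_repetition_runs(ciphertext: bytes, block: int = 16,
                         min_run: int = 2) -> list:
    """Return (block_index, run_length) for each maximal run of identical
    consecutive ciphertext blocks that is at least min_run blocks long."""
    blocks = [ciphertext[i:i + block]
              for i in range(0, len(ciphertext) - block + 1, block)]
    runs = []
    i = 0
    while i < len(blocks):
        j = i + 1
        # Extend the run while the following block equals the current one.
        while j < len(blocks) and blocks[j] == blocks[i]:
            j += 1
        if j - i >= min_run:
            runs.append((i, j - i))
        i = j
    return runs


sample = b"A" * 16 + b"B" * 48 + b"C" * 16 + b"D" * 32
print(find_repetition_runs(sample))  # [(1, 3), (5, 2)]
```

Each reported run gives the position of a candidate vector and the count of recurrences from which the corresponding symbol may subsequently be decoded.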
Assuming the watermark encoding scheme includes a marker 88 (as shown in
In order to extract the next bit bi of the watermark 82, processor 48 finds the next sequence of repetitions of sixteen-byte vectors in the intercepted signal following a gap whose length is approximately equal to the known gap length Gi, at a repetition finding step 104. The processor counts the repetitions in the sequence, which correspond to the vector Yi, at a repetition counting step 106. The processor then compares this count to a threshold parameter Ti, at a threshold checking step 108. This threshold parameter is chosen such that Li0<Ti<Li1 (assuming Li0<Li1, as noted above). If the repetition count is less than Ti, processor 48 concludes that bi=0, at a zero output step 110. Otherwise, the processor concludes that bi=1, at a one output step 112.
Processor 48 checks whether it has decoded all the bits of the watermark, at a completion checking step 114. If not, the processor returns to step 104 in order to find the next gap and vector in the encoded watermark. When all the bits have been decoded, processor 48 outputs watermark identifier 52, comprising the extracted bit string <b0, . . . , bn>, at a watermark output step 116. This identifier should correspond precisely to the digital watermark 34 that was originally embedded in the content in question.
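The threshold comparison of steps 108 through 116 may be sketched as follows. The function name `decode_bits` and the sample counts are hypothetical. The example assumes repetition counts Li0=4 and Li1=8 with a threshold Ti=6 for every bit, so that Li0<Ti<Li1.

```python
def decode_bits(run_counts, thresholds) -> list:
    """Decode one bit per counted run of repetitions.

    A count below the threshold Ti is read as 0; otherwise it is read as 1
    (this assumes Li0 < Ti < Li1 for every bit).
    """
    if len(run_counts) != len(thresholds):
        raise ValueError("one threshold is needed per counted run")
    return [0 if count < t else 1 for count, t in zip(run_counts, thresholds)]


# Counts as they might be observed in ciphertext: a shift relative to the
# block boundaries can reduce an observed count by one (3 instead of 4,
# 7 instead of 8) when the sub-vector length equals the block size.
print(decode_bits([3, 7, 4, 8], [6, 6, 6, 6]))  # [0, 1, 0, 1]
```

Choosing Ti well inside the interval between Li0 and Li1, as in this example, leaves a margin for counts that are reduced by a shift of the vectors relative to the block boundaries.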
Although the embodiments described above are directed specifically to encoding and decoding of watermarks, the principles of the present invention may similarly be applied in encoding and embedding other sorts of digital symbol strings in a data signal. It will thus be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove. Rather, the scope of the present invention includes both combinations and subcombinations of the various features described hereinabove, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art.
| Number | Date | Country | Kind |
|---|---|---|---|
| 218701 | Mar 2012 | IL | national |
| 1218406.5 | Oct 2012 | GB | national |

| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/IB2013/051817 | 3/7/2013 | WO | 00 |