The invention relates to a method and arrangement for embedding auxiliary data in an information signal. The invention also relates to a method and arrangement for detecting said auxiliary data.
A known method of embedding auxiliary data in a PCM-coded information signal, also referred to as fragile watermarking, is disclosed in European Patent EP-B 0 205 200. In this prior-art method, the least significant bit of every nth signal sample is replaced by an auxiliary data bit. An improved variant of this embedding scheme is disclosed in European Patent Application EP-A 0 359 325. The least significant bit of a PCM sample is herein modified, depending not only on the actual auxiliary data bit, but also on one or more other bits of the same PCM sample.
It is an object of the invention to further improve the method of embedding auxiliary data in a PCM signal.
According to the invention, this is achieved by the method as defined in claim 1. The method leads to an improvement of the rate-distortion ratio.
In the prior-art embedding method mentioned in the opening paragraph, one half of the host signal symbols is not modified because the least significant bit of the PCM sample is already equal to the data bit to be embedded. The other half of the symbols is modified by flipping the least significant bit to make said bit equal to the data bit to be embedded. The modified symbol thus differs by +1 or −1 from the host PCM sample. This is summarized in the following Table I.
The modification of host symbols introduces distortion. For PCM-encoded information signals (such as audio samples or image pixels), the “squared error” is often used as the distortion measure, defined as:
D(x,y) = (y−x)²
The average distortion of the prior-art embedding scheme is:
D = ½·0 + ½·1 = ½
The embedding rate of the prior-art embedding scheme is R=1 bit/symbol, and the rate-distortion ratio is therefore R/D=2.
The prior-art embedding method is a special case of least-significant-bit replacement, in which the message is embedded in the R least significant bits (R is a positive integer) of the signal symbols. The following Table IIa shows the least-significant-bit replacement scheme for R=2.
In this case, the average distortion is:
D = (4·0 + 6·1 + 4·4 + 2·9)/16 = 2.5
and the rate-distortion ratio is R/D=0.8.
Note that in the R=2 embedding scheme of Table IIa, the two least significant bits of host symbols are replaced by a 2-bit message w. As the Table shows, this may lead to a distortion of D=3² per symbol. Obviously, it is better to subtract 1 from a sample instead of adding 3. For R=2, this strategy is summarized in the following Table IIb.
In this case, the distortion is:
D = (4·0 + 8·1 + 4·4)/16 = 1.5
and the rate-distortion ratio is R/D=1.33, which is considerably better than the ratio R/D=0.8 of the scheme which is shown in Table IIa.
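The distortion figures quoted above can be checked by enumeration. The following is a minimal sketch, assuming that host low bits and message values are uniformly distributed; the function names (`lsb_replace`, `lsb_nearest`, `average_distortion`) are illustrative and do not appear in the specification.

```python
from fractions import Fraction
from itertools import product

def lsb_replace(x, w, r):
    """Table IIa style: overwrite the r least significant bits of x with w."""
    return (x >> r << r) | w

def lsb_nearest(x, w, r):
    """Table IIb style: move x to the nearest value whose r low bits equal w."""
    n = 1 << r
    d = (w - x) % n          # steps upward needed, 0 .. n-1
    if d > n // 2:           # stepping downward is shorter
        d -= n
    return x + d

def average_distortion(embed, r):
    """Mean squared error over all (host low bits, message) combinations."""
    n = 1 << r
    base = 4 * n             # arbitrary offset so that samples stay positive
    errs = [(embed(base + lo, w, r) - (base + lo)) ** 2
            for lo, w in product(range(n), repeat=2)]
    return Fraction(sum(errs), len(errs))

for name, embed, r in [("R=1 replace", lsb_replace, 1),
                       ("R=2 replace", lsb_replace, 2),
                       ("R=2 nearest", lsb_nearest, 2)]:
    d = average_distortion(embed, r)
    print(name, "D =", d, "R/D =", round(float(r / d), 2))
```

Under these assumptions the three cases give D = 1/2, 5/2 and 3/2, i.e. R/D = 2, 0.8 and 1.33.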
The inventors have found a theoretically realizable rate-distortion curve, which shows that there must be embedding schemes that have a better rate-distortion ratio. This “boundary” curve is denoted 20 in
“Coded” LSB Modulation
According to a first aspect of the invention, the message symbols are represented by the syndrome of a vector formed by the least significant bit of each one of a group of L (L>1) host symbols, hereinafter also referred to as LSB vector. The expression “syndrome” is a well-known notion in the field of error correction. In error correction schemes, a received data word (the input vector) is processed to obtain its syndrome. Usually (but not necessarily), said processing implies multiplication of the data word with a given matrix. If the syndrome is zero, all bits of the data word are correct. If the syndrome is unequal to zero, the non-zero value represents the position (or positions) of erroneous bits. Hamming codes have Hamming distance 3 and thus allow 1 erroneous bit to be corrected. Other codes, such as Golay codes, allow plural bits of a data word to be corrected.
In a mathematical sense, the data embedding method in accordance with the invention resembles error correction. In order to embed a number of data bits in a group of L host symbols, the encoder (12 in
In the following example, the (7,4,3) Hamming code is used for embedding data in the least significant bit of host symbols. In the field of error correction, the (7,4,3) Hamming code allows 1 bit in 7-bit data words to be corrected (Hamming distance is 3) using 7−4=3 parity bits. In analogy herewith, the embedding embodiment allows 3 message bits to be embedded in 7 host symbols. To compute the syndrome, the LSB vector of the 7 host symbols is multiplied (all mathematical operations are modulo-2 operations) with the following 3×7 matrix:
The columns of this matrix include all possible bit combinations except 000.
Assume that the seven host symbols have least significant bits 0, 0, 1, 1, 0, 1, 0, i.e. the input LSB vector is (0011010). The modulo-2 multiplication of this vector with the above matrix yields:
The output of this multiplication (001) is the syndrome of the input vector (0011010). It is this syndrome which represents the embedded message symbol.
Obviously, the syndrome represented by the original host symbols is generally not the message symbol to be embedded. The least significant bit of only one of the host symbols must therefore be modified. This is achieved by the following steps:
The least significant bit of the 3rd host symbol is thus modified in the present example, which results in a modified vector (0001010). In the decoder, this vector is subjected to syndrome determination. The result is:
which indeed represents the message symbol (010).
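The worked example can be reproduced with a short sketch. It assumes a parity-check matrix whose i-th column is the 3-bit binary representation of i; this choice is a reconstruction that agrees with the syndromes (001) and (010) quoted in the example, but it is not taken from the specification. The sample values and the names `H_COLUMNS`, `syndrome`, `embed` and `extract` are hypothetical.

```python
# Assumed 3x7 parity-check matrix, stored column by column:
# column i (1-based) is the binary representation of i, most significant bit first.
H_COLUMNS = [(i >> 2 & 1, i >> 1 & 1, i & 1) for i in range(1, 8)]

def syndrome(lsbs):
    """Modulo-2 product of the LSB vector with the assumed matrix."""
    s = (0, 0, 0)
    for bit, col in zip(lsbs, H_COLUMNS):
        if bit:
            s = tuple(a ^ b for a, b in zip(s, col))
    return s

def embed(samples, message):
    """Return a copy of 7 samples whose LSB syndrome equals the 3-bit message."""
    lsbs = [x & 1 for x in samples]
    diff = tuple(a ^ b for a, b in zip(syndrome(lsbs), message))
    out = list(samples)
    if any(diff):
        k = H_COLUMNS.index(diff)   # column equal to the syndrome difference
        out[k] ^= 1                 # flip that sample's LSB (a change of +1 or -1)
    return out

def extract(samples):
    """Decoder: the syndrome of the received LSB vector is the message."""
    return syndrome([x & 1 for x in samples])

host = [100, 42, 7, 255, 16, 9, 2]      # hypothetical samples with LSBs 0011010
marked = embed(host, (0, 1, 0))
print(extract(host), marked, extract(marked))
```

With these hypothetical samples only the 3rd sample changes (7 becomes 6), and the decoder recovers (0, 1, 0).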
The distortion per 7 symbols is ⅛·0 + ⅞·1 = ⅞ (probability ⅛ that none of the host symbols is changed and probability ⅞ that one symbol is changed by ±1), so that the average distortion per symbol is D=⅛. The embedding rate is 3 bits per 7 symbols, i.e. R=3/7. The corresponding (R,D)-pair is shown as a + sign denoted 23 in
More generally, the embedding based on Hamming codes allows embedding of m-bit message symbols into 2^m − 1 host symbols with embedding rate
R = m/(2^m − 1)
and distortion
D = 1/2^m
The (R,D)-pairs corresponding to m=2, 3, 4 and 5 are shown as + signs in
The rate-distortion ratio is R/D = m·2^m/(2^m − 1), which is approximately equal to m for large m. For m=10, the ratio is roughly a factor 5 larger than for low-bit modulation (R/D=2). This significant improvement is achieved with moderate complexity of the embedding and decoding hardware or software.
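The rate and distortion of the Hamming-based scheme can be tabulated for several values of m. This is a minimal sketch, assuming uniformly distributed message symbols (as implied by the probabilities ⅛ and ⅞ used in the m=3 example); the function name `hamming_rate_distortion` is illustrative.

```python
from fractions import Fraction

def hamming_rate_distortion(m):
    """(R, D) per host symbol for binary Hamming embedding of m-bit messages."""
    n = 2**m - 1                      # host symbols per group
    rate = Fraction(m, n)
    # Assumption: with uniform messages the group already carries the desired
    # syndrome with probability 1/2^m; otherwise exactly one LSB is flipped,
    # which costs a squared error of 1 spread over the n symbols of the group.
    distortion = (1 - Fraction(1, 2**m)) / n
    return rate, distortion

for m in (2, 3, 4, 5, 10):
    r, d = hamming_rate_distortion(m)
    print(f"m={m:2d}  R={r}  D={d}  R/D={float(r / d):.3f}")
```

For m=3 this gives R=3/7 and D=1/8, and for m=10 the ratio R/D is slightly above 10.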
Instead of the binary Hamming code, a Golay code can be used. There is one binary Golay code: the (23,12,7) Golay code. In the field of error correction, the Golay code can detect and correct 3 erroneous bits (Hamming distance 7) in 23-bit data words having 12 information bits and 23−12=11 protection bits. In analogy herewith, the (23,12,7) Golay code can embed 11 data bits in 23 host symbols by modifying at most 3 symbols. Application of the Golay code leads to:
The corresponding (R,D) pair is indicated by a ⋄ sign denoted 24 in
Ternary Embedding Methods
According to a second aspect of the invention, the message symbols are represented by the so-termed “class” of host symbols. A symbol x is said to be in “class w” if x mod 3=w for w=0,1,2. In other words:
The data embedder 12 (see
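The embedding rate is R = log₂ 3 ≈ 1.585 bits per symbol, and the average distortion is D = 6/9 · 1 = ⅔ (6 of the 9 combinations of host class and message symbol require a change of ±1). The corresponding (R,D)-pair is shown as a ⋄ sign denoted 30 in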
The above-mentioned form of ternary embedding, in which each host symbol has data embedded is referred to as “uncoded” ternary modulation. Again, however, it is possible to perform “coded” ternary modulation, in a similar manner as described hereinbefore with respect to “coded” LSB modulation, i.e. by embedding ternary symbols in groups of host symbols. It is again also possible to do this by using (ternary) Hamming codes or a (ternary) Golay code.
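Uncoded ternary modulation can be sketched in a few lines. The sketch assumes that a host symbol is moved to the nearest value of the desired class, i.e. by at most ±1; the function names `embed_ternary` and `extract_ternary` are illustrative only.

```python
from fractions import Fraction
from math import log2

def embed_ternary(x, w):
    """Move sample x by at most 1 so that the result is in class w (x mod 3 == w)."""
    d = (w - x) % 3      # 0, 1 or 2 steps upward
    if d == 2:           # two steps up is the same class as one step down
        d = -1
    return x + d

def extract_ternary(y):
    """Decoder: the class of the received sample is the message symbol."""
    return y % 3

# Average squared error over all 9 (host class, message) combinations.
errs = [(embed_ternary(x, w) - x) ** 2 for x in (30, 31, 32) for w in range(3)]
print(Fraction(sum(errs), len(errs)), log2(3))
```

Enumerating the nine combinations confirms the average distortion of 2/3 at a rate of log₂ 3 bits per symbol.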
The embedding operation using a ternary Hamming code with m=2 parity checks will now be shown in more detail. According to the principles set out above for binary Hamming codes, one would expect the matrix to have 3^m − 1 = 8 columns (all combinations except 00). However, columns being multiples of other columns need not be included in the matrix. Accordingly, the number of columns is (3^m − 1)/2 = 4. This is also the number of host symbols forming a vector. The matrix is:
This ternary Hamming code allows two ternary message symbols to be accommodated in groups of 4 host symbols having respective classes 1, 2, 0, and 1. The syndrome of the host vector (1,2,0,1) is (note that all mathematical operations are now modulo-3):
We want to embed two ternary symbols (1,2) in this group of host symbols. The difference is (1,2)−(0,0)=(1,2). The difference can be found in the 4th column of the matrix. Accordingly, the 4th host symbol is modified. If the difference is found in the matrix, the relevant host symbol is modified by adding 1 to its original PCM value. The group of host symbols is thus modified into (1,2,0,2). In the decoder, this vector is subjected to syndrome determination. The result is:
which indeed represents the message symbols (1,2).
If we want to embed the ternary symbols (2,2) in the group of host symbols, the difference is (2,2)−(0,0)=(2,2). There is no column having this value in the matrix, because (2,2) is a multiple of (1,1). This value (1,1) can be found in the 3rd column of the matrix. Accordingly, the 3rd symbol is modified, but now the modification involves adding 2 to (or subtracting 1 from) the respective PCM value rather than adding 1. The group of host symbols is thus modified into (1,2,2,1). In the decoder, this vector is subjected to syndrome determination. The result is:
which indeed represents the message symbols (2,2).
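Both ternary examples can be reproduced with the sketch below. It assumes the matrix columns (0,1), (1,0), (1,1), (1,2); this is a reconstruction that agrees with the syndrome (0,0) of the host vector and with the column positions named in the examples, but it is not taken from the specification. The sample values and the names `H3_COLUMNS`, `syndrome3`, `embed3` and `extract3` are hypothetical.

```python
# Assumed 2x4 ternary parity-check matrix, stored column by column.
H3_COLUMNS = [(0, 1), (1, 0), (1, 1), (1, 2)]

def syndrome3(classes):
    """Modulo-3 product of the class vector with the assumed matrix."""
    s0 = sum(c * col[0] for c, col in zip(classes, H3_COLUMNS)) % 3
    s1 = sum(c * col[1] for c, col in zip(classes, H3_COLUMNS)) % 3
    return (s0, s1)

def embed3(samples, message):
    """Return a copy of 4 samples whose class syndrome equals the message."""
    s = syndrome3([x % 3 for x in samples])
    diff = ((message[0] - s[0]) % 3, (message[1] - s[1]) % 3)
    out = list(samples)
    if diff == (0, 0):
        return out                      # syndrome already correct
    if diff in H3_COLUMNS:
        out[H3_COLUMNS.index(diff)] += 1
    else:
        # diff is twice a column; doubling it again recovers that column
        # because 2 * 2 = 1 (mod 3). Subtracting 1 equals adding 2 modulo 3.
        col = ((2 * diff[0]) % 3, (2 * diff[1]) % 3)
        out[H3_COLUMNS.index(col)] -= 1
    return out

def extract3(samples):
    """Decoder: the syndrome of the received class vector is the message."""
    return syndrome3([x % 3 for x in samples])

host = [10, 20, 30, 40]                 # hypothetical samples, classes 1, 2, 0, 1
print(extract3(host), embed3(host, (1, 2)), embed3(host, (2, 2)))
```

With these hypothetical samples, embedding (1,2) raises the 4th sample by 1 and embedding (2,2) lowers the 3rd sample by 1, matching the class vectors (1,2,0,2) and (1,2,2,1) of the examples.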
Generally, when Hamming codes are used with a given number m of parity check symbols, the code word length is (3^m − 1)/2. Therefore:
Two (R,D) pairs are indicated by □ signs and denoted 31 (m=2) and 32 (m=3) in
Again, a Golay code can be used instead of the Hamming code. There is only one ternary Golay code: the (11,6,5) Golay code. This code can embed 11−6=5 ternary symbols in 11 host symbols by modifying at most 2 symbols (Hamming distance is 5). Application of the Golay code leads to:
The corresponding (R,D) pair is indicated by the + sign denoted 33 in
It should be noted that ternary embedding (class=x mod 3) is a special case of more general n-ary embedding (class=x mod n). The previously described LSB embedding is also a special case thereof, viz. n=2. The invention applies to any integer n.
Two-Dimensional Codes
According to a third aspect of the invention, the message symbols are embedded in pairs of host symbols. This is a two-dimensional version of the embedding methods described above. In this coding mode, the two-dimensional symbol space of symbol pairs (xa,xb) is “colored” with 5 colors. Each point on the grid denotes a symbol pair, and has a color different from its neighbors. The colors are numbered 0 to 4, and each color represents a message symbol w ∈ {0,1,2,3,4}. The following Table IV shows (a part of) the two-dimensional grid.
The decoder just looks at the color of the received symbol pair (ya,yb). The encoder checks whether (xa,xb) has the color w to be embedded. If that is not the case, it changes the symbol pair (xa,xb) such that the modified pair has the color w. For example, if a message w=4 is to be embedded in host symbol pair (xa,xb)=(76,79) having color 3, the embedder modifies the symbols into a pair having color 4, e.g. the pair (ya,yb)=(75,78). The parameters of this embedding scheme are:
The corresponding (R,D)-pair is shown as a ⋄ sign denoted 40 in
It will be appreciated that the two-dimensional embedding scheme can be extended to more dimensions. In a three-dimensional grid, for example, each point can be “moved” not only to the four neighbors in the same layer, but also up or down. Seven colors, i.e. seven message symbols, are available in this scheme.
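The coloring idea can be sketched for two and three dimensions. The sketch assumes a linear coloring, color = (sum of weight × coordinate) mod n, with weights (1,3) and n=5 in two dimensions; these weights were chosen because they agree with the two colors quoted in the example for the pairs (76,79) and (75,78), but they are not taken from Table IV. The three-dimensional weights (1,2,3) with n=7 are likewise an assumption, and the names `color` and `embed` are illustrative. The sketch always moves to a direct neighbor, so it may select a different pair than the one given in the example.

```python
def color(point, weights, n):
    """Assumed linear coloring of a grid point."""
    return sum(w * x for w, x in zip(weights, point)) % n

def embed(point, message, weights, n):
    """Return the point itself or a direct neighbor that has color `message`."""
    if color(point, weights, n) == message:
        return tuple(point)
    for axis in range(len(point)):
        for step in (1, -1):
            cand = list(point)
            cand[axis] += step          # change a single symbol by +1 or -1
            if color(cand, weights, n) == message:
                return tuple(cand)
    raise ValueError("no direct neighbor has the requested color")

# Two dimensions, 5 colors.
print(color((76, 79), (1, 3), 5), color((75, 78), (1, 3), 5))
print(embed((76, 79), 4, (1, 3), 5))

# Three dimensions, 7 colors.
print([embed((10, 20, 30), w, (1, 2, 3), 7) for w in range(7)])
```

Because a point and its direct neighbors all have different colors under these weights, every message symbol can be reached by changing at most one host symbol by ±1.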
Like the LSB modulation and ternary modulation methods described above, the two-dimensional method can also be “coded” by means of 5-ary Hamming or Golay codes. For a given number m of parity checks, a code length of (5^m − 1)/4 5-ary symbols is obtained. Since each 5-ary symbol corresponds to a pair of host symbols, the coding scheme processes (5^m − 1)/2 host symbols. Its parameters are:
Two of such (R,D) pairs are indicated by □ signs and denoted 41 (m=2) and 42 (m=3) in
The invention can be summarized as follows. Information signals such as grayscale images or audio signals are represented as a sequence of PCM signal samples. To embed auxiliary data in the signal, the samples are slightly distorted. There is a so-termed “rate-distortion function” (20) which gives the largest embedding rate R given a certain distortion level D. It appears that the efficiency of prior art embedding schemes such as LSB replacement (21,22) can be improved. The invention discloses such embedding schemes (23,24). According to the invention, the signal is divided into groups of L (L>1) signal samples (x). For each group of signal samples, a vector of least significant portions (x mod n) of the signal samples is created. For n=2, the vector comprises the least significant bit of each signal sample. The syndrome of said vector (as defined in the field of error detection and correction) represents the embedded data. Only one (or a few, in any case less than L) signal sample(s) of a group needs to be modified so as to achieve that the vector assumes a desired syndrome value.
Number | Date | Country | Kind |
---|---|---|---|
01201778.6 | May 2001 | EP | regional |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/IB02/01702 | 5/15/2002 | WO | |