METHOD FOR EFFICIENT ENCODING AND DECODING QUANTIZED SEQUENCE IN WYNER-ZIV CODING OF VIDEO

Description

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing and other objects, aspects and advantages will be better understood from the following detailed description of a preferred embodiment of the invention with reference to the drawings, in which:

FIG. 1 is a diagram illustrating a bipartite graph as used in the invention.

FIG. 2 is a diagram illustrating a typical Wyner-Ziv coding system.

FIG. 3 is a diagram illustrating a conventional method of encoding quantized sequence x involving binarization, decomposition of a statistical model, and a series of binary codes.

FIG. 4 is a diagram illustrating the improved method of encoding quantized sequence x in a Wyner-Ziv coding system.

FIG. 5 is a diagram illustrating the improved method of decoding quantized sequence x in a Wyner-Ziv coding system.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION

The encoding side and decoding side of the method of the present invention are shown in FIGS. 4 and 5, respectively.

In contrast to prior art encoding as illustrated in FIG. 3, the improved encoding of the present invention as shown in FIG. 4 first estimates 410 a field size M from the statistical model 240, then generates an M-ary code 420 from the same statistical model 240 to encode the quantized sequence x directly into a binary sequence z 425.

To illustrate the encoding process, let us look at FIG. 1. FIG. 1 shows an example of a bipartite graph where on the left hand side of an edge (e.g. items 131 and 132) is a variable node (circle-shaped), and on the right hand side of an edge is a check node (square-shaped). Each variable node carries a value, for example, v_i for the i^th variable node (counted from top to bottom, items 111 to 118), where i denotes an integer between 1 and 8. Similarly, each check node carries a value, for example, c_j for the j^th check node (counted from top to bottom, items 121 to 124), where j denotes an integer between 1 and 4. An edge in a bipartite graph may also carry a value, for example, a_ij for the edge connecting the i^th variable node and the j^th check node. For example, edge a₁₁ 131 represents the value associated with the component of v₁ in c₁, and edge a₈₄ 132 represents the value associated with the component of v₈ in c₄.

In the example shown in FIG. 1, each c_j is related to a subset of {v₁, v₂, . . . , v₈} through a linear equation (e.g. 141 to 144) defined over a finite field A. For example, as shown in FIG. 1,

c₁ = a₁₁v₁ + a₂₁v₂ + a₄₁v₄ + a₅₁v₅ + a₇₁v₇ + a₈₁v₈.

To see how c₁ is calculated, let us suppose that v₁=6, v₂=3, v₄=5, v₅=10, v₇=1 and v₈=0, and that a_ij=1 for all (i, j) pairs appearing in the above equation.

Suppose also that the finite field A is GF(4) consisting of 4 elements 0, 1, 2, and 3. To perform the arithmetic in GF(4), we need to first map the integer set from which v₁, v₂, . . . , v₈ are drawn to A (equivalently, GF(4)). A common mapping method is to use modular arithmetic. Hence, v₁=2 (mod 4), v₂=3 (mod 4), v₄=1 (mod 4), v₅=2 (mod 4), v₇=1 (mod 4), and v₈=0 (mod 4). Then according to finite field arithmetic defined over GF(4),

c₁=(2+3)+(1+2)+(1+0)=1+3+1=3.

Note that GF(4) arithmetic field operations (addition ‘+’ and multiplication ‘*’) are defined by the following two tables.

+ | 0 1 2 3
--+--------
0 | 0 1 2 3
1 | 1 0 3 2
2 | 2 3 0 1
3 | 3 2 1 0

* | 0 1 2 3
--+--------
0 | 0 0 0 0
1 | 0 1 2 3
2 | 0 2 3 1
3 | 0 3 1 2
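
As a minimal sketch of the worked example above, the two GF(4) tables can be stored as lookup arrays and the check value c₁ recomputed. The names GF4_ADD, GF4_MUL and check_value are illustrative and do not appear in the source; the mod-4 mapping and a_ij=1 follow the example.

```python
# GF(4) addition and multiplication tables, copied from the two tables above.
GF4_ADD = [
    [0, 1, 2, 3],
    [1, 0, 3, 2],
    [2, 3, 0, 1],
    [3, 2, 1, 0],
]
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]


def check_value(edge_values, node_values):
    """Sum a_ij * v_i over the edges of one check node, in GF(4)."""
    total = 0
    for a, v in zip(edge_values, node_values):
        total = GF4_ADD[total][GF4_MUL[a][v]]
    return total


# Raw values of v1, v2, v4, v5, v7, v8 from the example, mapped into GF(4) by mod 4.
raw = [6, 3, 5, 10, 1, 0]
mapped = [v % 4 for v in raw]  # [2, 3, 1, 2, 1, 0]
c1 = check_value([1] * len(mapped), mapped)
print(c1)  # 3
```

The result agrees with the hand calculation c₁=3.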

Suppose that each symbol in the field A is represented by log(M) bits, where “log” denotes the logarithm function to base 2 throughout this document. Let n denote the number of variable nodes (the length of the input sequence x) and m the number of check nodes (the length of the syndrome sequence z). Since m is generally much smaller than n, compression is achieved. The resulting compression rate in bits per symbol is (m log(M))/n. In practice, the encoder 420 selects m and M according to the length of the input sequence x and the expected compression rate estimated from the statistical model 240. The estimation is performed for each frame to be encoded. Note that once M is chosen, the encoder 420 knows how to choose m to achieve the desired compression rate.
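
The rate formula can be illustrated with a short sketch. The function names compression_rate and choose_m are hypothetical, and rounding m up to the next integer is an assumption; the source states only that m follows from M and the desired rate.

```python
import math


def compression_rate(m, M, n):
    """Bits per input symbol when n symbols are sent as m syndrome symbols over a field of size M."""
    return m * math.log2(M) / n


def choose_m(n, M, target_rate):
    """Smallest m whose rate reaches target_rate (rounding up is an assumption)."""
    return math.ceil(target_rate * n / math.log2(M))


m = choose_m(n=1000, M=4, target_rate=0.5)
print(m, compression_rate(m, 4, 1000))  # 250 0.5
```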

A video sequence 205 typically consists of integer numbers (ranging, say, from 0 to 255). After DCT transform 210 and quantization 220, quantized sequence x 225 is another sequence of integer numbers (typically with a larger range). This sequence is encoded by using the above described method. Say, for example, a number 13 is to be encoded and the field size is 4. We can use modulo 4 operation to convert 13 into 1. The M-ary encoding operations 420 are then performed arithmetically in finite field GF(4) to produce syndrome sequence z 425. Compression is achieved since the number of bits used to represent the syndrome sequence z is often much less than the number of bits used to represent the input quantized sequence.

In this invention, the field size M to be used to generate an M-ary code is estimated from the statistical model 240. Specifically, suppose that the alphabet of the quantized sequence x is an integer set. The estimation process seeks a subset of M integers such that the probability (determined by the statistical model 240) of x containing symbols outside the size M subset is smaller than a prescribed threshold. To facilitate computation in different computing platforms, one may also want to constrain the selection of M to satisfy certain properties, for example, being an integral power of two.
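
One possible reading of this estimation step is sketched here under stated assumptions: the statistical model is reduced to a per-symbol probability table (the hypothetical dict pmf), symbols are treated as independent and identically distributed, the subset is taken to be the M most probable symbols, and M is restricted to powers of two. None of these choices is prescribed by the source.

```python
def estimate_field_size(pmf, n, threshold):
    """Return the smallest power-of-two M such that the M most probable
    symbols leave an out-of-subset probability below `threshold`.

    pmf: dict mapping integer symbol -> probability (hypothetical stand-in
         for the statistical model).
    n:   sequence length; symbols are assumed i.i.d. (an assumption).
    """
    probs = sorted(pmf.values(), reverse=True)
    M = 2
    while True:
        p_inside = min(sum(probs[:M]), 1.0)
        # Probability that at least one of the n symbols falls outside the subset.
        p_any_outside = 1.0 - p_inside ** n
        if p_any_outside < threshold or M >= len(probs):
            return M
        M *= 2


pmf = {0: 0.5, 1: 0.2, -1: 0.2, 2: 0.04, -2: 0.04, 3: 0.01, -3: 0.01}
print(estimate_field_size(pmf, n=10, threshold=0.5))   # 4
print(estimate_field_size(pmf, n=10, threshold=0.05))  # 8
```

A tighter threshold forces a larger field, which raises the cost per syndrome symbol; the threshold therefore trades decoding reliability against rate.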

On the decoder 430 side, the improved method shown in FIG. 5 generates an M-ary code given by a bipartite graph from the statistical model 240, and then uses the same M-ary code to decode 430 quantized sequence x 435 from the sequences z 425 and y 295, the latter of which is generated for input to the decoder 430 from the previously decoded frames, for example, in the same manner as in the prior art shown in FIG. 2.

Note that, in principle, the encoder 420 and decoder 430 may be distant from one another. By matching the learning method and the data used to learn the model, we can make sure that the encoder 420 and the decoder 430 are using the same statistical model 240 and, thus, generate the same bipartite graph. Assuming a given statistical relationship, there are various ways to learn the relationship in practice. One such way is to use a portion of the video frames for the learning.

In general, an M-ary code described by a bipartite graph with n variable nodes and m check nodes can be used to encode any sequence of length n into its syndrome sequence of length m. The value of n is given by the input sequence x, whereas m is determined from the statistical model 240. For any pair (n,m), the encoder 420 and the decoder 430 generate the same bipartite graph (hence, the same code) from the statistical model 240.

The decoding rule can be either the MAP (maximum a posteriori) decoding rule or its approximations. By using the MAP decoding rule, the decoder tries to find the most probable sequence given y (that is, most probable according to the statistical model 240) whose syndrome sequence is equal to z in the generated bipartite graph.
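
For very short sequences the MAP rule can be applied by exhaustive search, as in the following sketch. The toy graph, the function names, and the per-symbol model (each symbol of x equals the corresponding symbol of y with probability p_same, and is otherwise uniform over the remaining values) are assumptions made for illustration; all edge values are taken to be 1, so each check value is the GF(4) sum of its connected variable nodes.

```python
from itertools import product


def syndrome(v, checks):
    """GF(4) syndrome with all edge values equal to 1: each check is the
    XOR of its connected variable nodes (GF(4) addition is bitwise XOR)."""
    out = []
    for nodes in checks:
        s = 0
        for i in nodes:
            s ^= v[i]
        out.append(s)
    return out


def map_decode(y, z, checks, p_same=0.9, M=4):
    """Exhaustive MAP search; feasible only for very short sequences."""
    p_diff = (1.0 - p_same) / (M - 1)
    best, best_p = None, -1.0
    for cand in product(range(M), repeat=len(y)):
        if syndrome(cand, checks) != z:
            continue  # keep only sequences whose syndrome equals z
        p = 1.0
        for c, s in zip(cand, y):
            p *= p_same if c == s else p_diff
        if p > best_p:
            best, best_p = list(cand), p
    return best


checks = [[0, 1, 2], [1, 2, 3]]  # toy graph: 4 variable nodes, 2 check nodes
x = [1, 2, 3, 0]
z = syndrome(x, checks)          # [0, 1]
y = [1, 2, 3, 1]                 # side information, one symbol off
print(map_decode(y, z, checks))  # [1, 2, 3, 0]
```

The search visits all M^n candidate sequences, so it is practical only for very small n.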

Enforcing the MAP decoding rule is computationally expensive. An alternative is to approximate the MAP decoding rule by using an iterative decoding process. The process can be described as follows. Let G denote the bipartite graph describing the M-ary code 420 generated from the statistical model 240.

Step 1: Initialize a counter k as 1. Let v^(k) denote the sequence in the variable nodes of G at the k^th iteration. Initialize v^(1) as y.

Step 2: Calculate the syndrome sequence of v^(k) in G. If the syndrome sequence is equal to z, the process terminates; otherwise, the decoder finds a new sequence v^(k+1) according to the statistical model and G. One possible method in determining the new sequence v^(k+1) is the well-known belief propagation method generalized to a finite field of size M.

Step 3: Increase the counter k by one. If k is greater than a prescribed threshold, the process terminates; otherwise, go to Step 2.

At the end of the above iterative process, the decoder outputs x as v^(k). Note that if the above iterative process terminates with a sequence v^(k) whose syndrome sequence is NOT z in G, the decoder detects a decoding failure.
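
The control flow of Steps 1 to 3 can be sketched as follows. The update step is passed in as a function; the greedy_update shown here is a hypothetical single-symbol search used only to make the example runnable, and it is not belief propagation and does not consult the statistical model. All edge values are assumed to be 1, so each check value is the GF(4) sum of its connected variable nodes.

```python
def syndrome(v, checks):
    """GF(4) syndrome with all edge values 1 (GF(4) addition is bitwise XOR)."""
    out = []
    for nodes in checks:
        s = 0
        for i in nodes:
            s ^= v[i]
        out.append(s)
    return out


def greedy_update(v, z, checks, M=4):
    """Hypothetical stand-in for belief propagation: try every single-symbol
    change and keep the one whose syndrome disagrees with z in the fewest places."""
    best, best_miss = list(v), len(z) + 1
    for i in range(len(v)):
        for s in range(M):
            if s == v[i]:
                continue
            cand = list(v)
            cand[i] = s
            miss = sum(a != b for a, b in zip(syndrome(cand, checks), z))
            if miss < best_miss:
                best, best_miss = cand, miss
    return best


def iterative_decode(y, z, checks, update, max_iters):
    """Steps 1-3: returns (sequence, success flag)."""
    k, v = 1, list(y)                     # Step 1
    while True:
        if syndrome(v, checks) == z:      # Step 2: syndrome matches, stop
            return v, True
        v = update(v, z, checks)          # Step 2: otherwise find v^(k+1)
        k += 1                            # Step 3
        if k > max_iters:
            break
    return v, syndrome(v, checks) == z    # failure if the syndrome is not z


checks = [[0, 1, 2], [1, 2, 3]]  # toy graph: 4 variable nodes, 2 check nodes
z = [0, 1]
y = [1, 2, 3, 1]
print(iterative_decode(y, z, checks, greedy_update, max_iters=5))  # ([1, 2, 3, 0], True)
```

If the update never reaches a sequence whose syndrome is z, the function returns the last sequence together with a False flag, which corresponds to the decoding failure described above.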

It should be noted that use of the bi-partite graph provides a computationally effective methodology for iterative decoding, and is therefore the best mode of implementing the invention. However, in principle, the invention is also operable for any M-ary linear code H, where H is an m by n matrix from a finite field with M elements and where the encoder calculates z=Hx.
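
The matrix form z=Hx can be illustrated over GF(4) with a small sketch. The matrix H, the input x, and the function name gf4_syndrome are made up for illustration; the multiplication table is the GF(4) table given earlier in this description, and GF(4) addition is implemented as bitwise XOR, which matches the addition table.

```python
# GF(4) multiplication table (same values as the table in the description).
GF4_MUL = [
    [0, 0, 0, 0],
    [0, 1, 2, 3],
    [0, 2, 3, 1],
    [0, 3, 1, 2],
]


def gf4_syndrome(H, x):
    """Compute z = Hx over GF(4) for an m-by-n matrix H and a length-n sequence x."""
    z = []
    for row in H:
        s = 0
        for h, xi in zip(row, x):
            s ^= GF4_MUL[h][xi]  # GF(4) addition is bitwise XOR
        z.append(s)
    return z


H = [[1, 2, 0, 3],
     [0, 1, 1, 2]]  # m = 2 rows, n = 4 columns
x = [1, 2, 3, 0]
print(gf4_syndrome(H, x))  # [2, 1]
```

Each row of H plays the role of one check node, and each nonzero entry corresponds to an edge value a_ij in the bipartite graph.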

While the invention has been described in terms of a single preferred embodiment, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the appended claims.

Claims

1. A method for encoding and decoding quantized sequences in Wyner-Ziv encoded video transmissions, comprising: estimating a minimum field size from a given statistical model of a relationship between a quantized sequence of a current video frame and side information obtained from previous video frames decoded from previously encoded quantized sequences; and using the statistical model to encode a quantized sequence x into a syndrome sequence z and decode the quantized sequence x from syndrome sequence z and said side information, said encoding not having access to said side information, and said encoding and said decoding each being able to separately construct a statistical model that is the same as said given statistical model.
2. A method as in claim 1, wherein encoding a quantized sequence x further comprises: constructing from the statistical model a bi-partite graph G of variable nodes and check nodes in which an alphabet A of each check node is a field of said minimum field size; and encoding the quantized sequence into a syndrome sequence z using said bi-partite graph by feeding the quantized sequence into the variable nodes and taking the syndrome sequence z from the check nodes as output.
3. A method as in claim 1, wherein decoding syndrome sequence z further comprises: constructing from the statistical model a bi-partite graph G of variable nodes and check nodes in which an alphabet A of each check node is a field of said minimum field size; and decoding the syndrome sequence z using the bi-partite graph G and side information y obtained from previously decoded video frames.
4. The method of claim 3, wherein decoding the syndrome sequence z to reconstruct the quantized sequence further comprises: feeding the sequence y into said variable nodes; and using an iterative process to obtain from y an iterated sequence at said variable nodes whose output at said check nodes is equal to said syndrome sequence, the iterated sequence being the reconstructed quantized sequence.
5. The method of claim 4, wherein the iterative process is the maximum a posteriori (MAP) rule or its simplified approximations.
6. The method of claim 4, wherein the iterative process comprises the steps of: initializing v(k) as the prior decoded quantized sequence y for k=1; calculating a syndrome sequence z′ of v(k) in G; if the syndrome sequence z′ is equal to the syndrome sequence z, then set v(k) as the decoded quantized sequence x and terminate the iterative process; if the syndrome sequence z′ is not equal to the syndrome sequence z, and if k is less than a prescribed threshold, then generate a new v(k) for k=k+1 and return to the calculating step; if k is greater than or equal to the prescribed iteration threshold, then terminate the iterative process as failed.
7. The method of claim 6, wherein a new v(k) for k=k+1 is determined according to the statistical model and G.
8. The method of claim 6, wherein a new v(k) for k=k+1 is determined by the belief propagation method generalized to a finite field of size M.
9. The method of claim 1, wherein field size M is estimated by determining a subset of an alphabet for the quantized sequence such that a probability, according to the statistical model, of the quantized sequence containing a symbol outside the subset is smaller than a prescribed threshold.
10. The method of claim 9, wherein the prescribed probability threshold is the same whether estimation of field size M is done by the encoder or the decoder.
11. A system for encoding and decoding quantized sequences in Wyner-Ziv encoded video transmissions, comprising: means for estimating a minimum field size from a given statistical model of a relationship between a quantized sequence of a current video frame and side information obtained from previous video frames decoded from previously encoded quantized sequences; and means for using the statistical model to encode a quantized sequence x into a syndrome sequence z and decode the quantized sequence x from syndrome sequence z and said side information, said encoding not having access to said side information, and said encoding and said decoding each being able to separately construct a statistical model that is the same as said given statistical model.
12. A system as in claim 11, wherein encoding a quantized sequence x further comprises: means for constructing from the statistical model a bi-partite graph G of variable nodes and check nodes in which an alphabet A of each check node is a field of said minimum field size; and means for encoding the quantized sequence into a syndrome sequence z using said bi-partite graph by feeding the quantized sequence into the variable nodes and taking the syndrome sequence z from the check nodes as output.
13. A system as in claim 11, wherein decoding syndrome sequence z further comprises: means for constructing from the statistical model a bi-partite graph G of variable nodes and check nodes in which an alphabet A of each check node is a field of said minimum field size; and means for decoding the syndrome sequence z using the bi-partite graph G and side information y obtained from previously decoded video frames.
14. The system of claim 13, wherein the means for decoding the syndrome sequence z to reconstruct the quantized sequence further comprises: means for feeding the sequence y into said variable nodes; and means for using an iterative process to obtain from y an iterated sequence at said variable nodes whose output at said check nodes is equal to said syndrome sequence, the iterated sequence being the reconstructed quantized sequence.
15. The system of claim 14, wherein the means for using an iterative process use the maximum a posteriori (MAP) rule or its simplified approximations.
16. The system of claim 14, wherein the means for using an iterative process further comprise: means for initializing v(k) as the prior decoded quantized sequence y for k=1; means for calculating a syndrome sequence z′ of v(k) in G; wherein, if the syndrome sequence z′ is equal to the syndrome sequence z, then v(k) is set as the decoded quantized sequence x and the iterative process is terminated; if the syndrome sequence z′ is not equal to the syndrome sequence z, and if k is less than a prescribed threshold, then a new v(k) is generated for k=k+1 and said calculating means is reused; if k is greater than or equal to the prescribed iteration threshold, then the iterative process is terminated as failed.
17. The system of claim 16, wherein a new v(k) for k=k+1 is determined according to the statistical model and G.
18. The system of claim 16, wherein a new v(k) for k=k+1 is determined by the belief propagation method generalized to a finite field of size M.
19. The system of claim 11, wherein field size M is estimated by determining a subset of an alphabet for the quantized sequence such that a probability, according to the statistical model, of the quantized sequence containing a symbol outside the subset is smaller than a prescribed threshold.
20. The system of claim 19, wherein the prescribed probability threshold is the same whether estimation of field size M is done by the encoder or the decoder.
