The present invention relates to a highly-efficient image encoding method for efficiently encoding and decoding an image (a static image or a video (moving) image).
Priority is claimed on Japanese Patent Application No. 2007-281556, filed Oct. 30, 2007, the contents of which are incorporated herein by reference.
In the encoding of an image (static or video), predictive encoding is the mainstream approach, in which the pixel values of an encoding target are predicted by means of spatial or temporal prediction using previously-decoded pixels.
For example, in 4×4 block horizontal intra prediction in H.264/AVC, a 4×4 block from pixel A to pixel P (described as “A . . . P”, similar forms will be used in other descriptions) as an encoding target is predicted horizontally using previously-decoded adjacent pixels a . . . d on the left side, as shown below:
That is, horizontal prediction is performed as follows:
Next, the prediction residual is computed as follows:
After that, orthogonal transformation, quantization, and entropy encoding are executed so as to perform compressive encoding.
Similar operation is performed in motion-compensated prediction. That is, in 4×4 block motion compensation, a 4×4 block A′ . . . P′ as a result of prediction of A . . . P by using another frame is generated as follows:
Then, the prediction residual is computed as follows:
After that, orthogonal transformation, quantization, and entropy encoding are executed so as to perform compressive encoding.
For the upper-left position of the block (as an example), the corresponding decoder obtains the predicted value A′ and the decoded prediction residual (A−A′), and recovers the original pixel value A as the sum of these values. This is the reversible (lossless) case. Even in the irreversible (lossy) case, the decoder obtains a decoded prediction residual (A−A′+Δ), where Δ is encoding noise, and recovers (A+Δ) by adding the predicted value A′ to it.
The above explanation applies to 16 (i.e., 4×4) pixel values. Below, a simplified one-dimensional form will be shown, and the popular 8-bit pixel value is employed; the pixel value is therefore an integer within the range from 0 to 255 (i.e., one of 256 integers). Similar explanations apply to bit depths other than 8 bits.
Now it is assumed that x denotes a pixel value as an encoding target, and x′ denotes its predicted value. Since x′ is close to x, the prediction residual (x−x′) lies within the range −255 . . . 255 and concentrates at values in the vicinity of 0, so that large absolute values are relatively rare. This relationship is shown in a graph of
Since the information amount of a biased distribution is smaller than that of a uniform distribution, the residual can be compressed efficiently after the prediction. Conventionally, highly efficient compression is achieved by exploiting such a biased distribution.
Non-Patent Document 1 relates to vector encoding which is described in embodiments of the present invention explained later, and discloses a pyramid vector quantization technique where representative vectors are regularly positioned within a space.
Non-Patent Document 2 discloses a vector quantization technique based on an LBG algorithm for optimizing representative vectors of vector quantization by means of learning, so as to irregularly arrange representative vectors in a space.
Problem to be Solved by the Invention
In conventional techniques, assume that the predicted value x′=255. Since the pixel value x belongs to 0 . . . 255, the prediction residual x−x′ belongs to −255 . . . 0; that is, it must be 0 or smaller.
Therefore, in the relevant prediction residual distribution, almost the entire right half (i.e., the positive direction) is unused. Describing this qualitatively while disregarding the end portions of the distribution (which have a very small occurrence probability): since the distribution is left-right symmetric, 1 bit is required to indicate "right or left" (e.g., 0 for right and 1 for left). When the right-half distribution is unused (i.e., those values would exceed the possible value range), this 1 bit is inherently needless. Likewise, when the predicted value x′=0, almost the entire left half of the prediction error distribution is unused, and the "1 bit" is again unnecessary.
The above relationships are shown in
For a qualitative description, pw(d) is defined as a probability distribution extending in both the left and right directions.
Actually, values in the right half never occur. Therefore, the true distribution pc of the error d is twice pw on the remaining half:
pc(d) = 2pw(d) (when d ≦ 0)
pc(d) = 0 (when d > 0) (2)
When regarding pw as the occurrence probability, the average entropy Hw is estimated as follows:
Hw = −Σd pw(d) log2 pw(d)
The average entropy computed using the true occurrence probability pc is as follows:
Hc = −Σd pc(d) log2 pc(d)
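The roughly one-bit gap between the two entropies can be checked numerically. The following sketch is an illustration (the toy symmetric distribution pw is an assumption, not data from the document); when the unused right half is folded away as in formula (2), the entropy drops by exactly 1 bit:

```python
import math

# Toy symmetric residual distribution pw(d) over d in {-4..-1, 1..4}
# (illustrative values; any symmetric distribution without d=0 works).
pw = {-1: 0.30, 1: 0.30, -2: 0.15, 2: 0.15,
      -3: 0.04, 3: 0.04, -4: 0.01, 4: 0.01}
assert abs(sum(pw.values()) - 1.0) < 1e-12

def entropy(dist):
    """Average entropy in bits: -sum p * log2 p."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# True distribution pc(d): the unusable right half is dropped and the
# probability of each remaining value doubles, as in formula (2).
pc = {d: 2 * p for d, p in pw.items() if d <= 0}

Hw = entropy(pw)
Hc = entropy(pc)
print(f"Hw = {Hw:.3f} bit, Hc = {Hc:.3f} bit, saving = {Hw - Hc:.3f} bit")
```

For a symmetric pw, the saving is exactly the 1 bit of "right or left" information discussed above.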
However, in conventional encoding, only a difference (x-x′) is targeted, and thus it is impossible to delete the useless “1 bit”.
This is because:
In light of the above problems, an object of the present invention is to improve the encoding efficiency of predictive encoding. Instead of computing the difference between an original pixel value and its predicted value when performing temporal or spatial prediction (as in conventional techniques), the original pixel value itself is encoded based on a distribution conditioned on the predicted value, in consideration of the above-described "excess" part of the distribution.
Means for Solving the Problem
The present invention is applied to predictive encoding, in which the pixel value of an encoding target (possibly associated with a pixel block) is encoded by using a predicted value generated by means of spatial or temporal prediction (motion compensation) using a previously-decoded image. In order to solve the above problems, a main feature of the present invention is to encode the value of an encoding target pixel (or pixel block) by using a distribution conditioned on the predicted value of the relevant pixel value, in consideration of the upper and lower limits of possible pixel values.
The upper and lower limits of possible values of the pixel value correspond to upper and lower limits of possible values of a pixel in a digital image. In an 8-bit image which is most popularly used, the upper and lower limits are 255 and 0, while they are 1023 and 0 for a 10-bit image.
It is safe to assume that no pixel having a value larger than the upper limit (e.g., 2000) or smaller than the lower limit (e.g., −1) is present in the original image. This is what "consideration of the upper and lower limits" means, and the present invention improves the encoding efficiency by using this fact.
The conditional distribution of a pixel value given its predicted value is a probability distribution which indicates what value the original pixel value x takes when a predicted value x′ has been obtained for the pixel.
Here, "conditional" means under the condition that the predicted value is x′.
In mathematical notation, the above is represented by Pr(x|x′), which generally has a bell shape whose peak is at x′.
The distribution of x under the condition that the predicted value is x′ and (of course) the distribution of x without such a condition are always included within a range from the lower to the upper limits (e.g., corresponding to integers from 0 to 255 for an 8-bit image) of the relevant pixel.
Additionally, when performing the prediction of the present invention in block units, vector quantization can be used for encoding a conditional distribution of a pixel block value obtained by block prediction.
Effect of the Invention
In accordance with the present invention, when processing a difference between a predicted value and an original pixel value, no “absence of the predicted value as important information” in the conventional method occurs, but the predicted value is fully used for the encoding, thereby encoding an image (static image or video image) with a reduced amount of code.
The general concept of the present invention will be concretely and simply explained.
If it is not known which of the four values {−2, −1, 1, 2} a signal d has (equal probability of 25% is assumed), 2 bits are necessary for encoding this signal.
If it is known that the signal d is positive, only the two values {1, 2} are possible, and encoding can be performed using 1 bit.
Similar explanations can be applied to predictive encoding for a static or video image.
When an image signal x (0≦x≦255) has a predicted value x′, the distribution of the prediction error d (=x−x′) varies in accordance with the predicted value x′.
For example, if x′=0, then 0≦d; that is, d never has a negative value. If x′=255, then d≦0; that is, d never has a positive value (refer to
As described above, before the encoding or decoding, the range where d is present can be narrowed by referring to the predicted value x′, which should improve the encoding efficiency.
The process of narrowing the presence range of d is equivalent to restricting the range of "x′+d" (for an 8-bit image) to 0 . . . 255.
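The narrowing can be sketched as follows (an illustrative sketch, not the patent's reference implementation): given a predicted value x′, only residuals d for which x′+d stays within 0 . . . 255 are admissible, so the residual alphabet shrinks from 511 symbols to 256, and the admissible interval slides with x′.

```python
def admissible_residuals(x_pred, bit_depth=8):
    """Residuals d for which x_pred + d is a valid pixel value."""
    upper = (1 << bit_depth) - 1          # 255 for an 8-bit image
    return [d for d in range(-upper, upper + 1)
            if 0 <= x_pred + d <= upper]

# The full residual alphabet -255..255 has 511 symbols, but for any
# predicted value only 256 of them are admissible; the interval slides
# with x' (all non-negative at x'=0, all non-positive at x'=255).
for xp in (0, 128, 255):
    adm = admissible_residuals(xp)
    print(xp, len(adm), min(adm), max(adm))
```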
This process also corresponds to a clipping step 105 in an encoding method shown in a flowchart of
In addition, optimum representative vectors can be adaptively designed by designing a representative vector for each block as a prediction unit (this process corresponds to a representative vector design step 106 in the encoding method shown in the flowchart of
After narrowing the encoding target signal as described above, an ordinary encoding process is executed so that a code shorter (i.e., having a higher level of encoding efficiency) than a conventional code is output (see a quantization index encoding step 109 in the flowchart of
Pixel value prediction is employed, as motion compensation or intra prediction, in existing encoding techniques such as MPEG-1, MPEG-2, MPEG-4, and H.264/AVC (see step 101 in the encoding method shown in the flowchart of
In the existing prediction encoding techniques, encoding is performed on the assumption that the prediction error may always be either positive or negative (this concept is shown in
Next, the conceptual function of embodiments of the present invention will be explained in detail.
An example will be explained in which prediction is performed in pixel-block units and vector quantization is applied to the encoding of the conditional distribution of the pixel-block value obtained by the block prediction. The basic concept, namely encoding a pixel having a predicted value x′ by using a probability distribution which indicates what value the original pixel value x actually has, also applies to encoding performed in pixel units.
When the Distance Measure is L∞ Norm
An example of quantization and encoding in a two-dimensional space will be explained with reference to
For example, point (0,3) and point (−2,−3) have the same L∞ norm.
In
In
It is assumed here that the occurrence probability of “L∞ norm=2”, to which the difference vector (−2, 2) (corresponding to the original pixel values) belongs, is 0.3.
Since there are 16 representative vectors whose L∞ norm is 2, the amount of information required for encoding the original pixel values is as follows:
−log2 0.3 + log2 16 = 5.737 [bit] (9)
Next, the amount of code generated when the prediction error is not computed in the present invention will be evaluated. The concept thereof is shown in
In
The center is the predicted value (x1′, x2′)=(255, 100). Similar to the above explanation, it is assumed that the probability that the L∞ norm from the center is 2 is 0.3.
Since 9 representative vectors belong to the area satisfying the above, the information amount required for encoding the original pixel values is as follows:
−log2 0.3 + log2 9 = 4.907 [bit] (10)
which is lower by 0.83 bit than the case of computing the relevant difference (see formula (9)).
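The two code-length figures can be reproduced by brute force. The sketch below is an illustration using the numbers from the text (the 0.3 probability of the norm-2 shell is the assumed value given above): it enumerates the 2-D offsets with L∞ norm 2, discards those pushed past the upper bound by the predicted value (255, 100), and evaluates formulas (9) and (10):

```python
import math

UPPER = 255
center = (255, 100)           # predicted value (x1', x2') from the text
K = 2                         # L-infinity norm of the shell in the example
p_norm = 0.3                  # assumed probability that the norm equals 2

# All integer offsets on the L-infinity sphere of radius K.
shell = [(dx1, dx2)
         for dx1 in range(-K, K + 1)
         for dx2 in range(-K, K + 1)
         if max(abs(dx1), abs(dx2)) == K]

# Offsets that keep both coordinates inside the valid range 0..255.
valid = [(dx1, dx2) for (dx1, dx2) in shell
         if 0 <= center[0] + dx1 <= UPPER and 0 <= center[1] + dx2 <= UPPER]

bits_conventional = -math.log2(p_norm) + math.log2(len(shell))  # formula (9)
bits_proposed = -math.log2(p_norm) + math.log2(len(valid))      # formula (10)
print(len(shell), len(valid))                                # 16 and 9
print(round(bits_conventional, 3), round(bits_proposed, 3))  # 5.737 and 4.907
```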
The Number of Representative Vector Points on a Plane Having a Constant Norm
When L denotes the dimension and K denotes the norm, the number N(L,K) of representative vectors is given by the following formula:
N(L,K) = (2K+1)^L − (2K−1)^L (11)
In the example of
N(2,4) = 9^2 − 7^2 = 81 − 49 = 32
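Formula (11) can be checked against direct enumeration: the points whose L∞ norm is exactly K are those in the cube of side 2K+1 minus those in the cube of side 2K−1. An illustrative sketch:

```python
from itertools import product

def shell_count_formula(L, K):
    """Formula (11): points with L-infinity norm exactly K in L dimensions."""
    return (2 * K + 1) ** L - (2 * K - 1) ** L

def shell_count_brute(L, K):
    """Direct enumeration of the same set of lattice points."""
    return sum(1 for v in product(range(-K, K + 1), repeat=L)
               if max(abs(c) for c in v) == K)

for L, K in [(2, 2), (2, 4), (3, 2)]:
    assert shell_count_formula(L, K) == shell_count_brute(L, K)
print(shell_count_formula(2, 4))   # 32, as in the text
```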
In the encoding, the following information amount is required to specify representative vectors after the norm is specified.
log2 N(L,K)[bit] (12)
In addition,
0 ≦ nKi, pKi ≦ K (i = 1 . . . L)
When there is no excess part, the relationship is:
Ki(upper limit, lower limit)≡K
Here the number of white circles (∘) is indicated by:
N′(L, K, nK1, . . . , nKL, pK1, . . . , pKL)
The number is computed by:
In the above formula:
f(K,K′) = K′ − 1 (when K′ = K)
f(K,K′) = K′ (when K′ < K) (14)
The degree of entropy reduction in accordance with the method of the present invention is evaluated by:
log2 32 − log2 21 = 0.608 [bit]
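The 32 → 21 reduction can also be reproduced by enumeration. In the sketch below it is assumed, purely for illustration (the figure itself is not reproduced here), that the predicted value leaves upward room of only pK1 = 2 in the first dimension; this removes 11 of the 32 shell points and yields the 0.608-bit saving:

```python
import math
from itertools import product

K, pK1 = 4, 2   # assumption: upward room of only 2 in dimension 1

# Integer points on the L-infinity sphere of radius K in 2 dimensions.
shell = [v for v in product(range(-K, K + 1), repeat=2)
         if max(abs(c) for c in v) == K]
# Points not pushed past the upper bound in dimension 1.
usable = [v for v in shell if v[0] <= pK1]

print(len(shell), len(usable))                                    # 32 and 21
print(round(math.log2(len(shell)) - math.log2(len(usable)), 3))   # 0.608
```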
Next,
Therefore, the area of the excess part is considerably reduced.
When the Distance Measure is L1 Norm
Below, a so-called pyramid vector quantization case, in which the distance from the origin corresponds to an L1 norm, will be explained with reference to
In
In this case, the entropy obtained when computing no difference between the original pixel value and the predicted value (in the present invention) is lower than the case of computing the difference (in conventional methods) by:
log2 16 − log2 10 = 0.678 [bit] (15)
The number of Representative Vector Points on a Plane Having a Constant Norm
Similar to the above example, N(L,K) indicates the number of representative vectors whose L1 norm is K in L-dimensional pyramid vector quantization, and is computed in a recurrence form as follows (see Non-Patent Document 1):
N(L,K) = N(L−1,K) + N(L,K−1) + N(L−1,K−1) (16)
with N(L,0) = 1 and N(0,K) = 0 for K ≧ 1.
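The pyramid-vector-quantization recurrence from Non-Patent Document 1 can be sketched directly (the boundary conditions used below are the standard ones for this recurrence):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def n_pyramid(L, K):
    """Number of integer points with L1 norm exactly K in L dimensions
    (pyramid vector quantization recurrence, Non-Patent Document 1)."""
    if K == 0:
        return 1            # only the origin has norm 0
    if L == 0:
        return 0            # no coordinates left to carry a nonzero norm
    return n_pyramid(L - 1, K) + n_pyramid(L, K - 1) + n_pyramid(L - 1, K - 1)

# 2-D, K = 4: the 16 diamond points used in the text's L1 example.
print(n_pyramid(2, 4))   # 16
```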
Next, the number of vectors in the "excess part" is considered. For example, five "excess" representative vectors (●) are generated due to pK1=1, and one "excess" representative vector is generated due to pK2=3.
First, for simplification, only pK1=1 has the relationship of pK1<K, and the others have the following relationship:
nK1 = K, nKi = pKi = K (i = 2, . . . , L)
In such a case, the number M(L,K) of “excess” representative vectors (●) is computed using N in formula (16), as follows:
For L=2, K=4, and pK1=1 in
M(2, 4−1−1) = M(2, 2) = 5
A similar method can be applied to another dimension. For example, if pK2=3, the following computation can be performed:
M(2, 4−3−1) = M(2, 0) = 1 (17)
Accordingly, the number of the white circles (o) can be computed as “N−M” where N is the total number of the representative vectors, and M is the number of the excess representative vectors.
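The N − M bookkeeping for this diamond example can be verified by enumeration. The sketch below is illustrative; pK1 = 1 and pK2 = 3 are the upward limits read off the text, and the five plus one excess points leave the ten usable representative vectors behind formula (15):

```python
import math
from itertools import product

K = 4
pK = (1, 3)   # upward room in dimensions 1 and 2, as stated in the text

# Integer points on the L1 sphere (diamond) of radius K in 2 dimensions.
shell = [v for v in product(range(-K, K + 1), repeat=2)
         if abs(v[0]) + abs(v[1]) == K]
excess_dim1 = [v for v in shell if v[0] > pK[0]]   # 5 excess points
excess_dim2 = [v for v in shell if v[1] > pK[1]]   # 1 excess point
usable = [v for v in shell if v[0] <= pK[0] and v[1] <= pK[1]]

print(len(shell), len(excess_dim1), len(excess_dim2), len(usable))
# 16, 5, 1, 10
print(round(math.log2(len(shell)) - math.log2(len(usable)), 3))  # 0.678
```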
In order to accurately compute the “excess” amount as described above, the following conditions should be satisfied:
As a specific example, in
If it is assumed that the shaded part is defined as shown in
In such a case, the “excess” amount is not accurately computed.
In Ordinary Vector Quantization
In ordinary vector quantization using a well-known LBG algorithm (see Non-Patent Document 2) in which representative vectors are not regularly arranged, the present invention is performed as described below.
Since the design is of course performed within the range of 0 . . . 255 for each dimension (i.e., 0≦x1, x2≦255), no “excess” representative vectors (as generated in the conventional method in
Below, an embodiment of the present invention will be explained, in which prediction is performed in pixel-block units, and vector quantization is applied to the encoding of the conditional distribution of a pixel-block value obtained by the block prediction.
In the present embodiment, representative vectors for the vector quantization are generated based on data for learning, which is prepared in advance.
Here, only learning data associated with the predicted value x′ of an encoding target block may be used. However, the number of such data items is small. Therefore, the difference from the original pixel value (i.e., x−x′) may be stored in advance, and a value obtained by adding the predicted value to the difference may be used in the learning.
In a pixel value prediction step 101, pixel value prediction of the encoding target block is performed by applying motion compensation or intra prediction to each block as a unit, thereby obtaining a predicted value 102 (vector quantity).
In a shift step 104, the predicted value is added (shifted) to a differential value (separately stored) in difference distribution data 103 (vector quantity). In the next clipping step 105, each vector element is clipped to be within a range of 0 . . . 255. The clipped data functions as original data for learning.
In a representative vector design step 106, representative vectors are designed using the original data for learning, by means of the LBG algorithm or the like (thereby obtaining a result as shown in
In the next vector quantization step 107, an original pixel value 108 (vector quantity) of the encoding target block is associated with a representative vector closest to the original pixel value.
In the next quantization index encoding step 109, the index information of the obtained representative vector is encoded based on the corresponding occurrence probability, by means of entropy encoding such as arithmetic encoding. The obtained code is output, and the operation is completed.
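Steps 101 to 109 can be sketched end-to-end. The code below is an illustrative toy, not the patent's reference implementation: the two-element blocks, the synthetic Gaussian difference distribution, the codebook size, and the plain k-means-style LBG loop are all assumptions made for the sketch.

```python
import random

random.seed(0)
UPPER = 255

def design_codebook(samples, size, iters=20):
    """Minimal LBG/k-means style codebook design on 2-D training vectors
    (step 106); a sketch, not a full LBG implementation with splitting."""
    codebook = random.sample(samples, size)
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for s in samples:
            i = min(range(size),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(s, codebook[j])))
            cells[i].append(s)
        codebook = [tuple(sum(c) / len(cell) for c in zip(*cell)) if cell else cb
                    for cell, cb in zip(cells, codebook)]
    return codebook

# Difference distribution data 103, gathered beforehand (synthetic here).
diffs = [(random.gauss(0, 8), random.gauss(0, 8)) for _ in range(500)]

# Steps 101, 104, 105: predict, shift the stored differences by the
# predicted value, and clip each element into 0..255.
predicted = (250.0, 100.0)
training = [tuple(min(max(p + d, 0), UPPER)
                  for p, d in zip(predicted, diff)) for diff in diffs]

# Step 106: design representative vectors from the shifted, clipped data.
codebook = design_codebook(training, size=8)

# Step 107: quantize the original block to the nearest representative vector.
original = (253.0, 104.0)
index = min(range(len(codebook)),
            key=lambda j: sum((a - b) ** 2
                              for a, b in zip(original, codebook[j])))
print("chosen index:", index)

# Because training data were clipped first, every representative vector
# stays inside the valid range: no "excess" vectors are designed.
assert all(0 <= c <= UPPER for vec in codebook for c in vec)
```

Step 109 would then entropy-code `index` according to its occurrence probability (e.g., with an arithmetic coder), which is omitted here.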
The function of the encoding process shown in
If each value in this range has the same occurrence probability, each value appears with a probability of 1/256 (see
However, if a predicted value x′ of the original pixel value x has been obtained, the probability distribution for possible values of the original pixel value x can be a non-equal probability distribution based on a distribution of known prediction error values. Based on this feature, the present method reduces the encoding cost.
The prediction error value (x−x′), i.e., the difference between the original pixel value x and the predicted value x′, lies within the range from −255 to 255. The distribution of this difference can be obtained by performing a predictive-encoding experiment on many sample images, and the difference distribution data is accumulated and stored in advance.
The difference distribution is the distribution of the frequency or the probability of each prediction error value, and an example thereof is shown in
In order to encode the original pixel value x, the predicted value x′ is computed in the pixel value prediction step 101.
In the shift step 104, the predicted value x′ is added (shifted) to each differential value in the difference distribution data 103, that is, to each prediction error value x−x′ on the horizontal axis of the difference distribution shown in
The transformed result corresponds to a probability distribution of possible values of the original pixel value x when the predicted value x′ is known.
In the distribution of
When encoding the original pixel value x based on the distribution shown in
Vector quantization is an example of efficient encoding under such a probability distribution. Furthermore, in the present embodiment, based on a probability distribution as shown in
An image original signal and a previously-decoded image signal are input through a signal terminal 300.
An original pixel value of an encoding target block is stored in an original pixel value storage memory 306.
In a pixel value predictor 301, pixel value prediction of the encoding target block is performed by means of motion compensation, intra prediction, or the like, executed in block units, thereby obtaining a predicted value (vector quantity), which is stored in a predicted value storage memory 302.
In an adder and clipper 304, a difference distribution data vector, which has been separately stored in a difference distribution storage memory 303, is added to the predicted value, and each element of the resulting vector is clipped to be within the range 0 . . . 255. This functions as the original data for learning.
In a representative vector designer 305, representative vectors are designed using the original data for learning, by means of the LBG algorithm or the like.
Next, the original pixel value (vector quantity) of the encoding target block, which has been stored in the memory 306, is associated by a vector quantizer 307 with the representative vector closest to the original pixel value.
In a quantization index encoder 308, the index information of the obtained representative vector is encoded based on the occurrence probability thereof, by means of entropy encoding such as arithmetic encoding. The obtained code is output through an output terminal 309, and the operation is completed.
In a pixel value prediction step 401, pixel value prediction of the encoding target block is performed by applying motion compensation or intra prediction to each block as a unit, thereby obtaining a predicted value 402 (vector quantity).
In an addition step 404, a differential value vector, which has been separately stored, is added to the predicted value. In the next clipping step 405, each vector element is clipped to be within a range of 0 . . . 255. The clipped data functions as original data for learning.
In a representative vector design step 406, representative vectors are designed using the original data for learning, by means of the LBG algorithm or the like.
Based on the occurrence probability of index information of the obtained representative vector, the relevant index is decoded in the quantization index decoding step 407.
In the next vector inverse-quantization step 408, a representative vector value corresponding to the index is obtained. The obtained value is output, and the relevant operation is completed.
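The decoding side mirrors the encoder: because the difference distribution data and the predicted value are available to both sides, the decoder can rebuild the identical codebook (steps 401 to 406) and then simply look the decoded index up (step 408). A minimal sketch of the inverse quantization, with a fixed illustrative codebook standing in for the one the decoder would rebuild:

```python
def inverse_quantize(codebook, index):
    """Step 408: map a decoded quantization index back to its
    representative vector, i.e., the decoded block value."""
    return codebook[index]

# Stand-in for the codebook the decoder rebuilds in steps 401-406
# (the values are illustrative assumptions).
codebook = [(248.0, 96.0), (252.0, 101.0), (255.0, 108.0)]
print(inverse_quantize(codebook, 1))   # (252.0, 101.0)
```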
Since the block structure of a decoding apparatus of the present embodiment can be easily inferred from the explanation of the block diagram of the encoding apparatus in
Basically, the decoding apparatus has a structure similar to the block diagram of the encoding apparatus shown in
The above-described image or video encoding and decoding operation can also be implemented by a computer and a software program. Such a computer program may be provided by storing it in a computer-readable storage medium, or by means of a network.
Number | Date | Country | Kind
---|---|---|---
2007-281556 | Oct 2007 | JP | national

Filing Document | Filing Date | Country | Kind | 371c Date
---|---|---|---|---
PCT/JP2008/069257 | 10/23/2008 | WO | 00 | 4/12/2010

Publishing Document | Publishing Date | Country | Kind
---|---|---|---
WO2009/057506 | 5/7/2009 | WO | A
Number | Name | Date | Kind
---|---|---|---
5991449 | Kimura et al. | Nov 1999 | A
6125201 | Zador | Sep 2000 | A
6865291 | Zador | Mar 2005 | B1
6909808 | Stanek | Jun 2005 | B2
8121190 | Li | Feb 2012 | B2
20060188020 | Wang | Aug 2006 | A1
20060291567 | Filippini et al. | Dec 2006 | A1
20070036222 | Srinivasan et al. | Feb 2007 | A1
Number | Date | Country
---|---|---
1 833 256 | Sep 2007 | EP
3-145887 | Jun 1991 | JP
09-084022 | Mar 1997 | JP
2006-229623 | Aug 2006 | JP
2 162 280 | Jan 2001 | RU
2 191 469 | Oct 2002 | RU
200627969 | Aug 2006 | TW
03101117 | Dec 2003 | WO
20061095501 | Sep 2006 | WO
2007010690 | Jan 2007 | WO
Entry
---
Y. Linde, A. Buzo and R. M. Gray, "An Algorithm for Vector Quantizer Design", IEEE Transactions on Communications, vol. COM-28, no. 1, pp. 84-95, Jan. 1980.
T. R. Fischer, "A Pyramid Vector Quantizer", IEEE Transactions on Information Theory, vol. IT-32, no. 4, pp. 568-583, Jul. 1986.
D. Marpe et al., "Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620-636, Jul. 2003.
Number | Date | Country
---|---|---
20100215102 A1 | Aug 2010 | US