The present application claims priority to the corresponding Japanese Application No. 2003-125667, filed on Apr. 30, 2003, the entire contents of which are hereby incorporated by reference.
1. Field of the Invention
The present invention generally relates to conversion and encoding of signals, such as image signals, and specifically relates to generation of encoded data by conversion and encoding, and recompression of the encoded data.
2. Description of the Related Art
In conversion and encoding of an image using wavelet transform, Japanese Patent Publication No. JP 6-326990 A discloses a technique in which, when wavelet coefficients are linearly quantized, a lower frequency subband is provided with a greater number of smaller quantization steps, and a higher frequency subband with a lesser number of larger (wider) quantization steps, so that human vision properties are adequately reflected.
Further, J. Katto and Y. Yasuda, “Performance Evaluation of Subband Encoding and Optimization of its Filter Coefficients,” Journal of Visual Communication and Image Representation, vol.2, pp.303-313, December 1991, disclose a technique for minimizing the mean square value of the errors generated in a signal after inverse frequency conversion of the subbands obtained by decoding a conversion-encoded signal: when encoding, each subband is linearly quantized with a step size equal to the inverse value (or an integral multiple thereof) of the square root of the subband gain.
As for human vision properties, a measurement example of human vision sensitivity is disclosed by J. Katto and Y. Yasuda, “Performance Evaluation of Subband Encoding and Optimization of its Filter Coefficients,” Journal of Visual Communication and Image Representation, vol.2, pp.303-313, December 1991. Further, a standard document of JPEG 2000 (refer to, for example, Yasuyuki Nomizu, “Next-Generation Image Encoding Method JPEG 2000,” Triceps, Inc., Feb. 13, 2001) provides an example of weights of subbands based on the human vision sensitivity, details of which are disclosed by Marcus J. Nadenau and Julien Reichel, “Opponent Color, Human Vision and Wavelets for Image Compression,” Proceedings of the Seventh Color Imaging Conference pp.237-242, Scottsdale, Ariz., Nov. 16-19, 1999, IS&T.
Generally, a process of conversion and encoding includes frequency conversion of original signals to subbands, quantization of frequency domain coefficients constituting the subbands, and entropy encoding of the quantized coefficients, which are performed in this sequence, and is referred to as Procedure 100. Here, the subband is a group of the “frequency domain coefficients” that are classified for each of predetermined frequency bands. The “frequency domain coefficients,” which are also called frequency coefficients or coefficients, are DCT coefficients if the frequency conversion is carried out by DCT (discrete cosine transform), and wavelet coefficients if the conversion is carried out by wavelet transform. Further, as is widely known, the quantization is carried out to raise the compression ratio of data, and a typical method is linear quantization wherein coefficients are divided by a constant that is called the step size. An example of this type of conversion encoding is disclosed by Yasuyuki Nomizu, “Next-Generation Image Encoding Method JPEG 2000,” Triceps, Inc., Feb. 13, 2001.
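As a minimal sketch of the quantization step of Procedure 100, the Python below divides each coefficient by the step size. The dead-zone rounding (sign times the floor of the magnitude ratio) and the midpoint reconstruction used for de-quantization are assumptions in the style of JPEG 2000; the text states only that coefficients are divided by the step size. Frequency conversion and entropy encoding are omitted, and `quantize`/`dequantize` are hypothetical helper names.

```python
import math


def quantize(coeffs, step):
    """Linear quantization: sign(c) * floor(|c| / step) (dead-zone rounding assumed)."""
    return [int(math.copysign(math.floor(abs(c) / step), c)) for c in coeffs]


def dequantize(indices, step, r=0.5):
    """Reconstruct each nonzero index at the midpoint of its interval (r = 0.5 assumed)."""
    return [0.0 if q == 0 else math.copysign((abs(q) + r) * step, q) for q in indices]


q = quantize([12.7, -9.5, 0.4, 25.0], step=4)
print(q)                      # [3, -2, 0, 6]
print(dequantize(q, step=4))  # [14.0, -10.0, 0.0, 26.0]
```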
Now, suppose that the frequency coefficients have been quantized and entropy encoded by Procedure 100. To raise the compression ratio of the encoded data, the entropy-encoded signal must be decoded, the decoded frequency coefficients de-quantized, the de-quantized coefficients re-quantized, and the result entropy encoded again, in this sequence; this is called Procedure 101. Procedure 101 is not only redundant; errors introduced at de-quantization also carry over into re-quantization, producing cumulative errors.
To cope with this problem, encoding methods that enable recompression without decoding the encoded signal, also known as “post quantization” methods, have been proposed in recent years. Because recompression is performed not by decoding the encoded signal but by discarding unnecessary codes while they remain in entropy-coded form, cumulative errors do not occur. A representative example of the post quantization method is JPEG 2000. In such a “recompressible” encoding method, lossless (or nearly lossless) encoded data are first generated and held, and the encoded data are then recompressed to a desired compression ratio by discarding unnecessary codes as needed.
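The cumulative error of Procedure 101 can be illustrated with a single coefficient. The sketch below assumes a dead-zone quantizer with midpoint reconstruction (an assumption; the text does not specify the quantizer) and compares quantizing a coefficient once at the final step size with quantizing it at an earlier step size, de-quantizing, and re-quantizing. The coefficient value 17 and the step sizes 6 and 8 are arbitrary illustrative choices.

```python
import math


def quantize(c, step):
    """Dead-zone quantization index: sign(c) * floor(|c| / step)."""
    return int(math.copysign(math.floor(abs(c) / step), c))


def dequantize(q, step, r=0.5):
    """Midpoint reconstruction of a quantization index."""
    return 0.0 if q == 0 else math.copysign((abs(q) + r) * step, q)


def direct(c, step):
    """Quantize the original coefficient once at the target step size."""
    return dequantize(quantize(c, step), step)


def cascaded(c, step1, step2):
    """Procedure 101: quantize at step1, de-quantize, then re-quantize at step2."""
    return dequantize(quantize(dequantize(quantize(c, step1), step1), step2), step2)


x = 17.0
print(abs(x - direct(x, 8)))       # 3.0
print(abs(x - cascaded(x, 6, 8)))  # 5.0
```

The de-quantized value 15.0 falls into a lower interval at step size 8 than the original 17.0 does, so the first quantization error is carried into, and enlarged by, the second.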
In order to enable recompression by discarding codes, a method called “bit plane encoding” is used, wherein frequency coefficients are decomposed into bit planes, and each bit plane is independently encoded. In bit plane encoding, compression is performed by outputting only selected codes of high-order bit planes, which is implemented by one of the following processes:
(i) entropy encoding is performed on only selected high-order bit planes; and
(ii) entropy encoding is performed on bit planes beyond necessity (typically, all bit planes), and the entropy codes of selected low-order bit planes are discarded.
Implementation (ii) ultimately outputs only the codes of the selected high-order bit planes, and thus constitutes recompression. In bit plane encoding, compression is fundamentally achieved by discarding bit planes, or their entropy codes, rather than by linear quantization of the coefficients. Further, as mentioned above, post quantization can be performed either within the encoding process or in a separate process after encoding is complete. In this specification, “post quantization” covers both cases.
Now, in both cases (i) and (ii) above, a problem yet to be solved is how to determine the required high-order bit planes (or the unnecessary low-order bit planes) so as to meet objectives such as minimizing the mathematical quantization error and optimizing the subjective quality of the image. This problem is discussed in more detail below.
First, consider determining the required high-order bit planes (or the unnecessary low-order bit planes) so that the mathematical quantization error (mean square value of the errors) is minimized at a given compression ratio.
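A minimal sketch of bit plane decomposition, assuming integer coefficients in sign-magnitude form with the sign coded separately, as in JPEG 2000. `bit_planes` splits a magnitude into planes from the most significant bit down, and `keep_high_planes` rebuilds the value from only the top `keep` planes, which is what remains after implementation (ii) discards the lower planes. Entropy coding of each plane is not modeled, and both function names are hypothetical.

```python
def bit_planes(magnitude, num_planes):
    """Bits of a non-negative integer, most significant plane first."""
    return [(magnitude >> p) & 1 for p in range(num_planes - 1, -1, -1)]


def keep_high_planes(coeff, num_planes, keep):
    """Value of `coeff` rebuilt from its `keep` most significant bit planes only."""
    sign = -1 if coeff < 0 else 1
    planes = bit_planes(abs(coeff), num_planes)
    value = 0
    for index, bit in enumerate(planes):
        value = (value << 1) | (bit if index < keep else 0)  # discarded planes read as 0
    return sign * value


print(bit_planes(45, 6))            # [1, 0, 1, 1, 0, 1]
print(keep_high_planes(-45, 6, 3))  # -40
```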
When the entropy-encoded data are decoded, Procedure 100 is followed in reverse. Specifically, the quantized frequency coefficients are de-quantized and subjected to inverse frequency conversion, and the signal values are reproduced. In the inverse frequency conversion, the gain with which frequency coefficients are converted back into signal values differs from subband to subband. The subband gain Gs is defined as the square of this gain. An error Δe generated by quantizing a frequency coefficient is therefore multiplied by the square root of the subband gain through the inverse transform that reproduces the signal, and is represented by $\sqrt{G_s}\times\Delta e$.
As disclosed by J. Katto and Y. Yasuda (cited above), in order to minimize the mean square error generated at a given compression ratio in the signal after the inverse transform (a signal consisting of multiple signal values), a simple encoding method is to linearly quantize each subband with a step size equal to the inverse value of the square root of the subband gain (or that inverse value multiplied by a constant). Accordingly, in a conventional encoding method that does not use bit plane encoding, the mean square error is minimized if the coefficients are quantized with a step size inversely proportional to the square root of the subband gain.
Now, a typical JPEG 2000 process using the 5×3 wavelet transform consists of wavelet transforming an original signal into subbands and then encoding, for every subband, only the required high-order bit planes (or high-order sub bit planes) of the wavelet coefficients, performed in this sequence; this is called Procedure 102. Here, sub bit planes are subsets of bit planes.
As described above, no linear quantization is performed in the method using the 5×3 wavelet transform. For this reason, the technique of minimizing the mean square error of the signal after the inverse transform by choosing linear quantization step sizes cannot be applied. Moreover, for bit plane encoding, no technique or means has been clarified for determining the required high-order bit planes (or the unnecessary low-order bit planes) that yield the minimum mean square error. Still less clear is how to make this determination when a bit plane is divided into two or more subsets (i.e., sub bit planes) and encoding is performed for every sub bit plane. This is another problem to be solved.
Further, a typical JPEG 2000 process using the 9×7 wavelet transform consists of wavelet transforming an original signal into subbands, linearly quantizing the wavelet coefficients for every subband, and encoding, for every subband, only the required high-order bit planes (or high-order sub bit planes) of the quantized wavelet coefficients, performed in this sequence; this is called Procedure 103.
In this case, “linear quantization of the coefficients with a step size inversely proportional to the square root of the subband gain” is possible. However, performing linear quantization at the encoding stage does not suit the purpose of obtaining “encoded data of a desired compression ratio by generating and holding lossless (or nearly lossless) encoded data, and by discarding unnecessary codes as desired.” When the 9×7 wavelet transform is used, it is desirable to minimize quantization at the encoding stage and to perform post quantization afterwards, but no technique or means is known for then minimizing the mean square error generated in the signal after the inverse transform. Still less is known for the case of encoding for every sub bit plane. This poses another problem to be solved.
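To make the subband gain and the step-size rule concrete, the sketch below computes one-level gains of the JPEG 2000 5×3 filter bank. It assumes, as is standard for a separable transform, that the gain of a 2-D subband is the product of the energies (sums of squared taps) of the two 1-D synthesis filters involved, and it ignores the integer rounding of the reversible lifting steps. Gains at deeper decomposition levels require the iterated synthesis filters and are not computed here. Step sizes are taken as c divided by the square root of Gs, with an arbitrary constant c.

```python
import math

# One-level synthesis filters of the 5x3 filter bank whose analysis filters are
# low-pass (-1, 2, 6, 2, -1)/8 and high-pass (-1, 2, -1)/2.
SYNTH_LOW = [1 / 2, 1, 1 / 2]
SYNTH_HIGH = [-1 / 8, -2 / 8, 6 / 8, -2 / 8, -1 / 8]


def energy(taps):
    """Sum of squared filter taps: the squared gain of one 1-D synthesis filter."""
    return sum(t * t for t in taps)


def one_level_gains():
    """Subband gains Gs for one 2-D decomposition level (separable product)."""
    gl, gh = energy(SYNTH_LOW), energy(SYNTH_HIGH)
    return {"LL": gl * gl, "HL": gh * gl, "LH": gl * gh, "HH": gh * gh}


def step_sizes(gains, c=1.0):
    """Step size inversely proportional to the square root of each subband gain."""
    return {name: c / math.sqrt(g) for name, g in gains.items()}


gains = one_level_gains()
print(gains)  # {'LL': 2.25, 'HL': 1.078125, 'LH': 1.078125, 'HH': 0.5166015625}
print(step_sizes(gains))
```

The LL subband, whose errors are amplified most by the inverse transform, receives the smallest step size.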
Next, obtaining “the optimal quality of image for a given compression ratio” is considered.
As indicated by JP 6-326990 A, human vision is more sensitive to lower frequency regions than to higher frequency regions. Accordingly, human vision is more sensitive to quantization errors in lower frequency subbands than in higher frequency subbands. Therefore, as Yasuyuki Nomizu, “Next-Generation Image Encoding Method JPEG 2000,” Triceps, Inc., Feb. 13, 2001 discloses, an effective method of linearly quantizing wavelet coefficients assigns a smaller step size to lower frequency subbands and a larger step size to higher frequency subbands, so that human vision sensitivity is properly reflected in the linear quantization.
Although this method cannot be applied when JPEG 2000 uses the 5×3 wavelet transform, it can be applied to the 9×7 wavelet transform so that “coefficients are quantized with a step size inversely proportional to the human vision sensitivity corresponding to the frequency of each subband.” However, it is not suitable for the objective to “obtain data at a desired compression ratio by generating and holding lossless (or nearly lossless) encoded data, and by discarding unnecessary codes afterwards.” When the 9×7 wavelet transform is used, it is again desirable to minimize quantization at the encoding step and to perform post quantization afterwards, but no technique or means is known for determining, in post quantization, the required high-order bit planes or high-order sub bit planes (alternatively, the unnecessary low-order bit planes and low-order sub bit planes) that yield visually optimal image quality. This poses another problem to be solved.
Further, since human vision is sensitive to “the quantization errors of pixels, not errors of frequency conversion coefficients,” it is desirable to take both the human vision sensitivity and the square root of the subband gain into account in post quantization. In addition, in bit plane encoding, discarding the codes of the n low-order bit planes of the frequency coefficients has the same effect as linearly quantizing the frequency coefficients with a step size of 2 to the n-th power, which is why the process is called post quantization.
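The equivalence between discarding bit planes and linear quantization can be checked directly for integer coefficients in sign-magnitude form: dropping the n least significant bit planes of |c| gives the same index as dead-zone linear quantization with step size 2^n. The brute-force check below covers a small range of values and is only a sanity check under that sign-magnitude assumption; the function names are hypothetical.

```python
def discard_low_planes(coeff, n):
    """Drop the n least significant bit planes of |coeff|, keeping the sign."""
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) >> n)


def linear_quantize(coeff, step):
    """Dead-zone linear quantization: sign(c) * floor(|c| / step)."""
    sign = -1 if coeff < 0 else 1
    return sign * (abs(coeff) // step)


for n in range(6):
    assert all(
        discard_low_planes(c, n) == linear_quantize(c, 2 ** n) for c in range(-1000, 1001)
    )
print("equivalent for n = 0..5")
```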
An encoded data generation apparatus and a method, a program, and an information recording medium are described. In one embodiment, the encoded data generation apparatus for generating encoded data by carrying out frequency conversion of an input image signal to a plurality of subbands, and carrying out bit plane encoding of each of the subbands, comprises a selection unit to select low-order bit planes or low-order sub bit planes, codes corresponding to which are not to be output to the encoded data, based on a value (a) that is one of an inverse value of the square root of the gain of the inverse transform of the frequency conversion of each of the subbands; an inverse value of human vision sensitivity; and an inverse value of a product of the square root of the gain of the inverse transform and the human vision sensitivity of each of the subbands; wherein codes corresponding to greater numbers of the low-order bit planes or the low-order sub bit planes of each of the subbands are not output to the encoded data, the greater the value (a) of the subband is.
Accordingly, embodiments of the present invention include an apparatus and a method for conversion and encoding of a signal to codes, and for recompression of the conversion encoded codes, which apparatus and method substantially obviate one or more of the problems caused by the limitations and disadvantages of the related art.
Features and advantages of embodiments of the present invention are set forth in the description that follows, and in part will become apparent from the description and the accompanying drawings, or may be learned by practice of the invention according to the teachings provided in the description. Embodiments as well as other features and advantages of the present invention will be realized and attained by an apparatus and a method for conversion and encoding of a signal to codes, and for recompression of the conversion encoded codes particularly pointed out in the specification in such full, clear, concise, and exact terms as to enable a person having ordinary skill in the art to practice the invention.
To achieve these and other advantages and in accordance with the purpose of the invention, as embodied and broadly described herein, the invention provides as follows.
Embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, wherein encoded data are generated by carrying out frequency conversion of an input signal to two or more subbands, and bit plane encoding of each subband; a value (a) is defined based on properties of each subband, specifically, by one of the following, namely, (i) an inverse value of the square root of the gain of inverse transform, which is the inverse operation of the frequency conversion, (ii) an inverse value of human vision sensitivity, and (iii) an inverse value of the product of the square root of the gain of the inverse transform and the human vision sensitivity; low-order bit planes and low-order sub bit planes, codes corresponding to which are not to be output to encoded data, are selected based on the value (a) such that the greater the value (a) of a subband is, the greater is the number of low-order bit planes and low-order sub bit planes of the subband that are discarded.
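The three definitions of the value (a) can be written as one small function. In the sketch below, `gain` stands for the subband gain Gs, so the function uses its square root, matching the error expression for the inverse transform; `sensitivity` is a human vision sensitivity weight. The function name, the mode strings, and the example numbers are hypothetical. The only property the selection relies on is the ordering: the larger (a) is, the more low-order bit planes of that subband are discarded.

```python
import math


def value_a(gain, sensitivity, mode):
    """Value (a) of one subband.

    gain        -- subband gain Gs of the inverse frequency conversion
    sensitivity -- human vision sensitivity weight of the subband
    mode        -- "gain": 1/sqrt(Gs); "vision": 1/S; "both": 1/(sqrt(Gs)*S)
    """
    if mode == "gain":
        return 1.0 / math.sqrt(gain)
    if mode == "vision":
        return 1.0 / sensitivity
    if mode == "both":
        return 1.0 / (math.sqrt(gain) * sensitivity)
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical subbands: (Gs, sensitivity)
subbands = {"LL": (2.25, 1.0), "HH": (0.5166015625, 0.5)}
for mode in ("gain", "vision", "both"):
    ranked = sorted(subbands, key=lambda name: value_a(*subbands[name], mode), reverse=True)
    print(mode, ranked)  # HH ranks first in every mode here
```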
Embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, wherein encoded data, which are obtained by carrying out frequency conversion of an input signal to two or more subbands, and carrying out bit plane encoding of each subband, are treated as an input signal for recompression. Recompression is carried out in the same manner as described above for encoding.
Data that are encoded, and recompressed, if applicable, in the manner described above reproduce the input image (original image) at a satisfactory subjective quality level having few mean square errors.
Embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, similar to those described above, wherein a subband that is obtained by the frequency conversion of the input signal is quantized, and then bit plane encoding is carried out. In this case, the value (a) is defined based on properties of each subband, specifically, by one of the following, namely, (i) an inverse value of the product of the square root of the gain of the inverse transform, which is the reverse operation of the frequency conversion, and the quantization step size, (ii) the inverse value of the product of the human vision sensitivity and the quantization step size, and (iii) an inverse value of the product of the square root of the gain of the inverse transform, the human vision sensitivity and the quantization step size. Then, the selection unit and the selection process select low-order bit planes and low-order sub bit planes, codes corresponding to which are not to be output to encoded data, based on the value (a) such that the greater the value (a) of a subband is, the greater is the number of low-order bit planes and low-order sub bit planes of the subband that are discarded.
Embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, wherein encoded data, which are obtained by carrying out frequency conversion of an input signal to two or more subbands, carrying out quantization of each subband, and carrying out bit plane encoding of each subband, are treated as an input signal for recompression. The recompression is performed in the same manner as described above for encoding with quantization.
Data that are encoded, and recompressed, if applicable, in the manner described above reproduce the input image (original image) at a satisfactory subjective quality level having few mean square errors.
Embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, which are capable of handling a signal that contains multiple components.
In the case where the signal to be encoded contains multiple components, such as a color image, an encoding process generally includes component conversion of the signal of the original image (color conversion), frequency conversion of the signal to subbands for every component, quantization of the frequency-domain coefficients that constitute each subband, and entropy encoding of the quantized coefficients, performed in this sequence. Examples of the component conversion are the RCT (reversible multiple component transform) and the ICT (irreversible multiple component transform) adopted by JPEG 2000.
Conversion (forward transform) and the inverse transform of the RCT are expressed by the following formulas.
Conversion (forward transform):
$$
\begin{aligned}
Y_0(x,y) &= \left\lfloor \frac{I_0(x,y) + 2\,I_1(x,y) + I_2(x,y)}{4} \right\rfloor\\
Y_1(x,y) &= I_2(x,y) - I_1(x,y)\\
Y_2(x,y) &= I_0(x,y) - I_1(x,y)
\end{aligned}
$$
Inverse transform:
$$
\begin{aligned}
I_1(x,y) &= Y_0(x,y) - \left\lfloor \frac{Y_2(x,y) + Y_1(x,y)}{4} \right\rfloor\\
I_0(x,y) &= Y_2(x,y) + I_1(x,y)\\
I_2(x,y) &= Y_1(x,y) + I_1(x,y)
\end{aligned}
\tag{1}
$$
wherein I represents the original signal and Y represents the signal after conversion. For an RGB signal, for example, if components 0, 1, and 2 of the original signal I are R, G, and B, respectively, then components 0, 1, and 2 of Y correspond to Y, Cb, and Cr.
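Formula (1) can be checked with integer arithmetic. The Python sketch below implements the forward and inverse RCT for a single pixel (Python's `//` is floor division, matching floor() also for negative values) and verifies losslessness on a grid of 8-bit RGB values; the grid spacing is an arbitrary test choice and the function names are hypothetical.

```python
def rct_forward(r, g, b):
    """Forward RCT of formula (1), with I0 = R, I1 = G, I2 = B."""
    y0 = (r + 2 * g + b) // 4
    y1 = b - g
    y2 = r - g
    return y0, y1, y2


def rct_inverse(y0, y1, y2):
    """Inverse RCT of formula (1); exact for integer inputs."""
    g = y0 - (y2 + y1) // 4
    r = y2 + g
    b = y1 + g
    return r, g, b


grid = range(0, 256, 15)
assert all(
    rct_inverse(*rct_forward(r, g, b)) == (r, g, b) for r in grid for g in grid for b in grid
)
print(rct_forward(200, 120, 40))  # (120, -80, 80)
```

The transform is lossless because (Y2 + Y1) equals (I0 + I2 − 2·I1), so the floor term in the inverse differs from the floor term in the forward transform by exactly I1.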
Conversion and the inverse transform of the ICT are expressed by the following formulas.
Conversion:
$$
\begin{aligned}
Y_0(x,y) &= 0.299\,I_0(x,y) + 0.587\,I_1(x,y) + 0.114\,I_2(x,y)\\
Y_1(x,y) &= -0.16875\,I_0(x,y) - 0.33126\,I_1(x,y) + 0.5\,I_2(x,y)\\
Y_2(x,y) &= 0.5\,I_0(x,y) - 0.41869\,I_1(x,y) - 0.08131\,I_2(x,y)
\end{aligned}
$$
Inverse transform:
$$
\begin{aligned}
I_0(x,y) &= Y_0(x,y) + 1.402\,Y_2(x,y)\\
I_1(x,y) &= Y_0(x,y) - 0.34413\,Y_1(x,y) - 0.71414\,Y_2(x,y)\\
I_2(x,y) &= Y_0(x,y) + 1.772\,Y_1(x,y)
\end{aligned}
\tag{2}
$$
wherein I and Y are as defined for formula (1): for an RGB signal with components 0, 1, and 2 of I being R, G, and B, components 0, 1, and 2 of Y correspond to Y, Cb, and Cr.
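A floating-point sketch of formula (2) as 3×3 matrix products. Because the coefficients are rounded to five decimal places, the inverse only approximately undoes the forward transform, and the tolerance in the check reflects that. The luminance weights 0.299, 0.587, and 0.114 sum to 1, so a gray pixel maps to a Y value equal to its gray level, with Cb and Cr close to 0. The matrix names and the `apply` helper are hypothetical.

```python
FORWARD = [  # rows give Y0, Y1, Y2 from (I0, I1, I2)
    [0.299, 0.587, 0.114],
    [-0.16875, -0.33126, 0.5],
    [0.5, -0.41869, -0.08131],
]
INVERSE = [  # rows give I0, I1, I2 from (Y0, Y1, Y2)
    [1.0, 0.0, 1.402],
    [1.0, -0.34413, -0.71414],
    [1.0, 1.772, 0.0],
]


def apply(matrix, vec):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return tuple(sum(m * v for m, v in zip(row, vec)) for row in matrix)


rgb = (200.0, 120.0, 40.0)
ycc = apply(FORWARD, rgb)
restored = apply(INVERSE, ycc)
print(max(abs(a - b) for a, b in zip(rgb, restored)))  # small, below 0.01
```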
As seen from formulas (1) and (2), when the inverse component conversion is performed on the component values to reproduce the original signal values, the scale factor by which an error in a component value propagates into the reproduced original signal values differs from component to component. The square of this scale factor is called the gain of the inverse transform of the component conversion, and is denoted as the reverse component conversion gain Gc. An error Δe generated in a frequency coefficient by quantization is thus multiplied by the square root of the reverse component conversion gain, resulting in $\sqrt{G_c}\times\Delta e$, which has the same kind of influence as the subband gain described above.
Accordingly, other embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, for generating encoded data of an input signal containing multiple components in consideration of the influence of the reverse component conversion gain. This is realized by performing component conversion, frequency conversion to obtain multiple subbands, and bit plane encoding of each subband of each component, in this sequence. Therein, the selection unit and the selection process define the value (a) based on properties of each subband of each component, namely, as one of (i) an inverse value of the product of the square root of the gain of the inverse transform of the frequency conversion and the square root of the gain of the inverse transform of the component conversion, (ii) an inverse value of the product of the human vision sensitivity and the square root of the gain of the inverse transform of the component conversion, and (iii) an inverse value of the product of the square root of the gain of the inverse transform of the frequency conversion, the human vision sensitivity, and the square root of the gain of the inverse transform of the component conversion. Then, the selection unit and the selection process select low-order bit planes and low-order sub bit planes, codes corresponding to which are not to be output to encoded data, based on the value (a) such that the greater the value (a) of a subband is, the greater is the number of low-order bit planes and low-order sub bit planes of the subband that are discarded.
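Following this definition, the reverse component conversion gain Gc of a component is the sum of the squares of the corresponding column of the inverse-transform matrix: an error Δe in component k changes each reproduced signal value by that column's entry times Δe. The sketch below evaluates this for the ICT of formula (2) and for the RCT of formula (1); for the RCT the floor operation is dropped, which is a linear approximation. The matrix and function names are hypothetical.

```python
def component_gains(inverse):
    """Gc for each component: sum of squares of each column of the inverse matrix."""
    return [sum(row[k] ** 2 for row in inverse) for k in range(len(inverse[0]))]


ICT_INVERSE = [  # formula (2): rows give I0, I1, I2 from (Y0, Y1, Y2)
    [1.0, 0.0, 1.402],
    [1.0, -0.34413, -0.71414],
    [1.0, 1.772, 0.0],
]
RCT_INVERSE_LINEAR = [  # formula (1) with floor() dropped
    [1.0, -0.25, 0.75],
    [1.0, -0.25, -0.25],
    [1.0, 0.75, -0.25],
]

print(component_gains(ICT_INVERSE))         # approx. [3.0, 3.2584, 2.4756]
print(component_gains(RCT_INVERSE_LINEAR))  # [3.0, 0.6875, 0.6875]
```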
Further, embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, for recompressing the signal containing multiple components. The recompression is performed in the same manner as described above for encoding.
Data that are encoded, and recompressed, if applicable, in the manner described above reproduce the input image (original image) containing multiple components at a satisfactory subjective quality level having few mean square errors.
Embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, for generating encoded data of a multi-component signal by carrying out bit plane encoding after quantizing each subband of each component at a quantization step size, the subband being obtained by frequency conversion after component conversion. Therein, the selection unit and the selection process define the value (a) based on properties of each subband of each component, namely, one of (i) an inverse value of the product of the square root of the gain of the inverse transform of the frequency conversion, the square root of the gain of the inverse transform of the component conversion, and the quantization step size, (ii) an inverse value of the product of the human vision sensitivity, the square root of the gain of the inverse transform of the component conversion, and the quantization step size, and (iii) an inverse value of the product of the square root of the gain of the inverse transform of the frequency conversion, the human vision sensitivity, the square root of the gain of the inverse transform of the component conversion, and the quantization step size. Then, the selection unit and the selection process select low-order bit planes and low-order sub bit planes, codes corresponding to which are not to be output to encoded data, based on the value (a) such that the greater the value (a) of a subband is, the greater is the number of low-order bit planes and low-order sub bit planes of the subband that are discarded.
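All of the definitions of the value (a) given so far are reciprocals of products of the same factors, so they can be sketched as one function in which any factor a particular definition omits is set to 1. In the sketch, `freq_gain` is the subband gain Gs, `comp_gain` the reverse component conversion gain Gc, `sensitivity` the human vision sensitivity, and `step` the quantization step size; these parameter names and the example numbers are hypothetical.

```python
import math


def value_a(freq_gain=1.0, sensitivity=1.0, comp_gain=1.0, step=1.0):
    """Value (a) = 1 / (sqrt(Gs) * S * sqrt(Gc) * step); omitted factors default to 1."""
    return 1.0 / (math.sqrt(freq_gain) * sensitivity * math.sqrt(comp_gain) * step)


# Variant (i) of this embodiment: frequency gain, component gain, and step size.
print(value_a(freq_gain=4.0, comp_gain=9.0, step=0.5))
# Variant (ii): sensitivity, component gain, and step size.
print(value_a(sensitivity=0.25, comp_gain=4.0, step=2.0))  # 1.0
```

Leaving `comp_gain` and `step` at 1 recovers the single-component definitions without quantization.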
Further, embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, for recompressing the signal containing multiple components, using quantization. The recompression is performed in the same manner as described above for encoding.
Data that are encoded, and recompressed, if applicable, in the manner described above, including the quantization process, reproduce the input image (original image) containing multiple components at a satisfactory subjective quality level having few mean square errors.
Embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, with or without the recompression functions, wherein the number of low-order bit planes, codes corresponding to which are not to be output and the number of low-order sub bit planes, codes corresponding to which are not to be output are proportional to the value (a). In this manner, the input image (original image) is reproduced at a satisfactory subjective quality level having few mean square errors.
Embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, for selecting a combination pattern of low-order bit planes, codes corresponding to which are not to be output, by repeating the process of “selecting one bit plane, from the least-significant-bit side, of the subband having the greatest value (a), and halving that greatest value.” In this manner, the input image (original image) is reproduced at a satisfactory subjective quality level having few mean square errors.
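The repeated “select and halve” rule maps naturally onto a max-priority queue. The sketch below takes a list of values (a), one per subband, and a total number of bit planes to discard, and returns how many low-order bit planes each subband loses. It assumes ties are resolved by list position (the tie-breaking rules given later in the text could be substituted) and does not cap a subband at the number of bit planes it actually has; the function name is hypothetical.

```python
import heapq


def discard_pattern(values, num_discards):
    """Return, per subband, the number of low-order bit planes not to be output.

    values       -- value (a) of each subband
    num_discards -- total number of bit planes to discard
    """
    # heapq is a min-heap, so store -a; the index breaks ties (lower index first).
    heap = [(-a, i) for i, a in enumerate(values)]
    heapq.heapify(heap)
    discarded = [0] * len(values)
    for _ in range(num_discards):
        neg_a, i = heapq.heappop(heap)        # subband with the greatest current (a)
        discarded[i] += 1                     # drop its next least significant plane
        heapq.heappush(heap, (neg_a / 2, i))  # halve its value (a)
    return discarded


print(discard_pattern([4.0, 2.0, 1.0], 3))  # [2, 1, 0]
```

Since the pattern depends only on the values (a) and the discard count, it can also be computed in advance and stored in a table.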
In addition, the combination pattern of the low-order bit planes, codes corresponding to which are not to be output, determined by the above process refers not only to all the patterns, but also to subsets thereof. Furthermore, the pattern can be determined by one of performing the encoded data generation process, referring to a table, and the like that are beforehand prepared. These two points are applicable to other implementations of embodiments of the present invention.
While the above process determines the combination pattern of the low-order bit planes, codes corresponding to which are not to be output, the process can be extended to the case wherein a bit plane is divided into n sub bit planes for encoding. In this case, each bit plane is conceptually regarded as n sheets of sub bit planes, with a hierarchical relation between higher-order and lower-order sub bit planes. When the process is extended in this way, treating the n sub bit planes equally is the simplest approach. Embodiments of the present invention also provide an encoded data generation apparatus and a method thereof wherein the n sub bit planes are treated equally.
That is, other embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, wherein each bit plane is divided into n sub bit planes, and the n sub bit planes are then encoded by bit plane encoding. Therein, the selection unit and the selection process select the low-order sub bit planes, codes corresponding to which are not to be output, by referring to a combination pattern of low-order sub bit planes determined by repeating “selecting a sub bit plane, from the least-significant-bit side, of the subband whose value (a) is the greatest, and dividing that value (a) by $2^{1/n}$.” In this manner, code output can be finely controlled in units of sub bit planes, and encoding and recompression providing a satisfactory subjective quality level with few mean square errors at various compression ratios are realized.
Conversely, it is also possible to treat the n sub bit planes unequally, assigning different priorities to higher-order and lower-order sub bit planes. When a bit plane is divided into n sub bit planes, the rate distortion slope (the ratio “increase in the quantization error caused by not encoding a given sub bit plane / decrease in the amount of code obtained by not encoding that sub bit plane”) is not equal among the sub bit planes. Rather, general encoding methods are designed so that the absolute value of the rate distortion slope is smaller for lower-order sub bit planes than for higher-order sub bit planes. This is because it is desirable for bit plane encoding that the absolute value of the rate distortion slope increase monotonically as codes are discarded successively, starting from the lowest-order bit plane.
In view of the above, embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, that take the rate distortion slope into account. Here, the low-order sub bit planes, codes corresponding to which are not to be output, are determined by a combination pattern obtained by the following process. Each bit plane is divided into n sub bit planes, and each sub bit plane is encoded by bit plane encoding. For every subband, the selection unit and the selection process define a numerical sequence $E_j$ ($0 \le j < n$) that satisfies $\sum_j E_j = 1$ and $E_j \le E_{j+1}$, and repeat “selecting a sub bit plane, from the least-significant-bit side, of the subband i whose value (a) is the greatest, dividing that value (a) by $2^{E_{ij}}$, and incrementing j by 1 (j being reset to 0 instead when j = n−1),” where $E_{ij}$ represents $E_j$ of the subband i.
In JPEG 2000, a bit plane can be divided into three sub bit planes, which are then encoded. In this connection, other embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, wherein a bit plane is divided into three sub bit planes, which are then encoded with parameters being n=3, Ei0=5/18, Ei1=6/18, and Ei2=7/18.
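The sub bit plane rule generalizes the bit plane selection: each discarded sub bit plane divides the subband's value (a) by 2 raised to E_j, so after all n sub bit planes of one bit plane are gone the value has been divided by 2, as in the whole-bit-plane case. The sketch below uses a single sequence E for every subband (the text allows E_ij to differ per subband) and breaks ties by list position; both are simplifying assumptions, and the function name is hypothetical. With E = [1/n] * n it reproduces the equal-treatment rule that divides by 2 to the power 1/n.

```python
import heapq


def sub_plane_pattern(values, num_discards, e):
    """Per-subband count of low-order sub bit planes not to be output.

    values -- value (a) of each subband
    e      -- sequence E_0..E_{n-1}, non-decreasing, summing to 1
    """
    n = len(e)
    if abs(sum(e) - 1.0) > 1e-9 or any(e[j] > e[j + 1] for j in range(n - 1)):
        raise ValueError("E must be non-decreasing and sum to 1")
    heap = [(-a, i) for i, a in enumerate(values)]
    heapq.heapify(heap)
    j = [0] * len(values)          # position within the current bit plane, per subband
    discarded = [0] * len(values)
    for _ in range(num_discards):
        neg_a, i = heapq.heappop(heap)
        discarded[i] += 1
        heapq.heappush(heap, (neg_a / 2 ** e[j[i]], i))
        j[i] = 0 if j[i] == n - 1 else j[i] + 1  # wrap to the next bit plane
    return discarded


JPEG2000_E = [5 / 18, 6 / 18, 7 / 18]  # n = 3, as in the embodiment above
print(sub_plane_pattern([2.0, 1.0], 3, JPEG2000_E))  # [3, 0]
```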
Now, when determining low-order bit planes and low-order sub bit planes, codes corresponding to which are not to be output, there are cases where two or more subbands take the same value (a) that is determined to be the greatest. This can happen because the subband gain, the human vision sensitivity, and the quantization step size can be equal among two or more subbands. In the case of a color image containing multiple components, the gain of reverse component conversion may be equal in plural subbands. Embodiments of the present invention include an encoded data generating apparatus and a method thereof, including a selection unit and a selection process, respectively, for solving the problem of the multiple greatest values.
Accordingly, other embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, that select a subband that has the highest frequency among subbands that have the same value (a) that is the greatest.
Further, other embodiments of the present invention include an encoded data generation apparatus and a method thereof, including a selection unit and a selection process, respectively, that select a subband that has the lowest human vision sensitivity among subbands that have the same value (a) that is the greatest.
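The two tie-breaking rules above can be illustrated with a minimal Python sketch of a greedy loop that repeatedly marks one more low-order bit plane of the subband with the greatest value (a) as not output. Two points are assumptions of this sketch, not statements of this document: each discarded bit plane is assumed to halve (a) (consistent with the sub bit plane rule, where the factors 2^Eij of one bit plane multiply to 2), and "highest frequency" is ranked by decomposition level only, so two subbands of the same level (e.g. 1HL and 1LH) remain tied and fall back to list order. The `Subband` class and its fields are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Subband:
    name: str
    level: int          # decomposition level; a lower level is treated as a higher frequency
    a: float            # value (a), e.g. 1 / (sqrt(subband gain) * sensitivity)
    sensitivity: float  # human vision sensitivity weight (hypothetical field)
    dropped: int = 0    # number of low-order bit planes whose codes are not output


def pick(subbands, tie_break="frequency"):
    """Return the subband with the greatest value (a), breaking ties as requested."""
    best = max(s.a for s in subbands)
    tied = [s for s in subbands if math.isclose(s.a, best, rel_tol=1e-9)]
    if tie_break == "frequency":
        return min(tied, key=lambda s: s.level)        # highest frequency wins
    if tie_break == "sensitivity":
        return min(tied, key=lambda s: s.sensitivity)  # lowest sensitivity wins
    raise ValueError(f"unknown tie_break: {tie_break}")


def select_bit_planes(subbands, steps, tie_break="frequency"):
    """Greedily mark `steps` low-order bit planes as not output; return the order chosen."""
    order = []
    for _ in range(steps):
        s = pick(subbands, tie_break)
        s.dropped += 1
        s.a /= 2.0  # assumption: one more discarded bit plane halves (a)
        order.append(s.name)
    return order


bands = [Subband("2LH", 2, 1.0, 0.7), Subband("1HL", 1, 1.0, 0.8), Subband("1HH", 1, 0.9, 0.5)]
print(select_bit_planes(bands, 3, "frequency"))  # ['1HL', '2LH', '1HH']
```

With the "sensitivity" rule, the first tie between 2LH and 1HL is resolved in favor of 2LH instead, because its (hypothetical) sensitivity weight is lower.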
In the following, embodiments of the present invention are described with reference to the accompanying drawings.
Since one embodiment of the present invention is suitably applicable when JPEG 2000 is used as an encoding method, the following descriptions are presented about cases wherein JPEG 2000 is used. However, embodiments of the present invention are also applicable to encoding methods other than JPEG 2000.
In
The decoding process of the encoded data of JPEG 2000 is a reverse process of the encoding process described above. That is, the encoded data are decomposed (decoded) into a code sequence of each tile of each component based on the tag information. The code sequence is entropy-decoded to obtain wavelet coefficients. Further, if the 9×7 wavelet transform is used in encoding, the wavelet coefficients are de-quantized. Then, inverse wavelet transform is performed on the de-quantized wavelet coefficients, and each tile image of each component is reproduced. Further, if component conversion is performed at the time of encoding, reverse component conversion is carried out on each tile image.
The encoded data generation method according to one embodiment of the present invention includes process steps corresponding to the means shown by
Further,
The encoded data generation method according to one embodiment of the present invention includes process steps corresponding to the means shown by
Further,
The encoded data generation method according to one embodiment of the present invention includes process steps corresponding to the means shown by
Further,
The encoded data generation method according to one embodiment of the present invention includes process steps corresponding to the means shown by
Further,
The encoded data generation method according to a previously described embodiment of the present invention includes processing steps corresponding to the means shown by
The encoded data generation apparatus and the encoded data generation method according to the embodiments of the present invention can be realized either by hardware only, or by software using a computer, such as a personal computer and a microcomputer.
Realization of embodiments of the present invention by software using a computer is explained with reference to
In the case of the encoded data generation apparatus and the method thereof according to previously-described embodiments, image data are read from the hard disk drive unit 252 to a memory area 254 of the RAM 251. These image data are provided to the CPU 250, and encoded data are generated by the CPU 250 processing the image data. The encoded data are temporarily written in another area 255 of the RAM 251, and are provided to and held by the hard disk drive unit 252.
In the case of the encoded data generation apparatus and the method thereof according to one embodiment of the present invention, encoded data are read from the hard disk drive unit 252 to the area 254 of the RAM 251. Then, the CPU 250 recompresses the encoded data, the recompressed encoded data are written in the area 255 of the RAM 251, and the recompressed encoded data are provided to and held by the hard disk drive unit 252.
Below, the wavelet transform and inverse transform thereof according to JPEG 2000 are explained.
A process of the two-dimensional wavelet transform called the 5×3 transform, adopted by JPEG 2000, applied to a monochrome image of 16×16 pixels, is explained referring to
In JPEG 2000, first, high-pass filtering is applied to each odd-numbered pixel P(y), where y=2i+1, taken as the center pixel sandwiched by its two adjacent pixels, and coefficients C(2i+1) are obtained. Next, low-pass filtering is applied to each even-numbered pixel P(y), where y=2i, taken as the center pixel sandwiched by its two adjacent pixels, and coefficients C(2i) are obtained. This process is repeated for all X coordinates. The high-pass filtering and the low-pass filtering are expressed by the following formulas (3) and (4), respectively. In the formulas, “floor(x)” is the floor function, whose value is the greatest integer not exceeding x. At the two ends of the image, namely P(0) and P(15), only one pixel adjacent to the center pixel is present; in this case, the missing pixel value is defined by a predetermined rule, the explanation of which is omitted.
C(2i+1)=P(2i+1)−floor((P(2i)+P(2i+2))/2) [step1] (3)
C(2i)=P(2i)+floor((C(2i−1)+C(2i+1)+2)/4) [step2] (4)
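Formulas (3) and (4) are a lifting scheme, which can be sketched in a few lines of Python for one row or column. The boundary rule omitted above is assumed here to be whole-sample symmetric extension (P(−1)=P(1), P(N)=P(N−2)); the function names `mirror` and `forward_53` are illustrative only.

```python
def mirror(k, n):
    """Map index k into 0..n-1 by whole-sample symmetric extension (assumed boundary rule)."""
    if k < 0:
        return -k
    if k >= n:
        return 2 * (n - 1) - k
    return k


def forward_53(p):
    """One level of reversible 5x3 lifting on a 1-D list of integers.

    Odd positions receive the high-pass (H) coefficients of formula (3),
    even positions the low-pass (L) coefficients of formula (4).
    Python's // operator rounds toward minus infinity, i.e. it implements floor().
    """
    n = len(p)
    if n < 2:
        raise ValueError("at least two samples are required")
    c = list(p)
    for k in range(1, n, 2):  # step1: high-pass at odd positions
        c[k] = p[k] - (p[mirror(k - 1, n)] + p[mirror(k + 1, n)]) // 2
    for k in range(0, n, 2):  # step2: low-pass at even positions, using the new H values
        c[k] = p[k] + (c[mirror(k - 1, n)] + c[mirror(k + 1, n)] + 2) // 4
    return c


print(forward_53([10, 20, 30, 40]))  # [10, 0, 33, 10]
```

For the two-dimensional case, the same function would be applied to every column (vertical direction) and then to every row of the result.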
For simplicity, if the coefficients obtained by the high-pass filtering are expressed by H, and the coefficients obtained by the low-pass filtering are expressed by L, the image of
Then, high-pass filtering is applied to odd-numbered coefficients in the horizontal direction X of the coefficient array of
For simplicity, the coefficients obtained by the low-pass filtering of the L coefficients are called LL, the coefficients obtained by the high-pass filtering of the L coefficients are called HL, the coefficients obtained by the low-pass filtering of the H coefficients are called LH, and the coefficients obtained by the high-pass filtering of the H coefficients are called HH. Then, the coefficient array of
In this manner, one phase of the wavelet transform (i.e., decomposition) is completed. If the LL coefficients are exclusively collected (i.e., if the coefficients are collected and arranged as shown by
Subsequent wavelet transform, which is the second phase wavelet transform, is considered with the LL subband being the target. The second phase wavelet transform is carried out on the target LL subband in the same manner as described above.
The inverse transform of the 5×3 wavelet transform is performed as follows. The coefficient array such as shown by
P(2i)=C(2i)−floor((C(2i−1)+C(2i+1)+2)/4) [step1] (5)
P(2i+1)=C(2i+1)+floor((P(2i)+P(2i+2))/2) [step2] (6)
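The inverse lifting of formulas (5) and (6) undoes the two forward steps in reverse order, which is why the 5×3 transform is lossless in integer arithmetic. A minimal Python sketch, again assuming whole-sample symmetric extension at the ends (a hypothetical `mirror` helper); the input [10, 0, 33, 10] is what formulas (3) and (4) give for the samples [10, 20, 30, 40] under that same extension.

```python
def mirror(k, n):
    """Map index k into 0..n-1 by whole-sample symmetric extension (assumed boundary rule)."""
    if k < 0:
        return -k
    if k >= n:
        return 2 * (n - 1) - k
    return k


def inverse_53(c):
    """Inverse reversible 5x3 lifting on interleaved coefficients (L at even, H at odd positions)."""
    n = len(c)
    if n < 2:
        raise ValueError("at least two coefficients are required")
    p = list(c)
    for k in range(0, n, 2):  # step1, formula (5): even samples from L and neighboring H
        p[k] = c[k] - (c[mirror(k - 1, n)] + c[mirror(k + 1, n)] + 2) // 4
    for k in range(1, n, 2):  # step2, formula (6): odd samples from H and recovered even samples
        p[k] = c[k] + (p[mirror(k - 1, n)] + p[mirror(k + 1, n)]) // 2
    return p


print(inverse_53([10, 0, 33, 10]))  # [10, 20, 30, 40]
```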
The process described above converts the coefficient array shown by
As mentioned above, when the 5×3 wavelet transform is used, the coefficients that constitute a subband are not quantized. In contrast, when the 9×7 wavelet transform, which can also be used in JPEG 2000, is used, linear quantization is performed for every subband (an example of the step size is given later).
The coefficients obtained by the wavelet transform described above are encoded by bit plane encoding. According to JPEG 2000, wavelet coefficients of sub bit planes can be encoded from high order bit (MSB) to low order bit (LSB) for every subband.
Suppose that the coefficients of a 2LL subband of
In JPEG 2000, a bit plane is classified (divided) into three sub bit planes, which are also called processing passes or encoding passes, and encoding is performed for every sub bit plane. Namely, the sub bit planes, or encoding passes, consist of a significance propagation pass (a pass for encoding a coefficient that is not significant but has significant coefficients in its neighborhood), a magnitude refinement pass (a pass for encoding a significant coefficient), and a cleanup pass (a pass for encoding the remaining bits that do not belong to the above passes).
However, as a result of the classification, a bit plane may contain no bit belonging to a specific sub bit plane (coding pass); in this case, an empty sub bit plane is generated. The most significant bit plane that is encoded always contains only a cleanup pass.
In the case of the 2LL subband shown by
Here, “significant” means a state in which it is already known, from the encoding performed so far, that the target coefficient is not 0; in other words, a 1 bit of the target coefficient has already been encoded. Conversely, “not significant” means a state in which the coefficient value is 0 or may be 0; in other words, no 1 bit of the coefficient has been encoded yet.
In encoding, the bit planes are scanned from the MSB downward to the LSB to determine whether each bit plane contains a significant coefficient (i.e., a 1 bit). The three encoding passes are not performed until a significant coefficient appears. The number of leading bit planes that consist only of non-significant coefficients is stored in the packet header; at the time of decoding, this number is used to reconstruct those all-zero bit planes and to restore the dynamic range of the coefficients. Actual encoding starts from the bit plane in which a significant bit first appears, and that bit plane is processed by the cleanup pass only. The process then advances to the lower-order bit planes one by one, each being processed by the three encoding passes.
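The pass schedule described above (leading all-zero bit planes counted but not coded, a cleanup pass alone on the first significant bit plane, then three passes per lower bit plane) can be sketched in Python for one set of integer coefficients (a subband here, or a code-block in JPEG 2000 proper). Only the ordering is modeled; context modeling and arithmetic coding are not, and `coding_passes` is a hypothetical name.

```python
PASSES = ("significance propagation", "magnitude refinement", "cleanup")


def coding_passes(coefficients, num_bit_planes):
    """Return (leading all-zero bit planes, ordered list of (bit plane, pass name)).

    Coefficients must be integers; bit plane 0 is the LSB.
    """
    peak = max((abs(c) for c in coefficients), default=0)
    used = peak.bit_length()             # bit planes from the first significant one to the LSB
    zero_planes = num_bit_planes - used  # count signaled in the packet header
    schedule = []
    for plane in range(used - 1, -1, -1):
        if plane == used - 1:
            schedule.append((plane, "cleanup"))  # first coded bit plane: cleanup pass only
        else:
            schedule.extend((plane, name) for name in PASSES)
    return zero_planes, schedule


zeros, schedule = coding_passes([0, 5, -3], num_bit_planes=8)
print(zeros, len(schedule))  # 5 7
```

With a peak magnitude of 5 (binary 101), three bit planes are coded, giving 1 + 3 + 3 = 7 passes, and the remaining 5 of the 8 bit planes are signaled as all-zero.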
Now, since sub bit plane encoding is performed from high-order bit to low-order bit, a code sequence is generated, which is configured as shown by an example shown by
Next, subband gain is explained, taking the 5×3 inverse wavelet transform as an example. Removing the floor functions from formulas (5) and (6) yields the following approximate expressions, formulas (7) and (8).
P(2i)=C(2i)−(C(2i−1)+C(2i+1)+2)/4 (7)
P(2i+1)=C(2i+1)+(P(2i)+P(2i+2))/2 (8)
From the formulas (7) and (8), the five following formulas are obtained.
P(2i−1)=−⅛×C(2i−3)+½×C(2i−2)+¾×C(2i−1)+½×C(2i)−⅛×C(2i+1)−½
P(2i)=C(2i)−¼×C(2i−1)−¼×C(2i+1)−½
P(2i+1)=−⅛×C(2i−1)+½×C(2i)+¾×C(2i+1)+½×C(2i+2)−⅛×C(2i+3)−½
P(2i+2)=C(2i+2)−¼×C(2i+1)−¼×C(2i+3)−½
P(2i+3)=−⅛×C(2i+1)+½×C(2i+2)+¾×C(2i+3)+½×C(2i+4)−⅛×C(2i+5)−½
When a quantization error amounting to 1 arises in an odd-numbered high-pass coefficient C(2i+1), the five formulas above show that it influences the five pixels P(2i−1) through P(2i+3). Assuming that the errors in these five pixels are independent, the RMS error value of the five errors is equal to the square root of {(−⅛)^2+(−¼)^2+(¾)^2+(−¼)^2+(−⅛)^2}=0.85. That is, an error amounting to 1 in a high-pass coefficient is equivalent to an RMS error of 0.85 in the pixel values. This is the square root of the gain of one phase of reverse high-pass filtering.
Similarly, when a quantization error amounting to 1 arises in an even-numbered low-pass coefficient C(2i), the above formulas show that the error affects three pixels, namely P(2i−1) through P(2i+1), and the RMS error value of the errors generated in the three pixels is equal to the square root of {(½)^2+1^2+(½)^2}=√1.5≈1.22. That is, an error amounting to 1 in a low-pass coefficient is equivalent to an RMS error of 1.22 in the pixel values. This is the square root of the gain of one phase of reverse low-pass filtering.
In the case of the 2-dimensional inverse wavelet transform, two phases of reverse low-pass filtering must be applied in the inverse transform of the LL coefficients. For this reason, the RMS error generated in a pixel when a quantization error of 1 arises in an LL coefficient becomes 1.22×1.22≈1.5. For the inverse transform of the HL coefficients, one phase of reverse low-pass filtering and one phase of reverse high-pass filtering must be applied. For this reason, the RMS error generated in a pixel when a quantization error of 1 arises in an HL coefficient becomes 1.22×0.85≈1.04.
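The square roots of the subband gains can be checked numerically by feeding a unit error (an impulse) through the floor-free inverse 5×3 lifting and taking the root of the sum of the squared pixel errors. This Python sketch drops the floor functions and the constant rounding offset, since only the linear part propagates errors; the impulse response reproduces the coefficients of the five formulas above, giving about 0.85 for a high-pass coefficient and √1.5≈1.22 for a low-pass coefficient. Function names are illustrative, and only one decomposition level is covered.

```python
import math


def inverse_53_linear(c):
    """Floor-free inverse 5x3 lifting (formulas (5) and (6) without floor and rounding offset).
    Samples beyond the ends are treated as 0; the impulses below are placed far from the ends."""
    n = len(c)
    p = [0.0] * n
    for k in range(0, n, 2):
        left = c[k - 1] if k >= 1 else 0.0
        right = c[k + 1] if k + 1 < n else 0.0
        p[k] = c[k] - (left + right) / 4
    for k in range(1, n, 2):
        right = p[k + 1] if k + 1 < n else 0.0
        p[k] = c[k] + (p[k - 1] + right) / 2
    return p


def sqrt_gain(position, n=32):
    """Square root of the sum of squared pixel errors caused by a unit error at `position`."""
    c = [0.0] * n
    c[position] = 1.0
    return math.sqrt(sum(v * v for v in inverse_53_linear(c)))


high = sqrt_gain(15)  # odd position: high-pass coefficient, about 0.85
low = sqrt_gain(16)   # even position: low-pass coefficient, about 1.22
# One decomposition level in 2-D: one filtering phase per direction.
level1 = {"LL": low * low, "HL": low * high, "LH": high * low, "HH": high * high}
print(round(high, 4), round(low, 4), {k: round(v, 3) for k, v in level1.items()})
```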
Similar calculations are performed for other coefficients, and the values of RMS error (square root of subband gain) caused for a pixel generated by the unit quantization error of coefficients of each subband in the case of the decomposition level 2 become as shown by
As described above, in order to minimize the mean square error generated in the signal after inverse transform, a simple method is to quantize each subband linearly with a step size equal to the inverse of the square root of the subband gain (or a multiple thereof). Accordingly, what is needed is to obtain the number of low-order bit planes and the number of low-order sub bit planes whose codes are not to be output in bit plane encoding (either the encoding is omitted, or the encoding is performed and the generated codes are discarded) with reference to
The number of low-order bit planes, the codes corresponding to which are not to be output, is obtained by the following formula (9).
The number of bit planes=k×log2(1/√Gs) (9)
Here, the inverse value of the square root of the subband gain is expressed as 1/√Gs, and k is a constant. Further, since the number of bit planes is an integer, the calculation result must be rounded to an integer, e.g., by rounding off. An example of the number of low-order bit planes, codes corresponding to which are not to be output in the case of k=5 is shown by
Further, the number of low-order sub bit planes, codes corresponding to which are not to be output is obtained by the following formula (10).
The number of sub bit planes=k×log_(2^(1/3))(1/√Gs) (10)
Here, the inverse value of the square root of the subband gain is expressed as 1/√Gs, and k is a constant. Further, since the number of sub bit planes is an integer, the calculation result is rounded to an integer. In addition, the base of the logarithm in formula (10) is 2^(1/3), so that one bit plane corresponds to three sub bit planes.
The example of the number of low-order sub bit planes, codes corresponding to which are not to be output, in the case of k=5, is shown by
Here, the compression ratio becomes higher as the constant k in formulas (9) and (10) increases. That is, the constant k can be selected according to a desired compression ratio.
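Formulas (9) and (10) can be evaluated with a short Python sketch. It uses the identity log base 2^(1/3) of x = 3·log2(x), rounds half up, and, as an assumption not stated in this document, clamps negative results (which occur for subbands whose √Gs exceeds 1, such as LL) to 0. The input 0.85×0.85 approximates √Gs of a level-1 HH subband from the per-direction value 0.85 derived above.

```python
import math


def planes_not_output(sqrt_gain, k):
    """Return (bit planes, sub bit planes) whose codes are not output, per formulas (9) and (10).

    log base 2**(1/3) of x equals 3 * log2(x). Results are rounded half up and,
    as an assumption of this sketch, negative results are clamped to 0.
    """
    x = math.log2(1.0 / sqrt_gain)
    bit_planes = math.floor(k * x + 0.5)
    sub_bit_planes = math.floor(3 * k * x + 0.5)
    return max(0, bit_planes), max(0, sub_bit_planes)


print(planes_not_output(0.85 * 0.85, k=5))  # (2, 7)
```

Raising k scales both counts up, which is the mechanism by which k trades quality for compression ratio.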
The selection unit 205 (and the correspondence process step) according to one embodiment of the present invention as shown by
Next, human vision sensitivity is explained.
The standard document of JPEG 2000 provides constants (weights) based on the human vision sensitivity as shown by
Depending on methods for measuring the human vision sensitivity, the sensitivity may contain gain of the reverse component conversion. In that case, it is necessary to consider that such human vision sensitivity is the product of the square root of original human vision sensitivity and the square root of the gain of the reverse component conversion. The weights shown by
Accordingly, the number of low-order bit planes, codes corresponding to which are not to be output, and the number of low-order sub bit planes, codes corresponding to which are not to be output are obtained by substituting the inverse value of the values shown by
In the case that the number of the low-order bit planes, codes corresponding to which are not to be output, and the number of low-order sub bit planes, codes corresponding to which are not to be output are obtained based on the inverse value of “the product of the human vision sensitivity and the square root of subband gain,” the values shown by
According to the embodiment of the present invention, the selection unit 205 (and the corresponding process step) shown by
When using 9×7 wavelet transform in JPEG 2000, linear quantization for every subband can be carried out.
In the case that 9×7 wavelet transform is used for encoding, but linear quantization is not performed, the number of low-order bit planes, codes corresponding to which are not to be output, and the number of low-order sub bit planes, codes corresponding to which are not to be output are determined based on the inverse value of the square root of the subband gain, that is, the values shown by
Next, the cases in which encoding is based on the 9×7 wavelet transform with linear quantization are explained. According to one embodiment, the inverse value of the product of the step size and the square root of the subband gain is used to determine the number of low-order bit planes, codes corresponding to which are not to be output, and the number of low-order sub bit planes, codes corresponding to which are not to be output. For this purpose, the inverse value of the product of the value of
The second of the cases, wherein encoding is carried out by 9×7 wavelet transform with linear quantization, uses the inverse value of the product of the square root of the subband gain, the human vision sensitivity, and the step size for determining the number of low-order bit planes, codes corresponding to which are not to be output, and the number of low-order sub bit planes, codes corresponding to which are not to be output. For this purpose,
According to a previously-described embodiment, wherein encoding is performed by 9×7 wavelet transform with linear quantization, the inverse value of the product of the human vision sensitivity and the step size is used for determining the number of low-order bit planes, codes corresponding to which are not to be output, and the number of low-order sub bit planes, codes corresponding to which are not to be output. For this purpose, the inverse value of the product of the value of
Next, the gain of reverse component conversion (such as reverse ICT and reverse RCT) is explained. The gain is the sum of the mean square errors of the R, G, and B values caused by a unit error in each component. As is clear from the derivation of the subband gain, or from the inverse transform formulas of RCT and ICT, the square root of the gain of reverse ICT and the square root of the gain of reverse RCT take values as shown by
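Following the same approach as for the subband gain, the square root of the gain of reverse component conversion is the root of the sum of squares of one column of the inverse conversion matrix. In this Python sketch, the inverse ICT coefficients and the inverse RCT (with the floor removed) are taken from JPEG 2000 Part 1, not from this document, and should be checked against the standard; the variable names are illustrative.

```python
import math

# Inverse ICT (YCbCr -> RGB); rows R, G, B; columns Y, Cb, Cr.
INV_ICT = [
    [1.0, 0.0, 1.402],
    [1.0, -0.34413, -0.71414],
    [1.0, 1.772, 0.0],
]

# Inverse RCT with the floor removed: G = Y - (U + V)/4, R = V + G, B = U + G.
# Rows R, G, B; columns Y, U, V.
INV_RCT = [
    [1.0, -0.25, 0.75],
    [1.0, -0.25, -0.25],
    [1.0, 0.75, -0.25],
]


def sqrt_component_gains(matrix):
    """Square root of the sum of squared R, G, B errors caused by a unit error in each component."""
    return [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(3)]


print([round(g, 3) for g in sqrt_component_gains(INV_ICT)])  # [1.732, 1.805, 1.573]
print([round(g, 3) for g in sqrt_component_gains(INV_RCT)])  # [1.732, 0.829, 0.829]
```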
Accordingly, when encoding is performed using component conversion (ICT or RCT), the inverse value of the product of the square root of the gain of reverse component conversion and the square root of the subband gain, or alternatively, the inverse value of the product of the square root of the gain of reverse component conversion, the square root of the subband gain, and the step size is used for determining the number of low-order bit planes, codes corresponding to which are not to be output, and the number of low-order sub bit planes, codes corresponding to which are not to be output. For this purpose, values shown by one of
The standard document of JPEG 2000 also illustrates the weights of Cb component and Cr component as shown by
The inverse value of the product of the square root of the gain of reverse component conversion and the human vision sensitivity, or alternatively, the inverse value of the product of the square root of the subband gain, the square root of the gain of reverse component conversion, and the human vision sensitivity can also be used for determining the number of low-order bit planes, codes corresponding to which are not to be output, and the number of low-order sub bit planes, codes corresponding to which are not to be output. In this case, the values of
In the case of performing 9×7 wavelet transform with linear quantization, using ICT as component conversion, the inverse value of the product of the square root of subband gain, the human vision sensitivity, the step size, and the square root of the gain of reverse component conversion is calculated for each component, which calculation result is shown by
Similarly, the number of low-order bit planes, codes corresponding to which are not to be output, and the number of low-order sub bit planes, codes corresponding to which are not to be output can be calculated based on the inverse value of the product of the square root of subband gain, the square root of the gain of reverse component conversion, and the step size; or alternatively, the inverse value of the product of the human vision sensitivity, the step size, and the square root of the gain of reverse component conversion (calculation examples are omitted). According to the embodiment of the present invention, the selection unit 237 (and the corresponding process step) of
According to the embodiment of the present invention, the selection unit 243 of
So far, the numbers of low-order bit planes and low-order sub bit planes whose codes are not to be output have been obtained by formulas (9) and (10), respectively. That is, only one combination pattern of low-order bit planes and low-order sub bit planes is obtained. Of course, different values may be given to the constant k of formulas (9) and (10) so that two or more combination patterns are prepared, and the pattern whose compression ratio is closest to a desired ratio is selected.
According to one embodiment of the present invention, a wider selection of combination patterns is made available, i.e., finer compression ratio control is possible, which is realized by a process shown by the flowchart in
Details follow. First, explanations are presented about the case where the combination patterns of the low-order bit planes, codes corresponding to which are not to be output, are determined one by one using inverse values, which are called values (a), of the product of the square root of the subband gain and the human vision sensitivity, the inverse values being shown by
Each line of the right-hand side table of
In the case that two or more subbands have the same greatest value (a), e.g., two subbands 1HL and 1LH take the greatest value of 1.27 when the transition state shifts from 1 to 2 (refer to the right-hand side table of
The combination patterns of the low-order bit planes, codes corresponding to which are not to be output, can be determined through the process as described above, and by using the inverse values of the product of the square root of the subband gain, the human vision sensitivity, the step size, and the gain of reverse component conversion of Y, Cb, and Cr, the inverse values serving as the value (a), and being shown by
The process described as above can be used with other values (a). According to the embodiment of the present invention, the selection units 205 and 216, 226, 237, and 243 shown by
The combination patterns of the low-order sub bit planes, codes corresponding to which are not to be output, are determined through the same process as described above using the inverse values of the product of the square root of the subband gain and the human vision sensitivity, the inverse values serving as the values (a), and being shown by
This process can be used when using values (a) other than the inverse values of the product of the square root of the subband gain and the human vision sensitivity. According to the embodiment of the present invention, the selection units 205, 216, 226, 237, and 243 (and the corresponding process step) shown by
The combination patterns of the low-order sub bit planes whose codes are not to be output can also be determined as follows, using the inverse values of the product of the square root of the subband gain and the human vision sensitivity as the values (a). Specifically, a numerical sequence Ej (0<=j<n, where n is the number of sub bit planes in a bit plane) that satisfies ΣEj=1 (summation taken over all j) and Ej<=Ej+1 is defined for each subband, and Ej of a subband i is denoted Eij. The combination patterns are then determined by repeating the following: select the next lowest-order sub bit plane of the subband i that has the greatest value (a), divide that value (a) by 2^Eij, and increment j (when j=n−1, j is reset to 0).
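The repeated selection above can be sketched in Python. In this sketch all subbands share one sequence E (as in the example n=3, Ei0=5/18, Ei1=6/18, Ei2=7/18), ties go to the first subband in dictionary order rather than to either of the tie-breaking rules, and the function name is hypothetical. Because ΣEj=1, three consecutive selections of the same subband divide its value (a) by exactly 2, i.e., one full bit plane.

```python
def select_sub_bit_planes(values, E, steps):
    """Greedy selection of low-order sub bit planes whose codes are not output.

    values: dict mapping subband name -> value (a)
    E: the sequence E0..E(n-1), shared by all subbands in this sketch
    Returns (sub bit planes dropped per subband, final values (a)).
    """
    if abs(sum(E) - 1.0) > 1e-12 or any(E[j] > E[j + 1] for j in range(len(E) - 1)):
        raise ValueError("E must sum to 1 and be non-decreasing")
    a = dict(values)
    j = {name: 0 for name in a}        # next index into E for each subband
    dropped = {name: 0 for name in a}
    for _ in range(steps):
        i = max(a, key=a.get)          # greatest value (a); ties go to the first entry
        dropped[i] += 1                # next sub bit plane from the LSB side
        a[i] /= 2.0 ** E[j[i]]
        j[i] = (j[i] + 1) % len(E)     # wrap to 0 after j = n - 1
    return dropped, a


dropped, final = select_sub_bit_planes({"1HH": 1.5, "1HL": 1.0}, [5 / 18, 6 / 18, 7 / 18], steps=4)
print(dropped)  # {'1HH': 3, '1HL': 1}
```

Here 1HH is selected twice (1.5 falls to about 0.98), 1HL once (1.0 falls to about 0.82), and then 1HH again, ending at exactly 1.5/2 = 0.75.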
An example wherein n=3, Ei0=5/18, Ei1=6/18, and Ei2=7/18 is explained with reference to
This process can be applied to where value (a) is other than the inverse value of the product of the square root of the subband gain and the human vision sensitivity. According to the embodiment of the present invention, the selection units 205, 216, 226, 237, and 243 (and the corresponding process step) shown by
One embodiment of the present invention is applicable to an apparatus for decoding encoded data.
The decoding apparatus shown by
In addition, one embodiment of the present invention includes a computer-executable program for realizing the encoded data generation apparatus as explained above, a computer-executable program for processing the encoded data generation method, and for generating the combination patterns according to the flowcharts as shown by
For reference, the DC level shift in JPEG 2000 reduces the dynamic range of a signal by a half when converting (forward transform of) positive numbers, such as RGB signal values, and doubles the dynamic range of the signal when performing the inverse transform. The conversion (forward transform) and the inverse transform are expressed by the following formula (11). In addition, this level shift is not applied to a signed integer value (that may be positive or negative), such as Cb and Cr signals of a YCbCr signal.
I(x,y)←I(x,y)−2^Ssiz(i) Conversion (forward transform), and
I(x,y)←I(x,y)+2^Ssiz(i) Inverse transform (11)
Here, Ssiz(i) represents the bit depth of each component i of an original image (in the case of an RGB image, i=0, 1, or 2).
Further, filters for 9×7 wavelet transform are as shown below.
Conversion (forward transform):
C(2n+1)=P(2n+1)+α*(P(2n)+P(2n+2)) [step1]
C(2n)=P(2n)+β*(C(2n−1)+C(2n+1)) [step2]
C(2n+1)=C(2n+1)+γ*(C(2n)+C(2n+2)) [step3]
C(2n)=C(2n)+δ*(C(2n−1)+C(2n+1)) [step4]
C(2n+1)=K*C(2n+1) [step5]
C(2n)=(1/K)*C(2n) [step6]
Inverse transform:
P(2n)=K*C(2n) [step1]
P(2n+1)=(1/K)*C(2n+1) [step2]
P(2n)=P(2n)−δ*(P(2n−1)+P(2n+1)) [step3]
P(2n+1)=P(2n+1)−γ*(P(2n)+P(2n+2)) [step4]
P(2n)=P(2n)−β*(P(2n−1)+P(2n+1)) [step5]
P(2n+1)=P(2n+1)−α*(P(2n)+P(2n+2)) [step6] (12)
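The six forward and six inverse steps of formula (12) can be sketched in Python as lifting on a 1-D list of floats. The constants α, β, γ, δ and K are not listed in this document; the values below are the ones commonly quoted for the JPEG 2000 irreversible 9×7 filter and should be checked against ISO/IEC 15444-1. Whole-sample symmetric extension at the ends is an assumption of this sketch. Because each inverse step subtracts exactly what the matching forward step added, reconstruction is exact up to floating-point rounding regardless of the constants.

```python
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.230174104914001


def mirror(k, n):
    """Whole-sample symmetric extension of index k into 0..n-1 (assumed boundary rule)."""
    if k < 0:
        return -k
    if k >= n:
        return 2 * (n - 1) - k
    return k


def lift(x, start, coeff):
    """x[k] += coeff * (x[k-1] + x[k+1]) for k = start, start+2, ... (in place)."""
    n = len(x)
    for k in range(start, n, 2):
        x[k] += coeff * (x[mirror(k - 1, n)] + x[mirror(k + 1, n)])


def forward_97(p):
    x = [float(v) for v in p]
    lift(x, 1, ALPHA)  # step1
    lift(x, 0, BETA)   # step2
    lift(x, 1, GAMMA)  # step3
    lift(x, 0, DELTA)  # step4
    return [v * K if k % 2 else v / K for k, v in enumerate(x)]  # step5, step6


def inverse_97(c):
    x = [v / K if k % 2 else v * K for k, v in enumerate(c)]  # step1, step2
    lift(x, 0, -DELTA)  # step3
    lift(x, 1, -GAMMA)  # step4
    lift(x, 0, -BETA)   # step5
    lift(x, 1, -ALPHA)  # step6
    return x


signal = [float((7 * i) % 11) for i in range(13)]
restored = inverse_97(forward_97(signal))
print(max(abs(u - v) for u, v in zip(signal, restored)))
```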
Further, as mentioned above, if the 9×7 wavelet transform is selected in JPEG 2000, linear (scalar) quantization of the wavelet coefficients can be performed for every subband. The same step size is used within a subband. The quantization formula is shown by the following formula (13), and the step size (Δb) is defined by the following formula (14).
qb(u,v)=sign(ab(u,v))*floor(|ab(u,v)|/Δb) (13)
As for the use of the index εb and the mantissa μb, there are two methods. According to the first method, referred to herein as explicit (expounded) quantization, the index εb and the mantissa μb are specified for all the subbands of each decomposition level. According to the second method, referred to herein as implicit (derived) quantization, the index εb and the mantissa μb are specified only for the LL subband of the lowest-order decomposition level, and those of the other subbands are derived by a predetermined formula.
The pair of the index εb and the mantissa μb (εb,μb) of the implicit quantization is determined by the following formula (15).
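(εb,μb)=(ε0−NL+nb, μ0) (15)
where NL represents the total number of decomposition levels, and nb represents the decomposition level of the subband b, i.e., the number of decomposition levels from the original image to the subband b.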
A de-quantization formula is as shown by the following formula (16).
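Formulas (13) and (15) can be sketched in Python together with a step size and a de-quantization rule. Because formulas (14) and (16) are not reproduced in this text, this sketch assumes the JPEG 2000 Part 1 forms: Δb=2^(Rb−εb)×(1+μb/2^11), where Rb is the nominal dynamic range of the subband b, and reconstruction (qb+r×sign(qb))×Δb for qb≠0, where r (0.5 here) is chosen by the decoder. All function names are illustrative.

```python
import math


def step_size(R_b, eps_b, mu_b):
    """Assumed form of formula (14): exponent eps_b and 11-bit mantissa mu_b."""
    return 2.0 ** (R_b - eps_b) * (1.0 + mu_b / 2.0 ** 11)


def implicit_pair(eps0, mu0, NL, nb):
    """Formula (15): derive (eps_b, mu_b) of a subband at decomposition level nb."""
    return eps0 - NL + nb, mu0


def quantize(a, delta):
    """Formula (13): sign(a) * floor(|a| / delta)."""
    sign = (a > 0) - (a < 0)
    return sign * math.floor(abs(a) / delta)


def dequantize(q, delta, r=0.5):
    """Assumed form of formula (16): (q + r * sign(q)) * delta, and 0 for q == 0."""
    if q == 0:
        return 0.0
    sign = 1 if q > 0 else -1
    return (q + r * sign) * delta


delta = step_size(R_b=9, eps_b=8, mu_b=1024)  # 2 * 1.5 = 3.0
q = quantize(-7.3, delta)                     # -2
print(delta, q, dequantize(q, delta))         # 3.0 -2 -7.5
```

Note that the quantizer has a dead zone: any |ab(u,v)| smaller than Δb maps to 0.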
Further, the relation between the decomposition level and resolution level, which are often confused, is as shown by
As described above, effects of embodiments of the present invention include the following: when data are encoded or recompressed by an encoding or recompression process such as that of JPEG 2000, the low-order bit planes and low-order sub bit planes whose codes are not to be output are selected properly, so that the signal obtained by decoding the encoded or recompressed data reproduces the original image with a smaller mean square error and at a satisfactory subjective quality level; and fine control of the compression ratio is facilitated while a satisfactory quality level is maintained.
Further, the present invention is not limited to these embodiments, but various variations and modifications may be made without departing from the scope of the present invention.
The present application is based on Japanese Priority Application No.2003-125667 filed on Apr. 30, 2003 with the Japanese Patent Office, the entire contents of which are hereby incorporated by reference.
Number | Date | Country | Kind |
---|---|---|---|
NO. 2003-125667 | Apr 2003 | JP | national |