1. Field of the Invention
The present invention relates to apparatuses having a function of taking a moving image, such as a digital camera, a mobile telephone with camera and the like, and to moving image compression-encoding techniques for creating and using image contents.
2. Description of the Related Art
In recent years, commercialization of highly efficient moving image compression-encoding techniques, such as MPEG (moving picture experts group) and the like, has rapidly penetrated into camcorders, mobile telephones and the like.
In standards for encoding techniques, such as MPEG or the like, various encoding modes are defined. For example, MPEG-4 has an “intra-encoding mode” in which only an image in a screen of a frame to be encoded is used and encoded (hereinafter referred to as a target image), and an “inter-encoding mode” in which an image region that strongly correlates with a target image is detected (motion estimation) from a frame that has already been encoded (hereinafter referred to as a reference frame), and only a difference value between an image after motion estimation (hereinafter referred to as a motion-compensated image) and the target image is encoded.
In MPEG-4AVC/H.264, pixel prediction (hereinafter referred to as intra-prediction) that employs pixels in a screen can be performed in intra-encoding, and a plurality of intra-prediction modes are defined for each of luminance and color difference signals. Also in inter-encoding, a reference frame can be selected from a plurality of candidates, and a block size of an image in which motion compensation is performed can be selected from various modes ranging from 16 pixels×16 pixels (maximum) to 4 pixels×4 pixels (minimum).
For example, mode determination of the intra-encoding mode/the inter-encoding mode in MPEG-4 is commonly performed by a method as shown in
In the conventional mode determining method of
SAD=ΣΣ|target image(org_x, org_y) − motion-compensated image(ref_x, ref_y)|
ACT=ΣΣ|target image(org_x, org_y) − average value of the target image|.
SAD is calculated using all pixels in a macroblock (16 pixels×16 pixels) that is an encoding unit of MPEG-4. Absolute difference values are calculated on a pixel-by-pixel basis from the upper left pixels of the target image and the motion-compensated image, and the sum of the absolute difference values of a total of 256 pixels (16 pixels×16 pixels) is SAD.
ACT is also calculated using all pixels in the macroblock. Initially, an average of 256 pixels in the target image is calculated, and thereafter, absolute difference values from the average are calculated on a pixel-by-pixel basis from the upper left pixel of the target image, and the sum of the absolute difference values of the 256 pixels is ACT.
SAD and ACT are used as evaluation values to determine an encoding mode. When SAD<ACT, the inter-encoding mode is selected, and when SAD≧ACT, the intra-encoding mode is selected (S102).
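The SAD/ACT decision described above can be sketched as follows. This is an illustrative Python sketch, not part of the specification; the function names are hypothetical, and the blocks are represented as plain 16×16 lists of pixel values:

```python
def sad(target, mc_image):
    """Sum of absolute differences over all 256 pixels (16x16) between
    the target image and the motion-compensated image."""
    return sum(abs(t - m) for row_t, row_m in zip(target, mc_image)
               for t, m in zip(row_t, row_m))

def act(target):
    """Sum of absolute deviations of the 256 target pixels from the
    average value of the target image."""
    pixels = [p for row in target for p in row]
    mean = sum(pixels) / len(pixels)
    return sum(abs(p - mean) for p in pixels)

def select_mode(target, mc_image):
    # When SAD < ACT the inter-encoding mode is selected;
    # when SAD >= ACT the intra-encoding mode is selected (S102).
    return "inter" if sad(target, mc_image) < act(target) else "intra"
```

For example, a highly textured target whose motion-compensated image matches it exactly gives SAD = 0 and a large ACT, so the inter-encoding mode is selected; a flat target with an imperfect motion-compensated image gives ACT = 0 and a positive SAD, so the intra-encoding mode is selected.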
Also, the target image input to the encoder 200 is input to an ACT calculating section 250 and an SAD calculating section 251, and the motion-compensated image generated in the motion-compensated image generating section 202 is input to the SAD calculating section 251, so that SAD and ACT are calculated and input to an encoding mode determining section 252. The encoding mode determining section 252 selects an encoding mode having the smaller one of these values, and outputs the result of the selection, i.e., the “intra-encoding mode” or the “inter-encoding mode”, to the encoding mode selecting section 204.
The encoding mode selecting section 204 receives the target image input to the encoder 200, the difference image generated by the subtraction section 203, and the encoding mode determined by the encoding mode determining section 252.
If the encoding mode determining section 252 determines that the “intra-encoding mode” should be used, the encoding mode selecting section 204 selects the target image. If the encoding mode determining section 252 determines that the “inter-encoding mode” should be used, the encoding mode selecting section 204 selects and outputs the difference image to a DCT (discrete cosine transform) processing section 205. The DCT processing section 205 performs a DCT process and outputs the result to a quantization processing section 206. The quantization processing section 206 performs a quantization process and outputs the result to a variable-length encoding section 209 and an inverse quantization processing section 207. The inverse quantization processing section 207 performs inverse quantization with respect to data received after the quantization process (hereinafter referred to as DCT coefficients) and outputs the result to an inverse DCT processing section 208. The inverse DCT processing section 208 performs an inverse DCT process. If the encoding mode determining section 252 has selected the “inter-encoding mode”, the motion-compensated image is added to the data after the inverse DCT process. The switching is performed in a motion compensation switching section 211, and the addition is performed in an addition section 212. An image output from the addition section 212 (hereinafter referred to as a reconstructed image) is temporarily stored as a reference image for the next frame or thereafter in the reference frame storing section 213, for use in a subsequent frame.
The variable-length encoding section 209 performs a variable-length encoding process with respect to the DCT coefficients generated by the quantization processing section 206, to generate a stream. The stream is temporarily stored in a stream storing section 210 and is subsequently output as a generated stream from the encoder 200.
An encoding mode determining technique employing a target image and a motion-compensated image that is similar to that described above has also been disclosed in Japanese Unexamined Patent Application Publication No. 2002-159012.
In the above-described encoding mode determining technique, the amount of codes that will finally be generated is not taken into consideration during encoding mode determination, and therefore, the selected encoding mode may generate more codes than the mode that was not selected. For example, assume that the intra-encoding mode is selected since SAD>>ACT. Even so, it may well turn out that the actual amounts of codes in the streams that have been subjected to an encoding process satisfy “the amount of codes in the intra-encoding mode”>>“the amount of codes in the inter-encoding mode”.
Hereinafter, a specific example will be described.
Here, for the target image and the motion-compensated image, ACT is “180” and SAD is “2400”, so that SAD>>ACT. Considering the sequences of encoding steps in the intra-encoding mode and the inter-encoding mode when these images are used, however, coefficients are distributed in all frequency bands when the target image is subjected to a DCT process in the intra-encoding mode (see
Thereafter, a variable-length encoding process is performed with respect to the data after quantization. Since a variable-length encoding process is typically performed only with respect to data after quantization other than “0” (hereinafter referred to as non-0 data), the amount of codes is larger in an encoding mode in which a larger amount of non-0 data is included. In this example, it can be easily expected that “the number of pieces of non-0 data in the intra-encoding mode”>>“the number of pieces of non-0 data in the inter-encoding mode”, resulting in “the amount of codes in the intra-encoding mode”>>“the amount of codes in the inter-encoding mode”.
Thus, it is sufficiently possible in an encoding process that “the amount of codes in the intra-encoding mode”>>“the amount of codes in the inter-encoding mode” irrespective of SAD>>ACT.
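The relation between non-0 data and code amount can be illustrated with a rough sketch. The coefficient values and the uniform quantizer below are invented for illustration and are not the standard's quantization or variable-length coding tables; the count of non-0 data merely serves here as a proxy for the code amount:

```python
def quantize(coeffs, q):
    """Uniform quantization (truncation toward zero) of DCT coefficients."""
    return [int(c / q) for c in coeffs]

def nonzero_count(quantized):
    """Number of pieces of non-0 data, a rough proxy for the code amount
    produced by a variable-length encoding process."""
    return sum(1 for c in quantized if c != 0)

# Intra-encoding a raw target image spreads coefficients across all
# frequency bands; inter-encoding a difference image concentrates
# energy in a few low-frequency coefficients.
intra_coeffs = [900, 300, 150, 80, 60, 40, 30, 20]  # spread spectrum
inter_coeffs = [120, 15, 4, 2, 1, 0, 0, 0]          # concentrated

q = 16
assert nonzero_count(quantize(intra_coeffs, q)) > nonzero_count(quantize(inter_coeffs, q))
```

With these example values, quantization leaves all eight intra coefficients non-0 but only one inter coefficient non-0, matching the expectation that the intra-encoding mode produces the larger code amount here despite SAD>>ACT.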
On the other hand, when an encoding process is performed aiming at a certain bit rate, then if the amount of codes temporarily increases in a certain frame (or a macroblock), it is required to absorb the increased amount of codes in the next frame (or the next macroblock) and thereafter. For example, assuming that two frames of images are encoded at a bit rate of 2 Mbps (bits per second) and a frame rate of 2 fps (frames per second), the following two types of encoding processes will be compared.
An encoding process A, in which 1 Mbit of codes is generated in the first frame and 1 Mbit of codes is also generated in the second frame, is compared with an encoding process B, in which 1.999 Mbits of codes are generated in the first frame and 0.001 Mbits of codes are generated in the second frame, both achieving 2 Mbps. Comparing the first frames, the amount of codes in the encoding process B is about two times larger than in the encoding process A, so that a higher level of image quality is expected in the encoding process B. Comparing the second frames, the image quality is expected to be lower in the encoding process B, in which only a very small amount of codes is generated. In other words, whereas both frames have an average level of image quality in the encoding process A, one frame has a high level of image quality while the next frame has a low level of image quality in the encoding process B; such a large difference in image quality between frames may lead to a low-quality moving image.
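The arithmetic of this example can be sketched as follows (the numbers are taken from the comparison above; the variable names are illustrative only):

```python
bit_rate = 2_000_000   # target: 2 Mbps (bits per second)
frame_rate = 2         # 2 fps (frames per second)
per_frame_budget = bit_rate / frame_rate  # average budget per frame

# Encoding process A: even allocation across the two frames.
frames_a = [1_000_000, 1_000_000]
# Encoding process B: overshoot in the first frame, starvation in the second.
frames_b = [1_999_000, 1_000]

# Both processes meet the 2 Mbps target over one second...
assert sum(frames_a) == sum(frames_b) == bit_rate
# ...but process B deviates far from the per-frame budget in every frame.
deviation_a = max(abs(f - per_frame_budget) for f in frames_a)
deviation_b = max(abs(f - per_frame_budget) for f in frames_b)
assert deviation_b > deviation_a
```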
As described above, in conventional methods for determining an encoding mode, the amount of codes is not taken into consideration. Therefore, when encoding is performed in a selected encoding mode, the amount of codes generated therein may be larger than when encoding is performed in another encoding mode, resulting in a hindrance to efforts to improve image quality and a compression ratio.
The present invention is characterized in that, in image compression in which there are a plurality of encoding modes, an encoding process is performed in each of the plurality of encoding modes until quantized DCT coefficients are generated, an encoding mode that provides a smallest code amount is determined based on information about the amount of codes to be generated in each encoding mode, and DCT coefficients of the determined encoding mode are selected and subjected to variable-length encoding.
According to the present invention, when an encoding mode is selected from a plurality of encoding modes, an encoding mode that provides a smallest code amount can be invariably and correctly selected. In addition, by selecting and subjecting DCT coefficients of the determined encoding mode to variable-length encoding, the size of an encoding process device can be reduced.
Note that each of the encoding sections 110 to 113 needs to output the amount of codes to the encoding mode determining section 120, and hence needs to additionally include the code amount calculating section 230 for calculating only the code amount from DCT coefficients instead of the variable-length encoding section 209. The code amount calculating section 230 may have only a function of calculating the amount of codes, and therefore, requires a smaller size than that of the variable-length encoding section 209.
Also, each of the encoding sections 110 to 113 does not generate an encoded stream, and therefore, does not need to include the stream storing section 210. The stream storing section 210 needs to have a capacity that can store at least one macroblock of stream. However, a variable-length encoding process cannot necessarily compress data. In other words, a generated stream does not necessarily have a smaller data size than that of an input image. Therefore, the stream storing section 210 often has a capacity with a margin. However, since the stream storing section 210 can be removed from the encoding process device 100 of the present invention, the size of the encoding process device 100 can be significantly reduced.
Since the first encoding section 1-1 (110) operates in the intra-encoding mode, the encoding mode selecting section 204 of
Since the second encoding section 1-2 (111) operates in the inter-encoding mode, the encoding mode selecting section 204 of
A code amount output from the first encoding section 1-1 (110) and a code amount output from the second encoding section 1-2 (111) are input to the encoding mode determining section 120. In view of these code amounts, the encoding mode determining section 120 determines an encoding section that has performed an encoding process in an encoding mode which provides a smallest code amount, and outputs the result to the DCT coefficient selecting section 121 and the reconstructed image selecting section 122. The DCT coefficient selecting section 121 supplies, to the variable-length encoding section 209, quantized DCT coefficients that have been obtained from the quantization processing section 206 of the encoding section selected by the encoding mode determining section 120. The variable-length encoding section 209 outputs the result of a variable-length encoding process as a stream from the encoding process device 100. The reconstructed image selecting section 122 reads a reconstructed image from the addition section 212 in the encoding section selected by the encoding mode determining section 120, and writes the reconstructed image to the reference frame storing section 213 provided outside the encoding process device 100.
Specifically, the amount of codes in a stream to be processed and generated in the intra-encoding mode in the first encoding section 1-1 (110) is compared with the amount of codes in a stream to be processed and generated in the inter-encoding mode in the second encoding section 1-2 (111), and an encoding mode that results in the smaller code amount is selected, thereby making it possible to execute an encoding process invariably using an encoding mode that provides the smaller code amount.
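The decision rule of the encoding mode determining section 120 reduces to selecting the minimum over the reported code amounts. The following is a minimal sketch under the assumption that each encoding section has already run through DCT and quantization and reported its code amount; the dictionary values are hypothetical example numbers:

```python
def determine_mode(code_amounts):
    """Given a mapping from encoding mode name to the code amount (in bits)
    reported by the corresponding encoding section, return the encoding
    mode that provides the smallest code amount."""
    return min(code_amounts, key=code_amounts.get)

# Example: the intra section reports 5400 bits, the inter section 1200 bits.
code_amounts = {"intra": 5400, "inter": 1200}
selected = determine_mode(code_amounts)  # the DCT coefficients of this
                                         # mode proceed to variable-length
                                         # encoding
```

Only the quantized DCT coefficients of the selected section are then passed to the single variable-length encoding section 209, which is why the per-section stream storing sections can be omitted.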
The encoding process device 100 of
In the DCT processing section 205, the quantization processing section 206, and the code amount calculating section 230, the size of a block processed is the same. For example, if the size of a block processed in the DCT processing section 205 is 8 pixels×8 pixels, the size of a block processed in the quantization processing section 206 and the code amount calculating section 230 is also 8 pixels×8 pixels.
As shown in
The present invention is also applicable to determination of an intra-prediction mode that is defined in a moving image compression-encoding technique, such as, representatively, MPEG-4AVC/H.264. Here, intra-prediction will be described.
In an intra-encoding mode process, initially, an intra-prediction image is generated using images of surrounding blocks, a difference image between a target image and the intra-prediction image is generated, and the difference image is subjected to a DCT process or the like. The stronger the correlation between a target image and an intra-prediction image, i.e., the smaller the difference image, the higher the encoding efficiency. For the method for generating an intra-prediction image using images of surrounding blocks, several modes are defined. For example, there are nine modes for prediction of a luminance signal when the prediction block size is 4×4. Among them are an “intra-prediction mode 0” in which four pixels at a lower end of an upper adjacent block are used to generate an intra-prediction image, an “intra-prediction mode 1” in which four pixels at a right end of a left adjacent block are used to generate an intra-prediction image, and the like.
Also, for luminance, four modes are defined for 16×16, and nine modes are defined for 8×8, in addition to the prediction block size of 4×4. For color difference, four modes are defined for 8×8.
If a prediction block size is changed or an intra-prediction mode used is changed, a different intra-prediction image is generated and a different stream is also generated. In other words, the amount of codes itself varies depending on the prediction block size or the intra-prediction mode.
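The two 4×4 luminance modes named above can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; `top` stands for the four pixels at the lower end of the upper adjacent block, and `left` for the four pixels at the right end of the left adjacent block:

```python
def predict_mode0(top):
    """Intra-prediction mode 0: every row of the 4x4 prediction copies
    the four pixels of the upper adjacent block (vertical prediction)."""
    return [list(top) for _ in range(4)]

def predict_mode1(left):
    """Intra-prediction mode 1: every column copies the four pixels of
    the left adjacent block (horizontal prediction)."""
    return [[left[r]] * 4 for r in range(4)]

def sad4x4(a, b):
    """Sum of absolute differences between two 4x4 blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

# A target made of horizontal stripes correlates with the left neighbor,
# so mode 1 yields the smaller difference image in this example.
left = [5, 5, 200, 200]
top = [10, 20, 30, 40]
target = [[5] * 4, [5] * 4, [200] * 4, [200] * 4]
assert sad4x4(target, predict_mode1(left)) < sad4x4(target, predict_mode0(top))
```

A smaller difference image leads to fewer non-0 coefficients after the DCT and quantization processes, which is exactly why the choice of prediction mode changes the amount of codes.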
The present invention is also applicable to intra-prediction mode determination, so that an encoding process can be performed while invariably selecting a mode which provides the smallest code amount.
In
In this case, as shown in
Moving image compression-encoding techniques, such as, representatively, MPEG-4AVC/H.264, include motion compensation in which a plurality of reference frames are used. The present invention can also be used to determine the reference frame. Motion compensation using a plurality of reference frames means that, as a frame used in motion compensation, any frame can be selected from several frames that have already been completely encoded.
In this case, if the encoding section 3-1 (110d) to the encoding section 3-n (113d) are caused to perform a motion compensation process using different reference frames as shown in
Also, in a moving image compression-encoding technique, such as, representatively, MPEG-4AVC/H.264, a block size for motion compensation can be changed on a macroblock-by-macroblock basis. The present invention is also applicable to determination of the block size for motion compensation. As previously described, the block size for motion compensation includes 16×16, 8×16, 16×8, 8×8, and the like.
In this case, if the encoding section 4-1 (110e) to the encoding section 4-n (113e) are caused to perform a motion compensation process using different block sizes for motion compensation as shown in
It is well known that frame encoding types mainly include I-picture, P-picture, and B-picture in moving image compression-encoding, such as, representatively, MPEG. In MPEG-4AVC/H.264, a picture can be divided into one or a plurality of slices, and an encoding type (I-slice/P-slice/B-slice) can be determined for each slice. For P-pictures (P-slices) and B-pictures (B-slices), the “intra-encoding mode” and the “inter-encoding mode” can be selected and changed for each macroblock. For I-pictures (I-slices), however, the “intra-encoding mode” needs to be used in all macroblocks, which is defined in the standards.
Under these conditions, in a device, such as the encoding process device 100 of
When an I-picture (I-slice) is processed, the “intra-encoding mode” can be designated for both the first and second encoding sections 110 and 111, and the quantization values of the quantization processing sections 206 in the encoding sections 110 and 111 can be changed to different values. For example, the “intra-encoding mode” is designated for the first encoding section 110, and α is designated for the quantization value of the quantization processing section 206 in the first encoding section 110. On the other hand, the “intra-encoding mode” is designated for the second encoding section 111, and β is designated for the quantization value of the quantization processing section 206. These quantization values α and β mean Q-parameters for controlling a compression ratio and, for example, take a value of 1 to 31 in MPEG-4.
The encoding process device 100f of
As described above, a compression ratio is determined based on a quantization value. For example, when an excessively large amount of codes is generated in a p-th frame, a process in which the compression ratio is increased (i.e., the quantization value is changed) so as to suppress generation of codes and cancel the excess of the p-th frame is typically performed in the (p+1)-th frame. However, it is not possible to determine how much the quantization value should be changed unless the quantization value is actually set, an encoding process is performed, and the resulting code amount is investigated. It is well known that an approximate guideline value may be calculated from the quantization value of the previous frame, the amount of codes generated in the previous frame, and a target code amount of the next frame, to determine the quantization value of the next frame. However, the thus-calculated quantization value of the next frame does not necessarily lead to the target amount of codes. In this case, by performing encoding using two candidate quantization values, such as α and β described above, a code amount closer to the target value can be achieved, resulting in more correct control of the amount of codes.
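The selection between the two candidate quantization values can be sketched as follows (an illustrative Python sketch; the quantization values and code amounts are hypothetical example numbers):

```python
def pick_quantizer(results, target_bits):
    """results: mapping from candidate quantization value (Q-parameter)
    to the code amount (in bits) actually measured when encoding with it.
    Returns the candidate whose code amount is closest to the target."""
    return min(results, key=lambda q: abs(results[q] - target_bits))

# Example: quantization value alpha=10 produced 1.4 Mbits and beta=14
# produced 0.9 Mbits; with a target of 1.0 Mbits, beta is selected.
selected_q = pick_quantizer({10: 1_400_000, 14: 900_000}, 1_000_000)
```

Because both candidates are actually encoded, the code amount of the selected result is known exactly rather than estimated from the previous frame, which is the source of the more correct code amount control described above.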
Note that a plurality of configurations for encoding mode determination of
Also, for example, when only intra-encoding is performed in the encoding section 110 for which the intra-encoding mode is designated as shown in
According to
With the configuration of
Note that image processing in the signal processing device 606 of the present invention is not limited to a signal based on image light imaged on the sensor 603 via the optical system 602, and is applicable to, for example, a case where an image signal that is input as an electrical signal from the outside of the device is processed.
As described above, the encoding process method and the encoding process device of the present invention can correctly and invariably select an encoding mode which provides a smallest code amount when an encoding mode is determined from a plurality of encoding modes, and therefore, are useful as an apparatus having a function of taking a moving image, a technique for creating or using image contents, and the like.
Number | Date | Country | Kind
---|---|---|---
2007-229995 | Sep 2007 | JP | national
2008-156761 | Jun 2008 | JP | national