1. Field of the Invention
The present invention relates to a moving image encoding apparatus and a moving image encoding method of performing inter-frame encoding of moving images by encoding techniques such as MPEG2.
2. Description of the Related Art
In conventional moving image encoding, a technique such as MPEG2 is used, and each picture is encoded as one of the following: an I-picture, which is encoded using only the frame itself; a P-picture, which encodes the difference between the frame and a reference image, the reference image being a temporally past frame that has already been encoded; and a B-picture, which encodes the difference between the frame and a reference image, the reference image being a past frame already encoded, a future frame, or the average of a past frame and a future frame. A combination of these pictures is defined as a GOP (group of pictures), and the number of frames constituting the GOP is denoted by N. The distance between the I-picture and the neighboring P-picture in the GOP is denoted by M, which is also the distance between the two nearest neighboring P-pictures, and the configuration of the GOP is represented by N and M.
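As an illustration of how N and M describe a GOP, the following Python sketch lists the picture type of each frame of one GOP. It assumes a simplified layout in which the GOP starts with the I-picture in display order and every M-th frame after it is a P-picture. The function name gop_picture_types is hypothetical, and actual MPEG2 streams may arrange the frames of a GOP differently.

```python
def gop_picture_types(n: int, m: int) -> str:
    """Return the picture types of one GOP of n frames, in display order.

    Simplifying assumption: frame 0 is the I-picture, every m-th frame
    after it is a P-picture, and all remaining frames are B-pictures.
    """
    if n < 1 or m < 1:
        raise ValueError("n and m must be positive integers")
    types = []
    for k in range(n):
        if k == 0:
            types.append("I")
        elif k % m == 0:
            types.append("P")
        else:
            types.append("B")
    return "".join(types)


print(gop_picture_types(12, 3))  # IBBPBBPBBPBB
print(gop_picture_types(12, 2))  # IBPBPBPBPBPB
```

With M=3, two B-pictures lie between neighboring I- or P-pictures, and with M=2 only one does.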
In
However, when predictive encoding of moving images is performed, the inter-frame prediction error may increase for some frames. There has therefore been a problem that the amount of information generated by encoding the prediction error becomes large and image quality is degraded.
In particular, for an image that undergoes a special change produced by special video processing, such as a dissolve image, the change differs from motion of an object or panning of a camera, so motion-compensated prediction does not fit and prediction is difficult, and the prediction error signal may increase. As a result, when such an image is predictively encoded, the amount of generated information becomes large and image quality is degraded.
The invention has been made to solve the problem described above. An object of the invention is to provide a moving image encoding apparatus and a moving image encoding method that perform encoding based on features of the moving image to be encoded, and thereby improve the efficiency of predictive encoding and minimize degradation in image quality for an image in which the amount of information generated by predictive encoding would otherwise become large.
In order to solve the problem, a moving image encoding apparatus of the invention comprises an encoding preprocessing part for extracting an image feature amount from a moving image before encoding and for sorting the frames constituting the moving image into encoding order, a control part for setting an encoding parameter based on the image feature amount extracted by the encoding preprocessing part, and an encoding part for encoding the moving image whose frames have been sorted by the encoding preprocessing part, based on the encoding parameter from the control part.
Also, the apparatus is characterized in that the encoding preprocessing part extracts the image feature amount from the moving image before encoding, and the control part uses different encoding parameter settings inside and outside a dissolve interval based on the image feature amount extracted by the encoding preprocessing part.
Also, the apparatus is characterized in that, when the encoding part encodes frames of the dissolve interval, the control part sets the encoding parameters so that the distance between intra coded or predictive coded pictures is 2, based on the image feature amount extracted by the encoding preprocessing part.
Also, the apparatus is characterized in that the control part obtains a linear differential value and a quadratic differential value of the image feature amount acquired from the encoding preprocessing part and determines whether or not there is a dissolve interval according to these values.
Also, the apparatus is characterized in that the encoding preprocessing part extracts the image feature amount for each signal component of each frame constituting the moving image.
Also, the apparatus is characterized in that the encoding preprocessing part divides each frame constituting the moving image into plural regions and obtains the image feature amount for each region.
Also, a moving image encoding method of the invention is characterized by extracting an image feature amount from a moving image before encoding, sorting the frames constituting the moving image into encoding order, and encoding the moving image whose frames have been sorted, based on an encoding parameter set on the basis of the image feature amount.
Also, the method is characterized in that an image feature amount for detecting a dissolve interval is extracted from the moving image before encoding, and different encoding parameter settings are used inside and outside the dissolve interval on the basis of that image feature amount.
A first embodiment of a moving image encoding apparatus according to the invention will be described. In the following first to fourth embodiments, a dissolve image is used as one example of an image in which the amount of information generated by predictive encoding becomes large. The embodiments describe a moving image encoding apparatus and a moving image encoding method that improve encoding efficiency by, when a dissolve image is detected, encoding with a GOP configuration different from that used for images other than the dissolve image.
Next, operations will be described.
First, the encoding preprocessing part 1 performs calculation on the image data to be encoded to extract an image feature amount such as an inter-frame difference value. This information is passed to the control part 2 as the image feature amount information 11, and the image data is sent to the encoding part 3 as the image data 10.
The control part 2 determines whether or not there is a dissolve image based on the image feature amount information 11, and sets the encoding parameter 12 in the encoding part 3. For example, if the control part 2 determines that the image data 10 to be encoded by the encoding part 3 is a dissolve image, it sets the encoding parameter 12 so that the distance M between intra coded (I) or predictive coded (P) pictures in the GOP configuration is M=2. If it does not determine that the image data 10 is a dissolve image, it sets the encoding parameter 12 so that the GOP configuration has M=3. The encoding part 3 encodes the image data 10 inputted from the encoding preprocessing part 1 under the conditions set by the encoding parameter 12 from the control part 2.
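The parameter selection and the sorting of frames into encoding order described above can be sketched in Python as follows. The function names select_m and coding_order are hypothetical. The reordering rule used here, in which each I- or P-picture is encoded before the B-pictures that precede it in display order, is customary MPEG practice assumed for this sketch and is not a rule stated in this description.

```python
from typing import List


def select_m(is_dissolve: bool) -> int:
    """Choose the I/P picture distance M: 2 for a dissolve image, otherwise 3."""
    return 2 if is_dissolve else 3


def coding_order(num_frames: int, m: int) -> List[int]:
    """Return display-order frame indices rearranged into encoding order.

    Assumption: frame 0 is the first anchor (I-picture), anchors follow
    every m frames, and the B-pictures between two anchors are encoded
    after the later anchor.
    """
    if num_frames < 1 or m < 1:
        raise ValueError("num_frames and m must be positive integers")
    order = [0]
    prev_anchor = 0
    while prev_anchor < num_frames - 1:
        next_anchor = min(prev_anchor + m, num_frames - 1)
        order.append(next_anchor)                        # anchor first
        order.extend(range(prev_anchor + 1, next_anchor))  # then its B-pictures
        prev_anchor = next_anchor
    return order


print(coding_order(7, select_m(False)))  # [0, 3, 1, 2, 6, 4, 5]
print(coding_order(5, select_m(True)))   # [0, 2, 1, 4, 3]
```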
A frame C in
C(i,j)={f2×A(i,j)+f1×B(i,j)}/(f1+f2) expression (1)
Here, i indicates coordinates of the horizontal direction and j indicates coordinates of the vertical direction, and a pixel value of the frame A is represented by A (i,j) and a pixel value of the frame B is represented by B(i,j).
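The weighted average of the frames A and B can be sketched in Python as follows. In this sketch f1 and f2 are treated simply as non-negative weights, which is an assumption, and the function name blend_dissolve is hypothetical. Frames are represented as lists of rows, so the pixel at horizontal coordinate i and vertical coordinate j is frame[j][i].

```python
def blend_dissolve(frame_a, frame_b, f1, f2):
    """Compute C(i,j) = (f2*A(i,j) + f1*B(i,j)) / (f1 + f2) for every pixel."""
    if f1 + f2 == 0:
        raise ValueError("f1 + f2 must not be zero")
    return [
        [(f2 * a + f1 * b) / (f1 + f2) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


frame_a = [[0, 90]]   # one row, two pixels
frame_b = [[30, 0]]
print(blend_dissolve(frame_a, frame_b, 1, 2))  # [[10.0, 60.0]]
```

With f1=1 and f2=2 the frame A is weighted twice as heavily as the frame B, so the result lies closer to A.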
Next, a reason for changing the configuration of GOP depending on whether the image data is the dissolve image or not will be described.
Generally, MPEG2 encoding is often performed with the distance M between I-pictures or P-pictures set to M=3.
Here, let the pixel values of the frames 100, 101, 102 and 103 be D0, D1, D2 and D3, respectively, and let the difference value between adjacent frames be dif. In a dissolve image, the relations of the following expressions (2) to (4) generally hold.
D1−D0=dif expression (2)
D2−D1=dif expression (3)
D3−D2=dif expression (4)
Also, a pixel value D4 of an image frame 104 obtained by averaging the frame 100 and the frame 103 can be indicated as the following expression (5).
D4={D0+D3}/2 expression (5)
As a result of this, each prediction error concerning the frame 101 can be indicated as expressions (6) to (8), respectively, when a forward prediction error in the case that the frame 100 is a forward prediction frame is indicated by dif_f and a backward prediction error in the case that the frame 103 is a backward prediction frame is indicated by dif_b and a bidirectional prediction error in the case that the frame 104 is a bidirectional prediction frame is indicated by dif_i.
dif_f=D1−D0=dif expression (6)
dif_b=D3−D1=2×dif expression (7)
dif_i=D4−D1=dif/2 expression (8)
Therefore, a relation as shown in the following expression (9) holds and in the case of a frame configuration with M=3 by the conventional art shown in
dif_i<dif_f<dif_b expression (9)
However, for M=3 shown in
Also, in a manner similar to the case of the frame 101, prediction errors concerning the frame 102 are indicated as expressions (10) to (12), respectively, when a forward prediction error is indicated by dif_f and a backward prediction error is indicated by dif_b and a bidirectional prediction error is indicated by dif_i.
dif_f=D2−D0=2×dif expression (10)
dif_b=D3−D2=dif expression (11)
dif_i=|D4−D2|=dif/2 expression (12)
Therefore, a relation as shown in the following expression (13) holds and in the case of a frame configuration with M=3 by the conventional art shown in
dif_i<dif_b<dif_f expression (13)
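The relations for the frames 101 and 102 with M=3 can be checked numerically. The following Python sketch assumes an ideal dissolve in which a pixel value changes by the same amount dif between adjacent frames, and it takes the magnitude of each difference as the prediction error. The function name and the sample values are illustrative only.

```python
def prediction_errors_m3(d0: float, dif: float) -> dict:
    """Prediction errors of the B-picture frames 101 and 102 when M=3."""
    d = [d0 + k * dif for k in range(4)]  # D0, D1, D2, D3
    d4 = (d[0] + d[3]) / 2                # average of frames 100 and 103
    errors = {}
    for name, idx in (("frame101", 1), ("frame102", 2)):
        errors[name] = {
            "forward": abs(d[idx] - d[0]),
            "backward": abs(d[3] - d[idx]),
            "bidirectional": abs(d4 - d[idx]),
        }
    return errors


result = prediction_errors_m3(100, 10)
print(result["frame101"])  # {'forward': 10, 'backward': 20, 'bidirectional': 5.0}
print(result["frame102"])  # {'forward': 20, 'backward': 10, 'bidirectional': 5.0}
```

In both frames the bidirectional error is the smallest of the three, but it is still dif/2 and not zero.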
However, for M=3 shown in
Here, in a manner similar to the case of
D1−D0=dif expression (14)
D3−D1=dif expression (15)
Also, a pixel value D4 of a frame 104 obtained by averaging the frame 100 and the frame 103 can be indicated as the following expression (16) in a manner similar to the expression (5).
D4={D0+D3}/2 expression (16)
As a result of this, each prediction error concerning the frame 101 is indicated as expressions (17) to (19) respectively, when a forward prediction error is indicated by dif_f and a backward prediction error is indicated by dif_b and a bidirectional prediction error is indicated by dif_i.
dif_f=D1−D0=dif expression (17)
dif_b=D3−D1=dif expression (18)
dif_i=D4−D1≈0 expression (19)
Therefore, a relation as shown in the following expression (20) holds and in the case of a frame configuration with M=2 by the first embodiment shown in
dif_i(≈0)<<dif_b=dif_f=dif expression (20)
As a result, in the first embodiment, the prediction error of the B-picture frame 101 in the dissolve interval becomes approximately zero. When the dissolve interval is encoded, coded data is generated substantially only for the frames 100 and 103, which are I-pictures or P-pictures, and the amount of generated information can be suppressed.
Therefore, according to the first embodiment, when the dissolve interval is detected, the M value inside the dissolve interval is changed relative to the M value outside the dissolve interval to change the GOP configuration. The inter-frame difference value in the B-picture frame can thereby be made approximately zero, smaller than that of the conventional method, and as a result the image can be encoded with little degradation.
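The effect of the M value can be generalized slightly. Assuming the same ideal linear dissolve and a bidirectional reference formed as the plain average of the two surrounding I- or P-pictures (both are assumptions of this sketch), the hypothetical function below computes the bidirectional prediction error of the k-th frame between two such pictures that are m frames apart.

```python
def bidirectional_error(m: int, k: int, d0: float, dif: float) -> float:
    """Bidirectional prediction error of the k-th B-picture (0 < k < m).

    The past anchor has pixel value d0, the future anchor d0 + m*dif,
    and the current frame d0 + k*dif (ideal linear dissolve).
    """
    if not 0 < k < m:
        raise ValueError("k must satisfy 0 < k < m")
    past = d0
    future = d0 + m * dif
    current = d0 + k * dif
    return abs((past + future) / 2 - current)


print(bidirectional_error(3, 1, 100, 10))  # 5.0
print(bidirectional_error(3, 2, 100, 10))  # 5.0
print(bidirectional_error(2, 1, 100, 10))  # 0.0
```

Only with M=2 does the single B-picture sit exactly midway between its two reference pictures, which is why the averaged reference reproduces it under these assumptions.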
Next, a second embodiment will be described. In relation to the first embodiment, it is characterized in that an encoding preprocessing part 1 further obtains the image feature amount for each signal component and outputs information about the image feature amount of each signal component to a control part 2.
The second embodiment differs from the first embodiment in that the encoding preprocessing part 1 obtains the image feature amount for each signal component and outputs these amounts in parallel to the control part 2, and the control part 2 receives and processes the image feature amounts of the respective signal components inputted in parallel. In
Next, operations peculiar to the second embodiment will be described. Unlike the first embodiment, the encoding preprocessing part 1 obtains the image feature amount for each signal component and outputs these amounts in parallel to the control part 2, and the control part 2 processes them. Consequently, a dissolve interval can be detected even between two scenes in which the Y signal, the luminance component, stays at substantially the same level and changes little while the Cb signal or the Cr signal, a color difference component, changes gradually.
Therefore, according to the second embodiment, the image feature amount is obtained for each signal component, so that erroneous detection of the dissolve interval can be prevented and encoding processing can be performed more effectively. A dissolve interval can also be detected in scenes in which only a particular signal component changes through dissolve processing. Thus, when the dissolve interval is detected, the M value inside the dissolve interval is changed relative to the M value outside the dissolve interval to change the GOP configuration, so that the inter-frame difference value in the B-picture frame can be made smaller than that of the conventional method and, as a result, the image can be encoded with little degradation.
In the second embodiment, the image feature amounts of the respective signal components are described as being outputted in parallel from the encoding preprocessing part 1 to the control part 2. In the invention, however, the image feature amounts of the signal components need not be sent at the same time and may be sent in a time division manner.
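The per-component processing of the second embodiment can be sketched in Python as follows. How the control part 2 combines the results for the individual signal components is not stated in this description, so the sketch assumes that a dissolve is reported when any one component changes gradually. The gradual-change test, the thresholds low and high, and all function names are likewise hypothetical.

```python
from typing import Dict, Sequence


def is_gradual(values: Sequence[float], low: float, high: float) -> bool:
    """True when every frame-to-frame change lies within [low, high]."""
    steps = [b - a for a, b in zip(values, values[1:])]
    return bool(steps) and all(low <= s <= high for s in steps)


def dissolve_by_component(
    features: Dict[str, Sequence[float]], low: float, high: float
) -> Dict[str, bool]:
    """Apply the gradual-change test to each signal component separately."""
    return {name: is_gradual(vals, low, high) for name, vals in features.items()}


def any_component_dissolving(
    features: Dict[str, Sequence[float]], low: float, high: float
) -> bool:
    """Assumed combination rule: one gradually changing component is enough."""
    return any(dissolve_by_component(features, low, high).values())


features = {
    "Y": [100, 100, 100, 100],   # luminance stays at the same level
    "Cb": [10, 14, 18, 22],      # color difference changes gradually
    "Cr": [50, 50, 50, 50],
}
print(dissolve_by_component(features, 1, 10))
# {'Y': False, 'Cb': True, 'Cr': False}
print(any_component_dissolving(features, 1, 10))  # True
```

A detector that looked only at the Y signal would miss this example, which is the case the second embodiment addresses.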
A third embodiment showing in detail dissolve interval detection processing within a control part 2 in relation to the first and second embodiments will be described. Incidentally, since the configuration itself of the moving image encoding apparatus is identical to that of
First, an encoding preprocessing part 1 extracts the amount of image feature such as an inter-frame difference value from each frame of an inputted moving image and outputs the amount of image feature to the control part 2 as image feature amount information 11, and the control part 2 performs the dissolve interval detection processing based on this image feature amount information 11.
Specifically, let Δ1 be the linear differential value of the image feature amount information 11, let the quadratic differential value Δ2 be the difference of the linear differential value, and let θ1, θ2 and θ3 be predetermined constants with θ1<θ2. It is first determined whether the condition of the following expression (20) is satisfied, namely whether the linear differential value Δ1, which is the rate of change of the image feature amount information 11, is in the range from θ1 to θ2 so that the image feature amount changes gradually (step S100). If the condition of expression (20) is satisfied (“Yes” in step S100), it is further determined whether the condition of the following expression (21) is satisfied, namely whether the quadratic differential value Δ2, which is the rate of change of the linear differential value Δ1, is less than or equal to the predetermined value θ3 (step S110). If the condition of expression (21) is also satisfied (“Yes” in step S110), it is determined that the frame is within a dissolve interval (step S120).
θ1≦Δ1≦θ2 expression (20)
Δ2≦θ3 expression (21)
If the linear differential value Δ1 of the image feature amount information 11 is not in the range from θ1 to θ2 and expression (20) is not satisfied (“No” in step S100), it is determined that the frame is outside the dissolve interval (step S130). Likewise, if expression (20) is satisfied but the quadratic differential value Δ2 of the image feature amount information 11 is more than the predetermined value θ3 (“Yes” in step S100 and “No” in step S110), it is determined that the frame is outside the dissolve interval (step S130).
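The two-step determination of steps S100 to S130 can be sketched in Python as follows. The sketch assumes that the image feature amount is one scalar per frame, that Δ1 is the difference between the feature amounts of consecutive frames, and that Δ2 is the difference between consecutive Δ1 values. Signed values are compared with the thresholds exactly as the conditions are written. The function name and the sample thresholds are hypothetical.

```python
from typing import List, Sequence


def detect_dissolve(
    features: Sequence[float], theta1: float, theta2: float, theta3: float
) -> List[bool]:
    """Return one flag per frame: True when the frame is judged to be
    within a dissolve interval. The first two frames are always False
    because the second-order difference is not yet available for them.
    """
    flags = [False] * len(features)
    for t in range(2, len(features)):
        delta1 = features[t] - features[t - 1]           # linear differential
        prev_delta1 = features[t - 1] - features[t - 2]
        delta2 = delta1 - prev_delta1                    # quadratic differential
        gradual = theta1 <= delta1 <= theta2             # step S100
        steady = delta2 <= theta3                        # step S110
        flags[t] = gradual and steady                    # S120 or S130
    return flags


# Feature rises steadily (dissolve), jumps (scene change), then stays flat.
features = [0, 5, 10, 15, 80, 80]
print(detect_dissolve(features, theta1=1, theta2=10, theta3=1))
# [False, False, True, True, False, False]
```

The abrupt jump from 15 to 80 fails the first condition, so a scene change is not reported as a dissolve.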
Incidentally, in the case of the second embodiment shown in
Therefore, according to the third embodiment, the dissolve interval is detected using the linear differential value Δ1 and the quadratic differential value Δ2 of the image feature amount information 11, so that an image in which a scene change occurs is not determined to be a dissolve interval and the dissolve interval itself can be detected. As a result, when the dissolve interval is detected, in a manner similar to the first and second embodiments, the M value inside the dissolve interval is changed relative to the M value outside the dissolve interval to change the GOP configuration, so that the inter-frame difference value in the B-picture frame can be made approximately zero, smaller than that of the conventional method, and the image can be encoded with little degradation.
In the first to third embodiments, the encoding preprocessing part 1 has been described as extracting the image feature amount in frame units. Next, a fourth embodiment will be described in which the extraction unit of the image feature amount is not a frame but a local region obtained by dividing a frame (screen) into plural local regions, the image feature amount is extracted for each local region, and the control part 2 determines for each region whether or not the region is within the dissolve interval. Incidentally, since the configuration itself of the moving image encoding apparatus is identical to that of
The control part 2 receives the image feature amount information 11, or the image feature amount information 11a to 11c, for each divided local region, and determines for each divided local region whether or not it is within the dissolve interval in a manner similar to the first to third embodiments.
In that case, for example, if the control part 2 primarily determines that a total of 16 local regions of upper four rows shown by hatch in
On the other hand, if the number of local regions primarily determined to be within the dissolve interval is less than one-half of all the local regions of the one frame, the control part 2 finally determines that this frame is outside the dissolve interval. In a manner similar to the first to third embodiments, the encoding parameter 12 is then set so that the GOP configuration has M=3, and encoding processing is performed by the encoding part 3.
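The final determination from the per-region results can be sketched in Python as follows. The sketch treats a frame as being within the dissolve interval when the dissolved local regions make up at least one-half of all local regions. Treating exactly one-half as a dissolve is an assumption of this sketch, and the function names, the 4x4 grid and the parameter min_fraction are hypothetical.

```python
from typing import Sequence


def frame_in_dissolve(
    region_flags: Sequence[Sequence[bool]], min_fraction: float = 0.5
) -> bool:
    """Final per-frame decision from the primary per-region decisions."""
    flat = [bool(flag) for row in region_flags for flag in row]
    if not flat:
        raise ValueError("at least one local region is required")
    return sum(flat) >= min_fraction * len(flat)


def gop_m_for_frame(region_flags: Sequence[Sequence[bool]]) -> int:
    """M=2 for a frame within the dissolve interval, otherwise M=3."""
    return 2 if frame_in_dissolve(region_flags) else 3


mostly_dissolved = [[True] * 4, [True] * 4, [True] * 4, [False] * 4]
barely_dissolved = [[True] * 4, [False] * 4, [False] * 4, [False] * 4]
print(gop_m_for_frame(mostly_dissolved))  # 2
print(gop_m_for_frame(barely_dissolved))  # 3
```

Lowering min_fraction corresponds to regarding a frame as dissolved even when only a small part of it changes.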
Therefore, according to the fourth embodiment, the image feature amount is extracted for each local region obtained by dividing one frame into plural regions, whether or not dissolve processing is performed is determined for each local region, and whether or not the frame is within the dissolve interval is determined based on the proportion of dissolved local regions. It can therefore be determined whether or not the frame is within the dissolve interval even for a moving image in which dissolve processing is performed only in particular local regions of one frame. Thus, in a manner similar to the first to third embodiments, the M value inside the dissolve interval is changed relative to the M value outside the dissolve interval to change the GOP configuration for the frames of the dissolve interval, so that the inter-frame difference value in the B-picture frame can be made zero or smaller than that of the conventional method and, as a result, the image can be encoded with little degradation.
In the fourth embodiment, whether or not the whole of one frame is within the dissolve interval has been described as being determined by whether or not the proportion of dissolved local regions is more than one-half of all the local regions of the frame. The invention is not limited to this. It may be constructed so as to determine that the whole frame is within the dissolve interval even when the proportion of dissolved local regions is less than or equal to one-half of all the local regions, or so as to determine whether or not dissolve processing is performed in one or plural particular local regions and thereby determine whether or not the whole frame is within the dissolve interval. Thus, even a frame in which the dissolved region is small or dissolve processing is performed locally can be regarded as a frame in which dissolve processing is performed over the whole frame. Therefore, a dissolve interval can easily be detected even in images such as so-called picture-in-picture or picture-out-picture.
Also, whether or not the whole of one frame is within the dissolve interval has been described as being determined based on the proportion of dissolved local regions, but the invention is not limited to this. It may be constructed so as to determine for each local region whether or not the region is within the dissolve interval and to change the GOP configuration for each local region in the above manner. Thus, when images with high image quality such as HDTV are encoded by dividing a screen among individual encoding apparatuses, the GOP configuration can be changed in each of the encoding apparatuses.
Also, in the first to fourth embodiments, the image feature amount is extracted from the moving image before encoding, whether or not a frame is within the dissolve interval is determined based on the extracted image feature amount, and, within the dissolve interval, the M value of the GOP is changed to change the GOP configuration and improve encoding efficiency. The invention is not limited to this. Based on the extracted image feature amount, it may be determined whether or not an image is one in which the amount of information of the prediction error signal is likely to become large, such as an image undergoing a special change produced by special video processing other than a dissolve, or a scene change image. For such an image, predictive encoding may be performed with a changed encoding parameter, for example by changing the M value or N value of the GOP to change the GOP configuration or by changing a quantization step, thereby improving encoding efficiency.
As described above, according to the invention, an image feature amount is extracted from a moving image before encoding, the frames constituting the moving image are sorted into encoding order, and the moving image whose frames have been sorted is encoded based on an encoding parameter set on the basis of the image feature amount. Therefore, for an image in which the amount of information of the prediction error signal is likely to become large, encoding can be performed on the basis of the feature and, as a result, the amount of information generated by the encoding can be reduced.
In particular, the image feature amount is extracted from the moving image before encoding, whether or not there is a dissolve interval is determined, and, when there is a dissolve interval, the moving image is encoded with different encoding parameter settings inside and outside the dissolve interval. Encoding suitable for a dissolve image can therefore be performed in the dissolve interval, encoding efficiency improves, and an image with less degradation is obtained.
Also, when there is a dissolve interval, the distance between intra coded or predictive coded pictures is set to 2, so that the prediction error signal of the bidirectionally predictive coded picture between them can be made substantially zero and the amount of information generated in the dissolve interval can be reduced.
Also, a linear differential value and a quadratic differential value of the image feature amount are obtained and whether or not there is a dissolve interval is determined according to these values, so that an image in which a scene change occurs is not determined to be a dissolve interval and the dissolve interval itself can be detected.
Also, the image feature amount is extracted for each signal component of each frame and a determination is made for each signal component, so that a special change in an image of the dissolve interval can be detected, erroneous detection of the special change can be prevented, and encoding processing can be performed more effectively. The change can also be detected in a scene in which only a particular signal component changes.
Also, a frame is divided into plural regions and the image feature amount is obtained for each region, so that a special change in an image of the dissolve interval can be detected not only over the whole frame but also in each region obtained by dividing the frame. As a result, the special change in the image of the dissolve interval can be detected even in images such as picture-in-picture or picture-out-picture. Further, when images with high image quality such as HDTV are encoded by dividing a screen among individual encoding apparatuses, the GOP configuration can be changed in each of the encoding apparatuses.
Number | Date | Country | Kind |
---|---|---|---|
P. 2000-270264 | Sep 2000 | JP | national |
Number | Date | Country | |
---|---|---|---|
20020028023 A1 | Mar 2002 | US |