The present invention relates to a video encoding method, a video decoding method, a video encoding apparatus, a video decoding apparatus, and a program thereof, which have a function of changing a set of interpolation filter coefficients within a frame.
Priority is claimed on Japanese Patent Application No. 2010-180814, filed Aug. 12, 2010, the content of which is incorporated herein by reference.
According to video encoding, in inter-frame prediction (motion compensation) encoding in which prediction is performed between different frames, a motion vector is obtained with reference to an already decoded frame such that prediction error energy and the like are minimized. A residual signal generated by the motion vector is orthogonally transformed, is subject to quantization, and is generated as binary data through entropy encoding. In order to improve coding efficiency, it is necessary to obtain a prediction scheme with higher prediction precision, and to reduce prediction error energy.
In relation to a video coding standard scheme, many tools for increasing the precision of inter-frame prediction have been introduced. For example, in H.264/AVC (Advanced Video Coding), when occlusion exists in the next frame, prediction error energy can be reduced by referring to a frame that is temporally slightly further away, and thus it is possible to refer to a plurality of frames. This tool is called multiple reference frame prediction.
Furthermore, in order to cope with complicated forms of motion, it is possible to finely divide a block size such as 16×8, 8×16, 8×4, 4×8, and 4×4, in addition to 16×16 and 8×8. This tool is called variable block size prediction.
Similarly, ½ precision pixels are interpolated from integer precision pixels of a reference frame using a 6-tap filter, and ¼ precision pixels are generated using the pixels through linear interpolation. In this way, prediction for motion with non-integer precision is realized. This tool is called ¼ pixel precision prediction.
In order to design the next generation video coding standard scheme with the coding efficiency higher than that of H.264/AVC, the international standardization organizations ISO/IEC “MPEG” (International Organization for Standardization/International Electrotechnical Commission “Moving Picture Experts Group”) and ITU-T “VCEG” (International Telecommunication Union-Telecommunication Standardization sector “Video Coding Experts Group”) have currently collected various proposals from various countries around the world. Among the proposals, there are many proposals associated with inter-frame prediction (motion compensation), and the next generation video coding software (hereinafter referred to as KTA (Key Technical Area) software) created at the initiative of VCEG employs a tool for reducing a bit amount of a motion vector, or a tool for expanding a block size to 16×16 or more.
Particularly, a tool for adaptively changing a set of interpolation filter coefficients of a decimal precision pixel is called an adaptive interpolation filter, has an effect in almost all sequences, and is initially employed in KTA software. In contributions to collection (Call for Proposal) of a new coding test model issued by a group JCT-VC (Joint Collaborative Team on Video Coding) for designing the next generation video coding standard jointly conducted by MPEG and VCEG, this technology is frequently employed. Since contribution to the coding efficiency improvement is high, performance improvement of the adaptive interpolation filter is considered to be a highly anticipated field in the future.
The current situation has been described above. However, as an interpolation filter in video coding, the following filters have been used in the related art.
[Fixed Interpolation]
In the past video coding standard scheme MPEG-1/2/4, as illustrated in
Meanwhile, in H.264/AVC, when interpolating pixels at ½ pixel positions, interpolation is performed using the total six integer pixels at the three right and left points of pixels to be interpolated. For the vertical direction, interpolation is performed using the total six integer pixels at the three upper and lower points. Filter coefficients are [(1, −5, 20, 20, −5, 1)/32]. After the ½ precision pixels are interpolated, the ¼ precision pixels are interpolated using an average value filter of [½, ½]. Since it is necessary to interpolate all the ½ precision pixels once, the degree of calculation complexity is high, but interpolation with high performance is possible and the coding efficiency is improved.
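The fixed interpolation described above can be sketched in one dimension as follows. This is a minimal illustration, not the normative H.264/AVC procedure: the function names, the border clamping, and the integer rounding and clipping to the 8-bit range are assumptions made for the example.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # 6-tap coefficients; the sum is 32

def half_pel(row, i):
    """Half-pixel sample between row[i] and row[i + 1]."""
    # Three integer pixels on each side; indices are clamped at the borders
    # (an assumption; a real codec pads the reference frame instead).
    window = [row[min(max(i - 2 + k, 0), len(row) - 1)] for k in range(6)]
    acc = sum(t * p for t, p in zip(TAPS, window))
    # Divide by 32 with rounding, then clip to the 8-bit range (assumed).
    return min(max((acc + 16) >> 5, 0), 255)

def quarter_pel(left, right):
    """Quarter-pixel sample as the rounded [1/2, 1/2] average of two samples."""
    return (left + right + 1) >> 1

row = [0, 0, 0, 0, 100, 100, 100, 100]  # a step edge
h = half_pel(row, 3)                     # half-pel position between row[3] and row[4]
q = quarter_pel(row[3], h)               # quarter-pel position next to row[3]
```

Because every quarter-pixel sample depends on a half-pixel sample, all half-pixel samples have to be computed first, which is the source of the calculation cost noted above.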
[Adaptive Interpolation]
In H.264/AVC, regardless of an input image condition (a sequence type/an image size/a frame rate) or an encoding condition (a block size/a GOP (Group of Pictures) structure/a QP (Quantization Parameter)), a filter coefficient value is constant. When the filter coefficient value is fixed, temporally changing effects, such as aliasing, a quantization error, an error due to motion estimation, or camera noise, are not considered. Accordingly, there is considered to be a limitation in performance improvement in terms of the coding efficiency. Therefore, a scheme of adaptively changing interpolation filter coefficients is proposed in Non-Patent Document 4, and is called a non-separable adaptive interpolation filter.
In Non-Patent Document 4, a two-dimensional interpolation filter (the total 36 filter coefficients of 6×6) is considered, and filter coefficients are determined such that prediction error energy is minimized. In this scheme, it is possible to realize high coding efficiency as compared with the case of using a one-dimensional 6-tap fixed interpolation filter used in the H.264/AVC. However, since the degree of calculation complexity is significantly high in acquiring the filter coefficients, a proposal for reducing the degree of calculation complexity is introduced in Non-Patent Document 5.
A scheme introduced in Non-Patent Document 5 is called a SAIF (Separable Adaptive Interpolation Filter), and uses a one-dimensional 6-tap interpolation filter instead of a two-dimensional interpolation filter.
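Equation 1, the prediction error energy function for the horizontal direction, is not reproduced in this text. Based on the symbol definitions that follow, it presumably has the form below. This is a reconstruction; the name E_h for the horizontal prediction error energy is an assumption.

```latex
\[
E_h^{2} = \sum_{x,y} \left( S_{x,y} - \sum_{c_i} w_{c_i} \cdot P_{\tilde{x}+c_i,\,\tilde{y}} \right)^{2} \tag{1}
\]
```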
In Equation 1 above, S denotes an original image, P denotes a decoded reference image, and x and y denote horizontal and vertical positions of an image. Furthermore, ˜x (˜ is the symbol above x; the same hereinafter) is expressed by x+MVx−FilterOffset, wherein MVx denotes a horizontal component of a motion vector acquired in advance, and FilterOffset denotes an offset (a value obtained by dividing a horizontal filter length by 2) for adjustment. For the vertical direction, ˜y is expressed by y+MVy, wherein MVy denotes a vertical component of the motion vector. wci denotes a horizontal filter coefficient group ci (0≦ci<6) to be calculated.
A linear equation having a number equal to the filter coefficients calculated by Equation 1 above is acquired, so that a minimization process is independently performed for each decimal pixel position in the horizontal direction. Through this minimization process, three types of 6-tap filter coefficient groups are acquired, and decimal precision pixels a, b, and c are interpolated using the filter coefficients.
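The minimization described above is a linear least-squares problem: setting the derivative of the prediction error energy with respect to each of the six coefficients to zero yields six linear equations. The following is a minimal sketch of that step for one decimal pixel position, assuming that the six reference pixels (`windows`) and the matching original pixel (`targets`) have already been collected for each sample. The function names and the synthetic data are hypothetical.

```python
import random

def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def fit_filter(windows, targets, taps=6):
    """Accumulate the normal equations and solve for the filter coefficients."""
    A = [[0.0] * taps for _ in range(taps)]
    b = [0.0] * taps
    for p, s in zip(windows, targets):
        for i in range(taps):
            b[i] += s * p[i]
            for j in range(taps):
                A[i][j] += p[i] * p[j]
    return solve(A, b)

# Synthetic check: targets are produced by a known filter, which the fit recovers.
rng = random.Random(0)
true_w = [t / 32 for t in (1, -5, 20, 20, -5, 1)]
windows = [[rng.randint(0, 255) for _ in range(6)] for _ in range(50)]
targets = [sum(t * p for t, p in zip(true_w, win)) for win in windows]
w = fit_filter(windows, targets)
```

Running the fit once per decimal pixel position (a, b, and c) gives the three horizontal coefficient groups mentioned above.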
After the pixel interpolation in the horizontal direction is completed, an interpolation process in the vertical direction is performed as indicated in Step 2 of
In Equation 2 above, S denotes an original image, P̂ (the symbol ^ is placed above P; the same hereinafter) denotes an image subject to a horizontal interpolation process after decoding, and x and y denote horizontal and vertical positions of an image. Furthermore, ˜x is expressed by 4·(x+MVx), wherein MVx denotes a rounded horizontal component of a motion vector. For the vertical direction, ˜y is expressed by y+MVy−FilterOffset, wherein MVy denotes a vertical component of the motion vector and FilterOffset denotes an offset (a value obtained by dividing a filter length by 2) for adjustment. wcj denotes a vertical filter coefficient group cj (0≦cj<6) to be calculated.
A minimization process is independently performed for each decimal pixel position, so that 12 types of 6-tap filter coefficient groups are acquired. Using the filter coefficients, remaining decimal precision pixels are interpolated.
Thus, the total 90 (=6×15) filter coefficients need to be coded and transmitted to a decoder side. Particularly, for encoding with low resolution, since overhead is large, filter coefficients to be transmitted are reduced using symmetry of a filter. For example, in
In addition, since d and l are symmetrical to each other with respect to h, filter coefficients may also be inverted for use. That is, if six coefficients of d are transmitted, the values may also be applied to l. c(d)1 is set to c(l)6, c(d)2 is set to c(l)5, c(d)3 is set to c(l)4, c(d)4 is set to c(l)3, c(d)5 is set to c(l)2, and c(d)6 is set to c(l)1. This symmetry is also available to e and m, f and n, and g and o. Even for a and c, the same logic is applicable. However, since a result in the horizontal direction has an influence on interpolation in the vertical direction, symmetry is not used and a and c are individually transmitted. As a result of using the symmetry, the number of filter coefficients to be transmitted in each frame is 51 (15 in the horizontal direction and 36 in the vertical direction).
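The coefficient reuse and the resulting count can be checked with a short sketch. The per-position breakdown is an assumption chosen to be consistent with the stated totals: the center positions b, h, i, j, and k are taken to be self-symmetric, so that only three of their six coefficients are transmitted. The coefficient values are made up for illustration.

```python
def mirror(coeffs):
    """Coefficients of the mirrored position: the same values in reverse order."""
    return coeffs[::-1]

# Hypothetical coefficients transmitted for position d, reused for position l.
c_d = [0.01, -0.05, 0.80, 0.30, -0.08, 0.02]
c_l = mirror(c_d)

FULL, HALF = 6, 3  # all six taps, or three taps of a self-symmetric filter
transmitted = {
    "a": FULL, "b": HALF, "c": FULL,             # horizontal positions
    "d": FULL, "e": FULL, "f": FULL, "g": FULL,  # also reused for l, m, n, o
    "h": HALF, "i": HALF, "j": HALF, "k": HALF,  # assumed self-symmetric
}
horizontal = sum(transmitted[p] for p in "abc")
vertical = sum(transmitted[p] for p in "defghijk")
total = horizontal + vertical
```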
So far, in the adaptive interpolation filter of Non-Patent Document 5, a unit of the minimization process of the prediction error energy is fixed in a frame. For one frame, 51 filter coefficients are decided. When a frame to be encoded is divided in two types (or a plurality of types) of large texture areas, optimal filter coefficients are coefficient groups in which the two textures (all the textures) are considered. In the state in which filter coefficients having characteristics only in the vertical direction are acquired in area A and filter coefficients having characteristics only in the horizontal direction are acquired in area B, filter coefficients are derived by averaging these.
Non-Patent Document 6 proposes a method in which one filter coefficient group (51 filter coefficients) is not limited to one frame, and a plurality of filter coefficient groups are prepared and switched according to local characteristics of an image, so that the prediction error energy is reduced and thus the coding efficiency is improved.
As illustrated in
In this regard, in Non-Patent Document 6, a method of using a plurality of filter coefficient groups optimized by region division for one frame is considered. As a region division scheme, Non-Patent Document 6 employs a motion vector (horizontal and vertical components, and directions) or a spatial coordinate (a macro block position, and coordinate x or coordinate y of a block), and region division is performed in consideration of various image characteristics.
In a video encoding apparatus 100, a region division unit 101 divides a frame to be encoded of an input video signal into a plurality of regions including a plurality of blocks that are set to units in which interpolation filter coefficients are adaptively switched. An interpolation filter coefficient switching unit 102 switches a set of interpolation filter coefficients of a decimal precision pixel, which is used in a reference image in predictive encoding, for each region divided by the region division unit 101. As a set of interpolation filter coefficients to be switched, for example, a set of filter coefficients optimized by a filter coefficient optimization section 1021 is used. The filter coefficient optimization section 1021 calculates a set of interpolation filter coefficients in which prediction error energy between an original image and an interpolated reference image is minimized.
A predictive signal generation unit 103 includes a reference image interpolation section 1031 and a motion detection section 1032. The reference image interpolation section 1031 applies an interpolation filter based on a set of interpolation filter coefficients, which is selected by the interpolation filter coefficient switching unit 102, to a decoded reference image stored in a reference image memory 107. The motion detection section 1032 performs motion search for an interpolated reference image, thereby calculating a motion vector. The predictive signal generation unit 103 generates a predictive signal through motion compensation based on a decimal precision motion vector calculated by the motion detection section 1032.
A predictive encoding unit 104 performs predictive encoding processes such as calculation of a residual signal between the input video signal and the predictive signal, orthogonal transformation of the residual signal, and quantization of the transformed coefficients. Furthermore, a decoding unit 106 decodes a result of the predictive encoding, and stores a decoded image in the reference image memory 107 for next predictive encoding.
A variable length encoding unit 105 performs variable length encoding for the quantized transform coefficients and the motion vector, performs variable length encoding for the interpolation filter coefficients, which are selected by the interpolation filter coefficient switching unit 102, for each region, and outputs them as an encoded bit stream.
In the video decoding apparatus 200, a variable length decoding unit 201 receives an encoded bit stream, and decodes quantized transform coefficients, a motion vector, an interpolation filter coefficient group and the like. A region determination unit 202 determines regions that are set to units in which an interpolation filter coefficient group is adaptively switched for a frame to be decoded. An interpolation filter coefficient switching unit 203 switches the interpolation filter coefficient group, which is decoded by the variable length decoding unit 201, for each region determined by the region determination unit 202.
A reference image interpolation section 2041 in a predictive signal generation unit 204 applies an interpolation filter based on the interpolation filter coefficients, which are received from the interpolation filter coefficient switching unit 203, to a decoded reference image stored in a reference image memory 206, and restores decimal precision pixels of the reference image. The predictive signal generation unit 204 generates a predictive signal of blocks to be decoded from the reference image for which the restoration of the decimal precision pixels has been performed.
A predictive decoding unit 205 performs inverse quantization, inverse orthogonal transform and the like for the quantized coefficients decoded by the variable length decoding unit 201, generates a decoded signal by adding a predictive residual signal calculated by this process to the predictive signal generated by the predictive signal generation unit 204, and outputs the decoded signal as a decoded image. Furthermore, the decoded image decoded by the predictive decoding unit 205 is stored in the reference image memory 206 for next predictive decoding.
The region division-type adaptive interpolation filter (Non-Patent Document 6) used by the video encoding apparatus 100 as illustrated in FIG. 14 switches a plurality of filter coefficient groups in a frame in consideration of local characteristics of an image, thereby reducing prediction error energy and thus improving the coding efficiency. However, in this apparatus, a region division scheme used in an initial frame is used for all frames. Since the intra-frame characteristics of a video can change in the time direction (for example, at a scene change), the coding efficiency is anticipated to be further improved if the division scheme can be changed in units of frames.
In order to solve these problems, it is an object of the present invention to select an optimal region division scheme in units of frames or slices with respect to an image in which optimal values of interpolation filter coefficients are changed in time and space, thereby further reducing residual energy of motion-compensated inter-frame prediction and thus improving the coding efficiency.
According to a method for achieving the object, a plurality of region division schemes are prepared, a rate distortion cost is calculated for each scheme, a region division scheme, in which the cost is minimized, is selected, and information indicating the region division scheme is transmitted as a flag. The plurality of region division schemes are switched in units of frames, so that prediction error energy is reduced and thus the coding efficiency is improved.
That is, the present invention is a video encoding method using motion compensation in which a plurality of region division schemes for dividing a frame (or a slice) to be encoded are prepared, one region division scheme is sequentially selected from among the plurality of region division schemes, encoding information (information acquired after decoding or during the decoding) is detected from the frame to be encoded, region division is performed in the frame based on the detected encoding information, an interpolation filter of a decimal precision pixel is selected according to a result of the division, encoding is performed by interpolating a decimal precision pixel using the selected interpolation filter, a cost for the selected region division scheme is calculated and stored, the best region division scheme is selected based on the stored cost, a region division mode number indicating the region division scheme is encoded, and encoding is performed using the best region division scheme.
Furthermore, the present invention is a video decoding method for decoding an encoded stream encoded using the video encoding method, in which the region division mode number is decoded, the interpolation filter coefficients of a decimal precision pixel are decoded, classification is performed in units of blocks using information acquired from a block to be decoded, region division is performed according to a result of the classification, and decoding is performed by switching the interpolation filter of a decimal precision pixel for each divided region.
The operation of the present invention is as follows. In the related region division-type adaptive interpolation filter, only one type of region division scheme is applied to one type of video, and there is a limitation in improving the coding efficiency when there are significant spatiotemporal differences in the characteristics of the entire video. Meanwhile, in the present invention, a set of interpolation filter coefficients is spatiotemporally optimized, so that the locality of an image can be treated flexibly and the coding efficiency can be further improved.
As described above, according to the present invention, it is possible to select an optimal region division scheme in units of one or a plurality of frames or slices and to switch a set of interpolation filter coefficients in consideration of spatiotemporal locality of an image, which is not treated by the related separable adaptive interpolation filter. Consequently, it is possible to improve the coding efficiency through reduction of prediction error energy.
Hereinafter, an embodiment of the present invention will be described with reference to the accompanying drawings. In addition, as an example, a method for dividing a region in units of frames is described. However, the region may be divided in units of slices. Furthermore, region division may be decided in a plurality of frames such as two or three frames.
[Video Encoding Apparatus]
In the video encoding apparatus 10, a region division unit 11 divides a frame to be encoded of an input video signal into a plurality of regions including a plurality of blocks that are set to units in which interpolation filter coefficients are adaptively switched. In the division of the region, a plurality of region division modes are prepared, and respective regions are divided according to one region division mode sequentially selected from the plurality of region division modes.
An interpolation filter coefficient switching unit 12 switches a set of interpolation filter coefficients of a decimal precision pixel, which is used for a reference image in predictive encoding, for each region divided by the region division unit 11. As interpolation filter coefficients to be switched, optimized interpolation filter coefficients, in which prediction error energy between an original image and an interpolated reference image is minimized, are used for each region divided by the region division unit 11.
A predictive signal generation unit 13 includes a reference image interpolation section 131 and a motion detection section 132. The reference image interpolation section 131 applies an interpolation filter based on interpolation filter coefficients, which are selected by the interpolation filter coefficient switching unit 12, to a decoded reference image stored in a reference image memory 18. The motion detection section 132 performs motion search for the interpolated reference image, thereby calculating a motion vector. The predictive signal generation unit 13 generates a predictive signal through motion compensation based on a decimal precision motion vector calculated by the motion detection section 132.
A predictive encoding unit 14 performs predictive encoding processes such as calculation of a residual signal between the input video signal and the predictive signal, orthogonal transformation of the residual signal, and quantization of the transformed coefficients.
A region division mode determination unit 15 stores a rate distortion (RD) cost of a result encoded by the predictive encoding unit 14 for each region division mode selected by the region division unit 11, and selects a region division mode in which the rate distortion cost is minimized.
A variable length encoding unit 16 performs variable length encoding for the region division mode (for example, a mode number) selected by the region division mode determination unit 15. Furthermore, the variable length encoding unit 16 performs variable length encoding for the interpolation filter coefficients selected by the interpolation filter coefficient switching unit 12 for each region. Moreover, the variable length encoding unit 16 performs variable length encoding for quantized transform coefficients, which are output by the predictive encoding unit 14 in a finally selected region division mode, and a motion vector output by the motion detection section 132. The variable length encoding unit 16 outputs information on the encoding as an encoded bit stream.
A decoding unit 17 decodes a result of the predictive encoding by the predictive encoding unit 14, and stores a decoded signal in the reference image memory 18 for next predictive encoding.
[Process Flow of Video Encoding Apparatus]
First, in step S101, a frame to be encoded is input. Next, in step S102, the input frame is divided into blocks (for example, a block size of the related motion estimation such as 16×16 or 8×8), and an optimal motion vector is calculated by the motion detection section 132 in units of blocks. In interpolation of decimal precision pixels of a reference image in step S102, the fixed 6-tap filter based on the conventional H.264/AVC is used.
Next, in step S103, the region division unit 11 sequentially selects one region division mode from among a plurality of prepared region division modes, and repeats the process up to step S110 with respect to the selected region division mode. Details of an example of the region division mode will be described later with reference to
In step S104, the region division unit 11 performs region division according to the region division mode selected in step S103.
In steps S105 to S108, from a result of the region division of step S104, an optimization process is performed for each region. First, in step S105, using Equation 3 below, which is a prediction error energy function, an optimization process of interpolation filter coefficients is performed for each decimal precision pixel in the horizontal direction.
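Equation 3 is not reproduced in this text. Based on the symbol definitions that follow, it presumably has the form below, in which the summation is restricted to the pixels belonging to region αm,n. This is a reconstruction; the name E_h for the horizontal prediction error energy is an assumption.

```latex
\[
E_h^{2}(\alpha_{m,n}) = \sum_{(x,y) \in \alpha_{m,n}} \left( S_{x,y} - \sum_{c_i} w_{c_i} \cdot P_{\tilde{x}+c_i,\,\tilde{y}} \right)^{2} \tag{3}
\]
```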
In Equation 3 above, αm,n denotes each region, m denotes a region division mode number, n denotes a region number in a specific region division mode, S denotes an original image, P denotes a decoded reference image, and x and y denote horizontal and vertical positions of an image. Furthermore, ˜x (˜ is the symbol above x) is expressed by x+MVx−FilterOffset, wherein MVx denotes a horizontal component of a motion vector acquired in advance, and FilterOffset denotes an offset (a value obtained by dividing a horizontal filter length by 2) for adjustment. For the vertical direction, ˜y is expressed by y+MVy, wherein MVy denotes a vertical component of the motion vector. wci denotes a horizontal filter coefficient group ci (0≦ci<6) to be calculated.
Next, in step S106, using the horizontal interpolation filter coefficients acquired in step S105, decimal pixel interpolation (interpolation of a, b, and c in
In step S107, an optimization process of interpolation filter coefficients in the vertical direction is performed. Using Equation 4 below, which is a prediction error energy function in the vertical direction, an optimization process of interpolation filter coefficients is performed for each decimal precision pixel in the vertical direction.
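Equation 4 is not reproduced in this text. Based on the symbol definitions that follow, it presumably has the form below, in which the filter runs in the vertical direction over the horizontally interpolated image. This is a reconstruction; the name E_v for the vertical prediction error energy is an assumption.

```latex
\[
E_v^{2}(\alpha_{m,n}) = \sum_{(x,y) \in \alpha_{m,n}} \left( S_{x,y} - \sum_{c_j} w_{c_j} \cdot \hat{P}_{\tilde{x},\,\tilde{y}+c_j} \right)^{2} \tag{4}
\]
```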
In Equation 4 above, αm,n denotes each region, m denotes a region division mode number, n denotes a region number in a specific region division mode, S denotes an original image, P̂ (the symbol ^ is placed above P) denotes an image interpolated in the horizontal direction in step S106, and x and y denote horizontal and vertical positions of an image. Furthermore, ˜x is expressed by 4·(x+MVx), wherein MVx denotes a rounded horizontal component of a motion vector. For the vertical direction, ˜y is expressed by y+MVy−FilterOffset, wherein MVy denotes a vertical component of the motion vector and FilterOffset denotes an offset (a value obtained by dividing a filter length by 2) for adjustment. wcj denotes a vertical filter coefficient group cj (0≦cj<6) to be calculated.
In step S108, using the vertical interpolation filter coefficients acquired in step S107, decimal pixel interpolation (interpolation of d to o in
Next, in step S109, using the vertically interpolated image in step S108 as a reference image, a motion vector is calculated again.
In step S110, a rate distortion cost (an RD cost) for the region division mode selected in step S103 is calculated and stored. The process from step S103 to step S110 is performed for all the prepared region division modes.
Next, in step S111, the region division mode determination unit 15 decides an optimal region division mode in which the rate distortion cost is minimized, among the plurality of the prepared region division modes.
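The selection in step S111 can be sketched as follows. The text does not state how the rate distortion cost is computed; the sketch assumes the commonly used Lagrangian form J = D + λ·R, and the function names, distortion values, bit counts, and λ are hypothetical.

```python
def rd_cost(distortion, bits, lam):
    """Rate distortion cost J = D + lambda * R (assumed Lagrangian form)."""
    return distortion + lam * bits

def select_mode(results, lam):
    """Pick the region division mode number with the smallest stored cost.

    results maps a mode number to (distortion, bits) measured in steps S103 to S110.
    """
    costs = {mode: rd_cost(d, r, lam) for mode, (d, r) in results.items()}
    best = min(costs, key=costs.get)
    return best, costs

# Hypothetical measurements for three of the prepared modes.
results = {0: (1200.0, 300), 1: (1000.0, 340), 6: (1100.0, 320)}
best, costs = select_mode(results, lam=4.0)
```

In this example, mode 1 spends more bits on filter coefficients and thresholds than mode 0 but lowers the distortion enough to give the smallest cost.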
Next, in step S112, the variable length encoding unit 16 encodes the optimal region division mode decided in step S111. Furthermore, in step S113, the variable length encoding unit 16 encodes the interpolation filter coefficients in the region division mode encoded in step S112. Moreover, in step S114, residual information (a motion vector, a DCT coefficient and the like) to be encoded is encoded in the region division mode decided in step S111.
[Region Division Mode]
Next, an example of the region division mode used in the present embodiment will be described.
In the example illustrated in
[Mode Number is 0]
Mode number 0 indicates the case in which a region in the frame is not divided and the related adaptive interpolation filter (AIF) is used.
[Mode Numbers are 1 and 2]
Mode number 1 indicates a mode in which a region is divided while focusing on an x component (MVx) of a motion vector, and the region is divided as a first region (region 1) if MVx is between the threshold values Thx1 and Thx2, and is divided as a second region (region 2) if MVx is outside the range of the threshold values Thx1 and Thx2.
Mode number 2 indicates a mode in which a region is divided while focusing on a y component (MVy) of the motion vector, and the region is divided as a first region (region 1) if MVy is between the threshold values Thy1 and Thy2, and is divided as a second region (region 2) if MVy is outside the range of the threshold values Thy1 and Thy2.
The calculation of the threshold value in step S203 will be described using the case in which the mode number is 1 in
When the mode number 1 or the mode number 2 is selected, a threshold value is encoded and is transmitted to the video decoding apparatus similarly to the interpolation filter coefficients.
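The classification for mode numbers 1 and 2 can be sketched as follows. Whether the threshold values themselves belong to region 1 is not stated; the sketch assumes an inclusive range. The function name, the motion vectors, and the threshold values are hypothetical.

```python
def region_by_mv_component(mv, th_low, th_high):
    """Region 1 if the component lies within [th_low, th_high], else region 2."""
    return 1 if th_low <= mv <= th_high else 2

# Hypothetical block motion vectors (MVx, MVy) in quarter-pixel units.
blocks = [(0, 3), (-2, 12), (9, -1), (-7, 0)]
thx1, thx2 = -4, 4  # thresholds for mode number 1 (assumed values)
thy1, thy2 = -2, 2  # thresholds for mode number 2 (assumed values)

mode1_regions = [region_by_mv_component(mvx, thx1, thx2) for mvx, _ in blocks]
mode2_regions = [region_by_mv_component(mvy, thy1, thy2) for _, mvy in blocks]
```

Because the decoder repeats the same comparison on the decoded motion vectors, only the threshold values, and not a per-block region map, need to be transmitted.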
[Mode Numbers are 3, 4, and 5]
Mode numbers 3, 4, and 5 indicate modes in which a region is divided while focusing on the direction of a motion vector.
In the case of a division mode in which the mode number is 3, as illustrated in
In the case of a division mode in which the mode number is 4, as illustrated in
In the case of a division mode in which the mode number is 5, as illustrated in
[Mode Numbers are 6 and 7]
Mode numbers 6 and 7 indicate modes in which a region is divided while focusing on a spatial coordinate.
A division mode in which the mode number is 6 is a mode in which a frame is divided into the two right and left regions, and is a mode in which a first region (region 1) is acquired when the spatial coordinate x of the block is equal to or less than Fx/2 that means half of a horizontal width of the frame, and a second region (region 2) is acquired when the spatial coordinate x of the block is larger than Fx/2 that means half of the horizontal width, as illustrated in
A division mode in which the mode number is 7 is a mode in which a frame is divided into the two upper and lower regions, and is a mode in which a first region (region 1) is acquired when the spatial coordinate y of the block is equal to or less than Fy/2 that means half of a vertical width of the frame, and a second region (region 2) is acquired when the spatial coordinate y of the block is larger than Fy/2 that means half of the vertical width, as illustrated in
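The two spatial division modes reduce to a comparison of the block coordinate with half of the frame size. A minimal sketch, with hypothetical function names and an assumed 1920×1080 frame:

```python
def region_left_right(block_x, frame_width):
    """Mode number 6: region 1 if x <= Fx/2, otherwise region 2."""
    return 1 if block_x <= frame_width / 2 else 2

def region_upper_lower(block_y, frame_height):
    """Mode number 7: region 1 if y <= Fy/2, otherwise region 2."""
    return 1 if block_y <= frame_height / 2 else 2

fx, fy = 1920, 1080  # assumed frame size
left = region_left_right(960, fx)     # x equal to Fx/2 falls in region 1
right = region_left_right(976, fx)
upper = region_upper_lower(540, fy)   # y equal to Fy/2 falls in region 1
lower = region_upper_lower(544, fy)
```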
The above are examples of the region division mode when the number of regions is 2. However, modes in which the number of regions is not 2 may be mixed into the region division modes. The following is an example of the region division mode when the number of regions is 4.
[Example when the Number of Regions is 4]
In this division mode, as illustrated in
[Video Decoding Apparatus]
In the video decoding apparatus 20, a variable length decoding unit 21 receives the encoded bit stream, and decodes quantized transform coefficients, a motion vector, an interpolation filter coefficient group and the like. Particularly, a region division mode decoding section 211 decodes a mode number indicating the region division scheme encoded by the video encoding apparatus 10. Depending on the mode number, additional information (that is, a threshold value of a motion vector or a threshold value of a spatial coordinate), other than the mode number, is also decoded.
A region determination unit 22 determines regions that are set to units, in which interpolation filter coefficients are adaptively switched, for a frame to be decoded from the motion vector or the spatial coordinate of a block according to the region division mode indicated by the mode number decoded by the region division mode decoding section 211. An interpolation filter coefficient switching unit 23 switches the interpolation filter coefficients, which are decoded by the variable length decoding unit 21, for each region determined by the region determination unit 22.
A reference image interpolation section 241 in a predictive signal generation unit 24 applies an interpolation filter based on the interpolation filter coefficients, which are received from the interpolation filter coefficient switching unit 23, to a decoded reference image stored in a reference image memory 26, and restores decimal precision pixels of the reference image. The predictive signal generation unit 24 generates a predictive signal of blocks to be decoded from the reference image for which the restoration of the decimal precision pixels has been performed.
A predictive decoding unit 25 performs inverse quantization, inverse orthogonal transform and the like on the quantized coefficients decoded by the variable length decoding unit 21, generates a decoded signal by adding the predictive residual signal calculated by this process to the predictive signal generated by the predictive signal generation unit 24, and outputs the decoded signal as a decoded image. The decoded signal generated by the predictive decoding unit 25 is stored in the reference image memory 26 for subsequent predictive decoding.
[Process Flow of Video Decoding Apparatus]
First, in step S601, the variable length decoding unit 21 acquires frame header information from an input bit stream. Next, in step S602, the variable length decoding unit 21 decodes the region division mode (a mode number) required to determine how interpolation filter coefficients are switched within a frame. Any additional information required by the mode number is also decoded in step S602. Next, in step S603, the variable length decoding unit 21 decodes the interpolation filter coefficients required for interpolation of decimal precision pixels of a reference image, and acquires an interpolation filter coefficient group for each region. In step S604, the variable length decoding unit 21 decodes various types of encoding information, such as a motion vector (MV).
Next, in step S605, the region determination unit 22 determines a region in units of blocks according to the definition of the region division mode acquired in step S602, and acquires a region number.
Next, in step S606, the interpolation filter coefficient switching unit 23 selects a set of optimal interpolation filter coefficients from the interpolation filter coefficient group acquired in step S603, based on the region number acquired in step S605, and notifies the reference image interpolation section 241 of the selected interpolation filter coefficients. The reference image interpolation section 241 restores decimal precision pixels of a reference image using an interpolation filter based on the notified interpolation filter coefficients. After the decimal precision pixels are restored, the predictive signal generation unit 24 generates a predictive signal of a block to be decoded using the motion vector decoded in step S604.
In step S607, the variable length decoding unit 21 decodes a predictive residual signal of the block to be decoded from the input bit stream.
Next, in step S608, the predictive decoding unit 25 generates a decoded signal by adding the predictive signal acquired in step S606 to the predictive residual signal acquired in step S607. The generated decoded signal is output as a decoded image and is stored in the reference image memory 26.
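The addition performed in step S608 can be illustrated with the following minimal sketch. The function name `reconstruct_block` is hypothetical, and clipping the sum to the valid sample range is an assumption based on common codec practice rather than something stated in this description.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add a predictive residual block to a predictive signal block.

    prediction, residual : equally sized 2-D lists of integers
    bit_depth            : assumed sample bit depth (8 by default)
    """
    max_val = (1 << bit_depth) - 1
    return [
        # Assumption: the sum is clipped to [0, max_val] before storage.
        [min(max(p + r, 0), max_val) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]
```

For example, under these assumptions a prediction of `[[100, 250], [5, 128]]` and a residual of `[[3, 10], [-9, 0]]` would give `[[103, 255], [0, 128]]`, with the second and third samples clipped to the 8-bit range.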
Steps S601 to S608 are repeated until decoding of all frames is completed, and when the decoding of all frames is completed, the procedure is completed (step S609).
The aforementioned video encoding and decoding processes may also be realized by a computer and a software program, and the program may be recorded on a computer-readable recording medium or provided through a network.
While the embodiments of the present invention have been described above with reference to the accompanying drawings, detailed configurations are not limited to the embodiments, and designs (addition, omission, replacement, and other modifications of the configuration) without departing from the scope and spirit of the present invention are also included. The present invention is not limited by the above description, and is limited only by the appended claims.
The present invention can be applied to video encoding and decoding methods and to video encoding and decoding apparatuses that have a function of changing a set of interpolation filter coefficients within a frame. It makes it possible to select an optimal region division scheme in units of frames or slices, and to switch interpolation filter coefficients in consideration of the spatiotemporal locality of an image. Consequently, it is possible to improve the coding efficiency through reduction of prediction error energy.
Number | Date | Country | Kind |
---|---|---|---|
2010-180814 | Aug 2010 | JP | national |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2011/067963 | 8/5/2011 | WO | 00 | 2/7/2013 |