1. Field of the Invention
The present invention relates to an image encoding device for and an image encoding method of compression-encoding and transmitting an image, and an image decoding device for and an image decoding method of decoding encoded data transmitted by the image encoding device to reconstruct an image.
2. Background of the Invention
Conventionally, international standard video encoding methods such as MPEG and ITU-T H.26x divide an input video frame into macro blocks, each of which is a 16×16 pixel block, perform a motion-compensated prediction on each macro block, and then compress information by performing orthogonal transformation and quantization on a prediction error signal in units of a block.
A problem is, however, that as the compression ratio becomes high, the quality of the prediction reference image used for the motion-compensated prediction degrades, which reduces the compression efficiency.
To solve this problem, an encoding method such as MPEG-4 AVC/H.264 (refer to nonpatent reference 1) attempts to remove the block distortion, which occurs in a prediction reference image with quantization of orthogonal transformation coefficients, by performing a deblocking filter process within the loop.
In this image encoding device, when receiving an image signal which is a target to be encoded, a block dividing unit 101 divides the image signal into macro blocks, and outputs an image signal in units of a macro block to a predicting unit 102 as a split image signal.
When receiving the split image signal from the block dividing unit 101, the predicting unit 102 calculates a prediction error signal by predicting an image signal of each color component in each macro block within the frame or between frames.
Particularly, when carrying out a motion-compensated prediction between frames, the predicting unit searches for a motion vector in units of either a macro block itself or each of subblocks into which each macro block is more finely divided.
The predicting unit then performs a motion-compensated prediction on a reference image signal stored in a memory 107 by using the motion vector to generate a motion-compensated prediction image, and determines the difference between a prediction signal showing the motion-compensated prediction image and the split image signal to calculate a prediction error signal.
The predicting unit 102 also outputs parameters for prediction signal generation which the predicting unit has determined when acquiring the prediction signal to a variable length encoding unit 108.
For example, the parameters for prediction signal generation include pieces of information such as an intra prediction mode showing how to perform a space prediction within each frame, and a motion vector showing an amount of motion between frames.
When receiving the prediction error signal from the predicting unit 102, a compressing unit 103 performs a DCT (discrete cosine transform) process on the prediction error signal to remove a signal correlation from this prediction error signal, and then quantizes the result to acquire compressed data.
When receiving the compressed data from the compressing unit 103, a local decoding unit 104 carries out inverse quantization of the compressed data and then performs an inverse DCT process on the compressed data inverse-quantized thereby to calculate a prediction error signal corresponding to the prediction error signal outputted from the predicting unit 102.
When receiving the prediction error signal from the local decoding unit 104, an adder 105 adds the prediction error signal and the prediction signal outputted from the predicting unit 102 to generate a local decoded image.
A loop filter 106 removes a block distortion superimposed onto the local decoded image signal showing the local decoded image generated by the adder 105, and stores the distortion-removed local decoded image signal in the memory 107 as the reference image signal.
When receiving the compressed data from the compressing unit 103, the variable length encoding unit 108 entropy-encodes the compressed data to output a bit stream which is the encoded result.
When outputting the bit stream, the variable length encoding unit 108 multiplexes the parameters for prediction signal generation outputted from the predicting unit 102 into the bit stream and outputs this bit stream.
In accordance with the method disclosed in nonpatent reference 1, the loop filter 106 determines the smoothing intensity according to information including the quantization resolution, the encoding mode, the variation degree of motion vector, etc. for pixels in the vicinity of a block border of DCT to provide a reduction in the distortion occurring in the block border.
As a result, the quality of the reference image signal can be improved, and the efficiency of the motion-compensated prediction in subsequent encoding processes can be improved.
A problem with the method disclosed in nonpatent reference 1 is, however, that higher frequency components of the signal are lost as the compression rate at which the signal is encoded increases, and therefore the entire screen is smoothed too much and the encoded video becomes blurred.
In order to solve this problem, nonpatent reference 2 discloses a technique of applying a Wiener filter as the loop filter 106, and forming this loop filter 106 in such a way that a squared error distortion between an image signal to be encoded, which is an original image signal, and a reference image signal corresponding to this image signal is minimized.
In
More specifically, the signal s′ is the one in which an encoding distortion (noise) e is superimposed onto the signal s.
A Wiener filter is defined as a filter which is applied to the signal s′ in such a way as to minimize this encoding distortion (noise) e with a squared error distortion criterion. Typically, filter coefficients w can be determined by using the following equation (1) from both an autocorrelation matrix Rs′s′ of the signal s′, and a cross correlation matrix Rss′ between the signals s and s′. The size of the matrices Rs′s′ and Rss′ corresponds to the number of taps of the determined filter.
$$ w = R_{s's'}^{-1} \cdot R_{ss'} \tag{1} $$
By applying the Wiener filter having the filter coefficients w, a signal s hat whose quality has been improved is acquired as a signal corresponding to the reference image signal (the symbol "^" attached to an alphabetical letter is written as "hat" because this application is an electronic patent application in Japan). The image encoding device disclosed in nonpatent reference 2 determines the filter coefficients w for each of two or more different numbers of taps for the whole of each frame of the image which is the target to be encoded. Using the rate distortion criterion, the device then determines the filter whose number of taps optimizes the amount of code of the filter coefficients w and the distortion (e′=s hat−s) which is calculated after the filtering process is implemented. The device further divides the signal s′ into a plurality of blocks each having a certain size, selects for each block whether or not to apply the Wiener filter having the optimal number of taps determined above, and transmits information about filter ON/OFF for each block.
As a result, the additional amount of code required to perform the Wiener filter process can be reduced, and the quality of the prediction image can be improved.
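As a non-limiting illustration of equation (1), the following Python sketch designs a one-dimensional Wiener filter from a signal s and a noisy version s′. The function names, the synthetic test signal, the 5-tap length, and the use of NumPy are assumptions made only for this example; they are not taken from nonpatent reference 2.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def design_wiener_1d(s, s_dec, taps):
    """Solve w = R_{s's'}^{-1} R_{ss'} for an odd number of taps (hypothetical helper)."""
    half = taps // 2
    # Each row holds the `taps` decoded samples centred on one output position.
    windows = sliding_window_view(s_dec, taps)
    target = s[half:len(s) - half]
    r_auto = windows.T @ windows    # autocorrelation matrix R_{s's'} (taps x taps)
    r_cross = windows.T @ target    # cross correlation R_{ss'} (taps entries)
    return np.linalg.solve(r_auto, r_cross)


def apply_filter_1d(s_dec, w):
    """Filter the decoded signal; only positions with a full window are returned."""
    return sliding_window_view(s_dec, len(w)) @ w


rng = np.random.default_rng(0)
s = np.cumsum(rng.normal(size=512))            # synthetic "original" signal s
s_dec = s + rng.normal(scale=0.5, size=512)    # s' = s + encoding distortion e
w = design_wiener_1d(s, s_dec, taps=5)
s_hat = apply_filter_1d(s_dec, w)

half = len(w) // 2
mse_before = np.mean((s_dec[half:-half] - s[half:-half]) ** 2)
mse_after = np.mean((s_hat - s[half:-half]) ** 2)
```

Because the filter which leaves s′ unchanged is one of the candidate solutions, the squared error after filtering cannot exceed the squared error before filtering on the samples used for the design.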
Because the conventional image encoding device is constructed as above, a single Wiener filter is designed for the whole of the frame which is the target to be encoded, and information showing whether or not to apply the Wiener filter process is provided for each of the blocks which construct the frame. A problem is, however, that because the same Wiener filter is applied to every block of the frame, the Wiener filter is not always an optimal filter for each block, and the image quality cannot be improved sufficiently.
The present invention is made in order to solve the above-mentioned problem, and it is therefore an object of the present invention to provide an image encoding device, an image decoding device, an image encoding method, and an image decoding method which can improve the image quality more accurately.
In accordance with an embodiment of the present invention, there is provided an image decoding device for generating a decoded image. The decoding device includes: a variable length decoder to variable-length-decode an inputted encoded bit stream to obtain a parameter for prediction signal generation, a compressed difference image, and filters; and a filtering processor to carry out a filtering process on a decoded image acquired by adding a prediction image and a difference image, wherein the prediction image is generated by using the parameter for prediction signal generation, and the difference image is acquired by decoding the compressed difference image, wherein the variable length decoder variable-length-decodes class identification information for a block in the decoded image, and wherein the filtering processor refers to the class identification information for the block to determine a class for the block, and carries out a filtering process on the block based on the determined class and the filters.
In accordance with another embodiment of the present invention, there is provided an image decoding method for generating a decoded image, the decoded image being generated by adding a prediction image and a difference image, the method comprising the steps of: variable-length-decoding an inputted encoded bit stream to obtain a parameter for prediction signal generation, a compressed difference image, filters and class identification information for a block in the decoded image, the parameter being used for generating the prediction image, the compressed difference image being used for generating the difference image; determining a class for the block according to the class identification information; and carrying out a filtering process on the block based on the determined class and the filters.
The filtering processor in accordance with the present invention includes a region classifying unit and a filter designing and processing unit. The region classifying unit extracts a feature quantity of each of the regions which construct a local decoded image acquired by the local decoding unit, and classifies each of the regions into a class to which the region belongs according to the feature quantity. The filter designing and processing unit generates, for each class to which one or more regions among the regions which construct the local decoded image belong, a filter which minimizes an error occurring between an inputted image and the local decoded image in each of the one or more regions belonging to the class, and compensates for a distortion superimposed onto the one or more regions by using the filter. As a result, there is provided an advantage of being able to improve the image quality more accurately.
Hereafter, in order to explain this invention in greater detail, the preferred embodiments of the present invention will be described with reference to the accompanying drawings.
When receiving the split image signal from the block dividing unit 1, the predicting unit 2 performs a predicting process on the split image signal within the frame or between frames to generate a prediction signal.
Particularly, when carrying out a motion-compensated prediction between frames, the predicting unit detects a motion vector in units of a macro block or each of subblocks into which a macro block is more finely divided from both the split image signal and a reference image signal showing a reference image stored in a memory 7 to generate a prediction signal showing a prediction image from both the motion vector and the reference image signal.
After generating the prediction signal, the predicting unit then carries out a process of calculating a prediction error signal which is the difference between the split image signal and the prediction signal.
Furthermore, when generating the prediction signal, the predicting unit 2 determines parameters for prediction signal generation, and outputs the parameters for prediction signal generation to a variable length encoding part 8.
For example, the parameters for prediction signal generation include pieces of information such as an intra prediction mode showing how to perform a spatial prediction within the frame, and a motion vector showing an amount of motion between frames.
A prediction processing unit is comprised of the block dividing unit 1 and the predicting unit 2.
A compressing unit 3 carries out a DCT (discrete cosine transform) process on the prediction error signal calculated by the predicting unit 2 to calculate DCT coefficients, quantizes the DCT coefficients, and outputs compressed data which are the DCT coefficients quantized thereby to a local decoding part 4 and the variable length encoding part 8. The compressing unit 3 constructs a difference image compression unit.
The local decoding part 4 carries out a process of carrying out inverse quantization of the compressed data outputted from the compressing unit 3 and performing an inverse DCT process on the compressed data inverse-quantized thereby to calculate a prediction error signal corresponding to the prediction error signal outputted from the predicting unit 2.
An adder 5 carries out a process of adding the prediction error signal calculated by the local decoding part 4 and the prediction signal generated by the predicting unit 2 to generate a local decoded image signal showing a local decoded image.
A local decoding unit is comprised of the local decoding part 4 and the adder 5.
A loop filter 6 carries out a filtering process of compensating for a distortion superimposed onto the local decoded image signal generated by the adder 5, and outputs the local decoded image signal filtered thereby to the memory 7 as the reference image signal. The loop filter 6 also outputs information about the filter which the loop filter uses when carrying out the filtering process to the variable length encoding part 8. The loop filter 6 constructs a filtering unit.
The memory 7 is a recording medium for storing the reference image signal outputted from the loop filter 6.
The variable length encoding part 8 carries out a process of entropy-encoding the compressed data outputted from the compressing unit 3, the filter information outputted from the loop filter 6, and the parameters for prediction signal generation outputted from the predicting unit 2 to generate a bit stream showing these encoded results. The variable length encoding part 8 constructs a variable length encoding unit.
In
A region classifying unit 12 carries out a process of extracting a feature quantity of each of the regions which construct a local decoded image shown by one frame of the local decoded image signal stored in the frame memory 11 to classify each of the regions into the class to which the region belongs according to the feature quantity.
A filter designing and processing unit 13 carries out a process of generating, for each class to which one or more regions included in the regions which construct the local decoded image belong, a Wiener filter which minimizes an error occurring between the image signal which is the target to be encoded and the local decoded image signal in each of the one or more regions which belong to the class, and using the Wiener filter to compensate for the distortion superimposed onto the region.
The filter designing and processing unit 13 also carries out a process of outputting filter information about the Wiener filter to the variable length encoding part 8.
Next, the operation of the image encoding device will be explained.
When receiving an image signal which is a target to be encoded, the block dividing unit 1 divides the image signal into macro blocks, and outputs an image signal in units of a macro block to the predicting unit 2 as a split image signal.
When receiving the split image signal from the block dividing unit 1, the predicting unit 2 detects parameters for prediction signal generation which the predicting unit uses to perform a predicting process on the split image signal within the frame or between frames. Then, the predicting unit generates a prediction signal showing a prediction image using the parameters for prediction signal generation.
Particularly, the predicting unit detects a motion vector which is a parameter for prediction signal generation used for performing a predicting process between frames from the split image signal and the reference image signal stored in the memory 7.
After detecting the motion vector, the predicting unit 2 then generates the prediction signal by performing a motion-compensated prediction on the reference image signal by using the motion vector.
After generating the prediction signal showing the prediction image, the predicting unit 2 calculates a prediction error signal which is the difference between the prediction signal and the split image signal, and outputs the prediction error signal to the compressing unit 3.
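The motion vector detection and the motion-compensated prediction explained above can be sketched as follows. This is only an illustrative Python example: the exhaustive search, the sum of absolute differences (SAD) as the matching criterion, the search range, and all function names are assumptions, because the description does not specify how the predicting unit 2 searches for the motion vector.

```python
import numpy as np


def full_search(block, ref, top, left, search_range):
    """Return the (dy, dx) minimising the SAD within +/- search_range (assumed criterion)."""
    n = block.shape[0]
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref.shape[0] or x + n > ref.shape[1]:
                continue  # candidate block falls outside the reference image
            cand = ref[y:y + n, x:x + n]
            sad = np.abs(block.astype(np.int64) - cand.astype(np.int64)).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv


def motion_compensate(ref, top, left, mv, n):
    """Cut the prediction block out of the reference image at the displaced position."""
    dy, dx = mv
    return ref[top + dy:top + dy + n, left + dx:left + dx + n]


rng = np.random.default_rng(1)
ref = rng.integers(0, 256, size=(64, 64))   # stands in for the reference image
top, left, n = 24, 24, 16                   # position and size of one macro block
block = ref[top + 3:top + 3 + n, left - 2:left - 2 + n].copy()  # content moved by (3, -2)

mv = full_search(block, ref, top, left, search_range=4)
prediction = motion_compensate(ref, top, left, mv, n)
prediction_error = block - prediction       # signal passed on to the compressing unit
```

In this synthetic case the block is an exact copy of a displaced part of the reference image, so the search recovers the displacement and the prediction error is zero.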
When generating the prediction signal, the predicting unit 2 also determines the parameters for prediction signal generation and outputs the parameters for prediction signal generation to the variable length encoding part 8.
For example, the parameters for prediction signal generation include pieces of information such as an intra prediction mode showing how to perform a spatial prediction within the frame, and a motion vector showing an amount of motion between frames.
When receiving the prediction error signal from the predicting unit 2, the compressing unit 3 calculates DCT coefficients by performing a DCT (discrete cosine transform) process on the prediction error signal, and then quantizes the DCT coefficients.
The compressing unit 3 then outputs compressed data which are the DCT coefficients quantized thereby to the local decoding part 4 and the variable length encoding part 8.
When receiving the compressed data from the compressing unit 3, the local decoding part 4 carries out inverse quantization of the compressed data and then carries out an inverse DCT process on the compressed data inverse-quantized thereby to calculate a prediction error signal corresponding to the prediction error signal outputted from the predicting unit 2.
After the local decoding part 4 calculates the prediction error signal, the adder 5 adds the prediction error signal and the prediction signal generated by the predicting unit 2 to generate a local decoded image signal showing a local decoded image.
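The chain of the compressing unit 3, the local decoding part 4, and the adder 5 can be illustrated with the following Python sketch. The 8×8 block size, the uniform quantization with a single step size, and the function names are assumptions chosen for the example; the description only states that a DCT process and quantization are performed.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def compress(pred_error, step):
    """DCT followed by uniform quantization (assumed quantizer)."""
    c = dct_matrix(pred_error.shape[0])
    coeffs = c @ pred_error @ c.T
    return np.round(coeffs / step).astype(np.int64)


def local_decode(levels, step):
    """Inverse quantization followed by the inverse DCT."""
    c = dct_matrix(levels.shape[0])
    return c.T @ (levels * step) @ c


rng = np.random.default_rng(2)
prediction = rng.integers(0, 256, size=(8, 8)).astype(float)
original = prediction + rng.normal(scale=6.0, size=(8, 8))
pred_error = original - prediction           # output of the predicting unit

step = 4.0
levels = compress(pred_error, step)          # compressed data
decoded_error = local_decode(levels, step)   # decoded prediction error signal
local_decoded = prediction + decoded_error   # output of the adder
```

With an orthonormal transform, each coefficient is off by at most half a step after quantization, so the decoded prediction error differs from the original prediction error only by a bounded quantization distortion.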
After the adder 5 generates the local decoded image signal, the loop filter 6 carries out a filtering process of compensating for the distortion superimposed onto the local decoded image signal, and stores the local decoded image signal filtered thereby in the memory 7 as the reference image signal.
The loop filter 6 also outputs information about the filter which the loop filter uses when carrying out the filtering process to the variable length encoding part 8.
The variable length encoding part 8 carries out the process of entropy-encoding the compressed data outputted from the compressing unit 3, the filter information outputted from the loop filter 6, and the parameters for prediction signal generation outputted from the predicting unit 2 to generate a bit stream showing these encoded results.
At this time, although the variable length encoding unit also entropy-encodes the parameters for prediction signal generation, the image encoding device can alternatively multiplex the parameters for prediction signal generation into the bit stream, which the image encoding device generates, and output this bit stream without entropy-encoding the parameters for prediction signal generation.
Hereafter, the process performed by the loop filter 6 will be explained concretely.
First, the frame memory 11 of the loop filter 6 stores only one frame of the local decoded image signal generated by the adder 5.
The region classifying unit 12 extracts a feature quantity of each of the regions which construct the local decoded image shown by the single frame of the local decoded image signal stored in the frame memory 11, and classifies each of the regions into the class to which the region belongs according to the feature quantity (step ST1).
For example, for each region (each block having an arbitrary size (M×M pixels)), the region classifying unit extracts a variance of the local decoded image signal, the DCT coefficients, the motion vector, the quantization parameter of the DCT coefficients, or the like in the region as the feature quantity, and carries out the class classification on the basis of these pieces of information. In this case, M is an integer equal to or larger than 1.
For example, when the variance of the local decoded image signal in the region is used as the feature quantity in a case in which each of the regions is classified to one of class 1 to class N (N is an integer equal to or larger than 1), (N−1) thresholds are prepared beforehand and the variance of the local decoded image signal is compared with each of the (N−1) thresholds (th1<th2< . . . <thN-1), and the class to which the region belongs is identified.
For example, when the variance of the local decoded image signal is equal to or larger than thN-3 and is smaller than thN-2, the region is classified to the class N−2. Furthermore, when the variance of the local decoded image signal is equal to or larger than th2 and is smaller than th3, the region is classified to the class 3.
In this case, although the example in which the (N−1) thresholds are prepared beforehand is shown, these thresholds can be changed dynamically for each sequence or each frame.
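The threshold comparison described above can be sketched in Python as follows. The function name and the threshold values are hypothetical; the rule itself (class k when the variance is equal to or larger than th(k−1) and smaller than th(k)) follows the examples given for the class N−2 and the class 3.

```python
from bisect import bisect_right

import numpy as np


def classify_by_variance(region, thresholds):
    """Return the class (1..N) for ascending thresholds th1 < ... < th(N-1)."""
    variance = float(np.var(region))
    # bisect_right counts the thresholds that are <= variance, so a variance
    # equal to a threshold falls into the higher class.
    return bisect_right(thresholds, variance) + 1


thresholds = [10.0, 100.0, 1000.0]           # N - 1 = 3 thresholds, N = 4 classes
flat = np.full((4, 4), 50.0)                 # variance 0
textured = np.tile([0.0, 20.0], (4, 2))      # variance exactly 100 (= th2)
busy = np.tile([0.0, 200.0], (4, 2))         # variance 10000

classes = [classify_by_variance(r, thresholds) for r in (flat, textured, busy)]
```

Here the region whose variance equals th2 is assigned to the class 3, in agreement with the rule that the class 3 covers variances equal to or larger than th2 and smaller than th3.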
For example, when using the motion vector in the region as the feature quantity, the region classifying unit calculates a mean vector which is the mean of motion vectors or a median vector which is the median value of motion vectors, and identifies the class to which the region belongs according to the magnitude or direction of the vector.
In this case, the mean vector has components (x and y components) each of which is the mean value of the corresponding components of the motion vectors.
In contrast, the median vector has components (x and y components) each of which is the median value of the corresponding components of the motion vectors.
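The mean vector and the median vector can be computed component by component as in the following Python sketch. The mapping from the magnitude of the vector to a class by means of thresholds is an assumption introduced for illustration, since the description states only that the class is identified according to the magnitude or direction of the vector.

```python
from bisect import bisect_right

import numpy as np


def mean_vector(motion_vectors):
    """Mean of the x components and mean of the y components."""
    return np.mean(motion_vectors, axis=0)


def median_vector(motion_vectors):
    """Median of the x components and median of the y components."""
    return np.median(motion_vectors, axis=0)


def classify_by_magnitude(vector, thresholds):
    """Hypothetical rule: compare the vector length with ascending thresholds."""
    magnitude = float(np.hypot(vector[0], vector[1]))
    return bisect_right(thresholds, magnitude) + 1


mvs = np.array([[1, 0], [2, 4], [9, -1]])    # (x, y) motion vectors in one region
mean_mv = mean_vector(mvs)                   # x: (1+2+9)/3 = 4, y: (0+4-1)/3 = 1
median_mv = median_vector(mvs)               # x: 2, y: 0
region_class = classify_by_magnitude(mean_mv, [1.0, 4.0, 8.0])
```

The example also shows that the median vector need not coincide with any single motion vector in the region, because each component is taken independently.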
When the region classifying unit 12 classifies each of the regions into one of the classes 1 to N, the filter designing and processing unit 13 generates, for each class to which one or more regions included in the regions which construct the local decoded image belong, a Wiener filter which minimizes an error occurring between the image signal which is the target to be encoded and the local decoded image signal in each of the one or more regions which belong to the class (steps ST2 to ST8).
For example, in a case in which the local decoded image consists of four regions (a region A, a region B, a region C, and a region D) as shown in
The filter designing and processing unit further generates a Wiener filter which minimizes the error occurring between the image signal which is the target to be encoded and the local decoded image signal in the region B belonging to the class 5, and also generates a Wiener filter which minimizes the error occurring between the image signal which is the target to be encoded and the local decoded image signal in the region D belonging to the class 6.
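The design of one Wiener filter per class can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a fixed 3×3 filter, a least-squares solution computed with NumPy, a 16×16 image split into four 8×8 regions laid out as A (top left), B (top right), C (bottom left), and D (bottom right), and the regions A and C assumed to share one class that is labelled 3 purely for the example. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def design_class_filters(original, decoded, class_map, k=3):
    """Return {class: k x k filter}, pooling every pixel of a class (odd k >= 3)."""
    half = k // 2
    windows = sliding_window_view(decoded, (k, k)).reshape(-1, k * k)
    target = original[half:-half, half:-half].reshape(-1)
    labels = class_map[half:-half, half:-half].reshape(-1)
    filters = {}
    for c in np.unique(labels):
        sel = labels == c
        # Least squares over all pixels of the class minimises the squared error.
        w, *_ = np.linalg.lstsq(windows[sel], target[sel], rcond=None)
        filters[int(c)] = w.reshape(k, k)
    return filters


def apply_class_filters(decoded, class_map, filters):
    """Filter each interior pixel with the filter of its class; borders stay as they are."""
    k = next(iter(filters.values())).shape[0]
    half = k // 2
    windows = sliding_window_view(decoded, (k, k))
    out = decoded.astype(float).copy()
    inner = out[half:-half, half:-half]          # view into `out`
    labels = class_map[half:-half, half:-half]
    for c, w in filters.items():
        response = np.einsum("ijkl,kl->ij", windows, w)
        inner[labels == c] = response[labels == c]
    return out


rng = np.random.default_rng(3)
original = rng.normal(size=(16, 16)).cumsum(axis=0).cumsum(axis=1)
decoded = original + rng.normal(scale=1.0, size=(16, 16))
class_map = np.empty((16, 16), dtype=int)
class_map[:8, :8] = 3    # region A (class assumed for the example)
class_map[:8, 8:] = 5    # region B
class_map[8:, :8] = 3    # region C (class assumed for the example)
class_map[8:, 8:] = 6    # region D

filters = design_class_filters(original, decoded, class_map)
filtered = apply_class_filters(decoded, class_map, filters)
```

Pooling the pixels of all regions in a class means that the regions sharing a class also share one filter, so only one set of coefficients per class has to be transmitted.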
For example, in a case of designing a filter with a variable number of taps when generating a Wiener filter which minimizes the error, the filter designing and processing unit 13 calculates a cost as will be shown below for each different number of taps, and then determines the number of taps and the coefficient values of the filter which minimize the cost.
Cost=D+λ·R (2)
where D is the sum of squared errors between the image signal which is the target to be encoded and the filtered local decoded image signal in the region to which the target filter is applied, λ is a constant, and R is the amount of code generated in the loop filter 6.
Although in this case the cost is given by the equation (2), this case is only an example. For example, only the sum of squared errors D can be defined as the cost.
Furthermore, another evaluated value such as the sum of absolute error values can be used instead of the sum of squared errors D.
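The selection of the number of taps according to equation (2) can be illustrated with the following Python sketch. The candidate tap counts and the values of D and R are invented numbers used only to show how the choice depends on λ; the function names are hypothetical.

```python
def rd_cost(distortion, rate, lam):
    """Cost = D + lambda * R, as in equation (2)."""
    return distortion + lam * rate


def select_taps(candidates, lam):
    """candidates maps a number of taps to (D, R); return the minimum-cost number of taps."""
    return min(candidates, key=lambda taps: rd_cost(*candidates[taps], lam))


# Illustrative values only: more taps lower the distortion D but raise the rate R.
candidates = {3: (1200.0, 90), 5: (1000.0, 250), 7: (980.0, 490)}

best_balanced = select_taps(candidates, lam=1.0)    # costs 1290, 1250, 1470
best_distortion = select_taps(candidates, lam=0.0)  # only D counts
best_rate_heavy = select_taps(candidates, lam=5.0)  # costs 1650, 2250, 3430
```

Setting λ to 0 corresponds to the variation mentioned above in which only the sum of squared errors D is defined as the cost.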
After generating a Wiener filter for each class to which one or more regions belong, the filter designing and processing unit 13 determines whether or not each of the blocks which construct the local decoded image (e.g. each of local regions which is smaller than each of the regions A to D which constructs the local decoded image) is a block on which the filter designing and processing unit should perform the filtering process (steps ST9 to ST16).
More specifically, for each of the blocks which construct the local decoded image, the filter designing and processing unit 13 compares the error occurring between the image signal which is the target to be encoded and the local decoded image signal in the block before the filtering process with the error after the filtering process.
For example, in a case in which the local decoded image consists of 16 blocks (K) (K=1, 2, . . . , and 16), as shown in
A block 1, a block 2, a block 5, and a block 6 shown in
Although the filter designing and processing unit compares the sum of squared errors between before and after the filtering process, the filter designing and processing unit can alternatively compare either the cost (D+λ·R) shown by the equation (2) or the sum of absolute error values between before and after the filtering process.
When the sum of squared errors acquired after the filtering process is smaller than the sum of squared errors acquired before the filtering process, the filter designing and processing unit 13 determines that the block (K) is a block which is a target for filtering.
In contrast, when the sum of squared errors acquired after the filtering process is equal to or larger than the sum of squared errors acquired before the filtering process, the filter designing and processing unit determines that the block (K) is a block which is not a target for filtering.
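The per-block decision described above can be sketched in Python as follows. The function name, the 16×16 image, the 4×4 block size, and the synthetic signals are assumptions for the example; the decision rule (a block is a target for filtering only when the sum of squared errors becomes strictly smaller) is the one stated above.

```python
import numpy as np


def block_filter_flags(original, decoded, filtered, block_size):
    """Return a boolean array with True for each block that is a target for filtering."""
    rows = original.shape[0] // block_size
    cols = original.shape[1] // block_size
    flags = np.zeros((rows, cols), dtype=bool)
    for by in range(rows):
        for bx in range(cols):
            ys = slice(by * block_size, (by + 1) * block_size)
            xs = slice(bx * block_size, (bx + 1) * block_size)
            sse_before = np.sum((decoded[ys, xs] - original[ys, xs]) ** 2)
            sse_after = np.sum((filtered[ys, xs] - original[ys, xs]) ** 2)
            # Equal or larger error after filtering means "not a target".
            flags[by, bx] = sse_after < sse_before
    return flags


original = np.zeros((16, 16))
decoded = np.ones((16, 16))      # error of 1 at every pixel before filtering
filtered = np.empty((16, 16))
filtered[:8] = 0.5               # filtering helps in the upper half
filtered[8:12] = 2.0             # filtering hurts here
filtered[12:] = 1.0              # filtering changes nothing here

flags = block_filter_flags(original, decoded, filtered, block_size=4)
```

The last row of blocks shows the tie case: the error is unchanged by filtering, so those blocks are not targets for filtering.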
The filter designing and processing unit 13 then calculates the cost of performing the minimum-cost filtering process determined in steps ST1 to ST16 and the cost of not performing the filtering process on the whole of the frame currently processed, and determines from these costs whether or not to perform the filtering process on the whole of the frame currently processed (steps ST17 to ST18).
When, in step ST18, determining to perform the filtering process on the whole of the frame, the filter designing and processing unit sets a flag (frame_filter_on_off_flag) to 1 (ON), and then performs the filtering process which causes the cost to become a minimum in the steps ST1 to ST16 and outputs the local decoded image signal on which the filter designing and processing unit has performed the filtering process to the memory 7 as the reference image signal (steps ST19 to ST20).
For example, when the region including the block (K) is the region B and the class to which the region B belongs is the class 5, the filter designing and processing unit performs the filtering process on the block (K) by using the Wiener filter of the class 5, and outputs the local decoded image signal on which the filter designing and processing unit has performed the filtering process to the memory 7 as the reference image signal.
At this time, when it is determined in steps ST1 to ST16 that the cost is minimized by carrying out the process of selecting whether or not to carry out the filtering process for each block (the flag (block_filter_on_off_flag)=1 (ON)), the filter designing and processing unit does not perform the filtering process on any block (K) on which it has determined not to perform the filtering process, and outputs the yet-to-be-filtered local decoded image signal for that block (K) to the memory 7 as the reference image signal, just as it is. In contrast, when it is determined in steps ST1 to ST16 that the cost is minimized without carrying out the process of selecting whether or not to carry out the filtering process for each block (the flag (block_filter_on_off_flag)=0 (OFF)), the filter designing and processing unit performs the filtering process on all the local decoded image signals in the frame by using, for each local decoded image signal, the Wiener filter of the class into which the region to which the local decoded image signal belongs is classified, and outputs the filtered local decoded image signal to the memory 7 as the reference image signal.
In contrast, when, in step ST18, determining not to perform the filtering process on the whole of the frame, the filter designing and processing unit sets the flag (frame_filter_on_off_flag) to 0 (OFF), and outputs the yet-to-be-filtered local decoded image signal to the memory 7 as the reference image signal, just as it is (steps ST21 to ST22).
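How the two flags determine the reference image signal can be sketched in Python as follows. The function names and the small synthetic arrays are assumptions, the image size is assumed to be a multiple of the block size, and the handling of a tie between the two frame costs is likewise an assumption because the description does not state it.

```python
import numpy as np


def decide_frame_flag(cost_with_filter, cost_without_filter):
    """Return 1 (ON) when filtering lowers the cost; a tie gives 0 (assumed)."""
    return 1 if cost_with_filter < cost_without_filter else 0


def build_reference(decoded, filtered, block_flags, block_size,
                    frame_filter_on_off_flag, block_filter_on_off_flag):
    """Assemble the reference image signal that is stored in the memory."""
    if not frame_filter_on_off_flag:
        return decoded.copy()        # whole frame is stored unfiltered
    if not block_filter_on_off_flag:
        return filtered.copy()       # every pixel is filtered
    # Expand the per-block flags to a per-pixel mask.
    mask = np.repeat(np.repeat(block_flags, block_size, axis=0), block_size, axis=1)
    return np.where(mask, filtered, decoded)


decoded = np.zeros((4, 4))
filtered = np.ones((4, 4))
block_flags = np.array([[True, False], [False, True]])

frame_flag = decide_frame_flag(cost_with_filter=90.0, cost_without_filter=100.0)
mixed = build_reference(decoded, filtered, block_flags, 2, frame_flag, 1)
all_filtered = build_reference(decoded, filtered, block_flags, 2, frame_flag, 0)
unfiltered = build_reference(decoded, filtered, block_flags, 2, 0, 1)
```

Blocks whose flag is OFF keep the yet-to-be-filtered samples, so the stored reference image signal can mix filtered and unfiltered blocks within one frame.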
In steps ST2 to ST22 in the flow chart, “min_cost” is a variable for storing the minimum cost, “i” is an index of the number of filter taps tap[i] and a loop counter, and “j” is an index of the block size bl_size[j] and a loop counter.
Furthermore, “min_tap_idx” is an index (i) of the number of filter taps at the time when the cost is minimized, “min_bl_size_idx” is an index (j) of the block size at the time when the cost is minimized, and “MAX” is an initial value of the minimum cost (a sufficiently large value).
tap[i] is a sequence in which N1 (N1>=1) different numbers of filter taps, which are determined beforehand and each of which can be selected, are stored.
bl_size[j] is a sequence in which N2 (N2>=1) different block sizes (bl_size[j]×bl_size[j] pixels), which are determined beforehand and each of which can be selected, are stored.
block_filter_on_off_flag is the flag showing whether or not to carry out the process of selecting whether or not to carry out the filtering process for each block in the frame currently processed.
frame_filter_on_off_flag is the flag showing whether or not to carry out the filtering process for the frame currently processed.
Step ST2 is the step of setting up initial values, and steps ST3 to ST8 are a loop for carrying out the process of selecting the number of filter taps.
Furthermore, step ST9 is a step of setting up initial values, and steps ST10 to ST16 are a loop for carrying out the process of selecting the block size and the process of determining whether or not to carry out the filtering process for each block having the selected block size.
In addition, steps ST17 to ST18 are the steps of determining whether or not to perform the filtering process on the whole of the frame currently processed, steps ST19 to ST20 are the steps of carrying out the optimal filtering process which is determined in steps ST1 to ST16 with frame_filter_on_off_flag=1 (ON), and steps ST21 to ST22 are the steps of setting frame_filter_on_off_flag to 0 (OFF) and not carrying out the filtering process for the frame currently processed.
After generating the Wiener filter and then carrying out the filtering process in the above-mentioned way, the filter designing and processing unit 13 outputs the filter information about the Wiener filter to the variable length encoding part 8.
The filter information includes the flag (frame_filter_on_off_flag) showing whether or not to carry out the filtering process for the frame currently processed.
When this flag is set to ON (shows that the filtering process is carried out), information as will be shown below is included in the filter information.
(1) The number of Wiener filters (the number of classes to each of which one or more regions belong)
(2) Information (index) about the number of taps of each Wiener filter
(3) Information about the coefficients of an actually-used Wiener filter (a Wiener filter of each class to which one or more regions belong)
(4) ON/OFF information and block size information about filters for each block
In this embodiment, the example in which the pieces of information (1) to (4) are included in the filter information is shown. The number of Wiener filters, the number of taps of each Wiener filter, and the block size for ON/OFF can be held by both the image encoding device and the image decoding device as information determined in common in the image encoding device and the image decoding device, instead of encoding and transmitting the pieces of information between them.
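As a minimal sketch, the filter information described above can be pictured as a single record. The class name `FilterInfo`, the field names other than the two flag names, and the helper `is_consistent` are hypothetical and are introduced only for illustration; the actual syntax of the encoded data is not specified here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FilterInfo:
    """Hypothetical container for the filter information of one frame."""
    frame_filter_on_off_flag: bool
    # The fields below are meaningful only when frame_filter_on_off_flag is ON.
    num_filters: int = 0                                  # (1) number of Wiener filters
    tap_index: List[int] = field(default_factory=list)    # (2) tap-count index per filter
    coefficients: List[List[List[float]]] = field(default_factory=list)  # (3) LxL per filter
    block_filter_on_off_flag: bool = False                # (4) per-block ON/OFF is signalled
    block_size: Optional[int] = None                      # (4) block size in pixels
    block_on_off: List[bool] = field(default_factory=list)  # (4) one flag per block


def is_consistent(info: FilterInfo) -> bool:
    """Check that items (1) to (4) agree with each other (illustrative rule set)."""
    if not info.frame_filter_on_off_flag:
        return True  # nothing else needs to be present
    if info.num_filters != len(info.coefficients):
        return False
    if info.num_filters != len(info.tap_index):
        return False
    if info.block_filter_on_off_flag and info.block_size is None:
        return False
    return True
```

When the items which both devices hold in common are not transmitted, the corresponding fields would simply be filled from the shared settings on the decoding side.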
Furthermore, although in the above explanation
As mentioned above, the filter information outputted from the filter designing and processing unit 13 is entropy-encoded by the variable length encoding part 8, and is transmitted to the image decoding device.
In
A predicting unit 22 carries out a process of generating a prediction signal showing a prediction image by using the parameters for prediction signal generation which the variable length decoding part 21 has variable-length-decoded. Particularly, in a case in which a motion vector is used as a parameter for prediction signal generation, the predicting unit carries out a process of generating a prediction signal from the motion vector and a reference image signal stored in a memory 26.
The predicting unit 22 constructs a prediction image generating unit.
A prediction error decoding unit 23 carries out a process of performing inverse quantization on the compressed data which the variable length decoding part 21 has variable-length-decoded, and then performing an inverse DCT process on the compressed data inverse-quantized thereby to calculate a prediction error signal corresponding to the prediction error signal outputted from the predicting unit 2 shown in
An adder 24 carries out a process of adding the prediction error signal calculated by the prediction error decoding unit 23 and the prediction signal generated by the predicting unit 22 to calculate a decoded image signal corresponding to the decoded image signal outputted from the adder 5 shown in
A decoding unit is comprised of the prediction error decoding unit 23 and the adder 24.
A loop filter 25 carries out a filtering process of compensating for a distortion superimposed onto the decoded image signal outputted from the adder 24, and then carries out a process of outputting the decoded image signal filtered thereby to outside the image decoding device and to the memory 26 as a filtered decoded image signal. The loop filter 25 constructs a filtering unit.
The memory 26 is a recording medium for storing the filtered decoded image signal outputted from the loop filter 25 as the reference image signal.
In
A region classifying unit 32 carries out a process of extracting a feature quantity of each of the regions which construct a decoded image shown by the single frame of the decoded image signal stored in the frame memory 31 to classify each of the regions into the class to which the region belongs according to the feature quantity, like the region classifying unit 12 shown in
A filter processing unit 33 carries out a process of generating a Wiener filter which is applied to the class into which each of the regions is classified by the region classifying unit 32 with reference to the filter information which the variable length decoding part 21 has variable-length-decoded to compensate for the distortion superimposed onto the region by using the Wiener filter.
Although in the example of
In this case, the image encoding device needs to perform the filtering process on each macro block independently.
Next, the operation of the image decoding device will be explained.
When receiving the bit stream from the image encoding device, the variable length decoding part 21 variable-length-decodes compressed data, filter information, and parameters for prediction signal generation which are included in the bit stream.
When receiving the parameters for prediction signal generation, the predicting unit 22 generates a prediction signal from the parameters for prediction signal generation. Particularly, when receiving a motion vector as the parameter for prediction signal generation, the predicting unit generates a prediction signal from the motion vector and the reference image signal stored in the memory 26.
When receiving the compressed data from the variable length decoding part 21, the prediction error decoding unit 23 performs inverse quantization on the compressed data and then performs an inverse DCT process on the compressed data inverse-quantized thereby to calculate a prediction error signal corresponding to the prediction error signal outputted from the predicting unit 2 shown in
After the prediction error decoding unit 23 calculates the prediction error signal, the adder 24 adds the prediction error signal and the prediction signal generated by the predicting unit 22 to calculate a decoded image signal corresponding to the local decoded image signal outputted from the adder 5 shown in
When receiving the decoded image signal from the adder 24, the loop filter 25 carries out the filtering process of compensating for the distortion superimposed onto the decoded image signal, and outputs the decoded image signal filtered thereby to outside the image decoding device as a filtered decoded image signal while storing the filtered decoded image signal in the memory 26 as the reference image signal.
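The data flow through the adder 24, the loop filter 25 and the memory 26 can be sketched as follows. This is only an illustration under stated assumptions: the function names `reconstruct` and `decode_frame` are hypothetical, the memory is modelled as a plain list, and clipping to an 8-bit sample range is an assumption which the description does not state.

```python
def reconstruct(prediction, prediction_error, max_value=255):
    """Adder 24: add the prediction error to the prediction, sample by sample.

    Clipping to [0, max_value] is an assumption made for this sketch.
    """
    return [
        [min(max(p + e, 0), max_value) for p, e in zip(p_row, e_row)]
        for p_row, e_row in zip(prediction, prediction_error)
    ]


def decode_frame(prediction, prediction_error, loop_filter, memory):
    """Reconstruct one frame, filter it, and keep it as the reference image."""
    decoded = reconstruct(prediction, prediction_error)
    filtered = loop_filter(decoded)   # loop filter 25
    memory.append(filtered)           # memory 26: reference for later prediction
    return filtered                   # also output to outside the device
```

The same filtered signal is both output and stored, so the reference image used by the predicting unit 22 matches what the image encoding device stored.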
Hereafter, the process carried out by the loop filter 25 will be explained concretely.
First, the frame memory 31 of the loop filter 25 stores only one frame of the decoded image signal outputted from the adder 24.
When the flag (frame_filter_on_off_flag) included in the filter information is set to ON (shows that the filtering process is carried out) (step ST31), the region classifying unit 32 extracts a feature quantity of each of the regions which construct the decoded image shown by the single frame of the decoded image signal stored in the frame memory 31, and classifies each of the regions into the class to which the region belongs according to the feature quantity, like the region classifying unit 12 shown in
When receiving the filter information from the variable length decoding part 21, the filter processing unit 33 generates a Wiener filter which is applied to the class to which each of the regions classified by the region classifying unit 32 belongs with reference to the filter information (step ST33).
For example, when the number of Wiener filters (the number of classes to each of which one or more regions belong) is expressed as N, the number of taps of each Wiener filter is expressed as L×L, and the coefficient values of each Wiener filter are expressed as wi11, wi12, . . . , wi1L, . . . , wiL1, wiL2, . . . , wiLL, the N Wiener filters Wi (i=1, 2, . . . , N) are shown as follows.
After generating the N Wiener filters Wi, the filter processing unit 33 compensates for the distortion superimposed onto the single frame of the decoded image signal by using these Wiener filters, and outputs the distortion-compensated decoded image signal to outside the image decoding device and to the memory 26 as the filtered decoded image signal (step ST34).
The filtered decoded image signal ŝ is expressed by the following equation (4).
$$\hat{s} = S \cdot W_{id(s)} \qquad (4)$$
A matrix S is a group of reference signals of L×L pixels including the decoded image signal s which is the target for filtering, and id(s) is the number (filter number) of the class which is determined by the region classifying unit 32 and to which the region including the signal s belongs.
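Equation (4) can be illustrated for a single pixel as the inner product of the L×L reference signals with the coefficients of the filter whose number is id(s). The following is a minimal sketch; the function name `filter_pixel`, the use of a dictionary keyed by class number, an odd L, and border handling by clamping coordinates are all assumptions made for illustration.

```python
def filter_pixel(image, y, x, filters, class_id):
    """Apply the Wiener filter of class `class_id` at pixel (y, x).

    `filters` maps a class number to an LxL list of coefficients.
    Pixels outside the image are replaced by the nearest border pixel,
    which is an assumption of this sketch.
    """
    w = filters[class_id]
    taps = len(w)          # L (assumed odd so the window is centred)
    half = taps // 2
    height, width = len(image), len(image[0])
    acc = 0.0
    for dy in range(taps):
        for dx in range(taps):
            yy = min(max(y + dy - half, 0), height - 1)
            xx = min(max(x + dx - half, 0), width - 1)
            acc += image[yy][xx] * w[dy][dx]
    return acc
```

A filter whose only non-zero coefficient is a 1 at the centre leaves the signal unchanged, which is a convenient check of the indexing.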
When performing the above-mentioned filtering process, the filter processing unit 33 refers to the flag (block_filter_on_off_flag) included in the filter information, and, when the flag (block_filter_on_off_flag) is set to 1 (ON), refers to the block size information included in the filter information and then identifies the plurality of blocks (K) which construct the decoded image, and, after that, carries out the filtering process with reference to the information included in the filter information and showing whether or not to carry out the filtering process for each block (K).
More specifically, when the flag (block_filter_on_off_flag) is set to 1 (ON), the filter processing unit 33 performs the filtering process on the decoded image signal in each block (K) for which the filtering process is to be carried out, among the blocks which construct the decoded image, by using the Wiener filter of the class to which the region including the block (K) belongs, while outputting the yet-to-be-filtered decoded image signal in each block (K) for which the filtering process is not to be carried out to outside the image decoding device and to the memory 26 as the filtered decoded image signal, just as it is.
In contrast, when the flag (block_filter_on_off_flag) is set to 0 (OFF), the filter processing unit performs the filtering process on all the decoded image signals in the frame currently processed by using the filter corresponding to the class into which each of the regions is classified by the region classifying unit 32.
When the flag (frame_filter_on_off_flag) included in the filter information is set to OFF (the filtering process is not carried out) (step ST31), the filter processing unit 33 does not perform the filtering process on the frame currently processed, and outputs each decoded image signal outputted from the adder 24 to outside the image decoding device and to the memory 26 as the filtered decoded image signal, just as it is (step ST35).
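The decision flow of the loop filter 25 described above (frame flag first, then the optional per-block flags) can be sketched as follows. The function names, the callable `class_of_pixel` standing in for the result of the region classifying unit 32, the raster-order numbering of blocks, and the clamped border handling are assumptions introduced for illustration only.

```python
def apply_filter(image, y, x, w):
    """Inner product of the LxL neighbourhood of (y, x) with coefficients w."""
    taps = len(w)
    half = taps // 2
    height, width = len(image), len(image[0])
    acc = 0.0
    for dy in range(taps):
        for dx in range(taps):
            yy = min(max(y + dy - half, 0), height - 1)  # clamp at borders (assumption)
            xx = min(max(x + dx - half, 0), width - 1)
            acc += image[yy][xx] * w[dy][dx]
    return acc


def loop_filter_frame(decoded, filters, class_of_pixel, frame_filter_on_off_flag,
                      block_filter_on_off_flag=False, block_size=None, block_flags=None):
    """Return the filtered decoded image for one frame."""
    out = [row[:] for row in decoded]
    if not frame_filter_on_off_flag:
        return out  # frame flag OFF: pass the decoded image through as it is
    height, width = len(decoded), len(decoded[0])
    blocks_per_row = -(-width // block_size) if block_filter_on_off_flag else 0
    for y in range(height):
        for x in range(width):
            if block_filter_on_off_flag:
                # Blocks are assumed to be numbered in raster order.
                k = (y // block_size) * blocks_per_row + (x // block_size)
                if not block_flags[k]:
                    continue  # this block is output unfiltered
            out[y][x] = apply_filter(decoded, y, x, filters[class_of_pixel(y, x)])
    return out
```

Filtering always reads from the unfiltered `decoded` image and writes to a copy, so the result does not depend on the order in which pixels are visited.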
As can be seen from the above description, in the image encoding device in accordance with this Embodiment 1, the loop filter 6 includes the region classifying unit 12 for extracting a feature quantity of each of the regions which construct a local decoded image shown by a local decoded image signal outputted by the adder 5 to classify each of the regions into the class to which the region belongs according to the feature quantity, and the filter designing and processing unit 13 for, for each class to which one or more regions, among the regions which construct the local decoded image, belong, generating a Wiener filter which minimizes the sum of squared errors occurring between the image signal which is the target to be encoded and the local decoded image in each of the one or more regions belonging to the class to compensate for a distortion superimposed onto the one or more regions by using the Wiener filter. Therefore, the image encoding device implements the filtering process according to the local properties of the image, thereby being able to improve the improvement accuracy of the image quality.
Furthermore, in the image decoding device in accordance with this Embodiment 1, the loop filter 25 includes the region classifying unit 32 for extracting a feature quantity of each of the regions which construct a decoded image shown by a decoded image signal outputted by the adder 24 to classify each of the regions into the class to which the region belongs according to the feature quantity, and the filter processing unit 33 for referring to filter information which the variable length decoding part 21 has variable-length-decoded to generate a Wiener filter which is applied to the class to which each region classified by the region classifying unit 32 belongs, and for compensating for a distortion superimposed onto the region by using the Wiener filter. Therefore, the image decoding device implements the filtering process according to the local properties of the image, thereby being able to improve the improvement accuracy of the image quality.
In above-mentioned Embodiment 1, the loop filter in which the filter designing and processing unit 13 generates a Wiener filter for each class to which one or more regions belong, and performs the filtering process on each of the blocks (K) which construct a local decoded image by using the Wiener filter of the class to which the region including the block (K) belongs is shown. As an alternative, for each of the blocks, the loop filter can select a Wiener filter which minimizes the sum of squared errors occurring between the image signal which is the target to be encoded and the local decoded image signal in the block (K) from among Wiener filters which the loop filter generates for each class to which one or more regions belong, and can compensate for a distortion superimposed onto the block (K) by using the Wiener filter selected thereby.
Concretely, a loop filter of this embodiment operates as follows.
A filter designing and processing unit 13 generates a Wiener filter for each class to which one or more regions belong, like that in accordance with above-mentioned Embodiment 1 (steps ST2 to ST8).
In accordance with this Embodiment 2, the filter designing and processing unit does not use a flag (block_filter_on_off_flag) showing whether or not to carry out a process of selecting whether or not to carry out a filtering process for each block within a frame currently processed, but uses a flag (block_filter_selection_flag) showing whether or not to select a filter which is to be used for each block within the frame currently processed. Furthermore, the flag (block_filter_selection_flag) is initially set to OFF in step ST40, and is set to ON only when step ST46 is carried out.
As will be mentioned later, only when the flag (block_filter_selection_flag) is set to ON, a block size and filter selection information about each block are included in filter information.
After generating a Wiener filter for each class to which one or more regions belong, the filter designing and processing unit 13 selects an optimal process (e.g. a process which minimizes the sum of squared errors occurring between the image signal which is the target to be encoded and the local decoded image signal in the block (K)) from among a process of performing the filtering process on each of the blocks (K) which construct the local decoded image by selecting a Wiener filter from among the Wiener filters which the filter designing and processing unit generates for each class to which one or more regions belong, and a process of not performing the filtering process on each of the blocks (steps ST9, and ST41 to ST47).
More specifically, in a case of generating four Wiener filters W1, W2, W3, and W4 and carrying out the filtering process using each of the four Wiener filters, the filter designing and processing unit selects the Wiener filter W3 which minimizes the sum of squared errors E for the block (K) if the sum of squared errors E in the block (K) has the following inequality among the four filters.
where EW0 shows the sum of squared errors E at the time when no filtering process is carried out.
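The selection for one block (K) can be sketched as a comparison of the sum of squared errors of every candidate, with the unfiltered block as candidate number 0. The function names `sse` and `select_filter_for_block` and the representation of the candidates as already-filtered blocks are assumptions made for illustration; a cost which also accounts for the amount of code could be substituted for the plain sum of squared errors.

```python
def sse(original, candidate):
    """Sum of squared errors between two blocks of equal size."""
    return sum(
        (o - c) ** 2
        for o_row, c_row in zip(original, candidate)
        for o, c in zip(o_row, c_row)
    )


def select_filter_for_block(original_block, decoded_block, filtered_candidates):
    """Pick the filter number which minimizes the error for one block.

    `filtered_candidates` maps a filter number i (1 or greater) to the block
    obtained by filtering the local decoded block with W_i.
    Returns (filter number, error); number 0 means "no filtering".
    """
    best_id, best_err = 0, sse(original_block, decoded_block)  # E_W0
    for fid in sorted(filtered_candidates):
        err = sse(original_block, filtered_candidates[fid])
        if err < best_err:  # ties keep the earlier (lower-numbered) choice
            best_id, best_err = fid, err
    return best_id, best_err
```

With four candidates of which the third gives the smallest error, the function returns 3, matching the example in which the Wiener filter W3 is selected.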
When determining to perform the filtering process on the frame currently processed by using the Wiener filters selected, the filter designing and processing unit 13 sets the flag (frame_filter_on_off_flag) to 1 (ON), and carries out the filtering process which minimizes the cost in steps ST1 to ST9 and ST40 to ST47 and outputs the local decoded image signal filtered thereby to a memory 7 as a reference image signal (steps ST17 to ST20).
In contrast, when determining not to perform the filtering process on the whole of the frame currently processed (steps ST17 to ST18), the filter designing and processing unit sets the flag (frame_filter_on_off_flag) to zero (OFF), and outputs the yet-to-be-filtered local decoded image signal to the memory 7 as the reference image signal (steps ST21 to ST22).
After generating the Wiener filters and then carrying out the filtering process in the above-mentioned way, the filter designing and processing unit 13 outputs the filter information about the Wiener filters to a variable length encoding part 8.
The flag (frame_filter_on_off_flag) showing whether or not to carry out the filtering process within the frame currently processed is included in the filter information.
When this flag is set to ON (shows that the filtering process is carried out), information as will be shown below is included in the filter information.
(1) The number of Wiener filters (the number of classes to each of which one or more regions belong)
(2) Information (index) about the number of taps of each Wiener filter
(3) Information about the coefficients of an actually-used Wiener filter (a Wiener filter of each class to which one or more regions belong)
(4) Filter selection information about each block and block size information
In this embodiment, the example in which the pieces of information (1) to (4) are included in the filter information is shown. The number of Wiener filters, the number of taps of each Wiener filter, and the block size can be held by both the image encoding device and an image decoding device as information determined in common in the image encoding device and the image decoding device, instead of encoding and transmitting the pieces of information between them.
A loop filter 25 in the image decoding device carries out the following process.
First, a frame memory 31 of the loop filter 25 stores only one frame of a decoded image signal outputted from an adder 24.
When the flag (frame_filter_on_off_flag) included in the filter information is set to ON (shows that a filtering process is carried out) (step ST31), and when the flag (block_filter_selection_flag) included in the filter information is set to OFF (step ST51), a region classifying unit 32 extracts a feature quantity of each of the regions which construct the decoded image shown by the single frame of the decoded image signal stored in the frame memory 31, and classifies each of the regions into the class to which the region belongs according to the feature quantity (step ST32), like that in accordance with above-mentioned Embodiment 1.
In contrast, when the flag (frame_filter_on_off_flag) included in the filter information is set to ON (shows that the filtering process is carried out) (step ST31), and when the flag (block_filter_selection_flag) included in the filter information is set to ON (step ST51), the region classifying unit refers to the information about the size of each block, which is the unit for selection, and the filter selection information about each block among the pieces of information included in the filter information, and performs class classification for each block (step ST52).
After the region classifying unit 32 classifies each region (each block) into the class to which the region belongs, a filter processing unit 33 refers to the filter information outputted from a variable length decoding part 21, and generates a Wiener filter which is applied to the class to which each region (each block) classified by the region classifying unit 32 belongs (step ST33), like that in accordance with above-mentioned Embodiment 1.
After generating a Wiener filter which is applied to each class, when the flag (block_filter_selection_flag) is set to OFF, the filter processing unit 33 performs the filtering process on all the decoded image signals in the frame currently processed by using the generated Wiener filters, and outputs each decoded image signal filtered thereby to outside the image decoding device and to a memory 26 as a filtered decoded image signal (step ST53), like in the case in which the flag (block_filter_on_off_flag) is set to OFF in above-mentioned Embodiment 1.
In contrast, when the flag (block_filter_selection_flag) is set to ON, the filter processing unit 33 compensates for the distortion superimposed onto the decoded image signal in each block by using the Wiener filter which is selected for the block after generating the Wiener filter which is applied to each class, and outputs the decoded image signal filtered thereby to outside the image decoding device and to the memory 26 as a filtered decoded image signal (step ST53).
The filtered decoded image signal ŝ at this time is expressed by the following equation (5).
$$\hat{s} = S \cdot W_{id\_2(bl)} \qquad (5)$$
A matrix S is a group of reference signals of L×L pixels including the decoded image signal s which is the target for filtering.
id_2(bl) is the filter selection information in a block bl in which the decoded image signal s is included, i.e. the class number (filter number) of the block bl.
id_2(bl)=0 shows a block on which no filtering process is performed; therefore, the filtering process is skipped for that block.
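The per-block filtering of equation (5) can be sketched as follows, with the filter selection information of each block deciding which Wiener filter is applied and the value 0 leaving the block untouched. The function names, the raster-order list `selection`, and the clamped border handling are assumptions introduced for illustration.

```python
def apply_filter(image, y, x, w):
    """Inner product of the LxL neighbourhood of (y, x) with coefficients w."""
    taps = len(w)
    half = taps // 2
    height, width = len(image), len(image[0])
    acc = 0.0
    for dy in range(taps):
        for dx in range(taps):
            yy = min(max(y + dy - half, 0), height - 1)  # clamp at borders (assumption)
            xx = min(max(x + dx - half, 0), width - 1)
            acc += image[yy][xx] * w[dy][dx]
    return acc


def filter_frame_by_selection(decoded, filters, block_size, selection):
    """Filter each block with the Wiener filter selected for it.

    `selection[k]` is the filter number of block k (raster order is assumed);
    0 means that the block is output without filtering.
    """
    height, width = len(decoded), len(decoded[0])
    blocks_per_row = -(-width // block_size)  # ceiling division
    out = [row[:] for row in decoded]
    for y in range(height):
        for x in range(width):
            fid = selection[(y // block_size) * blocks_per_row + (x // block_size)]
            if fid != 0:
                out[y][x] = apply_filter(decoded, y, x, filters[fid])
    return out
```

Because the filter number is read directly from the filter information, no feature quantity needs to be extracted on the decoding side in this case.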
As can be seen from the above description, because the image encoding device in accordance with this Embodiment 2 is constructed in such a way that, for each of the blocks (K) which construct a decoded image, the loop filter selects a Wiener filter which minimizes the sum of squared errors occurring between the image signal which is the target to be encoded and the decoded image signal in the block (K) from among Wiener filters which the loop filter generates for each class to which one or more regions belong, and compensates for the distortion superimposed onto the block (K) by using the Wiener filter selected thereby, there is provided an advantage of further improving the improvement accuracy of the image quality compared with above-mentioned Embodiment 1.
In above-mentioned Embodiment 2, the method of selecting, from among the process of performing the filtering process on each of the blocks (K) which construct a decoded image by using one of the Wiener filters which are generated for each class to which one or more regions in a frame currently processed belong, and the process of not performing the filtering process on each block, the process which minimizes the sum of squared errors occurring between the image signal which is the target to be encoded and the local decoded image signal in the block (K) is shown. As an alternative, from among a process of preparing one or more Wiener filters in advance and using one of the one or more Wiener filters which have been prepared in advance, the process of using one of the Wiener filters which are generated for each class to which one or more regions in a frame currently processed belong, and the process of not performing the filtering process on each block, the loop filter can select the process which minimizes the sum of squared errors occurring between the image signal which is the target to be encoded and the local decoded image signal in the block (K).
Because this Embodiment 3 provides a wider choice of Wiener filters compared with that in above-mentioned Embodiment 2, the probability that an optimal Wiener filter is selected is increased compared with above-mentioned Embodiment 2.
Because a method of selecting a Wiener filter is the same as that shown in above-mentioned Embodiment 2, the explanation of the method will be omitted hereafter.
Because the process carried out by an image decoding device is the same as that in accordance with above-mentioned Embodiment 2, the explanation of the process will be omitted hereafter.
In above-mentioned Embodiment 2, the method of selecting, from among the process of performing the filtering process on each of the blocks (K) which construct a decoded image by using one of the Wiener filters which are generated for each class to which one or more regions in a frame currently processed belong, and the process of not performing the filtering process on each block, the process which minimizes the sum of squared errors occurring between the image signal which is the target to be encoded and the local decoded image signal in the block (K) is shown. As an alternative, from among the process of using one of the Wiener filters which are generated for each class to which one or more regions in a frame currently processed belong, a process of using one of the Wiener filters which have been used for an already-encoded frame, and the process of not performing the filtering process on each block, the loop filter can select the process which minimizes the sum of squared errors occurring between the image signal which is the target to be encoded and the local decoded image signal in the block (K).
As a reference method of referring to a Wiener filter which has been used for an already-encoded frame, for example, reference methods as will be shown below can be provided.
Method (1) of referring to a Wiener filter which has been used for a block at a position shown by a representative motion vector which is calculated in a block which is a target for filtering.
Method (2) of referring to a Wiener filter which has been used for a block located in a frame which is the nearest in time to a block which is a target for filtering, and located at the same position as the target block.
Method (3) of referring to a Wiener filter which has been used for a block having the highest cross-correlation among the blocks in the already-encoded frame.
In the case of using the method (3), an identical block searching process needs to be carried out by the image encoding device and an image decoding device.
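Method (3) can be sketched as a search for the block of the already-encoded frame which correlates most strongly with the target block, followed by a lookup of the Wiener filter used there. The function names, the use of a normalized (zero-mean) cross-correlation, and the flat list of candidate blocks are assumptions made for illustration; the description does not fix the correlation measure or the search range. Because the image decoding device must repeat the same search, only decoded signals are used as inputs.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two blocks of equal size."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma = sum(fa) / len(fa)
    mb = sum(fb) / len(fb)
    num = sum((x - ma) * (y - mb) for x, y in zip(fa, fb))
    den = math.sqrt(sum((x - ma) ** 2 for x in fa) * sum((y - mb) ** 2 for y in fb))
    return num / den if den else 0.0  # flat blocks are treated as uncorrelated


def filter_of_best_match(target_block, previous_blocks, previous_filter_ids):
    """Return the filter number used for the best-correlated earlier block.

    `previous_blocks[k]` and `previous_filter_ids[k]` describe block k of the
    already-encoded frame; on ties the first such block is chosen.
    """
    best = max(range(len(previous_blocks)),
               key=lambda k: ncc(target_block, previous_blocks[k]))
    return previous_filter_ids[best]
```

A deterministic tie-breaking rule, such as choosing the first block as above, keeps the results of the two devices identical.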
Because this Embodiment 4 provides a wider choice of Wiener filters compared with that in above-mentioned Embodiment 2, the probability that an optimal Wiener filter is selected is increased compared with above-mentioned Embodiment 2.
Because a method of selecting a Wiener filter is the same as that shown in above-mentioned Embodiment 2, the explanation of the method will be omitted hereafter.
Because the process carried out by the image decoding device is the same as that in accordance with above-mentioned Embodiment 2, the explanation of the process will be omitted hereafter.
The image encoding device, the image decoding device, the image encoding method, and the image decoding method in accordance with the present invention can improve the improvement accuracy of the image quality. The image encoding device and the image encoding method are suitable for use as an image encoding device or the like for and an image encoding method or the like of compression-encoding and transmitting an image, respectively, and the image decoding device and the image decoding method are suitable for use as an image decoding device or the like for and an image decoding method or the like of decoding encoded data transmitted by the image encoding device to reconstruct an image, respectively.
Number | Date | Country | Kind |
---|---|---|---|
2009-146350 | Jun 2009 | JP | national |
This application is a Divisional of copending application Ser. No. 13/378,974, filed on Dec. 16, 2011, which was filed as PCT International Application No. PCT/JP2010/003492 on May 25, 2010, which claims the benefit under 35 U.S.C. §119(a) to Patent Application No. 2009-146350, filed in Japan on Jun. 19, 2009, all of which are hereby expressly incorporated by reference into the present application.
 | Number | Date | Country
---|---|---|---
Parent | 13378974 | Dec 2011 | US
Child | 14515136 | | US