The present disclosure relates to an image encoding device, an image decoding device, an image encoding method, and an image decoding method.
Patent Literature 1 discloses a video encoding and decoding method using an adaptive coupled prefilter and a postfilter.
Patent Literature 2 discloses an encoding method of image data for loading into an artificial intelligence (AI) integrated circuit.
Patent Literature 1: U.S. Pat. No. 9,883,207
Patent Literature 2: U.S. Pat. No. 10,452,955
An object of the present disclosure is to change a filter applied to a decoded image in accordance with an image usage.
An image decoding device according to one aspect of the present disclosure includes: circuitry; and a memory connected to the circuitry, in which, in operation, the circuitry acquires a first image and a plurality of first filter sets by decoding a bitstream, and generates and outputs a second image by selecting one first filter set from the plurality of first filter sets based on usage information indicating an image usage, and applying the first filter set having been selected to the first image.
Conventional encoding methods have aimed to provide video optimized for human vision under bit rate constraints.
With the progress of machine learning or neural network-based applications along with abundant sensors, many intelligent platforms that handle large amounts of data, including connected cars, video surveillance, and smart cities have been implemented. Since large amounts of data are constantly generated, the conventional method involving humans in pipelines has become inefficient and unrealistic in terms of latency and scale.
Furthermore, transmission and archive systems are expected to require more compact data representations and lower-latency solutions, and video coding for machines (VCM) has therefore been introduced.
In some cases, machines can communicate with each other and execute tasks without human intervention, while in other cases, additional processing by humans may be necessary for specific decompressed streams. Such cases include, for example, a case where a human “supervisor” searches surveillance camera video for a specific person or scene.
In other cases, corresponding bitstreams are used by both humans and machines. For connected cars, features can be used for image correction functions for humans and for object detection and segmentation for machines.
A typical system architecture includes a pair consisting of an image encoding device and an image decoding device. The input of the system is a video, a still image, or a feature quantity. Examples of a machine task include object detection, object segmentation, object tracking, action recognition, pose estimation, and a discretionary combination thereof. Human vision may also be one of the use cases served alongside the machine task.
According to the conventional technique, there is a problem that a filter applied to a decoded image in an image decoding device cannot be dynamically changed in accordance with an image usage.
In order to solve such a problem, the present inventor has found that the filter set applied to a decoded image can be dynamically changed in accordance with an image usage by acquiring the decoded image and a plurality of filter sets through decoding of a bitstream, and by selecting, on the image decoding device side, one filter set from the plurality of filter sets based on usage information indicating the image usage. This finding has led to the present disclosure.
Next, each aspect of the present disclosure will be described.
An image decoding device according to a first aspect of the present disclosure includes: circuitry; and a memory connected to the circuitry, in which, in operation, the circuitry acquires a first image and a plurality of first filter sets by decoding a bitstream, and generates and outputs a second image by selecting one first filter set from the plurality of first filter sets based on usage information indicating an image usage, and applying the first filter set having been selected to the first image.
According to the first aspect, the first filter set applied to the first image can be dynamically changed in accordance with the image usage.
In the image decoding device according to a second aspect of the present disclosure, in the first aspect, the circuitry may acquire a plurality of second filters and a plurality of parameter values by decoding the bitstream; may select, based on the usage information, one first filter set from the plurality of first filter sets, one second filter from the plurality of second filters, and one parameter value from the plurality of parameter values; may select one first filter from the selected first filter set based on a comparison between the selected parameter value and a feature value obtained by applying the selected second filter to the first image; and may generate the second image by applying the selected first filter to the first image.
According to the second aspect, the first filter applied to the first image can be dynamically changed based on the feature value of the first image obtained by application of the second filter.
In the image decoding device according to a third aspect of the present disclosure, in the second aspect, the number of pixels in a first image region of the first image, to which the first filter is applied, may be equal to the number of pixels in a second image region of the first image, to which the second filter is applied, and the range of the second image region may be wider than the range of the first image region.
According to the third aspect, it is possible to reduce the influence of local noise while suppressing an increase in the processing load due to application of the second filter.
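The third aspect can be satisfied, for instance, by evaluating the second filter on a wider window that is subsampled so that it holds exactly as many pixels as the first image region. The following is a minimal sketch, assuming square regions and a subsampling step of 2; the function names and both of these choices are hypothetical, since the disclosure fixes neither the region shapes nor how the wider range is sampled.

```python
def first_region(x0, y0, size):
    """(x, y) coordinates of a size x size first image region (first filter target)."""
    return [(x0 + dx, y0 + dy) for dy in range(size) for dx in range(size)]


def second_region(x0, y0, size, step=2):
    """(x, y) coordinates of a hypothetical second image region (second filter target).

    The window spans `step` times the first region's width and height, is roughly
    centred on it, and keeps every `step`-th pixel, so it contains the same number
    of pixels (size * size) while covering a wider range.
    """
    margin = (size * step - size) // 2  # shift so the wider window surrounds the block
    return [
        (x0 - margin + dx * step, y0 - margin + dy * step)
        for dy in range(size)
        for dx in range(size)
    ]
```

Because both regions contain the same number of pixels, the per-region cost of computing the feature value does not grow, while the wider support makes that value less sensitive to local noise.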
In the image decoding device according to a fourth aspect of the present disclosure, in any one of the first to third aspects, one of the plurality of first filter sets may be a bypass filter that outputs the first image as the second image.
According to the fourth aspect, selecting the bypass filter avoids the execution of unnecessary filter processing.
In the image decoding device according to a fifth aspect of the present disclosure, in any one of the first to fourth aspects, the circuitry may acquire, by decoding the bitstream received from an image encoding device, a postfilter set as one of the plurality of first filter sets, the postfilter set corresponding to a prefilter set applied to an input image by the image encoding device.
According to the fifth aspect, since the conversion processing from the prefilter set to the postfilter set is executed on the image encoding device side, the processing load of the image decoding device can be reduced.
In the image decoding device according to a sixth aspect of the present disclosure, in any one of the first to fourth aspects, the circuitry may acquire a prefilter set applied to an input image by an image encoding device by decoding the bitstream received from the image encoding device, and acquire, as one of the plurality of first filter sets, a postfilter set corresponding to the prefilter set by converting the prefilter set.
According to the sixth aspect, since the conversion processing from the prefilter set to the postfilter set is executed on the image decoding device side, the processing load of the image encoding device can be reduced.
In the image decoding device according to a seventh aspect of the present disclosure, in any one of the first to sixth aspects, the image usage may include at least one machine task and human vision.
According to the seventh aspect, it is possible to not only select the first filter set suitable for the machine task but also select the first filter set suitable for human vision.
In the image decoding device according to an eighth aspect of the present disclosure, in any one of the first to seventh aspects, the circuitry may acquire the plurality of first filter sets by decoding a header of the bitstream.
According to the eighth aspect, storing the plurality of first filter sets in the header of the bitstream allows the circuitry to acquire the first filter sets easily.
In the image decoding device according to a ninth aspect of the present disclosure, in the eighth aspect, the header may have a supplemental enhancement information (SEI) region, and the circuitry may acquire the plurality of first filter sets by decoding the SEI region.
According to the ninth aspect, by storing the first filter set in the SEI region, the first filter set can be easily handled as additional information.
An image encoding device according to a tenth aspect of the present disclosure includes: circuitry; and a memory connected to the circuitry, in which, in operation, the circuitry generates a first image by applying a prefilter set in accordance with an image usage to an input image, and generates a bitstream by encoding the first image and the prefilter set or a postfilter set corresponding to the prefilter set.
According to the tenth aspect, since the bitstream includes the first image and the prefilter set or the postfilter set, the image decoding device can perform the optimum postfilter processing in accordance with the prefilter processing applied to the first image.
In the image encoding device according to an eleventh aspect of the present disclosure, in the tenth aspect, the image usage may be an image usage in an image decoding device, and the circuitry may transmit the bitstream to the image decoding device.
According to the eleventh aspect, a prefilter set in accordance with the image usage in the image decoding device can be applied to the input image.
An image decoding method according to a twelfth aspect of the present disclosure includes: acquiring a first image and a plurality of first filter sets by decoding a bitstream, and generating and outputting a second image by selecting one first filter set from the plurality of first filter sets based on usage information indicating an image usage, and applying the first filter set having been selected to the first image.
According to the twelfth aspect, the first filter set to be applied to the first image can be dynamically changed in accordance with the image usage.
An image encoding method according to a thirteenth aspect of the present disclosure includes: generating a first image by applying a prefilter set in accordance with an image usage to an input image; and generating a bitstream by encoding the first image and the prefilter set or a postfilter set corresponding to the prefilter set.
According to the thirteenth aspect, since the bitstream includes the first image and the prefilter set or the postfilter set, the image decoding device can perform the optimum postfilter processing in accordance with the prefilter processing applied to the first image.
Embodiments of the present disclosure will be described below in detail with reference to the drawings. Elements denoted by the same reference signs in different drawings represent the same or corresponding elements.
Embodiments to be described below will each refer to a specific example of the present disclosure. The numerical values, shapes, constituent elements, steps, orders of the steps, and the like of the following embodiments are merely examples, and do not intend to limit the present disclosure. A constituent element not described in an independent claim representing the highest concept among constituent elements in the embodiments below is described as a discretionary constituent element. In all embodiments, respective items of content can be combined.
The image encoding device 10 includes a filter processing unit 11 and an encoding processing unit 12. Image data D1 of an input image is input to the filter processing unit 11. The input image includes a video, still image, or feature quantity. The filter processing unit 11 includes a plurality of prefilter sets of different types in accordance with the image usage on the image decoding device 20 side. The filter processing unit 11 selects one prefilter set from the plurality of prefilter sets in accordance with the image usage, generates a first image by performing filter processing using the selected prefilter set on the input image, and outputs image data D2 of the first image. The filter processing unit 11 outputs filter information D3 regarding the filter applied to the input image. The filter information D3 includes a plurality of pieces of first filter set information D3a regarding the prefilter set (or the postfilter set complementary thereto) applied to the input image and another prefilter set or postfilter set, a plurality of pieces of second filter information D3b regarding a plurality of feature extraction filters, and a plurality of pieces of parameter value information D3c. The filter processing unit 11 selects a prefilter to be applied to the input image based on a result of comparison between a feature value obtained by applying a feature extraction filter to the input image and a parameter value such as a threshold.
The encoding processing unit 12 generates a bitstream D4 by performing encoding processing on the image data D2 and the filter information D3, and transmits the bitstream D4 to the image decoding device 20 via the network Nw.
The network Nw is the Internet, a wide area network (WAN), a local area network (LAN), or a discretionary combination thereof. The network Nw need not be a bidirectional communication network, and may be a unidirectional communication network that transmits broadcast waves such as terrestrial digital broadcasting or satellite broadcasting. The network Nw may also be a recording medium such as a digital versatile disc (DVD) or a Blu-ray disc (BD) on which the bitstream D4 is recorded.
The image decoding device 20 includes a decoding processing unit 21, a filter processing unit 22, and a task processing unit 23. The decoding processing unit 21 receives the bitstream D4 from the image encoding device 10 via the network Nw, generates a first image by decoding the bitstream D4, and outputs the image data D5 of the first image corresponding to the image data D2. The decoding processing unit 21 acquires filter information D6 corresponding to the filter information D3 by decoding the bitstream D4. The filter information D6 includes a plurality of pieces of first filter set information D6a corresponding to the plurality of pieces of first filter set information D3a, a plurality of pieces of second filter information D6b corresponding to the plurality of pieces of second filter information D3b, and a plurality of pieces of parameter value information D6c corresponding to the plurality of pieces of parameter value information D3c.
Based on usage information D7 indicating the image usage in the task processing unit 23, the filter processing unit 22 selects one first filter set from a plurality of first filter sets indicated by the first filter set information D6a, one second filter from a plurality of second filters indicated by the second filter information D6b, and one parameter value from a plurality of parameter values indicated by the parameter value information D6c. The image usage may be designated by a user, for example, or may be acquired by decoding the bitstream D4 as one piece of the filter information D6. The filter processing unit 22 selects one first filter from the first filter set based on a result of comparison between a feature value obtained by applying the selected second filter to the first image and the selected parameter value. The filter processing unit 22 generates the second image by applying the selected first filter to the first image, and outputs image data D8 of the second image.
One of the plurality of first filter sets may be a bypass filter that bypasses the filter processing and causes the filter processing unit 22 to output the first image (image data D5) as the second image (image data D8). Selecting the bypass filter avoids the execution of unnecessary filter processing. For example, the bypass filter may be indicated by setting all filter coefficients to a specific value (e.g., 0), or by other information instead of the filter coefficient values. When no first filter set corresponding to the image usage indicated by the usage information D7 exists among the plurality of first filter sets indicated by the first filter set information D6a, the filter processing unit 22 may bypass the filter processing and output the first image (image data D5) as it is.
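The usage-based selection and the bypass behavior described above can be sketched as follows. This is a minimal illustration, assuming that the decoded filter information D6 has already been organized into dictionaries keyed by an image-usage string, and that an all-zero coefficient set signals the bypass filter (the "e.g., 0" option above). The class name `DecodedFilterInfo`, the functions, and the usage strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Filter = List[List[float]]  # one 2-D coefficient array
FilterSet = List[Filter]    # a first filter set may hold several filters


@dataclass
class DecodedFilterInfo:
    """Hypothetical container for the filter information D6 (keyed by image usage)."""
    first_filter_sets: Dict[str, FilterSet]  # first filter set information D6a
    second_filters: Dict[str, Filter]        # second filter information D6b
    parameter_values: Dict[str, float]       # parameter value information D6c


def is_bypass(filter_set: FilterSet) -> bool:
    """Treat a filter set whose coefficients are all 0 as the bypass filter."""
    return all(c == 0 for f in filter_set for row in f for c in row)


def select_for_usage(
    info: DecodedFilterInfo, usage: str
) -> Optional[Tuple[FilterSet, Optional[Filter], Optional[float]]]:
    """Return (first filter set, second filter, parameter value) for a usage,
    or None when the filter processing should be bypassed."""
    filter_set = info.first_filter_sets.get(usage)
    if filter_set is None or is_bypass(filter_set):
        return None  # no matching set, or an explicit bypass filter
    return filter_set, info.second_filters.get(usage), info.parameter_values.get(usage)


def postfilter(first_image, info: DecodedFilterInfo, usage: str, apply_fn: Callable):
    """Output the second image: filtered, or the first image unchanged on bypass."""
    selection = select_for_usage(info, usage)
    if selection is None:
        return first_image
    return apply_fn(first_image, *selection)
```

Returning `None` for both the "no matching set" and "explicit bypass" cases keeps the fallback in one place, so the caller only needs a single check before filtering.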
By using the second image indicated by the image data D8, the task processing unit 23 executes task processing in accordance with the usage information D7 indicating the image usage, and outputs result data D9 such as an inference result.
The filter processing unit 11 includes a plurality of first filter sets of different types in accordance with the image usage on the image decoding device 20 side. The type includes at least one of the shape, size, and coefficient values of the filter. The first filter set corresponding to the machine task includes at least one of a noise removal filter, a sharpening filter, a bit depth conversion filter, a color space conversion filter, a resolution conversion filter, and a filter using a neural network. The noise removal filter includes at least one of a low-pass filter, a Gaussian filter, a smoothing filter, an averaging filter, a bilateral filter, and a median filter, and removes noise by reducing detail information in the input image. The sharpening filter includes an edge detection filter or an edge enhancement filter, specifically a Laplacian filter, a Gaussian-Laplacian filter, a Sobel filter, a Prewitt filter, or a Canny edge detection filter. The bit depth conversion filter converts the bit depth of a luminance signal and/or a color signal between the input image and the first image. For example, the code amount is reduced by truncating lower bits of the color signal of the first image so that the bit depth of the first image becomes smaller than that of the input image. The color space conversion filter converts the color space between the input image and the first image. For example, the code amount is reduced by converting the YUV444 color space of the input image to YUV422, YUV420, or YUV400 in the first image. The resolution conversion filter converts the image resolution between the input image and the first image, and includes a downsampling filter that makes the resolution of the first image lower than that of the input image.
The resolution conversion filter may also include an upsampling filter that makes the resolution of the first image higher than that of the input image. The first filter set corresponding to the machine task may be, for example, a deblocking filter, an adaptive loop filter (ALF), a cross-component adaptive loop filter (CCALF), a sample adaptive offset (SAO) filter, or a luma mapping with chroma scaling (LMCS) filter, as defined in H.266/Versatile Video Coding (VVC), or a discretionary combination thereof.
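The bit depth and color space conversions above can be illustrated on plain sample arrays. This is a minimal sketch. The right shift implements the "truncating lower bits" example directly. The 2x2 rounded average used for 4:4:4 to 4:2:0 is only one possible downsampling choice, assumed here for illustration, and the function names are hypothetical.

```python
def reduce_bit_depth(plane, in_bits, out_bits):
    """Truncate the lower (in_bits - out_bits) bits of every sample in a 2-D plane."""
    shift = in_bits - out_bits
    return [[s >> shift for s in row] for row in plane]


def chroma_444_to_420(plane):
    """Halve a chroma plane in both directions by a rounded 2x2 average.

    Assumes even width and height; the averaging kernel is an illustrative choice.
    """
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, len(plane[0]), 2)
        ]
        for y in range(0, len(plane), 2)
    ]
```

As a rough indication of the raw-data reduction before any entropy coding, a 10-bit 4:4:4 picture carries 3 × 10 = 30 bits per pixel. The same picture converted to 8-bit 4:2:0 carries 1.5 × 8 = 12 bits per pixel. The actual code-amount saving depends on the encoder.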
The first filter set corresponding to human vision is a filter whose filter processing does not make the code amount of the first image smaller than that of the input image. The first filter set corresponding to human vision includes a bypass filter that outputs the input image unchanged as the first image. Alternatively, the first filter set corresponding to human vision may be a filter that reduces the code amount of the first image relative to the input image, but with a weaker reduction effect than the first filter set corresponding to the machine task. It may also be a filter that enhances an important region of the input image, but with a weaker enhancement effect than the first filter set corresponding to the machine task.
As described above, the filter processing unit 22 selects one first filter set from the plurality of first filter sets indicated by the first filter set information D6a based on the usage information D7 indicating the image usage in the task processing unit 23. The first filter set may include two or more filters having different filter strengths.
As described above, the filter processing unit 22 selects one second filter from the plurality of second filters indicated by the second filter information D6b based on the usage information D7 indicating the image usage in the task processing unit 23. The second filter is a feature extraction filter for classifying a region in the image based on image characteristics. As the second filter, any filter that can classify a region in the image, such as a differential filter, a saliency filter, or a segmentation filter, can be used.
The differential filter is used to calculate an image gradient such as a direction change in luminance or color in an image. As the differential filter, for example, an edge detector can be used. The edge detector may be a first-order differential filter such as a Sobel filter or a Prewitt filter, or may be a second-order differential filter such as a Laplacian filter or a Gaussian-Laplacian filter.
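As an example of a first-order differential filter used as the second filter, the sketch below computes a Sobel edge strength per pixel and averages it over a region. The |Gx| + |Gy| magnitude (rather than the Euclidean norm) and the region averaging are assumptions made for illustration, and the function names are hypothetical.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edge_strength(img, x, y):
    """Return |Gx| + |Gy| at interior pixel (x, y) of a 2-D luminance array."""
    gx = gy = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            p = img[y + dy][x + dx]
            gx += SOBEL_X[dy + 1][dx + 1] * p
            gy += SOBEL_Y[dy + 1][dx + 1] * p
    return abs(gx) + abs(gy)


def region_edge_strength(img, coords):
    """Mean Sobel edge strength over a list of interior (x, y) coordinates."""
    return sum(sobel_edge_strength(img, x, y) for x, y in coords) / len(coords)
```

A vertical step from 0 to 10 yields an edge strength of 40 at the pixels adjacent to the step, and a flat area yields 0.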
The saliency filter is used to detect a visual saliency region in an image, that is, a region on which human eyes tend to focus. The visual saliency region can be used to improve a human visual recognition score or to reduce computational complexity in a machine task.
As the segmentation filter, a luminance-based image segmentation filter, a model-based image segmentation filter, or a hybrid segmentation filter can be used. The luminance-based image segmentation filter divides a region of an image based on a luminance value of each pixel in the image. The model-based image segmentation filter segments each region of an image using a neural network model such as a light-weight object detection model.
As the hybrid segmentation filter, for example, a filter in which an existing filter and a model-based image segmentation filter are combined can be used.
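The luminance-based image segmentation filter can be pictured as assigning each pixel to a luminance band. This is a minimal sketch, assuming fixed, ascending band thresholds. The thresholds and the function name are hypothetical, and the disclosure does not state how the luminance bands are chosen.

```python
import bisect


def luminance_segmentation(luma, thresholds):
    """Label each pixel of a 2-D luminance array by the band it falls in.

    thresholds must be sorted ascending; label k means
    thresholds[k-1] <= value < thresholds[k] (label 0 is below thresholds[0]).
    """
    return [[bisect.bisect_right(thresholds, v) for v in row] for row in luma]
```

With thresholds `[64, 192]`, the dark, mid-tone, and bright pixels of a row `[10, 64, 200]` receive labels 0, 1, and 2.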
As described above, the filter processing unit 22 selects one first filter from the first filter set based on a result of comparison between a feature value obtained by applying the selected second filter to the first image and the selected parameter value. The feature value is, for example, an edge strength, and the parameter value is, for example, a threshold. The filter processing unit 22 applies the first filter having a weak filter strength to a first image region corresponding to a certain second image region when the edge strength regarding the second image region is equal to or greater than the threshold, and applies the first filter having a strong filter strength to a first image region corresponding to a certain second image region when the edge strength regarding the second image region is less than the threshold. A configuration of selecting one first filter from three or more first filters by setting two or more thresholds may be adopted. The filter processing unit 22 may select one first filter from two or more first filters based on a task type parameter indicated by the usage information D7. The first filter set may include only one first filter, and in this case, the same first filter may be applied in all regions in a screen without using the second filter.
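The per-region selection above, including the variant with two or more thresholds, can be sketched as follows. This is a minimal sketch, assuming that the first filters are ordered from strongest to weakest and that an edge strength equal to a threshold counts as "equal to or greater than", as stated above. The function name and the filter labels are hypothetical.

```python
import bisect


def select_first_filter(edge_strength, thresholds, filters_strong_to_weak):
    """Pick the first filter for one region from its edge strength.

    thresholds: ascending list of N parameter values (thresholds).
    filters_strong_to_weak: N + 1 first filters, strongest first.
    A region at or above a higher threshold gets a weaker filter, preserving edges.
    """
    if len(filters_strong_to_weak) != len(thresholds) + 1:
        raise ValueError("need exactly one more filter than thresholds")
    return filters_strong_to_weak[bisect.bisect_right(thresholds, edge_strength)]
```

With a single threshold this reduces to the two-filter rule in the text: an edge strength equal to or greater than the threshold selects the weak filter, and a lower edge strength selects the strong filter.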
postfilter_hint_size_y designates the vertical size of the filter coefficient array or the correlation array, and has a value from “1” to “15”, for example.
postfilter_hint_size_x designates the horizontal size of the filter coefficient array or the correlation array, and has a value from “1” to “15”, for example.
num_of_postfilters designates the total number of postfilters, and has a value from “1” to “15”, for example.
postfilter_hint_type designates a type of a postfilter by, for example, two-bit flag information, and indicates, for example, a two-dimensional FIR filter when the value is “0”, a one-dimensional FIR filter when the value is “1”, and a cross-correlation matrix between an input image signal and a filtering image signal when the value is “2”.
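When postfilter_hint_type is 0, applying the decoded coefficients amounts to a 2-D FIR filtering pass over the decoded picture. The sketch below is minimal and makes several assumptions that the syntax above does not specify: floating-point coefficients that already include any normalization, odd kernel sizes, edge clamping at picture boundaries, and Python's `round` (ties to even) for the output. The function name is hypothetical.

```python
def apply_fir_2d(img, coeffs):
    """Apply a 2-D FIR filter to a 2-D sample array.

    coeffs is a postfilter_hint_size_y x postfilter_hint_size_x array with odd sizes,
    applied as a correlation (no kernel flip), which equals convolution for the
    symmetric kernels typical of postfilters. Out-of-picture samples are clamped
    to the nearest edge sample.
    """
    h, w = len(img), len(img[0])
    ky, kx = len(coeffs), len(coeffs[0])
    oy, ox = ky // 2, kx // 2  # offsets of the centre tap
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            acc = 0.0
            for j in range(ky):
                yy = min(max(y + j - oy, 0), h - 1)
                for i in range(kx):
                    xx = min(max(x + i - ox, 0), w - 1)
                    acc += coeffs[j][i] * img[yy][xx]
            row.append(round(acc))
        out.append(row)
    return out
```

A 1 × 3 kernel `[[0.25, 0.5, 0.25]]` applied to the row `[0, 4, 8]` gives `[1, 4, 7]` with edge clamping.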
cIdx designates a relevant color component. chroma_format_idc designates a chroma format, and indicates monochrome when the value is “0”, YUV420 when the value is “1”, and YUV422 when the value is “2”, for example. cy represents a vertical counter, and cx represents a horizontal counter. postfilter_hint_value [cIdx][cy][cx] indicates a filter coefficient or an element of a cross-correlation matrix.
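The three-dimensional array postfilter_hint_value[cIdx][cy][cx] can be rebuilt from decoded values with nested loops over cIdx, cy, and cx. This is a minimal sketch with several assumptions: the values have already been entropy-decoded into a flat integer sequence, the loop order is cIdx outermost, then cy, then cx, and a monochrome picture (chroma_format_idc equal to 0) carries one color component while the other formats carry three. The exact loop structure of the syntax table is not reproduced in this text, and the function names are hypothetical.

```python
def num_color_components(chroma_format_idc):
    """Monochrome (0) has one component; YUV420 (1) and YUV422 (2) have three."""
    return 1 if chroma_format_idc == 0 else 3


def read_postfilter_hint_values(values, chroma_format_idc, size_y, size_x):
    """Rebuild postfilter_hint_value[cIdx][cy][cx] from a flat list of decoded ints."""
    n = num_color_components(chroma_format_idc)
    if len(values) != n * size_y * size_x:
        raise ValueError("unexpected number of postfilter_hint_value elements")
    it = iter(values)
    return [
        [[next(it) for _cx in range(size_x)] for _cy in range(size_y)]
        for _cIdx in range(n)
    ]
```

The same structure would apply to derivative_filter_hint_value, with derivative_filter_hint_size_y and derivative_filter_hint_size_x as the array sizes.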
derivative_filter_hint_size_y designates the vertical size of the filter coefficient array or the correlation array, and has a value from “1” to “15”, for example.
derivative_filter_hint_size_x designates the horizontal size of the filter coefficient array or the correlation array, and has a value from “1” to “15”, for example.
num_of_derivative_filters designates the total number of the second filters, and has a value from “1” to “15”, for example.
derivative_filter_hint_type designates a type of the second filter by, for example, two-bit flag information, and indicates, for example, the two-dimensional FIR filter when the value is “0”, the one-dimensional FIR filter when the value is “1”, and the cross-correlation matrix between an input image signal and a filtering image signal when the value is “2”.
cIdx designates a relevant color component. chroma_format_idc designates a chroma format, and indicates monochrome when the value is “0”, YUV420 when the value is “1”, and YUV422 when the value is “2”, for example. cy represents a vertical counter, and cx represents a horizontal counter. derivative_filter_hint_value [cIdx][cy][cx] indicates a filter coefficient or an element of a cross-correlation matrix.
First, in step SP101, the filter processing unit 11 generates the first image by performing filter processing using a prefilter set on an input image, and outputs image data D2 of the first image. The filter processing unit 11 outputs filter information D3 regarding the filter applied to the input image.
Next, in step SP102, the encoding processing unit 12 generates the bitstream D4 by performing encoding processing on the first image. At that time, the encoding processing unit 12 encodes the filter information D3 and stores the encoded data 70 of the filter information D3 in the bitstream D4. The encoding processing unit 12 transmits the generated bitstream D4 to the image decoding device 20 via the network Nw.
First, in step SP201, the decoding processing unit 21 receives the bitstream D4 from the image encoding device 10 via the network Nw, generates a first image by decoding the bitstream D4, and outputs the image data D5 of the first image. The decoding processing unit 21 acquires the filter information D6 by decoding the bitstream D4.
Next, in step SP202, based on the usage information D7, the filter processing unit 22 selects one first filter set from a plurality of first filter sets indicated by the first filter set information D6a, one second filter from a plurality of second filters indicated by the second filter information D6b, and one parameter value from a plurality of parameter values indicated by the parameter value information D6c.
Next, in step SP203, the filter processing unit 22 generates the second image by applying the selected first filter set to the first image, and outputs the image data D8 of the second image.
Next, in step SP204, by using the second image indicated by the image data D8, the task processing unit 23 executes task processing in accordance with the usage information D7, and outputs the result data D9 such as an inference result.
According to the present embodiment, the first filter set applied to the first image can be dynamically changed in accordance with the image usage, such as each machine task or human vision. This enables an optimum filter set to be selected and designated in accordance with the image properties required for each image usage. Depending on the image usage, removing unnecessary information from the image to be transmitted in the bitstream D4 reduces the code amount transmitted from the image encoding device 10 to the image decoding device 20.
The first filter to be applied to the first image can be dynamically selected from the first filter set based on the feature value of the first image obtained by application of the second filter.
The present disclosure is particularly useful for application to an image processing system including an image encoding device that transmits an image and an image decoding device that receives an image.
| Number | Date | Country |
|---|---|---|
| 63342736 | May 2022 | US |

| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2023/017508 | May 2023 | WO |
| Child | 18945959 | | US |