The present invention relates to a method, a program and an apparatus for reducing data size of a plurality of images containing mutually similar information, as well as a data structure representing a plurality of images containing mutually similar information.
Research on various technologies for achieving ultra-realistic communication is currently being advanced. One such technology is 3D display technology for providing high-definition 3D displays using multi-view images. Such 3D displays are achieved with parallax images obtained by capturing images of a subject from a large number of views (e.g., 200 views).
One of the issues in putting such 3D displays into practical use is reducing the data size of multi-view images. Multi-view images contain information obtained by observing a subject from many views, resulting in an increased data size. Various proposals have been made to address this issue.
For example, NPD 1 discloses a method called adaptive distributed coding of multi-view images. More specifically, this method is based on a modulo operator: images obtained from the respective views are encoded without exchanging information with one another, while information exchange among the views is permitted in decoding.
NPD 2 and NPD 3 each disclose a method based on residuals and depths between images. These methods use one or more original views and the difference between each original view and a warped view generated from the neighboring views. This warped view is a virtual view corresponding to the original view(s), generated from the neighboring views.
Since the method disclosed in above-described NPD 1 is mainly intended to be applied to distributed source coding, distributed video frames coding, and the like, linkage among views is not taken into consideration in the encoding process. Moreover, according to the method disclosed in NPD 1, side information is utilized in the encoding process and the decoding process, but the image quality after decoding may deteriorate because the accuracy of the side information is not very high in a region having a large difference from the value of the original view. In contrast, according to the methods disclosed in NPD 2 and NPD 3, information contained in the side information for a region having a small difference from the original view cannot be recovered appropriately.
The present invention was made to solve problems as described above, and has an object to provide a method, a program and an apparatus for reducing data size of a plurality of images containing mutually similar information more efficiently, as well as a data structure representing a plurality of images containing mutually similar information.
According to an aspect of the present invention, a method for reducing data size of a plurality of images containing mutually similar information is provided. The present method includes the steps of acquiring the plurality of images and selecting, from among the plurality of images, a target image as well as a first reference image and a second reference image similar to the target image, generating a synthesized image corresponding to the target image based on the first reference image and the second reference image, generating a mask image indicating a potential error, with respect to the target image, of the synthesized image generated based on the first reference image and the second reference image, generating a residual image in accordance with a difference between the target image and the synthesized image, generating a converted image corresponding to the target image by assigning, based on the mask image, to a region where the potential error is relatively large, information on a corresponding region of the residual image, and outputting the first reference image, the second reference image and the converted image as information representing the target image, the first reference image and the second reference image.
Preferably, the present method further includes the step of generating a remainder image composed of a remainder at each pixel location, calculated by performing a modulo operation on an intensity value at each pixel location of the target image. The step of generating the converted image includes a step of assigning, based on the mask image, to a region where the potential error is relatively small, information on a corresponding region of the remainder image.
Preferably, the step of generating the converted image includes a step of comparing a value at each pixel location of the mask image with a predetermined threshold value to distinguish between a region where the potential error is relatively large and a region where the potential error is relatively small.
Preferably, the step of generating the residual image includes a step of performing bit plane conversion on an image generated from the difference between the target image and the synthesized image.
Preferably, the present method further includes the steps of acquiring the first reference image, the second reference image and the converted image having been output, generating a synthesized image corresponding to the target image based on the first reference image and the second reference image having been acquired, generating a mask image based on the first reference image and the second reference image having been acquired, and extracting information on a region of the converted image where the potential error is relatively large based on the mask image, and determining an intensity value at a corresponding pixel location of the target image based on the extracted information and the synthesized image.
More preferably, the present method further includes the step of performing an inverse modulo operation on information on a region of the converted image where the potential error is relatively small based on the mask image, thereby determining an intensity value at a corresponding pixel location of the target image.
More preferably, in the step of determining an intensity value, bit plane conversion is performed on information on a region of the converted image where the potential error is relatively large, and an image whose bit number has been increased by the bit plane conversion and the synthesized image are added to determine the intensity value.
More preferably, the step of outputting includes a step of outputting a subsampled image of the target image. The method further includes the step of replacing a region, of the target image reconstructed with the determined intensity values, where a difference from an upsampled image obtained by upsampling the subsampled image is relatively large, with a corresponding value of the upsampled image.
Preferably, the step of selecting includes steps of selecting the target image as well as the first reference image and the second reference image based on a baseline distance when the plurality of images are multi-view images, and selecting the target image as well as the first reference image and the second reference image based on a frame rate when the plurality of images represent a sequence of video frames.
According to another aspect of the present invention, a program for reducing data size of a plurality of images containing mutually similar information is provided. The program causes a computer to execute the steps of acquiring the plurality of images and selecting, from among the plurality of images, a target image as well as a first reference image and a second reference image similar to the target image, generating a synthesized image corresponding to the target image based on the first reference image and the second reference image, generating a mask image indicating a potential error, with respect to the target image, of the synthesized image generated based on the first reference image and the second reference image, generating a residual image in accordance with a difference between the target image and the synthesized image, generating a converted image corresponding to the target image by assigning, based on the mask image, to a region where the potential error is relatively large, information on a corresponding region of the residual image, and outputting the first reference image, the second reference image and the converted image as information representing the target image, the first reference image and the second reference image.
According to still another aspect of the present invention, an apparatus for reducing data size of a plurality of images containing mutually similar information is provided. The apparatus includes a means for acquiring the plurality of images and selecting, from among the plurality of images, a target image as well as a first reference image and a second reference image similar to the target image, a means for generating a synthesized image corresponding to the target image based on the first reference image and the second reference image, a means for generating a mask image indicating a potential error, with respect to the target image, of the synthesized image generated based on the first reference image and the second reference image, a means for generating a residual image in accordance with a difference between the target image and the synthesized image, a means for generating a converted image corresponding to the target image by assigning, based on the mask image, to a region where the potential error is relatively large, information on a corresponding region of the residual image, and a means for outputting the first reference image, the second reference image and the converted image as information representing the target image, the first reference image and the second reference image.
According to yet another aspect of the present invention, a data structure representing a plurality of images containing mutually similar information is provided. The data structure includes a converted image corresponding to a target image contained in the plurality of images, and a first reference image and a second reference image similar to the target image. The converted image is obtained by assigning, to a region of a synthesized image generated based on the first reference image and the second reference image where a potential error with respect to the target image is relatively large, information on a corresponding region of a residual image in accordance with a difference between the target image and the synthesized image.
According to the present invention, data size of a plurality of images containing mutually similar information can be reduced more efficiently.
An embodiment of the present invention will be described in detail with reference to the drawings. It is noted that, in the drawings, the same or corresponding portions have the same reference characters allotted, and detailed description thereof will not be repeated.
First, a typical application example will be described for easy understanding of a data size reduction method according to the present embodiment. It is noted that the application range of the data size reduction method according to the present embodiment is not limited to a structure which will be described below, but can be applied to any structure.
More specifically, 3D display reproduction system 1 includes an information processing apparatus 100 functioning as an encoder to which respective images (parallax images) are input from the plurality of cameras 10, and an information processing apparatus 200 functioning as a decoder which decodes data transmitted from information processing apparatus 100 and outputs multi-view images to 3D display device 300. Information processing apparatus 100 performs a data compression process, which will be described later, along with an encoding process, thereby generating data suitable for storage and/or transmission. As an example, information processing apparatus 100 wirelessly transmits data (compressed data) containing information on the generated multi-view images using a wireless transmission device 102 connected thereto. This wirelessly transmitted data is received by a wireless transmission device 202 connected to information processing apparatus 200 through a wireless base station 400 and the like.
3D display device 300 includes a display screen mainly composed of a diffusion film 306 and a condenser lens 308, a projector array 304 which projects multi-view images on the display screen, and a controller 302 for controlling images to be projected by respective projectors of projector array 304. Controller 302 causes a corresponding projector to project each parallax image contained in multi-view images output from information processing apparatus 200.
With such an apparatus structure, a viewer in front of the display screen is provided with a reproduced 3D display of subject 2. At this time, the parallax image entering the viewer's field of view changes depending on the relative positions of the display screen and the viewer, giving the viewer an experience as if he or she were in front of subject 2.
Such a 3D display reproduction system 1 is expected to be used for general applications in movie theaters, amusement facilities and the like, and for industrial applications such as remote medical systems, industrial design systems and electronic advertisement systems for public viewing, and the like.
Considering multi-view images, moving pictures or the like generated by capturing images of subject 2 with the camera array as shown in
The data size reduction method according to the present embodiment and the data structure used therein can be applied to multi-view data representation as described above, and can also be applied to distributed source coding. Alternatively, the data size reduction method according to the present embodiment and the data structure used therein can be applied to video frames representation, and can also be applied to distributed video frames coding. It is noted that the data size reduction method according to the present embodiment may be used alone or as part of pre-processing before data transmission.
Assuming multi-view images captured with the camera array as shown in
The unchanged image and the depth map are used to synthesize (estimate) a virtual view at the location of an image to be converted into a hybrid image. The depth map can also be utilized in a decoding process (a process of reconverting a converted image/a process of returning a converted image to original image form). In the reconversion process, the depth map for an unchanged image may be reconstructed using that unchanged image.
A converted image is generated by assigning information on a corresponding region of a residual image based on a mask image indicating a potential error, with respect to the image to be converted, of an image synthesized from an unchanged image and a depth map. More specifically, a converted image is generated by assigning, to a region of the synthesized image where the potential error with respect to the target image is relatively large, information on a corresponding region of a residual image. Typically, a mask image is generated before image conversion.
Furthermore, in a converted image, information on a corresponding region of a remainder image may be assigned to a region to which the residual image has not been assigned. That is, a converted image may be generated by combining the intensity values of a remainder image and the intensity values of a residual image in accordance with a mask image.
In this way, a converted image according to the present embodiment, which may be formed by combining a plurality of pieces of information, will be called a “hybrid image” in the following description for the sake of convenience.
A residual image is generated based on a target image and a virtual image.
A remainder image is generated using side information which is information on a virtual view at the location of an image to be converted. When input images are multi-view images, a synthesized virtual image (virtual view) is used as side information. Alternatively, a virtual image may be synthesized using an unchanged image and a depth map, and this synthesized image may be used as side information. When generating a remainder image from side information, a gradient image is generated. The value of each gradient is an integer value, and a modulo operation or an inverse modulo operation is executed using this integer value.
It is noted that a target image itself to be converted into a hybrid image may be used as side information. In this case, a synthesized virtual image and/or a subsampled image of a target image will be used as side information because the target image cannot be used as it is in a decoding process.
On the other hand, if input images represent a sequence of video frames, a frame interpolated or extrapolated between frames can be used as side information.
By the data size reduction method according to the present embodiment, a hybrid image 190 with which information on target image 170 can be reconstructed from information on proximate reference images 172 and 182 is generated, and this hybrid image 190 is output instead of target image 170. Basically, hybrid image 190 carries, out of the information possessed by target image 170, only the portion that is lacking from the information contained in reference images 172 and 182, so that redundancy can be eliminated as compared with the case of outputting target image 170 as it is. Therefore, data size can be reduced as compared with the case of outputting target image 170 and reference images 172, 182 as they are.
As will be described later, target image 170 and reference images 172, 182 can be selected at any intervals as long as they contain mutually similar information. For example, as shown in
As shown in
Target image 170 and reference images 172, 182 can also be selected at any frame intervals similarly for a sequence of video frames as long as they contain mutually similar information. For example, as shown in
The data size reduction method according to the present embodiment may be used alone or as part of pre-processing before data transmission.
It is noted that “image capturing” in the present specification may include processing of arranging some object on a virtual space and rendering an image from a view optionally set for this arranged object (that is, virtual image capturing on a virtual space) as in computer graphics, for example, in addition to processing of acquiring an image of a subject with a real camera.
In the present embodiment, cameras can be optionally arranged in the camera array for capturing images of a subject. For example, any arrangement, such as one-dimensional arrangement (where cameras are arranged on a straight line), two-dimensional arrangement (where cameras are arranged in a matrix form), circular arrangement (where cameras are arranged entirely or partially on the circumference), spiral arrangement (where cameras are arranged spirally), and random arrangement (where cameras are arranged without any rule), can be adopted.
Next, an exemplary configuration of hardware for achieving the data size reduction method according to the present embodiment will be described.
Referring to
Processor 104 reads a program stored in hard disk 110 or the like, and expands the program in memory 106 for execution, thereby achieving the encoding process according to the present embodiment. Memory 106 functions as a working memory for processor 104 to execute processing.
Camera interface 108 is connected to plurality of cameras 10, and acquires images captured by respective cameras 10. The acquired images may be stored in hard disk 110 or memory 106. Hard disk 110 holds, in a nonvolatile manner, image data 112 containing the acquired images and an encoding program 114 for achieving the encoding process and data compression process. The encoding process which will be described later is achieved by processor 104 reading and executing encoding program 114.
Input unit 116 typically includes a mouse, a keyboard and the like to accept user operations. Display unit 118 informs a user of a result of processing and the like.
Communication interface 120 is connected to wireless transmission device 102 and the like, and outputs data output as a result of processing executed by processor 104, to wireless transmission device 102.
Referring to
Processor 204, memory 206, input unit 216, and display unit 218 are similar to processor 104, memory 106, input unit 116, and display unit 118 shown in
Projector interface 208 is connected to 3D display device 300 to output multi-view images decoded by processor 204 to 3D display device 300.
Communication interface 220 is connected to wireless transmission device 202 and the like to receive image data transmitted from information processing apparatus 100 and output the image data to processor 204.
Hard disk 210 holds, in a nonvolatile manner, image data 212 containing decoded images and a decoding program 214 for achieving a decoding process. The decoding process which will be described later is achieved by processor 204 reading and executing decoding program 214.
The hardware itself and its operation principle of each of information processing apparatuses 100 and 200 shown in
All or some of functions of information processing apparatus 100 and/or information processing apparatus 200 may be implemented by using a dedicated integrated circuit such as ASIC (Application Specific Integrated Circuit) or may be implemented by using programmable hardware such as FPGA (Field-Programmable Gate Array) or DSP (Digital Signal Processor).
In a data server for managing images, for example, a single information processing apparatus may execute both the encoding process and the decoding process, as will be described later.
Next, an overall procedure of the data size reduction method according to the present embodiment will be described.
Referring to
Subsequently, processor 104 generates a synthesized image corresponding to the target image based on the set two reference images (step S102), and generates a mask image from the two reference images and their depth maps (step S104). This mask image indicates a potential error of the synthesized image generated based on the two reference images, with respect to the target image.
Subsequently, processor 104 generates a residual image from the target image and the synthesized image (step S106). The residual image is an image in accordance with a difference between the target image and the synthesized image.
Processor 104 also generates a remainder image from the target image, the synthesized image and the like (step S108). More specifically, in the processing of generating a remainder image in step S108, processor 104 generates side information based on part or all of the target image and the synthesized image (step S1081). The side information is information on a virtual view at the location of the target image, and contains information necessary for reconstructing the target image from the remainder image and the reference images. Subsequently, processor 104 generates a gradient image from the generated side information (step S1082). Then, processor 104 calculates a remainder at each pixel location from the generated gradient image (step S1083).
Subsequently, processor 104 generates a hybrid image from the mask image, the residual image and the remainder image (step S110). Finally, processor 104 at least outputs the hybrid image and the reference images as information corresponding to the target image and the reference images (step S112). That is, processor 104 outputs the two reference images and the hybrid image as information representing the target image and the two reference images.
As a decoding process, processing in steps S200 to S214 is executed. Specifically, processor 204 acquires information output as a result of the encoding process (step S200). That is, processor 204 at least acquires the two reference images and the hybrid image having been output.
Subsequently, processor 204 generates a synthesized image corresponding to the target image based on the reference images contained in the acquired information (step S202), and generates a mask image from the two reference images and their depth maps (step S204).
Subsequently, processor 204 separates the hybrid image into a residual image region and a remainder image region based on the generated mask image (step S206). Then, processor 204 reconstructs a corresponding region of the target image from the synthesized image and the separated residual image region (step S208), and reconstructs a corresponding region of the target image from the synthesized image and the separated remainder image region (step S210).
More specifically, in the processing in step S210 of reconstructing a region corresponding to the remainder image, processor 204 generates side information from the acquired information (step S2101). Subsequently, processor 204 generates a gradient image from the generated side information (step S2102). Then, processor 204 determines an intensity value at each pixel location of the target image from the side information, the gradient image and the remainder image (step S2103).
Finally, processor 204 combines the reconstructed region corresponding to the residual image and the reconstructed region corresponding to the remainder image to reconstruct the target image (step S212), and outputs the reconstructed target image and the reference images (step S214).
Next, the encoding process (steps S100 to S112 in
Image acquisition processing shown in step S100 of
Target image 170 and reference images 172, 182 must contain mutually similar information. Therefore, in the case of multi-view images, target image 170 and reference images 172, 182 are preferably selected based on their baseline distance. That is, target image 170 and reference images 172, 182 are selected in accordance with parallaxes produced therebetween. In the case of a sequence of video frames (moving picture), frames to be a target are selected based on the frame rate. That is, the processing in step S100 of
In
By the data size reduction method according to the present embodiment, a synthesized image 176 corresponding to the target image may be generated using depth maps of reference images 172 and 182, as will be described later. Therefore, a depth map 174 of reference image 172 and a depth map 184 of reference image 182 are acquired or estimated using any method.
For example, in the case of using a camera array as shown in
In
In the case where the input plurality of images are multi-view images, and when a depth map for a view cannot be used or when a distance camera cannot be used, depth information estimation unit 152 generates depth maps 174 and 184 corresponding to reference images 172 and 182, respectively. As a method for estimating a depth map in depth information estimation unit 152, various methods based on stereo matching combined with energy optimization, as disclosed in NPD 4, can be adopted. For example, the optimization can be done using graph cuts as disclosed in NPD 5.
Depth maps 174 and 184 generated by depth information estimation unit 152 are stored in depth information buffer 154.
It is noted that when the input plurality of images represent a sequence of video frames (moving picture), it is not always necessary to acquire depth maps.
The following description will mainly illustrate a case where one set of input data contains target image 170, reference image 172 and corresponding depth map 174 as well as reference image 182 and corresponding depth map 184, as a typical example.
The processing of generating a synthesized image shown in step S102 of
When the input plurality of images represent a sequence of video frames (moving picture), interpolation processing or extrapolation processing is performed based on information on frames corresponding to two reference images 172 and 182, thereby generating information on a frame corresponding to target image 170, which can be used as synthesized image 176.
The processing of generating a mask image shown in step S104 of
Subsequently, mask estimation unit 160 calculates the intensity difference (absolute value) between a projected pixel location of the transformed image and a pixel location of a corresponding reference image. That is, mask estimation unit 160 calculates the intensity difference (absolute value) between transformed image 173 and reference image 172 for the respective corresponding pixel locations, thereby generating an error image 175 (denoted as "eR"). Similarly, mask estimation unit 160 calculates the intensity difference (absolute value) between transformed image 183 and reference image 182 for the respective corresponding pixel locations, thereby generating an error image 185 (denoted as "eL"). Error images 175 and 185 indicate estimate values of errors for the right-side and left-side reference images (reference images 172 and 182), respectively.
Subsequently, binarizing processing is executed on error images 175 and 185. That is, mask estimation unit 160 compares the intensity value at each pixel location of error images 175 and 185 with a preset threshold value to distinguish between regions where the intensity value is lower than the threshold value and regions where it is higher. As a result, binarized error images 177 and 187, each having a value of "0" for pixels in regions where the intensity value is lower than the threshold value and another integer value (typically "1") for the remaining pixels, are generated from error images 175 and 185, respectively.
Furthermore, mask estimation unit 160 subjects binarized error images 177 and 187 to 3D warping into the location of target image 170 using respectively corresponding depth maps 174 and 184, thereby generating binarized error transformed images 179 and 189. Ultimately, mask estimation unit 160 generates mask image 180, representing an estimate value of the error with respect to target image 170, in which binarized error transformed image 179 and binarized error transformed image 189 are integrated. More specifically, mask estimation unit 160 obtains a logical product for respective pixel locations between binarized error transformed image 179 and binarized error transformed image 189, thereby calculating mask image 180.
It is noted that calculated mask image 180 may be subjected to a filtering process. A noise component contained in the calculated mask image can be reduced by the filtering process. For such a filtering process, various methods, such as Gaussian, Median, and morphological operations (e.g., dilation, erosion, etc.), can be adopted.
In this mask image 180, a region with a small error between reference images is indicated by “0”, and a region with a large error between reference images is indicated by another integer value (e.g., “1”). In this way, the processing of generating a mask image shown in step S104 of
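For reference, the flow just described can be sketched in Python with NumPy as follows. The 3D warping step is represented by a placeholder function, and the threshold value and function names are illustrative assumptions rather than part of the embodiment.

```python
import numpy as np

def warp_to_view(image, depth_map):
    """Placeholder for 3D warping; an actual implementation would reproject
    pixels into the destination view using the depth map and camera
    parameters."""
    return image

def estimate_mask(ref_r, ref_l, warped_r, warped_l,
                  depth_r, depth_l, threshold=10):
    # Error images eR and eL: per-pixel absolute intensity difference between
    # each reference image and the image transformed into its view.
    e_r = np.abs(ref_r.astype(np.int32) - warped_r.astype(np.int32))
    e_l = np.abs(ref_l.astype(np.int32) - warped_l.astype(np.int32))
    # Binarize: 0 where the error is below the threshold, 1 otherwise.
    b_r = (e_r >= threshold).astype(np.uint8)
    b_l = (e_l >= threshold).astype(np.uint8)
    # 3D-warp both binarized error images into the location of the target
    # image and integrate them by a per-pixel logical product (AND).
    t_r = warp_to_view(b_r, depth_r)
    t_l = warp_to_view(b_l, depth_l)
    return (t_r & t_l).astype(np.uint8)
```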
It is noted that for the processing of generating mask image 180 shown in
The processing of generating a residual image shown in step S106 of
The processing of generating a remainder image shown in step S108 of
<<e6-1: Generation of Side Information>>
The processing of generating side information shown in step S1081 of
Subsampling unit 166 generates a subsampled image 178 from target image 170. In
Any method can be adopted for the processing of generating subsampled image 178 in subsampling unit 166. For example, for every predetermined region, one piece of pixel information contained in that predetermined region can be extracted from target image 170 for output as subsampled image 178.
Alternatively, subsampled image 178 may be generated through any filtering process (e.g., nearest neighbor method, interpolation, bicubic interpolation, or bilateral filter). For example, subsampled image 178 of any size can be generated by dividing target image 170 into regions of predetermined size (e.g., 2×2 pixels, 3×3 pixels, etc.), and in each region, performing linear or non-linear interpolation processing on information on a plurality of pixels contained in that region.
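As one concrete illustration of the interpolation-based generation of subsampled image 178 described above, the following Python sketch averages each block of pixels into a single pixel; the block size and function name are assumptions chosen for illustration.

```python
import numpy as np

def subsample(target, block=2):
    # Crop so that the image size is a multiple of the block size.
    h = target.shape[0] - target.shape[0] % block
    w = target.shape[1] - target.shape[1] % block
    img = target[:h, :w].astype(np.float64)
    # Replace each block x block region by the mean of the pixels it contains.
    img = img.reshape(h // block, block, w // block, block, -1).mean(axis=(1, 3))
    return np.squeeze(img).astype(target.dtype)
```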
Typically, a method for generating side information 192 can be selected optionally from among the following four methods (a) to (d).
(a) In the case where target image 170 itself is used as side information 192:
Side information selection unit 1641 directly outputs input target image 170 as side information 192. Since target image 170 itself cannot be used as it is in a decoding process, a synthesized image generated based on the reference images is used as side information in the decoding process.
(b) In the case where subsampled image 178 of target image 170 is used as side information 192:
Side information selection unit 1641 directly outputs subsampled image 178 generated by subsampling unit 166 as side information 192.
(c) In the case where synthesized image 176 is used as side information 192:
Side information selection unit 1641 directly outputs synthesized image 176 generated by image synthesis unit 158 as side information 192.
(d) In the case where the combination of subsampled image 178 and synthesized image 176 is used as side information 192:
Side information selection unit 1641 generates side information 192 in accordance with a method which will be described later. In this case, the processing of generating side information shown in step S1081 of
More specifically, side information selection unit 1641 first calculates a weighting factor used for the combination. This weighting factor is associated with a reliability distribution of synthesized image 176 with respect to subsampled image 178 of target image 170. That is, the weighting factor is determined based on the error between synthesized image 176 and subsampled image 178 (target image 170) (or, equivalently, the degree of matching therebetween). The calculated error distribution corresponds to the inverse of the reliability distribution: the smaller the error, the higher the reliability. In a region with a larger error, the reliability of synthesized image 176 is considered to be lower, so more information on subsampled image 178 (target image 170) is assigned to such a region. On the other hand, in a region with a smaller error, the reliability of synthesized image 176 is considered to be higher, so more information on synthesized image 176 is assigned.
In this way, when the scheme (d) is selected, side information selection unit 1641 determines the error distribution based on the difference between upsampled image 198 obtained by upsampling subsampled image 178 and synthesized image 176. Side information selection unit 1641 combines subsampled image 178 (or upsampled image 198) and synthesized image 176 based on determined error distribution R, to generate side information 192. Although various methods can be considered as a method for generating side information 192 using calculated error distribution R, the following processing examples can be adopted, for example.
In this processing example, calculated error distribution R is divided into two regions using any threshold value. Typically, a region where the error is higher than the threshold value is called a Hi region, and a region where the error is smaller than the threshold value is called a Lo region. Then, information on subsampled image 178 (substantially, upsampled image 198) or synthesized image 176 is assigned to each pixel of side information 192 in correspondence with the Hi region and the Lo region of error distribution R. More specifically, the value at a pixel location of upsampled image 198 obtained by upsampling subsampled image 178 is assigned to a corresponding pixel location of side information 192 corresponding to the Hi region of error distribution R, and the value at a pixel location of synthesized image 176 is assigned to a corresponding pixel location corresponding to the Lo region of error distribution R.
That is, if upsampled image 198 (image obtained by upsampling subsampled image 178) is denoted as SS and synthesized image 176 is denoted as SY, the value at a pixel location (x, y) of side information 192 (denoted as “SI”) is expressed as follows using a predetermined threshold value TH.
SI(x,y)=SS(x,y) {if R(x,y)≧TH}
SI(x,y)=SY(x,y) {if R(x,y)<TH}
In this way, in this processing example, side information selection unit 1641 assigns information on upsampled image 198 obtained by upsampling subsampled image 178 to a region with a relatively large error, and assigns information on synthesized image 176 to a region with a relatively small error.
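A minimal Python sketch of this binary weighted combination, assuming SS (upsampled image 198), SY (synthesized image 176) and error distribution R are NumPy arrays of the same shape, and that TH is an illustrative threshold value, might look as follows.

```python
import numpy as np

def combine_binary(SS, SY, R, TH=20):
    # Hi region (error >= TH): take the value of the upsampled image.
    # Lo region (error <  TH): take the value of the synthesized image.
    return np.where(R >= TH, SS, SY)
```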
In this processing example, calculated error distribution R is divided into n types of regions using (n−1) threshold values. Numbering the divided regions k = 1, 2, . . . , n in increasing order of error, the value at the pixel location (x, y) of side information 192 (SI) is expressed as follows using the region number k.
SI(x,y)=(k/n)×SY(x,y)+(1−k/n)×SS(x,y)
In this way, in this processing example, side information selection unit 1641 assigns information on upsampled image 198 obtained by upsampling subsampled image 178 to a region with a relatively large error, and assigns information on synthesized image 176 to a region with a relatively small error.
In this processing example, the inverse of the error at each pixel location is used as a weighting factor, and side information 192 is calculated using this weight. Specifically, a value SI(x, y) at the pixel location (x, y) of side information 192 is expressed as follows.
SI(x,y)=(1/R(x,y))×SY(x,y)+(1−1/R(x,y))×SS(x,y)
In this way, in this processing example, side information selection unit 1641 assigns information on upsampled image 198 obtained by upsampling subsampled image 178 to a region with a relatively large error, and assigns information on synthesized image 176 to a region with a relatively small error. In this processing example, upsampled image 198 (subsampled image 178) is dominant as the error is larger, and synthesized image 176 is dominant as the error is smaller.
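A minimal Python sketch of this continuous weighted combination follows; clipping the error to at least 1, so that the weight 1/R stays within the range of 0 to 1, is an assumption added for illustration.

```python
import numpy as np

def combine_continuous(SS, SY, R):
    # Weight of the synthesized image: 1/R (error clipped to at least 1).
    w = 1.0 / np.maximum(R.astype(np.float64), 1.0)
    # SI = (1/R) * SY + (1 - 1/R) * SS: the synthesized image dominates where
    # the error is small, the upsampled image dominates where it is large.
    return (w * SY + (1.0 - w) * SS).astype(SS.dtype)
```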
<<e6-2: Generation of Gradient Image>>
The processing of generating a gradient image shown in step S1082 of
Typically, gradient image 196 is generated by the following procedure.
(a) Resize side information 192 to an image size of a remainder image to be output.
(b) Apply Gaussian filtering to the resized side information to remove noise (Gaussian smoothing).
(c) Split the filtered side information into color components (i.e., a gray scale image is generated for each color component).
(d) Execute operations of (d1) to (d4) for the gray scale image of each color component.
(d1) Edge detection
(d2) Gaussian smoothing (once or more) (or Median filter)
(d3) a series of morphological operations (e.g., dilation (once or more), erosion (once or more), dilation (once or more))
(d4) Gaussian smoothing (once or more)
Through the operations as described above, a gradient image is generated for each color component constituting side information 192. That is, the processing of generating gradient image 196 shown in S1082 of
The procedure described herein is merely an example, and the details of processing, procedure and the like of Gaussian smoothing and morphological operations can be designed appropriately.
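For illustration only, the example procedure above can be sketched in Python for a single color component using scipy.ndimage; resizing the side information and splitting it into color components are assumed to have been done beforehand, and the filter parameters are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def gradient_image(channel, sigma=1.0):
    """Gradient image for one color component of the (resized) side information."""
    img = ndimage.gaussian_filter(channel.astype(np.float64), sigma)  # (b) smoothing
    gx = ndimage.sobel(img, axis=1)                                   # (d1) edge detection
    gy = ndimage.sobel(img, axis=0)
    grad = np.hypot(gx, gy)
    grad = ndimage.gaussian_filter(grad, sigma)                       # (d2) smoothing
    grad = ndimage.grey_dilation(grad, size=(3, 3))                   # (d3) dilation
    grad = ndimage.grey_erosion(grad, size=(3, 3))                    #      erosion
    grad = ndimage.grey_dilation(grad, size=(3, 3))                   #      dilation
    grad = ndimage.gaussian_filter(grad, sigma)                       # (d4) smoothing
    # Integer-valued output so that a factor D can be selected per pixel.
    return np.clip(grad, 0, 255).astype(np.uint8)
```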
Furthermore, processing of generating a pseudo gradient image may be adopted. That is, any filtering process may be adopted as long as it can generate an image in which regions with a larger textural change in side information 192 have a larger intensity.
<<e6-3: Calculation of Remainder>>
The processing of calculating a remainder shown in step S1083 of
In this way, the processing of calculating a remainder shown in step S1083 of
As a method for selecting factor D, any method can be adopted. For example, the value of gradient image 196 itself may be selected as factor D. However, in order to improve the image quality after decoding, factor D is determined nonlinearly with respect to gradient image 196 in the present embodiment. Specifically, with reference to Lookup table 1644, factor D corresponding to each pixel location of gradient image 196 is selected. Here, factor D is determined for each pixel location of each color component contained in gradient image 196.
In this way, the processing of calculating a remainder shown in step S1083 of
Returning to
Modulo operation unit 1645 performs a modulo operation on the intensity value at each pixel location using corresponding factor D as a modulus. More specifically, for the intensity value P at each pixel location, the minimum m satisfying P=q×D+m (q≧0, D>0) is determined, where q is the quotient and m is the remainder.
Since "intensity value P=q′×D+m" is calculated in the processing of reconstructing target image 170 (decoding process) which will be described later, remainder m calculated at each pixel location for each color component is stored as remainder image 188. That is, remainder m at each pixel location constitutes remainder image 188.
Remainder image 188 may be resized to any size using a well-known downsampling method or upsampling method.
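A minimal Python sketch of the remainder computation follows, assuming, as in the claim-level description above, that the modulo operation is performed on the intensity value at each pixel location of the target image, and that a per-pixel factor D has already been selected from gradient image 196 (for example, by a lookup table).

```python
import numpy as np

def remainder_image(target, D):
    """target: intensity values of one color component of the target image;
    D: per-pixel factor (array of the same shape, each value > 0).
    Returns the minimum m satisfying P = q * D + m with q >= 0."""
    return np.mod(target.astype(np.int32), D).astype(np.uint8)
```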
The processing of generating a hybrid image shown in step S110 and the processing of outputting a processing result shown in step S112 of
More specifically, the processing of generating a hybrid image shown in step S110 of
Image combining unit 168 selects one of residual image 186 and remainder image 188 in accordance with the value at each pixel location of mask image 180, and outputs the value at a corresponding pixel location as an intensity value at a corresponding pixel location of hybrid image 190. More specifically, image combining unit 168 adopts the value at the corresponding pixel location of remainder image 188 if mask image 180 has a value of “0”, and adopts the value at the corresponding pixel location of residual image 186 if mask image 180 has a value other than “0”. That is, the value at each pixel location of mask image 180 indicates the magnitude of error between reference images, and if it has a value of “0” (when the error is relatively small), remainder image 188 with which a reconstructed image will have higher image quality is selected, and if it has a value other than “0” (when the error is relatively large), residual image 186 with a smaller amount of information is selected. In this way, hybrid image 190 is generated by selectively combining residual image 186 and remainder image 188 in accordance with the two regions included in mask image 180.
In this way, hybrid image 190 for representing a plurality of images containing mutually similar information is obtained by assigning, to a region of synthesized image 176 (generated based on reference images 172 and 182) where the potential error with respect to target image 170 is relatively large, information on a corresponding region of residual image 186, which is in accordance with the difference between target image 170 and synthesized image 176.
As described above, by appropriately combining residual image 186 and remainder image 188 to generate hybrid image 190, data size can be reduced, and the quality of a reconstructed image can be maintained at a more suitable level.
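For reference, the selection rule described above reduces to a single per-pixel choice, sketched below in Python; the function name is an assumption.

```python
import numpy as np

def combine_hybrid(mask, residual, remainder):
    # Adopt the remainder image where mask image 180 is 0 (small error
    # between the reference images) and the residual image elsewhere
    # (large error).
    return np.where(mask == 0, remainder, residual)
```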
As a final output of the encoding process by the data size reduction method according to the present embodiment, at least reference images 172 and 182 as input and hybrid image 190 as a processing result are stored. As an option, depth map 174 of reference image 172 and depth map 184 of reference image 182 may be output. As another option, subsampled image 178 may be output together with remainder image 188. These pieces of information (images) added as options are suitably selected in accordance with the details of processing in the decoding process.
The above description has been given paying attention to the set of one target image 170 and two reference images 172, 182. Similar processing is executed on all target images and their respectively corresponding reference images set for the plurality of input images (multi-view images or a sequence of video frames).
A processing example of a decoding process by the data size reduction method according to the present embodiment will now be described.
It is understood that even in the case of high-definition target image 170 as shown in
Furthermore, by assigning a component of residual image 186 to white portions of the mask image shown in
Next, the details of a decoding process (steps S200 to S214 of
Since it is basically inverse processing of the encoding process, the detailed description of similar processing will not be repeated.
Referring to
Information processing apparatus 200 reconstructs original target image 170 using encoded information (reference images 172, 182 and hybrid image 190). For example, as shown in
As shown in
The acquisition processing in the decoding process shown in step S200 of
On the other hand, if depth maps 174 and 184 are not input, depth information estimation unit 252 generates depth maps 174 and 184 corresponding to reference images 172 and 182, respectively. Since the method for estimating depth maps in depth information estimation unit 252 is similar to the above-described method for estimating depth maps in depth information estimation unit 152 (
The processing of generating a synthesized image shown in step S202 of
The processing of generating a mask image shown in step S204 of
Mask estimation unit 260 generates a mask image 280 indicating the magnitude of error between reference images using reference image 172 and corresponding depth map 174 as well as reference image 182 and corresponding depth map 184. Since the method for generating a mask image in mask estimation unit 260 is similar to the above-described method for generating a mask image in mask estimation unit 160 (
Region separation unit 268 separates hybrid image 190 into a residual image region 286 and a remainder image region 288 based on mask image 280. In
Specifically, if mask image 280 has a value of “0”, region separation unit 268 outputs the value at a corresponding pixel location of hybrid image 190 as the value of remainder image region 288. If mask image 280 has a value other than “0”, region separation unit 268 outputs the value at a corresponding pixel location of hybrid image 190 as the value of residual image region 286. In this way, hybrid image 190 is separated into two independent images based on mask image 280. Remainder image region 288 and residual image region 286 are generated for each color component included in hybrid image 190.
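A minimal Python sketch of this separation follows; setting the pixels outside each region to 0 is an illustrative convention, since the embodiment only requires that the two regions be processed independently afterwards.

```python
import numpy as np

def separate_regions(mask, hybrid):
    """Split hybrid image 190 into a residual image region and a remainder
    image region according to mask image 280."""
    remainder_region = np.where(mask == 0, hybrid, 0)
    residual_region = np.where(mask != 0, hybrid, 0)
    return residual_region, remainder_region
```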
The processing of reconstructing a residual image region shown in step S208 of
As described with reference to
The processing of reconstructing a remainder image region shown in step S210 of
<<f6-1: Generation of Side Information>>
The processing of generating side information shown in step S2101 of
As described above, subsampled image 178 may not be contained in input data. In this case, side information selection unit 2641 generates side information 292 based on synthesized image 276 generated by image synthesis unit 258.
On the other hand, if subsampled image 178 is contained in input data, side information selection unit 2641 may use upsampled image 272 obtained by upsampling subsampled image 178 as side information 292, or may generate side information by the combination of upsampled image 272 and synthesized image 276.
Upsampling unit 266 shown in
For such processing of generating side information by the combination of upsampled image 272 and synthesized image 276, binary weighted combination, discrete weighted combination, continuous weighted combination, or the like can be adopted using the error distribution as described above. Since these processes have been described above, a detailed description thereof will not be repeated.
<<f6-2: Generation of Gradient Image>>
The processing of generating a gradient image shown in step S2102 of
<<f6-3: Determination of Intensity Value>>
The processing of determining an intensity value at each pixel location of a target image shown in step S2103 of
In this inverse modulo operation, factor D used when generating hybrid image 190 (remainder image 188) in the encoding process is estimated (selected) based on gradient image 296. That is, factor selection unit 2643 selects factor D in accordance with the value at each pixel location of gradient image 296. Although any method can be adopted as a method for selecting this factor D, factor D at each pixel location is selected with reference to Lookup table 2644 in the present embodiment. Lookup table 2644 is similar to Lookup table 1644 (
Inverse modulo operation unit 2645 performs an inverse modulo operation using selected factor D and remainder m for each pixel location, as well as corresponding value SI of side information 292. More specifically, inverse modulo operation unit 2645 calculates a list of candidate values C(q′) for the intensity value of reconstructed image 289 in accordance with the expression C(q′)=q′×D+m (where q′≧0, C(q′)<256), and among these calculated candidate values C(q′), one with the smallest difference (absolute value) from corresponding value SI of side information 292 is determined as a corresponding intensity value of reconstructed image 289.
For example, considering the case where factor D=8, remainder m=3, and corresponding value SI of side information 292=8, candidate values C(q′) are obtained as follows:
Candidate value C(0)=0×8+3=3 (difference from SI=5)
Candidate value C(1)=1×8+3=11 (difference from SI=3)
Candidate value C(2)=2×8+3=19 (difference from SI=11)
Among these candidate values C(q′), candidate value C(1) with the smallest difference from corresponding value SI of side information 292 is selected, and the corresponding intensity value of reconstructed image 289 is determined as “11”. The intensity value at each pixel location of reconstructed image 289 is thereby determined by each color component.
In this way, the process of reconstructing a remainder image region shown in step S210 of
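A minimal Python sketch of this inverse modulo operation for a single pixel, including the worked example above, might look as follows; the function name is an assumption.

```python
def inverse_modulo(m, D, SI):
    """Reconstruct one intensity value from remainder m, factor D and the
    corresponding side-information value SI, following C(q') = q' * D + m
    with q' >= 0 and C(q') < 256."""
    candidates = [q * D + m for q in range((256 - m + D - 1) // D)]
    # Choose the candidate closest to the side-information value.
    return min(candidates, key=lambda c: abs(c - SI))

# Worked example from the text: D = 8, m = 3, SI = 8 yields candidates
# 3, 11, 19, ... and the reconstructed intensity value 11.
assert inverse_modulo(3, 8, 8) == 11
```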
The processing of reconstructing a target image shown in step S212 of
Since the residual image region and the remainder image region have been separated from hybrid image 190, reconstructed image 270 can basically be generated by simply combining reconstructed image 287 corresponding to the residual image region and reconstructed image 289 corresponding to the remainder image region. That is, for each pixel location, one of reconstructed image 287 and reconstructed image 289 contains a valid reconstructed intensity value, and the information contained in the other is invalid. Therefore, reconstructed image 270 can be generated by combining (adding) the values of reconstructed image 287 and reconstructed image 289 for each pixel location.
As the final output of the decoding process according to the present embodiment, at least reconstructed image 270 obtained as a result of processing as well as reference images 172 and 182 as input are output and/or stored. As an option, depth map 174 of reference image 172 and depth map 184 of reference image 182 may be output. Furthermore, reconstructed image 270 may be resized to any size depending on the difference in size from original target image 170 and/or two reference images 172 and 182.
Although the above-described description has been given paying attention to the set of one target image 170 and two reference images 172 and 182, a similar process is executed on all target images and respective corresponding reference images set for a plurality of input images (multi-view images or a sequence of video frames).
The data size reduction method according to the embodiment described above may be varied as will be described below.
In the decoding process by the data size reduction method according to the embodiment described above, reconstructed image 270 is generated by combining reconstructed image 287 corresponding to the residual image region and reconstructed image 289 corresponding to the remainder image region. However, local noise due to some error may be produced in an image. In such a case, element combining unit 274 (
More specifically, element combining unit 274 calculates the intensity difference (absolute value) at corresponding pixel locations between the reconstructed image obtained by combining reconstructed image 287 and reconstructed image 289, and upsampled image 272 obtained by upsampling subsampled image 178. If there is a pixel for which this calculated intensity difference exceeds a predetermined threshold value, element combining unit 274 replaces the value of the generated reconstructed image at that pixel with the value of the corresponding pixel of upsampled image 272. Then, the reconstructed image after replacement is output finally. That is, the method according to the present variation includes processing of replacing a region, of the target image reconstructed with the determined intensity values, where the difference from upsampled image 272 obtained by upsampling subsampled image 178 is relatively large, with a corresponding value of upsampled image 272.
By performing compensation (i.e. replacement of intensity values) using upsampled image 272 in this way, local noise and the like that may be produced in reconstructed image 270 can be reduced.
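A minimal Python sketch of this replacement-based compensation, with an illustrative threshold value, might look as follows.

```python
import numpy as np

def compensate(reconstructed, upsampled, threshold=30):
    """Replace pixels of the combined reconstructed image whose difference
    from upsampled image 272 exceeds the threshold by the corresponding
    value of the upsampled image."""
    diff = np.abs(reconstructed.astype(np.int32) - upsampled.astype(np.int32))
    return np.where(diff > threshold, upsampled, reconstructed)
```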
Although the above-described embodiment illustrates processing of selecting factor D from among a plurality of candidates when generating a remainder image, the processing of selecting factor D may be simplified further. For example, a gradient image may be generated for each gray scale image, and factor D may be determined stepwise by thresholding this gradient image. For example, in the case of an 8-bit image, the threshold values for the gradient image may be set at "4", "1" and "0", respectively, such that "32" is selected as factor D if the gradient has a value of "4" or more, "128" is selected if the gradient has a value of "1" or more, and "256" is selected if the gradient has a value of "0".
A user can optionally set the combination of a threshold value for a gradient image and corresponding factor D.
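For illustration, the stepwise selection described above, using the example thresholds "4", "1" and "0" and factors "32", "128" and "256" for an 8-bit image, might be sketched in Python as follows.

```python
import numpy as np

def select_factor(gradient):
    """Stepwise factor selection for an 8-bit image with the example
    thresholds given above."""
    D = np.full(gradient.shape, 256, dtype=np.int32)  # gradient value of 0
    D[gradient >= 1] = 128
    D[gradient >= 4] = 32
    return D
```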
Although in the above-described embodiment a remainder image containing a remainder calculated for each pixel in accordance with selected factor D is generated, the remainder may be determined stepwise. For example, a corresponding value from a predetermined set of remainders may be selected depending on which range each remainder obtained by performing a modulo operation using a certain factor D falls into. For example, when factor D is "64", the threshold value for the remainder may be set at "16": if the calculated remainder is "16" or more, "32" may be output as the final remainder, and if the calculated remainder is less than "16", "0" may be output as the final remainder.
Threshold values and remainders to be output may be determined in three or more levels. By using such a remainder image, the amount of information can be reduced further.
Although a residual image having an n-bit gradation value is generated in the above-described embodiment, the residual may be determined stepwise. For example, when a calculated residual is relatively large, the calculated residual itself may be used to maintain the reconstructed image quality, while when the calculated residual is relatively small, a predetermined value may be set because the image reconstruction processing is little affected.
For example, when the calculated residual is more than or equal to a predetermined threshold value, the calculated residual itself may be used, while when it is less than the threshold value, a predetermined value (e.g., 128) may be used.
The above-described embodiment illustrates the processing example of generating a hybrid image from a residual image and a remainder image and converting a target image into this hybrid image. Although the remainder image is used for reconstructing a region with a small error between reference images, such a region can also be interpolated by side information. Therefore, a hybrid image not containing a remainder image may be adopted.
Such a hybrid image contains information effective only for a residual image region (residual image 186), and does not contain information on a remainder image region (remainder image 188). When reconstructing a target image, the residual image region is reconstructed from a hybrid image, and the remainder image region is reconstructed from side information. This side information is generated based on reference images 172 and 182 as well as subsampled image 178 of target image 170 output together with the hybrid image, and the like.
By adopting such a hybrid image, data size can be reduced further.
According to the present embodiment, reconstruction to a higher quality image can be achieved as compared with an encoding process using only a residual image or only a remainder image. In addition, data size can be reduced further as compared with the encoding process using only a residual image or only a remainder image.
The present embodiment is applicable to various applications for image processing systems, such as data representation of multi-view images or a new data format before image compression.
According to the present embodiment, more efficient representation can be derived using a remainder-based data format for large-scale multi-view images.
Moreover, the converted data format can be used for devices with small power capacity, such as mobile devices. Therefore, according to the present embodiment, the possibility of providing 3D features more easily on mobile devices or low power consumption devices can be increased.
It should be understood that the embodiment disclosed herein is illustrative and non-restrictive in every respect. The scope of the present invention is defined by the claims not by the description above, and is intended to include any modification within the meaning and scope equivalent to the terms of the claims.
Priority application: 2012-227261, Oct 2012, JP (national).
International filing: PCT/JP2013/077515, 10/9/2013, WO.