The present invention relates to a method, a program and an apparatus for reducing data size of a plurality of images containing mutually similar information.
Research on various technologies for achieving ultra-realistic communication is currently advancing. One such technology is 3D display technology, which provides high-definition 3D displays using multi-view images. Such 3D displays are achieved with parallax images obtained by capturing a subject from a large number of views (e.g., 200 views).
One issue in putting such 3D displays into practical use is the reduction of the data size of multi-view images. Multi-view images contain information obtained by observing a subject from many views, resulting in a large data size. Various proposals have been made to address this issue.
For example, NPD 1 discloses a method called adaptive distributed coding of multi-view images. More specifically, this method is based on a modulo operator: images obtained from the respective views are encoded without exchanging mutual information, while in decoding, information exchange among the views is permitted. In other words, the method disclosed in NPD 1 is mainly intended to be applied to distributed source coding, distributed video frames coding, and the like; linkage among views is not taken into consideration in the encoding process. This is because the method disclosed in NPD 1 is mainly directed to devices of low power consumption (e.g., mobile terminals) that do not have very high throughput.
According to the method disclosed in NPD 1, side information is utilized in an encoding process and a decoding process. As this side information, an encoder uses an original image, and a decoder uses a subsampled image or a virtual image, or a combination thereof.
The inventors of the present application have acquired new knowledge that, by causing mutually similar images to exchange information, the image quality after decoding can be improved and the application range can be widened. With the conventionally proposed methods, however, information exchange is not performed between mutually similar images in the encoder and the decoder. As a result, it is entirely unknown how such processing should be optimized.
The present invention was made to solve problems as described above, and has an object to provide a method, a program and an apparatus for reducing data size of a plurality of images containing mutually similar information more efficiently.
According to an aspect of the present invention, a method for reducing data size of a plurality of images containing mutually similar information is provided. The present method includes the steps of acquiring the plurality of images, and selecting, from among the plurality of images, a target image as well as a first reference image and a second reference image similar to the target image, generating a synthesized image corresponding to the target image based on the first reference image and the second reference image, generating side information which is information on a virtual view at a location of the target image, based on at least one of the target image and the synthesized image, generating a gradient image based on the side information, determining a factor in accordance with a gradient for each pixel location of the gradient image, and performing a modulo operation using, as a modulus, a factor corresponding to an intensity value at each pixel location of the target image, to generate a remainder image composed of remainders of respective pixel locations calculated by the modulo operation, and outputting the first reference image, the second reference image and the remainder image as information representing the target image, the first reference image and the second reference image.
Preferably, the step of generating side information includes a step of combining a subsampled image of the target image and the synthesized image to generate the side information.
More preferably, the step of generating side information includes steps of determining an error distribution based on a difference between an image obtained by upsampling the subsampled image and the synthesized image, and assigning information on the image obtained by upsampling the subsampled image to a region with a relatively large error, and assigning information on the synthesized image to a region with a relatively small error.
Alternatively, more preferably, the step of generating side information includes steps of determining an error distribution based on a difference between an image obtained by upsampling the subsampled image and the synthesized image, and assigning more information on the image obtained by upsampling the subsampled image to a region with a relatively large error, and assigning more information on the synthesized image to a region with a relatively small error.
Preferably, the step of generating a gradient image includes a step of generating an image in which a region in the side information with a larger textural change has a larger intensity.
Preferably, the step of generating a gradient image includes a step of generating a gradient image by each color component constituting the side information.
More preferably, the step of generating a gradient image includes a step of applying edge detection, smoothing, a series of morphological operations, and smoothing sequentially to a gray scale image of each color component constituting the side information.
Preferably, the step of generating a remainder image includes a step of selecting a factor corresponding to the gradient with reference to predetermined correspondence.
Preferably, in the step of generating a remainder image, a factor is determined for each pixel location of the gradient image by each color component.
Preferably, the step of selecting includes steps of selecting the target image as well as the first reference image and the second reference image based on a baseline distance when the plurality of images are multi-view images, and selecting the target image as well as the first reference image and the second reference image based on a frame rate when the plurality of images represent a sequence of video frames.
Preferably, the present method further includes the steps of acquiring the first reference image, the second reference image and the remainder image having been output, generating a synthesized image corresponding to the target image based on the first reference image and the second reference image, generating side information based on acquired information and generating a gradient image based on the side information, and determining a factor in accordance with the gradient for each pixel location of the gradient image, and among candidate values calculated by an inverse modulo operation using the determined factor as a modulus and a value at a corresponding pixel location of the remainder image as a remainder, determining one with the smallest difference from a value at a corresponding pixel location of the side information as an intensity value at a corresponding pixel location of the target image.
According to another aspect of the present invention, a program for reducing data size of a plurality of images containing mutually similar information is provided. The program causes a computer to execute the steps of acquiring the plurality of images, and selecting, from among the plurality of images, a target image as well as a first reference image and a second reference image similar to the target image, generating a synthesized image corresponding to the target image based on the first reference image and the second reference image, generating side information which is information on a virtual view at a location of the target image, based on at least one of the target image and the synthesized image, generating a gradient image based on the side information, determining a factor in accordance with a gradient for each pixel location of the gradient image, and performing a modulo operation using, as a modulus, a factor corresponding to an intensity value at each pixel location of the target image, to generate a remainder image composed of remainders of respective pixel locations calculated by the modulo operation, and outputting the first reference image, the second reference image and the remainder image as information representing the target image, the first reference image and the second reference image.
According to still another aspect of the present invention, an apparatus for reducing data size of a plurality of images containing mutually similar information is provided. The apparatus includes a means for acquiring the plurality of images, and selecting, from among the plurality of images, a target image as well as a first reference image and a second reference image similar to the target image, a means for generating a synthesized image corresponding to the target image based on the first reference image and the second reference image, a means for generating side information which is information on a virtual view at a location of the target image, based on at least one of the target image and the synthesized image, a means for generating a gradient image based on the side information, a means for determining a factor in accordance with a gradient for each pixel location of the gradient image and performing a modulo operation using, as a modulus, a factor corresponding to an intensity value at each pixel location of the target image, to generate a remainder image composed of remainders of respective pixel locations calculated by the modulo operation, and a means for outputting the first reference image, the second reference image and the remainder image as information representing the target image, the first reference image and the second reference image.
According to the present invention, data size of a plurality of images containing mutually similar information can be reduced more efficiently.
An embodiment of the present invention will be described in detail with reference to the drawings. It is noted that, in the drawings, the same or corresponding portions have the same reference characters allotted, and detailed description thereof will not be repeated.
[A. Application Example]
First, a typical application example will be described for easy understanding of a data size reduction method according to the present embodiment. It is noted that the application range of the data size reduction method according to the present embodiment is not limited to a structure which will be described below, but can be applied to any structure.
More specifically, a 3D display reproduction system 1 includes an information processing apparatus 100 functioning as an encoder, to which respective images (parallax images) are input from a plurality of cameras 10, and an information processing apparatus 200 functioning as a decoder, which decodes the data transmitted from information processing apparatus 100 and outputs multi-view images to a 3D display device 300. Information processing apparatus 100 performs a data compression process, which will be described later, along with an encoding process, thereby generating data suitable for storage and/or transmission. As an example, information processing apparatus 100 wirelessly transmits data (compressed data) containing information on the generated multi-view images using a wireless transmission device 102 connected thereto. This wirelessly transmitted data is received by a wireless transmission device 202 connected to information processing apparatus 200 through a wireless base station 400 and the like.
3D display device 300 includes a display screen mainly composed of a diffusion film 306 and a condenser lens 308, a projector array 304 which projects multi-view images on the display screen, and a controller 302 for controlling images to be projected by respective projectors of projector array 304. Controller 302 causes a corresponding projector to project each parallax image contained in multi-view images output from information processing apparatus 200.
With such an apparatus structure, a viewer in front of the display screen is provided with a reproduced 3D display of subject 2. At this time, the parallax image entering the viewer's field of view changes depending on the relative positions of the display screen and the viewer, giving the viewer an experience as if he or she were in front of subject 2.
Such a 3D display reproduction system 1 is expected to be used for general applications in movie theaters, amusement facilities and the like, and for industrial applications such as remote medical systems, industrial design systems and electronic advertisement systems including public viewing.
[B. Overview]
Consider multi-view images, moving pictures or the like generated by capturing images of subject 2 with the camera array as described above; such a plurality of images contain mutually similar information.
The data size reduction method according to the present embodiment can be applied to multi-view data representation as described above, and can also be applied to distributed source coding. Alternatively, the data size reduction method according to the present embodiment can be applied to video frames representation, and can also be applied to distributed video frames coding. It is noted that the data size reduction method according to the present embodiment may be used alone or as part of pre-processing before data transmission.
Assume multi-view images captured with the camera array as described above. Among these multi-view images, some images are left unchanged while the others are converted into remainder images, and a depth map is acquired for each unchanged image.
The unchanged image and the depth map are used to synthesize (estimate) a virtual view at the location of an image to be converted into a remainder image. This depth map can also be utilized in a decoding process (a process of reconverting a converted image/a process of returning a converted image to original image form). In the reconversion process, the depth map for an unchanged image may be reconstructed using that unchanged image.
In the present embodiment, side information which is information on a virtual view at the location of an image to be converted is used for generating a remainder image. When input images are multi-view images, a synthesized virtual image (virtual view) is used as side information. Alternatively, a virtual image may be synthesized using an unchanged image and a depth map, and this synthesized virtual image may be used as side information.
Furthermore, in the encoding process, the target image itself to be converted into a remainder image may be used as side information. In that case, since the target image cannot be used as it is in the decoding process, a synthesized virtual image and/or a subsampled image of the target image is used as side information on the decoding side.
On the other hand, if input images represent a sequence of video frames, a frame interpolated or extrapolated between frames can be used as side information.
When a remainder image is generated from side information, a gradient image is generated first. An integer factor is then determined in accordance with the gradient value at each pixel location, and a modulo operation or an inverse modulo operation is executed using this integer as a modulus.
By the data size reduction method according to the present embodiment, a remainder image 194, from which information on target image 170 can be reconstructed together with information on proximate reference images 172 and 182, is generated, and this remainder image 194 is output instead of target image 170. Basically, remainder image 194 carries only that part of the information possessed by target image 170 which is lacking in reference images 172 and 182, so that redundancy can be eliminated as compared with the case of outputting target image 170 as it is. Therefore, data size can be reduced as compared with the case of outputting target image 170 and reference images 172, 182 as they are.
As will be described later, target image 170 and reference images 172, 182 can be selected at any intervals as long as they contain mutually similar information.
Target image 170 and reference images 172, 182 can also be selected at any frame intervals for a sequence of video frames, as long as they contain mutually similar information.
It is noted that “image capturing” in the present specification may include processing of arranging some object on a virtual space and rendering an image from a view optionally set for this arranged object (that is, virtual image capturing on a virtual space) as in computer graphics, for example, in addition to processing of acquiring an image of a subject with a real camera.
In the present embodiment, cameras can be optionally arranged in the camera array for capturing images of a subject. For example, any arrangement, such as one-dimensional arrangement (where cameras are arranged on a straight line), two-dimensional arrangement (where cameras are arranged in a matrix form), circular arrangement (where cameras are arranged entirely or partially on the circumference), spiral arrangement (where cameras are arranged spirally), and random arrangement (where cameras are arranged without any rule), can be adopted.
[C. Hardware Configuration]
Next, an exemplary configuration of hardware for achieving the data size reduction method according to the present embodiment will be described.
Information processing apparatus 100 functioning as the encoder includes, as main hardware components, a processor 104, a memory 106, a camera interface 108, a hard disk 110, an input unit 116, a display unit 118, and a communication interface 120.
Processor 104 reads a program stored in hard disk 110 or the like, and expands the program in memory 106 for execution, thereby achieving the encoding process according to the present embodiment. Memory 106 functions as a working memory for processor 104 to execute processing.
Camera interface 108 is connected to plurality of cameras 10, and acquires images captured by respective cameras 10. The acquired images may be stored in hard disk 110 or memory 106. Hard disk 110 holds, in a nonvolatile manner, image data 112 containing the acquired images and an encoding program 114 for achieving the encoding process and data compression process. The encoding process which will be described later is achieved by processor 104 reading and executing encoding program 114.
Input unit 116 typically includes a mouse, a keyboard and the like to accept user operations. Display unit 118 informs a user of a result of processing and the like.
Communication interface 120 is connected to wireless transmission device 102 and the like, and outputs data output as a result of processing executed by processor 104, to wireless transmission device 102.
Information processing apparatus 200 functioning as the decoder includes, as main hardware components, a processor 204, a memory 206, a projector interface 208, a hard disk 210, an input unit 216, a display unit 218, and a communication interface 220.
Processor 204, memory 206, input unit 216, and display unit 218 are similar to processor 104, memory 106, input unit 116, and display unit 118 described above, and thus a detailed description thereof will not be repeated.
Projector interface 208 is connected to 3D display device 300 to output multi-view images decoded by processor 204 to 3D display device 300.
Communication interface 220 is connected to wireless transmission device 202 and the like to receive image data transmitted from information processing apparatus 100 and output the image data to processor 204.
Hard disk 210 holds, in a nonvolatile manner, image data 212 containing decoded images and a decoding program 214 for achieving a decoding process. The decoding process which will be described later is achieved by processor 204 reading and executing decoding program 214.
The hardware itself of each of information processing apparatuses 100 and 200 described above, and its operation principle, are common and well known; the essential portion for achieving the encoding process and the decoding process according to the present embodiment is software such as encoding program 114 and decoding program 214.
All or some of functions of information processing apparatus 100 and/or information processing apparatus 200 may be implemented by using a dedicated integrated circuit such as ASIC (Application Specific Integrated Circuit) or may be implemented by using programmable hardware such as FPGA (Field-Programmable Gate Array) or DSP (Digital Signal Processor).
In a data server for managing images, for example, a single information processing apparatus will execute the encoding process and decoding process, as will be described later.
[D. Overall Procedure]
Next, an overall procedure of the data size reduction method according to the present embodiment will be described.
As the encoding process, processing in steps S100 to S110 is executed. Specifically, processor 104 acquires the plurality of images and selects, from among them, a target image as well as a first reference image and a second reference image similar to the target image (step S100). Subsequently, processor 104 generates a synthesized image corresponding to the target image based on the first reference image and the second reference image (step S102).
Subsequently, processor 104 generates side information based on part or all of the target image and the synthesized image (step S104). That is, processor 104 generates side information which is information on a virtual view at the location of the target image based on at least one of the target image and the synthesized image. The side information contains information necessary for reconstructing the target image from the remainder image and the reference images.
Subsequently, processor 104 generates a gradient image from the generated side information (step S106). Then, processor 104 calculates a remainder image of the target image from the generated gradient image (step S108).
Finally, processor 104 at least outputs the remainder image and the reference images as information corresponding to the target image and the reference images (step S110). That is, processor 104 outputs the two reference images and the remainder image as information representing the target image and the two reference images.
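By way of illustration only, the encoding flow of steps S100 to S110 can be sketched in Python as follows. All component choices here (averaging the two reference images as the synthesized image, a simple numpy gradient, a two-level factor lookup) are simplified stand-ins assumed for this sketch rather than the units of the embodiment, and grayscale uint8 images are assumed.

```python
import numpy as np

def encode(target, ref1, ref2):
    """Sketch of steps S100-S110 for grayscale uint8 images (stand-in components)."""
    # S102: synthesized image (stand-in: average of the two reference images)
    synthesized = (ref1.astype(np.int32) + ref2.astype(np.int32)) // 2
    side_info = synthesized                              # S104: use the synthesized image
    gy, gx = np.gradient(side_info.astype(np.float64))   # S106: gradient image (stand-in)
    grad = np.hypot(gx, gy)
    D = np.where(grad >= 8, 16, 4).astype(np.int32)      # factor lookup (assumed values)
    remainder = np.mod(target.astype(np.int32), D)       # S108: modulo operation, keep m only
    return ref1, ref2, remainder                         # S110: output instead of the target
```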
As a decoding process, processing in steps S200 to S210 is executed. Specifically, processor 204 acquires information output as a result of the encoding process (step S200). That is, processor 204 at least acquires the two reference images and the remainder image having been output.
Subsequently, processor 204 generates a synthesized image corresponding to the target image from the reference images contained in the acquired information (step S202).
Subsequently, processor 204 generates side information from the acquired information (step S204). Then, processor 204 generates a gradient image from the generated side information (step S206).
Then, processor 204 reconstructs the target image from the side information, the gradient image and the remainder image (step S208). Finally, processor 204 outputs the reconstructed target image and the reference images (step S210).
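A matching decoder sketch for steps S200 to S210, under the same assumptions, is shown below. It must regenerate the same side information, gradient image and factors as the encoder sketch above, and it picks the candidate q′×D+m closest to the side information in closed form instead of enumerating candidates.

```python
import numpy as np

def decode(ref1, ref2, remainder):
    """Sketch of steps S200-S210, mirroring the encoder's stand-in components."""
    synthesized = (ref1.astype(np.int32) + ref2.astype(np.int32)) // 2  # S202
    side_info = synthesized                                             # S204
    gy, gx = np.gradient(side_info.astype(np.float64))                  # S206
    grad = np.hypot(gx, gy)
    D = np.where(grad >= 8, 16, 4).astype(np.int64)   # same assumed lookup as the encoder
    m = remainder.astype(np.int64)
    # S208: among candidates q'*D + m, take the one closest to the side information
    q = np.clip(np.round((side_info - m) / D), 0, (255 - m) // D)
    return (q * D + m).astype(np.uint8)
```

In this sketch, reconstruction is exact at every pixel where the side information deviates from the original intensity by less than half of factor D, which is why regions where the side information is less reliable call for a larger factor.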
[E. Encoding Process]
Next, the encoding process (steps S100 to S110) according to the present embodiment will be described in detail.
(e1: Functional Configuration)
(e2: Acquisition of Input Image and Depth Map)
The image acquisition processing shown in step S100 acquires the plurality of input images and selects, from among them, target image 170 as well as reference images 172 and 182.
Target image 170 and reference images 172, 182 must contain mutually similar information. Therefore, in the case of multi-view images, target image 170 and reference images 172, 182 are preferably selected based on their baseline distances. That is, target image 170 and reference images 172, 182 are selected in accordance with the parallaxes produced between them. In the case of a sequence of video frames (moving picture), the frames to be targets are selected based on the frame rate. That is, the processing in step S100 includes processing of selecting the target image as well as the first reference image and the second reference image based on a baseline distance when the plurality of images are multi-view images, and based on a frame rate when the plurality of images represent a sequence of video frames.
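As one hypothetical illustration of such a selection rule, the sketch below takes every s-th view as a reference image and treats the views lying between two adjacent reference images as target images; the stride s plays the role of the baseline-distance (or, for video frames, the frame-rate) criterion. The function name and stride value are assumptions for illustration.

```python
def select_triples(num_views, s=2):
    """Return (target, ref1, ref2) view indices; views at multiples of s
    serve as reference images, the views between them as target images."""
    triples = []
    for left in range(0, num_views - s, s):
        right = left + s
        for t in range(left + 1, right):
            triples.append((t, left, right))
    return triples

print(select_triples(7))  # [(1, 0, 2), (3, 2, 4), (5, 4, 6)]
```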
By the data size reduction method according to the present embodiment, a synthesized image 176 corresponding to the target image may be generated using depth maps of reference images 172 and 182, as will be described later. Therefore, a depth map 174 of reference image 172 and a depth map 184 of reference image 182 are acquired using any method.
For example, in the case of using a camera array as described above, a depth map for each view may be acquired directly with a distance camera disposed proximate to the corresponding camera.
When the input plurality of images are multi-view images and a depth map for a view cannot be used, or a distance camera cannot be used, depth information estimation unit 152 generates depth maps 174 and 184 corresponding to reference images 172 and 182, respectively. As a method for estimating depth maps in depth information estimation unit 152, various methods based on stereo matching combined with energy optimization as disclosed in NPD 2 can be adopted. For example, the optimization can be performed using graph cuts as disclosed in NPD 3.
Depth maps 174 and 184 generated by depth information estimation unit 152 are stored in depth information buffer 154.
It is noted that when the input plurality of images represent a sequence of video frames (moving picture), it is not always necessary to acquire depth maps.
The following description will mainly illustrate a case where one set of input data contains target image 170, reference image 172 and corresponding depth map 174 as well as reference image 182 and corresponding depth map 184, as a typical example.
(e3: Generation of Synthesized Image)
The processing of generating a synthesized image shown in step S102 is executed by image synthesis unit 158. More specifically, image synthesis unit 158 generates synthesized image 176 corresponding to target image 170, based on reference images 172 and 182 as well as corresponding depth maps 174 and 184.
When the input plurality of images represent a sequence of video frames (moving picture), interpolation processing or extrapolation processing is performed based on information on frames corresponding to two reference images 172 and 182, thereby generating information on a frame corresponding to target image 170, which can be used as synthesized image 176.
(e4: Generation of Side Information)
The processing of generating side information shown in step S104 is executed by subsampling unit 156 and side information selection unit 160, which generate side information 190 based on at least one of target image 170 and synthesized image 176.
Subsampling unit 156 generates a subsampled image 178 from target image 170.
Any method can be adopted for the processing of generating subsampled image 178 in subsampling unit 156. For example, pixel information can be extracted from target image 170 at every predetermined interval for output as subsampled image 178.
Alternatively, subsampled image 178 may be generated through any filtering process (e.g., the nearest neighbor method, bilinear interpolation, bicubic interpolation, or a bilateral filter). For example, subsampled image 178 of any size can be generated by dividing target image 170 into regions of predetermined size (e.g., 2×2 pixels, 3×3 pixels, etc.), and in each region, performing linear or non-linear interpolation processing on the information on the plurality of pixels contained in that region.
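A minimal sketch of this block-based variant, assuming grayscale uint8 images: block averaging stands in for the linear interpolation processing mentioned above, and nearest-neighbor replication serves as the upsampling used later to obtain upsampled image 179.

```python
import numpy as np

def subsample(img, b=2):
    """Reduce each b x b region of the target image to one pixel (block average)."""
    h, w = (img.shape[0] // b) * b, (img.shape[1] // b) * b
    blocks = img[:h, :w].astype(np.float64).reshape(h // b, b, w // b, b)
    return blocks.mean(axis=(1, 3)).astype(np.uint8)

def upsample(small, b=2):
    """Nearest-neighbor upsampling back to the original grid (upsampled image 179)."""
    return np.repeat(np.repeat(small, b, axis=0), b, axis=1)
```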
Typically, a method for generating side information 190 can be selected optionally from among the following four methods (a) to (d).
(a) In the case where target image 170 itself is used as side information 190:
Side information selection unit 160 directly outputs input target image 170 as side information 190. Since target image 170 cannot be used as it is in the decoding process, a synthesized image generated from the reference images is used as side information on the decoding side.
(b) In the case where subsampled image 178 of target image 170 is used as side information 190:
Side information selection unit 160 directly outputs subsampled image 178 generated by subsampling unit 156.
(c) In the case where synthesized image 176 is used as side information 190:
Side information selection unit 160 directly outputs synthesized image 176 generated by image synthesis unit 158.
(d) In the case where the combination of subsampled image 178 and synthesized image 176 is used as side information 190:
Side information selection unit 160 generates side information 190 in accordance with a method which will be described later. That is, the processing of generating side information shown in step S104 includes processing of combining subsampled image 178 of target image 170 and synthesized image 176 to generate side information 190.
More specifically, side information selection unit 160 first calculates a weighting factor used for the combination. This weighting factor is associated with the reliability distribution of synthesized image 176 with respect to subsampled image 178 of target image 170. That is, the weighting factor is determined based on the error between synthesized image 176 and subsampled image 178 (target image 170) (or the degree of matching between them). The calculated error distribution is equivalent to the inverse of the reliability distribution: the smaller the error, the higher the reliability. In a region with a larger error, the reliability of synthesized image 176 is considered lower, so more information on subsampled image 178 (target image 170) is assigned to such a region. Conversely, in a region with a smaller error, the reliability of synthesized image 176 is considered higher, so more information on synthesized image 176, which has lower redundancy, is assigned.
In this way, when the scheme (d) is selected, side information selection unit 160 determines the error distribution based on the difference between upsampled image 179 having been obtained by upsampling subsampled image 178 and synthesized image 176. Side information selection unit 160 combines subsampled image 178 (or upsampled image 179) and synthesized image 176 based on determined error distribution R, to generate side information 190. Although various methods can be considered as a method for generating side information 190 using calculated error distribution R, the following processing examples can be adopted, for example.
(i) Processing Example 1: Binary Weighted Combination
In this processing example, calculated error distribution R is divided into two regions using any threshold value. Typically, a region where the error is higher than the threshold value is called a Hi region, and a region where the error is smaller than the threshold value is called a Lo region. Then, information on subsampled image 178 (substantially, upsampled image 179) or synthesized image 176 is assigned to each pixel of side information 190 in correspondence with the Hi region and the Lo region of error distribution R. More specifically, the value at a pixel location of upsampled image 179 having been obtained by upsampling subsampled image 178 is assigned to a corresponding pixel location of side information 190 corresponding to the Hi region of error distribution R, and the value at a pixel location of synthesized image 176 is assigned to a corresponding pixel location corresponding to the Lo region of error distribution R.
That is, if upsampled image 179 (the image obtained by upsampling subsampled image 178) is denoted as SS and synthesized image 176 is denoted as SY, the value at a pixel location (x, y) of side information 190 (denoted as "SI") is expressed as follows using a predetermined threshold value TH.
SI(x, y)=SS(x, y) (if R(x, y)>TH)
SI(x, y)=SY(x, y) (if R(x, y)≦TH)
In this way, in this processing example, side information selection unit 160 assigns information on upsampled image 179 having been obtained by upsampling subsampled image 178 to a region with a relatively large error, and assigns information on synthesized image 176 to a region with a relatively small error.
(ii) Processing Example 2: Discrete Weighted Combination
In this processing example, calculated error distribution R is divided into n types of regions using (n−1) threshold values. Assuming the numbers k of the divided regions to be 1, 2, . . . , and n in descending order of error (k=1 for the region with the largest error, k=n for the region with the smallest error), the value at the pixel location (x, y) of side information 190 (SI) is expressed as follows using the region number k.
SI(x, y)=(k/n)×SY(x, y)+(1−k/n)×SS(x, y)
In this way, in this processing example, side information selection unit 160 assigns more information on upsampled image 179 having been obtained by upsampling subsampled image 178 to a region with a relatively large error, and assigns more information on synthesized image 176 to a region with a relatively small error.
(iii) Processing Example 3: Continuous Weighted Combination
In this processing example, an inverse value of the error at a pixel location is considered as a weighting factor, and side information 190 is calculated using this. Specifically, a value SI(x, y) at the pixel location (x, y) of side information 190 is expressed as follows.
SI(x, y)=(1/R(x, y))×SY(x, y)+(1−1/R(x, y))×SS(x, y)
In this way, in this processing example, side information selection unit 160 assigns more information on upsampled image 179 having been obtained by upsampling subsampled image 178 to a region with a relatively large error, and assigns more information on synthesized image 176 to a region with a relatively small error. In this processing example, upsampled image 179 (subsampled image 178) becomes more dominant as the error increases, and synthesized image 176 becomes more dominant as the error decreases.
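The three processing examples can be summarized in one sketch. SS denotes the upsampled image and SY the synthesized image as above; the region numbering via a scaled maximum in the discrete case and the clamping of the weight 1/R to at most 1 in the continuous case are assumptions made here so that all weights stay within [0, 1].

```python
import numpy as np

def combine_side_information(up, syn, mode="continuous", TH=8.0, n=4):
    """Side information SI from upsampled image SS and synthesized image SY,
    weighted by the error distribution R = |SS - SY|."""
    SS, SY = up.astype(np.float64), syn.astype(np.float64)
    R = np.abs(SS - SY)                    # error distribution (inverse of reliability)
    if mode == "binary":                   # Processing Example 1
        return np.where(R > TH, SS, SY)
    if mode == "discrete":                 # Processing Example 2: k = 1 for the largest
        scaled = R / (R.max() + 1e-9)      # errors, k = n for the smallest
        k = n - np.minimum((scaled * n).astype(int), n - 1)
        return (k / n) * SY + (1 - k / n) * SS
    w = 1.0 / np.maximum(R, 1.0)           # Processing Example 3: weight 1/R, clamped to <= 1
    return w * SY + (1.0 - w) * SS
```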
(e5: Generation of Gradient Image)
The processing of generating a gradient image shown in step S106 generates, from side information 190, a gradient image 192 indicating the textural change of the image within the image space.
Typically, gradient image 192 is generated by the following procedure.
(a) Resize side information 190 to an image size of a remainder image to be output.
(b) Apply Gaussian filtering to the resized side information to remove noise (Gaussian smoothing).
(c) Split the filtered side information into color components (i.e., a gray scale image is generated for each color component).
(d) Execute operations of (d1) to (d4) for the gray scale image of each color component.
(d1) Edge detection
(d2) Gaussian smoothing (once or more) (or median filtering)
(d3) a series of morphological operations (e.g., dilation (once or more), erosion (once or more), dilation (once or more))
(d4) Gaussian smoothing (once or more)
Through the operations described above, a gradient image is generated for each color component constituting side information 190. That is, the processing of generating gradient image 192 shown in step S106 includes processing of applying edge detection, smoothing, a series of morphological operations, and smoothing sequentially to the gray scale image of each color component constituting side information 190.
The procedure described herein is merely an example, and the details of processing, procedure and the like of Gaussian smoothing and morphological operations can be designed appropriately.
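As one such example, the operations (b) to (d4) above might be realized with scipy.ndimage as follows; the Gaussian sigmas and structuring-element sizes are illustrative assumptions, and the resizing of step (a) is omitted on the assumption that the sizes already match.

```python
import numpy as np
from scipy import ndimage

def gradient_image(side_info):
    """Operations (b)-(d4) per color component; side_info is an H x W x C array.
    Filter sigmas and structuring-element sizes are illustrative assumptions."""
    out = np.empty(side_info.shape, dtype=np.float64)
    for c in range(side_info.shape[2]):                       # (c) split into color components
        g = ndimage.gaussian_filter(side_info[..., c].astype(np.float64), sigma=1.0)  # (b)
        edges = np.hypot(ndimage.sobel(g, axis=0), ndimage.sobel(g, axis=1))  # (d1) edges
        edges = ndimage.gaussian_filter(edges, sigma=1.0)     # (d2) smoothing
        edges = ndimage.grey_dilation(edges, size=(3, 3))     # (d3) dilation
        edges = ndimage.grey_erosion(edges, size=(3, 3))      #      erosion
        edges = ndimage.grey_dilation(edges, size=(3, 3))     #      dilation
        out[..., c] = ndimage.gaussian_filter(edges, sigma=1.0)  # (d4) smoothing
    return out
```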
Furthermore, processing of generating a pseudo gradient image may be adopted. That is, any filtering process may be adopted as long as an image in which a region with a larger textural change in side information 190 has a larger intensity can be generated.
(e6: Generation of Remainder Image)
The processing of generating a remainder image shown in step S108 generates remainder image 194 from target image 170 by a modulo operation using, as a modulus, factor D determined in accordance with gradient image 192.
That is, the processing of generating a remainder image shown in step S108 includes processing of determining a factor in accordance with the gradient for each pixel location of gradient image 192.
As a method for selecting factor D, any method can be adopted. For example, the value of gradient image 192 itself may be selected as factor D. However, in order to improve the image quality after decoding, factor D is determined nonlinearly with respect to gradient image 192 in the present embodiment. Specifically, with reference to Lookup table 166, factor D corresponding to each pixel location of gradient image 192 is selected. Here, factor D is determined for each pixel location of each color component contained in gradient image 192.
In this way, the processing of generating a remainder image shown in step S108 includes processing of selecting a factor corresponding to the gradient with reference to a predetermined correspondence, the factor being determined for each pixel location of gradient image 192 by each color component.
Modulo operation unit 168 performs a modulo operation on the intensity value at each pixel location using corresponding factor D as a modulus. More specifically, for the intensity value P at each pixel location, the remainder m satisfying P=q×D+m (q≧0, 0≦m<D) is determined, where q is the quotient and m is the remainder.
Since the intensity value is reconstructed as P=q′×D+m in the processing of reconstructing target image 170 (the decoding process) which will be described later, remainder m calculated at each pixel location by each color component is stored as remainder image 194. That is, remainder m at each pixel location constitutes remainder image 194.
Remainder image 194 may be resized to any size using a well-known downsampling method or upsampling method.
As a final output of the encoding process by the data size reduction method according to the present embodiment, at least reference images 172 and 182 as input and remainder image 194 as a processing result are stored. As an option, depth map 174 of reference image 172 and depth map 184 of reference image 182 may be output. As another option, subsampled image 178 may be output together with remainder image 194. These pieces of information (images) added as options are suitably selected in accordance with the details of processing in the decoding process.
The description above has focused on the set of one target image 170 and two reference images 172, 182. Similar processing is executed on all target images, and their respectively corresponding reference images, set for the plurality of input images (multi-view images or a sequence of video frames).
(e7: Processing Example)
A processing example of an encoding process by the data size reduction method according to the present embodiment will now be described.
[F. Decoding Process]
Next, the details of the decoding process (steps S200 to S210) according to the present embodiment will be described.
(f1: Functional Configuration)
The functional configuration of information processing apparatus 200 functioning as the decoder will be described below.
Information processing apparatus 200 reconstructs original target image 170 using the encoded information (reference images 172, 182 and remainder image 194).
(f2: Acquisition of Input Data and Depth Map)
The acquisition processing in the decoding process shown in step S200 acquires the information output as a result of the encoding process, namely reference images 172 and 182 and remainder image 194, together with depth maps 174 and 184 and subsampled image 178 if they are contained therein. If depth maps 174 and 184 are input, they are used as they are.
On the other hand, if depth maps 174 and 184 are not input, depth information estimation unit 252 generates depth maps 174 and 184 corresponding to reference images 172 and 182, respectively. Since the method for estimating depth maps in depth information estimation unit 252 is similar to the above-described method for estimating depth maps in depth information estimation unit 152, a detailed description thereof will not be repeated.
(f3: Generation of Synthesized Image)
The processing of generating a synthesized image shown in step S202 is executed by image synthesis unit 258. More specifically, image synthesis unit 258 generates synthesized image 276 corresponding to target image 170 based on reference images 172 and 182 as well as depth maps 174 and 184, similarly to image synthesis unit 158 in the encoding process.
(f4: Generation of Side Information)
The processing of generating side information shown in step S204 is executed by side information selection unit 260, which generates side information 290 based on the acquired information.
As described above, subsampled image 178 may not be contained in the input data. In this case, side information selection unit 260 generates side information 290 based on synthesized image 276 generated by image synthesis unit 258.
On the other hand, if subsampled image 178 is contained in the input data, side information selection unit 260 may use subsampled image 178 itself as side information 290, or may generate side information 290 by combining subsampled image 178 and synthesized image 276. For such combination processing, binary weighted combination, discrete weighted combination, continuous weighted combination or the like using the error distribution can be adopted. Since these processes have been described above, a detailed description thereof will not be repeated.
(f5: Generation of Gradient Image)
The processing of generating a gradient image shown in step S206 generates gradient image 292 from side information 290, similarly to the processing of generating gradient image 192 in the encoding process.
(f6: Reconstruction of Target Image)
The processing of reconstructing a target image shown in step S208 is achieved by an inverse modulo operation using side information 290, gradient image 292 and remainder image 194.
In this inverse modulo operation, factor D used when remainder image 194 was generated in the encoding process is estimated (selected) based on gradient image 292. That is, factor selection unit 264 selects factor D in accordance with the value at each pixel location of gradient image 292. Although any method can be adopted for selecting this factor D, factor D at each pixel location is selected with reference to Lookup table 266 in the present embodiment. Lookup table 266 is similar to Lookup table 166 used in the encoding process.
Inverse modulo operation unit 268 performs an inverse modulo operation using selected factor D and remainder m for each pixel location, as well as corresponding value SI of side information 290. More specifically, inverse modulo operation unit 268 calculates a list of candidate values C(q′) for the intensity value of reconstructed image 294 in accordance with the expression C(q′)=q′×D+m (where q′≧0, C(q′)<256), and among these calculated candidate values C(q′), one with the smallest difference (absolute value) from corresponding value SI of side information 290 is determined as a corresponding intensity value of reconstructed image 294.
For example, considering the case where factor D=8, remainder m=3, and corresponding value SI of side information 290=8, candidate values C(q′) are obtained as follows:
Candidate value C(0)=0×8+3=3 (difference from SI=5)
Candidate value C(1)=1×8+3=11 (difference from SI=3)
Candidate value C(2)=2×8+3=19 (difference from SI=11)
Among these candidate values C(q′), candidate value C(1) with the smallest difference from corresponding value SI of side information 290 is selected, and the corresponding intensity value of reconstructed image 294 is determined as “11”. The intensity value at each pixel location of reconstructed image 294 is thereby determined by each color component.
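The candidate search can also be written in closed form: rounding (SI−m)/D to the nearest admissible integer q′ yields the candidate closest to the side information. A sketch under that observation, with the factor array D assumed to have been selected from the gradient image as described above:

```python
import numpy as np

def reconstruct_intensity(m, SI, D):
    """Closest candidate C(q') = q'*D + m with q' >= 0 and C(q') < 256 to the
    side information value SI, computed without enumerating the candidates."""
    m, D = m.astype(np.int64), D.astype(np.int64)
    q = np.clip(np.round((SI.astype(np.float64) - m) / D), 0, (255 - m) // D)
    return (q * D + m).astype(np.int64)

# The worked example above: D = 8, m = 3, SI = 8 gives q' = 1 and intensity 11.
print(reconstruct_intensity(np.array([3]), np.array([8]), np.array([8])))  # [11]
```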
In this way, the process of reconstructing a target image shown in step S208 includes processing of determining, among the candidate values calculated by the inverse modulo operation using determined factor D as a modulus and the value at the corresponding pixel location of remainder image 194 as a remainder, the candidate with the smallest difference from the value at the corresponding pixel location of side information 290, as the intensity value at the corresponding pixel location of target image 170.
As the final output of the decoding process according to the present embodiment, at least reconstructed image 294 obtained as a result of processing as well as reference images 172 and 182 as input are output and/or stored. As an option, depth map 174 of reference image 172 and depth map 184 of reference image 182 may be output. Furthermore, reconstructed image 294 may be resized to any size depending on the difference in size from original target image 170 and/or remainder image 194.
Although the description above has focused on the set of one target image 170 and two reference images 172 and 182, a similar process is executed on all target images, and their respectively corresponding reference images, set for the plurality of input images (multi-view images or a sequence of video frames).
[G. Advantages]
According to the present embodiment, side information which is more appropriate than in conventional cases can be generated, and reconstructed images can be improved in quality by using the side information according to the present embodiment.
The present embodiment is applicable to various applications for image processing systems, such as data representation of multi-view images or a new data format before image compression.
According to the present embodiment, more efficient representation can be derived using a remainder-based data format for large-scale multi-view images. Moreover, the converted data format can be used for devices with small power capacity, such as mobile devices. Therefore, according to the present embodiment, the possibility of providing 3D features more easily on mobile devices or low power consumption devices can be increased.
It should be understood that the embodiment disclosed herein is illustrative and non-restrictive in every respect. The scope of the present invention is defined by the claims not by the description above, and is intended to include any modification within the meaning and scope equivalent to the terms of the claims.
Number | Date | Country | Kind
---|---|---|---
2012-227262 | Oct 2012 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2013/077516 | 10/9/2013 | WO | 00