1. Field of the Invention
The present invention relates to a free viewpoint image combination technique using data of images captured from a plurality of viewpoints and distance information and, more particularly, to a free viewpoint image combination technique for data of multi-viewpoint images captured by a camera array image capturing device.
2. Description of the Related Art
In recent years, 3D content has been utilized actively, mainly in the cinema industry. In order to achieve a higher sense of presence, the development of multi-viewpoint image capturing techniques and multi-viewpoint display techniques is in progress.
In two-viewpoint display, the glasses-type 3D display is the mainstream. By generating image data for the right eye and image data for the left eye and switching the image viewed by each eye under the control of the glasses, a viewer can view a stereoscopic image. As to multi-viewpoint display, glasses-less 3D displays using a lenticular lens or a parallax barrier system have been developed and are utilized mainly for purposes of digital signage.
In image capturing devices also, the stereo camera has been developed for two-viewpoint image capturing, and the camera array image capturing device (also referred to simply as a "camera array", and also known as a camera array system, a multiple lens camera, and the like), such as the Plenoptic camera and the camera array system, has been developed for multi-viewpoint (three- or more-viewpoint) image capturing. Further, research in the field called computational photography, which makes it possible to capture multi-viewpoint images by devising the image capturing device with comparatively little modification of the existing camera configuration, is actively in progress.
In the case where a multi-viewpoint image captured by the camera array image capturing device is displayed on a multi-viewpoint display device, it is necessary to adjust the difference in the number of viewpoints between the image capturing device and the display device. For example, in the case where a three-viewpoint image captured by a triple lens camera is displayed on a nine-viewpoint glasses-less 3D display, it is necessary to generate complementary images corresponding to the six viewpoints from which no image is captured. Further, in the case where an image captured by a stereo camera is displayed on a glasses-type 3D display, although both have two viewpoints, the parallax optimum for viewing differs depending on the display; therefore, there are cases where the image is reconfigured from a viewpoint different from that of the captured image and then output.
In order to implement the use cases as above, as a technique to generate image data from a viewpoint other than that of a captured image, a free viewpoint image combination technique is developed.
As a related technique, the standardization of MPEG-3DV (3D Video Coding) is in progress. MPEG-3DV is a scheme that encodes depth information as well as multi-viewpoint image data. On the assumption that, from an input of multi-viewpoint image data, outputs are produced for display devices with various numbers of viewpoints, such as the already-existing 2D display, the glasses-type 3D display, and the glasses-less 3D display, the number of viewpoints is controlled by use of the free viewpoint image combination technique. Further, the free viewpoint image combination technique has also been developed as a technique for viewing a multi-viewpoint video in an interactive manner (Japanese Patent Laid-Open No. 2006-012161).
Problems in the free viewpoint image combination technique include improvement in the image quality of a combined image and suppression of the amount of calculation. In free viewpoint image combination, an image from a virtual viewpoint is combined from a group of multi-viewpoint reference images. First, an image from the virtual viewpoint is generated from each reference image, but a deviation occurs between the generated virtual viewpoint images due to errors in the distance information. Next, the group of virtual viewpoint images generated from the respective reference images is combined, but in the case where virtual viewpoint images between which a deviation exists are combined, blurring occurs in the resultant combined image. Further, as the number of reference images and the number of image regions utilized for image combination increase, the amount of calculation increases.
The image processing apparatus according to the present invention has an identification unit configured to identify an occlusion region in which an image cannot be captured from a first viewpoint position, a first acquisition unit configured to acquire first image data of a region other than the occlusion region obtained in the case where an image of a subject is captured from an arbitrary viewpoint position based on a three-dimensional model generated by using first distance information indicative of the distance from the first viewpoint position to the subject and taking the first viewpoint position as a reference, a second acquisition unit configured to acquire second image data of the occlusion region obtained in the case where the image of the subject is captured from the arbitrary viewpoint position based on a three-dimensional model of the occlusion region generated by using second distance information indicative of the distance from a second viewpoint position different from the first viewpoint position to the subject and taking the second viewpoint position as a reference, and a generation unit configured to generate combined image data obtained in the case where the image of the subject is captured from the arbitrary viewpoint position by combining the first image data and the second image data.
According to the present invention, it is possible to perform free viewpoint image combination using multi-viewpoint image data with high image quality and at a high speed.
Further features of the present invention will become apparent from the following description of exemplary embodiments (with reference to the attached drawings).
Hereinafter, with reference to the attached drawings, preferred embodiments of the present invention are explained.
The chassis of an image capturing device 100 includes nine image capturing units 101 to 109, which acquire color image data, and an image capturing button 110. All nine image capturing units have the same focal length and are arranged uniformly on a square lattice.
Upon pressing down of the image capturing button 110 by a user, the image capturing units 101 to 109 receive optical information of a subject with a sensor (image capturing element); the received signal is A/D-converted and a plurality of color images (digital data) is acquired at the same time.
By the camera array image capturing device described above, it is possible to obtain a group of color images (multi-viewpoint image data) of the same subject captured from a plurality of viewpoint positions.
Here, the number of image capturing units is set to nine, but the number of image capturing units is not limited to nine. The present invention can be applied as long as the image capturing device has a plurality of image capturing units. Further, the example in which the nine image capturing units are arranged uniformly on a square lattice is explained here, but the arrangement of the image capturing units is arbitrary. For example, it may also be possible to arrange them radially or linearly or quite randomly.
A central processing unit (CPU) 201 comprehensively controls each unit described below.
A RAM 202 functions as a main memory, a work area, etc. of the CPU 201.
A ROM 203 stores control programs etc. executed by the CPU 201.
A bus 204 is a transfer path of various kinds of data and, for example, digital data acquired by the image capturing units 101 to 109 is transferred to a predetermined processing unit via the bus 204.
An operation unit 205 corresponds to buttons, a mode dial, etc., via which instructions of a user are input.
A display unit 206 displays captured images and characters. A liquid crystal display is generally used as the display unit 206. Further, the display unit 206 may have a touch screen function; in such a case, it is also possible to handle instructions of a user made via the touch screen as inputs to the operation unit 205.
A display control unit 207 performs display control of images and characters displayed in the display unit 206.
An image capturing unit control unit 208 performs control of an image capturing system based on instructions from the CPU 201, such as focusing, shutter releasing and closing, and stop adjustment.
A digital signal processing unit 209 performs various kinds of processing, such as white balance processing, gamma processing, and noise reduction processing, on the digital data received via the bus 204.
An encoder unit 210 performs processing to convert digital data into a predetermined file format.
An external memory control unit 211 is an interface to connect to a PC and other media (for example, hard disk, memory card, CF card, SD card, USB memory).
An image processing unit 212 calculates distance information from the multi-viewpoint image data acquired by the image capturing units 101 to 109 or the multi-viewpoint image data output from the digital signal processing unit 209, and generates free viewpoint combined image data. Details of the image processing unit 212 will be described later.
The image capturing device includes components other than those described above, but they are not the main focus of the present invention, and therefore, explanation thereof is omitted.
The image capturing units 101 to 109 include lenses 301 to 303, a stop 304, a shutter 305, an optical low-pass filter 306, an IR cut filter 307, a color filter 308, a sensor 309, and an A/D conversion unit 310. The lenses 301 to 303 are a zoom lens 301, a focus lens 302, and a camera shake correction lens 303, respectively. The sensor 309 is an image sensor, for example, a CMOS or CCD sensor.
In the case where the sensor 309 detects an amount of light of the subject, the detected amount of light is converted into a digital value by the A/D conversion unit 310 and output to the bus 204 as digital data.
In the present embodiment, the configuration and processing of each unit are explained on the premise that all images captured by the image capturing units 101 to 109 are color images, but part of or all the images captured by the image capturing units 101 to 109 may be changed into monochrome images. In such a case, the color filter 308 is omitted.
The image processing unit 212 has a distance information estimation unit 401, a separation information generation unit 402, and a free viewpoint image generation unit 403. The image processing unit 212 in the embodiment is explained as one component within the image capturing device, but it may also be possible to implement the function of the image processing unit 212 by an external device, such as a PC. That is, it is possible to implement the image processing unit 212 in the present embodiment as one function of the image capturing device or as an independent image processing apparatus.
Hereinafter, each component of the image processing unit 212 is explained.
The color multi-viewpoint image data acquired by the image capturing units 101 to 109, or the color multi-viewpoint image data output from the digital signal processing unit 209 (in the present embodiment, the number of viewpoints is nine in each case), is input to the image processing unit 212 and first sent to the distance information estimation unit 401.
The distance information estimation unit 401 estimates distance information indicative of the distance from the image capturing unit to the subject (hereinafter, referred to as “distance information”) for each image at each viewpoint within the input multi-viewpoint image data. Details of the distance information estimation will be described later. The configuration may also be such that equivalent distance information is input from outside instead of the provision of the distance information estimation unit 401.
The separation information generation unit 402 generates information (separation information) that serves as a basis on which each viewpoint image configuring the multi-viewpoint image data is separated into two layers: a boundary layer in the vicinity of the boundary of the subject and a main layer that is the remainder. Specifically, each pixel within each viewpoint image is classified into two kinds of pixels, that is, a boundary pixel adjacent to the boundary of the subject (hereinafter, referred to as an "object boundary") and a normal pixel other than the boundary pixel, and information enabling identification of the kind to which each pixel corresponds is generated. Details of separation information generation will be described later.
The free viewpoint image generation unit 403 generates image data at an arbitrary viewpoint position (free viewpoint image data) by rendering each three-dimensional model of the main layer (including the auxiliary main layer) and the boundary layer. Details of free viewpoint image generation will be described later.
A method for estimating distance information in the distance information estimation unit 401 is explained.
At step 501, the distance information estimation unit 401 applies an edge-preserving smoothing filter to one viewpoint image (target viewpoint image) within the nine-viewpoint image data that is input.
At step 502, the distance information estimation unit 401 divides the target viewpoint image into regions of a predetermined size (hereinafter, referred to as "small regions"). Specifically, neighboring pixels (pixel groups) whose color difference is equal to or less than a threshold value are integrated sequentially, and the target viewpoint image is finally divided into small regions having a predetermined number of pixels (for example, regions having 100 to 1,600 pixels). The threshold value is set to a value appropriate for determining that the colors to be compared are about the same color, for example, to "6" in the case where R, G, and B are each quantized into eight bits (256 levels). At first, neighboring pixels are compared and, in the case where the color difference is equal to or less than the above-mentioned threshold value, both pixels are integrated. Next, the average color of each integrated pixel group is obtained and compared with the average colors of the neighboring pixel groups, and the pixel groups whose color difference is equal to or less than the threshold value are integrated. The processing described above is repeated until the size (number of pixels) of each pixel group reaches the predetermined number of pixels of the small region described above.
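The region division at this step can be pictured with the following minimal Python sketch. The function name divide_into_small_regions, the union-find bookkeeping, the use of the maximum per-channel difference as the color difference, and the fixed number of merging sweeps are assumptions made for illustration only; the threshold of 6 and the target region size follow the example values given above.

```python
import numpy as np

def divide_into_small_regions(img, threshold=6, min_pixels=100):
    """Divide an RGB image (H, W, 3, uint8) into small regions.

    Adjacent pixels whose color difference is <= threshold are merged first;
    groups that stay below min_pixels are then folded into the neighboring
    group with the closest average color. Returns an (H, W) label map.
    """
    h, w, _ = img.shape
    flat = img.reshape(-1, 3).astype(int)
    parent = np.arange(h * w)

    def find(i):                      # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    def diff(a, b):                   # maximum per-channel color difference
        return np.abs(flat[a] - flat[b]).max()

    # Pass 1: merge adjacent pixels of about the same color.
    for y in range(h):
        for x in range(w):
            i = y * w + x
            if x + 1 < w and diff(i, i + 1) <= threshold:
                union(i, i + 1)
            if y + 1 < h and diff(i, i + w) <= threshold:
                union(i, i + w)

    # Pass 2: fold undersized pixel groups into their most similar neighbor group.
    for _ in range(10):               # a few sweeps are enough for a sketch
        labels = np.fromiter((find(i) for i in range(h * w)), int).reshape(h, w)
        sizes = np.bincount(labels.ravel(), minlength=h * w)
        sums = np.zeros((h * w, 3))
        np.add.at(sums, labels.ravel(), flat.astype(float))
        means = sums / np.maximum(sizes, 1)[:, None]

        small = [l for l in np.unique(labels) if sizes[l] < min_pixels]
        if not small:
            break
        # neighboring label pairs (horizontal and vertical adjacency)
        pairs = np.concatenate([
            np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()], 1),
            np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()], 1)])
        for l in small:
            cand = np.unique(pairs[(pairs[:, 0] == l) | (pairs[:, 1] == l)])
            cand = cand[cand != l]
            if cand.size:
                best = cand[np.argmin(np.abs(means[cand] - means[l]).max(1))]
                union(int(best), int(l))

    return np.fromiter((find(i) for i in range(h * w)), int).reshape(h, w)
```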
At step 503, the distance information estimation unit 401 determines whether the division into small regions is completed for all the nine viewpoint images included in the nine-viewpoint image data. In the case where the division into small regions is completed, the procedure proceeds to step 504. On the other hand, in the case where the division into small regions is not completed yet, the procedure returns to step 501 and the processing to apply the smoothing filter and the processing to divide into small regions are performed by using the next viewpoint image as the target viewpoint image.
At step 504, the distance information estimation unit 401 calculates the initial amount of parallax of each divided small region for all the viewpoint images by referring to the viewpoint images around each viewpoint image (here, the viewpoint images located above, below, to the right, and to the left of each viewpoint image). For example, in the case where the initial amount of parallax of the viewpoint image relating to the image capturing unit 105 at the center is calculated, each viewpoint image of the image capturing units 102, 104, 106, and 108 is referred to. In the case of the viewpoint image relating to the image capturing unit at the end part, for example, for the viewpoint image of the image capturing unit 107, each viewpoint image of the image capturing units 104 and 108 is referred to and in the case of the viewpoint image of the image capturing unit 108, each viewpoint image of the image capturing units 105, 107, and 109 is referred to and thus the initial amount of parallax is calculated. The calculation of the initial amount of parallax is performed as follows.
First, each small region of the viewpoint image for which the initial amount of parallax is to be found and the corresponding small region in the viewpoint image to be referred to (reference viewpoint image) are compared. Here, the corresponding small region is the small region in the reference viewpoint image shifted by the amount corresponding to the parallax relative to the position of each small region of the viewpoint image for which the initial amount of parallax is to be found.
Next, the color difference between each pixel of the viewpoint image for which the initial amount of parallax is to be found and the corresponding pixel in the reference viewpoint image shifted by the amount corresponding to the parallax is calculated for all the pixels within the small region and a histogram is created.
Then, such a histogram is created for each candidate amount of parallax by changing the amount of parallax.
In the histograms obtained in this manner, the amount of parallax whose histogram has a high peak is taken as the initial amount of parallax. The corresponding region in the reference viewpoint image is set by adjusting the amount of parallax in the longitudinal direction and in the transverse direction separately. The reason is that an amount of parallax of one pixel in the longitudinal direction and an amount of parallax of one pixel in the transverse direction do not indicate the same distance.
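The histogram-based calculation of the initial amount of parallax can be sketched as follows for the simplified case of a horizontally adjacent reference viewpoint image. The function name, the candidate parallax range, the bin width, and the use of the height of the histogram peak as the matching score are assumptions made for illustration; the small-region labels are assumed to come from the division at step 502.

```python
import numpy as np

def estimate_initial_parallax(target, reference, labels,
                              max_parallax=32, bin_width=4):
    """Pick, for each small region, the candidate amount of parallax whose
    histogram of per-pixel color differences has the tallest peak.

    target, reference : (H, W, 3) images from horizontally adjacent viewpoints
    labels            : (H, W) small-region label map of the target image
    Returns {label: initial amount of parallax in pixels}.
    """
    h, w, _ = target.shape
    result = {}
    for label in np.unique(labels):
        ys, xs = np.nonzero(labels == label)
        best_d, best_score = 0, -1.0
        for d in range(max_parallax + 1):          # candidate amounts of parallax
            xs_ref = xs + d                        # shift toward the reference view
            valid = xs_ref < w
            if not valid.any():
                continue
            # per-pixel color difference between the region and its shifted counterpart
            diff = np.abs(target[ys[valid], xs[valid]].astype(int)
                          - reference[ys[valid], xs_ref[valid]].astype(int)).max(1)
            hist, _ = np.histogram(diff, bins=np.arange(0, 256 + bin_width, bin_width))
            # when the shift is right, the differences concentrate in one (low) bin,
            # which produces a tall histogram peak
            score = hist.max() / valid.sum()
            if score > best_score:
                best_score, best_d = score, d
        result[int(label)] = best_d
    return result
```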
The processing hitherto is explained by using a specific example.
Here, the comparison of the corresponding regions is performed with a small region 602 as the target, taking the viewpoint image of the image capturing unit 105 as the target viewpoint image and the viewpoint image of the image capturing unit 104 as the reference viewpoint image.
Explanation now returns to the flowchart.
At step 505, the distance information estimation unit 401 repeatedly adjusts the initial amount of parallax by using the color difference between small regions, the difference in the initial amount of parallax, etc. Specifically, the initial amount of parallax is adjusted based on the idea that small regions adjacent to each other and the color difference between which is small have a strong possibility of having similar amounts of parallax and that small regions adjacent to each other and the difference in the initial amount of parallax between which is small have a strong possibility of having similar amounts of parallax.
At step 506, the distance information estimation unit 401 obtains distance information by performing processing to convert the amount of parallax obtained by the adjustment of the initial amount of parallax into a distance. The distance information is calculated by (camera interval×focal length)/(amount of parallax×length of one pixel), but the length of one pixel is different between the longitudinal direction and the transverse direction, and therefore, necessary conversion is performed so that the amount of parallax in the longitudinal direction and that in the transverse direction indicate the same distance.
Further, the converted distance information is quantized, for example, into eight bits (256 gradations). Then, the distance information quantized into eight bits is saved as 8-bit grayscale (256-gradation) image data (a distance map). In the grayscale image of the distance information, the shorter the distance between the object and the camera, the closer the color of the object is to white (value: 255), and the greater the distance between the object and the camera, the closer the color of the object is to black (value: 0).
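The conversion from the amount of parallax to the quantized distance information can be sketched as follows. The function name, the linear normalization range used for the 8-bit quantization, and the clamping of very small parallax values are assumptions made for illustration; the conversion itself follows the formula (camera interval × focal length)/(amount of parallax × length of one pixel) given above, and near objects map to white as described.

```python
import numpy as np

def parallax_to_distance_map(parallax, camera_interval_m, focal_length_m,
                             pixel_pitch_m, d_min=None, d_max=None):
    """Convert a per-pixel amount of parallax (in pixels) into a quantized
    8-bit distance map (255 = nearest, 0 = farthest)."""
    parallax = np.maximum(parallax.astype(float), 1e-6)   # avoid division by zero
    # distance = (camera interval x focal length) / (parallax x length of one pixel)
    distance = (camera_interval_m * focal_length_m) / (parallax * pixel_pitch_m)

    d_min = distance.min() if d_min is None else d_min
    d_max = distance.max() if d_max is None else d_max
    # near objects -> close to white (255), far objects -> close to black (0)
    norm = (d_max - distance) / max(d_max - d_min, 1e-6)
    return np.clip(np.round(norm * 255), 0, 255).astype(np.uint8)
```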
In this manner, the distance information corresponding to each pixel of each viewpoint image is calculated. In the present embodiment, the distance is calculated by dividing the image into small regions including a predetermined number of pixels, but it may also be possible to use another estimation method that obtains the distance based on the parallax between multi-viewpoint images.
The distance information corresponding to each viewpoint image obtained by the above-mentioned processing and the multi-viewpoint image data are sent to the subsequent separation information generation unit 402 and the free viewpoint image generation unit 403. It may also be possible to send the distance information corresponding to each viewpoint image and the multi-viewpoint image data only to the separation information generation unit 402 and to cause the separation information generation unit 402 to send the data to the free viewpoint image generation unit 403.
Next, the processing in the separation information generation unit 402 to separate each viewpoint image into two layers, that is, the boundary layer in the vicinity of the boundary of the object in the image and the main layer other than the boundary of the object, is explained.
At step 901, the separation information generation unit 402 acquires the multi-viewpoint image data and the distance information obtained by the distance information estimation processing.
At step 902, the separation information generation unit 402 extracts the object boundary within the viewpoint image. In the present embodiment, the portion where the difference between the distance information of the target pixel and the distance information of the neighboring pixel (hereinafter, referred to as a “difference in distance information”) is equal to or more than the threshold value is identified as the boundary of the object. Specifically, the object boundary is obtained as follows.
First, a scan is performed in the longitudinal direction, the difference in distance information is compared with the threshold value, and the pixels whose difference in distance information is equal to or more than the threshold value are identified. Next, a scan is performed in the transverse direction, the difference in distance information is similarly compared with the threshold value, and the pixels whose difference in distance information is equal to or more than the threshold value are identified. Then, the union of the pixels identified in the longitudinal direction and those identified in the transverse direction is calculated and identified as the object boundary. The threshold value is set to a value such as "10", for example, in the case where the distance information is quantized into eight bits (0 to 255).
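The extraction of the object boundary from the distance information can be sketched as follows. The function name is an assumption, as is the choice to mark both pixels on either side of a distance jump; the threshold of 10 follows the example value given above.

```python
import numpy as np

def extract_object_boundary(distance_map, threshold=10):
    """Mark pixels where the distance information jumps by `threshold` or more.

    distance_map : (H, W) uint8 distance map (0-255)
    Returns a boolean (H, W) mask; the union of the longitudinal-scan and
    transverse-scan detections is the object boundary.
    """
    d = distance_map.astype(int)
    boundary = np.zeros(d.shape, dtype=bool)

    # longitudinal (vertical) scan: compare each pixel with the one below it
    jump_v = np.abs(d[1:, :] - d[:-1, :]) >= threshold
    boundary[1:, :] |= jump_v          # both sides of the jump are marked here
    boundary[:-1, :] |= jump_v

    # transverse (horizontal) scan: compare each pixel with the one to its right
    jump_h = np.abs(d[:, 1:] - d[:, :-1]) >= threshold
    boundary[:, 1:] |= jump_h
    boundary[:, :-1] |= jump_h

    return boundary
```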
Here, the object boundary is obtained based on the distance information, but it may also be possible to use another method, such as a method for obtaining the object boundary by dividing an image into regions. However, it is desirable for the object boundary obtained by the region division of the image and the object boundary obtained from the distance information to agree with each other as much as possible. In the case where the object boundary is obtained by the region division of the image, it is suggested to correct the distance information in accordance with the obtained object boundary.
At step 903, the separation information generation unit 402 classifies each pixel within the viewpoint image into two kinds of pixels, that is, the boundary pixel and the normal pixel. Specifically, with reference to the distance information acquired at step 901, the pixel adjacent to the object boundary identified at step 902 is determined to be the boundary pixel.
At step 904, the separation information generation unit 402 determines whether the classification of the pixels of all the viewpoint images included in the input multi-viewpoint image data is completed. In the case where there is an unprocessed viewpoint image not subjected to the processing yet, the procedure returns to step 902 and the processing at step 902 and step 903 is performed on the next viewpoint image. On the other hand, in the case where the classification of the pixels of all the viewpoint images is completed, the procedure proceeds to step 905.
At step 905, the separation information generation unit 402 sends separation information capable of identifying the boundary pixels and the normal pixels to the free viewpoint image generation unit 403. As the separation information, for example, a flag "1" may be attached to each pixel determined to be a boundary pixel and a flag "0" to each pixel determined to be a normal pixel. However, once the boundary pixels are identified, it is clear that the remaining pixels are the normal pixels, and therefore, it is sufficient for the separation information to be information capable of identifying the boundary pixels. In the free viewpoint image generation processing to be described later, a predetermined viewpoint image is separated into two layers (that is, the boundary layer configured by the boundary pixels and the main layer configured by the normal pixels) by using the separation information as described above.
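The generation of the separation information can be sketched as follows. The function name, the use of the 8-neighborhood for "adjacent to the object boundary", and the inclusion of the boundary pixels themselves in the flag map are assumptions made for illustration.

```python
import numpy as np

def generate_separation_information(boundary):
    """Build the separation information: flag 1 for boundary pixels
    (pixels adjacent to the object boundary), flag 0 for normal pixels.

    boundary : boolean (H, W) object-boundary mask from the previous step
    """
    h, w = boundary.shape
    flags = np.zeros((h, w), dtype=np.uint8)
    ys, xs = np.nonzero(boundary)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy = np.clip(ys + dy, 0, h - 1)
            xx = np.clip(xs + dx, 0, w - 1)
            flags[yy, xx] = 1          # pixels adjacent to the object boundary
    return flags                       # 1 = boundary pixel, 0 = normal pixel
```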
Subsequently, free viewpoint image generation processing in the free viewpoint image generation unit 403 is explained.
At step 1101, the free viewpoint image generation unit 403 acquires the position information of an arbitrary viewpoint (hereinafter, referred to as a “free viewpoint”) in the free viewpoint image to be output. For example, the position information of the free viewpoint is given by coordinates as follows. In the present embodiment, it is assumed that coordinate information indicative of the position of the free viewpoint is given in the case where the position of the image capturing unit 105 is taken to be the coordinate position (0.0, 0.0) that serves as a reference. In this case, the image capturing unit 101 is represented by (1.0, 1.0), the image capturing unit 102 by (0.0, 1.0), the image capturing unit 103 by (−1.0, 1.0), and the image capturing unit 104 by (1.0, 0.0), respectively. Similarly, the image capturing unit 106 is represented by (−1.0, 0.0), the image capturing unit 107 by (1.0, −1.0), the image capturing unit 108 by (0.0, −1.0), and the image capturing unit 109 by (−1.0, −1.0). Here, in the case where a user desires to combine an image with the middle position of the four image capturing units 101, 102, 104, and 105 as a free viewpoint, it is necessary for the user to input the coordinates (0.5, 0.5). It is a matter of course that the method for defining coordinates is not limited to the above and it may also be possible to take the position of the image capturing unit other than the image capturing unit 105 to be a coordinate position that serves as a reference. Further, the method for inputting the position information of the free viewpoint is not limited to the method for directly inputting the coordinates described above and it may also be possible to, for example, display a UI screen (not shown schematically) showing the arrangement of the image capturing units on the display unit 206 and to specify a desired free viewpoint by the touch operation etc.
Although not explained as the target of acquisition at this step, the distance information corresponding to each viewpoint image and the multi-viewpoint image data are also acquired from the distance information estimation unit 401 or the separation information generation unit 402 as described above.
At step 1102, the free viewpoint image generation unit 403 sets a plurality of viewpoint images to be referred to (hereinafter, referred to as a "reference image set") in the generation of the free viewpoint image data at the position of the specified free viewpoint. In the present embodiment, the viewpoint images captured by the four image capturing units closest to the position of the specified free viewpoint are set as the reference image set. In the case where the coordinates (0.5, 0.5) are specified as the position of the free viewpoint as described above, the reference image set is consequently configured by the four viewpoint images captured by the image capturing units 101, 102, 104, and 105. As a matter of course, the number of viewpoint images configuring the reference image set is not limited to four, and the reference image set may be configured by, for example, three viewpoint images around the specified free viewpoint. Further, it is only required that the reference image set encompass the position of the specified free viewpoint, and it is also possible to set, as the reference image set, viewpoint images captured by four image capturing units (for example, the image capturing units 101, 103, 107, and 109) not immediately adjacent to the specified free viewpoint position.
At step 1103, the free viewpoint image generation unit 403 performs processing to set one representative image and one or more auxiliary images from the reference image set that has been set. In the present embodiment, among the reference image set, the viewpoint image closest to the position of the specified free viewpoint is set as the representative image and the other viewpoint images are set as the auxiliary images. For example, it is assumed that the coordinates (0.2, 0.2) are specified as the position of the free viewpoint and the reference image set configured by the four viewpoint images captured by the image capturing units 101, 102, 104, and 105 is set. In this case, the viewpoint image captured by the image capturing unit 105, which is closest to the position (0.2, 0.2) of the specified free viewpoint, is set as the representative image, and the respective viewpoint images captured by the image capturing units 101, 102, and 104 are set as the auxiliary images. As a matter of course, the method for determining the representative image is not limited to this, and another method may be used in accordance with the arrangement of the image capturing units, for example, a method in which the viewpoint image captured by the image capturing unit closest to the camera center is set as the representative image.
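The selection of the reference image set and of the representative image can be sketched as follows, using the coordinate assignment of the image capturing units 101 to 109 described at step 1101. The function name and the use of squared Euclidean distance for "closest" are assumptions made for illustration.

```python
# Layout of the nine image capturing units on the square lattice, with
# unit 105 at the reference coordinates (0.0, 0.0) as described at step 1101.
UNIT_POSITIONS = {
    101: (1.0, 1.0), 102: (0.0, 1.0), 103: (-1.0, 1.0),
    104: (1.0, 0.0), 105: (0.0, 0.0), 106: (-1.0, 0.0),
    107: (1.0, -1.0), 108: (0.0, -1.0), 109: (-1.0, -1.0),
}

def select_reference_images(free_viewpoint, num_references=4):
    """Steps 1102/1103: pick the viewpoints nearest to the free viewpoint as
    the reference image set, and the nearest of them as the representative."""
    fx, fy = free_viewpoint
    ordered = sorted(UNIT_POSITIONS,
                     key=lambda u: (UNIT_POSITIONS[u][0] - fx) ** 2
                                   + (UNIT_POSITIONS[u][1] - fy) ** 2)
    reference_set = ordered[:num_references]
    representative, auxiliaries = reference_set[0], reference_set[1:]
    return representative, auxiliaries

# For the free viewpoint (0.2, 0.2), this yields 105 as the representative
# image and 102, 104, and 101 as the auxiliary images, matching the example.
```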
At step 1104, the free viewpoint image generation unit 403 performs processing to generate a three-dimensional model of the main layer of the representative image. The three-dimensional model of the main layer is generated by constructing square meshes, each of which connects four pixels that are normal pixels not adjacent to the object boundary.
The X coordinate and the Y coordinate of each square mesh, constructed in units of one pixel in the manner described above, correspond to the global coordinates calculated from the camera parameters of the image capturing device 100, and the Z coordinate corresponds to the distance from each pixel to the subject obtained from the distance information. Then, the three-dimensional model of the main layer is generated by texture-mapping the color information of each pixel onto the square mesh.
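The construction of the three-dimensional model of the main layer can be sketched as follows. This is a simplified illustration that back-projects each pixel with a pinhole model into camera coordinates rather than into the global coordinates derived from the full camera parameters; the function name and the intrinsic parameters (focal length in pixels, principal point) are assumptions.

```python
import numpy as np

def build_main_layer_model(image, distance_map, separation_flags,
                           focal_length_px, principal_point):
    """Construct the main-layer model as textured square meshes.

    One quad (square mesh) is emitted for every 2x2 block whose four pixels
    are all normal pixels (separation flag 0); each vertex carries a 3D
    position obtained by back-projecting the pixel with its distance, plus
    the pixel color as texture. Returns (vertices, colors, quads).
    """
    h, w = distance_map.shape
    cx, cy = principal_point
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = distance_map.astype(float)
    # simple pinhole back-projection into camera coordinates
    x = (us - cx) * z / focal_length_px
    y = (vs - cy) * z / focal_length_px
    vertices = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = image.reshape(-1, 3)

    normal = separation_flags == 0
    quads = []
    for v in range(h - 1):
        for u in range(w - 1):
            if normal[v:v + 2, u:u + 2].all():    # four normal pixels -> one quad
                i = v * w + u
                quads.append((i, i + 1, i + w + 1, i + w))
    return vertices, colors, np.array(quads, dtype=int)
```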
Explanation now returns to the flowchart.
At step 1105, the free viewpoint image generation unit 403 performs rendering of the main layer of the representative image at the viewpoint position of the auxiliary image.
Explanation now returns to the flowchart.
At step 1106, the free viewpoint image generation unit 403 generates an auxiliary main layer of the auxiliary image. Here, the auxiliary main layer corresponds to a difference between the main layer in the auxiliary image and the rendered image obtained at step 1105 (image obtained by rendering the main layer of the representative image at the viewpoint position of the auxiliary image).
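The generation of the auxiliary main layer from the structure information can be sketched as follows; the function name and the representation of the step 1105 result as a hole mask are assumptions made for illustration.

```python
import numpy as np

def generate_auxiliary_main_layer(aux_separation_flags, rendered_hole_mask):
    """Step 1106 (first embodiment): the auxiliary main layer is the set of
    pixels that are normal pixels of the auxiliary image but remain holes when
    the main layer of the representative image is rendered at the auxiliary
    viewpoint, i.e. the region the representative image cannot cover.

    aux_separation_flags : (H, W) flags of the auxiliary image (1 = boundary pixel)
    rendered_hole_mask   : (H, W) boolean, True where step 1105 produced no pixel
    Returns a boolean mask of auxiliary-main-layer pixels.
    """
    aux_main_layer = aux_separation_flags == 0     # main layer of the auxiliary image
    return aux_main_layer & rendered_hole_mask
```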
Explanation now returns to the flowchart.
At step 1107, the free viewpoint image generation unit 403 performs processing to generate a three-dimensional model of the auxiliary main layer of the auxiliary image. The three-dimensional model of the auxiliary main layer is generated by the same processing as that for the three-dimensional model of the main layer of the representative image explained at step 1104. Here, the pixels set as the auxiliary main layer are handled as normal pixels and the other pixels as boundary pixels. The three-dimensional model of the auxiliary main layer is generated by constructing square meshes, each connecting four pixels that are normal pixels not adjacent to the object boundary. The rest of the processing is the same as that at step 1104, and therefore, explanation is omitted here. Compared to the three-dimensional modeling of the main layer of the representative image, the number of pixels to be processed as normal pixels in the three-dimensional modeling of the auxiliary main layer of the auxiliary image is small, and therefore, the amount of calculation necessary for generation of the three-dimensional model is small.
At step 1108, the free viewpoint image generation unit 403 performs rendering of the main layer of the representative image at the free viewpoint position. At step 1105, rendering of the three-dimensional model of the main layer of the representative image is performed at the viewpoint position of the auxiliary image, but at this step, rendering is performed at the free viewpoint position acquired at step 1101. This means that the reference viewpoint 1303 corresponds to the viewpoint position of the representative image and the target viewpoint 1308 corresponds to the free viewpoint position. Due to this, the image data of the region except for the above-described occlusion region obtained in the case where the image of the subject is captured from the free viewpoint position is acquired based on the three-dimensional model with the viewpoint position of the representative image as a reference. The rest of the processing is the same as that at step 1105, and therefore, explanation is omitted here.
At step 1109, the free viewpoint image generation unit 403 performs rendering of the auxiliary main layer of the auxiliary image at the free viewpoint position. That is, the free viewpoint image generation unit 403 performs rendering of the three-dimensional model of the auxiliary main layer of the auxiliary image generated at step 1107 at the free viewpoint position acquired at step 1101. This means that the reference viewpoint 1303 corresponds to the viewpoint position of the auxiliary image and the target viewpoint 1308 corresponds to the free viewpoint position.
The image generation necessary for free viewpoint image combination has been performed up to this point. The processing whose calculation load is high is summarized as follows.
Generation of the three-dimensional model of the main layer of the representative image (step 1104)
Generation of the three-dimensional model of the auxiliary main layer of the auxiliary image (step 1107)
Rendering of the main layer of the representative image at the viewpoint position of the auxiliary image (step 1105)
Rendering of the main layer of the representative image at the free viewpoint position (step 1108)
Rendering of the auxiliary main layer of the auxiliary image at the free viewpoint position (step 1109)
As to the three-dimensional model generation at step 1104 and step 1107, the number of pixels of the auxiliary main layer of the auxiliary image is smaller than the number of pixels of the main layer of the representative image, and therefore, it is possible to considerably reduce the amount of calculation compared to the case where the main layer is utilized commonly in a plurality of reference images.
In the case where it is possible to increase the speed of the rendering processing of the generated three-dimensional models by, for example, performing the rendering processing at steps 1105, 1108, and 1109 using a GPU (a processor dedicated to image processing), the effect of the present invention is further enhanced.
Explanation now returns to the flowchart.
At step 1110, the free viewpoint image generation unit 403 generates integrated image data of the main layer and the auxiliary main layer by integrating the two rendering results (the rendering result of the main layer of the representative image and the rendering result of the auxiliary main layer of the auxiliary image) performed at the free viewpoint position. In the case of the present embodiment, one rendered image obtained by rendering the main layer of the representative image and three rendered images obtained by rendering the auxiliary main layer of the auxiliary image are integrated as a result. In the following, integration processing is explained.
First, the integration processing is performed for each pixel. The color after integration can be acquired by a variety of methods; here, a case is explained where the weighted average of the rendered images is used, specifically, a weighted average based on the distance between the position of the specified free viewpoint and the viewpoint of each reference image. For example, in the case where the specified free viewpoint position is equidistant from the four image capturing units corresponding to the viewpoint images configuring the reference image set, all the weights are 0.25, equal to one another. In the case where the specified free viewpoint position is nearer to any of the image capturing units, the shorter the distance, the greater the weight. At this time, the portion of a hole in a rendered image is not used in the color calculation for integration; that is, the color after integration is calculated as the weighted average obtained from the rendered images having no hole at that pixel. A portion that is a hole in all the rendered images is left as a hole.
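The integration at this step can be sketched as follows. The use of inverse-distance weights is an assumption made for illustration; the description above only requires that the weights be equal for equidistant viewpoints, larger for shorter distances, and that holes be excluded from the weighted average.

```python
import numpy as np

def integrate_rendered_images(rendered, hole_masks, view_positions,
                              free_viewpoint):
    """Step 1110: per-pixel weighted average of the rendered images, weighting
    each view by the inverse of its distance to the free viewpoint and
    skipping holes; pixels that are holes in every rendered image stay holes.

    rendered       : list of (H, W, 3) float images (representative + auxiliaries)
    hole_masks     : list of (H, W) boolean masks, True where the image has a hole
    view_positions : list of (x, y) viewpoint coordinates of each rendered image
    free_viewpoint : (x, y) coordinates of the specified free viewpoint
    """
    fx, fy = free_viewpoint
    h, w, _ = rendered[0].shape
    acc = np.zeros((h, w, 3))
    weight_sum = np.zeros((h, w))

    for img, hole, (vx, vy) in zip(rendered, hole_masks, view_positions):
        dist = np.hypot(vx - fx, vy - fy)
        weight = 1.0 / max(dist, 1e-6)            # shorter distance -> larger weight
        valid = ~hole
        acc[valid] += weight * img[valid]
        weight_sum[valid] += weight

    out = np.zeros((h, w, 3))
    filled = weight_sum > 0
    out[filled] = acc[filled] / weight_sum[filled, None]
    return out, ~filled                           # integrated image and remaining holes
```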
In this manner, the integrated image data of the main layer is generated.
Explanation now returns to the flowchart.
At step 1111, the free viewpoint image generation unit 403 generates three-dimensional models of the boundary layer in the representative image and of the boundary layer in each auxiliary image. In the boundary layer, which is in contact with the object boundary, neighboring pixels are not connected at the time of generation of the mesh. Specifically, one square mesh is constructed for each single pixel and a three-dimensional model is generated.
At step 1112, the free viewpoint image generation unit 403 performs rendering of the boundary layer in the representative image and the boundary layer in the auxiliary image.
Explanation now returns to the flowchart.
At step 1113, the free viewpoint image generation unit 403 obtains the integrated image data of the boundary layer by integrating the rendered image group of the boundary layer. Specifically, by the same integration processing as that at step 1110, the rendered images (four) of the boundary layer generated from the four viewpoint images (one representative image and three auxiliary images) are integrated.
At step 1114, the free viewpoint image generation unit 403 obtains integrated image data of the two layers (the main layer, including the auxiliary main layer, and the boundary layer) by integrating the integrated image data of the main layer and the auxiliary main layer obtained at step 1110 and the integrated image data of the boundary layer obtained at step 1113. This integration processing is also performed for each pixel. At this time, an image with higher precision is obtained more stably from the integrated image of the main layer and the auxiliary main layer than from the integrated image of the boundary layer, and therefore, the integrated image of the main layer and the auxiliary main layer is utilized preferentially. That is, only in the case where there is a hole in the integrated image of the main layer and the auxiliary main layer and there is no hole in the integrated image of the boundary layer is the pixel complemented by using the color of the integrated image of the boundary layer. In the case where there is a hole both in the integrated image of the main layer and the auxiliary main layer and in the integrated image of the boundary layer, the hole is left as it is.
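The two-layer integration with priority on the integrated image of the main layer and the auxiliary main layer can be sketched as follows; the function name and the mask representation of holes are assumptions made for illustration.

```python
import numpy as np

def integrate_two_layers(main_img, main_hole, boundary_img, boundary_hole):
    """Step 1114: the integrated main/auxiliary-main image is used
    preferentially; the integrated boundary image only fills pixels that are
    holes in the former, and pixels that are holes in both remain holes."""
    out = main_img.copy()
    use_boundary = main_hole & ~boundary_hole      # complement only these pixels
    out[use_boundary] = boundary_img[use_boundary]
    remaining_hole = main_hole & boundary_hole     # holes in both layers stay holes
    return out, remaining_hole
```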
In the present embodiment, the rendering of the main layer and the auxiliary main layer and the rendering of the boundary layer are performed in this order to suppress degradation in image quality in the vicinity of the object boundary.
At step 1115, the free viewpoint image generation unit 403 performs hole filling processing. Specifically, the portion left as a hole in the two-layer integrated image data obtained at step 1114 is complemented by using the ambient color. In the present embodiment, the hole filling processing is performed by selecting, from among the peripheral pixels adjacent to the pixel to be filled, the pixel that is more distant according to the distance information and using its color. It may of course be possible to use another method for the hole filling processing.
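The hole filling processing can be sketched as follows. The function name, the use of the 8-neighborhood, and the iteration over sweeps until no further pixel can be filled are assumptions made for illustration; the selection of the most distant adjacent pixel follows the description above (smaller distance-map values correspond to more distant objects).

```python
import numpy as np

def fill_holes(image, hole_mask, distance_map, max_iterations=50):
    """Step 1115: fill each remaining hole with the color of the adjacent
    (8-neighborhood) non-hole pixel that is farthest away according to the
    distance map; repeat until no hole with a filled neighbor remains."""
    img = image.copy()
    hole = hole_mask.copy()
    h, w = hole.shape
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]

    for _ in range(max_iterations):
        ys, xs = np.nonzero(hole)
        if ys.size == 0:
            break
        progressed = False
        for y, x in zip(ys, xs):
            best, best_dist = None, None
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not hole[ny, nx]:
                    d = distance_map[ny, nx]       # smaller value = farther away
                    if best_dist is None or d < best_dist:
                        best, best_dist = (ny, nx), d
            if best is not None:
                img[y, x] = img[best]              # copy the ambient (farther) color
                hole[y, x] = False
                progressed = True
        if not progressed:
            break
    return img
```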
At step 1116, the free viewpoint image generation unit 403 outputs the free viewpoint image data having been subjected to the hole filling processing to the encoder unit 210. In the encoder unit 210, the data is encoded by an arbitrary encoding scheme (for example, JPEG scheme) and output as an image.
According to the present embodiment, it is possible to combine images at viewpoints between the respective viewpoints of the multi-viewpoint image data with high precision and at a high speed; therefore, it is possible to produce a display without a feeling of unnaturalness on a display whose number of viewpoints differs from that of the captured images, and to improve image quality in image processing such as refocus processing.
In the first embodiment, for the generation of the auxiliary main layer, the information of the region where a hole is left at the time of rendering the main layer of the representative image at the viewpoint position of the auxiliary image is utilized. That is, the auxiliary main layer is generated by utilizing only the structure information. Next, an aspect is explained as a second embodiment in which higher image quality is achieved by utilizing the color information in addition to the structure information for generation of the auxiliary main layer. Explanation of the parts common to the first embodiment (the processing in the distance information estimation unit 401 and the separation information generation unit 402) is omitted, and here, the processing in the free viewpoint image generation unit 403, which is the point of difference, is mainly explained.
In the present embodiment, the only difference lies in that the color information is utilized in addition to the structure information in the generation processing of the auxiliary main layer within the free viewpoint image generation processing. Hence, the points peculiar to the present embodiment are explained mainly along the same flowchart as in the first embodiment.
The acquisition of the position information of the free viewpoint at step 1101, the setting of the reference image set at step 1102, and the setting of the representative image and the auxiliary image at step 1103 are the same as those in the first embodiment. The processing to generate the 3D model of the main layer of the representative image at step 1104 and the processing to render the main layer of the representative image at the viewpoint position of the auxiliary image at step 1105 are also the same as those in the first embodiment.
At step 1106, the free viewpoint image generation unit 403 generates the auxiliary main layer of the auxiliary image by using the color information. Specifically, the auxiliary main layer is generated as follows.
As in the case of the first embodiment, it is assumed that the viewpoint image captured by the image capturing unit 105 is taken to be the representative image and the viewpoint image captured by the image capturing unit 104 is taken to be the auxiliary image. At this step, the auxiliary main layer of the auxiliary image is generated from the information indicative of the boundary layer and the main layer of the auxiliary image.
First, as in the first embodiment, the auxiliary main layer is determined based on the structure information. In this stage, the occlusion region 1503, which cannot be covered by rendering the main layer of the representative image at the viewpoint position of the auxiliary image, is set as the auxiliary main layer. Next, the color information is utilized: the color of each pixel of the auxiliary image is compared with the color of the corresponding pixel of the image obtained by rendering the main layer of the representative image at the viewpoint position of the auxiliary image, and the region in which the color difference is large is added to the auxiliary main layer.
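The generation of the auxiliary main layer using both the structure information and the color information can be sketched as follows. The specific criterion (maximum per-channel color difference against the image rendered from the representative image) and the threshold value are assumptions made for illustration; the description above only requires that regions whose color cannot be expressed by rendering the main layer of the representative image be added to the auxiliary main layer.

```python
import numpy as np

def generate_auxiliary_main_layer_with_color(aux_image, aux_separation_flags,
                                             rendered_from_representative,
                                             rendered_hole_mask,
                                             color_threshold=10):
    """Second embodiment, step 1106: start from the structure-based auxiliary
    main layer (the occlusion region), then additionally include normal pixels
    of the auxiliary image whose color differs strongly from the rendering of
    the representative main layer at the auxiliary viewpoint."""
    aux_main = aux_separation_flags == 0
    # structure information: pixels the representative main layer cannot cover
    layer = aux_main & rendered_hole_mask
    # color information: covered pixels whose color nevertheless differs strongly
    diff = np.abs(aux_image.astype(int)
                  - rendered_from_representative.astype(int)).max(axis=2)
    layer |= aux_main & ~rendered_hole_mask & (diff > color_threshold)
    return layer
```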
As described above, in the present embodiment, not only the structure information but also the color information is utilized for generation of the auxiliary main layer in the auxiliary image.
The subsequent processing (from step 1107 to step 1116) is the same as that in the first embodiment, and therefore, explanation is omitted here.
According to the present embodiment, by utilizing the color information in addition to the structure information for generation of the auxiliary main layer of the auxiliary image, rendering of the auxiliary main layer of the auxiliary image is also performed for regions having a change in color that cannot be expressed by rendering only the main layer of the representative image. Due to this, it is possible to achieve higher image quality.
Aspects of the present invention can also be realized by a computer of a system or apparatus (or devices such as a CPU or MPU) that reads out and executes a program recorded on a memory device to perform the functions of the above-described embodiment(s), and by a method, the steps of which are performed by a computer of a system or apparatus by, for example, reading out and executing a program recorded on a memory device to perform the functions of the above-described embodiment(s). For this purpose, the program is provided to the computer, for example, via a network or from a recording medium of various types serving as the memory device (e.g., a computer-readable medium).
While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.
This application claims the benefit of Japanese Patent Application No. 2012-201478, filed Sep. 13, 2012, which is hereby incorporated by reference herein in its entirety.