The present invention relates to an image processing apparatus and image processing method, and more particularly, to an image processing apparatus and image processing method configured to be able to generate high-quality synthetic-view images.
There exists view synthesis technology that generates images with arbitrary views. View synthesis technology generates 2D images for M views (where M > N) from 2D images for N views and their depth images (depth information).
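As a rough illustration of the idea, the following sketch warps a single scanline of a reference image to a nearby view position using a per-pixel disparity derived from depth. This is a hypothetical minimal example, not the method of any particular apparatus: real view synthesis warps from several reference views and fills disocclusions, and the names (`synthesize_view`, `baseline_scale`) are illustrative assumptions.

```python
def synthesize_view(ref_row, depth_row, baseline_scale):
    """Warp one scanline of a reference image toward a nearby view.

    ref_row:        list of pixel values at the reference view
    depth_row:      list of depth values (larger = nearer the camera)
    baseline_scale: horizontal shift per unit depth toward the target view
    """
    out = [None] * len(ref_row)   # None marks missing (disoccluded) pixels
    zbuf = [-1] * len(ref_row)    # depth buffer so nearer pixels win
    for x, (pix, d) in enumerate(zip(ref_row, depth_row)):
        tx = x + round(d * baseline_scale)  # nearer pixels shift farther
        if 0 <= tx < len(out) and d > zbuf[tx]:
            out[tx] = pix
            zbuf[tx] = d
    return out

row = [10, 20, 30, 40, 50]
depth = [0, 0, 2, 0, 0]                   # one near object at x = 2
print(synthesize_view(row, depth, 1))     # [10, 20, None, 40, 30]
```

Note that the near object leaves a hole (`None`) at its original position: the background it occluded is invisible from the reference view. Such disocclusions are exactly the missing information discussed below.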
An overview of view synthesis technology will be described with reference to
As illustrated in
In the example in
In actual applications, view synthesis technology is used in conjunction with compression technology. An exemplary configuration of a system combining view synthesis technology and compression technology is illustrated in
In the system in
The multiview video encoding apparatus 13 encodes the 2D images 11 for N views and the depth images 12 for N views in an Advanced Video Coding (AVC) format or Multiview Video Coding (MVC) format, and supplies them to a multiview video decoding apparatus 14.
The multiview video decoding apparatus 14 takes the encoded 2D images 11 for N views and depth images 12 for N views supplied from the multiview video encoding apparatus 13, decodes them in a format corresponding to the AVC format or MVC format, and supplies them to a view synthesizing apparatus 15.
The view synthesizing apparatus 15 uses the 2D images 11 and depth images 12 for N views obtained as a result of the decoding by the multiview video decoding apparatus 14 to generate synthetic-view images for (M−N) views. The view synthesizing apparatus 15 outputs 2D images for M views, which consist of the 2D images 11 for N views and the synthetic-view images for (M−N) views, as reconstructed 2D images 16 for M views.
Meanwhile, a method of encoding and decoding image data for multiple views is described in PTL 1, for example.
With the system in
The present invention has been devised in light of such circumstances, and is configured to enable the generation of high-quality synthetic-view images.
An image processing apparatus according to a first aspect of the present invention is an image processing apparatus provided with receiving means that receives residual information, which is the error between synthetic-view images generated using reference 2D images and depth information, and 2D images at the view synthesis positions of the synthetic-view images, encoding means that encodes the reference 2D images to generate an encoded stream, and transmitting means that transmits the residual information received by the receiving means, the depth information, and the encoded stream generated by the encoding means.
An image processing method according to the first aspect of the present invention corresponds to an image processing apparatus according to the first aspect of the present invention.
In the first aspect of the present invention, residual information is received, residual information being the error between synthetic-view images, which are generated using reference 2D images and depth information, and 2D images at the view synthesis positions of the synthetic-view images. The reference 2D images are encoded to generate an encoded stream. The residual information, the depth information, and the encoded stream are transmitted.
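The residual computation implied by the first aspect can be sketched as a per-pixel difference between the actual 2D image at the view synthesis position and the synthetic-view image. This is an assumed minimal formulation (the patent does not specify how the error is represented); treating a missing, disoccluded pixel as zero so that the residual carries the full actual value is likewise an assumption of this sketch.

```python
def compute_residual(actual, synthetic):
    """Per-pixel error between the actual image at the view synthesis
    position and the synthetic-view image. A None in the synthetic
    image marks a disoccluded pixel, treated here as zero so the
    residual carries the entire actual value."""
    return [a - (s if s is not None else 0)
            for a, s in zip(actual, synthetic)]

actual = [10, 20, 25, 40, 30]
synthetic = [10, 20, None, 40, 30]        # hole left by view synthesis
print(compute_residual(actual, synthetic))  # [0, 0, 25, 0, 0]
```

Where the synthesis was accurate the residual is zero, so in practice it compresses well; the nonzero entries concentrate at the disoccluded regions.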
An image processing apparatus according to a second aspect of the present invention is an image processing apparatus provided with receiving means that receives residual information and depth information, the residual information being the error between synthetic-view images generated using reference 2D images and the depth information, and 2D images at the view synthesis positions of the synthetic-view images, decoding means that decodes an encoded stream obtained as a result of encoding the reference 2D images, generating means that generates the synthetic-view images using the reference 2D images decoded by the decoding means and the depth information received by the receiving means, and residual information compensating means that adds the residual information received by the receiving means into the synthetic-view images generated by the generating means.
An image processing method according to the second aspect of the present invention corresponds to an image processing apparatus according to the second aspect of the present invention.
In the second aspect of the present invention, residual information and depth information are received, the residual information being the error between synthetic-view images generated using reference 2D images and the depth information, and 2D images at the view synthesis positions of the synthetic-view images. An encoded stream obtained as a result of encoding the reference 2D images is decoded, and the synthetic-view images are generated using the decoded reference 2D images and the received depth information. The received residual information is added into the generated synthetic-view images.
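The receive-side compensation of the second aspect can be sketched as adding the residual back into the synthetic-view image, pixel by pixel. This is an illustrative assumption-laden sketch: the clamp to 0..255 assumes 8-bit pixels, which the patent does not state, and `compensate` is an invented name.

```python
def compensate(synthetic, residual):
    """Add the received residual into the synthetic-view image,
    filling in the information missing after view synthesis."""
    out = []
    for s, r in zip(synthetic, residual):
        v = (s if s is not None else 0) + r
        out.append(max(0, min(255, v)))  # clamp to the assumed 8-bit range
    return out

synthetic = [10, 20, None, 40, 30]
residual = [0, 0, 25, 0, 0]
print(compensate(synthetic, residual))  # [10, 20, 25, 40, 30]
```

Under these assumptions, compensation exactly restores the actual image wherever the residual was computed against it, which is the source of the quality improvement claimed above.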
According to the first aspect of the present invention, it is possible to transmit information for generating high-quality synthetic-view images.
According to the second aspect of the present invention, it is possible to generate high-quality synthetic-view images.
Herein,
As illustrated in
Then, when conducting view synthesis, the input images 1 are used to generate synthetic-view images 2, and those synthetic-view images 2 are compensated by the residual information to generate final synthetic-view images 41, as illustrated in
In this way, in the present invention, residual information is added into synthetic-view images 2, thereby making it possible to compensate for missing information and generate high-quality synthetic-view images 41.
Note that in
In the system in
In order to acquire residual information with a residual information acquiring apparatus 103, a view synthesizing apparatus 102 uses the 2D images 11 for N views and the depth images 12 for N views to generate synthetic-view images for (M−N) views similarly to the view synthesizing apparatus 15 in
The residual information acquiring apparatus 103 calculates the error between the synthetic-view images for (M−N) views supplied from the view synthesizing apparatus 102 and the 2D images 101 for (M−N) views at the view synthesis positions, and takes the result as residual information. The residual information acquiring apparatus 103 supplies the residual information to a multiview video encoding apparatus 104.
The multiview video encoding apparatus 104 encodes the 2D images 11 for N views, the depth images 12 for N views, and the residual information supplied from the residual information acquiring apparatus 103 in an AVC format or MVC format. Then, the multiview video encoding apparatus 104 supplies the encoded stream obtained as a result of the encoding to a multiview video decoding apparatus 105.
The multiview video decoding apparatus 105 decodes the encoded stream supplied from the multiview video encoding apparatus 104 in a format corresponding to the AVC format or MVC format, and obtains 2D images 11 for N views, depth images 12 for N views, and residual information. The multiview video decoding apparatus 105 supplies the 2D images 11 for N views and depth images 12 for N views to the view synthesizing apparatus 15, and supplies the residual information to a residual information compensating apparatus 106.
The residual information compensating apparatus 106 adds the residual information supplied from the multiview video decoding apparatus 105 into synthetic-view images for (M−N) views generated by the view synthesizing apparatus 15, and compensates for the missing information in the synthetic-view images for (M−N) views. The residual information compensating apparatus 106 outputs the compensated synthetic-view images for (M−N) views and the 2D images 11 for N views supplied from the view synthesizing apparatus 15 as reconstructed 2D images 107 for M views. The reconstructed 2D images 107 for M views are used to display a stereoscopic image, for example, and a user is able to view the stereoscopic image without using glasses.
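The assembly of the reconstructed 2D images for M views can be pictured as interleaving the N decoded reference views with the (M−N) compensated synthetic views by view position. The following is purely illustrative; the ordering by position and the pair representation are assumptions of this sketch, not details given in the description.

```python
def assemble_views(reference, compensated):
    """reference and compensated are lists of (position, image) pairs;
    the output lists all M images ordered by view position."""
    return [img for _, img in sorted(reference + compensated)]

refs = [(0, "cam0"), (2, "cam2")]        # N = 2 decoded reference views
synth = [(1, "synth1"), (3, "synth3")]   # M - N = 2 compensated views
print(assemble_views(refs, synth))       # ['cam0', 'synth1', 'cam2', 'synth3']
```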
In step S11 of

In step S12, the residual information acquiring apparatus 103 calculates residual information between the synthetic-view images for (M−N) views supplied from the view synthesizing apparatus 102 and 2D images 101 for (M−N) views at the view synthesis positions. The residual information acquiring apparatus 103 supplies the residual information to a multiview video encoding apparatus 104.
In step S13, the multiview video encoding apparatus 104 encodes the 2D images 11 for N views, the depth images 12 for N views, and the residual information supplied from the residual information acquiring apparatus 103 in an AVC format or MVC format. Then, the multiview video encoding apparatus 104 supplies the encoded stream obtained as a result to the multiview video decoding apparatus 105.
In step S14, the multiview video decoding apparatus 105 decodes the encoded stream in a format corresponding to the AVC format or MVC format, the encoded stream being the encoded 2D images 11 for N views, depth images 12 for N views, and residual information supplied from the multiview video encoding apparatus 104. The multiview video decoding apparatus 105 then supplies the 2D images 11 for N views and depth images 12 for N views obtained as a result to the view synthesizing apparatus 15, and supplies the residual information to the residual information compensating apparatus 106.
In step S15, the view synthesizing apparatus 15 uses the 2D images 11 for N views and depth images 12 for N views supplied from the multiview video decoding apparatus 105 to conduct view synthesis for (M−N) views and generate synthetic-view images for (M−N) views. The view synthesizing apparatus 15 then supplies the synthetic-view images for (M−N) views and the 2D images 11 for N views to the residual information compensating apparatus 106.
In step S16, the residual information compensating apparatus 106 adds the residual information supplied from the multiview video decoding apparatus 105 to the synthetic-view images for (M−N) views generated by the view synthesizing apparatus 15, and compensates for the missing information in the synthetic-view images for (M−N) views.
In step S17, the residual information compensating apparatus 106 outputs the compensated synthetic-view images for (M−N) views and the 2D images 11 for N views supplied from the view synthesizing apparatus 15 as reconstructed 2D images 107 for M views. The process then ends.
Although the foregoing description encodes all of the 2D images 11 for N views, the depth images 12 for N views, and the residual information, the information other than the 2D images 11 for N views need not be encoded.
Additionally, it may be configured such that the multiview video encoding apparatus 104 also includes residual presence information indicating whether or not residual information exists for each synthetic-view image, and transmits this information to the multiview video decoding apparatus 105 together with the 2D images 11 for N views, the depth images 12 for N views, and the residual information.
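The residual presence information described above can be pictured as a one-bit flag per synthetic view telling the decoder whether a residual payload follows. The sketch below is hypothetical: the `(flag, payload)` stream layout and the function name `pack` are invented for illustration and are not specified in the description.

```python
def pack(view_positions, residuals):
    """Emit one (flag, payload) entry per synthetic view position;
    the payload is present only when the flag is set. residuals maps
    view position -> residual image, with flagless views absent."""
    stream = []
    for pos in view_positions:
        if pos in residuals:
            stream.append((1, residuals[pos]))  # residual present
        else:
            stream.append((0, None))  # decoder uses the synthesis as-is
    return stream

print(pack([1, 3], {3: [0, 5, 0]}))  # [(0, None), (1, [0, 5, 0])]
```

Signalling presence per view lets the encoder omit residuals for views where synthesis is already accurate, saving bits.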
Furthermore, it may also be configured such that the residual information transmitted to the multiview video decoding apparatus 105 together with the 2D images 11 for N views and the depth images 12 for N views is only residual information with respect to synthetic-view images at view synthesis positions farther outward than the views of the 2D images 11 for N views (in the example in
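The outward-only selection described above can be sketched as keeping residuals only for synthetic views whose positions lie outside the span of the reference views: interpolated (inner) views can typically be synthesized more accurately than extrapolated (outer) ones, so their residuals are the ones worth transmitting. This is an assumed formalization; the position representation is illustrative.

```python
def outward_positions(synthetic_positions, reference_positions):
    """Keep only synthetic-view positions outside the range spanned
    by the reference views (i.e., extrapolated rather than
    interpolated views)."""
    lo, hi = min(reference_positions), max(reference_positions)
    return [p for p in synthetic_positions if p < lo or p > hi]

# References at positions 1 and 5; synthetic views at 0, 2, 4, 6.
print(outward_positions([0, 2, 4, 6], [1, 5]))  # [0, 6]
```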
Note that in this specification, the term “system” refers to the entirety of equipment composed of a plurality of apparatuses.
In addition, embodiments of the present invention are not limited to the foregoing embodiments, and various modifications are possible within a scope that does not depart from the principal matter of the present invention.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2010-111561 | May 2010 | JP | national |
| Filing Document | Filing Date | Country | Kind | 371(c) Date |
|---|---|---|---|---|
| PCT/JP2011/060014 | 4/25/2011 | WO | 00 | 11/6/2012 |