The disclosed embodiments of the present invention relate to depth map generation, and more particularly, to a method and an apparatus for determining a perspective model by utilizing region-based analysis and/or temporal smoothing.
Since the success of three-dimensional (3D) movies such as Avatar, 3D playback has enjoyed growing popularity. Almost all television (TV) manufacturers have put 3D functionality into their high-end TV products. One important required 3D technique is 2D-to-3D conversion, which converts traditional two-dimensional (2D) videos into 3D ones. It is important because most content is still in the traditional 2D format. For a 2D monocular video input, objects and their geometric perspective information are estimated and modeled, and then a depth map is generated. With the produced depth map, depth image based rendering (DIBR) may be used to convert the original 2D monocular video into stereoscopic videos for the left and right eyes, respectively. In such a conventional processing flow, the most important issue is how to generate the depth map.
In order to correctly generate the depth map of the input 2D video, various cues are applied to estimate the depth information. Many conventional depth map generation methods have been proposed to retrieve the depth map using different combinations of those depth cues. The perspective information is considered to generate an initial perspective/global depth map which represents the perspective view of a scene, and most conventional methods require such an initial depth map. However, the conventional initial depth map often provides only a bottom-top perspective, which does not always represent the perspective of the environment, so a vanishing line or feature point is required to model a more complex perspective of the environment. One conventional algorithm may carry out the vanishing line/feature point detection by the Hough transform; however, this requires a time-consuming and computationally intensive full-frame pixel operation. Other conventional algorithms often produce an unstable vanishing line/feature point that jumps between frames, resulting in judder perceived in the created depth map.
In accordance with exemplary embodiments of the present invention, a method and an apparatus for determining a perspective model by utilizing region-based analysis and/or temporal smoothing are proposed, to solve the above-mentioned problems.
According to a first aspect of the present invention, an exemplary method for generating a target perspective model referenced for depth map generation is disclosed. The exemplary method includes: receiving a first input image; utilizing a region-based analysis unit for analyzing a plurality of regions in the first input image to extract image characteristics of the regions; and determining the target perspective model according to at least the image characteristics.
According to a second aspect of the present invention, an exemplary perspective model estimation apparatus for generating a target perspective model referenced for depth map generation is disclosed. The exemplary perspective model estimation apparatus includes a region-based analysis unit and a perspective model generation unit. The region-based analysis unit is arranged for receiving a first input image, and analyzing a plurality of regions in the first input image to extract image characteristics of the regions. The perspective model generation unit is arranged for determining the target perspective model according to at least the image characteristics.
According to a third aspect of the present invention, an exemplary method for generating a target perspective model referenced for depth map generation is disclosed. The exemplary method includes: receiving a first input image; determining a first perspective model in response to the first input image; and utilizing a perspective model generation unit for generating the target perspective model by a weighted sum of the first perspective model and at least one second perspective model.
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
Certain terms are used throughout the description and following claims to refer to particular components. As one skilled in the art will appreciate, manufacturers may refer to a component by different names. This document does not intend to distinguish between components that differ in name but not function. In the following description and in the claims, the terms “include” and “comprise” are used in an open-ended fashion, and thus should be interpreted to mean “include, but not limited to . . . ”. Also, the term “couple” is intended to mean either an indirect or direct electrical connection. Accordingly, if one device is electrically connected to another device, that connection may be through a direct electrical connection, or through an indirect electrical connection via other devices and connections.
The main concept of the present invention is to use region-based analysis and/or temporal smoothing to generate a perspective/global model referenced for depth map generation. Accordingly, a fast and effective perspective/global model analysis method is used to generate a depth map that presents the perspective of a scene, and a simplified algorithm is employed to output a stable perspective/global model. As the proposed method uses region-based analysis, the perspective/global model is generated with lower computational complexity. Moreover, the proposed temporal smoothing applied to the perspective/global model is capable of refining the perspective/global model to avoid judder of the generated depth map. Further details are described below.
Please refer to
After the image characteristics N11-N58 of the regions 201 are obtained, the perspective mapping unit 114 of the perspective model generation unit 113 is operative to determine the target perspective model Gfinal according to the image characteristics N11-N58. In this embodiment, the perspective mapping unit 114 selects a specific region with a specific image characteristic from the regions 201 of the input image IMG1, and refers to the selected specific region to determine the target perspective model Gfinal. For example, the specific image characteristic of the selected specific region is a maximum image characteristic among the image characteristics of the regions. Specifically, in a case where each image characteristic is a count number of edges found in each region, the perspective mapping unit 114 therefore selects a region R35 having a largest count number of detected edges (i.e., a maximum image characteristic N35), and sets a feature point VP according to the selected region R35. For example, as shown in
After the target perspective model Gfinal is obtained, the depth map rendering unit 104 renders the depth map MAPdepth in response to the target perspective model Gfinal. For example, the depth map rendering unit 104 may refer to the edge distribution of the input image IMG1 and the feature point VP derived from a center of the region R35 to render the depth map MAPdepth, where the feature point VP would have a smallest depth value which represents the farthest distance in the depth map MAPdepth.
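The region-based selection described above can be sketched as follows. This is a minimal Python sketch, not the disclosed implementation: the 5x8 grid mirrors the N11..N58 labeling in the text, while the function name and the use of a precomputed binary edge map are illustrative assumptions.

```python
import numpy as np

def select_feature_point(edge_map, grid_rows=5, grid_cols=8):
    """Split a binary edge map into grid regions, count the edges in
    each region, and return the center of the region with the largest
    count (the feature point VP in the text)."""
    h, w = edge_map.shape
    best_count, best_center = -1, None
    for r in range(grid_rows):
        for c in range(grid_cols):
            # integer region bounds, covering the frame exactly
            y0, y1 = r * h // grid_rows, (r + 1) * h // grid_rows
            x0, x1 = c * w // grid_cols, (c + 1) * w // grid_cols
            count = int(edge_map[y0:y1, x0:x1].sum())
            if count > best_count:
                best_count = count
                best_center = ((x0 + x1) // 2, (y0 + y1) // 2)
    return best_center, best_count
```

Because only per-region counts are compared, the cost is one pass over the edge map rather than a full-frame Hough transform.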
In a case where the value of the accumulated horizontal edge components is greater than the value of accumulated vertical edge components, a conventional horizontal mode may be employed for rendering the depth map MAPdepth such that the depth map MAPdepth would be an image having gradient in the vertical direction. In another case where the value of the accumulated vertical edge components is greater than the value of accumulated horizontal edge components, a conventional vertical mode may be employed for rendering the depth map MAPdepth such that the depth map MAPdepth would be an image having gradient in the horizontal direction. In yet another case where the value of the accumulated horizontal edge components is similar/identical to the value of accumulated vertical edge components, a conventional circle/perspective mode may be employed for rendering the depth map MAPdepth such that the depth map MAPdepth would be an image having gradient in the radial direction. As the present invention focuses on the derivation of the perspective/global model referenced for depth map generation, further description of the depth map rendering unit 104 is omitted here for brevity.
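The three rendering modes above can be sketched as a single mode decision. This is an illustrative assumption about how the comparison might be realized; the function name, the similarity ratio, and the linear/radial gradient shapes are not taken from the disclosure.

```python
import numpy as np

def render_global_depth(shape, horiz_edges, vert_edges, ratio=1.2,
                        feature_point=None):
    """Pick a gradient mode for the global depth map from the
    accumulated edge components, as outlined above. Depth runs from
    0 (farthest) to 1 (nearest edge of the gradient)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    if horiz_edges > ratio * vert_edges:
        # horizontal mode: gradient along the vertical direction
        depth = ys / (h - 1)
    elif vert_edges > ratio * horiz_edges:
        # vertical mode: gradient along the horizontal direction
        depth = xs / (w - 1)
    else:
        # circle/perspective mode: radial gradient centered on the
        # feature point, which gets the smallest (farthest) depth
        fx, fy = feature_point if feature_point else (w // 2, h // 2)
        r = np.hypot(xs - fx, ys - fy)
        depth = r / r.max()
    return depth
```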
The perspective model estimation apparatus 102 shown in
Please refer to
The major difference between the perspective model estimation apparatuses 102 and 402 is the perspective model generation unit 413, which supports temporal smoothing. The operation of the perspective mapping unit 414 is similar to that of the perspective mapping unit 114. However, whereas the output of the perspective mapping unit 114 directly acts as the target/final perspective model Gfinal, the output of the perspective mapping unit 414 acts as an initial perspective model Ginit to be further processed by the following temporal smoothing unit 416.
For example, the feature point VP derived from the image characteristics obtained by the region-based analysis unit 112 may act as an initial feature point, and the perspective mapping unit 414 maps the initial feature point to the initial perspective model Ginit. Next, the temporal smoothing unit 416 is operative to generate the target/final perspective model Gfinal by a weighted sum of the initial perspective model Ginit and at least one other predicted perspective model Gpred. For example, as shown in
In this embodiment, the temporal smoothing unit 416 is capable of referring to information propagated from previous or future input image(s) to refine the initial perspective model Ginit into the target perspective model Gfinal, thereby avoiding judder of the depth map MAPdepth rendered according to the target perspective model Gfinal. For example, the temporal smoothing unit 416 may be implemented using a smoothing filter operative to make the feature point have a smooth transition in different images, where coefficients (e.g., weighting factors) of the smoothing filter are set as constants or adjusted in response to the image content (e.g., the number of edges) or user preference. An exemplary temporal smoothing operation performed by the temporal smoothing unit 416 may be expressed using the following equation.
Gfinal(t1) = Wcurr*Ginit(t1) + Wprev*Gfinal(t0)   (1)
In above equation (1), Gfinal(t1) represents the final perspective model generated by the temporal smoothing unit 416 at time t1, Ginit(t1) represents the initial perspective model generated by the perspective mapping unit 414 at time t1, Gfinal(t0) represents the other predicted perspective model Gpred which is generated by the temporal smoothing unit 416 at time t0, and Wcurr and Wprev are weighting factors.
As the feature point is mapped to a perspective model, the locations of the feature point obey the following equation accordingly.
P1′ = Wcurr*P1 + Wprev*P0′   (2)
In above equation (2), P1′ represents the final feature point at time t1, P1 represents the initial feature point at time t1, and P0′ represents the final feature point at time t0. Please refer to
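Equations (1) and (2) can be sketched as a small stateful filter that carries the previous final feature point across frames. The class name and the constant weights are illustrative assumptions; as noted above, the weights may instead be adapted to image content.

```python
class TemporalSmoother:
    """Minimal sketch of equations (1)/(2): blend the current initial
    feature point P1 with the previous final point P0' to get P1'."""

    def __init__(self, w_curr=0.5, w_prev=0.5):
        self.w_curr, self.w_prev = w_curr, w_prev
        self.prev_final = None

    def update(self, p_init):
        if self.prev_final is None:      # first frame: no history yet
            self.prev_final = p_init
            return p_init
        x = self.w_curr * p_init[0] + self.w_prev * self.prev_final[0]
        y = self.w_curr * p_init[1] + self.w_prev * self.prev_final[1]
        self.prev_final = (x, y)
        return self.prev_final
```

With equal weights, a feature point that jumps between frames is pulled halfway back toward its previous smoothed location, which damps the judder described in the background section.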
The aforementioned temporal smoothing operation is for illustrative purposes only, and is not meant to be a limitation of the present invention. That is, using another weighting function to realize the temporal smoothing operation is feasible. For example, another exemplary temporal smoothing operation performed by the temporal smoothing unit 416 may be expressed using the following equation.
Gfinal(t1) = Wcurr*Ginit(t1) + Wadj*Gadj(t1) + Wprev*Gfinal(t0)   (3)
In above equation (3), Gadj(t1) represents an adjusted perspective model at time t1, and Wadj is a weighting factor. The same objective of generating a target/final perspective model through temporal smoothing is achieved.
In one exemplary design, the weighting factors (e.g., Wcurr, Wprev and/or Wadj) used for temporal smoothing may be constants. Alternatively, the weighting factors (e.g., Wcurr, Wprev and/or Wadj) used for temporal smoothing may be set according to image characteristics/image contents.
For example, the perspective refinement unit 602 sets the weighting factors of the initial perspective model Ginit and the at least one predicted perspective model Gpred to make the target perspective model Gfinal fit an edge boundary in the input image IMG1, where in one exemplary design, the perspective refinement unit 602 may set the weighting factors by utilizing a bilateral-based weighting function or a guided image filter. With a proper setting of the weighting factors (e.g., Wcurr, Wprev and/or Wadj), the final feature point P1′ is refined to an edge boundary to fit the pixel-level image details of the input image IMG1, as shown in
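A bilateral-based weighting function of the kind mentioned above combines a spatial term and an intensity (range) term, so that weights fall off across edges. The sketch below shows one such weight; the function name and the sigma values are illustrative assumptions, not the disclosed refinement unit.

```python
import math

def bilateral_weight(p, q, ip, iq, sigma_s=5.0, sigma_r=10.0):
    """Edge-aware weight between pixel positions p and q with
    intensities ip and iq: large only when the pixels are both close
    in space and similar in intensity, i.e., on the same side of an
    edge boundary."""
    ds2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2   # spatial distance
    dr2 = (ip - iq) ** 2                            # intensity difference
    return math.exp(-ds2 / (2 * sigma_s ** 2)) * \
           math.exp(-dr2 / (2 * sigma_r ** 2))
```

Using such weights to combine candidate feature-point locations biases the final feature point P1′ toward positions that respect pixel-level edge boundaries.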
Besides, the temporal smoothing may also be implemented with a Schmitt trigger-like filter to avoid sudden noise of the initial perspective model. Please refer to
For example, the control unit 802 may be implemented using a Schmitt trigger-like filter for accumulating a count number of successive same/similar target perspective models that are previously produced. When the control unit 802 indicates that the similarity satisfies a predetermined criterion (e.g., the count number is greater than a predetermined threshold), the temporal smoothing unit 416 is allowed to generate the target perspective model Gfinal by using the aforementioned temporal smoothing (i.e., a weighted sum of the initial perspective model Ginit and the at least one predicted perspective model Gpred). When the control unit 802 indicates that the similarity does not satisfy the predetermined criterion (e.g., the count number is not greater than the predetermined threshold), the perspective model generation unit 800 may directly utilize a previous target perspective model (e.g., Gpred) as the target perspective model Gfinal. An exemplary Schmitt trigger-like filtering employed by the perspective model generation unit 800 is illustrated in
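The Schmitt trigger-like gating described above can be sketched as follows. Everything here is an illustrative assumption (the function name, the similarity metric, and the threshold values): a new feature-point location is blended in only after it has persisted for more than `thresh` successive frames; until then the previous target is held, so a single noisy frame cannot disturb the model.

```python
def schmitt_gate(p_init, state, thresh=3, sim_dist=8.0,
                 w_curr=0.5, w_prev=0.5):
    """One frame of Schmitt-trigger-like control. `state` is a tuple
    (prev_target, last_init, count); returns (target, new_state)."""
    prev_target, last_init, count = state
    # accumulate the count of successive similar initial points
    if last_init is not None and \
            abs(p_init[0] - last_init[0]) + abs(p_init[1] - last_init[1]) <= sim_dist:
        count += 1
    else:
        count = 1
    if prev_target is None:            # first frame: no history
        target = p_init
    elif count > thresh:               # stable: apply temporal smoothing
        target = (w_curr * p_init[0] + w_prev * prev_target[0],
                  w_curr * p_init[1] + w_prev * prev_target[1])
    else:                              # transient: hold previous target
        target = prev_target
    return target, (target, p_init, count)
```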
It should be noted that part or all of the technical features proposed in the exemplary perspective model generation units 413, 600 and 800 may be employed in one perspective model generation unit, depending upon actual design requirement/consideration.
Step 1000: Start.
Step 1002: Receive a first input image (e.g., IMG1).
Step 1004: Analyze a plurality of regions in the first input image to extract image characteristics of the regions.
Step 1006: Determine a feature point according to a specific region selected from the regions by referring to the image characteristics. For example, a region having a largest number of edges included therein is selected, and a center of the selected region serves as the feature point.
Step 1008: Is temporal smoothing enabled? If yes, go to step 1010; otherwise, go to step 1014.
Step 1010: Map the feature point to a first perspective model (e.g., Ginit).
Step 1012: Generate the target perspective model (e.g., Gfinal) by a weighted sum of the first perspective model and at least one second perspective model (e.g., Gpred). For example, the at least one second perspective model may be derived from processing a second input image (e.g., IMG2) preceding or following the first input image. Go to step 1016.
Step 1014: Directly map the feature point to the target perspective model (e.g., Gfinal).
Step 1016: End.
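The steps above can be condensed into one function. This is a sketch under stated assumptions: regions are passed in as precomputed (center, edge count) pairs, and the perspective model is reduced to its feature point for illustration, so the mapping of steps 1010/1014 is implicit.

```python
def estimate_perspective_model(regions, prev_model=None,
                               temporal_smoothing=True,
                               w_curr=0.5, w_prev=0.5):
    """regions: list of ((x, y) center, edge_count) pairs from the
    region-based analysis (steps 1002-1004)."""
    # Step 1006: region with the largest edge count; its center is VP
    center, _ = max(regions, key=lambda rc: rc[1])
    init_model = center
    # Steps 1008/1012: optionally blend with the previous model
    if temporal_smoothing and prev_model is not None:
        return (w_curr * init_model[0] + w_prev * prev_model[0],
                w_curr * init_model[1] + w_prev * prev_model[1])
    # Step 1014: directly use the mapped model
    return init_model
```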
As a person skilled in the art can readily understand the operation of each step shown in
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.
This application claims the benefit of U.S. provisional application No. 61/579,669, filed on Dec. 23, 2011 and incorporated herein by reference.
U.S. Patent Documents

Number | Name | Date | Kind
---|---|---|---
5917937 | Szeliski | Jun 1999 | A |
6584219 | Yamashita et al. | Jun 2003 | B1 |
8606043 | Kwon | Dec 2013 | B2 |
20050053276 | Curti | Mar 2005 | A1 |
20070159476 | Grasnick | Jul 2007 | A1 |
20090196492 | Jung et al. | Aug 2009 | A1 |
20100080448 | Tam et al. | Apr 2010 | A1 |
20110044531 | Zhang et al. | Feb 2011 | A1 |
20110096832 | Zhang et al. | Apr 2011 | A1 |
20110188780 | Wang | Aug 2011 | A1 |
20110199458 | Hayasaka | Aug 2011 | A1 |
20110273531 | Ito | Nov 2011 | A1 |
20120039525 | Tian | Feb 2012 | A1 |
20120169847 | Lee et al. | Jul 2012 | A1 |
20130095920 | Patiejunas | Apr 2013 | A1 |
Foreign Patent Documents

Number | Date | Country
---|---|---
1495676 | May 2004 | CN |
102098526 | Jun 2011 | CN |
102104786 | Jun 2011 | CN |
102117486 | Jul 2011 | CN |
2011017310 | Feb 2011 | WO |
2011104151 | Sep 2011 | WO |
Other Publications

Entry
---
Harman, Rapid 2D to 3D Conversion, Stereoscopic Displays and Virtual Reality Systems IX, vol. 4660, pp. 78-86, 2002. |
Murata, A Real-Time 2-D to 3-D Image Conversion Technique Using Computed Image Depth, vol. 29, Issue 1, pp. 919-922, 1998 SID. |
Angot, A 2D to 3D video and image conversion technique based on a bilateral filter, vol. 7526, pp. 75260D-1-75260D-10, 2010 SPIE-IS&T. |
Prior Publication Data

Number | Date | Country
---|---|---
20130162631 A1 | Jun 2013 | US |
Related U.S. Application Data

Number | Date | Country
---|---|---
61579669 | Dec 2011 | US |