1. Field of the Invention
The present invention generally relates to mono-view depth estimation, and more particularly to a ground model for mono-view depth estimation.
2. Description of the Prior Art
When three-dimensional (3D) objects are mapped onto a two-dimensional (2D) image plane by perspective projection, such as in an image taken by a still camera or video captured by a video camera, a substantial amount of information, such as the 3D depth information, disappears because of the non-unique many-to-one transformation. Accordingly, an image point cannot uniquely determine its depth. Recovering or generating the 3D depth information is thus a challenging task that is crucial to reconstructing a full, or at least an approximate, 3D representation.
In mono-view depth estimation, depth may be obtained from the monoscopic spatial and/or temporal domain. The term “monoscopic” or “mono” is used herein to refer to a characteristic in which the left and right eyes see the same perspective view of a given scene. One known mono-view depth estimation method extracts the depth information from the degree of object motion, and is thus called a depth-from-motion method. An object with a higher degree of motion is assigned a smaller (or nearer) depth, and vice versa. Another conventional mono-view depth estimation method assigns a larger (or farther) depth to non-focused regions such as the background, and is thus called a depth-from-focus-cue method. A further conventional mono-view depth estimation method detects the vanishing point, i.e., the intersection of vanishing lines. Points approaching the vanishing point are assigned a larger (or farther) depth, and vice versa.
As very limited information may be obtained from the monoscopic spatio-temporal domain, the conventional methods mentioned above, unfortunately, cannot handle all of the scene contents of a real-world video/image. For the foregoing reason, a need has arisen to propose a novel depth estimation method suitable for a versatile mono-view video/image.
In view of the foregoing, it is an object of the present invention to provide a ground model method and system for mono-view depth estimation, which is capable of providing correct and versatile depth and of handling a relatively large variety of scenes whenever a depth diffusion region (DDR) is present or can be identified.
According to one embodiment, a two-dimensional (2D) image is first segmented into a number of objects. A DDR, such as, for example, the ground or a floor, is then detected among the objects. The DDR is generally a region, or a relatively planar region, that is approximately horizontal (e.g., a horizontal plane). The DDR is assigned a depth, for example a depth monotonically increasing from the bottom to the top of the DDR. An object connected to the DDR is assigned depth according to the depth of the DDR at the connected location. For example, the connected object is assigned the same depth as the DDR at the connected location.
In step 11, an input device 20 provides or receives one or more two-dimensional (2D) input images to be image/video processed in accordance with the embodiment of the present invention. The input device 20 may in general be an electro-optical device that maps 3D object(s) onto a 2D image plane by perspective projection. In one embodiment, the input device 20 may be a still camera that takes the 2D image, or a video camera that captures a number of image frames. The input device 20, in another embodiment, may be a pre-processing device that performs one or more digital image processing tasks, such as image enhancement, image restoration, image analysis, image compression or image synthesis. Moreover, the input device 20 may further include a storage device, such as a semiconductor memory or hard disk drive, which stores processed images from the pre-processing device. As discussed above, a relatively large amount of information, particularly the 3D depth information, is lost when 3D objects are mapped onto the 2D image plane, and therefore, according to a feature of the invention, the 2D image provided by the input device 20 is subjected to image/video processing through other blocks of the mono-view depth estimation system 200, which will be discussed below.
The input image/video is then processed, in step 12, by a segmentation unit 22 that partitions the input image into multiple regions, objects or segments. As used herein, the term “unit” denotes a circuit, a program module, or a combination of the two. In general, the method and system of the present invention may be implemented in whole or in part using software and/or firmware, including, for example, one or more of a computer, a microprocessor, a circuit, an Application Specific Integrated Circuit (ASIC), a programmable gate array device, or other hardware. The purpose of the segmentation is to change the representation of the image into something that is easier to assign depth to in the later steps. The pixels in the same region have similar characteristics, such as color, intensity or texture, while the pixels in adjacent regions have distinct characteristics. Step 12 may be performed using one of the conventional segmentation techniques, or may be performed using a segmentation technique to be developed in the future.
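As a hedged illustration only, the segmentation of step 12 might be sketched as follows. The function name `segment_image` and the coarse color-quantization approach are illustrative assumptions standing in for any conventional segmentation technique, not a specification of the claimed embodiment:

```python
import numpy as np

def segment_image(img, levels=4):
    """Partition an RGB image (H x W x 3, uint8) into regions of similar
    color by coarse quantization followed by 4-connected component
    labeling.  A simple stand-in for any conventional segmentation
    technique; ``levels`` is an illustrative parameter."""
    # Quantize each channel to a few levels so that similar colors merge.
    q = img.astype(np.int32) * levels // 256
    key = q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]

    h, w = key.shape
    labels = -np.ones((h, w), dtype=np.int32)
    next_label = 0
    for y in range(h):
        for x in range(w):
            if labels[y, x] >= 0:
                continue
            # Flood-fill one 4-connected region of identical quantized color.
            stack = [(y, x)]
            labels[y, x] = next_label
            while stack:
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w \
                            and labels[ny, nx] < 0 \
                            and key[ny, nx] == key[cy, cx]:
                        labels[ny, nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels  # one integer label per pixel; one label per segment
```

The integer label map produced here is the form assumed by the later steps, in which depth is assigned per segment rather than per pixel.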
In step 13, a depth diffusion region (DDR) is detected by a DDR detection unit 24. According to the disclosed ground model of the present embodiment, the DDR may be the ground (or earth), an ocean, flooring, or any other region or surface that is approximately horizontal (e.g., a horizontal plane). A horizontal plane having the same segmentation characteristics and a substantial area is, according to a feature of the invention, likely to be detected as the DDR.
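One possible heuristic for the DDR detection of step 13 is sketched below, purely as an illustration and under the assumption that a DDR, if present, is a large segment touching the bottom of the frame (where ground or flooring typically appears). The names `detect_ddr` and `min_area_frac` are invented for this sketch:

```python
import numpy as np

def detect_ddr(labels, min_area_frac=0.1):
    """Heuristic DDR detection on an integer label map: among the
    segments touching the bottom image row, pick the one with the
    largest area, provided it covers a substantial fraction of the
    frame.  Returns the segment label, or None if no DDR is found."""
    h, w = labels.shape
    best_label, best_area = None, 0
    for lab in np.unique(labels[h - 1]):       # segments on the bottom row
        area = int((labels == lab).sum())
        if area > best_area:
            best_label, best_area = lab, area
    if best_area < min_area_frac * h * w:
        return None                            # no substantial DDR present
    return int(best_label)
```

Returning `None` corresponds to the no branch of step 14, in which depth is assigned by a conventional method instead.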
When a DDR is identified (i.e., the yes branch of step 14), the DDR is assigned depth in step 15 by a DDR depth assignment unit 26. The depth assigned to the DDR (for example, the ground 32) may increase monotonically from the bottom to the top. According to one feature of the invention, the depth magnitude of the DDR can be inversely proportional to the vertical coordinate of a location on the DDR. The depth assignment of the DDR may be formulated as follows:
Depth_DDR(y) increases as y decreases
or
Depth_DDR(y) = k/y
where k is a constant and y is the vertical image coordinate, which is smallest at the top of the image.
In another embodiment, the depth assignment of the DDR may increase from the bottom to the top according to a non-linear profile, for example Depth_DDR(y) = k/y^2.
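The DDR depth assignment of step 15 can be sketched as follows, again only as an illustration. It assumes y is the row index counted downward from the top of the image (y >= 1), so that Depth_DDR(y) = k/y increases monotonically from the bottom of the DDR toward the top; `assign_ddr_depth` and the default value of k are illustrative:

```python
import numpy as np

def assign_ddr_depth(labels, ddr_label, k=256.0):
    """Assign depth to the DDR pixels as Depth_DDR(y) = k / y, where y
    is the row index counted downward from the top of the image
    (y >= 1).  Depth therefore increases monotonically from the bottom
    of the DDR (near) toward the top (far).  Non-DDR pixels keep a
    depth of zero here."""
    h, w = labels.shape
    depth = np.zeros((h, w), dtype=np.float64)
    rows = np.arange(1, h + 1, dtype=np.float64)   # y = 1 .. h, top to bottom
    row_depth = k / rows                           # Depth_DDR(y) = k / y
    # For the non-linear variant, use instead: row_depth = k / rows**2
    row_map = np.broadcast_to(row_depth[:, None], (h, w))
    ddr_mask = labels == ddr_label
    depth[ddr_mask] = row_map[ddr_mask]
    return depth
```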
Further, the depth of the object (or objects) connected to the DDR is assigned by the DDR depth assignment unit 26 according to the DDR depth at the connected site:
Depth_Obj = Depth_DDR(y_Obj)
Generally speaking, when a connected object rests or stands on the DDR (or the ground) at a connection point, the whole object is assigned the same depth as the DDR at that connected (or joined) point.
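This object depth assignment may be sketched as follows, as an illustration only and under the assumption that the connection point is the lowest row at which the object is vertically adjacent to the DDR; the name `assign_object_depth` is invented for this sketch:

```python
import numpy as np

def assign_object_depth(labels, ddr_depth, ddr_label):
    """For each non-DDR segment that is vertically adjacent to the DDR,
    assign the whole segment the DDR depth at the connection point,
    i.e. Depth_Obj = Depth_DDR(y_Obj)."""
    h, w = labels.shape
    out = ddr_depth.copy()
    for lab in np.unique(labels):
        if lab == ddr_label:
            continue
        mask = labels == lab
        connect_depth = None
        # Scan upward from the bottom for the row where the object
        # stands on the DDR (object pixel with a DDR pixel just below).
        for y in range(h - 2, -1, -1):
            touching = mask[y] & (labels[y + 1] == ddr_label)
            if touching.any():
                x = int(np.flatnonzero(touching)[0])
                connect_depth = ddr_depth[y + 1, x]  # DDR depth at the joint
                break
        if connect_depth is not None:
            out[mask] = connect_depth                # whole object, one depth
    return out
```

Objects not connected to the DDR are left unchanged here; in the described method they would fall to the no branch of step 14 and be handled by a conventional assignment.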
When no DDR is identified, or when an object is not connected to the DDR (i.e., the no branch of step 14), the image or partial image is assigned depth according to one of the conventional assignment methods or a technique to be developed in the future.
An output device 28 receives the depth map information (e.g., the final depth map) from the DDR depth assignment unit 26 and provides a resulting or output image. The output device 28, in one embodiment, may be a display device for presentation or viewing of the received depth information (e.g., depth map information). The output device 28, in another embodiment, may be a storage device, such as a semiconductor memory or hard disk drive, which stores the received depth information. Moreover, the output device 28 may further and/or alternatively include a post-processing device that performs one or more of digital image processing tasks, such as image enhancement, image restoration, image analysis, image compression or image synthesis.
According to the embodiments discussed above, the ground model method and system for mono-view depth estimation are capable of providing correct and versatile depth and of handling a relatively large variety of scenes whenever a DDR is present or can be determined or estimated.
Although specific embodiments have been illustrated and described, it will be appreciated by those skilled in the art that various modifications may be made without departing from the scope of the present invention, which is intended to be limited solely by the appended claims.