Certain embodiments of the invention relate to video processing. More specifically, certain embodiments of the invention relate to a method and system for utilizing multiple 3D source views for generating a 3D image.
Digital video capabilities may be incorporated into a wide range of devices such as, for example, digital televisions, digital direct broadcast systems, digital recording devices, and the like. Digital video devices may provide significant improvements over conventional analog video systems in processing and transmitting video sequences with increased bandwidth efficiency.
Video content may be recorded in two-dimensional (2D) format or in three-dimensional (3D) format. In various applications such as, for example, DVD movies and digital TV (DTV), 3D video is often desirable because it is more realistic to viewers than its 2D counterpart. A 3D video comprises a left view video and a right view video.
Various video encoding standards, for example, MPEG-1, MPEG-2, MPEG-4, MPEG-C part 3, H.263, H.264/MPEG-4 advanced video coding (AVC), multi-view video coding (MVC) and scalable video coding (SVC), have been established for encoding digital video sequences in a compressed manner. For example, the MVC standard, which is an extension of the H.264/MPEG-4 AVC standard, may provide efficient coding of a 3D video. The SVC standard, which is also an extension of the H.264/MPEG-4 AVC standard, may enable transmission and decoding of partial bitstreams to provide video services with lower temporal or spatial resolutions or reduced fidelity, while retaining a reconstruction quality that is similar to that achieved using the H.264/MPEG-4 AVC.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.
A system and/or method for utilizing multiple 3D source views for generating a 3D image, substantially as shown in and/or described in connection with at least one of the figures, as set forth more completely in the claims.
Various advantages, aspects and novel features of the present invention, as well as details of an illustrated embodiment thereof, will be more fully understood from the following description and drawings.
Certain embodiments of the invention can be found in a method and system for utilizing multiple 3D source views for generating a 3D image. In various embodiments of the invention, a monoscopic three-dimensional (3D) video generation device, which comprises one or more depth sensors, may be operable to capture a plurality of two-dimensional (2D) image frames and corresponding depth information of an object from a plurality of different viewing angles. The captured plurality of 2D image frames and the captured corresponding depth information may be utilized by the monoscopic 3D video generation device to generate 3D images of the object corresponding to one or more of the plurality of different viewing angles. In this regard, the monoscopic 3D video generation device may store the captured plurality of 2D image frames and the captured corresponding depth information. The plurality of 2D image frames may be captured via, for example, one or more image sensors in the monoscopic 3D video generation device. The corresponding depth information may be captured via, for example, the one or more depth sensors in the monoscopic 3D video generation device. The plurality of 2D image frames and the corresponding depth information may be captured while the monoscopic 3D video generation device is continuously changing positions with respect to the object. The changed positions may comprise, for example, positions above, below and/or around the object.
The monoscopic 3D video generation device may be operable to determine the one or more viewing angles for generating the 3D images of the object. One or more 3D models of the object corresponding to the determined one or more viewing angles may be generated by the monoscopic 3D video generation device utilizing the captured plurality of 2D image frames and the captured corresponding depth information. The monoscopic 3D video generation device may generate the 3D images of the object corresponding to the determined one or more viewing angles based on the generated one or more 3D models of the object. The monoscopic 3D video generation device may be configured to output the 3D images of the object to a display in the monoscopic 3D video generation device and/or output the 3D images of the object externally to a 3D video rendering device for rendering the 3D images of the object.
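For example, the captured data may be organized per viewing angle so that the captures relevant to a requested view can be retrieved when the 3D models are generated. The following is a minimal sketch of one such organization; the container and helper names (CapturedView, select_views) are illustrative assumptions rather than elements of the embodiments.

```python
# Illustrative sketch only: each capture pairs a 2D frame with a per-pixel
# depth map and the viewing angle at which it was taken.
from dataclasses import dataclass
import numpy as np

@dataclass
class CapturedView:
    frame: np.ndarray        # H x W x 3 color image from the image sensor(s)
    depth: np.ndarray        # H x W depth map from the depth sensor(s), in meters
    viewing_angle: float     # azimuth around the object, in degrees

def select_views(captures, requested_angle, max_separation=45.0):
    """Return the stored captures nearest to a requested viewing angle."""
    def angular_distance(a, b):
        d = abs(a - b) % 360.0
        return min(d, 360.0 - d)
    return [c for c in captures
            if angular_distance(c.viewing_angle, requested_angle) <= max_separation]
```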
The monoscopic 3D video camera 102 may comprise a processor 104, a memory 106, one or more depth sensors 108 and one or more image sensors 114. The monoscopic 3D or single-view video camera 102 may capture images via a single viewpoint corresponding to the lens 101c. In this regard, EM waves in the visible spectrum may be focused on one or more image sensors 114 by the lens 101c. The monoscopic 3D video camera 102 may also capture depth information via the lens 101c (and associated optics).
The processor 104 may comprise suitable logic, circuitry, interfaces, and/or code that may be operable to manage operation of various components of the monoscopic 3D video camera 102 and perform various computing and processing tasks.
The memory 106 may comprise, for example, DRAM, SRAM, flash memory, a hard drive or other magnetic storage, or any other suitable memory devices. For example, SRAM may be utilized to store data utilized and/or generated by the processor 104 and a hard-drive and/or flash memory may be utilized to store recorded image data and depth data.
The depth sensor(s) 108 may each comprise suitable logic, circuitry, interfaces, and/or code that may be operable to detect EM waves in the infrared spectrum and determine depth information based on reflected infrared waves. For example, depth information may be determined based on time-of-flight of infrared waves transmitted by an emitter (not shown) in the monoscopic 3D video camera 102 and reflected back to the depth sensor(s) 108. Depth information may also be determined using a structured light method, for example. In such instance, a pattern of light such as a grid of infrared waves may be projected at a known angle onto an object by a light source such as a projector. The depth sensor(s) 108 may detect the deformation of the light pattern such as the infrared light pattern on the object. Accordingly, depth information for a scene may be determined or calculated using, for example, a triangulation technique.
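For example, the two depth-measurement approaches described above may be summarized by the following minimal sketch: time-of-flight depth follows from half the round-trip distance of the reflected infrared wave, and structured-light depth follows from triangulation over the baseline between the light source and the depth sensor. The pinhole/baseline geometry and the numerical example are illustrative assumptions.

```python
# Minimal sketch of the two depth computations; constants and the simple
# projector/sensor geometry are assumptions for illustration only.
SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def depth_from_time_of_flight(round_trip_seconds):
    # The emitted infrared pulse travels to the object and back,
    # so the one-way distance is half the round-trip path.
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0

def depth_from_structured_light(focal_length_px, baseline_m, pattern_shift_px):
    # Triangulation over the light-source-to-sensor baseline: the observed
    # shift (deformation) of the projected pattern, in pixels, is inversely
    # proportional to depth.
    return focal_length_px * baseline_m / pattern_shift_px

# Example: a 20 ns round trip corresponds to roughly 3 meters.
print(depth_from_time_of_flight(20e-9))   # ~2.998 m
```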
The image sensor(s) 114 may each comprise suitable logic, circuitry, interfaces, and/or code that may be operable to convert optical signals to electrical signals. Each image sensor 114 may comprise, for example, a charge coupled device (CCD) image sensor or a complementary metal oxide semiconductor (CMOS) image sensor. Each image sensor 114 may capture brightness, luminance and/or chrominance information.
In exemplary operation, the monoscopic 3D video camera 102 may be operable to capture a plurality of 2D image frames and corresponding depth information of an object from a plurality of different viewing angles, utilizing the image sensor(s) 114 and the depth sensor(s) 108, respectively. The captured 2D image frames and the captured corresponding depth information may be stored in the memory 106. The processor 104 may be operable to determine one or more of the plurality of different viewing angles for generating 3D images of the object. One or more 3D models of the object corresponding to the determined one or more viewing angles may be generated by the processor 104 utilizing the captured 2D image frames and the captured corresponding depth information. The processor 104 may then generate the 3D images of the object corresponding to the determined one or more viewing angles based on the generated one or more 3D models of the object.
The image in the frame 134 is a conventional 2D image. A viewer of the frame 134 perceives the same depth between the viewer and each of the objects 138, 140 and 142. That is, each of the objects 138, 140, 142 appears to reside on the reference plane 132. The image in the frame 136 is a 3D image. A viewer of the frame 136 perceives the object 138 as being farthest from the viewer, the object 142 as being closest to the viewer, and the object 140 as being at an intermediate depth. In this regard, the object 138 appears to be behind the reference plane 132, the object 140 appears to be on the reference plane 132, and the object 142 appears to be in front of the reference plane 132.
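The perceived placement relative to the reference plane 132 may be related to screen parallax. The following worked example, which assumes a 65 mm eye separation and a 2 m viewing distance (values not taken from the source), shows zero parallax on the reference plane, positive (uncrossed) parallax behind it, and negative (crossed) parallax in front of it.

```python
# Simple worked example: screen parallax for a point at perceived distance Z
# when the reference plane (display) sits at distance D from the viewer.
def screen_parallax(eye_separation_m, screen_distance_m, point_distance_m):
    # Similar triangles: parallax is zero on the reference plane, positive
    # (uncrossed) behind it, and negative (crossed) in front of it.
    return eye_separation_m * (point_distance_m - screen_distance_m) / point_distance_m

E, D = 0.065, 2.0  # assumed 65 mm eye separation, 2 m viewing distance
print(screen_parallax(E, D, 4.0))   # ~ +0.0325 m: appears behind the plane (like object 138)
print(screen_parallax(E, D, 2.0))   #    0.0    m: appears on the plane (like object 140)
print(screen_parallax(E, D, 1.0))   # ~ -0.065  m: appears in front of the plane (like object 142)
```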
The monoscopic 3D video camera 202a may comprise suitable logic, circuitry, interfaces and/or code that may be operable to capture 2D image frames and corresponding depth information. The monoscopic 3D video camera 202a may be substantially similar to the monoscopic 3D video camera 102 described above.
In exemplary operation, the monoscopic 3D video camera 202a may be operable to capture a plurality of 2D image frames and corresponding depth information of the object 201 from a plurality of different viewing angles, while the monoscopic 3D video camera 202a is continuously changing camera positions as illustrated by the positions of the monoscopic 3D video cameras 202a-202d. In such instance, each of the monoscopic 3D video cameras 202a-202d may capture the 2D image frames and the corresponding depth information from a different viewing angle. The captured plurality of 2D image frames and the captured plurality of corresponding depth information may be stored by the monoscopic 3D video camera 202a. The stored 2D image frames and the stored corresponding depth information may then be utilized by the monoscopic 3D video camera 202a to generate 3D images of the object 201 corresponding to one or more of the plurality of different viewing angles. Exemplary generation of the 3D images of the object 201 corresponding to the one or more viewing angles is described below.
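For example, the captures taken from the different camera positions may be fused into a common representation of the object 201. The sketch below assumes a pinhole camera model and a known camera-to-world pose per capture; these assumptions, and the function names, are illustrative only and are not a specific method of the embodiments.

```python
# Hedged sketch: fusing depth maps captured from several positions into one
# point cloud, given assumed intrinsics (fx, fy, cx, cy) and per-capture poses.
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift an H x W depth map to 3D points in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

def fuse_point_clouds(captures, intrinsics):
    """Transform each capture's points into a common world frame and merge them."""
    fx, fy, cx, cy = intrinsics
    world_points = []
    for depth, rotation, translation in captures:  # camera-to-world pose per position
        points = backproject(depth, fx, fy, cx, cy)
        world_points.append(points @ rotation.T + translation)
    return np.concatenate(world_points, axis=0)
```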
Although a monoscopic 3D video camera 202a is illustrated in
In the exemplary embodiment of the invention illustrated in
In exemplary operation, the monoscopic 3D video camera 202a may be operable to determine one or more viewing angles for generating 3D images of the object 201. For example, a front view, a left view, a back view and a right view may be determined for generating the 3D images 206a-206d, respectively. In this regard, a front view model, a left view model, a back view model and a right view model of the object 201 may be generated by the monoscopic 3D video camera 202a, utilizing the captured plurality of 2D image frames and the captured corresponding depth information. The monoscopic 3D video camera 202a may then generate the 3D image 206a, which comprises the front view 201a, based on the front view model of the object 201. The 3D image 206b, which comprises the left view 201b, may be generated by the monoscopic 3D video camera 202a, based on the left view model of the object 201. The 3D image 206c, which comprises the back view 201c, may be generated by the monoscopic 3D video camera 202a, based on the back view model of the object 201. The 3D image 206d, which comprises the right view 201d, may be generated by the monoscopic 3D video camera 202a, based on the right view model of the object 201.
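For example, a fused 3D model may be oriented toward one of the determined viewing angles (front, left, back or right) before the corresponding 3D image is generated. The sketch below uses a simple rotation about the vertical axis; the angle convention and names are illustrative assumptions, not the specific models of the embodiments.

```python
# Illustrative only: orienting a fused model toward a requested view by
# rotating world-space points about the vertical (y) axis.
import numpy as np

VIEW_ANGLES = {"front": 0.0, "left": 90.0, "back": 180.0, "right": 270.0}

def rotate_to_view(points, view_name):
    """Rotate points so the requested side of the object faces the camera."""
    theta = np.radians(VIEW_ANGLES[view_name])
    rotation_y = np.array([[ np.cos(theta), 0.0, np.sin(theta)],
                           [ 0.0,           1.0, 0.0          ],
                           [-np.sin(theta), 0.0, np.cos(theta)]])
    return points @ rotation_y.T
```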
Although the 3D images 206a-206d corresponding to the front view, the left view, the back view and the right view are illustrated in
In the exemplary embodiment of the invention illustrated in
The processor 304 may comprise suitable logic, circuitry, interfaces, and/or code that may be operable to coordinate operation of various components of the monoscopic 3D video camera 300. The processor 304 may, for example, run an operating system of the monoscopic 3D video camera 300 and control communication of information and signals between components of the monoscopic 3D video camera 300. The processor 304 may execute code stored in the memory 306. In an exemplary embodiment of the invention, the processor 304 may be operable to generate one or more 3D models of an object, such as the object 201, corresponding to one or more viewing angles of the object 201. The one or more 3D models may be generated utilizing a plurality of 2D image frames and corresponding depth information stored in the memory 306, where the stored 2D image frames and the stored corresponding depth information may be captured from a plurality of different viewing angles of the object 201. In this regard, the processor 304 may be operable to generate 3D images of the object 201 corresponding to the one or more viewing angles based on the generated one or more 3D models of the object 201.
The memory 306 may comprise, for example, DRAM, SRAM, flash memory, a hard drive or other magnetic storage, or any other suitable memory devices. For example, SRAM may be utilized to store data utilized and/or generated by the processor 304 and a hard-drive and/or flash memory may be utilized to store recorded image data and depth data. In an exemplary embodiment of the invention, the memory 306 may be operable to store a plurality of 2D image frames and corresponding depth information of an object such as the object 201, where the 2D image frames and the corresponding depth information may be captured from a plurality of different viewing angles of the object 201. The memory 306 may also store one or more 3D models corresponding to one or more of the plurality of different viewing angles of the object 201, where the 3D model(s) may be generated by the processor 304.
The depth sensor(s) 308 may each comprise suitable logic, circuitry, interfaces, and/or code that may be operable to detect EM waves in the infrared spectrum and determine depth information based on reflected infrared waves. For example, depth information may be determined based on time-of-flight of infrared waves transmitted by the emitter 309 and reflected back to the depth sensor(s) 308. Depth information may also be determined using a structured light method, for example. In such instance, a pattern of light such as a grid of infrared waves may be projected at a known angle onto an object by a light source such as a projector. The depth sensor(s) 308 may detect the deformation of the light pattern such as the infrared light pattern on the object. Accordingly, depth information for a scene may be determined or calculated using, for example, a triangulation technique.
The image signal processor or image sensor processor (ISP) 310 may comprise suitable logic, circuitry, interfaces, and/or code that may be operable to perform complex processing of captured image data and captured corresponding depth data. The ISP 310 may perform a plurality of processing techniques comprising, for example, filtering, demosaic, Bayer interpolation, lens shading correction, defective pixel correction, white balance, image compensation, color transformation and/or post filtering.
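As an illustration of one of the listed stages, the following sketch implements a simple gray-world white balance; the actual processing performed by the ISP 310 is not limited to, or defined by, this example.

```python
# Minimal sketch of a white-balance stage under a gray-world assumption:
# scale each color channel so the average color of the frame is neutral gray.
import numpy as np

def gray_world_white_balance(rgb):
    """Apply per-channel gains so the frame's mean color becomes gray."""
    rgb = rgb.astype(np.float64)
    channel_means = rgb.reshape(-1, 3).mean(axis=0)
    gains = channel_means.mean() / channel_means
    return np.clip(rgb * gains, 0, 255).astype(np.uint8)
```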
The audio module 305 may comprise suitable logic, circuitry, interfaces and/or code that may be operable to perform various audio functions of the monoscopic 3D video camera 300. In an exemplary embodiment of the invention, the audio module 305 may perform noise cancellation and/or audio volume level adjustment for a 3D scene.
The video/audio encoder 307 may comprise suitable logic, circuitry, interfaces and/or code that may be operable to perform video encoding and/or audio encoding functions. For example, the video/audio encoder 307 may encode or compress captured 2D video images and corresponding depth information and/or audio data for transmission to a 3D video rendering device.
The video/audio decoder 317 may comprise suitable logic, circuitry, interfaces and/or code that may be operable to perform video decoding and/or audio decoding functions.
The error protection module 315 may comprise suitable logic, circuitry, interfaces and/or code that may be operable to perform error protection functions for the monoscopic 3D video camera 300. For example, the error protection module 315 may provide error protection to encoded 2D video images and corresponding depth information and/or encoded audio data for transmission to a 3D video rendering device.
The input/output (I/O) module 312 may comprise suitable logic, circuitry, interfaces, and/or code that may enable the monoscopic 3D video camera 300 to interface with other devices in accordance with one or more standards such as USB, PCI-X, IEEE 1394, HDMI, DisplayPort, and/or analog audio and/or analog video standards. For example, the I/O module 312 may be operable to send and receive signals from the controls 322, output video to the display 320, output audio to the speaker 311, handle audio input from the microphone 313, read from and write to cassettes, flash cards, solid state drives, hard disk drives or other external memory attached to the monoscopic 3D video camera 300, and/or output audio and/or video externally via one or more ports such as an IEEE 1394 port, an HDMI port and/or a USB port for transmission and/or rendering.
The image sensor(s) 314 may each comprise suitable logic, circuitry, interfaces, and/or code that may be operable to convert optical signals to electrical signals. Each image sensor 314 may comprise, for example, a charge coupled device (CCD) image sensor or a complementary metal oxide semiconductor (CMOS) image sensor. Each image sensor 314 may capture brightness, luminance and/or chrominance information.
The optics 316 may comprise various optical devices for conditioning and directing EM waves received via the lens 318. The optics 316 may direct EM waves in the visible spectrum to the image sensor(s) 314 and direct EM waves in the infrared spectrum to the depth sensor(s) 308. The optics 316 may comprise, for example, one or more lenses, prisms, luminance and/or color filters, and/or mirrors.
The lens 318 may be operable to collect and sufficiently focus electromagnetic (EM) waves in the visible and infrared spectra.
The display 320 may comprise an LCD display, an LED display, an organic LED (OLED) display and/or other digital display on which images recorded via the monoscopic 3D video camera 300 may be displayed. In an embodiment of the invention, the display 320 may be operable to display 3D images.
The controls 322 may comprise suitable logic, circuitry, interfaces, and/or code that may enable a user to interact with the monoscopic 3D video camera 300. For example, the controls 322 may enable the user to control recording and playback. In an embodiment of the invention, the controls 322 may enable the user to select whether the monoscopic 3D video camera 300 records in 2D mode or 3D mode.
The optical viewfinder 324 may enable a user to view or see what the lens 318 “sees,” that is, what is “in frame”.
In operation, the image sensor(s) 314 may capture brightness, luminance and/or chrominance information associated with a 2D video image frame and the depth sensor(s) 308 may capture corresponding depth information. In various embodiments of the invention, various color formats, such as RGB and YCrCb, may be utilized. The depth information may be stored in the memory 306 as metadata or as an additional layer of information, which may be utilized when rendering a 3D video image from the 2D image information.
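For example, the depth information may be carried together with the 2D image data as an additional layer. The layout below is one illustrative possibility, not a format specified by the source; a practical implementation may store the depth as separate metadata or at higher precision.

```python
# One possible (assumed) layout: pack the color frame and its depth map into a
# single H x W x 4 array so the depth layer travels with the 2D image.
import numpy as np

def pack_frame_with_depth(rgb_frame, depth_map):
    """Return an H x W x 4 array: three color channels plus one depth channel."""
    # Cast shown for simplicity; real storage may keep higher depth precision.
    depth_layer = depth_map.astype(rgb_frame.dtype)[..., np.newaxis]
    return np.concatenate([rgb_frame, depth_layer], axis=-1)

def unpack_frame_with_depth(packed):
    """Split the packed array back into the color frame and the depth layer."""
    return packed[..., :3], packed[..., 3]
```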
In an exemplary embodiment of the invention, the monoscopic 3D video camera 300 may be operable to capture a plurality of 2D image frames and corresponding depth information of the object 201 from a plurality of different viewing angles, utilizing the image sensor(s) 314 and the depth sensor(s) 308, respectively. In this regard, the 2D image frames and the corresponding depth information may be captured while the monoscopic 3D video camera 300 is continuously changing camera positions with respect to the object 201. The monoscopic 3D video camera 300 may store the captured 2D image frames and the captured corresponding depth information in the memory 306.
In an exemplary embodiment of the invention, the processor 304 may be operable to determine one or more of the plurality of different viewing angles for generating 3D images of the object 201. One or more 3D models of the object 201 corresponding to the determined one or more viewing angles may be generated by the processor 304 utilizing the captured 2D image frames and the captured corresponding depth information. The processor 304 may then generate the 3D images of the object 201 corresponding to the determined one or more viewing angles based on the generated one or more 3D models of the object 201. The monoscopic 3D video camera 300 may be configured to output, via the I/O module 312, the 3D images of the object 201 to the display 320. The monoscopic 3D video camera 300 may also be configured to output, via the I/O module 312, the 3D images of the object 201 externally to a 3D video rendering device for rendering the 3D images of the object 201.
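For example, a stereoscopic output suitable for the display 320 or an external 3D video rendering device may be formed by shifting pixels of a rendered view according to their depth. The following is a generic depth-image-based rendering sketch, not a specific method of the embodiments; hole filling and occlusion handling are omitted for brevity.

```python
# Hedged sketch: synthesize a left/right view pair from a rendered 2D view and
# its depth map by applying a depth-dependent horizontal shift (disparity).
import numpy as np

def synthesize_stereo_pair(image, depth, max_disparity_px=16):
    """Create left/right views by shifting columns according to normalized depth."""
    h, w = depth.shape
    near, far = depth.min(), depth.max()
    # Nearer pixels receive a larger disparity; farther pixels receive less.
    disparity = (max_disparity_px * (far - depth) / (far - near + 1e-6)).astype(int)
    left = np.zeros_like(image)
    right = np.zeros_like(image)
    cols = np.arange(w)
    for row in range(h):
        lc = np.clip(cols + disparity[row] // 2, 0, w - 1)
        rc = np.clip(cols - disparity[row] // 2, 0, w - 1)
        left[row, lc] = image[row, cols]
        right[row, rc] = image[row, cols]
    return left, right
```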
In various embodiments of the invention, a monoscopic 3D video generation device such as the monoscopic 3D video camera 300 may comprise one or more depth sensors 308. The monoscopic 3D video camera 300 may be operable to capture a plurality of two-dimensional (2D) image frames and corresponding depth information of an object such as the object 201 from a plurality of different viewing angles. The captured plurality of 2D image frames and the captured corresponding depth information may be utilized by a processor 304 in the monoscopic 3D video camera 300 to generate 3D images of the object 201 corresponding to one or more of the plurality of different viewing angles. In this regard, the captured plurality of 2D image frames and the captured corresponding depth information may be stored in the memory 306 in the monoscopic 3D video camera 300. The plurality of 2D image frames may be captured via, for example, one or more image sensors 314 in the monoscopic 3D video camera 300. The corresponding depth information may be captured via, for example, the one or more depth sensors 308. The plurality of 2D image frames and the corresponding depth information may be captured while the monoscopic 3D video camera 300 is continuously changing positions with respect to the object 201. The changed positions may comprise, for example, positions above, below and/or around the object 201.
The processor 304 in the monoscopic 3D video camera 300 may be operable to determine the one or more viewing angles for generating the 3D images of the object 201. One or more 3D models of the object 201 corresponding to the determined one or more viewing angles may be generated by the processor 304 utilizing the captured plurality of 2D image frames and the captured corresponding depth information. The processor 304 may generate the 3D images of the object 201 corresponding to the determined one or more viewing angles based on the generated one or more 3D models of the object 201. The monoscopic 3D video camera 300 may be configured to output, via an I/O module 312, the 3D images of the object 201 to a display 320 in the monoscopic 3D video camera 300. The monoscopic 3D video camera 300 may also be configured to output, via the I/O module 312, the 3D images of the object 201 externally to a 3D video rendering device for rendering the 3D images of the object 201.
Other embodiments of the invention may provide a non-transitory computer readable medium and/or storage medium, and/or a non-transitory machine readable medium and/or storage medium, having stored thereon, a machine code and/or a computer program having at least one code section executable by a machine and/or a computer, thereby causing the machine and/or computer to perform the steps as described herein for utilizing multiple 3D source views for generating a 3D image.
Accordingly, the present invention may be realized in hardware, software, or a combination of hardware and software. The present invention may be realized in a centralized fashion in at least one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods described herein is suited. A typical combination of hardware and software may be a general-purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods described herein.
The present invention may also be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein, and which when loaded in a computer system is able to carry out these methods. Computer program in the present context means any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
While the present invention has been described with reference to certain embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the scope of the present invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the present invention without departing from its scope. Therefore, it is intended that the present invention not be limited to the particular embodiment disclosed, but that the present invention will include all embodiments falling within the scope of the appended claims.
This patent application makes reference to, claims priority to, and claims benefit from: U.S. Provisional Application Ser. No. 61/377,867, which was filed on Aug. 27, 2010; and U.S. Provisional Application Ser. No. 61/439,119, which was filed on Feb. 3, 2011. This application also makes reference to: U.S. Patent Application Ser. No. 61/439,193 filed on Feb. 3, 2011; U.S. patent application Ser. No. ______ (Attorney Docket No. 23461US03) filed on Mar. 31, 2011; U.S. Patent Application Ser. No. 61/439,274 filed on Feb. 3, 2011; U.S. patent application Ser. No. ______ (Attorney Docket No. 23462US03) filed on Mar. 31, 2011; U.S. Patent Application Ser. No. 61/439,283 filed on Feb. 3, 2011; U.S. patent application Ser. No. ______ (Attorney Docket No. 23463US03) filed on Mar. 31, 2011; U.S. Patent Application Ser. No. 61/439,130 filed on Feb. 3, 2011; U.S. patent application Ser. No. ______ (Attorney Docket No. 23464US03) filed on Mar. 31, 2011; U.S. Patent Application Ser. No. 61/439,290 filed on Feb. 3, 2011; U.S. patent application Ser. No. ______ (Attorney Docket No. 23465US03) filed on Mar. 31, 2011; U.S. Patent Application Ser. No. 61/439,297 filed on Feb. 3, 2011; U.S. patent application Ser. No. ______ (Attorney Docket No. 23467US03) filed on Mar. 31, 2011; U.S. Patent Application Ser. No. 61/439,201 filed on Feb. 3, 2011; U.S. Patent Application Ser. No. 61/439,209 filed on Feb. 3, 2011; U.S. Patent Application Ser. No. 61/439,113 filed on Feb. 3, 2011; U.S. patent application Ser. No. ______ (Attorney Docket No. 23472US03) filed on Mar. 31, 2011; U.S. Patent Application Ser. No. 61/439,103 filed on Feb. 3, 2011; U.S. patent application Ser. No. ______ (Attorney Docket No. 23473US03) filed on Mar. 31, 2011; U.S. Patent Application Ser. No. 61/439,083 filed on Feb. 3, 2011; U.S. patent application Ser. No. ______ (Attorney Docket No. 23474US03) filed on Mar. 31, 2011; U.S. Patent Application Ser. No. 61/439,301 filed on Feb. 3, 2011; and U.S. patent application Ser. No. ______ (Attorney Docket No. 23475US03) filed on Mar. 31, 2011. Each of the above stated applications is hereby incorporated herein by reference in its entirety.