This application includes subject matter protected by copyright. All rights are reserved.
1. Technical Field
This disclosure relates generally to technologies, methods, and apparatus for stereoscopic 3D content conversion and display.
2. Background of the Related Art
Stereopsis is the process in visual perception leading to the sensation of depth from two slightly different projections of the world onto the retina of each eye. The differences in the two retinal images are referred to as binocular disparity.
It is desirable to be able to convert two-dimensional (“2D”) monoscopic content, such as video content, to three-dimensional (“3D”) stereoscopic content, in particular by creating a pair of left/right images for each original (source) 2D video frame. Such images can then be used for various display purposes, e.g., in auto-multiscopic displays. Auto-multiscopy is a method of displaying three-dimensional (3D) images that can be viewed without requiring the viewer to use special headgear or glasses. This display method produces depth perception in the viewer, even though the image is produced by a flat device. Several technologies exist for auto-multiscopic 3D displays, such as flat-panel solutions that use lenticular lenses. If the viewer positions his or her head in certain viewing positions, he or she will perceive a different image with each eye, thus providing a stereo image.
This disclosure provides an automated method for producing 3D stereoscopic image pairs (left and right) from a single image source, such as a 2D video frame. The resulting 3D stereoscopic video content can then be displayed as 3D content, e.g., on a compatible 3D display (such as a 3D TV), or stored (e.g., on permanent storage) for further processing (e.g., by a video editing process).
The disclosed technique preferably is computationally efficient and thus optimized to allow its use as part of a real-time 2D-to-3D video conversion solution. In one embodiment, the 3D stereoscopic image(s) are generated from a single source image by creating a pair of “luminosity maps” (or, more generally, data sets or structures) to assist in the separation of the various elements of the 2D image. As used herein, a “luminosity map” identifies a set of absolute difference values that are generated by processing a source image frame. Each absolute difference value represents the absolute difference between a given pixel and the pixel located a luminosity patch offset number of pixels to the left or to the right of the given pixel. Preferably, the luminosity maps are used in a volumetric stereopsis creation routine (a machine-implemented process) to generate disparity information between the illuminated elements in a more natural way than is possible using other known approaches (e.g., a depth map approach). The disparity information generated in this manner is then used to generate left and right images that can be combined (e.g., using known 3D stereo encoding techniques, such as side-by-side, top-bottom, frame sequential, or other formats) to form the stereo pair.
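By way of illustration only, the following Python sketch (using NumPy) shows one way the left and right luminosity values might be computed for a single scan line. The function name, the NumPy-based formulation, and the clamping of out-of-frame neighbors to the border pixel are assumptions made for this example, not requirements of the technique itself.

```python
import numpy as np

def luminosity_maps_for_line(line, patch_offset):
    """Compute left/right luminosity values for one scan line.

    line         -- 1-D array of pixel (or pixel-component) values
    patch_offset -- the luminosity patch offset, in pixels
    """
    line = line.astype(np.int32)               # avoid uint8 wrap-around
    idx = np.arange(line.size)
    west = line[np.clip(idx - patch_offset, 0, line.size - 1)]
    east = line[np.clip(idx + patch_offset, 0, line.size - 1)]
    left_map = np.abs(line - west)             # differences to the left
    right_map = np.abs(line - east)            # differences to the right
    return left_map, right_map
```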
The automatic conversion technique generates a realistic separation of the elements of the 2D image more efficiently than prior art approaches, improving the relative depth perception while preserving the resolution of High Definition (HD) video content (e.g., 1080p or higher). This automatic conversion technique can also be used as part of an off-line 3D conversion process to reduce, and sometimes eliminate, the need to apply additional transformations to the original 2D image to create a more elaborate 3D image.
The foregoing has outlined some of the more pertinent features of the invention. These features should be construed to be merely illustrative. Many other beneficial results can be attained by applying the disclosed invention in a different manner or by modifying the invention as will be described.
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
As noted, image capture using a camera (such as illustrated in
By way of additional background, the volumetric stereopsis creation process of this disclosure preferably leverages the notion of specular reflection, wherein the angle of reflection (θr) of a light source equals the angle of incidence (θi), with the incident ray, the reflected ray, and the surface normal lying in a common plane. This property is illustrated in
Referring now to
Preferably, the luminosity patch offset value also is based on the actual width of the object, which may be determined via a typical edge detection technique such as a Canny edge detector, a differential edge detector, or the like. Although in one embodiment the edge detection method is applied on a per-line basis, this is not a limitation, as it is also possible to apply the edge detection technique across multiple lines. The wider an object is within a video frame, the larger the luminosity patch offset value should be to generate a luminosity map that more closely mimics the natural stereopsis of the human brain. To handle every desired object in an image frame, preferably the luminosity patch offset is adjusted dynamically while processing each pixel (and its components), in particular by incorporating an additional edge detection step prior to the generation of the luminosity map. This preliminary step is used (optionally) to define the luminosity patch offset as a fraction of the width of each detected object. By default, the luminosity patch offset is adjusted to a fixed percentage (usually configured between a minimum and a maximum percentage) of the detected width, but in one implementation this percentage may be controlled by an operator via a user interface, or programmatically.
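For concreteness, the following sketch illustrates how a per-pixel luminosity patch offset might be derived for one scan line from detected object widths. A simple gradient threshold stands in for the Canny or differential edge detectors named above, and all numeric defaults (the percentage, its bounds, and the gradient threshold) are illustrative assumptions only.

```python
import numpy as np

def dynamic_patch_offsets(line, pct=0.10, min_off=1, max_off=32):
    """Derive a per-pixel luminosity patch offset for one scan line.

    Each pixel's offset is a fixed percentage (pct) of the width of
    the edge-to-edge span containing it, clamped to [min_off, max_off].
    """
    grad = np.abs(np.diff(line.astype(np.int32)))
    edges = np.flatnonzero(grad > 24)            # crude edge positions
    bounds = np.concatenate(([0], edges, [line.size - 1]))
    offsets = np.empty(line.size, dtype=np.int32)
    for lo, hi in zip(bounds[:-1], bounds[1:]):  # spans between edges
        width = max(hi - lo, 1)                  # detected object width
        offsets[lo:hi + 1] = int(np.clip(pct * width, min_off, max_off))
    return offsets
```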
As can be seen in
Preferably, the volumetric stereopsis creation process is executed separately for each component (RGB) of a pixel to avoid unwanted artifacts. This means that the luminosity patch offset as described preferably is applied to each component of a given pixel. For performance reasons, especially when the volumetric stereopsis creation process is used as part of a real-time 3D conversion solution, it is desirable to apply the process uniformly to each sub-component of the pixel in a single pass. The luminosity map value for each pixel component preferably is determined by subtracting from that component the value of the corresponding component of the pixel to its left for the WEST side, and the value of the corresponding component of the pixel to its right for the EAST side. This approach helps to enhance the details of certain elements of an image without the need to manually manipulate each pixel.
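The per-component, single-pass processing just described might be expressed as follows; this is a vectorized sketch over an H x W x 3 frame, again with border clamping assumed, and with absolute differences used per the luminosity map definition given earlier.

```python
import numpy as np

def luminosity_maps_rgb(frame, patch_offset):
    """Per-component luminosity maps for an H x W x 3 frame, computed
    uniformly for all three channels in a single vectorized pass."""
    f = frame.astype(np.int32)
    h, w, _ = f.shape
    idx = np.arange(w)
    west = f[:, np.clip(idx - patch_offset, 0, w - 1), :]  # WEST side
    east = f[:, np.clip(idx + patch_offset, 0, w - 1), :]  # EAST side
    return np.abs(f - west), np.abs(f - east)
```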
The volumetric stereopsis creation routine works as follows. For each frame in a movie or sequence of images to be processed, preferably each line of each frame is processed in sequence. The luminosity map is calculated for each pixel to determine the edge of each object and to enhance the edges by increasing the specular reflection as needed (based on the width of the luminosity patch offset), until each pixel of each desired image in the movie or sequence has been processed. Using the original source image (
As a result of executing the volumetric stereopsis creation routine, the left image (such as shown in
If desired, the 2D-to-3D volumetric stereopsis creation process is parameterized to adjust the 3D effects for each video frame (or image). One embodiment uses parameters such as the luminosity patch offset (including the minimum and maximum percentages that bound its dynamic adjustment) and the color threshold applied when generating the stereo pair.
The code listing in
Generally, and as has been described, the process determines a difference between the leftmost pixel and the rightmost pixel of a luminosity patch whose width is determined by the configurable luminosity patch offset parameter. When the difference is zero, the luminosity patch is still within the boundaries of an object and there are no edges to enhance. A small difference may represent a fluctuation of shade (or color) inside an object, while a larger difference indicates an edge of the object being illuminated (which edge should therefore be further illuminated). By enhancing the edges (and therefore the perceived volume of the object with specular reflection) in this manner, the image mimics the way the left eye and the right eye perceive a given object from slightly different angles, thereby helping to trigger stereopsis in the human brain. If desired, this stereopsis can be further enhanced by disparity techniques that are beyond the scope of this disclosure.
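Because the code listing itself appears only in the drawings, the decision logic just described is sketched below in Python; the tolerance separating an in-object shade fluctuation from an edge is an assumed tuning constant, not a value taken from this disclosure.

```python
def classify_patch_difference(diff, shade_tol=8):
    """Interpret the difference between the leftmost and rightmost
    pixels of a luminosity patch."""
    if diff == 0:
        return "inside-object"      # still within the object; no edge
    if abs(diff) <= shade_tol:
        return "shade-fluctuation"  # shading/color change inside object
    return "edge"                   # object edge: enhance specular reflection
```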
The processing loop defined in
Preferably, the processing loop in the code listing computes the luminosity values for each pixel independently of the values computed for any other pixel.
In other words, preferably the luminosity process described above operates on a per-pixel basis and neither uses nor requires any knowledge about the calculated luminosity values of other pixels.
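Because no pixel's luminosity values depend on those of any other pixel, scan lines can be processed in any order, including concurrently. The following sketch (which reuses the hypothetical luminosity_maps_for_line() function from the earlier example and assumes a single-component, grayscale frame) illustrates one such row-parallel arrangement; the thread-pool approach and worker count are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def process_frame_parallel(frame, patch_offset, n_workers=4):
    """Compute per-row luminosity maps concurrently."""
    rows = list(frame)                      # one 1-D array per scan line
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(
            lambda r: luminosity_maps_for_line(r, patch_offset), rows))
    left = np.stack([lm for lm, _ in results])
    right = np.stack([rm for _, rm in results])
    return left, right
```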
Thus, according to this disclosure, for each pixel (of each frame of a 2D source image), a left value is generated, as well as a right value. The left values for all pixels in the frame are collected and comprise the left luminosity map, and the right values for all pixels in the frame are collected and comprise the right luminosity map.
In a preferred embodiment, and for each pixel in a current frame of the source image, the value of the corresponding pixel in the left image of a stereoscopic pair is obtained by calculating the difference between the left luminosity value and the right luminosity value. The resulting difference is then multiplied by a color threshold to eliminate any potential brightness problem and then added to the value of the pixel in the current frame to generate the corresponding pixel in the left image of a stereoscopic pair. The value of the corresponding pixel in the right image of a stereoscopic pair is obtained by calculating the difference between the right luminosity value and the left luminosity value. The resulting difference is multiplied by a color threshold to eliminate any potential brightness problem and then added to the value of the pixel in the current frame to generate the corresponding pixel in the right image of the stereoscopic pair.
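The pixel-level arithmetic of the preceding paragraph might be rendered as follows; the default color threshold value is assumed for illustration, and the final clip to the displayable 0..255 range is an implementation detail not spelled out above.

```python
import numpy as np

def stereo_pair_from_maps(frame, left_map, right_map, color_threshold=0.25):
    """Build the left and right images of the stereoscopic pair.

    left  pixel = source + color_threshold * (left_lum  - right_lum)
    right pixel = source + color_threshold * (right_lum - left_lum)
    """
    src = frame.astype(np.float32)
    delta = color_threshold * (left_map.astype(np.float32)
                               - right_map.astype(np.float32))
    left = np.clip(src + delta, 0, 255).astype(np.uint8)
    right = np.clip(src - delta, 0, 255).astype(np.uint8)  # (R - L) = -delta
    return left, right
```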
The above-described technique can be optimized to operate within a single processing loop so that storage of the luminosity maps is obviated.
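A fused variant along the following lines is a minimal sketch of that optimization: each output pixel is produced directly from the source line, so neither luminosity map is ever materialized. Border clamping and the default color threshold remain assumptions.

```python
import numpy as np

def convert_line_single_pass(line, patch_offset, color_threshold=0.25):
    """Produce left/right output pixels in one loop, with no stored maps."""
    n = line.size
    src = line.astype(np.float32)
    left_out = np.empty(n, dtype=np.uint8)
    right_out = np.empty(n, dtype=np.uint8)
    for x in range(n):
        west = src[max(x - patch_offset, 0)]
        east = src[min(x + patch_offset, n - 1)]
        l_lum = abs(src[x] - west)              # left luminosity value
        r_lum = abs(src[x] - east)              # right luminosity value
        d = color_threshold * (l_lum - r_lum)
        left_out[x] = int(np.clip(src[x] + d, 0, 255))
        right_out[x] = int(np.clip(src[x] - d, 0, 255))
    return left_out, right_out
```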
The disclosed technique may be used in a number of applications. One such application is a 3D conversion device (a “3D box”) that can accept multiple 3D formats over a standard video interface. The 3D conversion box implements the above-described technique. For instance, version 1.4 of the HDMI specification defines the following formats: Full resolution Side-by-Side, Half resolution Side-by-Side, Frame alternative (used for shutter-glasses solutions), Field alternative, Left+depth, and Left+depth+Graphics+Graphics depth.
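As one example of these formats, a half-resolution Side-by-Side frame can be assembled from a stereo pair as sketched below; dropping every other column is the crudest possible horizontal downscale and is used here purely for illustration (a real implementation would typically filter before decimating).

```python
import numpy as np

def pack_half_sbs(left, right):
    """Pack a stereo pair into one half-resolution Side-by-Side frame."""
    half_l = left[:, ::2]     # keep even columns of the left image
    half_r = right[:, ::2]    # keep even columns of the right image
    return np.concatenate([half_l, half_r], axis=1)
```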
A 3D box may be implemented in two (2) complementary versions, as shown in
A representative hardware platform for delivering the above 3D box is based on the use of a digital signal processor/field-programmable gate array (DSP/FPGA) platform with the required processing capabilities. To allow for the embedding of this capability in a variety of devices including, but not limited to, an auto-multiscopic display, the DSP/FPGA may be assembled as a module 1800 as shown in
The above-described hardware and/or software systems in which the technique for producing 3D stereoscopic image pairs (left and right) from a single image source are implemented are merely representative. The described functionality may be practiced, typically in software, on one or more computing machines. Generalizing, a computing machine typically comprises commodity hardware and software, storage (e.g., disks, disk arrays, and the like) and memory (RAM, ROM, and the like). An apparatus for carrying out the computation (or portions thereof) comprises a processor, and computer memory holding computer program instructions executed by the processor for carrying out the one or more described operations. The particular machines used in a system of this type are not a limitation. One or more of the above-described functions or operations may be carried out by processing entities that are co-located or remote from one another. A given machine includes network interfaces and software to connect the machine to a network in the usual manner. A machine may be connected or connectable to one or more networks or devices, including display devices. More generally, the above-described functionality is provided using a set of one or more computing-related entities (systems, machines, processes, programs, libraries, functions, or the like) that together facilitate or provide the inventive functionality described above. A representative machine is a network-based data processing system running commodity hardware, an operating system, an application runtime environment, and a set of applications or processes that provide the functionality of a given system or subsystem. As described, the product or service may be implemented in a standalone server, or across a distributed set of machines.
The functionality may be integrated into a camera or other image capture or processing device/apparatus/machine, an audiovisual player/system, an audio/visual receiver, or any other such system, sub-system or component. As illustrated and described, the functionality (or portions thereof) may be implemented in a standalone device or component.
While the above describes a particular order of operations performed by certain embodiments, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.
While given components of the system have been described separately, one of ordinary skill will appreciate that some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like.
Having described our invention, what we now claim is set forth below.
This application is based on and claims priority from Ser. No. 61/385,319, filed Sep. 22, 2010.