This application includes subject matter protected by copyright. All rights are reserved.
1. Technical Field
This disclosure relates generally to auto-stereoscopic 3D display technologies and methods.
2. Background of the Related Art
Stereopsis is the process in visual perception leading to the sensation of depth from two slightly different projections of the world onto the retina of each eye. The differences in the two retinal images are referred to as binocular disparity.
Auto-multiscopy is a method of displaying three-dimensional (3D) images that can be viewed without the use of special headgear or glasses by the viewer. This display method produces depth perception in the viewer, even though the image is produced by a flat device. Several technologies exist for auto-multiscopic 3D displays, such as flat-panel solutions that use lenticular lenses. If the viewer positions his or her head in certain viewing positions, he or she will perceive a different image with each eye, thus providing a stereo image.
This disclosure provides an automatic method for producing 3D multi-view interweaved image(s) from a stereoscopic image pair source to be displayed via an auto-multiscopic display. The technique is optimized to allow its use as part of a real-time 3D video handling system.
Preferably, the 3D interweaved image(s) are generated from a stereo pair where partial disparity is calculated between the pixels of the stereo images. The partial disparity information is then used at a sub-pixel level to produce a series of target (intermediary) views for the sub-pixel components at each image position (x, y). Then, these target views are used to generate a desired number of views resulting in glass-free 3D via an auto-multiscopic display. The technique more efficiently preserves the resolution of the High-Definition (HD) video content (e.g., 1080p or higher) than what is currently available from the prior art.
The technique may be used with or in conjunction with auto-multiscopic 3D displays, such as a flat panel display using a lenticular lens.
The foregoing has outlined some of the more pertinent features of the invention. These features should be construed to be merely illustrative. Many other beneficial results can be attained by applying the disclosed invention in a different manner or by modifying the invention as will be described.
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings.
Image capture using a camera (such as illustrated in the accompanying drawings) provides the stereoscopic image pair source from which the 3D multi-view interweaved image is generated.
As illustrated in the accompanying drawings, the system comprises a partial disparity analyzer 200 and a sub-pixel view generator 202, together with a memory 204 that stores the left 206 and right 208 images of the stereo pair.
More specifically, the partial disparity analyzer process 200 is triggered via a start signal (step 1) from an external process or processor (not shown). Upon receiving the start signal, the partial disparity analyzer 200 reads from memory 204 the content of the left 206 and right 208 images of the stereo pair; it then calculates the disparity segments for each specific patch of X lines and Y columns (as described in more detail below). The partial disparity analyzer 200 fetches from the left 206 and right 208 images the required number of pixels for each X lines by Y columns patch being analyzed. The resulting disparity segments 210 are stored in memory 204 for later use by the sub-pixel view generator 202.
The sub-pixel view generator 202 is fed with sub-pixel target views 214 for the Blue (Btv), Green (Gtv) and Red (Rtv) sub-components based on the processing performed by a per-pixel loop 212; loop 212 is responsible for selecting the proper target views based on the disparity segments 210 determined by the partial disparity analyzer 200. The sub-pixel view generator 202 uses the sub-pixel target views 214, the left 206 and right 208 images, and the disparity segments 210 to interweave each sub-pixel into the proper target view, which results in an interweaved image 216 that is stored in memory 204. After processing every pixel of the left 206 and right 208 images stored in memory 204, the sub-pixel view generator 202 sets a done signal to notify the external process or processor that the interweaved image 216 is ready to be stored on a media storage and/or transferred to a 3D display.
The following provides additional details regarding the partial disparity analyzer and the sub-pixel view generator components/functions.
Stereo matching by computing correlation or sum of squared differences is a known technique. Disparity computation is commonly done using digital stereo images, but only on a pixel basis. According to the partial disparity analysis of this disclosure, partial disparity information is retrieved (or obtained) preferably by taking a “patch” (a group of N consecutive sub-pixels) every (StepX, StepY) pixels in a first (e.g., left) image, and then finding the best corresponding patch at each valid disparity within a search range (position−StepX to position+StepX) in a second (e.g., right) image. For example, for a disparity of 0, the two patches are at the exact same location in both images. For a disparity of 1, the patch in the right image is moved one (1) pixel to the left. The absolute difference is then computed for corresponding sub-pixels in each patch. These absolute differences are then summed to compute a final SAD (“sum of absolute differences”) score. After the SAD score has been computed for all valid disparities in the search range, preferably the disparity that produces the lowest SAD score is determined to be the disparity at that location in the right image.
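By way of non-limiting illustration, the following sketch shows one way the patch-based SAD search could be implemented. The function name, the single-plane numpy image representation, and the default patch height and search extent are assumptions for the example, not requirements of the disclosure; the default patch width of 128 follows the default mentioned below.

import numpy as np

def patch_disparity(left, right, y, x, patch_w=128, patch_h=1, search=16):
    """Minimal sketch of the patch-based SAD search described above.

    left, right: 2D numpy arrays (one image plane); (y, x): reference
    position of the patch in the left image. Returns the disparity whose
    patch in the right image yields the lowest SAD score.
    """
    ref = left[y:y + patch_h, x:x + patch_w].astype(np.int32)
    best_d, best_sad = 0, None
    for d in range(-search, search + 1):   # all valid disparities in the search range
        xs = x - d                         # positive disparity shifts the right patch left
        if xs < 0 or xs + patch_w > right.shape[1]:
            continue                       # skip positions falling outside the image
        cand = right[y:y + patch_h, xs:xs + patch_w].astype(np.int32)
        sad = int(np.abs(ref - cand).sum())  # sum of absolute differences
        if best_sad is None or sad < best_sad:
            best_d, best_sad = d, sad
    return best_d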
As view generation proceeds, the left image is progressively distorted and faded out, while the right image is progressively distorted back toward the left and faded in. Generally, the goal of the view generator/interweaver component is to smooth out the distortion between the left and right images of a stereoscopic pair. For each intermediate view generated (and inserted) between the leftmost and rightmost views, preferably the distortion is compensated by a factor based on the position of the generated target view relative to the leftmost and rightmost images. Therefore, at the beginning of the process, the first generated views (images) are much like the left source image, while the middle generated view (image) is a blend of the left source image distorted halfway toward the right source image and the right source image distorted halfway back toward the left one. The last generated images typically are similar to the right source image. More specifically, typically the distortion is balanced between the leftmost and the rightmost image based on percentages that reflect the relative position of the target view, preferably as follows:
Percentage of leftmost view=1−(Target View #)/Total # of Target Views
Percentage of rightmost view=(Target View #)/Total # of Target Views
This is illustrated in the accompanying drawings.
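As a worked example, for nine (9) target views, target view #3 receives a leftmost-view percentage of 1−3/9≈0.67 and a rightmost-view percentage of 3/9≈0.33. A minimal sketch of this weighting follows; the function name is illustrative only.

def blend_weights(view_number, total_views):
    """Blend percentages of the leftmost and rightmost views for one target view.

    Implements the two formulas above: the rightmost share grows linearly
    with the target view number; the leftmost share is its complement.
    """
    right_pct = view_number / total_views
    left_pct = 1.0 - right_pct
    return left_pct, right_pct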
A preferred implementation of the “line pairs” technique is as follows. In particular, preferably the line pairs are relocated by using control points that are explicitly specified. Preferably, the lines are then moved exactly to where they are projected. Everything not located on the lines is projected relative to that position. Preferably, the influence of the differences between lines, and of the weight ratio for each distance, is further adjusted by additional constant values (described in more detail below). These constants facilitate preserving the quality of the stereopsis. Preferably, all line segments are referenced for each pixel, and the deformation by influence is global. The number of iterations to be performed for each image/frame preferably is proportional to the product of the pixel count of the image/frame and the number of line pairs used. Preferably, the number of line pairs is directly linked to the distance between two points of the disparity analyzer. A default number for the width of the patch is 128, although this is not limiting. Using different values influences the performance of the algorithm.
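One way to realize such constants is the weight formula from classic feature-based (field) morphing, where one constant smooths the influence near the line, a second controls fall-off with distance, and a third scales influence by line length. The names a, b, and p and their default values below are conventions from that literature, not values fixed by this disclosure.

def line_weight(dist, length, a=0.5, b=1.25, p=0.5):
    """Influence of one disparity line on a pixel at distance `dist`.

    a smooths the transformation when the pixel is near the line, b sets
    how quickly influence decays with distance, and p adjusts influence
    by the line's length. Defaults are illustrative assumptions.
    """
    return (length ** p / (a + dist)) ** b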
Using a stereoscopic pair as a reference target for the leftmost and rightmost views, along with the calculated partial disparity list segment pairs generated by the disparity analyzer module (see the accompanying drawings), the view generator/interweaver generates the desired number of intermediate target views.
By way of example only, a positive slant for a nine (9) view lens would be represented by the 3×9 pixel patch 700 shown in the accompanying drawings.
The purpose of a pair of lines is to define, identify and position a mapping from one image to the other (one line defined relative to the left image and one line defined relative to the right image). Lines are specified by pairs of pixel coordinates (PQ), scalars are bold lowercase italics, and primed variables (X′, u′) are values defined relative to the right image. The term line means a directed line segment. A pair of corresponding lines in the left and right images defines the coordinate mapping from the destination image pixel coordinate X to the source image pixel coordinate X′ such that, for a line PQ in the left image, there is a corresponding line P′Q′ in the right image.
There are two perpendicular vectors with the same length as the input vector; either the left or right one can be used, as long as it is consistently used throughout. The value u is the position along the line, and v is the distance from the line. The value u goes from 0 to 1 as the pixel moves from P to Q, and is less than 0 or greater than 1 outside that range. The value for v is the perpendicular distance in pixels from the line. If there is just one line pair, the transformation of the image proceeds as follows.
For each pixel X in the left image, find the corresponding (u, v); then find the X′ in the right image for that (u, v) such that LeftImage(X)=RightImage(X′).
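The single line-pair mapping just described can be sketched as follows, using the standard directed-segment construction for u and v; the function names are illustrative only.

import numpy as np

def perpendicular(vec):
    """Left perpendicular of a 2D vector; the right one works equally well,
    as long as the same choice is used consistently throughout."""
    return np.array([-vec[1], vec[0]], dtype=float)

def map_point(X, P, Q, P2, Q2):
    """Single line-pair mapping from destination pixel X to source pixel X'.

    (P, Q) is the directed line in the destination image; (P2, Q2) is the
    corresponding line in the source image. u is the normalized position
    along the line; v is the signed perpendicular distance in pixels.
    """
    PQ = Q - P
    u = np.dot(X - P, PQ) / np.dot(PQ, PQ)
    v = np.dot(X - P, perpendicular(PQ)) / np.linalg.norm(PQ)
    PQ2 = Q2 - P2
    # Only u is normalized by line length, so the view is scaled along the
    # line direction by the ratio of the lengths of the two lines.
    return P2 + u * PQ2 + v * perpendicular(PQ2) / np.linalg.norm(PQ2)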
Preferably, all pixel coordinates are transformed by a rotation, a translation, and/or a scale. Preferably, the pixels along the length of the line in the source image are copied above the line in the targeted image. Because only the u coordinate is normalized by the length of the line (v is always a distance in pixels), preferably the target views are scaled along the line direction by the ratio of the lengths of the lines. Preferably, the scaling is applied in the direction of the line.
For all coordinate transformations, preferably a weight value is calculated for each line as follows. For each line pair, a position Xi′ is calculated. For the left destination image, the difference between Xi′ and the pixel location X is the displacement Di=Xi′−X. A weighted average of those displacements is then calculated, where the weight assigned to each line reflects the distance from X to that line.
To determine the position sampled in the source image, preferably the weighted average of all displacements is added to the current pixel location X. As long as the position remains anywhere within the image, the weight never goes to zero; the weight assigned to each line is strongest when the pixel is exactly on the line, and weaker the further the pixel is from it.
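Putting these pieces together, the multi-line weighted average could look like the following sketch; it reuses map_point() and line_weight() from the earlier sketches, together with point_segment_distance(), which is sketched after the distance rules below. All names remain illustrative.

import numpy as np

def map_point_multi(X, dst_lines, src_lines, a=0.5, b=1.25, p=0.5):
    """Weighted-average mapping over all line pairs for one pixel.

    dst_lines/src_lines: sequences of (P, Q) endpoint pairs in the
    destination and source images, in corresponding order.
    """
    dsum = np.zeros(2)
    wsum = 0.0
    for (P, Q), (P2, Q2) in zip(dst_lines, src_lines):
        Xi = map_point(X, P, Q, P2, Q2)           # per-line mapped position
        D = Xi - X                                # displacement suggested by this line
        dist = point_segment_distance(X, P, Q)    # u-dependent distance (see below)
        w = line_weight(dist, np.linalg.norm(Q - P), a, b, p)
        dsum += w * D
        wsum += w
    return X + dsum / wsum                        # position sampled in the source image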
A representative implementation of the “transformation of all of the line pairs” process is provided by the code illustrated in the accompanying drawings.
Because the “lines” are directed line segments, the distance from a line to a point depends on the value of u as follows:
if 0<u<1: the distance is abs(v)
if u<0: the distance is from P to the point
if u>1: the distance is from Q to the point.
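These three cases can be captured in a short helper; this sketch assumes the same numpy conventions as the earlier sketches and reuses perpendicular() from above.

import numpy as np

def point_segment_distance(X, P, Q):
    """Distance from pixel X to the directed line segment PQ, per the rules above."""
    PQ = Q - P
    u = np.dot(X - P, PQ) / np.dot(PQ, PQ)
    if u < 0:
        return float(np.linalg.norm(X - P))   # before the start of the segment
    if u > 1:
        return float(np.linalg.norm(X - Q))   # past the end of the segment
    v = np.dot(X - P, perpendicular(PQ)) / np.linalg.norm(PQ)
    return abs(float(v))                       # perpendicular distance to the line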
The final mapping of the pixel operation blends the stereo pair images (left and right) with one another based on the relative position of the (intermediate) target views between the leftmost and rightmost views. To achieve this, a corresponding set of lines in the left and in the right images (line pairs) is defined. Each intermediate target view is then specified by generating a new set of line segments, interpolated from their positions in the left image to their positions in the right image. This technique is illustrated in the accompanying drawings.
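A sketch of this line interpolation follows. Linear interpolation of the segment endpoints is assumed here, as it is the simplest of the common choices for interpolating line pairs, and the function name is illustrative.

def interpolate_lines(left_lines, right_lines, t):
    """Line set for an intermediate view at fractional position t in [0, 1].

    t=0 reproduces the left-image lines; t=1 the right-image lines.
    Endpoints (numpy arrays) are interpolated linearly.
    """
    return [((1 - t) * PL + t * PR, (1 - t) * QL + t * QR)
            for (PL, QL), (PR, QR) in zip(left_lines, right_lines)]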
An example of the metamorphosis process for the Blue, Green and Red components is shown in the accompanying drawings.
The above process brings a significant improvement over simply cross-dissolving the left and right images to obtain an intermediate view. When the results are compared, the partial disparity analysis and the view generator/interweaver processes deliver more realistic results, with smoother transitions between the intermediate target views, and better preserve the High-Definition (HD) resolution than is possible with the prior art.
Thus, according to this disclosure, a computationally-efficient method is described to compute partial disparity information to generate multiple images from a stereoscopic pair in advance of an interweaving process for the display of the multiple images on an auto-stereoscopic (glass-free) 3D display. The partial disparity information may be calculated as part of a real-time 3D conversion or as an off-line (non-real-time) 3D conversion for auto-stereoscopic display. Preferably, the partial disparity information is calculated at an interval of X horizontal lines and at an interval of Y vertical lines. In particular, in a preferred embodiment, the partial disparity information is derived by calculating a sum of absolute differences (SAD) inside a range of a specified number of pixels to the left and to the right of a reference position (at which the partial disparity information is desired to be calculated). In operation, a reference value for the SAD calculation is obtained from the left image of the stereo pair and calculated using a range of pixels from the right image, and vice versa. In a preferred embodiment, the “best” SAD score is the lowest calculated SAD value over the positions between the leftmost and rightmost extent of the search range around the reference position. After the calculation, coordinates of the position with the lowest SAD score are then grouped to form a list of line segment pairs that correspond to disparity line pairs. The disparity line pairs identify and position a mapping between a position in the left image and the position of the same element in the right image. The calculated disparity line pairs are used to control a deformation (by relative influence) according to the distance between the pixel and the disparity lines. In particular, the lines are specified by a pair of pixel coordinates in the left image and a pair of pixel coordinates in the right image such that, for a disparity line in the left image, there is a corresponding line in the right image. In this approach, a distortion correction is calculated as a percentage of the leftmost view and a percentage of the rightmost view. Preferably, the percentage from the leftmost view is calculated by dividing the view number of a target view by the total number of target views and subtracting the resulting value from one (1), and vice versa for the rightmost view. The calculated percentages are then applied to the line pairs to control the deformation between intermediate views by applying a relative influence to the distance between the pixel and the disparity lines.
Thus, the above-described technique determines disparity line pairs that are then used to determine an amount of transformation to be applied to an intermediate view that lies between the left and right images of a stereo pair. The amount of transformation may be a rotation, a translation, a scaling, or some combination thereof. Preferably, the amount of transformation for each pixel in a given intermediate view is influenced by a weighted average of the distances from the pixel to the nearest points on all of the disparity lines (as further adjusted by one or more constant values). Preferably, the distance between a pixel and a disparity line is calculated by tracing a perpendicular line between the disparity line and the pixel. In the described approach, a first constant is used to adjust the weighted average distance to smooth out the transformation. A second constant is used to establish the strengths of the different disparity lines relative to the distance of the pixel from the disparity line. A third constant adjusts the influence of each line depending on the length of each disparity line. Preferably, the transformation is applied in the direction of the disparity lines; in the alternative, the transformation is applied from the line toward the pixel. The direction of the transformation is applied uniformly for all pixels and disparity lines in the preferred approach. The transformation results are generated and stored for each intermediate view, or generated and stored only for a final interweaved view.
In the described approach, preferably the final mapping of each pixel in the resulting interweaved image blends the stereo pair (left and right image) with one another based on the relative position of the intermediate target views between the left and right images of the original stereo pair. The final mapping preferably assigns a value to each sub-pixel (RGB, or BGR) based on a most relevant intermediate view for each sub-pixel of the pixel. The most relevant intermediate view for each sub-pixel at the pixel position preferably is determined by a factor based on the position of the generated target view relative to the leftmost and the rightmost images.
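The sub-pixel assignment can be sketched as follows. In practice the mapping from sub-pixel position to view number is dictated by the lenticular lens geometry, so the subpixel_view_index callable below is an assumed placeholder standing in for that pattern, not something specified by the disclosure.

def interweave_pixel(views, x, y, subpixel_view_index):
    """Assign each sub-pixel of pixel (x, y) from its most relevant view.

    views: list of generated target views (H x W x 3 numpy arrays);
    subpixel_view_index: callable (x, y, channel) -> view number, standing
    in for the lens-dependent interweaving pattern.
    """
    return [views[subpixel_view_index(x, y, c)][y, x, c] for c in range(3)]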
The disclosed technique may be used in a number of applications. One such application is a 3D conversion device (3D box or device) that can accept multiple 3D formats over a standard video interface. The 3D conversion box implements the above-described technique. For instance, version 1.4 of the HDMI specification defines the following formats: Full resolution Side-by-Side, Half resolution Side-by-Side, Frame alternative (used for Shutter glasses solutions), Field alternative, Left+depth, and Left+depth+Graphics+Graphics depth.
A 3D box may be implemented in two (2) complementary versions, as shown in the accompanying drawings.
A representative design of a hardware platform required to deliver the above 3D Box is based on the use of a digital signal processor/field-programmable gate array (DSP/FPGA) platform with the required processing capabilities. To allow this capability to be embedded in a variety of devices, including, but not limited to, an auto-multiscopic display, the DSP/FPGA may be assembled as a module 1800 as shown in the accompanying drawings.
As previously noted, the hardware and software systems in which the partial disparity information computation is implemented are merely representative. The inventive functionality may be practiced, typically in software, on one or more machines. Generalizing, a machine typically comprises commodity hardware and software, storage (e.g., disks, disk arrays, and the like) and memory (RAM, ROM, and the like). An apparatus for carrying out the computation comprises a processor, and computer memory holding computer program instructions executed by the processor for carrying out the one or more described operations. The particular machines used in a system of this type are not a limitation. One or more of the above-described functions or operations may be carried out by processing entities that are co-located or remote from one another. A given machine includes network interfaces and software to connect the machine to a network in the usual manner. A machine may be connected or connectable to one or more networks or devices, including display devices. More generally, the above-described functionality is provided using a set of one or more computing-related entities (systems, machines, processes, programs, libraries, functions, or the like) that together facilitate or provide the inventive functionality described above. A representative machine is a network-based data processing system built on commodity hardware and running an operating system, an application runtime environment, and a set of applications or processes that provide the functionality of a given system or subsystem. As described, the product or service may be implemented in a standalone server, or across a distributed set of machines.
The functionality may be integrated into a camera, an audiovisual player/system, an audio/visual receiver, or any other such system, sub-system or component. As illustrated and described, the functionality (or portions thereof) may be implemented in a standalone device or component.
While the above describes a particular order of operations performed by certain embodiments, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.
While given components of the system have been described separately, one of ordinary skill will appreciate that some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like.
This application is based on and claims priority from Ser. No. 61/311,889, filed Mar. 9, 2010.