This disclosure relates generally to the field of image processing. More particularly, but not by way of limitation, it relates to a technique for providing a pseudo-three-dimensional (3D) dynamic rendering of an image on a two-dimensional (2D) display.
A conventional display device (for example, MacBook Pro®, iPad®, iPhone®, and iMac® programmable devices) (“MACBOOK PRO,” “IPAD,” “IPHONE,” and “IMAC” are registered trademarks of Apple Inc.) renders 2D images and is well suited for displaying images captured with conventional cameras, which produce 2D images.
However, the world is not a 2D world but a 3D one. A technique for generating a pseudo-3D view of 2D images would therefore be useful.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the invention. References to numbers without subscripts or suffixes are understood to reference all instances of subscripts and suffixes corresponding to the referenced number. Moreover, the language used in this disclosure has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter, resort to the claims being necessary to determine such inventive subject matter. Reference in the specification to “one embodiment” or to “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention, and multiple references to “one embodiment” or “an embodiment” should not be understood as necessarily all referring to the same embodiment.
References to “a medium” on which software is stored for causing a programmable device to perform the techniques described below should be understood to encompass multiple physical media. Similarly, references to a programmable control unit that executes the software should be understood to encompass execution of the software by multiple programmable control units.
A technique is presented below for rendering still images and videos that enables a conventional 2D display to give an observer the appearance that the 2D display is actually a window into a 3D world. This technique is referred to herein as “depth rendering.” Depth rendering is accomplished by simulating the physical imaging characteristics of parallax and depth of field. Depth rendering typically employs three inputs: 1) an input image or video; 2) a way to segment the input image into 2 or more distinct regions; and 3) relative orientation information between the display and the observer.
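By way of illustration and not limitation, the following minimal sketch (in Python, with hypothetical names and types) shows one way these three inputs could be represented; it is a sketch under stated assumptions, not a required data layout.

```python
# Hypothetical container for the three depth-rendering inputs described above.
from dataclasses import dataclass
from typing import List, Tuple
import numpy as np

@dataclass
class DepthRenderInputs:
    image: np.ndarray                     # H x W x 3 input image (still or video frame)
    segment_masks: List[np.ndarray]       # H x W boolean masks, ordered back to front
    observer_offset: Tuple[float, float]  # relative observer-display orientation (dx, dy)

def has_enough_segments(inputs: DepthRenderInputs) -> bool:
    """Depth rendering requires segmentation into at least 2 distinct regions."""
    return len(inputs.segment_masks) >= 2
```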
Depth rendering is a dynamic rendering of an image on a 2D display that changes as the observer adjusts their orientation (spatial relationship) with the device in real time. By segmenting an image into 2 or more different regions, the regions can be parameterized and then numerically altered to simulate the effect of viewing a 3D scene. More specifically, depth rendering employs a simulation of parallax and a simulation of depth of field.
The parameters of an image segment that can be altered to create the parallax effect include (but are not limited to) the position, scale, rotation, perspective and distortion of the image segment. The parameters of an image segment that can be altered to create the depth of field effect include (but are not limited to) the blur, sharpness, scale, position, rotation, color, contrast, saturation, hue and luminance of the image segment. In some embodiments, the depth rendering may modify parameters of less than all of the image segments.
Depth rendering involves rendering the various image segments as separate planes that are then superimposed on top of each other. Conceptually, the superimposition can be visualized as a vertical stack of image segments ordered by increasing depth from top to bottom of the stack, such that segments near the top of the stack may or may not occlude segments near the bottom of the stack. A given image segment is changed by altering one or more of the parameters of the segment. The change in the segment affects the occlusion of the image segments below the given image segment in the stack, which in turn simulates the effect of parallax and/or depth of field. By coupling the parameters of the image segments to the relative observer-display orientation data, the effects of parallax and depth of field can be simulated in real time to create a user experience that mimics viewing a 3D scene.
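By way of illustration and not limitation, the following is a minimal Python sketch of this stacked compositing, assuming the segments are RGBA arrays ordered from the bottom (most distant) to the top (closest) of the stack and translated by whole pixels; the function names and the simple alpha-over compositing are illustrative assumptions, not a required implementation.

```python
import numpy as np

def translate(layer, dx, dy):
    """Translate an RGBA layer by (dx, dy) pixels; vacated pixels become transparent."""
    h, w = layer.shape[:2]
    out = np.zeros_like(layer)
    src_x0, src_x1 = max(0, -dx), min(w, w - dx)
    dst_x0, dst_x1 = max(0, dx), min(w, w + dx)
    src_y0, src_y1 = max(0, -dy), min(h, h - dy)
    dst_y0, dst_y1 = max(0, dy), min(h, h + dy)
    out[dst_y0:dst_y1, dst_x0:dst_x1] = layer[src_y0:src_y1, src_x0:src_x1]
    return out

def composite_stack(layers, offsets):
    """Alpha-over composite layers ordered back (first) to front (last).
    Each layer is an H x W x 4 float RGBA array in [0, 1]; offsets is a list of (dx, dy).
    Upper layers occlude lower layers wherever their alpha is nonzero, which is
    what produces the simulated parallax when the offsets change."""
    canvas = np.zeros_like(layers[0])
    for layer, (dx, dy) in zip(layers, offsets):
        moved = translate(layer, dx, dy)
        alpha = moved[..., 3:4]
        canvas[..., :3] = moved[..., :3] * alpha + canvas[..., :3] * (1 - alpha)
        canvas[..., 3:4] = alpha + canvas[..., 3:4] * (1 - alpha)
    return canvas
```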
In addition to image segment processing based on relative orientation information, depth rendering can also apply to each image segment conventional image processing effects such as color, contrast, hue, or saturation changes, or any other image filtering technique, for creative or artistic effect. For example, in one embodiment, a specific image segment's saturation or contrast may be adjusted relative to the other image segments to draw the observer's attention to that segment.
The parallax simulation and the depth of field simulation can be controlled automatically based on available orientation data or programmatically by other user input, such as controls of a user interface (UI).
When the input image is a conventional 2D image and a coarse depth map is provided, the input image can be segmented by depth into N separate regions. Since the input image is a conventional 2D image and not a model of a 3D scene, the effect of parallax cannot be inferred from the available data and must instead be approximated. To that end, the scale of the segments may be increased monotonically with stack position, with the largest scale at the top of the stack. The parallax effect can then be simulated by translating the positions of the segments relative to each other within the stack as a function of the relative orientation.
In one embodiment, monotonically increasing a scaling parameter for the segments in the stack ensures that all image segments in the stack will necessarily occlude some portions of the neighboring lower segments in the stack. In this embodiment, a larger scale parameter implies that the image segment occupies a larger area of the image. In an embodiment where a larger scale parameter indicates that the image segment occupies a smaller area of the image, the scaling parameter may be monotonically decreased instead. The amount of occlusion determines the extent of the parallax effect that can be achieved: the greater the occlusion, the greater the achievable parallax effect.
For example, if the user moves his/her head to the right (with respect to the display) or if the user turns the device to the left (with respect to the observer's face) then the image segments are to be translated to the left in the rendered image such that the observer perceives the effect of “looking behind” the closer image segments in the stack. In this example, the amount of translation to be applied monotonically increases with stack position such that the segments at the top of the stack (closest) receive the largest translation and the segment at the bottom of the stack receives no translation at all. In some embodiments, movement of the user's head may include turning the head, in addition to or instead of translational movement of the head. One of skill in the art will recognize that either movement of the user or movement of the device (or both) may occur to cause a change in the relative position and orientation of the user and the device.
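As a purely illustrative example of such a monotonic relationship, the per-segment translation could be computed as a linear function of stack position and of the observer's displacement, as in the following hypothetical Python sketch (the linear weighting and the gain constant are assumptions, not the disclosed method):

```python
def segment_offsets(num_segments, head_dx, gain=1.0):
    """Return a horizontal pixel offset per segment, ordered back (index 0) to front.

    Assumes at least two segments. The bottom (most distant) segment receives no
    translation; the translation grows monotonically toward the top (closest)
    segment. head_dx is the observer's displacement to the right in arbitrary
    units; a rightward observer move yields a leftward (negative) image shift,
    so the observer appears to "look behind" the closer segments.
    """
    return [int(round(-gain * head_dx * i / (num_segments - 1)))
            for i in range(num_segments)]
```

Other monotonic weightings (for example, weights derived from per-segment depth values rather than stack index) could be substituted without changing the overall character of the effect.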
In observer position 100A, the observer is effectively looking straight on to the rendered image, and background segment 110, mid-ground segment 120, and foreground segment 130 are stacked and centered horizontally. Now the observer moves to position 100B, to the right side of the image (or the image moves to the left). As illustrated in
As shown in
The storage device 214 is typically a magnetic hard drive, an optical drive, a non-volatile solid-state memory device, or another type of memory system that maintains data (e.g., large amounts of data) even after power is removed from the system. While
Positioning circuitry such as a Global Positioning System (GPS) receiver 224 may be used to determine the position of the programmable device 100. Similarly, a gyroscope 226 and accelerometer 228 or other motion and rotation-sensing circuitry may provide information to determine the position, movement, and orientation of the programmable device 100. An image sensor 208, such as a camera, may also provide a way for the programmable device 100 to determine the position and orientation of a user relative to the programmable device 100.
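By way of illustration and not limitation, one possible way an image sensor such as image sensor 208 could be used to estimate the observer's offset from the display is sketched below using OpenCV's stock Haar-cascade face detector as a stand-in; the detector choice and the normalization are assumptions rather than the disclosed implementation.

```python
import cv2

# Stock frontal-face detector shipped with OpenCV; any face or eye tracker could be substituted.
_face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def observer_offset(frame_bgr):
    """Return the observer's horizontal offset from the camera axis in [-1, 1],
    or None if no face is found. Positive means the face center lies to the
    right of the frame center from the camera's point of view."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = _face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face = nearest observer
    face_center_x = x + w / 2.0
    frame_center_x = frame_bgr.shape[1] / 2.0
    return (face_center_x - frame_center_x) / frame_center_x
```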
Referring now to
The depth rendering techniques described herein can work with any image that can be segmented. For example, this image may be a conventional 2D image captured by a single camera, a stereo image captured with 2 or more 2D cameras, or even a synthetically rendered image (2D or 3D).
Although generally described herein as employing segmentation based on depth (in terms of distance from the observer), any segmentation technique may be used as desired. When a depth map is available, the image may be segmented into regions based on a depth ordering of the segments. The segmentation used for depth rendering may be as coarse as segmenting the 2D image into as few as 2 regions. Alternatively, any number of regions may be used as desired and as may be constrained by the practical computational limits imposed by the programmable device used to implement the techniques described herein.
Similarly, the size, position, rotation, perspective, and relative orientation of the image segments in the output image of the depth rendering may be calculated from any available relative orientation data, and depending on what relative orientation data is available, different effects may be applied to the depth rendering.
Using a sequence of images that sweeps through different focus points, a coarse model can be generated that separates a subject object from the background of the image. A simple binning of focus groups into near and far may be sufficient for a depth map. A pair of images, one captured with flash illumination and one without, shows how the flash illuminates the scene and can be used to separate foreground from background objects. In certain types of images, such as portraits, where a portion of the image is well isolated from the rest of the image, segmentation may be done based on that portion of the image. In another technique, multiple images with movement of objects in the image can be used to segment the moving objects from the non-moving objects. Any desired technique for creating a depth map may be used.
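As a hedged illustration of the flash/no-flash approach, the brightness difference between the two aligned exposures can be thresholded to obtain a coarse foreground mask; the grayscale differencing and the threshold value in the sketch below are assumptions.

```python
import numpy as np

def flash_foreground_mask(flash_gray, noflash_gray, threshold=30):
    """Given aligned grayscale flash and no-flash exposures (uint8 arrays of the
    same shape), return a boolean mask of pixels that brightened markedly under
    flash. Nearby (foreground) objects receive more flash light than the distant
    background, so the brightened pixels approximate the foreground segment."""
    diff = flash_gray.astype(np.int16) - noflash_gray.astype(np.int16)
    return diff > threshold
```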
In addition, devices equipped with accelerometers and/or gyroscopes, such as the programmable device 100, can detect how an observer holds the device and can track changes in the orientation of the device. Furthermore, devices with built-in cameras can detect the presence of an observer, track the orientation of the observer with respect to the device by detecting the location and orientation of the observer's face, body, eyes, gaze, gesture, etc., and track in real time how the relative position and orientation of the observer changes.
The depth map may be used to determine a number of segments at differing depths. In the example depth map 500 of
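One simple, purely illustrative way to derive such segments from a coarse depth map is to bin the depth values into N equal-range intervals, each bin yielding one segment mask; the equal-range binning in the Python sketch below is an assumption, not the disclosed method.

```python
import numpy as np

def segment_by_depth(depth_map, num_segments=3):
    """Bin a per-pixel depth map (larger value = farther) into num_segments masks,
    returned ordered from farthest (bottom of the stack) to nearest (top)."""
    d_min, d_max = float(depth_map.min()), float(depth_map.max())
    edges = np.linspace(d_min, d_max, num_segments + 1)
    masks = []
    for i in range(num_segments):
        lo, hi = edges[i], edges[i + 1]
        if i == num_segments - 1:
            mask = (depth_map >= lo) & (depth_map <= hi)
        else:
            mask = (depth_map >= lo) & (depth_map < hi)
        masks.append(mask)
    # Reverse so the farthest bin comes first (bottom of the stack).
    return masks[::-1]
```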
When starting from a 2D image, less depth information is available. In such a scenario, segmentation may be accomplished using object recognition techniques that detect objects in the image and define segments associated with the recognized objects. Other segmentation techniques may be used as desired, including simple segmentation by color. As stated above, any segmentation technique may be used to segment the image into at least 2 layers. Although 2 layers may be used, having more than 2 layers may improve the volumetric effect of the pseudo-3D image.
Then in block 830, the pseudo-3D image may be presented to simulate depth of field with parallax. In one embodiment, scaling is performed corresponding to a depth ordering of the layers, with more foreground layers scaled monotonically larger than more background layers, so that each foreground layer is larger relative to its immediately more background layer than in the original image. In one embodiment, color grading or other color differentiation techniques may be used as desired to help the foreground objects “pop out” better. In one embodiment, blurring may be used in addition to or instead of color differentiation techniques, typically blurring background layers more than foreground layers, or making the foreground layers sharper than the background layers.
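The per-layer treatment described above might be sketched as follows, assuming OpenCV and NumPy; the specific scale step and blur kernel sizes are illustrative choices only.

```python
import cv2

def style_layers(layers_rgba, scale_step=0.03, max_blur=9):
    """Given RGBA layers ordered back (index 0) to front, scale more foreground
    layers monotonically larger and blur more background layers more strongly."""
    styled = []
    for i, layer in enumerate(layers_rgba):
        h, w = layer.shape[:2]
        # Foreground layers (larger i) get a larger scale factor.
        scale = 1.0 + scale_step * i
        resized = cv2.resize(layer, (int(w * scale), int(h * scale)),
                             interpolation=cv2.INTER_LINEAR)
        # Crop back to the original size around the center so the layers stay aligned.
        y0 = (resized.shape[0] - h) // 2
        x0 = (resized.shape[1] - w) // 2
        cropped = resized[y0:y0 + h, x0:x0 + w]
        # Background layers (smaller i) get a stronger Gaussian blur.
        k = max_blur - 2 * i
        if k >= 3:
            if k % 2 == 0:
                k += 1  # Gaussian kernel sizes must be odd
            cropped = cv2.GaussianBlur(cropped, (k, k), 0)
        styled.append(cropped)
    return styled
```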
By moving the stack of layers relative to each other, the pseudo-3D image may be manipulated to show a parallax effect, such as is illustrated in
In some embodiments, in addition to or instead of simple translation of the foreground objects, perspective transformations, such as keystoning, may be used to simulate a 3D rotation of the pseudo-3D image.
Where the foreground segment forms an opening, e.g., a doughnut with a center hole, inpainting techniques may be used as desired to adjust the view of the background. In one embodiment, a background view through a hole may be generated as a separate background segment or incorporated as part of a background segment, so that the background viewed through the hole changes when the foreground segment is translated to produce the parallax effect.
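One way such revealed holes might be filled is sketched below using OpenCV's standard inpainting routine; the inpainting radius and algorithm flag are illustrative choices rather than those of the disclosure.

```python
import cv2

def fill_revealed_holes(background_bgr, hole_mask):
    """Inpaint regions of a background layer that become visible when a foreground
    segment (e.g., the hole of a doughnut-shaped segment) is translated.
    background_bgr is an 8-bit 3-channel image; hole_mask is a uint8 array of the
    same height and width where nonzero marks the missing pixels to synthesize."""
    return cv2.inpaint(background_bgr, hole_mask, inpaintRadius=3,
                       flags=cv2.INPAINT_TELEA)
```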
In block 940, the programmable device may detect a change in the orientation of the observer relative to the programmable device. This may involve detection of movement of the observer, detection of movement of the programmable device, or both. For example, a programmable device lying on a static surface may detect that the observer has moved (translation in 1 or more dimensions, rotation about 1 or more axes, or both) relative to the programmable device. In another example, the programmable device may detect that the observer has moved or rotated the programmable device.
After calculating the change in point of view of the observer, in block 950 the layers may be translated to correspond to the change in relative orientation of the observer and the programmable device. In embodiments where rotational or perspective changes are applied in addition to translation, these changes may also be applied to the pseudo-3D image to simulate the view the observer would have of an actual 3D object instead of a 2D image.
In block 960, as discussed above, the programmable device may need to fill or inpaint holes in the more foreground segments to improve the 3D effect.
By segmenting a 2D image and modifying the segments' relative scale according to a depth ordering, a pseudo-3D image may be created from the 2D image that allows an observer to observe parallax effects that approximate or simulate the effect of different views of a 3D object. While the effect may be subtle, the pseudo-3D effect can enhance the user experience of the programmable device.
It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention therefore should be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.