The human brain gets its three-dimensional (3D) cues in multiple ways. One of these ways is via stereo vision, which corresponds to the difference between viewed images presented to the left and right eye. Another way is by motion parallax, corresponding to the way a viewer's view of a scene changes when the viewing angle changes, such as when the viewer's head moves.
Current 3D displays are based upon stereo vision. In general, 3D televisions and other displays output separate video frames to each eye via 3D goggles or glasses with lenses that block certain frames and pass other frames through. Examples include using two different colors for the left and right images with corresponding color filters in the goggles, using differently polarized light for the left and right images with correspondingly polarized lenses, and using shutters in the goggles. The brain combines the frames in such a way that viewers experience 3D depth as a result of the stereo cues.
Recent technology allows different frames to be directed to each eye without glasses, accomplishing the same result. Such displays are engineered to present different views from different angles, typically by arranging the screen's pixels behind some kind of optical barrier or optical lenses.
Three-dimensional display technology works well when the viewer's head is mostly stationary. However, the view does not change when the viewer's head moves, whereby the stereo cues are contradicted by the lack of motion parallax. This contradiction causes some viewers to experience fatigue and discomfort when viewing content on 3D displays.
This Summary is provided to introduce a selection of representative concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used in any way that would limit the scope of the claimed subject matter.
Briefly, various aspects of the subject matter described herein are directed towards a hybrid stereo image/motion parallax technology that uses stereo 3D vision technology for presenting different images to each eye of a viewer, in combination with motion parallax technology to adjust rendering or acquisition of each image for the positions of a viewer's eyes. In this way, the viewer receives both stereo cues and parallax cues as the viewer moves while viewing a 3D scene.
In one aspect, left and right images captured by a stereo camera are received and processed for motion parallax adjustment according to position sensor data that corresponds to a current viewer position. These adjusted images are then output for separate left and right display to a viewer's left eye and right eye, respectively. Alternatively, the current viewer position may be used to acquire the images of the scene, e.g., by correspondingly moving a robotic stereo camera. The technology also applies to multiple viewers viewing the same scene, including on the same screen, provided each viewer is independently tracked and given an independent view.
In one aspect, viewer head and/or eye position is tracked. Note that eye position may be tracked directly for each eye or estimated for each eye from head tracking data, which may include the head position in 3D space plus the head's gaze direction (and/or rotation, and possibly more, such as tilt) and thus provides data corresponding to a position for each eye. Thus, “position data” includes the concept of the position of each eye regardless of how obtained, e.g., directly or via estimation from head position data.
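By way of a non-limiting illustration, the following sketch (in Python with NumPy) shows one way per-eye positions could be estimated from head tracking data comprising a head position plus yaw (gaze direction) and roll (sideways tilt) angles; the interpupillary distance value and the function and variable names are illustrative assumptions rather than elements of the described system:

    import numpy as np

    def eye_positions_from_head_pose(head_pos, yaw, roll, ipd=0.063):
        """Estimate left/right eye positions (meters, screen-centered coordinates)
        from head-tracking data: head center plus yaw (gaze direction about the
        vertical axis) and roll (sideways tilt). The ~63 mm interpupillary
        distance (ipd) is a typical value assumed here for illustration."""
        cy, sy = np.cos(yaw), np.sin(yaw)
        cr, sr = np.cos(roll), np.sin(roll)
        r_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])   # rotation about vertical axis
        r_roll = np.array([[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]])  # rotation about viewing axis
        right_axis = r_yaw @ r_roll @ np.array([1.0, 0.0, 0.0])    # head's "right" direction
        head_pos = np.asarray(head_pos, dtype=float)
        left_eye = head_pos - (ipd / 2.0) * right_axis
        right_eye = head_pos + (ipd / 2.0) * right_axis
        return left_eye, right_eye

    # Example: head 0.6 m in front of the screen, turned 10 degrees, tilted 5 degrees.
    left, right = eye_positions_from_head_pose([0.0, 0.0, 0.6],
                                               yaw=np.radians(10), roll=np.radians(5))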
Goggles with sensors or transmitters may be used in the tracking, including the same 3D filtering goggles that use lenses or shutters for passing/blocking different images to the eyes (note that, as used herein, a “shutter” is a type of filter, that is, a timed one). Alternatively, computer vision may be used to track the head or eye position, particularly for use with goggle-free 3D display technology. Notwithstanding, a computer vision system may be trained to track the position of goggles or the lens or lenses of goggles.
Tracking the current viewer position corresponding to each eye further allows for images to be acquired or adjusted based on both horizontal parallax and vertical parallax. Thus, viewing height and head rotation/tilt data, for example, also may be used in adjusting images, acquiring images, or both.
Other advantages may become apparent from the following detailed description when taken in conjunction with the drawings.
The present invention is illustrated by way of example and not limited in the accompanying figures in which like reference numerals indicate similar elements and in which:
Various aspects of the technology described herein are generally directed towards a hybrid stereo image/motion parallax system that uses stereo 3D vision technology for presenting different images to each eye, in combination with motion parallax technology to adjust the left and right images for the positions of a viewer's eyes. In this way, the viewer receives both stereo cues and parallax cues as the viewer moves while viewing a 3D scene, which tends to result in greater visual comfort/less fatigue to the viewer. To this end, the position of each eye (or goggle lens, as described below) may be tracked, directly or via estimation. A 3D image of a scene is rendered in real time for each eye using a perspective projection computed from the point of view of the viewer, thereby providing parallax cues.
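By way of a non-limiting illustration only, the following Python/NumPy sketch shows one way such a viewer-dependent perspective projection could be computed for a single eye, using the well-known generalized (off-axis) perspective projection construction for a planar screen; the screen corner coordinates, near/far planes and eye position are assumed example values, not sensed data from any particular embodiment:

    import numpy as np

    def normalize(v):
        return v / np.linalg.norm(v)

    def off_axis_projection(screen_ll, screen_lr, screen_ul, eye, near=0.1, far=100.0):
        """Asymmetric-frustum projection matrix for one eye. screen_ll, screen_lr
        and screen_ul are the lower-left, lower-right and upper-left corners of
        the physical screen, in the same coordinates (meters) as the tracked eye."""
        vr = normalize(screen_lr - screen_ll)       # screen "right" axis
        vu = normalize(screen_ul - screen_ll)       # screen "up" axis
        vn = normalize(np.cross(vr, vu))            # screen normal, toward the viewer
        va, vb, vc = screen_ll - eye, screen_lr - eye, screen_ul - eye
        d = -np.dot(va, vn)                         # eye-to-screen distance
        l = np.dot(vr, va) * near / d
        r = np.dot(vr, vb) * near / d
        b = np.dot(vu, va) * near / d
        t = np.dot(vu, vc) * near / d
        # Standard frustum matrix; the third-column terms make the frustum
        # asymmetric, which is what shifts the rendered view (horizontally and
        # vertically) as the tracked eye moves, providing the parallax cues.
        # A companion view transform translating the scene by -eye would be
        # applied before this projection when rendering.
        return np.array([
            [2 * near / (r - l), 0.0, (r + l) / (r - l), 0.0],
            [0.0, 2 * near / (t - b), (t + b) / (t - b), 0.0],
            [0.0, 0.0, -(far + near) / (far - near), -2 * far * near / (far - near)],
            [0.0, 0.0, -1.0, 0.0]])

    # Example: a 0.5 m x 0.3 m screen centered at the origin, right eye 0.6 m
    # away and offset 3 cm to the right of center.
    ll, lr, ul = (np.array([-0.25, -0.15, 0.0]), np.array([0.25, -0.15, 0.0]),
                  np.array([-0.25, 0.15, 0.0]))
    right_eye_proj = off_axis_projection(ll, lr, ul, eye=np.array([0.03, 0.0, 0.6]))

Computing such a matrix once per eye per frame is inexpensive relative to rendering, which is consistent with performing the adjustment in real time as the viewer moves.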
It should be understood that any of the examples herein are non-limiting. As such, the present invention is not limited to any particular embodiments, aspects, concepts, structures, functionalities or examples described herein. Rather, any of the embodiments, aspects, concepts, structures, functionalities or examples described herein are non-limiting, and the present invention may be used in various ways that provide benefits and advantages in display technology in general.
As is known in single image (“mono”) parallax scenarios, the image captured by a camera can be adjusted by relatively straightforward geometric computations to match a viewer's general head position and thus the horizontal viewing angle. For example, head tracking systems based on camera and computer vision algorithms have been used to implement a “mono 3D” effect, as explained for example in Cha Zhang, Zhaozheng Yin and Dinei Florêncio, “Improving Depth Perception with Motion Parallax and Its Application in Teleconferencing.” Proceedings of MMSP'09, Oct. 5-7, 2009, http://research.microsoft.com/en-us/um/people/chazhang/publications/mmsp09_ChaZhang.pdf. In such a mono-parallax scenario, a “virtual” camera basically exists that seems to move within the scene being viewed as the viewer's head moves horizontally. However, no such known technology works with separate left and right images, and thus stereo images are not contemplated. Moreover, head tilt, viewing height and/or head rotation do not change the viewed image.
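For context only, such a single-image adjustment can be sketched as a horizontal shift of the displayed portion of the captured frame in proportion to the viewer's horizontal head displacement; the gain factor and the mapping from meters to pixels below are illustrative assumptions:

    import numpy as np

    def mono_parallax_shift(frame, head_x, screen_width_m=0.5, gain=1.0):
        """Crude mono-parallax illustration: translate a single captured frame
        horizontally in proportion to the viewer's horizontal head offset
        (head_x, in meters, relative to the screen center)."""
        h, w = frame.shape[:2]
        shift_px = int(np.clip(gain * (head_x / screen_width_m) * w, -w // 4, w // 4))
        return np.roll(frame, -shift_px, axis=1)

    # Example with a synthetic 640 x 480 frame and the head 5 cm to the right.
    frame = np.zeros((480, 640, 3), dtype=np.uint8)
    shifted = mono_parallax_shift(frame, head_x=0.05)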
Instead of a virtual camera, it is understood that the cameras of
As described herein, motion parallax processing is performed by a motion parallax processing component 112 for left and right images, providing parallax adjusted left and right images 114 and 115, respectively. Note that it is feasible to estimate the eyes' positions from head (or single eye) position data; however, this cannot adjust for head tilt, pitch and/or head gaze rotation/direction unless more information about the head than only its general position is sensed and provided as data to the motion parallax processing component. Accordingly, the sensed position data also may include head tilt, pitch and/or head rotation data.
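A non-limiting structural sketch of such a motion parallax processing component follows; the HeadPose field names and the simple per-eye horizontal shift are illustrative placeholders for the sensed position data and for the projection-based adjustment described herein, not the claimed implementation:

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class HeadPose:
        """Sensed position data: head center plus orientation angles (radians)."""
        position: np.ndarray   # (x, y, z) in meters, screen-centered coordinates
        yaw: float = 0.0       # gaze direction about the vertical axis
        pitch: float = 0.0
        roll: float = 0.0      # sideways head tilt

    class MotionParallaxProcessor:
        """Consumes left/right camera images plus a HeadPose and produces
        parallax adjusted left/right images (a stand-in for component 112)."""
        def __init__(self, ipd=0.063, gain=400.0):
            self.half_ipd = ipd / 2.0
            self.gain = gain  # illustrative meters-to-pixels factor

        def _adjust(self, image, eye_x):
            return np.roll(image, -int(self.gain * eye_x), axis=1)

        def process(self, left_img, right_img, pose: HeadPose):
            # Each eye's horizontal offset depends on head orientation as well
            # as head position; roll and yaw move the eyes even when the head
            # center is stationary, which is why orientation data is useful.
            dx = self.half_ipd * np.cos(pose.roll) * np.cos(pose.yaw)
            left_x = pose.position[0] - dx
            right_x = pose.position[0] + dx
            return self._adjust(left_img, left_x), self._adjust(right_img, right_x)

    # Example usage with synthetic frames.
    pose = HeadPose(position=np.array([0.05, 0.0, 0.6]), yaw=0.1, roll=0.05)
    left_adj, right_adj = MotionParallaxProcessor().process(
        np.zeros((480, 640, 3), np.uint8), np.zeros((480, 640, 3), np.uint8), pose)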
Thus, as generally represented in
In summary, as generally represented in
As the viewer 110 moves, the position of the viewer is tracked in real time, and translated into corresponding changes in both the left and right images 214 and 215. This results in an immersive 3D experience that combines both stereo cues and motion parallax cues.
Turning to aspects related to position/eye tracking, such tracking may be accomplished in various ways. One way includes multi-purpose goggles that combine stereo filters and a head-tracking device, e.g., implemented as sensors or transmitters in the goggles' stems. Note that various eyewear configured to output signals for use in head-tracking, such as eyewear including transmitters (e.g., infrared) that are detected and triangulated, is known in the art. Magnetic sensing is another known alternative.
Another alternative is to use head tracking systems based on camera and computer vision algorithms. Autostereoscopic displays that direct light to individual eyes, and thus are able to provide separate left and right image viewing for 3D effects, are described in U.S. patent application Ser. Nos. 12/819,238, 12/819,239 and 12/824,257, hereby incorporated by reference. Microsoft Corporation's Kinect™ technology has been adapted for head tracking/eye tracking in one implementation.
In general, the computer vision algorithms for eye tracking use models based on the analysis of multiple images of human heads. Standard systems may be used with displays that do not require goggles. However, when the viewer is wearing goggles, a practical problem arises in that goggles cover the eyes, and thus cause many existing face tracking mechanisms to fail. To overcome this issue, in one implementation, face tracking systems are trained with a set of images of people wearing goggles (instead of or in addition to training with images of normal faces). Indeed, a system may be trained with a set of images of people wearing the specific goggles used by a particular 3D system. This results in very efficient tracking, as goggles tend to stand out as a very recognizable object in the training data. In this way, a computer vision-based eye tracking system may be tuned to account for the presence of goggles.
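By way of a non-limiting example, such goggle-aware tracking can be sketched with a conventional cascade-style detector; the classifier file name below ("goggles_cascade.xml") denotes a hypothetical model assumed to have been trained on images of people wearing the system's goggles (e.g., via OpenCV's cascade training tools) and is not a model shipped with any library:

    import cv2

    # Hypothetical classifier trained on images of people wearing the goggles.
    goggle_detector = cv2.CascadeClassifier("goggles_cascade.xml")

    def track_goggles(frame_bgr):
        """Return bounding boxes of detected goggles in a camera frame; the left
        and right lens (and thus eye) positions can then be estimated from the
        box geometry."""
        if goggle_detector.empty():
            raise RuntimeError("goggles_cascade.xml not found; supply a trained model")
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        gray = cv2.equalizeHist(gray)
        return goggle_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)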
Step 304 represents computing the parallax adjustments based upon the geometry of the viewer's left eye position. Step 306 represents computing the parallax adjustments based upon the geometry of the viewer's right eye position. Note that it is feasible to use the same computation for both eyes, such as when only general head position data is obtained and rotation and/or tilt are not being considered, since the stereo camera separation already provides some (fixed) parallax difference. However, even the small two-inch or so distance between the eyes makes a difference in parallax and the resulting viewer perception, including when rotating/tilting the head, and so forth.
Steps 308 and 310 represent adjusting each image based on the parallax-projection computations. Step 312 outputs the adjusted images to the display device. Note that this may be in a conventional signal provided to a conventional 3D display device, or may be separate left and right signals to a display device configured to receive separate images. Indeed, the technology described herein may incorporate the motion parallax processing component 112 (and possibly the sensor or sensors 110) in the display device itself, for example, or may incorporate the motion parallax processing component 112 into the cameras.
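A non-limiting sketch of this per-frame flow (including the per-frame repetition described below) follows; the sensor, camera and adjustment functions are simple placeholders (synthetic frames and a proportional image shift) standing in for the actual tracking, acquisition and parallax-projection processing described herein:

    import numpy as np

    def read_viewer_pose():
        """Placeholder for the position sensor / computer vision tracker; returns
        estimated (left_eye, right_eye) positions in meters."""
        return np.array([-0.03, 0.0, 0.6]), np.array([0.03, 0.0, 0.6])

    def capture_stereo_pair():
        """Placeholder for the stereo camera; returns synthetic left/right frames."""
        blank = np.zeros((480, 640, 3), np.uint8)
        return blank.copy(), blank.copy()

    def adjust_for_eye(image, eye_pos, gain=400.0):
        """Crude stand-in for the parallax computation and adjustment of steps
        304/306 and 308/310: shift the image in proportion to the eye position."""
        dx, dy = int(gain * eye_pos[0]), int(gain * eye_pos[1])
        return np.roll(np.roll(image, -dx, axis=1), -dy, axis=0)

    def run_frames(num_frames=3):
        for _ in range(num_frames):                       # repeated per frame (see step 314)
            left_eye, right_eye = read_viewer_pose()      # sense current viewer position
            left_img, right_img = capture_stereo_pair()
            left_adj = adjust_for_eye(left_img, left_eye)     # steps 304 and 308
            right_adj = adjust_for_eye(right_img, right_eye)  # steps 306 and 310
            yield left_adj, right_adj                     # step 312: output to the display path

    for _left, _right in run_frames():
        pass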
Step 314 repeats the process, such as for every left and right frame (or for a group of frames/time duration, since a viewer can only move so fast). Note that alternatives are feasible, e.g., the left image parallax adjustment and output may take turns with the right image parallax adjustment and output, e.g., the steps of
Indeed, while the technology described herein has been described with reference to a single viewer, it is understood that multiple viewers of the same display can each receive his or her own parallax adjusted stereo image. Displays that can direct different left and right images to multiple viewers' eyes are known (e.g., as described in the aforementioned patent applications), and thus as long as the processing power is sufficient to sense multiple viewers' positions and perform the parallax adjustments, multiple viewers can simultaneously view the same 3D scene with individual stereo and left and right parallax adjusted views.
As can be seen, there is described herein a hybrid 3D video system that combines stereo display with dynamic composition of the left and right images to enable motion parallax rendering. This may be accomplished by inserting a position sensor into the viewer's goggles, including goggles with separate filtering lenses, and/or by using computer vision algorithms for eye tracking. Head tracking software may be tuned to account for the viewer wearing goggles.
The hybrid 3D system may be applied to video and/or to graphic applications that display a 3D scene, and thereby allow viewers to physically or otherwise navigate through various parts of a stereo image. For example, displayed 3D scenes may correspond to video games, 3D teleconferences, and data representations.
Moreover, the technology described herein overcomes a significant flaw with current display technology that takes into account only horizontal parallax, namely by also adjusting for vertical parallax (provided shutter glasses are used, or that the display is able to direct light both horizontally and vertically, unlike some lenticular or other goggle-free technology that can only produce horizontal parallax). The separate eye tracking/head sensing described herein may correct parallax for any head position (e.g., tilted sideways some number of degrees).
The techniques described herein can be applied to any device. It can be understood, therefore, that handheld, portable and other computing devices and computing objects of all kinds are contemplated for use in connection with the various embodiments. Accordingly, the general purpose remote computer described below in
Embodiments can partly be implemented via an operating system, for use by a developer of services for a device or object, and/or included within application software that operates to perform one or more functional aspects of the various embodiments described herein. Software may be described in the general context of computer executable instructions, such as program modules, being executed by one or more computers, such as client workstations, servers or other devices. Those skilled in the art will appreciate that computer systems have a variety of configurations and protocols that can be used to communicate data, and thus, no particular configuration or protocol is considered limiting.
With reference to
Computer 410 typically includes a variety of computer readable media and can be any available media that can be accessed by computer 410. The system memory 430 may include computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) and/or random access memory (RAM). By way of example, and not limitation, system memory 430 may also include an operating system, application programs, other program modules, and program data.
A viewer can enter commands and information into the computer 410 through input devices 440. A monitor or other type of display device is also connected to the system bus 422 via an interface, such as output interface 450. In addition to a monitor, computers can also include other peripheral output devices such as speakers and a printer, which may be connected through output interface 450.
The computer 410 may operate in a networked or distributed environment using logical connections to one or more other remote computers, such as remote computer 470. The remote computer 470 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, or any other remote media consumption or transmission device, and may include any or all of the elements described above relative to the computer 410. The logical connections depicted in
As mentioned above, while exemplary embodiments have been described in connection with various computing devices and network architectures, the underlying concepts may be applied to any network system and any computing device or system in which it is desirable to improve efficiency of resource usage.
Also, there are multiple ways to implement the same or similar functionality, e.g., an appropriate API, tool kit, driver code, operating system, control, standalone or downloadable software object, etc. which enables applications and services to take advantage of the techniques provided herein. Thus, embodiments herein are contemplated from the standpoint of an API (or other software object), as well as from a software or hardware object that implements one or more embodiments as described herein. Thus, various embodiments described herein can have aspects that are wholly in hardware, partly in hardware and partly in software, as well as in software.
The word “exemplary” is used herein to mean serving as an example, instance, or illustration. For the avoidance of doubt, the subject matter disclosed herein is not limited by such examples. In addition, any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs, nor is it meant to preclude equivalent exemplary structures and techniques known to those of ordinary skill in the art. Furthermore, to the extent that the terms “includes,” “has,” “contains,” and other similar words are used, for the avoidance of doubt, such terms are intended to be inclusive in a manner similar to the term “comprising” as an open transition word without precluding any additional or other elements when employed in a claim.
As mentioned, the various techniques described herein may be implemented in connection with hardware or software or, where appropriate, with a combination of both. As used herein, the terms “component,” “module,” “system” and the like are likewise intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on computer and the computer can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.
The aforementioned systems have been described with respect to interaction between several components. It can be appreciated that such systems and components can include those components or specified sub-components, some of the specified components or sub-components, and/or additional components, and according to various permutations and combinations of the foregoing. Sub-components can also be implemented as components communicatively coupled to other components rather than included within parent components (hierarchical). Additionally, it can be noted that one or more components may be combined into a single component providing aggregate functionality or divided into several separate sub-components, and that any one or more middle layers, such as a management layer, may be provided to communicatively couple to such sub-components in order to provide integrated functionality. Any components described herein may also interact with one or more other components not specifically described herein but generally known by those of skill in the art.
In view of the exemplary systems described herein, methodologies that may be implemented in accordance with the described subject matter can also be appreciated with reference to the flowcharts of the various figures. While for purposes of simplicity of explanation, the methodologies are shown and described as a series of blocks, it is to be understood and appreciated that the various embodiments are not limited by the order of the blocks, as some blocks may occur in different orders and/or concurrently with other blocks from what is depicted and described herein. Where non-sequential, or branched, flow is illustrated via flowchart, it can be appreciated that various other branches, flow paths, and orders of the blocks, may be implemented which achieve the same or a similar result. Moreover, some illustrated blocks are optional in implementing the methodologies described hereinafter.
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
In addition to the various embodiments described herein, it is to be understood that other similar embodiments can be used or modifications and additions can be made to the described embodiment(s) for performing the same or equivalent function of the corresponding embodiment(s) without deviating therefrom. Still further, multiple processing chips or multiple devices can share the performance of one or more functions described herein, and similarly, storage can be effected across a plurality of devices. Accordingly, the invention is not to be limited to any single embodiment, but rather is to be construed in breadth, spirit and scope in accordance with the appended claims.