1. Field of the Invention
The present invention relates generally to hardware and software for user interaction with digital three-dimensional data. More specifically, it relates to devices having displays and to human interaction with data displayed on the devices.
2. Description of the Related Art
The amount of three-dimensional content available on the Internet and in other contexts is increasing at a rapid pace. Consumers are growing more accustomed to hearing about “3D” in various contexts, such as movies, video games, and online virtual cities. Three-dimensional content may be found in medical imaging (e.g., examining MRIs), modeling and prototyping, information visualization, architecture, tele-immersion and collaboration, geographic information systems (e.g., Google Earth), and other fields. Current systems, including computers and cell phones and, more generally, content display systems (e.g., TVs), fall short of taking advantage of 3D content because they do not provide an immersive user experience. For example, they do not provide intuitive, natural, and unobtrusive interaction with 3D objects.
Presently, mobile devices do not provide users seeking interaction with digital 3D content a natural, intuitive, and immersive experience. Mobile device users are not able to make gestures or manipulate 3D objects using bare hands in a natural and intuitive way.
Although some displays allow users to manipulate 3D content with bare hands in front of the display (monitor), current display systems that provide some interaction with 3D content require inconvenient or intrusive peripherals that make the experience unnatural to the user. For example, some current methods of providing tactile or haptic feedback require vibro-tactile gloves. In other examples, current methods of rendering 3D content include stereoscopic displays (requiring the user to wear a pair of special glasses), auto-stereoscopic displays (based on lenticular lenses or parallax barriers, with eye strain and headaches as common side effects), head-mounted displays (requiring heavy head gear or goggles), and volumetric displays, such as those based on oscillating mirrors or screens (which do not allow bare-hand direct manipulation of 3D content).
In addition, mobile device displays, such as displays on cell phones, allow only a limited field of view (FOV). This is because the mobile device display size is generally limited by the size of the device itself. For example, a non-projection display cannot be larger than the mobile device that contains it. Existing solutions for mobile displays (which are generally light-emitting displays) therefore limit the immersive experience for the user. Furthermore, it is presently difficult to navigate through virtual worlds and 3D content via a first-person view on mobile devices, which is one aspect of creating an immersive experience. Nor do mobile devices provide satisfactory user awareness of virtual surroundings, another important aspect of creating an immersive experience.
Some display systems require a user to reach behind the monitor. However, in these systems the user's hands must physically touch the back of the monitor, and such systems are only intended to manipulate 2D images, such as moving images on the screen.
A user is able to use a mobile device having a display, such as a cell phone or a media player, to view and manipulate 3D content displayed on the device by reaching behind the device and manipulating a perceived 3D object. The user's eyes, device, and a perceived 3D object are aligned or “in-line,” such that the device performs as a type of in-line mediator between the user and the perceived 3D object. This alignment results in a visual coherency to the user when reaching behind the device to make hand gestures and movements to manipulate the 3D content. That is, the user's hand movements behind the device are at a natural and intuitive distance and are aligned with the 3D object displayed on the device monitor so that the user has a natural visual impression that she is actually handling the 3D object shown on the monitor.
One embodiment of the present invention is a method of detecting manipulation of a digital 3D object displayed on a device having a front side with a display monitor facing the user and a back side with a sensor, such as a camera, facing away from the user. A hand or other object may be detected within a specific area behind the device. The hand is displayed on the monitor, and its movements within that area are tracked. The movements are made by the user intending to manipulate the displayed 3D object, as if handling a perceived 3D object behind the device, but without having to physically touch the back side of the device. A collision between the displayed hand and the displayed 3D object may be detected by the device, resulting in a modification of the image of the 3D object displayed on the device. In this manner the device functions as a 3D in-line mediator between the user and the 3D object.
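By way of illustration only, the following sketch outlines one possible control loop implementing these steps. The sensor, display, and scene objects and the helper names are hypothetical stand-ins, not components defined by this description.

```python
# Illustrative sketch of the method; all names here are hypothetical.

def mediation_loop(sensor, display, scene):
    while True:
        frame = sensor.capture()                # image from the rear-facing sensor
        hand = detect_hand(frame)               # detect a hand behind the device
        if hand is None:
            continue                            # no hand present; keep sensing
        display.composite(scene, hand)          # show the hand over the 3D content
        x, y, z = hand.position                 # track horizontal, vertical, depth
        obj = scene.find_collision((x, y, z))   # compare against 3D geometry
        if obj is not None:
            scene.manipulate(obj, (x, y, z))    # modify the displayed 3D object

def detect_hand(frame):
    # Placeholder: any of the sensing options described below (skin tone,
    # IR brightest-object, depth camera) would return a hand object or None.
    return None
```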
In another embodiment, a display device includes a processor and a memory component storing digital 3D content data. The device also includes a tracking sensor component for tracking movement of an object that is in proximity of the device. In one embodiment, the tracking sensor component faces the back of the device (away from the user) and is able to detect movements and gestures of a hand of a user who reaches behind the device. A hand tracking module processes movement data from the tracking sensor and a collision detection module detects collisions between a user's hand and a 3D object.
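One possible arrangement of these components, sketched with hypothetical class names that mirror the modules just described:

```python
class HandTrackingModule:
    def process(self, frame):
        """Return tracked hand movement data from a raw sensor frame."""
        ...

class CollisionDetectionModule:
    def check(self, hand_position, content):
        """Return the 3D object the hand collides with, or None."""
        ...

class DisplayDevice:
    def __init__(self, tracking_sensor, content):
        self.tracking_sensor = tracking_sensor   # faces away from the user
        self.content = content                   # digital 3D content data in memory
        self.hand_tracker = HandTrackingModule()
        self.collisions = CollisionDetectionModule()

    def on_frame(self, frame):
        hand = self.hand_tracker.process(frame)
        return self.collisions.check(hand, self.content)
```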
References are made to the accompanying drawings, which form a part of the description and in which are shown, by way of illustration, particular embodiments:
Methods and systems for using a display device as a three-dimensional (3D) in-line mediator for interacting with digital 3D content displayed on the device are described in the various figures. The use of a display device as an in-line mediator enables intuitive bare-hand manipulation of digital 3D content: by reaching behind the display device, a user sees on the display the direct effect of handling the 3D content. In this manner, the device display functions as an in-line mediator between the user and the 3D content, enabling a type of visual coherency for the user. That is, the 3D content is visually coherent or in-line from the user's perspective. The user's hand, or a representation of it, is shown on the display, maintaining the visual coherency. Furthermore, because the user reaches behind the device, the user's view of the 3D content on the display is not obstructed by the user's arm or hand.
Mobile device 104 may be a cell phone, a media player (e.g., an MP3 player), a portable gaming device, or any type of smart handset device having a display. It is assumed that the device is IP-enabled or capable of connecting to a suitable network to access 3D content over the Internet. However, the various embodiments described below do not necessarily require that the device be able to access a network. For example, the 3D content displayed on the device may be resident in local storage on the device, such as on a hard disk or other mass storage component, or in a local cache. The sensor component on mobile device 104 and the accompanying sensor software may be one or more of various types of sensors, a typical one being a conventional camera. Implementations of various sensor components are described below. Although the methods and systems of the various embodiments are described using a mobile device, they apply equally to nomadic devices, such as laptops and netbook computers (i.e., devices that are portable), and to stationary devices, such as desktop computers, workstations, and the like, as shown in the accompanying drawings.
Similarly,
The sensor component, also referred to as a tracking component, may be implemented using various types of sensors. These sensors may be used to detect the presence of a user's hand (or any object) behind the mobile device's monitor. In a preferred embodiment, a standard or conventional mobile device or cell phone camera is used to sense the presence of a hand and its movements or gestures. Image differentiation or optic flow may also be used to detect and track hand movement. In other embodiments, a conventional camera may be replaced with infrared detection components to perform hand detection and tracking. For example, a mobile device camera facing away from the user that is IR sensitive (or has its IR filter removed), possibly in combination with additional IR illumination (e.g., an LED), may look for the brightest object within the range of the camera, which will likely be the user's hand. Dedicated infrared sensors with IR illumination may also be used. In another embodiment, redshift thermal imaging may be used. This option uses passive optical components that redshift long-wavelength and thermal infrared radiation so that a standard CMOS imager can detect it. Another option is an ultrasonic gesture detection sensor. Sensor software options include off-the-shelf gesture recognition tools, such as software for detecting hands using object segmentation and/or optic flow. Other options include spectral imaging software for detecting skin tones, pseudo-thermal imaging, and 3D depth cameras using time-of-flight.
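As one concrete illustration of the IR brightest-object option, the following sketch locates the brightest blob in a frame. The use of OpenCV and the blur radius are assumptions for illustration, not requirements of the description above; a caller would compare the returned intensity against a threshold to decide whether a hand is present.

```python
import cv2

def brightest_region(frame_gray, blur_radius=41):
    # Smooth first so that a single hot pixel cannot win; under IR
    # illumination the nearest large reflector (likely the user's hand)
    # shows up as the brightest blob in the frame.
    blurred = cv2.GaussianBlur(frame_gray, (blur_radius, blur_radius), 0)
    _, max_val, _, max_loc = cv2.minMaxLoc(blurred)
    return max_loc, max_val   # (x, y) of the brightest spot and its intensity
```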
The user begins by moving a hand behind the device (hereafter, for ease of illustration, the term “device” may refer to mobile device screens as well as laptop/desktop monitors). At step 402 a tracking component detects the presence of the user's hand. There are various ways this can be done. One conventional way is by detecting the skin tone of the user's hand. As described above, numerous types of tracking components or sensors may be used. Which one is most suitable will likely depend on the features and capabilities of the device (i.e., mobile, nomadic, stationary, etc.). A typical cell phone camera is capable of detecting the presence of a human hand. An image of the hand (or hands) is transmitted to a compositing component.
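For the skin tone approach, a minimal sketch using a common YCrCb skin segmentation rule; the color bounds and minimum area are conventional heuristics, not values from this description:

```python
import cv2
import numpy as np

def detect_hand_by_skin(frame_bgr, min_area=2000):
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Classic skin-tone bounds in the Cr/Cb channels.
    mask = cv2.inRange(ycrcb,
                       np.array([0, 133, 77], dtype=np.uint8),
                       np.array([255, 173, 127], dtype=np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)
    if cv2.contourArea(hand) < min_area:
        return None                       # region too small to be a hand
    return cv2.boundingRect(hand)         # (x, y, w, h) of the detected hand
```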
At step 404 the hand is displayed on the screen. The user sees either an unaltered view of her hand (not including the background behind and around the hand) or an altered representation of the hand. If an image of the user's hand is displayed, known compositing techniques may be used. For example, some techniques may involve combining two video sources: one for the 3D content and another carrying video images of the user's hand. Other techniques for overlaying or compositing the images of the hand over the 3D content data may be used, and which technique is most suitable will likely depend on the type of device. If the user's hand is mapped to an avatar hand or other digital representation, software from the 3D content provider or other conventional software may be used to map the user hand images to an avatar image, such as a robotic hand. Thus, after step 404, a representation of the user's (still stationary) hand can be seen on the device. That is, its presence has been detected and is being represented on the device.
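A minimal compositing sketch, assuming the hand has already been segmented into a binary mask (for example by the skin detection sketch above):

```python
import numpy as np

def composite_hand(content_rgb, hand_rgb, hand_mask):
    # Where the mask marks hand pixels, show the camera image of the hand;
    # everywhere else, show the rendered 3D content.
    mask = (hand_mask > 0)[..., None]     # broadcast the mask over color channels
    return np.where(mask, hand_rgb, content_rgb)
```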
At step 406 the user starts moving the hand, either by moving it up, down, left, right, inward, or outward (relative to the device), or by gesturing, or both. The initial position of the hand and its subsequent movement can be described in terms of x, y, and z coordinates. The tracking component begins tracking hand movement and gesturing, which has horizontal, vertical, and depth components. For example, a user may be viewing a 3D virtual world room on the device and want to move an object from the far left corner of the room (which has a certain depth) to the near right corner of the room. In one embodiment of the invention, the user may have to move her hand to a position that is, for example, about 12 inches behind and slightly left of the device. This may require that the user extend her arm out a little further than what would be considered a normal or natural distance. After grabbing the object, as discussed in step 408 below, the user moves her hand to a position that is perhaps 2-3 inches behind and to the right of the device. This example illustrates that there is a depth component in the hand tracking that is implemented to maintain the in-line mediation performed by the device.
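The depth component can be made explicit with a standard pinhole back-projection; fx, fy, cx, cy are assumed camera intrinsics from calibration, not parameters defined in this description:

```python
def hand_scene_position(u, v, depth, fx, fy, cx, cy):
    # Back-project the tracked pixel (u, v) and an estimated depth into
    # x, y, z coordinates in front of the rear-facing camera.
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)
```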
At step 408 the digital representation of the user's hand on the device collides with or touches an object. This collision is detected by comparing sensor data from the tracking sensor with geometrical data from the 3D data repository. The user moves her hand behind the device in a way that causes the digital representation of her hand on the screen to collide with the object, at which point she can grab, pick up, or otherwise manipulate the object. The user's hand may be characterized as colliding with the perceived object that is “floating” behind the device, as described above.
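One simple way to perform this comparison is a sphere-versus-bounding-box test; the hand radius and the axis-aligned box representation of the object are illustrative assumptions:

```python
def hand_collides(hand_xyz, box_min, box_max, reach=0.02):
    # Treat the tracked hand as a small sphere (radius `reach`, in meters)
    # and the object as an axis-aligned bounding box from the 3D repository.
    clamped = [max(lo, min(p, hi))
               for p, lo, hi in zip(hand_xyz, box_min, box_max)]
    dist_sq = sum((p - c) ** 2 for p, c in zip(hand_xyz, clamped))
    return dist_sq <= reach ** 2
```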
In one embodiment, an “input-output coincidence” model is used to close what is referred to in human-computer interaction as the perception-action loop, where perception is what the user sees and action is what the user does. This enables a user to see the consequences of an interaction, such as touching a 3D object, immediately. As described above, the user's hand is aligned with, or in the same position as, the 3D object being manipulated. That is, from the user's perspective, the hand is aligned with the 3D object so that it looks as though the user is lifting or moving the 3D object as if it were a physical object. What the user sees makes sense based on the action being taken. In one embodiment, the system provides tactile feedback to the user upon detecting a collision between the user's hand and the 3D object.
At step 410 the image of the 3D scene is modified to reflect the user's manipulation of the 3D object. If there is no manipulation of a 3D object (and thus no object collision), the image on the screen still changes as the user moves her hand, just as it does when the user manipulates a 3D object. The changes in the 3D image on the screen may be made using known methods for processing 3D content data. These methods or techniques may vary depending on the type of device, the source of the data, and other factors. The process described above then repeats, returning to step 402 where the presence of the user's hand is again detected.
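For the common case of moving a grabbed object, the modification can be as simple as applying the tracked hand displacement to the object's transform; the 4x4 model-matrix representation is an assumption for illustration:

```python
import numpy as np

def move_grabbed_object(model_matrix, hand_delta):
    # Apply the hand's displacement (dx, dy, dz) to the grabbed object's
    # 4x4 transform: a change of location only, with no deformation.
    translation = np.eye(4)
    translation[:3, 3] = hand_delta
    return translation @ model_matrix
```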
Hand tracking module 504 identifies features of the user's hand positions, including the positions of the fingers, wrist, and arm. It determines the location of these body parts in the 3D environment. Data from module 504 goes to two components related to hand and arm position: gesture detection module 508 and hand collision detection module 510. In one embodiment, a user “gesture” results in a modification of 3D content 501. A gesture may include lifting, holding, squeezing, pinching, or rotating a 3D object. These actions typically result in some type of modification of the object in the 3D environment. A modification of an object may include a change in its location (lifting or turning) without there being an actual deformation or change in shape of the object. The gesture detection data may be applied directly to the graphics data representing 3D content 501.
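By way of example, gesture detection module 508 could recognize a pinch (grab) gesture from fingertip positions reported by hand tracking module 504; the 3 cm threshold is an illustrative value:

```python
def is_pinch(thumb_tip, index_tip, threshold=0.03):
    # Thumb and index fingertips (x, y, z points from the hand tracking
    # module) closer than ~3 cm are read as a pinch/grab gesture.
    dist_sq = sum((a - b) ** 2 for a, b in zip(thumb_tip, index_tip))
    return dist_sq <= threshold ** 2
```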
In another embodiment, tracking sensor component 502 may also track the user's face. In this case, face tracking data is transmitted to face tracking module 506. Face tracking may be utilized in cases where the user is not vertically aligned with the device and the perceived object (i.e., the user's head is not looking directly at the middle of the screen).
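A sketch of how face tracking data might correct for an off-center viewer by shifting the virtual camera; the gain value is an illustrative tuning parameter, not part of this description:

```python
def view_offset_from_face(face_x, face_y, frame_w, frame_h, gain=0.1):
    # Normalized offset of the face from the sensor's image center; the
    # virtual camera shifts the opposite way so the eye/device/object
    # line of sight remains coherent.
    dx = (face_x - frame_w / 2) / frame_w
    dy = (face_y - frame_h / 2) / frame_h
    return (-gain * dx, -gain * dy)
```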
In another embodiment, data from hand collision detection module 510 may be transmitted to a tactile feedback controller 512, which is connected to one or more actuators 514 that are external to device 500. In this embodiment, the user may receive haptic feedback when the user's hand collides with a 3D object. Generally, it is preferred that actuators 514 be as unobtrusive as possible. In one embodiment, they are vibrating wristbands, which may be wired or wireless. Using wristbands allows for bare-hand manipulation of 3D content as described above. Tactile feedback controller 512 receives a signal that there is a collision or contact and causes tactile actuators 514 to provide a physical sensation to the user. For example, with vibrating wristbands, the user's wrist will sense a vibration or similar physical sensation indicating contact with the 3D object.
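A sketch of tactile feedback controller 512 driving external actuators 514; the actuator interface (vibrate/stop) is hypothetical, standing in for a wired or wireless wristband driver:

```python
class TactileFeedbackController:
    """Sketch of controller 512; `actuators` stand in for wristbands 514."""

    def __init__(self, actuators):
        self.actuators = actuators

    def on_collision_signal(self, colliding):
        # Signal contact with the 3D object by a short vibration.
        for actuator in self.actuators:
            if colliding:
                actuator.vibrate(duration_ms=50)   # hypothetical driver call
            else:
                actuator.stop()
```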
As is evident from the figures and the various embodiments, the present invention enables a user to interact with digital 3D content in a natural and immersive way by enabling visual coherency, thereby creating an immersive volumetric interaction with the 3D content. In one embodiment, a user uploads or executes 3D content on a mobile computing device, such as a cell phone. This 3D content may be a virtual world that the user has visited using a browser on the mobile device (e.g., Second Life or any other site that provides virtual world content). Other examples include movies, video games, online virtual cities, medical imaging (e.g., examining MRIs), modeling and prototyping, information visualization, architecture, tele-immersion and collaboration, and geographic information systems (e.g., Google Earth). The user holds the display of the device upright at a comfortable distance in front of the user's eyes, for example at 20-30 centimeters. The display of the mobile device is used as a window into the virtual world. Using the mobile device as an in-line mediator between the user and the user's hand, the user is able to manipulate 3D objects shown on the display by reaching behind the display of the device and making hand gestures and movements around a perceived object behind the display. The user sees on the display both the gestures and movements and the 3D object they affect.
As discussed above, one aspect of creating an immersive and natural user interaction with 3D content using a mobile device is enabling the user to have bare-hand interaction with objects in the virtual world. That is, the user is allowed to manipulate and “touch” digital 3D objects using the mobile device without any peripheral devices, such as gloves, finger sensors, motion detectors, and the like.
CPU 622 is also coupled to a variety of input/output devices such as display 604, keyboard 610, mouse 612 and speakers 630. In general, an input/output device may be any of: video displays, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, biometrics readers, or other computers. CPU 622 optionally may be coupled to another computer or telecommunications network using network interface 640. With such a network interface, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the above-described method steps. Furthermore, method embodiments of the present invention may execute solely upon CPU 622 or may execute over a network such as the Internet in conjunction with a remote CPU that shares a portion of the processing.
Although illustrative embodiments and applications of this invention are shown and described herein, many variations and modifications are possible which remain within the concept, scope, and spirit of the invention, and these variations would become clear to those of ordinary skill in the art after perusal of this application. Accordingly, the embodiments described are illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims.
This application claims priority of U.S. provisional patent application No. 61/093,651, filed Sep. 2, 2008, entitled “GESTURE AND MOTION-BASED NAVIGATION AND INTERACTION WITH THREE-DIMENSIONAL VIRTUAL CONTENT ON A MOBILE DEVICE,” which is hereby incorporated by reference.
Number | Date | Country
---|---|---
61/093,651 | Sep. 2, 2008 | US