In an augmented reality system, a user's view of the real world is enhanced with virtual computer-generated graphics. These graphics are spatially registered so that they appear aligned with the real world from the perspective of the viewing user. For example, the spatial registration can make a virtual character appear to be standing on a real table.
Augmented reality systems have previously been implemented using head-mounted displays that are worn by the users. A video camera captures images of the real world in the direction of the user's gaze, and augments the images with virtual graphics before displaying the augmented images on the head-mounted display. Alternative augmented reality display techniques exploit large spatially aligned optical elements, such as transparent screens, holograms, or video-projectors to combine the virtual graphics with the real world.
For each of the above augmented reality display techniques, there is a problem of how the user interacts with the augmented reality scene that is displayed. Where interaction is enabled, it has previously been implemented using indirect interaction devices, such as a mouse or stylus that can monitor the movements of the user in six degrees of freedom to control an on-screen object. However, when using such interaction devices the user feels detached from the augmented reality environment, rather than feeling that they are part of (or within) the augmented reality environment.
Furthermore, because the graphics displayed in the augmented reality environment are virtual, the user is not able to sense when they are interacting with the virtual objects. In other words, no haptic feedback is provided to the user when interacting with a virtual object. This results in a lack of a spatial frame of reference, and makes it difficult for the user to accurately manipulate virtual objects or activate virtual controls. This effect is accentuated in a three-dimensional augmented reality system, where the user may find it difficult to accurately judge the depth of a virtual object in the augmented reality scene.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known augmented reality systems.
The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure and it does not identify key/critical elements of the invention or delineate the scope of the invention. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
Techniques for user-interaction in augmented reality are described. In one example, a direct user-interaction method comprises displaying a 3D augmented reality environment having a virtual object and a real first and second object controlled by a user, tracking the position of the objects in 3D using camera images, displaying the virtual object on the first object from the user's viewpoint, and enabling interaction between the second object and the virtual object when the first and second objects are touching. In another example, an augmented reality system comprises a display device that shows an augmented reality environment having a virtual object and a real user's hand, a depth camera that captures depth images of the hand, and a processor. The processor receives the images, tracks the hand pose in six degrees-of-freedom, and enables interaction between the hand and the virtual object.
Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:
Like reference numerals are used to designate like parts in the accompanying drawings.
The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present example may be constructed or utilized. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
Although the present examples are described and illustrated herein as being implemented in a desktop augmented reality system, the system described is provided as an example and not a limitation. As those skilled in the art will appreciate, the present examples are suitable for application in a variety of different types of augmented reality systems.
Described herein are an augmented reality system and method that enable a user to interact with the virtual computer-generated graphics using direct interaction. The term “direct interaction” is used herein to mean an environment in which the user's touch or gestures directly manipulate a user interface (i.e. the graphics in the augmented reality). In the context of a regular two-dimensional computing user interface, a direct interaction technique can be achieved through the use of a touch-sensitive display screen. This is distinguished from an “indirect interaction” environment where the user manipulates a device that is remote from the user interface, such as a computer mouse device.
Note that in the context of the augmented reality system, the term “direct interaction” also covers the scenario in which a user manipulates an object (such as a tool, pen, or any other object) within (i.e. not remote from) the augmented reality environment to interact with the graphics in the environment. This is analogous to using a stylus to operate a touch-screen in a 2D environment, which is still considered to be direct interaction.
An augmented reality system is a three-dimensional system, and the direct interaction also operates in 3D. Reference is first made to
A camera 106 is arranged to capture images of one or more real objects controlled or manipulated by the user. The objects can be, for example, body parts of the user. For example, the camera 106 can capture images of at least one hand 108 of the user. In other examples, the camera 106 may also capture images comprising one or more forearms. The images of the hand 108 comprise the fingertips and palm of the hand. In a further example, the camera 106 can capture images of a real object held in the hand of the user.
In one example, the camera 106 is a depth camera (also known as a z-camera), which generates both intensity/color values and a depth value (i.e. distance from the camera 106) for each pixel in the images captured by the camera. The depth camera can be in the form of a time-of-flight camera, stereo camera or a regular camera combined with a structured light emitter. The use of a depth camera enables three-dimensional information about the position, pose, movement, size and orientation of the real objects to be determined. In some examples, a plurality of depth cameras can be located at different positions, in order to avoid occlusion when multiple objects are present, and enable accurate tracking to be maintained.
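By way of a non-limiting illustration of how a per-pixel depth value can yield three-dimensional position information, the following minimal sketch back-projects a single depth pixel into a 3D point in the camera's coordinate frame. The pinhole camera model, the intrinsic parameters (fx, fy, cx, cy) and the function name backproject are assumptions made for this illustration only and are not specified in the present description; the depth value is assumed to be measured along the optical axis of the camera.

```python
from typing import Tuple


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float
                ) -> Tuple[float, float, float]:
    """Convert pixel (u, v) with a depth value into a 3D camera-frame point.

    Assumes a pinhole model (hypothetical intrinsics): fx, fy are focal
    lengths in pixels and (cx, cy) is the principal point in pixels.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# A pixel 100 columns to the right of the principal point, 1 m away.
point = backproject(420.0, 240.0, 1.0, 500.0, 500.0, 320.0, 240.0)
print(point)
```

Applying such a conversion to every pixel belonging to a tracked object would give a point cloud from which position, size and orientation can be estimated.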
In other examples, a regular 2D camera can be used to track the 2D position, posture and/or movement of the user-controlled real objects, in the two dimensions visible to the camera. A plurality of regular 2D cameras can be used, e.g. at different positions, to derive 3D information on the real objects.
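One possible way of deriving depth from two regular 2D cameras is sketched below, purely as an illustration. It assumes a rectified stereo pair (two identical cameras displaced horizontally by a known baseline), which is only one of many arrangements and is not prescribed by the present description; the function name stereo_depth and its parameters are hypothetical.

```python
def stereo_depth(x_left: float, x_right: float,
                 focal_px: float, baseline_m: float) -> float:
    """Estimate depth (in metres) of a feature seen by a rectified stereo pair.

    x_left and x_right are the horizontal pixel coordinates of the same
    feature in the left and right images. Depth = focal * baseline / disparity.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# A fingertip seen 20 pixels apart by cameras 10 cm apart (assumed values).
print(stereo_depth(340.0, 320.0, 500.0, 0.1))
```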
The camera provides the captured images of the user-controlled real objects to a computing device 110. The computing device 110 is arranged to use the captured images to track the real objects, and generate the augmented reality environment 102, as described in more detail below. Details on the structure of the computing device are discussed with reference to
The above-described augmented reality system of
The computing device 110 uses the information on the position and pose of the real objects to control interaction between the real objects and the one or more virtual objects 112. The computing device 110 uses the tracked position of the objects in the real world, and translates this to a position in the augmented reality environment. The computing device 110 then inserts an object representation that has substantially the same pose as the real object into the augmented reality environment at the translated location. The object representation is spatially aligned with the view of the real object that the user can see on the display device 104, and the object representation may or may not be visible to the user on the display device 104. The object representation can, in one example, be a computer-derived virtual representation of a body part or other object, or, in another example, is a mesh or point-cloud object directly derived from the camera 106 images. As the user moves the real object, the object representation moves in a corresponding manner in the augmented reality environment 102.
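The translation of a tracked real-world position into a position in the augmented reality environment can be illustrated with the following minimal sketch. It assumes that the relationship between the camera's coordinate frame and the environment's coordinate frame is a fixed rigid transform (a 3x3 rotation and a translation) obtained by some prior calibration; that assumption, and the function name to_environment, are introduced here for illustration only.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def to_environment(point: Sequence[float],
                   rotation: Sequence[Sequence[float]],
                   translation: Sequence[float]) -> Vector:
    """Map a camera-frame point into the environment frame: R * p + t."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


# Hypothetical calibration: camera rotated 90 degrees about z, shifted in z.
R = [[0.0, -1.0, 0.0],
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
t = (0.0, 0.0, -0.4)
print(to_environment((1.0, 0.0, 0.5), R, t))
```

Applying the same transform to every tracked point of the real object would place the object representation at the translated location with substantially the same pose.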
As the computing device 110 also knows the location of the virtual objects 112, it can determine whether the object representation is coincident with the virtual objects 112 in the augmented reality environment, and determine the resulting interaction. For example, the user can move his or her hand 108 underneath virtual object 112 to scoop it up in the palm of their hand, and move it from one location to another. The augmented reality system is arranged so that it appears to the user that the virtual object 112 is responding directly to the user's own hand 108. Many other types of interaction with the virtual objects (in addition to scooping and moving) are also possible. For example, the augmented reality system can implement a physics simulation-based interaction environment, which models forces (such as impulses, gravity and friction) imparted/acting on and between the real and virtual objects. This enables the user to push, pull, lift, grasp and drop the virtual objects, and generally manipulate the virtual objects as if they were real.
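A simple form of the coincidence determination described above can be sketched as follows. In this illustration the object representation is assumed to be a point cloud and the virtual object is approximated by a bounding sphere; both simplifications, and the function name is_coincident, are assumptions made for the sketch, and a physics simulation-based implementation would typically use a more detailed collision model.

```python
import math
from typing import Iterable, Sequence


def is_coincident(points: Iterable[Sequence[float]],
                  center: Sequence[float],
                  radius: float) -> bool:
    """Return True if any point of the object representation lies inside
    the bounding sphere (center, radius) of the virtual object."""
    return any(math.dist(p, center) <= radius for p in points)


# Hypothetical hand points (metres) and a virtual object of radius 5 cm.
hand_points = [(0.0, 0.0, 0.5), (0.02, 0.0, 0.31)]
print(is_coincident(hand_points, (0.0, 0.0, 0.3), 0.05))
```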
However, in the direct-interaction augmented reality system of
The flowchart of
Images are received 202 from the camera 106 at the computing device 110. The images show a first and second object controlled by the user 100. The first object is used as an interaction proxy and frame of reference, as described below, and the second object is used by the user to directly interact with a virtual object. For example, the first object can be a non-dominant hand of the user 100 (e.g. the user's left hand if they are right-handed, or vice versa) and the second object can be the dominant hand of the user 100 (e.g. the user's right hand if they are right-handed, or vice versa). In other examples, the first object can be an object held by the user, a forearm, a palm of either hand, and/or a fingertip of either hand, and the second object can be a digit of the user's dominant hand.
The images from the camera 106 are then analyzed by the computing device 110 to track 204 the position, movement, pose, size and/or shape of the first and second objects controlled by the user. If a depth camera is used, then the movement and position in 3D can be determined, as well as an accurate size.
Once the position and orientation of the first and second objects have been determined by the computing device 110, an equivalent, corresponding position and orientation is calculated in the augmented reality environment. In other words, the computing device 110 determines where in the augmented reality environment the real objects are located given that, from the user's perspective, the real objects occupy the same space as the virtual objects in the augmented reality environment. This corresponding position and orientation in the virtual scene can be used to control direct interaction between the real objects and the virtual objects.
Once the corresponding position and orientation of the objects have been calculated for the augmented reality environment, the computing device 110 can use this information to update the augmented reality environment to display spatially aligned graphics (this utilizes information on the user's gaze or head position, as outlined below with reference to
The user 100 can then interact with the virtual object rendered relative to the first object using the second object, and the computing device 110 uses the tracked locations of the objects such that interaction is triggered 208 when the first and second objects are in contact. In other words, when a virtual object is rendered onto or around the first object (e.g. the user's non-dominant hand), then the user can interact with the virtual object when the second object (e.g. the user's dominant hand) is touching the first object. To achieve this, the computing device 110 can use the information regarding the position and orientation of the first object to generate a virtual “touch plane”, which is coincident with a surface of the first object, and determine from the position of the second object that the second object and the touch plane converge. Responsive to determining that the second object and the touch plane converge, the interaction can be triggered.
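The touch plane test described above can be illustrated with the following minimal sketch. The plane is represented by a point on the surface of the first object and a normal vector, and the second object is reduced to a single fingertip position; the function names and the 5 mm tolerance are hypothetical values chosen for illustration and are not taken from the present description.

```python
import math
from typing import Sequence


def signed_distance(point: Sequence[float],
                    plane_point: Sequence[float],
                    plane_normal: Sequence[float]) -> float:
    """Signed distance from point to the plane through plane_point with
    the given (not necessarily unit-length) normal."""
    length = math.sqrt(sum(n * n for n in plane_normal))
    if length == 0.0:
        raise ValueError("plane normal must be non-zero")
    offset = sum((p - q) * n
                 for p, q, n in zip(point, plane_point, plane_normal))
    return offset / length


def touch_triggered(fingertip: Sequence[float],
                    plane_point: Sequence[float],
                    plane_normal: Sequence[float],
                    tolerance: float = 0.005) -> bool:
    """True when the fingertip and the touch plane converge, i.e. the
    fingertip is within `tolerance` metres of the plane (assumed value)."""
    return abs(signed_distance(fingertip, plane_point, plane_normal)) <= tolerance


# Palm at the origin facing +z; fingertip 3 mm above it.
print(touch_triggered((0.01, 0.02, 0.003), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
```

A practical implementation would additionally check that the fingertip lies within the extent of the first object's surface, since a mathematical plane is unbounded.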
In a further example, the virtual object is not rendered on top of the first object, but is instead rendered at a fixed location. In this example, to interact with the virtual object, the user moves the first object to be coincident with the virtual object, and can then interact with the virtual object using the second object.
The result of this is that the user is using the first object as a frame of reference for where in the augmented reality environment the virtual object is located. A user can intuitively reach for a part of their own body, as they have an inherent awareness of where their limbs are located in space. In addition, this also provides haptic feedback, as the user can feel the contact between the objects, and hence knows that interaction with the virtual object is occurring. Because the virtual object maintains the spatial relationship with the first object, this stays true even if the user's objects are not held at a constant location, thereby reducing mental and physical fatigue on the user.
Reference is now made to
The user 100 can then use a digit of the dominant hand 300 to actuate the first button 304 or second button 306 by touching the palm of the non-dominant hand 302 at the location of the first button 304 or second button 306, respectively. The user 100 can feel when they touch their own palm, and the computing device 110 uses the tracking of the objects to ensure that the actuation of the button occurs when the dominant and non-dominant hands make contact.
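One way of deciding which button a palm contact actuates is sketched below for illustration. The sketch assumes that the palm is described by an origin and two perpendicular unit vectors lying in the palm surface, and that each button occupies a rectangle in those palm coordinates; the Button type, the function names and the dimensions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Button:
    name: str
    u_min: float
    u_max: float
    v_min: float
    v_max: float


def palm_coords(contact: Sequence[float], origin: Sequence[float],
                u_axis: Sequence[float], v_axis: Sequence[float]
                ) -> Tuple[float, float]:
    """Project a 3D contact point onto the palm's (u, v) axes."""
    rel = [c - o for c, o in zip(contact, origin)]
    u = sum(r * a for r, a in zip(rel, u_axis))
    v = sum(r * a for r, a in zip(rel, v_axis))
    return (u, v)


def hit_button(contact: Sequence[float], origin: Sequence[float],
               u_axis: Sequence[float], v_axis: Sequence[float],
               buttons: Sequence[Button]) -> Optional[str]:
    """Return the name of the button under the contact point, if any."""
    u, v = palm_coords(contact, origin, u_axis, v_axis)
    for b in buttons:
        if b.u_min <= u <= b.u_max and b.v_min <= v <= b.v_max:
            return b.name
    return None


# Two 3 cm square buttons side by side on the palm (assumed layout).
buttons = [Button("first", 0.0, 0.03, 0.0, 0.03),
           Button("second", 0.04, 0.07, 0.0, 0.03)]
print(hit_button((0.05, 0.01, 0.0), (0.0, 0.0, 0.0),
                 (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), buttons))
```

Because the button rectangles are expressed in palm coordinates, they move with the non-dominant hand without needing to be recomputed.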
Note that in other examples, the virtual object can take the form of different types of controls, such as menu items, toggles, icons, or any other type of user-actuatable control. In further examples, the controls can be rendered elsewhere on the user's body, such as along the forearm of the non-dominant hand.
By manipulating the virtual object 112 directly in the palm of the non-dominant hand 302, the manipulations are more accurate as the user has a reference plane on which to perform movements. Without such a reference plane, the user's dominant hand makes the movements in mid-air, which is much more difficult to control precisely. Haptic feedback is also provided as the user can feel the contact between the dominant and non-dominant hands.
In another example, rather than touching the palm with a fingertip, the user 100 can touch two fingertips together to activate a control. For example, the thumb of hand 302 can act as an activation digit, and whenever the thumb is touched to one of the other fingertips, the associated control is activated. For example, the user 100 can bring the fingertips of the thumb and first finger together to paste a virtual object into the palm of hand 302.
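The thumb-to-fingertip activation described above can be sketched as follows, as an illustration only. The sketch assumes that tracked 3D positions are available for the thumb tip and the other fingertips, and that contact is inferred when the two tips are within a small distance of each other; the function name, the 1 cm threshold and the "copy" control are hypothetical, while the association of the first finger with a paste action follows the example in the text.

```python
import math
from typing import Dict, Optional, Sequence


def activated_control(thumb_tip: Sequence[float],
                      fingertips: Dict[str, Sequence[float]],
                      controls: Dict[str, str],
                      threshold: float = 0.01) -> Optional[str]:
    """Return the control associated with the fingertip that the thumb is
    touching, or None if no fingertip is within `threshold` metres."""
    closest = min(fingertips,
                  key=lambda f: math.dist(thumb_tip, fingertips[f]),
                  default=None)
    if closest is None:
        return None
    if math.dist(thumb_tip, fingertips[closest]) <= threshold:
        return controls.get(closest)
    return None


fingertips = {"index": (0.005, 0.0, 0.0), "middle": (0.05, 0.0, 0.0)}
controls = {"index": "paste", "middle": "copy"}
print(activated_control((0.0, 0.0, 0.0), fingertips, controls))
```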
The above-described examples all provide haptic feedback to the user by using one object as an interaction proxy for interaction between another object and a virtual object (in the form of an object to be manipulated or a control). These examples can be used in isolation or combined in any way.
Reference is now made to
Firstly, the computing device 110 (or a processor within the computing device 110) generates and displays 600 the 3D augmented reality environment 102 that the user 100 is to interact with, in a similar manner to that described above. The augmented reality environment 102 can be any type of 3D scene with which the user can interact.
Depth images showing at least one of the user's hands are received 602 from depth camera 106 at the computing device 110. The depth images are then used by the computing device 110 to track 604 the position and pose of the hand of the user in six degrees-of-freedom (6DOF). In other words, the depth images are used to determine not only the position of the hand in three dimensions, but also its orientation in terms of pitch, yaw and roll.
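For illustration, a six degrees-of-freedom pose can be held as three position components and three orientation angles, as in the minimal sketch below. The present description does not define an angle convention, so the sketch assumes, purely for the example, that roll, pitch and yaw are rotations about the x, y and z axes respectively, composed as R = Rz(yaw) · Ry(pitch) · Rx(roll), with angles in radians; the Pose6DOF type is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Pose6DOF:
    x: float
    y: float
    z: float
    pitch: float  # rotation about y, radians (assumed convention)
    yaw: float    # rotation about z, radians (assumed convention)
    roll: float   # rotation about x, radians (assumed convention)

    def rotation_matrix(self) -> List[List[float]]:
        """Return the 3x3 matrix R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]


# A hand 40 cm from the camera, turned a quarter turn about the z axis.
hand = Pose6DOF(0.0, 0.0, 0.4, pitch=0.0, yaw=math.pi / 2, roll=0.0)
for row in hand.rotation_matrix():
    print(row)
```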
The pose of the hand in 6DOF is monitored 606 to detect a predefined gesture. For example, the pose of the hand can be compared to a library of predefined poses by the computing device 110, wherein each predefined pose corresponds to a gesture. If the pose of the hand is sufficiently close to a predefined pose in the library, then the corresponding gesture is detected. Upon detecting a given gesture, an associated interaction is triggered 608 between the hand of the user and a virtual object.
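The comparison of a hand pose against a library of predefined poses can be sketched as a nearest-neighbour search with a distance threshold, as shown below. The representation of a pose as a short feature vector, the Euclidean distance measure, the threshold value and the gesture names are all assumptions made for this illustration; the present description does not specify how closeness is measured.

```python
import math
from typing import Dict, Optional, Sequence


def detect_gesture(pose: Sequence[float],
                   library: Dict[str, Sequence[float]],
                   threshold: float) -> Optional[str]:
    """Return the gesture whose predefined pose is nearest to `pose`,
    provided it is within `threshold`; otherwise return None."""
    best_name: Optional[str] = None
    best_dist = float("inf")
    for name, reference in library.items():
        d = math.dist(pose, reference)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None


# Hypothetical three-element pose features for two gestures.
library = {"pinch": (0.0, 0.0, 1.0), "open": (1.0, 1.0, 1.0)}
print(detect_gesture((0.1, 0.0, 0.9), library, threshold=0.2))
```

The returned gesture name could then be used to look up and trigger the associated interaction between the hand and a virtual object.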
The detection of gestures enables rich, complex interactions to be used in the direct touch augmented reality environment. Examples of such interactions are illustrated with reference to
In further embodiments, a more “freeform” interaction technique can also be used, which does not utilize discrete gestures such as the pinch gesture illustrated with reference to
In the example of
A further example of a gesture-based interaction technique using the mechanism of
Reference is now made to
The optical beam-splitter 904 is positioned in the augmented reality system 900 so that, when viewed by the user 100, it reflects light from a display screen 906 and transmits light from the user-interaction region 902. The display screen 906 is arranged to display the augmented reality environment under the control of the computing device 110. Therefore, the user 100 looking at the surface of the optical beam-splitter 904 can see the reflection of the augmented reality environment displayed on the display screen 906, and also their hand 108 in the user-interaction region 902 at the same time. View-controlling materials, such as privacy film, can be used on the display screen 906 to prevent the user from seeing the original image directly on screen. Together, the display screen 906 and the optical beam-splitter 904 form the display device 104 referred to above.
The relative arrangement of the user-interaction region 902, optical beam-splitter 904, and display screen 906 therefore enables the user 100 to concurrently view both a reflection of a computer generated image (the augmented reality environment) from the display screen 906 and the hand 108 located in the user-interaction region 902. Therefore, by controlling the graphics displayed in the reflected augmented reality environment, the user's view of their own hand in the user-interaction region 902 can be augmented.
Note that in other examples, different types of display can be used. For example, a transparent OLED panel can be used, which can display the augmented reality environment, but is also transparent. Such an OLED panel enables the augmented reality system to be implemented without the use of an optical beam splitter.
The augmented reality system 900 also comprises the camera 106, which captures images in the user interaction region 902, to allow the tracking of the real objects, as described above. In order to further improve the spatial registration of the augmented reality environment with the user's hand 108, a further camera 908 can be used to track the face, head or eye position of the user 100. Using head or face tracking enables perspective correction to be performed, so that the graphics are accurately aligned with the real objects. The camera 908 shown in
This augmented reality system can utilize the interaction techniques described above to provide improved direct interaction between the user 100 and the virtual objects rendered in the augmented reality environment. The user's own hands (or other body parts or held objects) are visible through the optical beam splitter 904, and by visually aligning the augmented reality environment 102 and the user's hand 108 (using camera 908) it can appear to the user 100 that their real hands are directly manipulating the virtual objects. Virtual objects and controls can be rendered so that they appear superimposed on the user's hands and move with the hands, enabling the haptic feedback technique, and the camera 106 enables the pose of the hands to be tracked and gestures recognized.
Reference is now made to
Computing device 110 comprises one or more processors 1002 which may be microprocessors, controllers or any other suitable type of processor for processing computer executable instructions to control the operation of the device in order to implement the augmented reality direct interaction techniques.
The computing device 110 also comprises an input interface 1004 arranged to receive and process input from one or more devices, such as the camera 106. The computing device 110 further comprises an output interface 1006 arranged to output the augmented reality environment 102 to display device 104.
The computing device 110 also comprises a communication interface 1008, which can be arranged to communicate with one or more communication networks. For example, the communication interface 1008 can connect the computing device 110 to a network (e.g. the internet). The communication interface 1008 can enable the computing device 110 to communicate with other network elements to store and retrieve data.
Computer-executable instructions and data storage can be provided using any computer-readable media that is accessible by computing device 110. Computer-readable media may include, for example, computer storage media such as memory 1010 and communications media. Computer storage media, such as memory 1010, includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transport mechanism. Although the computer storage media (such as memory 1010) is shown within the computing device 110 it will be appreciated that the storage may be distributed or located remotely and accessed via a network or other communication link (e.g. using communication interface 1008).
Platform software comprising an operating system 1012 or any other suitable platform software may be provided at the memory 1010 of the computing device 110 to enable application software 1014 to be executed on the device. The memory 1010 can store executable instructions to implement the functionality of a 3D augmented reality environment rendering engine 1016, object tracking engine 1018, haptic feedback engine 1020 (arranged to trigger interaction when body parts are in contact), and gesture recognition engine 1022 (arranged to use the depth images to recognize gestures), as described above, when executed on the processor 1002. The memory 1010 can also provide a data store 1024, which can be used to provide storage for data used by the processor 1002 when controlling the interaction in the 3D augmented reality environment.
The term ‘computer’ is used herein to refer to any device with processing capability such that it can execute instructions. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the term ‘computer’ includes PCs, servers, mobile telephones, personal digital assistants and many other devices.
The methods described herein may be performed by software in machine readable form on a tangible storage medium e.g. in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer and where the computer program may be embodied on a computer readable medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory etc and do not include propagated signals. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
This acknowledges that software can be a valuable, separately tradable commodity. It is intended to encompass software, which runs on or controls “dumb” or standard hardware, to carry out the desired functions. It is also intended to encompass software which “describes” or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.
Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that, by utilizing conventional techniques known to those skilled in the art, all or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.
Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.
It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to ‘an’ item refers to one or more of those items.
The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.
The term ‘comprising’ is used herein to mean including the method blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and a method or apparatus may contain additional blocks or elements.
It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments of the invention. Although various embodiments of the invention have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.