Traditionally, user interaction with a computer has been by way of a keyboard and mouse. Tablet PCs have been developed which enable user input using a stylus, and touch sensitive screens have also been produced to enable a user to interact more directly by touching the screen (e.g. to press a soft button). However, the use of a stylus or touch screen has generally been limited to detection of a single touch point at any one time.
Recently, surface computers have been developed which enable a user to interact directly with digital content displayed on the computer using multiple fingers. Such a multi-touch input on the display of a computer provides a user with an intuitive user interface. An approach to multi-touch detection is to use a camera either above or below the display surface and to use computer vision algorithms to process the captured images.
Multi-touch capable interactive surfaces are a prospective platform for direct manipulation of 3D virtual worlds. The ability to sense multiple fingertips at once enables an extension of the degrees-of-freedom available for object manipulation. For example, while a single finger could be used to directly control the 2D position of an object, the position and relative motion of two or more fingers can be heuristically interpreted in order to determine the height (or other properties) of the object in relation to a virtual floor. However, techniques such as this can be cumbersome and complicated for the user to learn and perform accurately, as the mapping between finger movement and the object is an indirect one.
The embodiments described below are not limited to implementations which solve any or all of the disadvantages of known surface computing devices.
The following presents a simplified summary of the disclosure in order to provide a basic understanding to the reader. This summary is not an extensive overview of the disclosure and it does not identify key/critical elements of the invention or delineate the scope of the invention. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
Surface computer user interaction is described. In an embodiment, an image of a user's hand interacting with a user interface displayed on a surface layer of a surface computing device is captured. The image is used to render a corresponding representation of the hand. The representation is displayed in the user interface such that the representation is geometrically aligned with the user's hand. In embodiments, the representation is a representation of a shadow or a reflection. The process is performed in real-time, such that movement of the hand causes the representation to correspondingly move. In some embodiments, a separation distance between the hand and the surface is determined and used to control the display of an object rendered in a 3D environment on the surface layer. In some embodiments, at least one parameter relating to the appearance of the object is modified in dependence on the separation distance.
Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein:
Like reference numerals are used to designate like parts in the accompanying drawings.
The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present example may be constructed or utilized. The description sets forth the functions of the example and the sequence of steps for constructing and operating the example. However, the same or equivalent functions and sequences may be accomplished by different examples.
Although the present examples are described and illustrated herein as being implemented in a surface computing system, the system described is provided as an example and not a limitation. As those skilled in the art will appreciate, the present examples are suitable for application in a variety of different types of touch-based computing systems.
The term ‘surface computing device’ is used herein to refer to a computing device which comprises a surface which is used both to display a graphical user interface and to detect input to the computing device. The surface can be planar or can be non-planar (e.g. curved or spherical) and can be rigid or flexible. The input to the surface computing device can, for example, be through a user touching the surface or through use of an object (e.g. object detection or stylus input). Any touch detection or object detection technique used can enable detection of single contact points or can enable multi-touch input. Also note that, whilst in the following description the example of a horizontal surface is used, the surface can be in any orientation. Therefore, a reference to a ‘height above’ a horizontal surface (or similar) refers to a substantially perpendicular separation distance from the surface.
The surface computing device 100 comprises a surface layer 101. The surface layer 101 can, for example, be embedded horizontally in a table. In the example of
The surface computing device 100 further comprises a display device 105, an image capture device 106, and a touch detection device 107. The surface computing device 100 also comprises one or more light sources 108 (or illuminants) arranged to illuminate objects above the surface layer 101.
In this example, the display device 105 comprises a projector. The projector can be any suitable type of projector, such as an LCD, liquid crystal on silicon (LCOS), Digital Light Processing (DLP) or laser projector. In addition, the projector can be fixed or steerable. Note that, in some examples, the projector can also act as the light source for illuminating objects above the surface layer 101 (in which case the light sources 108 can be omitted).
The image capture device 106 comprises a camera or other optical sensor (or array of sensors). The type of light source 108 corresponds to the type of image capture device 106. For example, if the image capture device 106 is an IR camera (or a camera with an IR-pass filter), then the light sources 108 are IR light sources. Alternatively, if the image capture device 106 is a visible light camera, then the light sources 108 are visible light sources.
Similarly, in this example, the touch detection device 107 comprises a camera or other optical sensor (or array of sensors). The type of touch detection device 107 corresponds with the edge-illumination of the transparent pane 103. For example, if the transparent pane 103 is edge-lit with one or more IR LEDs, then the touch detection device 107 comprises an IR camera, or a camera with an IR-pass filter.
In the example shown in
In use, the surface computing device 100 operates in one of two modes: a ‘projection mode’ when the switchable diffuser 102 is in its diffuse state and an ‘image capture mode’ when the switchable diffuser 102 is in its transparent state. If the switchable diffuser 102 is switched between states at a rate which exceeds the threshold for flicker perception, anyone viewing the surface computing device sees a stable digital image projected on the surface.
The terms ‘diffuse state’ and ‘transparent state’ refer to the surface being substantially diffusing and substantially transparent, with the diffusivity of the surface being substantially higher in the diffuse state than in the transparent state. Note that in the transparent state the surface is not necessarily totally transparent and in the diffuse state the surface is not necessarily totally diffuse. Furthermore, in some examples, only an area of the surface can be switched (or can be switchable).
With the switchable diffuser 102 in its diffuse state, the display device 105 projects a digital image onto the surface layer 101. This digital image can comprise a graphical user interface (GUI) for the surface computing device 100 or any other digital image.
When the switchable diffuser 102 is switched into its transparent state, an image can be captured through the surface layer 101 by the image capture device 106. For example, an image of a user's hand 109 can be captured, even when the hand 109 is at a height ‘h’ above the surface layer 101. The light sources 108 illuminate objects (such as the hand 109) above the surface layer 101 when the switchable diffuser 102 is in its transparent state, so that the image can be captured. The captured image can be utilized to enhance user interaction with the surface computing device, as outlined in more detail hereinafter. The switching process can be repeated at a rate greater than the human flicker perception threshold.
In either the transparent or diffuse state, when a finger is pressed against the top surface of the transparent pane 103, it causes the light undergoing total internal reflection (TIR) within the pane to be scattered. The scattered light passes through the rear surface of the transparent pane 103 and can be detected by the touch detection device 107 located behind the transparent pane 103. This process is known as frustrated total internal reflection (FTIR). The detection of the scattered light by the touch detection device 107 enables touch events on the surface layer 101 to be detected and processed using computer-vision techniques, so that a user can interact with the surface computing device. Note that in alternative examples, the image capture device 106 can be used to detect touch events, and the touch detection device 107 omitted.
The surface computing device 100 described with reference to
Referring to
During the time instances when the switchable diffuser 102 is in the transparent state, the image capture device 106 is used to capture 201 images through the surface layer 101. These images can show one or more hands of one or more users above the surface layer 101. Note that fingers, hands or other objects that are in contact with the surface layer can be detected by the FTIR process and the touch detection device 107, which enables discrimination between objects touching the surface, and those above the surface.
The captured images can be analyzed using computer vision techniques to determine the position 202 of the user's hand (or hands). A copy of the raw captured image can be converted to a black and white image using a pixel value threshold to determine which pixels are black and which are white. A connected component analysis can then be performed on the black and white image. The result of the connected component analysis is that connected areas that contain reflective objects (i.e. connected white blocks) are labeled as foreground objects. In this example, the foreground object is the hand of a user.
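The following is a minimal sketch of this thresholding and connected component step, assuming the captured frame is available as a grayscale array and that OpenCV and NumPy are used; the threshold and minimum-area values are illustrative rather than taken from the description.

```python
import cv2
import numpy as np

def find_hand_component(raw_gray, threshold=60, min_area=2000):
    """Threshold a raw captured frame and return the mask and centroid of the
    largest bright connected component, assumed here to be the user's hand."""
    # Convert the raw image to black and white using a pixel-value threshold.
    _, bw = cv2.threshold(raw_gray, threshold, 255, cv2.THRESH_BINARY)

    # Label connected white regions (8-connectivity by default); each labeled
    # region is a candidate foreground object.
    num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(bw)

    # Pick the largest non-background component above a minimum area.
    best_label, best_area = None, min_area
    for label in range(1, num_labels):            # label 0 is the background
        area = stats[label, cv2.CC_STAT_AREA]
        if area > best_area:
            best_label, best_area = label, area

    if best_label is None:
        return None, None                         # no hand visible in this frame

    hand_mask = (labels == best_label).astype(np.uint8) * 255
    cx, cy = centroids[best_label]                # planar (x, y) position of the hand
    return hand_mask, (cx, cy)
```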
The planar location of the hand relative to the surface layer 101 (i.e. the x and y coordinates of the hand in the plane parallel to the surface layer 101) can be determined simply from the location of the hand in the image. In order to estimate the height of the hand above the surface layer (i.e. the hand's z-coordinate, or the separation distance between the hand and the surface layer), several different techniques can be used.
In a first example, a combination of the black and white image and the raw captured image can be used to estimate the hand's height above the surface layer 101. The location of the ‘center of mass’ of the hand is found by determining the central point of the white connected component in the black and white image. The location of the center of mass is then recorded, and the equivalent location in the raw captured image is analyzed. The average pixel intensity (e.g. the average grey-level value if the original raw image is a grayscale image) is determined for a predetermined region around the center of mass location. The average pixel intensity can then be used to estimate the height of the hand above the surface. The pixel intensity that would be expected for a certain distance from the light sources 108 can be estimated, and this information can be used to calculate the height of the hand.
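As a rough illustration of this first example, the sketch below averages the pixel intensity in a small window around the centre of mass and inverts an assumed exponential intensity falloff to obtain a height estimate; the calibration constants and the falloff model are assumptions and would need to be measured for a real device.

```python
import numpy as np

def estimate_height_from_intensity(raw_gray, center, region=15,
                                   touch_intensity=220.0, falloff_per_mm=0.01):
    """Estimate hand height from the average pixel intensity around the centre
    of mass. touch_intensity is the assumed mean intensity when the hand rests
    on the surface; falloff_per_mm models how intensity decays with distance
    from the illuminants."""
    cx, cy = int(center[0]), int(center[1])
    # Average intensity in a small window around the centre of mass.
    patch = raw_gray[max(cy - region, 0):cy + region,
                     max(cx - region, 0):cx + region]
    mean_intensity = float(np.mean(patch))

    # Brighter means closer to the light sources, and hence to the surface.
    # Invert the assumed exponential falloff to recover an approximate height.
    ratio = max(mean_intensity, 1.0) / touch_intensity
    height_mm = max(0.0, -np.log(ratio) / falloff_per_mm)
    return height_mm
```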
In a second example, the image capture device 106 can be a 3D camera capable of determining depth information for the captured image. This can be achieved by using a 3D time-of-flight camera, which determines depth information along with the captured image and can use any suitable technology for doing so, such as optical, ultrasonic, radio or acoustic signals. Alternatively, a stereo camera or a pair of cameras can be used for the image capture device 106, capturing the image from different angles and allowing depth information to be calculated. In either case, the image captured during the switchable diffuser's transparent state enables the height of the hand above the surface layer to be determined.
In a third example, a structured light pattern can be projected onto the user's hand when the image is captured. If a known light pattern is used, then the distortion of the light pattern in the captured image can be used to calculate the height of the user's hand. The light pattern can, for example, be in the form of a grid or checkerboard pattern. The structured light pattern can be provided by the light source 108, or alternatively by the display device 105 in the case that a projector is used.
In a fourth example, the size of the user's hand can be used to determine the separation between the user's hand and the surface layer. This can be achieved by the surface computing device detecting a touch event by the user (using the touch detection device 107), which therefore indicates that the user's hand is (at least partly) in contact with the surface layer. Responsive to this, an image of the user's hand is captured. From this image, the size of the hand can be determined. The size of the user's hand can then be compared to subsequent captured images to determine the separation between the hand and the surface layer, as the hand appears smaller the further from the surface layer it is.
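A sketch of this fourth example is given below. It assumes a simple pinhole-camera model in which the hand's apparent area scales with the inverse square of its distance from the camera, and it uses the area recorded at the moment of the touch event as the calibration reference; the surface-to-camera distance is an assumed constant.

```python
import numpy as np

def height_from_hand_size(area_now, area_at_touch, surface_to_camera_mm=400.0):
    """Estimate the hand's height above the surface from its apparent size.

    area_at_touch is the hand's pixel area recorded when a touch event was
    detected (i.e. when the hand was in contact with the surface layer)."""
    if area_now <= 0:
        return None
    # Under the pinhole assumption, distance scales with sqrt(area_at_touch / area_now).
    distance_now = surface_to_camera_mm * np.sqrt(area_at_touch / area_now)
    # The height above the surface is the extra distance beyond the touch distance.
    return max(0.0, distance_now - surface_to_camera_mm)
```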
In addition to determining the height and location of the user's hand, the surface computing device is also arranged to use the images captured by the image capture device 106 to detect 203 selection of an object by the user for 3D manipulation. The surface computing device is arranged to detect a particular gesture by the user that indicates that an object is to be manipulated in 3D (e.g. in the z-direction). An example of such a gesture is the detection of a ‘pinch’ gesture.
Whenever the thumb and index finger of one hand approach each other and ultimately make contact, a small, roughly elliptical area of the background is enclosed by the hand. This leads to the creation of a small, new connected component in the image, which can be detected using connected component analysis. This morphological change in the image can be interpreted as the trigger for a ‘pick-up’ event in the 3D environment. For example, the appearance of a new, small connected component within the area of a previously detected, larger component triggers a pick-up of an object in the 3D environment located at the position of the user's hand (i.e. at the point of the pinch gesture). Similarly, the disappearance of the new connected component triggers a drop-off event.
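One way to detect this morphological change, sketched below under the assumption that OpenCV 4 is used, is to look for a small child contour (a hole) appearing inside the hand silhouette produced earlier; the hole-area bounds are illustrative.

```python
import cv2

def detect_pinch(hand_mask, min_hole_area=80, max_hole_area=4000):
    """Detect a pinch as the appearance of a small hole in the hand mask.

    When thumb and index finger touch, they enclose a small region of background
    inside the bright hand silhouette; that region shows up as a child contour
    in a two-level contour hierarchy. Returns the (x, y) centre of the hole if a
    pinch is detected, otherwise None."""
    contours, hierarchy = cv2.findContours(hand_mask, cv2.RETR_CCOMP,
                                           cv2.CHAIN_APPROX_SIMPLE)
    if hierarchy is None:
        return None
    for i, contour in enumerate(contours):
        # hierarchy[0][i][3] is the index of the parent contour; holes have one.
        if hierarchy[0][i][3] != -1:
            area = cv2.contourArea(contour)
            if min_hole_area <= area <= max_hole_area:
                m = cv2.moments(contour)
                if m["m00"] > 0:
                    return (m["m10"] / m["m00"], m["m01"] / m["m00"])
    return None
```

The transition of this function's result from None to a position can then be treated as the pick-up trigger, and the reverse transition as the drop-off trigger.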
In alternative examples, different gestures can be detected and used to trigger 3D manipulation events. For example, a grab or scoop gesture of the user's hand can be detected.
Note that the surface computing device is arranged to periodically detect gestures and to determine the height and location of the user's hand, and these operations are not necessarily performed in sequence, but can be performed concurrently or in any order.
When a gesture is detected and triggers a 3D manipulation event for a particular object in the 3D environment, the position of the object is updated 204 in accordance with the position of the hand above the surface layer. The height of the object in the 3D environment can be controlled directly, such that the separation between the user's hand and the surface layer 101 is directly mapped to the height of the virtual object from a virtual ground plane. As the user's hand is moved above the surface layer, so the picked-up object correspondingly moves. Objects can be dropped off at a different location when users let go of the detected gesture.
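A minimal sketch of this direct mapping is shown below; the maximum hand height and the virtual height range are illustrative constants, and a linear mapping is assumed.

```python
def update_picked_object(object_pos, hand_xy, hand_height_mm,
                         max_hand_height_mm=300.0, max_virtual_height=10.0):
    """Directly map the tracked hand onto the picked-up object's 3D position.

    The planar hand position drives the object's x and y coordinates, and the
    measured separation from the surface is linearly mapped onto the object's
    height above the virtual ground plane."""
    x, y = hand_xy
    clamped = min(hand_height_mm, max_hand_height_mm)
    z = (clamped / max_hand_height_mm) * max_virtual_height
    object_pos[0], object_pos[1], object_pos[2] = x, y, z
    return object_pos
```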
This technique enables the intuitive operation of interactions with 3D objects on surface computing devices that were difficult or impossible to perform when only touch-based interactions could be detected. For example, users can stack objects on top of each other in order to organize and store digital information. Objects can also be put into other virtual objects for storage. For example, a virtual three-dimensional card box can hold digital documents which can be moved in and out of this container by this technique.
Other, more complex interactions can be performed, such as the assembly of complex 3D models from constituent parts, e.g. with applications in the architectural domain. The behavior of the virtual objects can also be augmented with a gaming physics simulation, for example to enable interactions such as folding soft, paper-like objects or leafing through the pages of a book in a manner more akin to the way users perform these actions in the real world. The technique can also be used to control objects in a game, such as a 3D maze in which the player moves a game piece from the starting position at the bottom of the level to the target position at the top of the level. Furthermore, medical applications can be enriched by this technique, as volumetric data can be positioned, oriented and/or modified in a manner similar to interactions with the real body.
Furthermore, in traditional GUIs, fine control of object layering often involves dedicated, and frequently abstract, UI elements such as a layer palette (e.g. Adobe™ Photoshop™) or context menu elements (e.g. Microsoft™ PowerPoint™). The above-described technique allows for more literal layering control. Objects representing documents or photographs can be stacked on top of each other in piles and selectively removed as desired.
However, when interacting with virtual objects using the above-described technique, a cognitive disconnect on the part of the user can occur because the image of the object shown on the surface layer 101 is two-dimensional. Once the user lifts his hand off the surface layer 101, the object under control is no longer in direct contact with the hand, which can disorient the user and give rise to an additional cognitive load, especially when fine-grained control over the object's position and height is required for the task at hand. To counteract this, one or more of the rendering techniques described below can be used to compensate for the cognitive disconnect and provide the user with the perception of direct interaction with the 3D environment on the surface computing device.
Firstly, to address the cognitive disconnect, a rendering technique is used to increase the perceived connection between the user's hand and the virtual object. This is achieved by using the image of the user's hand (captured by the image capture device 106 as discussed above) to render 205 a representation of the user's hand in the 3D environment. The representation is geometrically aligned with the user's real hand, so that the user immediately associates his own hand with the representation. By rendering a representation of the hand in the 3D environment, the user does not perceive a disconnection, despite the hand being above, and not in contact with, the surface layer 101. The presence of a representation of the hand also enables the user to position his hands more accurately when they are being moved above the surface layer 101.
In one example, the representation of the user's hand takes the form of a shadow of the hand. This is a natural and instantly understood representation, and the user immediately connects it with the impression that the surface computing device is brightly lit from above. This is illustrated in
The shadow representations can be rendered by using the captured image of the user's hand discussed above. As stated above, the black and white image that is generated contains the image of the user's hand in white (as the foreground connected component). The image can be inverted, such that the hand is now shown in black, and the background in white. The background can then be made transparent to leave the black ‘silhouette’ of the user's hand.
The image comprising the user's hand can be inserted into the 3D scene in every frame (and updated as new images are captured). Preferably, the image is inserted into the 3D scene before lighting calculations are performed in the 3D environment, such that within the lighting calculation the image of the user's hand casts a virtual shadow into the 3D scene that is correctly aligned with the objects present. Because the representations are generated from the captured image of the user's hand, they accurately reflect the geometric position of the user's hand above the surface layer, i.e. they are aligned with the planar position of the user's hand at the time instant the image was captured. The generation of the shadow representation is preferably performed on a graphics processing unit (GPU). The shadow rendering is performed in real-time, in order to provide the perception that it is the user's real hands that are casting the virtual shadow, and so that the shadow representations move in unison with the user's hands.
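A CPU-side sketch of building such a silhouette texture from the hand mask is given below, assuming NumPy; it also folds in the optional height-dependent dimming described in the next paragraph, with an illustrative maximum height.

```python
import numpy as np

def shadow_silhouette_rgba(hand_mask, hand_height_mm=0.0, max_height_mm=300.0):
    """Turn the white-on-black hand mask into a black silhouette on a fully
    transparent background, suitable for compositing into the 3D scene as a
    shadow texture before the lighting pass."""
    h, w = hand_mask.shape
    rgba = np.zeros((h, w, 4), dtype=np.uint8)        # RGB channels stay black
    # Shadow opacity falls off linearly as the hand moves away from the surface.
    opacity = int(255 * max(0.0, 1.0 - hand_height_mm / max_height_mm))
    rgba[..., 3] = np.where(hand_mask > 0, opacity, 0)  # alpha only on the silhouette
    return rgba
```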
The rendering of the shadow representation can also optionally utilize the determined separation between the user's hand and the surface layer. For example, the shadows can be rendered so that they become more transparent or dim as the height of the user's hands above the surface layer increases. This is illustrated in
In an alternative example, instead of rendering representations of a shadow of the user's hand, representations of a reflection of the user's hand can be rendered. In this example, the user has the perception that he is able to see a reflection of his hands on the surface layer. This is therefore another instantly understood representation. The process for rendering a reflection representation is similar to that of the shadow representation. However, in order to be able to provide a color reflection, the light sources 108 produce visible light, and the image capture device 106 captures a color image of the user's hand above the surface layer. A similar connected component analysis is performed to locate the user's hand in the captured image, and the located hand can then be extracted from the color captured image and rendered on the display beneath the user's hand.
In a further alternative example, the rendered representation can be in the form of a 3D model of a hand in the 3D environment. The captured image of the user's hand can be analyzed using computer vision techniques, such that the orientation of the hand (e.g. in terms of pitch, yaw and roll) is determined and the positions of the digits are analyzed. A 3D model of a hand can then be generated to match this orientation and provided with matching digit positions. The 3D hand model can be constructed from geometric primitives that are animated based on the movement of the user's limbs and joints. In this way, a virtual representation of the user's hand can be introduced into the 3D scene and is able to directly interact with the other virtual objects in the 3D environment. Because such a 3D hand model exists within the 3D environment (as opposed to being rendered on it), the user can interact more directly with the objects, for example by controlling the 3D hand model to exert forces onto the sides of an object and hence pick it up through simple grasping.
In a yet further example, as an alternative to generating a 3D articulated hand model, a particle system-based approach can be used. In this example, instead of tracking the user's hand to generate the representation, only the available height estimation is used to generate the representation. For example, for each pixel in the camera image a particle can be introduced into the 3D scene. The height of the individual particles introduced into the 3D scene can be related to the pixel brightness in the image (as described hereinabove)—e.g. very bright pixels are close to the surface layer and darker pixels are further away. The particles combine in the 3D environment to give a 3D representation of the surface of the user's hand. Such an approach enables users to scoop objects up. For example, one hand can be positioned onto the surface layer (palm up) and the other hand can then be used to push objects onto the palm. Objects already residing on the palm can be dropped off by simply tilting the palm so that virtual objects slide off.
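The sketch below illustrates one way such a particle set could be built from the captured image, assuming NumPy, the hand mask from earlier, and the brightness-to-height relation described above; the subsampling stride and height scaling are illustrative.

```python
import numpy as np

def particles_from_image(raw_gray, hand_mask, stride=4, max_particle_height=10.0):
    """Generate one particle per (subsampled) hand pixel, with the particle
    height driven by pixel brightness: bright pixels are treated as close to
    the surface layer, darker pixels as further above it. Returns an (N, 3)
    array of x, y, z positions."""
    ys, xs = np.nonzero(hand_mask[::stride, ::stride])
    ys, xs = ys * stride, xs * stride                 # back to full-image coordinates
    brightness = raw_gray[ys, xs].astype(np.float32) / 255.0
    # Brighter pixel -> closer to the surface -> lower particle height.
    z = (1.0 - brightness) * max_particle_height
    return np.stack([xs.astype(np.float32), ys.astype(np.float32), z], axis=1)
```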
The generation and rendering of representations of the user's hand or hands in the 3D environment therefore gives the user an increased connection to objects that are manipulated when the user's hands are not in contact with the surface computing device. In addition, the rendering of such representations improves user interaction accuracy and usability in applications where the user does not manipulate objects from above the surface layer. The visibility of a representation that the user immediately recognizes aids the user in visualizing how to interact with a surface computing device.
Referring again to
The processing of the 3D environment is arranged such that a virtual light source is situated above the surface layer. A shadow is then calculated and rendered for the object using the virtual light source, such that the distance between object and shadow is proportional to the height of the object. Objects on the virtual floor are in contact with their shadow, and the further away an object is from the virtual floor the greater the distance to its own shadow.
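As a simple illustration of this relationship, the sketch below computes a planar displacement and a blur radius for an object's drop shadow from its height, assuming a slightly oblique virtual light so that the shadow visibly separates from the object as it rises; the constants are illustrative.

```python
def object_shadow_offset(object_height, light_height=20.0, spread=1.5):
    """Compute how an object's drop shadow behaves on the virtual floor.

    The shadow's displacement grows in proportion to the object's height above
    the virtual ground plane, so an object resting on the floor touches its
    shadow, while a raised object pulls away from it. The shadow is also
    softened (blurred) with height for a more natural appearance."""
    offset = spread * object_height                   # planar displacement
    softness = 1.0 + object_height / light_height     # blur radius grows with height
    return offset, softness
```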
The rendering of object shadows is illustrated in
Preferably, the object shadow calculation is performed entirely on the GPU so that realistic shadows, including self-shadowing and shadows cast onto other virtual objects, are computed in real-time. The rendering of object shadows conveys an improved depth perception to the users, and allows users to understand when objects are on-top of or above other objects. The object shadow rendering can be combined with hand shadow rendering, as described above.
The techniques described above with reference to
Referring once more to
With reference to
Therefore, the result of this technique is that objects that move away from the virtual ground are gradually de-saturated, starting from the topmost point. When the object reaches the highest possible position, it is rendered solid black. Conversely, when the object is lowered back down the effect is inverted, such that the object regains its original color or texture.
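A CPU-side sketch of this effect is given below; in practice it would typically run in a shader. The object's height is normalised to a 'raise fraction' between 0 and 1, each vertex has a normalised vertical position within the object, and a darkening front sweeps from the top of the object to its bottom as it is raised, blending the colour toward black. The blending band width is an illustrative assumption.

```python
def desaturate_with_height(base_rgb, vertical_pos, raise_fraction, band=0.1):
    """Darken a vertex colour as the object is raised, starting from the top.

    vertical_pos: 0 at the object's lowest point, 1 at its topmost point.
    raise_fraction: 0 when the object rests on the virtual floor, 1 at its
    maximum allowed height, where the whole object is rendered solid black."""
    # The darkening front sweeps from just above the top (object at rest) down
    # to just below the bottom (object at maximum height).
    front = 1.0 - raise_fraction * (1.0 + band)
    darkness = min(1.0, max(0.0, (vertical_pos - front) / band))
    return tuple(c * (1.0 - darkness) for c in base_rgb)
```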
This is illustrated in
With reference to
Therefore, the result of this technique is that, with increasing height, objects change from being opaque to being completely transparent. The raised object is cut off at the predetermined height threshold. Once the entire object is higher than the threshold, only the shadow of the object is rendered.
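A sketch of a per-fragment alpha computation for this fade-to-transparent technique is shown below; the height threshold and the linear fade are illustrative assumptions.

```python
def fade_to_transparent_alpha(object_height, fragment_height, height_threshold=8.0):
    """Return the alpha for a fragment of a raised object.

    Opacity falls linearly from 1 (object resting on the virtual floor) to 0 as
    the object approaches the height threshold, and any fragment that has been
    raised above the threshold is cut off entirely (fully transparent)."""
    if fragment_height >= height_threshold:
        return 0.0                                    # cut off above the threshold
    alpha = 1.0 - object_height / height_threshold
    return min(1.0, max(0.0, alpha))
```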
This is illustrated in
With reference to
Therefore, the result of this technique is that the object gradually disappears as it is raised (and gradually re-appears as it is lowered). Once the object is raised sufficiently high above the virtual ground, it completely disappears and only the shadow remains (as illustrated in
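The description does not specify how the dissolve is implemented; a common approach, sketched below with NumPy, compares a fixed pseudo-random pattern against the object's normalised height so that progressively more pixels are discarded as the object rises. The constants and the noise-based scheme are assumptions for illustration only.

```python
import numpy as np

def dissolve_alpha_mask(shape, object_height, max_height=10.0, seed=0):
    """Per-pixel visibility mask for a height-driven dissolve.

    Returns an array of 1s (pixel kept) and 0s (pixel dissolved). Using a fixed
    seed keeps the dissolve pattern stable from frame to frame, so the object
    appears to break up gradually as it is raised and re-form as it is lowered."""
    rng = np.random.default_rng(seed)
    noise = rng.random(shape, dtype=np.float32)
    visibility = 1.0 - min(1.0, object_height / max_height)
    return (noise < visibility).astype(np.float32)
```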
The “dissolve” technique is illustrated in
A variation of the “fade-to-transparent” and “dissolve” techniques is to retain a representation of the object as it becomes less opaque, so that the object does not completely disappear from the surface layer. An example of this is to convert the object to a wireframe version of its shape as it is raised and disappears from the display on the surface layer. This is illustrated in
The techniques described above with reference to
A further enhancement that can be used to increase the user's connection to the objects being manipulated in the 3D environment is to strengthen the impression that the user is holding the object in their hand. In other words, the user perceives that the object has left the surface layer 101 (e.g. due to dissolving or fading to transparent) and is now in the user's raised hand. This can be achieved by controlling the display device 105 to project an image onto the user's hand when the switchable diffuser 102 is in the transparent state. For example, if the user has selected and lifted a red block by raising his hand above the surface layer 101, then the display device 105 can project red light onto the user's raised hand. The user can therefore see the red light on his hand, which assists the user in associating his hand with holding the object.
As stated hereinabove, the 3D environment interaction and control techniques described with reference to
Reference is first made to
Reference is now made to
Reference is next made to
Computing-based device 1300 comprises one or more processors 1301, which can be microprocessors, controllers, GPUs or any other suitable type of processors for processing computer-executable instructions to control the operation of the device in order to perform the techniques described herein. Platform software comprising an operating system 1302 or any other suitable platform software can be provided at the computing-based device 1300 to enable application software 1303-1313 to be executed on the device.
The application software can comprise one or more of:
The computer executable instructions can be provided using any computer-readable media, such as memory 1314. The memory is of any suitable type such as random access memory (RAM), a disk storage device of any type such as a magnetic or optical storage device, a hard disk drive, or a CD, DVD or other disc drive. Flash memory, EPROM or EEPROM can also be used.
The computing-based device 1300 comprises at least one image capture device 106, at least one light source 108, at least one display device 105 and a surface layer 101. The computing-based device 1300 also comprises one or more inputs 1315 which are of any suitable type for receiving media content, Internet Protocol (IP) input or other data.
The term ‘computer’ is used herein to refer to any device with processing capability such that it can execute instructions. Those skilled in the art will realize that such processing capabilities are incorporated into many different devices and therefore the term ‘computer’ includes PCs, servers, mobile telephones, personal digital assistants and many other devices.
The methods described herein may be performed by software in machine readable form on a tangible storage medium. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
This acknowledges that software can be a valuable, separately tradable commodity. It is intended to encompass software, which runs on or controls “dumb” or standard hardware, to carry out the desired functions. It is also intended to encompass software which “describes” or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.
Those skilled in the art will realize that storage devices utilized to store program instructions can be distributed across a network. For example, a remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software as needed, or execute some software instructions at the local terminal and some at the remote computer (or computer network). Those skilled in the art will also realize that by utilizing conventional techniques known to those skilled in the art that all, or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.
Any range or device value given herein may be extended or altered without losing the effect sought, as will be apparent to the skilled person.
It will be understood that the benefits and advantages described above may relate to one embodiment or may relate to several embodiments. The embodiments are not limited to those that solve any or all of the stated problems or those that have any or all of the stated benefits and advantages. It will further be understood that reference to ‘an’ item refers to one or more of those items.
The steps of the methods described herein may be carried out in any suitable order, or simultaneously where appropriate. Additionally, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.
The term ‘comprising’ is used herein to mean including the method blocks or elements identified, but that such blocks or elements do not comprise an exclusive list and a method or apparatus may contain additional blocks or elements.
It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments of the invention. Although various embodiments of the invention have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.