Natural user input (NUI) is increasingly used as a method of interacting with a computing system. NUI may comprise gestural input and/or voice commands, for example. In some approaches, gestures may be used to control aspects of an application running on a computing system. Such gestures may be detected by various sensing techniques, such as image sensing, motion sensing, and touch sensing.
Embodiments are disclosed that relate to detecting two hand natural user inputs. For example, one disclosed embodiment provides a method comprising receiving, from a sensor system, first hand tracking data regarding a first hand of a user and second hand tracking data regarding a second hand of the user, wherein the first hand tracking data and the second hand tracking data temporally overlap. A gesture is then detected based on the first hand tracking data and the second hand tracking data, and one or more aspects of a computing device are controlled based on the detected gesture.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
Environment 100 includes a computing system 102 to which a display device 104 and a sensor system 106 are operatively coupled. In some embodiments, computing system 102 may be a videogame console or a multimedia device configured to facilitate consumption of multimedia (e.g., music, video, etc.). In other embodiments, computing system 102 may be a general-purpose computing device, or may take any other suitable form. Example hardware which may be included in computing system 102 is described below with reference to
Computing system 102 is configured to accept various forms of user input from one or more users 108. As such, traditional user-input devices such as a keyboard, mouse, touch screen, gamepad, or joystick controller (not shown in the drawings) may be operatively coupled to computing system 102. Computing system 102 is also configured to accept natural user input (NUI) from at least one user. NUI may comprise gesture input and/or vocal input from user 108, for example. In the illustrated example, user 108 is providing NUI in the form of hand gestures to computing system 102, thereby affecting aspects of a game application 107 running on the computing system.
Display device 104 is configured to output visual content received from computing system 102, and also may output audio content. Display device 104 may be any suitable type of display, including but not limited to a liquid-crystal display (LCD), organic light-emitting diode (OLED) display, cathode ray tube (CRT) television, etc. While shown in the depicted example as a large-format display, display 104 may assume other sizes, and may comprise two or more displays. Other types of display devices, including projectors and mobile device displays, are also contemplated.
Sensor system 106 facilitates reception of NUI by tracking one or more users 108. Sensor system 106 may utilize a plurality of tracking technologies, including but not limited to time-resolved stereoscopy, structured light, and time-of-flight depth measurement. In the depicted example, sensor system 106 includes at least one depth camera 110 configured to output depth maps having a plurality of pixels. Each pixel in a depth map includes a depth value encoding the distance from the depth camera to a surface (e.g., user 108) imaged by that pixel. In other embodiments, depth camera 110 may output point cloud data having a plurality of points each defined by a three-dimensional position including a depth value indicating the distance from the depth camera to the surface represented by that point. While shown as a device separate from computing system 102 and placed atop display device 104, sensor system 106 may be integrated with the computing system, and/or placed in any other suitable location.
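For purposes of illustration only, the following Python sketch shows one way a depth map might be converted to point cloud data of the kind described above, assuming a simple pinhole camera model; the function name and intrinsic parameter values are illustrative assumptions rather than characteristics of sensor system 106.

```python
import numpy as np

def depth_map_to_point_cloud(depth_map, fx, fy, cx, cy):
    """Convert a depth map (meters per pixel) into an N x 3 point cloud.

    Assumes a simple pinhole camera model; the intrinsics (fx, fy, cx, cy)
    are placeholders rather than parameters of any particular sensor.
    """
    height, width = depth_map.shape
    u, v = np.meshgrid(np.arange(width), np.arange(height))
    z = depth_map
    x = (u - cx) * z / fx            # back-project each pixel along its ray
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # discard pixels with no depth reading

# Example: a synthetic 240 x 320 depth map in which every surface is 2 m away.
cloud = depth_map_to_point_cloud(np.full((240, 320), 2.0),
                                 fx=285.0, fy=285.0, cx=160.0, cy=120.0)
```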
In the depicted example, sensor system 106 collects first hand tracking data of a first hand of user 108, and second hand tracking data of a second hand of the user. By simultaneously tracking both hands of user 108, a potentially larger range of hand gestures may be received and interpreted by computing system 102 than where single hand gestures alone are utilized. As described in further detail below, such hand gestures may comprise simultaneous input from both hands of user 108, as well as non-simultaneous (e.g., temporally separated) input from both hands of the user. Further, tracking both hands of a user may be used to interpret a single hand gesture based upon contextual information from the non-gesture hand. As used herein, a “two hand gesture” refers to a gesture that includes temporally overlapping input from both hands, while a “one hand gesture” refers to a gesture in which the input intended to be used as gesture input is performed by a single hand. The term “simultaneous” may be used herein to refer to gestures of both hands that are at least partially temporally overlapping, and is not intended to imply that the gestures start and/or stop at the same time.
Any suitable methods may be used to track the hands of a user. For example, depth imagery from sensor system 106 may be used to model a virtual skeleton based upon user posture and movement.
A plurality of parameters may be assigned to each joint 204 in virtual skeleton 200. These parameters may include position and rotational orientation, respectively encoded via Cartesian coordinates and Euler angles, for example. Joint positions may be tracked over time and at a relatively high frequency (e.g., 30 frames per second) to facilitate tracking of a human subject in real time. Further, one or more of the position, rotational orientation, and motion of two or more joints may be collectively tracked (or averaged into a single net joint) to determine a geometric posture of a portion of virtual skeleton 200. For example, hand postures such as an “OK” posture and a “thumbs-up” posture may be recognized, in addition to hand states such as an open state and a closed state, which respectively indicate whether a hand is open or closed.
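As a non-limiting illustration of how such per-joint parameters might be consumed, the following sketch assumes a palm joint and fingertip joints carrying Cartesian positions and Euler orientations; the Joint class, the distance threshold, and the open/closed rule are illustrative assumptions, not the tracking pipeline itself.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Joint:
    """Hypothetical skeletal joint: Cartesian position plus Euler orientation."""
    name: str
    position: np.ndarray     # (x, y, z) in meters, camera space
    orientation: np.ndarray  # (roll, pitch, yaw) in radians

def classify_hand_state(palm: Joint, fingertips: list, closed_threshold: float = 0.06) -> str:
    """Classify a hand as "open" or "closed" from mean fingertip-to-palm distance.

    The 6 cm threshold is an illustrative assumption, not a value from this disclosure.
    """
    distances = [np.linalg.norm(tip.position - palm.position) for tip in fingertips]
    return "closed" if float(np.mean(distances)) < closed_threshold else "open"

# Example: a palm with all five fingertips curled within 4 cm reads as "closed".
palm = Joint("palm", np.zeros(3), np.zeros(3))
tips = [Joint(f"tip_{i}", np.array([0.04, 0.0, 0.0]), np.zeros(3)) for i in range(5)]
print(classify_hand_state(palm, tips))  # -> closed
```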
By tracking the position and posture of both hands of user 108 over time, and in some embodiments additional parameters, a wide variety of aspects of computing system 102 may be controlled by a rich collection of gestures. The meaning of a given gesture (i.e., the action mapped to the gesture, which affects computing system 102 in some manner) may be derived from hand state (e.g., open, closed), whether motion between the hands is symmetric or asymmetric, whether motion in one hand is accompanied or preceded by a static posture held by the other hand, etc. Thus, gesture meanings may be derived from one or more of the position, motion, and posture of each hand in relation to the other, and from how these change over time.
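As one hypothetical sketch of deriving gesture meaning from the relation between the hands, the following function labels two-hand motion as symmetric or asymmetric from per-hand velocity estimates; the mirroring rule and tolerance are illustrative assumptions.

```python
import numpy as np

def classify_relative_motion(left_velocity, right_velocity, tolerance=0.2):
    """Label two-hand motion as "symmetric" or "asymmetric".

    Motion is treated as symmetric when the left hand's velocity mirrors the
    right hand's about the body midline (x axis flipped); the tolerance (m/s)
    is an illustrative assumption.
    """
    mirrored = np.array([-right_velocity[0], right_velocity[1], right_velocity[2]])
    distance = np.linalg.norm(np.asarray(left_velocity) - mirrored)
    return "symmetric" if distance < tolerance else "asymmetric"

# Hands pulling apart horizontally at equal speed register as symmetric.
print(classify_relative_motion([-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]))  # -> symmetric
```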
In addition to increasing the number of available gestures, simultaneously tracking two hands may also increase the quality of one hand gesture detection and interpretation. In some approaches, tracking data for a first hand may be used to provide context for the interpretation of tracking data of a second hand. For example, the first hand may be identified as performing a known, one hand gesture. Meanwhile, the second hand may be moving asymmetrically relative to the first hand in a manner that does not map to any known gesture, whether as a one hand gesture or as a two hand gesture in combination with the first hand. Because one hand gestures are typically performed deliberately, with the other hand held relatively still, significant motion of the second hand that does not map to a known gesture suggests that the user did not intend gestural interaction and is merely making movements that are random or intended for something or someone else. In this way, erroneous gesture interpretation, and the action(s) thereby effected, may be prevented.
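A minimal sketch of this context-based gating might look as follows; the speed threshold and argument names are illustrative assumptions rather than part of the disclosed method.

```python
def gate_one_hand_gesture(detected_gesture, other_hand_speed,
                          other_hand_matches_known_gesture, still_threshold=0.15):
    """Suppress a one hand gesture when the other hand suggests it was unintentional.

    The gesture is kept only if the non-gesture hand is relatively still or is
    itself doing something recognizable.  The 0.15 m/s threshold is an
    illustrative assumption.
    """
    if detected_gesture is None:
        return None
    if other_hand_speed > still_threshold and not other_hand_matches_known_gesture:
        return None  # likely incidental movement; ignore the detection
    return detected_gesture

# A "thumbs up" is ignored while the other hand waves about unrecognizably.
print(gate_one_hand_gesture("thumbs_up", other_hand_speed=0.6,
                            other_hand_matches_known_gesture=False))  # -> None
```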
At 302, method 300 comprises receiving from a sensor system first hand tracking data regarding a first hand of a user temporally overlapping second hand tracking data regarding a second hand of the user. Receiving the first and second hand tracking data may include receiving position data at 304, rotational orientation data at 305, motion data at 306, posture data at 308, and/or state data at 310. In some embodiments, position and motion data may be received for one or more hand joints in each hand of a virtual skeleton, while posture and state data may be received for some or all of the hand joints in each hand. Receiving first and second hand tracking data may further include receiving temporally overlapping hand tracking data at 312 and non-temporally overlapping hand tracking data at 314. Temporally overlapping hand tracking data may be used to detect and interpret simultaneous two hand gestures in which input from both hands is simultaneously supplied. Alternatively or additionally, non-temporally overlapping hand tracking data may be used to inform gesture detection and interpretation with preceding hand tracking data. For example, a hand gesture comprising a static posture followed by hand motion may be identified using non-temporally overlapping hand tracking data.
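For illustration, the per-frame data received at 304-314 might be represented as a structure like the following sketch, together with a simple check for temporal overlap between the two per-hand streams; the field names and types are assumptions, not a prescribed data format.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class HandSample:
    """One frame of tracking data for a single hand (fields are illustrative)."""
    timestamp: float         # seconds
    position: np.ndarray     # palm position (x, y, z)
    orientation: np.ndarray  # Euler angles (roll, pitch, yaw)
    velocity: np.ndarray     # finite-difference motion estimate
    posture: Optional[str]   # e.g., "ok", "thumbs_up", or None
    state: str               # "open" or "closed"

def temporally_overlapping(first, second) -> bool:
    """Return True if two timestamp-sorted per-hand sample streams overlap in time."""
    if not first or not second:
        return False
    return (first[0].timestamp <= second[-1].timestamp and
            second[0].timestamp <= first[-1].timestamp)
```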
Next, at 316, method 300 comprises detecting one or more gestures based on the first and second hand tracking data received at 302. Gesture detection may include detecting one or more two hand gestures at 318 and may further include detecting one or more one hand gestures at 320, wherein one hand gestures may be detected via two hand tracking. Example one and two hand gestures are shown and described below. Depending on the parameters included in the received tracking data, one or more of the position, rotational orientation, motion, posture, and state of each hand may be used to find a matching gesture in a dictionary of known gestures.
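One hypothetical way to realize a “dictionary of known gestures” is a table of template feature vectors, as in the following sketch; the gesture names, feature encoding, and template values are illustrative assumptions.

```python
import numpy as np

# A minimal sketch of a dictionary of known gestures: each entry pairs a
# gesture name with a template feature vector.  The names, features, and
# template values are hypothetical.
GESTURE_TEMPLATES = {
    "os_pull_in": np.array([1.0, 1.0, -0.5]),  # e.g., both hands closed, sweeping inward
    "os_overlay": np.array([0.0, 0.0,  0.8]),  # e.g., both hands open, pushing forward
}

def best_match(feature_vector, max_distance=0.5):
    """Return the closest known gesture, or None if nothing matches well enough."""
    best_name, best_distance = None, float("inf")
    for name, template in GESTURE_TEMPLATES.items():
        distance = float(np.linalg.norm(np.asarray(feature_vector) - template))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= max_distance else None
```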
In some scenarios, both a one hand gesture and a two hand gesture may be detected based on received hand tracking data. A variety of approaches may be employed to select whether to interpret the detected gesture as a one hand gesture or two hand gesture. One example optional approach may be to preferentially select the two hand gesture, as indicated at 322. Such an assumption may be made based upon the likelihood that the user intended to perform the two hand gesture, rather than perform a one hand gesture accompanied by hand input that unintentionally resembles the two hand gesture. Other approaches are possible, however, including preferentially selecting the one hand gesture, prompting the user for additional feedback to clarify whether the one or two hand gesture was intended, and utilizing confidence data regarding how closely each detected gesture resembles the corresponding identified system gesture.
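The preferential selection at 322 might be sketched as follows, with the confidence margin being an illustrative assumption about whatever matcher produced the candidate gestures.

```python
def select_gesture(one_hand_candidate, two_hand_candidate,
                   one_hand_confidence=0.0, two_hand_confidence=0.0):
    """Resolve an ambiguous detection, preferring the two hand interpretation.

    Confidence values (0 to 1) and the 0.5 margin are illustrative assumptions.
    """
    if one_hand_candidate is not None and two_hand_candidate is not None:
        # Prefer the two hand gesture unless the one hand match is far stronger.
        if one_hand_confidence - two_hand_confidence > 0.5:
            return one_hand_candidate
        return two_hand_candidate
    return two_hand_candidate if two_hand_candidate is not None else one_hand_candidate
```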
Next, at 324, method 300 may comprise optionally providing feedback based on the gesture detected (or preferentially selected) at 316 (322). For example, visual feedback may be provided to the user via a display device operatively coupled to the computing system executing method 300. Providing feedback may include indicating the correctness of the gesture at 326, previewing an action to which the gesture maps at 328, suggesting a subsequent gesture or next step of the currently detected gesture at 330, indicating whether the currently detected gesture is a two hand gesture and/or a one hand gesture at 332, drawing a progress bar illustrating the progress toward completion of the gesture at 334, and/or indicating whether the action that the gesture maps to controls an aspect of an application or an OS running on the computing system at 336. It will be understood that these specific forms of feedback are described for the purpose of example, and are not intended to be limiting in any manner.
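As a simple, hypothetical illustration of the progress feedback at 334, the following sketch renders a text progress bar from a completion fraction; a real implementation would instead draw to the display device.

```python
def gesture_progress_bar(progress: float, width: int = 20) -> str:
    """Render a text progress bar for gesture completion, where progress is 0.0 to 1.0."""
    progress = max(0.0, min(1.0, progress))
    filled = int(round(progress * width))
    return "[" + "#" * filled + "-" * (width - filled) + f"] {progress:.0%}"

print(gesture_progress_bar(0.6))  # -> [############--------] 60%
```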
Continuing, method 300 comprises, at 338, controlling one or more aspects of a computing system based on the detected gesture. Such controlling may include, but is not limited to, controlling one or more aspects of the application at 340, and controlling one or more aspects of an operating system (OS) at 342.
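A hypothetical dispatch from detected gestures to application- or OS-level actions (338-342) might be sketched as follows; the gesture names echo the example gestures described below, and the handlers are placeholder callables rather than real application or OS APIs.

```python
# Hypothetical mapping from detected gestures to the aspect of the system they
# control; all names and handlers here are placeholders.
ACTION_MAP = {
    "os_pull_in":    ("os",          lambda: print("docking the OS GUI beside the app")),
    "os_overlay":    ("os",          lambda: print("overlaying the OS GUI on the app")),
    "app_pull_down": ("application", lambda: print("minimizing the application window")),
}

def control_computing_system(gesture):
    """Dispatch a detected gesture to an application- or OS-level action."""
    target, handler = ACTION_MAP.get(gesture, (None, None))
    if handler is not None:
        handler()
    return target

control_computing_system("os_pull_in")  # prints the placeholder OS action
```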
In some embodiments, game application 107 may be caused to resume full occupation of window 400 by reversing the order in which the steps of the OS pull-in gesture are performed. In other embodiments, a separate gesture may be provided to resume full window occupation by an application and exit the OS GUI.
The OS pull-in gesture illustrated by
The OS pull-in gesture shown in
As mentioned above, the application pull-in gesture shown in
The OS overlay gesture shown in
The application pull-down gesture shown in
While the example gestures shown and described above generally correspond to hand gestures performed in three-dimensional space, other types of gestures are also within the scope of this disclosure. For example, finger gestures, in which a gesture is performed by a single finger, may be used to control aspects of an application and OS. Hand and/or finger gestures typically performed on tactile touch sensors (e.g., as found in mobile electronic devices) are also contemplated.
It will be appreciated that the gestures illustrated in
While the effects of gestural interaction are conveyed by changes in output from display device 104 in environment 100, it will be appreciated that the approaches described herein are also applicable to environments that lack a display device. In such environments, interpreted gestural input may be conveyed by means other than a display device, and in some scenarios may be used to physically control an apparatus. As non-limiting examples, gestural input may be tracked and interpreted to open and close doors, control the position of end effectors of robotic arms, etc.
In some embodiments, the methods and processes described herein may be tied to a computing system of one or more computing devices. In particular, such methods and processes may be implemented as a computer-application program or service, an application-programming interface (API), a library, and/or other computer-program product.
Computing system 1000 includes a logic subsystem 1002 and a storage subsystem 1004. Computing system 1000 may optionally include a display subsystem 1006, input subsystem 1008, communication subsystem 1010, and/or other components not shown in
Logic subsystem 1002 includes one or more physical devices configured to execute instructions. For example, the logic machine may be configured to execute instructions that are part of one or more applications, services, programs, routines, libraries, objects, components, data structures, or other logical constructs. Such instructions may be implemented to perform a task, implement a data type, transform the state of one or more components, achieve a technical effect, or otherwise arrive at a desired result.
The logic machine may include one or more processors configured to execute software instructions. Additionally or alternatively, the logic machine may include one or more hardware or firmware logic machines configured to execute hardware or firmware instructions. Processors of the logic machine may be single-core or multi-core, and the instructions executed thereon may be configured for sequential, parallel, and/or distributed processing. Individual components of the logic machine optionally may be distributed among two or more separate devices, which may be remotely located and/or configured for coordinated processing. Aspects of the logic machine may be virtualized and executed by remotely accessible, networked computing devices configured in a cloud-computing configuration.
Storage subsystem 1004 includes one or more physical devices configured to hold instructions executable by the logic machine to implement the methods and processes described herein. When such methods and processes are implemented, the state of storage subsystem 1004 may be transformed—e.g., to hold different data.
Storage subsystem 1004 may include removable and/or built-in devices. Storage subsystem 1004 may include optical memory (e.g., CD, DVD, HD-DVD, Blu-Ray Disc, etc.), semiconductor memory (e.g., RAM, EPROM, EEPROM, etc.), and/or magnetic memory (e.g., hard-disk drive, floppy-disk drive, tape drive, MRAM, etc.), among others. Storage subsystem 1004 may include volatile, nonvolatile, dynamic, static, read/write, read-only, random-access, sequential-access, location-addressable, file-addressable, and/or content-addressable devices.
It will be appreciated that storage subsystem 1004 includes one or more physical devices and excludes propagating signals per se. However, aspects of the instructions described herein alternatively may be propagated by a communication medium (e.g., an electromagnetic signal, an optical signal, etc.), as opposed to being stored on a physical device.
Aspects of logic subsystem 1002 and storage subsystem 1004 may be integrated together into one or more hardware-logic components. Such hardware-logic components may include field-programmable gate arrays (FPGAs), program- and application-specific integrated circuits (PASIC/ASICs), program- and application-specific standard products (PSSP/ASSPs), systems-on-a-chip (SOCs), and complex programmable logic devices (CPLDs), for example.
The terms “module,” “program,” and “engine” may be used to describe an aspect of computing system 1000 implemented to perform a particular function. In some cases, a module, program, or engine may be instantiated via logic subsystem 1002 executing instructions held by storage subsystem 1004. It will be understood that different modules, programs, and/or engines may be instantiated from the same application, service, code block, object, library, routine, API, function, etc. Likewise, the same module, program, and/or engine may be instantiated by different applications, services, code blocks, objects, routines, APIs, functions, etc. The terms “module,” “program,” and “engine” may encompass individual or groups of executable files, data files, libraries, drivers, scripts, database records, etc.
It will be appreciated that a “service”, as used herein, is an application program executable across multiple user sessions. A service may be available to one or more system components, programs, and/or other services. In some implementations, a service may run on one or more server-computing devices.
When included, display subsystem 1006 may be used to present a visual representation of data held by storage subsystem 1004. This visual representation may take the form of a graphical user interface (GUI). As the herein described methods and processes change the data held by the storage machine, and thus transform the state of the storage machine, the state of display subsystem 1006 may likewise be transformed to visually represent changes in the underlying data. Display subsystem 1006 may include one or more display devices utilizing virtually any type of technology. Such display devices may be combined with logic subsystem 1002 and/or storage subsystem 1004 in a shared enclosure, or such display devices may be peripheral display devices.
When included, input subsystem 1008 may comprise or interface with one or more user-input devices such as a keyboard, mouse, touch screen, or game controller. In some embodiments, the input subsystem may comprise or interface with selected natural user input (NUI) componentry. Such componentry may be integrated or peripheral, and the transduction and/or processing of input actions may be handled on- or off-board. Example NUI componentry may include a microphone for speech and/or voice recognition; an infrared, color, stereoscopic, and/or depth camera for machine vision and/or gesture recognition; a head tracker, eye tracker, accelerometer, and/or gyroscope for motion detection and/or intent recognition; as well as electric-field sensing componentry for assessing brain activity.
When included, communication subsystem 1010 may be configured to communicatively couple computing system 1000 with one or more other computing devices. Communication subsystem 1010 may include wired and/or wireless communication devices compatible with one or more different communication protocols. As non-limiting examples, the communication subsystem may be configured for communication via a wireless telephone network, or a wired or wireless local- or wide-area network. In some embodiments, the communication subsystem may allow computing system 1000 to send and/or receive messages to and/or from other devices via a network such as the Internet.
It will be understood that the configurations and/or approaches described herein are example in nature, and that these specific embodiments or examples are not to be considered in a limiting sense, because numerous variations are possible. The specific routines or methods described herein may represent one or more of any number of processing strategies. As such, various acts illustrated and/or described may be performed in the sequence illustrated and/or described, in other sequences, in parallel, or omitted. Likewise, the order of the above-described processes may be changed.
The subject matter of the present disclosure includes all novel and nonobvious combinations and subcombinations of the various processes, systems and configurations, and other features, functions, acts, and/or properties disclosed herein, as well as any and all equivalents thereof.