Touchless or in-air gestural interfaces often rely on mouse and touch-based input conventions, and thus treat a user's hand as an input pointer. Accordingly, these in-air gesture interfaces often adopt visual metaphors developed for pointer-based systems. The physical analogues of these metaphors, however, are often ill-suited for three-dimensional gesture interfaces. For example, when using in-air gestures in conjunction with a display screen, a dimensional disparity often exists between the unhindered three-dimensional movement in space of the user's hand and the two-dimensional output of a display screen. Accordingly, users are not typically adept at mentally projecting three-dimensional movements onto a two-dimensional display. Moreover, when providing a gesture, it may be necessary for a user to simultaneously divide their attention between performing the gesture and monitoring the visual feedback provided on the display. Accordingly, three-dimensional movements may not necessarily be intuitive for a user.
Described is a system and technique allowing a user to interact with a device using self-referential gestures. In an implementation, described is a method including detecting, by a computing device, a user within a field-of-view of a capture device operatively coupled to the computing device, and identifying first and second reference points on the detected user, the first reference point providing an indication of a position of a first hand of the user. The method may also include detecting a gesture based on a movement of the first reference point relative to the second reference point, and performing, by the computing device and in response to the movement, a first action.
In an implementation, described is a method including detecting, by a computing device, a user within a field-of-view of a capture device operatively coupled to the computing device, and identifying first and second reference points, the first reference point providing an indication of a position of a first hand of the user. The method may also include determining, by the computing device, one or more axes in a three-dimensional space relative to a position of the user, the three-dimensional space including an origin corresponding to the second reference point, detecting a gesture based on a movement of the first reference point relative to the second reference point, and performing, by the computing device and in response to the movement, a first action.
In an implementation, described is a system including a processor configured to detect a user within a field-of-view of a capture device operatively coupled to the computing device, and identify first and second reference points on the detected user, the first reference point providing an indication of a position of a first hand of the user. The processor may also be configured to detect a gesture based on a movement of the first reference point relative to the second reference point, and perform, in response to the movement, a first action.
The accompanying drawings, which are included to provide a further understanding of the disclosed subject matter, are incorporated in and constitute a part of this specification. The drawings also illustrate implementations of the disclosed subject matter and together with the detailed description serve to explain the principles of implementations of the disclosed subject matter. No attempt is made to show structural details in more detail than may be necessary for a fundamental understanding of the disclosed subject matter and various ways in which it may be practiced.
Described is a system and technique allowing a user to interact with a device using self-referential gestures. Self-referential gestures allow a user to rely on their inherent knowledge of body positioning to allow movements such as hand movements to be intuitively performed. The disclosure describes determining various reference points on the user and detecting hand movements relative to these reference points. In addition, a device may define axes and/or an origin in a three-dimensional space relative to a position of the user within a field-of-view of a capture device. Accordingly, gesture movements may be detected and/or measured based on references that correspond to the user's body in order to provide a more intuitive interaction experience.
The device 10 (or computing device) may include or be part of a variety of types of devices, such as a set-top box, television, media player, mobile phone (including a “smartphone”), computer, or other type of device. The processor 12 may be any suitable programmable control device and may control the operation of one or more processes, such as gesture recognition as discussed herein, as well as other processes performed by the device 10. As described herein, actions may be performed by a computing device, which may refer to a device (e.g. device 10) and/or one or more processors (e.g. processor 12). The bus 11 may provide a data transfer path for transferring data between components of the device 10.
The memory 14 may include one or more different types of memory which may be accessed by the processor 12 to perform device functions. For example, the memory 14 may include any suitable non-volatile memory such as read-only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory, and the like, and any suitable volatile memory including various types of random access memory (RAM) and the like.
The communications circuitry 13 may include circuitry for wired or wireless communications for short-range and/or long-range communication. For example, the wireless communication circuitry may include Wi-Fi enabling circuitry for one of the 802.11 standards, and circuitry for other wireless network protocols including Bluetooth, the Global System for Mobile Communications (GSM), and code division multiple access (CDMA) based wireless protocols. Communications circuitry 13 may also include circuitry that enables the device 10 to be electrically coupled to another device (e.g. a computer or an accessory device) and communicate with that other device. For example, a user input component such as a wearable device may communicate with the device 10 through the communication circuitry 13 using a short-range communication technique such as infrared (IR) or other suitable technique.
The storage 15 may store software (e.g., for implementing various functions on device 10), and any other suitable data. The storage 15 may include a storage medium including various forms of volatile and non-volatile memory. Typically, the storage 15 includes a form of non-volatile memory such as a hard-drive, solid state drive, flash drive, and the like. The storage 15 may be integral with the device 10 or may be separate and accessed through an interface to receive a memory card, USB drive, optical disk, a magnetic storage medium, and the like.
An I/O controller 16 may allow connectivity to a display 18 and one or more I/O devices 17. The I/O controller 16 may include hardware and/or software for managing and processing various types of I/O devices 17. The I/O devices 17 may include various types of devices allowing a user to interact with the device 10. For example, the I/O devices 17 may include various input components such as a keyboard/keypad, controller (e.g. game controller, remote, etc.) including a smartphone that may act as a controller, a microphone, and other suitable components. The I/O devices 17 may also include components for aiding in the detection of gestures including wearable components such as a watch, ring, or other components that may be used to track body movements (e.g. holding a smartphone to detect movements).
The device 10 may or may not be coupled to a display. In implementations where the device 10 is coupled to a display (as shown in
The device 10 may include a capture device 19 (as shown in
The capture device 19 may be configured to capture depth information including a depth image using techniques such as time-of-flight, structured light, stereo image, or other suitable techniques. The depth image may include a two-dimensional pixel area of the captured image where each pixel in the two-dimensional area may represent a depth value such as a distance. The capture device 19 may include two or more physically separated cameras that may view a scene from different angles to obtain visual stereo data to generate depth information. Other techniques of depth imaging may also be used. The capture device 19 may also include additional components for capturing depth information of an environment such as an IR light component, a three-dimensional camera, and a visual image camera (e.g. RGB camera). For example, with time-of-flight analysis the IR light component may emit an infrared light onto the scene and may then use sensors to detect the backscattered light from the surface of one or more targets (e.g. users) in the scene using a three-dimensional camera or RGB camera. In some instances, pulsed infrared light may be used such that the time between an outgoing light pulse and a corresponding incoming light pulse may be measured and used to determine a physical distance from the capture device 19 to a particular location on a target.
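The pulsed time-of-flight relationship described above can be sketched in a few lines. Because the light pulse travels to the target and back, the one-way distance is half the round-trip path, d = c·Δt/2. This is a minimal sketch assuming ideal timing and the vacuum speed of light; the function names are hypothetical, and a practical capture device would also need to compensate for sensor latency, ambient light, and noise.

```python
# Speed of light in vacuum, in metres per second (an idealized assumption).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def tof_distance_m(round_trip_time_s: float) -> float:
    """Distance to a target from one pulsed time-of-flight measurement.

    The pulse covers the path to the target and back, so the one-way
    distance is half of the total path length c * t.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0


def depth_image_from_times(times_s: list[list[float]]) -> list[list[float]]:
    """Convert a 2-D grid of per-pixel round-trip times into a depth image."""
    return [[tof_distance_m(t) for t in row] for row in times_s]


if __name__ == "__main__":
    # A 20 ns round trip corresponds to a target roughly 3 m away.
    print(round(tof_distance_m(20e-9), 3))  # → 2.998
```

Each pixel of the resulting two-dimensional grid holds a distance value, matching the depth image described above.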
When detecting gesture movements, specific gestures may be detected based on information defining a gesture, a condition, and/or other information. For example, gestures may be recognized based on information such as a distance of movement (either absolute or relative to the size of the user), a threshold velocity of the movement, a confidence rating, and other criteria. The device may identify one or more reference points on the user in order to track gesture movements. For example, the capture device may employ a depth-based full-body tracker that identifies skeletal joints. A joint may be a point at which bones connect and that, accordingly, allows for movement. For example, joints may include those associated with a hand, wrist, elbow, shoulders and/or chest, face (e.g. jaw), hips, knees, ankles, and feet, among others. In another example, the device may select a finger or the palm of an open hand as a reference point when tracking hand movements.
When detecting gesture movements, the device may track movements using a coordinate system for a three-dimensional space. The device may define a coordinate space relative to an orientation of the capture device, relative to a position of the user, and/or using another technique. In order to define and/or translate a coordinate system based on a position of a user, the device may utilize a reference point as an origin of the coordinate system. This point of origin may relate to a natural point of reference for a user when performing self-referential gestures. For example, when tracking body movements, the device may select a point on a central part of the user's body (e.g. the torso) as a reference point, such as the center of the chest, the sternum, the solar plexus, or the center of gravity, or a point within a region such as the thorax, abdomen, pelvis, and the like. The device may also use the head as a reference point for an origin.
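A minimal sketch of re-expressing camera-space joint positions in a user-centered coordinate system follows, assuming a skeletal tracker that reports named three-dimensional joints and a camera whose Y-axis is roughly vertical. The joint names (`chest`, `shoulder_left`, `shoulder_right`) are hypothetical, and aligning the X-axis with the shoulder line is one illustrative way to make the frame follow the user's facing direction, not a technique stated in the source.

```python
import math
from typing import Dict, Tuple

Vec3 = Tuple[float, float, float]


def to_user_frame(joints: Dict[str, Vec3], origin_joint: str = "chest") -> Dict[str, Vec3]:
    """Re-express camera-space joints relative to a body reference point.

    Every joint is translated so that `origin_joint` becomes (0, 0, 0), then
    rotated about the (assumed vertical) Y-axis so the left-to-right shoulder
    line lies along +X. The result no longer depends on where the user stands
    or which way they face relative to the capture device.
    """
    ox, oy, oz = joints[origin_joint]
    lx, _, lz = joints["shoulder_left"]   # hypothetical joint names
    rx, _, rz = joints["shoulder_right"]
    # Heading of the shoulder line in the camera's horizontal X-Z plane.
    yaw = math.atan2(rz - lz, rx - lx)
    c, s = math.cos(yaw), math.sin(yaw)
    user_frame: Dict[str, Vec3] = {}
    for name, (x, y, z) in joints.items():
        dx, dy, dz = x - ox, y - oy, z - oz   # translate to the body origin
        # Rotate in the X-Z plane by -yaw so the shoulder line maps onto +X.
        user_frame[name] = (c * dx + s * dz, dy, -s * dx + c * dz)
    return user_frame


if __name__ == "__main__":
    joints = {
        "chest": (0.5, 1.2, 2.0),
        "shoulder_left": (0.3, 1.4, 2.0),
        "shoulder_right": (0.7, 1.4, 2.0),
        "hand_right": (0.9, 1.2, 1.6),
    }
    print(to_user_frame(joints)["hand_right"])
```

In this usage example the user faces the camera squarely, so only the translation has an effect and the right hand ends up about 0.4 m along X and 0.4 m toward the camera from the chest.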
In another example, the device may use a hand and/or an initial movement of a hand to establish a point of origin for a coordinate system. Accordingly, the device may detect and/or measure subsequent hand movements relative to the established point on the hand. For example, a user may perform an open-palm gesture, and in response, the device may establish a point of origin within the palm of the hand. A Y-axis may then be defined as running substantially from the established point on the palm to a point (e.g. a fingertip) on the corresponding index or middle finger, and the X-axis and Z-axis may then be defined based on the defined Y-axis.
As described, gestures may include movements within a three-dimensional environment, and accordingly, the gestures may include components of movement along one or more axes. As shown in the example of
A field-of-view as described herein may include an area perceptible by one or more capture devices (e.g. a perceptible visual area). In an implementation, the device may determine the identities of one or more users (e.g. via a recognition technique) in response to detecting their presence. For example, the device may attempt to identify a user within the field-of-view in order to perform context-specific and/or user-specific actions. The device may also perform facial recognition for disambiguation. For instance, the device may disambiguate a gesture such as a pointing gesture to determine the identity of the user that is being referenced. In another example, the device may disambiguate words of a speech command that may supplement a gesture, such as personal pronouns (e.g. “open my calendar,” “send him this picture,” etc.).
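One way to decide which detected user a pointing gesture refers to is sketched below, assuming the tracker reports elbow and hand positions and a position for each user. Scoring candidates by their perpendicular distance from an elbow-to-hand ray is an illustrative technique chosen for this sketch, not one stated in the source; the user ids and `max_offset_m` threshold are hypothetical, and the selected id could then be passed to a recognition step to resolve the referenced person's identity.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]


def referenced_user(elbow: Vec3, hand: Vec3, users: Dict[str, Vec3],
                    max_offset_m: float = 0.5) -> Optional[str]:
    """Return the id of the user a pointing gesture most plausibly refers to.

    The pointing ray starts at the hand and extends the elbow-to-hand
    direction. Each candidate is scored by its perpendicular distance from
    that ray; candidates behind the hand are skipped, and None is returned
    when nobody lies within `max_offset_m`.
    """
    d = tuple(h - e for h, e in zip(hand, elbow))
    length = math.hypot(*d)
    if length == 0.0:
        return None
    u = tuple(c / length for c in d)  # unit pointing direction
    best_id, best_offset = None, max_offset_m
    for user_id, pos in users.items():
        v = tuple(p - h for p, h in zip(pos, hand))
        t = sum(a * b for a, b in zip(v, u))  # distance along the ray
        if t <= 0.0:
            continue  # candidate is behind the pointing hand
        # Perpendicular offset of the candidate from the pointing ray.
        offset = math.hypot(*(a - t * b for a, b in zip(v, u)))
        if offset <= best_offset:
            best_id, best_offset = user_id, offset
    return best_id
```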
In 504, the device may identify first and second reference points on the detected user. The device may track particular features of the user, for example, using skeletal tracking to identify particular points of interest. For example, the reference point may correspond to a joint on the user as well as other points on the body such as on the user's head, torso, etc. In an implementation, the first reference point may provide an indication of a position of a first hand of the user. For example, the point may include a point on the palm and/or finger of the user. As described further herein, a reference point may also include a point within the three-dimensional space.
In 506, the device may determine one or more axes in a three-dimensional space relative to a position of the user. As described above, the axes may be determined based on reference points on the user. When determining movements, the device may define a three-dimensional space that includes an origin for a coordinate system. For example, the origin may correspond to a reference point that may or may not be used to define one or more axes. In one example, the origin may correspond to a reference point on a torso of the user. In another example, the origin may correspond to a reference point on the first hand of the user. In addition, the device may establish a point of origin based on an initial gesture. For example, the device may establish an origin within a palm of the first hand as a result of the user performing a gesture by the first hand with a substantially open palm. Accordingly, the device may determine subsequent gesture movements relative to the initial gesture.
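Establishing the origin from an initial open-palm gesture and measuring later movement against it can be sketched as a small stateful helper. This sketch assumes a separate hand-pose classifier supplies the `palm_open` flag; the class and method names are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


class PalmOriginTracker:
    """Latch an origin on an initial open-palm gesture, then report later
    palm positions relative to it.

    Whether the palm is open is assumed to come from a separate hand-pose
    classifier; this class only manages the origin.
    """

    def __init__(self) -> None:
        self.origin: Optional[Vec3] = None

    def update(self, palm: Vec3, palm_open: bool) -> Optional[Vec3]:
        if self.origin is None:
            if not palm_open:
                return None          # no origin yet, so movement is ignored
            self.origin = palm       # the initial gesture establishes the origin
        ox, oy, oz = self.origin
        return (palm[0] - ox, palm[1] - oy, palm[2] - oz)

    def reset(self) -> None:
        """Forget the origin, e.g. when the hand leaves the field-of-view."""
        self.origin = None
```

Keeping the origin on the hand, rather than on a fixed camera coordinate, means the same physical hand motion yields the same relative displacement wherever the user happens to start the gesture.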
In 508, the device may detect a gesture based on a movement of the first reference point relative to the second reference point. Techniques described herein may determine movements based on reference points of the user's body rather than points relative to the capture device. The movement of the first reference point relative to the second reference point may include a change in distance, a rotation, a change in position, and other types of movements that may correspond to a gesture. For example, the movement may include a hand touching the second reference point.
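Detecting a gesture from the movement of the first reference point relative to the second can be sketched by subtracting the two tracks frame by frame and applying the kinds of criteria mentioned earlier: distance relative to user size, a threshold velocity, and a confidence rating. The function name, the "touch"/"move" labels, and all threshold defaults are hypothetical illustrations, not values from the source.

```python
import math
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def detect_relative_gesture(
    hand_track: Sequence[Vec3],   # first reference point (hand), one sample per frame
    ref_track: Sequence[Vec3],    # second reference point, sampled on the same frames
    dt_s: float,                  # time between frames in seconds
    user_height_m: float,         # used to express distance relative to user size
    confidence: float,            # tracker confidence for this sequence
    touch_radius_m: float = 0.08,
    min_rel_distance: float = 0.15,
    min_speed_m_s: float = 0.5,
    min_confidence: float = 0.7,
) -> Optional[str]:
    """Classify hand motion measured relative to a body reference point.

    Returns "touch" if the hand ends within `touch_radius_m` of the reference
    point, "move" if it travels far and fast enough relative to that point,
    or None otherwise. All thresholds are illustrative defaults.
    """
    if confidence < min_confidence or len(hand_track) < 2 or len(hand_track) != len(ref_track):
        return None
    # Hand position expressed relative to the second reference point, per frame.
    rel = [tuple(h - r for h, r in zip(hp, rp)) for hp, rp in zip(hand_track, ref_track)]
    if math.dist(rel[-1], (0.0, 0.0, 0.0)) <= touch_radius_m:
        return "touch"
    displacement = math.dist(rel[-1], rel[0])
    speed = displacement / (dt_s * (len(rel) - 1))
    if displacement / user_height_m >= min_rel_distance and speed >= min_speed_m_s:
        return "move"
    return None
```

Because only the difference between the two points is examined, a user who walks across the room while keeping the hand still relative to the torso produces no gesture, which reflects measuring movement against the body rather than against the capture device.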
Returning to
Various implementations may include or be embodied in the form of a computer-implemented process and an apparatus for practicing that process. Implementations may also be embodied in the form of a computer-readable storage medium containing instructions embodied in non-transitory and tangible storage and/or memory, wherein, when the instructions are loaded into and executed by a computer (or processor), the computer becomes an apparatus for practicing implementations of the disclosed subject matter.
The flow diagrams described herein are included as examples. There may be variations to these diagrams or the steps (or operations) described therein without departing from the implementations described herein. For instance, the steps may be performed in parallel, simultaneously, or in a differing order, or steps may be added, deleted, or modified. Similarly, the block diagrams described herein are included as examples. These configurations are not exhaustive of all the components and there may be variations to these diagrams. Other arrangements and components may be used without departing from the implementations described herein. For instance, components may be added or omitted, and may interact in various ways known to an ordinary person skilled in the art.
References to “one implementation,” “an implementation,” “an example implementation,” and the like, indicate that the implementation described may include a particular feature, but every implementation may not necessarily include the feature. Moreover, such phrases are not necessarily referring to the same implementation. Further, when a particular feature is described in connection with an implementation, such feature may be included in other implementations whether or not explicitly described. The term “substantially” may be used herein in association with a claim recitation and may be interpreted as “as nearly as practicable,” “within technical limitations,” and the like. Terms such as first, second, etc. may be used herein to describe various elements, and these elements should not be limited by these terms. These terms may be used to distinguish one element from another. For example, a first reference point may be termed a second reference point, and, similarly, a second reference point may be termed a first reference point.
The foregoing description, for purposes of explanation, has been described with reference to specific implementations. However, the illustrative discussions above are not intended to be exhaustive or to limit implementations of the disclosed subject matter to the precise forms disclosed. The implementations were chosen and described in order to explain the principles of implementations of the disclosed subject matter and their practical applications, to thereby enable others skilled in the art to utilize those implementations as well as various implementations with various modifications as may be suited to the particular use contemplated.