Various embodiments relate generally to user interface functionality, and, more particularly, relate to a method and apparatus for motion gesture recognition.
As computing and communications devices become increasingly more dynamic and convenient, users of the devices have become increasingly reliant on the functionality offered by the devices in a variety of settings. Due to advances made in screen technologies, accelerometers and other user interface input devices and hardware, users continue to demand more convenient and intuitive user interfaces. To meet the demands of users or encourage utilization of new functionality, innovation in the design and operation of user interfaces must keep pace.
Example methods, example apparatuses, and example computer program products are described herein that provide motion gesture recognition. One example method may include receiving motion gesture test data that was captured in response to a user's performance of a motion gesture. The motion gesture test data may include acceleration values in each of three dimensions of space that have directional components that are defined relative to an orientation of a device. The example method may further include transforming the acceleration values to derive transformed values that are independent of the orientation of the device, and performing a comparison between the transformed values and a gesture template to recognize the motion gesture performed by the user.
An additional example embodiment is an apparatus configured to support motion gesture recognition. The example apparatus may comprise at least one processor and at least one memory including computer program code, where the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus to perform various functionalities. In this regard, the example apparatus may be caused to receive motion gesture test data that was captured in response to a user's performance of a motion gesture. The motion gesture test data may include acceleration values in each of three dimensions of space that have directional components that are defined relative to an orientation of a device. The example apparatus may be further caused to transform the acceleration values to derive transformed values that are independent of the orientation of the device, and perform a comparison between the transformed values and a gesture template to recognize the motion gesture performed by the user.
Another example embodiment is a computer program product comprising at least one non-transitory computer readable medium having computer program code stored thereon, wherein the computer program code, when executed by an apparatus (e.g., one or more processors), causes an apparatus to perform various functionalities. In this regard, the program code may cause the apparatus to receive motion gesture test data that was captured in response to a user's performance of a motion gesture. The motion gesture test data may include acceleration values in each of three dimensions of space that have directional components that are defined relative to an orientation of a device. The program code may also cause the apparatus to transform the acceleration values to derive transformed values that are independent of the orientation of the device, and perform a comparison between the transformed values and a gesture template to recognize the motion gesture performed by the user.
Another example apparatus comprises means for receiving motion gesture test data that was captured in response to a user's performance of a motion gesture. The motion gesture test data may include acceleration values in each of three dimensions of space that have directional components that are defined relative to an orientation of a device. The example apparatus may further include means for transforming the acceleration values to derive transformed values that are independent of the orientation of the device, and means for performing a comparison between the transformed values and a gesture template to recognize the motion gesture performed by the user.
Having thus described some example embodiments in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
FIGS. 4a-4d illustrate motion direction and orientation relationships that may be used to determine a third rotation angle according to various example embodiments;
Example embodiments will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments are shown. Indeed, the embodiments may take many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. The terms “data,” “content,” “information,” and similar terms may be used interchangeably, according to some example embodiments, to refer to data capable of being transmitted, received, operated on, and/or stored.
As used herein, the term ‘circuitry’ refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
This definition of ‘circuitry’ applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term “circuitry” would also cover an implementation of merely a processor (or multiple processors) or portion of a processor and its (or their) accompanying software and/or firmware. The term “circuitry” would also cover, for example and if applicable to the particular claim element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or other network device.
According to various example embodiments, motion gesture recognition may be utilized to trigger various applications and functionalities that may be implemented by a computing device. Within the context of this application, motion gestures can be specific motions, or series of specific motions, that are performed by a user in a three-dimensional space. In this regard, an individual waving good-bye or drawing a circle with their hand are examples of motion gestures. A user of a computing device may hold the device, or at least the motion gesture detection hardware, in their hand while the motion gesture is performed in order to permit the motion gesture to be detected and recognized. In some instances, the motion gesture detection hardware may not necessarily be held, but may rather be affixed to, for example, the individual's wrist via a strap.
One example of hardware that may be configured to facilitate the detection of motion gestures is an accelerometer sensor. An accelerometer sensor may be used in devices (e.g., mobile devices) to permit the tracking of a user's physical movements by detecting movement accelerating in a particular direction. Many accelerometer sensors, possibly with the assistance of supporting hardware and software, generate acceleration values in each of three dimensions, in the form of a vector, to describe the acceleration of the sensor.
Through the accelerometer's ability to track a user's movement within a coordinate axis that is dependent upon the orientation of a device, such as the one depicted in
When accelerometer sensors are built into devices, such as mobile phones, the three-dimensional acceleration values that are output from the accelerometer sensor can be dependent on the device's current orientation. The signal or data from the accelerometer may be represented as a three-tuple vector (x, y, z) that includes the acceleration information in the x, y, and z directions based on the device's current orientation. For personalized motion gesture recognition, a user may therefore be required to train a device (or create a motion gesture template) in consideration of the orientation dependency of the gesture recognition process. For example, if a user creates a motion gesture template using a gesture when the device is held in an upright orientation (e.g., the top edge of a mobile phone screen being closest to the ceiling and the bottom edge of the phone screen being closest to the floor), the user may need to ensure that all future gestures are performed with this same orientation for a gesture match to be found.
Many motion gesture recognition techniques use Dynamic Time Warping (DTW) and Hidden Markov Models (HMM) to identify motion gestures. However, if the three-dimensional orientation of the device is not the same during the motion gesture, the DTW or HMM classifier may not be able to recognize the gesture because the three-dimensional acceleration values used to create the motion gesture template are different from the three-dimensional motion gesture test data (the data derived from a motion gesture match attempt or test performed by the user). As such, the usability of such a solution can be insufficient for widespread adoption.
It has been determined that human gestures are often performed in a two-dimensional plane, since two-dimensional motion gestures are easier for a user to remember. However, even given this two-dimensional nature of motion gestures, users often do not necessarily orient the motion gesture in the same relative plane when they perform the gesture, thereby introducing the need to consider aspects of a third dimension when performing motion gesture recognition. Various example embodiments therefore perform motion gesture recognition in consideration of the third dimensional values and permit motion gesture recognition when a device is used in any orientation and the motion gestures are performed in any arbitrary plane.
The usability of three-dimensional motion gestures can be increased if, in accordance with various example embodiments, the data derived from a test motion gesture is rotated such that the rotated test data shares a common plane with the motion gesture template. Motion gesture recognition and value comparisons can then be performed, for example, using DTW or HMM, with respect to the values that contribute to the shared two-dimensional planar space, even though acceleration in three dimensions was originally being considered to perform the rotation. In this regard, according to some example embodiments, the rotation angle about each of the three dimensional axes can be determined in order to facilitate motion gesture recognition regardless of the orientation of the device by facilitating the ability to define a common two-dimensional plane. According to some example embodiments, the device orientation variation problem described above can also be resolved through the use of an accelerometer or equivalent hardware, without the need for additional gyros or magnetometers, thereby providing a low cost solution.
According to a first example embodiment, a complete orientation-independent solution for accelerometer-based, or accelerometer only-based, gesture recognition is provided. The example method and associated apparatus embodiments can determine the two-dimensional plane that the motion gesture test data is to be rotated onto by using Principal Component Analysis (PCA). According to various example embodiments, PCA is an orthogonal transformation technique that converts a set of correlated variable values into a set of values of uncorrelated variables called principal components. Since the number of principal components can be less than or equal to the number of original variables, three dimensional acceleration values can be converted or transformed into two dimensional acceleration values. The PCA transformation may be defined such that a first principal component has a maximum variance and accounts for as much of the variability in the data as possible, and each subsequent component has a highest variance possible under the condition that the component is orthogonal to the preceding components.
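By way of illustration only, the following is a minimal sketch (in Python with NumPy; the function and variable names are assumptions introduced here, not part of any embodiment) of how three-dimensional acceleration samples might be projected onto their two highest-variance principal components:

```python
import numpy as np

def pca_transform(accel_xyz, n_components=2):
    """Project N x 3 acceleration samples onto their top principal components.

    accel_xyz: array of shape (N, 3) holding (x, y, z) acceleration values.
    Returns an (N, n_components) array of orientation-independent values.
    """
    data = np.asarray(accel_xyz, dtype=float)
    centered = data - data.mean(axis=0)            # remove the mean of each axis
    cov = np.cov(centered, rowvar=False)           # 3 x 3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # sort components by variance
    components = eigvecs[:, order[:n_components]]  # highest-variance directions
    return centered @ components                   # project onto the 2-D plane
```

In this sketch, the first column of the result corresponds to the component with the maximum variance and the second column to the next-highest-variance component orthogonal to it, consistent with the PCA properties described above.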
As such, using PCA, the three-dimensional orientation-dependent motion gestures can be transformed into two-dimensional orientation-independent gestures. DTW or HMM classifiers can then be applied to the resultant two-dimensional acceleration values to perform motion gesture recognition. Additionally, according to some example embodiments, if the gesture is a one dimensional or a three dimensional gesture, the same or a similar technique can be applied to determine a corresponding one-dimension or three-dimension coordinate system of the gesture resulting in orientation independency.
In some instances, an intended motion gesture may vary in time duration and speed. As such, according to various example embodiments, the acceleration values, either before or after transformation using PCA, may be resampled to a fixed length and scaled to a common range.
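As a hedged illustration of such resampling and scaling (the fixed length of 64 samples and the [−1, 1] range are arbitrary assumptions chosen only for this sketch):

```python
import numpy as np

def resample_and_scale(values, length=64, lo=-1.0, hi=1.0):
    """Resample each column of an (N, D) array to `length` samples and
    scale the result into the range [lo, hi]."""
    values = np.asarray(values, dtype=float)
    old_t = np.linspace(0.0, 1.0, num=len(values))
    new_t = np.linspace(0.0, 1.0, num=length)
    resampled = np.column_stack(
        [np.interp(new_t, old_t, values[:, d]) for d in range(values.shape[1])]
    )
    span = resampled.max() - resampled.min()
    if span == 0:
        return np.zeros_like(resampled)
    return lo + (hi - lo) * (resampled - resampled.min()) / span
```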
Additionally, a preprocessing operation may be performed. In this regard, suppose the motion gesture template data set is {T_i(x_n, y_n, z_n)}, n = 1, …, N, where N is the length of the template data sample.
If more than one training sample is collected for a particular motion gesture, multiple templates may be created for the motion gesture. Even if each training sample is performed using different orientations, a consistent motion gesture template may be generated. The following example Add-Samples-Into-Template procedure may be used to combine the additional samples into the gesture template, without orientation dependency, where R is a new sample.
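The Add-Samples-Into-Template procedure itself is not reproduced here; the following sketch, which builds on the pca_transform and resample_and_scale sketches above, shows only one plausible interpretation, in which each new sample R is normalized into the orientation-independent space and retained as an additional template:

```python
def add_sample_into_template(templates, new_sample_xyz):
    """Normalize a new raw sample R and add it to the template set.

    templates: list of (length, 2) arrays already transformed with PCA.
    new_sample_xyz: raw (N, 3) acceleration values for the new sample R.
    """
    # Transform the raw sample into the orientation-independent 2-D space.
    transformed = pca_transform(new_sample_xyz)
    normalized = resample_and_scale(transformed)
    # Keep each normalized sample as an additional template for the gesture,
    # so samples recorded with different device orientations remain consistent.
    templates.append(normalized)
    return templates
```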
Having defined the motion gesture template, motion gesture test data can be captured and applied to determine whether a match can be found. In this regard, suppose the motion gesture test data is S(x_n, y_n, z_n), n = 1, …, M, where M is the length of the test data sample.
It is noteworthy that, according to some example embodiments, the application of PCA causes the directional information of the acceleration values to be lost (in both the template data and the test data). As such, according to some example embodiments, a motion gesture that involves drawing a circle in a clockwise direction may have equivalent PCA outputs as a motion gesture drawn in the anticlockwise direction. However, the significance of gesture orientation independency far outweighs the value afforded by gesture directional information. Additionally, users may often be more interested in performing motion gestures of the same shape, rather than motion gestures of the same shape and in the same direction.
Since the directional information in the motion gesture may be lost after PCA transformation, when DTW classifiers are used for gesture recognition, four variations of the transformed test data may need to be tested against the template data to determine if a gesture match is found. Accordingly, a modified DTW classifier may be generated in accordance with various example embodiments. The following example Modified-DTW-Classifier procedure provides one example for modifying the DTW classifier for orientation independent gesture recognition.
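The Modified-DTW-Classifier procedure is likewise not reproduced here; the sketch below (using an unoptimized DTW distance, with names that are assumptions) illustrates the underlying idea of scoring the four sign variations of the transformed test data and keeping the smallest distance:

```python
import numpy as np

def dtw_distance(a, b):
    """Basic dynamic time warping distance between two (N, 2) sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def modified_dtw_classify(templates, test_2d):
    """Return the best-matching template index, ignoring PCA sign ambiguity."""
    # PCA discards direction, so each axis of the test data may be flipped.
    variations = [test_2d * np.array(signs) for signs in
                  [(1, 1), (-1, 1), (1, -1), (-1, -1)]]
    best_idx, best_dist = None, np.inf
    for idx, template in enumerate(templates):
        dist = min(dtw_distance(v, template) for v in variations)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist
```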
For HMM classifiers, an individual HMM model may be learned or derived from each template set {T′_i(x_n, y_n)}, n = 1, …, N. A similar approach may then be applied to modify the HMM classifier for testing the four possibilities, however, with consideration of the maximum probability. The following example Modified-HMM-Classifier procedure provides one example for modifying the HMM classifier for orientation independent gesture recognition.
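As a corresponding sketch for the HMM case (the hmmlearn package and the number of hidden states are assumptions; any HMM library exposing fit and score operations could be substituted), each gesture's model scores the four sign variations and the maximum log-likelihood is kept:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed third-party dependency

def train_hmm_models(template_sets, n_states=5):
    """Fit one HMM per gesture from its set of normalized 2-D templates."""
    models = []
    for templates in template_sets:
        stacked = np.vstack(templates)
        lengths = [len(t) for t in templates]
        models.append(GaussianHMM(n_components=n_states).fit(stacked, lengths))
    return models

def modified_hmm_classify(models, test_2d):
    """Return the gesture index whose model gives the highest likelihood
    over the four sign variations of the PCA-transformed test data."""
    variations = [test_2d * np.array(signs) for signs in
                  [(1, 1), (-1, 1), (1, -1), (-1, -1)]]
    scores = [max(model.score(v) for v in variations) for model in models]
    return int(np.argmax(scores))
```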
In view of the example embodiments provided above,
The example embodiments provided above utilize PCA as a mechanism for generating orientation independent acceleration values for use in motion gesture recognition. An alternative example technique, which may also be combined with the PCA techniques described above, may rely upon device orientation assumptions to determine rotational angles about each of the three dimensional axes to obtain orientation independent values.
In this regard, a heuristic technique may be employed to determine the third rotation angle under an assumption that a user often holds a device in a predictable relationship to a coordinate system defined with respect to the user's body. Accordingly, a full rotation matrix may be derived based on this assumption, which can be utilized to rotate motion gesture data onto a coordinate system that is based on the Earth and the user's body.
According to various example embodiments, the third rotation angle may be determined by observing initial movement/acceleration values of the device, after the device has been rotated to a flat position parallel to Earth. Subsequently, the DTW or HMM classifiers may be applied on the rotated motion gesture test data for orientation independent gesture recognition.
As mentioned above, according to example embodiments of this alternative technique, some assumptions are considered. A first assumption may be that the device's top edge is furthest away from the user compared to the bottom edge. A second assumption may be that the initial orientation of the device is within +/−90 degrees of the z axis defined by the user's body (e.g., a device with a top edge pointing forward with respect to user's body is at 0 degrees relative to the z axis defined by the user's body). A third assumption may be that a two-dimensional vertical plane is used for motion gestures.
According to various example embodiments, the Earth may be used as a reference for motion gestures, and therefore the gravitational force or pull may be utilized as a basis for rotating the acceleration values. In this regard, the rotation may be performed such that the device may be considered to be in a flat position parallel to the Earth's ground and rotated such that the device is pointing to the North pole (azimuth plane) based on the Earth's magnetic field. By doing so, training template and test gesture data may be rotated to the Earth's frame thereby providing a common initial orientation of the device for all gestures.
The orientation information may be recorded just prior to the gesture and when the device is not moving. To be able to do this, the device may include an enabled accelerometer sensor to be able to monitor the gravitational force of the Earth and a magnetometer (or electronic compass) to be able to monitor magnetic/electric fields of the Earth for heading within the azimuth plane. However, magnetometer interference and distortion of magnetic/electric fields can occur depending on the device's surroundings. If the device is located indoors, in a car, or surrounded by high rise buildings where metal structures exist, the magnetometer may be rendered inaccurate. Additionally, continuous calibration of the magnetometer may be required.
Therefore, rather than relying on the use of a magnetometer or similar hardware to rotate the device to a common reference point, example embodiments determine each of three dimensional angles of rotation to rotate the values provided by the accelerometer and determine a common initial orientation of the device.
The example method begins at 300 where the acceleration values are considered relative to the user's body coordinate system. In this regard, the three dimensional accelerometer data x_B, y_B, z_B is considered in view of the assumptions described above, where B represents the body frame of the user. At 310, the top face orientation of the device is verified. In this regard, a check may be performed to determine whether the device's screen or top face is facing up towards the sky by monitoring the initial stationary orientation (just prior to gesture movement) of the device with the accelerometer z axis. If the z axis acceleration is less than zero gravity, each y and z axis accelerometer data point may be multiplied by −1. By doing so, the device is forced to be oriented as if the top face (e.g., the side with the display screen) is facing towards the sky, which facilitates rotation calculations for this orientation.
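A minimal sketch of the check at 310 (the function name and the use of the stationary z-axis reading as the test value are assumptions made for this illustration):

```python
import numpy as np

def force_top_face_up(accel_xyz, initial_z):
    """Mirror the y and z axes if the device started face-down.

    accel_xyz: (N, 3) accelerometer samples in the body frame.
    initial_z: z-axis acceleration recorded while the device was stationary.
    """
    accel = np.array(accel_xyz, dtype=float)
    if initial_z < 0.0:              # z acceleration below zero gravity: face down
        accel[:, 1] *= -1.0          # flip the y axis
        accel[:, 2] *= -1.0          # flip the z axis
    return accel
```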
At 320, the rotation angles about the x and y axes may be determined. One technique for determining the angles is to calculate Euler's angle for θ and φ (x and y axis) based on the initial orientation of the device using the gravitational force or pull. Refer again to
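One common formulation of this calculation, shown here as a sketch only (the exact axis and sign conventions depend on the device and are assumptions), derives the two angles from the stationary gravity reading:

```python
import math

def roll_pitch_from_gravity(ax, ay, az):
    """Estimate the rotation angles about the x and y axes (phi, theta)
    from a stationary accelerometer reading that measures only gravity."""
    phi = math.atan2(ay, az)                       # rotation angle about the x axis
    theta = math.atan2(-ax, math.hypot(ay, az))    # rotation angle about the y axis
    return phi, theta
```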
At 330, the acceleration values may be rotated onto Earth-based x and y coordinate axes. In this regard, each accelerometer data point x_B, y_B, z_B may be rotated to x_E, y_E, z_E, causing a rotation from the user's body coordinate system to an Earth coordinate system for the x and y values. To calculate the rotations, the accelerometer values, in the form of a vector, may be rotated using the rotation matrix R. As a result, the values have been modified as if the device had been rotated into a flat position, parallel to the Earth's ground. R_z(ψ) need not be included at this point, since no third angle rotation information exists.
a_E = [R_x(φ) R_y(θ)]^−1 a_B
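A sketch of this rotation (the elementary rotation matrices follow one common convention, which is an assumption) applied to each accelerometer sample:

```python
import numpy as np

def rotate_to_earth_frame(accel_body, phi, theta):
    """Rotate body-frame acceleration vectors toward the Earth frame,
    i.e. a_E = [Rx(phi) Ry(theta)]^-1 a_B for each sample."""
    rx = np.array([[1.0, 0.0, 0.0],
                   [0.0, np.cos(phi), -np.sin(phi)],
                   [0.0, np.sin(phi), np.cos(phi)]])
    ry = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
    r_inv = np.linalg.inv(rx @ ry)
    return np.asarray(accel_body) @ r_inv.T        # apply to each row vector
```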
At 340, the rotation angle about the z axis may be calculated based on the rotated, initial acceleration values along the x and y axes. In this regard, at the initial point of the gesture movement, the initial values of x and y axis acceleration may be monitored. Using the table below, in association with the content
At 350, the acceleration values may be rotated onto a coordinate system that is defined in the x and y axes relative to the Earth and in the z axis relative to the user's body. In this regard, each accelerometer data point calculated above may again be rotated from x_E, y_E to x_E2, y_E2, where z_E2 = z_E. Accordingly, the values are now completely rotated relative to a common reference point. The x and y axes are rotated to the Earth's frame with respect to the gravitational force of the Earth, and the z axis is rotated with respect to the user's body, where the device is pointing forward.
a_E2 = [R_z(ψ)] a_E
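The table referenced above is not reproduced here; the sketch below therefore substitutes a simple stand-in heuristic, taking ψ directly from the initial in-plane acceleration direction, purely as an illustration of the final rotation a_E2 = [R_z(ψ)] a_E:

```python
import numpy as np

def rotate_about_z(accel_earth, initial_x, initial_y):
    """Derive the third rotation angle psi from the initial x/y acceleration
    of the gesture (a stand-in for the table-based rule) and apply Rz(psi)."""
    psi = -np.arctan2(initial_y, initial_x)        # assumed heuristic for psi
    rz = np.array([[np.cos(psi), -np.sin(psi), 0.0],
                   [np.sin(psi), np.cos(psi), 0.0],
                   [0.0, 0.0, 1.0]])
    return np.asarray(accel_earth) @ rz.T          # a_E2 = Rz(psi) a_E per sample
```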
The rotated values can then be applied to DTW or HMM classifiers to perform gesture recognition. The example methods of
Having described some example embodiments above,
Referring now to
Whether configured as hardware or via instructions stored on a computer-readable storage medium, or by a combination thereof, the processor 505 may be an entity and means capable of performing operations according to example embodiments while configured accordingly. Thus, in example embodiments where the processor 505 is embodied as, or is part of, an ASIC, FPGA, or the like, the processor 505 may be specifically configured hardware for conducting the operations described herein. Alternatively, in example embodiments where the processor 505 is embodied as an executor of instructions stored on a computer-readable storage medium, the instructions may specifically configure the processor 505 to perform the algorithms and operations described herein. In some example embodiments, the processor 505 may be a processor of a specific device (e.g., mobile communications device) configured for employing example embodiments by further configuration of the processor 505 via executed instructions for performing the algorithms, methods, and operations described herein.
The memory device 510 may be one or more tangible and/or non-transitory computer-readable storage media that may include volatile and/or non-volatile memory. In some example embodiments, the memory device 510 comprises Random Access Memory (RAM) including dynamic and/or static RAM, on-chip or off-chip cache memory, and/or the like. Further, memory device 510 may include non-volatile memory, which may be embedded and/or removable, and may include, for example, read-only memory, flash memory, magnetic storage devices (e.g., hard disks, floppy disk drives, magnetic tape, etc.), optical disc drives and/or media, non-volatile random access memory (NVRAM), various types of solid-state storage (e.g., flash memory), and/or the like. Memory device 510 may include a cache area for temporary storage of data. In this regard, some or all of memory device 510 may be included within the processor 505. In some example embodiments, the memory device 510 may be in communication with the processor 505 and/or other components via a shared bus. In some example embodiments, the memory device 510 may be configured to provide secure storage of data, such as, for example, the characteristics of the reference marks, in trusted modules of the memory device 510.
Further, the memory device 510 may be configured to store information, data, applications, computer-readable program code instructions, and/or the like for enabling the processor 505 and the example apparatus 500 to carry out various functions in accordance with example embodiments described herein. For example, the memory device 510 may be configured to buffer input data for processing by the processor 505. Additionally, or alternatively, the memory device 510 may be configured to store instructions for execution by the processor 505.
The I/O interface 506 may be any device, circuitry, or means embodied in hardware or a combination of hardware and software that is configured to interface the processor 505 with other circuitry or devices, such as the user interface 525. In some example embodiments, the I/O interface may embody or be in communication with a bus that is shared by multiple components. In some example embodiments, the processor 505 may interface with the memory 510 via the I/O interface 506. The I/O interface 506 may be configured to convert signals and data into a form that may be interpreted by the processor 505. The I/O interface 506 may also perform buffering of inputs and outputs to support the operation of the processor 505. According to some example embodiments, the processor 505 and the I/O interface 506 may be combined onto a single chip or integrated circuit configured to perform, or cause the apparatus 500 to perform, various functionalities.
In some embodiments, the apparatus 500 or some of the components of apparatus 500 (e.g., the processor 505 and the memory device 510) may be embodied as a chip or chip set. In other words, the apparatus 500 may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon. The apparatus 500 may therefore, in some cases, be configured to implement embodiments on a single chip or as a single “system on a chip.” As such, in some cases, a chip or chipset may constitute means for performing the functionalities described herein and with respect to the processor 505.
The user interface 525 may be in communication with the processor 505 to receive user input via the user interface 525 and/or to present output to a user as, for example, audible, visual, mechanical, or other output indications. The user interface 525 may include, for example, a keyboard, a mouse, a joystick, a display (e.g., a touch screen display), a microphone, a speaker, camera, accelerometer, or other input/output mechanisms. Further, the processor 505 may comprise, or be in communication with, user interface circuitry configured to control at least some functions of one or more elements of the user interface. The processor 505 and/or user interface circuitry may be configured to control one or more functions of one or more elements of the user interface through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processor 505 (e.g., volatile memory, non-volatile memory, and/or the like). The user interface 525 may also be configured to support the implementation of haptic feedback. In this regard, the user interface 525, as controlled by processor 505, may include a vibra, a piezo, and/or an audio device configured for haptic feedback as described herein. In some example embodiments, the user interface circuitry is configured to facilitate user control of at least some functions of the apparatus 500 through the use of a display and configured to respond to user inputs. The processor 505 may also comprise, or be in communication with, display circuitry configured to display at least a portion of a user interface, the display and the display circuitry configured to facilitate user control of at least some functions of the apparatus 500.
The accelerometer sensor 515 may be a hardware device that is configured to measure the direction and magnitude of the acceleration of the sensor and/or the apparatus 500. The accelerometer sensor 515 may be configured to provide acceleration directions and values to the processor 505, via the I/O 506, for analysis as described herein. The accelerometer may be a multi-axis accelerometer that provides the acceleration relative to a three-dimensional coordinate system that may be oriented in accordance with the particular orientation of the apparatus 500 at that time.
The device orientation manager 540 of example apparatus 500 may be any means or device embodied, partially or wholly, in hardware, a computer program product, or a combination of hardware and a computer program product, such as processor 505 implementing stored instructions to configure the example apparatus 500, memory device 510 storing executable program code instructions configured to carry out the functions described herein, or a hardware configured processor 505 that is configured to carry out the functions of the device orientation manager 540 as described herein. In an example embodiment, the processor 505 comprises, or controls, the device orientation manager 540. The device orientation manager 540 may be, partially or wholly, embodied as processors similar to, but separate from processor 505. In this regard, the device orientation manager 540 may be in communication with the processor 505. In various example embodiments, the device orientation manager 540 may, partially or wholly, reside on differing apparatuses such that some or all of the functionality of the device orientation manager 540 may be performed by a first apparatus, and the remainder of the functionality of the device orientation manager 540 may be performed by one or more other apparatuses.
Further, the apparatus 500 and the processor 505 may be configured to perform various functionalities via device orientation manager 540. In this regard, the device orientation manager 540 may be configured to implement the operations described herein. For example, the device orientation manager 540 may be configured to implement the functionality described above with respect to
According to some example embodiments, the device orientation manager 540 may be further configured to perform a Principal Component Analysis (PCA) transformation on the acceleration values to derive two-dimensional transformed values. Additionally, or alternatively, the device orientation manager 540 may be configured to identify a highest valued component and a second highest valued component provided by the PCA transformation, and scale the highest valued component and the second highest valued component to a common magnitude level to generate the two-dimensional transformed values. Further, according to some example embodiments, performing the comparison between the transformed values and the gesture template may include applying Dynamic Time Warping (DTW) classifiers to the transformed values to perform gesture recognition or applying Hidden Markov Model (HMM) classifiers to the transformed values to perform gesture recognition.
Additionally, or alternatively, according to various example embodiments, the device orientation manager 540 may be configured to transform the acceleration values by determining a first rotation angle about a first axis and a second rotation angle about a second axis, rotating the acceleration values relative to a predefined frame to compute preliminary rotated acceleration values, and determining a third rotation angle about a third axis based on rotated acceleration values along the first axis and the second axis. Further, according to some example embodiments, transforming the acceleration values further comprises rotating the preliminary rotated acceleration value for the first axis based on the third rotation angle to derive a final rotated acceleration value for the first axis, and rotating the preliminary rotated acceleration value for the second axis based on the third rotation angle to derive a final rotated acceleration value for the second axis. Additionally, or alternatively, according to some example embodiments, the device orientation manager 540 may be configured to determine a relationship between movement of the device and the orientation of the first axis and the second axis, and select a calculation for the third rotation angle based on the relationship.
Referring now to
The mobile device 10 may also include an antenna 12, a transmitter 14, and a receiver 16, which may be included as parts of a communications interface of the mobile device 10. The speaker 24, the microphone 26, display 28 (which may be a touch screen display), and the keypad 30 may be included as parts of a user interface.
Accordingly, execution of instructions associated with the operations of the flowchart by a processor, or storage of instructions associated with the blocks or operations of the flowcharts in a computer-readable storage medium, support combinations of operations for performing the specified functions. It will also be understood that one or more operations of the flowcharts, and combinations of blocks or operations in the flowcharts, may be implemented by special purpose hardware-based computer systems and/or processors which perform the specified functions, or combinations of special purpose hardware and program code instructions.
Many modifications and other embodiments set forth herein will come to mind to one skilled in the art to which these embodiments pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the embodiments are not to be limited to the specific ones disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Moreover, although the foregoing descriptions and the associated drawings describe example embodiments in the context of certain example combinations of elements and/or functions, it should be appreciated that different combinations of elements and/or functions may be provided by alternative embodiments without departing from the scope of the appended claims. In this regard, for example, different combinations of elements and/or functions other than those explicitly described above are also contemplated as may be set forth in some of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.