The present invention relates generally to motion sensing devices, and more specifically to recognizing motion gestures based on motion sensors of a motion sensing device.
Motion sensors, such as inertial sensors like accelerometers or gyroscopes, can be used in electronic devices. Accelerometers can be used for measuring linear acceleration and gyroscopes can be used for measuring angular velocity of a moved device. The markets for motion sensors include mobile phones, video game controllers, PDAs, mobile internet devices (MIDs), personal navigational devices (PNDs), digital still cameras, digital video cameras, and many more. For example, cell phones may use accelerometers to detect the tilt of the device in space, which allows a video picture to be displayed in an orientation corresponding to the tilt. Video game console controllers may use accelerometers to detect motion of the hand controller that is used to provide input to a game. Picture and video stabilization is an important feature in even low- or mid-end digital cameras, where lens or image sensors are shifted to compensate for hand jittering measured by a gyroscope. Global positioning system (GPS) and location base service (LBS) applications rely on determining an accurate location of the device, and motion sensors are often needed when a GPS signal is attenuated or unavailable, or to enhance the accuracy of GPS location finding.
Most existing portable (mobile) electronic devices tend to use only the most basic motion sensors, such as an accelerometer with “peak detection” or steady state measurements. For example, current mobile phones use an accelerometer to determine tilting of the device, which can be determined using a steady state gravity measurement. Such simple determination cannot be used in more sophisticated applications using, for example, gyroscopes or other applications having precise timing requirements. Without a gyroscope included in the device, the tilting and acceleration of the device are not sensed reliably. And since motion of the device is not always linear or parallel to the ground, measurement of several different axes of motion using an accelerometer or gyroscope is needed for greater accuracy.
More sophisticated motion sensors typically are not used in electronic devices. Some attempts have been made for more sophisticated motion sensors in particular applications, such as detecting motion with certain movements. But most of these efforts have failed or are not robust enough as a product. This is because the use of motion sensors to derive motion is complicated. For example, when using a gyroscope, it is not trivial to identify the tilting or movement of a device. Using motion sensors for image stabilization, for sensing location, or for other sophisticated applications, requires in-depth understanding of motion sensors, which makes motion sensing design very difficult.
Furthermore, everyday portable electronic devices for the consumer market are desired to be low-cost. Yet the most reliable and accurate inertial sensors such as gyroscopes and accelerometers are typically too expensive for many consumer products. Low-cost inertial sensors can be used to bring many motion sensing features to portable electronic devices. However, the accuracy of such low-cost sensors is a limiting factor for more sophisticated functionality.
For example, such functionality can include motion gesture recognition implemented on motion sensing devices to allow a user to input commands or data by moving the device or otherwise causing the device to sense the user's motion. For example, gesture recognition allows a user to easily select particular device functions by simply moving, shaking, or tapping the device. Prior gesture recognition for motion sensing devices typically consists of examining raw sensor data such as data from gyroscopes or accelerometers, and either hard-coding patterns to look for in this raw data, or using machine learning techniques (such as neural networks or support vector machines) to learn patterns from this data. In some cases the required processing resources for detecting gestures using machine learning can be reduced by first using machine learning to learn the gesture, and then hard-coding and optimizing the result of the machine learning algorithm.
Several problems exist with these prior techniques. One problem is that gestures are very limited in their applications and functionality when implemented in portable devices. Another problem is that gestures are often not reliably recognized. For example, raw sensor data is often not the best data to examine for gestures because it can greatly vary from user to user for a particular gesture. In such a case, if one user trains a learning system or hard-codes a pattern detector for that user's gestures, these gestures will not be recognized correctly when a different user uses the device. One example of this is in the rotation of wrist movement. One user might draw a pattern in the air with the device without rotating his wrist at all, but another user might rotate his wrist while drawing the pattern. The resulting raw data will look very different from user to user. A typical solution is to hard-code or train all possible variations of a gesture, but this solution is expensive in processing time and difficult to implement.
Accordingly, a system and method that provides varied, robust and accurate gesture recognition with low-cost inertial sensors would be desirable in many applications.
The invention of the present application relates to mobile devices providing motion gesture recognition. In one aspect, a method for processing motion to control a portable electronic device includes receiving, on the device, sensed motion data derived from motion sensors of the device, where the sensed motion data is based on movement of the portable electronic device in space. The motion sensors provide six-axis motion sensing and include at least three rotational motion sensors and at least three accelerometers. A particular operating mode is determined to be active while the movement of the device occurs, where the particular operating mode is one of a plurality of different operating modes available in the operation of the device. One or more motion gestures are recognized from the motion data, where the one or more motion gestures are recognized from a set of motion gestures that are available for recognition in the active operating mode of the device. Each of the different operating modes of the device, when active, has a different set of motion gestures available for recognition. One or more states of the device are changed based on the one or more recognized motion gestures, including changing output of a display screen on the device.
In another aspect of the invention, a method for recognizing a gesture performed by a user using a motion sensing device includes receiving motion sensor data in device coordinates indicative of motion of the device, the motion sensor data received from a plurality of motion sensors of the motion sensing device including a plurality of rotational motion sensors and linear motion sensors. The motion sensor data is transformed from device coordinates to world coordinates, the motion sensor data in the device coordinates describing motion of the device relative to a frame of reference of the device, and the motion sensor data in the world coordinates describing motion of the device relative to a frame of reference external to the device. A gesture is detected from the motion sensor data in the world coordinates.
In another aspect of the invention, a system for detecting gestures includes a plurality of motion sensors providing motion sensor data, the motion sensors including a plurality of rotational motion sensors and linear motion sensors. At least one feature detector is each operative to detect an associated data feature derived from the motion sensor data, each data feature being a characteristic of the motion sensor data, and each feature detector outputting feature values describing the detected data feature. At least one gesture detector is each operative to detect a gesture associated with the gesture detector based on the feature values.
Aspects of the present invention provide more flexible, varied, robust and accurate recognition of motion gestures from inertial sensor data of a mobile or handheld motion sensing device. Multiple rotational motion sensors and linear motion sensors are used, and appropriate sets of gestures can be recognized in different operating modes of the device. The use of world coordinates for sensed motion data allows minor variations in motions from user to user during gesture input to be recognized as the same gesture without significant additional processing. The use of data features in motion sensor data allows gestures to be recognized with reduced processing compared to processing all the motion sensor data.
The present invention relates generally to motion sensing devices, and more specifically to recognizing motion gestures using motion sensors of a motion sensing device. The following description is presented to enable one of ordinary skill in the art to make and use the invention and is provided in the context of a patent application and its requirements. Various modifications to the preferred embodiment and the generic principles and features described herein will be readily apparent to those skilled in the art. Thus, the present invention is not intended to be limited to the embodiment shown but is to be accorded the widest scope consistent with the principles and features described herein.
To more particularly describe the features of the present invention, please refer to
Device 10 includes an application processor 12, memory 14, interface devices 16, a motion processing unit 20, analog sensors 22, and digital sensors 24. Application processor 12 can be one or more microprocessors, central processing units (CPUs), or other processors which run software programs for the device 10. For example, different software application programs such as menu navigation software, games, camera function control, navigation software, and phone or a wide variety of other software and functional interfaces can be provided. In some embodiments, multiple different applications can be provided on a single device 10, and in some of those embodiments, multiple applications can run simultaneously on the device 10. In some embodiments, the application processor implements multiple different operating modes on the device 10, each mode allowing a different set of applications to be used on the device and a different set of gestures to be detected. This is described in greater detail below with respect to
Multiple layers of software can be provided on a computer readable medium such as electronic memory or other storage medium such as hard disk, optical disk, etc., for use with the application processor 12. For example, an operating system layer can be provided for the device 10 to control and manage system resources in real time, enable functions of application software and other layers, and interface application programs with other software and functions of the device 10. A motion algorithm layer can provide motion algorithms that provide lower-level processing for raw sensor data provided from the motion sensors and other sensors. A sensor device driver layer can provide a software interface to the hardware sensors of the device 10.
Some or all of these layers can be provided in software 13 of the processor 12. For example, in some embodiments, the processor 12 can implement the gesture processing and recognition described herein based on sensor inputs from a motion processing unit (MPU™) 20 (described below). Other embodiments can allow a division of processing between the MPU 20 and the processor 12 as is appropriate for the applications and/or hardware used, where some of the layers (such as lower level software layers) are provided in the MPU. For example, in embodiments allowing processing by the MPU 20, an API layer can be implemented in layer 13 of processor 12 which allows communication of the states of application programs running on the processor 12 to the MPU 20 as well as API commands (e.g., over bus 21), allowing the MPU 20 to implement some or all of the gesture processing and recognition described herein. Some embodiments of API implementations in a motion detecting device are described in co-pending U.S. patent application Ser. No. 12/106,921, incorporated herein by reference in its entirety.
Device 10 also includes components for assisting the application processor 12, such as memory 14 (RAM, ROM, Flash, etc.) and interface devices 16. Interface devices 16 can be any of a variety of different devices providing input and/or output to a user, such as a display screen, audio speakers, buttons, touch screen, joystick, slider, knob, printer, scanner, camera, computer network I/O device, other connected peripheral, etc. For example, one interface device 16 included in many embodiments is a display screen 16a for outputting images viewable by the user. Memory 14 and interface devices 16 can be coupled to the application processor 12 by a bus 18.
Device 10 also can include a motion processing unit (MPU™) 20. The MPU is a device including motion sensors that can measure motion of the device 10 (or portion thereof) in space. For example, the MPU can measure one or more axes of rotation and one or more axes of acceleration of the device. In preferred embodiments, at least some of the motion sensors are inertial sensors, such as gyroscopes and/or accelerometers. In some embodiments, the components to perform these functions are integrated in a single package. The MPU 20 can communicate motion sensor data to an interface bus 21, e.g., I2C or Serial Peripheral Interface (SPI) bus, to which the application processor 12 is also connected. In one embodiment, processor 12 is a controller or master of the bus 21. Some embodiments can provide bus 18 as the same bus as interface bus 21.
MPU 20 includes motion sensors, including one or more rotational motion sensors 26 and one or more linear motion sensors 28. For example, in some embodiments, inertial sensors are used, where the rotational motion sensors are gyroscopes and the linear motion sensors are accelerometers. Gyroscopes 26 can measure the angular velocity of the device 10 (or portion thereof) housing the gyroscopes 26. From one to three gyroscopes can typically be provided, depending on the motion that is desired to be sensed in a particular embodiment. Accelerometers 28 can measure the linear acceleration of the device 10 (or portion thereof) housing the accelerometers 28. From one to three accelerometers can typically be provided, depending on the motion that is desired to be sensed in a particular embodiment. For example, if three gyroscopes 26 and three accelerometers 28 are used, then a 6-axis sensing device is provided providing sensing in all six degrees of freedom.
In some embodiments the gyroscopes 26 and/or the accelerometers 28 can be implemented as MicroElectroMechanical Systems (MEMS). Supporting hardware such as storage registers for the data from motion sensors 26 and 28 can also be provided.
In some embodiments, the MPU 20 can also include a hardware processing block 30. Hardware processing block 30 can include logic or controllers to provide processing of motion sensor data in hardware. For example, motion algorithms, or parts of algorithms, may be implemented by block 30 in some embodiments, and/or part of or all the gesture recognition described herein. In such embodiments, an API can be provided for the application processor 12 to communicate desired sensor processing tasks to the MPU 20, as described above. Some embodiments can include a hardware buffer in the block 30 to store sensor data received from the motion sensors 26 and 28. A motion control 36, such as a button, can be included in some embodiments to control the input of gestures to the electronic device 10, as described in greater detail below.
One example of an MPU 20 is described below with reference to
The device 10 can also include other types of sensors. Analog sensors 22 and digital sensors 24 can be used to provide additional sensor data about the environment in which the device 10 is situated. For example, sensors such as one or more barometers, compasses, temperature sensors, optical sensors (such as a camera sensor, infrared sensor, etc.), ultrasonic sensors, radio frequency sensors, or other types of sensors can be provided. In the example implementation shown, digital sensors 24 can provide sensor data directly to the interface bus 21, while the analog sensors can provide sensor data to an analog-to-digital converter (ADC) 34 which supplies the sensor data in digital form to the interface bus 21. In the example of
A FIFO (first in first out) buffer 42 can be used as a hardware buffer for storing sensor data which can be accessed by the application processor 12 over the bus 21. The use of a hardware buffer such as buffer 42 is described in several embodiments below. For example, a multiplexer 44 can be used to select either the DMA 38 writing raw sensor data to the FIFO buffer 42, or the data RAM 40 writing processed data to the FIFO buffer 42 (e.g., data processed by the ALU 36).
The MPU 20 as shown in
An aspect of the invention pre-processes the raw sensor data of the device 10 by changing coordinate systems or converting to other physical parameters, such that the resulting “augmented data” looks similar for all users regardless of the small, unintentional differences in user motion. This augmented data can then be used to train learning systems or hard-code pattern recognizers resulting in much more robust gesture recognition, and is a cost effective way of utilizing motion sensor data from low-cost inertial sensors to provide a repeatable and robust gesture recognition.
Some embodiments of the invention use inertial sensors such as gyroscopes and/or accelerometers. Gyroscopes output angular velocity in device coordinates, while accelerometers output the sum of linear acceleration in device coordinates and tilt due to gravity. The outputs of gyroscopes and accelerometers are often not consistent from user to user, or even for the same user at different times, despite the users intending to perform or repeat the same gestures. For example, when a user rotates the device in a vertical direction, a Y-axis gyroscope may sense the movement; however, with a different wrist orientation of a user, the Z-axis gyroscope may sense the movement.
Training a system to respond to the gyroscope signal differently depending on the tilt of the device (where the tilt is extracted from the accelerometers and the X-axis gyroscope) would be very difficult. However, doing a coordinate transform from device coordinates to world coordinates simplifies the problem. Two users providing different device tilts are both rotating the device downward relative to the world external to the device. If the augmented data angular velocity in world coordinates is used, then the system will be more easily trained or hard-coded, because the sensor data has been processed to look the same for both users.
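By way of non-limiting illustration, the following Python sketch shows how two different device-coordinate gyroscope readings can map to the same world-coordinate angular velocity. The 90-degree roll, the example readings, and the helper names rot_x and to_world are assumptions chosen for the example and are not taken from the described embodiments.

```python
import math

def rot_x(angle):
    """Rotation matrix (device -> world) for a roll of `angle` radians about X."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def to_world(R, v):
    """Transform a device-coordinate vector into world coordinates: R * v."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

# Hypothetical user A holds the device level, so the rotation is
# sensed on the device Y axis.
w_world_a = to_world(rot_x(0.0), [0.0, 1.0, 0.0])

# Hypothetical user B holds the device rolled 90 degrees, so the same
# physical rotation is sensed on the device Z axis (negative sign for
# this roll direction).
w_world_b = to_world(rot_x(math.pi / 2), [0.0, 0.0, -1.0])
```

In this sketch both users produce approximately the same world-coordinate vector, so a single trained or hard-coded pattern could serve both.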
In the examples of
When sensing the motion of
The two styles of movement can be made to appear the same by providing augmented data by first converting the sensor data from device coordinates to world coordinates.
In some cases, recognizing gestures only relative to the world may not produce the desired augmented data. When using a device 10 that is portable, the user may not intend to perform motion gestures relative to the world. As shown in
One way to avoid these problems is to examine what the user is trying to do. The user performs gestures relative to his or her own body, which may be vertical or horizontal; this is called “human body coordinates.” Another way to describe “human body coordinates” is as “local world coordinates.”
However, it is not possible to measure local world coordinates directly without also having sensors on the user's body. An indirect way to accomplish the same task is to assume that the device is being held in a particular way by the user relative to the user's body when the gesture is attempted and so the user's body position can be assumed based on the device position to approximate local world coordinates. When the device is moved slowly, the local world coordinate system is updated and moved while the device is being moved, so that the local world coordinate system tracks the direction of the user's body. It is assumed that with slow movement, the user is simply looking at or adjusting the device without intending to input any gestures, and the local world coordinate system should thus track the user orientation. The slow movement can be determined as movement under a predetermined threshold velocity or other motion-related threshold. For example, when the angular velocity of the device 10 (as determined from gyroscope data) is under a threshold angular velocity, and the linear velocity of the device 10 (as determined from accelerometer data) is under a threshold linear velocity, the movement can be considered slow enough to update the local world coordinate system with the movement of the device. Alternatively, one of the angular velocity or linear velocity can be examined for this purpose.
However, when the device is moved quickly (over the threshold(s)), the movement is assumed to be for inputting a gesture, and the local world coordinate system is kept fixed while the device is moving. The local world coordinate system for the gesture will then be the local world coordinate system just before the gesture started; the assumption is that the user was directly looking at a screen of the device before beginning the gesture and the user remains approximately in the same position during the gesture. Thus, while the device is stationary or being moved slowly, the “world” is updated, and when the device is moved quickly, the gesture is analyzed relative to the last updated “world,” or “local world.”
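The slow/fast rule described above can be sketched as follows. This is a minimal Python illustration only; the class name LocalWorldTracker, the threshold values, and the use of a single heading angle as the "orientation" are hypothetical simplifications, not part of the described embodiments.

```python
class LocalWorldTracker:
    """Tracks a 'local world' reference that follows slow motion and
    freezes during fast (gesture) motion."""

    def __init__(self, max_angular_speed, max_linear_speed, initial_orientation):
        self.max_angular_speed = max_angular_speed  # assumed threshold
        self.max_linear_speed = max_linear_speed    # assumed threshold
        self.reference = initial_orientation

    def update(self, orientation, angular_speed, linear_speed):
        """Return (reference, in_gesture) for the current sample."""
        slow = (angular_speed < self.max_angular_speed
                and linear_speed < self.max_linear_speed)
        if slow:
            # Slow movement: the user is assumed to be adjusting the
            # device, so the local world follows the device.
            self.reference = orientation
        # Fast movement: the reference is left unchanged so the gesture
        # is analyzed relative to the last updated local world.
        return self.reference, not slow

tracker = LocalWorldTracker(max_angular_speed=30.0, max_linear_speed=0.2,
                            initial_orientation=0.0)
ref_slow, gesture_slow = tracker.update(10.0, angular_speed=5.0, linear_speed=0.05)
ref_fast, gesture_fast = tracker.update(80.0, angular_speed=200.0, linear_speed=0.05)
```

Here the first sample moves the reference to the new orientation, while the second sample is treated as gesture input and leaves the reference where it was.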
Thus, motion sensor data in device coordinates is received from the sensors of the device, where the data in device coordinates describes motion of the device relative to a frame of reference of the device. The data in the device coordinates is transformed to augmented motion sensor data in world coordinates, such as local world coordinates, where the data in world coordinates describes motion of the device relative to a frame of reference external to the device. In the case of local world coordinates, the frame of reference is the user's body. A gesture can be detected more accurately and robustly from the motion sensor data in the world coordinates.
System 150 includes a gyroscope calibration block 152 that receives the raw sensor data from the gyroscopes 26 and which calibrates the data for accuracy. The output of the calibration block 152 is angular velocity in device coordinates 170, and can be considered one portion of the augmented sensor data provided by system 150.
System 150 also includes an accelerometer calibration block 154 that receives the raw sensor data from the accelerometers 28 and which calibrates the data for accuracy. For example, such calibration can be the subtraction or addition of a known constant determined for the particular accelerometer or device 10. The gravity removal block 156 receives the calibrated accelerometer data and removes the effect of gravity from the sensor data, thus leaving data describing the linear acceleration of the device 10. This linear acceleration data 180 is one portion of the augmented sensor data provided by system 150. The removal of gravity uses a gravity acceleration obtained from other components, as described below.
A gravity reference block 158 also receives the calibrated accelerometer data from calibration block 154 and provides a gravity vector to the gyroscope calibration block 152 and to a 3-D integration block 160. The 3-D integration block 160 receives the gravity vector from gravity reference block 158 and the calibrated gyroscope data from calibration block 152. The 3-D integration block combines the gyroscope and accelerometer data to produce a model of the orientation of the device using world coordinates. This resulting model of device orientation is the quaternion/rotation matrix 174 and is one portion of the augmented sensor data provided by system 150. Matrix 174 can be used to provide world coordinates for sensor data from existing device coordinates.
A coordinate transform block 162 receives calibrated gyroscope data from calibration block 152, as well as the model data from the 3-D integration block 160, to produce an angular velocity 172 of the device in world coordinates, which is part of the augmented sensor data produced by the system 150. A coordinate transform block 164 receives calibrated linear acceleration data from the remove gravity block 156, as well as the model data from the 3-D integration block 160, to produce a linear acceleration 176 of the device in world coordinates, which is part of the augmented sensor data produced by the system 150.
Gravitational acceleration data 178 in device coordinates is produced as part of the augmented sensor data of the system 150. The acceleration data 178 is provided by the quaternion/rotation matrix 174 and is a combination of gyroscope data and accelerometer data to obtain gravitational data. The acceleration data 178 is also provided to the remove gravity block 156 to allow gravitational acceleration to be removed from the accelerometer data (to obtain the linear acceleration data 180).
One example follows of the 3-D integration block combining gyroscope and accelerometer data to produce a model of the orientation of the device using world coordinates. Other methods can be used in other embodiments.
The orientation of the device is stored in both quaternion form and rotation matrix form. To update the quaternion, first the raw accelerometer data is rotated into world coordinates using the previous rotation matrix:
a′=Ra
The vector a contains the raw accelerometer data, R is the rotation matrix representing the orientation of the device, and a′ is the resulting acceleration term in world coordinates. A feedback term is generated from the cross product of a′ with a vector representing gravity:
f=k(a′×g)
Constant k is a time constant which determines the timescale in which the acceleration data is used. A quaternion update term is generated from this by multiplying with the current quaternion:
qaccelerometer=fq
A similar update term is generated from the gyroscope data using quaternion integration:
qgyroscope=0.5qw(dt)
The vector w contains the raw gyroscope data, q is the current quaternion, and dt is the sample time of the sensor data. The quaternion is updated as follows:
q′=normalize(q+qaccelerometer+qgyroscope)
This new quaternion becomes the “current quaternion,” and can be converted to a rotation matrix. Angular velocity from both accelerometers and gyroscopes can be obtained as follows:
wdevice=q⁻¹(qaccelerometer+qgyroscope/(0.5dt))
Angular velocity in world coordinates can be obtained as follows:
wworld=Rwdevice
Linear acceleration in world coordinates can be obtained as follows:
aworld=a′−g
Linear acceleration in device coordinates can be obtained as follows:
adevice=R⁻¹aworld
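The quaternion update described above can be sketched in Python as follows. This is a hedged illustration rather than a definitive implementation: it assumes quaternions stored as (w, x, y, z) with Hamilton products, treats f and w as pure quaternions (zero scalar part), expresses acceleration in units of g with gravity g=(0, 0, 1) in world coordinates, and uses an arbitrary k=0.05. For brevity, the world-coordinate angular velocity is taken from the raw gyroscope data only. All function names are hypothetical.

```python
import math

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0])

def qmul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw*qw - px*qx - py*qy - pz*qz,
            pw*qx + px*qw + py*qz - pz*qy,
            pw*qy - px*qz + py*qw + pz*qx,
            pw*qz + px*qy - py*qx + pz*qw)

def to_matrix(q):
    """Rotation matrix R (device -> world) for a unit quaternion."""
    w, x, y, z = q
    return ((1 - 2*(y*y + z*z), 2*(x*y - w*z), 2*(x*z + w*y)),
            (2*(x*y + w*z), 1 - 2*(x*x + z*z), 2*(y*z - w*x)),
            (2*(x*z - w*y), 2*(y*z + w*x), 1 - 2*(x*x + y*y)))

def matvec(R, v):
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def update(q, accel, gyro, dt, k=0.05, g=(0.0, 0.0, 1.0)):
    """One update step: q' = normalize(q + q_accelerometer + q_gyroscope)."""
    a_prime = matvec(to_matrix(q), accel)            # a' = R a
    f = tuple(k * c for c in cross(a_prime, g))      # f = k (a' x g)
    q_acc = qmul((0.0,) + f, q)                      # q_accelerometer = f q
    q_gyr = tuple(0.5 * dt * c                       # q_gyroscope = 0.5 q w dt
                  for c in qmul(q, (0.0,) + tuple(gyro)))
    total = tuple(a + b + c for a, b, c in zip(q, q_acc, q_gyr))
    norm = math.sqrt(sum(c * c for c in total))
    return tuple(c / norm for c in total)

def world_outputs(q, accel, gyro, g=(0.0, 0.0, 1.0)):
    """Return (w_world, a_world, a_device) for the current orientation."""
    R = to_matrix(q)
    w_world = matvec(R, gyro)                        # w_world = R w_device
    a_world = tuple(a - b for a, b in zip(matvec(R, accel), g))  # a' - g
    R_inv = tuple(zip(*R))                           # inverse of R is its transpose
    a_device = matvec(R_inv, a_world)                # a_device = R^-1 a_world
    return w_world, a_world, a_device

# Example: a stationary device whose accelerometer reading is tilted
# 0.5 rad away from the assumed gravity direction.
q = (1.0, 0.0, 0.0, 0.0)
tilted = (0.0, math.sin(0.5), math.cos(0.5))
for _ in range(500):
    q = update(q, tilted, (0.0, 0.0, 0.0), dt=0.01)
a_settled = matvec(to_matrix(q), tilted)
```

In this example the accelerometer feedback gradually rotates the orientation estimate until the measured acceleration, expressed in world coordinates, lines up with the gravity vector. A larger k would make the estimate follow the accelerometer more quickly at the cost of more sensitivity to linear acceleration.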
Relative timing of features in motion data can be used to improve gesture recognition. Different users may perform gestures faster or slower relative to each other, which can make gesture recognition difficult. Some gestures may require particular features (i.e., characteristics) of the sensor data to occur in a particular sequence and with a particular timing. For example, a gesture may be defined as three features occurring in a sequence. For one user, feature 2 might occur 100 ms after feature 1, and feature 3 might occur 200 ms after feature 2. For a different user performing the gesture more slowly, feature 2 might occur 200 ms after feature 1, and feature 3 might occur 400 ms after feature 2. If the required timing values are hard-coded, then many different ranges of values will need to be stored, and it will be difficult to cover all possible user variances and scenarios.
To provide a more flexible recognition of gestures that takes into account variance in gesture feature timing, an aspect of the present invention recognizes gestures using relative timing requirements. Thus the timing between different features in motion data can be expressed and detected based on multiples and/or fractions of a basic time period used in that gesture. The basic time period can be, for example, the time between two data features. For example, when relative timing is used, for whatever time t1 exists between features 1 and 2 of a gesture, the time between features 2 and 3 can be defined as approximately two times t1. This allows different users to perform gestures at different rates without requiring algorithms such as Dynamic Time Warping, which are expensive in CPU time.
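As a minimal sketch of such relative timing, the following Python function checks whether the intervals between detected features match expected multiples of the first interval. The function name, the tolerance value, and the example timestamps are assumptions for illustration.

```python
def matches_relative_timing(times_ms, ratios, tolerance=0.25):
    """Check feature timing relative to the first interval.

    times_ms: timestamps of the detected features, in order.
    ratios:   expected length of each later interval as a multiple of
              the first interval (the basic time period t1).
    """
    intervals = [b - a for a, b in zip(times_ms, times_ms[1:])]
    if len(intervals) != len(ratios) + 1 or intervals[0] <= 0:
        return False
    base = intervals[0]
    return all(abs(interval / base - ratio) <= tolerance * ratio
               for interval, ratio in zip(intervals[1:], ratios))

# Feature 3 is expected roughly two basic periods after feature 2.
fast_user = [0, 100, 300]   # t1 = 100 ms, then 200 ms
slow_user = [0, 200, 600]   # t1 = 200 ms, then 400 ms
fast_ok = matches_relative_timing(fast_user, [2.0])
slow_ok = matches_relative_timing(slow_user, [2.0])
```

Both users satisfy the same relative requirement even though their absolute timings differ by a factor of two.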
Relative peaks or magnitudes in motion sensor data can also be used to improve gesture recognition. Similar to the variance in timing of features when gestures are performed by different users or at different times as described above, one user may perform a gesture or provide features with more energy or speed or quickness than a different user, or with variance at different times. For example, a first user may perform movement causing a first feature that is detected by a gyroscope as 100 degrees per second, and causing a second feature that is detected by the gyroscope as 200 degrees per second, while a second user may perform movement causing the first feature that is detected as 200 degrees per second and causing a second feature that is detected as 400 degrees per second. Hard-coding these values for recognition would require training a system with all possible combinations. One aspect of the present invention expresses the features as peak values (maximum or minimum) that are relative to each other within the gesture, such as multiples or fractions of a basic peak magnitude. Thus, if a first peak of a gesture is detected as a magnitude of p1, a second peak must have a magnitude roughly twice p1 to satisfy the requirements of the gesture and be recognized as such.
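A comparable sketch for relative peak magnitudes is given below. The template values, the tolerance, and the function name are hypothetical; the sketch assumes the first template entry and the first detected peak are nonzero.

```python
def matches_relative_peaks(peaks, template, tolerance=0.2):
    """Compare detected peak magnitudes with a template of relative sizes.

    peaks:    detected peak magnitudes (e.g., degrees per second).
    template: expected magnitudes relative to each other, e.g. [1.0, 2.0]
              when the second peak should be roughly twice the first.
    """
    if len(peaks) != len(template) or peaks[0] == 0 or template[0] == 0:
        return False
    scale = peaks[0] / template[0]   # basic peak magnitude p1
    return all(abs(peak - scale * rel) <= tolerance * abs(scale * rel)
               for peak, rel in zip(peaks, template))

first_user_ok = matches_relative_peaks([100.0, 200.0], [1.0, 2.0])
second_user_ok = matches_relative_peaks([200.0, 400.0], [1.0, 2.0])
```

The two users in the example above differ in absolute speed but produce the same relative pattern, so both pass the check.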
One method of the present invention to more accurately determine whether detected motion is intended for a gesture is to correlate an angular gesture with linear acceleration. The presence of linear acceleration indicates that a user is moving the device using the wrist or elbow, rather than just adjusting the device in the hand.
In another method, motion sensor data that may include a gesture is compared to a background noise floor acquired while no gestures are being detected. The noise floor can filter out motions caused by a user with shaky hands, or motions caused by an environment in which there is a lot of background motion, such as on a train. To prevent the gesture triggering due to noise, the signal to noise ratio of the motion sensor data must be above a noise floor value that is predetermined, or dynamically determined based on current detected conditions (e.g., a current noise level can be detected by monitoring motion sensor data over a period of time). In cases with a lot of background noise, the user can still deliver a gesture, but the user will be required to use more power when performing the gesture.
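One possible way to realize a dynamically determined noise floor is sketched below in Python. The exponential smoothing, the minimum signal-to-noise ratio of 3, and the class name NoiseGate are assumptions made for the example; other estimators could be used.

```python
class NoiseGate:
    """Tracks a background noise level and gates gesture candidates."""

    def __init__(self, min_snr=3.0, smoothing=0.05, initial_noise=0.01):
        self.min_snr = min_snr        # assumed required signal-to-noise ratio
        self.smoothing = smoothing    # assumed smoothing factor
        self.noise = initial_noise

    def observe_idle(self, magnitude):
        """Update the noise estimate while no gesture is being detected."""
        self.noise += self.smoothing * (magnitude - self.noise)

    def passes(self, magnitude):
        """Return True if the motion magnitude stands out from the noise."""
        return magnitude / max(self.noise, 1e-12) >= self.min_snr

quiet = NoiseGate()
for _ in range(200):
    quiet.observe_idle(0.02)          # steady hand, little background motion

train = NoiseGate()
for _ in range(200):
    train.observe_idle(0.1)           # e.g., background vibration on a train

quiet_accepts = quiet.passes(0.1)
train_accepts = train.passes(0.1)
train_accepts_strong = train.passes(0.5)
```

In the noisier setting the same motion no longer triggers, and the user has to perform the gesture with more power for it to pass the gate.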
The method starts at 202, and in step 203, sensed motion data is received from the sensors 26 and 28, including multiple gyroscopes (or other rotational sensors) and accelerometers as described above. The motion data is based on movement of the device 10 in space. In step 204, the active operating mode of the device 10 is determined, i.e., the operating mode that was active when the motion data was received.
An “operating mode” of the device provides a set of functions and outputs for the user based on that mode, where multiple operating modes are available on the device 10, each operating mode offering a set of different functions for the user. In some embodiments, each operating mode allows a different set of applications to be used on the device. For example, one operating mode can be a telephone mode that provides application programs for telephone functions, while a different operating mode can provide a picture or video viewer for use with a display screen 16a of the device 10. In some embodiments, operating modes can correspond to broad applications, such as games, image capture and processing, and location detection (e.g., as described in copending application Ser. No. 12/106,921). Alternatively, in other embodiments, operating modes can be defined more narrowly based on other functions or application programs.
The active operating mode is one operating mode that is selected for purposes of method 200 when the motion data was received, and this mode can be determined based on one or more device operating characteristics. For example, the mode can be determined based on user input, such as the prior selection of a mode selection button or control or a detected motion gesture from the user, or other movement and/or orientation of the device 10 in space. The mode may alternatively or additionally be determined based on a prior or present event that has occurred or is occurring; for example, a cellular phone operating mode can automatically be designated the active operating mode when the device 10 receives a telephone call or text message, and while the user responds to the call.
In step 205, a set of gestures is selected, this set of gestures being available for recognition in the active operating mode. In preferred embodiments, at least two different operating modes of the device 10 each has a different set of gestures that is available for recognition when that mode is active. For example, one operating mode may be receptive to character gestures and shake gestures, while a different operating mode may only be receptive to shake gestures.
In step 206, the received motion data (and any other relevant data) is analyzed and one or more motion gestures are recognized in the motion data, if any such gestures are present and correctly recognized. The gestures recognized are included in the set of gestures available for the active operating mode. In step 207, one or more states of the device 10 are changed based on the recognized motion gesture(s). The modification of states of the device can be the changing of a status or display, the selection of a function, and/or the execution or activation of a function or program. For example, one or more functions of the device can be performed, such as updating the display screen 16a, answering a telephone call, sending out data to another device, entering a new operating mode, etc., based on which gesture(s) were recognized. The process 200 is then complete at 208.
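The flow of steps 204 through 206 can be sketched as below. The mode names, gesture names, and the dictionary-of-recognizers structure are hypothetical and serve only to show that recognition is restricted to the gesture set of the active operating mode.

```python
# Hypothetical mode and gesture names, used only for illustration.
GESTURES_BY_MODE = {
    "text_entry": {"character", "shake"},
    "telephone": {"shake"},
}


def process_motion(motion_data, active_mode, recognizers):
    """Select the active mode's gesture set, then run only those recognizers.

    recognizers maps a gesture name to a callable that returns True when
    that gesture is present in motion_data (assumed interface).
    """
    available = GESTURES_BY_MODE.get(active_mode, set())
    recognized = []
    for name in sorted(available):
        recognizer = recognizers.get(name)
        if recognizer is not None and recognizer(motion_data):
            recognized.append(name)
    return recognized


# Toy recognizers standing in for real detection logic.
toy_recognizers = {
    "character": lambda data: True,
    "shake": lambda data: max(data) > 1.0,
}
in_phone_mode = process_motion([0.0, 2.0], "telephone", toy_recognizers)
in_text_mode = process_motion([0.0, 2.0], "text_entry", toy_recognizers)
```

The same motion data yields only a shake gesture in the telephone mode, while the text entry mode is also receptive to character gestures.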
Examples of types of motion gestures suitable for use with the device 10 are described below.
A shake gesture typically involves the user intentionally shaking the motion sensing device in one angular direction to trigger one or more associated functions of the device. For example, the device might be shaken in a “yaw direction,” with a peak appearing on only one gyroscope axis. If the user shakes with some cross-axis error (e.g., motion in another axis besides the one gyroscope axis), there may be a peak along another axis as well. The two peaks occur at the same time, and the zero-crossings (corresponding to the change of direction of the motion sensing device during the shaking) also occur at the same time. As there are three axes of rotation (roll, pitch, and yaw), each can be used as a separate shaking command.
A tap gesture typically includes the user hitting or tapping of the device 10 with a finger, hand, or object sufficiently to cause a large pulse of movement of the device in space. The tap gesture can be used to control any of a variety of functions of the device.
Rejecting spikes in motion sensor data due to movement other than tapping can also be difficult. In one method to make this rejection more robust, spikes having significant amplitude are rejected if they occur at the end of device movements (e.g., the end of the portion of motion data being examined). The assumption is that the device 10 was relatively motionless before a tap gesture occurred, so the tap gesture causes a spike at the start of a movement. However, a spike may also appear at the end of a movement, due to a sudden stop of the device by the user. This end spike should be rejected.
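One way to sketch this rejection is to accept a spike as a tap only when the samples immediately before it are quiet, so that a spike caused by a sudden stop at the end of a movement is discarded. The thresholds, the window length, and the use of non-negative motion magnitudes are assumptions of this sketch.

```python
def find_taps(magnitudes, spike_threshold=5.0, quiet_threshold=1.0, quiet_window=3):
    """Return indices of spikes that were preceded by a quiet (motionless) window.

    magnitudes: non-negative motion magnitudes, one per sample (assumed input).
    A spike with fewer than quiet_window samples of history is not accepted,
    because the quiet condition cannot be confirmed.
    """
    taps = []
    for i, value in enumerate(magnitudes):
        if value < spike_threshold:
            continue
        before = magnitudes[max(0, i - quiet_window):i]
        if len(before) == quiet_window and all(s < quiet_threshold for s in before):
            taps.append(i)
    return taps


device_at_rest = [0.1, 0.2, 0.1, 8.0, 0.3]   # spike at the start of movement
sudden_stop = [0.1, 2.0, 3.0, 2.5, 8.0]      # spike at the end of a movement
taps_at_rest = find_taps(device_at_rest)
taps_after_motion = find_taps(sudden_stop)
```

The first sequence yields a tap at the spike, while the spike that ends the second sequence is rejected.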
Tap gestures can be used in a variety of ways to initiate device functions in applications or other programs running on the device. For example, in one example embodiment, tapping can be configured to cause a set of images to move on a display screen such that a previously-visible or highlighted image moves and a next image available becomes highlighted or otherwise visible. Since tapping has no direction associated with it, this tap detection may be coupled with an additional input direction detection to determine which direction the images should move. For example, if the device is tilted in a left direction, the images move left on a display screen. If the device is tilted in a backward direction, the images move backward, e.g., moving “into” the screen in a simulated 3rd dimension of depth. This feature allows the user to simultaneously control the time of movement of images (or other displayed objects) and the direction of movement of the images using tilting and tap gestures.
Some examples of other gestures suitable for use with the present invention are described below.
Characters (including letters, numbers, and other symbols) can be considered as combinations of linear and circle movements of the device 10. By combining the detection algorithms for lines and circles, characters can be detected. Since precise angular movement on the part of the user is usually not possible, the representation can be approximate.
Other gestures can also be defined as desired. Any particular gesture can be defined as requiring one or more of the above gestures, or other types of gestures, in different combinations. For example, gestures for basic yaw, pitch, and roll movements of the device 10 can be defined, as movements in each of these axes. These gestures can also be combined with other gestures to define a compound gesture.
In addition, in some embodiments a gesture may be required to be input and detected multiple times, for robustness, i.e., to make sure the intended gesture has been detected. For example, three shake gestures may be required to be detected, in succession, to detect the three as a single gesture and to implement the function(s) associated with the shake gesture. Or, three tap gestures may be required to be detected instead of just one.
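The repeated-detection requirement might be sketched as follows, assuming each detection carries a timestamp in seconds and that "in succession" means that consecutive detections are no more than `max_gap` seconds apart. Both assumptions are illustrative and not stated in the text.

```python
def repeated_gesture_detected(timestamps, required=3, max_gap=0.5):
    """True if `required` detections occur in a row, each within max_gap of the last."""
    run = 1 if timestamps else 0
    if run >= required:
        return True
    for previous, current in zip(timestamps, timestamps[1:]):
        # Extend the run if the gap is short enough; otherwise start a new run.
        run = run + 1 if current - previous <= max_gap else 1
        if run >= required:
            return True
    return False


three_quick_shakes = repeated_gesture_detected([0.0, 0.3, 0.6])
interrupted_shakes = repeated_gesture_detected([0.0, 0.3, 2.0])
```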
Applications using a Motion Control
One feature of the present invention for increasing the ability to accurately detect motion gestures involves using gestures (device motion) in combination with input detected from an input control device of the motion sensing device 10. The input control provides an indication for the device to detect gestures during device motion intended by the user for gesture input. For example, a button, switch, knob, wheel, or other input control device, all referred to herein as a “motion control” 36 (as shown in
The motion control on the device can be used to determine whether the device is in a “motion mode” or not. When the device is in a motion mode, the processor or other controller in the device 10 can allow motion of the device to be detected to modify the state of the device, e.g., detected as a gesture. For example, when the motion control is in its inactive state, e.g., when not activated and held by the user, the user moves the device naturally without modifying the state of the device. However, while the motion control is activated by the user, the device is moved to modify one or more states of the device. The modification of states of the device can be the selection of a function and/or the execution or activation of a function or program. For example, a function can be performed on the device in response to detecting a gesture from motion data received while in the motion mode. The device exits the motion mode based on a detected exit event. For example, in this embodiment, the exit event occurs when the motion control is released by the user and the activation signal from the motion control is no longer detected. In some embodiments, the modification of states of the device based on the motion data only occurs after the motion mode has been exited, e.g., after the button is released in this embodiment. When not in the motion mode, the device (e.g., processor or other applicable controller in the device) ignores input sensed motion data for the purposes of motion gesture recognition. In some embodiments, the sensed motion data can still be input and used for other functions or purposes, such as computing a model of the orientation of the device as described previously; or only particular predetermined types of gestures or other motions can still be input and/or recognized, such as a tap gesture which in some embodiments may not function well when used with some embodiments of a motion control.
In other embodiments, all sensed motion data is ignored for any purposes when not in motion mode, e.g., the sensors are turned off. For example, the release of the button may cause a detected spike in device motion, but this spike occurs after release of the button and so is ignored.
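The hold-to-activate behavior can be sketched as a small gate that keeps motion samples only while the motion control is held. The class name `MotionModeGate` and its methods are hypothetical; the sketch follows the embodiment in which captured data is acted on only after the motion control is released.

```python
class MotionModeGate:
    """Collects motion samples only while the motion control is held (illustrative)."""

    def __init__(self):
        self.in_motion_mode = False
        self.buffer = []

    def press(self):
        """Motion control activated: enter motion mode with an empty buffer."""
        self.in_motion_mode = True
        self.buffer = []

    def feed(self, sample):
        """Keep the sample only while in motion mode; otherwise ignore it."""
        if self.in_motion_mode:
            self.buffer.append(sample)

    def release(self):
        """Exit event: leave motion mode and return the data captured while held."""
        self.in_motion_mode = False
        captured, self.buffer = self.buffer, []
        return captured


gate = MotionModeGate()
gate.feed(1.0)             # natural movement before the press: ignored
gate.press()
gate.feed(2.0)
gate.feed(3.0)
captured = gate.release()  # data to hand to gesture recognition
gate.feed(9.0)             # spike caused by releasing the button: ignored
```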
The operation of a motion mode of the device can be dependent on the operating mode of the device. For example, the activation of a motion control to enter motion mode may be required for the user to input motion gestures while the device is in some operating modes, while in other operating modes of the device, no motion control activation is required. For example, when in an image display operating mode which allows scrolling a set of images or other objects across a display screen 16a of the device based on movement of the device, the activation of a motion mode may be required (e.g., by the user holding down the motion control). However, when in a telephone mode in which the user can make or answer cell phone calls, no motion mode activation or motion control activation need be required for the user to input motion gestures to answer the phone call or perform other telephone functions on the device 10. In addition, different operating modes of the device 10 can use the motion control and motion mode in different ways. For example, one operating mode may allow motion mode to be exited only by the user deactivating the motion control, while a different operating mode may allow motion mode to be exited by the user inputting a particular motion gesture.
As an example, a set of icons may be displayed on the display screen of the device that are not influenced by movement of the device while the motion control is not activated. When the motion control on the device is depressed, the motion of the device as detected by the motion sensors can be used to determine which icon is highlighted, e.g., to move a cursor or indicator to different icons. This motion can be detected as, for example, rotation in a particular axis, or in more than one axis (which can be considered a rotation gesture), where the device is rotated in space; or alternatively, as a linear motion or a linear gesture, where the device is moved linearly in space. When the motion control is released, the icon highlighted at release is executed to cause a change of one or more states in the device, e.g., perform an associated function, such as starting an application program associated with the highlighted icon. To aid the user in selecting an icon, additional visual feedback can be presented which is correlated with device motion, such as including a continuously moving cursor overlaid on top of the icons in addition to a discretely moving indicator or cursor that moves directly from icon to icon, or continuously moving an icon a small amount (correlated with device motion) to indicate that particular icon would be selected if the motion control were released.
In another application, a set of images may be displayed in a line on the display screen. When the motion control is depressed, the user can manipulate the set of images forward or backward by moving the device in a positive or negative direction, e.g. as a gesture, such as tilting or linearly moving the device forward (toward the user, as the user looks at the device) or backward (away from the user). When the user moves the device past a predetermined threshold magnitude (e.g., tilting the device more than a predetermined amount), the images may be moved continuously on the screen without additional input from the user. When the motion control is released, the device 10 controls the images to stop moving.
In another application, holding down the button may initiate panning or zooming within an image, map, or web page displayed on a display screen of the device. Rotating the device along different axes may cause panning the view of the display screen along corresponding axes, or zooming along those axes. The different functions may be triggered by different types of movements, or they may be triggered by using different buttons. For example, one motion control can be provided for panning, and a different motion control provided for zooming. If different types of movements are used, thresholding may be used to aid in determining which function should be triggered. For example, if a panning motion is moving the device in one axis and a zooming motion is moving the device in a different axis, both panning and zooming can be activated by moving the device along both axes at once. However, if a panning movement is executed past a certain threshold amount of movement, then the device can implement only panning, ignoring the movement on the zooming axis.
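The thresholding rule for panning and zooming could be sketched as shown below. The function name, the units, and the threshold of 30 are assumed values chosen only for illustration.

```python
def resolve_pan_zoom(pan_motion, zoom_motion, pan_only_threshold=30.0):
    """Return (pan, zoom) amounts; past the threshold, panning suppresses zooming."""
    if abs(pan_motion) > pan_only_threshold:
        return pan_motion, 0.0
    return pan_motion, zoom_motion


small_move = resolve_pan_zoom(10.0, 5.0)   # both functions remain active
large_pan = resolve_pan_zoom(45.0, 5.0)    # movement on the zoom axis is ignored
```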
In some embodiments, the motion control need not be held by the user to activate the motion mode of the device, and/or the exit event is not the release of the motion control. For example, the motion control can be “clicked,” i.e., activated (e.g., pressed) and then released immediately, to activate the motion mode that allows device motion to modify one or more states of the device. The device remains in motion mode after the motion control is clicked. A desired predefined exit event can be used to exit the motion mode when detected, so that device motion no longer modifies device states. For example, a particular shake gesture provided by the user (such as a shake gesture having a predetermined number of shakes) can be detected from the motion data and, when detected, causes the device to exit motion mode. Other types of gestures can be used in other embodiments to exit the motion mode. In still other embodiments, the exit event is not based on user motion. For example, motion mode can be exited automatically based on other criteria, such as the completion of a detected gesture (when the gesture is detected correctly by the device).
In order to resolve and process human motion on a device 10, it is necessary to acquire sensor data at high rates. For example, a sampling rate such as 100 Hz may be needed. For a one-second gesture and assuming six motion sensors are provided on the device, such a sampling rate requires processing 600 data points for 6 degrees of freedom of motion. However, it is rarely necessary to process all 600 data points, since the human motion can be reduced by extracting important features from the sensor data, such as the magnitude of peaks in the motion waveform, or the particular times of zero crossings. Such data features typically occur at about 2 Hz when the user is performing gestures. Thus, for example, if four features are examined for each of the 6 degrees of freedom, the total number of data points during one second of motion will be 48 points. The amount of data to be processed has thus been reduced by more than a factor of 10 by concentrating only on particular features of movement data, rather than processing all data points describing all of the motion.
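The arithmetic in this example can be checked with a short calculation. The helper name `points_per_gesture` is hypothetical, and the reading that four features per axis arrive at about 2 Hz is an interpretation of the figures given above.

```python
def points_per_gesture(rate_hz, per_axis, axes=6, seconds=1):
    """Values to process: rate x values per axis per sample x axes x duration."""
    return rate_hz * per_axis * axes * seconds


raw_points = points_per_gesture(rate_hz=100, per_axis=1)    # every sample on every axis
feature_points = points_per_gesture(rate_hz=2, per_axis=4)  # four features per axis at 2 Hz
reduction = raw_points / feature_points
print(raw_points, feature_points, reduction)  # prints 600 48 12.5
```

The ratio of 12.5 is consistent with the stated reduction of more than a factor of 10.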
Some example methods of reducing the required sampling rate of data for a device processor by using hardware to find features in motion sensor data are described in copending patent application Ser. No. 12/106,921, previously incorporated herein by reference.
Data features, as referred to herein, are characteristics of the waveform 350 of motion sensor data that can be detected from the waveform 350 and which can be used to recognize that a particular gesture has been performed. Data features can include, for example, a maximum (or minimum) height (or magnitude) 354 of the waveform 350, and a peak time value 356 which is the time at which the maximum height 354 occurred. Additional data features can include the times 358 at which the waveform 350 made a zero crossing (i.e., a change in direction of motion in an axis of movement, such as transitioning from positive values to negative values or vice-versa). Another data feature can include the integral 360 providing a particular area of the waveform 350, such as an integral of the interval between two zero crossings 358 as shown in
System 370 includes a raw data and pre-processing block 372, which receives the raw data from the sensors and also provides or receives augmented data as described above with reference to
The raw and augmented data is provided from block 372 to a number of low-level data feature detectors 374, where each detector 374 detects a different feature in the sensor data. For example, Feature 1 block 374a can detect the peaks in motion waveforms, Feature 2 block 374b can detect zero crossings in motion waveforms, and Feature 3 block 374c can detect and determine the integral of the area under the waveform. Additional feature detectors can be used in different embodiments. Each feature detector 374 provides timer values 376, which indicate the time values appropriate to the data feature detected, and provides magnitude values 378, which indicate magnitudes appropriate to the data feature detected (peak magnitudes, value of integral, etc.).
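A simplified software sketch of the three feature detectors is given below for a single axis of uniformly sampled data. It assumes a non-empty sample list, a fixed sample period `dt`, and rectangle-rule integration between zero crossings. The dictionary keys are hypothetical names, and none of this is presented as the hardware implementation.

```python
def extract_features(samples, dt):
    """Return peak magnitude, peak time, zero-crossing times, and per-lobe integrals."""
    # Peak: the sample with the largest absolute value (maximum or minimum).
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    # Zero crossing: the sign changes between consecutive samples.
    crossings = [
        i for i in range(1, len(samples))
        if (samples[i - 1] < 0 <= samples[i]) or (samples[i - 1] >= 0 > samples[i])
    ]
    # Integral of each lobe, i.e. each stretch between zero crossings.
    bounds = [0] + crossings + [len(samples)]
    integrals = [sum(samples[a:b]) * dt for a, b in zip(bounds, bounds[1:])]
    return {
        "peak_magnitude": samples[peak_index],
        "peak_time": peak_index * dt,
        "zero_crossing_times": [i * dt for i in crossings],
        "integrals": integrals,
    }


features = extract_features([0.0, 1.0, 2.0, 1.0, -1.0, -3.0, -1.0], dt=0.5)
```

Each entry pairs a time value with a magnitude value, which mirrors the timer values 376 and magnitude values 378 passed on to the gesture detectors.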
The timer and magnitude values 376 and 378 are provided to higher-level gesture detectors 380. Gesture detectors 380 each use the timing and magnitude values 376 and 378 from all the feature detectors 374 to detect whether the particular gesture associated with that detector has occurred. For example, gesture detector 380a detects a particular Gesture 1, which may be a tap gesture, by examining the appropriate time and magnitude data from the feature detectors 374, such as the peak feature detector 374a. Similarly, gesture detector 380b detects a particular Gesture 2, and gesture detector 380c detects a particular Gesture 3. As many gesture detectors 380 can be provided as different types of gestures that are desired to be recognized on the device 10.
Each gesture detector 380 provides timer values 382, which indicate the time values at which the gesture was detected, and provides magnitude values 384, which indicate magnitude values describing the data features of the gesture that was detected (peak, integral, etc.).
The raw and augmented data from block 372 is also provided to a monitor 390 that monitors states and abort conditions of the device 10. Monitor 390 includes an orientation block 392 that determines an orientation of the device 10 using the raw and augmented data from processing block 372. The device orientation can be indicated as horizontal, vertical, or other states as desired. This orientation is provided to the gesture detectors 380 for use in detecting appropriate gestures (such as gestures requiring a specific device orientation or a transition from one orientation to another orientation). Monitor 390 also includes a movement block 394 which determines the amount by which the device 10 has moved in space, e.g., angular and linear movement, using the raw and augmented sensor data from block 372. The amount of movement is provided to gesture detectors 380 for use in detecting gestures (such as gestures requiring a minimum amount of movement of the device 10).
Abort conditions 396 are also included in monitor 390 for use in determining whether movement of the device 10 aborts a potentially recognized gesture. The abort conditions include conditions that, when fulfilled, indicate that particular device movement is not a gesture. For example, the background noise described above can be determined, such that movement within the noise amplitude is caused to be ignored by using the abort conditions 396. In another example, certain spikes of motion, such as a spike following a curve as described above with reference to
Final gesture output block 398 receives all the timer and magnitude values from the gesture detectors 380 and also receives the abort indicators from abort conditions block 396. The final block 398 outputs data for non-aborted gestures that were recognized by the gesture detectors 380. The output data can be provided to components of the device 10 (software and/or hardware) that process the gestures and perform functions in response to the recognized gestures.
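The role of the final output block can be sketched as a filter over detected gestures. The text does not specify how abort indicators are matched to gestures, so this sketch assumes each abort indicator is a time window and that a detection is dropped when its time falls inside any such window. The field names are hypothetical.

```python
def final_gesture_output(detections, abort_windows):
    """Return detections whose time does not fall inside any abort window.

    detections: list of dicts with "gesture", "time", and "magnitude" keys (assumed).
    abort_windows: list of (start, end) times during which an abort condition held.
    """
    def aborted(time_value):
        return any(start <= time_value <= end for start, end in abort_windows)

    return [d for d in detections if not aborted(d["time"])]


detections = [
    {"gesture": "tap", "time": 1.0, "magnitude": 8.0},
    {"gesture": "shake", "time": 4.0, "magnitude": 3.5},
]
output = final_gesture_output(detections, abort_windows=[(3.5, 4.5)])
```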
The features from block 404 can be output to a motion logic processing block 406, which is included in a programmable block 408 of the device 10. The programmable block 408, for example, can be implemented as software and/or firmware implemented by a processor or controller. The motion logic can include numerical output and in some embodiments gesture output.
In alternative embodiments, the entire gesture system 370 may run on an external processor that receives raw data from the motion sensors and hard-wired block 402. In some embodiments, the entire gesture system 370 may run in the hard-wired hardware on/with the motion sensors.
Many of the above-described techniques and systems can be implemented with additional or alternative types of sensors besides the gyroscopes and/or accelerometers described above. For example, a six-axis motion sensing device including the gesture recognition techniques described above can include three accelerometers and three compasses. Other types of usable sensors can include optical sensors (visible, infrared, ultraviolet, etc.), magnetic sensors, etc.
Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments and those variations would be within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.
This application claims the benefit of U.S. Provisional Application No. 61/022,143, filed Jan. 18, 2008, entitled, “Motion Sensing Application Interface,” and this application is a continuation-in-part of U.S. patent application Ser. No. 12/106,921 (4360P), filed Apr. 21, 2008, entitled, “Interfacing Application Programs and Motion Sensors of a Device,” all of which are incorporated herein by reference in their entireties.
| | Number | Date | Country |
|---|---|---|---|
| Parent | 12106921 | Apr 2008 | US |
| Child | 12252322 | | US |