Interacting with a smartwatch typically requires the use of both hands/arms. For example, while the watch is worn on one arm, the hand of the other arm interacts with the screen or buttons on the watch. This bi-manual requirement often makes interaction awkward and difficult, thereby reducing the convenience of the smartwatch. For example, if the opposing hand is occupied or inaccessible, such as while holding coffee, holding an umbrella, pushing a stroller, etc., interaction is constrained to passive glancing or reading. Similarly, due to the small size and limited fidelity of the screen on a smartwatch, touch interactions are often cumbersome, involve multiple steps, or are error-prone.
Smartwatches typically implement a limited set of one-handed interactions powered by an inertial measurement unit (IMU), such as an accelerometer, gyroscope, etc. The most common is a “lift to wake” gesture that activates the screen when the watch is lifted in a manner indicative of a user wanting to glance at the screen. Conversely, when the user drops their arm, the screen goes to sleep. Some smartwatches also implement navigation gestures, such as scrolling, that are performed by a user twisting their wrist. However, an IMU only provides for sensing gestures that involve large physical movements of the watch/arm. This is undesirable in many use cases. For example, the gestures are not discreet and may be socially awkward, and they disrupt the user's focus on the watch.
Other types of sensors may not be practical for integration into smartwatches to detect gestures by the user. For example, cameras, electrocardiogram sensors, ultrasonic sensors, strain gauges, etc. are large and expensive to integrate. Moreover, such sensors generally require per-user and/or per-session calibration.
There are many common use cases where a low-fidelity, discreet, single-handed gesture is desirable to interact with a smartwatch. For example, if a user receives a notification and wants to quickly dismiss it, they could perform a quick brushing gesture using fingers of the arm on which the watch is worn. Similarly, if they wanted to send a short reply to a message notification, they could clench their fist, dictate their response, and then release their fist to send it. The gestures may also be continuous, allowing a user to adjust a parameter. For example, the user may bend their fingers to adjust the volume of music playing on their phone, where the amount of finger movement determines the change in volume level. The present disclosure provides for integration of optical sensors into a smartwatch to achieve discreet, single hand control gestures. The optical sensors are small enough to integrate into the watch chassis, efficient to operate, tolerant to the position and fit of the watch, and usable without per-user or per-session calibration for a smooth user experience.
One aspect of the disclosure provides a method for detecting hand motions by a wearable device, including emitting, by a light source, light towards a user's skin, receiving, by an optical sensor, reflected light, reading, by one or more processors, raw sensor data from the optical sensor, filtering, by the one or more processors, the raw sensor data to reduce noise, and identifying, by the one or more processors, features in the filtered signal that correspond to movements of the wrist, hand, or fingers. The method further includes matching, by the one or more processors, the identified features to a specific gestural action, and performing, by the one or more processors, an interface operation corresponding to the specific gestural action. The wearable device may be a smartwatch worn on the user's arm, wherein the light is reflected off target objects, such as hemoglobin, and non-target objects in the user's arm. The filtering may include high-pass filtering and low-pass filtering, and may in some instances further include a median filter adapted to reduce spikes in the raw sensor data. According to some examples, the method may further include receiving sensor data from an inertial measurement unit (IMU), wherein the filtering comprises using the sensor data from the IMU to filter out noise from the raw sensor data from the optical sensor. Identifying features may include identifying changing points in the filtered sensor data. The matching may include matching the identified features to recorded features for particular gestures. The interface operation may control a function of the wearable electronic device.
Another aspect of the disclosure provides a wearable electronic device adapted to receive input from hand gestures of an arm on which the device is worn. The device includes a light source configured to emit light towards the user's arm, an optical sensor adapted to receive reflected light from the user's arm, a memory storing information regarding gestural actions and interface operations, and one or more processors in communication with the optical sensor and the memory. The one or more processors may be configured to receive raw sensor data from the optical sensor, filter the raw sensor data to reduce noise, identify features in the filtered signal that correspond to movements of the wrist, hand, or fingers of the arm wearing the device, match the identified features to a specific gestural action, and perform an interface operation corresponding to the specific gestural action.
Another aspect of the disclosure provides a system, including a photoplethysmogram (PPG) sensor adapted to be positioned in close proximity to a user's arm, an inertial measurement unit (IMU) adapted to be positioned in close proximity to the user's arm, wherein each of the PPG sensor and the IMU is configured to receive data signals measuring movements of the user's arm, and one or more processing units configured to process the received data signals, such processing including extracting features indicating particular hand gestures, and to correlate the extracted features with input operations for the one or more processing units.
The present disclosure provides for an optical sensor embedded into a watch chassis of a smartwatch, with an algorithm to sense gestures from movements of the user's fingers, wrist, and/or arm on the arm that is wearing the watch. The optical sensor is designed to be small and power-efficient, and the algorithm is designed to be robust to noise and tolerant to user variance. The optical sensor may include, for example, an LED and photodiode. For example, a photoplethysmogram (PPG) signal is used to detect specific physical movements/articulations of the fingers/arm/wrist, which are analyzed as gestures that invoke an action on the watch.
Raw sensor data from the photodiode is read and processed to reduce noise. Features in the processed data are identified, wherein such features correspond to movements of the wrist, hand, or fingers. A gesture detection algorithm matches these features, and potentially other signals, to a specific gestural action. An interface action is performed on the device that corresponds to the gestural action.
The raw sensor data may be read through an analogue-to-digital converter (ADC) to quantize the signal so that it can be digitally processed. This signal is then filtered for noise as a preprocessing stage. In some examples, some of this filtering may be performed on the analogue signal before the ADC.
The signal preprocessing may include a low-pass filter to remove high-frequency electrical noise, and a high-pass filter to remove the DC offset and drift. Sources of noise, offset, and drift may include leakage of light from external sources, such as if the watch is not worn tightly, reflectivity of the user's skin, which may vary by skin tone, and temperature of the sensor. The user's heart rate may also be a source of noise that is filtered out or suppressed, for example, by setting a cut-off frequency of the filters appropriately, or by using the calculated heart rate to set an adaptive band-stop filter. In one example, signal preprocessing includes applying a median filter to remove spikes from the signal, using a high-pass filter to filter out the baseline DC offset and drift from the signal, and using a low-pass filter to remove high-frequency noise and to further smooth the signal. The filters are configured such that hand gestures are not filtered out. Hand gestures are low-frequency movements, and an aggressive cut-off frequency may lead to a loss of features caused by hand gestures.
Features are extracted from the resultant preprocessed signal for changes or patterns indicative of wrist, hand, or finger movements. These features are extracted against a background of movements in other parts of the arm that are not of interest, such as bending of the elbow. Any one or more of various approaches may be used to identify these features, such as peak detection, signal variance, Haar-like feature encoding, frequency analysis (spectrogram), etc. Features may also be extracted from a first-order derivative of the signal, indicating a velocity of the motion, and a second-order derivative of the signal, indicating an acceleration of the motion. For each of the processed signal, the velocity of the signal, and the acceleration of the signal, features can be extracted from the time domain and the frequency domain. In general, amplitude changes, such as local maxima and minima, may be used to extract the time-domain features, and a short-time Fourier transform (STFT) may be used to extract the frequency-domain features.
In some cases, thresholds may be applied to these features to screen them for signals that are indicative of actual gestures. For example, local minima and maxima may only be considered if the difference with neighboring minima/maxima is greater than some threshold, thus filtering out points created by unintended movement and noise, and maintaining those representing significant shape changes caused by intended gestures. Further, features may be dependent on other feature thresholds. For example, local minima/maxima may only be considered if there is a spike in the variance of the signal. A spike in variance is indicative of an intentional user gesture, and therefore may be selected to create a window around which the algorithm will attempt to extract features and detect a gesture.
These features inform a gesture detection algorithm that identifies a complete, intentional gesture from a user. The detection algorithm may involve heuristic components, such as thresholds on the feature values or pattern matching metrics, and machine learning components that have been trained on feature samples from both intentional gestures and accidental noise. For example, the features may be reduced using a Linear Discriminant Analysis (LDA) or Quadratic Discriminant Analysis (QDA) to find the boundaries between different gesture classes.
When attempting to detect multiple gesture classes, or those that are not easily discriminable with heuristic methods, machine learning models may be trained on the above features to provide a gesture recognizer. Examples of such machine learning models may include K-Nearest Neighbor, Random Forests, Recurrent Neural Networks, etc. The detector may also use signals from other sensors to filter possible noise. For example, certain gestures may only be available when the user has already lifted their arm, which may be detected by an IMU, and the watch's screen is on.
The system described herein provides for improved user experience, as a user can discreetly manipulate a smartwatch using only the hand on which the smartwatch is being worn. For example, using finger or hand gestures, such as clenching a fist, moving a finger up and down, tapping fingers and thumb together, etc., the user may perform a variety of operations on the smartwatch.
According to other examples, the watch 100 may be worn tighter on one person's wrist than on another person's wrist. Wearing the watch looser may result in ambient light leakage. To account for such leakage, feature extraction may be modified to use features that are robust to light leakage. For example, the watch may detect a level of looseness or tightness on the user's wrist, and adapt the feature extraction techniques accordingly. In other examples, the watch may be shaped to have a convex bottom, so that space between user skin and the watch bottom is minimized regardless of the tightness of the watch band, thereby reducing ambient light leakage.
The smartwatch 100 may include one or more processors 616, one or more memory units 612, as well as other components. For example, the device 100 may include one or more sensors 618, wireless pairing interface 619, and a battery 617.
The memory 612 may store information accessible by the one or more processors 616, including data 614 and instructions 615 that may be executed or otherwise used by the one or more processors 616. For example, memory 612 may be of any type capable of storing information accessible by the processor(s), including a computing device-readable medium, or other medium that stores data that may be read with the aid of an electronic device, such as a volatile memory, a non-volatile memory, as well as other write-capable and read-only memories. By way of example only, memory 612 may be a static random-access memory (SRAM) configured to provide fast lookups. Systems and methods may include different combinations of the foregoing, whereby different portions of the instructions and data are stored on different types of media.
The data 614 may be retrieved, stored or modified by the one or more processors 616 in accordance with the instructions 615. For instance, data 614 may include a correlation of detected features with particular gestures, a correlation of gestures with actions to be taken by the smartwatch 100, and/or any of a variety of other types of data. Although the claimed subject matter is not limited by any particular data structure, the data may be stored in computing device registers, in a relational database as a table having a plurality of different fields and records, XML documents or flat files. The data may also be formatted in any computing device-readable format.
The instructions 615 may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the one or more processors 616. For example, the instructions may be stored as computing device code on the computing device-readable medium. In that regard, the terms “instructions” and “programs” may be used interchangeably herein. The instructions may be stored in object code format for direct processing by the processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. The instructions 615 may be executed to detect a gesture using signals from the sensors 618, determine an action corresponding to the detected gesture, and perform the action. Functions, methods and routines of the instructions are explained in more detail below.
The one or more processors 616 may be microprocessors, logic circuitry (e.g., logic gates, flip-flops, etc.) hard-wired into the device 100 itself, or may be a dedicated application specific integrated circuit (ASIC). It should be understood that the one or more processors 616 are not limited to hard-wired logic circuitry, but may also include any commercially available processing unit, or any hardware-based processors, such as a field programmable gate array (FPGA). In some examples, the one or more processors 616 may include a state machine. The processors 616 may be configured to execute the instructions 615 to, for example, perform a method such as the method described below.
The one or more sensors 618 may include any of a variety of mechanical or electromechanical sensors for detecting gestures. Such sensors may include, for example, an IMU, an optical sensor, such as a photoplethysmogram (PPG), etc. According to some examples, the sensors 618 may further include an accelerometer, gyroscope, barometer, audio sensor, vibration sensor, heat sensor, radio frequency (RF) sensor, etc.
The short range wireless pairing interface 619 may be used to form connections with other devices, such as a smartphone, earbuds, etc. The connection may be, for example, a Bluetooth connection or any other type of wireless pairing. By way of example only, connections with other devices may include an ACL link.
The light source 712 may be, for example, a light emitting diode (LED), activated using an LED driver 714 and an LED controller 716. The LED controller 716 may further be controlled by sensor controller 718. The light emitted from the light source 712 should be strong enough to penetrate into the skin 705. A wavelength of the light may be chosen to maximize the amount of light reflected from a target object and minimize the amount of light reflected by non-target objects. For measuring a heart rate, for example, the target object is hemoglobin in the user's arteries, and the non-target objects are ambient light and the surrounding skin, muscle, and bone. According to some examples, a narrow spectrum of green (~550 nm), red (~660 nm), or infrared (~940 nm) light may be chosen for this purpose.
In the case of detecting the user's heart rate, each cardiac cycle of the heart produces a pulse of pressure to move blood from the heart through the arteries. When observing a single point in an artery, this pulse produces a spike in blood volume as it is pushed through. This change in blood volume produces a commensurate change in the amount of light it reflects from the LED 712 to the photodiode 722. The frequency of these changes, which make up the AC component of the signal, can be analyzed as the user's heart rate. However, as arteries are continuous vessels without valves or switches, any constriction or relaxation to an artery produces a change in pressure characteristics throughout its entire path. For example, if an artery is pinched, the pressure on the side coming from the heart will increase as blood volume builds. Conversely, the pressure will fall on the other side. Changes in the arteries that flow through the arm, into the hand, and to the fingers are analyzed to determine gestures made by a user's hand. The movement of the hand, fingers, and wrist twists and bends these arteries, producing substantial changes to the pressure and blood volume in the arm. These changes from movements in the hand appear as fluctuations in the DC component of the signal from the photodiode 722, and are aggressively filtered out by applications that seek to analyze the heart rate.
The raw sensor data may be read through the ADC 720 to quantize the signal so that it can be digitally processed. This signal is then filtered for noise as a preprocessing stage. In some examples, some of the filtering may be performed on the analogue signal before the ADC 720.
The IMU sensor, mentioned above, may also provide sensor data during preprocessing. For example, the sensor data from the IMU may be used to filter out noise, such as noise caused by gross arm movements, from the raw sensor data from the optical sensor.
PPG signals may be noisy due to various factors, such as electrical noise, light leakage, skin color, or the user's heartbeats. In some examples, the PPG signal may be pre-processed to reduce such noise. For example, a high-pass filter may be used to filter out baseline DC offset and drift from the signal. Alternatively or additionally, a low-pass filter may be used to remove high-frequency noise and to further smooth the signal.
A high-pass filter passes signals of frequency higher than a cutoff frequency. By way of example, a resistor-capacitor (RC) high-pass filter computes each output sample y_i by adding the difference between the last two input samples x_i and x_{i-1} to the last output sample y_{i-1}, weighted by a damping parameter α. Given the sampling frequency f_q of the signal, and a desired cutoff frequency c, an input signal x is filtered as follows:

y_i = α · (y_{i-1} + x_i − x_{i-1}),

where α is the ratio of two components: α = [1/(2π·c)] / [1/(2π·c) + 1/f_q]. Other implementations may use, for example, finite impulse response (FIR) or infinite impulse response (IIR) discrete-time filters to achieve a similar response.
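By way of illustration only, the recurrence above may be sketched in a few lines of Python; the disclosure does not specify any language or library, so the function name and parameters here are assumptions:

```python
import numpy as np

def rc_high_pass(x, fq, c):
    """Discrete RC high-pass: y_i = alpha * (y_{i-1} + x_i - x_{i-1})."""
    x = np.asarray(x, dtype=float)
    rc = 1.0 / (2.0 * np.pi * c)      # RC time constant for cutoff frequency c
    alpha = rc / (rc + 1.0 / fq)      # damping parameter, as defined above
    y = np.zeros_like(x)              # y_0 initialized to zero
    for i in range(1, len(x)):
        y[i] = alpha * (y[i - 1] + x[i] - x[i - 1])
    return y
```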
A low-pass filter only passes signals with a frequency that is lower than a cutoff. The low-pass filter may be implemented, for example, by computing a convolution between the input signal x and a coefficient window w which moves through the input signal:

y_i = Σ_{j=0}^{l−1} w_j · x_{i−j},

where w contains the l weights of the filter coefficients selected for the desired cutoff frequency.
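As one assumed realization of this convolution form, the window w may be designed as a windowed-sinc FIR filter; the use of SciPy's firwin here is an illustrative choice, not something specified by the disclosure:

```python
import numpy as np
from scipy.signal import firwin

def fir_low_pass(x, fq, c, l=31):
    """Low-pass x by convolving it with l FIR coefficients w (cutoff c Hz)."""
    w = firwin(l, c, fs=fq)                # l-tap windowed-sinc coefficients
    return np.convolve(x, w, mode="same")  # slide the window w through x
```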
According to some examples, signal pre-processing may include three steps. For example, a median filter may first be applied to remove spikes from the signal. The high-pass filter may be applied next to filter out the baseline DC offset and drift, followed by the low-pass filter to remove high-frequency noise and to further smooth the signal.
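A compact sketch of this three-step chain, here using off-the-shelf SciPy filters rather than the hand-written versions above; the cutoff frequencies and kernel size are illustrative assumptions, not values from the disclosure:

```python
import numpy as np
from scipy.signal import medfilt, butter, filtfilt

def preprocess_ppg(x, fq, hp_cut=0.1, lp_cut=10.0, kernel=5):
    """Median filter -> high-pass -> low-pass over a raw PPG sequence."""
    x = medfilt(np.asarray(x, dtype=float), kernel_size=kernel)  # 1. remove spikes
    b, a = butter(2, hp_cut, btype="highpass", fs=fq)
    x = filtfilt(b, a, x)                        # 2. remove DC offset and drift
    b, a = butter(4, lp_cut, btype="lowpass", fs=fq)
    return filtfilt(b, a, x)                     # 3. remove high-frequency noise
```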
The filters should be configured such that hand gestures are not filtered out. Specifically, hand gestures are low-frequency movements, and an aggressive cut-off frequency may lead to a loss of features caused by hand gestures.
Features are extracted from the filtered signal for changes or patterns indicative of wrist, hand, or finger movements. The features are a reduction of the waveform shape that is informative of aspects of wrist, hand, or finger movements. For example, features may include the temporal pattern of local minima/maxima in a signal (wavelet analysis), the magnitude of a signal between local minima/maxima, and the peak components of the signal's spectrogram (Fourier analysis). In some examples, rather than identifying specific movements, such as bending the index finger vs. bending the wrist, classes of movements may be identified, such as bending up, bending down, twisting, rubbing, etc. The features are extracted against a background of movements in other parts of the arm that are not of interest, such as bending the elbow. In some examples, multiple approaches may be used to identify these features. For example, any combination of peak detection, signal variance, Haar-like feature encoding, and frequency analysis/spectrogram may be used.
Features may also be extracted from a first-order derivative of the signal, indicating a velocity of the motion, and/or a second-order derivative of the signal, indicating an acceleration of the motion.
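A hedged sketch of deriving the velocity and acceleration sequences and summarizing each in both domains; the window length and the particular summary statistics are assumptions:

```python
import numpy as np
from scipy.signal import stft

def derivative_features(x, fq):
    """Features of the signal, its velocity, and its acceleration."""
    x = np.asarray(x, dtype=float)
    v = np.gradient(x) * fq        # first-order derivative: velocity of motion
    a = np.gradient(v) * fq        # second-order derivative: acceleration
    feats = {}
    for name, s in (("signal", x), ("velocity", v), ("acceleration", a)):
        feats[name + "_max"] = float(np.max(s))  # time-domain amplitude changes
        feats[name + "_min"] = float(np.min(s))
        f, _, Z = stft(s, fs=fq, nperseg=min(64, len(s)))  # STFT for freq domain
        feats[name + "_peak_freq"] = float(f[np.argmax(np.abs(Z).mean(axis=1))])
    return feats
```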
In some examples, thresholds may be applied to these features to screen them for signals that are indicative of actual gestures. For example, local minima and maxima may only be considered if the difference from neighboring minima/maxima is greater than some threshold. As such, points created by unintended movement and noise may be filtered out, while points representing significant shape changes caused by intended gestures are maintained.
Features may also be dependent on other feature thresholds. For example, local minima/maxima may only be considered if there is a spike in the variance of the signal. A spike in variance is indicative of an intentional user gesture, and therefore may be selected to create a window around which the algorithm will attempt to extract features and detect a gesture. By way of example only, a window including signal data 500 ms before and 1500 ms after the spike may be analyzed to determine the gesture.
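A minimal sketch of carving out that window around a detected variance spike, using the 500 ms/1500 ms bounds mentioned above:

```python
import numpy as np

def window_around_spike(x, fq, spike_idx, pre_ms=500, post_ms=1500):
    """Return samples from 500 ms before to 1500 ms after the variance spike."""
    start = max(0, spike_idx - int(pre_ms * fq / 1000))
    stop = min(len(x), spike_idx + int(post_ms * fq / 1000))
    return np.asarray(x)[start:stop]
```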
The extracted features inform a gesture detection component that identifies a complete, intentional gesture from a user. The gesture detection component detects the moment when a hand gesture occurs. In some examples, a user may be provided with early feedback once the user starts a gesture. For example, a motion detection model may respond immediately after it detects any abrupt changes in the received PPG/IMU signal, which may occur before a gesture completes. The motion detection model may further filter out noise caused by unintended hand motions, and only signal sequences caused by significant hand motions can pass. Therefore, the motion detection model reduces the burden of the gesture recognition component.
The detection component may involve heuristic components, such as thresholds on the feature values, and machine learning components that have been trained on feature samples from both intentional gestures and accidental noise. For example, the features described above may be reduced using a Linear Discriminant Analysis (LDA) or Quadratic Discriminant Analysis (QDA) to find the boundaries between different gesture classes. When attempting to detect multiple gesture classes, or those that are not easily discriminable with heuristic methods, machine learning models, such as K-Nearest Neighbor, Random Forests, and Recurrent Neural Networks, may be trained on the above features to provide a gesture recognizer.
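As one assumed realization, the LDA reduction and a simple learned recognizer can be chained with scikit-learn; the disclosure names the techniques but no particular library, and the labels below are hypothetical:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# LDA projects feature vectors toward the boundaries between gesture classes;
# K-Nearest Neighbor then labels each projected vector.
recognizer = make_pipeline(
    LinearDiscriminantAnalysis(n_components=2),  # needs >= 3 classes for 2 components
    KNeighborsClassifier(n_neighbors=5),
)
# X_train: feature vectors from intentional gestures and accidental noise
# y_train: hypothetical labels, e.g. "clench", "brush", "noise"
# recognizer.fit(X_train, y_train)
# gesture = recognizer.predict(X_new)
```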
According to some examples, gesture detection may include obtaining a velocity of a PPG signal, calculating a moving variance of the velocity, calculating the proportional change of the moving variance, and detecting the gesture based on the proportional change.
A velocity sequence may be obtained by taking a first order derivative of the PPG signal. The velocity sequence may represent a change in status of the user's hand, such as a change from a motionless state to a moving state, or a change from slow movement to quick movement.
To calculate the moving variance, a sliding window may be used to traverse through the filtered signal, and the variance of the velocity may be calculated in each timing window. The window may be set to, for example, 200 ms with a stride of 50 ms. It should be understood that various window settings may be used.
The proportional change of the moving variance may be calculated based on the change in variance in each time window compared to historical status.
When a gesture occurs, it comes with a rapid change in the signal velocity, and the proportional changes will be higher than when there is no gesture. Accordingly, a threshold may be set for the proportional changes, such that a hand gesture is detected above the threshold.
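A sketch of these four steps; the 200 ms window and 50 ms stride are the illustrative values from above, while the detection threshold is an assumed parameter:

```python
import numpy as np

def detect_gestures(x, fq, win_ms=200, stride_ms=50, thresh=3.0):
    """Velocity -> moving variance -> proportional change -> threshold."""
    v = np.gradient(np.asarray(x, dtype=float)) * fq  # velocity of PPG signal
    win, stride = int(win_ms * fq / 1000), int(stride_ms * fq / 1000)
    variances = [np.var(v[i:i + win]) for i in range(0, len(v) - win, stride)]
    hits = []
    for i in range(1, len(variances)):
        historical = np.mean(variances[:i])         # historical status
        ratio = variances[i] / (historical + 1e-9)  # proportional change
        if ratio > thresh:                          # gesture detected
            hits.append(i * stride)                 # sample index of detection
    return hits
```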
Once a potential gesture is detected at time t, a signal sequence that contains time t may be used to further recognize the gesture. Features are extracted from the PPG signal and the IMU signal, and used to recognize a gesture.
PPG signal sequences that have passed the gesture detection model may be used as inputs to feature generation algorithms. Once the gesture detection model has detected a hand motion at time t, a PPG signal sequence that contains this time t is extracted and used as raw inputs to the feature generation model. For example, a signal sequence x and velocity sequence v may be used to generate PPG features.
Various metrics may be calculated from the PPG signal sequence x and velocity sequence v. Examples of such metrics include root mean square of sequence x, average of sequence x, kurtosis of sequence x, skewness of sequence x, etc. These metrics are generally time-independent.
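These metrics map directly onto standard statistics routines; a minimal sketch, with the metric set assumed from the examples above:

```python
import numpy as np
from scipy.stats import kurtosis, skew

def ppg_metrics(x, v):
    """Time-independent metrics of signal sequence x and velocity sequence v."""
    x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
    return {
        "rms_x": float(np.sqrt(np.mean(x ** 2))),  # root mean square of x
        "mean_x": float(np.mean(x)),               # average of x
        "kurtosis_x": float(kurtosis(x)),
        "skewness_x": float(skew(x)),
        "rms_v": float(np.sqrt(np.mean(v ** 2))),  # same metrics on velocity v
        "mean_v": float(np.mean(v)),
    }
```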
Time-dependent features may be obtained using a peak detection feature extraction algorithm on the PPG signal sequence x, for example. According to this algorithm, local maxima and minima are identified and noted as changing points representing abrupt changes in the sequence x. Each changing point is traversed to get the vertical distance to its neighboring changing point. The distances are sorted, and the pairs of changing points forming the top distances are kept, while the others may be discarded. This process helps filter out small jitters in the signal sequence, so that only the significant shape changes are preserved as features.
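A sketch of this changing-point procedure; the number of pairs kept (top_k) is an assumed parameter:

```python
import numpy as np
from scipy.signal import argrelextrema

def changing_point_features(x, top_k=5):
    """Keep the changing-point pairs with the largest vertical distances."""
    x = np.asarray(x, dtype=float)
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    points = np.sort(np.concatenate([maxima, minima]))  # changing points, in order
    # vertical distance between each changing point and its neighbor
    dists = [(abs(x[points[i + 1]] - x[points[i]]), int(points[i]), int(points[i + 1]))
             for i in range(len(points) - 1)]
    dists.sort(reverse=True)   # sort distances; small jitters fall to the bottom
    return dists[:top_k]       # keep only the significant shape changes
```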
In block 710, light is emitted towards the user's skin. For example, the light may be from a light source on the wearable device. According to some examples, the light source may be an LED in an optical sensor, such as a PPG sensor, in the wearable device. The light may penetrate the user's skin, for example, to reach the user's blood vessels, arteries, muscle tissue, etc.
In block 720, reflected light is received at the optical sensor. For example, the light may be reflected off the user's skin, blood vessels, hemoglobin, etc. The reflected light may have different intensity, frequency, or other characteristics depending on what it reflected off. For example, it may have reflected off a target object, such as hemoglobin, or a non-target object, such as surrounding skin, muscle, or bone.
In block 730, the raw sensor data is read from the optical sensor. For example, one or more processors may read the raw sensor data. The raw sensor data may take the form of a signal having a particular shape.
In block 740, the raw sensor data is filtered to reduce or eliminate noise. For example, a median filter may reduce spikes. A high-pass filter may filter out baseline DC offset and drift from the signal. A low-pass filter may remove high-frequency noise and further smooth the signal. One, all, or any combination of such filters may be used.
In block 750, features are identified in the filtered signal, wherein the features correspond to movements of the user's wrist, hand, or fingers. The features may be, for example, the temporal pattern of local minima/maxima in a signal, the magnitude of a signal between local minima/maxima, or the peak components of the signal's spectrogram.
In block 760, it is determined whether the identified features match a gestural action. For example, the wearable device may store a number of features in correlation with particular gestures, such as squeeze, tap, hold, etc. Accordingly, the gesture made by the user may be determined based on the identified features of the filtered signal. If no match is identified, the method 700 may return to block 710.
If the features match a gestural action, in block 770 an interface operation corresponding to the matched gestural action is performed. The interface operation may be, for example, any operation to control a function of the wearable device. For example, the interface operation may adjust a volume, scroll through a menu or lines of text, change information displayed, change audio emitted, turn on a microphone, pair or unpair with other wireless devices, etc.
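By way of example only, the correspondence between matched gestures and interface operations may be kept in a simple lookup table; every name below is hypothetical:

```python
# Hypothetical mapping; actual gestures and operations are device-specific.
INTERFACE_OPERATIONS = {
    "brush": lambda watch: watch.dismiss_notification(),
    "clench": lambda watch: watch.start_dictation(),
    "release": lambda watch: watch.send_reply(),
    "finger_bend": lambda watch, amount: watch.adjust_volume(amount),
}

def perform_operation(watch, gesture, *args):
    """Perform the interface operation corresponding to the matched gesture."""
    operation = INTERFACE_OPERATIONS.get(gesture)
    if operation is not None:
        operation(watch, *args)
```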
In some examples, identifying the gestural action may simply include determining an operation to be performed. For example, the identified features may be matched with an operation, without first identifying the motion that caused such features.
The foregoing systems and methods are beneficial in that they enable one-handed interactions with smartwatches and other wearable electronic devices. Sensors on the smartwatch are configured to detect one-handed gestures in real time, without requirement of per-user or per-session calibration. The gestures may be simple, one-shot, discrete, and serve various use cases. In this regard, user experience is improved because users will not have to stop an activity or free an opposite hand to enter input to the smartwatch.
Unless otherwise stated, the foregoing alternative examples are not mutually exclusive, but may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of the embodiments should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. In addition, the provision of the examples described herein, as well as clauses phrased as “such as,” “including” and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only one of many possible embodiments. Further, the same reference numbers in different drawings can identify the same or similar elements.
The present application claims the benefit of the filing date of U.S. Provisional Patent Application No. 62/757,973 filed Nov. 9, 2018, the disclosure of which is hereby incorporated herein by reference.