This disclosure relates generally to gesture recognition for navigating user interfaces and performing other actions on wearable devices.
Wearable computers, such as wrist-worn smartwatches, have grown in popularity and are being used for a variety of purposes, such as health monitoring and fitness applications. A user typically interacts with a smartwatch through a touch display and/or crown using hand/finger gestures, such as tap, swipe or pinch. These gestures, however, require the user to have a free hand available to perform the gesture. There are many scenarios where a free hand is not available, such as when the user is holding a baby or groceries, or when the user is physically disabled.
Embodiments are disclosed for machine learning (ML) based gesture recognition with a framework for adding user-customized gestures.
In an embodiment, a method comprises: receiving sensor data indicative of a gesture made by a user, the sensor data obtained from at least one sensor of a wearable device worn on a limb of the user; generating a current encoding of features extracted from the sensor data using a machine learning model with the features as input; generating similarity metrics between the current encoding and each encoding in a set of previously generated encodings for gestures; generating similarity scores based on the similarity metrics; predicting the gesture made by the user based on the similarity scores; and performing an action on the wearable device or other device based on the predicted gesture.
In an embodiment, the limb is a wrist of the user and the sensor data is obtained from a combination of a biosignal and at least one motion signal.
In an embodiment, the biosignal is a photoplethysmography (PPG) signal and the at least one motion signal is acceleration.
In an embodiment, the similarity metrics are distance metrics.
In an embodiment, the machine learning model is a neural network.
In an embodiment, the similarity scores are predicted by a neural network.
In an embodiment, the neural network used to predict the similarity scores is a deep neural network that includes a sigmoid activation function.
In an embodiment, the action corresponds to navigating a user interface on the wearable device or other device.
In an embodiment, the machine learning model is a neural network trained using sample data for pairs of gestures obtained from a known set of gestures, where each gesture in the pair is annotated with a label indicating that the gesture is from a same class or a different class, and a feature vector for each gesture in the pair is separately extracted and then encoded using the machine learning model.
In an embodiment, the machine learning model uses a different loss function for each gesture in each pair during training.
Other embodiments can include an apparatus, computing device and non-transitory, computer-readable storage medium.
Particular embodiments described herein provide one or more of the following advantages. A user is free to create a custom input gesture for interacting with a user interface (UI) of a wearable device (e.g., a smartwatch) using one or more data samples collected by sensors of the wearable device, without updating the software on the device. To add a customized gesture, the user performs the customized gesture and the resulting sensor data is captured by motion sensors (e.g., accelerometers, angular rate sensors) and a biosignal sensor of the wearable device, such as a photoplethysmography (PPG) sensor. The customized gesture can be added by the user, thus avoiding large-scale data collection to add customized gestures prior to shipping the wearable device.
The captured sensor data is input into an encoder network configured to generate a first encoding of the features (e.g., a feature vector) for the customized gesture. A set (e.g., one or more) of previously generated encodings for at least one other gesture is obtained from memory of the wearable device. A similarity metric (e.g., a distance metric) is computed for all pairwise combinations of the first encoding and the set of encodings, and pairwise similarity score(s) are computed using an ML model trained to predict similarity score(s) between pairs of input gestures. The pair of gestures that is most similar among all the pairs of gestures based on its similarity score is selected as the gesture intended by the user. At least one action (e.g., a predefined action) associated with the selected gesture is initiated or performed on the wearable device or another device in accordance with the selected gesture.
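The following is a minimal NumPy sketch of this matching flow. The `encode` output, the stored encodings, the placeholder scorer, and the threshold are illustrative stand-ins for the trained models and are not the actual on-device implementation.

```python
# Minimal sketch of the gesture-matching flow described above (NumPy only).
import numpy as np

def match_gesture(current_encoding, stored_encodings, score_fn, threshold=0.5):
    """Return the label of the stored gesture most similar to the current one,
    or None if no pair scores above the threshold."""
    best_label, best_score = None, threshold
    for label, stored in stored_encodings.items():
        distance = np.abs(current_encoding - stored)   # element-wise distance metric
        score = score_fn(distance)                     # learned similarity score in [0, 1]
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy usage with random encodings and a placeholder scorer.
rng = np.random.default_rng(0)
stored = {name: rng.normal(size=16) for name in ("pinch", "clench", "tap", "knock", "shake")}
current = stored["shake"] + 0.01 * rng.normal(size=16)    # a new "shake" sample
toy_score = lambda d: 1.0 / (1.0 + d.mean())              # stands in for the trained ML model
print(match_gesture(current, stored, toy_score))          # -> "shake"
```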
The details of one or more implementations of the subject matter are set forth in the accompanying drawings and the description below. Other features, aspects and advantages of the subject matter will become apparent from the description, the drawings and the claims.
The examples shown in
In some embodiments, the biosignal sensor(s) is a PPG sensor configured to detect blood volume changes in the microvascular bed of tissue of a user (e.g., where the user is wearing the device on his/her body, such as his/her wrist). The PPG sensor may include one or more light-emitting diodes (LEDs) which emit light and a photodiode/photodetector (PD) which detects reflected light (e.g., light reflected from the wrist tissue). The biosignal sensor(s) are not limited to a PPG sensor, and may additionally or alternatively correspond to one or more of: an electroencephalogram (EEG) sensor, an electrocardiogram (ECG) sensor, an electromyogram (EMG) sensor, a mechanomyogram (MMG) sensor (e.g., piezo resistive sensor) for measuring muscle activity/contractions, an electrooculography (EOG) sensor, a galvanic skin response (GSR) sensor, a magnetoencephalogram (MEG) sensor and/or other suitable sensor(s) configured to measure biosignals.
In some embodiments, wearable device 101 includes non-biosignal sensor(s) that include one or more motion sensors for detecting device motion. For example, the motion sensors include but are not limited to accelerometers and angular rate sensors (e.g., gyroscopes) for detecting device acceleration and angular rates, respectively. As discussed further below with respect to
In the disclosure that follows, ML-based training and inference frameworks are disclosed that allow a user to add one or more customized gestures to a set of previously learned gestures, such as, for example, adding a shaking gesture where the user rotates their hand clockwise and counterclockwise, as shown in
The sensor data is input into encoder network 204, which includes a separate processing path for each sensor modality. In an embodiment, a PPG data path includes PPG data feature extractor 205, self-attention network 208 and self-attention network 211. An accelerometer data path includes acceleration data feature extractor 206, self-attention network 209 and self-attention network 212. A gyro data path includes gyro data feature extractor 207, self-attention network 210 and self-attention network 213. The outputs of each of these data processing paths are combined in feature selector 214, which selects particular features from particular sensor modality processing paths as input into convolution layers 215.
Feature extractors 205, 206, 207 can be implemented using a suitable feature extraction technique, including but not limited to a convolutional neural network (CNN). Self-attention networks 208-213 include a series of convolutional layers and normalization layers (e.g., batch normalization) that are trained to learn which sensor data is most important based on context (e.g., the self-attention networks 208-213 are trained to enhance or diminish input features prior to prediction). In an embodiment, the self-attention networks are repeated twice in each path to increase the depth of encoder network 204 and its ability to extract more relevant features for gesture prediction head 216. The output of encoder network 204 is a feature encoding (e.g., a feature vector) that is input into gesture prediction head 216, which includes a fully connected layer for predicting gestures.
In some embodiments, gesture prediction head 216 includes a fully connected layer (e.g., a CNN layer or dense layer) to predict a gesture performed by the user. The gesture may correspond to a single-handed gesture performed by the same hand that is coupled to wearable device 101, as shown in
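The sketch below illustrates one way such a multi-path encoder could be laid out in Keras, with one branch per sensor modality feeding fused convolution layers and a fully connected prediction head. The window length, channel counts, layer sizes, and the lightweight Conv1D "attention" stand-in are assumptions for illustration only, not the actual encoder network 204.

```python
# Illustrative Keras sketch of a multi-path encoder: per-modality feature
# extraction, feature fusion, convolution layers, and a prediction head.
import tensorflow as tf

def modality_branch(x, name):
    # CNN feature extractor followed by two lightweight attention-style blocks.
    x = tf.keras.layers.Conv1D(16, 5, padding="same", activation="relu", name=f"{name}_feat")(x)
    for i in range(2):
        a = tf.keras.layers.Conv1D(16, 1, activation="sigmoid", name=f"{name}_attn{i}")(x)
        x = tf.keras.layers.Multiply(name=f"{name}_gate{i}")([x, a])
        x = tf.keras.layers.BatchNormalization(name=f"{name}_bn{i}")(x)
    return x

ppg_in = tf.keras.Input((100, 1), name="ppg")
acc_in = tf.keras.Input((100, 3), name="accel")
gyr_in = tf.keras.Input((100, 3), name="gyro")

fused = tf.keras.layers.Concatenate()([modality_branch(ppg_in, "ppg"),
                                       modality_branch(acc_in, "acc"),
                                       modality_branch(gyr_in, "gyr")])
x = tf.keras.layers.Conv1D(32, 3, activation="relu")(fused)
encoding = tf.keras.layers.GlobalAveragePooling1D(name="encoding")(x)
logits = tf.keras.layers.Dense(5, activation="softmax", name="gesture_head")(encoding)

model = tf.keras.Model([ppg_in, acc_in, gyr_in], logits)
model.summary()
```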
In some embodiments, ML model 200 is trained on different device(s) (e.g., one or more smartwatches other than the electronic device) based on sensor output data prior to being deployed on wearable device 101. The sensor output data for training may correspond to output from one or more biosignal sensor(s) and/or from one or more non-biosignal sensors (e.g., motion sensors). In some embodiments, ML model 200 is trained across multiple users, for example, who provided different types of gestures while wearing a device (e.g., another smartwatch with biosignal and/or non-biosignal sensor(s)) and confirmed the gestures (e.g., via a training user interface) as part of a training process. In some embodiments, video of the user making the gestures is used to manually or automatically annotate training data for predicting the gestures. In this manner, ML model 200 is trained to predict gestures across a general population of users, rather than one specific user. In some embodiments, the training data is augmented or domain adapted to improve diversification of the training data so that predictions can be made under a variety of environmental conditions.
As previously described, it is desirable to allow a user to add customized gestures to an existing set of learned gestures. For example, wearable device 101 may be deployed to users with encoder network 204 trained on the pinch, clench, tap and knock gestures, but not trained on the shake gesture. One solution would be to train encoder network 204 on training data for the shake gesture, as described above, and then deploy an updated encoder network 204 to the installed customer base. This solution, however, can take many months, and a decision on which new gesture to add would likely be based on the gesture with the most demand, which may not be desirable to all users. Accordingly, an alternative solution is disclosed that allows the user to add a customized gesture, which is described in reference to
The raw input of network 220 is an n-sec (e.g., 1 sec) 6 degree-of-freedom (DOF) IMU signal 221 (three axes for the accelerometer and three axes for the gyroscope) sampled at 100 Hz. In an embodiment, the input signals 221 are preprocessed by respective Butterworth filters 222 (e.g., 0.22-8 Hz, 8-32 Hz, 32 Hz) using cascaded second-order sections, leading to a 100×4 input 223 for one channel.
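A sketch of this per-channel preprocessing using SciPy's Butterworth filters with cascaded second-order sections (SOS) is shown below. The filter order, the interpretation of the third band as a 32 Hz high-pass, and the choice to stack the raw channel with the three filtered versions (giving the 100×4 input) are assumptions.

```python
# Per-channel IMU preprocessing sketch: 1-second window at 100 Hz, three
# Butterworth filters implemented as cascaded second-order sections.
import numpy as np
from scipy import signal

FS = 100          # IMU sampling rate (Hz)
WINDOW = 100      # 1-second window

def preprocess_channel(x):
    """x: (100,) raw samples of one IMU axis -> (100, 4) filtered input."""
    low  = signal.butter(2, [0.22, 8], btype="bandpass", fs=FS, output="sos")
    mid  = signal.butter(2, [8, 32],   btype="bandpass", fs=FS, output="sos")
    high = signal.butter(2, 32,        btype="highpass", fs=FS, output="sos")
    bands = [signal.sosfilt(sos, x) for sos in (low, mid, high)]
    return np.stack([x, *bands], axis=-1)

print(preprocess_channel(np.random.randn(WINDOW)).shape)   # (100, 4)
```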
The concept of EfficientNet is adopted to balance the number of trainable parameters and model performance, as described in Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In International Conference on Machine Learning. PMLR, 6105-6114. Specifically, for each input channel, two inverted residual blocks are employed (e.g., MBConv 224a, 224b in
The latter half of the pre-trained model consists of a stack of five fully connected layers 229 with sizes of 80, 40, 20, 10, and 5. In an embodiment, a batch normalization layer and a dropout layer (p=0.5) are inserted between every two fully connected layers to improve model generalizability. The output of the final layer corresponds to the confidence of the five classes. The whole model has 106k parameters. In an embodiment, cross-entropy is used as the loss function, and an Adam optimizer is used during training.
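A hedged Keras sketch of this fully connected half is shown below: five dense layers of sizes 80, 40, 20, 10, and 5, with batch normalization and dropout (p=0.5) between layers, trained with cross-entropy and Adam. The input embedding size (120) is borrowed from the feature embedding length mentioned later in this disclosure and is an assumption here.

```python
# Sketch of the fully connected classifier half of the pre-trained model.
import tensorflow as tf

classifier = tf.keras.Sequential([tf.keras.Input(shape=(120,))])
for units in (80, 40, 20, 10):
    classifier.add(tf.keras.layers.Dense(units, activation="relu"))
    classifier.add(tf.keras.layers.BatchNormalization())
    classifier.add(tf.keras.layers.Dropout(0.5))
classifier.add(tf.keras.layers.Dense(5, activation="softmax"))  # confidences for 5 classes

classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
classifier.summary()
```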
In an embodiment, the additional gesture prediction head C employs a two-layer fully connected network. The first layer 230 is a feature processing layer and the second layer 231 is an output layer. In an embodiment, the first layer 230 uses leaky ReLU (α=0.3) as the activation function and has an L2 kernel regularizer (λ=5e−5) and a dropout layer (p=0.5) to reduce overfitting. The second layer 231 uses a softmax activation whose output corresponds to the prediction confidence of the final classes. The number of classes is equal to the number of customized gestures plus one additional class for the negative case.
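A minimal Keras sketch of such a prediction head C is shown below: a feature processing layer with leaky ReLU, L2 kernel regularization, and dropout, followed by a softmax output over the customized gestures plus one negative class. The hidden size and input embedding size are illustrative assumptions.

```python
# Sketch of the additional prediction head C for customized gestures.
import tensorflow as tf

def build_head_c(num_custom_gestures, embedding_dim=120, hidden=32):
    return tf.keras.Sequential([
        tf.keras.Input(shape=(embedding_dim,)),
        tf.keras.layers.Dense(hidden,
                              kernel_regularizer=tf.keras.regularizers.l2(5e-5)),
        tf.keras.layers.LeakyReLU(alpha=0.3),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(num_custom_gestures + 1, activation="softmax"),
    ])

head_c = build_head_c(num_custom_gestures=1)   # custom gesture + negative class
head_c.summary()
```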
Therefore, when users create their first customized gesture, the additional prediction head C is trained as a binary classifier. When a second gesture is added, a new three-class prediction head is trained from scratch, and so on. Because the additional prediction head C is lightweight, the training process is fast. In real time, the additional prediction head C works together with network 220 to recognize distinct gestures, and both are robust to negative data. In an embodiment, if both predict a gesture, the one with the higher confidence is the final prediction. The framework leverages the first half of the pre-trained model as a feature extractor and transfers the extracted features to new gesture recognition tasks. By training the additional prediction head C for incremental classes, the performance of the existing (default) gestures is not impacted, addressing the problem of forgetting previously learned gestures. The few-shot challenge is then addressed with a series of data processing techniques, described below.
In an embodiment, if a new custom gesture is similar to existing gestures, performed inconsistently by the user, or close to the user's daily activities, feedback (e.g., through a text display or audio on a smartwatch) can be provided to the user with a request to define another gesture. Moreover, if a new custom gesture is novel and performed consistently, but the trained model achieves only fair performance, the user can be offered a choice between finishing and collecting a few more samples. Such feedback can help users better understand the process and design gestures.
In an embodiment, a gesture customization process flow is described as follows. A user creates a custom gesture and the smartwatch or other wearable device captures a plurality of shots (e.g., 3 shots) of repetitions of the custom gesture performed by the user. The few shots are segmented and pre-analyzed to determine if the gesture is: (1) similar to existing gestures, (2) inconsistent among shots, (3) easily confused with gestures performed during daily activities, or (4) a consistent novel gesture. For cases (1) through (3), the user is asked to define a new gesture. For case (4), the training process proceeds and a confidence score for the training is computed. If the confidence score indicates a poor result, the user is asked to define a new gesture. If the confidence score indicates good confidence, the recording is finished and the new gesture is added to the existing gestures. If the confidence score indicates fair confidence, the user is informed of the fair confidence and asked whether they want to perform more shots. If the user does not perform more shots, the recording of gestures terminates and the new gesture is added to the existing gestures. If the user performs more shots, training continues, the confidence score for the training is recomputed, and either more shots are recorded or the recording terminates, and so forth.
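The sketch below condenses this decision flow. All helpers (segment, pre_analyze, train_head, confidence, user_wants_more_shots, record_more_shots) are hypothetical placeholders for on-device routines, named here only for illustration.

```python
# Compact sketch of the gesture customization decision flow described above.
def customize_gesture(shots, segment, pre_analyze, train_head, confidence,
                      user_wants_more_shots, record_more_shots):
    samples = segment(shots)
    # Cases (1)-(3): similar, inconsistent, or confusable with daily activity.
    if pre_analyze(samples) != "consistent_novel_gesture":
        return "define_a_new_gesture"
    while True:
        head = train_head(samples)
        score = confidence(head, samples)        # "poor", "fair", or "good"
        if score == "poor":
            return "define_a_new_gesture"
        if score == "good" or not user_wants_more_shots():
            return "gesture_added"               # finish recording; add to existing gestures
        samples += segment(record_more_shots())  # "fair": collect more shots and retrain
```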
In an embodiment, the segmented data is fed into the pre-trained model (
During the segmentation, a distance matrix of potential gesture repetitions is checked and those gestures that are far from the rest of the repetitions are filtered out. After the filtering, if the number of repetitions left is less than the expected number (e.g., 3 when the framework requires a 3-shot recording), then the user did not perform the gesture consistently.
To determine whether the new gesture is close to common daily behaviors, the collected negative data is leveraged. The pre-trained model (
The similarity metric is then input into ML model 308, which is trained to predict a similarity score 309 between two input gestures based on the similarity metric for the pair. In an embodiment, ML model 308 is a deep neural network with one or more dense layers coupled to an activation function that outputs a similarity score (e.g., a probability). In some embodiments, the activation function can be any suitable linear or non-linear activation function. In some embodiments, the activation function is a logistic activation function, such as a sigmoid function or softmax function. In some embodiments, ML model 308 takes the distance metric from metric generator 307 (e.g., a vector of feature differences) and feeds the distance metric into a fully connected layer or layers. The output of ML model 308 is a single value that indicates the similarity between the two input gestures. If the value is 1, the input gestures are similar; if the value is 0, the input gestures are dissimilar.
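A hedged Keras sketch of such a similarity model is shown below: the element-wise feature-difference vector from metric generator 307 is fed through a fully connected layer and a sigmoid output that scores similarity between the two gestures. The feature dimension and hidden size are assumptions.

```python
# Sketch of ML model 308: distance vector in, similarity score (0..1) out.
import tensorflow as tf

feature_dim = 120
distance_in = tf.keras.Input(shape=(feature_dim,), name="feature_difference")
x = tf.keras.layers.Dense(64, activation="relu")(distance_in)
similarity = tf.keras.layers.Dense(1, activation="sigmoid", name="similarity_score")(x)

similarity_model = tf.keras.Model(distance_in, similarity)
similarity_model.compile(optimizer="adam", loss="binary_crossentropy")
# Trained on gesture pairs labeled 1 (same class) or 0 (different class).
```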
Process 500 begins by receiving sensor data indicative of a gesture made by a user (501). The sensor data is obtained from at least one sensor of a wearable device (e.g., accelerometer, gyro, PPG) worn on a limb (e.g., wrist) of the user that was used to make the gesture. For example, the gesture can be a customized gesture made by the user while wearing the wearable device on her wrist. The gesture can be, for example, the shake gesture shown in
Process 500 continues by generating a current encoding of features extracted from the sensor data using an ML model with the features as input (502). For example, a neural network can be used as the ML model. In an embodiment, the neural network includes one or more self-attention networks, as described in reference to
Process 500 continues by generating similarity metrics between the current encoding and each encoding in a set of previously generated encodings for gestures (503). For example, a distance metric can be the similarity metric for measuring a distance between feature vectors for different gesture pairs in an n-dimensional space (e.g., Euclidean distance). Similarity metrics can be computed for pairs of feature vectors: (shake, pinch), (shake, clench), (shake, tap), (shake, knock) and (shake, shake).
Process 500 continues by generating similarity scores based on the similarity metrics (504). For example, a deep neural network can be used to predict the similarity scores based on the similarity metrics. In an embodiment, the deep neural network can be coupled to a sigmoid activation function that outputs probabilities of match.
Process 500 continues by determining a gesture made by the user based on the similarity scores (505). For example, the gesture pair with the highest similarity score (e.g., the highest probability of a match) can be selected as the user's gesture. Using the above gesture examples, the pair (shake, shake) would have the highest similarity score. In an embodiment, if none of the pairs are sufficiently close (e.g., all the similarity scores fall below a minimum threshold probability), then no action takes place, as the system assumes that the detected biosignals and/or motion signals did not correspond to a gesture.
Process 500 continues by performing an action on the wearable device or another device based on the determined gesture (506). For example, the determined gesture can be used to navigate a graphical user interface (GUI) presented on a display of the wearable device or otherwise interact with the user interface, perform a desired function such as invoking or closing an application, or initiating and ending a communication modality.
Sensors, devices and subsystems can be coupled to peripherals interface 606 to provide multiple functionalities. For example, one or more motion sensors 610, light sensor 612 and proximity sensor 614 can be coupled to peripherals interface 606 to facilitate motion sensing (e.g., acceleration, rotation rates), lighting and proximity functions of the wearable device. Location processor 615 can be connected to peripherals interface 606 to provide geo-positioning. In some implementations, location processor 615 can be a GNSS receiver, such as the Global Positioning System (GPS) receiver. Electronic magnetometer 616 (e.g., an integrated circuit chip) can also be connected to peripherals interface 606 to provide data that can be used to determine the direction of magnetic North. Electronic magnetometer 616 can provide data to an electronic compass application. Motion sensor(s) 610 can include one or more accelerometers and/or gyros configured to determine change of speed and direction of movement. Barometer 617 can be configured to measure atmospheric pressure. Biosignal sensor 620 can be one or more of a PPG sensor, an electroencephalogram (EEG) sensor, an electrocardiogram (ECG) sensor, an electromyogram (EMG) sensor, a mechanomyogram (MMG) sensor (e.g., piezo resistive sensor) for measuring muscle activity/contractions, an electrooculography (EOG) sensor, a galvanic skin response (GSR) sensor, a magnetoencephalogram (MEG) sensor and/or other suitable sensor(s) configured to measure biosignals.
Communication functions can be facilitated through wireless communication subsystems 624, which can include radio frequency (RF) receivers and transmitters (or transceivers) and/or optical (e.g., infrared) receivers and transmitters. The specific design and implementation of the communication subsystem 624 can depend on the communication network(s) over which a mobile device is intended to operate. For example, architecture 600 can include communication subsystems 624 designed to operate over a GSM network, a GPRS network, an EDGE network, a Wi-Fi™ network and a Bluetooth™ network. In particular, the wireless communication subsystems 624 can include hosting protocols, such that the mobile device can be configured as a base station for other wireless devices.
Audio subsystem 626 can be coupled to a speaker 628 and a microphone 630 to facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording and telephony functions. Audio subsystem 626 can be configured to receive voice commands from the user.
I/O subsystem 640 can include touch surface controller 642 and/or other input controller(s) 644. Touch surface controller 642 can be coupled to a touch surface 646. Touch surface 646 and touch surface controller 642 can, for example, detect contact and movement or break thereof using any of a plurality of touch sensitivity technologies, including but not limited to capacitive, resistive, infrared and surface acoustic wave technologies, as well as other proximity sensor arrays or other elements for determining one or more points of contact with touch surface 646. Touch surface 646 can include, for example, a touch screen or the digital crown of a smart watch. I/O subsystem 640 can include a haptic engine or device for providing haptic feedback (e.g., vibration) in response to commands from processor 604. In an embodiment, touch surface 646 can be a pressure-sensitive surface.
Other input controller(s) 644 can be coupled to other input/control devices 648, such as one or more buttons, rocker switches, thumb-wheel, infrared port and USB port. The one or more buttons (not shown) can include an up/down button for volume control of speaker 628 and/or microphone 630. Touch surface 646 or other controllers 644 (e.g., a button) can include, or be coupled to, fingerprint identification circuitry for use with a fingerprint authentication application to authenticate a user based on their fingerprint(s).
In one implementation, a pressing of the button for a first duration may disengage a lock of the touch surface 646; and a pressing of the button for a second duration that is longer than the first duration may turn power to the mobile device on or off. The user may be able to customize a functionality of one or more of the buttons. The touch surface 646 can, for example, also be used to implement virtual or soft buttons.
In some implementations, the mobile device can present recorded audio and/or video files, such as MP3, AAC and MPEG files. In some implementations, the mobile device can include the functionality of an MP3 player. Other input/output and control devices can also be used.
Memory interface 602 can be coupled to memory 650. Memory 650 can include high-speed random access memory and/or non-volatile memory, such as one or more magnetic disk storage devices, one or more optical storage devices and/or flash memory (e.g., NAND, NOR). Memory 650 can store operating system 652, such as the iOS operating system developed by Apple Inc. of Cupertino, Calif. Operating system 652 may include instructions for handling basic system services and for performing hardware dependent tasks. In some implementations, operating system 652 can include a kernel (e.g., UNIX kernel).
Memory 650 may also store communication instructions 654 to facilitate communicating with one or more additional devices, one or more computers and/or one or more servers, such as, for example, instructions for implementing a software stack for wired or wireless communications with other devices. Memory 650 may include graphical user interface instructions 656 to facilitate graphic user interface processing; sensor processing instructions 658 to facilitate sensor-related processing and functions; phone instructions 660 to facilitate phone-related processes and functions; electronic messaging instructions 662 to facilitate electronic-messaging related processes and functions; web browsing instructions 664 to facilitate web browsing-related processes and functions; media processing instructions 666 to facilitate media processing-related processes and functions; GNSS/Location instructions 668 to facilitate generic GNSS and location-related processes and instructions; and gesture recognition instructions 670 that implement the gesture recognition processes described in reference to
Each of the above identified instructions and applications can correspond to a set of instructions for performing one or more functions described above. These instructions need not be implemented as separate software programs, procedures, or modules. Memory 650 can include additional instructions or fewer instructions. Furthermore, various functions of the mobile device may be implemented in hardware and/or in software, including in one or more signal processing and/or application specific integrated circuits.
Before the actual data processing, it is noteworthy that there are no readily segmented samples available. When users record data for their new customized gestures, they can either perform the gestures consecutively in a row, or follow instructions to perform one gesture at a time and repeat it several times, depending on the interaction design. Either way, it may be undesirable to ask users to provide the exact start and end timestamps of each gesture. Therefore, the signal sequence is segmented to obtain data samples.
In an embodiment, the sensor signals (e.g., accelerometer, gyro, PPG signals) are input into respective middle bandpass filters. A peak detection algorithm is applied to the output signals of the middle bandpass filters (e.g., 8-32 Hz) to identify potential moments of performing hand gestures. In an embodiment, the sum of the magnitude of the filtered accelerometer and gyroscope signals is calculated, and an absolute moving average window (e.g., 1 sec) is applied to smooth the data. In an embodiment, a peak detection method is used to find local maxima by comparing neighboring values (e.g., with a distance threshold of 1 sec), where a peak is ignored if it is lower than the overall average of the signal magnitude. If any time reference is available (e.g., a countdown mechanism), peaks can be further filtered according to the reference. An n-sec window (e.g., 1 sec) is centered at these potential peaks and input into a feature extraction part of a pre-trained model. In an embodiment, a distance matrix (e.g., a Euclidean distance matrix) of normalized embedding vectors is computed and the peaks whose embeddings are far from other embeddings based on an empirically set threshold (e.g., 0.8) are removed. In this manner, pronounced, repetitive hand movement periods that correspond to the target gestures are segmented from the sensor signals.
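The following sketch illustrates the peak-finding portion of this segmentation stage under the parameters suggested above (100 Hz sampling, 8-32 Hz band, 1 sec smoothing window, 1 sec peak spacing). The embedding-distance filtering step is omitted here, and the filter order is an assumption.

```python
# Segmentation sketch: band-pass, magnitude sum, smoothing, peak detection.
import numpy as np
from scipy import signal

FS = 100

def find_gesture_peaks(accel, gyro):
    """accel, gyro: (N, 3) arrays -> sample indices of candidate gesture peaks."""
    sos = signal.butter(2, [8, 32], btype="bandpass", fs=FS, output="sos")
    filt = np.hstack([signal.sosfilt(sos, accel, axis=0),
                      signal.sosfilt(sos, gyro, axis=0)])
    magnitude = np.abs(filt).sum(axis=1)
    smoothed = np.convolve(magnitude, np.ones(FS) / FS, mode="same")   # 1 s moving average
    peaks, _ = signal.find_peaks(smoothed,
                                 distance=FS,               # at least 1 s apart
                                 height=smoothed.mean())    # ignore peaks below the average
    return peaks   # center an n-sec window at each peak for the feature extractor
```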
Once the peaks are determined, an n-sec window (e.g., 1.5 sec) is centered at each final peak to ensure that a gesture is fully covered by the window. Data augmentation techniques are then applied to these windows.
After data segmentation, several data augmentation techniques are used to generate a larger number of samples. In an embodiment, three time series data augmentation techniques and all their combinations (e.g., seven combinations (2³−1) in total) are used to generate positive samples: 1) zooming, to simulate different gesture speed, randomly chosen from ×0.9 to ×1; 2) scaling, to simulate different gesture strength, with the scaling factor s˜N(1, 0.2²), s∈[0, 2]; and 3) time-warping, to simulate gesture temporal variance, with 2 interpolation knots and warping randomness w˜N(1, 0.05²), w∈[0, 2].
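Illustrative NumPy versions of the three positive augmentations follow. Only the parameter ranges come from the description above; the exact resampling and warping implementations, and the single-axis assumption, are illustrative choices.

```python
# Positive-data augmentation sketches for a 1-D signal window x.
import numpy as np

rng = np.random.default_rng()

def zoom(x, lo=0.9, hi=1.0):
    """Simulate a different gesture speed by resampling, then restoring the length."""
    factor = rng.uniform(lo, hi)
    t_new = np.linspace(0, len(x) - 1, int(round(len(x) * factor)))
    zoomed = np.interp(t_new, np.arange(len(x)), x)
    return np.interp(np.linspace(0, len(zoomed) - 1, len(x)), np.arange(len(zoomed)), zoomed)

def scale(x):
    """Simulate a different gesture strength, s ~ N(1, 0.2^2) clipped to [0, 2]."""
    return x * np.clip(rng.normal(1.0, 0.2), 0.0, 2.0)

def time_warp(x, knots=2):
    """Simulate temporal variance: warp speeds w ~ N(1, 0.05^2) at a few knots."""
    n = len(x)
    knot_pos = np.linspace(0, n - 1, knots + 2)                      # include endpoints
    knot_speed = np.clip(rng.normal(1.0, 0.05, size=knots + 2), 0.0, 2.0)
    speed = np.interp(np.arange(n), knot_pos, knot_speed)            # smooth speed curve
    warped_t = np.cumsum(speed)
    warped_t = (warped_t - warped_t[0]) / (warped_t[-1] - warped_t[0]) * (n - 1)
    return np.interp(np.arange(n), warped_t, x)
```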
In an embodiment, three augmentation techniques are employed to generate negative data: 1) cutting out, by masking a random portion (e.g., 0.5 sec) of the signal with zeros; 2) reversing signals; and 3) shuffling, by slicing signals into pieces (e.g., 0.1 sec pieces) and generating a random permutation. These augmentations are typically used in other machine learning tasks to augment positive data; however, in this embodiment the techniques are used to augment negative data to ensure the model only recognizes valid gestures. The positive augmentation techniques described above can also be applied to the negative data to generate more negative samples.
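The following NumPy sketches illustrate the negative-data augmentations (cut-out, reversal, shuffling), assuming 100 Hz single-axis windows longer than the masked portion; the slice lengths follow the examples above.

```python
# Negative-data augmentation sketches for a 1-D signal window x.
import numpy as np

rng = np.random.default_rng()
FS = 100

def cut_out(x, mask_sec=0.5):
    """Zero-mask a random 0.5 s portion of the signal."""
    x = x.copy()
    n = int(mask_sec * FS)
    start = rng.integers(0, len(x) - n)
    x[start:start + n] = 0.0
    return x

def reverse(x):
    """Play the gesture backwards."""
    return x[::-1].copy()

def shuffle_pieces(x, piece_sec=0.1):
    """Slice into 0.1 s pieces and apply a random permutation."""
    n = int(piece_sec * FS)
    pieces = [x[i:i + n] for i in range(0, len(x), n)]
    order = rng.permutation(len(pieces))
    return np.concatenate([pieces[i] for i in order])
```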
Although the data augmentation can generate signals with larger variance than the data recorded by users, these augmented data may not be close to the actual gesture variance introduced by natural human behavior. Therefore, more data can be synthesized from both the raw signals and the augmented signals to simulate the natural motion variance. In an embodiment, a Δ-encoder is trained, which is a self-supervised encoder-decoder model that can capture the difference between two samples (i.e., Δ) belonging to the same gesture, and use that difference to synthesize new gesture samples.
In an embodiment, a Δ-encoder is trained as follows. The Δ-encoder takes two samples (sampleInput and sampleRef) from the same class as input, feeds sampleInput through a few neural network layers to produce a very small embedding called the Δ-vector (similar to a typical autoencoder), and then uses the Δ-vector and sampleRef to reconstruct sampleInput. The intuition is that the Δ-vector is so small that it focuses on capturing the difference between sampleInput and sampleRef, which is then used to rebuild sampleInput with sampleRef as the reference. After the Δ-encoder is trained, it can take another sample from the new class as a new sampleRef and generate a new sample of the same class with a Δ-vector. This Δ-vector can either be obtained by feeding an existing sample from another class through the encoder, or generated randomly.
In an embodiment, data of the existing four gestures previously described above (pinching the thumb and index finger together, clenching the hand into a fist, tapping one or more fingers on a surface and knocking a fist on a surface) are used to train a Δ-encoder. During the training, two samples are randomly drawn from the same gesture and the same user to ensure that the model captures the within-user variance instead of the between-user variance. The feature embeddings (e.g., of length 120) are used as the input and the output of the Δ-encoder to save computation cost. In an embodiment, both the encoder and decoder have one hidden layer with a size of 4096 and use leaky ReLU (α=0.3) as the activation function. In an embodiment, the size of the Δ-vector is set to 5.
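A hedged Keras sketch of such a Δ-encoder is shown below: 120-d feature embeddings in and out, one hidden layer of 4096 with leaky ReLU in both encoder and decoder, and a 5-d Δ-vector. Whether the encoder sees only sampleInput or the concatenated pair is an implementation choice; this sketch follows the text and encodes sampleInput alone.

```python
# Δ-encoder sketch: compress sampleInput into a small Δ-vector, then rebuild
# sampleInput from the Δ-vector plus the reference sample.
import tensorflow as tf

EMB, HIDDEN, DELTA = 120, 4096, 5

sample_input = tf.keras.Input(shape=(EMB,), name="sample_input")
sample_ref   = tf.keras.Input(shape=(EMB,), name="sample_ref")

# Encoder half.
h = tf.keras.layers.Dense(HIDDEN)(sample_input)
h = tf.keras.layers.LeakyReLU(alpha=0.3)(h)
delta = tf.keras.layers.Dense(DELTA, name="delta_vector")(h)

# Decoder half: Δ-vector + reference sample -> reconstructed input.
d = tf.keras.layers.Concatenate()([delta, sample_ref])
d = tf.keras.layers.Dense(HIDDEN)(d)
d = tf.keras.layers.LeakyReLU(alpha=0.3)(d)
reconstruction = tf.keras.layers.Dense(EMB, name="reconstructed_input")(d)

delta_encoder = tf.keras.Model([sample_input, sample_ref], reconstruction)
delta_encoder.compile(optimizer="adam", loss="mse")
# At synthesis time, a new-class sample is used as sample_ref together with a
# stored or random Δ-vector (through the decoder half) to generate new samples.
```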
Using the training set from the four gestures described above, the model is trained for, e.g., 200 epochs with, e.g., a 0.5 exponential decay applied to the learning rate every 30 epochs. The epoch with the best results on the validation set is saved. The Δ-vectors from the testing set of the four gestures are also calculated and saved to be used to generate new samples. In real time, when the customized gesture data goes through the augmentation stage, the Δ-encoder is used to generate extra samples of the customized gestures (both positive and negative data) that contain more natural gesture variance.
After the data augmentation and data synthesis, a large amount of data with appropriate variance is obtained to train the gesture prediction head. To further improve the robustness of the model, in some embodiments adversarial training regularization is employed when training the model. Adversarial regularization trains a model with adversarially perturbed data (perturbed toward the decision boundary, i.e., by inverse gradient descent, so that the training task becomes harder) in addition to the original training data. It can prevent the model from overfitting and classify data points close to the boundary more robustly. In an embodiment, customized gesture data from the same user tends to be blended with the existing four gestures near the boundary. Adversarial regularization can help enhance classification performance, especially for reducing false positives. In an embodiment, the adversarial regularization loss weight and the reverse gradient step size are both set to 0.2.
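The sketch below shows one adversarially regularized training step in Keras, using a sign-of-gradient (FGSM-style) perturbation of the input embeddings as a stand-in for the perturbation described above. The loss weight (0.2) and step size (0.2) follow the values above; the model, optimizer, loss, and data shapes are assumptions.

```python
# One adversarially regularized training step (FGSM-style input perturbation).
import tensorflow as tf

ADV_WEIGHT = 0.2
ADV_STEP = 0.2
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

def adversarial_train_step(model, optimizer, x, y):
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            inner.watch(x)
            clean_loss = loss_fn(y, model(x, training=True))
        # Perturb inputs toward the decision boundary using the input gradient sign.
        x_adv = x + ADV_STEP * tf.sign(inner.gradient(clean_loss, x))
        adv_loss = loss_fn(y, model(x_adv, training=True))
        total = clean_loss + ADV_WEIGHT * adv_loss
    grads = outer.gradient(total, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return total
```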
Accordingly, through a series of data segmentation, data augmentation, data synthesis, and adversarial training, a robust prediction head can be learned for each new user that can accurately recognize their customized gestures with a low false-positive rate.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
As described above, some aspects of the subject matter of this specification include gathering and use of data available from various sources to improve services a mobile device can provide to a user. The present disclosure contemplates that in some instances, this gathered data may identify a particular location or an address based on device usage. Such personal information data can include location-based data, addresses, subscriber account identifiers, or other identifying information.
The present disclosure further contemplates that the entities responsible for the collection, analysis, disclosure, transfer, storage, or other use of such personal information data will comply with well-established privacy policies and/or privacy practices. In particular, such entities should implement and consistently use privacy policies and practices that are generally recognized as meeting or exceeding industry or governmental requirements for maintaining personal information data private and secure. For example, personal information from users should be collected for legitimate and reasonable uses of the entity and not shared or sold outside of those legitimate uses. Further, such collection should occur only after receiving the informed consent of the users. Additionally, such entities would take any needed steps for safeguarding and securing access to such personal information data and ensuring that others with access to the personal information data adhere to their privacy policies and procedures. Further, such entities can subject themselves to evaluation by third parties to certify their adherence to widely accepted privacy policies and practices.
In the case of advertisement delivery services, the present disclosure also contemplates embodiments in which users selectively block the use of, or access to, personal information data. That is, the present disclosure contemplates that hardware and/or software elements can be provided to prevent or block access to such personal information data. For example, in the case of advertisement delivery services, the present technology can be configured to allow users to select to “opt in” or “opt out” of participation in the collection of personal information data during registration for services.
Therefore, although the present disclosure broadly covers use of personal information data to implement one or more various disclosed embodiments, the present disclosure also contemplates that the various embodiments can also be implemented without the need for accessing such personal information data. That is, the various embodiments of the present technology are not rendered inoperable due to the lack of all or a portion of such personal information data. For example, content can be selected and delivered to users by inferring preferences based on non-personal information data or a bare minimum amount of personal information, such as the content being requested by the device associated with a user, other non-personal information available to the content delivery services, or publicly available information.
This application claims the benefit of priority of U.S. Provisional Patent Application No. 63/197,307, filed Jun. 4, 2021, and U.S. Provisional Patent Application No. 63/239,905, filed Sep. 1, 2021, which applications are incorporated by reference herein in their entirety.