This application claims priority to, and is a continuation of, PCT Application No. PCT/EP2015/051774, filed on Jan. 29, 2015, entitled “METHOD AND SYSTEM FOR HANDWRITING AND GESTURE RECOGNITION,” which, in turn, claims the benefit of priority based on EP Application No. 14156530.9, filed on Feb. 25, 2014, both of which are hereby incorporated by reference.
The description generally relates to electronic data processing, and more particularly, relates to methods, computer program products and systems for handwriting and gesture recognition.
There are multiple approaches for systems and methods for electronic character recognition. Some approaches are directed to handwriting recognition of characters written onto a two-dimensional surface, such as a touch screen or specific electronic paper. In such scenarios, a decomposition of characters into strokes can be performed. In this context, a stroke corresponds to a part of the line which is drawn to form the respective character (e.g., the letter “A” consists of three strokes). Such approaches may not recognize characters which are written into the air, that is, written virtually by a user performing a three-dimensional movement, because the respective three-dimensional trajectory cannot be processed by such systems.
In some implementations, a system can analyze each stroke made by a motion of a user based on a direction of motion at the beginning portion and the end portion of the stroke. A character may then be recognized based on a combination of the strokes. The user must also manually indicate the beginning and end of each stroke to the system by using a respective switch function of the system. This manual operation of the switch prevents the user from writing continuously and fluently. Other example systems can provide solutions for the recognition of three-dimensional handwriting using a camera to determine the absolute position of the user's hand for determining the trajectory performed by the user's hand while writing.
An improvement in recognizing virtual three-dimensional handwriting (e.g., for writing letters, signs or words into the air) could be beneficial. Providing a system and method for supporting continuous and fluent writing without a need for complex systems including stationary sensors could be advantageous.
It may be advantageous to provide decoding systems and methods for improved character and handwriting recognition in the case of virtual three-dimensional handwriting. Virtual in this context is to be understood as writing without a medium (e.g., paper, display, etc.) forcing the writer to write in two dimensions. Instead, the writer can, for example, write into the air and is free to make movements in a third dimension. In the context of the following description, the term “character” refers to any letter, sign or symbol which can be composed from a sequence of strokes. This includes, for example, all characters of the American Standard Code for Information Interchange (ASCII) or Unicode but also Japanese, Chinese or other Asian characters as well as other signs like squares, circles or arrows.
In one example implementation, a decoding computer system for handwriting recognition includes an interface component for receiving measurement data from a motion sensor unit. The motion sensor unit is physically coupled with a movable part of a user's body. For example, the motion sensor unit may be attached to the user's hand. It may be part of any kind of wearable item, for example a glove, a bracelet, a watch or a ring worn by the user. It may also be imprinted onto the skin, injected into the skin, or implanted, or otherwise temporarily or permanently attached to the human body. It may also be part of a device held by the user (e.g., a smartphone, an electronic pen, etc.). Furthermore, the computer system itself may be a part of a device held or worn by the user. That is, the motion sensor unit may be attached to the user's body either temporarily or permanently. The measurement data includes sensor data of at least one sensor of the motion sensor unit. The sensor data may correspond to a second derivative with respect to time of a trajectory of the motion sensor unit. That is, the measurement data may include acceleration data provided by an acceleration sensor which is part of the motion sensor unit. Higher-order time derivatives may be used as well if an appropriate sensor is available. In an alternative implementation, the motion sensor unit may include sensors such as a gyroscope, a magnetometer or a barometer. In such implementations, the measurement data may include data regarding the rotation and orientation of the motion sensor unit or the air pressure. For example, the motion sensor may include a gyroscope in addition to or instead of an acceleration sensor, in which case the sensor data may correspond to the angular velocity. As a further example, the motion sensor may include a barometer in addition to the acceleration sensor and/or the gyroscope. The respective sensor data then further includes the air pressure.
A difference in air pressure for two locations of the motion sensor indicates a difference in height for the two sensor locations and can thus be used as a measure for vertical motion. Using a combination of such various sensor measurement data types can improve the accuracy of the handwriting recognition method. Further, such measurement data types provide measures of the relative movement of the motion sensor making a stationary fixed sensor setup obsolete because the suggested handwriting recognition does not depend on absolute location measurement of the motion sensor unit for trajectory determination.
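As a rough illustration of how a pressure difference can serve as a measure for vertical motion, the following sketch applies the hydrostatic approximation Δh ≈ Δp / (ρ·g). The function name, the sea-level air density and the sample pressure values are assumptions chosen for illustration only; they are not taken from the described system.

```python
def height_difference_m(p_first_pa, p_second_pa, air_density=1.225, g=9.80665):
    """Approximate height gained between two barometer readings (in pascals).

    Uses the hydrostatic relation dp = -rho * g * dh, which is only a
    reasonable approximation for small height differences.
    air_density (kg/m^3) is an assumed sea-level value.
    """
    return (p_first_pa - p_second_pa) / (air_density * g)


# Assumed example readings: the pressure drops by 12 Pa, so the sensor moved up.
dh = height_difference_m(101325.0, 101313.0)
print(f"approximate vertical motion: {dh:.2f} m")
```

Because only the difference between two readings is used, such a measure stays relative and does not require any absolute position reference.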
The computer system includes a data storage component for storing technical profiles of primitive motion units wherein the technical profiles include at least a plurality of predefined acceleration profiles. That is, a technical profile of a primitive motion unit in the context of this document is a profile which reflects physical data such as acceleration, orientation, rotation and/or pressure data either as raw data or in a preprocessed format wherein the physical data is associated with performing a three dimensional movement to draw or write a respective character or sign. Thereby, the physical data (e.g., acceleration, angular velocity, air pressure, etc.) characterizes the respective physical movement. In other words, each acceleration profile includes at least acceleration data characterizing a movement associated with a specific portion of a potential trajectory of the motion sensor unit in the context of at least a previous or subsequent portion of the potential trajectory. The context of a movement associated with a specific portion of a potential trajectory is defined by a previous and/or a subsequent portion. The context in which a portion of the potential trajectory is embedded has an impact on the respective technical profile and can be used to differentiate similar portions of different trajectories occurring in different contexts. For example, a context-dependent sequence of technical profiles representing an up-movement and a subsequent down-movement is different in cases where a pause is made or not between the two movements. Therefore, it may be advantageous to store context-dependent sequences of profiles because a mere concatenation of basic technical profiles (primitives) may not reflect the actual sensor measurement data in most cases.
The system further includes a decoding component for comparing the received sensor data with the plurality of predefined acceleration profiles to identify a sequence of portions of the trajectory associated with the motion sensor unit. For example, the decoding component can identify a particular character corresponding to the received sensor data if the identified sequence of portions of the trajectory of the motion sensor unit is associated with a predefined (e.g., defined by an expert or derived from available knowledge or automatically learned from training data) context-dependent sequence of portions of a specific potential trajectory representing the character. In other words, each character can be specified by one or more characteristic context-dependent sequences of technical profiles. For example, the decoder can calculate a similarity score between the received measurement data and the respective predefined context-dependent sequences of technical profiles. When Hidden Markov Models are used as technical profiles, the score can be, for example, the likelihood of a Hidden Markov Model given the observed sensor signals, computed with the Viterbi algorithm. The particular character associated with the context-dependent sequence of technical profiles with the highest similarity score is then identified by the system. Furthermore, the decoder can provide a representation of the identified handwritten text, sign or signs to an output device. For example, the text, sign or signs can be displayed on a display device or printed, or may be converted into an audio signal and conveyed as spoken language. The decoded character or sign may also be used as part of an instruction for controlling a digital device or may be used for manipulating virtual objects. For example, in case the decoded character or sign corresponds to a specific symbol like an arrow or a dash to the right, it may be used to trigger a scrolling function to the right.
For example, in case the decoded character corresponds to a push gesture (i.e., moving the hand forward like closing a door), it may be used to trigger a select function. A sequence of such symbolic characters in three-dimensional (3D) space may be used to manipulate virtual objects.
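The similarity scoring described above can be sketched with a small discrete Hidden Markov Model. The example below is a minimal illustration under stated assumptions: the two profiles ("up" and "down"), their state names, the probabilities and the quantized acceleration symbols "+", "0" and "-" are all invented, and the Viterbi score is the log-probability of the single best state path, which approximates rather than equals the full model likelihood.

```python
import math


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the best state path of a discrete HMM for obs.

    start[s]    : probability that the first state is s
    trans[s][t] : probability of moving from state s to state t
    emit[s][o]  : probability of observing symbol o in state s
    """
    def log(p):
        return math.log(p) if p > 0.0 else float("-inf")

    states = list(start)
    # Best log score of any path ending in each state after the first sample.
    best = {s: log(start[s]) + log(emit[s].get(obs[0], 0.0)) for s in states}
    for o in obs[1:]:
        best = {
            t: max(best[s] + log(trans[s].get(t, 0.0)) for s in states)
            + log(emit[t].get(o, 0.0))
            for t in states
        }
    return max(best.values())


# Hypothetical two-state left-to-right profiles (state "a", then state "b").
START = {"a": 1.0, "b": 0.0}
TRANS = {"a": {"a": 0.6, "b": 0.4}, "b": {"b": 1.0}}
PROFILES = {
    # "up": accelerate upwards first, then decelerate.
    "up": {"a": {"+": 0.8, "0": 0.1, "-": 0.1},
           "b": {"+": 0.1, "0": 0.1, "-": 0.8}},
    # "down": the mirrored pattern.
    "down": {"a": {"+": 0.1, "0": 0.1, "-": 0.8},
             "b": {"+": 0.8, "0": 0.1, "-": 0.1}},
}

observed = ["+", "+", "-", "-"]  # assumed quantized vertical acceleration
scores = {name: viterbi_log_score(observed, START, TRANS, emit)
          for name, emit in PROFILES.items()}
print(max(scores, key=scores.get))  # profile with the highest similarity score
```

Working in the log domain avoids numerical underflow when the observation sequences become long.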
In one example implementation, the received measurement data may be transformed into a feature space which is characteristic of the respective movement of the motion sensor unit. The goal of this step can be to transform the raw data in a way that the data provided to the decoding component contains only relevant information for the handwriting recognition task. Therefore, the transformed measurement data may include less data and information than the original raw sensor data characterizing the movement. For example, the preprocessing component can perform such a feature extraction from the original raw data by using mean and/or variance normalization. The stored technical profiles may characterize the portions of the potential trajectory with a corresponding representation in the feature space. This allows calculating a similarity between the transformed data extracted from the sensor data with corresponding technical profiles representing the respective portions in the feature space.
In one example implementation, a detection component may separate handwriting-related measurement data from other measurement data of the motion sensor unit. Various known data separation methods can be used for this purpose, enabling the computer system to recognize sporadic writing and, depending on the method used, also reducing the processing load for the handwriting recognition decoding. Such an automatic separation/segmentation system can enable an always-on operation mode of the proposed system, that is, the system can continuously run in the background and, therefore, allows accurate handwriting recognition for sporadic and continuous writing.
In one example implementation, a dictionary stores one or more context-dependent technical profile sequences for each identifiable character. Each context-dependent technical profile sequence is representative of a potential trajectory of the motion sensor unit associated with an identifiable character. If the dictionary includes multiple context-dependent technical profile sequences for a particular identifiable character, they can represent multiple different potential trajectories of the motion sensor to write the particular identifiable character. Such identifiable characters may be learned by the system or derived from data in an automatic fashion as described later. As a consequence, the system becomes more robust against varying character size and shape, writing habits, and other user peculiarities. The dictionary may also store context-dependent technical profile sequences for strings or words. Such a (word) context-dependent technical profile sequence represents a potential trajectory of the motion sensor unit associated with a multi-character string. It includes one or more connecting technical profiles representing connecting portions of the potential trajectory between at least a previous character and a subsequent character of the multi-character string. The connecting technical profiles facilitate the handwriting recognition in continuous writing as they represent movements performed by the user which are not part of characters but which may have impact on the resulting context-dependent sequence of technical profiles.
The data storage component may further store a group profile which represents a group of contexts. The group of contexts can be associated with multiple similar context-dependent technical profiles. By grouping such similar context-dependent technical profiles the overall number of technical profiles which need to be stored can be flexibly controlled. That means, if there is enough training data, a high number of context-dependent technical profiles can be used. If there is less training data, more context-dependent technical profiles might be grouped together to reduce the number of parameters that need to be estimated from the training data. In other words, the more context-dependent technical profiles are used, the more training data is necessary. The grouping of contexts allows for flexibly adapting to the amount of available training data. A reduced number of context-dependent technical profiles also allows for example to save memory consumed by the data storage component.
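One possible way to organize such group profiles is a two-step lookup: a context is first mapped to its group, and the stored profile is then retrieved for the group. The sketch below is only an illustration; the group name, the context names and the placeholder profile value are hypothetical.

```python
# Hypothetical mapping from a preceding context to a shared context group.
CONTEXT_GROUP = {"up": "from_below", "leftup": "from_below"}

# Stored profiles keyed by (portion, context group). The string value is a
# placeholder for whatever profile representation is actually stored.
GROUP_PROFILES = {("down", "from_below"): "profile_down_after_upward_motion"}


def lookup_profile(portion, preceding):
    """Return the profile shared by all contexts in the same group, if any."""
    group = CONTEXT_GROUP.get(preceding, preceding)
    return GROUP_PROFILES.get((portion, group))


# Both preceding contexts resolve to the same stored profile.
print(lookup_profile("down", "up") == lookup_profile("down", "leftup"))
```

Merging more contexts into one group reduces the number of stored profiles, and splitting a group increases it, which is how the amount of available training data can be accommodated.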
In one example implementation, the computer system may include a language database configured to provide to the decoding component probabilities for character sequences. Such language databases are sometimes also referred to as language models and can be used to limit the search space and to guide the search, which improves the accuracy of the handwriting recognition and reduces the decoding time. In cases where two characters are associated with very similar context-dependent sequences of technical profiles (e.g., in handwriting, lowercase “a” and lowercase “d” are written by very similar movements), the system may have difficulties in identifying the character merely based on the similarity calculation. However, the language model can provide the information that the character sequence “and” has a higher probability than the character sequence “dnd” since “and” is a frequently occurring English word while “dnd” has no meaning in the English language. This language model probability can then be used to influence the identification of the respective character. Additionally, the language model can provide information about word sequences. For example, the sequence of the words “be my guest” has a higher probability than the sequence “be my quest”. The two sequences differ only in the letters “g” and “q”, which might be hard to discriminate. The probability for the word sequences can positively influence the identification of the correct characters and words.
In further implementations, a computer program product when loaded into a memory of the computer system and executed by at least one processor of the computer system causes the computer system to execute the steps of a respective computer implemented method for performing the functions of the computer system.
Further aspects of the implementations will be realized and attained by means of the elements and combinations particularly depicted in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the implementations as described.
Briefly turning to
Returning to
The computer system 100 includes a data storage component 130 for storing technical profiles of characters wherein the technical profiles include at least a plurality of predefined acceleration profiles. Enhancing the technical profiles by orientation profiles, angular rate profiles (rotation profile) and air pressure profiles can improve the overall accuracy of the handwriting recognition system. This will be explained in more detail in
The system 100 further includes a decoding component 120 for comparing the received sensor data 11 with the plurality of predefined technical (e.g., acceleration) profiles 130 to identify a sequence of portions of the trajectory 20 associated with the motion sensor unit 10.
For example, a dictionary 140 can be used to store such character-specific context-dependent sequences of technical profiles. The decoder can compare the received measurement data with the stored context-dependent sequence profiles and output the character sequence with the highest similarity. A representation of the identified character or character sequence can be provided to the output device 200. For example, the character can be displayed on a display device (e.g., computer screen, augmented reality glasses, etc.) or printed, or it may be converted into an audio signal and conveyed as spoken language. The decoded character may also be used as part of an instruction for controlling a digital device.
The dictionary 140 is configured to define the mapping from characters and words to portions (primitives) and their respective technical profiles. The dictionary 140 can be generated from separate character and word dictionaries. A character dictionary defines the mapping from characters to portions. There can be more than one possibility to write a particular character. Therefore, for one character multiple technical profile variants can be specified/defined in the dictionary. The dictionary can be flexibly expanded by adding new characters, new words, new variants by specifying the sequences of technical profiles accordingly. Table 1 shows an example for multiple variants in the character dictionary for the letter “E”. In the example a simplified notation is used to denote portions corresponding to a “down” movement (D), portions corresponding to a “right” movement (R), portions corresponding to a “left” movement (L), portions corresponding to a “down-left” movement (DL) and portions corresponding to an “up-left” movement (UL). Further down in the specification a more granular notation will be introduced.
Table 2 shows an example of a word dictionary entry which is a straight-forward mapping of a word (string) to the respective sequence of individual characters.
Table 3 shows by way of example how the generated dictionary can look under the assumption that dictionary variants of “E” are not mixed within one word, i.e., that the user is consistent in the way of writing an “E” within one word, and that the character “L” is mapped to the portion sequence “D R”. Two consecutive characters may be linked by a connecting portion. However, this connecting portion is not mandatory. The proposed method automatically detects whether the sequence with or without the connecting portion better fits the given signal data. These portions can be optionally inserted between the strokes of the individual characters and are shown in brackets for clarity in the example dictionary entries of Table 3. For example, if all characters are written in place, that is, each character is written virtually over the other, a motion to the left is necessary between the characters for the first variant of “EEL” shown in Table 3. This is because writing the character “E” typically ends at the right side and writing of both characters “E” and “L” starts at the left side. Thus, a motion symbol for a left motion may be inserted between the characters with respect to the context.
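The generation of word entries from a character dictionary and a word dictionary can be sketched as follows. The two variants listed for “E” are invented for illustration and do not reproduce the entries of Table 1; only the mapping of “L” to “D R” and the optional left connecting portion follow the description above. The function name and the bracket notation for optional portions are likewise assumptions.

```python
from itertools import product

# Hypothetical character dictionary: each character maps to one or more
# variants, each variant being a sequence of portion symbols.
CHAR_DICT = {
    "E": [["D", "R", "UL", "R", "UL", "R"],  # illustrative variant 1
          ["L", "D", "R", "UL", "R"]],       # illustrative variant 2
    "L": [["D", "R"]],
}
# Word dictionary: straightforward mapping of a word to its characters.
WORD_DICT = {"EEL": ["E", "E", "L"]}


def expand_word(word, connector="L"):
    """Build all portion sequences for a word.

    One variant is chosen per distinct character, so variants of the same
    character are never mixed within one word. Optional connecting portions
    between characters are written in brackets, e.g. "(L)".
    """
    chars = WORD_DICT[word]
    distinct = sorted(set(chars))
    entries = []
    for choice in product(*(range(len(CHAR_DICT[c])) for c in distinct)):
        variant_of = dict(zip(distinct, choice))
        sequence = []
        for i, c in enumerate(chars):
            if i > 0:
                sequence.append(f"({connector})")
            sequence.extend(CHAR_DICT[c][variant_of[c]])
        entries.append(sequence)
    return entries


for entry in expand_word("EEL"):
    print(" ".join(entry))
```

With two variants for “E” and one for “L”, the word “EEL” yields two dictionary entries, one per consistent choice of the “E” variant.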
In one implementation, the computer system may include a preprocessing component, which transforms the raw sensor measurement data into the feature space. Thereby, the dimensionality and/or the number of samples of the raw data may be changed so that only relevant information regarding the handwriting recognition task is present after the transformation. The transformation may also contain a fusion of different sensors (i.e., multiple sensors of different sensor types). For example, the data from an accelerometer, gyroscope and magnetometer can be fused together to estimate the absolute orientation of the sensor. Several fusion techniques for such an orientation estimation can be used, for example, a Kalman filter, a Mahony filter or a Madgwick filter.
Typical preprocessing steps which can be applied to the received sensor raw data are mean normalization and variance normalization. A mean normalization removes constant offsets in the raw data signal by subtracting the signal mean (and thus setting the mean to zero). For acceleration signals, mean normalization can remove the influence of gravity acceleration to a certain extent. Gravity acceleration is always present on earth and, depending on the hardware implementation, might be measured along the axes of the acceleration sensor according to its orientation. During handwriting, the orientation of the sensor is not constant, but it is not subject to major changes either. Therefore, removing the constant offset can remove the gravity influence to some extent. If gyroscopes and magnetometers are available, a more precise method for gravity subtraction can be used. For example, Kalman filter and Madgwick/Mahony filter methods can be used to estimate sensor orientation from the measurement data over time. The estimated orientation can be used to subtract the gravity instead of simply subtracting the signal mean. The estimated orientation might also be used as additional dimensions in the feature space.
Variance normalization sets the variance of the raw data signal to one and can compensate for high or low amplitudes caused by different writing speeds and styles (both writing speed and style may vary across users). The joint application of mean and variance normalization is commonly denoted as z-normalization.
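A minimal sketch of z-normalization for a single signal axis is shown below. The function name and the sample values (a vertical acceleration axis with an assumed gravity offset of about 9.81 m/s²) are illustrative assumptions.

```python
from statistics import fmean, pstdev


def z_normalize(samples):
    """Subtract the mean and divide by the standard deviation of one axis.

    A constant signal has zero variance, so it is mapped to all zeros
    instead of dividing by zero.
    """
    mean = fmean(samples)
    std = pstdev(samples)
    if std == 0.0:
        return [0.0 for _ in samples]
    return [(x - mean) / std for x in samples]


# Assumed raw samples of one acceleration axis including a gravity offset.
raw_axis = [9.81, 10.81, 8.81, 9.81]
normalized = z_normalize(raw_axis)
print([round(v, 3) for v in normalized])
```

For multi-axis data the same function would be applied to each axis separately, so that every dimension ends up with zero mean and unit variance.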
Other optional preprocessing steps can include filtering of the received raw sensor data signal. As human motion usually is associated with low frequencies (approx. <30 Hz), the data signals may, for example, be low-pass filtered to eliminate high frequency parts in the signal caused by tremor of the user or sensor noise. Besides the signal normalization the raw signal may be filtered with a moving average filter, which is one option within a wide range of other known filters.
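The moving average filter mentioned above can be sketched as follows. This is a causal variant that averages the current sample with the preceding ones; the function name and the window width are assumptions, and other filter designs would serve the same purpose.

```python
def moving_average(signal, width):
    """Causal moving average: each output is the mean of the last `width`
    samples seen so far (fewer at the very beginning of the signal)."""
    if width < 1:
        raise ValueError("width must be at least 1")
    smoothed = []
    running_sum = 0.0
    for i, x in enumerate(signal):
        running_sum += x
        if i >= width:
            # Drop the sample that just left the window.
            running_sum -= signal[i - width]
        smoothed.append(running_sum / min(i + 1, width))
    return smoothed


# A single spike (e.g. sensor noise) is spread out and reduced in amplitude.
print(moving_average([0, 0, 3, 0, 0], width=3))
```

A wider window suppresses more high-frequency content but also blurs fast, intentional writing movements, so the width would need to be tuned to the sampling rate.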
Other transformations commonly used in signal preprocessing and feature extraction for pattern recognition include, but are not limited to, integration and differentiation of the signal, down- or upsampling, signal compression, changing bit resolution, and application of a windowing function to compute statistical features like the signal mean within a window or in combination with the short-time Fourier transformation for the extraction of spectral features. Methods like Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA) or Independent Component Analysis (ICA) are commonly used to reduce the dimensionality of the feature space. Signal approximation methods such as spline approximation, piecewise linear approximation, symbolic representation or wavelet transform might be used to represent the signal in a compressed form. Information fusion might be used to fuse sensor measurements to generate higher-level information, e.g., computation of the sensor orientation from acceleration, gyroscope and magnetometer readings by using a Kalman filter or one of its variants, to name only a few options. The transformation may also contain stacking of subsequent samples over time or n-th order derivatives of subsequent samples.
Any sort of combination of methods might be used jointly to transform the original signal (measurement raw data). The output of the preprocessing is the signal transformed into the feature space. The dimensionality and the number of samples might change through the transformation. The transformed samples are usually called feature vectors and thus, the output of the preprocessing is a sequence of feature vectors.
A sequence of characteristic feature vectors of the sensor measurement data signals can be extracted from the received sensor measurement data. For example, a windowing function can be applied to the measurement data or preprocessed data and the average per window is computed for each of the signal dimensions x, y, z. Other approaches include but are not limited to the usage of signal peaks or the zero-crossing rate. The output of such data preprocessing is the transformation of the original signal to the feature space.
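The windowed averaging described above can be sketched as follows. The function name, window length and step size are illustrative assumptions; each output tuple is one feature vector holding the mean of the x, y and z dimensions within a window.

```python
def window_mean_features(samples, window, step):
    """Turn a list of (x, y, z) samples into a sequence of feature vectors.

    Each feature vector holds the per-axis mean of one window. Windows that
    would extend past the end of the data are dropped.
    """
    features = []
    for start in range(0, len(samples) - window + 1, step):
        chunk = samples[start:start + window]
        # zip(*chunk) regroups the samples by axis: all x, all y, all z.
        features.append(tuple(sum(axis) / window for axis in zip(*chunk)))
    return features


samples = [(0, 0, 0), (2, 2, 2), (4, 4, 4), (6, 6, 6)]
print(window_mean_features(samples, window=2, step=2))
```

Choosing a step smaller than the window length produces overlapping windows, which increases the number of feature vectors per second of signal.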
The decoding component 120 may use statistical and probabilistic techniques such as Hidden Markov Models (HMM), Conditional Random Fields (CRF), or shallow and deep neural networks, together with an appropriate algorithm to solve the HMM or CRF decoding problem (such as appropriate message passing variants of Viterbi beam search with a prefix tree, stack decoder strategies like A* search, or finite state transducers). In case HMMs are used, the output of the decoding component is an n-best list of the n best hypotheses together with the respective likelihood scores which quantify the likelihood that a hypothesis represents the given signal. The 1-best hypothesis, i.e., the one with the highest likelihood, is typically taken as the recognizer output. The likelihood is computed by quantifying the grade of fit between the predefined technical profiles and the observed feature sequence associated with the trajectory 20. Thus, the likelihood can be used as a measure of similarity between the technical profiles and the sensor signals transformed into the feature space. It may also take into account the likelihood of the character and/or word sequence in general. The latter can be computed by integrating the language model (language database) 150 into the decoding process. The language model includes probabilities for sequences of characters or words, typically specific to the language and the domain of application.
A character language model can return the probability of a character given a fixed number of its predecessors. A word language model returns the probability for a word given a history of words observed so far. This allows exploiting syntactic and semantic properties of a language by e.g. deriving statistical properties from training text via machine learning algorithms, which people skilled in the art of speech and handwriting recognition are familiar with. A language model can be implemented as a statistical n-gram model or a grammar. A grammar may restrict the character sequences that can be recognized, and may allow for greater robustness of the recognition. The influence of the likelihoods of the technical profiles versus the influence of the language model can be adjusted by a weighting factor. In other words, the decoding component may use two different kinds of probability scores, e.g., probability scores that quantify the similarity with motion patterns and probability scores of character or word occurrence in sequences.
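How the weighting factor balances the two kinds of scores can be sketched in the log domain. All numbers below are invented for illustration, as are the function names: the motion log-likelihoods stand in for the similarity between the signal and the technical profiles, and the language model probabilities stand in for values a trained model would return.

```python
import math


def combined_score(motion_loglik, lm_prob, lm_weight):
    """Motion-model log-likelihood plus weighted language-model log-probability."""
    return motion_loglik + lm_weight * math.log(lm_prob)


# Hypothetical hypotheses: (motion log-likelihood, language-model probability).
# "dnd" fits the motion data slightly better, "and" is far more probable text.
HYPOTHESES = {
    "and": (-12.4, 0.02),
    "dnd": (-12.1, 0.00001),
}


def best_hypothesis(lm_weight):
    return max(
        HYPOTHESES,
        key=lambda h: combined_score(HYPOTHESES[h][0], HYPOTHESES[h][1], lm_weight),
    )


print(best_hypothesis(lm_weight=0.0))  # language model ignored
print(best_hypothesis(lm_weight=1.0))  # language model taken into account
```

With a weight of zero only the motion patterns decide, so the slightly better-fitting “dnd” wins; with a weight of one the language model outweighs the small difference in the motion score and “and” is selected.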
In one implementation, the computer system may further include a detection component 160 configured to filter the received sensor data so that only sensor data associated with writing motion for a character is provided to the decoding component. Thereby, the system can continuously run in the background and automatically detect when a user writes. This may provide more convenience to the user, as no switch or special gesture has to be manually activated to indicate the beginning and end of writing. In other words, this feature enables the user to permanently wear the system without having to manually or consciously switch the system ON and OFF; it will always run but only recognize characters or words when the user indeed writes. The detection component 160 can segment the incoming measurement data 11 into handwriting and non-handwriting parts. For example, this can be achieved by using a binary classification: the incoming data stream is windowed by applying a sliding window; the individual windows are then classified as handwriting or non-handwriting, resulting in a segmentation of the input signal.
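The sliding-window segmentation can be sketched as follows. The binary classifier used here is a simple signal-energy threshold, which is only a stand-in chosen to keep the example self-contained; the function names, window size, step and threshold are assumptions, and any trained binary classifier could be substituted.

```python
def segment_handwriting(signal, window, step, is_handwriting):
    """Return (start, end) sample ranges classified as handwriting.

    The signal is cut into sliding windows, each window is classified, and
    touching or overlapping handwriting windows are merged into one segment.
    """
    segments = []
    for start in range(0, len(signal) - window + 1, step):
        end = start + window
        if is_handwriting(signal[start:end]):
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend previous segment
            else:
                segments.append((start, end))
    return segments


def energy_classifier(threshold):
    """Stand-in binary classifier: a window counts as handwriting when its
    mean squared amplitude exceeds the threshold."""
    def classify(chunk):
        return sum(x * x for x in chunk) / len(chunk) > threshold
    return classify


# Assumed one-dimensional signal: rest, a burst of motion, rest again.
signal = [0, 0, 0, 0, 2, -2, 2, -2, 0, 0, 0, 0]
print(segment_handwriting(signal, window=2, step=2,
                          is_handwriting=energy_classifier(1.0)))
```

Only the samples inside the returned ranges would be forwarded to the decoding component, which is what reduces the decoding load in an always-on mode.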
Another possibility for separating handwriting-related data from other measurement data is to handle the non-handwriting motion by a garbage model within a Hidden-Markov-Model decoder. That is, in addition to the character related technical profiles, a special technical profile for non-handwriting (garbage) motion is created. Another possibility is to use a threshold based approach.
In
In
In
All other codes can be decoded according to this scheme. This scheme can be used to create models of characters as sequences of respective portions. Following this scheme, for example, the sequence of portions describing the movements performed to write the letter B in
For each portion a respective technical profile can be created. Such technical profile is dependent on the context of the portion, because the physical parameters characterizing the portion depend on the previous and subsequent portions. The representation of characters by such technical profiles is very flexible, because it does not require the modeling of complete character models. Rather, any arbitrary character or symbol can be built from such basic context-dependent or context aware primitives, and as a consequence can also be recognized by the decoding component of the system. Therefore, new symbols, characters, and words can be defined and integrated on the fly without the need to change, modify, or retrain the existing models and systems.
The signal patterns for individual portions vary depending on the preceding and/or subsequent portions. For example, it can make a great difference for context-dependent sequences of technical profiles whether two consecutive movements (portions of the trajectory) are performed with or without a pause in between.
Another example is given in
The following naming convention will be used for denoting a portion in the context of the preceding and succeeding (subsequent) portions: p(pp|sp) where p is the portion, pp denotes the preceding portion (preceding context) and sp denotes the subsequent portion (succeeding context).
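The naming convention can be applied mechanically to a sequence of portions, as the following sketch shows. The function name, the portion names and the marker "-" used for a missing context at the start or end of a sequence are assumptions made for illustration.

```python
def context_names(portions, boundary="-"):
    """Label each portion as p(pp|sp) with its preceding and subsequent portion.

    `boundary` is an assumed placeholder for the missing context at the
    beginning and end of the sequence.
    """
    names = []
    for i, p in enumerate(portions):
        pp = portions[i - 1] if i > 0 else boundary
        sp = portions[i + 1] if i + 1 < len(portions) else boundary
        names.append(f"{p}({pp}|{sp})")
    return names


print(context_names(["up", "down", "right"]))
# → ['up(-|down)', 'down(up|right)', 'right(down|-)']
```

Each resulting name identifies one context-dependent technical profile, so the same portion can refer to different profiles depending on its neighbors.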
For example, technical profiles can be stored for all possible combinations of preceding and succeeding contexts. Alternatively, technical profiles can be created and stored for groups of contexts. For example, only one technical acceleration profile for the “down” portion in the context of the preceding portions of “leftup” and “up” can be created since the pattern of “down” is almost the same for both preceding contexts (cf.,
The figure relates to the example of writing the character “A”. The respective sequence of portions related to the character strokes can be described as: s:c-ur, s:c-dr, s:c-ul, s:c-r.
As mentioned earlier, the accuracy of handwriting recognition may be improved by complementing acceleration sensor data with further sensor data, such as rotation and/or orientation sensor data. All three types of sensor data can be used for handwriting recognition based on relative sensor data. In other words, acceleration sensor data, orientation sensor data and rotation sensor data can be used to determine a match with the predefined technical profiles stored in the system without a need to know the absolute position in space of the motion sensor unit. Therefore, there may be no need to have a complex static system with stationary camera sensors. Accurate handwriting recognition according to some implementations only relies on data, which are measured by sensors independent of any absolute spatial coordinates. The figure shows example signals recorded by a motion sensor unit 10 (cf.
The computer system receives 1100 sensor measurement data from a motion sensor unit physically coupled with a movable part of a user's body. The sensor measurement data includes a second derivative with respect to time of a trajectory of the motion sensor unit. The trajectory includes a sequence of portions corresponding to a movement performed by the user. For example, the user moves his or her hand with the attached motion sensor unit and the motion sensor unit may record measurement data regarding the acceleration, rotation or orientation of the motion sensor unit over time. Such data may then be received by the computer system.
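The relationship between a trajectory and the acceleration data can be illustrated numerically. The following Python sketch is only an illustration under stated assumptions: the function name `second_derivative` and the sample trajectory are hypothetical, and the described system receives acceleration directly from the sensor rather than computing it from positions.

```python
def second_derivative(samples, dt):
    """Central second difference: (x[i-1] - 2*x[i] + x[i+1]) / dt**2."""
    return [
        (samples[i - 1] - 2.0 * samples[i] + samples[i + 1]) / dt ** 2
        for i in range(1, len(samples) - 1)
    ]


dt = 0.5
# Hypothetical one-dimensional trajectory x(t) = 0.5 * a * t**2 with a = 3.
positions = [0.5 * 3.0 * (k * dt) ** 2 for k in range(6)]
accel = second_derivative(positions, dt)
print(accel)  # every entry equals the constant acceleration a = 3.0
```

Because only this second derivative is observed, the absolute position along the trajectory is not needed for the comparison with the technical profiles.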
The computer system can compare 1400 the received sensor measurement data with a plurality of sequences of technical profiles. Such technical profiles include at least a plurality of predefined acceleration profiles. In alternative implementations, the technical profiles may further include orientation, rotation and/or pressure profiles. Each acceleration profile includes information on the distribution of acceleration data characterizing a movement associated with a specific portion of a potential trajectory of the motion sensor unit in the context of at least a previous or subsequent portion of the potential trajectory. Thereby, each technical profile may include a stochastic description of the evolution of the measurements over time, where the measurement data characterize the movement associated with the respective specific portion of the potential trajectory of the motion sensor unit. As a result, an exact match between the measurement data and the respective technical profiles is not required. A similarity within a predefined similarity range can be sufficient to identify the respective technical profiles. Possible sequences of context-dependent technical profiles are defined in the dictionary. The received sensor data or its transformation to the feature space is aligned with the possible sequences of technical profiles (e.g., characters or words). A similarity score is computed for the possible sequences of technical profiles and the received sensor data or its transformation to the feature space. To align the data with a sequence of technical profiles, the technical profiles are concatenated according to the dictionary to form new virtual technical profiles representing sequences of the original technical profiles. The individual technical profiles in the sequence are chosen according to their context, i.e., according to the previous and subsequent technical profiles.
The sequence of technical profiles with the highest similarity score is selected as output.
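The comparing step can be sketched in Python as follows. This is a simplified illustration, not the implementation of the described system: each technical profile is reduced to a single Gaussian (mean, variance) over a one-dimensional feature, transition probabilities are omitted, and the names `profile_store`, `dictionary`, `concatenate` and `alignment_score`, as well as all numeric values and dictionary entries, are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def concatenate(portions, profile_store):
    """Pick the context-dependent profile for each portion and chain them."""
    seq = []
    for i, p in enumerate(portions):
        pp = portions[i - 1] if i > 0 else None
        sp = portions[i + 1] if i + 1 < len(portions) else None
        seq.append(profile_store[(p, pp, sp)])
    return seq


def alignment_score(observations, profiles):
    """Best log-likelihood of a monotonic alignment in which every profile
    covers at least one observation (a left-to-right Viterbi pass)."""
    n, m = len(observations), len(profiles)
    if m == 0 or n < m:
        return -math.inf
    prev = [-math.inf] * m
    prev[0] = log_gauss(observations[0], *profiles[0])
    for t in range(1, n):
        cur = [-math.inf] * m
        for j in range(m):
            # Either stay in profile j or advance from profile j - 1.
            best = prev[j] if j == 0 else max(prev[j], prev[j - 1])
            if best > -math.inf:
                cur[j] = best + log_gauss(observations[t], *profiles[j])
        prev = cur
    return prev[-1]


# Hypothetical profiles keyed by (portion, preceding, succeeding).
profile_store = {
    ("up", None, "down"): (1.0, 0.25),
    ("down", "up", None): (-1.0, 0.25),
    ("down", None, "up"): (-1.0, 0.25),
    ("up", "down", None): (1.0, 0.25),
}
# Hypothetical dictionary entries mapping a label to its portion sequence.
dictionary = {"up-down": ["up", "down"], "down-up": ["down", "up"]}

observations = [0.9, 1.1, 1.0, -0.8, -1.2]
scores = {
    entry: alignment_score(observations, concatenate(portions, profile_store))
    for entry, portions in dictionary.items()
}
best = max(scores, key=scores.get)
print(best)  # up-down
```

The log-likelihood plays the role of the similarity score here, so a close but inexact match still identifies the entry.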
If the motion sensor unit also provides rotation and/or orientation data, the technical profiles may further include respective predefined rotation data, orientation data and/or pressure data associated with the specific portions of the potential trajectory of the motion sensor unit in the context of at least a previous or subsequent portion of the potential trajectory. This can increase the overall accuracy of the handwriting recognition method. The disclosed four sensor measurement data types (acceleration, orientation, rotation, air pressure) are suitable to measure the relative movements of the motion sensor unit in all spatial dimensions over time without a need to have a complex stationary sensor in place because the disclosed handwriting recognition method is not dependent on any absolute position values for the determination of the trajectory of the motion sensor unit.
In one implementation, the system can transform 1200 the received sensor data into the feature space to compare the transformed data with the representations in the technical profiles. The representations (i.e., the transformed motion sensor measurement data) are representative of the acceleration data, rotation data, orientation data and/or pressure data of the motion sensor data. In other words, the measurement data is transformed to the feature space, which might have a different dimensionality and a different number of samples per time unit. The samples in the feature space are called feature vectors and, therefore, the transformation results in a sequence of feature vectors. Such a sequence of feature vectors extracted from the received sensor data can then be compared to a corresponding technical profile. Thereby, each technical profile may include a stochastic description of the evolution of each feature over time, where the features characterize the movement associated with the respective specific portion of the potential trajectory of the motion sensor unit. As a result, an exact match between the feature vectors derived from the measurement data and the respective technical profiles is not required. A similarity within a predefined similarity range can be sufficient to identify the respective technical profiles. The use of preprocessed features instead of raw measurement data may reduce the amount of data to be stored and processed by the system and may allow for a better generalization of the technical profiles as well as for a higher accuracy of the handwriting recognition method.
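One possible transformation into a feature space is sketched below in Python. The description does not prescribe a specific transformation; per-channel mean removal followed by averaging over windows is an assumption chosen for illustration, and the function name `to_feature_vectors` and its parameters `window` and `step` are hypothetical.

```python
def to_feature_vectors(samples, window, step):
    """Turn raw samples (one tuple of channel values per time step) into
    feature vectors: subtract each channel's mean, then average over windows
    of `window` samples taken every `step` samples."""
    if not samples:
        return []
    channels = len(samples[0])
    means = [sum(s[c] for s in samples) / len(samples) for c in range(channels)]
    centered = [[s[c] - means[c] for c in range(channels)] for s in samples]
    features = []
    for start in range(0, len(centered) - window + 1, step):
        block = centered[start:start + window]
        features.append(
            [sum(row[c] for row in block) / window for c in range(channels)]
        )
    return features


# Four raw samples with two channels become two feature vectors.
raw = [(0.0, 1.0), (2.0, 1.0), (4.0, 1.0), (6.0, 1.0)]
features = to_feature_vectors(raw, window=2, step=2)
print(features)  # [[-2.0, 0.0], [2.0, 0.0]]
```

In this example the feature sequence has half as many samples per time unit as the raw data, which shows how the feature space can reduce the amount of data to be compared.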
In one implementation, upon receipt of the sensor measurement data, the system can separate 1300 handwriting-related measurement data from other measurement data of the motion sensor unit. This preprocessing step allows for continuous operation of the system in the background without the need to explicitly switch it on or off. It further helps to reduce the amount of data that needs to be processed by the comparing step and, therefore, contributes to improving the performance of the handwriting recognition method 1000.
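The passage above does not state how the separation is performed. Purely for illustration, the following Python sketch assumes that handwriting shows up as runs of samples whose signal magnitude stays at or above a threshold. The function name `handwriting_segments`, the parameters `threshold` and `min_length`, and the sample values are all hypothetical.

```python
def handwriting_segments(magnitudes, threshold, min_length):
    """Return (start, end) index pairs, end exclusive, of runs in which the
    magnitude stays at or above `threshold` for at least `min_length` samples."""
    segments = []
    start = None
    for i, value in enumerate(magnitudes):
        if value >= threshold:
            if start is None:
                start = i  # a new candidate run begins
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that lasts until the end of the signal.
    if start is not None and len(magnitudes) - start >= min_length:
        segments.append((start, len(magnitudes)))
    return segments


magnitudes = [0.1, 0.2, 1.5, 1.7, 1.6, 0.1, 1.4, 0.2, 1.3, 1.5]
segments = handwriting_segments(magnitudes, threshold=1.0, min_length=2)
print(segments)  # [(2, 5), (8, 10)]
```

Only the samples inside the returned segments would be passed on to the comparing step, so short spikes such as the single sample at index 6 are discarded.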
The system can then identify 1500 a particular sign, character or word corresponding to the received sensor data, either on the basis of the raw measurement data or by using the representation of the data transformed to the feature space representing the motion characteristics. If the identified sequence of portions of the trajectory is associated with a predefined context-dependent sequence of portions of a specific potential trajectory representing the particular sign, character or word, the particular sign, character or word is identified. Finally, the system provides 1600 a representation of the identified sign, character or word to an output device. It may be appreciated that in the description of the computer system further optional method steps are disclosed which can be combined with the computer implemented method 1000, such as for example the use of a language model.
The training sample data 12, 13 can include data from different users and multiple instances of the recorded characters. Multiple instances of a recorded character can also be based on different writing habits of different users with regard to different sequences of strokes being used for writing the same character. Typically, little variation is observed in the way one and the same person writes certain characters, whereas more variation is observed in the way different people write a specific character. The training component 170 is configured to identify the technical profiles that represent this variety in the movement of the users reflected by the resulting acceleration, rotation, orientation and/or pressure sensor signals. This can be achieved by using statistical and/or probabilistic methods such as Hidden Markov Models (HMMs). The training component may use one of the known training methods for HMMs, such as the Baum-Welch algorithm, Viterbi training or discriminative training.
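As a heavily simplified illustration of such training, the following Python sketch shows only the re-estimation step of Viterbi training: given feature values that have already been aligned to context-dependent profiles, it estimates a Gaussian mean and variance per profile. The single-Gaussian profile, the function name `reestimate`, the variance floor `var_floor` and the sample data are assumptions of this sketch. A full HMM training procedure would alternate this step with a re-alignment of the training data.

```python
from collections import defaultdict


def reestimate(aligned_samples, var_floor=1e-3):
    """aligned_samples: iterable of (profile_key, value) pairs produced by an
    alignment of training data to profiles. Returns key -> (mean, variance)."""
    grouped = defaultdict(list)
    for key, value in aligned_samples:
        grouped[key].append(value)
    profiles = {}
    for key, values in grouped.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        # Floor the variance so that sparsely observed profiles stay usable.
        profiles[key] = (mean, max(var, var_floor))
    return profiles


# Hypothetical aligned training values, e.g. pooled from several users.
aligned = [("down(up|-)", -1.0), ("down(up|-)", -3.0), ("up(-|down)", 2.0)]
profiles = reestimate(aligned)
print(profiles["down(up|-)"])  # (-2.0, 1.0)
```

Pooling instances from different users into the same profile is what lets the estimated variance capture the variety in how a portion is written.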
In the next step the training component 170 (cf.
The training component may also create one technical profile for a group of contexts. That is, the technical profile is the same for a given set of preceding or succeeding contexts. For example, this may be useful if a portion of a trajectory is the same for a number of preceding and succeeding contexts. By grouping such similar context-dependent technical profiles, the overall number of technical profiles that need to be stored can be reduced to save memory consumed by the data storage component. Additionally, the number of parameters that need to be estimated during training can be reduced, thus improving the system performance. Such groups can be defined by experts, or automatically by the system by first creating technical profiles for all possible combinations of contexts and afterwards subsuming all contexts for which the technical profiles are similar according to a similarity measure for technical profiles. This may be done by a clustering algorithm, e.g., k-means, or based on a pre-defined threshold for the maximal similarity.
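The automatic grouping can be sketched in Python as a greedy, threshold-based procedure. This is one possible reading of the threshold variant, not the described system's algorithm: each profile is represented only by a mean vector, similarity is expressed as a Euclidean distance, and the function name `group_contexts`, the parameter `max_distance` and all numeric values are hypothetical.

```python
import math


def group_contexts(profiles, max_distance):
    """profiles: dict mapping a context label to a mean feature vector.
    A context joins the first group whose representative vector lies within
    `max_distance`; otherwise it starts a new group."""
    groups = []  # list of (representative_vector, [context labels])
    for context, vector in profiles.items():
        for representative, members in groups:
            if math.dist(representative, vector) <= max_distance:
                members.append(context)
                break
        else:
            groups.append((vector, [context]))
    return [members for _, members in groups]


# Hypothetical mean vectors for the "down" portion in three contexts.
down_profiles = {
    "down(leftup|-)": (-1.0, 0.1),
    "down(up|-)": (-1.05, 0.12),
    "down(right|-)": (0.4, -0.9),
}
grouped = group_contexts(down_profiles, max_distance=0.2)
print(grouped)  # [['down(leftup|-)', 'down(up|-)'], ['down(right|-)']]
```

Each resulting group would then share a single stored technical profile, which reduces both memory use and the number of parameters to estimate.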
Method steps of the implementations can be performed by one or more programmable processors executing a computer program to perform functions of the implementations by operating on input data and generating output. Method steps can also be performed by, and apparatus can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computing device. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are at least one processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. Such storage devices may also be provisioned on demand and be accessible through the Internet (Cloud Computing). Information carriers suitable for embodying computer program instructions and data include all forms of non-volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, a computer having a display device, e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor, for displaying information to the user and an input device such as a keyboard, touchscreen or touchpad, a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer, can be used in some or all implementations as described herein. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
A computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with implementations, or any combination of such back-end, middleware, or front-end components, can be used in some or all implementations described herein. Client computers can also be mobile devices, such as smartphones, tablet PCs or any other handheld computing device. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet or wireless LAN or telecommunication networks.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Number | Date | Country | Kind |
---|---|---|---|
14156530.9 | Feb 2014 | EP | regional |

Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/EP2015/051774 | Jan 2015 | US |
Child | 15246639 | | US |