1. Field of the Invention
The present invention relates to the field of image recognition, and in particular to a method and apparatus for accelerated handwritten symbol recognition in a pen based tablet computer.
2. Background Art
In some computer systems, handwritten symbols are input to the system. These symbols are translated by the computer system into machine readable characters. This translation is typically computation intensive. In some computer systems, for example battery operated portable devices, the general purpose central processing unit (CPU) used for the translation is inefficient in its power consumption during the translation operation. Thus, the battery is drained more rapidly. Additionally, some battery operated systems are limited in computational power. When a real-time requirement is placed on symbol translation, the limited computational power results in a limited degree of accuracy in the translation process. These problems can be better understood with a review of handwritten data entry.
Handwritten Data Entry
A typical computer system consists of a central processing unit (CPU), main memory such as random access memory (RAM), a data entry device, including a positioning device, a mass storage device such as one or more disk drives, a display and/or a printer. In the prior art, the data entry device often consists of a keyboard, on which a user enters data by typing. The positioning device of a prior art computer system may consist of a “mouse” or other cursor positioning device.
Computer systems also exist that are directed to handwritten data entry rather than keyboard data entry. These systems are often characterized by the use of a pen, stylus, or other writing device to enter handwritten data directly on the display of the computer system. Alternatively, these systems may provide for a user to enter data on a digitizing tablet or other input device, with the image of the written input displayed on a separate computer display output device. The writing device for entering handwritten or freestyle stroke input information is not limited to a pen or stylus, but may be any input device such as a mouse, trackball, pointer, or even a person's fingers. Such systems are not necessarily limited to receiving data generated by human users; for example, machine generated data may also be input to and accepted by such systems.
One class of handwriting entry computer system that receives handwritten data input is referred to as a “pen based” computer system. In a pen based computer system, a writer can input information on a display by “writing” directly on the display. A writing device, such as a pen or stylus, is used to enter information on the display. In a typical pen based computer system, a user touches the stylus to the display and writes as the user would on a piece of paper, by making a series of pen strokes to form letters and words. A line appears on the display that follows the path of travel of the pen point, so that the pen strokes appear on the display as ink would appear on a handwritten page. Thus, the user can enter information into the computer by writing on the display. Pen based computers typically have a display surface that serves as both an input receiving device and as an output display device.
Handwritten Data Translation
One characteristic of handwriting entry computer systems is the ability to translate original handwritten symbols into machine readable words or characters for display. This translation is accomplished via a “character recognition” algorithm. The handwritten symbols are translated into, for example, ASCII characters. After the translation, the appearance of the displayed characters is as if they had been typed in via a keyboard.
To translate a handwritten character into a machine readable character, the handwritten character is compared to a library of characters to determine if there is a match. A description, or “template” for each character is defined and stored in memory. Handwritten characters are compared to the stored templates. Match coefficients, reflecting how closely a handwritten character matches the template of a stored character, are calculated for each template character. The template character with the highest match coefficient is identified. The character represented by this template provides the “best fit” for the handwritten character. If the match coefficient for the “best fit” character exceeds a predetermined minimum threshold, the “best fit” character is adopted. If the match coefficient for the “best fit” character is less than the minimum threshold value, no translation is done. If the handwritten character cannot be translated, the character must be re-entered.
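By way of illustration only, the following Python sketch shows this best-fit-with-threshold scheme. The match_score function, the template encoding, and the 0.8 threshold are hypothetical placeholders, not the method of any particular prior art system.

```python
# Hypothetical illustration of threshold-based template matching.

def match_score(sample, template):
    # Placeholder similarity measure: fraction of matching points (0.0 to 1.0).
    hits = sum(1 for s, t in zip(sample, template) if s == t)
    return hits / max(len(template), 1)

def recognize(sample, templates, threshold=0.8):
    # Compute a match coefficient for every stored template character.
    best_char, best_coeff = None, 0.0
    for char, template in templates.items():
        coeff = match_score(sample, template)
        if coeff > best_coeff:
            best_char, best_coeff = char, coeff
    # Adopt the best fit only if it clears the minimum threshold;
    # otherwise no translation is done and the character must be re-entered.
    return best_char if best_coeff >= threshold else None
```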
A disadvantage of current character recognition algorithms is limited accuracy. Often, handwritten characters are not translated at all or are mistranslated as an ASCII character other than the handwritten character. The mistranslated character must then be rewritten by the user, sometimes repeatedly, until a correct translation is made.
Handwriting Recognition in Portable Systems
A portable pen-based computer system is constrained by the amount of power stored in its battery. Typically, portable pen-based computer systems that require handwriting recognition (HWR) rely on grid-based single-character recognition, which forces users to print characters in stylized formats. This approach is not suitable for entering large text segments. A better approach for entering large text segments is to enable users to write naturally on the screen in their own personal, unconstrained style using HWR algorithms. However, HWR algorithms require a large amount of computation to translate handwritten symbols into machine readable characters. Typical portable pen-based computer systems lack the computational power necessary to satisfactorily perform these translations.
Typical portable pen-based computer systems use a general purpose CPU for HWR calculations. A general purpose CPU is typically inefficient in its power consumption during HWR calculations: the CPU is designed to perform more than HWR calculations, so some functions of the CPU are powered but not used during the HWR calculation. A general purpose CPU is also inefficient in speed during HWR calculations, because it must perform certain operating system tasks while completing HWR calculations. Thus, the speed with which HWR calculations are completed is diminished, and fewer HWR calculations may be completed in a limited amount of time. If the time for HWR is limited, the accuracy of the translation is therefore also limited.
Single Symbol Translation
Typically, portable pen-based computer systems translate one character at a time. However, such a scheme is difficult when a user has poor handwriting, because each character must be recognized in isolation, without the context of the surrounding symbols.
Summary of the Invention

The present invention provides a method and apparatus for accelerated handwritten symbol recognition in a pen based tablet computer. In one embodiment, handwritten symbols are translated into machine readable characters using hidden Markov models. In one embodiment, handwritten symbols are translated into machine readable characters using special purpose hardware. In one embodiment, the special purpose hardware is a recognition processing unit (RPU) which performs feature extraction and recognition. A user inputs the handwritten symbols and a software recognition engine preprocesses the input to a reduced form. In one embodiment, the preprocessor is fully information preserving.
The data from the preprocessor is sent to the RPU which performs feature extraction and recognition. In one embodiment, the RPU has memory and the RPU operates on data in its memory. In one embodiment, the RPU uses a hidden Markov model (HMM) as a finite state machine that assigns probabilities to a symbol state based on the preprocessed data from the handwritten symbol. In another embodiment, the RPU recognizes collections of symbols, termed “wordlets,” in addition to individual symbols.
In one embodiment, the software recognition engine uses the data from the RPU in a postprocessor. The postprocessor computes a stream of symbol observation events from data produced by the RPU and writer confirmation data. In one embodiment, the postprocessor also uses information about context, spelling, grammar, past word usage and user information to improve the accuracy of the symbols produced.
These and other features, aspects and advantages of the present invention will become better understood with regard to the following description, appended claims and accompanying drawings.
Detailed Description of the Invention

The invention is a method and apparatus for accelerated handwritten symbol recognition in a pen based tablet computer. In the following description, numerous specific details are set forth to provide a more thorough description of embodiments of the invention. It is apparent, however, to one skilled in the art, that the invention may be practiced without these specific details. In other instances, well known features have not been described in detail so as not to obscure the invention.
Handwriting Recognition Calculations
In one embodiment, HMM calculations are used to determine the probability of a symbol appearing in a sequence of symbol observations. An HMM with N states, M observation symbols, the state alphabet Vs={S1, S2, . . . , SN} and the emission alphabet Ve={v1, v2, . . . , vM} is defined by the triplet λ=(A, B, π). A is the state transition matrix defined as aij=P(qt+1=Sj|qt=Si) for 1≦i≦N and 1≦j≦N, which means the probability that the state at time t+1 is state j given that the state at time t is state i. B is the observation probability matrix defined as bj(k)=P(vk|qt=Sj) for 1≦j≦N and 1≦k≦M, which means the probability of the observation being vk given that the state at time t is state j. π is the initial state distribution defined as πi=P(q1=Si), which means the probability that the state at time 1 is state i.
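For illustration, the triplet λ=(A, B, π) can be held as three arrays, as in this Python/NumPy sketch; the sizes N and M and the uniform initialization are arbitrary assumptions, not values from the invention.

```python
import numpy as np

# lambda = (A, B, pi) for an HMM with N states and M observation symbols.
N, M = 5, 8                      # example sizes, chosen arbitrarily
A  = np.full((N, N), 1.0 / N)    # A[i, j] = P(q_{t+1} = S_j | q_t = S_i)
B  = np.full((N, M), 1.0 / M)    # B[j, k] = P(v_k | q_t = S_j)
pi = np.full(N, 1.0 / N)         # pi[i]   = P(q_1 = S_i)

# Each row is a probability distribution and must sum to one.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```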
If we have an observation sequence O=(o1o2 . . . oT), the RPU calculates the probability of this sequence given the model λ. This value is calculated by determining a series of values termed forward variables defined as αi(t)=P(o1o2 . . . ot, qt=Si|λ), which means the probability of the observation sequence from 1 to t and the state at time t being state i given λ. These values are calculated by initializing the variable as αi(1)=πibi(o1) for 1≦i≦N. Further values are calculated using

αj(t+1)=[Σi=1, . . . ,N αi(t)aij]bj(ot+1)

for 1≦t≦T−1 and 1≦j≦N. The probability of the sequence given λ is defined by

P(O|λ)=Σi=1, . . . ,N αi(T).
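A direct rendering of this forward pass in Python/NumPy follows; it is a didactic sketch of the recurrence, while an actual RPU would realize the same arithmetic in hardware.

```python
import numpy as np

def forward(A, B, pi, O):
    """Return the table of forward variables for observation sequence O.

    alpha[t, i] is the forward variable for state S_i at position t (0-based)."""
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]          # alpha_i(1) = pi_i * b_i(o_1)
    for t in range(T - 1):
        # alpha_j(t+1) = [sum_i alpha_i(t) * a_ij] * b_j(o_{t+1})
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    return alpha

# P(O | lambda) is the sum of the final forward variables:
# prob = forward(A, B, pi, O)[-1].sum()
```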
Similarly, a backward variable is defined as βi(t)=P(ot+1ot+2 . . . oT|qt=Si, λ), which means the probability of the observation sequence from t+1 to T given that the state at time t is state i, given λ. The backward variables are initialized as βi(T)=1 for 1≦i≦N. Further values are calculated using

βi(t)=Σj=1, . . . ,N aijbj(ot+1)βj(t+1)

for 1≦t≦T−1 and 1≦i≦N.
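The backward pass admits the same kind of sketch:

```python
import numpy as np

def backward(A, B, O):
    """Return the table of backward variables for observation sequence O.

    beta[t, i] is the probability of the observations after position t,
    given state S_i at position t (0-based)."""
    T, N = len(O), A.shape[0]
    beta = np.ones((T, N))              # beta_i(T) = 1
    for t in range(T - 2, -1, -1):
        # beta_i(t) = sum_j a_ij * b_j(o_{t+1}) * beta_j(t+1)
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    return beta
```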
The calculations to compute forward and backward variables are performed in the RPU. Thus, probabilities can be calculated for each new symbol to determine which symbol the handwritten input is most likely to be. In one embodiment, the HMM calculations are performed on a general purpose computational unit.
Pre-processed symbol observations are the input to the HMMs. In one embodiment, the symbol observation alphabet (the emission alphabet) comprises the angles of equalized segments of the pen stroke. In other embodiments, more complex symbol observation alphabets are used. In one embodiment, at least one HMM is created for each symbol in the output alphabet. The probability of each symbol given the observation sequence is calculated by each symbol's HMM. In one embodiment, a post-processing unit uses the information from the HMMs to determine the appropriate symbol.
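One way such an emission alphabet might be produced is sketched below: the pen trajectory is resampled into equal-length segments and each segment's angle is quantized into one of M symbols. The resampling scheme, the segment length, and M=8 are assumptions made for illustration only.

```python
import math

def observation_sequence(points, seg_len=5.0, M=8):
    """Quantize the angles of equal-length stroke segments into M symbols."""
    # Resample the pen trajectory so consecutive points are seg_len apart.
    resampled, acc = [points[0]], 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        acc += d
        while acc >= seg_len and d > 0:
            acc -= seg_len
            ratio = 1.0 - acc / d
            resampled.append((x0 + ratio * (x1 - x0), y0 + ratio * (y1 - y0)))
    # Map each equalized segment's direction onto one of M angle bins.
    symbols = []
    for (x0, y0), (x1, y1) in zip(resampled, resampled[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        symbols.append(int(angle / (2 * math.pi) * M) % M)
    return symbols
```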
Training A, B and π
It is desirable to select the parameters A, B and π of λ that maximize the probability of a sequence in the training set given λ. One algorithm used to determine the parameters is the Baum-Welch method. The Baum-Welch method guarantees a monotonically increasing probability and converges quickly.
First, a joint event variable is defined as εij(t)=P(qt=Si, qt+1=Sj|O, λ), which means the probability of the state at time t being state i and the state at time t+1 being state j given sequence O and λ. From the definitions of forward and backward variables, this becomes εij(t)=(αi(t)aijbj(ot+1)βj(t+1))/P(O|λ).
Additionally, a state variable is defined as γi(t)=P(qt=Si|O, λ), which means the probability of the state at time t being state i given sequence O and λ. From the definitions of forward and backward variables, this becomes γi(t)=(αi(t)βi(t))/P(O|λ).
A new λ, λ′, is calculated as follows. A new a, a′, is calculated as

a′ij=[Σt=1, . . . ,T−1 εij(t)]/[Σt=1, . . . ,T−1 γi(t)].

A new b, b′, is calculated as

b′j(k)=[Σt:ot=vk γj(t)]/[Σt=1, . . . ,T γj(t)].
A new π, π′, is calculated as π′i=γi(1).
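One full re-estimation step, expressed with the ε and γ quantities above and reusing the forward and backward sketches, might look as follows. This is a didactic illustration of the Baum-Welch update, not the patent's hardware path.

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One re-estimation of lambda = (A, B, pi) from a single sequence O."""
    T, N = len(O), len(pi)
    alpha, beta = forward(A, B, pi, O), backward(A, B, O)
    prob = alpha[-1].sum()                        # P(O | lambda)

    # gamma[t, i] = P(q_t = S_i | O, lambda)
    gamma = alpha * beta / prob
    # xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, O[1:]].T * beta[1:])[:, None, :]) / prob

    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):                   # sum gamma over t with o_t = v_k
        B_new[:, k] = gamma[np.array(O) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    pi_new = gamma[0]
    return A_new, B_new, pi_new
```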
A variation of the Baum-Welch method, termed the “Levinson method,” calculates λ′ as follows when K observation sequences O(1), O(2), . . . , O(K) are used to adjust the parameters. Let ε(k)ij(t) and γ(k)i(t) denote the joint event and state variables computed from the k-th sequence, whose length is Tk. A new a, a′, is calculated as

a′ij=[Σk=1, . . . ,K Σt=1, . . . ,Tk−1 ε(k)ij(t)]/[Σk=1, . . . ,K Σt=1, . . . ,Tk−1 γ(k)i(t)].

A new b, b′, is calculated as

b′j(m)=[Σk=1, . . . ,K Σt:o(k)t=vm γ(k)j(t)]/[Σk=1, . . . ,K Σt=1, . . . ,Tk γ(k)j(t)].

A new π, π′, is calculated as

π′i=(1/K)Σk=1, . . . ,K γ(k)i(1).
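A sketch of this multiple-sequence variant, accumulating the per-sequence numerators and denominators before dividing, and again reusing the earlier helpers:

```python
import numpy as np

def levinson_step(A, B, pi, sequences):
    """Re-estimate lambda = (A, B, pi) from K observation sequences."""
    N, M = B.shape
    A_num = np.zeros((N, N)); A_den = np.zeros(N)
    B_num = np.zeros((N, M)); B_den = np.zeros(N)
    pi_new = np.zeros(N)
    for O in sequences:                       # accumulate over k = 1..K
        alpha, beta = forward(A, B, pi, O), backward(A, B, O)
        prob = alpha[-1].sum()
        gamma = alpha * beta / prob
        xi = (alpha[:-1, :, None] * A[None] *
              (B[:, O[1:]].T * beta[1:])[:, None, :]) / prob
        A_num += xi.sum(axis=0); A_den += gamma[:-1].sum(axis=0)
        for k in range(M):
            B_num[:, k] += gamma[np.array(O) == k].sum(axis=0)
        B_den += gamma.sum(axis=0)
        pi_new += gamma[0]
    return (A_num / A_den[:, None],
            B_num / B_den[:, None],
            pi_new / len(sequences))
```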
Special Purpose Hardware for Recognition Processing
In one embodiment, handwritten symbols are translated into machine readable characters using special purpose hardware. In one embodiment, the special purpose hardware is a recognition processing unit (RPU) which performs feature extraction and recognition. In another embodiment, a user inputs the handwritten symbols and a software recognition engine preprocesses the input to a reduced form. In one embodiment, the preprocessor is fully information preserving.
The unconfirmed symbol observation is presented to the writer. The writer can confirm a symbol, reject a symbol or make no determination. The postprocessor uses confirmed symbol observations (260), rejected symbol observations and unconfirmed symbol observations to adjust how it makes symbol observations. The preprocessor also uses confirmed symbol observations and unconfirmed symbol observations to adjust how it preprocesses the handwritten symbols. Additionally, training data (270) is used by the preprocessor, the RPU, and the postprocessor to adjust their calculations to achieve more accurate symbol translations.
The special purpose hardware of the RPU enables the system to perform more handwriting recognition calculations in the same amount of time than a system in which handwriting recognition calculations are performed by a general purpose processor (the MPU). In one embodiment, the RPU uses parallel processing to make multiple handwriting recognition calculations in each clock cycle. Typically, a general purpose processor requires multiple clock cycles to perform one handwriting recognition calculation. In one embodiment, the RPU performs eight handwriting recognition calculations in parallel in each clock cycle. Since the RPU only performs handwriting recognition calculations, no power is wasted during the calculation. Thus, the same number of handwriting recognition calculations requires less power when computed by an RPU than when computed by a general purpose processor.
Memory on the RPU
In one embodiment, the data from the preprocessor is sent to the RPU which performs feature extraction and recognition. In another embodiment, the RPU has memory and the RPU operates on data in its memory.
Operating out of local memory, the RPU evaluates the forward recursion

αj(t+1)=[Σi=1, . . . ,N αi(t)aij]bj(ot+1)

for 1≦t≦T−1 and 1≦j≦N.
In some embodiments, the RPU has multiple HMM calculation units to enable multiple HMM calculations to take place in parallel. In one embodiment, the RPU has N HMM calculation units. Thus, values of αj(t+1) are calculated in parallel for all values of j.
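The update these units parallelize is a vector-matrix step: every αj(t+1) depends only on the vector of αi(t) values, so all N entries can be produced at once. The NumPy expression below computes the same N-wide update in one operation, which is what N hardware calculation units would compute together in one cycle (sizes illustrative):

```python
import numpy as np

# One time step of the forward recursion for all N states at once.
# A hardware unit per state j would evaluate one entry of this vector.
def forward_step(alpha_t, A, B, o_next):
    # alpha_{t+1}[j] = b_j(o_{t+1}) * sum_i alpha_t[i] * a_ij
    return (alpha_t @ A) * B[:, o_next]
```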
Symbols and Wordlets
In one embodiment, the RPU uses a hidden Markov model (HMM) as a finite state machine that assigns probabilities to a symbol based on the preprocessed data from the handwritten symbol. For example, a handwritten symbol may have a one in three probability of being an “e” and a one in four probability of being an “i.” In another embodiment, the RPU recognizes collections of symbols, termed “wordlets,” in addition to individual symbols.
For example, the RPU may recognize “tion” or “ing” as one symbol. The output alphabet contains “tion” and “ing” in addition to “t”, “i”, “o”, “n” and “g”. The ability to recognize a handwritten symbol as the wordlet “ing” improves the accuracy of translation.
Probabilistic Context Free Grammar
In one embodiment, probabilistic context free grammar information is used to improve the accuracy of symbol translation. A probabilistic context free grammar is defined as G=(VN, VT, P, S). VN is a nonterminal feature alphabet defined as VN={F1, F2, . . . , FN}. VT is a terminal feature alphabet defined as VT={w1, w2, . . . , wM}. All of the production rules of the grammar are of the form Fi→FjFk or Fi→wk, where Fi∈VN are nonterminal features and wk∈VT are terminal features. The start symbol S is F1, from which the entire string of terminals is derived.
These production rules are defined by tensors A and B as P=(A, B). For the nonterminal features, a probability tensor A of rank 3 is defined as aijk=P(Fi→FjFk) for 1≦i≦N, 1≦j≦N and 1≦k≦N. For the terminal features, a production probability matrix B is defined as bj(k)=P(Fj→wk) for 1≦j≦N and 1≦k≦M.
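For illustration, P=(A, B) maps naturally onto a rank-3 array and a matrix, as in this sketch with assumed sizes:

```python
import numpy as np

N, M = 4, 10                       # nonterminals and terminals, example sizes
A = np.zeros((N, N, N))            # A[i, j, k] = P(F_i -> F_j F_k)
B = np.zeros((N, M))               # B[j, k]    = P(F_j -> w_k)

# For a proper PCFG, each nonterminal's rule probabilities sum to one:
#   sum_{j,k} A[i, j, k] + sum_k B[i, k] == 1  for every i.
```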
In one embodiment, the probability of a string of terminals of length T, W=w1w2 . . . wT, where wk∈VT, given a probabilistic context free grammar, defined as P(W|G), is determined. In one embodiment, the probability of a sub-sequence, Wp,q=wp . . . wq, termed an “inside probability,” is calculated. The inside probability is initialized as βi(t, t)=bi(wt) for 1≦i≦N and 1≦t≦T. Successive inside probabilities are determined by calculating

βi(p, q)=Σj=1, . . . ,N Σk=1, . . . ,N Σd=p, . . . ,q−1 aijkβj(p, d)βk(d+1, q)

for 1≦i≦N and 1≦p<q≦T. At termination, P(W|G)=β1(1, T).
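A sketch of this inside calculation in Python/NumPy follows (0-based indices in code; the triple loop is the standard dynamic program over spans and split points):

```python
import numpy as np

def inside(A, B, W):
    """Return beta, where beta[i, p, q] is the probability that nonterminal
    F_i derives the sub-sequence W[p..q] (0-based, inclusive)."""
    N, T = B.shape[0], len(W)
    beta = np.zeros((N, T, T))
    for t in range(T):                      # beta_i(t, t) = b_i(w_t)
        beta[:, t, t] = B[:, W[t]]
    for span in range(1, T):                # widen the sub-sequence
        for p in range(T - span):
            q = p + span
            for d in range(p, q):           # split point between children
                # beta_i(p, q) += sum_{j,k} a_ijk * beta_j(p, d) * beta_k(d+1, q)
                beta[:, p, q] += np.einsum(
                    'ijk,j,k->i', A, beta[:, p, d], beta[:, d + 1, q])
    return beta

# P(W | G) is the start symbol's inside probability over the whole string:
# prob = inside(A, B, W)[0, 0, len(W) - 1]
```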
Similarly, the “outside probability” αi(p, q) is the probability that the grammar derives the prefix w1 . . . wp−1, the nonterminal Fi, and the suffix wq+1 . . . wT, so that Fi accounts for the sub-sequence Wp,q=wp . . . wq of the sequence W=w1w2 . . . wT. The outside probability is initialized as αi(1, T)=δ1i for 1≦i≦N, where δ1i=1 for i=1 and δ1i=0 for all other values of i. Successive outside probabilities are determined by calculating

αj(p, q)=Σi=1, . . . ,N Σk=1, . . . ,N [Σd=1, . . . ,p−1 aikjαi(d, q)βk(d, p−1)+Σd=q+1, . . . ,T aijkαi(p, d)βk(q+1, d)]

for 1≦j≦N. At termination,

P(W|G)=Σi=1, . . . ,N αi(t, t)bi(wt)

for 1≦t≦T.
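The outside pass can be sketched the same way, consuming the inside table (didactic only, 0-based indices):

```python
import numpy as np

def outside(A, B, W, beta):
    """Return alpha, where alpha[i, p, q] is the outside probability of
    nonterminal F_i over the sub-sequence W[p..q] (0-based, inclusive)."""
    N, T = B.shape[0], len(W)
    alpha = np.zeros((N, T, T))
    alpha[0, 0, T - 1] = 1.0                 # start symbol F_1 spans the string
    for span in range(T - 2, -1, -1):        # from widest proper sub-span down
        for p in range(T - span):
            q = p + span
            for d in range(p):               # F_j is the right child: F_i -> F_k F_j
                alpha[:, p, q] += np.einsum(
                    'ikj,i,k->j', A, alpha[:, d, q], beta[:, d, p - 1])
            for d in range(q + 1, T):        # F_j is the left child: F_i -> F_j F_k
                alpha[:, p, q] += np.einsum(
                    'ijk,i,k->j', A, alpha[:, p, d], beta[:, q + 1, d])
    return alpha

# Consistency check at termination, for any position t:
# P(W | G) == sum_i alpha[i, t, t] * B[i, W[t]]
```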
Training for Probabilistic Context Free Grammar
A joint feature probability is defined as

ξijk(p, q)=αi(p, q)[Σd=p, . . . ,q−1 aijkβj(p, d)βk(d+1, q)]/P(W|G),

which means the probability that the production Fi→FjFk is used with Fi generating the sub-sequence Wp,q. A parent feature probability is defined as

γi(p, q)=αi(p, q)βi(p, q)/P(W|G),

which means the probability that the nonterminal Fi generates the sub-sequence Wp,q. The joint feature probability and parent feature probability are used to calculate new nonterminal probabilities and new terminal probabilities. The new nonterminal probabilities are calculated as

a′ijk=[Σp=1, . . . ,T−1 Σq=p+1, . . . ,T ξijk(p, q)]/[Σp=1, . . . ,T Σq=p, . . . ,T γi(p, q)]

for 1≦i≦N, 1≦j≦N and 1≦k≦N. The new terminal probabilities are calculated as

b′j(m)=[Σt:wt=vm γj(t, t)]/[Σp=1, . . . ,T Σq=p, . . . ,T γj(p, q)]

for 1≦j≦N and 1≦m≦M.
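Putting the passes together, one re-estimation step over a single training string might look like the following sketch, reusing the inside and outside helpers above. A practical trainer would accumulate these statistics over a corpus rather than a single string.

```python
import numpy as np

def inside_outside_step(A, B, W):
    """One re-estimation of (A, B) from a single terminal string W."""
    N, M = B.shape
    T = len(W)
    beta = inside(A, B, W)
    alpha = outside(A, B, W, beta)
    prob = beta[0, 0, T - 1]                 # P(W | G)

    # gamma[i, p, q]: F_i generates W[p..q] given W and G.
    gamma = alpha * beta / prob

    # xi_ijk summed over all spans (p, q) and split points d.
    xi = np.zeros((N, N, N))
    for p in range(T):
        for q in range(p + 1, T):
            for d in range(p, q):
                xi += (alpha[:, p, q][:, None, None] * A *
                       np.einsum('j,k->jk', beta[:, p, d],
                                 beta[:, d + 1, q])[None]) / prob

    A_new = xi / gamma.sum(axis=(1, 2))[:, None, None]
    B_new = np.zeros((N, M))
    for t in range(T):                       # gamma over single-terminal spans
        B_new[:, W[t]] += gamma[:, t, t]
    B_new /= gamma.sum(axis=(1, 2))[:, None]
    return A_new, B_new
```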
In one embodiment, the inside probability is used to determine the probability of a string of observed terminals given a probabilistic context free grammar. In one embodiment, the production probability tensors are trained on a sample language by calculating new terminal and nonterminal probabilities using the above equations. In one embodiment, the inside probability is calculated using general purpose hardware. In another embodiment, the inside probability is calculated using special purpose hardware.
Context Consideration to Improve Accuracy
In one embodiment, the software recognition engine uses the data from the RPU in a postprocessor. The postprocessor computes a stream of symbol observation events from data produced by the RPU and writer confirmation data. In one embodiment, the postprocessor also uses information about context, spelling, grammar, past word usage and user information to improve the accuracy of the symbols produced.
In one embodiment, the postprocessor uses spelling information combined with previously generated symbols to determine the current symbol.
In another embodiment, the postprocessor uses grammar information combined with previously generated symbols to determine the current symbol.
In one embodiment, the postprocessor uses past word usage information combined with previously generated symbols to determine the current symbol.
In one embodiment, the postprocessor uses user information combined with previously generated symbols to determine the current symbol.
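By way of a hypothetical illustration only, a postprocessor might fold past word usage into the symbol choice by combining the per-symbol HMM scores with a smoothed usage prior; the weighting and the add-one smoothing below are arbitrary choices, not the patent's method.

```python
import math

def pick_symbol(hmm_scores, usage_counts, weight=0.5):
    """Combine per-symbol HMM likelihoods with a past-usage prior.

    hmm_scores:   {symbol: P(observations | symbol's HMM)}
    usage_counts: {symbol: how often the writer has used the symbol}
    """
    total = sum(usage_counts.values()) + len(hmm_scores)   # add-one smoothing
    def score(sym):
        prior = (usage_counts.get(sym, 0) + 1) / total
        return math.log(hmm_scores[sym]) + weight * math.log(prior)
    return max(hmm_scores, key=score)

# Example: the HMM slightly prefers "e", but heavy past use of "i" flips it:
# pick_symbol({"e": 0.33, "i": 0.25}, {"i": 90, "e": 10})
```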
Thus, a method and apparatus for accelerated handwritten symbol recognition in a pen based tablet computer is described in conjunction with one or more specific embodiments. The invention is defined by the following claims and their full scope and equivalents.
The present application is based on U.S. provisional patent application No. 60/201,581, filed on May 3, 2000, and claims priority to that application.