1. Technical Field
A “Constrained Predictive Interface” provides various techniques for using predictive constraints in a source-channel model to improve the usability, accuracy, discoverability, etc. of user interfaces such as soft keyboards, pen interfaces, multi-touch interfaces, 3D gesture interfaces, myoelectric or EMG based interfaces, etc.
2. Related Art
Conventional “single-tap” key entry systems are referred to as “predictive” because they predict the user's intended word, given the current sequence of keystrokes. In general, conventional predictive interfaces ignore any ambiguity between characters upon entry, allowing the user to enter a character with only a single tap of the associated key. However, because multiple letters may be associated with the key-tap, the system considers the possibility of extending the current word with each of the associated letters. Single-tap entry systems are surprisingly effective because, after the first few key-taps of a word, there are usually relatively few words matching that sequence of taps. However, despite improved performance, single-tap systems are still subject to ambiguity at the word level. Various techniques exist for using contextual information of words to aid the overall prediction process.
Predictive virtual keyboards and the like have been implemented in a number of space-limited environments, such as the relatively small display area of mobile phones, PDAs, media players, etc. For example, one well-known mobile phone provides a virtual keyboard (rendered on a touch-screen display) that uses a built-in dictionary to predict words while the user is typing those words. Using these predictions, the keyboard readjusts the size of “tap zones” of letters, making the tap zones of letters that are most likely to be selected by the user larger while making the tap zones of letters that are less likely to be typed smaller. Note that the displayed keys themselves do not change size, just the tap zones corresponding to physical regions that allow those keys to be selected by the user.
More specifically, conventional solutions in this field often use a “source-channel predictive model” to implement a predictive user interface (UI). In general, the predictive features of these techniques are implemented by using a statistical model that models the likelihood that users would type different sequences of keys (a source model or language model). This source model is then combined with another statistical model that models the likelihood that a user touching different soft keys will generate different digitizer detection patterns (i.e., a channel model or touch model). In the case of a virtual keyboard, the digitizer typically outputs an (x, y) coordinate pair for each touch or tap, with that coordinate then being used to identify or select a particular key based on the tap zone corresponding to the (x, y) coordinate. In other words, a source-channel model has components including a source model and a channel model.
One problem with some of the conventional source-channel predictive models that are used to enable virtual keyboards is that in some cases, overly strict predictive models actually prevent the user from selecting particular keys, even if the user wants to select a particular key. For example, one well-known mobile phone, which provides a touch-screen based virtual keyboard, will not allow the user to type the letter sequence “Steveb” since the predictive model assumes that the user is actually attempting to type the name “Steven” (since the “n” key is adjacent to the “b” key on a standard QWERTY style keyboard). The problem here is that in the case that the user is actually trying to type an email address, such as “steveb@microsoft.com”, the aforementioned mobile phone predictive model will not allow this address to be typed.
Additional examples of the overly strict predictive model of the aforementioned mobile phone include not allowing the user to type any character surrounding the last character of various words such as, for example, “know”, “time”, “spark”, “quick”, “build”, “split”, etc. In other words, the tap zones of letters surrounding the last letter of such words are either eliminated or sufficiently covered by the tap zone of the letter expected by the conventional source-channel predictive model such that the user simply cannot select the tap zone for any other letter. For example, in the case of the word “know”, the user is prevented from selecting the keys adjacent to the “w” key (on a QWERTY keyboard), namely the “q” key (left) and the “e” key (right). This is a problem if the user is typing an alias or a proper noun, such as the company name “Knoesis”.
Another conventional “soft keyboard” approach introduces the concept of fuzzy boundaries for the various keys. For example, when a user presses a spot between the “q” and the “w” keys, the actual letter “pressed” or tapped by the user is automatically determined based on the precise location where the soft keyboard was actuated, the sequence of letters already determined to have been typed by the user, and/or the typing speed of the user. In other words, this soft keyboard provides a predictive keyboard interface that predicts at least one key within a sequence of keys pressed by the user that is only a partial function of the physical location tapped or pressed by the user. Further, in some cases, this soft keyboard will render predicted keys differently from other keys on the keyboard. For example, the predicted keys may be larger or highlighted differently on the soft keyboard as compared to the other keys, making them more easily typed by a user as compared to the other keys.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
In general, a “Constrained Predictive Interface,” as described herein, uses a “source-channel predictive model” to implement predictive user interfaces (UI). However, in contrast to conventional source-channel predictive models, the Constrained Predictive Interface further uses various predictive constraints on the overall source-channel model (either as a whole, or on either the source model or the channel model individually) to improve UI characteristics such as accuracy, usability, discoverability, etc. This use of predictive constraints improves user interfaces such as soft or virtual keyboards, pen interfaces, multi-touch interfaces, 3D gesture interfaces, myoelectric or EMG based interfaces, etc. Note that the terms “soft keyboard” and “virtual keyboard” are used interchangeably herein to refer to various non-physical keys or keyboards such as touch-screen based keyboards having one or more keys rendered on a display device, laser or video projection based keyboards where an image of keys or a keyboard is projected onto a surface, or any other similar keyboard lacking physical keys that are depressed by the user to enter or select that key.
More specifically, in various embodiments, the predictive constraints limit the source-channel model by forcing specific user actions regardless of any current user input context when conditions corresponding to specific predictive constraints are met by user input received by the Constrained Predictive Interface. In other words, in various embodiments, the Constrained Predictive Interface ensures that a user can take any desired action at any time by taking into account a likelihood of possible user actions in different contexts to determine intended user actions (e.g., intended user input or command) relative to the additional predictive constraints on either the channel model, the source model, or the overall source-channel predictive model.
For example, in the context of virtual keyboard interfaces, various embodiments of the Constrained Predictive Interface use predictive constraints such as key “sweet spots” within an overall “hit target” defining each key. In general, selection of the overall hit target of a particular key may return that key, or some neighboring key, depending upon the probabilistic context of the user input based on an evaluation of that input by the source-channel model. However, selection of the sweet spot of a particular key will return that key, regardless of the probabilistic or predictive context associated with the overall source-channel model. In other words, in a soft or virtual keyboard, the hit target of each key corresponds to some physical region in proximity to each key that may return that key when some point within that physical region is touched or otherwise selected by the user, while the sweet spot within that hit target will always return that key (unless additional limitations or exceptions are used in combination with the constraints).
In related embodiments, predictive hit target resizing provides dynamic real-time virtual resizing of one or more particular keys based on various probabilistic criteria. Consequently, hit target resizing makes it more likely that the user will select the intended key, even if the user is not entirely accurate when selecting a position corresponding to the intended key. Further, in various embodiments, hit target resizing is based on various probabilistic piecewise constant touch models, as specifically defined herein. Note that hit target resizing does not equate to a change in the rendered appearance of keys. However, in various embodiments of the Constrained Predictive Interface, rendered keys are also visually increased or decreased in size depending on the context.
In further embodiments, a user adjustable or automatic “context weight” is applied to either the source (or language) model, to the channel (or touch) model, or to a combination thereof. For example, in various embodiments of the automatic case, the context weight, and which portion of the source-channel model that weight is applied to, is a function of one or more observed user input behaviors or “contexts”, including factors such as typing speed, latency between keystrokes, input scope, keyboard size, device properties, etc., which depend on the particular user interface type being enabled by the Constrained Predictive Interface. The context weight controls the influence of the predictive intelligence of the source or channel model on the overall source-channel model.
For example, in the case of a virtual keyboard, as the context weight on the touch model is increased relative to the language model, the influence of the predictive intelligence of the touch model on the overall language-touch model of the virtual keyboard becomes more dominant. Note also that in various embodiments, the context weight is used to limit the effects of the predictive constraints on the source or channel model (since the influence of the predictive intelligence of those models on the overall source-channel model is limited by the context weight). However, in related embodiments, the predictive constraints on either component of the source-channel model are not influenced or otherwise limited by the optional context weight.
In view of the above summary, it is clear that the Constrained Predictive Interface described herein provides various techniques for applying predictive constraints to a source-channel predictive model to improve characteristics such as accuracy, usability, discoverability, etc. in a variety of source-channel based predictive user interfaces. Examples of such predictive interfaces include, but are not limited to soft or virtual keyboards, pen interfaces, multi-touch interfaces, 3D gesture interfaces, myoelectric or EMG based interfaces, etc. In addition to the just described benefits, other advantages of the Constrained Predictive Interface will become apparent from the detailed description that follows hereinafter when taken in conjunction with the accompanying drawing figures.
The specific features, aspects, and advantages of the claimed subject matter will become better understood with regard to the following description, appended claims, and accompanying drawings where:
In the following description of the embodiments of the claimed subject matter, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the claimed subject matter may be practiced. It should be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the presently claimed subject matter.
1.0 Introduction
In general, a “Constrained Predictive Interface,” as described herein, provides various techniques for using predictive constraints in combination with a source-channel predictive model to improve accuracy in a variety of user interfaces, including for example, soft or virtual keyboards, pen interfaces, multi-touch interfaces, 3D gesture interfaces, myoelectric or EMG based interfaces, etc. More specifically, the Constrained Predictive Interface provides various embodiments of a source-channel predictive model with various predictive constraints applied to the source-channel model (either as a whole, or on either the source model or the channel model individually) to improve UI characteristics such as accuracy, usability, discoverability, etc.
Note that the concept of source-channel predictive models for user interfaces is known to those skilled in the art, and will not be described in detail herein. However, the concept of applying additional predictive constraints to the channel model of the overall source-channel predictive model to enable the Constrained Predictive Interface will be described in detail herein. Further, it should also be noted that the terms “soft keyboard” and “virtual keyboard” are used interchangeably herein to refer to various non-physical keys or keyboards such as touch-screen based keyboards having one or more keys rendered on a touch-screen display device, laser or video projection based keyboards where an image of keys or a keyboard is projected onto a surface in combination with the use of various sensor devices to monitor user finger position, or any other similar keyboard lacking physical keys that are depressed by the user to enter or select that key. In addition, it should also be understood that soft and virtual keyboards are known to those skilled in the art, and will not be specifically described herein except as they are improved via the Constrained Predictive Interface.
For example, in the case of a soft or virtual keyboard, the source model is represented by a probabilistic or predictive language model while the channel model is represented by a probabilistic or predictive touch model to construct a predictive language-touch model. In this case, the language model provides a predictive model of probabilistic user key input sequences, based on language, spelling, grammar, etc. Further, the touch model provides a predictive model for generating digitizer detection patterns corresponding to user selected coordinates relative to the soft keyboard. These coordinates then map to particular keys, as a function of the language model. In other words, the language and touch models are combined to produce a probabilistic language-touch model of the soft keyboard. However, in contrast to conventional language-touch models (or other source-channel predictive models), the touch (or channel) model is further constrained by applying predictive constraints to the touch model. The result is a source-channel predictive model having predictive constraints on the channel model to improve the accuracy of the overall source-channel predictive model.
1.1 System Overview
As noted above, the “Constrained Predictive Interface,” provides various techniques for applying predictive constraints on the channel model to improve accuracy in a variety of source-channel based predictive UIs, including for example, soft or virtual keyboards, pen interfaces, multi-touch interfaces, 3D gesture interfaces, myoelectric or EMG based interfaces, etc. The processes summarized above are illustrated by the general system diagram of
In particular, the system diagram of
In addition, it should be noted that any boxes and interconnections between boxes that may be represented by broken or dashed lines in
In general, as illustrated by
Once the source-channel model 100 has been defined for the particular user interface being enabled by the Constrained Predictive Interface, a user input evaluation module 115 receives a user input from a user input module 120. The user input evaluation module 115 then queries the source-channel model 100 with the input received from the user input module 120 to determine what that user input represents (e.g., a particular key of a soft keyboard, a particular gesture for a gesture-based UI, etc.). As noted above, the Constrained Predictive Interface can be used to enable any user interface that is modeled using a source-channel based prediction system. Examples of such interfaces include soft keyboards 125, speech recognition 130 interfaces, handwriting recognition 135 interfaces, gesture recognition 140 interfaces, EMG sensor 145 based interfaces, etc.
In the case of virtual UIs such as a soft keyboard, for example, where the keyboard is either displayed on a touch screen or rendered on some surface or display device, a UI rendering module 150 renders the UI so that the user can see the interface in order to improve interactivity with that UI. In various embodiments, “hit targets” associated with the keys are expanded or contracted depending on the context. In general, in the case of a soft or virtual keyboard (or other button or key-based UI), the hit target of each key or button corresponds to some physical region in proximity to each key that will return that key when some point within that physical region is touched or otherwise selected by the user. See Section 2.1 and Section 2.2 for further discussion on “hit-target” resizing (also discussed herein as “resizable hit targets”).
Further, in related embodiments corresponding to key-based UI's such as soft keyboards or virtual button based interfaces, key resizing is used such that various keys or buttons of the UI visually expand or contract in size depending upon the current probabilistic context of the user input. For example, assuming that the current context makes it more likely that the user will type the letter “U” (i.e., the user has just typed the letter “Q”), the representation of the letter “U” in the rendered soft keyboard will be increased in size (while surrounding keys may also be decreased in size to make room for the expanded “U” key). In such cases, the UI rendering module 150 receives key or button resizing instruction input (as a function of the current input context) from the user input evaluation module 115 that in turn queries the source-channel model 100 to determine the current probabilistic context of the user input for making resizing decisions. In addition, it should be understood that both hit target resizing and key resizing may be combined to create various hybrid embodiments of the Constrained Predictive Interface, as described herein.
Once the user input evaluation module 115 determines the intended user input via the source-channel model 100, the user input evaluation module passes that information to a UI action output module 155 that simply sends the intended user input to a UI action execution module 160 for command execution. For example, if the intended user input determined by the user input evaluation module 115 is a typed “U” key, the UI action output module 155 sends the “U” key to the UI action execution module 160, which then processes that input using conventional techniques (e.g., inserting the “U” key into a text document being typed by the user).
As noted above, the Constrained Predictive Interface uses various predictive constraints 165 on the channel model 110 (i.e., the touch model in the case of a soft or virtual keyboard) in the source-channel predictive model to ensure that particular usability constraints will be honored by the system, regardless of the context. More specifically, as described in Section 2.5, in various embodiments of the Constrained Predictive Interface, one or more a priori constraints are used to limit the channel model 110 in order to improve the user experience. For example, in the case of soft or virtual keyboards, these a priori predictive constraints 165 include concepts such as, for example, “sweet spots” and “convex hit targets.”
Considering the case of a virtual keyboard, “sweet spots” are defined by a physical region or area located in or near the center of each rendered key that returns that key, regardless of the probabilistic or predictive context returned by the source-channel model 100. Similarly, in the case of a virtual keyboard, the use of convex hit targets changes the shape (and typically size) of the hit targets for one or more of the keys as a function of the current probabilistic context of the user input. However, it should be understood that as described in Sections 2.5 and 2.8, the specific type of predictive constraint 165 applied to the touch model 110 will depend upon the particular type of UI (i.e., UI's based on virtual keyboards, speech, handwriting, gestures, EMG sensors, etc. will use different predictive constraints).
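For purposes of illustration only, the sweet-spot constraint described above can be sketched as a check that runs before the predictive model is consulted. This is a minimal sketch under stated assumptions and not a definitive implementation: the circular sweet-spot shape, the `Key` fields, and the `predict` callback (a stand-in for a query to the source-channel model) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Key:
    label: str
    cx: float       # x coordinate of the rendered key's center
    cy: float       # y coordinate of the rendered key's center
    sweet_r: float  # sweet-spot radius (circular shape is an assumption)


def in_sweet_spot(key: Key, x: float, y: float) -> bool:
    """True when the tap lies inside the key's sweet spot."""
    return (x - key.cx) ** 2 + (y - key.cy) ** 2 <= key.sweet_r ** 2


def resolve_tap(keys: Sequence[Key], x: float, y: float,
                predict: Callable[[float, float], str]) -> str:
    """Return the key for a tap at (x, y).

    A tap inside any sweet spot returns that key unconditionally;
    only taps outside every sweet spot are passed to `predict`,
    the hypothetical stand-in for the source-channel model.
    """
    for key in keys:
        if in_sweet_spot(key, x, y):
            return key.label
    return predict(x, y)
```

With a predictor that strongly prefers “n”, a tap near the center of “b” would still return “b”, while a tap between the two keys would fall through to the predictor.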
In various related embodiments, a constraint adjustment module 170 is provided to allow either or both manual or automatic adjustments to the predictive constraints. For example, in the case of a soft or virtual keyboard, the size of the sweet spot associated with one or more specific keys can be increased or decreased, either automatically or by the user, via the constraint adjustment module 170. Similarly, in the case of a handwriting-based UI, the “sweet-spot” constraint on the channel model is any pattern, within some fixed threshold of an exemplary pattern, that is recognized as a corresponding character or word, regardless of any probabilistic context associated with the corresponding source-channel model 100. Therefore, in this case, the constraint adjustment module 170 will be used to adjust the fixed threshold around the exemplary pattern within which a particular character or word is always recognized, regardless of the probabilistic context (unless additional limitations or exceptions are used in combination with the constraints).
In further embodiments (see Section 2.4), the concept of a “context weight” is applied to either the source model 105 or the channel model 110, or to a combination of both models. In particular, while predictive models such as the source-channel model 100 are useful for improving the accuracy of various UIs, overly strict predictive models can actually prevent the user from achieving particular inputs (such as selecting particular keys of a virtual keyboard), regardless of the user intent. Therefore, to address such issues, in various embodiments, a context weight module 175 allows the user to adjust a weight, α, typically ranging from 0% to 100% (but can be within any desired range) when weighting the source model 105, or typically from 100% and up (but can be within any desired range) when weighting the channel model 110. In general, at a context weight of 0% on the source model, the predictive intelligence of the source model 105 is eliminated, while at 100% weighting, the predictive intelligence of the weighted source model behaves as if it is not weighted. Similarly, as the weight on the channel model 110 is increased above 100%, the predictive influence of the channel model becomes more dominant over that of the source model 105.
For example, in the case of a soft or virtual keyboard with weighting of the language model (i.e., the source model 105), it is useful for the hit targets for each key to correspond to the boundaries of each of the rendered keys when the context weight is set at or near 0% on the language model. Note that causing hit targets to correspond to the boundaries of each of the rendered keys is the same result that would be obtained if no predictive touch model were used in implementing the virtual keyboard. In other words, pressing anywhere in the rendered boundary of any key will return that key in this particular case. Conversely, where the context weight on the touch model is increased above 100%, the predictive influence of the touch model (such as, for example, context-based hit target resizing) will increase, with the result that key hit targets may not directly correspond to the rendered keys.
In related embodiments, a weight adjustment module 180 automatically adjusts the context weight on either or both the source model 105 or the channel model 110 as a function of various factors (e.g., user typing speed, latency between keystrokes, input scope, keyboard size, device properties, etc.) as determined by the user input evaluation module 115. In addition, in various embodiments, the weight adjustment module 180 also makes a determination of which of the models (i.e., the source model 105 or the channel model 110) is to be weighted via the use of the context weight. See Section 2.4 for additional details regarding use of the context weight to modify the predictive influence of either the source model 105 or the channel model 110.
2.0 Operational Details of the Constrained Predictive Interface
The above-described program modules are employed for implementing various embodiments of the Constrained Predictive Interface. As summarized above, the Constrained Predictive Interface provides various techniques for applying predictive constraints on a source-channel predictive model to improve UI characteristics such as accuracy, usability, discoverability, etc. in a variety of source-channel based predictive user interfaces. The following sections provide a detailed discussion of the operation of various embodiments of the Constrained Predictive Interface, and of exemplary methods for implementing the program modules described in Section 1 with respect to
In particular, the following sections provide examples and operational details of various embodiments of the Constrained Predictive Interface. This information includes: a discussion of common techniques for improving the accuracy of soft keyboards; source-channel model based approaches to input modeling; “effective hit targets” for use by the Constrained Predictive Interface; controlling the impact of user interface (UI) intelligence; predictive constraints for improving UI usability; constrained touch models; examples of specific touch models for soft or virtual keyboards or key/button-type interfaces; and the extension of the Constrained Predictive Interface to a variety of user interface types.
2.1 Improving the Accuracy of Soft Keyboards
As is known to those skilled in the art, typing accurately and quickly on a soft or virtual keyboard is generally an error prone process. This problem is especially evident when using relatively small mobile devices such as mobile phones. The reasons for this include the lack of haptic feedback (e.g., touch-typing is more difficult when the boundaries of the keys cannot be felt) and the small size of the keys with respect to the fingertips. Several intelligent keyboard technologies have been introduced to help alleviate such problems. These known technologies include:
As described in the following paragraphs, the Constrained Predictive Interface described herein builds on these known techniques for applying predictive constraints on the channel model in a source-channel predictive model to improve accuracy in a variety of source-channel based predictive user interfaces. Examples of such user interfaces include, but are not limited to, soft or virtual keyboards, pen interfaces, multi-touch interfaces, 3D gesture interfaces, myoelectric or EMG based interfaces, etc.
2.2 Source-Channel Approach to Input Modeling
In general, conventional source-channel based approaches to input modeling provide methods for improving the accuracy of user input systems such as soft keyboards. Such source-channel models generally use a first statistical model (e.g., a “source model” or a “language model”) to model the likelihood that users would type different sequences of keys in combination with a second statistical model (e.g., a “channel model” or “touch model”) that models the likelihood that a user touching different soft keys will generate different digitizer detection patterns. Note that for purposes of explanation regarding the use of soft or virtual keyboards, the following discussion will assume that the digitizer outputs an (x, y) coordinate pair for each touch. Further, these ideas can be extended to more elaborate digitizer outputs such as bounding boxes.
Language models assign a probability $p_L(k_1, \ldots, k_n)$ to any sequence of keys $k_1, \ldots, k_n \in \mathcal{K}$, where $\mathcal{K}$ denotes the set of keys. Typically, causal or left-to-right language models are used that allow this probability, $p_L$, to be efficiently computed in a left-to-right manner using the chain rule of probability as $p_L(k_1)\,p_L(k_2 \mid k_1)\,p_L(k_3 \mid k_1, k_2) \cdots p_L(k_n \mid k_1, \ldots, k_{n-1})$. Often, an N-gram model is used, in which the approximation $p_L(k_i \mid k_1, \ldots, k_{i-1}) \approx p_L(k_i \mid k_{i-(N-1)}, \ldots, k_{i-1})$ is made.
In contrast, a touch model assigns a probability $p_T(x_1, \ldots, x_n \mid k_1, \ldots, k_n)$ to the digitizer generating the sequence of touch locations $x_1, \ldots, x_n \in \mathcal{X} \subset \mathbb{R}^2$ when the user types keys $k_1, \ldots, k_n$. Typically an independence assumption is made to give $p_T(x_1, \ldots, x_n \mid k_1, \ldots, k_n) \approx \prod_{i=1}^{n} p_T(x_i \mid k_i)$.
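By way of a hedged example, the left-to-right factorization of the language model and the independence assumption of the touch model can be sketched as follows. The bigram (N=2) tables and the isotropic Gaussian touch density centered on each key are assumptions made for illustration; the function and parameter names are hypothetical.

```python
import math


def sequence_prob(keys, unigram, bigram):
    """p_L(k1..kn) under a bigram (N=2) approximation.

    unigram[k] is p_L(k) for the first key; bigram[prev][k] is
    p_L(k | prev). Both tables are hypothetical inputs.
    """
    if not keys:
        return 1.0
    p = unigram[keys[0]]
    for prev, cur in zip(keys, keys[1:]):
        p *= bigram[prev][cur]
    return p


def gaussian_touch(x, center, sigma=0.5):
    """Assumed touch density p_T(x | k): isotropic 2D Gaussian at the key center."""
    dx, dy = x[0] - center[0], x[1] - center[1]
    norm = 2.0 * math.pi * sigma ** 2
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2)) / norm


def touch_prob(touches, keys, centers, sigma=0.5):
    """p_T(x1..xn | k1..kn) as a product of per-touch densities."""
    p = 1.0
    for x, k in zip(touches, keys):
        p *= gaussian_touch(x, centers[k], sigma)
    return p
```

In practice, log-probabilities would typically be summed instead of multiplying raw probabilities, to avoid numerical underflow on long sequences.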
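Given a language model and a touch model, hit target resizing is implemented by taking the keys typed so far, $k_1, \ldots, k_{n-1}$, and the touch location $x_n$ to decide what the nth key typed was, according to Equation (1):
$$k_n^* = \arg\max_{k_n}\; p(k_n \mid k_1, \ldots, k_{n-1}, x_n)$$ Equation (1)
which is given by Equation (2):
$$k_n^* = \arg\max_{k_n}\; p_L(k_n \mid k_1, \ldots, k_{n-1})\, p_T(x_n \mid k_n)$$ Equation (2)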
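The decision described above, in which the nth key is chosen by combining the language-model probability of each candidate key given the keys typed so far with the touch-model probability of the observed touch location, can be sketched as shown below. This is an illustrative sketch only: it assumes a bigram language model, an isotropic Gaussian touch model with a shared `sigma`, and a hypothetical `START` row in the bigram table for the first key of a sequence.

```python
import math

START = "<s>"  # hypothetical start-of-input token


def decode_key(history, touch, bigram, centers, sigma=0.5):
    """Return the key maximizing p_L(k | previous key) * p_T(touch | k).

    history: keys typed so far (string or list of key labels)
    touch:   (x, y) location reported by the digitizer
    bigram:  bigram[prev][k] = p_L(k | prev), assumed table
    centers: centers[k] = (x, y) center of key k
    """
    prev = history[-1] if history else START

    def score(k):
        dx, dy = touch[0] - centers[k][0], touch[1] - centers[k][1]
        # The Gaussian normalizer is shared by all keys, so it is omitted.
        p_touch = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
        return bigram.get(prev, {}).get(k, 0.0) * p_touch

    return max(centers, key=score)
```

Under these assumptions, a touch landing slightly closer to “i” than to “u” is still decoded as “u” after a “q”, whereas the same touch after a context with no strong preference is decoded as the nearer key.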
2.2.1 Hit-Target Resizing with Source-Channel Modeling
While conventional source-channel modeling does not explicitly resize the hit target, conventional source-channel modeling leads to implicit hit targets for each key in each context, consisting of the touch locations that return that key.
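For example, automatic correction of hit targets can be done by examining the key presses or touches of the user with respect to the probability of each key, as illustrated by Equation (3):
$$(k_1, \ldots, k_n)^* = \arg\max_{k_1, \ldots, k_n}\; p(k_1, \ldots, k_n \mid x_1, \ldots, x_n)$$ Equation (3)
which is given by Equation (4), as follows:
$$(k_1, \ldots, k_n)^* = \arg\max_{k_1, \ldots, k_n}\; p_L(k_1, \ldots, k_n)\, p_T(x_1, \ldots, x_n \mid k_1, \ldots, k_n)$$ Equation (4)
which can be efficiently computed using dynamic programming techniques.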
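One dynamic programming technique suited to this joint maximization is the Viterbi algorithm, named here as an illustrative choice and not as the specific method of any embodiment. The following sketch assumes a bigram language model with strictly positive (e.g., smoothed) probabilities and an isotropic Gaussian touch model; all names are hypothetical.

```python
import math


def viterbi(touches, keys, start, trans, centers, sigma=0.5):
    """Most likely key sequence for a non-empty list of touch locations.

    start[k]    = p_L(k) for the first key (assumed > 0)
    trans[j][k] = p_L(k | j)              (assumed > 0)
    centers[k]  = (x, y) center of key k
    """
    def log_touch(x, k):
        dx, dy = x[0] - centers[k][0], x[1] - centers[k][1]
        # Log of an isotropic Gaussian, dropping the shared constant.
        return -(dx * dx + dy * dy) / (2.0 * sigma ** 2)

    # best[k] = (log score, key sequence) of the best path ending in k
    best = {k: (math.log(start[k]) + log_touch(touches[0], k), [k])
            for k in keys}
    for x in touches[1:]:
        new_best = {}
        for k in keys:
            # Best predecessor j for a path that ends in k at this touch.
            j = max(keys, key=lambda p: best[p][0] + math.log(trans[p][k]))
            score = best[j][0] + math.log(trans[j][k]) + log_touch(x, k)
            new_best[k] = (score, best[j][1] + [k])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])[1]
```

Because the whole sequence is rescored, a later touch can change the interpretation of an earlier, ambiguous touch, which is the behavior that distinguishes auto-correction from per-key hit target resizing.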
2.2.2 Prediction/Auto-Completion with Source-Channel Modeling
In a source-channel modeling system, prediction/auto-completion can be done as a function of the key sequences pressed, touched, or otherwise selected by the user in combination with the probability of each key or key sequence as illustrated by Equation (5), as follows:
$$(k_1, \ldots, k_m)^* = \arg\max_{m \ge n}\; \arg\max_{k_1, \ldots, k_m}\; p_L(k_1, \ldots, k_m)\, p_T(x_1, \ldots, x_n \mid k_1, \ldots, k_n)$$ Equation (5)
where $k_m$ is constrained to be a word separator (e.g., dash, space, etc.).
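As a simplified, hedged illustration of prediction/auto-completion, the sketch below scores whole candidate words instead of searching over all key sequences. It assumes a hypothetical word-level vocabulary with prior probabilities (`vocab`), treats the end of each candidate word as the implied word separator, and uses an isotropic Gaussian touch model over the touches observed so far.

```python
import math


def complete(touches, vocab, centers, sigma=0.5):
    """Return the most likely word given the touches observed so far.

    vocab[word] = prior probability of the word (assumed word-level model)
    Only words at least as long as the number of touches are considered,
    and only the first len(touches) letters are scored by the touch model.
    """
    def touch_score(word):
        s = 1.0
        for x, k in zip(touches, word):
            dx, dy = x[0] - centers[k][0], x[1] - centers[k][1]
            s *= math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
        return s

    candidates = [w for w in vocab if len(w) >= len(touches)]
    if not candidates:
        return None
    return max(candidates, key=lambda w: vocab[w] * touch_score(w))
```

A production system would more likely search a trie or lattice of key sequences; scoring a flat word list is used here only to keep the example short.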
Because the problem is decomposed into a language model and a touch model, the language model can be estimated based on text data that was not necessarily entered into the target keyboard, and the touch model can be trained independently of the type of text a user is expected to type. Note that the source-channel approach described here is analogous to the approach used in speech recognition, optical character recognition, handwriting recognition, and machine translation. Thus, more sophisticated approaches such as topic sensitive language models, context sensitive channel models, and adaptation of both models can be used here. Further, the ability to specify the touch model and language model independently is critical. In practice, the language model may depend on application and input scope (e.g., specific language models for email addresses, URLs, body text, etc.), while the touch model may depend on the device dimensions, digitizer, and the keyboard layout.
2.3 Effective Hit Targets
For each of the three cases described in Section 2.2, including hit target resizing, auto-correction, and auto-completion, the Constrained Predictive Interface defines an “effective hit target,” $\mathcal{H}_k(c)$, for any particular key, k, of a soft or virtual keyboard given a context, c, as:
$\mathcal{H}_k(c) = \{x \in \chi \mid \pi(k|c)\,p_T(x|k) \ge \pi(k'|c)\,p_T(x|k') \;\; \forall k' \in \mathcal{K}\}$ Equation (6)
The prior probability, π(k|c), of k in the context c may depend on the language model and the touch model, depending on the information encoded in the context. In the case of hit target resizing, the context includes all prior letters, and π(k|c) is therefore the language model probability of k given the keystroke history preceding the current user keystroke. Similarly, in the case of auto-correction, the prior probability, π(k|c), is the posterior probability of k given all previous and following touch locations, and depends on both the language and touch models. Note that for purposes of explanation, the following discussion will sometimes leave the context implicit by referring to the effective hit target as simply $\mathcal{H}_k$. Note also that “effective hit target” refers to the points on the keyboard where a specific key is returned, and not the key that the user intended to hit (i.e., the “target key”).
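To make the definition of the effective hit target concrete, the following Python sketch returns the key whose effective hit target contains a given touch point, that is, the key k maximizing π(k|c)·pT(x|k). It assumes a one-dimensional keyboard and Gaussian touch models with a shared width; the function names, key centers, and prior values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D Gaussian, used here as an assumed touch model p_T(x | k)."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def returned_key(x, prior, centers, sigma=0.5):
    """Return the key k maximizing prior[k] * p_T(x | k).

    The touch point x lies in the effective hit target of the returned key.
    """
    return max(prior, key=lambda k: prior[k] * gaussian_pdf(x, centers[k], sigma))

# Hypothetical key centers and context priors.
CENTERS = {"q": 0.0, "w": 1.0}
UNIFORM = {"q": 0.5, "w": 0.5}
SKEWED = {"q": 0.9, "w": 0.1}   # context strongly favors "q"
```

With the uniform prior, a touch at 0.8 returns "w" because it is nearer that key's center. With the skewed prior, the same touch returns "q": the hit target of "q" has grown past the midpoint between the two keys.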
2.4 Controlling the Impact of UI Intelligence
While predictive models are useful for improving the accuracy of soft keyboards, overly strict predictive models can actually prevent the user from selecting particular keys, regardless of the user intent. Consequently, the user (or the operating system or application), may want to control the extent to which intelligent technologies impact the user experience. Reasons that the user may want to control the impact of the predictive model include cases where the predictive technology, being imperfect, does not match the behavior of a particular user in a particular context well, or because the predictive module is unable to determine the appropriate context for making predictions.
In various embodiments, this user (or automatic) control takes the form of a context weight, α, typically ranging between 0% and 100% for the source model and typically ranging from 100% upward for the channel model (although either weight can be set within any desired range). Note that in various embodiments, either or both of the source and channel models can be weighted using different context weights. However, it should also be noted that while both the source and channel models can be weighted using the same context weight, this equates to the case where neither model is weighted, since the common weights simply cancel each other when determining the output of the source-channel model.
For example, given a context weight of about 0% on the source model (i.e., the language model in the case of a soft or virtual keyboard) there is little or no predictive intelligence for the source model, thus making the predictive intelligence of the channel model (i.e., the touch model in the case of a soft or virtual keyboard) as dominant as possible. However, the effective removal of the source model from the overall source-channel model in the case where the context weight on the source model is at or near 0% can sometimes cause problems where the user input returned by the source-channel model does not match the input expected by the user. This issue is addressed by the use of a “neutral source model” in place of the weighted source model for cases where the context weight on the source model is at or near 0% (i.e., when α≅0).
In particular, in the case of a soft or virtual keyboard a “neutral language model” (i.e., a “neutral source model”) is used to ensure that the hit targets for each key match the rendered keyboard. In the more general case, the use of a “neutral source model” ensures that actual user inputs directly correspond to “expected user input boundaries” with respect to predefined exemplary patterns or boundaries for specific inputs. Examples of expected user input boundaries for various UI types include rendered boundaries of keys for a soft or virtual keyboard, gestures or gesture angles within predefined exemplary gesture patterns in a gesture-based interface, speech patterns within predefined exemplary words or sound patterns in a speech-based interface, etc.
For example, in the case of a soft or virtual keyboard when weighting the source model, at or near 0%, the hit targets (e.g., region 210 inside broken line around key 200) should align with the rendered keyboard as shown in
As noted above, it should be understood that the concept of using a neutral source model when the context weight applied to the source model is at or near 0% (i.e., α≅0) is extensible to any source-channel model based user interface. However, for purposes of explanation, the following discussion will explain the use of the “neutral language model” (i.e., the “neutral source model”) in the case of a soft or virtual keyboard.
In general, the hit targets should resize to reflect the effect of the predictive models as the weight on the source model approaches 100% (assuming an unweighted channel model). Intuitively, this would be similar to a language model weight commonly used in speech recognition or machine translation. However, the condition that the hit targets match the rendered keyboard when the context weight is at or near 0% (i.e., when α≅0) on the source model introduces a small complication. In particular, hit targets under the language model weight formulation are given by:
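$\mathcal{H}_k(c) = \{x \in \chi \mid \pi(k|c)^{\alpha}\,p_T(x|k) \ge \pi(k'|c)^{\alpha}\,p_T(x|k') \;\; \forall k' \in \mathcal{K}\}$ Equation (7)
When α=0, this reduces to:
$\mathcal{H}_k(c) = \{x \in \chi \mid p_T(x|k) \ge p_T(x|k') \;\; \forall k' \in \mathcal{K}\}$ Equation (8)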
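The effect of the context weight in Equations (7) and (8) can be sketched as follows. This Python example assumes a one-dimensional keyboard and Gaussian touch models of equal width (so the normalizing constant cancels); the function name weighted_key and all numeric values are hypothetical.

```python
import math

def weighted_key(x, prior, centers, alpha, sigma=0.5):
    """Return the key maximizing prior[k]**alpha * p_T(x | k).

    p_T is an unnormalized Gaussian; with a shared sigma the missing
    constant is identical for every key and does not affect the argmax.
    """
    def score(k):
        return (prior[k] ** alpha) * math.exp(-0.5 * ((x - centers[k]) / sigma) ** 2)
    return max(prior, key=score)

# Hypothetical key centers and a context that strongly favors "q".
CENTERS = {"q": 0.0, "w": 1.0}
PRIOR = {"q": 0.9, "w": 0.1}
```

For a touch at 0.8, a weight of α=1 returns "q" (the language model dominates), while α=0 returns "w", since the prior term becomes 1 for every key and only the touch model remains.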
The condition that these hit targets will match the rendered keyboard, when α≅0, imposes a very strong constraint on the touch model (i.e., the channel model in the more general case). In other words, when α≅0 it is useful for the hit target for each key to match the rendered keyboard without resizing those hit targets. One way to achieve this behavior without restricting the touch model further is to use a “neutral language model”, π0(k), proportional to:
where π0(k) is chosen so that the neutral targets, $\mathcal{N}_k(c)$, of each individual key:
$\mathcal{N}_k(c) = \{x \in \chi \mid \pi_0(k)\,p_T(x|k) \ge \pi_0(k')\,p_T(x|k') \;\; \forall k' \in \mathcal{K}\}$ Equation (10)
match the rendered keyboard. This is equivalent to allowing un-normalized touch models. Therefore, the selection of the touch model, pT(x|k), includes the choice of neutral language model, π0(k), that is selected such that the “neutral targets” (i.e., the hit targets corresponding to the use of the neutral language model) of the keys match the rendered keyboard.
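As an illustration of choosing a neutral language model, the following Python sketch handles the simplest case of two adjacent keys of unequal width on a one-dimensional keyboard. It assumes Gaussian touch models of equal width centered on each key, so that the touch model alone places the boundary midway between the key centers rather than at the rendered boundary. The function names and the key geometry are hypothetical.

```python
import math

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def neutral_prior_two_keys(boundary, mean_a, mean_b, sigma):
    """Return (pi0_a, pi0_b) with pi0_a * p_T(b|a) == pi0_b * p_T(b|b) at the boundary b."""
    pa = gaussian_pdf(boundary, mean_a, sigma)
    pb = gaussian_pdf(boundary, mean_b, sigma)
    # pi0_a / pi0_b must equal pb / pa; normalize so the weights sum to 1.
    return pb / (pa + pb), pa / (pa + pb)

def neutral_key(x, pi0, means, sigma):
    """Return the key whose neutral target contains x."""
    return max(pi0, key=lambda k: pi0[k] * gaussian_pdf(x, means[k], sigma))

# Hypothetical layout: key "A" spans [0, 1], key "B" spans [1, 4].
MEANS = {"A": 0.5, "B": 2.5}
SIGMA = 1.0
pi0_a, pi0_b = neutral_prior_two_keys(1.0, MEANS["A"], MEANS["B"], SIGMA)
PI0 = {"A": pi0_a, "B": pi0_b}
```

Without the neutral model, a touch at 1.2 would be assigned to "A" (the nearer center) even though it falls inside the rendered "B" key. With the neutral weights, the decision boundary sits at the rendered boundary of 1.0, so 1.2 returns "B" and 0.9 returns "A".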
Note that the variable α is referred to herein as a “context weight” to distinguish it from a traditional language model weight. Further, it should also be noted that in various embodiments, the context weight is a function of one or more of a variety of factors such as typing speed, latency between keystrokes, the input scope, keyboard size, device properties, etc. that depend upon the particular type of UI being enabled by the Constrained Predictive Interface.
For example, in the case of a soft or virtual keyboard, as a user types faster (i.e., decreased key input latency), it is expected that the accuracy of the user finger placement will decrease. Consequently, increasing the context weight on the language model (or decreasing the context weight on the touch model) as typing speed increases will generally improve accuracy of the keys returned by the overall source-channel model. Conversely, as the typing speed decreases (i.e., as input latency increases, indicating a more deliberate user finger placement), decreasing the context weight on the language model (or increasing the context weight on the touch model) will generally improve accuracy of the keys returned by the overall source-channel model. Similarly, as the size of the keyboard decreases, such as with the input screen of a relatively small mobile phone, PDA, etc., it is more difficult for the user to accurately touch the intended keys since those keys may be quite small. Therefore, increasing the context weight on the source model (or decreasing the context weight on the touch model) as a function of decreasing keyboard size will also generally improve the accuracy of the keys returned by the overall source-channel model.
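One simple way to derive a context weight from input latency is a clamped linear mapping, sketched below in Python. The function name context_weight, the linear form, and the threshold values fast_ms and slow_ms are all assumptions made for illustration; the description above only requires that the weight on the language model decrease as latency increases.

```python
def context_weight(latency_ms, fast_ms=150.0, slow_ms=600.0):
    """Map inter-keystroke latency to a language-model context weight in [0, 1].

    Hypothetical mapping: full weight (1.0) at or below fast_ms, no weight
    (0.0) at or above slow_ms, and linear interpolation in between.
    """
    if latency_ms <= fast_ms:
        return 1.0
    if latency_ms >= slow_ms:
        return 0.0
    return (slow_ms - latency_ms) / (slow_ms - fast_ms)
```

A fast typist (for example, 100 ms between keys) receives the full language model weight, while a deliberate keystroke after 700 ms receives a weight of zero, at which point a neutral language model would be used.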
An expanded example of determining which model (i.e., the source model or the channel model) is to be weighted will now be presented. For example, if the user is typing quickly, then the language model (i.e., the source model) should be weighted more than the touch model (i.e., the channel model). Conversely, if the user is typing slowly, then the touch model should be weighted more. More specifically, if the user is entering keys quickly (i.e., short latencies between keys), it is likely that the user will make more finger positioning mistakes when attempting to hit particular keys. Note that this is true whether the user is typing or using any other interface (e.g., gesture interfaces, myoelectric interfaces, etc., with short latencies between user inputs). Further, in view of the preceding discussion, it should be understood that decreasing the weight on the source model can achieve similar results to increasing the weight on the channel model, and vice versa.
Thus, in the case of short latencies between user inputs, it is generally desirable to weight the language model (i.e., the source model) more, under the implicit assumption that the overall system should be good enough to recognize what the user is attempting to input. On the other hand, if the user is entering keys slowly, then the user is likely trying to be very deliberate about his input. In this situation, it is generally desirable to weight the language model less (or the touch model more) since the user may be trying to enter something that he believes the overall system is not good enough to recognize. For example, if the user quickly (and intentionally) types “knoesis”, and the system auto-corrects this word to something not intended, then the next time that the user types it, he will likely type “kno” quickly and then “e” not so quickly, because the user wants to get it right. In other words, given some or all of the various user contexts discussed above, such as input latency, for example, the Constrained Predictive Interface will determine which model to weight (i.e., source model or channel model) along with how much weight should be applied to the selected model. In addition, when the touch model is weighted highly (or the language model is weighted to a level at or near zero), a neutral language model can be used to ensure that the resulting hit targets match the rendered keyboard.
As noted above, in various embodiments of the Constrained Predictive Interface, the context weight is set automatically as a function of various factors, including typing speed, input latencies, the input scope, keyboard size, device properties, etc. However, in related embodiments, the context weights on either or both the source model and the channel model are set to any user-desired values. Such embodiments allow the user to control the influence of the predictive intelligence of the touch model (i.e., the channel model in the more general case) and/or the language model (i.e., the source model in the more general case). Further, the concept of neutral source models, as discussed above, is also applicable to embodiments including user adjustable context weights, with the neutral source model being either automatically applied based on the context weight, as discussed above, or manually selected by the user via a user interface.
2.5 Predictive Constraints for Improving UI Usability
Conventional source-channel models are sometimes considered “optimal” in the sense that as the language model gets closer and closer to modeling the true distribution of text entered into a device, and as the touch model gets closer and closer to the true distribution of digitizer output, the output of the soft keyboard approaches the optimal accuracy possible.
However, the shapes of the hit targets implicit in the language and touch models may be quite different from what a user intuitively expects. This may lead to a confusing user experience. Therefore, in various embodiments of the Constrained Predictive Interface, a priori constraints on the hit targets are specified in order to improve the user experience. In the case of soft or virtual keyboards, these a priori constraints include the concepts of “sweet spots” and “convex hit targets.”
2.5.1 Sweet Spots
In various embodiments, one or more of the keys in the soft or virtual keyboard enabled by the Constrained Predictive Interface include a “sweet spot” in or near the center of the key that returns that key, regardless of the context. For example, the user touching the dead center of the “E” key after typing “SURPRI” should yield “SURPRIE,” even if “SURPRIS” is more likely. In other words, when using sweet spots, the hit target for a key is constrained such that it is prevented from growing to include the “sweet spot” of neighboring keys. This concept is illustrated by
In particular, the problem of unconstrained hit targets is illustrated by
In contrast, as illustrated by
In various embodiments, the sweet spot for each key is consistent in both size and placement for the various keys (i.e., approximately the same size in the approximate center of each key). However, in various embodiments, a user control is provided to increase or decrease the size of the sweet spots either on a global basis or for individual keys.
For example, assume that the user generally has repeated trouble accurately touching the sweet spot of the “Z” key when typing quickly, thereby leading to erroneous selection of the “A”, “S”, or “X” keys. In this case, the user can increase the size of the sweet spot of the “Z” key, or any other desired keys, via the user control to improve the overall user experience. Further, in related embodiments, the sweet spots of one or more of the keys are automatically increased or decreased in size, or automatically repositioned, to reflect learned user typing behavior (e.g., user typically hits on or near a particular coordinate when attempting to select the “Z” key). In addition, it should also be noted there are no particular constraints on the geometric shape of the sweet spot. In other words, each of the sweet spots can be any shape desired (e.g., square, round, amorphous, etc.).
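The sweet spot behavior can be sketched as a check that runs before any predictive decision. The following Python is a minimal illustration that assumes rectangular sweet spots given as (left, top, right, bottom) tuples and a caller-supplied predictive function; the names key_for_touch and predictive_key and the coordinates are hypothetical, and this override is only one way to obtain the behavior (Section 2.6 instead constrains the touch model itself).

```python
def key_for_touch(x, y, sweet_spots, predictive_key):
    """Return the key for a touch at (x, y).

    sweet_spots maps each key to a rectangle (left, top, right, bottom).
    A touch inside a sweet spot always returns that key; otherwise the
    decision is delegated to predictive_key(x, y), which stands in for
    the source-channel model.
    """
    for key, (left, top, right, bottom) in sweet_spots.items():
        if left <= x <= right and top <= y <= bottom:
            return key
    return predictive_key(x, y)

# Hypothetical geometry: sweet spots near the centers of "E" and "S".
SWEET_SPOTS = {"E": (2.2, 0.2, 2.8, 0.8), "S": (1.2, 1.2, 1.8, 1.8)}

def always_s(x, y):
    # Stand-in for a model that strongly expects "S" after "SURPRI".
    return "S"
```

Touching the center of “E” at (2.5, 0.5) returns “E” even though the stand-in model always prefers “S”, while a touch outside every sweet spot, such as (2.05, 0.9), falls back to the model and returns “S”.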
2.5.2 Convex Hit Targets
Another example of a confusing user experience results from the shape of conventional hit targets. For example, if in a particular context, the system returns the same key when the user touches either of two points on the keyboard, it is reasonable for the user to expect that the system will output the same key when the user touches any location between those two points, even if doing so leads to worse accuracy. However, as illustrated by
In particular,
Therefore, in various embodiments, the Constrained Predictive Interface constrains the hit targets to take convex shapes. For example, as illustrated by
Clearly, a constraint such as convex hit targets can be especially helpful in a user interface where a tentative key response is shown to the user when they touch the keyboard. For example, the user can slide their finger around, with the tentative result changing as if they had touched the new current location instead of their original touch location. The response showing when the user releases their finger is selected as the final decision. This allows the user to search for the hit target of their desired key by sliding their finger across the soft keyboard without observing the confusing behavior of the conventional hit target geometries illustrated by
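To show how an unconstrained model can produce a non-convex hit target, the following Python sketch uses a one-dimensional keyboard, where a convex hit target is simply a single interval. It assumes two Gaussian touch models of different widths and equal priors; the model parameters and the names returned_key and is_single_interval are hypothetical.

```python
import math

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical touch models (mean, sigma): key "A" is narrow, key "B" is wide.
MODELS = {"A": (0.0, 0.2), "B": (0.6, 1.0)}

def returned_key(x):
    """Key with the largest touch likelihood at x (equal priors assumed)."""
    return max(MODELS, key=lambda k: gaussian_pdf(x, *MODELS[k]))

def is_single_interval(key, grid):
    """True if the sampled hit target of `key` forms one contiguous run on the grid."""
    hits = [returned_key(x) == key for x in grid]
    runs = sum(1 for i, h in enumerate(hits) if h and (i == 0 or not hits[i - 1]))
    return runs <= 1

GRID = [i / 10 for i in range(-30, 31)]
```

Here the wide model for "B" wins on both sides of the narrow model for "A", so the hit target of "B" splits into two pieces: touches at -1.0 and 1.0 both return "B", but the point 0.0 between them returns "A". A convexity constraint rules out this geometry.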
2.6 Constrained Touch Models
In various embodiments, the Constrained Predictive Interface combines the usability constraints of “sweet spots” and “convex hit targets” described in Section 2.5 with source-channel type predictive models to provide an improved UI experience.
In particular, a set of allowable touch models is chosen so that either, or both, of the usability constraints discussed above (i.e., sweet spots and convex hit targets) are satisfied no matter what language model is chosen. However, in various embodiments, the language model is further constrained to be a “smooth” model. In other words, in embodiments employing a smooth language model, the language model assigns a non-zero probability to every key, so that any key can be hit or selected regardless of the context. Given such a general language model, minimal constraints are imposed on the touch model such that the resulting hit targets obey either, or both, the sweet spot and convexity constraints described above. Note that the following notation is used throughout the following discussion:
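$\mathcal{K}$ Alphabet of keys
$\chi \subset \mathbb{R}^2$ Space of touch points
$x, y, z \in \chi$ Touch points
$i, j, k \in \mathcal{K}$ Keys, members of $\mathcal{K}$
$\mathcal{H}_i(c) \subset \chi$ Hit target for $i \in \mathcal{K}$ in the context c
$\mathcal{S}_i \subset \chi$ Sweet spot for $i \in \mathcal{K}$
$\mathcal{T}_i \subset \chi$ Support of $p_T(x|i)$: $\mathcal{T}_i = \{x \in \chi \mid p_T(x|i) > 0\}$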
2.6.1 Guaranteeing the Sweet Spot Constraint
As discussed above, the sweet spot, $\mathcal{S}_i$, for a particular key, i, represents some fixed region in or near the center of that key that will return that key when the digitizer outputs an (x, y) coordinate pair within the boundaries of the corresponding sweet spot, regardless of the current context. Guaranteeing the sweet spot constraint in a system wherein hit targets have variable sizes based on probabilistic models requires probabilistic modeling of the overall system. For example, consider Theorem 1, which states the following:
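Theorem 1: Let $\mathcal{S}_i \subset \mathcal{H}_i(c)$ $\forall i \in \mathcal{K}$ for any choice of context c and language model, and suppose that all sweet spots have non-empty interiors. Then $p_T(\mathcal{S}_i|j) = 0$ $\forall i \ne j$. That is, $\mathcal{S}_i \cap \mathcal{T}_j = \emptyset$.
Proof of Theorem 1: For a proof by contradiction, suppose that there exist some $i, j \in \mathcal{K}$ with i≠j, such that $p_T(\mathcal{S}_i|j) = A > 0$. Since $\mathcal{S}_i \subset \mathcal{H}_i(c)$, it can be seen that
$p_T(x|i)\,\pi(i|c) > p_T(x|j)\,\pi(j|c)$ Equation (11)
for all $x \in \mathcal{S}_i$ for any choice of language model and context. Integrating both sides over $\mathcal{S}_i$ gives:
$p_T(\mathcal{S}_i|i)\,\pi(i|c) > p_T(\mathcal{S}_i|j)\,\pi(j|c)$ Equation (12)
which gives:
$p_T(\mathcal{S}_i|i) > A\,\frac{\pi(j|c)}{\pi(i|c)}$ Equation (13)
Since this relationship holds for any choice of language model and context, the relationship also holds when
$\frac{\pi(j|c)}{\pi(i|c)} = \frac{1}{A}$ Equation (14)
yielding $p_T(\mathcal{S}_i|i) > 1$, which is a contradiction, thus proving Theorem 1.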
Therefore, the touch model ensures that the sweet spot of any particular key can be hit or selected as long as the touch model assigns a zero (or very low) probability to any key generating touch points inside another key's sweet spot. Smooth distributions such as mixtures of Gaussians that are traditionally used for acoustic models in speech recognition are therefore inappropriate for use as touch models if the sweet spot constraint is used. Such distributions would have to have their support restricted and then renormalized in order to meet the sweet spot constraint. Indeed, this would hold for any other mixture distribution, such as mixtures of exponential distributions, or other mixtures of distributions of the form
$p(x) \propto e^{-\|x - x_0\|^p}$
where the norm ∥·∥ and the power p can be chosen arbitrarily as long as the distributions are normalized.
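The restriction and renormalization mentioned above can be sketched for a one-dimensional Gaussian as follows. This Python example assumes the support of a key's touch model is an interval [lo, hi] chosen to exclude the sweet spots of all other keys; the function name truncated_gaussian_pdf and the interval values used with it are hypothetical.

```python
import math

def truncated_gaussian_pdf(x, mean, sigma, lo, hi):
    """Density of a 1-D Gaussian restricted to [lo, hi] and renormalized.

    Outside [lo, hi] the density is exactly zero, so a key using this
    touch model can never generate touches in a region (such as another
    key's sweet spot) that lies outside its support.
    """
    if x < lo or x > hi:
        return 0.0

    def cdf(v):
        # Standard Gaussian CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf((v - mean) / (sigma * math.sqrt(2.0))))

    mass = cdf(hi) - cdf(lo)  # probability the untruncated Gaussian puts on [lo, hi]
    density = math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
    return density / mass
```

For example, with a support of [-1, 1], a touch at 1.5 has zero likelihood under this model regardless of how the language model weights the key, which is the property required by Theorem 1.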
2.7 Touch Model Examples
The following paragraphs describe various examples of touch models that are defined for use by the Constrained Predictive Interface for implementing soft or virtual keyboards and other key/button based UIs. In addition, the following examples include a discussion of the properties of the resulting hit targets.
2.7.1 Row-by-Row Touch Models
As illustrated by
For example, in various embodiments, touch models can be defined to use a fixed, constant height for all keys in a keyboard row, and only allow resizing in the horizontal direction. Then, for each key, i, a support, $\mathcal{T}_i$, is defined as a rectangle of height hi (where hi is shared by all keys on i's row) and with left and right boundaries at horizontal coordinates li and ri, and a sweet spot $\mathcal{S}_i \subset \mathcal{T}_i$ is defined so that $\mathcal{S}_i \cap \mathcal{T}_j = \emptyset$ ∀j≠i. Then, by setting ci to be key i's horizontal coordinate, choosing the touch model pT(x|i) as illustrated by Equation (15) will simultaneously guarantee the sweet spot and convexity constraints of the touch model:
Given this formulation, the neutral language model, π0(k), (as discussed in Section 2.4) is chosen so that the neutral targets match the rendered keyboard.
In particular, the following steps are repeated for each row of keys:
2.7.2 Piecewise Constant Touch Models
Given desired neutral targets and sweet spots for each key i, a “piecewise constant touch model”, pT(x|i), for use in hit target resizing is specifically defined herein as a touch model having a set of Ni>1 nested regions, where (N) ⊂ (N−1) ⊂ . . . ⊂(1) with (n
$n_i(x) = \max\{n : x \in \mathcal{R}_i^{(n)}\}$ Equation (16)
ƒi(x)=νi(n
Further, let wi=∫ƒi(x)dx, along with the following touch model definitions:
The above-described formulation of piecewise constant touch models yields hit targets that guarantee the sweet spot constraints and allow neutral targets to match the rendered targets. In other words, hit target expansion and contraction (i.e., hit target resizing) is defined by using the nested regions of the piecewise constant touch model as a function of the current probabilistic context of the user input. This concept of a “piecewise constant touch model”, as described above, is illustrated by
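The nested-region construction can be sketched in one dimension as follows. This Python example implements the level function of Equation (16), the piecewise constant function ƒi(x), and the weight wi=∫ƒi(x)dx for nested intervals. The region boundaries and values are hypothetical, and the final normalization of ƒi by wi is an assumption, since the touch model definitions themselves are not reproduced here.

```python
def level(x, regions):
    """n_i(x) = max{n : x in region n}; returns 0 if x is outside every region.

    regions is a list of nested intervals (lo, hi), outermost first,
    so regions[0] is region 1 and regions[-1] is the innermost region.
    """
    n = 0
    for idx, (lo, hi) in enumerate(regions, start=1):
        if lo <= x <= hi:
            n = idx
    return n

def f(x, regions, values):
    """Piecewise constant f_i(x): the value attached to the deepest region containing x."""
    n = level(x, regions)
    return values[n - 1] if n > 0 else 0.0

def weight(regions, values):
    """w_i = integral of f_i(x) dx, summed ring by ring."""
    total = 0.0
    for idx, (lo, hi) in enumerate(regions):
        inner_len = 0.0
        if idx + 1 < len(regions):
            inner_lo, inner_hi = regions[idx + 1]
            inner_len = inner_hi - inner_lo
        total += values[idx] * ((hi - lo) - inner_len)
    return total

def touch_density(x, regions, values):
    # Assumed normalization (not taken from the source): p_T(x | i) = f_i(x) / w_i.
    return f(x, regions, values) / weight(regions, values)

# Hypothetical key: three nested intervals with increasing values toward the center.
REGIONS = [(0.0, 4.0), (1.0, 3.0), (1.5, 2.5)]
VALUES = [1.0, 2.0, 4.0]
```

With these values the weight is 1·2 + 2·1 + 4·1 = 8, and the density is highest in the innermost region, which plays the role of the sweet spot, and zero outside the outermost region, which plays the role of the support.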
2.7.3 Piecewise Constant Approximable Touch Models
In various embodiments, given a desired support (e.g., rectangle of height hi, as described in Section 2.7.1), neutral target, and sweet spot for each key, a sequence of finer and finer grained piecewise constant touch models (as described in Section 2.7.2) are built whose nested regions and corresponding values are refined further and further, to approximate a continuous function. This approximated continuous function provides a “piecewise constant approximable touch model” for use in hit target resizing. In other words, the “piecewise constant approximable touch model”, as specifically defined herein, provides an approximation of a continuous function (representing a series of nested hit targets for each key) that is used to define a touch model that when used in combination with the neutral language model guarantees the sweet spot constraint and has the aforementioned neutral targets.
For example, a pyramidal piecewise constant approximable touch model, pT(x|i), can be constructed as follows:
For each key i, given its rectangular desired neutral target $\mathcal{N}_i$, define a rectangular support, $\mathcal{T}_i$, and a sweet spot, $\mathcal{S}_i$, such that $\mathcal{S}_i \subset \mathcal{N}_i \subset \mathcal{T}_i$ and $\mathcal{S}_i \cap \mathcal{T}_j = \emptyset$ ∀j≠i. Further, define ƒi(x) to be a unique function that has the following properties:
This touch model yields targets that guarantee the sweet spot constraints and allow neutral targets to match the rendered targets. In other words, a “piecewise constant approximable touch model”, as specifically defined herein, represents a series of nested versions of the piecewise constant touch models described in Section 2.7.2 for use in hit target expansion and contraction (i.e., hit target resizing).
2.8 Extension to Other Types of User Interfaces
While the discussion above has been presented for a predictive touch keyboard, the principle of using source-channel predictive models with usability constraints to improve UI characteristics such as accuracy, usability, discoverability, etc., is easily extensible to other types of predictive user interfaces. For example, other types of predictive user interfaces for which the Constrained Predictive Interface can improve UI characteristics include speech-based interfaces, handwriting-based interfaces, gesture based interfaces, key or button based interfaces, myoelectric or EMG sensor based interfaces, etc. Note that any or all of these interfaces can be embodied in a variety of devices, such as mobile phones, PDAs, digital picture frames, wall displays, Surface™ devices, computer monitors, televisions, tablet PCs, media players, remote control devices, etc.
Further, it should also be understood that any conventional tracking or position sensing technology corresponding to various user interface types can be used to implement various embodiments of the Constrained Predictive Interface. For example, in the case of a soft or virtual keyboard, a conventional touch-screen type display can be used to simultaneously render the keys and determine the (x, y) coordinates of the user touch. Related technologies include the use of laser-based or camera-based sensors to determine user finger positions relative to a soft or virtual keyboard. Further, such technologies are also adaptable for use in determining user hand or finger positions or motions in the case of a hand-based or finger-based gesture user interface.
In other words, it should be understood that conventional user interface technologies, including touch-screens, pressure sensors, laser sensors, optical sensors, etc., are applicable for use with the Constrained Predictive Interface by modifying those technologies to include the concept of the predictive constraints described herein for improving the UI characteristics of such interfaces.
2.8.1 Handwriting Based Interfaces
Many approaches for handwriting recognition exist, where a language model or source model is used to model the likelihood of different characters or words in a given context and a channel model is used to model the likelihood of different features of the pen strokes given a target word or character. If, for example, a pen stroke pattern is ambiguous and could be interpreted as either an “a” or an “o,” the language model would be used to disambiguate. For example, if the preceding characters are “eleph” the pattern would be interpreted as an “a” (since “elephant” is the probable word) while if the preceding characters are “alligat” the pattern would be interpreted as an “o” (since “alligator” is the probable word). However, such a system would make it very difficult for a user to deliberately write “alligata.”
Therefore, to ensure that the user can write whatever characters she wants, the “sweet spot” techniques described above with respect to a soft or virtual keyboard are adapted to modify handwriting-based user interfaces to ensure that any character sequence can be input by the user, regardless of any word or character probability associated with the language model.
In particular, each letter or word is assigned one or more exemplary patterns that take the role of “sweet spots” for that letter or word. In contrast to the region-based sweet spots in or near the center of each key in a soft keyboard, a “sweet spot” constraint in the context of a handwriting-based interface ensures that any pattern within some fixed threshold of the exemplary patterns is recognized as the corresponding letter or word, regardless of any word or character probability associated with the language model. Note however, that in various embodiments, conventional spell checks can subsequently be performed on the resulting text to allow the user to correct spelling errors, if desired.
2.8.2 Gesture Based Interfaces
In various embodiments, the “sweet spot” techniques described above with respect to a soft or virtual keyboard are adapted to gesture-based user interfaces (such as pen flicks, finger flicks, 3-D hand or body gestures, etc.) to improve the accuracy of 2-D and/or 3-D gesture based interfaces.
In particular, the Constrained Predictive Interface is adapted for use in improving gesture-based user interfaces that allow the use of contextual models to get high recognition accuracy while still ensuring that each gesture is recognizable if carefully executed, relative to one or more exemplary gestures. For example, suppose a horizontal right to left finger flick means “delete” and a diagonal lower right to upper left flick means “previous page.” Suppose also that a source model models the probability of going to the previous page or deleting given the user context. For example, “delete” may be more likely after misspelling a word, while “previous page” may be more likely after a period of inactivity corresponding to reading.
Therefore, a “sweet spot” constraint in this instance would state that a flick from right to left within a couple of degrees of the horizontal would mean “delete” no matter the context, while a flick from lower right to upper left at 40-50 degrees to the horizontal would mean “previous page” no matter the context. In other words, the sweet spot constraint in a gesture-based user interface ensures that any gesture within some fixed threshold of the exemplary gesture is recognized as the corresponding gesture, regardless of the context.
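The gesture sweet spot rule can be sketched as a pair of angle checks that take precedence over the contextual model. In the following Python example, the function name classify_flick, the 2 degree tolerance standing in for “a couple of degrees,” and the callback contextual_choice (a stand-in for the source-channel decision) are all hypothetical.

```python
def classify_flick(angle_deg, contextual_choice, tol_delete=2.0):
    """Classify a right-to-left flick by its angle above the horizontal.

    angle_deg is 0 for a purely horizontal flick and about 45 for a
    flick from lower right to upper left. Angles inside a sweet spot are
    decided without consulting the context; all other angles are passed
    to contextual_choice(angle_deg).
    """
    if abs(angle_deg) <= tol_delete:
        return "delete"
    if 40.0 <= angle_deg <= 50.0:
        return "previous page"
    return contextual_choice(angle_deg)
```

A flick at 1 degree returns "delete" even if the context favors "previous page", a flick at 45 degrees returns "previous page" even if the context favors "delete", and an ambiguous flick at 20 degrees is left to the contextual model.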
2.8.3 Key or Button Based Interfaces
These are interfaces where the user presses, points at, or otherwise interacts with a button, key or other control to make their selection. Clearly, as with the soft or virtual keyboards described above, the keys or buttons in this context are also soft or virtual (e.g., buttons or keys displayed on a touch screen). As with soft or virtual keyboards, the regions of the UI that correspond to the different UI actions would grow and shrink depending on user context, in a manner analogous to hit targets in a keyboard. Further, either or both sweet spot and shape constraints can be imposed on those buttons or keys.
2.8.4 Myoelectric or EMG Based Interfaces
Myoelectric signals are muscle-generated electrical signals that are typically captured using conventional Electromyography (EMG) sensors. As is known to those skilled in the art, myoelectric signals, or sequences of myoelectric signals, from muscle contractions can be used as inputs to a user interface for controlling a large variety of devices, including prosthetics, media players, appliances, etc. In other words, various UI actions are initiated by evaluating and mapping electrical signals resulting from particular user motions (e.g., hand or finger motions, wrist motions, arm motions, etc.) to cause the user interface to interact with various applications in the same manner as any other typical user interface receiving a user input.
As with the soft or virtual keyboards described above, a source model is used to model the likelihood of different UI actions given the context in combination with a channel model that models the EMG signals corresponding to different muscle generated electrical signals. In order to ensure that certain UI actions are possible in any context, exemplary EMG signals corresponding to each of these actions are recorded (typically, but not necessarily on a per-user basis). “Sweet spot” constraints are then imposed by specifying that EMG signals that are within some threshold of these exemplary signals in a feature space in which measured EMG signals are embedded will initiate the corresponding actions, regardless of the context of those UI actions.
3.0 Exemplary Operating Environments
The Constrained Predictive Interface described herein is operational within numerous types of general purpose or special purpose computing system environments or configurations.
For example,
In particular, as illustrated by
In addition, the simplified computing device of
The simplified computing device of
Finally, the simplified computing device 900 may also include an integral or attached display device 955. As discussed above, in various embodiments, this display device 955 also acts as a touch screen for accepting user input (such as in the case of a soft or virtual keyboard, for example).
The foregoing description of the Constrained Predictive Interface has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the claimed subject matter to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. Further, it should be noted that any or all of the aforementioned alternate embodiments may be used in any combination desired to form additional hybrid embodiments of the Constrained Predictive Interface. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto.