Exception processing of character entry sequences

Information

  • Patent Grant
  • Patent Number
    8,847,962
  • Date Filed
    Tuesday, July 1, 2008
  • Date Issued
    Tuesday, September 30, 2014
Abstract
Systems and techniques are described to perform operations including displaying a first character in a user interface in response to a first user input, the first character encoded by a first ordered sequence comprising at least one code point, receiving a second user input, determining if the second user input defines an exception input to the first ordered sequence, in response to determining that the second user input defines an exception input to the first ordered sequence, generating a second ordered sequence comprising at least one code point, the second ordered sequence based on the first ordered sequence and the exception input, wherein the second ordered sequence does not include the first ordered sequence in a predicate sequence, and displaying a second character defined by the second ordered sequence in place of the first character in the user interface.
Description
TECHNICAL FIELD

This disclosure relates to entering characters of a writing system into a user interface.


BACKGROUND

Computer software, for example, a text editor, enables users to enter characters of a writing system, such as the Roman characters used in writing text in a language, e.g., English, using input devices, e.g., keyboards, and, in response, displays the entered characters on a display device. In many writing systems, sequences of characters form words. For example, in writing English, sequences are formed by arranging characters including consonants and vowels adjacent to each other. A consonant or vowel that forms the first letter of a word is a first letter sequence of the word. When a second letter is added to the first letter, the combination of the first letter and the second letter forms a second sequence, and the first sequence is a subsequence (and more specifically, a prefix) of the second sequence. A third sequence is formed when a third letter is added to the second sequence, and so on.


Such letter sequences can be encoded by numerical values and stored in a computer memory. The most prevalent encoding system is Unicode, which provides a unique number for every character. The Unicode system readily facilitates text entry of many Western-style languages in which words are formed by sequential entry of characters. Typically, entry of the letters in a Unicode input sequence corresponds to the phonetic sound of the word. However, in certain languages, there are characters that, when entered, result in a particular sequence that does not conform to the corresponding writing sequence that a native speaker would use when writing the characters on paper. Indic languages, for example, include consonants, vowels, and combinations of them whose Unicode input sequences do not conform to their writing sequences. Thus, users who are fluent in such languages often must input Unicode characters in a sequence that is different from the one they would use when writing without the aid of a computer device, e.g., when using pen and paper.


For example, Indic characters include symbols representing consonants, vowels, and dependent markers that also represent vowels and other variations. A common combination is a consonant and a vowel. In the Indic language, rather than writing a character for the consonant immediately followed by a character for the vowel, a single character of the consonant and a dependent vowel marker representing the vowel is written. There is, however, no Unicode input sequence that corresponds to how a fluent writer would actually write the single character that is the combination of the consonant and a dependent vowel marker. Thus, in using conventional text input software, the fluent writer typically must input a Unicode sequence for certain characters that corresponds to a sequence of characters that the fluent writer would not otherwise use when writing.


SUMMARY

This specification describes technologies relating to managing exceptions to character entry sequences.


In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of displaying a first character in a user interface in response to a first user input, the first character encoded by a first ordered sequence comprising at least one code point; receiving a second user input; determining if the second user input defines an exception input to the first ordered sequence; in response to determining that the second user input defines an exception input to the first ordered sequence, generating a second ordered sequence comprising at least one code point, the second ordered sequence based on the first ordered sequence and the exception input, wherein the second ordered sequence does not include the first ordered sequence in a predicate sequence; and displaying a second character defined by the second ordered sequence in place of the first character in the user interface. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.


In another aspect, a computer-implemented method is described. The method includes receiving a first sequence including at least one character in a user interface. The first sequence is received in response to a first user input. The at least one character is encoded by a first ordered sequence including multiple first code points. The method includes receiving a second sequence including multiple second code points. Further, the method includes determining that the second sequence defines an exception to the first ordered sequence, and in response, generating a third sequence that is based on the first sequence and the second sequence. The third sequence includes multiple third code points that include the multiple first code points and the multiple second code points. Furthermore, the method includes displaying an output defined by the third sequence in place of the first sequence and the second sequence.


This, and other aspects, can include the following feature. A sequence in which code points in the multiple third code points are arranged does not comprise the multiple first code points in the first sequence followed by the multiple second code points in the second sequence. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.


Particular implementations of the subject matter described in this disclosure can be implemented to realize one or more of the following advantages. The methods and techniques described enable users to enter characters of a language in a sequence that comports with the rules of the language, e.g., the sequence in which the user typically writes the language. The methods and techniques identify exception inputs in an input sequence that are an exception to a representative pre-defined sequence, and alter the entered sequence to produce a desired output.


The details of one or more implementations of the disclosure are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is an example of a sequence of character inputs forming a Hindi word by normal phonetic processing.



FIG. 2 is an example of a sequence of inputs forming a Hindi character by sequential phonetic processing.



FIG. 3 is an example of a first exception input to a sequence of inputs forming a Hindi character by pre-phonetic exception input processing.



FIG. 4 is another example of a first exception input to a sequence of inputs forming a Hindi character by pre-phonetic exception input processing.



FIG. 5 is an example of a sequence of inputs forming a Hindi character by sequential phonetic processing.



FIG. 6 is an example of a second exception input to a sequence of inputs forming a Hindi character by post-phonetic exception input processing.



FIG. 7 is another example of a second exception input to a sequence of inputs forming a Hindi character by post-phonetic exception input processing.



FIG. 8 is an example of a system to enter Hindi characters.



FIG. 9 is an example of a user interface in which Hindi characters are displayed.





Like reference numbers and designations in the various drawings indicate like elements.


DETAILED DESCRIPTION


FIG. 1 is an example of a sequence of character inputs forming a Hindi word by normal phonetic processing by a Unicode device. Normal phonetic processing occurs when each Unicode code point corresponding to an input generates an input character that is also an output character, so that the output sequence contains an output character corresponding to each input in the input sequence.


The example of FIG. 1 depicts normal phonetic processing of a sequence of character inputs forming a Hindi character sequence “custom character”, which has the phonetic sound of “kata”. The selected character sequence is not a word in Hindi, and is selected as an example to illustrate implementations of methods and systems described herein. It is to be understood that the methods described herein can be applied to any character sequence, regardless of whether the sequence forms an official Hindi word or not. The first character in the word is the consonant “custom character” which has the phonetic sound “ka” and (Unicode) code point 0915, and the second character in the word is the consonant “custom character” which has the phonetic sound “ta”, and code point 0924. The sequence of user inputs and outputs to display “custom character” is shown in Table 1.













TABLE 1

Input                                   Character                             Code point    Output (115)        Code points in output sequence
First input (105) - custom character    First character - custom character    0915          custom character    0915, 0924
Second input (110) - custom character   Second character - custom character   0924









On a user device that enables users to input characters, for example, a computer system, the first character “custom character” is displayed in response to receiving a first user input, i.e., a selection of the character “custom character” on a keyboard. The first character “custom character” is thus encoded by a first ordered sequence of code points that includes the code point for “custom character”. Subsequently, the second input, i.e., a selection of the character “custom character” on the keyboard, is received. In response, a code point corresponding to “custom character” is identified, the first ordered sequence is updated with this code point, and the output “custom character” is displayed in the user interface. The output “custom character” is encoded by the first ordered sequence including the code point for “custom character”, and the code point for “custom character”. In normal phonetic processing, each subsequence of an ordered sequence represents a predicate sequence. In some implementations, the predicate sequence is a sequence that is recognized by Unicode. For example, characters can be input into a user interface in a recognizable sequence such that the characters are combined to form an output. In such scenarios, the sequence represents a predicate sequence.
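As a minimal sketch of normal phonetic processing, the following Python fragment appends the code point of each input to an ordered sequence and renders the result. The helper names `append_input` and `render` are illustrative assumptions and are not part of the described system; the code points are the ones listed in Table 1.

```python
# Code points taken from Table 1 (hexadecimal).
FIRST_INPUT = 0x0915   # first character
SECOND_INPUT = 0x0924  # second character


def append_input(sequence, code_points):
    """Return a new ordered sequence with the input's code points appended."""
    return list(sequence) + list(code_points)


def render(sequence):
    """Turn an ordered sequence of code points into a displayable string."""
    return "".join(chr(cp) for cp in sequence)


first = append_input([], [FIRST_INPUT])       # sequence after the first input
second = append_input(first, [SECOND_INPUT])  # sequence after the second input

# In normal phonetic processing the earlier sequence is a prefix of the later one.
is_prefix = second[:len(first)] == first
```

Because every input is simply appended, the sequence held after the first input remains a prefix of every later sequence, which is the property the exception inputs described below deliberately break.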


In this particular case, the phonetic sequence in normal phonetic processing also corresponds to a sequence of characters in which a fluent writer of Hindi will write the characters on paper.



FIG. 2 is an example of a sequence of inputs forming a Hindi character by sequential phonetic processing in a Unicode device. Sequential phonetic processing is similar to normal phonetic processing in that a Unicode sequence of input characters corresponds to the phonetic sound of a component part of a word. However, in sequential phonetic processing, each Unicode code point corresponding to an input generates an input character, but not all of the input characters are output characters. Thus, the output sequence contains an output character that is not one of the input characters.


The output character illustrated in the example in FIG. 2 is “custom character”, which sounds like “art”, and is a variant of the consonant “custom character” formed by combining the consonant “custom character” with a dependent marker illustrated by the arch symbol placed over the consonant “custom character”. The arch symbol represents the sound “ar” and the consonant “custom character” represents the sound “ta”. The combination is represented as “custom character” and the sound it represents is “art”.


In sequential phonetic processing, the character “custom character” is formed by receiving Hindi characters in a writing sequence that corresponds to the phonetic sequence. This sequence is formed by receiving the Hindi character “custom character”, which corresponds to the sound “ar”, followed by receiving the Hindi character “custom character”, which corresponds to the sound “ta”, so that the phonetic sound “art” is produced by the combination of “ar” and “ta”. The sequence of user inputs and outputs to display “custom character” in sequential phonetic processing is shown in Table 2.













TABLE 2

Input                                   Character                             Code point    Output (215)        Code points in output sequence
First input (205) - custom character    First character - custom character    0930, 094D    custom character    0930, 094D, 0924
Second input (210) - custom character   Second character - custom character   0924









On a user device, the first character “custom character” is displayed in response to receiving the first user input 205, i.e., a selection of the character “custom character” on a keyboard. Because the character “custom character” is a variant of the consonant “custom character”, in some implementations, the character “custom character” can be input by two user inputs, the first being a selection of the character “custom character” and the second being a selection of the halant sign (custom character) that, in combination with “custom character”, produces the character “custom character”. Such selections result in the input of a first ordered sequence of code points 0930 and 094D, values that correspond to the character “custom character” and the halant sign, respectively.


Subsequently, the second input 210, which is the selection of the character “custom character” on the keyboard, is received. In response, the code point 0924, corresponding to “custom character”, is identified, and the first ordered sequence is updated. The combination of “custom character” and “custom character” forms the output 215, “custom character”, which is represented by the first ordered sequence 0930, 094D, and 0924. The two characters “custom character” and “custom character” displayed in response to the inputs are replaced by “custom character”.
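The sequence in Table 2 can be inspected with Python's standard `unicodedata` module. This is only an illustrative sketch: the variable names are assumptions, and whether the three code points are drawn as a single combined glyph depends on the font and text renderer in use, not on this code. Note that Unicode names the halant sign "VIRAMA".

```python
import unicodedata

# Code points from Table 2: the first input contributes 0930 and 094D,
# the second input contributes 0924.
first_input = [0x0930, 0x094D]
second_input = [0x0924]

# Sequential phonetic processing keeps the inputs in the order received.
output_sequence = first_input + second_input
output_text = "".join(chr(cp) for cp in output_sequence)

# Unicode character names for each code point in the output sequence.
names = [unicodedata.name(chr(cp)) for cp in output_sequence]
```

The resulting string still holds three separate code points; only its displayed form is a single character.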


In the example in FIG. 2, the Unicode sequence of the inputs corresponds to the phonetic sequence. However, the Unicode sequence does not correspond to a sequence in which a fluent writer will actually write “custom character” (the fluent writer will not write characters “custom character” and “custom character” on paper because these two characters will not produce “custom character”). Instead, the fluent writer may first write “custom character”, and then add the arch symbol above “custom character”. Accordingly, many fluent writers of the Indic language may find the process of entering Indic text on a Unicode device to be awkward or non-intuitive.


Thus, in some implementations, a pre-phonetic exception input to the sequential phonetic processing of Unicode can be used to facilitate entering characters in a more intuitive manner, e.g., entering inputs that do not correspond to the order of the phonetic sounds that are represented by the inputs.


In the case of “custom character”, the entry of the dependent marker, i.e., the arch symbol, can be characterized as a pre-phonetic exception input because when “custom character” is spoken, the sound produced by the arch symbol occurs before the sound produced by the consonant to which the arch symbol is added, despite the fact that the consonant is entered as input before the arch symbol is entered as input.



FIG. 3 is an example of a first exception input to a sequence of inputs forming a Hindi character by pre-phonetic exception input processing. The output Hindi character illustrated in FIG. 3 is the same as the output Hindi character illustrated in FIG. 2. As shown in FIG. 3, the input sequence to form “custom character” is “custom character” followed by a symbol indicative of the arch symbol. In Unicode, the arch symbol does not have a unique code point; thus, in some implementations, the arch symbol can be represented on an input device, e.g., a keyboard, by any Hindi character having the arch symbol displayed over the character, such as “custom character”. The choice of Hindi character over which the arch symbol is displayed is arbitrary and the character can be any character so long as the representation conveys that the arch symbol will be selected upon selection of the displayed character. In some implementations, the display can include any illustration including the arch symbol or the display can include only the arch symbol and no character or illustrations. In some implementations, the Hindi character over which the arch symbol is displayed can be dynamically re-written. For example, the most recent character entered using the input device can be identified by determining the character to the left of the cursor position in the user interface. The arch symbol can be displayed over the identified character and provided for selection. When a new character is entered, the previously identified character is dynamically replaced with the new character. The sequence of inputs in the pre-phonetic exception processing to form the output “custom character” is shown in Table 3.













TABLE 3

Input                                   Character                             Code point    Output (325)        Code points in output sequence
First input (305) - custom character    First character - custom character    0924          custom character    0930, 094D, 0924
Second input (310) - custom character   Exception input (315)                 0930, 094D









The first character “custom character” is displayed in a user interface in response to receiving a first user input, e.g., a selection of the character “custom character” on a keyboard. In this example, the first character is encoded by a first ordered sequence that includes the code point for “custom character”, which is 0924. Subsequently, the second input, i.e., the selection of the arch symbol, e.g., caused by the selection of the key “custom character”, is received. The second user input, “custom character”, defines an exception input to the first ordered sequence because the code point 0924 and the code point of the second user input, when encoded, do not form the desired output. That is, the code points of the sequence 0924, 0930, and 094D do not represent the character “custom character”.
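Because the arch symbol has no code point of its own, one way to recognize it is by the key that was selected rather than by the code points it stands for. The sketch below assumes hypothetical key identifiers (`"ARCH"`, `"KA"`, `"TA"`) and hypothetical helper names; an actual keyboard layout would define its own.

```python
# Hypothetical key identifiers mapped to the code points they stand for.
EXCEPTION_KEYS = {"ARCH": (0x0930, 0x094D)}
ORDINARY_KEYS = {"KA": (0x0915,), "TA": (0x0924,)}


def is_exception_input(key):
    """Return True when the selected key is registered as an exception input."""
    return key in EXCEPTION_KEYS


def code_points_for(key):
    """Return the code points a key contributes, whatever its kind."""
    if key in EXCEPTION_KEYS:
        return EXCEPTION_KEYS[key]
    return ORDINARY_KEYS[key]
```

An input handler could call `is_exception_input` on the second input and, only when it returns `True`, rebuild the ordered sequence instead of appending to it.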


To handle this pre-phonetic exception input, a second ordered sequence is generated based on the first ordered sequence. The second ordered sequence includes the code point for “custom character” and one or more code points representing the arch symbol. The code points in the second ordered sequence do not include the code point of “custom character” in a predicate sequence. Although “custom character” was received as the first input, in the code points encoded in the second ordered sequence, the code point for “custom character” does not appear first.


For pre-phonetic exception processing, upon detecting an input of the arch symbol after the input representing “custom character”, the position immediately preceding the character “custom character” is identified, and the character “custom character” is inserted. The combination of “custom character” and “custom character” represents the input sequence to form the output “custom character” in subsequent phonetic processing. Subsequently, the output 325, “custom character”, is displayed in the user interface. The code point for “custom character”, of which “custom character” is a variant, appears first in the second ordered sequence followed by the code point for “custom character”.
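The reordering for the Table 3 case can be sketched as follows. The function name `apply_pre_phonetic` and the constant `ARCH` are illustrative assumptions; the code points are those given in Table 3.

```python
# Code points the text associates with the arch symbol (Table 3).
ARCH = (0x0930, 0x094D)


def apply_pre_phonetic(first_sequence, exception=ARCH):
    """Build the second ordered sequence by placing the exception's code
    points before the code points of the first ordered sequence."""
    return list(exception) + list(first_sequence)


first_sequence = [0x0924]  # first input in Table 3
second_sequence = apply_pre_phonetic(first_sequence)

# The first ordered sequence is no longer a prefix of the result.
first_is_prefix = second_sequence[:len(first_sequence)] == first_sequence
```

Here `second_sequence` holds 0930, 094D, 0924, and `first_is_prefix` is false, which matches the statement that the code point received first does not appear first in the second ordered sequence.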


In summary, pre-phonetic exception input processing allows a user to enter a consonant first, followed by an accent symbol that represents a phonetic sound preceding the sound associated with the consonant. Accordingly, a user may enter text inputs in an order that is more intuitive than the sequential phonetic input order required by Unicode. In such processing, the code points corresponding to the accent symbol are inserted into the output sequence before the code points corresponding to the first input.



FIG. 4 is another example of a first exception input to a sequence of inputs forming a Hindi character by pre-phonetic exception input processing. One of the characters in the text entry sequence includes a combination of the consonant “custom character” and a dependent vowel marker representing the vowel “custom character”, which sounds like “aa”, to form the character “custom character”, which sounds like “taa”. To generate the desired output of “custom character”, the arch symbol is received in a user interface after the character “custom character” is received. The sequence of inputs in the pre-phonetic exception processing to form the output “custom character” is shown in Table 4.













TABLE 4

Input                                   Character                             Code point    Output (425)        Code points in output sequence
First input (405) - custom character    First character - custom character    0924, 093E    custom character    0930, 094D, 0924, 093E
Second input (410) - custom character   Exception input (415)                 0930, 094D









As shown in FIG. 4, the input sequence of code points, 0924 and 093E followed by 0930 and 094D, constitutes an exception input that results in the output sequence of 0930, 094D, 0924, and 093E.
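When the text buffer already contains earlier characters, the insertion position has to be located rather than assumed to be the start of the buffer. The sketch below is one possible approach under a stated assumption: it treats the most recently entered consonant (a code point in the Devanagari consonant range U+0915 through U+0939) as the start of the character that receives the arch symbol. The function names are hypothetical, and consonant clusters that themselves contain a halant are not handled.

```python
ARCH = (0x0930, 0x094D)  # code points for the arch symbol, as in Table 4


def is_consonant(cp):
    # Assumption: consonant letters occupy U+0915 through U+0939.
    return 0x0915 <= cp <= 0x0939


def insert_before_last_consonant(buffer, exception=ARCH):
    """Insert the exception's code points immediately before the most
    recently entered consonant, leaving any trailing vowel marker in place."""
    buffer = list(buffer)
    for i in range(len(buffer) - 1, -1, -1):
        if is_consonant(buffer[i]):
            return buffer[:i] + list(exception) + buffer[i:]
    return buffer  # no consonant found: leave the buffer unchanged
```

For the Table 4 input, `insert_before_last_consonant([0x0924, 0x093E])` yields 0930, 094D, 0924, 093E; with an earlier character in the buffer, that earlier character is left untouched.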



FIG. 5 is an example of a sequence of inputs forming a Hindi character by sequential phonetic processing. The example shows a sequence of inputs forming a Hindi character “custom character” (phonetic sound “pra”) that is a variant of the consonant “custom character”, formed by combining a dependent marker, illustrated by the inclined line that represents the sound “ra”, with “custom character” such that the combination is represented as “custom character”. In sequential phonetic processing, the character “custom character” can be formed by receiving Hindi characters in a writing sequence that corresponds to a phonetic sequence when “custom character” is spoken. The sequence of user inputs and outputs to display “custom character” is shown in Table 5.













TABLE 5

Input                                   Character                             Code point    Output (515)        Code points in output sequence
First input (505) - custom character    First character - custom character    092A, 094D    custom character    092A, 094D, 0930
Second input (510) - custom character   Second character - custom character   0930










In Table 5, the first character “custom character” is displayed in response to receiving a first user input 505, i.e., a selection of the character “custom character” on a keyboard. In some implementations, the character “custom character” can be provided for selection, while in other implementations, selecting the character “custom character” as input can include the selection of the character “custom character” and a marker that, in combination with “custom character”, produces the character “custom character”. In this example, the first character “custom character” is encoded by a first ordered sequence comprising the code points for “custom character” (092A, 094D). Subsequently, the second input 510, i.e., a selection of the character “custom character”, is received. In response, a code point corresponding to “custom character” (0930) is identified. The preferred phonetic processing in Unicode generates the output 515, “custom character”, from the combination of “custom character” and “custom character”.



FIG. 6 is an example of a second exception input to a sequence of inputs forming a Hindi character by post-phonetic exception input processing. The output Hindi character illustrated in FIG. 6 is the same as the output Hindi character illustrated in FIG. 5. A fluent writer will first write “custom character”, and then add the inclined line symbol adjacent to the left of “custom character” to form the character “custom character”. Thus, the input of the inclined line symbol after the input of the character “custom character” represents a post-phonetic exception input because the corresponding code points that are input form an input sequence that is different than the output sequence. The sequence of user inputs, including the exception input, and the output “custom character” is shown in Table 6.













TABLE 6

Input                                   Character                             Code point    Output (625)        Code points in output sequence
First input (605) - custom character    First character - custom character    092A          custom character    092A, 094D, 0930
Second input (610) - custom character   Exception input (615)                 094D, 0930









The first character “custom character” is displayed in a user interface in response to receiving a first user input 605, e.g., a selection of the character “custom character” on a keyboard, and is encoded by a first ordered sequence that includes a code point for “custom character” (092A). Subsequently, the second input 610, i.e., the selection of the inclined line symbol by selection of the key “custom character”, is received. The inclined line symbol is an exception input 615 to the sequential phonetic processing sequence. In response, the code points of the ordered sequence 092A, 094D and 0930 result in the code points 092A, 094D, and 0930, which, in turn, cause the character “custom character” to be displayed in the user interface. In this example, a difference between the output sequence and the input sequence is that in the input sequence, a code point of the halant symbol (094D) is associated with the character “custom character” whereas in the output sequence, the code point of the halant symbol is not associated with that character.
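A post-phonetic exception input can be sketched as an insertion directly after the consonant it modifies. The code below is an assumption-laden illustration, not the described implementation: the names `INCLINED_LINE`, `is_consonant`, and `apply_post_phonetic` are hypothetical, and the consonant test relies on the Devanagari consonant range U+0915 through U+0939.

```python
# Code points the text associates with the inclined line symbol (Table 6).
INCLINED_LINE = (0x094D, 0x0930)


def is_consonant(cp):
    # Assumption: consonant letters occupy U+0915 through U+0939.
    return 0x0915 <= cp <= 0x0939


def apply_post_phonetic(first_sequence, exception=INCLINED_LINE):
    """Insert the exception's code points right after the last consonant,
    so that any dependent vowel marker stays at the end of the sequence."""
    seq = list(first_sequence)
    for i in range(len(seq) - 1, -1, -1):
        if is_consonant(seq[i]):
            return seq[:i + 1] + list(exception) + seq[i + 1:]
    return seq  # no consonant found: leave the sequence unchanged
```

For the Table 6 input, `apply_post_phonetic([0x092A])` yields 092A, 094D, 0930. When the first sequence ends in a dependent vowel marker such as 093E, the inserted code points land between the consonant and the marker.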



FIG. 7 is another example of a second exception input to a sequence of inputs forming a Hindi character by post-phonetic exception input processing. The output character “custom character” includes a dependent vowel marker. The sequence of user inputs, including the exception input, and the output “custom character” is shown in Table 7. Note that although the sequence in which the code points are received is 092A, 093E (first input) followed by 094D, 0930 (second input), the code points associated with the second input are inserted between the code points associated with the first input resulting in 092A, 094D, 0930, 093E as the output code point sequence.













TABLE 7

Input                                   Character                             Code point    Output (725)        Code points in output
First input (705) - custom character    First character - custom character    092A, 093E    custom character    092A, 094D, 0930, 093E
Second input (710) - custom character   Exception input (715)                 094D, 0930









Exception processing can be realized by instructions that upon execution cause a processing device to carry out the processes and functions described above. Such instructions can, for example, include interpreted instructions, such as script instructions, e.g., JavaScript or ECMAScript instructions, or executable code, or other instructions stored in a computer readable medium. In some implementations, a look-up table of code point sequences of first ordered sequences for pre-phonetic and post-phonetic exception inputs can be mapped to corresponding second ordered sequences. For example, the first ordered sequence of 092A, 093E, 0915, 094D and 0930 can be mapped to the second ordered sequence of 092A, 094D, 0930, and 093E. In other implementations, an input sequence of first characters can be mapped to an output sequence of second characters. For example, the input sequence of characters custom character and custom character can be mapped to the output character custom character. Combinations of character mappings and code point sequences can also be used.
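A look-up table of this kind can be sketched as a dictionary keyed by the code points in the order they were received. The entries below are taken from Tables 3, 4, and 7; the names `EXCEPTION_TABLE` and `resolve` are illustrative assumptions, and Python is used here only for illustration.

```python
# Keys: code points in the order received. Values: code points in output order.
EXCEPTION_TABLE = {
    (0x0924, 0x0930, 0x094D): (0x0930, 0x094D, 0x0924),                  # Table 3
    (0x0924, 0x093E, 0x0930, 0x094D): (0x0930, 0x094D, 0x0924, 0x093E),  # Table 4
    (0x092A, 0x093E, 0x094D, 0x0930): (0x092A, 0x094D, 0x0930, 0x093E),  # Table 7
}


def resolve(received):
    """Return the output sequence for the received code points, falling back
    to the received order when no exception entry matches."""
    key = tuple(received)
    return list(EXCEPTION_TABLE.get(key, key))
```

Sequences without an entry, such as the Table 6 input whose received order already equals its output order, pass through unchanged.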



FIG. 8 depicts a schematic of an example of a system 800 for entering Hindi characters. The system 800 can be any computer system such as a desktop computer, laptop computer, personal digital assistant (PDA), and the like. The system 800 includes a display device 810 operatively coupled to a computer 805, where the user interface 815 is displayed in the display device 810. In addition, the system 800 includes input devices, such as a keyboard 820 and a pointing device 825, e.g., a mouse, a stylus, and the like, operatively coupled to the computer 805. A user can provide input using the keyboard 820 and the pointing device 825, and can view the outcome of the input on the display device 810.


The computer 805 can be operatively coupled to a remote engine 835 over one or more networks 840, e.g., the Internet. The software that enables a user to enter Hindi characters can be located at the remote engine 835, and can be displayed in the user interface 815 when the user accesses the remote engine 835 over the one or more networks 840. In some implementations, the user can download the software on to the computer 805 to execute the software.


In some implementations, the computer 805 is configured to receive a computer readable medium, e.g., a CD-ROM, on which a computer program product including instructions to perform operations, is tangibly embodied. For example, the computer 805 can include a CD-ROM drive to receive the CD, which is, e.g., an installation disk that includes instructions corresponding to the software application that enables entering Hindi characters. The instructions in the computer program product cause the processing unit 110 to provide a user with a user interface 815 that allows the user to enter and edit Hindi characters.



FIG. 9 is an example of a user interface in which Hindi characters are displayed. The user interface 815 is displayed in the display device 810 and can be, e.g., a user interface provided by a web browser, a user interface provided by any software application, and the like. The user interface 815 includes a text box 905 in which a user can position the cursor 830 and select the text box 905. In response, the cursor 830 can be replaced by a vertical line 910 representing the cursor 830, where the line is similar to a line representing the cursor in a text editor, e.g., Microsoft® NotePad. The user can enter Hindi characters using the input devices 820 and 825, and the entered characters are displayed in the text box 905. In some implementations, the user interface can be a text editor.


In some implementations, the keyboard used to input Hindi characters can be a virtual keyboard displayed in a user interface and can include multiple keys representing the characters of Hindi. Examples of such keyboards are described in the patent application titled “Language Keyboard”, application Ser. No. 11/833,901, filed Aug. 3, 2007, the entire content of which is incorporated herein by reference. In some implementations, the Hindi characters including consonants, vowels, dependent vowel markers, and the like, can be included in a JavaScript script that includes exception inputs.


Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer-readable medium for execution by, or to control the operation of, data processing apparatus. The computer-readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.


A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.


The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output.


To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.


While this specification contains many specifics, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features specific to particular implementations of the disclosure. Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.


Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.


Thus, particular implementations of the disclosure have been described. Other implementations are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. In some implementations, the Hindi character that the user inputs can be a compound formed by combining a variant of a Hindi character with another consonant. An example of a compound in Hindi is “custom character”, which sounds like “ksha”. The character, “custom character”, is a combination of “custom character” (phonetic sound “ik”) and “custom character” (phonetic sound “sha”) where “custom character” is a variant of the consonant “custom character” (phonetic sound “ka”). There are multiple ways in which a user can enter “custom character” in a user interface including selecting a key that displays “custom character”, and entering the sequence “custom character+custom character”, which is replaced by “custom character”. Similarly, in preferred phonetic processing, there are multiple ways to enter the first exception input, namely the arch symbol, in the user interface to form “custom character”, including entering the sequence “custom character+custom character+custom character” or entering the sequence “custom character+custom character”. In pre-phonetic exception input processing to form “custom character”, upon detecting the first input “custom character” followed by “custom character”, the entered sequence is replaced with “custom character+custom character” to display “custom character” in the user interface.


Similarly, to form the character “custom character”, the user can enter one of the sequences “custom character+custom character+custom character” or “custom character+custom character”. Alternatively, the user can enter “custom character” followed by “custom character”, which is replaced with the sequence “custom character+custom character” to form the character “custom character”.
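The replacement described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of this disclosure: the compound "ksha" is assumed to be encoded as KA (U+0915), VIRAMA (U+094D), SSA (U+0937), the arch symbol is assumed to be the reph encoded as RA (U+0930) followed by VIRAMA, and the names `EXCEPTION_INPUTS`, `next_sequence`, and `render` are hypothetical.

```python
KA, VIRAMA, SSA, RA, SIGN_AA = 0x0915, 0x094D, 0x0937, 0x0930, 0x093E

# Assumed set of exception code point sequences (only the reph here).
EXCEPTION_INPUTS = {(RA, VIRAMA)}


def next_sequence(first, second):
    """Combine the displayed sequence with the newly entered one.

    If the new input is an exception input, it is placed before the
    displayed sequence; otherwise it is appended in entry order.
    """
    first, second = tuple(first), tuple(second)
    if second in EXCEPTION_INPUTS:
        return second + first
    return first + second


def render(sequence):
    """Turn a code point sequence into a displayable string."""
    return "".join(chr(cp) for cp in sequence)


ksha = (KA, VIRAMA, SSA)                       # variant of "ka" joined with "sha"
with_reph = next_sequence(ksha, (RA, VIRAMA))  # exception input entered second
with_vowel = next_sequence(ksha, (SIGN_AA,))   # ordinary input entered second
```

In this sketch the exception branch produces a sequence in which the displayed compound is no longer a prefix, while the ordinary branch simply extends the displayed sequence.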


The examples described in this disclosure represent only a few Hindi characters where the sequence of inputs includes an exception input. The exception inputs can be added to any Hindi consonant and to any combination including a consonant and a dependent vowel marker. The concept of exception inputs, and of managing the entry of exception inputs, can also be extended to several Indic languages, e.g., Telugu, Malayalam, and the like.
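To illustrate how an exception input might be attached to any consonant, or to a consonant with a dependent vowel marker, at the end of text that is already displayed, the following Python sketch locates the start of the trailing consonant cluster and inserts the exception code points there. It is a simplified sketch with hypothetical names (`cluster_start`, `insert_reph`). It assumes the exception input is the reph (RA, U+0930, followed by VIRAMA, U+094D), treats U+0915 to U+0939 as consonants and U+093E to U+094C as dependent vowel markers, and ignores other Devanagari signs.

```python
VIRAMA = 0x094D
REPH = (0x0930, VIRAMA)  # assumed exception input: RA + VIRAMA


def is_consonant(cp):
    return 0x0915 <= cp <= 0x0939


def is_vowel_sign(cp):
    return 0x093E <= cp <= 0x094C


def cluster_start(cps):
    """Return the index where the trailing consonant cluster begins.

    Returns None when the text does not end in a consonant, optionally
    followed by one dependent vowel marker.
    """
    i = len(cps) - 1
    if i >= 0 and is_vowel_sign(cps[i]):
        i -= 1  # step over the dependent vowel marker
    if i < 0 or not is_consonant(cps[i]):
        return None
    # Walk back over "consonant + VIRAMA" pairs that join into a compound.
    while i >= 2 and cps[i - 1] == VIRAMA and is_consonant(cps[i - 2]):
        i -= 2
    return i


def insert_reph(text):
    """Insert the exception code points before the trailing cluster."""
    cps = [ord(ch) for ch in text]
    start = cluster_start(cps)
    if start is None:
        return text  # nothing to attach to; left unchanged in this sketch
    cps[start:start] = REPH
    return "".join(chr(cp) for cp in cps)
```

For example, under these assumptions `insert_reph("\u0915\u093F")` places RA and VIRAMA before KA, so the result begins with code points that do not represent the originally displayed character.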

Claims
  • 1. A computer-implemented method for generating and displaying encoded characters comprising: displaying, by data processing apparatus, a first character in a user interface in response to a first user input, the first character encoded by a first ordered code point sequence comprising at least one first code point, wherein each code point in the first ordered code point sequence represents a respective Unicode code point; receiving, by the data processing apparatus, a second user input after the first user input; determining, by the data processing apparatus, that the second user input defines an exception input to the first ordered code point sequence, including: determining that the second user input is encoded by an exception code point sequence comprising one or more exception code points, wherein each code point in the exception code point sequence represents a respective Unicode code point; and determining that a candidate ordered code point sequence, including the first ordered code point sequence followed by the exception code point sequence, does not represent a character to which a combination of the first character and the second user input corresponds; in response to determining that the second user input defines an exception input to the first ordered code point sequence, generating, by the data processing apparatus, a second ordered code point sequence by mapping the first ordered code point sequence and the exception code point sequence to the second ordered code point sequence, wherein the second ordered code point sequence includes the at least one first code point and the one or more exception code points, and one or more beginning code points in the beginning of the second ordered code point sequence do not represent the first character; and displaying, by the data processing apparatus, a second character defined by the second ordered code point sequence in place of the first character in the user interface.
  • 2. The method of claim 1, further comprising: in response to determining that the second user input does not define an exception input to the first ordered code point sequence, identifying a code point corresponding to the second user input; and displaying characters encoded by the first ordered code point sequence and the identified code point.
  • 3. The method of claim 1, wherein the first ordered code point sequence represents an Indic language consonant.
  • 4. The method of claim 3, wherein generating the second ordered code point sequence comprises: identifying a position before a code point representing the consonant in the first ordered code point sequence; and inserting a code point representing a special character of the Indic language corresponding to the exception input in the identified position.
  • 5. The method of claim 3, wherein generating the second ordered code point sequence comprises: identifying a position in the first ordered code point sequence of the code point representing the consonant; replacing the code point representing the consonant with one or more code points representing a first variant of the consonant; and adding one or more code points representing a special character corresponding to the exception input adjacent to the one or more code points representing the first variant.
  • 6. The method of claim 1, wherein the first ordered code point sequence represents a consonant of an Indic language in a combination including the consonant and a dependent vowel marker associated with the consonant, and wherein the first user input causes displaying the combination of the consonant and the dependent vowel marker in the user interface.
  • 7. The method of claim 6, wherein generating the second ordered code point sequence comprises: identifying a position before the first ordered code point sequence; and inserting a code point representing a special character of the Indic language corresponding to the exception input in the identified position.
  • 8. The method of claim 6, wherein generating the second ordered code point sequence comprises: identifying a position in the first ordered code point sequence of the code point representing the consonant; replacing the code point representing the consonant with one or more code points representing a first variant of the consonant; and adding one or more code points representing a special character corresponding to the exception input adjacent to the one or more code points representing the first variant and between the code point representing the consonant and the dependent vowel marker.
  • 9. The method of claim 1, wherein the code point of the first ordered code point sequence represents a variant of one Indic language character in the Indic language in a compound formed by combining the variant with a consonant of the Indic language.
  • 10. The method of claim 9, wherein generating the second ordered code point sequence comprises: identifying a position before a code point representing the variant in the first ordered code point sequence; and inserting a code point representing a special character of the Indic language corresponding to the exception input in the identified position.
  • 11. The method of claim 9, wherein generating the second ordered code point sequence comprises: identifying a position in the first ordered code point sequence of the code point representing the consonant; replacing the code point representing the consonant with one or more code points representing a first variant of the consonant; and adding one or more code points representing a special character corresponding to the exception input adjacent to the one or more code points representing the first variant and between the code point representing the variant and the consonant.
  • 12. A non-transitory computer-readable medium storing computer program instructions executable by data processing apparatus to perform operations comprising: displaying a first character in a user interface in response to a first user input, the first character encoded by a first ordered code point sequence comprising at least one first code point, wherein each code point in the first ordered code point sequence represents a respective Unicode code point; receiving a second user input after the first user input; determining, by the data processing apparatus, that the second user input defines an exception input to the first ordered code point sequence, including: determining that the second user input is encoded by an exception code point sequence comprising one or more exception code points, wherein each code point in the exception code point sequence represents a respective Unicode code point; and determining that a candidate ordered code point sequence, including the first ordered code point sequence followed by the exception code point sequence, does not represent a character to which a combination of the first character and the second user input corresponds; in response to determining that the second user input defines an exception input to the first ordered code point sequence, generating, by the data processing apparatus, a second ordered code point sequence by mapping the first ordered code point sequence and the exception code point sequence to the second ordered code point sequence, wherein the second ordered code point sequence includes the at least one first code point and the one or more exception code points, and one or more beginning code points in the beginning of the second ordered code point sequence do not represent the first character; and displaying a second character defined by the second ordered code point sequence in place of the first character in the user interface.
  • 13. The computer-readable medium of claim 12, the operations further comprising: in response to determining that the second user input does not define an exception input to the first ordered code point sequence, identifying a code point corresponding to the second user input; and displaying characters encoded by the first ordered code point sequence and the identified code point.
  • 14. The computer-readable medium of claim 12, wherein the first ordered code point sequence represents an Indic language consonant.
  • 15. The computer-readable medium of claim 14, wherein generating the second ordered code point sequence comprises: identifying a position before a code point representing the consonant in the first ordered code point sequence; and inserting a code point representing a special character of the Indic language corresponding to the exception input in the identified position.
  • 16. The computer-readable medium of claim 14, wherein generating the second ordered code point sequence comprises: identifying a position in the first ordered code point sequence of the code point representing the consonant; replacing the code point representing the consonant with one or more code points representing a first variant of the consonant; and adding one or more code points representing a special character corresponding to the exception input adjacent to the one or more code points representing the first variant.
  • 17. The computer-readable medium of claim 12, wherein the first ordered code point sequence represents a consonant of an Indic language in a combination including the consonant and a dependent vowel marker associated with the consonant, and wherein the first user input causes displaying the combination of the consonant and the dependent vowel marker in the user interface.
  • 18. The computer-readable medium of claim 17, wherein generating the second ordered code point sequence comprises: identifying a position before the first ordered code point sequence; and inserting a code point representing a special character of the Indic language corresponding to the exception input in the identified position.
  • 19. The computer-readable medium of claim 17, wherein generating the second ordered code point sequence comprises: identifying a position in the first ordered code point sequence of the code point representing the consonant; replacing the code point representing the consonant with one or more code points representing a first variant of the consonant; and adding one or more code points representing a special character corresponding to the exception input adjacent to the one or more code points representing the first variant and between the code point representing the consonant and the dependent vowel marker.
  • 20. The computer-readable medium of claim 12, wherein the code point of the first ordered code point sequence represents a variant of one Indic language character in the Indic language in a compound formed by combining the variant with a consonant of the Indic language.
  • 21. The computer-readable medium of claim 20, wherein generating the second ordered code point sequence comprises: identifying a position before a code point representing the variant in the first ordered code point sequence; and inserting a code point representing a special character of the Indic language corresponding to the exception input in the identified position.
  • 22. The computer-readable medium of claim 20, wherein generating the second ordered code point sequence comprises: identifying a position in the first ordered code point sequence of the code point representing the consonant; replacing the code point representing the consonant with one or more code points representing a first variant of the consonant; and adding one or more code points representing a special character corresponding to the exception input adjacent to the one or more code points representing the first variant and between the code point representing the variant and the consonant.
  • 23. A system comprising: a processor; and a non-transitory computer-readable medium embodying a computer program product encoding instructions to cause the processor to perform operations comprising: displaying a first character in a user interface in response to a first user input, the first character encoded by a first ordered code point sequence comprising at least one first code point, wherein each code point in the first ordered code point sequence represents a respective Unicode code point; receiving a second user input after the first user input; determining, by the data processing apparatus, that the second user input defines an exception input to the first ordered code point sequence, including: determining that the second user input is encoded by an exception code point sequence comprising one or more exception code points, wherein each code point in the exception code point sequence represents a respective Unicode code point; and determining that a candidate ordered code point sequence, including the first ordered code point sequence followed by the exception code point sequence, does not represent a character to which a combination of the first character and the second user input corresponds; in response to determining that the second user input defines an exception input to the first ordered code point sequence, generating, by the data processing apparatus, a second ordered code point sequence by mapping the first ordered code point sequence and the exception code point sequence to the second ordered code point sequence, wherein the second ordered code point sequence includes the at least one first code point and the one or more exception code points, and one or more beginning code points in the beginning of the second ordered code point sequence do not represent the first character; and displaying a second character defined by the second ordered code point sequence in place of the first character in the user interface.
  • 24. The system of claim 23, the operations further comprising: in response to determining that the second user input does not define an exception input to the first ordered code point sequence, identifying a code point corresponding to the second user input; and displaying characters encoded by the first ordered code point sequence and the identified code point.
  • 25. The system of claim 23, wherein the first ordered code point sequence represents an Indic language consonant.
  • 26. The system of claim 25, wherein generating the second ordered code point sequence comprises: identifying a position before a code point representing the consonant in the first ordered code point sequence; and inserting a code point representing a special character of the Indic language corresponding to the exception input in the identified position.
  • 27. The system of claim 25, wherein generating the second ordered code point sequence comprises: identifying a position in the first ordered code point sequence of the code point representing the consonant; replacing the code point representing the consonant with one or more code points representing a first variant of the consonant; and adding one or more code points representing a special character corresponding to the exception input adjacent to the one or more code points representing the first variant.
  • 28. The system of claim 23, wherein the code point of the first ordered code point sequence represents a consonant of an Indic language in a combination including the consonant and a dependent vowel marker associated with the consonant, and wherein the first user input causes displaying the combination of the consonant and the dependent vowel marker in the user interface.
  • 29. The system of claim 28, wherein generating the second ordered code point sequence comprises: identifying a position before a code point representing the consonant in the combination in the first ordered code point sequence; and inserting a code point representing a special character of the Indic language corresponding to the exception input in the identified position.
  • 30. The system of claim 28, wherein generating the second ordered code point sequence comprises: identifying a position in the first ordered code point sequence of the code point representing the consonant; replacing the code point representing the consonant with one or more code points representing a first variant of the consonant; and adding one or more code points representing a special character corresponding to the exception input adjacent to the one or more code points representing the first variant and between the code point representing the consonant and the dependent vowel marker.
  • 31. The system of claim 23, wherein the code point of the first ordered code point sequence represents a variant of one Indic language character in the Indic language in a compound formed by combining the variant with a consonant of the Indic language.
  • 32. The system of claim 31, wherein generating the second ordered code point sequence comprises: identifying a position before a code point representing the variant in the first ordered code point sequence; and inserting a code point representing a special character of the Indic language corresponding to the exception input in the identified position.
  • 33. The system of claim 31, wherein generating the second ordered code point sequence comprises: identifying a position in the first ordered code point sequence of the code point representing the consonant; replacing the code point representing the consonant with one or more code points representing a first variant of the consonant; and adding one or more code points representing a special character corresponding to the exception input adjacent to the one or more code points representing the first variant and between the code point representing the variant and the consonant.
  • 34. A computer-implemented method comprising: receiving, by data processing apparatus, a first sequence including at least one character in a user interface, the first sequence received in response to a first user input, the at least one character encoded by a first ordered code point sequence including a first plurality of code points, wherein each code point in the first ordered code point sequence represents a respective Unicode code point; receiving, by the data processing apparatus, a second sequence encoded by a second ordered code point sequence including a second plurality of code points after the first sequence, wherein each code point in the second ordered code point sequence represents a respective Unicode code point; determining, by the data processing apparatus, that the second sequence defines an exception to the first ordered code point sequence by determining a particular sequence of code points including the first ordered code point sequence followed by the second ordered code point sequence does not represent a character to which a combination of the first sequence and the second sequence correspond; in response to determining that the second sequence defines an exception to the first ordered code point sequence, generating, by the data processing apparatus, a third code point sequence by mapping the first ordered code point sequence and the second ordered code point sequence to the third code point sequence, the third code point sequence comprising the first plurality of code points and the second plurality of code points and comprising the second ordered code point sequence at the beginning of the third code point sequence; and displaying, by the data processing apparatus, an output defined by the third code point sequence in place of the first sequence and the second sequence.
  • 35. The method of claim 34, wherein a sequence in which code points in the third code point sequence are arranged comprises the second ordered code point sequence followed by the first ordered code point sequence.
US Referenced Citations (14)
Number Name Date Kind
6934564 Laukkanen et al. Aug 2005 B2
7801722 Kotipalli et al. Sep 2010 B2
20030119551 Laukkanen et al. Jun 2003 A1
20050195171 Aoki et al. Sep 2005 A1
20060126936 Bhaskarabhatla Jun 2006 A1
20070156394 Banerjee et al. Jul 2007 A1
20070174771 Mistry Jul 2007 A1
20070276650 Kotipalli et al. Nov 2007 A1
20070277118 Kotipalli et al. Nov 2007 A1
20080120541 Cheng May 2008 A1
20080186211 Harman Aug 2008 A1
20080221866 Katragadda et al. Sep 2008 A1
20080240567 Chaoweeraprasit et al. Oct 2008 A1
20090037837 Raghunath et al. Feb 2009 A1
Related Publications (1)
Number Date Country
20100002004 A1 Jan 2010 US