The present invention relates to machine-aided reading systems and methods. More particularly, the present invention relates to a user interface and underlying architecture that assists users with reading non-native languages.
With the rapid development of the Internet, computer users all over the world are becoming increasingly exposed to writings that are penned in non-native languages. Many users are entirely unfamiliar with non-native languages. Even for a user who has some training in a non-native language, it is often difficult for that user to read and comprehend the non-native language.
Consider the plight of a Chinese user who accesses web pages or other electronic documents written in English. The Chinese user may have had some formal training in English during school, but such training is often insufficient to enable them to fully read and comprehend certain words, phrases, or sentences written in English. The Chinese-English situation is used as but one example to illustrate the point. This problem persists across other language boundaries.
Accordingly, this invention arose out of concerns associated with providing machine-aided reading systems and methods that help computer users read and comprehend electronic writings that are presented in non-native languages.
A computer-aided reading system offers assistance to a user who is reading in a non-native language, as the user needs help, without requiring the user to divert attention away from the text.
In one implementation, the reading system is implemented as a reading wizard for a browser program. The reading wizard is exposed via a graphical user interface (UI) that allows the user to select a word, phrase, sentence, or other grouping of words in the non-native text, and view a translation of the selected text in the user's own native language. The translation is presented in a window or pop-up box located near the selected text to minimize distraction.
In one aspect, a core of the reading wizard includes a shallow parser, a statistical word translation selector, and a translation generator. The shallow parser parses phrases or sentences of the user-selected non-native text into individual translation units (e.g., phrases, words). In one implementation, the shallow parser segments the words in the selected text and morphologically processes them to obtain the morphological root of each word. The shallow parser employs part-of-speech (POS) tagging and base noun phrase (baseNP) identification to characterize the words and phrases for further translation selection. The POS tagging and baseNP identification may be performed, for example, by a statistical model. The shallow parser applies rules-based phrase extension and pattern matching to the words to generate tree lists.
The statistical word translation selector chooses top candidate word translations for the translation units parsed from the non-native text. The word translation selector generates all possible translation patterns and translates the translation units using statistical translation and language models. The top candidate translations are output.
The translation generator translates the candidate word translations to corresponding phrases in the native language. The translation generator uses, in part, a native language model to help determine proper translations. The native words and phrases are then presented via the UI in proximity to the selected text.
Overview
A computer-aided reading system helps a user read a non-native language. For discussion purposes, the computer-aided reading system is described in the general context of browser programs executed by a general-purpose computer. However, the computer-aided reading system may be implemented in many different environments other than browsing (e.g., email systems, word processing, etc.) and may be practiced on many diverse types of devices.
The embodiments described below can permit users who are more comfortable communicating in a native language to extensively read non-native language electronic documents quickly, conveniently, and in a manner that promotes focus and rapid assimilation of the subject matter. User convenience can be enhanced by providing a user interface with a translation window closely adjacent to the text being translated. The translation window contains a translation of the selected text. By positioning the translation window closely adjacent to the selected text, the user's eyes are not required to move very far to ascertain the translated text. This, in turn, reduces user-perceptible distraction that might otherwise persist if, for example, the user were required to glance a distance away in order to view the translated text.
User interaction is further enhanced, in some embodiments, by virtue of a mouse point translation process. A user is able, by positioning a mouse to select a portion of text, to quickly make their selection, whereupon the system automatically performs a translation and presents translated text to the user.
Exemplary System Architecture
The computer system 100 has one or more peripheral devices connected via the I/O interface 106. Exemplary peripheral devices include a mouse 110, a keyboard 112 (e.g., an alphanumeric QWERTY keyboard, a phonetic keyboard, etc.), a display monitor 114, a printer 116, a peripheral storage device 118, and a microphone 120. The computer system may be implemented, for example, as a general-purpose computer. Accordingly, the computer system 100 implements a computer operating system (not shown) that is stored in memory 104 and executed on the CPU 102. The operating system is preferably a multi-tasking operating system that supports a windowing environment. An example of a suitable operating system is a Windows brand operating system from Microsoft Corporation.
It is noted that other computer system configurations may be used, such as hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. In addition, although a standalone computer is illustrated in
Exemplary Reading System
The computer system 100 implements a reading system 130 that assists users in reading non-native languages. The reading system can provide help at the word, phrase, or sentence level. The reading system is implemented in
The reading system 130 has a user interface 134 and a cross-language reading wizard 136. The UI 134 exposes the cross-language reading wizard 136. The browser program 132 may include other components in addition to the reading system, but such components are considered standard to browser programs and will not be shown or described in detail.
The reading wizard 136 includes a shallow parser 140, a statistical word translation selector 142, and a translation generator 144.
Exemplary Shallow Parser
The shallow parser 140 parses phrases or sentences of the selected non-native text into individual translation units (e.g., phrases, words).
As shown, shallow parser 140 comprises a word segment module 200, a morphological analyzer 202, a part-of-speech (POS) tagging/base noun phrase identification module 204, a phrase extension module 206, and a pattern or template matching module 208. Although these components are shown as individual components, it should be appreciated and understood that the components can be combined with one another or with other components.
In accordance with the described embodiment, shallow parser 140 segments words in text that has been selected by a user. It does this using word segment module 200. The shallow parser then uses morphological analyzer 202 to morphologically process the words to obtain the morphological root of each word. The morphological analyzer can apply various morphological rules to the words in order to find the morphological root of each word. The rules that morphological analyzer 202 uses can be developed by a person skilled in the particular language being analyzed. For example, one rule in English is that the morphological root of words that end in “ed” is formed by either removing the “d” or the “ed”.
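The “ed” rule mentioned above can be illustrated with a minimal sketch. The function names and the small lexicon of known roots are hypothetical and are used only to choose between the two candidate roots; an actual morphological analyzer would apply many more rules.

```python
def ed_root_candidates(word):
    """Return candidate roots for a word ending in 'ed' (strip 'd', then strip 'ed')."""
    if not word.endswith("ed"):
        return [word]
    return [word[:-1], word[:-2]]


def morphological_root(word, lexicon):
    """Pick the first candidate root found in a (hypothetical) lexicon of known roots."""
    lowered = word.lower()
    for candidate in ed_root_candidates(lowered):
        if candidate in lexicon:
            return candidate
    # No candidate is a known root, so leave the word unchanged.
    return lowered


# Toy lexicon, assumed for illustration only.
LEXICON = {"translate", "parse", "walk"}
```

With this toy lexicon, “translated” resolves to “translate” (removing the “d”), while “walked” resolves to “walk” (removing the “ed”).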
The shallow parser 140 employs part-of-speech (POS) tagging/base noun phrase (baseNP) identification module 204 to characterize the words and phrases for further translation selection. The POS tagging and baseNP identification can be performed, for example, by a statistical model, an example of which is described in the section entitled “POS Tagging and BaseNP Identification” just below. The shallow parser 140 uses phrase extension module 206 to apply rule-based phrase extension to the words characterized by POS tagging/base noun phrase identification module 204. One goal of the phrase extension module is to extend a base noun phrase to a more complex noun phrase. For example, “baseNP of baseNP” is a more complex noun phrase built from the “baseNP” phrase. The shallow parser 140 also uses patterning or template matching module 208 to generate tree lists. The patterning or template matching module is used for translation and recognizes that some phrase translation is pattern dependent, and is not directly related to the words in the phrases. For example, the phrase “be interested in baseNP” contains a pattern (i.e., “baseNP”) that is used to form a more complex translation unit for translation. The words “be interested in” are not directly related to the pattern that is used to form the more complex translation unit.
POS Tagging and BaseNP Identification
The following discussion describes a statistical model for automatic identification of English base noun phrases (baseNPs) and constitutes but one way of processing selected text so that a tree list can be generated. The described approach uses two steps: N-best part-of-speech (POS) tagging, and baseNP identification given the N-best POS sequences. The described model also integrates lexical information. Finally, a Viterbi algorithm is applied to make a global search over the entire sentence, which permits linear complexity for the entire process.
Finding simple and non-recursive base noun phrases (baseNPs) is an important subtask for many natural language processing applications, such as partial parsing, information retrieval and machine translation. A baseNP is a simple noun phrase that does not recursively contain other noun phrases. For example, the elements within [ . . . ] in the following example are baseNPs, where NNS, IN, VBG, etc. are part-of-speech (POS) tags. POS tags are known and are described in Marcus et al., Building a Large Annotated Corpus of English: the Penn Treebank, Computational Linguistics, 19(2): 313-330, 1993.
[Measures/NNS] of/IN [manufacturing/VBG activity/NN] fell/VBD more/RBR than/IN [the/DT overall/JJ measures/NNS] ./.
The Statistical Approach
In this section, the two-pass statistical model, parameter training, and the Viterbi algorithm for the search of the best sequences of POS tagging and baseNP identification are described. Before describing the algorithm, some notations that are used throughout are introduced.
Let us express an input sentence E as a word sequence and a sequence of POS respectively as follows:
$$E = w_1 w_2 \ldots w_{n-1} w_n$$
$$T = t_1 t_2 \ldots t_{n-1} t_n$$
where $n$ is the number of words in the sentence, and $t_i$ is the POS tag of the word $w_i$.
Given E, the result of the baseNP identification is assumed to be a sequence, in which some words are grouped into baseNP as follows
$$\ldots w_{i-1} [w_i w_{i+1} \ldots w_j] w_{j+1} \ldots$$
The corresponding tag sequence is as follows:
$$B = \ldots t_{i-1} [t_i t_{i+1} \ldots t_j] t_{j+1} \ldots = \ldots t_{i-1}\, b_{i,j}\, t_{j+1} \ldots = n_1 n_2 \ldots n_m \qquad \text{(a)}$$
in which $b_{i,j}$ corresponds to the tag sequence of a baseNP: $[t_i\, t_{i+1} \ldots t_j]$. $b_{i,j}$ may also be thought of as a baseNP rule. Therefore $B$ is a sequence of both POS tags and baseNP rules. Thus $1 \le m \le n$ and $n_i \in$ (POS tag set $\cup$ baseNP rules set). This is the first expression of a sentence with baseNP annotated. Sometimes, we also use the following equivalent form:
$$Q = \ldots (t_{i-1}, bm_{i-1}) (t_i, bm_i) (t_{i+1}, bm_{i+1}) \ldots (t_j, bm_j) (t_{j+1}, bm_{j+1}) \ldots = q_1 q_2 \ldots q_n \qquad \text{(b)}$$
where each POS tag $t_i$ is associated with its positional information $bm_i$ with respect to baseNPs. The positional information is one of {F,I,E,O,S}. F, E and I mean respectively that the word is the left boundary, right boundary of a baseNP, or at another position inside a baseNP. O means that the word is outside a baseNP. S marks a single word baseNP.
For example, the two expressions of the example given above are as follows:
B=[NNS]IN[VBG NN]VBD RBR IN[DT JJ NNS] (a)
Q=(NNS S)(IN O)(VBG F)(NN E)(VBD O)(RBR O)(IN O)(DT F)(JJ I)(NNS E)(. O) (b)
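The relationship between the bracketed annotation and the two expressions (a) and (b) can be sketched as follows. The function names and the `word/TAG` input format with square brackets around each baseNP are assumptions made for illustration, mirroring the example sentence above.

```python
import re

# Example sentence from the text, in "word/TAG" form with [ ] around each baseNP.
ANNOTATED = ("[Measures/NNS] of/IN [manufacturing/VBG activity/NN] fell/VBD "
             "more/RBR than/IN [the/DT overall/JJ measures/NNS] ./.")


def parse_annotated(sentence):
    """Split the string into (is_baseNP, [(word, tag), ...]) groups."""
    groups = []
    # Each match is either a bracketed baseNP or a single word/TAG token.
    for match in re.finditer(r"\[([^\]]+)\]|(\S+)", sentence):
        inside, single = match.group(1), match.group(2)
        text = inside if inside is not None else single
        pairs = [tuple(token.rsplit("/", 1)) for token in text.split()]
        groups.append((inside is not None, pairs))
    return groups


def to_b(groups):
    """Expression (a): POS tags, with each baseNP collapsed into one bracketed rule."""
    return ["[" + " ".join(tag for _, tag in pairs) + "]" if is_np else pairs[0][1]
            for is_np, pairs in groups]


def to_q(groups):
    """Expression (b): one (tag, position) pair per word, position in {F, I, E, O, S}."""
    result = []
    for is_np, pairs in groups:
        tags = [tag for _, tag in pairs]
        if not is_np:
            result.append((tags[0], "O"))      # outside any baseNP
        elif len(tags) == 1:
            result.append((tags[0], "S"))      # single-word baseNP
        else:
            result.append((tags[0], "F"))      # left boundary
            result.extend((tag, "I") for tag in tags[1:-1])
            result.append((tags[-1], "E"))     # right boundary
    return result
```

Applied to `ANNOTATED`, `to_q` yields one pair per word (n entries), while `to_b` yields the shorter sequence of m entries in which each baseNP counts once.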
An ‘Integrated’ Two-Pass Procedure
The principle of the described approach is as follows. The most probable baseNP sequence B* may be expressed generally as follows:
We separate the whole procedure into two passes, i.e.:
In order to reduce the search space and computational complexity, we only consider the N best POS tagging of E, i.e.
Therefore, we have:
Correspondingly, the algorithm is composed of two steps: determining the N-best POS tagging using Equation (2), and then determining the best baseNP sequence from those POS sequences using Equation (3). The two steps are integrated together, rather than separated as in other approaches. Let us now examine the two steps more closely.
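The displayed equations for the two-pass formulation do not appear in the text above. One reconstruction that is consistent with the surrounding definitions is given here as an assumption rather than as the original wording; only the tags (2) and (3) are taken from the references in the text, and the other lines are left unnumbered.

```latex
\begin{align*}
B^{*} &= \arg\max_{B} P(B \mid E) \\
B^{*} &\approx \arg\max_{B} \bigl( P(T \mid E) \times P(B \mid T, E) \bigr) \\
T(N\text{-best}) &= \arg\max_{T = T_1, \ldots, T_N} P(T \mid E) \tag{2} \\
B^{*} &\approx \arg\max_{B,\; T = T_1, \ldots, T_N} \bigl( P(T \mid E) \times P(B \mid T, E) \bigr) \tag{3}
\end{align*}
```

Here $T_1, \ldots, T_N$ denote the N most probable POS sequences for the sentence $E$.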
Determining the N Best POS Sequences
The goal of the algorithm in the first pass is to search for the N-best POS-sequences within the search space (POS lattice). According to Bayes' Rule, we have
Since P(E) does not affect the maximizing procedure of P(T|E), equation (2) becomes
We now assume that the words in E are independent. Thus
We then use a trigram model as an approximation of P(T), i.e.:
Finally we have
In the Viterbi algorithm of the N best search, $P(w_i \mid t_i)$ is called the lexical generation (or output) probability, and $P(t_i \mid t_{i-2}, t_{i-1})$ is called the transition probability in the Hidden Markov Model. The Viterbi algorithm is described in Viterbi, Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm, IEEE Transactions on Information Theory IT-13(2): pp. 260-269, April, 1967.
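The first-pass score, the product over all words of the lexical generation probability and the trigram transition probability, can be sketched as below. For brevity the sketch enumerates every path through the POS lattice instead of running the Viterbi N-best search that the text describes, so it is exponential in sentence length and suitable only for toy inputs. The function names, the `<s>` padding symbol, the probability floor for unseen events, and all probability values are assumptions.

```python
import heapq
import itertools
import math


def sequence_log_prob(words, tags, lexical, transition, floor=1e-12):
    """Log of prod_i P(w_i | t_i) * P(t_i | t_{i-2}, t_{i-1})."""
    padded = ["<s>", "<s>"] + list(tags)  # two start symbols for the trigram context
    total = 0.0
    for i, (word, tag) in enumerate(zip(words, tags)):
        total += math.log(lexical.get((word, tag), floor))
        total += math.log(transition.get((padded[i], padded[i + 1], tag), floor))
    return total


def n_best_pos(words, candidates, lexical, transition, n):
    """Score every path through the POS lattice exhaustively and keep the n best."""
    lattice = [candidates[word] for word in words]
    scored = ((sequence_log_prob(words, tags, lexical, transition), tags)
              for tags in itertools.product(*lattice))
    return heapq.nlargest(n, scored)


# Toy model; every number here is made up for illustration.
WORDS = ["stock", "fell"]
CANDIDATES = {"stock": ["NN", "VB"], "fell": ["VBD", "NN"]}
LEXICAL = {("stock", "NN"): 0.6, ("stock", "VB"): 0.1,
           ("fell", "VBD"): 0.5, ("fell", "NN"): 0.05}
TRANSITION = {("<s>", "<s>", "NN"): 0.4, ("<s>", "<s>", "VB"): 0.1,
              ("<s>", "NN", "VBD"): 0.3, ("<s>", "NN", "NN"): 0.2,
              ("<s>", "VB", "VBD"): 0.05, ("<s>", "VB", "NN"): 0.3}

TOP_TWO = n_best_pos(WORDS, CANDIDATES, LEXICAL, TRANSITION, n=2)
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.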
Determining the BaseNPs
As mentioned before, the goal of the second pass is to search for the best baseNP-sequence given the N-best POS-sequences.
Considering E,T and B as random variables, according to Bayes' Rule, we have
Since
we have,
Because we search for the best baseNP sequence for each possible POS-sequence of the given sentence E, P(E|T)×P(T)=P(E∩T)=const. Furthermore, from the definition of B, during each search procedure, we have
Therefore, equation (3) becomes
using the independence assumption, we have
With trigram approximation of P(B), we have:
Finally, we obtain
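The displayed equations for this second-pass derivation do not appear in the text above. A reconstruction that follows the stated steps (Bayes' Rule, the constant $P(E \mid T) \times P(T)$, the independence assumption, and the trigram approximation) is sketched below. It is an assumption consistent with the parameters $P(n_i \mid n_{i-2}, n_{i-1})$ and $P(w_i \mid t_i, bm_i)$ named in this description; in particular, $P(T \mid B) = 1$ is inferred from the fact that $B$, as defined, fully determines $T$. Equation numbers are omitted because they cannot be recovered with certainty.

```latex
\begin{align*}
P(B \mid T, E) &= \frac{P(E \mid B, T)\, P(B \mid T)}{P(E \mid T)} \\
P(B \mid T) &= \frac{P(T \mid B)\, P(B)}{P(T)} \\
P(B \mid T, E) &= \frac{P(E \mid B, T)\, P(T \mid B)\, P(B)}{P(E \mid T)\, P(T)} \\
P(T \mid B) &= 1 \\
B^{*} &= \arg\max_{B,\; T = T_1, \ldots, T_N}
         \bigl( P(T \mid E) \times P(E \mid B, T) \times P(B) \bigr) \\
P(E \mid B, T) &\approx \prod_{i=1}^{n} P(w_i \mid t_i, bm_i) \\
P(B) &\approx \prod_{i=1}^{m} P(n_i \mid n_{i-2}, n_{i-1}) \\
B^{*} &= \arg\max_{B,\; T = T_1, \ldots, T_N}
         \Bigl( P(T \mid E) \times \prod_{i=1}^{n} P(w_i \mid t_i, bm_i)
         \times \prod_{i=1}^{m} P(n_i \mid n_{i-2}, n_{i-1}) \Bigr)
\end{align*}
```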
To summarize, in the first step, the Viterbi N-best searching algorithm is applied in the POS tagging procedure and determines a path probability $f_t$ for each POS sequence calculated as follows:
In the second step, for each possible POS tagging result, the Viterbi algorithm is applied again to search for the best baseNP sequence. Every baseNP sequence found in this pass is also associated with a path probability
The integrated probability of a baseNP sequence is determined by $f_t^{\alpha} \times f_b$, where $\alpha$ is a normalization coefficient ($\alpha = 2.4$ in our experiments). When we determine the best baseNP sequence for the given sentence E, we also determine the best POS sequence of E, which corresponds to the best baseNP of E.
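The way the two path probabilities are combined can be sketched as follows, assuming each hypothesis carries the log of its first-pass path probability and the log of its second-pass path probability. The function names and the tuple layout are hypothetical; only the exponent value 2.4 comes from the text.

```python
import math

ALPHA = 2.4  # normalization coefficient reported in the text


def integrated_log_score(log_ft, log_fb, alpha=ALPHA):
    """Log of f_t**alpha * f_b, computed in log space for numerical stability."""
    return alpha * log_ft + log_fb


def pick_best(hypotheses, alpha=ALPHA):
    """hypotheses: iterable of (pos_sequence, log_ft, basenp_sequence, log_fb).

    Returns the hypothesis with the highest integrated score, which fixes the
    best POS sequence and the best baseNP sequence at the same time.
    """
    return max(hypotheses, key=lambda h: integrated_log_score(h[1], h[3], alpha))


# Two made-up hypotheses: the first has the better tagging score, the second
# the better baseNP score.
HYPOTHESES = [
    (("NN", "VBD"), math.log(0.04), "B1", math.log(0.001)),
    (("NN", "NN"), math.log(0.01), "B2", math.log(0.05)),
]
```

A larger exponent gives the POS tagging pass more weight in the final decision, so the selected hypothesis can change as the coefficient changes.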
As an example of how this can work, consider the following text: “stock was down 9.1 points yesterday morning.” In the first pass, one of the N-best POS tagging results of the sentence is: T=NN VBD RB CD NNS NN NN.
For this POS sequence, the second pass will try to determine the baseNPs as shown in
The Statistical Parameter Training
In this work, the training and testing data were derived from the 25 sections of the Penn Treebank. We divided the whole Penn Treebank data into two sections, one for training and the other for testing.
In our statistical model, we calculate the following four probabilities: (1) $P(t_i \mid t_{i-2}, t_{i-1})$, (2) $P(w_i \mid t_i)$, (3) $P(n_i \mid n_{i-2}, n_{i-1})$ and (4) $P(w_i \mid t_i, bm_i)$. The first and the third parameters are trigrams of T and B respectively. The second and the fourth are lexical generation probabilities. Probabilities (1) and (2) can be calculated from POS tagged data with the following formulae:
As each sentence in the training set has both POS tags and baseNP boundary tags, it can be converted to the two sequences as B (a) and Q (b) described in the last section. Using these sequences, parameters (3) and (4) can be calculated with calculation formulas that are similar to equations (13) and (14) respectively.
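As an illustration of how probabilities (1) and (2) might be estimated from POS-tagged data, the sketch below uses plain relative-frequency counts. This is an assumption: the training formulae themselves do not appear in the text above, and no smoothing is applied here. The function name and the `<s>` padding symbol are hypothetical.

```python
from collections import Counter


def train_pos_model(tagged_sentences):
    """Estimate P(t_i | t_{i-2}, t_{i-1}) and P(w_i | t_i) by relative frequency.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Returns (transition, lexical) dictionaries keyed by
    (t_{i-2}, t_{i-1}, t_i) and (word, tag) respectively.
    """
    trigram, context = Counter(), Counter()
    word_tag, tag_count = Counter(), Counter()
    for sentence in tagged_sentences:
        tags = ["<s>", "<s>"] + [tag for _, tag in sentence]
        for i in range(2, len(tags)):
            trigram[(tags[i - 2], tags[i - 1], tags[i])] += 1
            context[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sentence:
            word_tag[(word, tag)] += 1
            tag_count[tag] += 1
    transition = {key: count / context[key[:2]] for key, count in trigram.items()}
    lexical = {key: count / tag_count[key[1]] for key, count in word_tag.items()}
    return transition, lexical


# Tiny made-up corpus for illustration.
CORPUS = [
    [("stock", "NN"), ("fell", "VBD")],
    [("stock", "NN"), ("prices", "NNS")],
]
TRANSITION, LEXICAL = train_pos_model(CORPUS)
```

The same counting scheme could be applied to the B and Q sequences to estimate parameters (3) and (4), replacing tags with the symbols of B and conditioning each word on its tag and positional label.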
Before training trigram model (3), all possible baseNP rules should be extracted from the training corpus. For instance, the following three sequences are among the baseNP rules extracted.
DT CD CD NNPS (1)
RB JJ NNS NNS (2)
NN NN POS NN (3)
There are more than 6,000 baseNP rules in the Penn Treebank. When training trigram model (3), we treat those baseNP rules in two ways. First, each baseNP rule is assigned a unique identifier (UID). This means that the algorithm considers the corresponding structure of each baseNP rule. Second, all of those rules are assigned to the same identifier (SID). In this case, those rules are grouped into the same class. Nevertheless, the identifiers of baseNP rules are still different from the identifiers assigned to POS tags.
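The two treatments of baseNP rules can be sketched as a symbol-encoding step applied to a B sequence before trigram counting. The function name, the bracket convention for rules, and the `RULE`/`POS` prefixes are hypothetical; the prefixes merely keep rule identifiers distinct from POS tag identifiers, as the text requires.

```python
def encode_rules(b_sequence, mode="UID"):
    """Map a B sequence to trigram-model symbols.

    Items written in brackets, e.g. "[DT JJ NNS]", are baseNP rules.
    mode="UID": each distinct rule keeps its own identifier.
    mode="SID": all rules share one identifier (a single class).
    """
    symbols = []
    for item in b_sequence:
        if item.startswith("["):
            symbols.append("RULE:" + item[1:-1] if mode == "UID" else "RULE")
        else:
            symbols.append("POS:" + item)
    return symbols


B_SEQUENCE = ["[NNS]", "IN", "[VBG NN]", "VBD", "RBR", "IN", "[DT JJ NNS]"]
```

Under UID the model sees the internal structure of each rule, at the cost of many more distinct symbols; under SID the symbol inventory stays small but all baseNPs look alike to the trigram model.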
For parameter smoothing, an approach was used as described in Katz, Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer, IEEE Transactions on Acoustics, Speech, and Signal Processing, Volume ASSP-35, pp. 400-401, March 1987. A trigram model was built to predict the probabilities of parameters (1) and (3). In the case that unknown words are encountered during baseNP identification, parameters (2) and (4) are calculated in the following way:
Here, $bm_j$ indicates all possible baseNP labels attached to $t_i$, and $t_j$ is a POS tag guessed for the unknown word $w_i$.
Step 500 receives selected text. This step is implemented in connection with a user selecting a portion of text that is to be translated. Typically, a user selects text by using an input device such as a mouse and the like. Step 502 segments words in the selected text. Any suitable segmentation processing can be performed as will be appreciated by those of skill in the art. Step 504 obtains the morphological root of each word. In the illustrated and described embodiment, this step is implemented by a morphological analyzer such as the one shown in
Step 506 characterizes the words using part-of-speech (POS) tagging and base noun phrase identification. Any suitable techniques can be utilized. One exemplary technique is described in detail in the “POS Tagging and BaseNP Identification” section above. Step 508 applies rules-based phrase extension and pattern matching to the characterized words to generate a tree list. In the above example, this step was implemented using a phrase extension module 206 and a pattern or template matching module 208. Step 510 outputs the tree list for further processing.
As an example of a tree list, consider
Exemplary Word Translation Selector
The word translation selector 142 receives the tree lists and generates all possible translation patterns. The selector 142 translates the parsed translation units using statistical translation and language models to derive top candidate word translations in the native text. The top candidate translations are output.
Step 700 receives a tree list that has been produced according to the processing described above. Step 702 generates translation patterns from the tree list. In one embodiment, all possible translation patterns are generated. For example, for English to Chinese translation, the English noun phrase “NP1 of NP2” may have two kinds of possible translations: (1) T(NP1)+T(NP2), and (2) T(NP2)+T(NP1). In the phrase translation, the translated phrase is a syntax tree and, in one embodiment, all possible translation orders are considered. Step 704 translates parsed translation units using a translation model and language model. The translation units can comprise words and phrases. Step 704 then outputs the top N candidate word translations. The top N candidate word translations can be selected using statistical models.
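The generation of translation orders in step 702 can be illustrated with a minimal sketch that enumerates every ordering of the translation units under one node. The function name and the string form `T(...)` are hypothetical, and a real syntax tree would apply this at each node rather than to a flat list.

```python
from itertools import permutations


def translation_patterns(units):
    """Return every ordering of the translated units, e.g. T(NP1) + T(NP2)."""
    return [" + ".join(f"T({unit})" for unit in order)
            for order in permutations(units)]


# The two noun phrases of the English pattern "NP1 of NP2".
PATTERNS = translation_patterns(["NP1", "NP2"])
```

For “NP1 of NP2” this yields the two orders named above. Because the number of orderings grows factorially with the number of units, a statistical model is then needed to rank the candidates.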
Exemplary Translation Generator
The translation generator 144 translates the top N candidate word translations to corresponding phrases in the native language. The native words and phrases are then presented via the UI in proximity to the selected text.
Translation generator 144 can include a template module 802 that contains multiple templates that are used in the translation processing. Any suitable templates can be utilized. For example, so-called large phrase templates can be utilized to assist in the translation process. The operation of templates for use in natural language translation is known and is not described here in additional detail.
The translation generator 144 can include a rules module 804 that contains multiple rules that are used to facilitate the translation process. Rules can be hand-drafted rules that are drafted by individuals who are skilled in the specific languages that are the subject of the translation. Rules can be drafted to address issues pertaining to statistical errors in translation, parsing, and translation patterns. The principles of rules-based translations will be understood by those of skill in the art.
Translation generator 144 can include one or more statistical models 806 that are used in the translation process. The statistical models that can be used can vary widely, especially given the number of possible non-native and native languages relative to which translation is desired. The statistical models can be based on the above-described POS and baseNP statistical parameters. In a specific implementation where it is desired to translate from English to Chinese, the following models can be used: Chinese Trigram Language Model and the Chinese Mutual Information Model. Other models can, of course, be used.
The above-described modules and models can be used separately or in various combinations with one another.
At this point in the processing, a user has selected a portion of non-native language text that is to be translated into a native language. The selected text has been processed as described above. In the discussion that is provided just below, methods and systems are described that present the translated text to the user in a manner that is convenient and efficient for the user.
Reading Wizard User Interface
The remaining discussion is directed to features of the user interface 134 when presenting the reading wizard. In particular, the reading wizard user interface 134 permits the user to select text written in a non-native language that the user is unsure how to read and interpret. The selection may be an individual word, phrase, or sentence.
Notice that the translation window 902 is located adjacent at least a portion of the highlighted text. By locating the translation window in this manner, the user is not required to divert their attention very far from the highlighted text in order to see the translated text. This is advantageous because it does not slow the user's reading process down an undesirable amount. Notice also that the translation window contains a drop down arrow 904 that can be used to expose other translated versions of the selected text. As an example, consider
The embodiments described above help a user read a non-native language and can permit users who are more comfortable communicating in a native language to extensively read non-native language electronic documents quickly, conveniently, and in a manner that promotes focus and rapid assimilation of the subject matter. User convenience can be enhanced by providing a user interface with a translation window (containing the translated text) closely adjacent to the text being translated. By positioning the translation window closely adjacent to the selected text, the user's eyes are not required to move very far to ascertain the translated text. This, in turn, reduces user-perceptible distraction that might otherwise persist if, for example, the user were required to glance a distance away in order to view the translated text. User interaction is further enhanced, in some embodiments, by virtue of a mouse point translation process. A user is able, by positioning a mouse to select a portion of text, to quickly make their selection, whereupon the system automatically performs a translation and presents translated text to the user.
Although the invention has been described in language specific to structural features and/or methodological steps, it is to be understood that the invention defined in the appended claims is not necessarily limited to the specific features or steps described. Rather, the specific features and steps are disclosed as preferred forms of implementing the claimed invention.
This application stems from and claims priority to U.S. patent application Ser. No. 09/840,772, filed on Apr. 23, 2001, which, in turn, claimed priority to U.S. Provisional Application Ser. No. 60/199,288, filed on Apr. 24, 2000, the disclosures of which are expressly incorporated herein by reference. This application is also related to U.S. patent application Ser. No. 09/556,229, filed on Apr. 24, 2000, the disclosure of which is incorporated by reference herein.
Number | Name | Date | Kind |
---|---|---|---|
4814988 | Shiotani et al. | Mar 1989 | A |
4866670 | Adachi et al. | Sep 1989 | A |
5373441 | Hirai et al. | Dec 1994 | A |
5477451 | Brown et al. | Dec 1995 | A |
5822720 | Bookman et al. | Oct 1998 | A |
5854997 | Sukeda et al. | Dec 1998 | A |
5882202 | Sameth et al. | Mar 1999 | A |
6092034 | McCarley et al. | Jul 2000 | A |
6092036 | Hamann | Jul 2000 | A |
6138087 | Budzinski | Oct 2000 | A |
6139201 | Carbonell et al. | Oct 2000 | A |
6163785 | Carbonell et al. | Dec 2000 | A |
6327566 | Vanbuskirk et al. | Dec 2001 | B1 |
6434518 | Glenn | Aug 2002 | B1 |
6516296 | Fuji | Feb 2003 | B1 |
6778949 | Duan et al. | Aug 2004 | B2 |
6876963 | Miyahira et al. | Apr 2005 | B1 |
20020007265 | Yamada | Jan 2002 | A1 |
Number | Date | Country |
---|---|---|
1250915 | Apr 2000 | CN |

Number | Date | Country |
---|---|---|
20050071173 A1 | Mar 2005 | US |

Number | Date | Country |
---|---|---|
60199288 | Apr 2000 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 09840772 | Apr 2001 | US |
Child | 10966809 | | US |