1. Technical Field
The present disclosure relates to language disambiguating systems and a method relating thereto.
2. Description of Related Art
When words or phrases are ambiguous, there is more than one interpretation. When translating from one language into another, there is a need to resolve any ambiguities to ensure full and correct understanding of sentences.
Therefore, it is desirable to provide a disambiguating system and method that can overcome the above-mentioned problem.
Embodiments of the disclosure will be described with reference to the accompanying drawings.
The disambiguating system 10 includes an interface 100, a storage unit 200, and a processor 300. The disambiguating system 10 exchanges information with the application system 20 via the interface 100. For example, the disambiguating system 10 receives sentences for disambiguation from the application system 20 via the interface 100, and the application system 20 receives outputs from the disambiguating system 10 via the interface 100.
The storage unit 200 stores a first database 2100 and a second database 2200. The first database 2100 includes a dictionary of ambiguous language data, such as ambiguous words and/or phrases. The second database 2200 includes a collection of disambiguating algorithms, such as disambiguating algorithms based on professional semantics, colloquial semantics, and context. Each piece of ambiguous language data in the dictionary is associated with at least one disambiguating algorithm.
The processor 300 includes a recognition module 3100, a disambiguating module 3200, a selection module 3300, and an output module 3400.
The recognition module 3100 receives a sentence or other input from the application system 20 via the interface 100 and determines whether the sentence includes a piece of ambiguous language data that is defined in the dictionary. In detail, the recognition module 3100 checks each word and phrase of the sentence against the dictionary and determines whether the words and/or phrases are ambiguous. For example, in the sentence “[T]his is an underground factory and should be banned”, the recognition module 3100 finds the phrase “underground factory”, which is ambiguous because the dictionary defines it as having a special meaning. Likewise, the word “bass” is ambiguous in the sentence “I went fishing for some sea bass”, and the word “mouse”, which has more than one meaning, is ambiguous in the sentence “I killed a mouse this morning”.
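The dictionary lookup performed by the recognition module 3100 could be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the function `recognize`, the set `AMBIGUOUS_TERMS`, and the longest-match-first scan are all assumptions made for the example.

```python
import re

# Hypothetical stand-in for the dictionary in the first database 2100.
AMBIGUOUS_TERMS = {"underground factory", "bass", "mouse"}


def recognize(sentence, terms=AMBIGUOUS_TERMS):
    """Return the dictionary terms found in the sentence, in order of appearance."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    # Longest phrase length (in words) present in the dictionary.
    max_len = max((len(term.split()) for term in terms), default=0)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in terms:
                found.append(candidate)
                i += n - 1  # skip the words consumed by the matched phrase
                break
        i += 1
    return found
```

Trying longer phrases first keeps a multi-word entry such as “underground factory” from being missed or split when one of its words is also listed on its own.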
The first database 2100 also includes distinct definitions of the phrase “underground factory” and the words “bass” and “mouse” in the dictionary. For example, the phrase “underground factory” has two distinct definitions: (1) an illegal factory (colloquial semantics), and (2) a factory operating below the surface of the earth. The word “bass” also has two distinct definitions: (1) a type of fish, and (2) audible tones of low frequency. The word “mouse” also has two distinct definitions: (1) a small rodent, and (2) a computer input device.
The first database 2100 also associates the phrase “underground factory” with the disambiguating algorithms based on colloquial semantics and context, and the words “bass” and “mouse” with the disambiguating algorithms based on professional semantics and context.
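One possible in-memory layout for the two databases is sketched below. The names `Entry`, `DICTIONARY`, `ALGORITHMS`, and `check_associations` are hypothetical, and the disclosure does not prescribe any particular storage format; the sketch only mirrors the definitions and associations listed above and checks that each entry is associated with at least one known algorithm.

```python
from dataclasses import dataclass

# Hypothetical names for the algorithms held in the second database 2200.
ALGORITHMS = {"professional", "colloquial", "context"}


@dataclass(frozen=True)
class Entry:
    definitions: tuple  # distinct definitions of the word or phrase
    algorithms: tuple   # names of the associated disambiguating algorithms


# Hypothetical layout of the dictionary in the first database 2100.
DICTIONARY = {
    "underground factory": Entry(
        ("an illegal factory", "a factory operating below the surface of the earth"),
        ("colloquial", "context"),
    ),
    "bass": Entry(
        ("a type of fish", "audible tones of low frequency"),
        ("professional", "context"),
    ),
    "mouse": Entry(
        ("a small rodent", "a computer input device"),
        ("professional", "context"),
    ),
}


def check_associations(dictionary, algorithms):
    """Raise ValueError unless every entry names at least one known algorithm."""
    for term, entry in dictionary.items():
        if not entry.algorithms:
            raise ValueError(f"{term!r} has no associated algorithm")
        unknown = set(entry.algorithms) - set(algorithms)
        if unknown:
            raise ValueError(f"{term!r} refers to unknown algorithms: {sorted(unknown)}")


check_associations(DICTIONARY, ALGORITHMS)
```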
The disambiguating module 3200 disambiguates each piece of ambiguous language data recognized by the recognition module 3100, using the associated disambiguating algorithm(s), to generate disambiguation results. For example, the disambiguating module 3200 interprets the phrase “underground factory” as “an illegal factory” using the disambiguating algorithms based on colloquial semantics and context (the word “banned” in the context provides enough evidence to prompt disambiguation of the phrase “underground factory”). The disambiguating module 3200 interprets the word “bass” as a type of fish using the disambiguating algorithms based on professional semantics and context (the words “fishing” and “sea” in the context provide enough evidence to prompt disambiguation of the word “bass”). The disambiguating module 3200 interprets the word “mouse” as a computer input device using the disambiguating algorithm based on professional semantics, and as a small rodent using the disambiguating algorithm based on context (the word “killed” in the context provides evidence to prompt disambiguation of the word “mouse”).
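The way the disambiguating module 3200 runs each associated algorithm and collects one result per algorithm might be sketched as below. The disclosure does not say how the individual algorithms work internally, so the lookup tables `SEMANTIC_SENSES` and `CONTEXT_CUES`, the cue-word matching, and every function name here are assumptions chosen only to reproduce the three examples above.

```python
# Hypothetical rule tables; the disclosure does not specify the algorithms' internals.
SEMANTIC_SENSES = {
    ("underground factory", "colloquial"): "an illegal factory",
    ("bass", "professional"): "a type of fish",
    ("mouse", "professional"): "a computer input device",
}
CONTEXT_CUES = {
    "underground factory": {"banned": "an illegal factory"},
    "bass": {"fishing": "a type of fish", "sea": "a type of fish"},
    "mouse": {"killed": "a small rodent"},
}
# Term-to-algorithm associations, as held in the first database 2100.
ASSOCIATED = {
    "underground factory": ("colloquial", "context"),
    "bass": ("professional", "context"),
    "mouse": ("professional", "context"),
}


def by_semantics(kind):
    """Build an algorithm that returns the sense registered for `kind`, if any."""
    def algorithm(term, sentence):
        return SEMANTIC_SENSES.get((term, kind))
    return algorithm


def by_context(term, sentence):
    """Return the sense suggested by the first cue word found in the sentence."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    cues = CONTEXT_CUES.get(term, {})
    for word in words:
        if word in cues:
            return cues[word]
    return None


ALGORITHMS = {
    "professional": by_semantics("professional"),
    "colloquial": by_semantics("colloquial"),
    "context": by_context,
}


def disambiguate(term, sentence):
    """Run every algorithm associated with the term; map algorithm name to result."""
    return {name: ALGORITHMS[name](term, sentence) for name in ASSOCIATED.get(term, ())}
```

For “mouse” the two algorithms return different senses, which is the case the selection module 3300 has to resolve.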
The selection module 3300 selects an interpretation from the results, using various methods such as a decision tree. For example, the selection module 3300 selects “illegal factory” as the definition of the phrase “underground factory” because the disambiguating algorithms based on colloquial semantics and on context both yield the same result of “illegal factory”. The selection module 3300 selects “a type of fish” as the appropriate definition of the word “bass” because the disambiguating algorithms based on professional semantics and on context both result in the interpretation “a type of fish”. For the word “mouse”, where the two algorithms disagree, the selection module 3300 selects “a small rodent” instead of “a computer input device” as the interpretation, using the decision tree method.
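A hand-written decision procedure in the spirit of the selection described above is sketched here. The disclosure names a decision tree as one possible method but does not give its structure, so the two decision nodes below (agreement first, then a preference for the context-based result) are an assumption that merely reproduces the three examples; a trained decision tree could look quite different.

```python
def select(results):
    """Pick one interpretation from {algorithm name: interpretation or None}."""
    candidates = {name: sense for name, sense in results.items() if sense is not None}
    if not candidates:
        return None
    # Node 1: do all algorithms that produced a result agree?
    if len(set(candidates.values())) == 1:
        return next(iter(candidates.values()))
    # Node 2 (assumed priority): on disagreement, prefer the context-based result.
    if "context" in candidates:
        return candidates["context"]
    # Leaf: otherwise fall back to the first available result.
    return next(iter(candidates.values()))
```

With the results for “mouse” (`"professional"` giving a computer input device and `"context"` giving a small rodent), the second node is reached and the context-based sense is returned.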
The output module 3400 outputs the interpretations.
In step S21, the recognition module 3100 receives a sentence from the application system 20 via the interface 100.
In step S22, the recognition module 3100 determines whether a piece of ambiguous language data that is defined in the dictionary exists in the sentence.
In step S23, the disambiguating module 3200 disambiguates the recognized piece of ambiguous language data, utilizing the at least one associated disambiguating algorithm, to generate one or more disambiguation results.
In step S24, the selection module 3300 selects an interpretation from the results.
In step S25, the output module 3400 outputs the interpretation to the application system 20 via the interface 100.
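Steps S21 to S25 can be read as a simple pipeline. The sketch below wires the steps together with the three modules passed in as callables; `run_pipeline` and the `toy_*` stand-ins are hypothetical and deliberately trivial (the toy selector is a majority vote, not the decision tree mentioned earlier), so that only the control flow is illustrated.

```python
from collections import Counter


def run_pipeline(sentence, recognize, disambiguate, select):
    """Wire steps S21 to S25 together; returns {term: interpretation}."""
    # S21: the sentence arrives from the application system (here, as an argument).
    # S22: recognize ambiguous language data defined in the dictionary.
    terms = recognize(sentence)
    interpretations = {}
    for term in terms:
        # S23: apply the associated algorithm(s) to obtain one or more results.
        results = disambiguate(term, sentence)
        # S24: select a single interpretation from the results.
        interpretations[term] = select(results)
    # S25: output the interpretations (here, as the return value).
    return interpretations


# Toy stand-ins for the three modules, for demonstration only.
def toy_recognize(sentence):
    return [word for word in sentence.lower().split() if word == "bass"]


def toy_disambiguate(term, sentence):
    # Pretend two algorithms both produced the same sense.
    return ["a type of fish", "a type of fish"]


def toy_select(results):
    # Majority vote over the results.
    return Counter(results).most_common(1)[0][0]


print(run_pipeline("I went fishing for some sea bass",
                   toy_recognize, toy_disambiguate, toy_select))
```

If step S22 finds nothing ambiguous, the loop body never runs and an empty mapping is returned.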
In another embodiment, the first database 2100 and the second database 2200 can be updated by a user to edit (e.g., add, change, or delete) the language data and disambiguating algorithms.
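A user-editable store for the two databases might look like the sketch below. The class `DisambiguationStore` and its methods are hypothetical. The checks preserve the requirement that each piece of ambiguous language data is associated with at least one disambiguating algorithm; refusing to delete an algorithm that is still in use is a design choice of this sketch, not something the disclosure states.

```python
class DisambiguationStore:
    """Editable stand-in for the first database 2100 and second database 2200."""

    def __init__(self, algorithms):
        self.algorithms = set(algorithms)  # second database: algorithm names
        self.entries = {}                  # first database: term -> (definitions, names)

    def put_entry(self, term, definitions, algorithm_names):
        """Add a new entry or change an existing one."""
        names = list(algorithm_names)
        if not names:
            raise ValueError("an entry needs at least one associated algorithm")
        missing = [name for name in names if name not in self.algorithms]
        if missing:
            raise ValueError(f"unknown algorithms: {missing}")
        self.entries[term] = (list(definitions), names)

    def delete_entry(self, term):
        """Delete an entry; deleting a missing entry is a no-op."""
        self.entries.pop(term, None)

    def add_algorithm(self, name):
        self.algorithms.add(name)

    def delete_algorithm(self, name):
        """Delete an algorithm unless some entry still depends on it."""
        users = [term for term, (_, names) in self.entries.items() if name in names]
        if users:
            raise ValueError(f"{name!r} is still used by: {users}")
        self.algorithms.discard(name)
```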
Particular embodiments are shown here and described by way of illustration only. The principles and the features of the present disclosure may be employed in various and numerous embodiments thereof without departing from the scope of the disclosure as claimed. The above-described embodiments illustrate the scope of the disclosure but do not restrict it.
| Number | Date | Country | Kind |
|---|---|---|---|
| 201210051144.0 | Mar 2012 | CN | national |