This application also claims the benefit of priority under 35 USC 119 to Russian Patent Application No. 2013156494, filed Dec. 19, 2013; the disclosure of the priority application is incorporated herein by reference.
Many languages contain a large number of ambiguous words, i.e., words that have several meanings. When a human encounters such a word in a text, he or she can readily select the proper meaning from context and intuition. The situation is different when a text is analyzed by a computer system. Existing systems for text disambiguation are mostly based on lexical resources, such as dictionaries. Given a word, such methods extract all possible meanings of the word from the lexical resource. Various methods may then be applied to determine which of these meanings is the correct one. The majority of these methods are statistical, i.e., based on analyzing large text corpora, while some are based on dictionary information (e.g., counting overlaps between a dictionary gloss and the word's local context). Given a word to be disambiguated, such methods usually solve a classification problem (i.e., the possible meanings of the word are treated as categories, and the word has to be classified into one of them).
Existing methods address the disambiguation of polysemous words and homonyms, which they define as words that appear several times in the sense inventory in use. None of these methods deals with words that do not appear in the lexical resource at all. The sense inventories used by existing methods do not allow changes and do not reflect the changes taking place in the language. Only a few methods are based on Wikipedia, but the methods themselves do not make any changes to the sense inventory.
Nowadays the world changes rapidly, many new technologies and products appear, and the language changes accordingly. New words appear to denote new concepts, and some existing words acquire new meanings. Therefore, methods for text disambiguation should be able to deal efficiently with new words that are not covered by the sense inventory in use, and to add these concepts to the sense inventory so that they can be used during further analysis.
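The dictionary-based approach mentioned above (counting overlaps between a gloss and the local context) can be illustrated with a minimal sketch. The glosses, the sense labels, and the function names `gloss_overlap` and `pick_sense` are hypothetical and are not taken from any particular lexical resource; real systems would also normalize word forms and ignore function words.

```python
def gloss_overlap(gloss: str, context: str) -> int:
    """Count the distinct words shared by a dictionary gloss and a context."""
    return len(set(gloss.lower().split()) & set(context.lower().split()))


def pick_sense(glosses: dict[str, str], context: str) -> str:
    """Return the sense whose gloss shares the most words with the context."""
    return max(glosses, key=lambda sense: gloss_overlap(glosses[sense], context))


# Toy glosses for two senses of "bank" (illustrative only).
GLOSSES = {
    "bank/finance": "an institution that accepts deposits of money and makes loans",
    "bank/river": "sloping land beside a river or lake",
}
```

With these toy glosses, a context that mentions a river selects the second sense, and a context that mentions deposits and money selects the first.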
An exemplary embodiment relates to a method. The method includes, but is not limited to, any combination of: receiving text by a computing device, the text including a word; comparing, by a processor of the computing device, the word in the text to inventory words in a sense inventory, wherein the sense inventory comprises at least one inventory word and at least one concept corresponding to the at least one inventory word; responsive to matching the word to an inventory word in the sense inventory, identifying a concept for the word by comparing each concept related to the inventory word to the word; responsive to identifying the concept that is correct for the word, assigning the concept to the word; and responsive to not identifying the concept that is correct for the word, adding a new concept to the sense inventory for the inventory word.
Another exemplary embodiment relates to a system. The system includes one or more data processors. The system further includes one or more storage devices storing instructions that, when executed by the one or more data processors, cause the one or more data processors to perform operations comprising: receiving text by a computing device, the text including a word; comparing, by a processor of the computing device, the word in the text to inventory words in a sense inventory, wherein the sense inventory comprises at least one inventory word and at least one concept corresponding to the at least one inventory word; responsive to matching the word to an inventory word in the sense inventory, identifying a concept for the word by comparing each concept related to the inventory word to the word; responsive to identifying the concept that is correct for the word, assigning the concept to the word; and responsive to not identifying the concept that is correct for the word, adding a new concept to the sense inventory for the inventory word.
Yet another exemplary embodiment relates to a computer readable storage medium having machine instructions stored therein, the instructions being executable by a processor to cause the processor to perform operations comprising: receiving text by a computing device, the text including a word; comparing, by a processor of the computing device, the word in the text to inventory words in a sense inventory, wherein the sense inventory comprises at least one inventory word and at least one concept corresponding to the at least one inventory word; responsive to matching the word to an inventory word in the sense inventory, identifying a concept for the word by comparing each concept related to the inventory word to the word; responsive to identifying the concept that is correct for the word, assigning the concept to the word; and responsive to not identifying the concept that is correct for the word, adding a new concept to the sense inventory for the inventory word.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the disclosure will become apparent from the description, the drawings, and the claims, in which:
Like reference numbers and designations in the various drawings indicate like elements.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the concepts underlying the described embodiments. It will be apparent, however, to one skilled in the art that the described embodiments can be practiced without some or all of these specific details. In other instances, structures and devices are shown only in block diagram form in order to avoid obscuring the described embodiments. Some process steps have not been described in detail in order to avoid unnecessarily obscuring the underlying concept.
According to various embodiments disclosed herein, a method and a system for semantic disambiguation of text based on a sense inventory with a hierarchical structure, or a semantic hierarchy, and a method of adding concepts to the semantic hierarchy are provided. The semantic classes, as part of linguistic descriptions, are arranged into a semantic hierarchy comprising hierarchical parent-child relationships. In general, a child semantic class inherits many or most properties of its direct parent and all ancestral semantic classes. For example, semantic class SUBSTANCE is a child of semantic class ENTITY and at the same time it is a parent of semantic classes GAS, LIQUID, METAL, WOOD_MATERIAL, etc.
Each semantic class in the semantic hierarchy is supplied with a deep model. The deep model of the semantic class is a set of deep slots. Deep slots reflect the semantic roles of child constituents in various sentences with objects of the semantic class as the core of a parent constituent and the possible semantic classes as fillers of deep slots. The deep slots express semantic relationships between constituents, including, for example, “agent”, “addressee”, “instrument”, “quantity”, etc. A child semantic class inherits and adjusts the deep model of its direct parent semantic class.
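As an illustration of the parent-child relationships and of a child class inheriting and adjusting its parent's deep model, the following is a minimal sketch. The `SemanticClass` data structure, the representation of a deep slot as a mapping from a slot name to a set of allowed filler classes, and the filler names AMOUNT and VOLUME are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SemanticClass:
    name: str
    parent: Optional["SemanticClass"] = None
    # Deep slots declared or adjusted at this level:
    # slot name -> names of semantic classes allowed as fillers (hypothetical).
    own_slots: dict[str, set[str]] = field(default_factory=dict)

    def ancestors(self) -> list[str]:
        """Names of all ancestral classes, nearest parent first."""
        names, node = [], self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def deep_model(self) -> dict[str, set[str]]:
        """Inherited deep slots, overridden by slots adjusted at this level."""
        model = self.parent.deep_model() if self.parent else {}
        model.update(self.own_slots)
        return model


ENTITY = SemanticClass("ENTITY", own_slots={"quantity": {"AMOUNT"}})
SUBSTANCE = SemanticClass("SUBSTANCE", parent=ENTITY)
LIQUID = SemanticClass("LIQUID", parent=SUBSTANCE,
                       own_slots={"quantity": {"VOLUME"}})
```

Here SUBSTANCE inherits the "quantity" slot of ENTITY unchanged, while LIQUID adjusts the fillers of the same slot.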
At least some of the embodiments utilize exhaustive text analysis technology, which uses a wide variety of linguistic descriptions, as described in U.S. Pat. No. 8,078,450. The analysis includes lexico-morphological, syntactic and semantic analysis; as a result, language-independent semantic structures are constructed in which each word is mapped to the corresponding semantic class.
If the word appears two or more times in the sense inventory, the method decides (106) which of the concepts, if any, is the correct one for the word 101. This may be done by applying any existing word concept disambiguation method. If one of the concepts is found to be correct for the word, the word is identified with the corresponding concept of the sense inventory 108. Otherwise, a new concept is added to the sense inventory 104. The parent object of the concept to be inserted may be identified by statistically analyzing each level of the hierarchy, starting from the root and at each step choosing the most probable node. The probability of each node is based on text corpora.
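The level-by-level choice of a parent node can be sketched as a greedy descent. This is only an illustration: the function name `choose_parent`, the `prob` callback (standing in for a corpus-based probability estimate), and the stopping threshold `min_prob` are assumptions, since the description does not specify when the descent stops.

```python
from typing import Callable


def choose_parent(root: str,
                  children: dict[str, list[str]],
                  prob: Callable[[str], float],
                  min_prob: float = 0.5) -> str:
    """Descend from the root, taking the most probable child at each level.

    The descent stops at a leaf, or when even the best child is less
    probable than min_prob (an assumed stopping rule).
    """
    node = root
    while children.get(node):
        best = max(children[node], key=prob)
        if prob(best) < min_prob:
            break
        node = best
    return node
```

For a new word, `prob` would be computed once per candidate node from text corpora; the returned node is then used as the parent of the inserted concept.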
If the word does not appear at all in the sense inventory, the corresponding sense is inserted in the sense inventory 104. The parent object of the concept to be inserted may be identified by statistically analyzing each level of the hierarchy starting from the root and in each step choosing the most probable node. The probability of each node is based on text corpora. In another embodiment, the method may disambiguate only one word or a few words in context, while other words are treated only as context and do not need to be disambiguated.
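The overall decision flow described above (assign an existing concept when one is found to be correct, otherwise add a new concept) can be sketched as follows. The inventory representation, the function name `resolve`, and the two callbacks are hypothetical: `disambiguate` stands in for any existing disambiguation method and returns None when no candidate fits, and `make_concept` stands in for the step that creates and places the new concept.

```python
from typing import Callable, Optional

# word -> concepts listed for it (a hypothetical representation)
Inventory = dict[str, list[str]]


def resolve(word: str,
            context: list[str],
            inventory: Inventory,
            disambiguate: Callable[[str, list[str], list[str]], Optional[str]],
            make_concept: Callable[[str, list[str]], str]) -> tuple[str, bool]:
    """Return (concept, was_added) for a word in its context."""
    candidates = inventory.get(word, [])
    # Only try to disambiguate when the word is present in the inventory.
    chosen = disambiguate(word, context, candidates) if candidates else None
    if chosen is not None:
        return chosen, False
    # Unknown word, or no listed concept fits: extend the inventory.
    concept = make_concept(word, context)
    inventory.setdefault(word, []).append(concept)
    return concept, True
```

Because the inventory is updated in place, a concept added for one occurrence is available as a candidate for later occurrences of the same word.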
In one embodiment, the exhaustive analysis techniques may be utilized.
An index may be represented as a table where each value of a feature (for example, a word, expression, or phrase) in a document is accompanied by a list of numbers or addresses of its occurrences in that document. In some embodiments, morphological, syntactic, lexical, and semantic features can be indexed in the same fashion as each word in a document is indexed. In one embodiment, indexes may be produced to index all or at least one value of morphological, syntactic, lexical, and semantic features (parameters). These parameters or values are generated during a two-stage semantic analysis described in more detail below. The index may be used to facilitate natural language processing operations such as disambiguating words in documents.
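A minimal sketch of such an index follows, assuming each token of a document has already been annotated with a dictionary of feature names and values. The feature names ("word", "lemma", "pos") and the function name `build_index` are illustrative assumptions, not the identifiers used by the described system.

```python
from collections import defaultdict


def build_index(tokens: list[dict[str, str]]) -> dict[tuple[str, str], list[int]]:
    """Map every (feature name, value) pair to the positions where it occurs."""
    index: dict[tuple[str, str], list[int]] = defaultdict(list)
    for position, features in enumerate(tokens):
        for name, value in features.items():
            index[(name, value)].append(position)
    return dict(index)


# Toy annotation of the sentence "Boys play football."
TOKENS = [
    {"word": "Boys", "lemma": "boy", "pos": "Noun"},
    {"word": "play", "lemma": "play", "pos": "Verb"},
    {"word": "football", "lemma": "football", "pos": "Noun"},
]
INDEX = build_index(TOKENS)
```

Looking up `("pos", "Noun")` in the resulting table returns the positions of both nouns, in the same way that looking up a word returns its occurrences.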
Accordingly, a rough syntactic analysis is performed on the source sentence to generate a graph of generalized constituents 332 for further syntactic analysis. All reasonably possible surface syntactic models for each element of lexical-morphological structure are applied, and all the possible constituents are built and generalized to represent all the possible variants of parsing the sentence syntactically.
Following the rough syntactic analysis, a precise syntactic analysis is performed on the graph of generalized constituents to generate one or more syntactic trees 342 to represent the source sentence. In one implementation, generating one or more syntactic trees 342 comprises choosing between lexical options and choosing between relations from the graphs. Many prior and statistical ratings may be used during the process of choosing between lexical options, and in choosing between relations from the graph. The prior and statistical ratings may also be used for assessment of parts of the generated tree and for the whole tree. In one implementation, the one or more syntactic trees may be generated or arranged in order of decreasing assessment. Thus, the best syntactic tree 346 may be generated first. Non-tree links may also be checked and generated for each syntactic tree at this time. If the first generated syntactic tree fails, for example, because of an impossibility to establish non-tree links, the second syntactic tree may be taken as the best, etc.
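The selection of the best syntactic tree with fallback can be sketched as below. The `Tree` record, its `rating` field, and the `links_ok` predicate (standing in for the attempt to establish non-tree links) are assumptions for illustration; the actual ratings and checks are not specified at this level of detail.

```python
from typing import Callable, NamedTuple, Optional


class Tree(NamedTuple):
    name: str
    rating: float  # overall assessment of the tree (hypothetical scale)


def best_tree(candidates: list[Tree],
              links_ok: Callable[[Tree], bool]) -> Optional[Tree]:
    """Try trees in order of decreasing rating; return the first that passes."""
    for tree in sorted(candidates, key=lambda t: t.rating, reverse=True):
        if links_ok(tree):
            return tree
    return None
```

If the highest-rated tree fails the check, the next one in the ordering is taken as the best, and so on.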
Many lexical, grammatical, syntactic, pragmatic, and semantic features may be extracted during the steps of analysis. For example, the system can extract and store lexical information and information about which semantic classes lexical items belong to, information about grammatical forms and linear order, about syntactic relations and surface slots, using predefined forms, aspects, sentiment features such as positive-negative relations, deep slots, non-tree links, semantemes, etc. With reference to
The analysis methods ensure that the maximum accuracy in conveying or understanding the meaning of the sentence is achieved.
The language-independent semantic structure (LISS) 352 (constructed in block 207 in
Each semantic class in the semantic hierarchy may be supplied with a deep model. The deep model of the semantic class is a set of deep slots. Deep slots reflect the semantic roles of child constituents in various sentences with objects of the semantic class as the core of a parent constituent and the possible semantic classes as fillers of deep slots. The deep slots express semantic relationships between constituents, including, for example, “agent”, “addressee”, “instrument”, “quantity”, etc. A child semantic class inherits and adjusts the deep model of its direct parent semantic class.
With reference to
Semantic descriptions 204 are language-independent. Semantic descriptions 204 may provide descriptions of deep constituents, and may comprise a semantic hierarchy, deep slots descriptions, a system of semantemes, and pragmatic descriptions.
With reference to
With reference to
The surface models 810 are represented as aggregates of one or more syntactic forms (“syntforms” 812) in order to describe possible syntactic structures of sentences as included in the syntactic description 102. In general, each lexical meaning of a language is linked to its surface (syntactic) models 810, which represent the constituents that are possible when the lexical meaning functions as a “core” and which include a set of surface slots of child elements, a description of the linear order, and diatheses, among others.
The surface models 810 are represented by syntforms 812. Each syntform 812 may include a certain lexical meaning which functions as a “core” and may further include a set of surface slots 815 of its child constituents, a linear order description 816, diatheses 817, grammatical values 814, government and agreement descriptions 840, communicative descriptions 880, among others, in relationship to the core of the constituent.
The surface slot descriptions 820 as a part of syntactic descriptions 102 are used to describe the general properties of the surface slots 815 that are used in the surface models 810 of various lexical meanings in the source language. The surface slots 815 are used to express syntactic relationships between the constituents of the sentence. Examples of the surface slot 815 may include “subject”, “object_direct”, “object_indirect”, “relative clause”, among others.
During the syntactic analysis, the constituent model utilizes a plurality of the surface slots 815 of the child constituents and their linear order descriptions 816 and describes the grammatical values 814 of the possible fillers of these surface slots 815. The diatheses 817 represent correspondences between the surface slots 815 and deep slots 514 (as shown in
The syntactic forms, syntforms 812, are sets of the surface slots 815 coupled with the linear order descriptions 816. One or more constituents possible for a lexical meaning of a word form of a source sentence may be represented by surface syntactic models, such as the surface models 810. Every constituent is viewed as the realization of the constituent model by means of selecting a corresponding syntform 812. The selected syntforms 812 are sets of the surface slots 815 with a specified linear order. Every surface slot in a syntform can have grammatical and semantic restrictions on its fillers.
The linear order description 816 is represented as linear order expressions which are built to express a sequence in which various surface slots 815 can occur in the sentence. The linear order expressions may include names of variables, names of surface slots, parentheses, grammemes, ratings, and the “or” operator, etc. For example, a linear order description for the simple sentence “Boys play football.” may be represented as “Subject Core Object_Direct”, where “Subject” and “Object_Direct” are names of surface slots 815 corresponding to the word order. Fillers of the surface slots 815 indicated by symbols of entities of the sentence are present in the same order for the entities in the linear order expressions.
Different surface slots 815 may be in a strict and/or variable relationship in the syntform 812. For example, parentheses may be used to build the linear order expressions and describe strict linear order relationships between different surface slots 815. Either SurfaceSlot1 SurfaceSlot2 or (SurfaceSlot1 SurfaceSlot2) means that both surface slots are located in the same linear order expression, but only one order of these surface slots relative to each other is possible, such that SurfaceSlot2 follows SurfaceSlot1.
As another example, square brackets may be used to build the linear order expressions and describe variable linear order relationships between different surface slots 815 of the syntform 812. As such, [SurfaceSlot1 SurfaceSlot2] indicates that both surface slots belong to the same variable of the linear order and their order relative to each other is not relevant.
The linear order expressions of the linear order description 816 may contain grammatical values 814, expressed by grammemes, to which child constituents correspond. In addition, two linear order expressions can be joined by the “or” operator, written |. For example: (Subject Core Object)|[Subject Core Object].
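The strict, variable, and “or” forms of linear order expressions described above can be illustrated by expanding each expression into the set of slot sequences it permits. This is a simplified sketch: the helper names `strict`, `free`, and `either` are hypothetical, and grammemes, ratings, and variable names are ignored.

```python
from itertools import permutations

Orders = set[tuple[str, ...]]


def strict(*slots: str) -> Orders:
    """(A B): only the listed order is allowed."""
    return {tuple(slots)}


def free(*slots: str) -> Orders:
    """[A B]: the relative order of the slots is not relevant."""
    return set(permutations(slots))


def either(left: Orders, right: Orders) -> Orders:
    """left | right: any order permitted by one of the two expressions."""
    return left | right


def allows(expression: Orders, observed: list[str]) -> bool:
    """Check whether an observed slot sequence satisfies the expression."""
    return tuple(observed) in expression
```

For instance, `strict("Subject", "Core", "Object_Direct")` admits only the word order of “Boys play football.”, whereas `free` over the same three slots admits all six permutations.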
The communicative descriptions 880 describe a word order in the syntform 812 from the point of view of communicative acts to be represented as communicative order expressions, which are similar to linear order expressions. The government and agreement description 840 contains rules and restrictions on grammatical values of attached constituents which are used during syntactic analysis.
The non-tree syntax descriptions 850 are related to processing various linguistic phenomena, such as ellipsis and coordination, and are used in transformations of syntactic structures which are generated during various steps of analysis according to embodiments of the invention. The non-tree syntax descriptions 850 include ellipsis description 852, coordination description 854, as well as referential and structural control description 830, among others.
The analysis rules 860 as a part of the syntactic descriptions 202 may include, but are not limited to, semantemes calculating rules 862 and normalization rules 864. Although analysis rules 860 are used during the step of semantic analysis 150, the analysis rules 860 generally describe properties of a specific language and are related to the syntactic descriptions 102. The normalization rules 864 are generally used as transformational rules to describe transformations of semantic structures which may be different in various languages.
The semantic hierarchy 910 is comprised of semantic notions (semantic entities) and named semantic classes arranged into hierarchical parent-child relationships similar to a tree. In general, a child semantic class inherits most properties of its direct parent and all ancestral semantic classes. For example, semantic class SUBSTANCE is a child of semantic class ENTITY and the parent of semantic classes GAS, LIQUID, METAL, WOOD_MATERIAL, etc.
Each semantic class in the semantic hierarchy 910 is supplied with a deep model 912. The deep model 912 of the semantic class is a set of the deep slots 914, which reflect the semantic roles of child constituents in various sentences with objects of the semantic class as the core of a parent constituent and the possible semantic classes as fillers of deep slots. The deep slots 914 express semantic relationships, including, for example, “agent”, “addressee”, “instrument”, “quantity”, etc. A child semantic class inherits and adjusts the deep model 912 of its direct parent semantic class.
The deep slots descriptions 920 are used to describe the general properties of the deep slots 914 and reflect the semantic roles of child constituents in the deep models 912. The deep slots descriptions 920 also contain grammatical and semantic restrictions of the fillers of the deep slots 914. The properties and restrictions for the deep slots 914 and their possible fillers are very similar and often times identical among different languages. Thus, the deep slots 914 are language-independent.
The system of semantemes 930 represents a set of semantic categories and semantemes, which represent the meanings of the semantic categories. As an example, a semantic category, “DegreeOfComparison”, can be used to describe the degree of comparison and its semantemes may be, for example, “Positive”, “ComparativeHigherDegree”, “SuperlativeHighestDegree”, among others. As another example, a semantic category, “RelationToReferencePoint”, can be used to describe an order as before or after a reference point and its semantemes may be, “Previous”, “Subsequent”, respectively, and the order may be spatial or temporal in a broad sense of the words being analyzed. As yet another example, a semantic category, “EvaluationObjective”, can be used to describe an objective assessment, such as “Bad”, “Good”, etc.
The system of semantemes 930 includes language-independent semantic attributes which express not only semantic characteristics but also stylistic, pragmatic and communicative characteristics. Some semantemes can be used to express an atomic meaning which finds a regular grammatical and/or lexical expression in a language. By their purpose and usage, the semantemes of the system 930 may be divided into various kinds, including, but not limited to, grammatical semantemes 932, lexical semantemes 934, and classifying grammatical (differentiating) semantemes 936.
The grammatical semantemes 932 are used to describe grammatical properties of constituents when transforming a syntactic tree into a semantic structure. The lexical semantemes 934 describe specific properties of objects (for example, “being flat” or “being liquid”) and are used in the deep slot descriptions 920 as restriction for deep slot fillers (for example, for the verbs “face (with)” and “flood”, respectively). The classifying grammatical (differentiating) semantemes 936 express the differentiating properties of objects within a single semantic class, for example, in the semantic class HAIRDRESSER the semanteme <<RelatedToMen>> is assigned to the lexical meaning “barber”, unlike other lexical meanings which also belong to this class, such as “hairdresser”, “hairstylist”, etc.
The pragmatic description 940 allows the system to assign a corresponding theme, style or genre to texts and objects of the semantic hierarchy 910, for example, “Economic Policy”, “Foreign Policy”, “Justice”, “Legislation”, “Trade”, “Finance”, etc. Pragmatic properties can also be expressed by semantemes. For example, pragmatic context may be taken into consideration during the semantic analysis.
Also, any element of the language description 610 may be extracted during an exhaustive analysis of texts, and any element may be indexed (an index for the feature is created). The indexes or indices may be stored and used for the task of classifying, clustering and filtering text documents written in one or more languages. Indexing of semantic classes is important and helpful for solving these tasks. Syntactic structures and semantic structures also may be indexed and stored for use in semantic searching, classifying, clustering and filtering.
The disclosed techniques include methods to add new concepts to the semantic hierarchy. This may be needed to deal with specific terminology which is not included in the hierarchy. For example, the semantic hierarchy may be used for machine translation of technical texts that include specific rare terms. In this example, it may be useful to add these terms to the hierarchy before using it in translation.
In one embodiment, the process of adding a term to the hierarchy could be manual, i.e., an advanced user may be allowed to insert the term in a particular place and optionally specify grammatical properties of the inserted term. This could be done, for example, by specifying the parent semantic class of the term. For example, when it is required to add to the hierarchy a new word “Netangin”, which is a medicine to treat tonsillitis, a user may specify MEDICINE as the parent semantic class. In some cases, words can be added to several semantic classes. For example, some medicines may be added to both the MEDICINE and SUBSTANCE classes, because their names could refer to medicines or to the corresponding active substances.
In one embodiment, a user may be provided with a graphical user interface to facilitate the process of adding new terms. This graphical user interface may provide a user with a list of possible parent semantic classes for a new term. The provided list may either be predefined or may be created for a given word by searching for the most probable semantic classes for this new term. This search for possible semantic classes may be done by analyzing the word's structure. In one embodiment, analyzing the word's structure may imply constructing a character n-gram representation of words and/or computing word similarity. A character n-gram is a sequence of n characters; for example, the word “Netangin” may be represented as the following set of character 2-grams (bigrams): [“Ne”, “et”, “ta”, “an”, “ng”, “gi”, “in”]. In another embodiment, analyzing a word's structure may include identifying the word's morphemes (e.g., its ending, prefixes and suffixes). For example, the “in” ending is common for medicines and Russian surnames. That is why at least the two semantic classes corresponding to these two concepts could appear in the mentioned list.
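The character n-gram representation can be sketched in a few lines. The function `char_ngrams` reproduces the bigram example above; the similarity function is an assumption, since the description does not name a measure, and the Dice coefficient over bigram sets is used here only as one common choice.

```python
def char_ngrams(word: str, n: int = 2) -> list[str]:
    """All contiguous character sequences of length n, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice_similarity(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient over character n-gram sets (an assumed measure)."""
    x = set(char_ngrams(a.lower(), n))
    y = set(char_ngrams(b.lower(), n))
    if not x and not y:
        return 0.0  # both words are shorter than n characters
    return 2 * len(x & y) / (len(x) + len(y))


print(char_ngrams("Netangin"))
# ['Ne', 'et', 'ta', 'an', 'ng', 'gi', 'in']
```

A candidate list of parent classes could then be ordered by the similarity between the new word and known instances of each class.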
In one embodiment, the mentioned interface may allow a user to choose words similar to the one to be added. This could be done to facilitate the process of adding new concepts. Lists of well-known instances of semantic classes could be shown to a user. In some cases, a list of concepts may represent a semantic class better than its name. For example, a user having a sentence “Petrov was born in Moscow in 1971” may not know that “ov” is a typical ending of Russian male surnames and may have doubts whether “Petrov” is a first name or a surname. If the user is provided with a list including “Ivanov”, “Sidorov”, “Bolshov”, which are all surnames, and a list of personal names none of which has the same ending, it will be easier for the user to make the right decision.
In one embodiment, a user may be provided with a graphical user interface allowing new concepts to be added directly to the hierarchy. The user may see the hierarchy and be able to find, through the graphical user interface, the places where the concepts are to be added. In another embodiment, the user may be prompted to select a child node of a node of the hierarchy, starting from the root, until the correct node is found.
In one embodiment, the semantic hierarchy has a number of semantic classes that allow new concepts to be inserted. It could be either the whole hierarchy (i.e., all semantic classes it includes) or a subset of concepts. The list of updatable semantic classes may be either predefined (e.g., as the list of possible named-entity types, i.e., PERSON, ORGANIZATION, etc.) or it may be generated according to the word to be added. In one embodiment, the user may be provided with a graphical user interface asking whether the word to be added is an instance of a particular semantic class.
Added terms may be saved in an additional file which could be then added to the semantic hierarchy by a user. In another embodiment, these terms may appear as a part of the hierarchy.
Since the semantic hierarchy may be language independent, the disclosed techniques allow words and texts to be processed in one or many languages.
Since at least one unknown word in the first language was detected, at step 1104, a parallel corpus is selected. At least one second language different from the first language is selected (1104). The parallel corpus should be a corpus of texts in two languages with at least partial alignment. The alignment may be by sentences, that is, each sentence in the first language corresponds to a sentence in the second language. It may be, for example, a Translation Memory (TM) or another resource. The aligned parallel texts may be provided by any method of alignment, for example, using a two-language dictionary, or using the method disclosed in U.S. patent application Ser. No. 13/464,447. In some embodiments, the only requirement for the second language selection may be that the second language can also be analyzed by the above-mentioned analyzer based on exhaustive text analysis technology, that is, all necessary language-specific linguistic descriptions according to
For each second language, a pair of texts with at least partial alignment is received (1105). The previously found unknown words are searched for (1106) in the first-language part of the texts. For the sentences containing the unknown words and for the sentences aligned with them in the second languages, language-independent semantic structures are constructed and compared (1107). The language-independent semantic structure (LISS) of a sentence is represented as an acyclic graph (a tree supplemented with non-tree links) where each word of a specific language is substituted with its universal (language-independent) semantic notions or semantic entities, referred to herein as “semantic classes”. Also, the relations between items of the sentence are marked with language-independent notions, namely deep slots 914. The semantic structure is built as a result of the exhaustive syntactic and semantic analysis, also described in detail in U.S. Pat. No. 8,078,450. So, if two sentences in two different languages have the same sense (meaning), for example, because they are the result of exact and careful translation of each other, then their semantic structures must be identical or very similar.
The semantic structures of the found pairs of sentences are considered identical if they have the same configuration, with the same semantic classes in the nodes (excluding the node corresponding to the unknown word) and the same deep slots as arcs.
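The comparison of two semantic structures, excluding the node of the unknown word, can be sketched as follows. This is a simplification under stated assumptions: a structure is represented as a nested tuple `(semantic_class, [(deep_slot, child), ...])`, non-tree links are ignored, and the class and slot names in the test data (other than MONTBLANC) are hypothetical.

```python
from typing import Optional

UNKNOWN = "?"  # placeholder class for the node of the unknown word


def infer_class(first: tuple, second: tuple) -> Optional[str]:
    """Return the class aligned with the UNKNOWN node, or None if the
    two structures differ anywhere else."""
    found: list[str] = []

    def same(a: tuple, b: tuple) -> bool:
        class_a, children_a = a
        class_b, children_b = b
        if class_a == UNKNOWN:
            found.append(class_b)
        elif class_a != class_b:
            return False
        if len(children_a) != len(children_b):
            return False
        # Compare children slot by slot, independent of their listed order.
        pairs = zip(sorted(children_a, key=lambda c: c[0]),
                    sorted(children_b, key=lambda c: c[0]))
        return all(slot_a == slot_b and same(child_a, child_b)
                   for (slot_a, child_a), (slot_b, child_b) in pairs)

    if same(first, second) and len(found) == 1:
        return found[0]
    return None
```

When the first-language structure matches the second-language structure everywhere except at the placeholder, the class found at the corresponding position in the second structure is the candidate class for the unknown word.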
For each unknown word, one or more semantic classes of a word (words) aligned with it are found (1108). Referring to
All unknown words are then mapped (1109) to the corresponding semantic classes. If such a correspondence is established, it is possible to map and add the unknown word to the corresponding semantic class with the semantic properties which can be extracted from the corresponding lexical meaning in another language. It means that the lexical meaning “” will be added to the Russian part of the semantic hierarchy 910 into the semantic class MONTBLANC, like its corresponding English lexical meaning “Mont Blanc”, and it will inherit a syntactic model and other attributes of its parent semantic class MOUNTAIN.
Still referring
The list of the semantic classes may be predefined. For example, new concepts may be allowed only in the “PERSON”, “LOCATION” and “ORGANIZATION” semantic classes. In this example, these semantic classes are the categories. Alternatively, the list of the semantic classes may be constructed by a method which chooses the most probable classes from all classes in the semantic hierarchy, which in turn may be done by applying machine learning techniques. The classes may be ranked according to the probability that the given word is an instance of such a class. The ranking may be produced with a supervised method based on corpora. Then the top-k classes may be selected, where k may be user-defined or an optimal number found by statistical methods. These predefined or found semantic classes represent the categories, to one or many of which the word is to be assigned. Then, a classifier (e.g., a Naïve Bayes classifier) is built (1305) using the text corpora 1304. The word is classified (1306) into one or more of the possible categories (i.e., semantic classes 1303). Finally, the word is added (1307) to the hierarchy as an instance of the found semantic class (classes).
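A Naïve Bayes classifier of the kind mentioned above can be sketched with the standard library alone. This is a toy illustration under stated assumptions: each training sample is the context of a known instance labeled with its semantic class, the multinomial model with add-one smoothing is one common variant, and the function names `train` and `rank` as well as the sample contexts in the usage note are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: iterable of (context_tokens, semantic_class) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def rank(tokens, model, k=1):
    """Return the k most probable semantic classes for a context."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())

    def log_posterior(label):
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        score = math.log(class_counts[label] / total)
        for token in tokens:
            if token in vocabulary:  # tokens never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denominator)
        return score

    return sorted(class_counts, key=log_posterior, reverse=True)[:k]
```

In use, the contexts in which an unknown word occurs would be passed to `rank`, and the word would be added to the hierarchy as an instance of the returned class or classes.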
In one embodiment, disambiguation may be done in the form of verifying a hypothesis. First, given an unknown word, all semantic classes may be ranked according to the probability that the unknown word is an instance of each semantic class. The initial hypothesis is that the unknown word is an instance of the first-ranked semantic class. This hypothesis is then checked by statistical analysis of the text corpora, which may be done with the help of indices 209. If the hypothesis is rejected, a new hypothesis is formulated, namely that the unknown word is an instance of the second-ranked semantic class, and so on until a hypothesis is accepted. In another embodiment, the semantic class for a word may be chosen with existing word sense disambiguation techniques.
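The hypothesis-verification loop can be sketched as shown below. The `accept` callback stands in for the statistical check against the text corpora; its form (here, any function returning True or False) is an assumption, since the description does not fix a particular test.

```python
from typing import Callable, Iterable, Optional


def choose_by_hypothesis(word: str,
                         ranked_classes: Iterable[str],
                         accept: Callable[[str, str], bool]) -> Optional[str]:
    """Test semantic classes in rank order; return the first accepted one.

    `accept(word, semantic_class)` is a hypothetical stand-in for the
    corpus-based statistical check. None is returned if every
    hypothesis is rejected.
    """
    for semantic_class in ranked_classes:
        if accept(word, semantic_class):
            return semantic_class
    return None
```

For example, `accept` could compare a corpus co-occurrence count for the pair (word, class) against a threshold; such a threshold test is only one possible choice.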
The hardware 1400 may receive a number of inputs and outputs for communicating information externally. To interface with a user or operator, the hardware 1400 may include one or more user input devices 1406 (e.g., a keyboard, a mouse, an imaging device, a scanner, a microphone) and one or more output devices 1408 (e.g., a Liquid Crystal Display (LCD) panel, a sound playback device (speaker)). To embody the present invention, the hardware 1400 may include at least one screen device.
For additional storage, the hardware 1400 may also include one or more mass storage devices 1410, e.g., a floppy or other removable disk drive, a hard disk drive, a Direct Access Storage Device (DASD), an optical drive (e.g. a Compact Disk (CD) drive, a Digital Versatile Disk (DVD) drive) and/or a tape drive, among others. Furthermore, the hardware 1400 may include an interface with one or more networks 1412 (e.g., a local area network (LAN), a wide area network (WAN), a wireless network, and/or the Internet among others) to permit the communication of information with other computers coupled to the networks. It should be appreciated that the hardware 1400 typically includes suitable analog and/or digital interfaces between the processor 1402 and each of the components 1404, 1406, 1408, and 1412 as is well known in the art.
The hardware 1400 operates under the control of an operating system 1414, and executes various computer software applications, components, programs, objects, modules, etc. to implement the techniques described above. Moreover, various applications, components, programs, objects, etc., collectively indicated by application software 1416 in
In general, the routines executed to implement the embodiments of the present disclosure may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as a “computer program.” A computer program typically comprises one or more instruction sets that reside at various times in various memory and storage devices in a computer and that, when read and executed by one or more processors in a computer, cause the computer to perform the operations necessary to execute elements involving the various aspects of the invention. Moreover, while the invention has been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments of the invention are capable of being distributed as a program product in a variety of forms, and that the invention applies equally regardless of the particular type of computer-readable media used to actually effect the distribution. Examples of computer-readable media include but are not limited to recordable type media such as volatile and non-volatile memory devices, floppy and other removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs)), and flash memory, among others. Another type of distribution may be implemented as Internet downloads.
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative and not restrictive of the broad invention and that the present disclosure is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art upon studying this disclosure. In an area of technology such as this, where growth is fast and further advancements are not easily foreseen, the disclosed embodiments may be readily modified or re-arranged in one or more of their details as facilitated by enabling technological advancements without departing from the principles of the present disclosure.
Implementations of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, computer software, firmware or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on one or more computer storage media for execution by, or to control the operation of, a data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate components or media (e.g., multiple CDs, disks, or other storage devices). Accordingly, the computer storage medium may be tangible and non-transitory.
The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
The term “client” or “server” includes a variety of apparatuses, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, or a portable storage device (e.g., a universal serial bus (USB) flash drive). Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube), LCD (liquid crystal display), OLED (organic light emitting diode), TFT (thin-film transistor), plasma, other flexible configuration, or any other monitor for displaying information to the user, and a keyboard, a pointing device (e.g., a mouse or trackball), a touch screen, or a touch pad by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback, and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user, for example, by sending webpages to a web browser on a user's client device in response to requests received from the web browser.
Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular implementations of particular inventions. Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown, in sequential order or that all illustrated operations be performed to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular implementations of the subject matter have been described. Other implementations are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking or parallel processing may be utilized.
Number | Date | Country | Kind |
---|---|---|---|
2013156494 | Dec 2013 | RU | national |
Number | Date | Country | |
---|---|---|---|
20150178268 A1 | Jun 2015 | US |