Field
Implementations of the present invention relate to natural language processing. In particular, implementations relate to comparing documents, which could be written in one or more languages and which contain one or more types of information. Comparing documents may involve estimation, computation and visualization of measures of similarity between any number of documents or other types of electronic files.
Related Art
Many natural language processing tasks require comparing documents to find out how similar or different they are, i.e., estimating or computing a measure of similarity or difference between the documents. Text resources on the Internet and in other sources usually include many copies of the same document, which can be presented in different forms and formats. A document similarity computation is therefore an implicit but mandatory step of many document processing tasks. Document similarity computation usually involves statistics and machine learning techniques such as document classification and clustering. In particular, document similarity/difference computation may be required in plagiarism detection, which aims at detecting whether a document has been plagiarized from another one. A straightforward approach to this task is to compute a similarity/difference measure between documents, usually based on lexical features such as words and characters. If the similarity measure exceeds a certain threshold, the documents are deemed similar and, therefore, one document could have been plagiarized from the other. More sophisticated approaches may use other similarity/difference measures, but the concept is the same.
A related task is duplicate and near-duplicate detection. While constructing linguistic corpora, it makes sense to get rid of duplicate and near-duplicate documents. In this task, as well as in the case of plagiarism detection, it is required to estimate how similar considered documents are. In this task, lexical-based representations and therefore, similarity/difference measures, are usually enough for adequate performance.
However, many challenges exist for determining similarity of documents. For example, computation of cross-language document similarity/difference is in increasing demand to detect cross-language plagiarism. In this situation, the above-mentioned similarity/difference should be able to adequately detect substantially similar documents in different languages. Too often, such detection fails. Besides performing this task, such similarity/difference measure also could be used to construct parallel and comparable corpora, to build or enrich machine translation systems.
Most of the existing document processing systems are able to deal with documents written in only one, or rarely, in a few particular, identifiable languages. Systems are generally not able to compare documents written in different languages because a workable similarity/difference between such documents cannot be computed.
Further, many systems are also limited to particular document formats, i.e., some systems cannot analyze some documents without first obtaining a reliable and accurate recognition of their text (such as in the case of PDF files which can require processing by optical character recognition). Moreover, each system usually deals only with one particular type of information or data contained in a document, i.e., only with text-based, audio-based or video-based information. However, many documents, sources or files about a particular topic (e.g., online news) include a variety of types of information and types of documents. For example, two news-oriented documents or sources may contain or reference the same video file but discuss the content of the video differently. In this case, a text-oriented system may conclude that the sources are not similar and may conclude that the video-oriented material is identical without being able to adequately process the nuances of such material.
Therefore, there is a substantial opportunity for new methods for more accurately estimating similarity/difference between documents, content, sources and files in different languages and in different formats.
Embodiments and discussion are presented herein regarding methods for finding substantially similar sources, files or documents, and estimating similarity/difference between given sources, files and documents. Sources may be in a variety of formats, and similarity/difference may be found across a variety of formats. Sources may be in one or more languages. Similarity/difference may be found across any number or types of languages. A variety of characteristics may be used to arrive at an overall measure of similarity/difference including determining or identifying syntactic roles, semantic roles and semantic classes in reference to sources.
While the appended claims set forth the features of the present invention with particularity, the invention, together with its objects and advantages, will be more readily appreciated from the following detailed description, taken in conjunction with the accompanying drawings.
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these specific details.
Reference in this specification to “one embodiment” or “an implementation” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase “in one embodiment” or “in one implementation” in various places in the specification are not necessarily all referring to the same embodiment or implementation, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.
Implementations of the present invention disclose techniques for comparing a number of documents that may contain different types of information including textual information presented in various languages. Comparing may include estimating, computing and visualizing similarity/difference between mentioned documents. Described herein is a method which estimates, computes and assists in visualizing similarity between documents. The method also compares different types of information. Moreover, the method includes techniques to deal with textual information, which can be compared based on substantially exhaustive syntactic and semantic analyses and language-independent semantic structures (LISS). Various lexical, grammatical, syntactical, pragmatic, semantic and other features may be identified in text and used to effectively solve said task. Implementations of the invention may or may not contain visualization of identified similarities or differences.
Implementations of the invention allow a user to estimate similarity/difference between documents in different languages and between documents that include various types of information. Estimated document similarity may be represented by a value. Estimated document similarity alternatively, or additionally, may be represented with visualization techniques such as through a graphical user interface (GUI).
Document content can be divided into several partitions according to the type of information that the content includes. In particular, these partitions may be document segments containing only textual, graphical, audio, video or another type of information. The invention allows estimating the similarity/difference between different types of information presented in a document and may also provide graphical or other types of elements to assist visualization of found similarities/differences. Visualization may be implemented as e.g., highlighting, bolding, and underlining.
Document similarity and difference can be defined, for example, as follows:
sim(doc1, . . . ,docn) =s(infoType1(doc1), . . . ,infoTypem(doc1), . . . infoType1(docn), . . . ,infoTypem(docn))
dif(doc1, . . . ,docn) =d(infoType1(doc1), . . . ,infoTypem(doc1), . . . infoType1(docn), . . . ,infoTypem(docn))
where n is the number of documents to be compared, m is the number of different information types contained in documents, doci is the i-th document, where i could be from 1 to n, infoTypej(doci) is the part of doci, containing the j-th information type, s and d are functions of the said arguments.
Some documents may contain fewer than m information types.
For example, one document may be a video file and another document may be a text-based file or document. These two documents or files may be very similar, yet may be of different formats. The number of documents, files or sources in this case is 2, the number of information types is 2. And similarity, according to an exemplary implementation, is as follows:
sim(doc1, doc2)=s(infoType1(doc1),infoType2(doc1),infoType1(doc2),infoType2(doc2))=s(video(doc1),text(doc1),video(doc2),text(doc2))=s(video(doc1),text(doc2))
In one embodiment, similarity between a video document and a text document may be computed with topic detection techniques. For example, a video document about cars may be considered or found to be substantially similar to a text about cars.
In one embodiment, optionally, comparison of documents includes identification of the documents' logical structure (for example, as presented in U.S. Pat. No. 8,260,049). Block structures may be identified before or after optical character recognition of the documents. In such a case, further similarity estimation could be stopped if the identified structures are found to be sufficiently different. First, the most important blocks, such as titles or headers, may be compared. In one embodiment, block structures of the documents are compared with weights, e.g., a document header has a higher weight and therefore influences the final similarity/difference more than other blocks. In another embodiment, if the identified logical and/or block structures are tree-like, the comparison may be executed step by step in a top-down approach, and it can be stopped if a sufficient amount of difference or a sufficient number of differences is discovered at some step.
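The weighted, top-down comparison with early stopping described above can be sketched as follows. This is a minimal illustration only: the `Block` class, the toy `block_difference` measure, and the `threshold` parameter are hypothetical names introduced here, not part of the described system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """A node of a document's logical structure (hypothetical model)."""
    kind: str                      # e.g. "header", "paragraph"
    text: str = ""
    weight: float = 1.0            # headers may carry a higher weight
    children: List["Block"] = field(default_factory=list)


def block_difference(a: Block, b: Block) -> float:
    """Toy per-block difference: 0.0 if kind and text match, else 1.0."""
    return 0.0 if (a.kind, a.text) == (b.kind, b.text) else 1.0


def differ_top_down(a: Block, b: Block, threshold: float) -> bool:
    """Walk both trees level by level; stop once the weighted difference exceeds threshold."""
    total = 0.0
    level = [(a, b)]
    while level:
        next_level = []
        for x, y in level:
            total += x.weight * block_difference(x, y)
            if len(x.children) != len(y.children):
                total += x.weight  # assumption: a structural mismatch counts as a difference
            if total > threshold:
                return True        # early stop: deeper blocks are not examined
            next_level.extend(zip(x.children, y.children))
        level = next_level
    return False
```

With a header weight of 3.0 and a threshold of 2.0, a mismatch in the header alone is enough to stop the comparison, while a single mismatched paragraph of weight 1.0 is not.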
In one embodiment of the invention, similarity can be described as:
sim(doc1, doc2)=f(docText1, docText2, docImages1, docImages2, docVideo1, docVideo2, docAudio1, docAudio2,α,β,γ,δ),
where docText, docImages, docVideo, docAudio are the textual, graphical, video and audio parts of a document, respectively; α, β, γ, and δ are weights; and f is some function.
In one embodiment of the invention, similarity between two documents can be defined as follows:
sim(doc1, doc2)=α·simtext(doc1, doc2)+β·simimages(doc1, doc2)+γ·simvideo(doc1, doc2)+δ·simaudio(doc1, doc2),
where
simtext(doc1,doc2), simimages(doc1,doc2),simvideo(doc1,doc2), simaudio(doc1,doc2) are similarities defined between only textual, graphical, video and audio contents of doc1 and doc2 respectively; and α, β, γ, and δ are weights, i.e., real-valued numbers intuitively corresponding to the importance of each summand. In particular, if documents should be compared only on the basis of their textual content, α could be set to 1 and β, γ, and δ to 0.
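The weighted sum above can be sketched in a few lines. The per-type similarity values are assumed to be computed elsewhere; the function name `combined_similarity` and the dictionary keys are illustrative choices, not names from the described system.

```python
def combined_similarity(sims, weights):
    """Weighted sum of per-type similarities.

    sims    -- mapping from information type to similarity value
    weights -- mapping from information type to its weight (alpha, beta, gamma, delta)
    A type missing from sims contributes 0.0.
    """
    return sum(weights[kind] * sims.get(kind, 0.0) for kind in weights)


# Illustrative values only.
per_type = {"text": 0.9, "images": 0.5, "video": 0.0, "audio": 0.0}

# Text-only comparison: alpha = 1, beta = gamma = delta = 0.
text_only = {"text": 1.0, "images": 0.0, "video": 0.0, "audio": 0.0}
score = combined_similarity(per_type, text_only)
```

Setting all weights but one to zero reduces the measure to a single information type, as described above for text-only comparison.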
In one embodiment, the mentioned similarity measure may be a real-valued, usually non-negative, function of two or more arguments.
Sometimes documents look similar or even identical even though they include differences. Some differences are not easy to detect, or it may take a person a long time to compare the documents and find that they are not identical. Such differences include, for example, using letters from another alphabet that look similar, “masking” spaces with characters colored with the color of the background so that they are not visible, inserting additional spaces, presenting some of the text as an image, etc. In this case, an implementation of the invention could be employed to determine a measure of document similarity or difference.
The present invention allows computing similarity between sources, files, and/or documents in different languages. A naïve way to compare documents with information in different languages is to apply machine translation algorithms to one or more of the sources, which propagates errors due to the imperfect nature of translation. In the current invention, machine translation techniques are not required to be applied to sources. Textual parts of sources, files or documents could be first converted into language-independent semantic structures (LISS). The same could be applied to transcripts of audio and video parts of documents.
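One of the disguises listed above, substituting look-alike letters from another alphabet, can be flagged with a simple heuristic. The sketch below is an assumption-laden illustration, not the described method: it uses the first word of each character's Unicode name (e.g. LATIN, CYRILLIC) as a rough stand-in for its script, and the function names are hypothetical.

```python
import unicodedata


def scripts_in_word(word):
    """Return the set of script labels of the letters in word.

    Heuristic: the first word of the Unicode character name is used as the
    script label; this is not the official Unicode Script property.
    """
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def suspicious_words(text):
    """Words whose letters come from more than one script."""
    return [w for w in text.split() if len(scripts_in_word(w)) > 1]


# "\u0430" is CYRILLIC SMALL LETTER A, visually close to Latin "a".
flagged = suspicious_words("p\u0430yment due")
```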
Electronic documents or files can appear in different forms, i.e., some of them can be represented as raw text, while other documents or files may be in the portable document format (PDF), which in turn is generally a transformation of a document in another format. In the case of some formats, for example, a PDF file, in one embodiment of the present invention, document comparison is preceded by optical character recognition (OCR) of one or more of the documents. In one embodiment, comparison is preceded by producing transcripts of audio information included in the sources, files or documents.
Additionally, appropriate methods of comparing are applied for each type of block. For example, appropriate methods to estimate semantic or lexical similarity between texts may be applied to compare document headers, while pictures may be converted, for example, into their RGB representations and a measure of similarity between these representations may be estimated.
For each corresponding text block, the system may employ automatic syntactic and semantic analyses to determine and extract lexical, grammatical, syntactical, pragmatic, semantic and other features for further use in processing texts. These features are extracted during a substantially exhaustive analysis of each sentence and the construction of language-independent semantic structures (LISS), generally one for each sentence processed. Such preliminary exhaustive analysis precedes similarity estimation in one embodiment of the present invention. The system analyzes sentences using linguistic descriptions of a given natural language that reflect the real complexities of the natural language, rather than simplified or artificial descriptions. The system functions are based on the principle of integral and purpose-driven recognition, where hypotheses about the syntactic structure of a part of a sentence are verified within the hypotheses about the syntactic structure of the whole sentence. Such a procedure avoids analyzing numerous anomalous parsing variants. Then, syntactic and semantic information about each sentence is extracted, and the parsing results, including the lexical choices made when resolving ambiguities, are obtained. Information and results may be indexed and stored.
An index usually comprises a representation in the form of a table where each value of a feature (e.g., word, sentence) in a document is accompanied by a list of numbers or addresses of its occurrences in that document. For example, for each feature found in the text (e.g., word, character, expression, phrase), an index includes a list of sentences where it was found, and a number of the word corresponding to its place in the sentence. For instance, if the word “frame” was found in a text in the 1st sentence at the 4th place, in the 2nd sentence at the 2nd place, in the 10th sentence at the 4th place, and in the 22nd sentence at the 5th place, its index may look approximately like
“frame”—(1.4), (2.2), (10.4), (22.5).
If an index is created for a corpus, i.e., a set of texts, it may include a number corresponding to one of the texts that belong to the corpus. Similarly, indexes of other features may be made, e.g., semantic classes, semantemes, grammemes, syntactic relations, semantic relations etc. According to embodiments of the present invention, morphological, syntactic, lexical, and semantic features can be indexed in the same fashion as each word in a document. In one embodiment of the present invention, indexes may be produced to index all or at least one value of morphological, syntactic, lexical, and semantic features (parameters) for each sentence or other division. These parameters or values are generated during the two-stage semantic analysis described below. The index may be used to facilitate natural language processing.
In one implementation, said linguistic descriptions include a plurality of linguistic models and knowledge about natural languages. These may be arranged in a database and applied for analyzing each text or source sentence, such as at step 106. Such a plurality of linguistic models may include, but is not limited to, morphology models, syntax models, grammar models and lexical-semantic models. In a particular implementation, integral models for describing the syntax and semantics of a language are used in order to recognize the meanings of the source sentence, analyze complex language structures, and correctly convey information encoded in the source sentence.
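A positional word index of the kind illustrated with “frame” can be sketched as below. The tokenization (lowercasing and splitting on word characters) is a simplifying assumption, and `build_index` is an illustrative name; the described system indexes many more feature types than surface words.

```python
import re
from collections import defaultdict


def build_index(sentences):
    """Map each word to a list of (sentence number, word number) pairs, both 1-based."""
    index = defaultdict(list)
    for s_no, sentence in enumerate(sentences, start=1):
        words = re.findall(r"\w+", sentence.lower())
        for w_no, word in enumerate(words, start=1):
            index[word].append((s_no, w_no))
    return dict(index)


index = build_index(["We built a frame today", "The frame held"])
# index["frame"] holds one (sentence, position) pair per occurrence.
```

For a corpus, the same structure could carry a third number identifying the text, as noted above.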
Accordingly, a rough syntactic analysis is performed on the source sentence to generate a graph of generalized constituents 232 for further syntactic analysis. All reasonably possible surface syntactic models for each element of lexical-morphological structure are applied, and all the possible constituents are built and generalized to represent all the possible variants of parsing the sentence syntactically.
Following the rough syntactic analysis, a precise syntactic analysis is performed on the graph of generalized constituents to generate one or more syntactic trees 242 to represent the source sentence. In one implementation, generating the syntactic tree 242 comprises choosing between lexical options and choosing between relations from the graphs. Many prior and statistical ratings may be used during the process of choosing between lexical options, and in choosing between relations from the graph. The prior and statistical ratings may also be used for assessment of parts of the generated tree and for the whole tree. In one implementation, the one or more syntactic trees may be generated or arranged in order of decreasing assessment. Thus, the best syntactic tree may be generated first. Non-tree links are also checked and generated for each syntactic tree at this time. If the first generated syntactic tree fails, for example, because of an impossibility to establish non-tree links, the second syntactic tree is taken as the best, etc.
Many lexical, grammatical, syntactical, pragmatic, and semantic features are extracted during the steps of analysis. For example, the system can extract and store lexical information and information about which semantic classes lexical items belong to, information about grammatical forms and linear order, about syntactic relations and surface slots, using predefined forms, aspects, sentiment features such as positive-negative relations, deep slots, non-tree links, semantemes, etc.
The analysis methods ensure that the maximum accuracy in conveying or understanding the meaning of the sentence is achieved.
The language-independent semantic structure of a sentence is represented as an acyclic graph (a tree supplemented with non-tree links) where each word of a specific language is substituted with its universal (language-independent) semantic notion or semantic entity, referred to herein as a “semantic class”. The semantic class is one of the most important semantic features that can be extracted and used for tasks of classifying, clustering and filtering text documents written in one or many languages. Semantemes are other features usable for such tasks because they may reflect not only semantic but also syntactical, grammatical, and other language-specific features in language-independent structures.
The semantic classes, as part of linguistic descriptions, are arranged into a semantic hierarchy comprising hierarchical parent-child relationships. In general, a child semantic class inherits many or most properties of its direct parent and all ancestral semantic classes. For example, semantic class SUBSTANCE is a child of semantic class ENTITY and at the same time it is a parent of semantic classes GAS, LIQUID, METAL, WOOD_MATERIAL, etc.
Each semantic class in the semantic hierarchy is supplied with a deep model. The deep model of the semantic class is a set of deep slots. Deep slots reflect the semantic roles of child constituents in various sentences with objects of the semantic class as the core of a parent constituent and the possible semantic classes as fillers of deep slots. The deep slots express semantic relationships between constituents, including, for example, “agent”, “addressee”, “instrument”, “quantity”, etc. A child semantic class inherits and adjusts the deep model of its direct parent semantic class.
Semantic descriptions 104 are language-independent. Semantic descriptions 104 may provide descriptions of deep constituents, and may comprise a semantic hierarchy, deep slots descriptions, a system of semantemes, and pragmatic descriptions.
Also, any element of language description 610 may be extracted during a substantially exhaustive analysis of texts and may be indexed (an index for the feature is created); the indices may be stored and used for the tasks of classifying, clustering and filtering text documents written in one or many languages. In one implementation, indexing of semantic classes is most significant and helpful for solving these tasks. Syntactic structures and semantic structures also may be indexed and stored for use in semantic searching, classifying, clustering and filtering.
One simple way to estimate similarity between two texts in the same language is to compare their indexes. These may be indexes of words or indexes of semantic classes. The indexes may be represented by simple data structures, for example, arrays of numbers. If indexes of words for texts are identical, then the texts are identical, or may be considered identical for a particular purpose. If indexes of semantic classes for two texts are identical, then the texts are identical or substantially similar. This approach of using indexes of semantic classes, with some limitations, also may be applied to estimating similarity of texts in different languages. The word order in corresponding sentences in different languages may be different, so when estimating a measure of similarity between two sentences, it is acceptable to ignore the number of a word in the sentence corresponding to its placement or word order.
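Comparing indexes while ignoring word positions can be sketched as follows. The index layout (feature mapped to a list of positions) and the semantic class names used in the example are assumptions made for illustration.

```python
from collections import Counter


def feature_counts(index):
    """Drop positions: keep only how many times each feature occurs."""
    return Counter({feature: len(positions) for feature, positions in index.items()})


def same_features(index_a, index_b):
    """True if both indexes contain the same features with the same counts."""
    return feature_counts(index_a) == feature_counts(index_b)


# Hypothetical semantic-class indexes of corresponding sentences in two
# languages; the classes match, only the positions differ.
index_first = {"APPROVAL": [(1, 2)], "TO_REQUIRE": [(1, 6)]}
index_second = {"TO_REQUIRE": [(1, 1)], "APPROVAL": [(1, 3)]}
match = same_features(index_first, index_second)
```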
Another problem is that the most frequent words in a language, such as “the”, “not”, “and” etc. usually are not indexed, so the two sentences, “The approval of the CEO is required” and “The approval of the CEO isn't required” will have the same indexes, and these two sentences will be identified as the same by conventional methods. The methods of the present invention identify the sentences as different because they also take into account specific lexical, syntactical and semantic features extracted during steps of the analysis. The fact that the verb “require” is presented in negative form in one of the sentences is fixed by means of semantemes.
However, a problem arises if, for example, one sentence in one language corresponds to two or more sentences in another language, or vice versa. In this case, to increase the accuracy of the present methods, techniques for aligning two or more texts (for example, as presented in U.S. application Ser. No. 13/464,447) may be applied before indexing. There are many ways to calculate similarity between two texts. One naïve way to find out if two texts are similar is to count how many words they have in common. There are also more advanced versions of this approach, such as techniques involving lemmatization, stemming, weighting, etc. For example, a vector space model (G. Salton, 1975) may be built, and vector similarity measures, such as cosine similarity, may be utilized. During the text processing described here, documents may be represented with language-independent semantic classes that in turn may be considered as lexical features. Therefore, the similarity measures mentioned above may be applied.
Such similarity measures have a drawback in that they do not actually capture the semantics. For example, the two sentences, “Bob has a spaniel” and “Richard owns a dog” are semantically similar but they do not share any words but an article. Therefore, a mere lexical text similarity measure will fail to find that these sentences are similar. To capture this type of similarity, knowledge-based semantic similarity measures may be used. Calculating them requires a semantic hierarchy. Similarity between two words usually depends on the shortest path between the corresponding concepts in a corresponding semantic hierarchy. For example, “spaniel” in the semantic hierarchy corresponding to the first sentence above appears as a child node (hyponym) of “dog”; therefore, semantic similarity between the concepts will be high. Word-to-word similarity measures may be generalized to text-to-text similarities by combining values for similarities of each word pair. Semantic classes described here represent nodes of the semantic hierarchy. Therefore, knowledge-based semantic similarity measures described above and their generalizations to text-to-text similarity measures may be utilized within document processing.
For example, referring to the present invention, textual information may be represented as a list of features, which may include semantic classes {C1, C2, . . . Cm}, semantic features {M1, M2, . . . Mn}, and syntactic features {S1, S2, . . . Sk}. Since lexical meanings may be expressed in different words, and a semantic class may unite several close lexical meanings, the semantic class embodies the idea of generalization. Synonyms and derivatives are generalized. If we deal with texts in different languages, a semantic class generalizes lexical meanings in the different languages. Semantic features reflect the semantic structure of a text. These may include, e.g., semantic roles of elements, such as agent, experiencer, etc. Syntactic features reflect the syntactic structure of a text and may be produced, for example, by constituency or dependency parsers.
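Cosine similarity over a simple vector space model can be sketched as below. Each text is assumed to be already reduced to a list of features (words or semantic classes); the feature names in the example are hypothetical, and raw counts are used without any weighting scheme.

```python
import math
from collections import Counter


def cosine_similarity(features_a, features_b):
    """Cosine of the angle between the count vectors of two feature lists."""
    ca, cb = Counter(features_a), Counter(features_b)
    dot = sum(ca[f] * cb[f] for f in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
        math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0  # an empty text yields 0.0 by convention


# Hypothetical semantic-class representations of two short texts.
score = cosine_similarity(["DOG", "TO_OWN", "HUMAN"], ["DOG", "TO_OWN", "HUMAN"])
```

Because the vectors are built from features rather than surface words, the same function applies whether the features are lemmas of one language or language-independent semantic classes.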
In the present invention semantic classes are organized into the semantic hierarchy, which is in general a graph. Therefore, in one embodiment, the distance between two nodes can be defined as the shortest path between these nodes in the graph. And similarity between semantic classes can be a function of the mentioned distance between them.
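A shortest-path distance over a small hierarchy fragment can be sketched with a breadth-first search. The fragment below reuses the SUBSTANCE example given earlier; the `PARENT` mapping, the function names, and the choice of 1/(1+d) as the similarity function are assumptions for illustration only.

```python
from collections import deque

# Fragment of a semantic hierarchy as child -> parent links.
PARENT = {
    "SUBSTANCE": "ENTITY",
    "GAS": "SUBSTANCE",
    "LIQUID": "SUBSTANCE",
    "METAL": "SUBSTANCE",
}


def path_length(a, b, parent=PARENT):
    """Number of edges on the shortest path between a and b, or None if unconnected."""
    graph = {}
    for child, par in parent.items():  # treat the hierarchy as an undirected graph
        graph.setdefault(child, set()).add(par)
        graph.setdefault(par, set()).add(child)
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def class_similarity(a, b):
    """One possible decreasing function of distance: 1 / (1 + d)."""
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)
```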
In another embodiment, the similarity measure for two or more documents may be defined heuristically or on the basis of experience. For example, consider two text documents, D1 and D2. After semantic analysis there are two sets of semantic classes, C(D1)={C11, C12, . . . C1n} and C(D2)={C21, C22, . . . C2m}. Each class may be supplied with a coefficient Fij of its frequency in the document. The semantic classes that are most frequent in a language may be excluded. The most common semantic classes (like ENTITY, ABSRACT_SCIENTIFIC_OBJECT, etc.) also may be discarded. Then a similarity or difference measure depends on the distances between each pair of semantic classes (C1, C2), where C1∈C(D1) and C2∈C(D2). In one embodiment, the similarity or difference measure between semantic classes may be defined as a function of the path between the semantic classes, i.e., sim(C1, C2)=f(path(C1, C2)), dif(C1, C2)=g(path(C1, C2)), where the function may be, e.g., the identity function. In another embodiment, the similarity measure or the difference measure is based on the idea of the closest common ancestor of the classes: anc(C1, C2).
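The closest common ancestor anc(C1, C2) can be sketched for the simple case where every class has at most one parent. That single-parent restriction is an assumption of this sketch (the hierarchy is described as a graph in general), and the function names and the hierarchy fragment are illustrative.

```python
def ancestors(cls, parent):
    """The class itself followed by its chain of ancestors, nearest first."""
    chain = [cls]
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain


def closest_common_ancestor(c1, c2, parent):
    """First class on c2's ancestor chain that also lies on c1's chain, or None."""
    seen = set(ancestors(c1, parent))
    for node in ancestors(c2, parent):
        if node in seen:
            return node
    return None


# Hypothetical child -> parent fragment of the semantic hierarchy.
hierarchy = {"SUBSTANCE": "ENTITY", "GAS": "SUBSTANCE", "LIQUID": "SUBSTANCE"}
anc = closest_common_ancestor("GAS", "LIQUID", hierarchy)
```

A measure could then be derived from, for example, the depth of the returned class; how that is done is left open here.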
In one embodiment, the similarity between texts may be defined as follows:
where |C(D)| denotes the number of semantic classes in C(D), and g is some function.
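A sketch of one way such a text-level measure could be assembled from class-level similarities follows. The specific form used here, g applied to the sum of pairwise class similarities divided by |C(D1)|·|C(D2)|, is an assumption made for illustration and is not taken from the source; `text_similarity` and `exact_match` are hypothetical names.

```python
def text_similarity(classes1, classes2, class_sim, g=lambda x: x):
    """g of the average pairwise similarity between two sets of semantic classes.

    ASSUMPTION: normalizing by |C(D1)| * |C(D2)| is an illustrative choice,
    not a formula given in the source.
    """
    if not classes1 or not classes2:
        return 0.0
    total = sum(class_sim(c1, c2) for c1 in classes1 for c2 in classes2)
    return g(total / (len(classes1) * len(classes2)))


def exact_match(c1, c2):
    """Simplest class-level similarity: 1.0 for identical classes, else 0.0."""
    return 1.0 if c1 == c2 else 0.0


score = text_similarity({"DOG", "CAT"}, {"DOG"}, exact_match)
```

Any class-level measure, such as one based on path length or on the closest common ancestor, can be passed as `class_sim`.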
In one embodiment, the difference between texts may be defined as follows:
This method may be used to provide visualizations for similarities or differences between documents. This may be done with e.g., highlighting, underlining, and emphasizing similar parts or different parts while showing or displaying one or more of the documents or parts thereof.
The hardware 1200 also typically receives a number of inputs and outputs for communicating information externally. For interface with a user or operator, the hardware 1200 may include one or more user input devices 1206 (e.g., a keyboard, a mouse, imaging device, scanner, microphone) and one or more output devices 1208 (e.g., a Liquid Crystal Display (LCD) panel, a sound playback device (speaker)). To embody the present invention, the hardware 1200 typically includes at least one screen device.
For additional storage, the hardware 1200 may also include one or more mass storage devices 1210, e.g., a floppy or other removable disk drive, a hard disk drive, a Direct Access Storage Device (DASD), an optical drive (e.g. a Compact Disk (CD) drive, a Digital Versatile Disk (DVD) drive) and/or a tape drive, among others. Furthermore, the hardware 1200 may include an interface with one or more networks 1212 (e.g., a local area network (LAN), a wide area network (WAN), a wireless network, and/or the Internet among others) to permit the communication of information with other computers coupled to the networks. It should be appreciated that the hardware 1200 typically includes suitable analog and/or digital interfaces between the processor 1202 and each of the components 1204, 1206, 1208, and 1212 as is well known in the art.
The hardware 1200 operates under the control of an operating system 1214, and executes various computer software applications, components, programs, objects, modules, etc. to implement the techniques described above. Moreover, various applications, components, programs, objects, etc., collectively indicated by application software 1216 in
In general, the routines executed to implement the embodiments of the invention may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as a “computer program.” A computer program typically comprises one or more instructions set at various times in various memory and storage devices in a computer that, when read and executed by one or more processors in a computer, cause the computer to perform operations necessary to execute elements involving the various aspects of the invention. Moreover, while the invention has been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments of the invention are capable of being distributed as a program product in a variety of forms, and that the invention applies equally regardless of the particular type of computer-readable media used to actually effect the distribution. Examples of computer-readable media include but are not limited to recordable type media such as volatile and non-volatile memory devices, floppy and other removable disks, hard disk drives, optical disks (e.g., Compact Disk Read-Only Memory (CD-ROMs), Digital Versatile Disks (DVDs)), and flash memory, among others. Another type of distribution may be implemented as Internet downloads.
While certain exemplary embodiments have been described and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative and not restrictive of the broad invention and that this invention is not limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art upon studying this disclosure. In an area of technology such as this, where growth is fast and further advancements are not easily foreseen, the disclosed embodiments may be readily modified or re-arranged in one or more of their details as facilitated by enabling technological advancements without departing from the principles of the present disclosure.
For purposes of the USPTO extra-statutory requirements, the present application constitutes a continuation-in-part of U.S. patent application Ser. No. 12/983,220, filed on 31 Dec. 2010, which is a continuation-in-part of U.S. Ser. No. 11/548,214, filed on 10 Oct. 2006 (now U.S. Pat. No. 8,078,450), which is entitled to the benefit of the filing date. Also, this application is a continuation-in-part of U.S. patent application Ser. No. 13/535,638, filed on 28 Jun. 2012, and a continuation-in-part of U.S. patent application Ser. No. 13/464,447, filed on 4 May 2012. The United States Patent and Trademark Office (USPTO) has published a notice effectively stating that the USPTO's computer programs require that patent applicants both reference a serial number and indicate whether an application is a continuation or continuation-in-part. See Stephen G. Kunin, Benefit of Prior-Filed Application, USPTO Official Gazette 18 Mar. 2003. The Applicant has provided above a specific reference to the application(s) from which priority is being claimed as recited by statute. Applicant understands that the statute is unambiguous in its specific reference language and does not require either a serial number or any characterization, such as “continuation” or “continuation-in-part,” for claiming priority to U.S. patent applications. Notwithstanding the foregoing, Applicant understands that the USPTO's computer programs have certain data entry requirements, and hence Applicant is designating the present application as a continuation-in-part of its parent applications as set forth above, but points out that the designations are not to be construed as commentary or admission as to whether or not the present application contains any new matter in addition to the matter of its parent application(s). All subject matter of the Related Applications and of any and all parent, grandparent, great-grandparent, etc. applications of the Related Applications is incorporated herein by reference to the extent such subject matter is not inconsistent herewith.
Number | Date | Country |
---|---|---|
20130054612 A1 | Feb 2013 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 12983220 | Dec 2010 | US |
Child | 13662272 | | US |
Parent | 11548214 | Oct 2006 | US |
Child | 12983220 | | US |
Parent | 13535638 | Jun 2012 | US |
Child | 11548214 | | US |
Parent | 13464447 | May 2012 | US |
Child | 13535638 | | US |