The present disclosure relates generally to search technology. More specifically, the present disclosure relates to searching of electronic content. For example, content on the Internet and in other electronic resources (e.g., text corpora (“corpuses”), dictionaries, glossaries, encyclopedias, etc.) may be searched by the users.
Search results may be generated based on keywords entered by a user as part of a search query. Existing search systems enable users to use simple query languages to find documents that either contain or do not contain the words or word combinations specified by the user. However, due to the existence of homonyms and homographs in natural languages, a search result based on a keyword search may include a substantial amount of non-relevant or marginally relevant information. For example, if the user searches for texts with the word “page” in the sense of “a man or boy employed as the personal attendant to a queen”, the user may receive a large amount of non-relevant information in the search results, where “page” may refer to an Internet webpage, a page in a newspaper or magazine, a section of stored data, etc. This is likely to happen because those other meanings of the word “page” are substantially more frequent than the one referring to a man or boy.
An exemplary embodiment relates to a method for generating search results. The method includes receiving one or more corpuses of natural language texts including indexed linguistic parameters and semantic structures of lexical units, the linguistic parameters and semantic structures generated during a preliminary syntactico-semantic analysis. The method further includes searching for text fragments satisfying a query in the one or more corpuses. The method further includes estimating relevance of search results. The method further includes ranking search results according to estimated relevance.
Another exemplary embodiment relates to a system comprising: one or more data processors; and one or more storage devices storing instructions that, when executed by the one or more data processors, cause the one or more data processors to perform operations. The operations comprise receiving one or more corpuses of natural language texts including indexed linguistic parameters and semantic structures of lexical units, the linguistic parameters and semantic structures generated during a preliminary syntactico-semantic analysis. The operations further comprise searching for text fragments satisfying a query in the one or more corpuses. The operations further comprise estimating relevance of search results. The operations further comprise ranking search results according to estimated relevance.
Yet another exemplary embodiment relates to a computer-readable storage medium having machine instructions stored therein, the instructions being executable by a processor to cause the processor to perform operations. The operations comprise receiving one or more corpuses of natural language texts including indexed linguistic parameters and semantic structures of lexical units, the linguistic parameters and semantic structures generated during a preliminary syntactico-semantic analysis. The operations further comprise searching for text fragments satisfying a query in the one or more corpuses. The operations further comprise estimating relevance of search results. The operations further comprise ranking search results according to estimated relevance.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the disclosure will become apparent from the description, the drawings, and the claims, in which:
Numerous specific details may be set forth below to provide a thorough understanding of concepts underlying the described embodiments. It may be apparent, however, to one skilled in the art that the described embodiments may be practiced without some or all of these specific details. In other instances, some process steps have not been described in detail in order to avoid unnecessarily obscuring the underlying concept.
According to various embodiments disclosed herein, a system performing semantic searches based on user specified queries is provided. While performing the semantic search, the system may take into account lexical meanings of one or more words in the query. A lexical meaning may be determined for one or more of the words in the search query. In some embodiments, a user interface provided by the system may allow a user to enter the query and select lexical meanings for one or more words in the selected query. For example, the user may right click using their mouse on a word that may have multiple meanings, and select a desired meaning from a shown list of meanings for that word. In other embodiments, the lexical meanings of the query words are determined by the system as disclosed herein. As a result, the search is performed not only using the words specified in the query, but also the words in specific lexical meanings.
Text corpora may be searched for the words specified by the user in a search query. The text corpora may include a set of texts, which may be electronically stored. Comprehensive semantic and syntactic parsing of text corpora is performed with the extraction of the full range of lexical, morphological, syntactic and semantic parameters of sentences and the construction of their semantic structures. A subsequent semantic indexing structure allows the user to search not only for “triplets” (i.e., three data entities {subject, predicate, object}) but also for sentences of any specified structure, including a query formulated in natural language (e.g., a natural language question). The user may indicate (e.g., by selecting parameters in settings provided in the user interface) whether two or more words included in the query can be contained in the same sentence. The query may be formulated to relate directly to a set of words that belong to a class or possess specific properties or characteristics. The system may enable the user to search for sentences with specified syntactic and/or semantic properties, such as illustrations of a given semantic relation (link). In particular, the system may allow the user to create queries based on grammatical meanings, syntactic and/or semantic positions (links), syntactic patterns, stylistic and/or semantic features. The system may be useful to lexicographers, philologists, linguists, students and teachers of native and foreign languages, as well as to many conventional users.
The system may index natural language texts, and perform semantic searching using the indexed natural language texts as disclosed herein. The system may build at least one index for each text or text corpus by conducting a comprehensive or exhaustive syntactic and semantic analysis of natural language texts. During the syntactic and semantic analysis of the natural language texts, any combination of the following information is analyzed, saved, and/or indexed: numerous individual words, numerous lexical meanings of the words, the lexical, syntactic, and semantic information about each sentence received during the syntactic and semantic analysis. Lexical meanings of some or all words, all syntactic and semantic information about each sentence in the natural language texts generated pursuant to the syntactic and semantic analysis may be saved. The saved information may include data saved during interim steps of the analysis (interim parsing results), results of lexical parsing (lexical choices), including the results obtained when resolving ambiguities. The generated index may be used for semantic searching of natural language texts, as described herein.
The search systems described herein allow the user to search and locate relevant information using a semantic query that is formulated in a special language for semantic queries, and/or using natural language. The same analyzer or parser may be utilized to analyze the query in natural language, identify its syntactical structure and construct the semantic structure and, thus, for the system to determine the meaning of the query. The search is performed in accordance with the syntactic and semantic information stored in resources that permit these types of searches. Thus, the user can obtain relevant query results.
Moreover, since the search query can be formulated or translated into universal language-independent semantic terms, the search can be executed in text corpora containing documents in different languages. Thus, the user can obtain information presented in different resources independent of the language in which the search query was formulated. The search results can be presented to the user both in the language of the resource, in the original form as found in the document, and/or it can be translated into the language of the query using a machine translation system.
U.S. Pat. No. 8,078,450 describes a method that includes deep syntactic and semantic analysis of natural language texts based on comprehensive linguistic descriptions. This method can be used at the analysis stage of the described method in building indices. The method uses a broad spectrum of linguistic descriptions, both universal semantic mechanisms and those associated with the specific language, which allows all the real complexities of the language to be reflected without simplification or artificial limits, without any danger of a combinatorial explosion or an unguided growth in complexity. In addition, these analytical methods are based on principles of cohesive purpose-driven recognition, i.e., hypotheses about the structure of a portion of a sentence are verified as part of checking the hypotheses about the structure of the entire sentence. That makes it possible to avoid analyzing a large set of anomalies and variations.
The deep analysis may include lexical-morphological, syntactic and semantic analysis of each sentence of the text corpus, resulting in the construction of language-independent semantic structures in which each word of text is assigned to a corresponding semantic class.
Next, a language-independent semantic structure (107) is built or generated, which constitutes the meaning of the given sentence. This stage may also include restoration of referential links between sentences. An example of a referential link is anaphora—the use of linguistic constructions that can be interpreted only if another text block, usually the previous one, is taken into account.
Then, the original sentence, the syntactic structure of the original sentence and the language-independent semantic structure are indexed (108). The result is a collection of indices (109). An index can usually be presented as a table, where each value of a textual feature (e.g., a word, expression or phrase, relation between the elements of the sentence, morphological, lexical, syntactic or semantic feature, as well as syntactic and semantic structures) in the document is associated with a list of addresses of its occurrences in that document. In one embodiment, morphological, syntactic, lexical and semantic characteristics, and also structures and structural fragments can be indexed in the same way as a word in the document is indexed.
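The table-like index described above can be illustrated with a minimal Python sketch. It assumes a hypothetical `build_index` function and a toy, already analyzed input; the feature strings such as `lex:page#attendant` and `class:SERVANT` are illustrative names only and are not taken from the disclosure.

```python
from collections import defaultdict

def build_index(analyzed_sentences):
    """Map each feature value to the (sentence_no, word_no) addresses where it occurs."""
    index = defaultdict(list)
    for sent_no, words in enumerate(analyzed_sentences):
        for word_no, features in enumerate(words):
            for feature in features:
                index[feature].append((sent_no, word_no))
    return dict(index)

# Hypothetical output of a prior analysis: every word carries its surface form
# and, where known, its lexical meaning and semantic class as plain strings.
analyzed = [
    [{"word:the"}, {"word:page", "lex:page#attendant", "class:SERVANT"}, {"word:bowed"}],
    [{"word:turn"}, {"word:the"}, {"word:page", "lex:page#sheet", "class:PAPER_SHEET"}],
]
index = build_index(analyzed)
```

In this sketch a lookup by the word alone (`index["word:page"]`) returns both occurrences, while a lookup by lexical meaning (`index["lex:page#attendant"]`) returns only the first sentence.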
In one embodiment, indices can include all or at least one value of the morphological, syntactic, and lexical semantic characteristics (parameters). These values or parameters are generated during a two-stage semantic analysis, described in more detail hereinafter. Indices can be used in many tasks involved in processing natural language, particularly in organizing semantic searches. According to one implementation, the morphological, syntactic, and lexical semantic descriptions are structured and stored in the database. This set of instructions may include, at minimum, the morphological language model, the model of syntactical constructions for the language, and lexical-semantic models. In one embodiment, for the analysis of complex language structures, recognition of the meaning of the sentence and the correct transfer of the information contained therein, an integrated model is used to describe the syntax and semantics.
A rough syntactic analysis is applied to the source sentence and includes, in particular, the generation of all potential lexical meanings for words that make up the sentence or phrase, of all the potential relationships among them and of all potential constituents. All possible surface syntactic models are applied for each element of the lexical-morphological structure. Then, all possible constituents are created and generalized so as to represent all possible variations of the syntactic parsing of the sentence. The result is the formation of a graph of generalized constituents (232) for subsequent precise syntactic analysis. The graph of generalized constituents (232) includes all the potential links within the sentence.
The rough syntactic analysis is followed by precise syntactic analysis on the graph of generalized constituents, resulting in the “derivation” of a certain number of syntactic trees (242) that represent the structure of the source sentence. Construction of a syntax tree (242) includes a lexical selection for the nodes in the graph and a selection of the relationships between the nodes of the graph. A set of a priori and statistical scores may be used when selecting lexical variations or when selecting relationships from the graph. A priori and statistical scores may also be used both to evaluate the parts of the graph and to evaluate the entire tree. In one implementation, one or more syntactic trees are built or arranged in descending value. Thus, the best syntactic tree will be the first one constructed. At this time, the non-tree links are also checked and constructed. If the first syntactic tree is not appropriate, for example, because of the impossibility of establishing the necessary non-tree links, then the next syntactic tree is regarded as the best, and so on.
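The selection of the best syntactic tree with a fallback to the next candidate can be sketched as follows. This is a simplified illustration, not the parser of the disclosure: the function `pick_tree`, the dictionary layout of a candidate tree, and the `links_ok` flag standing in for the non-tree link check are all assumptions.

```python
def pick_tree(candidates, can_build_non_tree_links):
    """Return the highest-scoring tree whose non-tree links can be established."""
    for tree in sorted(candidates, key=lambda t: t["score"], reverse=True):
        if can_build_non_tree_links(tree):
            return tree
    return None  # no candidate tree is appropriate

# Hypothetical candidates; "links_ok" stands in for the real non-tree link check.
candidates = [
    {"name": "A", "score": 0.9, "links_ok": False},
    {"name": "B", "score": 0.7, "links_ok": True},
    {"name": "C", "score": 0.4, "links_ok": True},
]
best = pick_tree(candidates, lambda tree: tree["links_ok"])
```

Here tree A has the highest score but is rejected because its non-tree links cannot be established, so tree B is regarded as the best.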
Since this lexical selection for the nodes of the graph and the selection of relationships between nodes takes place on the basis of a priori and statistical assessments, one implementation of the method not only examines and assesses all variants, but these variants also are stored and indexed at stage 108 with consideration of their integral estimates. That is, index 109 contains not only highly probable options from parsing sentences, but also the improbable options that are weighted correspondingly if this parsing is successful. The weight of the version from the parsing is then used in the calculation assessing the relevance of the search result.
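One possible way to use the stored weights of parsing variants when ranking is sketched below. The disclosure does not specify the formula; taking the maximum weight over all variants of a sentence is an assumption made only for illustration, as are the name `rank_hits` and the sample weights.

```python
def rank_hits(postings):
    """Rank sentences by the best weight among their matching parsing variants.

    postings: iterable of (sentence_id, parse_weight) pairs, one per variant
    that matched the query. Using max() as the aggregate is an assumption.
    """
    best = {}
    for sentence_id, weight in postings:
        best[sentence_id] = max(weight, best.get(sentence_id, 0.0))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

# Hypothetical postings: sentence s1 matched through two parsing variants.
postings = [("s1", 0.2), ("s2", 0.9), ("s1", 0.6), ("s3", 0.1)]
ranked = rank_hits(postings)
```

An improbable variant still produces a hit (sentence s3 above), but it appears at the end of the ranked list.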
A wide range of lexical, grammatical, syntactic, pragmatic and semantic features are derived at the stage (106) of the analysis and construction of semantic structures (107). For example, the system can derive and store lexical information and information about the affiliation of lexical units of semantic classes, information on grammatical forms and linear order, about syntactic relations and surface positions, the use of certain forms, of aspects, of tonalities such as positive and negative tonality, deep positions, non-tree links, semantics, etc.
In addition, at step 107, an ontological analysis can be conducted with the aim of deriving subject domain knowledge, ontological objects, and ontological facts. The derivation of ontological objects and ontological facts and fixing the relationship between them is carried out, for example, using a special type of rule, rules of logical inference, and other rules. This information is fixed in ontologies (110). For example, returning to the example illustrated in
The information from the ontologies is used in the process of building indices (108). This then enables, in the search process, the retrieval of information about the object, even if it is expressed indirectly in the text corpora. For example, information from the fragment illustrated in
Referring to
The proposed method of analysis supports the attainment of maximum precision in determining the meaning of the sentence.
The language-independent semantic structure of the sentence is represented as an acyclic graph (trees, supplemented by non-tree links) where each word of a specific language is replaced with universal (language-independent) semantic entities called semantic classes. A semantic class is a semantic characteristic that may be derived and used for completing tasks in the semantic search, classification, clustering and filtering of documents written in one or more languages. Moreover, semantemes can be used as information in the language-independent structures, reflecting not only semantic, but also syntactic, grammatical, and other language-dependent information.
Semantic classes can be arranged in a semantic hierarchy where a “daughter” semantic class and its “descendants” inherit many of the properties of the “parent” and all previous semantic classes (“ancestors”). For example, the semantic class SUBSTANCE is a daughter class of the rather broad class ENTITY and at the same time is a “parent” for semantic classes GAS, LIQUID, METAL, WOOD_MATERIAL, etc. Each semantic class in a semantic hierarchy is covered by a deep (semantic) model. The deep model is a set of deep slots (types of semantic relationships in sentences). Deep slots reflect the semantic roles of daughter constituents (i.e., structural units of a sentence) in various sentences with items from this semantic class as the core of a parent constituent and possible semantic classes as items filling the slot. These deep slots reflect the semantic relationships between constituents, such as “agent,” “addressee,” “instrument” or “quantity.” The daughter class inherits and tweaks the deep model of the parent class.
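The inheritance of deep slots along the semantic hierarchy can be sketched in Python as follows. The class names ENTITY, SUBSTANCE and LIQUID and the slot “quantity” come from the text; the `SemanticClass` type, the slot “state” and the rule that a daughter class simply adds slots to the inherited ones are simplifying assumptions.

```python
class SemanticClass:
    """A node of the semantic hierarchy with its own and inherited deep slots."""

    def __init__(self, name, parent=None, own_slots=()):
        self.name = name
        self.parent = parent
        self.own_slots = set(own_slots)

    def deep_slots(self):
        # Assumption: a daughter class inherits every parent slot and may add its own.
        inherited = self.parent.deep_slots() if self.parent else set()
        return inherited | self.own_slots

    def is_a(self, other):
        """True if this class is `other` or one of its descendants."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False

entity = SemanticClass("ENTITY", own_slots={"quantity"})
substance = SemanticClass("SUBSTANCE", parent=entity, own_slots={"state"})  # "state" is hypothetical
liquid = SemanticClass("LIQUID", parent=substance)
```

A search by the superclass SUBSTANCE would then accept any word whose class satisfies `is_a(substance)`, including words of the class LIQUID.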
Referring to
Referring to
Word-inflection descriptions (710) describe how the base form of the word may vary depending on case, gender, number, tense, etc. and broadly include all possible forms of the word. Word formation (730) describes what new words can be constructed using this word. Grammemes are units of the grammatical system (720) and, as indicated in link (722) and link (724), grammemes can be used to construct word-inflection descriptions (710) and word formation descriptions (730).
The system of semantemes (930) represents a set of semantic categories. Semantemes may reflect lexical and grammatical categories and attributes as well as differential properties and stylistic, pragmatic and communication characteristics. For example, the semantic category “DegreeOfComparison” may be used to describe degrees of comparison expressed in different forms of adjectives, such as “easy,” “easier” and “easiest.” Likewise, the semantic category “DegreeOfComparison” may include semantemes, such as “Positive,” “ComparativeHigherDegree,” and “SuperlativeHighestDegree.” As another example, the semantic category “RelationToReferencePoint” can be used to describe the linear order—before or after the object or event is located in the sentence and the link to it, with the semantemes being “Previous”, “Subsequent”. In another example, the semantic category “EvaluationObjective” can fix the presence of an objective assessment, such as “Bad”, “Good”, etc. Lexical semantemes can describe the specific properties of objects, such as “being flat” or “being liquid” and are used in limiting the placeholders of the deep slots. Classifications of differential semantemes are used to express differential properties within a single semantic class. For example, in English, “hairdresser” for men is translated as “barber”, and in the semantic class “HAIRDRESSER” it will be assigned the semanteme “RelatedToMen”, while in the same semantic class we find “hairdresser” and “hairstylist” and so on.
Pragmatic descriptions (940) are used to assign a corresponding theme, style or genre to text during the parsing process, and it is also possible to ascribe the corresponding characteristics to objects in the semantic hierarchy. For example, “Economic Policy”, “Foreign Policy”, “Justice”, “Legislation”, “Trade”, “Finance”, etc.
Any parameter of the linguistic descriptions (610), such as lexical meanings, semantic classes, grammemes, or semantemes, can be extracted during an exhaustive analysis of the text, and any such parameter can be indexed (an index specification is created). Indexing semantic classes is required in many tasks related to the analysis of natural language texts, such as semantic search, classification, clustering, filtering of texts, and much more. Indexing lexical meanings (as opposed to simply indexing the word alone) enables searches of not just words or word forms, but of the lexical meaning, that is, words in a particular meaning. Syntactic structure and semantic structure can also be indexed and stored for use in semantic search, classification, clustering, and document filtering.
Returning to
In one embodiment, a combination of two, three or, generally speaking, N numbers can be used for indexing different syntactic, semantic, or other parameters. For example, combinations of two numbers—indexes of words that in the text are linked by a relationship corresponding to the given slot—can be used to index the surface or deep slots. For example, for the semantic structure of the sentence “This boy is smart, he'll succeed in life”, depicted in
Since not only words are indexed, but also their lexical meanings, semantic classes, syntactic and semantic relations, and any other elements of syntactic and semantic structures, it becomes possible to search the context using not only key words, but also using the context containing lexical meanings, meanings belonging to specific semantic classes, context including elements with specific syntactic and/or semantic features and/or morphological features or sets (combinations) of such features. Additionally, sentences may be found with non-tree syntactic phenomena, such as ellipses, parataxis, etc. Because semantic classes may be searched, it becomes possible to search semantically linked words and concepts.
Searches of fragments of syntactic and/or semantic structures can be done. The search result may include sentences or paragraphs, or other fragments, depending on the selection of the corresponding option by the user. Because all sentences in each corpus are analyzed and then saved along with the results of their syntactic and semantic analysis, syntactic and semantic structures can be produced for the user in graphical form as well.
The most widespread searching in search systems is performed using keywords. But a problem arises when one or more keywords are multi-valued, for example, in English a “bank” can mean 1) a financial institution, bank, 2) storage, repository, and 3) the shore of a river or lake. There are still more, less frequently occurring meanings of “bank.” In such a case, in response to the query using such a keyword, the user of standard search systems receives a set of results that are of no use to him. For example, most of the hits resulting from a standard search for the keyword “apple” relate to the name of the computer corporation, since it is encountered much more frequently in Internet resources; it is virtually impossible to find documents mentioning the fruit “apple” (they turn up on the most distant pages) without attaching additional words (e.g., “fruit”) to the request or an exclusion in a specially formulated computer lexicon.
As shown in
There are different ways of specifying the lexical meaning including, but not limited to, by specifying a semantic class. Another way is to provide each lexical meaning with an explanation similar to a dictionary entry, as shown in
The user can select any possible lexical meaning of the word for the search, and this meaning becomes the selected one and is shown as marked, for example, as shown in
Regardless of whether the user specifies the meanings for the query words, the user can see, for example, in the retrieved fragment (e.g., by clicking on the right mouse button or by tapping on a touch screen of the computing device utilized by the user), in which lexical meaning the word occurs, its semantic class, and also several other parameters, such as synonyms, the syntactic model, co-occurrence, examples of phrases with the word, etc. A sample of this kind of query is shown in
In some embodiments, the user may attach the Boolean operators AND, OR and NOT to the lexical meanings, that is, to the keywords that have been assigned to corresponding semantic classes.
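Applying Boolean operators to lexical meanings can be sketched with set operations over an index that maps each lexical meaning to the sentences containing it. The function `boolean_search`, its parameter names and the meaning labels such as `bank#finance` are assumptions introduced for illustration.

```python
def boolean_search(index, all_ids, include=(), any_of=(), exclude=()):
    """Combine postings of lexical meanings with AND (include), OR (any_of), NOT (exclude)."""
    result = set(all_ids)
    for meaning in include:
        result &= index.get(meaning, set())
    if any_of:
        result &= set().union(*(index.get(m, set()) for m in any_of))
    for meaning in exclude:
        result -= index.get(meaning, set())
    return result

# Hypothetical index: lexical meaning -> ids of sentences where it occurs.
index = {"bank#finance": {1, 2}, "bank#shore": {3, 4}, "river#stream": {2, 3}}
all_ids = {1, 2, 3, 4, 5}

# "river" AND NOT "bank" in the financial sense
hits = boolean_search(index, all_ids, include=["river#stream"], exclude=["bank#finance"])
```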
A list of settings (1104) or parameters may be set by the user. For example, the user may set whether the word order is essential to the search. Another setting may allow the user to limit the space between the query objects. For example, the user can use the operator W/n to search for documents that contain no more than n words between the query objects. In one implementation, the user can use it explicitly, in others they can use it to select an optional length.
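A proximity check in the spirit of the W/n operator, combined with the word-order setting, might look like the sketch below. The function name `within`, the `ordered` flag and the representation of occurrences as word positions inside one sentence are assumptions.

```python
def within(positions_a, positions_b, n, ordered=False):
    """True if some occurrence of A and some occurrence of B have at most n words between them.

    With ordered=True, A must also precede B (word order is essential).
    """
    for a in positions_a:
        for b in positions_b:
            gap = (b - a if ordered else abs(b - a)) - 1  # number of words in between
            if 0 <= gap <= n:
                return True
    return False

words = "the page brought the queen a letter".split()
page = [i for i, w in enumerate(words) if w == "page"]    # [1]
queen = [i for i, w in enumerate(words) if w == "queen"]  # [4]
```

In the sample sentence two words stand between “page” and “queen”, so the pair satisfies W/2 but not W/1.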
The user may indicate that the search includes synonyms of the one or more words used in the query. Synonyms are words that have the same or very similar meanings. In the semantic hierarchy, all synonymous lexical meanings, such as “food,” “meal,” “alimentary,” are located in the same semantic class and have the same or close semantic characteristics and semantemes. Then, if the user selects a setting or option “Search synonyms” (1104) and wants to find “food,” first the lexical meaning and its semantic class are determined, and as a result, documents can be found where “meal” or “alimentary” occur or, possibly, other more archetypal representatives of the semantic class FOOD. A measure of relevance can be introduced, for example, based on the assessment of the “closeness” of the lexical meaning from the query to the synonym found, and, taking into account context, the word order and other factors, it can be extended to a sentence, a fragment, etc.
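The “Search synonyms” option can be sketched as an expansion of a query word to all words of the same semantic class. The toy lexicon below assumes, for simplicity, one lexical meaning per word; the function name `expand_with_synonyms` and the class label SERVANT are hypothetical.

```python
# Toy lexicon (an assumption): word -> semantic class of its single lexical meaning.
SEMANTIC_CLASS = {
    "food": "FOOD",
    "meal": "FOOD",
    "alimentary": "FOOD",
    "page": "SERVANT",
}

def expand_with_synonyms(word, lexicon):
    """Return the word together with all words that share its semantic class."""
    semantic_class = lexicon.get(word)
    if semantic_class is None:
        return {word}  # unknown word: search for it literally
    return {w for w, c in lexicon.items() if c == semantic_class}

expanded = expand_with_synonyms("food", SEMANTIC_CLASS)
```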
There is still another way to specify the lexical meaning of words that are in the query, if the query is presented as part of a phrase or sentence, and that is its complete syntactic and semantic analysis. Many words in natural languages have several different meanings (senses). Some words in the query can have different meanings expressed in different lexical meanings. To determine exactly which lexical meaning of the word applies to the query, a complete syntactic and semantic parsing of the query submitted as a sentence or phrase is performed to define the lexical meanings of the words constituting the query. An exhaustive syntactic analysis includes a rough and a close analysis. The rough analysis defines all the potentially possible meanings for each word. The close analysis, which is based on linguistic descriptions, language rules, co-occurrences, and on analysis of context, statistics and other factors, generates the most relevant lexical meanings. Thus, as a result of the lexical selection from the close analysis, a semantic class is determined for each word in the user query.
Another example of a semantic query in a sentence form is shown in
Another example of a query to a semantic search engine is displayed in
In one implementation, all the morphological forms of words designated in the query are taken into account and all the morphological forms can be found. The limitations to the morphological forms can be included in the query in the form of a limitation on grammatical meanings, and can be indicated, for example, in angle brackets < >. The index that is generated using these methods is an integral part of the semantic search.
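A restriction on grammatical meanings written in angle brackets can be sketched as follows. The concrete syntax `lemma<Grammeme, ...>`, the grammeme names and the functions `parse_term` and `matching_forms` are assumptions; the text only states that the restriction may be indicated in angle brackets.

```python
import re

def parse_term(term):
    """Split 'lemma<Gram1, Gram2>' into (lemma, set of required grammemes)."""
    match = re.fullmatch(r"\s*([^<>\s]+)\s*(?:<([^<>]*)>)?\s*", term)
    if match is None:
        raise ValueError(f"cannot parse query term: {term!r}")
    grammemes = {g.strip() for g in (match.group(2) or "").split(",") if g.strip()}
    return match.group(1), grammemes

def matching_forms(term, analyzed_words):
    """Return the word forms of the lemma that carry all required grammemes."""
    lemma, required = parse_term(term)
    return [form for form, form_lemma, grammemes in analyzed_words
            if form_lemma == lemma and required <= grammemes]

# Hypothetical analysis results: (word form, lemma, grammemes).
analyzed_words = [
    ("go", "go", {"Present"}),
    ("went", "go", {"Past"}),
    ("gone", "go", {"Past", "Participle"}),
]
```

Without a restriction, the term `go` matches all three forms; the term `go<Past>` matches only “went” and “gone”.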
In one embodiment, users can formulate questions for a semantic search in natural language. The same parser that is used for syntactic and semantic analysis of the text corpus is applied to the user's question: it identifies the syntactic structure of the question and builds its language-independent semantic structure, so that the system “recognizes” the meaning of the sentence. The constructed semantic structure is then translated into the query language of the search engine. For example, the aforementioned questions “What countries were discovered?” or “What can be made from milk?”, originally formulated by the user in natural language, are processed by the parser and translated into a semantic query. If the query is formulated as an interrogative sentence, then the analysis constructs the structure of a sentence that could be a potential answer to the question, in this case containing a lacuna.
Indices of syntactic and semantic structures are created and stored in the form of a tree or a graph. The desired structure is described by a search query using one or more search parameters. One or more parameters can be defined, can be specified using variables, and may be defined as a range of possible values of these variables. In other words, the query for the search can be presented as a sentence in natural language with “lacunae.” Lacunae can be covered or include both single words and word combinations, phrases, groups of words that form related components (a constituent), embedded (subordinate) sentence, etc. As a result of the query and search, there are options for filling these lacunae in the texts available in the markup of the text corpus. For display to the user, they can be sorted by frequency of occurrence.
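Filling a lacuna and sorting the fillers by frequency can be sketched as follows for the question “What can be made from milk?”. The flat triples, the predicate label `make_from` and the function `fill_lacuna` are assumptions; real semantic structures are trees or graphs rather than triples.

```python
from collections import Counter

def fill_lacuna(facts, pattern):
    """Return fillers of the single None slot in `pattern`, most frequent first."""
    slot = pattern.index(None)  # position of the lacuna
    fillers = Counter(
        fact[slot]
        for fact in facts
        if all(p is None or p == f for p, f in zip(pattern, fact))
    )
    return fillers.most_common()

# Hypothetical facts extracted from a corpus: (product, relation, source).
facts = [
    ("cheese", "make_from", "milk"),
    ("butter", "make_from", "milk"),
    ("cheese", "make_from", "milk"),
    ("wine", "make_from", "grapes"),
]
answers = fill_lacuna(facts, (None, "make_from", "milk"))
```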
The method makes it possible to search a wide range of entities, such as relationships, non-tree links, lexical classes, semantic classes, etc. These entities—words and phrases can be found through their grammatical features, lexical properties, syntactic and/or semantic properties. Clauses (sentences) can be found through any lexical, syntactic or semantic features.
Using the described methods, various types of search can be implemented. For example, a search can be made for properties of syntactic or semantic structural nodes: grammatical meanings, the superclass (the class with all its descendants), or semantemes (e.g., “time” without regard to the form of expression). Another search variant consists of a search of relationships between properties. For example, a search can be made for surface or deep slots with validation (or no validation) of inheritance, anaphoric links, or any number of nodes with the specified properties. Additionally, it is possible to derive the meanings of these attributes in the retrieved results.
Of significance is that since a search query can be translated into a semantic form that is independent of the actual language, the search can be carried out in texts in different languages, and in resources, including corpora of different languages. Thus, the user can retrieve information provided in all resources, regardless of the query language. The search result can be presented to the user in the language of the resource (as it appears in the original), or a search result can be translated into the language of the query using a machine translation system.
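The language-independent search can be illustrated with a sketch in which both the query and the documents are mapped to semantic classes before matching. The toy lexicon, the class labels and the function names are assumptions; machine translation of the results is not modeled.

```python
# Toy multilingual lexicon (an assumption): (language, word) -> semantic class.
LEXICON = {
    ("en", "milk"): "MILK",
    ("de", "Milch"): "MILK",
    ("fr", "lait"): "MILK",
    ("en", "bread"): "BREAD",
}

def to_semantic_classes(lang, words):
    """Replace words of a given language with language-independent semantic classes."""
    return {LEXICON[(lang, w)] for w in words if (lang, w) in LEXICON}

def cross_language_search(query_lang, query_words, documents):
    """Return ids of documents, in any language, that contain all query classes."""
    wanted = to_semantic_classes(query_lang, query_words)
    if not wanted:
        return []
    return [doc_id for doc_id, (doc_lang, doc_words) in documents.items()
            if wanted <= to_semantic_classes(doc_lang, doc_words)]

documents = {
    "d1": ("de", ["frische", "Milch"]),
    "d2": ("fr", ["du", "lait"]),
    "d3": ("en", ["fresh", "bread"]),
}
found = cross_language_search("en", ["milk"], documents)
```

An English query for “milk” thus retrieves the German and French documents, each of which could be shown in its original language.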
In some implementations, not only one or more indices can be used for the search, but also formal models positing knowledge about one or more domains. For example, ontologies can be used as domain models. An ontology may include, among other things, the set of concepts and entities related to the subject area and the relationships between them. They are used for domain modeling and logical deduction.
A semantic structure of the query is constructed (1430), which, along with the syntactic structure of the sentence, can be used for formal conclusions from data stored in ontologies (1480). For example, in response to the question “Who won at the Rome Olympics in fencing?” a result of the search, in particular, may be the sentence “Petrov became the Olympic champion in fencing in 1960.” The unification of semantic structures is produced by transformational rules, rules for ontologies, and mapping of ontological data about the fact that “Rome hosted the Olympics in 1960,” “in 1960 the Olympics were held in Rome,” “Roman Olympics”=“the Olympic Games in Rome,” “win the Olympics”=“to become an Olympic champion.” And also “the Olympic Games in Rome”=“the Summer Olympic Games of 1960.”
The ontology may include descriptions of, for example, entities, classes (concepts), attributes, relationships, and ontological facts. Entities, or objects, are instances of classes and represent the basic level of concepts. Classes may represent sets, collections, concepts, classes in the programming sense, object types, or kinds (varieties) of things. Examples of classes may be Person (persons), Geographical Object (geographic objects), Company (companies), Organization (organizations), Numerical Value (numerical values), etc.
Attributes express aspects, properties, features, characteristics, or parameters that objects or classes can have. Relationships express how entities and classes are related to one another. Some events may cause changes in attributes or relationships. Ontologies are encoded using special ontology languages.
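One possible in-memory representation of such an ontology is sketched below. The `Ontology` and `Entity` types, their methods, and the sample entities are hypothetical and serve only to show how classes, entities, attributes, and relationships relate to one another; treating Company as a subclass of Organization is an assumption of the sketch and is not stated in this description.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An instance of a class, with its attribute values."""
    name: str
    cls: str
    attributes: dict = field(default_factory=dict)


class Ontology:
    def __init__(self):
        self.parents = {}       # class name -> parent class name (or None)
        self.entities = {}      # entity name -> Entity
        self.relations = set()  # (subject, relation, object) triples

    def add_class(self, name, parent=None):
        self.parents[name] = parent

    def add_entity(self, name, cls, **attributes):
        if cls not in self.parents:
            raise KeyError(f"unknown class: {cls}")
        self.entities[name] = Entity(name, cls, dict(attributes))

    def relate(self, subject, relation, obj):
        self.relations.add((subject, relation, obj))

    def is_a(self, cls, ancestor):
        """True if cls equals ancestor or descends from it."""
        while cls is not None:
            if cls == ancestor:
                return True
            cls = self.parents.get(cls)
        return False

    def instances_of(self, cls):
        """Names of all entities of the class or of any of its subclasses."""
        return sorted(e.name for e in self.entities.values()
                      if self.is_a(e.cls, cls))


# Hypothetical sample data.
onto = Ontology()
onto.add_class("Person")
onto.add_class("Organization")
onto.add_class("Company", parent="Organization")  # assumption of this sketch
onto.add_entity("Alice", "Person", occupation="engineer")
onto.add_entity("ExampleCorp", "Company")
onto.relate("Alice", "works_for", "ExampleCorp")
```

A lookup such as `onto.instances_of("Organization")` then also returns entities of the subclass Company, which mirrors the notion of a class taken together with all its descendants.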
These ontological objects, namely entities, classes (concepts), attributes, relationships, and ontological facts, may be included in the search query. Different approaches can be used to mark the words included in a query as ontological objects. For example, the query “Dakota” can designate a location, a person, an organization, etc. In one embodiment, such objects can be explicitly marked, for example “Dakota % person”. In another embodiment, if the option “Search the ontology” is enabled, the user can see a menu displaying choices of the type of ontological object, as indicated, for example, in
Taking into account the variations of semantic and syntactic structures and the data received from ontologies (1480), the system can generate a comprehensive or exhaustive semantic query against the indices (1460) of the corresponding text corpus (1470). Because the meaning (semantics) of the original question is fixed, during the search process (1440) the system looks for lexical meanings and semantic structures of sentences that contain answers to the question, taking into account alternative formulations of the question and the various syntactic and lexical variants of the answer. The search is also performed with reference to anaphoric and other referential links between sentences established during construction of the semantic structures of the sentences parsed in step 107 (
Each retrieved text block can be formally assessed (1450) according to the degree of relevance to the query. In particular, in one implementation, the relevancy assessment (1450) takes into account the indexing of lexical meanings with probability in the construction of the index (109), as described in the illustration of the stage (108) depicted in
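As a rough sketch of how probabilities stored in an index could enter a relevance estimate, consider the following. The description states only that the relevancy assessment takes the indexed probabilities of lexical meanings into account; the particular scoring rule used here (the product of the probabilities of all queried meanings), the index layout `INDEX`, and the function `rank` are assumptions made for illustration.

```python
import math

# Hypothetical index: lexical meaning -> {fragment id: probability that the
# fragment contains the word in exactly this meaning}.
INDEX = {
    "page:ATTENDANT": {"f1": 0.9, "f2": 0.1},
    "queen:MONARCH": {"f1": 0.8, "f2": 0.7, "f3": 0.95},
}


def rank(index, meanings):
    """Rank fragments containing all queried meanings by an assumed score."""
    if not meanings:
        return []
    # Only fragments indexed under every queried meaning are candidates.
    candidates = set.intersection(*(set(index.get(m, {})) for m in meanings))
    # Assumed score: product of the per-meaning probabilities.
    scored = [(math.prod(index[m][f] for m in meanings), f) for f in candidates]
    # Highest score first; ties are broken by fragment id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(f, score) for score, f in scored]
```

For the query meanings `["page:ATTENDANT", "queen:MONARCH"]`, fragment `f1` ranks above `f2`, because the index records a much higher probability that “page” in `f1` carries the attendant meaning; `f3` is not a candidate at all, since it is not indexed under that meaning.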
The computer platform (1500) may also include a number of input and output ports for receiving and transferring information. For interaction with a user, the computer platform (1500) may contain one or more input devices (such as a keyboard, a mouse, a scanner, and so forth) and a display device (1508) (such as a liquid crystal display). The computer platform (1500) may also have one or more mass storage devices (1510), such as an optical disk drive (CD, DVD, or other), a hard disk, or a tape drive. In addition, the computer platform (1500) may have an interface with one or more networks (1512) that provide connections with other networks and computer equipment. In particular, this may be a local area network (LAN) or a wireless Wi-Fi network, which may or may not be connected to the World Wide Web (Internet). It is understood that the computer platform (1500) includes appropriate analog and/or digital interfaces between the processor (1502) and each of the components (1504, 1506, 1508, 1510 and 1512).
The computer platform (1500) is managed by the operating system (1514) and includes various applications, components, programs, objects, modules, and the like, collectively designated by the reference number 1516.
The programs used to implement the disclosed methods may be a part of an operating system or may be a specialized application, component, program, dynamic library, module, script, or a combination thereof. The disclosed methods and systems are not limited to the hardware described above.
Implementations of the subject matter and the operations described in this specification can be implemented in digital electronic circuitry, or in computer software embodied on a tangible medium, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on one or more computer storage media for execution by, or to control the operation of, data processing apparatus. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus. A computer storage medium can be, or be included in, a computer-readable storage device, a computer-readable storage substrate, a random or serial access memory array or device, or a combination of one or more of them. Moreover, while a computer storage medium is not a propagated signal, a computer storage medium can be a source or destination of computer program instructions encoded in an artificially-generated propagated signal. The computer storage medium can also be, or be included in, one or more separate components or media (e.g., multiple CDs, disks, or other storage devices). Accordingly, the computer storage medium may be tangible.
The operations described in this specification can be implemented as operations performed by a data processing apparatus on data stored on one or more computer-readable storage devices or received from other sources.
The terms “client” and “server” include all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, a system on a chip, or multiple ones, or combinations, of the foregoing. The apparatus can include special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can also include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, a cross-platform runtime environment, a virtual machine, or a combination of one or more of them. The apparatus and execution environment can realize various different computing model infrastructures, such as web services, distributed computing and grid computing infrastructures.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, object, or other unit suitable for use in a computing environment. A computer program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform actions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing actions in accordance with instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device (e.g., a universal serial bus (USB) flash drive), to name just a few. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
To provide for interaction with a user, implementations of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube), LCD (liquid crystal display), OLED (organic light emitting diode), TFT (thin-film transistor), plasma, other flexible configuration, or any other monitor for displaying information to the user and a keyboard, a pointing device, e.g., a mouse, trackball, etc., or a touch screen, touch pad, etc., by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending webpages to a web browser on a user's client device in response to requests received from the web browser.
Implementations of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), an inter-network (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks).
The features disclosed herein may be implemented on a smart television module (or connected television module, hybrid television module, etc.), which may include a processing circuit configured to integrate Internet connectivity with more traditional television programming sources (e.g., received via cable, satellite, over-the-air, or other signals). The smart television module may be physically incorporated into a television set or may include a separate device such as a set-top box, Blu-ray or other digital media player, game console, hotel television system, or other companion device. A smart television module may be configured to allow viewers to search and find videos, movies, photos and other content on the web, on a local cable TV channel, on a satellite TV channel, or stored on a local hard drive. A set-top box (STB) or set-top unit (STU) may include an information appliance device that may contain a tuner and connect to a television set and an external source of signal, turning the signal into content which is then displayed on the television screen or other display device. A smart television module may be configured to provide a home screen or top level screen including icons for a plurality of different applications, such as a web browser and a plurality of streaming media services, a connected cable or satellite media source, other web “channels”, etc. The smart television module may further be configured to provide an electronic programming guide to the user. A companion application to the smart television module may be operable on a mobile computing device to provide additional information about available programs to a user, to allow the user to control the smart television module, etc. In alternate embodiments, the features may be implemented on a laptop computer or other personal computer, a smartphone, other mobile phone, handheld computer, a tablet PC, or other computing device.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any inventions or of what may be claimed, but rather as descriptions of features specific to particular implementations of particular inventions. Certain features that are described in this specification in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be changed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product embodied on a tangible medium or packaged into multiple such software products.
Thus, particular implementations of the subject matter have been described. Other implementations are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking or parallel processing may be utilized.
This application is a continuation-in-part of U.S. patent application Ser. No. 13/173,649 filed on Jun. 30, 2011, entitled “METHOD AND SYSTEM FOR SEMANTIC SEARCHING OF NATURAL LANGUAGE TEXTS” and U.S. patent application Ser. No. 13/173,369, filed on Jun. 30, 2011, entitled “METHOD AND SYSTEM FOR SEMANTIC SEARCHING”, which are continuations-in-part of U.S. patent application Ser. No. 12/983,220, filed on Dec. 31, 2010, entitled “Method and System for Semantic Searching”, which is a continuation of U.S. patent application Ser. No. 11/548,214, filed on Oct. 10, 2006, entitled “METHOD AND SYSTEM FOR ANALYZING VARIOUS LANGUAGES AND CONSTRUCTING LANGUAGE-INDEPENDENT SEMANTIC STRUCTURES”, now U.S. Pat. No. 8,078,450, which is currently co-pending, or is an application of which a currently co-pending application is entitled to the benefit of the filing date, the disclosure of which is incorporated herein by reference. This application claims priority under 35 USC 119 to Russian patent application 2013132622, filed Jul. 15, 2013, the disclosure of which is incorporated herein by reference.
Number | Date | Country |
---|---|---|
20140114649 A1 | Apr 2014 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 11548214 | Oct 2006 | US |
Child | 12983220 | | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 13173649 | Jun 2011 | US |
Child | 14142701 | | US |
Parent | 13173369 | Jun 2011 | US |
Child | 13173649 | | US |
Parent | 12983220 | Dec 2010 | US |
Child | 13173369 | | US |