This disclosure relates to term expansion. More specifically, the disclosure relates to determining semantically equivalent terms for use within a computational model.
There are over 500 billion gigabytes of digital information in the world today. Starting in 2010, the total amount of digital information in existence will begin to increase exponentially. No one human is capable of reviewing this information, much less making sense of it. No matter the domain of interest, humans cannot be expected to find the nuggets of critical information in this sea of data, information, and knowledge. Complicating matters is that in today's information society, data, information, and knowledge are often distributed across vast computer networks.
As a result of this ever-growing sea of data and the distribution thereof, there is a need for computer-based information technology (“IT”) applications that can sift through huge amounts of digital data to find content that is current, relevant, and contextually appropriate. The goal of any such IT system is to assist a human user, or in some cases a digital agent representing a human user, in quickly discovering relevant data, information, and knowledge that would be impossible to discover by human effort alone due to the extremely large data sets, knowledge stores, and associated computer networks.
The need for processing large amounts of digital data is especially acute in the area of national security. We are faced today with increasing threats from adversaries around the world. The solemn task of protecting against future attacks rests with the world's intelligence agencies. Intelligence agencies are constantly investigating potential threats so that any adversarial activities can be timely thwarted. In doing so, agencies must process large volumes of information in order to uncover any hints, clues, or insights about potential attacks. These agencies need vastly improved IT systems so they can effectively and timely “connect the dots” and ensure that any opportunity to thwart a planned attack is not lost.
But the need to process large amounts of digital data is not exclusive to intelligence agencies. The need arises in a wide variety of fields. These fields include, for example, medicine and epidemiology. A large percentage of the information currently stored on today's computers relates to medical records. Health agencies have a continuing need for a more effective means to review and make sense of this information. The ability of health care workers to meaningfully review data on emerging diseases would help in anticipating future epidemics and pandemics. This, in turn, would lead to the timely production of vaccines.
Ultimately, there is a growing need in many different fields for improved IT systems that allow human users to systematically review large data sets or knowledge stores in order to obtain information that is relevant, timely, and contextually appropriate.
The disclosure provides both a system and a method for expanding variables within a computational model. The computational model, which can be a Bayesian-network, includes input and output variables that are interrelated via a conditional probability table. Term expansion is accomplished via a lexical database and a logic engine to determine semantic equivalents that are relevant to the computational model. The expanded terms allow the computational model to be related to instance data, which may be in the form of a dynamic ontology. Input variable expansion permits the computational model to be populated with semantically relevant instance data from the ontology, and output variable expansion permits the computational model to be associated with semantically relevant ontology nodes.
The disclosed system has several important advantages. For example, the system permits term expansion to locate semantically equivalent and logically relevant terms.
The term expansion disclosed herein permits users to populate computational models with relevant instance data.
A further possible advantage is the ability to expand output terms within a computational model to allow the model to be linked with relevant nodes within a dynamic ontology.
Yet another possible advantage is the creation of a system of term mapping whereby expanded terms can be linked to their associated computational models and variables.
The present system permits term expansion to be carried out systematically and without the need for a human operator.
Various embodiments of the invention may have none, some, or all of these advantages. Other technical advantages of the present invention will be readily apparent to one skilled in the art.
For a more complete understanding of the present disclosure and its advantages, reference is now made to the following descriptions, taken in conjunction with the accompanying drawings, in which:
The present disclosure relates to a system and method for expanding variables within a computational model. The computational model, which can be a Bayesian-network, includes input and output variables that are interrelated via a conditional probability table. Term expansion is accomplished via a lexical database and a logic engine to determine semantic equivalents that are relevant to the computational model. The expanded terms allow the computational model to be related to instance data, which may be in the form of a dynamic ontology. Input variable expansion permits the computational model to be populated with semantically relevant instance data from the ontology, and output variable expansion permits the computational model to be associated with semantically relevant ontology nodes.
In the illustrated example, two input variables 22, “ΔDate” and “ΔLocation,” are related to a single output variable 24, “Weapons Smuggling Event.” The input variables 22 are related to other events by the CPT. In this example, the CPT specifies the probability of a Weapons Smuggling Event if a Militia Training Event and a Military Convoy Event occur (note
A more detailed discussion of this computational model 20 and the associated ontology is contained in co-pending and commonly owned U.S. patent application Ser. No. 12/748,514 filed on Mar. 29, 2010 and entitled “System and Method for Predicting Event Via Dynamic Ontologies.” The contents of this co-pending application are fully incorporated herein for all purposes.
The computational model 20 must be populated with instance data from actual events. This instance data can be collected over time and stored in a knowledge base or data center. In one non-limiting example, the instance data is formatted into a dynamic ontology 26, such as the ontology illustrated in
The disclosed system is described next in connection with
The client may likewise communicate with the central server over a network. As used herein, the term network refers to wireless or wireline communication that can be carried out via any number of known protocols, including, but not limited to, Internet Protocol (IP), Wireless Application Protocol (WAP), Frame Relay, or Asynchronous Transfer Mode (ATM). Any other suitable protocols using voice, video, data, or combinations thereof, can also be employed. The network may include one or more local area networks (LANs), radio access networks (RANs), metropolitan area networks (MANs), wide area networks (WANs), and/or all or a portion of the global computer network known as the Internet, and/or any other communication system or systems at one or more locations.
The central server may include a series of one or more modules or logic engines, which may be in the form of programs or subroutines running on the central server. The embodiment disclosed in
The extracted terms are then sent to expansion module 52, where various semantic equivalents are determined. This is achieved by calling upon a lexical database 58 that groups nouns, verbs, adjectives, and adverbs into sets of cognitive synonyms. One suitable lexical database is WordNet®, which is maintained by Princeton University. Information regarding WordNet® can be found at http://wordnet.princeton.edu/ (last visited Dec. 27, 2010). Other currently available term expanders are also suitable, such as the semantic reverse query expansion (SRQE) system from Raytheon Company (“Express Sense”). The lexical database 58 returns a series of candidate terms based upon the extracted terms submitted. Thereafter, expansion module 52 reviews the candidate terms and determines the appropriate word sense. For example, if the term “weapon” is returned by extraction module 48, lexical database 58 may return various candidate terms, such as “gun,” “bomb,” or “firearm.” Some of the candidate terms may have more than one word sense. For instance, expansion module 52 may have to differentiate “bomb” as used to describe an explosive device from “bomb” as used to describe an event that fails badly. Candidate terms that do not match the appropriate word sense are discarded. Expansion module 52 can be used to further determine appropriate “nyms” for any semantically equivalent terms. Nyms include, but are not limited to, hypernyms, holonyms, hyponyms, meronyms, acronyms, synonyms, verb participles, troponyms, entailments, and coordinate terms. “Expanded terms” as used hereinafter includes terms returned by the lexical database and having the appropriate word sense, as well as any associated nyms.
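The candidate retrieval and word-sense filtering described above can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the `Sense` structure, the `LEXICON` and `SENSES` tables, the `topics` tags, and the `expand` function are all hypothetical stand-ins for lexical database 58 and expansion module 52, and the entries are invented for the “weapon” example. A real system would query WordNet® or another term expander instead of an in-memory table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """One word sense of a candidate term (hypothetical structure)."""
    term: str
    gloss: str
    topics: frozenset  # illustrative topic tags used to pick a sense


# Toy stand-in for lexical database 58; all entries are assumptions.
LEXICON = {"weapon": ["gun", "bomb", "firearm"]}

SENSES = {
    "gun": [Sense("gun", "a weapon that discharges a projectile",
                  frozenset({"weapon"}))],
    "firearm": [Sense("firearm", "a portable gun", frozenset({"weapon"}))],
    "bomb": [
        Sense("bomb", "an explosive device",
              frozenset({"weapon", "explosive"})),
        Sense("bomb", "an event that fails badly", frozenset({"failure"})),
    ],
}


def expand(term, context_topics):
    """Return the candidate senses whose topics overlap the model context.

    Senses that share no topic with the context are discarded, which
    mirrors dropping candidates with the wrong word sense.
    """
    kept = []
    for candidate in LEXICON.get(term, []):
        for sense in SENSES.get(candidate, []):
            if sense.topics & context_topics:
                kept.append(sense)
    return kept


expanded = expand("weapon", frozenset({"weapon"}))
```

Under these assumed tables, `expanded` holds the senses for “gun,” the explosive sense of “bomb,” and “firearm,” while the “fails badly” sense of “bomb” is dropped. Matching on shared topic tags is only one simple way to approximate word-sense selection.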
The relevance of the expanded terms can be further verified via logic engine 54. This is accomplished by comparing the expanded terms to the remaining terms in computational model 20. By comparing the expanded terms to the terms associated with the other input and output variables (22 and 24), the validity of the expanded terms can be verified. Any expanded terms that do not logically fit with the remaining terms are discarded as invalid. Commercially available logic engines can be employed in this step.
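As a rough illustration of this verification step, the sketch below keeps an expanded term only if it is compatible with at least one of the remaining terms in the model. The `COMPATIBLE` fact set, the `validate` function, and the sample terms are hypothetical; a commercially available logic engine would apply far richer reasoning than this pairwise lookup.

```python
# Hypothetical background facts: unordered pairs of terms assumed to fit
# together logically. A real logic engine would derive such judgments.
COMPATIBLE = {
    frozenset({"gun", "convoy"}),
    frozenset({"firearm", "convoy"}),
    frozenset({"gun", "militia"}),
}


def validate(expanded_terms, remaining_terms):
    """Keep expanded terms that fit at least one remaining model term."""
    valid = []
    for term in expanded_terms:
        fits = any(
            frozenset({term, other}) in COMPATIBLE
            for other in remaining_terms
        )
        if fits:
            valid.append(term)  # terms with no compatible partner are dropped
    return valid


# "flop" has no compatible partner among the remaining terms.
result = validate(["gun", "firearm", "flop"], ["militia", "convoy"])
```

Here `result` retains “gun” and “firearm” and discards “flop,” which corresponds to discarding expanded terms that do not logically fit with the other variables.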
The final module is a mapping module 56 that maps the expanded terms to the computational model 20 and variables (22 and 24) from which the expanded terms were obtained. More specifically, the validated semantic equivalents obtained from the logic engine 54 are linked to the input and output variables (22, 24) from the B-net 20 from which they were obtained. This mapping is carried out by way of the previously extracted URI data contained in the ontologies under evaluation, which is stored in URI registry 62 (note
The mapping information utilizes a binding of the system's choice (XML, RDF, RDFS, OWL Lite, OWL, OWL Full, KIF, DAML, OIL, DAML+OIL, etc.). The mapping information stored for each term representation includes: 1) the unique ID of the B-Net, and 2) the unique ID of the variables in a CPT of that B-Net. The unique ID for a B-Net is obtained by extracting the URI of the B-Net contained in a registry. The unique ID for a term that represents a variable in a CPT is obtained by extracting the URI of the term in a registry. Semantically equivalent terms contained in the onomasticon can be used by the B-Net and CPT when formulating queries or when mediating terms in a CPT, and an existing ontology model such as ontology 26 in
Referencing the data in onomasticon 64 permits expansion of both the input and the output variables (22 and 24) in the computational model. The input variables can be expanded in order to permit the input variables to be populated with semantically equivalent and logically relevant instance data from the ontological models 26. More specifically, if terms for the input variables 22 are known, equivalent terms from the Key Concept Nodes 32 can be used as semantically equivalent Key Concept Nodes 32. This is illustrated in
Likewise, expanding the terms associated with the output variable 24 permits output data to be more productively used. It also permits Key Concept Nodes 32 to be connected to semantically equivalent and logically relevant Event Nodes 36. For instance, in the example illustrated in
The method associated with the present invention is illustrated with reference to
An alternative methodology to expand term(s) that represent input variables in a CPT includes the following steps: 1) extract the term(s) representing the input variable(s) in a conditional probability table; 2) submit the extracted term(s) (for example, “location”) to a term expander to determine a word sense; 3) determine the word sense from the senses returned; 4) obtain “nyms” for the extracted term(s) if they exist (nyms include hypernyms, holonyms, hyponyms, meronyms, verb participles, troponyms, entailments, and coordinate terms); 5) reason about the nyms' suitability as semantically equivalent term(s) to the input variable term(s); 6) extract the B-Net URI; 7) extract the input variable URI; and 8) update the onomasticon with the verified terms and mapping information.
An alternative methodology to expand term(s) that represent output variables in a CPT includes the following steps: 1) extract the term(s) representing the output variable(s) in a conditional probability table; 2) submit the extracted term(s) (for example, “weapon”) to a term expander to determine a word sense; 3) determine the word sense from the senses returned; 4) obtain nyms for the extracted term(s) if they exist (i.e., hypernyms, holonyms, hyponyms, meronyms, verb participles, troponyms, entailments, and coordinate terms); 5) reason about the nyms' suitability as semantically equivalent term(s) to the output variable term(s); 6) extract the B-Net URI; 7) extract the output variable URI; and 8) update the onomasticon with the verified terms and mapping information.
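The eight steps above can be sketched end to end for a single input variable. Everything named here is an assumption made for illustration: the `Mapping` record, the `URI_REGISTRY` layout, the `example.org` URIs, the nym table in `lookup_nyms`, the rejection rule in `is_suitable`, and the dictionary used as the onomasticon. The disclosure does not prescribe these structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Mapping information: IDs of the B-Net and of the CPT variable."""
    bnet_uri: str
    variable_uri: str


# Hypothetical URI registry; the URIs are placeholders, not real resources.
URI_REGISTRY = {
    "bnet": "http://example.org/bnet/weapons-smuggling",
    "variables": {
        "location": "http://example.org/bnet/weapons-smuggling#location",
    },
}


def lookup_nyms(term):
    """Steps 2-4: stand-in for a term expander returning nyms for a term."""
    table = {"location": ["place", "site", "placement"]}  # assumed entries
    return table.get(term, [])


def is_suitable(nym, variable_term):
    """Step 5: stand-in for reasoning about a nym's suitability."""
    rejected = {"placement"}  # assumed to carry the wrong sense here
    return nym not in rejected


def expand_variable(variable_term, registry, onomasticon):
    """Run steps 1-8 for one variable term and update the onomasticon."""
    nyms = lookup_nyms(variable_term)                      # steps 2-4
    verified = [n for n in nyms if is_suitable(n, variable_term)]  # step 5
    mapping = Mapping(
        bnet_uri=registry["bnet"],                         # step 6
        variable_uri=registry["variables"][variable_term],  # step 7
    )
    for term in verified:                                  # step 8
        onomasticon.setdefault(term, set()).add(mapping)
    return onomasticon


onomasticon = expand_variable("location", URI_REGISTRY, {})
```

With these assumed inputs, the onomasticon ends up with entries for “place” and “site,” each linked back to the B-Net and variable URIs. The same function would serve output variables such as “weapon” once the registry and nym table contain matching entries.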
Although this disclosure has been described in terms of certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure.