The invention pertains to drawing information from various kinds of databases, and particularly the invention pertains to organizing the information. More particularly, the invention pertains to discovery of further information from organizing it.
The invention is a system for obtaining data from various sources. The data may be organized into different types of cluster sets. Each cluster set may have one or more clusters of related items. Elements of various kinds may be pulled from the data. The elements may be put together into one or more clusters for each kind of elements. The clusters may be refined relative to one another and in view of integrated properties of the clusters together. Elements may be added or removed from the clusters during refinement. Examples of the elements may be people and events. Examples of clusters of such elements may be groups and goals, respectively.
One way to understand the huge amount of data available from current sources (e.g., the internet, C4ISR video and text data, auto-collected cyber-security data, and so on) is to organize data into groups of related items (a.k.a., cluster sets). Cluster sets can model a wide range of real-world networks such as hostile collaboration networks or social networks, news stories on a given topic, related commercial items that have some security implication, and so forth. Fast, scalable and effective cluster set discovery can improve situational awareness as well as cyber and physical security.
Because of a disparity of the data sources, it is important to understand who the actors in the environment are (i.e., node disambiguation) and what goals they are pursuing. Because node disambiguation is a challenge, data from multiple sources may be tied together with quantitative probabilities (at best), qualitative probabilities (still useful), similarity values (difficult to integrate), and/or no measure of confidence.
There may be a chicken-and-egg issue. To improve results, goal-based event analysis may inform group discovery. It may enable discovery of “disconnected” members of the group who regularly contribute to completion of shared goals (e.g., a dead drop participant). Similarly, group analysis should inform goal discovery. It may enable discovery of goals for which the associated events are distributed across the different members of a group. So an issue is which comes first, that is, the discovery of the groups or discovery of the goals.
Multi-way network analysis may be effected. An observation is that people working towards the same set of goals may leave a patterned event signature, since the events needed to achieve these goals may be similarly distributed over the set of people working towards them.
A present solution may be based on the observation that people groups and goals need to be discovered simultaneously while also informing each other. A multi-way approach may incorporate the following. One may start with a single group containing all people and many singleton event clusters. Then one may iteratively split people groups and merge event clusters, while conditioning each splitting/joining step on the other. An extension of the multi-way clustering approach may, for instance, be empirically shown to improve the clustering quality of documents in an information retrieval domain.
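The alternation of merging event clusters and splitting people groups may be illustrated with a minimal sketch. It is a toy under stated assumptions, not the claimed method: the input mapping `attendance`, the `min_overlap` threshold, and the use of Jaccard overlap as the merge criterion are all hypothetical choices made for illustration.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def multiway_cluster(attendance, min_overlap=0.5):
    """attendance (hypothetical input) maps each event to the set of people tied to it."""
    people = set().union(*attendance.values())
    groups = [people]                       # start: one group holding everyone
    clusters = [{e} for e in attendance]    # start: one singleton cluster per event

    def participants(cluster):
        return set().union(*(attendance[e] for e in cluster))

    while len(clusters) > 1:
        # Merge step: join the two event clusters whose participants overlap most.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: jaccard(participants(clusters[p[0]]),
                                         participants(clusters[p[1]])))
        if jaccard(participants(clusters[i]), participants(clusters[j])) < min_overlap:
            break                           # nothing left that is similar enough
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]

        # Split step, conditioned on the merge: divide any group that the merged
        # cluster's participants cut into two non-empty parts.
        inside = participants(merged)
        new_groups = []
        for g in groups:
            part = g & inside
            new_groups.extend([part, g - part] if part and part != g else [g])
        groups = new_groups
    return groups, clusters
```

In this sketch each merge of event clusters immediately drives a candidate split of the people groups, which is one simple way to let the two clusterings inform each other. A real system would likely replace the Jaccard criterion with a mutual clustering quality measure.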
The present approach may be summarized in the following. First, there may be clustering events into goals. The clustering may be guided by an ontology of goals. That may mean to group only those events that can satisfy a goal. Second, one should find the most likely set of goals being satisfied. Events from the same goal cluster may connect people nodes suggesting that those people belong to the same social group (i.e., working towards a common goal).
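The first two steps may be sketched as follows. The ontology contents, the `(event_id, event_type, participants)` tuple layout, and the function names are hypothetical and serve only to illustrate ontology-guided grouping and the people links it implies.

```python
from collections import defaultdict
from itertools import combinations

def cluster_events_by_goal(events, ontology):
    """Group events under each goal they can satisfy.

    events: list of (event_id, event_type, participants) tuples (assumed layout).
    ontology: dict mapping a goal to the set of event types that can satisfy it.
    Events whose type satisfies no goal are left unclustered.
    """
    clusters = defaultdict(list)
    for event_id, event_type, _participants in events:
        for goal, event_types in ontology.items():
            if event_type in event_types:
                clusters[goal].append(event_id)
    return dict(clusters)

def link_people_by_goal(events, clusters):
    """Connect every pair of people involved in events of the same goal cluster."""
    participants = {event_id: people for event_id, _type, people in events}
    links = set()
    for event_ids in clusters.values():
        involved = set().union(*(participants[e] for e in event_ids))
        links.update(combinations(sorted(involved), 2))
    return links
```

Because an event type may appear under several goals in the ontology, an event may land in more than one goal cluster, which matches the observation that an event may have several explanations.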
Third, one may partition a network of people into social groups. One may find or seek groups of people working towards the same goal while simultaneously disambiguating their identities. The finding or seeking may be based on node similarity and/or equivalence probability ties (node disambiguation), and based on social relationships (common goal recognition). Groups of people may be connected through events suggesting that those events are used to satisfy the same goal. Once social groups are determined, one may disambiguate nodes within the social groups based on “similarity” measures but also informed by group membership. This is because any two given actors are more likely to be the same person if they are both acting towards the same goal compared to if they are acting towards different or conflicting goals. The latter may result in a smaller issue to solve.
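The effect of group membership on node disambiguation may be illustrated with Bayes' rule in odds form. This is a sketch only: the prior values and the treatment of the similarity measure as a likelihood ratio are assumptions introduced for illustration, not values from the present system.

```python
def same_person_probability(similarity_lr, same_group,
                            prior_same_group=0.2, prior_diff_group=0.02):
    """Posterior probability that two observed actors are one person.

    similarity_lr: likelihood ratio of the observed similarity (assumed input),
                   i.e. P(similarity | same person) / P(similarity | different people).
    same_group:    True if both actors were placed in the same social group.
    The two priors are hypothetical; they only encode that a match is more
    plausible between actors working towards the same goal.
    """
    prior = prior_same_group if same_group else prior_diff_group
    posterior_odds = prior / (1.0 - prior) * similarity_lr
    return posterior_odds / (1.0 + posterior_odds)
```

With identical similarity evidence, the pair sharing a group receives the higher posterior, so disambiguation within a group needs to consider fewer plausible candidate matches.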
Goals and intents of actors and groups may be determined from a wide range of data sources. One may cluster events to result in goals and cluster people to result in groups.
A summary of relevant experience may be noted. One area of focus may include node disambiguation and group detection. This area may involve who the actors are in the environment and a video surveillance domain. There may be multi-objective graph partitioning for node disambiguation and group discovery. Another area may include activity detection. A question may be what the goals, being achieved by actors, are in the environment. A Scyllarus tool may provide goal-centric reasoning in the cyber network domain (noted herein).
A system may discover intents of actors and groups from multi-modal data. Multi-modal data may be from a wide range of sources which incorporate video, internet, reports of interviews, observations, investigations, documents, and so on. For instance, the actors may be people who want to attack the U.S. or not attack it. These actors may be clustered into groups that have a common intent. There may be two groups which arise from such a situation.
Events that are documented in multi-modal data may be clustered into goals. One goal may be to attack the U.S. Examples of events may be an attack on a U.S. Army unit, missiles hitting a U.S. embassy, and a U.S. radio station being jammed. One or more of these events could be clustered into another goal, e.g., jamming. One or more people may likewise be clustered into more than one group.
The system may refine or improve the group or goal clustering. It may iteratively refine a group or goal by taking the other type of cluster sets into account. For instance, one may take the goal of driving U.S. forces out of a foreign country. This goal may be held by a group of people. However, data may show a person who contributes to the goal but is not in the group or has no contact with the group. This person may nevertheless be put into the group (i.e., clustered with it).
In another way, a person of a group who is not contributing to a goal of the group can be removed from the group. Events or occurrences may have several explanations, resulting in their being associated with several goals. This may be regarded as goal-based event analysis leading to group discovery. One may look to the intent of the actors of the events or occurrences to determine the goal and the corresponding group having that goal. In another way, a discovery of goals for which associated events are distributed across various members of a group may be regarded as a group analysis that informs goal discovery.
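The add and remove moves of group refinement may be sketched as follows. The function name, the `participation` mapping, and the `min_events` threshold for admitting a new member are hypothetical; the sketch shows one pass of refining a group against its goal cluster.

```python
def refine_group(members, goal_events, participation, min_events=2):
    """Refine one group against the events clustered under its goal.

    members:       set of people currently in the group.
    goal_events:   set of event ids clustered under the group's goal.
    participation: dict mapping each person to the set of events tied to them.
    min_events:    assumed threshold; a non-member must contribute to at least
                   this many goal events before being added.
    Returns (refined_members, added, removed).
    """
    def contributions(person):
        return len(participation.get(person, set()) & goal_events)

    # Add outsiders who contribute regularly, even with no contact with the group.
    added = {p for p in participation
             if p not in members and contributions(p) >= min_events}
    # Remove members who contribute to none of the goal's events.
    removed = {p for p in members if contributions(p) == 0}
    return (members | added) - removed, added, removed
```

Using a stricter bar for adding than for keeping a member is a design choice in this sketch; it avoids a person oscillating in and out of a group on a single event.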
Network analysis as indicated herein may include group activity detection. A modularity measure may express the difference between the actual and expected interactions/events of individuals within each social group. The measure may be shown to be a superior heuristic, relative to cut size, for identifying groups of people. The present approach may provide good scalable modularity-based partitioning algorithms. A previous approach may be one or more orders of magnitude slower than cut-based partitioning for a data set with 10,000 nodes. The present approach may handle uncertainty regarding node disambiguation, in that uncertainty-tolerant formulations for key clustering algorithms may be developed. The present multi-objective optimization framework may account for a similarity of tracks (to identify actors in the environment) and a level of activity with each group of individuals working towards the same goal.
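A modularity measure of the kind described above may be computed as in the following sketch. It uses the widely known Newman form for an unweighted, undirected graph; that the present approach uses exactly this form is an assumption, and the function and parameter names are hypothetical.

```python
from collections import defaultdict

def modularity(edges, group_of):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges:    list of (u, v) pairs, one per interaction between two people.
    group_of: dict mapping each person to a group label.
    For each group, the actual fraction of edges inside the group is compared
    with the fraction expected if edges were placed at random with the same
    node degrees.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)    # edges with both ends in the group
    degree = defaultdict(int)   # sum of node degrees in the group
    for u, v in edges:
        degree[group_of[u]] += 1
        degree[group_of[v]] += 1
        if group_of[u] == group_of[v]:
            intra[group_of[u]] += 1
    return sum(intra[g] / m - (degree[g] / (2 * m)) ** 2 for g in degree)
```

A partition that places densely interacting people together scores above zero, while placing everyone in a single group scores exactly zero, which is why the measure can discriminate between candidate groupings where cut size alone cannot.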
There may be reasoning over disparate sources. Networks may exist to transfer, aggregate, coordinate, or destroy information, physical assets, money, and so on, via relationships/transactions that vary in type (e.g., digital or physical), direction, size, frequency, and so forth, between entities such as individuals, organizations, legal structures, and so on, that have goals such as shared/conflicting, and so on. An ontology may link these elements and allow reasoning over static/dynamic network information, common or conflicting goals, common owners/actors, shared assets, and more.
Models exist that may be unified to incorporate, but not be limited to, cyber network attack detection, and transportation and financial networks. Goals may be an essential unifying element in that they naturally cross-domain and are temporally persistent, more so than agents, individuals and organizations. Diverse groups may cooperate and/or compete around goals.
Groups 16, 17, 18 and 19 may result from clustering of people according to location, profession, social organization, and financial relationship, respectively. Other criteria may be used as a basis for clustering people. Goals 21, 22 and 23 may result from clustering of events according to attacking the U.S., raising money for a charity, and building a financial business, respectively. Other criteria may likewise be used as a basis for clustering events. The groups and goals may form a grid resulting in a 2-dimensional matrix 27.
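A groups-by-goals grid of this kind may be represented as in the following sketch. What each cell of matrix 27 holds is not stated, so the choice here, a count of person-event associations linking a group to a goal, is an assumption, as are the function and parameter names.

```python
def group_goal_matrix(groups, goals, associations):
    """Build a groups-by-goals grid of association counts.

    groups:       dict mapping a group name to its set of people.
    goals:        dict mapping a goal name to its set of events.
    associations: iterable of (person, event) pairs.
    A person may sit in several groups and an event under several goals, in
    which case one association is counted in several cells.
    """
    matrix = {g: {q: 0 for q in goals} for g in groups}
    for person, event in associations:
        for g, members in groups.items():
            if person not in members:
                continue
            for q, events in goals.items():
                if event in events:
                    matrix[g][q] += 1
    return matrix
```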
There may be optimization of groups with a movement of people from one group to another as indicated by lines 24. For instance, a mother of children who are terrorists may not be a terrorist herself. She may be moved from the group into which she was clustered, due to being a mother of some in the group, to another group, which may be a church organization. She may also be moved out to no group or into multiple groups simultaneously, depending upon her properties and the properties of the groups.
There may be an optimization of goals with a movement of events from one goal to another as indicated by lines 25. For instance, an event of raising money may be in the goal of raising the money for a charity when the money is actually being raised to support terrorists. The event may be moved from the goal of raising money for a charity to the goal of attacking the U.S. The event may also be moved out of all goals or moved into multiple goals simultaneously, depending upon the event and goal properties.
There may be integrated optimization. Groups and goals may be optimized relative to each other. People and events may both be changed as indicated by lines 26 to better refine the groups and corresponding goals. For example, if the mother of children who are terrorists has been associated with an event of raising money, then during integrated optimization she might be moved out of the terrorist group and, at the same time, the event of raising money might be moved out of the goal of raising money to support terrorism.
There may be groups with people or members who are associated with similar sets of events. Ideally, one would cluster events associated with people from meaningful sets of groups, and cluster groups with members who are associated with meaningful sets of events.
Groups may be clustered based upon their members that are associated with instances of events. This may result in meaningful classes of groups. Events may be clustered based upon aggregated people association with the meaningful classes of groups noted herein. This may result in meaningful classes of events. Groups may be clustered based upon aggregated membership associations of the meaningful classes of events noted herein. This may result in clusters of groups whose members are associated with meaningful sets of events.
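One way to score how meaningful a pairing of group classes and event classes is may be the mutual information between the two clusterings. The following sketch computes it from a table of association counts; the table layout (rows as classes of groups, columns as classes of events) and the function name are assumptions for illustration.

```python
import math

def mutual_information(table):
    """Mutual information, in bits, between the row and column clusterings.

    table: list of rows of non-negative counts; table[i][j] is the number of
           associations between group class i and event class j (assumed layout).
    """
    total = sum(sum(row) for row in table)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n:  # empty cells contribute nothing
                mi += (n / total) * math.log2(n * total / (row_sums[i] * col_sums[j]))
    return mi
```

A table in which each class of groups maps to its own class of events scores high, while a table in which every class of groups is spread evenly over all classes of events scores zero.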
Diagram 80 shows the incremental clustering solutions of people 84 and events 85 at, for example, three hierarchic levels 81, 82 and 83. If one discovers “high quality” groups of people, one can obtain “better quality” groups of events satisfying a common goal. At each step or level, one may maximize a mutual clustering quality measure similar to a mutual information measure in an information retrieval domain. Diagram 80 may be an instantiation of the approach shown in
An example of a tool which may provide goal-centric reasoning over cyber network ontology may be a computer network security tool (CNST). In a particular example, framework architecture may incorporate, use or otherwise be associated with a modified version of SCYLLARUS™ (Scyllarus) by Honeywell International Inc. (See U.S. patent application Ser. No. 12/547,415, filed Aug. 25, 2009.) Scyllarus may be regarded as a CNST. The CNST may be described and referred to herein in conjunction with the present approach and system. Other kinds of tools may be used as a CNST. As a particular example, the framework architecture may apply Bayesian logic to cyber events (such as network-based intrusion detection) and to events associated with other networks (such as non-computer networks) in order to cluster cyber events into goals. As another particular example, the framework architecture can be used to determine if two or more graphs are related, such as by using probabilities that various nodes in each graph are equivalent.
The following applications may be relevant. U.S. patent application Ser. No. 12/547,415, filed Aug. 25, 2009, and entitled “Framework for Scalable State Estimation Using Multi Network Observations”, is hereby incorporated by reference. U.S. patent application Ser. No. 12/369,692, filed Feb. 11, 2009, and entitled “Social Network Construction Based on Data Association”, is hereby incorporated by reference. U.S. patent application Ser. No. 12/187,991, filed Aug. 7, 2008, and entitled “System for Automatic Social Network Construction from Image Data”, is hereby incorporated by reference. U.S. patent application Ser. No. 12/124,293, filed May 21, 2008, and entitled “System Having a layered Architecture for Constructing a Dynamic Social Network from Image Data”, is hereby incorporated by reference.
In the present specification, some of the matter may be of a hypothetical or prophetic nature although stated in another manner or tense.
Although the present system has been described with respect to at least one illustrative example, many variations and modifications will become apparent to those skilled in the art upon reading the specification. It is therefore the intention that the appended claims be interpreted as broadly as possible in view of the prior art to include all such variations and modifications.