GRAPH-BASED CONDITION IDENTIFICATION

Information

  • Patent Application
  • 20240394393
  • Publication Number
    20240394393
  • Date Filed
    August 24, 2022
  • Date Published
    November 28, 2024
Abstract
A computer implemented method for detecting the existence of a condition indicated by data represented by a set of input graph data structures can include receiving at least a pair of training graph data structures of nodes and edges wherein each node indicates one or more characteristics of an event and each edge indicates an association between events, and wherein at least a subset of nodes and edges in each training graph relate to the existence of the condition, identifying an association between at least one pair of nodes in which each node of a pair occurs in a disparate training graph and at least one of the pair of nodes relates to the existence of the condition, and generating an edge between the pair of nodes so as to generate a composite training graph including at least a pair of the training graph data structures; extracting a proper subgraph of the composite training graph including at least one of the at least one pair of nodes, such that the proper subgraph indicates the existence of the condition including nodes and edges from each of the pair of graphs for comparison with the set of input graphs to identify an indication of the existence of the condition by the input graphs.
Description
TECHNICAL FIELD

The present disclosure relates to the identification of the existence of a condition identified by data represented by graph data structures.


BACKGROUND

Physical occurrences such as physical security occurrences are beneficially detected and identified in good time for reactive, remediative and/or responsive measures. For example, criminal acts against equipment used by the telecommunications industry can result in considerable costs for communications providers and degradation or interruption of service for their customers.


SUMMARY

It is therefore beneficial to detect occurrences of such events in an effective and timely manner.


According to a first aspect of the present disclosure, there is provided a computer implemented method for detecting the existence of a condition indicated by data represented by a set of input graph data structures, the method comprising: receiving at least a pair of training graph data structures of nodes and edges wherein each node indicates one or more characteristics of an event and each edge indicates an association between events, and wherein at least a subset of nodes and edges in each training graph relate to the existence of the condition; identifying an association between at least one pair of nodes in which each node of a pair occurs in a disparate training graph and at least one of the pair of nodes relates to the existence of the condition, and generating an edge between the pair of nodes so as to generate a composite training graph including at least a pair of the training graph data structures; and extracting a proper subgraph of the composite training graph including at least one of the at least one pair of nodes, such that the proper subgraph indicates the existence of the condition including nodes and edges from each of the pair of graphs for comparison with the set of input graphs to identify an indication of the existence of the condition by the input graphs.


In some embodiments, the set of input graph data structures includes at least two input graphs of nodes and edges, and the method further comprises: identifying an association between at least one pair of nodes in the input graphs in which each node of a pair occurs in a disparate input graph, and generating an edge between the pair of nodes so as to generate a composite input graph including at least a pair of input graph data structures; searching the composite input graph for occurrences of the proper subgraph to identify an indication of the existence of the condition by the input graphs so as to determine the existence of the condition.


In some embodiments, identifying an association between a pair of nodes includes one or more of: identifying a semantic association between the pair of nodes; identifying a vector similarity between the pair of nodes based on a vector embedding; identifying a geospatial similarity between the pair of nodes; identifying an association based on centrality, node-degree, eigenvector or betweenness of the pair of nodes; identifying a temporal similarity between the pair of nodes; and applying a clustering process in which the pair of nodes are clustered together.


In some embodiments, the proper subgraph is defined based on one or more predetermined criteria for identifying limits of one or more of a size, scope or extent of the proper subgraph.


In some embodiments, searching the composite input graph for occurrences of the proper subgraph includes searching for arrangements of nodes and edges between nodes in the proper subgraph occurring in the composite input graph irrespective of data stored or represented by or with the nodes of the proper subgraph and the composite input graph.


In some embodiments, data stored by one or more nodes and/or edges of the proper subgraph and the composite input graph is protected from disclosure.


In some embodiments, the protected data is protected by one or more of: encryption; data obfuscation; data redaction; data removal; and data replacement.


In some embodiments, the condition is a security condition.


According to a second aspect of the present disclosure, there is provided a computer system including a processor and memory storing computer program code for performing the method set out above.


According to a third aspect of the present disclosure, there is provided a computer system including a processor and memory storing computer program code for performing the method set out above.





BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which:



FIG. 1 is a block diagram of a computer system suitable for the operation of implementations of the present disclosure.



FIG. 2 is a component diagram of an arrangement for detecting the existence of a condition indicated by data represented by a set of input graph data structures according to an exemplary implementation of the present disclosure.



FIG. 3 is a flowchart of a method for detecting the existence of a condition indicated by data represented by a set of input graph data structures according to an exemplary implementation of the present disclosure.





DETAILED DESCRIPTION


FIG. 1 is a block diagram of a computer system suitable for the operation of embodiments of the present disclosure. A central processor unit (CPU) 102 is communicatively connected to a storage 104 and an input/output (I/O) interface 106 via a data bus 108. The storage 104 can be any read/write storage device such as a random-access memory (RAM) or a non-volatile storage device. An example of a non-volatile storage device includes a disk or tape storage device. The I/O interface 106 is an interface to devices for the input or output of data, or for both input and output of data. Examples of I/O devices connectable to I/O interface 106 include a keyboard, a mouse, a display (such as a monitor) and a network connection.


Physical occurrences such as physical security occurrences involving happenings taking place at one or a number of geospatial locations can be indicative of a condition such as the occurrence of a security event. For example, in the telecommunications industry, an event such as criminal damage to telecommunications equipment such as a cellular tower, cabinet, pole or the like, can occur at a geospatial location and can involve occurrences related to, and/or indicative of, the event occurring in one or more geospatial locations. Similarly, criminal activity can be associated with occurrences taking place at one or more geospatial locations, such occurrences being potentially disparate. For example, the presence of an entity or individual at a first location, the undertaking of one or more particular behaviors at a second location, the detection of a vehicle at a third location by automated number plate recognition, and the occurrence of a crime at a fourth location can all be related and indicative of criminal behavior leading to the crime.


Whereas related events may be readily associated and used to infer the existence of a particular condition, seemingly unrelated events or sets of events may not be so readily associated. Implementations of the present disclosure are operable with graph data structures including nodes and edges in which nodes are indicative of characteristics of an event and edges are indicative of associations between events. In particular, multiple such graph data structures of events are processed in implementations of the present disclosure to identify associations therebetween for generating a subgraph as a motif of the existence of the condition suitable for use in searching input graphs. Identifications of such a subgraph motif in input graphs serve to indicate the existence of the condition by the input graphs.


Thus, embodiments of the disclosure involve initially processing training graph data structures that are known to include data related to the existence of the condition. At least two such training graphs are processed to identify associations between the training graphs by way of associations between pairs of nodes where each node in a pair occurs in a disparate training graph. Such associations between a pair of nodes can be identified based on known graph comparison and node comparison techniques such as, inter alia: identifying a semantic association between the pair of nodes; identifying a vector similarity between the pair of nodes based on a vector embedding; identifying a geospatial similarity between the pair of nodes; identifying an association based on centrality, node-degree, eigenvector or betweenness of the pair of nodes; identifying a temporal similarity between the pair of nodes; and applying a clustering process in which the pair of nodes are clustered together.
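By way of illustration only, the following is a minimal sketch of how two of the listed techniques, geospatial similarity and temporal similarity, might be combined to identify associated node pairs across two training graphs. The node attribute names (lat, lon, t, condition), the example data and the thresholds max_km and max_seconds are assumptions made for the example and are not taken from the disclosure.

```python
import math
from itertools import product

# Hypothetical node records: id -> attributes (degrees, seconds since an epoch).
graph_a = {
    "a1": {"lat": 51.5007, "lon": -0.1246, "t": 1_000, "condition": True},
    "a2": {"lat": 52.2053, "lon": 0.1218, "t": 9_000, "condition": False},
}
graph_b = {
    "b1": {"lat": 51.5010, "lon": -0.1250, "t": 1_600, "condition": False},
    "b2": {"lat": 53.4808, "lon": -2.2426, "t": 50_000, "condition": False},
}


def haversine_km(p, q):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(p["lat"]), math.radians(q["lat"])
    dphi = phi2 - phi1
    dlmb = math.radians(q["lon"] - p["lon"])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def find_associations(g1, g2, max_km=1.0, max_seconds=3600):
    """Return (node in g1, node in g2) pairs that are close in space and time.

    A pair is only considered if at least one of its nodes relates to the
    condition, mirroring the requirement placed on training graphs.
    """
    pairs = []
    for (n1, d1), (n2, d2) in product(g1.items(), g2.items()):
        if not (d1["condition"] or d2["condition"]):
            continue
        if haversine_km(d1, d2) <= max_km and abs(d1["t"] - d2["t"]) <= max_seconds:
            pairs.append((n1, n2))
    return pairs


associations = find_associations(graph_a, graph_b)
```

In this example only the pair a1 and b1 satisfies both thresholds, so a single new cross-graph edge would be generated. Any of the other listed techniques, such as embedding similarity or clustering, could be substituted for the distance test without changing the surrounding logic.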


Once associations are determined, a new edge is generated between associated nodes in the training graphs to generate a composite training graph from which a proper subgraph is extracted including the associated nodes. It will be appreciated by those skilled in the art that the term “proper” subgraph is intended to refer to a subgraph of the composite training graph in which at least one node or edge in the composite graph is not present in the subgraph. The proper subgraph thus constitutes a basis on which other composite graphs may be searched to identify an indication of the existence of the condition. In some implementations, the proper subgraph can be defined based on one or more predetermined criteria for identifying limits of one or more of a size, scope or extent of the proper subgraph.
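By way of illustration only, the composite training graph and the extraction of a proper subgraph might be sketched as follows. The sketch assumes undirected graphs held as adjacency dictionaries, and it assumes a hop limit (max_hops) as the predetermined criterion bounding the extent of the subgraph; the function names and the example edges are hypothetical.

```python
from collections import deque


def build_composite(edges_a, edges_b, cross_edges):
    """Union two undirected edge lists together with the new cross-graph edges."""
    adjacency = {}
    for u, v in [*edges_a, *edges_b, *cross_edges]:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def extract_subgraph(adjacency, seeds, max_hops=1):
    """Return the subgraph induced by nodes within max_hops of any seed node."""
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # extent limit reached; do not expand further
        for nbr in adjacency[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    kept = set(depth)
    return {n: adjacency[n] & kept for n in kept}


def is_proper(sub, full):
    """True if sub, assumed to be derived from full, omits a node or an edge."""
    sub_edges = sum(len(v) for v in sub.values())
    full_edges = sum(len(v) for v in full.values())
    return len(sub) < len(full) or sub_edges < full_edges


# Hypothetical training graphs joined by one associated pair (a2, b1).
composite = build_composite(
    edges_a=[("a1", "a2"), ("a2", "a3")],
    edges_b=[("b1", "b2"), ("b2", "b3")],
    cross_edges=[("a2", "b1")],
)
motif = extract_subgraph(composite, seeds=["a2", "b1"], max_hops=1)
```

Here the extracted subgraph contains a1, a2, a3, b1 and b2 but not b3, so it includes nodes from both training graphs while remaining a proper subgraph of the composite.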


Thus, in use, the proper subgraph is used to search a composite graph generated from a plurality of input graphs with associations identified therebetween to inform a determination of an identification of the existence of the condition.



FIG. 2 is a component diagram of an arrangement for detecting the existence of a condition indicated by data represented by a set of input graph data structures 204, 206 according to an exemplary implementation of the present disclosure. Training graphs 200 and 202 are graph data structures of nodes and edges in which each node is indicative of one or more characteristics of an event and each edge indicates an association between events. At least a subset of nodes and edges in each training graph relates to the existence of a condition, such as a security condition or the like.


An association identifier 208 is provided as a hardware, software, firmware or combination component arranged to identify an association between at least one pair of nodes in which each node of a pair occurs in a different one of the training graphs 200, 202. In particular, the association is identified to occur between a pair of nodes in which at least one node of the pair relates to the existence of the condition. Such an association identified by the association identifier 208 is represented by the generation of a new edge between the pair of associated nodes so as to generate a composite of the two training graphs 200, 202, namely a composite training graph. The new edge thus constitutes a link between the training graphs 200, 202 via at least one node related to the existence of the condition.


A proper subgraph extractor 212 is provided as a hardware, software, firmware or combination component arranged to extract a proper subgraph 214 of the composite training graph including at least one of the pairs of associated nodes identified by the association identifier 208. The proper subgraph 214 thus constitutes a criterion for searching composite graphs to identify indications of the condition.


In use, at least two input graphs 204, 206 are received by an association identifier 210. The association identifier 210 can be substantially similar to the association identifier 208 described above except that the association identifier 210 identifies associations between the input graphs 204, 206 without knowledge of whether nodes in the input graphs 204, 206 are related to the condition. Thus, the association identifier 210 generates a composite input graph of at least a pair of input graphs.


A graph searcher 216 is provided as a hardware, software, firmware or combination component arranged to search the composite input graph for occurrences of the proper subgraph 214. In some implementations of the present disclosure, the graph searcher 216 searches the composite input graph for occurrences of the proper subgraph 214 by searching for arrangements of nodes and edges between nodes in the proper subgraph 214 that occur in the composite input graph. In particular, in some implementations, the graph searcher 216 does not search for particular data stored or represented by or in association with nodes and/or edges in the proper subgraph 214, such that the proper subgraph 214 constitutes a graph motif (the structure of a graph) on which basis the composite input graph is searched. Accordingly, literal identity between data represented by the proper subgraph 214 and subgraphs of the composite input graph is not required.
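By way of illustration only, a structure-only search of this kind might be sketched as a simple backtracking subgraph matcher. The disclosure does not name a particular matching algorithm; the sketch below is one assumed approach (non-induced subgraph matching on undirected adjacency dictionaries), and the function name find_motif and the example graphs are hypothetical.

```python
def find_motif(pattern, target):
    """Yield mappings of pattern nodes to target nodes preserving every pattern edge.

    Both graphs are undirected adjacency dicts {node: set(neighbours)}.
    Node payloads are never inspected; only the arrangement of edges matters.
    """
    order = list(pattern)

    def extend(mapping):
        if len(mapping) == len(order):
            yield dict(mapping)
            return
        p = order[len(mapping)]          # next pattern node to place
        used = set(mapping.values())
        for t in target:
            if t in used:
                continue
            # Every already-placed neighbour of p must map to a neighbour of t.
            if all(mapping[q] in target[t] for q in pattern[p] if q in mapping):
                mapping[p] = t
                yield from extend(mapping)
                del mapping[p]

    yield from extend({})


# Hypothetical motif: a triangle of three mutually associated events.
pattern = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}

# Hypothetical composite input graph: a triangle 1-2-3 with a tail 3-4-5.
target = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}

matches = list(find_motif(pattern, target))
condition_indicated = bool(matches)
```

The same triangle is reported once per ordering of its three nodes, so an implementation interested only in whether the condition is indicated could stop at the first match. Exhaustive backtracking of this kind scales poorly with graph size and is shown only to make the structure-only comparison concrete.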


In some implementations the proper subgraph 214 is converted into a convenient format such as a JSON object format for storage in a suitable database.
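By way of illustration only, conversion of the proper subgraph to a JSON object might be sketched as follows. The disclosure does not specify a schema; the "nodes" and "edges" keys and both function names are assumptions made for the example.

```python
import json


def motif_to_json(adjacency):
    """Serialize an undirected adjacency dict as a JSON object string."""
    nodes = sorted(adjacency)
    # Store each undirected edge once, with its endpoints in sorted order.
    edges = sorted({tuple(sorted((u, v))) for u in adjacency for v in adjacency[u]})
    return json.dumps({"nodes": nodes, "edges": [list(e) for e in edges]}, sort_keys=True)


def motif_from_json(text):
    """Rebuild the adjacency dict from the JSON produced by motif_to_json."""
    doc = json.loads(text)
    adjacency = {n: set() for n in doc["nodes"]}
    for u, v in doc["edges"]:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


# Hypothetical three-node motif.
motif = {"a1": {"a2"}, "a2": {"a1", "b1"}, "b1": {"a2"}}
stored = motif_to_json(motif)
restored = motif_from_json(stored)
```

Sorting the nodes and edges gives a stable textual form, which can be convenient when the stored object is compared or de-duplicated in a database.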


Thus, the graph searcher 216 is operable to process the composite input graph with the proper subgraph 214 to identify indications of the existence of the condition in the composite input graph.


In some implementations, the data stored in either or both the training graphs 200, 202 and/or the input graphs 204, 206 can include sensitive data such as personal identification information, financial information, confidential information, or information implicated by the European General Data Protection Regulation (GDPR) or similar such regulations or provisions elsewhere. Accordingly, in some arrangements the proper subgraph 214 and/or the composite input graph may include data that is not, should not or cannot be readily reproduced, stored, shared or used without breaching privacy or regulatory requirements, for example. As previously described, some implementations of the disclosure are operable on the basis of comparisons by the graph searcher 216 of arrangements of nodes and edges between nodes in the proper subgraph 214 occurring in the composite input graph, irrespective of the data stored in or by the subgraph 214 or composite input graph. Accordingly, in some implementations, data stored in the proper subgraph 214 and/or composite input graph can be sanitized, encrypted, redacted or otherwise protected such that the data is not accessible, shared or available, while retaining the ability of an implementation of the present disclosure to identify the structure of the proper subgraph 214 occurring in the composite input graph.
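By way of illustration only, protection by data removal and data replacement might be sketched as follows: node identifiers are replaced with opaque labels and node payloads are dropped, while the arrangement of edges is retained. The function names, the label scheme and the example records are assumptions made for the example; encryption or redaction could be used instead.

```python
import secrets


def protect(nodes, edges):
    """Return (nodes, edges) with identifiers replaced and payloads removed."""
    ids = list(nodes)
    secrets.SystemRandom().shuffle(ids)  # avoid leaking the original ordering
    alias = {original: f"n{i}" for i, original in enumerate(ids)}
    protected_nodes = {alias[n]: {} for n in nodes}            # payloads removed
    protected_edges = [(alias[u], alias[v]) for u, v in edges]  # structure kept
    return protected_nodes, protected_edges


def degree_sequence(node_ids, edges):
    """Sorted node degrees, a simple structural fingerprint of a graph."""
    degree = {n: 0 for n in node_ids}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(degree.values())


# Hypothetical sensitive records; none of these values come from the disclosure.
nodes = {
    "person:J.Smith": {"name": "J. Smith"},
    "vehicle:AB12CDE": {"plate": "AB12CDE"},
    "site:cabinet-17": {"address": "1 Example Road"},
}
edges = [("person:J.Smith", "vehicle:AB12CDE"), ("vehicle:AB12CDE", "site:cabinet-17")]

safe_nodes, safe_edges = protect(nodes, edges)
```

Because the degree sequence, and more generally the arrangement of nodes and edges, is unchanged, a structure-only search would produce the same matches on the protected graph as on the original.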



FIG. 3 is a flowchart of a method for detecting the existence of a condition indicated by data represented by a set of input graph data structures 204, 206 according to an exemplary implementation of the present disclosure. Initially, at 300, the method receives the training graphs 200, 202. Subsequently, at 302, the method identifies an association between nodes in disparate training graphs 200, 202 to generate a new edge therebetween, thereby constituting a composite training graph at 304. At 306, the proper subgraph 214 is extracted for identifying indications of the existence of the condition in input graphs.
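By way of illustration only, the ordering of steps 300 to 306 might be sketched as a single function that accepts the association and extraction strategies as parameters. The graph layout (a dictionary with "nodes" and "edges"), the kind and condition attributes, and the two example strategies are assumptions made for the example; node identifiers are assumed to be unique across the two graphs.

```python
def run_training(graph_a, graph_b, associate, extract):
    """Steps 300 to 306 of the flowchart, with pluggable strategies."""
    # 300: the training graphs are received as arguments.
    # 302: identify associations between nodes in disparate training graphs.
    cross = associate(graph_a, graph_b)
    # 304: add the new edges to form the composite training graph.
    composite = {
        "nodes": {**graph_a["nodes"], **graph_b["nodes"]},
        "edges": graph_a["edges"] + graph_b["edges"] + cross,
    }
    # 306: extract the proper subgraph around the associated nodes.
    return extract(composite, cross)


def same_kind(graph_a, graph_b):
    """Assumed association rule: equal 'kind', at least one condition-related node."""
    return [
        (u, v)
        for u, du in graph_a["nodes"].items()
        for v, dv in graph_b["nodes"].items()
        if du["kind"] == dv["kind"] and (du["condition"] or dv["condition"])
    ]


def edges_touching(composite, cross):
    """Assumed extraction rule: keep only edges incident to an associated node."""
    anchors = {n for pair in cross for n in pair}
    edges = [e for e in composite["edges"] if e[0] in anchors or e[1] in anchors]
    return {"nodes": sorted({n for e in edges for n in e}), "edges": edges}


# Hypothetical training graphs.
graph_a = {
    "nodes": {"a1": {"kind": "sighting", "condition": False},
              "a2": {"kind": "vehicle", "condition": True}},
    "edges": [("a1", "a2")],
}
graph_b = {
    "nodes": {"b1": {"kind": "vehicle", "condition": False},
              "b2": {"kind": "damage", "condition": True},
              "b3": {"kind": "report", "condition": False}},
    "edges": [("b1", "b2"), ("b2", "b3")],
}

motif = run_training(graph_a, graph_b, associate=same_kind, extract=edges_touching)
```

In this example the two vehicle nodes are associated, and the extracted subgraph omits node b3 and the edge between b2 and b3, so it is a proper subgraph of the composite.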


Insofar as embodiments of the disclosure described are implementable, at least in part, using a software-controlled programmable processing device, such as a microprocessor, digital signal processor or other processing device, data processing apparatus or system, it will be appreciated that a computer program for configuring a programmable device, apparatus or system to implement the foregoing described methods is envisaged as an aspect of the present disclosure. The computer program may be embodied as source code or undergo compilation for implementation on a processing device, apparatus or system or may be embodied as object code, for example.


Suitably, the computer program is stored on a carrier medium in machine or device readable form, for example in solid-state memory, magnetic memory such as disk or tape, optically or magneto-optically readable memory such as compact disk or digital versatile disk etc., and the processing device utilizes the program or a part thereof to configure it for operation. The computer program may be supplied from a remote source embodied in a communications medium such as an electronic signal, radio frequency carrier wave or optical carrier wave. Such carrier media are also envisaged as aspects of the present disclosure.


It will be understood by those skilled in the art that, although the present disclosure has been described in relation to the above described example embodiments, the disclosure is not limited thereto and that there are many possible variations and modifications which fall within the scope of the disclosure.


The scope of the present disclosure includes any novel features or combination of features disclosed herein. The applicant hereby gives notice that new claims may be formulated to such features or combination of features during prosecution of this application or of any such further applications derived therefrom. In particular, with reference to the appended claims, features from dependent claims may be combined with those of the independent claims and features from respective independent claims may be combined in any appropriate manner and not merely in the specific combinations enumerated in the claims.

Claims
  • 1. A computer implemented method for detecting an existence of a condition indicated by data represented by a set of input graph data structures, the method comprising: receiving at least a pair of training graph data structures of nodes and edges, wherein each node indicates one or more characteristics of an event and each edge indicates an association between events, and wherein at least a subset of the nodes and the edges in each training graph data structure relate to the existence of the condition; identifying an association between at least one pair of nodes in which each node of a pair occurs in a disparate training graph data structure and at least one of the pair of nodes relates to the existence of the condition, and generating an edge between the pair of nodes so as to generate a composite training graph data structure including at least a pair of the training graph data structures; and extracting a proper subgraph of the composite training graph data structure including at least one of the at least one pair of nodes, such that the proper subgraph indicates the existence of the condition including the nodes and the edges from each of the pair of graphs for comparison with the set of input graph data structures to identify an indication of the existence of the condition by the input graph data structures.
  • 2. The method of claim 1, wherein the set of input graph data structures includes at least two input graph data structures of nodes and edges, and the method further comprises: identifying an association between at least one pair of nodes in the at least two input graph data structures in which each node of a pair occurs in a disparate input graph data structure, and generating an edge between the pair of nodes so as to generate a composite input graph including at least a pair of input graph data structures; and searching the composite input graph for occurrences of the proper subgraph to identify an indication of the existence of the condition by the input graph data structures so as to determine the existence of the condition.
  • 3. The method of claim 1, wherein identifying an association between a pair of nodes includes one or more of: identifying a semantic association between the pair of nodes; identifying a vector similarity between the pair of nodes based on a vector embedding; identifying a geospatial similarity between the pair of nodes; identifying an association based on centrality, node-degree, eigenvector or betweenness of the pair of nodes; identifying a temporal similarity between the pair of nodes; or applying a clustering process in which the pair of nodes are clustered together.
  • 4. The method of claim 1, wherein the proper subgraph is defined based on one or more predetermined criteria for identifying limits of one or more of a size, a scope, or an extent of the proper subgraph.
  • 5. The method of claim 2, wherein searching the composite input graph data structure for occurrences of the proper subgraph includes searching for arrangements of the nodes and the edges between the nodes in the proper subgraph occurring in the composite input graph data structure irrespective of data stored or represented by or with the nodes of the proper subgraph and the composite input graph.
  • 6. The method of claim 5, wherein data stored by at least one or more nodes or one or more edges of the proper subgraph and the composite input graph data structure is protected from disclosure.
  • 7. The method of claim 6, wherein the protected data is protected by one or more of: encryption; data obfuscation; data redaction; data removal; or data replacement.
  • 8. The method of claim 1, wherein the condition is a security condition.
  • 9. A computer system comprising a processor and memory storing computer program code for performing the method of claim 1.
  • 10. A non-transitory computer-readable storage medium comprising computer program code to, when loaded into a computer system and executed thereon, cause the computer system to perform the method as claimed in claim 1.
Priority Claims (1)
Number       Date        Country   Kind
2113473.9    Sep 2021    GB        national
PRIORITY CLAIM

The present application is a National Phase entry of PCT Application No. PCT/EP2022/073618, filed Aug. 24, 2022, which claims priority from GB Application No. 2113473.9 filed Sep. 21, 2021, each of which is hereby fully incorporated herein by reference.

PCT Information
Filing Document      Filing Date   Country   Kind
PCT/EP2022/073618    8/24/2022     WO