Periodicity is a prevalent temporal pattern, and its presence ranges from natural phenomena such as lunar cycles and animal extinction, to man-made phenomena such as product pricing and search engine activities. In addition to this attribute-level periodicity (e.g., keyword search count over time), there also exists interaction-level periodicity, such as animals interacting with each other, users exchanging emails with one another, employees logging in to servers, customers purchasing goods, etc.
Capturing periodical patterns and determining the underlying periodicity is highly beneficial for various applications and has been studied in terms of time series and event mining.
However, many periodical patterns exist beyond time series and the occurrences of events. More often, these patterns can be naturally formatted as dynamic graphs. For example, users in an email system can be viewed as nodes, and the emails exchanged between two users can be viewed as edges. In addition, the emails exchanged among a group of users can be captured by a subgraph (including multiple users and edges). Detecting the periodicity of nodes and edges can be straightforward, since their occurrences can both be viewed as events on a timeline, and applying an event-based periodicity mining algorithm (e.g., ePeriodicity) can be sufficient. Nonetheless, finding the periodicity of subgraphs is not trivial for at least two reasons: the sparsity of subgraph appearances in the dynamic graphs' time spans, and the difficulties in obtaining high-quality subgraph embeddings.
Embodiments of the disclosure address this problem and other problems individually and collectively.
One embodiment is related to a method comprising: obtaining, by a computer, node embeddings and node periodicity classifications for a plurality of nodes in a graph, and edge embeddings and edge periodicity classifications for a plurality of edges in the graph for each time of a time period; determining, by the computer, subgraph embeddings based on a subgraph of the graph, times in the time period, the node embeddings for nodes in the subgraph, the edge embeddings for edges in the subgraph, the node periodicity classifications for the nodes in the subgraph, and the edge periodicity classifications for the edges in the subgraph; translating, by the computer, each subgraph embedding of the subgraph embeddings for each time of the time period into projected subgraph embeddings of a plurality of projected subgraph embeddings; for the subgraph, aggregating, by the computer, the plurality of projected subgraph embeddings into an aggregated subgraph embedding; and determining, by the computer, if the subgraph is periodic based upon at least the aggregated subgraph embedding.
One embodiment is related to a computer comprising: a processor; and a computer-readable medium coupled to the processor, the computer-readable medium comprising code executable by the processor for implementing a method comprising: obtaining node embeddings and node periodicity classifications for a plurality of nodes in a graph, and edge embeddings and edge periodicity classifications for a plurality of edges in the graph for each time of a time period; determining subgraph embeddings based on a subgraph of the graph, times in the time period, the node embeddings for nodes in the subgraph, the edge embeddings for edges in the subgraph, the node periodicity classifications for the nodes in the subgraph, and the edge periodicity classifications for the edges in the subgraph; translating each subgraph embedding of the subgraph embeddings for each time of the time period into projected subgraph embeddings of a plurality of projected subgraph embeddings; for the subgraph, aggregating the plurality of projected subgraph embeddings into an aggregated subgraph embedding; and determining if the subgraph is periodic based upon at least the aggregated subgraph embedding.
One embodiment is related to a system comprising: a plurality of devices; a database; and a data analysis computer comprising: a processor; and a computer-readable medium coupled to the processor, the computer-readable medium comprising code executable by the processor for implementing a method comprising: obtaining, from the database, node embeddings and node periodicity classifications for a plurality of nodes in a graph, and edge embeddings and edge periodicity classifications for a plurality of edges in the graph for each time of a time period; determining subgraph embeddings based on a subgraph of the graph, times in the time period, the node embeddings for nodes in the subgraph, the edge embeddings for edges in the subgraph, the node periodicity classifications for the nodes in the subgraph, and the edge periodicity classifications for the edges in the subgraph; translating each subgraph embedding of the subgraph embeddings for each time of the time period into projected subgraph embeddings of a plurality of projected subgraph embeddings; for the subgraph, aggregating the plurality of projected subgraph embeddings into an aggregated subgraph embedding; and determining if the subgraph is periodic based upon at least the aggregated subgraph embedding.
Further details regarding embodiments of the disclosure can be found in the Detailed Description and the Figures.
Prior to discussing embodiments of the disclosure, some terms can be described in further detail.
An “interaction” may include a reciprocal action or influence. An interaction can include a communication, contact, or exchange between parties, devices, and/or entities. Example interactions include a transaction between two parties and a data exchange between two devices. In some embodiments, an interaction can include a user requesting access to secure data, a secure webpage, a secure location, and the like. In other embodiments, an interaction can include a payment transaction in which two devices can interact to facilitate a payment.
“Interaction data” can include data related to and/or recorded during an interaction. In some embodiments, interaction data can be transaction data of the network data. Transaction data can comprise a plurality of data elements with data values.
A “user” may include an individual. In some embodiments, a user may be associated with one or more personal accounts and/or mobile devices. The user may also be referred to as a cardholder, account holder, or consumer in some embodiments.
A “user device” may be a device that is operated by a user. Examples of user devices may include a mobile phone, a smart phone, a card, a personal digital assistant (PDA), a laptop computer, a desktop computer, a server computer, a vehicle such as an automobile, a thin-client device, a tablet PC, etc. Additionally, user devices may be any type of wearable technology device, such as a watch, earpiece, glasses, etc. The user device may include one or more processors capable of processing user input. The user device may also include one or more input sensors for receiving user input. As is known in the art, there are a variety of input sensors capable of detecting user input, such as accelerometers, cameras, microphones, etc. The user input obtained by the input sensors may be from a variety of data input types, including, but not limited to, audio data, visual data, or biometric data. The user device may comprise any electronic device that may be operated by a user, which may also provide remote communication capabilities to a network. Examples of remote communication capabilities include using a mobile phone (wireless) network, wireless data network (e.g., 3G, 4G or similar networks), Wi-Fi, Wi-Max, or any other communication medium that may provide access to a network such as the Internet or a private network.
A “resource provider” may be an entity that can provide a resource such as goods, services, information, and/or access. Examples of resource providers includes merchants, data providers, transit agencies, governmental entities, venue and dwelling operators, etc.
“Machine learning” can include an artificial intelligence process in which software applications may be trained to make accurate predictions through learning. The predictions can be generated by applying input data to a predictive model formed from performing statistical analyses on aggregated data. A model can be trained using training data, such that the model may be used to make accurate predictions. The prediction can be, for example, a classification of an image (e.g., identifying images of cats on the Internet) or as another example, a recommendation (e.g., a movie that a user may like or a restaurant that a consumer might enjoy).
In some embodiments, a model may be a statistical model, which can be used to predict unknown information from known information. For example, a learning module may be a set of instructions for generating a regression line from training data (supervised learning) or a set of instructions for grouping data into clusters of different classifications of data based on similarity, connectivity, and/or distance between data points (unsupervised learning). The regression line or data clusters can then be used as a model for predicting unknown information from known information. Once a model has been built by the learning module, the model may be used to generate a predicted output from a new request. A new request may be a request for a prediction associated with presented data. For example, a new request may be a request for classifying an image or for creating a recommendation for a user.
A “topological graph” or “graph” can include a representation of a graph in a plane of distinct vertices connected by edges. The distinct vertices in a topological graph may be referred to as “nodes.” Each node may represent specific information for an event or may represent specific information for a profile of an entity or object. The nodes may be related to one another by a set of edges, E. An “edge” may be described as an unordered pair composed of two nodes as a subset of the graph G=(V, E), where G is a graph comprising a set V of vertices (nodes) connected by a set of edges E. For example, a topological graph may represent a transaction network in which a node representing a transaction may be connected by edges to one or more nodes that are related to the transaction, such as nodes representing information of a device, a user, a transaction type, etc. An edge may be associated with a numerical value, referred to as a “weight”, that may be assigned to the pairwise connection between the two nodes. The edge weight may be identified as a strength of connectivity between two nodes and/or may be related to a cost or distance, as it often represents a quantity that is required to move from one node to the next.
A “subgraph” or “sub-graph” can include a graph formed from a subset of elements of a larger graph. The elements may include vertices and connecting edges, and the subset may be a set of nodes and edges selected amongst the entire set of nodes and edges for the larger graph. For example, a plurality of subgraphs can be formed by randomly sampling graph data, wherein each of the random samples can be a subgraph. Each subgraph can overlap another subgraph formed from the same larger graph.
The term “node” can include a discrete data point representing specified information. Nodes may be connected to one another in a topological graph by “edges,” which may be assigned a value known as an edge weight in order to describe the connection strength between the two nodes. For example, a first node may be a data point representing a first device in a network or a user in an interaction network, and the first node may be connected in a graph to a second node representing a second device in the network or a resource provider in the interaction network. In some embodiments, the connection strength may be defined by an edge weight corresponding to how quickly and easily information may be transmitted between the two nodes. An edge weight may also be used to express a cost or a distance required to move from one state or node to the next. For example, a first node may be a data point representing a first position of a machine, and the first node may be connected in a graph to a second node for a second position of the machine. The edge weight may be the energy required to move from the first position to the second position.
A “periodicity classification” can include an indication of whether or not something is periodic. A periodicity classification can classify a node or an edge as being “periodic” or as being “not periodic.” In some embodiments, a periodicity classification can be a binary periodicity classification. In other embodiments, a periodicity classification can be a value in a range of continuous values.
An “embedding” can include a relatively low-dimensional space into which high-dimensional vectors can be translated. An embedding can be a mapping of a discrete (categorical) variable to a vector of continuous numbers. Neural network embeddings are useful because they can reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space. Embeddings can make it easier to perform machine learning on large inputs (e.g., sparse vectors representing words, etc.). An embedding can capture some of the semantics of the input by placing semantically similar inputs close together in the embedding space. An embedding can be learned and reused across models.
“Projected embeddings” can include embeddings that have been projected into an embedding space. A projected embedding can be created by translating an embedding from one embedding space to another embedding space. For example, a plurality of embeddings for a node, where each embedding is a node embedding for a time of a timespan, can each be projected into a single embedding space. The single embedding space can be selected to be the embedding space of the final time stamp. However, it is understood that any embedding space can be selected.
“Aggregated embeddings” can include embeddings that have been combined into a single embedding. An aggregated embedding can be created from a plurality of projected embeddings. As such, the projected embeddings can represent a node at each time stamp, while the aggregated embedding can represent the node across each time stamp.
A “processor” may include a device that processes something. In some embodiments, a processor can include any suitable data computation device or devices. A processor may comprise one or more microprocessors working together to accomplish a desired function. The processor may include a CPU comprising at least one high-speed data processor adequate to execute program components for executing user and/or system-generated requests. The CPU may be a microprocessor such as AMD's Athlon, Duron and/or Opteron; IBM and/or Motorola's PowerPC; IBM's and Sony's Cell processor; Intel's Celeron, Itanium, Pentium, Xeon, and/or XScale; and/or the like processor(s).
A “memory” may be any suitable device or devices that can store electronic data. A suitable memory may comprise a non-transitory computer readable medium that stores instructions that can be executed by a processor to implement a desired method. Examples of memories may comprise one or more memory chips, disk drives, etc. Such memories may operate using any suitable electrical, optical, and/or magnetic mode of operation.
A “server computer” may include a powerful computer or cluster of computers. For example, the server computer can be a large mainframe, a minicomputer cluster, or a group of servers functioning as a unit. In one example, the server computer may be a database server coupled to a Web server. The server computer may comprise one or more computational apparatuses and may use any of a variety of computing structures, arrangements, and compilations for servicing the requests from one or more client computers.
Table 1 illustrates definitions for various symbols that appear herein.
For simplicity of illustration, a certain number of components are shown in
Messages between the devices in
The plurality of devices 102 can include any number of devices capable of obtaining data and/or performing interactions. Examples of devices include user devices, mobile devices, access devices, resource provider devices, etc. Each device of the plurality of devices 102 can directly or indirectly (e.g., via an intermediate computer, not shown) provide data to a database 104 regarding interactions performed by the device.
For example, a first device of the plurality of devices 102, such as a user device, can perform an interaction, such as a transaction, with a second device of the plurality of devices 102, such as a resource provider computer. The first device and the second device can both provide data relating to the interaction to the database 104. For example, the first device and the second device can provide interaction data, device identifiers, device types (e.g., user device or resource provider device), and/or any other data relating to the devices or the interaction between the devices.
The database 104 can include an electronic storage system that can store data obtained by the plurality of devices 102. The database 104 can include any suitable database. The database 104 may be a conventional, fault tolerant, relational, scalable, secure database such as those commercially available from Oracle™ or Sybase™.
The data obtained by the plurality of devices 102 can be stored as a dynamic graph comprising a plurality of nodes and a plurality of edges. The plurality of nodes of the dynamic graph can represent the plurality of devices 102. The plurality of edges can represent interactions that occur between devices of the plurality of devices 102.
The data analysis computer 106 can include a computer or server computer capable of analyzing the data stored in the database. The data analysis computer 106 can obtain the data from the database 104 and analyze the data as described in further detail herein. For example, the data analysis computer 106 can determine whether or not a subgraph included in the graph is periodic.
The memory 202 can be used to store data and code. The memory 202 may be coupled to the processor 204 internally or externally (e.g., cloud based data storage), and may comprise any combination of volatile and/or non-volatile memory, such as RAM, DRAM, ROM, flash, or any other suitable memory device. For example, the memory 202 can store graph data.
The computer readable medium 208 may comprise code, executable by the processor 204, for performing a method comprising: obtaining, by a computer, node embeddings and node periodicity classifications for a plurality of nodes in a graph, and edge embeddings and edge periodicity classifications for a plurality of edges in the graph for each time of a time period; determining, by the computer, subgraph embeddings based on a subgraph of the graph, times in the time period, the node embeddings for nodes in the subgraph, the edge embeddings for edges in the subgraph, the node periodicity classifications for the nodes in the subgraph, and the edge periodicity classifications for the edges in the subgraph; translating, by the computer, each subgraph embedding of the subgraph embeddings for each time of the time period into projected subgraph embeddings of a plurality of projected subgraph embeddings; for the subgraph, aggregating, by the computer, the plurality of projected subgraph embeddings into an aggregated subgraph embedding; and determining, by the computer, if the subgraph is periodic based upon at least the aggregated subgraph embedding.
The embedding module 208A may comprise code or software, executable by the processor 204, for embedding nodes, edges, and/or subgraphs. The embedding module 208A, in conjunction with the processor 204, can determine node embeddings for a plurality of nodes, determine edge embeddings for a plurality of edges, and determine a subgraph embedding for a subgraph. The embedding module 208A, in conjunction with the processor 204, can perform a machine learning embedding process to learn the embeddings.
The embedding module 208A, in conjunction with the processor 204, can determine the node embeddings using a static node embedding method. For example, the embedding module 208A, in conjunction with the processor 204, can utilize a static node embedding method that includes a random-walk-based method as performed in, for example, DeepWalk and node2vec. For example, the node2vec framework learns low-dimensional representations for nodes in a graph by optimizing a neighborhood preserving objective. The objective is flexible, and the algorithm accommodates various definitions of network neighborhoods by simulating biased random walks. Specifically, it provides a way of balancing the exploration-exploitation tradeoff that in turn leads to representations obeying a spectrum of equivalences from homophily to structural equivalence.
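For illustration, the second-order biased random walk at the core of a node2vec-style method can be sketched as follows. This is a minimal sketch only; the function and parameter names are hypothetical, and a complete embedding method would additionally feed the sampled walks to a skip-gram model to produce the node embeddings.

```python
import random

def biased_walk(adj, start, length, p=1.0, q=1.0, seed=0):
    """One second-order biased random walk (node2vec-style sketch).

    adj: dict mapping node -> list of neighbor nodes.
    p: return parameter (low p makes walks likely to revisit the previous node).
    q: in-out parameter (low q favors outward, DFS-like exploration).
    """
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj.get(cur, [])
        if not nbrs:
            break
        if len(walk) == 1:
            # First step has no previous node; sample uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:                 # returning to the previous node
                weights.append(1.0 / p)
            elif nxt in adj.get(prev, []):  # staying at distance 1 from prev
                weights.append(1.0)
            else:                           # moving outward from prev
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

A collection of such walks, sampled from every node, would then serve as the "sentences" for neighborhood-preserving representation learning.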
The embedding module 208A, in conjunction with the processor 204, can determine the edge embeddings using a static edge embedding method. For example, the embedding module 208A, in conjunction with the processor 204, can utilize a modified version of edge2vec. The modified edge learning process can aim to optimize a loss function that takes into account the global edge proximity, the local edge proximity, and the all-time edge proximity. The modified edge learning process will be described in further detail herein.
The embedding module 208A, in conjunction with the processor 204, can determine the subgraph embeddings using a subgraph representation learning method. For example, the embedding module 208A, in conjunction with the processor 204, can determine a subgraph embedding for a subgraph based on the subgraph of the graph, times in the time period, node embeddings for nodes in the subgraph, edge embeddings for edges in the subgraph, node periodicity classifications for the nodes in the subgraph, and edge periodicity classifications for the edges in the subgraph. The embedding module 208A, in conjunction with the processor 204, can determine the subgraph embeddings using a modified SubGNN process, as described in further detail herein.
The graph translation module 208B may comprise code or software, executable by the processor 204, for translating a plurality of embeddings from a plurality of time stamps into projected embeddings. The graph translation module 208B, in conjunction with the processor 204, can perform embedding projection using a machine learning process that optimizes a loss function to obtain transition matrices to translate the input embeddings to obtain embeddings that are close to the embeddings at the final time stamp. As such, the graph translation module 208B, in conjunction with the processor 204, can project input embeddings from each time stamp to a single time stamp's (selected to be the final timestamp) embedding space. By doing so, the graph translation module 208B, in conjunction with the processor 204, can create a space in which each projected embedding can be accurately compared with one another.
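As an illustration of the projection step, a MUSE-style translation can learn, for each time stamp, a linear transition matrix that carries that snapshot's embeddings into the final time stamp's embedding space. The least-squares sketch below is an assumption about one simple way to fit such a matrix (using entities observed at both time stamps as anchors); the actual method of [24] may differ.

```python
import numpy as np

def learn_projection(X_t, X_final):
    """Fit a transition matrix W mapping time-t embeddings onto the final
    time stamp's embedding space by minimizing ||X_t @ W - X_final||_F^2.

    X_t, X_final: (n_anchors, d) arrays of embeddings for entities
    observed at both time stamps.
    """
    W, *_ = np.linalg.lstsq(X_t, X_final, rcond=None)
    return W

def project(X_t, W):
    """Translate time-t embeddings into the final time stamp's space."""
    return X_t @ W
```

Once every snapshot's embeddings are projected into the final time stamp's space, embeddings from different times can be meaningfully compared and aggregated.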
The embedding aggregation module 208C may comprise code or software, executable by the processor 204, for aggregating a plurality of projected embeddings. The embedding aggregation module 208C, in conjunction with the processor 204, can determine an aggregated embedding based on the plurality of projected embeddings. As such, the embedding aggregation module 208C, in conjunction with the processor 204, can combine the plurality of projected embeddings, where each projected embedding is for a different time stamp, into a single aggregated embedding. The aggregated embedding can represent the entity (e.g., the edge, node, or subgraph) that the projected embeddings represent.
For example, the embedding aggregation module 208C, in conjunction with the processor 204, can utilize a recurrent neural network to aggregate the embeddings of a given node or edge across time. The embedding aggregation module 208C, in conjunction with the processor 204, can learn an embedding matrix and a transition matrix using the recurrent neural network. The embedding matrix can convert the node or edge projected embeddings to a smaller dimension. The transition matrix can aid in transforming the projected embeddings into a single aggregated embedding.
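The recurrent aggregation can be sketched minimally as follows. This is an illustrative simple recurrent update only (in practice a gated network such as a GRU or LSTM may be used, and the matrices would be learned rather than given); all names here are hypothetical.

```python
import numpy as np

def aggregate_embeddings(projected, W_embed, W_hidden):
    """Aggregate a sequence of projected embeddings into one vector with a
    simple recurrent update: h_t = tanh(x_t @ W_embed + h_{t-1} @ W_hidden).

    projected: (T, d_in) array, one projected embedding per time stamp.
    W_embed:   (d_in, d_hid) embedding matrix (reduces dimensionality).
    W_hidden:  (d_hid, d_hid) transition matrix carrying state across time.
    Returns the final hidden state: an aggregated (d_hid,) embedding.
    """
    h = np.zeros(W_hidden.shape[0])
    for x_t in projected:
        h = np.tanh(x_t @ W_embed + h @ W_hidden)
    return h
```

The final hidden state summarizes the entity's behavior across the whole time span and can then be passed to the periodicity classifier.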
The periodicity detection module 208D may comprise code or software, executable by the processor 204, for determining whether or not a node or edge is periodic. The periodicity detection module 208D, in conjunction with the processor 204, can perform a binary periodicity classification using a classification and regression task. The periodicity detection module 208D, in conjunction with the processor 204, can determine a node periodicity classification for each input aggregated node embedding and an edge periodicity classification for each input aggregated edge embedding. For example, the periodicity detection module 208D, in conjunction with the processor 204, can perform a machine learning classification method to determine whether or not the node or edge is periodic.
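For illustration, the binary classification head could be as simple as a fully connected layer with a sigmoid output over the aggregated embedding. The sketch below assumes such a logistic head (names hypothetical); the actual classifier may be any suitable machine learning classification method.

```python
import numpy as np

def periodicity_score(h, weights, bias):
    """Logistic classification head: maps an aggregated embedding h to a
    probability that the corresponding node or edge is periodic."""
    z = float(h @ weights + bias)
    return 1.0 / (1.0 + np.exp(-z))

def classify(h, weights, bias, threshold=0.5):
    """Binary periodicity classification from the score."""
    return periodicity_score(h, weights, bias) >= threshold
```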
The network interface 206 may include an interface that can allow the data analysis computer 200 to communicate with external computers. The network interface 206 may enable the data analysis computer 200 to communicate data to and from another device (e.g., a database, a processing computer, etc.). Some examples of the network interface 206 may include a modem, a physical network interface (such as an Ethernet card or other Network Interface Card (NIC)), a virtual network interface, a communications port, a Personal Computer Memory Card International Association (PCMCIA) slot and card, or the like. The wireless protocols enabled by the network interface 206 may include Wi-Fi™. Data transferred via the network interface 206 may be in the form of signals which may be electrical, electromagnetic, optical, or any other signal capable of being received by the external communications interface (collectively referred to as “electronic signals” or “electronic messages”). These electronic messages that may comprise data or instructions may be provided between the network interface 206 and other devices via a communications path or channel. As noted above, any suitable communication path or channel may be used such as, for instance, a wire or cable, fiber optics, a telephone line, a cellular link, a radio frequency (RF) link, a WAN or LAN network, the Internet, or any other suitable medium.
Compared to nodes and edges, the occurrences of subgraphs in dynamic graphs' time spans are significantly sparser. In a given limited window, it may be feasible to have enough observations to determine the periodicity of nodes and edges, yet be infeasible to determine the periodicity of subgraphs. For example, take the Enron email dataset.
Although leveraging the learned node and edge features to further construct subgraph embeddings that can ease the subgraph periodicity detection appears to be a good strategy at a high level, the approach itself requires many detailed considerations. To start, while there have been abundant node representation learning algorithms that generate high-quality node embeddings [10]-[18], the same cannot be said for edge embeddings [19]-[21], let alone the even more limited development in subgraph embeddings [22], [23]. To precisely capture a set of periodicity-aware subgraph embeddings in the dynamic graphs, being able to obtain high-quality edge and subgraph embeddings is needed.
To address the above technical problems in learning the periodicity of subgraphs from a representation learning approach, embodiments provide for a hierarchical system that first learns periodicity-aware embeddings of nodes and edges, then further leverages this information to learn the subgraph embeddings that are optimal for subgraph periodicity detection. For example, a system according to embodiments can first learn static snapshots of dynamic graphs' node and edge embeddings separately, where embodiments improve an existing edge embedding method [21] to enhance its embedding quality in downstream periodicity detection task. The learned embeddings are then projected onto the same latent space with an embedding translation method [24] and aggregated throughout the dynamic graph's time span with a recurrent neural network to a single embedding. The predicted periodicity scores are then sent to the subgraph embedding stage, where each subgraph is decomposed into a node graph and an edge graph. Using the previously learned node and edge embeddings, combined with the periodicity scores, embodiments can utilize a message-passing-based subgraph embedding method [22] to generate each snapshot's subgraph embeddings. The subgraph embeddings are then also projected onto the same space across time span, aggregated with a recurrent neural network, and finally passed to a fully connected layer to predict the underlying periodicity.
As such, embodiments described herein provide for the following technical solutions and advantages to the aforementioned technical problems. 1) Embodiments provide for determining the periodicity of subgraphs in dynamic graphs from a hierarchical and neural-network-based approach. 2) Embodiments provide a modified version of edge representation learning method by introducing a new loss function that captures the presence of edges at a given time stamp. As such, the edge embeddings are more fitting for predicting the edge periodicity through the dynamic graphs' time span. 3) Embodiments utilize the learned periodicity-aware node and edge embeddings and their periodicity scores into the recently proposed subgraph embedding method, SubGNN [22]. Through evaluation on two real-world datasets, it can be shown that the learned periodicity-aware subgraph embeddings can better detect the subgraphs' periodicity compared to the non-periodicity-aware embeddings.
Embodiments can define a dynamic graph as 𝒢=(𝒱, ℰ), where 𝒱={V(t)}t∈T is a collection of node sets over time, ℰ={E(t)}t∈T is a collection of edge sets over time, and T is the complete time span of the dynamic graph 𝒢. A subgraph at time t can be defined as S(t)=(V(t)′, E(t)′), where V(t)′⊆V(t) and E(t)′⊆E(t). Then, given a dynamic graph 𝒢 and a set of subgraphs 𝒮={S(t)}t∈T′, embodiments aim to learn the periodicity-aware embeddings of nodes, edges, and subgraphs, and further leverage the embeddings to predict the underlying periodicity of each entity.
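The dynamic graph definition above can be represented concretely as a sequence of per-time-stamp snapshots. The representation below is illustrative only (field names are hypothetical); any equivalent structure would serve.

```python
# One possible in-memory representation of a dynamic graph: a list of
# snapshots indexed by time stamp, each holding that time's node set V(t)
# and edge set E(t).
dynamic_graph = [
    {"nodes": {1, 2, 3}, "edges": {(1, 2), (2, 3)}},     # t = 0
    {"nodes": {1, 2, 3, 4}, "edges": {(1, 2), (3, 4)}},  # t = 1
    {"nodes": {1, 2, 4}, "edges": {(1, 2)}},             # t = 2
]

def subgraph_at(graph, t, node_subset):
    """Induced subgraph S(t) = (V(t)', E(t)') for a node subset of interest:
    keeps only the subset's nodes present at time t and the edges between them."""
    snap = graph[t]
    nodes = node_subset & snap["nodes"]
    edges = {(u, v) for (u, v) in snap["edges"] if u in nodes and v in nodes}
    return nodes, edges
```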
As a first example, the nodes of the dynamic graph can represent users and resource providers. The edges of the dynamic graph can represent interactions (e.g., transactions, etc.) between the users and resource providers. A subgraph of the dynamic graph can include a collection of users, resource providers, and interactions that relate to one another in some manner. The subgraph can represent, for example, interactions that take place during and related to a small town's local festival. The subgraph may only appear in the dynamic graph at particular times. Embodiments can determine whether or not the subgraph is periodic. For example, the subgraph can be periodic if the local festival occurs once a month when the collection of nodes and edges of the subgraph appear in the dynamic graph. As such, embodiments can determine that the small town's local festival both exists and takes place monthly based on dynamic graph.
As a second example, the nodes of the dynamic graph can represent computers in a computer network. The edges of the dynamic graph can represent communication channels or linkages between the computers in the computer network. A subgraph of the dynamic graph can represent, for example, a grouping of computers that process actions for a massively multiplayer online game (MMO) on behalf of player computers and game server computers. The subgraph may appear periodically based on players using the computers playing the MMO with one another. Large groups of players playing the MMO together can impact network traffic and the load of the game server computers. Embodiments can determine whether or not the subgraph is periodic. For example, the subgraph can be periodic if the large group of players that play the MMO together appears in the subgraph at somewhat regular intervals (e.g., appears on weekends, appears on Tuesday nights, appears on the second Thursday of each month, etc.). As such, embodiments can determine the periodicity of the subgraph and allow computers in the network to prepare for network load balancing and/or other processes that limit overloading the game server computers during the high usage times when the subgraphs appear.
In some embodiments, input data used by a computer to predict the underlying periodicity of each entity (e.g., that is represented by a node) can be prepared along with training labels for system training. Any known periodicity labels can be utilized by the system. For example, the system can utilize prior knowledge of the periodicity of nodes and edges in order to properly learn the periodicity-aware embeddings of each type of entity. For periodicity label generation, to obtain the ground truth of node and edge periodicity, embodiments can leverage the event periodicity mining approach proposed in [8], which can handle a portion of missing data. Given the complete time span of a dynamic graph
, the computer can construct an occurrence vector of length
, where each value is either 0 or 1, representing whether the corresponding entity (e.g., represented by a node) is present at that time stamp. Through an interval cutting and overlapping approach, the method in [8] then outputs a probability score for each tested periodicity value.
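For illustration only, the occurrence vector described above can be sketched as follows in Python. The function name and input format are illustrative, not part of the claimed method; the sketch assumes the entity's observation times are given as a list of integer time stamps.

```python
def occurrence_vector(event_times, time_span):
    # Binary occurrence vector over the dynamic graph's time span:
    # position t is 1 if the entity is present at time stamp t, else 0.
    present = set(event_times)
    return [1 if t in present else 0 for t in range(time_span)]

# An entity observed every 3 time stamps over a span of 9:
vec = occurrence_vector([0, 3, 6], 9)
```

A periodicity mining method such as [8] would then score candidate periods against vectors of this form.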
In some embodiments, a subgraph of interest can be provided to the computer prior to determining the subgraph periodicity. For subgraph pattern mining, several approaches have been proposed for mining frequently appearing subgraph patterns across dynamic graphs. Embodiments can utilize any suitable subgraph pattern mining process, such as gSpan [28], to determine one or more subgraphs from a dynamic graph.
Referring to
Before describing the individual steps illustrated in
The obtained XV and Xe are then passed into an embedding projection method (e.g., MUSE [24]), where all the snapshots' embeddings during t=1 . . . 1 are projected to t=. The projected embeddings are denoted as X*V and X*ε.
The projected embeddings are then fed into a recurrent neural network to aggregate each node and edge's sequence of embeddings throughout the time span . The aggregated embedding of a node v is denoted as hv and the aggregated embedding of an edge e is denoted as he.
The aggregated embeddings are then used to conduct the first prediction task, the binary periodicity classification. The classified results are then sent to two modules: the periodicity detection module and the subgraph embedding generation module. The periodicity detection module is similar to the binary classification module, while the classified results used in the subgraph embedding generation module serve as an embedding amplifier on the nodes or edges that are determined to be periodical.
After the above embeddings and information are in place, embodiments can then generate the embeddings of each pre-determined subgraph for each static snapshot in the dynamic graph with a subgraph embedding method (e.g., SubGNN [22]). The rest of the steps leading up to detecting the periodicity are then similar to the ones for nodes and edges, except that embodiments do not classify the subgraphs into periodical or non-periodical classifications. Rather, embodiments incorporate a periodicity filter mask to facilitate the detection of the underlying periodicity of the subgraphs.
Returning to
To embed each snapshot's node embeddings, various node embedding methods can be used. Several graph representation learning algorithms have been proposed that are able to generate high quality node embeddings useful for various downstream tasks, including methods for static graphs [10]-[15] and dynamic graphs [16]-[18]. The dynamic graph embedding approaches in general suffer from high complexity and tend to carry over the presence of nodes across times, whereas differentiating whether a node is present at a specific time stamp or not is crucial in the periodicity detection task [25]. Therefore, embodiments utilize a static approach to obtain node embeddings, contrary to the generally assumed use of a dynamic graph embedding. As such, the computer can determine the node embeddings for the plurality of nodes using a static node embedding process including a random walk sampling during a machine learning process. In addition, in order to accurately capture the presence of nodes, random-walk-based methods such as DeepWalk [13] and node2vec [15] are more suitable than the ones without random walks. This is because non-present nodes are not reached by random walks unless they are sampled at the random start step.
To conduct a first-order random walk at a given snapshot t of the dynamic graph, the transition probability of the random walks can be expressed as follows:
where ci is the node reached at the i-th step of the walk, πvx is the transition probability from node v to x, and Z is a normalizing constant.
As node2vec interpolates between the BFS and DFS strategies when conducting random walks, the p and q parameters that help to control the bias towards BFS and DFS can be incorporated into the unnormalized transition probability for the second-order random walk as
where wvx is the weight of the edge (v, x). The αpq(t, x) is a biased term formulated as
where dtx denotes the length of the shortest path between t and x.
The sampled walks are used as inputs to a skipgram-like model to then obtain embeddings for each node at time stamp t [26], which are denoted as Xv(t).
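As an illustrative sketch of the second-order bias described above (not a complete walk sampler), the node2vec-style transition probabilities can be computed as follows. The function names are hypothetical; the sketch assumes each candidate next node x is described by its shortest-path distance dtx from the previous node t and the edge weight wvx.

```python
def alpha_pq(d_tx, p, q):
    # Bias term of the second-order walk: d_tx is the shortest-path
    # distance (0, 1, or 2) between the previous node t and candidate x.
    if d_tx == 0:
        return 1.0 / p   # returning to the previous node
    if d_tx == 1:
        return 1.0       # staying near the previous node (BFS-like)
    return 1.0 / q       # moving outward (DFS-like)

def transition_probs(candidates, p, q):
    # candidates: list of (d_tx, w_vx) pairs for the neighbors x of v.
    # Returns normalized transition probabilities pi_vx / Z.
    unnorm = [alpha_pq(d, p, q) * w for d, w in candidates]
    Z = sum(unnorm)
    return [u / Z for u in unnorm]
```

With p > 1 and q < 1, for example, the walk is biased away from revisiting the previous node and toward exploring outward.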
At step S504, the computer can determine a plurality of edge embeddings for a plurality of edges. For example, for each time step of the plurality of time steps, the computer can determine the plurality of edge embeddings for that time step, where the plurality of edge embeddings correspond to edges in the graph at a single time step. In some embodiments, the computer can utilize a modified version of edge2vec [21], to embed each snapshot's edge embeddings. The computer can determine the edge embeddings for the plurality of edges based on training a machine learning model that optimizes a loss function that takes into account global edge proximity, local edge proximity, and all-time edge proximity.
Given two edges, e1=(s1, t1) and e2=(s2, t2), the two edges have local edge proximity of 1 if either s1=s2 or t1=t2. Otherwise, the local proximity is 0. Their global edge proximity is defined by the similarity of their neighborhood vectors, which is defined as follows.
where wij denote the connection between vertices i and j, which can be directly retrieved from the graph's adjacency matrix.
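The local edge proximity defined above, and one plausible reading of the neighborhood vector built from the adjacency matrix, can be sketched as follows. This is illustrative only; the exact neighborhood vector construction of edge2vec [21] may differ.

```python
def local_proximity(e1, e2):
    # Two edges have local proximity 1 if they share a source
    # endpoint or share a target endpoint, and 0 otherwise.
    (s1, t1), (s2, t2) = e1, e2
    return 1 if s1 == s2 or t1 == t2 else 0

def neighborhood_vector(edge, adj):
    # One plausible neighborhood vector: concatenate the adjacency
    # rows of the edge's two endpoints; adj[i][j] is the weight w_ij.
    s, t = edge
    return adj[s] + adj[t]
```

Global edge proximity would then be measured by the similarity (e.g., cosine similarity) of two edges' neighborhood vectors.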
1) Global Edge Proximity: The edge2vec algorithm aims to preserve the global proximity of edges through optimizing the deep autoencoders. For each input in an autoencoder, a neighborhood vector ne is fed in and denoted as xe(0). For each layer n in the encoder, its input is the output from the previous layer after passing through a fully connected layer:
The decoder works in a symmetrical way, where the input of each layer n−1 of the decoder is the output of the previous layer n (the decoder's layers are numbered decreasingly) after passing through a fully connected layer:
For the autoencoder to be well trained, it should have as little difference as possible between the input at the beginning xe(0) and the final output of the decoder ye(0). This leads to the first loss to optimize, the global loss.
where Ie is an indicator vector for xe(0). Let Ie={Ie,i}, i=1, . . . , 2|V|, and xe(0)={xe,i(0)}, i=1, . . . , 2|V|, and the value of the indicator vectors can be determined as:
2) Local Edge Proximity: To preserve the local proximity of edges, edge2vec aims to pull embeddings of edges with local proximity closer, while pushing embeddings of edges with no local proximity further apart. To achieve this, edge2vec borrows the idea of skipgram from word2vec [26]. For each given edge pair ep=(e, e′), where e and e′ have local proximity of 1, it draws λ negative edges whose local proximity with e and e′ are both 0. The local proximity loss of a given edge pair is thus
The loss functions considered by the original edge2vec end here. However, so far the similarity and dissimilarity between edges are only considered within a single snapshot of the overall dynamic graph. In order to achieve the differentiation of edges across time, embodiments further provide an additional loss function.
3) All-time Edge Proximity: To differentiate whether an edge exists at a given time t in the dynamic graph or not, embodiments introduce an all-time edge proximity. The all-time edge proximity is a modification of the local edge proximity, where instead of having non-local-proximity edges as negative samples, the computer samples the edges that do not exist at the given time t but do exist at other times t′. In this way, the all-time edge proximity loss can be written as follows.
where e″ is an edge that is also present at time t (but is not necessary to have local proximity with e), and ej is a negative edge that does not exist in time t.
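The negative sampling step for the all-time edge proximity can be sketched as follows. This is an illustrative sketch, assuming the dynamic graph is given as a mapping from each time stamp to the set of edges present at that time; the function name is hypothetical.

```python
import random

def sample_alltime_negatives(edge, t, edges_by_time, k, rng=random):
    # Sample up to k negative edges e_j that are absent at time t
    # but present at some other time t' in the dynamic graph.
    present = edges_by_time[t]
    pool = {e for tt, es in edges_by_time.items() if tt != t for e in es}
    candidates = [e for e in pool if e not in present]
    return rng.sample(candidates, min(k, len(candidates)))
```

These negatives replace the non-local-proximity negatives of the original local loss, so that the learned embeddings distinguish when an edge exists, not just where.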
To this end, embodiments introduce the three losses that the modified edge embedding process (for example, edge2vec) aims to optimize. The overall objective of the modified edge2vec is therefore the sum of the three losses, as follows.
The finally obtained edge embedding at each timestamp t is then denoted as Xe(t).
At step S506, the computer can translate each node embedding of the plurality of node embeddings for each time of the time period into projected node embeddings. For example, the plurality of node embeddings can be normalized across each time step. Additionally, at step S508, the computer can translate each edge embedding of the plurality of edge embeddings for each time of the time period into projected edge embeddings. By translating each node embedding and edge embedding into a similar node embedding space and edge embedding space, respectively, the embeddings can be more accurately compared to one another.
As an example of translating embeddings, translating a node embedding will be discussed. It is understood that translating each node embedding into projected node embeddings is performed in a similar manner to translating each edge embedding into projected edge embeddings. For example, for each time stamp t's node embeddings Xv(t) and edge embeddings Xe(t) (where t≠), the computer can learn transition matrices
and
such that the projected embeddings are close to the embeddings at the final time stamp
(e.g., Xv(
) and Xe(
).
More specifically, a discriminator for projecting embeddings from t to is trained to differentiate between the embeddings sampled from and X*(
) and
. Formally, the discriminator's loss function can be written as follows.
By minimizing the loss above, the computer is able to leverage the obtained and
and get the final projected embeddings of nodes and edges across time as X*V and X*ε.
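The full projection described above uses an adversarially trained discriminator (e.g., as in MUSE [24]). As a simplified, illustrative stand-in for learning the transition matrix, a closed-form least-squares projection onto the final time stamp's embedding space can be sketched as follows; this is an assumption-laden simplification, not the claimed adversarial method.

```python
import numpy as np

def learn_projection(X_t, X_final):
    # Least-squares transition matrix W mapping time-t embeddings
    # onto the final time stamp's space: minimize ||X_t W - X_final||.
    # (The described embodiment instead trains W via a discriminator.)
    W, *_ = np.linalg.lstsq(X_t, X_final, rcond=None)
    return W
```

Applying the learned W to every earlier snapshot's embeddings yields the projected embeddings X*V and X*ε in a common space.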
At step S510, for each node in the plurality of nodes, the computer can aggregate the projected node embeddings for a particular node into an aggregated node embedding. The aggregated node embedding can represent a node for all time steps of the plurality of time steps. Additionally, at step S512, for each edge in the plurality of edges, the computer can aggregate the projected edge embeddings into an aggregated edge embedding. The aggregated edge embedding can represent an edge for all time steps of the plurality of time steps.
In some embodiments, the computer can perform steps S510 and S512 after steps S502 and S504, respectively. For example, after determining the node embeddings, the computer can aggregate the plurality of node embeddings or derivatives thereof into an aggregated node embedding. The derivatives of the plurality of node embeddings can be the plurality of projected node embeddings. Additionally, after determining the edge embeddings, the computer can aggregate the plurality of edge embeddings or derivatives thereof into an aggregated edge embedding. The derivatives of the plurality of edge embeddings can be the plurality of projected edge embeddings.
As an example of aggregating projected embeddings, aggregating projected node embeddings will be discussed. It is understood that aggregating each projected node embedding into an aggregated node embedding is performed in a similar manner to aggregating each edge embedding into an aggregated edge embedding. So far, the computer has generated node and edge embeddings across the time span of the dynamic graph , and projected them to the same embedding space as the last time stamp (e.g., t=
). To aggregate the embeddings of a given entity across time, the computer can feed the sequence of embeddings into a recurrent neural network, where a node/edge embedding at time t is an aggregation from its embeddings during 1 . . . t−1, as described below.
where Wh is the embedding matrix that converts a given node/edge embedding to a smaller dimension, and Uh is a transition matrix from t−1 to t.
As such, aggregating the projected node embeddings and aggregating the projected edge embeddings includes inputting the projected node embeddings into a first recurrent neural network to aggregate each node sequence of embeddings throughout the time period and inputting the projected edge embeddings into a second recurrent neural network to aggregate each edge sequence of embeddings throughout the time period. The first recurrent neural network and the second recurrent neural network can be executed according to equation 15 to determine the embedding matrix and the transition matrix. The computer can perform a machine learning process to determine (e.g., learn) the embedding matrix and the transition matrix for both the first recurrent neural network and the second recurrent neural network. The computer can then obtain the aggregated node embeddings and the aggregated edge embeddings using the learned embedding matrices and transition matrices.
After the aggregation, the computer holds (e.g., stores in memory) the aggregated node and edge embeddings, denoted as hv and he, respectively.
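The recurrent aggregation above can be sketched with a minimal RNN cell. This is illustrative only: the tanh nonlinearity is an assumption (the elided equation may differ), and bias terms are omitted.

```python
import numpy as np

def aggregate_sequence(X, W_h, U_h):
    # X: (T, d_in) sequence of one node's or edge's projected
    # embeddings across time. Recurrence (assumed form):
    #   h_t = tanh(W_h @ x_t + U_h @ h_{t-1})
    # Returns the final aggregated embedding h (e.g., hv or he).
    h = np.zeros(U_h.shape[0])
    for x in X:
        h = np.tanh(W_h @ x + U_h @ h)
    return h
```

W_h and U_h would be learned during training; separate instances would be used for the node and edge networks.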
At step S514, for each node in the plurality of nodes, the computer can classify the aggregated node embedding as being periodic or not periodic to produce node periodicity classifications. Additionally, at step S516, for each edge in the plurality of edges, the computer can classify the aggregated edge embedding as being periodic or not periodic to produce edge periodicity classifications. The computer can classify the aggregated node embeddings and the aggregated edge embeddings using a binary machine learning classification method. The binary machine learning classification method can label the input aggregated node embeddings and the input aggregated edge embeddings as being “periodic” or as being “not periodic.”
As an example of classifying aggregated embeddings, classifying an aggregated node embedding will be discussed. It is understood that classifying each aggregated node embedding is performed in a similar manner to classifying each aggregated edge embedding. For example, with the aggregated node embeddings hv and the aggregated edge embeddings he, binary periodicity classification and numerical periodicity detection can be performed, by the computer, as classification and regression tasks. For example, the computer can classify the aggregated node embeddings and the aggregated edge embeddings as follows:
where for both the binary periodicity classification and numerical periodicity detection tasks, the dimensions of yt are set to 1. The output of the binary classification task can be denoted as pv and pe, which can be leveraged for further representation learning of subgraphs, as discussed next.
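The two prediction heads can be sketched as follows. This is a minimal, illustrative sketch: linear heads with a sigmoid for the binary task are assumed, and bias terms are included only for completeness.

```python
import numpy as np

def periodicity_heads(h, W_cls, b_cls, W_reg, b_reg):
    # h: an aggregated node or edge embedding (hv or he).
    # Binary classification head -> probability of "periodic" (p),
    # regression head -> estimated numerical periodicity.
    # Both outputs have dimension 1, as described above.
    logit = W_cls @ h + b_cls
    p = 1.0 / (1.0 + np.exp(-logit))
    period = W_reg @ h + b_reg
    return p, period
```

The classification outputs pv and pe from such heads then feed the subgraph embedding generation as periodicity weights.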
At step S518, the computer can output node periodicity determinations based on the node periodicity classifications determined at step S514.
At step S520, the computer can output edge periodicity determinations based on the edge periodicity classifications determined at step S516.
At step S521, after obtaining the node embeddings and the node periodicity classifications for the plurality of nodes in the graph, and the edge embeddings and the edge periodicity classifications for the plurality of edges in the graph for each time of the time period, the computer can determine subgraph embeddings.
The computer can determine the subgraph embeddings based on a subgraph of the graph and based on times in the period (e.g., time steps), and the node embeddings for the nodes of the subgraph, and the edge embeddings for the edges of the subgraph, the node periodicity classifications for the nodes of the subgraph, and the edge periodicity classifications for the edges of the subgraph. The computer can determine a subgraph embedding for the subgraph at each time stamp.
At step S522, after determining the subgraph embeddings, the computer can translate each subgraph embedding of the subgraph embeddings for each time of the time period into projected subgraph embeddings of a plurality of projected subgraph embeddings. The subgraph embeddings can be projected into projected subgraph embeddings in a similar manner to the node embeddings being projected into projected node embeddings, as described at step S506.
For example, the computer can project each subgraph embedding, representing the subgraph at each timestamp, of the subgraph embeddings into a same coherent space. The computer can project each subgraph embedding into an embedding space of the last timestamp's subgraph embedding. The projection can be performed using a machine learning process to learn transition matrices, as described in further detail herein.
At step S524, after translating each subgraph embedding, the computer can aggregate, for the subgraph, the plurality of projected subgraph embeddings into an aggregated subgraph embedding. The aggregated subgraph embedding can represent the subgraph for all time steps. The plurality of projected subgraph embeddings can be aggregated into an aggregated subgraph embedding in a similar manner to the plurality of projected node embeddings being aggregated into an aggregated node embedding, described at step S510. For example, the aggregation can be performed using a recurrent neural network to learn an embedding matrix and a transition matrix, as described herein.
In some embodiments, the computer can perform step S524 after step S521. For example, after determining the subgraph embeddings, the computer can aggregate the plurality of subgraph embeddings or derivatives thereof into an aggregated subgraph embedding. The derivatives of the plurality of subgraph embeddings can be the plurality of projected subgraph embeddings.
At step S526, after determining the aggregated subgraph embedding, the computer can determine if the subgraph is periodic based upon the aggregated subgraph embedding, the node periodicity classifications, and the edge periodicity classifications. For example, the computer can pass the aggregated subgraph embedding into a fully connected recurrent neural network layer to predict the underlying periodicity of the subgraph. The recurrent neural network can be trained to identify whether or not an aggregated subgraph embedding is “periodic” or “not periodic” based on the node periodicity classifications and the edge periodicity classifications.
In some embodiments, determining if the subgraph is periodic can include the computer determining an estimated periodicity score based upon at least the aggregated subgraph embedding. The computer can then determine whether or not the estimated periodicity score exceeds a periodicity score threshold. The periodicity score threshold can be a predetermined value. If the estimated periodicity score exceeds the periodicity score threshold, the computer can output an indication that the subgraph is periodic. In some cases, the computer can output the estimated periodicity score. For example, the computer can determine an estimated periodicity score of 0.9 (on a scale of 0 to 1). The computer can compare the estimated periodicity score of 0.9 to a periodicity score threshold of 0.85. Since the estimated periodicity score of 0.9 exceeds the periodicity score threshold of 0.85, the computer can output an indication that the subgraph associated with the estimated periodicity score is periodic.
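The thresholding step described above is straightforward and can be sketched as follows, using the example values from the preceding paragraph; the function name and default threshold are illustrative.

```python
def is_periodic(estimated_score, threshold=0.85):
    # Output an indication that the subgraph is periodic when its
    # estimated periodicity score exceeds the (predetermined) threshold.
    return estimated_score > threshold

# The example above: score 0.9 against threshold 0.85.
result = is_periodic(0.9, threshold=0.85)
```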
There have been several proposals regarding the learning of subgraphs' representations. To incorporate the learned embedding and periodicity of nodes and edges from the steps above, some embodiments can utilize a modified SubGNN [22]. SubGNN is designed on top of a message passing framework [27], which makes it easy to consider nodes' and edges' embeddings as well as periodicity information.
Prior to step S602, a computer can determine a subgraph S of a dynamic graph. The computer can determine a plurality of subgraphs using a subgraph pattern mining process and then select the subgraph from the plurality of subgraphs. The computer can select a subgraph of interest from the plurality of subgraphs included in the dynamic graph.
At step S602, the computer can decompose the subgraph S into a sub-node graph S′v and a sub-edge graph S′e. The sub-node graph S′v can include information about the nodes of the subgraph S. The sub-edge graph S′e can include information about the edges of the subgraph S.
At step S604, after obtaining the sub-node graph S′v, the computer can process the sub-node graph S′v during steps S606-S614.
At step S606, the computer can obtain node embeddings for the plurality of nodes in the subgraph S as included in the sub-node graph S′v. The node embeddings can be created, for example, at step S502 as illustrated in
At step S608, the computer can determine node messages. The node messages can be data items that are created based on the node embeddings (determined at step S502 in
As an example, for the sub-node graph S′v, the computer can identify anchor patches of the subgraph S following the channels proposed in [22] and pass messages from the anchor patches to the given subgraph as follows.
where X is the channel, A is the corresponding anchor patch, and aX is the encoding of the anchor patch, which can be directly retrieved from the previously learned and projected node embeddings.
At step S610, the computer can obtain node periodicity classifications for the plurality of nodes included in the sub-node graph S′v. The node periodicity classifications can be generated at step S514 as illustrated in
At step S612, the computer can determine aggregated messages using the determined node messages and the sub-node graph. The aggregated messages can be determined using an order-invariant aggregation function (AGG). Furthermore, the node message for a particular node can be weighted by the node's periodicity (e.g., as determined at step S514 as illustrated in
The computer can then aggregate the messages weighted by the node periodicity classifications as follows.
where AGG is an order-invariant aggregation function and pvA is the periodicity of the anchor patch A based on the previously obtained pv.
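The periodicity-weighted, order-invariant aggregation can be sketched as follows. This is illustrative only: the mean is assumed as the AGG function (SubGNN [22] permits other order-invariant choices), and the weights stand in for the previously obtained periodicity scores pvA.

```python
import numpy as np

def aggregate_messages(messages, periodicity_weights):
    # messages: list of anchor-patch message vectors for a subgraph.
    # periodicity_weights: periodicity score p_vA of each anchor patch,
    # acting as an "embedding amplifier" on periodic patches.
    # Mean is used as the order-invariant AGG function (an assumption).
    weighted = [p * m for p, m in zip(periodicity_weights, messages)]
    return np.mean(weighted, axis=0)
```

Because the mean does not depend on the order of the messages, the aggregation is order-invariant as required.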
At step S614, the computer can generate embeddings for the nodes in the sub-node graph. For example, the computer can create subgraph-node embeddings based on the aggregated messages.
Steps S616-S626 are similar to steps S604-S614, but are performed with the edge subgraph and the associated edge data rather than the node subgraph and the associated node data, and will not be repeated here.
At step S628, the computer can combine the subgraph-node embeddings and the subgraph-edge embeddings determined at steps S614 and S626. For example, the computer can determine an overall subgraph embedding using the subgraph-node embeddings and the subgraph-edge embeddings.
The computer can determine the subgraph embeddings using the aggregated node messages and the aggregated edge messages (e.g., gX,c), a learnable weight matrix (WX), and an activation function (σ). For example, the embedding of the subgraph S′v is determined as follows.
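The final combination step can be sketched as follows. The tanh activation is an assumption standing in for the activation function σ; WX would be a learned weight matrix.

```python
import numpy as np

def subgraph_embedding(g, W):
    # g: aggregated messages g_X,c for the (sub-node or sub-edge) graph.
    # W: learnable weight matrix W_X. Returns sigma(W_X @ g_X,c),
    # with tanh assumed as the activation function sigma.
    return np.tanh(W @ g)
```

The sub-node and sub-edge embeddings produced this way are then combined at step S628 into the overall subgraph embedding.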
After determining the subgraph embedding, the computer can perform the steps following S521, as illustrated in
A summarized version of the training process method of the proposed system is illustrated in Algorithm 1, below.
Embodiments were evaluated with both a synthetic dataset and real-world datasets. For the real-world datasets, the Enron email dataset [29] and the arXiv citation dataset from the Open Graph Benchmark [30] were used. The periodicity detection tasks were conducted for both binary classification and regression. The results are shown in Table II, below.
Although the steps in the flowcharts and process flows described above are illustrated or described in a specific order, it is understood that embodiments of the invention may include methods that have the steps in different orders. In addition, steps may be omitted or added and may still be within embodiments of the invention.
Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C, C++, C#, Objective-C, Swift, or a scripting language such as Perl or Python using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. Suitable media include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.
Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.
The above description is illustrative and is not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of the disclosure. The scope of the invention should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the pending claims along with their full scope or equivalents.
One or more features from any embodiment may be combined with one or more features of any other embodiment without departing from the scope of the invention.
As used herein, the use of “a,” “an,” or “the” is intended to mean “at least one,” unless specifically indicated to the contrary.
The present application is a PCT application of and claims priority to U.S. Provisional Application 63/209,183, filed on Jun. 10, 2021, which is incorporated herein by reference for all purposes in its entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2022/033022 | 6/10/2022 | WO |
Number | Date | Country | |
---|---|---|---|
63209183 | Jun 2021 | US |