The present disclosure relates generally to the field of neural network technology. Specifically, the present disclosure relates to systems and methods for generating permutation invariant representations for graph convolutional neural networks.
Graph convolutional networks (GCNs) are a form of machine learning which utilize a graph's adjacency matrix to learn a set of latent node embeddings. Permuting the order of the nodes in the adjacency matrix leads to a corresponding reordering of the rows of the latent embedding matrix. As such, any network that produces an estimate over these embeddings would not be consistent across permutations of the node ordering.
The foregoing problem results in prior art machine learning systems needing to learn all possible permutations of every graph in a training set. Accordingly, such a requirement necessitates additional processing time, memory, and computational complexity. Therefore, there is a need for computer systems and methods which can generate a mapping such that any node ordering of a particular graph produces an identical result, thereby improving the ability of computer systems to more efficiently process data in a GCN. These and other needs are addressed by the computer systems and methods of the present disclosure.
The present disclosure relates to systems and methods for generating permutation invariant representations for graph convolutional neural networks. Specifically, the system includes one or more nodes having one or more features, a graph convolutional network (GCN), a permutation invariant mapping (PIM) Engine, and a fully connected network. The system generates a first matrix and a second matrix using the nodes and features. The first matrix and the second matrix are processed by the GCN and the GCN generates a set of node embeddings based on the first matrix and the second matrix. The set of node embeddings are processed by the PIM engine, where the PIM engine generates permutation data, such as a permutation invariant representation of a graph. To generate the permutation data, the PIM engine utilizes an ordering approach or a kernel approach. The permutation data is then processed by a fully connected network, which generates output data.
The foregoing features of the invention will be apparent from the following Detailed Description of the Invention, taken in connection with the accompanying drawings, in which:
The present disclosure relates to computer systems and methods for generating permutation invariant representations for graph convolutional neural networks, as described in detail below in connection with the accompanying drawings.
By way of background, an embedding is a relatively low-dimensional space into which a system can translate high-dimensional vectors. Embeddings make it easier to perform machine learning on large inputs, such as sparse vectors representing words. Generally, the embeddings are flattened in preparation for applying a deep network (a class of machine learning which uses multiple layers of non-linear processing units for feature transformation and extraction).
The set of node embeddings 22 is received as an input by the PIM engine 24. The PIM engine 24 generates permutation data 26, such as a permutation invariant representation of the graph (G). To generate the permutation data 26, the PIM engine 24 can use an ordering approach or a kernel approach, both of which are explained in greater detail below. The permutation data 26 is then received as an input by the fully connected network 28. The fully connected network 28 can be any type of neural network, such as a convolutional neural network, a deep neural network, a recurrent neural network, a machine learning system, etc. The fully connected network 28 then generates the output data 30, which can be expressed as a final estimate $\hat{p}$.
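By way of non-limiting illustration, the following Python sketch shows how the components described above could fit together end to end. The function names (pim_placeholder, fc_head), the dimensions, and the use of a column-wise sort as a stand-in for the PIM engine 24 are assumptions made purely for illustration and are not taken from the present disclosure.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pim_placeholder(Y):
    # Stand-in for the PIM engine 24: a column-wise descending sort
    # (the ordering approach is sketched in more detail after Equation 1).
    return np.sort(Y, axis=0)[::-1]

def fc_head(z, W1, b1, W2, b2):
    # Stand-in for the fully connected network 28, producing the final estimate p_hat.
    return relu(z @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 4))                  # node embeddings 22 produced by the GCN 20
z = pim_placeholder(Y).flatten()             # permutation data 26
W1, b1 = rng.normal(size=(z.size, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
p_hat = fc_head(z, W1, b1, W2, b2)           # output data 30 (final estimate)
```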
In step 46, the system 10 inputs the first matrix 16 and the second matrix 18 into the GCN 20. In step 48, the system 10 generates a set of node embeddings 22 using the GCN 20. By way of example, the set of node embeddings 22 can take the form of a latent feature matrix (denoted as $Y \in \mathbb{R}^{n \times d}$). However, those skilled in the art would understand that the GCN 20 can generate other types of matrices that can be utilized by the system 10 of the present disclosure. To generate the latent feature matrix, a first layer can be denoted by $H_1 = \sigma(\hat{A} X W_1)$ and subsequent layers can be denoted by $H_{i+1} = \sigma(\hat{A} H_i W_{i+1})$. The GCN 20 is permutation equivariant/covariant: for any valid permutation matrix $\pi$, if $\mathrm{GCN}(A, X) = Y$, then $\mathrm{GCN}(\pi A \pi^{T}, \pi X) = \pi Y$.
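By way of non-limiting illustration, a minimal numerical sketch of the propagation rule and of the equivariance property stated above is provided below (in Python, using NumPy). The random matrix standing in for the normalized adjacency Â, and the dimensions used, are assumptions for illustration only.

```python
import numpy as np

def gcn_layer(A_hat, H, W):
    # One propagation step H_{i+1} = sigma(A_hat @ H_i @ W_{i+1}), with ReLU as sigma.
    return np.maximum(A_hat @ H @ W, 0.0)

rng = np.random.default_rng(1)
n, f, d = 6, 3, 4
A_hat = rng.random(size=(n, n))              # stand-in for a normalized adjacency matrix
X = rng.normal(size=(n, f))                  # node feature matrix
W1 = rng.normal(size=(f, d))                 # layer weights

P = np.eye(n)[rng.permutation(n)]            # a permutation matrix pi
Y = gcn_layer(A_hat, X, W1)
Y_perm = gcn_layer(P @ A_hat @ P.T, P @ X, W1)

# Equivariance check: GCN(pi A pi^T, pi X) == pi GCN(A, X)
assert np.allclose(Y_perm, P @ Y)
```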
In step 50, the system 10 inputs the set of node embeddings 22 into the PIM engine 24 and generates permutation data 26, which is invariant to row permutations of the set of node embeddings 22 (e.g., $Y \in \mathbb{R}^{n \times d}$). Specifically, the PIM engine 24 takes an equivalence relation $\sim$ on $\mathbb{R}^{n \times d}$ such that, given $V, V' \in \mathbb{R}^{n \times d}$, $V \sim V'$ if $\exists \pi \in S_n$ such that $V' = \pi V$. Next, the PIM engine 24 determines a mapping $\phi : \mathbb{R}^{n \times d} \to \mathbb{R}^{m \times D}$ such that the following properties are observed: (1) permutation invariance: if $V \sim V'$ then $\phi(V) = \phi(V')$; (2) injectivity modulo permutations: if $\phi(V) = \phi(V')$, then $V \sim V'$; and (3) Lipschitz continuity: $\|\phi(V) - \phi(V')\|_2 \le L \min_{\pi \in S_n} \|V - \pi V'\|_2$.
The PIM engine 24 then generates Z using, for example, the ordering approach 60 (as shown in the accompanying drawings).
Let $\lambda : \mathbb{R}^n \to \mathbb{R}^n$, $\lambda(y) = (y_{\pi(k)})_{k=1}^{n}$, where $y_{\pi(1)} \ge \ldots \ge y_{\pi(n)}$. Then:

$$Z^{(i)} = \lambda(\hat{Y}_i) \qquad \text{(Equation 1)}$$
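By way of non-limiting illustration, a short Python sketch of this ordering approach is provided below. It assumes that the sorting operator λ of Equation 1 is applied to each column of the embedding matrix independently; that column-wise reading, and the dimensions used, are assumptions for illustration rather than a statement of the disclosed method.

```python
import numpy as np

def ordering_map(Y):
    # Apply the sorting operator lambda to each column of Y independently,
    # so the result does not depend on the ordering of the rows (nodes) of Y.
    return np.sort(Y, axis=0)[::-1]          # y_pi(1) >= ... >= y_pi(n) per column

rng = np.random.default_rng(2)
Y = rng.normal(size=(5, 3))                  # latent feature matrix
P = np.eye(5)[rng.permutation(5)]            # a permutation of the node ordering

# Permuting the node ordering leaves the ordering-based representation unchanged.
assert np.allclose(ordering_map(Y), ordering_map(P @ Y))
```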
In step 76, the system 10 generates a set of node embeddings ($Z \in \mathbb{R}^m$), where $y^{(i)} \in \mathbb{R}^d$ represents the $i$th row of $Y$ (transposed to a column vector), and where $Z_i$ is expressed by Equation 3, below:
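Equation 3 itself is not reproduced in this excerpt. Purely as a non-limiting illustration of how a kernel approach of this general shape could be realized, the following Python sketch sums a Gaussian kernel evaluated between every row $y^{(i)}$ and a set of fixed reference vectors; the Gaussian kernel, the reference vectors W, and the summation over rows are all assumptions and should not be read as the disclosed Equation 3.

```python
import numpy as np

def kernel_map(Y, W, sigma=1.0):
    # Hypothetical kernel-style mapping: each entry Z_j aggregates a Gaussian
    # kernel between every row y_(i) of Y and a reference vector w_j.
    # Summing over the rows makes Z invariant to the node ordering.
    d2 = ((Y[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1)   # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2)).sum(axis=0)        # Z in R^m

rng = np.random.default_rng(3)
Y = rng.normal(size=(5, 4))                  # node embeddings (rows y_(i))
W = rng.normal(size=(7, 4))                  # assumed reference vectors
P = np.eye(5)[rng.permutation(5)]            # a permutation of the node ordering

assert np.allclose(kernel_map(Y, W), kernel_map(P @ Y, W))
```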
Testing and analysis of the above systems and methods will now be discussed in greater detail. The system 10 of the present disclosure was run on the QM9 dataset, which comprises approximately one hundred thirty-four thousand (134,000) chemical compounds along with thirteen computationally derived quantum chemical properties for each compound. The system 10 performed regression over these values, where the norm of the static polarizability α (Bohr³) was observed.
The functionality provided by the present disclosure could be provided by computer software code 106, which could be embodied as computer-readable program code stored on the storage device 104 and executed by the CPU 112 using any suitable high- or low-level computing language, such as Python, Java, C, C++, C#, .NET, MATLAB, etc. The network interface 108 could include an Ethernet network interface device, a wireless network interface device, or any other suitable device which permits the server 102 to communicate via the network. The CPU 112 could include any suitable single-core or multiple-core microprocessor of any suitable architecture that is capable of implementing and running the computer software code 106 (e.g., an Intel processor). The random access memory 114 could include any suitable high-speed random access memory typical of most modern computers, such as dynamic RAM (DRAM), etc.
Having thus described the system and method in detail, it is to be understood that the foregoing description is not intended to limit the spirit or scope thereof. It will be understood that the embodiments of the present disclosure described herein are merely exemplary and that a person skilled in the art can make any variations and modification without departing from the spirit and scope of the disclosure. All such variations and modifications, including those discussed above, are intended to be included within the scope of the disclosure.
The present application claims the benefit of U.S. Provisional Application Ser. No. 62/857,947 filed on Jun. 6, 2019, the entire disclosure of which is expressly incorporated herein by reference.