The subject matter disclosed herein relates to data analysis, and in particular to the use of machine learning techniques, such as deep learning, to analyze data, including graph inputs.
Artificial neural networks are a learning tool used in a variety of data analysis tasks, such as data classification, regression, de-noising, dimensionality reduction, and so forth. In addition to optimizing the performance of neural networks on these tasks, there is also a need to interpret the outputs of the neural network and its learned parameters for a greater understanding of the specific problem and domain.
In the context of graph learning applications, the use of artificial neural networks may be unsuitable for achieving an understanding of which nodes of the graph data being analyzed contribute more to the analysis results. In particular, processing by the neural network is typically done in such a manner that the underlying graph topology is hidden from the learning unit. This may hinder interpretation of the data analysis results, as it may be difficult to determine which nodes of the graphs contribute more to the analysis results, e.g., the regression or classification results.
Certain embodiments commensurate in scope with the originally claimed subject matter are summarized below. These embodiments are not intended to limit the scope of the claimed subject matter, but rather these embodiments are intended only to provide a brief summary of possible embodiments. Indeed, the invention may encompass a variety of forms that may be similar to or different from the embodiments set forth below.
In one aspect of the present approach, a neural network is provided. In accordance with this aspect, the neural network comprises an input layer, a plurality of hidden layers, and an output layer downstream from the plurality of hidden layers. The output layer is configured to provide an output of the neural network. The plurality of hidden layers includes a first hidden layer configured as a graph-node layer. When trained, the graph-node layer encodes edge incidence information related to input graph data and constrains the data relationships analyzed by downstream hidden layers.
In a further aspect of the present approach, a method for processing graph inputs is provided. In accordance with this method, graph data is received as an input at an input layer of a neural network. The graph data is constrained based on edge relationships at a graph-node layer prior to the constrained graph data being processed by one or more hidden layers of the neural network. The constrained graph data is processed using the one or more hidden layers. An output of the one or more hidden layers is generated at an output layer of the neural network.
In an additional aspect of the present approach, one or more non-transitory computer-readable media encoding processor-executable routines are provided. In accordance with this aspect, the routines, when executed by a processor, cause acts to be performed comprising: receiving graph data as an input at an input layer of a neural network; constraining the graph data based on edge relationships at a graph-node layer prior to the constrained graph data being processed by one or more hidden layers of the neural network; processing the constrained graph data using the one or more hidden layers; and generating an output of the one or more hidden layers at an output layer of the neural network.
These and other features, aspects, and advantages of the present invention will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
While aspects of the following discussion are provided in the context of analysis of medical data, it should be appreciated that the present techniques are not limited to such medical contexts. Indeed, the provision of examples and explanations in such a medical context is only to facilitate explanation by providing instances of real-world implementations and applications. However, the present approaches may also be utilized in other suitable data analysis contexts. In general, the present approaches may be useful in any data analysis context where graph inputs are analyzed. As used herein, such graphs or graph inputs may be understood as encompassing graphs (V, E), where each graph is defined by its vertices (nodes, V) and the edges (E) that connect the vertices. The topology of such a graph may be obtained or realized by “gluing” together edges coincident on the same vertex.
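By way of a non-limiting, hypothetical illustration (the Python code and names below are illustrative assumptions, not part of any claimed embodiment), such a graph may be represented by a weighted adjacency matrix whose entries correspond to its edges:

import numpy as np

# A small undirected graph (V, E) with four vertices; each edge (i, j)
# carries a weight, e.g., a measured relationship between nodes i and j.
edges = {(0, 1): 0.8, (0, 2): 0.3, (1, 2): 0.5, (2, 3): 0.9}

# Symmetric weighted adjacency matrix: A[i, j] holds the weight of edge
# (i, j); zero entries indicate the absence of an edge.
n_nodes = 4
A = np.zeros((n_nodes, n_nodes))
for (i, j), w in edges.items():
    A[i, j] = w
    A[j, i] = w  # undirected: the edge is coincident on both vertices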
The present approach addresses certain challenges associated with data analysis of unstructured data, including graph inputs. Such unstructured data may be generated by spatially distributed but connected sensors and/or interacting multi-agent systems, and thus differing sensor measurements may be associated with different nodes, with the nodes themselves potentially having a spatial relationship to one another. By way of a practical example, brain imaging may utilize spatially distributed but connected sensors to measure electrical activity in the brain, generating graph data as an output.
When analyzing such graph data using deep learning techniques, such data (as discussed in greater detail below) is “flattened”, which may lead to the loss of certain informational aspects of the data. This may reduce the interpretability of the significance of different nodes or node locations in the analyzed data. Such a lack of interpretability may have real-world consequences. For example, in a healthcare context, an expert clinician or radiologist may want to know, along with a prediction, which features (i.e., nodes) contributed to a specific disease condition. Similarly, in multi-sensor or multi-agent systems, it might be useful to identify the contributions of individual sensors or agents towards the common goal.
In addition, other approaches may compute node-based metrics from the graphs and input these metrics to a neural network, allowing weights from the input layer to be used as a measure of node influence. However, node-based metrics oversimplify the graph representation, and complete reconstruction of the original data is not possible. Hence, such approaches may lead to underperformance and inaccurate results.
The present approach addresses these issues by encoding the explicit graph topology in the artificial neural network analysis, thereby improving the interpretability of the relevance of individual or groups of graph nodes and facilitating diagnostic and/or predictive aspects of the analysis. In particular, in one approach a node-based layer is inserted to encode graph topology while inputting data from graphs (as adjacency matrices) to an artificial neural network. By introducing an extra layer that encodes topology, the “graph” structure is retained and transmitted through the neural network, which aids interpretability of the results. Once such a network is trained, the weights of the node-based layer indicate the relative influence of individual nodes in performing the learning task.
With the preceding introductory comments in mind, aspects of the present approaches described herein utilize neural networks in the analysis of unstructured data, including graph data. Neural networks as discussed herein may encompass deep neural networks, fully connected networks, convolutional neural networks (CNNs), perceptrons, autoencoders, recurrent networks, wavelet filter banks, or other neural network architectures. These techniques are referred to herein as deep learning techniques, though this terminology may also be used specifically in reference to the use of deep neural networks, i.e., neural networks having a plurality of layers.
As discussed herein, deep learning techniques (which may also be known as deep machine learning, hierarchical learning, or deep structured learning) are a branch of machine learning techniques that employ mathematical representations of data and artificial neural networks for learning. By way of example, deep learning approaches may be characterized by their use of one or more algorithms to extract or model high-level abstractions of a type of data of interest. This may be accomplished using one or more processing layers, with each layer typically corresponding to a different level of data abstraction and, therefore, potentially employing or utilizing different aspects of the initial data or of the outputs of a preceding layer (i.e., a hierarchy or cascade of layers) as the target of the processes or algorithms of a given layer. In a data analysis context, this may be characterized as different layers corresponding to different feature levels or levels of abstraction in the data.
In general, the processing from one level or abstraction to the next can be considered as one ‘stage’ of the analysis process. Each stage of the analysis can be performed by separate neural networks or by different parts of one larger neural network. For example, as discussed herein, a single deep learning network may cover all stages in an analytic process (e.g., from an initial input to an output data set). Alternatively, separate distinct deep learning network(s) may each cover only one stage (or a subset of stages) of the overall analysis process.
As part of the initial training of deep learning processes to solve a particular problem, training data sets may be employed that have known initial values and known or desired values for a final output of the deep learning process. The training of a single stage may have known input values corresponding to one representation space and known output values corresponding to a next-level representation space. In this manner, the deep learning algorithms may process (either in a supervised or guided manner or in an unsupervised or unguided manner) the known or training data sets until the mathematical relationships between the initial data and the desired output(s) are discerned and/or the mathematical relationships between the inputs and outputs of each layer are discerned and characterized. Similarly, separate validation data sets may be employed in which both the initial and desired target values are known, but only the initial values are supplied to the trained deep learning algorithms, with the outputs of the deep learning algorithm then being compared to the known target values to validate the prior training and/or to prevent over-training.
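By way of a hypothetical illustration of this training-and-validation flow (Python/PyTorch code; the model architecture, optimizer, loss, and epoch count are illustrative assumptions, not a prescribed implementation):

import torch
import torch.nn as nn

# Toy supervised setup: known inputs and known target outputs.
x_train, y_train = torch.rand(64, 10), torch.rand(64, 1)
x_val, y_val = torch.rand(16, 10), torch.rand(16, 1)

model = nn.Sequential(nn.Linear(10, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for epoch in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)  # discern input-output mapping
    loss.backward()
    optimizer.step()

# Validation: only the initial values are supplied; the outputs are then
# compared to the known targets to validate training and detect over-training.
with torch.no_grad():
    val_loss = loss_fn(model(x_val), y_val)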
With the preceding in mind, the loss or error function 62 measures the difference between the network output 60 and the training target 64. In certain implementations, the loss function may be a mean squared error (MSE). Alternatively, the loss function 62 could be defined by other metrics associated with the particular task in question, such as a softmax function.
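By way of a hypothetical illustration (Python code with names chosen for explanation only), such an MSE loss may be computed as follows:

import numpy as np

def mse_loss(network_output, training_target):
    # Mean squared error between the network output 60 and the training target 64.
    return np.mean((network_output - training_target) ** 2)

loss = mse_loss(np.array([0.9, 0.1, 0.4]), np.array([1.0, 0.0, 0.5]))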
With the preceding in mind, the neural network 50 may be trained for use in the analysis of unstructured data, including graph data, as discussed herein.
The task of the trained neural network might be to regress or classify graph inputs. However, as noted above, depending on the actual application, there may be significance to understanding which nodes of the graphs contributed more to the regression and/or classification analysis. In particular, in conventional approaches, the graph is fed to a neural network as a flattened vector of its adjacency matrix. This conventional approach obscures or hides the underlying graph topology from the learning unit. As discussed herein, the present approach instead preserves the graph topology (i.e., spatial information derived from the adjacency matrix) to enable or otherwise facilitate node-based inferences. In one example, this is accomplished by adding a layer for nodes to the neural network 50 next to the input layer 54, as discussed in greater detail below. The nodal influences are computed from the weights of the added node layer in the trained neural network 50.
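For illustration, the conventional flattening step may be sketched as follows (hypothetical Python code; a minimal sketch assuming an undirected graph, so that only the upper triangle of the adjacency matrix is retained):

import numpy as np

# Conventional approach: the graph is fed to the network as a flattened
# vector of its adjacency matrix. The flattening discards which entries
# share a vertex, hiding the underlying topology from the learning unit.
rng = np.random.default_rng(0)
A = rng.random((4, 4))
A = (A + A.T) / 2                        # symmetric adjacency matrix
flat_input = A[np.triu_indices(4, k=1)]  # one entry per edge (i < j)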
By way of example, an input neuron for the edge (i, j) is connected to the neurons corresponding to nodes i and j. Thus, the ith node in the new layer is connected to all inputs corresponding to the ith row and ith column of the adjacency matrix 140. Since the ith row and ith column of the adjacency matrix 140 represent the edges that connect to the ith graph-node, the graph topology information is thus fed to the topology aware neural network 100. Each input neuron has only two connections, which reduces the number of parameters compared to using a dense layer (i.e., where each input is connected to each output).
This may be expressed as:

W1(i, j) ≠ 0 only if edge i is incident on node j; otherwise, W1(i, j) = 0,

where W1 is the weight matrix of the first hidden layer (i.e., the graph-node layer 150), i is the neuron index of the input layer 54 (i.e., the input edge index), and j is the neuron index of the first hidden layer (i.e., the graph-node layer 150), and hence the graph node index.
This equation is illustrated diagrammatically in the accompanying drawings.
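A minimal sketch of such a graph-node layer is given below (hypothetical Python/PyTorch code; the class name, initialization, and activation are illustrative assumptions rather than a definitive implementation). It enforces the constraint above by applying a fixed binary incidence mask to the weights of an otherwise dense layer:

import torch
import torch.nn as nn

class GraphNodeLayer(nn.Module):
    # First hidden layer with one neuron per graph node; neuron j is
    # connected only to the input neurons for edges incident on node j.
    def __init__(self, edges, n_nodes):
        super().__init__()
        n_edges = len(edges)
        self.weight = nn.Parameter(0.01 * torch.randn(n_edges, n_nodes))
        # Binary incidence mask: mask[i, j] = 1 iff node j is an endpoint
        # of edge i, so each input neuron keeps only two connections.
        mask = torch.zeros(n_edges, n_nodes)
        for i, (u, v) in enumerate(edges):
            mask[i, u] = 1.0
            mask[i, v] = 1.0
        self.register_buffer("mask", mask)

    def forward(self, x):
        # Masked-out entries never contribute, so the graph topology is
        # retained and transmitted through the network.
        return torch.relu(x @ (self.weight * self.mask))

# Usage: inputs are edge-weight vectors ordered to match `edges`.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
layer = GraphNodeLayer(edges, n_nodes=4)
node_activations = layer(torch.rand(8, len(edges)))  # a batch of 8 graphs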
In a further implementation, it may be noted that, in many cases of graph learning, the edge list is not exhaustive. Thus, some of the edges may not be a part of any of the input graph instances. In such an instance, it is possible to obtain a superset of edges from the inputs and to use this superset as an input feature vector for the neural network. Thus, the topology of the graph is fixed and the input instances vary only in their edge weights.
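This may be sketched as follows (hypothetical Python code using toy instances for illustration):

# Fix the topology by taking the union (superset) of all edges observed
# across the input graph instances; each instance maps an edge (i, j)
# to its weight.
graph_instances = [{(0, 1): 0.8, (1, 2): 0.5},
                   {(0, 1): 0.2, (2, 3): 0.9}]
edge_superset = sorted(set().union(*(g.keys() for g in graph_instances)))

# Each instance becomes a feature vector over the same fixed edge list,
# with zeros for edges absent from that instance.
features = [[g.get(e, 0.0) for e in edge_superset] for g in graph_instances]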
The trained graph-node layer constitutes a set of zero and non-zero weights, with the non-zero weights corresponding to edge incidence and the zero weights corresponding to a lack of edge incidence. The non-zero weights may span a range of values greater than zero and less than or equal to 1, with the magnitude of each weight conveying useful information. Correspondingly, the trained weights 198 of the node-based layer 150 may be analyzed to derive indications of relative or absolute nodal influence 200 in the analysis performed by the topology aware graph neural network 100. Such measures of nodal influence may be used to interpret the significance of different nodes or node locations in the analyzed data.
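Continuing the hypothetical sketch above, one possible (assumed, not prescribed) measure of nodal influence aggregates the magnitudes of each node's trained incoming weights:

import torch

def nodal_influence(layer):
    # Sum the magnitudes of each node's trained (masked) incoming weights;
    # zero-masked entries correspond to a lack of edge incidence.
    with torch.no_grad():
        return (layer.weight * layer.mask).abs().sum(dim=0)

influence = nodal_influence(layer)  # one influence score per graph node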
The present approach was applied in a functional magnetic resonance imaging (fMRI) study. In particular, the present framework was applied to resting state fMRI based functional connectivity graph data (total: 100 scans from 10 volunteers) for a reconstruction task with an autoencoder neural network, and nodal influence measures were extracted from the trained network. In this context, each sensor generates time series data particular to its node, which corresponds to a spatial location in the brain. The graph input or graph data particular to a node (i.e., a spatial location in the brain) corresponds to the correlation coefficients between the time course of that location and the time courses of other locations in the brain. Processing of the time series data in accordance with the present approach yields functional brain network data, including analysis of the significance of respective nodes corresponding to respective measurement sites, and thereby to respective brain structures, which may be useful in data interpretation.
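The construction of such a functional connectivity graph may be sketched as follows (hypothetical Python code using simulated time series, not the study data):

import numpy as np

# Simulated multichannel time series: one row per spatial location.
rng = np.random.default_rng(0)
timeseries = rng.standard_normal((4, 200))  # 4 locations, 200 time points

# Functional connectivity: the adjacency matrix holds the correlation
# coefficient between the time courses of each pair of locations.
A = np.corrcoef(timeseries)
np.fill_diagonal(A, 0.0)  # discard self-correlations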
As will be appreciated, some or all of the preceding aspects may be performed or otherwise implemented using a processor-based system, such as the computing device 280 discussed below.
As illustrated, the computing device 280 may include various hardware components, such as one or more processors 282, one or more busses 284, memory 286, input structures 288, a power source 290, a network interface 292, a user interface 294, and/or other computer components useful in performing the functions described herein.
The one or more processors 282 are, in certain implementations, microprocessors configured to execute instructions stored in the memory 286 or other accessible locations. Alternatively, the one or more processors 282 may be implemented as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or other devices designed to perform functions discussed herein in a dedicated manner. As will be appreciated, multiple processors 282 or processing components may be used to perform functions discussed herein in a distributed or parallel manner.
The memory 286 may encompass any tangible, non-transitory medium for storing data or executable routines, including volatile memory, non-volatile memory, or any combination thereof. Although shown for convenience as a single block, the memory 286 may actually encompass various discrete media in the same or different physical locations.
The input structures 288 are used to allow a user to input data and/or commands to the device 280 and may include mice, touchpads, touchscreens, keyboards, and so forth. The power source 290 can be any suitable source for providing power to the various components of the computing device 280, including line and battery power. In the depicted example, the device 280 includes a network interface 292. Such a network interface 292 may allow communication with other devices on a network using one or more communication protocols. In the depicted example, the device 280 also includes a user interface 294, such as a display configured to display images or data provided by the one or more processors 282.
As will be appreciated, in a real-world context a processor-based system, such as the computing device 280 discussed above, may be employed to implement some or all of the present approach.
Technical effects of the invention include adding information about graph topology, along with edge weights, as a first hidden layer of a neural network. In this manner, better spatial information is transferred to the neural network, which serves to aid more accurate learning. For example, in neuroimaging data, potential biomarkers can be derived from graph analysis using the nodal layer.
This written description uses examples to disclose the invention, including the best mode, and also to enable any person skilled in the art to practice the invention, including making and using any devices or systems and performing any incorporated methods. The patentable scope of the invention is defined by the claims, and may include other examples that occur to those skilled in the art. Such other examples are intended to be within the scope of the claims if they have structural elements that do not differ from the literal language of the claims, or if they include equivalent structural elements with insubstantial differences from the literal languages of the claims.