The present invention relates to a system for computing conditional probabilities of random variables for structure learning and Bayesian inference and, more particularly, to a system for computing conditional probabilities of random variables for structure learning and Bayesian inference using a unique spiking neural network.
In machine learning, Bayesian inference is a popular framework for making decisions by estimating the conditional dependencies between different variables in the data. The inference task is often computationally expensive and performed using conventional digital computers.
One prior method for probabilistic computation uses synaptic updating to perform Bayesian inference (see Literature Reference No. 1 of the List of Incorporated Literature References), but the method is only a mathematical theory and not biologically plausible.
Thus, a continuing need exists for an approach that is biologically plausible and, therefore, easy to implement in neuromorphic hardware.
The present invention relates to a system for computing conditional probabilities of random variables for structure learning and Bayesian inference, and more particularly, to a system for computing conditional probabilities of random variables for structure learning and Bayesian inference using a unique spiking neural network. The system comprises a neuromorphic hardware for implementing a spiking neural network comprising a plurality of neurons to compute the conditional probability of two random variables X and Y according to the following:
w*P(X)=P(X,Y),
where P denotes probability, and w denotes a synaptic weight between a first neuron and a connected second neuron. An X neuron and a Y neuron are configured to spike along with the random variables X and Y. The spiking neural network comprises an increment path for w that is proportional to a product of w*P(X), a decrement path for w that is proportional to P(X, Y), and delay and spike timing dependent plasticity (STDP) parameters such that w increases and decreases with the same magnitude for a single firing event.
In another aspect, the spiking neural network comprises a plurality of synapses, wherein all neurons, except for the B neuron, have the same threshold voltage, and wherein the synaptic weight w between the A neuron and the B neuron is the only synapse that has STDP, wherein all other synapses have a fixed weight that is designed to trigger post-synaptic neurons when pre-synaptic neurons fire.
In another aspect, a sign of the STDP is inverted such that if the A neuron spikes before the B neuron, the synaptic weight w is decreased.
In another aspect, the spiking neural network further comprises an XY neuron connected with both the A neuron and the B neuron, and wherein a delay is imposed between the XY neuron and the A neuron, which causes an increase in the synaptic weight w.
In another aspect, when the X neuron fires, the B neuron spikes after the A neuron in proportion to the synaptic weight w, such that a spiking rate for the B neuron depends on a product between a spiking rate of the X neuron and the synaptic weight w.
In another aspect, the spiking neural network implemented by the neuromorphic hardware further comprises a subtractor circuit, and the subtractor circuit is used to compare the random variables X and Y.
Finally, the present invention also includes a computer implemented method. The computer implemented method includes an act of causing a computer to execute instructions and perform the resulting operations.
The objects, features and advantages of the present invention will be apparent from the following detailed descriptions of the various aspects of the invention in conjunction with reference to the following drawings, where:
The present invention relates to a system for computing conditional probabilities of random variables for structure learning and Bayesian inference and, more particularly, to a system for computing conditional probabilities of random variables for structure learning and Bayesian inference using a unique spiking neural network. The following description is presented to enable one of ordinary skill in the art to make and use the invention and to incorporate it in the context of particular applications. Various modifications, as well as a variety of uses in different applications will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to a wide range of aspects. Thus, the present invention is not intended to be limited to the aspects presented, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
In the following detailed description, numerous specific details are set forth in order to provide a more thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without necessarily being limited to these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.
The reader's attention is directed to all papers and documents which are filed concurrently with this specification and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference. All the features disclosed in this specification, (including any accompanying claims, abstract, and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
Furthermore, any element in a claim that does not explicitly state “means for” performing a specified function, or “step for” performing a specific function, is not to be interpreted as a “means” or “step” clause as specified in 35 U.S.C. Section 112, Paragraph 6. In particular, the use of “step of” or “act of” in the claims herein is not intended to invoke the provisions of 35 U.S.C. 112, Paragraph 6.
Before describing the invention in detail, first a list of cited references is provided. Next, a description of the various principal aspects of the present invention is provided. Finally, specific details of various embodiments of the present invention are provided to give an understanding of the specific aspects.
The following references are cited and incorporated throughout this application. For clarity and convenience, the references are listed herein as a central resource for the reader. The following references are hereby incorporated by reference as though fully set forth herein. The references are cited in the application by referring to the corresponding literature reference number, as follows:
Various embodiments of the invention include three “principal” aspects. The first is a system for computing conditional probabilities of random variables for structure learning and Bayesian inference. The system is typically in the form of a computer system operating software (e.g., neuromorphic hardware) or in the form of a “hard-coded” instruction set. This system may be incorporated into a wide variety of devices that provide different functionalities. Neuromorphic hardware is any electronic device which mimics the natural biological structures of the nervous system. The implementation of neuromorphic computing on the hardware level can be realized by oxide-based memristors, spintronic memories, threshold switches, and transistors. The second principal aspect is a method, typically in the form of software, implemented using the neuromorphic hardware (digital computer).
The digital computer system (neuromorphic hardware) is configured to perform calculations, processes, operations, and/or functions associated with a program or algorithm. In one aspect, certain processes and steps discussed herein are realized as a series of instructions (e.g., software program) that reside within computer readable memory units and are executed by a digital computer. When executed, the instructions cause the digital computer to perform specific actions and exhibit specific behavior, such as described herein.
In machine learning, Bayesian inference is a popular framework for making decisions by estimating the conditional dependencies between different variables in the data. The inference task is often computationally expensive and performed using conventional digital computers. Described herein is a unique spiking neural network to compute the conditional probabilities of random variables for structure learning and Bayesian inference. “Random variables” is a statistical term meaning that their values over time are taken from a kind of random distribution. The X and Y neurons are made to spike along with these random variables. In addition to drastically reducing the number of required neurons in the network by inverting the spike-timing-dependent plasticity (STDP) parameters and improving the accuracy by employing a dynamic threshold, the spiking neural network has a new network topology that enables efficient computation beyond the state-of-the-art methods.
The advantages of using a spiking neural network to calculate conditional probabilities are two-fold: energy-efficiency and parallelism. Spiking neural networks are often more energy-efficient than conventional computers due to the elimination of high frequency clocking operations in digital circuits. Neural computations are event-driven, which means that they consume energy only when necessary or when new information is available. Moreover, neural networks are highly parallel. Data is processed simultaneously through multiple neural pathways. These two characteristics are utilized in the system described herein to devise a power-efficient, massively-parallel machine for tackling the Bayesian inference task, as will be described in further detail below.
Neuron B (element 102) has a voltage threshold which, if exceeded, causes it to spike. Spikes from neuron A (element 100) weighted by w accumulate voltage in neuron B (element 102) until this threshold is reached, at which point neuron B (element 102) spikes, w is incremented, and the voltage level is reset to zero. This means that if w=0, $\dot{w}=0$, and if w increases $\dot{w}$ also increases (i.e., there is a monotonic mapping between w and $\dot{w}$). Since the likelihood of neuron B (element 102) spiking (and w being incremented) is directly proportional to the value of w, write $\dot{w}=w$ for the instantaneous change in w at time t. The goal is to use these dynamics to create a conditional probabilistic computation unit, which can then be used to model chains of conditional variables corresponding to larger Bayesian networks.
(3.1) Neural Network Topology of the Probabilistic Computation Unit (PCU+)
(3.1.1) Conditional Probability Computation
The neural network topology described herein is capable of computing the conditional probability of two random variables (X and Y). From Bayes' theorem, the conditional probability can be computed as
P(Y|X)=P(X,Y)/P(X), (1)
which is stored in the synaptic weight w between neuron A (element 100) and neuron B (element 102) after the computation. Rearranging Equation (1), an equation is obtained that describes the desired equilibrium of the neural network as follows:
w*P(X)=P(X,Y) (2)
When the left-hand side of Equation (2) equals the right-hand side, w equals the conditional probability, P(Y|X). It is, therefore, the goal to design a neural network that has the following characteristics:
To realize the functions listed above, the PCU+ architecture shown in
where Vi is the membrane potential in volts for the ith neuron. Ij is an indicator function for identifying which of the pre-synaptic neurons have fired:
After updating the membrane potentials for each of the neurons, the membrane potentials are compared with the threshold voltages Vthreshold to determine if the neuron fires or not. Vthreshold is set to a constant value (e.g., one volt) except for neuron B (element 102), whose threshold is adjusted dynamically as described later. After the firing event, the membrane potential is reset to Vreset; Vreset is zero for all neurons. The method described herein uses reverse STDP update rules and the topology shown in
(3.1.2) STDP Reversal
In one embodiment, the sign of STDP was purposely inverted such that if neuron A (element 100) spikes before neuron B (element 102) in
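As a non-limiting illustration of the sign inversion, the following Python sketch assumes a conventional exponential STDP window and flips its sign, so that a pre-synaptic (A) spike preceding a post-synaptic (B) spike decreases w. The function name `inverted_stdp_dw`, the exponential form, and all parameter values are assumptions introduced for illustration and are not taken from this disclosure.

```python
import math

def inverted_stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.01,
                     tau_plus=1.0, tau_minus=1.0):
    """Weight change for one pre/post spike pair under an inverted STDP rule.

    Assumed exponential window; gains and timescales are hypothetical values.
    """
    dt = t_post - t_pre
    if dt > 0:
        # A (pre) fired before B (post): inverted rule decreases w.
        return -a_minus * math.exp(-dt / tau_minus)
    if dt < 0:
        # B (post) fired before A (pre): inverted rule increases w.
        return a_plus * math.exp(dt / tau_plus)
    return 0.0

decrease = inverted_stdp_dw(t_pre=0.0, t_post=0.4)
increase = inverted_stdp_dw(t_pre=0.4, t_post=0.0)
print(decrease, increase)
```

With equal gains and timescales, as assumed here, the increase and the decrease have the same magnitude for the same spike separation.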
(3.1.3) Increment Path for w
(3.1.4) Decrement Path for w
To decrease the weight, w, connect the X neuron (element 300) to neuron A (element 100), as shown in
speed of w decreasing=P(X)*w. (6)
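The balance between the two paths can be illustrated with a simplified, event-level Python sketch. It is not the spiking network itself: neuron B is replaced by an accumulator that carries over its overshoot, the weight step `delta`, the synthetic input streams, and the function names are all assumptions made for illustration. The weight is incremented whenever X and Y coincide and decremented at a rate proportional to P(X)*w, so it should settle near P(Y|X).

```python
import random

def make_streams(n_steps, p_x, p_y_given_x, p_y_given_not_x, seed=0):
    """Generate synthetic binary streams X and Y with a known P(Y|X)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_steps):
        x = rng.random() < p_x
        y = rng.random() < (p_y_given_x if x else p_y_given_not_x)
        xs.append(x)
        ys.append(y)
    return xs, ys

def train_weight(xs, ys, delta=0.001, w0=0.5):
    """Event-level stand-in for the increment and decrement paths."""
    w = w0
    v_b = 0.0  # drive accumulated on B from X-triggered spikes of A
    for x, y in zip(xs, ys):
        if x and y:
            # Increment path: occurs with probability P(X,Y).
            w = min(1.0, w + delta)
        if x:
            # Decrement path: B fires once per roughly 1/w spikes of A,
            # so decrements occur at a rate proportional to P(X)*w.
            v_b += w
            if v_b >= 1.0:
                v_b -= 1.0  # keep the overshoot for the next event
                w = max(0.0, w - delta)
    return w

xs, ys = make_streams(40000, p_x=0.5, p_y_given_x=0.8, p_y_given_not_x=0.2)
w_final = train_weight(xs, ys)
print(w_final)  # expected to lie close to the assumed P(Y|X) of 0.8
```

The increment is applied before the decrement within a time step, mirroring the ordering described for the case where both paths are triggered.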
(3.1.5) Dynamic Threshold
In order for Equation (6) to hold true, the firing threshold for neuron B (element 102) needs to be adjusted dynamically according to the sum of the input voltages. This is named the dynamic threshold Vdt, which only affects neuron B (element 102). More specifically, the threshold is reduced by the same amount as the overshoot of membrane potential from the previous firing event. Equation (7) describes the update rule for Vdt. Vthreshold is set to one volt in one embodiment. The dynamic threshold allows the network to calculate the product between w and P(X) more accurately; otherwise, w will suffer from imprecision because it takes the same number of spikes from the A neuron (element 100) to trigger the B neuron (element 102) when w is larger than 0.5. The dynamic threshold is updated each time after neuron B (element 102) fires according to this equation:
$V_{dt} = V_{threshold} - (V_{B} - V_{dt})$, (7)
where Vdt is the voltage threshold for the B neuron (element 102). Vthreshold is the normal voltage threshold, which is set to 1V. VB is the membrane potential of the B neuron (element 102) after accumulating all of the input voltages at the current time step. Note that in order for Equation (7) to work as designed, it is important to avoid conflicts between the increment path and the decrement path. For this reason, a delay of four units (0.4 ms) is imposed between the X neuron (element 300) and the A neuron (element 100) to allow w to increase first then decrease in the case where both paths are triggered.
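The effect of the dynamic threshold can be sketched in a few lines of Python. The sketch below is a simplified stand-in, assuming neuron B receives one input of size w per spike of neuron A and resets to zero after firing; the function name `b_firing_rate` and the test values are illustrative assumptions. With a fixed threshold, any w between 0.5 and 1 needs two input spikes per output spike, whereas applying the update rule of Equation (7) lets the output rate track w.

```python
def b_firing_rate(w, n_spikes, dynamic, v_threshold=1.0):
    """Fraction of A spikes that trigger B, with or without Equation (7)."""
    v_b = 0.0            # membrane potential of neuron B
    v_dt = v_threshold   # current (possibly dynamic) threshold of B
    fired = 0
    for _ in range(n_spikes):
        v_b += w
        if v_b >= v_dt:
            fired += 1
            if dynamic:
                # Equation (7): lower the next threshold by the overshoot.
                v_dt = v_threshold - (v_b - v_dt)
            v_b = 0.0    # reset after firing
    return fired / n_spikes

for w in (0.7, 0.9):
    fixed = b_firing_rate(w, 1000, dynamic=False)
    adaptive = b_firing_rate(w, 1000, dynamic=True)
    print(w, fixed, adaptive)
```

In this sketch the fixed-threshold rate is the same for w=0.7 and w=0.9, which is the imprecision the dynamic threshold is intended to remove.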
(3.1.6) Modes of Operation
The neuronal network can be operated in two modes: a training phase and a measurement phase. The objective of the training phase is to allow the weight w to converge to the targeted P(Y|X). During this phase, spikes from the two random variables are fed to neuron X (element 300) and neuron Y (element 304 in
(3.1.7) Subtractor Circuit
Likelihood{X→Y}=|P(Y)−P(Y|X)|=firing rate of F*resolution (8)
where the firing rate of F is measured in Hertz (Hz), and resolution is measured in seconds. Resolution is the time interval between two data points in the input data stream. Likelihood{X→Y} is a unit-less number between 0 and 1, which is compared with a threshold to determine the dependency between X and Y.
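As a worked example of Equation (8), the following Python sketch converts a spike count of neuron F into the unit-less likelihood and compares it with a threshold. The spike count, the observation window, the resolution, and the threshold value of 0.1 are hypothetical numbers chosen for illustration.

```python
def dependency_likelihood(spike_count, duration_s, resolution_s):
    """Equation (8): firing rate of F (Hz) multiplied by the resolution (s)."""
    firing_rate_hz = spike_count / duration_s
    return firing_rate_hz * resolution_s

# Hypothetical measurement: 300 spikes of F in 2 s, one data point per 1 ms.
likelihood = dependency_likelihood(spike_count=300, duration_s=2.0,
                                   resolution_s=0.001)
threshold = 0.1  # assumed decision threshold
print(likelihood, likelihood > threshold)
```

Here F fires at 150 Hz, so the likelihood is approximately 0.15, which exceeds the assumed threshold and would mark X and Y as dependent.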
The S1 (element 602), the S2 (element 604), and the F (element 600) neurons are also integrate-and-fire type neurons with the same equations as described in Equations (3) through (5). All of their thresholds are set to one volt, and all the connection weights are set to 1.0000001. In addition, the S1 (element 602) and S2 (element 604) neurons have lower and upper membrane potential limits to prevent the voltages from running away uncontrollably; the voltage limits are set to [−5,5] volts in one embodiment. The probability of firing for neuron S1 (element 602), S2 (element 604), and F (element 600) can be described using the following equations:
where P(S1), P(S2), and P(F) are unit-less numbers between zero and one that represent the probabilities of firing at any given time step. The max(·,0) operators in equations (9) through (11) are used to account for the fact that probabilities cannot be negative.
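One way to see why rectification with max(·,0) still yields the absolute deviation of Equation (8) is the identity |a−b| = max(a−b,0) + max(b−a,0). The Python sketch below applies it under the assumption that S1 and S2 each carry one rectified difference and that F combines them; this assignment of roles and the function name are illustrative assumptions rather than a restatement of Equations (9) through (11).

```python
def subtractor_probabilities(p_y, p_y_given_x):
    """Rectified differences and their sum (assumed roles for S1, S2, F)."""
    p_s1 = max(p_y - p_y_given_x, 0.0)   # non-zero only if P(Y) > P(Y|X)
    p_s2 = max(p_y_given_x - p_y, 0.0)   # non-zero only if P(Y|X) > P(Y)
    p_f = p_s1 + p_s2                    # equals |P(Y) - P(Y|X)|
    return p_s1, p_s2, p_f

print(subtractor_probabilities(0.3, 0.8))
print(subtractor_probabilities(0.8, 0.3))
```

At most one of the two rectified terms is non-zero for any input pair, so their sum never exceeds one and remains a valid firing probability.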
In summary,
(3.1.8) Spiking Encoder/Decoder
(3.1.9) Neuronal Topology (
The neuromorphic hardware required for implementing the neuronal network topology (element 1108) in
where A+ and A− are the maximum and minimum gains, and τ+ and τ− are timescales for weight increase and decrease (
(3.1.10) Neuromorphic Compiler (
The neuromorphic compiler (element 1110) is detailed in U.S. application Ser. No. 16/294,886, entitled, “Programming Model for a Bayesian Neuromorphic Compiler,” which is hereby incorporated by reference as though fully set forth herein. It is a programming model which lets users query Bayesian network probabilities from the learned conditional model (element 1112) for further processing or decision making. The learned conditional model (element 1112) refers to the conditional properties learned between input random variables. For example, if the probability of Y=5 given that X=3 is 60%, then P(Y|X)=0.6 is part of the learned conditional model (element 1112). For instance, in a fault message application this can be preventative repair of expected future faults based on current faults. Preventative repair is when parts on a machine (e.g., vehicle) are replaced before they actually wear out, so that they don't end up wearing out while in operation. For example, if a user sees a system fault message #1 while a vehicle is in operation, and the user knows that P(system fault message #2|system fault message #1)=95%, the user can preventatively replace the vehicle part corresponding to system fault message #2 in anticipation that it will fail soon as well.
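The preventative repair example can be sketched in Python as a lookup over learned conditional probabilities. This is not the programming model of the incorporated application; the dictionary layout, the fault names, the function `faults_to_anticipate`, and the 0.9 cutoff are hypothetical and serve only to illustrate how such a learned model might be queried.

```python
# Hypothetical learned conditional model: (cause, effect) -> P(effect | cause).
learned_model = {
    ("fault_1", "fault_2"): 0.95,
    ("fault_1", "fault_3"): 0.10,
    ("fault_2", "fault_3"): 0.40,
}

def faults_to_anticipate(model, observed_fault, min_probability=0.9):
    """Return faults whose learned P(fault | observed_fault) meets the cutoff."""
    return sorted(
        effect
        for (cause, effect), probability in model.items()
        if cause == observed_fault and probability >= min_probability
    )

print(faults_to_anticipate(learned_model, "fault_1"))
```

For an observed `fault_1`, only `fault_2` meets the assumed cutoff, so the part associated with it would be the candidate for preventative replacement.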
Furthermore, two more neurons D (element 1310) and E (element 1312) were added, with input from B (element 102) and delays τ1 (element 1308) and τ2 (element 1314), respectively. Since τ1 (element 1308)<τ2 (element 1314), B (element 102) will cause C (element 1306) to spike twice: once through D (element 1310) (fast path) and once through E (element 1312) (slow path). Since the delay τ1 is less than τ2, the spikes that travel from B→E→C will take longer than the spikes that travel from B→D→C, which explains the designation between “fast path” and “slow path”. Thus, for each spike from the tonic input $I_T$ (element 1302) that causes w (element 1304) to increase, there are two spikes coming from C (element 1306) that cause it to decrease. As a result, w (element 1304) decreases in proportion to itself (i.e., $\dot{w} = wI_T - 2wI_T = -wI_T$). An additional neuron I (element 1316) inhibits neurons D (element 1310) and E (element 1312) so that they will not spike with $I_P$ (element 1300), and there will be no associated decrease in w (element 1304). Note that if B (element 102) spikes as a result of $I_P$ (element 1300) it will not spike again due to $I_T$ (element 1302) because of B's (element 102) refractory period. This network now models the fixed point: $\dot{w} = -wI_T + I_P$.
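The fixed point of these dynamics can be worked out directly. The short derivation below is a sketch that treats the input rates IT and IP (written $I_T$ and $I_P$) as constants with $I_T > 0$, which is an idealizing assumption; $w^{*}$ denotes the equilibrium weight and $w(0)$ the initial weight.

```latex
\begin{aligned}
\dot{w} &= -w I_T + I_P = 0
  \quad\Longrightarrow\quad w^{*} = \frac{I_P}{I_T}, \\
\dot{w} &= -I_T \left( w - w^{*} \right)
  \quad\Longrightarrow\quad w(t) = w^{*} + \bigl( w(0) - w^{*} \bigr)\, e^{-I_T t}.
\end{aligned}
```

Under this assumption the weight relaxes exponentially toward the ratio of the two input rates, and the deviation from $w^{*}$ shrinks for any starting value because $I_T$ is positive.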
(3.2) Experimental Studies
PCU+ is able to compute the conditional probability of two random variables, as shown in
In addition, the PCU+ with Subtractor Circuit according to embodiments of the present disclosure was applied to solve structure learning problems. In structure learning, the goal is to identify the causal relationships in a Bayesian network. More precisely, in experimental studies, the goal was to find the dependencies between ten different random variables, in which the current value of a variable affects the future values of other variables (i.e., Granger causality (see Literature Reference No. 4)). In order to test for Granger causality, the following pre-processing techniques were performed on the data before being fed to the PCU+. The data for each random variable is a stream of 0s and 1s, recording the occurrence of errors as a time series. Before a pair of data streams (e.g., X and Y) is fed to the PCU+, the data for Y is first shifted earlier by one time step. Effectively, P(Yt+1|Xt) is being calculated instead of P(Yt|Xt). This is necessary because the cause and the effect happen sequentially in time; by shifting Y in this way in the training dataset, whether the current X (i.e., Xt) has an effect on future Y (i.e., Yt+1) was tested. The dependencies were identified by computing P(Yt+1|Xt) between all pairs of random variables in the system and calculating the deviation of the conditional probabilities, P(Yt+1|Xt), from their intrinsic values, P(Yt+1). The deviation, |P(Yt+1)−P(Yt+1|Xt)|, is encoded in the firing rate of neuron F in each PCU+. A threshold is defined, above which the link is flagged as significant and a conclusion is made that X causes Y to happen.
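The pre-processing and the deviation test can be sketched in Python using plain counting in place of the PCU+. The function name `lagged_deviation`, the toy data streams, and the 0.5 threshold are assumptions made for illustration; in the described system the same deviation is encoded in the firing rate of neuron F rather than computed from counts.

```python
def lagged_deviation(x, y):
    """Return |P(Y[t+1]) - P(Y[t+1] | X[t])| for two binary streams."""
    x_t = x[:-1]      # current values of X
    y_next = y[1:]    # Y shifted earlier by one time step
    n = len(x_t)
    n_x = sum(x_t)
    if n == 0 or n_x == 0:
        return 0.0    # no evidence about X, so no measurable deviation
    p_y = sum(y_next) / n
    joint = sum(1 for a, b in zip(x_t, y_next) if a and b)
    p_y_given_x = joint / n_x
    return abs(p_y - p_y_given_x)

# Toy streams in which every 1 in X is followed by a 1 in Y one step later.
x = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
y = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
threshold = 0.5  # assumed significance threshold

print(lagged_deviation(x, y) > threshold)  # tests the link X -> Y
print(lagged_deviation(y, x) > threshold)  # tests the link Y -> X
```

Evaluating both orderings of a pair, as in the last two lines, is what allows the direction of a dependency to be distinguished.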
One hundred PCU+ were used to compute the conditional probabilities between all combinations of the ten random variables. The table in
Bayesian inference is ubiquitous in data science and decision theory. The invention described herein can be used to reduce the operational cost for aircraft and vehicles through preventive maintenance and diagnostics, enable maneuvering (i.e., driving) of an autonomous vehicle through performing a Bayesian inference task, enhance real-time mission planning for unmanned aircraft, and facilitate unsupervised structure learning in new environments. Bayesian decision theory is a statistical approach to the problem of pattern classification. Pattern classification, an example of a Bayesian inference task, has several applications, including object detection and object classification. Additionally, an application of the invention described herein is estimating conditional probabilities between fault messages of a ground vehicle or aircraft for fault prognostic models.
In the application of a self-driving vehicle, one or more processors of the system described herein can control one or more motor vehicle components (electrical, non-electrical, mechanical), such as a brake, a steering mechanism, suspension, or safety device (e.g., airbags, seatbelt tensioners, etc.). Further, the vehicle could be an unmanned aerial vehicle (UAV), an autonomous self-driving ground vehicle, or a human operated vehicle controlled either by a driver or by a remote operator. For instance, upon object detection (i.e., a Bayesian inference task) and recognition, the system can cause the autonomous vehicle to perform a driving operation/maneuver (such as steering or another command) in line with driving parameters in accordance with the recognized object. For example, if the system recognizes a bicyclist, another vehicle, or a pedestrian, the system described herein can cause a vehicle maneuver/operation to be performed to avoid a collision with the bicyclist or vehicle (or any other object that should be avoided while driving). The system can cause the autonomous vehicle to apply a functional movement response, such as a braking operation followed by a steering operation, to redirect the vehicle away from the object, thereby avoiding a collision.
Other appropriate responses may include one or more of a steering operation, a throttle operation to increase speed or to decrease speed, or a decision to maintain course and speed without change. The responses may be appropriate for avoiding a collision, improving travel speed, or improving efficiency. As can be appreciated by one skilled in the art, control of other device types is also possible. Thus, there are a number of automated actions that can be initiated by the autonomous vehicle given the particular object detected and the circumstances in which the system is implemented.
The system according to embodiments of the present disclosure also provides an additional functionality of measuring the deviation from the normal probability, which can be used to indicate potential causality in a Bayesian network. With the dynamic threshold, the new network fixes a key problem in the prior art, which is that the computation becomes inaccurate when the conditional probability exceeds a threshold.
Finally, while this invention has been described in terms of several embodiments, one of ordinary skill in the art will readily recognize that the invention may have other applications in other environments. It should be noted that many embodiments and implementations are possible. Further, the following claims are in no way intended to limit the scope of the present invention to the specific embodiments described above. In addition, any recitation of “means for” is intended to evoke a means-plus-function reading of an element and a claim, whereas, any elements that do not specifically use the recitation “means for”, are not intended to be read as means-plus-function elements, even if the claim otherwise includes the word “means”. Further, while particular method steps have been recited in a particular order, the method steps may occur in any desired order and fall within the scope of the present invention.
The present application is a Continuation-in-Part application of U.S. application Ser. No. 16/294,815, filed in the United States on Mar. 6, 2019, entitled, “A Neuronal Network Topology for Computing Conditional Probabilities,” which is a Non-Provisional Application of U.S. Provisional Application No. 62/659,085, filed in the United States on Apr. 17, 2018, entitled, “A Neuronal Network Topology for Computing Conditional Probabilities,” the entirety of which are incorporated herein by reference. The present application is ALSO a Non-Provisional Application of U.S. Provisional Application No. 62/790,296, filed in the United States on Jan. 9, 2019, entitled, “A Spiking Neural Network for Probabilistic Computation,” the entirety of which is incorporated herein by reference.
Number | Date | Country
---|---|---
62659085 | Apr 2018 | US
62790296 | Jan 2019 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 16294815 | Mar 2019 | US
Child | 16577908 | | US