The present invention relates to a neural network device, a neural network system, a processing method, and a recording medium.
As a form of neural network, there is a feed-forward spiking neural network (SNN). A spiking neural network is a network formed by connecting spiking neuron models (also referred to as spiking neurons or simply neurons).
A feed-forward type is one network configuration method, being a network in which information transmission in layer-to-layer coupling is one way. Each layer of a feed-forward spiking neural network is composed of one or more spiking neurons, with there being no connection between the spiking neurons in the same layer.
As illustrated in
A first layer (layer 1011 in the example of
The spiking neuron 1021 simulates signal integration and spike generation (firing) by the cell body of a biological neuron.
A transmission path 1022 simulates signal transmission by axons and synapses in biological neurons. Each transmission path 1022 is arranged by connecting two spiking neurons 1021 between adjacent layers, and transmits a spike from the spiking neuron 1021 on the anterior layer side to the spiking neuron 1021 on the posterior layer side.
In the example of
The spiking neuron model is a model that has a membrane potential as an internal state, with the membrane potential evolving over time according to a differential equation. As a general spiking neuron model, a leaky integrate-and-fire neuron model is known, whose membrane potential evolves over time according to a differential equation such as Eq. (1).
Here, v(n)i indicates the membrane potential in the i-th spiking neuron model of the No. n layer. αleak is a constant coefficient indicating the magnitude of the leak in the leaky integrate-and-fire model. I(n)i indicates the postsynaptic current in the i-th spiking neuron model of the No. n layer. w(n)ij is a coefficient indicating the strength of the connection from the j-th spiking neuron model of the No. n−1 layer to the i-th spiking neuron model of the No. n layer, and is called a weight.
t indicates time. t(n−1)j indicates the firing timing (fire time) of the j-th neuron in the No. n−1 layer. κ is a function that indicates the effect of spikes transmitted from the previous layer on the postsynaptic current.
When the membrane potential exceeds the threshold value Vth, the spiking neuron model generates spikes (fires), after which the membrane potential returns to the reset value Vreset. In addition, the generated spikes are transmitted to the spiking neuron model of the connected posterior layer.
As described above, Vth indicates the threshold value of the membrane potential. Vreset indicates the reset value of the membrane potential. t(n−1)1 indicates the firing timing of the first neuron in the No. n−1 layer. t(n−1)2 indicates the firing timing of the second neuron in the No. n−1 layer. t(n−1)3 indicates the firing timing of the third neuron in the No. n−1 layer.
In both the first firing at time t(n−1)1 and the third firing at time t(n−1)3, the membrane potential v(n)i does not reach the threshold value Vth. On the other hand, in the second firing at time t(n−1)2, the membrane potential v(n)i reaches the threshold value Vth, and immediately thereafter drops to the reset value Vreset.
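The dynamics described above can be sketched in simulation code. The following is a minimal Euler-method sketch of the leaky integrate-and-fire dynamics of Eq. (1); the exponential kernel κ, the constant values, and the input spike times are assumptions chosen only for illustration.

```python
import math

ALPHA_LEAK = 0.5   # leak coefficient alpha_leak (assumed)
V_TH = 1.0         # firing threshold Vth (assumed)
V_RESET = 0.0      # reset value Vreset (assumed)
DT = 0.01          # Euler integration time step

def kappa(s, tau=1.0):
    """Assumed postsynaptic-current kernel: exponential decay after a spike."""
    return math.exp(-s / tau) if s >= 0.0 else 0.0

def simulate(weights, spike_times, t_end=5.0):
    """Return the firing times of one neuron receiving weighted input spikes."""
    v, t, fired = 0.0, 0.0, []
    while t < t_end:
        # postsynaptic current: weighted sum of kernels, as in Eq. (1)
        i_syn = sum(w * kappa(t - tj) for w, tj in zip(weights, spike_times))
        v += DT * (-ALPHA_LEAK * v + i_syn)  # Euler step of dv/dt
        if v >= V_TH:       # membrane potential exceeds the threshold: fire
            fired.append(t)
            v = V_RESET     # potential returns to the reset value
        t += DT
    return fired

print(simulate(weights=[1.5, 2.0], spike_times=[0.5, 1.0]))
```

With these assumed values, the potential integrates the two input spikes, crosses the threshold shortly after the second input, and resets.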
Spiking neural networks are expected to consume less power than deep learning models when implemented in hardware with CMOS (Complementary MOS) or the like. One of the reasons is that the human brain is a computing medium that operates at a low power consumption of about 20 watts (W), and spiking neural networks can mimic the cerebral activity that achieves such low power consumption.
In order to create hardware with power consumption equivalent to that of the brain, it is necessary to develop an algorithm for spiking neural networks, following the calculation principle of the brain. For example, it is known that image recognition can be performed using a spiking neural network, and various supervised learning algorithms and unsupervised learning algorithms have been developed.
In the algorithm of the spiking neural network, there are a number of methods for information transmission by spikes, and in particular, the frequency method and the time method are often used.
In the frequency method, information is transmitted based on how many times a specific neuron has fired in a fixed time interval. On the other hand, in the time method, information is transmitted at the timing of spikes.
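The contrast between the two methods can be made concrete with a small sketch. The encoding conventions below (a value in [0, 1] carried either by the spike count in a fixed window or by the timing of a single spike) are assumptions used only for illustration.

```python
T_WINDOW = 10.0  # observation window (assumed)

def rate_encode(value, n_slots=10):
    """Frequency method: the value sets how many of n_slots time slots spike."""
    n_spikes = round(value * n_slots)
    return [k * T_WINDOW / n_slots for k in range(n_spikes)]

def time_encode(value):
    """Time method: the value sets the timing of a single spike
    (larger value -> earlier spike)."""
    return [(1.0 - value) * T_WINDOW]

print(len(rate_encode(0.7)))  # the value 0.7 needs several spikes
print(len(time_encode(0.7)))  # a single spike carries the same value
```

The time method transmits the same value with a single spike, which is the basis of the power-consumption argument below.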
As shown in
Hardware power consumption increases as the number of spikes rises, so power consumption can be reduced by using a time-based algorithm.
It has been reported that various problems can be solved by using a feed-forward spiking neural network. For example, in the network configuration shown in
A learning process is required for a spiking neural network to make correct predictions. For example, in the learning process of recognizing an image, image data and label data which is the answer thereof are used.
In the learning process, the spiking neural network receives the input of data and outputs predicted values. Then, the learning mechanism for causing the spiking neural network to perform learning calculates the prediction error, which is the difference between the predicted value output by the spiking neural network and the label data (correct answer). The learning mechanism causes the spiking neural network to perform learning by optimizing the weights of the network so as to minimize the loss function L defined from the prediction error.
For example, the learning mechanism can minimize the loss function L by updating the weight as in Eq. (2).
Here, Δw(n)ij indicates an increase or decrease in the weight w(n)ij. If the value of Δw(n)ij is positive, the weight w(n)ij is increased. If the value of Δw(n)ij is negative, the weight w(n)ij is reduced.
η is a constant called the learning coefficient.
In the stochastic gradient descent method, the weight is updated once using some training data. When the weight update is repeated multiple times using all the training data, the repeating unit is called an epoch. Stochastic gradient descent generally performs tens to hundreds of epochs to converge learning. Further, updating the weight with one set of data (one input data and one label data) is called online learning, and updating with two or more sets of data is called mini-batch learning.
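The weight update of Eq. (2) by stochastic gradient descent, including the distinction between online learning and mini-batch learning, can be sketched as follows. The quadratic toy loss and its gradient are assumptions used only to make the update loop concrete; in the text, the gradient would be ∂L/∂w obtained from the network's prediction error.

```python
ETA = 0.1  # learning coefficient eta (assumed)

def grad(w, x, target):
    """Gradient of the toy loss L = 0.5 * (w*x - target)^2 with respect to w."""
    return (w * x - target) * x

def train(data, w=0.0, epochs=50, batch_size=1):
    """batch_size=1 corresponds to online learning; >1 to mini-batch learning."""
    for _ in range(epochs):  # one pass over all training data = one epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Delta w = -eta * (average gradient over the batch), as in Eq. (2)
            delta_w = -ETA * sum(grad(w, x, t) for x, t in batch) / len(batch)
            w += delta_w
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy pairs with target = 2*x
print(train(data, batch_size=1))   # online learning
print(train(data, batch_size=3))   # mini-batch learning
```

Both variants converge to the weight that minimizes the toy loss (w = 2 here), differing only in how many data sets are used per update.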
As mentioned above, it has been reported that various problems can be solved by using a feed-forward spiking neural network. For example, as described above, image data can be input to the input layer so that the network can predict the answer for that image.
For example, in the task of recognizing an image of three numbers from 0 to 2, as shown in
Dedicated hardware for spiking neural networks is generally called neuromorphic hardware. As for the implementation of this hardware, implementation by an analog circuit and implementation by a digital circuit are known.
It is generally required to reduce the power consumption and circuit area of hardware. On the other hand, if a complicated neuron model or a complicated learning rule is implemented, the power consumption and the circuit area end up being increased.
In a neuron model, a form including a non-linear function is often adopted because of its compatibility with biological neurons.
The movement of memory data such as a weight makes a large contribution to the power consumption of neuromorphic hardware. Therefore, in the learning rule, power consumption can be reduced by using an algorithm with less data movement. In order to reduce the movement of data, one or both of reducing the number of movements and reducing the movement distance of data may be performed.
Non-Patent Document 2 reported improving recognition accuracy by using a non-leaky integrate-and-fire model in which the constant αleak of Eq. (1) was set to 0. In Non-Patent Document 2, the model represented by Eq. (3) is used as the non-leaky integrate-and-fire model.
Here, exp is a natural exponential function. τ indicates a constant.
It is preferable to be able to simplify the model of the neural network.
For example, while the non-leaky integration model of Non-Patent Document 2 shown in the above Eq. (3) includes a non-linear function (exp(−x/τ)), it is preferable from the viewpoint of model simplification that the model be constructed without this non-linear function.
An object of the present invention is to provide a neural network device, a neural network system, a neural network processing method, and a recording medium capable of solving the above-mentioned problems.
According to a first example aspect of the present invention, a neural network device includes: a neuron model means configured as a non-leaky integrate-and-fire spiking neuron and a spiking neuron with which a postsynaptic current is represented using a step function, the neuron model means being fired at most once in one process of a neural network to indicate an output of the neuron model means itself at firing timing; and a transfer processing means for transferring information between the neuron model means.
According to a second example aspect of the present invention, a processing method includes the steps of: performing an action of a spiking neuron, the spiking neuron being a non-leaky integrate-and-fire spiking neuron and a spiking neuron with which a postsynaptic current is represented using a step function, the spiking neuron being fired at most once in one process of a neural network to indicate output of the spiking neuron itself at firing timing; and performing information transfer between the spiking neurons.
According to a third example aspect of the present invention, a recording medium stores a program for causing a computer to execute the steps of: performing an action of a spiking neuron, the spiking neuron being a non-leaky integrate-and-fire spiking neuron and a spiking neuron with which a postsynaptic current is represented using a step function, the spiking neuron being fired at most once in one process of a neural network to indicate output of the spiking neuron itself at firing timing; and performing information transfer between the spiking neurons.
According to the present invention, a model of the neural network can be made relatively simple.
Hereinbelow, example embodiments of the present invention will be described, but the following example embodiments do not limit the invention claimed. Also, all combinations of features described in the example embodiments may not be essential to the solution of the invention.
In the example of
The neural network device 100 shown in
Of each layer of the neural network device 100, the first layer (layer 111) corresponds to the input layer. The last layer (fourth layer, layer 114) corresponds to the output layer. The layers between the input layer and the output layer (second layer (layer 112) and third layer (layer 113)) correspond to hidden layers.
The neuron model unit 121 is configured as a spiking neuron (spiking neuron model), and simulates signal integration and spike generation (firing) by the cell body.
The transmission processing unit 122 simulates signal transmission by axons and synapses. The transmission processing unit 122 is arranged by connecting two neuron model units 121 between arbitrary layers, and transmits spikes from the neuron model unit 121 on the front layer side to the neuron model unit 121 on the rear layer side.
In the example of
The neural network system according to the example embodiment has, for example, the configuration shown in
With such a configuration, the neural network device 100 receives data input and outputs a predicted value. The prediction error calculation unit 200 calculates a prediction error, which is the difference between the prediction value output by the neural network device 100 and the label data (correct answer), and outputs the prediction error to the learning processing unit 300. The learning processing unit 300 causes the neural network device 100 to perform learning by optimizing the network weights of the neural network device 100 so as to minimize the loss function L defined from the prediction error.
The neural network device 100 and the learning processing unit 300 may be configured as separate devices or may be configured as one device.
The spiking neuron model (neuron model unit 121) according to the example embodiment will be described. As the neuron model unit 121, a non-leaky spiking neuron model is used. This model is defined as Eq. (4).
Here, v(m)i indicates the membrane potential in the i-th neuron model unit 121 of the m-th layer.
I(m)i indicates the postsynaptic current in the i-th neuron model unit 121 of the m-th layer. As mentioned above, t indicates the time. I(m)i(t) represents the postsynaptic current I(m)i as a function of time t.
w(m)ij is a coefficient (weight) indicating the strength of the connection from the j-th neuron model unit 121 of the m−1 layer to the i-th neuron model unit 121 of the m-th layer. t(m−1)j indicates the firing timing of the j-th neuron model unit 121 of the m−1 layer. θ indicates a step function.
The step function θ is expressed as in Eq. (5).
The step function θ(t) is a function having a constant value of θ(t)=1 when t≥0 and a constant value of θ(t)=0 when t<0, and can be calculated with simple processing compared to a non-linear function such as exp(−x/τ).
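The comparison made above can be shown in code form: the step function of Eq. (5) reduces to a single comparison, while the kernel exp(−x/τ) requires a transcendental evaluation (the value of τ below is an assumption).

```python
import math

def theta(t):
    """Step function of Eq. (5): 1 for t >= 0, 0 for t < 0."""
    return 1.0 if t >= 0.0 else 0.0

def exp_kernel(x, tau=1.0):
    """The non-linear kernel exp(-x/tau) that the step function replaces."""
    return math.exp(-x / tau)

print(theta(-0.5), theta(0.0), theta(2.0))
```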
As described above, the network of the neural network device 100 is configured as a feed-forward multi-layer network. Further, it is assumed that each of the neuron model units 121 fires at most once for one input to the neural network device 100.
Further, it is assumed that the output of the neural network device 100 is indicated by the firing timing of the neuron model unit 121 of the output layer. For example, the output of the neural network device 100 may be shown using the representation method described with reference to
According to the neuron model unit 121, it is possible to achieve a relatively simple model represented by a weighted linear sum of step functions as shown in Eq. (4). For example, the model shown in Eq. (4) can be evaluated as simpler than the model shown in Eq. (3).
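Because the postsynaptic current of Eq. (4) is a weighted sum of step functions, the membrane potential is piecewise linear in time, and the firing time can be computed without evaluating any non-linear function. The following is a minimal sketch of this computation; the threshold value and the inputs are assumed values for illustration only.

```python
V_TH = 1.0  # firing threshold Vth (assumed)

def firing_time(weights, spike_times):
    """First time at which v(t) = sum_j w_j*(t - t_j)*theta(t - t_j)
    reaches V_TH. Returns None if the threshold is never reached."""
    events = sorted(zip(spike_times, weights))
    v, slope, t_prev = 0.0, 0.0, 0.0
    for t_j, w_j in events:
        v_next = v + slope * (t_j - t_prev)   # linear growth between inputs
        if slope > 0.0 and v_next >= V_TH:    # crossed before this input
            return t_prev + (V_TH - v) / slope
        v, t_prev = v_next, t_j
        slope += w_j        # a step function turns on: the slope jumps by w_j
    if slope > 0.0:         # linear growth after the last input spike
        return t_prev + (V_TH - v) / slope
    return None

print(firing_time(weights=[0.5, 0.5], spike_times=[1.0, 2.0]))
```

With the assumed inputs, v(t) = 0.5(t−1) + 0.5(t−2) for t ≥ 2, so the threshold is reached at t = 2.5; only additions, multiplications, and comparisons are needed.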
When the processing of the neuron model unit 121 is executed by software, the neuron model becomes a relatively simple model, so that the processing load is relatively light, the processing time is relatively short, and the power consumption is relatively low. When the processing of the neuron model unit 121 is executed by hardware, the neuron model becomes a relatively simple model, so that in addition to the processing load being relatively light, the processing time being relatively short, and the power consumption being relatively low, the circuit area of the hardware is relatively small.
With the neuron model unit 121, the recognition accuracy is high in that the model does not include leaks.
In addition, since the neuron model unit 121 uses the time method, it consumes less power than with the frequency method.
Next, the learning algorithm in the neural network system 1 will be described.
The SpikeProp algorithm is known as a method for deriving the derivative ∂L/∂w(n)ij in the weight update rule of the above Eq. (2) (see Non-Patent Document 3). For example, the loss function L is defined by Eq. (6) using the firing timing of the neurons in the final layer.
Here, t(N)i indicates the firing timing of the i-th neuron in the output layer. Note that "N" is used to denote the output layer as the No. N layer.
t̄(N)i indicates the firing timing of the i-th instruction signal (the firing timing of the i-th neuron in the output layer in the instruction signal). Moreover, here, the non-leaky neuron model shown in Eq. (3) is targeted.
The differential by weight of the loss function is shown by Eq. (7) using the chain rule.
The differential by weight of the loss function here, ∂L/∂w(n)ij, is found by differentiating the loss function L with respect to the weight w(n)ij.
Here, the propagation error is defined as in Eq. (8).
Ultimately, in order to find the derivative, ∂t(n)i/∂w(n)ij and ∂t(n+1)i/∂t(n)j are required to be calculated. Using the SpikeProp method, ∂t(n)i/∂w(n)ij can be derived as in Eq. (9).
Further, ∂t(n+1)i/∂t(n)j can be derived as in Eq. (10).
As shown in Eqs. (9) and (10), in order to calculate ∂t(n)i/∂w(n)ij and ∂t(n+1)i/∂t(n)j with the SpikeProp algorithm, it is necessary to calculate the sum of the weights in the same layer.
On the other hand, the neural network device 100 uses a learning rule simplified by approximating ∂t(n)i/∂w(n)ij and ∂t(n+1)i/∂t(n)j. The derivation of this learning rule will be described.
First, it is assumed that the firing timing of the i-th neuron (neuron model unit 121) in the No. n layer is stochastically determined by the firing probability density R(n)i(t). As mentioned above, t indicates time. From the observed firing timing t(n)i of the No. n layer and the firing timing t(n−1)j of the neurons (neuron model units 121) in the previous layer, the functional form of the firing probability density R(n)i(t) is estimated.
In this model, each neuron (neuron model unit 121) fires at most once, so it is the first firing timing that carries the information. Therefore, the time at which the distribution of the first firing timing (first firing time) of the neuron obtained from the estimated firing function (functional form of the firing probability density) reaches the maximum value is set as the firing timing t(n)i of the No. n layer.
By assuming the above model, the functional change δR(n)i(t) of the firing probability density when the weight w(n)ij has changed can be obtained. As shown in Eq. (11), the change δt(n)i of the firing timing can be obtained from this change of the firing probability density function.
[Eq. 11]
δwij(n)→δRi(n)(t)→δti(n) (11)
Eq. (11) shows the relationship that the change in the firing probability density R(n)i(t) is obtained according to the change in the weight w(n)ij, and the change in the firing timing t(n)i of the No. n layer can be obtained according to the change in the firing probability density R(n)i (t). From this relationship, the change in the firing timing t(n)i of the No. n layer can be obtained from the change in the weight w(n)ij.
From the relationship of Eq. (11), an approximation of partial differential can be obtained as in Eq. (12).
The firing probability density R(n)i(t) can be approximated by the slope of the membrane potential (time differential) in the non-leaky spiking neuron model. This approximation is given by Eq. (13).
Further, this function is approximated by the piecewise linear function Rlinear(t) to obtain Eq. (14).
Here, α and t′ are both constants, and as shown in
The upper row of
The probability of the first firing timing when the firing probability density is given by the piecewise linear function Rlinear(t) can be calculated as follows. That is, assuming that the probability of never firing by time t is x(t), this satisfies the differential equation of Eq. (15).
Solving the differential equation of Eq. (15) gives Eq. (16).
Therefore, the first spike firing probability density Pf(t) can be obtained as in Eq. (17).
It can be seen that the first spike firing probability density Pf(t) is non-negative and satisfies the definition of probability as in Eq. (18).
The time t* at which the first spike firing probability density Pf(t) takes the maximum value is shown as in Eq. (19) because the time differential of Pf(t) is 0 (∂Pf(t)/∂t=0).
Eq. (20) is obtained by imposing a condition in which this time t* matches the output spike time.
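The chain from Eq. (14) to Eq. (19) can be checked numerically. Assuming the piecewise linear density Rlinear(t) = α(t−t′)θ(t−t′), solving Eq. (15) gives the survival probability x(t) = exp(−α(t−t′)²/2), so the first-spike density is Pf(t) = Rlinear(t)x(t), whose maximum should sit at t* = t′ + 1/√α. The values of α and t′ below are assumptions used only for this numerical check.

```python
import math

ALPHA = 4.0    # slope alpha of the firing probability density (assumed)
T_PRIME = 0.5  # onset time t' (assumed)

def first_spike_density(t):
    """Pf(t) = R_linear(t) * x(t) with x(t) = exp(-alpha*(t - t')^2 / 2)."""
    s = t - T_PRIME
    if s < 0.0:
        return 0.0
    return ALPHA * s * math.exp(-ALPHA * s * s / 2.0)

# Locate the maximum of Pf on a fine grid and compare with t' + 1/sqrt(alpha).
grid = [T_PRIME + k * 1e-4 for k in range(20000)]
t_star_numeric = max(grid, key=first_spike_density)
t_star_analytic = T_PRIME + 1.0 / math.sqrt(ALPHA)
print(t_star_numeric, t_star_analytic)
```

The grid maximum agrees with the analytic peak time to within the grid spacing.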
Next, the change in the firing probability density R(n)i(t) of the neuron (neuron model unit 121) when the weight changes is shown by Eq. (21).
The firing probability density is expressed by the piecewise linear function Rlinear(t), and the change δRi(t) is expressed by Eq. (22).
[Eq. 22]
Rlinear(t)+δRi(t)≈α(t−t′)θ(t−t′)+δwij(n)θ(t−tj(n−1)) (22)
This change is shown in
The horizontal axis of
The firing timing when this firing function is given is expressed as t(n)i+δt(n)i. The equation to be solved in order to obtain t(n)i+δt(n)i is expressed by Eq. (23).
Alternatively, the equation to be solved in order to obtain t(n)i+δt(n)i is expressed by Eq. (24).
The solution of Eq. (24) is as shown in Eq. (25), with the initial condition being x(0)=1.
A in Eq. (25) is shown as in Eq. (26).
At this time, the time t* at which the first spike firing probability density Pf(t) takes the maximum value is as shown in Eq. (27).
The time change of the output spike estimated when the weight has changed by δw(n)ij is expressed by Eq. (28).
Eq. (29) is obtained as an approximate value of the partial differential.
Next, the approximation of ∂t(n)i/∂t(n−1)j is performed.
Similar to the above, as shown in Eq. (30), the partial differential is obtained by deriving the relationship between δt(n−1)j and t(n)i by passing through the change of the firing probability density R.
[Eq. 30]
δtj(n−1)→δR→ti(n) (30)
Specifically,
The horizontal axis of
The piecewise linear function Rlinear(t), which linearly approximates the firing probability density, averages the contributions of the spikes from all neurons (neuron model units 121) in the No. n−1 layer to the i-th neuron (neuron model unit 121) in the No. n layer, and can be transformed as in Eq. (31).
The first term (w(n)ij θ(t−t(n−1)j)(t(n−1)j−t′)) in the parentheses of Eq. (31) is due to the contribution of the firing of the j-th neuron of the No. n−1 layer. The second term (Σj′≠j w(n)ij′ θ(t−t(n−1)j′)(t(n−1)j′−t′)) is due to the contribution of the firing of the neurons other than the j-th of the No. n−1 layer. The change δRlinear(t) of the firing probability density Rlinear(t) can be considered as the change δα of the slope α of Rlinear(t).
The inside of the parentheses of Eq. (31) shows the slope α, and the part that changes when the firing time t(n−1)j of the neurons in the anterior layer has changed by δt(n−1)j is only the first term, due to the contribution of the firing of the j-th neuron in the No. n−1 layer. That is, the slope changes as shown in Eq. (32).
Eq. (33) can be obtained from Eq. (19).
As a result, the partial differential ∂t(n)i/∂t(n−1)j can be approximated as in Eq. (34).
Here, the constant τ is set as in Eq. (35).
[Eq. 35]
τ=(tj(n−1)−t′) (35)
Eq. (36) is obtained using τ.
From the above, an approximate learning rule of the weight of any layer in the neural network device 100 can be derived. Below, as specific examples, a learning rule of the No. N layer and a learning rule of the No. N−1 layer will be described.
The learning rule of the output layer is as shown in Eq. (37).
Here, η(n) is expressed as in Eq. (38).
η(n)0 indicates the learning rate. Here, the learning rate η(n) is redefined by using the combination of the learning rate η(n)0 and the slope α of the firing probability density as shown in Eq. (38). In Eq. (38), the slope α of the firing probability density is treated as a constant.
The learning processing unit 300 performs learning of the output layer by updating the weight w(N)ij for the input to the neuron model unit 121 of the output layer based on Eq. (37). As described above, the weight w(N)ij indicates the strength of the connection from the j-th neuron model unit 121 of the No. N−1 layer to the i-th neuron model unit 121 of the No. N layer. Since this is the output layer, Eq. (37) should be read with n=N.
A specific example of the learning rule (weight update rule) of the hidden layer is as shown in Eq. (39).
Eq. (40) is used for η(n−1).
The learning processing unit 300 performs learning of the hidden layer by updating the weight w(n)ij for the input to the neuron model unit 121 of the n-th layer (here, the hidden layer) based on Eq. (39). As described above, the weight w(n)ij indicates the strength of the connection from the j-th neuron model unit 121 of the No. n−1 layer to the i-th neuron model unit 121 of the No. n layer.
In the example of
When the algorithm according to the example embodiment is executed by software, the network weight update process is relatively simple, so that the processing load is relatively light, the processing time is relatively short, and the power consumption is relatively low. When the algorithm according to the example embodiment is executed by hardware, the network weight update process is relatively simple, so that in addition to the processing load being relatively light, the processing time being relatively short, and the power consumption being relatively small, the circuit area of the hardware is relatively small.
A simulation example of the neural network device 100 according to the example embodiment is shown. The MNIST data set, which is a handwritten character data set, was learned using the model according to the example embodiment (see Eq. (4)) and the learning algorithm according to the example embodiment (see
In the simulation, the weight of the neural network is updated using the training data, and the performance is evaluated using the test data. The weight is not updated using the test data.
The network used in the simulation has three layers, the first layer being composed of 169 input spiking neurons, and the second and third layers being composed of 500 and 10 spiking neurons, respectively (refer to Eq. (4)).
The input spiking neuron preprocesses the 28×28 pixel image data of the input data by convolution and reduces it to 169 pixels of 13×13. This reduces the amount of data and enables efficient simulation.
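One possible sketch of such a reduction is shown below; the actual convolution used is not specified here, so the 4×4 window with stride 2 (which maps 28 pixels to (28−4)/2+1 = 13 outputs per axis), the average pooling, and the dummy input are all assumptions for illustration.

```python
def reduce_image(image, k=4, stride=2):
    """Average-pool `image` (a list of rows) with a k x k window."""
    n = len(image)
    out_size = (n - k) // stride + 1  # (28 - 4)//2 + 1 = 13 for a 28x28 input
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            window = [image[r * stride + i][c * stride + j]
                      for i in range(k) for j in range(k)]
            row.append(sum(window) / (k * k))
        out.append(row)
    return out

image = [[1.0] * 28 for _ in range(28)]  # dummy 28x28 input (assumed data)
reduced = reduce_image(image)
print(len(reduced), len(reduced[0]))
```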
Online learning was conducted to update the weight for each image.
In addition, a simulation using the SpikeProp algorithm shown in
With reference to
The classification error rate was 3.8% in the SpikeProp algorithm and 4.9% in the approximation algorithm. It can thus be seen that the classification error rate is almost the same even when the approximation algorithm is used.
The other learning algorithms in the neural network system 1 will be described.
The differential by weight of the loss function is as shown in Eq. (41).
The two terms on the right side of Eq. (41) (∂t(l)i/∂w(l)ij and ∂t(l+1)s/∂t(l)i) are linearly approximated using a time evolution equation of the membrane potential, and a simple learning rule is derived.
As described above, w(l)ij indicates the strength (weight) of the connection from the j-th neuron in the l−1-th layer to the i-th neuron in the l-th layer. t(l)i indicates the firing timing of the i-th neuron in the l-th layer.
Derivation of the learning rule is possible by finding the partial differential “∂t(l)i/∂w(l)ij” and “∂t(l+1)s/∂t(l)i” shown in Eq. (41). These can be calculated by the SpikeProp method as in Eq. (42).
However, in both of the two equations shown in Eq. (42), the sum in the denominator on the right side (Σs) is taken only when the neurons in the previous layer that are connected to the weight of interest fire earlier than the neurons in the posterior layer. By approximating this denominator with a mean field, it is possible to greatly reduce the number of parameters required for learning.
First, ∂t(l)i/∂w(l)ij will be described.
The horizontal axis of
The above-mentioned displacement ΔV of the membrane potential can be derived as shown in Eq. (43) as illustrated in
[Eq. 43]
ΔV=ΔW(ti(l)−tj(l−1)) (43)
Then, by using the time τ(l)j at which a firing was first transmitted to the j-th neuron in the l-th layer and the firing threshold value Vth, it is possible to linearly approximate the time evolution of the membrane potential v*(l)i(t) with respect to time. As a result of this approximation, the equation for the time evolution of the membrane potential can be derived as in Eq. (44).
The firing timing t*(l)i under this approximation can be derived by solving Eq. (45).
[Eq. 45]
v*i(l)(t*i(l))=Vth (45)
The derived equation is as shown in Eq. (46).
Thereby, it is possible to approximate ∂t(l)i/∂w(l)ij by taking the limit ΔW→0 of (t*(l)i−t(l)i)/ΔW. An approximate expression of the partial differential can be derived as in Eq. (47).
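The limit described above can be illustrated numerically with an assumed toy neuron: a non-leaky neuron driven by a single step-function input fires at t = t_in + Vth/w, so the finite difference (t*(w+ΔW)−t(w))/ΔW should approach the analytic derivative −Vth/w² as ΔW → 0. All constants below are assumptions for this illustration.

```python
V_TH = 1.0   # firing threshold Vth (assumed)
T_IN = 0.2   # input spike time (assumed)

def t_fire(w):
    """Firing time of a non-leaky neuron driven by one step-function input:
    the membrane potential w*(t - T_IN) reaches V_TH at t = T_IN + V_TH/w."""
    return T_IN + V_TH / w

w = 2.0
analytic = -V_TH / w ** 2         # d(t_fire)/dw for this toy neuron
for delta_w in (1e-2, 1e-4, 1e-6):
    finite_diff = (t_fire(w + delta_w) - t_fire(w)) / delta_w
    print(delta_w, finite_diff)   # approaches the analytic value as delta_w -> 0
print(analytic)
```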
Next, an approximate expression of ∂t(l+1)j/∂t(l)k is derived.
The horizontal axis of
As shown in
The firing timing t*(l+1)j under this approximation can be derived by solving Eq. (49).
[Eq. 49]
v*j(l+1)(t*j(l+1))=Vth (49)
The firing timing t*(l+1)j is derived as in Eq. (50).
Thereby, it is possible to approximate ∂t(l+1)j/∂t(l)k by taking the limit ΔT→0 of (t*(l+1)j−t(l+1)j)/ΔT. An approximate expression of the partial differential can be derived as in Eq. (51).
Accordingly, ∂t(l)i/∂w(l)ij is approximated as in Eq. (52).
∂t(l+1)j/∂t(l)k is approximated as in Eq. (53).
By using the derived approximate equation of ∂t(l)i/∂w(l)ij (Eq. (52)) and the approximate equation of ∂t(l+1)j/∂t(l)k (Eq. (53)), it is possible to derive a learning rule that greatly reduces the referencing of information of other neuron models.
The learning processing unit 300 applies, for example, the approximations shown in Eqs. (52) and (53) when learning based on the above Eq. (41). The learning based on Eq. (41) can be applied to both the learning of the output layer and the learning of the hidden layer. The learning processing unit 300 may perform learning of either the output layer or the hidden layer by applying the approximations shown in Eqs. (52) and (53) to Eq. (41), or may perform learning of both.
As described above, the neuron model unit 121 is configured as a non-leaky integrate-and-fire spiking neuron and a spiking neuron with which a postsynaptic current is represented by using a step function, being fired at most once in one process of a neural network to indicate the output of the neuron model unit 121 itself at firing timing. The transmission processing unit 122 transmits information between the neuron model units 121.
One process of the neural network referred to here is a process in which the neural network outputs output data to a set of input data. For example, when a neural network performs pattern matching, one matching process corresponds to an example of one process of a neural network.
According to the neural network device 100, the neuron model unit 121 can be a relatively simple model using the step function under the conditions that leaks of the neuron model unit 121 are eliminated and that all the neuron model units 121 fire at most once.
When the processing of the neuron model unit 121 is executed by software, the neuron model becomes a relatively simple model, so that the processing load is relatively light, the processing time is relatively short, and the power consumption is relatively low. Further, when the processing of the neuron model unit 121 is executed by hardware, the neuron model becomes a relatively simple model, so that in addition to the processing load being relatively light, the processing time being relatively short, and the power consumption being relatively low, the circuit area of the hardware is relatively small.
According to the neuron model unit 121, in that it is a model that does not include leaks, the neurons have no time constant and do not depend on the time constant of the input data, so the recognition accuracy is high.
In addition, since the neuron model unit 121 uses the time method, it consumes less power than with the frequency method.
Further, the learning processing unit 300 causes at least one of the output layer and the hidden layer of the neural network device 100 to be learned using a learning rule that applies at least either one of the approximation of the differential by weight of the firing time and the approximation of the differential by firing time of the firing time, obtained using a linear approximation of the time evolution of the membrane potential. Thereby, in the neural network system 1, learning of at least one of the output layer and the hidden layer can be executed by a relatively simple process using approximation.
When the learning algorithm by the learning processing unit 300 is executed by software, the learning processing is relatively simple, so that the processing load is relatively light, the processing time is relatively short, and the power consumption is relatively low. Further, when the learning algorithm by the learning processing unit 300 is executed by hardware, the learning processing becomes relatively simple, so that in addition to the processing load being relatively light, the processing time being relatively short, and the power consumption being relatively low, the circuit area of the hardware is relatively small.
Note that the differential of the firing time by the weight means differentiating the firing time with respect to the weight. The differential of the firing time by a firing time means differentiating the firing time of a certain neuron model unit 121 with respect to the firing time of another neuron model unit 121.
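As an illustrative sketch only, not the claimed learning rule: for a non-leaky neuron driven by step-function postsynaptic currents, the membrane potential is piecewise linear in time, so the firing time has a closed form from which both differentials follow directly (in this simplified setting the linear approximation of the time evolution is exact). The threshold value and the assumption that all listed inputs arrive before the neuron fires are choices made for illustration.

```python
import numpy as np

def firing_time_and_grads(weights, input_times, v_th=1.0):
    """Sketch: with step-function postsynaptic currents, the membrane
    potential is v(t) = sum_j w_j * (t - t_j) over inputs with t_j <= t,
    so the firing time solves sum_j w_j * (t_fire - t_j) = v_th:
        t_fire = (v_th + sum_j w_j * t_j) / sum_j w_j
    Differentiating this closed form gives the two quantities used by
    the learning rule described above:
        d t_fire / d w_j = (t_j - t_fire) / sum_j w_j
        d t_fire / d t_j = w_j / sum_j w_j
    Assumes every listed input arrives before the neuron fires."""
    w = np.asarray(weights, dtype=float)
    t_in = np.asarray(input_times, dtype=float)
    w_sum = w.sum()
    t_fire = (v_th + np.dot(w, t_in)) / w_sum
    grad_w = (t_in - t_fire) / w_sum   # differential of firing time by weight
    grad_t = w / w_sum                 # differential of firing time by input firing time
    return t_fire, grad_w, grad_t
```

Note that the gradients with respect to the input firing times sum to one, reflecting that a uniform shift of all input spikes shifts the output spike by the same amount.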
Further, the learning processing unit 300 performs learning on the output layer of the neural network device by using a learning rule expressed in terms of the slope of the firing probability density.
Thereby, in the neural network system 1, a change in the firing time can be obtained relatively easily from the change in the firing probability density.
Next, a configuration of the example embodiment of the present invention will be described with reference to
In this configuration, each neuron model unit 11 is configured as a non-leaky integrate-and-fire spiking neuron whose postsynaptic current is represented using a step function, firing at most once in one processing pass of the neural network and indicating its own output by its firing timing. The transmission processing unit 12 transmits information between the neuron model units 11.
According to the neural network device 10, under the conditions that the neuron model unit 11 has no leak and that every neuron model unit 11 fires at most once, the neuron model unit 11 can be a relatively simple model using the step function.
When the processing of the neuron model unit 11 is executed by software, the neuron model is a relatively simple model, so that the processing load is relatively light, the processing time is relatively short, and the power consumption is relatively low. Further, when the processing of the neuron model unit 11 is executed by hardware, in addition to the processing load being relatively light, the processing time being relatively short, and the power consumption being relatively low, the circuit area of the hardware is relatively small.
According to the neuron model unit 11, because the model does not include leaks, the neurons have no time constant, so the recognition accuracy is high without depending on the time constant of the input data.
In addition, because the neuron model unit 11 uses the time method, it consumes less power than the frequency method.
All or part of the neural network system 1 or all or part of the neural network device 10 may be implemented in dedicated hardware.
When the above-mentioned neural network system 1 is mounted on the dedicated hardware 500, the operation of each of the above-mentioned processing units (neural network device 100, neuron model unit 121, transmission processing unit 122, prediction error calculation unit 200, learning processing unit 300) is stored in the dedicated hardware 500 in the form of a program or a circuit.
All or part of the neural network system 1 or all or part of the neural network device 10 may be mounted on an ASIC (application specific integrated circuit).
An ASIC in which all or part of the neural network system 1 or all or part of the neural network device 10 is mounted executes the calculation by electronic circuits such as CMOS circuits. Each electronic circuit may implement a single neuron in a layer, or may implement a plurality of neurons in a layer. Similarly, a circuit that calculates neurons may be used only for the calculation of a certain layer, or may be used for the calculation of a plurality of layers.
When all or part of the neural network device 10 is mounted on an ASIC, the ASIC is not limited to a specific one. For example, all or part of the neural network device 10 may be mounted on an ASIC that does not have a CPU. Further, the storage device used for mounting of the neural network device 10 may be arranged in a distributed manner on the chip.
It should be noted that various processes may be performed by recording a program for realizing all or some of the functions of the neural network system 1 on a computer-readable recording medium, and loading the program recorded on the recording medium into a computer system and executing it. Note that the “computer system” referred to here includes an OS (Operating System) and hardware such as peripheral devices.
Further, the “computer-readable recording medium” refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. Further, the above-mentioned program may be a program for realizing some of the above-mentioned functions, or may be a program that realizes the above-mentioned functions in combination with a program already recorded in the computer system.
Although the example embodiments of the present invention have been described in detail with reference to the drawings, the specific configuration is not limited to these example embodiments, and also includes designs and the like within a range that does not deviate from the gist of the present invention.
This application is based upon and claims the benefit of priority from Japanese patent application No. 2019-052880, filed Mar. 20, 2019, the disclosure of which is incorporated herein in its entirety by reference.
The present invention may be applied to a neural network device, a neural network system, a processing method and a recording medium.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2019-052880 | Mar 2019 | JP | national |
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/JP2020/011909 | 3/18/2020 | WO | 00 |